
The first draft genome of feather grasses using SMRT sequencing and its implications in molecular studies of Stipa

Springer Nature
Scientific Reports


Scientic Reports | (2021) 11:15345 | 
www.nature.com/scientificreports
The rst draft genome of feather
grasses using SMRT sequencing
and its implications in molecular
studies of Stipa
Evgenii Baiakhmetov1,2*, Cervin Guyomar3,4, Ekaterina Shelest3,5, Marcin Nobis1,2* &
Polina D. Gudkova2,6
The Eurasian plant Stipa capillata is the most widespread species within feather grasses. Many taxa of the genus are dominants in steppe plant communities and can be used for their classification and in studies related to climate change. Moreover, some species are of economic importance, mainly as fodder plants, and can be used for soil remediation. Although large-scale molecular data have begun to appear, there is still no complete or draft genome for any Stipa species. Here, we therefore present a single-molecule long-read sequencing dataset generated using the Pacific Biosciences Sequel System. A draft genome of about 1004 Mb was obtained, with a contig N50 length of 351 kb. Importantly, we report 81,224 annotated protein-coding genes, present 77,614 perfect and 58 unique imperfect SSRs, reveal the putative allopolyploid nature of S. capillata, investigate the evolutionary history of the genus, demonstrate structural heteroplasmy of the chloroplast genome and report, for the first time, the mitochondrial genome of Stipa. The assembled nuclear, mitochondrial and chloroplast genomes provide a significant source of genetic data for further work on phylogeny, hybridisation and population studies within Stipa and the grass family Poaceae.
In 2000, the Arabidopsis thaliana L. genome became the first plant genome to be completely sequenced and assembled1. Since then, many genomes from the plant kingdom have been sequenced, e.g. those of green algae2,3, bryophytes4,5, ferns6, gymnosperms7,8 and angiosperms9,10. In the grass family (Poaceae), reference assemblies were primarily obtained for crops11–13 and model plants14–16. The advent of second-generation sequencing and the subsequent decrease in overall sequencing costs have enabled the determination of whole genome sequences in many non-model plant species17–20.
Recently, the 1KP project, which aimed to sequence 1,000 green plant transcriptomes21–23, has been followed by the 10KP project24. The latter initiative intends to sequence complete genomes of more than 10,000 plants and protists. The project is expected to be completed in 2023 and to provide family-level high-quality reference genomes, ideally with chromosome-scale assemblies. Nevertheless, data at the genus level may not be processed immediately24. In comparison to other kingdoms, plants have very large genomes13,25,26, high ploidy levels27 and an abundance of repetitive sequences28–30. Third-generation sequencing is currently being applied to address these issues. Single-molecule real-time (SMRT) sequencing provided by Pacific Biosciences (PacBio)31 and nanopore sequencing by Oxford Nanopore Technologies32 afford a range of benefits, including exceptionally long read lengths (20 kb or more), the resolution of extremely repetitive and GC-rich regions and direct variant phasing32,33.
In the fossil record, Stipa L., or a closely related genus, is known from about 34 Mya (upper Eocene)34,35. For many decades, Stipa was described as a genus of over 300 species common in the steppe zones of Eurasia, North Africa, Australia and the Americas36,37. According to recent studies based on both morphological and molecular data, the genus has been narrowed and currently includes over 150 species geographically confined to
Europe, Asia and North Africa38–42. Most species of Stipa are dominants and/or subdominants in steppe plant communities43–45 and can be used for their classification46. Moreover, some species are of economic importance, mainly as pasture and fodder plants, especially in the early phases of vegetation36,47; they can also be used for soil remediation48,49, in studies related to climate change50–52 and as ornamental plants (e.g. S. capillata L., S. pulcherrima K. Koch, S. pennata L.).
In recent years, large-scale molecular data have begun to appear for Stipa: de novo transcriptome assemblies of S. purpurea Griseb.50,53, S. grandis P. A. Smirn.54 and S. lagascae Roem. & Schult.52, whole chloroplast genomes for 19 taxa57 and raw genomic data available via the NCBI Sequence Read Archive (SRA) for S. capillata58 and S. breviflora Griseb.59. In addition, nucleolar organising regions (NORs) were sequenced for six Stipa taxa60. Nevertheless, no complete or draft genome assembly currently exists for any Stipa species. To fill this gap, here we aim to: (1) present for the first time a single-molecule long-read dataset (nuclear, mitochondrial and chloroplast genomes) generated using SMRT sequencing on the PacBio Sequel platform; and (2) demonstrate and discuss the potential use of these data in further studies of Stipa.
For this study, we chose to sequence the entire genome of S. capillata (Fig. 1), as it is the most widespread taxon within the genus, growing on sandy to loamy, nutrient-poor soils in the dry grasslands of Eurasia61. This species is increasingly attracting the interest of conservation biologists due to its large distribution range, common occurrence in the Eurasian steppes and pseudosteppes, limited number of refugia in Europe and great morphological and genetic variability across its range62–64.
Results
Assembled nuclear genome. SMRT sequencing yielded 25.84 Gb of sequence data, corresponding to 23.16-fold genome coverage, with an N50 read length of 17,096 bp (Supplementary Table S1). De novo assembly of the PacBio reads using Flye (v.2.4)65,66 resulted in a genome size of 1,004 Mb67 with a contig N50 of 351 kb and a GC content of 45.97%. In contrast, a second de novo assembly performed with FALCON (v.0.2.5)68 yielded a smaller genome size of 773 Mb with a GC content of 46.04%. However, the Flye assembly has a better N50 (350,543 bp), almost three times larger than that of the FALCON assembly. After applying Purge Haplotigs (v1.1.1)69, the final genome size was reduced by 177 Mb, with an N50 of 381,155 bp (Table 1) and a GC content of 45.82%.
Figure1. A representative individual of Stipa capillata.
The subsequent analysis based on a benchmark of 4,896 conserved genes of the order Poales (dataset poales_odb10) revealed that the Flye assembly contains 4,557 (93.10%) complete BUSCO (Benchmarking Universal Single-Copy Orthologs) genes and only 293 (6%) missing BUSCOs, compared with 2,765 (56.50%) and 1,945 (39.70%) for the FALCON assembly. The Flye assembly after Purge Haplotigs shows 4,304 (87.90%) complete BUSCOs and 512 (10.50%) missing BUSCOs (Table 2).
Scaolding of contigs. Nearly all contigs of S. capillata genome can be assigned to the reference chro-
mosomes of Brachypodium distachyon L., Hordeum vulgare L. and Aegilops tauschii Coss., whereas genomes of
Oryza sativa L. and especially Triticum aestivum L., have much less homology to the feathergrass assembly. In
particular, 95.16% contigs of S. capillata genome were assigned to seven chromosomes of A. tauschii genome,
94.68% to ve chromosomes of B. distachyon, 94.20% to seven chromosomes of H. vulgare, 89.92% to 12 chro-
mosomes of O. sativa and only 41.67% to 21 chromosomes of T. aestivum.e total length of non-assigned
contigs was reasonably low for A. tauschii (48.59Mb), B. distachyon (53.40Mb) and H. vulgare (58.17Mb),
whereas for O. sativa and T. aestivum it was about 101.15Mb and 585.40Mb, respectively (Table3). In addi-
tion, the RaGOO grouping condence and orientation condence scores per chromosome ranged from 57.81
to 76.11% and from 80.03 to 95.11%, respectively, indicating that the contigs could be placed on a chromosome
with an acceptable level of condence (Supplementary TableS2). e only exception is T. aestivum for which
scores ranged from 30.49 to 47.76% for the grouping condence score and from 57.81 to 70.19% for the orienta-
tion condence score. Nevertheless, based on the location condence score, the exact position of the contigs on
a chromosome could not be accurately estimated, reecting a low level of synteny to the reference genomes. In
Table 1. Statistics of the nuclear genome assemblies.
Metrics Flye assembly FALCON assembly Flye assembly aer Purge Haplotigs
Length of assembly, bases 1,003,531,354 773,212,558 826,891,869
Number of sequences 5,931 885 3,683
Largest length of a sequence, bases 2,321,367 590,564 2,321,367
Average length of sequences, bases 169,201 88,015 224,516
N50, bases 350,543 119,836 381,155
Number of sequences with N50 837 2,061 640
N100, bases 1,001 20,078 1,014
Table 2. BUSCO statistics.
Metrics Flye assembly FALCON assembly Flye assembly aer Purge Haplotigs
Complete BUSCOs 4,557 (93.10%) 2,765 (56.50%) 4,304 (87.90%)
Complete and single-copy BUSCOs 2,383 (48.70%) 2,408 (49.20%) 2,916 (59.60%)
Complete and duplicated BUSCOs 2,174 (44.40%) 357 (7.30%) 1,388 (28.30%)
Fragmented BUSCOs 46 (0.90%) 186 (3.80%) 80 (1.60%)
Missing BUSCOs 293 (6%) 1,945 (39.70%) 512 (10.50%)
Total BUSCO groups searched 4,896 (100%) 4,896 (100%) 4,896 (100%)
Table 3. RaGOO statistics.
Species Number of chromosomes (n) Number and the total length of contigs
assigned to the reference Number and the total length of non-
assigned contigs
B. distachyon70 54,061 (950.13Mb) 1,871 (53.40Mb)
94.68% 5.32%
H. vulgare71 74,036 (945.36Mb) 1,896 (58.17Mb)
94.20% 5.80%
A. tauschii72 74,161 (954.95Mb) 1,771 (48.59Mb)
95.16% 4.84%
O. sativa73 12 3,477 (902.39Mb) 2,455 (101.15Mb)
89.92% 10.08%
T. a est iv um74 21 2,434 (418.14Mb) 3,498 (585.40Mb)
41.67% 58.33%
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Vol:.(1234567890)
Scientic Reports | (2021) 11:15345 | 
www.nature.com/scientificreports/
particular, the score was in a range of 31.30–43.66% for O. sativa, 26.06–39.13% for B. distachyon, 19.56–31.41%
for H. vulgare, 17.47–24.15% for A. tauschii and 10.30–38.23% for T. aestivum.
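To make the three RaGOO scores more concrete, the sketch below computes grouping, orientation and location confidence for a single contig from its alignments to a reference. It follows our reading of how these scores are described: the share of aligned bases on the best chromosome, the share of those bases on the dominant strand, and the aligned bases relative to the reference span they cover. It is a simplified illustration, not RaGOO's code. `Alignment` and `contig_confidence` are hypothetical names.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Alignment(NamedTuple):
    """One alignment of a contig to a reference chromosome (hypothetical record)."""
    chrom: str       # reference chromosome name
    ref_start: int   # 0-based start on the reference
    ref_end: int     # end on the reference (exclusive)
    strand: str      # '+' or '-'

    @property
    def length(self) -> int:
        return self.ref_end - self.ref_start


def contig_confidence(alns: List[Alignment]) -> Tuple[str, str, float, float, float]:
    """Return (chromosome, strand, grouping, orientation, location) for one contig."""
    if not alns:
        raise ValueError("contig has no alignments")
    # Grouping: share of aligned bases falling on the best-supported chromosome.
    per_chrom = defaultdict(int)
    for a in alns:
        per_chrom[a.chrom] += a.length
    chrom = max(per_chrom, key=per_chrom.get)
    chrom_bp = per_chrom[chrom]
    grouping = chrom_bp / sum(per_chrom.values())

    # Orientation: share of those bases on the dominant strand.
    on_chrom = [a for a in alns if a.chrom == chrom]
    plus_bp = sum(a.length for a in on_chrom if a.strand == "+")
    strand = "+" if plus_bp >= chrom_bp - plus_bp else "-"
    orientation = max(plus_bp, chrom_bp - plus_bp) / chrom_bp

    # Location: aligned bases relative to the reference span they are scattered over.
    span = max(a.ref_end for a in on_chrom) - min(a.ref_start for a in on_chrom)
    location = min(1.0, chrom_bp / span)
    return chrom, strand, grouping, orientation, location


# A contig whose two blocks on chr1 lie 9 kb apart gets a low location score.
example = [
    Alignment("chr1", 0, 1_000, "+"),
    Alignment("chr1", 9_000, 10_000, "+"),
    Alignment("chr2", 0, 500, "-"),
]
print(contig_confidence(example))  # ('chr1', '+', 0.8, 1.0, 0.2)
```

In this toy case the contig is confidently grouped and oriented, but its alignments are spread thinly along the chromosome. This is the same pattern as in the S. capillata results: high grouping and orientation scores combined with low location scores where synteny is weak.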
Transposable elements and nuclear genome annotation. Identification of transposable elements (TEs) revealed that more than half of the S. capillata genome (57.68%) is occupied by repetitive sequences. In particular, retrotransposons account for at least 16.12% and DNA transposons for at least 7.22% of the genome, while repeats covering a further 34.34% of the genome are currently unclassified. Among the classified repeats, long terminal repeat (LTR) elements were the most abundant retrotransposons, whereas Tourist/Harbinger elements were the most common DNA transposons. In total, 114,826 sequences were identified as simple repeats, occupying 0.57% of the genome. In addition, rolling-circles (0.28% of the genome) and low-complexity sequences (0.11% of the genome) were found (Table 4).
The subsequent structural annotation of the masked genome revealed 53,535 nuclear genes (Supplementary File 1). In contrast, the unmasked genome has 154,755 structurally annotated genes, 94,237 of which have BLAST hits in the NCBI non-redundant database. Among these 94,237 genes, 12,094 sequences are related to transposable elements. In particular, 2,925 genes are associated with transposons and 9,859 with retrotransposons. In addition, 229 genes encode transposase-related proteins. Thus, excluding TE-related genes, the unmasked genome has 81,224 genes that can be associated with already known proteins (Supplementary File 2).
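The TE filtering step removes genes whose best BLAST hits point to transposons, retrotransposons or transposases from the genes with hits. It can be sketched as a keyword screen of hit descriptions. The keywords and function names below are hypothetical, and the criteria actually used to obtain 81,224 genes may have differed.

```python
from collections import Counter

# Hypothetical keyword rules, checked in order. 'retrotransposon' must be tested
# before 'transposon' because the longer word contains the shorter one.
TE_RULES = (
    ("retrotransposon", "retrotransposon"),
    ("transposase", "transposase"),
    ("transposon", "transposon"),
)


def classify_hit(description: str):
    """Return a TE category for a BLAST hit description, or None if none matches."""
    text = description.lower()
    for keyword, label in TE_RULES:
        if keyword in text:
            return label
    return None


def split_te_genes(best_hits: dict):
    """best_hits maps gene ID -> description of its best BLAST hit.

    Returns (counts per TE category, list of genes kept as non-TE).
    """
    te_counts = Counter()
    non_te = []
    for gene, description in best_hits.items():
        label = classify_hit(description)
        if label is None:
            non_te.append(gene)
        else:
            te_counts[label] += 1
    return te_counts, non_te


hits = {
    "g1": "Retrotransposon protein, putative, Ty3-gypsy subclass",
    "g2": "transposase family protein",
    "g3": "CACTA, En/Spm sub-class transposon protein",
    "g4": "photosystem II protein D1",
}
print(split_te_genes(hits))
```

The ordering of the rules matters. A plain substring test for "transposon" would otherwise absorb every retrotransposon hit into the DNA-transposon category.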
SSR markers. In total, 77,614 perfect repeat motifs were identified in the nuclear genome assembly using Krait75 (Supplementary File 3). Of these, di- and tri-nucleotide motifs were the most common types, accounting for 28,365 (36.55%) and 25,794 (33.23%) repeats, respectively. Tetra-nucleotide motifs were the third most abundant, with 9,777 SSRs (12.60%), followed by mono-nucleotides with 6,572 SSRs (8.47%) and penta-nucleotides with 4,629 SSRs (5.96%). Hexa-nucleotides were the rarest motifs, with 2,477 SSRs (3.19%). Only four mono-nucleotide, four di-nucleotide and three tetra-nucleotide motifs were found in the mitochondrial and chloroplast genomes, and these SSRs were only 12–16 bp long. In addition, 58 unique repeats, each present in a single copy and 101–325 bp long, were retrieved from the analysis of TEs. These comprised four hexa-, 35 hepta-, nine octa-, five nona- and five deca-nucleotide motifs (Supplementary Table S3).
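Perfect SSRs are uninterrupted tandem copies of a 1–6 bp motif that reach a minimum number of repeats. The following simplified Python sketch performs such a search with regular expressions. The minimum repeat numbers per motif length are illustrative assumptions, not necessarily the Krait settings used here, and the function names are hypothetical.

```python
import re

# Assumed minimum number of tandem copies per motif length (mono- to hexa-nucleotide).
MIN_REPEATS = {1: 12, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit ('ATAT' is not)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_perfect_ssrs(seq: str, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copy_number) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-mer followed by at least n - 1 identical copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif):
                continue  # e.g. 'AA' or 'ATAT' are reported at their shorter unit length
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


seq = "GG" + "AT" * 8 + "C" + "A" * 12 + "G"
print(find_perfect_ssrs(seq))  # [(2, 'AT', 8), (19, 'A', 12)]
```

Each motif length is scanned separately, and non-primitive motifs are discarded so that a run such as ATATATAT is counted once as a di-nucleotide SSR. Dedicated tools such as Krait are much faster on whole genomes.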
Divergence time of Stipa. The Bayesian phylogenetic reconstruction based on the five loci within NORs dated the divergence of Stipa from Brachypodium to around 30.00–35.52 Mya and the putative origin of feather grasses to about 2.90–6.02 Mya (Fig. 2). Although not all branches within the genus were well supported, the analysis confirmed the monophyly of Stipa and the general grouping of the analysed species according to their taxonomic positions. In particular, S. capillata and S. grandis represent the section Leiostipa Dumort., while S. magnifica Junge, S. narynica Nobis, S. lipskyi Roshev. and S. caucasica Schmalh. belong to the section Smirnovia Tzvelev. The remaining three groups, (1) S. orientalis Trin. and S. pennata L., (2) S. richteriana Kar. & Kir., S. lessingiana Trin. & Rupr., S. heptapotamica Golosk. and S. korshinskyi Roshev., and (3) S. lagascae and S. breviflora, currently show a discrepancy between morphological and molecular data. In addition, the divergence time estimation indicates that the clade comprising S. capillata and S. grandis potentially originated 0.67–2.93 Mya, whereas the 95% credibility interval for the origin of its sister clade is 2.38–4.78 Mya. Furthermore, the most recent divergence was recorded between S. lessingiana and S. richteriana (0.00–0.48 Mya), followed by the split between S. heptapotamica and these two species (0.01–0.78 Mya). The divergence times for the remaining taxa are presented in Table 5.
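Table 5 reports, for each node, an age and a 95% credibility interval summarised from the MCMC posterior. Bayesian dating software commonly reports the highest posterior density (HPD) interval, which is the narrowest interval containing 95% of the sampled values. Whether the intervals here are HPD or equal-tailed is an assumption. The following is a minimal sketch of the HPD computation from a list of sampled node ages, using a hypothetical helper.

```python
import math


def hpd_interval(samples, mass: float = 0.95):
    """Narrowest interval containing `mass` of the posterior samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must contain
    if k > n:
        raise ValueError("not enough samples")
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Toy posterior sample of a node age (Mya) with one outlying draw.
print(hpd_interval([0, 10, 11, 12, 13, 50], mass=0.5))  # (10, 12)
```

For skewed posteriors, the HPD interval and the equal-tailed interval can differ noticeably. The choice therefore matters when comparing the intervals in Table 5 with published estimates.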
Table 4. Statistics of repetitive elements.
Type of repeats | Number of elements | Total length (bp) | % of genome
Class I: Retrotransposons | 123,524 | 161,756,598 | 16.12
  SINEs | 6,211 | 2,422,254 | 0.24
  LINEs | 26,453 | 19,189,619 | 1.91
  LTR elements | 90,860 | 140,144,725 | 13.97
Class II: DNA transposons | 99,245 | 72,448,468 | 7.22
  Hobo-Activator | 6,824 | 3,826,368 | 0.38
  Tc1-IS630-Pogo | 619 | 500,988 | 0.05
  PiggyBac | 1 | 75 | 0.00
  Tourist/Harbinger | 11,326 | 3,980,231 | 0.40
  Other | 2 | 113 | 0.00
Unclassified | 758,908 | 344,622,074 | 34.34
Total repeats | 981,677 | 578,827,140 | 57.68
Rolling-circles | 3,306 | 2,797,158 | 0.28
Low complexity | 18,762 | 1,145,428 | 0.11
Simple repeats | 114,826 | 5,716,291 | 0.57
Figure2. Phylogeny and divergence time estimation by molecular clock analysis. Letters at each node refer
to Table5. Numbers in brackets represent the Bayesian posterior probabilities (BPP > 0.50 only). e blue
rectangles on the nodes indicate the 95% credibility intervals (CI) of the estimated posterior distributions of the
divergence times. e red circles indicate the presumed divergence time splits set as a reference. e scale on
the bottom shows divergence time in Mya. e gure was created using Figtree v1.4.4, https:// tree. bio. ed. ac. uk/
sow are/ gtr ee/.
Table 5. Node ages, Bayesian posterior probabilities (BPP) and 95% credibility intervals (CI) for the nodes shown in Fig. 2.
Node | Node age (Mya) | BPP | 95% CI (Mya)
A | 48.59 | 1.00 | 44.53–52.78
B | 32.77 | 1.00 | 30.00–35.52
C | 4.39 | 1.00 | 2.90–6.02
D | 3.55 | 0.40 | 2.38–4.78
E | 3.02 | 0.28 | 1.95–4.14
F | 2.26 | 0.40 | 1.21–3.40
G | 2.15 | 0.63 | 1.05–3.32
H | 2.04 | 1.00 | 1.15–3.02
I | 1.77 | 0.85 | 0.76–2.87
J | 1.73 | 1.00 | 0.67–2.93
K | 1.56 | 0.96 | 0.81–2.38
L | 0.91 | 0.67 | 0.28–1.60
M | 0.71 | 1.00 | 0.11–1.46
N | 0.33 | 0.39 | 0.01–0.78
O | 0.16 | 0.28 | 0.00–0.48
Assembled mitochondrial and chloroplast genomes. The resulting Flye assembly contained four mitochondrial contigs with a total length of 438,037 bp76–79, represented by six edges, and an entire 137,832 bp circular chloroplast genome comprising a large single copy region (LSC) of 81,710 bp, a small single copy region (SSC) of 12,836 bp and two inverted repeats (IR) of 21,643 bp each (Fig. 3). However, after manual checking in IGV (v.2.8.6)80, the final size of the chloroplast genome was slightly reduced to 137,823 bp. In addition, an analysis using Cp-hap81 detected two structural haplotypes of the chloroplast genome: haplotype A82 (LSC, IRrc, SSCrc, IR, where rc denotes the reverse complement) and haplotype B83 (LSC, IRrc, SSC, IR). We also generated an assembly using Unicycler (v.0.4.8)84, which resulted in 76 linear contigs, 29 of which could be assigned to mitochondrial sequences with a total length of 1,668,569 bp. Because the Unicycler assembly was more complex and none of its contigs appeared to be circular, we used the Flye assembly for the downstream genome annotation.
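The two chloroplast haplotypes differ only in the relative orientation of the SSC. The sketch below builds both linear arrangements from LSC, IR and SSC sequences, following the segment orders given above. The quadripartite layout adds up to the assembled length (81,710 + 2 × 21,643 + 12,836 = 137,832 bp). The function names are hypothetical, and the code only illustrates the notation; it does not detect haplotypes.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def plastome_haplotypes(lsc: str, ir: str, ssc: str) -> dict:
    """Linearised plastome haplotypes, both starting at the LSC.

    A: LSC, IRrc, SSCrc, IR    B: LSC, IRrc, SSC, IR
    """
    ir_rc = revcomp(ir)
    return {
        "A": lsc + ir_rc + revcomp(ssc) + ir,
        "B": lsc + ir_rc + ssc + ir,
    }


# Toy segments; the real ones are 81,710 bp (LSC), 21,643 bp (IR) and 12,836 bp (SSC).
haps = plastome_haplotypes(lsc="AAAA", ir="ACG", ssc="GGT")
print(haps)  # {'A': 'AAAACGTACCACG', 'B': 'AAAACGTGGTACG'}
```

Both haplotypes have the same length and gene content. They can only be told apart by reads that span an IR and reach into both single copy regions, which is why long reads are needed to observe this heteroplasmy.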
In total, 112 and 133 genes were functionally annotated in the mitochondrial and chloroplast genomes, respectively. The mitochondrial annotation comprised 78 protein-coding genes, 4 ribosomal RNA genes and 30 tRNA genes. The chloroplast annotation comprised 85 protein-coding genes, 8 ribosomal RNA genes and 40 tRNA genes. The chloroplast genome size of 137,823 bp obtained with Flye and the number of annotated genes were similar to those of the known assemblies of S. capillata obtained by Illumina sequencing57. However, the previous genome assemblies were slightly longer, specifically 137,830 bp86 and 137,835 bp87.
DArTseq markers. The DArT pipeline analysis resulted in 61,328 Silico markers and 52,970 sequences with SNPs. BLAST searches showed that 58,701 Silico markers and 52,252 sequences with SNPs were successfully mapped to 4,361 and 3,935 genome contigs, respectively. Thus, 95.72% of Silico markers and 98.64% of sequences with SNPs are represented in the current genome assembly, on 73.52% (total length 969.30 Mb) and 66.34% (940.37 Mb) of the contigs, respectively. In addition, we established that 50,953 Silico markers and 47,181 sequences with SNPs were present only in a single copy in the genome. Finally, we identified 30 Silico markers and 10 sequences with SNPs aligned to the mitochondrial genome, and only 2 Silico markers and 4 sequences with SNPs aligned to the chloroplast genome.
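The mapping rates and single-copy counts above can be derived from BLAST tabular output (`-outfmt 6`, whose twelve default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore). The sketch below assumes hypothetical e-value and identity cut-offs, since the thresholds used in the study are not stated here. It treats a marker as single-copy when exactly one hit location passes those cut-offs.

```python
from collections import defaultdict


def summarise_marker_hits(blast_lines, n_markers, max_evalue=1e-10, min_identity=95.0):
    """Summarise BLAST -outfmt 6 hits of marker sequences against an assembly.

    The e-value and identity cut-offs are illustrative assumptions.
    """
    locations = defaultdict(set)  # marker ID -> {(contig, start, end), ...}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        marker, contig, pident = f[0], f[1], float(f[2])
        sstart, send, evalue = int(f[8]), int(f[9]), float(f[10])
        if evalue <= max_evalue and pident >= min_identity:
            # sstart > send for minus-strand hits, so normalise the coordinates.
            locations[marker].add((contig, min(sstart, send), max(sstart, send)))

    mapped = len(locations)
    return {
        "mapped": mapped,
        "mapped_pct": 100.0 * mapped / n_markers,
        "single_copy": sum(1 for locs in locations.values() if len(locs) == 1),
        "contigs_hit": len({c for locs in locations.values() for c, _, _ in locs}),
    }


lines = [
    "m1\tctg1\t100.0\t60\t0\t0\t1\t60\t100\t159\t1e-25\t111",
    "m2\tctg1\t99.0\t60\t1\t0\t1\t60\t500\t441\t1e-20\t105",
    "m2\tctg2\t98.0\t60\t1\t0\t1\t60\t10\t69\t1e-20\t105",
    "m3\tctg3\t80.0\t60\t12\t0\t1\t60\t10\t69\t1e-5\t50",
]
print(summarise_marker_hits(lines, n_markers=4))
```

In this toy input, marker m1 maps once, m2 maps to two contigs and m3 fails the identity cut-off. The result is 2 mapped markers (50%), 1 single-copy marker and 2 contigs hit.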
Discussion
The number of sequenced plant genomes is increasing rapidly year by year, providing a fundamental resource for various genomic studies. In the current work, we present a 1,004 Mb genome with 23× coverage for the most widespread feather grass species, S. capillata, obtained using SMRT PacBio sequencing. The current assembly comprises 5,931 sequences with a contig N50 length of 351 kb (Table 1). The BUSCO completeness score of 93.10% (Table 2), the large proportion of TEs (57.68%, Table 4) and the presence of Silico (95.72%) and SNP (98.64%) markers derived from the DArT platform indicate that the assembly is of high quality. Moreover, this is the first report of the proportion of TEs in the genus, because previous de novo assemblies were based exclusively on transcriptomic data50,52,54. In addition, we attempted a reference-guided scaffolding of the assembled contigs. However, although nearly all contigs of the S. capillata genome were assigned to the chromosomes of B. distachyon, H. vulgare and A. tauschii, it was not possible to estimate their proper positions on the reference with an acceptable level of confidence (Table 3 and Supplementary Table S2). In general, in the absence of a high-density genetic linkage map, reconstructing chromosome-scale pseudomolecules is challenging. To improve the contiguity of the long-read assembly, we believe that high-throughput chromosome conformation capture (Hi-C)88 should be applied; many recent studies on non-model species have successfully combined long-read techniques and Hi-C data to produce chromosome-scale assemblies89–91. Moreover, another key to improving this genome assembly in the future is simply to obtain more sequencing reads. It was recently shown that contig length metrics are positively correlated with both read length and sequence coverage. Specifically, long-read assemblies of maize reached the highest contig N50 of 24.54 Mb with a subread N50 of 21,166 bp and 75-fold coverage, while the longest contig of 79.68 Mb was obtained with the same subread N50 but 60-fold coverage92.
Figure 3. Visualisation of the de novo mitochondrial and chloroplast genome assemblies using Bandage (v.0.8.1)85. (a) Contigs representing the mitochondrion. (b) Contig representing the chloroplast. Different colours represent different contigs; the length (in bp) and coverage (x) of edges within contigs are shown. The figure was created using Bandage v.0.8.1, https://rrwick.github.io/Bandage/.
The newly generated genome has a GC content of 45.97%, similar to the known estimates for Stipa species, which range from 46.61 to 49.05%93, and more broadly to those for grasses, which range from 43.57% in O. sativa to 46.90% in Z. mays94. It was recently shown that a higher GC content in monocots is associated with adaptation to extremely cold and/or dry climates95. The genus Stipa is consistent with this hypothesis, as all feather grasses are adapted to temperate, dry climates36. In addition, a positive correlation between GC content and genome size has been established96, suggesting the insertion of LTR retrotransposons as a potential driving force of genome enlargement97. Similarly, our results indicate that the expansion of the S. capillata genome also resulted from insertions of repetitive sequences, which occupy 57.68% of the genome, including LTR retrotransposons (13.97%). However, unclassified repeats still account for 34.34% of the genome (Table 4). Compared with other species of Poaceae, the total proportion of TEs in S. capillata is close to that in Oryza minuta J. Presl (58.35%) and O. alta Swallen98 (57.54%), larger than in B. distachyon99 (28.10%) and O. sativa100 (45.52%), and smaller than in O. granulata Nees & Arn.101 (67.96%), Avena sativa L.102 (69.47%) and T. aestivum103 (84.67%).
Importantly, the presented genome size is about 43% of the expected size of 2,355 Mb and about 1.7 times the expected monoploid size of 589 Mb, both estimated using flow cytometry93. We were unable to remove redundant sequences, possibly because of heterozygosity, and the assembly contains a large number of duplicated BUSCOs (Tables 1 and 2). It may therefore be presumed that the current genome assembly combines two very distinct genomes. To our knowledge, the vast majority of Stipa species have 44 chromosomes (2n = 4x) and are presumed to be tetraploids41,104. In addition, it was recently shown that the single-copy region ACC1 and the low-copy nuclear gene At103 each have two different copies in Stipa104,105. This may suggest that S. capillata, and the genus Stipa in general, arose through hybridisation between genetically distant diploid species (2n = 22) and subsequent allopolyploidisation via whole genome duplication (WGD), rather than via a single WGD event in an ancestral species. Well-documented examples of natural allopolyploid taxa in the subfamily Pooideae are Triticum turgidum L. (2n = 4x = 28, genome constitution AABB) and T. aestivum (2n = 6x = 42, AABBDD), formed through hybridisation and successive chromosome doubling of the ancestral diploid species T. urartu (2n = 2x = 14, AA), Aegilops speltoides Tausch. (2n = 2x = 14, BB) and A. tauschii (2n = 2x = 14, DD)106. Moreover, in the tribe Stipeae, allopolyploidy based on the At103 gene was reported for the genus Patis Ohwi (2n = 46, 48)105. To date, at least three hypotheses have been considered regarding the base chromosome number in Stipeae: x = 7 (ref. 107), x = 11 (refs. 108,109) and x = 12 (ref. 110). Recently, it was suggested that the latter two are more plausible41,104. Thus, to better assemble the S. capillata genome and verify whether Stipa is an allopolyploid genus, we suggest sequencing, at chromosome level, closely related diploid species (2n = 22) from genera such as Ptilagrostis Griseb., Achnatherum P. Beauv. (e.g. A. calamagrostis L., 2n = 22 + 0–2B) or Piptatheropsis Romasch., P. M. Peterson & Soreng (2n = 20, 22, 24)41,104.
In general, the number of genes in Poaceae varies from 28,835 in the smallest known genome, Oropetium thomaeum Trin. (2n = 20; genome size of 245 Mb)111, to 107,891 in T. aestivum (2n = 42; 14,547 Mb)112. Here, we report 53,535 nuclear genes structurally annotated in the masked genome assembly. This number is roughly 1.8 and 1.6 times lower than the numbers previously determined for S. grandis (94,674 genes)54 and S. purpurea (84,298 genes)50, respectively. On the other hand, the annotation of the unmasked genome resulted in 81,224 genes associated with already known proteins. In comparison, only 65,047 functionally annotated genes were reported for S. grandis and 58,966 for S. purpurea. Nonetheless, as RNA-seq data are currently unavailable for S. capillata, we believe that the current version of the genome annotation requires further investigation to properly characterise the gene sets once appropriate data become available.
SSR markers are widely distributed across the genome and are commonly applied to establish genetic structure in Stipa. Previously, polymorphic microsatellite primers were reported for populations of S. purpurea (11, 15 and 29 loci; refs. 113–115), S. pennata (7 loci116), S. breviflora (21 loci117) and S. glareosa (9 loci118). In the present study, we identified 77,614 perfect SSR markers (Supplementary File 3) and 58 imperfect repeat motifs present only in a single copy (Supplementary Table S3). Although we did not test them at the population level, we are confident that such a number of new loci will be a valuable source for the further development of SSR markers in S. capillata and, more generally, in the genus Stipa. Additionally, the revealed loci could be used for designing dominant inter simple sequence repeat (ISSR) markers119. Recently, the usefulness of ISSRs was shown in studies of S. bungeana120, S. ucrainica and S. zalesskii121, S. tenacissima122 and the hybrid complex S. heptapotamica123.
Previous studies based on three chloroplast loci124, and on four chloroplast loci and one nuclear region105, estimated the origin of Stipeae at 30.60–47.30 Mya and 21.20–39 Mya, respectively. Here, based on the five loci within NORs, we demonstrated that the potential split between Stipa, representing the tribe Stipeae, and Brachypodium (tribe Brachypodieae) took place approximately 30–35.52 Mya, which supports previous findings105,124,125. The present results also suggest that the genus Stipa likely originated ca. 4.39 (2.90–6.02) Mya. In contrast, one previous study indicated the origin of feather grasses at about 12.90 Mya124, while another showed different estimates based on chloroplast loci (21.20 Mya, 13–22) and the At103 region105. Specifically, the two copies of At103 had the following suggested ages: 15.78 (6.30–26.60) Mya for the Eurasian Stipeae lineage and 5.62 (0–6.50) Mya for the American Stipeae lineage105. The latter estimate is thus close to the age of origin calculated in the current study. In addition, our data on the divergence times among S. richteriana, S. lessingiana and S. heptapotamica (Fig. 2 and Table 5) conform to previous findings on ongoing hybridisation among these taxa123, suggesting that NORs are a useful tool for revealing species of putative hybrid origin. Nonetheless, we believe that the current and previous estimates regarding the origin of Stipa should be treated with caution. Firstly, to our knowledge, there are still no fossil data for any Stipa species from the Old World that could properly calibrate the historical diversification of the genus; currently, the earliest definite Stipa caryopses were found in central Poland and are dated ca. 4,000 BC126. Secondly, available data demonstrate incongruence between chloroplast and nuclear loci analyses; in further studies, we suggest using single-copy nuclear genes derived from whole genome sequencing projects. Thirdly, different sets of species and parameters used for inferring diversification dates may result in different estimates127.
Finally, we report a 137,823bp chloroplast genome that is similar to the known assemblies in Stipa and spe-
cically in S. capillata57. Here we highlight the applicability of a long-read sequencing technology like PacBio
for the straightforward assembling of plastomes using Flye67,68. In addition, due to the long-reads we were able
to identify two haplotypes presented in S. capillata. is result supports the previous ndings in Poaceae81 sug-
gesting that plastome structural heteroplasmy can be a common state in feather grasses. Moreover, for the rst
time in the genus Stipa, here we present a 438,037bp mitochondrial genome. e current size of this genome
is close to Alloteropsis semialata (R.Br.) Hitchc. (442,063bp)128, T. aestivum (452,526bp)129, Sorghum bicolor
L. (468,628bp)130 and A. speltoides (476,091bp)131. Nevertheless, the present version of the genome is consti-
tuted by four contigs rather than one circular sequence. Although the general acceptance among mitochondrial
biologists is that plant mitochondrial genomes have a variety of congurations132134, in order to verify if a more
accurate assembly could be performed, we suggest reusing our data for a more comprehensive analysis of the
mitochondrial structures within Stipa.
Materials and methods
Plant material and DNA extraction. Our research complies with relevant institutional, national, and international guidelines and legislation. A S. capillata sample from the Kochkor River Valley, central Kyrgyzstan (Supplementary Table S4), was selected for genome sequencing. The sample was stored in silica gel at ambient temperature until DNA extraction. Total genomic DNA was isolated from dried leaves after a six-month storage period using a CTAB large-scale DNA extraction protocol (Supplementary information S1, described in Supplementary File 6). DNA extraction was performed by SNPsaurus (USA). In addition, we isolated DNA from dried leaves using a Genomic Mini AX Plant Kit (A&A Biotechnology, Poland). Subsequently, quality checking, quantification and concentration adjustment were performed using a NanoDrop One (Thermo Scientific, USA) and agarose gel electrophoresis. The concentration of the sample was adjusted to 50 ng/μL. The purified DNA sample (1 μg) was sent to Diversity Arrays Technology Pty Ltd (Canberra, Australia) for sequencing and DArT marker identification. Moreover, to test the phylogenetic power of NORs in Stipa, we supplemented the study with five specimens of S. richteriana Kar. & Kir., three of S. lessingiana Trin. & Rupr., four of S. heptapotamica Golosk. and four of S. korshinskyi Roshev. (Supplementary Table S4). Genomic DNA of these specimens was isolated from dried leaf tissues using a modified CTAB method [135].
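The concentration adjustment above is a standard C1 × V1 = C2 × V2 dilution. As a minimal sketch, assuming a hypothetical stock reading of 125 ng/μL (the measured stock concentration is not reported), the helper names below are illustrative only; they compute the stock and diluent volumes needed to reach 50 ng/μL and the volume that carries 1 μg.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Return (stock volume, diluent volume) in μL from C1 * V1 = C2 * V2."""
    if target_ng_per_ul > stock_ng_per_ul:
        raise ValueError("target concentration is higher than the stock")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


def volume_for_mass(mass_ng, conc_ng_per_ul):
    """Volume (μL) of a solution at conc_ng_per_ul that contains mass_ng of DNA."""
    return mass_ng / conc_ng_per_ul


# Hypothetical stock of 125 ng/μL (not from the study), diluted to the
# 50 ng/μL working concentration in a 40 μL final volume.
stock_ul, diluent_ul = dilution_volumes(125.0, 50.0, 40.0)
print(f"stock {stock_ul:.1f} μL + diluent {diluent_ul:.1f} μL")

# 1 μg (1000 ng) at 50 ng/μL corresponds to 20 μL of the adjusted sample.
print(f"{volume_for_mass(1000.0, 50.0):.1f} μL")
```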
Library construction and sequencing. In total, 5 μg of S. capillata genomic DNA was used to construct a PacBio library according to the 20 kb PacBio template preparation protocol, omitting the shearing step. The size-selection cut-off was set at 15 kb. Library preparation and sequencing on three PacBio Sequel SMRT cells (Pacific Biosciences, Menlo Park, CA, USA) were carried out by SNPsaurus, LLC. Prior to assembly, reads from each SMRT cell were inspected and quality metrics were calculated using SequelQC v.1.1.0 [136]. A high-density assay using the DArT complexity reduction method for S. capillata was performed according to a previously reported procedure [137].
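A common back-of-envelope check alongside per-cell read metrics is the expected sequencing depth: total sequenced bases divided by genome size. The sketch below uses the 589 Mb monoploid genome size cited in the assembly section; the per-cell yields are hypothetical placeholders, not values from this study.

```python
GENOME_SIZE_BP = 589_000_000  # expected monoploid genome size cited in the text


def expected_depth(cell_yields_bp, genome_size_bp=GENOME_SIZE_BP):
    """Average depth = total sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return sum(cell_yields_bp) / genome_size_bp


# Hypothetical subread yields (bp) for three Sequel SMRT cells; illustrative only.
yields = [12_000_000_000, 14_500_000_000, 11_000_000_000]
print(f"{expected_depth(yields):.1f}x over a 589 Mb monoploid genome")
```

Dividing by the assembled size instead of the monoploid size gives the depth per assembled base, which is lower when the assembly is larger than the monoploid estimate.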
For the remaining specimens used in the current study, quality control using a fluorometer (PerkinElmer Victor3, USA) and gel electrophoresis, library construction using a TruSeq Nano DNA Library kit (350 bp insert size; Illumina, USA) and sequencing of 100 bp paired-end reads on an Illumina HiSeq 2500 platform (Illumina, USA) were performed by Macrogen Inc. (South Korea).
Nuclear genome assembly and validation. This work involved many software tools, whose versions, settings and parameters are described in Supplementary information S2 (available in Supplementary File 6). De novo assembly of the PacBio data was performed using Flye v.2.4 [65,66]. The draft assembly was cleaned by running BLASTn v.2.10.0 [138] against the NCBI nucleotide database v.5, sending each BLAST hit to the JGI taxonomy server (https://taxonomy.jgi-psf.org/) and keeping only plant contigs. Thereafter, Qualimap v.2.2.2 [139] was used to determine the mean coverage of each contig. In the final assembly we kept only contigs with an average coverage of more than 10x. In addition, overrepresented contigs (> 60x) were BLASTed against the NCBI nucleotide database v.5, and sequences assigned to chloroplasts and mitochondria were removed.
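The coverage-based cleaning described above can be sketched as a simple triage step. This is a minimal illustration, assuming each contig already carries its Qualimap mean coverage and a boolean outcome of the BLAST/JGI taxonomy screen; the `Contig` type, the helper name and the contig names are hypothetical. Flagged contigs stay in the kept list until the organellar BLAST check decides whether to remove them.

```python
from dataclasses import dataclass

MIN_MEAN_COVERAGE = 10.0   # keep contigs with mean coverage > 10x
HIGH_COVERAGE_FLAG = 60.0  # contigs > 60x are re-checked for organellar origin


@dataclass
class Contig:
    name: str
    mean_coverage: float
    is_plant: bool  # result of the BLAST / taxonomy screen


def triage_contigs(contigs):
    """Return (kept, dropped, flagged) contig names."""
    kept, dropped, flagged = [], [], []
    for contig in contigs:
        if not contig.is_plant or contig.mean_coverage <= MIN_MEAN_COVERAGE:
            dropped.append(contig.name)
            continue
        kept.append(contig.name)
        if contig.mean_coverage > HIGH_COVERAGE_FLAG:
            flagged.append(contig.name)  # candidate chloroplast/mitochondrial sequence
    return kept, dropped, flagged


contigs = [
    Contig("ctg1", 35.2, True),
    Contig("ctg2", 8.0, True),
    Contig("ctg3", 140.0, True),
    Contig("ctg4", 40.0, False),
]
print(triage_contigs(contigs))
```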
Due to the nal assembly performed with Flye v.2.4 being roughly twice bigger than an expected monoploid
genome size of 589 Mb93, we accomplished an additional assembly with FALCON v.0.2.568 and applied Purge
Haplotigs v1.1.169 to lter redundant sequences due to possible heterozygosity. e assemblies’ statistics were
analysed using assembly-stats v.1.0.1140. In addition, in order to assess the completeness of the genome assemblies,
we investigated the presence of highly conserved orthologous genes using BUSCO v.4.0.6141.
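One of the headline statistics reported by assembly tools such as assembly-stats is the contig N50. A minimal sketch of the usual definition: sort contigs from longest to shortest and return the length at which the cumulative sum first reaches half of the total assembly length. This is not assembly-stats' code, and tools differ slightly in how they treat the exact midpoint.

```python
def n50(lengths):
    """Length of the contig at which the cumulative length, summed from the
    longest contig down, first reaches at least half of the total."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # integer-safe comparison with total / 2
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # total 54; 10 + 9 + 8 = 27
```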
Scaolding of contigs. Due to there being no reference genome for any Stipa species, here we applied
RaGOO v.1.1142 to verify if a reference-guided scaolding can be performed for the dra genome contigs based
on four genomes from the Pooideae subfamily (B. distachyon70, H. vulgare71, A. tauschii72, T. aestivum 74) and one
genome from the Oryzoideae subfamily (O. sativa73). e subsequent assessment of the scaolding accuracy
was based on three parameters: (1) location condence score, (2) orientation condence score and (3) grouping
condence score142.
Repeat prediction and nuclear genome annotation. Repeats in S. capillata were predicted using RepeatModeler v.2.0.1 [143], a de novo transposable element (TE) family identification and modelling package that includes three repeat-finding programs: RECON [144], RepeatScout [145] and TRF [146]. The resulting TE library was supplemented with the transposable elements database (Release 19, http://botserv2.uzh.ch/kelldata/trep-db/) [147]. Subsequently, TE regions in the genome assembly were masked with RepeatMasker v.4.1.0 [148] (http://repeatmasker.org) using the search engine RMBlast v.2.9.0+ [149] and the custom library created in the previous step. Next, gene and protein sequences were predicted using Augustus v.3.2.3 for the unmasked assembly and v.3.3.3 [150] for the masked assembly. The predicted protein sequences of the unmasked assembly were then BLASTed against the NCBI protein database v.5, and the resulting BLAST hit descriptions were added to the GFF (General Feature Format) files.
Genome-wide identication of microsatellite markers. e unmasked nuclear genome, chloroplast
and mitochondrial genome assemblies were screened for perfect mono-, di-, tri-, tetra-, penta- and hexa-nucleo-
tide repeat motifs using Krait v.1.3.375. We applied the following criteria: mono-nucleotide repeat motifs contain
at least 12 repeats, di-nucleotide repeat motifs contain at least seven repeats, tri-nucleotide repeat motifs contain
at least ve repeats, tetra-, penta- and hexa-nucleotide repeat motifs contain at least four repeats.
Divergence time of Stipa. To estimate the divergence between S. capillata and other Stipa species we used the nucleolar organising regions (NORs). Firstly, we prepared a set of reference sequences including S. lipskyi Roshev. [151], S. magnifica Junge [152], S. narynica Nobis [153], S. caucasica Schmalh. [154], S. orientalis Trin. [155] and S. pennata L. [156]. Secondly, we mapped raw reads of S. capillata, S. richteriana, S. lessingiana, S. heptapotamica and S. korshinskyi (Supplementary Table S2), as well as S. grandis [55], S. breviora [59] and S. lagascae [157], to the reference set using Minimap2 v.2.17-r941 [158], keeping only uniquely mapped reads with Samtools v.1.9 [159]. Thirdly, de novo assembly of the NORs was performed using Canu v.2.0 [160] for S. capillata and SPAdes v.3.14.1 [161] for the other Stipa species. Additionally, we added B. distachyon [162] to the analysis as an ingroup member of the Pooideae subfamily and O. sativa [163] as an outgroup representing the Oryzoideae subfamily within the Poaceae family.
Next, all sequences were aligned using MAFFT v.7.471 [164]. The aligned sequences were then visualised in AliView v.1.26 [165] and divided into five loci: (1) 18S ribosomal RNA, (2) Internal Transcribed Spacer 1 (ITS1), (3) 5.8S ribosomal RNA, (4) Internal Transcribed Spacer 2 (ITS2) and (5) 26S ribosomal RNA (Supplementary File 4). Divergence times were estimated in BEAST2 v.2.6.3 [166] using the 121321 substitution model determined by bModelTest [167]. We used the following constraints for time calibration: 38–48 million years ago (Mya) for the Brachypodium–Oryza split [101] and 33–39 Mya for the potential origin and divergence of Stipa [34,35]. The divergence time was then estimated using the strict clock model and the Yule prior. In total, we ran the analysis three times independently, with 50 million Markov chain Monte Carlo (MCMC) generations per run. The log and tree files were combined using LogCombiner v.2.6.3 (part of the BEAST package), with the first five million generations of each run discarded as burn-in. Next, Tracer v.1.7.1 [168] was used to check the Effective Sample Size (ESS) values in the log files. As all ESSs exceeded 200, we summarised the final maximum clade credibility tree (Supplementary File 5) in TreeAnnotator v.2.6.3 (part of the BEAST package). The final tree was visualised and edited using FigTree v.1.4.4 [169].
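The burn-in and convergence checks above reduce to two small computations: dropping the first 5 of 50 million generations (10%) from each run before combining, and estimating the ESS of a sampled parameter. The sketch below is illustrative only; it uses a common textbook ESS estimator (n divided by the integrated autocorrelation time, summing autocorrelations until the first non-positive lag), which is not Tracer's exact implementation, so values may differ somewhat from Tracer's output.

```python
GENERATIONS_PER_RUN = 50_000_000
BURNIN_GENERATIONS = 5_000_000  # discarded from the start of each run (10%)


def discard_burnin_and_combine(runs):
    """Drop the burn-in share of each run's samples and concatenate the rest."""
    combined = []
    for samples in runs:
        cut = len(samples) * BURNIN_GENERATIONS // GENERATIONS_PER_RUN
        combined.extend(samples[cut:])
    return combined


def effective_sample_size(trace):
    """ESS = n / (1 + 2 * sum of autocorrelations up to the first non-positive lag)."""
    n = len(trace)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(trace) / n
    dev = [x - mean for x in trace]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        raise ValueError("constant trace: ESS is undefined")
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n):
        rho = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau


runs = [[float(i) for i in range(10)] for _ in range(3)]
print(len(discard_burnin_and_combine(runs)))  # 3 runs x 9 retained samples
```

Computing the ESS per run, as well as on the combined trace, helps reveal a run that has not converged; an ESS above 200 is the threshold applied in the text.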
Mitochondrial and chloroplast genomes assembly, annotation and validation. Prior to assembly, we mapped raw reads to 11 reference mitochondrial genomes of species belonging to the Poaceae family (Supplementary Table S5) using Minimap2 v.2.17-r941 [158]. Only uniquely mapped reads were kept by Samtools v.1.9 [159] for the next step. De novo mitochondrial assembly of the resulting 4.08 Mb of data was performed using Flye v.2.7.1-b1590.
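The "uniquely mapped reads" filter can be pictured as a pass over SAM records. The sketch below assumes uniqueness means a mapped primary alignment with MAPQ of at least 1 (minimap2 typically gives MAPQ 0 to reads placed equally well at several loci); this threshold is a hypothetical choice, and the study's actual Samtools settings are given in Supplementary information S2. At the alignment level this resembles `samtools view -q 1 -F 0x904`, where 0x904 combines the unmapped, secondary and supplementary flags.

```python
SAM_FLAG_UNMAPPED = 0x4
SAM_FLAG_SECONDARY = 0x100
SAM_FLAG_SUPPLEMENTARY = 0x800
MIN_MAPQ = 1  # hypothetical cut-off, not taken from the study


def uniquely_mapped_read_names(sam_lines, min_mapq=MIN_MAPQ):
    """Sorted names of reads whose primary alignment is mapped with MAPQ >= min_mapq."""
    keep = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank lines and headers
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, mapq = fields[0], int(fields[1]), int(fields[4])
        if flag & (SAM_FLAG_UNMAPPED | SAM_FLAG_SECONDARY | SAM_FLAG_SUPPLEMENTARY):
            continue
        if mapq >= min_mapq:
            keep.add(qname)
    return sorted(keep)


sam = [
    "@HD\tVN:1.6",
    "r1\t0\tmt1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tmt1\t200\t0\t50M\t*\t0\t0\t*\t*",    # MAPQ 0: ambiguous placement
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",          # unmapped
    "r4\t256\tmt2\t10\t60\t50M\t*\t0\t0\t*\t*",  # secondary alignment
    "r5\t16\tmt2\t300\t37\t50M\t*\t0\t0\t*\t*",  # reverse-strand primary
]
print(uniquely_mapped_read_names(sam))
```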
In the next step, we BLASTed the resulting contigs against the NCBI nucleotide database v.5 and kept sequences assigned to mitochondria. Then, the PacBio subreads were mapped onto the kept contigs using Minimap2, and only uniquely mapped reads were retained by Samtools. A new de novo assembly of the resulting 15.51 Mb of data was performed using Flye. To check whether the mitochondrial contigs obtained by Flye could be merged into larger scaffolds, we applied Circlator v.1.5.5 [170]; however, the resulting sequences were identical to the Flye contigs. In addition, we ran Unicycler v.0.4.8 [84] on the reads that mapped onto the Flye contigs (used as a reference). Further, to detect all possible structural haplotypes of the chloroplast genome, we applied Cp-hap [81]. Next, we mapped raw reads onto the resulting mitochondrial contigs and chloroplast genomes to check manually in IGV v.2.8.6 [80] whether any potential SNPs or indels were present. Finally, the final mitochondrial contigs (438,037 bp) and chloroplast genomes (137,823 bp) were annotated using Geneious Prime v.2021.1.1 (https://www.geneious.com) based on 85% and 95% similarity to the reference mitochondrial and chloroplast genomes, respectively (Supplementary Table S5).
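The annotation step applies different similarity thresholds to the two organelles. Geneious Prime computes similarity in its own way; the sketch below uses a simple percent identity over aligned columns (gap-gap columns ignored) purely to illustrate applying an 85% threshold for mitochondria and a 95% threshold for chloroplasts. The function names are hypothetical.

```python
THRESHOLDS = {"mitochondrion": 85.0, "chloroplast": 95.0}  # percent, from the text


def percent_identity(aln_a, aln_b):
    """Identity (%) over aligned columns; columns where both sequences have gaps are ignored."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln_a.upper(), aln_b.upper()) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("empty alignment")
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * identical / len(columns)


def transfer_annotation(identity, genome_type):
    """True if a reference feature is similar enough to be transferred."""
    return identity >= THRESHOLDS[genome_type]


identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")  # one mismatch in 10 columns
print(identity, transfer_annotation(identity, "mitochondrion"), transfer_annotation(identity, "chloroplast"))
```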
In silico mapping of DArT marker sequences. Since DArT markers are designed to target active regions of the genome [171], we used them here to validate the completeness of the nuclear genome assembly and
to improve the accuracy of data filtering in further genomic studies on Stipa. Two data types, SilicoDArT and SNP markers, were mapped to the nuclear genome using BLASTn v.2.10.0. As queries we used trimmed DArT sequences of 29–69 bp, keeping alignments with a percent identity to the reference genome of 95% or greater and removing alignments that covered less than 95% of the query.
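The filtering criteria above translate directly into a pass over BLAST tabular output. This sketch assumes BLASTn was run with a custom tabular format, `-outfmt "6 qseqid sseqid pident length qstart qend qlen"`, so that query coverage can be computed from the query coordinates; that column choice and the helper name are assumptions, not the study's documented settings.

```python
MIN_PIDENT = 95.0
MIN_QUERY_COVERAGE = 0.95
QUERY_LENGTH_RANGE = (29, 69)  # trimmed DArT sequence lengths (bp)


def passing_hits(lines):
    """Return (query, subject) pairs meeting the identity, coverage and length criteria."""
    hits = []
    for line in lines:
        if not line.strip():
            continue
        qseqid, sseqid, pident, _length, qstart, qend, qlen = line.rstrip("\n").split("\t")
        qlen = int(qlen)
        if not QUERY_LENGTH_RANGE[0] <= qlen <= QUERY_LENGTH_RANGE[1]:
            continue
        coverage = (abs(int(qend) - int(qstart)) + 1) / qlen
        if float(pident) >= MIN_PIDENT and coverage >= MIN_QUERY_COVERAGE:
            hits.append((qseqid, sseqid))
    return hits


blast_rows = [
    "m1\tctg1\t100.000\t69\t1\t69\t69",
    "m2\tctg2\t93.500\t60\t1\t60\t60",   # identity below 95%
    "m3\tctg3\t98.000\t40\t1\t40\t60",   # covers only 40 of 60 query bases
    "m4\tctg4\t100.000\t80\t1\t80\t80",  # query longer than 69 bp
    "m5\tctg5\t96.000\t57\t2\t58\t58",   # 57 of 58 bases covered
]
print(passing_hits(blast_rows))
```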
Data availability
The raw PacBio reads are available at the NCBI Sequence Read Archive [172]. The final genome assemblies are deposited in NCBI under the following accession numbers: nuclear assembly (JAGXJF000000000) [67]; mitochondrial assembly, contig 1 (MZ161090) [76], contig 2 (MZ161091) [77], contig 3 (MZ161093) [78] and contig 4 (MZ161092) [79]; chloroplast assemblies, haplotype A (MZ146999) [82] and haplotype B (MZ145043) [83]. The masked and unmasked versions of the nuclear genome annotation are presented in Supplementary File 1 and Supplementary File 2, respectively.
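The organelle records listed above can be retrieved programmatically through NCBI E-utilities. As a minimal sketch, the helper below only builds EFetch URLs that return each nucleotide record as FASTA text; fetching them, for example with `urllib.request.urlopen`, needs network access and should respect NCBI's request-rate limits. The dictionary labels are illustrative.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Organelle accessions listed in the Data availability statement.
ORGANELLE_ACCESSIONS = {
    "mitochondrion contig 1": "MZ161090",
    "mitochondrion contig 2": "MZ161091",
    "mitochondrion contig 3": "MZ161093",
    "mitochondrion contig 4": "MZ161092",
    "chloroplast haplotype A": "MZ146999",
    "chloroplast haplotype B": "MZ145043",
}


def efetch_fasta_url(accession):
    """Build an EFetch URL that returns a nuccore record as FASTA text."""
    query = urlencode({"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"})
    return f"{EFETCH_URL}?{query}"


for label, accession in ORGANELLE_ACCESSIONS.items():
    print(label, efetch_fasta_url(accession))
```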
Received: 17 November 2020; Accepted: 24 June 2021
References
1. Initiative, T. A. G. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815. https://doi.org/10.1038/35048692 (2000).
2. Moreau, H. et al. Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage. Genome Biol. 13, R74. https://doi.org/10.1186/gb-2012-13-8-r74 (2012).
3. Hamaji, T. et al. Anisogamy evolved with a reduced sex-determining region in volvocine green algae. Commun. Biol. 1, 17. https://doi.org/10.1038/s42003-018-0019-5 (2018).
4. Rensing, S. A. et al. The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 319, 64–69. https://doi.org/10.1126/science.1150646 (2008).
5. Bowman, J. L. et al. Insights into Land Plant Evolution Garnered from the Marchantia polymorpha Genome. Cell 171, 287–304. https://doi.org/10.1016/j.cell.2017.09.030 (2017).
6. Li, F. W. et al. Fern genomes elucidate land plant evolution and cyanobacterial symbioses. Nat. Plants 4, 460–472. https://doi.org/10.1038/s41477-018-0188-8 (2018).
7. Nystedt, B. et al. The Norway spruce genome sequence and conifer genome evolution. Nature 497, 579–584. https://doi.org/10.1038/nature12211 (2013).
8. Mosca, E. et al. A reference genome sequence for the European silver fir (Abies alba Mill.): A community-generated genomic resource. G3: Genes Genomes, Genetics 9, 2039–2049. https://doi.org/10.1534/g3.119.400083 (2019).
9. Amborella Genome Project. The Amborella genome and the evolution of flowering plants. Science 342, 1241089. https://doi.org/10.1126/science.1241089 (2013).
10. Strijk, J. S., Hinsinger, D. D., Zhang, F. & Cao, K. Trochodendron aralioides, the first chromosome-level draft genome in Trochodendrales and a valuable resource for basal eudicot research. GigaScience 8, 11. https://doi.org/10.1093/gigascience/giz136 (2019).
11. Yu, J. et al. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79–92. https://doi.org/10.1126/science.1068037 (2002).
12. Paterson, A. H. et al. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556. https://doi.org/10.1038/nature07723 (2009).
13. International Wheat Genome Sequencing Consortium. Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science 361, 705. https://doi.org/10.1126/science.aar7191 (2018).
14. Bennetzen, J. L. et al. Reference genome sequence of the model plant Setaria. Nat. Biotechnol. 30, 555–561. https://doi.org/10.1038/nbt.2196 (2012).
15. Studer, A. J. et al. The draft genome of the C3 panicoid grass species Dichanthelium oligosanthes. Genome Biol. 17, 223. https://doi.org/10.1186/s13059-016-1080-3 (2016).
16. Gordon, S. P. et al. Gradual polyploid genome evolution revealed by pan-genomic analysis of Brachypodium hybridum and its diploid progenitors. Nat. Commun. 11, 3670. https://doi.org/10.1038/s41467-020-17302-5 (2020).
17. Yagi, M. et al. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.). DNA Res. 21, 231–241. https://doi.org/10.1093/dnares/dst053 (2014).
18. Cai, J. et al. The genome sequence of the orchid Phalaenopsis equestris. Nat. Genet. 47, 65–72. https://doi.org/10.1038/ng.3149 (2015).
19. Kim, Y. M. et al. Genome analysis of Hibiscus syriacus provides insights of polyploidization and indeterminate flowering in woody plants. DNA Res. 24, 71–80. https://doi.org/10.1093/dnares/dsw049 (2017).
20. Li, L. et al. Genome sequencing and population genomics modeling provide insights into the local adaptation of weeping forsythia. Horticulture Res. 7, 130. https://doi.org/10.1038/s41438-020-00352-7 (2020).
21. Matasci, N. et al. Data access for the 1,000 Plants (1KP) project. Gigascience 3, 17. https://doi.org/10.1186/2047-217X-3-17 (2014).
22. Wickett, N. J. et al. Phylotranscriptomic analysis of the origin and early diversification of land plants. PNAS 111, 4859–4868. https://doi.org/10.1073/pnas.1323926111 (2014).
23. Leebens-Mack, J. H. et al. One thousand plant transcriptomes and the phylogenomics of green plants. Nature 574, 679–685. https://doi.org/10.1038/s41586-019-1693-2 (2019).
24. Cheng, S. et al. 10KP: A phylodiverse genome sequencing plan. GigaScience 7, giy013. https://doi.org/10.1093/gigascience/giy013 (2018).
25. Pellicer, J., Fay, M. F. & Leitch, I. J. The largest eukaryotic genome of them all? Bot. J. Linn. Soc. 164, 10–15. https://doi.org/10.1111/j.1095-8339.2010.01072.x (2010).
26. Stevens, K. A. et al. Sequence of the sugar pine megagenome. Genetics 204, 1613–1626. https://doi.org/10.1534/genetics.116.193227 (2016).
27. Meyers, L. A. & Levin, D. A. On the abundance of polyploids in flowering plants. Evolution 60, 1198–1206. https://doi.org/10.1111/j.0014-3820.2006.tb01198.x (2006).
28. Flavell, R. B., Bennett, M. D., Smith, J. B. & Smith, D. B. Genome size and proportion of repeated nucleotide-sequence DNA in plants. Biochem. Genet. 12, 257–269 (1974).
29. Schnable, P. S. et al. The B73 maize genome: Complexity diversity and dynamics. Science 326, 1112–1115. https://doi.org/10.1126/science.1178534 (2009).
30. Daron, J. et al. Organization and evolution of transposable elements along the bread wheat chromosome 3B. Genome Biol. 15, 546. https://doi.org/10.1186/s13059-014-0546-4 (2014).
31. Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133–138. https://doi.org/10.1126/science.1162986 (2009).
32. Clarke, J. et al. Continuous base identification for single-molecule nanopore DNA sequencing. Nat. Nanotechnol. 4, 265–270. https://doi.org/10.1038/nnano.2009.12 (2009).
33. Jain, M., Olsen, H. E., Paten, B. & Akeson, M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 239. https://doi.org/10.1186/s13059-016-1103-0 (2016).
34. MacGinitie, H. D. Fossil Plants Of The Florissant Beds, Colorado (Carnegie Institute of Washington Publication, 1953).
35. Manchester, S. R. Update on the megafossil flora of Florissant Colorado. Denver Museum Nat. Sci. 4, 137–161 (2001).
36. Freitag, H. The genus Stipa (Gramineae) in southwest and south Asia. Notes From Royal Botanic Garden 42, 355–489 (1985).
37. Barkworth, M. E. & Everett, J. Evolution in the Stipeae: identification and relationships of its monophyletic taxa. Grass systematics and evolution (eds. Soderstrom, T. R., Hilu, K. W., Campbell, C. S. & Barkworth, M. E.) 251–264 (Smithsonian Institution Press, 1987).
38. Hamasha, H. R., von Hagen, K. B. & Röser, M. Stipa (Poaceae) and allies in the Old World: molecular phylogenetics realigns genus circumscription and gives evidence on the origin of American and Australian lineages. Plant Syst. Evol. 298, 351–367. https://doi.org/10.1007/s00606-011-0549-5 (2012).
39. Kellogg, E. A. Subfamily Pooideae in The families and genera of vascular plants (ed. Kubitzki, K.) 199–229 (Springer International Publishing, 2015).
40. Nobis, M. Taxonomic revision of the Central Asiatic Stipa tianschanica complex (Poaceae) with particular reference to the epidermal micromorphology of the lemma. Folia Geobot. 49, 283–308. https://doi.org/10.1007/s12224-013-9164-2 (2014).
41. Romaschenko, K. et al. Systematics and evolution of the needle grasses (Poaceae: Pooideae: Stipeae) based on analysis of multiple chloroplast loci, ITS, and lemma micromorphology. Taxon 61, 18–44. https://doi.org/10.1002/tax.611002 (2012).
42. Nobis, M., Gudkova, P. D., Nowak, A., Sawicki, J. & Nobis, A. A synopsis of the genus Stipa (Poaceae) in Middle Asia, including a key to species identification, an annotated checklist, and phytogeographic analyses. Ann. Mo. Bot. Gard. 105, 1–63. https://doi.org/10.3417/2019378 (2020).
43. Yunatov, A. A. Main patterns of the vegetation cover of the Mongolian people's republic. Proc. Mongolian Commission 39, 233 (1950).
44. Lavrenko, E. M., Karamasheva, Z. V. & Nikulina, R. I. Eurasian steppe. 143 (Nauka, 1991).
45. Nowak, A., Nowak, S., Nobis, A. & Nobis, M. Vegetation of feather grass steppes in the western Pamir Alai Mountains (Tajikistan, Middle Asia). Phytocoenologia 46, 295–315. https://doi.org/10.1127/phyto/2016/0145 (2016).
46. Danzhalova, E. V. et al. Indicators of pasture digression in steppe ecosystems of Mongolia. Exploration Biol. Resour. Mongolia 12, 297–306 (2012).
47. Maevsky, V. V. & Amerkhanov, H. H. The note of Poaceae species from former USSR flora, recommended as fodder for agricultural production. Bull. Botanical Garden Saratov State Univ. 6, 80–83 (2007).
48. Brunetti, G., Soler-Rovira, P., Farrag, K. & Senesi, N. Tolerance and accumulation of heavy metals by wild plant species grown in contaminated soils in Apulia region Southern Italy. Plant Soil 318, 285–298. https://doi.org/10.1007/s11104-008-9838-3 (2009).
49. Moameri, M. et al. Investigating lead and zinc uptake and accumulation by Stipa hohenackeriana Trin and Rupr in field and pot experiments. Biosci. J. 34, 138–150. https://doi.org/10.14393/BJ-v34n1a2018-37238 (2018).
50. Yang, Y. Q. et al. Transcriptome analysis reveals diversified adaptation of Stipa purpurea along a drought gradient on the Tibetan Plateau. Funct. Integr. Genomics 15, 295–307. https://doi.org/10.1007/s10142-014-0419-7 (2015).
51. Lv, X., He, Q. & Zhou, G. Contrasting responses of steppe Stipa ssp to warming and precipitation variability. Ecol. Evolut. 9, 9061–9075. https://doi.org/10.1002/ece3.5452 (2019).
52. Schubert, M., Grønvold, L., Sandve, S. R., Hvidsten, T. R. & Fjellheim, S. Evolution of cold acclimation and its role in niche transition in the temperate grass subfamily Pooideae. Plant Physiol. 180, 404–419. https://doi.org/10.1104/pp.18.01448 (2019).
53. NCBI BioSample, https://www.ncbi.nlm.nih.gov/biosample/?term=SAMN03178190 (2014).
54. Wan, D. et al. De novo assembly and transcriptomic profiling of the grazing response in Stipa grandis. PLoS ONE 10, e0122641. https://doi.org/10.1371/journal.pone.0122641 (2015).
55. NCBI Sequence Read Archive, https://www.ncbi.nlm.nih.gov/sra/?term=SRP051667 (2020).
56. ArrayExpress, https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-5300 (2020).
57. Krawczyk, K., Nobis, M., Myszczyński, K., Klichowska, E. & Sawicki, J. Plastid superbarcodes as a tool for species discrimination in feather grasses (Poaceae: Stipa). Sci. Rep. 8, 1924. https://doi.org/10.1038/s41598-018-20399-w (2018).
58. NCBI Sequence Read Archive, https://www.ncbi.nlm.nih.gov/sra/SRR8208353 (2020).
59. NCBI Sequence Read Archive, https://www.ncbi.nlm.nih.gov/sra/SRS3290204 (2020).
60. Krawczyk, K., Nobis, M., Nowak, A., Szczecińska, M. & Sawicki, J. Phylogenetic implications of nuclear rRNA IGS variation in Stipa L (Poaceae). Sci. Rep. 7, 11506. https://doi.org/10.1038/s41598-017-11804-x (2017).
61. Wagner, V. et al. Similar performance in central and range-edge populations of a Eurasian steppe grass under different climate and soil pH regimes. Ecography 34, 498–506. https://doi.org/10.1111/j.1600-0587.2010.06658.x (2011).
62. Wagner, V., Durka, W. & Hensen, I. Increased genetic differentiation but no reduced genetic diversity in peripheral vs. central populations of a steppe grass. Am. J. Botany 98, 1173–1179. https://doi.org/10.3732/ajb.1000385 (2011).
63. Durka, W. et al. Extreme genetic depauperation and differentiation of both populations and species in Eurasian feather grasses (Stipa). Plant Syst. Evol. 299, 259–269. https://doi.org/10.1007/s00606-012-0719-0 (2013).
64. Kirschner, P. et al. Long-term isolation of European steppe outposts boosts the biome's conservation value. Nat. Commun. 11, 1968. https://doi.org/10.1038/s41467-020-15620-2 (2020).
65. Lin, Y. et al. Assembly of long error-prone reads using de Bruijn graphs. Proc. Natl. Acad. Sci. 113, 8396–8405. https://doi.org/10.1073/pnas.1604560113 (2016).
66. Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546. https://doi.org/10.1038/s41587-019-0072-8 (2019).
67. NCBI Assembly, https://www.ncbi.nlm.nih.gov/assembly/JAGXJF000000000 (2021).
68. Chin, C.-H. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050–1054. https://doi.org/10.1038/nmeth.4035 (2016).
69. Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinf. 19, 460. https://doi.org/10.1186/s12859-018-2485-7 (2018).
70. NCBI Assembly, https://www.ncbi.nlm.nih.gov/assembly/GCF_000005505.3 (2021).
71. NCBI Assembly, https://www.ncbi.nlm.nih.gov/assembly/GCA_903813605.1 (2021).
72. NCBI Assembly, https://www.ncbi.nlm.nih.gov/assembly/GCA_002575655.1 (2021).
73. NCBI Assembly, https://www.ncbi.nlm.nih.gov/assembly/GCF_001433935.1 (2021).
74. NCBI Assembly, https://www.ncbi.nlm.nih.gov/assembly/GCA_002220415.3 (2021).
75. Du, L., Zhang, C., Liu, Q., Zhang, X. & Yue, B. Krait: an ultrafast tool for genome-wide survey of microsatellites and primer design. Bioinformatics 34, 681–683. https://doi.org/10.1093/bioinformatics/btx665 (2018).
76. NCBI Nucleotide, https://www.ncbi.nlm.nih.gov/nuccore/MZ161090 (2021).
77. NCBI Nucleotide, https:// www. ncbi. nlm. nih. gov/ nucco re/ MZ161 091 (2021).
78. NCBI Nucleotide, https:// www. ncbi. nlm. nih. gov/ nucco re/ MZ161 093 (2021).
79. NCBI Nucleotide, https:// www. ncbi. nlm. nih. gov/ nucco re/ MZ161 092 (2021).
80. Robinson, J. T., orvaldsdóttir, H., Wenger, A. M., Zehir, A. & Mesirov, J. P. Variant review with the integrative genomics viewer.
Can. Res. 77, 31–34. https:// doi. org/ 10. 1158/ 0008- 5472. CAN- 17- 0337 (2017).
81. Wang, W. & Lanfear, R. Long-reads reveal that the chloroplast genome exists in two distinct versions in most plants. Genome
Biol. Evol. 11, 3372–3381. https:// doi. org/ 10. 1093/ gbe/ evz256 (2019).
82. NCBI Nucleotide, https:// www. ncbi. nlm. nih. gov/ nucco re/ MZ146 999 (2021).
83. NCBI Nucleotide, https:// www. ncbi. nlm. nih. gov/ nucco re/ MZ145 043 (2021).
84. Wick, R. R., Judd, L. M., Gorrie, C. L. & Holt, K. E. Unicycler: Resolving bacterial genome assemblies from short and long
sequencing reads. PLoS Comput. Biol. 13, 1–22. https:// doi. org/ 10. 1371/ journ al. pcbi. 10055 95 (2017).
85. Wick, R. R., Schultz, M. B., Zobel, J. & Holt, K. E. Bandage: interactive visualisation of de novo genome assemblies. Bioinformatics
31, 3350–3352. https:// doi. org/ 10. 1093/ bioin forma tics/ btv383 (2015).
86. NCBI Nucleotide, https:// www. ncbi. nlm. nih. gov/ nucco re/ NC_ 037026.1 (2020).
87. NCBI Nucleotide, https:// www. ncbi. nlm. nih. gov/ nucco re/ MG052 599.1 (2020).
88. Ghurye, J., Pop, M., Koren, S., Bickhart, D. & Chen-Shan, C. Scaolding of long read assemblies using long range contact infor-
mation. BMC Genom. 18, 527. https:// doi. org/ 10. 1186/ s12864- 017- 3879-z (2017).
89. Carballo, J. et al. A high-quality genome of Eragrostis curvula grass provides insights into Poaceae evolution and supports new
strategies to enhance forage quality. Sci. Rep. 9, 10250. https:// doi. org/ 10. 1038/ s41598- 019- 46610-0 (2019).
90. Chen, B. et al. e sequencing and de novo assembly of the Larimichthys crocea genome using PacBio and Hi-C technologies.
Scientic Data 6, 188. https:// doi. org/ 10. 1038/ s41597- 019- 0194-3 (2019).
91. Shan, T. et al. First genome of the brown alga Undaria pinnatida: Chromosome-level assembly using PacBio and Hi-C tech-
nologies. Front. Genet. 11, 140. https:// doi. org/ 10. 3389/ fgene. 2020. 00140 (2020).
92. Ou, S. et al. Eect of sequence depth and length in long-read assembly of the maize inbred NC358. Nat. Commun. 11, 2288.
https:// doi. org/ 10. 1038/ s41467- 020- 16037-7 (2020).
93. Šmarda, P. et al. Genome sizes and genomic guanine + cytosine (GC) contents of the Czech vascular ora with new estimates
for 1700 species. Preslia 91, 117–142. https:// doi. org/ 10. 23855/ presl ia. 2019. 117 (2019).
94. Singh, R., Ming, R. & Yu, Q. Comparative analysis of GC content variations in plant genomes. Tropical Plant Biol. 9, 136–149.
https:// doi. org/ 10. 1007/ s12042- 016- 9165-4 (2016).
95. Šmarda, P. et al. Ecological and evolutionary signicance of genomic GC content diversity in monocots. Proc. Natl. Acad. Sci.
111, 4096–4102. https:// doi. org/ 10. 1073/ pnas. 13211 52111 (2014).
96. Bureš, P. et al. Correlation between GC content and genome size in plants. Cytometry A 71, 764 (2007).
97. Grover, C. E. & Wendel, J. F. Recent insights into mechanisms of genome size change in plants. J. Bot . 2010, 382732. https:// doi.
org/ 10. 1155/ 2010/ 382732 (2010).
98. Zuccolo, A. et al. Transposable element distribution, abundance and role in genome size variation in the genus Oryza. BMC
Evol. Biol. 7, 152. https:// doi. org/ 10. 1186/ 1471- 2148-7- 152 (2007).
99. Vogel, J. et al. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463, 763–768. https:// doi.
org/ 10. 1038/ natur e08747 (2010).
100. Sasaki, T. e map-based sequence of the rice genome. Nature 436, 793–800. https:// doi. org/ 10. 1038/ natur e03895 (2005).
101. Wu, Z. et al. De novo genome assembly of Oryza granulata reveals rapid genome expansion and adaptive evolution. Commun.
Biol. 1, 84. https:// doi. org/ 10. 1038/ s42003- 018- 0089-4 (2018).
102. Liu, Q. et al. e repetitive DNA landscape in Avena (Poaceae): Chromosome and genome evolution dened by major repeat
classes in whole-genome sequence reads. BMC Plant Biol. 19, 226. https:// doi. org/ 10. 1186/ s12870- 019- 1769-z (2019).
103. Wicker, T. et al. Impact of transposable elements on genome structure and evolution in bread wheat. Genome Biol. 19, 103.
https:// doi. org/ 10. 1186/ s13059- 018- 1479-0 (2018).
104. Tkach, N. et al. Molecular phylogenetics and micromorphology of Australasian Stipeae (Poaceae), and the interrelation of
whole-genome duplication and evolutionary radiations in this grass tribe. Front. Plant Sci. 11, 630788, https:// doi. org/ 10. 3389/
fpls. 2020. 630788 (2021).
105. Romaschenko, K. et al. Miocene-Pliocene speciation, introgression, and migration of Patis and Ptilagrostis (Poaceae: Stipeae).
Mol. Phylogenet. Evol. 70, 244–259. https:// doi. org/ 10. 1016/j. ympev. 2013. 09. 018 (2014).
106. Matsuoka, Y., Takumi, S. & Nasuda, S. Genetic mechanisms of allopolyploid speciation through hybrid genome doubling: novel insights from wheat (Triticum and Aegilops) studies. Int. Rev. Cell Mol. Biol. 309, 199–258. https://doi.org/10.1016/b978-0-12-800255-1.00004-1 (2014).
107. Tzvelev, N. N. On the origin and evolution of the feathergrasses (Stipa L.). Problems of ecology, geobotany, botanical geography and floristics (eds. Lebedev, D. V. & Karamysheva, Z. V.) 139–150 (Academiya Nauk SSSR, 1977).
108. Clayton, W. D. & Renvoize, S. A. Genera Graminum. Kew Bull. Additional Ser. 13, 1–389 (1986).
109. Hilu, K. W. Phylogenetics and chromosomal evolution in the Poaceae (grasses). Aust. J. Bot. 52, 13–22. https://doi.org/10.1071/BT03103 (2004).
110. Avdulov, N. P. Karyo-systematische Untersuchung der Familie Gramineen. Bull. Appl. Bot. Genet. Plant Breed. 43, 1–352 (1931).
111. VanBuren, R., Wai, C. M., Keilwagen, J. & Pardo, J. A chromosome-scale assembly of the model desiccation tolerant grass Oropetium thomaeum. Plant Direct 2, e00096. https://doi.org/10.1002/pld3.96 (2018).
112. Appels, R. et al. Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science 361, eaar7191. https://doi.org/10.1126/science.aar7191 (2018).
113. Liu, W. et al. Morphological and genetic variation along a North-to-South transect in Stipa purpurea, a dominant grass on the Qinghai-Tibetan Plateau: implications for response to climate change. PLoS ONE 11, e0161972. https://doi.org/10.1371/journal.pone.0161972 (2016).
114. Liu, W., Liao, H., Zhou, Y., Zhao, Y. & Song, Z. Microsatellite primers in Stipa purpurea (Poaceae), a dominant species of the steppe on the Qinghai-Tibetan Plateau. Am. J. Bot. 98, e150–e151. https://doi.org/10.3732/ajb.1000444 (2011).
115. Yin, X., Yang, Y. & Yang, Y. Development and characterization of 29 polymorphic EST-SSR markers for Stipa purpurea (Poaceae). Appl. Plant Sci. 4, 1600027. https://doi.org/10.3732/apps.1600027 (2016).
116. Klichowska, E., Ślipiko, M., Nobis, M. & Szczecińska, M. Development and characterization of microsatellite markers for endangered species Stipa pennata (Poaceae) and their usefulness in intraspecific delimitation. Mol. Biol. Rep. 45, 639–643. https://doi.org/10.1007/s11033-018-4192-x (2018).
117. Ren, J. et al. Development and characterization of EST-SSR markers in Stipa breviflora (Poaceae). Appl. Plant Sci. 5, 1600157. https://doi.org/10.3732/apps.1600157 (2017).
118. Oyundelger, K. et al. Climate and land use affect genetic structure of Stipa glareosa P. A. Smirn. in Mongolia. Flora 266, 151572. https://doi.org/10.1016/j.flora.2020.151572 (2020).
119. Zietkiewicz, E., Rafalski, A. & Labuda, D. Genome fingerprinting by simple sequence repeat (SSR)-anchored polymerase chain reaction amplification. Genomics 20, 176–183. https://doi.org/10.1006/geno.1994.1151 (1994).
120. Yu, J., Jing, Z. B. & Cheng, J. M. Genetic diversity and population structure of Stipa bungeana, an endemic species in Loess Plateau of China, revealed using combined ISSR and SRAP markers. Genet. Mol. Res. 13, 1097–1108. https://doi.org/10.4238/2014.February.20.11 (2014).
121. Kopylov-Guskov, Y. O. & Kramina, T. E. Investigating of Stipa ucrainica and Stipa zalesskii (Poaceae) from Rostov Oblast using morphological and ISSR analyses. Bull. Moscow Soc. Nat. Biol. Ser. 119, 46–53 (2014).
122. Boussaid, M., Benito, C., Harche, M., Naranjo, T. & Zedek, M. Genetic variation in natural populations of Stipa tenacissima from Algeria. Biochem. Genet. 48, 857–872. https://doi.org/10.1007/s10528-010-9367-7 (2010).
123. Nobis, M. et al. Hybridisation, introgression events and cryptic speciation in Stipa (Poaceae): a case study of the Stipa heptapotamica hybrid-complex. Perspect. Plant Ecol. Evol. Syst. 39, 125457. https://doi.org/10.1016/j.ppees.2019.05.001 (2019).
124. Schubert, M., Marcussen, T., Meseguer, A. S. & Fjellheim, S. The grass subfamily Pooideae: Cretaceous-Palaeocene origin and climate-driven Cenozoic diversification. Glob. Ecol. Biogeogr. 28, 1168–1182. https://doi.org/10.1111/geb.12923 (2019).
125. Hodkinson, T. R. Evolution and taxonomy of the grasses (Poaceae): a model family for the study of species-rich groups. Annual Plant Rev. Online 1, 39. https://doi.org/10.1002/9781119312994.apr0622 (2018).
126. Mueller-Bieniek, A., Kittel, P., Muzolf, B., Cywa, K. & Muzolf, P. Plant macroremains from an early Neolithic site in eastern Kuyavia, central Poland. Acta Palaeobotanica 56, 79–89. https://doi.org/10.1515/acpa-2016-0006 (2016).
127. Brown, R. P. & Yang, Z. Rate variation and estimation of divergence times using strict and relaxed clocks. BMC Evol. Biol. 11, 271. https://doi.org/10.1186/1471-2148-11-271 (2011).
128. NCBI Nucleotide, https://www.ncbi.nlm.nih.gov/nuccore/MH644808.1 (2020).
129. NCBI Nucleotide, https://www.ncbi.nlm.nih.gov/nuccore/MH051716.1 (2020).
130. NCBI Nucleotide, https://www.ncbi.nlm.nih.gov/nuccore/NC_008360.1 (2020).
131. NCBI Nucleotide, https://www.ncbi.nlm.nih.gov/nuccore/NC_022666.1 (2020).
132. Bendich, A. J. Structural analysis of mitochondrial DNA molecules from fungi and plants using moving pictures and pulsed-field gel electrophoresis. J. Mol. Biol. 255, 564–588. https://doi.org/10.1006/jmbi.1996.0048 (1996).
133. Cheng, N. et al. Correlation between mtDNA complexity and mtDNA replication mode in developing cotyledon mitochondria during mung bean seed germination. New Phytol. 213, 751–763. https://doi.org/10.1111/nph.14158 (2017).
134. Kozik, A. et al. The alternative reality of plant mitochondrial DNA: One ring does not rule them all. PLoS Genet. 15, e1008373. https://doi.org/10.1371/journal.pgen.1008373 (2019).
135. Doyle, J. J. & Doyle, J. L. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19, 11–15 (1987).
136. Hufnagel, D. E., Hufford, M. B. & Seetharam, A. S. SequelTools: a suite of tools for working with PacBio Sequel raw sequence data. BMC Bioinf. 21, 429. https://doi.org/10.1186/s12859-020-03751-8 (2020).
137. Baiakhmetov, E., Nowak, A., Gudkova, P. D. & Nobis, M. Morphological and genome-wide evidence for natural hybridisation within the genus Stipa (Poaceae). Sci. Rep. 10, 13803. https://doi.org/10.1038/s41598-020-70582-1 (2020).
138. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinf. 10, 421. https://doi.org/10.1186/1471-2105-10-421 (2009).
139. Okonechnikov, K., Conesa, A. & García-Alcalde, F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics 32, 292–294. https://doi.org/10.1093/bioinformatics/btv566 (2016).
140. Hunt, M. Assembly statistics from FASTA and FASTQ files (Version 1.0.1). GitHub https://github.com/sanger-pathogens/assembly-stats/ (2014).
141. Seppey, M., Manni, M. & Zdobnov, E. M. BUSCO: Assessing genome assembly and annotation completeness. Methods Mol. Biol. 1962, 227–245. https://doi.org/10.1007/978-1-4939-9173-0_14 (2019).
142. Alonge, M. et al. RaGOO: fast and accurate reference-guided scaffolding of draft genomes. Genome Biol. 20, 224. https://doi.org/10.1186/s13059-019-1829-6 (2019).
143. Flynn, J. M. et al. RepeatModeler2: Automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. 117, 9451–9457. https://doi.org/10.1073/pnas.1921046117 (2020).
144. Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12, 1269–1276. https://doi.org/10.1101/gr.88502 (2002).
145. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, 351–358. https://doi.org/10.1093/bioinformatics/bti1018 (2005).
146. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580. https://doi.org/10.1093/nar/27.2.573 (1999).
147. Wicker, T., Matthews, D. E. & Keller, B. TREP: a database for Triticeae repetitive elements. Trends Plant Sci. 7, 561–562. https://doi.org/10.1016/S1360-1385(02)02372-5 (2002).
148. Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0, http://www.repeatmasker.org (2020).
149. Boratyn, G. M. et al. Domain enhanced lookup time accelerated BLAST. Biol. Direct 7, 12. https://doi.org/10.1186/1745-6150-7-12 (2012).
150. Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644. https://doi.org/10.1093/bioinformatics/btn013 (2008).
151. NCBI Nucleotide, https://www.ncbi.nlm.nih.gov/nuccore/KY826233 (2020).
152. NCBI Nucleotide, https://www.ncbi.nlm.nih.gov/nuccore/KY826234 (2020).
153. NCBI Nucleotide, https://www.ncbi.nlm.nih.gov/nuccore/KY826235 (2020).
154. NCBI Nucleotide, https://www.ncbi.nlm.nih.gov/nuccore/KY826229 (2020).
155. NCBI Nucleotide, https://www.ncbi.nlm.nih.gov/nuccore/KY826231 (2020).
156. NCBI Nucleotide, https://www.ncbi.nlm.nih.gov/nuccore/KY826232 (2020).
157. NCBI Sequence Read Archive, https://www.ncbi.nlm.nih.gov/sra/ERR1744610 (2020).
158. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100. https://doi.org/10.1093/bioinformatics/bty191 (2018).
159. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079. https://doi.org/10.1093/bioinformatics/btp352 (2009).
160. Koren, S., Walenz, B. P., Berlin, K., Miller, J. R. & Phillippy, A. M. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736. https://doi.org/10.1101/gr.215087.116 (2017).
161. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477. https://doi.org/10.1089/cmb.2012.0021 (2012).
162. NCBI Nucleotide, https://www.ncbi.nlm.nih.gov/nuccore/NC_016135.3?report=fasta&from=164020&to=167409 (2020).
163. NCBI Nucleotide, https://www.ncbi.nlm.nih.gov/nuccore/KM036284.1 (2020).
164. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 30, 772–780. https://doi.org/10.1093/molbev/mst010 (2013).
165. Larsson, A. AliView: a fast and lightweight alignment viewer and editor for large datasets. Bioinformatics 30, 3276–3278. https://doi.org/10.1093/bioinformatics/btu531 (2014).
166. Bouckaert, R. et al. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLOS Comput. Biol. 15, e1006650. https://doi.org/10.1371/journal.pcbi.1006650 (2019).
167. Bouckaert, R. & Drummond, A. bModelTest: Bayesian phylogenetic site model averaging and model comparison. BMC Evol. Biol. 17, 42. https://doi.org/10.1186/s12862-017-0890-6 (2017).
168. Rambaut, A., Drummond, A. J., Xie, D., Baele, G. & Suchard, M. A. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 67, 901–904. https://doi.org/10.1093/sysbio/syy032 (2018).
169. Rambaut, A. FigTree v1.4.4, https://tree.bio.ed.ac.uk/software/figtree (2018).
170. Hunt, M. et al. Circlator: automated circularization of genome assemblies using long sequencing reads. Genome Biol. 16, 294. https://doi.org/10.1186/s13059-015-0849-0 (2015).
171. Kilian, A. et al. Diversity arrays technology: a generic genome profiling technology on open platforms. Methods Mol. Biol. 888, 67–89. https://doi.org/10.1007/978-1-61779-870-2_5 (2012).
172. NCBI Sequence Read Archive, https://www.ncbi.nlm.nih.gov/sra/PRJNA726584 (2021).
Acknowledgements
We would like to express our gratitude to Eric Johnson from SNPsaurus, Artem Kasianov from the Institute for Information Transmission Problems of the Russian Academy of Sciences (Moscow, Russia) and Igor A. Shmakov from Altai State University (Barnaul, Russia) for their valuable assistance with the genome assembly. We also thank the iDiv High-Performance Computing cluster for providing computing resources for this paper. Finally, we thank two anonymous reviewers for providing valuable comments on the manuscript. The study was supported by the Russian Science Foundation (grant no. 19-74-10067). E.B. was supported via the RSF (grant no. 19-74-10067) and a DS grant of the Jagiellonian University (DS/D/WB/IB/2/2019). M.N. was supported by the National Science Centre, Poland (grant no. 2018/29/B/NZ9/00313). P.D.G. was supported by the RSF (grant no. 19-74-10067). The open-access publication of this article was funded by the BioS Priority Research Area under the program "Excellence Initiative – Research University" at the Jagiellonian University in Krakow.
Author contributions
E.B., P.D.G. and M.N. planned the study. E.B. supervised the research. M.N. and P.D.G. identified and collected biological samples. E.B., C.G. and E.S. performed the nuclear genome assembly. E.B. performed the remaining bioinformatic analyses and wrote the manuscript. All authors revised the draft, provided comments and approved the final manuscript.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1038/s41598-021-94068-w.
Correspondence and requests for materials should be addressed to E.B. or M.N.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access is article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons licence, and indicate if changes were made. e images or other third party material in this
article are included in the articles Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.
© The Author(s) 2021
... The current result is likely more accurate due to applying several thousand SNPs across the genome in comparison to only one nuclear locus in the abovementioned study. Additionally, we demonstrated that the potential split between S. capillata and the remaining representatives of Leiostipa took place approximately 1.07 Mya, which is similar to our previous estimate of 1.73 Mya based on the nucleolar organising regions [48]. Here, we also reported for the first time that diversification within species S. capillata, S. baicalensis, S. krylovii and S. grandis started ca. ...
... From this point of view, Stipa may be a suitable genus to study these phenomena. Despite the increasing interest in feather grasses at the molecular level [19,20,48,[62][63][64][65], there is still a lack of substantial knowledge regarding, e.g., chromosome numbers of admixed and pure taxa in hybrid zones, fertility of pollen grains in F1 and later generation hybrids and backcrosses and genomic information related to specific loci contributing to reproductive barriers. We believe that only an integrative approach combining the aforesaid data can properly interpret evolutionary patterns and processes in feather grasses. ...
... Nonetheless, the package currently does not support more than 600 contigs/scaffolds [83] decreasing the potential number of used SNPs for very fragmented genomes. That was the case of Stipa, where only one draft genome comprising 5931 contigs is currently available [48]. Additionally, SNPs positions and genetic distances are used for a block jackknife method to test for a significant deviation from the null expectation of the f 4 statistic. ...
Article
Full-text available
Background The proper identification of feather grasses in nature is often limited due to phenotypic variability and high morphological similarity between many species. Among plausible factors influencing this issue are hybridisation and introgression recently detected in the genus. Nonetheless, to date, only a bounded set of taxa have been investigated using integrative taxonomy combining morphological and molecular data. Here, we report the first large-scale study on five feather grass species across several hybrid zones in Russia and Central Asia. In total, 302 specimens were sampled in the field and classified based on the current descriptions of these taxa. They were then genotyped with high density genome-wide markers and measured based on a set of morphological characters to delimitate species and assess levels of hybridisation and introgression. Moreover, we tested species for past introgression and estimated divergence times between them. Results Our findings demonstrated that 250 specimens represent five distinct species: S. baicalensis , S. capillata , S. glareosa , S. grandis and S. krylovii . The remaining 52 individuals provided evidence for extensive hybridisation between S. capillata and S. baicalensis , S. capillata and S. krylovii , S. baicalensis and S. krylovii , as well as to a lesser extent between S. grandis and S. krylovii , S. grandis and S. baicalensis . We detected past reticulation events between S. baicalensis , S. krylovii , S. grandis and inferred that diversification within species S. capillata , S. baicalensis , S. krylovii and S. grandis started ca. 130–96 kya. In addition, the assessment of genetic population structure revealed signs of contemporary gene flow between populations across species from the section Leiostipa , despite significant geographical distances between some of them. Lastly, we concluded that only 5 out of 52 hybrid taxa were properly identified solely based on morphology. 
Conclusions Our results support the hypothesis that hybridisation is an important mechanism driving evolution in Stipa . As an outcome, this phenomenon complicates identification of hybrid taxa in the field using morphological characters alone. Thus, integrative taxonomy seems to be the only reliable way to properly resolve the phylogenetic issue of Stipa . Moreover, we believe that feather grasses may be a suitable genus to study hybridisation and introgression events in nature.
... Recently, the first draft of the genome of feather grass, Stipa capillata, was reported [38]. This genome features a single-molecule, long-read sequencing dataset, which is distinct from our paired-end sequencing approach. ...
... These results may be justified by the drought conditions of the steppe. Recently, the first draft of the genome of feather grass, Stipa capillata, was reported [38]. This genome features a single-molecule, long-read sequencing dataset, which is distinct from our paired-end sequencing approach. ...
Article
Full-text available
Due to climate change and global warming, the frequency of sandstorms in northern China is increasing. Stipa breviflora, a dominant species in Eurasian grasslands, can help prevent desertification from becoming more serious. Studies on S. breviflora cover a wide range of fields. To the best of our knowledge, the present study is the first to sequence, assemble, and annotate the S. breviflora genome. In total, 2,781,544 contigs were assembled, and 2,600,873 scaffolds were obtained, resulting in a total length of 649,849,683 bp. The number of scaffolds greater than 1 kb was 70,770. We annotated the assembled genome (>121 kb), conducted a selective sweep analysis, and ultimately succeeded in assembling the Matk gene of S. breviflora. More importantly, our research identified 26 scaffolds that may be responsible for the drought tolerance of S. breviflora Griseb. In summary, the data obtained regarding S. breviflora will be of great significance for future research.
... Grasses play a vital role in ecology by stabilizing soil with their extensive root systems and preventing soil erosion. [16][17][18][19][20][21] Their abundance is a result of their fast growth rate, which makes them available in almost every landscape on the planet. But this also points out their invasiveness and possible disproportionation of ecological balance, due to their intrinsic properties of high thermal stability and lignocellulosic nature. ...
Article
Full-text available
The pressing environmental challenges have catalyzed a burgeoning interest in creating composite derived from biodegradable resources. Among these, researchers have garnered significant attention from grass family (Poaceae) fibers for their...
... nikolai respectively, while the admixture proportions of the other four samples were around 80% and 20%. According to Baiakhmetov et al. [81], S. caucasica and S. magnifica emerged around 0.91 and 2.04 mya respectively and the split of S. magnifica and S. caucasica based on our evolutionary scenario falls into that range. According to Arnold [82,83] and Mallet [77], the occurrence of backcross hybrids is typical in plant species that spontaneously hybridize. ...
Article
Full-text available
Stipa is a genus comprising ca. 150 species found in warm temperate regions of the Old World and around 30% of its representatives are of hybrid origin. In this study, using integrative taxonomy approach, we tested the hypothesis that hybridization and introgression are the explanations of the morphological intermediacy in species belonging to Stipa sect. Smirnovia, one of the species-rich sections in the mountains of Central Asia. Two novel nothospecies, S. magnifica × S. caucasica subsp. nikolai and S. lingua × S. caucasica subsp. nikolai, were identified based on a combination of morphological characters and SNPs markers. SNPs marker revealed that all S. lingua × S. caucasica samples were F1 hybrids, whereas most of S. magnifica × S. caucasica samples were backcross hybrids. Furthermore, the above mentioned hybrids exhibit transgressive morphological characters to each of their parental species. These findings have implications for understanding the process of hybridization in the genus Stipa, particularly in the sect. Smirnovia. As a taxonomic conclusion, we describe the two new nothospecies S. × muksuensis (from Tajikistan) and S. × ochyrae (from Kyrgyzstan) and present an identification key to species morphologically similar to the taxa mentioned above.
... The method 2 avoided this problem as it did not polish the sequences. The assembly graph may actually serve as a main illustration of genome's structural complexity and the long reads were instrumental in simplifying the graph model representation of the mitogenome (Fischer et al., 2022) (Baiakhmetov et al., 2021). ...
Article
Cannabis sativa L. belongs to the family Cannabaceae in Rosales. It has been widely used as medicines, building materials, and textiles. Elucidating its genome is critical for molecular breeding and synthetic biology study. Many studies have shown that the mitochondrial genomes (mitogenomes) and even chloroplast genomes (plastomes) had complex polymeric structures. Using the Nanopore sequencing platform, we sequenced, assembled, and analyzed its mitogenome and plastome. The resulting unitig graph suggested that the mitogenome had a complex polymeric structure. However, a gap-free, circular sequence was further assembled from the unitig graph. In contrast, a circular sequence representing the plastome was obtained. The mitogenome major conformation was 415,837 bp long, and the plastome was 153,927 bp long. To test if the repeat sequences promote recombination, which corresponds to the branch points in the structure, we tested the sequences around repeats by long-read mapping. Among 208 pairs of predicted repeats, the mapping results supported the presence of cross-over around 25 pairs of repeats. Subsequent PCR amplification confirmed the presence of cross-over around 15 of the 25 repeats. By comparing the mitogenome and plastome sequences, we identified 19 mitochondria plastid DNAs, including seven complete genes (trnW-CCA, trnP-UGG, psbJ, trnN-GUU, trnD-GUC, trnH-GUG, trnM-CAU) and nine gene fragments. Furthermore, the selective pressure analysis results showed that five genes (atp1, ccmB, ccmC, cox1, nad7) had 19 positively selected sites. Lastly, we predicted 28 RNA editing sites. A total of 8 RNA editing sites located in the coding regions were successfully validated by PCR amplification and Sanger sequencing, of which four were synonymous, and four were nonsynonymous. In particular, the RNA editing events appeared to be tissue-specific in C. sativa mitogenome. In summary, we have confirmed the major confirmation of C. 
sativa mitogenome and characterized its structural features in detail. These results provide critical information for future variety breeding and resource development for C. sativa.
... The gene fragments (accD, matk, rps16-trnQ, psbA-trnH, etc) in plastids are commonly used to mark phylogenetic status and taxonomic attributes of species (Mishra et al. 2016;Van Do et al. 2021). It has been reported that these barcodes can be used as a species identification tool for Stipa of Poaceae, such as S. pennata (Krawczyk et al. 2018), S. capillata (Baiakhmetov et al. 2021), and S. lipskyi (Myszczy nski et al. 2016). These highlight the importance of cp genomes in taxonomy. ...
Article
Full-text available
Stipa bungeana Trin. 1833 is an important forage grass in Poaceae, widely distributed in the temperate steppe of Northern China, with strong grazing tolerance and feeding value. In this study, we performed the complete chloroplast (cp) genome sequence of S. bungeana to explore its phylogenetic position with other Stipa. The results showed that the circular complete cp genome of S. bungeana was 137,759 bp in length, including a large single copy (LSC) of 81,652 bp, a small single copy (SSC) of 12,817 bp, and two inverted repeats (IR) of 21,645 bp. The GC content accounts for 43.71% and annotated 134 single genes, which include 87 protein-coding genes, eight rRNA genes, and 39 tRNA genes. Maximum-likelihood (ML) phylogenetic tree suggested that the S. bungeana was closely related to other Stipa except for S. purpurea.
Preprint
Full-text available
The nuclear genome sizes of 59 species from 33 genera of the Poaceae subfamily Pooideae were investigated by ow cytometry (FCM). This subfamily is characterized by a wide range of holoploid (2C values) and monoploid (1Cx values) genome sizes and mean chromosome sizes (MC), including both the highest and some of the lowest values of the entire grass family. For example, the tribe Brachypodieae has the smallest monoploid genomes and chromosomes, followed by the majority of Stipeae and individual representatives of the tribes Ampelodesmeae, Duthieeae and Meliceae, which belong to the phylogenetically 'early-diverging' lineages. Comparatively large genome and chromosome sizes were found in the Lygeeae and some Meliceae. The 'core Pooideae' had the largest values in the subfamily, with the greatest variation in Aveneae, Festuceae and Poeae. The tribes Bromeae and especially Triticeae, which includes wheat and related crops, had larger minimum monoploid genome and chromosome sizes compared to the other 'core Pooideae' tribes. It appears that the occurrence of exclusively rather large monoploid genomes (> 3.4 pg/1Cx) and chromosomes (MC ≥ 0.5 pg) is restricted to Triticeae. The origin of x = 7 of the 'core Pooideae' from x = 12 of the 'early-diverging' Pooideae lineages was apparently not related to an increase in genome size, whereas chromosome fusion caused an increase in chromosome size. The evolutionary aspects of chromosome base number variation in Pooideae are discussed, and new chromosome numbers are presented, including the rst polyploid (2n = 4x = 20) of the model plant Brachypodium distachyon s.s.
Article
Phylogenetic analysis provides crucial insights into the evolutionary relationships and diversification patterns within specific taxonomic groups. In this study, we aimed to identify the phylogenetic relationships and explore the evolutionary history of Stipa using transcriptomic data. Samples of 12 Stipa species were collected from the Qinghai-Tibet Plateau and Mongolian Plateau, where they are widely distributed, and transcriptome sequencing was performed using their fresh spikelet tissues. Using bidirectional best BLAST analysis, we identified two sets of one-to-one orthologous genes shared between Brachypodium distachyon and the 12 Stipa species (9397 and 2300 sequences, respectively), as well as 62 single-copy orthologous genes. Concatenation methods were used to construct a robust phylogenetic tree for Stipa, and molecular dating was used to estimate divergence times. Our results indicated that Stipa originated during the Pliocene. In approximately 0.8 million years, it diverged into two major clades each consisting of native species from the Mongolian Plateau and the Qinghai-Tibet Plateau, respectively. The evolution of Stipa was closely associated with the development of northern grassland landscapes. Important external factors such as global cooling during the Pleistocene, changes in monsoonal circulation, and tectonic movements contributed to the diversification of Stipa. This study provided a highly supported phylogenetic framework for understanding the evolution of the Stipa genus in China and insights into its diversification patterns.
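"Bidirectional best BLAST analysis" is usually implemented as a reciprocal-best-hit (RBH) search: gene a in species A and gene b in species B form a one-to-one orthologue pair if each is the other's top-scoring hit. The sketch below is a generic RBH filter, not the authors' pipeline. It assumes BLAST tabular output (`-outfmt 6`, default 12 columns, bitscore in column 12), and the gene identifiers in the usage example are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def best_hits(rows: List[str]) -> Dict[str, str]:
    """Map each query to its highest-bitscore subject from BLAST -outfmt 6 lines.

    On equal bitscores the first hit seen is kept.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in rows:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed lines
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[str], b_vs_a: List[str]) -> Set[Tuple[str, str]]:
    """Return (gene_in_A, gene_in_B) pairs that are each other's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}


def _row(q: str, s: str, bits: float) -> str:
    """Build a toy -outfmt 6 line; only qseqid, sseqid and bitscore matter here."""
    return "\t".join([q, s, "90.0", "100", "0", "0", "1", "100", "1", "100", "1e-50", str(bits)])


# Hypothetical genes: Bd* from Brachypodium, St* from a Stipa transcriptome.
a_vs_b = [_row("Bd1", "St1", 200), _row("Bd1", "St2", 150), _row("Bd2", "St2", 180)]
b_vs_a = [_row("St1", "Bd1", 210), _row("St2", "Bd1", 190), _row("St2", "Bd2", 170)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # {('Bd1', 'St1')}
```

Bd2 → St2 is dropped because St2's own best hit is Bd1, which is the one-to-one criterion RBH enforces.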
Article
The mainly Australian grass genus Austrostipa (tribe Stipeae) comprising approximately 64 species represents a remarkable example of an evolutionary radiation. To investigate aspects of diversification, macro- and micromorphological variation in this genus, we conducted molecular phylogenetic and scanning electron microscopy (SEM) analyses including representatives from most of Austrostipa’s currently accepted subgenera. Because of its taxonomic significance in Stipeae, we studied the lemma epidermal pattern (LEP) in 34 representatives of Austrostipa. Plastid DNA variation within Austrostipa was low and only few lineages were resolved. Nuclear ITS and Acc1 yielded comparable groupings of taxa and resolved subgenera Arbuscula, Petaurista, and Bambusina in a common clade and as monophyletic. In most of the Austrostipa species studied, the LEP was relatively uniform (typical maize-like), but six species had a modified cellular structure. The species representing subgenera Lobatae, Petaurista, Bambusina as well as A. muelleri from subg. Tuberculatae were well-separated from all the other species included in the analysis. We suggest recognizing nine subgenera in Austrostipa (with number of species): Arbuscula (4), Aulax (2), Austrostipa (36), Bambusina (2), Falcatae (10), Lobatae (5), Longiaristatae (2), Petaurista (2) and the new subgenus Paucispiculatae (1) encompassing A. muelleri. Two paralogous sequence copies of Acc1, forming two distinct clades, were found in polyploid Austrostipa and Anemanthele. We found analogous patterns for our samples of Stipa s.str. with their Acc1 clades strongly separated from those of Austrostipa and Anemanthele. This underlines a previous hypothesis of Tzvelev (1977) that most extant Stipeae are of hybrid origin. We also prepared an up-to-date survey and reviewed the chromosome number variation for our molecularly studied taxa and the whole tribe Stipeae. 
The chromosome base number patterns as well as dysploidy and whole-genome duplication events were interpreted in a phylogenetic framework. The rather coherent picture of chromosome number variation underlines the enormous phylogenetic and evolutionary significance of this frequently ignored character.
Article
The European steppes and their biota have been hypothesized to be either young remnants of the Pleistocene steppe belt or, alternatively, to represent relicts of long-term persisting populations; both scenarios directly bear on nature conservation priorities. Here, we evaluate the conservation value of threatened disjunct steppic grassland habitats in Europe in the context of the Eurasian steppe biome. We use genomic data and ecological niche modelling to assess pre-defined, biome-specific criteria for three plant and three arthropod species. We show that the evolutionary history of Eurasian steppe biota is strikingly congruent across species. The biota of European steppe outposts were long-term isolated from the Asian steppes, and European steppes emerged as disproportionally conservation relevant, harbouring regionally endemic genetic lineages, large genetic diversity, and a mosaic of stable refugia. We emphasize that conserving what is left of Europe’s steppes is crucial for conserving the biological diversity of the entire Eurasian steppe biome.
Article
Background PacBio sequencing is an incredibly valuable third-generation DNA sequencing method due to very long read lengths, ability to detect methylated bases, and its real-time sequencing methodology. Yet, hitherto no tool was available for analyzing the quality of, subsampling, and filtering PacBio data. Results Here we present SequelTools, a command-line program containing three tools: Quality Control, Read Subsampling, and Read Filtering. The Quality Control tool quickly processes PacBio Sequel raw sequence data from multiple SMRTcells producing multiple statistics and publication-quality plots describing the quality of the data including N50, read length and count statistics, PSR, and ZOR. The Read Subsampling tool allows the user to subsample reads by one or more of the following criteria: longest subreads per CLR or random CLR selection. The Read Filtering tool provides options for normalizing data by filtering out certain low-quality scraps reads and/or by minimum CLR length. SequelTools is implemented in bash, R, and Python using only standard libraries and packages and is platform independent. Conclusions SequelTools is a program that provides the only free, fast, and easy-to-use quality control tool, and the only program providing this kind of read subsampling and read filtering for PacBio Sequel raw sequence data, and is available at https://github.com/ISUgenomics/SequelTools.
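N50, one of the statistics reported by the Quality Control tool, is the length L such that sequences of length ≥ L together contain at least half of all bases. The minimal sketch below computes it from a list of read (or contig) lengths; it illustrates the definition and is not SequelTools' own code.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no sequence data")
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids float rounding when total is odd.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always reaches half of the total")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8  (10 + 9 + 8 = 27 = half of 54)
```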
Article
Hybridisation in the wild between closely related species is a common mechanism of speciation in the plant kingdom and, in particular, in the grass family. Here we explore the potential for natural hybridisation in Stipa (one of the largest genera in Poaceae) between genetically distant species at the edges of their distributions in the mountains of Central Asia using integrative taxonomy. Our research highlights the applicability of classical morphological and genome reduction approaches in studies on wild plant species. The results revealed a new nothospecies, Stipa × lazkovii, which exhibits characters intermediate between S. krylovii and S. bungeana. A high-density DArTseq assay disclosed that S. × lazkovii is an F1 hybrid and established that its plastid and mitochondrial DNA was inherited from S. bungeana. In addition, molecular markers detected a hybridisation event between the morphologically and genetically distant species S. bungeana and probably S. glareosa. Moreover, our findings demonstrated uncertainty about the taxonomic status of S. bungeana, which currently belongs to the section Leiostipa but is genetically closer to S. breviflora from the section Barbatae. Finally, we noticed a discrepancy between the current molecular data and previous findings on S. capillata and S. sareptana.
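One common way to reason about F1 status from SNP data such as DArTseq (not necessarily the authors' exact method) is to look only at diagnostic loci, where the two parental species are fixed for different alleles. An F1 hybrid should then have a hybrid index near 0.5 and be heterozygous at nearly all such loci, whereas backcrosses and later generations show lower interspecific heterozygosity. The sketch assumes genotypes coded 0/1/2 (copies of the alternative allele) with `None` for missing data; the genotype vectors are toy values.

```python
from typing import List, Optional, Tuple

Genotype = Optional[int]  # 0, 1 or 2 copies of the alternative allele; None = missing


def diagnostic_loci(parent_a: List[Genotype], parent_b: List[Genotype]) -> List[int]:
    """Indices where the two parental samples are fixed for different alleles (0 vs 2)."""
    return [i for i, (a, b) in enumerate(zip(parent_a, parent_b)) if {a, b} == {0, 2}]


def hybrid_summary(sample: List[Genotype], parent_a: List[Genotype],
                   parent_b: List[Genotype]) -> Tuple[float, float]:
    """Return (hybrid index, interspecific heterozygosity) over diagnostic loci.

    Hybrid index = proportion of alleles derived from parent B.
    """
    loci = [i for i in diagnostic_loci(parent_a, parent_b) if sample[i] is not None]
    if not loci:
        raise ValueError("no scorable diagnostic loci")
    # Copies of the parent-B allele: the dosage itself if B is fixed for 2, else 2 - dosage.
    b_copies = [sample[i] if parent_b[i] == 2 else 2 - sample[i] for i in loci]
    hybrid_index = sum(b_copies) / (2 * len(loci))
    heterozygosity = sum(1 for i in loci if sample[i] == 1) / len(loci)
    return hybrid_index, heterozygosity


# Toy data: loci 3 and 4 are not diagnostic and are ignored.
parent_a = [0, 0, 2, 0, 1]
parent_b = [2, 2, 0, 0, 2]
putative_f1 = [1, 1, 1, 0, 2]
print(hybrid_summary(putative_f1, parent_a, parent_b))  # (0.5, 1.0)
```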
Article
Understanding the genetic basis underlying the local adaptation of nonmodel species is a fundamental goal in evolutionary biology. In this study, we explored the genetic mechanisms of the local adaptation of Forsythia suspensa using genome sequence and population genomics data obtained from specific-locus amplified fragment sequencing. We assembled a high-quality reference genome of weeping forsythia (Scaffold N50 = 7.3 Mb) using ultralong Nanopore reads. Then, genome-wide comparative analysis was performed for 15 natural populations of weeping forsythia across its current distribution range. Our results revealed that candidate genes associated with local adaptation are functionally correlated with solar radiation, temperature and water variables across heterogeneous environmental scenarios. In particular, solar radiation during the period of fruit development and seed drying after ripening, cold, and drought significantly contributed to the adaptive differentiation of F. suspensa. Natural selection exerted by environmental factors contributed substantially to the population genetic structure of F. suspensa. Our results supported the hypothesis that adaptive differentiation should be highly pronounced in the genes involved in signal crosstalk between different environmental variables. Our population genomics study of F. suspensa provides insights into the fundamental genetic mechanisms of the local adaptation of plant species to climatic gradients.
Article
Our understanding of polyploid genome evolution is constrained because we cannot know the exact founders of a particular polyploid. To differentiate between founder effects and post-polyploidization evolution, we use a pan-genomic approach to study the allotetraploid Brachypodium hybridum and its diploid progenitors. Comparative analysis suggests that most B. hybridum whole gene presence/absence variation is part of the standing variation in its diploid progenitors. Analysis of nuclear single nucleotide variants, plastomes and k-mers associated with retrotransposons reveals two independent origins for B. hybridum, ~1.4 and ~0.14 million years ago. Examination of gene expression in the younger B. hybridum lineage reveals no bias in overall subgenome expression. Our results are consistent with a gradual accumulation of genomic changes after polyploidization and a lack of subgenome expression dominance. Significantly, if we did not use a pan-genomic approach, we would grossly overestimate the number of genomic changes attributable to post-polyploidization evolution. Existing plant pan-genomic studies usually report considerable intraspecific whole gene presence-absence variation. Here, the authors use a pan-genomic approach to reveal gradual polyploid genome evolution by analyzing Brachypodium hybridum and its diploid progenitors.
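Whole gene presence/absence variation (PAV) is typically summarised by classifying each gene according to how many accessions carry it. The sketch below uses one simple, hypothetical scheme ("core" in all accessions, "private" in exactly one, "shell" in between); published pan-genome studies use a variety of category names and frequency thresholds, so treat these labels as an assumption. The gene IDs and matrix are toy data.

```python
from typing import Dict, List


def classify_pav(pav: Dict[str, List[int]]) -> Dict[str, str]:
    """Label genes from a presence(1)/absence(0) matrix with one row per gene.

    Hypothetical categories: 'core' = present in all accessions,
    'private' = present in exactly one, 'shell' = anything in between,
    'absent' = present in none.
    """
    labels: Dict[str, str] = {}
    for gene, calls in pav.items():
        present = sum(calls)
        if present == len(calls):
            labels[gene] = "core"
        elif present == 0:
            labels[gene] = "absent"
        elif present == 1:
            labels[gene] = "private"
        else:
            labels[gene] = "shell"
    return labels


# Toy matrix: columns are three hypothetical accessions.
pav = {"g1": [1, 1, 1], "g2": [1, 0, 1], "g3": [0, 0, 1], "g4": [0, 0, 0]}
print(classify_pav(pav))
```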
Article
Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20 to 75 × genomic depth and with N50 subread lengths of 11–21 kb. Assemblies with ≤30 × depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20 × depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature. Sequence depth and read length determine the quality of genome assembly. Here, the authors leverage a set of PacBio reads to develop guidelines for sequencing and assembly of complex plant genomes in order to allocate finite resources using maize as an example.
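The depth values discussed above (20× to 75×) follow from simple arithmetic: depth is total sequenced bases divided by genome size. The sketch below shows both directions of the calculation; the 1 Gb genome size is a hypothetical round number chosen for illustration, not the size of the maize NC358 genome.

```python
def sequencing_depth(total_bases: int, genome_size_bp: int) -> float:
    """Average genomic depth (x) = total sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bp


def bases_needed(genome_size_bp: int, target_depth: float) -> float:
    """Total bases required to reach a target average depth."""
    return genome_size_bp * target_depth


GENOME = 1_000_000_000  # hypothetical 1 Gb genome
print(bases_needed(GENOME, 30))                   # 30000000000 -> 30 Gb for 30x
print(sequencing_depth(75_000_000_000, GENOME))   # 75.0
```

Depth alone is not the whole story: the same study shows that subread N50 length matters as well, which a simple base count does not capture.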
Article
The genus Stipa L. comprises over 150 species, all native to the Old World, where they grow in warm temperate regions throughout Europe, Asia, and North Africa. It is one of the largest genera in the family Poaceae in Middle Asia, where one of its diversity hotspots is located. However, identification of Middle Asian Stipa species is difficult because of the lack of new, comprehensive taxonomic studies including all of the species recorded in the region. We present a critical review of the Mid-Asian representatives of Stipa, together with an identification key and taxonomic listing. We relied on both published and unpublished information for the taxa involved, many of which are poorly known. For each taxon, we present a taxonomic and nomenclatural overview, habitat preferences, distribution, altitudinal range, and additional notes as deemed appropriate. We describe four new nothospecies: S. ×balkanabatica M. Nobis & P. D. Gudkova, S. ×dzungarica M. Nobis, S. ×pseudomacroglossa M. Nobis, S. ×subdrobovii M. Nobis & A. Nowak, one subspecies S. caucasica Schmalh. subsp. nikolai M. Nobis, A. Nobis & A. Nowak, and eight varieties: S. araxensis Grossh. var. mikojanovica M. Nobis, S. caucasica var. fanica M. Nobis, P. D. Gudkova & A. Nowak, S. drobovii (Tzvelev) Czerep. var. jarmica M. Nobis, S. drobovii var. persicorum M. Nobis, S. glareosa P. A. Smirn. var. nemegetica M. Nobis, S. kirghisorum P. A. Smirn. var. balkhashensis M. Nobis & P. D. Gudkova, S. richteriana Kar. & Kir. var. hirtifolia M. Nobis & A. Nowak, and S. ×subdrobovii var. pubescens M. Nobis & A. Nowak. Additionally, 12 new combinations, Achnatherum haussknechtii (Boiss.) M. Nobis, A. mandavillei (Freitag) M. Nobis, A. parviflorum (Desf.) M. Nobis, Neotrinia chitralensis (Bor) M. Nobis, S. badachschanica Roshev. var. pamirica (Roshev.) M. Nobis, S. borysthenica Klokov ex Prokudin var. anomala (P. A. Smirn.) M. Nobis, S. holosericea Trin. var. transcaucasica (Grossh.) M. Nobis, S. kirghisorum P. A. Smirn. 
var. ikonnikovii (Tzvelev) M. Nobis, S. macroglossa P. A. Smirn. var. kazachstanica (Kotuchov) M. Nobis, S. macroglossa var. kungeica (Golosk.) M. Nobis, S. richteriana var. jagnobica (Ovcz. & Czukav.) M. Nobis & A. Nowak, and S. zalesskii Wilensky var. turcomanica (P. A. Smirn.) M. Nobis are proposed, and the lectotypes for 14 taxa (S. arabica Trin. & Rupr., S. bungeana Trin. ex Bunge, S. caspia K. Koch, S. ×consanguinea Trin. & Rupr., S. effusa Mez, S. ×heptapotamica Golosk., S. jacquemontii Jaub. & Spach., S. kungeica Golosk., S. margelanica P. A. Smirn., S. richteriana, S. rubentiformis P. A. Smirn., S. sareptana A. K. Becker, S. tibetica Mez, and Timouria saposhnikovii Roshev.) are designated. In Middle Asia the genus Stipa comprises 98 taxa, including 72 species, four subspecies, and 22 varieties. Of the 72 species of feather grasses, 23 are of hybrid origin (nothospecies). In Middle Asia, feather grasses can be found at elevations from (0–)300 to 4500(–5000) m, but most are montane species. The greatest species richness is observed at altitudes between 1000 and 2500 m. Nineteen species grow above 3000 m, but only nine above 4000 m. The number of taxa (species and subspecies) growing in each country also varies considerably, with the highest noted in Kazakhstan (42), Tajikistan (40), and Kyrgyzstan (35). Of the 76 taxa of Stipa (species and subspecies) recorded in Middle Asia, 41 are confined to the region, with some being known only from a single country or mountain range. Distribution maps of selected species are provided.
Article
The accelerating pace of genome sequencing throughout the tree of life is driving the need for improved unsupervised annotation of genome components such as transposable elements (TEs). Because the types and sequences of TEs are highly variable across species, automated TE discovery and annotation are challenging and time-consuming tasks. A critical first step is the de novo identification and accurate compilation of sequence models representing all of the unique TE families dispersed in the genome. Here we introduce RepeatModeler2, a pipeline that greatly facilitates this process. This program brings substantial improvements over the original version of RepeatModeler, one of the most widely used tools for TE discovery. In particular, this version incorporates a module for structural discovery of complete long terminal repeat (LTR) retroelements, which are widespread in eukaryotic genomes but recalcitrant to automated identification because of their size and sequence complexity. We benchmarked RepeatModeler2 on three model species with diverse TE landscapes and high-quality, manually curated TE libraries: Drosophila melanogaster (fruit fly), Danio rerio (zebrafish), and Oryza sativa (rice). In these three species, RepeatModeler2 identified approximately 3 times more consensus sequences matching with >95% sequence identity and sequence coverage to the manually curated sequences than the original RepeatModeler. As expected, the greatest improvement is for LTR retroelements. Thus, RepeatModeler2 represents a valuable addition to the genome annotation toolkit that will enhance the identification and study of TEs in eukaryotic genome sequences. RepeatModeler2 is available as source code or a containerized package under an open license ( https://github.com/Dfam-consortium/RepeatModeler , http://www.repeatmasker.org/RepeatModeler/ ).
Article
In dry steppes, strong climatic constraints, especially highly variable precipitation, and grazing are the most important factors controlling plant life. Growth is strongly limited by water availability, while grazing may affect species presence and performance. However, there is a lack of studies on the population genetics of dryland plants in general, and of those addressing grazing effects in particular. To determine the landscape-scale genetic structure of dryland species, and whether grazing has an impact on it, we chose the Eurasian steppe grass Stipa glareosa for a population genetic study employing nine polymorphic Simple Sequence Repeat (SSR) markers. We assessed genetic fingerprints of 200 individuals from six populations in Mongolia, which were sampled along a large-scale precipitation and altitudinal gradient. Nested within this gradient, sub-populations were sampled along short local transects representing different grazing intensities. Overall, S. glareosa populations showed rather low levels of genetic diversity, with a mean Bruvo distance among individuals within a given population of 0.494 (mean expected heterozygosity He = 0.053). Linear mixed model analysis implied that genetic diversity was affected by both climatic constraints and local grazing conditions. We found a moderate isolation-by-distance pattern across all populations; grazing additionally influenced the genetic structure at local scale. Analysis of Molecular Variance revealed a modest genetic differentiation between populations (9% of variation) and among sub-populations representing different grazing levels (11%). Moreover, we detected indicator alleles that were exclusive to populations along the precipitation gradient; other alleles were associated with certain grazing levels across all populations. Thus, our data suggest that climatic constraints affect the genetic structure of S. glareosa populations, while at local scales differences in grazing disturbance may also matter.
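Expected heterozygosity (He), the diversity measure quoted above, is usually computed per locus as He = 1 − Σ pᵢ², where pᵢ are allele frequencies, and then averaged across loci. The sketch below uses this simple form; whether the study applied Nei's small-sample correction (multiplying by 2n/(2n − 1)) or another estimator is not stated here, so treat the choice as an assumption. The SSR allele sizes are toy values.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def expected_heterozygosity(alleles: Iterable[Hashable]) -> float:
    """He = 1 - sum(p_i^2) for one locus, from a list of observed alleles."""
    counts = Counter(alleles)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no alleles scored at this locus")
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def mean_he(loci: List[List[Hashable]]) -> float:
    """Average He across loci (each inner list = alleles observed at one locus)."""
    return sum(expected_heterozygosity(locus) for locus in loci) / len(loci)


# Toy SSR data: alleles recorded as fragment sizes (bp).
locus1 = [152, 154, 152, 154]  # two equally common alleles -> He = 0.5
locus2 = [210, 210, 210, 210]  # monomorphic -> He = 0.0
print(expected_heterozygosity(locus1))  # 0.5
print(mean_he([locus1, locus2]))        # 0.25
```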