RESEARCH ARTICLE Open Access
Characterization of genome-wide genetic
variations between two varieties of tea
plant (Camellia sinensis) and development
of InDel markers for genetic research
Shengrui Liu¹, Yanlin An¹, Wei Tong¹, Xiuju Qin², Lidia Samarina³, Rui Guo¹, Xiaobo Xia¹ and Chaoling Wei¹*
Abstract
Background: Single nucleotide polymorphisms (SNPs) and insertions/deletions (InDels) are the major genetic
variations and are distributed extensively across the whole plant genome. However, few studies of these variations
have been conducted in the long-lived perennial tea plant.
Results: In this study, we investigated the genome-wide genetic variations between Camellia sinensis var. sinensis
'Shuchazao' and Camellia sinensis var. assamica 'Yunkang 10', identified 7,511,731 SNPs and 255,218 InDels based on
their whole genome sequences, and we subsequently analyzed their distinct types and distribution patterns. A total
of 48 InDel markers that yielded polymorphic and unambiguous fragments were developed when screening six tea
cultivars. These markers were further deployed on 46 tea cultivars for transferability and genetic diversity analysis
and proved informative, with an average number of alleles (Na) of 4.02 and an average polymorphism information
content (PIC) of 0.457. The dendrogram showed that the phylogenetic relationships among these tea cultivars are highly
consistent with their genetic backgrounds or places of origin. Interestingly, we observed that the catechin/caffeine
contents of 'Shuchazao' and 'Yunkang 10' were significantly different, and a large number of SNPs/InDels
were identified within catechin/caffeine biosynthesis-related genes.
Conclusion: The identified genome-wide genetic variations and newly-developed InDel markers will provide a
valuable resource for tea plant genetic and genomic studies, especially the SNPs/InDels within catechin/caffeine
biosynthesis-related genes, which may serve as pivotal candidates for elucidating the molecular mechanism
governing catechin/caffeine biosynthesis.
Keywords: Molecular markers, Genetic diversity, SNP, InDel, Catechin/caffeine biosynthesis, Camellia sinensis
Background
Tea is the most popular non-alcoholic beverage and possesses
numerous valuable properties, including an attractive aroma,
a pleasant taste, and health and medicinal benefits [1–3].
The tea plant (Camellia sinensis (L.) O. Kuntze) is a perennial
evergreen woody plant (2n = 2x = 30) belonging to section Thea
of the genus Camellia in the family Theaceae [4,5]. Evidence is
accumulating that the tea plant originated in Yunnan Province in
southwestern China [4–7]. Currently, cultivated tea plants
primarily belong to two varieties, Camellia sinensis var. sinensis
(CSS) and Camellia sinensis var. assamica (CSA), which are
extensively cultivated in tropical and subtropical regions
around the world [6,8]. Generally, CSS is a slower-growing shrub
with relatively higher cold resistance, while CSA is quick-growing,
with larger leaves and high sensitivity to cold [9].
With the successive release of two draft genome sequences,
CSA 'Yunkang 10' [10] and CSS 'Shuchazao' [9], this plant is
rapidly becoming another tractable experimental model for
genetics and functional genomics research on tea trees. It is
known that self-incompatibility and long-term allogamy contributed
considerably to the highly heterogeneous and abundant genetic
variation of tea plant [11,12]. Therefore, it is highly
important to characterize genome-wide genetic variation
between the two varieties.
© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
* Correspondence: weichl@ahau.edu.cn
Shengrui Liu and Yanlin An contributed equally to this work.
¹ State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural
University, 130 Changjiang West Road, Hefei, China
Full list of author information is available at the end of the article
Liu et al. BMC Genomics (2019) 20:935
https://doi.org/10.1186/s12864-019-6347-0
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Molecular markers, based on DNA polymorphisms,
are useful and powerful tools for genetic and breeding
research. Numerous molecular markers have been
successfully developed and applied in genetic and
genomic research in tea plant, such as restriction
fragment length polymorphisms (RFLPs), amplified
fragment length polymorphisms (AFLPs), random
amplification of polymorphic DNAs (RAPDs), cleaved
amplified polymorphic sequences (CAPS), inter-simple
sequence repeats (ISSRs), and simple sequence repeats
(SSRs) [12,13]. With the rapid development of the
high-throughput sequencing approaches, the third-
generation single nucleotide polymorphism (SNP) and
insertion/deletion (InDel) markers are gradually be-
coming the most widely used molecular markers,
demonstrating a promising future in plant genetic
and breeding research.
SNPs are the most abundant genetic variations in most
plant species, and the exploitation of SNP markers in
single-copy regions is considerably easier than use of the
other DNA markers [14–16]. InDel markers have practical
value for laboratories with limited resources, and have
also shown reliable transferability between distinct
populations [14,17,18]. Both SNPs and InDels
have been extensively applied for breeding programs and
genetic studies including pedigree analysis, origin and
evolutionary analysis, population structure and diversity
analysis, construction of linkage maps, QTL mapping,
and marker-assisted selection [14,19–22]. Several studies
have also reported the development and application
of SNP/InDel markers in tea plant genetic studies. For
instance, 16 expressed sequence tag (EST)-SNP-based
CAPS markers were developed and applied for tea plant
cultivar identification [23]. A set of SNPs from EST
databases was identified and verified [24]. Fang et al.
(2014) validated 60 EST-SNPs and used them to infer genetic
relationships among tea cultivars and to build
cultivar-specific DNA fingerprints [25]. Based on specific
locus amplified fragment sequencing (SLAF-seq), a total of
6042 SNP markers were validated and a final genetic map
containing 6448 markers was constructed [26]. Using a
restriction site-associated DNA sequencing (RAD-Seq)
approach, Yang et al. (2016) identified a vast number of
SNPs from 18 cultivated and wild tea accessions, and
found that 13 genes containing non-synonymous SNPs
exhibited strong selective signals, suggesting footprints of
artificial selection during domestication of these tea
accessions [27]. With the two reference genomes available,
it is now feasible to identify genome-wide SNPs/InDels
between them to guide rapid and efficient development of
markers for high-resolution genetic analysis.
The whole genome sequences of tea trees can provide
an elegant platform for identifying abundant genetic
variation and developing many genetic markers. The
completion of the two reference genome sequences is a
notable advance for genetic and genomic studies and a
basis for this study. The whole genome of CSA 'Yunkang 10'
was reported first, based on the Illumina next-generation
sequencing platform, producing a ~3.02 Gb genome assembly
containing 37,618 scaffolds with an N50 length of 449 kb [10].
Subsequently, the genome assembly of CSS 'Shuchazao' was
released using combined Illumina and PacBio sequencing
platforms, yielding a ~3.14 Gb genome assembly that consists
of 36,676 scaffolds with an N50 length of 1.39 Mb [9]. In this
study, several principal objectives were completed. Genome-wide
genetic variation and distribution patterns were investigated.
A number of polymorphic and stable InDel markers were
developed, providing informative molecular markers for genetic
and genomic studies. The catechin and caffeine contents of the
two tea cultivars were measured, and SNPs/InDels within
catechin/caffeine biosynthesis-related genes were characterized.
The identified genome-wide genetic variations and newly
developed InDel markers provide valuable resources for tea
plant genetic and genomic studies, and the SNPs/InDels
identified within catechin/caffeine biosynthesis-related genes
can serve as important candidate loci for functional analysis.
Results
Mapping of clean reads to the 'Shuchazao' reference genome
CSS 'Shuchazao' differs significantly from CSA 'Yunkang 10'
in bud, leaf and budding flower size (Fig. 1). The completion
of the two reference genome sequences ('Shuchazao' and
'Yunkang 10') is a notable advance for comparative genomic
studies on tea plants in section Thea. Therefore,
genome-wide genetic variations were identified between
the two genome assemblies. After filtering the raw data,
a total of 324,154,064 clean reads from the CSA whole
genome sequencing data were retained; these reads had a
length of 100 bp and a GC content of 43%, and covered the
'Yunkang 10' genome at a depth of 10.4×. Through
alignment, a total of 317,878,025 clean reads were
mapped to the reference genome, accounting for 98.1%
of the total reads. The mapped clean reads comprised two
types of sequencing reads: paired-end and single-end reads.
Paired-end reads were predominant (317,063,284 reads,
99.7%), while single-end reads accounted for only 0.3%
(814,741 clean reads).
Fig. 1 Comparison of bud and leaf size between 'Shuchazao' and 'Yunkang 10'. Young buds and leaves were collected in April 2019, while
mature leaves were collected from branches grown in the previous autumn
Fig. 2 Classification and distribution of identified SNPs/InDels in the 'Yunkang 10'/'Shuchazao' comparison. a Frequency of different substitution
types in the identified SNPs; the x-axis and y-axis represent the types and number of SNPs, respectively. b Distribution of the length of InDels
identified between the two tea cultivars; the x-axis shows the number of nucleotides of InDels, and the y-axis represents the number of InDels at
each length
Identification and distribution of SNP and InDel loci
After a series of filtering steps, a total of 7,071,433 SNP loci
were obtained, and the average SNP density in the tea
genome was estimated to be 2341 SNPs/Mb. Based on
nucleotide substitutions, the detected SNPs were classified
as transitions (Ts: G/A and C/T) and transversions
(Tv: A/C, A/T, C/G, and G/T), which accounted for
77.46% (5,818,773) and 22.54% (1,692,958), respectively
(Fig. 2a), with a Ts/Tv ratio of 3.44. Among transitions, the
A/G and C/T types were nearly equal in number
(2,905,203 and 2,913,570, respectively). Among
transversions, the four types (A/C, A/T, C/G and G/T)
were of broadly similar abundance, accounting for 27.23%
(460,988), 24.72% (418,536), 20.84% (352,802) and
27.21% (460,632) of transversions, respectively (Fig. 2a).
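The transition/transversion classification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical and are not taken from the authors' pipeline, and the per-type counts are simply the values reported in the text for Fig. 2a.

```python
# Transitions exchange two purines (A<->G) or two pyrimidines (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def classify_snp(ref: str, alt: str) -> str:
    """Return 'Ts' for a transition and 'Tv' for a transversion (hypothetical helper)."""
    pair = frozenset((ref.upper(), alt.upper()))
    if len(pair) != 2 or not pair <= set("ACGT"):
        raise ValueError(f"not a biallelic SNP: {ref}>{alt}")
    return "Ts" if pair in TRANSITIONS else "Tv"


# Per-type SNP counts as reported in the text (Fig. 2a).
reported = {
    "A/G": 2_905_203, "C/T": 2_913_570,                       # transitions
    "A/C": 460_988, "A/T": 418_536, "C/G": 352_802, "G/T": 460_632,  # transversions
}

ts = sum(n for k, n in reported.items() if classify_snp(*k.split("/")) == "Ts")
tv = sum(n for k, n in reported.items() if classify_snp(*k.split("/")) == "Tv")
ts_tv_ratio = ts / tv

print(ts, tv, round(ts_tv_ratio, 2))
```

Summing the reported per-type counts in this way gives the transition and transversion totals quoted in the text, and their quotient rounds to the stated Ts/Tv ratio of 3.44.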
A total of 255,218 InDels were identified, with an
average density of 84.5 InDels/Mb. The length
distribution of InDels was analyzed by dividing the lengths
into different groups and calculating the ratios for the
corresponding length groups (Fig. 2b). Mononucleotide
InDels were the most abundant type, accounting for
44.27% (112,976) of the total number. InDels of 1 to 20 bp
were predominant, accounting for 95.5% (243,749) of the
total InDels. The number of InDels clearly decreased
gradually with increasing InDel length.
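The length-group calculation behind Fig. 2b can be illustrated as follows. This is a minimal sketch, assuming InDels are available as VCF-style (REF, ALT) allele pairs; the records in `toy` are invented for illustration, and only the two percentages at the end use counts reported in the text.

```python
from collections import Counter


def indel_length(ref: str, alt: str) -> int:
    """InDel length in bp, taken as the difference in allele lengths."""
    return abs(len(ref) - len(alt))


def length_distribution(indels):
    """Map InDel length (bp) -> (count, percentage of all InDels)."""
    counts = Counter(indel_length(ref, alt) for ref, alt in indels)
    total = sum(counts.values())
    return {length: (n, 100 * n / total) for length, n in sorted(counts.items())}


# Hypothetical toy records (REF, ALT); not data from the study.
toy = [("A", "AT"), ("GCC", "G"), ("T", "TA"), ("C", "CAGGT")]
dist = length_distribution(toy)

# Percentages recomputed from the counts reported in the text.
mono_pct = 100 * 112_976 / 255_218       # mononucleotide InDels
short_pct = 100 * 243_749 / 255_218      # InDels of 1-20 bp
print(dist, round(mono_pct, 2), round(short_pct, 1))
```

With the reported counts, the two percentages round to 44.27% and 95.5%, matching the values given in the text.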
Location and functional annotation of SNPs and InDels
The annotation of the 'Shuchazao' reference genome
was used to uncover the distribution of SNPs and InDels
within distinct genomic regions. According to the gene
structure of the reference genome, the overwhelming
majority of SNPs (94%) were located in intergenic regions,
while only 6% (440,298) of SNPs were located in
genic regions (Fig. 3a). Among the SNPs located in genic
regions, 89,511 SNPs were detected in the CDs region,
comprising 38,670 synonymous and 50,841
non-synonymous SNPs. Similarly, only a small proportion
of InDels were located in genic regions, accounting for
12% (31,130) of the total number (Fig. 3b). Remarkably,
3406 InDels were located in the CDs region; these can be
regarded as preferred candidates for developing InDel markers.
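The distinction between synonymous and non-synonymous SNPs in coding sequence can be made concrete with a small sketch. It assumes the standard genetic code and a codon already extracted in the reading frame of the gene model; `snp_effect` is a hypothetical helper, not the annotation tool used in the study.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def snp_effect(codon: str, pos: int, alt: str) -> str:
    """Classify a SNP at 0-based position `pos` of an in-frame codon."""
    codon, alt = codon.upper(), alt.upper()
    if alt == codon[pos]:
        raise ValueError("alternative base equals the reference base")
    mutated = codon[:pos] + alt + codon[pos + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same else "non-synonymous"


print(snp_effect("CTT", 2, "C"))  # Leu -> Leu
print(snp_effect("GAA", 0, "A"))  # Glu -> Lys
```

A change at the third codon position often leaves the amino acid unchanged, whereas first- and second-position changes usually alter it, which is why the two classes are counted separately.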
To better understand the potential functions of these
genetic variations within genes, GO term enrichment
analysis of genes containing SNPs/InDels within CDs re-
gion was performed. These genes were classified into
biological process, cellular component and molecular
function categories (Additional file 2: Figure S2). Regard-
ing the genes containing SNPs, the GO terms of cellular
process, metabolic process and single-organism process
were dominantly abundant in the biological process
(Additional file 2: Figure S2A). In the cellular compo-
nent category, the top three enriched GO terms were
membrane, cell and cell part. Based on the molecular
function category, catalytic activity and binding are pre-
dominantly enriched, while others accounted for a small
proportion (Additional file 2: Figure S2A). Interestingly,
a nearly identical pattern was obtained for the GO term
analysis of genes containing InDels, except that the
number of genes was smaller than the number of
genes containing SNPs (Additional file 2: Figure S2B).
Fig. 3 Annotation of SNPs and InDels identified between 'Shuchazao' and 'Yunkang 10'. a Annotation of SNPs. b Annotation of InDels. SNPs and
InDels were classified as intergenic and genic on the 'Shuchazao' reference genome, and locations within the gene models were annotated
Validation and polymorphism of newly-developed InDel
markers
Initially, all InDels were used for designing primer pairs
using Primer3.0. To validate the InDels and develop
polymorphic InDel markers, we selected 100 InDel
markers that were distributed on different scaffolds. To
facilitate the screening and development of more practical
markers, all selected InDels were 5 to 20 bp in length.
To determine the reliability and polymorphism of the
primers, six tea cultivars were selected, and their amplified
fragments were tested using Fragment Analyzer 96. Of the
primer sets tested, 48 primer pairs amplified successfully,
yielding unambiguous bands and length polymorphisms
among the six tea cultivars; 19 primer sets generated
non-polymorphic or empty amplifications; and 33 primer
pairs yielded non-specific amplification or ambiguous bands.
Consequently, the 48 primer sets were regarded as
reliable InDel markers and used for further analysis.
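The candidate-selection step (InDels of 5 to 20 bp, spread over different scaffolds) can be sketched as a simple filter. Everything here is an assumption made for illustration: the `InDel` record, the `pick_candidates` function, the `per_scaffold` cap, and the toy coordinates are hypothetical, and the text does not state how many candidates were allowed per scaffold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InDel:
    scaffold: str
    position: int
    length: int  # InDel length in bp


def pick_candidates(indels, min_len=5, max_len=20, per_scaffold=1, limit=100):
    """Keep InDels within the length window, capped per scaffold (assumed rule)."""
    picked, used = [], {}
    for v in sorted(indels, key=lambda v: (v.scaffold, v.position)):
        if not (min_len <= v.length <= max_len):
            continue
        if used.get(v.scaffold, 0) >= per_scaffold:
            continue
        used[v.scaffold] = used.get(v.scaffold, 0) + 1
        picked.append(v)
        if len(picked) == limit:
            break
    return picked


# Hypothetical toy input; not coordinates from the study.
toy = [
    InDel("Scaffold5", 100, 3), InDel("Scaffold5", 200, 8),
    InDel("Scaffold5", 300, 12), InDel("Scaffold12", 50, 25),
    InDel("Scaffold12", 90, 5),
]
picked = pick_candidates(toy)

# Screening outcome reported in the text for the 100 tested primer pairs.
outcome = {"polymorphic": 48, "non-polymorphic or empty": 19, "non-specific or ambiguous": 33}
success_rate = outcome["polymorphic"] / sum(outcome.values())
print(picked, success_rate)
```

The three outcome classes reported in the text sum to the 100 primer pairs tested, so 48% of the tested candidates became usable markers.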
To test cross-cultivar/subspecies transferability, the
48 InDel markers were tested on a panel of 46 tea
cultivars belonging to section Thea of the genus Camellia.
Detailed information on the 46 tea cultivars is listed
in Additional file 4: Table S1. The results for 18 InDel
markers tested on various tea cultivars are shown in
Fig. 4, demonstrating that unambiguous and polymorphic
bands were obtained with these markers.
The amplification results of the remaining 30 markers are
shown in Additional file 3: Figure S3. Of the
newly developed markers, 20, 25 and 3 InDel markers
showed high, moderate, and low polymorphism,
respectively, in the 46 tea cultivars.
The PIC value of each InDel marker is presented
in Table 1. The amplified allele sizes across them were
Fig. 4 Exhibition of transferability and polymorphism detected by 18 out of 48 InDel markers among 46 tea cultivars
Table 1 Characteristics of 48 newly developed InDel markers
Marker ID Scaffold location Fragment size (bp) Na MAF Ho He PIC
CsInDel01 Scaffold 5: 236696 139–156 3 0.787 0.383 0.361 0.327
CsInDel02 Scaffold 5: 1208833 186–205 4 0.489 1.000 0.633 0.555
CsInDel03 Scaffold 12: 195263 332–354 3 0.500 0.489 0.577 0.478
CsInDel04 Scaffold 30: 3820588 214–242 5 0.532 0.532 0.636 0.576
CsInDel05 Scaffold 39: 128636 236–264 4 0.479 0.979 0.556 0.448
CsInDel06 Scaffold 41: 2074123 280–295 3 0.808 0.180 0.319 0.273
CsInDel07 Scaffold 46: 249178 176–189 3 0.734 0.362 0.405 0.336
CsInDel08 Scaffold 51: 314982 206–215 6 0.394 0.638 0.691 0.627
CsInDel09 Scaffold 51: 760768 201–248 7 0.532 0.660 0.679 0.645
CsInDel10 Scaffold 52: 469482 288–306 3 0.745 0.255 0.394 0.329
CsInDel11 Scaffold 60: 843530 292–332 6 0.383 0.213 0.748 0.701
CsInDel12 Scaffold 60: 843632 240–275 5 0.426 0.660 0.704 0.645
CsInDel13 Scaffold 64: 151635 270–289 3 0.404 0.617 0.643 0.559
CsInDel14 Scaffold 66: 500052 203–232 4 0.436 0.064 0.621 0.535
CsInDel15 Scaffold 77: 505984 185–207 2 0.500 1.000 0.505 0.375
CsInDel16 Scaffold 89: 1202911 231–248 2 0.819 0.149 0.300 0.252
CsInDel17 Scaffold 98: 664107 306–354 6 0.395 0.256 0.731 0.677
CsInDel18 Scaffold 114: 416691 283–326 6 0.489 0.809 0.703 0.661
CsInDel19 Scaffold 129: 540746 180–214 6 0.422 1.000 0.652 0.579
CsInDel20 Scaffold 154: 767901 285–297 5 0.266 0.979 0.763 0.709
CsInDel21 Scaffold 225: 80286 191–204 2 0.649 0.362 0.461 0.352
CsInDel22 Scaffold 1000: 52494 216–288 3 0.532 0.404 0.612 0.537
CsInDel23 Scaffold 1001: 123324 236–326 6 0.628 0.489 0.568 0.526
CsInDel24 Scaffold 1001: 149678 190–199 2 0.798 0.021 0.326 0.271
CsInDel25 Scaffold 1001: 155681 195–218 2 0.649 0.319 0.461 0.352
CsInDel26 Scaffold 1001: 1251845 341–363 3 0.583 0.833 0.511 0.399
CsInDel27 Scaffold 1001: 1261469 273–290 3 0.777 0.064 0.359 0.306
CsInDel28 Scaffold 1001: 1400899 213–253 6 0.660 0.383 0.537 0.501
CsInDel29 Scaffold 1001: 1491192 182–226 4 0.457 1.000 0.586 0.489
CsInDel30 Scaffold 1001: 1691928 238–258 4 0.745 0.362 0.411 0.363
CsInDel31 Scaffold 1001: 1982826 284–316 4 0.489 0.915 0.619 0.539
CsInDel32 Scaffold 1452: 285463 272–299 3 0.596 0.426 0.511 0.406
CsInDel33 Scaffold 1539: 196438 271–280 2 0.798 0.404 0.326 0.271
CsInDel34 Scaffold 1541: 138532 265–286 3 0.564 0.851 0.523 0.413
CsInDel35 Scaffold 1543: 253456 172–207 2 0.915 0.128 0.157 0.144
CsInDel36 Scaffold 1551: 196819 157–237 3 0.606 0.745 0.499 0.391
CsInDel37 Scaffold 1553: 529121 211–237 4 0.564 0.511 0.547 0.451
CsInDel38 Scaffold 1555: 5209 109–340 14 0.298 0.489 0.869 0.849
CsInDel39 Scaffold 1579: 1466247 261–272 2 0.606 0.787 0.483 0.363
CsInDel40 Scaffold 1592: 672899 276–329 7 0.596 0.979 0.666 0.489
CsInDel41 Scaffold 1593: 1022219 172–187 2 0.957 0.085 0.082 0.078
CsInDel42 Scaffold 1594: 195199 184–206 3 0.691 0.426 0.454 0.380
CsInDel43 Scaffold 1611: 1270988 226–254 5 0.426 0.319 0.684 0.619
CsInDel44 Scaffold 2220: 166816 292–328 3 0.543 0.575 0.521 0.402
within the ranges detected in the donor tea cultivar,
implying that the amplified fragments were derived from
the same loci and that the primer binding sites of the
alleles were highly conserved among distinct tea
cultivars/subspecies. Several key parameters for evaluating
marker polymorphism were subsequently calculated (Table 1).
The number of alleles (Na) per locus ranged
from 2 (CsInDel15, CsInDel16, CsInDel21, CsInDel24,
CsInDel25, CsInDel33, CsInDel35, CsInDel39, CsInDel41,
CsInDel46, and CsInDel47) to 14 (CsInDel38),
with an average of 4.02 alleles. The major allele frequency
(MAF) ranged from 0.266 (CsInDel20) to 0.957
(CsInDel41 and CsInDel47), with an average of 0.585.
The observed heterozygosity (Ho) ranged from 0.021
(CsInDel24) to 1.000 (CsInDel15, CsInDel19, and
CsInDel29), with an average of 0.524, and the expected
heterozygosity (He) ranged from 0.082 (CsInDel41 and
CsInDel47) to 0.869 (CsInDel38), with an average of 0.528.
The polymorphism information content (PIC) values
ranged from 0.078 (CsInDel41 and CsInDel47) to 0.849
(CsInDel38), with an average of 0.457 (Table 1). Notably,
He values followed a trend similar to that of the PIC
values but distinct from that of the Ho values. The primer
sequences and genomic locations of these newly developed
markers are listed in Additional file 5: Table S2. These
results show that the newly developed InDel markers are
informative and possess good transferability among
various tea subspecies/cultivars.
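The marker statistics reported in Table 1 can be computed from diploid genotype calls as sketched below. This is an illustration under assumptions rather than the authors' procedure: the text does not name the software or formulas used, so the sketch uses Nei's expected heterozygosity with an optional sample-size correction and the Botstein et al. formula for PIC, PIC = 1 − Σpᵢ² − Σ_{i<j} 2pᵢ²pⱼ². The high/moderate/low cut-offs (PIC > 0.5, 0.25 to 0.5, < 0.25) are the conventional ones and are likewise assumed; all function names are hypothetical.

```python
from collections import Counter


def marker_stats(genotypes, unbiased=True):
    """genotypes: list of (allele_a, allele_b) tuples, one per diploid cultivar."""
    alleles = Counter(a for g in genotypes for a in g)
    n = sum(alleles.values())                      # number of allele copies
    freqs = [count / n for count in alleles.values()]
    sum_sq = sum(p * p for p in freqs)
    he = 1 - sum_sq                                # Nei's gene diversity
    if unbiased:
        he *= n / (n - 1)                          # assumed sample-size correction
    ho = sum(a != b for a, b in genotypes) / len(genotypes)
    cross = sum(2 * (p * q) ** 2
                for i, p in enumerate(freqs) for q in freqs[i + 1:])
    pic = 1 - sum_sq - cross                       # Botstein et al. PIC
    return {"Na": len(alleles), "MAF": max(freqs), "Ho": ho, "He": he, "PIC": pic}


def polymorphism_class(pic):
    """Conventional PIC classes (assumed thresholds)."""
    if pic > 0.5:
        return "high"
    return "moderate" if pic >= 0.25 else "low"


# Hypothetical panel: 46 cultivars, all heterozygous for two fragment sizes.
stats = marker_stats([("185", "207")] * 46)
print({k: round(v, 3) for k, v in stats.items()}, polymorphism_class(stats["PIC"]))
```

For this hypothetical all-heterozygous biallelic marker the sketch gives Na = 2, MAF = 0.5, Ho = 1, He ≈ 0.505 and PIC = 0.375, the same pattern as the CsInDel15 row of Table 1. Applying the assumed thresholds to the PIC column of Table 1 reproduces the 20 high, 25 moderate and 3 low polymorphism markers stated in the text.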
Population structure and genetic relationship analysis
Population structure analysis was performed on the 46
tea cultivars using Structure 2.3.3 software based on 48
newly-developed InDel markers. The Q-plot output presents
the grouping results, indicating that two groups (K = 2)
were the optimal classification (Fig. 5a).
Tea cultivars from southern and southwestern China
(Guangxi, Guangdong, Yunnan and Sichuan Provinces)
belonging to Camellia sinensis var. assamica
were clustered tightly together. In comparison, the tea
cultivars with smaller leaves and shorter stature
that were cultivated in several other provinces were
classified into another group (Fig. 5b).
To further confirm the applicability of the developed
InDel markers for classification, we constructed a phylo-
genetic tree based on their genetic distances (Fig. 5c).
Two major branches were generated (designated as α
and β groups), which contained 17 and 29 tea cultivars,
respectively. Group α can be further divided into two
subgroups, which were designated as α-1 and α-2 sub-
groups and consisted of 13 and 4 members, respectively.
The dendrogram reflects that the phylogenetic relation-
ships among them are highly consistent with their back-
grounds or places of origin, as well as displaying
consistency with the results from population structure
analysis although a small discrepancy was observed (Fig.
5c).
Identification of genetic variation in catechin/caffeine
biosynthesis-related genes
Tea cultivars belonging to Camellia sinensis var. assamica
differ significantly from Camellia sinensis var. sinensis
in phenotype (plant height, leaf size and flower) and in
major characteristic secondary metabolites (such as
catechin and caffeine, which contribute tremendously to
tea quality). Therefore, we measured the contents of
catechin (flavan-3-ols) and caffeine in both 'Shuchazao'
and 'Yunkang 10' by HPLC analysis. The total catechin
content in both the bud and the second leaf of
'Yunkang 10' was higher than that of 'Shuchazao'
(Fig. 6a). To understand the potential molecular
mechanism of this difference, we reconstructed the
catechin biosynthesis pathway based on several previous
studies (Fig. 6b). We then identified a number of
SNPs and InDels in some crucial genes that are involved
in the catechin biosynthesis pathway, including
phenylalanine ammonia-lyase (PAL), cinnamic acid
4-hydroxylase (C4H), 4-coumarate-CoA ligase (4CL),
chalcone synthase (CHS), chalcone isomerase (CHI),
flavanone 3-hydroxylase (F3H), flavonoid 3′-hydroxylase
(F3′H), flavonoid 3′,5′-hydroxylase (F3′5′H),
dihydroflavonol 4-reductase (DFR), leucoanthocyanidin
reductase (LAR), anthocyanidin synthase (ANS),
anthocyanidin reductase (ANR), and
1-O-galloyl-β-D-glucose O-galloyltransferase (ECGT,
which belongs to subclade 1A of serine
carboxypeptidase-like (SCPL) acyltransferases)
(Table 2).
Table 1 Characteristics of 48 newly developed InDel markers (Continued)
Marker ID Scaffold location Fragment size (bp) Na MAF Ho He PIC
CsInDel45 Scaffold 15285: 211487 281–321 5 0.333 0.952 0.752 0.699
CsInDel46 Scaffold 15433: 302840 190–253 2 0.638 0.468 0.467 0.355
CsInDel47 Scaffold 15579: 267174 176–186 2 0.957 0.043 0.082 0.078
CsInDel48 Scaffold 15650: 137667 228–266 6 0.489 0.596 0.671 0.614
Average – – 4.02 0.585 0.524 0.528 0.457
Na number of alleles, MAF major allele frequency, Ho observed heterozygosity, He expected heterozygosity, PIC polymorphism information content
Measurement of caffeine content in the two tea
varieties showed that the caffeine content in both the bud
and the second leaf of 'Yunkang 10' is lower than
that of 'Shuchazao' (Fig. 7a). The well-studied
caffeine biosynthesis pathway was also reconstructed
based on previous studies (Fig. 7b) [10,28–31].
Similarly, a number of genetic variations were detected
within some critical regulatory genes, such as
IMP dehydrogenase (IMPDH), guanosine synthase
(GMPS), 5′-nucleotidase (5′-Nase) and tea caffeine
synthase (TCS) genes (Fig. 7c and Table 2).
Collectively, these results indicate that certain genetic
variations within these genes may explain the significant
difference in catechin/caffeine synthesis between
'Shuchazao' and 'Yunkang 10'.
Discussion
Identification of genetic variations in tea plant whole
genome
The recent release of the 'Shuchazao' and 'Yunkang 10'
genome sequences will greatly facilitate comparative
genomics and functional research in tea plants.
This advance may enable researchers to study
numerous agronomic traits of the perennial tea tree
with a complete set of tools, including the identification
and development of SNP/InDel markers. Nevertheless,
genome-wide identification and development of
SNP/InDel markers are still in their infancy, especially for
genetic variations related to important agronomic traits.
By mapping the clean reads of 'Yunkang 10' to the
'Shuchazao' reference genome assembly, we comprehensively
surveyed DNA polymorphisms at the genome-wide scale
and revealed a high level of genetic diversity between
them. The vast number of SNPs and InDels identified in
this study will provide valuable resources for tea plant
genetics and breeding studies.
Fig. 5 Population structure and phylogenetic relationship analysis based on 48 InDel markers. a Estimation of the optimal group number through
ΔK; K was set from 2 to 9. b Q-plot of the population structure when K = 2. Each tea cultivar is represented by a horizontal bar. c
The dendrogram was constructed based on genotypes using the neighbor-joining algorithm with 1000 bootstrap replicates
After filtering, a total of 7,071,433 SNPs and 255,218
InDels were identified, and their densities in
the tea plant genome were estimated to be 2341 SNPs/
Mb and 84.5 InDels/Mb, respectively. The densities of
SNPs and InDels in the tea plant differed markedly
from those in other plant species, such as
Arabidopsis [32], Brassica rapa [17], quinoa [19], and
soybean [33]. These differences in SNP/InDel
density among plant species may be due to
distinct filtering protocols and/or different genomic
composition. It is known that tea cultivars belonging to
distinct varieties are highly heterogeneous, with broad
genetic variation due to their self-incompatibility and
long-term allogamy [11]. In terms of SNPs, our results
showed that A/G and C/T transitions are the most
common pattern of nucleotide substitution, which is
consistent with the results obtained in other plant
species, such as foxtail millet [34], citrus [35], and soybean
[33]. For InDels, the most prevalent types in the tea
plant genome are short InDels. InDels of 1–5 bp
are the predominant type, accounting for 76% of
all InDels, and similar results have been reported in several
other plant species [14,33–35].
Knowing the genomic positions of genetic variations
in genetic markers or functional genes is highly important.
Only a small fraction of SNPs and InDels
were located in the CDs region, which can be
explained by the fact that the CDs region accounts
for only a small proportion of the whole genome sequence
and is relatively more conserved than
other regions. Among the 89,511 SNPs located in the
CDs region, a total of 50,841 SNPs were non-
synonymous variations. Non-synonymous variations can
usually have several functional impacts due to an altered
amino acid sequence, such as hampering the interaction
Fig. 6 Detection of catechin content and genetic variations within catechin biosynthesis-related genes. a Catechin content of the
bud and leaf of both 'Shuchazao' and 'Yunkang 10'. A t-test was used for significance analysis, and two asterisks represent p < 0.01. Each sample
was tested with three independent biological replicates and two technical replicates. b The flavonoid biosynthesis pathway. PAL, phenylalanine
ammonia-lyase; C4H, cinnamic acid 4-hydroxylase; 4CL, 4-coumarate-CoA ligase; CHS, chalcone synthase; CHI, chalcone isomerase; F3H, flavanone
3-hydroxylase; F3′H, flavonoid 3′-hydroxylase; F3′5′H, flavonoid 3′,5′-hydroxylase; FLS, flavonol synthase; DFR, dihydroflavonol 4-reductase; ANS,
anthocyanidin synthase; ANR, anthocyanidin reductase; LAR, leucoanthocyanidin reductase; SCPL1A, subclade 1A of serine
carboxypeptidase-like acyltransferases
Table 2 Statistics on SNPs and InDels within catechin biosynthesis-related genes
Gene name Gene ID SNP (DNA) SNP (CDs) InDel (DNA) InDel (CDs) Gene name Gene ID SNP (DNA) SNP (CDs) InDel (DNA) InDel (CDs)
PAL TEA014056.1 2 2 0 0 F3H TEA004906.1 0 0 1 0
TEA034008.1 6 6 0 0 TEA010326.1 1 1 0 0
TEA003137.1 16 16 0 0 TEA032907.1 3 3 1 1
TEA023243.1 3 3 0 0 TEA028622.1 75 1 3 0
TEA024587.1 3 3 0 0 TEA009737.1 4 4 0 0
TEA003374.1 2 2 1 0 TEA000753.1 1 1 0 0
C4H TEA034001.1 16 8 1 1 TEA023937.1 1 1 0 0
TEA016772.1 5 1 1 0 TEA016601.1 4 2 1 0
TEA034002.1 6 6 0 0 TEA023790.1 10 3 1 0
4CL TEA018887.1 1 1 0 0 TEA000474.1 8 1 0 0
TEA034012.1 9 4 1 1 TEA026443.1 1 1 0 0
TEA019275.1 14 10 0 0 TEA004898.1 1 1 0 0
TEA027829.1 12 3 1 0 TEA006643.1 15 15 0 0
TEA025906.1 2 1 0 0 TEA014951.1 29 8 2 0
TEA009431.1 42 10 4 2 DFR TEA032730.1 2 0 1 0
TEA018045.1 22 3 4 0 TEA023829.1 13 1 0 0
TEA006577.1 6 1 0 0 TEA021807.1 2 0 0 0
TEA031627.1 11 8 0 0 TEA021815.1 2 2 0 0
TEA022274.1 2 1 0 0 ANS TEA010322.1 1 1 0 0
TEA010681.1 8 4 0 0 TEA015762.1 1 1 0 0
TEA002100.1 13 0 1 0 TEA015769.1 1 0 0 0
CHS TEA018665.1 1 1 0 0 ANR TEA030023.1 1 1 0 0
TEA034046.1 34 10 0 0 TEA022960.1 6 2 0 0
TEA034011.1 6 4 0 0 TEA007646.1 1 0 1 0
TEA034045.1 1 1 0 0 TEA003247.1 1 1 0 0
TEA023331.1 2 2 0 0 LAR TEA021535.1 1 1 0 0
TEA023340.1 3 3 2 0 TEA027582.1 0 0 2 0
TEA034013.1 2 2 0 0 TEA009266.1 3 3 1 0
TEA034043.1 31 7 0 0 SCPLA1 TEA034031.1 4 2 0 0
TEA034019.1 3 3 0 0 TEA034032.1 11 5 0 0
TEA034014.1 1 1 0 0 TEA010715.1 6 5 0 0
TEA011908.1 6 1 0 0 TEA034056.1 33 1 0 0
TEA019029.1 4 4 0 0 TEA009664.1 4 0 0 0
CHI TEA034003.1 10 2 1 0 TEA016469.1 2 0 0 0
TEA033023.1 127 4 10 0 TEA016463.1 9 1 0 0
TEA033031.1 2 1 0 0 TEA034055.1 59 1 0 0
F3H TEA016718.1 2 2 0 0 TEA034034.1 4 0 0 0
TEA010133.1 5 2 0 0 TEA034036.1 1 1 0 0
TEA006847.1 14 10 1 1 TEA023444.1 3 0 0 0
F35H TEA013315.1 12 12 0 0 TEA034039.1 31 2 0 0
TEA034021.1 6 1 0 0 TEA023451.1 4 1 0 0
TEA034051.1 32 4 4 0 TEA000223.1 4 0 0 0
between proteins and affecting gene expression due to
the functional consequences of distinct motif binding at
variation sites [33,36]. It is worth noting that a total of
3406 identified InDels were located in the CDs region.
InDels tend to have more impact on protein structure
and function than single base changes, especially those
in the CDs region [33]. Nevertheless, genetic variations
at UTRs may also play important roles, such as modifi-
cation of regulatory elements affecting the interaction of
the UTRs with proteins and miRNAs [37]. Overall, these
SNPs and InDels can serve as important candidates for
functional research, especially those InDels in the CDs,
which can be considered as a valuable resource for de-
veloping phylogenetic and/or functional markers.
Development and application of InDel markers
Molecular markers are becoming indispensable tools for
evolutionary analysis, germplasm identification and conservation,
and marker-assisted selection (MAS). SSRs are
an extensively used marker type,
and a large number of highly polymorphic SSR markers
have been developed and applied in various genetic studies
in tea plants [8, 13]. These SSR markers, however,
can easily produce non-specific amplifications and
cause confusion in genotype scoring [19], especially
in plant species with large genomes and highly repetitive
sequences. InDel markers are also PCR-based
and are similarly affected by genomic complexity.
However, they give relatively fewer stutter bands because
their variations are more conserved than those of
SSR markers [18, 19]. Through a series of screenings, we
developed a final set of 48 polymorphic and stable InDel
markers, with InDels of 5–20 bp in length, based on the
assembled genomic sequences (Table 1). The lengths of the
allele fragments amplified across tea cultivars were consistent
with the expected product sizes, implying
that the primer binding sites of the alleles were highly
conserved. A large proportion of the InDel markers displayed
a moderate PIC value (0.25 < PIC < 0.5), and the
average PIC was 0.457. The PIC values
of most InDel markers were lower than those of the
majority of SSR markers [2, 8, 38, 39], supporting the view that
InDel markers are stable and bi-allelic throughout the
genome. Therefore, these newly developed InDel
markers are suitable for germplasm identification and
conservation, genetic diversity analysis, and population
structure and phylogenetic relationship analysis. In addition,
InDels can affect gene function by causing a frameshift
and/or the gain or loss of a stop codon; they are therefore
suitable for developing functional markers that might be
particularly valuable for MAS [19, 40].
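As a rough illustration of the frameshift point above, an InDel inside a coding sequence shifts the reading frame whenever its net length change is not a multiple of three. The Python sketch below encodes only that rule. The function name and the REF/ALT string representation are assumptions made for this example; it ignores stop-codon changes and exon boundaries and is not a substitute for a full annotation tool.

```python
def indel_effect_in_cds(ref: str, alt: str) -> str:
    """Classify a coding-region InDel by its net length change.

    `ref` and `alt` are the reference and alternate allele strings
    (hypothetical inputs, in the style of VCF REF/ALT fields).
    """
    net = len(alt) - len(ref)
    if net == 0:
        raise ValueError("ref and alt have equal length; not an InDel")
    kind = "insertion" if net > 0 else "deletion"
    # A length change divisible by 3 keeps the downstream codons in frame.
    frame = "in-frame" if abs(net) % 3 == 0 else "frameshift"
    return f"{frame} {kind}"
```

For example, a 3 bp insertion would be reported as in-frame, whereas a 2 bp deletion would be reported as a frameshift.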
Population structure analysis and phylogenetic trees
can reflect the genetic diversity, pedigree relationships,
and geographic distances among plant species and/or
varieties [2,16,22]. They can also be used to evaluate
the reliability of molecular markers. To test the reliabil-
ity and practicability of the newly-developed InDel
markers, population structure and phylogenetic
Fig. 7 Detection of caffeine content and genetic variations within caffeine biosynthesis-related genes. a. Detection of caffeine content of the
bud and leaf of both 'Shuchazao' and 'Yunkang 10'. A t-test was employed for significance analysis, and one asterisk represents p < 0.05. Each sample
was tested with three independent biological replicates and two technical replicates. b. The caffeine biosynthesis pathway. IMP, inosine
monophosphate; XMP, xanthosine monophosphate; GMP, guanosine monophosphate; IMPDH, IMP dehydrogenase; GMPS, GMP synthase;
5'-Nase, 5'-nucleotidase; 7-NMT, 7-methylxanthosine synthase; SAM, S-adenosyl-L-methionine; N-MeNase, N-methylnucleotidase; TCS, tea caffeine
synthase. c. SNP and InDel calling in caffeine biosynthesis-related genes
relationship analyses were performed, and the results were
consistent (Fig. 5). The tea cultivars
from southern and southwestern China, which originated from
C. sinensis var. assamica populations, clustered
together. In comparison, most tea cultivars from central
China, which share distinct phenotypes including small
leaves and short trees, were closely related to each
other. These results indicate
that the population structure analysis and phylogenetic
tree reflect the relationships among the 46 tea cultivars,
demonstrating the high reliability of these InDel markers
for genetic analysis.
Genetic variations within catechin/caffeine biosynthesis-
related genes
Catechins and caffeine are among the most important
components of tea plant leaves and strongly affect
the quality and pharmacological value of tea products [9, 41]. It is
well known that the contents of catechins and caffeine
are influenced by genotypic factors, and significant differences
can be observed among distinct tea varieties/
cultivars [31, 42, 43].
Based on HPLC detection, we found that the total catechin
content of 'Yunkang 10' was significantly higher
than that of 'Shuchazao' in both the bud and the second
leaf (Fig. 6a). Evidence has shown that the total catechin
content of tea varieties tends to decline from the
southern to the northern regions [42, 43], and our result
is consistent with this tendency. Because catechins are
important factors for the degree of oxidation, and dark tea
is produced with severe fermentation during processing
[41, 43], our results support the view that most tea
cultivars belonging to Camellia sinensis var. assamica
are more suitable for producing dark tea. To understand
the potential molecular mechanisms, genetic variations
within key genes of the catechin biosynthesis
pathway were compared between the two varieties.
A large number of SNPs and
InDels were identified, some of which were located in
the CDs (Table 2). Combined with the measurements of
catechin constituents, these data should make it possible to select
candidate genetic variations associated with
genotypic differences in catechin content. For instance, a previous
study identified a number of candidate allelic variants related to catechin
traits at the F3'5'H locus, among which
SNP840/848 had the most robust genetic effects
[41].
HPLC detection showed that the caffeine
content of 'Yunkang 10' was significantly lower than
that of 'Shuchazao' (Fig. 7a). Notably, a number of
SNPs and InDels were found within genes associated
with the caffeine biosynthesis pathway (Fig. 7c).
A previous study reported that a 252 bp InDel mutation
in the 5'-UTR of TCS1 plays a crucial role in
caffeine biosynthesis [44]. Thus, our results provide
valuable candidates for identifying variations within
genes related to caffeine biosynthesis. Overall, these
resources can be used for further validation,
such as functional characterization, association analysis,
or development of functional markers for marker-assisted
selection.
Conclusions
Comparison of the whole genome sequences of
'Yunkang 10' and 'Shuchazao' revealed a large number of
genetic variations, including SNPs and InDels, demonstrating
that the tea plant genome is highly variable. The
types of SNPs and InDels were subsequently investigated,
and their distributions and annotations were also
analyzed. Based on these InDel loci, a total of 48 novel
InDel markers with moderate polymorphism and high
stability were developed. Population structure and phylogenetic
relationship analyses based on
these markers revealed that tea cultivars from Camellia
sinensis var. assamica clustered together,
while the other tea cultivars, from Camellia
sinensis var. sinensis, clustered into another group.
Notably, significant differences in catechin
and caffeine content were observed between 'Yunkang 10' and
'Shuchazao', and a number of SNPs and InDels were
identified within genes related to the catechin/caffeine
biosynthesis pathways.
Methods
Plant materials and DNA extraction
A total of 46 clonal tea cultivars were collected from the
main tea-growing regions in China, and we obtained
permission to collect all the tea samples. The details of
these samples, including cultivar name, subspecies,
germplasm type, registration number in China, and cultivation
region, are listed in Additional file 4: Table S1.
Two individuals ('Keke 1' and 'Keke 2') were collected
from the local natural population in Guangdong Province
with the local government's permission; three clonal
tea cultivars ('Liubaoxiye', 'Lingyun 2', and 'Zihong') were
collected from the Tea Germplasm Repository of the
Tea Research Institute of Guangxi Province with permission;
the remaining 41 clonal tea cultivars are commercial
cultivars widely cultivated in China and were
deposited in the Tea Plant Cultivar and Germplasm Resource
Garden in Guohe Town (N31°49', E117°13',
Hefei, China) of our institute (Anhui Agricultural University).
To date, a total of 107 national tea cultivars
(NTCs) and 139 provincial tea cultivars (PTCs) have been
registered in China [45]. In this study, 20 NTCs and 13
PTCs were used (the deposition numbers of NTCs are
included in Additional file 4: Table S1), and the
remaining 13 local tea cultivars (LTCs) were registered
by the corresponding provincial government, while the
subspecies type of four tea cultivars ('Keke 1', 'Keke 2',
'Ziyan', and 'Zixian') was still undetermined.
Young leaves of these tea cultivars were collected,
immediately frozen in liquid nitrogen, and subsequently
stored at −80 °C until further use. Total genomic DNA
was extracted using the EZgene CP Plant Miniprep Kit
(Biomiga, USA) following the manufacturer's protocol.
The quality and quantity of DNA samples were determined
by 1% agarose gel electrophoresis and a NanoDrop
2000 UV-Vis spectrophotometer, respectively. The
concentration of each sample was adjusted to approximately
30 ng/µL for use in the subsequent PCR
amplifications.
Identification of SNPs and InDels by genome-wide
comparison
Because the genome assembly of 'Shuchazao'
is of higher quality than that of 'Yunkang 10' [9,
10], the 'Shuchazao' assembly was chosen
as the reference genome. The clean reads of
'Yunkang 10' were retrieved from the NCBI Sequence Read
Archive under project number PRJNA381277 (only the
reads with a library insert size of 500 bp (~10×)
were used for variation calling).
Subsequently, several steps were applied to identify genetic
variations between the two assemblies: aligning the
clean reads of 'Yunkang 10' to the reference using BWA-MEM
(version 0.7.17) with parameters '-M -R -t 40'; removing
PCR duplicates with Picard; and calling SNPs and
InDels using both the GATK HaplotypeCaller method with
parameter '-stand_call_conf 30' and the combination of
Samtools mpileup with parameters '-ugf -t DP -t SP' and
Bcftools call with parameters '-v -m -O'. The
intersection of the two call sets was then filtered with GATK
according to the following parameters:
QD < 20.0 || ReadPosRankSum < -8.0 || FS > 10.0 || QUAL < $MEANQUAL
(the first filter) and
DP < 50.0 || GQ < 10.0 || QD < 20.0 || FS > 200.0 || SOR > 10.0 ||
MQRankSum < -12.5 || ReadPosRankSum < -8.0 || QUAL < $MEANQUAL,
yielding a high-quality set of variant loci (Additional file
1: Figure S1). Annotation of the remaining variations was
conducted using SnpEff, and variation statistics were computed with
VCFtools. The genes containing SNPs/InDels in CDs were
selected with SnpSift, and their GO term enrichment analysis
was performed using the free online platform OmicShare
tools (http://www.omicshare.com/tools) (Additional file 1:
Figure S1). These software programs have been accurately
and expediently applied in SNP calling from next-generation
sequencing data [46, 47].
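To make the hard-filter expressions above easier to read, the following Python sketch evaluates them on variant records held as plain dictionaries. It is an illustration only, not the GATK command used in this study. It assumes that $MEANQUAL denotes the mean QUAL of the call set and that every annotation is present on every record; the function names and the toy record layout are hypothetical. How the two expressions were combined is not stated in the text, so they are kept as separate predicates.

```python
from statistics import mean

def fails_first_filter(v: dict, mean_qual: float) -> bool:
    """True if the record matches the first filter expression."""
    return (v["QD"] < 20.0
            or v["ReadPosRankSum"] < -8.0
            or v["FS"] > 10.0
            or v["QUAL"] < mean_qual)

def fails_second_filter(v: dict, mean_qual: float) -> bool:
    """True if the record matches the second filter expression."""
    return (v["DP"] < 50.0
            or v["GQ"] < 10.0
            or v["QD"] < 20.0
            or v["FS"] > 200.0
            or v["SOR"] > 10.0
            or v["MQRankSum"] < -12.5
            or v["ReadPosRankSum"] < -8.0
            or v["QUAL"] < mean_qual)

def mean_quality(variants: list) -> float:
    """Assumed meaning of $MEANQUAL: average QUAL over the call set."""
    return mean(v["QUAL"] for v in variants)
```

A record is flagged as soon as any single condition holds, which mirrors the `||` (logical OR) operators in the expressions.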
Validation and development of InDel markers
To develop suitable InDel markers for genetic research,
InDels with lengths between 5 and 20 bp were used as candidate
loci. Specific primers were designed based on the sequences
flanking the InDel loci with the Primer 3.0
program using the following parameters: amplicon
length 150–350 bp; primer length 20–22 bp, with the
optimum length being 20 bp; Tm 50–60 °C, with 55 °C
being the optimum; GC content 40–60%, with 50%
being the optimum.
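The selection ranges listed above can be expressed as simple range checks. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, and the melting temperature constraint is deliberately left out because it depends on the thermodynamic model used by the primer design program.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a primer sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def candidate_indel(length_bp: int) -> bool:
    """InDels of 5-20 bp were taken as candidate loci."""
    return 5 <= length_bp <= 20

def primer_ok(seq: str) -> bool:
    """Length 20-22 bp and GC content 40-60% (Tm is not checked here)."""
    return 20 <= len(seq) <= 22 and 40.0 <= gc_percent(seq) <= 60.0

def amplicon_ok(size_bp: int) -> bool:
    """Expected amplicon length of 150-350 bp."""
    return 150 <= size_bp <= 350
```

A 20-mer with ten G/C bases, for instance, satisfies both the length and the GC criteria.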
A total of 100 primer pairs were randomly selected
and preliminarily screened on six tea cultivars ('Guyuxiang',
'Longjing 43', 'Echa 5', 'Guilv', 'Yungui', and
'Fudingdabaicha') using the Fragment Analyzer 96 (Advanced
Analytical Technologies, Inc., Ames, IA). Primers
that gave polymorphic and unambiguous bands were
further screened against the 46 tea cultivars.
PCR reagents and amplification
conditions followed our previous
study [2]. If a marker amplified more than two fragments
in an individual, only two
fragments were retained based on the following criteria:
a higher peak value, a higher concentration
of amplified product, and a higher frequency of
occurrence among the other individuals.
Genetic diversity analysis
The PROSize 2.0 software included in the Fragment Analyzer
96 system was used to visually select strong and clearly
polymorphic DNA fragments for scoring, with the same
strategy as described previously [8]. The values of expected
heterozygosity (He) and observed heterozygosity
(Ho) were determined with Popgene 32 software.
The number of alleles (Na), major allele frequency (MAF),
and polymorphism information content (PIC) were calculated
using PowerMarker 3.25 [48]. Based on the PIC
value, markers were divided into three types: highly
informative (PIC > 0.5), moderately informative (0.25 < PIC <
0.5), and slightly informative (PIC < 0.25) [19].
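For readers who want to reproduce the PIC classification above, the sketch below computes PIC from allele frequencies using the commonly cited definition, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, and then assigns the three classes. It is a minimal Python illustration, not the PowerMarker implementation; the function names are hypothetical, and the handling of values exactly equal to 0.25 or 0.5 is an assumption, since the text does not specify it.

```python
def pic(freqs: list) -> float:
    """Polymorphism information content from allele frequencies at one locus."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    sq = [p * p for p in freqs]
    homozygosity = sum(sq)
    # Cross term over all unordered allele pairs (i < j).
    cross = sum(2.0 * sq[i] * sq[j]
                for i in range(len(sq))
                for j in range(i + 1, len(sq)))
    return 1.0 - homozygosity - cross

def pic_class(value: float) -> str:
    """Three-way classification; boundary handling is an assumption."""
    if value > 0.5:
        return "highly informative"
    if value > 0.25:
        return "moderately informative"
    return "slightly informative"
```

Under this definition, a locus with two equally frequent alleles has PIC = 0.375 and is classed as moderately informative, which is the largest value a strictly bi-allelic locus can reach.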
Population structure analysis
Genetic structure analysis of the tea accessions was
performed using the Structure 2.3.4 program [49]. The
model-based Bayesian clustering algorithm was employed
to assign individuals to a predetermined number of groups
(K, the number of inferred populations) so as to
minimize Hardy-Weinberg and linkage disequilibrium
within each group. Ten independent runs were performed for
each K ranging from 2 to 9, each with 10,000
iterations for estimation after a burn-in period of 10,000
iterations [19]. Estimation of the subgroups
and the best K value was performed according to
a previous study [50].
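The study cited for choosing the best K [50] describes the delta-K statistic, which relates the second-order change in the log-likelihood L(K) to its standard deviation across replicate runs. The Python sketch below shows one common way to compute it from per-run log-likelihoods. Using the per-K mean of L(K) inside the absolute value and the sample standard deviation are assumptions of this sketch, and the function name and input layout are hypothetical.

```python
from statistics import mean, stdev

def delta_k(lnp_by_k: dict) -> dict:
    """Delta-K for every K that has both K-1 and K+1 available.

    `lnp_by_k` maps K to the list of log-likelihoods from replicate runs.
    """
    means = {k: mean(runs) for k, runs in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2.0 * means[k] + means[k - 1])
            # Raises ZeroDivisionError if all runs at this K are identical.
            result[k] = second_diff / stdev(lnp_by_k[k])
    return result
```

With runs for K = 2 to 9, delta-K is defined only for K = 3 to 8, and the K with the largest value is usually taken as the most likely number of clusters.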
Phylogenetic analysis
Nei's genetic distances of the 46 tea cultivars based on
48 InDel markers were calculated using PowerMarker
3.25. The dendrogram was constructed using the
neighbor-joining (NJ) algorithm as implemented in
MEGA 7.0 [51], with bootstrap values at the default set-
ting of 1000 replicates. Pairwise gap deletion mode was
employed to guarantee that the divergent domains could
contribute to the topology of the tree [52].
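The text does not state which of Nei's distance measures was selected. As an illustration only, the Python sketch below computes Nei's standard genetic distance, D = -ln(Jxy / sqrt(Jx * Jy)), between two samples described by per-locus allele frequencies. The choice of this particular measure, the function name, and the input layout (one dictionary of allele frequencies per locus, keyed by fragment size) are all assumptions of the sketch.

```python
import math

def nei_standard_distance(x: list, y: list) -> float:
    """Nei's standard genetic distance between two samples.

    `x` and `y` are lists with one dict per locus mapping allele -> frequency.
    Sums over loci are used directly, since the 1/L factors cancel.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(x, y):
        alleles = set(fx) | set(fy)
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)
```

Two samples with identical frequencies at every locus have a distance of zero, and the distance grows as fewer alleles are shared. A pairwise matrix of such distances is the kind of input a neighbor-joining tree is built from.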
Detection of catechin content using HPLC
Catechins and caffeine were extracted
and quantified according to a previous study [53]. All
samples were analyzed with three independent biological
replicates, and each independent sample was examined
with two technical replicates. The contents of (+)-Gallocatechin
(GC), (+)-Gallocatechin gallate (GCG), (−)-Epicatechin
(EC), (−)-Epicatechin gallate (ECG),
(−)-Epigallocatechin gallate (EGCG), and caffeine were
determined. The catechin biosynthesis pathways were
established according to previous studies [41, 54–57].
The numbers of SNPs/InDels within the catechin/caffeine
biosynthesis-related genes were also determined based on
the results of alignment and functional annotation.
Supplementary information
Supplementary information accompanies this paper at
https://doi.org/10.1186/s12864-019-6347-0.
Additional file 1: Figure S1. Flowchart diagram for identifying
genome-wide genetic variations between 'Shuchazao' and 'Yunkang 10'
and functional annotation.
Additional file 2: Figure S2. Functional categorization of the genes
containing genetic variations within the CDs region. a Functional
annotation of genes containing SNPs within the CDs region. b
Functional annotation of genes containing InDels within the CDs
region. These genes were categorized based on GO annotation, and the
number of each category is shown based on biological process, cellular
component and molecular function.
Additional file 3: Figure S3. Exhibition of transferability and
polymorphism detected by the remaining 30 InDel markers among 46
tea cultivars.
Additional file 4: Table S1. Detailed information for the 46 tea cultivars
used in this study.
Additional file 5: Table S2. Primer sequences of 48 newly developed
InDel markers.
Abbreviations
AFLPs: Amplified fragment length polymorphisms; CAPS: Cleaved amplified
polymorphic sequence; EST: Expressed sequence tag; He: Expected
heterozygosity; Ho: Observed heterozygosity; InDels: Insertions/Deletions;
ISSRs: Inter-simple sequence repeats; MAF: Major allele frequency;
Na: Number of alleles; PIC: Polymorphism information content; RAD-
seq: Restriction site-associated DNA sequencing; RAPDs: Random
amplification of polymorphic DNAs; RFLPs: Restriction fragment length
polymorphisms; SLAF-seq: Specific locus amplified fragment sequencing;
SNPs: Single nucleotide polymorphisms; SSRs: Simple sequence repeats
Acknowledgments
The authors thank the other members of our groups for technical assistance
and appreciate the anonymous reviewers for constructive comments on this
manuscript.
Authors' contributions
SRL performed data analysis and manuscript drafting. YLA conducted DNA
extraction, primer design, PCR amplification, and InDel marker validation. WT
was involved in the identification and analysis of variation loci. XJQ and LS
were involved in sample collection and data analysis. XBX and RG were
involved in DNA extraction and PCR amplification. CLW conceived and
designed the research. All authors read and approved the final manuscript.
Funding
This work was financially supported by the Key R&D Program of China
(2018YFD1000601), the Anhui Provincial Natural Science Foundation
(1808085QC92), the China Postdoctoral Science Foundation (2017M621991),
the Natural Science Foundation of Anhui Provincial Department of Education
(KJ2018A0131), and the National Natural Science Foundation of China
(31800585). The funding bodies had no role in the design of the study,
collection, analysis, and interpretation of data, and in writing the manuscript.
Availability of data and materials
Most of the important data generated or analyzed during this study are
included in the article and its supplementary information files. The other
data and materials associated with the current study are available from the
corresponding author on reasonable request.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Author details
1 State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, 130 Changjiang West Road, Hefei, China.
2 Guangxi LuYI Institute of Tea Tree Species, 17 Jinji Road, Guilin, China.
3 Department of Biotechnology, Russian Research Institute of Floriculture and Subtropical Crops, Sochi, Russia.
Received: 25 August 2019 Accepted: 28 November 2019
References
1. Yang CS, Wang X, Lu G, Picinich SC. Cancer prevention by tea: animal
studies, molecular mechanisms and human relevance. Nat Rev Cancer. 2009;
9(6):429–39.
2. Liu SR, Liu HW, Wu AL, Hou Y, An YL, Wei CL. Construction of fingerprinting
for tea plant (Camellia sinensis) accessions using new genomic SSR markers.
Mol Breed. 2017;37:93.
3. Zhang XC, Wu HH, Chen LM, Liu LL, Wan XC. Maintenance of mesophyll
potassium and regulation of plasma membrane H+-ATPase are associated
with physiological responses of tea plants to drought and subsequent
rehydration. Crop J. 2018;6:611–20.
4. Hashimoto M, Takasi S. Morphological studies on the origin of the tea plant
V, a proposal of one place of origin by cluster analysis. Jpn J Trop Agric.
1978;21:93–101.
5. Chen L, Yu FL, Tong QQ. Discussions on phylogenetic classification and
evolution of section Thea. J Tea Sci. 2000;20:89–94.
6. Chang HT. Thea - a section of beveragial tea trees of the genus Camellia.
Acta Sci Nat Univ Sunyats. 1981;1:87–99.
7. Yu FL. Discussion on the originating place and the originating center of tea
plants. J Tea Sci. 1986;6(1):1–8.
8. Liu SR, An YL, Li FD, Li SJ, Liu LL, Zhou QY, Zhao SQ, Wei CL. Genome-wide
identification of simple sequence repeats and development of polymorphic
SSR markers for genetic studies in tea plant (Camellia sinensis). Mol Breed.
2018;38:59.
9. Wei C, Yang H, Wang S, Zhao J, Liu C, Gao L, Xia E, Lu Y, Tai Y, She G, et al.
Draft genome sequence of Camellia sinensis var. sinensis provides insights
into the evolution of the tea genome and tea quality. Proc Natl Acad Sci U
S A. 2018;115(18):E4151–8.
10. Xia EH, Zhang HB, Sheng J, Li K, Zhang QJ, Kim C, Zhang Y, Liu Y, Zhu T, Li W,
et al. The tea tree genome provides insights into tea flavor and independent
evolution of caffeine biosynthesis. Mol Plant. 2017;10(6):866–77.
11. Chen L, Gao QK, Chen DM, Xu CJ. The use of RAPD markers for detecting
genetic diversity, relationship and molecular identification of Chinese elite
tea genetic resources [Camellia sinensis (L.) O. Kuntze] preserved in tea
germplasm repository. Biodivers Conserv. 2005;14(6):1433–44.
12. Ni S, Yao MZ, Chen L, Zhao LP, Wang XC. Germplasm and breeding
research of tea plant based on DNA marker approaches. Front Agric China.
2008;2(2):200–7.
13. Mukhopadhyay M, Mondal TK, Chand PK. Biotechnological advances in tea
(Camellia sinensis [L.] O. Kuntze): a review. Plant Cell Rep. 2016;35:255–87.
14. Hu YY, Mao BG, Peng Y, Sun YD, Pan YL, Xia YM, Sheng XB, Li YK, Tang L,
Yuan LP, et al. Deep re-sequencing of a widely used maintainer line of
hybrid rice for discovery of DNA polymorphisms and evaluation of genetic
diversity. Mol Gen Genomics. 2014;289(3):303–15.
15. Garrido-Cardenas JA, Mesa-Valle C, Manzano-Agugliaro F. Trends in plant
research using molecular markers. Planta. 2018;247(3):543–57.
16. Villano C, Esposito S, Carucci F, Iorizzo M, Frusciante L, Carputo D, Aversano
R. High-throughput genotyping in onion reveals structure of genetic
diversity and informative SNPs useful for molecular breeding. Mol Breed.
2019;39:5.
17. Liu B, Wang Y, Zhai W, Deng J, Wang H, Cui Y, Cheng F, Wang XW, Wu J.
Development of InDel markers for Brassica rapa based on whole-genome
re-sequencing. Theor Appl Genet. 2012;126:231–9.
18. Thakur O, Randhawa GS. Identification and characterization of SSR, SNP and
InDel molecular markers from RNA-Seq data of guar (Cyamopsis
tetragonoloba, L. Taub.) roots. BMC Genomics. 2018;19(1):951.
19. Zhang TF, Gu MF, Liu YH, Lv YD, Zhou L, Lu HY, Liang SQ, Bao HB, Zhao H.
Development of novel InDel markers and genetic diversity in Chenopodium
quinoa through whole-genome re-sequencing. BMC Genomics. 2017;18:685.
20. Belaj A, de la Rosa R, Lorite IJ, Mariotti R, Cultrera NGM, Beuzón CR,
González-Plaza JJ, Muñoz-Mérida A, Trelles O, Baldoni L. Usefulness of a new
large set of high throughput EST-SNP markers as a tool for olive Germplasm
collection management. Front Plant Sci. 2018;9:1320.
21. Zhang N, Zhang H, Ren Y, Chen L, Zhang J, Zhang L. Genetic analysis and
gene mapping of the orange flower trait in Chinese cabbage (Brassica rapa
L.). Mol Breed. 2019;39:76.
22. Sarkar D, Kundu A, Das D, Chakraborty A, Mandal NA, Satya P, Karmakar PG,
Kar CS, Mitra J, Singh NK. Resolving population structure and genetic
differentiation associated with RAD-SNP loci under selection in tossa jute
(Corchorus olitorius L.). Mol Genet Genomics. 2019;294:479–92.
23. Ujihara T, Taniguchi F, Tanaka J, Hayashi N. Development of expressed
sequence tag (EST)-based cleaved amplified polymorphic sequence (CAPS)
markers of tea plant and their application to cultivar identification. J Agric
Food Chem. 2011;59:1557–64.
24. Zhang CC, Wang LY, Wei K, Cheng H. Development and characterization of
single nucleotide polymorphism markers in Camellia sinensis (Theaceae).
Genet Mol Res. 2014;13(3):5822–31.
25. Fang WP, Meinhardt LW, Tan HW, Zhou L, Mischke S, Zhang D. Varietal
identification of tea (Camellia sinensis) using nanofluidic array of single
nucleotide polymorphism (SNP) markers. Hortic Res. 2014;1:14035.
26. Ma JQ, Huang L, Ma CL, Jin JQ, Li CF, Wang RK, Zheng HK, Yao MZ, Chen L.
Large-scale SNP discovery and genotyping for constructing a high-density
genetic map of tea plant using specific-locus amplified fragment
sequencing (SLAF-seq). PLoS One. 2015;10(6):e0128798.
27. Yang H, Wei C-L, Liu H-W, Wu J-L, Li Z-G, Zhang L, Jian J-B, Li Y-Y, Tai Y-L,
Zhang J, et al. Genetic divergence between Camellia sinensis and its wild
relatives revealed via genome-wide SNPs from RAD sequencing. PLoS One.
2016;11(3):e0151424.
28. Deng W-W, Han J, Fan Y, Tai Y, Zhu B, Lu M, Wang R, Wan X, Zhang Z-Z.
Uncovering tea-specific secondary metabolism using transcriptomic and
metabolomic analyses in grafts of Camellia sinensis and C. oleifera. Tree
Genet Genomes. 2018;14:23.
29. Guo Y, Zhu C, Zhao S, Zhang S, Wang W, Fu H, Li X, Zhou C, Chen L, Lin Y,
et al. De novo transcriptome and phytochemical analyses reveal
differentially expressed genes and characteristic secondary metabolites in
the original oolong tea (Camellia sinensis) cultivar 'Tieguanyin' compared
with cultivar 'Benshan'. BMC Genomics. 2019;20(1):265.
30. Han J, Lu M, Zhu B, Wang R, Wan X, Deng W-W, Zhang Z-Z. Integrated
transcriptomic and phytochemical analyses provide insights into
characteristic metabolites variation in leaves of 1-year-old grafted tea
(Camellia sinensis). Tree Genet Genomes. 2019;15:58.
31. Zhu B, Chen LB, Lu M, Zhang J, Han J, Deng WW, Zhang ZZ. Caffeine
content and related gene expression: novel insight into caffeine
metabolism in Camellia plants containing low, normal, and high caffeine
concentrations. J Agric Food Chem. 2019;67(12):3400–11.
32. Jander G, Norris SR, Rounsley SD, Bush DF, Levin IM, Last RL. Arabidopsis map-based
cloning in the post-genome era. Plant Physiol. 2002;129(2):440–50.
33. Ramakrishna G, Kaur P, Nigam D, Chaduvula PK, Yadav S, Talukdar A, Singh
NK, Gaikwad K. Genome-wide identification and characterization of InDels
and SNPs in Glycine max and Glycine soja for contrasting seed permeability
traits. BMC Plant Biol. 2018;18(1):141.
34. Bai H, Cao Y, Quan J, Dong L, Li Z, Zhu Y, Zhu L, Dong Z, Li D. Identifying
the genome-wide sequence variations and developing new molecular
markers for genetics research by re-sequencing a landrace cultivar of foxtail
millet. PLoS One. 2013;8(9):e73514.
35. Zhang JZ, Liu SR, Hu CG. Identifying the genome-wide genetic variation
between precocious trifoliate orange and its wild type and developing new
markers for genetics research. DNA Res. 2016;23(4):403–14.
36. García-Lor A, Luro F, Navarro L, Ollitrault P. Comparative use of InDel and
SSR markers in deciphering the interspecific structure of cultivated citrus
genetic diversity: a perspective for genetic association studies. Mol Gen
Genomics. 2012;287(1):77–94.
37. Steri M, Idda ML, Whalen MB, Orru V. Genetic variants in mRNA untranslated
regions. Wiley Interdiscip Rev RNA. 2018;9(4):e1474.
38. Yao MZ, Ma CL, Qiao TT, Jin JQ, Chen L. Diversity distribution and
population structure of tea germplasms in China revealed by EST-SSR
markers. Tree Genet Genomes. 2012;8:205–20.
39. Tan LQ, Peng M, Xu LY, Wang LY, Chen SX, Zou Y, Qi GN, Cheng H.
Fingerprinting 128 Chinese clonal tea cultivars using SSR markers provides new
insights into their pedigree relationships. Tree Genet Genomes. 2015;11:90.
40. Liu TJ, Li YP, Zhou JJ, Hu CG, Zhang JZ. Genome-wide genetic variation and
comparison of fruit-associated traits between kumquat (Citrus japonica) and
Clementine mandarin (Citrus clementina). Plant Mol Biol. 2018;96(4–5):493–507.
41. Jin JQ, Ma JQ, Yao MZ, Ma CL, Chen L. Functional natural allelic variants of
flavonoid 3',5'-hydroxylase gene governing catechin traits in tea plant and
its relatives. Planta. 2017;245(3):523–38.
42. Chen L, Zhou ZX. Variations of main quality components of tea genetic
resources [Camellia sinensis (L.) O. Kuntze] preserved in the China national
germplasm tea repository. Plant Foods Hum Nutr. 2005;60:31–5.
43. Jin JQ, Ma JQ, Ma CL, Yao MZ, Chen L. Determination of catechin content in
representative Chinese tea germplasms. J Agric Food Chem. 2014;62:9436–41.
44. Jin JQ, Yao MZ, Ma CL, Ma JQ, Chen L. Natural allelic variations of TCS1 play
a crucial role in caffeine biosynthesis of tea plant and its related species.
Plant Physiol Biochem. 2016;100:18–26.
45. Yang YJ, Liang YR. Clonal tea cultivars in China. Shanghai: Shanghai
Scientific and Technical Publishers; 2014.
46. Wright B, Farquharson KA, McLennan EA, Belov K, Hogg CJ, Grueber CE.
From reference genomes to population genomics: comparing three
reference-aligned reduced-representation sequencing pipelines in two
wildlife species. BMC Genomics. 2019;20:453.
47. Zhao Y, Wang K, Wang WL, Yin TT, Dong WQ, Xu CJ. A high-throughput
SNP discovery strategy for RNA-seq data. BMC Genomics. 2019;20:160.
48. Liu K, Muse SV. PowerMarker: an integrated analysis environment for
genetic marker analysis. Bioinformatics. 2005;21:2128–9.
49. Pritchard JK, Stephens M, Donnelly P. Inference of population structure
using multilocus genotype data. Genetics. 2000;155(2):945–59.
50. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals
using the software structure: a simulation study. Mol Ecol. 2005;14(8):2611–20.
51. Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics
analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33:1870–4.
52. Liu SR, Khan MRG, Li YP, Zhang JZ, Hu CG. Comprehensive analysis of
CCCH-type zinc finger gene family in citrus (Clementine mandarin) by
genome-wide characterization. Mol Gen Genomics. 2014;289:855–72.
53. Liu S, Mi X, Zhang R, An Y, Zhou Q, Yang T, Xia X, Guo R, Wang X, Wei C.
Integrated analysis of miRNAs and their targets reveals that miR319c/TCP2
regulates apical bud burst in tea plant (Camellia sinensis). Planta. 2019;250:1111–29.
54. Shi CY, Yang H, Wei CL, Yu O, Zhang ZZ, Jiang CJ, Sun J, Li YY, Chen Q, Xia
T, et al. Deep sequencing of the Camellia sinensis transcriptome revealed
candidate genes for major metabolic pathways of tea-specific compounds.
BMC Genomics. 2011;12:131.
55. Wu ZJ, Li XH, Liu ZW, Xu ZS, Zhuang J. De novo assembly and
transcriptome characterization: novel insights into catechins biosynthesis in
Camellia sinensis. BMC Plant Biol. 2014;14:277.
56. Li CF, Zhu Y, Yu Y, Zhao QY, Wang SJ, Wang XC, Yao MZ, Luo D, Li X, Chen
L, et al. Global transcriptome and gene regulation network for secondary
metabolite biosynthesis of tea plant (Camellia sinensis). BMC Genomics.
2015;16:560.
57. Wang YS, Xu YJ, Gao LP, Yu O, Wang XZ, He XJ, Jiang XL, Liu YJ, Xia T.
Functional analysis of flavonoid 3′,5′-hydroxylase from tea plant (Camellia
sinensis): critical role in the accumulation of catechins. BMC Plant Biol. 2014;
14:347.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
... Consistently, the maximum numbers of InDels were identified as single-nucleotide InDels. This trend, whereby the highest ratio of single-and binucleotide InDels has been noted in crops like maize (Batley et al., 2003), Brassica rapa , tea (Liu et al., 2019), and chickpea (Jain et al., 2019), underscores the fundamental genetic mechanisms that favor the formation of smaller InDels. These findings align with known patterns of the formation of small InDels across various plant genomes, where single nucleotide polymorphisms often occur more frequently than more extensive mutations due to the simplicity of their mutational processes. ...
... This is particularly pertinent in peanuts, where approximately 80.9% of the identified InDels are located in intergenic areas, underscoring their potential beyond mere structural genomic elements. Moreover, the observed coding sequence (CDS) regions accounted for only 9.5% of the InDels, attributed to the higher degree of conservation in these regions than others within the genome (Liu et al., 2019). This conservation is crucial as InDels within CDS regions can profoundly impact protein structure and function, often more significantly than single base alterations. ...
Article
Full-text available
The advent of next-generation sequencing technologies, and particularly double digest restriction site-associated DNA sequencing (ddRADSeq), has significantly advanced the development of molecular markers for crop genetics. This study used ddRADSeq to identify and develop insertion–deletion (InDel) markers in 25 peanut genotypes from diverse geographical regions. The bioinformatic analysis revealed 62,728 InDels across the peanut genome, predominantly between 1 and 5 bp, which constituted 96% of the total, while InDels of ≥6 bp accounted for 3.96%. We focused on 1013 InDels of at least 10 bp for further analysis, representing 1.61% of the total reads, with a distribution of 832 insertions and 181 deletions. Of those, 21 InDels were selected for primer design and successfully amplified to produce markers within the range of 150–400 bp. Approximately 9.5% of the InDels were located in coding sequences, enhancing their potential utility in genomics-led breeding. These markers’ polymorphic information content varied from 0 to 0.371, demonstrating substantial genetic diversity with an average value of 0.163. These findings confirm the effectiveness of ddRADSeq for InDel marker development in peanuts, illustrating its potential to enhance marker-assisted breeding programs by providing robust tools for assessing genetic diversity.
... As one of the important sources of variation, INDELs have been increasingly reported in various studies [33][34][35][36]. However, there has been limited attention to the distribution patterns of INDELs on the genome [37,38]. For example, while multiple types of variation have been explored in rice, comparative studies on the variation patterns of INDELs, particularly in relation to other types of variation (such as PAVs, Presence/Absence Variations), remain scarce. ...
Article
Full-text available
Background Rice, as one of the most important staple crops, its genetic improvement plays a crucial role in agricultural production and food security. Although extensive research has utilized single nucleotide polymorphisms (SNPs) data to explore the genetic basis of important agronomic traits in rice improvement, reports on the role of other types of variations, such as insertions and deletions (INDELs), are still limited. Results In this study, we extracted INDELs from resequencing data of 148 rice improved varieties. We identified 938,585 INDELs and found that as the length of the variation increases, the number of variations decreases, with 89.0% of INDELs being 2–10 bp. The highest number of INDELs was found on chromosome 1, while the least was on chromosome 10. INDELs were unevenly distributed across the genome, generating a total of 33 hotspot regions. 47.0% of INDELs were located within 2 kb upstream and downstream of genes. Using phenotypic data from five agronomic traits (heading date, flag leaf length, flag leaf width, panicle number, and plant height) along with INDEL data to perform genome-wide association study (GWAS), we identified 6,331 significant loci involving 157 cloned genes. Haplotype analysis of candidate genes revealed INDELs affecting important functional genes, such as OsMED25 and OsRRMh related to heading date, and MOC2 related to plant height. Conclusions Our work analyzed the variation patterns of INDELs in rice improvement and identified INDELs associated with agronomic traits. These results will provide valuable genetic and material resources for the genetic improvement of rice.
... Possible reasons for this include the short-read nature of NGS, which is false positive when aligned with the reference genome, and the bioinformatic tools used for variation mining such as SAMtools [22] and GATK [42] are not tailored for indel discovery, where the algorithms are restricted to identify indels less than 10 bp [32]. Notably, long indels (>15 bp) and structural variations can be more effectively detected through high-coverage whole-genome resequencing or pan-genome sequencing to facilitate the fine mapping of loci for traits of interest, as evident in another perennial tree plant tea (Camellia sinensis (L.) Kuntze, [43]). However, it still requires high costs and calls for international research collaboration to achieve this in Leucaena. ...
Article
Full-text available
Leucaena is a versatile legume shrub/tree used as tropical livestock forage and in timber industries, but it is considered a high environmental weed risk due to its prolific seed production and broad environmental adaptation. Interspecific crossings between Leucaena species have been used to create non-flowering or sterile triploids that can display reduced weediness and other desirable traits for broad use in forest and agricultural settings. However, assessing the success of the hybridisation process before evaluating the sterility of putative hybrids in the target environment is advisable. Here, RNA sequencing was used to develop breeding markers for hybrid parental identification in Leucaena. RNA-seq was carried out on 20 diploid and one tetraploid Leucaena taxa, and transcriptome-wide unique genetic variants were identified relative to a L. trichandra draft genome. Over 16 million single-nucleotide polymorphisms (SNPs) and 0.8 million insertions and deletions (indels) were mapped. These sequence variations can differentiate all species of Leucaena from one another, and a core set of about 75,000 variants can be genetically mapped and transformed into genotyping arrays/chips for the conduction of population genetics, diversity assessment, and genome-wide association studies in Leucaena. For genetic fingerprinting, more than 1500 variants with even allele frequencies (0.4–0.6) among all species were filtered out for marker development and testing in planta. Notably, SNPs were preferable for future testing as they were more accurate and displayed higher transferability within the genus than indels. Hybridity testing of ca. 3300 putative progenies using SNP markers was also more reliable and highly consistent with the field observations. The developed markers pave the way for rapid, accurate, and cost-effective diversity assessments, variety identification and breeding selection in Leucaena.
... Furthermore, the availability of genome-wide molecular data for black pepper has remained limited until recently [68,69]. Although SNP-based analyses of genetic variation and population structure have been reported in several crop species such as rice [70], maize [71], sorghum [72], tomato [73], and tea [74], this is the first report on the utilization of SNP markers for genetic diversity studies in black pepper. ...
Article
Full-text available
Despite the economic importance of Piper nigrum (black pepper), a highly valued crop worldwide, development and utilization of genomic resources have remained limited, with diversity assessments often relying on only a few samples or DNA markers. Here we employed restriction-site associated DNA sequencing to analyze 175 P. nigrum accessions from eight main black pepper growing regions in Sri Lanka. The sequencing effort resulted in 1,976 million raw reads, averaging 11.3 million reads per accession, revealing 150,356 high-quality single nucleotide polymorphisms (SNPs) distributed across 26 chromosomes. Population structure analysis revealed two subpopulations (K = 2): a dominant group consisting of 152 accessions sourced from both home gardens and large-scale cultivations, and a smaller group comprising 23 accessions exclusively from native collections in home gardens. This clustering was further supported by principal component analysis, with the first two principal components explaining 35.2 and 12.1% of the total variation. Genetic diversity analysis indicated substantial gene flow (Nm = 342.21) and a low fixation index (FST = 0.00073) between the two subpopulations, with no clear genetic differentiation among accessions from different agro-climatic regions. These findings demonstrate that most current black pepper genotypes grown in Sri Lanka share a common genetic background, emphasizing the necessity to broaden the genetic base to enhance resilience to biotic and abiotic stresses. This study represents the first attempt at analyzing black pepper genetic diversity using high-resolution SNP markers, laying the foundation for future genome-wide association studies for SNP-based gene discovery and breeding.
... Scientists used TALENs to target the promoter region of the HvPaphya phytase gene in barley (Haq et al. 2022). Multiple INDELs (insertions and deletions) were found in 16-31% of the stable altered plants (Liu et al. 2019). In soybean, sitedirected mutation was performed using TALENs. ...
Article
Full-text available
Sugarcane (Saccharum officinarum) has gained more attention worldwide in recent decades because of its importance as a bioenergy resource and in producing table sugar. However, the production capabilities of conventional varieties are being challenged by the changing climates, which struggle to meet the escalating demands of the growing global population. Genome editing has emerged as a pivotal field that offers groundbreaking solutions in agriculture and beyond. It includes inserting, removing or replacing DNA in an organism's genome. Various approaches are employed to enhance crop yields and resilience in harsh climates. These techniques include zinc finger nuclease (ZFN), transcription activator-like effector nuclease (TALEN) and clustered regularly interspaced short palindromic repeats/associated protein (CRISPR/Cas). Among these, CRISPR/Cas is one of the most promising and rapidly advancing fields. With the help of these techniques, several crops like rice (Oryza sativa), tomato (Solanum lycopersicum), maize (Zea mays), barley (Hordeum vulgare) and sugarcane have been improved to be resistant to viral diseases. This review describes recent advances in genome editing with a particular focus on sugarcane and focuses on the advantages and limitations of these approaches while also considering the regulatory and ethical implications across different countries. It also offers insights into future prospects and the application of these approaches in agriculture.
... The clean reads were aligned to the 'Shuchazao' reference genome using the Burrows-Wheeler Aligner (BWA) procedure with default parameters, generating SAM files. The mapping results were sorted in SAM tools and duplicates were marked using functions implemented in Picar [22,23]. Uniquely mapped reads were utilized for SNP calling. ...
Article
Full-text available
The highly unique zigzag-shaped stem phenotype in tea plants boasts significant ornamental value and is exceptionally rare. To investigate the genetic mechanism behind this trait, we developed BC1 artificial hybrid populations. Our genetic analysis revealed the zigzag-shaped trait as a qualitative trait. Utilizing whole-genome resequencing, we constructed a high-density genetic map from the BC1 population, incorporating 5,250 SNP markers across 15 linkage groups, covering 3,328.51 cM with an average marker interval distance of 0.68 cM. A quantitative trait locus (QTL) for the zigzag-shaped trait was identified on chromosome 4, within a 61.2 to 97.2 Mb range, accounting for a phenotypic variation explained (PVE) value of 13.62%. Within this QTL, six candidate genes were pinpointed. To better understand their roles, we analyzed gene expression in various tissues and individuals with erect and zigzag-shaped stems. The results implicated CsXTH (CSS0035625) and CsCIPK14 (CSS0044366) as potential key contributors to the zigzag-shaped stem formation. These discoveries lay a robust foundation for future functional genetic mapping and tea plant genetic enhancement. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-024-05082-9.
Article
Full-text available
Background Breeding programs for nutrient-efficient tea plant varieties could be advanced by the combination of genotyping and phenotyping technologies. This study was aimed to search functional SNPs in key genes related to the nitrogen-assimilation in the collection of tea plant Camellia sinensis (L.) Kuntze. In addition, the objective of this study was to reveal efficient vegetation indices for phenotyping of nitrogen deficiency response in tea collection. Methods The study was conducted on the tea plant collection of Camellia sinensis (L.) Kuntze of Western Caucasus grown without nitrogen fertilizers. Phenotypic data was collected by measuring the spectral reflectance of leaves in the 350–1100 nm range calculated as vegetation indices by the portable hyperspectral spectrometer Ci710s. Single nucleotide polymorphisms were identified in 30 key genes related to nitrogen assimilation and tea quality. For this, pooled amplicon sequencing, SNPs annotation and effect prediction with SnpEFF tool were used. Further, a linear regression model was applied to reveal associations between the functional SNPs and the efficient vegetation indices. Results PCA and regression analysis revealed significant vegetation indices with high R2 values (more than 0.5) and the most reliable indices to select ND-tolerant genotypes were established: ZMI, CNDVI, RENDVI, VREI1, GM2, GM1, PRI, and Ctr2, VREI3, VREI2. The largest SNPs frequency was observed in several genes, namely F3’5’Hb , UFGTa , UFGTb , 4Cl , and AMT1.2 . SNPs in NRT2.4 , PIP , AlaDC , DFRa , and GS1.2 were inherent in ND-susceptible genotypes. Additionally, SNPs in AlaAT1 , MYB4 , and WRKY57 , were led to alterations in protein structure and were observed in ND-susceptible tea genotypes. Associations were revealed between flavanol reflectance index (FRI) and SNPs in ASNb and PIP , that change the amino acids. In addition, two SNPs in 4Cl were associated with water band index (WBI). 
Conclusions The results will be useful to identify tolerant and susceptible tea genotypes under nitrogen deficiency. Revealed missense SNPs and associations with vegetation indices improve our understanding of nitrogen effect on tea quality. The findings in our study would provide new insights into the genetic basis of tea quality variation under the N-deficiency and facilitate the identification of elite genes to enhance tea quality.
Chapter
Tea is a popular beverage with a high nutritional and economic value. Tea plants are diploid and have 15 pairs of homologous chromosomes, containing the entire genetic information of its genome. Despite the past publication of the data for a number of tea genomes, it is still difficult to identify novel germplasm resources and breeding new cultivars in an effective manner. Here, we present a 200 K tea single nucleotide polymorphism (SNP) array to improve the efficiency of tea plant genotyping. The re-sequencing data in our previous study was used for SNP identification of the array, and a total of 179,970 unique and informative SNPs were obtained in the SNP array. A set of 142 tea cultivars from different provinces of China and an F1 population of 'Longjing 43 (LJ43)' and 'Baihaozao (BHZ)' with 327 individuals were genotyped. Our SNP array data showed that tea plants are domesticated polyphyletically, with Southwest China as the domestication center of the Camellia sinensis var. assamica (CSA) population and the East of China as the domestication center of the C. sinensis var. sinensis (CSS) population. These results indicate the importance of SNP array for forward genetic research in tea plants.
Chapter
Tea genetic resources are the sum total of hereditary material, which includes all the alleles of various genes, present in tea plant and its wild relatives. Tea genetic resources consist of modern cultivars, landraces, advanced breeding materials, and wild relatives. These resources offer a wide range of traits, including yield potential, flavor characteristics, resistance to pests and diseases, and adaptability to various growing conditions. By exploring and utilizing tea genetic resources, breeders can develop new and improved tea cultivars with desired traits, ultimately enhancing agricultural productivity and the overall quality of tea production. Additionally, tea genetic resources play a crucial role in various systematic studies, such as biochemistry, evolutionary biology, phylogenetics, physiology, molecular studies, and cytogenetics. This chapter focuses on recent advances in the conservation, evaluation, utilization, and diversity of tea genetic resources.
Article
Full-text available
We present the latest version of the Molecular Evolutionary Genetics Analysis (MEGA) software, which contains many sophisticated methods and tools for phylogenomics and phylomedicine. In this major upgrade, MEGA has been optimized for use on 64-bit computing systems for analyzing bigger datasets. Researchers can now explore and analyze tens of thousands of sequences in MEGA. The new version also provides an advanced wizard for building timetrees and includes a new functionality to automatically predict gene duplication events in gene family trees. The 64-bit MEGA is made available in two interfaces: graphical and command line. The graphical user interface (GUI) is a native Microsoft Windows application that can also be used on Mac OSX. The command line MEGA is available as native applications for Windows, Linux, and Mac OSX. They are intended for use in high-throughput and scripted analysis. Both versions are available from www.megasoftware.net free of charge.
Article
Full-text available
Tea is a natural nonalcoholic beverage consumed worldwide. And grafting technique has been widely used to improve tea quality and quantity. Camellia sinensis cv. Xianghongdian 1 is an excellent variety of Lu’an guapian tea in China, because of its clones, early sprouting, high yielding, superior quality, and strong stress resistance characters. In this study, we grafted shoots from Xianghongdian 1 onto the rootstocks of low-yielding indigenous tea plants cv. Qimenzhong to improve their varietal characteristics. In order to investigate the variations of metabolites and transcription patterns in the grafted tea plants, the leaves were collected from 1-year-old grafted tea plants, whereas the leaves from scions and rootstocks were served as controls. The metabolic profiles and the gene expression patterns in the target pathways were determined and correlated. The contents of caffeine and total catechins were both higher in the grafting leaf compared with those in scion leaf and rootstock leaf, while the total free amino acid content was lower. All these metabolic changes were correlated to the variations in the transcripts expression patterns. It indicated a feasibility of enhancing the agronomic and economic traits in the short-term grafting tea plants through asexual propagation. And the long-term effects should undergo further investigation. Graphical abstract
Article
Full-text available
Main conclusion The roles of microRNA-mediated epigenetic regulation were highlighted in the bud dormancy–activity cycle, implying that certain differentially expressed miRNAs play crucial roles in apical bud burst, such as csn-miR319c/TCP2. Abstract microRNAs (miRNAs) are a class of small non-coding RNAs that regulate gene expression by targeting mRNA transcripts for cleavage or directing translational inhibition. To investigate whether miRNAs regulate bud dormancy–activation transition in tea plant, which largely affects the yield and price of tea products and adaptability of tea trees, we constructed small RNA libraries from three different periods of bud dormancy–burst transition. Through sequencing analysis, 262 conserved and 83 novel miRNAs were identified, including 118 differentially expressed miRNAs. Quantitative RT-PCR results for randomly selected miRNAs exhibited that our comprehensive analysis is highly reliable and accurate. The content of caffeine increased continuously from the endodormancy bud to flushing bud, and differentially expressed miRNAs coupling with their targets associated with bud burst were identified. Remarkably, csn-miR319c was downregulated significantly from the quiescent bud to burst bud, while its target gene CsnTCP2 (TEOSINTE BRANCHED/CYCLOIDEA/PROLIFERATING CELL FACTOR 2) displayed opposite expression patterns. Co-transformation experiment in tobacco demonstrated that csn-miR319c can significantly suppress the functions of CsnTCP2. This study on miRNAs and the recognition of target genes could provide new insights into the molecular mechanism of the bud dormancy–activation transition in tea plant.
Article
Full-text available
Background: Recent advances in genomics have greatly increased research opportunities for non-model species. For wildlife, a growing availability of reference genomes means that population genetics is no longer restricted to a small set of anonymous loci. When used in conjunction with a reference genome, reduced-representation sequencing (RRS) provides a cost-effective method for obtaining reliable diversity information for population genetics. Many software tools have been developed to process RRS data, though few studies of non-model species incorporate genome alignment in calling loci. A commonly-used RRS analysis pipeline, Stacks, has this capacity and so it is timely to compare its utility with existing software originally designed for alignment and analysis of whole genome sequencing data. Here we examine population genetic inferences from two species for which reference-aligned reduced-representation data have been collected. Our two study species are a threatened Australian marsupial (Tasmanian devil Sarcophilus harrisii; declining population) and an Arctic-circle migrant bird (pink-footed goose Anser brachyrhynchus; expanding population). Analyses of these data are compared using Stacks versus two widely-used genomics packages, SAMtools and GATK. We also introduce a custom R script to improve the reliability of single nucleotide polymorphism (SNP) calls in all pipelines and conduct population genetic inferences for non-model species with reference genomes. Results: Although we identified orders of magnitude fewer SNPs in our devil dataset than for goose, we found remarkable symmetry between the two species in our assessment of software performance. For both datasets, all three methods were able to delineate population structure, even with varying numbers of loci. For both species, population structure inferences were influenced by the percent of missing data. 
Conclusions: For studies of non-model species with a reference genome, we recommend combining Stacks output with further filtering (as included in our R pipeline) for population genetic studies, paying particular attention to potential impact of missing data thresholds. We recognise SAMtools as a viable alternative for researchers more familiar with this software. We caution against the use of GATK in studies with limited computational resources or time.
Article
Full-text available
Flower color is considered an important appealing signal to pollinators and also a marker trait in Brassica crop breeding. However, the genetic basis of orange flower trait remains poorly understood in Brassica rapa. In this study, we conducted a genetic analysis of orange flower trait and fine mapped the underlying gene in B. rapa. Two populations, BC1F1 and BC1F2 with 478 and 443 individuals, respectively, were constructed from a cross between 94C9 (orange flower) and 92S105 (yellow flower). Genetic analysis showed that a single recessive gene, BrOF, controlled the orange flower trait. Using Indel and dCAPS markers developed from whole-genome resequencing data of 94C9 and 92S105, BrOF was mapped to a 41.5-kb region on chromosome A09 delimited by InDel409 and dCAPS425 containing six putative genes. Among them, only Bra037124 and Bra037125, which encode an AP2 domain–containing transcription factor and an SEC-C motif–containing protein/OTU-like cysteine protease family protein, respectively, were successfully cloned. The sequence analysis revealed two SNPs resulting in amino acid residue changes in the coding region of Bra037124, as well as seven SNPs and one insertion leading to amino acid residue mutations in the coding region of Bra037125, between 94C9 and 92S105. The reliability of a co-segregating marker InDel314 in marker-assisted selection (MAS) was confirmed by testing different yellow/orange flower Chinese cabbage lines. These results provide a good foundation to identify BrOF and facilitate our understanding of the genetic basis underlying the development of orange flowers in Chinese cabbage.
Article
Full-text available
Background: The two original plants of the oolong tea cultivar ‘Tieguanyin’ are “Wei shuo” ‘Tieguanyin’, abbreviated TGY (Wei), and “Wang shuo” ‘Tieguanyin’, abbreviated TGY (Wang). Another cultivar, ‘Benshan’ (BS), is similar to TGY in its aroma, taste, and genetic make-up, but it lacks the “Yin Rhyme” flavor. We aimed to identify differences in biochemical characteristics and gene expression among these tea plants. Results: Spectrophotometric, high performance liquid chromatography (HPLC), and gas chromatography-mass spectrometry (GC-MS) analyses revealed that TGY (Wei) and TGY (Wang) had deeper purple-colored leaves and higher contents of anthocyanin, catechins, caffeine, and limonene compared with BS. Analyses of transcriptome data revealed 12,420 differentially expressed genes (DEGs) among the cultivars. According to a Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, the flavonoid, caffeine, and limonene metabolic pathways were highly enriched. The transcript levels of the genes involved in these three metabolic pathways were not significantly different between TGY (Wei) and TGY (Wang), except for two unigenes encoding IMPDH and SAMS, which are involved in caffeine metabolism. The comparison of TGY vs. BS revealed eight up-regulated genes (PAL, C4H, CHS, F3’H, F3H, DFR, ANS, and ANR) and two down-regulated genes (FLS and CCR) in flavonoid metabolism; four up-regulated genes (AMPD, IMPDH, SAMS, and 5′-Nase) and one down-regulated gene (XDH) in caffeine metabolism; and two down-regulated genes (ALDH and HIBADH) in limonene degradation. In addition, the expression levels of the transcription factor (TF) PAP1 were significantly higher in TGY than in BS. Therefore, high accumulation of flavonoids, caffeine, and limonene metabolites and the expression patterns of their related genes in TGY might be beneficial for the formation of the “Yin Rhyme” flavor.
Conclusions: Transcriptomic, HPLC, and GC-MS analyses of TGY (Wei), TGY (Wang), and BS indicated that the expression levels of genes related to secondary metabolism and high contents of catechins, anthocyanin, caffeine, and limonene may contribute to the formation of the “Yin Rhyme” flavor in TGY. These findings provide new insights into the relationship between the accumulation of secondary metabolites and sensory quality, and the molecular mechanisms underlying the formation of the unique flavor “Yin Rhyme” in TGY.
Article
Background: Single nucleotide polymorphisms (SNPs) have been applied as important molecular markers in genetics and breeding studies. The rapid advance of next generation sequencing (NGS) provides a high-throughput means of SNP discovery. However, SNP development is limited by the availability of reliable SNP discovery methods. In particular, the optimum assembler and SNP caller for accurate SNP prediction from next generation sequencing data are not known. Results: Herein we performed SNP prediction based on RNA-seq data of peach and mandarin peel tissue under a comprehensive comparison of two paired-end read lengths (125 bp and 150 bp), five assemblers (Trinity, IDBA, Oases, SOAPdenovo, Trans-ABySS) and two SNP callers (GATK and GBS). The predicted SNPs were compared with the authentic SNPs identified via PCR amplification followed by gene cloning and sequencing. A total of 40 and 240 authentic SNPs were present in five anthocyanin biosynthesis-related genes in peach and in nine carotenogenic genes in mandarin, respectively. Putative SNPs predicted from the same RNA-seq data with different strategies led to quite divergent results. The rate of false positive SNPs was significantly lower when the paired-end read length was 150 bp compared with 125 bp. Trinity was superior to the other four assemblers, and GATK was substantially superior to GBS, owing to a low rate of missing authentic SNPs. The combination of the assembler Trinity, the SNP caller GATK, and the 150 bp paired-end read length had the best performance in SNP discovery, with 100% accuracy in both the peach and mandarin cases. This strategy was applied to the characterization of SNPs in peach and mandarin transcriptomes.
Conclusions: By comparing authentic SNPs obtained by the PCR cloning strategy with putative SNPs predicted from different combinations of five assemblers, two SNP callers, and two paired-end read lengths, we provided a reliable and efficient strategy, Trinity-GATK with 150 bp paired-end read length, for SNP discovery from RNA-seq data. This strategy discovered SNPs at 100% accuracy in the peach and mandarin cases and might be applicable to a wide range of plants and other organisms.
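The comparison of predicted SNPs against a cloned-and-sequenced reference set can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `compare_snp_calls`, the tuple layout `(gene, position, ref, alt)`, the toy data, and the exact metric definitions (false positives as a fraction of predicted SNPs, missing rate as a fraction of authentic SNPs) are all assumptions made for illustration, since the abstract does not define them.

```python
def compare_snp_calls(predicted, authentic):
    """Compare predicted SNPs with an authentic (validated) SNP set.

    Each SNP is a hashable tuple such as (gene, position, ref, alt);
    this layout is a hypothetical choice, not taken from the paper.
    """
    predicted = set(predicted)
    authentic = set(authentic)

    true_pos = predicted & authentic   # predicted and validated
    false_pos = predicted - authentic  # predicted but not validated
    missed = authentic - predicted     # validated but not predicted

    return {
        "true_positives": len(true_pos),
        "false_positives": len(false_pos),
        "missed": len(missed),
        # Assumed definition: share of predictions that are not authentic.
        "false_positive_rate": len(false_pos) / len(predicted) if predicted else 0.0,
        # Assumed definition: share of authentic SNPs that were not predicted.
        "missing_rate": len(missed) / len(authentic) if authentic else 0.0,
    }


# Toy data, invented for illustration only.
authentic = {
    ("geneA", 101, "A", "G"),
    ("geneA", 250, "C", "T"),
    ("geneB", 77, "G", "A"),
}
predicted = {
    ("geneA", 101, "A", "G"),
    ("geneB", 77, "G", "A"),
    ("geneB", 300, "T", "C"),
}

summary = compare_snp_calls(predicted, authentic)
print(summary)
```

Under these assumed definitions, a strategy reported as 100% accurate would correspond to zero false positives and zero missed SNPs for the genes that were checked.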
Article
The genetic basis of selection for geographic adaptation and how it has contributed to population structure are unknown in tossa jute (Corchorus olitorius), an important bast fibre crop. We performed restriction site-associated DNA (RAD) sequencing-based (1115 RAD-SNPs) population genomic analyses to investigate genetic differentiation and population structure within a collection of 221 fibre-type lines from across nine geographic regions of the world. Indian populations, with relatively higher overall diversity, were significantly differentiated (based on FST and PCA) from the African and the other Asian populations. There is strong evidence that African C. olitorius was first introduced into peninsular India, which could perhaps be its secondary centre of origin. However, multiple later introductions have occurred in central, eastern and northern India. Based on four assignment tests with different statistical bases, we infer that two ancestral subpopulations (African and Indian) structure the C. olitorius populations, but not in accordance with their geographic origins and patterns of diversity. Our results suggest recent migration of C. olitorius through introduction and germplasm exchange across geographical boundaries. We argue that high intraspecific genetic admixture could be associated with increased genetic variance within Indian populations. Employing both subpopulation-based (FST/GST-outlier) and individual-based (PCAdapt) tests, we detected putative RAD-SNP loci under selection and demonstrated that bast fibre production was an artificial selection pressure, while abiotic and biotic stresses were natural selection pressures, in C. olitorius adaptation. By reinferring the population structure without outlier loci, we propose ad interim that C. olitorius was possibly domesticated as a fibre crop in the Indian subcontinent.
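As background for the FST-based differentiation mentioned above, the following Python sketch computes Wright's FST for a single biallelic locus as (HT − HS) / HT. It assumes equally weighted subpopulations and known allele frequencies, and it is a textbook illustration rather than the estimator used in the study, which the abstract does not specify. The function name `wright_fst` and the example frequencies are hypothetical.

```python
def wright_fst(subpop_freqs):
    """Wright's F_ST for one biallelic locus.

    subpop_freqs: allele frequency of one allele in each subpopulation.
    Assumes equal subpopulation weights (an illustrative simplification).
    """
    if not subpop_freqs:
        raise ValueError("at least one subpopulation frequency is required")
    if any(p < 0.0 or p > 1.0 for p in subpop_freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")

    n = len(subpop_freqs)
    # H_S: mean expected heterozygosity within subpopulations.
    h_s = sum(2.0 * p * (1.0 - p) for p in subpop_freqs) / n
    # H_T: expected heterozygosity of the pooled population.
    p_bar = sum(subpop_freqs) / n
    h_t = 2.0 * p_bar * (1.0 - p_bar)

    if h_t == 0.0:
        # Locus is fixed for the same allele everywhere: no differentiation.
        return 0.0
    return (h_t - h_s) / h_t


# Invented frequencies: strongly diverged vs. identical subpopulations.
fst_diverged = wright_fst([0.2, 0.8])
fst_identical = wright_fst([0.5, 0.5])
print(fst_diverged, fst_identical)
```

Values near 0 indicate similar allele frequencies across subpopulations, while larger values indicate stronger differentiation; genome-wide analyses typically combine many loci rather than relying on a single one.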
Article
Background: Guar [Cyamopsis tetragonoloba, L. Taub.] is an important industrial crop because of the commercial applications of the galactomannan gum contained in its seeds. Plant breeding programmes based on marker-assisted selection require a rich resource of molecular markers. As limited numbers of such markers are available for guar, molecular breeding programmes have not been undertaken for the genetic improvement of this important crop. Hence, the present work was done to enrich the molecular marker resource of guar by identifying high quality SSR, SNP and InDel markers from the RNA-Seq data of the roots of two guar varieties. Results: We carried out RNA-Seq analysis of the roots of two guar varieties, namely, RGC-1066 and M-83. A total of 102,479 unigenes with an average length of 1016 bp were assembled from about 30 million high quality paired-end reads generated by an Illumina HiSeq 2500 platform. The assembled unigenes had 86.55% complete and 97.71% partially conserved eukaryotic genes (CEGs). The functional annotation of assembled unigenes using BLASTX against six databases showed that the guar unigenes were most similar to Glycine max. We could assign GO terms to 45,200 unigenes using the UniProt database. The screening of 102,479 unigenes with the MISA and SAMtools version 1.4 software resulted in the identification of 25,040 high-confidence molecular markers, which consisted of 18,792 SSRs, 5999 SNPs and 249 InDels. These markers tagged most of the genes involved in root development, stress tolerance and other general metabolic activities. Each of the 25,040 molecular markers was characterized, particularly with respect to its position in the unigene. For 71% of the molecular markers, we could determine the names, products and functions of the unigenes. About 80% of the markers, from a random sample of molecular markers, showed PCR amplification.
Conclusions: We have identified and characterized 25,040 high confidence SSR, SNP and InDel molecular markers in guar. It is expected that these markers will be useful in molecular breeding programmes and will also be helpful in studying molecular mechanisms of root development, stress tolerance and gum synthesis in guar. Electronic supplementary material: The online version of this article (10.1186/s12864-018-5205-9) contains supplementary material, which is available to authorized users.
Article
Caffeine is a crucial secondary metabolic product in tea plants. Although the presence of caffeine in tea plants has been established, the molecular mechanisms regulating caffeine metabolism remain unclear. To elucidate caffeine biosynthesis and catabolism in Camellia plants, fresh, germinated leaves from four Camellia plants with low (2), normal (1) and high (1) caffeine concentrations, namely low-caffeine tea 1 (LCT1, Camellia crassicolumna), low-caffeine tea 2 (LCT2, C. crassicolumna), Shuchazao (SCZ, C. sinensis) and Yunkang 43 (YK43, C. sinensis), were used in this research. Transcriptome and purine alkaloid analyses of these Camellia leaves were performed using RNA-Seq and liquid chromatography–mass spectrometry (LC-MS). Moreover, 15N-caffeine tracing was performed to determine the metabolic fate of caffeine in leaves of these plants. Caffeine content was correlated with the expression levels of related genes, and a quantitative real-time (qRT) PCR analysis of specific genes showed trends consistent with the transcriptomic analysis. Based on the results of stable isotope-labelled tracer experiments, we discovered a degradation pathway of caffeine to theobromine. These findings could assist researchers in understanding the caffeine-related mechanisms in Camellia plants containing low, normal, and high caffeine content and could be applied to caffeine regulation and breeding improvement in future research.