Gene Networks Underlying Cannabinoid and Terpenoid
Accumulation in Cannabis¹ [OPEN]
Jordan J. Zager,ᵃ Iris Lange,ᵃ Narayanan Srividya,ᵃ Anthony Smith,ᵇ and B. Markus Langeᵃ,²,³
ᵃ Institute of Biological Chemistry and M.J. Murdock Metabolomics Laboratory, Washington State University,
Pullman, Washington 99164-6340
ᵇ Evio Labs, Central Point, Oregon 97502
ORCID IDs: 0000-0001-6970-5832 (J.J.Z.); 0000-0001-7934-7987 (N.S.); 0000-0001-6565-9584 (B.M.L.).
Glandular trichomes are specialized anatomical structures that accumulate secretions with important biological roles in plant-
environment interactions. These secretions also have commercial uses in the flavor, fragrance, and pharmaceutical industries.
The capitate-stalked glandular trichomes of Cannabis sativa (cannabis), situated on the surfaces of the bracts of the female flowers,
are the primary site for the biosynthesis and storage of resins rich in cannabinoids and terpenoids. In this study, we profiled nine
commercial cannabis strains with purportedly different attributes, such as taste, color, smell, and genetic origin. Glandular
trichomes were isolated from each of these strains, and cell type-specific transcriptome data sets were acquired. Cannabinoids
and terpenoids were quantified in flower buds. Statistical analyses indicated that these data sets enable the high-resolution
differentiation of strains by providing complementary information. Integrative analyses revealed a coexpression network of
genes involved in the biosynthesis of both cannabinoids and terpenoids from imported precursors. Terpene synthase genes
involved in the biosynthesis of the major monoterpenes and sesquiterpenes routinely assayed by cannabis testing laboratories
were identified and functionally evaluated. In addition to cloning variants of previously characterized genes, specifically
CsTPS14CT [(−)-limonene synthase] and CsTPS15CT (β-myrcene synthase), we functionally evaluated genes that encode
enzymes with activities not previously described in cannabis, namely CsTPS18VF and CsTPS19BL (nerolidol/linalool
synthases), CsTPS16CC (germacrene B synthase), and CsTPS20CT (hedycaryol synthase). This study lays the groundwork for
developing a better understanding of the complex chemistry and biochemistry underlying resin accumulation across commercial
cannabis strains.
Cannabis sativa (cannabis) was originally discovered
in Central Asia and has likely been cultivated for tens of
thousands of years by human civilizations, with the
first mention about 5,000 years ago in Chinese texts
(Unschuld, 1986). Whereas the initial utility was pri-
marily as a source of grain and fiber, strains with me-
dicinal properties were already in use in northwest
China some 2,700 years ago, as evidenced by the
detection of the psychoactive cannabinoid, (−)-trans-Δ⁹-tetrahydrocannabinol
(THC), in plant residues recovered
from an ancient grave (Russo et al., 2008).
Cannabis strains containing less THC but more of
the nonpsychoactive cannabidiol (CBD), commonly
referred to as hemp, were grown in Roman Britain for
grain and fiber but later found additional uses as a
medicine during the Anglo-Saxon period (Grattan and
Singer, 1952). The 1925 Geneva International Opium
Convention required signatories to control the trade of
certain drugs (including cannabis), which was followed
by increasingly restrictive resolutions by the League of
Nations and later the United Nations (United Nations,
1966). Until very recently, cannabis was considered an
illicit substance of abuse by many governments and
could only be researched by selected, authorized sci-
entists in tightly supervised laboratories. Despite these
restrictions, evidence for the medicinal potential was
sufficiently convincing that, by the mid-1980s, the
synthetic cannabinoids nabilone and dronabinol had
been granted approval by the U.S. Food and Drug
Administration to suppress nausea during chemother-
apy (Abuhasira et al., 2018). The discovery of the exis-
tence of a high-affinity cannabinoid receptor in the
rat brain during the late 1980s (Devane et al., 1988)
prompted further research to identify the endogenous
ligands. This resulted in the characterization, beginning
in the early 1990s, of several lipid-based retrograde
¹This work was supported by gifts from private individuals, with
no association with the cannabis industry. All work with raw materials
was conducted by A.S. at a facility accredited to National Environmental
Laboratory Accreditation Program standards and licensed
by the Oregon Liquor Control Commission. Work of employees of
Washington State University (J.J.Z., I.L., and B.M.L.) was performed
in accordance with the OR/ORSO Guideline of July 2017.
²Author for contact: lange-m@wsu.edu.
³Senior author.
The author responsible for distribution of materials integral to the
findings presented in this article in accordance with the policy de-
scribed in the Instructions for Authors (www.plantphysiol.org) is: B.
Markus Lange (lange-m@wsu.edu).
J.J.Z., A.S., and B.M.L. designed the experiments; A.S. harvested
and extracted plant materials; A.S. performed metabolite analyses;
J.J.Z., I.L. and N.S. cloned terpene synthase genes and performed
functional assays; J.J.Z., A.S., and B.M.L. analyzed the data; J.J.Z.
and B.M.L. wrote the article, with input from all authors.
[OPEN] Articles can be viewed without a subscription.
www.plantphysiol.org/cgi/doi/10.1104/pp.18.01506
Plant Physiology®, August 2019, Vol. 180, pp. 1877–1897, www.plantphysiol.org © 2019 American Society of Plant Biologists. All Rights Reserved.
Downloaded from www.plantphysiol.org on July 31, 2019.
Copyright © 2019 American Society of Plant Biologists. All rights reserved.
neurotransmitters (endocannabinoids) and multiple
enzymes involved in their biosynthesis, trafficking, and
perception (the endocannabinoid system), which were
subsequently demonstrated to regulate a multitude of
physiological and cognitive processes in humans and
other animals (Devane et al., 1992). With receptor tar-
gets in hand, follow-up research and clinical trials
brought several additional cannabis-related products to
the pharmaceutical marketplace, including nabiximols
(marketed as Sativex in Canada since 2005), a cannabis
extract used to treat symptoms of multiple sclerosis,
and a formulation of highly purified, plant-sourced
CBD (marketed as Epidiolex in the United States since
early 2018) to treat certain forms of epilepsy. In the
meantime, several jurisdictions and even entire coun-
tries changed their policies on cannabis, endorsing laws
that allow its therapeutic use and decriminalizing or
even legalizing it for recreational purposes (Abuhasira
et al., 2018). Legislation has not been able to keep up
with these recent developments, and specific labeling
regulations with regard to the composition of active
ingredients, serving sizes, and recommended doses are
woefully lacking (Subritzky et al., 2016). This situation
is exacerbated by an inadequate understanding of how
the chemistry (cannabinoids and other specialized
metabolites) of cannabis extracts and formulations re-
lates to their biological effects.
Since the original structural elucidation, during the
early 1960s, of THC as a psychoactive principle in
cannabis (Gaoni and Mechoulam, 1964), the structures
of more than 90 biogenic cannabinoids have been
reported to occur in members of the genus Cannabis
(Andre et al., 2016), with a handful of constituents being
the most prominent across strains (Fig. 1). These
Figure 1. Shared origin of the cannabinoid and
terpenoid biosynthetic pathways. A circled P
denotes phosphate moieties.
cannabinoids accumulate primarily in capitate-stalked
glandular trichomes of female plants at the flowering
stage. A second class of metabolites with high abun-
dance and even greater chemical diversity in cannabis
glandular trichomes are monoterpenes and sesquiter-
penes (Fig. 1; Brenneisen, 2007). These volatile terpe-
noids are responsible for the distinctive aromas of
different cannabis strains. The popular press and trade
magazines liberally use the term “entourage effect”
to suggest that synergism among cannabinoids or
between cannabinoids and other constituents (in par-
ticular terpenoids) may contribute to different psycho-
logical perceptions of cannabis varieties by users. In
support of this view, β-caryophyllene, a sesquiterpene
with almost ubiquitous occurrence in plant oils and
resins, was demonstrated to bind with high affinity to
the CB2 cannabinoid receptor and has therefore been
referred to as a dietary cannabinoid (Gertsch et al.,
2008). However, there is only limited clinical evidence
for entourage effects of terpenoids in cannabis formu-
lations (Gertsch et al., 2010; Russo, 2011). Irrespective of
these considerations, the chemical composition of each
cannabis strain is unique, and acquiring a metabolic
fingerprint is an excellent first step in building a more
robust scientific foundation for assessing the correlation
between the composition of plant material and the
perception by users (Fischedick et al., 2010).
Most of the cannabis products traded licitly or illicitly
today are sourced from strains for which minimal
documentation is available in the public domain and
for which the primary goal was clearly to breed high-
THC strains (Cascini et al., 2012). In other words, the
genetics underlying chemical diversity in commercial
cannabis strains is currently poorly understood
(Welling et al., 2016). In this context, it is interesting that
cannabinoids and terpenoids share a common biosyn-
thetic origin. The biosynthesis of the prominent can-
nabinoids involves two direct precursor pathways. The
polyketide pathway gives rise to olivetolic acid from a
short-chain fatty acid intermediate (hexanoyl-CoA),
whereas the methylerythritol 4-phosphate (MEP)
pathway provides geranyl diphosphate (GPP; Fig. 1;
Fellermeier et al., 2001; Taura et al., 2009; Gagne et al.,
2012; Stout et al., 2012). An aromatic prenyltransferase
catalyzes the formation of cannabigerolic acid from
olivetolic acid and GPP (Fellermeier and Zenk, 1998;
Page and Boubakir, 2012). The pathway then branches
again toward different cyclized products, such as tetrahydrocannabinolic
acid (THCA), cannabidiolic acid
(CBDA), and cannabichromenic acid (Fig. 1; Sirikantaramas
et al., 2005; Taura et al., 2007). Decarboxylated
products of these acids are formed nonenzymatically
by exposure to heat. Plant monoterpenes are mostly
derived from the plastid-localized MEP pathway,
whereas the cytosolic/peroxisomal mevalonate path-
way is a common source of precursors for sesquiter-
penes, although cross talk between both pathways has
also been reported (Fig. 1; Hemmerlin et al., 2012).
Terpene synthases catalyze the first committed step in
the biosynthesis of a specific terpenoid from a prenyl
diphosphate precursor of the appropriate chain length.
To date, monoterpene synthases (accepting a C10 pre-
cursor) and sesquiterpene synthases (acting on a C15
precursor) that are responsible for the production of
about half a dozen terpenoids in cannabis have been
reported (Fig. 1; Günnewich et al., 2007; Booth et al.,
2017), with many more awaiting functional characteri-
zation. In this article, we report the chemical profiles and
corresponding gene networks across several cannabis
strains, thereby building the foundation for a better un-
derstanding of their chemical and biochemical diversity.
RESULTS
Strategic Considerations for Logistics, Strain Selection, and
Experimental Design
One of the goals of this pilot study was to test the
utility of combining metabolic and transcriptomic data
to differentiate cannabis strains with regard to the most
relevant traits. To ensure the consistency of data sets, all
plant materials were sourced from the same facility,
where they had been maintained under comparable
growth conditions (Shadowbox Farms in Williams,
Oregon). Plant harvest was performed when the ap-
pearance of glandular trichome content had changed
from a turbid white to clear and before another change
to an amber-like color occurred. For most strains, the
pistils had changed color from white to yellow or or-
ange. These are the visual cues used by experienced
growers to indicate optimal harvest time. All further
processing was performed with fresh (uncured) mate-
rial to avoid the previously reported loss of terpenoid
volatiles during drying (Ross and ElSohly, 1996). Can-
nabinoids and terpenoids were extracted and quanti-
fied at a testing facility licensed according to the National
Environmental Laboratory Accreditation Program’s TNI
2009 Standard (Evio Labs). At this facility, fractions highly
enriched in glandular trichomes were obtained and RNA
was isolated, with minor modifications, using previously
established protocols (Lange et al., 2000). Glandular
trichome-specific RNA sequencing (RNA-seq) data were
then acquired by a commercial service provider (Quick
Biology). Metabolite and transcriptome data were ac-
quired for three biological replicates per strain.
This study included strains of predominantly
C. sativa ancestry as well as strains in which Cannabis indica
(formally classified as C. sativa forma indica) was dominant
(Fig. 2). Strains of C. sativa provenance are generally
characterized by fairly thin and narrow leaves,
comparatively longer flowering cycles, and a relatively
tall stature. A typical example in this study is Mama
Thai, which is generally considered a landrace of
C. sativa. In contrast, C. indica strains ordinarily have
large and thick leaves, a rather short flowering cycle
(6–8 weeks), and a proportionately short habitus
(Fig. 2A). Our pilot study featured Blackberry Kush as a
C. indica-dominant strain. The remaining strains were
hybrids of mixed C. sativa and C. indica lineage, plus one
strain (Terple) with poorly documented origin (Fig. 2B).
To address our goal of assessing the utility of our data
for classifying strains, RNA-seq and chemical data
(cannabinoid and terpenoid profiles) were subjected to
multivariate statistical analyses. We then tested the
hypothesis that cannabinoid and terpenoid pathways
are coregulated by performing gene coexpression net-
work analyses. A combination of gene network and
phylogenetic analyses was subsequently used to iden-
tify candidate genes for hitherto uncharacterized ter-
pene synthases that contribute significantly to the
cannabis volatile bouquet.
Strain Differentiation Based on RNA-Seq Data
High-quality libraries reflecting transcripts expressed
in isolated glandular trichomes were subjected to RNA-
seq analysis (nine strains, three biological replicates each,
27 samples total) on the Illumina HiSeq 4000 platform.
A de novo consensus transcriptome assembly was
generated using the Trinity suite (Haas et al., 2013; as-
sembly statistics are given in Supplemental Table S1).
The reads were assembled into contigs covering a total
of 305 Mb of sequence with a GC content of 40.4%.
The resulting assembly contained 514,208 contigs of at
least 201 bp in length and had an N50 value of 833 bp
(contigs of this length or longer together contain at
least 50% of the total transcriptome sequence).
The assembled transcriptome data set was
searched against the National Center for Biotechnology
Information nonredundant protein database, which
resulted in the annotation of 82,523 sequences at
e-values < 1e-5. Read counts for each transcript in each
sample were then processed with the RSEM software
package (Li and Dewey, 2011) to calculate normalized
Figure 2. Characteristics of cannabis
strains. A, Floral phenotypes. B, Origins
and aroma descriptions (according to
https://www.leafly.com).
expression levels as transcripts per kilobase million
(TPM). Transcripts with TPM values lower than 5
across all varieties were removed from subsequent
analysis, resulting in 46,559 predicted genes with sig-
nificant expression (Supplemental Table S2).
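The assembly and expression statistics described above can be illustrated with a short sketch. It is not the Trinity or RSEM implementation: the TPM function uses plain transcript lengths and raw counts, whereas RSEM works with effective lengths and expected counts, and all function names and toy values are hypothetical.

```python
def n50(contig_lengths):
    """Return the length L such that contigs of length >= L
    together hold at least half of all assembled bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


def tpm(counts, lengths_bp):
    """Simplified TPM: length-normalize the counts, then scale to one million."""
    rates = [count / length for count, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [rate / scale * 1e6 for rate in rates]


def keep_expressed(tpm_by_transcript, threshold=5.0):
    """Drop transcripts whose TPM stays below the threshold in every sample."""
    return {
        name: values
        for name, values in tpm_by_transcript.items()
        if any(value >= threshold for value in values)
    }


if __name__ == "__main__":
    # Toy values for illustration only, not data from this study.
    print(n50([2, 3, 4, 5, 10]))
    print(tpm([10, 30], [1000, 1000]))
    print(keep_expressed({"t1": [1.0, 2.0], "t2": [0.0, 7.0]}))
```

With the threshold set to 5, a transcript is retained as soon as a single sample reaches that value, which matches the wording "lower than 5 across all varieties" used for removal.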
As a first step to investigate the utility of RNA-seq for
strain categorization, transcriptome data sets were
subjected to principal component analysis (PCA), a
statistical procedure that reduces attribute space from a
larger number of variables to a smaller number of so-
called principal components, thereby decreasing the
dimensionality of the original data. The first three
principal components accounted for 83% of the varia-
bility in the data set (Fig. 3A). The replicates for each
strain clustered together in a three-dimensional PCA
plot, whereas the component scores for each strain were
separated from those of all other strains, indicating that
the overall transcriptome of each strain is unique
(Fig. 3A). Processing of RNA-seq data by hierarchical
clustering analysis (HCA), which builds a cluster hier-
archy that is commonly displayed as a dendrogram,
grouped strains into two major clades (Fig. 3B). The first
clade contained Blackberry Kush, Cherry Chem, and
Terple, whereas the second consisted of Mama Thai,
White Cookies, Valley Fire, Black Lime, Canna Tsu, and
Sour Diesel, indicating a clear separation of strains by
heritage (C. indica for clade 1 and C. sativa for clade 2).
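The percentage quoted for the first three principal components is an explained-variance ratio. The following minimal sketch shows how such ratios can be computed, assuming a samples-by-variables matrix and using power iteration with deflation on the covariance matrix. The software used for the PCA is not named in this section, so this is an illustration rather than the authors' procedure; the function names and toy values are hypothetical. For tens of thousands of genes, a library routine based on singular value decomposition would be the practical choice.

```python
import math


def center_columns(rows):
    """Subtract the column mean from every value."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance matrix (variables x variables)."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its eigenvector by power iteration."""
    p = len(matrix)
    # Fixed, non-uniform start vector; assumed not orthogonal to the answer.
    vec = [float(i + 1) for i in range(p)]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm < 1e-12:  # no variance left to explain
            return 0.0, vec
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the eigenvalue for the converged vector.
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(p))
                for i in range(p))
    return value, vec


def explained_variance_ratios(rows, n_components):
    """Fraction of total variance captured by each leading component."""
    cov = covariance_matrix(center_columns(rows))
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))
    if total == 0:
        raise ValueError("data have no variance")
    ratios = []
    for _ in range(n_components):
        value, vec = top_eigenpair(cov)
        ratios.append(value / total)
        # Deflation: remove the component that was just found.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    return ratios


if __name__ == "__main__":
    # Toy matrix (4 samples x 3 variables), for illustration only.
    toy = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.1], [3.0, 6.0, -0.1], [4.0, 8.0, 0.0]]
    print(explained_variance_ratios(toy, 2))
```

In the toy matrix the second variable is exactly twice the first, so nearly all variance falls on the first component; replicate samples that cluster tightly in such a projection would likewise contribute little to the remaining components.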
Strain Differentiation Based on Metabolite Profiling Data
The highly robust analytical platforms that served as
the basis for the analysis of six cannabinoids and 24
terpenoids were described in a previous report
(Fischedick et al., 2010) and used here with minor
modifications. Cannabinoid concentration was highest
in White Cookies (28.4% of flower bud dry weight),
with relatively high contents also occurring in Cherry
Chem (17.7%), Black Lime (17.5%), Blackberry Kush
(15.8%), Valley Fire (15.7%), Terple (15.6%), Sour Diesel
(12.4%), and Canna Tsu (12.2%; Table 1). Significantly
lower concentrations were detected in Mama Thai
(6.4%). In eight of the nine strains investigated, THCA
was the major cannabinoid, ranging from 26.3% of the
flower bud dry weight in White Cookies to 5.9% in
Mama Thai (Table 1). The only exception was the Canna
Figure 3. Cannabis strain differentiation based on glandular trichome-specific RNA-seq data. A, Three-dimensional plot representing
outcomes of a PCA. B, Heat map of a two-way HCA. The numerical values and red-white-blue color code indicate the
log2 fold change compared with the average gene expression value across all strains. Strain abbreviations at the bottom of B are as
follows: BB, Blackberry Kush; BL, Black Lime; CC, Cherry Chem; CT, Canna Tsu; MT, Mama Thai; SD, Sour Diesel; T, Terple; VF,
Valley Fire; WC, White Cookies.
Table 1. Constituents of cannabis female flower buds (metabolite content in nine strains expressed as percentage of dry weight)
n.d., Not detectable.

| Metabolite | Blackberry Kush | Black Lime | Canna Tsu | Cherry Chem | Valley Fire | Mama Thai | Sour Diesel | Terple | White Cookies |
|---|---|---|---|---|---|---|---|---|---|
| Cannabinoids | | | | | | | | | |
| THCA | 13.56 ± 0.90 | 15.02 ± 1.10 | 3.19 ± 0.20 | 16.55 ± 0.81 | 13.89 ± 1.33 | 5.91 ± 0.60 | 11.31 ± 1.04 | 13.72 ± 1.36 | 26.33 ± 0.54 |
| Tetrahydrocannabinol | 0.31 ± 0.02 | 1.62 ± 0.19 | 0.55 ± 0.055 | 0.15 ± 0.008 | 0.41 ± 0.049 | 0.14 ± 0.02 | 0.22 ± 0.027 | 1.15 ± 0.12 | 0.86 ± 0.091 |
| CBDA | 0.45 ± 0.02 | 0.12 ± 0.012 | 7.76 ± 0.63 | 0.079 ± 0.007 | 0.037 ± 0.001 | 0.016 ± 0.003 | 0.032 ± 0.002 | 0.067 ± 0.002 | 0.088 ± 0.004 |
| CBD | 0.95 ± 0.07 | 0.139 ± 0.016 | 0.085 ± 0.013 | 0.079 ± 0.008 | 0.12 ± 0.004 | 0.047 ± 0.005 | 0.086 ± 0.006 | 0.11 ± 0.008 | 0.098 ± 0.013 |
| Cannabigerol | 0.12 ± 0.015 | 0.086 ± 0.008 | 0.093 ± 0.008 | 0.051 ± 0.005 | 0.15 ± 0.027 | 0.016 ± 0.001 | 0.052 ± 0.004 | 0.093 ± 0.002 | 0.25 ± 0.005 |
| Cannabinol | 1.74 ± 0.20 | 0.55 ± 0.019 | 0.53 ± 0.051 | 0.83 ± 0.019 | 1.12 ± 0.14 | 0.29 ± 0.028 | 0.68 ± 0.033 | 0.502 ± 0.007 | 0.78 ± 0.025 |
| Cannabichromene | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. |
| Total cannabinoids | 15.87 ± 1.13 | 17.53 ± 1.25 | 12.20 ± 0.85 | 17.74 ± 0.82 | 15.70 ± 1.53 | 6.41 ± 0.65 | 12.387 ± 1.05 | 15.64 ± 1.47 | 28.40 ± 0.54 |
| Monoterpenes | | | | | | | | | |
| β-Myrcene | 2.35 ± 0.2 | 4.34 ± 0.36 | 1.70 ± 0.15 | 1.61 ± 0.049 | 2.24 ± 0.28 | 0.11 ± 0.009 | 0.70 ± 0.046 | 2.96 ± 0.25 | 1.14 ± 0.17 |
| (−)-Limonene | 0.29 ± 0.02 | 0.89 ± 0.08 | 0.16 ± 0.021 | 0.23 ± 0.015 | 0.65 ± 0.098 | 0.03 ± 0.003 | 0.17 ± 0.011 | 0.23 ± 0.019 | 1.53 ± 0.24 |
| α-Pinene | 0.015 ± 0.001 | 1.99 ± 0.12 | 0.38 ± 0.039 | 0.016 ± 0.001 | 0.044 ± 0.008 | 0.007 ± 0.001 | 0.004 ± 0 | 0.82 ± 0.051 | 0.20 ± 0.032 |
| β-Pinene | 0.086 ± 0.005 | 0.50 ± 0.034 | 0.18 ± 0.025 | 0.056 ± 0.003 | 0.11 ± 0.013 | 0.026 ± 0.002 | 0.039 ± 0.002 | 0.31 ± 0.022 | 0.04 ± 0.007 |
| 1,8-Cineole | 0.26 ± 0.02 | 0.38 ± 0.038 | 0.52 ± 0.075 | 0.464 ± 0.012 | 0.22 ± 0.028 | 0.057 ± 0.007 | 0.11 ± 0.011 | 0.00 ± 0 | 0.31 ± 0.037 |
| Linalool | 0.082 ± 0.005 | 0.079 ± 0.004 | 0.052 ± 0.005 | 0.13 ± 0.003 | 0.16 ± 0.027 | 0.023 ± 0.002 | 0.074 ± 0.005 | 0.067 ± 0.006 | 0.57 ± 0.072 |
| Terpinolene | 0.019 ± 0.001 | 0.034 ± 0.003 | 0.019 ± 0.002 | 0.019 ± 0.001 | 0.02 ± 0.003 | 0.13 ± 0.016 | 0.017 ± 0.001 | 0.02 ± 0.002 | 0.041 ± 0.006 |
| Borneol | 0.039 ± 0.002 | 0.041 ± 0.003 | n.d. | 0.032 ± 0.002 | 0.033 ± 0.005 | 0.021 ± 0.002 | 0.026 ± 0.002 | 0.036 ± 0.002 | 0.048 ± 0.008 |
| β-Ocimene | n.d. | 0.039 ± 0.003 | n.d. | n.d. | 0.006 ± 0.001 | 0.13 ± 0.014 | n.d. | 0.086 ± 0.007 | 0.015 ± 0.002 |
| Camphene | n.d. | 0.089 ± 0.008 | 0.055 ± 0.007 | n.d. | 0.004 ± 0.001 | n.d. | n.d. | 0.019 ± 0.002 | 0.07 ± 0.012 |
| δ-3-Carene | 0.029 ± 0.002 | 0.052 ± 0.006 | 0.003 ± 0 | 0.008 ± 0.001 | 0.022 ± 0.003 | 0.003 ± 0.001 | n.d. | 0.027 ± 0.002 | 0.016 ± 0.002 |
| Camphor | 0.044 ± 0.003 | 0.006 ± 0.001 | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | 0.101 ± 0.013 |
| (1)-Terpinene | 0.001 ± 0.001 | n.d. | n.d. | n.d. | n.d. | 0.005 ± 0.001 | n.d. | n.d. | 0.002 ± 0 |
| Total monoterpenes | 3.23 ± 0.26 | 8.43 ± 0.66 | 3.07 ± 0.32 | 2.56 ± 0.085 | 3.52 ± 0.47 | 0.54 ± 0.057 | 1.14 ± 0.078 | 4.57 ± 0.36 | 4.09 ± 0.60 |
| Sesquiterpenes | | | | | | | | | |
| β-Caryophyllene | 0.13 ± 0.01 | 0.24 ± 0.023 | 0.21 ± 0.022 | 0.74 ± 0.012 | 0.23 ± 0.034 | 0.12 ± 0.013 | 0.45 ± 0.026 | 0.15 ± 0.009 | 0.60 ± 0.068 |
| α-Humulene | 0.03 ± 0.002 | 0.06 ± 0.005 | 0.051 ± 0.005 | 0.20 ± 0.011 | 0.087 ± 0.014 | 0.068 ± 0.008 | 0.19 ± 0.009 | 0.058 ± 0.003 | 0.15 ± 0.018 |
| Nerolidol | n.d. | 0.06 ± 0.004 | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. |
| Total sesquiterpenes | 0.16 ± 0.015 | 0.361 ± 0.032 | 0.26 ± 0.027 | 0.93 ± 0.019 | 0.32 ± 0.048 | 0.19 ± 0.021 | 0.64 ± 0.035 | 0.21 ± 0.012 | 0.75 ± 0.086 |
| Total terpenoids | 3.39 ± 0.27 | 8.79 ± 0.69 | 3.33 ± 0.35 | 3.49 ± 0.10 | 3.84 ± 0.51 | 0.73 ± 0.078 | 1.78 ± 0.11 | 4.78 ± 0.38 | 4.83 ± 0.69 |
Tsu strain, in which CBDA (7.8% of flower bud dry
weight) dominated over THCA (3.2%), whereas CBDA
in all other strains remained at 1% or less. Two addi-
tional cannabinoids of fairly high abundance were
cannabinol, which accumulated to 0.2% to 1.7% of
flower bud dry weight, and tetrahydrocannabinol,
which amounted to 0.2% to 1.6% (Table 1; for struc-
tures, see Fig. 1). Cannabichromene was not detected in
any of the sampled varieties.
Terpenoid content was highest in Black Lime (8.8% of
flower bud dry weight), with fairly high contents also
occurring in White Cookies (4.8%), Terple (4.8%), Val-
ley Fire (3.8%), Cherry Chem (3.5%), Blackberry Kush
(3.4%), and Canna Tsu (3.3%; Table 1). Significantly
lower concentrations were detected in Sour Diesel
(1.8%) and Mama Thai (0.7%). The monoterpene (C10)-
to-sesquiterpene (C15) ratio was generally very high
(greater than 10), with only three strains in which the
ratio was below 3 (Cherry Chem, Mama Thai, and
Sour Diesel; Table 1). It should be noted that this ratio
only applies to the terpenoids we were able to quan-
tify based on the availability of authentic standards.
β-Myrcene was the most abundant monoterpene in
most strains (up to 4.3% of flower bud dry weight in
Black Lime). The only exceptions were Mama Thai
(generally low terpenoid contents, with terpinolene as
the most abundant monoterpene at 0.1%) and White
Cookies (with limonene at 1.5%; Table 1). Limonene
content was also high in Black Lime (0.9%) and Valley
Fire (0.7%). α-Pinene and β-pinene amounts were quite
high in Black Lime (2% and 0.5%, respectively). 1,8-Cineole
was particularly abundant in Canna Tsu
and Cherry Chem (0.5% in both; Table 1). All other
monoterpenes had concentrations below 0.2%. All
strains contained sesquiterpenes, of which β-caryophyllene
was consistently the most abundant (0.1%–0.7% of flower
bud dry weight). α-Humulene was also detectable in all
strains (less than 0.2%), whereas Black Lime was the only
strain in which the nerolidol concentration rose above the
limit of quantitation (less than 0.1%; Table 1).
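The monoterpene-to-sesquiterpene ratios discussed above can be recomputed from the totals in Table 1. The sketch below does so from the rounded table values, so the resulting ratios are approximate; the variable and function names are hypothetical.

```python
# Totals in percent of flower bud dry weight, copied from Table 1.
TOTAL_MONOTERPENES = {
    "Blackberry Kush": 3.23, "Black Lime": 8.43, "Canna Tsu": 3.07,
    "Cherry Chem": 2.56, "Valley Fire": 3.52, "Mama Thai": 0.54,
    "Sour Diesel": 1.14, "Terple": 4.57, "White Cookies": 4.09,
}
TOTAL_SESQUITERPENES = {
    "Blackberry Kush": 0.16, "Black Lime": 0.361, "Canna Tsu": 0.26,
    "Cherry Chem": 0.93, "Valley Fire": 0.32, "Mama Thai": 0.19,
    "Sour Diesel": 0.64, "Terple": 0.21, "White Cookies": 0.75,
}


def mono_to_sesqui_ratio(strain):
    """Ratio of total monoterpenes (C10) to total sesquiterpenes (C15)."""
    return TOTAL_MONOTERPENES[strain] / TOTAL_SESQUITERPENES[strain]


ratios = {strain: mono_to_sesqui_ratio(strain) for strain in TOTAL_MONOTERPENES}
below_3 = sorted(s for s, r in ratios.items() if r < 3)
above_10 = sorted(s for s, r in ratios.items() if r > 10)

if __name__ == "__main__":
    for strain, ratio in sorted(ratios.items(), key=lambda item: item[1]):
        print(f"{strain}: {ratio:.1f}")
```

Computed this way, Cherry Chem, Mama Thai, and Sour Diesel fall below 3, five strains exceed 10, and White Cookies lies in between at roughly 5.5.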
Processing of the metabolite data (cannabinoid and
terpenoid profiles) by PCA resulted in a clear separa-
tion of the strains, with individual biological replicates
clustering closely together (Fig. 4A). Remarkably, 99%
of the data variation across genotypes was captured by
the first three principal components. Application of
orthogonal projections to latent structures discriminant
analysis (OPLS-DA), a statistical modeling tool used
commonly in metabolomics research (Worley and
Powers, 2013), indicated a separation of strains into
two groups based on our metabolite profiling data, one
representing the C. indica-dominant strains, whereas
the other constituted the C. sativa-dominant strains
(Fig. 4B). Biological replicates for each strain once again
clustered together, whereas significant separation was
observed across strains. In summary, glandular
trichome-specific gene expression and metabolite data
were consistent in differentiating cannabis strains.
Evidence for Coexpression of Cannabinoid and
Terpenoid Pathways
Our glandular trichome RNA-seq data sets were fil-
tered to eliminate genes with consistently low expression
levels (below 50 TPM), thereby retaining roughly 16,000
Figure 4. Cannabis strain differentiation based on cannabinoid and terpenoid profiles. A, Three-dimensional plot representing
outcomes of a PCA. B, Two-dimensional plot of the outcomes of OPLS-DA.
expressed genes with significant expression levels in at
least one strain. Gene abundance across strains was
then evaluated using the weighted gene correlation
network analysis (WGCNA) package in R (Langfelder
and Horvath, 2008), which resulted in the binning of
genes (only those with Spearman correlation coefficients
[SCCs] ≥ 0.8 were considered) into seven coexpression
modules (Supplemental Table S3). Further
analysis using the moduleEigengenes function indicated
that the accumulation of CBDA, the signature
cannabinoid of the Canna Tsu strain, was highly
correlated (SCC of 0.97, P value of 2e-17) with one of the
coexpression modules (indicated by brown color in
Fig. 5A). Interestingly, this module contained the gene
coding for CBDA synthase, the enzyme responsible
for the conversion of cannabigerolic acid to CBDA
(Table 2). An analogous analysis for THCA or THC
(which correlated with a module indicated by yellow
color in Fig. 5A) and THCA synthase was not possible,
because single-nucleotide polymorphisms in this gene
(and not lack of expression) result in an inactive enzyme
in strains that accumulate primarily CBDA (Kojoma
Figure 5. Coexpression of genes involved in cannabinoid and terpenoid biosynthesis. A, WGCNA of glandular trichome-specific
RNA-seq data categorizes transcripts into eight color-coded modules (for gene lists, see Supplemental Table S3). B, Correlation of
WGCNA modules with metabolites. A color code is used to visualize the SCCs for each module-metabolite pair, with red color
representing positive and blue color indicating negative SCCs. C, Genes involved in cannabinoid and terpenoid biosynthesis are
enriched in the yellow coexpression module obtained by WGCNA. Color code for pathways: light blue, hexanoate
formation; dark green, precursors for monoterpenes; light green, monoterpene synthases; orange, sesquiterpenes; dark blue,
cannabinoids; cyan, remaining genes. D, Functional context of genes highlighted in C in a simplified metabolic pathway scheme.
AAE1, Acyl-activating enzyme for short-chain fatty acids; Ac-CoA, acetyl-CoA; ACC1, acetyl-CoA carboxylase; CsTPS1FN/
CsTPS14CT, (−)-limonene synthase; CsTPS2SK, (+)-α-pinene synthase; CsTPS3FN/CsTPS15CT, β-myrcene synthase;
CsTPS16CC, germacrene B synthase; DHAP, dihydroxyacetone phosphate; DXS, 1-deoxy-D-xylulose-5-phosphate synthase;
ENO, enolase; FNR-Root, ferredoxin-NADP+ reductase (isoform of roots and glandular trichomes); FPPS, farnesyl diphosphate
synthase; GAP, glyceraldehyde-3-phosphate; GAPDH, glyceraldehyde-3-phosphate dehydrogenase; GPP, geranyl diphosphate;
GPPS, geranyl diphosphate synthase; KR, β-ketoacyl reductase (fatty acid synthase complex); OA, olivetolic acid; PDH, pyruvate
dehydrogenase; PFK, phosphofructokinase; PGI, phosphoglucoisomerase; PGM, phosphoglucomutase; PK, pyruvate kinase;
PT1, cannabigerolic acid synthase; Pyr, pyruvate; THCAS, tetrahydrocannabinolic acid synthase; TPI, triose phosphate isomerase.
Table 2. Transcript abundance (in TPM) for genes involved in the biosynthesis of cannabinoids and terpenoids in cannabis strains
n.d., Not detectable.
Gene Annotation | UniProt Identifier | Blackberry Kush | Black Lime | Canna Tsu | Cherry Chem | Mama Thai | Sour Diesel | Terple | Valley Fire | White Cookies
Cannabinoid pathway
Acyl activating enzyme1 | H9A1V3_CANSA | 80.63 | 160.44 | 316.06 | 840.92 | 377.29 | 397.99 | 93.59 | 188.84 | 229.65
Olivetol synthase | OLIS_CANSA | 3,946.85 | 9,454.00 | 10,400.03 | 14,619.66 | 17,955.05 | 4,984.60 | 9,706.06 | 11,374.75 | 12,373.11
Geranyl diphosphate:olivetolate geranyltransferase | CsPT1 | 422.42 | 222.42 | 189.43 | 407.76 | 649.37 | 263.62 | 246.13 | 175.87 | 115.21
CBDA synthase | CBDAS_CANSA | n.d. | n.d. | 1,282.46 | n.d. | n.d. | 18.39 | n.d. | n.d. | n.d.
THCA synthase | THCAS_CANSA | 885.17 | 423.29 | 1,203.31 | 2,321.64 | 2,317.68 | 1,557.54 | 619.22 | 309.23 | 524.08
MEP pathway
1-Deoxy-D-xylulose-5-phosphate synthase | A0A1V0QSH6_CANSA | 221.85 | 284.41 | 412.74 | 1,627.02 | 319.76 | 1,751.70 | 533.57 | 288.69 | 16.32
1-Deoxy-D-xylulose 5-phosphate reductoisomerase | A0A1V0QSG8_CANSA | 172.63 | 228.15 | 185.07 | 667.96 | 304.62 | 117.92 | 176.79 | 256.25 | 16.01
2-C-Methyl-D-erythritol 4-phosphate cytidylyltransferase | A0A1V0QSI6_CANSA | 36.77 | 95.99 | 96.25 | 168.24 | 160.40 | 146.38 | 46.96 | 75.40 | 64.73
4-(Cytidine 5′-diphospho)-2-C-methyl-D-erythritol kinase | A0A1V0QSI2_CANSA | 35.20 | 3.70 | 67.94 | 211.85 | 212.43 | 109.88 | 57.60 | 104.05 | 80.23
2-C-Methyl-D-erythritol 2,4-cyclodiphosphate synthase | G9C075_HUMLU | 67.75 | 118.23 | 315.86 | 338.21 | 184.98 | 419.84 | 69.75 | 171.17 | 207.15
(E)-4-Hydroxy-3-methylbut-2-enyl-diphosphate synthase | A0A1V0QSG3_CANSA | 107.65 | 287.57 | 794.25 | 744.09 | 444.09 | 596.36 | 349.56 | 297.07 | 317.55
(E)-4-Hydroxy-3-methylbut-2-enyl-diphosphate reductase | A0A1V0QSH9_CANSA | 1,485.98 | 561.96 | 3,447.50 | 3,468.57 | 3,090.49 | 3,024.22 | 1,889.37 | 1,031.90 | 4,544.35
Isopentenyl diphosphate isomerase | A0A1V0QSG5_CANSA | 165.10 | 272.72 | 433.46 | 1,836.07 | 306.03 | 347.85 | 476.86 | 509.70 | 9.96
Mevalonate pathway
Acetoacetyl-CoA thiolase | A0A1V0QSH3_CANSA | 38.35 | 11.90 | 253.38 | 302.58 | 313.99 | 134.71 | 252.40 | 54.35 | 248.13
3-Hydroxy-3-methylglutaryl-CoA synthase | A0A1V0QSH3_CANSA | 13.44 | 22.98 | 20.81 | 21.60 | 27.81 | 34.33 | 9.24 | 19.32 | 91.24
3-Hydroxy-3-methylglutaryl-CoA reductase | A0A1V0QSF5_CANSA | 26.69 | 56.93 | 21.92 | 43.41 | 29.05 | 107.71 | 19.75 | 69.30 | 48.26
Mevalonate kinase | A0A1V0QSI0_CANSA | 1.63 | 1,449.32 | 3.63 | 3.41 | 5.81 | 4.75 | 2.45 | 5.93 | 5.05
Phosphomevalonate kinase | A0A1V0QSH8_CANSA | 3.68 | 7.58 | 7.99 | 6.63 | 8.09 | 6.03 | 3.81 | 7.40 | 305.27
Mevalonate diphosphate decarboxylase | A0A1V0QSG4_CANSA | 5.00 | 11.89 | 10.21 | 14.89 | 21.24 | 19.39 | 9.67 | 9.64 | 9.96
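For readers unfamiliar with the TPM unit used in Table 2, the following is a minimal sketch of the standard transcripts-per-million definition. It is a generic illustration, not the quantification pipeline used in this study; the function name and the example counts and lengths are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts and lengths_bp are parallel lists (one entry per transcript).
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Reads per kilobase: corrects for longer transcripts attracting more reads.
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    # Scale so that the values of one sample sum to one million.
    scale = sum(rpk) / 1_000_000.0
    return [r / scale for r in rpk]


# Hypothetical counts and transcript lengths (not data from this study).
example_tpm = tpm([100, 200, 300], [1000, 2000, 3000])
```

Because the values of each sample sum to one million, TPM values can be compared across strains as relative shares of the transcript pool.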
Plant Physiol. Vol. 180, 2019 1885
Coregulation of Cannabinoid and Terpenoid Pathways
www.plantphysiol.orgon July 31, 2019 - Published by Downloaded from
Copyright © 2019 American Society of Plant Biologists. All rights reserved.
et al., 2006; Laverty et al., 2019; Table 2). Interestingly, the
THCA synthase sequences were essentially identical,
with the exception of that of the Canna Tsu strain, the only
CBDA accumulator in our pilot study (Supplemental Fig.
S1). Consequently, a full-length CBDA synthase gene was
expressed only in the Canna Tsu strain (Supplemental
Fig. S2), which is novel information that furthers our
understanding of the mechanisms underlying CBDA ac-
cumulation. Finally, the yellow-colored module (which as
mentioned above contained THCA synthase) also com-
prised cannabigerolic acid synthase (Table 2), the gene
preceding THCA synthase in the cannabinoid pathway
(Fig. 1), thereby providing additional evidence for gene-
to-metabolite correlation in the cannabinoid pathway.
We then asked if similar gene-to-metabolite correla-
tions occurred in the terpenoid pathway. Interestingly,
two coexpression modules (indicated by black and
yellow color in Fig. 5A) correlated with β-myrcene accumulation (Fig. 5B).
This metabolite is formed by a monoterpene synthase encoded by the CsTPS3FN gene
(Booth et al., 2017), which was contained in one of these
modules (yellow color in Fig. 5A; Table 3). Analogous
gene-to-metabolite correlations were observed for limonene
and CsTPS1FN, α-pinene and CsTPS2FN,
β-ocimene and CsTPS6FN, and β-caryophyllene/α-humulene
and CsTPS9FN (color of modules in
Fig. 5A: black, yellow, yellow, and turquoise, respectively;
terpene synthase annotation based on
Günnewich et al. [2007] and Booth et al. [2017]; Fig. 5B).
Transcripts corresponding to CsTPS5FN (β-myrcene/α-pinene synthase),
CsTPS4FN (alloaromadendrene synthase),
CsTPS8FN (γ-eudesmol/valencene synthase),
and CsTPS13PK (a second β-ocimene synthase;
Booth et al., 2017) remained below the threshold expression
level in our data sets. The corresponding terpenoids
were not detected in the strains investigated,
indicating that the expressed gene complement was
generally sufficient to account for the presence of the
major terpenoids (Table 3). Linalool and nerolidol were
exceptions for which the corresponding terpene syn-
thases had hitherto not been identified from cannabis.
Notably, genes involved in the formation of these ter-
penoids (and others) were cloned and functionally
characterized as part of this study, which contributes
significantly to a better understanding of the genetic
underpinnings of terpenoid diversity.
The yellow module featured prominently in our
gene-to-metabolite correlation analysis for the canna-
binoid and terpenoid pathways. Interestingly, a Gene
Ontology (GO) analysis implied a substantial enrich-
ment of genes involved in terpenoid biosynthesis in the
yellow module (P value of 1.4e-05; Supplemental Table
S3; note that GO terms for cannabinoid biosynthesis as
a biological process have not yet been released). Inter-
estingly, a total of 22 genes involved in the conver-
sion of precursor metabolites into cannabinoid and
terpenoid end products were coexpressed with THCA
synthase (Fig. 5C). Specifically, these genes code for
enzymes involved in glycolysis (conversion of an
imported carbon source into triose phosphates and
pyruvic acid), the MEP pathway toward GPP and ul-
timately monoterpenes, the production of sesquiter-
penes, the formation of olivetolic acid from fatty acid
precursors, and the incorporation of olivetolic acid and
GPP into cannabinoids (Fig. 5D).
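Coexpression analyses of this kind rest on pairwise correlations between expression profiles across samples. As a minimal sketch, the Pearson correlation between two genes can be computed from the strain-level TPM values listed in Table 2 for THCA synthase and olivetol synthase. This is only an illustration of the underlying statistic: it is not the WGCNA procedure, it uses per-strain values rather than individual replicates, and it does not establish module membership for this gene pair.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# TPM values from Table 2, in the table's strain order.
thca_synthase = [885.17, 423.29, 1203.31, 2321.64, 2317.68,
                 1557.54, 619.22, 309.23, 524.08]
olivetol_synthase = [3946.85, 9454.00, 10400.03, 14619.66, 17955.05,
                     4984.60, 9706.06, 11374.75, 12373.11]

r = pearson(thca_synthase, olivetol_synthase)
print(f"Pearson r (THCA synthase vs. olivetol synthase): {r:.2f}")
```

Network approaches apply this kind of measure to every gene pair and then group genes with similar profiles into modules.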
Target Gene Identification and Characterization
Building on our terpenoid profiling and glandu-
lar trichome-specific transcriptome data sets, we
embarked on gene discovery efforts aimed at charac-
terizing terpene synthases associated with the biosyn-
thesis of major monoterpenes and sesquiterpenes
routinely quantified in commercial cannabis testing as
well as other terpenoids that are not assayed routinely.
The analytical chemistry data were employed to assess
which genes would be expected to be expressed to
support the observed terpenoid profiles. We then per-
formed BLASTX searches with previously character-
ized terpene synthases to identify contigs with high
sequence identity in our transcriptome data sets. We
then asked which of the putative cannabis terpene
synthases were expressed at appreciable levels in par-
ticular cannabis strains. Sequences of selected contigs
were then chosen to perform a sequence relatedness
analysis with previously characterized terpene syn-
thases, thereby enabling their categorization by class.
cDNAs of putative terpene synthases were cloned into
appropriate vectors and expressed heterologously in
Escherichia coli, the corresponding recombinant proteins
were purified, and assays were performed with ap-
propriate prenyl diphosphate substrates. Expression
for genes putatively encoding geranyl diphosphate
synthase and trans,trans-farnesyl diphosphate syn-
thase was readily detectable in transcriptome data sets
of all strains; in contrast, no putative orthologs of neryl
diphosphate (NPP) synthase and cis,cis-farnesyl di-
phosphate synthase were recognizable based on se-
quence identity (Supplemental Tables S1 and S2).
Nevertheless, terpene synthase assays were performed
with GPP, NPP, 2-trans,6-trans-farnesyl diphosphate
(tFPP), and 2-cis,6-cis-farnesyl diphosphate (cFPP).
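The candidate selection logic described above (sequence identity to characterized terpene synthases combined with appreciable expression in at least one strain) can be sketched as a simple filter. The contig names, identity values, TPM values, and both cutoffs below are hypothetical placeholders chosen for illustration; the study does not state fixed numerical thresholds.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    identity_pct: float   # best BLASTX identity to a characterized terpene synthase
    tpm_by_strain: dict   # strain name -> transcript abundance (TPM)


def select_candidates(contigs, min_identity=50.0, min_tpm=10.0):
    """Return (contig name, strain with highest expression) for contigs passing both cutoffs.

    min_identity and min_tpm are illustrative assumptions, not values from the study.
    """
    selected = []
    for contig in contigs:
        if not contig.tpm_by_strain:
            continue  # no expression data, cannot evaluate
        best_strain, best_tpm = max(contig.tpm_by_strain.items(), key=lambda kv: kv[1])
        if contig.identity_pct >= min_identity and best_tpm >= min_tpm:
            selected.append((contig.name, best_strain))
    return selected


# Hypothetical contigs (not data from this study).
contigs = [
    Contig("contig_A", 92.0, {"strain_1": 650.0, "strain_2": 2.5}),
    Contig("contig_B", 54.0, {"strain_1": 1.0, "strain_2": 3.0}),
    Contig("contig_C", 31.0, {"strain_1": 400.0, "strain_2": 380.0}),
]
candidates = select_candidates(contigs)
```

Recording the strain with the highest expression alongside each candidate mirrors the practical choice of which strain to clone a given cDNA from.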
β-Myrcene and (−)-limonene were principal monoterpenes
in all strains (Table 1), and, as expected, contigs
with high sequence identity to the previously characterized
β-myrcene and (−)-limonene synthases of cannabis
(Günnewich et al., 2007; Booth et al., 2017), which
belong to the TPS-b clade of terpene synthases (Fig. 6;
Supplemental Table S4), were expressed at high levels
across most strains investigated in this study (Table 2).
Cloning was successful for the corresponding cDNAs
from the Canna Tsu strain (CsTPS14CT and CsTPS15CT),
and a functional evaluation confirmed the annotation
[(−)-limonene synthase and β-myrcene synthase, respectively;
Fig. 7, A and B]. The translated peptide sequences of
β-myrcene synthases (CsTPS3FN and CsTPS15CT; excluding
plastidial targeting sequence) had 13 mismatches
(Supplemental Fig. S3) but identical specificity
(100% β-myrcene as product with GPP as substrate).
Table 3. Transcript abundance (in TPM) for terpene synthases across cannabis strains
n.d., Not detectable.
Gene | GenBank Accession No. | CsTPS Identifier | Blackberry Kush | Black Lime | Canna Tsu | Cherry Chem | Valley Fire | Mama Thai | Sour Diesel | Terple | White Cookies
Monoterpene synthases (TPS-b clade)
(−)-Limonene synthase (a) | MK801766 | CsTPS14CT | 646.24 | 898.94 | 612.37 | 651.86 | 2,272.48 | 751.48 | 201.94 | 2.46 | 895.86
(+)-α-Pinene synthase (b) | KY014565 | CsTPS2FN | 217.36 | 2,041.33 | 1,554.77 | 101.32 | 96.90 | n.d. | n.d. | 1,298.95 | 49.52
β-Myrcene synthase (a) | MK801765 | CsTPS15CT | 183.29 | 597.88 | 325.85 | 272.65 | 157.78 | 254.10 | 183.29 | 436.63 | n.d.
β-Myrcene/(−)-α-pinene synthase (b) | KY014560 | CsTPS5FN | 217.59 | 640.97 | 483.09 | 547.24 | 157.78 | 445.85 | 125.94 | 472.33 | 50.51
(E)-β-Ocimene synthase (b) | KY014563 | CsTPS6FN | n.d. | n.d. | n.d. | n.d. | n.d. | 103.41 | n.d. | 191.65 | n.d.
(Z)-β-Ocimene synthase (b) | KY014558 | CsTPS13PK | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d.
Acyclic terpene synthases (TPS-g clade)
(E)-Nerolidol/(+)-linalool synthase (a) | MK801764 | CsTPS18VF | 2.82 | 9.41 | 2.62 | 16.21 | 16.39 | 2.51 | 4.80 | 16.77 | 8.76
(E)-Nerolidol/linalool synthase (a) | MK801763 | CsTPS19BL | 56.78 | 81.13 | 27.22 | 80.23 | 249.23 | 62.53 | 47.73 | 90.86 | 66.47
Sesquiterpene synthases (TPS-a clade)
Alloaromadendrene synthase (b) | KY014564 | CsTPS4FN | n.d. | 108.92 | n.d. | 639.56 | n.d. | 329.87 | 148.17 | n.d. | 323.36
γ-Eudesmol/valencene synthase (putative) (b) | KY014556 | CsTPS8FN | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d.
δ-Selinene synthase (putative) (b) | KY014554 | CsTPS7FN | 356.34 | n.d. | 367.47 | n.d. | 316.74 | 210.58 | n.d. | n.d. | 268.50
β-Caryophyllene/α-humulene synthase (b) | KY014555 | CsTPS9FN | 764.18 | 794.46 | 435.11 | 3,241.85 | 1,090.94 | 738.74 | 555.25 | 495.72 | 591.86
Germacrene B synthase (a) | MK131289 | CsTPS16CC | 16.14 | 19.44 | 9.13 | 156.08 | 20.60 | 40.36 | 20.22 | 7.19 | 22.72
Hedycaryol synthase (a) | MK801762 | CsTPS20CT | 310.43 | 27.00 | 498.70 | 98.21 | 19.35 | 11.98 | 17.67 | 0.00 | 17.02
(a) Functionally characterized as part of this study.
(b) From Booth et al. (2017).
The sequence of the (−)-limonene synthase characterized
as part of this study (CsTPS14CT; excluding plastidial
targeting sequence) had two mismatches when
compared with CsTPS1SK and nine mismatches when
compared with CsTPS1FN (Supplemental Fig. S3). As
described for CsTPS1SK, CsTPS14CT generated several
other products, and we report the stereochemistry of
those (Fig. 7A).
The monoterpene linalool was accumulated to fairly
high amounts in the Valley Fire and White Cookies
strains, whereas the sesquiterpene nerolidol was
quantifiable only in the Black Lime strain (Table 1).
Contigs with moderate sequence identity (slightly
above 50%) to bifunctional nerolidol/linalool syn-
thases (strawberry [Fragaria spp.; Aharoni et al., 2004]
and snapdragon [Antirrhinum majus; Nagegowda
et al., 2008]) and considerable expression in glandular
trichomes were identified in our transcriptome data
sets (Table 3), and corresponding cDNAs were cloned
from the Valley Fire (CsTPS18VF) and Black Lime
(CsTPS19BL) strains. These sequences belong to the
TPS-g clade of terpene synthases (Fig. 6; Supplemental
Table S4). Heterologous expression and functional
characterization confirmed that the corresponding re-
combinant proteins were able to catalyze the formation
of (E)-nerolidol from tFPP and linalool from GPP, but
no activity was detected with NPP or cFPP (Fig. 8).
Interestingly, follow-up chiral separation of products
from assays performed with GPP as substrate indicated
that CsTPS18VF generated almost exclusively (+)-linalool,
whereas CsTPS19BL produced a mixture of
(−)-linalool and (+)-linalool (Fig. 7, C and D). Sequence
differences across sesquiterpene synthases with differ-
ent product profiles included residues with potential
roles in catalysis (Fig. 9), and the implications are
evaluated in “Discussion.”
To further investigate the genetic potential for
generating terpenoid chemical diversity, two representatives
of the TPS-a clade of terpene synthases
(CsTPS16CC and CsTPS20CT) were selected for
functional characterization. CsTPS16CC had very
high expression levels in the ‘Cherry Chem’ strain
(Table 3). The sequence was most similar to that of the
previously characterized alloaromadendrene synthase
(Booth et al., 2017; Fig. 6; Supplemental Table
S4). In our assays, the recombinant protein generated
germacrene B from tFPP (Fig. 8C), with γ-elemene
being detected as a thermal breakdown product (de
Kraker et al., 1998). Other prenyl diphosphates were
not accepted as substrates with appreciable
conversion rates (Fig. 8). The ‘Canna Tsu’ strain
had a particularly high expression level of CsTPS20CT
Figure 6. Maximum likelihood phylogenetic tree of selected, functionally
characterized terpene synthases. The tree is rooted with the ancestral
ent-kaurene synthase of Physcomitrella patens (PpCPS/KS). A color code is used
to indicate different clades (yellow, TPS-a; green, TPS-b; and purple, TPS-g).
Abbreviations are as follows: BL, Black Lime; CC, Cherry Chem; Cs,
Cannabis sativa; CT, Canna Tsu; FN, Finola; FRAAN, Fragaria ×ananassa;
FRAVE, Fragaria vesca; HUMLU, Humulus lupulus; OCIBA, Ocimum basilicum;
ROSRU, Rosa rugosa; SALOF, Salvia officinalis; VF, Valley Fire; VITVI,
Vitis vinifera. The accession numbers and sequences of the terpene synthases
are provided in Supplemental Table S4.
Figure 7. Functional characterization of cannabis terpene synthases that act on GPP as substrate. Left, chiral gas chromatography
(GC) scans; center, mass spectra of primary products; right, product distribution. A, (−)-Limonene synthase (CsTPS14CT). B,
β-Myrcene synthase (CsTPS15CT). C, (E)-Nerolidol/(+)-linalool synthase (CsTPS18VF). D, (E)-Nerolidol/(+)-linalool synthase
(CsTPS19BL).
Figure 8. Functional characterization of cannabis terpene synthases that act on tFPP as substrate. Left, GC-mass spectrometry
scans; center, mass spectra of primary products; right, product distribution. A, (E)-Nerolidol/(+)-linalool synthase (CsTPS18VF). B,
(E)-Nerolidol/(+)-linalool synthase (CsTPS19BL). C, Germacrene B synthase (CsTPS16CC). D, Hedycaryol synthase (CsTPS20CT).
(Table 3). Its closest neighbor in the sequence relatedness
tree was a putative δ-selinene synthase from
cannabis (Booth et al., 2017; Fig. 6; Supplemental
Table S4). Functional assays with the purified, recombinant
protein indicated a conversion of tFPP to elemol,
a thermal breakdown product of the sesquiterpene
hedycaryol (Koo and Gang, 2012; Hattan et al., 2016), but
there was little or no activity with other prenyl diphosphate
substrates (Fig. 8D). In summary, we demonstrate
that the resources and approaches described here
can be employed to identify candidates and subse-
quently characterize functions of terpene synthase genes
that belong to three different clades, thereby contribut-
ing to a better understanding of the genetic determinants
of terpenoid chemical diversity in cannabis.
DISCUSSION
Utility of Transcript Profiling for Strain Differentiation
Competition in decriminalized retail markets for
cannabis has put pressure on breeders to differentiate
their product from that of their competitors. This has
led to branding with a plethora of distinct and memo-
rable names, which has caused both confusion and
controversy (Small, 2015). Chemical profiling can be
employed as a powerful tool in strain differentiation,
but adding genotyping information further increases
the resolution of the analysis. The differentiation of
drug-type and fiber-type cannabis strains can be ach-
ieved with standard genotyping analyses (Piluzza et al.,
2013). However, a differentiation of genetically related
strains has been much more challenging (Sawler et al.,
2015; Punja et al., 2017). Traditional genotyping
approaches benefit significantly from high-quality ref-
erence genome sequences (Scheben et al., 2017), but,
unfortunately, only fairly low-quality genome se-
quences have been published for two cannabis strains
(van Bakel et al., 2011). We employed RNA-seq as an
alternative approach for genotyping (Haseneyer et al.,
2011), which does not depend on prior sequence data
(Wang et al., 2009). We used RNA-seq to obtain the
transcriptome of glandular trichome cells of nine se-
lected cannabis strains (with three biological replicates
each). Importantly, statistical analyses of these data sets
allowed the differentiation of strains into broader
clades (descendants of landraces of C. sativa or C. indica)
Figure 9. Variation of the residue putatively stabilizing carbocation intermediates correlates with outcome of catalysis in cannabis
sesquiterpene synthases. A, Sequence alignment of sesquiterpene synthases (with carbocation-stabilizing residues highlighted).
B, Proposed cyclization reactions catalyzed by sesquiterpene synthases. Identifiers for sequences from the literature
(Aharoni et al., 2004; Nagegowda et al., 2008) are as follows: AmNES/LIS1, EF433761; AmNES/LIS2, EF433762; FvNES1,
AX529002; FaNES2, AX529067; FaNES1, KX450224 (with species abbreviations as follows: Am, Antirrhinum majus; Fa, Fragaria
×ananassa; Fv, Fragaria vesca).
but also resulted in the full separation of all individual
strains (with biological replicates clustering closely to-
gether; Fig. 3). We fully recognize that RNA-seq is not a
viable option for routine genotyping, but it can be used
to develop single-nucleotide polymorphism-based
genotyping platforms. This approach has been employed
successfully for a number of crops, including alfalfa
(Medicago sativa; Yang et al., 2011), maize (Zea mays; Hansey
et al., 2012), and wheat (Triticum aestivum; Ramirez-Gonzalez
et al., 2015). Our data sets are therefore highly
valuable for building resources for follow-up research with
cannabis. As an added benefit, RNA-seq data can be used
for gene expression analysis, thereby providing a func-
tional context, which is discussed in more detail below.
Utility of Metabolite Profiling for Strain Differentiation
We assessed the utility of cannabinoid and terpenoid
profiling, in addition to strain differentiation by geno-
typing as discussed above, to demarcate nine com-
mercial cannabis strains. Two independent statistical
approaches, PCA and OPLS-DA, grouped biological
replicates closely together while still separating indi-
vidual strains and classes of strains (those of C. sativa or
C. indica heritage; Fig. 4). Several authors have advo-
cated the profiling of both cannabinoids and terpenoids
in recent publications (Fischedick et al., 2010; Elzinga
et al., 2015; Aizpurua-Olaizola et al., 2016; Hazekamp
et al., 2016; Fischedick, 2017; Lewis et al., 2018;
Orser et al., 2018; Richins et al., 2018; Sexton et al., 2018).
The key advantage of this approach over merely pro-
filing cannabinoids lies in the enormous diversity of
terpenoids accumulated in cannabis (and in other
plants as well), which significantly increases the power
of statistical analyses. It also reflects the fact that many
users select cannabis strains based on both the reported
THC content and aroma (which is largely imparted by
terpenoids; Gilbert and DiVerdi, 2018). A comprehen-
sive analysis of cannabis strains recently indicated the
presence of close to 200 detectable volatiles, which were
tentatively identified based on searches against various
spectral databases (Rice and Koziel, 2015). A notable
challenge with terpenoid profiling pertains to the lim-
itation that authentic standards are often very costly or
unavailable from commercial sources, which is partic-
ularly true for sesquiterpenes (dozens detected by Rice
and Koziel [2015]). Commercial cannabis testing labo-
ratories therefore rarely offer services that comprise
more than 20 terpenoids. While such analyses may
detect the most abundant terpenoids for popular
strains, it is not unlikely that important aroma volatiles
with a low odor detection threshold could be missed
(Chin and Marriott, 2015). Another reason why a
comprehensive profiling of terpenoids would be de-
sirable relates to testing the validity of the entourage
effect, the proposed synergism between cannabinoids
and other constituents (in particular terpenoids) that
might affect the experience of the user (Gertsch et al.,
2008; Russo, 2011). Should such effects be substantiated
by empirical evidence, it would be advisable to recon-
sider the current laws and rules for formulations con-
taining cannabis extracts, which are based solely on
THC. An improved understanding of terpenoid phy-
tochemistry in cannabis would be an important first
step in this direction (Booth and Bohlmann, 2019).
Coregulation of Metabolic Pathways in Cannabis Is
Consistent with Gene Expression Patterns Commonly
Observed in Glandular Trichomes
Our statistical analyses using the WGCNA package
indicated a tight correlation of biosynthetic genes with
cannabinoid and terpenoid end products (Fig. 5). We
recently performed a meta-analysis of gene expression
patterns in glandular trichomes across various species
(Zager and Lange, 2018). One of the conclusions, con-
sistent with the data presented here, was that gene ex-
pression patterns correlate well with the metabolic
specialization in these anatomical structures. Cor-
egulation has been observed for genes across multiple
pathways of specialized metabolism, such as cannabi-
noids and terpenoids (this study), monoterpenes and
diterpenes (Salvia pomifera; Trikka et al., 2015), flavo-
noids and acyl sugars (Salpiglossis sinuata and Solanum
quitoense; Moghe et al., 2017), and bitter acids and pre-
nylflavonoids (Humulus lupulus; Kavalier et al., 2011;
Clark et al., 2013). These tight gene-to-metabolite cor-
relations were also reflective of predicted fluxes
through the relevant pathways (Zager and Lange,
2018). In contrast, gene expression patterns appear to
be less predictive of fluxes through central carbon me-
tabolism, where regulation at the protein level plays a
more significant role (Paul and Pellny, 2003; Koch, 2004;
Gibon et al., 2006; Schwender et al., 2014; Rocca et al.,
2015). This does not mean that feedback regulation of
specialized metabolism is negligible in glandular tri-
chomes; there is just a particularly strong overall gene-
to-metabolite correlation, and unraveling the details
will be an exciting topic for future research.
Functional Characterization of Terpene Synthases
Contributes to an Improved Understanding of the Genetic
Determinants of Terpenoid Diversity
The observed gene-to-metabolite correlations in
cannabis glandular trichomes provided opportunities
for gene discovery efforts. Booth et al. (2017) analyzed
transcriptome data sets obtained with the Finola and
Purple Kush strains to obtain candidate genes for ter-
pene synthases that were subsequently characterized to
encode enzymes for the production of 14 monoterpenes
and sesquiterpenes. Those that contribute to the for-
mation of some of the common monoterpenes and
sesquiterpenes [e.g. β-myrcene, (−)-limonene, α-pinene,
β-caryophyllene, and α-humulene] were found to be
expressed at fairly high levels across the strains in-
cluded in this analysis, whereas those that generate less
common products [e.g. (Z)-β-ocimene, γ-eudesmol,
alloaromadendrene, δ-selinene, and valencene] were
found to be expressed only in a limited number of
strains or not at all (Table 3). To assess sequence vari-
ation among these genes, we cloned genes with high
sequence identity to the previously characterized
β-myrcene and (−)-limonene synthases.
Prior to this study, a notable gap existed with regard
to the terpene synthases underlying the formation of
the monoterpene linalool and the sesquiterpene ner-
olidol, which are both common constituents in cannabis
resin. We have now identified a gene coding for an enzyme
(CsTPS19BL) that generates a mixture of (+)-linalool
and (−)-linalool from GPP and (E)-nerolidol from tFPP
in the Black Lime strain. We also cloned a putative
ortholog from the Valley Fire strain to evaluate the ef-
fects of sequence variation. Interestingly, the encoded
enzyme (CsTPS18VF) had the same specificity as
CsTPS19BL with regard to the tFPP substrate [(E)-
nerolidol as product]; however, with GPP as substrate,
(+)-linalool was detected as the essentially exclusive
product. This difference in specificity is surprising
given that the peptide sequences have only three mis-
matches (Supplemental Fig. S3).
Finally, we cloned genes that, based on sequence
relatedness, were expected to code for enzymes that
generate sesquiterpene products not previously detec-
ted in assays with cannabis terpene synthases. Indeed,
CsTPS16CC was demonstrated to produce germacrene
B and CsTPS20CT formed hedycaryol as primary product.
In assays with CsTPS16CC, γ-elemene was also
detected, but this is a well-known product of thermal
degradation in the GC inlet (de Kraker et al., 1998).
Elemol was the sole product of assays with CsTPS20CT,
which is also a thermal degradation product, in this
case of hedycaryol (Koo and Gang, 2012; Hattan et al.,
2016). Consequently, the enzyme activities are referred
to as germacrene B synthase and hedycaryol synthase,
respectively. To the best of our knowledge, the sesqui-
terpenes generated by these terpene synthases (ger-
macrene B and hedycaryol) have not been identified in
cannabis samples yet, indicating the need for a more
comprehensive coverage of terpenoids to better un-
derstand strain-specific aroma profiles. It should also be
noted that several recent studies reporting on compre-
hensive chemical and sensory analyses of volatiles
emitted from cannabis found that nonterpenoid alco-
hols and aldehydes have potent odor impacts (Rice and
Koziel, 2015; Wiebelhaus et al., 2016; Calvi et al., 2018).
These considerations indicate that more emphasis
needs to be placed on comprehensive metabolite pro-
filing, including cannabinoids and terpenoids but also
extending to other volatiles, for future efforts focused
on strain characterization.
With a larger number of functionally characterized
genes in cannabis, sequence comparisons are now
allowing us to ask questions about some of the deter-
minants of specificity. The overall sequence identity of
the sesquiterpene synthases characterized here is fairly
low (less than 70% at the amino acid level), but there are
striking differences in the nature of a conserved aro-
matic residue (Tyr-527) that had previously been hy-
pothesized to stabilize the positive charge of the
carbocation occurring during the formation of a ger-
macrene intermediate in the epi-aristolochene synthase
catalytic sequence (Starks et al., 1997). The equivalent
residues in sesquiterpene synthases that catalyze the
formation of cyclic products (CsTPS16CC and CsTPS20CT)
are also Tyr residues (Fig. 9). In contrast, Gln residues oc-
cupy this position in CsTPS18VF, CsTPS19BL, and other
characterized enzymes of the TPS-g clade (Fig. 9A; Aharoni
et al., 2004; Nagegowda et al., 2008), which, possibly
because of insufficient carbocation stabilization, generate
(E)-nerolidol as a noncyclic product (Fig. 9). Testing this
hypothesis will be an important future goal for follow-up
research.
MATERIALS AND METHODS
Plant Materials and Chemicals
Clonal plant cuttings of nine Cannabis sativa (cannabis) strains (Sour Diesel,
Canna Tsu, Black Lime, Valley Fire, White Cookies, Mama Thai, Terple, Cherry
Chem, and Blackberry Kush) were placed in 250-L pots and grown in hoop-
style, light-deprivation greenhouses at Shadowbox Farms in Williams, Oregon,
under an 18-h-light/6-h-dark regime (natural light) to stimulate vegetative
growth, before shifting to a 12-h-light/12-h-dark cycle to induce flowering. The
length of these time periods varied from strain to strain and was adjusted based
on phenotypic evaluations. All aspects of plant growth, harvest, and transport
were performed in accordance with the laws and rules under Chapter 475B, as
released by the Oregon Liquor Control Commission. Plant harvest was per-
formed when the consistency of glandular trichome content had changed from
a turbid white to clear and before another change to an amber-like color oc-
curred. For most strains, the pistils had changed color from white to yellow or
orange. Buds were harvested, parts with low glandular trichome content were
removed using scissors, and the remainder were placed on ice until further
processing (always within 3 h). Monoterpene and sesquiterpene reference
standards were purchased from Restek. Cannabinoid reference standards were
obtained from Sigma-Aldrich. Solvents for extraction were procured from
Sigma-Aldrich. Solvents and chemicals for chromatography were sourced from
Burdick & Jackson. Substrates for enzyme assays (GPP, NPP, and E,E-FPP)
were prepared synthetically (Davisson et al., 1986) or obtained from a commercial
source (Z,Z-FPP; Echelon Biosciences). The sources of standards for
enzyme assays were as follows: germacrene B, isolated as a side product from
assays with germacrene C synthase (Colby et al., 1998); γ-elemene, obtained by
heating germacrene B under argon (de Kraker et al., 1998); elemol, institutional
chemical repository (originally purchased from Parchem); hedycaryol, institutional
chemical repository (source unknown); (S)-(+)-linalool, isolated from coriander
(Coriandrum sativum) oil; (−)-limonene, (+)-limonene, (R)-(−)-linalool,
β-myrcene, (E)-nerolidol, (−)-α-pinene, (−)-β-pinene, and α-terpinolene, all
purchased from Sigma-Aldrich.
Metabolite Extraction and Analysis
Cannabinoids and terpenoids were extracted and quantified according to
Fischedick et al. (2010), with modifications, at a testing facility with accredita-
tion by ISO/IEC 17025 and licensed through the National Environmental
Laboratory Accreditation Program (Evio Labs). Briefly, roughly 2 g of fresh bud
tissue was crushed in a Falcon tube, suspended in 10 mL of methyl tert-butyl
ether (containing 1-octanol as internal standard) with gentle shaking for 15 min,
followed by centrifugation at 2,000g for 5 min. The supernatant was transferred
to a new vial, and the plant material was extracted two more times as above (no
addition of internal standard to solvent). The combined supernatants were
filtered through a polytetrafluoroethylene syringe filter (0.45 µm pore size,
25 mm diameter), and an aliquot was transferred to a screw-cap glass vial,
which was stored at −20°C until further analysis. Following extraction, the
remaining plant material was dried in an oven (50°C) and weighed to determine
dry weights for each sample.
Cannabinoids were separated via HPLC (model LC-2030C; Shimadzu) using a
Kinetex C18 reverse-phase column (50 × 4.6 mm, 2.6 µm particle size; Phenomenex)
and a binary gradient of solvent A (water containing 0.1% [v/v] formic acid
and 10 mM ammonium formate) and solvent B (methanol containing 0.05% [v/v]
formic acid) with the following settings: 0 to 9 min, 68% to 78% B; 9 to 11.9 min,
78% to 100% B; 11.9 to 13.5 min, hold at 100% B. Analytes were monitored at 228
nm in a diode array detector. Peak identification was achieved based on comparisons
of retention times and spectral characteristics with those of authentic
cannabinoid reference standards. Analytes were quantified based on calibration
curves acquired with authentic standards. The validation of the analytical method
was performed according to Fischedick et al. (2010).
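The HPLC gradient settings listed above can be written as a small table of time points, from which the solvent B fraction at any time follows by interpolation. The sketch below assumes that each ramp is linear between the stated end points, which the text implies but does not state explicitly; the names GRADIENT and percent_b are illustrative only.

```python
# (time in min, % solvent B) break points taken from the gradient description.
GRADIENT = [(0.0, 68.0), (9.0, 78.0), (11.9, 100.0), (13.5, 100.0)]


def percent_b(t_min):
    """Return % solvent B at time t_min, assuming linear ramps between break points."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time is outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise RuntimeError("unreachable for a well-formed gradient table")


midpoint_b = percent_b(4.5)   # halfway through the first ramp
final_b = percent_b(13.0)     # during the hold at 100% B
```

Under the linearity assumption, the first ramp rises by 10 percentage points over 9 min, so the midpoint at 4.5 min corresponds to 73% B.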
Terpenoids were separated via GC (model 6890; Agilent Technologies) using
a DB5 column (30 m × 25 mm, 25 µm film thickness; Agilent Technologies) and
detected with a flame ionization detector. The conditions for separation were as
follows: injector at 250°C, 20:1 split injection mode (1 µL injected); detector at
250°C (H₂ flow at 30 mL min⁻¹, airflow at 400 mL min⁻¹, makeup flow [He] at
25 mL min⁻¹); oven heating from 40°C to 120°C at 2°C min⁻¹, then ramped to
200°C at 50°C min⁻¹, with a final hold at 200°C for 2 min. GC peaks were
identified based on comparisons of retention times with those of authentic
standards (purchased from Sigma-Aldrich). Analytes were quantified based on
calibration curves acquired with authentic standards. The validation of the
analytical method was performed according to Fischedick et al. (2010).
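As a quick check on the oven program, the total run time follows from the two ramps and the final hold. The sketch below assumes that no initial hold at 40°C is used (none is stated); `OVEN_PROGRAM` and `run_time_min` are illustrative names.

```python
# Oven ramps taken from the text: (start °C, end °C, rate in °C per min).
OVEN_PROGRAM = [
    (40.0, 120.0, 2.0),
    (120.0, 200.0, 50.0),
]
FINAL_HOLD_MIN = 2.0  # final hold at 200°C


def run_time_min(ramps=OVEN_PROGRAM, final_hold=FINAL_HOLD_MIN):
    """Sum the ramp durations and the final hold (assumes no initial hold)."""
    return sum((end - start) / rate for start, end, rate in ramps) + final_hold


if __name__ == "__main__":
    # 40 min (first ramp) + 1.6 min (second ramp) + 2 min (hold)
    print(f"Total run time: {run_time_min():.1f} min")
```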
RNA Isolation from Glandular Trichomes and cDNA
Library Preparation
Secretory cells of glandular trichomes were removed from 10 to 15 g of bud tissue
by surface abrasion and then collected by filtering through a series of nylon meshes
(Lange et al., 2000). Total RNA was isolated from secretory cells using the RNeasy
Plant kit (Qiagen) according to the manufacturer’s instructions. RNA integrity was
determined using a BioAnalyzer 2100 (Agilent Technologies). cDNA libraries from
1 to 2 µg of total RNA were generated using the SuperScript III Reverse Transcriptase
kit (Invitrogen) according to the manufacturer’s instructions.
RNA-Seq and Transcriptome Assembly
RNA-seq libraries were prepared from 250 ng of total glandular trichome
RNA with the Stranded mRNA-Seq Poly(A) Selection kit (KAPA Biosystems).
The quality and quantity of the sequencing library were assessed using a Bio-
analyzer 2100 and a Qubit 3.0 Fluorometer (Agilent Technologies and Life
Technologies). Sequencing of 150-bp paired-end reads was performed on a
HiSeq 4000 instrument (Illumina). Sequenced reads were trimmed of adapter
sequences with Trimmomatic (Bolger et al., 2014), and sequence quality was
checked with FastQC (Andrews, 2010). Trimmed sequences were merged and
assembled using the Trinity de novo assembler, and downstream functional
annotation of the assembly was performed with Trinotate (Haas et al., 2013).
The resulting transcriptome assembly contained 514,208 contigs, with a mean
contig length of 875 bp and an N50 value of 1,529 bp. Transcript abundance in
each RNA-seq data set (three biological replicates per strain) was determined
with RSEM (Li and Dewey, 2011).
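For readers unfamiliar with the assembly statistics quoted above, the sketch below shows how mean contig length and N50 are conventionally computed from a list of contig lengths. The function name `n50` and the toy lengths are illustrative and are not taken from the assembly reported here.

```python
def n50(lengths):
    """Return the N50: the length L of the contig at which the cumulative sum
    of contig lengths, taken from longest to shortest, first reaches at least
    half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


if __name__ == "__main__":
    toy_contigs = [100, 200, 300, 400]  # hypothetical contig lengths in bp
    print("mean length:", sum(toy_contigs) / len(toy_contigs))
    print("N50:", n50(toy_contigs))
```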
Analysis of Global Gene Expression Patterns and
GO Enrichment
Testing for differential gene expression across strains was performed using
the Bioconductor package DESeq2 (version 1.18.1; Love et al., 2014). P values
were adjusted using the Benjamini-Hochberg procedure (Benjamini and
Hochberg, 1995). An adjusted P value (false discovery rate) ≤ 1e-10 and log₂
ratio ≥ 3 were set as thresholds. A cluster analysis of gene expression patterns
between strains was performed within the Trinity suite (Haas et al., 2013) by
partitioning genes into clusters by cutting the hierarchically clustered gene tree
at 60% height of the tree. A GO enrichment analysis of differentially expressed
genes was performed using the GOseq package in R (Young et al., 2010). GO
terms with an adjusted P < 0.01 were considered significantly enriched.
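The multiple-testing adjustment and thresholds can be illustrated with a short sketch. It reimplements the Benjamini-Hochberg procedure in plain Python for illustration and is not the DESeq2 code used in the study; the function names are hypothetical, and applying the log₂ threshold to the absolute value of the ratio is an assumption.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted
    # values never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differential(padj, log2_ratio, padj_max=1e-10, lfc_min=3.0):
    """Apply the thresholds from the text (absolute log2 ratio is assumed)."""
    return padj <= padj_max and abs(log2_ratio) >= lfc_min


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
    print(is_differential(1e-12, 3.5), is_differential(1e-12, 2.0))
```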
Gene Coexpression Network Analysis
A gene coexpression network was built using the WGCNA package in R
(Langfelder and Horvath, 2008). Transcriptome data sets were filtered to
remove genes with an average expression value of 50 TPM or smaller. Coex-
pression modules were identified using the function blockwiseModules with
the following settings: power at 7, mergeCutHeight at 0.55, and minModuleSize
at 30. Eigengene values were determined for each coexpression module to test
for association significance. Modules with similar eigengene values were
merged to obtain the final coexpression modules.
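The expression filter and the soft-thresholding step that underlies WGCNA can be sketched as follows. This is an illustration only, not the blockwiseModules implementation: it assumes an unsigned network (adjacency equal to the absolute Pearson correlation raised to the chosen power), which is not stated in the text, and all names are hypothetical.

```python
from math import sqrt


def filter_by_mean_tpm(expr, min_mean=50.0):
    """Keep genes whose mean TPM across samples is greater than min_mean."""
    return {gene: values for gene, values in expr.items()
            if sum(values) / len(values) > min_mean}


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-zero variance assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def adjacency(x, y, power=7):
    """Unsigned soft-threshold adjacency: |cor(x, y)| ** power."""
    return abs(pearson(x, y)) ** power


if __name__ == "__main__":
    toy = {"geneA": [100, 200, 300], "geneB": [10, 20, 30], "geneC": [300, 200, 100]}
    kept = filter_by_mean_tpm(toy)
    print(sorted(kept))
    print(adjacency(kept["geneA"], kept["geneC"]))
```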
Phylogenetic Analysis of TPS Candidates
The identification of TPS candidate genes was accomplished by searching the
translated transcriptome consensus assembly against a manually curated pro-
tein database specific to characterized plant TPSs using the BLASTx algorithm. A
reciprocal search (tBLASTn) was performed with sequences of 114 characterized
angiosperm TPSs against the assembly for each individual strain. Predicted TPS
sequences were then analyzed for gene expression values across strains.
Translated amino acid sequences of these and reference TPSs (from C. sativa and
Humulus lupulus) were aligned using the MUSCLE algorithm. Alignments were
analyzed with maximum likelihood analysis using a Jones-Taylor-Thornton
model with gamma distribution for rates among amino acid sites. One thou-
sand bootstrap replicates were then used to construct a phylogeny using
MEGA7 (Jones et al., 1992; Kumar et al., 2016).
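Hits from the tBLASTn search can be screened on e-value and bit score using the thresholds reported under Statistical Analyses. The sketch below assumes that hits were exported in the standard 12-column tabular BLAST format (e-value in column 11, bit score in column 12), which is not stated in the text, and it adopts one reading of the criterion (a hit is kept only if it passes both thresholds). The function names and example hits are hypothetical.

```python
def passes_thresholds(evalue, bitscore, max_evalue=1e-3, min_bitscore=250.0):
    """One reading of the filter: keep hits with e-value <= 0.001 and bit score >= 250."""
    return evalue <= max_evalue and bitscore >= min_bitscore


def filter_tabular_hits(lines):
    """Parse 12-column tabular BLAST lines and return the hits that pass."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        evalue, bitscore = float(fields[10]), float(fields[11])
        if passes_thresholds(evalue, bitscore):
            kept.append((fields[0], fields[1], evalue, bitscore))
    return kept


if __name__ == "__main__":
    example = [  # hypothetical hits, not data from this study
        "refTPS1\tcontig_1\t78.5\t550\t118\t0\t1\t550\t10\t1659\t1e-120\t812",
        "refTPS1\tcontig_2\t35.0\t120\t78\t2\t5\t124\t40\t399\t0.02\t48.1",
    ]
    for hit in filter_tabular_hits(example):
        print(hit)
```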
Cloning of TPS cDNAs
First-strand cDNA was prepared from RNA with the SuperScript III First
Strand Synthesis kit (Invitrogen) with random hexamer oligonucleotides. Open
reading frames for TPSs were amplified using gene-specific primers
(Supplemental Table S5; amplicons for full-length cDNAs were generated for
putative sesquiterpene synthases, whereas cDNAs devoid of the plastidial
targeting sequence were amplified for putative monoterpene synthases).
Amplicons were ligated into the pGEM-T Easy vector (Promega) and sequence
verified. For expression in Escherichia coli, full-length or truncated genes were
subcloned into the pSBET expression vector (predigested with NdeI and
BamHI). Several terpene synthase cDNAs (CsTPS18VF, CsTPS19BL, and
CsTPS20CT) were purchased as synthetic products (in the pET28B expression
vector) from GenScript.
In Vitro Functional Assays for Recombinant TPSs
Plasmids were transformed into chemically competent cells of several E. coli
strains [BL21 (DE3), C41 (DE3), C43 (DE3), C43 (DE3) pLysS, and ArcticExpress
(DE3)], which were then grown in 25 mL of liquid Luria-Bertani medium at
37°C with shaking to an OD₆₀₀ of 0.8. Expression of TPS genes was induced with
0.1 or 0.5 mM isopropyl β-D-1-thiogalactopyranoside (Goldbio), and cells were
grown for another 24 h at three different temperatures (16°C, 10°C, and 4°C).
Bacterial cells were harvested by centrifugation at 5,000g and resuspended in
300 µL of MOPSO buffer, pH 7, supplemented with 1 mM DTT (Goldbio). Cells
were lysed using a model 475 sonicator (VirTis), with three 15-s bursts and
cooling on ice between bursts. The resulting homogenate was centrifuged at
15,000g for 30 min at 4°C, and the clear supernatant was mixed with ceramic
hydroxyapatite (Bio-Rad). The purification of recombinant protein was performed
as described by Srividya et al. (2016) for constructs in the pSBET expression
vector, whereas those in the pET28B expression vector were purified
over Ni²⁺ affinity columns according to the manufacturer’s instructions
(Novagen-EMD Millipore). In vitro assays were performed in 2-mL glass vials
containing 200 µg of purified enzyme in MOPSO buffer containing DTT and
MgCl₂ (total volume of 100 µL). A prenyl diphosphate substrate (GPP, NPP,
tFPP, or cFPP) was added to a final concentration of 0.5 mM. The assay mixtures
were overlaid with 100 µL of n-hexane (Avantor) and incubated at 30°C for 16 h
on a multitube rotator (Labquake; Barnstead Thermolyne). The enzymatic reaction
was stopped by vigorous mixing of the contents of the tubes, followed by
30 min at −80°C for phase separation. The organic phase was removed and
transferred to glass vial inserts and stored in GC vials at −20°C until further
analysis.
Enzymatically formed products were analyzed on a 6890N gas chromato-
graph coupled to a 5973 mass selective detector (Agilent). Analyte separation
was achieved under the conditions developed by Adams (2007), which includes
a comprehensive resource for spectral comparisons of volatiles. The chiral
separation of monoterpenes was achieved as described by Turner et al. (2019).
Enzymatically generated products were identified based on retention times and
mass spectral properties when compared with those of authentic standards.
Statistical Analyses
For metabolite analyses, statistical analyses were performed in R using the
MetaboAnalystR package (Chong and Xia, 2018). Quantitative terpenoid and
cannabinoid data were scaled by dividing mean centered values by the SD of
each variable to generate principal component loadings. Principal components
were then plotted in three dimensions within the R environment. OPLS-DA
analysis was also performed in the same way using the MetaboAnalystR
package. Differential gene expression patterns were assessed using the
Bioconductor package DESeq2 (version 1.18.1; Love et al., 2014), with
thresholds of a Benjamini-Hochberg adjusted P value (false discovery rate) of
1e-10 or less and a log₂ fold-change ratio of 3 or greater. Cluster analysis of
differential gene expression was performed within the Trinity suite (Haas et al.,
2013) by cutting the clustered gene tree at 60% tree height, and differentially
expressed genes were subjected to further analysis within GOseq as described
above (Young et al., 2010). TPS candidates were identified based on sequence
identity with functionally characterized TPSs in tBLASTn searches. Candidates
with e-values > 0.001 and bit scores < 250 were removed from further consideration.
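The scaling applied to the metabolite data before principal component analysis (mean centering followed by division by the SD of each variable) can be sketched as below. This is an illustration and not the MetaboAnalystR code; it assumes the sample SD (n − 1 denominator), and the function name and concentrations are hypothetical.

```python
from statistics import mean, stdev


def autoscale(values):
    """Mean-center one variable and divide by its sample SD (n - 1 denominator)."""
    mu = mean(values)
    sd = stdev(values)  # raises StatisticsError for fewer than two values
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    # Hypothetical concentrations of one metabolite across three samples.
    print(autoscale([1.0, 2.0, 3.0]))
```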
Accession Numbers
The raw transcriptome sequence data for cannabis strains are available at the
National Center for Biotechnology Information Sequence Read Archive, project
number PRJNA498707. Nucleotide sequences for genes characterized as part
of this study were deposited in GenBank and received the accession
numbers MK131289 (CsTPS16CC), MK801762 (CsTPS20CT), MK801763
(CsTPS19BL), MK801764 (CsTPS18VF), MK801765 (CsTPS15CT), and
MK801766 (CsTPS14CT).
Supplemental Data
The following supplemental materials are available.
Supplemental Figure S1. Alignment of translated peptide sequences,
based on RNA-seq data, of THCA synthase across cannabis strains.
Supplemental Figure S2. Nucleotide and translated peptide sequence,
based on RNA-seq data, of CBDA synthase from the cannabis strain
Canna Tsu.
Supplemental Figure S3. Alignment of terpene synthase sequences.
Supplemental Table S1. Statistics of de novo assemblies performed based
on cannabis glandular trichome-specific RNA-seq data sets.
Supplemental Table S2. Annotation of transcripts represented in cannabis
glandular trichome-specific RNA-seq data sets.
Supplemental Table S3. Clustering of genes into coexpression modules
obtained by WGCNA of cannabis glandular trichome-specific RNA-seq
data sets.
Supplemental Table S4. Accession numbers and sequences of terpene
synthases considered for phylogenetic analysis.
Supplemental Table S5. Primers used to clone cannabis cDNAs for func-
tional characterization.
ACKNOWLEDGMENTS
This study was supported by gifts from private individuals, and we are
grateful for their generosity. We also thank Shadowbox Farms for allowing A.S.
to harvest plant materials.
Received December 5, 2018; accepted May 15, 2019; published May 28, 2019.
LITERATURE CITED
Abuhasira R, Shbiro L, Landschaft Y (2018) Medical use of cannabis and
cannabinoids containing products: Regulations in Europe and North
America. Eur J Intern Med 49: 2–6
Adams RP (2007) Identification of Essential Oil Components By Gas
Chromatography/Mass Spectrometry, Ed 4. Allured Publishing Corporation,
Carol Stream, IL
Aharoni A, Giri AP, Verstappen FW, Bertea CM, Sevenier R, Sun Z,
Jongsma MA, Schwab W, Bouwmeester HJ (2004) Gain and loss of fruit
flavor compounds produced by wild and cultivated strawberry species.
Plant Cell 16: 3110–3131
Aizpurua-Olaizola O, Soydaner U, Öztürk E, Schibano D, Simsir Y,
Navarro P, Etxebarria N, Usobiaga A (2016) Evolution of the cannabi-
noid and terpene content during the growth of Cannabis sativa plants
from different chemotypes. J Nat Prod 79: 324–331
Andre CM, Hausman JF, Guerriero G (2016) Cannabis sativa: The plant of
the thousand and one molecules. Front Plant Sci 7: 19
Andrews S (2010) FastQC: A quality control tool for high throughput se-
quence data. http://www.bioinformatics.babraham.ac.uk/projects/
fastqc
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A
practical and powerful approach to multiple testing. J R Stat Soc B 57:
289–300
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: A flexible trimmer for
Illumina sequence data. Bioinformatics 30: 2114–2120
Booth JK, Bohlmann J (2019) Terpenes in Cannabis sativa: From plant
genome to humans. Plant Sci 284: 67–72
Booth JK, Page JE, Bohlmann J (2017) Terpene synthases from Cannabis
sativa. PLoS ONE 12: e0173911
Brenneisen R (2007) Chemistry and analysis of phytocannabinoids and
other cannabis constituents. In M.A. ElSohly, ed, Forensic Science and
Medicine. Marijuana and the Cannabinoids. Humana Press, New York,
pp 17–49
Calvi L, Pentimalli D, Panseri S, Giupponi L, Gelmini F, Beretta G, Vitali
D, Bruno M, Zilio E, Pavlovic R, et al (2018) Comprehensive quality
evaluation of medical Cannabis sativa L. inflorescence and macerated oils
based on HS-SPME coupled to GC-MS and LC-HRMS (q-exactive orbitrap®)
approach. J Pharm Biomed Anal 150: 208–219
Cascini F, Aiello C, Di Tanna G (2012) Increasing delta-9-tetrahydrocannabinol
(Δ-9-THC) content in herbal cannabis over time: Systematic review
and meta-analysis. Curr Drug Abuse Rev 5: 32–40
Chin ST, Marriott PJ (2015) Review of the role and methodology of high
resolution approaches in aroma analysis. Anal Chim Acta 854: 1–12
Chong J, Xia J (2018) MetaboAnalystR: An R package for flexible and re-
producible analysis of metabolomics data. Bioinformatics 34: 4313–4314
Clark SM, Vaitheeswaran V, Ambrose SJ, Purves RW, Page JE (2013)
Transcriptome analysis of bitter acid biosynthesis and precursor path-
ways in hop (Humulus lupulus). BMC Plant Biol 13: 12
Colby SM, Crock J, Dowdle-Rizzo B, Lemaux PG, Croteau R (1998)
Germacrene C synthase from Lycopersicon esculentum cv. VFNT cherry
tomato: cDNA isolation, characterization, and bacterial expression of
the multiple product sesquiterpene cyclase. Proc Natl Acad Sci USA 95:
2216–2221
Davisson VJ, Woodside AB, Neal TR, Stremler KE, Muehlbacher M,
Poulter CD (1986) Phosphorylation of isoprenoid alcohols. J Org Chem
51: 4768–4779
de Kraker JW, de Groot A, Franssen MC, König WA, Bouwmeester HJ
(1998) (+)-Germacrene A biosynthesis: The committed step in the biosynthesis
of bitter sesquiterpene lactones in chicory. Plant Physiol 117:
1381–1392
Devane WA, Dysarz FA III, Johnson MR, Melvin LS, Howlett AC (1988)
Determination and characterization of a cannabinoid receptor in rat
brain. Mol Pharmacol 34: 605–613
Devane WA, Hanus L, Breuer A, Pertwee RG, Stevenson LA, Griffin G,
Gibson D, Mandelbaum A, Etinger A, Mechoulam R (1992) Isolation
and structure of a brain constituent that binds to the cannabinoid re-
ceptor. Science 258: 1946–1949
Elzinga S, Fischedick J, Podkolinski R, Raber JC (2015) Cannabinoids and
terpenes as chemotaxonomic markers in cannabis. Nat Prod Chem Res 3:
2
Fellermeier M, Zenk MH (1998) Prenylation of olivetolate by a hemp
transferase yields cannabigerolic acid, the precursor of tetrahydrocan-
nabinol. FEBS Lett 427: 283–285
Fellermeier M, Eisenreich W, Bacher A, Zenk MH (2001) Biosynthesis of
cannabinoids. Incorporation experiments with (13)C-labeled glucoses.
Eur J Biochem 268: 1596–1604
Fischedick JT (2017) Identification of terpenoid chemotypes among high
(−)-trans-Δ9-tetrahydrocannabinol-producing Cannabis sativa L. cultivars.
Cannabis Cannabinoid Res 2: 34–47
Fischedick JT, Hazekamp A, Erkelens T, Choi YH, Verpoorte R (2010)
Metabolic fingerprinting of Cannabis sativa L., cannabinoids and terpe-
noids for chemotaxonomic and drug standardization purposes. Phyto-
chemistry 71: 2058–2073
Gagne SJ, Stout JM, Liu E, Boubakir Z, Clark SM, Page JE (2012) Iden-
tification of olivetolic acid cyclase from Cannabis sativa reveals a unique
catalytic route to plant polyketides. Proc Natl Acad Sci USA 109:
12811–12816
Gaoni Y, Mechoulam R (1964) Isolation, Structure, and Partial Synthesis of
an Active Constituent of Hashish. J Am Chem Soc 86: 1646–1647
Gertsch J, Leonti M, Raduner S, Racz I, Chen JZ, Xie XQ, Altmann KH,
Karsak M, Zimmer A (2008) Beta-caryophyllene is a dietary cannabinoid.
Proc Natl Acad Sci USA 105: 9099–9104
Gertsch J, Pertwee RG, Di Marzo V (2010) Phytocannabinoids beyond the
cannabis plant - do they exist? Br J Pharmacol 160: 523–529
Gibon Y, Usadel B, Blaesing OE, Kamlage B, Hoehne M, Trethewey R,
Stitt M (2006) Integration of metabolite with transcript and enzyme
activity profiling during diurnal cycles in Arabidopsis rosettes. Genome
Biol 7: R76
Gilbert AN, DiVerdi JA (2018) Consumer perceptions of strain differences
in Cannabis aroma. PLoS ONE 13: e0192247
Grattan JHG, Singer CJ (1952) Anglo-Saxon Magic and Medicine. Oxford
University Press, London
Günnewich N, Page JE, Köllner TG, Degenhardt J, Kutchan TM (2007)
Functional expression and characterization of trichome-specific (−)-limonene
synthase and (+)-α-pinene synthase from Cannabis sativa.
Nat Prod Commun 2: 223–232
Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J,
Couger MB, Eccles D, Li B, Lieber M, et al (2013) De novo transcript
sequence reconstruction from RNA-seq using the Trinity platform for
reference generation and analysis. Nat Protoc 8: 1494–1512
Hansey CN, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Buell
CR (2012) Maize (Zea mays L.) genome diversity as revealed by RNA-
sequencing. PLoS ONE 7: e33071
Haseneyer G, Schmutzer T, Seidel M, Zhou R, Mascher M, Schön CC,
Taudien S, Scholz U, Stein N, Mayer KF, et al (2011) From RNA-seq to
large-scale genotyping: Genomics resources for rye (Secale cereale L.).
BMC Plant Biol 11: 131
Hattan J, Shindo K, Ito T, Shibuya Y, Watanabe A, Tagaki C, Ohno F,
Sasaki T, Ishii J, Kondo A, et al (2016) Identification of a novel hedy-
caryol synthase gene isolated from Camellia brevistyla flowers and floral
scent of Camellia cultivars. Planta 243: 959–972
Hazekamp A, Tejkalová K, Papadimitriou S (2016) Cannabis: From cul-
tivar to chemovar. II. A metabolomics approach to Cannabis classifica-
tion. Cannabis Cannabinoid Res 1: 202–215
Hemmerlin A, Harwood JL, Bach TJ (2012) A raison d’être for two distinct
pathways in the early steps of plant isoprenoid biosynthesis? Prog Lipid
Res 51: 95–148
Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mu-
tation data matrices from protein sequences. Comput Appl Biosci 8:
275–282
Kavalier AR, Litt A, Ma C, Pitra NJ, Coles MC, Kennelly EJ, Matthews PD
(2011) Phytochemical and morphological characterization of hop (Hu-
mulus lupulus L.) cones over five developmental stages using high per-
formance liquid chromatography coupled to time-of-flight mass
spectrometry, ultrahigh performance liquid chromatography photodi-
ode array detection, and light microscopy techniques. J Agric Food
Chem 59: 4783–4793
Koch K (2004) Sucrose metabolism: Regulatory mechanisms and pivotal
roles in sugar sensing and plant development. Curr Opin Plant Biol 7:
235–246
Kojoma M, Seki H, Yoshida S, Muranaka T (2006) DNA polymorphisms
in the tetrahydrocannabinolic acid (THCA) synthase gene in “drug-type”
and “fiber-type” Cannabis sativa L. Forensic Sci Int 159: 132–140
Koo HJ, Gang DR (2012) Suites of terpene synthases explain differential
terpenoid production in ginger and turmeric tissues. PLoS ONE 7:
e51481
Kumar S, Stecher G, Tamura K (2016) MEGA7: Molecular Evolutionary
Genetics Analysis version 7.0 for bigger datasets. Mol Biol Evol 33:
1870–1874
Lange BM, Wildung MR, Stauber EJ, Sanchez C, Pouchnik D, Croteau R
(2000) Probing essential oil biosynthesis and secretion by functional
evaluation of expressed sequence tags from mint glandular trichomes.
Proc Natl Acad Sci USA 97: 2934–2939
Langfelder P, Horvath S (2008) WGCNA: An R package for weighted
correlation network analysis. BMC Bioinformatics 9: 559
Laverty KU, Stout JM, Sullivan MJ, Shah H, Gill N, Bolbrook L, Deikus
G, Sebra R, Hughes TR, Page JE, et al (2019) A physical and genetic
map of Cannabis sativa identifies extensive rearrangement at the THC/
CBD acid synthase locus. Genome Res 29: 146–156
Lewis MA, Russo EB, Smith KM (2018) Pharmacological foundations of
Cannabis chemovars. Planta Med 84: 225–233
Li B, Dewey CN (2011) RSEM: Accurate transcript quantification from
RNA-Seq data with or without a reference genome. BMC Bioinformatics
12: 323
Love MI, Huber W, Anders S (2014) Moderated estimation of fold change
and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 550
Moghe GD, Leong BJ, Hurney SM, Daniel Jones A, Last RL (2017) Evo-
lutionary routes to biochemical innovation revealed by integrative
analysis of a plant-defense related specialized metabolic pathway. eLife
6: e28468
Nagegowda DA, Gutensohn M, Wilkerson CG, Dudareva N (2008) Two
nearly identical terpene synthases catalyze the formation of nerolidol
and linalool in snapdragon flowers. Plant J 55: 224–239
Orser C, Johnson S, Speck M, Hilyard A, Afia I (2018) Terpenoid chemoprofiles
distinguish drug-type Cannabis sativa L. cultivars in Nevada.
Nat Prod Chem Res 6: 304
Page JE, Boubakir Z (2012) Aromatic prenyltransferase from Cannabis. US
Patent 20120144523.
Paul MJ, Pellny TK (2003) Carbon metabolite feedback regulation of leaf
photosynthesis and development. J Exp Bot 54: 539–547
Piluzza G, Delogu G, Cabras A, Marceddu S, Bullitta S (2013) Differen-
tiation between fiber and drug types of hemp (Cannabis sativa L.) from a
collection of wild and domesticated accessions. Genet Resour Crop Evol
60: 2331–2342
Punja ZK, Rodriguez G, Chen S (2017) Assessing genetic diversity in
Cannabis sativa using molecular approaches. In S Chandra, H Lata, MA
ElSohly, eds, Cannabis sativa L.: Botany and Biotechnology. Springer
International Publishing, Cham, Switzerland, pp 395–418
Ramirez-Gonzalez RH, Segovia V, Bird N, Fenwick P, Holdgate S, Berry
S, Jack P, Caccamo M, Uauy C (2015) RNA-Seq bulked segregant
analysis enables the identification of high-resolution genetic markers for
breeding in hexaploid wheat. Plant Biotechnol J 13: 613–624
Rice S, Koziel JA (2015) Characterizing the smell of marijuana by odor
impact of volatile compounds: An application of simultaneous chemical
and sensory analysis. PLoS ONE 10: e0144160
Richins RD, Rodriguez-Uribe L, Lowe K, Ferral R, O’Connell MA (2018)
Accumulation of bioactive metabolites in cultivated medical Cannabis.
PLoS ONE 13: e0201119
Rocca JD, Hall EK, Lennon JT, Evans SE, Waldrop MP, Cotner JB,
Nemergut DR, Graham EB, Wallenstein MD (2015) Relationships be-
tween protein-encoding gene abundance and corresponding process are
commonly assumed yet rarely observed. ISME J 9: 1693–1699
Ross SA, ElSohly MA (1996) The volatile oil composition of fresh and air-dried
buds of Cannabis sativa. J Nat Prod 59: 49–51
Russo EB (2011) Taming THC: Potential cannabis synergy and
phytocannabinoid-terpenoid entourage effects. Br J Pharmacol 163:
1344–1364
Russo EB, Jiang HE, Li X, Sutton A, Carboni A, del Bianco F, Mandolino
G, Potter DJ, Zhao YX, Bera S, et al (2008) Phytochemical and genetic
analyses of ancient cannabis from Central Asia. J Exp Bot 59: 4171–4182
Sawler J, Stout JM, Gardner KM, Hudson D, Vidmar J, Butler L, Page JE,
Myles S (2015) The genetic structure of marijuana and hemp. PLoS ONE
10: e0133292
Scheben A, Batley J, Edwards D (2017) Genotyping-by-sequencing ap-
proaches to characterize crop genomes: Choosing the right tool for the
right application. Plant Biotechnol J 15: 149–161
Schwender J, König C, Klapperstück M, Heinzel N, Munz E,
Hebbelmann I, Hay JO, Denolf P, De Bodt S, Redestig H, et al (2014)
Transcript abundance on its own cannot be used to infer fluxes in central
metabolism. Front Plant Sci 5: 668
Sexton M, Shelton K, Haley P, West M (2018) Evaluation of cannabinoid
and terpenoid content: Cannabis flower compared to supercritical CO₂
concentrate. Planta Med 84: 234–241
Sirikantaramas S, Taura F, Tanaka Y, Ishikawa Y, Morimoto S, Shoyama
Y (2005) Tetrahydrocannabinolic acid synthase, the enzyme controlling
marijuana psychoactivity, is secreted into the storage cavity of the
glandular trichomes. Plant Cell Physiol 46: 1578–1582
Small E (2015) Evolution and classification of Cannabis sativa (marijuana,
hemp) in relation to human utilization. Bot Rev 81: 189–294
Srividya N, Lange I, Lange BM (2016) Generation and functional evalua-
tion of designer monoterpene synthases. Methods Enzymol 576: 147–165
Starks CM, Back K, Chappell J, Noel JP (1997) Structural basis for cyclic
terpene biosynthesis by tobacco 5-epi-aristolochene synthase. Science
277: 1815–1820
Stout JM, Boubakir Z, Ambrose SJ, Purves RW, Page JE (2012) The
hexanoyl-CoA precursor for cannabinoid biosynthesis is formed by an
acyl-activating enzyme in Cannabis sativa trichomes. Plant J 71: 353–365
Subritzky T, Lenton S, Pettigrew S (2016) Legal cannabis industry
adopting strategies of the tobacco industry. Drug Alcohol Rev 35:
511–513
Taura F, Sirikantaramas S, Shoyama Y, Shoyama Y, Morimoto S (2007)
Phytocannabinoids in Cannabis sativa: Recent studies on biosynthetic
enzymes. Chem Biodivers 4: 1649–1663
Taura F, Tanaka S, Taguchi C, Fukamizu T, Tanaka H, Shoyama Y,
Morimoto S (2009) Characterization of olivetol synthase, a polyketide
synthase putatively involved in cannabinoid biosynthetic pathway.
FEBS Lett 583: 2061–2066
Trikka FA, Nikolaidis A, Ignea C, Tsaballa A, Tziveleka LA, Ioannou E,
Roussis V, Stea EA, Božić D, Argiriou A, et al (2015) Combined metabolome
and transcriptome profiling provides new insights into diterpene
biosynthesis in S. pomifera glandular trichomes. BMC Genomics
16: 935
Turner GW, Parrish AN, Zager JJ, Fischedick JT, Lange BM (2019) As-
sessment of flux through oleoresin biosynthesis in epithelial cells of
loblolly pine resin ducts. J Exp Bot 70: 217–230
United Nations (1966) Commission on Narcotic Drugs. Document E/4294;
Economic and Social Council: Official Records
Unschuld PU (1986) Medicine in China: A History of Pharmaceutics. Uni-
versity of California Press, Berkeley
van Bakel H, Stout JM, Cote AG, Tallon CM, Sharpe AG, Hughes TR,
Page JE (2011) The draft genome and transcriptome of Cannabis sativa.
Genome Biol 12: R102
Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for
transcriptomics. Nat Rev Genet 10: 57–63
Welling MT, Shapter T, Rose TJ, Liu L, Stanger R, King GJ (2016) A
belated green revolution for cannabis: Virtual genetic resources to fast-
track cultivar development. Front Plant Sci 7: 1113
Wiebelhaus N, Kreitals NM, Almirall JR (2016) Differentiation of mari-
juana headspace volatiles from other plants and hemp products using
capillary microextraction of volatiles (CMV) coupled to gas-chroma-
tography-mass spectrometry (GC-MS). Forensic Chem 2: 1–8
Worley B, Powers R (2013) Multivariate analysis in metabolomics. Curr
Metabolomics 1: 92–107
Yang SS, Tu ZJ, Cheung F, Xu WW, Lamb JF, Jung HJG, Vance CP,
Gronwald JW (2011) Using RNA-Seq for gene identification, polymor-
phism detection and transcript profiling in two alfalfa genotypes with
divergent cell wall composition in stems. BMC Genomics 12: 199
Young MD, Wakefield MJ, Smyth GK, Oshlack A (2010) Gene Ontology
analysis for RNA-seq: Accounting for selection bias. Genome Biol 11:
R14
Zager JJ, Lange BM (2018) Assessing flux distribution associated with
metabolic specialization of glandular trichomes. Trends Plant Sci 23:
638–647