Evolutionary Applications. 2020;13:11–30.
wileyonlinelibrary.com/journal/eva
Received: 16 November 2018 | Revised: 5 April 2019 | Accepted: 24 April 2019
DOI: 10.1111/eva.12809
SPECIAL ISSUE REVIEW AND SYNTHESES
275 years of forestry meets genomics in Pinus sylvestris
Tanja Pyhäjärvi1,2 | Sonja T. Kujala3 | Outi Savolainen1,2
1Department of Ecology and Genetics, University of Oulu, Oulu, Finland
2Biocenter Oulu, University of Oulu, Oulu, Finland
3Natural Resources Institute Finland, Luke, Oulu, Finland
Correspondence
Tanja Pyhäjärvi, Department of Ecology and Genetics, University of Oulu, P.O. Box 3000, 90014 Oulu, Finland.
Email: tanja.pyhajarvi@oulu.fi
Funding information
Biotieteiden ja Ympäristön Tutkimuksen Toimikunta, Grant/Award Number: 287431, 293819, 307582 and 309978
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2019 The Authors. Evolutionary Applications published by John Wiley & Sons Ltd
Abstract
Pinus sylvestris has a long history of basic and applied research that is relevant for both forestry and evolutionary studies. Its patterns of adaptive variation and role in forest economic and ecological systems have been studied extensively for nearly 275 years, detailed demography for 100 years, and mating system for more than 50 years. However, its reference genome sequence is not yet available, and genomic studies have been lagging compared to, for example, Pinus taeda and Picea abies, two other economically important conifers. Despite the lack of a reference genome, many modern genomic methods are applicable for a more detailed look at its biological characteristics. For example, RNA‐seq has revealed a complex transcriptional landscape, and targeted DNA sequencing displays an excess of rare variants and geographically homogeneously distributed molecular genetic diversity. Current DNA and RNA resources can be used as a reference for gene expression studies, SNP discovery, and further targeted sequencing. In the future, specific consequences of the large genome size, such as functional effects of regulatory open chromatin regions and transposable elements, should be investigated more carefully. For forest breeding and long‐term management purposes, genomic data can help in assessing the genetic basis of inbreeding depression and in applying genomic tools for genomic prediction and relatedness estimates. Given the challenges of breeding (long generation time, no easy vegetative propagation) and the economic importance, application of genomic tools has the potential to have a considerable impact. Here, we explore how genomic characteristics of P. sylvestris, such as rare alleles and the low extent of linkage disequilibrium, impact the applicability and power of the tools.
KEYWORDS
adaptation, allele frequency spectrum, breeding, genomic prediction, genomics, inbreeding depression, linkage disequilibrium, Pinus sylvestris
1 | INTRODUCTION
Pinus sylvestris (Scots pine) is one of the world's most widely distributed and northern conifers, reaching from the British Isles in the west to the Siberian taiga in the east. It is also found in the mountainous areas of Mediterranean peninsulas in Southern Europe. It is especially competitive in poor soils and in dry and extremely cold environments, and compared to many other Pinus, its ecological niche is wide (Rehfeldt et al., 2002; Svenning, Normand, & Kageyama, 2008). Especially in the northern part of its distribution, it is a dominant tree in forested areas (Durrant, Rigo, & Caudullo, 2016) and has an important role as a part of forest ecosystems as a global carbon reservoir and also via interactions with soil microbes and fungi (Högberg et al., 2001; Lindén et al., 2014; Pan et al., 2011). Economically, it is an important tree species in Northern Eurasia as raw material for the paper and pulp industry and as timber (Durrant et al., 2016; Gardner, 2013; Mason & Alía, 2000; Mullin et al., 2011). It is widely planted for timber outside its natural range. P. sylvestris is estimated to cover over 145 million hectares of forest in Eurasia (Durrant et al., 2016; Mason & Alía, 2000; Mullin et al., 2011).
Very rough estimates of the actual population census size can be made by combining distribution and density information. In commercial P. sylvestris forests, there are about 2000 stems per hectare after precommercial thinning (Fahlvik, Ekö, & Pettersson, 2005; Väisänen, Kellomäki, Oker‐Blom, & Valtonen, 1989), which, multiplied by the roughly 145 million hectares that the species covers, yields an estimate of population census size of 290 × 10⁹ trees. P. sylvestris is not globally under threat in terms of species viability as such (Gardner, 2013). However, its dominant role in large forest ecosystems means that any changes in its distribution or mortality are likely to have large ecological and economic consequences.
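The census estimate is a simple product of forest area and stand density. A minimal sketch of that arithmetic follows, assuming that the 145 million hectares and 2000 stems per hectare quoted in the text apply uniformly across the range; the function and variable names are illustrative only.

```python
def census_size(area_ha: float, stems_per_ha: float) -> float:
    """Rough census size: forest area times stand density."""
    return area_ha * stems_per_ha


AREA_HA = 145e6       # hectares covered in Eurasia (figure quoted in the text)
STEMS_PER_HA = 2000   # stems per hectare after precommercial thinning

n_trees = census_size(AREA_HA, STEMS_PER_HA)
print(f"{n_trees:.2e} trees")  # about 2.9e11, i.e. 290 x 10^9
```

Because stand density varies with age, site, and management, the result is best read as an order of magnitude rather than a count.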
Multiple subspecies and varieties of P. sylvestris have been described based on morphological and phenological differences (Molotkov & Patlaj, 1991; Pravdin, 1969; Ruby & Wright, 1976), but groupings are not consistent across studies and vary depending on the studied traits (Shutyaev & Giertych, 2000). Often the underlying phenotypic differences have a clinal rather than a discrete distribution, and the description of separate varieties reflects the sampling design and silvicultural purposes more than true biological groups (Langlet, 1971; Shutyaev & Giertych, 2000). At the genome‐wide level, molecular genetic differentiation among sampling locations at nuclear loci is low (FST = 0.02) in most parts of the distribution (Karhu et al., 1996; Kujala & Savolainen, 2012; Muona & Harju, 1989; Prus‐Głowacki, Urbaniak, Bujas, & Curtu, 2012; Pyhäjärvi et al., 2007; Tyrmi et al., 2019).
Due to its dominance and economic importance, phenotypic variation of P. sylvestris has been of interest since the early days of biological sciences. The first studies of phenotypic and adaptive differences among populations originating from different geographic regions were motivated by shipbuilding and forestry (Alberto et al., 2013; Engler, 1913; Langlet, 1971; Morgenstern, 1996). The first evidence of P. sylvestris provenance (common garden) trials in France comes from as early as 1745 (Langlet, 1971). Later, several variably documented national (Eiche, 1966; Heikinheimo, 1950) and international (Giertych & Oleksyn, 1992; Shutyaev & Giertych, 1998; Wiedemann, 1930) provenance trials have been established mainly for silvicultural purposes, also outside its natural range (Wright & Ira, 1962). Even though not originally designed for the purpose, these data have proved valuable for evaluating the level of ecological adaptation (Savolainen, Pyhäjärvi, & Knürr, 2007) and making predictions on responses to climate change (Persson, 1998; Rehfeldt et al., 2002). In the 20th century, very detailed studies on reproductive biology (Koski, 1970; Lönnroth, 1926; Sarvas, 1962) formed a solid basis for modern applied and evolutionary genetic research of P. sylvestris.
The extensive distribution, large effective population size (Ne), efficient gene flow, predominantly outcrossing nature, and large 22 Gbp genome, yet strong phenotypic differentiation in adaptive traits, make P. sylvestris an intriguing system to examine adaptive processes of polygenic traits. Lack of significant longitudinal structure allows replicating sampling along latitudinal transects and inspecting whether the same loci or variants are participating in the adaptive trait variation throughout the distribution or whether the adaptation has emerged via different combinations of adaptive alleles. In contrast to, for example, the North American interior spruce complex (Yeaman et al., 2016), P. sylvestris does not significantly hybridize with other species (see, however, Wachowiak & Prus‐Głowacki, 2008), likely because most close relatives have small distributions or do not overlap with P. sylvestris. Lack of hybridization should further facilitate understanding of the genetic basis of local adaptation. The species is also known to harbor a large number of lethal equivalents (Kärkkäinen, Koski, & Savolainen, 1996; Koski, 1971), which allows examining the joint effects of deleterious variation with adaptive and phenotypic variation.
In comparison with farm animals, crops, and vegetables, the gene pools of many coniferous forest trees have not been extensively changed by humans: natural regeneration is common in a large part of the distribution, and breeding is conducted in only a handful of countries (Mullin et al., 2011). In Finland and Sweden, breeding efforts on P. sylvestris were started in the late 1940s by collecting phenotypically superior plus trees from natural and cultivated forests. Plus trees were used to establish seed orchards for seed production. Later, the genetic quality of the seed orchards has been improved by selection among plus trees based on progeny testing, forming the “1.5 generation” seed orchards (Haapanen, Hynynen, Ruotsalainen, Siipilehto, & Kilpeläinen, 2016; Jansson, Hansen, Haapanen, Kvaalen, & Steffenrem, 2017). With traditional breeding methods, genetic gains of up to ~25% for growth traits such as volume, and reductions in rotation times, have been obtained and have resulted in significant economic gains in Finland and Scandinavia (Ahtikoski, Ojansuu, Haapanen, Hynynen, & Kärkkäinen, 2012; Jansson et al., 2017).
Given the low number of breeding cycles conducted so far, there should be plenty of functional genetic variation, and thus, large genetic gains can be expected. By adding genomic methods to the available tools, we can expect further economic gains per unit of time. Genomic information can improve the breeding efforts in P. sylvestris through genomewide association analyses (GWAS), inference of genetic relatedness, and genomic prediction (Isik, 2014; Isik et al., 2016; Meuwissen, Hayes, & Goddard, 2001). So far, efforts to make P. sylvestris flower at an earlier age have not been successful. Traits that can be measured only on adult trees hold the greatest potential in terms of shortening the breeding cycle (Isik, 2014). Genomic prediction can deal with the highly polygenic basis of trait variation (with unknown underlying loci), unlike other breeding methods, such as gene editing, that require detailed molecular information.
Given its economic (Praciak, 2013) and ecological importance, and the long record of research and knowledge (Giertych & Matyas, 1991), it is unfortunate that a whole genome sequence
assembly of P. sylvestris has not yet been presented. Even though in many other ways it is one of the most thoroughly studied conifers (Figure 1), from a genomics point of view it is still a nonmodel organism. However, whole genome sequencing and reference genomes are not a prerequisite for using current methodologies for genomic analysis of P. sylvestris.
In this review, we consider biological, life history, and genomic characteristics that are relevant for future applications of genomic information in P. sylvestris. We will address (a) mating system and inbreeding depression, (b) genetic diversity and linkage disequilibrium, (c) population structure, history, and allele frequency distribution, (d) adaptive variation at trait and molecular level, (e) genome size and architecture, and (f) application of genomics in breeding and conservation.
2 | MATING SYSTEM AND INBREEDING DEPRESSION
Pinus sylvestris, like many other forest trees, is wind‐pollinated, diploid, and predominantly outcrossing. The selfing rate measured in mature seeds is about 5%–10% (Muona, Yazdani, & Rudin, 1987; Rudin, Muona, & Yazdani, 1986). P. sylvestris is also known to display high inbreeding depression, also a common observation in forest trees. Inbreeding depression is normally measured by comparing the performance of selfed (or other inbred) offspring to that of controlled outcrossed or open‐pollinated offspring. In pines, embryonic mortality can be evaluated by examining the proportion of empty seeds from different types of crosses (Kärkkäinen et al., 1996; Koski, 1971). The average mortality in selfed seeds is 75%–85% (Kärkkäinen et al., 1996; Koelewijn, Koski, & Savolainen, 1999; Koski, 1971), compared to 20%–30% for open‐pollinated seed. Based on these kinds of data, with simplifying assumptions, the number of embryonic lethal equivalents in P. sylvestris has been estimated to be 9.4 (Koski, 1971). The level is much higher than in most animals and angiosperms and similar to many other conifers, such as Picea abies, Picea glauca, or Pseudotsuga menziesii, while some pines with limited distributions have much lower levels (e.g., Pinus radiata and Pinus resinosa; Lynch & Walsh, 1998; Williams & Savolainen, 1996). Because most selfed embryos die before seed maturation, the selfing rate at fertilization is much higher than at the mature seed stage. Further, the selfing rate at fertilization can be even higher because of polyzygotic (simple) polyembryony: An ovule may contain up to four embryos, which can be of selfed or outcrossed origin. These embryos share their maternal haplotype but have been fertilized by different pollen. Only one dominant embryo will survive (Hedrick & Muona, 1990; Koski, 1970; Sorensen, 1982). Inbred mortality during early embryogenesis and replacement by outcrossed progeny results in reduced loss of maternal resources, thus reducing post‐seed‐maturation inbreeding depression (Kärkkäinen & Savolainen, 1993; Sarvas, 1962).
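The link between embryo mortality, lethal equivalents, and the selfing rate at fertilization can be illustrated with a small calculation. The sketch below assumes the classical log‐linear model of Morton, Crow, and Muller (ln survival = −A − B·F, with F = 0.5 for selfed progeny). This is not necessarily the procedure behind the estimate of 9.4 by Koski (1971), and it ignores polyembryony. It also treats open‐pollinated seed as if it were fully outcrossed. The inputs are midpoints of the ranges quoted above, so the output is illustrative and is not expected to reproduce the published value; all function names are hypothetical.

```python
import math


def lethal_equivalents_per_zygote(surv_inbred: float, surv_outbred: float,
                                  f: float = 0.5) -> float:
    """Return 2B under ln(S) = -A - B*F, where B is per gamete."""
    b_per_gamete = -math.log(surv_inbred / surv_outbred) / f
    return 2.0 * b_per_gamete


def selfing_rate_at_fertilization(selfed_frac_mature: float,
                                  surv_selfed: float,
                                  surv_outcrossed: float) -> float:
    """Back-calculate the selfed fraction among zygotes from the selfed
    fraction among mature seeds, given differential embryo survival."""
    s, vs, vo = selfed_frac_mature, surv_selfed, surv_outcrossed
    return s * vo / (s * vo + (1.0 - s) * vs)


surv_selfed = 1.0 - 0.80   # midpoint of 75%-85% mortality in selfed seed
surv_open = 1.0 - 0.25     # midpoint of 20%-30% mortality in open-pollinated seed

two_b = lethal_equivalents_per_zygote(surv_selfed, surv_open)
s0 = selfing_rate_at_fertilization(0.075, surv_selfed, surv_open)

print(f"lethal equivalents per zygote (illustrative): {two_b:.1f}")
print(f"selfing rate at fertilization (illustrative): {s0:.2f}")
```

Under these simplified assumptions, a selfed fraction of 7.5% among mature seeds corresponds to a considerably larger selfed fraction among zygotes, which is the qualitative point made in the text.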
Other studies (Koelewijn et al., 1999; Muona et al., 1987; Yazdani, Muona, Rudin, & Szmidt, 1985) have shown that the low fitness of selfs continues during later years, such that already in seedlings a few years old, and especially in young adults, the selfed progeny has been mostly eliminated and the genotypes are in Hardy–Weinberg equilibrium. The results on conifers in general suggest that the overall inbreeding depression is due to a large number of deleterious, partially recessive alleles (Williams & Savolainen, 1996). Lande, Schemske, and Schultz (1994) showed that the lethals can be maintained in the population despite the selfing when there is a high per genome per generation mutation rate to the deleterious alleles. In addition, stabilizing selection on quantitative traits can interfere in complex ways with selection on lethals and either increase or decrease the probability of purging (Lande & Porcher, 2017).
FIGURE 1 Approximate number of Google Scholar citations (as of October 19th 2018) for some of the most intensively studied conifer species. Number of results indicated in each bar. Axes: Species versus Citations
Pinus sylvestris: 225 000
Picea abies: 214 000
Pinus radiata: 108 000
Pinus taeda: 75 600
Pseudotsuga menziesii: 64 800
Pinus contorta: 50 500
Pinus pinaster: 48 000
Picea glauca: 46 100
Abies alba: 34 900
Cryptomeria japonica: 34 800
Pinus halepensis: 33 300
Picea sitchensis: 30 300
Abies balsamea: 27 200
Pinus banksiana: 26 300
Taxus baccata: 25 500
Larix decidua: 23 000
Pinus pinea: 22 700
Genomics can shed light on the genetic architecture of inbreeding depression by characterizing the numbers and effect sizes of deleterious alleles, for example, by mapping (Hedrick & Muona, 1990; Ritland, 1996), and by analyzing heterozygosity of the selfed progeny, as in Eucalyptus grandis (Hedrick, Hellsten, & Grattapaglia, 2016): In the selfed progeny, heterozygosity was much higher than the neutral expectation, suggesting that overall selection against homozygotes in selfed offspring was very high (s = 0.47). Furthermore, inbreeding depression was likely due to partially deleterious alleles at more than 100 loci, even if overdominance effects could not be fully excluded. Similar methods could help to identify genomic areas with lethals also in conifers. So far, various functional and evolutionary prediction models of allelic substitution have been used to identify deleterious alleles. This has then allowed comparisons of allelic and genotypic frequencies between populations or species, as in Populus (Zhang, Zhou, Bawa, Suren, & Holliday, 2016) or in Picea (Conte et al., 2017). Estimation of the distribution of fitness effects also allows conclusions on the nature of deleterious alleles. The allele frequency spectrum (AFS) of some 400 loci suggested that P. sylvestris has fewer slightly deleterious alleles and a larger proportion of more highly deleterious alleles than other conifers and plants in general (Grivet et al., 2017; Hodgins, Yeaman, Nurkowski, Rieseberg, & Aitken, 2016).
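The logic of the heterozygosity argument can be illustrated with a single‐locus sketch. Selfing a heterozygote gives genotype proportions of 1/4 : 1/2 : 1/4, and if both homozygotes have relative viability 1 − s, the surviving progeny are enriched for heterozygotes. This simplified model is assumed here for illustration only and is not necessarily the estimation procedure of Hedrick et al. (2016); the function name is hypothetical.

```python
def het_fraction_after_selfing(s: float) -> float:
    """Heterozygote fraction among surviving selfed progeny of a heterozygous
    parent when each homozygote class has relative viability 1 - s."""
    hom_each = 0.25 * (1.0 - s)   # each homozygote class after selection
    het = 0.5                     # heterozygotes, viability 1
    return het / (het + 2.0 * hom_each)


neutral = het_fraction_after_selfing(0.0)    # neutral expectation
selected = het_fraction_after_selfing(0.47)  # s as quoted in the text

print(f"neutral expectation: {neutral:.2f}")
print(f"with s = 0.47:       {selected:.2f}")
```

The expression simplifies to 1/(2 − s), so any excess of heterozygosity over 0.5 in selfed progeny translates directly into an estimate of s under this model.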
While mating between relatives can be an important breeding tool for some species, in conifers deleterious alleles are so numerous that breeding strategies using inbreeding will often lead to fixation of deleterious alleles, as shown in simulations by Wu, Hallingbäck, and Sánchez (2016). However, in some cases, variation in levels of inbreeding depression may allow using mating between relatives (Ford, McKeand, Jett, & Isik, 2014). Because of the mostly harmful effects of mating between relatives, it is important to manage inbreeding levels in breeding programs by measuring relatedness with genetic tools.
3 | GENETIC DIVERSITY AND LINKAGE DISEQUILIBRIUM
In many aspects, patterns of genetic diversity in P. sylvestris match the expectations for a wind‐pollinated tree with large population size. Population structure is almost nonexistent, linkage disequilibrium (LD), the nonrandom association of alleles, generally does not extend far, and genotypes are in Hardy–Weinberg equilibrium as expected under random mating (Kujala & Savolainen, 2012; Muona & Szmidt, 1985; Pyhäjärvi et al., 2007; Tyrmi et al., 2019; Wachowiak, Salmela, Ennos, Iason, & Cavers, 2011).
Comparative studies have indicated that Pinus in general do not have particularly low nucleotide diversity compared to other plants (Chen, Glémin, & Lascoux, 2017; Eckert et al., 2013; Figure 2). Silent site diversity in the P. sylvestris genic regions (based on Sanger sequencing) seems to converge to 0.006/bp (Dvornyk, Sirviö, Mikkonen, & Savolainen, 2002; García‐Gil, Mikkonen, & Savolainen, 2003; Grivet et al., 2017; Kujala & Savolainen, 2012; Pyhäjärvi et al., 2007; Wachowiak et al., 2011), and the first genomewide estimate of genetic diversity at silent, fourfold degenerate sites is 0.004/bp (Tyrmi et al., 2019). This estimate is slightly lower than the variation observed in other Pinus at fourfold degenerate sites (Eckert et al., 2013) and also slightly lower than diversity observed in silent sites of P. sylvestris in earlier studies. Note that even silent and fourfold degenerate nucleotide diversity and mutation rate estimates for P. sylvestris are based on data derived from genic regions. The overall patterns of diversity further away from genes may be very different, and diversity there is likely higher.
FIGURE 2 4‐fold site nucleotide diversity for a variety of angiosperms and Pinus (data from Chen et al., 2017, except for Pinus sylvestris). Axes: Species versus 4‐fold nucleotide diversity (0.000–0.015). Groups: Angiosperm, Angiosperm tree, Pinus, Pinus sylvestris. Species shown: Arabidopsis thaliana (global), Amborella trichopoda, Brachypodium distachyon (global), Capsella grandiflora, Citrus sinensis, Coffea canephora, Cucumis hardwickii (wild cucumber), Glycine soja (wild soybean), Manihot glaziovii (wild cassava), Medicago truncatula, Oryza rufipogon, Sorghum bicolor (wild), Vitis vinifera (grape), Zea mays (wild, teosinte), Betula pendula, Eucalyptus grandis, Phoenix dactylifera (date palm, global), Populus nigra, Populus tremula, Populus trichocarpa (global), Pinus albicaulis, Pinus aristata, Pinus ayacahuite, Pinus balfouriana, Pinus flexilis, Pinus monophylla, Pinus monticola, Pinus strobiformis, Pinus strobus, Pinus sylvestris (Sanger), Pinus sylvestris (exome capture)
As for many other species, there is a mismatch between the observed nucleotide diversity and the census size in P. sylvestris (Lewontin, 1974). Under the assumption of standard neutral equilibrium, 4Neμ (where Ne is the effective population size and μ is the mutation rate per site per generation) is expected to equal the pairwise nucleotide diversity, θ (Tajima, 1983). Ne can then be estimated with Ne = θ/(4μ). The mutation rates per bp per year vary from 0.22 to 1.3 × 10⁻⁹ (Buschiazzo, Ritland, Bohlmann, & Ritland, 2012; De La Torre, Li, Peer, & Ingvarsson, 2017; Pyhäjärvi et al., 2007; Willyard, Syring, Gernandt, Liston, & Cronn, 2007). Together with silent nucleotide diversity of 0.004/bp and assuming a generation time of 20 years, this yields effective population size estimates ranging from 38,000 to 230,000 individuals. Yet, the actual census size is on the scale of billions of individuals in a seemingly random mating population. Clearly, some of the above‐mentioned assumptions are violated. Potential reasons for the discrepancy are as follows: (a) uneven fecundity among individual trees, which would increase the allele frequency variance across the generations, leading to increased drift and deviations from the assumed Wright–Fisher population model; (b) a much lower mutation rate than estimated based on the fossil record and Picea–Pinus divergence; (c) nonequilibrium population history reducing the long‐term Ne; and (d) the effect of linked selection permeating most of the genic areas, thus reducing genetic diversity (Charlesworth, Morgan, & Charlesworth, 1993; Maynard Smith & Haigh, 1974).
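The effective population size range quoted above follows directly from Ne = θ/(4μ) once the per‐year mutation rates are converted to per‐generation rates. A minimal sketch of that calculation, using only the values given in the text (θ = 0.004 per bp, a 20‐year generation time, and the two ends of the mutation rate range); the function name is illustrative:

```python
def effective_size(theta: float, mu_per_year: float,
                   generation_time_years: float) -> float:
    """Ne = theta / (4 * mu), with mu converted to a per-generation rate."""
    mu_per_generation = mu_per_year * generation_time_years
    return theta / (4.0 * mu_per_generation)


THETA = 0.004      # silent (fourfold) nucleotide diversity per bp
GEN_TIME = 20      # years per generation, as assumed in the text

ne_high = effective_size(THETA, 0.22e-9, GEN_TIME)  # slowest mutation rate
ne_low = effective_size(THETA, 1.3e-9, GEN_TIME)    # fastest mutation rate

print(f"Ne range: {ne_low:,.0f} to {ne_high:,.0f}")
```

Both ends of the range are roughly six orders of magnitude below the census size estimated from forest area and stand density, which is the mismatch the text sets out to explain.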
These potential explanations do not exclude each other. It is likely that offspring number of P. sylvestris does not follow the assumed Poisson distribution with mean number of offspring = 1. Mature P. sylvestris trees have high fecundity, and thus, single trees have a potential to make a large contribution to the next generation. This can lead to Moran‐model‐like multiple‐merger population coalescent processes, reduced diversity in relation to population census size, and also to an excess of rare alleles due to star‐shaped gene genealogies (Eldon & Wakeley, 2006). In seed orchard conditions, significant variation in offspring number has indeed been identified among genotypes (Gömöry, Bruchánik, & Longauer, 2003; Gömöry, Bruchanik, & Paule, 2000; Kang, Bila, Harju, & Lindgren, 2003; Savolainen, Kärkkäinen, Harju, Nikkanen, & Rusanen, 1993). In fully natural conditions, this variance is expected to be even larger. The AFS indicates nonequilibrium population history (Kujala & Savolainen, 2012; Pyhäjärvi et al., 2007), which also reduces the amount of genetic diversity. Linked selection has been shown to affect a wide spectrum of species (Charlesworth & Charlesworth, 2018), and evaluating its importance in P. sylvestris is one of the major evolutionary questions that more genomic data will help us to tackle. Grivet et al. (2017) observed both high efficiency of purifying selection and a high rate of positive selection in P. sylvestris and Pinus pinaster, in accordance with gymnosperms in general (De La Torre et al., 2017), supporting linked selection as a likely explanation for low genetic diversity in P. sylvestris genic areas. Evaluation of the overall effect of linked selection on genetic diversity requires a dense genetic map combined with the physical map because a correlation between nucleotide diversity and recombination rate is expected. Current estimates suggest that LD in P. sylvestris in general decays fast, often within a few hundred bp (however, see detailed discussion on LD below), which would predict local effects of linked selection near genic regions, as observed, for example, in maize, which also has low LD (Beissinger et al., 2016). However, the partial selfing may generate some opportunity for linked selection, but this has not yet been studied.
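The effect of uneven fecundity discussed at the start of this paragraph can be made concrete with a textbook approximation. The sketch below assumes the classical variance effective size for a diploid population of constant size, Ne ≈ (4N − 2)/(Vk + 2), where Vk is the variance in lifetime offspring number and the mean is two. This formula is not taken from the review, and it does not capture the multiple‐merger coalescent behavior mentioned above; the census size and target Ne are the figures quoted in the text, and the function names are hypothetical.

```python
def ne_from_offspring_variance(n_census: float, var_k: float) -> float:
    """Variance effective size for constant N and mean offspring number 2."""
    return (4.0 * n_census - 2.0) / (var_k + 2.0)


def variance_needed(n_census: float, ne_target: float) -> float:
    """Offspring-number variance required to reduce Ne to ne_target."""
    return (4.0 * n_census - 2.0) / ne_target - 2.0


N_CENSUS = 290e9      # census size estimated from area and stand density
NE_TARGET = 230_000   # upper diversity-based Ne estimate from the text

ne_poisson = ne_from_offspring_variance(N_CENSUS, 2.0)  # Poisson-like: Vk = mean = 2
vk = variance_needed(N_CENSUS, NE_TARGET)

print(f"Ne with Poisson-like variance: {ne_poisson:.3e}")
print(f"Vk needed for Ne = 230,000:    {vk:.3e}")
```

Under this approximation, fecundity variance alone would have to be in the millions to account for the whole gap, which is in line with the statement that the candidate explanations are not mutually exclusive.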
Note that different genetic estimates and measures of nucleotide diversity may have an apparent discrepancy due to scale. Some estimates, such as nucleotide diversity, are made at base‐pair resolution. However, other observations, such as lethal equivalents, reflect whole genome‐level phenomena. The genomewide mutation rate to deleterious alleles can be high even with a low per base‐pair mutation rate, if the mutational target size is large.
The recombination rate per bp (c) and the extent of LD are critical factors in breeding, as they determine how selection on a subset of loci will affect other nearby loci and essentially determine the genomic resolution of many breeding efforts. In an equilibrium situation, LD depends on both c and Ne, and the population‐level recombination parameter ρ = 4Nec can be estimated from LD patterns. Further, c can be independently estimated with genetic mapping. In a practical context, it is good to remember that c affects the precision of, for example, QTL and other genetic mapping efforts, whereas ρ is more significant for association studies.
As mentioned above, P. sylvestris appears to have relatively low LD when measured by r², the squared correlation coefficient, which often stays above 0.2 for only a few hundred base pairs (Kujala & Savolainen, 2012; Pyhäjärvi et al., 2007; Tyrmi et al., 2019). The extent of LD measured with r² is useful for many applications, such as genomic prediction, because it directly informs about the power of a locus to predict the allelic state of another locus (Hahn, 2018). However, like all measures of LD, it is dependent on the marginal allele frequencies. When there are many low‐frequency SNPs, as in P. sylvestris, r² tends to be low not only due to high recombination but also due to low allele frequencies. P. sylvestris LD measured as |D′| (an LD measure scaled by its maximum value given the allele frequencies), which is more informative about the recombination rate, suggests more extensive LD than the value from the overall r² decay (Kujala & Savolainen, 2012).
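The dependence of r² on allele frequencies, and the contrast with |D′|, can be seen in a two‐locus example. The sketch below uses the standard definitions D = p(AB) − p(A)p(B), r² = D²/[p(A)(1 − p(A))p(B)(1 − p(B))], and |D′| = |D|/Dmax; the haplotype frequencies are made‐up illustrative values, not P. sylvestris data.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (r2, |D'|) from the frequency of the A-B haplotype and the
    marginal frequencies of alleles A and B (both strictly between 0 and 1)."""
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


# A rare allele (5%) that occurs only on the background of a common allele (50%):
r2_rare, dprime_rare = ld_stats(p_ab=0.05, p_a=0.05, p_b=0.5)

# Two intermediate-frequency alleles that always occur together:
r2_common, dprime_common = ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5)

print(f"rare allele:   r2 = {r2_rare:.3f}, |D'| = {dprime_rare:.2f}")
print(f"common allele: r2 = {r2_common:.3f}, |D'| = {dprime_common:.2f}")
```

In both cases no recombinant haplotype exists, so |D′| is 1, yet r² is close to zero for the rare allele. This is why an excess of rare variants lowers r² even where recombination is limited.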
As in most species, LD patterns in conifers are probably heterogeneous among different genomic regions (Pavy, Namroud, Gagnon, Isabel, & Bousquet, 2012). Despite the general trend of rapid LD decay (Pyhäjärvi et al., 2007; Wachowiak, Balk, & Savolainen, 2009), there is considerable variation along the genome in P. sylvestris in gene‐level LD (Kujala & Savolainen, 2012; Pyhäjärvi, Kujala, & Savolainen, 2011). For example, Pyhäjärvi et al. (2011) found that several allozyme‐coding loci have strong LD, not showing signs of decay even over a 12‐kbp region. Also in P. taeda, earlier data suggested that LD decays rapidly (Brown, Gill, Kuntz, Langley, & Neale, 2004), but some recent work suggests large variation in the extent of LD in the genome (Lu et al., 2016; see, however, Acosta et al., 2019).
The overall recombination rate in conifers, estimated based on genetic maps, is in general low (Jaramillo‐Correa, Verdú, & González‐Martínez, 2010), and the same applies to P. sylvestris. The estimated map length is 1,500 cM (Komulainen et al., 2003), which with the genome size of 22 × 10⁹ bp results in a recombination rate of 0.07 cM/Mb or c = 0.7 × 10⁻⁹ per bp per generation. Thus, assuming the Ne of 38,000–230,000 obtained from nucleotide diversity data yields ρ estimates of 0.0001–0.0006, whereas some ρ estimates obtained from the nucleotide diversity data are much higher, ranging from 0 to 0.04 (Kujala & Savolainen, 2012; Pyhäjärvi et al., 2007, 2011; Wachowiak et al., 2009). The apparent discrepancy of low LD and low recombination rate can be explained by low minor allele frequencies, relatively high Ne, and potential for low recombination and extensive LD in intergenic regions. Currently, LD estimates of P. sylvestris are available within genes and between pairs of coding areas. Evidence for LD in intergenic areas is rare, and studies at an intermediate range are missing in P. sylvestris. A study on Cryptomeria japonica has indeed shown that noncoding regions of conifer genomes can harbor extensive LD (Moritsuka et al., 2012). Better genome assemblies combined with extensive resequencing efforts are required to get a fuller picture of LD across the P. sylvestris genome. Long read sequencing combined with, for example, optical and genetic mapping will allow identifying regions where physical and genetic distances have most discrepancies. Identifying these regions is important for breeding and understanding adaptive variation as large fragments of genome are dragged along when responding to selection.
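The map‐based values of c and ρ quoted in this section can be reproduced from the map length, the genome size, and the diversity‐based Ne range. The sketch below assumes that 1 cM corresponds to a recombination fraction of 0.01 (reasonable only as a genome‐wide average over short intervals) and uses unrounded intermediate values, so the results match the text after rounding; the variable names are illustrative.

```python
MAP_LENGTH_CM = 1500.0   # genetic map length quoted in the text
GENOME_SIZE_BP = 22e9    # genome size in bp

# Average recombination rate in cM per Mb
cm_per_mb = MAP_LENGTH_CM / (GENOME_SIZE_BP / 1e6)

# Per-bp recombination rate per generation (Morgans per bp)
c_per_bp = (MAP_LENGTH_CM / 100.0) / GENOME_SIZE_BP


def rho(ne: float, c: float) -> float:
    """Population-scaled recombination parameter, rho = 4 * Ne * c."""
    return 4.0 * ne * c


rho_low = rho(38_000, c_per_bp)
rho_high = rho(230_000, c_per_bp)

print(f"{cm_per_mb:.3f} cM/Mb, c = {c_per_bp:.2e} per bp per generation")
print(f"rho range: {rho_low:.4f} to {rho_high:.4f}")
```

These map‐based values of ρ sit at the low end of the 0 to 0.04 range of sequence‐based estimates cited above.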
4 | POPULATION STRUCTURE, HISTORY, AND ALLELE FREQUENCY DISTRIBUTION
Pinus sylvestris has efficient wind pollination with potential for very long‐distance pollen dispersal. In addition, female flowering precedes the male flowering by two to five days (Koski, 1970; Varis, Pakkanen, Galofré, & Pulkkinen, 2009). Pinus sylvestris pollen dispersal distribution has a “fat‐tailed” leptokurtic shape allowing sporadic long‐distance dispersal events. Most pollen comes from nearby sources (Koski, 1970; Muona & Harju, 1989; Torimaru, Wang, Fries, Andersson, & Lindgren, 2009), but nonlocal airborne germinable pollen is often observed during the female flowering, and northern populations receive some southern pollen potentially from hundreds of kilometers away before the local pollen is available (Varis et al., 2009). In Robledo‐Arnuncio and Gil (2005), 4.3% of the pollen came from outside the isolated P. sylvestris stand despite an estimated average pollen dispersal distance of only 135 m. Even rare long‐distance dispersal can have an important role in homogenizing the distribution of genetic variation and result in, for example, suboptimal phenotypic variation. Seeds are also dispersed by wind, but not as extensively as pollen, with distances peaking at <10 m (Kellomäki, Hänninen, Kolström, Kotisaari, & Pukkala, 1987).
The dispersal biology is reflected in the geographic distribution of genetic diversity. In nuclear genes that are dispersed by both pollen and seeds, the genetic differentiation is consistently low (FST = 0.02) in the more continuous part of the distribution (Karhu et al., 1996; Kujala & Savolainen, 2012; Prus‐Głowacki et al., 2012; Pyhäjärvi et al., 2007). Maternally inherited mitochondrial markers have a more restricted geographic distribution of alleles (GST = 0.66; Naydenov, Senneville, Beaulieu, Tremblay, & Bousquet, 2007; Pyhäjärvi, Salmela, & Savolainen, 2008). The eastern part of the distribution is less studied in terms of nuclear markers, but isozyme studies imply that genetic differentiation is also low (Dvornyk, 2001; Goncharenko, Silin, & Padutov, 1994). As in many other species, subtle geographic structure can be observed at the range margins and even in the main distribution when a large number of nuclear loci are observed (Kujala & Savolainen, 2012; Tyrmi et al., 2019; Wachowiak et al., 2011). In summary, the nuclear polymorphisms and continuous distribution indicate a lack of actual discrete populations, and in many analyses, P. sylvestris within most of Europe can be treated as a single panmictic population.
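To give a sense of what FST = 0.02 means in terms of allele frequencies, the sketch below computes a heterozygosity‐based FST = (HT − HS)/HT for one biallelic locus in two equally weighted demes, with no sampling correction. This estimator is chosen here for simplicity and is not necessarily the one used in the cited studies; the allele frequencies are made‐up illustrative values.

```python
def fst_two_demes(p1: float, p2: float) -> float:
    """FST = (HT - HS) / HT for one biallelic locus and two equal-sized demes."""
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-deme
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)                            # pooled
    return (h_t - h_s) / h_t


fst = fst_two_demes(0.43, 0.57)
print(f"FST = {fst:.4f}")
```

Under these assumptions, an allele frequency difference of 0.14 between two demes at an intermediate‐frequency locus is enough to give an FST of about 0.02.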
Pinus sylvestris distribution is continuous in the north, whereas in the southern margin, the distribution consists more of fragmented and isolated populations. Mediterranean peninsulas and Turkish populations harbor mitochondrial haplotypes that have rarely or never been observed outside these isolated populations (Cheddadi et al., 2006; Naydenov et al., 2007; Pyhäjärvi et al., 2008; Sinclair, Morman, & Ennos, 1999; Soranzo, Alia, Provan, & Powell, 2000; Wójkiewicz, Cavers, & Wachowiak, 2016). This reflects the existence of limited seed dispersal and Mediterranean refugia during the Last Glacial Maximum (LGM) and still ongoing postglacial changes in suitable habitats, but also contemporary land use and level of forested areas in general. In addition to contemporary gene flow, part of the low nuclear differentiation in higher latitudes could be explained by the colonization process combined with a long juvenile stage (Austerlitz, Mariette, Machon, Gouyon, & Godelle, 2000).
Analysis of population genetic structure, measured as FST or
inferred using STRUCTURE type approaches (Falush, Stephens, &
Pritchard, 2003), is based on model of discrete populations. In real‐
ity, given the dispersal biology of P. sylvestris, the among‐population
covariance structure is probably more of isolation‐by‐distance (IBD)
type. Bradburd, Coop, and Ralph (2018) presented a promising new
method to account for IBD patterns in genetic population structure
analysis. When applied to nuclear P. sylvestris data, the spatial model
with IBD has a better predictive accuracy than nonspatial (cluster)
model(Tyrmietal.,2019).Thespatialmodelidentifiesageneticcom‐
ponent that is only present in an isolated Spanish population, consis‐
tentwithpreviouswork(Figure3).However,inthespatialmodelthe
genetic makeup of the rest of the sampling sites remains continuous
in contrast to the nonspatial model that assigns a proportion of west‐
ern populations to the Spanish‐type cluster (Figure 3). These results
support including IBD‐type spatial patterns in further analyses affected
by population structure of P. sylvestris, such as GWAS.
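To make the discrete-population assumption behind FST concrete, the following minimal Python sketch computes FST for a single biallelic locus as (HT − HS)/HT from per-population allele frequencies. It assumes equally weighted populations and known allele frequencies; the function name `fst_biallelic` is illustrative and is not taken from any of the cited studies or software.

```python
def fst_biallelic(freqs):
    """FST = (HT - HS) / HT for one biallelic locus.

    freqs: allele frequency of one allele in each population
    (populations are weighted equally; this is an assumption).
    """
    if not freqs:
        raise ValueError("need at least one population")
    if any(p < 0.0 or p > 1.0 for p in freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    k = len(freqs)
    p_bar = sum(freqs) / k
    # Expected heterozygosity in the pooled (total) population
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall, no differentiation
    # Mean expected heterozygosity within populations
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / k
    return (h_t - h_s) / h_t
```

Identical frequencies in all populations give 0 and fixed differences give 1. Under isolation by distance, pairwise values of such a statistic would be expected to rise gradually with geographic distance rather than fall into discrete groups, which is why a spatial model can fit better than a cluster model.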
Nucleotide diversity studies have revealed a nonequilibrium pattern
in the distribution of allele frequencies of P. sylvestris throughout
its distribution, which is a common observation in forest trees
(Figure 4; Heuertz et al., 2006; Ingvarsson, 2008; Mosca et al., 2012;
Zhou, Bawa, & Holliday, 2014). The observed excess of rare variants
is consistent with a bottleneck whose timing is uncertain but much
earlier than the LGM (Kujala & Savolainen, 2012; Pyhäjärvi et al.,
2007).
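A skew toward rare variants is commonly summarized with Tajima's D, the statistic used later in this section. The sketch below computes it from an unfolded site frequency spectrum using the standard Tajima (1989) constants. The function name and the input layout (entry i − 1 holds the number of sites where the derived allele is seen in i of n sequences) are assumptions made for illustration, not the pipeline of the cited studies.

```python
import math

def tajimas_d(sfs):
    """Tajima's D from an unfolded SFS.

    sfs[i - 1] = number of segregating sites at which the derived
    allele occurs in i of the n sampled sequences (i = 1 .. n - 1).
    """
    n = len(sfs) + 1
    if n < 4:
        raise ValueError("the variance term is zero for n < 4")
    s = sum(sfs)  # number of segregating sites
    if s == 0:
        return float("nan")
    # Mean pairwise differences (pi) implied by the spectrum
    pairs = n * (n - 1) / 2.0
    pi = sum(c * i * (n - i) for i, c in enumerate(sfs, start=1)) / pairs
    # Constants from Tajima (1989)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

A spectrum proportional to 1/i (the neutral equilibrium expectation) gives D close to zero, whereas a spectrum dominated by singletons gives a negative D.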
As for many other conifers, the evolutionary timescales that
influence both phenotypic and molecular genetic diversity in P.
FIGURE 3 Application of spatial and nonspatial population structure models (Bradburd et al., 2018) with K = 2 to Pinus sylvestris exome
capture data from Tyrmi et al. (2019). Geographic distribution of layer contributions for each population under spatial (a) and nonspatial
models (c). The proportion of layer contributions in each population for spatial (b) and nonspatial models (d). Sampling locations are ordered
according to longitudes
[Figure panels: Spatial, K = 2 (a, b); Nonspatial, K = 2 (c, d). Bar-plot axis: Admixture, 0.0 to 1.0. Sampling locations: Baza, Inari, Kalvia, Kolari, Latvia, Megdurechensk, Penzenskaja, Punkaharju, Poland, Ust Chilma, Ust Kulom, Volgogradskaja]
sylvestris can be very extensive, potentially reaching millions of
years (Pyhäjärvi et al., 2007). This is a combined property of its long
generation time and large Ne. Therefore, observing the signatures
of post‐LGM population expansion would require a large amount of
data, exceeding the sample sizes used in most published studies.
Simple coalescent simulations show that, for example, doubling
the population size from 25 × 10^6 to 50 × 10^6 individuals during
the past 20,000 years (1,000 generations) hardly shifts the distribution
of Tajima's D from the equilibrium population expectations
(Figure 5). The large Ne results in long expected coalescence times,
and thus, most of the observed diversity reflects the time before the
LGM. However, recent expansion could partly explain the observed
low nucleotide diversity (Figure 5). It is noteworthy that during adaptation
in rapidly growing populations, the current population size
governs the adaptive process, whereas overall molecular diversity is
defined by long‐term Ne, which can be considerably lower (Messer
& Petrov, 2013). Larger sample sizes, allowed by more affordable sequencing,
will provide better resolution for addressing more recent
demographic events as, for example, in Keinan and Clark (2012).
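The timescale argument can be checked with simple arithmetic. The sketch below assumes a diploid Wright-Fisher population and treats the population sizes quoted above as effective sizes, which is a simplification made only for illustration; the function names are hypothetical. Because the population never falls below 25 × 10^6 during the expansion, using that size gives an upper bound on the chance that two lineages coalesce within the last 1,000 generations.

```python
import math

def expected_pairwise_coalescence_time(n_e):
    """Expected time (generations) for two lineages to coalesce
    in a constant-size diploid population: 2 * Ne."""
    return 2.0 * n_e

def prob_coalesce_within(t_generations, n_e):
    """Probability that two lineages coalesce within t generations
    at constant diploid Ne: 1 - exp(-t / (2 * Ne))."""
    # expm1 keeps precision when the exponent is very small
    return -math.expm1(-t_generations / (2.0 * n_e))

# Assumed values taken from the text: 1,000 generations since the
# start of growth, smallest population size 25 million.
upper_bound = prob_coalesce_within(1_000, 25e6)
mean_time = expected_pairwise_coalescence_time(50e6)
print(f"P(coalescence within 1,000 generations) <= {upper_bound:.2e}")
print(f"Expected pairwise coalescence time: {mean_time:.0f} generations")
```

Under these assumptions only a few pairs of lineages in every hundred thousand coalesce during the growth phase, so nearly all of the genealogy, and hence nearly all of the sampled variation, predates the expansion.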
The skewed allele frequency spectrum (AFS) also has practical consequences. The general
utility and information content of a given polymorphism depends
on its allele frequency. Rare alleles are informative only in a small
number of populations and families. Common alleles are generally
considered more useful in, for example, paternity analysis, breeding,
genetic mapping, genomic prediction, and genomewide association
analysis. However, in some cases, ignoring the rare alleles may lead
to biased conclusions. It will result in biased estimates of diversity
and may lead to ignoring important functional variation (De La Torre
et al., 2017, 2019; Fahrenkrog et al., 2017; Manolio et al., 2009).
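The bias from discarding rare alleles can be illustrated with Watterson's estimator, which depends directly on the number of segregating sites. The sketch below applies a minimum minor-allele-count filter to a site frequency spectrum and recomputes the estimate. The function names, the filter threshold, and the neutral example spectrum are assumptions chosen for illustration, not data from the cited studies.

```python
def watterson_theta(sfs):
    """Watterson's estimator: S / a1, where S is the number of
    segregating sites and a1 = sum_{i=1}^{n-1} 1/i.

    sfs[i - 1] = number of sites with derived allele count i (n sequences).
    """
    n = len(sfs) + 1
    a1 = sum(1.0 / i for i in range(1, n))
    return sum(sfs) / a1

def drop_rare(sfs, min_minor_count):
    """Zero out sites whose minor allele count is below the threshold,
    mimicking a minor-allele-count filter."""
    n = len(sfs) + 1
    return [c if min(i, n - i) >= min_minor_count else 0.0
            for i, c in enumerate(sfs, start=1)]

# Hypothetical neutral spectrum for n = 10 sequences (counts ~ 100 / i)
neutral_sfs = [100.0 / i for i in range(1, 10)]
theta_all = watterson_theta(neutral_sfs)
theta_filtered = watterson_theta(drop_rare(neutral_sfs, 2))
print(f"all sites: {theta_all:.1f}, singletons removed: {theta_filtered:.1f}")
```

Even in this neutral example, removing singletons lowers the estimate substantially; with the excess of rare variants described above, the downward bias would be larger still.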
5 | ADAPTIVE VARIATION AT TRAIT AND MOLECULAR LEVEL
The extent of local adaptation and distribution of adaptive variation
among geographic areas, genomes, and individuals is a core question
in evolutionary genetics and also has impacts on conservation and
breeding efforts. The ultimate proof of local adaptation is the superior
fitness of a local population in comparison with nonlocal populations
(Kawecki & Ebert, 2004). So far, the strongest evidence for
local adaptation in P. sylvestris has been obtained at the phenotypic
level, but modern tools contribute to connecting phenotypic with
molecular variation. Many of the potentially adaptive traits display a
continuous, nearly normal within‐population distribution indicative
of a polygenic basis of inheritance (Mather, 1943). Phenotypic variation
is often also continuous at a geographic scale so that traits are
correlated with, for example, altitude or latitude in a linear, clinal
manner, caused by an interplay of selective gradient and gene flow in
continuous space (Huxley, 1938). While many other conifer species
have also been extensively characterized in terms of adaptive phenotypic
variation, P. sylvestris is among the few for which extremely
little population structure at the genomewide level within a very large
continuous distribution is coupled with high QST values, a measure
of phenotypic differentiation among populations. These properties
provide a relatively straightforward theoretical setup to study the
dynamics of adaptive variation (Adrion, Hahn, & Cooper, 2015).
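For readers unfamiliar with QST, the sketch below shows the usual definition for a diploid, randomly mating species: the additive genetic variance among populations divided by the sum of that variance and twice the additive genetic variance within populations. The formula is the standard one rather than something stated in the text above, and the function and argument names are illustrative.

```python
def qst(var_between, var_within):
    """QST = VB / (VB + 2 * VW) for a diploid outcrossing species.

    var_between: additive genetic variance among populations (VB)
    var_within:  additive genetic variance within populations (VW)
    """
    if var_between < 0 or var_within < 0:
        raise ValueError("variance components must be non-negative")
    total = var_between + 2.0 * var_within
    if total == 0:
        raise ValueError("no genetic variance to partition")
    return var_between / total
```

Because QST for a neutral additive trait is expected to match FST at neutral markers, a high QST combined with very low genomewide FST, as described for P. sylvestris, points to divergent selection on the trait.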
Barton (1999) suggests that a part of the loci affecting the clinal
phenotypic variation should form strong allele frequency clines by
the time the population is approaching an equilibrium. Another suggestion
emphasizes selection on favorable allele combinations across
adaptive loci (allelic covariation) with only weak selection pressure
on individual loci, instead of notable allele frequency clines. This
could be especially important in the early phases of selection (Latta,
1998; Le Corre & Kremer, 2003, 2012). Further, phenotypes could
well be genetically redundant, and allelic effects transient (Barghi et
al., 2019; Yeaman, 2015).
Over decades, P. sylvestris has been extensively characterized
in terms of possible adaptive trait variation. In a comparison of
27 European conifer species, P. sylvestris had one of the highest
QST values on height increment, bud flush, and bud set (Alberto
et al., 2013; see also Savolainen et al., 2007). Extensive phenotypic
variation has been observed, for example, in phenology
(Beuker, 1994; Clapham, Ekberg, Eriksson, Norell, & Vince‐Prue,
2002; Kujala et al., 2017; Mikola, 1982; Notivol, García‐Gil, Alia,
FIGURE 4 Example of an allele frequency spectrum in Pinus sylvestris (Tyrmi et al., 2019). Observed 4‐fold and
0‐fold degenerate sites are plotted as bars, and the expected values (Nordborg et al., 2005) under neutral equilibrium are
indicated by line
[Figure axes: Frequency (0 to 40) and Proportion (0.0 to 0.4). Legend, Site.type: 0‐fold, 4‐fold]
& Savolainen, 2007; Salmela, Cavers, Cottrell, Iason, & Ennos,
2013), cold tolerance (Hurme, Repo, Savolainen, & Pääkkönen,
1997), drought (Palmroth et al., 1999; Semerci et al., 2017), waterlogging
stress (Donnelly, Cavers, Cottrell, & Ennos, 2018), root
(Zadworny, McCormack, Mucha, Reich, & Oleksyn, 2016), seed
(Reich, Oleksyn, & Tjoelker, 1994) and needle (Donnelly, Cavers,
Cottrell, & Ennos, 2016) characteristics, and carbohydrate and
nutrient dynamics (Oleksyn, Reich, Zytkowiak, Karolewski, &
Tjoelker, 2003; Oleksyn, Zytkowiak, Karolewski, Reich, & Tjoelker,
2000). For example, northern populations are more cold tolerant,
have earlier growth cessation, and grow less during the growing
season, consistent with the gradient in environmental conditions
and local adaptation.
A wealth of data on survival and growth, serving as proxies for
fitness, exists in provenance trials (reviewed by Langlet, 1971).
Savolainen et al. (2007) used such data to identify local adaptation.
Comparison of the fitness of transferred populations with the fitness of
local populations, based on relative survival and height, showed, for
example, that P. sylvestris populations in central Sweden are locally
adapted. This implies that tree populations have had time
to evolve to match the new habitats exposed by the retreating
continental ice (Davis & Shaw, 2001). Rapid adaptation is concordant
with the theoretical expectation that selection is efficient
in large populations.
The molecular genetic basis of the adaptive clinal variation can
be identified with two basic approaches: association mapping, which
identifies genetic polymorphisms correlated with a given phenotypic
variation, or methods that rely solely on genotypic data.
Association studies, especially when both the between‐population
and within‐population variation can be addressed, can reveal important
polymorphisms underlying adaptation. At the same time,
the clinal setup poses a challenge for controlling population structure
when it occurs along the same environmental gradient. Kujala
et al. (2017) address this problem with latitudinal variation in timing
of bud set in the first‐year seedlings of P. sylvestris. In this Bayesian
approach, the presence of a within‐population association signal is
required across different populations to exclude spurious associations.
Other new promising methods, akin to QST−FST comparisons,
FIGURE 5 Tajima's D (a) and nucleotide diversity (b) distributions for
constant size (SNM) and exponentially growing (EXP) populations. Coalescent
simulations were conducted with the Coala package in R (Staab & Metzler, 2016)
with the following parameters. SNM: N0 = 50 × 10^6, μ = 0.004/bp, locus
length 1,000 bp, and sample size 50. EXP: as in SNM but exponential growth started
at time 0.00002