Abstract

Pinus sylvestris has a long history of basic and applied research that is relevant for both forestry and evolutionary studies. Its patterns of adaptive variation and role in forest economic and ecological systems have been studied extensively for nearly 275 years, detailed demography for 100 years, and mating system for more than 50 years. However, its reference genome sequence is not yet available, and genomic studies have been lagging compared to, for example, Pinus taeda and Picea abies, two other economically important conifers. Despite the lack of a reference genome, many modern genomic methods are applicable for a more detailed look at its biological characteristics. For example, RNA‐seq has revealed a complex transcriptional landscape, and targeted DNA sequencing displays an excess of rare variants and geographically homogeneously distributed molecular genetic diversity. Current DNA and RNA resources can be used as a reference for gene expression studies, SNP discovery, and further targeted sequencing. In the future, specific consequences of the large genome size, such as functional effects of regulatory open chromatin regions and transposable elements, should be investigated more carefully. For forest breeding and long‐term management purposes, genomic data can help in assessing the genetic basis of inbreeding depression and in applying genomic tools for genomic prediction and relatedness estimates. Given the challenges of breeding (long generation time, no easy vegetative propagation) and the economic importance of the species, application of genomic tools has the potential to have a considerable impact. Here, we explore how genomic characteristics of P. sylvestris, such as rare alleles and the low extent of linkage disequilibrium, impact the applicability and power of the tools.
Evolutionary Applications. 2020;13:11–30. wileyonlinelibrary.com/journal/eva
Received: 16 November 2018 | Revised: 5 April 2019 | Accepted: 24 April 2019
DOI: 10.1111/eva.12809
SPECIAL ISSUE REVIEW AND SYNTHESES
275 years of forestry meets genomics in Pinus sylvestris
Tanja Pyhäjärvi1,2 | Sonja T Kujala3 | Outi Savolainen1,2
1Department of Ecology and Genetics, University of Oulu, Oulu, Finland
2Biocenter Oulu, University of Oulu, Oulu, Finland
3Natural Resources Institute Finland, Luke, Oulu, Finland
Correspondence
Tanja Pyhäjärvi, Department of Ecology and Genetics, University of Oulu, P.O. Box 3000, 90014 Oulu, Finland.
Email: tanja.pyhajarvi@oulu.fi
Funding information
Biotieteiden ja Ympäristön Tutkimuksen Toimikunta, Grant/Award Number: 287431, 293819, 307582 and 309978
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2019 The Authors. Evolutionary Applications published by John Wiley & Sons Ltd
KEYWORDS
adaptation, allele frequency spectrum, breeding, genomic prediction, genomics, inbreeding depression, linkage disequilibrium, Pinus sylvestris
1 | INTRODUCTION
Pinus sylvestris (Scots pine) is one of the world's most widely distributed and northern conifers, reaching from the British Isles in the west to the Siberian taiga in the east. It is also found in the mountainous areas of Mediterranean peninsulas in Southern Europe. It is especially competitive in poor soils and in dry and extremely cold environments, and compared to many other Pinus, its ecological niche is wide (Rehfeldt et al., 2002; Svenning, Normand, & Kageyama, 2008). Especially in the northern part of its distribution, it is a dominant tree in forested areas (Durrant, Rigo, & Caudullo, 2016) and has an important role as a part of forest ecosystems as a global carbon reservoir and also via interactions with soil microbes and fungi (Högberg et al., 2001; Lindén et al., 2014; Pan et al., 2011). Economically, it is an important tree species in Northern Eurasia as raw material for paper and pulp industry and as timber (Durrant et al., 2016; Gardner, 2013; Mason & Alía, 2000; Mullin et al., 2011). It is widely planted for timber outside its natural range. P. sylvestris is estimated to cover over 145 million hectares of forest in Eurasia (Durrant et al., 2016; Mason & Alía, 2000; Mullin et al., 2011).
Very rough estimates of the actual population census size can be made by combining distribution and density information. In commercial P. sylvestris forests, there are about 2000 stems per hectare after precommercial thinning (Fahlvik, Ekö, & Pettersson, 2005; Väisänen, Kellomäki, Oker‐Blom, & Valtonen, 1989), which, multiplied by the roughly 145 million hectares of distribution, yields an estimate of population census size of 290 × 10⁹ trees. P. sylvestris is not globally under threat in terms of species viability as such (Gardner, 2013). However, its dominant role in large forest ecosystems means that any changes in its distribution or mortality are likely to have large ecological and economic consequences.
Multiple subspecies and varieties of P. sylvestris have been described based on morphological and phenological differences (Molotkov & Patlaj, 1991; Pravdin, 1969; Ruby & Wright, 1976), but groupings are not consistent across studies and vary depending on the studied traits (Shutyaev & Giertych, 2000). Often the underlying phenotypic differences have a clinal rather than discrete distribution, and the description of separate varieties reflects more the sampling design and silvicultural purposes than true biological groups (Langlet, 1971; Shutyaev & Giertych, 2000). At the genomewide level, molecular genetic differentiation among sampling locations at nuclear loci is low (FST = 0.02) in most parts of the distribution (Karhu et al., 1996; Kujala & Savolainen, 2012; Muona & Harju, 1989; Prus‐Głowacki, Urbaniak, Bujas, & Curtu, 2012; Pyhäjärvi et al., 2007; Tyrmi et al., 2019).
Due to its dominance and economic importance, phenotypic variation of P. sylvestris has been of interest since the early days of biological sciences. The first studies of phenotypic and adaptive differences among populations originating from different geographic regions were motivated by shipbuilding and forestry (Alberto et al., 2013; Engler, 1913; Langlet, 1971; Morgenstern, 1996). The first evidence of P. sylvestris provenance (common garden) trials in France comes from as early as 1745 (Langlet, 1971). Later, several variably documented national (Eiche, 1966; Heikinheimo, 1950) and international (Giertych & Oleksyn, 1992; Shutyaev & Giertych, 1998; Wiedemann, 1930) provenance trials have been established mainly for silvicultural purposes, also outside its natural range (Wright & Ira, 1962). Even though not originally designed for the purpose, these data have proved valuable for evaluating the level of ecological adaptation (Savolainen, Pyhäjärvi, & Knürr, 2007) and making predictions on responses to climate change (Persson, 1998; Rehfeldt et al., 2002). In the 20th century, very detailed studies on reproductive biology (Koski, 1970; Lönnroth, 1926; Sarvas, 1962) formed a solid basis for modern applied and evolutionary genetic research of P. sylvestris.
The extensive distribution, large effective population size (Ne), efficient gene flow, predominantly outcrossing nature, and large 22 Gbp genome, yet strong phenotypic differentiation in adaptive traits, make P. sylvestris an intriguing system to examine adaptive processes of polygenic traits. Lack of significant longitudinal structure allows replicating sampling along latitudinal transects and inspecting whether the same loci or variants are participating in the adaptive trait variation throughout the distribution or whether the adaptation has emerged via different combinations of adaptive alleles. In comparison with, for example, the North American interior spruce complex (Yeaman et al., 2016), P. sylvestris does not significantly hybridize with other species (see, however, Wachowiak & Prus‐Głowacki, 2008), likely because most close relatives have small distributions or do not overlap with P. sylvestris. Lack of hybridization should further facilitate understanding of the genetic basis of local adaptation. The species is also known to harbor a large number of lethal equivalents (Kärkkäinen, Koski, & Savolainen, 1996; Koski, 1971), which allows examining the joint effects of deleterious variation with adaptive and phenotypic variations.
In comparison with farm animals, crops, and vegetables, the gene pools of many coniferous forest trees have not been extensively changed by humans, natural regeneration is common in a large part of the distribution, and breeding is conducted in a handful of countries (Mullin et al., 2011). In Finland and Sweden, breeding efforts on P. sylvestris were started in the late 1940s by collecting phenotypically superior plus trees from natural and cultivated forests. Plus trees were used to establish seed orchards for seed production. Later, the genetic quality of the seed orchards has been improved by selection among plus trees based on progeny testing, forming the “1.5 generation” seed orchards (Haapanen, Hynynen, Ruotsalainen, Siipilehto, & Kilpeläinen, 2016; Jansson, Hansen, Haapanen, Kvaalen, & Steffenrem, 2017). With traditional breeding methods, genetic gains of up to ~25% for growth traits such as volume, as well as reductions in rotation times, have been obtained, resulting in significant economic gains in Finland and Scandinavia (Ahtikoski, Ojansuu, Haapanen, Hynynen, & Kärkkäinen, 2012; Jansson et al., 2017).
Given the low number of breeding cycles conducted so far, there should be plenty of functional genetic variation, and thus, large genetic gains can be expected. By adding genomic methods to the available tools, we can expect further economic gains per unit of time. Genomic information can improve the breeding efforts in P. sylvestris through genomewide association analyses (GWAS), inference of genetic relatedness, and genomic prediction (Isik, 2014; Isik et al., 2016; Meuwissen, Hayes, & Goddard, 2001). So far, efforts to make P. sylvestris flower at an earlier age have not been successful. Traits that can be measured only on adult trees hold the greatest potential in terms of shortening the breeding cycle (Isik, 2014). Unlike other breeding methods, such as gene editing, that require detailed molecular information, genomic prediction can deal with the highly polygenic basis of trait variation (with unknown underlying loci).
Given its economic (Praciak, 2013) and ecological importance, and the long record of research and knowledge (Giertych & Matyas, 1991), it is unfortunate that a whole genome sequence assembly of P. sylvestris has not yet been presented. Even though in many other ways it is one of the most thoroughly studied conifers (Figure 1), from a genomics point of view, it is still a nonmodel organism. However, whole genome sequencing and reference genomes are not a prerequisite for using current methodologies for genomic analysis of P. sylvestris.
In this review, we consider biological, life history, and genomic characteristics that are relevant for future applications of genomic information in P. sylvestris. We will address (a) mating system and inbreeding depression, (b) genetic diversity and linkage disequilibrium, (c) population structure, history, and allele frequency distribution, (d) adaptive variation at trait and molecular level, (e) genome size and architecture, and (f) application of genomics in breeding and conservation.
2 | MATINGSYSTEMANDINBREEDING
DEPRESSION
Pinus sylvestris, as many other forest trees, is wind‐pollinated, diploid, and predominantly outcrossing. The selfing rate measured in mature seeds is about 5%–10% (Muona, Yazdani, & Rudin, 1987; Rudin, Muona, & Yazdani, 1986). P. sylvestris is also known to display high inbreeding depression, also a common observation in forest trees. Inbreeding depression is normally measured by comparing the performance of selfed (or other inbred) offspring to that of controlled outcrossed or open‐pollinated offspring. In pines, embryonic mortality can be evaluated by examining the proportion of empty seeds from different types of crosses (Kärkkäinen et al., 1996; Koski, 1971). The average mortality in selfed seeds is 75%–85% (Kärkkäinen et al., 1996; Koelewijn, Koski, & Savolainen, 1999; Koski, 1971), compared to 20%–30% for open‐pollinated seed. Based on these kinds of data, with simplifying assumptions, the number of embryonic lethal equivalents in P. sylvestris has been estimated to be 9.4 (Koski, 1971). The level is much higher than in most animals and angiosperms and similar to many other conifers, such as Picea abies, Picea glauca, or Pseudotsuga menziesii, while some pines with limited distributions have much lower levels (e.g., Pinus radiata and Pinus resinosa; Lynch & Walsh, 1998; Williams & Savolainen, 1996). Because most selfed embryos die before seed maturation, the selfing rate at fertilization is thus much higher than at the mature seed stage. Further, the selfing rate at fertilization can be even higher because of polyzygotic (simple) polyembryony: An ovule may contain up to four embryos, which can be of selfed or outcrossed origin. These embryos share their maternal haplotype but have been fertilized by different pollen. Only one dominant embryo will survive (Hedrick & Muona, 1990; Koski, 1970; Sorensen, 1982). Inbred mortality during early embryogenesis and replacement by outcrossed progeny results in reduced loss of maternal resources, thus reducing post‐seed‐maturation inbreeding depression (Kärkkäinen & Savolainen, 1993; Sarvas, 1962).
Other studies (Koelewijn et al., 1999; Muona et al., 1987;
Yazdani,Muona,Rudin,&Szmidt,1985)haveshownthatthelow
fitness of selfs continue during later years, such that already in a
few years old seedlings and especially in young adult s, the selfed
progeny has been mostly eliminated and the genotypes are in
Hardy–Weinberg equilibrium. The result s onconifer sin general
suggest that the overall inbreeding depression is due to a large
FIGURE 1 Number of approximate
Google Scholar citations (as of October
19th2018)forsomemostintensively
studied conifer species. Number of results
indicated in each bar
225 000
214 000
108 000
75 600
64 800
50 500
48 000
46 100
34 900
34 800
33 300
30 300
27 200
26 300
25 500
23 000
22 700
Pinus sylvestris
Picea abies
Pinus radiata
Pinus taeda
Pseudotsuga menziesii
Pinus contorta
Pinus pinaster
Picea glauca
Abies alba
Cryptomeria japonica
Pinus halepensis
Picea sitchensis
Abies balsamea
Pinus banksiana
Ta xus baccata
Larix decidua
Pinus pinea
050 000 100 000150 000 200 000
Citations
Species
14 
|
   PYHÄJÄRVI et a l.
number of deleterious, partially recessive alleles (Williams &
Savolainen, 1996).Lande, Schemske, and Schultz (1994)showed
that the lethals can be maintained in the population despite the
selfing when there is a high per genome per generation mutation
rate to the deleterious alleles. In addition, stabilizing selection on
quantitativetr aitscaninterfereincomplexwayswithselectionon
lethals and either increase or decrease the probability of purging
(Lande&Porcher,2017).
Genomics can shed light on the genetic architecture of inbreeding depression by characterizing the numbers and effect sizes of deleterious alleles, for example, by mapping (Hedrick & Muona, 1990; Ritland, 1996), and by analyzing heterozygosity of the selfed progeny, as in Eucalyptus grandis (Hedrick, Hellsten, & Grattapaglia, 2016): In the selfed progeny, heterozygosity was much higher than the neutral expectation, suggesting that overall selection against homozygotes in selfed offspring was very high (s = 0.47). Furthermore, inbreeding depression likely was due to partially deleterious alleles at more than 100 loci, even if overdominance effects could not be fully excluded. Similar methods could help to identify genomic areas with lethals also in conifers. So far, various functional and evolutionary prediction models of allelic substitution have been used to identify deleterious alleles. This has then allowed comparisons of allelic and genotypic frequencies between populations or species, as in Populus (Zhang, Zhou, Bawa, Suren, & Holliday, 2016) or in Picea (Conte et al., 2017). Estimation of the distribution of fitness effects also allows conclusions on the nature of deleterious alleles. Allele frequency spectrum (AFS) of some 400 loci suggested that P. sylvestris has fewer slightly deleterious alleles and a larger proportion of more highly deleterious alleles than other conifers and plants in general (Grivet et al., 2017; Hodgins, Yeaman, Nurkowski, Rieseberg, & Aitken, 2016).
While mating between relatives can be an important breeding tool for some species, in conifers deleterious alleles are so numerous that breeding strategies using inbreeding will often lead to fixation of deleterious alleles, as shown in simulations by Wu, Hallingbäck, and Sánchez (2016). However, in some cases, variation in levels of inbreeding depression may allow using mating between relatives (Ford, McKeand, Jett, & Isik, 2014). Because of the mostly harmful effects of mating between relatives, it is important to manage inbreeding levels in breeding programs by measuring relatedness with genetic tools.
3 | GENETICDIVERSITYANDLINKAGE
DISEQUILIBRIUM
In many aspects, patterns of genetic diversity in P. sylvestris match the expectations for a wind‐pollinated tree with large population size. Population structure is almost nonexistent, linkage disequilibrium (LD), the nonrandom association of alleles, generally does not extend far, and genotypes are in Hardy–Weinberg equilibrium as expected under random mating (Kujala & Savolainen, 2012; Muona & Szmidt, 1985; Pyhäjärvi et al., 2007; Tyrmi et al., 2019; Wachowiak, Salmela, Ennos, Iason, & Cavers, 2011).
Comparative studies have indicated that Pinus in general do not have particularly low nucleotide diversity compared to other plants (Chen, Glémin, & Lascoux, 2017; Eckert et al., 2013; Figure 2). Silent site diversity in the P. sylvestris genic regions (based on Sanger sequencing) seems to converge to 0.006/bp (Dvornyk, Sirviö, Mikkonen, & Savolainen, 2002; García‐Gil, Mikkonen, & Savolainen, 2003; Grivet et al., 2017; Kujala & Savolainen, 2012; Pyhäjärvi et al., 2007; Wachowiak et al., 2011), and the first genomewide estimate of genetic diversity at silent, fourfold degenerate sites is 0.004/bp (Tyrmi et al., 2019). This estimate is slightly lower than the variation observed in other Pinus at fourfold degenerate sites (Eckert et al., 2013) and also slightly lower than diversity observed in silent sites of P. sylvestris in earlier studies. Note that even silent and fourfold degenerate nucleotide diversity and mutation rate estimates for P. sylvestris are based on data derived from genic regions. The overall patterns of diversity further away from genes may be very different and likely higher.
FIGURE 2 4‐fold site nucleotide diversity for a variety of angiosperms and Pinus (data from Chen et al., 2017 except for Pinus sylvestris). Axes: species versus 4‐fold nucleotide diversity (scale 0.000–0.015). Groups: angiosperm, angiosperm tree, Pinus, Pinus sylvestris (Sanger and exome capture estimates)
As for many other species, there is a mismatch between the
observed nucleotide diversity and the census size in P. s yl ves
tris (Lewontin, 1974). Under the assumption of standard neutral
equilibrium, 4Neμ (where Ne is the ef fective population size and μ
is mutati on rate per site pe r generation) is ex pected to equ al the
pairwise nucleotide diversity, θ (Tajima, 1983). Ne can then be es‐
timated with Ne = θ/(4μ). The mutation rates per bp per year vary
from0.22 to 1.3× 10−9 (Buschiazzo, Ritland, Bohlmann, & Ritland,
2012; De La Torre, L i, Peer, & Ingvarsso n, 2017; Py häjärvi et al. ,
2007;Willyard, Syring,Gernandt,Liston,&Cronn,2007). Together
with silentnucleotide diversity of 0.004/bpand assuming genera
tion time of 20 years, this yields population size estimates ranging
from 38,000 to 230,000 individuals. Yet, the actual census size is
in the scale of billions of individuals in a seemingly random mating
population. Clearly, some of the above‐mentioned assumptions are
violated. Potential reasons for the discrepancy are as follows: (a) un‐
even fecundity among individual trees, which would increase the al
lelefrequencyvarianceacrossthegenerations,leadingtoincreased
drift, and deviations from the assumed Wright–Fisher population
model; (b) much lower mutation rate than estimated based on fossil
record and Picea–Pinus divergence;(c) nonequilibrium population
history reducing the long‐term Ne; and (d) effect of linked selection
permeating most of the genic areas, thus reducing genetic diversity
(Charlesworth, Morgan, & Charlesworth, 1993;Maynard Smith &
Haigh,1974).
These potential explanations do not exclude each other. It is likely that offspring number of P. sylvestris does not follow the assumed Poisson distribution with mean number of offspring = 1. Mature P. sylvestris trees have high fecundity, and thus, single trees have a potential to make a large contribution to the next generation. This can lead to Moran‐model‐like multiple‐merger population coalescent processes, reduced diversity in relation to population census size, and also to an excess of rare alleles due to star‐shaped gene genealogies (Eldon & Wakeley, 2006). In seed orchard conditions, significant variation in offspring number has indeed been identified among genotypes (Gömöry, Bruchánik, & Longauer, 2003; Gömöry, Bruchanik, & Paule, 2000; Kang, Bila, Harju, & Lindgren, 2003; Savolainen, Kärkkäinen, Harju, Nikkanen, & Rusanen, 1993). In fully natural conditions, this variance is expected to be even larger. The AFS indicates nonequilibrium population history (Kujala & Savolainen, 2012; Pyhäjärvi et al., 2007), which also reduces the amount of genetic diversity. Linked selection has been shown to affect a wide spectrum of species (Charlesworth & Charlesworth, 2018), and evaluating its importance in P. sylvestris is one of the major evolutionary questions that more genomic data will help us to tackle. Grivet et al. (2017) observed both high efficiency of purifying selection and high rate of positive selection in P. sylvestris and Pinus pinaster, in accordance with gymnosperms in general (De La Torre et al., 2017), supporting linked selection as a likely explanation for low genetic diversity in P. sylvestris genic areas. Evaluation of the overall effect of linked selection on genetic diversity requires a dense genetic map combined with the physical map because a correlation between nucleotide diversity and recombination rate is expected. Current estimates suggest that LD in P. sylvestris in general decays fast, often within a few hundred bp (however, see detailed discussion on LD below), which would predict local effects of linked selection near genic regions, as observed, for example, in maize, which also has low LD (Beissinger et al., 2016). The partial selfing may, however, generate some additional opportunity for linked selection, but this has not yet been studied.
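The effect of uneven fecundity on Ne, mentioned at the start of the paragraph above, can be sketched with a standard textbook result that is not taken from this review: in a haploid Cannings model with mean offspring number 1 and offspring variance σ², the pairwise coalescence probability per generation is σ²/(N − 1), giving Ne = (N − 1)/σ². The census size used below is hypothetical, and the approximation does not cover the multiple‐merger regime, in which single individuals contribute a sizeable fraction of the next generation.

```python
def coalescent_ne(n_census, offspring_variance):
    """Coalescent effective size of a haploid Cannings model with mean
    offspring number 1: Ne = (N - 1) / sigma^2."""
    if offspring_variance <= 0:
        raise ValueError("offspring variance must be positive")
    return (n_census - 1) / offspring_variance


N = 1_000_000  # hypothetical census size, for illustration only
for var in (1.0, 10.0, 100.0):  # 1.0 corresponds to the Poisson-like case
    print(f"offspring variance {var:>5.0f} -> Ne ~ {coalescent_ne(N, var):,.0f}")
```

Even moderate overdispersion in offspring number lowers Ne proportionally, so this mechanism alone would need very large variance to explain the full gap between census and effective sizes.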
Note that different genetic estimates and measures of nucleotide diversity may have apparent discrepancy due to scale. Some estimates such as nucleotide diversity are estimated at base‐pair resolution. However, other observations, such as lethal equivalents, reflect whole genome‐level phenomena. The genomewide mutation rate to deleterious alleles can be high even with low per base‐pair mutation rate level, if the mutational target size is large.
The recombination rate per bp (c) and extent of LD are critical factors in breeding as they determine how selection on a subset of loci will affect other nearby loci and essentially determine the genomic resolution of many breeding efforts. In an equilibrium situation, LD depends on both c and Ne, and the population‐level recombination parameter ρ = 4Nec can be estimated from LD patterns. Further, c can be independently estimated with genetic mapping. In a practical context, it is good to remember that c affects the precision of, for example, QTL and other genetic mapping efforts, whereas ρ is more significant for association studies.
Asmentionedabove,P. sylvestris appears to have relatively low
LD,whenmeasuredbyr2,squaredcorrelationcoefficient,extending
the level above 0.2 often only just few hundreds of base pairs (Kujala
&Savolainen,2012;Pyhäjärvi et al.,2007;Tyrmietal., 2019).The
extentofLDmeasuredwithr2 is useful for many applications, such
as genomic prediction, because it directly informs about the power
ofalocustopredicttheallelic stateofanotherlocus(Hahn,2018).
However,asallmeasuresofLD,itisdependenton themarginalal
lelefrequencies.Whentherearemanylow‐frequencySNPs,asinP.
sylvestris, r2 tends to be low not only due to high recombination but
alsoduetolowallele frequencies.P. sylvestris LDmeasuredas|D′|
(LDmeasurescaledbyitsmaximumvaluegiventheallelefrequen
cies) that is more informative about the recombination rate suggest s
moreextensiveLD than the valuefromtheoverallr2 decay (Kujala
& Savolainen, 2012).
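The dependence of r² on allele frequencies can be made concrete with the standard two‐locus definitions D = p(AB) − p(A)p(B), r² = D²/[p(A)(1 − p(A))p(B)(1 − p(B))], and |D′| = |D|/Dmax. The sketch below uses hypothetical haplotype frequencies, not data from P. sylvestris, to show a pair of loci with no recombinant haplotypes (|D′| = 1) that nevertheless has a low r² because one allele is rare.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r2, |D'|) for two biallelic loci, given the frequency of
    haplotype AB and the marginal frequencies of alleles A and B."""
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D is scaled by the largest value it could take with these marginals.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, r2, d_prime


# Hypothetical example: a rare allele A (5%) that always occurs on the
# background of a common allele B (50%), so the Ab haplotype is absent.
d, r2, d_prime = ld_stats(p_ab=0.05, p_a=0.05, p_b=0.50)
print(f"D = {d:.4f}, r2 = {r2:.3f}, |D'| = {d_prime:.2f}")
```

With equal allele frequencies at both loci and the same complete association, r² would also equal 1, which is why frequency‐matched comparisons or |D′| are more informative about recombination when rare variants dominate.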
Asinmostspecies,LDpatternsinconifersareprobablyhetero
geneous among different genomic regions (Pavy, Namroud, Gagnon,
16 
|
   PYHÄJÄRVI et a l.
Isabel , & Bousquet, 201 2). Despite the gen eral trend of ra pid LD
decay(Pyhäjärvietal.,2007;Wachowiak,Balk,&Savolainen,2009),
there is considerable variation along the genome in P. sylvestris at
the gene level LD (Kujala & Savolainen, 2012; Pyhäjärvi, Kujala, &
Savolainen, 2011). For example, Pyhäjärvi et al. (2011) found that
several allozymecoding loci have strong LD, not showing signsof
decayevenovera12‐kbpregion.Also inP. taeda, earlier data sug‐
gestedthatLDdecaysrapidly(Brown,Gill,Kuntz,Langley,&Neale,
2004),butsomerecentworksuggestslargevariationintheextentof
LDinthegenome(Luetal.,2016,see,however,Acostaetal.,2019).
The overall recombination rate in conifers, estimated based on genetic maps, is in general low (Jaramillo‐Correa, Verdú, & González‐Martínez, 2010) and the same applies to P. sylvestris. The estimated map length is 1,500 cM (Komulainen et al., 2003), which with the genome size of 22 × 10⁹ bp results in a recombination rate of 0.07 cM/Mb or c = 0.7 × 10⁻⁹ per bp per generation. Thus, assuming Ne of 38,000–230,000 obtained from nucleotide diversity data yields ρ estimates of 0.0001–0.0006, whereas some ρ estimates obtained from the nucleotide diversity data are much higher, ranging from 0 to 0.04 (Kujala & Savolainen, 2012; Pyhäjärvi et al., 2007, 2011; Wachowiak et al., 2009). The apparent discrepancy of low LD and low recombination rate can be explained by low minor allele frequencies, relatively high Ne and potential for low recombination and extensive LD in intergenic regions. Currently, LD estimates of P. sylvestris are available within genes and between pairs of coding areas. Evidence for LD in intergenic areas is rare, and studies at an intermediate range are missing in P. sylvestris. A study on Cryptomeria japonica has indeed shown that noncoding regions of conifer genomes can harbor extensive LD (Moritsuka et al., 2012). Better genome assemblies combined with extensive resequencing efforts are required to get a fuller picture of LD across the P. sylvestris genome. Long read sequencing combined with, for example, optical and genetic mapping will allow identifying regions where physical and genetic distances have most discrepancies. Identifying these regions is important for breeding and understanding adaptive variation as large fragments of genome are dragged along when responding to selection.
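The map‐based values of c and ρ quoted at the start of the paragraph above can be reproduced with a few lines of arithmetic. The sketch below uses only the figures given in the text; the names are illustrative, and treating 1 cM as a 0.01 crossover probability per generation is an approximation that holds over short distances.

```python
MAP_LENGTH_CM = 1500.0        # genetic map length in centimorgans
GENOME_SIZE_BP = 22e9         # genome size in base pairs
NE_RANGE = (38_000, 230_000)  # Ne range inferred from nucleotide diversity

# Average recombination rate, first per megabase and then per base pair.
cm_per_mb = MAP_LENGTH_CM / (GENOME_SIZE_BP / 1e6)
c_per_bp = (MAP_LENGTH_CM / 100.0) / GENOME_SIZE_BP  # Morgans per bp


def rho(ne, c):
    """Population-scaled recombination parameter, rho = 4 * Ne * c."""
    return 4.0 * ne * c


print(f"{cm_per_mb:.3f} cM/Mb, c = {c_per_bp:.2e} per bp per generation")
for ne in NE_RANGE:
    print(f"Ne = {ne:,} -> rho = {rho(ne, c_per_bp):.1e} per bp")
```

These genome‐average values sit at the low end of the LD‐based ρ estimates of 0 to 0.04, which is the discrepancy the text goes on to discuss.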
4 | POPULATIONSTRUCTURE,HISTORY,
ANDALLELEFREQUENCYDISTRIBUTION
Pinus sylvestris has efficient wind pollination with potential for very
long‐distance pollen dispersal. In addition, female flowering pre
cedes the m ale flowering by t wo to five days (Kosk i, 1970;Var is,
Pakkan en, Galofré, & Pu lkkinen, 20 09). Pinus sylvestris pollen dis
persal distribution has a “fat‐tailed” leptokurtic shape allowing spo‐
radic long‐distance dispersal events. Most pollen comes from nearby
sources(Koski,1970;Muona&Harju,1989;Torimaru,Wang,Fries,
Andersson, & Lindgren, 2009), but nonlocal airborne germinable
pollen is of ten observed during the female flowering, and northern
populations receive some southern pollen potentially from hundreds
of kilometers away before the local pollen is available (Varis et al.,
2009). In Robledo‐Arnuncio and Gil (2005), 4.3% of the pollen came
from outside the isolated P. sylvestris stand despite an estimated
average pollen dispersal distance of only 135 m (Robledo‐Arnuncio & Gil,
2005). Even rare long‐distance dispersal can have an important role
in homogenizing the distribution of genetic variation and result in,
for example, suboptimal phenotypic variation. Seeds are also
dispersed by wind, but not as extensively as pollen, with distances
peaking at <10 m (Kellomäki, Hänninen, Kolström, Kotisaari, & Pukkala, 1987).
The dispersal biology is reflected in the geographic distribution of
genetic diversity. In nuclear genes that are dispersed by both pollen
and seeds, the genetic differentiation is consistently low (FST = 0.02)
in the more continuous part of the distribution (Karhu et al., 1996;
Kujala & Savolainen,2012; Prus‐Głowackiet al., 2012; Pyhäjärviet
al., 20 07). Maternall y inherited mit ochondrial m arkers have a more
restricted geographic distribution of alleles (GST = 0.66; Naydenov,
Senneville,Beaulieu,Tremblay,&Bousquet,2007;Pyhäjärvi,Salmela,
& Savolainen, 2008). The eastern part of the distribution is less stud
ied in terms of nuclear markers, but isozyme studies imply that ge
netic differentiation is also low (Dvornyk, 2001; Goncharenko, Silin, &
Padutov,1994).Asinmanyotherspecies,subtlegeographicstructure
can be observed at the range margins and even in the main distri
bution when a large number of nuclear loci are observed (Kujala &
Savolainen,2012;Tyrmietal.,2019;Wachowiaketal.,2011).Insum
mary, the nuclear polymorphisms and continuous distribution indicate
a lack of actual discrete populations, and in many analyses, P. sylvestris
within most of Europe can be treated as a single panmictic population.
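To make concrete what a value such as FST = 0.02 summarizes, the following is a minimal Python sketch of one common definition, FST = (HT − HS)/HT, computed from biallelic allele frequencies. The function name `fst_from_frequencies`, the equal weighting of populations, and the example frequencies are assumptions for illustration only; the cited studies use their own estimators and data.

```python
import numpy as np


def fst_from_frequencies(freqs):
    """Wright's FST = (HT - HS) / HT for biallelic loci.

    freqs: array-like of shape (n_populations, n_loci) holding the frequency
    of one allele per population and locus. Populations are weighted equally
    and no sample-size correction is applied (illustrative simplification).
    """
    p = np.asarray(freqs, dtype=float)
    # Expected heterozygosity within populations, averaged over populations.
    hs = np.mean(2.0 * p * (1.0 - p), axis=0)
    # Expected heterozygosity of the pooled population.
    p_bar = p.mean(axis=0)
    ht = 2.0 * p_bar * (1.0 - p_bar)
    # Ratio of sums across loci; loci fixed in the pooled sample add nothing.
    total = ht.sum()
    if total == 0.0:
        raise ValueError("no polymorphic loci")
    return float((ht - hs).sum() / total)


# Hypothetical frequencies: two populations, three loci.
example = [[0.45, 0.10, 0.80],
           [0.55, 0.12, 0.78]]
print(round(fst_from_frequencies(example), 4))
```

With nearly identical frequencies among populations, as in the hypothetical example, almost all heterozygosity is found within populations and the statistic stays close to zero.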
Pinus sylvestris distribution is continuous in the north, whereas in
the southern margin, the distribution consists more of fragmented
and isolated populations. Mediterranean peninsulas and Turkish
populations harbor mitochondrial haplotypes that have rarely or
never been observed outside these isolated populations (Cheddadi
etal.,2006;Naydenov et al., 20 07;Pyhäjärvi et al., 2008; Sinclair,
Morman, & Ennos, 1999; Soranzo, Alia, Provan, & Powell, 200 0;
Wójkiewicz, Cavers, & Wachowiak, 2016). This reflec ts the existence
oflimitedseeddispersalandMediterraneanrefugiaduringtheLast
GlacialMaximum(LGM)andstillongoingpostglacialchangesinsuit
able habitats, but also contemporar y land use and level of forested
areas in general. In addition to contemporary gene flow, part of the
low nuclear differentiation in higher latitudes could be explained by
colonizationprocesscombinedwithalongjuvenilestage(Austerlitz,
Mariette, Machon, Gouyon, & Godelle, 200 0).
Analysis of population genetic structure, measured as FST or
inferred using STRUCTURE type approaches (Falush, Stephens, &
Pritchard, 2003), is based on a model of discrete populations. In
reality, given the dispersal biology of P. sylvestris, the among‐population
covariance structure is probably more of isolation‐by‐distance (IBD)
type. Bradburd, Coop, and Ralph (2018) presented a promising new
method to account for IBD patterns in genetic population structure
analysis. When applied to nuclear P. sylvestris data, the spatial model
with IBD has a better predictive accuracy than the nonspatial (cluster)
model (Tyrmi et al., 2019). The spatial model identifies a genetic
component that is only present in an isolated Spanish population,
consistent with previous work (Figure 3). However, in the spatial model the
genetic makeup of the rest of the sampling sites remains continuous
in contrast to the nonspatial model that assigns a proportion of
western populations to the Spanish‐type cluster (Figure 3). These results
support including IBD‐type spatial patterns in further analyses
affected by population structure of P. sylvestris, such as GWAS.
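A simple way to check for an IBD‐type pattern before fitting a full spatial model is to regress pairwise FST/(1 − FST) on the logarithm of geographic distance. The Python sketch below illustrates that classical regression only; it is not the spatial layer model of Bradburd et al. (2018). The function names, the coordinates, and the FST values are hypothetical and chosen for illustration.

```python
import math

import numpy as np


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2)
    return 2.0 * radius_km * math.asin(math.sqrt(a))


def ibd_regression(coords, fst):
    """Least-squares fit of FST / (1 - FST) on ln(distance in km).

    coords: list of (latitude, longitude) tuples, one per sampling site.
    fst: symmetric matrix (nested lists or array) of pairwise FST values.
    Returns (slope, intercept); a positive slope is consistent with IBD.
    """
    x, y = [], []
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            distance = haversine_km(*coords[i], *coords[j])
            x.append(math.log(distance))
            y.append(fst[i][j] / (1.0 - fst[i][j]))
    slope, intercept = np.polyfit(x, y, 1)
    return float(slope), float(intercept)


# Hypothetical sites and pairwise FST values (not real P. sylvestris data).
sites = [(60.0, 25.0), (62.0, 25.0), (66.0, 25.0)]
pairwise_fst = [[0.000, 0.004, 0.009],
                [0.004, 0.000, 0.006],
                [0.009, 0.006, 0.000]]
print(ibd_regression(sites, pairwise_fst))
```

Because pairwise comparisons share sampling sites, they are not independent; the significance of the slope is usually assessed with a permutation (Mantel‐type) test rather than with the ordinary regression p‐value.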
Nucleotidediversitystudieshaverevealedanonequilibriumpat
terninthedistributionofallelefrequenciesofP. sylvestris through
out its distribution, which is a common observation in forest trees
(Figure4;Heuertzetal.,2006;Ingvarsson,2008;Moscaetal.,2012;
Zhou,Bawa,&Holliday,2014).Theobservedexcessofrarevariants
is consistent with a bottleneck whose timing is uncer tain but much
earlier t han the LGM (Kujala & S avolainen, 2012; P yhäjärv i et al.,
2007).
As for many other conifers, the evolutionary timescales that
influence both phenotypic and molecular genetic diversity in P.
FIGURE 3 Application of spatial and nonspatial population structure models (Bradburd et al., 2018) with K = 2 to Pinus sylvestris exome
capture data from Tyrmi et al. (2019). Geographic distribution of layer contributions for each population under spatial (a) and nonspatial
models (c). The proportion of layer contributions in each population for spatial (b) and nonspatial models (d). Sampling locations are ordered
according to longitude.
[Figure 3 panels: Spatial, K = 2 (a, b) and Nonspatial, K = 2 (c, d). Bar panels show Admixture on a 0.0 to 1.0 axis for the sampling locations Baza, Inari, Kalvia, Kolari, Latvia, Megdurechensk, Penzenskaja, Punkaharju, Poland, Ust Chilma, Ust Kulom, and Volgogradskaja.]
sylvestris can be very extensive, potentially reaching millions of
years(Pyhäjärvietal.,2007).Thisisacombinedpropertyofitslong
generation time and large Ne. Therefore, observing the signatures
ofpost‐LGMpopulationexpansionwould require largeamountof
data, exceeding the sample sizes used in most published studies.
Simple coalescent simulations show that, for example, doubling
the popu lation size from 25 × 106 to 50 × 106 individuals during
the past 20,000 years (1,0 00 generations) hardly shif ts the distri
bution of Tajima'sDfrom the equilibriumpopulationexpectations
(Figure5).ThelargeNe results in long expected coalescence times,
and thus, most of the observed diversit y reflects the time before
LGM.However,recentexpansioncouldpartlyexplaintheobserved
lownucleotidediversity(Figure5).Itisnoteworthythatduringad
aptation in rapidly growing populations, the current population size
governs the adaptive process, whereas overall molecular diversity is
defined by long‐term Ne, which can be considerably lower (Messer
& Petrov, 2013). Larger sample sizes, allowed by more affordable
sequencing, will provide better resolution for addressing more recent
demographic events as, for example, in Keinan and Clark (2012).
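For readers less familiar with the statistic, the following Python sketch shows how Tajima's D is computed from a sample of aligned sequences, using the standard formula that contrasts pairwise diversity with the number of segregating sites. It illustrates the statistic only and does not reproduce the coalescent simulations of Figure 5. The function name `tajimas_d` and the 0/1 haplotype encoding are assumptions for illustration.

```python
import numpy as np


def tajimas_d(haplotypes):
    """Tajima's D from a 0/1 matrix (rows: sequences, columns: sites)."""
    h = np.asarray(haplotypes, dtype=int)
    n = h.shape[0]
    if n < 4:
        # The variance term is zero for n = 2 and n = 3.
        raise ValueError("at least four sequences are required")
    derived = h.sum(axis=0)
    segregating = (derived > 0) & (derived < n)
    s = int(segregating.sum())
    if s == 0:
        return float("nan")  # undefined without segregating sites
    k = derived[segregating]
    # Mean number of pairwise differences (pi), summed over sites.
    pi = float(np.sum(k * (n - k)) / (n * (n - 1) / 2.0))
    i = np.arange(1, n)
    a1 = float(np.sum(1.0 / i))
    a2 = float(np.sum(1.0 / i ** 2))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two diversity estimators, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / float(np.sqrt(e1 * s + e2 * s * (s - 1)))


# Six sequences in which every variant is a singleton (excess of rare alleles).
print(tajimas_d(np.eye(6, dtype=int)))
```

An excess of rare variants lowers pairwise diversity relative to the number of segregating sites and therefore gives a negative D, which is the direction of the skew described for P. sylvestris.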
The skewed AFSalsohas practicalconsequences. Thegeneral
utility and information content of a given polymorphism depends
onits allelefrequency.Rare alleles are informative onlyinasmall
number of populations and families. Common alleles are generally
considered more useful in, for example, paternity analysis, breeding,
genetic mapping, genomic prediction, and genomewide association
analysis.However,insomecases,ignoringtherareallelesmaylead
to biased conclusions. It will result in biased estimates of diversit y
andmayleadtoign or ingimportantfunctionalvariation(DeLaTorre
etal.,2017,2019;Fahrenkrogetal.,2017;Manolioetal.,2009).
5 | ADAPTIVE VARIATION AT TRAIT AND
MOLECULAR LEVEL
The extent of local adaptation and distribution of adaptive variation
amonggeographicareas,genomes,andindividualsisacorequestion
in evolutionary genetics and also has impacts on conservation and
breeding efforts. The ultimate proof of local adaptation is the
superior fitness of a local population in comparison with nonlocal
populations (Kawecki & Ebert, 2004). So far, the strongest evidence for
local adaptation in P. sylvestris has been obtained at the phenotypic
level, but modern tools contribute to connecting phenotypic with
molecular variation. Many of the potentially adaptive traits display a
continuous, nearly normal within‐population distribution indicative
ofpolygenic basisof inheritance(Mather,1943).Phenotypicvaria
tion is often also continuous at a geographic scale so that traits are
correlated with, for example, altitude or latitude in a linear, clinal
manner, caused by an interplay of selective gradient and gene flow in
continuousspace(Huxley,1938).Whilemanyotherconiferspecies
have also been extensively characterized in terms of adaptive phe‐
notypic variation, P. sylvestris is among the few for which extremely
little population struc ture at genomewide level within a very large
continuous distribution is coupled with high QST values, measure
of phenotypic differentiation among populations. These properties
provide a relatively straightforward theoretical setup to study the
dynamicsofadaptivevariation(Adrion,Hahn,&Cooper,2015).
Barton(1999)suggeststhatapartofthelociaffectingtheclinal
phenotypic variationshouldformstrongallele frequencyclines by
thetimepopulationisapproachinganequilibrium.Anothersugges
tion emphasizes selection on favorable allele combinations across
adaptive loci (allelic covariation) with only weak selection pressure
on indivi dual loci, instea d of notable allel e frequency clin es. This
couldbeespeciallyimportantintheearlyphasesofselection(Latta,
1998;Le Corre&Kremer,2003,2012).Further,phenotypescould
well be genetically redundant, and allelic effects, transient (Barghi et
al.,2019;Yeaman,2015).
Over decades, P. sylvestris has been extensively characterized
in terms of possible adaptive trait variation. In a comparison of
27 European conifer species, P. sylvestris had one of the highest
QST values for height increment, bud flush, and bud set (Alberto
et al., 2013; see also Savolainen et al., 2007). Extensive
phenotypic variation has been observed, for example, in phenology
(Beuker, 1994; Clapham, Ekberg, Eriksson, Norell, & Vince‐Prue,
2002; Kujala et al., 2017; Mikola, 1982; Notivol, García‐Gil, Alia,
FIGURE 4 Example of an allele frequency spectrum in Pinus sylvestris
(Tyrmi et al., 2019). Observed 4‐fold and 0‐fold degenerate sites are
plotted as bars, and the expected values (Nordborg et al., 2005) under
neutral equilibrium are indicated by a line.
[Figure 4 plot: axes labeled Frequency (ticks 0 to 40) and Proportion (ticks 0.0 to 0.4); legend Site.type: 0‐fold, 4‐fold.]
& Savolainen, 2007; Salmela, Cavers, Cottrell, Iason, & Ennos,
2013), cold tolerance (Hurme, Repo, Savolainen, & Pääkkönen,
1997), drought (Palmroth et al., 1999; Semerci et al., 2017),
waterlogging stress (Donnelly, Cavers, Cottrell, & Ennos, 2018), root
(Zadworny, McCormack, Mucha, Reich, & Oleksyn, 2016), seed
(Reich, Oleksyn, & Tjoelker, 1994) and needle (Donnelly, Cavers,
Cottrell, & Ennos, 2016) characteristics, and carbohydrate and
nutrient dynamics (Oleksyn, Reich, Zytkowiak, Karolewski, &
Tjoelker, 2003; Oleksyn, Zytkowiak, Karolewski, Reich, & Tjoelker,
2000). For example, northern populations are more cold tolerant,
have earlier growth cessation, and grow less during the growing
season, consistent with the gradient in environmental conditions
and local adaptation.
A wealth of data on survival and growth, serving as proxies for
fitness, exists in provenance trials (reviewed by Langlet, 1971).
Savolainen et al. (2007) used such data to identify local adaptation.
Comparison of fitness of transferred populations to the fitness of
local populations based on relative survival and height showed, for
example, that P. sylvestris populations in central Sweden are locally
adapted. This implies that tree populations have had time
to evolve to match the new habitats exposed by the retreating
continental ice (Davis & Shaw, 2001). Rapid adaptation is
concordant with the theoretical expectation that selection is efficient
in large populations.
The molecular genetic basis of the adaptive clinal variation can
be identified with two basic approaches: association mapping that
identifies genetic polymorphisms correlated with a given
phenotypic variation, or methods that rely solely on genotypic data.
Association studies, especially when both the between‐population
and within‐population variation can be addressed, can reveal
important polymorphisms underlying adaptation. At the same time,
the clinal setup poses a challenge for controlling population
structure when it occurs along the same environmental gradient. Kujala
et al. (2017) address this problem with latitudinal variation in timing
of bud set in the first‐year seedlings of P. sylvestris. In this Bayesian
approach, the presence of a within‐population association signal is
requiredacrossdifferent populations to exclude spuriousassocia
tions. Other new promising methods, akin to QSTFST comparisons,
FIGURE 5 Tajima's D (a) and
nucleotide diversity (b) distributions for
constant size (SNM) and exponentially
(EXP) growing population. Coalescent
simulations were conducted with Coala
package in R (Staab & Metzler, 2016)
with the following parameters. SNM:
N0=50×106, μ=0.004/bpandlocus
length1,000bpandsamplesize50.EXP:
as in SNM but exponential growth started
at time 0.00002