Long and short range multi-locus QTL interactions in a complex trait of yeast

Evgeny M Mirkes¹, Thomas Walsh², Edward J Louis² and Alexander N Gorban¹
¹Centre for Mathematical Modelling and
²Centre for Genetic Architecture of Complex Traits
University of Leicester
Leicester LE1 7RH, UK
Abstract
We analyse interactions of Quantitative Trait Loci (QTL) in heat selected yeast by
comparing them to an unselected pool of random individuals. Here we re-
examine data on individual F12 progeny selected for heat tolerance, which have
been genotyped at 25 locations identified by sequencing a selected pool [Parts,
L., Cubillos, F. A., Warringer, J., Jain, K., Salinas, F., Bumpstead, S. J., Molin, M., Zia,
A., Simpson, J. T., Quail, M. A., Moses, A., Louis, E. J., Durbin, R., & Liti, G. (2011).
Genome research, 21(7), 1131-1138]. 960 individuals were genotyped at these
locations and multi-locus genotype frequencies were compared to 172
sequenced individuals from the original unselected pool (a control group).
Various non-random associations were found across the genome, both within
chromosomes and between chromosomes. Some of the non-random associations
are likely due to retention of linkage disequilibrium in the F12 population;
however, many, including the inter-chromosomal interactions, must be due to
genetic interactions in heat tolerance. One region of particular interest involves 3
linked loci on chromosome IV where the central variant responsible for heat
tolerance is antagonistic, coming from the heat sensitive parent and the flanking
ones are from the more heat tolerant parent. The 3-locus haplotypes in the
selected individuals represent a highly biased sample of the population
haplotypes with rare double recombinants in high frequency. These were missed
in the original analysis and would never be seen without the multigenerational
approach. We show that a statistical analysis of entropy and information gain in
genotypes of a selected population can reveal more interactions than
previously seen. Importantly, this must be done in comparison to the unselected
population’s genotypes to account for inherent biases in the original population.
Introduction
The determination of the underlying genetic causes of particular phenotypes has
progressed greatly in recent years with studies in yeast being at the forefront of
the application and development of new techniques [1]. Recent advances in
quantitative genetic analysis in yeast have resulted in an unprecedented
dissection of the genetic architecture of complex traits. The sequencing of a pool
of selected progeny of a hybrid cross has provided high resolution mapping of
QTLs for a number of traits [2-5]. Adding multiple generations via an advanced
intercross line approach increases the resolution and sensitivity [6, 7]. In this
paper, we analyse data on individual F12 progeny selected for heat tolerance,
which have been genotyped at 25 locations identified by sequencing a selected
pool. Determining epistatic interactions among QTLs requires the knowledge of
genotypes in selected individuals and has been successfully used on progeny of
F1 hybrids [8, 9]. Basic quantitative trait analysis of the progeny of F1 crosses,
combined with the knowledge of gene function gained through decades of
analysis, allows for the determination of causal genetic variation within large
QTL regions [8, 10]. Some genetic interactions between these can also be
detected but only for very strong non-random associations [8]. Backcrosses help
resolve QTL regions and reveal linked sets of causal variants in some cases [11,
12, 13]. Improvements in resolution have been made by pooling selected
segregants of F1 hybrids [2, 14], and by using multi-generational hybrid
populations [6, 7]. The issue of interactions/associations of genetic variants is
still problematic; however, some progress has been made by sequencing large
numbers of individual F1 progeny [9].
Heat tolerance has been one phenotype studied extensively, first with crosses
involving a lab strain [10, 11, 13], then in 6 pairwise crosses between 4 different
populations [8], then in a pairwise multi-generational cross [6] followed by a 4-
way multigenerational cross incorporating the 4 populations originally studied
[7]. In the first study large regions were identified that explained some of the
phenotypic variation in heat tolerance with an analysis of candidate genes
revealing some responsible genetic variation [10]. Further backcrosses revealed
a linked set of QTLs within the regions, some of the variation providing heat
tolerance coming from the more heat sensitive parent [11, 13]. In the 6 pairwise
cross study, a total of 11 QTLs were identified for heat tolerance with low
resolution, though none of the crosses had more than 4 QTLs segregating.
Deleting one or the other allele in the hybrid, and measuring the heat tolerance
in the resulting hemizygote, confirmed candidate gene involvement. The
pairwise 12 generation cross, between the North American (NA) and West
African (WA) populations, resulted in a very high resolution determination of 21
to 22 QTLs as opposed to the 4 large QTL regions determined from the F1 cross
of the same parents [6, 8]. Here 8 of the variants providing heat tolerance came
from the more heat sensitive parent (WA). The 4-way, 12-generation study
increased the number of QTLs responsible for heat tolerance identified to 34 [7].
In the pairwise 12-generation cross, 960 individual heat resistant segregants
were genotyped at the 22 QTLs and a few other segregating markers. No 2-locus
inter-chromosomal interactions were found under the hypothesis of
independence before selection and correction for multiple comparisons when
analysing 19 of the segregating sites in the selected pool [6]. This is despite
strong evidence of epistasis among the QTLs responsible for heat tolerance and
the finding of negative epistasis between two loci in hemizygous double allele
deletions. That analysis did not detect interactions between linked QTLs, nor could
it detect interactions involving more than two genes.
The problem of the apparent lack of allele fixation and of strong interchromosomal
interactions after 12 d under selection was discussed and several hypotheses were
proposed [6]. In our work, we apply more sensitive techniques for the analysis of multiple
testing results.
For analysis of associations in the heat selected population, which contains 896
multilocus genotypes after removing those with missing data, we apply a
bootstrap based analysis of ordered Relative Information Gain (RIG, [15])
(developed specially for this study) and a more sophisticated version of false
discovery rate control procedure [16]. These methods allow the identification of
more than 20 pairs of QTLs with significantly non-random dependences. The
requirement of significant differences of RIG between the unselected and heat
selected pools measured by fraction of RIG and relative entropy reduces the set
of significant pairs to 18. Weak correlations can be significant: statistically
significant association does not necessarily mean strong correlation. We
consider associations with RIG greater than 0.01 as moderate. There are four
moderately correlated significantly dependent pairs: chr15-0172xxx & chr15-
1032xxx, chr07-0859xxx & chr15-0172xxx, chr01-0119xxx & chr13-0910xxx
and chr10-0420xxx & chr15-0172xxx. These four pairs and three constant loci
(only one of two alleles present in the heat selected pool) can be considered as
associated with heat tolerance.
Materials and Methods
We used the genotype data for the 960 heat selected individuals from [6]
(provided in a supplemental spreadsheet Table S1). Firstly 64 individuals with
missing data were identified and removed. There was no bias in those with
missing data. To verify this, we consider two samples: the original sample and the sample
with complete genotypes. To check the hypothesis that the complete sample is not
biased, we test the hypothesis that the distributions of NA and WA coincide
for each attribute in both samples. We apply χ² tests [17] to check this
hypothesis. For this test the p-value is the probability of observing by chance the same
or a greater deviation between the two samples if both samples are equally distributed. The
minimal p-value over all attributes is 64% (p-values are provided in a
supplemental spreadsheet Table S2). As a result there is no evidence to reject
the hypothesis of coincidence. The missing values can therefore be interpreted as
missing completely at random (MCAR), which means that removing incomplete
records does not bias the sample distributions.
In order to remove the possibility that there may have been bias in the original
population before selection, leading to this apparent association due to the
selection for heat tolerance, we needed to have the genotypes of individuals from
the original pool prior to heat selection. Fortunately, 172 individuals from this
study were sequenced as part of the 4-way study [7] to compare sizes of LD
blocks (associations due to linkage). We scored each at the 25 loci for the allele
to create a control data set for comparison (provided in a supplemental
spreadsheet Table S3). As a control group in our study we used these 172
genome sequences of unselected individuals from the pairwise cross [7], to
determine the genotypes at the QTLs and other segregating loci used in the
selected individuals. Testing the hypothesis that this complete sample is not
biased shows that the missing values (present in 7 genomes) in this unselected pool
can be interpreted as missing completely at random (MCAR) (p-values are
provided in a supplemental spreadsheet Table S4).
To find associations between loci we use several approaches. Relative
Information Gain (RIG) [15] is widely used in data mining to measure
dependence. RIG is not symmetric. A greater value of RIG means a stronger
correlation, and RIG is zero for independent attributes. The RIG of the locus $X$ with
respect to the locus $Y$ is defined as:

$$\mathrm{RIG}(X|Y) = \frac{H(X) - H(X|Y)}{H(X)}$$

where $H(X)$ is the entropy of the allele distribution for the locus $X$:

$$H(X) = -p_X \log p_X - (1 - p_X) \log (1 - p_X)$$

where $p_X$ is the fraction of genotypes with the NA allele in the locus $X$ among all
genotypes, and $H(X|Y)$ is the conditional entropy:

$$H(X|Y) = p_Y \, H(X|Y=\mathrm{NA}) + (1 - p_Y) \, H(X|Y=\mathrm{WA})$$

where $p_Y$ is the fraction of genotypes with the NA allele in the locus $Y$ among all
genotypes, and $H(X|Y=\mathrm{NA})$ and $H(X|Y=\mathrm{WA})$ are the specific
conditional entropies:

$$H(X|Y=\mathrm{NA}) = -p_{X|\mathrm{NA}} \log p_{X|\mathrm{NA}} - (1 - p_{X|\mathrm{NA}}) \log (1 - p_{X|\mathrm{NA}})$$

$$H(X|Y=\mathrm{WA}) = -p_{X|\mathrm{WA}} \log p_{X|\mathrm{WA}} - (1 - p_{X|\mathrm{WA}}) \log (1 - p_{X|\mathrm{WA}})$$

where $p_{X|\mathrm{NA}}$ is the fraction of genotypes with the NA allele in the locus $X$ among all
genotypes with the NA allele in the locus $Y$, and $p_{X|\mathrm{WA}}$ is the fraction of genotypes
with the NA allele in the locus $X$ among all genotypes with the WA allele in the locus $Y$.
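The RIG definition above can be illustrated with a minimal Python sketch. The function names, the 1/0 coding of the NA/WA alleles, the natural logarithm and the convention that a constant locus has zero RIG are assumptions made for illustration; they are not taken from the original analysis.

```python
import math


def binary_entropy(p):
    """Entropy of a two-allele distribution with NA fraction p (natural log)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def rig(x, y):
    """RIG(X|Y) for two equal-length allele sequences coded 1 (NA) / 0 (WA)."""
    n = len(x)
    h_x = binary_entropy(sum(x) / n)
    if h_x == 0.0:
        # Assumed convention: a constant locus carries no information to gain.
        return 0.0
    h_cond = 0.0
    for y_allele in (1, 0):
        # Alleles of X restricted to genotypes with the given allele in Y.
        x_given = [xi for xi, yi in zip(x, y) if yi == y_allele]
        if x_given:
            weight = len(x_given) / n
            h_cond += weight * binary_entropy(sum(x_given) / len(x_given))
    return (h_x - h_cond) / h_x


# RIG is not symmetric: the two directions generally differ.
x = [1, 1, 1, 0]
y = [1, 1, 0, 0]
print(rig(x, y), rig(y, x))
```

Because RIG is a ratio of entropies, the choice of logarithm base does not change its value.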
One of the most widely used measures of correlation is Pearson’s correlation
coefficient (PCC):

$$\mathrm{PCC}(X,Y) = \frac{p_{XY} - p_X p_Y}{\sqrt{p_X (1 - p_X) \, p_Y (1 - p_Y)}}$$

where $p_{XY}$ is the fraction of genotypes with the NA alleles in the loci $X$ and $Y$
simultaneously among all genotypes, and $p_X$, $p_Y$ are the marginal frequencies of the
NA alleles in the loci $X$ and $Y$ respectively. PCC is also used as a linkage
disequilibrium measure [18].
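In signal recognition the Hamming correlation coefficient [19] is an alternative to
the PCC. The normalized Hamming correlation coefficient (NHCC) is the
number of coincident symbols minus the number of different symbols, divided by
the length of the sequences:

$$\mathrm{NHCC}(X,Y) = \frac{N_{=} - N_{\neq}}{N}$$

where $N_{=}$ is the number of genotypes with the same allele in the loci $X$ and $Y$,
$N_{\neq}$ is the number of genotypes with different alleles, and $N = N_{=} + N_{\neq}$.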
For independent binary random variables, the probability of observing by
chance the same or greater deviation in contingency table is equal to the
probability of observing by chance the same or greater correlation (PCC, NHCC
or RIG) for uncorrelated variables. Therefore, usage of PCC, NHCC or RIG is
equivalent. Since RIG evaluates the correlation of categorical features, we use this
measure in our study.
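For comparison with RIG, the two correlation coefficients described above can be sketched as follows. This is an illustrative sketch only: the function names and the 1/0 coding of the NA/WA alleles are assumptions, and the PCC is written in the binary (frequency) form given in the text.

```python
import math


def pcc(x, y):
    """Pearson correlation of two binary loci coded 1 (NA) / 0 (WA)."""
    n = len(x)
    p_x = sum(x) / n
    p_y = sum(y) / n
    # Fraction of genotypes carrying NA in both loci simultaneously.
    p_xy = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    denom = math.sqrt(p_x * (1.0 - p_x) * p_y * (1.0 - p_y))
    if denom == 0.0:
        raise ValueError("PCC is undefined for a constant locus")
    return (p_xy - p_x * p_y) / denom


def nhcc(x, y):
    """Normalized Hamming correlation: (coincident - different) / length."""
    same = sum(1 for a, b in zip(x, y) if a == b)
    different = len(x) - same
    return (same - different) / len(x)


x = [1, 1, 0, 0, 1, 0]
y = [1, 1, 0, 0, 0, 1]
print(pcc(x, y), nhcc(x, y))
```

Unlike RIG, both coefficients are symmetric in the two loci and carry a sign.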
The χ² test of independence [17] provides us a direct technique for independence
analysis. To check the hypothesis of independence we apply the two-sided
Fisher’s exact test. It is closely related to the  measure of linkage
disequilibrium [20].
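A two-sided Fisher's exact test on a 2x2 contingency table can be sketched with the standard library alone. The sketch assumes one common definition of "two-sided" (summing the probabilities of all tables that are no more probable than the observed one, with the margins fixed); the text does not state which convention was used, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so that ties are not lost to rounding.
    p_value = sum(prob(k) for k in range(k_min, k_max + 1)
                  if prob(k) <= p_obs * (1.0 + 1e-9))
    return min(1.0, p_value)


# Rows: allele in locus X (NA, WA); columns: allele in locus Y (NA, WA).
print(fisher_exact_two_sided(3, 1, 1, 3))
```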
Revealing significant associations between different loci is a typical multiple
testing problem. There are several techniques of accounting for multiple testing.
The simplest one is the Bonferroni correction. Unfortunately the Bonferroni
correction is very conservative [16, 21, 22]. The widely used BH step-up
procedure [23] is less conservative than the Bonferroni correction, but is less
powerful and more conservative [16, 22] than the q-value technique suggested
by Storey and Tibshirani [16]. To define the significance of dependency we apply
the calculation of q-values, which characterizes the False Discovery Rate (FDR) in
the version suggested by Storey and Tibshirani [16].
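The q-value calculation can be sketched as below. This is a simplified illustration, not the procedure of [16] in full: the proportion of true null hypotheses is estimated at a single fixed threshold (the hypothetical parameter `lam`), whereas Storey and Tibshirani smooth that estimate over a range of thresholds. When the estimated proportion equals one, the result coincides with BH-adjusted p-values.

```python
def q_values(p_values, lam=0.5):
    """Simplified q-values: fixed-lambda pi0 estimate, then step-up minima."""
    m = len(p_values)
    # pi0: estimated share of true null hypotheses (p-values above lam are
    # assumed to come mostly from true nulls). Degenerates to 0 if none do.
    pi0 = min(1.0, sum(1 for p in p_values if p > lam) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q


p = [0.01, 0.02, 0.03, 0.04, 0.9, 0.8]
q = q_values(p)
# Expected number of false discoveries among the k smallest p-values: k * q_k.
k = 4
print(q, k * sorted(q)[k - 1])
```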
Also we apply a Bootstrap Test for ordered RIG (BToRIG) to define the
significance of each RIG. We have $L$ loci and $N$ observations for each locus.
We calculate the frequencies of NA for all loci. RIG is not symmetric, so we calculate
$L(L-1)$ values of RIG. Let us select a large number $M$ of bootstrap generations (in our study we
have … and use …). Let us sort the RIGs in descending order:
$\mathrm{RIG}_1 \ge \mathrm{RIG}_2 \ge \dots \ge \mathrm{RIG}_{L(L-1)}$. Our test calculates $p_i$-values, where $p_i$ is the probability of
observing by chance the same or a greater value of RIG for the ith maximal value of RIG
if all loci are independently distributed with the frequencies defined by the original
sample. We consider the ith RIG as significant with significance level $\alpha$ if
$p_i \le \alpha$. The algorithm of this test is:
1. Calculate the frequencies for all loci, $p_l$, $l = 1, \dots, L$.
2. Calculate two RIGs for each pair of loci.
3. Sort the RIGs in descending order: $\mathrm{RIG}_1 \ge \mathrm{RIG}_2 \ge \dots \ge \mathrm{RIG}_{L(L-1)}$.
4. Select a large number $M$ and perform the bootstrap procedure:
4.1. Generate $L$ artificial loci with frequencies $p_l$ (the number of NA can slightly
fluctuate).
4.2. Calculate two RIGs $\mathrm{RIG}^m$ for each pair of loci, where $m = 1, \dots, M$ is the
number of the generation.
4.3. Sort the RIGs in descending order: $\mathrm{RIG}^m_1 \ge \mathrm{RIG}^m_2 \ge \dots \ge \mathrm{RIG}^m_{L(L-1)}$.
5. Calculate the p-value for the ith correlation:
5.1. $n_i = \sum_{m=1}^{M} h(\mathrm{RIG}^m_i - \mathrm{RIG}_i)$, where $h$ is the Heaviside step function:
$h(x) = 1$ if $x \ge 0$ and $h(x) = 0$ if $x < 0$.
5.2. $p_i = n_i / M$.
6. The correlation is significant with significance level $\alpha$ if $p_i \le \alpha$.
This approach can be also applied for ordered PCC and for any other measure of
correlation.
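The steps of the bootstrap test can be sketched in Python as follows. This is a minimal sketch under stated assumptions: alleles are coded 1 (NA) / 0 (WA), artificial loci are drawn as independent Bernoulli samples (so the number of NA fluctuates), a constant locus is given zero RIG, and the names `btorig_p_values`, `n_boot` and `seed` are hypothetical.

```python
import math
import random


def _entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def _rig(x, y):
    """RIG(X|Y) for binary sequences; zero for a constant X (assumed)."""
    n = len(x)
    h_x = _entropy(sum(x) / n)
    if h_x == 0.0:
        return 0.0
    h_cond = 0.0
    for allele in (1, 0):
        sub = [xi for xi, yi in zip(x, y) if yi == allele]
        if sub:
            h_cond += len(sub) / n * _entropy(sum(sub) / len(sub))
    return (h_x - h_cond) / h_x


def ordered_rigs(loci):
    """All L(L-1) ordered-pair RIGs, sorted in descending order (steps 2-3)."""
    count = len(loci)
    values = [_rig(loci[i], loci[j])
              for i in range(count) for j in range(count) if i != j]
    return sorted(values, reverse=True)


def btorig_p_values(loci, n_boot=200, seed=0):
    """Return (sorted observed RIGs, bootstrap p-value for each rank)."""
    rng = random.Random(seed)
    n = len(loci[0])
    freqs = [sum(locus) / n for locus in loci]            # step 1
    observed = ordered_rigs(loci)
    counts = [0] * len(observed)
    for _ in range(n_boot):                               # step 4
        fake = [[1 if rng.random() < p else 0 for _ in range(n)]
                for p in freqs]                           # step 4.1
        boot = ordered_rigs(fake)                         # steps 4.2-4.3
        for i, (b, o) in enumerate(zip(boot, observed)):
            if b >= o:                                    # Heaviside, step 5.1
                counts[i] += 1
    return observed, [c / n_boot for c in counts]         # step 5.2


a = [1, 0] * 30
c = [1, 1, 0, 0] * 15
rigs, p = btorig_p_values([a, list(a), c], n_boot=100, seed=1)
print(rigs[0], p[0])
```

The ith bootstrap RIG is compared with the ith observed RIG, so each rank has its own null distribution; this is what distinguishes the ordered test from testing every pair against a single pooled distribution.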
To check the identity of distributions of NA and WA in the same locus in heat
selected and unselected pools we consider two random variables for each locus:
the first variable is whether it is in the heat selected or unselected pool and the
second variable is which allele, NA or WA. Independence of these two variables
means that selection has no effect for this locus. We apply Fisher’s exact test and
the χ² test.
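The per-locus comparison of pools amounts to a test of independence on a 2x2 table (pool by allele). A minimal sketch of the χ² version is given below, assuming no continuity correction (the text does not say whether one was applied); the function name and the example counts are hypothetical. For one degree of freedom the tail probability can be written with the complementary error function.

```python
import math


def chi2_2x2(a, b, c, d):
    """Chi-squared statistic and p-value (1 d.o.f.) for [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("a constant variable cannot be tested")
    stat = n * (a * d - b * c) ** 2 / (rows[0] * rows[1] * cols[0] * cols[1])
    # Survival function of the chi-squared distribution with 1 d.o.f.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Rows: heat selected / unselected pool; columns: NA / WA allele counts.
# The counts are made up for illustration.
print(chi2_2x2(600, 296, 70, 95))
```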
‘Statistically significant association’ does not necessarily mean ‘large correlation’.
The multiple testing procedures return lists of statistically significant
associations between loci in the heat selected population. But the correlations
may be quite small. To select a reasonable level of correlation we retain
significant links with RIG>ε for some threshold ε>0 only (for example, ε=0.01).
A second selection is necessary to compare associations in the heat selected
populations with the unselected population (control group). For this purpose, we
consider the allele distributions in pairs of associated (after selection) loci and
calculate the relative entropy [24] with respect to this distribution in the
unselected population. A value of zero for relative entropy means that the
association is the same in both the selected and unselected groups. Associations
with small relative entropy with respect to the unselected population should be
additionally tested as they may be caused not by the heat tolerance but be a
property of the unselected population. Another measure for change of
association after selection gives the RIG ratio: ().
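The relative entropy of a two-locus allele distribution in the selected pool with respect to the unselected pool can be sketched as below. Several points are assumptions: the natural logarithm (the text does not state the base, which matters for any fixed threshold), the direction of the divergence (selected with respect to unselected), and the treatment of allele combinations absent from the unselected pool, for which the divergence is infinite.

```python
import math


def joint_distribution(x, y):
    """Joint frequencies of (allele in X, allele in Y), coded 1 (NA) / 0 (WA)."""
    n = len(x)
    dist = {(a, b): 0 for a in (1, 0) for b in (1, 0)}
    for pair in zip(x, y):
        dist[pair] += 1
    return {pair: count / n for pair, count in dist.items()}


def relative_entropy(selected, unselected):
    """Kullback-Leibler divergence of `selected` from `unselected`."""
    total = 0.0
    for pair, p in selected.items():
        if p == 0.0:
            continue  # a zero-probability term contributes nothing
        q = unselected[pair]
        if q == 0.0:
            return math.inf  # combination never seen in the unselected pool
        total += p * math.log(p / q)
    return total


unselected = joint_distribution([1, 1, 0, 0], [1, 0, 1, 0])
selected = joint_distribution([1, 1, 0, 0], [1, 1, 0, 0])
print(relative_entropy(selected, unselected))
```

In practice small pools leave some allele combinations unobserved, so a pseudocount or a separate treatment of such cases may be needed; the sketch simply reports infinity.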
Results
Several methods of association analysis, i.e. lack of independence, were applied
to these data. Statistics of PCC, NHCC, and RIG were analysed for each pool
separately. The false discovery rate control procedure [16] was also
implemented and applied. Both inter and intra chromosomal associations were
found. Comparison of associations in the two pools allows identification of real
connections associated with heat tolerance.
There are two pools: unselected and heat selected. The first reasonable question
is ‘Are these pools significantly different in allele distribution inside loci?’ Tests
of identity of the distributions of NA and WA in the selected and unselected pools
reveal six loci for which identity cannot be rejected at the 0.01 level: four loci with p-value
greater than 0.1 and two loci with p-value between 0.01 and 0.1 (see Figure 1
and Table S5 in supplementary material). In this test the p-value is the
probability of observing by chance the same or a greater dependence in the
contingency table if the two random variables are independent.
Figure 1. The diagrams of    for unselected and heat
selected samples. Loci in which the hypothesis of independence of allele
distributions of the heat selected or unselected pools cannot be rejected are
marked by solid circle if p-value is greater than 0.1 and by circle if p-value is
between 0.01 and 0.1.
Figure 1 shows the significant differences between heat selected and unselected
pools for most of the loci. As might be expected, the fraction of NA is increased after
selection in most of the loci as this is the heat tolerant parent. A perhaps
unexpected elimination of NA is observed for locus chr04-0488xxx; however,
such antagonistic alleles are known and this one has been discussed in the
previous analysis [6]. For eight loci we can see a change of sign (antagonism)
of differences  . For three loci of chromosome IV an increase of
variability is observed.
Figure 1 shows that two of the markers (chr01-0040xxx and chr05-0196xxx) in
the unselected pool and three of the markers (chr02-0522xxx, chr04-0488xxx
and chr05-0196xxx) in the heat selected pool had alleles from only one of the
parents, i.e. they were fixed. Furthermore there are two almost constant markers
in the heat selected pool: chr01-0040xxx contains 99.6% WA and chr02-0517xxx
contains 99.8% NA. These two markers can be interpreted as constant loci
because the numbers of observed NA and WA alleles, respectively, are too small. Three
markers chr02-0522xxx, chr04-0488xxx and chr02-0517xxx can be interpreted
as exactly associated with heat resistance. We exclude these loci from the further
analysis of associations.
We call two loci linked if these loci are adjacent and fraction of mixed genes (NA-
WA and WA-NA) is low i.e. there is linkage disequilibrium. The linked loci (both
for the unselected and heat selected pools) have the same colour in Figure 2. For
each group of linked loci, one locus with the most balanced (nearest to 0.5)
frequencies of NA and WA is retained for further analysis.
Figure 2. Distribution of    versus QTLs for (a)
unselected and (b) heat selected pools. Red corresponds to loci with one parent allele only
(constant loci) and to almost constant loci (the fraction of one of the alleles is greater
than 99%); magenta, green, blue, violet and brown correspond to different
groups of linked loci; and grey corresponds to all other loci.
To find correlated loci in the heat selected pool we apply a bootstrap test for PCC,
NHCC, and RIG.
The bootstrap test of significance of PCC defines two pairs (chr04-0461xxx &
chr04-0496xxx, and chr10-0234xxx & chr12-0730xxx) with significance level
p=0.05. This result means that for binary random variables PCC is not an
appropriate measure of correlation. The bootstrap test of significance of NHCC
defines two pairs (chr04-0461xxx & chr04-0496xxx, and chr07-0859xxx &
chr13-0910xxx) with significance level p=0.05. This result means that NHCC is
not an appropriate measure of correlation for this genomic study. However, the
sum of NHCC for all pairs of loci, except constant and linked loci in the unselected
pool is equal to 9.72 and for the heat selected pool 19.73. This indicates a
significant increase of correlation (measured by the sum of NHCC). This effect
(growth of correlations under stress) is well known [25, 26].
For BToRIG, the number of significant connections with respect to p-value is
depicted in Figure 3a. We consider as significant connections with a p-value which is
not greater than 0.005 (29 pairs in Figure 4a). This set includes three pairs found in
the previous analysis [6].
Figure 3. (a) The number of significant correlations for BToRIG and (b) the FDR and
estimated number of false discoveries for the FDR approach.
Figure 4. Sets of significantly dependent loci for the heat selected pool: a) all
connections selected as significant, b) strong and moderate connections with
RIG ≥ 0.01, c) significant connections for which the RIG ratio is greater than 2 or the
relative entropy (selected with respect to unselected group) is greater than 0.5
and d) significant connections with RIG ≥ 0.01 and one of the following
conditions: the RIG ratio is greater than 2 or the relative entropy is greater than 0.5.
Red solid circles depict the constant loci. Red circles with a white centre depict the
loci excluded because they are in linkage disequilibrium with other loci
(doubled red line). Solid green lines connect loci defined as significantly
dependent by FDR and BToRIG. Brown dashed lines connect loci defined as
significantly correlated by the Bootstrap test only. Blue dotted lines connect loci
defined as significant by FDR only.
We also apply the False Discovery Rate (FDR) approach to identify significantly
dependent loci. Graphs of FDR and expected number of false discovery with respect
to number of significant connections are depicted in Figure 3b. For FDR we consider
as significantly dependent the first connections with , where is the q-
value of kth test. It means that expected number of false discovery is not greater than
  for k significant connections. Number of significantly dependent
connections is 23. All these connections also were detected as significantly correlated
by BToRIG except pair chr02-0472xxx & chr07-0131xxx. This set includes three
pairs found previously [6].
The heat selected pool contains 896 genomes. This is large enough a sample so
that a weak correlation can be identified as significant. To exclude significantly
correlated pairs with really weak correlation we remove all pairs with RIG which
is less than 0.01. All pairs with strong and moderate correlation are depicted in
Figure 4b.
To identify correlations associated with heat tolerance it is necessary to compare
RIG for the same pair of loci in unselected and heat selected pools. We apply two
approaches for this purpose: estimation of RIG ratio ()
and the relative entropy. We consider change of RIG as significant if RIG ratio is
greater than 2 or relative entropy is greater than 0.5. Significantly correlated
associations with significant changes of RIG are depicted in Figure 4c.
Finally we unite two requirements: a pair of loci can be associated with heat
tolerance if the correlation of this pair is significant, the RIG of this pair has
significant changes and the RIG is not less than 0.01. The final set of pairs of loci
associated with heat tolerance is depicted in Figure 4d. The expected number of
false discoveries for this set of pairs is less than one.
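The combined selection rule can be written as a small filter. The thresholds (0.01, 2 and 0.5) are those stated in the text; the record layout, the field names and the example values are hypothetical and serve only to show how the three requirements combine.

```python
def associated_with_heat_tolerance(pairs, min_rig=0.01,
                                   min_ratio=2.0, min_rel_entropy=0.5):
    """Keep pairs that are significant, not weak, and changed by selection."""
    kept = []
    for pair in pairs:
        strong_enough = pair["rig"] >= min_rig
        changed = (pair["rig_ratio"] > min_ratio
                   or pair["relative_entropy"] > min_rel_entropy)
        if pair["significant"] and strong_enough and changed:
            kept.append(pair["name"])
    return kept


# Made-up records for illustration only (not values from the study).
example = [
    {"name": "pair A", "significant": True, "rig": 0.030,
     "rig_ratio": 5.0, "relative_entropy": 0.1},
    {"name": "pair B", "significant": True, "rig": 0.004,
     "rig_ratio": 9.0, "relative_entropy": 0.9},
    {"name": "pair C", "significant": True, "rig": 0.020,
     "rig_ratio": 1.1, "relative_entropy": 0.2},
    {"name": "pair D", "significant": False, "rig": 0.050,
     "rig_ratio": 3.0, "relative_entropy": 0.7},
]
print(associated_with_heat_tolerance(example))
```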
Chromosome IV needs additional discussion. Table 1 shows that alleles in the
first four loci of chromosome IV (chr04-0454xxx, chr04-0461xxx, chr04-0488xxx
and chr04-0496xxx) are not distributed independently with probability 1/2 of
NA and WA. For such an equidistribution, the probability of each combination of
alleles in four loci is 1/16. The probability of observing by chance a deviation
from this independent equidistribution that is the same as or greater than the one in the
contingency table of Table 1 is less than $10^{-300}$. Alleles in the first four loci are also not
distributed independently with probabilities calculated from the samples (these
probabilities are different for unselected and heat selected pools; the probability
of observing by chance the same or greater deviation in the contingency table is less
than $10^{-300}$). In the unselected pool these four loci are in linkage disequilibrium
because the fraction of genotypes with the same marker in all loci is 91% (NA-
NA-NA-NA 15% and WA-WA-WA-WA 76%). If we remove the constant locus
chr04-0488xxx from the heat selected pool then we also can consider the other
three loci as in linkage disequilibrium because the fraction of genotypes with the
same marker in all loci is 89% (NA-NA-NA 31% and WA-WA-WA 58%).
However, the locus chr04-0488xxx is located between loci chr04-0461xxx and
chr04-0496xxx. Therefore, the strong correlation between loci chr04-0461xxx
and chr04-0496xxx cannot be explained by lack of crossovers as can be done for
the unselected pool.
Table 1. Frequencies of markers in four loci of chromosome IV. "Observed" gives the number (#) and percentage (%) of genotypes in each pool; "Independent with marginal probabilities" gives the values expected for independent loci with the marginal allele probabilities of that pool.

Chromosome IV locus                                         Observed  Independent with marginal probabilities
0454xxx  0461xxx  0488xxx  0496xxx        Unselected   Heat selected      Unselected   Heat selected
                                           #       %       #       %       #       %       #       %
NA       NA       NA       NA             24      15       0       0    0.21       0    0.00       0
NA       NA       NA       WA              1       1       0       0    1.05       1    0.00       0
NA       NA       WA       NA              0       0     279      31    1.05       1   45.79       5
NA       NA       WA       WA              9       5      53       6    5.12       3   83.23       9
NA       WA       NA       NA              0       0       0       0    0.79       0    0.00       0
NA       WA       NA       WA              0       0       0       0    3.89       2    0.00       0
NA       WA       WA       NA              0       0       3       0    3.89       2   74.88       8
NA       WA       WA       WA              1       1       5       1   19.01      12  136.10      15
WA       NA       NA       NA              1       1       0       0    0.79       0    0.00       0
WA       NA       NA       WA              0       0       0       0    3.89       2    0.00       0
WA       NA       WA       NA              0       0       4       0    3.89       2   74.88       8
WA       NA       WA       WA              0       0       4       0   19.01      12  136.10      15
WA       WA       NA       NA              2       1       0       0    2.95       2    0.00       0
WA       WA       NA       WA              0       0       0       0   14.43       9    0.00       0
WA       WA       WA       NA              1       1      32       4   14.43       9  122.45      14
WA       WA       WA       WA            126      76     516      58   70.61      43  222.57      25
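The "independent with marginal probabilities" columns of Table 1 can be recomputed from the observed counts alone, as the following sketch shows. The observed counts are copied from Table 1 (genotypes with zero count are omitted); the function and variable names are hypothetical. The computation assumes that each expected count is the pool size multiplied by the product of the marginal allele frequencies of the four loci, which reproduces the tabulated values, for example 70.61 for WA-WA-WA-WA in the unselected pool and 45.79 for NA-NA-WA-NA in the heat selected pool.

```python
from itertools import product

# Observed genotype counts from Table 1 (loci 0454, 0461, 0488, 0496).
UNSELECTED = {
    ("NA", "NA", "NA", "NA"): 24, ("NA", "NA", "NA", "WA"): 1,
    ("NA", "NA", "WA", "WA"): 9, ("NA", "WA", "WA", "WA"): 1,
    ("WA", "NA", "NA", "NA"): 1, ("WA", "WA", "NA", "NA"): 2,
    ("WA", "WA", "WA", "NA"): 1, ("WA", "WA", "WA", "WA"): 126,
}
HEAT_SELECTED = {
    ("NA", "NA", "WA", "NA"): 279, ("NA", "NA", "WA", "WA"): 53,
    ("NA", "WA", "WA", "NA"): 3, ("NA", "WA", "WA", "WA"): 5,
    ("WA", "NA", "WA", "NA"): 4, ("WA", "NA", "WA", "WA"): 4,
    ("WA", "WA", "WA", "NA"): 32, ("WA", "WA", "WA", "WA"): 516,
}


def expected_counts(observed):
    """Expected genotype counts if loci were independent with the
    marginal NA/WA frequencies of the given pool."""
    n = sum(observed.values())
    n_loci = len(next(iter(observed)))
    na_freq = [sum(c for g, c in observed.items() if g[k] == "NA") / n
               for k in range(n_loci)]
    expected = {}
    for genotype in product(("NA", "WA"), repeat=n_loci):
        prob = 1.0
        for k, allele in enumerate(genotype):
            prob *= na_freq[k] if allele == "NA" else 1.0 - na_freq[k]
        expected[genotype] = n * prob
    return expected


for genotype, value in expected_counts(HEAT_SELECTED).items():
    print(genotype, round(value, 2))
```

Comparing these expected counts with the observed ones makes the excess of the NA-NA-WA-NA genotype in the heat selected pool (279 observed) directly visible.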
Statistical comparisons of the heat selected population to the control population
now reveal associations due to the selection for heat tolerance above any pre-
existing biases in the genotypes due to the process of generation of the test
population (linkage disequilibrium and inadvertent selection for some variants
during the 12 rounds of meiosis and random mating). There remain several associations,
from 2-way up to 8-way, between unlinked markers (see Figure 4).
Importantly, among the linked associations, the NA-WA-NA haplotype on
chromosome IV, which is extremely rare in the unselected population (none was seen in
the 172 unselected individuals, and it has a predicted frequency of 0.08% if 12
generations of crossing result in independent genetic intervals), was found in
30% of the heat selected individuals.
Discussion/Conclusions
Thirty significant associations between QTLs are detected by two statistical
approaches, FDR (False Discovery Rate) and BToRIG (Bootstrap Test for ordered
RIG). 22 of them are detected by both approaches simultaneously (see Figure 4).
These include both inter-chromosomal, unlinked, interactions as seen in Figure
4, as well as interactions among linked QTLs as seen on chromosome IV, Figure 4.
Previous analysis of these data [6] found three pairs of significant connections:
chr15-0172xxx & chr15-1032xxx, chr10-0234xxx & chr12-0730xxx and chr07-0131xxx
& chr12-0140xxx. The remarkable difference in findings is caused by
the more conservative BH step-up procedure of multiple testing in [6] (it is well
known that this procedure has relatively low power). All three of these pairs are
found by both applied techniques: FDR and BToRIG (see Figure 4a). However, two
of them, chr10-0234xxx & chr12-0730xxx and chr07-0131xxx & chr12-0140xxx,
have a relatively small value of RIG and a small change of RIG (comparing
the heat selected and unselected pools). It means that these associations are
significant but are not considered here as sufficiently strongly correlated.
Furthermore, both these associations have approximately the same correlations
in the heat selected and unselected pools. As a result, chr15-0172xxx &
chr15-1032xxx is the only pair found in the previous study [6] that is presented
in Figure 4d as a significant and sufficiently strong association in the context of
heat tolerance.
The hypothesis that the unselected pool contains independent loci is not correct,
as is shown in Figures 1 and 2. This fact was the basis for the use of
correlation change analysis rather than simple correlation analysis.
Three loci become constant in the heat selected pool: chr02-0522xxx, chr04-
0488xxx and chr05-0196xxx. Loci chr02-0522xxx and chr05-0196xxx contain
NA markers only. Locus chr04-0488xxx contains WA markers only.
We exclude constant loci and keep for analysis only one locus from any pair of
loci in strong linkage disequilibrium. After that, we apply two multiple testing
approaches: calculation of q-values to estimate FDR and BToRIG to find pairs with
significant highest RIGs. The FDR approach identifies 23 significantly dependent
pairs of loci and BToRIG identifies 29 correlated pairs of loci, which include 22 of
the pairs identified by FDR. Removing weak correlations and pairs with a small change
of correlation in the heat selected pool in comparison with the unselected pool
decreases the number of pairs of loci associated with heat tolerance to four pairs:
chr15-0172xxx & chr15-1032xxx, chr07-0859xxx & chr15-0172xxx, chr01-0119xxx
& chr13-0910xxx and chr10-0420xxx & chr15-0172xxx.
We also find that the distributions of chr04-0461xxx & chr04-0496xxx in the two
pools are very similar but the distribution of the three adjacent loci chr04-0461xxx,
chr04-0488xxx and chr04-0496xxx is drastically changed. The association
between chr04-0461xxx and chr04-0496xxx in the selected and unselected pools
is the same and formally we exclude it from Figure 4d. Nevertheless, the
situation in chromosome IV is very special: the combination NA, NA, WA, NA in
loci chr04-0454xxx, chr04-0461xxx, chr04-0488xxx and chr04-0496xxx occurs
in 31% of genotypes in the heat selected pool and in 0% of genotypes in the
unselected pool (Table 1). Therefore, this association should be considered as a
result of selection.
The application of statistical methods developed for other purposes is proving
useful in the determination of interactions among QTLs of complex traits,
bringing us closer to understanding the entire heritability of traits.
The biology of the interactions found will require further experimentation. Of
particular interest are the linked set of QTLs with heat tolerance promoting
alleles coming from both parents and the interactions within that set. The
genetic architecture of this region could be the result of adaptation to heat
tolerance in both populations.
Acknowledgements: This work was in part supported by the Wellcome Trust
Institutional Strategic Support Fund WT097828/Z/11/Z (RM33G0255 and
RM33G0335).
References
1. Liti, G., & Louis, E. J. (2012). Advances in quantitative trait analysis in
yeast. PLoS genetics, 8(8), e1002912.
2. Ehrenreich IM, Torabi N, Jia Y, Kent J, Martis S, Shapiro JA, Gresham D,
Caudy AA, Kruglyak L. 2010. Dissection of genetically complex traits with
extremely large pools of yeast segregants. Nature 464(7291): 1039-1042.
3. Schwartz K, Wenger JW, Dunn B, Sherlock G. 2012. APJ1 and GRE3 homologs
work in concert to allow growth in xylose in a natural Saccharomyces sensu
stricto hybrid yeast. Genetics 191(2): 621-632.
4. Swinnen S, Schaerlaekens K, Pais T, Claesen J, Hubmann G, Yang Y, Demeke M,
Foulquie-Moreno MR, Goovaerts A, Souvereyns K et al. 2012. Identification of
novel causative genes determining the complex trait of high ethanol tolerance in
yeast using pooled-segregant whole-genome sequence analysis. Genome Res
22(5): 975-984.
5. Hubmann G, Mathe L, Foulquie-Moreno MR, Duitama J, Nevoigt E, Thevelein JM.
2013. Identification of multiple interacting alleles conferring low glycerol and
high ethanol yield in Saccharomyces cerevisiae ethanolic fermentation.
Biotechnol Biofuels 6(1): 87.
6. Parts, L., Cubillos, F. A., Warringer, J., Jain, K., Salinas, F., Bumpstead, S. J.,
Molin, M., Zia, A., Simpson, J. T., Quail, M. A., Moses, A., Louis, E. J., Durbin,
R., & Liti, G. (2011). Revealing the genetic structure of a trait by
sequencing a population under selection. Genome research, 21(7), 1131-
1138.
7. Cubillos, F. A., Parts, L., Salinas, F., Bergström, A., Scovacricchi, E., Zia, A.,
Illingworth, C. J. R., Mustonen, V., Ibstedt, S., Warringer, J., Louis, E. J.,
Durbin, R., & Liti, G. (2013). High-resolution mapping of complex traits
with a four-parent advanced intercross yeast population. Genetics, 195(3),
1141-1155.
8. Cubillos, F. A., Billi, E., Zörgö, E., Parts, L., Fargier, P., Omholt, S., Blomberg,
A., Warringer, J., Louis, E. J., & Liti, G. (2011). Assessing the complex
architecture of polygenic traits in diverged yeast populations. Molecular
ecology, 20(7), 1401-1413.
9. Bloom, J. S., Ehrenreich, I. M., Loo, W. T., Lite, T. L. V., & Kruglyak, L.
(2013). Finding the sources of missing heritability in a yeast
cross. Nature, 494(7436), 234-237.
10. Steinmetz, L. M., Sinha, H., Richards, D. R., Spiegelman, J. I., Oefner, P. J.,
McCusker, J. H., & Davis, R. W. (2002). Dissecting the architecture of a
quantitative trait locus in yeast. Nature, 416(6878), 326-330.
11. Sinha H, Nicholson BP, Steinmetz LM, McCusker JH. 2006. Complex genetic
interactions in a quantitative trait locus. PLoS Genet 2(2): e13.
12. Ben-Ari G, Zenvirth D, Sherman A, David L, Klutstein M, Lavi U, Hillel J,
Simchen G. 2006. Four linked genes participate in controlling sporulation
efficiency in budding yeast. PLoS Genet 2(11): e195.
13. Sinha, H., David, L., Pascon, R. C., Clauder-Münster, S., Krishnakumar, S.,
Nguyen, M., Shi, G., Dean, J., Davis, R. W., Oefner, P. J., McCusker, J. H., &
Steinmetz, L. M. (2008). Sequential elimination of major-effect
contributors identifies additional quantitative trait loci conditioning high-
temperature growth in yeast. Genetics, 180(3), 1661-1670.
14. Ehrenreich IM, Bloom J, Torabi N, Wang X, Jia Y, Kruglyak L. 2012. Genetic
architecture of highly complex chemical resistance traits across four yeast
strains. PLoS Genet 8(3): e1002570.
15. Mitchell, T. M. (1997). Machine learning. Burr Ridge, IL: McGraw
Hill, NY.
16. Storey, J. D., & Tibshirani, R. (2003). Statistical significance for genome wide
studies. Proceedings of the National Academy of Sciences, 100(16), 9440-
9445.
17. Greenwood, P. E., & Nikulin, M. S. (1996). A Guide to Chi-squared Testing.
New York: Wiley.
18. Hill, W. G., & Robertson, A. (1968). Linkage disequilibrium in finite
populations. Theoretical and Applied Genetics, 38(6), 226-231.
19. Lempel, A., & Greenberger, H. (1974). Families of sequences with optimal
Hamming-correlation properties. Information Theory, IEEE Transactions
on, 20(1), 90-94.
20. Lewontin, R. C., & Kojima, K. I. (1960). The evolutionary dynamics of
complex polymorphisms. Evolution, 458-472.
21. Hirschhorn, J. N., & Daly, M. J. (2005). Genome-wide association studies for
common diseases and complex traits. Nature Reviews Genetics, 6(2), 95-108.
22. Verhoeven, K. J., Simonsen, K. L., & McIntyre, L. M. (2005). Implementing
false discovery rate control: increasing your power. Oikos, 108(3), 643-647.
23. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a
practical and powerful approach to multiple testing. Journal of the Royal
Statistical Society. Series B (Methodological), 289-300.
24. Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The
Annals of Mathematical Statistics, 79-86.
25. Gorban, A. N., Smirnova, E. V., & Tyukina, T. A. (2010). Correlations, risk
and crisis: From physiology to finance. Physica A, 389(16), 3193-3217.
26. Censi, F., Giuliani, A., Bartolini, P., & Calcagnini, G. (2011). A multiscale
graph theoretical approach to gene regulation networks: a case study in
atrial fibrillation. Biomedical Engineering, IEEE Transactions on, 58(10),
2943-2946.