All content in this area was uploaded by Evgeny Mirkes on Nov 23, 2018

Long and short range multi-locus QTL interactions in a complex trait of yeast

Evgeny M Mirkes1, Thomas Walsh2, Edward J Louis2 and Alexander N Gorban1

1Centre for Mathematical Modelling and

2Centre for Genetic Architecture of Complex Traits

University of Leicester

Leicester LE1 7RH, UK

Abstract

We analyse interactions of Quantitative Trait Loci (QTL) in heat selected yeast by

comparing them to an unselected pool of random individuals. Here we re-

examine data on individual F12 progeny selected for heat tolerance, which have

been genotyped at 25 locations identified by sequencing a selected pool [Parts,

L., Cubillos, F. A., Warringer, J., Jain, K., Salinas, F., Bumpstead, S. J., Molin, M., Zia,

A., Simpson, J. T., Quail, M. A., Moses, A., Louis, E. J., Durbin, R., & Liti, G. (2011).

Genome research, 21(7), 1131-1138]. 960 individuals were genotyped at these

locations and multi-locus genotype frequencies were compared to 172

sequenced individuals from the original unselected pool (a control group).

Various non-random associations were found across the genome, both within

chromosomes and between chromosomes. Some of the non-random associations

are likely due to retention of linkage disequilibrium in the F12 population;

however, many, including the inter-chromosomal interactions, must be due to

genetic interactions in heat tolerance. One region of particular interest involves 3

linked loci on chromosome IV where the central variant responsible for heat

tolerance is antagonistic, coming from the heat sensitive parent and the flanking

ones are from the more heat tolerant parent. The 3-locus haplotypes in the

selected individuals represent a highly biased sample of the population

haplotypes with rare double recombinants in high frequency. These were missed

in the original analysis and would never be seen without the multigenerational

approach. We show that a statistical analysis of entropy and information gain in genotypes of a selected population can reveal more interactions than previously seen. Importantly, this must be done in comparison to the unselected population’s genotypes to account for inherent biases in the original population.

Introduction

The determination of the underlying genetic causes of particular phenotypes has

progressed greatly in recent years with studies in yeast being at the forefront of

the application and development of new techniques [1]. Recent advances in

quantitative genetic analysis in yeast have resulted in an unprecedented

dissection of the genetic architecture of complex traits. The sequencing of a pool

of selected progeny of a hybrid cross has provided high resolution mapping of

QTLs for a number of traits [2-5]. Adding multiple generations via an advanced

intercross line approach increases the resolution and sensitivity [6, 7]. In this

paper, we analyse data on individual F12 progeny selected for heat tolerance,

which have been genotyped at 25 locations identified by sequencing a selected

pool. Determining epistatic interactions among QTLs requires the knowledge of

genotypes in selected individuals and has been successfully used on progeny of

F1 hybrids [8, 9]. Basic quantitative trait analysis of the progeny of F1 crosses,

combined with the knowledge of gene function gained through decades of

analysis, allows for the determination of causal genetic variation within large

QTL regions [8, 10]. Some genetic interactions between these can also be

detected but only for very strong non-random associations [8]. Backcrosses help

resolve QTL regions and reveal linked sets of causal variants in some cases [11,

12, 13]. Improvements in resolution have been made by pooling selected

segregants of F1 hybrids [2, 14], and by using multi-generational hybrid

populations [6, 7]. The issue of interactions/associations of genetic variants is still problematic; however, some progress has been made by sequencing large numbers of individual F1 progeny [9].

Heat tolerance has been one phenotype studied extensively, first with crosses

involving a lab strain [10, 11, 13], then in 6 pairwise crosses between 4 different

populations [8], then in a pairwise multi-generational cross [6] followed by a 4-

way multigenerational cross incorporating the 4 populations originally studied

[7]. In the first study large regions were identified that explained some of the

phenotypic variation in heat tolerance with an analysis of candidate genes

revealing some responsible genetic variation [10]. Further backcrosses revealed

a linked set of QTLs within the regions, some of the variation providing heat

tolerance coming from the more heat sensitive parent [11, 13]. In the 6 pairwise

cross study, a total of 11 QTLs were identified for heat tolerance with low

resolution, though none of the crosses had more than 4 QTLs segregating.

Deleting one or the other allele in the hybrid, and measuring the heat tolerance

in the resulting hemizygote, confirmed candidate gene involvement. The

pairwise 12 generation cross, between the North American (NA) and West

African (WA) populations, resulted in a very high resolution determination of 21

to 22 QTLs as opposed to the 4 large QTL regions determined from the F1 cross

of the same parents [6, 8]. Here 8 of the variants providing heat tolerance came

from the more heat sensitive parent (WA). The 4-way, 12-generation study

increased the number of QTLs responsible for heat tolerance identified to 34 [7].

In the pairwise 12-generation cross, 960 individual heat resistant segregants

were genotyped at the 22 QTLs and a few other segregating markers. No 2-locus

inter-chromosomal interactions were found under the hypothesis of

independence before selection and correction for multiple comparisons when

analysing 19 of the segregating sites in the selected pool [6]. This is despite

strong evidence of epistasis among the QTLs responsible for heat tolerance and

the finding of negative epistasis between two loci in hemizygous double allele deletions. This analysis could detect neither interactions of linked QTLs nor interactions among multiple genes.

The problem of the “apparent lack of allele fixation and strong interchromosomal

interactions after 12 d under selection” was discussed and several hypotheses were

proposed [6]. In our work, we apply more sensitive techniques for the analysis of multiple testing results.

For analysis of associations in the heat selected population, which contains 896

multilocus genotypes after removing those with missing data, we apply a

bootstrap based analysis of ordered Relative Information Gain (RIG, [15])

(developed specially for this study) and a more sophisticated version of false

discovery rate control procedure [16]. These methods allow the identification of

more than 20 pairs of QTLs with significantly non-random dependences. The

requirement of significant differences of RIG between the unselected and heat

selected pools measured by fraction of RIG and relative entropy reduces the set

of significant pairs to 18. Weak correlations can be significant: statistically

significant association does not necessarily mean strong correlation. We

consider associations with RIG greater than 0.01 as moderate. There are four

moderately correlated significantly dependent pairs: chr15-0172xxx & chr15-

1032xxx, chr07-0859xxx & chr15-0172xxx, chr01-0119xxx & chr13-0910xxx

and chr10-0420xxx & chr15-0172xxx. These four pairs and three constant loci

(only one of two alleles present in the heat selected pool) can be considered as

associated with heat tolerance.

Materials and Methods

We used the genotype data for the 960 heat selected individuals from [6]

(provided in a supplemental spreadsheet – Table S1). Firstly 64 individuals with

missing data were identified and removed. There was no bias in those with missing data. To verify this, we consider two samples: the original sample and the sample with complete genotypes. To check the hypothesis that the complete sample is not biased, we test the hypothesis that the distributions of NA and WA coincide in both samples for each attribute. We apply χ² tests [17] to check this hypothesis. For this test the p-value is the probability of observing by chance the same or a greater deviation between the two samples if both samples are equally distributed. The minimal p-value over all attributes is 64% (p-values are provided in a supplemental spreadsheet – Table S2). As a result, there is no evidence to reject the hypothesis of coincidence. The missing values can therefore be interpreted as missing completely at random (MCAR), which means that removing incomplete records does not bias the sample distributions.

In order to remove the possibility that there may have been bias in the original

population before selection, leading to this apparent association due to the

selection for heat tolerance, we needed to have the genotypes of individuals from

the original pool prior to heat selection. Fortunately, 172 individuals from this

study were sequenced as part of the 4-way study [7] to compare sizes of LD

blocks (associations due to linkage). We scored each at the 25 loci for the allele

to create a control data set for comparison (provided in a supplemental

spreadsheet – Table S3). As a control group in our study we used these 172

genome sequences of unselected individuals from the pairwise cross [7], to

determine the genotypes at the QTLs and other segregating loci used in the

selected individuals. Testing the hypothesis that the complete sample is not biased shows that the missing values (present in 7 genomes) in this unselected pool can also be interpreted as missing completely at random (MCAR) (p-values are provided in a supplemental spreadsheet – Table S4).

To find associations between loci we use several approaches. Relative

Information Gain (RIG) [15] is widely used in data mining to measure

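dependence. RIG is not symmetric. A greater value of RIG means a stronger correlation, and RIG is zero for independent attributes. The RIG of the locus $X$ with respect to the locus $Y$ is defined as:

$$\mathrm{RIG}(X|Y) = \frac{H(X) - H(X|Y)}{H(X)},$$

where $H(X)$ is the entropy of the allele distribution for the locus $X$:

$$H(X) = -p_X \log p_X - (1 - p_X) \log (1 - p_X),$$

where $p_X$ is the fraction of genotypes with the NA allele in the locus $X$ among all genotypes, $H(X|Y)$ is the relative entropy:

$$H(X|Y) = p_Y \, H(X|Y=\mathrm{NA}) + (1 - p_Y) \, H(X|Y=\mathrm{WA}),$$

where $p_Y$ is the fraction of genotypes with the NA allele in the locus $Y$ among all genotypes, and $H(X|Y=\mathrm{NA})$ and $H(X|Y=\mathrm{WA})$ are the specific conditional entropies:

$$H(X|Y=\mathrm{NA}) = -p_{X|\mathrm{NA}} \log p_{X|\mathrm{NA}} - (1 - p_{X|\mathrm{NA}}) \log (1 - p_{X|\mathrm{NA}}),$$

$$H(X|Y=\mathrm{WA}) = -p_{X|\mathrm{WA}} \log p_{X|\mathrm{WA}} - (1 - p_{X|\mathrm{WA}}) \log (1 - p_{X|\mathrm{WA}}),$$

where $p_{X|\mathrm{NA}}$ is the fraction of genotypes with NA allele in the locus $X$ among all genotypes with NA allele in the locus $Y$ and $p_{X|\mathrm{WA}}$ is the fraction of genotypes with NA allele in the locus $X$ among all genotypes with WA allele in the locus $Y$.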
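As an illustration, the RIG definition can be written in a few lines of code. The following is a minimal sketch in Python, assuming the standard form RIG(X|Y) = (H(X) − H(X|Y)) / H(X) and alleles coded as 1 for NA and 0 for WA. The function names `binary_entropy` and `rig`, the 0/1 coding, and the convention of returning 0 for a constant locus are illustrative assumptions and are not taken from the original analysis.

```python
from math import log2


def binary_entropy(p):
    """Entropy (in bits) of a two-allele distribution with NA fraction p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def rig(x, y):
    """RIG of locus x with respect to locus y (1 = NA allele, 0 = WA allele)."""
    n = len(x)
    h_x = binary_entropy(sum(x) / n)
    if h_x == 0.0:
        # Constant locus: no information to gain (convention assumed here).
        return 0.0
    h_cond = 0.0
    for allele in (1, 0):
        # Alleles of x restricted to genotypes carrying `allele` at locus y.
        x_given = [xi for xi, yi in zip(x, y) if yi == allele]
        if x_given:
            weight = len(x_given) / n
            h_cond += weight * binary_entropy(sum(x_given) / len(x_given))
    return (h_x - h_cond) / h_x
```

For the toy vectors `x = [1, 1, 1, 0]` and `y = [1, 1, 0, 0]`, `rig(x, y)` and `rig(y, x)` give different values, which illustrates that RIG is not symmetric.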

One of the most widely used measures of correlation is Pearson’s correlation coefficient (PCC):

$$\mathrm{PCC}(X,Y) = \frac{p_{XY} - p_X p_Y}{\sqrt{p_X (1 - p_X) \, p_Y (1 - p_Y)}},$$

where $p_{XY}$ is the fraction of genotypes with the NA alleles in the loci $X$ and $Y$ simultaneously among all genotypes, and $p_X$, $p_Y$ are the marginal frequencies of the NA alleles in the loci $X$ and $Y$, respectively. PCC is also used as a linkage disequilibrium measure [18].

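In signal recognition the Hamming correlation coefficient [19] is an alternative to the PCC. The normalized Hamming’s correlation coefficient (NHCC) is the number of coincident symbols minus the number of different symbols divided by the length of sequences:

$$\mathrm{NHCC}(X,Y) = \frac{N_{=} - N_{\neq}}{N},$$

where $N_{=}$ is the number of genotypes with the same allele in the loci $X$ and $Y$, $N_{\neq}$ is the number of genotypes with different alleles in these loci, and $N$ is the total number of genotypes.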

For independent binary random variables, the probability of observing by

chance the same or greater deviation in contingency table is equal to the

probability of observing by chance the same or greater correlation (PCC, NHCC

or RIG) for uncorrelated variables. Therefore, usage of PCC, NHCC or RIG is equivalent. Since RIG evaluates correlation of categorical features, we use this measure in our study.
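For comparison with RIG, the two other measures can be sketched for binary allele vectors. This is a hedged illustration in Python, assuming alleles coded as 1 for NA and 0 for WA; the names `pcc` and `nhcc` and the convention of returning 0 when one locus is constant are assumptions made for this example only.

```python
from math import sqrt


def pcc(x, y):
    """Pearson's correlation coefficient for two binary loci (1 = NA, 0 = WA)."""
    n = len(x)
    p_x = sum(x) / n
    p_y = sum(y) / n
    # Fraction of genotypes carrying NA at both loci simultaneously.
    p_xy = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1) / n
    denom = sqrt(p_x * (1.0 - p_x) * p_y * (1.0 - p_y))
    if denom == 0.0:
        # A constant locus has no variance; 0 is returned by assumed convention.
        return 0.0
    return (p_xy - p_x * p_y) / denom


def nhcc(x, y):
    """Normalized Hamming correlation: (coincident - different) / length."""
    same = sum(1 for xi, yi in zip(x, y) if xi == yi)
    different = len(x) - same
    return (same - different) / len(x)
```

Both functions return 1 for identical vectors, -1 for complementary vectors, and 0 for the exactly independent toy pair `[1, 0, 1, 0]` and `[1, 1, 0, 0]`.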

The χ² test of independence [17] provides us with a direct technique for independence analysis. To check the hypothesis of independence we apply the two-sided Fisher’s exact test. It is closely related to the measure of linkage disequilibrium [20].
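To make the independence test concrete, a two-sided Fisher's exact test for a 2x2 contingency table can be sketched with the standard library only. The function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule used here (summing the probabilities of all tables that are no more probable than the observed one) is one common convention; the source does not state which convention was used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )
```

For a pair of loci, `a`, `b`, `c`, `d` would be the counts of the NA-NA, NA-WA, WA-NA and WA-WA genotypes.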

Revealing significant associations between different loci is a typical multiple

testing problem. There are several techniques of accounting for multiple testing.

The simplest one is the Bonferroni correction. Unfortunately the Bonferroni

correction is very conservative [16, 21, 22]. The widely used BH step-up

procedure [23] is less conservative than the Bonferroni correction, but is less

powerful and more conservative [16, 22] than the q-value technique suggested

by Storey and Tibshirani [16]. To define the significance of dependency we apply the calculation of q-values, which characterize the False Discovery Rate (FDR), in the version suggested by Storey and Tibshirani [16].
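The relation between the BH procedure and q-values can be illustrated with a short sketch. The code below is a simplified Python version, not the procedure of [16] itself: it estimates the fraction of true null hypotheses with a single tuning value `lam` (an assumption; the published method smooths the estimate over a range of values), and the fallback to 1 when that estimate is zero is also an assumption of this example. With the estimated fraction equal to 1 the result coincides with BH-adjusted p-values.

```python
def q_values(p_values, lam=0.5):
    """Simplified q-values from a list of p-values (single-lambda pi0 estimate)."""
    m = len(p_values)
    # Estimated fraction of true null hypotheses, capped at 1.
    pi0 = min(1.0, sum(1 for p in p_values if p > lam) / (m * (1.0 - lam)))
    if pi0 == 0.0:
        # Degenerate estimate: fall back to the BH case (assumed convention).
        pi0 = 1.0
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q
```

If the k smallest q-values are declared significant, the product of k and the largest of those q-values estimates the expected number of false discoveries among them.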

Also we apply a Bootstrap Test for ordered RIG (BToRIG) to define the significance of each RIG. We have $L$ loci and $N$ observations for each locus. We calculate the frequencies of NA for all loci. RIG is not symmetric and we calculate $L(L-1)$ values of RIG. Let us select a large number $M$ (in our study we have … and use …). Let us sort the RIGs in descending order: $R_1 \ge R_2 \ge \dots \ge R_{L(L-1)}$. Our test calculates $p_i$-values, where $p_i$ is the probability of observing by chance the same or a greater value of RIG for the $i$th maximal value of RIG if all loci are independently distributed with the frequencies defined by the original sample. We consider the $i$th RIG as significant with significance level $\alpha$ if $p_i \le \alpha$. The algorithm of this test is:

1. Calculate the frequencies $f_l$ of NA for all loci, $l = 1, \dots, L$.

2. Calculate two RIGs for each pair of loci.

3. Sort RIGs in descending order $R_1 \ge R_2 \ge \dots \ge R_{L(L-1)}$.

4. Select a large number $M$ and perform the bootstrap procedure $M$ times:

4.1. Generate $L$ artificial loci with frequencies $f_l$ (number of NA can slightly fluctuate).

4.2. Calculate two RIGs $R^{(m)}$ for each pair of loci, where $m$ is the number of the generation.

4.3. Sort RIGs in descending order $R^{(m)}_1 \ge R^{(m)}_2 \ge \dots \ge R^{(m)}_{L(L-1)}$.

5. Calculate the p-value for the $i$th correlation:

5.1. $c_i = \sum_{m=1}^{M} \theta\left(R^{(m)}_i - R_i\right)$, where $\theta$ is the Heaviside step function: $\theta(x) = 1$ if $x \ge 0$ and $\theta(x) = 0$ otherwise.

5.2. $p_i = c_i / M$.

6. Correlation $i$ is significant with significance level $\alpha$ if $p_i \le \alpha$.

This approach can be also applied for ordered PCC and for any other measure of

correlation.
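The bootstrap steps listed above can be sketched in Python as follows. This is an illustrative reading of the algorithm, not the authors' implementation: alleles are coded as 1 for NA and 0 for WA, artificial loci are generated by independent Bernoulli draws with the observed NA frequencies, and the names `btorig`, `ordered_rigs`, `n_boot` and `seed` are hypothetical. The RIG of a constant locus is set to 0 by assumption.

```python
import random
from math import log2


def entropy(p):
    """Binary entropy in bits; 0 for a constant distribution."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def rig(x, y):
    """RIG of locus x with respect to locus y (1 = NA, 0 = WA)."""
    n = len(x)
    h_x = entropy(sum(x) / n)
    if h_x == 0.0:
        return 0.0  # assumed convention for a constant locus
    h_cond = 0.0
    for allele in (1, 0):
        sub = [xi for xi, yi in zip(x, y) if yi == allele]
        if sub:
            h_cond += len(sub) / n * entropy(sum(sub) / len(sub))
    return (h_x - h_cond) / h_x


def ordered_rigs(loci):
    """Both RIGs for every pair of loci, sorted in descending order."""
    values = [
        rig(x, y)
        for i, x in enumerate(loci)
        for j, y in enumerate(loci)
        if i != j
    ]
    return sorted(values, reverse=True)


def btorig(loci, n_boot=1000, seed=0):
    """Return the ordered observed RIGs and their bootstrap p-values."""
    rng = random.Random(seed)
    n = len(loci[0])
    freqs = [sum(locus) / n for locus in loci]
    observed = ordered_rigs(loci)
    counts = [0] * len(observed)
    for _ in range(n_boot):
        # Independent artificial loci with the observed NA frequencies.
        artificial = [
            [1 if rng.random() < f else 0 for _ in range(n)] for f in freqs
        ]
        simulated = ordered_rigs(artificial)
        for i, (sim, obs) in enumerate(zip(simulated, observed)):
            if sim >= obs:  # Heaviside step of (simulated - observed)
                counts[i] += 1
    return observed, [c / n_boot for c in counts]
```

Comparing the i-th largest observed RIG only with the i-th largest simulated RIG is what makes the test "ordered": each rank has its own null distribution.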

To check the identity of distributions of NA and WA in the same locus in heat

selected and unselected pools we consider two random variables for each locus:

the first variable is whether it is in the heat selected or unselected pool and the

second variable is which allele, NA or WA. Independence of these two variables

means that selection has no effect for this locus. We apply Fisher’s exact test and the χ² test.
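The per-locus comparison of pools amounts to a test on a 2x2 table (pool by allele). A minimal Python sketch of the χ² version is given below; the function name `chi2_2x2` is hypothetical, the counts in the usage note are made up for illustration, and the sketch assumes that no row or column total is zero. For one degree of freedom the p-value can be written exactly with the complementary error function.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Chi-squared statistic and p-value (1 d.o.f.) for the table [[a, b], [c, d]].

    Rows: heat selected / unselected pool; columns: NA / WA allele counts.
    Assumes every row and column total is positive.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 d.o.f.
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value
```

With the hypothetical counts `chi2_2x2(20, 10, 10, 20)` the statistic is about 6.67 and the p-value is slightly below 0.01, so independence of pool and allele would be rejected at the 0.01 level for such a locus.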

‘Statistically significant association’ does not necessarily mean ‘large correlation’.

The multiple testing procedures return lists of statistically significant

associations between loci in the heat selected population. But the correlations

may be quite small. To select a reasonable level of correlation we retain

significant links with RIG>ε for some threshold ε>0 only (for example, ε=0.01).

A second selection is necessary to compare associations in the heat selected

populations with the unselected population (control group). For this purpose, we

consider the allele distributions in pairs of associated (after selection) loci and

calculate the relative entropy [24] with respect to this distribution in the

unselected population. A value of zero for relative entropy means that the

association is the same in both the selected and unselected groups. Associations

with small relative entropy with respect to the unselected population should be

additionally tested as they may be caused not by the heat tolerance but be a

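property of the unselected population. Another measure of the change of association after selection is the RIG ratio, which compares the RIG of a pair of loci in the heat selected pool with the RIG of the same pair in the unselected pool.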
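The relative entropy between the joint allele distribution of a pair of loci in the selected pool and in the unselected pool can be sketched as follows. This Python example assumes alleles coded as 1 for NA and 0 for WA and logarithms to base 2; the base, the function names `pair_distribution` and `relative_entropy`, and the small floor `eps` used when a combination is absent from the unselected pool are all assumptions, since the source does not specify them.

```python
from collections import Counter
from math import log2

COMBINATIONS = ((1, 1), (1, 0), (0, 1), (0, 0))


def pair_distribution(x, y):
    """Fractions of the four allele combinations for loci x and y."""
    n = len(x)
    counts = Counter(zip(x, y))
    return {combo: counts.get(combo, 0) / n for combo in COMBINATIONS}


def relative_entropy(p_selected, p_unselected, eps=1e-9):
    """Kullback-Leibler divergence of the selected from the unselected pool."""
    total = 0.0
    for combo, p in p_selected.items():
        if p > 0.0:
            # Floor the reference probability to avoid division by zero
            # (assumed handling of combinations unseen in the control group).
            total += p * log2(p / max(p_unselected[combo], eps))
    return total
```

The value is zero when both pools have the same joint distribution and grows as the selected pool concentrates on combinations that are rare in the unselected pool.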

Results

Several methods of association analysis, i.e. lack of independence, were applied

to these data. Statistics of PCC, NHCC, and RIG were analysed for each pool

separately. The false discovery rate control procedure [16] was also

implemented and applied. Both inter and intra chromosomal associations were

found. Comparison of associations in the two pools allows identification of real

connections associated with heat tolerance.

There are two pools: unselected and heat selected. The first reasonable question

is ‘Are these pools significantly different in allele distribution inside loci?’ Tests

of identity of distributions of NA and WA in selected and unselected pools

reveal six loci with identical distributions. Four loci are identified with p-value

greater than 0.1 and two loci with p-value between 0.01 and 0.1 (see Figure 1

and Table S5 in supplementary material). In this test the p-value is the

probability of observing by chance the same or greater dependence in

contingency table if two random variables are independent.

Figure 1. The diagrams of allele frequencies for unselected and heat selected samples. Loci in which the hypothesis of independence of the allele distribution from the pool (heat selected or unselected) cannot be rejected are marked by a solid circle if the p-value is greater than 0.1 and by a circle if the p-value is between 0.01 and 0.1.

Figure 1 shows the significant differences between heat selected and unselected

pools for most of loci. As might be expected the fraction of NA is increased after

selection in most of the loci as this is the heat tolerant parent. A perhaps

unexpected elimination of NA is observed for locus chr04-0488xxx; however, such antagonistic alleles are known and this one has been discussed in the previous analysis [6]. For eight loci we can see a change of sign (antagonism) of the differences. For three loci of chromosome IV an increase of variability is observed.

Figure 1 shows that two of the markers (chr01-0040xxx and chr05-0196xxx) in

the unselected pool and three of the markers (chr02-0522xxx, chr04-0488xxx

and chr05-0196xxx) in the heat selected pool had alleles from only one of the

parents, i.e. they were fixed. Furthermore there are two almost constant markers

in the heat selected pool: chr01-0040xxx contains 99.6% WA and chr02-0517xxx

contains 99.8% NA. These two markers can be interpreted as constant loci because the numbers of observed NA and WA alleles, respectively, are too small. Three

markers chr02-0522xxx, chr04-0488xxx and chr02-0517xxx can be interpreted

as exactly associated with heat resistance. We exclude these loci from the further

analysis of associations.

We call two loci linked if these loci are adjacent and fraction of mixed genes (NA-

WA and WA-NA) is low – i.e. there is linkage disequilibrium. The linked loci (both

for the unselected and heat selected pools) have the same colour in Figure 2. For

each group of linked loci, one locus with the most balanced (nearest to 0.5)

frequencies of NA and WA is retained for further analysis.

Figure 2. Distribution of allele frequencies versus QTLs for (a) unselected and (b) heat selected pools. Red corresponds to loci with one parent allele only (constant loci) and to almost constant loci (the fraction of one of the alleles is greater than 99%); magenta, green, blue, violet and brown correspond to different groups of linked loci, and grey corresponds to all other loci.

To find correlated loci in the heat selected pool we apply a bootstrap test for PCC,

NHCC, and RIG.

The bootstrap test of significance of PCC defines two pairs (chr04-0461xxx &

chr04-0496xxx, and chr10-0234xxx & chr12-0730xxx) with significance level

p=0.05. This result means that for binary random variables PCC is not an

appropriate measure of correlation. The bootstrap test of significance of NHCC

defines two pairs (chr04-0461xxx & chr04-0496xxx, and chr07-0859xxx &

chr13-0910xxx) with significance level p=0.05. This result means that NHCC is

not an appropriate measure of correlation for this genomic study. However, the

sum of NHCC for all pairs of loci, except constant and linked loci in the unselected

pool is equal to 9.72 and for the heat selected pool 19.73. This indicates a

significant increase of correlation (measured by the sum of NHCC). This effect

(growth of correlations under stress) is well known [25, 26].

For BToRIG, the number of significant connections with respect to p-value is

depicted in Figure 3a. We consider as significant connections with a p-value which is

not greater than 0.005 (29 pairs in Figure 4a). This set includes three pairs found in

the previous analysis [6].

Figure 3. The number of significant correlations for BToRIG (a), and the FDR and estimated number of false discoveries for the FDR approach (b).

Figure 4. Sets of significantly dependent loci for the heat selected pool: a) all connections selected as significant, b) strong and moderate connections with RIG not less than 0.01, c) significant connections for which the RIG ratio is greater than 2 or the relative entropy (selected with respect to unselected group) is greater than 0.5 and d) significant connections with RIG not less than 0.01 and one of the following conditions: the RIG ratio is greater than 2 or the relative entropy is greater than 0.5. Red solid circles depict the constant loci. Red circles with a white centre depict the loci excluded because they are in linkage disequilibrium with other loci (doubled red line). Solid green lines connect loci defined as significantly dependent by FDR and BToRIG. Brown dashed lines connect loci defined as significantly correlated by the bootstrap test only. Blue dotted lines connect loci defined as significant by FDR only.

We also apply the False Discovery Rate (FDR) approach to identify significantly

dependent loci. Graphs of the FDR and of the expected number of false discoveries with respect to the number of significant connections are depicted in Figure 3b. For FDR we consider as significantly dependent the first $k$ connections with …, where $q_k$ is the q-value of the $k$th test. It means that the expected number of false discoveries is not greater than … for $k$ significant connections. The number of significantly dependent

connections is 23. All these connections also were detected as significantly correlated

by BToRIG except pair chr02-0472xxx & chr07-0131xxx. This set includes three

pairs found previously [6].

The heat selected pool contains 896 genomes. This is large enough a sample so

that a weak correlation can be identified as significant. To exclude significantly

correlated pairs with really weak correlation we remove all pairs with RIG which

is less than 0.01. All pairs with strong and moderate correlation are depicted in

Figure 4b.

To identify correlations associated with heat tolerance it is necessary to compare

RIG for the same pair of loci in unselected and heat selected pools. We apply two

approaches for this purpose: estimation of the RIG ratio between the two pools

and the relative entropy. We consider change of RIG as significant if RIG ratio is

greater than 2 or relative entropy is greater than 0.5. Significantly correlated

associations with significant changes of RIG are depicted in Figure 4c.

Finally we combine these requirements: a pair of loci can be associated with heat

tolerance if the correlation of this pair is significant, the RIG of this pair has

significant changes and the RIG is not less than 0.01. The final set of pairs of loci

associated with heat tolerance is depicted in Figure 4d. The expected number of

false discoveries for this set of pairs is less than one.
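The combined selection rule can be written as a single predicate. The sketch below is illustrative Python: the thresholds 0.01, 2 and 0.5 are the ones quoted in the text, while the function name and argument names are hypothetical, and the per-pair inputs (significance flag, RIG in the heat selected pool, RIG ratio and relative entropy) are assumed to have been computed beforehand.

```python
def is_associated_with_heat_tolerance(
    significant,
    rig_selected,
    rig_ratio,
    relative_entropy,
    rig_min=0.01,
    ratio_min=2.0,
    entropy_min=0.5,
):
    """Apply the three requirements to one pair of loci."""
    # Requirement 2: the correlation must not be weak.
    strong_enough = rig_selected >= rig_min
    # Requirement 3: the association must differ from the unselected pool.
    changed = rig_ratio > ratio_min or relative_entropy > entropy_min
    # Requirement 1: the pair must be statistically significant.
    return significant and strong_enough and changed
```

A significant pair with a RIG of 0.005 is rejected by this rule even if its relative entropy is large, which is the case of a statistically significant but weak correlation.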

Chromosome IV needs additional discussion. Table 1 shows that alleles in the first four loci on chromosome IV (chr04-0454xxx, chr04-0461xxx, chr04-0488xxx and chr04-0496xxx) are not distributed independently with probability 1/2 of NA and WA. For such an equidistribution, the probability of each combination of alleles in four loci is 1/16. The probability of observing by chance the same or a greater deviation in the contingency table from this independent equidistribution than in Table 1 is less than $10^{-300}$. Alleles in the first four loci are also not distributed independently with the marginal probabilities calculated from the samples (these probabilities are different for the unselected and heat selected pools; the probability of observing by chance the same or a greater deviation in the contingency table is again less than $10^{-300}$). In the unselected pool these four loci are in linkage disequilibrium

because the fraction of genotypes with the same marker in all loci is 91% (NA-

NA-NA-NA 15% and WA-WA-WA-WA 76%). If we remove the constant locus

chr04-0488xxx from the heat selected pool then we also can consider the other

three loci as in linkage disequilibrium because the fraction of genotypes with the

same marker in all loci is 89% (NA-NA-NA 31% and WA-WA-WA 58%).

However, the locus chr04-0488xxx is located between loci chr04-0461xxx and

chr04-0496xxx. Therefore, the strong correlation between loci chr04-0461xxx

and chr04-0496xxx cannot be explained by lack of crossovers as can be done for

the unselected pool.

Table 1. Frequencies of markers in four loci of IV chromosome.

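The first four columns give the alleles at the chromosome IV loci. "Observed" columns give the observed number (#) and percentage (%) of genotypes; "Independent" columns give the values expected for independent loci with the marginal probabilities.

| 0454xxx | 0461xxx | 0488xxx | 0496xxx | Observed, unselected # | Observed, unselected % | Observed, heat selected # | Observed, heat selected % | Independent, unselected # | Independent, unselected % | Independent, heat selected # | Independent, heat selected % |
|---|---|---|---|---|---|---|---|---|---|---|---|
| NA | NA | NA | NA | 24 | 15 | 0 | 0 | 0.21 | 0 | 0.00 | 0 |
| NA | NA | NA | WA | 1 | 1 | 0 | 0 | 1.05 | 1 | 0.00 | 0 |
| NA | NA | WA | NA | 0 | 0 | 279 | 31 | 1.05 | 1 | 45.79 | 5 |
| NA | NA | WA | WA | 9 | 5 | 53 | 6 | 5.12 | 3 | 83.23 | 9 |
| NA | WA | NA | NA | 0 | 0 | 0 | 0 | 0.79 | 0 | 0.00 | 0 |
| NA | WA | NA | WA | 0 | 0 | 0 | 0 | 3.89 | 2 | 0.00 | 0 |
| NA | WA | WA | NA | 0 | 0 | 3 | 0 | 3.89 | 2 | 74.88 | 8 |
| NA | WA | WA | WA | 1 | 1 | 5 | 1 | 19.01 | 12 | 136.10 | 15 |
| WA | NA | NA | NA | 1 | 1 | 0 | 0 | 0.79 | 0 | 0.00 | 0 |
| WA | NA | NA | WA | 0 | 0 | 0 | 0 | 3.89 | 2 | 0.00 | 0 |
| WA | NA | WA | NA | 0 | 0 | 4 | 0 | 3.89 | 2 | 74.88 | 8 |
| WA | NA | WA | WA | 0 | 0 | 4 | 0 | 19.01 | 12 | 136.10 | 15 |
| WA | WA | NA | NA | 2 | 1 | 0 | 0 | 2.95 | 2 | 0.00 | 0 |
| WA | WA | NA | WA | 0 | 0 | 0 | 0 | 14.43 | 9 | 0.00 | 0 |
| WA | WA | WA | NA | 1 | 1 | 32 | 4 | 14.43 | 9 | 122.45 | 14 |
| WA | WA | WA | WA | 126 | 76 | 516 | 58 | 70.61 | 43 | 222.57 | 25 |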
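The "independent with marginal probabilities" values in Table 1 can be recomputed from the observed counts. The following Python sketch does this for the heat selected pool: the observed counts are copied from Table 1 (haplotypes not listed have count zero), while the function name `expected_counts` and the data layout are illustrative choices. The expected count of a haplotype is the pool size multiplied by the product of the marginal allele fractions at the four loci.

```python
from itertools import product

# Observed heat selected counts from Table 1, keyed by the alleles at
# loci 0454xxx, 0461xxx, 0488xxx and 0496xxx.
observed_heat_selected = {
    ("NA", "NA", "WA", "NA"): 279,
    ("NA", "NA", "WA", "WA"): 53,
    ("NA", "WA", "WA", "NA"): 3,
    ("NA", "WA", "WA", "WA"): 5,
    ("WA", "NA", "WA", "NA"): 4,
    ("WA", "NA", "WA", "WA"): 4,
    ("WA", "WA", "WA", "NA"): 32,
    ("WA", "WA", "WA", "WA"): 516,
}


def expected_counts(observed, n_loci=4):
    """Expected haplotype counts if the loci were independent."""
    n = sum(observed.values())
    # Marginal fraction of the NA allele at each locus.
    p_na = [
        sum(count for hap, count in observed.items() if hap[i] == "NA") / n
        for i in range(n_loci)
    ]
    expected = {}
    for hap in product(("NA", "WA"), repeat=n_loci):
        prob = 1.0
        for i, allele in enumerate(hap):
            prob *= p_na[i] if allele == "NA" else 1.0 - p_na[i]
        expected[hap] = n * prob
    return expected


expected = expected_counts(observed_heat_selected)
print(round(expected[("NA", "NA", "WA", "NA")], 2))
print(round(expected[("WA", "WA", "WA", "WA")], 2))
```

The two printed values are 45.79 and 222.57, matching the corresponding entries of Table 1, whereas the observed counts are 279 and 516.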

Statistical comparisons of the heat selected population to the control population

now reveal associations due to the selection for heat tolerance above any pre-

existing biases in the genotypes due to the process of generation of the test

population (linkage disequilibrium and inadvertent selection for some variants

during the 12 rounds of meiosis and random mating). There remain several associations between unlinked markers, ranging from 2-way up to 8-way (see Figure 4). Importantly, among the linked associations, the NA-WA-NA haplotype on chromosome IV, which is extremely rare in the unselected population (none were seen in the 172 unselected individuals, and it has a predicted frequency of 0.08% if 12 generations of crossing result in independent genetic intervals), was found in 30% of the heat selected individuals.

Discussion/Conclusions

Thirty significant associations between QTLs are detected by two statistical

approaches, FDR (False Discovery Rate) and BToRIG (Bootstrap Test for ordered

RIG). 22 of them are detected by both approaches simultaneously (see Figure 4).

These include both inter-chromosomal, unlinked, interactions as seen in Figure

4 as well as interactions among linked QTLs as seen on chromosome IV, Figure 4.

Previous analysis of these data [6] found three pairs of significant connections:

chr15-0172xxx & chr15-1032xxx, chr10-0234xxx & chr12-0730xxx and chr07-

0131xxx & chr12-0140xxx. The remarkable difference in findings is caused by

the more conservative BH step up procedure of multiple testing in [6] (it is well

known that this procedure has relatively low power). All three of these pairs are

found by both applied techniques: FDR and BToRIG (see Figure 4a). However, two of them, chr10-0234xxx & chr12-0730xxx and chr07-0131xxx & chr12-0140xxx, have a relatively small value of RIG and a small change of RIG (comparing the heat selected and unselected pools). This means that these associations are significant but are not considered here as sufficiently strongly correlated.

Furthermore, both these associations have approximately the same correlations

in the heat selected and unselected pools. As a result the only pair chr15-

0172xxx & chr15-1032xxx which is found in the previous study [6] is presented

in Figure 4d as a significant and sufficiently strong association in the context of

heat tolerance.

The hypothesis that the unselected pool contains independent loci is not correct, as shown in Figures 1 and 2. This fact was the basis for the use of correlation change analysis rather than simple correlation analysis.

Three loci become constant in the heat selected pool: chr02-0522xxx, chr04-

0488xxx and chr05-0196xxx. Loci chr02-0522xxx and chr05-0196xxx contain

NA markers only. Locus chr04-0488xxx contains WA markers only.

We exclude constant loci and keep for analysis only one locus from any pair of

loci in strong linkage disequilibrium. After that, we apply two multiple testing approaches: calculation of q-values to estimate FDR, and BToRIG to find pairs with significantly high RIGs. The FDR approach identifies 23 significantly dependent pairs of loci and BToRIG identifies 29 correlated pairs of loci, which include 22 of the pairs identified by FDR. Removing weak correlations and pairs with a small change of correlation in the heat selected pool in comparison with the unselected pool decreases the number of pairs of loci associated with heat tolerance to four pairs:

chr15-0172xxx & chr15-1032xxx, chr07-0859xxx & chr15-0172xxx, chr01-

0119xxx & chr13-0910xxx and chr10-0420xxx & chr15-0172xxx.

We also find that the distributions of chr04-0461xxx & chr04-0496xxx in the two pools are very similar but the distribution of the three adjacent loci chr04-0461xxx, chr04-0488xxx and chr04-0496xxx is drastically changed. The association between chr04-0461xxx and chr04-0496xxx in the selected and in the unselected pools is the same, and formally we exclude it from Figure 4d. Nevertheless, the situation in chromosome IV is very special: the combination NA, NA, WA, NA in

loci chr04-0454xxx, chr04-0461xxx, chr04-0488xxx and chr04-0496xxx occurs

in 31% of genotypes in the heat selected pool and in 0% of genotypes in the

unselected pool (Table 1). Therefore, this association should be considered as a

result of selection.

The application of statistical methods developed for other purposes is proving useful in the determination of interactions among QTLs of complex traits, bringing us closer to understanding the entire heritability of traits.

The biology of the interactions found will require further experimentation. In

particular, the linked set of QTLs with heat tolerance promoting alleles coming from both parents, and the interaction within the set, are of interest. The genetic

architecture of this region could be the result of adaptation to heat tolerance in

both populations.

Acknowledgements: This work was in part supported by the Wellcome Trust

Institutional Strategic Support Fund WT097828/Z/11/Z (RM33G0255 and

RM33G0335).

References

1. Liti, G., & Louis, E. J. (2012). Advances in quantitative trait analysis in

yeast. PLoS genetics, 8(8), e1002912.

2. Ehrenreich IM, Torabi N, Jia Y, Kent J, Martis S, Shapiro JA, Gresham D,

Caudy AA, Kruglyak L. 2010. Dissection of genetically complex traits with

extremely large pools of yeast segregants. Nature 464(7291): 1039-1042.

3. Schwartz K, Wenger JW, Dunn B, Sherlock G. 2012. APJ1 and GRE3 homologs

work in concert to allow growth in xylose in a natural Saccharomyces sensu

stricto hybrid yeast. Genetics 191(2): 621-632.

4. Swinnen S, Schaerlaekens K, Pais T, Claesen J, Hubmann G, Yang Y, Demeke M,

Foulquie-Moreno MR, Goovaerts A, Souvereyns K et al. 2012. Identification of

novel causative genes determining the complex trait of high ethanol tolerance in

yeast using pooled-segregant whole-genome sequence analysis. Genome Res

22(5): 975-984.

5. Hubmann G, Mathe L, Foulquie-Moreno MR, Duitama J, Nevoigt E, Thevelein JM.

2013. Identification of multiple interacting alleles conferring low glycerol and

high ethanol yield in Saccharomyces cerevisiae ethanolic fermentation.

Biotechnol Biofuels 6(1): 87.

6. Parts, L., Cubillos, F. A., Warringer, J., Jain, K., Salinas, F., Bumpstead, S. J.,

Molin, M., Zia, A., Simpson, J. T., Quail, M. A., Moses, A., Louis, E. J., Durbin,

R., & Liti, G. (2011). Revealing the genetic structure of a trait by

sequencing a population under selection. Genome research, 21(7), 1131-

1138.

7. Cubillos, F. A., Parts, L., Salinas, F., Bergström, A., Scovacricchi, E., Zia, A.,

Illingworth, C. J. R., Mustonen, V., Ibstedt, S., Warringer, J., Louis, E. J.,

Durbin, R., & Liti, G. (2013). High-resolution mapping of complex traits

with a four-parent advanced intercross yeast population. Genetics, 195(3),

1141-1155.

8. Cubillos, F. A., Billi, E., Zörgö, E., Parts, L., Fargier, P., Omholt, S., Blomberg,

A., Warringer, J., Louis, E. J., & Liti, G. (2011). Assessing the complex

architecture of polygenic traits in diverged yeast populations. Molecular

ecology, 20(7), 1401-1413.

9. Bloom, J. S., Ehrenreich, I. M., Loo, W. T., Lite, T. L. V., & Kruglyak, L.

(2013). Finding the sources of missing heritability in a yeast

cross. Nature, 494(7436), 234-237.

10. Steinmetz, L. M., Sinha, H., Richards, D. R., Spiegelman, J. I., Oefner, P. J.,

McCusker, J. H., & Davis, R. W. (2002). Dissecting the architecture of a

quantitative trait locus in yeast. Nature, 416(6878), 326-330.

11. Sinha H, Nicholson BP, Steinmetz LM, McCusker JH. 2006. Complex genetic

interactions in a quantitative trait locus. PLoS Genet 2(2): e13.

12. Ben-Ari G, Zenvirth D, Sherman A, David L, Klutstein M, Lavi U, Hillel J,

Simchen G. 2006. Four linked genes participate in controlling sporulation

efficiency in budding yeast. PLoS Genet 2(11): e195.

13. Sinha, H., David, L., Pascon, R. C., Clauder-Münster, S., Krishnakumar, S.,

Nguyen, M., Shi, G., Dean, J., Davis, R. W., Oefner, P. J., McCusker, J. H., &

Steinmetz, L. M. (2008). Sequential elimination of major-effect

contributors identifies additional quantitative trait loci conditioning high-

temperature growth in yeast. Genetics, 180(3), 1661-1670.

14. Ehrenreich IM, Bloom J, Torabi N, Wang X, Jia Y, Kruglyak L. 2012. Genetic

architecture of highly complex chemical resistance traits across four yeast

strains. PLoS Genet 8(3): e1002570.

15. Mitchell, T. M. (1997). Machine learning. Burr Ridge, IL: McGraw Hill.

16. Storey, J. D., & Tibshirani, R. (2003). Statistical significance for genome wide

studies. Proceedings of the National Academy of Sciences, 100(16), 9440-

9445.

17. Greenwood, P. E. & Nikulin, M. S. (1996). A Guide to Chi-squared Testing.

New York: Wiley

18. Hill, W. G., & Robertson, A. (1968). Linkage disequilibrium in finite

populations. Theoretical and Applied Genetics, 38(6), 226-231.

19. Lempel, A., & Greenberger, H. (1974). Families of sequences with optimal

Hamming-correlation properties. Information Theory, IEEE Transactions

on, 20(1), 90-94.

20. Lewontin, R. C., & Kojima, K. I. (1960). The evolutionary dynamics of

complex polymorphisms. Evolution, 458-472.

21. Hirschhorn, J. N., & Daly, M. J. (2005). Genome-wide association studies for

common diseases and complex traits. Nature Reviews Genetics, 6(2), 95-108.

22. Verhoeven, K. J., Simonsen, K. L., & McIntyre, L. M. (2005). Implementing

false discovery rate control: increasing your power. Oikos, 108(3), 643-647.

23. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a

practical and powerful approach to multiple testing. Journal of the Royal

Statistical Society. Series B (Methodological), 289-300.

24. Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The

Annals of Mathematical Statistics, 79-86.

25. Gorban, A. N., Smirnova, E. V., & Tyukina, T. A. (2010). Correlations, risk

and crisis: From physiology to finance. Physica A, 389(16), 3193-3217.

26. Censi, F., Giuliani, A., Bartolini, P., & Calcagnini, G. (2011). A multiscale

graph theoretical approach to gene regulation networks: a case study in

atrial fibrillation. Biomedical Engineering, IEEE Transactions on, 58(10),

2943-2946.