
Copyright © 2005 by the Genetics Society of America

DOI: 10.1534/genetics.104.040386

Bayesian Model Selection for Genome-Wide Epistatic Quantitative Trait Loci Analysis

Nengjun Yi,*,†,1 Brian S. Yandell,‡ Gary A. Churchill,§ David B. Allison,*,† Eugene J. Eisen** and Daniel Pomp††

*Department of Biostatistics, Section on Statistical Genetics, †Clinical Nutrition Research Center, University of Alabama, Birmingham, Alabama 35294, ‡Departments of Statistics and Horticulture, University of Wisconsin, Madison, Wisconsin 53706, §The Jackson Laboratory, Bar Harbor, Maine 04609, **Department of Animal Science, North Carolina State University, Raleigh, North Carolina 27695 and ††Department of Animal Science, University of Nebraska, Lincoln, Nebraska 68583

Manuscript received December 29, 2004

Accepted for publication April 4, 2005

ABSTRACT

The problem of identifying complex epistatic quantitative trait loci (QTL) across the entire genome continues to be a formidable challenge for geneticists. The complexity of genome-wide epistatic analysis results mainly from the number of QTL being unknown and the number of possible epistatic effects being huge. In this article, we use a composite model space approach to develop a Bayesian model selection framework for identifying epistatic QTL for complex traits in experimental crosses from two inbred lines. By placing a liberal constraint on the upper bound of the number of detectable QTL, we restrict attention to models of fixed dimension, greatly simplifying calculations. Indicators specify which main and epistatic effects of putative QTL are included. We detail how to use prior knowledge to bound the number of detectable QTL and to specify prior distributions for indicators of genetic effects. We develop a computationally efficient Markov chain Monte Carlo (MCMC) algorithm using the Gibbs sampler and Metropolis-Hastings algorithm to explore the posterior distribution. We illustrate the proposed method by detecting new epistatic QTL for obesity in a backcross of CAST/Ei mice onto M16i.

Many complex human diseases and traits of biological and/or economic importance are determined by multiple genetic and environmental influences (Lynch and Walsh 1998). Mounting evidence suggests that interactions among genes (epistasis) play an important role in the genetic control and evolution of complex traits (Cheverud 2000; Carlborg and Haley 2004). Mapping quantitative trait loci (QTL) is a process of inferring the number of QTL, their genomic positions, and genetic effects given observed phenotype and marker genotype data. From a statistical perspective, two key problems in QTL mapping are model search and selection (e.g., Broman and Speed 2002; Sillanpää and Corander 2002; Yi 2004). Traditional QTL mapping methods use a statistical model that estimates the effects of only one QTL whose putative position is scanned across the genome (e.g., Lander and Botstein 1989; Jansen and Stam 1994; Zeng 1994). Extensions of this approach can allow for main and epistatic effects at two or perhaps a few QTL at a time and employ a multidimensional scan to detect QTL. However, such an approach neglects potential confounding effects from additional QTL and requires prohibitive corrections for multiple testing. Non-Bayesian model selection methods combine simultaneous search with a sequential procedure such as forward or stepwise selection and apply criteria such as P-values or a modified Bayesian information criterion (BIC) to identify well-fitting multiple-QTL models (Kao et al. 1999; Carlborg et al. 2000; Reifsnyder et al. 2000; Bogdan et al. 2004). These methods, although appealing in their simplicity and popularity, have several drawbacks: (1) the uncertainty about the model itself is ignored in the final inference, (2) they involve a complex sequential testing strategy that includes a dynamically changing null hypothesis, and (3) the selection procedure is heavily influenced by the quantity of data (Raftery et al. 1997; George 2000; Gelman et al. 2004; Kadane and Lazar 2004).

Bayesian model selection methods provide a powerful and conceptually simple approach to mapping multiple QTL (Satagopan et al. 1996; Hoeschele 2001; Sen and Churchill 2001). The Bayesian approach proceeds by setting up a likelihood function for the phenotype and assigning prior distributions to all unknowns in the problem. These induce a posterior distribution on the unknown quantities that contains all of the available information for inference of the genetic architecture of the trait. Bayesian mapping methods can treat the unknown number of QTL as a random variable, which has several advantages but results in the complication of varying the dimension of the model space. The reversible jump Markov chain Monte Carlo (MCMC) algorithm, introduced by Green (1995), offers a powerful and general approach to exploring posterior distributions in this setting. However, the ability to "move" between models of different dimension requires a careful construction of proposal distributions. Despite the challenges of implementing reversible jump algorithms, effective approaches for mapping multiple noninteracting QTL have been developed (Satagopan and Yandell 1996; Heath 1997; Thomas et al. 1997; Uimari and Hoeschele 1997; Sillanpää and Arjas 1998; Stephens and Fisch 1998; Yi and Xu 2000; Gaffney 2001). Bayesian model selection methods employing the reversible jump MCMC algorithm have been proposed to map epistatic QTL in inbred line crosses and outbred populations (Yi and Xu 2002; Yi et al. 2003, 2004a,b; Narita and Sasaki 2004). However, the complexity of the reversible jump steps increases computational demand and may hinder further improvement of the algorithms.

¹Corresponding author: Department of Biostatistics, University of Alabama, Ryals Public Health Bldg., 1665 University Blvd., Birmingham, AL 35294-0022. E-mail: nyi@ms.soph.uab.edu

Genetics 170: 1333–1344 (July 2005)

Recently, Yi (2004) proposed a unified Bayesian model selection framework to identify multiple nonepistatic QTL for complex traits in experimental designs, based upon a composite space representation of the problem. The composite space approach, which is a modification of the product space concept developed by Carlin and Chib (1995), provides an interesting viewpoint on a wide variety of model selection problems (Godsill 2001). The key feature of the composite model space is that its dimension remains fixed, allowing MCMC simulation to be performed on a space of fixed dimension and thus avoiding the complexities of reversible jump. In Yi (2004), the varying-dimensional space is augmented to a fixed-dimensional space (the composite model space) by placing an upper bound on the number of detectable QTL. In the composite model space, latent binary variables indicate whether each putative QTL has a nonzero effect. The resulting hierarchical model can vastly simplify the MCMC search strategy.

In this work we extend the composite model space approach to include epistatic effects. We develop a framework of Bayesian model selection for mapping epistatic QTL in experimental crosses from two inbred lines. We show how to incorporate prior knowledge to select an upper bound on the number of detectable QTL and prior distributions for indicator variables of genetic effects and other parameters. A computationally efficient MCMC algorithm using a Gibbs sampler and a Metropolis-Hastings (M-H) algorithm is developed to explore the posterior distribution on the parameters. The proposed algorithm is easy to implement and allows more complete and rapid exploration of the model space. We first describe the implementation of this algorithm and then illustrate the method by analyzing a mouse backcross population.

A BAYESIAN MODEL SELECTION FRAMEWORK FOR QTL MAPPING

We consider experimental crosses derived from two inbred lines. In QTL studies, the observed data consist of phenotypic trait values, $y$, and marker genotypes, $m$, for individuals in a mapping population. We assume that markers are organized into a linkage map and restrict attention to models with, at most, pairwise interactions. We partition the entire genome into $H$ loci, $\Lambda = \{\Lambda_1, \dots, \Lambda_H\}$, and assume that the possible QTL occur at these fixed positions. This introduces only a minor bias in estimating the position of QTL when $H$ is large. When the markers are densely and regularly spaced, we set $\Lambda$ to the marker positions; otherwise, $\Lambda$ includes not only the marker positions but also points between markers. In general, the genotypes, $g$, at loci $\Lambda$ are unobservable except at completely informative markers, but their probability distribution, $p(g \mid \Lambda, m)$, can be inferred from the observed marker data using the multipoint method (Jiang and Zeng 1997). This probability distribution is used as the prior distribution of QTL genotypes in our Bayesian framework.
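To show what $p(g \mid \Lambda, m)$ looks like at a single grid locus, here is a minimal sketch for a backcross. It is a simplification and not the multipoint method of Jiang and Zeng (1997): it conditions only on the two flanking markers and assumes the Haldane map function (no crossover interference). The function names are hypothetical.

```python
import math


def haldane_r(d_cm: float) -> float:
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def qtl_prob_cm(left: int, right: int, d_left: float, d_right: float) -> float:
    """P(QTL genotype = CM | flanking marker genotypes) in a backcross.

    left, right: 1 if that flanking marker is CM (carries the CAST/Ei allele), 0 if MM.
    d_left:  distance (cM) from the left marker to the putative QTL.
    d_right: distance (cM) from the putative QTL to the right marker.
    Assumes no crossover interference, so the two intervals recombine independently.
    """
    r1, r2 = haldane_r(d_left), haldane_r(d_right)

    def joint(q: int) -> float:
        # P(left, q, right) up to a constant factor shared by q = 0 and q = 1
        p_q_given_left = 1.0 - r1 if q == left else r1
        p_right_given_q = 1.0 - r2 if right == q else r2
        return p_q_given_left * p_right_given_q

    return joint(1) / (joint(0) + joint(1))
```

When the flanking markers disagree and the locus sits at the midpoint, the probability is exactly 0.5. When a locus coincides with an informative marker, the genotype is known. The full multipoint calculation conditions on all markers on the chromosome, which matters when a flanking marker is missing or uninformative.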

The problem of inferring the number and locations of multiple QTL is equivalent to the problem of selecting a subset of $\Lambda$ that fully explains the phenotypic variation. Although a complex trait may be influenced by multitudes of loci, our emphasis is on a set of at most $L$ QTL with detectable effects. Typically $L$ will be much smaller than $H$. Let $\lambda = \{\lambda_1, \dots, \lambda_L\}$ ($\subset \{\Lambda_1, \dots, \Lambda_H\}$) be the current positions of $L$ putative QTL. Each locus may affect the trait through its marginal (main) effects and/or interactions with other loci (epistasis). The phenotype distribution is assumed to follow a linear model,

$$y = \mu + X\beta + e, \qquad (1)$$

where $\mu$ is the overall mean, $\beta$ denotes the vector of all possible main effects and pairwise interactions of $L$ potential QTL, $X$ is the design matrix, and $e$ is the vector of independent normal errors with mean zero and variance $\sigma^2$. The number of genetic effects depends on the experimental design, and the design matrix $X$ is determined from the genotypes $g$ at the current loci $\lambda$ by using a particular genetic model (see appendix a for details of the Cockerham genetic model used here).

There is prior uncertainty about which genetic effects should be included in the model. As in Bayesian variable selection for linear regression (e.g., George and McCulloch 1997; Kuo and Mallick 1998; Chipman et al. 2001), we introduce a binary variable $\gamma$ for each effect, indicating that the corresponding effect is included in ($\gamma = 1$) or excluded from ($\gamma = 0$) a model. Letting $\Gamma = \mathrm{diag}(\gamma)$, the model becomes

$$y = \mu + X\Gamma\beta + e. \qquad (2)$$

This linear model defines the likelihood, $p(y \mid \gamma, X, \theta)$, with $\theta = (\mu, \beta, \sigma^2)$, and the full posterior can be written as

$$p(\gamma, \lambda, g, \theta \mid y, m) \propto p(y \mid \gamma, X, \theta)\, p(\gamma, \lambda, g, \theta \mid m). \qquad (3)$$
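To make model (2) concrete for the backcross analyzed in the example (Cockerham coefficients 0.5 for CM and −0.5 for MM), here is a minimal sketch that builds one row of $X$ for $L$ putative QTL and evaluates the mean $\mu + x_{i\cdot}\Gamma\beta$. In a backcross each QTL has one main effect and each pair one epistatic effect. The sketch *assumes* that the epistatic coefficient is the product of the two main-effect coefficients; appendix a gives the coding actually used. The function names are hypothetical.

```python
from itertools import combinations

# Cockerham main-effect coefficients for a backcross genotype
COEF = {"CM": 0.5, "MM": -0.5}


def design_row(genotypes):
    """One row of X: L main-effect columns followed by L(L-1)/2 epistatic columns.

    genotypes: list of L genotype labels ("CM" or "MM"), one per putative QTL.
    Epistatic coefficients are products of main-effect coefficients (assumption).
    """
    main = [COEF[g] for g in genotypes]
    epi = [main[a] * main[b] for a, b in combinations(range(len(main)), 2)]
    return main + epi


def linear_predictor(mu, row, beta, gamma):
    """Mean of model (2) for one individual: mu + sum_j x_j * gamma_j * beta_j."""
    return mu + sum(x * g * b for x, g, b in zip(row, gamma, beta))
```

Setting an indicator to 0 removes that column's effect from the mean without changing the length of $\beta$. This fixed length is what keeps the composite model space at a constant dimension.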

Specifications of the priors $p(\gamma, \lambda, g, \theta \mid m)$ and posterior calculation are given in subsequent sections.

The vector $\gamma$ determines the number of QTL (see appendix b). Hereafter, we denote the included positions of QTL by $\lambda_\gamma$. The vector $(\gamma, \lambda_\gamma)$ comprises a model index that identifies the genetic architecture of the trait. A natural model selection strategy is to choose the most probable model $(\gamma, \lambda_\gamma)$ on the basis of its marginal posterior, $p(\gamma, \lambda_\gamma \mid y, m)$ (George and Foster 2000). For genome-wide epistatic analysis, however, no single model may stand out, and thus we average over possible models when assessing characteristics of genetic architecture, with the various models weighted by their posterior probability (Raftery et al. 1997; Ball 2001; Sillanpää and Corander 2002).

PRIOR DISTRIBUTIONS

The above Bayesian model selection framework provides a conceptually simple and general method to identify complex epistatic QTL across the entire genome. However, its practical implementation entails two challenges: prior specification and posterior calculation. In this section, we first propose a method to choose an upper bound for the number of QTL and then describe the prior specifications for the model index and other unknowns.

Choice of the upper bound $L$: We suggest first specifying the prior expected number of QTL, $l_0$, on the basis of initial investigations with traditional methods, and then determining a reasonably large upper bound, $L$. We take the prior distribution for the number of QTL, $l$, to be a Poisson distribution with mean $l_0$. The value of $L$ can be selected to be large enough that the probability $\Pr(l > L)$ is very small. On the basis of a normal approximation to the Poisson distribution, we could take $L$ as $l_0 + 3\sqrt{l_0}$.

Prior on $\gamma$: For the indicator vector $\gamma$, we use an independence prior of the form

$$p(\gamma) = \prod_j w_j^{\gamma_j}(1 - w_j)^{1 - \gamma_j}, \qquad (4)$$

where $w_j = p(\gamma_j = 1)$ is the prior inclusion probability for the $j$th effect. We assume that $w_j$ equals the predetermined hyperparameter $w_m$ or $w_e$, depending on whether the $j$th effect is a main effect or an epistatic effect, respectively. Under this prior, the importance of any effect is independent of the importance of any other effect, and the prior inclusion probability of a main effect differs from that of an epistatic effect.

The hyperparameters $w_m$ and $w_e$ control the expected numbers of main and epistatic effects included in the model, respectively; small $w_m$ and $w_e$ concentrate the prior on parsimonious models with few main and epistatic effects. Instead of directly specifying $w_m$ and $w_e$, it may be better to first determine the prior expected numbers of main-effect QTL, $l_m$, and of all QTL, $l_0 \ge l_m$ (i.e., main-effect and epistatic QTL), and then solve for $w_m$ and $w_e$ from the expressions for these prior expected numbers. It is reasonable to require that $w_m \ge w_e$, which requires some adjustment below when $l_m = 0$.

As shown in appendix b, the prior expected number of main-effect QTL can be expressed as

$$l_m = L\left[1 - (1 - w_m)^K\right], \qquad (5)$$

and the prior expected number of all QTL as

$$l_0 = L\left[1 - (1 - w_m)^K (1 - w_e)^{K_2(L-1)}\right], \qquad (6)$$

where $K$ is the number of possible main effects for each QTL and $K_2$ is the number of possible epistatic effects for any two QTL.

The prior expected number of main-effect QTL, $l_m$, could be set to the number of QTL detected by traditional nonepistatic mapping methods, e.g., interval mapping or composite interval mapping (Lander and Botstein 1989; Zeng 1994). The prior expected number of all QTL, $l_0$, should be chosen to be at least $l_m$. The number of QTL detected by traditional epistatic mapping methods, e.g., a two-dimensional genome scan, could provide a rough guide for choosing $l_0$. From Equations 5 and 6, we obtain

$$w_m = 1 - \left(1 - \frac{l_m}{L}\right)^{1/K} \qquad (7)$$

and

$$w_e = 1 - \left[\frac{1 - l_0/L}{(1 - w_m)^K}\right]^{1/[K_2(L-1)]}. \qquad (8)$$

Note that if no main-effect QTL is detected by traditional nonepistatic mapping methods and $l_m = 0$, then $w_m = 0$. In this case, we suggest making all weights equal, $w_m = w_e = w$, and using (6) to obtain

$$w = 1 - \left(1 - \frac{l_0}{L}\right)^{1/[K + K_2(L-1)]}. \qquad (9)$$

Prior on $\lambda$: When there is no prior information concerning QTL locations, these could be assumed to be independent and uniformly distributed over the $H$ possible loci. Thus, given $l_0$, the prior probability that any locus is included becomes $l_0/H$. In practice, it may be reasonable to assume that any interval of a given length (e.g., 10 cM) contains at most one QTL. Although this assumption is not necessary, it can substantially reduce the model space and thus accelerate the search procedure.

Prior on $\beta$: We propose the following hierarchical mixture prior for each genetic effect,

$$\beta_j \mid (\gamma_j, \sigma^2, x_{\cdot j}) \sim N\!\left(0,\ \gamma_j\, c\, \sigma^2 (x_{\cdot j}^T x_{\cdot j})^{-1}\right), \qquad (10)$$

where $x_{\cdot j} = (x_{1j}, \dots, x_{nj})^T$ is the vector of the coefficients of $\beta_j$, and $c$ is a positive scale factor. Many suggestions have been proposed for the choice of $c$ in variable selection problems for linear regression (e.g., Chipman et al. 2001; Fernandez et al. 2001). In this study, we take $c = n$, which is a popular choice and yields the BIC if the prior inclusion probability for each effect equals 0.5 (e.g., George and Foster 2000; Chipman et al. 2001).
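Because the prior hyperparameters in this section are in closed form, they are easy to compute before a run. The sketch below implements the rule for $L$ and Equations 7 to 9. It *assumes* that $l_0 + 3\sqrt{l_0}$ is rounded up to the next integer. The text does not say how $L$ is rounded, but rounding up reproduces the values of $L$ used in the example. The function names are hypothetical.

```python
import math


def upper_bound(l0: float) -> int:
    """L = l0 + 3*sqrt(l0), rounded up to an integer (rounding rule assumed)."""
    return math.ceil(l0 + 3.0 * math.sqrt(l0))


def main_weight(l_m: float, L: int, K: int) -> float:
    """Equation 7: w_m = 1 - (1 - l_m/L)^(1/K)."""
    return 1.0 - (1.0 - l_m / L) ** (1.0 / K)


def epistatic_weight(l0: float, L: int, K: int, K2: int, w_m: float) -> float:
    """Equation 8: w_e = 1 - [(1 - l0/L) / (1 - w_m)^K]^(1 / (K2 (L - 1)))."""
    return 1.0 - ((1.0 - l0 / L) / (1.0 - w_m) ** K) ** (1.0 / (K2 * (L - 1)))


def common_weight(l0: float, L: int, K: int, K2: int) -> float:
    """Equation 9 (used when l_m = 0): w = 1 - (1 - l0/L)^(1 / (K + K2 (L - 1)))."""
    return 1.0 - (1.0 - l0 / L) ** (1.0 / (K + K2 * (L - 1)))
```

For a backcross, $K = K_2 = 1$. With $l_m = 3$ and $l_0 = 4$, this gives $L = 10$, $w_m = 0.30$ and $w_e \approx 0.017$. Substituting the resulting weights back into Equation 6 recovers $l_0$, which is a quick consistency check on any hand calculation.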

In this prior setup, a point mass prior at 0 is used for the genetic effect $\beta_j$ when $\gamma_j = 0$, effectively removing $\beta_j$ from the model. If $\gamma_j = 1$, the prior variances reflect the precision of each $\beta_j$ and are invariant to scale changes in the phenotype and the coefficients. The value $(x_{\cdot j}^T x_{\cdot j})^{-1}$ varies for different types of genetic effects. For a large backcross population with no segregation distortion, for example, $x_{\cdot j}^T x_{\cdot j}/n \approx 1/4$ for marginal effects and $[1 - (1 - 2r)^2]/16$ for epistatic effects, with $r$ the recombination fraction between the two QTL, under Cockerham's model (Zeng et al. 2000).

Priors on $\mu$ and $\sigma^2$: The prior for the overall mean $\mu$ is $N(\eta_0, \tau_0^2)$. We could empirically set

$$\eta_0 = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \quad \text{and} \quad \tau_0^2 = s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2.$$

We take the noninformative prior for the residual variance, $p(\sigma^2) \propto 1/\sigma^2$ (Gelman et al. 2004). Although this prior is improper, it yields a proper posterior distribution for the unknowns and so can be used formally (Chipman et al. 2001).

MARKOV CHAIN MONTE CARLO ALGORITHM

To develop our MCMC algorithm, we first partition the vector of unknowns $(\lambda, g, \theta)$ into $(\lambda_\gamma, g_\gamma, \theta_\gamma)$ and $(\lambda_{-\gamma}, g_{-\gamma}, \theta_{-\gamma})$, representing the unknowns included in or excluded from the model, respectively. Here $\lambda_\gamma$ and $g_\gamma$ ($\lambda_{-\gamma}$ and $g_{-\gamma}$) are the positions and the genotypes of the QTL included (excluded), $\beta_\gamma$ ($\beta_{-\gamma}$) represents the genetic effects included (excluded), $\theta = (\mu, \beta, \sigma^2)$, $\theta_\gamma = (\beta_\gamma, \mu, \sigma^2)$, and $\theta_{-\gamma} = \beta_{-\gamma}$. Similarly, $X_\gamma$ ($X_{-\gamma}$) represents the model coefficients included (excluded), which are determined by $g$ and $\gamma$.

We suppress the dependence on the observed marker data below. For a particular $\gamma$, the likelihood function depends only upon the parameters $(X_\gamma, \theta_\gamma)$ used by that model, i.e.,

$$p(y \mid \gamma, X, \theta) = p(y \mid \gamma, X_\gamma, \theta_\gamma). \qquad (11)$$

The prior distribution of $(\gamma, \lambda, g, \theta)$ can be partitioned as

$$p(\gamma, \lambda, g, \theta) = p(\gamma)\, p(\lambda_\gamma, g_\gamma, \theta_\gamma \mid \gamma)\, p(\lambda_{-\gamma}, g_{-\gamma}, \theta_{-\gamma} \mid \gamma). \qquad (12)$$

The full posterior distribution for $(\gamma, \lambda, g, \theta)$ can now be expressed as

$$p(\gamma, \lambda, g, \theta \mid y) \propto p(y \mid \gamma, X_\gamma, \theta_\gamma)\, p(\gamma)\, p(\lambda_\gamma, g_\gamma, \theta_\gamma \mid \gamma) \times p(\lambda_{-\gamma}, g_{-\gamma}, \theta_{-\gamma} \mid \gamma). \qquad (13)$$

From (13), we can derive the conditional posterior distributions

$$p(\lambda_\gamma, g_\gamma, \theta_\gamma \mid \gamma, y) \propto p(y \mid \gamma, X_\gamma, \theta_\gamma)\, p(\lambda_\gamma, g_\gamma, \theta_\gamma \mid \gamma), \qquad (14)$$

$$p(\lambda_{-\gamma}, g_{-\gamma}, \theta_{-\gamma} \mid \gamma, y) = p(\lambda_{-\gamma}, g_{-\gamma}, \theta_{-\gamma} \mid \gamma), \qquad (15)$$

and

$$p(\gamma \mid \lambda, g, \theta, y) \propto p(y \mid \gamma, X_\gamma, \theta_\gamma)\, p(\gamma)\, p(\lambda_\gamma, g_\gamma, \theta_\gamma \mid \gamma) \times p(\lambda_{-\gamma}, g_{-\gamma}, \theta_{-\gamma} \mid \gamma). \qquad (16)$$

The unused parameters do not affect the conditional posterior of $(\lambda_\gamma, g_\gamma, \theta_\gamma)$ and thus do not need to be updated conditional on $\gamma$. Because the unused parameters do not contribute to the likelihood, the posterior of $(\lambda_{-\gamma}, g_{-\gamma}, \theta_{-\gamma})$ is identical to its prior. From (16), the conditional posterior of $\gamma$ depends on $(\lambda_{-\gamma}, g_{-\gamma}, \theta_{-\gamma})$. The update of $\gamma$ therefore requires generating the corresponding unused parameters in the current model. These properties lead us to the MCMC algorithms described below. We first briefly describe the algorithms for updating $\theta_\gamma$, $g_\gamma$, and $\lambda_\gamma$ and then develop a novel Gibbs sampler and a Metropolis-Hastings algorithm to update the indicator variables for main and epistatic effects, respectively.

Conditional on $\gamma$, $X_\gamma$, and $\lambda_\gamma$, the parameters $\mu$, $\sigma^2$, and $\beta_\gamma$ can be sampled directly from their posterior distributions, which have standard form (Gelman et al. 2004). Conditional on $\gamma$, $\lambda_\gamma$, and $\theta_\gamma$, the posterior distribution of each element of $g_\gamma$ is multinomial and thus can be sampled directly as well (Yi and Xu 2002). We adapt the algorithm of Yi et al. (2003) to our model to update the locations $\lambda_\gamma$, with two modifications: (1) $\lambda$ is restricted to the discrete space $\Lambda = \{\Lambda_1, \dots, \Lambda_H\}$, and (2) any interval of some length $\delta$ includes at most one QTL. To update $\lambda_q$, therefore, we propose a new location $\lambda_q^*$ for the $q$th QTL uniformly from the $2d$ nearest flanking loci of $\lambda_q$, where $d$ is a predetermined integer (e.g., $d = 2$), and then generate genotypes at the new location for all individuals. The proposals for the new location and the genotypes are then jointly accepted or rejected using the Metropolis-Hastings algorithm.

At each iteration of the MCMC simulation, we update all elements of $\gamma$ in some fixed or random order. For the indicator variable of a main effect, we need to consider two cases: the QTL is currently (1) in or (2) out of the model. In case 1, the QTL position and genotypes were generated at the preceding iteration. In case 2, we sample a new QTL position from its prior distribution and generate its genotypes for all individuals. An epistatic effect involves two QTL, hence three cases: (1) both QTL are in, (2) only one QTL is in, and (3) both QTL are out of the model. Again, the new QTL position(s) and genotypes are sampled as needed.

We update $\gamma_j$, the indicator variable for an effect, using its conditional posterior distribution, which is Bernoulli,

$$p(\gamma_j = 1 \mid \gamma_{-j}, X, \theta_{-j}, y) = 1 - p(\gamma_j = 0 \mid \gamma_{-j}, X, \theta_{-j}, y) = \frac{wR}{(1 - w) + wR}, \qquad (17)$$

where

$$R = \frac{p(y \mid \gamma_j = 1, \gamma_{-j}, X, \theta_{-j})}{p(y \mid \gamma_j = 0, \gamma_{-j}, X, \theta_{-j})} = \left(\frac{\sigma_j^{-2}}{\sigma_j^{-2} + \sigma^{-2}\sum_{i=1}^{n} x_{ij}^2}\right)^{0.5} \exp\left\{\frac{1}{2}\,\frac{\left[\sigma^{-2}\sum_{i=1}^{n} x_{ij}\,(y_i - \mu - x_{i\cdot}\beta + x_{ij}\beta_j)\right]^2}{\sigma_j^{-2} + \sigma^{-2}\sum_{i=1}^{n} x_{ij}^2}\right\}.$$

Here $x_{i\cdot}$ is the vector of the coefficients of $\beta$ for the $i$th individual, $w = \Pr(\gamma_j = 1)$ is the prior probability that $\beta_j$ appears in the model, $\sigma_j^2$ is the prior variance of $\beta_j$ (see Equation 10), $\gamma_{-j}$ denotes all the elements of $\gamma$ except $\gamma_j$, and $\theta_{-j}$ denotes all the elements of $\theta$ except $\beta_j$. We can sample $\gamma_j$ directly from (17), or switch $\gamma_j$ to $1 - \gamma_j$ with probability $\min(1, r)$, where $r = [(w/(1 - w))R]^{1 - 2\gamma_j}$.

The effect $\beta_j$ was integrated out in (17). We can generate $\beta_j$ as follows. If $\gamma_j$ is sampled to be zero, $\beta_j = 0$. Otherwise, $\beta_j$ is generated from its conditional posterior

$$p(\beta_j \mid \gamma_j = 1, \gamma_{-j}, X, \theta_{-j}, y) = N(\tilde{\beta}_j, \tilde{\sigma}_j^2), \qquad (18)$$

where

$$\tilde{\beta}_j = \left(\sigma^2\sigma_j^{-2} + \sum_{i=1}^{n} x_{ij}^2\right)^{-1} \sum_{i=1}^{n} x_{ij}\,(y_i - \mu - x_{i\cdot}\beta + x_{ij}\beta_j)$$

and

$$\tilde{\sigma}_j^{-2} = \sigma_j^{-2} + \sigma^{-2}\sum_{i=1}^{n} x_{ij}^2.$$

POSTERIOR ANALYSIS

The MCMC algorithm described above starts from initial values and updates each group of unknowns in turn. Initial iterations are discarded as "burn-in." To reduce serial correlation, we thin the subsequent samples by keeping every $k$th simulation draw and discarding the rest, where $k$ is an integer. The MCMC sampler sequence $\{(\gamma^{(t)}, \lambda_\gamma^{(t)}, g_\gamma^{(t)}, \theta_\gamma^{(t)});\ t = 1, \dots, N\}$ is a random draw from the joint posterior distribution $p(\gamma, \lambda_\gamma, g_\gamma, \theta_\gamma \mid y)$. The embedded subsequence $\{(\gamma^{(t)}, \lambda_\gamma^{(t)});\ t = 1, \dots, N\}$ is therefore a random sample from its marginal posterior distribution $p(\gamma, \lambda_\gamma \mid y)$, which is used to infer the genetic architecture of the complex trait. For genome-wide epistatic analysis, no single model may stand out, and we may average over all possible models to assess genetic architecture. Bayesian model averaging provides more robust inferences about quantities of interest than any single model because it incorporates model uncertainty (Raftery et al. 1997; Ball 2001; Sillanpää and Corander 2002).

The most important characteristic may be the posterior inclusion probability of each possible locus $\Lambda_h$, estimated as

$$p(\Lambda_h \mid y) \approx \frac{1}{N}\sum_{t=1}^{N}\sum_{q=1}^{L} 1\!\left(\lambda_q^{(t)} = \Lambda_h,\ \eta_q^{(t)} = 1\right), \quad h = 1, \dots, H, \qquad (19)$$

where $\eta_q$ is the binary indicator of whether QTL $q$ is included in the model. From these probabilities we obtain the cumulative distribution function per chromosome, defined as $F_c(x \mid y) = \sum_{\Lambda_h \le x} p(\Lambda_h \mid y)$ for any position $x$ on chromosome $c$. Note that the cumulative distribution function defined here can exceed 1 if the corresponding chromosome contains more than one QTL. Both $p(\Lambda_h \mid y)$ and $F_c(x \mid y)$ can be displayed graphically and show evidence of QTL activity across the whole genome. Commonly used summaries include the posterior probability that a chromosomal region contains QTL, the most likely position of QTL (the mode of QTL positions), and the region of highest posterior density (HPD) (e.g., Gelman et al. 2004). To take the prior specification, $p(\Lambda_h)$, into consideration, we can use the Bayes factor to show evidence for inclusion of $\Lambda_h$ against exclusion of $\Lambda_h$ (Kass and Raftery 1995),

$$\mathrm{BF}(\Lambda_h) = \frac{p(\Lambda_h \mid y)}{1 - p(\Lambda_h \mid y)} \cdot \frac{1 - p(\Lambda_h)}{p(\Lambda_h)}. \qquad (20)$$

In a similar fashion, we can compute the Bayes factor comparing a chromosomal region containing QTL to one excluding QTL.

We can estimate the main effects at any locus or chromosomal interval $\Lambda$,

$$\hat{\beta}_k(\Lambda) \approx \frac{1}{N}\sum_{t=1}^{N}\sum_{q=1}^{L} 1\!\left(\lambda_q^{(t)} \in \Lambda,\ \eta_q^{(t)} = 1\right)\beta_{qk}^{(t)}, \quad k = 1, 2, \dots, K. \qquad (21)$$

The heritabilities explained by the main effects can also be estimated. In epistatic analysis, we need to estimate two additional types of parameters, the posterior inclusion probability and the size of epistatic effects, both involving pairs of loci. These two types of unknowns can be estimated with natural extensions of (19) and (21), respectively.

EXAMPLE

We illustrate the application of our Bayesian model selection approach by analyzing a mouse cross produced from two highly divergent strains: M16i, consisting of large and moderately obese mice, and CAST/Ei, a wild strain of small mice with lean bodies (Leamy et al. 2002). CAST/Ei males were mated to M16i females, and F1 males were backcrossed to M16i females, resulting in 54 litters and 421 mice (213 males, 208 females) reaching 12 weeks of age. All mice were genotyped for 92 microsatellite markers located on 19 autosomal chromosomes. The marker linkage map covered 1214 cM with an average spacing of 13 cM. In this study, we analyze FAT, the sum of the right gonadal and hindlimb subcutaneous fat pads. Prior to QTL analysis, the phenotypic data were linearly adjusted for sex and dam and standardized to mean 0 and variance 1, although this transformation may destroy true biological interaction (Jansen 2003). We used the Cockerham genetic model (appendix a), in which the coefficients of main effects are defined as 0.5 and −0.5 for the two genotypes, CM and MM, where C and M represent the CAST/Ei and M16i alleles, respectively.

Figure 1. Profiles of LOD scores from maximum-likelihood interval mapping. On the x-axis, large tick marks represent chromosomes and small tick marks represent markers.

We partitioned each chromosome with a 1-cM grid, resulting in 1214 possible loci across the genome. A nonepistatic and an epistatic QTL model were evaluated. For all analyses, the MCMC started with no QTL and ran for $4 \times 10^5$ cycles after the first 2000 cycles were discarded as burn-in. The chain was thinned by keeping one in every $k = 20$ cycles, yielding $2 \times 10^4$ samples for posterior Bayesian analysis. An initial interval map scan revealed three significant QTL (LOD > 3.2) on chromosomes 2, 13, and 15 (Figure 1), explaining 20.7, 4.9, and 5.1% of the phenotypic variance, respectively.

Under the nonepistatic analysis, epistatic effects are always excluded from the model, and thus putative QTL are chosen only on the basis of their main effects. As described earlier, we took the number of significant QTL detected in the interval mapping as the prior expected number of main-effect QTL ($l_m = 3$). To check prior sensitivity, we reran the algorithm for $l_m = 1$ and 6. The upper bound of the number of QTL was calculated as $L = l_m + 3\sqrt{l_m}$, rounded up to an integer, or $L = 9$, 4, and 14 for $l_m = 3$, 1, and 6, respectively. Therefore, the prior probabilities of inclusion for each main effect were $w_m = 1 - [1 - (l_m/L)]^{1/K} = 1/3$, $1/4$, and $3/7$, respectively ($K = 1$ in a backcross). Figure 2, top, displays the posterior probability of inclusion for each locus across the genome. Note the similarity to Figure 1, with clear evidence of QTL and flat profiles on other chromosomes. The peaks on chromosomes 2, 13, and 15 overlap those identified by interval mapping. The graphs of the cumulative distribution function, displayed in Figure 2, bottom, show that the posterior inclusion probability is close to 1 for chromosomes 2, 13, and 15. The results show that, at least in this data set, detection of large-effect QTL is not sensitive to the choice of $l_m$. However, larger values of $l_m$ tend to pick up more small-effect QTL, as expected. The profiles of the Bayes factor are depicted in Figure 3. For all three choices of $l_m$, the regions on chromosomes 2, 13, and 15 show strong evidence for being selected, and other regions show a very low Bayes factor.

Figure 2. Bayesian nonepistatic analysis: profiles of posterior inclusion probability and cumulative probability function. Black line, $l_m = 1$; red line, $l_m = 3$; blue line, $l_m = 6$. On the x-axis, large tick marks represent chromosomes and small tick marks represent markers.

Figure 3. Bayesian nonepistatic analysis: profiles of Bayes factor. Black line, $l_m = 1$; red line, $l_m = 3$; blue line, $l_m = 6$. On the x-axis, large tick marks represent chromosomes and small tick marks represent markers.

Figure 4. Bayesian epistatic analysis: profiles of posterior inclusion probability and cumulative probability function. Black line, $l_0 = 4$; red line, $l_0 = 6$; blue line, $l_0 = 8$. On the x-axis, large tick marks represent chromosomes and small tick marks represent markers.

Figure 5. Bayesian epistatic analysis: profiles of Bayes factor. Black line, $l_0 = 4$; red line, $l_0 = 6$; blue line, $l_0 = 8$. On the x-axis, large tick marks represent chromosomes and small tick marks represent markers.

The epistatic analysis took $l_m = 3$, the number of QTL detected in the nonepistatic analyses, as the prior expected number of main-effect QTL. Three values, $l_0 = 4$, 6, and 8, were chosen as the prior expected number of all QTL under the epistatic model. The upper bound of the number of QTL was thus $L = 10$, 14, and 17, respectively. From Equations 7 and 8, the prior inclusion probabilities were 0.30, 0.21, and 0.18 for main effects and 0.017, 0.025, and 0.027 for epistatic effects, for the three values of $(l_0, L)$, respectively. The profiles of the posterior inclusion probability for each locus across the genome and of the cumulative posterior probability for each chromosome are depicted in Figure 4, top and bottom, respectively. The three prior specifications of $(l_0, L)$ gave fairly similar posterior profiles, indicating that the posterior inference may not be very sensitive to small or moderate changes in $l_0$. As expected, a smaller prior expected number of QTL tended to yield smaller posterior probabilities, especially for infrequently arising loci. However, the identification of frequently arising loci remained the same. The profiles of the Bayes factor are depicted in Figure 5. The three choices of $l_0$ gave similar Bayes factor profiles, especially for infrequently arising loci.
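The profiles in Figures 2 to 5 are built from Equations 19 and 20. The sketch below computes these summaries from stored MCMC draws. It *assumes* a hypothetical data layout in which each draw records, for every putative QTL $q$, its grid index $h$ and its inclusion indicator $\eta_q$. For $p(\Lambda_h)$ in the Bayes factor, the uniform locus prior $l_0/H$ from the "Prior on $\lambda$" paragraph is a natural choice. The function names are illustrative.

```python
from itertools import accumulate


def inclusion_probabilities(draws, H):
    """Equation 19: share of draws in which some included QTL sits at locus h.

    draws: list of N posterior draws; each draw is a list of (h, included) pairs,
    one per putative QTL q = 1..L, with h in 0..H-1 the grid index of lambda_q.
    """
    counts = [0] * H
    for draw in draws:
        for h, included in draw:
            if included:
                counts[h] += 1
    n = len(draws)
    return [c / n for c in counts]


def chromosome_cdf(post, loci_on_chrom):
    """F_c(x | y): running sum of p(Lambda_h | y) over one chromosome's loci,
    ordered by position. Values can exceed 1 if the chromosome carries >1 QTL."""
    return list(accumulate(post[h] for h in loci_on_chrom))


def bayes_factor(post_h, prior_h):
    """Equation 20: posterior odds of inclusion divided by prior odds."""
    if post_h >= 1.0:
        return float("inf")
    return (post_h / (1.0 - post_h)) * ((1.0 - prior_h) / prior_h)
```

Dividing by the prior odds explains the behavior described above. A smaller $l_0$ lowers both the posterior and the prior inclusion probabilities, so the Bayes factor profile can stay similar even when the posterior profile shrinks.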

As shown in Figures 4 and 5, the epistatic analyses detected the same regions on chromosomes 2, 13, and 15 as the nonepistatic analyses. In addition, our epistatic analyses found strong evidence of QTL on chromosomes 1, 18, and 19, with high cumulative probabilities (close to 1), and suggestive evidence of QTL on chromosomes 7 and 14. In the nonepistatic analyses, these chromosomes showed only weak main effects, so they were detected in the epistatic model mainly because of epistatic interactions.

The profiles of the location-wise main effects and of the variances explained by the main effects are depicted in Figure 6. The posterior inferences were essentially identical for the three prior specifications, so we report only the summary statistics for l0 = 6 (see Tables 1 and 2). For the HPD regions on chromosomes 2, 13, and 15, the posterior inclusion probabilities are close to 1 and the corresponding Bayes factors are high. The estimated main effects were −0.856, 0.371, and −0.342 and explained 18.4, 3.5, and 3.1% of the phenotypic variance, respectively. For the HPD regions on chromosomes 1, 18, and 19, the posterior inclusion probabilities were ∼82, 88, and 70%, and the corresponding Bayes factors were ∼28, 47, and 12, respectively. In these HPD regions, the average main effects were weak and explained low proportions of the phenotypic variance. However, our epistatic analyses detected strong epistatic interactions associated with the HPD regions on chromosomes 1, 18, and 19. As shown in Table 2, the strongest epistasis is the interaction between chromosomes 1 and 18. This epistatic effect was estimated to be 0.936 and explained 5.6% of the phenotypic variance; its posterior inclusion probability was 81.9%. The region of chromosome 19 was found to interact with chromosomes 15 and 7. The interaction between the regions of chromosomes 19 and 15 was 0.604 and explained 2.5% of the phenotypic variance. The epistatic analyses also revealed interactions among chromosomes 2, 13, and 15. For example, the interaction between the HPD regions on chromosomes 2 and 13 was included in the model with probability ∼60% and explained ∼2.5% of the phenotypic variance.

Figure 6. Bayesian epistatic analysis: profiles of main effect and heritability explained by main effect. Black line, l0 = 4; red line, l0 = 6; blue line, l0 = 8. On the x-axis, large tick marks represent chromosomes and small tick marks represent markers.
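One common way to express an estimated effect as a share of the phenotypic variance is effect² × var(x) / var(y), where x is the genotype code of the term. The Python sketch below assumes that definition and the backcross coding x = ±0.5 of Appendix A, with epistatic codes formed as products of the two loci's codes. This section does not spell out the paper's exact estimator of "heritability explained", so treat the sketch as illustrative only. The balanced genotype frequencies and the phenotypic variance of 1.0 are hypothetical.

```python
from statistics import pvariance

def variance_explained(coefficients, effect, phenotype_variance):
    """Share of phenotypic variance due to one term, effect * x.

    coefficients: genotype codes x of the term across individuals
    (e.g. +/-0.5 for a backcross main effect, products of codes for epistasis).
    """
    return effect ** 2 * pvariance(coefficients) / phenotype_variance

# Hypothetical balanced backcross: half bb (x = -0.5), half Bb (x = +0.5).
main_codes = [-0.5, 0.5] * 50
# Epistatic codes are products x_q * x_q' over the four two-locus genotypes.
epi_codes = [a * b for a in (-0.5, 0.5) for b in (-0.5, 0.5)] * 25

print(variance_explained(main_codes, -0.856, phenotype_variance=1.0))
print(variance_explained(epi_codes, 0.936, phenotype_variance=1.0))
```

With balanced genotypes, var(x) is 0.25 for a main-effect code and 0.0625 for an epistatic product code. An epistatic effect therefore needs to be larger than a main effect to explain the same share of variance.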

TABLE 1

Summary statistics for epistatic analysis: high posterior density (HPD) regions of QTL locations, posterior inclusion probabilities of main effects, Bayes factors, estimated main effects, and heritabilities explained by main effects in the HPD regions

Chromosome  HPD region (cM)  Posterior probability (%)  Bayes factor  Main effect  Heritability
2           [72, 85]         98.3                       821.4         −0.856       0.184
13          [20, 42]         97.2                       291.3         0.371        0.035
15          [1, 29]          93.5                       92.2          −0.342       0.031
1           [26, 54]         81.9                       28.1          −0.037       0.002
18          [43, 71]         88.4                       47.3          0.103        0.015
19          [15, 45]         70.6                       12.2          −0.167       0.020
7           [50, 75]         36.7                       4.1           −0.137       0.019
14          [12, 41]         30.1                       2.7           −0.147       0.009

DISCUSSION

The Bayesian model selection approach provides a comprehensive solution to mapping multiple epistatic QTL across the entire genome, using the posterior distribution as a selection criterion. MCMC algorithms based on the composite model space representation mix rapidly, so that high-probability models are visited frequently and quickly, which yields good inference from relatively short runs. The Bayesian framework provides robust inference of genetic architecture that incorporates model uncertainty by averaging over all possible models (Raftery et al. 1997; Ball 2001; Sillanpää and Corander 2002).

One of the most challenging statistical problems presented by QTL mapping is that the number of QTL is unknown. Most previous Bayesian mapping methods treat QTL models as models of varying dimension and employ the reversible jump MCMC algorithm to explore the posterior. Although such a framework is very general and powerful (Green 1995), it is difficult to implement efficient search strategies within it. The key idea of the proposed Bayesian approach is to turn the varying-dimensional space of multiple-QTL models into a fixed-dimensional model space by using a fixed but large set of known loci and placing a constraint on the upper bound of the number of detectable QTL. In this setting, posterior simulation can be achieved with a relatively simple Gibbs sampler or M-H algorithm (Godsill 2001; Yi 2004). The algorithm proposed here is easier to implement than the reversible jump method and reduces the computational time of model search, an essential feature for the practical analysis of complex genetic architectures.

A prerequisite of the proposed method is a reasonable choice of the upper bound of the number of detectable QTL. A minimal requirement is that the predetermined upper bound be greater than the true number of QTL with high probability. As an extreme case, we could take the total number of loci (H) as the upper bound. Since the number of detectable QTL is usually much less than H, such a choice is unlikely to be optimal. Our suggestion uses the expected number of QTL and the prior probability distribution of the number of QTL to determine the upper bound. The expected number of QTL could be roughly estimated using standard genome scans. In practice, one could experiment with several values of the expected number of QTL and investigate their impact on the posterior inference. In high-dimensional problems, specifying the prior distributions on both the model space and the parameters is perhaps the most difficult aspect of Bayesian model selection. We propose a novel method for eliciting the prior distribution on the indicator variables. Instead of directly specifying the prior inclusion probabilities wm and we, one first specifies the expected numbers of main-effect QTL and of all QTL, incorporating previous results, and then uses these to determine wm and we. Here we have fixed wm and we, but we could relax this by treating wm and we as unknown model parameters and assigning them priors (Kohn et al. 2001).
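Appendix B gives the prior expected numbers of main-effect QTL and of all QTL as lm = L[1 − (1 − wm)^K] and l0 = L[1 − (1 − wm)^K (1 − we)^(K²(L − 1))]. Solving these two equations for wm and we is one way to turn elicited values of lm and l0 into prior inclusion probabilities. The Python sketch below performs that algebra under those formulas. The function names and the input values in the usage lines are hypothetical.

```python
def expected_counts(w_m: float, w_e: float, L: int, K: int) -> tuple[float, float]:
    """Prior expected numbers (lm, l0) of main-effect QTL and of all QTL (Appendix B)."""
    lm = L * (1 - (1 - w_m) ** K)
    l0 = L * (1 - (1 - w_m) ** K * (1 - w_e) ** (K * K * (L - 1)))
    return lm, l0

def inclusion_probs(l0: float, lm: float, L: int, K: int) -> tuple[float, float]:
    """Invert expected_counts: return (w_m, w_e) that reproduce lm and l0.

    K = 1 for a backcross, K = 2 for an intercross; requires 0 <= lm <= l0 < L.
    """
    if L < 2 or not 0 <= lm <= l0 < L:
        raise ValueError("need L >= 2 and 0 <= lm <= l0 < L")
    # (1 - w_m)^K must equal 1 - lm/L
    w_m = 1 - (1 - lm / L) ** (1 / K)
    # (1 - w_e)^(K^2 (L - 1)) must equal (1 - l0/L) / (1 - lm/L)
    w_e = 1 - ((1 - l0 / L) / (1 - lm / L)) ** (1 / (K * K * (L - 1)))
    return w_m, w_e

w_m, w_e = inclusion_probs(l0=6, lm=3, L=16, K=2)  # hypothetical inputs
print(w_m, w_e)
print(expected_counts(w_m, w_e, L=16, K=2))  # recovers approximately (3, 6)
```

The condition lm ≤ l0 is what keeps we nonnegative. The expected number of all QTL cannot be smaller than the expected number of main-effect QTL.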

A major difficulty of genome-wide epistatic analysis is created by the huge size of the model space. Strategies that reasonably reduce the model space, such as our proposed composite model space approach, can improve the performance of the MCMC algorithms and enhance our ability to detect complex epistatic QTL. We partition the entire genome into intervals by a number of points and restrict putative QTL to these fixed points, reducing the loci to a discrete space. Additional speedup is achieved by computing the conditional probabilities of the genotypes given the marker data on this fixed (but dense) grid of possible locations before the MCMC procedure starts.
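The precomputation step can be illustrated with a simplified sketch for a backcross. It assumes Haldane's map function (no crossover interference) and uses only the two fully observed flanking markers of each grid point. A full implementation would typically use all markers on the chromosome through a hidden Markov model and would handle missing genotypes. The function names and the 2-cM grid spacing are hypothetical.

```python
import math

def haldane_r(d_cm: float) -> float:
    """Recombination fraction for a map distance in cM (Haldane's map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def prob_het(left_het: bool, right_het: bool, d_left: float, d_right: float) -> float:
    """P(grid locus is Bb | flanking marker genotypes) in a backcross.

    left_het / right_het: True if that flanking marker is Bb, False if bb.
    d_left / d_right: distances (cM) from the grid locus to each marker.
    """
    r1, r2 = haldane_r(d_left), haldane_r(d_right)

    def step(a: bool, b: bool, r: float) -> float:
        # probability of going from genotype a to genotype b across an interval
        return 1.0 - r if a == b else r

    p_het = step(left_het, True, r1) * step(True, right_het, r2)
    p_hom = step(left_het, False, r1) * step(False, right_het, r2)
    return p_het / (p_het + p_hom)

def grid_probs(left_het: bool, right_het: bool, interval_cm: float, step_cm: float = 2.0):
    """Precompute (position, P(Bb)) at evenly spaced points in one marker interval.

    interval_cm must be positive.
    """
    n = max(1, round(interval_cm / step_cm))
    return [
        (interval_cm * i / n,
         prob_het(left_het, right_het, interval_cm * i / n, interval_cm * (n - i) / n))
        for i in range(n + 1)
    ]

# One individual with flanking markers Bb (left) and bb (right), 10 cM apart.
for pos, p in grid_probs(True, False, 10.0):
    print(f"{pos:5.1f} cM  P(Bb) = {p:.3f}")
```

These probabilities depend only on the marker data and the grid, not on the sampled model. They can therefore be stored once and looked up at every MCMC iteration.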

Several other strategies for reducing the model space could be incorporated into the proposed approach to improve the procedure. We could adopt a two-stage search method, first searching for main-effect QTL and then searching for epistatic effects of these and additional epistatic QTL given the already detected main-effect QTL. The positions and main effects of the QTL detected in the first stage should be updated in the second stage, since inclusion of epistatic effects may yield more accurate estimates of the positions and the effects. Alternatively, we could selectively ignore some genetic effects. Even with a moderate number of detectable QTL, epistatic models must accommodate many potential genetic effects. In a backcross population, for example, there are L(L + 1)/2 possible effects in total (210 if L = 20, say), but many may be negligible.
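The size of this effect space can be checked directly. Following the coding in Appendix A, a population with K + 1 genotypes per locus has K main effects per locus and K² epistatic effects per pair of loci. The Python sketch below (hypothetical function name) tallies both. For a backcross (K = 1) it reduces to L(L + 1)/2.

```python
def n_possible_effects(L: int, K: int) -> int:
    """Count main plus pairwise epistatic effects for L putative QTL.

    K = 1 for a backcross, K = 2 for an intercross; each pair of loci
    contributes K**2 epistatic effects.
    """
    main = K * L
    epistatic = K**2 * L * (L - 1) // 2
    return main + epistatic

print(n_possible_effects(20, K=1))  # 210, i.e. L(L + 1)/2
print(n_possible_effects(20, K=2))  # 800
```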

To see why many effects may be negligible, categorize putative QTL into three types: (1) QTL with main effects (main-effect QTL), (2) QTL with weak main effects but epistatic effects with main-effect QTL, and (3) QTL with weak main effects but epistatic effects among themselves. Let the numbers of these three types of QTL be L1, L2, and L3, respectively (L = L1 + L2 + L3). Ignoring the main effects of type (2) and type (3) QTL, the number of possible effects reduces to L1(L1 + 1)/2 + L1L2 + L3(L3 − 1)/2. This gives 115 if L1 = 10, L2 = 5, and L3 = 5, compared with 210 for all L = 20 loci. These three types of QTL can be detected either simultaneously or conditionally with a three-stage approach.

A number of extensions of the basic model are possible within this framework. The simplicity of the MCMC search enhances the overall flexibility of this approach and enables one to consider analysis in more complex settings. Extensions to binary or ordinal traits, inclusion of fixed- or random-effect covariates, and gene-by-environment interactions are feasible. In principle, the composite space method can be directly applied to identify higher-order interactions. However, the dramatic increase in the size of the model space is likely to limit the performance of the MCMC algorithm. We regard the methods proposed here as a step toward more efficient and comprehensive analysis of complex genetic architectures. There are many opportunities to extend and improve upon this general approach.

TABLE 2

Summary statistics for epistatic analysis: posterior inclusion probabilities of epistatic effects, estimated epistatic effects, and heritability explained by each epistatic effect

Epistatic pair (HPD regions, cM)   Posterior probability (%)  Epistatic effect  Heritability
Chr 1 [26, 54] × Chr 18 [43, 71]   81.9                       0.936             0.056
Chr 2 [72, 85] × Chr 13 [20, 42]   59.5                       −0.575            0.025
Chr 15 [1, 29] × Chr 19 [15, 45]   43.2                       0.606             0.024
Chr 2 [72, 85] × Chr 14 [12, 41]   18.4                       0.567             0.022
Chr 7 [50, 75] × Chr 19 [15, 45]   17.2                       0.552             0.021
Chr 13 [20, 42] × Chr 15 [1, 29]   13.6                       −0.501            0.018

Chr, chromosome.

N.Y. and D.B.A. were supported by National Institutes of Health (NIH) grants NIH RO1GM069430, NIH RO1ES09912, NIH RO1 DK056366, NIH P30DK056336, and an obesity-related pilot/feasibility studies grant at the University of Alabama (Birmingham) (528176). G.A.C. was supported by NIH GM070683. B.S.Y. was supported by NIH/National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) 5803701, NIH/NIDDK 66369-01, American Diabetes Association 7-03-IG-01, and U.S. Department of Agriculture Cooperative State Research, Education and Extension Service grants to the University of Wisconsin (Madison) (C.J. and B.S.Y.). This research is a contribution of the University of Nebraska Agricultural Research Division (Lincoln, NE; journal series no. 14858) and the North Carolina Agricultural Research Service and was supported in part by funds provided through the Hatch Act.

LITERATURE CITED

Ball, R. D., 2001 Bayesian methods for quantitative trait loci mapping based on model selection: approximate analysis using the Bayesian information criterion. Genetics 159: 1351–1364.

Bogdan, M., J. K. Ghosh and R. W. Doerge, 2004 Modifying the Schwarz Bayesian information criterion to locate multiple interacting quantitative trait loci. Genetics 167: 989–999.

Broman, K. W., and T. P. Speed, 2002 A model selection approach for identification of quantitative trait loci in experimental crosses. J. R. Stat. Soc. B 64: 641–656.

Carlborg, O., and C. Haley, 2004 Epistasis: too often neglected in complex trait studies? Nat. Rev. Genet. 5: 618–625.

Carlborg, O., L. Andersson and B. Kinghorn, 2000 The use of a genetic algorithm for simultaneous mapping of multiple interacting quantitative trait loci. Genetics 155: 2003–2010.

Carlin, B. P., and S. Chib, 1995 Bayesian model choice via Markov chain Monte Carlo. J. Am. Stat. Assoc. 88: 881–889.

Cheverud, J. M., 2000 Detecting epistasis among quantitative trait loci, pp. 58–81 in Epistasis and the Evolutionary Process, edited by M. Wade, B. Brodie and J. Wolf. Oxford University Press, New York.

Chipman, H., E. I. George and R. E. McCulloch, 2001 The practical implementation of Bayesian model selection, pp. 65–116 in Model Selection, edited by P. Lahiri. Institute of Mathematical Statistics, Beachwood, OH.

Fernandez, C., E. Ley and M. F. J. Steel, 2001 Benchmark priors for Bayesian model averaging. J. Econom. 100: 381–427.

Gaffney, P. J., 2001 An efficient reversible jump Markov chain Monte Carlo approach to detect multiple loci and their effects in inbred crosses. Ph.D. Dissertation, Department of Statistics, University of Wisconsin, Madison, WI.

Gelman, A., J. Carlin, H. Stern and D. Rubin, 2004 Bayesian Data Analysis. Chapman & Hall, London.

George, E. I., 2000 The variable selection problem. J. Am. Stat. Assoc. 95: 1304–1308.

George, E. I., and D. P. Foster, 2000 Calibration and empirical Bayes variable selection. Biometrika 87: 731–747.

George, E. I., and R. E. McCulloch, 1997 Approaches for Bayesian variable selection. Stat. Sin. 7: 339–373.

Godsill, S. J., 2001 On the relationship between MCMC model uncertainty methods. J. Comput. Graph. Stat. 10: 230–248.

Green, P. J., 1995 Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711–732.

Heath, S. C., 1997 Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am. J. Hum. Genet. 61: 748–760.

Hoeschele, I., 2001 Mapping quantitative trait loci in outbred pedigrees, pp. 599–644 in Handbook of Statistical Genetics, edited by D. J. Balding, M. Bishop and C. Cannings. Wiley, New York.

Jansen, R. C., 2003 Studying complex biological systems using multifactorial perturbation. Nat. Rev. Genet. 4: 145–151.

Jansen, R. C., and P. Stam, 1994 High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136: 1447–1455.

Jiang, C., and Z-B. Zeng, 1997 Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101: 47–58.

Kadane, J. B., and N. A. Lazar, 2004 Methods and criteria for model selection. J. Am. Stat. Assoc. 99: 279–290.

Kao, C. H., and Z-B. Zeng, 2002 Modeling epistasis of quantitative trait loci using Cockerham's model. Genetics 160: 1243–1261.

Kao, C. H., Z-B. Zeng and R. D. Teasdale, 1999 Multiple interval mapping for quantitative trait loci. Genetics 152: 1203–1216.

Kass, R. E., and A. E. Raftery, 1995 Bayes factors. J. Am. Stat. Assoc. 90: 773–795.

Kohn, R., M. Smith and D. Chen, 2001 Nonparametric regression using linear combinations of basis functions. Stat. Comput. 11: 313–322.

Kuo, L., and B. Mallick, 1998 Variable selection for regression models. Sankhya Ser. B 60: 65–81.

Lander, E. S., and D. Botstein, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.

Leamy, L. J., D. Pomp, E. J. Eisen and J. M. Cheverud, 2002 Pleiotropy of quantitative trait loci for organ weights and limb bone lengths in mice. Physiol. Genomics 10: 21–29.

Lynch, M., and B. Walsh, 1998 Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.

Narita, A., and Y. Sasaki, 2004 Detection of multiple QTL with epistatic effects under a mixed inheritance model in an outbred population. Genet. Sel. Evol. 36: 415–433.

Raftery, A. E., D. Madigan and J. A. Hoeting, 1997 Bayesian model averaging for linear regression models. J. Am. Stat. Assoc. 92: 179–191.

Reifsnyder, P. R., G. Churchill and E. H. Leiter, 2000 Maternal environment and genotype interact to establish diabesity in mice. Genome Res. 10: 1568–1578.

Satagopan, J. M., and B. S. Yandell, 1996 Estimating the number of quantitative trait loci via Bayesian model determination. Special Contributed Paper Session on Genetic Analysis of Quantitative Traits and Complex Disease, Biometric Section, Joint Statistical Meeting, Chicago.

Satagopan, J. M., B. S. Yandell, M. A. Newton and T. C. Osborn, 1996 A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo. Genetics 144: 805–816.

Sen, S., and G. Churchill, 2001 A statistical framework for quantitative trait mapping. Genetics 159: 371–387.

Sillanpää, M. J., and E. Arjas, 1998 Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148: 1373–1388.

Sillanpää, M. J., and J. Corander, 2002 Model choice in gene mapping: what and why. Trends Genet. 18: 301–307.

Stephens, D. A., and R. D. Fisch, 1998 Bayesian analysis of quantitative trait locus data using reversible jump Markov chain Monte Carlo. Biometrics 54: 1334–1347.

Thomas, D. C., S. Richardson, J. Gauderman and J. Pitkaniemi, 1997 A Bayesian approach to multipoint mapping in nuclear families. Genet. Epidemiol. 14: 903–908.

Uimari, P., and I. Hoeschele, 1997 Mapping linked quantitative trait loci using Bayesian method analysis and Markov chain Monte Carlo algorithms. Genetics 146: 735–743.

Yi, N., 2004 A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci. Genetics 167: 967–975.

Yi, N., and S. Xu, 2000 Bayesian mapping of quantitative trait loci for complex binary traits. Genetics 155: 1391–1403.

Yi, N., and S. Xu, 2002 Mapping quantitative trait loci with epistatic effects. Genet. Res. 79: 185–198.

Yi, N., D. B. Allison and S. Xu, 2003 Bayesian model choice and search strategies for mapping multiple epistatic quantitative trait loci. Genetics 165: 867–883.

Yi, N., A. Diament, S. Chiu, J. Fisler and C. Warden, 2004a Characterization of epistasis influencing complex spontaneous obesity in the BSB model. Genetics 167: 399–409.

Yi, N., A. Diament, S. Chiu, J. Fisler and C. Warden, 2004b Epistatic interaction between chromosomes 7 and 3 influences hepatic lipase activity in BSB mice. J. Lipid Res. 45: 2063–2070.

Zeng, Z-B., 1994 Precision mapping of quantitative trait loci. Genetics 136: 1457–1468.

Zeng, Z-B., C. Kao and C. J. Basten, 2000 Estimating the genetic architecture of quantitative traits. Genet. Res. 74: 279–289.

Communicating editor: J. B. Walsh

APPENDIX A: THE MODIFIED COCKERHAM EPISTATIC MODEL FOR BACKCROSS AND INTERCROSS POPULATIONS

For a mapping population with K + 1 genotypes per locus, there are K marginal-effect degrees of freedom (d.f.) for each locus and $K^2$ interaction-effect d.f. for any two loci. The design matrix X for model (1) has KL main-effect coefficients, $x_{iqk}$, and $K^2L(L-1)/2$ epistatic-effect coefficients, $x_{iqq'k}$. These are obtained from the genotypes at the corresponding loci by using a particular epistatic model. The main and epistatic effects are denoted by $\beta_{qk}$ and $\beta_{qq'k}$, respectively.

For a backcross population, there are two segregating genotypes at locus q, denoted $b_qb_q$ and $B_qb_q$. For the commonly used Cockerham epistatic model (Kao and Zeng 2002), the coefficients are defined as

$$x_{iq1} = z_{iq} - 0.5 \quad\text{and}\quad x_{iqq'1} = x_{iq1}\,x_{iq'1},$$

where $z_{iq}$ denotes the number of alleles $B_q$. For an intercross derived from two inbred lines, there are three segregating genotypes at locus q, denoted $b_qb_q$, $B_qb_q$, and $B_qB_q$. For the commonly used Cockerham epistatic model, the coefficients are defined as

$$x_{iq1} = z_{iq} - 1, \qquad x_{iq2} = (1 + x_{iq1})(1 - x_{iq1}) - 0.5,$$

$$x_{iqq'k} = \begin{cases} x_{iq1}\,x_{iq'1}, & k = 1\\ x_{iq1}\,x_{iq'2}, & k = 2\\ x_{iq2}\,x_{iq'1}, & k = 3\\ x_{iq2}\,x_{iq'2}, & k = 4. \end{cases}$$

For the Cockerham model, $\beta_{q1}$ and $\beta_{q2}$ correspond to the additive and dominance effects of QTL q, respectively. $\beta_{qq'1}$, $\beta_{qq'2}$, $\beta_{qq'3}$, and $\beta_{qq'4}$ are the epistatic effects between loci q and q', called the additive-by-additive, additive-by-dominance, dominance-by-additive, and dominance-by-dominance effects, respectively. The Cockerham model keeps the same interpretation of main effects with or without epistatic effects. However, main effects should always be interpreted with caution in the presence of epistatic interactions.

APPENDIX B: THE PRIOR EXPECTED NUMBER OF QTL INCLUDED IN THE MODEL

We define $\eta_q$ as the binary variable indicating inclusion ($\eta_q = 1$) or exclusion ($\eta_q = 0$) of QTL q. QTL q is included in the model if and only if at least one of the genetic effects associated with QTL q is included. Therefore, we have

$$\eta_q = 1 - \prod_{k=1}^{K}(1 - \gamma_{qk}) \prod_{k=1}^{K^2}\Big[\prod_{q'>q}^{L}(1 - \gamma_{qq'k}) \prod_{q'<q}(1 - \gamma_{q'qk})\Big],$$

where K is the number of possible main effects for each locus, $K^2$ is the number of possible epistatic effects for any two loci, and $\gamma_{qk}$ and $\gamma_{qq'k}$ are the indicators of main and epistatic effects, respectively. The actual number of QTL then equals $\sum_{q=1}^{L}\eta_q$. The prior expected number of all QTL is the expectation of the actual number of QTL. Because the indicators are a priori independent, with $\Pr(\gamma_{qk} = 1) = w_m$ and $\Pr(\gamma_{qq'k} = 1) = w_e$, it can be derived as

$$l_0 = \sum_{q=1}^{L}\Pr(\eta_q = 1) = L - \sum_{q=1}^{L}\Big\{\prod_{k=1}^{K}\Pr(\gamma_{qk} = 0)\prod_{k=1}^{K^2}\Big[\prod_{q'>q}\Pr(\gamma_{qq'k} = 0)\prod_{q'<q}\Pr(\gamma_{q'qk} = 0)\Big]\Big\} = L\big[1 - (1 - w_m)^{K}(1 - w_e)^{K^2(L-1)}\big].$$

If we consider only main effects, then QTL q is included in the model when at least one of its main effects is included. The binary indicator of QTL q then becomes $\eta_q = 1 - \prod_{k=1}^{K}(1 - \gamma_{qk})$. Therefore, the prior expected number of main-effect QTL is

$$l_m = L - \sum_{q=1}^{L}\prod_{k=1}^{K}\Pr(\gamma_{qk} = 0) = L\big[1 - (1 - w_m)^{K}\big].$$
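Assume the indicators are a priori independent, with each main-effect indicator equal to 1 with probability wm and each epistatic indicator equal to 1 with probability we; this is the assumption behind the closed form for l0 in Appendix B. Under that assumption, the closed form can be checked by simulation. The Python sketch below draws the indicators directly and counts the loci that have at least one included effect. The function names, seed, and parameter values are hypothetical.

```python
import random

def expected_qtl_formula(L: int, K: int, w_m: float, w_e: float) -> float:
    """Closed form: L[1 - (1 - w_m)^K (1 - w_e)^(K^2 (L - 1))]."""
    return L * (1 - (1 - w_m) ** K * (1 - w_e) ** (K * K * (L - 1)))

def expected_qtl_simulated(L: int, K: int, w_m: float, w_e: float,
                           n_draws: int = 5000, seed: int = 1) -> float:
    """Monte Carlo estimate of E[sum_q eta_q] with independent indicators."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_draws):
        included = [False] * L
        for q in range(L):
            # locus q is in if any of its K main-effect indicators is 1
            if any(rng.random() < w_m for _ in range(K)):
                included[q] = True
        for q in range(L):
            for q2 in range(q + 1, L):
                # both loci are in if any of the pair's K^2 epistatic indicators is 1
                if any(rng.random() < w_e for _ in range(K * K)):
                    included[q] = included[q2] = True
        total += sum(included)
    return total / n_draws

# Hypothetical intercross settings (K = 2).
print(expected_qtl_formula(10, 2, 0.1, 0.01))
print(expected_qtl_simulated(10, 2, 0.1, 0.01))
```

The two printed values should agree to within Monte Carlo error, which shrinks as n_draws grows.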