# Bayes factor between Student t and Gaussian mixed models within an animal breeding context.

**ABSTRACT** The implementation of Student t mixed models in animal breeding has been suggested as a useful statistical tool to effectively mute the impact of preferential treatment or other sources of outliers in field data. Nevertheless, these additional sources of variation are undeclared, and we do not know whether a Student t mixed model is required or whether a standard, less parameterized, Gaussian mixed model would be sufficient to serve the intended purpose. Within this context, our aim was to develop the Bayes factor between two nested models that differ only in a bounded variable, in order to easily compare a Student t and a Gaussian mixed model. It is important to highlight that the Student t density converges to a Gaussian density when the degrees of freedom tend to infinity. The two models can then be viewed as nested models that differ in terms of degrees of freedom. The Bayes factor can be easily calculated from the output of a Markov chain Monte Carlo sampling of the complex model (Student t mixed model). The performance of this Bayes factor was tested under simulation and on a real dataset, using the deviance information criterion (DIC) as the standard reference criterion. The two statistical tools showed similar trends along the parameter space, although the Bayes factor appeared to be more conservative. There was considerable evidence favoring the Student t mixed model for data sets simulated under Student t processes with limited degrees of freedom, and moderate advantages associated with using the Gaussian mixed model when working with datasets simulated with 50 or more degrees of freedom. For the analysis of real data (weight of Pietrain pigs at six months), both the Bayes factor and DIC slightly favored the Student t mixed model, with a low incidence of outlier individuals in this population.


Original article


Joaquim CASELLAS¹*, Noelia IBÁÑEZ-ESCRICHE¹, Luis Alberto GARCÍA-CORTÉS², Luis VARONA¹

¹Genètica i Millora Animal, IRTA-Lleida, 25198 Lleida, Spain

²Departamento de Mejora Genética Animal, SGIT-INIA, Carretera de la Coruña, km. 7, 28040 Madrid, Spain

(Received 2 April 2007; accepted 19 December 2007)


Bayes factor / Gaussian distribution / mixed model / Student t distribution / preferential treatment

*Corresponding author: Joaquim.Casellas@irta.es

Genet. Sel. Evol. 40 (2008) 395–413

© INRA, EDP Sciences, 2008

DOI: 10.1051/gse:2008007

Available online at: www.gse-journal.org

Article published by EDP Sciences


1. INTRODUCTION

Genetic evaluations in animal breeding are generally performed using the mixed effects models pioneered by Henderson [9]. These models usually assume Gaussian distributions for most random effects, including the residuals, and, in the absence of contradictory evidence, it is practical to assume normality on the basis of both mathematical convenience and biological plausibility. Nevertheless, departures from normality are common in animal breeding, e.g. when more valuable animals receive preferential treatment [14,15]. Preferential treatment can be defined as any management practice that increases or decreases production and is applied to one or several animals, but not to their contemporaries [14]. Among others, these practices may include separate housing, better (or worse) or more (or less) feed, or better (or worse) sanitary care. Obviously, which animals or productive records receive preferential treatment is not known with any degree of certainty in real populations, and this loss of information could imply substantial bias in genetic evaluations [14,15]. Other potential causes of outliers or abnormal phenotypic records are measurement errors, sickness, short-term changes in herd environment and mismanagement of data [11].

We generally lack sufficient a priori information on the presence or absence of preferential treatment in livestock data sets. It has recently been demonstrated that specifying heavy-tailed residual distributions (such as the Student t distribution) instead of the usual Gaussian process in best linear unbiased prediction (BLUP) models may effectively mute the impact of residual outliers, particularly in situations where preferential treatment of some breeding stock may be anticipated [16,21]. As a result, accurate statistical tests are required to weigh the mathematical simplicity of the Gaussian mixed model against the improved goodness of fit (under preferential treatment or other unknown sources of outliers) of the Student t mixed model.

General statistical tools such as the deviance information criterion (DIC) [20] or other approaches to Bayes factors [6] have been used to compare Gaussian and Student t mixed models. However, they imply high computational demands, because both the Gaussian and the Student t mixed models must be analysed to calculate the corresponding comparison statistic. Within this context, the Bayes factor developed by García-Cortés et al. [5] and Varona et al. [23] in the animal breeding context implies a substantial simplification, because it compares two models that differ only in terms of a single bounded variable, and therefore only the analysis of the complex model is required.

The Student t distribution converges to the Gaussian distribution when the number of degrees of freedom tends to infinity. This property can be exploited to adapt the Bayes factor of Varona et al. [23], generating a useful statistical tool for the analysis of field data, especially for genetic evaluation purposes. In this paper, we focused on developing this Bayes factor to compare Gaussian and Student t processes, and we tested its performance on both simulated and real data sets, using DIC as the standard reference criterion.
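As a quick numerical illustration of this convergence, the Python sketch below (not the authors' code) evaluates the Student t density of a single residual parameterized through $\delta = 2/\nu$, the reparameterization used in Section 2.1, and reports the largest gap to the normal density over a grid of residual values. The function names and the grid are illustrative choices; $\delta = 0$ is treated as the Gaussian limit.

```python
import math


def student_t_logpdf(e, sigma2, delta):
    """Log-density of residual e under a Student t with nu = 2/delta and scale sigma2.

    delta == 0 is treated as the Gaussian limit (nu -> infinity).
    """
    if delta == 0.0:
        return -0.5 * math.log(2.0 * math.pi * sigma2) - e * e / (2.0 * sigma2)
    nu = 2.0 / delta
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi * sigma2)
            - (nu + 1.0) / 2.0 * math.log1p(e * e / (nu * sigma2)))


def max_density_gap(delta, sigma2=1.0):
    """Largest absolute difference between the t and normal densities on [-6, 6]."""
    grid = [i / 100.0 for i in range(-600, 601)]
    return max(abs(math.exp(student_t_logpdf(e, sigma2, delta))
                   - math.exp(student_t_logpdf(e, sigma2, 0.0)))
               for e in grid)


for delta in (0.4, 0.04, 0.004):  # nu = 5, 50, 500
    print(f"delta={delta:<6} nu={2 / delta:>5.0f} max|t - normal|={max_density_gap(delta):.5f}")
```

The gap shrinks roughly in proportion to $\delta$, which is why the Gaussian model can be treated as the $\delta = 0$ boundary of the Student t model.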

2. MATERIALS AND METHODS

2.1. Statistical background for Student t mixed models

Take as a starting point a standard linear model [9] such as

$$\mathbf{y}=\mathbf{X}\mathbf{b}+\mathbf{W}\mathbf{p}+\mathbf{Z}\mathbf{a}+\mathbf{e}, \tag{1}$$

where $\mathbf{y}$ is the vector of $n$ phenotypic records, $\mathbf{X}$, $\mathbf{W}$ and $\mathbf{Z}$ are the incidence matrices of systematic ($\mathbf{b}$), permanent environmental ($\mathbf{p}$) and additive genetic ($\mathbf{a}$) effects, respectively, and $\mathbf{e}$ is the vector of residuals. The probability density of the phenotypic data can be modeled under a multivariate Student t distribution with $\nu$ degrees of freedom ($\nu \geq 2$):

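$$p(\mathbf{y}\mid\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_e,\nu)=\prod_{i=1}^{n}\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{\nu}{2}\right)}\left(\nu\sigma^2_e\right)^{-\frac{1}{2}}\left[1+\frac{\left(y_i-\mathbf{x}_i\mathbf{b}-\mathbf{w}_i\mathbf{p}-\mathbf{z}_i\mathbf{a}\right)^2}{\nu\sigma^2_e}\right]^{-\frac{\nu+1}{2}}, \tag{2}$$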

where $\mathbf{x}_i$, $\mathbf{w}_i$ and $\mathbf{z}_i$ are the $i$th rows of $\mathbf{X}$, $\mathbf{W}$ and $\mathbf{Z}$, respectively, $y_i$ is the $i$th scalar element of $\mathbf{y}$, $\sigma^2_e$ is the residual variance and $\Gamma(\cdot)$ is the standard gamma function with the argument defined within parentheses. For small values of $\nu$, the Student t distribution shows a Gaussian-like pattern with increased probability in the tails, whereas it converges to a Gaussian distribution when $\nu$ tends to infinity [16]. For mathematical convenience, we can define $\delta = 2/\nu$ ($0 \leq \delta \leq 1$); the conditional density (2) then reduces to a normal density when $\delta = 0$ (that is, when $\nu$ tends to infinity).

Following Strandén and Gianola [21], the previous model can be extended to an alternative parameterization if the data vector is partitioned into $m$ 'clusters' typified by a common factor (e.g. animal, maternal environment, herd-year-season at birth), with the previous linear model defined as:

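$$\begin{bmatrix}\mathbf{y}_1\\ \vdots\\ \mathbf{y}_m\end{bmatrix}=\begin{bmatrix}\mathbf{X}_1\\ \vdots\\ \mathbf{X}_m\end{bmatrix}\mathbf{b}+\begin{bmatrix}\mathbf{W}_1\\ \vdots\\ \mathbf{W}_m\end{bmatrix}\mathbf{p}+\begin{bmatrix}\mathbf{Z}_1\\ \vdots\\ \mathbf{Z}_m\end{bmatrix}\mathbf{a}+\begin{bmatrix}\mathbf{e}_1\\ \vdots\\ \mathbf{e}_m\end{bmatrix}, \tag{3}$$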

with $\mathbf{X}_j$, $\mathbf{W}_j$ and $\mathbf{Z}_j$ being the appropriate incidence matrices of the records in the $j$th cluster ($\mathbf{y}_j$), and $\mathbf{e}_j$ being the corresponding vector of residuals. This reparameterization allows for an alternative description of the conditional density of $\mathbf{y}$ [21]:

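$$p(\mathbf{y}\mid\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_e,\delta)=\prod_{j=1}^{m}\int_{0}^{\infty}p(\mathbf{y}_j\mid\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_e,s^2_j)\,p(s^2_j\mid\delta)\,ds^2_j, \tag{4}$$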

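where $p(\mathbf{y}_j\mid\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_e,s^2_j)$ is a multivariate normal distribution weighted by $s^2_j$,

$$p(\mathbf{y}_j\mid\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_e,s^2_j)\sim N\left(\mathbf{X}_j\mathbf{b}+\mathbf{W}_j\mathbf{p}+\mathbf{Z}_j\mathbf{a},\ \mathbf{I}_{n_j}\frac{\sigma^2_e}{s^2_j}\right), \tag{5}$$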

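with $\mathbf{I}_{n_j}$ being an identity matrix of dimensions $n_j \times n_j$, and the conditional distribution of the mixing parameter ($s^2_j$) being a Gamma density

$$p(s^2_j\mid\delta)=\frac{(1/\delta)^{1/\delta}}{\Gamma(1/\delta)}\,(s^2_j)^{\frac{1}{\delta}-1}\exp\left(-\frac{s^2_j}{\delta}\right), \tag{6}$$

that is, a Gamma density with shape and rate both equal to $1/\delta = \nu/2$. Its expectation is 1 and its variance is $\delta$, so that it collapses onto 1 when $\delta = 0$ [4,21].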
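The Gaussian × Gamma representation can be checked by simulation. The sketch below assumes a mixing density with shape and rate both equal to $1/\delta = \nu/2$ and draws residuals as $e \mid s^2 \sim N(0, \sigma^2_e/s^2)$; if the mixture reproduces a Student t with $\nu$ degrees of freedom, the sample variance should approach $\sigma^2_e\,\nu/(\nu-2)$. The seed, the sample size and the helper name `draw_residual` are illustrative assumptions.

```python
import random
import statistics


def draw_residual(rng, sigma2, delta):
    """Draw one residual from the Gaussian x Gamma mixture.

    s2 ~ Gamma(shape = 1/delta, rate = 1/delta), then e | s2 ~ N(0, sigma2 / s2).
    random.gammavariate takes (shape, scale), so scale = 1/rate = delta.
    """
    s2 = rng.gammavariate(1.0 / delta, delta)
    return rng.gauss(0.0, (sigma2 / s2) ** 0.5)


rng = random.Random(2008)
delta, sigma2 = 0.2, 1.0          # nu = 2 / delta = 10
nu = 2.0 / delta
sample = [draw_residual(rng, sigma2, delta) for _ in range(200_000)]
print("empirical variance:", round(statistics.pvariance(sample), 3))
print("Student t variance, sigma2 * nu / (nu - 2):", nu / (nu - 2.0) * sigma2)
```

Because $E(1/s^2) = \nu/(\nu-2)$ under this Gamma density, the mixture variance matches that of the Student t, and each cluster weight $s^2_j$ can be read as a data-driven down-weighting of outlying clusters.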

2.2. Bayes factor between Student t and Gaussian linear models

The Bayes factor developed by Verdinelli and Wasserman [25], and applied to the animal breeding context by García-Cortés et al. [5] and Varona et al. [23], contrasts nested linear models that differ only in terms of a bounded variable. We adapted this methodology to compare a Student t mixed linear model with its simplification to the Gaussian mixed linear model when $\nu$ tends to infinity or, for mathematical convenience, when $\delta = 2/\nu = 0$. Within this context, the posterior distribution of all the parameters of a Student t mixed model can be stated in two ways, either with a pure Student t Bayesian likelihood (Model T1):

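$$p_{T1}(\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_p,\sigma^2_a,\sigma^2_e,\delta\mid\mathbf{y})\propto p_{T1}(\mathbf{y}\mid\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_e,\delta)\,p_T(\delta)\,p_T(\mathbf{b})\,p_T(\mathbf{p}\mid\sigma^2_p)\,p_T(\sigma^2_p)\,p_T(\mathbf{a}\mid\mathbf{A},\sigma^2_a)\,p_T(\sigma^2_a)\,p_T(\sigma^2_e), \tag{7}$$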

or with a Gaussian × Gamma Bayesian likelihood (Model T2):

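$$p_{T2}(\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_p,\sigma^2_a,\sigma^2_e,\delta,s^2_{j\in(1,m)}\mid\mathbf{y})\propto\prod_{j=1}^{m}\left[p_{T2}(\mathbf{y}_j\mid\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_e,s^2_j)\,p_{T2}(s^2_j\mid\delta)\right]p_T(\delta)\,p_T(\mathbf{b})\,p_T(\mathbf{p}\mid\sigma^2_p)\,p_T(\sigma^2_p)\,p_T(\mathbf{a}\mid\mathbf{A},\sigma^2_a)\,p_T(\sigma^2_a)\,p_T(\sigma^2_e), \tag{8}$$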

where $\mathbf{A}$ is the numerator relationship matrix between individuals. Following in part Varona et al. [23], the prior distribution for the bounded variable ($\delta$) was

$$p_T(\delta)=\begin{cases}1 & \text{if } \delta\in[0,1],\\ 0 & \text{otherwise.}\end{cases} \tag{9}$$

The permanent environmental and additive genetic effects were assumed to be drawn from multivariate normal distributions,

$$p_T(\mathbf{p}\mid\sigma^2_p)\sim N(\mathbf{0},\mathbf{I}_p\sigma^2_p), \tag{10}$$

$$p_T(\mathbf{a}\mid\mathbf{A},\sigma^2_a)\sim N(\mathbf{0},\mathbf{A}\sigma^2_a), \tag{11}$$

with $\mathbf{I}_p$ being an identity matrix with dimensions equal to the number of elements of $\mathbf{p}$. The prior distributions for the remaining parameters of the model were defined as:

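$$p_T(\mathbf{b})=\begin{cases}k_1 & \text{if } b_l\in\left[-\dfrac{1}{2k_1},\dfrac{1}{2k_1}\right],\\[4pt] 0 & \text{otherwise,}\end{cases}\quad\text{for each level } l \text{ of } \mathbf{b}; \tag{12}$$

$$p_T(\sigma^2_p)=\begin{cases}k_2 & \text{if } \sigma^2_p\in\left[0,\dfrac{1}{k_2}\right],\\[4pt] 0 & \text{otherwise;}\end{cases} \tag{13}$$

$$p_T(\sigma^2_a)=\begin{cases}k_3 & \text{if } \sigma^2_a\in\left[0,\dfrac{1}{k_3}\right],\\[4pt] 0 & \text{otherwise;}\end{cases} \tag{14}$$

$$p_T(\sigma^2_e)=\begin{cases}k_4 & \text{if } \sigma^2_e\in\left[0,\dfrac{1}{k_4}\right],\\[4pt] 0 & \text{otherwise;}\end{cases} \tag{15}$$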

where $k_1$, $k_2$, $k_3$ and $k_4$ are four constants small enough to ensure a flat distribution over the parameter space [23].

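The joint posterior distribution of all the parameters in the alternative Gaussian mixed model (Model G) was proportional to

$$p_G(\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_p,\sigma^2_a,\sigma^2_e\mid\mathbf{y})\propto p_G(\mathbf{y}\mid\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_e)\,p_G(\mathbf{b})\,p_G(\mathbf{p}\mid\sigma^2_p)\,p_G(\sigma^2_p)\,p_G(\mathbf{a}\mid\mathbf{A},\sigma^2_a)\,p_G(\sigma^2_a)\,p_G(\sigma^2_e), \tag{16}$$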

where the Bayesian likelihood was defined as multivariate normal,

$$p_G(\mathbf{y}\mid\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_e)\sim N\left(\mathbf{X}\mathbf{b}+\mathbf{W}\mathbf{p}+\mathbf{Z}\mathbf{a},\ \mathbf{I}\sigma^2_e\right), \tag{17}$$

and the prior distributions $p_G(\mathbf{b})$, $p_G(\mathbf{p}\mid\sigma^2_p)$, $p_G(\sigma^2_p)$, $p_G(\mathbf{a}\mid\mathbf{A},\sigma^2_a)$, $p_G(\sigma^2_a)$ and $p_G(\sigma^2_e)$ were identical to the prior distributions of Model T1 (or Model T2).

The Bayes factor between Model T1 (or Model T2) and Model G ($BF_{T/G}$) can be easily calculated from the Markov chain Monte Carlo sampler output of the complex model (Student t mixed model). Under Model T1, the conditional posterior distributions of the parameters do not reduce to well-known distributions, and generic sampling processes such as Metropolis-Hastings [8] are required. Sampling is simpler under the alternative Model T2: all its parameters can be sampled with a Gibbs sampler [7], with the exception of $\delta$, which requires a Metropolis-Hastings step [8]. Following García-Cortés et al. [5] and Varona et al. [23], the posterior density $p_T(\delta = 0\mid\mathbf{y})$ suffices to obtain $BF_{T/G}$,

$$BF_{T/G}=\frac{p_T(\delta=0)}{p_T(\delta=0\mid\mathbf{y})}=\frac{1}{p_T(\delta=0\mid\mathbf{y})}, \tag{18}$$

because $p_T(\delta = 0) = 1$ (see equation (9)). Alternatively,

$$BF_{G/T}=\frac{p_T(\delta=0\mid\mathbf{y})}{p_T(\delta=0)}=p_T(\delta=0\mid\mathbf{y}). \tag{19}$$
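A minimal sketch of the two sampling steps described for Model T2 (a Gibbs draw for each cluster weight $s^2_j$ and a Metropolis-Hastings step for $\delta$) is given below. It assumes a mixing density with shape and rate both equal to $1/\delta$, a uniform prior for $\delta$ on $(0, 1]$ and a random-walk proposal with a hypothetical step size. Residuals are held fixed for illustration, whereas in a complete sampler they would be recomputed after updating $\mathbf{b}$, $\mathbf{p}$ and $\mathbf{a}$; all function names are illustrative.

```python
import math
import random


def gibbs_update_weights(rng, residuals_by_cluster, sigma2_e, delta):
    """Draw each s2_j from Gamma(1/delta + n_j/2, rate = 1/delta + e_j'e_j / (2 sigma2_e))."""
    alpha = 1.0 / delta
    weights = []
    for e_j in residuals_by_cluster:
        shape = alpha + len(e_j) / 2.0
        rate = alpha + sum(x * x for x in e_j) / (2.0 * sigma2_e)
        weights.append(rng.gammavariate(shape, 1.0 / rate))  # gammavariate uses scale = 1/rate
    return weights


def log_full_conditional_delta(delta, weights):
    """log p(delta | s2) up to a constant: uniform prior on (0, 1] times Gamma(1/delta, 1/delta) terms."""
    if not 0.0 < delta <= 1.0:
        return -math.inf
    alpha = 1.0 / delta
    return sum(alpha * math.log(alpha) - math.lgamma(alpha)
               + (alpha - 1.0) * math.log(w) - alpha * w for w in weights)


def metropolis_step_delta(rng, delta, weights, step=0.01):
    """Random-walk Metropolis-Hastings update; the symmetric proposal density cancels."""
    proposal = delta + rng.gauss(0.0, step)
    log_ratio = (log_full_conditional_delta(proposal, weights)
                 - log_full_conditional_delta(delta, weights))
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal
    return delta  # proposals outside (0, 1] have log_ratio = -inf and are always rejected


# Toy run with fixed, hypothetical residuals (one record per cluster).
rng = random.Random(1)
residuals = [[rng.gauss(0.0, 1.0)] for _ in range(300)]
delta = 0.5
for _ in range(1000):
    weights = gibbs_update_weights(rng, residuals, 1.0, delta)
    delta = metropolis_step_delta(rng, delta, weights)
print("last delta draw:", round(delta, 4))
```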

$BF_{T/G}$ can therefore be obtained by estimating $p_T(\delta = 0\mid\mathbf{y})$ as the average, over MCMC cycles, of the full conditional density of $\delta$ evaluated at $\delta = 0$ (the Rao-Blackwell argument [26]). At this point, computational simplicity is gained with Model T1, whose likelihood at $\delta = 0$ is simply a normal density, whereas Model T2 tends to computationally unquantifiable extreme probabilities when $\delta$ is close to zero. A $BF_{T/G}$ greater than 1 indicates that the Student t mixed model is more suitable, whereas a $BF_{T/G}$ smaller than 1 indicates that the Gaussian mixed model is more suitable.
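The Rao-Blackwell estimate under Model T1 can be sketched as follows. For each stored MCMC cycle, the full conditional of $\delta$ (uniform prior on [0, 1] times the Student t likelihood of equation (2), given that cycle's residuals and $\sigma^2_e$) is normalized numerically on a grid, its value at $\delta = 0$ is recorded, and the average over cycles estimates $p_T(\delta = 0\mid\mathbf{y})$. The grid quadrature, the log-scale bookkeeping and the function names are assumptions of this sketch; it returns $\log_{10}(BF_{T/G})$ to avoid overflow when Bayes factors are very large.

```python
import math


def t_loglik(residuals, sigma2, delta):
    """Sum of Student t log-densities with nu = 2/delta; delta == 0 is the Gaussian limit."""
    if delta == 0.0:
        return sum(-0.5 * math.log(2.0 * math.pi * sigma2) - e * e / (2.0 * sigma2)
                   for e in residuals)
    nu = 2.0 / delta
    const = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi * sigma2))
    return sum(const - (nu + 1.0) / 2.0 * math.log1p(e * e / (nu * sigma2))
               for e in residuals)


def log_conditional_at_zero(residuals, sigma2, n_grid=1000):
    """log p(delta = 0 | rest): likelihood normalized over [0, 1] with the trapezoid rule."""
    grid = [k / n_grid for k in range(n_grid + 1)]
    logl = [t_loglik(residuals, sigma2, d) for d in grid]
    top = max(logl)
    dens = [math.exp(v - top) for v in logl]  # rescaled by the maximum to avoid overflow
    area = sum(dens[k] + dens[k + 1] for k in range(n_grid)) / (2.0 * n_grid)
    return (logl[0] - top) - math.log(area)


def log10_bf_t_vs_g(cycles):
    """log10 BF_T/G = -log10 of the cycle-averaged conditional density at delta = 0."""
    logs = [log_conditional_at_zero(res, s2) for res, s2 in cycles]
    top = max(logs)
    log_mean = top + math.log(sum(math.exp(v - top) for v in logs) / len(logs))
    return -log_mean / math.log(10.0)


# Hypothetical stored cycles: (residual vector, residual variance) per MCMC round.
light_tails = [0.5, -0.5] * 20
with_outliers = light_tails + [8.0, -9.0]
print(log10_bf_t_vs_g([(light_tails, 1.0), (light_tails, 0.9)]))      # negative: Gaussian favored
print(log10_bf_t_vs_g([(with_outliers, 1.0), (with_outliers, 1.1)]))  # positive: Student t favored
```

In a real analysis the residuals of each cycle would be $y_i - \mathbf{x}_i\mathbf{b} - \mathbf{w}_i\mathbf{p} - \mathbf{z}_i\mathbf{a}$ evaluated at that cycle's sampled location effects.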

From the standard definition of the Bayes factor [13],

$$PO_{T/G}=BF_{T/G}\times PrO_{T/G}=BF_{T/G}\times\frac{\pi_T}{\pi_G}, \tag{20}$$

where $PO_{T/G}$ is the posterior odds between models, $PrO_{T/G}$ is the prior odds between models, and $\pi_T$ and $\pi_G$ are the a priori probabilities of the Student t and the Gaussian mixed models, respectively. In the standard development of the Bayes factor described above, we assumed prior odds equal to 1, with $\pi_T = \pi_G = 0.5$. Nevertheless, prior odds could be modified according to our a priori knowledge; for example, the Student t mixed model is the more parameterized model and could easily be penalized with prior odds smaller than 1. Posterior odds can thus be viewed as the Bayes factor weighted by our a priori degree of belief.
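Equation (20) translates directly into code. The small sketch below uses hypothetical function names; the value 2.532 is the Bayes factor reported in the Results for the Pietrain data.

```python
def posterior_odds(bf_t_over_g, prior_t=0.5, prior_g=0.5):
    """Equation (20): PO_T/G = BF_T/G x (pi_T / pi_G)."""
    return bf_t_over_g * prior_t / prior_g


def posterior_prob_t(bf_t_over_g, prior_t=0.5):
    """Posterior probability of the Student t model when only T and G are compared."""
    odds = posterior_odds(bf_t_over_g, prior_t, 1.0 - prior_t)
    return odds / (1.0 + odds)


# Equal prior probabilities: the posterior odds equal the Bayes factor.
print(posterior_odds(2.532))
# Prior odds of 1/3 penalize the more parameterized Student t model.
print(posterior_odds(2.532, prior_t=0.25, prior_g=0.75))
print(round(posterior_prob_t(2.532), 3))
```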

2.3. Simulation studies

The Bayes factor methodology developed above was validated through simulation. Seven different scenarios were analyzed following a Student t residual process, with degrees of freedom equal to 5 ($\delta$ = 0.4), 10 ($\delta$ = 0.2), 20 ($\delta$ = 0.1), 50 ($\delta$ = 0.04), 100 ($\delta$ = 0.02), 200 ($\delta$ = 0.01) and 300 ($\delta$ = 0.007), respectively. Twenty-five replicates were simulated for each case, and each replicate included five non-overlapping generations of 200 individuals (10 sires and 190 dams) with random mating. Following Model T2, each individual had a phenotypic record and was assigned its own independent cluster. Data were generated from a normal density $N(\boldsymbol{\mu},\mathbf{I}\sigma^2_e)$ weighted by a cluster-specific value drawn from equation (6). Note that $\boldsymbol{\mu}$ included a single systematic effect (10 levels randomly assigned with equal probability, with level values sampled from a uniform distribution between 0 and 1) and a normally distributed additive genetic effect generated under standard rules [1]. Residual and additive genetic variances were equal to 1 and 0.5, respectively.
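The simulation design can be sketched as below. Several details are assumptions, not taken from the text: the 10 sires and 190 dams of each generation are chosen at random from the previous generation, parents are mated at random, and breeding values follow the usual infinitesimal rule without inbreeding, $a = (a_s + a_d)/2 + m$ with Mendelian sampling $m \sim N(0, \sigma^2_a/2)$. Residuals use the Gamma-weighted normal with shape and rate $1/\delta$, one cluster per animal. The function name is hypothetical.

```python
import random


def simulate_replicate(rng, delta, n_gen=5, n_per_gen=200, n_sires=10,
                       var_a=0.5, var_e=1.0, n_levels=10):
    """Return (generation, level, true breeding value, phenotype) for each animal."""
    level_effects = [rng.uniform(0.0, 1.0) for _ in range(n_levels)]  # systematic effect
    sires, dams, records = [], [], []
    for gen in range(n_gen):
        current = []
        for _ in range(n_per_gen):
            if gen == 0:
                a = rng.gauss(0.0, var_a ** 0.5)                        # base animals
            else:
                parent_mean = 0.5 * (rng.choice(sires) + rng.choice(dams))
                a = parent_mean + rng.gauss(0.0, (var_a / 2.0) ** 0.5)  # Mendelian sampling
            current.append(a)
            level = rng.randrange(n_levels)              # equal-probability assignment
            s2 = rng.gammavariate(1.0 / delta, delta)    # own cluster weight, shape = rate = 1/delta
            e = rng.gauss(0.0, (var_e / s2) ** 0.5)      # residual variance var_e / s2
            records.append((gen, level, a, level_effects[level] + a + e))
        shuffled = rng.sample(current, len(current))     # hypothetical choice of parents
        sires, dams = shuffled[:n_sires], shuffled[n_sires:]
    return records


rng = random.Random(2007)
data = simulate_replicate(rng, delta=0.4)                # nu = 5
print(len(data), "records")
```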

This simulation process generated seven scenarios of 25 data sets each, which were analyzed twice: under the Student t mixed model on which the Bayes factor described above is computed, and under a standard Gaussian model (Model G). For each analysis, a single chain of 100 000 rounds was run after discarding the first 10 000 rounds as burn-in [19]. Comparisons between the two models were performed through three approaches: (a) the Bayes factor between nested models, (b) DIC [20], and (c) the correlation coefficient between simulated and predicted breeding values ($\rho_{a,\tilde{a}}$). Note that DIC is based on the posterior distribution of the deviance statistic [20], which is $-2$ times the logarithm of the sampling distribution of the data, specified either as in formula (2), $p(\mathbf{y}\mid\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_e,\delta)$, or as the conjugated distribution of (5) and (6), $\prod_{j=1}^{m}p(\mathbf{y}_j\mid\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_e,s^2_j)\,p(s^2_j\mid\delta)$. Computational simplicity is gained with (2), DIC being calculated as

$$\mathrm{DIC}=\overline{D}(\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_e,\delta)+p_D,$$

where $\overline{D}(\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_e,\delta)$ is the posterior expectation (mean) of the deviance statistic and $p_D=\overline{D}(\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_e,\delta)-D(\bar{\mathbf{b}},\bar{\mathbf{p}},\bar{\mathbf{a}},\bar{\sigma}^2_e,\bar{\delta})$ is the effective number of parameters, $\bar{\theta}$ being the posterior mean of each $\theta\in\{\mathbf{b},\mathbf{p},\mathbf{a},\sigma^2_e,\delta\}$.

2.4. Analysis of weight at six months in Pietrain pigs

After editing, 2330 records of live weight at six months in Pietrain pigs were analyzed, with an average weight (± SE) of 102.9 (± 0.265) kg. These pigs were randomly chosen from 641 litters from successive generations grouped in 135 batches during the fattening period, and their records were collected between 2003 and 2006 in a purebred Pietrain farm registered in the reference Spanish Databank (BDporc®, http://www.bdporc.irta.es). At the beginning of the fattening period (two months of age), batches were created with pigs from different litters in order to homogenize piglet weight, and these groups were maintained up to slaughter (six months of age). Pigs were reared under standard farm management during the suckling and fattening periods. The pedigree extended back up to five generations and comprised 2601 individuals, with 109 boars and 337 dams with known progeny.

The operational model included the additive genetic effect of each individual, the permanent environmental effect characterized by the batch during the fattening period, and three systematic sources of variation: sex (male or female), year × season with 11 levels, and age at weighing (180.0 ± 0.3 days) treated as a covariate. Data were analyzed by applying the Bayes factor described above and assuming a different cluster for each pig with phenotypic data. For comparison with a standard Gaussian model, data were also analyzed under Model G. The empirical correlation between estimated breeding values (posterior means) under the two models was calculated and, as for the simulated data sets, DIC was calculated for Model T and Model G. Each Gibbs sampler was run as a single chain of 450 000 rounds after discarding the first 50 000 iterations as burn-in [19].
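DIC, used for both the simulated and the real data, follows directly from stored deviance values. The sketch below computes $\mathrm{DIC} = \overline{D} + p_D$ with $p_D = \overline{D} - D(\bar{\theta})$ and includes a Gaussian deviance for illustration; for Model T, the Student t log-density of equation (2) would replace it. Function names and the toy inputs are hypothetical.

```python
import math


def gaussian_deviance(residuals, sigma2):
    """Deviance D = -2 x log-likelihood of independent N(0, sigma2) residuals (Model G)."""
    n = len(residuals)
    return n * math.log(2.0 * math.pi * sigma2) + sum(e * e for e in residuals) / sigma2


def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, pD), with DIC = Dbar + pD and pD = Dbar - D(theta_bar)."""
    d_bar = sum(deviance_draws) / len(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d


# Toy usage: one deviance per stored MCMC cycle, plus the deviance evaluated
# at the posterior mean of the sampled residual variance.
residuals = [0.3, -1.2, 0.8]
s2_draws = [0.8, 1.0, 1.2]
draws = [gaussian_deviance(residuals, s2) for s2 in s2_draws]
s2_bar = sum(s2_draws) / len(s2_draws)
print(dic(draws, gaussian_deviance(residuals, s2_bar)))
```

The model with the smaller DIC is preferred, which is how the DIC differences in the Results are read.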

3. RESULTS

3.1. Simulated datasets

Summarized results of the 25 replicates for each simulated Student t process (5, 10, 20, 50, 100, 200 and 300 degrees of freedom) are shown in Table I. Estimates of additive genetic variance behaved coherently, with average estimates slightly greater than 0.5. The average residual variance estimated with the Student t mixed model agreed closely with the simulated value. Nevertheless, residual variance was clearly overestimated when a Gaussian mixed model was applied to simulations with few degrees of freedom, and these data sets also showed higher standard errors. Simulations with 5 degrees of freedom showed the highest average residual variance under the Gaussian mixed model (1.664 ± 0.038), whereas the average residual variance was reduced to 1.222 ± 0.025 for replicates with 10 degrees of freedom and converged to one for datasets with 300 degrees of freedom (with a standard error smaller than 0.020). Under the Student t mixed model, average estimates of degrees of freedom agreed with the true values without any noticeable bias, although precision decreased with larger degrees of freedom (Tab. I). Substantial discrepancies were observed between the two models in terms of predicted breeding values in extremely heavy-tailed simulations. Although the correlation coefficients between breeding values predicted with the Student t and Gaussian mixed models increased quickly with the degrees of freedom, the empirical correlation in replicates with 5 degrees of freedom was very small (0.377 ± 0.030), and average correlations greater than 0.9 were observed in simulations with 100 or more degrees of freedom (Tab. I).

Empirical correlations between simulated and predicted breeding values increased with degrees of freedom in both the Student t and Gaussian mixed models, although the Student t mixed model reached higher correlations when the simulated degrees of freedom were small. As seen in Table II, simulations under extremely heavy-tailed processes (5 degrees of freedom) showed average correlations of 0.420 and 0.377 for the Student t and Gaussian mixed models, respectively, suggesting substantial bias for genetic evaluations performed with standard Gaussian models when normality did not hold. Differences between the two models quickly decreased with increasing degrees of freedom and were almost negligible for $\nu = 20$ and higher values.

**Table I.** Variance component (× 100), degrees of freedom and breeding value correlation estimates (mean ± SE).

| Simulation ($\nu$) | Student t: $\tilde{\sigma}^2_a$ | Student t: $\tilde{\sigma}^2_e$ | Student t: $\tilde{\nu}$ | Gaussian: $\tilde{\sigma}^2_a$ | Gaussian: $\tilde{\sigma}^2_e$ | $\rho_{T,G}$ |
|---|---|---|---|---|---|---|
| 5 | 50.7 ± 2.1 | 106.8 ± 2.5 | 5.0 ± 0.3 | 55.2 ± 3.5 | 166.4 ± 3.8 | 0.377 ± 0.030 |
| 10 | 55.5 ± 2.0 | 98.8 ± 2.1 | 10.4 ± 0.4 | 56.7 ± 2.3 | 122.2 ± 2.5 | 0.438 ± 0.028 |
| 20 | 52.1 ± 2.1 | 98.0 ± 1.4 | 22.3 ± 0.9 | 52.3 ± 1.8 | 108.8 ± 1.5 | 0.632 ± 0.025 |
| 50 | 51.3 ± 2.1 | 99.8 ± 1.7 | 53.9 ± 1.0 | 50.7 ± 2.5 | 104.7 ± 2.0 | 0.862 ± 0.019 |
| 100 | 51.7 ± 2.5 | 101.5 ± 1.7 | 102.5 ± 1.2 | 50.8 ± 2.4 | 103.9 ± 1.7 | 0.961 ± 0.010 |
| 200 | 50.5 ± 1.9 | 101.9 ± 1.8 | 200.2 ± 1.3 | 51.9 ± 1.7 | 102.2 ± 1.8 | 0.997 ± 0.001 |
| 300 | 51.8 ± 2.0 | 101.5 ± 1.8 | 304.1 ± 1.5 | 52.3 ± 2.0 | 101.6 ± 1.9 | 0.999 ± 0.001 |

$\rho_{T,G}$: empirical correlation between predicted breeding values in Student t and Gaussian mixed models.

As expected, both the Bayes factor and DIC clearly favored Student t mixed models in simulated scenarios with few degrees of freedom, and the two criteria behaved similarly throughout the analyzed framework (Fig. 1A). Datasets simulated under a residual Student t process with 5 degrees of freedom reached an average Bayes factor favoring the Student t mixed model of $4.6 \times 10^{92}$, and the average difference between DIC values was also huge (−490). Although the evidence provided by both comparison criteria decreased as degrees of freedom increased, our results suggest that the Student t mixed model was preferable to the Gaussian model up to 20 degrees of freedom, and that even for simulations with 50 degrees of freedom the superiority of the Student t model remained almost total (Tab. III). Substantial discrepancies between the Bayes factor and DIC appeared in the last two scenarios (200 and 300 degrees of freedom). While the average Bayes factor slightly favored the Gaussian model, DIC continued to produce smaller values for the Student t model, although with only a minimal difference in the last scenario (Tab. II). This suggests that the Bayes factor was more conservative, favoring the less parameterized model. This hypothesis was confirmed in Table III, where the Bayes factor supported the Gaussian mixed model in 0, 0, 0, 1, 11, 18 and 19 data sets (for simulations with 5, 10, 20, 50, 100, 200 and 300 degrees of freedom, respectively), whereas DIC favored the Gaussian mixed model in 0, 0, 0, 0, 3, 5 and 10 data sets, respectively (Fig. 1B).

**Table II.** Comparison criteria (average estimates) between Student t and Gaussian mixed models for each simulation scenario.

| Simulation ($\nu$) | Student t: $\rho_{a,\tilde{a}}$ | Student t: DIC$_T$ | Gaussian: $\rho_{a,\tilde{a}}$ | Gaussian: DIC$_G$ | DIC$_{Diff.}$ | BF$_{T/G}$ |
|---|---|---|---|---|---|---|
| 5 | 0.420 | 3116 | 0.377 | 3606 | −490 | $4.6 \times 10^{92}$ |
| 10 | 0.443 | 2895 | 0.431 | 3112 | −217 | $1.2 \times 10^{71}$ |
| 20 | 0.463 | 2854 | 0.460 | 2939 | −85 | $2.2 \times 10^{34}$ |
| 50 | 0.470 | 2842 | 0.468 | 2887 | −45 | $1.8 \times 10^{14}$ |
| 100 | 0.469 | 2841 | 0.469 | 2867 | −27 | $5.7 \times 10^{4}$ |
| 200 | 0.470 | 2840 | 0.469 | 2850 | −10 | 0.569 |
| 300 | 0.470 | 2840 | 0.470 | 2841 | −1 | 0.420 |

$\rho_{a,\tilde{a}}$: empirical correlation between simulated and predicted breeding values.
DIC$_T$: deviance information criterion for the Student t mixed model.
DIC$_G$: deviance information criterion for the Gaussian mixed model.
DIC$_{Diff.}$ = DIC$_T$ − DIC$_G$.
BF$_{T/G}$: Bayes factor of the Student t mixed model against the Gaussian mixed model.

**Table III.** Distribution of the Bayes factors ($\log_{10}(BF_{T/G})$) between Student t and Gaussian mixed models (and the number of Gaussian models favored by the DIC) for each simulation scenario.

| Simulation ($\nu$) | (−10, −1] | (−1, 0] | (0, 1] | (1, 10] | (10, 50] | (50, 100] | (100, 150] | (150, 200] | Overall |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 2 (0) | 14 (0) | 6 (0) | 3 (0) | 25 (0) |
| 10 | 0 (0) | 0 (0) | 0 (0) | 1 (0) | 19 (0) | 4 (0) | 1 (0) | 0 (0) | 25 (0) |
| 20 | 0 (0) | 0 (0) | 0 (0) | 9 (0) | 16 (0) | 0 (0) | 0 (0) | 0 (0) | 25 (0) |
| 50 | 0 (0) | 1 (0) | 1 (0) | 20 (0) | 3 (0) | 0 (0) | 0 (0) | 0 (0) | 25 (0) |
| 100 | 3 (1) | 8 (2) | 6 (0) | 8 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 25 (3) |
| 200 | 3 (2) | 15 (3) | 7 (1) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 25 (6) |
| 300 | 11 (5) | 8 (5) | 6 (1) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 25 (11) |

$(a, b]$: $a < \log_{10}(BF_{T/G}) \leq b$.
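Classifying replicate Bayes factors into the half-open classes $(a, b]$ used in Table III can be done with a binary search over the class limits. A minimal sketch with hypothetical names; values outside (−10, 200] are simply not counted.

```python
import bisect

EDGES = [-10, -1, 0, 1, 10, 50, 100, 150, 200]  # class limits used in Table III


def tabulate_log10_bf(values, edges=EDGES):
    """Count log10(BF) values in classes (a, b], i.e. a < value <= b."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        k = bisect.bisect_left(edges, v) - 1  # first limit >= v closes the class
        if 0 <= k < len(counts):
            counts[k] += 1
    return counts


print(tabulate_log10_bf([-0.5, 0.0, 0.5, 92.0, 250.0]))
```

`bisect_left` is what makes the upper limit inclusive: a value equal to a limit $b$ falls in the class that ends at $b$.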

3.2. Analysis of Pietrain pig weight at six months

Analysis under a Student t linear mixed model placed the highest posterior density region at 95% (HPD95) for the $\delta$ parameter between 0.004 and 0.009, with the modal value at 0.005 (Tab. IV). Degrees of freedom ($\nu$) determine the thickness of the tails of the Student t distribution and, because tail thickness changes little among large values of $\nu$, they were drawn within a wide HPD95 (111.20–235.70), with the mode at 191.12. The posterior distribution of $\nu$ was roughly symmetrical (Fig. 2). The posterior means of the weights ($s^2_i$) for the Student t mixed model ranged from 0.924 to 1.026, although the majority were located between 0.975 and 1.025 (95.19%; Tab. V). In this sense, values smaller than 0.925 were in a minority (0.22%), although they could have had a substantial influence as outliers. It is important to highlight that differences in variance component estimation between the Gaussian and Student t mixed models were minimal (Tab. IV), with similar values for heritability (0.256 and 0.254, respectively) and for the coefficient of common environment (0.237 and 0.239, respectively).

Figure 1. Plot of differences in DIC between Student t and Gaussian mixed models against log10(BF_T/G) for all replicates (A) and for combinations close to 0 (B).

**Table IV.** Summary statistics for the analysis of pig weight at six months under Student t and Gaussian mixed models.

| Model | Parameter | Mean | Mode | HPD95 |
|---|---|---|---|---|
| Student t | $\tilde{\sigma}^2_a$ | 36.87 | 35.92 | 24.81–50.06 |
| | $\tilde{\sigma}^2_p$ | 35.16 | 35.86 | 22.33–48.83 |
| | $\tilde{\sigma}^2_e$ | 68.72 | 68.63 | 59.97–77.32 |
| | $\tilde{c}^2$ | 0.248 | 0.239 | 0.177–0.321 |
| | $\tilde{h}^2$ | 0.260 | 0.254 | 0.178–0.341 |
| | $\tilde{\delta}$ | 0.006 | 0.005 | 0.004–0.009 |
| | $\tilde{\nu}$ | 177.95 | 191.12 | 111.20–235.70 |
| Gaussian | $\tilde{\sigma}^2_a$ | 36.21 | 35.84 | 24.67–49.21 |
| | $\tilde{\sigma}^2_p$ | 35.88 | 35.81 | 22.32–48.42 |
| | $\tilde{\sigma}^2_e$ | 69.54 | 69.01 | 60.91–77.82 |
| | $\tilde{c}^2$ | 0.247 | 0.237 | 0.179–0.318 |
| | $\tilde{h}^2$ | 0.255 | 0.256 | 0.173–0.338 |

HPD95 = highest posterior density region at 95%. $\tilde{c}^2=\tilde{\sigma}^2_p/(\tilde{\sigma}^2_a+\tilde{\sigma}^2_p+\tilde{\sigma}^2_e)$; $\tilde{h}^2=\tilde{\sigma}^2_a/(\tilde{\sigma}^2_a+\tilde{\sigma}^2_p+\tilde{\sigma}^2_e)$.

Figure 2. Posterior distribution of degrees of freedom for weight at six months in Pietrain pigs.
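Summaries such as those in Table IV can be obtained from stored MCMC draws. The sketch below computes the ratios $h^2$ and $c^2$ draw by draw (so that their posterior summaries come from the ratio's own distribution rather than from ratios of means) and a 95% HPD interval as the shortest interval containing 95% of the sorted draws, which assumes a unimodal posterior. The draws in the usage lines are synthetic placeholders, not the Pietrain posterior, and the function names are hypothetical.

```python
import random
import statistics


def hpd_interval(draws, prob=0.95):
    """Shortest interval holding `prob` of the draws (assumes a unimodal posterior)."""
    xs = sorted(draws)
    n = len(xs)
    k = max(1, round(prob * n))  # number of draws inside the interval
    _, start = min((xs[i + k - 1] - xs[i], i) for i in range(n - k + 1))
    return xs[start], xs[start + k - 1]


def variance_ratios(var_a, var_p, var_e):
    """Per-draw heritability h2 and common-environment ratio c2 (Table IV footnote)."""
    h2, c2 = [], []
    for a, p, e in zip(var_a, var_p, var_e):
        total = a + p + e
        h2.append(a / total)
        c2.append(p / total)
    return h2, c2


# Synthetic placeholder draws (NOT the Pietrain posterior).
rng = random.Random(11)
var_a = [rng.gammavariate(20.0, 1.8) for _ in range(5000)]
var_p = [rng.gammavariate(20.0, 1.8) for _ in range(5000)]
var_e = [rng.gammavariate(80.0, 0.85) for _ in range(5000)]
h2, c2 = variance_ratios(var_a, var_p, var_e)
print("posterior mean h2:", round(statistics.fmean(h2), 3))
print("HPD95 h2:", tuple(round(x, 3) for x in hpd_interval(h2)))
```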

The Bayes factor favored the Student t model over the Gaussian model ($BF_{T/G}$ = 2.532), although a value this small is not worth more than a bare mention according to Jeffreys' [12] scale of evidence. This suggests that the differences between the models could be minimal, whereas DIC reported a substantial discrepancy between the Student t mixed model (16 445) and the Gaussian mixed model (16 450), although this difference was smaller than the average DIC difference in simulations with 200 degrees of freedom. Nevertheless, the empirical correlation coefficient between breeding values predicted with each model was 0.999 (0.993 if only breeding animals were considered), showing that, although the two models were judged slightly (Bayes factor) or substantially (DIC) different, genetic evaluations performed with a Student t or a Gaussian mixed model provided almost identical genetic rankings for this data set.
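Jeffreys' scale grades a Bayes factor by half-unit steps of $\log_{10}(BF)$. The sketch below encodes the grades as they are commonly tabulated (bare mention up to $10^{1/2}$, substantial up to 10, strong up to $10^{3/2}$, very strong up to 100, decisive beyond); the function name is hypothetical.

```python
import math


def jeffreys_grade(bf):
    """Verbal grade of evidence for the numerator model on Jeffreys' scale."""
    if bf < 1.0:
        return "supports the denominator model; grade 1/BF instead"
    log10_bf = math.log10(bf)
    if log10_bf <= 0.5:
        return "not worth more than a bare mention"
    if log10_bf <= 1.0:
        return "substantial"
    if log10_bf <= 1.5:
        return "strong"
    if log10_bf <= 2.0:
        return "very strong"
    return "decisive"


print(jeffreys_grade(2.532))   # BF_T/G for weight at six months
print(jeffreys_grade(4.6e92))  # average BF_T/G for simulations with 5 degrees of freedom
```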

4. DISCUSSION

The Bayes factor as originally proposed by Verdinelli and Wasserman [25] has been applied to various models used in animal breeding and genetics research. Although the method was initially developed to test for the genetic background of linear traits [5] and the location of quantitative trait loci (QTL) [23], this Bayes factor has recently been modified to discriminate

**Table V.** Distribution of the posterior mean value of weights ($s^2_i$) for the analyses of pig weight at six months under Student t mixed models.

| Class | n | % |
|---|---|---|
| $0.900 < s^2_i \leq 0.925$ | 5 | 0.22 |
| $0.925 < s^2_i \leq 0.950$ | 15 | 0.64 |
| $0.950 < s^2_i \leq 0.975$ | 88 | 3.78 |
| $0.975 < s^2_i \leq 1.000$ | 799 | 34.29 |
| $1.000 < s^2_i \leq 1.025$ | 1419 | 60.90 |
| $1.025 < s^2_i \leq 1.050$ | 4 | 0.17 |
| Overall | 2330 | 100.00 |