
Comparing methodologies to estimate fixed genetic effects and to predict genetic values for an Angus × Nellore cattle population

C. D. Bertoli,*†1 J. Braccini,† V. M. Roso‡

*Departamento de Zootecnia, Instituto Federal Catarinense Campus Camboriú (IFC-Camboriu), Camboriú, SC, Brazil; †Departamento de Zootecnia, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, RS, Brazil; and ‡GenSys Consultores Associados, Porto Alegre, RS, Brazil

ABSTRACT: The study assesses the need for and effectiveness of using ridge regression when estimating regression coefficients of covariates representing genetic effects due to breed proportion in a crossbred genetic evaluation. It also compares 2 ways of selecting the ridge parameters. A large crossbred Angus × Nellore population with 294,045 records for weaning gain and 148,443 records for postweaning gain was used. Phenotypic visual scores varying from 1 to 5 for weaning and postweaning conformation, weaning and postweaning precocity, and weaning and postweaning muscling, as well as scrotal circumference, were also analyzed. Three models were used to assess the need for ridge regression, having 4, 6, and 8 genetic covariates. All 3 models included the fixed contemporary group effect and random animal, maternal, and permanent environment effects. Model AH included fixed direct and maternal breed additive and direct and maternal heterosis covariates, model AHE also included direct and maternal epistatic loss covariates, and model AHEC further included direct and maternal complementarity effects. The normal approach is to include these covariates as fixed effects in the model. However, being all derived from breed proportions, they are highly collinear and, consequently, may be poorly estimated. Ridge regression has been proposed as a method of reducing the collinearity. We found that collinearity was not a problem for models AH and AHE. We found a high variance inflation factor, >20, associated with some maternal covariates in the AHEC model, reflecting instability of the regression coefficients, and found that this instability was well addressed by ridge regression with a ridge parameter calculated from the variance inflation factor.

Key words: complementarity, crossbred beef cattle evaluation, epistatic loss, heterosis, nonadditive genetic effects, ridge regression.

© 2016 American Society of Animal Science. All rights reserved. J. Anim. Sci. 2016.94:500–513
doi:10.2527/jas2015-9344

1Corresponding author: cdbertoli@gmail.com
Received May 25, 2015.
Accepted November 9, 2015.
Published February 12, 2016

INTRODUCTION

The benefit of crossbreeding includes improvement in growth and carcass traits (Williams et al., 2010) and has been studied by many researchers over the past decade (Roso et al., 2005a,b; Carvalheiro et al., 2006; Dias et al., 2011). For the purpose of selection for production, the greatest challenge is the unbiased comparison of breeding bulls and dams of different breed compositions. In crossbred populations, effects that are assumed to be null in pure populations become important and must be taken into account. For a fair comparison, a multibreed genetic evaluation including crossbred and purebred individuals in the same data set is required, as proposed by Arnold et al. (1992).

Multibreed analysis requires the inclusion of direct and maternal breed additive effects, heterosis (Cardoso et al., 2008; Williams et al., 2010), epistatic loss (Dias et al., 2011), and complementarity between different breeds (Carvalheiro et al., 2006; Cardoso et al., 2008). However, this model may be difficult to fit if the data structure does not adequately sample all the genetic relationships. The additive genetic effect for each breed involved and their combining ability, general or specific, should be considered. The nonadditive genetic effects are usually included as covariates (Carvalheiro et al., 2006; Dias et al., 2011). The inclusion of these effects as fixed covariates may lead to problems with collinearity; the high correlations among the covariates may result in poor estimates with large SE (Schabenberger and Pierce, 2002). Ridge regression is a numerical modification to least squares regression that seeks to address this problem.

The objective of this study is to compare the least squares and ridge regression methodologies in the estimation of fixed genetic effects and in the prediction of breeding values for a crossbred population of Angus and Nellore beef cattle. We also compare 2 ways of setting the ridge parameter.

MATERIALS AND METHODS

Animal Care and Use Committee approval was not

obtained for this study because the data were obtained

from an existing database.

Data with different breed compositions, resulting

from crosses between Angus and Nellore cattle, were

used. These data came from more than 200 herds distributed across the Brazilian states of Rio Grande do Sul, Paraná, São Paulo, Mato Grosso, Mato Grosso do Sul, and Goiás and also from Paraguay. The change in the estimated values for the fixed genetic effects for the traits

of weaning gain (WG) and postweaning gain (PG) over

16 yr of genetic evaluations was taken every 2 yr (1994–

2010). All herds were participants of Programa Natura

de Melhoramento Genético de Bovinos (Natura Cattle

Breeding Program, Brazil; Table 1).

A connectedness analysis between contemporary

groups was performed according to Roso and Schenkel

(2006). The total number of genetic links between con-

temporary groups, due to all common ancestors (sire,

dam, grand sire, grand dam, etc.) weighted by the addi-

tive relationships, was used. Contemporary groups with

at least 5 genetic links were considered connected and

retained for analysis. Roso et al. (2004) reported that as the degree of connectedness among test groups decreases, the accuracy of comparisons of estimated breeding values (EBV) of bulls in different test groups also decreases.

The 8 genetic breed covariates were direct and maternal covariates for additive effect, heterosis, epistatic loss, and complementarity. The proportions of Nellore genes in the genetic composition of the animal (aa) and of its dam (am) were used as covariates to estimate the direct and maternal breed additive genetic effects. To estimate the direct and maternal heterosis effects, the heterozygosity coefficients ha and hm were used as described by Bertoli (1991) and Schenkel (1993) and also used by Cardoso et al. (2008), Pimentel et al. (2007), and Roso et al. (2005a). These coefficients are given by

$h_m = 1 - [(N_{MaternalGrandSire} \times N_{MaternalGrandDam}) + (A_{MaternalGrandSire} \times A_{MaternalGrandDam})]$ and $h_a = 1 - [(N_{Sire} \times N_{Dam}) + (A_{Sire} \times A_{Dam})]$,

in which A refers to the Angus breed proportion and N refers to the Nellore breed proportion. To estimate the direct and maternal epistatic loss effects, the epistazygosity coefficients ea and em were used, as proposed by Fries et al. (2000, 2002) and also used by Roso et al. (2005a,b), Pimentel et al. (2006), Carvalheiro et al. (2006), and Cardoso et al. (2008). These coefficients are given by

$e_a = \tfrac{1}{2}(H_s + H_d)$ and $e_m = \tfrac{1}{2}(H_{mgs} + H_{mgd})$,

in which Hs is the sire heterozygosity, Hd is the dam heterozygosity, Hmgs is the maternal grand sire heterozygosity, and Hmgd is the maternal grand dam heterozygosity. When the breed composition of a cow was not known (all progenies had known breed composition), the cow was assumed to result from inter se mating. Finally, the coefficients proposed by Kinghorn (1993) and also used by Fries et al. (2000), Piccoli et al. (2002), and Cardoso et al. (2008) were used to estimate the breed complementarity effects. The direct complementarity (ca) coefficient is described as $c_a = a_a \times (1.0 - a_a)$ and the maternal complementarity (cm) coefficient is described as $c_m = a_m \times (1.0 - a_m)$, in which aa is the Nellore fraction of the animal breed composition and am is the Nellore fraction of the dam breed composition.
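The coefficient definitions above can be sketched in a few lines of code. This is only an illustration, not the authors' Fortran implementation: the function name `breed_covariates` and its argument names are hypothetical, the Angus fraction is assumed to be 1 minus the Nellore fraction (two-breed population), the animal's Nellore fraction is taken as the average of its parents, and the heterozygosities of the sire, dam, and maternal grandparents are assumed to be supplied by the caller.

```python
def breed_covariates(n_sire, n_dam, n_mgs, n_mgd,
                     h_sire=0.0, h_dam=0.0, h_mgs=0.0, h_mgd=0.0):
    """Return the 8 breed covariates for one animal (two-breed case).

    n_*: Nellore fractions of sire, dam, maternal grand sire/dam.
    h_*: heterozygosity coefficients of the same four ancestors
         (assumed known; purebreds have 0).
    """
    a_sire, a_dam = 1.0 - n_sire, 1.0 - n_dam   # Angus fractions (assumption)
    a_mgs, a_mgd = 1.0 - n_mgs, 1.0 - n_mgd
    aa = 0.5 * (n_sire + n_dam)                 # Nellore fraction of the animal
    am = n_dam                                  # Nellore fraction of the dam
    ha = 1.0 - (n_sire * n_dam + a_sire * a_dam)
    hm = 1.0 - (n_mgs * n_mgd + a_mgs * a_mgd)
    ea = 0.5 * (h_sire + h_dam)
    em = 0.5 * (h_mgs + h_mgd)
    ca = aa * (1.0 - aa)
    cm = am * (1.0 - am)
    return {"aa": aa, "am": am, "ha": ha, "hm": hm,
            "ea": ea, "em": em, "ca": ca, "cm": cm}

# F1 calf: purebred Angus sire x purebred Nellore dam.
f1 = breed_covariates(n_sire=0.0, n_dam=1.0, n_mgs=1.0, n_mgd=1.0)
# Calf of two F1 parents; the dam's own parents were Angus and Nellore.
f2 = breed_covariates(0.5, 0.5, 0.0, 1.0, h_sire=1.0, h_dam=1.0)
print(f1)
print(f2)
```

The two calls show why the covariates are related: every one of them is a function of the same few breed fractions, so different mating types move several covariates at once.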

Nine traits were analyzed: WG and PG; phenotypic

scores of weaning conformation (WC), weaning precoc-

ity (WP), and weaning muscling (WM); and phenotypic

scores of postweaning conformation (PC), postweaning

precocity (PP), and postweaning muscling (PM) as well

as scrotal circumference (SC) taken after weaning. The

phenotypic score for each trait is given on a 5-point scale,

in which 1 is the worst and 5 is the best score for each

management group.

The general model is described by Eq. [1]. The trait pairs WG-PG, WC-PC, WP-PP, WM-PM, and WG-SC were analyzed in bivariate analyses using the general model for each trait:

y = Xβ + Wγ + Zα + ε, [1]

in which y is the vector of observations for the trait; β is the vector of fixed environmental effects, which includes contemporary group; γ is the vector of fixed genetic effects; α is the vector of random direct additive genetic, maternal additive genetic, and dam permanent environmental effects; and ε is the vector of random residual effects. Incidence matrices X, W, and Z relate records to the fixed environmental effect of contemporary group (herd, sex, year, season; CG), to fixed genetic effects, and to random direct and maternal additive genetic and permanent environmental effects, respectively.

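The vectors of random effects α and ε were assumed to have variance and covariance

$$V(\alpha) = \begin{bmatrix} A \otimes G_a & A \otimes G_{am} & 0 \\ A \otimes G_{am} & A \otimes G_m & 0 \\ 0 & 0 & I \otimes P_e \end{bmatrix} \quad \text{and} \quad V(\varepsilon) = I \otimes R,$$

in which A is the additive numerator relationship matrix among animals and I is the identity matrix,

$$G_a = \begin{bmatrix} \sigma^2_{a_1} & \sigma_{a_1 a_2} \\ \sigma_{a_1 a_2} & \sigma^2_{a_2} \end{bmatrix}, \quad G_m = \begin{bmatrix} \sigma^2_{m_1} & \sigma_{m_1 m_2} \\ \sigma_{m_1 m_2} & \sigma^2_{m_2} \end{bmatrix}, \quad G_{am} = \begin{bmatrix} \sigma_{a_1 m_1} & \sigma_{a_1 m_2} \\ \sigma_{a_2 m_1} & \sigma_{a_2 m_2} \end{bmatrix},$$

$$P_e = \begin{bmatrix} \sigma^2_{p_1} & \sigma_{p_1 p_2} \\ \sigma_{p_1 p_2} & \sigma^2_{p_2} \end{bmatrix}, \quad \text{and} \quad R = \begin{bmatrix} \sigma^2_{e_1} & \sigma_{e_1 e_2} \\ \sigma_{e_1 e_2} & \sigma^2_{e_2} \end{bmatrix},$$

in which σ2 refers to variance, σ refers to covariance, a refers to direct additive genetic effects, m refers to maternal additive genetic effects, p refers to permanent environmental effects, e refers to residual effects, and 1 refers to the first trait and 2 refers to the second trait in a 2-trait analysis.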

Three versions of this model were considered

(Table 2). The base model, AH, contained 4 covari-

ables in W, the direct and maternal additive effects and

heterosis effects. The second model, AHE, contained

6 covariables in W, adding the direct and maternal

epistatic loss variables, and the third model, AHEC,

contained 8 covariables in W, adding the direct and

maternal complementarity variables.

The models described above were analyzed using the least squares (LS) and ridge regression methodologies (ridge regression 1 [R1] and ridge regression 2 [R2]) in a 2-trait analysis. Data were preadjusted for fixed effects of animal age, dam age, and birth date (Julian). GenSys Consultores Associados (Porto Alegre-RS, Brazil) developed the analysis programs in Fortran 95.

Table 1. Distributions of animals according to sire and dam breed composition, presented as Nellore fraction, used in two-trait weaning–postweaning gain analysis1

               Dams
Sires              Angus      1/8      2/8      3/8      4/8      5/8      6/8      7/8  Nellore    Total
Angus             27,795       96      698    3,164    6,219    4,048   20,243      194   28,330   90,787
1/8 Nellore           52        5       24       10       71        0        2        0       17      181
2/8 Nellore           85      206      760    1,434    6,780      163    3,520        1      502   13,451
3/8 Nellore        1,305      371    2,238   78,159   43,194    2,690    6,061      148   18,713  152,879
4/8 Nellore           47      125       96      687    4,066      218    1,587       28    5,947   12,801
5/8 Nellore        1,170        0       14      443    1,046    1,186    1,084       21    1,053    6,017
6/8 Nellore        8,668        4        5      697      466      181      285        3      185   10,494
7/8 Nellore            0        0        0        0        5        0        0        0       13       18
Nellore              241       31       52    1,341    4,726      236      738       51        1    7,417
Total             39,363      838    3,887   85,935   66,573    8,722   33,520      446   54,761  294,045

1Values are approximated to the closest class composition. Every class included fractions equal to or smaller than the mentioned breed proportions.

Table 2. Fixed genetic effects included in the 3 genotypic models (AH including the direct and maternal breed additive and heterosis effects, AHE including the direct and maternal breed additive, heterosis, and epistatic loss effects, and AHEC including the direct and maternal breed additive, heterosis, epistatic loss, and complementarity effects) considered in this study

         Effects included in γ1
Model    aa   am   ha   hm   ea   em   ca   cm
AH       x    x    x    x
AHE      x    x    x    x    x    x
AHEC     x    x    x    x    x    x    x    x

1aa = direct breed additive effect; am = maternal breed additive effect; ha = direct heterosis effect; hm = maternal heterosis effect; ea = direct epistatic loss effect; em = maternal epistatic loss effect; ca = direct complementarity nonadditive effect; cm = maternal complementarity nonadditive effect.

The ridge regression methodology (RR) is “…an ad hoc regression method to combat collinearity. The ridge regression estimator allows for some bias to break the collinearity and thus reduce the mean square error compared with ordinary least squares. The user must choose the ridge parameter, a small number by which to shrink the least squares estimates” (Schabenberger and Pierce, 2002). The ridge regression estimator consists of adding a small positive number to the diagonal of the W′W matrix to break the dependence among the columns of the W′W matrix. No clear rule exists to choose a ridge parameter so it must be empirically determined (Dias et al., 2011). Two methods of determining the ridge parameter, that is, the values of the nonnegative diagonal elements of K (k1, k2, …, kp), were used in this study.
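As a minimal sketch of what adding a constant to the diagonal of W′W does, the example below fits a toy regression with two almost identical covariates. It assumes a plain fixed-effects regression with no random effects, which is much simpler than model [1]; the function names `standardize` and `ridge_estimate` and the simulated data are illustrative only.

```python
import numpy as np

def standardize(W):
    """Center and scale columns so that Ws'Ws is the correlation matrix."""
    W = np.asarray(W, dtype=float)
    Wc = W - W.mean(axis=0)
    return Wc / np.sqrt((Wc ** 2).sum(axis=0))

def ridge_estimate(W, y, k):
    """Solve (W'W + K) g = W'y with K = diag(k); k = 0 gives least squares."""
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    K = np.diag(np.broadcast_to(np.asarray(k, dtype=float), (W.shape[1],)))
    return np.linalg.solve(W.T @ W + K, W.T @ y)

# Toy data (made up): two almost identical covariates.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)
W = standardize(np.column_stack([x1, x2]))
y = x1 + rng.normal(size=200)
y = y - y.mean()

g_ls = ridge_estimate(W, y, 0.0)    # least squares
g_rr = ridge_estimate(W, y, 0.06)   # same constant on every diagonal element
print("LS:", g_ls)
print("RR:", g_rr)
```

With near-duplicate columns the LS solution tends to split the signal into large coefficients of opposite sign, whereas the ridge solution is pulled toward smaller, more similar values.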

The first method (R1) was adapted from Roso et al. (2005a) and estimated as $k_i = \theta (VIF_i / VIF_{max})^{1/2}$, in which $k_i$ is the ith diagonal element of the matrix K, VIFi is the variance inflation factor (VIF) of the ith predictor variable, and VIFmax is the maximum value of all VIFi. The variance inflation factor (VIFi) was given by $VIF_i = 1/(1 - R_i^2)$, in which $R_i^2$ is the coefficient of determination from regressing the ith predictor variable on the other predictor variables (Schabenberger and Pierce, 2002). The magnitude of the elements ki is therefore proportional to the square root of the VIF of each predictor variable. To choose the value of θ, the bootstrap procedure originally proposed by Efron (1979) and described by Roso et al. (2005b) was performed. This is an iterative process, with the value of θ ranging from 0.000 to 1.000 in increments of 0.002. For each value of θ, 10 bootstrap samples were used to estimate all effects of model [1] as well as the VIF, and the VIF were averaged over the 10 bootstrap samples. The lowest value of θ for which the average bootstrap VIF was below 10 for all fixed genetic effects was used in each analysis. With this value of θ, a new analysis was performed on the complete data set to estimate all effects according to model [1] and its reductions. Carvalheiro et al. (2006) used a similar method, choosing different values where all VIF also became lower than 10. The VIF of the regression coefficients is a simple but effective measure for diagnosing collinearity (Schabenberger and Pierce, 2002).
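The search for θ described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the bootstrap averaging is omitted, the VIF are computed from a predictor correlation matrix R instead of the full mixed model, and the VIF under ridge are taken as the diagonal of (R + K)⁻¹R(R + K)⁻¹, a common textbook form that the source does not spell out. All function names are hypothetical.

```python
import numpy as np

def vif(R, K=None):
    """VIF from a predictor correlation matrix R.

    With K = 0 this is diag(R^-1), i.e. 1 / (1 - R_i^2). For K > 0 the
    ridge form diag((R + K)^-1 R (R + K)^-1) is used (an assumption here).
    """
    R = np.asarray(R, dtype=float)
    K = np.zeros_like(R) if K is None else np.asarray(K, dtype=float)
    inv = np.linalg.inv(R + K)
    return np.diag(inv @ R @ inv).copy()

def ridge_parameters(R, theta):
    """k_i = theta * sqrt(VIF_i / VIF_max), using the least squares VIF."""
    v = vif(R)
    return theta * np.sqrt(v / v.max())

def smallest_theta(R, limit=10.0, step=0.002):
    """Lowest theta on the grid 0.000, 0.002, ..., 1.000 with all VIF < limit."""
    for theta in np.arange(0.0, 1.0 + step / 2, step):
        k = ridge_parameters(R, theta)
        if np.all(vif(R, np.diag(k)) < limit):
            return float(theta)
    return None  # no value on the grid satisfied the limit

# Toy correlation matrix (made up): two covariates with r = 0.99.
R = np.array([[1.0, 0.99],
              [0.99, 1.0]])
theta = smallest_theta(R)
print("LS VIF:", vif(R))
print("theta:", theta, "ridge VIF:", vif(R, np.diag(ridge_parameters(R, theta))))
```

Scaling each ki by the square root of its relative VIF means the covariates most affected by collinearity receive the largest penalty, while nearly orthogonal covariates are left almost untouched.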

For the second method (R2), θ was chosen empirically as the value at which graphical plots showed the fixed genetic effect estimates stabilizing across the values of θ tested. Cardoso et al. (2008) and Piccoli et al. (2002) used the value of 0.06 and Lopes et al. (2010) used 0.05 for the diagonal of K. For this analysis, the chosen value of θ was 0.06.

In the present study, the RR analysis was per-

formed after standardization (centering and scaling)

of predictor variables, as recommended by Marquardt

and Snee (1975) and Freund and Littell (2000). After

estimation, the estimates were transformed to the orig-

inal scale and were presented in this way.

Another measure of multicollinearity presented by Schabenberger and Pierce (2002) is the condition index (CI), the square root of the ratio of the largest eigenvalue to each eigenvalue of the correlation matrix formed from WC′WC:

$CI_i = (\lambda_{max} / \lambda_i)^{1/2}$,

in which λmax is the largest eigenvalue and λi is the ith eigenvalue of the correlation matrix.

Small eigenvalues result in high CI, indicating a

potential collinearity problem. In the current investi-

gation, multicollinearity diagnostics were performed

using different measures: VIF, CI, and variance-de-

composition proportions associated with eigenvalues.
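The eigenvalue-based diagnostics can be illustrated with a short sketch. It assumes the predictor correlation matrix R is available and uses the usual Belsley-style definition of variance-decomposition proportions, which the source names but does not write out; the function name `collinearity_diagnostics` and the toy matrix are hypothetical.

```python
import numpy as np

def collinearity_diagnostics(R):
    """Eigenvalues, condition indices, and variance-decomposition proportions.

    props[j, i] is the share of the variance of coefficient j that is
    associated with eigenvalue i (rows sum to 1).
    """
    lam, V = np.linalg.eigh(np.asarray(R, dtype=float))  # ascending eigenvalues
    ci = np.sqrt(lam.max() / lam)                        # CI_i = sqrt(lam_max / lam_i)
    phi = V ** 2 / lam                                   # v_ji^2 / lam_i
    props = phi / phi.sum(axis=1, keepdims=True)
    return lam, ci, props

# Toy correlation matrix (made up): two covariates with r = 0.99.
R_toy = np.array([[1.0, 0.99],
                  [0.99, 1.0]])
lam, ci, props = collinearity_diagnostics(R_toy)
print("eigenvalues:", lam)
print("condition indices:", ci)
print("variance proportions:\n", props)
```

A large CI together with two or more covariates loading heavily on that same eigenvalue is the pattern that points to a specific collinear group of covariates.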

To compare the methodologies and models, we used the relative efficiency (RE), the stability of the estimates over several generations, and the comparison between the across-breed EBV (AB-EBV). Unbiasedness of an estimator may not necessarily be the most desirable property. An estimator with a small bias but with high accuracy may be preferred when the unbiased estimator has a high variance. Schabenberger and Pierce (2002) proposed RE, based on the mean square error (MSE) of the functions that generated such estimators. The RE of f(y) compared with g(y) as estimators of a parameter λ is measured by the ratio of their MSE: RE[f(y), g(y)|λ] = MSE[g(y), λ]/MSE[f(y), λ]. These authors suggest that if RE[f(y), g(y)|λ] > 1, then f(y) should be preferred.
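The RE ratio can be illustrated with a small numerical sketch. It assumes a setting in which the true parameter value is known (as in a simulation) and repeated estimates from each estimator are available; how the MSE was obtained for the field data is not described here, and the function names and numbers below are made up.

```python
import numpy as np

def mse(estimates, true_value):
    """Mean square error of repeated estimates around a known true value."""
    e = np.asarray(estimates, dtype=float)
    return float(np.mean((e - true_value) ** 2))

def relative_efficiency(f_estimates, g_estimates, true_value):
    """RE[f, g | lambda] = MSE[g] / MSE[f]; a value > 1 favours f."""
    return mse(g_estimates, true_value) / mse(f_estimates, true_value)

# Made-up estimates of a parameter whose true value is 1.0.
f_est = [0.9, 1.0, 0.9, 1.0]   # slightly biased, low variance
g_est = [0.8, 1.2, 0.8, 1.2]   # unbiased, high variance
re = relative_efficiency(f_est, g_est, true_value=1.0)
print(re)
```

In this toy case the biased estimator has the smaller MSE, so RE exceeds 1 and f would be preferred despite its bias.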

A simulation of data accumulation was performed to demonstrate the variation of the estimates of fixed genetic effects over time, obtained by methodologies LS, R1, and R2 and models AHEC, AHE, and AH. The ridge parameter ki, i = 1, 2, …, p, was determined using records accumulated from the beginning until 1994, 1995, …, 2010 and was used in the estimation of fixed genetic effects for each period. Ridge regression methods and LS were then compared with respect to stability of estimates over years.

The AB-EBV are the EBV added to the direct breed

additive effect, proportional to its breed composition.
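Read literally, the definition above amounts to one line of arithmetic. The sketch below assumes that the breed composition enters as the animal's Nellore fraction (aa) multiplied by the estimated direct breed additive regression coefficient; the function name `across_breed_ebv` and the numbers are hypothetical.

```python
def across_breed_ebv(ebv, nellore_fraction, direct_additive_estimate):
    """AB-EBV = EBV + (Nellore fraction) x (direct breed additive estimate)."""
    return ebv + nellore_fraction * direct_additive_estimate

# Made-up values: EBV of 10.0, half-Nellore animal, breed additive estimate of -4.0.
ab_ebv = across_breed_ebv(10.0, 0.5, -4.0)
print(ab_ebv)
```

Because the breed additive estimate is multiplied by breed composition, any change in that estimate between LS and ridge regression shifts the AB-EBV of animals differently depending on their Nellore fraction.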

RESULTS AND DISCUSSION

Multicollinearity

Eigenvalues and Condition Index. Figure 1 shows the eigenvalues of the correlation matrix among the prediction variables of the fixed genetic effects and the corresponding CI for all traits under the AHEC model. This model simultaneously considers all covariates tested in this study and is the model most likely to present problems of collinearity between covariates. For every trait, the CI is lower than 30. The CI associated with the last eigenvalue were between 20 and 30 for all traits. The CI associated with the second smallest eigenvalue were greater than 10. When we consider the AHE and AH models (not shown), all CI were lower than 10 (the highest CI was 7.71 for SC in the AHE model).

Dias et al. (2011) considered that collinearity is

strong when CI is greater than 30 and weak when it

is between 10 and 30. For CI below 10, any linear

dependence that exists should not be problematic

(Schabenberger and Pierce, 2002; Roso et al., 2005a;

Dias et al., 2011). The results presented here show val-

ues within this range of 10 to 30.

The variance-decomposition proportions associ-

ated with the largest CI, presented in Table 3, suggest

that the strongest collinearity problem is between the

covariates used to estimate the effects of the maternal

components of heterosis, epistatic loss, and comple-

mentarity. When the largest CI was analyzed, these

covariates showed variance fractions greater than

80% for all traits. Observing the second largest CI,

the strongest association is between covariates that

are estimating the direct components of the same ef-

fects (heterosis, epistatic loss, and complementarity)

involving fractions around 60 to 70%. This is the best

method to detect collinearity between covariates.

When different effects are estimated from the same information, collinearity is to be expected. Roso

et al. (2005a, p. 1793) state that “Multicollinearity in-

volving breed composition can be partially explained by

the mathematical constraint among breeds because breed

proportions of the breed composition of an animal add to

1 and the breed composition of a calf is equal to average

breed compositions of the sire and the dam.” The use of

breed composition, however, is presented as a relatively

simple measure to obtain and can be interpreted as if we

were looking at the same object from several different

perspectives (Fries et al., 2000). Dias et al. (2011) men-

tioned the value of 0.5 as a threshold to empirically de-

termine a strong linear relationship between the different

components of variance. Regardless of this empirical value, the data analyzed here showed high values (above 0.80) for the fixed effects components. In this case, a further check through VIF is still possible (Schabenberger and Pierce, 2002; Roso et al., 2005a; Dias et al., 2011; Petrini et al., 2012).

Variance Inflation Factor. Figure 2 shows the VIF for all tested models and traits. The AHE and AH models, regardless of the methodology used to estimate the fixed genetic effects, show little evidence of collinearity, whether strong or moderate. Only the AHE model, when estimated by LS, presented a VIF greater than 10, and only for direct epistatic loss for the SC trait (10.66).

The AHEC model, when estimated by LS, shows VIF above 10 and sometimes above the upper threshold of 30 (110 for ca for the SC trait). The breed additive components show no VIF values that indicate moderate or severe collinearity problems in any of the analyzed models for the estimation of the fixed genetic effects.

Table 3. Decomposition of the variance structure of the parameter estimates associated with the 2 largest condition indices (model AHEC that includes direct and maternal breed additive, heterosis, epistatic loss, and complementarity effects)

                             WG2           WC2           WP2           WM2           SC2
Condition index1    21.25  10.44  21.06  10.53  20.44  10.41  20.82  10.34  28.57  14.59
aa                   0.00   0.06   0.00   0.07   0.00   0.07   0.00   0.07   0.00   0.11
am                   0.01   0.09   0.01   0.10   0.01   0.11   0.01   0.10   0.01   0.18
ha                   0.18   0.62   0.18   0.62   0.18   0.60   0.17   0.62   0.12   0.68
hm                   0.94   0.01   0.94   0.01   0.94   0.01   0.94   0.01   0.94   0.02
ea                   0.15   0.73   0.15   0.73   0.16   0.72   0.15   0.74   0.10   0.78
em                   0.84   0.09   0.83   0.09   0.83   0.10   0.84   0.09   0.89   0.06
ca                   0.18   0.65   0.18   0.65   0.19   0.64   0.18   0.65   0.12   0.70
cm                   0.96   0.03   0.96   0.03   0.96   0.04   0.96   0.03   0.97   0.02

                             PG2           PC2           PP2           PM2
Condition index     24.46  12.58  22.83  11.88  22.00  11.66  22.35  11.50
aa                   0.01   0.07   0.01   0.07   0.01   0.08   0.01   0.08
am                   0.01   0.09   0.01   0.09   0.02   0.11   0.01   0.10
ha                   0.23   0.63   0.26   0.59   0.25   0.57   0.25   0.59
hm                   0.95   0.00   0.95   0.00   0.95   0.00   0.95   0.00
ea                   0.20   0.67   0.23   0.65   0.23   0.65   0.22   0.66
em                   0.83   0.11   0.81   0.13   0.80   0.14   0.81   0.12
ca                   0.24   0.65   0.26   0.62   0.26   0.62   0.25   0.62
cm                   0.95   0.04   0.95   0.05   0.94   0.05   0.95   0.04

1aa = direct breed additive effect; am = maternal breed additive effect; ha = direct heterosis effect; hm = maternal heterosis effect; ea = direct epistatic loss effect; em = maternal epistatic loss effect; ca = direct complementarity nonadditive effect; cm = maternal complementarity nonadditive effect.

2WG = weaning gain; WC = weaning conformation; WP = weaning precocity; WM = weaning muscling; SC = scrotal circumference; PG = postweaning gain; PC = postweaning conformation; PP = postweaning precocity; PM = postweaning muscling.

Figure 1. Eigenvalues and condition index of the correlation matrix among the prediction variables of fixed genetic effects under model AHEC, which includes direct and maternal breed additive, heterosis, epistatic loss, and complementarity effects. WG = weaning gain; PG = postweaning gain; WC = weaning conformation; PC = postweaning conformation; WP = weaning precocity; PP = postweaning precocity; WM = weaning muscling; PM = postweaning muscling; SC = scrotal circumference.

The prediction variable of direct heterosis presented

values above 10 only for postweaning traits and SC, un-

der the AHEC model and LS methodology. For this same

model and methodology, the prediction variable of the complementarity component remained at a similar level for

all traits (between 10 and 23); maternal covariates, both

as complementarity and heterozygosity, presented a VIF

above 30 for all traits and the epistatic loss components

showed values above 10 and below 30 for all tested traits.

Figure 2. Variance inflation factors for the fixed genetic effects, under 3 models (AHEC including the direct and maternal breed additive, heterosis, epistatic loss, and complementarity effects, AHE including the direct and maternal breed additive, heterosis, and epistatic loss effects, and AH including the direct and maternal breed additive and heterosis effects) and 3 methodologies (least squares [LS], ridge regression 1 [R1], and ridge regression 2 [R2]). aa = direct breed additive, am = maternal breed additive, ha = direct heterosis, hm = maternal heterosis, ea = direct epistatic loss, em = maternal epistatic loss, ca = direct complementarity, and cm = maternal complementarity effects; WG = weaning gain; PG = postweaning gain; WC = weaning conformation; PC = postweaning conformation; WP = weaning precocity; PP = postweaning precocity; WM = weaning muscling; PM = postweaning muscling; SC = scrotal circumference.

If a covariate is orthogonal to all the others, its VIF is 1 (Schabenberger and Pierce, 2002). As the linear dependence increases, the VIF also increases. Dias et al. (2011) suggested that the VIF may overestimate the presence of multicollinearity, not differentiating between high and low VIF values, making it impossible to distinguish “quasi-dependence.” It is possible to use a value of 10 for VIF as a cutoff value to indicate that collinearity may be causing problems in the estimation. Also, it is possible to use values above 30 to indicate severe problems with collinearity between covariates (Schabenberger and Pierce, 2002; Schoeman et al., 2002; Roso et al., 2005a; Dias et al., 2011). Dias et al. (2011) proposed only to verify the directions (positive or negative signs) of the values of fixed genetic effects. In this study, the directions did not change with the use of different methodologies; only their magnitudes changed. This change will, however, be reflected in the breeding values, which will be discussed later.

Many studies propose ridge regression as an alter-

native to overcome the collinearity (Roso et al., 2005a;

Carvalheiro et al., 2006; Cardoso et al., 2008; Dias et

al., 2011; Petrini et al., 2012). The ridge regression

estimator consists of adding a small positive amount

on the diagonal of the W′W matrix, causing a reduc-

tion in the variance of the estimates at the expense of

introducing some bias. Therefore, the RR estimator of γ takes the general form $\hat{\gamma}_k = (W'W + K)^{-1}W'y$, in which K = diag(k1, k2, …, kp) and ki ≥ 0. When all ki elements are equal to 0, $\hat{\gamma}_k$ reduces to the LS estimator.

This method, however, usually proposes an empirical

choice of the values of K. The K matrix should be

large enough to break the existing linear relationship

between the covariates and small enough to produce

the smallest possible bias (Schabenberger and Pierce,

2002).
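The trade between bias and variance can be written out explicitly. The lines below are a sketch that assumes a plain fixed-effects regression y = Wγ + e with Var(e) = σ²ₑI, which is simpler than the mixed model [1]; the symbol σ²ₑ is introduced here only for illustration.

```latex
\begin{aligned}
E[\hat{\gamma}_k] &= (W'W + K)^{-1} W'W \, \gamma, \\
\mathrm{Bias}(\hat{\gamma}_k) &= E[\hat{\gamma}_k] - \gamma = -(W'W + K)^{-1} K \, \gamma, \\
\mathrm{Var}(\hat{\gamma}_k) &= \sigma_e^2 \, (W'W + K)^{-1} W'W \, (W'W + K)^{-1}.
\end{aligned}
```

With K = 0 the bias term vanishes and the variance is the LS variance σ²ₑ(W′W)⁻¹. Larger ki enlarge the bias term and, in the usual case, reduce the variance term, which is why K is chosen to be as small as will still break the linear dependence.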

When different methodologies are being con-

sidered to overcome the problem of collinearity,

Schabenberger and Pierce (2002) suggest the concept

of RE as an aid in decision making. Sometimes we

will have to choose between different optimal proper-

ties of estimation and it is necessary to establish clear

rules. It is not always possible to gather all the desir-

able properties in the same estimation process.

Methodologies and Models

Relative Efficiency. The RE between the tested

methodologies is presented in Table 4. The values of

the chosen θ are also presented in Table 4 and vary

from 0.062 to 0.082. For the model AHEC, methodol-

ogies R1 and R2 showed a slight (1%) superiority over

LS for the traits PP and SC. The R2 method showed

better RE compared with LS and also in relation to the

R1 method for PG. For all other traits, there is no dif-

ference in RE between tested methodologies.

For the AHE model, estimates with LS and R1 are

the same for all traits, except for SC. For this trait, R1

and R2 showed a slightly higher efficiency (1%) in

the estimation of the parameter when compared with

LS. For this model (AHE), when comparing LS with

R2, the biased estimator (R2) shows better RE just for

postweaning traits (PG, PC, PP, and PM).

With respect to the AH model, the comparison

shows slight superiority of R2 over LS for PG, PP, PM,

and SC. For the other traits, RE was equal across all methodologies.

According to the results obtained for RE, all test-

ed models have many similarities to each other, with

methodology R2 showing a slight advantage over the

other 2 for some of the postweaning traits. No refer-

ences were found with the use of RE in the choice of

methodologies for estimation of these parameters.

Table 4. Relative efficiency between least squares (LS), ridge regression 1 (R1), and ridge regression 2 (R2) methodologies for the 3 tested models

        AHEC2                                      AHE2                                       AH2
Trait1       θ3 RE4 (LS/R1)  RE (LS/R2)  RE (R1/R2)     θ3  RE (LS/R1)  RE (LS/R2)  RE (R1/R2)  RE (LS/R2)
WG        0.064        1.00        1.00        1.00                           1.00                    1.00
WC        0.064        1.00        1.00        1.00                           1.00                    1.00
WP        0.062        1.00        1.00        1.00                           1.00                    1.00
WM        0.062        1.00        1.00        1.00                           1.00                    1.00
PG        0.066        1.00        1.01        1.01                           1.01                    1.01
PC        0.064        1.00        1.00        1.00                           1.01                    1.00
PP        0.062        1.01        1.01        1.00                           1.01                    1.01
PM        0.064        1.00        1.00        1.00                           1.01                    1.01
SC        0.082        1.01        1.01        1.00  0.062        1.01        1.01        1.00        1.01

1WG = weaning gain; WC = weaning conformation; WP = weaning precocity; WM = weaning muscling; PG = postweaning gain; PC = postweaning conformation; PP = postweaning precocity; PM = postweaning muscling; SC = scrotal circumference.

2AHEC = model including the direct and maternal breed additive, heterosis, epistatic loss, and complementarity effects; AHE = model including the direct and maternal breed additive, heterosis, and epistatic loss effects; AH = model including the direct and maternal breed additive and heterosis effects.

3Values of θ relate to the R1 method. Values of θ not shown are equal to 0. For the R2 method, θ is equal to 0.06, and for the LS method, θ is equal to 0.

4RE = relative efficiency.

Comparisons of the Estimates of Fixed Genetic Effects over Time. The change in the estimated values for the fixed genetic effects is shown in Fig. 3. The number of observations available in each analyzed period is shown in Table 5. In the early years, the variation of the estimated values for each of the fixed genetic effects was large compared with the variation in recent years, regardless of the method and model used in the estimation. Overall, estimates became more stable from 2002 onward for all tested methodologies and models for the 9 analyzed traits. It is possible that the data provided a better balance of combinations of relationships from that year.

Over the years, estimates for aa, am, ha, and hm were identical for WG using LS and R1 under the AH model. For the AHE model, estimates of am, ha, ea, and em for WG were equal for LS and R1 from the year 2000. This suggests that there is no benefit in using R1 as an alternative to LS in the AHE model.

For the direct (aa) and maternal (am) breed additive

effects, the greatest variation in estimates over the years

occurred when using the AHEC model with R1. Estimates

for the direct heterosis (ha) effect varied very little over time, showing slightly larger fluctuation in AH models estimated by LS (WG) or R2 (PG). For evaluations subsequent to 2002, the estimated values of ha remain almost constant although differing between the models and methodologies used. For maternal heterosis, the largest oscillation occurs with the AHEC model under LS, showing an inverted peak in 1998.

Estimates of the direct epistatic loss effect (ea) were similar for R1 and R2 under the full model (AHEC), both for WG (Fig. 3A) and for PG (Fig. 3B), because both methods shrink the estimates; they were not identical because the 2 methods use slightly different K matrices. For the maternal epistatic loss effect (em), the behavior was not similar to that observed for the maternal heterosis estimates for either WG or PG, and em estimates were more variable across models for PG than for WG.

For the direct and maternal complementarity effects, wide variation appears when the LS methodology is used in the early years. There is evidently a large shrinkage effect on estimates of direct and maternal complementarity, especially in the early years with relatively little data. A period of stabilization began in 2000, and from 2002, the variation becomes very small, even for the LS method. Although a deviation between methods can be perceived for the trait WG (Fig. 3A), except for ca, this bias tends to remain constant, as seen from the almost parallel lines on the chart. The upward deviation for cm (LS relative to RR) appears to be compensated for by downward deviations for ea, ca, hm, and em in the WG data. In the PG data, except for the 1998 downward blip for ca before 2000, the downward deviation for cm appears to be compensated for by upward deviations for ca, hm, and em. Estimates of fixed genetic effects for the other traits showed equivalent behavior when estimated over the 16 yr.

We have shown that there is high collinearity in the AHEC model, indicated by high VIF values and high CI values. Many authors have reported that high collinearity leads to unstable estimates of coefficients with large SE (Schabenberger and Pierce, 2002; Schoeman et al., 2002; Roso et al., 2005a,b; Carvalheiro et al., 2006; Cardoso et al., 2008; Dias et al., 2011; Bueno et al., 2012). The instability is reflected in estimates that change widely when more data are added, as seen in the 1998 results (Fig. 3). Table 1 shows a very unbalanced pattern for animals with various proportions of the 2 breeds, and this was accentuated in the early years of the breeding program. As more generations were added to the database, more diverse breed compositions were added with better links over generations, and so parameter estimates became more stable. Even so, a quarter of the animals are derived from three-eighths Nellore sires crossed with three-eighths Nellore dams.
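The two diagnostics named above can be illustrated with a minimal sketch. It assumes the VIF are taken as the diagonal of the inverse correlation matrix of the covariates and the condition indices (CI) as the square root of the ratio of the largest eigenvalue of that matrix to each eigenvalue, which is one common convention and may differ in detail from the computation used in this study. The function name and the toy covariates are hypothetical and are not the breed covariates analyzed here.

```python
import numpy as np

def vif_and_condition_indices(W):
    """Return (VIFs, condition indices) for the columns of covariate matrix W."""
    R = np.corrcoef(W, rowvar=False)        # correlation matrix of the covariates
    vif = np.diag(np.linalg.inv(R))         # VIF_j = j-th diagonal element of R^-1
    eigvals = np.linalg.eigvalsh(R)         # eigenvalues of the symmetric matrix R
    cond_idx = np.sqrt(eigvals.max() / eigvals)
    return vif, cond_idx

# Toy covariates (assumption, not the paper's data): x3 is almost x1 + x2.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + 0.05 * rng.normal(size=500)
W = np.column_stack([x1, x2, x3])

vif, cond_idx = vif_and_condition_indices(W)
print("VIF:", np.round(vif, 1))
print("largest condition index:", round(float(cond_idx.max()), 1))
```

In this toy case every VIF is far above 10 because each column is almost a linear function of the other two, which is the kind of dependence that makes least squares estimates unstable.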

Several identical estimates were found. This result was expected in some cases. When the ridge parameter used in R1, which is set so that VIF ≤ 10, is 0, no ridge regression is actually applied and the method reduces to least squares. When the ridge parameter used in R1 is very close to that used by R2 (θ(R2) = 0.06 and θ(R1) = 0.062 or θ(R1) = 0.064), very close estimates are generated.
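The dependence of the estimates on the ridge parameter can be sketched as follows. This is a simplified illustration, assuming a single θ added to every diagonal element of the covariate correlation matrix and a grid search for the smallest θ giving VIF ≤ 10; it does not reproduce the K matrices of R1 and R2, and the function names, the grid step, and the toy data are assumptions made for the example.

```python
import numpy as np

def ridge_vif(R, theta):
    """VIFs of ridge estimates: diagonal of (R + theta*I)^-1 R (R + theta*I)^-1."""
    A = np.linalg.inv(R + theta * np.eye(R.shape[0]))
    return np.diag(A @ R @ A)

def smallest_theta(R, vif_max=10.0, step=0.002, theta_cap=1.0):
    """Smallest theta on a grid such that every ridge VIF is <= vif_max."""
    n_steps, theta = 0, 0.0
    while ridge_vif(R, theta).max() > vif_max and theta < theta_cap:
        n_steps += 1
        theta = n_steps * step
    return theta

def ridge_solution(R, r_xy, theta):
    """Solve (R + theta*I) b = r_xy; theta = 0 gives the least squares solution."""
    return np.linalg.solve(R + theta * np.eye(R.shape[0]), r_xy)

# Toy data (assumption): three covariates, the third almost x1 + x2.
rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
W = np.column_stack([x1, x2, x1 + x2 + 0.05 * rng.normal(size=n)])
y = W @ np.array([1.0, 0.5, 0.2]) + rng.normal(size=n)

# Scale covariates so that Z'Z is their correlation matrix.
Z = (W - W.mean(axis=0)) / W.std(axis=0, ddof=1) / np.sqrt(n - 1)
yc = y - y.mean()
R, r_xy = Z.T @ Z, Z.T @ yc

theta = smallest_theta(R)
b_ls = np.linalg.lstsq(Z, yc, rcond=None)[0]
b_theta0 = ridge_solution(R, r_xy, 0.0)   # identical to least squares
b_ridge = ridge_solution(R, r_xy, theta)  # shrunken estimates
print("theta:", theta, "max VIF:", round(float(ridge_vif(R, theta).max()), 2))
```

When the covariates are not collinear, the search returns θ = 0 and the ridge solution is the least squares solution, which mirrors the identical LS and R1 estimates described above.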

The variation identified in the AHEC model can be explained, at least in part, because the R1 method uses different values for the ridge parameter in each analyzed period. As the ridge parameter depends on the VIF values, it is calculated for each analysis, and the oscillation of the values shows higher amplitude. The most marked fluctuation over time is in the difference between the RR and LS estimates. Petrini et al. (2012) suggest, as an alternative, in addition to improving and increasing the amount of data, reducing the number of covariates in the model. This is also clear in this study when comparing the curves of the 3 analyzed models, presented in Fig. 2 and 3. It is very reasonable to do that if the SE is so large that the coefficient is not significantly different from 0. However, according to Lopes et al. (2010) and Carvalheiro et al. (2006), it is important to include the effects of epistatic loss in the evaluation model for prediction of breeding values of crossbred animals.

The inclusion of complementarity also appears to be important, although it should be further investigated (Carvalheiro et al., 2006). However, in the early years, it would have been sensible to omit ca and cm from the model until the estimates could be trusted. If the data do not provide a robust estimate of an effect, it does not seem appropriate to include it.

Table 5. Number of observations in the analysis over the years for weaning gain (WG) and postweaning gain (PG)

Year 1994 1996 1998 2000 2002 2004 2006 2008 2010

WG 48,826 54,442 75,468 105,165 149,820 193,439 227,156 275,138 294,045

PG 20,526 28,423 38,439 54,644 76,253 96,515 115,748 136,332 138,075

We could not identify a reason that would justify the reversal of the estimated values for the fixed effects in 1998. Such behavior of the estimates may come from the collected raw data or from the data structure in that generation. What we can see is that relative stability in the estimates of fixed genetic effects begins in 2000.

Schoeman et al. (2002) state that, in cases of linear dependence between the variables of the incidence matrix (W in this study), the regression coefficients become extremely unstable and “very sensitive to small random errors in Y” and may fluctuate widely with the addition or removal of variables in the model. Our data showed this behavior only in the early years, showing stability after a certain point. Petrini et al. (2012) mention this instability over time, but neither Schoeman et al. (2002) nor Petrini et al. (2012) present results relating to this variation, so it was not possible to make comparisons with other populations in this regard.

Across-Breed EBV Comparisons. Comparisons between the AB-EBV obtained by different methodologies and different models of estimation were made by calculating Pearson and Spearman correlations (Fig. 4) and the percentage of coincident animals in different percentage selections (top 1, 5, 10, 20, and 40%; Fig. 5). To allow across-breed comparison, the estimated breeding value of each animal was increased by the direct additive genetic effect of breed in proportion to the animal's breed composition. The crop presented relates only to calves born in 2009 (n = 17,694), aiming to simulate the latest genetic evaluation of the tested herd, using data from individuals with complete information on weaning and yearling traits. Comparisons of AB-EBV for bulls and cows are not displayed because they had a pattern similar to that of calves.
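The comparison statistics described above can be sketched in a few lines. This is an illustration under stated assumptions: the AB-EBV is taken as the within-breed EBV plus the estimated direct breed additive effect multiplied by a breed fraction, Spearman correlation is computed as Pearson correlation on ranks (no ties assumed), and all function names, breed fractions, and effect values are hypothetical rather than taken from the study.

```python
import numpy as np

def across_breed_ebv(ebv, breed_fraction, breed_effect):
    """EBV plus the breed additive effect weighted by breed composition."""
    return ebv + breed_fraction * breed_effect

def spearman(x, y):
    """Spearman correlation as Pearson correlation of ranks (assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]

def top_coincidence(x, y, fraction):
    """Percentage of animals in the top `fraction` of x that are also in the top of y."""
    k = max(1, int(round(fraction * len(x))))
    top_x = set(np.argsort(x)[-k:].tolist())
    top_y = set(np.argsort(y)[-k:].tolist())
    return 100.0 * len(top_x & top_y) / k

# Simulated calves (assumption): the same EBV combined with two different
# estimates of the breed additive effect, as if from two estimation methods.
rng = np.random.default_rng(2)
ebv = rng.normal(0.0, 5.0, size=2000)
fraction = rng.choice([0.0, 0.375, 0.5, 0.625, 1.0], size=2000)
ab_a = across_breed_ebv(ebv, fraction, breed_effect=-12.0)
ab_b = across_breed_ebv(ebv, fraction, breed_effect=-4.0)

print("Pearson:", round(float(np.corrcoef(ab_a, ab_b)[0, 1]), 3))
print("Spearman:", round(float(spearman(ab_a, ab_b)), 3))
for f in (0.01, 0.05, 0.10, 0.20, 0.40):
    print(f"top {f:.0%}: {top_coincidence(ab_a, ab_b, f):.1f}% coincident")
```

A high overall correlation can coexist with a lower coincidence in the top 1% class, because that class contains few animals and small shifts in the breed effect estimate can move animals across the cutoff.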

Correlations between Across-Breed EBV Obtained by Least Squares, Ridge Regression 1, and Ridge Regression 2. All correlations were high, ranging from 0.83 to 1.0 (Fig. 4). Under the AHEC model, correlations between R1 and R2 were equal to 1 because the ridge parameters used in these evaluations were similar. For the AHE and AH models, the LS and R1 estimates often agreed because the ridge parameter was 0 in many cases. The lowest correlations were found for WG, followed by PG, between LS × R1 and R2 × LS. When the correlation was close to 1, the calves in the top percentage class were predominantly the same ones (Fig. 5).

Figure 3. Fixed genetic effects estimated for (A) weaning gain (WG) and (B) postweaning gain (PG) traits estimated by 3 distinct methodologies (least squares [LS], ridge regression 1 [R1], and ridge regression 2 [R2]) and models (AHEC including the direct and maternal breed additive, heterosis, epistatic loss, and complementarity effects; AHE including the direct and maternal breed additive, heterosis, and epistatic loss effects; and AH including the direct and maternal breed additive and heterosis effects). aa = direct breed additive, am = maternal breed additive, ha = direct heterosis, hm = maternal heterosis, ea = direct epistatic loss, em = maternal epistatic loss, ca = direct complementarity, and cm = maternal complementarity effects.

Coincidences for Different Proportions of Selected Animals. Under the AHEC model, selections based on R1 and R2 coincided almost perfectly for all the tested traits (99 to 100%), as did those based on LS and R1 under the AHE and AH models (90 to 100%), except for SC. This held even when only the top 1% of animals were selected (Fig. 5). This is possibly because the ridge parameter obtained in R1 sometimes coincides with that of R2 and is sometimes 0 (LS). This is an inherent property of R1, which sets the ridge parameter according to the VIF. These results are in complete agreement with the correlations presented in Fig. 4.

Figure 4. Pearson and Spearman correlations for direct across-breed EBV for weaning gain (WG), postweaning gain (PG), weaning conformation (WC), postweaning conformation (PC), weaning precocity (WP), postweaning precocity (PP), weaning muscling (WM), postweaning muscling (PM), and scrotal circumference (SC) given by different methodologies (least squares [LS], ridge regression 1 [R1], and ridge regression 2 [R2]) and models (AHEC including the direct and maternal breed additive, heterosis, epistatic loss, and complementarity effects; AHE including the direct and maternal breed additive, heterosis, and epistatic loss effects; and AH including the direct and maternal breed additive and heterosis effects).

Figure 5. Percentage of coincidence for different proportions (top 1, 5, 10, 20, and 40%) of selected calves on the basis of direct across-breed EBV for weaning gain (WG) and postweaning gain (PG) given by different methodologies (least squares [LS], ridge regression 1 [R1], and ridge regression 2 [R2]) and models (AHEC including the direct and maternal breed additive, heterosis, epistatic loss, and complementarity effects; AHE including the direct and maternal breed additive, heterosis, and epistatic loss effects; and AH including the direct and maternal breed additive and heterosis effects). WC = weaning conformation; PC = postweaning conformation; WP = weaning precocity; PP = postweaning precocity; WM = weaning muscling; PM = postweaning muscling; SC = scrotal circumference.

When comparisons are made on the basis of the top 1% of animals, the percentages of coincidence varied between 59.14 and 99.46%. The lowest percentages of coincidence were observed for WG under the AHEC model (LS × R1 and R2 × LS, both 59.14%) and the AHE model (LS × R2, 65.59%, and R1 × R2, 66.67%). These results show that changing the methodology, for these models, caused an important reranking of the AB-EBV.

Roso et al. (2005a) found 78 to 80% of coincidences among the top 40% AB-EBV estimated by LS and ridge regression using a model equivalent to AH and around 80 to 85% for a model equivalent to AHE. In contrast, when the comparison was made for the top 1% of selected animals, the coincidence dropped to 60 to 65% in both cases. Our data showed greater coincidence in general, with few comparisons below 60% (WG [AHEC] for LS × R1 and LS × R2 and SC [AHE] for LS × R2) and the great majority above 70% for the top 1% of selected animals. These differences confirm that the choice of methodology has significant consequences for the genetic selection of the animals, resulting in different rankings of animals on the basis of AB-EBV.

Petrini et al. (2012) argued that the presence of collinearity can affect the accuracy of estimates, regardless of its intensity, and, consequently, the accuracy of inferences based on these results. This can lead to wrong choices of animals in applied breeding programs. Roso et al. (2005b) concluded that inadequate separation of nonadditive genetic effects in the evaluation model and multicollinearity in the analysis may affect the ranking of animals when they are compared across different breeds. Carvalheiro et al. (2006) suggest that ridge regression can and should be used to correct multicollinearity problems, both in the estimation of the regression effects and in the prediction of genetic values. They affirm that the prediction surfaces are more acceptable from a biological point of view under this methodology. Dias et al. (2011) found no differences in the sign (positive or negative) of the estimates for LS and RR but found differences in magnitudes and SE. They suggest that LS overestimates the values and that RR is more reliable because it decreases the SE of estimates and increases the accuracy, despite the introduction of bias.

Collinearity was found to be a problem in the analysis of our data when all 8 breed covariates were included in the model but not when direct and maternal complementarity were excluded. When complementarity was included, the R1 method proved an effective way of addressing the problem. Nevertheless, there was little impact on the additive breed component or on the within-breed EBV, so that there was high agreement between the animals that would be selected for additive effect regardless of the model.

The data did not sustain the fitting of the AHEC model when it was restricted to pre-2004 data. After that, the diversity in breed compositions represented across generations was sufficient to fit the AHEC model, but the functional relationships among the breed genotype covariates meant that method R1 was preferable.

LITERATURE CITED

Arnold, J. W., J. K. Bertrand, and L. L. Benyshek. 1992. Animal model for genetic evaluation of multibreed data. J. Anim. Sci. 70:3322–3332.

Bertoli, C. D. 1991. Sistema Cruza-Controle de produção e avaliação dos valores genéticos dentro de uma população bovina sintética. (In Portuguese.) MSc Diss., Universidade Federal do Rio Grande do Sul, Faculdade de Agronomia.

Bueno, R. S., R. D. A. Torres, J. B. S. Ferraz, P. S. Lopes, J. P. Eler, G. B. Mourão, M. Almeida e Silva, and E. C. Mattos. 2012. Métodos de estimação de efeitos genéticos não-aditivos para características de peso e perímetro escrotal em bovinos de corte mestiços. (In Portuguese.) R. Bras. Zootec. 41:1140–1145. doi:10.1590/S1516-35982012000500009.

Cardoso, V., S. A. De Queiroz, and L. A. Fries. 2008. Estimativas de efeitos genotípicos sobre os desempenhos pré e pós-desmama de populações Hereford × Nelore. (Estimates of genotypic effects on pre and post-weaning performance in Hereford × Nelore populations.) (In Portuguese.) R. Bras. Zootec. 37:1763–1773. doi:10.1590/S1516-35982008001000008.

Carvalheiro, R., E. C. G. Pimentel, V. Cardoso, S. A. Queiroz, and L. A. Fries. 2006. Genetic effects on preweaning weight gain of Nelore-Hereford calves according to different models and estimation methods. J. Anim. Sci. 84:2925–2933. doi:10.2527/jas.2006-214.

Dias, R. A. P., J. Petrini, J. B. S. Ferraz, J. P. Eler, R. S. Bueno, A. L. L. da Costa, and G. B. Mourão. 2011. Multicollinearity in genetic effects for weaning weight in a beef cattle composite population. Livest. Sci. 142:188–194. doi:10.1016/j.livsci.2011.07.016.

Efron, B. 1979. Bootstrap methods: Another look at the Jackknife. Ann. Stat. 7:1–26.

Freund, R. J., and R. C. Littell. 2000. SAS System for Regression. 3rd ed. SAS Inst. Inc., Cary, NC.

Fries, L. A., D. J. Johnston, H. Hearnshaw, and H. U. Graser. 2000. Evidence of epistatic effects on weaning weight in crossbred beef cattle. Asian-Australas. J. Anim. Sci. 13(Suppl. B):242.

Fries, L. A., F. S. Schenkel, V. M. Roso, F. V. Brito, J. L. P. Severo, and M. L. Piccoli. 2002. “Epistazygosity” and epistatic effects. In: 7th World Congr. Genet. Appl. Livest. Prod., August 19–23, Montpellier, France. Session 17. Estimation of additive and non-additive genetic parameters.

Kinghorn, B. P. 1993. Design of livestock breeding programs. Animal Genetics and Breeding Unit, The University of New England, Armidale, NSW, Australia. p. 187–203.

Lopes, J. S., P. R. N. Rorato, T. Weber, R. O. de Araújo, M. D. A. Dornelles, and J. G. Comin. 2010. Pre-weaning performance evaluation of a multibreed Aberdeen Angus × Nellore population using different genetic models. (Avaliação do desempenho na pré desmama de uma população bovina multirracial Aberdeen Angus × Nelore utili.) R. Bras. Zootec. 39:2418–2425.

Marquardt, D. W., and R. D. Snee. 1975. Ridge regression in practice. Am. Stat. 29:3–20.


Petrini, J., R. A. P. Dias, S. F. N. Pertile, J. P. Eler, J. B. S. Ferraz, and G. B. Mourão. 2012. Degree of multicollinearity and variables involved in linear dependence in additive-dominant models. Pesqui. Agropecu. Bras. 47:1743–1750. doi:10.1590/S0100-204X2012001200010.

Piccoli, M. L., V. M. Roso, F. V. Brito, J. L. P. Severo, F. S. Schenkel, and L. A. Fries. 2002. Additive, complementarity (additive*additive), dominance, and epistatic effects on preweaning weight gain of Hereford × Nelore calves. In: 7th World Congr. Genet. Appl. Livest. Prod., August 19–23, Montpellier, France. p. 2000–2003.

Pimentel, E. D. C. G., S. A. De Queiroz, R. Carvalheiro, and L. A. Fries. 2006. Estimativas de efeitos genéticos em bezerros cruzados por diferentes modelos e métodos de estimação. (In Portuguese.) R. Bras. Zootec. 35:1020–1027. doi:10.1590/S1516-35982006000400012.

Pimentel, E. D. C. G., S. A. De Queiroz, R. Carvalheiro, and L. A. Fries. 2007. Use of ridge regression for the prediction of early growth performance in crossbred calves. Genet. Mol. Biol. 30:536–544. doi:10.1590/S1415-47572007000400006.

Roso, V. M., and F. S. Schenkel. 2006. AMC – A computer program to assess the degree of connectedness among contemporary groups. In: Proc. 8th World Congr. Genet. Appl. Livest. Prod., August 13 to 18, 2006, Belo Horizonte, MG, Brazil. Communication no. 27-26.

Roso, V. M., F. S. Schenkel, and S. P. Miller. 2004. Degree of connectedness among groups of centrally tested beef bulls. Can. J. Anim. Sci. 84:37–47.

Roso, V. M., F. S. Schenkel, S. P. Miller, and L. R. Schaeffer. 2005a. Estimation of genetic effects in the presence of multicollinearity in multibreed beef cattle evaluation. J. Anim. Sci. 83:1788–1800.

Roso, V. M., F. S. Schenkel, S. P. Miller, and J. W. Wilton. 2005b. Additive, dominance, and epistatic loss effects on preweaning weight gain of crossbred beef cattle from different Bos taurus breeds. J. Anim. Sci. 83:1780–1787.

Schabenberger, O., and F. J. Pierce. 2002. Contemporary statistical models for the plant and soil sciences. Ed Taylor and Francis Group, New York, NY.

Schenkel, F. S. 1993. Calculo das heterozigoses. (In Portuguese.) GenSys Consultores Associados, Porto Alegre, Brazil.

Schoeman, S. J., M. A. Aziz, and G. F. Jordaan. 2002. The influence of multicollinearity on crossbreeding parameter estimates for weaning weight in beef cattle. S. Afr. J. Anim. Sci. 32:239–246.

Williams, J. L., I. Aguilar, R. Rekaya, and J. K. Bertrand. 2010. Estimation of breed and heterosis effects for growth and carcass traits in cattle using published crossbreeding studies. J. Anim. Sci. 88:460–466. doi:10.2527/jas.2008-1628.