Comparing methodologies to estimate fixed genetic effects and to predict genetic values for an Angus × Nellore cattle population

C. D. Bertoli,*†1 J. Braccini,† V. M. Roso‡
*Departamento de Zootecnia, Instituto Federal Catarinense Campus Camboriú (IFC-Camboriu), Camboriú, SC, Brazil; †Departamento de Zootecnia, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, RS, Brazil; and ‡GenSys Consultores Associados, Porto Alegre, RS, Brazil

ABSTRACT: The study assesses the need for and effectiveness of using ridge regression when estimating regression coefficients of covariates representing genetic effects due to breed proportion in a crossbred genetic evaluation. It also compares 2 ways of selecting the ridge parameters. A large crossbred Angus × Nellore population with 294,045 records for weaning gain and 148,443 records for postweaning gain was used. Phenotypic visual scores varying from 1 to 5 for weaning and postweaning conformation, weaning and postweaning precocity, weaning and postweaning muscling, and scrotal circumference were analyzed. Three models were used to assess the need for ridge regression, having 4, 6, and 8 genetic covariates. All 3 models included the fixed contemporary group effect and random animal, maternal, and permanent environment effects. Model AH included fixed direct and maternal breed additive and the direct and maternal heterosis covariates, model AHE also included direct and maternal epistatic loss covariates, and model AHEC further included direct and maternal complementarity effects. The normal approach is to include these covariates as fixed effects in the model. However, being all derived from breed proportions, they are highly collinear and, consequently, may be poorly estimated. Ridge regression has been proposed as a method of reducing the collinearity. We found that collinearity was not a problem for models AH and AHE. In the AHEC model, we found a high variance inflation factor (>20) associated with some maternal covariates, reflecting instability of the regression coefficients, and this instability was well addressed by ridge regression with a ridge parameter calculated from the variance inflation factor.

Key words: complementarity, crossbred beef cattle evaluation, epistatic loss, heterosis, nonadditive genetic effects, ridge regression.

© 2016 American Society of Animal Science. All rights reserved. J. Anim. Sci. 2016.94:500–513
doi:10.2527/jas2015-9344
1Corresponding author: cdbertoli@gmail.com
Received May 25, 2015.
Accepted November 9, 2015.
Published February 12, 2016

INTRODUCTION
The benefit of crossbreeding includes improvement in growth and carcass traits (Williams et al., 2010) and has been studied by many researchers over the past decade (Roso et al., 2005a,b; Carvalheiro et al., 2006; Dias et al., 2011). For the purpose of selection for production, the greatest challenge is the unbiased comparison of breeding bulls and dams of different breed compositions. In crossbred populations, effects that are assumed to be null in pure populations become important and must be taken into account. For a fair comparison, a multibreed genetic evaluation including crossbred and purebred individuals in the same data set is required, as proposed by Arnold et al. (1992).
Multibreed analysis requires the inclusion of direct and maternal breed additive, heterosis (Cardoso et al., 2008; Williams et al., 2010), epistatic loss (Dias et al., 2011), and between-breed complementarity (Carvalheiro et al., 2006; Cardoso et al., 2008) effects. However, this model may be difficult to fit if the data structure does not adequately sample all the genetic relationships. The additive genetic effect for each breed involved and their combining ability, general or specific, should be considered. The nonadditive genetic effects are usually included as covariates (Carvalheiro et al., 2006; Dias et al., 2011). The inclusion of these effects as fixed covariates may lead to problems with collinearity; the high correlations among the covariates may result in poor estimates with large SE (Schabenberger and Pierce, 2002). Ridge regression is a numerical modification to least squares regression that seeks to address this problem.
The objective of this study is to compare the least squares and ridge regression methodologies in the estimation of fixed genetic effects and in the prediction of breeding values for a crossbred population of Angus and Nellore beef cattle. We also compare 2 ways of setting the ridge parameter to use.
MATERIALS AND METHODS
Animal Care and Use Committee approval was not
obtained for this study because the data were obtained
from an existing database.
Data from animals with different breed compositions, resulting from crosses between Angus and Nellore cattle, were used. These data came from more than 200 herds distributed across the Brazilian states of Rio Grande do Sul, Paraná, São Paulo, Mato Grosso, Mato Grosso do Sul, and Goiás and also from Paraguay. The change in the estimated values of the fixed genetic effects for weaning gain (WG) and postweaning gain (PG) was tracked over 16 yr of genetic evaluations, with estimates taken every 2 yr (1994–2010). All herds were participants of Programa Natura de Melhoramento Genético de Bovinos (Natura Cattle Breeding Program, Brazil; Table 1).
A connectedness analysis between contemporary groups was performed according to Roso and Schenkel (2006). The total number of genetic links between contemporary groups, due to all common ancestors (sire, dam, grand sire, grand dam, etc.) weighted by the additive relationships, was used. Contemporary groups with at least 5 genetic links were considered connected and retained for analysis. Roso et al. (2004) reported that as the degree of connectedness among test groups decreases, the accuracy of comparisons of estimated breeding values (EBV) of bulls in different test groups also decreases.
The 8 genetic breed covariates were direct and maternal covariates for additive effect, heterosis, epistatic loss, and complementarity. The proportions of Nellore genes in the genetic composition of the animal (aa) and of its dam (am) were used as covariates to estimate the direct and maternal breed additive genetic effects. To estimate the direct and maternal heterosis effects, the heterozygosity coefficients ha and hm were used as described by Bertoli (1991) and Schenkel (1993) and also used by Cardoso et al. (2008), Pimentel et al. (2007), and Roso et al. (2005a). These coefficients are given by $h_m = 1 - [(N_{MaternalGrandSire} \times N_{MaternalGrandDam}) + (A_{MaternalGrandSire} \times A_{MaternalGrandDam})]$ and $h_a = 1 - [(N_{Sire} \times N_{Dam}) + (A_{Sire} \times A_{Dam})]$, in which A refers to the Angus breed proportion and N refers to the Nellore breed proportion. To estimate the direct and maternal
epistatic loss effects, the epistazygosity coefficients ea and em were used, as proposed by Fries et al. (2000, 2002) and also used by Roso et al. (2005a,b), Pimentel et al. (2006), Carvalheiro et al. (2006), and Cardoso et al. (2008). These coefficients are given by $e_a = \frac{1}{2}(H_s + H_d)$ and $e_m = \frac{1}{2}(H_{mgs} + H_{mgd})$, in which Hs is the sire heterozygosity, Hd is the dam heterozygosity, Hmgs is the maternal grand sire heterozygosity, and Hmgd is the maternal grand dam heterozygosity. When the breed composition of a cow was not known (but all of its progeny had known breed composition), the cow was assumed to result from inter se mating. Finally, the coefficients proposed by Kinghorn (1993) and also used by Fries et al. (2000), Piccoli et al. (2002), and Cardoso et al. (2008) were used to estimate the breed complementarity effects. The direct complementarity (ca) coefficient is described as $c_a = a_a \times (1.0 - a_a)$ and the maternal complementarity (cm) coefficient is described as $c_m = a_m \times (1.0 - a_m)$, in which aa is the Nellore fraction of the animal breed composition and am is the Nellore fraction of the dam breed composition.
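As a sketch of how the 8 covariates defined above could be computed for one animal, the Python function below takes the Nellore fractions of the sire, dam, maternal grand sire, and maternal grand dam together with the heterozygosities of those 4 ancestors. The function and argument names are illustrative assumptions and are not taken from the original analysis programs. Heterozygosity is taken as 1 minus the sum of the products of the parental Nellore and Angus fractions, and the Nellore fraction of the animal is assumed to be the average of its parents' fractions.

```python
def heterozygosity(n_father: float, n_mother: float) -> float:
    """Heterozygosity of the offspring of two parents in a two-breed cross.

    n_father and n_mother are Nellore fractions; the Angus fraction is 1 - N.
    """
    a_father, a_mother = 1.0 - n_father, 1.0 - n_mother
    return 1.0 - (n_father * n_mother + a_father * a_mother)


def breed_covariates(n_sire, n_dam, n_mgs, n_mgd, h_sire, h_dam, h_mgs, h_mgd):
    """Return the 8 covariates (aa, am, ha, hm, ea, em, ca, cm) for one animal."""
    aa = 0.5 * (n_sire + n_dam)  # Nellore fraction of the animal (assumed parental average)
    am = n_dam                   # Nellore fraction of the dam
    return {
        "aa": aa,
        "am": am,
        "ha": heterozygosity(n_sire, n_dam),  # direct heterosis coefficient
        "hm": heterozygosity(n_mgs, n_mgd),   # maternal heterosis coefficient
        "ea": 0.5 * (h_sire + h_dam),         # direct epistatic loss coefficient
        "em": 0.5 * (h_mgs + h_mgd),          # maternal epistatic loss coefficient
        "ca": aa * (1.0 - aa),                # direct complementarity coefficient
        "cm": am * (1.0 - am),                # maternal complementarity coefficient
    }


# Hypothetical example: F1 sire (1/2 Nellore, heterozygosity 1) mated to a purebred Angus dam.
example = breed_covariates(n_sire=0.5, n_dam=0.0, n_mgs=0.0, n_mgd=0.0,
                           h_sire=1.0, h_dam=0.0, h_mgs=0.0, h_mgd=0.0)
```

In this hypothetical mating the calf is 1/4 Nellore, so the direct covariates are nonzero while all maternal covariates are 0 because the dam is purebred.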
Nine traits were analyzed: WG and PG; phenotypic
scores of weaning conformation (WC), weaning precoc-
ity (WP), and weaning muscling (WM); and phenotypic
scores of postweaning conformation (PC), postweaning
precocity (PP), and postweaning muscling (PM) as well
as scrotal circumference (SC) taken after weaning. The
phenotypic score for each trait is given on a 5-point scale,
in which 1 is the worst and 5 is the best score for each
management group.
The general model is described by Eq. [1]. The trait pairs WG-PG, WC-PC, WP-PP, WM-PM, and WG-SC were analyzed in bivariate analyses using the general model for each trait:
y = Xβ + Wγ + Zα + ε, [1]
in which y is the vector of observations for the trait; β is the vector of fixed environmental effects, which includes contemporary group; γ is the vector of fixed genetic effects; α is the vector of random direct additive genetic, maternal additive genetic, and dam permanent environmental effects; and ε is the vector of random residual effects. Incidence matrices X, W, and Z relate records to the fixed environmental effect of contemporary group (CG; herd, sex, year, and season), to the fixed genetic effects, and to the random direct and maternal additive genetic and permanent environmental effects, respectively.
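The vectors of random effects α and ε were assumed to have variance and covariance

$$
V(\alpha) =
\begin{bmatrix}
A \otimes G_a & A \otimes G_{am} & 0 \\
A \otimes G_{am} & A \otimes G_m & 0 \\
0 & 0 & I \otimes P_e
\end{bmatrix}
\quad \text{and} \quad
V(\varepsilon) = I \otimes R,
$$

in which A is the additive numerator relationship matrix among animals and I is the identity matrix,

$$
G_a = \begin{bmatrix} \sigma^2_{a_1} & \sigma_{a_1 a_2} \\ \sigma_{a_1 a_2} & \sigma^2_{a_2} \end{bmatrix}, \quad
G_m = \begin{bmatrix} \sigma^2_{m_1} & \sigma_{m_1 m_2} \\ \sigma_{m_1 m_2} & \sigma^2_{m_2} \end{bmatrix}, \quad
G_{am} = \begin{bmatrix} \sigma_{a_1 m_1} & \sigma_{a_1 m_2} \\ \sigma_{a_2 m_1} & \sigma_{a_2 m_2} \end{bmatrix},
$$

$$
P_e = \begin{bmatrix} \sigma^2_{p_1} & \sigma_{p_1 p_2} \\ \sigma_{p_1 p_2} & \sigma^2_{p_2} \end{bmatrix}, \quad \text{and} \quad
R = \begin{bmatrix} \sigma^2_{e_1} & \sigma_{e_1 e_2} \\ \sigma_{e_1 e_2} & \sigma^2_{e_2} \end{bmatrix},
$$

in which σ2 refers to variance, σ refers to covariance, a refers to direct additive genetic effects, m refers to maternal additive genetic effects, p refers to permanent environmental effects, e refers to residual effects, and 1 refers to the first trait and 2 refers to the second trait in a 2-trait analysis.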
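To illustrate the Kronecker product structure of these covariance matrices, the following sketch builds A ⊗ Ga and I ⊗ R with NumPy for 2 animals and 2 traits. All numerical values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import numpy as np

# Hypothetical additive relationship matrix for a parent and its offspring.
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
# Hypothetical direct additive genetic (co)variances for trait 1 and trait 2.
G_a = np.array([[4.0, 1.0],
                [1.0, 9.0]])
# Hypothetical residual (co)variances for the same 2 traits.
R = np.array([[10.0, 2.0],
              [2.0, 20.0]])

# 4 x 4 matrix made of 2 x 2 blocks A[i, j] * G_a (animals outside, traits inside).
V_direct = np.kron(A, G_a)
# Block-diagonal matrix with one copy of R per animal.
V_residual = np.kron(np.eye(2), R)
```

The off-diagonal block of `V_direct` equals 0.5 × Ga, the genetic covariance between the 2 related animals, whereas residuals of different animals are uncorrelated.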
Three versions of this model were considered
(Table 2). The base model, AH, contained 4 covari-
ables in W, the direct and maternal additive effects and
heterosis effects. The second model, AHE, contained
6 covariables in W, adding the direct and maternal
epistatic loss variables, and the third model, AHEC,
contained 8 covariables in W, adding the direct and
maternal complementarity variables.
The models described above were analyzed using the least squares (LS) and ridge regression methodologies (ridge regression 1 [R1] and ridge regression 2 [R2]) in a 2-trait analysis. Data were preadjusted for fixed effects of animal age, dam age, and birth date (Julian). GenSys Consultores Associados (Porto Alegre-RS, Brazil) developed the analysis programs in Fortran 95.
The ridge regression methodology (RR) is “…an
ad hoc regression method to combat collinearity. The
ridge regression estimator allows for some bias to break
the collinearity and thus reduce the mean square error
compared with ordinary least squares. The user must
choose the ridge parameter, a small number by which to
shrink the least squares estimates” (Schabenberger and
Pierce, 2002). The ridge regression estimator consists of adding a small positive number to the diagonal of the W′W matrix to break the dependence among the columns of the W′W matrix. No clear rule exists to choose a ridge parameter so it must be empirically determined (Dias et al., 2011). Two methods of determining the ridge parameter, that is, the values of the nonnegative diagonal elements of K (k1, k2, …, kp), were used in this study.

Table 1. Distributions of animals according to sire and dam breed composition, presented as percentage of Nellore, used in two-trait weaning–postweaning gain analysis¹

| Sires \ Dams | Angus | 1/8 Nellore | 2/8 Nellore | 3/8 Nellore | 4/8 Nellore | 5/8 Nellore | 6/8 Nellore | 7/8 Nellore | Nellore | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Angus | 27,795 | 96 | 698 | 3,164 | 6,219 | 4,048 | 20,243 | 194 | 28,330 | 90,787 |
| 1/8 Nellore | 52 | 5 | 24 | 10 | 71 | 0 | 2 | 0 | 17 | 181 |
| 2/8 Nellore | 85 | 206 | 760 | 1,434 | 6,780 | 163 | 3,520 | 1 | 502 | 13,451 |
| 3/8 Nellore | 1,305 | 371 | 2,238 | 78,159 | 43,194 | 2,690 | 6,061 | 148 | 18,713 | 152,879 |
| 4/8 Nellore | 47 | 125 | 96 | 687 | 4,066 | 218 | 1,587 | 28 | 5,947 | 12,801 |
| 5/8 Nellore | 1,170 | 0 | 14 | 443 | 1,046 | 1,186 | 1,084 | 21 | 1,053 | 6,017 |
| 6/8 Nellore | 8,668 | 4 | 5 | 697 | 466 | 181 | 285 | 3 | 185 | 10,494 |
| 7/8 Nellore | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 13 | 18 |
| Nellore | 241 | 31 | 52 | 1,341 | 4,726 | 236 | 738 | 51 | 1 | 7,417 |
| Total | 39,363 | 838 | 3,887 | 85,935 | 66,573 | 8,722 | 33,520 | 446 | 54,761 | 294,045 |

¹Values are approximated to the closest class composition. Every class included fractions equal or smaller than the mentioned breed proportions.

Table 2. Fixed genetic effects included in the 3 genotypic models (AH including the direct and maternal breed additive and heterosis effects, AHE including the direct and maternal breed additive, heterosis and epistatic loss effects and AHEC including the direct and maternal breed additive, heterosis, epistatic loss and complementarity effects) considered in this study

| Model | aa | am | ha | hm | ea | em | ca | cm |
|---|---|---|---|---|---|---|---|---|
| AH | x | x | x | x | | | | |
| AHE | x | x | x | x | x | x | | |
| AHEC | x | x | x | x | x | x | x | x |

Columns are the effects included in γ¹.
¹aa = direct breed additive effect; am = maternal breed additive effect; ha = direct heterosis effect; hm = maternal heterosis effect; ea = direct epistatic loss effect; em = maternal epistatic loss effect; ca = direct complementarity nonadditive effect; cm = maternal complementarity nonadditive effect.
The first method (R1) was adapted from Roso et al. (2005a) and estimated as $k_i = \theta (VIF_i / VIF_{max})^{1/2}$, in which ki is the ith element of the diagonal matrix K, VIFi is the variance inflation factor (VIF) of the ith predictor variable, and VIFmax is the maximum value of all VIFi. The VIF was given by $VIF_i = 1/(1 - R_i^2)$, in which Ri2 is the coefficient of determination obtained by regressing the ith predictor variable on the other predictor variables (Schabenberger and Pierce, 2002). The magnitude of the elements ki is therefore proportional to the square root of the VIF of each predictor variable. To choose the value of θ, the bootstrap procedure originally proposed by Efron (1979) and described by Roso et al. (2005b) was performed. This is an iterative process, with the value of θ ranging from 0.000 to 1.000 in increments of 0.002. For each value of θ, 10 bootstrap samples were used to estimate all effects of model [1] as well as the VIF, and the VIF were averaged over the 10 bootstrap samples. The lowest value of θ for which the average VIF was below 10 for all fixed genetic effects was used in each analysis. With this value of θ, a new analysis was performed on the complete data set for the estimation of all effects according to model [1] and its reductions. Carvalheiro et al. (2006) used a similar method, choosing different values where all VIF also became lower than 10. The VIF of the regression coefficients is a simple but effective measure for diagnosing collinearity (Schabenberger and Pierce, 2002).
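The search for θ in method R1 can be sketched as follows. This is a simplified illustration, not the authors' Fortran implementation: it works directly on a hypothetical correlation matrix of 3 covariates, omits the bootstrap averaging, and assumes that the VIF under ridge regression are the diagonal elements of (R + K)⁻¹R(R + K)⁻¹, a standard expression for standardized predictors that the text does not state explicitly. All function names are hypothetical.

```python
import numpy as np


def vif_least_squares(R):
    """VIF_i = 1 / (1 - R_i^2), the ith diagonal element of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(R))


def r1_ridge_parameters(vif, theta):
    """k_i = theta * sqrt(VIF_i / VIF_max)."""
    return theta * np.sqrt(vif / vif.max())


def vif_ridge(R, k):
    """Assumed ridge VIF: diagonal of (R + K)^-1 R (R + K)^-1 with K = diag(k)."""
    M = np.linalg.inv(R + np.diag(k))
    return np.diag(M @ R @ M)


def smallest_theta(R, vif_limit=10.0, step=0.002):
    """Lowest theta on the grid 0.000, 0.002, ..., 1.000 with all ridge VIF below the limit."""
    vif = vif_least_squares(R)
    for theta in np.arange(0.0, 1.0 + step / 2, step):
        if np.all(vif_ridge(R, r1_ridge_parameters(vif, theta)) < vif_limit):
            return float(theta)
    return None


# Hypothetical correlation matrix: covariates 1 and 2 nearly collinear, covariate 3 orthogonal.
R = np.array([[1.00, 0.99, 0.00],
              [0.99, 1.00, 0.00],
              [0.00, 0.00, 1.00]])
vif_ls = vif_least_squares(R)
theta = smallest_theta(R)
```

Because ki scales with the square root of VIFi, the 2 collinear covariates receive the full θ whereas the orthogonal covariate receives a much smaller ridge parameter.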
For the second method (R2), θ was chosen empirically as the value at which graphical plots showed the estimates of the fixed genetic effects stabilizing across the values of θ used. Cardoso et al. (2008) and Piccoli et al. (2002) used the value of 0.06 and Lopes et al. (2010) used 0.05 for the diagonal of K. For this analysis, the chosen value of θ was 0.06.
In the present study, the RR analysis was per-
formed after standardization (centering and scaling)
of predictor variables, as recommended by Marquardt
and Snee (1975) and Freund and Littell (2000). After
estimation, the estimates were transformed to the orig-
inal scale and were presented in this way.
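A minimal sketch of this standardize, estimate, and back-transform sequence is given below. It assumes unit-length scaling (so that the cross-product of the standardized covariates is their correlation matrix); the text does not specify the scaling used, and the function names and simulated data are hypothetical.

```python
import numpy as np


def standardize(W):
    """Center each column and scale it to unit length."""
    centered = W - W.mean(axis=0)
    scale = np.sqrt((centered ** 2).sum(axis=0))
    return centered / scale, scale


def ridge_on_original_scale(W, y, k):
    """Ridge estimates on standardized covariates, returned on the original scale."""
    Ws, scale = standardize(W)
    gamma_std = np.linalg.solve(Ws.T @ Ws + np.diag(k), Ws.T @ (y - y.mean()))
    return gamma_std / scale  # undo the scaling of each covariate


# Hypothetical data: 3 covariates on a 0 to 1 scale, as breed-proportion covariates would be.
rng = np.random.default_rng(1)
W = rng.uniform(0.0, 1.0, size=(100, 3))
y = 2.0 + W @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
slopes_k0 = ridge_on_original_scale(W, y, np.zeros(3))
```

With all ki set to 0, the back-transformed estimates coincide with the ordinary least squares slopes from a model with an intercept, which is a convenient check that the transformation is applied correctly.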
Another measure of multicollinearity presented by Schabenberger and Pierce (2002) is the condition index (CI), the square root of the ratio of the largest eigenvalue to each eigenvalue of the correlation matrix formed from WC′WC:
$$CI_i = (\lambda_{max} / \lambda_i)^{1/2},$$
in which λmax is the largest eigenvalue and λi is the ith eigenvalue of the correlation matrix.
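The condition indices can be obtained directly from the eigenvalues of the correlation matrix, as in the sketch below. The correlation matrix is hypothetical and the function name is an assumption.

```python
import numpy as np


def condition_indices(R):
    """CI_i = sqrt(lambda_max / lambda_i) for each eigenvalue of the correlation matrix R."""
    eigenvalues = np.linalg.eigvalsh(R)  # ascending order for a symmetric matrix
    return np.sqrt(eigenvalues.max() / eigenvalues)


# Hypothetical correlation matrix with 2 nearly collinear covariates.
R = np.array([[1.00, 0.99, 0.00],
              [0.99, 1.00, 0.00],
              [0.00, 0.00, 1.00]])
ci = condition_indices(R)          # the first entry belongs to the smallest eigenvalue
ci_orthogonal = condition_indices(np.eye(3))
```

For fully orthogonal covariates every CI equals 1, and the index grows as the smallest eigenvalue approaches 0.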
Small eigenvalues result in high CI, indicating a
potential collinearity problem. In the current investi-
gation, multicollinearity diagnostics were performed
using different measures: VIF, CI, and variance-de-
composition proportions associated with eigenvalues.
To compare the methodologies and models, we used the relative efficiency (RE), the stability of the estimates over several generations, and the comparison between the across-breed EBV (AB-EBV). Unbiasedness of an estimator may not necessarily be the most desirable property. An estimator with a small bias but with high accuracy may be preferred when the unbiased estimator has a high variance. Schabenberger and Pierce (2002) proposed RE, based on the mean square error (MSE) of the functions that generated such estimators. The RE of f(y) compared with g(y) as estimators of a parameter λ is measured by the ratio of their MSE: RE[f(y), g(y)|λ] = MSE[g(y), λ]/MSE[f(y), λ]. These authors suggest that if RE[f(y), g(y)|λ] > 1, then f(y) should be preferred.
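As a small numerical illustration of this ratio, the sketch below computes RE from repeated estimates of a known parameter. The estimates are made-up numbers and the function names are hypothetical.

```python
def mean_square_error(estimates, true_value):
    """Average squared deviation of the estimates from the true parameter value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


def relative_efficiency(f_estimates, g_estimates, true_value):
    """RE[f, g | lambda] = MSE[g, lambda] / MSE[f, lambda]; values above 1 favor f."""
    return mean_square_error(g_estimates, true_value) / mean_square_error(f_estimates, true_value)


# Hypothetical repeated estimates of a parameter whose true value is 2.0.
f_estimates = [1.9, 2.1, 2.0, 2.0]   # tightly clustered around the true value
g_estimates = [1.0, 3.0, 2.5, 1.5]   # centered on the true value but highly variable
re = relative_efficiency(f_estimates, g_estimates, true_value=2.0)
```

Here `re` is well above 1, so f(y) would be preferred even though both sets of estimates average to the true value.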
A simulation of data accumulation was performed to demonstrate the variation of the estimates of fixed genetic effects over time, obtained by methodologies LS, R1, and R2 and models AHEC, AHE, and AH. The ridge parameter ki, i = 1, 2, …, p, was determined using records accumulated from the beginning until 1994, 1995, …, 2010 and was used in the estimation of fixed genetic effects for each period. Ridge regression methods and LS were then compared with respect to stability of estimates over years.
The AB-EBV are the EBV added to the direct breed
additive effect, proportional to its breed composition.
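One reading of this definition is that each animal's EBV is shifted by the estimated direct breed additive coefficient multiplied by the animal's Nellore fraction. The sketch below follows that reading only as an assumption; the function name, argument names, and numbers are hypothetical.

```python
def across_breed_ebv(ebv: float, nellore_fraction: float, breed_additive_estimate: float) -> float:
    """EBV plus the direct breed additive effect scaled by the animal's Nellore fraction."""
    return ebv + nellore_fraction * breed_additive_estimate


# Hypothetical animal: EBV of 5.0, 3/8 Nellore, estimated direct breed additive effect of -8.0.
ab_ebv = across_breed_ebv(ebv=5.0, nellore_fraction=0.375, breed_additive_estimate=-8.0)
```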
RESULTS AND DISCUSSION
Multicollinearity
Eigenvalues and Condition Index. Figure 1 shows the eigenvalues of the correlation matrix among the prediction variables of the fixed genetic effects and the corresponding CI for all traits under the AHEC model. This model simultaneously considers all covariates tested in this study and is the most likely model to present problems of collinearity between covariates. For every trait, all CI were lower than 30. The CI associated with the smallest eigenvalue were between 20 and 30 for all traits, and the CI associated with the second smallest eigenvalue were greater than 10. When we consider the AHE and AH models (not shown), all CI were lower than 10 (the highest CI was 7.71 for SC in the AHE model).
Dias et al. (2011) considered that collinearity is
strong when CI is greater than 30 and weak when it
is between 10 and 30. For CI below 10, any linear
dependence that exists should not be problematic
(Schabenberger and Pierce, 2002; Roso et al., 2005a;
Dias et al., 2011). The results presented here show val-
ues within this range of 10 to 30.
The variance-decomposition proportions associ-
ated with the largest CI, presented in Table 3, suggest
that the strongest collinearity problem is between the
covariates used to estimate the effects of the maternal
components of heterosis, epistatic loss, and comple-
mentarity. When the largest CI was analyzed, these
covariates showed variance fractions greater than
80% for all traits. Observing the second largest CI,
the strongest association is between covariates that
are estimating the direct components of the same ef-
fects (heterosis, epistatic loss, and complementarity)
involving fractions around 60 to 70%. This is the best
method to detect collinearity between covariates.
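A sketch of how variance-decomposition proportions can be computed from the eigen decomposition of the correlation matrix is shown below. It follows the usual construction in which the variance of the kth coefficient is proportional to the sum over eigenvalues of v_kj²/λ_j; whether the authors used exactly this construction is an assumption, and the correlation matrix and function name are hypothetical.

```python
import numpy as np


def variance_decomposition(R):
    """Condition indices and variance-decomposition proportions from a correlation matrix.

    Returns (cond_index, proportions), where proportions[k, j] is the share of the
    variance of coefficient k associated with eigenvalue j (each row sums to 1).
    """
    eigenvalues, eigenvectors = np.linalg.eigh(R)  # ascending eigenvalues, eigenvectors in columns
    phi = eigenvectors ** 2 / eigenvalues          # phi[k, j] = v_kj^2 / lambda_j
    proportions = phi / phi.sum(axis=1, keepdims=True)
    cond_index = np.sqrt(eigenvalues.max() / eigenvalues)
    return cond_index, proportions


# Hypothetical correlation matrix: covariates 1 and 2 nearly collinear, covariate 3 orthogonal.
R = np.array([[1.00, 0.99, 0.00],
              [0.99, 1.00, 0.00],
              [0.00, 0.00, 1.00]])
ci, props = variance_decomposition(R)
```

In this hypothetical case the column of `props` belonging to the largest CI carries almost all of the variance of the 2 collinear covariates and none of the third, which is the pattern used in Table 3 to identify which covariates are involved in each near dependency.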
When we try to estimate different effects from the same information, collinearity is to be expected. Roso et al. (2005a, p. 1793) state that “Multicollinearity involving breed composition can be partially explained by the mathematical constraint among breeds because breed proportions of the breed composition of an animal add to 1 and the breed composition of a calf is equal to average breed compositions of the sire and the dam.” The use of breed composition, however, is a relatively simple measure to obtain and can be interpreted as looking at the same object from several different perspectives (Fries et al., 2000). Dias et al. (2011) mentioned the value of 0.5 as a threshold to empirically determine a strong linear relationship between the different components of variance. Independent of this empirical value, the data analyzed here showed high values (above 0.80) for the fixed effects components. In this case, a further check through the VIF is still possible (Schabenberger and Pierce, 2002; Roso et al., 2005a; Dias et al., 2011; Petrini et al., 2012).
Variance Inflation Factor. Figure 2 shows the VIF for all tested models and traits. The AHE and AH models, regardless of the methodology used to estimate the fixed genetic effects, show little collinearity, neither strong nor moderate. Only the AHE model, when estimated by LS, presented a VIF greater than 10, and only for direct epistatic loss in the SC trait (10.66).
The AHEC model, when estimated by LS, shows VIF above 10 and sometimes above the upper threshold of 30 (110 for ca for the SC trait). The breed additive components show no VIF values that indicate moderate or severe collinearity problems in any of the analyzed models for the estimation of the fixed genetic effects.
Table 3. Decomposition of the variance structure of the parameter estimates associated with the 2 largest condition indices (model AHEC that includes direct and maternal breed additive, heterosis, epistatic loss and complementarity effects)

| Covariate¹ | WG² | WG | WC² | WC | WP² | WP | WM² | WM | SC² | SC |
|---|---|---|---|---|---|---|---|---|---|---|
| Condition index | 21.25 | 10.44 | 21.06 | 10.53 | 20.44 | 10.41 | 20.82 | 10.34 | 28.57 | 14.59 |
| aa | 0.00 | 0.06 | 0.00 | 0.07 | 0.00 | 0.07 | 0.00 | 0.07 | 0.00 | 0.11 |
| am | 0.01 | 0.09 | 0.01 | 0.10 | 0.01 | 0.11 | 0.01 | 0.10 | 0.01 | 0.18 |
| ha | 0.18 | 0.62 | 0.18 | 0.62 | 0.18 | 0.60 | 0.17 | 0.62 | 0.12 | 0.68 |
| hm | 0.94 | 0.01 | 0.94 | 0.01 | 0.94 | 0.01 | 0.94 | 0.01 | 0.94 | 0.02 |
| ea | 0.15 | 0.73 | 0.15 | 0.73 | 0.16 | 0.72 | 0.15 | 0.74 | 0.10 | 0.78 |
| em | 0.84 | 0.09 | 0.83 | 0.09 | 0.83 | 0.10 | 0.84 | 0.09 | 0.89 | 0.06 |
| ca | 0.18 | 0.65 | 0.18 | 0.65 | 0.19 | 0.64 | 0.18 | 0.65 | 0.12 | 0.70 |
| cm | 0.96 | 0.03 | 0.96 | 0.03 | 0.96 | 0.04 | 0.96 | 0.03 | 0.97 | 0.02 |

| Covariate¹ | PG² | PG | PC² | PC | PP² | PP | PM² | PM |
|---|---|---|---|---|---|---|---|---|
| Condition index | 24.46 | 12.58 | 22.83 | 11.88 | 22.00 | 11.66 | 22.35 | 11.50 |
| aa | 0.01 | 0.07 | 0.01 | 0.07 | 0.01 | 0.08 | 0.01 | 0.08 |
| am | 0.01 | 0.09 | 0.01 | 0.09 | 0.02 | 0.11 | 0.01 | 0.10 |
| ha | 0.23 | 0.63 | 0.26 | 0.59 | 0.25 | 0.57 | 0.25 | 0.59 |
| hm | 0.95 | 0.00 | 0.95 | 0.00 | 0.95 | 0.00 | 0.95 | 0.00 |
| ea | 0.20 | 0.67 | 0.23 | 0.65 | 0.23 | 0.65 | 0.22 | 0.66 |
| em | 0.83 | 0.11 | 0.81 | 0.13 | 0.80 | 0.14 | 0.81 | 0.12 |
| ca | 0.24 | 0.65 | 0.26 | 0.62 | 0.26 | 0.62 | 0.25 | 0.62 |
| cm | 0.95 | 0.04 | 0.95 | 0.05 | 0.94 | 0.05 | 0.95 | 0.04 |

For each trait, the first column refers to the largest condition index and the second column to the second largest.
¹aa = direct breed additive effect; am = maternal breed additive effect; ha = direct heterosis effect; hm = maternal heterosis effect; ea = direct epistatic loss effect; em = maternal epistatic loss effect; ca = direct complementarity nonadditive effect; cm = maternal complementarity nonadditive effect.
²WG = weaning gain; WC = weaning conformation; WP = weaning precocity; WM = weaning muscling; SC = scrotal circumference; PG = postweaning gain; PC = postweaning conformation; PP = postweaning precocity; PM = postweaning muscling.
Figure 1. Eigenvalues and condition index of the correlation matrix among the prediction variables of fixed genetic effects under model AHEC that includes direct and maternal breed additive, heterosis, epistatic loss and complementarity effects. WG = weaning gain; PG = postweaning gain; WC = weaning conformation; PC = postweaning conformation; WP = weaning precocity; PP = postweaning precocity; WM = weaning muscling; PM = postweaning muscling; SC = scrotal circumference.
The prediction variable of direct heterosis presented values above 10 only for postweaning traits and SC, under the AHEC model and LS methodology. For this same model and methodology, the VIF of the prediction variable of the complementarity component remained at the same level for all traits (between 10 and 23); the maternal covariates, for both complementarity and heterozygosity, presented a VIF above 30 for all traits; and the epistatic loss components showed values above 10 and below 30 for all tested traits.
If a covariate is orthogonal to all the others, its VIF is 1 (Schabenberger and Pierce, 2002). As the linear dependence increases, the VIF also increases. Dias et al. (2011) suggested that the VIF may overestimate the presence of multicollinearity, not differentiating between high and low VIF values, making it impossible to distinguish “quasi-dependence.” It is possible to use a value of 10 for VIF as a cutoff value to indicate that collinearity may be causing problems in the estimation. Also, it is possible to use values above 30 to indicate severe problems with collinearity between covariates (Schabenberger and Pierce, 2002; Schoeman et al., 2002; Roso et al., 2005a; Dias et al., 2011). Dias et al. (2011) proposed only to verify the directions (positive or negative signs) of the values of fixed genetic effects. In this study, the directions did not change with the use of different methodologies; only their magnitudes changed. This, however, will be reflected in the breeding values, which will be discussed later.

Figure 2. Variance inflation factors for the fixed genetic effects, under 3 models (AHEC including the direct and maternal breed additive, heterosis, epistatic loss and complementarity effects, AHE including the direct and maternal breed additive, heterosis and epistatic loss effects and AH including the direct and maternal breed additive and heterosis effects) and 3 methodologies (least squares [LS], ridge regression 1 [R1], and ridge regression 2 [R2]). aa = direct breed additive, am = maternal breed additive, ha = direct heterosis, hm = maternal heterosis, ea = direct epistatic loss, em = maternal epistatic loss, ca = direct complementarity and cm = maternal complementarity effects; WG = weaning gain; PG = postweaning gain; WC = weaning conformation; PC = postweaning conformation; WP = weaning precocity; PP = postweaning precocity; WM = weaning muscling; PM = postweaning muscling; SC = scrotal circumference.
Many studies propose ridge regression as an alternative to overcome collinearity (Roso et al., 2005a; Carvalheiro et al., 2006; Cardoso et al., 2008; Dias et al., 2011; Petrini et al., 2012). The ridge regression estimator consists of adding a small positive amount to the diagonal of the W′W matrix, causing a reduction in the variance of the estimates at the expense of introducing some bias. Therefore, the RR estimator of γ takes the general form $\hat{\gamma}_k = (W'W + K)^{-1} W'y$, in which $K = \mathrm{diag}(k_1, k_2, \ldots, k_p)$ and $k_i \geq 0$. When all ki elements are equal to 0, $\hat{\gamma}_k$ reduces to the LS estimator. This method, however, usually relies on an empirical choice of the values of K. The K matrix should be large enough to break the existing linear relationship between the covariates and small enough to produce the smallest possible bias (Schabenberger and Pierce, 2002).
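The general form of the estimator can be sketched in a few lines of NumPy. The data are simulated with 2 nearly collinear covariates and no other effects in the model, so this illustrates only the estimator itself, not the full mixed model [1]; the function name and all values are hypothetical.

```python
import numpy as np


def ridge_estimate(W, y, k):
    """Solve (W'W + K) gamma = W'y with K = diag(k)."""
    K = np.diag(np.asarray(k, dtype=float))
    return np.linalg.solve(W.T @ W + K, W.T @ y)


# Simulated data with 2 nearly collinear covariates (hypothetical values).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
W = np.column_stack([x, x + 0.05 * rng.normal(size=200)])
y = W @ np.array([1.0, 1.0]) + rng.normal(size=200)

gamma_ls = ridge_estimate(W, y, [0.0, 0.0])  # K = 0 reproduces the LS solution
gamma_rr = ridge_estimate(W, y, [5.0, 5.0])  # equal positive k_i shrink the estimates
```

With equal positive ki the ridge solution always has a smaller norm than the LS solution, which is the variance-for-bias trade described above.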
When different methodologies are being con-
sidered to overcome the problem of collinearity,
Schabenberger and Pierce (2002) suggest the concept
of RE as an aid in decision making. Sometimes we
will have to choose between different optimal proper-
ties of estimation and it is necessary to establish clear
rules. It is not always possible to gather all the desir-
able properties in the same estimation process.
Methodologies and Models
Relative Efficiency. The RE between the tested methodologies is presented in Table 4. The values of the chosen θ are also presented in Table 4 and vary from 0.062 to 0.082. For the model AHEC, methodologies R1 and R2 showed a slight (1%) superiority over LS for the traits PP and SC. The R2 method showed better RE compared with LS and also in relation to the R1 method for PG. For all other traits, there is no difference in RE between tested methodologies.
For the AHE model, estimates with LS and R1 are the same for all traits, except for SC. For this trait, R1 and R2 showed a slightly higher efficiency (1%) in the estimation of the parameter when compared with LS. For this model (AHE), when comparing LS with R2, the biased estimator (R2) shows better RE only for postweaning traits (PG, PC, PP, and PM).
With respect to the AH model, the comparison shows slight superiority of R2 over LS for PG, PP, PM, and SC. For the other traits, RE was equal across methodologies.
According to the results obtained for RE, all tested models have many similarities to each other, with methodology R2 showing a slight advantage over the other 2 for some of the postweaning traits. No references were found on the use of RE in the choice of methodologies for estimation of these parameters.
Comparisons of the Estimates of Fixed Genetic Effects over Time. The change in the estimated values for the fixed genetic effects is shown in Fig. 3. The number of observations available in each analyzed period is shown in Table 5. In the early years, the variation of the estimated values for each of the fixed genetic effects was large compared with the variation in recent years, regardless of the method and model used. Overall, estimates became more stable from 2002 onward for all tested methodologies and models for the 9 analyzed traits. It is possible that a better balance of data across combinations of relationships was available from that year.

Table 4. Relative efficiency between least squares (LS), ridge regression 1 (R1), and ridge regression 2 (R2) methodologies for the 3 tested models

| Trait¹ | AHEC² θ³ | AHEC RE⁴ (LS/R1) | AHEC RE (LS/R2) | AHEC RE (R1/R2) | AHE² θ³ | AHE RE (LS/R1) | AHE RE (LS/R2) | AHE RE (R1/R2) | AH² RE (LS/R2) |
|---|---|---|---|---|---|---|---|---|---|
| WG | 0.064 | 1.00 | 1.00 | 1.00 | | | 1.00 | | 1.00 |
| WC | 0.064 | 1.00 | 1.00 | 1.00 | | | 1.00 | | 1.00 |
| WP | 0.062 | 1.00 | 1.00 | 1.00 | | | 1.00 | | 1.00 |
| WM | 0.062 | 1.00 | 1.00 | 1.00 | | | 1.00 | | 1.00 |
| PG | 0.066 | 1.00 | 1.01 | 1.01 | | | 1.01 | | 1.01 |
| PC | 0.064 | 1.00 | 1.00 | 1.00 | | | 1.01 | | 1.00 |
| PP | 0.062 | 1.01 | 1.01 | 1.00 | | | 1.01 | | 1.01 |
| PM | 0.064 | 1.00 | 1.00 | 1.00 | | | 1.01 | | 1.01 |
| SC | 0.082 | 1.01 | 1.01 | 1.00 | 0.062 | 1.01 | 1.01 | 1.00 | 1.01 |

¹WG = weaning gain; WC = weaning conformation; WP = weaning precocity; WM = weaning muscling; PG = postweaning gain; PC = postweaning conformation; PP = postweaning precocity; PM = postweaning muscling; SC = scrotal circumference.
²AHEC = model including the direct and maternal breed additive, heterosis, epistatic loss and complementarity effects; AHE = model including the direct and maternal breed additive, heterosis and epistatic loss effects; AH = model including the direct and maternal breed additive and heterosis effects.
³Values of θ relate to the R1 method. Values of θ not shown are equal to 0. For the R2 method, θ is equal to 0.06, and for the LS method, θ is equal to 0.
⁴RE = relative efficiency.
Over the years, estimates for aa, am, ha, and hm for WG were identical under LS and R1 in the AH model. For the AHE model, estimates of am, ha, ea, and em for WG were equal under LS and R1 from the year 2000 onward. This suggests that there is no benefit in using R1 as an alternative to LS in the AHE model.
For the direct (aa) and maternal (am) breed additive effects, the greatest variation in estimates over the years occurred when using the AHEC model with R1. Estimates for the direct heterosis (ha) effect varied very little over time, with slightly larger fluctuation in the AH model estimated by LS (WG) or R2 (PG). For evaluations subsequent to 2002, the estimated values of ha remained almost constant, although they differed between the models and methodologies used. For maternal heterosis, the largest oscillation clearly occurred with the AHEC model and LS, which showed an inverted peak in 1998.
Estimates of the direct epistatic loss effect were similar for R1 and R2 under the full model (AHEC), both for WG (Fig. 3A) and for PG (Fig. 3B). Both methods shrink the estimates, but the values are not exactly the same because the methods use slightly different K matrices. The maternal epistatic loss effect did not behave like the maternal heterosis estimates for either WG or PG, and em estimates varied more across models for PG than for WG.
For the direct and maternal complementarity effects, wide variation appears when the LS methodology is used in the early years. There is evidently a large shrinkage effect on estimates of direct and maternal complementarity, especially in the early years with relatively little data. A period of stabilization began in 2000, and from 2002, the variation becomes very small, even for the LS method. A deviation between methodologies persists and, except for ca for the trait WG (Fig. 3A), this bias tends to remain constant, as can be seen from the almost parallel lines on the chart. The upward deviation for cm (LS relative to RR) appears to be compensated for by downward deviations for ea, ca, hm, and em in the WG data. In the PG data, except for the 1998 downward blip for ca before 2000, the downward deviation for cm appears to be compensated for by upward deviations for ca, hm, and em. Estimates of fixed genetic effects for the other traits showed equivalent behavior when estimated over the 16 yr.
We have shown that there is high collinearity in the AHEC model, indicated by high VIF values and high CI values. Many authors have reported that high collinearity leads to unstable estimates of coefficients having large SE (Schabenberger and Pierce, 2002; Schoeman et al., 2002; Roso et al., 2005a,b; Carvalheiro et al., 2006; Cardoso et al., 2008; Dias et al., 2011; Bueno et al., 2012). The instability is reflected in estimates changing widely when more data are added, as seen in the 1998 results (Fig. 3). Table 1 shows a very unbalanced pattern for animals with various proportions of the 2 breeds, and this was accentuated in the early years of the breeding program. As more generations were added to the database, more diverse breed compositions were added with better links over generations, and so parameter estimates became more stable. Even so, a quarter of the animals are derived from three-eighths Nellore sires crossed with three-eighths Nellore dams.
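The two diagnostics named above can be illustrated with a small sketch. It assumes that VIF values are taken as the diagonal of the inverse correlation matrix of the covariates and that condition indices are computed from the eigenvalues of that same correlation matrix, which is one common variant; the exact definitions used in this study may differ. The function name and the simulated covariates are hypothetical and are not the study data.

```python
import numpy as np

def collinearity_diagnostics(W):
    """Return (VIF per covariate, condition indices) for a covariate matrix W."""
    # Correlation matrix of the covariates (columns of W).
    R = np.corrcoef(W, rowvar=False)
    # VIF_j is the j-th diagonal element of the inverse correlation matrix.
    vif = np.diag(np.linalg.inv(R))
    # Condition index: sqrt(largest eigenvalue / each eigenvalue).
    eigenvalues = np.linalg.eigvalsh(R)
    condition_indices = np.sqrt(eigenvalues.max() / eigenvalues)
    return vif, condition_indices

# Hypothetical covariates: the third is almost a linear function of the first two,
# loosely mimicking covariates that are all derived from the same breed proportions.
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + 0.05 * rng.normal(size=500)
W = np.column_stack([x1, x2, x3])

vif, condition_indices = collinearity_diagnostics(W)
print("VIF:", np.round(vif, 1))
print("Largest condition index:", round(float(condition_indices.max()), 1))
```

In this toy case every VIF is far above 10 because each covariate is almost determined by the other two, which is the kind of pattern the text describes for the maternal covariates of the AHEC model.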
Several identical estimates were found. This result was expected in some cases. When the ridge parameter used in R1, which is set according to a VIF threshold of 10, is 0, no ridge regression is in fact applied, and the method reduces to least squares. When the ridge parameter used in R1 is very close to that used by R2 (θ(R2) = 0.06 and θ(R1) = 0.062 or θ(R1) = 0.064), very close estimates are generated.
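The behavior described above, where a VIF-based ridge parameter can be exactly 0, can be sketched as follows. This is a minimal illustration, assuming a single θ added to the diagonal of the covariate correlation matrix and the standard ridge VIF expression diag[(R + θI)⁻¹ R (R + θI)⁻¹]; the K matrix and search procedure actually used for R1 in this study may differ. The function names, grid step, and example matrices are hypothetical.

```python
import numpy as np

def ridge_vif(R, theta):
    """VIFs of ridge estimates: diag[(R + theta*I)^-1 R (R + theta*I)^-1]."""
    A = np.linalg.inv(R + theta * np.eye(R.shape[0]))
    return np.diag(A @ R @ A)

def smallest_theta(R, vif_limit=10.0, step=0.002, max_theta=1.0):
    """Smallest grid value of theta for which every ridge VIF is below vif_limit."""
    n_steps = int(round(max_theta / step))
    for i in range(n_steps + 1):
        theta = i * step
        if np.all(ridge_vif(R, theta) < vif_limit):
            return theta
    raise ValueError("no theta on the grid met the VIF limit")

# Uncorrelated covariates: all VIF equal 1, so the search stops at theta = 0.
R_independent = np.eye(3)
# Two covariates with correlation 0.99: the ordinary VIF is about 50.
R_collinear = np.array([[1.0, 0.99], [0.99, 1.0]])

print(smallest_theta(R_independent))      # 0.0, so ridge regression reduces to LS
print(smallest_theta(R_collinear) > 0.0)  # True
```

With θ = 0 the expression returns the ordinary VIF, so a model without problematic collinearity keeps θ = 0 and its estimates coincide with least squares.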
The variation identied in the AHEC model can be
explained, at least in part, because this method uses dif-
ferent values for the ridge parameter depending on each
analyzed period. As the ridge parameter is depends on
the VIF values, it is calculated for each analysis and the
oscillation of the values present higher amplitude. The
serious uctuation in estimates over time is between
RR and LS model. Petrini et al. (2012) suggest, as an
alternative, in addition to improving and increasing
the amount of data, reducing the number of covariates
in the model. This is also clear in this study, compar-
ing the curves of the 3 analyzed models, presented in
Fig. 2 and 3. It is very reasonable to do that if the SE is
so large that the coefcient is not signicantly differ-
ent from 0. However, according to Lopes et al. (2010)
and Carvalheiro et al. (2006), it is important to include
the effects of epistatic loss in the evaluation model for
prediction of breeding values of crossbred animals.
The inclusion of complementarity also appears to be
Ta bl e 5 . Number of observations in the analysis over the years for weaning gain (WG) and postweaning gain (PG)
Year 1994 1996 1998 2000 2002 2004 2006 2008 2010
WG 48,826 54,442 75,468 105,165 149,820 193,439 227,156 275,138 294,045
PG 20,526 28,423 38,439 54,644 76,253 96,515 115,748 136,332 138,075
Comparing methodologies 509
important, although it should be further investigated
(Carvalheiro et al., 2006). However, in the early years,
it would have been sensible to omit ca and cm from
the model, until such time as you were condent in the
values. If the data does not provide a robust estimate, it
is seem not appropriate to include it.
We could not identify a reason that would justify the reversal of the estimated values for the fixed effects in 1998. Such behavior of the estimates may come from the collected raw data or the data structure in that generation. What we can see is that relative stability in the estimates of fixed genetic effects begins in 2000.
Schoeman et al. (2002) state that, in cases of linear dependence between the variables of the incidence matrix (W in this study), the regression coefficients become extremely unstable and “very sensitive to small random errors in Y” and may have large fluctuations with the addition or removal of variables in the model. Our data showed this behavior only in the early years, showing stability after a certain point. Petrini et al. (2012) mention this instability over time, but neither Schoeman et al. (2002) nor Petrini et al. (2012) present results relating to this variation, so it was not possible to make comparisons with other populations in this regard.
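The sensitivity to small random errors in Y described above can be reproduced with a toy simulation. The sketch below is not the study's model: it assumes 2 nearly identical hypothetical covariates, a ridge parameter applied directly to W'W, and arbitrary noise levels, and it only compares how much the solutions move when y is slightly perturbed.

```python
import numpy as np

def ridge_solution(W, y, theta):
    """Solve (W'W + theta*I) b = W'y; theta = 0 gives ordinary least squares."""
    p = W.shape[1]
    return np.linalg.solve(W.T @ W + theta * np.eye(p), W.T @ y)

rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)          # nearly a copy of x1
W = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)
y_perturbed = y + 0.05 * rng.normal(size=n)  # small random errors added to y

changes = {}
for label, theta in [("LS", 0.0), ("ridge", 1.0)]:
    change = ridge_solution(W, y_perturbed, theta) - ridge_solution(W, y, theta)
    changes[label] = float(np.linalg.norm(change))
    print(label, "change in coefficients:", changes[label])
```

For the same perturbation, the ridge solution moves less than the least squares solution, because the ridge term damps the direction in which the covariates are almost linearly dependent.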
Across-Breed EBV Comparisons. Comparisons between the AB-EBV obtained by different methodologies and different models of estimation were made by calculating Pearson and Spearman correlations (Fig. 4) and the percentage of coincident animals at different selection proportions (top 1, 5, 10, 20, and 40%; Fig. 5). To allow across-breed comparison, the EBV of each animal was increased by the direct additive genetic effect of breed, in proportion to the animal's breed composition. The calf crop presented relates only to calves born in 2009 (n = 17,694), aiming to simulate the latest genetic evaluation of the tested herd, using data from individuals with complete information on weaning and yearling traits. Comparisons of AB-EBV for bulls and cows are not displayed because they had a pattern similar to that of calves.
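As a rough illustration of the comparison just described, the sketch below builds AB-EBV from within-breed EBV and then computes Pearson and Spearman correlations between 2 methods. It assumes that the direct breed additive coefficient multiplies the Nellore fraction of each animal; the function names, the 4 animals, and the 2 coefficient values are hypothetical and are not results from this study.

```python
from math import sqrt

def across_breed_ebv(ebv, nellore_fraction, breed_additive):
    """Add the breed additive effect, scaled by breed composition, to each EBV."""
    return [e + breed_additive * p for e, p in zip(ebv, nellore_fraction)]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical within-breed EBV and Nellore fractions for 4 calves.
ebv = [2.0, -1.0, 0.5, 3.0]
nellore = [0.0, 0.375, 0.5, 1.0]
# Hypothetical breed additive estimates from two methods (e.g., LS and ridge).
ab_ls = across_breed_ebv(ebv, nellore, -10.0)
ab_rr = across_breed_ebv(ebv, nellore, -6.0)

print(round(pearson(ab_ls, ab_rr), 3))
print(round(spearman(ab_ls, ab_rr), 3))  # 0.8
```

Because only the breed additive term differs between the 2 sets of AB-EBV in this toy case, any reranking comes from animals with different breed compositions changing places.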
Correlations between Across-Breed EBV Obtained by Least Squares, Ridge Regression 1, and Ridge Regression 2. All correlations were high, ranging from 0.83 to 1.0 (Fig. 4). Under the AHEC model, correlations between R1 and R2 were equal to 1 because the ridge parameters used in these evaluations were similar. For the AHE and AH models, the LS and R1 estimates often agreed because the ridge parameter was 0 in many cases. The lowest correlations were found for WG, followed by PG, between LS × R1 and R2 × LS. When the correlation was close to 1, the calves in the top percentage class were predominantly the same ones (Fig. 5).
Figure 3. Fixed genetic effects estimated for (A) weaning gain (WG) and (B) postweaning gain (PG) traits estimated by 3 distinct methodologies (least squares [LS], ridge regression 1 [R1], and ridge regression 2 [R2]) and models (AHEC including the direct and maternal breed additive, heterosis, epistatic loss and complementarity effects, AHE including the direct and maternal breed additive, heterosis and epistatic loss effects and AH including the direct and maternal breed additive and heterosis effects). aa = direct breed additive, am = maternal breed additive, ha = direct heterosis, hm = maternal heterosis, ea = direct epistatic loss, em = maternal epistatic loss, ca = direct complementarity, and cm = maternal complementarity effects.
Coincidences for Different Proportions of Selected Animals. For the AHEC model, R1 and R2 coincided almost perfectly for all the tested traits (99 to 100%), as did LS and R1 for the AHE and AH models (90 to 100%), except for SC. This is true even when only the top 1% of animals were selected (Fig. 5). This is possibly due to the ridge parameter obtained in R1 sometimes coinciding with that of R2 and sometimes being 0 (LS). This is an inherent property of R1, which sets the ridge parameter depending on the VIF. These results are in complete agreement with the correlations presented in Fig. 4.
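The percentage of coincidence used here can be sketched as the share of animals that appear in the top fraction under both of 2 rankings. The function name, the rounding rule for the number of selected animals, and the 10 scores below are hypothetical and serve only to show the calculation.

```python
def top_coincidence(score_a, score_b, fraction):
    """Percentage of animals in the top `fraction` of both rankings."""
    n = len(score_a)
    k = max(1, round(n * fraction))  # number of animals selected
    by_a = sorted(range(n), key=lambda i: score_a[i], reverse=True)
    by_b = sorted(range(n), key=lambda i: score_b[i], reverse=True)
    shared = set(by_a[:k]) & set(by_b[:k])
    return 100.0 * len(shared) / k

# Hypothetical AB-EBV for the same 10 animals under two methodologies.
method_a = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
method_b = [10, 5, 8, 7, 9, 6, 4, 3, 2, 1]

print(top_coincidence(method_a, method_b, 0.20))  # 50.0
print(top_coincidence(method_a, method_b, 0.40))  # 75.0
```

In this toy case a single animal that changes position lowers the coincidence more at the stricter cutoff, which is the kind of effect that makes the smallest selected proportions the most sensitive to reranking.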
Figure 4. Pearson and Spearman correlations for direct across-breed EBV for weaning gain (WG), postweaning gain (PG), weaning conformation (WC),
postweaning conformation (PC), weaning precocity (WP), postweaning precocity (PP), weaning muscling (WM), postweaning muscling (PM), and scrotal
circumference (SC) given by different methodologies (least squares [LS], ridge regression 1 [R1], and ridge regression 2 [R2]) and models (AHEC including
the direct and maternal breed additive, heterosis, epistatic loss and complementarity effects, AHE including the direct and maternal breed additive, heterosis
and epistatic loss effects and AH including the direct and maternal breed additive and heterosis effects).
Figure 5. Percentage of coincidence for different proportions (top 1, 5, 10, 20, and 40%) of selected calves on the basis of direct across-breed EBV for weaning gain (WG) and postweaning gain (PG) given by different methodologies (least squares [LS], ridge regression 1 [R1], and ridge regression 2 [R2]) and models (AHEC including the direct and maternal breed additive, heterosis, epistatic loss and complementarity effects, AHE including the direct and maternal breed additive, heterosis and epistatic loss effects and AH including the direct and maternal breed additive and heterosis effects). WC = weaning conformation; PC = postweaning conformation; WP = weaning precocity; PP = postweaning precocity; WM = weaning muscling; PM = postweaning muscling; SC = scrotal circumference.
When comparisons are made on the basis of the top 1% of animals, the percentages of coincidence varied between 59.14 and 99.46%. Lower percentages of coincidence were observed for WG in the AHEC model (LS × R1 and R2 × LS, both 59.14%) and the AHE model (LS × R2, 65.59%, and R1 × R2, 66.67%). These results show that the changes in methodology, for these models, caused an important reranking of the AB-EBV.
Roso et al. (2005a) found 78 to 80% of coincidences among the top 40% AB-EBV estimated by LS and ridge regression using a model equivalent to AH and around 80 to 85% for a model equivalent to AHE. In contrast, when the comparison was made for the top 1% of selected animals, the coincidence dropped to 60 to 65% in both cases. Our data showed greater coincidence in general: for the top 1% of selected animals, few coincidences were below 60% (WG [AHEC] for LS × R1 and LS × R2 and SC [AHE] for LS × R2), and the great majority were over 70%. These differences confirm that the choice of methodology has significant consequences for the genetic selection of animals, resulting in different rankings of animals on the basis of AB-EBV.
Petrini et al. (2012) argued that the presence of collinearity, regardless of its intensity, can affect the accuracy of estimates and, consequently, the accuracy of inferences based on these results. This can lead to wrong choices of animals in applied breeding programs. Roso et al. (2005b) concluded that inadequate separation of nonadditive genetic effects in the evaluation model and multicollinearity in the analysis may affect the ranking of animals when compared between different breeds. Carvalheiro et al. (2006) suggest that ridge regression can and should be used to correct multicollinearity problems, both in the estimation of the regression effects and in the predicted genetic values. They affirm that the prediction surfaces are more acceptable from a biological point of view under this methodology. Dias et al. (2011) found no differences in the sign (positive/negative) of the estimates for LS and RR but found differences in magnitudes and SE. They suggest that LS overestimates the values and that RR is more reliable because it decreases the SE of estimates and increases the accuracy, despite the introduction of bias.
Collinearity was found to be a problem in the analysis of our data when all 8 breed covariates were included in the model but not when direct and maternal complementarity were excluded. When collinearity was present, the R1 method proved an effective way of addressing the problem. Nevertheless, there was little impact on the additive breed component or on the within-breed EBV, so that there was high agreement between the animals that would be selected for additive effect regardless of the model.
The data did not sustain the fitting of the AHEC model when it was restricted to pre-2004 data. After that, the diversity in breed compositions represented across generations was sufficient to fit the AHEC model, but the functional relationships among the breed genotype covariates meant that method R1 was preferable.
LITERATURE CITED
Arnold, J. W., J. K. Bertrand, and L. L. Benyshek. 1992. Animal
model for genetic evaluation of multibreed data. J. Anim. Sci.
70:3322–3332.
Bertoli, C. D. 1991. Sistema Cruza-Controle de produção e aval-
iação dos valores genéticos dentro de uma população bovina
sintética. (In Portuguese.) MSc Diss., Universidade Federal
do Rio Grande do Sul, Faculdade de Agronomia.
Bueno, R. S., R. D. A. Torres, J. B. S. Ferraz, P. S. Lopes, J. P. Eler,
G. B. Mourão, M. Almeida e Silva, and E. C. Mattos. 2012.
Métodos de estimação de efeitos genéticos não-aditivos para
características de peso e perímetro escrotal em bovinos de
corte mestiços. (In Portuguese.) R. Bras. Zootec. 41:1140–
1145. doi:10.1590/S1516-35982012000500009.
Cardoso, V., S. A. De Queiroz, and L. A. Fries. 2008. Estimativas de efeitos genotípicos sobre os desempenhos pré e pós-desmama de populações Hereford × Nelore. (Estimates of genotypic effects on pre and post-weaning performance in Hereford × Nelore populations.) (In Portuguese.) R. Bras. Zootec. 37:1763–1773. doi:10.1590/S1516-35982008001000008.
Carvalheiro, R., E. C. G. Pimentel, V. Cardoso, S. A. Queiroz,
and L. A. Fries. 2006. Genetic effects on preweaning weight
gain of Nelore-Hereford calves according to different mod-
els and estimation methods. J. Anim. Sci. 84:2925–2933.
doi:10.2527/jas.2006-214.
Dias, R. A. P., J. Petrini, J. B. S. Ferraz, J. P. Eler, R. S. Bueno, A.
L. L. da Costa, and G. B. Mourão. 2011. Multicollinearity
in genetic effects for weaning weight in a beef cattle com-
posite population. Livest. Sci. 142:188–194. doi:10.1016/j.
livsci.2011.07.016.
Efron, B. 1979. Bootstrap methods: Another look at the Jackknife. Ann. Stat. 7:1–26.
Freund, R. J., and R. C. Littell. 2000. SAS System for Regression. 3rd ed. SAS Inst., Inc., Cary, NC.
Fries, L. A., D. J. Johnston, H. Hearnshaw, and H. U. Graser. 2000.
Evidence of epistatic effects on weaning weight in crossbred
beef cattle. Asian-Australas. J. Anim. Sci. 13(Suppl. B):242.
Fries, L. A., F. S. Schenkel, V. M. Roso, F. V. Brito, J. L. P. Severo,
and M. L. Piccoli. 2002. “Epistazygosity” and epistatic ef-
fects. In: 7th World Congr. Genet. Appl. Livest. Prod., August
19–23, Montpellier, France. Session 17. Estimation of addi-
tive and non-additive genetic parameters.
Kinghorn, B. P. 1993. Design of livestock breeding programs.
Animal Genetics and Breeding Unit, The University of New
England, Armidale, NSW, Australia. p. 187–203.
Lopes, J. S., P. R. N. Rorato, T. Weber, R. O. de Araújo, M. D. A. Dornelles, and J. G. Comin. 2010. Pre-weaning performance evaluation of a multibreed Aberdeen Angus × Nellore population using different genetic models. Rev. Bras. Zootec. 39:2418–2425.
Marquardt, D. W., and R. D. Snee. 1975. Ridge regression in prac-
tice. Am. Stat. 29:3–20.
Petrini, J., R. A. P. Dias, S. F. N. Pertile, J. P. Eler, J. B. S. Ferraz, and G. B. Mourão. 2012. Degree of multicollinearity and variables involved in linear dependence in additive-dominant models. Pesqui. Agropecu. Bras. 47:1743–1750. doi:10.1590/S0100-204X2012001200010.
Piccoli, M. L., V. M. Roso, F. V. Brito, J. L. P. Severo, F. S. Schenkel, and L. A. Fries. 2002. Additive, complementarity (additive*additive), dominance, and epistatic effects on preweaning weight gain of Hereford × Nelore calves. In: 7th World Congr. Genet. Appl. Livest. Prod., August 19–23, Montpellier, France. p. 2000–2003.
Pimentel, E. D. C. G., S. A. De Queiroz, R. Carvalheiro, and L.
A. Fries. 2006. Estimativas de efeitos genéticos em bezerros
cruzados por diferentes modelos e métodos de estimação. (In
Portuguese.) R. Bras. Zootec. 35:1020–1027. doi:10.1590/
S1516-35982006000400012.
Pimentel, E. D. C. G., S. A. De Queiroz, R. Carvalheiro, and L.
A. Fries. 2007. Use of ridge regression for the prediction of
early growth performance in crossbred calves. Genet. Mol.
Biol. 30:536–544. doi:10.1590/S1415-47572007000400006.
Roso, V. M., and F. S. Schenkel. 2006. AMC: A computer program to assess the degree of connectedness among contemporary groups. In: Proc. 8th World Congr. Genet. Appl. Livest. Prod., August 13 to 18, 2006, Belo Horizonte, MG, Brazil. Communication no. 27-26.
Roso, V. M., F. S. Schenkel, and S. P. Miller. 2004. Degree of connectedness among groups of centrally tested beef bulls. Can. J. Anim. Sci. 84:37–47.
Roso, V. M., F. S. Schenkel, S. P. Miller, and L. R. Schaeffer.
2005a. Estimation of genetic effects in the presence of multi-
collinearity in multibreed beef cattle evaluation. J. Anim. Sci.
83:1788–1800.
Roso, V. M., F. S. Schenkel, S. P. Miller, and J. W. Wilton. 2005b.
Additive, dominance, and epistatic loss effects on prewean-
ing weight gain of crossbred beef cattle from different Bos
taurus breeds. J. Anim. Sci. 83:1780–1787.
Schabenberger, O., and F. J. Pierce. 2002. Contemporary statis-
tical models for the plant and soil sciences. Ed Taylor and
Francis Group, New York, NY.
Schenkel, F. S. 1993. Calculo das heterozigoses. (In Portuguese.)
GenSys Consultores Associados, Porto Alegre, Brazil.
Schoeman, S. J., M. A. Aziz, and G. F. Jordaan. 2002. The influence of multicollinearity on crossbreeding parameter estimates for weaning weight in beef cattle. S. Afr. J. Anim. Sci. 32:239–246.
Williams, J. L., I. Aguilar, R. Rekaya, and J. K. Bertrand. 2010.
Estimation of breed and heterosis effects for growth and car-
cass traits in cattle using published crossbreeding studies. J.
Anim. Sci. 88:460–466. doi:10.2527/jas.2008-1628.