Increased accuracy of artificial selection by using
the realized relationship matrix
B. J. HAYES1*, P. M. VISSCHER2 AND M. E. GODDARD1,3
1Biosciences Research Division, Department of Primary Industries Victoria, 1 Park Drive, Bundoora 3083, Australia.
2Queensland Institute of Medical Research, Brisbane, Australia.
3Faculty of Land and Food Resources, University of Melbourne, Parkville 3010, Australia.
(Received 23 July 2008 and in revised form 4 December 2008)
Dense marker genotypes allow the construction of the realized relationship matrix between
individuals, with elements the realized proportion of the genome that is identical by descent (IBD)
between pairs of individuals. In this paper, we demonstrate that by replacing the average
relationship matrix derived from pedigree with the realized relationship matrix in best linear
unbiased prediction (BLUP) of breeding values, the accuracy of the breeding values can be
substantially increased, especially for individuals with no phenotype of their own. We further
demonstrate that this method of predicting breeding values is exactly equivalent to the genomic
selection methodology where the effects of quantitative trait loci (QTLs) contributing to variation in
the trait are assumed to be normally distributed. The accuracy of breeding values predicted using the
realized relationship matrix in the BLUP equations can be deterministically predicted for known
family relationships, for example half sibs. The deterministic method uses the effective number of
independently segregating loci controlling the phenotype that depends on the type of family
relationship and the length of the genome. The accuracy of predicted breeding values depends on
this number of effective loci, the family relationship and the number of phenotypic records. The
deterministic prediction demonstrates that the accuracy of breeding values can approach unity if
enough relatives are genotyped and phenotyped. For example, when 1000 full sibs per family were
genotyped and phenotyped, and the heritability of the trait was 0.5, the reliability of predicted
genomic breeding values (GEBVs) for individuals in the same full sib family without phenotypes was
0.82. These results were verified by simulation. A deterministic prediction was also derived for
random mating populations, where the effective population size is the key parameter determining the
effective number of independently segregating loci. If the effective population size is large, a very
large number of individuals must be genotyped and phenotyped in order to accurately predict
breeding values for unphenotyped individuals from the same population. If the heritability of the
trait is 0.3, and Ne = 1000, approximately 5750 individuals with genotypes and phenotypes are
required in order to predict GEBVs of unphenotyped individuals in the same population with an
accuracy of 0.7.
* Corresponding author. Tel: +61 (0)3 9479 5439. Fax: +61 (0)3
9479 3113. e-mail: firstname.lastname@example.org
Genet. Res., Camb. (2009), 91, pp. 47–60.
© 2009 Cambridge University Press
Printed in the United Kingdom
In best linear unbiased prediction (BLUP) of breeding
values, information from performance of relatives is
incorporated through the use of a relationship matrix.
Elements of this matrix are derived as the predicted
proportion of the genome that is identical by descent
(IBD) between two individuals given their pedigree
relationship. However, Mendelian sampling during
gamete formation results in variation in the realized
proportion of the genome that is IBD between
pairs of individuals with the same predicted relationship
coefficients (Franklin, 1977; Hill, 1993; Guo,
1996). For example, between full-sib individuals the
predicted proportion of the genome that is IBD is
0.5, while its standard deviation is 0.04 for a species
with 30 chromosomes each of 1 M in length (Guo, 1996).
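The full-sib figure quoted from Guo (1996) can be checked with a small simulation. The sketch below is not Guo's method. It assumes crossovers occur without interference (a Poisson process along each chromosome) and takes the realized relationship of two full sibs as the average of the paternally and maternally shared fractions. The function names `shared_fraction_one_parent` and `simulate_full_sib_ibd` are hypothetical.

```python
import numpy as np

def shared_fraction_one_parent(rng, n_chrom=30, length_morgan=1.0):
    """Fraction of one parent's genome that two sibs inherit IBD.

    Assumption: no crossover interference, so the crossovers of the two
    meioses together form a Poisson process with rate 2 per Morgan.
    """
    shared = 0.0
    for _ in range(n_chrom):
        n_breaks = rng.poisson(2.0 * length_morgan)
        breaks = np.sort(rng.uniform(0.0, length_morgan, n_breaks))
        edges = np.concatenate(([0.0], breaks, [length_morgan]))
        seg_len = np.diff(edges)
        # 1 if both sibs start the chromosome on the same grandparental strand
        start_same = rng.integers(0, 2)
        # Sharing status flips at every crossover in either meiosis
        status = (start_same + np.arange(seg_len.size)) % 2
        shared += seg_len[status == 1].sum()
    return shared / (n_chrom * length_morgan)

def simulate_full_sib_ibd(n_pairs=1000, seed=1):
    rng = np.random.default_rng(seed)
    # Realized relationship: average of paternal and maternal sharing
    return np.array([
        0.5 * (shared_fraction_one_parent(rng) + shared_fraction_one_parent(rng))
        for _ in range(n_pairs)
    ])

ibd = simulate_full_sib_ibd()
print(f"mean = {ibd.mean():.3f}, sd = {ibd.std(ddof=1):.3f}")
```

Under these assumptions the simulated mean should be close to 0.5 and the standard deviation close to 0.04, in line with the values given in the text.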
DNA marker information can be used to calculate
the realized relationship matrix with elements the ac-
tual proportion of the genome that is IBD between
two individuals, with a high degree of precision, pro-
vided that a sufficient number of markers are used.
Nejati-Javaremi et al. (1997) demonstrated with
simulation that if the loci contributing to trait vari-
ation were known, and the alleles at these loci were
used to derive the realized relationship matrix, the
accuracy of breeding values calculated using this ma-
trix could be higher than that calculated using the
predicted relationship matrix. In practice, all the loci
contributing to trait variation are unlikely to have
been identified. Villanueva et al. (2005) demonstrated
by simulation that using the realized relationship
matrix derived from markers rather than the pre-
dicted relationship matrix in the calculation of esti-
mated breeding values (EBVs) could lead to higher
accuracies of selection. They proposed that marker
information used in this way could offer benefits in
selection programmes when no quantitative trait locus
(QTL) has been mapped or when the underlying
genetic model can be considered the infinitesimal
model, where no individual QTL has a moderate to
large effect on the trait. For some traits such as height
in humans, this is indeed the case, with the largest
reported QTLs explaining only a small fraction of the
genetic variance (e.g. Sanna et al., 2008; Visscher, 2008).
While Villanueva et al. (2005) considered estimating
realized relationships conditional on a known
pedigree (exploiting linkage information), realized
relationship coefficients can also be estimated for
‘unrelated’ individuals within a population. This requires
sufficient marker density to identify chromosome
segments in two individuals that are descended from
the same common, but unknown ancestor.
An alternative method by which DNA marker data
can be used to estimate breeding values is genomic
selection (Meuwissen et al., 2001). In this method, the
markers are used to track QTLs whose effects are es-
timated and summed to predict the breeding value of
each individual. However, if there are many QTLs
whose effects are normally distributed with constant
variance, then genomic selection can be equivalent
to the use of the realized relationship matrix (e.g.
Fernando, 1998; Habier et al., 2007; Van Raden,
2007; Goddard, 2008).
Currently, there is no analytical method available
to predict the accuracy of EBVs calculated using the
genomic relationship matrix considering information
from relatives. Analytical expressions would be
desirable to guide the design of experiments aiming to
achieve a given accuracy of genomic breeding values
(GEBVs). Here we derive a deterministic prediction
for the accuracy of GEBV considering information
from relatives. We also modify the expression of
Goddard (2008) for the accuracy of GEBV in random
mating populations to improve the predictions. Our
starting point for all derivations was the equivalent
genomic selection model. We then verified the ana-
lytical predictions using two simulation approaches.
First, we derive a prediction of the accuracy based on
the prediction error variance (PEV) where the realized
relationship matrix is determined by a large number
of informative markers. Secondly, we derive accuracy
from simulations with both markers and QTLs seg-
regating as the correlation between true and predicted
breeding values. We then investigate the sensitivity of
the results to the number of markers used, the number
of QTLs and effective population size.
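The two verification approaches can be illustrated on a toy data set. This is a minimal sketch, not the authors' simulation design. It assumes unrelated individuals, independent loci, known variance components and no fixed effects, and all sizes and variable names are hypothetical. Accuracy is computed once from the PEV and once as the correlation between true and predicted breeding values of unphenotyped individuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ref, n_val, n_loci, h2 = 300, 100, 50, 0.5    # illustrative values only

# Genotypes coded 0/1/2 and centred on twice the allele frequency
p = rng.uniform(0.1, 0.9, n_loci)
W = rng.binomial(2, p, size=(n_ref + n_val, n_loci)) - 2.0 * p

sigma_u2 = 1.0
u = rng.normal(0.0, np.sqrt(sigma_u2), n_loci)
g = W @ u                                          # true breeding values
sigma_g2 = sigma_u2 * np.sum(2.0 * p * (1.0 - p))  # expected genetic variance
sigma_e2 = sigma_g2 * (1.0 - h2) / h2
y_ref = g[:n_ref] + rng.normal(0.0, np.sqrt(sigma_e2), n_ref)

# Covariance of breeding values implied by the genotypes
K = sigma_u2 * (W @ W.T)
K_rr, K_vr, K_vv = K[:n_ref, :n_ref], K[n_ref:, :n_ref], K[n_ref:, n_ref:]
V_inv = np.linalg.inv(K_rr + sigma_e2 * np.eye(n_ref))

# BLUP of breeding values for individuals without phenotypes
g_hat_val = K_vr @ V_inv @ y_ref

# Approach 1: accuracy from the prediction error variance
pev = np.diag(K_vv - K_vr @ V_inv @ K_vr.T)
acc_pev = np.sqrt(np.mean(1.0 - pev / np.diag(K_vv)))

# Approach 2: correlation between true and predicted breeding values
acc_cor = np.corrcoef(g[n_ref:], g_hat_val)[0, 1]

print(f"accuracy from PEV = {acc_pev:.2f}, from correlation = {acc_cor:.2f}")
```

The two values should agree approximately. They are not expected to match exactly, because the correlation is computed from a single realization of the allele effects.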
(i) An equivalent model for genomic selection
This material is also contained in Goddard (2008) but
is included here for completeness. Consider a model
of the true breeding value of the ith individual ($g_i$)
based on a large number of QTLs of small effect. To
simplify our analytical derivation, we will define a
parameter q as the number of independent chromo-
some segments. This model can be pictured as divid-
ing the chromosomes into segments that effectively
segregate independently and defining the effect of the
segment as the sum of the effects of the QTL carried
on that segment. The assumption here is that there are
at least as many QTLs as there are effective chromo-
some segments. Alternatively if QTLs are unlinked,
then q is the number of unlinked QTLs. Then
$$g_i = \sum_{j=1}^{q} W_{ij} u_j,$$
where $u_j$ is the allele substitution effect at the jth QTL
and is normally distributed, $u \sim N(0, \sigma_u^2)$, where $\sigma_u^2$ is
the variance of the effect of QTL alleles sampled randomly
from the population, and $W_{ij}$ is 0, 1 or 2 if
individual i carries 0, 1 or 2 copies of the second allele
at the jth QTL. In practice, it is convenient to subtract
the mean value of W from each element so that
$W_{ij} = 0 - 2p_j$, $1 - 2p_j$ or $2 - 2p_j$, where $p_j$ is the allele
frequency of the second allele at locus j. This
corresponds to the genomic selection model that
Meuwissen et al. (2001) called the BLUP model.
A simple version of genomic selection is to define the
$W_{ij}$ based on markers instead of the QTL. Then the
best estimates of the $u_j$ and hence $g_i$ can be obtained.
In matrix form $g = Wu$ and $V(g) = WW'\sigma_u^2$, where
W is a design matrix allocating QTL allele effects to
individuals.
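The equivalence between summing estimated allele effects and using $V(g) = WW'\sigma_u^2$ directly can be checked numerically. This is a minimal sketch under stated assumptions: the variance components are treated as known, there are no fixed effects, and the sizes and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n_ind, n_loci = 60, 200           # illustrative sizes, not from the paper
sigma_u2, sigma_e2 = 0.01, 1.0    # assumed (known) variance components
lam = sigma_e2 / sigma_u2

# W_ij = allele count (0, 1 or 2) minus 2 * p_j
p = rng.uniform(0.05, 0.95, n_loci)
W = rng.binomial(2, p, size=(n_ind, n_loci)) - 2.0 * p

u = rng.normal(0.0, np.sqrt(sigma_u2), n_loci)
y = W @ u + rng.normal(0.0, np.sqrt(sigma_e2), n_ind)   # no fixed effects

# Route 1: estimate the allele effects, then sum them for each individual
u_hat = np.linalg.solve(W.T @ W + lam * np.eye(n_loci), W.T @ y)
g_hat_effects = W @ u_hat

# Route 2: predict g directly from the covariance structure V(g) = W W' sigma_u^2
G = W @ W.T
g_hat_relationship = G @ np.linalg.solve(G + lam * np.eye(n_ind), y)

print(np.allclose(g_hat_effects, g_hat_relationship))
```

Route 1 solves a system with one equation per locus, whereas route 2 solves one with one equation per individual. Which is cheaper therefore depends on whether there are more loci or more individuals.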
Ignoring fixed effects, the model for an individual record is
where, as in the main text, f is the family mean, $w_j$
indicates the allele at the jth QTL, with subscript s or d
for sire and dam alleles, and $u_s$ and $u_d$ are the effects
of the sire and dam alleles.
This results in a large set of equations but many of
the terms are approximately independent. For instance,
the alleles at one effective QTL ($w_j$) are inherited
independently of the alleles at other effective
QTL (e.g. $w_{j+1}$), given our definition of QTL as
independent loci. This means that the equations are
approximately block diagonal. In the case of a family
of half-sibs, for instance, there are two paternal alleles
and approximately half the offspring (N/n, where
n=2) in the family will receive each one. Therefore,
we will approximate the complete set of equations
with the equations for one QTL and assuming that all
alleles are equally represented. In matrix notation, the
and the mixed model equations, treating f as fixed
When the terms are evaluated, the left-hand side
The inverse of this matrix is
The PEV of $\hat{u}$ can be obtained from this inverse
matrix. The estimates of u are used for selection
within family, so we require the PEV of $u - \bar{u}$. This is
Using the variance of the true $u - \bar{u} = \sigma^2$
the variance of $\hat{u} - \bar{u}$ is
Note that except for $(n - 1)/n$, which corrects for
selection within only n possible alleles, this is the
normal formula for reliability of a BLUP solution
based on N/n records for each effect to be estimated,
$(N/n)/(N/n + \lambda)$.
In the full equation set, the residual variance is
$V(y)(1 - h^2)$ because all the genetic variances are included
in the model. However, the full equations will
never be exactly balanced across all terms, and so the
PEV will be greater than that calculated above if $\lambda$
were based on this residual. We have found that the
PEVs are approximated better if we use the error in
eqn (A1), which is the phenotypic variance minus the
variance explained by one QTL allele. The variance
explained by one QTL is very small and $2qV(u) = V(g)$,
so $\lambda = V(y)/V(u) = 1/(h^2/(2q)) = 2q/h^2$.
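The balanced one-QTL approximation described above can be checked numerically. The sketch below assumes the standard mixed model equations for $y = 1f + Zu + e$, with f fixed, n allele effects in u and N/n records per allele. It sets the residual variance to 1 and takes $\lambda = 2q/h^2$ from the text. The function name and the numerical inputs are hypothetical.

```python
import numpy as np

def pev_within_family(N, n, lam, sigma_e2=1.0):
    """PEV of (u_1 - mean of u) from the one-QTL mixed model equations.

    Assumptions: y = 1f + Zu + e, f fixed, u random with n levels,
    and exactly N/n records per allele (balanced design).
    """
    assert N % n == 0, "balanced design needs N divisible by n"
    m = N // n
    Z = np.kron(np.eye(n), np.ones((m, 1)))       # record-to-allele incidence
    X = np.ones((N, 1))                            # column for the family mean
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.eye(n)]])
    C_uu = np.linalg.inv(lhs)[1:, 1:]              # block for the allele effects
    k = -np.ones(n) / n
    k[0] += 1.0                                    # contrast u_1 - u_bar
    return sigma_e2 * (k @ C_uu @ k)

N, n, h2, q = 100, 2, 0.5, 30                      # illustrative values only
lam = 2 * q / h2                                   # lambda = 2q / h^2
sigma_e2 = 1.0
sigma_u2 = sigma_e2 / lam

pev = pev_within_family(N, n, lam, sigma_e2)
var_true = sigma_u2 * (n - 1) / n                  # variance of true u - u_bar
reliability = 1.0 - pev / var_true
closed_form = (N / n) / (N / n + lam)              # BLUP reliability, N/n records

print(f"reliability from MME = {reliability:.4f}, closed form = {closed_form:.4f}")
```

Under these assumptions the reliability obtained from the inverse of the equations matches the usual BLUP reliability for N/n records per effect. The factor $(n - 1)/n$ appears in both the PEV and the variance of the true contrast, so it cancels in their ratio.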
Abecasis, G. R., Cherny, S. S., Cookson, W. O. & Cardon,
L. R. (2002). Merlin-rapid analysis of dense genetic
maps using sparse gene flow trees. Nature Genetics 30,
Dunning, A. M., Durocher, F., Healey, C. S., Teare, M. D.,
McBride, S. E., Carlomagno, F., Xu, C. F., Dawson, E.,
Rhodes, S., Ueda, S., Lai, E., Luben, R. N., Van
Rensburg, E. J., Mannermaa, A., Kataja, V., Rennart,
G., Dunham, I., Purvis, I., Easton, D. & Ponder, B. A. J.
(2000). The extent of linkage disequilibrium in four popu-
lations with distinct demographic histories. American
Journal of Human Genetics 67, 1544–1554.
Franklin, I. R. (1977). The distribution of the proportion of
the genome which is homozygous by descent in inbred
individuals. Theoretical Population Biology 11, 60–80.
Fernando, R. L. (1998). Some theoretical aspects of finite
locus models. Proceedings of the 6th World Congress of
Genetics Applied to Livestock
January 1998, University of New England, Armidale,
Australia, volume 26 (1998), pp. 329–336.
Goddard, M. E. (2008). Genomic selection: prediction of
accuracy and maximisation of long term response.
Genetica Epub ahead of print. PMID: 18704696.
Guo, S. W. (1996). Variation in genetic identity among re-
latives. Human Heredity 46, 61–70.
Habier, D., Fernando, R. L. & Dekkers, J. C. (2007). The
impact of genetic relationship information on genome-
assisted breeding values. Genetics 177, 2389–2397.
Hayes, B. J. & Goddard, M. E. (2001). The distribution of
the effects of genes affecting quantitative traits in live-
stock. Genetics Selection Evolution 33, 209–229.
Hayes, B. J., Visscher, P. M., McPartlan, H. & Goddard,
M. E. (2003). A novel multi-locus measure of linkage
disequilibrium and its use to estimate past effective population
size. Genome Research 13, 635.
Hayes, B. J., Chamberlain, A. C., McPartlan, H., McLeod,
I., Sethuraman, L. & Goddard, M. E. (2007). Accuracy of
marker assisted selection with single markers and marker
haplotypes in cattle. Genetics Research 89, 215–220.
Hill, W. G. (1993). Variation in genetic identity within kin-
ships. Heredity 71, 652–653.
Hill, W. G., Goddard, M. E. & Visscher, P. M. (2008). Data
and theory point to mainly additive genetic variance for
complex traits. PLoS Genetics 4(2), e1000008. doi:
Meuwissen, T. H. E., Hayes, B. J. & Goddard, M. E.
(2001). Prediction of total genetic value using genome-
wide dense marker maps. Genetics 157, 1819–1829.
Meuwissen, T. H. & Goddard, M. E. (2004). Mapping
multiple QTL using linkage disequilibrium and linkage
analysis information and multitrait data. Genetics Selec-
tion Evolution 36(3), 261–279.
Nejati-Javaremi, A., Smith, C. & Gibson, J. (1997). Effect of
total allelic relationship on accuracy of evaluation and
response to selection. Journal of Animal Science 75,
Rasmusson, M. (1993). Variation in genetic identity within
kinships. Heredity 70, 266–268.
Reich, D. E., Cargill, M., Bolk, S., Ireland, J., Sabeti, P. C.,
Richter, D. J., Lavery, T., Kouyoumjian, R., Farhadian,
S. F., Ward, R. & Lander, E. S. (2001). Linkage dis-
equilibrium in the human genome. Nature 411, 199–204.
Riquet, J., Coppieters, W., Cambisano, N., Arranz, J. J.,
Berzi, P., Davis, S. K., Grisart, B., Farnir, F., Karim, L.,
Mni, M., Simon, P., Taylor, J. F., Vanmanshoven, P.,
Wagenaar, D., Womack, J. E. & Georges, M. (1999).
Fine-mapping of quantitative trait loci by identity by descent
in outbred populations: application to milk production
in dairy cattle. Proceedings of the National Academy of
Sciences of the USA 96, 9252–9257.
Sanna, S., Jackson, A. U., Nagaraja, R., Willer, C. J., Chen,
W. M., Bonnycastle, L. L., Shen, H., Timpson, N.,
Lettre, G., Usala, G., Chines, P. S., Stringham, H. M.,
Scott, L. J., Dei, M., Lai, S., Albai, G., Crisponi, L.,
Naitza, S., Doheny, K. F., Pugh, E. W., Ben-Shlomo, Y.,
Ebrahim, S., Lawlor, D. A., Bergman, R. N., Watanabe,
R. M., Uda, M., Tuomilehto, J., Coresh, J., Hirschhorn,
J. N., Shuldiner, A. R., Schlessinger, D., Collins, F. S.,
Davey Smith, G., Boerwinkle, E., Cao, A., Boehnke, M.,
Abecasis, G. R. & Mohlke, K. L. (2008). Common var-
iants in the GDF5-UQCC region are associated with
variation in human height. Nature Genetics 40, 198–203.
Stam, P. (1980). The distribution of the fraction of the
genome identical by descent in finite random mating
populations. Genetical Research 35, 131–155.
Tenesa, A., Navarro, P., Hayes, B. J., Duffy, D. L.,
Clarke, G. M., Goddard, M. E. & Visscher, P. M.
(2007). Recent human effective population size estimated
from linkage disequilibrium. Genome Research 17,
The International HapMap Consortium (2007). A second
generation human haplotype map of over 3.1 million
SNPs. Nature 449(7164), 851–861.
Van Raden, P. M. (2007). Efficient estimation of breeding
values from dense genomic data. Journal of Dairy Science
90(Suppl. 1), 374–375.
Villanueva, B., Pong-Wong, R., Fernández, J. & Toro,
M. A. (2005). Benefits from marker-assisted selection
under an additive polygenic genetic model. Journal of
Animal Science 83, 1747–1752.
Visscher, P. M. (2008). Sizing up human height variation.
Nature Genetics 40, 489–490.
Visscher, P. M., Medland, S. E., Ferreira, M. A., Morley,
K. I., Zhu, G., Cornes, B. K., Montgomery, G. W. &
Martin, N. G. (2006). Assumption-free estimation of
heritability from genome-wide identity-by-descent shar-
ing between full siblings. PLoS Genetics 2, e41.
Wray, N. R., Goddard, M. E. & Visscher, P. M. (2007).
Prediction of individual genetic risk to disease from gen-
ome-wide association studies. Genome Research 17,