Available via license: CC BY-NC-ND 4.0
The Plant Genome, March 2015, Vol. 8, No. 1
Original Research
Assessing Genomic Selection Prediction Accuracy
in a Dynamic Barley Breeding Population
A. H. Sallam, J. B. Endelman, J.-L. Jannink, and K. P. Smith*
ABSTRACT
Prediction accuracy of genomic selection (GS) has been previously evaluated through simulation and cross-validation; however, validation based on progeny performance in a plant breeding program has not been investigated thoroughly. We evaluated several prediction models in a dynamic barley breeding population composed of 647 six-row lines using four traits differing in genetic architecture and 1536 single nucleotide polymorphism (SNP) markers. The breeding lines were divided into six sets, designated as one parent set and five consecutive progeny sets, composed of representative samples of breeding lines over a 5-yr period. We used these data sets to investigate the effect of model and training population composition on prediction accuracy over time. We found little difference in prediction accuracy among the models, confirming prior studies that found the simplest model, random regression best linear unbiased prediction (RR-BLUP), to be accurate across a range of situations. In general, we found that using the parent set was sufficient to predict progeny sets, with little to no gain in accuracy from generating larger training populations by combining the parent set with subsequent progeny sets. The prediction accuracy ranged from 0.03 to 0.99 across the four traits and five progeny sets. We explored characteristics of the training and validation populations (marker allele frequency, population structure, and linkage disequilibrium, LD) as well as characteristics of the trait (genetic architecture and heritability, H2). Fixation of markers associated with a trait over time was most clearly associated with reduced prediction accuracy for the mycotoxin trait deoxynivalenol (DON). Higher trait H2 in the training population and simpler trait architecture were associated with greater prediction accuracy.
Genomic selection is touted as a marker-based breeding approach that complements traditional marker-assisted selection (MAS) and phenotypic selection. In traditional MAS, favorable alleles or genes for relatively simply inherited traits are mapped, and molecular markers linked to those alleles are then used to select individuals to use as parents or to advance from segregating breeding populations (Bernardo, 2008). Marker-assisted selection is more effective than phenotypic selection if the tagged loci account for a large portion of the total genetic variation within the population of selection candidates (Collins et al., 2003; Castro et al., 2003; Xu and Crouch, 2008). The limitation of traditional MAS for highly complex traits is that it captures only a small portion of the total genetic variation because it uses a limited number of selected markers (Lande and Thompson, 1990; Bernardo, 2010). Phenotypic selection is effective on quantitative traits but is limited to stages in breeding cycles and environments where such traits can be measured effectively, such as advanced lines in multiple-location field trials. Therefore, GS can be strategically implemented in breeding for quantitative traits at points in the breeding process where phenotypic selection is not feasible.
Published in The Plant Genome 8
doi: 10.3835/plantgenome2014.05.0020
© Crop Science Society of America
5585 Guilford Rd., Madison, WI 53711 USA
An open-access publication
All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher.
A.H. Sallam, Dep. of Agronomy and Plant Genetics, Univ. of Minnesota, St. Paul, MN 55108; J.B. Endelman, Dep. of Horticulture, Univ. of Wisconsin-Madison, 1575 Linden Dr., Madison, WI 53706; J.-L. Jannink, USDA-ARS, R.W. Holley Center for Agriculture and Health, Cornell Univ., Ithaca, NY 14853; K.P. Smith, Dep. of Agronomy and Plant Genetics, Univ. of Minnesota, St. Paul, MN 55108. Received 6 Jan. 2014. *Corresponding author (smith376@umn.edu).
Abbreviations: BLUEs, best linear unbiased estimations; DON, deoxynivalenol; EMMA, efficient mixed-model association; FHB, Fusarium head blight; Fst, Wright’s fixation index; GS, genomic selection; H2, heritability; GEBV, genomic estimated breeding value; LD, linkage disequilibrium; MAF, minor allele frequency; MAS, marker-assisted selection; ra, predictive ability; QTL, quantitative trait loci; REML, restricted maximum likelihood; RKHS, Reproducing Kernel Hilbert Space; RR-BLUP, random regression best linear unbiased prediction; SNP, single nucleotide polymorphism.
Published March 13, 2015
Genomic selection uses trait predictions based on estimates of all marker effects distributed across the genome (Meuwissen et al., 2001). Based on simulation studies, GS should improve gain from selection, reduce costs associated with phenotyping, and accelerate development of new cultivars by reducing the length of the breeding cycle (Heffner et al., 2009, 2010). Implementing GS is accomplished by first estimating marker effects in a training population and then using those estimates to predict the performance of selection candidates. The predicted value of a selection candidate based on marker effects is referred to as the genomic estimated breeding value (GEBV; Meuwissen et al., 2001).
A key component of the effectiveness of GS is prediction accuracy. Prediction accuracy is defined as the correlation between the GEBV and the true breeding value; because true breeding values are unknown, it is estimated as the correlation between the GEBV and observed phenotypic performance divided by the square root of H2 (Goddard and Hayes, 2007; Zhong et al., 2009). There are three general methods to assess prediction accuracy using real data: (i) subset validation, (ii) interset validation, and (iii) progeny validation (Figure 1). Subset validation is implemented by randomly dividing a single population of individuals into equal subsamples; one subsample is used as a validation set and is predicted using the remaining subsamples as the training set. Subset validation has been used to assess prediction accuracy in cattle, wheat (Triticum aestivum L.), and barley (Hordeum vulgare L.), among many other livestock and crop species (Luan et al., 2009; Heffner et al., 2010; Lorenz et al., 2012; Poland et al., 2012). In interset validation, predefined sets of genotypes are designated as training and validation populations. These sets could be the same genotypes evaluated in independent environments, or sets of breeding lines defined chronologically, where older lines are used to predict newer lines from either the same or independent environments (Asoro et al., 2011; Lorenz et al., 2012).
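Subset validation, as described above, can be sketched in a few lines. This is a minimal illustration rather than the procedure of any cited study; the function name `subset_validation_folds`, the number of subsamples `k = 5`, and the random seed are hypothetical choices.

```python
import numpy as np


def subset_validation_folds(n_lines, k=5, seed=0):
    """Yield (training, validation) index arrays for subset validation.

    Line indices 0..n_lines-1 are shuffled and split into k near-equal
    subsamples; each subsample serves once as the validation set while the
    remaining k - 1 subsamples form the training set.
    """
    rng = np.random.default_rng(seed)
    subsamples = np.array_split(rng.permutation(n_lines), k)
    for i, validation in enumerate(subsamples):
        training = np.concatenate([s for j, s in enumerate(subsamples) if j != i])
        yield training, validation


# Example: split 168 lines (the size of the parent set used later) into 5 subsamples.
for train_idx, val_idx in subset_validation_folds(168, k=5):
    print(len(train_idx), len(val_idx))
```

Interset and progeny validation replace this random split with predefined sets (for example, older lines or parents for training and newer lines or progeny for validation), so no shuffling is involved.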
Progeny validation implies that the training population includes parents (or grandparents, and so forth) of the progeny lines that make up the validation population. A simulation study in animals has shown that decreases in prediction accuracy are associated with decay of LD between markers and quantitative trait loci (QTL) resulting from recombination in progeny generations (Habier et al., 2007). Therefore, meaningful assessment of prediction accuracy should include progeny validation. In plants, we are aware of only a single study that assesses accuracy by progeny validation using empirical phenotypic and genotypic information (Hofheinz et al., 2012).
To assess the potential of GS, researchers have explored various factors that affect prediction accuracy, including prediction models. These models include RR-BLUP, Bayes A, Bayes B, Bayes Cπ, Bayes LASSO, and Reproducing Kernel Hilbert Space (RKHS; Meuwissen et al., 2001; Kizilkaya et al., 2010; de los Campos et al., 2009; Gianola and van Kaam, 2008). These models differ in the assumptions made about the variances associated with markers and/or the types of gene action (reviewed by Lorenz et al., 2011). RR-BLUP assumes that all markers have equal variance, whereas the Bayes A, Bayes B, Bayes Cπ, and Bayes LASSO models do not impose this constraint (Meuwissen et al., 2001; de los Campos et al., 2009; Kizilkaya et al., 2010). The RKHS regression model can capture both additive and nonadditive interactions among loci by creating a kernel matrix that includes interactions among marker covariates (Gianola and van Kaam, 2008). Results of empirical studies have shown variable performance of prediction models on different traits (Crossa et al., 2010; Lorenz et al., 2012; Rutkoski et al., 2012).
Other factors shown to affect prediction accuracy include (i) the LD between markers and QTL in the training and validation populations, (ii) the size of the training population (N), (iii) the H2 of the trait under investigation, and (iv) the genetic architecture of the trait. Increasing marker density can improve prediction accuracy by increasing the number of QTL that are in LD with markers and thereby capturing more of the genetic variation (de Roos et al., 2009; Asoro et al., 2011; Heffner et al., 2011; Zhao et al., 2012). The successful application of GS across generations relies on the persistence of LD phase between markers and QTL (de Roos et al., 2008). The persistence of LD phase, measured by the correlation of r between populations, is likely to be a function of the genetic relationship between the populations (de Roos et al., 2008; Toosi et al., 2010). Increasing N leads to better estimation of SNP effects (Hayes et al., 2009) and therefore increases prediction accuracy (Lorenzana and Bernardo, 2009; Asoro et al., 2011; Lorenz et al., 2012). In a simulation study, Daetwyler et al. (2010) found that prediction accuracy increased with increasing H2 of the trait regardless of the number of QTL controlling the trait or the prediction model used. In a study that manipulated H2 by introducing random error into empirical data sets, Combs and Bernardo (2013) showed that accuracy increased with increasing H2 and N, and that prediction accuracies were similar across different combinations of the two when the product H2 × N was held constant. Generally, prediction accuracy decreases as trait complexity increases (Hayes et al., 2010). Prediction models can also vary in performance among traits with different genetic architectures: Bayes B was more accurate when a small number of loci controlled the trait, whereas RR-BLUP was insensitive to genetic architecture (Daetwyler et al., 2010).
Previous studies have demonstrated the potential of GS on the basis of subset validation and interset validation. While these results are promising, additional research is needed to assess accuracy in the context of applied breeding. Specifically, validation experiments are needed to assess the accuracy of prediction on progenies (progeny validation) over time, as would occur in breeding populations. This would take into account changes in allele frequency and LD expected to occur as a result of recombination and selection within a dynamic breeding program. Lorenz et al. (2012) investigated prediction accuracy for the disease Fusarium head blight (FHB) and its associated mycotoxin deoxynivalenol (DON) using interset validation. In this study, we advance this work by using progeny validation and by including additional agronomic traits. We use a set of breeding lines that includes parents as a training population to predict five chronological sets of progenies (2006–2010) from a breeding program. Our specific objectives were to (i) compare the accuracy of different GS prediction models for DON concentration, FHB resistance, yield, and plant height; (ii) study the effect of trait architecture on prediction accuracy; (iii) characterize changes in prediction accuracy over time; and (iv) examine the relationship between prediction accuracy and training population size and composition, allele frequency, LD, and genetic distance between the training and validation populations.
MATERIALS AND METHODS
Germplasm
To explore the accuracy of genomic predictions, we used historical sets of breeding lines from the University of Minnesota barley breeding program, which we define as parent or progeny sets. The parent set comprises 168 breeding lines that were developed between 1999 and 2004 and were either used as parents to develop lines in the progeny sets or were cohorts of breeding lines that were used as parents. The five progeny sets consist of five chronological sets of breeding lines evaluated between 2006 and 2010. Each progeny set consists of approximately 96 lines that were representative of the breeding lines developed that year in the breeding program. The 2006 and 2007 progeny sets are the University of Minnesota breeding lines that were included in the association mapping study conducted by Massman et al. (2011), where they were referred to as CAP I and CAP II. All breeding lines in the parent and progeny sets were developed by single-seed descent to at least the F4 generation. At that point, F4:5 lines were evaluated for FHB resistance, heading date, plant height, maturity, and lodging. Lines selected as favorable for these traits were then advanced to preliminary yield trials the following year (Smith et al., 2013). The preliminary yield trial data were used to characterize progeny set lines, and the year designation for each progeny set refers to the year that its breeding lines entered preliminary yield trials. All pedigree, SNP marker, and phenotypic data related to these sets of breeding lines are available from the public database The Hordeum Toolbox (http://thehordeumtoolbox.org, verified 3 Oct. 2014; Blake et al., 2012).
Phenotypic Evaluation
The parental lines were evaluated together for agronomic traits in five experiments conducted between 2009 and 2011 at Crookston and St. Paul, MN, in an augmented block design with two replications and four incomplete blocks per replication (Supplemental Table S1). Planting density for all traits in all experiments was 300 plants m⁻². Each line was represented once per block in two-row plots 3 m in length. Six check cultivars (Drummond, Lacey, Quest, Rasmusson, Stellar-ND, and Tradition) were randomly assigned to each block (Horsley et al., 2002, 2006a; Rasmusson et al., 2001; Smith et al., 2010, 2013). We also characterized the parental lines using the historical data that were collected as part of the breeding program when these lines entered preliminary yield trials. Experiments for this unbalanced data set were arranged as a randomized complete block design with two replications in two-row plots 3 m in length and were conducted between 1999 and 2004. Three checks (‘Robust’, ‘Stander’, and Lacey) were common to all the experiments (Rasmusson and Wilcoxson, 1983; Rasmusson et al., 1993).
For both the historic and contemporary data sets, each line was evaluated at least twice in yield trials conducted in St. Paul, Morris, and Crookston, MN. Yield was determined by harvesting each plot with a Wintersteiger small-plot combine, weighing the grain, and expressing it as kg ha⁻¹. Plant height was assessed as the height in centimeters, from the soil surface to the tip of the spike excluding awns, of two randomly selected samples of plants from the middle of the plot. The parental lines were evaluated for FHB resistance and DON concentration in 2009 at St. Paul and in 2010 at St. Paul and Crookston, MN, in an augmented block design with two replications in four incomplete blocks. Each line was represented once per block in single-row plots 1.8 m in length, with 30 cm between rows. Six check cultivars (Drummond, Lacey, Quest, Rasmusson, Stellar-ND, and Tradition) were randomly assigned to each block. FHB resistance and DON concentration were assessed using a previously described method (Steenson, 2003). Briefly, in St. Paul, plants were spray-inoculated with an F. graminearum macroconidia suspension using CO2-pressurized backpack sprayers. Plots were inoculated when at least 90% of the heads had emerged from the boot and were sprayed again 3 d later (Mesfin et al., 2003). Mist irrigation was applied immediately after inoculation to promote infection. In Crookston, MN, plants were inoculated with grain spawn: autoclaved corn (Zea mays L.) colonized by five local isolates of F. graminearum (Horsley et al., 2006b). The colonized grain was spread on the ground 2 wk before flowering and again 1 wk later. Overhead mist irrigation started 2 wk before anthesis and continued until the hard dough stage of maturity. Fusarium head blight severity was assessed about 14 d after inoculation by estimating the percentage of infected kernels on a random sample of 10 spikes per plot using the following scale: 0, 1, 3, 5, 10, 15, 25, 35, 50, 75, and 100%. DON concentration was determined on a 25-g sample of the harvested grain by gas chromatography–mass spectrometry and expressed in ppm according to the procedures of Mirocha et al. (1998).
Lines included in the progeny sets were derived from crosses made between 2003 and 2007 and were evaluated in preliminary yield trials conducted from 2006 to 2010 (Supplemental Table S1). Plots were arranged in a randomized complete block design with two replications and four check varieties (Robust, Stander, MNBrite, and Lacey). Each progeny set was evaluated for yield and plant height in Crookston, St. Paul, and Morris, MN, as described above. The progeny sets were also evaluated for FHB resistance and DON accumulation in disease nurseries as described above. Each progeny set was evaluated in three to four FHB experiments located in St. Paul and Crookston, MN, and Osnabrock and Fargo, ND. Disease inoculation, disease assessment, and DON measurements were done as previously described.
Genotypic Evaluation
DNA for genotyping was extracted from a single plant from the F4:5 bulk seed used in the phenotypic evaluation. Approximately 3-wk-old leaf tissue was harvested and freeze-dried. DNA was extracted at the USDA genotyping center in Fargo, ND, using the protocol of Slotta et al. (2008). Each DNA sample was genotyped with the 1536 SNPs referred to as BOPA1 using the Illumina GoldenGate oligonucleotide assay (Close et al., 2009). Markers were filtered in the parent set based on minor allele frequency (MAF < 0.01) and missing data frequency (> 10%). Missing marker values were imputed using naïve imputation so that analytical operations could be performed.
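The filtering and imputation steps can be sketched as follows. This is a hedged illustration: the −1/0/1 genotype coding, the use of `np.nan` for missing calls, the reading that markers failing either criterion are removed, and the interpretation of "naïve imputation" as replacing a missing call with the marker's mean genotype are all assumptions of this sketch. The function name is hypothetical.

```python
import numpy as np


def filter_and_impute(geno, min_maf=0.01, max_missing=0.10):
    """Drop SNPs with MAF < min_maf or missing rate > max_missing, then mean-impute.

    geno: lines x markers array coded -1/0/1, with np.nan for missing calls.
    Returns the filtered, imputed matrix and the boolean mask of kept markers.
    """
    missing_rate = np.isnan(geno).mean(axis=0)
    p = (np.nanmean(geno, axis=0) + 1) / 2          # frequency of the allele coded 1
    maf = np.minimum(p, 1 - p)
    keep = (maf >= min_maf) & (missing_rate <= max_missing)
    kept = geno[:, keep].copy()
    col_means = np.nanmean(kept, axis=0)
    rows, cols = np.where(np.isnan(kept))
    kept[rows, cols] = col_means[cols]              # assumed "naive" imputation: marker mean
    return kept, keep


# Hypothetical 4-line x 4-marker panel: marker 0 is monomorphic, markers 1 and 3 have gaps.
g = np.array([[1, -1, 1, np.nan],
              [1, 1, -1, 1],
              [1, np.nan, 1, 1],
              [1, -1, -1, -1]], dtype=float)
kept, mask = filter_and_impute(g)
print(mask)
```

With only four lines, a single missing call exceeds the 10% threshold, so markers 1 and 3 are dropped here; in a panel of 168 lines the same threshold tolerates up to 16 missing calls per marker.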
Data Analysis
Analysis of variance was performed for DON concentration, FHB resistance, yield, and plant height using PROC GLM in SAS (v. 9.3, SAS Institute, 2011). For each experiment, outlier observations with standardized residual absolute values of three or more were removed from the data set and scored as missing values. One experiment (yield in St. Paul, 2010) was removed because no significant differences were found among lines.
To avoid including common checks across experiments in variance component estimates, two-step procedures were used. For the contemporary data from the parent set, we first adjusted phenotypes for block effects using the checks common to all blocks with PROC MIXED in SAS (v. 9.3, SAS Institute, 2011). The model was y = Xβ + Zu + e, where y is the vector of unadjusted phenotypes, β is the vector of fixed block effects, u is the vector of random check effects, and X and Z are incidence matrices relating y to β and u. We then adjusted phenotypes for trial effects by estimating these effects as fixed in an analysis with lines as random effects. The model was y* = Xβ + Zu + e, where y* represents the phenotypes adjusted for block effects in the first step, β is the vector of fixed trial effects, and u is the vector of random line effects. In the historic data for the parent set, subsets of lines were evaluated in different years, but a common set of checks was included in each trial. As with the contemporary data, phenotypes were adjusted for trial effects by computing these effects in a mixed model with checks as random and trials as fixed effects. In the progeny data sets, phenotypes were adjusted for trial effects by computing these effects in a mixed model with lines as random and trials as fixed effects. Finally, best linear unbiased estimations (BLUEs) for lines in each experiment were estimated in models with adjusted phenotypes as the response variable and lines as fixed effects. Variance components were estimated using restricted maximum likelihood (REML) in PROC MIXED in SAS, with the line BLUEs as the response variable, lines as random effects, and experiments as fixed effects. Broad-sense H2 on an entry-mean basis was estimated for all traits as
$$H^2 = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_e / n}$$
where σ²g is the genetic variance, σ²e is the pooled error variance that includes G × E and residuals, and n is the number of trials.
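The entry-mean heritability formula can be checked with a short function. As an illustration only, it is applied to the parent-set DON variances from Table 1 (σ²g = 15.83, σ²e = 20.00) with a hypothetical n = 3 trials; the number of trials behind each variance estimate is not restated here. The function name is hypothetical.

```python
def entry_mean_h2(var_g, var_e, n_trials):
    """Broad-sense heritability on an entry-mean basis: var_g / (var_g + var_e / n)."""
    return var_g / (var_g + var_e / n_trials)


# Parent-set DON variances from Table 1; n_trials = 3 is a hypothetical value.
print(round(entry_mean_h2(15.83, 20.00, n_trials=3), 3))  # → 0.704
```

Because σ²e is divided by n, adding trials raises entry-mean H2 even when σ²g and σ²e are unchanged.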
Characterizing LD, Genetic Distance, and Parental Contribution
To assess the extent of LD within the parent and progeny sets, LD between adjacent markers was characterized as r2 using Haploview v.4.0 (Barrett et al., 2005). To assess the persistence of LD phase between the parent and progeny sets, correlations of r were calculated between the parent set and each progeny set (de Roos et al., 2008; Toosi et al., 2010). We measured genetic distance between the parent set and each progeny set by the fixation index (Fst; Weir and Cockerham, 1984) and Nei’s genetic distance (Nei, 1987). Both measure the differentiation between two populations due to differences in allele frequencies. The contribution of the parent set to a progeny set was assessed by counting, for each progeny line, how many of its parents were included in the parent set, summing these counts over the progeny set, and dividing the sum by twice the number of lines in that progeny set.
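The LD, distance, and contribution measures described above can be sketched with NumPy. Assumptions of this sketch: genotypes are coded −1/1 for homozygous inbred lines with markers ordered by map position; r is approximated by the Pearson correlation of genotype codes at adjacent markers (Haploview estimates LD from inferred haplotypes and may differ); Nei's standard genetic distance is computed for biallelic SNPs from allele frequencies; and Fst (Weir and Cockerham, 1984) is omitted for brevity. All function names are hypothetical.

```python
import numpy as np


def adjacent_r(geno):
    """Signed LD r between each pair of adjacent markers (columns in map order)."""
    return np.array([np.corrcoef(geno[:, j], geno[:, j + 1])[0, 1]
                     for j in range(geno.shape[1] - 1)])


def ld_phase_persistence(geno_a, geno_b):
    """Correlation of adjacent-marker r between two populations."""
    r_a, r_b = adjacent_r(geno_a), adjacent_r(geno_b)
    ok = ~np.isnan(r_a) & ~np.isnan(r_b)   # drop pairs monomorphic in either set
    return np.corrcoef(r_a[ok], r_b[ok])[0, 1]


def nei_distance(p_x, p_y):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)) for biallelic loci.

    p_x, p_y: frequency of one allele at each locus in populations X and Y.
    """
    j_x = np.mean(p_x**2 + (1 - p_x) ** 2)
    j_y = np.mean(p_y**2 + (1 - p_y) ** 2)
    j_xy = np.mean(p_x * p_y + (1 - p_x) * (1 - p_y))
    return -np.log(j_xy / np.sqrt(j_x * j_y))


def parental_contribution(pedigrees, parent_set):
    """Parents found in parent_set, summed over progeny lines, / (2 * number of lines).

    pedigrees: one (parent1, parent2) tuple per progeny line.
    """
    hits = sum((p1 in parent_set) + (p2 in parent_set) for p1, p2 in pedigrees)
    return hits / (2 * len(pedigrees))


# Hypothetical pedigrees: 3 of the 6 parent slots come from the parent set {"A", "B"}.
print(parental_contribution([("A", "B"), ("A", "X"), ("Y", "Z")], {"A", "B"}))
```

Correlating signed r rather than r2 is what makes the measure sensitive to LD phase: a marker pair whose association flips sign between populations lowers the correlation even if its r2 is unchanged.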
Association Analysis
To identify sets of markers associated with traits, association analysis was implemented using the efficient mixed-model association (EMMA) approach, which corrects for population structure using genetic relatedness (Kang et al., 2008). Association analyses were performed on the parent set for DON concentration, FHB resistance, yield, and plant height using the EMMA package implemented in R (Kang et al., 2008). The analysis was based on the mixed model
$$y = X\beta + Zu + e \qquad [1]$$
where y is the vector of individual phenotypes; X is an incidence matrix that relates β to y; β is the vector of fixed effects, which includes the overall mean and SNPs; Z is the incidence matrix that relates u to phenotypes; u is the random effect of the genetic background of each line, distributed as u ~ N(0, Kσ²g), where K is the kinship matrix derived from marker genotypes and σ²g is the genetic variance; and e is the residual, distributed as e ~ N(0, Iσ²e), where I is the identity matrix and σ²e is the error variance (Kang et al., 2008). We used a relaxed threshold of −log10(p) = 1.3 (p = 0.05) to identify markers potentially associated with traits. These subsets of markers were used to investigate changes in allele frequencies over time in the progeny sets. For all polymorphic SNP markers, the proportion of variance explained by each marker (R2) was calculated as R2 = SSreg/SStot, where SSreg is the regression sum of squares and SStot is the total sum of squares of the regression model.
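The per-marker R2 can be sketched as a single-marker least-squares regression. Fitting a simple regression of the phenotype on one marker's genotype codes, with an intercept and no other covariates, is an assumption of this sketch; the text does not state which covariates the regression model contained. The function name `marker_r2` is hypothetical.

```python
import numpy as np


def marker_r2(y, x):
    """R2 = SSreg / SStot for the simple regression y = b0 + b1 * x (one SNP)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    ss_tot = np.sum((y - y.mean()) ** 2)           # total sum of squares
    ss_reg = np.sum((fitted - y.mean()) ** 2)      # regression sum of squares
    return ss_reg / ss_tot


# Four lines, one SNP coded 0/1 (hypothetical values).
print(marker_r2(np.array([1.0, 2.0, 3.0, 4.0]), np.array([0.0, 0.0, 1.0, 1.0])))
```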
Prediction Models
Genomic predictions were estimated using four methods: ridge regression best linear unbiased prediction (RR-BLUP; Meuwissen et al., 2001), a Gaussian kernel model (Gianola and van Kaam, 2008; Endelman, 2011), an Exponential kernel model (Piepho, 2009; Endelman, 2011), and Bayes Cπ (Kizilkaya et al., 2010). RR-BLUP and Bayes Cπ can be modeled as
$$y = \mathbf{1}\mu + \sum_{j=1}^{K} Z_j \alpha_j \delta_j + e \qquad [2]$$
where y is the vector of individual phenotypes, μ is the population mean, K is the number of markers, Z_j is the incidence vector that links marker j genotypes to individuals, α_j is the effect of marker j, δ_j is an indicator variable that indicates the absence or presence of marker j with probability π and 1 − π, respectively, and e is the random residual. In RR-BLUP, all markers are included (δ_j = 1) and their effects share a common distribution, α_j ~ N(0, σ²α). The variance of this distribution was estimated from the marker and phenotypic data using REML. A Bayesian model was used to relax the assumption of RR-BLUP by allowing some markers to have zero effect. Bayes Cπ assumes a common marker variance across all markers included in the model; however, it allows some markers to have no effect on the trait (Kizilkaya et al., 2010). In Bayes Cπ, each marker j is assumed to have a zero effect with probability π (δ_j = 0) and an effect α_j ~ N(0, σ²α) with probability 1 − π (δ_j = 1). The parameter π is treated as unknown and is estimated from the training data. In the Markov chain Monte Carlo (MCMC) algorithm for Bayes Cπ, 10,000 iterations of Gibbs sampling were used, and the first 2500 iterations were discarded as burn-in. We implemented the Bayes Cπ analysis in R (R Development Core Team, 2012). Gaussian and Exponential kernel models were implemented to capture both additive and nonadditive interactions between marker genotypes using the R package rrBLUP (Endelman, 2011; R Development Core Team, 2012). These models can be presented as
$$y = \mathbf{1}\mu + Zg + e \qquad [3]$$
where y is the vector of individual phenotypes, μ is the population mean, Z is the matrix that relates g to the adjusted phenotypes, g is the vector of genotypic values distributed as g ~ N(0, Kσ²g), where K is the kernel similarity matrix, and e is the residual (Endelman, 2011).
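RR-BLUP (Eq. [2] with every δ_j = 1) and the kernel models (Eq. [3]) can be sketched with NumPy as below. This is not the rrBLUP package's implementation, and Bayes Cπ (which needs a Gibbs sampler) is not shown. Assumptions: the variance ratio λ = σ²e/σ²α (or σ²e/σ²g) and the kernel bandwidth θ are supplied by hand rather than estimated by REML; Z in Eq. [3] is taken as the identity (one adjusted phenotype per line); and both kernels use plain Euclidean distance between marker-genotype vectors. Function names are hypothetical.

```python
import numpy as np


def rr_blup(Z, y, lam):
    """Ridge-regression BLUP for y = 1*mu + Z @ alpha + e (Eq. [2] with all delta_j = 1).

    lam = sigma2_e / sigma2_alpha is supplied by the caller (the study used REML).
    Returns the intercept mu and the vector of marker effects alpha.
    """
    n, k = Z.shape
    X = np.column_stack([np.ones(n), Z])
    penalty = lam * np.eye(k + 1)
    penalty[0, 0] = 0.0                          # the mean mu is not shrunk
    coef = np.linalg.solve(X.T @ X + penalty, X.T @ y)
    return coef[0], coef[1:]


def kernel(Z1, Z2, theta, kind="gaussian"):
    """Kernel on Euclidean distance d between rows of Z1 and Z2.

    Gaussian: exp(-(d / theta)**2); exponential: exp(-d / theta).
    """
    d = np.sqrt(((Z1[:, None, :] - Z2[None, :, :]) ** 2).sum(axis=-1))
    return np.exp(-(d / theta) ** 2) if kind == "gaussian" else np.exp(-d / theta)


def kernel_blup(Z_train, y, Z_new, theta, lam, kind="gaussian"):
    """Predict mu + g for new lines under y = 1*mu + g + e, g ~ N(0, K * sigma2_g).

    lam = sigma2_e / sigma2_g is supplied by the caller. With V = K + lam * I
    (phenotypic covariance divided by sigma2_g), mu is the GLS estimate and
    g_new = K_new,train @ V^-1 @ (y - mu).
    """
    n = len(y)
    V = kernel(Z_train, Z_train, theta, kind) + lam * np.eye(n)
    ones = np.ones(n)
    mu = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))
    weights = np.linalg.solve(V, y - mu)
    return mu + kernel(Z_new, Z_train, theta, kind) @ weights


# Hypothetical simulated data: 100 lines, 20 markers coded -1/1, no residual noise.
rng = np.random.default_rng(0)
Z = rng.choice([-1.0, 1.0], size=(100, 20))
alpha_true = rng.normal(size=20)
y = 5.0 + Z @ alpha_true
mu_hat, alpha_hat = rr_blup(Z, y, lam=1e-6)
gebv = Z[:5] @ alpha_hat                         # GEBVs for the first five lines
```

In practice λ would be estimated by REML, as in the study; the rrBLUP package provides REML-based equivalents (`mixed.solve` for RR-BLUP and `kinship.BLUP` with `K.method = "GAUSS"` or `"EXP"` for the kernel models).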
The Gaussian and Exponential models do not partition the total genetic variance into additive and nonadditive variances; rather, kernel functions are used to capture these effects. Genomic predictions were calculated for all lines in the validation population using the four prediction models. The correlation coefficient between the genomic predictions and the line BLUEs was used as the predictive ability (ra). Prediction accuracy of GS (ra/H; Legarra et al., 2008; Chen et al., 2011) was calculated by dividing ra by the square root of the broad-sense H2 estimated from the validation population data.
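Predictive ability and prediction accuracy, as defined above, can be computed as follows. This is a minimal sketch; the GEBVs, BLUEs, and H2 value in the example are made-up numbers, not data from this study, and the function name is hypothetical.

```python
import numpy as np


def predictive_ability_and_accuracy(gebv, blues, h2_validation):
    """Return (ra, ra / H): ra = cor(GEBV, line BLUEs); H = sqrt of validation-set H2.

    Note that ra / H exceeds 1 whenever ra is larger than H.
    """
    ra = np.corrcoef(gebv, blues)[0, 1]
    return ra, ra / np.sqrt(h2_validation)


gebv = np.array([1.0, 2.0, 3.0, 4.0])
blues = np.array([1.5, 1.9, 3.2, 3.8])   # hypothetical validation-set line BLUEs
ra, accuracy = predictive_ability_and_accuracy(gebv, blues, h2_validation=0.64)
print(round(ra, 3), round(accuracy, 3))
```

Dividing by H rescales ra toward the correlation with true genetic values, on the reasoning that phenotypic BLUEs can correlate with genotypic values at most at about H.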
Training Populations
To test the effect of training population composition on prediction accuracy, three different scenarios were implemented by varying the training data set. In the first scenario, the 168 parental lines, using either the contemporary or historic data, were used as the training set to predict the performance of lines in each of the five progeny sets. In the second scenario, we varied the training population composition by adding one or more of the progeny sets to the contemporary parent set to predict the performance of a later progeny set. In the third scenario, we used two earlier progeny sets to predict a later progeny set. For each scenario, we implemented the four prediction models described previously.
Because the experiments described above assessed different types of training populations that also varied in size, we tested the effect of training population size on prediction accuracy for two of the four traits, DON concentration and yield, using the 2008 and 2010 progeny sets as validation populations. We used three scenarios. In the first scenario, we randomly sampled 25, 50, 75, 100, and 150 lines from the parent set (n = 168). For each population size, samples were drawn without replacement 500 times. In the second scenario, we combined the progeny sets preceding the validation set (2006–2007 when predicting 2008 and 2006–2009 when predicting 2010) with the parent set into a single panel from which samples were drawn to generate training sets of various sizes. We generated training sets from these larger panels by randomly sampling 25, 50, 75, 100, 150, 168, 264, and 360 lines when predicting 2008 and 25, 50, 75, 100, 150, 168, 264, 360, and 456 lines when predicting 2010. For each population size, samples were drawn without replacement 500 times. In the third scenario, we combined the parent and progeny sets by sequentially adding each progeny set in chronological order to the parent set, such that in each round of prediction the size of the training population increased by 96. This represents the single case that would occur if a breeder accumulated data over time to increase the size of the training population; thus, there is just one instance of this scenario. For each scenario, we used the training populations to generate predictions of the 2008 and 2010 progeny sets for DON concentration and yield using RR-BLUP.
Figure 1. Three validation approaches to assess prediction accuracy using different training and prediction sets.
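The random-subsampling design for training population size can be sketched as below. Assumptions: predictions use a ridge (RR-BLUP-style) solver with a fixed, hypothetical shrinkage `lam = 1.0` instead of REML-estimated variances; the data are simulated rather than taken from this study; and `n_reps = 20` in the example is reduced from the 500 draws used above only to keep the illustration fast. Function names are hypothetical.

```python
import numpy as np


def ridge_gebv(Z_train, y_train, Z_val, lam):
    """Ridge (RR-BLUP-style) predictions with a fixed shrinkage lam = sigma2_e / sigma2_alpha.

    Markers are centered on training means so the intercept is not shrunk.
    """
    z_mean = Z_train.mean(axis=0)
    Zc = Z_train - z_mean
    k = Z_train.shape[1]
    alpha = np.linalg.solve(Zc.T @ Zc + lam * np.eye(k), Zc.T @ (y_train - y_train.mean()))
    return y_train.mean() + (Z_val - z_mean) @ alpha


def accuracy_by_training_size(Z_pool, y_pool, Z_val, y_val, sizes,
                              n_reps=500, lam=1.0, seed=0):
    """Mean cor(prediction, observed) over n_reps random training sets per size."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        cors = []
        for _ in range(n_reps):
            idx = rng.choice(len(y_pool), size=n, replace=False)  # without replacement
            pred = ridge_gebv(Z_pool[idx], y_pool[idx], Z_val, lam)
            cors.append(np.corrcoef(pred, y_val)[0, 1])
        results[n] = float(np.mean(cors))
    return results


# Hypothetical simulated panel: 160 candidate training lines, 40 validation lines.
rng = np.random.default_rng(42)
Z = rng.choice([-1.0, 1.0], size=(200, 30))
y = Z @ rng.normal(size=30) + rng.normal(scale=0.5, size=200)
curve = accuracy_by_training_size(Z[:160], y[:160], Z[160:], y[160:],
                                  sizes=[25, 100], n_reps=20)
print(curve)
```

Averaging over many random draws at each size separates the effect of N from the effect of which particular lines happen to be sampled; the third (cumulative) scenario above has no such replication because only one chronological ordering exists.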
RESULTS
Phenotypic Traits and Marker Density
The parent set and each of the progeny sets were evaluated in multiple yield and disease experiments between 2006 and 2011. The yield experiment for the parent set in St. Paul 2010 was removed because very severe lodging resulted in a large error variance and no significant differences among lines. For all other traits and experiments, we observed significant differences among lines (p < 0.01) in the parent and progeny sets. Genetic variances (Table 1) decreased for DON concentration and plant height as a function of progeny set year, whereas the genetic variances for yield and FHB resistance fluctuated. As expected based on previous studies (Boukerrou and Rasmusson, 1990; Ma et al., 2000; Mesfin et al., 2003), H2 estimates were moderate for DON concentration and FHB resistance, low to moderate for yield, and high for plant height. After filtering the 1536 BOPA1 SNPs for MAF and missing data, 984 markers remained, spanning 1085 cM of the barley genome with an average distance between adjacent markers of 1.1 cM.
Relationship between Parent and Progeny Sets
The average adjacent-marker LD in the progeny sets was greater than in the parent set and increased slightly over time (Fig. 2). The correlation of r between the parent and progeny sets ranged from 0.44 to 0.61 (Fig. 2). The contribution of the parent set to the progeny sets decreased continuously over time, with about a 75% reduction from 2006 to 2010 (Fig. 3). Concurrent with this decrease in parental contribution was an increase over time in genetic distance between the parent set and the consecutive progeny sets (Fig. 3). The genetic relationship between the parents and the progeny sets can be visualized in the heatmap of the kinship matrix (Fig. 4). As lines were developed in the breeding program, their similarity to the parent set diminished over time.
Figure 3. Relationship between the parent set and each progeny
set expressed as percentage of parental contribution (square)
to each of the progeny sets and the genetic distance between
parental and progeny sets expressed as Fst (Wright’s fixation
index, triangle; Wright, 1965) and Nei’s genetic distances
(circle; Nei, 1987).
Figure 2. The average linkage disequilibrium (LD) of all possible
adjacent marker pairs in the parental and progeny sets (triangle)
and the correlation (Cor) of r between parents and each of the
progeny sets (circle).
Table 1. The estimated genetic (σ²g) and error (σ²e) variances for the contemporary parent set and the progeny sets (2006–2010) for deoxynivalenol concentration (DON), Fusarium head blight resistance (FHB), yield, and plant height (HT).

| Year | σ²g DON | σ²g FHB | σ²g Yield | σ²g HT | σ²e DON | σ²e FHB | σ²e Yield | σ²e HT |
|---|---|---|---|---|---|---|---|---|
| Parents | 15.83 | 12.00 | 114,411 | 26.20 | 20.00 | 35.00 | 444,284 | 11.30 |
| 2006 | 51.23 | 34.80 | 97,701 | 19.30 | 122.23 | 89.60 | 330,349 | 8.90 |
| 2007 | 15.40 | 4.60 | 109,203 | 15.20 | 24.01 | 16.00 | 334,950 | 9.50 |
| 2008 | 23.20 | 14.40 | 280,128 | 28.40 | 74.10 | 20.20 | 274,314 | 7.90 |
| 2009 | 12.90 | 6.95 | 68,516 | 14.50 | 22.90 | 22.99 | 360,820 | 11.30 |
| 2010 | 7.50 | 13.99 | 95,241 | 7.96 | 16.80 | 58.60 | 365,080 | 8.10 |
Marker-Trait Associations
Based on association analysis using the parent set, all of the traits displayed quantitative inheritance, with multiple loci distributed across the genome contributing to the traits (Fig. 5). Coincident QTLs for plant height, DON concentration, and FHB resistance on the short arm of chromosome 4H were detected in a region previously identified in a study using similar germplasm (Massman et al., 2011). Using a relaxed p-value threshold of 0.05, we identified 62, 58, 62, and 59 markers associated with DON concentration, FHB resistance, yield, and plant height, respectively.
To characterize the possible role of selection, we examined the allele frequencies of the SNP markers associated with the four traits (Fig. 6). In general, the percentage of fixed markers in the complete set of genome-wide markers increased over time. For markers associated with individual traits, this trend was most apparent for DON concentration, followed by plant height and yield. No relationship was observed for FHB resistance.
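A minimal sketch of the fixation summary behind Fig. 6 (and the later statement that more than 35% of markers segregating in the parent set became fixed in the 2010 progeny set) follows. It assumes homozygous lines coded 0/2 with None for missing calls, treats a marker as segregating when at least two genotype classes are observed and as fixed when exactly one is observed, and uses hypothetical marker names; the authors' handling of missing data may differ.

```python
from typing import Dict, Iterable, Optional, Sequence

Calls = Sequence[Optional[int]]  # 0 or 2 per line; None = missing


def n_alleles(calls: Calls) -> int:
    """Number of distinct non-missing genotype classes observed at a marker."""
    return len({c for c in calls if c is not None})


def percent_newly_fixed(parent: Dict[str, Calls], progeny: Dict[str, Calls],
                        markers: Optional[Iterable[str]] = None) -> float:
    """Percentage of markers segregating in `parent` that are monomorphic in `progeny`.

    `markers` optionally restricts the summary to a subset, e.g. trait-associated markers.
    """
    keys = list(markers) if markers is not None else list(parent)
    segregating = [m for m in keys if n_alleles(parent[m]) >= 2]
    if not segregating:
        return 0.0
    fixed = sum(1 for m in segregating if n_alleles(progeny[m]) == 1)
    return 100.0 * fixed / len(segregating)


if __name__ == "__main__":
    parent = {"m1": [0, 2, 0], "m2": [0, 2, 2], "m3": [0, 0, 0]}  # hypothetical calls
    progeny = {"m1": [0, 0, None], "m2": [0, 2, 2], "m3": [0, 0, 0]}
    print(percent_newly_fixed(parent, progeny))  # m1 became fixed, m2 still segregates
```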
To investigate the effect of trait architecture and the distribution of marker effects on prediction accuracy, we estimated the proportion of variance explained by each SNP marker in the parent set for all traits (Fig. 7). Based on the distribution of R² values, plant height was the least complex trait, with several markers exceeding 0.30. Yield, on the other hand, was the most complex trait, with only a few markers with R² values > 0.10. Fig. 5 indicates that multiple markers are likely associated with the same QTL. Nevertheless, 83, 59, 24, and 17 markers had R² values greater than 0.10 for plant height, DON concentration, FHB resistance, and yield, respectively.
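The per-marker R² summarized in Fig. 7 can be illustrated with single-marker regression, where R² equals the squared Pearson correlation between marker genotype and phenotype. This sketch assumes phenotypes are line means and markers are coded 0/2, and it does not account for population structure or kinship as the association model behind Fig. 5 may; it is a simplification, not the authors' procedure. The 0.10 threshold matches the text, and the data are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation; None when either vector has zero variance."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def marker_r2(genotypes: Sequence[float], phenotypes: Sequence[float]) -> float:
    """R2 of a single-marker linear regression; 0 for a monomorphic marker."""
    r = pearson(genotypes, phenotypes)
    return 0.0 if r is None else r * r


def markers_above(geno: Dict[str, Sequence[float]], pheno: Sequence[float],
                  threshold: float = 0.10) -> List[str]:
    """Names of markers whose single-marker R2 exceeds `threshold`."""
    return sorted(m for m, g in geno.items() if marker_r2(g, pheno) > threshold)
```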
Another way to characterize the genetic architecture of the traits is to use the π parameter, the proportion of markers with no effect, estimated from Bayes Cπ modeling. When the parent set was used as the training population, the π estimates for yield over four runs ranged from 0.28 to 0.43, with a mean of 0.35. For DON concentration, the π estimates ranged from 0.37 to 0.54, with a mean of 0.45. For FHB resistance, they ranged from 0.49 to 0.58, with a mean of 0.53. For plant height, they ranged from 0.45 to 0.80, with a mean of 0.63. Thus, based on π estimates, yield was the most complex trait, followed by DON concentration, FHB resistance, and then plant height. This suggestive trend from higher to lower complexity agrees with the distribution of R² values for markers displayed in Fig. 7. The assessment of genetic architecture based on π estimates also agrees with the results of Lorenz et al. (2012) for DON concentration and FHB resistance.
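To make the π estimates more concrete: in Bayes Cπ, π is the proportion of markers with zero effect, so (1 − π) × M is the number of markers expected to carry a nonzero effect. The arithmetic below applies the mean π values reported above to the M = 984 filtered markers. This is only a back-of-the-envelope reading of point estimates; π varied widely across runs (0.45 to 0.80 for plant height) and, as noted in the discussion, should be interpreted cautiously.

```python
N_MARKERS = 984  # markers remaining after MAF and missing-data filtering
MEAN_PI = {"Yield": 0.35, "DON": 0.45, "FHB": 0.53, "Plant height": 0.63}


def expected_nonzero(pi: float, n_markers: int = N_MARKERS) -> float:
    """Number of markers expected to have a nonzero effect given pi."""
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must be a proportion")
    return (1.0 - pi) * n_markers


if __name__ == "__main__":
    # Smaller pi -> more markers with effects -> a more polygenic (complex) trait
    for trait, pi in sorted(MEAN_PI.items(), key=lambda kv: kv[1]):
        print(f"{trait}: ~{round(expected_nonzero(pi))} of {N_MARKERS} markers")
```

This gives roughly 640, 541, 462, and 364 markers with nonzero effects for yield, DON concentration, FHB resistance, and plant height, respectively, which matches the complexity ranking in the text.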
Prediction Accuracy
For the four traits investigated, all prediction models performed similarly with respect to prediction accuracy (Supplemental Fig. S1). When we averaged prediction accuracy across the five progeny set years, we found no significant differences among the models (Supplemental Table S2). There was a strong correlation among the four models for the predictions of yield for the combined set of progeny when the parent set was used as the training population (Fig. 8). Consistent with other GS studies, RR-BLUP, in which all marker effects are sampled from the same distribution and shrunk equally toward zero, performed similarly to models that do not impose that restriction. Further comparisons of prediction accuracy are therefore based on RR-BLUP.
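The RR-BLUP idea described above, with all marker effects shrunk equally toward zero, amounts to ridge regression of phenotypes on marker codes with a single shrinkage parameter λ = σ²e/σ²u. The pure-Python sketch below solves (Z′Z + λI)u = Z′(y − ȳ) after centering the marker columns, which makes the intercept equal to the training mean. It is not the rrBLUP package (Endelman, 2011): λ is treated as known rather than estimated from variance components, and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Model = Tuple[float, List[float], List[float]]  # (training mean, marker means, marker effects)


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rr_blup_fit(z: Matrix, y: Sequence[float], lam: float) -> Model:
    """Ridge regression of y on marker codes z (lines x markers) with penalty lam."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    n, p = len(z), len(z[0])
    z_means = [sum(row[j] for row in z) / n for j in range(p)]
    y_mean = sum(y) / n
    zc = [[row[j] - z_means[j] for j in range(p)] for row in z]  # centered markers
    yc = [v - y_mean for v in y]
    lhs = [[sum(zc[i][j] * zc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    rhs = [sum(zc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return y_mean, z_means, solve(lhs, rhs)


def rr_blup_predict(model: Model, z_new: Matrix) -> List[float]:
    """Predicted values for new lines; subtract the training mean to get GEBVs."""
    y_mean, z_means, effects = model
    return [y_mean + sum((row[j] - z_means[j]) * u for j, u in enumerate(effects))
            for row in z_new]
```

Because every marker receives the same penalty λ, all effects are shrunk by the same rule, in contrast to Bayes Cπ, which allows a proportion π of effects to be exactly zero. Solving a p × p system this way is only practical for toy examples.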
Another important consideration for prediction accuracy is whether to generate new phenotypic data for model training or to use existing data sets. We estimated marker effects for yield in the parent set using available historical data from the breeding program and compared them with estimates obtained using the contemporary data set (Fig. 9). The average prediction accuracies over the 5 yr based on contemporary (0.57) and historic (0.42) data were not significantly different (p-value = 0.38). Combining historic and contemporary data gave the same accuracy as using the contemporary data alone.
In general, when the parent set was used to predict the progeny sets, accuracy was highest for plant height and lowest for yield (Fig. 10). Fusarium head blight resistance and DON concentration had similar prediction accuracy. The relationship between accuracy and the year of the progeny set also differed between traits (Fig. 10). For yield and plant height, prediction accuracy fluctuated over time, whereas for DON concentration there was an overall decrease. Accuracy for FHB resistance remained relatively constant across years.
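Per the Fig. 10 caption, prediction accuracy is reported as ra/H, where ra is the predictive ability (the correlation between genomic estimated breeding values and observed performance) and H is the square root of the heritability of the validation set. A minimal sketch of that scaling follows; inputs are hypothetical, and because H² is itself an estimate, the ratio is not strictly bounded by 1.

```python
import math
from typing import Sequence


def predictive_ability(gebv: Sequence[float], observed: Sequence[float]) -> float:
    """ra: Pearson correlation between GEBVs and observed line performance."""
    n = len(gebv)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mg, mo = sum(gebv) / n, sum(observed) / n
    sgo = sum((g - mg) * (o - mo) for g, o in zip(gebv, observed))
    sgg = sum((g - mg) ** 2 for g in gebv)
    soo = sum((o - mo) ** 2 for o in observed)
    if sgg == 0 or soo == 0:
        raise ValueError("correlation is undefined for constant input")
    return sgo / math.sqrt(sgg * soo)


def prediction_accuracy(gebv: Sequence[float], observed: Sequence[float], h2: float) -> float:
    """ra / H, where H = sqrt(h2) is taken from the validation set."""
    if h2 <= 0:
        raise ValueError("heritability must be positive")
    return predictive_ability(gebv, observed) / math.sqrt(h2)
```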
Varying the size of the training population by adding one or more progeny sets to the parent set to predict a later progeny set generally showed the same trend observed for the parent set alone and, in several instances, resulted in reduced accuracy. Training on the breeding lines and environments closest in time to the test population, that is, on the two progeny sets preceding the validation year, was generally less accurate than using the parent set. The general trend was that higher trait H² in the parent training population corresponded to higher predictive ability (ra) when the parents were used to predict all the progeny sets with RR-BLUP (Fig. 11).

Figure 4. Heatmap displaying the similarity (kinship) matrix calculated using marker data for the parents and all progeny sets.
Because adding consecutive sets of lines to the parent set changed both the composition and the size of the training set, we examined the effect of population size on prediction accuracy using the parent set only and using the parent set combined with the progeny sets, with training populations of different sizes drawn at random. In both cases, accuracy increased with population size for DON concentration and yield (Fig. 12). However, prediction accuracy for DON concentration appeared to plateau at a population size of 75, whereas accuracy for yield did not appear to plateau. Interestingly, random sampling from the parent set alone often produced higher prediction accuracies than random sampling from the combined parent and progeny sets at the same training population size.
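The random-sampling design used to separate training population size from composition (Fig. 12) can be sketched as a loop over target sizes that draws subsets of lines without replacement and scores each one. The `evaluate` callback is a hypothetical placeholder for fitting a model (e.g., RR-BLUP) on the subset and returning its accuracy on a fixed validation set; the 500 replicates follow the Fig. 12 caption, and summarizing each size by the mean is an assumption.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, TypeVar

Line = TypeVar("Line")


def size_accuracy_curve(candidates: Sequence[Line], sizes: Sequence[int],
                        evaluate: Callable[[List[Line]], float],
                        n_reps: int = 500, seed: int = 2015) -> Dict[int, float]:
    """Mean accuracy over `n_reps` random training subsets at each target size.

    `candidates` is the pool to sample from: e.g., parent-set lines only, or
    parent lines plus the progeny sets preceding the validation year.
    """
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    pool = list(candidates)
    curve: Dict[int, float] = {}
    for size in sizes:
        if not 0 < size <= len(pool):
            raise ValueError(f"size {size} outside 1..{len(pool)}")
        scores = [evaluate(rng.sample(pool, size)) for _ in range(n_reps)]
        curve[size] = statistics.mean(scores)
    return curve
```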
DISCUSSION
Successful implementation of GS will involve using improved genotyping technology to shorten the breeding cycle, together with effective modeling that predicts breeding values accurately enough to allow increased selection intensity. Prior studies have examined factors that affect prediction accuracy through simulation and empirically through cross-validation (e.g., Daetwyler et al., 2010; Lorenz et al., 2012). To assess prediction accuracy in a more realistic context, we used sets of parents and progenies from an active breeding program as training and validation sets. Because breeding populations are dynamic, we tested progeny sets defined chronologically over a 5-yr period. We found that prediction accuracy varied over time and that simply adding data from breeding progenies to the training population did not improve, and often reduced, prediction accuracy. This suggests that careful construction of training populations is warranted. We considered the relationship between the training and validation populations with regard to genetic distance and differences in LD and allele frequencies. In this breeding population, all of these factors changed over time, and none of them individually could completely account for differences in prediction accuracy. However, the data support their role in affecting prediction accuracy and suggest they should be taken into account when designing training data sets and developing strategies for retraining models over time to maintain acceptable levels of accuracy.

Figure 5. Manhattan plot displaying significance levels for association mapping of deoxynivalenol (DON) concentration, Fusarium head blight (FHB) resistance, yield, and plant height in the contemporary parent data set. The relaxed threshold of −log10(p) = 1.3, which corresponds to a p-value of 0.05, is shown as a horizontal line.
Prediction Accuracy Can be Affected by Changes in Breeding Populations Over Time
Breeding populations are dynamic, and approaches to using prediction methods should therefore be informed by changes in prediction accuracy that may occur over time. Breeding value predictions are influenced by allele frequencies, LD level, and the introgression of new alleles. These factors will change over breeding cycles due to selection, genetic drift, and unequal parental contribution to progenies. We investigated prediction accuracy in validation sets of breeding lines over a 5-yr period and observed, depending on the trait, either little to no change or a substantial decrease in prediction accuracy over time. To better understand the underlying population parameters that could be affecting prediction accuracy, we compared the parental training population with the progeny validation sets with respect to allele frequencies, parental contribution, genetic distance, and LD.
More than 35% of markers that were segregating in the parent set became fixed in the 2010 progeny set. Gradual increases in allele fixation for trait-specific markers are an indication of the effect of selection and/or genetic drift. A previous study of the University of Minnesota barley breeding program showed that a reduction in allelic diversity for specific simple sequence repeat markers was in some cases associated with QTL regions for traits that were under selection (Condón et al., 2008). Once a trait-associated marker that is segregating in the training population becomes fixed in subsequent progeny generations, it loses its predictive value for the purpose of selection. We observed a substantial increase in the number of fixed SNPs associated with DON that corresponded to a reduction in prediction accuracy. However, we saw a similar increase in fixed SNPs for yield and no corresponding reduction in accuracy. One possible explanation is that yield is likely conditioned by a greater number of QTL with smaller effects; therefore, an increase in the number of SNPs that become fixed over time would have less of an effect on prediction accuracy.
Prediction accuracy should be greatest when the training population is more closely related to the validation population (Habier et al., 2007; Hayes et al., 2009; Lorenz et al., 2012). In most cases, we observed that the 2006 progeny set, which was genetically more similar to the parent set and had the largest number of direct parents from it, was predicted with the greatest accuracy. The increase in genetic distance between the parental and progeny sets was most closely associated with a decline in prediction accuracy for DON, but not for FHB resistance, yield, or plant height. This indicates that other factors may also contribute to changes in prediction accuracy over time.
Populations can differ in the degree of LD between markers and QTL due to drift, selection, and/or recombination (Dekkers, 2004; Barton and Otto, 2005). Prediction accuracy should increase as LD between markers and QTL increases. Recombination in breeding populations should reduce LD between markers and QTL over time, whereas selection and genetic drift should increase it (Pfaffelhuber et al., 2008). Habier et al. (2007) studied the effect of LD on prediction accuracy over many generations and found that a decrease in prediction accuracy was associated with the decay of LD. We found a general increase in adjacent-marker LD in the progeny sets over time, while prediction accuracy generally remained constant or decreased. We also examined the persistence of adjacent-marker LD between the parent set and each of the progeny sets using the correlation of r (de Roos et al., 2008; Toosi et al., 2010). The correlation of r did not decay over the time window of this experiment, even though the genetic distance between the parents and the progeny sets increased. Asoro et al. (2011) suggested that the ability of early generations to predict later generations was due to the persistence of LD phase between early and late generations. Thus, even if validation populations become more genetically distant from training populations, prediction accuracy will be maintained as long as the LD phase is consistent.
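The persistence-of-phase measure discussed above can be sketched in two steps: compute signed r between each pair of adjacent markers within each population (here, the Pearson correlation between marker codes across lines), then correlate the two vectors of r values. The sketch assumes the same allele coding in both populations, inbred lines coded 0/2 with no missing data, and markers in map order; pairs involving a monomorphic marker in either population are skipped. Mean r² is included as one common summary of adjacent-marker LD, although the exact LD statistic used in this study is not given in this section.

```python
import math
from typing import List, Optional, Sequence

Genotypes = Sequence[Sequence[float]]  # rows = lines, columns = markers in map order


def corr(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation, or None if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def adjacent_r(geno: Genotypes) -> List[Optional[float]]:
    """Signed r between markers j and j + 1, computed across lines."""
    cols = list(zip(*geno))  # transpose to markers x lines
    return [corr(cols[j], cols[j + 1]) for j in range(len(cols) - 1)]


def mean_adjacent_r2(geno: Genotypes) -> float:
    """Average r2 over informative adjacent marker pairs."""
    rs = [r for r in adjacent_r(geno) if r is not None]
    if not rs:
        raise ValueError("no informative adjacent marker pairs")
    return sum(r * r for r in rs) / len(rs)


def persistence_of_phase(pop_a: Genotypes, pop_b: Genotypes) -> Optional[float]:
    """Correlation of adjacent-marker r between two populations."""
    pairs = [(a, b) for a, b in zip(adjacent_r(pop_a), adjacent_r(pop_b))
             if a is not None and b is not None]
    if len(pairs) < 2:
        return None
    return corr([a for a, _ in pairs], [b for _, b in pairs])
```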
Figure 6. Percentage of single nucleotide polymorphisms (SNPs)
that are fixed in the complete marker set (genome-wide) and in
the subsets of markers associated with deoxynivalenol (DON)
concentration, Fusarium head blight (FHB) resistance, yield, and
plant height (see Fig. 5) in each of the five progeny sets between
2006 and 2010.
How do Trait and Population Characteristics Affect Prediction Accuracy?
Ideally, GS can be applied to traits that vary in H² and genetic architecture. In our study, based on estimates of R² and the π parameter, yield was the most complex trait, while plant height was the least complex. However, inference of genetic architecture based on π should be interpreted cautiously (Gianola, 2013). We found that yield, a more complex and lower-H² trait, generally had lower prediction accuracy than a simpler, higher-H² trait such as plant height. This is consistent with other studies in which complex traits controlled by many loci with small effects produced lower prediction accuracy than less complex traits (Hayes et al., 2010). Genomic predictions should be more accurate for traits with higher H² (Hayes et al., 2009; Daetwyler et al., 2010; Lorenz, 2013; Combs and Bernardo, 2013). Prediction accuracy for yield in the current study was higher than that observed for yield in oat (Avena sativa L.; Asoro et al., 2011). Those authors reasoned that the lower accuracy for oat yield was due to evaluation of their germplasm in a wider range of environments, which reduced the genetic variance relative to the G × E variance and thereby reduced H² for yield. The barley germplasm in the current study was evaluated in more homogeneous environments, which are the target production and evaluation environments in Minnesota. Therefore, the genetic variance is expected to be higher relative to G × E, leading to a higher H² estimate and increased prediction accuracy.
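The expectation that higher H² yields higher accuracy can be illustrated with a widely used deterministic approximation from Daetwyler and colleagues, r ≈ √(N·h² / (N·h² + Me)), where N is the training population size and Me is the number of independent chromosome segments. This formula was not used in the present study; Me = 100 below is an arbitrary, hypothetical value, and N = 168 is the parent-set size given in the Fig. 12 caption.

```python
import math


def expected_accuracy(n_train: int, h2: float, m_e: float) -> float:
    """Deterministic approximation sqrt(N*h2 / (N*h2 + Me)) of genomic prediction accuracy."""
    if n_train <= 0 or not 0 < h2 <= 1 or m_e <= 0:
        raise ValueError("need N > 0, 0 < h2 <= 1, and Me > 0")
    signal = n_train * h2
    return math.sqrt(signal / (signal + m_e))


if __name__ == "__main__":
    for h2 in (0.3, 0.5, 0.8):  # hypothetical heritabilities
        print(h2, round(expected_accuracy(168, h2, m_e=100), 2))
```

Under this approximation, accuracy rises with h² and, for a fixed h², rises with N at a diminishing rate; it is a simplified model and does not capture the structure effects discussed below.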
In addition to the characteristics of the trait, H², and genetic variance, LD (as discussed above) and population structure can affect prediction accuracy. These factors could have contributed to the striking difference in the response of accuracy to increased training population size for DON versus yield (Fig. 12). Both traits have higher than expected accuracies at very small training population sizes (e.g., 25 individuals). Windhausen et al. (2012) suggested that high accuracy at small training population size can be diagnostic of subpopulation structure affecting accuracy. In this context, we suggest that structure could reduce accuracy at large training population sizes through the following mechanism. Population structure is a cause of LD: two loci that both differ in allele frequency across subpopulations will be in LD. Thus, structure can cause association between a marker and several QTL. This phenomenon has been an ongoing issue in genome-wide association studies (e.g., Pritchard et al., 2000). In the context of genomic prediction, structure-generated disequilibrium between a marker and several QTL will prevent the marker's estimated effect from converging on the effect of a QTL to which it is actually linked, regardless of the training population size. Though they did not comment on it, Wimmer et al. (2013) observed a phenomenon like this: in their Fig. 6A, at low model complexity, the error of marker effect estimates increases as training population size increases. This increase in error presumably arises because of the documented deep structure in rice (Zhao et al., 2011). The question remains as to why this mechanism would affect DON more strongly than yield. We hypothesize that structure in the Minnesota barley breeding program is more strongly correlated with DON than with yield, given that the program has been purposefully split into a population in which FHB resistance was prioritized and one in which yield and quality continued to be prioritized (Fang et al., 2013).

Figure 7. Distribution of marker R² values for plant height, deoxynivalenol (DON) concentration, Fusarium head blight (FHB) resistance, and yield. R² is the proportion of genetic variance explained by a marker.
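The statement that population structure alone generates LD can be checked with a small worked example. Assume two equally sized subpopulations, each in linkage equilibrium, in which two unlinked loci both have allele frequencies of 0.9 in one subpopulation and 0.1 in the other. Pooling them gives D = P(AB) − pA·pB = 0.41 − 0.25 = 0.16 and r = D/√(pA(1−pA)pB(1−pB)) = 0.64, even though r = 0 within each subpopulation. The sketch below generalizes that calculation; the frequencies are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pooled_r(subpops: Sequence[Tuple[float, float, float]]) -> float:
    """r between loci A and B after pooling subpopulations.

    subpops: (weight, freq of allele A, freq of allele B) per subpopulation; weights sum to 1.
    Each subpopulation is assumed to be in linkage equilibrium, so P(AB) = pA * pB within it.
    """
    if abs(sum(w for w, _, _ in subpops) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    p_a = sum(w * pa for w, pa, _ in subpops)
    p_b = sum(w * pb for w, _, pb in subpops)
    p_ab = sum(w * pa * pb for w, pa, pb in subpops)
    d = p_ab - p_a * p_b  # pooled disequilibrium coefficient D
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return d / denom if denom > 0 else 0.0


if __name__ == "__main__":
    print(round(pooled_r([(0.5, 0.9, 0.9), (0.5, 0.1, 0.1)]), 2))  # 0.64
```

If only one locus differs in frequency across subpopulations, the pooled r returns to zero, which is why the mechanism requires both loci to differ.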
The availability of genome-wide markers can improve our understanding of genetic architecture and the extent to which epistasis influences complex traits. In general, the four genomic prediction models tested produced similar accuracies across the four traits investigated in this study. The four models differed in their assumptions about the genetic architecture of the trait and the extent to which nonadditive effects contribute to the prediction. Lande and Thompson (1990) suggested the use of epistatic effects in addition to additive effects in MAS schemes. Liu et al. (2003) found that including epistasis improved both the response and the efficiency of MAS. In some studies, including epistasis in genomic prediction models through nonadditive kernels increased prediction accuracy over RR-BLUP (Crossa et al., 2010; Wang et al., 2012). In our study, simple additive models (RR-BLUP and Bayes Cπ) performed similarly to those that account for both additive and nonadditive effects (Exponential and Gaussian kernels). These results are similar to those of a recent study of barley breeding lines evaluated for DON concentration and FHB resistance, in which Bayes Cπ and RR-BLUP produced the same level of accuracy (Lorenz et al., 2012).
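For readers less familiar with the nonadditive models compared here, the sketch below builds Gaussian and exponential kernel (similarity) matrices from Euclidean distances between lines' marker profiles, using the parameterizations K = exp(−(d/θ)²) and K = exp(−d/θ). These forms and the bandwidth θ are assumptions for illustration; how distances were scaled and how θ was chosen in this study are not described in this section. A matrix of this kind takes the place of the additive relationship matrix in a kernel-based BLUP.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def euclidean_distances(markers: Sequence[Sequence[float]]) -> Matrix:
    """Pairwise Euclidean distances between lines (rows of marker codes)."""
    n = len(markers)
    return [[math.dist(markers[i], markers[j]) for j in range(n)] for i in range(n)]


def gaussian_kernel(markers: Sequence[Sequence[float]], theta: float) -> Matrix:
    """K_ij = exp(-(d_ij / theta)^2); similarity decays quickly with distance."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return [[math.exp(-(d / theta) ** 2) for d in row] for row in euclidean_distances(markers)]


def exponential_kernel(markers: Sequence[Sequence[float]], theta: float) -> Matrix:
    """K_ij = exp(-d_ij / theta); heavier tail than the Gaussian kernel."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return [[math.exp(-d / theta) for d in row] for row in euclidean_distances(markers)]
```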
Practical Implications for Breeding
The increasing ease and rapidly declining cost of genotyping mean that assembling phenotype data to train prediction models will be the limiting step in implementing GS. We found that using contemporary parent data was slightly, but not significantly, better than using historic parent data to train a prediction model. The contemporary data were balanced, and we corrected for field spatial variability using the common checks, whereas the historic data were unbalanced and no correction for field variability was made. Nevertheless, the prediction accuracy from historic data was, in most cases, around 0.50 for each of the 5 yr; this level of accuracy suggests that GS would be effective if the breeding cycle time were half that of phenotypic selection (Asoro et al., 2011). These results suggest that breeders could reduce time and costs by using unbalanced historical data, after proper adjustment for spatial variability and trial effects, to train prediction models. Historical unbalanced phenotypic data were also used to assess GS in oat (Asoro et al., 2011). Initiating GS with existing data and later incorporating contemporary data sets should allow breeders to realize the benefits of GS sooner and to improve its effectiveness over time.
The size and composition of the training population are important factors for manipulating prediction accuracy. Breeders may consider combining training data sets to maximize the use of available phenotypic and genotypic information and to generate larger populations (Hayes et al., 2009; de Roos et al., 2009; Asoro et al., 2011; Lorenz et al., 2012; Technow et al., 2013). Lorenz et al. (2012) found little to no improvement in prediction accuracy for FHB resistance and DON concentration when increasing the size of the training population by combining different barley breeding populations. Conversely, in maize, combining the flint and dent heterotic groups increased prediction accuracies for Northern corn leaf blight resistance by 10 and 13% when predicting the dent and flint heterotic groups, respectively (Technow et al., 2013). In our study, we found only a slight improvement or a reduction in accuracy when increasing the population size by adding progeny sets from the same breeding program to the parent set. However, adding progeny to the parent set altered both the size and the composition of the training populations. Therefore, we separated these two factors by generating training populations through random sampling from the combined data set. Interestingly, the prediction accuracy for DON concentration plateaued at a much smaller population size than that for yield (Fig. 12). Prediction accuracy for yield did not level off, suggesting that the benefit of increasing training population size may depend on the trait.

Figure 8. Scatterplot matrix for all prediction models for yield when using the contemporary parent data set as the training population to predict all progeny sets (2006–2010), using ridge regression best linear unbiased prediction (RR-BLUP), the Gaussian kernel model (GAUSS), the Exponential kernel model (EXP), and Bayes Cπ.

Figure 9. Prediction accuracy for yield using historic, contemporary, and combined (historic and contemporary) parent data to predict five progeny sets using ridge regression best linear unbiased prediction.
In addition to optimizing prediction accuracy, the effectiveness of GS will increase by shortening the breeding cycle time and reducing the cost of selection (Heffner et al., 2010; Jannink et al., 2010). In the University of Minnesota barley breeding program, GS is implemented at the F3 stage for FHB resistance, DON concentration, and yield. This is 1 yr after crossing parents, compared with the 4-yr breeding cycle that is typical for phenotypic selection. The prediction accuracies that we observed based on progeny validation always exceeded 0.25, indicating that GS should exceed phenotypic selection in gain per unit time. Combined with rapidly decreasing genotyping costs, this suggests that GS should improve breeding efficiency substantially.
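The reasoning that accuracies above 0.25 should let GS outperform phenotypic selection per unit time can be made explicit with the breeder's equation expressed per year, R/L = i·r·σA/L. Assuming equal selection intensity i and genetic standard deviation σA in both schemes, and taking the accuracy of phenotypic selection as H = √H², the ratio of gain per year is (r_GS/L_GS)/(H/L_PS). With L_GS = 1 yr and L_PS = 4 yr, GS wins whenever r_GS > H/4, and because H ≤ 1, any r_GS above 0.25 suffices. The sketch encodes only this simplified comparison; it ignores differences in selection intensity and cost, as well as the loss of genetic variance under selection.

```python
import math


def gain_per_year_ratio(acc_gs: float, cycle_gs: float, h2: float, cycle_ps: float) -> float:
    """Ratio of genetic gain per year, GS over phenotypic selection (PS).

    Uses R/L = i * accuracy * sigma_A / L with i and sigma_A assumed equal in both
    schemes, so they cancel; PS accuracy is taken as sqrt(h2).
    """
    if min(cycle_gs, cycle_ps) <= 0 or not 0 < h2 <= 1:
        raise ValueError("cycle lengths must be positive and 0 < h2 <= 1")
    return (acc_gs / cycle_gs) / (math.sqrt(h2) / cycle_ps)


if __name__ == "__main__":
    # Worst case for GS: PS accuracy of 1 (h2 = 1) and GS accuracy at the 0.25 bound
    print(gain_per_year_ratio(0.25, 1, 1.0, 4))  # 1.0 -> break-even
```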
Supplemental Information Available
Supplemental information is included with this article.
Supplemental Figure S1. Prediction accuracy for (A) DON accumulation, (B) FHB resistance, (C) yield, and (D) plant height using RR-BLUP, the Exponential kernel method (EXP), the Gaussian kernel method (GAUSS), and Bayes Cπ when using the parent set as a training population to predict the five progeny sets.
Supplemental Table S1. Number of experimental trials for the parent and five progeny sets for deoxynivalenol (DON) accumulation, Fusarium head blight (FHB) resistance, yield, and plant height. Each line was replicated twice in each experiment.
Figure 10. Prediction accuracy (ra/H, where ra is predictive ability and H the square root of heritability) for the four traits using ridge
regression best linear unbiased prediction in three scenarios for training populations: using the contemporary parent data set to predict
each progeny set (circle), using the sequential addition of progeny sets to the contemporary parent data set to predict the later progeny
set (triangle), and using the two previous years of the progeny sets to predict the later progeny set (square). The heritability for each
progeny set used as the validation set is shown in the solid bar.
Supplemental Table S2. Mean prediction accuracy and p-values when using the parent set as the training population and the five progeny sets as the validation populations for deoxynivalenol (DON) accumulation, Fusarium head blight (FHB) resistance, yield, and plant height using RR-BLUP, the Exponential kernel method, the Gaussian kernel method, and Bayes Cπ.
Acknowledgments
We thank Ed Schiefelbein, Guillermo Velasquez, Karen Beaubien, Richard Horsley, the Minnesota Agricultural Experiment Station NW and WC Research and Outreach Centers, and the University of Minnesota Small Grains Pathology Lab for their contributions to conducting field trials and collecting data. In addition, we thank Shiaoman Chao and Yanhong Dong for SNP genotyping and toxin analysis, respectively. Additional thanks to Yang Da for helpful suggestions on an earlier draft of this manuscript. This work was supported by grants from the National Institute of Food and Agriculture USDA Award Number 2009-65300-05661, the U.S. Wheat and Barley Scab Initiative USDA–ARS Agreement No. 59-0206-9-072, USDA HATCH project MIN-13-030, and the Rahr Foundation. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the view of the USDA.
Figure 12. Relationship between population size and prediction accuracy for deoxynivalenol (DON) concentration and yield. Three scenarios are presented: (A) using the contemporary parent data set as the training population to predict the 2008 and 2010 progeny sets, where each point represents a subset of the training population obtained by random sampling, repeated 500 times; (B) using the combined contemporary parent data set and the progeny sets preceding the validation set to predict the 2008 and 2010 sets, where each point again represents a subset obtained by random sampling, repeated 500 times; and (C) sequentially adding progeny sets to the contemporary parent data set as the training population to predict the 2008 and 2010 progeny sets. The training population sizes are 168 (parent set), 264 (parent set + 2006, n = 96), and so on.
Figure 11. Relationship between the predictive ability (correla-
tion between genomic estimated breeding value and phenotypic
performance) when using a contemporary parent data set to
predict progeny sets using ridge regression best linear unbiased
prediction (RR-BLUP), and heritability of the contemporary parent
training population for plant height, deoxynivalenol (DON) con-
centration, Fusarium head blight (FHB) resistance, and yield.
References
Asoro, F.G., M.A. Newell, W.D. Beavis, M.P. Scott, and J.-L. Jannink. 2011. Accuracy and training population design for genomic selection on quantitative traits in elite North American oats. Plant Gen. 4:132–144. doi:10.3835/plantgenome2011.02.0007
Barrett, J.C., J. Maller, and M.J. Daly. 2005. Haploview: Analysis and visualization of LD and haplotype maps. Bioinformatics 21:263–265. doi:10.1093/bioinformatics/bth457
Barton, N.H., and S.P. Otto. 2005. Evolution of recombination due to random drift. Genetics 169:2353–2370. doi:10.1534/genetics.104.032821
Bernardo, R. 2008. Molecular markers and selection for complex traits in plants: Learning from the last 20 years. Crop Sci. 48:1649–1664. doi:10.2135/cropsci2008.03.0131
Bernardo, R. 2010. Breeding for quantitative traits in plants. Stemma Press, Woodbury, MN.
Blake, V.C., J.G. Kling, P.M. Hayes, and J.L. Jannink. 2012. The Hordeum Toolbox: The Barley Coordinated Agricultural Project genotype and phenotype resource. Plant Gen. 5:81–91. doi:10.3835/plantgenome2012.03.0002
Boukerrou, L., and D. Rasmusson. 1990. Breeding for high biomass yield in spring barley. Crop Sci. 30:31–35. doi:10.2135/cropsci1990.0011183X003000010007x
Castro, A.J., F. Capettini, A.E. Corey, T. Filichkina, P.M. Hayes, A. Kleinhofs, D. Kudrna, K. Richardson, S. Sandoval-Islas, C. Rossi, and H. Vivar. 2003. Mapping and pyramiding of qualitative and quantitative resistance to stripe rust in barley. Theor. Appl. Genet. 107:922–930. doi:10.1007/s00122-003-1329-6
Chen, C.Y., I. Misztal, I. Aguilar, S. Tsuruta, T.H.E. Meuwissen, S.E. Aggrey, T. Wing, and W.M. Muir. 2011. Genome-wide marker-assisted selection combining all pedigree phenotypic information with genotypic data in one step: An example using broiler chickens. J. Anim. Sci. 89:23–28. doi:10.2527/jas.2010-3071
Close, T.J., P.R. Bhat, S. Lonardi, Y. Wu, N. Rostoks, L. Ramsay, A. Druka, N. Stein, J.T. Svensson, S. Wanamaker, S. Bozdag, M.L. Roose, M.J. Moscou, S. Chao, R.K. Varshney, P. Szuecs, K. Sato, P.M. Hayes, D.E. Matthews, A. Kleinhofs, G.J. Muehlbauer, J. DeYoung, D.F. Marshall, K. Madishetty, R.D. Fenton, P. Condamine, A. Graner, and R. Waugh. 2009. Development and implementation of high-throughput SNP genotyping in barley. BMC Genomics 10:582. doi:10.1186/1471-2164-10-582
Collins, H.M., J.F. Panozzo, S.J. Logue, S.P. Jefferies, and A.R. Barr. 2003. Mapping and validation of chromosome regions associated with high malt extract in barley (Hordeum vulgare L.). Aust. J. Agric. Res. 54:1223–1240. doi:10.1071/AR02201
Combs, E., and R. Bernardo. 2013. Accuracy of genomewide selection for different traits with constant population size, heritability, and number of markers. Plant Gen. 6:1–7. doi:10.3835/plantgenome2012.11.0030
Condón, F., C. Gustus, D.C. Rasmusson, and K.P. Smith. 2008. Effect of advanced cycle breeding on genetic diversity in barley breeding germplasm. Crop Sci. 48:1027–1036. doi:10.2135/cropsci2007.07.0415
Crossa, J., G. de los Campos, P. Perez, D. Gianola, J. Burgueno, J. Luis Araus, D. Makumbi, R.P. Singh, S. Dreisigacker, J. Yan, V. Arief, M. Banziger, and H. Braun. 2010. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186:713–724. doi:10.1534/genetics.110.118521
Daetwyler, H.D., R. Pong-Wong, B. Villanueva, and J.A. Woolliams. 2010. The impact of genetic architecture on genome-wide evaluation methods. Genetics 185:1021–1031. doi:10.1534/genetics.110.116855
Dekkers, J.C.M. 2004. Commercial application of marker- and gene-assisted selection in livestock: Strategies and lessons. J. Anim. Sci. 82:E313–E328.
de los Campos, G., D. Gianola, and G.J.M. Rosa. 2009. Reproducing kernel Hilbert spaces regression: A general framework for genetic evaluation. J. Anim. Sci. 87:1883–1887. doi:10.2527/jas.2008-1259
de Roos, A.P.W., B.J. Hayes, and M.E. Goddard. 2009. Reliability of genomic predictions across multiple populations. Genetics 183:1545–1553. doi:10.1534/genetics.109.104935
de Roos, A.P.W., B.J. Hayes, R.J. Spelman, and M.E. Goddard. 2008. Linkage disequilibrium and persistence of phase in Holstein–Friesian, Jersey and Angus cattle. Genetics 179:1503–1512. doi:10.1534/genetics.107.084301
Endelman, J.B. 2011. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Gen. 4:250–255. doi:10.3835/plantgenome2011.08.0024
Fang, Z., A. Eule-Nashoba, C. Powers, T.Y. Kono, S. Takuno, P.L. Morrell, and K.P. Smith. 2013. Comparative analyses identify the contributions of exotic donors to disease resistance in a barley experimental population. G3: Genes Genomes Genet. 3:1945–1953. doi:10.1534/g3.113.007294
Gianola, D. 2013. Priors in whole-genome regression: e Bayesian alphabet
returns. Genetic 194:573–596. doi:10.1534 /genetics.113.151753
Gianola, D., a nd J.B.C.H.M. van Kaa m. 2008. Reproducing kernel Hilbert spaces
regression methods for genomic assisted prediction of quantitative traits.
Genetics 178:2289–2303. doi:10.1534/gene tics.107.084285
Goddard, M.E., and B.J. Hayes. 2007. Genomic selection. J. Anim. Breed. Genet. 124:323–330. doi:10.1111/j.1439-0388.2007.00702.x
Habier, D., R.L. Fernando, and J.C.M. Dekkers. 2007. The impact of genetic relationship information on genome-assisted breeding values. Genetics 177:2389–2397.
Hayes, B.J., P.J. Bowman, A.C. Chamberlain, and M.E. Goddard. 2009. Genomic selection in dairy cattle: Progress and challenges. J. Dairy Sci. 92:433–443. doi:10.3168/jds.2008-1646
Hayes, B.J., J. Pryce, A.J. Chamberlain, P.J. Bowman, and M.E. Goddard. 2010. Genetic architecture of complex traits and accuracy of genomic prediction: Coat colour, milk-fat percentage and type in Holstein cattle as contrasting model traits. PLoS Genet. 6:E1001139. doi:10.1371/journal.pgen.1001139
Heffner, E.L., J.-L. Jannink, and M.E. Sorrells. 2011. Genomic selection accuracy using multifamily prediction models in a wheat breeding program. Plant Gen. 4:65–75. doi:10.3835/plantgenome.2010.12.0029
Heffner, E.L., A.J. Lorenz, J. Jannink, and M.E. Sorrells. 2010. Plant breeding with genomic selection: Gain per unit time and cost. Crop Sci. 50:1681–1690. doi:10.2135/cropsci2009.11.0662
Heffner, E.L., M.E. Sorrells, and J.-L. Jannink. 2009. Genomic selection for crop improvement. Crop Sci. 49:1–12. doi:10.2135/cropsci2008.08.0512
Hofheinz, N., D. Borchardt, K. Weissleder, and M. Frisch. 2012. Genome-based prediction of testcross performance in two subsequent breeding cycles. Theor. Appl. Genet. 125:1639–1645. doi:10.1007/s00122-012-1940-5
Horsley, R.D., J.D. Franckowiak, P.B. Schwarz, and S.M. Neate. 2006a. Registration of ‘Stellar-ND’ barley. Crop Sci. 46:980–981. doi:10.2135/cropsci2005.06-0121
Horsley, R.D., J.D. Franckowiak, P.B. Schwarz, and B.J. Steffenson. 2002. Registration of ‘Drummond’ barley. Crop Sci. 42:664–665. doi:10.2135/cropsci2002.0664
Horsley, R.D., D. Schmierer, C. Maier, D. Kudrna, C.A. Urrea, B.J. Steffenson, P.B. Schwarz, J.D. Franckowiak, M.J. Green, B. Zhang, and A. Kleinhofs. 2006b. Identification of QTLs associated with Fusarium head blight resistance in barley accession CIho 4196. Crop Sci. 46:145–156. doi:10.2135/cropsci2005.0247
Jannink, J.-L., A.J. Lorenz, and H. Iwata. 2010. Genomic selection in plant breeding: From theory to practice. Brief. Funct. Genomics 9:166–177. doi:10.1093/bfgp/elq001
Kang, H.M., N.A. Zaitlen, C.M. Wade, A. Kirby, D. Heckerman, M.J. Daly, and E. Eskin. 2008. Efficient control of population structure in model organism association mapping. Genetics 178:1709–1723. doi:10.1534/genetics.107.080101
Kizilkaya, K., R.L. Fernando, and D.J. Garrick. 2010. Genomic prediction of simulated multibreed and purebred performance using observed fifty thousand single nucleotide polymorphism genotypes. J. Anim. Sci. 88:544–551. doi:10.2527/jas.2009-2064
Lande, R., and R. Thompson. 1990. Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics 124:743–756.
Legarra, A., C. Robert-Granié, E. Manfredi, and J.M. Elsen. 2008. Performance of genomic selection in mice. Genetics 180:611–618. doi:10.1534/genetics.108.088575
Liu, P., J. Zhu, X. Lou, and Y. Lu. 2003. A method for marker-assisted selection based on QTLs with epistatic effects. Genetica 119:75–86.
Lorenz, A.J. 2013. Resource allocation for maximizing prediction accuracy and genetic gain of genomic selection in plant breeding: A simulation experiment. G3: Genes Genomes Genet. 3:481–491. doi:10.1534/g3.112.004911
Lorenz, A.J., S. Chao, F.G. Asoro, E.L. Heffner, T. Hayashi, H. Iwata, K.P. Smith, M.E. Sorrells, and J.-L. Jannink. 2011. Genomic selection in plant breeding: Knowledge and prospects. Adv. Agron. 110:77–123. doi:10.1016/B978-0-12-385531-2.00002-5
Lorenz, A.J., K.P. Smith, and J.-L. Jannink. 2012. Potential and optimization of genomic selection for Fusarium head blight resistance in six-row barley. Crop Sci. 52:1609–1621. doi:10.2135/cropsci2011.09.0503
Lorenzana, R.E., and R. Bernardo. 2009. Accuracy of genotypic value predictions for marker-based selection in biparental plant populations. Theor. Appl. Genet. 120:151–161. doi:10.1007/s00122-009-1166-3
Luan, T., J.A. Woolliams, S. Lien, M. Kent, M. Svendsen, and T.H. Meuwissen. 2009. The accuracy of genomic selection in Norwegian red cattle assessed by cross-validation. Genetics 183:1119–1126. doi:10.1534/genetics.109.107391
Ma, Z., B.J. Steffenson, L.K. Prom, and N.L.V. Lapitan. 2000. Mapping quantitative trait loci for Fusarium head blight resistance in barley. Phytopathology 90:1079–1088. doi:10.1094/PHYTO.2000.90.10.1079
Massman, J., B. Cooper, R. Horsley, S. Neate, R. Dill-Macky, S. Chao, Y. Dong, P. Schwarz, G.J. Muehlbauer, and K.P. Smith. 2011. Genome-wide association mapping of Fusarium head blight resistance in contemporary barley breeding germplasm. Mol. Breed. 27:439–454. doi:10.1007/s11032-010-9442-0
Mesfin, A., K.P. Smith, R. Waugh, R. Dill-Macky, C.K. Evans, C.D. Gustus, and G.J. Muehlbauer. 2003. Quantitative trait loci for Fusarium head blight resistance in barley detected in a two-rowed by six-rowed population. Crop Sci. 43:307–318. doi:10.2135/cropsci2003.3070
Meuwissen, T.H.E., B.J. Hayes, and M.E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829.
Mirocha, C.J., E. Kolaczkowski, W. Xie, H. Yu, and H. Jelen. 1998. Analysis of deoxynivalenol and its derivatives (batch and single kernel) using gas chromatography/mass spectrometry. J. Agric. Food Chem. 46:1414–1418. doi:10.1021/jf970857o
Nei, M. 1987. Molecular evolutionary genetics. Columbia Univ. Press, New York.
Pfaffelhuber, P., A. Lehnert, and W. Stephan. 2008. Linkage disequilibrium under genetic hitchhiking in finite populations. Genetics 179:527–537. doi:10.1534/genetics.107.081497
Piepho, H.P. 2009. Ridge regression and extensions for genomewide selection in maize. Crop Sci. 49:1165–1176. doi:10.2135/cropsci2008.10.0595
Poland, J., J. Endelman, J. Dawson, J. Rutkoski, S. Wu, Y. Manes, S. Dreisigacker, J. Crossa, H. Sánchez-Villeda, M. Sorrells, and J.-L. Jannink. 2012. Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Gen. 5:103–113. doi:10.3835/plantgenome2012.06.0006
Pritchard, J.K., M. Stephens, N.A. Rosenberg, and P. Donnelly. 2000. Association mapping in structured populations. Am. J. Hum. Genet. 67:170–181. doi:10.1086/302959
R Development Core Team. 2012. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Rasmusson, D.C., K.P. Smith, R. Dill-Macky, E.L. Schiefelbein, and J.V. Wiersma. 2001. Registration of ‘Lacey’ barley. Crop Sci. 41:1991. doi:10.2135/cropsci2001.1991
Rasmusson, D.C., and R.D. Wilcoxson. 1983. Registration of ‘Robust’ barley. Crop Sci. 23:1216. doi:10.2135/cropsci1983.0011183X002300060048x
Rasmusson, D.C., R.D. Wilcoxson, and J.V. Wiersma. 1993. Registration of ‘Stander’ barley. Crop Sci. 33:1403. doi:10.2135/cropsci1993.0011183X003300060057x
Rutkoski, J., J. Benson, Y. Jia, G. Brown-Guedira, J.-L. Jannink, and M. Sorrells. 2012. Evaluation of genomic prediction methods for Fusarium head blight resistance in wheat. Plant Gen. 5:51–61. doi:10.3835/plantgenome2012.02.0001
SAS Institute. 2011. The SAS system for Windows. v.9.3. SAS Inst., Cary, NC.
Slotta, T.A., L. Brady, and S. Chao. 2008. High throughput tissue preparation for large-scale genotyping experiments. Mol. Ecol. Resour. 8(1):83–87. doi:10.1111/j.1471-8286.2007.01907.x
Smith, K.P., A. Budde, R. Dill-Macky, D.C. Rasmusson, E. Schiefelbein, B. Steffenson, J.J. Wiersma, J.V. Wiersma, and B. Zhang. 2013. Registration of ‘Quest’ spring malting barley with improved resistance to Fusarium head blight. J. Plant Reg. 7:125–129. doi:10.3198/jpr2012.03.0200crc
Smith, K.P., D.C. Rasmusson, E. Schiefelbein, J.J. Wiersma, J.V. Wiersma, A. Budde, R. Dill-Macky, and B. Steffenson. 2010. Registration of ‘Rasmusson’ barley. J. Plant Reg. 4:164–167. doi:10.3198/jpr2009.10.0622crc
Steffenson, B.J. 2003. Fusarium head blight of barley: Impact, epidemics, management, and strategies for identifying and utilizing genetic resistance. In: K.J. Leonard and W.R. Bushnell, editors, Fusarium head blight of wheat and barley. APS Press, St. Paul, MN. p. 241–295.
Technow, F., A. Bürger, and A.E. Melchinger. 2013. Genomic prediction of northern corn leaf blight resistance in maize with combined or separated training sets for heterotic groups. G3: Genes Genomes Genet. 3:197–203. doi:10.1534/g3.112.004630
Toosi, A., R. Fernando, and J. Dekkers. 2010. Genomic selection in admixed and crossbred populations. J. Anim. Sci. 88:32–46. doi:10.2527/jas.2009-1975
Wang, D., S.I. El-Basyoni, S.P. Baenziger, J. Crossa, K.M. Eskridge, and I. Dweikat. 2012. Prediction of genetic values of quantitative traits with epistatic effects in plant breeding populations. Heredity 109:313–319. doi:10.1038/hdy.2012.44
Weir, B.S., and C.C. Cockerham. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38:1358–1370.
Wimmer, V., C. Lehermeier, T. Albrecht, H.-J. Auinger, Y. Wang, and C.-C. Schön. 2013. Genome-wide prediction of traits with different genetic architecture through efficient variable selection. Genetics 195:573–587. doi:10.1534/genetics.113.150078
Windhausen, V.S., G.A. Atlin, J.M. Hickey, J. Crossa, J.-L. Jannink, M.E. Sorrells, B. Raman, J.E. Cairns, A. Tarekegne, K. Semagn, Y. Beyene, P. Grudloyma, F. Technow, C. Riedelsheimer, and A. Melchinger. 2012. Effectiveness of genomic prediction of maize hybrid performance in different breeding populations and environments. G3: Genes Genomes Genet. 2:1427–1436. doi:10.1534/g3.112.003699
Wright, S. 1965. The interpretation of population structure by F-statistics with special regard to systems of mating. Evolution 19:395–420.
Xu, Y., and J.H. Crouch. 2008. Marker-assisted selection in plant breeding: From publications to practice. Crop Sci. 48:391–407. doi:10.2135/cropsci2007.04.0191
Zhao, Y., M. Gowda, W. Liu, T. Würschum, H.P. Maurer, F.H. Longin, N. Ranc, and J.C. Reif. 2012. Accuracy of genomic selection in European maize elite breeding populations. Theor. Appl. Genet. 124:769–776. doi:10.1007/s00122-011-1745-y
Zhao, K., C.-W. Tung, G.C. Eizenga, M.H. Wright, M.L. Ali, A.H. Price, G. Norton, M.R. Islam, A. Reynolds, J. Mezey, A.M. McClung, C.D. Bustamante, and S. McCouch. 2011. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat. Commun. 2:467. doi:10.1038/ncomms1467
Zhong, S., J.C.M. Dekkers, R.L. Fernando, and J.-L. Jannink. 2009. Factors affecting accuracy from genomic selection in populations derived from multiple inbred lines: A barley case study. Genetics 182:355–364. doi:10.1534/genetics.108.098277