The Plant Genome, March 2015, Vol. 8, No. 1
ORIGINAL RESEARCH
Assessing Genomic Selection Prediction Accuracy
in a Dynamic Barley Breeding Population
A. H. Sallam, J. B. Endelman, J.-L. Jannink, and K. P. Smith*
ABSTRACT
Prediction accuracy of genomic selection (GS) has been previously evaluated through simulation and cross-validation; however, validation based on progeny performance in a plant breeding program has not been investigated thoroughly. We evaluated several prediction models in a dynamic barley breeding population composed of 647 six-row lines using four traits differing in genetic architecture and 1536 single nucleotide polymorphism (SNP) markers. The breeding lines were divided into six sets, designated as one parent set and five consecutive progeny sets composed of representative samples of breeding lines over a 5-yr period. We used these data sets to investigate the effect of model and training population composition on prediction accuracy over time. We found little difference in prediction accuracy among the models, confirming prior studies that found the simplest model, random regression best linear unbiased prediction (RR-BLUP), to be accurate across a range of situations. In general, we found that using the parent set was sufficient to predict progeny sets, with little to no gain in accuracy from generating larger training populations by combining the parent set with subsequent progeny sets. The prediction accuracy ranged from 0.03 to 0.99 across the four traits and five progeny sets. We explored characteristics of the training and validation populations (marker allele frequency, population structure, and linkage disequilibrium, LD) as well as characteristics of the trait (genetic architecture and heritability, H²). Fixation of markers associated with a trait over time was most clearly associated with reduced prediction accuracy for the mycotoxin trait DON. Higher trait H² in the training population and simpler trait architecture were associated with greater prediction accuracy.
Published in The Plant Genome 8
doi: 10.3835/plantgenome2014.05.0020
© Crop Science Society of America
5585 Guilford Rd., Madison, WI 53711 USA
An open-access publication
All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher.
A.H. Sallam, Dep. of Agronomy and Plant Genetics, Univ. of Minnesota, St. Paul, MN 55108; J.B. Endelman, Dep. of Horticulture, Univ. of Wisconsin-Madison, 1575 Linden Dr., Madison, WI 53706; J.-L. Jannink, USDA-ARS, R.W. Holley Center for Agriculture and Health, Cornell Univ., Ithaca, NY 14853; K.P. Smith, Dep. of Agronomy and Plant Genetics, Univ. of Minnesota, St. Paul, MN 55108. Received 6 Jan. 2014. *Corresponding author (smith376@umn.edu).
Abbreviations: BLUEs, best linear unbiased estimations; DON, deoxynivalenol; EMMA, efficient mixed-model association; FHB, Fusarium head blight; Fst, Wright's fixation index; GS, genomic selection; H², heritability; GEBV, genomic estimated breeding value; LD, linkage disequilibrium; MAF, minor allele frequency; MAS, marker-assisted selection; ra, predictive ability; QTL, quantitative trait loci; REML, restricted maximum likelihood; RKHS, Reproducing Kernel Hilbert Space; RR-BLUP, random regression best linear unbiased prediction; SNP, single nucleotide polymorphism.
Published March 13, 2015
Genomic selection is touted as a marker-based breeding approach that complements traditional marker-assisted selection (MAS) and phenotypic selection. In traditional MAS, favorable alleles or genes for relatively simply inherited traits are mapped, and molecular markers linked to those alleles are then used to select individuals to use as parents or to advance from segregating breeding populations (Bernardo, 2008). Marker-assisted selection is more effective than phenotypic selection if the tagged loci account for a large portion of the total genetic variation within the population of selection candidates (Collins et al., 2003; Castro et al., 2003; Xu and Crouch, 2008). The limitation of traditional MAS for highly complex traits is that it captures only a small portion of the total genetic variation because it uses a limited number of selected markers (Lande and Thompson, 1990; Bernardo, 2010). Phenotypic selection is effective on quantitative traits but is limited to stages in breeding cycles and environments where such traits can be measured effectively, such as advanced lines in multiple-location field trials. Therefore, GS can be strategically implemented in breeding for quantitative traits at points in the breeding process where phenotypic selection is not feasible.
Genomic selection uses trait predictions based on estimates of all marker effects distributed across the genome (Meuwissen et al., 2001). Based on simulation studies, GS should improve gain from selection, reduce costs associated with phenotyping, and accelerate development of new cultivars by reducing the length of the breeding cycle (Heffner et al., 2009, 2010). Implementing GS is accomplished by first estimating marker effects in a training population and then using those estimates to predict the performance of selection candidates. The predicted value of a selection candidate based on marker effects is referred to as the genomic estimated breeding value (GEBV; Meuwissen et al., 2001).
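The two-step workflow just described (estimate marker effects in a training population, then sum them for each selection candidate) can be sketched with ridge regression, the estimator that underlies RR-BLUP. This is a minimal Python/NumPy illustration, not the authors' code: the function names are hypothetical, genotypes are assumed to be numeric codes with no missing values, and the shrinkage parameter `lam` (residual variance divided by marker-effect variance) is assumed to be supplied by the caller rather than estimated by REML.

```python
import numpy as np


def rr_blup_fit(Z, y, lam):
    """Estimate marker effects by ridge regression (the RR-BLUP estimator).

    Z   : (lines x markers) numeric genotype matrix of the training population
    y   : phenotypes of the training lines
    lam : shrinkage sigma_e^2 / sigma_a^2, assumed known in this sketch
    Returns the intercept, the marker effects, and the training column means.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    col_means = Z.mean(axis=0)
    Zc = Z - col_means                  # centring makes the intercept equal mean(y)
    mu = y.mean()
    n_markers = Z.shape[1]
    # Solve (Zc'Zc + lam*I) a = Zc'(y - mu): every marker is shrunk by the same lam,
    # reflecting RR-BLUP's assumption of a common marker-effect variance.
    a = np.linalg.solve(Zc.T @ Zc + lam * np.eye(n_markers), Zc.T @ (y - mu))
    return mu, a, col_means


def gebv(Z_new, mu, a, col_means):
    """GEBV of selection candidates: intercept plus the sum of their marker effects."""
    return mu + (np.asarray(Z_new, dtype=float) - col_means) @ a
```

A larger `lam` pulls all marker effects toward zero by the same amount; as `lam` approaches zero the fit approaches ordinary least squares.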
A key component of the effectiveness of GS is prediction accuracy. Prediction accuracy is defined as the correlation between the GEBV and the true breeding value; because true breeding values are unknown, it is estimated as the correlation between the GEBV and observed phenotypic performance divided by the square root of H² (Goddard and Hayes, 2007; Zhong et al., 2009). There are three general methods to assess prediction accuracy using real data: (i) subset validation, (ii) interset validation, and (iii) progeny validation (Figure 1). Subset validation is implemented by randomly dividing a single population of individuals into equal subsamples; one subsample is used as a validation set and is predicted using the remaining subsamples as the training set. Subset validation has been used to assess prediction accuracy in cattle, wheat (Triticum aestivum L.), and barley (Hordeum vulgare L.), among many other livestock and crop species (Luan et al., 2009; Heffner et al., 2010; Lorenz et al., 2012; Poland et al., 2012). In interset validation, predefined sets of genotypes are designated as training and validation populations. These sets could be the same genotypes evaluated in independent environments, or sets of breeding lines defined chronologically, where older lines are used to predict newer lines from either the same or independent environments (Asoro et al., 2011; Lorenz et al., 2012).
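Subset validation and the accuracy definition above can be combined in a short k-fold cross-validation sketch. It is illustrative only: `subset_validation` and `fit_predict` are hypothetical names, any prediction model can be passed in as `fit_predict(Z_train, y_train, Z_val)`, and dividing by the square root of a single supplied H² is a simplification of the per-population H² used later in this paper.

```python
import numpy as np


def subset_validation(Z, y, fit_predict, k=5, h2=None, seed=0):
    """k-fold subset validation of a genomic prediction model.

    Lines are shuffled and split into k (near-)equal subsamples; each subsample
    is predicted by a model trained on the other k - 1 subsamples.
    Returns the mean predictive ability r_a over folds, or r_a / sqrt(h2)
    (an estimate of prediction accuracy) when h2 is given.
    """
    Z, y = np.asarray(Z, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    r = []
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = fit_predict(Z[train], y[train], Z[val])
        r.append(np.corrcoef(pred, y[val])[0, 1])   # correlation within this fold
    r_a = float(np.mean(r))
    return r_a / np.sqrt(h2) if h2 is not None else r_a
```

Averaging per-fold correlations, rather than pooling all predictions before correlating, is one common design choice; the two can differ when fold means vary.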
Progeny validation implies that the training population includes parents (or grandparents, and so forth) of the progeny lines that make up the validation population. A simulation study in animals has shown that decreases in prediction accuracy are associated with decay of LD between markers and quantitative trait loci (QTL) resulting from recombination in progeny generations (Habier et al., 2007). Therefore, meaningful assessment of prediction accuracy should include progeny validation. In plants, we are aware of only a single study that assesses accuracy by progeny validation using empirical phenotypic and genotypic information (Hofheinz et al., 2012).
To assess the potential of GS, researchers have explored various factors that affect prediction accuracy, including prediction models. These models include RR-BLUP, Bayes A, Bayes B, Bayes Cπ, Bayes LASSO, and Reproducing Kernel Hilbert Space (RKHS; Meuwissen et al., 2001; Kizilkaya et al., 2010; de los Campos et al., 2009; Gianola and van Kaam, 2008). These models differ in their assumptions about the variances associated with markers and/or the types of gene action (reviewed by Lorenz et al., 2011). RR-BLUP assumes that all markers have equal variance, whereas the Bayes A, Bayes B, Bayes Cπ, and Bayes LASSO models do not impose this constraint (Meuwissen et al., 2001; de los Campos et al., 2009; Kizilkaya et al., 2010). The RKHS regression model can capture both additive and nonadditive interactions among loci by creating a kernel matrix that includes interactions among marker covariates (Gianola and van Kaam, 2008). Results of empirical studies have shown variable performance of prediction models on different traits (Crossa et al., 2010; Lorenz et al., 2012; Rutkoski et al., 2012).
Other factors shown to affect prediction accuracy include (i) the LD between markers and QTL in the training and validation populations, (ii) the size of the training population (N), (iii) the H² of the trait under investigation, and (iv) the genetic architecture of the trait. Increasing marker density improves prediction accuracy by increasing the number of QTL that are in LD with markers, thereby capturing more of the genetic variation (de Roos et al., 2009; Asoro et al., 2011; Heffner et al., 2011; Zhao et al., 2012). The successful application of GS across generations relies on the persistence of LD phase between markers and QTL (de Roos et al., 2008). The persistence of LD phase, measured as the correlation of r values between populations, is likely to be a function of the genetic relationship between the populations (de Roos et al., 2008; Toosi et al., 2010). Increasing N leads to better estimation of SNP effects (Hayes et al., 2009) and therefore increases prediction accuracy (Lorenzana and Bernardo, 2009; Asoro et al., 2011; Lorenz et al., 2012). In a simulation study, Daetwyler et al. (2010) found that prediction accuracy increased with the H² of the trait regardless of the number of QTL controlling the trait or the prediction model used. In a study that manipulated H² by introducing random error into empirical data sets, Combs and Bernardo (2013) showed that accuracy increased with increasing H² and N, and that prediction accuracies were similar for different combinations of H² and N when the product H² × N was held constant. Generally, prediction accuracy decreases as trait complexity increases (Hayes et al., 2010). Prediction models can also vary in performance among traits with different genetic architectures: Bayes B was more accurate when a small number of loci controlled the trait, whereas RR-BLUP was insensitive to genetic architecture (Daetwyler et al., 2010).
Previous studies have demonstrated the potential of GS on the basis of subset validation and interset validation. While these results are promising, additional research is needed to assess accuracy in the context of applied breeding. Specifically, validation experiments are needed to assess the accuracy of prediction on progeny (progeny validation) over time, as would occur in breeding populations. This would take into account changes in allele frequency and LD that would be expected to occur as a result of recombination and selection within a dynamic breeding program. Lorenz et al. (2012) investigated prediction accuracy for the disease Fusarium head blight (FHB) and its associated mycotoxin deoxynivalenol (DON) using interset validation. In this study, we advance this work by using progeny validation and by including additional agronomic traits. We use a set of breeding lines that includes parents of the progeny as a training population to predict five chronological sets of progeny (2006–2010) from a breeding program. Our specific objectives were to (i) compare the accuracy of different GS prediction models for DON concentration, FHB resistance, yield, and plant height; (ii) study the effect of trait architecture on prediction accuracy; (iii) characterize changes in prediction accuracy over time; and (iv) examine the relationship between prediction accuracy and training population size and composition, allele frequency, LD, and genetic distance between the training and validation populations.
MATERIALS AND METHODS
Germplasm
To explore the accuracy of genomic predictions, we used historical sets of breeding lines from the University of Minnesota barley breeding program that we define as parent or progeny sets. The parent set comprises 168 breeding lines that were developed between 1999 and 2004 and were either used as parents to develop lines in the progeny sets or were cohorts of breeding lines that were used as parents. The five progeny sets consist of five chronological sets of breeding lines evaluated between 2006 and 2010. Each progeny set consists of approximately 96 lines that were representative of the breeding lines developed that year in the breeding program. The 2006 and 2007 progeny sets are the University of Minnesota breeding lines that were included in the association mapping study conducted by Massman et al. (2011), where they were referred to as CAP I and CAP II. All breeding lines in the parent and progeny sets were developed by single-seed descent to at least the F4 generation. At that point, F4:5 lines were evaluated for FHB resistance, heading date, plant height, maturity, and lodging. Lines selected as favorable for these traits were then advanced to preliminary yield trials the following year (Smith et al., 2013). The preliminary yield trial data were used to characterize the progeny set lines, and the year designation of each progeny set refers to the year that its breeding lines entered preliminary yield trials. All pedigree, SNP marker, and phenotypic data related to these sets of breeding lines are available from the public database The Hordeum Toolbox (http://thehordeumtoolbox.org, verified 3 Oct. 2014; Blake et al., 2012).
Phenotypic Evaluation
The parental lines were evaluated together for agronomic traits in five experiments conducted between 2009 and 2011 at Crookston and St. Paul, MN, in an augmented block design with two replications and four incomplete blocks per replication (Supplemental Table S1). Planting density for all traits in all experiments was 300 plants m⁻². Each line was represented once per block in two-row plots 3 m in length. Six check cultivars (Drummond, Lacey, Quest, Rasmusson, Stellar-ND, and Tradition) were randomly assigned to each block (Horsley et al., 2002, 2006a; Rasmusson et al., 2001; Smith et al., 2010, 2013). We also characterized the parental lines using historical data that were collected as part of the breeding program when these lines were entered into preliminary yield trials. Experiments for this unbalanced data set were arranged as a randomized complete block design with two replications in two-row plots 3 m in length and were conducted between 1999 and 2004. Three checks ('Robust', 'Stander', and Lacey) were common to all the experiments (Rasmusson and Wilcoxson, 1983; Rasmusson et al., 1993). For both the historic and contemporary data sets, each line was evaluated at least two times in yield trials conducted in St. Paul, Morris, and Crookston, MN. Yield was determined by harvesting each plot with a Wintersteiger small-plot combine, weighing the grain, and expressing it as kg/ha. Plant height was assessed in centimeters on two randomly selected samples of plants from the middle of the plot, measured from the soil surface to the tip of the spike, excluding awns.
The parental lines were evaluated for FHB resistance and DON concentration in 2009 at St. Paul and in 2010 at St. Paul and Crookston, MN, in an augmented block design with two replications in four incomplete blocks. Each line was represented once per block in single-row plots 1.8 m in length, with 30 cm between rows. Six check cultivars (Drummond, Lacey, Quest, Rasmusson, Stellar-ND, and Tradition) were randomly assigned to each block. FHB resistance and DON concentration were assessed using a previously described method (Steenson, 2003). Briefly, in St. Paul, plants were spray-inoculated with an F. graminearum macroconidia suspension using CO2-pressurized backpack sprayers. Plots were inoculated when at least 90% of the heads had emerged from the boot and were sprayed again 3 d later (Mesfin et al., 2003). Mist irrigation was applied immediately after inoculation to promote disease infection. In Crookston, MN, plants were inoculated with grain spawn, using autoclaved corn (Zea mays L.) colonized by five local isolates of F. graminearum (Horsley et al., 2006b). The colonized grain was spread on the ground 2 wk before flowering and again 1 wk later. Overhead mist irrigation started 2 wk before anthesis and continued until the hard dough stage of maturity. Fusarium head blight severity was assessed about 14 d after inoculation by estimating the percentage of infected kernels on a random sample of 10 spikes per plot using the following assessment scale: 0, 1, 3, 5, 10, 15, 25, 35, 50, 75, and 100%. DON concentration was determined on a 25-g sample of the harvested grain by gas chromatography and mass spectrometry and expressed in ppm according to the procedures of Mirocha et al. (1998).
Lines included in the progeny sets were derived from crosses made between 2003 and 2007 and were evaluated in preliminary yield trials conducted from 2006 to 2010 (Supplemental Table S1). Plots were arranged in a randomized complete block design with two replications and four check varieties (Robust, Stander, MNBrite, and Lacey). Each progeny set was evaluated for yield and plant height in Crookston, St. Paul, and Morris, MN, as described above. The progeny sets were also evaluated for FHB resistance and DON accumulation in disease nurseries as described above. Each progeny set was evaluated in three to four FHB experiments located in St. Paul and Crookston, MN, and Osnabrock and Fargo, ND. Disease inoculation, disease assessment, and DON measurements were done as previously described.
Genotypic Evaluation
DNA for genotyping was extracted from a single plant from the F4:5 bulk seed used in the phenotypic evaluation. Leaf tissue approximately 3 wk old was harvested and freeze-dried. DNA was extracted at the USDA genotyping center in Fargo, ND, using the protocol of Slotta et al. (2008). Each DNA sample was genotyped with the 1536 SNPs referred to as BOPA1 using the Illumina GoldenGate oligonucleotide assay (Close et al., 2009). Markers were filtered in the parent set on the basis of minor allele frequency (MAF < 0.01) and missing data frequency (> 10%). Missing marker values were imputed using naïve imputation so that analytical operations could be performed.
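The marker filtering and imputation step can be sketched as follows. Several details are assumptions not stated in the text: genotypes are coded 0/1/2 with `np.nan` for missing calls, a marker is dropped if it fails either criterion, and "naïve imputation" is taken to mean replacing each missing call with that marker's mean. The function name is hypothetical.

```python
import numpy as np


def filter_and_impute(geno, min_maf=0.01, max_missing=0.10):
    """Drop low-MAF or high-missingness markers, then mean-impute the rest.

    geno: (lines x markers) array coded 0/1/2 (copies of one allele), np.nan = missing.
    Returns (filtered_imputed_matrix, boolean_mask_of_kept_markers).
    """
    geno = np.asarray(geno, dtype=float)
    missing = np.mean(np.isnan(geno), axis=0)        # per-marker missing rate
    p = np.nanmean(geno, axis=0) / 2.0               # allele frequency from observed calls
    maf = np.minimum(p, 1.0 - p)
    keep = (maf >= min_maf) & (missing <= max_missing)
    kept = geno[:, keep].copy()
    col_means = np.nanmean(kept, axis=0)
    rows, cols = np.where(np.isnan(kept))
    kept[rows, cols] = col_means[cols]               # assumed form of "naive" imputation
    return kept, keep
```

The returned mask can be reused to apply the same marker set, chosen in the parent set, to the progeny sets.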
Data Analysis
Analysis of variance was performed for DON concentration, FHB resistance, yield, and plant height using the PROC GLM procedure in SAS (v.9.3, SAS Institute, 2011). For each experiment, outlier observations with standardized residuals of absolute value three or more were removed from the data set and scored as missing values. One experiment (yield in St. Paul, 2010) was removed because no significant differences were found among lines.
To avoid including common checks across experiments in variance component estimates, two-step procedures were used. For the contemporary data from the parent set, we first adjusted phenotypes for block effects using the common checks among blocks with the PROC MIXED procedure in SAS (v.9.3, SAS Institute, 2011). The model was

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e}$$

where y is the vector of unadjusted phenotypes, β is the vector of fixed block effects, u is the vector of random check effects, and X and Z are incidence matrices relating the vector of unadjusted phenotypes to β and u. We then adjusted phenotypes for trial effects by estimating these effects as fixed in an analysis with lines as random effects. The model was

$$\mathbf{y}^{*} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e}$$

where y* represents the phenotypes adjusted for block effects calculated in the first step, β is the vector of fixed trial effects, and u is the vector of random line effects. In the historic data for the parent set, subsets of lines were evaluated in different years, but a common set of checks was included in each trial. As with the contemporary data, phenotypes were adjusted for trial effects by computing these effects in a mixed model with checks as random and trials as fixed effects. In the progeny data sets, phenotypes were adjusted for trial effects by computing these effects in a mixed model with lines as random and trials as fixed effects. Finally, best linear unbiased estimations (BLUEs) for lines in each experiment were estimated in models with adjusted phenotypes as the response variable and lines as fixed effects. Variance components were estimated using restricted maximum likelihood (REML) in the PROC MIXED procedure in SAS, with the line BLUEs as the response variable, lines as random effects, and experiments as fixed effects. Broad-sense H² on an entry-mean basis was estimated for all traits using the equation

$$H^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2 / n}$$

where $\sigma_g^2$ is the genetic variance, $\sigma_e^2$ is the pooled error variance that includes G × E and residuals, and n is the number of trials.
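The entry-mean heritability formula can be checked numerically with a one-line function. The function name and the variance components in the example are hypothetical, chosen only to show how H² responds to the number of trials n; they are not values from this study.

```python
def entry_mean_h2(var_g, var_e, n_trials):
    """Broad-sense heritability on an entry-mean basis: var_g / (var_g + var_e / n).

    var_g    : genetic variance (sigma_g^2)
    var_e    : pooled error variance, including G x E and residual (sigma_e^2)
    n_trials : number of trials in which each entry was evaluated (n)
    """
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    return var_g / (var_g + var_e / n_trials)


# Hypothetical variance components, not estimates from this paper
for n in (1, 2, 4):
    print(n, round(entry_mean_h2(var_g=1.0, var_e=2.0, n_trials=n), 3))
```

With these made-up values, H² rises from about 0.33 with one trial to 0.5 with two trials and about 0.67 with four, because averaging over trials divides the error variance by n.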
Characterizing LD, Genetic Distance, and Parental Contribution
To assess the extent of LD within the parent and progeny sets, adjacent-marker LD was characterized as r² using Haploview v.4.0 (Barrett et al., 2005). To assess the persistence of LD phase between the parent and progeny sets, the correlations of r were calculated between the parent set and each progeny set (de Roos et al., 2008; Toosi et al., 2010). We measured genetic distance between the parent set and each progeny set using the fixation index (Fst; Weir and Cockerham, 1984) and Nei's genetic distance (Nei, 1987). Both Fst and Nei's genetic distance measure the differentiation between two populations due to differences in allele frequencies. The contribution of the parent set to a progeny set was assessed by summing, over all lines in the progeny set, the number of each line's parents that were included in the parent set, and dividing this sum by twice the number of lines in that progeny set.
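Adjacent-marker LD, persistence of LD phase, and parental contribution can be sketched as below. This is not the Haploview computation: it assumes fully inbred lines with markers sorted by map position on a single chromosome and no missing values, in which case the correlation of allele codes between two markers equals the haplotype r. All function names are hypothetical.

```python
import numpy as np


def adjacent_r(geno):
    """Signed correlation r between each pair of adjacent markers.

    geno: (lines x markers) allele codes of inbred lines, markers in map order
    on one chromosome, no missing values. Pairs with a monomorphic marker give nan.
    """
    geno = np.asarray(geno, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.array([np.corrcoef(geno[:, j], geno[:, j + 1])[0, 1]
                         for j in range(geno.shape[1] - 1)])


def mean_adjacent_r2(geno):
    """Average adjacent-marker LD measured as r^2 (nan pairs ignored)."""
    return float(np.nanmean(adjacent_r(geno) ** 2))


def ld_phase_persistence(geno_a, geno_b):
    """Correlation of adjacent-marker r between two populations with the same markers."""
    r_a, r_b = adjacent_r(geno_a), adjacent_r(geno_b)
    ok = ~(np.isnan(r_a) | np.isnan(r_b))    # keep pairs polymorphic in both sets
    return float(np.corrcoef(r_a[ok], r_b[ok])[0, 1])


def parental_contribution(progeny_parents, parent_set):
    """Share of all parents of a progeny set that belong to the parent set.

    progeny_parents: list of (parent1, parent2) tuples, one per progeny line.
    """
    hits = sum((p1 in parent_set) + (p2 in parent_set) for p1, p2 in progeny_parents)
    return hits / (2 * len(progeny_parents))
```

For example, two progeny lines with parents (A, B) and (A, X), where only A and B are in the parent set, give a contribution of 3 / 4 = 0.75.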
Association Analysis
To identify sets of markers associated with traits, association analysis was implemented using the efficient mixed-model association (EMMA) approach, which corrects for population structure using genetic relatedness (Kang et al., 2008). Association analyses were done on the parent set for DON concentration, FHB resistance, yield, and plant height using the EMMA package implemented in R (Kang et al., 2008). The analysis was based on the mixed model

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e} \qquad [1]$$

where y is the vector of individual phenotypes; X is an incidence matrix that relates β to y; β is the vector of fixed effects, which includes the overall mean and SNPs; Z is the incidence matrix that relates the random effects u to phenotypes; u is the random effect of the genetic background of each line, distributed as $\mathbf{u} \sim N(\mathbf{0}, \mathbf{K}\sigma_g^2)$, where K is the kinship matrix derived from marker genotypes and $\sigma_g^2$ is the genetic variance; and e is the residual, distributed as $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)$, where I is the identity matrix and $\sigma_e^2$ is the error variance (Kang et al., 2008). We used a relaxed threshold of $-\log_{10}(p) = 1.3$ ($p = 0.05$) to identify markers potentially associated with traits. These subsets of markers were used to investigate changes in allele frequencies over time in the progeny sets. For all polymorphic SNP markers, the proportion of variance explained by each marker ($R^2$) was calculated as $R^2 = SS_{\mathrm{reg}}/SS_{\mathrm{tot}}$, where $SS_{\mathrm{reg}}$ is the regression sum of squares and $SS_{\mathrm{tot}}$ is the total sum of squares of the regression model.
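The per-marker $R^2 = SS_{\mathrm{reg}}/SS_{\mathrm{tot}}$ can be sketched with one least-squares regression per marker. The text does not say which regression model was used, so this sketch assumes a simple single-marker regression of phenotype on genotype code, without the kinship term of model [1]; `marker_r2` is a hypothetical name.

```python
import numpy as np


def marker_r2(geno, y):
    """Proportion of phenotypic variance explained by each marker on its own.

    Fits y = b0 + b1 * x_j by least squares for every marker j and returns
    R^2 = SS_reg / SS_tot. Monomorphic markers (or a constant y) get R^2 = 0.
    """
    geno = np.asarray(geno, dtype=float)
    y = np.asarray(y, dtype=float)
    yc = y - y.mean()
    ss_tot = np.sum(yc ** 2)
    r2 = np.zeros(geno.shape[1])
    if ss_tot == 0:
        return r2
    for j in range(geno.shape[1]):
        xc = geno[:, j] - geno[:, j].mean()
        sxx = np.sum(xc ** 2)
        if sxx == 0:                       # monomorphic marker explains nothing
            continue
        b1 = np.sum(xc * yc) / sxx         # least-squares slope
        ss_reg = np.sum((b1 * xc) ** 2)    # variation of fitted values about the mean
        r2[j] = ss_reg / ss_tot
    return r2
```

For a single predictor, this $R^2$ equals the squared correlation between the marker codes and the phenotype.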
Prediction Models
Genomic predictions were estimated using four methods: ridge regression best linear unbiased prediction (RR-BLUP; Meuwissen et al., 2001), a Gaussian kernel model (Gianola and van Kaam, 2008; Endelman, 2011), an exponential kernel model (Piepho, 2009; Endelman, 2011), and Bayes Cπ (Kizilkaya et al., 2010). RR-BLUP and Bayes Cπ can be modeled as

$$\mathbf{y} = \mathbf{1}\mu + \sum_{j=1}^{K} \mathbf{Z}_j a_j \delta_j + \mathbf{e} \qquad [2]$$

where y is the vector of individual phenotypes, $\mu$ is the population mean, K is the number of markers, $\mathbf{Z}_j$ is the incidence vector that links marker j genotypes to individuals, $a_j$ is the effect of marker j, $\delta_j$ is an indicator variable denoting the absence ($\delta_j = 0$) or presence ($\delta_j = 1$) of marker j in the model, with probabilities $\pi$ and $1 - \pi$, respectively, and e is the random residual. In RR-BLUP, all markers are included ($\delta_j = 1$) and their effects are distributed with the same variance, $N(0, \sigma_a^2)$. The variance of this distribution was estimated from the marker and phenotypic data using REML. A Bayesian model was used to relax this assumption of RR-BLUP and allow some marker variances to be zero. Bayes Cπ assumes a common marker variance across all markers included in the model; however, it allows some markers to have no effect on the trait (Kizilkaya et al., 2010). In Bayes Cπ, each marker j is assumed to have zero effect with probability $\pi$ (when $\delta_j = 0$) and an effect $a_j \sim N(0, \sigma_a^2)$ with probability $1 - \pi$ (when $\delta_j = 1$). The parameter $\pi$ is treated as unknown and is estimated from the training data. In the Markov chain Monte Carlo (MCMC) algorithm for Bayes Cπ, 10,000 iterations of Gibbs sampling were used, and the first 2500 iterations were discarded as burn-in. We implemented the Bayes Cπ analysis in R (R Development Core Team, 2012). Gaussian and exponential kernel models were implemented to capture both additive and nonadditive interactions between marker genotypes using the R package rrBLUP (Endelman, 2011; R Development Core Team, 2012). These models can be presented as

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{g} + \mathbf{e} \qquad [3]$$

where y is the vector of individual phenotypes, $\mu$ is the population mean, Z is the matrix that relates g to the adjusted phenotypes, g is the vector of genotypic values distributed as $\mathbf{g} \sim N(\mathbf{0}, \mathbf{K}\sigma_g^2)$, where K is the kernel similarity matrix, and e is the residual (Endelman, 2011). The Gaussian and exponential models do not partition the total genetic variance into additive and nonadditive variances; rather, kernel functions are used to capture these effects. Genomic predictions were calculated for all lines in the validation population using the four prediction models. The correlation coefficient between the genomic predictions and the line BLUEs was used to calculate the predictive ability (ra). Prediction accuracy (ra/H) of GS (Legarra et al., 2008; Chen et al., 2011) was calculated by dividing ra by the square root of the broad-sense H² derived from the validation population data.
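A kernel model of the form of Eq. [3] can be sketched by building a similarity matrix K from marker distances and computing the BLUP of g for new lines as $\mathbf{K}_{nt}(\mathbf{K}_{tt} + \lambda\mathbf{I})^{-1}(\mathbf{y} - \mu)$, with $\lambda = \sigma_e^2/\sigma_g^2$. This is a hypothetical NumPy sketch, not the rrBLUP implementation: it does not reproduce rrBLUP's distance scaling or its estimation of variance components and kernel parameters. It assumes one record per line (Z equal to the identity), a caller-supplied `lam` and `theta`, and the training mean in place of the GLS estimate of $\mu$.

```python
import numpy as np


def distance_kernel(X, theta=1.0, kind="gaussian"):
    """Similarity kernel computed from marker genotypes (rows = lines).

    Euclidean distances between lines are divided by their maximum so they lie
    in [0, 1]; 'gaussian' returns exp(-(d/theta)^2), 'exponential' exp(-d/theta).
    """
    if kind not in ("gaussian", "exponential"):
        raise ValueError("kind must be 'gaussian' or 'exponential'")
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    if d.max() > 0:
        d = d / d.max()
    if kind == "gaussian":
        return np.exp(-(d / theta) ** 2)
    return np.exp(-d / theta)


def kernel_predict(X_train, y_train, X_new, lam, theta=1.0, kind="gaussian"):
    """BLUP of genotypic values of new lines under y = 1*mu + g + e, g ~ N(0, K sigma_g^2).

    lam = sigma_e^2 / sigma_g^2 is supplied by the caller (not estimated here),
    and mu is approximated by the training mean.
    """
    y_train = np.asarray(y_train, dtype=float)
    n = len(y_train)
    K = distance_kernel(np.vstack([X_train, X_new]), theta, kind)
    K_tt, K_nt = K[:n, :n], K[n:, :n]   # train x train and new x train blocks
    mu = float(y_train.mean())
    alpha = np.linalg.solve(K_tt + lam * np.eye(n), y_train - mu)
    return mu + K_nt @ alpha
```

Because K is built from whole genotype profiles rather than from additive marker effects, the Gaussian and exponential kernels can reflect nonadditive similarity among lines, which is the motivation given above for including them.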
Training Populations
To test the effect of training population composition on prediction accuracy, three different scenarios were implemented by varying the training data set. In the first scenario, the 168 parental lines, using either the contemporary or historic data, were used as the training set to predict the performance of lines in each of the five progeny sets. In the second scenario, we varied the training population composition by adding one or more of the progeny sets to the contemporary parent set to predict the performance of a later progeny set. In the third scenario, we used two earlier progeny sets to predict a later progeny set. For each scenario, we implemented the four prediction models described previously.
Figure 1. Three validation approaches to assess prediction accuracy using different training and prediction sets.
Because the experiments described above were used to assess different types of training populations that varied in population size, we also tested the effect of training population size on prediction accuracy for two of the four traits, using the 2008 and 2010 progeny sets as validation populations. For DON concentration and yield, we used three scenarios. In the first scenario, we randomly sampled 25, 50, 75, 100, and 150 lines from the parent set (n = 168). For each population size, 500 samples were drawn, each without replacement. In the second scenario, we combined the progeny sets preceding the validation set (2006 to 2007 when predicting 2008, and 2006 to 2009 when predicting 2010) with the parent set into a single panel from which samples were drawn to generate various training sets. We generated training sets from these larger training panels by randomly sampling 25, 50, 75, 100, 150, 168, 264, and 360 lines when predicting 2008 and 25, 50, 75, 100, 150, 168, 264, 360, and 456 lines when predicting 2010. For each population size, 500 samples were drawn, each without replacement. In the third scenario, we combined the parent and progeny sets by sequentially adding each progeny set in chronological order to the parent set, such that in each round of prediction the size of the training population increased by 96. This represents the single case that would occur if a breeder accumulated data over time to increase the size of the training population; thus, there is just one instance of this scenario. For each scenario, we used the training populations to generate predictions of the 2008 and 2010 progeny sets for DON concentration and yield using RR-BLUP.
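The resampling design of the first two scenarios (repeated random draws of a given size, each without replacement, from a training panel) can be sketched as a loop over sizes. This is a hypothetical illustration: `training_size_curve` is not from the paper, and the model is passed in as `fit_predict(Z_train, y_train, Z_val)` so that an RR-BLUP implementation can be plugged in. The third, sequential scenario has a single instance and needs no resampling.

```python
import numpy as np


def training_size_curve(Z_panel, y_panel, Z_val, y_val, sizes, fit_predict,
                        n_rep=500, seed=0):
    """Predictive ability of training sets of each size drawn from a panel.

    For each size, n_rep training sets are drawn at random, each without
    replacement; fit_predict(Z_train, y_train, Z_val) returns predictions for
    the validation lines, which are correlated with y_val.
    Returns {size: (mean r_a, sd of r_a)}.
    """
    rng = np.random.default_rng(seed)
    Z_panel = np.asarray(Z_panel, dtype=float)
    y_panel = np.asarray(y_panel, dtype=float)
    results = {}
    for n in sizes:
        if n > len(y_panel):
            raise ValueError(f"size {n} exceeds the panel of {len(y_panel)} lines")
        r = np.empty(n_rep)
        for i in range(n_rep):
            idx = rng.choice(len(y_panel), size=n, replace=False)
            pred = fit_predict(Z_panel[idx], y_panel[idx], Z_val)
            r[i] = np.corrcoef(pred, y_val)[0, 1]
        results[n] = (float(r.mean()), float(r.std(ddof=1)))
    return results
```

Dividing each mean by the square root of the validation-set H² would convert predictive ability to prediction accuracy, as defined in the Prediction Models section.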
RESULTS
Phenotypic Traits and Marker Density
The parent set and each of the progeny sets were evaluated in multiple yield and disease experiments between 2006 and 2011. The yield experiment for the parent set in St. Paul 2010 was removed because very severe lodging resulted in a large error variance and no significant differences among lines. For all traits and experiments, we observed significant differences among lines (p-value < 0.01) in the parent and progeny sets. Genetic variances (Table 1) decreased for DON concentration and plant height as a function of progeny set year, whereas the genetic variances for yield and FHB resistance fluctuated. As expected from previous studies (Boukerrou and Rasmusson, 1990; Ma et al., 2000; Mesfin et al., 2003), H² estimates were moderate for DON concentration and FHB resistance, low to moderate for yield, and high for plant height. After filtering the 1536 BOPA1 SNPs for minor allele frequency (MAF) and missing data, 984 markers remained that spanned 1085 cM of the barley genome, with an average distance of 1.1 cM between adjacent markers.
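A marker filter of the kind described above can be sketched as follows. The MAF and missing-data thresholds are not stated in this section, so the defaults below (0.05 and 0.10) are hypothetical. The sketch also assumes inbred lines coded 0/1 (homozygous for one allele or the other) with `np.nan` for missing calls.

```python
import numpy as np

def filter_markers(geno, maf_min=0.05, max_missing=0.10):
    """Keep SNP columns that pass MAF and missing-data filters.

    geno: lines x markers, coded 0/1 for inbred lines, np.nan = missing.
    maf_min and max_missing are hypothetical thresholds.
    """
    missing_rate = np.isnan(geno).mean(axis=0)
    p = np.nanmean(geno, axis=0)          # frequency of the allele coded 1
    maf = np.minimum(p, 1.0 - p)
    keep = (missing_rate <= max_missing) & (maf >= maf_min)
    return geno[:, keep], keep

def mean_adjacent_distance(positions_cm):
    """Average gap (cM) between adjacent markers on one chromosome."""
    pos = np.sort(np.asarray(positions_cm, dtype=float))
    return float(np.diff(pos).mean())
```

As a rough check on the reported figures, 1085 cM spread over 984 markers gives about 1.1 cM per marker.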
Relationship between Parent and Progeny Sets
The average LD between adjacent markers was greater in the progeny sets than in the parent set and increased slightly over time (Fig. 2). The correlation of r between the parental and progeny sets ranged from 0.44 to 0.61 (Fig. 2). The parental contribution of the parent set to the progeny sets decreased continuously over time, with about a 75% reduction from 2006 to 2010 (Fig. 3). Concurrent with this decrease in parental contribution was an increase in genetic distance between the parent set and the consecutive progeny sets (Fig. 3). The genetic relationship between the parents and the progeny sets can be visualized in the heatmap of the kinship matrix (Fig. 4). As lines were developed in the breeding program, their similarity to the parent set diminished over time.
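The two LD summaries in Fig. 2 can be sketched from marker data as follows: the average LD of adjacent marker pairs, and the correlation of the signed adjacent-marker r between two sets (persistence of LD phase). This section does not name the LD statistic, so using r² for the average is an assumption. The sketch assumes 0/1-coded inbred lines, markers in map order, no missing data, and markers polymorphic in every set compared.

```python
import numpy as np

def adjacent_r(geno):
    """Signed correlation r between each pair of adjacent markers (columns in map order)."""
    g = np.asarray(geno, dtype=float)
    g = g - g.mean(axis=0)
    num = (g[:, :-1] * g[:, 1:]).sum(axis=0)
    den = np.sqrt((g[:, :-1] ** 2).sum(axis=0) * (g[:, 1:] ** 2).sum(axis=0))
    return num / den

def mean_adjacent_ld(geno):
    """Average adjacent-marker LD, measured here as r² (assumption)."""
    return float(np.mean(adjacent_r(geno) ** 2))

def ld_phase_persistence(geno_a, geno_b):
    """Correlation of adjacent-marker r between two sets genotyped for the same markers."""
    return float(np.corrcoef(adjacent_r(geno_a), adjacent_r(geno_b))[0, 1])
```

Because r keeps its sign, the persistence measure is high only when the same alleles stay in coupling across sets. This is why it can remain stable even while genetic distance grows.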
Figure 3. Relationship between the parent set and each progeny
set expressed as percentage of parental contribution (square)
to each of the progeny sets and the genetic distance between
parental and progeny sets expressed as Fst (Wright’s fixation
index, triangle; Wright, 1965) and Nei’s genetic distances
(circle; Nei, 1987).
Figure 2. The average linkage disequilibrium (LD) of all possible
adjacent marker pairs in the parental and progeny sets (triangle)
and the correlation (Cor) of r between parents and each of the
progeny sets (circle).
Table 1. The estimated genetic (σ²g) and error (σ²e) variances for the contemporary parent set and the progeny sets (2006–2010) for deoxynivalenol concentration (DON), Fusarium head blight resistance (FHB), yield, and plant height (HT).

          Genetic variance (σ²g)               Error variance (σ²e)
Set       DON     FHB     Yield     HT         DON      FHB     Yield     HT
Parents   15.83   12.00   114,411   26.20      20.00    35.00   444,284   11.30
2006      51.23   34.80    97,701   19.30     122.23    89.60   330,349    8.90
2007      15.40    4.60   109,203   15.20      24.01    16.00   334,950    9.50
2008      23.20   14.40   280,128   28.40      74.10    20.20   274,314    7.90
2009      12.90    6.95    68,516   14.50      22.90    22.99   360,820   11.30
2010       7.50   13.99    95,241    7.96      16.80    58.60   365,080    8.10
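As a reminder of how variance components such as those in Table 1 relate to H², a common line-mean formula is H² = σ²g / (σ²g + σ²ge/e + σ²e/(r·e)), where r is the number of replicates and e the number of environments. The sketch below is illustrative only. The table does not report σ²ge, so it defaults to zero here, and the number of environments used in the example is hypothetical. The H² values reported in the paper come from its own models and are not reproduced by this function.

```python
def entry_mean_h2(var_g, var_e, n_reps, n_envs, var_ge=0.0):
    """Broad-sense heritability on a line-mean basis.

    var_ge defaults to 0 because Table 1 does not report a G x E variance;
    n_reps and n_envs describe a hypothetical design.
    """
    return var_g / (var_g + var_ge / n_envs + var_e / (n_reps * n_envs))

# Parent-set DON components from Table 1, assuming 2 replicates x 2 environments
h2_means = entry_mean_h2(15.83, 20.00, n_reps=2, n_envs=2)   # about 0.76
h2_plot = entry_mean_h2(15.83, 20.00, n_reps=1, n_envs=1)    # single-plot basis, about 0.44
```

The line-mean value rises with more replicates and environments because the error variance is divided by r·e.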
Sallam et al.: Genomic prediction accuracy in barley breeding, 7 of 15
Marker-Trait Associations
Based on association analysis using the parent set, all of the traits displayed quantitative inheritance, with multiple loci distributed across the genome contributing to each trait (Fig. 5). Coincident QTL for plant height, DON concentration, and FHB resistance were detected on the short arm of chromosome 4H, in a region previously identified in a study using similar germplasm (Massman et al., 2011). Using a relaxed p-value threshold of 0.05, we identified 62, 58, 62, and 59 markers associated with DON concentration, FHB resistance, yield, and plant height, respectively.
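To show what a relaxed threshold of p < 0.05 (−log10(p) ≈ 1.3) means in a marker scan, the sketch below regresses the phenotype on each marker separately. This is a naive single-marker regression that ignores population structure and kinship. The association model actually used in the paper is not described in this section, so this should not be read as the authors' method.

```python
import numpy as np
from scipy import stats

def marker_scan(geno, pheno, alpha=0.05):
    """Naive single-marker regression scan.

    geno: lines x markers (polymorphic markers only); pheno: line means.
    Returns per-marker p-values and a boolean mask of markers with p < alpha.
    """
    pvals = np.array([stats.linregress(geno[:, j], pheno).pvalue
                      for j in range(geno.shape[1])])
    return pvals, pvals < alpha
```

With about 1000 markers, a 0.05 threshold is expected to flag roughly 50 markers by chance alone. This is one reason the threshold is described as relaxed.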
To characterize the possible role of selection, we examined allele frequencies of the SNP markers associated with the four traits (Fig. 6). In general, the percentage of fixed markers in the complete genome-wide marker set increased over time. Among markers associated with individual traits, this trend was most apparent for DON concentration, followed by plant height and yield. No such trend was observed for FHB resistance.
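The fixation percentages summarized in Fig. 6 can be computed as sketched below: the share of markers segregating in the parent set that are monomorphic in a given progeny set, optionally restricted to a subset such as the trait-associated markers. The function name and the no-missing-data assumption are illustrative.

```python
import numpy as np

def percent_fixed(geno_parent, geno_progeny, subset=None):
    """Percentage of parent-segregating markers that are fixed in a progeny set.

    subset: optional column indices (e.g., trait-associated markers).
    Assumes both arrays share marker columns and contain no missing calls.
    """
    seg = geno_parent.min(axis=0) != geno_parent.max(axis=0)
    if subset is not None:
        in_subset = np.zeros(seg.shape, dtype=bool)
        in_subset[list(subset)] = True
        seg &= in_subset
    fixed = geno_progeny.min(axis=0) == geno_progeny.max(axis=0)
    return 100.0 * float(fixed[seg].mean())
```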
To investigate the eect of trait architecture and the
distribution of maker eects on prediction accuracy, we
estimated the proportion of variance explained by each
SNP marker in the parent set for all traits (Fig. 7). Based
on the distribution of R2 values, we found that plant
height was the least complex trait with several markers
exceeding 0.30. Yield, on the other hand, was the most
complex trait with only a few markers with R2 values >
0.10. Based on Fig. 5, it is clear that multiple markers are
likely associated with the same QTL. Nevertheless, 83,
59, 24, and 17 markers had R2 values greater than 0.10 for
plant height, DON concentration, FHB resistance, and
yield, respectively.
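One simple way to obtain a per-marker R² and count markers above 0.10 is the squared correlation between each marker and the line-mean phenotype, as sketched below. The paper defines R² as the proportion of genetic variance explained by a marker and may have computed it differently. Treat this as an approximation under that assumption.

```python
import numpy as np

def marker_r2(geno, pheno):
    """Squared marker-phenotype correlation for every marker column (assumed R² definition)."""
    g = geno - geno.mean(axis=0)
    y = pheno - pheno.mean()
    r = (g * y[:, None]).sum(axis=0) / np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    return r ** 2

def count_above(r2, threshold=0.10):
    """Number of markers whose R² exceeds the threshold."""
    return int((r2 > threshold).sum())
```

Markers in LD with the same QTL each receive a large R², so these counts overstate the number of distinct QTL.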
Another way to characterize the genetic architecture of the traits is to use the π parameter, the proportion of markers with no effect, estimated with Bayes Cπ. When the parent set was used as the training population, π estimates for yield over four runs ranged between 0.28 and 0.43, with a mean of 0.35. For DON concentration, π estimates ranged between 0.37 and 0.54, with a mean of 0.45. For FHB resistance, π estimates ranged between 0.49 and 0.58, with a mean of 0.53. For plant height, π estimates ranged between 0.45 and 0.80, with a mean of 0.63. Thus, based on π estimates, yield was the most complex trait, followed by DON concentration, FHB resistance, and plant height. This suggestive trend from higher to lower complexity agrees with the distribution of marker R² values displayed in Fig. 7. Assessment of genetic architecture based on π estimates also agrees with the results of Lorenz et al. (2012) for DON concentration and FHB resistance.
Prediction Accuracy
For the four traits investigated, all prediction models performed similarly with respect to prediction accuracy (Supplemental Fig. S1). When we averaged prediction accuracy across the five progeny set years, we found no significant differences among the models (Supplemental Table S2). For yield, the predictions from the four models were strongly correlated across the combined set of progeny when the parent set was used as the training population (Fig. 8). Consistent with other GS studies, RR-BLUP, in which all marker effects are sampled from the same distribution and shrunk equally toward zero, performed similarly to models that do not impose that restriction. Further comparisons of prediction accuracy are therefore based on RR-BLUP.
Another important consideration for prediction accuracy is whether new phenotypic data must be generated for model training or existing data sets can be used. We estimated marker effects for yield using historical data from the breeding program for the parent set and compared the resulting accuracies with those obtained using the contemporary data set (Fig. 9). The average prediction accuracies over the 5 yr based on contemporary (0.57) and historic data (0.42) were not significantly different (p-value = 0.38). Combining historic and contemporary data gave the same accuracy as using the contemporary data alone.
In general, when the parent set was used to predict progeny sets, accuracy was highest for plant height and lowest for yield (Fig. 10). Fusarium head blight resistance and DON concentration had similar prediction accuracy. The relationship between accuracy and year of the progeny set also differed between traits (Fig. 10). For yield and plant height, prediction accuracy fluctuated over time, while for DON concentration there was an overall decrease. Accuracy for FHB resistance remained relatively constant across years.
Varying the size of the training population by adding one or more progeny sets to the parent set to predict a later progeny set generally showed the same trend observed for the parent set alone, and in several instances reduced accuracy. Using the breeding lines and environments closest in time to the test population, by training a prediction model on the two progeny sets preceding the validation year, was generally less accurate than using the parent set. The general trend was that higher trait H² in the parent training population corresponded to higher predictive ability (rₐ) when the parents were used to predict all the progenies with RR-BLUP (Fig. 11).

Figure 4. Heatmap displaying the similarity kinship matrix calculated using marker data for parents and all progeny sets.
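Figures 10 and 11 use two related quantities. Predictive ability rₐ is the correlation between GEBVs and observed performance. Prediction accuracy is rₐ/H, where H is the square root of the validation set's heritability (Fig. 10). A minimal sketch of both follows; the function names are illustrative.

```python
import numpy as np

def predictive_ability(gebv, pheno):
    """r_a: Pearson correlation between GEBVs and observed line performance."""
    return float(np.corrcoef(gebv, pheno)[0, 1])

def prediction_accuracy(gebv, pheno, h2_validation):
    """r_a / H, with H = sqrt(H²) of the validation set."""
    return predictive_ability(gebv, pheno) / float(np.sqrt(h2_validation))
```

Dividing by H corrects for the fact that phenotypes are noisy measures of genetic value. In finite samples the ratio is not bounded by 1, so values near or slightly above 1 can occur.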
Because adding consecutive sets of lines to the parent set changed both the composition and the size of the training set, we examined the effect of population size on prediction accuracy using the parent set alone and using the parent set combined with the progeny sets, with different population sizes drawn at random. In both cases, accuracy increased with population size for DON concentration and yield (Fig. 12). However, prediction accuracy for DON concentration seemed to plateau at a population size of 75, while yield did not appear to plateau. It was also interesting that random sampling from the parent set alone often produced higher prediction accuracies than random sampling from the combined parent and progeny sets at the same training population size.
DISCUSSION
Successful implementation of GS will involve using improved genotyping technology to shorten the breeding cycle and increasing selection intensity through effective modeling to accurately predict breeding values. Prior studies have examined factors that affect prediction accuracy through simulation and empirically through cross-validation (e.g., Daetwyler et al., 2010; Lorenz et al., 2012). To assess prediction accuracy in a more realistic context, we used sets of parents and progenies from an active breeding program as training and validation sets. Because breeding populations are dynamic, we tested progeny sets defined chronologically over a 5-yr period. We found that prediction accuracy varied over time, and that simply adding data from breeding progenies to the training population did not improve, and often reduced, prediction accuracy. This suggests that careful construction of training populations is warranted. We considered the relationship between the training and the validation populations with regard to genetic distance and differences in LD and allele frequencies. In this breeding population, all of these factors changed over time, and none of them individually could completely account for differences in prediction accuracy. However, the data support their role in affecting prediction accuracy and suggest they should be taken into account when designing training data sets and developing strategies for retraining models over time to maintain acceptable levels of accuracy.

Figure 5. Manhattan plot displaying the significance level for association mapping of deoxynivalenol (DON) concentration, Fusarium head blight (FHB) resistance, yield, and plant height in the contemporary parent data set. The relaxed threshold of −log10(p) = 1.3, which corresponds to a p-value of 0.05, is shown with a horizontal line.
Prediction Accuracy Can Be Affected by Changes in Breeding Populations over Time
Breeding populations are dynamic, so approaches to using prediction methods should be informed by changes in prediction accuracy that may occur over time. Breeding value predictions are influenced by allele frequencies, LD level, and the introgression of new alleles. These factors change over breeding cycles due to selection, genetic drift, and unequal parental contribution to progenies. We investigated prediction accuracy in validation sets of breeding lines over a 5-yr period and observed, depending on the trait, either little to no change or a substantial decrease in prediction accuracy over time. To better understand the underlying population parameters that could be affecting prediction accuracy, we compared the parental training population with the progeny validation sets with respect to allele frequencies, parental contribution, genetic distance, and LD.
More than 35% of the markers that were segregating in the parent set became fixed in the 2010 progeny set. Gradual increases in the fixation of trait-specific markers indicate the effect of selection and/or genetic drift. A previous study of the University of Minnesota barley breeding program showed that a reduction in allelic diversity for specific simple sequence repeat markers was in some cases associated with QTL regions for traits under selection (Condón et al., 2008). Once a trait-associated marker that is segregating in the training population becomes fixed in subsequent progeny generations, it loses its predictive value for the purpose of selection. We observed a substantial increase in the number of fixed SNPs associated with DON that corresponded to a reduction in prediction accuracy. However, we saw a similar increase in fixed SNPs for yield and no corresponding reduction in accuracy. One possible explanation is that yield is likely conditioned by a greater number of QTL with smaller effects, so an increase in the number of SNPs that become fixed over time would have less effect on prediction accuracy.
Prediction accuracy should be greatest when the training population is closely related to the validation population (Habier et al., 2007; Hayes et al., 2009; Lorenz et al., 2012). In most cases, we observed that the 2006 progeny set, which was genetically most similar to the parent set and had the largest number of direct parents from it, was predicted with the greatest accuracy. The increase in genetic distance between the parental and progeny sets was most closely associated with a decline in prediction accuracy for DON, but not for FHB resistance, yield, or plant height. This indicates that other factors may also contribute to changes in prediction accuracy over time.
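The genetic distances in Fig. 3 (Wright's F_ST and Nei's distance) are computed from allele frequencies. The sketch below uses a simple two-population estimator, F_ST = (H_T − H_S)/H_T summed over markers with equal population weights, and Nei's standard distance D = −ln(J_XY/√(J_X·J_Y)). These are illustrative choices; the exact estimators used for Fig. 3 are not given in this section. The sketch assumes 0/1-coded inbred lines with no missing data.

```python
import numpy as np

def fst_two_pops(geno_a, geno_b):
    """Ratio-of-sums F_ST between two sets of lines (equal population weights)."""
    pa, pb = geno_a.mean(axis=0), geno_b.mean(axis=0)
    pbar = (pa + pb) / 2.0
    hs = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2.0   # mean within-set diversity
    ht = 2 * pbar * (1 - pbar)                            # diversity of the pooled set
    keep = ht > 0                                         # drop markers fixed in both sets
    return float((ht[keep] - hs[keep]).sum() / ht[keep].sum())

def nei_distance(geno_a, geno_b):
    """Nei's standard genetic distance for biallelic markers."""
    pa, pb = geno_a.mean(axis=0), geno_b.mean(axis=0)
    jx = np.mean(pa ** 2 + (1 - pa) ** 2)
    jy = np.mean(pb ** 2 + (1 - pb) ** 2)
    jxy = np.mean(pa * pb + (1 - pa) * (1 - pb))
    return float(-np.log(jxy / np.sqrt(jx * jy)))
```

Both measures are zero for sets with identical allele frequencies and grow as frequencies diverge through selection and drift.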
Populations can dier in the degree of the LD
between markers and QTL due to dri, selection, and/or
recombination (Dekkers, 2004; Barton and Otto, 2005).
Prediction accuracy should increase as LD between
markers and QTL increases. Recombination in breed-
ing populations should reduce LD between markers
and QTL over the time while selection and genetic dri
should increase LD (Pfaelhuber et al., 2008). Habier et
al. (2007) studied the eect of LD on prediction accuracy
over many generations and found a decrease in predic-
tion accuracy was associated with decay of LD. We found
a general increase in adjacent marker LD in the prog-
eny sets over time, while prediction accuracy generally
remained constant or decreased. We also examined the
persistence of adjacent marker LD between the parent
set and each of the progeny sets using the correlation of
r (de Roos et al., 2008; Toosi et al., 2010). e correlation
of r did not decay over the window of time of this experi-
ment, despite the fact that genetic distance between the
parents and progeny sets increased over time. Asoro et
al. (2011) suggested that the ability of early generations
to predict later generations was due to the persistence of
the LD phase between early and late generations. us,
even if validation populations become more genetically
distant from training populations, if the LD phase is con-
sistent, prediction accuracy will be maintained.
Figure 6. Percentage of single nucleotide polymorphisms (SNPs)
that are fixed in the complete marker set (genome-wide) and in
the subsets of markers associated with deoxynivalenol (DON)
concentration, Fusarium head blight (FHB) resistance, yield, and
plant height (see Fig. 5) in each of the five progeny sets between
2006 and 2010.
How Do Trait and Population Characteristics Affect Prediction Accuracy?
Ideally, GS can be applied to traits that vary in H² and genetic architecture. In our study, based on estimates of R² and the π parameter, yield was the most complex trait and plant height the least complex. However, inferences about genetic architecture based on π should be interpreted cautiously (Gianola, 2013). We found that yield, a more complex, lower-H² trait, generally had lower prediction accuracy than a simpler, higher-H² trait such as plant height. This is consistent with other studies in which complex traits controlled by many loci with small effects produced lower prediction accuracy than less complex traits (Hayes et al., 2010). Genomic predictions should be more accurate for traits with higher H² (Hayes et al., 2009; Daetwyler et al., 2010; Lorenz, 2013; Combs and Bernardo, 2013). Prediction accuracy for yield in the current study was higher than the accuracy observed for yield in oat (Avena sativa L.; Asoro et al., 2011). Those authors reasoned that the lower accuracy for oat yield was due to the evaluation of their germplasm in a wider range of environments, which reduced the genetic variance relative to the G × E variance and thereby reduced H² for yield. The barley germplasm in the current study was evaluated in more homogeneous environments, which are the target production and evaluation environments in Minnesota. Therefore, the genetic variance is expected to be higher relative to G × E, leading to a higher H² estimate and increased prediction accuracy.
In addition to trait characteristics such as H² and genetic variance, LD (as discussed above) and population structure can affect prediction accuracy. These factors could have contributed to the striking difference in the response of accuracy to increased training population size for DON versus yield (Fig. 12). Both traits have higher than expected accuracies at very low training population sizes (e.g., 25 individuals). Windhausen et al. (2012) suggested that high accuracy at low training population size can be diagnostic of subpopulation structure affecting accuracy. In this context, we suggest that structure could reduce accuracy at high training population size through the following mechanism. Population structure is a cause of LD: two loci that both differ in allele frequency across subpopulations will be in LD. Thus, structure can cause association between a marker and several QTL. This phenomenon has been an ongoing issue in genome-wide association studies (e.g., Pritchard et al., 2000). In the context of genomic prediction, structure-generated disequilibrium between a marker and several QTL will prevent the marker's estimated effect from converging on the effect of a QTL to which it is actually linked, regardless of the training population size. Though they did not comment on it, Wimmer et al. (2013) observed a phenomenon like this: in their Fig. 6A, at low model complexity, the error of marker effect estimates increases as training population size increases. This increase in error presumably arises from the documented deep structure in rice (Zhao et al., 2011). The question remains why this mechanism would affect DON more strongly than yield. We hypothesize that structure in the Minnesota barley breeding program is more strongly correlated with DON than with yield, given that the program has been purposefully split into a population in which FHB resistance was prioritized and one in which yield and quality continued to be prioritized (Fang et al., 2013).

Figure 7. Distribution of marker R² values for plant height, deoxynivalenol (DON) concentration, Fusarium head blight (FHB) resistance, and yield. R² is the proportion of genetic variance explained by a marker.
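The claim that structure alone creates LD can be checked with a small worked example. Take two unlinked loci that are in linkage equilibrium within each of two subpopulations, and pool the subpopulations. D = p₁₂ − p₁p₂ in the pooled sample is nonzero only when both loci differ in frequency between subpopulations. The function below is a hypothetical illustration, not part of the study's analysis.

```python
def pooled_ld(p_a, p_b, w=0.5):
    """D and r² between two loci in a pooled sample of two subpopulations.

    p_a, p_b: (freq at locus 1, freq at locus 2) in subpopulations A and B.
    w: proportion of lines from subpopulation A.
    Assumes linkage equilibrium within each subpopulation and that both loci
    are polymorphic in the pooled sample (otherwise r² is undefined).
    """
    p1 = w * p_a[0] + (1 - w) * p_b[0]
    p2 = w * p_a[1] + (1 - w) * p_b[1]
    # Two-locus haplotype frequency: product of allele frequencies within each subpopulation
    p12 = w * p_a[0] * p_a[1] + (1 - w) * p_b[0] * p_b[1]
    d = p12 - p1 * p2
    r2 = d * d / (p1 * (1 - p1) * p2 * (1 - p2))
    return d, r2

# Both loci differ between subpopulations: D = 0.16 and r² = 0.4096 despite no linkage
d, r2 = pooled_ld((0.9, 0.9), (0.1, 0.1))
# Only locus 1 differs: D = 0, so structure alone does not create LD here
d0, r2_0 = pooled_ld((0.9, 0.5), (0.1, 0.5))
```

An r² of about 0.41 between unlinked loci is large, which illustrates how a marker can pick up effects of several structure-correlated QTL.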
The availability of genome-wide markers can improve our understanding of genetic architecture and the extent to which epistasis influences complex traits. In general, the four genomic prediction models tested produced similar accuracies across the four traits investigated in this study. The four models differed in their assumptions about the genetic architecture of the trait and the extent to which nonadditive effects contribute to the prediction. Lande and Thompson (1990) suggested the use of epistatic effects in addition to additive effects in MAS schemes. Liu et al. (2003) found that including epistasis improved both the response and efficiency of MAS. In some studies, including epistasis in genomic prediction models through nonadditive kernels increased prediction accuracy over RR-BLUP (Crossa et al., 2010; Wang et al., 2012). In our study, simple additive models (RR-BLUP and Bayes Cπ) performed similarly to those that account for both additive and nonadditive effects (Exponential and Gaussian kernels). These results are similar to those of a recent study of barley breeding lines evaluated for DON concentration and FHB resistance, in which Bayes Cπ and RR-BLUP produced the same level of accuracy (Lorenz et al., 2012).
Practical Implications for Breeding
The increasing ease and rapidly declining cost of genotyping mean that assembling phenotype data to train prediction models will be the limiting step in implementing GS. We found that using the contemporary parent data was slightly, but not significantly, better than using historic parent data to train a prediction model. The contemporary data were balanced and were corrected for field spatial variability using the common checks, whereas the historic data were unbalanced and were not corrected for field variability. Nevertheless, the prediction accuracy from historic data was, in most cases, around 0.50 for each of the 5 yr. This level of accuracy suggests GS would be effective if its breeding cycle is half as long as that of phenotypic selection (Asoro et al., 2011). These results suggest that breeders could reduce time and costs by training prediction models on unbalanced historical data after proper adjustment for spatial variability and trial effects. Historical unbalanced phenotypic data were also used to assess the use of GS in oat (Asoro et al., 2011). Initiating GS with existing data and later incorporating contemporary data sets should allow breeders to realize the benefits of GS sooner and improve effectiveness over time.
The size and composition of the training population are important factors that can be manipulated to improve prediction accuracy. Breeders may consider combining training data sets to maximize the use of available phenotypic and genotypic information and to generate larger populations (Hayes et al., 2009; de Roos et al., 2009; Asoro et al., 2011; Lorenz et al., 2012; Technow et al., 2013). Lorenz et al. (2012) found little to no improvement in prediction accuracy for FHB resistance and DON concentration when increasing the size of the training population by combining different barley breeding populations. Conversely, in maize, combining the flint and dent heterotic groups increased prediction accuracies for northern corn leaf blight resistance by 10 and 13% when predicting the dent and flint heterotic groups, respectively (Technow et al., 2013). In our study, we found only a slight improvement or a reduction in accuracy when increasing the population size by adding progeny sets from the same breeding program to the parent set. However, adding progeny to the parent set altered both the size and the composition of the training populations. We therefore separated these two factors by generating training populations through random sampling from the combined data set. Interestingly, prediction accuracy for DON concentration plateaued at a much smaller population size than prediction accuracy for yield (Fig. 12). Prediction accuracy for yield did not level off, suggesting that the benefit from increasing training population size may depend on the trait.

Figure 8. Scatterplot matrix for all prediction models for yield, using the contemporary parent data set as the training population to predict all progeny sets (2006–2010). Models: ridge regression best linear unbiased prediction (RR-BLUP), the Gaussian kernel model (GAUSS), the Exponential kernel model (EXP), and Bayes Cπ.

Figure 9. Prediction accuracy for yield using historic, contemporary, and combined (historic and contemporary) parent data to predict five progeny sets using ridge regression best linear unbiased prediction.
In addition to optimizing prediction accuracy, the effectiveness of GS will increase by shortening the breeding cycle time and reducing the cost of selection (Heffner et al., 2010; Jannink et al., 2010). In the University of Minnesota's barley breeding program, GS is implemented at the F3 stage for FHB resistance, DON concentration, and yield. This is 1 yr after crossing parents, compared with the 4-yr breeding cycle that is typical for phenotypic selection. The prediction accuracies that we observed based on progeny validation always exceeded 0.25, indicating that GS should exceed phenotypic selection in gain per unit time. Combined with rapidly decreasing genotyping costs, this suggests that GS should substantially improve breeding efficiency.
Supplemental Information Available
Supplemental information is included with this article.
Supplemental Figure S1. Prediction accuracy for
(A) DON accumulation, (B) FHB resistance, (C) yield,
and (D) plant height using RR-BLUP, Exponential kernel
method (EXP), Gaussian kernel method (GAUSS), and
Bayes Cπ when using the parent set as a training population to predict the five progeny sets.
Supplemental Table S1. Number of experimental trials for the parent and five progeny sets for deoxynivalenol (DON) accumulation, Fusarium head blight (FHB) resistance, yield, and plant height. Each line was replicated twice in each experiment.
Figure 10. Prediction accuracy (rₐ/H, where rₐ is predictive ability and H is the square root of heritability) for the four traits using ridge regression best linear unbiased prediction in three scenarios for training populations: using the contemporary parent data set to predict each progeny set (circle), using the sequential addition of progeny sets to the contemporary parent data set to predict the later progeny set (triangle), and using the two previous years of the progeny sets to predict the later progeny set (square). The heritability for each progeny set used as the validation set is shown in the solid bar.
Supplemental Table S2. Mean prediction accuracy and p-values when using the parent set as the training population and the five progeny sets as the validation populations for deoxynivalenol (DON) accumulation, Fusarium head blight (FHB) resistance, yield, and plant height using RR-BLUP, the Exponential kernel method, the Gaussian kernel method, and Bayes Cπ.
Acknowledgments
We thank Ed Schiefelbein, Guillermo Velasquez, Karen Beaubien, Richard Horsley, the Minnesota Agricultural Experiment Station NW and WC Research and Outreach Centers, and the University of Minnesota Small Grains Pathology Lab for their contributions to conducting field trials and collecting data. In addition, we thank Shiaoman Chao and Yanhong Dong for SNP genotyping and toxin analysis, respectively. Additional thanks to Yang Da for helpful suggestions on an earlier draft of this manuscript. This work was supported by grants from the National Institute of Food and Agriculture, USDA Award Number 2009-65300-05661, the U.S. Wheat and Barley Scab Initiative USDA–ARS Agreement No. 59-0206-9-072, USDA HATCH project MIN-13-030, and the Rahr Foundation. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the view of the USDA.
Figure 12. Relationship between population size and prediction accuracy for deoxynivalenol (DON) concentration and yield. Three
scenarios are presented: (A) Using the contemporary parent data set as the training population to predict the 2008 and 2010 progeny
sets. Each point represents a subset of the training population by random sampling 500 times. (B) Using the combined contemporary
parent data set and progeny sets before the validation set to predict the 2008 and 2010 sets. Each point represents a subset of the
training population by random sampling 500 times. (C) The sequential addition of a progeny set to the contemporary parent data set
as a training population to predict the 2008 and 2010 progeny sets. The training population sizes are 168 (parent set), 264 (parent set + 2006 progeny set, n = 96), and so on.
Figure 11. Relationship between the predictive ability (correla-
tion between genomic estimated breeding value and phenotypic
performance) when using a contemporary parent data set to
predict progeny sets using ridge regression best linear unbiased
prediction (RR-BLUP), and heritability of the contemporary parent
training population for plant height, deoxynivalenol (DON) con-
centration, Fusarium head blight (FHB) resistance, and yield.
References
Asoro, F.G., M.A. Newell, W.D. Beavis, M.P. Scott, and J.-L. Ja nnink. 2011.
Accuracy and trai ning population design for genomic selection on quantita-
tive traits in elite North American oats. Plant Gen. 4:132–144. doi:10.3835/
plantgenome2011.02.0007
Barrett, J.C., J. Maller, and M.J. Daly. 2005. Haploview : Analysis and v isualiza-
tion of LD and haplotype maps. Bioinformatics 21:263265. doi:10.1093/
bioinformat ics/bth457
Barton, N.H., and S. P. Otto. 2005. Evolut ion of recombination due to ra ndom
dri. Genetics 169:23532370. doi:10.1534/genetics.104.032821
Bernardo, R. 2008. Molecula r markers and selec tion for complex traits in plants:
Learning from the last 20 years. Crop Sci. 48:16491664. doi:10.2135/crop-
sci2008.03.0131
Bernardo, R. 2010. Breed ing for quantitat ive traits in pla nts. Stemma Press,
Woo dbur y, MN .
Blake, V.C. , J.G. Kling, P. M. Hayes, and J.L. Jannink. 2 012. e Hordeum Tool-
box—e Barley Coord inated Agricu ltural Projec t genotype and phenotype
resource. Plant Gen. 5:8191. doi:10.3835/plantgenome2012.03.0002
Boukerrou, L., and D. Rasmusson. 1990. Breeding for hig h biomass yield in spr ing
barley. Crop Sci. 30:3135. doi:10.2135/cropsci1990.0011183X003000010007x
Castro, A.J., F. Capettini, A.E. Corey, T. Filichkina, P.M. Hayes, A. Kleinhofs, D.
Kudrna, K. Richardson, S. Sandoval-Islas, C. Rossi, and H. Vivar. 2003. Map-
ping and py ramiding of qu alitative and quantitative resistance to st ripe rust in
barley. eor. Appl. Genet . 107:922930. doi:10.1007/s00122-003-1329-6
Chen, C.Y. , I. Misztal, I. Aguilar, S. Tsur uta , T.H.E. Meuwissen, S.E. Aggrey,
T. Wing, and W.M. Muir. 2011. Genome-wide marker-assisted select ion
combining all pedigree phenotypic information with genotypic data in one
step: An example using broiler chickens. J. Anim. Sci. 89:2328. doi:10.2527/
jas.2010-3071
Close, T.J ., P.R . Bhat, S. Lonardi, Y. Wu, N. Rostoks, L. Ramsay, A. Druka, N.
Stein, J.T. Svensson, S. Wanamaker, S. Bozdag, M.L. Roose, M.J. Moscou,
S. Chao, R.K. Var shney, P. Szuecs, K. Sato, P.M . Hayes, D.E. Matthews,
A. Kleinhofs, G.J. Muehlbauer, J. DeYo ung, D.F. Marshall, K. Madishetty,
R.D. Fenton, P. Condamine, A. Graner, and R. Waugh. 2009. Development
and implementation of high-throughput SNP genot yping in barley. BMC
Genomics 10:582. doi:10.1186/1471-2164-10-582
Collins, H.M., J.F. Panozzo, S.J. Logue, S. P. Jeeries, and A.R. Barr. 2003. Map-
ping and validation of chromosome regions associated with high malt
extract in barley (Hordeum vulgare L.). Aust. J. Agric. Re s. 54:12231240.
doi:10.1071/AR02201
Combs, E., and R. Bernardo. 2013. Accuracy of genomewide selection for dier-
ent trait s with constant population size, heritabilit y, and number of markers.
Plant Gen. 6:1–7. doi:10.3835/plantgenome2012.11.0030
Condón, F., C. Gustus, D.C. Rasmusson, and K .P. Smith. 2008. Eect of
advanced cycle breeding on genetic diversit y in barley breedi ng germplasm.
Crop Sci. 48:10271036. doi:10.2135/cropsci2007.07.0415
Crossa, J., G. de los Campos, P. Perez, D. Gianola, J. Burgueno, J. Luis Araus, D. Makumbi, R.P. Singh, S. Dreisigacker, J. Yan, V. Arief, M. Banziger, and H. Braun. 2010. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186:713–724. doi:10.1534/genetics.110.118521
Daetwyler, H.D., R. Pong-Wong, B. Villanueva, and J.A. Woolliams. 2010. The impact of genetic architecture on genome-wide evaluation methods. Genetics 185:1021–1031. doi:10.1534/genetics.110.116855
Dekkers, J.C.M. 2004. Commercial application of marker- and gene-assisted selection in livestock: Strategies and lessons. J. Anim. Sci. 82:E313–E328.
de los Campos, G., D. Gianola, and G.J.M. Rosa. 2009. Reproducing kernel Hilbert spaces regression: A general framework for genetic evaluation. J. Anim. Sci. 87:1883–1887. doi:10.2527/jas.2008-1259
de Roos, A.P.W., B.J. Hayes, and M.E. Goddard. 2009. Reliability of genomic predictions across multiple populations. Genetics 183:1545–1553. doi:10.1534/genetics.109.104935
de Roos, A.P.W., B.J. Hayes, R.J. Spelman, and M.E. Goddard. 2008. Linkage disequilibrium and persistence of phase in Holstein–Friesian, Jersey and Angus cattle. Genetics 179:1503–1512. doi:10.1534/genetics.107.084301
Endelman, J.B. 2011. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Gen. 4:250–255. doi:10.3835/plantgenome2011.08.0024
Fang, Z., A. Eule-Nashoba, C. Powers, T.Y. Kono, S. Takuno, P.L. Morrell, and K.P. Smith. 2013. Comparative analyses identify the contributions of exotic donors to disease resistance in a barley experimental population. G3: Genes Genomes Genet. 3:1945–1953. doi:10.1534/g3.113.007294
Gianola, D. 2013. Priors in whole-genome regression: The Bayesian alphabet returns. Genetics 194:573–596. doi:10.1534/genetics.113.151753
Gianola, D., and J.B.C.H.M. van Kaam. 2008. Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics 178:2289–2303. doi:10.1534/genetics.107.084285
Goddard, M.E., and B.J. Hayes. 2007. Genomic selection. J. Anim. Breed. Genet. 124:323–330. doi:10.1111/j.1439-0388.2007.00702.x
Habier, D., R.L. Fernando, and J.C.M. Dekkers. 2007. The impact of genetic relationship information on genome-assisted breeding values. Genetics 177:2389–2397.
Hayes, B.J., P.J. Bowman, A.C. Chamberlain, and M.E. Goddard. 2009. Genomic selection in dairy cattle: Progress and challenges. J. Dairy Sci. 92:433–443. doi:10.3168/jds.2008-1646
Hayes, B.J., J. Pryce, A.J. Chamberlain, P.J. Bowman, and M.E. Goddard. 2010. Genetic architecture of complex traits and accuracy of genomic prediction: Coat colour, milk-fat percentage and type in Holstein cattle as contrasting model traits. PLoS Genet. 6:E1001139. doi:10.1371/journal.pgen.1001139
Hener, E.L., J.-L . Jannink, and M.E. Sorrells. 2 011. Genomic selection accuracy
using multifamily prediction models in a wheat breeding program. Pla nt
Gen. 4:6575. doi:10.3835/plantgenome.2010.12.0029
Hener, E.L., A.J. Lorenz, J. Jannink, and M.E. Sorrells. 2 010. Plant breeding
with genomic selection: Gain per unit t ime and cost. Crop Sci. 50:16811690.
doi:10 .2135/cropsci2009.11.0662
Hener, E.L., M.E. Sorrells, and J.-L. Ja nnink. 2009. Genomic selection for crop
improvement. Crop Sci. 49:1–12 . doi:10.2135/cropsci2008.08.0512
Hoeinz, N., D. Borcha rdt, K. Weissleder, and M. Frisch. 2012. Genome-based
predict ion of test cross perfor mance in two subsequent breeding cycles.
eor. Appl. Genet. 125:1639164 5. doi:10.1007/s00122-012-1940-5
Horsley, R.D., J.D. Franckowiak, P.B. Schwarz, and S.M. Neate. 2006a. Registration of ‘Stellar-ND’ barley. Crop Sci. 46:980–981. doi:10.2135/cropsci2005.06-0121
Horsley, R.D., J.D. Franckowiak, P.B . Schwarz, and B.J. Steenson. 2002. Reg-
istrat ion of ‘Drummond’ barley. Crop Sci. 42:664665. doi:10.2135/crop-
sci2002.0664
Horsley, R.D., D. Schmierer, C. Maier, D. Kudrna, C.A. Urrea, B.J. Steffenson, P.B. Schwarz, J.D. Franckowiak, M.J. Green, B. Zhang, and A. Kleinhofs. 2006b. Identification of QTLs associated with Fusarium head blight resistance in barley accession CIho 4196. Crop Sci. 46:145–156. doi:10.2135/cropsci2005.0247
Jannink, J.-L., A.J. Lorenz, and H. Iwata. 2010. Genomic selection in plant breeding: From theory to practice. Brief. Funct. Genomics 9:166–177. doi:10.1093/bfgp/elq001
Kang, H.M., N.A. Zaitlen, C.M. Wade, A. Kirby, D. Heckerman, M.J. Daly, and E. Eskin. 2008. Efficient control of population structure in model organism association mapping. Genetics 178:1709–1723. doi:10.1534/genetics.107.080101
Kizilkaya, K., R.L. Fernando, and D.J. Garrick. 2010. Genomic prediction of simulated multibreed and purebred performance using observed fifty thousand single nucleotide polymorphism genotypes. J. Anim. Sci. 88:544–551. doi:10.2527/jas.2009-2064
Lande, R., and R. ompson. 1990. Eciency of marker-assiste d selection in the
improvement of qua ntitative traits. Genetics 124:743 756.
Legarra, A., C. Robert-Granie, E. Manfredi, and J.M. Elsen. 2008. Performance of genomic selection in mice. Genetics 180:611–618. doi:10.1534/genetics.108.088575
Liu, P., J. Zhu, X. Lou, and Y. Lu. 2003. A method for marker-assisted selection based on QTLs with epistatic effects. Genetica 119:75–86.
Lorenz, A.J. 2013. Resource allocation for maximizing prediction accuracy and genetic gain of genomic selection in plant breeding: A simulation experiment. G3: Genes Genomes Genet. 3:481–491. doi:10.1534/g3.112.004911
Lorenz, A.J., S. Chao, F.G. Asoro, E.L. Heffner, T. Hayashi, H. Iwata, K.P. Smith, M.E. Sorrells, and J.-L. Jannink. 2011. Genomic selection in plant breeding: Knowledge and prospects. Adv. Agron. 110:77–123. doi:10.1016/B978-0-12-385531-2.00002-5
Lorenz, A.J., K.P. Smith, and J.-L. Jannink. 2012. Potential and optimization of genomic selection for Fusarium head blight resistance in six-row barley. Crop Sci. 52:1609–1621. doi:10.2135/cropsci2011.09.0503
Lorenzana, R.E., and R. Bernardo. 2009. Accuracy of genotypic value predictions for marker-based selection in biparental plant populations. Theor. Appl. Genet. 120:151–161. doi:10.1007/s00122-009-1166-3
Luan, T., J.A. Woolliams, S. Lien, M. Kent, M. Svendsen, and T.H. Meuwissen. 2009. The accuracy of genomic selection in Norwegian red cattle assessed by cross-validation. Genetics 183:1119–1126. doi:10.1534/genetics.109.107391
Ma, Z., B.J. Steenson, L.K. Prom, and N. L .V. Lapitan. 2000. Mapping quantita-
tive trait loci for Fusar ium head blight resistance in barley. Phytopatholog y
90:10791088. doi:10.1094/PHYTO.2000.90.10.1079
Massman, J., B. Cooper, R. Horsley, S. Neate, R. Dill-Macky, S. Chao, Y. Dong, P. Schwarz, G.J. Muehlbauer, and K.P. Smith. 2011. Genome-wide association mapping of Fusarium head blight resistance in contemporary barley breeding germplasm. Mol. Breed. 27:439–454. doi:10.1007/s11032-010-9442-0
Mesn, A., K .P. Smith, R. Waug h, R. Dill-Macky, C.K. Evans, C.D. Gustus, and
G.J. Muehlbauer. 2003. Quantitative tra it loci for Fusarium head blight resis-
tance in barley detected in a two-rowed by six-rowed population. Crop Sci.
43:307318. doi:10.2135/cropsci2003.3070
Meuwissen, T.H.E., B.J. Hayes, and M.E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829.
Mirocha, C.J., E. Kolaczkowski, W. Xie, H. Yu, and H. Jelen. 1998. Analysis of deoxynivalenol and its derivatives (batch and single kernel) using gas chromatography/mass spectrometry. J. Agric. Food Chem. 46:1414–1418. doi:10.1021/jf970857o
Nei, M. 1987. Molecular evolutionary genetics. Columbia Univ. Press, New York.
Pfaelhuber, P., A. Lehner t, and W. Stephan. 2008. Linkage d isequilibrium
under genet ic hitchhik ing in nite populations. Genetics 179:527537.
doi:10.1534/gene tics.107.081497
Piepho, H.P. 2009. Ridge regression and extensions for genomewide selection in maize. Crop Sci. 49:1165–1176. doi:10.2135/cropsci2008.10.0595
Poland, J., J. Endelman, J. Dawson, J. Rutkoski, S. Wu, Y. Manes, S. Dreisigacker, J. Crossa, H. Sánchez-Villeda, M. Sorrells, and J.-L. Jannink. 2012. Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Gen. 5:103–113. doi:10.3835/plantgenome2012.06.0006
Pritchard, J.K., M. Stephens, N.A. Rosenberg, and P. Donnelly. 2000. Association mapping in structured populations. Am. J. Hum. Genet. 67:170–181. doi:10.1086/302959
R Development Core Team. 2012. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Rasmusson, D.C., K.P. Smith, R. Dill-Macky, E.L. Schiefelbein, and J.V. Wiersma. 2001. Registration of ‘Lacey’ barley. Crop Sci. 41:1991. doi:10.2135/cropsci2001.1991
Rasmusson, D.C., and R.D. Wilcoxson. 1983. Registration of ‘Robust’ barley. Crop Sci. 23:1216. doi:10.2135/cropsci1983.0011183X002300060048x
Rasmusson, D.C., R.D. Wilcoxson, and J.V. Wiersma. 1993. Registration of ‘Stander’ barley. Crop Sci. 33:1403. doi:10.2135/cropsci1993.0011183X003300060057x
Rutkoski, J., J. Benson, Y. Jia, G. Brown-Guedira, J.-L. Jannink, and M. Sorrells. 2012. Evaluation of genomic prediction methods for Fusarium head blight resistance in wheat. Plant Gen. 5:51–61. doi:10.3835/plantgenome2012.02.0001
SAS Institute. 2011. The SAS system for Windows. v.9.3. SAS Inst., Cary, NC.
Slotta, T.A., L. Brady, and S. Chao. 2008. High throughput tissue preparation for large-scale genotyping experiments. Mol. Ecol. Resour. 8(1):83–87. doi:10.1111/j.1471-8286.2007.01907.x
Smith, K.P., A. Budde, R. Dill-Macky, D.C. Rasmusson, E. Schiefelbein, B. Steffenson, J.J. Wiersma, J.V. Wiersma, and B. Zhang. 2013. Registration of ‘Quest’ spring malting barley with improved resistance to Fusarium head blight. J. Plant Reg. 7:125–129. doi:10.3198/jpr2012.03.0200crc
Smith, K.P., D.C. Rasmusson, E. Schiefelbein, J.J. Wiersma, J.V. Wiersma, A. Budde, R. Dill-Macky, and B. Steffenson. 2010. Registration of ‘Rasmusson’ barley. J. Plant Reg. 4:164–167. doi:10.3198/jpr2009.10.0622crc
Steenson, B.J. 2003. Fusarium head blight of barley: Impact, epidemics, man-
agement, and strategies for identifying and utilizing genetic resistance. In:
K.J. Leonard and W.R. Bushnel l, editors, Fusarium head blight of wheat and
barley. APS Press, St. Paul, MN. p. 241–295.
Technow, F., A. Bürger, and A.E. Melchinger. 2013. Genomic prediction of northern corn leaf blight resistance in maize with combined or separated training sets for heterotic groups. G3: Genes Genomes Genet. 3:197–203. doi:10.1534/g3.112.004630
Toosi, A., R. Fernando, and J. Dekkers. 2010. Genomic selection in admixed and crossbred populations. J. Anim. Sci. 88:32–46. doi:10.2527/jas.2009-1975
Wang, D., S.I. El-Basyoni, S.P. Baenziger, J. Crossa, K.M. Eskridge, and I. Dweikat. 2012. Prediction of genetic values of quantitative traits with epistatic effects in plant breeding populations. Heredity 109:313–319. doi:10.1038/hdy.2012.44
Weir, B.S., and C.C. Cockerham. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38:1358–1370.
Wimmer, V., C. Lehermeier, T. Albrecht, H.-J. Auinger, Y. Wang, and C.-C. Schön. 2013. Genome-wide prediction of traits with different genetic architecture through efficient variable selection. Genetics 195:573–587. doi:10.1534/genetics.113.150078
Windhausen, V.S., G.A. Atlin, J.M. Hickey, J. Crossa, J.-L. Jannink, M.E. Sorrells, B. Raman, J.E. Cairns, A. Tarekegne, K. Semagn, Y. Beyene, P. Grudloyma, F. Technow, C. Riedelsheimer, and A. Melchinger. 2012. Effectiveness of genomic prediction of maize hybrid performance in different breeding populations and environments. G3: Genes Genomes Genet. 2:1427–1436. doi:10.1534/g3.112.003699
Wright, S. 1965. The interpretation of population structure by F-statistics with special regard to systems of mating. Evolution 19:395–420.
Xu, Y., and J.H. Crouch. 2008. Marker-assisted selection in plant breeding: From publications to practice. Crop Sci. 48:391–407. doi:10.2135/cropsci2007.04.0191
Zhao, Y., M. Gowda, W. Liu, T. Würschum, H.P. Maurer, F.H. Longin, N. Ranc, and J.C. Reif. 2012. Accuracy of genomic selection in European maize elite breeding populations. Theor. Appl. Genet. 124:769–776. doi:10.1007/s00122-011-1745-y
Zhao, K., C.-W. Tung, G.C. Eizenga, M.H. Wright, M.L. Ali, A.H. Price, G. Norton, M.R. Islam, A. Reynolds, J. Mezey, A.M. McClung, C.D. Bustamante, and S. McCouch. 2011. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nature Comms. 2:467. doi:10.1038/ncomms1467
Zhong, S., J.C.M. Dekkers, R.L. Fernando, and J.-L. Jannink. 2009. Factors affecting accuracy from genomic selection in populations derived from multiple inbred lines: A barley case study. Genetics 182:355–364. doi:10.1534/genetics.108.098277
... The accuracy of GS, defined as the correlation between phenotypically estimated values and genomic estimated breeding values, determines the usefulness of GS in a program (Rabier et al., 2016). Several factors are known to affect GS prediction accuracies, including the number of markers used (Zhong et al., 2009; Asoro et al., 2011), the size of the TP (Asoro et al., 2011), the relatedness of individuals between TP and PP (Clark et al., 2012; Sallam et al., 2015), and the extent of linkage disequilibrium between the markers and causal loci (Zhong et al., 2009; Brito et al., 2011). Aside from those factors, different statistical models give different prediction accuracies (Heslot et al., 2012). ...
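The accuracy definition in the excerpt above is a Pearson correlation between observed (phenotypic) values and GEBVs. Other excerpts on this page call the same quantity predictive ability. The following is a minimal plain-Python sketch of that computation. The function names and the numbers are hypothetical illustrations, not values from any cited study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def prediction_accuracy(phenotypes, gebvs):
    """Hypothetical helper: accuracy as r(phenotype, GEBV) in a validation set."""
    return pearson_r(phenotypes, gebvs)


# Made-up validation-set values for illustration only.
phenotypes = [4.1, 5.3, 3.8, 6.0, 5.1]
gebvs = [0.2, 0.9, -0.4, 1.1, 0.5]
print(round(prediction_accuracy(phenotypes, gebvs), 3))  # prints 0.953
```

Because the correlation ignores the scale and location of the GEBVs, a model can rank lines well (high r) even when its predictions are biased in absolute terms.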
... We are interested in evaluating the accuracy of GS in sugar kelp, and we considered several model options. A basic Genomic Best Linear Unbiased Prediction (GBLUP) model often provides adequate accuracies when compared to other models including Bayesian approaches (Heslot et al., 2012; Sallam et al., 2015; Huang et al., 2016). These models can be extended to account for genotype by environment interaction (GxE) effects, which affect selection accuracy between environments for both GS and phenotypic selection (Resende et al., 2011; Lado et al., 2016). ...
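The following is a minimal sketch of the basic GBLUP model mentioned above, written in plain Python. It rests on several stated assumptions. Markers are coded 0/1/2. The genomic relationship matrix follows VanRaden's (2008) first method, G = ZZ′ / (2 Σ p(1 − p)). The variance ratio λ = σ²_e / σ²_g is taken as known rather than estimated by REML. The training mean stands in for the generalized-least-squares intercept. All function names are hypothetical.

```python
def vanraden_g(M):
    """Genomic relationship matrix G = ZZ' / (2 * sum p(1 - p)).

    M is a list of individuals (rows) with markers coded 0/1/2 copies of
    the counted allele; p is that allele's frequency across all rows of M.
    """
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2 * n) for j in range(m)]
    Z = [[row[j] - 2 * p[j] for j in range(m)] for row in M]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    return [[sum(a * b for a, b in zip(Z[i], Z[k])) / scale for k in range(n)]
            for i in range(n)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def gblup_predict(M, y_train, train_idx, test_idx, lam):
    """GEBVs: mu + G[test, train] (G[train, train] + lam*I)^-1 (y - mu).

    y_train must be ordered like train_idx. lam = sigma_e^2 / sigma_g^2 is
    assumed known, and mu is approximated by the training mean.
    """
    G = vanraden_g(M)
    mu = sum(y_train) / len(y_train)
    lhs = [[G[i][k] + (lam if i == k else 0.0) for k in train_idx]
           for i in train_idx]
    alpha = solve(lhs, [yi - mu for yi in y_train])
    return [mu + sum(G[j][k] * a for k, a in zip(train_idx, alpha))
            for j in test_idx]


# Toy genotypes (hypothetical): five lines, six markers.
M = [[0, 1, 2, 0, 1, 2],
     [2, 1, 0, 2, 1, 0],
     [0, 2, 2, 1, 0, 2],
     [2, 0, 0, 1, 2, 0],
     [1, 1, 1, 1, 1, 1]]
y_train = [1.0, -1.0, 1.2, -0.8]  # phenotypes of lines 0-3
print(gblup_predict(M, y_train, [0, 1, 2, 3], [4], lam=1.0))
```

In this toy example every allele frequency is 0.5 and line 4 is heterozygous everywhere. Its centred genotypes are therefore all zero, and its GEBV collapses to the training mean (about 0.1). A larger λ shrinks all predictions toward that mean.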
... In previous studies, the GBLUP GS model produced similar accuracies across various traits compared to other GS models such as Bayesian approaches (Heslot et al., 2012; Sallam et al., 2015). Similarly, we did not observe significant differences between the model prediction accuracy of GBLUP (whether using pedigree alone, or using pedigree plus marker information) and GCA+SCA (whether using pedigree alone or using pedigree and marker information together). ...
Article
Sugar kelp (Saccharina latissima) has a biphasic life cycle, allowing selection on both the diploid sporophytes (SPs) and haploid gametophytes (GPs). We trained a genomic selection (GS) model from farm-tested SP phenotypic data and used a mixed-ploidy additive relationship matrix to predict GP breeding values. Top-ranked GPs were used to make crosses for further farm evaluation. The relationship matrix included 866 individuals: a) founder SPs sampled from the wild; b) progeny GPs from founders; c) farm-tested SPs crossed from b); and d) progeny GPs from farm-tested SPs. The complete pedigree-based relationship matrix was estimated for all individuals. A subset of founder SPs (n = 58) and GPs (n = 276) were genotyped with Diversity Array Technology and whole genome sequencing, respectively. We evaluated GS prediction accuracy via cross validation for SPs tested on farm in 2019 and 2020 using a basic GBLUP model. We also estimated the general combining ability (GCA) and specific combining ability (SCA) variances of parental GPs. A total of 11 yield-related and morphology traits were evaluated. The cross validation accuracies for dry weight per meter (r ranged from 0.16 to 0.35) and wet weight per meter (r ranged from 0.19 to 0.35) were comparable to GS accuracy for yield traits in terrestrial crops. For morphology traits, cross validation accuracy exceeded 0.18 in all scenarios except for blade thickness in the second year. Accuracy in a third validation year (2021) was 0.31 for dry weight per meter over a confirmation set of 87 individuals. Our findings indicate that progress can be made in sugar kelp breeding by using genomic selection.
... To be attractive, the GP models should achieve a moderate level of prediction accuracy (PA), the latter being directly proportional to genetic gain (Heffner et al. 2010). PA is calculated from the correlation between the genomic estimated breeding values (GEBVs) and the true breeding values (TBVs) (Heffner et al. 2009) and depends on various parameters, such as population size (Crossa et al. 2013), the genetic architecture of the target trait(s) (Sallam et al. 2015), marker density (Poland and Rutkoski 2016), and the statistical model (Lozada and Carter 2019). Therefore, the successful implementation of genomic prediction strategies in breeding programs requires careful consideration of all these factors. ...
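The proportionality between PA and genetic gain noted above is usually expressed through the breeder's equation written per unit time, ΔG = i · r · σ_A / L. Here i is the selection intensity, r the prediction accuracy, σ_A the additive genetic standard deviation, and L the cycle length in years. The following is a sketch with hypothetical helper names. It derives i from the proportion selected, assuming truncation selection on a normally distributed criterion. With i, σ_A and L held fixed, the expected gain scales linearly with r.

```python
from statistics import NormalDist


def selection_intensity(proportion_selected):
    """Selection intensity under truncation selection on a normally
    distributed criterion (an assumption): i = phi(z) / p."""
    if not 0 < proportion_selected < 1:
        raise ValueError("proportion_selected must be in (0, 1)")
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - proportion_selected)  # truncation point
    return std_normal.pdf(z) / proportion_selected


def gain_per_year(i, accuracy, sigma_a, cycle_years):
    """Breeder's equation per unit time: delta_G = i * r * sigma_A / L."""
    return i * accuracy * sigma_a / cycle_years


i = selection_intensity(0.10)  # keep the top 10 %
print(round(i, 3))  # about 1.755
# Hypothetical sigma_A = 2.0 trait units and a 2-year cycle:
for r in (0.3, 0.6):
    print(r, round(gain_per_year(i, r, 2.0, 2.0), 3))
```

The same expression shows why GS can pay off even at moderate accuracy. Shortening L by selecting on genotypes before phenotyping can offset a lower r.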
Article
Key message Simultaneous improvement for GY and GPC by using GWAS and GBLUP suggested a significant application in durum wheat breeding. Abstract Despite the importance of grain protein concentration (GPC) in determining wheat quality, its negative correlation with grain yield (GY) is still one of the major challenges for breeders. Here, a durum wheat panel of 200 genotypes was evaluated for GY, GPC, and their derived indices (GPD and GYD), under eight different agronomic conditions. The plant material was genotyped with the Illumina 25 k iSelect array, and a genome-wide association study was performed. Two statistical models revealed dozens of marker-trait associations (MTAs), each explaining up to 30% of the phenotypic variance. Two markers on chromosomes 2A and 6B were consistently identified by both models and were found to be significantly associated with GY and GPC. MTAs identified for phenological traits co-mapped to well-known genes (i.e., Ppd-1, Vrn-1). The significance values (p-values) that measure the strength of the association of each single nucleotide polymorphism marker with the target traits were used to perform genomic prediction by using a weighted genomic best linear unbiased prediction model. The trained models were ultimately used to predict the agronomic performances of an independent durum wheat panel, confirming the utility of genomic prediction, although environmental conditions and genetic backgrounds may still be a challenge to overcome. The results generated through our study confirmed the utility of GPD and GYD to mitigate the inverse GY and GPC relationship in wheat, provided novel markers for marker-assisted selection and opened new ways to develop cultivars through genomic prediction approaches.
... In this approach, the dataset is split into two (the calibration set and the validation set), making it possible to estimate accuracies in a given population while keeping parameters such as marker density, population structure or allele frequencies constant. However, cross-validation tends to overestimate accuracy relative to other validation approaches, such as inter-set validation or progeny validation [63,76,77]. Most of the studies performed in rice have used cross-validation to obtain estimates of prediction accuracy [33]. ...
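The validation schemes contrasted above differ only in how lines are assigned to the calibration and validation sets. The following plain-Python sketch uses hypothetical function names. It produces random k-fold splits, where both sets come from the same population. For comparison, it also produces a forward split that trains on an earlier breeding set and validates on a later one, analogous to the parent-to-progeny validation described in this article's abstract. Model fitting is deliberately left out, and any predictor could be applied to each split.

```python
import random


def kfold_splits(n, k, seed=0):
    """Random k-fold cross-validation: yield (train_idx, test_idx) pairs.

    Every index in range(n) lands in exactly one test fold, so calibration
    and validation lines are drawn from the same population.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [sorted(idx[f::k]) for f in range(k)]
    for f, test in enumerate(folds):
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, test


def forward_split(set_labels, train_sets, test_set):
    """Hypothetical forward (progeny-style) validation: train on earlier
    breeding sets and validate on a later one."""
    train = [i for i, s in enumerate(set_labels) if s in train_sets]
    test = [i for i, s in enumerate(set_labels) if s == test_set]
    return train, test


for train, test in kfold_splits(10, 5, seed=1):
    print(len(train), test)

labels = ["parent", "parent", "parent", "progeny1", "progeny1", "progeny2"]
print(forward_split(labels, {"parent"}, "progeny1"))  # ([0, 1, 2], [3, 4])
```

Random folds mix lines from all sets, so allele frequencies and structure match between calibration and validation. This matching is one reason cross-validation can give higher estimates than a forward split.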
Article
Improving plant performance in salinity-prone conditions is a significant challenge in breeding programs. Genomic selection is currently integrated into many plant breeding programs as a tool for increasing selection intensity and precision for complex traits and for reducing breeding cycle length. A rice reference panel (RP) of 241 Oryza sativa L. japonica accessions genotyped with 20,255 SNPs grown in control and mild salinity stress conditions was evaluated at the vegetative stage for eight morphological traits and ion mass fractions (Na and K). Weak to strong genotype-by-condition interactions were found for the traits considered. Cross-validation showed that the predictive ability of genomic prediction methods ranged from 0.25 to 0.64 for multi-environment models with morphological traits and from 0.05 to 0.40 for indices of stress response and ion mass fractions. The performances of a breeding population (BP) comprising 393 japonica accessions were predicted with models trained on the RP. For validation of the predictive performances of the models, a subset of 41 accessions was selected from the BP and phenotyped under the same experimental conditions as the RP. The predictive abilities estimated on this subset ranged from 0.00 to 0.66 for the multi-environment models, depending on the traits, and were strongly correlated with the predictive abilities on cross-validation in the RP in salt condition (r = 0.69). We show here that genomic selection is efficient for predicting the salt stress tolerance of breeding lines. Genomic selection could improve the efficiency of rice breeding strategies for salinity-prone environments.
... While the ultimate goal of sparse testing is to reduce phenotyping effort in the context of G × E, PA can also vary according to the effects of location, year and generation of phenotyped material. Using genomic information from the most recent or a more homozygous generation in the training set (TS) can impact the PA of GS (Sallam et al. 2015). These authors highlighted that more generations of selfing increase the percentage of fixed markers, which thereby lose their predictive value. ...
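The point about fixed markers can be made concrete. Suppose a marker's allele frequency is 0 or 1 in a set of lines. Then every line carries the same genotype, so the marker's estimated effect adds the same constant to every GEBV and cannot help rank those lines. The following sketch uses hypothetical helpers and assumes 0/1/2 coding with no missing calls. It reports the share of fixed markers in two made-up sets of inbred lines.

```python
def allele_frequencies(M):
    """Frequency of the counted allele per marker; rows of M are lines and
    markers are coded 0/1/2 copies of that allele (no missing data)."""
    n = len(M)
    return [sum(row[j] for row in M) / (2 * n) for j in range(len(M[0]))]


def fraction_fixed(M):
    """Share of markers whose allele frequency is exactly 0 or 1 in M."""
    p = allele_frequencies(M)
    return sum(1 for pj in p if pj in (0.0, 1.0)) / len(p)


# Hypothetical inbred sets (0 = AA, 2 = BB); the later set is more fixed.
earlier = [[0, 2, 0, 2],
           [2, 2, 0, 0],
           [0, 0, 2, 2]]
later = [[0, 2, 0, 2],
         [0, 2, 0, 2],
         [0, 2, 2, 2]]
print(fraction_fixed(earlier), fraction_fixed(later))  # prints 0.0 0.75
```

A marker can segregate in the training set and be fixed in the validation set. Running the check separately on each set shows where predictive information is lost.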
Article
Genomic selection is a worthy breeding method to improve genetic gain in recurrent selection breeding schemes. The integration of multi-generation and multi-location information could significantly improve genomic prediction models in the context of shuttle breeding. The Cirad-CIAT upland rice breeding program applies recurrent genomic selection and seeks to optimize the scheme to increase genetic gain while reducing phenotyping efforts. We used a synthetic population (PCT27) of which S0 plants were all genotyped and advanced by selfing and bulk seed harvest to the S0:2, S0:3, and S0:4 generations. The PCT27 was then divided into two sets. The S0:2 and S0:3 progenies for PCT27A and the S0:4 progenies for PCT27B were phenotyped in two locations: Santa Rosa the target selection location, within the upland rice growing area, and Palmira, the surrogate location, far from the upland rice growing area but easier for experimentation. While the calibration used either one of the two sets phenotyped in one or two locations, the validation population was only the PCT27B phenotyped in Santa Rosa. Five scenarios of genomic prediction and 24 models were performed and compared. Training the prediction model with the PCT27B phenotyped in Santa Rosa resulted in predictive abilities ranging from 0.19 for grain zinc concentration to 0.30 for grain yield. Expanding the training set with the inclusion of the PCT27A resulted in greater predictive abilities for all traits but grain yield, with increases from 5% for plant height to 61% for grain zinc concentration. Models with the PCT27B phenotyped in two locations resulted in higher prediction accuracy when the models assumed no genotype-by-environment (G × E) interaction for flowering (0.38) and grain zinc concentration (0.27). For plant height, the model assuming a single G × E variance provided higher accuracy (0.28). 
The gain in predictive ability for grain yield was the greatest (0.25) when environment-specific variance deviation effect for G × E was considered. While the best scenario was specific to each trait, the results indicated that the gain in predictive ability provided by the multi-location and multi-generation calibration was low. Yet, this approach could lead to increased selection intensity, acceleration of the breeding cycle, and a sizable economic advantage for the program. Supplementary Information The online version contains supplementary material available at 10.1186/s12284-023-00661-0.
... With GS, genome-wide marker effects are estimated using phenotypic and genotypic information in a training population and are then used to predict the performance of the selection candidate population using genotypic information (Meuwissen et al., 2001). Several factors can affect the efficiency of using GS in crop improvement including linkage disequilibrium between markers and causative genetic variants, prediction models, training population size, trait heritability, and the relationship between the training population and selection candidates (Asoro et al., 2011; Lorenz & Smith, 2015; Sallam et al., 2015). GS was found to be an efficient approach for improving resistance to FHB and DON accumulation and equal to replicated phenotypic evaluations in barley (Sallam & Smith, 2016; Tiede & Smith, 2018). ...
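The two-step procedure described above can be sketched as ridge-regression BLUP (RR-BLUP). Step one estimates marker effects in a training population. Step two predicts candidates from their genotypes alone. The abstract of this article reports that RR-BLUP was about as accurate as the more complex models. In the following sketch, all marker effects share one shrinkage parameter λ = σ²_e / σ²_β, which is assumed known. Dedicated software such as the rrBLUP R package (Endelman, 2011) estimates it from the data instead. The function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def rrblup_fit(M_train, y_train, lam):
    """Step 1: ridge-regression BLUP of marker effects.

    beta = (Z'Z + lam*I)^-1 Z'(y - mu), with Z the 0/1/2 marker matrix
    centred on training means and mu the training mean, so the intercept
    itself is not shrunk.
    """
    n, m = len(M_train), len(M_train[0])
    means = [sum(row[j] for row in M_train) / n for j in range(m)]
    Z = [[row[j] - means[j] for j in range(m)] for row in M_train]
    mu = sum(y_train) / n
    yc = [yi - mu for yi in y_train]
    lhs = [[sum(Z[i][a] * Z[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(m)] for a in range(m)]
    rhs = [sum(Z[i][a] * yc[i] for i in range(n)) for a in range(m)]
    return mu, means, solve(lhs, rhs)


def rrblup_predict(model, M_new):
    """Step 2: GEBVs of selection candidates from their genotypes alone."""
    mu, means, beta = model
    return [mu + sum((row[j] - means[j]) * b for j, b in enumerate(beta))
            for row in M_new]


# Hypothetical toy data: one marker, phenotype rising with allele count.
model = rrblup_fit([[0], [1], [2], [1]], [1.0, 2.0, 3.0, 2.0], lam=2.0)
print(rrblup_predict(model, [[2], [0]]))  # prints [2.5, 1.5]
```

Without shrinkage (λ = 0) the toy marker effect would be 1.0. With λ = 2 it is halved to 0.5. When markers outnumber lines, a positive λ is what keeps Z′Z + λI invertible.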
Article
Fusarium head blight (FHB) or scab is a devastating disease of barley that severely reduces the yield and quality of the grain. Additionally, mycotoxins produced by the causal Fusarium species can contaminate harvested grain, resulting in food safety concerns and further economic losses. In the Upper Midwest region of the United States, Fusarium graminearum is the primary causal agent, and deoxynivalenol (DON) is the main mycotoxin associated with Fusarium infection. Deployment of resistant cultivars is an important component of an integrated strategy to manage this disease. Unfortunately, few good sources of FHB resistance have been identified from the evaluation of large collections of Hordeum germplasm. Over the past 25 years, many barley mapping populations have been developed with selected resistance sources to identify the number, chromosomal position and allelic effect of quantitative trait loci (QTL) contributing to FHB resistance and DON accumulation. To consolidate the genetic data generated from 14 mapping studies that included 22 bi- or tri-parental mapping populations and three genome-wide association (GWAS) mapping panels, a consensus map was constructed that includes 4145 SNP, SSR, RFLP and AFLP markers. A meta-analysis based on this consensus map revealed 96 QTL for FHB resistance and 57 for DON accumulation scattered across the barley genome. Many of the QTL explained a low percentage (<10%) of variation for the traits and were often found significant in only one or a few environments in multi-year/multi-location field trials. Moreover, many of the FHB/DON QTL mapped to chromosomal positions coincided with various agro-morphological traits that could influence the level of disease (e.g. heading date, height, spike density, and spike angle), raising the important question of whether the former are true resistance factors or are simply the result of pleiotropy with the latter. 
Considering the magnitude of effect, consistency of detection across environments and independence from agro-morphological traits, only three of 96 QTL for FHB and five of 57 QTL for DON were considered priority targets for marker-assisted selection (MAS). In spite of the challenge for having a limited number of useful QTL for breeding, genomic selection holds promise for increasing the efficiency of developing FHB-resistant barley cultivars, an essential component of the overall management strategy for the disease.
... The ultimate aim of molecular markers is their practical application in marker-assisted selection (MAS), where preferable genotypes can be selected and unwanted genotypes discarded based on the molecular marker scores (Mohan et al. 1997; Chopra 2014; Schmid and Thorwarth 2014; Sallam et al. 2015; Kumar et al. 2020). Therefore, molecular markers can greatly assist breeding, but their adoption depends on the complexity of the technology, which in many cases is beyond the capabilities of the users. ...
... Genomic selection has been successfully conducted in several crops (Windhausen et al., 2012; Sallam et al., 2015; Spindel et al., 2015). When the accuracy of genomic estimated breeding value (GEBV) is high enough, genomic prediction (GP) can reduce breeding time because the proportion of superior genotypes in a breeding population may increase, and hence accelerate selection gain (Bernardo, 2010; Heffner et al., 2010). ...
Article
Genomic selection is expected to improve selection efficiency and genetic gain in breeding programs. The objective of this study was to assess the efficacy of predicting the performance of grain sorghum hybrids using genomic information of parental genotypes. One hundred and two public sorghum inbred parents were genotyped using genotyping-by-sequencing. Ninety-nine of the inbreds were crossed to three tester female parents, generating a total of 204 hybrids for evaluation at two environments. The hybrids were sorted into three sets of 77, 59, and 68 and evaluated along with two commercial checks using a randomized complete block design in three replications. The sequence analysis generated 66,265 SNP markers that were used to predict the performance of 204 F1 hybrids resulting from crosses between the parents. Both additive (partial model) and additive and dominance (full model) models were constructed and tested using various training population (TP) sizes and cross-validation procedures. Increasing TP size from 41 to 163 increased prediction accuracies for all traits. With the partial model, the five-fold cross-validated prediction accuracies ranged from 0.03 for thousand kernel weight (TKW) to 0.58 for grain yield (GY), while they ranged from 0.06 for TKW to 0.67 for GY with the full model. The results suggest that genomic prediction could become an effective tool for predicting the performance of sorghum hybrids based on parental genotypes.
Preprint
Background In drought periods, water use efficiency depends on the capacity of roots to extract water from deep soil. A semi-field phenotyping facility (RadiMax) was used to investigate above-ground and root traits in spring barley when grown under a water availability gradient. Above-ground traits included grain yield, grain protein concentration, grain nitrogen removal, and thousand kernel weight. Root traits were obtained through digital images measuring the root length at different depths. Two nearest-neighbor adjustments (M1 and M2) to model spatial variation were used for genetic parameter estimation and genomic prediction (GP). M1 and M2 used (co)variance structures and differed in the distance function to calculate between-neighbor correlations. M2 was the most developed adjustment, as accounted by the Euclidean distance between neighbors. Results The estimated heritabilities (\({\widehat{h}}^{2}\)) ranged from low to medium for root and above-ground traits. The genetic coefficient of variation (\(GCV\)) ranged from 3.2 to 7.0% for above-ground and 4.7 to 10.4% for root traits, indicating good breeding potential for the measured traits. The highest \(GCV\) observed for root traits revealed that significant genetic change in root development can be achieved through selection. We studied the genotype-by-water availability interaction, but no relevant interaction effects were detected. GP was assessed using leave-one-line-out (LOO) cross-validation. The predictive ability (PA) estimated as the correlation between phenotypes corrected by fixed effects and genomic estimated breeding values ranged from 0.33 to 0.49 for above-ground and 0.15 to 0.27 for root traits, and no substantial variance inflation in predicted genetic effects was observed. Significant differences in PA were observed in favor of M2. 
Conclusions The significant \(GCV\) and the accurate prediction of breeding values for above-ground and root traits revealed that developing genetically superior barley lines with improved root systems is possible. In addition, we found significant spatial variation in the experiment, highlighting the relevance of correctly accounting for spatial effects in statistical models. In this sense, the proposed nearest-neighbor adjustments are flexible approaches in terms of assumptions that can be useful for semi-field or field experiments.
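The GCV and heritability values in the abstract above are simple functions of estimated variance components. The following sketch uses the common definitions GCV = 100 · σ_g / x̄ and H² = σ²_g / (σ²_g + σ²_e / n), where n is the number of replicates and n = 1 gives a plot-level ratio. The variance components are made-up inputs and the function names are hypothetical. The preprint's own mixed model, with its nearest-neighbor spatial adjustments, is not reproduced here.

```python
import math


def gcv_percent(genetic_variance, trait_mean):
    """Genetic coefficient of variation: 100 * sigma_g / mean."""
    return 100.0 * math.sqrt(genetic_variance) / trait_mean


def heritability(genetic_variance, residual_variance, n_reps=1):
    """Share of phenotypic variance that is genetic; with n_reps > 1 this
    is the entry-mean version s2_g / (s2_g + s2_e / n_reps)."""
    return genetic_variance / (genetic_variance + residual_variance / n_reps)


# Hypothetical variance components for a yield-like trait (units^2).
print(round(gcv_percent(4.0, 50.0), 2))    # prints 4.0
print(heritability(4.0, 12.0))             # prints 0.25 (plot basis)
print(heritability(4.0, 12.0, n_reps=3))   # prints 0.5 (entry-mean basis)
```

Whether the result is a broad- or narrow-sense heritability depends on which genetic variance is supplied. A genomic (marker-based) variance gives a narrow-sense estimate.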
Preprint
Genomic selection offers new prospects for revisiting hybrid breeding schemes by replacing extensive phenotyping of individuals with genomic predictions. Finding the ideal design for training genomic prediction models is still an open question. Previous studies have shown promising predictive abilities using sparse factorial instead of tester-based training sets to predict single-cross hybrids from the same generation. This study aims to further investigate the use of factorials and their optimization to predict line general combining abilities (GCAs) and hybrid values across breeding cycles. It relies on two breeding cycles of a maize reciprocal genomic selection scheme involving multiparental connected reciprocal populations from the flint and dent complementary heterotic groups, selected for silage performance. Selection based on genomic predictions trained on a factorial design resulted in a significant genetic gain for dry matter yield in the new generation. Results confirmed the efficiency of sparse factorial training sets for predicting candidate line GCAs and hybrid values across breeding cycles. Compared with a previous study based on the first generation, the advantage of factorial over tester training sets appeared smaller across generations. Two ways of updating factorial training sets both improved predictive abilities: adding single-cross hybrids between selected lines from the previous generation, and adding a random subset of hybrids from the new generation. The CDmean criterion helped determine which single crosses to phenotype to update the training set efficiently. Our results experimentally validated the efficiency of sparse factorial designs for calibrating hybrid genomic prediction and showed the benefit of updating them across generations.
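The factorial logic can be illustrated with the additive part of the hybrid model. In that model, a single-cross value is an intercept plus the GCA of the flint parent plus the GCA of the dent parent. The Python sketch below is a deliberately simplified, hypothetical example with no markers, no specific combining ability, no noise, and made-up GCAs. It fits the additive model to a sparse factorial in which each flint line meets only two dent lines, then predicts all 36 possible single crosses. Because the cross design is connected, every hybrid value is estimable even though two thirds of the crosses were never observed. This is not the genomic model or the CDmean optimization used in the study.

```python
import numpy as np

n_flint, n_dent = 6, 6
rng = np.random.default_rng(0)
gca_flint = rng.normal(0, 1, n_flint)   # hypothetical true flint GCAs
gca_dent = rng.normal(0, 1, n_dent)     # hypothetical true dent GCAs
mu = 10.0                               # hypothetical intercept


def design_row(i, j):
    """Indicator row for hybrid flint i x dent j: [intercept, flint dummies, dent dummies]."""
    row = np.zeros(1 + n_flint + n_dent)
    row[0] = 1.0
    row[1 + i] = 1.0
    row[1 + n_flint + j] = 1.0
    return row


all_crosses = [(i, j) for i in range(n_flint) for j in range(n_dent)]
# Sparse factorial: flint i is crossed only to dents i and i+1 (mod 6), which keeps
# the flint-dent cross graph connected (a single 12-node cycle).
observed = [(i, j) for (i, j) in all_crosses if (j - i) % n_dent in (0, 1)]

X_obs = np.array([design_row(i, j) for i, j in observed])
y_obs = np.array([mu + gca_flint[i] + gca_dent[j] for i, j in observed])  # additive, noise-free

# Minimum-norm least squares: single GCAs are identified only up to constraints,
# but every hybrid value mu + GCA_i + GCA_j is estimable in a connected design.
theta, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)

X_all = np.array([design_row(i, j) for i, j in all_crosses])
predicted = X_all @ theta
true = np.array([mu + gca_flint[i] + gca_dent[j] for i, j in all_crosses])
```

If the design were disconnected, for example by splitting the lines into two groups that never cross, hybrids between the groups would no longer be estimable from this model.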
Article
Bacterial leaf streak (BLS), caused chiefly by the pathogen Xanthomonas translucens pv. translucens, is becoming an increasingly important foliar disease of barley in the Upper Midwest. The deployment of resistant cultivars is the most economical and practical method of control. To identify sources of BLS resistance, we evaluated two panels of breeding lines from the University of Minnesota (UMN) and Anheuser-Busch InBev (AB InBev) barley improvement programs for reaction to strain CIX95 in field trials at St. Paul and Crookston, MN, in 2020 and 2021. The percentage of resistant lines with mid-season maturity was 1.8% (6 of 333 lines) in the UMN panel and 5.2% (13 of 251 lines) in the AB InBev panel. Both panels were genotyped with the barley 50K iSelect SNP array, and a genome-wide association study was then performed. A single, highly significant association for BLS resistance was identified on chromosome 6H in the UMN panel. This association was also identified in the AB InBev panel. Seven other significant associations were detected in the AB InBev panel: two each on chromosomes 1H, 2H, and 3H and one on chromosome 5H. Of the eight associations identified in the panels, five were novel. The discovery of resistance in elite breeding lines will hasten the time needed to develop and release a BLS-resistant cultivar.
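A genome-wide association scan rests on testing each SNP for association with the phenotype. The Python sketch below shows only that core marker-by-marker test, on simulated, hypothetical data. It is not the model used in this study. It applies no kinship or population-structure correction, which mixed-model GWAS typically add to control false positives. P-values use a normal approximation to the t distribution, which is reasonable only for large panels. The function name `marker_scan` is hypothetical.

```python
import math

import numpy as np


def marker_scan(genotypes, phenotype):
    """Naive single-marker association test (no kinship or structure correction).

    For each SNP column, regress the phenotype on allele dosage and return the
    slope t statistic and a two-sided p-value from a normal approximation.
    """
    n, m = genotypes.shape
    y = phenotype - phenotype.mean()
    t_stats = np.zeros(m)
    for k in range(m):
        x = genotypes[:, k] - genotypes[:, k].mean()
        sxx = x @ x
        if sxx == 0:                      # monomorphic marker: leave t = 0 (p = 1)
            continue
        b = (x @ y) / sxx                 # OLS slope
        resid = y - b * x
        se = math.sqrt((resid @ resid) / (n - 2) / sxx)
        t_stats[k] = b / se
    p_values = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])
    return t_stats, p_values


# Hypothetical panel: 300 inbred lines, 200 SNPs coded 0/1, one simulated QTL at SNP 42.
rng = np.random.default_rng(7)
G = rng.integers(0, 2, size=(300, 200)).astype(float)
y = 1.5 * G[:, 42] + rng.normal(0, 1, 300)
t, p = marker_scan(G, y)
top = int(np.argmax(np.abs(t)))
```

In a real scan, declaring an association would also require a genome-wide significance threshold that accounts for testing hundreds or thousands of markers.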
Article
Asian rice, Oryza sativa is a cultivated, inbreeding species that feeds over half of the world's population. Understanding the genetic basis of diverse physiological, developmental, and morphological traits provides the basis for improving yield, quality and sustainability of rice. Here we show the results of a genome-wide association study based on genotyping 44,100 SNP variants across 413 diverse accessions of O. sativa collected from 82 countries that were systematically phenotyped for 34 traits. Using cross-population-based mapping strategies, we identified dozens of common variants influencing numerous complex traits. Significant heterogeneity was observed in the genetic architecture associated with subpopulation structure and response to environment. This work establishes an open-source translational research platform for genome-wide association studies in rice that directly links molecular variation in genes and metabolic pathways with the germplasm resources needed to accelerate varietal development and crop improvement.
Article
Genomic selection (GS) uses genome-wide molecular marker data to predict the genetic value of selection candidates in breeding programs. In plant breeding, the ability to produce large numbers of progeny per cross allows GS to be conducted within each family. However, this approach requires phenotypes of lines from each cross before conducting GS. This will prolong the selection cycle and may result in lower gains per year than approaches that estimate marker effects with multiple families from previous selection cycles. In this study, phenotypic selection (PS), conventional marker-assisted selection (MAS), and GS prediction accuracy were compared for 13 agronomic traits in a population of 374 winter wheat (Triticum aestivum L.) advanced-cycle breeding lines. A cross-validation approach that trained and validated prediction accuracy across years was used to evaluate effects of model selection, training population size, and marker density in the presence of genotype × environment interactions (G×E). The average prediction accuracies using GS were 28% greater than with MAS and were 95% as accurate as PS. For net merit, the average accuracy across six selection indices for GS was 14% greater than for PS. These results provide empirical evidence that multifamily GS could increase genetic gain per unit time and cost in plant breeding.
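The link between accuracy and gain per unit time follows from the breeder's equation, \(R = i\,r\,\sigma_A / L\). Here \(i\) is selection intensity, \(r\) is selection accuracy, \(\sigma_A\) is the additive genetic standard deviation, and \(L\) is cycle length in years. The Python sketch below makes three assumptions:

- Selection intensity is equal for GS and PS.
- PS accuracy is \(\sqrt{h^2}\) on an entry-mean basis.
- Cycle lengths are hypothetical.

Only the 95% figure comes from the abstract, and cost is not modelled.

```python
import math


def gain_per_year(intensity, accuracy, sigma_a, cycle_years):
    """Breeder's equation expressed per year: R = i * r * sigma_A / L."""
    return intensity * accuracy * sigma_a / cycle_years


h2 = 0.5                      # hypothetical entry-mean heritability
r_ps = math.sqrt(h2)          # PS accuracy taken as sqrt(h2)
r_gs = 0.95 * r_ps            # GS "95% as accurate as PS", as in the abstract

# Hypothetical cycle lengths: GS skips phenotyping and halves the cycle.
ps = gain_per_year(intensity=1.4, accuracy=r_ps, sigma_a=1.0, cycle_years=4)
gs = gain_per_year(intensity=1.4, accuracy=r_gs, sigma_a=1.0, cycle_years=2)
ratio = gs / ps               # 0.95 * 4 / 2 = 1.9
```

Under these assumptions, a small loss in accuracy is more than offset by a shorter cycle. The size of the advantage depends entirely on the assumed cycle lengths.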
Article
In genomewide selection, the expected correlation between predicted performance and true genotypic value is a function of the training population size (\(N\)), heritability on an entry-mean basis (\(h^2\)), and the effective number of chromosome segments underlying the trait (\(M_e\)). Our objectives were to (i) determine how the prediction accuracy of different traits responds to changes in \(N\), \(h^2\), and number of markers (\(N_M\)) and (ii) determine whether prediction accuracy is equal across traits if \(N\), \(h^2\), and \(N_M\) are kept constant. In a simulated population and four empirical populations in maize (Zea mays L.), barley (Hordeum vulgare L.), and wheat (Triticum aestivum L.), we added random nongenetic effects to the phenotypic data to reduce \(h^2\) to 0.50, 0.30, and 0.20. As expected, increasing \(N\), \(h^2\), and \(N_M\) increased prediction accuracy. For the same trait within the same population, prediction accuracy was constant for different combinations of \(N\) and \(h^2\) that led to the same \(Nh^2\). Different traits, however, varied in their prediction accuracy even when \(N\), \(h^2\), and \(N_M\) were constant. Yield traits had lower prediction accuracy than other traits despite the constant \(N\), \(h^2\), and \(N_M\). Empirical evidence and experience on the predictability of different traits are needed in designing training populations.
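The abstract names \(N\), \(h^2\), and \(M_e\) without stating how they combine. A widely used deterministic approximation, due to Daetwyler and colleagues, is \(r = \sqrt{Nh^2 / (Nh^2 + M_e)}\). The Python sketch below evaluates it for hypothetical values of \(N\), \(h^2\), and \(M_e\). It shows why combinations with the same \(Nh^2\) give the same expected accuracy. The formula does not by itself account for the trait-to-trait differences the authors report.

```python
import math


def expected_accuracy(n, h2, m_e):
    """Deterministic expected accuracy r = sqrt(N h2 / (N h2 + M_e))."""
    nh2 = n * h2
    return math.sqrt(nh2 / (nh2 + m_e))


a = expected_accuracy(n=200, h2=0.5, m_e=500)   # N h2 = 100
b = expected_accuracy(n=500, h2=0.2, m_e=500)   # N h2 = 100 again, so same accuracy as a
c = expected_accuracy(n=400, h2=0.5, m_e=500)   # larger N, higher accuracy
```

Here `a` and `b` both equal \(\sqrt{100/600} \approx 0.41\), while doubling \(N\) at \(h^2 = 0.5\) raises the expected accuracy to about 0.53.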
Article
Many important traits in plant and animal breeding are polygenic and therefore recalcitrant to traditional marker-assisted selection. Genomic selection addresses this complexity by including all markers in the prediction model. A key method for the genomic prediction of breeding values is ridge regression (RR), which is equivalent to BLUP when the genetic covariance between lines is proportional to their similarity in genotype space. This additive model can be broadened to include epistatic effects by using other kernels, such as the Gaussian, which represent inner products in a complex feature space. To facilitate the use of RR and non-additive kernels by breeders, a new software package for R called rrBLUP has been developed. At its core is a fast maximum-likelihood algorithm for mixed models with a single variance component besides the residual error, which allows for efficient prediction with unreplicated training data. Use of the rrBLUP software is demonstrated through several examples, including the identification of optimal crosses based on superior progeny value. In cross-validation tests, the prediction accuracy with non-additive kernels was significantly higher than RR for wheat grain yield but equivalent for several maize traits.
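The equivalence between ridge regression and BLUP stated here can be checked numerically. In the Python sketch below, ridge regression gives marker effects \(\hat\beta = (Z^\top Z + \lambda I)^{-1} Z^\top y\). These yield the same genetic values as BLUP with the linear kernel \(K = ZZ^\top\), \(\hat g = K(K + \lambda I)^{-1} y\). This is a numpy illustration under assumed conditions: random hypothetical data, centred markers and phenotypes, and \(\lambda\) fixed by hand. It is not the rrBLUP package, which estimates the variance ratio by maximum likelihood.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 40, 200                        # hypothetical: more markers than lines
Z = rng.integers(0, 3, size=(n, p)).astype(float)
Z -= Z.mean(axis=0)                   # centre marker codes
y = rng.normal(size=n)
y -= y.mean()                         # centre phenotypes (intercept absorbed)
lam = 50.0                            # assumed ratio sigma2_e / sigma2_u

# Ridge regression on markers (p x p system), then genetic values g = Z beta.
beta = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)
g_rr = Z @ beta

# BLUP with the additive relationship kernel K = ZZ' (n x n system).
K = Z @ Z.T
g_blup = K @ np.linalg.solve(K + lam * np.eye(n), y)
```

The two routes agree because of the matrix identity \((Z^\top Z + \lambda I)^{-1} Z^\top = Z^\top (ZZ^\top + \lambda I)^{-1}\). The kernel form solves an \(n \times n\) system, which is cheaper when markers outnumber lines. Swapping \(K\) for a nonlinear kernel such as the Gaussian gives the non-additive variant described above.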
Chapter
"Genomic selection," the ability to select for even complex, quantitative traits based on marker data alone, has arisen from the conjunction of new high-throughput marker technologies and new statistical methods needed to analyze the data. This review surveys what is known about these technologies, with sections on population and quantitative genetic background, DNA marker development, statistical methods, reported accuracies of genomic selection (GS) predictions, prediction of nonadditive genetic effects, prediction in the presence of subpopulation structure, and impacts of GS on long-term gain. GS works by estimating the effects of many loci spread across the genome. Marker and observation numbers therefore need to scale with the genetic map length in Morgans and with the effective population size of the population under GS. For typical crops, the requirements range from at least 200 to at most 10,000 markers and observations. With that baseline, GS can greatly accelerate the breeding cycle while also using marker information to maintain genetic diversity and potentially prolong gain beyond what is possible with phenotypic selection. With the costs of marker technologies continuing to decline and the statistical methods becoming more routine, the results reviewed here suggest that GS will play a large role in the plant breeding of the future. Our summary and interpretation should prove useful to breeders as they assess the value of GS in the context of their populations and resources.
Article
Fusarium head blight (FHB) is a devastating disease of barley (Hordeum vulgare L.), causing reductions in yield and quality. Marker-based selection for resistance to FHB and lowered deoxynivalenol (DON) grain concentration would save considerable costs and time associated with phenotyping. A marker-based selection approach called genomic selection (GS) uses genomewide marker information to predict genetic value. We used a cross-validation approach that separated training sets from validation sets by both entry and environment. We used this framework to test the potential of GS for genetic improvement of FHB and DON and to test the effect of different factors on prediction accuracy. Prediction accuracy reached 0.72 for FHB and 0.68 for DON. Little difference was found between marker effect estimation methods in terms of prediction of entry genetic value. The extensive linkage disequilibrium (LD) present in this population allowed the marker set to be reduced to 384 markers and the training population (TP) size to be reduced to 200 with little effect on prediction accuracy. We found little to no advantage in combining subpopulations that correspond to neighboring breeding programs to increase TP size. Apparently, little genetic information is shared between subpopulations, either because of different marker-quantitative trait loci (QTL) linkage phases, different segregating QTL, or nonadditive gene action.
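Separating training and validation sets by both entry and environment means the validation records come from lines the model has not seen, grown in an environment it has not seen. Below is a minimal Python sketch of one such split. It assumes a long-format table of (entry, environment) records. The function name and data are hypothetical, and the authors' exact fold scheme may differ. Records that share only the entry, or only the environment, with the validation set are dropped from both sets.

```python
import numpy as np


def entry_env_split(entries, envs, val_entries, val_env):
    """Return boolean (training, validation) masks over phenotype records.

    Validation: records of the held-out entries in the held-out environment.
    Training: records of all other entries in all other environments.
    """
    entries = np.asarray(entries)
    envs = np.asarray(envs)
    in_val_entries = np.isin(entries, list(val_entries))
    in_val_env = envs == val_env
    validation = in_val_entries & in_val_env
    training = ~in_val_entries & ~in_val_env
    return training, validation


# Hypothetical layout: 4 entries, each tested in 3 environments.
entries = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"]
envs = ["E1", "E2", "E3"] * 4
train, val = entry_env_split(entries, envs, {"A", "B"}, "E3")
```

Here `val` selects A and B in E3 (2 records) and `train` selects C and D in E1 and E2 (4 records). The other 6 records are unused in this fold.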