Accelerating tomato breeding by exploiting genomic selection approaches
Elisa Cappetta, Giuseppe Andolfo, Antonio Di Matteo, Amalia Barone, Luigi Frusciante and Maria Raffaella Ercolano*
Department of Agricultural Sciences, University of Naples ‘Federico II’, Portici, Italy
Running title: Tomato genomic selection
*Correspondence: Maria Raffaella Ercolano, Department of Agricultural Sciences, University of Naples Federico II, Via Università 100, 80055 Portici (Naples), Italy.
Abstract: Genomic selection (GS) is a predictive approach developed to increase the rate of genetic gain per unit of time in breeding programs. It has emerged as a valuable method for improving complex traits that are controlled by many genes with small effects. GS enables the prediction of the breeding values of candidate genotypes for selection. In this work, we address important issues related to GS and its implementation in the tomato breeding context. Genomic constraints and critical parameters affecting prediction accuracy in this crop should be carefully evaluated. These include phenotyping, genotyping, training population composition and size, and the statistical method. GS approaches for facilitating the selection of superior tomato genotypes during breeding programs are also compared and discussed. GS has already been shown to be feasible in tomato breeding. We illustrate how GS can improve the rate of gain in elite line selection and in descendent selection and backcross schemes. GS schemes are beginning to be delineated, and computer science can support future selection strategies. A new breeding framework is emerging for optimizing tomato improvement procedures.
Keywords: Tomato; genetic breeding value; training population; genotyping; marker effect; phenotyping; selection schemes
1. Background
Tomato (Solanum lycopersicum) is one of the most important vegetable crops worldwide. It is a rich source of minerals (potassium, magnesium, phosphorus) and antioxidant compounds, which may help prevent cardiovascular diseases and cancer and strengthen the immune system [1]. Tomato is an autogamous diploid species with a modest genome size (900 Mb) and a relatively short life cycle. As a model plant, tomato benefits from numerous genetic and molecular tools. These include a high-quality draft genome sequence, high-density genetic maps, high-throughput molecular markers, introgression lines and mutant collections (Tomato Genome Consortium [2]). In addition, hundreds of genomes from landraces, cultivars and wild relatives have been re-sequenced. They reveal relatively low molecular diversity but a high rate of chromosome rearrangements due to traces of wild introgressions [3].
The genetic basis of the tomato crop narrowed during domestication, preventing intra-population breeding strategies from providing satisfactory genetic gains [4]. Tomato has low genetic variability, which limits the gains of conventional and modern selection schemes. It is, however, tolerant to inbreeding, which allows inbred lines to be generated and maintained. Therefore, recombining the available genetic variability has been an excellent alternative for obtaining superior genotypes [4]. Moreover, genome segments from wild relatives have been retained after being used to introgress agronomically relevant traits such as disease resistance and quality traits [5,6]. These segments contribute substantially to the genetic variability within the cultivated tomato gene pool.
Preprints | NOT PEER-REVIEWED | Posted: 14 September 2020 doi:10.20944/preprints202009.0308.v1
© 2020 by the author(s). Distributed under a Creative Commons CC BY license.
In the early 1980s, the development of different molecular marker systems drastically changed the course of plant breeding. Molecular markers were mainly integrated into traditional phenotypic selection (PS) through marker-assisted selection (MAS). MAS improves the plant selection process by tracking chromosomal segments containing quantitative trait loci (QTLs) or single genes [7,8,9]. Research on the identification of tomato QTLs and major genes conferring resistance to biotic and environmental stresses has been reviewed in [5,10]. Molecular markers have also been used in tomato to map genes or QTLs for environmental stress tolerance and for some flower and fruit-related traits (reviewed in [11]). However, MAS is better suited to simple traits governed by a few major-effect genes than to complex traits controlled by a large number of minor genes [12,13].
Genomic selection (GS) provides new opportunities for increasing the efficiency of plant breeding programs for traits with polygenic inheritance [13,14,15]. The breeding value of an individual is estimated using genome-wide data such as single nucleotide polymorphisms (SNPs). Recent high-throughput genotyping (HTG) systems can generate several thousand SNP markers, allowing entire genomes to be scanned at a reasonable cost. Genomic screening of breeding populations can accelerate the genetic gain obtained at each cycle, especially when selection is performed for traits with low heritability. The effect of each marker is very small. Taken together, however, genome-wide marker information has the potential to explain all the genetic variance [16].
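The additive idea can be made concrete with a minimal sketch. It assumes biallelic SNPs coded as 0/1/2 copies of a counted allele and marker effects that have already been estimated; all numbers are invented for illustration. Each candidate's GEBV is the sum of its allele dosages weighted by the marker effects, so many individually small effects add up to one genome-wide score.

```python
import numpy as np

# Hypothetical genotypes: 3 candidate lines x 5 SNPs, coded as 0/1/2 allele dosage.
genotypes = np.array([
    [0, 2, 1, 2, 0],
    [2, 2, 0, 1, 1],
    [1, 0, 2, 0, 2],
])

# Hypothetical per-marker effects (trait units per allele copy); each one is small.
marker_effects = np.array([0.10, -0.05, 0.20, 0.15, -0.10])

# GEBV_i = sum_j x_ij * beta_j: small contributions accumulate across the genome.
gebv = genotypes @ marker_effects
print(gebv)
```

In practice the marker effects come from a statistical model fitted on a training population, as discussed next.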
The development of statistical methods capable of accurately predicting marker effects made GS a breakthrough for increasing the rate of genetic gain per unit of time. GS uses genotypic and phenotypic data from a training population (TRN), used as a training set (TRS). From these data it obtains the genomic estimated breeding values (GEBVs) of a testing set (TST) that has been genotyped but not phenotyped. The GS model is then used to predict the breeding values of non-phenotyped individuals in the next selection step (Figure 1).
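The TRS-to-TST workflow can be sketched on simulated data. This sketch is illustrative only and is not the pipeline of any study cited here. Genotypes and phenotypes are simulated. Marker effects are fitted by ridge regression, the estimator behind RR-BLUP, with the shrinkage parameter fixed at the value implied by the simulation rather than estimated. Accuracy is computed against the simulated true genetic values, which are never available in a real breeding program.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: 300 lines x 200 SNPs coded 0/1/2.
n_lines, n_snps = 300, 200
X = rng.binomial(2, 0.5, size=(n_lines, n_snps)).astype(float)

# Simulated polygenic trait: many small true effects plus environmental noise (h2 about 0.5).
true_effects = rng.normal(0.0, 0.1, size=n_snps)
genetic_value = X @ true_effects
y = genetic_value + rng.normal(0.0, genetic_value.std(), size=n_lines)

# TRS: genotyped and phenotyped. TST: genotyped only (its phenotypes are never used).
trs = np.arange(240)
tst = np.arange(240, n_lines)

def fit_marker_effects(X_trs, y_trs, lam):
    """Ridge regression on centred dosages: solve (Z'Z + lam*I) beta = Z'(y - mean(y))."""
    centre = X_trs.mean(axis=0)
    Z = X_trs - centre
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (y_trs - y_trs.mean()))
    return centre, beta

def predict_gebv(X_new, centre, beta):
    """GEBVs of new genotyped lines, expressed as deviations from the TRS mean."""
    return (X_new - centre) @ beta

# Assumed lam = residual variance / marker-effect variance, about 1.0 / 0.01 in this simulation.
centre, beta_hat = fit_marker_effects(X[trs], y[trs], lam=100.0)
gebv_tst = predict_gebv(X[tst], centre, beta_hat)

accuracy = np.corrcoef(gebv_tst, genetic_value[tst])[0, 1]
print(f"Correlation between GEBV and true genetic value in the TST: {accuracy:.2f}")
```

In a real program, the shrinkage parameter would usually be derived from variance components estimated on the TRS (for example by REML) rather than assumed.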
Figure 1. Flowchart of a genomic selection (GS) breeding program. GS overview with cross-validation, using a training set (70-90% of 100-1000 lines) to estimate marker effects and obtain the genomic estimated breeding values (GEBVs) of lines in the testing set (10-30% of 100-1000 lines). Finally, phenotypic and genotypic data of the training set are used to set up the prediction model.
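The cross-validation step summarised in Figure 1 can be sketched as repeated random TRS/TST splits. In this sketch the data are simulated, ridge regression stands in for whichever GS model is used, and the shrinkage parameter is fixed by assumption. Performance is reported as predictive ability: the Pearson correlation between GEBVs and observed TST phenotypes, which is what can actually be measured in practice.

```python
import numpy as np

def ridge_gebv(X_train, y_train, X_test, lam):
    """Fit ridge (RR-BLUP-like) marker effects on the TRS and return GEBVs for the TST."""
    means = X_train.mean(axis=0)
    Z = X_train - means
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (y_train - y_train.mean()))
    return y_train.mean() + (X_test - means) @ beta

def cross_validate(X, y, train_fraction=0.8, n_repeats=20, lam=100.0, seed=0):
    """Repeated random TRS/TST splits; mean and SD of Pearson r between GEBV and observed phenotype."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_fraction * n))
    scores = []
    for _ in range(n_repeats):
        order = rng.permutation(n)
        trs, tst = order[:n_train], order[n_train:]
        gebv = ridge_gebv(X[trs], y[trs], X[tst], lam)
        scores.append(np.corrcoef(gebv, y[tst])[0, 1])
    return float(np.mean(scores)), float(np.std(scores))

# Hypothetical data set: 300 lines x 200 SNPs with a simulated polygenic trait (h2 about 0.5).
rng = np.random.default_rng(1)
X = rng.binomial(2, 0.5, size=(300, 200)).astype(float)
g = X @ rng.normal(0.0, 0.1, size=200)
y = g + rng.normal(0.0, g.std(), size=300)

mean_r, sd_r = cross_validate(X, y)
print(f"Predictive ability (80/20 split, 20 repeats): {mean_r:.2f} +/- {sd_r:.2f}")
```

Repeating the split reduces the dependence of the estimate on any single partition of the lines.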
In tomato, pioneering studies on the application of GS to yield-related traits were reported for fresh market varieties and wild relatives [17,18]. More recently, Yamamoto et al. [19] assessed the potential of GS to increase soluble solids content and fruit weight in F1 tomato varieties. Liabeuf et al. [20] reported the implementation of a GS approach to develop tomato lines resistant to bacterial spot. GS models have been widely used to predict the phenotypes of progeny and parents, although their efficiency varied depending on the parental cross combinations and the selected traits [21]. Optimized and validated GS protocols are still needed in tomato. Because several tomato GS programs are still in progress, the factors affecting model implementation and accuracy have not yet been fully evaluated, and their optimization for tomato breeding is still required. These factors should be further investigated. They include phenotyping procedures, TRN size, the genetic relationship between individuals in the TRS and TST, genotyping platforms, marker quality metrics and the design of GS schemes. Here, we discuss the application of GS in tomato breeding schemes, within and across breeding generations. We also discuss its potential to select parents based on their GEBVs.
2. Tomato GS scheme implementation
Recent studies have demonstrated that establishing the optimal parameters of a GS experiment requires careful evaluation of key factors. Selection response depends on three things. The first is the precision of the phenotyping and genotyping methods used to obtain the GEBVs (including TRN size, marker density and marker technology). The second is knowledge of the genome structure, and the third is marker linkage disequilibrium (LD).
The success of modern breeding programs based on genomic techniques depends strictly on the precision of measurements of the phenotyped traits [22]. Digital instruments with scalable technologies can improve the precision of phenotyping [23] and accelerate selection. Recent technologies have been used to acquire specific data on tomato traits. They aim to increase the precision and throughput of measurements and the size of the analyzed plant populations, thus enhancing the accuracy of the predicted phenotypic value and the genetic gain [24,25].
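The deterministic approximation of Daetwyler and colleagues (2008) shows why phenotyping precision matters. It is a general result, not derived from the tomato studies cited here. More precise phenotyping raises the heritability h2 of the trait as measured. In this approximation, expected accuracy r depends on the product of TRN size N and h2, relative to the effective number of independent chromosome segments Me. The Me value below is an arbitrary assumption used only to show trends, not an estimate for tomato.

```python
import math

def expected_accuracy(n_train: int, h2: float, m_e: float) -> float:
    """Deterministic approximation r = sqrt(N*h2 / (N*h2 + Me)).

    n_train: number of genotyped and phenotyped TRS individuals (N)
    h2:      heritability of the trait as measured (higher with more precise phenotyping)
    m_e:     effective number of independent chromosome segments (assumed, population-specific)
    """
    signal = n_train * h2
    return math.sqrt(signal / (signal + m_e))

# Illustrative values only: Me = 1000 is an assumption, and the two h2 levels are hypothetical
# stand-ins for less and more precise phenotyping.
for h2 in (0.3, 0.6):
    for n in (100, 500, 1000):
        print(f"h2={h2:.1f}  N={n:4d}  expected r={expected_accuracy(n, h2, 1000):.2f}")
```

N and h2 enter only as their product. Under this approximation, doubling h2 through better phenotyping therefore has the same effect as doubling the TRN size.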
The appropriate TRN size and composition are also critical for achieving high prediction accuracy. A positive correlation between prediction accuracy and TRN size has been confirmed in several species [26,27]. However, the optimal TRN size seems to be strongly influenced by the relatedness of the TRS and TST [28,20,29]. The highest prediction accuracies were obtained using a TRS closely related to the TST [15,30,31]. Indeed, when the TRS and TST are unrelated, marker effects can be inconsistent due to the presence of different alleles, allele frequencies and linkage phases. Developing an ad hoc TRN is therefore crucial. Updating the TRN at each cycle could improve prediction accuracy, since the segregating population may accumulate genetic diversity and allele frequencies may change at each selection cycle [20].
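Relatedness between the TRS and TST can be quantified with a genomic relationship matrix. The sketch below uses VanRaden's first method (2008), a standard construction that is not specific to the cited tomato work. It is applied to hypothetical families simulated by perturbing an inbred founder. Summarising each TST line by its mean relationship to the TRS is one simple choice among several.

```python
import numpy as np

def vanraden_g(X: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix (VanRaden method 1) from lines x SNPs 0/1/2 dosages."""
    p = X.mean(axis=0) / 2.0                  # frequency of the counted allele
    Z = X - 2.0 * p                           # centre each marker on its expected dosage
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def mean_relationship_to_trs(G, trs_idx, tst_idx):
    """Average genomic relationship of each TST line with all TRS lines."""
    return G[np.ix_(tst_idx, trs_idx)].mean(axis=1)

def simulate_family(founder, n, resample_rate, rng):
    """Hypothetical family: copies of an inbred founder (0/2 dosages) with some loci re-drawn at random."""
    members = np.tile(founder, (n, 1))
    mask = rng.random(members.shape) < resample_rate
    members[mask] = rng.choice([0, 2], size=int(mask.sum()))
    return members

rng = np.random.default_rng(7)
founder_a = rng.choice([0, 2], size=1000)
founder_b = rng.choice([0, 2], size=1000)
X = np.vstack([
    simulate_family(founder_a, 40, 0.2, rng),   # rows 0-39: TRS drawn from family A
    simulate_family(founder_a, 10, 0.2, rng),   # rows 40-49: TST lines related to the TRS
    simulate_family(founder_b, 10, 0.2, rng),   # rows 50-59: TST lines from an unrelated family
]).astype(float)

G = vanraden_g(X)
rel = mean_relationship_to_trs(G, np.arange(40), np.arange(40, 60))
print(f"related TST: {rel[:10].mean():.2f}   unrelated TST: {rel[10:].mean():.2f}")
```

Lines with a low mean relationship to the TRS are the ones for which marker effects are most likely to transfer poorly.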
To capture as many informative loci as possible, an appropriate marker density is required [32]. In this regard, genotyping-by-sequencing (GBS) can be used to efficiently generate high-density marker panels. Alternatively, the cDNA-based GBS technique RARseq (restriction site associated RNA sequencing) may detect conserved SNPs associated with a candidate mutation directly at the expression level [33]. Recently, a customizable method for targeted genotyping in tomato, named single primer enrichment technology (SPET), was developed. It improves panel design and increases the multiplexing levels of tomato genotyping [34]. Data from previous GS rounds can help design an optimized suite of markers for the next steps. Liabeuf et al. [20] reduced the initial SolCAP array of 7,700 SNPs [35] to a smaller marker set for screening populations with limited recombination. Prediction accuracy may also be affected by the minor allele frequency (MAF) threshold [32]. Establishing methods for efficiently transferring validated genome signatures into tomato breeding selection procedures is also relevant. Linkage drag caused by recombination suppression can be reduced by estimating the effects of relevant markers, thereby improving prediction performance. Indeed, large introgressed fragments from wild Solanum species have caused drastic changes in the chromosome landscape of tomato cultivars. In modern varieties, the Solanum peruvianum introgression carrying the tomato mosaic virus (ToMV) resistance gene Tm2 can cover up to 79% of chromosome 9 [3].
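Marker quality control, such as the MAF threshold mentioned above, is usually applied before model fitting. The minimal sketch below assumes 0/1/2 dosage coding with missing calls stored as NaN. The 0.05 cut-off is a commonly used default, not a value recommended for tomato by the cited studies.

```python
import numpy as np

def filter_by_maf(X: np.ndarray, min_maf: float = 0.05):
    """Drop SNPs whose minor allele frequency (MAF) is below min_maf.

    X holds lines x SNPs allele dosages (0/1/2) with np.nan for missing calls;
    frequencies are computed from the non-missing calls of each marker.
    """
    freq = np.nanmean(X, axis=0) / 2.0        # frequency of the counted allele
    maf = np.minimum(freq, 1.0 - freq)        # frequency of the rarer allele
    keep = maf >= min_maf
    return X[:, keep], keep

# Hypothetical panel: 4 lines x 4 SNPs; the first SNP is monomorphic and the third has a missing call.
X = np.array([
    [0, 2, 1, 0],
    [0, 2, 2, 0],
    [0, 0, 1, 2],
    [0, 2, np.nan, 0],
])
X_filtered, kept = filter_by_maf(X, min_maf=0.05)
print(kept)              # boolean mask of retained markers
print(X_filtered.shape)
```

Raising the threshold removes rare variants. This can reduce noise, but it may also discard rare alleles that matter in a narrow gene pool such as cultivated tomato.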
Within the GS framework, several statistical methods have been used to estimate marker effects in tomato [20]. The most appropriate method should be chosen for the specific context. The choice should consider model complexity (genetic architecture, population size and heritability) and computational requirements [36,37]. Ridge regression best linear unbiased prediction (RR-BLUP) and genomic best linear unbiased prediction (GBLUP) [38] are recommended for traits affected by many small-effect genes when the TRN is closely related to the selection candidates. On the other hand, Bayesian methods can give higher prediction accuracy when traits are controlled by major-effect QTLs or when predicting unrelated individuals [39]. However, empirical studies suggest that there are no major differences between regression-based and Bayesian GS methods in tomato.
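The close relationship between RR-BLUP and GBLUP can be checked numerically. The sketch assumes a centred marker matrix Z, a genomic relationship matrix G = ZZ'/k with k = 2*sum(p(1-p)), and a shrinkage parameter lam (the ratio of residual to marker-effect variance). Under these assumptions, predicting new lines from marker effects or from genomic relationships gives identical GEBVs. The data are simulated, lam is fixed for illustration rather than estimated, and the mean is treated as known (set to the TRS mean) to keep the comparison simple.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.binomial(2, 0.4, size=(120, 300)).astype(float)    # hypothetical 0/1/2 dosages
y = X @ rng.normal(0.0, 0.1, size=300) + rng.normal(0.0, 1.0, size=120)
trs, tst = np.arange(100), np.arange(100, 120)
lam = 50.0                                                  # assumed sigma2_e / sigma2_beta

p = X[trs].mean(axis=0) / 2.0
Z_trs, Z_tst = X[trs] - 2 * p, X[tst] - 2 * p              # centre with TRS allele frequencies
y_c = y[trs] - y[trs].mean()                                # mean treated as known (simplification)

# RR-BLUP: shrink all marker effects equally, then sum them for the new lines.
beta = np.linalg.solve(Z_trs.T @ Z_trs + lam * np.eye(Z_trs.shape[1]), Z_trs.T @ y_c)
gebv_rrblup = Z_tst @ beta

# GBLUP: the same model written with the genomic relationship matrix G = ZZ'/k.
k = 2.0 * np.sum(p * (1.0 - p))
G_nn = Z_trs @ Z_trs.T / k
G_tn = Z_tst @ Z_trs.T / k
gebv_gblup = G_tn @ np.linalg.solve(G_nn + (lam / k) * np.eye(len(trs)), y_c)

print(np.allclose(gebv_rrblup, gebv_gblup))
```

GBLUP solves an n x n system (lines) instead of an m x m system (markers). This makes it cheaper when markers far outnumber phenotyped lines, as is typical with high-density SNP panels.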
3. Applying GS in tomato crop improvement
Several constraints can affect the genetic gain of a GS program in tomato. The implementation
of GS requires the optimization of field trial management and agricultural practices, seed production,
phenotyping, sample collection and sequencing [40]. Moreover, as discussed above, several parameters should be carefully evaluated to run a GS-assisted breeding scheme effectively. These include the inbreeding level of populations, the number of individuals to be assessed and marker metrics. It can be estimated that, in tomato breeding programs, the genotyping work required to complete GEBV predictions takes approximately three months. Once these issues have been addressed, GEBVs can be calculated both to select parental lines and to evaluate the overall performance of progenies in descendent selection or backcross schemes. Selection decisions can then be based on the highest GEBVs for each tested trait, on the overall average across traits, or on indices combining the GEBVs of several traits according to selection priorities.
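A selection index that combines the GEBVs of several traits can be sketched as a weighted sum of standardised GEBVs. The traits, GEBV values and weights below are hypothetical. Standardising each trait before weighting is one simple convention, not an optimal (Smith-Hazel) index.

```python
import numpy as np

def selection_index(gebv: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of per-trait standardised GEBVs: I = sum_k w_k * z_k."""
    z = (gebv - gebv.mean(axis=0)) / gebv.std(axis=0)   # put traits on a common scale
    return z @ weights

# Hypothetical GEBVs for 5 candidate lines (rows) and 3 traits (columns):
# fruit weight (g), soluble solids (degrees Brix), disease score (lower is better).
gebv = np.array([
    [12.0, 0.30, -0.5],
    [ 8.0, 0.60, -0.2],
    [15.0, 0.10,  0.4],
    [ 5.0, 0.50, -0.9],
    [10.0, 0.20,  0.1],
])
# Hypothetical breeding priorities; the negative weight rewards low disease scores.
weights = np.array([1.0, 1.5, -1.0])

index = selection_index(gebv, weights)
selected = np.argsort(index)[::-1][:2]   # indices of the two lines with the highest index
print(selected, index.round(2))
```

In this example the line with the heaviest fruit (third row) is not retained. Its low soluble solids and high disease score outweigh its fruit-weight advantage under the chosen priorities.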
4. Evaluation of elite lines
The first step in tomato F1 hybrid variety development is the selection of elite parents to maximize the exploitation of genetic variability. Elite germplasm represents a core collection of cross-compatible genotypes enriched for favorable alleles [41]. In a GS-assisted breeding scheme for tomato F1 hybrid development, parental lines are selected on the basis of their breeding value (i.e., the mean performance of the progeny of a given parent). This value therefore needs to be estimated accurately. Accordingly, Yamamoto and collaborators [19] used a set of 96 big-fruited F1 tomato varieties to develop GS models, and the segregating populations obtained from crosses were used to validate the models. Using progeny genotypic and phenotypic data, the GS models then successfully predicted parental combinations that generate superior hybrids for soluble solids content and total fruit weight. However, the efficiency of predictions varied depending on traits and parental combinations. The need to fix favorable alleles in the gene pool leads to increased inbreeding. At the same time, GS selection gain is dramatically reduced in small populations with narrow genetic variability. Managing elite genetic diversity to increase the frequency of favorable alleles over time can therefore benefit greatly from GS approaches [41]. The prediction accuracy of parental combining ability could improve with the assessment of a larger number of selfing progenies. Thanks to advances in tomato genome knowledge and genotyping technologies, breeders can easily identify valuable alleles in elite germplasm [42,43]. They can then create new lines combining these valuable alleles using a set of validated markers.
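Parental combinations can be screened roughly under a purely additive assumption: the expected mean of the progeny from a cross is the average of the two parental GEBVs (the mid-parent value). This ignores dominance, which matters for F1 hybrid performance, as well as the genetic variance within each cross. It is therefore a simplified stand-in, not the procedure used by Yamamoto and collaborators [19]. Line names and GEBVs are hypothetical.

```python
from itertools import combinations

def rank_crosses(parent_gebv: dict, top: int = 3):
    """Rank all pairwise crosses by mid-parent GEBV (additive-only assumption)."""
    crosses = [
        ((a, b), (parent_gebv[a] + parent_gebv[b]) / 2.0)
        for a, b in combinations(sorted(parent_gebv), 2)
    ]
    return sorted(crosses, key=lambda item: item[1], reverse=True)[:top]

# Hypothetical elite lines and GEBVs for soluble solids content (deviation in degrees Brix).
parents = {"EL01": 0.45, "EL02": 0.10, "EL03": 0.38, "EL04": -0.05, "EL05": 0.22}
top_crosses = rank_crosses(parents)
for (p1, p2), value in top_crosses:
    print(f"{p1} x {p2}: expected progeny mean {value:+.3f}")
```

Crosses with similar mid-parent values can still differ in how much variation they release. This is one reason why prediction efficiency varies among parental combinations.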
5. Descendent selection schemes
In tomato, breeders commonly exploit useful genetic variability by recycling the best-performing varieties that have been successful in a given area. They do so through the single seed descent (SSD) scheme, in which each generation is derived from the previous one by taking only one seed from each plant. Nearly all steps can be conducted in the greenhouse, making SSD a method of choice for accelerating breeding in areas that lack a sufficiently long growing season [43]. In the classical SSD scheme, the choice of tomato parental lines is critical for ensuring a high additive breeding value, because each round of self-fertilization halves the remaining heterozygosity. In the SSD scheme, no selection is conducted until the last generation (generally F6-F7), so a large number of lines must be phenotyped, which can be challenging. Integrating GS into the SSD scheme could reduce the number of selfing generations, thus shortening the overall scheme and decreasing the phenotyping effort (Figure 2). Prediction accuracy is generally higher when LD is high. Greater breeding gains are therefore expected when GS is applied in the earliest heterozygous segregating generations (i.e., F2-F4). These generations could be used to develop the GS model, and GS prediction could then assist selection in the following generations. Genomic data can accurately track the best-performing plants across generations, and the approach can lead to the selection of the individuals with the highest GEBVs.
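The halving of heterozygosity under selfing can be tabulated directly. It underlies both the F6-F7 endpoint of classical SSD and the still substantial heterozygosity of F2-F4. The minimal sketch below assumes a single F1 cross, loci at which the two parents differ, and no selection.

```python
from fractions import Fraction

def ssd_homozygosity(generation: int) -> Fraction:
    """Expected homozygosity in generation F_n of an SSD program started from an F1.

    The F1 is heterozygous at every locus where the parents differ; each selfing halves
    the remaining heterozygosity, so heterozygosity in F_n is (1/2)**(n - 1).
    """
    if generation < 1:
        raise ValueError("generation must be >= 1 (F1)")
    return 1 - Fraction(1, 2) ** (generation - 1)

for n in range(2, 8):
    print(f"F{n}: {float(ssd_homozygosity(n)):.1%} homozygous")
```

Exact fractions are used so that the values for each generation (1/2, 3/4, ..., 63/64) can be read off without rounding.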
Figure 2. Comparison of genomic selection (GS) and conventional selection in tomato breeding
programs. Screening of recombinant lines through GS approaches optimizes the genetic gain obtained
in each selection cycle. Breeding cycles (horizontal dashed lines) are shortened by removing
phenotypic evaluation of lines before training population (TRN) evaluation for the next cycle.
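Shortening the breeding cycle, as in Figure 2, can raise gain per year even when per-cycle accuracy is lower. The standard breeder's equation per unit of time shows this. The sketch uses illustrative values that are not estimates from any tomato program. A selection intensity of 1.4 corresponds to keeping roughly the best 20% of candidates.

```python
def gain_per_year(intensity: float, accuracy: float, sigma_a: float, cycle_years: float) -> float:
    """Breeder's equation per unit of time: delta_G per year = i * r * sigma_A / L."""
    return intensity * accuracy * sigma_a / cycle_years

# Hypothetical scenario: GS is assumed to be less accurate than phenotypic selection (PS)
# per cycle, but to halve the cycle length by selecting before full field evaluation.
ps = gain_per_year(intensity=1.4, accuracy=0.7, sigma_a=1.0, cycle_years=2.0)
gs = gain_per_year(intensity=1.4, accuracy=0.5, sigma_a=1.0, cycle_years=1.0)
print(f"PS: {ps:.2f} sigma_A per year   GS: {gs:.2f} sigma_A per year")
```

Under these assumptions, the shorter cycle more than compensates for the lower accuracy. With a smaller reduction in cycle length, the ranking could reverse.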
6. Backcross schemes
Backcrossing is a popular breeding scheme in which a valuable trait is introgressed from a donor parent into the genomic background of a recurrent parent. In tomato breeding, backcrossing schemes with exotic or elite materials are widely used to introduce favorable traits. However, several factors can reduce the genetic gain per year. These include the constant introduction of novel alleles, linkage drag, crossing with old varieties or exotic material of low breeding value, and the extended breeding cycles that result from complex crossing schemes. The response to selection achieved by choosing lines with high breeding value in a segregating population can be improved by GS (Figure 2). A variant of the classical backcross scheme, in which lines of each generation are selected based on recurrent parent breeding value, achieved high rates of genetic gain [44,45]. By combining GS with single-marker assays, genes with major effects can also be selected within each offspring following the cross with the recurrent line. In this way, the GS approach is expected to increase the genetic gain additively at each generation. Candidate genotypes carrying specific alleles (e.g., resistance genes) can be identified using genotyping platforms that include gene-specific diagnostic markers or that integrate single-locus data obtained with different technologies. In addition, a subset of the markers used to implement the GS model can be chosen to identify undesirable segments of the wild donor genome. Indeed, resistance gene introgressions have left large wild genome segments (between 30 and 70% of the whole chromosome) on specific chromosomes of cultivated tomatoes [3]. As an extension of this approach, genome-wide selection with high-throughput markers in BC1 could be even more efficient. Recovery of the recurrent parent genome could be increased from BC1 to BC3 without affecting the introgression of the favorable trait.
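Recurrent parent (RP) genome recovery can be made explicit. Without selection, the expected RP proportion in BCn is 1 - (1/2)^(n+1), that is 75% in BC1, 87.5% in BC2 and 93.75% in BC3, ignoring linkage to the target gene. The sketch then combines a foreground check for the target allele with background scoring at genome-wide markers. This is a simplified marker-assisted backcrossing step, used here as a stand-in for the GS-assisted variant discussed above. The marker data are hypothetical.

```python
import numpy as np

def expected_rp_recovery(bc_generation: int) -> float:
    """Expected RP genome proportion in BCn without selection: 1 - (1/2)**(n + 1)."""
    return 1.0 - 0.5 ** (bc_generation + 1)

def pick_backcross_plants(rp_dosage: np.ndarray, carries_target: np.ndarray, n_keep: int) -> np.ndarray:
    """Foreground plus background selection sketch.

    rp_dosage:      plants x markers, number of RP alleles (1 or 2 in a backcross) per marker
    carries_target: boolean per plant, True if the diagnostic marker shows the donor allele
    Returns indices of the n_keep target carriers with the highest observed RP proportion.
    """
    rp_fraction = rp_dosage.mean(axis=1) / 2.0
    candidates = np.flatnonzero(carries_target)
    order = np.argsort(rp_fraction[candidates])[::-1]
    return candidates[order[:n_keep]]

# Hypothetical BC1 plants (rows) scored at 8 background markers.
rp = np.array([
    [2, 1, 2, 2, 1, 2, 2, 2],
    [1, 1, 2, 1, 1, 2, 1, 2],
    [2, 2, 2, 2, 1, 2, 2, 2],
    [2, 2, 2, 2, 2, 2, 1, 2],
])
target = np.array([True, True, False, True])
print([expected_rp_recovery(n) for n in (1, 2, 3)])
print(pick_backcross_plants(rp, target, n_keep=2))
```

The third plant has the highest RP proportion but is discarded because it lacks the target allele. This shows why foreground selection is applied before ranking on background recovery.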
7. Conclusions
Efficiently evaluating complex traits in a segregating population can be a difficult task for tomato breeders. Examples include disease resistance conferred by resistance genes and quality traits controlled by QTLs. Implementing GS in breeding schemes to support the selection of improved genotypes can accelerate the achievable genetic gain. Major GS implementation challenges were highlighted here, including model development, genotyping quality and the optimal stage for incorporating GS, and ways to overcome these issues were discussed. Methodological procedures are beginning to be delineated, but the optimal way to incorporate GS into a breeding scheme remains to be defined empirically. The features that determine the success of GS under different breeding scenarios should be assessed. Advances in genotyping efficiency and phenotyping technologies will facilitate the adoption of GS in tomato breeding. In the future, existing selection schemes may be updated using computer simulations to investigate strategies for filling gaps in the selection process.
Author Contributions: EC was centrally involved in writing the manuscript and in drafting figures. GA
revised the manuscript and produced the figures. ADM revised the text. AB provided important suggestions for
improving the manuscript. LF critically revised the manuscript. MRE conceived the study, coordinated work
and contributed to manuscript writing. All of the authors read and approved the final manuscript.
Funding: This work was supported by the Italian Ministry of University and Research and carried out within the TomGEM project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 679796.
Conflicts of Interest Statement: The authors declare that the research was conducted in the absence of any
commercial or financial relationships that could be construed as a potential conflict of interest.
References
1. Frusciante, L.; Barone, A.; Carputo, D.; Ercolano, M.R.; Della Rocca, F.; Esposito, S. Evaluation and use of plant biodiversity for food and pharmaceuticals. Fitoterapia. 2000, 71, 66-72.
2. Sato, S.; Tabata, S.; Hirakawa, H.; Asamizu, E.; Shirasawa, K.; Isobe, S.; Kaneko, T.; Nakamura, Y.; Shibata, D.; Aoki, K. The tomato genome sequence provides insights into fleshy fruit evolution. Nature. 2012, 485, 635-641.
3. Schouten, H.J.; Tikunov, Y.; Verkerke, W.; Finkers, R.; Bovy, A.; Bai, Y.; Visser, R.G.F. Breeding has Increased the Diversity of Cultivated Tomato in The Netherlands. Front. Plant Sci. 2019, 10, 1606.
4. Souza, L.M.; Paterniani, M.; Melo, P.C.T.; Melo, A.M.T. Diallel cross among fresh market tomato inbreeding lines. Hortic. Bras. 2012, 30, 246-251.
5. Ercolano, M.R.; Sanseverino, W.; Carli, P.; Ferriello, F.; Frusciante, L. Genetic and genomic approaches for R-gene mediated disease resistance in tomato: retrospects and prospects. Plant Cell Rep. 2012, 31, 973-985.
6. Sacco, A.; Di Matteo, A.; Lombardi, N.; Trotta, N.; Punzo, B.; Mari, A.; Barone, A. Quantitative trait loci
pyramiding for fruit quality traits in tomato. Mol. Breed. 2013, 31(1), 217-222.
7. Collard, B.C.Y.; Mackill, D.J. Marker-assisted selection: an approach for precision plant breeding in the twenty-first century. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2008, 363, 557-572.
8. Andolfo, G.; Jupe, F.; Witek, K.; Etherington, G.J.; Ercolano, M.R.; Jones, J.D.G. Defining the full tomato NB-LRR resistance gene repertoire using genomic and cDNA RenSeq. BMC Plant Biol. 2014, 14, 120.
9. Capuozzo, C.; Formisano, G.; Iovieno, P.; Andolfo, G.; Tomassoli, L.; Barbella, M.M.; Pico, B.; Paris, H.S.; Ercolano, M.R. Inheritance analysis and identification of SNP markers associated with ZYMV resistance in Cucurbita pepo. Mol. Breed. 2017, 37(8), 1-12.
10. Kissoudis, C.; Chowdhury, R.; van Heusden, S.; van de Wiel, C.; Finkers, R.; Visser, R.F.; Bai, Y.; van der Linden, G. Combined biotic and abiotic stress resistance in tomato. Euphytica. 2015, 202, 317-332.
11. Osei, M.K.; Prempeh, R.; Adjebeng, J.; Opoku, J.; Danquah, A.; Danquah, E.; Blay, E.; Adu-Dapaah, H.; et al. Marker-Assisted Selection (MAS): A Fast-Track Tool in Tomato Breeding. (Recent Advances in Tomato Breeding and Production). IntechOpen, 2018, 93-113.
12. Dekkers, J.C.M.; Hospital, F. The use of molecular genetics in the improvement of agricultural populations. Nat. Rev. Genet. 2002, 3(1), 22-32.
13. Heffner, E.L.; Sorrells, M.E.; Jannink, J.L. Genomic Selection for Crop Improvement. Crop Sci. 2009, 49(1), 1-12.
14. Crossa, J.; De Los Campos, G.; Pérez, P.; Gianola, D.; Burgueño, J.; Araus, J.L.; Makumbi, D.; Singh, R.P.; Dreisigacker, S.; Yan, J.; et al. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics. 2010, 186(2), 713-724.
15. Lorenz, A.J.; Chao, S.; Asoro, F.G.; Heffner, E.L.; Hayashi, T.; Iwata, H.; Smith, K.P.; Sorrells, M.E.; Jannink, J.L. Genomic Selection in Plant Breeding: Knowledge and Prospects. Adv. Agron. 2011, 110, 77-123.
16. Wang, X.; Xu, Y.; Hu, Z.; Xu, C. Genomic selection methods for crop improvement: Current status and prospects. Crop J. 2018, 6, 330-340.
17. Duangjit, J.; Causse, M.; Sauvage, C. Efficiency of genomic selection for tomato fruit quality. Mol. Breed. 2016,
36, 29.
18. Yamamoto, E.; Matsunaga, A.; Onogi, A.; Kajiya-Kanegae, H.; Minamikawa, M.; Suzuki, A.; Shirasawa, K.; Hirakawa, H.; Nunome, T.; Yamaguchi, H.; et al. A simulation-based breeding design that uses whole-genome prediction in tomato. Sci. Rep. 2016, 6, 19454.
19. Yamamoto, E.; Matsunaga, H.; Onogi, A.; Ohyama, A.; Miyatake, K.; Yamaguchi, H.; Nunome, T.; Iwata, H.; Fukuoka, H. Efficiency of genomic selection for breeding population design and phenotype prediction in tomato. Heredity. 2017, 118, 202-209.
20. Liabeuf, D.; Sim, S.C.; Francis, D.M. Comparison of marker-based genomic estimated breeding values and phenotypic evaluation for selection of bacterial spot resistance in tomato. Phytopathology. 2018, 108, 392-401.
21. Robertsen, C.D.; Hjortshøj, R.L.; Janss, L.L. Genomic Selection in Cereal Breeding. Agron. 2019, 9, 1-16.
22. Esposito, S.; Carputo, D.; Cardi, T.; Tripodi, P. Applications and Trends of Machine Learning in Genomics and Phenomics for Next-Generation Breeding. Plants. 2020, 9(1), 34.
23. Araus, J.L.; Cairns, J.E. Field high-throughput phenotyping: the new crop breeding frontier. Trends Plant Sci. 2014, 19, 52-61.
24. Panthee, D.R.; Labate, J.A.; McGrath, M.T.; Breksa, A.P.; Robertson, L.D. Genotype and environmental interaction for fruit quality traits in vintage tomato varieties. Euphytica. 2013, 193, 169-182.
25. Daniel, I.O.; Atinsola, K.O.; Ajala, M.O.; Popoola, A.R. Phenotyping a Tomato Breeding Population by Manual Field Evaluation and Digital Imaging Analysis. Int. J. Plant Breed. Genet. 2017, 11, 19-24.
26. Meuwissen, T.; Hayes, B.; Goddard, M. Accelerating improvement of livestock with genomic selection. Annu. Rev. Anim. Biosci. 2013, 1(1), 221-237.
27. Sarinelli, J.M.; Murphy, J.P.; Tyagi, P.; Holland, J.B.; Johnson, J.W.; Mergoum, M.; Mason, R.E.; Babar, A.; Harrison, S.; Sutton, R.; et al. Training population selection and use of fixed effects to optimize genomic predictions in a historical USA winter wheat panel. Theor. Appl. Genet. 2019, 132, 1-15.
28. Schopp, P.; Müller, D.; Technow, F.; Melchinger, A.E. Accuracy of genomic prediction in synthetic populations depending on the number of parents, relatedness, and ancestral linkage disequilibrium. Genetics. 2017, 205, 441-454.
29. Edwards, S.M.; Buntjer, J.B.; Jackson, R.; Bentley, A.R.; Lage, J.; Byrne, E.; et al. The effects of training population design on genomic prediction accuracy in wheat. Theor. Appl. Genet. 2019, 132, 1943-1952.
30. Nielsen, N.H.; Jahoor, A.; Jensen, D.; Orabi, J.; Cericola, F.; Edriss, V.; Jensen, J. Genomic prediction of seed quality traits using advanced barley breeding lines. PLoS ONE. 2016, 11(10), e0164494.
31. Cericola, F.; Jahoor, A.; Orabi, J.; Andersen, J.R.; Janss, L.L.; Jensen, J. Optimizing training population size
and genotyping strategy for genomic prediction using association study results and pedigree information. A
case of study in advanced wheat breeding lines. PLoS ONE. 2017, 12, e0169606.
32. Zhang, H.; Yin, L.; Wang, M.; Yuan, X.; Liu, X. Factors affecting the accuracy of genomic selection for
agricultural economic traits in maize, cattle, and pig populations. Front. Genet. 2019, 10, 189.
33. Alabady, M.S.; Rogers, W.L.; Malmberg, R.L. Development of transcriptomic markers for population analysis
using restriction site associated RNA sequencing (RARseq). PLoS ONE. 2015, 10(8), e0134855.
34. Barchi, L.; Acquadro, A.; Alonso, D.; Aprea, G.; Bassolino, L.; Demurtas, O.; Ferrante, P.; Gramazio, P.; Mini,
P.; Portis, E.; et al. Single Primer Enrichment Technology (SPET) for High-Throughput Genotyping in Tomato
and Eggplant Germplasm. Front. Plant Sci. 2019, 10, 1005.
35. Sim, S.C.; Van Deynze, A.; Stoffel, K.; Douches, D.S.; Zarka, D.; Ganal, M.W.; Chetelat, R.T.; Hutton, S.F.; Scott, J.W.; Gardner, R.G.; et al. High-Density SNP Genotyping of Tomato (Solanum lycopersicum L.) Reveals Patterns of Genetic Variation Due to Breeding. PLoS ONE. 2012, 7(9), e45520.
36. Maltecca, C.; Parker, K.L.; Cassady, J.P. Application of multiple shrinkage methods to genomic predictions. J. Anim. Sci. 2012, 90, 1777-1787.
37. Heslot, N.; Rutkoski, J.; Poland, J.; Jannink, J.L.; Sorrells, M.E. Impact of marker ascertainment bias on genomic selection accuracy and estimates of genetic diversity. PLoS ONE. 2013, 8, e74612.
38. Whittaker, J.C.; Thompson, R.; Denham, M.C. Marker-assisted selection using ridge regression. Genet. Res. 2000, 75, 249-252.
39. De los Campos, G.; Hickey, J.M.; Pong-Wong, R.; Daetwyler, H.D.; Calus, M.P.L. Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics. 2013, 193, 327-345.
40. Bassi, F.M.; Bentley, A.R.; Charmet, G.; Ortiz, R.; Crossa, J. Breeding Schemes for the Implementation of Genomic Selection in Wheat (Triticum spp.). Plant Sci. 2016, 242, 23-36.
41. Falk, D.E. Generating and maintaining diversity at the elite level in crop breeding. Genome. 2010, 53, 982-991.
41. Ercolano, M.R.; Sacco, A.; Ferriello, F.; D'Alessandro, R.; Tononi, P.; Traini, A.; Barone, A.; Zago, E.; Chiusano,
M.L.; Buson, G.; et al. Patchwork sequencing of tomato San Marzano and Vesuviano varieties highlights genome‐
wide variations. BMC Genom. 2014, 15, 138.
42. Sacco, A.; Ruggieri, V., Parisi, M.; Festa, G.; Rigano, M.M.; Picarella, M.E.; Mazzucato, A.; Barone, A.
Exploring a tomato landraces collection for fruit-related traits by the aid of a high-throughput genomic platform.
PLoS ONE. 2015, 10, e0137139.
43. Kanbar, A.; Kondo, K.; Shashidhar, H.E. Comparative efficiency of pedigree, modified bulk and single seed
descent breeding methods of selection for developing high-yielding lines in rice (Oryza sativa L.) under aerobic
condition. Electron. j. plant breed. 2011, 2(2), 184-193.
44. Breseghello, F.; Morais, O.P.; Castro, E.M.; Prabhu, S.A.; Bassinello, P.Z.; Pereira, J.P.; Utumi, M.M.; Ferreira,
M.E.; Soares, A.A. Recurrent selection resulted in rapid genetic gain for upland rice in Brazil. Int. Rice Res. Notes.
2009, 34, 1–4.
45. Shelton, A.C.; Tracy, W.F. Recurrent selection and participatory plant breeding for improvement of two organic
open-pollinated sweet corn (Zea mays L.) populations. Sustainability 2015, 7, 5139–5152.