Jose Crossa
Consultative Group on International Agricultural Research | CGIAR · Biometrics and Statistics Unit

PhD

690
Publications
195,318
Reads
28,834
Citations

Publications (690)
Article
The success of genomic selection (GS) in breeding schemes relies on its ability to provide accurate predictions of unobserved lines at early stages. Multigeneration data provides opportunities to increase the training data size and thus, the likelihood of extracting useful information from ancestors to improve prediction accuracy. The genomic best...
Article
Full-text available
Genomic selection (GS) changed the way plant breeders select genotypes. GS takes advantage of phenotypic and genotypic information to train a statistical machine learning model, which is used to predict phenotypic (or breeding) values of new lines for which only genotypic information is available. Therefore, many statistical machine learning met...
Preprint
Full-text available
Linking high-throughput environmental data (enviromics) into genomic prediction (GP) is a cost-effective strategy for increasing selection intensity under genotype-by-environment interactions (GxE). This study developed a data-driven approach based on Environment-Phenotype Associations (EPA) aimed at recycling important GxE information from histori...
Preprint
Full-text available
In this study we extend research on genomic prediction (GP) to polysomic polyploid plant species with the main objective to investigate single trait (ST) versus multi-trait (MT) for multi-environment (ME) models for the combination of three locations in Sweden (Helgegården [HEL], Mosslunda [MOS], Umeå [UM]) over two year-trials (2020, 2021) of 253 po...
Preprint
Full-text available
Plant breeders widely use recurrent selection schemes to increase the frequency of favorable alleles for quantitative traits in a population. Although simultaneous selection is complex because it involves several traits combined with selection cycles, selection indexes (SI) are applied to increase the chance of success of the breeding pro...
Article
Full-text available
Wheat dough characteristics and end-use quality are strongly influenced by the amount and specific composition of the glutenins, the major components of gluten. Such proteins are divided into high-molecular-weight glutenins, encoded by the Glu-A1, Glu-B1 and Glu-D1 loci; and low-molecular-weight glutenins, encoded by the Glu-A3, Glu-B3 and Glu-D3 l...
Article
Full-text available
Key message: This study performed comprehensive analyses on the predictive abilities of single-trait and two multi-trait models in three populations. Our results demonstrated the superiority of multi-traits over single-trait models across seven agronomic and four to seven disease resistance traits of different genetic architecture. Abstract: The pre...
Article
Full-text available
Genetic gains (ΔG) are determined by the breeders' equation ΔG = (c k σ²G)/(y σP), where c, k and y are the parental control, a function of the selection intensity and the number of years to complete one selection cycle, respectively, while σ²G and σP are the genetic variance and the square root of the phenotypic variance. Plant breeding programs shou...
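The breeders' equation quoted in this abstract can be illustrated with a small numeric sketch. The function name and every input value below are hypothetical, chosen only to show how the terms combine; none of them come from the paper.

```python
import math


def genetic_gain_per_year(c, k, var_g, var_p, years_per_cycle):
    """Breeders' equation as stated in the abstract: dG = (c * k * var_g) / (y * sigma_p).

    c               -- parental control
    k               -- function of the selection intensity
    var_g           -- genetic variance (sigma^2_G)
    var_p           -- phenotypic variance; its square root is sigma_P
    years_per_cycle -- years needed to complete one selection cycle (y)
    """
    sigma_p = math.sqrt(var_p)
    return (c * k * var_g) / (years_per_cycle * sigma_p)


# Hypothetical inputs: identical genetics, but the cycle length is halved.
gain_slow = genetic_gain_per_year(c=1.0, k=1.755, var_g=0.4, var_p=1.0, years_per_cycle=4)
gain_fast = genetic_gain_per_year(c=1.0, k=1.755, var_g=0.4, var_p=1.0, years_per_cycle=2)

print(f"4-year cycle: {gain_slow:.4f} trait units per year")
print(f"2-year cycle: {gain_fast:.4f} trait units per year")
```

Because y sits in the denominator, halving the cycle length doubles the expected gain per year when all other terms are held fixed.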
Article
Full-text available
Both the Linear Phenotypic Selection Index (LPSI) and the Restrictive Linear Phenotypic Selection Index (RLPSI) have been widely used to select parents and progenies, but the effect of economic weights on the selection parameters (the expected genetic gain, response to selection, and the correlation between the indices and genetic merits) has not...
Article
Full-text available
In plant breeding, the need to improve the prediction of future seasons or new locations and/or environments, also denoted as "leave one environment out," is of paramount importance to increase the genetic gain in breeding programs and contribute to food and nutrition security worldwide. Genomic selection (GS) has the potential to increase the accu...
Chapter
The plant net genetic merit is a linear combination of trait breeding values weighted by its respective economic weights whereas a linear selection index (LSI) is a linear combination of phenotypic or genomic estimated breeding values (GEBV) which is used to predict the net genetic merit of candidates for selection. Because economic values are diff...
Chapter
Full-text available
The main objective of a plant breeding program is to deliver superior germplasm for farmers in a defined set of environments, or a target population of environments (TPE). Historically, CIMMYT has characterized the environments in which the developed germplasm will be grown. The main factors that determine when and where a wheat variety can be grow...
Article
Full-text available
The adoption of machine learning frameworks in areas beyond computer science has been facilitated by the development of user-friendly software tools that do not require an advanced understanding of computer programming. In this paper, we present a new software package (sparse kernel methods, SKM) developed in the R language for implementing six (gener...
Article
Ridge regression deals with collinearity in the homoscedastic linear regression model. When the number of predictors (p) is much larger than the number of observations (n), it gives unique least-square estimators. From both classical and Bayesian approaches, parameter estimation is a highly demanding computational task, in the first one being an op...
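As a rough illustration of the uniqueness claim in this abstract, the sketch below solves the ridge normal equations (X'X + λI)β = X'y for a toy case with one observation and two predictors (p > n). The function name, the data, and the penalty value are assumptions made for illustration; this is not the authors' estimation procedure.

```python
def ridge_one_observation(x_row, y, lam):
    """Solve (X'X + lam * I) beta = X'y for a single observation with two predictors.

    With lam = 0 the system is singular (ordinary least squares has no unique
    solution when p > n); any lam > 0 makes it invertible.
    """
    a, b = x_row
    # Entries of the symmetric 2x2 matrix X'X + lam * I.
    m11 = a * a + lam
    m12 = a * b
    m22 = b * b + lam
    # Right-hand side X'y.
    r1 = a * y
    r2 = b * y
    det = m11 * m22 - m12 * m12
    if det == 0:
        raise ValueError("singular system: no unique estimator")
    # Closed-form solution of a 2x2 linear system (Cramer's rule).
    beta1 = (m22 * r1 - m12 * r2) / det
    beta2 = (m11 * r2 - m12 * r1) / det
    return beta1, beta2


# Toy data: two perfectly collinear predictors, one observation.
beta = ridge_one_observation((1.0, 1.0), 2.0, lam=1.0)
print(beta)
```

Calling the same function with `lam=0.0` raises `ValueError`, which mirrors the fact that the unpenalized problem has infinitely many least-squares solutions here.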
Article
Full-text available
Key message: Sparse testing using genomic prediction can be efficiently used to increase the number of testing environments while maintaining selection intensity in the early yield testing stage without increasing the breeding budget. Abstract: Sparse testing using genomic prediction enables expanded use of selection environments in early-stage yiel...
Chapter
Sound experimental design underpins successful plant improvement research. Robust experimental designs respect fundamental principles including replication, randomization and blocking, and avoid bias and pseudo-replication. Classical experimental designs seek to mitigate the effects of spatial variability with resolvable block plot structures. Rece...
Article
Full-text available
Genomic selection (GS) is a predictive methodology that is changing plant breeding. Genomic selection trains a statistical machine‐learning model using available phenotypic and genotypic data with which predictions are performed for individuals that were only genotyped. For this reason, some statistical machine‐learning methods are being implemente...
Chapter
Full-text available
Genomic-enabled prediction models are of paramount importance for the successful implementation of genomic selection (GS) based on breeding values. As opposed to animal breeding, plant breeding includes extensive multienvironment and multiyear field trial data. Hence, genomic-enabled prediction models should include genotype x environment (G x E) i...
Chapter
Full-text available
Genomic enabled prediction is playing a key role for the success of genomic selection (GS). However, according to the No Free Lunch Theorem, there is not a universal model that performs well for all data sets. Due to this, many statistical and machine learning models are available for genomic prediction. When multitrait data is available, models th...
Article
Full-text available
Vitamin A deficiency (VAD) is a public health issue worldwide. Provitamin A (PVA) biofortified maize serves as an alternative to help combat VAD. Breeding efforts to develop maize varieties with high PVA carotenoid content combine molecular and phenotypic selection strategies. The phenotypic assessment of carotenoids is currently done using liquid...
Chapter
Genomic selection (GS) is a methodology that revolutionized the process of breeding improved genetic materials in plant and animal breeding programs. It uses predicted genomic values of the potential of untested/unobserved genotypes as surrogates of phenotypes during the selection process. Such that the predicted genomic values are obtained using e...
Chapter
In this chapter, we discuss the motivation for integrating other types of omics data into genomic prediction methods. We give an overview of literature investigating the performance of omics-enhanced predictions, and highlight potential pitfalls when applying these methods in breeding. We emphasize that the statistical methods available for genomic...
Chapter
Full-text available
Genomic-enabled prediction models are of paramount importance for the successful implementation of genomic selection (GS) based on breeding values. As opposed to animal breeding, plant breeding includes extensive multienvironment and multiyear field trial data. Hence, genomic-enabled prediction models should include genotype × environment (G × E) i...
Article
Full-text available
Genomic enabled prediction is playing a key role for the success of genomic selection (GS). However, according to the No Free Lunch Theorem, there is not a universal model that performs well for all data sets. Due to this, many statistical and machine learning models are available for genomic prediction. When multitrait data is available, models th...
Article
Full-text available
Interview with Dr. Osval Antonio Montesinos-López and Dr. José Crossa about the future of genomics-driven plant breeding and the role of the disruptive methodology called deep learning in helping and improving the genomic selection method.
Article
Full-text available
Some studies have investigated the potential of genomic selection (GS) on stripe rust, leaf rust, Fusarium head blight (FHB), and leaf spot in wheat, but none of them have assessed the effect of the reaction norm model that incorporated GE interactions. In addition, the prediction accuracy on common bunt has not previously been studied. Here, we in...
Article
Full-text available
Machine learning methods such as multilayer perceptrons (MLP) and Convolutional Neural Networks (CNN) have emerged as promising methods for genomic prediction (GP). In this context, we assess the performance of MLP and CNN on regression and classification tasks in a case study with maize hybrids. The genomic information was provided to the MLP as a...
Article
Full-text available
Genomic selection (GS) is a predictive methodology that trains statistical machine learning models with a reference population that is used to perform genome-enabled predictions of new lines. In plant breeding, it has the potential to increase the speed and reduce the cost of selection. However, to optimize resources, sparse testing methods have bee...
Article
Full-text available
Potato breeding must improve its efficiency by increasing the reliability of selection as well as identifying a promising germplasm for crossing. This study shows the prediction accuracy of genomic-estimated breeding values for several potato (Solanum tuberosum L.) breeding clones and the released cultivars that were evaluated at three locations in...
Article
Full-text available
Key message: Using phenotype data of three spring wheat populations evaluated at 6–15 environments under two management systems, we found moderate to very high prediction accuracies across seven traits. The phenotype data collected under an organic management system effectively predicted the performance of lines in the conventional management and vi...
Chapter
Full-text available
We give a detailed description of random forest and exemplify its use with data from plant breeding and genomic selection. The motivations for using random forest in genomic-enabled prediction are explained. Then we describe the process of building decision trees, which are a key component for building random forest models. We give (1) the random f...
Chapter
Full-text available
The Bayesian paradigm for parameter estimation is introduced and linked to the main problem of genomic-enabled prediction to predict the trait of interest of the non-phenotyped individuals from genotypic information, environment variables, or other information (covariates). In this situation, a convenient practice is to include the individuals to b...
Chapter
Full-text available
Nowadays, huge data quantities are collected and analyzed for delivering deep insights into biological processes and human behavior. This chapter assesses the use of big data for prediction and estimation through statistical machine learning and its applications in agriculture and genetics in general, and specifically, for genome-based prediction a...
Chapter
Full-text available
This chapter deals with the main theoretical fundamentals and practical issues of using functional regression in the context of genomic prediction. We explain how to represent data in functions by means of basis functions and considered two basis functions: Fourier for periodic or near-periodic data and B-splines for nonperiodic data. We derived th...
Chapter
Full-text available
This data preparation chapter is of paramount importance for implementing statistical machine learning methods for genomic selection. We present the basic linear mixed model that gives rise to BLUE and BLUP and explain how to decide when to use fixed or random effects that give rise to best linear unbiased estimates (BLUE or BLUEs) and best linear...
Chapter
Full-text available
The overfitting phenomenon happens when a statistical machine learning model learns very well about the noise as well as the signal that is present in the training data. On the other hand, an underfitted phenomenon occurs when only a few predictors are included in the statistical machine learning model that represents the complete structure of the...
Chapter
Full-text available
The linear mixed model framework is explained in detail in this chapter. We explore three methods of parameter estimation (maximum likelihood, EM algorithm, and REML) and illustrate how genomic-enabled predictions are performed under this framework. We illustrate the use of linear mixed models by including in the predictor several components such as envir...
Chapter
Full-text available
In this chapter, we explain, under a Bayesian framework, the fundamentals and practical issues for implementing genomic prediction models for categorical and count traits. First, we derive the Bayesian ordinal model and exemplify it with plant breeding data. These examples were implemented in the library BGLR. We also derive the ordinal logistic re...
Chapter
Full-text available
The fundamentals for Reproducing Kernel Hilbert Spaces (RKHS) regression methods are described in this chapter. We first point out the virtues of RKHS regression methods and why these methods are gaining a lot of acceptance in statistical machine learning. Key elements for the construction of RKHS regression methods are provided, the kernel trick i...
Chapter
Full-text available
We provide the fundamentals of convolutional neural networks (CNNs) and include several examples using the Keras library. We give a formal motivation for using CNN that clearly shows the advantages of this topology compared to feedforward networks for processing images. Several practical examples with plant breeding data are provided using CNNs und...
Chapter
Full-text available
This chapter gives details of the linear multiple regression model, including assumptions, some pros and cons, and maximum likelihood estimation. Gradient descent methods are described for learning the parameters under this model. Penalized linear multiple regression is derived under Ridge and Lasso penalties, which also emphasizes the estimation of the...
Chapter
Full-text available
In this chapter, we go through the fundamentals of artificial neural networks and deep learning methods. We describe the inspiration for artificial neural networks and how the methods of deep learning are built. We define the activation function and its role in capturing nonlinear patterns in the input data. We explain the universal approximation t...
Chapter
Full-text available
In this chapter, the support vector machine (SVM) methods are studied. We first point out the origin and popularity of these methods and then we define the hyperplane concept, which is the key for building these methods. We derive methods related to SVM: the maximum margin classifier and the support vector classifier. We describe the derivation of...
Chapter
Full-text available
This chapter provides elements for implementing deep neural networks (deep learning) for continuous outcomes. We give details of the hyperparameters to be tuned in deep neural networks and provide a general guide for doing this task with more probability of success. Then we explain the most popular deep learning frameworks that can be used to imple...
Chapter
Full-text available
In this chapter, we provide the main elements for implementing deep neural networks in Keras for binary, categorical, and mixed outcomes under feedforward networks as well as the main practical issues involved in implementing deep learning models with binary response variables. The same practical issues are provided for implementing deep neural net...
Article
Full-text available
We investigated increasing genetic gain for grain yield using early generation genomic selection (GS). A training set of 1,334 elite wheat breeding lines tested over three field seasons was used to generate Genomic Estimated Breeding Values (GEBVs) for grain yield under irrigated conditions applying markers and three different prediction methods: (...
Article
Full-text available
The breeding of new cultivars is a powerful approach to increase both the quantity and quality of potato harvest per land unit. The aim of this research was to determine, using multi-site testing, the progress made by the genetic enhancement of potato in Sweden in the last 1.5 decades by comparing advanced breeding clones (T4 upwards) bred in Sweden...
Article
Full-text available
Genomic selection (GS) has the potential to revolutionize predictive plant breeding. A reference population is phenotyped and genotyped to train a statistical model that is used to perform genome-enabled predictions of new individuals that were only genotyped. In this vein, deep neural networks are a type of machine learning model and have been wi...
Article
Full-text available
Potato breeding aims to improve crop productivity, quality and resilience based on heritable characteristics. Estimating the trait heritability and correlations—both genetic and phenotypic—among characteristics in a target population of environments allows us to define the best breeding method that leads to selection gains. Breeding clones (47) and...
Preprint
Full-text available
Drought tolerance in maize is a complex and polygenic trait, especially in the seedling stage. In plant breeding, such traits can be improved by genomic selection (GS), which has become a practical and effective tool. In the present study, a natural maize population named Northeast China core population (NCCP) consisting of 379 inbred lines was ge...
Article
Full-text available
A linear selection index (LSI) can be a linear combination of phenotypic values, marker scores, and genomic estimated breeding values (GEBVs); phenotypic values and marker scores; or phenotypic values and GEBVs jointly. The main objective of the LSI is to predict the net genetic merit (H), which is a linear combination of unobservable individual tr...
Article
Full-text available
When multi-trait data are available, the preferred models are those that are able to account for correlations between phenotypic traits because when the degree of correlation is moderate or large, this increases the genomic prediction accuracy. For this reason, in this paper we explore Bayesian multi-trait kernel methods for genomic prediction and...
Article
Full-text available
Zero hunger and good health could be realized by 2030 through effective conservation, characterization and utilization of germplasm resources¹. So far, few chickpea (Cicer arietinum) germplasm accessions have been characterized at the genome sequence level². Here we present a detailed map of variation in 3,171 cultivated and 195 wild accessio...
Article
Wheat (Triticum aestivum L.) is an essential food security crop in Afghanistan. To determine the contribution of wheat breeding to increasing productivity we analysed data obtained from 192 trials conducted over 11 locations from 2002–03 to 2015–16. Using this data, we estimated annual genetic gains for grain yield, days to heading and plant height...
Article
Full-text available
Quantitative genetics states that phenotypic variation is a consequence of the interaction between genetic and environmental factors. Predictive breeding is based on this statement, and because of this, ways of modeling genetic effects are still evolving. At the same time, the same refinement must be used for processing environmental information. H...
Article
Permanent raised beds (PB) are a conservation agriculture option for irrigated conditions that can improve soil quality, increase soil moisture conservation and stabilize yields compared to conventional furrow irrigation. In irrigated wheat (Triticum sp.) production, wet sowing (i.e. applying irrigation before sowing) is most widely used. It allows...
Article
Full-text available
Crop production systems need to expand their outputs sustainably to feed a burgeoning human population. Advances in genome sequencing technologies combined with efficient trait mapping procedures accelerate the availability of beneficial alleles for breeding and research. Enhanced interoperability between different omics and phenotyping platforms,...
Article
Full-text available
Sparse testing in genome-enabled prediction in plant breeding can be emulated throughout different line allocations where some lines are observed in all environments (overlap) and others are observed in only one environment (nonoverlap). We studied three general cases of the composition of the sparse testing allocation design for genome-enabled pre...
Article
Full-text available
Droughts and high temperatures are the main abiotic constraints hampering durum wheat production. This study investigated the accumulation of phenolic acids (PAs) in the wholemeal flour of six durum wheat cultivars under drought and heat stress. Phenolic acids were extracted from wholemeals and analysed through HPLC-DAD analysis. Ferulic acid was t...