ABSTRACT: Parametric and nonparametric methods have been developed for the purpose of predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein we review parametric methods including Least Squares Regression, Ridge Regression, Bayesian Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cπ. We also review nonparametric methods including Nadaraya-Watson Estimator, Reproducing Kernel Hilbert Space, Support Vector Machine Regression, and Neural Networks. We assess the relative merits of these 14 methods in terms of accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in an F2 population derived from crosses of inbred lines. Each simulated genetic architecture explained either 30% or 70% of the phenotypic variability. The greatest impact on estimates of accuracy and MSE was due to genetic architecture. Parametric methods were unable to predict phenotypic values when the underlying genetic architecture was based entirely upon epistasis. Parametric methods were slightly better than nonparametric methods for additive genetic architectures. Distinctions among parametric methods for additive genetic architectures were incremental. Heritability, i.e., the proportion of phenotypic variability explained by the genetic architecture, had the second-greatest impact on estimates of accuracy and MSE.
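The two evaluation criteria can be made concrete with a short sketch. It assumes, as is common in genomic prediction studies but not stated in the abstract, that accuracy is the Pearson correlation between observed and predicted phenotypes in a validation set; the function names are illustrative only.

```python
import math

def accuracy(observed, predicted):
    """Pearson correlation between observed and predicted phenotypes."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)

def mse(observed, predicted):
    """Mean squared error between observed and predicted phenotypes."""
    n = len(observed)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n

observed = [1.0, 2.0, 3.0, 4.0]    # hypothetical validation phenotypes
predicted = [1.1, 1.9, 3.2, 3.8]   # hypothetical predictions
print(accuracy(observed, predicted), mse(observed, predicted))
```

Note that the two criteria answer different questions: predictions that are perfectly correlated with phenotypes but on the wrong scale have an accuracy of 1 and a large MSE.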
ABSTRACT: Identification of allelic variants associated with complex traits provides molecular genetic information associated with variability upon which both artificial and natural selection are based. Family-based association mapping (FBAM) takes advantage of linkage disequilibrium among segregating progeny within crosses and among parents to provide greater power than association mapping and greater resolution than linkage mapping. Herein, we discuss the potential adaptation of human family-based association tests and quantitative transmission disequilibrium tests for use in crop species. The rapid technological advancement of next-generation sequencing will enable sequencing of all parents in a planned crossing design, with subsequent imputation of genotypes for all segregating progeny. These technical advancements are easily adapted to mating designs routinely used by plant breeders. Thus, FBAM has the potential to be widely adopted for discovering alleles, common and rare, underlying complex traits in crop species.
Theoretical and Applied Genetics 04/2013; · 3.66 Impact Factor
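As background for the family-based tests mentioned above, the classic transmission disequilibrium test (TDT) compares how often heterozygous parents transmit versus do not transmit a candidate allele to affected offspring. The sketch below computes only that textbook statistic, (b − c)² / (b + c) with one degree of freedom; it is not the crop-adapted FBAM procedure discussed in the abstract, and the counts in the usage line are made up.

```python
import math

def tdt_statistic(b, c):
    """Classic TDT chi-square statistic (1 df).

    b: transmissions of the candidate allele from heterozygous parents
    c: non-transmissions of the same allele from heterozygous parents
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (b - c) ** 2 / (b + c)

def chi2_1df_pvalue(x):
    """Upper-tail p-value of a 1-df chi-square, using P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

stat = tdt_statistic(60, 40)  # hypothetical transmission counts
print(stat, round(chi2_1df_pvalue(stat), 4))  # 4.0 0.0455
```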
ABSTRACT: Detection of quantitative trait loci (QTL) controlling complex traits followed by selection has become a common approach for selection in crop plants. The QTL are most often identified by linkage mapping using experimental F2, backcross, advanced inbred, or doubled haploid families. An alternative approach to QTL detection is the genome-wide association study (GWAS), which uses pre-existing lines such as those found in breeding programs. We explored the implementation of GWAS in oat (Avena sativa L.) to identify QTL affecting the concentration of β-glucan, a soluble dietary fiber with several human health benefits when consumed as a whole grain. A total of 431 lines of worldwide origin were tested over 2 years and genotyped using Diversity Array Technology (DArT) markers. A mixed-model approach was used in which both population structure fixed effects and pairwise kinship random effects were included. Various mixed models that differed with respect to population structure and kinship were tested for their ability to control for false positives. As expected, given the level of population structure previously described in oat, population structure did not play a large role in controlling for false positives. Three independent markers were significantly associated with β-glucan concentration. Significant marker sequences were compared with rice, and one of the three showed sequence homology to genes localized on rice chromosome seven adjacent to the CslF gene family, known to have β-glucan synthase function. Results indicate that GWAS in oat can be a successful option for QTL detection, more so with the future development of higher-density markers.
Theoretical and Applied Genetics 08/2012; · 3.66 Impact Factor
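The mixed model described above requires a pairwise kinship matrix. The abstract does not say which estimator was used; one simple option for presence/absence markers such as DArT is identity-by-state (IBS) similarity, sketched here as an assumption, with missing calls skipped pair by pair. The function name is hypothetical.

```python
def ibs_kinship(genotypes):
    """Pairwise identity-by-state similarity between lines.

    genotypes: list of marker vectors coded 0/1 (e.g. DArT presence/absence),
    with None for missing calls. Returns an n x n matrix holding, for each pair
    of lines, the proportion of jointly non-missing markers with the same allele.
    """
    n = len(genotypes)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Compare only markers called in both lines.
            shared = [a == b for a, b in zip(genotypes[i], genotypes[j])
                      if a is not None and b is not None]
            K[i][j] = K[j][i] = sum(shared) / len(shared) if shared else float("nan")
    return K

lines = [[1, 0, 1, None], [1, 1, 1, 0], [0, 1, 0, 0]]  # hypothetical genotypes
for row in ibs_kinship(lines):
    print(row)
```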
ABSTRACT: Nested Association Mapping (NAM) has been proposed as a means to combine the power of linkage mapping with the resolution of association mapping. It is enabled through sequencing or array genotyping of parental inbred lines while using low-cost, low-density genotyping technologies for their segregating progenies. For purposes of data analyses of NAM populations, parental genotypes at a large number of Single Nucleotide Polymorphism (SNP) loci need to be projected to their segregating progeny. Herein we demonstrate how approximately 0.5 million SNPs that have been genotyped in 26 parental lines of the publicly available maize NAM population can be projected onto their segregating progeny using only 1,106 SNP loci that have been genotyped in both the parents and their 5,000 progeny. The challenge is to estimate both the genotype and the genetic location of each parental SNP in the segregating progeny. Both challenges were met by estimating expected genotypic values conditional on observed flanking markers, using both physical and linkage maps. About 90% of the 500,000 genotyped SNPs from the maize HapMap project were assigned linkage map positions using linear interpolation between the maize Accessioned Gold Path (AGP) and NAM linkage maps. Of these, almost 70% provided high-probability estimates of genotypes in almost 5,000 recombinant inbred lines.
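The linear-interpolation step can be sketched as below. This is a minimal illustration, assuming each chromosome has anchor markers with known physical (bp) and linkage-map (cM) positions, sorted by physical position and free of order conflicts; the function name and data layout are hypothetical and do not reproduce the authors' pipeline.

```python
import bisect

def interpolate_cm(anchors, physical_pos):
    """Estimate a linkage-map position (cM) for a SNP from its physical position.

    anchors: list of (physical_bp, genetic_cM) pairs for markers placed on both
    maps, sorted by physical position. Returns None outside the anchored range.
    """
    if not anchors:
        return None
    phys = [p for p, _ in anchors]
    if physical_pos < phys[0] or physical_pos > phys[-1]:
        return None  # no flanking anchors, so no interpolation
    i = bisect.bisect_left(phys, physical_pos)
    if phys[i] == physical_pos:
        return anchors[i][1]  # SNP coincides with an anchor marker
    (p0, g0), (p1, g1) = anchors[i - 1], anchors[i]
    # Place the SNP at the same relative distance on the genetic map.
    return g0 + (g1 - g0) * (physical_pos - p0) / (p1 - p0)

anchors = [(1_000_000, 0.0), (3_000_000, 10.0), (5_000_000, 12.0)]  # hypothetical
print(interpolate_cm(anchors, 2_000_000))  # 5.0
```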
ABSTRACT: We present a multi-objective integer programming model for the gene stacking problem, which is to bring desirable alleles found in multiple inbred lines to a single target genotype. Pareto optimal solutions from the model provide strategic stacking schemes that maximize the likelihood of successfully creating the target genotypes and minimize the number of generations associated with a stacking strategy. Genetic diversity is also considered in the models to preserve all desirable allelic variants in the target population. Although the gene stacking problem is proven to be NP-hard, we were able to obtain Pareto frontiers for smaller instances within one minute using state-of-the-art commercial solvers in our computational experiments.
European Journal of Operational Research. 01/2011; 214:168-178.
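The authors obtain Pareto frontiers by solving a multi-objective integer program, which is not reproduced here. The sketch below only illustrates what Pareto optimality means for the two objectives named in the abstract, applied to a hypothetical list of already-evaluated stacking schemes; the `Scheme` type and its values are invented for illustration.

```python
from typing import NamedTuple

class Scheme(NamedTuple):
    name: str
    success_prob: float  # probability of obtaining the target genotype (maximize)
    generations: int     # generations required by the scheme (minimize)

def dominates(a, b):
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.success_prob >= b.success_prob and a.generations <= b.generations
    better = a.success_prob > b.success_prob or a.generations < b.generations
    return no_worse and better

def pareto_front(schemes):
    """Return the non-dominated schemes, ordered by number of generations."""
    front = [s for s in schemes if not any(dominates(o, s) for o in schemes)]
    return sorted(front, key=lambda s: (s.generations, -s.success_prob))

candidates = [Scheme("A", 0.90, 6), Scheme("B", 0.70, 4),
              Scheme("C", 0.60, 5), Scheme("D", 0.95, 8)]
print([s.name for s in pareto_front(candidates)])  # ['B', 'A', 'D']
```

Scheme C is dropped because B is both more likely to succeed and faster; the remaining schemes represent trade-offs a breeder would choose among.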
ABSTRACT: Genomic selection (GS) is a method to estimate the breeding values of individuals by using markers throughout the genome. We evaluated the accuracies of GS using data from five traits on 446 oat (Avena sativa L.) lines genotyped with 1005 Diversity Array Technology (DArT) markers and two GS methods (ridge regression–best linear unbiased prediction [RR-BLUP] and BayesCπ) under various training designs. Our objectives were to (i) determine accuracy under increasing marker density and training population size, (ii) assess accuracies when data are divided over time, and (iii) examine accuracy in the presence of population structure. Accuracy increased as the number of markers and the training population size increased. Including older lines in the training population increased or maintained accuracy, indicating that older generations retained information useful for predicting validation populations. The presence of population structure affected accuracy: when training and validation subpopulations were closely related, accuracy was greater than when they were distantly related, implying that linkage disequilibrium (LD) relationships changed across subpopulations. Across many scenarios involving large training populations, the accuracy of BayesCπ and RR-BLUP did not differ. This empirical study provided evidence regarding the application of GS to hasten the delivery of cultivars through the use of inexpensive and abundant molecular markers available to the public sector.
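RR-BLUP can be viewed as ridge regression of phenotypes on all markers with a single shrinkage parameter, λ = σ²e / σ²β (residual variance over marker-effect variance). The pure-Python sketch below fits marker effects by solving the ridge normal equations on centered data. Unlike RR-BLUP software, it takes λ as a user-supplied value instead of estimating variance components by REML, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ridge_fit(X, y, lam):
    """Estimate marker effects by ridge regression on centered data.

    X: n lines x p markers (e.g. coded -1/0/1); y: phenotypes; lam > 0: shrinkage.
    Returns (phenotype mean, marker means, marker effects).
    """
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - xbar[j] for j in range(p)] for row in X]
    yc = [v - ybar for v in y]
    # Normal equations: (Xc'Xc + lam * I) beta = Xc'yc
    A = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return ybar, xbar, solve(A, b)

def ridge_predict(model, X):
    """Genomic estimated breeding values for lines with marker matrix X."""
    ybar, xbar, beta = model
    return [ybar + sum((row[j] - xbar[j]) * beta[j] for j in range(len(beta)))
            for row in X]

X = [[-1], [0], [1], [1]]  # one hypothetical marker
y = [0.0, 1.0, 2.0, 2.0]   # phenotypes exactly 1 + marker
model = ridge_fit(X, y, lam=2.75)
print(model[2])  # [0.5]: the least-squares effect of 1.0 shrunk by half
```

Larger λ pulls every marker effect toward zero by the same rule, which is why RR-BLUP suits traits controlled by many small additive effects.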
ABSTRACT: Identification of functional markers (FMs) provides information about the genetic architecture underlying complex traits. An approach that combines the strengths of linkage and association mapping, referred to as nested association mapping (NAM), has been proposed to identify FMs in many plant species. The ability to identify and resolve FMs for complex traits depends upon a number of factors including frequency of FM alleles, magnitudes of their genetic effects, disequilibrium among functional and nonfunctional markers, statistical analysis methods, and mating design. The statistical characteristics of power, accuracy, and precision to identify FMs with a NAM population were investigated using three simulation studies. The simulated data sets utilized publicly available genetic sequences and simulated FMs were identified using least-squares variable selection methods. Results indicate that FMs with simple additive genetic effects that contribute at least 5% to the phenotypic variability in at least five segregating families of a NAM population consisting of recombinant inbred progeny derived from 28 matings with a single reference inbred will have adequate power to accurately and precisely identify FMs. This resolution and power are possible even for genetic architectures consisting of disequilibrium among multiple functional and nonfunctional markers in the same genomic region, although the resolution of FMs will deteriorate rapidly if more than two FMs are tightly linked within the same amplicon. Finally, nested mating designs involving several reference parents will have a greater likelihood of resolving FMs than single reference designs.
ABSTRACT: Infection is a leading cause of neonatal morbidity and mortality worldwide. Premature neonates are particularly susceptible to infection because of physiologic immaturity, comorbidity, and extraneous medical interventions. Additionally, premature infants are at higher risk of progression to sepsis or severe sepsis, adverse outcomes, and antimicrobial toxicity. Currently, initial diagnosis is based upon clinical suspicion accompanied by nonspecific clinical signs and is confirmed upon positive microbiologic culture results several days after institution of empiric therapy. There exists a significant need for rapid, objective, in vitro tests for diagnosis of infection in neonates who are experiencing clinical instability. We used immunoassays multiplexed on microarrays to identify differentially expressed serum proteins in clinically infected and non-infected neonates. Immunoassay arrays were effective for measurement of more than 100 cytokines in the small volumes of serum available from neonates. Our analyses revealed significant alterations in levels of eight serum proteins in infected neonates that are associated with inflammation, coagulation, and fibrinolysis. Specifically, P- and E-selectins, interleukin 2 soluble receptor alpha, interleukin 18, neutrophil elastase, urokinase plasminogen activator and its cognate receptor, and C-reactive protein were observed at statistically significant increased levels. Multivariate classifiers based on combinations of serum analytes exhibited better diagnostic specificity and sensitivity than single analytes. Multiplexed immunoassays of serum cytokines may have clinical utility as an adjunct for rapid diagnosis of infection and differentiation of etiologic agent in neonates with clinical decompensation.
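Sensitivity and specificity, the criteria used to compare single analytes with multivariate classifiers, are simple proportions from a confusion table. A minimal sketch, assuming each neonate has a true infection status and a predicted status from some classifier (the classifier itself is not shown, and the labels below are hypothetical):

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for boolean labels (True = infected)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # infected, flagged
    fn = sum(1 for t, p in pairs if t and not p)      # infected, missed
    tn = sum(1 for t, p in pairs if not t and not p)  # not infected, cleared
    fp = sum(1 for t, p in pairs if not t and p)      # not infected, flagged
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

truth     = [True, True, True, False, False, False, False]
predicted = [True, True, False, False, False, True, False]
print(sensitivity_specificity(truth, predicted))  # approximately (0.667, 0.75)
```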
ABSTRACT: Although genetic studies have been critically important for the identification of therapeutic targets in Mendelian disorders, genetic approaches aiming to identify targets for common, complex diseases have traditionally had much more limited success. However, during the past year, a novel genetic approach - genome-wide association (GWA) - has demonstrated its potential to identify common genetic variants associated with complex diseases such as diabetes, inflammatory bowel disease and cancer. Here, we highlight some of these recent successes, and discuss the potential for GWA studies to identify novel therapeutic targets and genetic biomarkers that will be useful for drug discovery, patient selection and stratification in common diseases.
Nature Reviews Drug Discovery 04/2008; 7(3):221-30. · 33.08 Impact Factor
ABSTRACT: Schizophrenia (SCZ) is a common, disabling mental illness with high heritability but complex, poorly understood genetic etiology. As the first phase of a genomic convergence analysis of SCZ, we generated 16.7 billion nucleotides of short-read, shotgun sequences of cDNA from post-mortem cerebellar cortices of 14 patients and six matched controls. A rigorous analysis pipeline was developed for analysis of digital gene expression studies. Sequences aligned to approximately 33,200 transcripts in each sample, with average coverage of 450 reads per gene. Following adjustments for confounding clinical, sample and experimental sources of variation, 215 genes differed significantly in expression between cases and controls. Golgi apparatus, vesicular transport, membrane association, zinc binding and regulation of transcription were over-represented among differentially expressed genes. Twenty-three genes with altered expression and involvement in presynaptic vesicular transport, Golgi function and GABAergic neurotransmission define a unifying molecular hypothesis for dysfunction in cerebellar cortex in SCZ.
PLoS ONE 02/2008; 3(11):e3625. · 3.53 Impact Factor
ABSTRACT: Novel, comprehensive approaches for biomarker discovery and validation are urgently needed. One particular area of methodologic need is the discovery of novel genetic biomarkers in complex diseases and traits. Here, we review recent successes in the use of genome-wide association (GWA) approaches to identify genetic biomarkers in common human diseases and traits. Such studies are yielding initial insights into the allelic architecture of complex traits. In general, it appears that complex diseases are associated with many common polymorphisms, implying profound genetic heterogeneity between affected individuals.
ABSTRACT: Comparative genomics is an emerging and powerful approach for crop improvement. Using comparative genomics, information from model plant species can accelerate the discovery of genes responsible for disease and pest resistance, tolerance to plant stresses such as drought, and enhanced nutritional value, including production of antioxidants and anti-cancer compounds. We demonstrate here how to use the Legume Information System for a comparative genomics study, leveraging genomic information from Medicago truncatula (barrel medic), the model legume, to find candidate genes involved with sudden death syndrome (SDS) in Glycine max (soybean). Specifically, genetic maps, physical maps, and annotated tentative consensus and expressed sequence tag (EST) sequences from G. max and M. truncatula can be compared. In addition, the recently published M. truncatula genomic sequences can be used to identify M. truncatula candidate genes in a genomic region syntenic to a quantitative trait locus region for SDS in soybean. Genomic sequences of candidate genes from M. truncatula can then be used to identify soybean ESTs with sequence similarity for primer design and cloning of potential soybean disease-causing alleles.
ABSTRACT: The Legume Information System (LIS) (http://www.comparative-legumes.org), developed by the National Center for Genome Resources in cooperation with the USDA Agricultural Research Service (ARS), is a comparative legume resource that integrates genetic and molecular data from multiple legume species enabling cross-species genomic and transcript comparisons. The LIS virtual plant interface allows simplified and intuitive navigation of transcript data from Medicago truncatula, Lotus japonicus, Glycine max and Arabidopsis thaliana. Transcript libraries are represented as images of plant organs in different developmental stages, which are selected to query the analyzed and annotated data. Complex queries can be accomplished by adding modifiers, keywords and sequence names. The LIS also contains annotated genomic data featuring transcript alignments to validate gene predictions as well as motif and similarity analyses. The genomic browser supports comparative analysis via novel dynamic functional annotation comparisons. CMap, developed as part of the GMOD project (http://www.gmod.org/cmap/index.shtml), has been incorporated to support comparative analyses of community linkage and physical map data. LIS is being expanded to incorporate gene expression and biochemical pathways which will be seamlessly integrated forming a knowledge discovery framework.
Nucleic Acids Research 02/2005; 33(Database issue):D660-5. · 8.81 Impact Factor
ABSTRACT: The goal of the Complex Trait Consortium is to promote the development of resources that can be used to understand, treat and ultimately prevent pervasive human diseases. Existing and proposed mouse resources that are optimized to study the actions of isolated genetic loci on a fixed background are less effective for studying intact polygenic networks and interactions among genes, environments, pathogens and other factors. The Collaborative Cross will provide a common reference panel specifically designed for the integrative analysis of complex systems and will change the way we approach human health and disease.
ABSTRACT: A molecular understanding of porcine reproduction is of biological interest and economic importance. Our Midwest Consortium has produced cDNA libraries containing the majority of genes expressed in major female reproductive tissues, and we have deposited into public databases 21,499 expressed sequence tag (EST) gene sequences from the 3' end of clones from these libraries. These sequences represent 10,574 different genes, based on sequence comparison among these data; comparison with existing porcine ESTs and genes indicates that as many as 4652 of these EST clusters are novel. In silico analysis identified sequences that are expressed in specific pig tissues or organs and confirmed the broad expression in pig of many genes ubiquitously expressed in human tissues. Furthermore, we have developed computer software to identify sequence similarity of these pig genes with their human counterparts, and to extract the mapping information of these human homologues from genome databases. We demonstrate the utility of this software for comparative mapping by localizing 61 genes on the porcine physical map for Chromosomes (Chrs) 5, 10, and 14.
ABSTRACT: Applied breeding programs evaluate large numbers of progeny derived from multiple related crosses for a wide range of agronomic traits and for tens to hundreds of molecular markers. This study was conducted to determine how these phenotypic and genetic data could be used for routinely mapping quantitative trait loci (QTLs). With dense maps, haplotype sharing of parents in a certain region is a good, though not perfect, indicator of QTL-allele sharing. With this in mind, an approximate and simple method has been developed in which ancestral genome blocks in the parents of the crosses are identified via haplotype analysis and the effect of a putative QTL is then modeled and estimated per ancestral genome block. A simulation of an early-generation maize breeding scheme demonstrates the potential of the present approach for QTL detection in existing breeding programs. With this new QTL mapping strategy, the power, precision, and accuracy associated with large numbers of progeny may be attained, inferences about QTLs can be drawn across the breeding program rather than being limited to the sample of progeny from a single cross, and results may be much more valuable for marker-assisted breeding because the QTLs apply to agronomically challenging situations in the field.
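The core assumption, that parents sharing a haplotype in a region also share a QTL allele, can be illustrated with a simple grouping step. This is a simplified sketch, assuming haplotypes are stored as allele strings and a region is a fixed window of marker indices; the function name and data are hypothetical, and the subsequent per-block modeling of QTL effects described in the abstract is not shown.

```python
from collections import defaultdict

def ancestral_blocks(parent_haplotypes, start, end):
    """Group parents that carry an identical marker haplotype in a window.

    parent_haplotypes: dict mapping parent name -> sequence of marker alleles
    start, end: marker index window [start, end)
    Returns sorted groups; each group is treated as one ancestral allele.
    """
    groups = defaultdict(list)
    for parent, hap in parent_haplotypes.items():
        groups[tuple(hap[start:end])].append(parent)
    return sorted(sorted(g) for g in groups.values())

parents = {"P1": "AABBA", "P2": "AABBB", "P3": "ABBBA", "P4": "AABBA"}
print(ancestral_blocks(parents, 0, 4))  # [['P1', 'P2', 'P4'], ['P3']]
print(ancestral_blocks(parents, 2, 5))  # [['P1', 'P3', 'P4'], ['P2']]
```

Because the grouping changes from window to window, a QTL effect would be estimated for each ancestral block within a region rather than for each parent, pooling progeny from different crosses that inherit the same block.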