- ABSTRACT: Cotton fibers represent the largest single cell in plants and they serve as models to study cell development. This study investigated the distribution and evolution of fiber Unigenes anchored to recombination hotspots between tetraploid cotton (Gossypium hirsutum) At and Dt subgenomes, and within a parental diploid cotton (G. raimondii) D genome. Comparative analysis of At vs D and Dt vs D showed that 1) the D genome provides many fiber genes after its merger with another parental diploid cotton (G. arboreum) A genome although the D genome itself does not produce any spinnable fiber; 2) similarity of fiber genes is higher between At vs D than between Dt vs D genomic hotspots. This is the first report that fiber genes have higher similarity between At and D than between Dt and D. The finding provides new insights into cotton genomic regions that would facilitate genetic improvement of natural fiber properties. Copyright © 2015. Published by Elsevier Inc.
- ABSTRACT: Parametric and nonparametric methods have been developed for purposes of predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein we review parametric methods including Least Squares Regression, Ridge Regression, Bayesian Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cπ. We also review nonparametric methods including Nadaraya-Watson Estimator, Reproducing Kernel Hilbert Space, Support Vector Machine Regression, and Neural Networks. We assess the relative merits of these 14 methods in terms of accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in an F2 population derived from crosses of inbred lines. Each simulated genetic architecture explained either 30% or 70% of the phenotypic variability. The greatest impact on estimates of accuracy and MSE was due to genetic architecture. Parametric methods were unable to predict phenotypic values when the underlying genetic architecture was based entirely upon epistasis. Parametric methods were slightly better than nonparametric methods for additive genetic architectures. Distinctions among parametric methods for additive genetic architectures were incremental. Heritability, i.e., proportion of phenotypic variability, had the second greatest impact on estimates of accuracy and MSE.
- ABSTRACT: The discovery of sequence variants and their functionality has been increasing with new genotyping technologies and mapping studies across plant species. Sequence polymorphisms associated with trait variation can be converted into Functional Markers (FMs), which are derived from polymorphic sites within genes or regulatory sequences causally affecting phenotypic trait variation. The effects of FMs on the phenotypic variation of complex inherited traits, however, are usually dependent on epistatic and environment interactions. In this review, we propose the term “potential” to define effects of FMs. The potential of an FM is based on distribution of allelic effects across environments and backgrounds, and the probability of achieving a phenotype of interest. The use of FMs in plant breeding programs and how FMs may be combined with Genomic Selection to increase genetic gain are also discussed.
- ABSTRACT: The use of nonlinear functions provides a concise measure of a variety of physiological traits associated with growth and development that would otherwise be difficult to observe. Maize kernel growth and development is a complex process, and the description of the processes involved benefits from the application of a nonlinear function to the process over growing degree days (GDD), a measure of heat accumulation over a period of development. The objective of this study was to compare and contrast canonical models of growth and development to determine which provided the best description of maize kernel biomass accumulation. Observations of kernel dry weights starting shortly after pollination through maturity were regressed onto a measure of thermal time. Observations from differing maize hybrids taken in two years with significantly different weather patterns were used to construct the model. Of the four nonlinear functions described, the Weibull and Gompertz functions were found to describe the pattern of biomass accumulation best. The Gompertz function was selected to describe kernel growth based on information criteria, biological interpretation of the parameters, and computational ease. Tests of autocorrelation and homogeneity of errors determined that the errors were heteroscedastic but autocorrelation was not an issue. The application of a variance function, which models the residuals as a function of the variance, was used to account for heteroscedastic errors. While the Gompertz function was selected as the best fit for use with this data set, it is suggested that future studies use the selection process described herein to determine the most appropriate function.
- ABSTRACT: Beta-glucan, a soluble fiber found in oat (Avena sativa L.) grain, is good for human health, and selection for higher levels of this compound is regarded as an important breeding objective. Recent advances in oat DNA markers present an opportunity to investigate new selection methods for polygenic traits such as beta-glucan concentration. Our objectives in this study were to compare genomic, marker-assisted, and best linear unbiased prediction (BLUP)-based phenotypic selection for short-term response to selection and ability to maintain genetic variance for beta-glucan concentration. Starting with a collection of 446 elite oat lines from North America, each method was conducted for two cycles. The average beta-glucan concentration increased from 4.57 g/100 g in Cycle 0 to between 6.66 and 6.88 g/100 g over the two cycles. The averages of marker-based selection methods in Cycle 2 were greater than those of phenotypic selection (P < 0.08). Progenies with the highest beta-glucan came from the marker-based selection methods. Marker-assisted selection (MAS) for higher beta-glucan concentration resulted in a later heading date. We also found that marker-based selection methods maintained greater genetic variance than did BLUP phenotypic selection, potentially enabling greater future selection gains. Overall, the results of these experiments suggest that genomic selection is a superior method for selecting a polygenic complex trait like beta-glucan concentration.
- ABSTRACT: Identification of allelic variants associated with complex traits provides molecular genetic information associated with variability upon which both artificial and natural selections are based. Family-based association mapping (FBAM) takes advantage of linkage disequilibrium among segregating progeny within crosses and among parents to provide greater power than association mapping and greater resolution than linkage mapping. Herein, we discuss the potential adaption of human family-based association tests and quantitative transmission disequilibrium tests for use in crop species. The rapid technological advancement of next generation sequencing will enable sequencing of all parents in a planned crossing design, with subsequent imputation of genotypes for all segregating progeny. These technical advancements are easily adapted to mating designs routinely used by plant breeders. Thus, FBAM has the potential to be widely adopted for discovering alleles, common and rare, underlying complex traits in crop species.
- ABSTRACT: Genome-wide association studies (GWAS) can be a useful approach to detect quantitative trait loci (QTL) controlling complex traits in crop plants. Oat (Avena sativa L.) beta-glucan is a soluble dietary fiber and has been shown to have positive health benefits. We report a GWAS involving 446 elite oat breeding lines from North America genotyped with 1005 diversity arrays technology (DArT) markers and with phenotypic data from both historical and balanced 2-yr data. Association analyses accounting for pair-wise relationships and population structure were conducted using single-marker tests and least absolute shrinkage and selection operator (LASSO). Single-marker tests yielded six and 15 significant markers for the historical and balanced data sets, respectively. The LASSO method selected 24 and 37 markers as the most important in explaining beta-glucan concentration for the historical and balanced data sets, respectively. Comparisons of genetic location showed that 15 of the markers in our study were found on the same linkage groups as QTL identified in previous studies. Four of the markers colocalized to within 4 cM of three previously detected QTL, suggesting concordance between QTL detected in our study and previous studies. Two of the significant markers were also adjacent to a beta-glucan candidate gene in the rice (Oryza sativa L.) genome. Our findings suggest that GWAS can be used for QTL detection for the purpose of gene discovery and for marker-assisted selection to improve beta-glucan concentration in elite oat.
- ABSTRACT: Detection of quantitative trait loci (QTL) controlling complex traits followed by selection has become a common approach for selection in crop plants. The QTL are most often identified by linkage mapping using experimental F2, backcross, advanced inbred, or doubled haploid families. An alternative approach for QTL detection is the genome-wide association study (GWAS), which uses pre-existing lines such as those found in breeding programs. We explored the implementation of GWAS in oat (Avena sativa L.) to identify QTL affecting β-glucan concentration, a soluble dietary fiber with several human health benefits when consumed as a whole grain. A total of 431 lines of worldwide origin were tested over 2 years and genotyped using Diversity Array Technology (DArT) markers. A mixed model approach was used where both population structure fixed effects and pair-wise kinship random effects were included. Various mixed models that differed with respect to population structure and kinship were tested for their ability to control for false positives. As expected, given the level of population structure previously described in oat, population structure did not play a large role in controlling for false positives. Three independent markers were significantly associated with β-glucan concentration. Significant marker sequences were compared with rice and one of the three showed sequence homology to genes localized on rice chromosome seven adjacent to the CslF gene family, known to have β-glucan synthase function. Results indicate that GWAS in oat can be a successful option for QTL detection, more so with future development of higher-density markers.
- ABSTRACT: The landscape of plant genomes, while slowly being characterized and defined, is still composed primarily of regions of undefined function. Many eukaryotic genomes contain isochore regions, mosaics of homogeneous GC content that can abruptly change from one neighboring isochore to the next. Isochores are broken into families that are characterized by their GC levels. We identified 4,339 compositionally distinct domains and 331 of these were identified as long homogeneous genome regions (LHGRs). We assigned these to four families based on finite mixture models of GC content. We then characterized each family with respect to exon length, gene content, and transposable elements. The LHGR pattern of soybeans is unique in that while the majority of the genes within LHGRs are found within a single LHGR family with a narrow GC range (Family B), that family is not the highest in GC content as seen in vertebrates and invertebrates. Instead Family B has a mean GC content of 35%. The range of GC content for all LHGRs is 16-59% GC which is a larger range than what is typical of vertebrates. This is the first study in which LHGRs have been identified in soybeans and the functions of the genes within the LHGRs have been analyzed.
Dataset: Supplementary Table S1
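The maize kernel abstract above selects a Gompertz function of growing degree days (GDD) to describe biomass accumulation. A minimal sketch of one common three-parameter form is given here for orientation. The parameterization, the names `asymptote`, `displacement`, and `rate`, and the numeric values are all assumptions chosen for illustration; the abstract does not state which form or estimates the study used.

```python
import math


def gompertz(gdd, asymptote, displacement, rate):
    """Predicted kernel dry weight at thermal time `gdd`.

    Assumed form: W(gdd) = asymptote * exp(-displacement * exp(-rate * gdd)).
    """
    return asymptote * math.exp(-displacement * math.exp(-rate * gdd))


def inflection_gdd(displacement, rate):
    """Thermal time of fastest growth; the weight there is asymptote / e."""
    return math.log(displacement) / rate


# Hypothetical parameter values, not estimates from the study.
ASYMPTOTE_MG = 300.0   # final kernel dry weight
DISPLACEMENT = 20.0    # shifts the curve along the GDD axis
RATE_PER_GDD = 0.006   # growth-rate constant

if __name__ == "__main__":
    for gdd in (0, 250, 500, 750, 1000):
        weight = gompertz(gdd, ASYMPTOTE_MG, DISPLACEMENT, RATE_PER_GDD)
        print(f"GDD {gdd:>4}: {weight:.1f} mg")
```

In this form the asymptote reads directly as final kernel weight and the inflection point as the thermal time of peak growth rate, which is one way the parameters can carry a biological interpretation.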
- ABSTRACT: We present a multi-objective integer programming model for the gene stacking problem, which is to bring desirable alleles found in multiple inbred lines to a single target genotype. Pareto optimal solutions from the model provide strategic stacking schemes to maximize the likelihood of successfully creating the target genotypes and to minimize the number of generations associated with a stacking strategy. A consideration of genetic diversity is also incorporated in the models to preserve all desirable allelic variants in the target population. Although the gene stacking problem is proved to be NP-hard, we have been able to obtain Pareto frontiers for smaller sized instances within one minute using the state-of-the-art commercial computer solvers in our computational experiments.
- ABSTRACT: Genomic selection (GS) is a method to estimate the breeding values of individuals by using markers throughout the genome. We evaluated the accuracies of GS using data from five traits on 446 oat (Avena sativa L.) lines genotyped with 1005 Diversity Array Technology (DArT) markers and two GS methods (ridge regression–best linear unbiased prediction [RR-BLUP] and BayesCπ) under various training designs. Our objectives were to (i) determine accuracy under increasing marker density and training population size, (ii) assess accuracies when data is divided over time, and (iii) examine accuracy in the presence of population structure. Accuracy increased with the number of markers and the training population size. Including older lines in the training population increased or maintained accuracy, indicating that older generations retained information useful for predicting validation populations. The presence of population structure affected accuracy: when training and validation subpopulations were closely related accuracy was greater than when they were distantly related, implying that linkage disequilibrium (LD) relationships changed across subpopulations. Across many scenarios involving large training populations, the accuracy of BayesCπ and RR-BLUP did not differ. This empirical study provided evidence regarding the application of GS to hasten the delivery of cultivars through the use of inexpensive and abundant molecular markers available to the public sector.
- ABSTRACT: Nested Association Mapping (NAM) has been proposed as a means to combine the power of linkage mapping with the resolution of association mapping. It is enabled through sequencing or array genotyping of parental inbred lines while using low-cost, low-density genotyping technologies for their segregating progenies. For purposes of data analyses of NAM populations, parental genotypes at a large number of Single Nucleotide Polymorphic (SNP) loci need to be projected to their segregating progeny. Herein we demonstrate how approximately 0.5 million SNPs that have been genotyped in 26 parental lines of the publicly available maize NAM population can be projected onto their segregating progeny using only 1,106 SNP loci that have been genotyped in both the parents and their 5,000 progeny. The challenge is to estimate both the genotype and genetic location of the parental SNP genotypes in segregating progeny. Both challenges were met by estimating their expected genotypic values conditional on observed flanking markers through the use of both physical and linkage maps. About 90% of the 500,000 genotyped SNPs from the maize HapMap project were assigned linkage map positions using linear interpolation between the maize Accessioned Gold Path (AGP) and NAM linkage maps. Of these, almost 70% provided high probability estimates of genotypes in almost 5,000 recombinant inbred lines.
- ABSTRACT: Identification of functional markers (FMs) provides information about the genetic architecture underlying complex traits. An approach that combines the strengths of linkage and association mapping, referred to as nested association mapping (NAM), has been proposed to identify FMs in many plant species. The ability to identify and resolve FMs for complex traits depends upon a number of factors including frequency of FM alleles, magnitudes of their genetic effects, disequilibrium among functional and nonfunctional markers, statistical analysis methods, and mating design. The statistical characteristics of power, accuracy, and precision to identify FMs with a NAM population were investigated using three simulation studies. The simulated data sets utilized publicly available genetic sequences and simulated FMs were identified using least-squares variable selection methods. Results indicate that FMs with simple additive genetic effects that contribute at least 5% to the phenotypic variability in at least five segregating families of a NAM population consisting of recombinant inbred progeny derived from 28 matings with a single reference inbred will have adequate power to accurately and precisely identify FMs. This resolution and power are possible even for genetic architectures consisting of disequilibrium among multiple functional and nonfunctional markers in the same genomic region, although the resolution of FMs will deteriorate rapidly if more than two FMs are tightly linked within the same amplicon. Finally, nested mating designs involving several reference parents will have a greater likelihood of resolving FMs than single reference designs.
- ABSTRACT: The analysis of gene expression microarrays plays an important role in elucidating the functionality of genes, including the discovery of genetic interactions that regulate gene expression. Several methods for modeling such gene regulatory networks exist, including a variety of continuous and discrete models. Methods based on fuzzy logic provide an interesting alternative. However, the guidelines for modeling gene expression with fuzzy logic are fairly open, and the need arises to investigate how adjustments in the modeling scheme will affect the results. In this work, we modify an existing fuzzy logic algorithm to involve an arbitrary number of classification states, and investigate the limiting behavior as the number of states tends to infinity. We also propose a probabilistic model as an alternative to the fuzzy logic model. We investigate the behavior of both models using yeast cell-cycle data and the simulated data of Werhli et al. [A.V. Werhli, M. Grzegorczyk, D. Husmeier, Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical Gaussian models, and Bayesian networks, Bioinformatics 22 (2006) 2523–2531]. We found that altering the number of classification states in both the fuzzy logic and probability models can influence which networks are predicted by both models. As the number of states tends to infinity, the predictions made by both models converge to those of a regression model. Models with a small to moderate number of classification states produced better results from a biological standpoint, compared to models with higher numbers of states. In simulated data, models with differing numbers of classification states produced similar overall results. Thus, increasing the complexity of the models has no apparent benefit, and models with smaller numbers of classification states are therefore preferred based on their ease of linguistic interpretation.
The software used in this paper is freely available for non-commercial use at http://louisville.edu/~g0broc01.
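The NAM projection abstract above reports assigning linkage map positions to SNPs by linear interpolation between a physical map and a linkage map. A minimal sketch of that idea follows, assuming a list of anchor markers on one chromosome whose physical (bp) and genetic (cM) positions are both known. The function name `interpolate_cm`, the anchor format, and the example coordinates are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def interpolate_cm(physical_bp, anchors):
    """Estimate a genetic position (cM) from a physical position (bp).

    `anchors` is a list of (bp, cM) pairs for one chromosome, sorted by bp.
    Returns None when the query lies outside the anchored interval, since
    extrapolating beyond the flanking markers is not attempted here.
    """
    if not anchors:
        return None
    positions = [bp for bp, _ in anchors]
    if physical_bp < positions[0] or physical_bp > positions[-1]:
        return None
    right = bisect_right(positions, physical_bp)
    if right == len(anchors):
        # The query coincides with the last anchor.
        return anchors[-1][1]
    bp_lo, cm_lo = anchors[right - 1]
    bp_hi, cm_hi = anchors[right]
    fraction = (physical_bp - bp_lo) / (bp_hi - bp_lo)
    return cm_lo + fraction * (cm_hi - cm_lo)


if __name__ == "__main__":
    # Hypothetical anchor markers: (physical position in bp, genetic position in cM).
    example_anchors = [(1_000_000, 10.0), (3_000_000, 20.0), (4_000_000, 35.0)]
    print(interpolate_cm(2_000_000, example_anchors))
    print(interpolate_cm(500_000, example_anchors))
```

Returning `None` for unflanked positions is a design choice of this sketch; it would be one possible reason a fraction of SNPs receives no linkage map position, though the abstract does not say why some SNPs were left unassigned.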
- ABSTRACT: Legume seed-storage proteins, such as those from soybean and peanut, represent an important source of protein in the human diet. While generally well tolerated, these proteins also represent a potentially serious allergenic threat to many individuals. Despite high sequence similarity of peanut allergens to their relatively non-allergenic soy and kidney bean counterparts, it is not well understood why peanuts elicit such an acute allergenic response. We have employed a number of bioinformatics tools and strategies to further investigate the relationship between allergenicity and the sequence, fold, and three-dimensional structures of legume seed-storage proteins. We mapped and compared multiple features including sequence conservation amongst protein families, physical location of IgE epitopes, and residues critical for IgE binding on known and modeled protein structures. These comparisons provide for a better understanding of the relationship between legume protein structure and allergenic response in the human population so that novel legume varieties with reduced allergenicity can be identified.
- ABSTRACT: Infection is a leading cause of neonatal morbidity and mortality worldwide. Premature neonates are particularly susceptible to infection because of physiologic immaturity, comorbidity, and extraneous medical interventions. Additionally premature infants are at higher risk of progression to sepsis or severe sepsis, adverse outcomes, and antimicrobial toxicity. Currently initial diagnosis is based upon clinical suspicion accompanied by nonspecific clinical signs and is confirmed upon positive microbiologic culture results several days after institution of empiric therapy. There exists a significant need for rapid, objective, in vitro tests for diagnosis of infection in neonates who are experiencing clinical instability. We used immunoassays multiplexed on microarrays to identify differentially expressed serum proteins in clinically infected and non-infected neonates. Immunoassay arrays were effective for measurement of more than 100 cytokines in small volumes of serum available from neonates. Our analyses revealed significant alterations in levels of eight serum proteins in infected neonates that are associated with inflammation, coagulation, and fibrinolysis. Specifically P- and E-selectins, interleukin 2 soluble receptor alpha, interleukin 18, neutrophil elastase, urokinase plasminogen activator and its cognate receptor, and C-reactive protein were observed at statistically significant increased levels. Multivariate classifiers based on combinations of serum analytes exhibited better diagnostic specificity and sensitivity than single analytes. Multiplexed immunoassays of serum cytokines may have clinical utility as an adjunct for rapid diagnosis of infection and differentiation of etiologic agent in neonates with clinical decompensation.
- ABSTRACT: Although genetic studies have been critically important for the identification of therapeutic targets in Mendelian disorders, genetic approaches aiming to identify targets for common, complex diseases have traditionally had much more limited success. However, during the past year, a novel genetic approach - genome-wide association (GWA) - has demonstrated its potential to identify common genetic variants associated with complex diseases such as diabetes, inflammatory bowel disease and cancer. Here, we highlight some of these recent successes, and discuss the potential for GWA studies to identify novel therapeutic targets and genetic biomarkers that will be useful for drug discovery, patient selection and stratification in common diseases.
- ABSTRACT: Schizophrenia (SCZ) is a common, disabling mental illness with high heritability but complex, poorly understood genetic etiology. As the first phase of a genomic convergence analysis of SCZ, we generated 16.7 billion nucleotides of short read, shotgun sequences of cDNA from post-mortem cerebellar cortices of 14 patients and six matched controls. A rigorous analysis pipeline was developed for analysis of digital gene expression studies. Sequences aligned to approximately 33,200 transcripts in each sample, with average coverage of 450 reads per gene. Following adjustments for confounding clinical, sample and experimental sources of variation, 215 genes differed significantly in expression between cases and controls. Golgi apparatus, vesicular transport, membrane association, zinc binding and regulation of transcription were over-represented among differentially expressed genes. Twenty-three genes with altered expression and involvement in presynaptic vesicular transport, Golgi function and GABAergic neurotransmission define a unifying molecular hypothesis for dysfunction in cerebellar cortex in SCZ.
Iowa State University
Ames, Iowa, United States
- Department of Agronomy
National Center for Genome Resources
Santa Fe, New Mexico, United States
University of Groningen
Groningen, Groningen, Netherlands
- Institute for Mathematics and Computing Science (IWI)