ABSTRACT: The availability of reliable genetic linkage maps is crucial for functional and evolutionary genomic analyses. Established theory and methods of genetic linkage analysis have made map construction a routine exercise in diploids. However, many evolutionarily, ecologically, and/or agronomically important species are autopolyploids, with autotetraploidy being a typical example. These species undergo much more complicated chromosomal segregation and recombination at meiosis than diploids. In addition, there is evidence of polyploidy-induced and highly dynamic changes in the structure of the genome. These polysomic characteristics indicate the inappropriateness of the theory and methods of linkage analysis in diploids for use in these species and a gap in the theory and methodology of tetraploid map construction. This paper presents a theoretical model and statistical framework for multilocus linkage analysis in autotetraploids for use with dominant and/or codominant DNA molecular markers. The theory and methods incorporate the essential features of allele segregation and recombination under tetrasomic inheritance and the major challenges in statistical modeling and marker data analysis. We validated the method and explored its statistical properties by intensive simulation study and demonstrated its utility by analysis of AFLP and SSR marker data from an outbred autotetraploid potato population.
Proceedings of the National Academy of Sciences 02/2010; 107(9):4270-4. · 9.81 Impact Factor
ABSTRACT: It is well known that Affymetrix microarrays are widely used to predict genome-wide gene expression and genome-wide genetic polymorphisms from RNA and genomic DNA hybridization experiments, respectively. It has recently been proposed to integrate the two predictions by use of RNA microarray data only. Although the ability to detect single feature polymorphisms (SFPs) from RNA microarray data has many practical implications for genome study in both sequenced and unsequenced species, it raises enormous challenges for statistical modelling and analysis of microarray gene expression data for this objective. Several methods are proposed to predict SFPs from the gene expression profile. However, their performance is highly vulnerable to differential expression of genes. The SFPs thus predicted are eventually a reflection of differentially expressed genes rather than genuine sequence polymorphisms. To address the problem, we developed a novel statistical method to separate the binding affinity between a transcript and its targeting probe and the parameter measuring transcript abundance from perfect-match hybridization values of Affymetrix gene expression data. We implemented a Bayesian approach to detect SFPs and to genotype a segregating population at the detected SFPs. Based on analysis of three Affymetrix microarray datasets, we demonstrated that the present method confers a significantly improved robustness and accuracy in detecting the SFPs that carry genuine sequence polymorphisms when compared to its rivals in the literature. The method developed in this paper will provide experimental genomicists with advanced analytical tools for appropriate and efficient analysis of their microarray experiments and biostatisticians with insightful interpretation of Affymetrix microarray data.
ABSTRACT: Transcript abundance data from cRNA hybridizations to Affymetrix microarrays can potentially be used to identify genetic markers to facilitate high-throughput genotyping. We have shown that it is easily possible to use the information from Affymetrix expression arrays to accurately identify over 4,000 robust polymorphic transcript-derived markers (TDMs). We developed the method to identify TDM polymorphisms from experiments involving two tissues in two commercial varieties of barley and their doubled-haploid progeny. These TDMs represent ~18% of the total barley genes on the chip and can be used to predict the genotypes in an F1-derived, doubled-haploid population. According to our estimates, 35% of the TDMs reveal nucleotide polymorphism of the particular gene (single feature polymorphisms, SFPs) while 65% mark polymorphism resulting in extreme variation of gene expression (genetic expression markers, GEMs). These latter are probably mainly cis-acting regulators while a small proportion, approximately 5%, are loosely or un-linked transregulators.
ABSTRACT: A typical genetical genomics experiment results in four separate data sets; genotype, gene expression, higher-order phenotypic data and metadata that describe the protocols, processing and the array platform. Used in concert, these data sets provide the opportunity to perform genetic analysis at a systems level. Their predictive power is largely determined by the gene expression dataset where tens of millions of data points can be generated using currently available mRNA profiling technologies. Such large, multidimensional data sets often have value beyond that extracted during their initial analysis and interpretation, particularly if conducted on widely distributed reference genetic materials. Besides quality and scale, access to the data is of primary importance as accessibility potentially allows the extraction of considerable added value from the same primary dataset by the wider research community. Although the number of genetical genomics experiments in different plant species is rapidly increasing, none to date has been presented in a form that allows quick and efficient on-line testing for possible associations between genes, loci and traits of interest by an entire research community.
Using a reference population of 150 recombinant doubled haploid barley lines we generated novel phenotypic, mRNA abundance and SNP-based genotyping data sets, added them to a considerable volume of legacy trait data and entered them into GeneNetwork (http://www.genenetwork.org). GeneNetwork is a unified on-line analytical environment that enables the user to test genetic hypotheses about how component traits, such as mRNA abundance, may interact to condition more complex biological phenotypes (higher-order traits). Here we describe these barley data sets and demonstrate some of the functionalities GeneNetwork provides as an easily accessible and integrated analytical environment for exploring them.
By integrating barley genotypic, phenotypic and mRNA abundance data sets directly within GeneNetwork's analytical environment we provide simple web access to the data for the research community. In this environment, a combination of correlation analysis and linkage mapping provides the potential to identify and substantiate gene targets for saturation mapping and positional cloning. By integrating datasets from an unsequenced crop plant (barley) in a database that has been designed for an animal model species (mouse) with a well established genome sequence, we prove the importance of the concept and practice of modular development and interoperability of software engineering for biological data sets.
ABSTRACT: We previously mapped mRNA transcript abundance traits (expression-QTL or eQTL) using the Barley1 Affymetrix array and 'whole plant' tissue from 139 progeny of the Steptoe x Morex (St/Mx) reference barley mapping population. Of the 22,840 probesets (genes) on the array, 15,987 reported transcript abundance signals that were suitable for eQTL analysis, and this revealed a genome-wide distribution of 23,738 significant eQTLs. Here we have explored the potential of using these mRNA abundance eQTL traits as surrogates for the identification of candidate genes underlying the interaction between barley and the wheat stem rust fungus Puccinia graminis f. sp. tritici. We re-analysed quantitative 'resistance phenotype' data collected on this population in 1990/1991 and identified six loci associated with barley's reaction to stem rust. One of these coincided with the major stem rust resistance locus Rpg1, that we had previously positionally cloned using this population. Correlation analysis between phenotype values for rust infection and mRNA abundance values reported by the 22,840 GeneChip probe sets placed Rpg1, which is on the Barley1 GeneChip, in the top five candidate genes for the major QTL on chromosome 7H corresponding to the location of Rpg1. A second co-located with the rpg4/Rpg5 stem rust resistance locus that has been mapped in a different population and the remaining four were novel. Correlation analyses identified candidate genes for the rpg4/Rpg5 locus on chromosome 5H. By combining our data with additional published mRNA profiling data sets, we identify a putative sensory transduction histidine kinase as a strong candidate for a novel resistance locus on chromosome 2H and compile candidate gene lists for the other three loci.
Theoretical and Applied Genetics 08/2008; 117(2):261-72. · 3.66 Impact Factor
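The correlation step described in the abstract above (ranking probe sets by how strongly their mRNA abundance tracks the rust-infection phenotype) can be illustrated with a minimal sketch. It assumes phenotype scores and per-gene abundance values are available as plain Python lists over the same set of lines. The function names `pearson` and `rank_candidates`, the use of Pearson correlation, and the toy data are all hypothetical; the abstract does not state which correlation statistic was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists.

    Assumes neither list is constant (a constant list gives zero
    variance and would raise ZeroDivisionError here).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_candidates(phenotype, expression):
    """Return gene names sorted by |correlation| with the phenotype, strongest first.

    `expression` maps a gene name to its abundance values, one per line,
    in the same order as `phenotype`. Hypothetical data layout.
    """
    scores = {gene: pearson(phenotype, values) for gene, values in expression.items()}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)


# Toy data for four lines; the gene names are placeholders, not real probe sets.
phenotype = [1, 2, 3, 4]
expression = {
    "geneA": [2, 4, 6, 8],
    "geneB": [1, 3, 2, 4],
    "geneC": [5, 5, 6, 5],
}
ranking = rank_candidates(phenotype, expression)
print(ranking)
```

Sorting on the absolute value keeps strongly negative correlations near the top, since a candidate gene may be down-regulated in susceptible lines.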
ABSTRACT: Affymetrix high density oligonucleotide expression arrays are widely used across all fields of biological research for measuring genome-wide gene expression. An important step in processing oligonucleotide microarray data is to produce a single value for the gene expression level of an RNA transcript using one of a growing number of statistical methods. The challenge for the researcher is to decide on the most appropriate method to use to address a specific biological question with a given dataset. Although several research efforts have focused on assessing performance of a few methods in evaluating gene expression from RNA hybridization experiments with different datasets, the relative merits of the methods currently available in the literature for evaluating genome-wide gene expression from Affymetrix microarray data collected from real biological experiments remain actively debated.
The present study reports a comprehensive survey of the performance of all seven commonly used methods in evaluating genome-wide gene expression from a well-designed experiment using Affymetrix microarrays. The experiment profiled eight genetically divergent barley cultivars each with three biological replicates. The dataset so obtained confers a balanced and idealized structure for the present analysis. The methods were evaluated on their sensitivity for detecting differentially expressed genes, reproducibility of expression values across replicates, and consistency in calling differentially expressed genes. The number of genes detected as differentially expressed among methods differed by a factor of two or more at a given false discovery rate (FDR) level. Moreover, we propose the use of genes containing single feature polymorphisms (SFPs) as an empirical test for comparison among methods for the ability to detect true differential gene expression on the basis that SFPs largely correspond to cis-acting expression regulators. The PDNN method demonstrated superiority over all other methods in every comparison, whilst the default Affymetrix MAS5.0 method was clearly inferior.
A comprehensive assessment of seven commonly used data extraction methods based on an extensive barley Affymetrix gene expression dataset has shown that the PDNN method has superior performance for the detection of differentially expressed genes.
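To make the idea of calling differentially expressed genes "at a given false discovery rate (FDR) level" concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. The abstract does not say which FDR procedure the authors applied, so the choice of Benjamini-Hochberg is an assumption, and the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, True where the test is called significant.

    Benjamini-Hochberg step-up rule: sort the m p-values, find the largest
    rank k with p_(k) <= (k / m) * fdr, and reject the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-gene p-values from one expression-summary method.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
calls = benjamini_hochberg(p_values, fdr=0.05)
print(sum(calls), "genes called differentially expressed")
```

Running the same rule on the p-values produced by each summary method, at the same `fdr`, gives directly comparable counts of called genes.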
ABSTRACT: Expression divergence of duplicate genes is widely believed to be important for their retention and evolution of new function, although the mechanism that determines their expression divergence remains unclear. We use a genetical genomics approach to explore divergence in genetical control of yeast duplicate genes created by a whole-genome duplication that occurred about 100 MYA and those with a younger duplication age. The analysis reveals that duplicate genes have a significantly higher probability of sharing common genetic control than pairs of singleton genes. The expression quantitative trait loci (eQTLs) have diverged completely for a high proportion of duplicate pairs, whereas a substantially larger proportion of duplicates share common regulatory motifs after 100 Myr of divergent evolution. The similarity in both genetical control and cis motif structure for a duplicate pair is a reflection of its evolutionary age. This study reveals that up to 20% of variation in expression between ancient duplicate gene pairs in the yeast genome can be explained by both cis motif divergence (approximately 8%) and by trans eQTL divergence (approximately 10%). Initially, divergence in all 3 aspects of cis motif structure, trans-genetical control, and expression evolves coordinately with the coding sequence divergence of both young and old duplicate pairs. These findings highlight the importance of divergence in both cis motif structure and trans-genetical control in the diverse set of mechanisms underlying the expression divergence of yeast duplicate genes.
Molecular Biology and Evolution 12/2007; 24(11):2556-65. · 14.31 Impact Factor
ABSTRACT: QTL analysis of 16 morphological/developmental traits is reported for the Landsberg erecta × Cape Verde Islands (Ler × Cvi) population of RILs, together with their genotypes for 16 SSR markers. A total of 43 QTL were found across all 5 chromosomes for the 16 traits analyzed: 8 QTL that control height, 19 QTL for leaf characters and 16 QTL for flowering characters. These QTL form six distinct clusters spread across chromosomes 1–4, while chromosome 5 has QTL along its length. We have confirmed four QTL identified by others, revealed several new QTL affecting flowering and height traits and demonstrated epistasis for several traits. We have identified several possible candidate genes for these QTL, some of which are potentially relevant to plant breeding aims.