ABSTRACT: The availability of reliable genetic linkage maps is crucial for functional and evolutionary genomic analyses. Established theory and methods of genetic linkage analysis have made map construction a routine exercise in diploids. However, many evolutionarily, ecologically, and/or agronomically important species are autopolyploids, with autotetraploidy being a typical example. These species undergo much more complicated chromosomal segregation and recombination at meiosis than diploids. In addition, there is evidence of polyploidy-induced and highly dynamic changes in the structure of the genome. These polysomic characteristics indicate the inappropriateness of the theory and methods of linkage analysis in diploids for use in these species and a gap in the theory and methodology of tetraploid map construction. This paper presents a theoretical model and statistical framework for multilocus linkage analysis in autotetraploids for use with dominant and/or codominant DNA molecular markers. The theory and methods incorporate the essential features of allele segregation and recombination under tetrasomic inheritance and the major challenges in statistical modeling and marker data analysis. We validated the method and explored its statistical properties by intensive simulation study and demonstrated its utility by analysis of AFLP and SSR marker data from an outbred autotetraploid potato population.
Proceedings of the National Academy of Sciences 02/2010; 107(9):4270-4. · 9.74 Impact Factor
ABSTRACT: It is well known that Affymetrix microarrays are widely used to predict genome-wide gene expression and genome-wide genetic polymorphisms from RNA and genomic DNA hybridization experiments, respectively. It has recently been proposed to integrate the two predictions by use of RNA microarray data only. Although the ability to detect single feature polymorphisms (SFPs) from RNA microarray data has many practical implications for genome study in both sequenced and unsequenced species, it raises enormous challenges for statistical modelling and analysis of microarray gene expression data for this objective. Several methods are proposed to predict SFPs from the gene expression profile. However, their performance is highly vulnerable to differential expression of genes. The SFPs thus predicted are eventually a reflection of differentially expressed genes rather than genuine sequence polymorphisms. To address the problem, we developed a novel statistical method to separate the binding affinity between a transcript and its targeting probe and the parameter measuring transcript abundance from perfect-match hybridization values of Affymetrix gene expression data. We implemented a Bayesian approach to detect SFPs and to genotype a segregating population at the detected SFPs. Based on analysis of three Affymetrix microarray datasets, we demonstrated that the present method confers a significantly improved robustness and accuracy in detecting the SFPs that carry genuine sequence polymorphisms when compared to its rivals in the literature. The method developed in this paper will provide experimental genomicists with advanced analytical tools for appropriate and efficient analysis of their microarray experiments and biostatisticians with insightful interpretation of Affymetrix microarray data.
ABSTRACT: Transcript abundance data from cRNA hybridizations to Affymetrix microarrays can potentially be used to identify genetic markers to facilitate high-throughput genotyping. We have shown that it is easily possible to use the information from Affymetrix expression arrays to accurately identify over 4,000 robust polymorphic transcript-derived markers (TDMs). We developed the method to identify TDM polymorphisms from experiments involving two tissues in two commercial varieties of barley and their doubled-haploid progeny. These TDMs represent ~18% of the total barley genes on the chip and can be used to predict the genotypes in an F(1)-derived, doubled-haploid population. According to our estimates, 35% of the TDMs reveal nucleotide polymorphism of the particular gene (single feature polymorphisms, SFPs) while 65% mark polymorphism resulting in extreme variation of gene expression (genetic expression markers, GEMs). These latter are probably mainly cis-acting regulators while a small proportion, approximately 5%, are loosely linked or unlinked trans-regulators.
Methods in molecular biology (Clifton, N.J.) 02/2009; 513:81-92.
ABSTRACT: A typical genetical genomics experiment results in four separate datasets: genotype, gene expression, higher-order phenotypic data and metadata that describe the protocols, processing and the array platform. Used in concert, these data sets provide the opportunity to perform genetic analysis at a systems level. Their predictive power is largely determined by the gene expression dataset where tens of millions of data points can be generated using currently available mRNA profiling technologies. Such large, multidimensional data sets often have value beyond that extracted during their initial analysis and interpretation, particularly if conducted on widely distributed reference genetic materials. Besides quality and scale, access to the data is of primary importance as accessibility potentially allows the extraction of considerable added value from the same primary dataset by the wider research community. Although the number of genetical genomics experiments in different plant species is rapidly increasing, none to date has been presented in a form that allows quick and efficient on-line testing for possible associations between genes, loci and traits of interest by an entire research community.
DESCRIPTION: Using a reference population of 150 recombinant doubled haploid barley lines we generated novel phenotypic, mRNA abundance and SNP-based genotyping data sets, added them to a considerable volume of legacy trait data and entered them into the GeneNetwork (http://www.genenetwork.org). GeneNetwork is a unified on-line analytical environment that enables the user to test genetic hypotheses about how component traits, such as mRNA abundance, may interact to condition more complex biological phenotypes (higher-order traits). Here we describe these barley data sets and demonstrate some of the functionalities GeneNetwork provides as an easily accessible and integrated analytical environment for exploring them.
CONCLUSION: By integrating barley genotypic, phenotypic and mRNA abundance data sets directly within GeneNetwork's analytical environment we provide simple web access to the data for the research community. In this environment, a combination of correlation analysis and linkage mapping provides the potential to identify and substantiate gene targets for saturation mapping and positional cloning. By integrating datasets from an unsequenced crop plant (barley) in a database that has been designed for an animal model species (mouse) with a well established genome sequence, we prove the importance of the concept and practice of modular development and interoperability of software engineering for biological datasets.
ABSTRACT: We previously mapped mRNA transcript abundance traits (expression-QTL or eQTL) using the Barley1 Affymetrix array and 'whole plant' tissue from 139 progeny of the Steptoe x Morex (St/Mx) reference barley mapping population. Of the 22,840 probesets (genes) on the array, 15,987 reported transcript abundance signals that were suitable for eQTL analysis, and this revealed a genome-wide distribution of 23,738 significant eQTLs. Here we have explored the potential of using these mRNA abundance eQTL traits as surrogates for the identification of candidate genes underlying the interaction between barley and the wheat stem rust fungus Puccinia graminis f. sp. tritici. We re-analysed quantitative 'resistance phenotype' data collected on this population in 1990/1991 and identified six loci associated with barley's reaction to stem rust. One of these coincided with the major stem rust resistance locus Rpg1, that we had previously positionally cloned using this population. Correlation analysis between phenotype values for rust infection and mRNA abundance values reported by the 22,840 GeneChip probe sets placed Rpg1, which is on the Barley1 GeneChip, in the top five candidate genes for the major QTL on chromosome 7H corresponding to the location of Rpg1. A second co-located with the rpg4/Rpg5 stem rust resistance locus that has been mapped in a different population and the remaining four were novel. Correlation analyses identified candidate genes for the rpg4/Rpg5 locus on chromosome 5H. By combining our data with additional published mRNA profiling data sets, we identify a putative sensory transduction histidine kinase as a strong candidate for a novel resistance locus on chromosome 2H and compile candidate gene lists for the other three loci.
Theoretical and Applied Genetics 08/2008; 117(2):261-72. · 3.66 Impact Factor
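The correlation step described in the preceding abstract (ranking probe sets by how strongly their mRNA abundance tracks the resistance phenotype across lines) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `pearson` and `rank_candidates`, the probe identifiers, and all numbers are hypothetical, and plain Pearson correlation is assumed because the abstract does not name the statistic used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation signal
    return sxy / math.sqrt(sxx * syy)

def rank_candidates(phenotype, expression):
    """Rank probe sets by |r| between expression and phenotype, strongest first."""
    scores = {probe: pearson(values, phenotype)
              for probe, values in expression.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Toy data, invented for illustration: five lines, three hypothetical probe sets.
phenotype = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "probe_a": [5.1, 4.0, 2.9, 2.1, 1.0],  # strongly negatively correlated
    "probe_b": [2.0, 2.1, 1.9, 2.0, 2.2],  # weakly correlated
    "probe_c": [1.0, 3.0, 2.0, 5.0, 4.0],  # moderately positively correlated
}
ranked = rank_candidates(phenotype, expression)
for probe, r in ranked:
    print(f"{probe}\t{r:+.3f}")
```

Ranking by absolute correlation keeps both positively and negatively associated transcripts near the top of the list; in a real analysis the candidates would presumably also be restricted to genes mapping within the QTL interval.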
ABSTRACT: Affymetrix high density oligonucleotide expression arrays are widely used across all fields of biological research for measuring genome-wide gene expression. An important step in processing oligonucleotide microarray data is to produce a single value for the gene expression level of an RNA transcript using one of a growing number of statistical methods. The challenge for the researcher is to decide on the most appropriate method to use to address a specific biological question with a given dataset. Although several research efforts have focused on assessing performance of a few methods in evaluating gene expression from RNA hybridization experiments with different datasets, the relative merits of the methods currently available in the literature for evaluating genome-wide gene expression from Affymetrix microarray data collected from real biological experiments remain actively debated.
The present study reports a comprehensive survey of the performance of all seven commonly used methods in evaluating genome-wide gene expression from a well-designed experiment using Affymetrix microarrays. The experiment profiled eight genetically divergent barley cultivars each with three biological replicates. The dataset so obtained confers a balanced and idealized structure for the present analysis. The methods were evaluated on their sensitivity for detecting differentially expressed genes, reproducibility of expression values across replicates, and consistency in calling differentially expressed genes. The number of genes detected as differentially expressed among methods differed by a factor of two or more at a given false discovery rate (FDR) level. Moreover, we propose the use of genes containing single feature polymorphisms (SFPs) as an empirical test for comparison among methods for the ability to detect true differential gene expression on the basis that SFPs largely correspond to cis-acting expression regulators. The PDNN method demonstrated superiority over all other methods in every comparison, whilst the default Affymetrix MAS5.0 method was clearly inferior.
A comprehensive assessment of seven commonly used data extraction methods based on an extensive barley Affymetrix gene expression dataset has shown that the PDNN method has superior performance for the detection of differentially expressed genes.
ABSTRACT: Expression divergence of duplicate genes is widely believed to be important for their retention and evolution of new function, although the mechanism that determines their expression divergence remains unclear. We use a genetical genomics approach to explore divergence in genetical control of yeast duplicate genes created by a whole-genome duplication that occurred about 100 MYA and those with a younger duplication age. The analysis reveals that duplicate genes have a significantly higher probability of sharing common genetic control than pairs of singleton genes. The expression quantitative trait loci (eQTLs) have diverged completely for a high proportion of duplicate pairs, whereas a substantially larger proportion of duplicates share common regulatory motifs after 100 Myr of divergent evolution. The similarity in both genetical control and cis motif structure for a duplicate pair is a reflection of its evolutionary age. This study reveals that up to 20% of variation in expression between ancient duplicate gene pairs in the yeast genome can be explained by both cis motif divergence (approximately 8%) and by trans eQTL divergence (approximately 10%). Initially, divergence in all 3 aspects of cis motif structure, trans-genetical control, and expression evolves coordinately with the coding sequence divergence of both young and old duplicate pairs. These findings highlight the importance of divergence in both cis motif structure and trans-genetical control in the diverse set of mechanisms underlying the expression divergence of yeast duplicate genes.
Molecular Biology and Evolution 12/2007; 24(11):2556-65. · 10.35 Impact Factor
ABSTRACT: The connexins are the subunits of a family of proteins that form gap junctions, allowing ions and small molecules to move between adjacent cells. At least four connexins are expressed in the ear, and, although there are known mutations at >100 loci that can cause deafness, those involving DFNB1, in the interval 13q11-q12 containing the GJB2 and GJB6 genes coding for connexins 26 and 30, are the most frequent cause of recessive deafness in many populations. We have suggested that the combined effects of relaxed selection and linguistic homogamy can explain the high frequency of connexin deafness and may have doubled its incidence in this country during the past 200 years. In this report, we show by computer simulation that assortative mating, in fact, can accelerate dramatically the genetic response to relaxed selection. Along with the effects of gene drift and consanguinity, assortative mating also may have played a key role in the joint evolution and accelerated fixation of genes for speech after they first appeared in Homo sapiens 100,000-150,000 years ago.
The American Journal of Human Genetics 06/2004; 74(6):1081-7. · 11.20 Impact Factor
ABSTRACT: A commonly encountered difficulty with the genetic engineering of crop plants is that different varieties of a particular species can show great variability in the efficiency with which they can be transformed. This increases the effort required to introduce transgenes into particular genetic backgrounds. The use of Substitution Lines has allowed the finer mapping of three Quantitative Trait Loci (tf1, tf2 and tf3) that explain 26% of the variation in the efficiency of Agrobacterium-mediated transformation in Brassica oleracea. Use of an 'orthogonal set' of genotypes (containing all eight possible combinations of 'positive' and 'negative' alleles at the three QTL), along with time course studies of transgene expression, has allowed the determination of the stages at which these genes have their effects during transformation. With regard to control of the level of transient transgene expression, tf1 (on LGO1) alone has no detectable effect, whilst tf2 (on LGO3) and tf3 (on LGO7) have highly significant effects (P < 0.001). All three loci have highly significant (P < 0.001) effects on the levels of expression of stably integrated transgene. The use of RFLP markers has shown that tf1 and tf2 are in duplicated regions of the B. oleracea genome and appear to be paralogous in origin. Colinearity of these regions with the A. thaliana genome has been identified. The results allow the selection of progeny Brassica oleracea genotypes that are more efficiently transformed than either parent used in the original cross.
ABSTRACT: We have assigned all nine linkage groups of a Brassica oleracea genetic map to each of the nine chromosomes of the karyotype derived from mitotic metaphase spreads of the B. oleracea var. alboglabra line A12DHd using FISH. The majority of probes were BACs, with A12DHd DNA inserts, which give clear, reliable FISH signals. We have added nine markers to the existing integrated linkage map, distributed over six linkage groups. BACs were definitively assigned to linkage map positions through development of locus-specific PCR assays. Integration of the cytogenetic and genetic linkage maps was achieved with 22 probes representing 19 loci. Four chromosomes (2, 4, 7, and 9) are in the same orientation as their respective linkage groups (O4, O7, O8, and O6) whereas four chromosomes (1, 3, 5, and 8) and linkage groups (O3, O9, O2, and O1) are in the opposite orientation. The remaining chromosome (6) is probably in the opposite orientation. The cytogenetic map is an important resource for locating probes with unknown genetic map positions and is also being used to analyze the relationships between genetic and cytogenetic maps.
ABSTRACT: The two genomes (A and C) of the allopolyploid B. napus have been clearly distinguished using genomic in situ hybridization (GISH) despite the fact that the two extant diploids, B. rapa (A, n=10) and B. oleracea (C, n=9), representing the progenitor genomes, are closely related. Using DNA from B. oleracea as the probe, with B. rapa DNA and the intergenic spacer of the B. oleracea 45S rDNA as the block, hybridization occurred on nine of the nineteen chromosome pairs along the majority of their length. The pattern of hybridization confirms that the two genomes have remained distinct in B. napus line DH12075, with no significant genome homogenization and no large-scale