Chapter

Introduction to GWAS and MutMap for identification of genes/QTL using next-generation sequencing

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Most morphological and physiological traits, including macronutrient use efficiency, are quantitative and are controlled by many quantitative trait loci (QTL) acting together. QTL analysis remains one of the most powerful tools for identifying these loci. However, classical QTL analysis is costly, laborious, and time-consuming. Here we introduce two more recent approaches based on next-generation sequencing (NGS), the genome-wide association study (GWAS) and MutMap, and discuss experimental designs for NGS-based analysis. GWAS and MutMap are among the most powerful approaches for identifying the causal genes/QTL underlying complex traits. These approaches will accelerate both the understanding of molecular mechanisms and breeding selection for increased macronutrient use efficiency.
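To make the MutMap idea concrete, here is a minimal sketch of the SNP-index calculation that the method is generally described as relying on: at each SNP, the fraction of reads from the bulked mutant-phenotype progeny that carry the mutant allele. Regions linked to the causal mutation are expected to approach 1, while unlinked SNPs are expected to sit near 0.5. The function names, the window size, the 0.9 cutoff, and the read counts are all hypothetical and are not taken from the chapter.

```python
from typing import List, Tuple

# (position_bp, reads_with_mutant_allele, total_reads); toy values, not real data
Site = Tuple[int, int, int]


def snp_index(mutant_reads: int, total_reads: int) -> float:
    """Fraction of aligned reads that carry the mutant allele at one site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mutant_reads / total_reads


def windowed_snp_index(sites: List[Site], window: int, step: int) -> List[Tuple[int, float]]:
    """Mean SNP-index per sliding window, as (window_start, mean_index) pairs."""
    if not sites:
        return []
    last = max(pos for pos, _, _ in sites)
    out = []
    start = 0
    while start <= last:
        vals = [snp_index(m, t) for pos, m, t in sites if start <= pos < start + window]
        if vals:  # windows without any SNP are skipped
            out.append((start, sum(vals) / len(vals)))
        start += step
    return out


def candidate_windows(windows: List[Tuple[int, float]], threshold: float = 0.9) -> List[int]:
    """Window starts whose mean SNP-index reaches an (assumed) cutoff."""
    return [start for start, mean in windows if mean >= threshold]


toy_sites = [(100, 5, 10), (600, 6, 10), (1100, 10, 10), (1400, 9, 10), (2100, 4, 10)]
toy_windows = windowed_snp_index(toy_sites, window=1000, step=1000)
print(toy_windows)
print(candidate_windows(toy_windows))
```

In practice the cutoff would be replaced by a confidence interval derived from read depth and bulk size, and low-depth sites would be filtered out first; the fixed threshold here only keeps the example short.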


Article
The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.
Article
Background: Progress in genetics and breeding in pea still suffers from the limited availability of molecular resources. SNP markers that can be identified through affordable sequencing processes, without the need for prior genome reduction or a reference genome to assemble sequencing data would allow the discovery and genetic mapping of thousands of molecular markers. Such an approach could significantly speed up genetic studies and marker assisted breeding for non-model species. Results: A total of 419,024 SNPs were discovered using HiSeq whole genome sequencing of four pea lines, followed by direct identification of SNP markers without assembly using the discoSnp tool. Subsequent filtering led to the identification of 131,850 highly designable SNPs, polymorphic between at least two of the four pea lines. A subset of 64,754 SNPs was called and genotyped by short read sequencing on a subpopulation of 48 RILs from the cross 'Baccara' x 'PI180693'. This data was used to construct a WGGBS-derived pea genetic map comprising 64,263 markers. This map is collinear with previous pea consensus maps and therefore with the Medicago truncatula genome. Sequencing of four additional pea lines showed that 33 % to 64 % of the mapped SNPs, depending on the pairs of lines considered, are polymorphic and can therefore be useful in other crosses. The subsequent genotyping of a subset of 1000 SNPs, chosen for their mapping positions using a KASP™ assay, showed that almost all generated SNPs are highly designable and that most (95 %) deliver highly qualitative genotyping results. Using rather low sequencing coverages in SNP discovery and in SNP inferring did not hinder the identification of hundreds of thousands of high quality SNPs. Conclusions: The development and optimization of appropriate tools in SNP discovery and genetic mapping have allowed us to make available a massive new genomic resource in pea. 
It will be useful for both fine mapping within chosen QTL confidence intervals and marker assisted breeding for important traits in pea improvement.
Article
Background: Since the development of transcriptome analysis systems, many expression evolution studies have characterized evolutionary forces acting on gene expression without explicitly discriminating between global expression differences and tissue-specific expression differences. However, different types of gene expression alteration should have different effects on an organism, the evolutionary forces that act on them might differ, and different types of genes might show different types of differential expression between species. To confirm this, we studied differentially expressed (DE) genes among closely related groups that have extensive gene expression atlases, and clarified the characteristics of different types of DE genes, including the identification of loci regulating differential expression using expression quantitative trait loci (eQTL) analysis data. Results: We detected DE genes between rice subspecies in five homologous tissues that were verified using japonica and indica transcriptome atlases in public databases. Using the transcriptome atlases, we classified DE genes into two types, global DE genes and changed-tissues DE genes. Global DE genes were not expressed in any tissue in the atlas of one subspecies, whereas changed-tissues DE genes were expressed in both subspecies with different tissue specificity. For the five tissues in the two japonica-indica combinations, 4.6 ± 0.8 % and 5.9 ± 1.5 % of highly expressed genes were global and changed-tissues DE genes, respectively. Changed-tissues DE genes varied in number between tissues, increasing linearly with the abundance of tissue-specifically expressed genes in the tissue. Molecular evolution of global DE genes was rapid, unlike that of changed-tissues DE genes. Based on gene ontology, global and changed-tissues DE genes were different, having no common GO terms. Expression differences of most global DE genes were regulated by cis-eQTLs. 
Expression evolution of changed-tissues DE genes was rapid in tissue specifically expressed genes and those rapidly evolved changed-tissues DE genes were regulated not by cis-eQTLs, but by complicated trans-eQTLs. Conclusions: Global DE genes and changed-tissues DE genes had contrasting characteristics. The two contrasting types of DE genes provide possible explanations for the previous controversial conclusions about the relationships between molecular evolution and expression evolution of genes in different species, and the relationship between expression breadth and expression conservation in evolution.
Article
A multiparent advanced-generation intercross population of maize has been developed to help plant geneticists identify sequence variants affecting important agricultural traits.
Article
Key message: Significant SNPs and candidate genes for symbiotic nitrogen fixation (SNF) and related traits were identified on Pv03, Pv07 and Pv09 chromosomes of common bean. A genome-wide association study (GWAS) was conducted to explore the genetic basis of variation for symbiotic nitrogen fixation (SNF) and related traits in the Andean Diversity Panel (ADP) comprising 259 common bean (Phaseolus vulgaris) genotypes. The ADP was evaluated for SNF and related traits in both greenhouse and field experiments. After accounting for population structure and cryptic relatedness, significant SNPs were identified on chromosomes Pv03, Pv07 and Pv09 for nitrogen derived from atmosphere (Ndfa) in the shoot at flowering, and for Ndfa in seed. The SNPs for Ndfa in shoot and Ndfa in seed co-localized on Pv03 and Pv09. Two genes Phvul.007G050500 and Phvul.009G136200 that code for leucine-rich repeat receptor-like protein kinases (LRR-RLK) were identified as candidate genes for Ndfa. LRR-RLK genes play a key role in signal transduction required for nodule formation. Significant SNPs identified in this study could potentially be used in marker-assisted breeding to accelerate genetic improvement of common bean for SNF.
Article
Argentina has a long tradition of sunflower breeding, and its germplasm is a valuable genetic resource worldwide. However, knowledge of the genetic constitution and variability levels of the Argentinean germplasm is still scarce, rendering the global map of cultivated sunflower diversity incomplete. In this study, 42 microsatellite loci and 384 single nucleotide polymorphisms (SNPs) were used to characterize the first association mapping population used for quantitative trait loci mapping in sunflower, along with a selection of allied open-pollinated and composite populations from the germplasm bank of the National Institute of Agricultural Technology of Argentina. The ability of different kinds of markers to assess genetic diversity and population structure was also evaluated. The analysis of polymorphism in the set of sunflower accessions studied here showed that both the microsatellite (SSR) and SNP markers were informative for germplasm characterization, although to different extents. In general, the estimates of genetic variability were moderate. The average genetic diversity, as quantified by the expected heterozygosity, was 0.52 for SSR loci and 0.29 for SNPs. Within SSR markers, those derived from non-coding regions captured higher levels of diversity than EST-SSRs. A significant correlation was found between SSR- and SNP-based genetic distances among accessions. Bayesian and multivariate methods were used to infer population structure. Evidence for the existence of three different genetic groups was found consistently across data sets (i.e., SSR, SNP and SSR + SNP), with the maintainer/restorer status being the most prevalent characteristic associated with group delimitation. The present study constitutes the first report comparing the performance of SSR and SNP markers for population genetics analysis in cultivated sunflower. 
We show that the SSR and SNP panels examined here, whether used separately or in conjunction, allowed consistent estimations of genetic diversity and population structure in sunflower breeding materials. The generated knowledge about the levels of diversity and population structure of sunflower germplasm is an important contribution to the breeding and conservation of this crop.
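The expected heterozygosity quoted in the sunflower abstract is commonly computed per locus as one minus the sum of squared allele frequencies. The sketch below assumes that plain form; the study may well have used a sample-size-corrected estimator, and the function names and allele lists are made up for illustration. It also shows why biallelic SNPs tend to score lower than multi-allelic SSRs: with two alleles the value cannot exceed 0.5.

```python
from collections import Counter
from typing import Iterable, List


def expected_heterozygosity(alleles: Iterable[str]) -> float:
    """He = 1 - sum(p_i^2) over the allele frequencies p_i at one locus."""
    counts = Counter(alleles)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no alleles observed")
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def mean_expected_heterozygosity(loci: List[List[str]]) -> float:
    """Average He across loci, as reported for a marker panel."""
    if not loci:
        raise ValueError("no loci given")
    return sum(expected_heterozygosity(locus) for locus in loci) / len(loci)


# Toy allele calls (hypothetical): one biallelic SNP, one four-allele SSR
snp_locus = ["A"] * 8 + ["G"] * 2
ssr_locus = ["152", "154", "158", "160"] * 3

print(round(expected_heterozygosity(snp_locus), 2))  # 0.32
print(round(expected_heterozygosity(ssr_locus), 2))  # 0.75
```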
Article
Key message: This study identified 333 genomic regions associated with 28 traits related to nitrogen use efficiency in European winter wheat, using genome-wide association in a panel of 214 varieties evaluated in eight environments. Abstract: Improving nitrogen use efficiency is a key factor in sustainably increasing global production. However, while high-throughput screening methods remain at a developmental stage, genetic progress may be mainly driven by marker-assisted selection. The objective of this study was to identify chromosomal regions associated with nitrogen use efficiency-related traits in bread wheat (Triticum aestivum L.) using a genome-wide association approach. Two hundred and fourteen European elite varieties were characterised for 28 traits related to nitrogen use efficiency in eight environments in which two different nitrogen fertilisation levels were tested. The genome-wide association study was carried out using 23,603 SNPs with a mixed model that accounts for parentage relationships among varieties. We identified 1,010 significantly associated SNPs, which defined 333 chromosomal regions associated with at least one trait, and found colocalisations for 39 % of these chromosomal regions. A method based on linkage disequilibrium to define the associated region was suggested and discussed with reference to false positive rate. Colocalisations were analysed through a network approach, which highlighted the impact of genomic regions controlling nitrogen status at flowering, precocity, and nitrogen utilisation on global agronomic performance. We were able to explain 40 ± 10 % of the total genetic variation. Numerous colocalisations with previously published genomic regions were observed, with candidate genes such as Ppd-D1, Rht-D1, NADH-Gogat, and GSe. We highlighted selection pressure on yield and nitrogen utilisation by discussing allele frequencies in associated regions.
Article
TILLING (Targeting Induced Local Lesions IN Genomes) is a reverse genetic method that combines chemical mutagenesis with high-throughput genome-wide screening for point mutation detection in genes of interest. However, this mutation discovery approach faces a particular problem which is how to obtain a mutant population with a sufficiently high mutation density. Furthermore, plant mutagenesis protocols require two successive generations (M1, M2) for mutation fixation to occur before the analysis of the genotype can begin. Here, we describe a new TILLING approach for rice based on ethyl methanesulfonate (EMS) mutagenesis of mature seed-derived calli and direct screening of in vitro regenerated plants. A high mutagenesis rate was obtained (i.e. one mutation in every 451 Kb) when plants were screened for two senescence-related genes. Screening was carried out in 2400 individuals from a mutant population of 6912. Seven sense change mutations out of 15 point mutations were identified. This new strategy represents a significant advantage in terms of time-savings (i.e. more than eight months), greenhouse space and work during the generation of mutant plant populations. Furthermore, this effective chemical mutagenesis protocol ensures high mutagenesis rates thereby saving in waste removal costs and the total amount of mutagen needed thanks to the mutagenesis volume reduction.
Article
Rice is one of the most important crops in the world. The rice community needs to cooperate and share efforts and resources so that we can understand the functions of rice genes, especially those with a role in important agronomical traits, for application in agricultural production. Mutation is a major source of genetic variation that can be used for studying gene function. Here we present the status of mutant collections generated at random by physical/chemical and insertional mutagenesis. As of early September 2013, a total of 447,919 flanking sequence tags from rice mutant libraries with T-DNA, Ac/Ds, En/Spm, Tos17, and nDART/aDART insertions had been collected and made publicly available. Of these, 336,262 sequences are precisely positioned on the japonica rice chromosomes, and 67.5% lie within gene intervals. We discuss the genome coverage and preference of the insertions, issues limiting the exchange and use of the current collections, as well as new and improved resources. We propose a call to renew all mutant populations as soon as possible. We also suggest that a common web portal should be established for ordering seeds.
Article
Background: This article describes the development of Multi-parent Advanced Generation Inter-Cross populations (MAGIC) in rice and discusses potential applications for mapping quantitative trait loci (QTLs) and for rice varietal development. We have developed 4 multi-parent populations: indica MAGIC (8 indica parents); MAGIC plus (8 indica parents with two additional rounds of 8-way F1 inter-crossing); japonica MAGIC (8 japonica parents); and Global MAGIC (16 parents – 8 indica and 8 japonica). The parents used in creating these populations are improved varieties with desirable traits for biotic and abiotic stress tolerance, yield, and grain quality. The purpose is to fine map QTLs for multiple traits and to directly and indirectly use the highly recombined lines in breeding programs. These MAGIC populations provide a useful germplasm resource with diverse allelic combinations to be exploited by the rice community. Results: The indica MAGIC population is the most advanced of the MAGIC populations developed thus far and comprises 1328 lines produced by single seed descent (SSD). At the S4 stage of SSD a subset (200 lines) of this population was genotyped using a genotyping-by-sequencing (GBS) approach and was phenotyped for multiple traits, including: blast and bacterial blight resistance, salinity and submergence tolerance, and grain quality. Genome-wide association mapping identified several known major genes and QTLs including Sub1 associated with submergence tolerance and Xa4 and xa5 associated with resistance to bacterial blight. Moreover, the genome-wide association study (GWAS) results also identified potentially novel loci associated with essential traits for rice improvement. Conclusion: The MAGIC populations serve a dual purpose: permanent mapping populations for precise QTL mapping and for direct and indirect use in variety development. 
Unlike a set of naturally diverse germplasm, this population is tailor-made for breeders with a combination of useful traits derived from multiple elite breeding lines. The MAGIC populations also present opportunities for studying the interactions of genome introgressions and chromosomal recombination.
Article
Background: Rice research has been enabled by access to the high quality reference genome sequence generated in 2005 by the International Rice Genome Sequencing Project (IRGSP). To further facilitate genomic-enabled research, we have updated and validated the genome assembly and sequence for the Nipponbare cultivar of Oryza sativa (japonica group). Results: The Nipponbare genome assembly was updated by revising and validating the minimal tiling path of clones with the optical map for rice. Sequencing errors in the revised genome assembly were identified by re-sequencing the genome of two different Nipponbare individuals using the Illumina Genome Analyzer II/IIx platform. A total of 4,886 sequencing errors were identified in 321 Mb of the assembled genome indicating an error rate in the original IRGSP assembly of only 0.15 per 10,000 nucleotides. A small number (five) of insertions/deletions were identified using longer reads generated using the Roche 454 pyrosequencing platform. As the re-sequencing data were generated from two different individuals, we were able to identify a number of allelic differences between the original individual used in the IRGSP effort and the two individuals used in the re-sequencing effort. The revised assembly, termed Os-Nipponbare-Reference-IRGSP-1.0, is now being used in updated releases of the Rice Annotation Project and the Michigan State University Rice Genome Annotation Project, thereby providing a unified set of pseudomolecules for the rice community. Conclusions: A revised, error-corrected, and validated assembly of the Nipponbare cultivar of rice was generated using optical map data, re-sequencing data, and manual curation that will facilitate on-going and future research in rice. Detection of polymorphisms between three different Nipponbare individuals highlights that allelic differences between individuals should be considered in diversity studies.
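The error rate quoted in the Nipponbare abstract can be checked directly from the two figures it gives. This short calculation assumes that 321 Mb means 321,000,000 bp; the function name is only illustrative.

```python
def errors_per_10k(errors: int, assembled_bp: int) -> float:
    """Sequencing errors per 10,000 nucleotides of assembled sequence."""
    if assembled_bp <= 0:
        raise ValueError("assembled_bp must be positive")
    return errors / assembled_bp * 10_000


# Figures from the abstract: 4,886 errors in 321 Mb (taken here as 321e6 bp)
rate = errors_per_10k(4_886, 321_000_000)
print(f"{rate:.2f} errors per 10,000 nucleotides")  # 0.15 errors per 10,000 nucleotides
```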
Article
Key message: Using GWAS, 13 significant SNPs distributed on six of the seven Aegilops tauschii chromosomes (all but 5D) were identified, and several candidate P-deficiency-responsive genes were proposed from searches of public databases. Aegilops tauschii, the wheat (Triticum aestivum) D-genome progenitor, possesses numerous genes for stress resistance, including genes for tolerance of phosphorus (P) deficiency. Investigation of the genetic architecture of A. tauschii will help in developing P-deficiency-tolerant varieties of wheat. We evaluated nine traits in a population of 380 A. tauschii specimens under conditions with and without P application, and we performed genome-wide association studies for these traits using single nucleotide polymorphism (SNP) chips containing 7185 markers. Using a general linear model, we identified 119 SNPs that were significantly associated with all nine traits, and a mixed linear model revealed 18 SNPs associated with all traits. Both models detected 13 significant markers distributed on six of the seven A. tauschii chromosomes (all but 5D). Searches of public databases revealed several candidate/flanking genes related to P-deficiency tolerance. These genes were grouped in five categories by the types of proteins they encoded: defense response proteins, enzymes, promoters and transcription factors, storage proteins, or proteins triggered by P deficiency. The identified SNPs and genes contain essential information for cloning genes related to P-deficiency tolerance in A. tauschii and wheat, and they provide a foundation for breeding P-deficiency tolerant wheat cultivars.
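The single-marker logic behind a GWAS scan such as the one in the Aegilops tauschii abstract can be sketched in a few lines. This is a deliberately simplified stand-in, not the study's method: it regresses the trait on allele dosage one SNP at a time with no covariates, obtains p-values by permutation rather than from an F- or t-test, and applies a Bonferroni cutoff. A mixed linear model would additionally include kinship as a random effect, which is not shown. All names and data below are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def r_squared(x: List[float], y: List[float]) -> float:
    """R^2 of the simple linear regression of y on x (squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic marker or constant trait
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def permutation_p(x: List[float], y: List[float], n_perm: int = 2000, seed: int = 0) -> float:
    """Share of phenotype shuffles whose R^2 is at least the observed R^2."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if r_squared(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one rule keeps p above zero


def scan(genotypes: Dict[str, List[int]], phenotype: List[float],
         alpha: float = 0.05, n_perm: int = 2000) -> Dict[str, Tuple[float, bool]]:
    """Test every marker; flag those passing a Bonferroni-corrected cutoff."""
    threshold = alpha / len(genotypes)
    results = {}
    for name, dosage in genotypes.items():
        p = permutation_p([float(d) for d in dosage], phenotype, n_perm)
        results[name] = (p, p <= threshold)
    return results


# Toy data: 12 accessions, allele dosage coded 0/1/2 (hypothetical values)
phenotype = [1.0, 1.2, 0.9, 2.1, 2.0, 1.9, 3.1, 2.9, 3.0, 1.1, 2.2, 3.2]
genotypes = {
    "snp_a": [0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 1, 2],  # tracks the trait closely
    "snp_b": [0, 1, 2, 0, 1, 2, 0, 1, 2, 1, 0, 2],  # essentially unrelated
}
for marker, (p, significant) in scan(genotypes, phenotype).items():
    print(marker, round(p, 4), significant)
```

With thousands of markers, as on the 7185-marker chip mentioned above, the Bonferroni cutoff falls below what a few thousand permutations can resolve, so a parametric test or many more permutations would be needed in a real analysis.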
Article
The advent of next generation sequencing has influenced every aspect of biological research. Many labs are now using whole genome sequencing in Arabidopsis thaliana as a means to quickly identify EMS-generated mutations present in isolated mutants. Following identification of these mutations, examination of T-DNA insertional alleles defective in candidate genes or complementation of the mutant phenotype with a wild type copy of candidate genes can be used to verify which mutation is causative for the phenotype of interest. Here, we discuss the benefits and pitfalls of using this method to identify mutations underlying phenotypes.
Article
We present the first results from a novel multiparent advanced generation inter-cross (MAGIC) population derived from four elite wheat cultivars. The large size of this MAGIC population (1579 progeny), its diverse genetic composition and high levels of recombination all contribute to its value as a genetic resource. Applications of this resource include interrogation of the wheat genome and the analysis of gene-trait association in agronomically important wheat phenotypes. Here, we report the utilization of a MAGIC population for the first time for linkage map construction. We have constructed a linkage map with 1162 DArT, single nucleotide polymorphism and simple sequence repeat markers distributed across all 21 chromosomes. We benchmark this map against a high-density DArT consensus map created by integrating more than 100 biparental populations. The linkage map forms the basis for further exploration of the genetic architecture within the population, including characterization of linkage disequilibrium, founder contribution and inclusion of an alien introgression into the genetic map. Finally, we demonstrate the application of the resource for quantitative trait loci mapping using the complex traits plant height and hectolitre weight as a proof of principle.