Article

Skim-Based Genotyping by Sequencing

Abstract

Genotyping by sequencing (GBS) is a relatively new method used to determine the differences in the genetic makeup of individuals. Its novelty stems from a combination of two already available methods: genotyping and next-generation sequencing. Depending on the individual study design, GBS protocols can take multiple forms; however, most share a sequence of core steps. These include sequencing of the DNA from the individuals of interest (usually two parents of a mapping population and their progeny), mapping of the sequencing reads to the reference sequence, SNP calling and filtering, SNP genotyping and imputation, followed by haplotype identification and downstream analysis. GBS has a range of applications, from general marker discovery, haplotype identification, and recombination characterization to quantitative trait locus (QTL) analysis, genome-wide association studies (GWAS), and genomic selection (GS). It has already been applied to a range of plant species, including rice, maize, artichoke, and Arabidopsis thaliana. It is a promising approach that is likely to provide new and important insights into plant biology.
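
The genotyping and imputation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes two homozygous parents, one observed allele per progeny SNP (as in a doubled-haploid or inbred population), and a deliberately simple gap-filling rule. The function names `call_parent_of_origin` and `impute_gaps` are hypothetical.

```python
from typing import List, Optional


def call_parent_of_origin(
    parent_a: List[str],
    parent_b: List[str],
    observed: List[Optional[str]],
) -> List[Optional[str]]:
    """Label each SNP 'A' or 'B' by the parent whose allele the progeny carries.

    Assumption: both parents are homozygous, so one allele per parent suffices.
    A position is left as None when no read covered it, when the parents share
    the same allele (uninformative), or when the allele matches neither parent.
    """
    calls: List[Optional[str]] = []
    for a, b, obs in zip(parent_a, parent_b, observed):
        if obs is None or a == b:
            calls.append(None)
        elif obs == a:
            calls.append("A")
        elif obs == b:
            calls.append("B")
        else:
            calls.append(None)  # likely sequencing or mapping error
    return calls


def impute_gaps(calls: List[Optional[str]]) -> List[Optional[str]]:
    """Fill runs of missing calls only when both flanking calls agree.

    Gaps flanked by different parents are left missing, because a
    crossover lies somewhere inside them.
    """
    filled = list(calls)
    last_idx: Optional[int] = None
    for i, call in enumerate(calls):
        if call is None:
            continue
        if last_idx is not None and calls[last_idx] == call:
            for j in range(last_idx + 1, i):
                filled[j] = call
        last_idx = i
    return filled


# Hypothetical data: six SNP positions along one chromosome.
parent_a = ["A", "C", "G", "T", "A", "G"]
parent_b = ["G", "T", "A", "C", "A", "C"]
progeny = ["A", None, "G", None, "A", "C"]  # None = no read at 1x coverage

calls = call_parent_of_origin(parent_a, parent_b, progeny)
haplotype = impute_gaps(calls)
```

In this toy case the first three positions resolve to parent A, the last to parent B, and the two positions in between stay missing, which brackets the recombination breakpoint rather than guessing its location.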

No full-text available


... Skim sequencing (SkimSeq) is one of the less complex NGS methods, which uses low coverage (1-10X) whole genome sequencing of multiple individuals for high resolution genotyping [45][46][47][48]. SkimSeq is less laborious, involves fewer complex steps, is unbiased towards specific alleles, and is capable of SNP detection, which enables informative sampling and validation of the genome [49][50][51][52][53]. ...
... genotyping by sequencing approach is useful in population studies with unknown parental genome information to generate detailed diversity analyses and support marker-assisted selection [45,53]. Given that SkimSeq is known as a low-coverage genome sequencing approach [38] and has so far only been applied in plant genomics research [45][46][47][48], we have chosen this technique to generate a high-resolution sequence dataset, as reported in previous studies, for the first aquatic invertebrate with a small sample size. To explore the genetic diversity and population structure and to discover novel molecular markers of P. monodon from different origins (wild vs. domesticated), we genetically assayed 50 individuals with the SkimSeq approach using the short-read Illumina sequencing platform. ...
Article
Full-text available
The domestication of a wild-caught aquatic animal is an evolutionary process, which results in genetic differentiation at the genomic level in response to strong artificial selection. Although black tiger shrimp (Penaeus monodon) is one of the most commercially important aquaculture species, a systematic assessment of genetic divergence and structure of wild-caught and domesticated broodstock populations of the species is yet to be documented. Therefore, we used a skim sequencing (SkimSeq) based genotyping approach to investigate the genetic structure of 50 broodstock individuals of P. monodon, collected from five sampling sites (n = 10 at each site) across their distribution in the Indo-Pacific region. The wild-caught P. monodon broodstock populations were collected from Malaysia (MS) and Japan (MJ), while domesticated broodstock populations were collected from Madagascar (MMD), Hawaii, HI, USA (MMO), and Thailand (MT). After several filtering steps, a total of 194,259 single nucleotide polymorphism (SNP) loci were identified, of which 4983 SNP loci were identified as putatively adaptive by the pcadapt approach. In both datasets, pairwise FST estimates indicated high genetic divergence between wild and domesticated broodstock populations. Consistently, different spatial clustering analyses in both datasets grouped the divergent genetic structure into two clusters: (1) wild-caught populations (MS and MJ), and (2) domesticated populations (MMD, MMO and MT). Among the 4983 putatively adaptive SNP loci, only 50 loci were observed to be in the coding region. The gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses suggested that genes carrying non-synonymous mutations might be associated with energy production, metabolic functions, respiration regulation and developmental rates, which likely act to promote adaptation to the strong artificial selection during the domestication process. This study has demonstrated the applicability of SkimSeq to the highly duplicated genome of P. monodon across a range of genetic backgrounds and geographical distributions, and should be useful for future genetic improvement programs of this species in aquaculture.
... Genotyping by sequencing (GBS) and single primer enrichment technology (SPET) are highly cost-effective methods for simultaneous marker development and genotyping and improved technologies are continuously developed. For example, skim-based GBS (skimGBS) applying low-coverage whole-genome sequencing has been developed for high-resolution genotyping (Golicz et al. 2015). In SkimGBS, the parents of a population are submitted to whole-genome re-sequencing at about 30× sequencing depth and the offspring is sequenced at much lesser depth (approximately 1×). ...
... The resulting maps allow for accurate definition of cross-over breakpoints and significantly improve mapping compared to conventional GBS. SkimGBS has been successful for general marker discovery, haplotype identification, recombination characterization, QTL analysis, genome-wide association studies and genomic selection on a range of different plant species (Golicz et al. 2015; Bayer et al. 2015), but was not yet performed on mungbean. ...
... Consequently, sequencing and assembling of one or a few genomes clearly do not depict the genomic diversity of a species. In contrast, sequencing a set of biodiverse lines and de novo assembly of the sequencing reads to a pangenome would give a much more accurate picture of the available genomic diversity (Golicz et al. 2015). ...
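
The difference between sequencing parents deeply and offspring at roughly 1× depth, as described in the excerpts above, can be made concrete with a small calculation. The sketch below assumes reads land uniformly at random, so per-site depth follows a Poisson distribution. This is an idealization that real libraries only approximate, and the function names are illustrative.

```python
import math


def mean_depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def prob_site_covered(depth: float, min_reads: int = 1) -> float:
    """Probability that a site is covered by at least `min_reads` reads.

    Assumes per-site read counts are Poisson-distributed with mean `depth`.
    """
    p_fewer = sum(
        math.exp(-depth) * depth ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


# Hypothetical run: 10 million reads of 100 bp on a 1 Gb genome gives 1x depth.
offspring_depth = mean_depth(10_000_000, 100, 1_000_000_000)
p_offspring = prob_site_covered(offspring_depth)   # about 0.63
p_parent = prob_site_covered(30.0)                 # effectively 1.0
```

Under this model roughly 37% of sites receive no read at all in a 1× offspring sample, which is why skim-based genotyping leans on deeply sequenced parents and imputation to recover the missing calls.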
Chapter
Full-text available
Mungbean is a short-duration legume crop cultivated in South Asia, Southeast Asia and Australasia. Its cultivation is rapidly spreading to other parts of the world. Insect pests and diseases are the major constraints in increasing the productivity of the mungbean crop. The important diseases in mungbean include mungbean yellow mosaic, anthracnose, powdery mildew, Cercospora leaf spot, dry root rot, halo blight, bacterial leaf spot and tan spot. The major insect pests of mungbean are stem fly, thrips, aphids, whitefly, pod borers and bruchids. Development of host plant resistance to insect pests and diseases in mungbean by breeding for resistance is an alternative, economical and environment-friendly approach. Though breeding for resistance to insect pests and diseases has been extensively studied in mungbean, the success rate in stabilizing the resistance has been low due to the development of insect biotypes, new pathogen strains and environmental interactions. This chapter covers the insect and disease resistance sources in mungbean, resistant traits, the genetic basis of resistance and different breeding methods involved in breeding for insect and disease resistance.
Chapter
This chapter provides an overview of the economic importance of mungbean globally and the status of mungbean improvement research. The global mungbean area is about 7.3 million ha, and the average yield is 721 kg/ha. India and Myanmar each account for 30% of global output of 5.3 million t. Other large producers are China, Indonesia, Thailand, Kenya, and Tanzania. The mungbean market is divided in four main segments by usage: dry grains (important in South Asia and Kenya), sprouts (important in East and Southeast Asia), transparent noodles/starch (important in East and Southeast Asia), and paste (important in East Asia). Mungbean research is under-resourced in most countries as it is considered a minor crop. There is a history of strong international collaboration in mungbean improvement research in Asia, which is particularly important for a minor crop like mungbean as no single country has the capacity to cover all aspects requiring research. The International Mungbean Improvement Network was established in 2016 to further this collaboration and is coordinated by the World Vegetable Center.
Chapter
Full-text available
Mungbean (Vigna radiata (L.) R. Wilczek var. radiata) is an important legume crop widely produced and consumed throughout Southeast Asia, cultivated on more than 6 million hectares worldwide. Minimizing the impact that climate variability has on production is vital to smallholder farmers who rely on mungbeans as a source of income and nutrition. Abiotic stress factors such as drought, water availability, heat and salinity pose a major risk to global food security. Variability in the climate and the increasing demand for food crops mean that innovative approaches must be implemented now to secure the food of tomorrow. Conventional breeding programs led by the World Vegetable Centre and the Australian National Mungbean Improvement Program have dramatically increased the yields, reliability and sustainability of mungbean crops worldwide. Breeders and researchers are building on that foundational work through the implementation of genomic technologies. Sequencing the genomes of large, diverse sets of mungbean germplasm aims to quantify the genetic diversity present among the world’s mungbean collections and to identify genes associated with agronomically important traits. By combining sequence and phenotyping data, regions of the genome associated with important traits linked to the maintenance of photosynthetic pathways and water-use efficiency can be targeted. Once identified, those pathways can be directly manipulated using genome-editing tools to reduce current breeding times by more than half. Although abiotic stressors pose an immediate and extensive risk, fortunately the technologies and researchers needed to address the issues exist today.
... which we used to genotype the remaining samples. We extracted DNA using a Qiagen DNeasy Blood & Tissue kit with minor modifications, and then samples were sequenced through skimSeq, a low-coverage whole-genome sequencing method that combines Illumina and PerkinElmer technologies (Golicz et al. 2015). ...
... Because low coverage results in missing sequence data (Supporting information), single-nucleotide polymorphism (SNP) imputation with Beagle ver. 4.0 (Browning and Browning 2016) was performed to fill in missing genotypes based on haplotype blocks (Golicz et al. 2015) and filtered using a genotype probability threshold of 0.9 (Supporting information). After obtaining our variant data set from the skimSeq pipeline, we filtered out SNPs with quality score (QUAL) < 20, and mapping quality (MQ) < 20 using vcflib v1 (Garrison 2016). ...
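
The site-level filter quoted in the excerpt above (dropping SNPs with QUAL < 20 or MQ < 20) can be sketched in plain Python. This is a stand-in for illustration only: the cited study used vcflib, whose commands are not reproduced here. The sketch assumes tab-separated VCF records with QUAL in the sixth column and an `MQ=` entry in the INFO column; the function names and the example records are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_info(info: str) -> Dict[str, str]:
    """Turn a VCF INFO string such as 'DP=4;MQ=37' into a dict of key/value pairs."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def passes_filters(record: str, min_qual: float = 20.0, min_mq: float = 20.0) -> bool:
    """Keep a record only if both QUAL and INFO/MQ reach their thresholds.

    Records with a missing QUAL ('.') or no MQ annotation are rejected,
    which is a conservative choice made for this sketch.
    """
    cols = record.rstrip("\n").split("\t")
    qual = cols[5]
    info = parse_info(cols[7])
    if qual == "." or "MQ" not in info:
        return False
    return float(qual) >= min_qual and float(info["MQ"]) >= min_mq


def filter_vcf(lines: Iterable[str]) -> List[str]:
    """Return header lines unchanged plus the records that pass the filters."""
    return [ln for ln in lines if ln.startswith("#") or passes_filters(ln)]


# Hypothetical records: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
example = [
    "##fileformat=VCFv4.2",
    "\t".join(["chr1", "101", ".", "A", "G", "45.0", ".", "DP=3;MQ=60"]),
    "\t".join(["chr1", "202", ".", "C", "T", "12.5", ".", "DP=2;MQ=60"]),
    "\t".join(["chr1", "303", ".", "G", "A", "88.1", ".", "DP=5;MQ=11"]),
]
kept = filter_vcf(example)
```

Only the record at position 101 survives alongside the header: the second record fails on QUAL and the third on MQ. The genotype-probability threshold of 0.9 mentioned in the excerpt applies per sample after imputation and is not covered by this site-level sketch.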
Article
Full-text available
Genetic variation is a fundamental component of biodiversity, and studying population structure, gene flow and demographic history can help guide conservation strategies for many species. Like other aerial insectivores, the purple martin Progne subis is in decline, and yet their genetic background remains largely unknown. To address this knowledge gap, we assessed population structure in the nominate eastern subspecies (P. s. subis) with relation to natal dispersal and examined historical genetic patterns in all three subspecies (P. s. subis, P. s. arboricola, P. s. hesperia) across their North American breeding range by estimating effective population sizes over time. We used next‐generation sequencing strategies for genomic analyses, integrating whole‐genome resequencing data with continent‐wide band encounter records to examine natal dispersal. We documented population structure across P. s. subis, with the highest differentiation between the northern (Alberta) and more southern colonies and following patterns of isolation‐by‐distance. Consistent with spatial patterns of genetic differentiation, we also found greater longitudinal than latitudinal natal dispersal distances, signifying potential latitudinal constraints on gene flow. Earlier contractions in effective population sizes in the western P. s. arboricola and P. s. hesperia compared to the eastern P. s. subis subspecies suggest these subspecies originated from two different glacial refugia. Together, these findings support latitudinal distinction in P. s. subis, and elucidate the origin of subspecies differentiation, highlighting the importance of conserving populations across the range to maximize genetic diversity and adaptive potential in the purple martin.
... Skim sequencing, which is based on whole-genome sequencing with low genome coverage, provides large numbers of SNP markers. Further refinement of the existing NGS technologies like Illumina Infinium assay, Ion Torrent, Roche 454, and the Affymetrix GeneChip has the potential to revolutionize the genome-based high-throughput genotyping and its utilization for association mapping studies in crop improvement programmes (Golicz et al. 2015). This review covers the basic principles, methodology, application for crop improvement and future perspectives of skim sequencing. ...
... Imputation depends on the haplotype segments present in the reference panel and that in turn represents the similarity with sequence reads. For example, it is based on the haplotype structure of the parents, which assumes that no recombination occurred (Golicz et al. 2015; Bayer et al. 2015). Imputation cannot address too much missing information and under this situation, it is better to remove such genotypes before analysis. ...
Article
High-throughput genotyping has become more convenient and cost-effective due to recent advancements in next-generation sequencing (NGS) techniques. Numerous approaches exploiting sequencing advances for genotyping have been developed over the past decade, including different variants of genotyping-by-sequencing (GBS) and restriction-site associated DNA sequencing (RAD-seq). Most of these methods are based on a reduced representation of the genome, which ultimately reduces the cost of sequencing many-fold. However, the continuously falling cost of sequencing makes it more convenient to use whole genome-based approaches. In this regard, skim sequencing uses low-coverage whole-genome sequencing to identify large numbers of polymorphic markers cost-effectively. In the present review, we have discussed recent technological advancements, applicability, and challenges of skim sequencing-based genotyping approaches for crop improvement programmes. Skim sequencing is being extensively used for genotyping in diverse plant species and has a wide range of applications, particularly in quantitative trait loci (QTL) mapping, genome-wide association studies (GWAS), fine genetic map construction, and identification of recombination and gene conversion events in various breeding programmes. The cost-effectiveness, simplicity, and genome-wide coverage will increase the application of skim sequencing-based genotyping. The article summarizes the protocol, uses, bioinformatics tools, applications, and future prospects of skim sequencing in crop improvement.
... The rapid decrease in the cost of genome-wide genotyping is enabling large-scale assessment of crop diversity to identify genes underlying climate-related traits. Genotyping by sequencing (GBS) leverages the low cost of sequencing to identify thousands to millions of single nucleotide polymorphisms (SNPs) in plant populations (Davey et al. 2011; Deschamps et al. 2012; He et al. 2014; Heffelfinger et al. 2014; Poland and Rife 2012; Golicz et al. 2015). GBS approaches vary in the type and volume of data produced and can be divided into whole-genome resequencing (WGR) approaches such as skimGBS (e.g. ...
... Despite rapid advances and decreasing costs of genome-wide genotyping, several limitations of this approach to identifying functional crop diversity remain. While WGR is a powerful approach for accurate SNP calling in recombinant populations with a high-quality reference genome (Golicz et al. 2015), the sequencing of crops with large genomes such as wheat is still constrained by high costs. Reduced representation sequencing, on the other hand, lowers per sample costs and has facilitated access to larger genomes including wheat (Poland et al. 2012b). ...
Chapter
Rising food demand from a growing global population, combined with a changing climate, endangers global food security. Thus, there is a need to breed new varieties and increase the efficiency and environmental resilience of crops. Past intensification of crop production has primarily been achieved using fertilisers, herbicides and insecticides as well as improved agronomic methods. However, these practices often rely on finite resources and lack sustainability, making them impractical to increase production in the long term. The ongoing revolution in genomics offers an unprecedented potential to aid crops in adapting to changing environments and increase yield, while also facilitating the diversification of crop production with minor and newly established crop species. Identifying the genomic basis of climate-related agronomic traits for introgression into crop germplasm is a major challenge, requiring the integration of sequencing technologies and breeding expertise. Here we review state of the art genomic tools and their application for accelerating crop improvement in the face of climate change.
... RRS has gained increasing popularity in recent years and protocols have been modified and developed further into over a dozen approaches. Skim sequencing has also been increasingly adopted in studies on crops [13][14][15], and numerous analytical approaches have been developed to overcome the often lower confidence of SNP calls for low coverage data [12,13,16,17]. ...
... The limitations of low coverage skim sequencing are lower rates of SNP genotyping and increased false-positive rates. However, SNP discovery rates and accuracy can be substantially increased using high quality parental genomes, higher sample size, deeper sequencing, filtering, and imputation [14,104]. Skim sequencing currently remains perhaps the costliest genotyping method, but allows the least biased and most informative sampling of the genome. ...
Chapter
In the past decade, the application of high-throughput sequencing to crop genotyping has given rise to novel platforms capable of genotyping tens of thousands of genome-wide DNA markers. Coupled with the decreasing costs of sequencing, this rapid increase in markers allows accelerated and highly accurate genotyping of entire crop populations and diversity sets using single nucleotide polymorphisms (SNPs). These revolutionary advances accelerate crop improvement by facilitating a more precise connection of phenotype to genotype through association studies, linkage mapping and diversity analysis. The platforms driving the advances in genotyping are array technologies and genotyping by sequencing (GBS) methods, which include both low-coverage whole genome resequencing (skim sequencing) and reduced representation sequencing (RRS) approaches. Here, we outline and compare these genotyping platforms and provide a perspective on the promising future of crop genotyping. While SNP arrays provide high quality, simple handling, and unchallenging analysis, the lower cost of RRS and the greater data volume produced by skim sequencing suggest that use of GBS will become more prevalent in crop genomics as sequencing costs decrease and data analysis becomes more streamlined.
... There are two types of GBS methods: whole genome resequencing (WGR) and reduced representation sequencing (RRS). Huang, et al. [30] found that WGR generates a high density of SNPs and is often carried out at 1x coverage, which is sufficient for successful SNP calling in recombinant populations with a high-quality reference genome [31]. However, it is still prohibitively expensive to sequence populations with large genomes, such as wheat. ...
Article
In a world where food consumption is rising, climate change poses a severe danger to feeding a growing population. Previously, increased agricultural output was achieved by using fertilizers and insecticides and by improving weed and pest control. However, these techniques rely on exhaustible resources and are frequently unstable. Current developments in advanced genetics are paving the way for long-term agricultural intensification and greater crop adaptability to global warming. The amount of quality genomic information accessible has been rapidly increasing as a result of the widespread usage of genome sequencing technology. The increasing availability of genomic data has facilitated the shift to plant pan-genomics, allowing researchers to readily assess the diversity and traits available for crop improvement and cultivar development. These advancements enhance genomic-assisted breeding, which allows for the rapid association of candidate genes with climatic conditions and agricultural characteristics, enabling the development of resilient crops.
... Linkage mapping and association studies are among the important quantitative genetic methods that have been used to determine the relationships between a genotype and a specific phenotype. Although the success of linkage mapping in identifying QTLs has been proven, given that a detected QTL spans more than a few centimorgans and contains hundreds of genes, identifying the suitable candidate QTL is difficult [14,15]. The need for several generations of selfing to produce mapping populations such as recombinant inbred lines through controlled crosses is another limitation of linkage mapping [16]. ...
Article
Full-text available
In this study, the genetic and molecular diversity of 60 quinoa accessions was assessed using agronomically important traits related to grain yield as well as microsatellite (SSR) markers, and informative markers linked to the studied traits were identified using an association study. The results showed that most of the studied traits had a relatively high diversity, but grain saponin and protein content showed the highest diversity. High diversity was also observed in all SSR markers, but KAAT023, KAAT027, KAAT036, and KCAA014 showed the highest values for most of the diversity indices and can be introduced as informative markers to assess genetic diversity in quinoa. Population structure analysis showed that the studied population probably includes two subclusters: out of 60 quinoa accessions, 29 (48%) and 23 (38%) accessions were assigned to the first and second subclusters, respectively, and eight (13%) accessions were considered mixed genotypes. The study of the population structure using Structure software showed two possible subgroups (K = 2) in the studied population, and the results of the bar plot confirmed it. The association study using the general linear model (GLM) and mixed linear model (MLM) identified 35 and 32 significant marker-trait associations (MTAs) for the first year (2019) and 37 and 35 significant MTAs for the second year (2020), respectively. Among the significant MTAs identified for different traits, the highest numbers were obtained for grain yield and 1000-grain weight, with six and five MTAs, respectively.
... This results in higher sequencing coverages at certain loci for genotyping. However, RR-GBS increases the complexity of the lab protocols and has the potential to introduce sequencing bias [8]. At least 13 different protocols for RR-GBS have been developed, each of which differs slightly in the number and type of enzymes used [9]. ...
Article
Full-text available
Background: Genomic prediction describes the use of SNP genotypes to predict complex traits and has been widely applied in humans and agricultural species. Genotyping-by-sequencing, a method which uses low-coverage sequence data paired with genotype imputation, is becoming an increasingly popular SNP genotyping method for genomic prediction. The development of Oxford Nanopore Technologies’ (ONT) MinION sequencer has now made genotyping-by-sequencing portable and rapid. Here we evaluate the speed and accuracy of genomic predictions using low-coverage ONT sequence data in a population of cattle using four imputation approaches. We also investigate the effect of SNP reference panel size on imputation performance. Results: SNP array genotypes and ONT sequence data for 62 beef heifers were used to calculate genomic estimated breeding values (GEBVs) from 641 k SNP for four traits. GEBV accuracy was much higher when genome-wide flanking SNP from sequence data were used to help impute the 641 k panel used for genomic predictions. Using the imputation package QUILT, correlations between ONT and low-density SNP array genomic breeding values were greater than 0.91 and up to 0.97 for sequencing coverages as low as 0.1× using a reference panel of 48 million SNP. Imputation time was significantly reduced by decreasing the number of flanking sequence SNP used in imputation for all methods. When compared to high-density SNP arrays, genotyping accuracy and genomic breeding value correlations at 0.5× coverage were also found to be higher than those imputed from low-density arrays. Conclusions: Here we demonstrated accurate genomic prediction is possible with ONT sequence data from sequencing coverages as low as 0.1×, and imputation time can be as short as 10 min per sample. We also demonstrate that in this population, genotyping-by-sequencing at 0.1× coverage can be more accurate than imputation from low-density SNP arrays.
... Currently, most researchers employ traditional molecular markers, such as RFLP, SSR, and InDel, to construct genetic maps with an accuracy of approximately 1-10 Mb [38]. Due to limitations in the number and coverage density of polymorphic genetic markers, the mapping intervals are larger, and the precision is lower. ...
Article
Full-text available
Grain shape is an important agronomic trait directly associated with yield in rice. In order to explore new genes related to rice grain shape, a high-density genetic map containing 2193 bin markers (526,957 SNPs) was constructed by whole-genome resequencing of 208 recombinant inbred lines (RILs) derived from a cross between ZP37 and R8605, with a total genetic distance of 1542.27 cM. The average genetic distance between markers was 0.76 cM, and the physical distance was 201.29 kb. Quantitative trait locus (QTL) mapping was performed for six agronomic traits related to rice grain shape (grain length, grain width, length-to-width ratio, thousand-grain weight, grain cross-sectional area, and grain perimeter) under three different environments. A total of 39 QTLs were identified, with mapping intervals ranging from 8.1 kb to 1781.6 kb and an average physical distance of 517.5 kb. Among them, 15 QTLs were repeatedly detected in multiple environments. Analysis of the genetic effects of the identified QTLs revealed 14 stable genetic loci, including three loci that overlapped with previously reported gene positions; the remaining 11 loci were newly identified loci associated with two or more environments or traits. Locus 1, Locus 3, Locus 10, and Locus 14 were novel loci exhibiting pleiotropic effects on at least three traits and were detected in multiple environments. Locus 14, with a contribution rate greater than 10%, influenced grain width, length-to-width ratio, and grain cross-sectional area. Furthermore, pyramiding effects analysis of three stable genetic loci showed that increasing the number of QTLs could effectively improve the phenotypic value of grain shape. Collectively, our findings provide a theoretical basis and genetic resources for the cloning, functional analysis, and molecular breeding of genes related to rice grain shape.
... Two genotyping approaches were used. First, skim whole-genome resequencing (skimseq) (Golicz et al. 2015), a low-coverage, alternative method to genotyping by sequencing, was conducted on nine mapping families. Genotype data generated from skim-seq were used in a BSA approach. ...
Article
Full-text available
Variegation in Vitis hybrids was investigated to confirm the inheritance as a single, recessive gene as previously proposed and commonly observed in breeding programs. Variegated leaves have ornamental appeal, but the phenotype is sublethal in some environments. Twenty-nine grape families were characterized for variegation, including F1, S1, and S2 populations. The majority segregated 3 wild type (WT):1 variegated and were supported by chi-square tests. Four populations had segregation ratios supporting 15:1 or 1:1 models, and a unique flecking phenotype was identified in a Landot 4511 S1 population that suggested the interaction of two recessive loci. A variegated parent was selfed to produce progeny with no WT offspring, segregating 0:1. Marker-trait association methods including bulk segregant analysis (BSA), genome-wide association mapping, and quantitative trait loci (QTL) mapping were used on three populations. On chromosome 14, Lvar1 was identified and mapped to 24.5 to 29.5 Mb and associated closely with rhAmpSeq marker 14_27607541. Lvar2 was associated with rhAmpSeq marker 11_18433819 on chromosome 11 at 12.2 to 18.4 Mb. The identification of two loci and the segregation data in some populations suggest that grape breeding germplasm segregates for two recessive loci. The pedigree records suggest that ‘Frontenac’ inherited one of these loci, and that Landot 4511, an ancestor of many populations tested in this experiment, may carry two loci. A total of 252 candidate genes were identified at these loci, including a key target, adenosine triphosphate (ATP)-dependent zinc metalloprotease FtsH6, which is involved in photosystem II and is similar to the var2 mutant in Arabidopsis. This knowledge can help breeders select for ornamental grapevines or eliminate variegation from their breeding programs.
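
The chi-square tests of segregation ratios mentioned in the abstract above follow a standard goodness-of-fit calculation, sketched here in Python. The progeny counts are invented for illustration and do not come from the study. The cutoff 3.841 is the 5% critical value of the chi-square distribution with one degree of freedom, so the helper `fits_ratio` applies only to two-class ratios such as 3:1, 15:1 or 1:1.

```python
from typing import Sequence

# 5% critical value of chi-square with 1 degree of freedom (two phenotype classes).
CRITICAL_DF1_ALPHA_05 = 3.841


def chi_square_gof(observed: Sequence[int], expected_ratio: Sequence[float]) -> float:
    """Chi-square goodness-of-fit statistic for counts against a Mendelian ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_ratio(observed: Sequence[int], expected_ratio: Sequence[float]) -> bool:
    """True when a two-class ratio cannot be rejected at the 5% level."""
    if len(observed) != 2 or len(expected_ratio) != 2:
        raise ValueError("this helper assumes exactly two phenotype classes")
    return chi_square_gof(observed, expected_ratio) < CRITICAL_DF1_ALPHA_05


# Hypothetical family: 78 wild-type and 22 variegated seedlings.
counts = [78, 22]
stat_3_1 = chi_square_gof(counts, [3, 1])    # expected 75:25
stat_1_1 = chi_square_gof(counts, [1, 1])    # expected 50:50
stat_15_1 = chi_square_gof(counts, [15, 1])  # expected 93.75:6.25
```

For these invented counts the 3:1 model gives a statistic of about 0.48 and is not rejected, while both 1:1 and 15:1 are rejected, which is the pattern expected for a single recessive locus.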
... Genotyping by skim sequencing was performed using the "AgSeq" platform at the Genomics and Bioinformatics Services at Texas A&M AgriLife Research. Genotyping by skim sequencing employs next-generation sequencing to obtain single nucleotide polymorphism (SNP) marker data on the population [38]. Using this approach, genotype maps for entire genomes can be developed, which makes it possible to detect which part of the genome was inherited from each of the parental individuals. ...
Article
Full-text available
Salinity stress is a major constraint to rice production in many coastal regions due to saline groundwater and river sources, especially during the dry season in coastal areas when seawater intrudes further inland due to reduced river flows. Since salinity tolerance is a complex trait, breeding efforts can be assisted by mapping quantitative trait loci (QTLs) for complementary salt tolerance mechanisms, which can then be combined to provide higher levels of tolerance. While an abundance of seedling stage salinity tolerance QTLs have been mapped, few studies have investigated reproductive stage tolerance in rice due to the difficulty of achieving reliable stage-specific phenotyping techniques. In the current study, a BC1F2 mapping population consisting of 435 individuals derived from a cross between a salt-tolerant Saudi Arabian variety, Hasawi, and a salt-sensitive Bangladeshi variety, BRRI dhan28, was evaluated for yield components after exposure to EC 10 dS/m salinity stress during the reproductive stage. After selecting tolerant and sensitive progeny, 190 individuals were genotyped by skim sequencing, resulting in 6209 high quality single nucleotide polymorphic (SNP) markers. Subsequently, a total of 40 QTLs were identified, of which 24 were for key traits, including productive tillers, number and percent filled spikelets, and grain yield under stress. Importantly, three yield-related QTLs, one each for productive tillers (qPT3.1), number of filled spikelets (qNFS3.1) and grain yield (qGY3.1) under salinity stress, were mapped at the same position (6.7 Mb or 26.1 cM) on chromosome 3, which had not previously been associated with grain yield under salinity stress. These QTLs can be investigated further to dissect the molecular mechanisms underlying reproductive stage salinity tolerance in rice.
... Later, imputation is performed based on genetic linkage. Due to the large size of linkage blocks, Skim-seq is a suitable method for genotyping F2 and F3 segregating populations (Golicz et al., 2015; Kumar et al., 2021). ...
Article
Full-text available
Quinoa is a pseudocereal originating from the Andean regions. Despite quinoa’s long cultivation history, genetic analysis of this crop is still in its infancy. We aimed to localize quantitative trait loci (QTL) contributing to the phenotypic variation of agronomically important traits. We crossed the Chilean accession PI-614889 and the Peruvian accession CHEN-109, which showed significant differences in days to flowering, days to maturity, plant height, panicle length, thousand kernel weight (TKW), saponin content, and mildew susceptibility. We observed sizeable phenotypic variation across F2 plants and F3 families grown in the greenhouse and the field, respectively. We used Skim-seq to genotype the F2 population and constructed a high-density genetic map with 133,923 single nucleotide polymorphisms (SNPs). Fifteen QTL were found for ten traits. Two significant QTL, common to the F2 and F3 generations, showed pleiotropy for days to flowering, plant height, and TKW. The pleiotropic QTL harbored several putative candidate genes involved in photoperiod response and flowering time regulation. This study presents the first high-density genetic map of quinoa that incorporates QTL for several important agronomic traits. The pleiotropic loci can facilitate marker-assisted selection in quinoa breeding programs.
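The citing passage for this entry notes that, in skim sequencing, missing genotypes are imputed based on genetic linkage, which works because linkage blocks in F2 and F3 populations are large. The Python sketch below is a deliberately simplified illustration of that idea, not the procedure used by Golicz et al. or by the quinoa study: a run of missing calls along a chromosome is filled only when the nearest observed calls on both sides carry the same parental allele. The function name, the "A"/"B" parental labels, and the "-" missing symbol are assumptions made for the example.

```python
# Simplified, hypothetical illustration of linkage-based imputation.
# "A" and "B" label the two parental alleles; "-" marks a missing call.


def impute_by_flanks(calls, missing="-"):
    """Fill runs of missing calls that are flanked by identical observed calls.

    calls: ordered genotype calls along one chromosome for one individual.
    Runs flanked by different alleles (a possible crossover) and runs at the
    chromosome ends are left as missing.
    """
    result = list(calls)
    last_idx = None  # index of the most recent observed (non-missing) call
    for i, call in enumerate(calls):
        if call == missing:
            continue
        if last_idx is not None and calls[last_idx] == call:
            # Same allele on both sides: assume no recombination in between.
            for j in range(last_idx + 1, i):
                result[j] = call
        last_idx = i
    return result


example = "A--A-B--B"  # hypothetical calls with a crossover between A and B
print("".join(impute_by_flanks(example)))
```

The gap between the last "A" and the first "B" stays missing because the crossover position inside it cannot be resolved from the flanking calls alone.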
... Next-generation sequencing (NGS) technologies have reduced the cost of sequencing; however, even low-pass shallow sequencing, sometimes called 'skim-sequencing,' is expensive for large plant genomes, prohibitively so for routine copy number detection (Golicz et al., 2015; Kim et al., 2016). ...
Article
Full-text available
Transgenic plants are produced both to investigate gene function and to confer desirable traits into crops. Transgene copy number is known to influence expression levels, and consequently, phenotypes. Similarly, knowledge of transgene zygosity is desirable for making quantitative assessments of phenotype and tracking the inheritance of transgenes in progeny generations. Since the first transgenic plants were produced, several methods for determining copy number have been applied, including Southern blotting, quantitative real-time PCR, and more recently, sequencing methods; however, each method has specific disadvantages, compromising throughput, accuracy, or expense. Digital PCR (dPCR) divides reactions into partitions, converting the exponential, analogue nature of PCR into a linear, digital signal that allows the frequency of occurrence of specific sequences to be accurately estimated. Confidence increases with the number of partitions; therefore, the availability of emulsion technologies that enable reactions to be divided into tens of thousands of nanodroplets allows accurate determination of copy number in what has become known as digital droplet PCR (ddPCR). ddPCR offers similar benefits of low costs and scalability as other PCR techniques but with superior accuracy and reliability. Graphic abstract: Digital PCR (dPCR) divides reactions into partitions, converting the exponential, analogue nature of PCR into a linear, digital signal that allows the frequency of transgene copy number to be accurately assessed.
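The abstract above describes how partitioning converts PCR into a digital signal from which the frequency of a sequence can be estimated. As a minimal sketch, the Python snippet below applies the Poisson correction commonly used in digital PCR analysis: if a fraction p of partitions is positive, the mean number of target copies per partition is estimated as -ln(1 - p). Using the Poisson model here is an assumption about the analysis rather than something stated in the abstract, and the function names and partition counts are hypothetical.

```python
import math

# Hypothetical illustration of Poisson-corrected quantification in digital PCR.
# Partition counts below are invented for the example.


def copies_per_partition(positive, total):
    """Estimate mean copies per partition from the count of positive partitions."""
    if total <= 0 or not 0 <= positive < total:
        # With every partition positive the estimate is undefined (saturation).
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log(1 - positive / total)


def relative_copy_number(target_positive, reference_positive, total):
    """Ratio of target to reference concentration, assuming equal partition counts."""
    target = copies_per_partition(target_positive, total)
    reference = copies_per_partition(reference_positive, total)
    return target / reference


# Hypothetical run: 10,000 partitions per assay, transgene vs. a reference gene.
ratio = relative_copy_number(target_positive=7500,
                             reference_positive=5000,
                             total=10000)
print(f"transgene copies relative to reference: {ratio:.2f}")
```

How the ratio maps to an absolute transgene copy number depends on the copy number and zygosity of the chosen reference gene, which this sketch does not model.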
... WGS allows for the discovery and genotyping of both nucleotide and structural variants [4]. Low-pass (skim) sequencing is typically used to sequence the entire genome, albeit incompletely, of large sets of individuals with an ultra-low coverage (1×) [5]. SNP arrays are high-density oligo arrays containing up to several million probes of short length, which allow for the genotyping of hundreds of thousands of "selected" SNPs across the genome, in a single reaction, on a large set of individuals [6]. ...
Chapter
The quality, statistical power, and resolution of genome-wide association studies (GWAS) are largely dependent on the comprehensiveness of genotypic data. Over the last few years, despite the constant decrease in the price of sequencing, whole-genome sequencing (WGS) of association panels comprising a large number of samples remains cost-prohibitive. Therefore, most GWAS populations are still genotyped using low-coverage genotyping methods, resulting in incomplete datasets. Imputation of untyped variants is a powerful method to maximize the number of SNPs identified in study samples; it increases the power and resolution of GWAS and allows genotyping datasets obtained from various sources to be integrated. Here, we describe the key concepts underlying imputation of untyped variants, including the architecture of reference panels, and review some of the associated challenges and how these can be addressed. We also discuss the need for, and available methods to, rigorously assess the accuracy of imputed data prior to their use in any genetic study.
... This is because data on recombinant populations in addition to high-quality reference genome allows screening for high-confidence SNPs (X. Huang et al. 2009; Golicz, Bayer, and Edwards 2015). Nonetheless, WGRs still remain costly for sequencing populations possessing large genomes. ...
... Although WGRS and GBS have been widely used for trait mapping and QTL detection in chickpea, both approaches have their own limitations; GBS generates numerous missing data points across the genome, and WGRS is costly for sequencing large mapping populations. These limitations can be partly offset by sequencing at low coverage, referred to as skim sequencing (Golicz et al. 2015). This approach was used to fine map the 'QTL-hotspot' region, from ~ 3 Mb to ~ 300 kb, enabling identification of key genes related to drought tolerance in chickpea (Kale et al. 2015). ...
Article
Full-text available
Key message: Integration of genomic technologies with breeding efforts have been used in recent years for chickpea improvement. Modern breeding along with low cost genotyping platforms have potential to further accelerate chickpea improvement efforts. The implementation of novel breeding technologies is expected to contribute substantial improvements in crop productivity. While conventional breeding methods have led to development of more than 200 improved chickpea varieties in the past, still there is ample scope to increase productivity. It is predicted that integration of modern genomic resources with conventional breeding efforts will help in the delivery of climate-resilient chickpea varieties in comparatively less time. Recent advances in genomics tools and technologies have facilitated the generation of large-scale sequencing and genotyping data sets in chickpea. Combined analysis of high-resolution phenotypic and genetic data is paving the way for identifying genes and biological pathways associated with breeding-related traits. Genomics technologies have been used to develop diagnostic markers for use in marker-assisted backcrossing programmes, which have yielded several molecular breeding products in chickpea. We anticipate that a sequence-based holistic breeding approach, including the integration of functional omics, parental selection, forward breeding and genome-wide selection, will bring a paradigm shift in development of superior chickpea varieties. There is a need to integrate the knowledge generated by modern genomics technologies with molecular breeding efforts to bridge the genome-to-phenome gap. Here, we review recent advances that have led to new possibilities for developing and screening breeding populations, and provide strategies for enhancing the selection efficiency and accelerating the rate of genetic gain in chickpea.
... Genotyping of 322 tomato genotypes was performed using low-coverage whole- [29]) with a paired-end approach (150 bp × 150 bp) (Illumina HiSeq 4000) at the Texas A&M Genomics and Bioinformatics service (College Station, TX). Raw sequences from the 322 genotypes were filtered to remove low-quality reads and adapter sequences. ...
... Exome capture detects genomic sequence of the genic region (Mascher et al., 2013). Skim sequencing-based genotyping involves resequencing of multiple individuals followed by alignment of the reads to the reference sequence to genotype SNPs (Golicz et al., 2015). AmpliSeq is one of the best methods for the detection of targeted polymorphisms to date (Ogiso-Tanaka et al., 2019), and we estimated that there was an agreement rate of more than 90% in the core polymorphisms detected using RNA-Seq and AmpliSeq. ...
Article
Full-text available
Marker-assisted selection of crop plants requires DNA markers that can distinguish between the closely related strains often used in breeding. The availability of reference genome sequence facilitates the generation of markers, by elucidating the genomic positions of new markers as well as of their neighboring sequences. In 2017, a high quality genome sequence was released for the six-row barley (Hordeum vulgare) cultivar Morex. Here, we developed a de novo RNA-Seq-based genotyping procedure for barley strains used in Japanese breeding programs. Using RNA samples from the seedling shoot, seedling root, and immature flower spike, we mapped next-generation sequencing reads onto the transcribed regions, which correspond to ∼590 Mb of the whole ∼4.8-Gbp reference genome sequence. Using 150 samples from 108 strains, we detected 181,567 SNPs and 45,135 indels located in the 28,939 transcribed regions distributed throughout the Morex genome. We evaluated the quality of this polymorphism detection approach by analyzing 387 RNA-Seq-derived SNPs using amplicon sequencing. More than 85% of the RNA-Seq SNPs were validated using the highly redundant reads from the amplicon sequencing, although half of the indels and multiple-allele loci showed different polymorphisms between the platforms. These results demonstrated that our RNA-Seq-based de novo polymorphism detection system generates genome-wide markers, even in the closely related barley genotypes used in breeding programs.
... However, the majority of legume crops, including the three mentioned legume crops, have genome sequence assemblies available. The limitations of both of these approaches can partly be countered by performing sequencing at lower depth, referred to as skim sequencing (Golicz et al. 2015). The utility of this approach has been demonstrated in fine mapping of the "QTL-hotspot" region for drought tolerance-related traits in chickpea. ...
Article
Full-text available
Efficiency of breeding programs of legume crops such as chickpea, pigeonpea and groundnut has been considerably improved over the past decade through deployment of modern genomic tools and technologies. For instance, next-generation sequencing technologies have facilitated availability of genome sequence assemblies, re-sequencing of several hundred lines, development of HapMaps, high-density genetic maps, a range of marker genotyping platforms and identification of markers associated with a number of agronomic traits in these legume crops. Although marker-assisted backcrossing and marker-assisted selection approaches have been used to develop superior lines in several cases, it is the need of the hour for continuous population improvement after every breeding cycle to accelerate genetic gain in the breeding programs. In this context, we propose a sequence-based breeding approach which includes use of independent or combination of parental selection, enhancing genetic diversity of breeding programs, forward breeding for early generation selection, and genomic selection using sequencing/genotyping technologies. Also, adoption of speed breeding technology by generating 4–6 generations per year will be contributing to accelerate genetic gain. While we see a huge potential of the sequence-based breeding to revolutionize crop improvement programs in these legumes, we anticipate several challenges especially associated with high-quality and precise phenotyping at affordable costs, data analysis and management related to improving breeding operation efficiency. Finally, integration of improved seed systems and better agronomic packages with the development of improved varieties by using sequence-based breeding will ensure higher genetic gains in farmers’ fields.
... The use of 'SNP-CHIPs', arrays of thousands of cultivar SNPs (single nucleotide polymorphisms), has rapidly increased the amount and accessibility of genotype data for specific cultivars of interest (for review, see Uauy, 2017). As sequencing methods and imputation methods improve (Wang et al., 2018a), low coverage (skim) sequencing will soon be a viable option in polyploid wheat (Golicz et al., 2015). Alongside the large-scale SNP datasets available at CerealsDB (Wilkinson et al., 2016) and Ensembl Plants (Bolser et al., 2017), advances in polymerase chain reaction (PCR) genotyping markers have facilitated moves into marker-assisted selection in breeding programs. ...
Article
Full-text available
Improving traits in wheat has historically been challenging due to its large and polyploid genome, limited genetic diversity, and in‐field phenotyping constraints. However, within recent years many of these barriers have been lowered. The availability of a chromosome‐level assembly of the wheat genome now facilitates a step‐change in wheat genetics and provides a common platform for resources including variation data, gene expression data, and genetic markers. The development of sequenced mutant populations and gene‐editing techniques now enable the rapid assessment of gene function in wheat directly. The ability to alter gene function in a targeted manner will unmask the effects of homoeolog redundancy and allow the hidden potential of this polyploid genome to be discovered. New techniques to identify and exploit the genetic diversity within wheat wild relatives now enable wheat breeders to take advantage of these additional sources of variation to address challenges facing food production. Finally, advances in phenomics have unlocked rapid screening of populations for many traits of interest both in greenhouses and in the field. Looking forwards, integrating diverse data types, including genomic, epigenetic and phenomics data will take advantage of big data approaches including machine learning to understand trait biology in wheat in unprecedented detail. This article is protected by copyright. All rights reserved.
... The RIL mapping population was genotyped using a skim sequencing approach (Golicz et al., 2015). The parental line, Williams 82 is the soybean reference genome (Schmutz et al., 2010) and to confirm the inferred genotypes, the parental line PI 483460B was sequenced at 15 X genome coverage. ...
Article
Full-text available
The cultivated [Glycine max (L) Merr.] and wild [Glycine soja Siebold & Zucc.] soybean species comprise wide variation in seed composition traits. Compared to wild soybean, cultivated soybean contains low protein, high oil and high sucrose. In this study, an inter‐specific population was derived from a cross between G. max (Williams 82) and G. soja (PI 483460B). This recombinant inbred line (RIL) population of 188 lines was sequenced at 0.3x depth. Based on 91,342 single nucleotide polymorphisms (SNPs), recombination events in RILs were defined, and a high‐resolution bin map was developed (4,070 bins). In addition to bin mapping, QTL analysis for protein, oil and sucrose was performed using 3,343 polymorphic SNPs (3K‐SNP), derived from Illumina Infinium BeadChip sequencing platform. The QTL regions from both platforms were compared and a significant concordance was observed between bin and 3K‐SNP markers. Importantly, the bin map derived from next generation sequencing technology enhanced mapping resolution (from 1325 Kb to 50 Kb). A total of 5, 9 and 4 QTLs were identified for protein, oil and sucrose content, respectively and some of the QTLs coincided with soybean domestication related genomic loci. The major QTL for protein and oil was mapped on Chr. 20 (qPro_20) and suggested negative correlation between oil and protein. In terms of sucrose content, a novel and major QTL was identified on Chr. 8 (qSuc_08) and harbors putative genes involved in sugar transport. In addition, genome‐wide association (GWAS) using 91,342 SNPs confirmed the genomic loci derived from QTL mapping. A QTL based haplotype using whole genome resequencing of 106 diverse soybean lines identified unique allelic variation in wild soybean that could be utilized to widen the genetic base in cultivated soybean. This article is protected by copyright. All rights reserved.
... Likewise, thiamine thiazole synthase, involved in stress-related mechanisms (Rapala-Kozik et al. 2012), and some trait specific genes like E3 ubiquitin-protein ligase and TIME FOR COFFEE (TIC) were also identified in the "QTL-hotspot" region. Kale et al. (2015) performed fine mapping of the same "QTL-hotspot" region using a skim sequencing approach (Golicz et al. 2015) to construct a high-density bin map. They identified 53,223 SNPs segregating in the 222 RILs from the cross ICC 4958 × ICC 1882, providing a total of 1610 bins. ...
Chapter
Breeding for quantitative traits has become a reality in chickpea owing to the extensive development of DNA molecular markers, which has given rise to detailed genetic maps and the fruitful implementation of QTL analysis. The availability of whole-genome sequences for kabuli and desi chickpea types has also been crucial, as it greatly assists in surveying or re-localizing QTLs. In this chapter, the plant material most frequently employed for QTL analysis is described, and the importance of accurate phenotypic evaluation is stated. Strategies previously used for rapid and efficient screening for abiotic and biotic stresses, as well as adaptive traits, are discussed. A summary is also provided of QTLs associated with fungal diseases, drought and salt stresses, flowering, growth habit, and yield- and quality-related components. Candidate genes suggested by different authors, some of them already used in marker-assisted selection, mark the beginning of new possibilities for chickpea breeders, who will be able to choose the best allelic combination for agronomic traits.
... In a WGR approach known as skim-based genotyping by sequencing (SkimGBS), SNPs and genotypes are called using low-coverage genomic reads, typically <1×, to make genotyping large populations viable (Bayer et al., 2015). This low coverage is common to WGR approaches and is sufficient for genomic analyses in recombinant populations with high quality parental genome sequences (Golicz et al., 2015). To simplify data analysis, ...
Article
Full-text available
In the last decade, the revolution in sequencing technologies has deeply impacted crop genotyping practise. New methods allowing rapid, high-throughput genotyping of entire crop populations have proliferated and opened the door to wider use of molecular tools in plant breeding. These new genotyping by sequencing (GBS) methods include over a dozen reduced representation sequencing (RRS) approaches and at least four whole genome resequencing (WGR) approaches. The diversity of methods available, each often producing different types of data at different cost, can make selection of the best-suited method seem a daunting task. We review the most common genotyping methods used today and compare their suitability for linkage mapping, genome wide association studies (GWAS), marker-assisted and genomic selection and genome assembly and improvement in crops with various genome sizes and complexity. Furthermore, we give an outline of bioinformatics tools for analysis of genotyping data. WGR is well suited to genotyping biparental cross populations with complex, small to moderate sized genomes, and provides the lowest cost per marker data point. RRS approaches differ in their suitability for various tasks, but demonstrate similar costs per marker data point. These approaches are generally better suited for de novo applications and more cost-effective when genotyping populations with large genomes or high heterozygosity. We expect that although RRS approaches will remain the most cost-effective for some time, WGR will become more widespread for crop genotyping as sequencing costs continue to decrease. This article is protected by copyright. All rights reserved.
... After read correction, the accuracy can be increased up to ~ 99.99% [19]. [52,53], which is adequate for accurate SNP calling in recombinant populations with a high quality reference genome [54]. However, sequencing populations with large genomes such as wheat remains costly. ...
Article
Full-text available
Climate change is a major threat to food security in a world of rising crop demand. Although increases in crop production have previously been achieved through the use of fertilisers and chemicals for better control of weeds and pests, these methods rely on finite resources and are often unsustainable. Recent advances in genomics are laying the foundations for sustainable intensification of agriculture and heightened resilience of crops to climate change. The number of available high-quality reference genomes has been constantly growing due to the widespread application of genome sequencing technology. Advances in population-level genotyping have further contributed to a more comprehensive understanding of genomic variation. These increasing volumes of genomic data facilitate the move towards plant pangenomics, providing deeper insights into the diversity available for crop improvement and breeding of new cultivars. Genomics-assisted breeding is benefiting from these advances, allowing rapid identification of genes implicated in climate related agronomic traits, for breeding of crops adapted to a changing climate.
... On the other hand, WGRS offers better data quality; however, it is also a costly process. To save the cost of sequencing, WGRS can be done at lower depth, and in that scenario the approach is referred to as skim sequencing (Golicz et al., 2015). Using this approach, Bayer et al. (2015) characterized the distribution of crossover and non-crossover recombination in rape mustard (Brassica napus) and chickpea using SkimGBS. ...
Article
Full-text available
Legumes play a vital role in ensuring global nutritional food security and improving soil quality through nitrogen fixation. Accelerated genetic gains are required to meet the demand of an ever-increasing global population. In recent years, speedy developments have been witnessed in legume genomics due to advancements in next-generation sequencing (NGS) and high-throughput genotyping technologies. Reference genome sequences for many legume crops have been reported in the last 5 years. The availability of the draft genome sequences and re-sequencing of elite genotypes for several important legume crops have made it possible to identify structural variations at large scale. Availability of large-scale genomic resources and low-cost and high-throughput genotyping technologies are enhancing the efficiency and resolution of genetic mapping and marker-trait association studies. Most importantly, deployment of molecular breeding approaches has resulted in development of improved lines in some legume crops such as chickpea and groundnut. In order to support genomics-driven crop improvement at a fast pace, the deployment of breeder-friendly genomics and decision support tools appears to be critical in breeding programs in developing countries. This review provides an overview of emerging genomics and informatics tools/approaches that will be the key driving force for accelerating genomics-assisted breeding and ultimately ensuring nutritional and food security in developing countries.
... This approach works best with biparental populations, where the two parent genotypes can be sequenced with higher read coverage and missing haplotype data imputed in the population based on this parent data. Skim genotyping-by-sequencing in B. napus was used to identify the frequency of gene conversion events and homologous recombination in a mapping population (Bayer et al., 2015), with a range of future applications including marker discovery as well as conventional linkage mapping, QTL identification, genome-wide association studies and genomic selection (Golicz et al., 2014). In particular, genotyping-by-sequencing approaches provide an opportunity for detection and primary characterization of de novo variation derived from interspecific hybrids. ...
Article
Oilseed rape (Brassica napus) is one of our youngest crop species, arising several times under cultivation in the last few thousand years and completely unknown in the wild. Oilseed rape originated from hybridization events between progenitor diploid species B. rapa and B. oleracea, both important vegetable species. The diploid progenitors are also ancient polyploids, with remnants of two previous polyploidization events evident in the triplicated genome structure. This history of polyploid evolution and human agricultural selection makes B. napus an excellent model with which to investigate processes of genomic evolution and selection in polyploid crops. The ease of de novo interspecific hybridization, responsiveness to tissue culture and the close relationship of oilseed rape to the model plant Arabidopsis thaliana, coupled with the recent availability of reference genome sequences and suites of molecular cytogenetic and high-throughput genotyping tools, allow detailed dissection of genetic, genomic and phenotypic interactions in this crop. In this review we discuss the past and present uses of B. napus as a model for polyploid speciation and evolution in crop species, along with current and developing analysis tools and resources. We further outline unanswered questions that may now be tractable to investigation. This article is protected by copyright. All rights reserved.
... The decreasing cost of NGS data generation and the increasing availability of reference genomes are, for the first time, making it cost effective to generate whole genome sequence data for genotyping-by-sequencing applications. For example, skim genotyping-by-sequencing uses low coverage whole genome sequencing for high resolution genotyping (Golicz et al. 2015b). This can be applied to genotype doubled haploid canola populations or a range of diverse lines for association mapping studies. ...
Article
Full-text available
Timing of life history events (phenology) is a key driver for the adaptation of grain crops to their environments. Anthesis (flowering) date is the critical phenological stage that has been most extensively studied. Maximum crop yield is achieved by maximising the duration of the pre-anthesis biomass accumulation phase and hence yield potential, while minimising the risk of water stress and temperature stress (heat and cold) during flowering and grain-filling stages. In this article, we review our understanding of phenology of the valuable oilseed crop canola (oilseed rape, Brassica napus L.) from the perspectives of biophysical modelling and genetics. In conjunction, we review the genomic resources for canola and how they could be used to develop models that can accurately predict flowering date in any given set of environmental conditions. Finally, we discuss how molecular marker tools can help canola breeders to continue to improve canola productivity in the light of climate changes and to broaden its adaptation into new agricultural areas.
... An alternate approach to overcome missing data issues is whole genome re-sequencing (WGRS) wherein samples can be sequenced at greater depth thereby reducing the missing data issue. To save the cost of sequencing, WGRS can be done at a lower depth and in that scenario the approach is referred to as skim sequencing [17]. This approach is very useful to identify sequence variants/SNPs in species where the reference genome sequence is available. ...
Article
Full-text available
A combination of two approaches, namely QTL analysis and gene enrichment analysis, was used to identify candidate genes in the "QTL-hotspot" region for drought tolerance present on the Ca4 pseudomolecule in chickpea. In the first approach, a high-density bin map was developed using 53,223 single nucleotide polymorphisms (SNPs) identified in the recombinant inbred line (RIL) population of the ICC 4958 (drought tolerant) and ICC 1882 (drought sensitive) cross. QTL analysis using recombination bins as markers along with the phenotyping data for 17 drought tolerance related traits obtained over 1-5 seasons and 1-5 locations split the "QTL-hotspot" region into two subregions, namely "QTL-hotspot-a" (15 genes) and "QTL-hotspot-b" (11 genes). In the second approach, gene enrichment analysis using significant marker trait associations based on SNPs from the Ca4 pseudomolecule with the above mentioned phenotyping data, and the candidate genes from the refined "QTL-hotspot" region, showed enrichment for 23 genes. Twelve genes were common to both approaches. Functional validation using quantitative real-time PCR (qRT-PCR) indicated four promising candidate genes having functional implications on the effect of "QTL-hotspot" for drought tolerance in chickpea.
... An alternate approach to overcome missing data issues is whole genome re-sequencing (WGRS) wherein samples can be sequenced at greater depth thereby reducing the missing data issue. To save the cost of sequencing, WGRS can be done at a lower depth and in that scenario the approach is referred to as skim sequencing [17]. This approach is very useful to identify sequence variants/SNPs in species where the reference genome sequence is available. ...
Conference Paper
Full-text available
With the objective of identifying candidate genes for drought tolerance related traits in chickpea, 232 recombinant inbred lines were sequenced using an advanced whole genome re-sequencing based skim sequencing approach. A total of 497.6 Gb of Illumina paired read sequence data were generated for 232 samples, which resulted in identification of 62,370 SNPs distributed on 8 chromosomes. A total of 1,610 bins were identified using a bin mapping approach. More than 60% of bins were ≤ 1 Mb in size, while ~70% of bins had ≤ 10 genes. The recombination bins were subsequently used as molecular markers for linkage mapping and QTL analysis. The QTL analysis identified 134 QTLs for 17 traits and two drought tolerance indices, of which 71 were major QTLs (PVE>10%) and 63 were minor QTLs (PVE<10%). The QTLs from the "QTL-hotspot" region reported earlier on CaLG04 were analyzed in detail and 76 candidate genes were identified. Further, a priori candidate gene analysis was carried out using candidate genes along with SNPs from CaLG04, wherein 23 genes were found enriched for six different traits, viz. 100 seed weight, biomass, drought tolerance indices, plant height, number of pods per plant and shoot dry weight. Quantitative gene expression studies showed differential expression of some of these genes in drought tolerant and susceptible lines. Further characterization of these genes will help to understand their role in the drought tolerance mechanism in chickpea.
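The idea of using recombination bins as markers can be illustrated with a minimal sketch. It assumes error-free, fully called genotypes and simply merges adjacent SNPs that share the same genotype pattern across all lines; it is not the procedure used in the study above, and real skim-sequencing pipelines typically add sliding-window smoothing and imputation to cope with missing and erroneous calls. The function name `build_bins` and the toy data are hypothetical.

```python
from itertools import groupby


def build_bins(snps):
    """Collapse adjacent SNPs with identical genotype patterns into bins.

    snps: list of (position, genotypes) tuples for one chromosome, sorted by
    position, where genotypes is a tuple with one parental allele code
    ("A" or "B") per line. Assumes no missing or erroneous calls.
    """
    bins = []
    # groupby only merges consecutive items, so a pattern that reappears
    # further along the chromosome starts a new bin.
    for pattern, group in groupby(snps, key=lambda snp: snp[1]):
        members = list(group)
        bins.append({
            "start": members[0][0],
            "end": members[-1][0],
            "n_snps": len(members),
            "genotypes": pattern,
        })
    return bins


# Toy example: five SNPs scored in three lines (hypothetical data).
example_snps = [
    (100, ("A", "A", "B")),
    (250, ("A", "A", "B")),
    (900, ("A", "B", "B")),   # a recombination in line 2 starts a new bin
    (1200, ("A", "B", "B")),
    (1500, ("B", "B", "B")),  # a recombination in line 1 starts another
]

for b in build_bins(example_snps):
    print(b["start"], b["end"], b["n_snps"], "".join(b["genotypes"]))
```

Each resulting bin can then be treated as a single marker in linkage mapping, which is why thousands of SNPs reduce to a much smaller number of bins.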
Book
Full-text available
Biotechnology is one of the emerging fields that can add new and better applications in a wide range of sectors such as health care, the service sector, agriculture, and the processing industry, to name a few. This book will provide an excellent opportunity to focus on recent developments in the frontier areas of Biotechnology and establish new collaborations in these areas.
Article
Full-text available
The impact of climate change on spring phenology poses risks to migratory birds, as migration timing is controlled predominantly by endogenous mechanisms. Despite recent advances in our understanding of the underlying genetic basis of migration timing, the ways that migration timing phenotypes in wild individuals may map to specific genomic regions requires further investigation. We examined the genetic architecture of migration timing in a long-distance migratory songbird (purple martin, Progne subis subis) by integrating genomic data with an extensive dataset of direct migratory tracks. A moderate to large amount of variance in spring migration arrival timing was explained by genomics (proportion of phenotypic variation explained by genomics = 0.74; polygenic score R² = 0.24). On chromosome 1, a region that was differentiated between migration timing phenotypes contained genes that could facilitate nocturnal flights and act as epigenetic modifiers. Overall, these results advance our understanding of the genomic underpinnings of migration timing.
Preprint
Full-text available
The impact of climate change on spring phenology poses risks to migratory birds, as migration timing is controlled predominantly by endogenous mechanisms. Despite numerous studies on internal cues controlling migration, the underlying genetic basis of migration timing remains largely unknown. We investigated the genetic architecture of migration timing in a long-distance migratory songbird (purple martin, Progne subis subis) by integrating genomic data with an extensive dataset of direct migratory tracks. Our findings show migration has a predictable genetic basis in martins and maps to a region on chromosome 1. This region contains genes that could facilitate nocturnal flights and act as epigenetic modifiers. Additionally, we found that genomic variance explained a higher proportion of historic than recent environmental spring phenology data, which may suggest a reduction in the adaptive potential of migratory behavior in contemporary populations. Overall, these results advance our understanding of the genomic underpinnings of migration timing and could provide context for conservation action.
Preprint
Full-text available
Background: Genomic prediction describes the use of SNP genotypes to predict complex phenotypes and has been widely applied in humans and agricultural species. Genotyping-by-sequencing, a method which uses low-coverage sequence data paired with genotype imputation, is becoming increasingly popular for SNP genotyping. The development of Oxford Nanopore Technologies’ (ONT) MinION sequencer has now made genotyping-by-sequencing portable and rapid. Here we evaluate the speed and accuracy of genomic predictions using low-coverage ONT sequence data in a population of cattle using four imputation approaches. We also investigate the effect of SNP reference panel size on their performance. Results: SNP array genotypes and ONT sequence data for 64 beef heifers were used to calculate genomic estimated breeding values (GEBVs) from 641k SNP for four traits. Accuracy of the GEBVs was much higher when flanking SNP from sequence data was used to help impute the 641k panel used for genomic predictions. Using the imputation package QUILT, correlations between ONT and low-density SNP array genomic breeding values were greater than 0.91 and up to 0.97 for sequencing coverages as low as 0.1x using a panel of 48 million SNP that flanked the 641k in the prediction equation. Imputation time was significantly reduced by decreasing the number of flanking sequence SNP used in imputation for all methods. Genomic breeding values calculated using QUILT also had higher correlations to high density SNP arrays than genomic breeding values from imputed low-density arrays for coverages as low as 0.5x. Conclusions: Here we demonstrated accurate genomic prediction is possible with ONT sequence data from sequencing coverages as low as 0.1x, and imputation time can be as short as 10 minutes per sample.
Preprint
Full-text available
Quinoa is a pseudocereal originating from the Andean regions. In spite of quinoa’s long cultivation history, genetic analysis of this crop is still in its infancy. We aimed to localize QTL contributing to the phenotypic variation of agronomically important traits. We crossed the Chilean accession PI-614889 and the Peruvian accession CHEN-109, which depicted significant differences in days to flowering, days to maturity, plant height, panicle length, thousand kernel weight (TKW), saponin content, and mildew susceptibility. We observed sizeable phenotypic variation across F2 plants and F3 families grown in the greenhouse and in the field, respectively. We used Skim-seq to genotype the F2 population and constructed a high-density genetic map with 133,923 SNPs. Fifteen QTL were found for ten traits. Two significant QTL, common in F2 and F3 generations, depicted pleiotropy for days to flowering, plant height, and TKW. The pleiotropic QTL harbored several putative candidate genes involved in photoperiod response and flowering time regulation. This study presents the first high-density genetic map of quinoa that incorporates QTL for several important agronomical traits. The pleiotropic loci can facilitate marker assisted selection in quinoa breeding programs. Key message: Skim-sequencing enabled the construction of a high-density genetic map (133,923 SNPs), and fifteen QTL were detected for ten agronomically important traits.
Chapter
Marker-assisted selection (MAS) is a selection method for improving agriculturally essential traits in wheat. MAS allows efficient screening of difficult-to-phenotype traits, introgression of genomic complements from donor germplasm into elite breeding lines, and gene/trait pyramiding for quantitative traits. The discovery of a large number of QTL-specific molecular markers for qualitative and quantitative traits has accelerated MAS in recent years. Markers for loci related to disease resistance and agronomic and quality traits are available in wheat. The convenience of detecting and following the inheritance pattern of major genes/QTL at a low cost has improved genotyping and germplasm selection in the breeding programs. The development of high-throughput sequencing, genetic loci probing technologies, precision phenotyping systems, crop molecular physiology, and computational tools further extends the possible uses of MAS in wheat. The pan-genomic resources and the annotated genome sequence increase the effectiveness of MAS in hexaploid wheat by allowing for primer design, in silico validation of primers for specific binding, QTL walking, and genome-anchored fine mapping. This chapter sheds light on the use of MAS in shortening the breeding cycle, improving biotic and abiotic stress resistance, and sustaining the yield potential of wheat.
Article
Full-text available
Genotype imputation is a process that estimates missing genotypes in terms of the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs), boost the power to identify genetic association and promote the combination of genetic studies. However, there has been a lack of high-quality reference panels for most plants, which greatly hinders the application of genotype imputation. Here, we developed Plant-ImputeDB (http://gong_lab.hzau.edu.cn/Plant_imputeDB/), a comprehensive database with reference panels of 12 plant species for online genotype imputation, SNP and block search and free download. By integrating genotype data and whole-genome resequencing data of plants from various studies and databases, the current Plant-ImputeDB provides high-quality reference panels of 12 plant species, including ∼69.9 million SNPs from 34 244 samples. It also provides an easy-to-use online tool with the option of two popular tools specifically designed for genotype imputation. In addition, Plant-ImputeDB accepts submissions of different types of genomic variations, and provides free and open access to all publicly available data in support of related research worldwide. In general, Plant-ImputeDB may serve as an important resource for plant genotype imputation and greatly facilitate plant genetic research.
Chapter
Legume crops play a key role for producing proteins for human and animal nutrition. Sustainable increase of plant protein production is essential to satisfy the rising demand of a growing world population. Breeding varieties with high and stable yields and with optimized nutritional value, which at the same time require less input in terms of energy and labor is one of the pathways for sustainable rise of mungbean productivity. Mungbean is mainly used in rotation with cereals. Therefore, producing an economically viable harvest in the short time window between two main crops, often under stressful conditions of a hot and dry season, is an important breeding aim for this crop. Breeding improved varieties requires access to the genetic diversity of the crop and crop wild relatives to source new traits. As natural plant populations are endangered by loss of habitats and climate change, ex situ collections have gained increased importance to conserve biodiversity for crop improvement. Effective screening methods for desired agronomical traits, including biotic and abiotic stress tolerances and pre-breeding technologies to introgress new traits from non-adapted materials into elite lines are facilitating breeding efforts. Often new traits have to be sourced from wild relatives. Crossing barriers between different Vigna species and the need of technologies to restore fertility add additional complexity when traits have to be sourced from wild species. Genomics methods such as quantitative trait mapping or pangenomics studies elucidate the genetic basis of traits of interest, and marker assisted or genomic selection are guiding breeding efforts. Well-coordinated phenotyping efforts to collect and analyze crop performance data across multiple locations are essential for effective breeding of a more productive, nutritious and resilient mungbean crop.
Article
Full-text available
As sequencing and genotyping technologies evolve, crop genetics researchers accumulate increasing numbers of genomic data sets from various genotyping platforms on different germplasm panels. Imputation is an effective approach to increase marker density of existing data sets toward the goal of integrating resources for downstream applications. While a number of imputation software packages are available, the limitations to utilization for the rice community include high computational demand and lack of a reference panel. To address these challenges, we develop the Rice Imputation Server, a publicly available web application leveraging genetic information from a globally diverse rice reference panel assembled here. This resource allows researchers to benefit from increased marker density without needing to perform imputation on their own machines. We demonstrate improvements that imputed data provide to rice genome-wide association (GWA) results of grain amylose content and show that the major functional nucleotide polymorphism is tagged only in the imputed data set.
Article
Full-text available
DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400-800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.
Article
Full-text available
Single nucleotide polymorphisms (SNPs) are becoming the dominant form of molecular marker for genetic and genomic analysis. The advances in second generation DNA sequencing provide opportunities to identify very large numbers of SNPs in a range of species. However, SNP identification remains a challenge for large and polyploid genomes due to their size and complexity. We have developed a pipeline for the robust identification of SNPs in large and complex genomes using Illumina second generation DNA sequence data and demonstrated this by the discovery of SNPs in the hexaploid wheat genome. We have developed a SNP discovery pipeline called SGSautoSNP (Second-Generation Sequencing AutoSNP) and applied this to discover more than 800,000 SNPs between four hexaploid wheat cultivars across chromosomes 7A, 7B and 7D. All SNPs are presented for download and viewing within a public GBrowse database. Validation suggests that more than 93% of SNPs represent polymorphisms between wheat cultivars and hence are valuable for detailed diversity analysis, marker assisted selection and genotyping by sequencing. The pipeline produces output in GFF3, VCF, Flapjack or Illumina Infinium design format for further genotyping of diverse populations. As well as providing an unprecedented resource for wheat diversity analysis, the method establishes a foundation for high resolution SNP discovery in other large and complex genomes.
Article
Full-text available
Bowtie 1 is a fast and memory-efficient program for aligning short reads to mammalian genomes. Burrows-Wheeler indexing allows Bowtie to align more than 25 million 35-bp reads per CPU hour to the human genome in a memory footprint of as little as 1.1 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a quality-aware search algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve greater alignment speed. Bowtie is free, open source software available for download from http://bowtie.cbcb.umd.edu. The Burrows-Wheeler Transformation of a text T, BWT(T), is constructed as follows. The Burrows-Wheeler Matrix of T is the matrix whose rows are all distinct cyclic rotations of T$ sorted lexicographically ($ is "less than" all other characters). BWT(T) is the sequence of characters in the last column of this matrix.
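The matrix-based definition of the Burrows-Wheeler transform can be made concrete with a short sketch. It builds every cyclic rotation of the text with an appended end marker `$`, sorts them, and reads off the last column. This is the textbook quadratic construction for illustration only; it is not how Bowtie builds its index, and the function names `bwt` and `inverse_bwt` are chosen here for the example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via the sorted rotation matrix.

    Python compares strings by code point, and "$" sorts before letters
    and digits, so the sentinel is "less than" all other characters for
    alphanumeric input such as DNA sequences.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Rows of the Burrows-Wheeler matrix: all cyclic rotations, sorted.
    matrix = sorted(s[i:] + s[:i] for i in range(len(s)))
    # BWT(T) is the last column of that matrix.
    return "".join(row[-1] for row in matrix)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Recover the original text by repeatedly prepending and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row that ends with the sentinel is the original text plus "$".
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


print(bwt("acaacg"))                # gc$aaac
print(inverse_bwt(bwt("acaacg")))   # acaacg
```

The transform is reversible, which is what allows an index built on it to stand in for the original genome while remaining compact.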
Article
Full-text available
Background: The throughput of next-generation sequencing machines has increased dramatically over the last few years; yet the cost and time for library preparation have not changed proportionally, thus representing the main bottleneck for sequencing large numbers of samples. Here we present an economical, high-throughput library preparation method for the Illumina platform, comprising a 96-well based method for DNA isolation from yeast cells, a low-cost DNA shearing alternative, and adapter ligation using heat inactivation of enzymes instead of bead cleanups. Results: Up to 384 whole-genome libraries can be prepared from yeast cells in one week using this method, for less than 15 euros per sample. We demonstrate the robustness of this protocol by sequencing over 1000 yeast genomes at ~30x coverage. The sequence information from 768 yeast segregants derived from two divergent S. cerevisiae strains was used to generate a meiotic recombination map at unprecedented resolution. Comparisons to other datasets indicate a high conservation of recombination at a chromosome-wide scale, but differences at the local scale. Additionally, we detected a high degree of aneuploidy (3.6%) by examining the sequencing coverage in these segregants. Differences in allele frequency allowed us to attribute instances of aneuploidy to gains of chromosomes during meiosis or mitosis, both of which showed a strong tendency to missegregate specific chromosomes. Conclusions: Here we present a high throughput workflow to sequence genomes of large numbers of yeast strains at a low price. We have used this workflow to obtain recombination and aneuploidy data from hundreds of segregants, which can serve as a foundation for future studies of linkage, recombination, and chromosomal aberrations in yeast and higher eukaryotes.
Article
Full-text available
Restriction site-associated DNA Sequencing (RAD-Seq) is an economical and efficient method for SNP discovery and genotyping. As with other sequencing-by-synthesis methods, RAD-Seq produces stochastic count data and requires sensitive analysis to develop or genotype markers accurately. We show that there are several sources of bias specific to RAD-Seq that are not explicitly addressed by current genotyping tools, namely restriction fragment bias, restriction site heterozygosity and PCR GC content bias. We explore the performance of existing analysis tools given these biases and discuss approaches to limiting or handling biases in RAD-Seq data. While these biases need to be taken seriously, we believe RAD loci affected by them can be excluded or processed with relative ease in most cases and that most RAD loci will be accurately genotyped by existing tools.
Article
Full-text available
Many important crop species have genomes originating from ancestral or recent polyploidisation events. Multiple homoeologous gene copies, chromosomal rearrangements and amplification of repetitive DNA within large and complex crop genomes can considerably complicate genome analysis and gene discovery by conventional, forward genetics approaches. On the other hand, ongoing technological advances in molecular genetics and genomics today offer unprecedented opportunities to analyse and access even more recalcitrant genomes. In this review, we describe next-generation sequencing and data analysis techniques that vastly improve our ability to dissect and mine genomes for causal genes underlying key traits and allelic variation of interest to breeders. We focus primarily on wheat and oilseed rape, two leading examples of major polyploid crop genomes whose size or complexity present different, significant challenges. In both cases, the latest DNA sequencing technologies, applied using quite different approaches, have enabled considerable progress towards unravelling the respective genomes. Our ability to discover the extent and distribution of genetic diversity in crop gene pools, and its relationship to yield and quality-related traits, is swiftly gathering momentum as DNA sequencing and the bioinformatic tools to deal with growing quantities of genomic data continue to develop. In the coming decade, genomic and transcriptomic sequencing, discovery and high-throughput screening of single nucleotide polymorphisms, presence-absence variations and other structural chromosomal variants in diverse germplasm collections will give detailed insight into the origins, domestication and available trait-relevant variation of polyploid crops, in the process facilitating novel approaches and possibilities for genomics-assisted breeding.
Article
Full-text available
This review summarises the biology, discovery and applications of single nucleotide polymorphisms in complex polyploid crop genomes, with a focus on the important oilseed crop Brassica napus. Brassica napus is an allotetraploid species, and along with soybean and oil palm is one of the top three most important oilseed crops globally. Current efforts are well underway to de novo assemble the B. napus genome, following the release of the related B. rapa 'A' genome last year. The next generation of genome sequencing, SNP discovery and analysis pipelines, and the associated challenges for this work in B. napus, will be addressed. The biological applications of SNP technology for both evolutionary and molecular geneticists as well as plant breeders and industry are far-reaching, and will be invaluable to our understanding and advancement of the Brassica crop species.
Article
Full-text available
• Bread wheat (Triticum aestivum; Poaceae) is a crop plant of great importance. It provides nearly 20% of the world's daily food supply measured by calorie intake, similar to that provided by rice. The yield of wheat has doubled over the last 40 years due to a combination of advanced agronomic practice and improved germplasm through selective breeding. More recently, yield growth has been less dramatic, and a significant improvement in wheat production will be required if demand from the growing human population is to be met.
• Next‐generation sequencing (NGS) technologies are revolutionizing biology and can be applied to address critical issues in plant biology. Technologies can produce draft sequences of genomes with a significant reduction to the cost and timeframe of traditional technologies. In addition, NGS technologies can be used to assess gene structure and expression, and importantly, to identify heritable genome variation underlying important agronomic traits.
• This review provides an overview of the wheat genome and NGS technologies, details some of the problems in applying NGS technology to wheat, and describes how NGS technologies are starting to impact wheat crop improvement.
Article
Full-text available
The globe artichoke (Cynara cardunculus L. var. scolymus) genome is relatively poorly explored, especially compared to those of the other major Asteraceae crops sunflower and lettuce. No SNP markers are in the public domain. We have combined the recently developed restriction-site associated DNA (RAD) approach with the Illumina DNA sequencing platform to effect the rapid and mass discovery of SNP markers for C. cardunculus. RAD tags were sequenced from the genomic DNA of three C. cardunculus mapping population parents, generating 9.7 million reads, corresponding to ~1 Gbp of sequence. An assembly based on paired ends produced ~6.0 Mbp of genomic sequence, separated into ~19,000 contigs (mean length 312 bp), of which ~21% were fragments of putative coding sequence. The shared sequences allowed for the discovery of ~34,000 SNPs and nearly 800 indels, equivalent to a SNP frequency of 5.6 per 1,000 nt, and an indel frequency of 0.2 per 1,000 nt. A sample of heterozygous SNP loci was mapped by CAPS assays and this exercise provided validation of our mining criteria. The repetitive fraction of the genome had a high representation of retrotransposon sequence, followed by simple repeats, AT-low complexity regions and mobile DNA elements. The genomic k-mer distribution and CpG rate of C. cardunculus, compared with data derived from three whole genome-sequenced dicot species, provided further evidence of the random representation of the C. cardunculus genome generated by RAD sampling. The RAD tag sequencing approach is a cost-effective and rapid method to develop SNP markers in a highly heterozygous species. Our approach generated a large and robust SNP dataset through the adoption of optimized filtering criteria.
Article
Full-text available
Genome sequencing has been revolutionized by next-generation technologies, which can rapidly produce vast quantities of data at relatively low cost. With data production now no longer being limited, there is a huge challenge to analyse the data flood and interpret biological meaning. Bioinformatics scientists have risen to the challenge and a large number of software tools and databases have been produced and these continue to evolve with this rapidly advancing field. Here, we outline some of the tools and databases commonly used for the analysis of next-generation sequence data with comment on their utility.
Article
Full-text available
Complex Triticeae genomes pose a challenge to genome sequencing efforts due to their size and repetitive nature. Genome sequencing can reveal details of conservation and rearrangements between related genomes. We have applied Illumina second generation sequencing technology to sequence and assemble the low copy and unique regions of Triticum aestivum chromosome arm 7BS, followed by the construction of a syntenic build based on gene order in Brachypodium. We have delimited the position of a previously reported translocation between 7BS and 4AL with a resolution of one or a few genes and report that approximately 13% of genes from 7BS have been translocated to 4AL. An additional 13 genes are found on 7BS which appear to have originated from 4AL. The gene content of the 7DS and 7BS syntenic builds indicates a total of ~77,000 genes in wheat. Within wheat syntenic regions, 7BS and 7DS share 740 genes and a common gene conservation rate of ~39% of the genes from the corresponding regions in Brachypodium, as well as a common rate of colinearity with Brachypodium of ~60%. Comparison of wheat homoeologues revealed that ~84% of genes previously identified in 7DS have a homoeologue on 7BS or 4AL. The conservation rates we have identified among wheat homoeologues and with Brachypodium provide a benchmark of homoeologous gene conservation levels for future comparative genomic analysis. The syntenic build of 7BS is publicly available at http://www.wheatgenome.info.
Article
Full-text available
We report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage. We modeled 41,174 protein coding genes in the B. rapa genome, which has undergone genome triplication. We used Arabidopsis thaliana as an outgroup for investigating the consequences of genome triplication, such as structural and functional evolution. The extent of gene loss (fractionation) among triplicated genome segments varies, with one of the three copies consistently retaining a disproportionately large fraction of the genes expected to have been present in its ancestor. Variation in the number of members of gene families present in the genome may contribute to the remarkable morphological plasticity of Brassica species. The B. rapa genome sequence provides an important resource for studying the evolution of polyploid genomes and underpins the genetic improvement of Brassica oil and vegetable crops.
Article
Full-text available
Advances in next generation technologies have driven the costs of DNA sequencing down to the point that genotyping-by-sequencing (GBS) is now feasible for high diversity, large genome species. Here, we report a procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs). This approach is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches. By using methylation-sensitive REs, repetitive regions of genomes can be avoided and lower copy regions targeted with two to three fold higher efficiency. This tremendously simplifies computationally challenging alignment problems in species with high levels of genetic diversity. The GBS procedure is demonstrated with maize (IBM) and barley (Oregon Wolfe Barley) recombinant inbred populations where roughly 200,000 and 25,000 sequence tags were mapped, respectively. An advantage in species like barley that lack a complete genome sequence is that a reference map need only be developed around the restriction sites, and this can be done in the process of sample genotyping. In such cases, the consensus of the read clusters across the sequence tagged sites becomes the reference. Alternatively, for kinship analyses in the absence of a reference genome, the sequence tags can simply be treated as dominant markers. Future application of GBS to breeding, conservation, and global species and population surveys may allow plant breeders to conduct genomic selection on a novel germplasm or species without first having to develop any prior molecular tools, or conservation biologists to determine population structure without prior knowledge of the genome or diversity in the species.
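How restriction enzymes reduce genome complexity can be sketched with a small in-silico digest. The example below assumes the enzyme ApeKI (recognition sequence GCWGC, where W is A or T, cutting after the first G); the abstract above does not name an enzyme, so that choice, the function names and the toy sequence are all assumptions made for illustration. Only the sequence adjacent to each cut site would be read in a GBS library, which is what limits sequencing to a reproducible subset of the genome.

```python
import re

# IUPAC nucleotide codes used in recognition sequences (subset).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def find_sites(sequence: str, recognition: str) -> list:
    """Return 0-based start positions of every recognition site."""
    pattern = "".join(IUPAC[base] for base in recognition.upper())
    # A lookahead reports overlapping sites as well.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]


def digest(sequence: str, recognition: str, cut_offset: int) -> list:
    """Split the sequence at each site; cut_offset is counted from the site start."""
    cuts = [pos + cut_offset for pos in find_sites(sequence, recognition)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical toy sequence with two GCWGC sites.
toy = "AAGCAGCTTTTGCTGCAA"
print(find_sites(toy, "GCWGC"))            # [2, 11]
print(digest(toy, "GCWGC", cut_offset=1))  # ['AAG', 'CAGCTTTTG', 'CTGCAA']
```

Applied to a real reference genome, the distribution of fragment lengths from such a digest gives a rough idea of how many loci a given enzyme would sample.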
Article
Association mapping currently relies on the identification of genetic markers. Several technologies have been adopted for genetic marker analysis, with single nucleotide polymorphisms (SNPs) being the most popular where a reasonable quantity of genome sequence data are available. We describe several tools we have developed for the discovery, annotation, and visualization of molecular markers for association mapping. These include autoSNPdb for SNP discovery from assembled sequence data; TAGdb for the identification of gene specific paired read Illumina GAII data; CMap3D for the comparison of mapped genetic and physical markers; and BAC and Gene Annotator for the online annotation of genes and genomic sequences.
Article
Uncovering the genetic basis of agronomic traits in crop landraces that have adapted to various agro-climatic conditions is important to world food security. Here we have identified ∼ 3.6 million SNPs by sequencing 517 rice landraces and constructed a high-density haplotype map of the rice genome using a novel data-imputation method. We performed genome-wide association studies (GWAS) for 14 agronomic traits in the population of Oryza sativa indica subspecies. The loci identified through GWAS explained ∼ 36% of the phenotypic variance, on average. The peak signals at six loci were tied closely to previously identified genes. This study provides a fundamental resource for rice genetics research and breeding, and demonstrates that an approach integrating second-generation genome sequencing and GWAS can be used as a powerful complementary strategy to classical biparental cross-mapping for dissecting complex traits in rice.
Article
Massive loss of valuable plant species in the past centuries and its adverse impact on environmental and socioeconomic values has triggered the conservation of plant resources. Appropriate identification and characterization of plant materials is essential for the successful conservation of plant resources and to ensure their sustainable use. Molecular tools developed in the past few years provide easy, less laborious means for assigning known and unknown plant taxa. These techniques answer many new evolutionary and taxonomic questions, which were not previously possible with only phenotypic methods. Molecular techniques such as DNA barcoding, random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), microsatellites and single nucleotide polymorphisms (SNP) have recently been used for plant diversity studies. Each technique has its own advantages and limitations. These techniques differ in their resolving power to detect genetic differences, type of data they generate and their applicability to particular taxonomic levels. This review presents a basic description of different molecular techniques that can be utilized for DNA fingerprinting and molecular diversity analysis of plant species.
Article
A-Maize-ing. Maize is one of our oldest and most important crops, having been domesticated approximately 9000 years ago in central Mexico. Schnable et al. (p. 1112; see the cover) present the results of sequencing the B73 inbred maize line. The findings elucidate how maize became diploid after an ancestral doubling of its chromosomes and reveal transposable element movement and activity and recombination. Vielle-Calzada et al. (p. 1078) have sequenced the Palomero Toluqueño (Palomero) landrace, a highland popcorn from Mexico, which, when compared to the B73 line, reveals multiple loci impacted by domestication. Swanson-Wagner et al. (p. 1118) exploit possession of the genome to analyze expression differences occurring between lines. The identification of single nucleotide polymorphisms and copy number variations among lines was used by Gore et al. (p. 1115) to generate a haplotype map of maize. While chromosomal diversity in maize is high, it is likely that recombination is the major force affecting the levels of heterozygosity in maize. The availability of the maize genome will help to guide future agricultural and biofuel applications (see the Perspective by Feuillet and Eversole).
Article
The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: rd@sanger.ac.uk
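As a rough illustration of the format described above, the sketch below parses one SAM alignment line into its 11 mandatory tab-separated fields and checks two FLAG bits. The class and function names (`SamRecord`, `parse_sam_line`, `is_unmapped`, `is_reverse`) are hypothetical helpers written for this example; they are not part of SAMtools, and optional tag fields are simply ignored.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory fields of a SAM alignment line."""
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate/next read
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII-encoded)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10],
    )


def is_unmapped(rec: SamRecord) -> bool:
    """FLAG bit 0x4 marks an unmapped read."""
    return bool(rec.flag & 0x4)


def is_reverse(rec: SamRecord) -> bool:
    """FLAG bit 0x10 marks a read aligned to the reverse strand."""
    return bool(rec.flag & 0x10)
```

For real data, an established SAM/BAM library is likely a better choice than hand-rolled parsing; the sketch is only meant to make the record layout concrete.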
Article
SOAP2 is a significantly improved version of the short oligonucleotide alignment program that both reduces computer memory usage and increases alignment speed at an unprecedented rate. We used a Burrows Wheeler Transformation (BWT) compression index to substitute the seed strategy for indexing the reference sequence in the main memory. We tested it on the whole human genome and found that this new algorithm reduced memory usage from 14.7 to 5.4 GB and improved alignment speed by 20-30 times. SOAP2 is compatible with both single- and paired-end reads. Additionally, this tool now supports multiple text and compressed file formats. A consensus builder has also been developed for consensus assembly and SNP detection from alignment of short reads on a reference genome. Availability: http://soap.genomics.org.cn.
Article
Over the last few years there has been a revolution in DNA sequencing technology that has brought down the cost of DNA sequencing and made the sequencing of an increasing number of genomes both feasible and cost effective. There has also been a dramatic shift in the type of sequence data being generated, with vast numbers of short reads or pairs of short reads replacing the traditional relatively long reads produced by Sanger sequencing. These changes in data quantity and format have led to a rethinking of sequence data management, storage, and visualization, and provide a challenge for bioinformatics. The vast amount of sequence data that will be generated over the next few years will require a change in what data are stored and how users query the information.
Article
The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows-Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is approximately 10-20x faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. http://maq.sourceforge.net.
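The backward search idea mentioned above can be sketched in a few lines. The example below is a toy, exact-match-only illustration: it builds the Burrows-Wheeler Transform by sorting all rotations (fine for short strings, impractical for genomes) and counts pattern occurrences by backward search. It does not reproduce BWA itself, which additionally handles mismatches and gaps; the function names and the `$` sentinel (assumed absent from the text) are choices made for this sketch.

```python
from typing import Dict, List


def bwt(text: str) -> str:
    """Burrows-Wheeler Transform via sorted rotations of text + '$'."""
    s = text + "$"  # '$' is assumed not to occur in text
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(text: str, pattern: str) -> int:
    """Count exact occurrences of pattern in text by backward search."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    last = bwt(text)
    first_col = sorted(last)
    alphabet = sorted(set(last))
    # c_table[c]: number of characters in the text that sort before c
    c_table: Dict[str, int] = {c: first_col.index(c) for c in alphabet}
    # occ[c][i]: number of occurrences of c in last[:i]
    occ: Dict[str, List[int]] = {c: [0] * (len(last) + 1) for c in alphabet}
    for i, ch in enumerate(last):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    lo, hi = 0, len(last)  # half-open range of matching sorted rotations
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

Each pattern character narrows the range of matching rotations, so the search cost depends on the pattern length rather than the reference length once the index is built.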
Article
The next-generation sequencing technology coupled with the growing number of genome sequences opens the opportunity to redesign genotyping strategies for more effective genetic mapping and genome analysis. We have developed a high-throughput method for genotyping recombinant populations utilizing whole-genome resequencing data generated by the Illumina Genome Analyzer. A sliding window approach is designed to collectively examine genome-wide single nucleotide polymorphisms for genotype calling and recombination breakpoint determination. Using this method, we constructed a genetic map for 150 rice recombinant inbred lines with an expected genotype calling accuracy of 99.94% and a resolution of recombination breakpoints within an average of 40 kb. In comparison to the genetic map constructed with 287 PCR-based markers for the rice population, the sequencing-based method was approximately 20x faster in data collection and 35x more precise in recombination breakpoint determination. Using the sequencing-based genetic map, we located a quantitative trait locus of large effect on plant height in a 100-kb region containing the rice "green revolution" gene. Through computer simulation, we demonstrate that the method is robust for different types of mapping populations derived from organisms with variable quality of genome sequences and is feasible for organisms with large genome sizes and low polymorphisms. With continuous advances in sequencing technologies, this genome-based method may replace the conventional marker-based genotyping approach to provide a powerful tool for large-scale gene discovery and for addressing a wide range of biological questions.
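A simplified sketch of the sliding window idea is shown below. It assumes each SNP along a chromosome has already been assigned to parent "A" or parent "B" (any other symbol is treated as missing), calls a genotype per window from the fraction of each parental allele, and reports positions where consecutive window calls change. The window size, the 0.8 threshold, and the function names are illustrative assumptions, not the parameters or code of the published method.

```python
from typing import List, Tuple


def window_calls(snp_alleles: List[str], window: int = 15,
                 min_fraction: float = 0.8) -> List[str]:
    """Call 'A', 'B' or 'H' (mixed) for every window of consecutive SNPs."""
    if window <= 0:
        raise ValueError("window must be positive")
    calls = []
    for start in range(len(snp_alleles) - window + 1):
        chunk = snp_alleles[start:start + window]
        frac_a = chunk.count("A") / window
        frac_b = chunk.count("B") / window
        if frac_a >= min_fraction:
            calls.append("A")
        elif frac_b >= min_fraction:
            calls.append("B")
        else:
            # Neither parent dominates: heterozygous or a transition zone
            calls.append("H")
    return calls


def breakpoints(calls: List[str]) -> List[Tuple[int, str, str]]:
    """Return (window index, previous call, new call) at each change."""
    return [(i, calls[i - 1], calls[i])
            for i in range(1, len(calls)) if calls[i] != calls[i - 1]]


# Ten parent-A SNPs (one miscalled as B) followed by ten parent-B SNPs.
alleles = ["A"] * 10 + ["B"] * 10
alleles[3] = "B"
calls = window_calls(alleles, window=5, min_fraction=0.8)
changes = breakpoints(calls)
```

Examining SNPs collectively in this way lets an isolated miscalled SNP be outvoted by its neighbours, which is the motivation for window-based calling with low-coverage data.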
Article
Sequence data is crucial to our understanding of crop growth and development, as differences in DNA sequence are responsible for almost all of the heritable differences between crop varieties and ecotypes. The sequence of a genome is often referred to as the genetic blueprint, and is the foundation for all additional information from the genome to the phenome. The value of DNA sequence is leading to rapid improvements in sequencing technology, increasing throughput, and reducing costs, and technological advances are accelerating with the introduction of novel approaches that are replacing the traditional Sanger-based methods. As genome sequencing becomes cheaper, it will be applied to a greater number of species with increasingly large and complex genomes. This will increase our understanding of how differences in the sequence relate to phenotypic observations, heritable traits, speciation, and evolution. Our understanding of plants will be greatly enhanced by this flow of sequence information, with direct benefit for crop improvement.
Article
Molecular genetic markers represent one of the most powerful tools for the analysis of plant genomes and the association of heritable traits with underlying genetic variation. Molecular marker technology has developed rapidly over the last decade, with the development of high-throughput genotyping methods. Two forms of sequence-based marker, simple sequence repeats (SSRs), also known as microsatellites, and single nucleotide polymorphisms (SNPs), now predominate in modern plant genetic analysis, alongside anonymous marker systems such as amplified fragment length polymorphisms (AFLPs) and diversity array technology (DArT). The reducing cost of DNA sequencing and the increasing availability of large sequence data sets permit the mining of these data for large numbers of SSRs and SNPs. These may then be used in applications such as genetic linkage analysis and trait mapping, diversity analysis, association studies and marker-assisted selection. Here, we describe automated methods for the discovery of molecular markers and new technologies for high-throughput, low-cost molecular marker genotyping. Genotyping examples include multiplexing of SSRs using Multiplex-Ready marker technology (MRT); DArT genotyping; and SNP genotyping using the Invader assay, the single base extension (SBE), oligonucleotide ligation assay (OLA) SNPlex system, and Illumina GoldenGate and Infinium methods.
Article
Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source (http://bowtie.cbcb.umd.edu).
Article
DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400-800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.
Article
Single nucleotide polymorphism (SNP) discovery and genotyping are essential to genetic mapping. There remains a need for a simple, inexpensive platform that allows high-density SNP discovery and genotyping in large populations. Here we describe the sequencing of restriction-site associated DNA (RAD) tags, which identified more than 13,000 SNPs, and mapped three traits in two model organisms, using less than half the capacity of one Illumina sequencing run. We demonstrated that different marker densities can be attained by choice of restriction enzyme. Furthermore, we developed a barcoding system for sample multiplexing and fine mapped the genetic basis of lateral plate armor loss in threespine stickleback by identifying recombinant breakpoints in F2 individuals. Barcoding also facilitated mapping of a second trait, a reduction of pelvic structure, by in silico re-sorting of individuals. To further demonstrate the ease of the RAD sequencing approach we identified polymorphic markers and mapped an induced mutation in Neurospora crassa. Sequencing of RAD markers is an integrated platform for SNP discovery and genotyping. This approach should be widely applicable to genetic mapping in a variety of organisms.
Article
Whole-genome hybridization studies have suggested that the nuclear genomes of accessions (natural strains) of Arabidopsis thaliana can differ by several percent of their sequence. To examine this variation, and as a first step in the 1001 Genomes Project for this species, we produced 15- to 25-fold coverage in Illumina sequencing-by-synthesis (SBS) reads for the reference accession, Col-0, and two divergent strains, Bur-0 and Tsu-1. We aligned reads to the reference genome sequence to assess data quality metrics and to detect polymorphisms. Alignments revealed 823,325 unique single nucleotide polymorphisms (SNPs) and 79,961 unique 1- to 3-bp indels in the divergent accessions at a specificity of >99%, and over 2000 potential errors in the reference genome sequence. We also identified >3.4 Mb of the Bur-0 and Tsu-1 genomes as being either extremely dissimilar, deleted, or duplicated relative to the reference genome. To obtain sequences for these regions, we incorporated the Velvet assembler into a targeted de novo assembly method. This approach yielded 10,921 high-confidence contigs that were anchored to flanking sequences and harbored indels as large as 641 bp. Our methods are broadly applicable for polymorphism discovery in moderate to large genomes even at highly diverged loci, and we established by subsampling the Illumina SBS coverage depth required to inform a broad range of functional and evolutionary studies. Our pipeline for aligning reads and predicting SNPs and indels, SHORE, is available for download at http://1001genomes.org.
Article
Population genetics studies using microsatellites, and data on their molecular dynamics, are on the increase. But, so far, no consensus has emerged on which mutation model should be used, though this is of paramount importance for analysis of population genetic structure. However, this is not surprising given the variety of microsatellite molecular motifs. Null alleles may be disturbing for population studies, even though their presence can be detected through careful population analyses, while homoplasy seems of little concern, at least over short evolutionary scales. Interspecific studies show that microsatellites are poor markers for phylogenetic inference. However, these studies are fuelling discussions on directional mutation and the role of selection and recombination in their evolution. Nonetheless, it remains true that microsatellites may be considered as good, neutral mendelian markers.
Article
Genetic markers — heritable polymorphisms that can be measured in one or more populations of individuals — lie at the heart of modern genetics and enable the study of important questions in population genetics, ecological genetics and evolution. In 2003, Luikart et al.1 wrote: ...
Article
Despite the international significance of wheat, its large and complex genome hinders genome sequencing efforts. To assess the impact of selection on this genome, we have assembled genomic regions representing genes for chromosomes 7A, 7B and 7D. We demonstrate that the dispersion of wheat to new environments has shaped the modern wheat genome. Most genes are conserved between the three homoeologous chromosomes. We found differential gene loss that supports current theories on the evolution of wheat, with greater loss observed in the A and B genomes compared with the D. Analysis of intervarietal polymorphisms identified fewer polymorphisms in the D genome, supporting the hypothesis of early gene flow between the tetraploid and hexaploid. The enrichment for genes on the D genome that confer environmental adaptation may be associated with dispersion following wheat domestication. Our results demonstrate the value of applying next-generation sequencing technologies to assemble gene-rich regions of complex genomes and investigate polyploid genome evolution. We anticipate the genome-wide application of this reduced-complexity syntenic assembly approach will accelerate crop improvement efforts not only in wheat, but also in other polyploid crops of significance.
Article
The evolutionary importance of meiosis may not solely be associated with allelic shuffling caused by crossing-over but may also have to do with its more immediate effects such as gene conversion. Although estimates of the crossing-over rate are often well resolved, the gene conversion rate is much less clear. In Arabidopsis, for example, next-generation sequencing approaches suggest that the two rates are about the same, which contrasts with indirect measures, these suggesting an excess of gene conversion. Here, we provide analysis of this problem by sequencing 40 F2 Arabidopsis plants and their parents. Small gene conversion tracts, with biased gene conversion content, represent over 90% (probably nearer 99%) of all recombination events. The rate of alteration of protein sequence caused by gene conversion is over 600 times that caused by mutation. Finally, our analysis reveals recombination hot spots and unexpectedly high recombination rates near centromeres. This may be responsible for the previously unexplained pattern of high genetic diversity near Arabidopsis centromeres.
Chapter
The bulk of variation at the nucleotide level is often not visible at the phenotypic level. However, this variation can be exploited using molecular genetic marker systems. Molecular genetic markers represent one of the most powerful tools for genome analysis and permit the association of heritable traits with underlying genomic variation. Molecular marker technology has developed rapidly over the last decade, with the development of high-throughput genotyping methods and the availability of large amounts of sequence data for automated marker discovery. Two forms of sequence-based marker, Simple Sequence Repeats (SSRs), also known as microsatellites, and Single Nucleotide Polymorphisms (SNPs), are the principal markers currently applied in modern genetic analysis. These are supplemented with anonymous marker systems such as Amplified Fragment Length Polymorphisms (AFLPs; Vos et al. 1995) and Diversity Array Technology (DArT; Jaccoud et al. 2001). The reducing cost of DNA sequencing has led to the availability of large sequence data sets that enable the mining of sequence-based markers, such as SSRs and SNPs, which may then be applied to diversity analysis, genetic trait mapping, association studies, and marker-assisted selection.
Article
The advent of next-generation sequencing (NGS) has revolutionized genomic and transcriptomic approaches to biology. These new sequencing tools are also valuable for the discovery, validation and assessment of genetic markers in populations. Here we review and discuss best practices for several NGS methods for genome-wide genetic marker development and genotyping that use restriction enzyme digestion of target genomes to reduce the complexity of the target. These new methods -- which include reduced-representation sequencing using reduced-representation libraries (RRLs) or complexity reduction of polymorphic sequences (CRoPS), restriction-site-associated DNA sequencing (RAD-seq) and low coverage genotyping -- are applicable to both model organisms with high-quality reference genome sequences and, excitingly, to non-model species with no existing genomic data.
Article
Next Generation Sequencing represents a powerful tool for detecting genetic variation associated with human disease. Because of the high cost of this technology, it is critical that we develop efficient study designs that consider the trade-off between the number of subjects (n) and the coverage depth (µ). How we divide our resources between the two can greatly impact study success, particularly in pilot studies. We propose a strategy for selecting the optimal combination of n and µ for studies aimed at detecting rare variants and for studies aimed at detecting associations between rare or uncommon variants and disease. For detecting rare variants, we find the optimal coverage depth to be between 2 and 8 reads when using the likelihood ratio test. For association studies, we find the strategy of sequencing all available subjects to be preferable. In deriving these combinations, we provide a detailed analysis describing the distribution of depth across a genome and the depth needed to identify a minor allele in an individual. The optimal coverage depth depends on the aims of the study, and the chosen depth can have a large impact on study success. Genet. Epidemiol. 2011.  © 2011 Wiley-Liss, Inc.
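To make the depth trade-off above more concrete, the sketch below estimates, for a given mean coverage µ, the probability that a site is covered by at least k reads and the probability that the minor allele at a heterozygous site is observed at least once. It assumes read depth follows a Poisson distribution and that each read samples either allele with probability 1/2; these are simplifying assumptions made for this example and may differ from the depth distribution analysed in the study. The function names are hypothetical.

```python
import math


def prob_depth_at_least(mu: float, k: int) -> float:
    """P(X >= k) for X ~ Poisson(mu), i.e. a site has at least k reads."""
    if mu < 0 or k < 0:
        raise ValueError("mu and k must be non-negative")
    cdf_below_k = sum(math.exp(-mu) * mu ** i / math.factorial(i)
                      for i in range(k))
    return 1.0 - cdf_below_k


def prob_minor_allele_seen(mu: float, min_reads: int = 1) -> float:
    """P(at least min_reads reads carry the minor allele) at a het site.

    With each read drawn from either allele with probability 1/2, the
    number of minor-allele reads is Poisson with mean mu / 2.
    """
    return prob_depth_at_least(mu / 2.0, min_reads)


# Compare a few mean depths in the low-coverage range.
for depth in (1, 2, 4, 8):
    print(depth,
          round(prob_depth_at_least(depth, 1), 3),
          round(prob_minor_allele_seen(depth), 3))
```

Under these assumptions the chance of seeing the minor allele rises quickly over the first few reads of depth and then flattens, which is one way to see why spending a fixed budget on more individuals at modest depth can be attractive.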
Article
The genome of bread wheat (Triticum aestivum) is predicted to be greater than 16 Gbp in size and consist predominantly of repetitive elements, making the sequencing and assembly of this genome a major challenge. We have reduced genome sequence complexity by isolating chromosome arm 7DS and applied second-generation technology and appropriate algorithmic analysis to sequence and assemble low copy and genic regions of this chromosome arm. The assembly represents approximately 40% of the chromosome arm and all known 7DS genes. Comparison of the 7DS assembly with the sequenced genomes of rice (Oryza sativa) and Brachypodium distachyon identified large regions of conservation. The syntenic relationship between wheat, B. distachyon and O. sativa, along with available genetic mapping data, has been used to produce an annotated draft 7DS syntenic build, which is publicly available at http://www.wheatgenome.info. Our results suggest that the sequencing of isolated chromosome arms can provide valuable information of the gene content of wheat and is a step towards whole-genome sequencing and variation discovery in this important crop.
Article
DNA sequencing technology is undergoing a revolution with the commercialization of second generation technologies capable of sequencing thousands of millions of nucleotide bases in each run. The data explosion resulting from this technology is likely to continue to increase with the further development of second generation sequencing and the introduction of third generation single-molecule sequencing methods over the coming years. The question is no longer whether we can sequence crop genomes which are often large and complex, but how soon can we sequence them? Even cereal genomes such as wheat and barley which were once considered intractable are coming under the spotlight of the new sequencing technologies and an array of new projects and approaches are being established. The increasing availability of DNA sequence information enables the discovery of genes and molecular markers associated with diverse agronomic traits creating new opportunities for crop improvement. However, the challenge remains to convert this mass of data into knowledge that can be applied in crop breeding programs.
Article
Genome-wide association studies suggest that common genetic variants explain only a modest fraction of heritable risk for common diseases, raising the question of whether rare variants account for a significant fraction of unexplained heritability. Although DNA sequencing costs have fallen markedly, they remain far from what is necessary for rare and novel variants to be routinely identified at a genome-wide scale in large cohorts. We have therefore sought to develop second-generation methods for targeted sequencing of all protein-coding regions ('exomes'), to reduce costs while enriching for discovery of highly penetrant variants. Here we report on the targeted capture and massively parallel sequencing of the exomes of 12 humans. These include eight HapMap individuals representing three populations, and four unrelated individuals with a rare dominantly inherited disorder, Freeman-Sheldon syndrome (FSS). We demonstrate the sensitive and specific identification of rare and common variants in over 300 megabases of coding sequence. Using FSS as a proof-of-concept, we show that candidate genes for Mendelian disorders can be identified by exome sequencing of a small number of unrelated, affected individuals. This strategy may be extendable to diseases with more complex genetics through larger sample sizes and appropriate weighting of non-synonymous variants by predicted functional impact.
Article
The ongoing revolution in DNA sequencing technology now enables the reading of thousands of millions of nucleotide bases in a single instrument run. However, this data quantity is often compromised by poor confidence in the read quality. The identification of genetic polymorphisms from this data is therefore problematic and, combined with the vast quantity of data, poses a major bioinformatics challenge. However, once these difficulties have been addressed, next-generation sequencing will offer a means to identify and characterize the wealth of genetic polymorphisms underlying the vast phenotypic variation in biological systems. We describe the recent advances in next-generation sequencing technology, together with preliminary approaches that can be applied for single nucleotide polymorphism discovery in plant species.
Article
Molecular genetic markers represent one of the most powerful tools for the analysis of genomes and the association of heritable traits with underlying genetic variation. The development of high-throughput methods for the detection of single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) has led to a revolution in their use as molecular markers. The availability of large sequence data sets permits mining for these molecular markers, which may then be used for applications such as genetic trait mapping, diversity analysis and marker assisted selection in agriculture. Here we describe web-based automated methods for the discovery of SSRs using SSR taxonomy tree, the discovery of SNPs from sequence data using SNPServer and the identification of validated SNPs from within the dbSNP database. SSR taxonomy tree identifies pre-determined SSR amplification primers for virtually all species represented within the GenBank database. SNPServer uses a redundancy based approach to identify SNPs within DNA sequences. Following submission of a sequence of interest, SNPServer uses BLAST to identify similar sequences, CAP3 to cluster and assemble these sequences and then the SNP discovery software autoSNP to detect SNPs and insertion/deletion (indel) polymorphisms. The NCBI dbSNP database is a catalogue of molecular variation, hosting validated SNPs for several species within a public-domain archive.
Article
Only a subset of single-nucleotide polymorphisms (SNPs) can be genotyped in genome-wide association studies. Imputation methods can infer the alleles of 'hidden' variants and use those inferences to test the hidden variants for association.