ABSTRACT: • The use of quantitative disease resistance (QDR) is a promising strategy to promote durable resistance to plant pathogens, but genes involved in QDR are largely unknown. To identify genetic components and accelerate improvement of QDR in legumes to the root pathogen Aphanomyces euteiches, we took advantage of both the recently generated massive genomic data for Medicago truncatula and natural variation of this model legume.
• A high density (≈5.1 million Single Nucleotide Polymorphisms - SNPs) Genome-Wide Association Study (GWAS) was performed with both in vitro and greenhouse phenotyping data collected on 179 lines.
• GWAS identified several candidate genes, and pinpointed two independent major loci on the top of chromosome 3 that were detected in both phenotyping methods. Candidate SNPs in the most significant locus (σ²A = 23%) were in the promoter and coding regions of an F-box protein coding gene. Subsequent qRT-PCR and bioinformatic analyses performed on 20 lines demonstrated that resistance is associated with mutations directly affecting the interaction domain of the F-box protein rather than gene expression.
• These results refine the position of previously identified QTL to specific candidate genes, suggest potential molecular mechanisms, and identify new loci explaining QDR to A. euteiches.
ABSTRACT: Small peptides encoded as one- or two-exon genes in plants have recently been shown to affect multiple aspects of plant development, reproduction and defense responses. However, popular similarity search tools and gene prediction techniques generally fail to identify most members belonging to this class of genes. This is largely due to the high sequence divergence among family members and the limited availability of experimentally verified small peptides to use as training sets for homology search and ab initio prediction. Consequently, there is an urgent need for both experimental and computational studies in order to further advance the accurate prediction of small peptides.
We present here a homology-based gene prediction program to accurately predict small peptides at the genome level. Given a high-quality profile alignment, SPADA identifies and annotates nearly all family members in tested genomes with better performance than all general-purpose gene prediction programs surveyed. We find numerous mis-annotations in the current Arabidopsis thaliana and Medicago truncatula genome databases using SPADA, most of which have RNA-Seq expression support. We also show that SPADA works well on other classes of small secreted peptides in plants (e.g., self-incompatibility protein homologues) as well as non-secreted peptides outside the plant kingdom (e.g., the alpha-amanitin toxin gene family in the mushroom, Amanita bisporigera).
SPADA is a free software tool that accurately identifies and predicts the gene structure for short peptides with one or two exons. SPADA is able to incorporate information from profile alignments into the model prediction process and makes use of it to score different candidate models. SPADA achieves high sensitivity and specificity in predicting small plant peptides such as the cysteine-rich peptide families. A systematic application of SPADA to other classes of small peptides by research communities will greatly improve the genome annotation of different protein families in public genome databases.
ABSTRACT: Sequence data for >20 000 annotated genes from 56 accessions of Medicago truncatula were used to identify potential targets of positive selection, the determinants of evolutionary rate variation and the relative importance of positive and purifying selection in shaping nucleotide diversity. Based upon patterns of intraspecific diversity and interspecific divergence, c. 50-75% of nonsynonymous polymorphisms are subject to strong purifying selection and 1% of the sampled genes harbour a signature of positive selection. Combining polymorphism with expression data, we estimated the distribution of fitness effects and found that the proportion of deleterious mutations is significantly greater for expressed genes than for genes with undetected transcripts (nonexpressed) in a previous RNA-seq experiment and greater for broadly expressed genes than those expressed in only a single tissue. Expression level is the strongest correlate of evolutionary rates at nonsynonymous sites, and despite multiple genomic features being significantly correlated with evolutionary rates, they explain less than 20% of the variation in nonsynonymous rates (dN) and <15% of the variation in either synonymous rates (dS) or dN:dS. Among putative targets of selection were genes involved in defence against pathogens and herbivores, genes with roles in mediating the relationship with rhizobial symbionts and one-third of annotated histone-lysine methyltransferases. Adaptive evolution of the methyltransferases suggests that positive selection in gene expression may have occurred through evolution of enzymes involved in epigenetic modification.
ABSTRACT: BACKGROUND: The sinorhizobia are amongst the most well studied members of nitrogen-fixing root nodule bacteria and contribute substantial amounts of fixed nitrogen to the biosphere. While the alfalfa symbiont Sinorhizobium meliloti RM 1021 was one of the first rhizobial strains to be completely sequenced, little information is available about the genomes of this large and diverse species group. RESULTS AND DISCUSSION: Here we report the draft assembly and annotation of 48 strains of Sinorhizobium comprising five genospecies. While S. meliloti and S. medicae are taxonomically related, they displayed different nodulation patterns on diverse Medicago host plants, and have differences in gene content including those involved in conjugation and organic sulfur utilization. Genes involved in Nod-factor and polysaccharide biosynthesis, denitrification and Type III, IV, and VI secretion systems also vary within and between species. Symbiotic phenotyping and mutational analyses indicated that some Type IV secretion genes are symbiosis-related and involved in nitrogen fixation efficiency. Moreover, there is a correlation between the presence of Type IV secretion systems, heme biosynthesis and microaerobic denitrification genes, and symbiotic efficiency. CONCLUSIONS: Our results suggest that each Sinorhizobium strain uses a slightly different strategy to obtain maximum compatibility with a host plant. This large genome data set provides useful information to better understand the functional features of five Sinorhizobium species, especially compatibility in legume-Sinorhizobium interactions. The diversity of genes present in the accessory genomes of members of this genus indicates that each bacterium has adopted slightly different strategies to interact with diverse plant genera and soil environments.
ABSTRACT: Genome-scale data offer the opportunity to clarify phylogenetic relationships that are difficult to resolve with few loci, but they can also identify genomic regions with evolutionary history distinct from that of the species history. We collected whole-genome sequence data from 29 taxa in the legume genus Medicago, then aligned these sequences to the M. truncatula reference genome to confidently identify 87,596 variable homologous sites. We used this data set to estimate phylogenetic relationships among Medicago species, to investigate the number of sites needed to provide robust phylogenetic estimates, and to identify specific genomic regions supporting topologies in conflict with the genome-wide phylogeny. Our full genomic data set resolves relationships within the genus that were previously intractable. Sub-sampling the data reveals considerable variation in phylogenetic signal and power in smaller subsets of the data. Even when sampling 5,000 sites, no random sample of the data supports a topology identical to that of the genome-wide phylogeny. Phylogenetic relationships estimated from 500-site sliding windows revealed genome regions supporting several alternative species relationships among recently-diverged taxa, consistent with the expected effects of deep coalescence or introgression in the recent history of Medicago.
ABSTRACT: Genome-wide association study (GWAS) has revolutionized the search for the genetic basis of complex traits. To date, GWAS have generally relied on relatively sparse sampling of nucleotide diversity, which is likely to bias results by preferentially sampling high-frequency SNPs not in complete linkage disequilibrium (LD) with causative SNPs. To avoid these limitations we conducted GWAS with >6 million SNPs identified by sequencing the genomes of 226 accessions of the model legume Medicago truncatula. We used these data to identify candidate genes and the genetic architecture underlying phenotypic variation in plant height, trichome density, flowering time, and nodulation. The characteristics of candidate SNPs differed among traits, with candidates for flowering time and trichome density in distinct clusters of high linkage disequilibrium (LD) and the minor allele frequencies (MAF) of candidates underlying variation in flowering time and height significantly greater than MAF of candidates underlying variation in other traits. Candidate SNPs tagged several characterized genes including nodulation related genes SERK2, MtnodGRP3, MtMMPL1, NFP, CaML3, MtnodGRP3A and flowering time gene MtFD as well as uncharacterized genes that become candidates for further molecular characterization. By comparing sequence-based candidates to candidates identified by in silico 250K SNP arrays, we provide an empirical example of how reliance on even high-density reduced representation genomic markers can bias GWAS results. Depending on the trait, only 30-70% of the top 20 in silico array candidates were within 1 kb of sequence-based candidates. Moreover, the sequence-based candidates tagged by array candidates were heavily biased towards common variants; these comparisons underscore the need for caution when interpreting results from GWAS conducted with sparsely covered genomes.
PLoS ONE 01/2013; 8(5):e65688. · 3.73 Impact Factor
ABSTRACT: The symbiosis between rhizobial bacteria and legume plants has served as a model for investigating the genetics of nitrogen fixation and the evolution of facultative mutualism. We used deep sequence coverage (>100×) to characterize genomic diversity at the nucleotide level among 12 Sinorhizobium medicae and 32 S. meliloti strains. Although these species are closely related and share host plants, based on the ratio of shared polymorphisms to fixed differences we found that horizontal gene transfer (HGT) between these species was confined almost exclusively to plasmid genes. Three multi-genic regions that show the strongest evidence of HGT harbor genes directly involved in establishing or maintaining the mutualism with host plants. In both species, nucleotide diversity is 1.5-2.5 times greater on the plasmids than chromosomes. Interestingly, nucleotide diversity in S. meliloti but not S. medicae is highly structured along the chromosome - with mean diversity (θ(π)) on one half of the chromosome five times greater than mean diversity on the other half. Based on the ratio of plasmid to chromosome diversity, this appears to be due to severely reduced diversity on the chromosome half with less diversity, which is consistent with extensive hitchhiking along with a selective sweep. Frequency-spectrum based tests identified 82 genes with a signature of adaptive evolution in one species or another but none of the genes were identified in both species. Based upon available functional information, several genes identified as targets of selection are likely to alter the symbiosis with the host plant, making them attractive targets for further functional characterization.
ABSTRACT: Recombination rates vary across the genome and in many species show significant relationships with several genomic features, including distance to the centromere, gene density, and GC content. Studies of fine-scale recombination rates have also revealed that in several species, there are recombination hotspots, that is, short regions with recombination rates 10-100 times greater than those in surrounding regions. In this study, we analyzed whole-genome resequence data from 26 accessions of the model legume Medicago truncatula to gain insight into the genomic features that are related to high- and low-recombination rates and recombination hotspots at 1 kb scales. We found that high-recombination regions (1-kb windows among those in the highest 5% of the distribution) on all three chromosomes were significantly closer to the centromere, had higher gene density, and lower GC content than low-recombination windows. High-recombination windows are also significantly overrepresented among some gene functional categories, most strongly NB-ARC and LRR genes, both of which are important in plant defense against pathogens. Similar to high-recombination windows, recombination hotspots (1-kb windows with significantly higher recombination than the surrounding region) are significantly nearer to the centromere than nonhotspot windows. By contrast, we detected no difference in gene density or GC content between hotspot and nonhotspot windows. Using linear model wavelet analysis to examine the relationship between recombination and genomic features across multiple spatial scales, we find a significant negative correlation with distance to the centromere across scales up to 512 kb, whereas gene density and GC content show significantly positive and negative correlations, respectively, only up to 64 kb.
Correlations between recombination and genomic features, particularly gene density and polymorphism, suggest that they are scale dependent and need to be assessed at scales relevant to the evolution of those features.
Genome Biology and Evolution 05/2012; 4(5):726-37. · 4.76 Impact Factor
ABSTRACT: We used a comparative genomics approach to investigate the evolution of a complex nucleotide-binding (NB)-leucine-rich repeat (LRR) gene cluster found in soybean (Glycine max) and common bean (Phaseolus vulgaris) that is associated with several disease resistance (R) genes of known function, including Rpg1b (for Resistance to Pseudomonas glycinea1b), an R gene effective against specific races of bacterial blight. Analysis of domains revealed that the amino-terminal coiled-coil (CC) domain, central nucleotide-binding domain (NB-ARC [for APAF1, Resistance genes, and CED4]), and carboxyl-terminal LRR domain have undergone distinct evolutionary paths. Sequence exchanges within the NB-ARC domain were rare. In contrast, interparalogue exchanges involving the CC and LRR domains were common, consistent with both of these regions coevolving with pathogens. Residues under positive selection were overrepresented within the predicted solvent-exposed face of the LRR domain, although several also were detected within the CC and NB-ARC domains. Superimposition of these latter residues onto predicted tertiary structures revealed that the majority are located on the surface, suggestive of a role in interactions with other domains or proteins. Following polyploidy in the Glycine lineage, NB-LRR genes have been preferentially lost from one of the duplicated chromosomes (homeologues found in soybean), and there has been partitioning of NB-LRR clades between the two homeologues. The single orthologous region in common bean contains approximately the same number of paralogues as found in the two soybean homeologues combined. We conclude that while polyploidization in Glycine has not driven a stable increase in family size for NB-LRR genes, it has generated two recombinationally isolated clusters, one of which appears to be in the process of decay.
ABSTRACT: Legumes are the third-largest family of angiosperms, the second-most-important crop family, and a key source of biological nitrogen in agriculture. Recently, the genome sequences of Glycine max (soybean), Medicago truncatula, and Lotus japonicus were substantially completed. Comparisons among legume genomes reveal a key role for duplication, especially a whole-genome duplication event approximately 58 Mya that is shared by most agriculturally important legumes. A second and more recent genome duplication occurred only in the lineage leading to soybean. Outcomes of genome duplication, including gene fractionation and sub- and neofunctionalization, have played key roles in shaping legume genomes and in the evolution of legume-specific traits. Analysis of legume genome sequences also enables the discovery of legume-specific gene families and provides a framework for genome-wide association mapping that will target phenotypes of special importance in legumes. Translating genomic resources from sequenced species to less studied but still important "orphan" legumes will enhance prospects for world food production.
ABSTRACT: Legumes (Fabaceae or Leguminosae) are unique among cultivated plants for their ability to carry out endosymbiotic nitrogen fixation with rhizobial bacteria, a process that takes place in a specialized structure known as the nodule. Legumes belong to one of the two main groups of eurosids, the Fabidae, which includes most species capable of endosymbiotic nitrogen fixation. Legumes comprise several evolutionary lineages derived from a common ancestor 60 million years ago (Myr ago). Papilionoids are the largest clade, dating nearly to the origin of legumes and containing most cultivated species. Medicago truncatula is a long-established model for the study of legume biology. Here we describe the draft sequence of the M. truncatula euchromatin based on a recently completed BAC assembly supplemented with Illumina shotgun sequence, together capturing ∼94% of all M. truncatula genes. A whole-genome duplication (WGD) approximately 58 Myr ago had a major role in shaping the M. truncatula genome and thereby contributed to the evolution of endosymbiotic nitrogen fixation. Subsequent to the WGD, the M. truncatula genome experienced higher levels of rearrangement than two other sequenced legumes, Glycine max and Lotus japonicus. M. truncatula is a close relative of alfalfa (Medicago sativa), a widely cultivated crop with limited genomics tools and complex autotetraploid genetics. As such, the M. truncatula genome sequence provides significant opportunities to expand alfalfa's genomic toolbox.
ABSTRACT: Medicago truncatula is a model for investigating legume genetics, including the genetics and evolution of legume-rhizobia symbiosis. We used whole-genome sequence data to identify and characterize sequence polymorphisms and linkage disequilibrium (LD) in a diverse collection of 26 M. truncatula accessions. Our analyses reveal that M. truncatula harbors both higher diversity and less LD than soybean (Glycine max) and exhibits patterns of LD and recombination similar to Arabidopsis thaliana. The population-scaled recombination rate is approximately one-third of the mutation rate, consistent with expectations for a species with a high selfing rate. Linkage disequilibrium, however, is not extensive, and therefore, the low recombination rate is likely not a major constraint to adaptation. Nucleotide diversity in 100-kb windows was negatively correlated with gene density, which is expected if diversity is shaped by selection acting against slightly deleterious mutations. Among putative coding regions, members of four gene families harbor significantly higher diversity than the genome-wide average. Three of these families are involved in resistance against pathogens; one of these families, the nodule-specific, cysteine-rich gene family, is specific to the galegoid legumes and is involved in control of rhizobial differentiation. The more than 3 million SNPs that we detected, approximately one-half of which are present in more than one accession, are a valuable resource for genome-wide association mapping of genes responsible for phenotypic diversity in legumes, especially traits associated with symbiosis and nodulation.
Proceedings of the National Academy of Sciences 09/2011; 108(42):E864-70. · 9.74 Impact Factor
ABSTRACT: The genomes of most, if not all, flowering plants have undergone whole genome duplication events during their evolution. The impact of such polyploidy events is poorly understood, as is the fate of most duplicated genes. We sequenced an approximately 1 million-bp region in soybean (Glycine max) centered on the Rpg1-b disease resistance gene and compared this region with a region duplicated 10 to 14 million years ago. These two regions were also compared with homologous regions in several related legume species (a second soybean genotype, Glycine tomentella, Phaseolus vulgaris, and Medicago truncatula), which enabled us to determine how each of the duplicated regions (homoeologues) in soybean has changed following polyploidy. The biggest change was in retroelement content, with homoeologue 2 having expanded to 3-fold the size of homoeologue 1. Despite this accumulation of retroelements, over 77% of the duplicated low-copy genes have been retained in the same order and appear to be functional. This finding contrasts with recent analyses of the maize (Zea mays) genome, in which only about one-third of duplicated genes appear to have been retained over a similar time period. Fluorescent in situ hybridization revealed that the homoeologue 2 region is located very near a centromere. Thus, pericentromeric localization, per se, does not result in a high rate of gene inactivation, despite greatly accelerated retrotransposon accumulation. In contrast to low-copy genes, nucleotide-binding-leucine-rich repeat disease resistance gene clusters have undergone dramatic species/homoeologue-specific duplications and losses, with some evidence for partitioning of subfamilies between homoeologues.
ABSTRACT: Retrotransposons and their remnants often constitute more than 50% of higher plant genomes. Although extensively studied in monocot crops such as maize (Zea mays) and rice (Oryza sativa), the impact of retrotransposons on dicot crop genomes is not well documented. Here, we present an analysis of retrotransposons in soybean (Glycine max). Analysis of approximately 3.7 megabases (Mb) of genomic sequence, including 0.87 Mb of pericentromeric sequence, uncovered 45 intact long terminal repeat (LTR)-retrotransposons. The ratio of intact elements to solo LTRs was 8:1, one of the highest reported to date in plants, suggesting that removal of retrotransposons by homologous recombination between LTRs is occurring more slowly in soybean than in previously characterized plant species. Analysis of paired LTR sequences uncovered a low frequency of deletions relative to base substitutions, indicating that removal of retrotransposon sequences by illegitimate recombination is also operating more slowly. Significantly, we identified three subfamilies of nonautonomous elements that have replicated in the recent past, suggesting that retrotransposition can be catalyzed in trans by autonomous elements elsewhere in the genome. Analysis of 1.6 Mb of sequence from Glycine tomentella, a wild perennial relative of soybean, uncovered 23 intact retroelements, two of which had accumulated no mutations in their LTRs, indicating very recent insertion. A similar pattern was found in 0.94 Mb of sequence from Phaseolus vulgaris (common bean). Thus, autonomous and nonautonomous retrotransposons appear to be both abundant and active in Glycine and Phaseolus. The impact of nonautonomous retrotransposon replication on genome size appears to be much greater than previously appreciated.
ABSTRACT: Large numbers of single nucleotide polymorphism (SNP) markers are now available for a number of crop species. However, the high-throughput methods for multiplexing SNP assays are untested in complex genomes, such as soybean, that have a high proportion of paralogous genes. The Illumina GoldenGate assay is capable of multiplexing from 96 to 1,536 SNPs in a single reaction over a 3-day period. We tested the GoldenGate assay in soybean to determine the success rate of converting verified SNPs into working assays. A custom 384-SNP GoldenGate assay was designed using SNPs that had been discovered through the resequencing of five diverse accessions that are the parents of three recombinant inbred line (RIL) mapping populations. The 384 SNPs that were selected for this custom assay were predicted to segregate in one or more of the RIL mapping populations. Allelic data were successfully generated for 89% of the SNP loci (342 of the 384) when it was used in the three RIL mapping populations, indicating that the complex nature of the soybean genome had little impact on conversion of the discovered SNPs into usable assays. In addition, 80% of the 342 mapped SNPs had a minor allele frequency >10% when this assay was used on a diverse sample of Asian landrace germplasm accessions. The high success rate of the GoldenGate assay makes this a useful technique for quickly creating high density genetic maps in species where SNP markers are rapidly becoming available.
Theoretical and Applied Genetics 05/2008; 116(7):945-52. · 3.66 Impact Factor
ABSTRACT: The nucleotide-binding site (NBS)-Leucine-rich repeat (LRR) gene family accounts for the largest number of known disease resistance genes, and is one of the largest gene families in plant genomes. We have identified 333 nonredundant NBS-LRRs in the current Medicago truncatula draft genome (Mt1.0), likely representing 400 to 500 NBS-LRRs in the full genome, or roughly 3 times the number present in Arabidopsis (Arabidopsis thaliana). Although many characteristics of the gene family are similar to those described in other plant genomes, several evolutionary features are particularly pronounced in M. truncatula, including a high degree of clustering, evidence of significant numbers of ectopic translocations from clusters to other parts of the genome, a small number of more evolutionarily stable NBS-LRRs, and numerous truncations and fusions leading to novel domain compositions. The gene family clearly has had a large impact on the structure of the genome, both through ectopic translocations (potentially, a means of seeding new NBS-LRR clusters), and through two extraordinarily large superclusters. Chromosome 6 encodes approximately 34% of all TIR-NBS-LRRs, while chromosome 3 encodes approximately 40% of all coiled-coil-NBS-LRRs. Almost all atypical domain combinations are in the TIR-NBS-LRR subfamily, with many occurring within one genomic cluster. This analysis shows the gene family not only is important functionally and agronomically, but also plays a structural role in the genome.
ABSTRACT: Medicago truncatula was used to characterize resistance to anthracnose and powdery mildew caused by Colletotrichum trifolii and Erysiphe pisi, respectively. Two isolates of E. pisi (Ep-p from pea and Ep-a from alfalfa) and two races of C. trifolii (races 1 and 2) were used in this study. The A17 genotype was resistant and displayed a hypersensitive response after inoculation with either pathogen, while lines F83005.5 and DZA315.16 were susceptible to anthracnose and powdery mildew, respectively. To identify the genetic determinants underlying resistance in A17, two F7 recombinant inbred line (RIL) populations, LR4 (A17 x DZA315.16) and LR5 (A17 x F83005.5), were phenotyped with E. pisi isolates and C. trifolii races, respectively. Genetic analyses showed that i) resistance to anthracnose is governed mainly by a single major locus to both races, named Ct1 and located on the upper part of chromosome 4; and ii) resistance to powdery mildew involves three distinct loci, Epp1 on chromosome 4 and Epa1 and Epa2 on chromosome 5. The use of a consensus genetic map for the two RIL populations revealed that Ct1 and Epp1, although located in the same genome region, were clearly distinct. In silico analysis in this region identified the presence of several clusters of nucleotide binding site leucine-rich repeat genes. Many of these genes have atypical resistance gene analog structures and display differential expression patterns in distinct stress-related cDNA libraries.
ABSTRACT: Although originally thought to be less frequent in plants than in animals, alternative splicing (AS) is now known to be widespread in plants. Here we report the characteristics of AS in legumes, one of the largest and most important plant families, based on EST alignments to the genome sequences of Medicago truncatula (Mt) and Lotus japonicus (Lj).
Based on cognate EST alignments alone, the observed frequency of alternatively spliced genes is lower in Mt (approximately 10%, 1,107 genes) and Lj (approximately 3%, 92 genes) than in Arabidopsis and rice (both around 20%). However, AS frequencies are comparable in all four species if EST levels are normalized. Intron retention is the most common form of AS in all four plant species (~50%), with slightly lower frequency in legumes compared to Arabidopsis and rice. This differs notably from vertebrates, where exon skipping is most common. To uncover additional AS events, we aligned ESTs from other legume species against the Mt genome sequence. In this way, 248 additional Mt genes were predicted to be alternatively spliced. We also identified 22 AS events completely conserved in two or more plant species.
This study extends the range of plant taxa shown to have high levels of AS, confirms the importance of intron retention in plants, and demonstrates the utility of using ESTs from related species in order to identify novel and conserved AS events. The results also indicate that the frequency of AS in plants is comparable to that observed in mammals. Finally, our results highlight the importance of normalizing EST levels when estimating the frequency of alternative splicing.
ABSTRACT: The first genetic transcript map of the soybean genome was created by mapping one SNP in each of 1141 genes in one or more of three recombinant inbred line mapping populations, thus providing a picture of the distribution of genic sequences across the mapped portion of the genome. Single-nucleotide polymorphisms (SNPs) were discovered via the resequencing of sequence-tagged sites (STSs) developed from expressed sequence tag (EST) sequence. From an initial set of 9459 polymerase chain reaction primer sets designed to a diverse set of genes, 4240 STSs were amplified and sequenced in each of six diverse soybean genotypes. In the resulting 2.44 Mbp of aligned sequence, a total of 5551 SNPs were discovered, including 4712 single-base changes and 839 indels for an average nucleotide diversity of θ = 0.000997. The analysis of the observed genetic distances between adjacent genes vs. the theoretical distribution based upon the assumption of a random distribution of genes across the 20 soybean linkage groups clearly indicated that genes were clustered. Of the 1141 genes, 291 mapped to 72 of the 112 gaps of 5-10 cM in the preexisting simple sequence repeat (SSR)-based map, while 111 genes mapped in 19 of the 26 gaps >10 cM. The addition of 1141 sequence-based genic markers to the soybean genome map will provide an important resource to soybean geneticists for quantitative trait locus discovery and map-based cloning, as well as to soybean breeders who increasingly depend upon marker-assisted selection in cultivar improvement.
ABSTRACT: White clover (Trifolium repens L.) is a forage legume widely used in combination with grass in pastures because of its ability to fix nitrogen. We have constructed a bacterial artificial chromosome (BAC) library of an advanced breeding line of white clover. The library contains 37 248 clones with an average insert size of approximately 85 kb, representing an approximate 3-fold coverage of the white clover genome based on an estimated genome size of 960 Mb. The BAC library was pooled and screened by polymerase chain reaction (PCR) amplification using both white clover microsatellites and PCR-based markers derived from Medicago truncatula, resulting in an average of 6 hits per marker; this supports the estimated 3-fold genome coverage in this allotetraploid species. PCR-based screening of 766 clones with a multiplex set of chloroplast primers showed that only 0.5% of BAC clones contained chloroplast-derived inserts. The library was further evaluated by sequencing both ends of 724 of the clover BACs. These were analysed with respect to their sequence content and their homology to the contents of a range of plant gene, expressed sequence tag, and repeat element databases. Forty-three microsatellites were discovered in the BAC-end sequences (BESs) and investigated as potential genetic markers in white clover. The BESs were also compared with the partially sequenced genome of the model legume M. truncatula with the specific intention of identifying putative comparative-tile BACs, which represent potential regions of microsynteny between the 2 species; 14 such BACs were discovered. The results suggest that a large-scale BAC-end sequencing strategy has the potential to anchor a significant proportion of the genome of white clover onto the gene-space sequence of M. truncatula.
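The "approximate 3-fold coverage" quoted in the white clover abstract can be checked with simple arithmetic: total cloned sequence divided by genome size. The sketch below is only an illustration of that calculation and is not taken from the paper; the function name `library_coverage` and its parameters are hypothetical, and the inputs are the figures stated in the abstract (37 248 clones, about 85 kb mean insert, 960 Mb estimated genome size).

```python
def library_coverage(n_clones: int, mean_insert_kb: float, genome_size_mb: float) -> float:
    """Return the fold coverage of a clone library (hypothetical helper).

    Coverage = (number of clones x mean insert size) / genome size,
    with the insert size converted from kb to Mb first.
    """
    total_insert_mb = n_clones * mean_insert_kb / 1000.0  # kb -> Mb
    return total_insert_mb / genome_size_mb


# Figures as stated in the abstract; the insert size is itself approximate.
coverage = library_coverage(n_clones=37_248, mean_insert_kb=85, genome_size_mb=960)
print(f"{coverage:.1f}-fold coverage")  # about 3.3-fold
```

The result, roughly 3.3-fold, is in line with the approximate 3-fold coverage reported. It depends directly on the estimated genome size, so a different estimate would shift the figure proportionally.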