ABSTRACT: Tef (Eragrostis tef) is a major cereal crop in Ethiopia. Lodging is the primary constraint to increasing productivity in this allotetraploid species, accounting for losses of ~15-45% in yield each year. As a first step toward identifying semi-dwarf varieties that might have improved lodging resistance, an ~6X fosmid library was constructed and used to identify both homoeologues of the dw3 semi-dwarfing gene of Sorghum bicolor. An EMS-mutagenized population, consisting of ~21,210 tef plants, was planted and leaf material was collected into 23 superpools. Two dwarfing candidate genes, homoeologues of dw3 of sorghum and rht1 of wheat, were sequenced directly from each superpool with 454 technology, and 120 candidate mutations were identified. Out of ten candidates tested, six independent mutations were validated by Sanger sequencing, including two predicted detrimental mutations in both dw3 homoeologues with a potential to improve lodging resistance in tef through further breeding. This study demonstrates that high-throughput sequencing can identify potentially valuable mutations in under-studied plant species like tef, and has provided mutant lines that can now be combined and tested in breeding programs for improved lodging resistance.
ABSTRACT: In the fully sequenced Arabidopsis (Arabidopsis thaliana) genome, many gene models are annotated as "hypothetical protein," whose gene structures are predicted solely by computer algorithms with no support from either expressed sequence matches from Arabidopsis, or nucleic acid or protein homologs from other species. In order to confirm their existence and predicted gene structures, a high-throughput method of rapid amplification of cDNA ends (RACE) was used to obtain their cDNA sequences from 11 cDNA populations. Primers were designed for all 797 hypothetical genes on chromosome 2, and, through 5' and 3' RACE, clones from 506 genes were sequenced and cDNA sequences from 399 target genes were recovered. The cDNA sequences were obtained by assembling their 5' and 3' RACE polymerase chain reaction products. These sequences revealed that (1) the structures of 151 hypothetical genes were different from their predictions; (2) 116 hypothetical genes had alternatively spliced transcripts and 187 genes displayed polyadenylation sites; and (3) there were transcripts arising from both strands, from the strand opposite to that of the prediction, and possible dicistronic transcripts. Promoters from five randomly chosen hypothetical genes (At2g02540, At2g31270, At2g33640, At2g35550, and At2g36340) were cloned into reporter constructs, and their expression was tissue- or developmental stage-specific. Our results indicate that at least 50% of hypothetical genes on chromosome 2 are expressed in the cDNA populations, with about 38% of the gene structures differing from their predictions. Thus, by using this targeted approach, high-throughput RACE, we revealed numerous transcripts, including many uncharacterized variants, from these hypothetical genes.
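The two summary percentages in the abstract above can be reproduced from the reported counts. The sketch below is only a sanity check: the variable names are illustrative, and the choice of denominator for the "about 38%" figure (the 399 recovered genes rather than all 797) is an assumption that happens to match the stated value, not something the abstract spells out.

```python
# Counts taken from the abstract; variable names are illustrative only.
total_hypothetical_genes = 797   # hypothetical genes on chromosome 2
genes_with_cdna = 399            # target genes with recovered cDNA sequence
genes_differing = 151            # structures differing from prediction

# "at least 50% ... are expressed": recovered genes over all targeted genes.
expressed_fraction = genes_with_cdna / total_hypothetical_genes

# "about 38% ... differing": ASSUMED denominator is the 399 recovered genes.
differing_fraction = genes_differing / genes_with_cdna

print(f"expressed: {expressed_fraction:.1%}")
print(f"differing (of recovered): {differing_fraction:.1%}")
```

Using all 797 genes as the denominator for the second figure would give roughly 19%, which is why the recovered-gene denominator is assumed here.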
ABSTRACT: Through comparative studies of the model organism Arabidopsis thaliana and its close relative Brassica oleracea, we have identified conserved regions that represent potentially functional sequences overlooked by previous Arabidopsis genome annotation methods. A total of 454,274 whole genome shotgun sequences covering 283 Mb (0.44×) of the estimated 650 Mb Brassica genome were searched against the Arabidopsis genome, and conserved Arabidopsis genome sequences (CAGSs) were identified. Of these 229,735 conserved regions, 167,357 fell within or intersected existing gene models, while 60,378 were located in previously unannotated regions. After removal of sequences matching known proteins, CAGSs that were close to one another were chained together as potentially comprising portions of the same functional unit. This resulted in 27,347 chains, of which 15,686 were sufficiently distant from existing gene annotations to be considered novel conserved units. Of 192 conserved regions examined, 58 were found to be expressed in our cDNA populations. Rapid amplification of cDNA ends (RACE) was used to obtain potentially full-length transcripts from these 58 regions. The resulting sequences led to the creation of 21 gene models at 17 new Arabidopsis loci and the addition of splice variants or updates to another 19 gene structures. In addition, CAGSs overlapping already annotated genes in Arabidopsis can provide guidance for manual improvement of existing gene models. Published genome-wide expression data based on whole genome tiling arrays and massively parallel signature sequencing were overlaid on the Brassica-Arabidopsis conserved sequences, and 1,399 regions of intersection were identified. Collectively, our results and these data sets suggest that several thousand new Arabidopsis genes remain to be identified and annotated.
Genome Research 05/2005; 15(4):487-95. · 14.40 Impact Factor
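The chaining step described in the Brassica-Arabidopsis abstract above (grouping nearby CAGSs into one candidate unit) can be illustrated with a simple gap-based interval grouping. This is a minimal sketch, not the authors' procedure: the function name `chain_intervals`, the `max_gap` parameter, and the example coordinates are all hypothetical, and the abstract does not state what distance threshold was used. The last lines also check the quoted 0.44× coverage figure.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on a single chromosome


def chain_intervals(intervals: List[Interval], max_gap: int) -> List[List[Interval]]:
    """Group intervals into chains; a new chain starts when the gap to the
    current chain's right edge exceeds max_gap (a hypothetical threshold)."""
    chains: List[List[Interval]] = []
    chain_end = 0
    for start, end in sorted(intervals):
        if chains and start - chain_end <= max_gap:
            chains[-1].append((start, end))
            chain_end = max(chain_end, end)
        else:
            chains.append([(start, end)])
            chain_end = end
    return chains


# Made-up coordinates: the first two regions are 50 bp apart, the third is far away.
example = [(100, 200), (250, 300), (1000, 1100)]
chains = chain_intervals(example, max_gap=100)

# Coverage check from the abstract: 283 Mb of reads over a 650 Mb genome.
coverage = 283 / 650
print(len(chains), f"{coverage:.2f}x")
```

With these made-up inputs the first two regions form one chain and the third stands alone; a real analysis would also need to handle strand and chromosome, which the sketch ignores.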
ABSTRACT: The development of single nucleotide polymorphism (SNP) markers in maize offers the opportunity to utilize DNA markers in many new areas of population genetics, gene discovery, plant breeding and germplasm identification. However, the steps from sequencing and SNP discovery to SNP marker design and validation are lengthy and expensive. Access to a set of validated SNP markers is a significant advantage to maize researchers who wish to apply SNPs in scientific inquiry. We mined 1,088 loci sequenced across 60 public inbreds that have been used in maize breeding in North America and Europe. We then selected 640 SNPs using generalized marker design criteria that enable utilization with several SNP chemistries. While SNPs were found on average every 43 bases in 1,088 maize gene sequences, SNPs that were amenable to marker design were found on average every 623 bases, representing only 7% of the total SNPs discovered. We also describe the development of a 768-marker multiplex assay for use on the Illumina® BeadArray™ platform. SNP markers were mapped on the IBM2 intermated B73×Mo17 high-resolution genetic map using either the IBM2 segregating population, or segregation in multiple parent-progeny triplets. A high degree of colinearity was found with the genetic nested association map. For each SNP presented we give information on map location, polymorphism rates in different heterotic groups and performance on the Illumina® platform.
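The "only 7%" figure in the maize abstract above follows directly from the two spacing values it reports: if all SNPs occur once per 43 bases and designable SNPs once per 623 bases, the designable share is the ratio of the two densities. A short check, with illustrative variable names:

```python
# Average spacing values taken from the abstract.
bases_per_snp_all = 43          # one SNP per 43 bases overall
bases_per_snp_designable = 623  # one marker-amenable SNP per 623 bases

# Density is 1 / spacing, so the designable share is (1/623) / (1/43) = 43/623.
designable_fraction = bases_per_snp_all / bases_per_snp_designable

print(f"designable share of SNPs: {designable_fraction:.1%}")
```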