Distinct evolutionary patterns of Oryza glaberrima deciphered by genome sequencing and comparative analysis.
ABSTRACT: Here we present the genomic sequence of the African cultivated rice, Oryza glaberrima, and compare it with the genome sequence of Asian cultivated rice, Oryza sativa. Using methylation filtration and subtractive hybridization of repetitive sequences, we obtained gene-enriched sequences of O. glaberrima that correspond to about 25% of the gene regions of the O. sativa (japonica) genome. While patterns of amino acid changes did not differ between the two species in terms of biochemical properties, genes of O. glaberrima generally showed a larger nonsynonymous-to-synonymous substitution ratio (dN/dS), suggesting that O. glaberrima has undergone a genome-wide relaxation of purifying selection. We further investigated nucleotide substitutions around splice sites and found that eight genes of O. sativa acquired changes at splice sites after the divergence from O. glaberrima. These changes produced novel introns that partially truncated functional domains, suggesting that these newly emerged introns affect gene function. We also identified 2451 simple sequence repeats (SSRs) in the genomes of O. glaberrima and O. sativa. Although tri-nucleotide repeats were the most common SSRs and were overrepresented in protein-coding sequences, selection against indels of tri-nucleotide repeats was relatively weak in both African and Asian rice. Our genome-wide sequencing of O. glaberrima and in-depth analyses provide rice researchers not only with useful genomic resources for future breeding but also with new insights into the genomic evolution of the African and Asian rice species.
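The abstract does not state the criteria used to call SSRs. Purely as an illustration, the Python sketch below scans a DNA sequence for perfect tandem repeats of 1- to 6-nucleotide motifs using a regular expression with a backreference. The minimum copy numbers in the hypothetical `MIN_COPIES` table and the names `find_ssrs` and `is_primitive` are assumptions for this example, not the study's settings, and interrupted (imperfect) repeats are ignored.

```python
import re

# Hypothetical thresholds (assumed, not taken from the study): minimum number
# of tandem copies for a motif of each length (1-6 nt) to be reported as an SSR.
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True unless motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_copies=MIN_COPIES):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats.

    The motif is reported in the phase in which the match begins, so a CAG
    repeat that starts on a G is reported as 'GCA'. Partial trailing copies
    are not counted.
    """
    seq = seq.upper()
    hits = []
    for k, n in min_copies.items():
        # ([ACGT]{k}) captures one k-nt motif; \1{n-1,} demands n-1 or more
        # further identical copies immediately after it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif):
                continue  # e.g. 'AA' in a poly-A run is left to the k = 1 pass
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    seq = "TT" + "CAG" * 6 + "GTACGT" + "A" * 12 + "CC"
    print(find_ssrs(seq))  # [(2, 'CAG', 6), (26, 'A', 12)]
```

Tallying the motif lengths of the returned hits (for example with `collections.Counter`) would give per-class counts such as the share of tri-nucleotide repeats; the thresholds chosen strongly affect those counts.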
- Source available from: Arwa Shahin
ABSTRACT: Single nucleotide polymorphism (SNP) markers are rapidly becoming the markers of choice for breeding applications, owing to developments in next-generation sequencing (NGS) technology. For SNP development with NGS technologies, correct assembly of the huge amounts of sequence data generated is essential. Little is known about how assemblers perform, especially on highly heterogeneous species with high genome complexity, or about how differences between assemblies affect SNP retrieval. This study tested two assemblers (CAP3 and CLC) on 454 data from four lily genotypes and compared the results with respect to SNP retrieval. Compared with CLC, CAP3 produced more contigs, fewer reads per contig, and shorter average read lengths. BLAST comparisons showed that CAP3 contigs were highly redundant. By contrast, CLC in rare cases combined paralogs into one contig. Redundant and chimeric contigs may lead to erroneous SNPs. Redundancy can be filtered by BLASTing the selected SNP markers against the contigs and discarding every SNP marker that has more than one BLAST hit. An analysis of chimeric contigs showed that only four of the 2,421 SNP markers had been selected from chimeric contigs. In practice, CLC performs better than CAP3 in assembling highly heterogeneous genome sequences, and SNP retrieval is consequently more efficient. Additionally, a simple workflow for SNP marker retrieval is suggested that can be applied to all non-model species.
BMC Research Notes 01/2012; 5:79.
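The redundancy filter described in this abstract (discard any SNP marker with more than one BLAST hit) can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' pipeline: it assumes BLAST+ tabular output (`-outfmt 6`, where column 11 is the e-value), counts a "hit" as a distinct subject contig so that several HSPs on the same contig count once, and applies a hypothetical e-value cutoff. The name `unique_markers` and the default `max_evalue` are illustrative only.

```python
from collections import defaultdict


def unique_markers(blast_lines, max_evalue=1e-10):
    """Return sorted IDs of SNP markers that hit exactly one contig.

    blast_lines: rows of BLAST+ tabular output (-outfmt 6), one alignment
    (HSP) per row: qseqid, sseqid, ..., evalue (column 11), bitscore.
    max_evalue is a hypothetical cutoff, not a value from the study.
    """
    contigs_hit = defaultdict(set)  # marker ID -> set of distinct contig IDs
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        marker, contig, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            # Several HSPs on the same contig collapse into one hit here.
            contigs_hit[marker].add(contig)
    # Markers matching more than one contig are treated as redundant and
    # discarded; markers with no hit under the cutoff never enter the dict.
    return sorted(m for m, hits in contigs_hit.items() if len(hits) == 1)


if __name__ == "__main__":
    def row(query, subject, evalue):
        # Placeholder zeros for the columns this sketch does not read.
        return "\t".join([query, subject] + ["0"] * 8 + [evalue, "0"])

    rows = [
        row("m1", "contig1", "1e-50"), row("m1", "contig1", "1e-20"),
        row("m2", "contig2", "1e-50"), row("m2", "contig7", "1e-40"),
        row("m3", "contig3", "1e-50"), row("m3", "contig9", "0.5"),
    ]
    print(unique_markers(rows))  # ['m1', 'm3']
```

Dropping markers that have no hit below the cutoff is a design choice of this sketch; a pipeline could instead flag them for manual review.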