Shu Ouyang

Michigan State University, East Lansing, MI, USA

Publications (29) · 218.95 Total impact

  • Article: Comparative analyses reveal distinct sets of lineage-specific genes within Arabidopsis thaliana.
    ABSTRACT: The availability of genome and transcriptome sequences for a number of species permits the identification and characterization of conserved as well as divergent genes such as lineage-specific genes which have no detectable sequence similarity to genes from other lineages. While genes conserved among taxa provide insight into the core processes among species, lineage-specific genes provide insights into evolutionary processes and biological functions that are likely clade or species specific. Comparative analyses using the Arabidopsis thaliana genome and sequences from 178 other species within the Plant Kingdom enabled the identification of 24,624 A. thaliana genes (91.7%) that were termed Evolutionary Conserved (EC) as defined by sequence similarity to a database entry as well as two sets of lineage-specific genes within A. thaliana. One of the A. thaliana lineage-specific gene sets shares sequence similarity only with sequences from species within the Brassicaceae family and is termed Conserved Brassicaceae-Specific Genes (914, 3.4%, CBSG). The other set of A. thaliana lineage-specific genes, the Arabidopsis Lineage-Specific Genes (1,324, 4.9%, ALSG), lacks sequence similarity to any sequence outside A. thaliana. While many CBSGs (76.7%) and ALSGs (52.9%) are transcribed, the majority of the CBSGs (76.1%) and ALSGs (94.4%) have no annotated function. Co-expression analysis indicated significant enrichment of the CBSGs and ALSGs in multiple functional categories suggesting their involvement in a wide range of biological functions. Subcellular localization prediction revealed that the CBSGs were significantly enriched in proteins targeted to the secretory pathway (412, 45.1%).
Among the 107 putatively secreted CBSGs with known functions, 67 encode a putative pollen coat protein or cysteine-rich protein with sequence similarity to the S-locus cysteine-rich protein that is the pollen determinant controlling allele specific pollen rejection in self-incompatible Brassicaceae species. Overall, the ALSGs and CBSGs were more highly methylated in floral tissue compared to the ECs. Single Nucleotide Polymorphism (SNP) analysis showed an elevated ratio of non-synonymous to synonymous SNPs within the ALSGs (1.99) and CBSGs (1.65) relative to the EC set (0.92), mainly caused by an elevated number of non-synonymous SNPs, indicating that they are fast-evolving at the protein sequence level. Our analyses suggest that while a significant fraction of the A. thaliana proteome is conserved within the Plant Kingdom, evolutionarily distinct sets of genes that may function in defining biological processes unique to these lineages have arisen within the Brassicaceae and A. thaliana.
    BMC Evolutionary Biology 02/2010; 10:41. · 3.52 Impact Factor
  • Article: Identification and characterization of pseudogenes in the rice gene complement.
    ABSTRACT: The Osa1 Genome Annotation of rice (Oryza sativa L. ssp. japonica cv. Nipponbare) is the product of a semi-automated pipeline that does not explicitly predict pseudogenes. As such, it is likely to mis-annotate pseudogenes as functional genes. A total of 22,033 gene models within the Osa1 Release 5 were investigated as potential pseudogenes as these genes exhibit at least one feature potentially indicative of pseudogenes: lack of transcript support, short coding region, long untranslated region, or, for genes residing within a segmentally duplicated region, lack of a paralog or significantly shorter corresponding paralog. A total of 1,439 pseudogenes, identified among genes with pseudogene features, were characterized by similarity to fully-supported gene models and the presence of frameshifts or premature translational stop codons. Significant difference in the length of duplicated genes within segmentally-duplicated regions was the optimal indicator of pseudogenization. Among the 816 pseudogenes for which a probable origin could be determined, 75% originated from gene duplication events while 25% were the result of retrotransposition events. A total of 12% of the pseudogenes were expressed. Finally, F-box proteins, BTB/POZ proteins, terpene synthases, chalcone synthases and cytochrome P450 protein families were found to harbor large numbers of pseudogenes. These pseudogenes still have a detectable open reading frame and are thus distinct from pseudogenes detected within intergenic regions which typically lack definable open reading frames. Families containing the highest number of pseudogenes are fast-evolving families involved in ubiquitination and secondary metabolism.
    BMC Genomics 08/2009; 10:317. · 4.07 Impact Factor
  • Article: Plant genome annotation methods.
    ABSTRACT: Annotation of plant genomic sequences can be separated into structural and functional annotation. Structural annotation is the foundation of all genomics as without accurate gene models understanding gene function or evolution of genes across taxa can be impeded. Structural annotation is dependent on sensitive, specific computational programs and deep experimental evidence to identify gene features within genomic DNA. Functional annotation is highly dependent on sequence similarity to other known genes or proteins as the majority of initial "first-pass" functional annotation on a genomic scale is transitive. Coupling structural and functional annotation across genomes in a comparative manner promotes more accurate annotation as well as an understanding of gene and genome evolution. With the increasing availability of plant genome sequence data, the value of comparative annotation will increase. As with any new field, methodologies are evolving for genome annotation and will improve in the future.
    Methods in Molecular Biology (Clifton, N.J.) 02/2009; 513:263-82.
  • Article: Identification of miniature inverted-repeat transposable elements (MITEs) and biogenesis of their siRNAs in the Solanaceae: new functional implications for MITEs.
    ABSTRACT: Small RNAs regulate the genome by guiding transcriptional and post-transcriptional silencing machinery to specific target sequences, including genes and transposable elements (TEs). Although miniature inverted-repeat transposable elements (MITEs) are closely associated with euchromatic genes, the broader functional impact of these short TE insertions in genes is largely unknown. We identified 22 families of MITEs in the Solanaceae (MiS1-MiS22) and found abundant MiS insertions in Solanaceae genomic DNA and expressed sequence tags (EST). Several Solanaceae MITEs generate genome changes that potentially affect gene function and regulation, most notably, a MiS insertion that provides a functionally indispensable alternative exon in the tobacco mosaic virus N resistance gene. We show that MITEs generate small RNAs that are primarily 24 nt in length, as detected by Northern blot hybridization and by sequencing small RNAs of Solanum demissum, Nicotiana glutinosa, and Nicotiana benthamiana. Additionally, we show that stable RNAi lines silencing DICER-LIKE3 (DCL3) in tobacco and RNA-dependent RNA polymerase 2 (RDR2) in potato cause a reduction in 24-nt MITE siRNAs, suggesting that, as in Arabidopsis, TE-derived siRNA biogenesis is DCL3 and RDR2 dependent. We provide evidence that DICER-LIKE4 (DCL4) may also play a role in MITE siRNA generation in the Solanaceae.
    Genome Research 12/2008; 19(1):42-56. · 13.61 Impact Factor
  • Article: Identification and functional analysis of light-responsive unique genes and gene family members in rice.
    ABSTRACT: Functional redundancy limits detailed analysis of genes in many organisms. Here, we report a method to efficiently overcome this obstacle by combining gene expression data with analysis of gene-indexed mutants. Using a rice NSF45K oligo-microarray to compare 2-week-old light- and dark-grown rice leaf tissue, we identified 365 genes that showed significant 8-fold or greater induction in the light relative to dark conditions. We then screened collections of rice T-DNA insertional mutants to identify rice lines with mutations in the strongly light-induced genes. From this analysis, we identified 74 different lines comprising two independent mutant lines for each of 37 light-induced genes. This list was further refined by mining gene expression data to exclude genes that had potential functional redundancy due to co-expressed family members (12 genes) and genes that had inconsistent light responses across other publicly available microarray datasets (five genes). We next characterized the phenotypes of rice lines carrying mutations in ten of the remaining candidate genes and then carried out co-expression analysis associated with these genes. This analysis effectively provided candidate functions for two genes of previously unknown function and for one gene not directly linked to the tested biochemical pathways. These data demonstrate the efficiency of combining gene family-based expression profiles with analyses of insertional mutants to identify novel genes and their functions, even among members of multi-gene families.
    PLoS Genetics 02/2008; 4(8):e1000164. · 8.69 Impact Factor
  • Article: Refinement of light-responsive transcript lists using rice oligonucleotide arrays: evaluation of gene-redundancy.
    ABSTRACT: Studies of gene function are often hampered by gene-redundancy, especially in organisms with large genomes such as rice (Oryza sativa). We present an approach for using transcriptomics data to focus functional studies and address redundancy. To this end, we have constructed and validated an inexpensive and publicly available rice oligonucleotide near-whole genome array, called the rice NSF45K array. We generated expression profiles for light- vs. dark-grown rice leaf tissue and validated the biological significance of the data by analyzing sources of variation and confirming expression trends with reverse transcription polymerase chain reaction. We examined trends in the data by evaluating enrichment of gene ontology terms at multiple false discovery rate thresholds. To compare data generated with the NSF45K array with published results, we developed publicly available, web-based tools (www.ricearray.org). The Oligo and EST Anatomy Viewer enables visualization of EST-based expression profiling data for all genes on the array. The Rice Multi-platform Microarray Search Tool facilitates comparison of gene expression profiles across multiple rice microarray platforms. Finally, we incorporated gene expression and biochemical pathway data to reduce the number of candidate gene products putatively participating in the eight steps of the photorespiration pathway from 52 to 10, based on expression levels of putatively functionally redundant genes. We confirmed the efficacy of this method to cope with redundancy by correctly predicting participation in photorespiration of a gene with five paralogs. Applying these methods will accelerate rice functional genomics.
    PLoS ONE 02/2008; 3(10):e3337. · 4.09 Impact Factor
  • Article: Analysis of 90 Mb of the potato genome reveals conservation of gene structures and order with tomato but divergence in repetitive sequence composition.
    ABSTRACT: The Solanaceae family contains a number of important crop species including potato (Solanum tuberosum), which is grown for its underground storage organ known as a tuber. Although potato is the fourth most important food crop in the world, limited genomic sequence information is currently available for it other than a collection of approximately 220,000 Expressed Sequence Tags, and advances in potato yield and nutrition content would be greatly assisted through access to a complete genome sequence. While morphologically diverse, Solanaceae species such as potato, tomato, pepper, and eggplant share not only genes but also gene order thereby permitting highly informative comparative genomic analyses. In this study, we report on the analysis of 89.9 Mb of potato genomic sequence, representing 10.2% of the genome, generated through end sequencing of a potato bacterial artificial chromosome (BAC) clone library (87 Mb) and sequencing of 22 potato BAC clones (2.9 Mb). The GC content of potato is very similar to Solanum lycopersicon (tomato) and other dicotyledonous species yet distinct from the monocotyledonous grass species, Oryza sativa. Parallel analyses of repetitive sequences in potato and tomato revealed substantial differences in their abundance, 34.2% in potato versus 46.3% in tomato, which is consistent with the increased genome size per haploid genome of these two Solanum species. Specific classes and types of repetitive sequences were also differentially represented between these two species including a telomeric-related repetitive sequence, ribosomal DNA, and a number of unclassified repetitive sequences. Comparative analyses between tomato and potato at the gene level revealed a high level of conservation of gene content, genic feature, and gene order although discordances in synteny were observed.
Genomic level analyses of potato and tomato confirm that gene sequence and gene order are conserved between these solanaceous species and that this conservation can be leveraged in genomic applications including cross-species annotation and genome sequencing initiatives. While tomato and potato share genic features, they differ in their repetitive sequence content and composition suggesting that repetitive sequences may have a more significant role in shaping speciation than previously reported.
    BMC Genomics 02/2008; 9:286. · 4.07 Impact Factor
  • Article: Characterization of paralogous protein families in rice.
    ABSTRACT: High gene numbers in plant genomes reflect polyploidy and major gene duplication events. Oryza sativa, cultivated rice, is a diploid monocotyledonous species with a ~390 Mb genome that has undergone segmental duplication of a substantial portion of its genome. This, coupled with other genetic events such as tandem duplications, has resulted in a substantial number of its genes, and resulting proteins, occurring in paralogous families. Using a computational pipeline that utilizes Pfam and novel protein domains, we characterized paralogous families in rice and compared these with paralogous families in the model dicotyledonous diploid species, Arabidopsis thaliana. Arabidopsis, which has undergone genome duplication as well, has a substantially smaller genome (~120 Mb) and gene complement compared to rice. Overall, 53% and 68% of the non-transposable element-related rice and Arabidopsis proteins could be classified into paralogous protein families, respectively. Singleton and paralogous family genes differed substantially in their likelihood of encoding a protein of known or putative function; 26% and 66% of singleton genes compared to 73% and 96% of the paralogous family genes encode a known or putative protein in rice and Arabidopsis, respectively. Furthermore, a major skew in the distribution of specific gene function was observed; a total of 17 Gene Ontology categories in both rice and Arabidopsis were statistically significant in their differential distribution between paralogous family and singleton proteins. In contrast to mammalian organisms, we found that duplicated genes in rice and Arabidopsis tend to have more alternative splice forms. Using data from Massively Parallel Signature Sequencing, we show that a significant portion of the duplicated genes in rice show divergent expression although a correlation between sequence divergence and correlation of expression could be seen in very young genes. 
Collectively, these data suggest that while co-regulation and conserved function are present in some paralogous protein family members, evolutionary pressures have resulted in functional divergence with differential expression patterns.
    BMC Plant Biology 02/2008; 8:18. · 3.45 Impact Factor
  • Article: Identification and characterization of lineage-specific genes within the Poaceae.
    ABSTRACT: Using the rice (Oryza sativa) sp. japonica genome annotation, along with genomic sequence and clustered transcript assemblies from 184 species in the plant kingdom, we have identified a set of 861 rice genes that are evolutionarily conserved among six diverse species within the Poaceae yet lack significant sequence similarity with plant species outside the Poaceae. This set of evolutionarily conserved and lineage-specific rice genes is termed conserved Poaceae-specific genes (CPSGs) to reflect the presence of significant sequence similarity across three separate Poaceae subfamilies. The vast majority of rice CPSGs (86.6%) encode proteins with no putative function or functionally characterized protein domain. For the remaining CPSGs, 8.8% encode an F-box domain-containing protein and 4.5% encode a protein with a putative function. On average, the CPSGs have fewer exons, shorter total gene length, and elevated GC content when compared with genes annotated as either transposable elements (TEs) or those genes having significant sequence similarity in a species outside the Poaceae. Multiple sequence alignments of the CPSGs with sequences from other Poaceae species show conservation across a putative domain, a novel domain, or the entire coding length of the protein. At the genome level, syntenic alignments between sorghum (Sorghum bicolor) and 103 of the 861 rice CPSGs (12.0%) could be made, demonstrating an additional level of conservation for this set of genes within the Poaceae. The extensive sequence similarity in evolutionarily distinct species within the Poaceae family and an additional screen for TE-related structural characteristics and sequence discounts these CPSGs as being misannotated TEs. Collectively, these data confirm that we have identified a specific set of genes that are highly conserved within, as well as specific to, the Poaceae.
    Plant Physiology 01/2008; 145(4):1311-22. · 6.53 Impact Factor
  • Article: Phenotypic and transcriptomic changes associated with potato autopolyploidization.
    ABSTRACT: Polyploidy is remarkably common in the plant kingdom and polyploidization is a major driving force for plant genome evolution. Polyploids may contain genomes from different parental species (allopolyploidy) or include multiple sets of the same genome (autopolyploidy). Genetic and epigenetic changes associated with allopolyploidization have been a major research subject in recent years. However, we know little about the genetic impact imposed by autopolyploidization. We developed a synthetic autopolyploid series in potato (Solanum phureja) that includes one monoploid (1x) clone, two diploid (2x) clones, and one tetraploid (4x) clone. Cell size and organ thickness were positively correlated with the ploidy level. However, the 2x plants were generally the most vigorous and the 1x plants exhibited less vigor compared to the 2x and 4x individuals. We analyzed the transcriptomic variation associated with this autopolyploid series using a potato cDNA microarray containing approximately 9000 genes. Statistically significant expression changes were observed among the ploidies for approximately 10% of the genes in both leaflet and root tip tissues. However, most changes were associated with the monoploid and were within the twofold level. Thus, alteration of ploidy caused subtle expression changes of a substantial percentage of genes in the potato genome. We demonstrated that there are few genes, if any, whose expression is linearly correlated with the ploidy and can be dramatically changed because of ploidy alteration.
    Genetics 09/2007; 176(4):2055-67. · 4.01 Impact Factor
  • Article: The rice kinase database. A phylogenomic database for the rice kinome.
    ABSTRACT: The rice (Oryza sativa) genome contains 1,429 protein kinases, the vast majority of which have unknown functions. We created a phylogenomic database (http://rkd.ucdavis.edu) to facilitate functional analysis of this large gene family. Sequence and genomic data, including gene expression data and protein-protein interaction maps, can be displayed for each selected kinase in the context of a phylogenetic tree allowing for comparative analysis both within and between large kinase subfamilies. Interaction maps are easily accessed through links and displayed using Cytoscape, an open source software platform. Chromosomal distribution of all rice kinases can also be explored via an interactive interface.
    Plant Physiology 03/2007; 143(2):579-86. · 6.53 Impact Factor
  • Article: The TIGR Rice Genome Annotation Resource: improvements and new features.
    ABSTRACT: In The Institute for Genomic Research Rice Genome Annotation project (http://rice.tigr.org), we have continued to update the rice genome sequence with new data and improve the quality of the annotation. In our current release of annotation (Release 4.0; January 12, 2006), we have identified 42,653 non-transposable element-related genes encoding 49,472 gene models as a result of the detection of alternative splicing. We have refined our identification methods for transposable element-related genes resulting in 13,237 genes that are related to transposable elements. Through incorporation of multiple transcript and proteomic expression data sets, we have been able to annotate 24,799 genes (31,739 gene models), representing approximately 50% of the total gene models, as expressed in the rice genome. All structural and functional annotation is viewable through our Rice Genome Browser which currently supports 59 tracks. Enhanced data access is available through web interfaces, FTP downloads and a Data Extractor tool developed in order to support discrete dataset downloads.
    Nucleic Acids Research 02/2007; 35(Database issue):D883-7. · 8.03 Impact Factor
  • Article: Expressed sequence tags from loblolly pine embryos reveal similarities with angiosperm embryogenesis.
    ABSTRACT: The process of embryogenesis in gymnosperms differs in significant ways from the more widely studied process in angiosperms. To further our understanding of embryogenesis in gymnosperms, we have generated Expressed Sequence Tags (ESTs) from four cDNA libraries constructed from un-normalized, normalized, and subtracted RNA populations of zygotic and somatic embryos of loblolly pine (Pinus taeda L.). A total of 68,721 ESTs were generated from 68,131 cDNA clones. Following clustering and assembly, these sequences collapsed into 5,274 contigs and 6,880 singleton sequences for a total of 12,154 non-redundant sequences. Searches of a non-identical amino acid database revealed a putative homolog for 9,189 sequences, leaving 2,965 sequences with no known function. More extensive searches of additional plant sequence data sets revealed a putative homolog for all but 1,388 (11.4%) of the sequences. Using gene ontologies, a known function could be assigned for 5,495 of the 12,154 total non-redundant sequences with 13,633 associations in total assigned. When compared to approximately 72,000 sequences in a collated P. taeda transcript assembly derived from >245,000 ESTs derived from root, xylem, stem, needles, pollen cone, and shoot ESTs, 3,458 (28.5%) of the non-redundant embryo sequences were unique and thereby provide a valuable addition to development of a complete loblolly pine transcriptome. To assess similarities between angiosperm and gymnosperm embryo development, we examined our EST collection for putative homologs of angiosperm genes implicated in embryogenesis. Out of 108 angiosperm embryogenesis-related genes, homologs were present for 83 of these genes suggesting that pine contains similar genes for embryogenesis and that our RNA sampling methods were successful. We also identified sequences from the pine embryo transcriptome that have no known function and may contribute to the programming of gene expression and embryo development.
    Plant Molecular Biology 12/2006; 62(4-5):485-501. · 4.15 Impact Factor
  • Article: Genomic and genetic characterization of rice Cen3 reveals extensive transcription and evolutionary implications of a complex centromere.
    ABSTRACT: The centromere is the chromosomal site for assembly of the kinetochore where spindle fibers attach during cell division. In most multicellular eukaryotes, centromeres are composed of long tracts of satellite repeats that are recalcitrant to sequencing and fine-scale genetic mapping. Here, we report the genomic and genetic characterization of the complete centromere of rice (Oryza sativa) chromosome 3. Using a DNA fiber-fluorescence in situ hybridization approach, we demonstrated that the centromere of chromosome 3 (Cen3) contains approximately 441 kb of the centromeric satellite repeat CentO. Cen3 includes an approximately 1,881-kb domain associated with the centromeric histone CENH3. This CENH3-associated chromatin domain is embedded within a 3,113-kb region that lacks genetic recombination. Extensive transcription was detected within the CENH3 binding domain based on comprehensive annotation of protein-coding genes coupled with empirical measurements of mRNA levels using RT-PCR and massively parallel signature sequencing. Genes <10 kb from the CentO satellite array were expressed in several rice tissues and displayed histone modification patterns consistent with euchromatin, suggesting that rice centromeric chromatin accommodates normal gene expression. These results support the hypothesis that centromeres can evolve from gene-containing genomic regions.
    The Plant Cell 09/2006; 18(9):2123-33. · 8.99 Impact Factor
  • Article: Transcription and histone modifications in the recombination-free region spanning a rice centromere.
    ABSTRACT: Centromeres are sites of spindle attachment for chromosome segregation. During meiosis, recombination is absent at centromeres and surrounding regions. To understand the molecular basis for recombination suppression, we have comprehensively annotated the 3.5-Mb region that spans a fully sequenced rice centromere. Although transcriptional analysis showed that the 750-kb CENH3-containing core is relatively deficient in genes, the recombination-free region differs little in gene density from flanking regions that recombine. Likewise, the density of transposable elements is similar between the recombination-free region and flanking regions. We also measured levels of histone H4 acetylation and histone H3 methylation at 176 genes within the 3.5-Mb span. Active genes showed enrichment of H4 acetylation and H3K4 dimethylation as expected, including genes within the core. Our inability to detect sequence or histone modification features that distinguish recombination-free regions from flanking regions that recombine suggests that recombination suppression is an epigenetic feature of centromeres maintained by the assembly of CENH3-containing nucleosomes within the core. CENH3-containing centrochromatin does not appear to be distinguished by a unique combination of H3 and H4 modifications. Rather, the varied distribution of histone modifications might reflect the composition and abundance of sequence elements that inhabit centromeric DNA.
    The Plant Cell 01/2006; 17(12):3227-38. · 8.99 Impact Factor
  • Article: Sequence, annotation, and analysis of synteny between rice chromosome 3 and diverged grass species.
    ABSTRACT: Rice (Oryza sativa L.) chromosome 3 is evolutionarily conserved across the cultivated cereals and shares large blocks of synteny with maize and sorghum, which diverged from rice more than 50 million years ago. To begin to completely understand this chromosome, we sequenced, finished, and annotated 36.1 Mb (approximately 97%) from O. sativa subsp. japonica cv Nipponbare. Annotation features of the chromosome include 5915 genes, of which 913 are related to transposable elements. A putative function could be assigned to 3064 genes, with another 757 genes annotated as expressed, leaving 2094 that encode hypothetical proteins. Similarity searches against the proteome of Arabidopsis thaliana revealed putative homologs for 67% of the chromosome 3 proteins. Further searches of a nonredundant amino acid database, the Pfam domain database, plant Expressed Sequence Tags, and genomic assemblies from sorghum and maize revealed only 853 nontransposable element related proteins from chromosome 3 that lacked similarity to other known sequences. Interestingly, 426 of these have a paralog within the rice genome. A comparative physical map of the wild progenitor species, Oryza nivara, with japonica chromosome 3 revealed a high degree of sequence identity and synteny between these two species, which diverged approximately 10,000 years ago. Although no major rearrangements were detected, the deduced size of the O. nivara chromosome 3 was 21% smaller than that of japonica. Synteny between rice and other cereals using an integrated maize physical map and wheat genetic map was strikingly high, further supporting the use of rice and, in particular, chromosome 3, as a model for comparative studies among the cereals.
    Genome Research 10/2005; 15(9):1284-91. · 13.61 Impact Factor
  • Article: Analyzing the potato abiotic stress transcriptome using expressed sequence tags.
    ABSTRACT: To further increase our understanding of responses in potato to abiotic stress and the potato transcriptome in general, we generated 20 756 expressed sequence tags (ESTs) from a cDNA library constructed by pooling mRNA from heat-, cold-, salt-, and drought-stressed potato leaves and roots. These ESTs were clustered and assembled into a collection of 5240 unique sequences with 3344 contigs and 1896 singleton ESTs. Assignment of gene ontology terms (GOSlim/Plant) to the sequences revealed that 8101 assignments could be made with a total of 3863 molecular function assignments. Alignment to a set of 78 825 ESTs from other potato cDNA libraries derived from root, leaf, stolon, tuber, germinating eye, and callus tissues revealed 1476 sequences unique to abiotic stressed potato leaf and root tissue. Sequences present within the 5240 sequence set had similarity to genes known to be involved in abiotic stress responses in other plant species such as transcription factors, stress response genes, and signal transduction processes. In addition, we identified a number of genes unique to the abiotic stress library with unknown function, providing new candidate genes for investigation of abiotic stress responses in potato.
    Genome 09/2005; 48(4):598-605. · 1.65 Impact Factor
  • Article: The Institute for Genomic Research Osa1 rice genome annotation database.
    ABSTRACT: We have developed a rice (Oryza sativa) genome annotation database (Osa1) that provides structural and functional annotation for this emerging model species. Using the sequence of O. sativa subsp. japonica cv Nipponbare from the International Rice Genome Sequencing Project, pseudomolecules, or virtual contigs, of the 12 rice chromosomes were constructed. Our most recent release, version 3, represents our third build of the pseudomolecules and is composed of 98% finished sequence. Genes were identified using a series of computational methods developed for Arabidopsis (Arabidopsis thaliana) that were modified for use with the rice genome. In release 3 of our annotation, we identified 57,915 genes, of which 14,196 are related to transposable elements. Of these 43,719 non-transposable element-related genes, 18,545 (42.4%) were annotated with a putative function, 5,777 (13.2%) were annotated as encoding an expressed protein with no known function, and the remaining 19,397 (44.4%) were annotated as encoding a hypothetical protein. Multiple splice forms (5,873) were detected for 2,538 genes, resulting in a total of 61,250 gene models in the rice genome. We incorporated experimental evidence into 18,252 gene models to improve the quality of the structural annotation. A series of functional data types has been annotated for the rice genome that includes alignment with genetic markers, assignment of gene ontologies, identification of flanking sequence tags, alignment with homologs from related species, and syntenic mapping with other cereal species. All structural and functional annotation data are available through interactive search and display windows as well as through download of flat files. To integrate the data with other genome projects, the annotation data are available through a Distributed Annotation System and a Genome Browser. All data can be obtained through the project Web pages at http://rice.tigr.org.
    Plant Physiology 06/2005; 138(1):18-26. · 6.53 Impact Factor
  • Article: Structure, divergence, and distribution of the CRR centromeric retrotransposon family in rice.
    ABSTRACT: The centromeric retrotransposon (CR) family in the grass species is one of few Ty3-gypsy groups of retroelements that preferentially transpose into highly specialized chromosomal domains. It has been demonstrated in both rice and maize that CRR (CR of rice) and CRM (CR of maize) elements are intermingled with centromeric satellite DNA and are highly concentrated within cytologically defined centromeres. We collected all of the CRR elements from rice chromosomes 1, 4, 8, and 10 that have been sequenced to high quality. Phylogenetic analysis revealed that the CRR elements are structurally diverged into four subfamilies, including two autonomous subfamilies (CRR1 and CRR2) and two nonautonomous subfamilies (noaCRR1 and noaCRR2). The CRR1/CRR2 elements contain all characteristic protein domains required for retrotransposition. In contrast, the noaCRR elements have different structures, containing only a gag or gag-pro domain or no open reading frames. The CRR and noaCRR elements share substantial sequence similarity in regions required for DNA replication and for recognition by integrase during retrotransposition. These data, coupled with the presence of young noaCRR elements in the rice genome and similar chromosomal distribution patterns between noaCRR1 and CRR1/CRR2 elements, suggest that the noaCRR elements were likely mobilized through the retrotransposition machinery from the autonomous CRR elements. Mechanisms of the targeting specificity of the CRR elements, as well as their role in centromere function, are discussed.
    Molecular Biology and Evolution 05/2005; 22(4):845-55. · 5.55 Impact Factor

Institutions

  • 2008–2010
    • Michigan State University
      • Department of Plant Biology
      East Lansing, MI, USA
  • 2008–2009
    • J. Craig Venter Institute
      Rockville, MD, USA
  • 2003–2008
    • St Joseph Medical Center (MD, USA)
      Towson, MD, USA
  • 2002–2007
    • University of Wisconsin, Madison
      • Department of Horticulture
      Madison, WI, USA
  • 2006
    • Georgia Institute of Technology
      • Institute of Paper Science and Technology
      Atlanta, GA, USA