ABSTRACT: Expressed sequence tags (ESTs) for the basal angiosperms Amborella trichopoda (Amborellaceae) and the water lily Nuphar advena (Nymphaeaceae) have proven valuable in the identification of gene pairs to study the timing of duplication events relative to the most recent common ancestor (MRCA) of all extant angiosperms. Here we discuss how ESTs for these taxa are also useful for deducing gene families that were present in the MRCA of all flowering plants. For example, 4,572 gene clusters identified in an analysis of the rice and Arabidopsis proteomes contained putative orthologs of Amborella or Nuphar. Homologs of many developmentally important genes were identified from Amborella or Nuphar in these gene clusters. This number of ancestral genes is expected to increase as the number of Amborella and water lily ESTs increases. Genes found unduplicated in the rice and Arabidopsis genomes may be especially useful for phylogenetic analyses including diverse angiosperm lineages. We identify 595 of these single-copy genes with putative orthologs in Amborella or Nuphar. Phylogenetic analysis of one of these nuclear single-copy genes, encoding the enzyme carboxymethylenebutenolidase, yields a topology that places Amborella and Nuphar as the sisters to all other extant lineages of angiosperms and is generally consistent with current angiosperm phylogenies based mainly on chloroplast, mitochondrial, and nuclear ribosomal sequences.
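The single-copy filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: each gene cluster is assumed to be stored as a list of taxon labels (one entry per member sequence), and the labels and the function name `single_copy_with_basal` are hypothetical.

```python
from typing import Dict, List

# Hypothetical taxon labels; the abstract does not describe the actual data format.
RICE, ARABIDOPSIS = "rice", "arabidopsis"
BASAL = {"amborella", "nuphar"}


def single_copy_with_basal(clusters: Dict[str, List[str]]) -> List[str]:
    """Return IDs of clusters with exactly one rice gene, exactly one
    Arabidopsis gene, and at least one Amborella or Nuphar sequence."""
    selected = []
    for cluster_id, taxa in clusters.items():
        unduplicated = taxa.count(RICE) == 1 and taxa.count(ARABIDOPSIS) == 1
        has_basal = any(t in BASAL for t in taxa)
        if unduplicated and has_basal:
            selected.append(cluster_id)
    return selected


clusters = {
    "c1": ["rice", "arabidopsis", "amborella"],        # kept
    "c2": ["rice", "rice", "arabidopsis", "nuphar"],   # duplicated in rice
    "c3": ["rice", "arabidopsis"],                     # no basal ortholog
}
print(single_copy_with_basal(clusters))
```

With the toy clusters above, only `c1` passes both conditions; `c2` fails because rice carries two copies, and `c3` lacks an Amborella or Nuphar member.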
ABSTRACT: Papaya, a fruit crop cultivated in tropical and subtropical regions, is known for its nutritional benefits and medicinal applications. Here we report a 3x draft genome sequence of 'SunUp' papaya, the first commercial virus-resistant transgenic fruit tree to be sequenced. The papaya genome is three times the size of the Arabidopsis genome, but contains fewer genes, including significantly fewer disease-resistance gene analogues. Comparison of the five sequenced genomes suggests a minimal angiosperm gene set of 13,311. A lack of recent genome duplication, atypical of other angiosperm genomes sequenced so far, may account for the smaller papaya gene number in most functional groups. Nonetheless, striking amplifications in gene number within particular functional groups suggest roles in the evolution of tree-like habit, deposition and remobilization of starch reserves, attraction of seed dispersal agents, and adaptation to tropical daylengths. Transgenesis at three locations is closely associated with chloroplast insertions into the nuclear genome, and with topoisomerase I recognition sites. Papaya offers numerous advantages as a system for fruit-tree functional genomics, and this draft genome sequence provides the foundation for revealing the basis of Carica's distinguishing morpho-physiological, medicinal and nutritional properties.
ABSTRACT: MicroRNAs (miRNAs) negatively control gene expression by cleaving or inhibiting the translation of mRNA of target genes, and as such, they play an important role in plant development. Of the 79 plant miRNA families discovered to date, most are from the fully sequenced plant genomes of Arabidopsis, Populus and rice. Here, we identified miRNAs from leaves, roots, stems and flowers at different developmental stages of the basal eudicot species Eschscholzia californica (California poppy) using cloning and capillary sequencing, as well as ultrahigh-throughput pyrosequencing using the recently introduced 454 sequencing method. In total, we identified a minimum of 173 unique miRNA sequences belonging to 28 miRNA families and seven trans-acting small interfering RNAs (ta-siRNAs) conserved in eudicot and monocot species. miR529 and miR537, which have not yet been reported in eudicot species, were detected in California poppy; loci encoding these miRNAs were also found in Arabidopsis and Populus. miR535, which occurs in the moss Physcomitrella patens, was also detected in California poppy, but not in other angiosperms. Several potential miRNA targets were found in cDNA sequences of California poppy. Predicted target genes include transcription factors but also genes implicated in various metabolic processes and in stress defense. Comparative analysis of miRNAs from plants of phylogenetically critical basal lineages aids the study of the evolutionary gains and losses of miRNAs in plants, as well as their conservation, and leads to discoveries about the miRNAs of even well-studied model organisms.
Article · Oct 2007 · The Plant Journal
ABSTRACT: Comparative genomics approaches are proving to be extremely valuable for the study of gene function, gene duplications, and genome evolution. In this chapter we discuss how cross‐species comparisons of gene sequences and gene‐expression patterns are elucidating the evolution of many plant processes including the regulation of reproduction. Emphasis is placed on the implications of gene and genome duplications for the evolution of genome structure and plant reproduction. In addition, we show that comparative analyses can both promote transfer of knowledge from model to nonmodel systems and inform our understanding of conserved processes in model species.
Article · Dec 2006 · Advances in Botanical Research
ABSTRACT: We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite
ABSTRACT: In the eight years since phylogenomics was introduced as the intersection of genomics and phylogenetics, the field has provided fundamental insights into gene function, genome history and organismal relationships. The utility of phylogenomics is growing with the increase in the number and diversity of taxa for which whole-genome and large transcriptome sequence sets are being generated. We assert that the synergy between genomic and phylogenetic perspectives in comparative biology would be enhanced by the development and refinement of minimal reporting standards for phylogenetic analyses. Encouraged by the development of the Minimum Information About a Microarray Experiment (MIAME) standard, we propose a similar roadmap for the development of a Minimal Information About a Phylogenetic Analysis (MIAPA) standard. Key to the successful development and implementation of such a standard will be broad participation by developers of phylogenetic analysis software, phylogenetic database developers, practitioners of phylogenomics, and journal editors.
Article · Feb 2006 · Omics A Journal of Integrative Biology
ABSTRACT: The Chloroplast Genome Database (ChloroplastDB) is an interactive, web-based database for fully sequenced plastid genomes, containing genomic, protein, DNA and RNA sequences, gene locations, RNA-editing sites, putative protein families and alignments (http://chloroplast.cbio.psu.edu/). With recent technical advances, the rate of generating new organelle genomes has increased dramatically. However, the established ontology for chloroplast genes and gene features has not been uniformly applied to all chloroplast genomes available in the sequence databases. For example, annotations for some published genome sequences have not been updated to follow current gene naming conventions. ChloroplastDB provides unified annotations, gene name search, BLAST and download functions for chloroplast-encoded genes and genomic sequences. A user can retrieve all orthologous sequences with one search regardless of gene names in GenBank. This feature alone greatly facilitates comparative research on sequence evolution, including changes in gene content, codon usage, gene structure and post-transcriptional modifications such as RNA editing. Orthologous protein sets are classified by TribeMCL and each set is assigned a standard gene name. Over the next few years, as the number of sequenced chloroplast genomes increases rapidly, the tools available in ChloroplastDB will allow researchers to easily identify and compile target data for comparative analysis of chloroplast genes and genomes.
Article · Feb 2006 · Nucleic Acids Research
ABSTRACT: The combined processes of gene duplication, nucleotide substitution, domain duplication, and intron/exon shuffling can generate a complex set of related genes that may differ substantially in their expression patterns and functions. The APETALA2-like (AP2-like) gene family exhibits patterns of both gene and domain duplication, coupled with changes in sequence, exon arrangement, and expression. In angiosperms, these genes perform an array of functions including the establishment of the floral meristem, the specification of floral organ identity, the regulation of floral homeotic gene expression, the regulation of ovule development, and the growth of floral organs. To determine patterns of gene diversification, we conducted a series of broad phylogenetic analyses of AP2-like sequences from green plants. These studies indicate that the AP2 domain was duplicated prior to the divergence of the two major lineages of AP2-like genes, euAP2 and AINTEGUMENTA (ANT). Structural features of the AP2-like genes as well as phylogenetic analyses of nucleotide and amino acid (aa) sequences of the AP2-like gene family support the presence of the two major lineages. The ANT lineage is supported by a 10-aa insertion in the AP2-R1 domain and a 1-aa insertion in the AP2-R2 domain, relative to all other members of the AP2-like family. MicroRNA172-binding sequences, the function of which has been studied in some of the AP2-like genes in Arabidopsis, are restricted to the euAP2 lineage. Within the ANT lineage, the euANT lineage is characterized by four conserved motifs: one in the 10-aa insertion in the AP2-R1 domain (euANT1) and three in the predomain region (euANT2, euANT3, and euANT4). Our expression studies show that the euAP2 homologue from Amborella trichopoda, the putative sister to all other angiosperms, is expressed in all floral organs as well as leaves.
Article · Feb 2006 · Molecular Biology and Evolution
ABSTRACT: MOTIVATION: The gene expression intensity information conveyed by Expressed Sequence Tag (EST) data can be used to infer important cDNA library properties, such as gene number and expression patterns. However, EST clustering errors, which often lead to greatly inflated estimates of the number of unique genes obtained, have become a major obstacle in these analyses. The EST clustering error structure, the relationship between clustering error and clustering criteria, and possible error correction methods need to be systematically investigated. RESULTS: We identify and quantify two types of error, Type I and Type II, in EST clustering with the CAP3 assembly program. A Type I error occurs when ESTs from the same gene do not form a cluster, whereas a Type II error occurs when ESTs from distinct genes are falsely clustered together. While the Type II error rate is <1.5% for both 5' and 3' EST clustering, the Type I error rate in the 5' EST case is approximately 10 times higher than in the 3' EST case (30% versus 3%). An over-stringent identity rule (e.g., P ≥ 95%) may even inflate the Type I error in both cases. We demonstrate that approximately 80% of the Type I error in 5' EST clustering is due to insufficient overlap among sibling ESTs (ISO error). A novel statistical approach is proposed to correct ISO error and provide more accurate estimates of the true gene cluster profile.
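The two error types defined above can be illustrated with a small sketch. It assumes the true source gene of each EST is known (as in a simulation or benchmark set) and that cluster assignments have already been produced, for example by CAP3. The rate definitions used here (fraction of genes split across clusters for Type I, fraction of clusters merging several genes for Type II) and the function name `clustering_error_rates` are illustrative assumptions, not necessarily the exact measures used in the study.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def clustering_error_rates(true_gene: Dict[str, str],
                           cluster: Dict[str, str]) -> Tuple[float, float]:
    """Map each EST ID to its true gene and its assigned cluster.

    Assumed definitions (hypothetical):
      Type I rate  = fraction of genes whose ESTs are split over >1 cluster
      Type II rate = fraction of clusters containing ESTs from >1 gene
    """
    clusters_per_gene: Dict[str, Set[str]] = defaultdict(set)
    genes_per_cluster: Dict[str, Set[str]] = defaultdict(set)
    for est, gene in true_gene.items():
        c = cluster[est]
        clusters_per_gene[gene].add(c)
        genes_per_cluster[c].add(gene)

    split_genes = sum(len(cs) > 1 for cs in clusters_per_gene.values())
    merged_clusters = sum(len(gs) > 1 for gs in genes_per_cluster.values())
    return (split_genes / len(clusters_per_gene),
            merged_clusters / len(genes_per_cluster))


# Toy data: gene gA is split across two clusters (Type I);
# genes gB and gC are merged into one cluster (Type II).
true_gene = {"e1": "gA", "e2": "gA", "e3": "gB", "e4": "gC"}
cluster = {"e1": "k1", "e2": "k2", "e3": "k3", "e4": "k3"}
type1, type2 = clustering_error_rates(true_gene, cluster)
print(type1, type2)
```

In the toy example, one of three genes is split and one of three clusters is merged, so both rates come out to one third. Under this framing, an overly stringent identity threshold tends to split more sibling ESTs (raising the Type I rate), which matches the direction described in the abstract.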