ABSTRACT: Similar to Arabidopsis thaliana, the wild soybeans (Glycine soja) and many cultivars exhibit indeterminate stem growth specified by the shoot identity gene Dt1, the functional counterpart of Arabidopsis TERMINAL FLOWER1 (TFL1). Mutations in TFL1 and Dt1 both result in the shoot apical meristem (SAM) switching from vegetative to reproductive state to initiate terminal flowering and thus produce determinate stems. A second soybean gene (Dt2) regulating stem growth was identified, which, in the presence of Dt1, produces semideterminate plants with terminal racemes similar to those observed in determinate plants. Here, we report positional cloning and characterization of Dt2, a dominant MADS domain factor gene classified into the APETALA1/SQUAMOSA (AP1/SQUA) subfamily that includes floral meristem (FM) identity genes AP1, FUL, and CAL in Arabidopsis. Unlike AP1, whose expression is limited to FMs in which the expression of TFL1 is repressed, Dt2 appears to repress the expression of Dt1 in the SAMs to promote early conversion of the SAMs into reproductive inflorescences. Given that Dt2 is not the gene most closely related to AP1 and that semideterminacy is rarely seen in wild soybeans, Dt2 appears to be a recent gain-of-function mutation, which has modified the genetic pathways determining the stem growth habit in soybean.
ABSTRACT: Alternative splicing (AS) is common in higher eukaryotes and plays an important role in gene posttranscriptional regulation. It has been suggested that AS varies dramatically among species, tissues, and duplicated gene families of different sizes. However, the genomic forces that govern AS variation remain poorly understood. Here, through genome-wide identification of AS events in the soybean (Glycine max) genome using high-throughput RNA sequencing of 28 samples from different developmental stages, we found that more than 63% of multiexonic genes underwent AS. More AS events occurred in the younger developmental stages than in the older developmental stages for the same type of tissue, and the four main AS types (exon skipping, intron retention, alternative donor sites, and alternative acceptor sites) exhibited different characteristics. Global computational analysis demonstrated that the variations in AS frequency and AS types were significantly correlated with changes in gene features and gene transcriptional level. Further investigation suggested that the decrease of AS within the genome-wide duplicated genes was due to the diminution of intron length, exon number, and transcriptional level. Altogether, our study revealed that a large number of genes were alternatively spliced in the soybean genome and that variations in gene structure and transcriptional level may play important roles in regulating AS.
ABSTRACT: Polyploidy is a common phenomenon, particularly in plants. The soybean (Glycine max [L.] Merr.) genome has undergone two whole genome duplication (WGD) events. The conservation and divergence of duplicated gene pairs are major contributors to genome evolution. D1 and D2 are two unlinked, paralogous nuclear genes whose double-recessive mutant (d1d1d2d2) shows chlorophyll retention, called 'stay-green'. Through molecular cloning and functional analyses, we demonstrated that D1 and D2 are homologs of the STAY-GREEN (SGR) genes from other plant species and were duplicated as a result of the most recent WGD in soybean. Transcriptional analysis showed that both D1 and D2 were more highly expressed in older tissues, and chlorophyll degradation and programmed cell death-related genes were suppressed in a d1d2 double mutant, indicating that these genes are likely involved in early stages of tissue senescence. Investigation of genes flanking D1 and D2 revealed that evolution within collinear duplicated blocks may affect the conservation of individual gene pairs within the blocks. Moreover, we found that the insertion of a long terminal repeat retrotransposon, GmD2IN, resulted in the d2 mutation. Further analysis of this retrotransposon family showed that insertion in or near coding regions can affect gene expression or splicing patterns, and may be an important force promoting the divergence of duplicated gene pairs.
ABSTRACT: miRNA genes are thought to undergo rapid birth and death processes in genomes, and the emergence of MIRNA-like hairpins provides the basis for functional miRNA gene formation. However, the factors affecting the formation of an active miRNA gene from a MIRNA-like hairpin within a genome remain unclear. We performed a genome-wide investigation of MIRNA-like hairpin accumulation, expression, structural changes and relationships with annotated genomic features in the paleopolyploid soybean genome. Our results showed that adjacent gene and transposable element content and rates of genetic recombination at the location of emergence, along with divergence in the miRNA gene's own structure, greatly affected miRNA gene evolution. Further investigation suggested that miRNA genes from different duplication sources followed distinct evolutionary trajectories and that the accumulation of MIRNA-like hairpins might be a major factor causing LTR-RTs to lose activity during genome evolution.
ABSTRACT: Preferential accumulation of transposable elements (TEs), particularly long terminal repeat retrotransposons (LTR-RTs), in recombination-suppressed pericentromeric regions seems to be a general pattern of TE distribution in flowering plants. However, whether such a pattern was formed primarily by preferential TE insertions into pericentromeric regions or by selection against TE insertions into euchromatin remains obscure. We recently investigated TE insertions in 31 resequenced wild and cultivated soybean (Glycine max) genomes and detected 34,154 unique nonreference TE insertions mappable to the reference genome. Our data revealed consistent distribution patterns of the nonreference LTR-RT insertions and those present in the reference genome, whereas the distribution patterns of the nonreference DNA TE insertions and the accumulated ones were significantly different. The densities of the nonreference LTR-RT insertions were found to negatively correlate with the rates of local genetic recombination, but no significant correlation between the densities of nonreference DNA TE insertions and the rates of local genetic recombination was detected. These observations suggest that distinct insertional preferences were primary factors that resulted in different levels of effectiveness of purifying selection, perhaps as an effect of local genomic features, such as recombination rates and gene densities that reshaped the distribution patterns of LTR-RTs and DNA TEs in soybean.
ABSTRACT: The evolutionary forces that govern the divergence and retention of duplicated genes in polyploids are poorly understood. In this study, we first investigated the rates of nonsynonymous substitution (Ka) and the rates of synonymous substitution (Ks) for a nearly complete set of genes in the paleopolyploid soybean (Glycine max) by comparing the orthologs between soybean and its progenitor species Glycine soja and then compared the patterns of gene divergence and expression between pericentromeric regions and chromosomal arms in different gene categories. Our results reveal strong associations between duplication status and Ka and gene expression levels and overall low Ks and low levels of gene expression in pericentromeric regions. It is theorized that deleterious mutations can easily accumulate in recombination-suppressed regions because of Hill-Robertson effects. Intriguingly, the genes in pericentromeric regions, the cold spots for meiotic recombination in soybean, showed significantly lower Ka and higher levels of expression than their homoeologs in chromosomal arms. This asymmetric evolution of two members of individual whole genome duplication (WGD)-derived gene pairs, echoing the biased accumulation of singletons in pericentromeric regions, suggests that distinct genomic features between the two distinct chromatin types are important determinants shaping the patterns of divergence and retention of WGD-derived genes.
The Plant Cell 01/2012; 24(1):21-32. · 9.25 Impact Factor
ABSTRACT: Extensive DNA rearrangement of genic colinearity, as revealed by comparison of orthologous genomic regions, has been shown to be a general concept describing evolutionary dynamics of plant genomes. However, the nature, timing, lineages and adaptation of local genomic rearrangement in closely related species (e.g., within a genus) and haplotype variation of genomic rearrangement within populations have not been well documented.
We previously identified a hotspot for genic rearrangement and transposon accumulation in the Orp region of Asian rice (Oryza sativa, AA) by comparison with its orthologous region in sorghum. Here, we report the comparative analysis of this region with its orthologous regions in the wild progenitor species (O. nivara, AA) of Asian rice and African rice (O. glaberrima) using the BB genome Oryza species (O. punctata) as an outgroup, and investigation of transposon insertion sites and a segmental inversion event in the AA genomes at the population level. We found that the Orp region was primarily and recently expanded in the Asian rice species O. sativa and O. nivara. LTR-retrotransposons shared by the three AA-genomic regions have been fixed in all the 94 varieties that represent different populations of the AA-genome species/subspecies, indicating their adaptive role in genome differentiation. However, LTR-retrotransposons unique to either O. nivara or O. sativa regions exhibited dramatic haplotype variation regarding their presence or absence between or within populations/subpopulations.
The LTR-retrotransposon insertion hotspot in the Orp region was formed recently, independently and concurrently in different AA-genome species, and the genic rearrangements detected in different species appear to be differentially triggered by transposable elements. This region is located near the end of the short arm of chromosome 8 and contains a high proportion of LTR-retrotransposons similar to that observed in the centromeric region of this same chromosome, and thus may represent a genomic region that has recently switched from euchromatic to heterochromatic states. The haplotype variation of LTR-retrotransposon insertions within this region reveals substantial admixture among various subpopulations as established by molecular markers at the whole genome level, and can be used to develop retrotransposon junction markers for simple and rapid classification of O. sativa germplasm.
ABSTRACT: The availability of complete or nearly complete genome sequences from several plant species permits detailed discovery and cross-species comparison of transposable elements (TEs) at the whole genome level. We initially investigated 510 long terminal repeat-retrotransposon (LTR-RT) families comprising 32370 elements in soybean (Glycine max (L.) Merr.). Approximately 87% of these elements were located in recombination-suppressed pericentromeric regions, where the ratio (1.26) of solo LTRs to intact elements (S/I) is significantly lower than that of chromosome arms (1.62). Further analysis revealed a significant positive correlation between S/I and LTR sizes, indicating that larger LTRs facilitate solo LTR formation. Phylogenetic analysis revealed seven Copia and five Gypsy evolutionary lineages that were present before the divergence of eudicot and monocot species, but the scales and timeframes within which they proliferated vary dramatically across families, lineages and species, and notably, a Copia lineage has been lost in soybean. Analysis of the physical association of LTR-RTs with centromere satellite repeats identified two putative centromere retrotransposon (CR) families of soybean, which were grouped into the CR (e.g. CRR and CRM) lineage found in grasses, indicating that the 'functional specification' of CR pre-dates the bifurcation of eudicots and monocots. However, a number of families of the CR lineage are not concentrated in centromeres, suggesting that their CR roles may now be defunct. Our data also suggest that the envelope-like genes in the putative Copia retrovirus-like family are probably derived from the Gypsy retrovirus-like lineage, and thus we propose the hypothesis of a single ancient origin of envelope-like genes in flowering plants.
The Plant Journal 08/2010; 63(4):584-98. · 6.58 Impact Factor
ABSTRACT: Determinacy is an agronomically important trait associated with the domestication of soybean (Glycine max). Most soybean cultivars are classifiable into indeterminate and determinate growth habits, whereas Glycine soja, the wild progenitor of soybean, is indeterminate. Indeterminate (Dt1/Dt1) and determinate (dt1/dt1) genotypes, when mated, produce progeny that segregate in a monogenic pattern. Here, we show evidence that Dt1 is a homolog (designated as GmTfl1) of Arabidopsis terminal flower 1 (TFL1), a regulatory gene encoding a signaling protein of shoot meristems. The transition from indeterminate to determinate phenotypes in soybean is associated with independent human selections of four distinct single-nucleotide substitutions in the GmTfl1 gene, each of which led to a single amino acid change. Genetic diversity of a minicore collection of Chinese soybean landraces assessed by simple sequence repeat (SSR) markers and allelic variation at the GmTfl1 locus suggest that human selection for determinacy took place at early stages of landrace radiation. The GmTfl1 allele introduced into a determinate-type (tfl1/tfl1) Arabidopsis mutant fully restored the wild-type (TFL1/TFL1) phenotype, but the Gmtfl1 allele in tfl1/tfl1 mutants did not result in apparent phenotypic change. These observations indicate that GmTfl1 complements the functions of TFL1 in Arabidopsis. However, the GmTfl1 homeolog, despite its more recent divergence from GmTfl1 than from Arabidopsis TFL1, appears to be sub- or neo-functionalized, as revealed by the differential expression of the two genes at multiple plant developmental stages and by allelic analysis at both loci.
Proceedings of the National Academy of Sciences 05/2010; 107(19):8563-8. · 9.81 Impact Factor
ABSTRACT: Transposable elements are the most abundant components of all characterized genomes of higher eukaryotes. It has been documented that these elements not only contribute to the shaping and reshaping of their host genomes, but also play significant roles in regulating gene expression, altering gene function, and creating new genes. Thus, complete identification of transposable elements in sequenced genomes and construction of comprehensive transposable element databases are essential for accurate annotation of genes and other genomic components, for investigation of potential functional interaction between transposable elements and genes, and for study of genome evolution. The recent availability of the soybean genome sequence has provided an unprecedented opportunity for discovery, and structural and functional characterization of transposable elements in this economically important legume crop.
Using a combination of structure-based and homology-based approaches, a total of 32,552 retrotransposons (Class I) and 6,029 DNA transposons (Class II) with clear boundaries and insertion sites were structurally annotated and clearly categorized, and a soybean transposable element database, SoyTEdb, was established. These transposable elements have been anchored in and integrated with the soybean physical map and genetic map, and are browsable and visualizable at any scale along the 20 soybean chromosomes, along with predicted genes and other sequence annotations. BLAST search and other infrastructure tools were implemented to facilitate annotation of transposable elements or fragments from soybean and other related legume species. The majority (>95%) of these elements (particularly a few hundred low-copy-number families) are first described in this study.
SoyTEdb provides resources and information related to transposable elements in the soybean genome, representing the most comprehensive and the largest manually curated transposable element database for any individual plant genome completely sequenced to date. Transposable elements previously identified in legumes, the third largest family of flowering plants, are relatively scarce. Thus this database will facilitate structural, evolutionary, functional, and epigenetic analyses of transposable elements in soybean and other legume species.
ABSTRACT: Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.
ABSTRACT: Long terminal repeat (LTR) retrotransposons, the most abundant genomic components in flowering plants, are classifiable into autonomous and nonautonomous elements based on their structural completeness and transposition capacity. It has been proposed that selection is the major force for maintaining sequence (e.g., LTR) conservation between nonautonomous elements and their autonomous counterparts. Here, we report the structural, evolutionary, and expression characterization of a giant retrovirus-like soybean (Glycine max) LTR retrotransposon family, SNARE. This family contains two autonomous subfamilies, SARE(A) and SARE(B), that appear to have evolved independently since the soybean genome tetraploidization event approximately 13 million years ago, and a nonautonomous subfamily, SNRE, that originated from SARE(A). Unexpectedly, a subset of the SNRE elements, which amplified from a single founding SNRE element within the last approximately 3 million years, have been dramatically homogenized with either SARE(A) or SARE(B) primarily in the LTR regions and bifurcated into distinct subgroups corresponding to the two autonomous subfamilies. We uncovered evidence of region-specific swapping of nonautonomous elements with autonomous elements that primarily generated various nonautonomous recombinants with LTR sequences from autonomous elements of different evolutionary lineages, thus revealing a molecular mechanism for the enhancement of preexisting partnership and the establishment of new partnership between autonomous and nonautonomous elements.
The Plant Cell 01/2010; 22(1):48-61. · 9.25 Impact Factor
ABSTRACT: Centromeres are sites for assembly of the chromosomal structures that mediate faithful segregation at mitosis and meiosis. This function is conserved across species, but the DNA components that are involved in kinetochore formation differ greatly, even between closely related species. To shed light on the nature, evolutionary timing and evolutionary dynamics of rice centromeres, we decoded a 2.25-Mb DNA sequence covering the centromeric region of chromosome 8 of an indica rice variety, 'Kasalath' (Kas-Cen8). Analysis of repetitive sequences in Kas-Cen8 led to the identification of 222 long terminal repeat (LTR)-retrotransposon elements and 584 CentO satellite monomers, which account for 59.2% of the region. A comparison of the Kas-Cen8 sequence with that of japonica rice 'Nipponbare' (Nip-Cen8) revealed that about 66.8% of the Kas-Cen8 sequence was collinear with that of Nip-Cen8. Although the 27 putative genes are conserved between the two subspecies, only 55.4% of the total LTR-retrotransposon elements in 'Kasalath' had orthologs in 'Nipponbare', thus reflecting recent proliferation of a considerable number of LTR-retrotransposons since the divergence of two rice subspecies of indica and japonica within Oryza sativa. Comparative analysis of the subfamilies, time of insertion, and organization patterns of inserted LTR-retrotransposons between the two Cen8 regions revealed variations between 'Kasalath' and 'Nipponbare' in the preferential accumulation of CRR elements, and the expansion of CentO satellite repeats within the core domain of Cen8. Together, the results provide insights into the recent proliferation of LTR-retrotransposons, and the rapid expansion of CentO satellite repeats, underlying the dynamic variation and plasticity of plant centromeres.
The Plant Journal 09/2009; 60(5):805-19. · 6.58 Impact Factor
ABSTRACT: In flowering plants, the accumulation of small deletions through unequal homologous recombination (UR) and illegitimate recombination (IR) is proposed to be the major process counteracting genome expansion, which is caused primarily by the periodic amplification of long terminal repeat retrotransposons (LTR-RTs). However, the full suite of evolutionary forces that govern the gain or loss of transposable elements (TEs) and their distribution within a genome remains unclear. Here, we investigated the distribution and structural variation of LTR-RTs in relation to the rates of local genetic recombination (GR) and gene densities in the rice (Oryza sativa) genome. Our data revealed a positive correlation between GR rates and gene densities and negative correlations between LTR-RT densities and both GR and gene densities. The data also indicate a tendency for LTR-RT elements and fragments to be shorter in regions with higher GR rates; the size reduction of LTR-RTs appears to be achieved primarily through solo LTR formation by UR. Comparison of indica and japonica rice revealed patterns and frequencies of LTR-RT gain and loss within different evolutionary timeframes. Different LTR-RT families exhibited variable distribution patterns and structural changes, but overall LTR-RT compositions and genes were organized according to the GR gradients of the genome. Further investigation of non-LTR-RTs and DNA transposons revealed a negative correlation between gene densities and the abundance of DNA transposons and a weak correlation between GR rates and the abundance of long interspersed nuclear elements (LINEs)/short interspersed nuclear elements (SINEs). Together, these observations suggest that GR and gene density play important roles in shaping the dynamic structure of the rice genome.
Genome Research 09/2009; 19(12):2221-30. · 14.40 Impact Factor
ABSTRACT: Soybean (Glycine max) is an important crop, and it is the world's main source of oilseed. Through a large-scale assessment of soybean domestication and improvement by resequencing 302 wild, landrace and improved soybean accessions at >11× depth, we detected 230 selective sweeps and a number of selected copy number variations. A genome-wide association study revealed associations between 10 selected regions and 9 domestication/improvement traits and identified 13 previously uncharacterized loci relevant to important agronomic traits, such as oil content, plant height and pubescence form. Combining these results with previous linkage studies on quantitative trait loci (QTL) identification, we found that 96 selected regions fell into reported oil QTLs and 21 selected regions contained fatty acid biosynthesis genes. Moreover, we observed that some traits and loci are associated with particular geographical regions, showing that soybean populations were structured geographically. This study provides unprecedented large-scale information and resources for genetics research and genomics-enabled improvements in soybean breeding.
International Plant and Animal Genome Conference XXIII 2015