ABSTRACT: The single-copper protein azurin from Pseudomonas aeruginosa has attracted great interest as an anti-cancer therapeutic agent and as a fuel cell catalyst for energy conversion. In this work, we obtained transgenic tobacco plants transformed with a chloroplast expression vector harboring the mature azurin polypeptide fused to the psbA 5′UTR element, confirmed site-specific integration into the tobacco chloroplast genome through homologous recombination by Southern hybridization analysis, and also confirmed maternal inheritance. Northern hybridization analysis showed a polycistronic transcription pattern for the azurin gene. In addition, post-transcriptional processing of the azurin monocistron was observed, which may be due to endonucleolytic, intercistronic cleavage at the psbA mRNA 5′UTR element. We also examined azurin expression levels as a function of leaf maturity, finding a high expression level of 5.7% of total soluble protein (TSP) in young leaves, in contrast to a low expression level of 0.72% TSP in fully mature leaves. In addition, the copper level of transplastomic chloroplasts was twofold higher than that of non-transplastomic chloroplasts. These results suggest that the increased copper level may be due to the production of azurin in transplastomic chloroplasts, reflecting the formation of active azurin with copper ions in its active sites.
Full-text · Article · Nov 2014 · Plant Biotechnology Reports
ABSTRACT: Retrocyclin-101 (RC101) and Protegrin-1 (PG1) are two important antimicrobial peptides that can be used as therapeutic agents against bacterial and/or viral infections, especially those caused by HIV-1 or sexually transmitted bacteria. Because of their antimicrobial activity and complex secondary structures, they have not yet been produced in microbial systems, and their chemical synthesis is prohibitively expensive. Therefore, we created chloroplast transformation vectors with the RC101 or PG1 coding sequence fused with GFP to confer stability, a furin or Factor Xa cleavage site to liberate the mature peptide from its fusion protein, and a His-tag to aid in purification. Stable integration of RC101 into the tobacco chloroplast genome and homoplasmy were confirmed by Southern blots. RC101 and PG1 accumulated up to 32%-38% and 17%-26% of the total soluble protein, respectively. Both RC101 and PG1 were cleaved from GFP by the corresponding proteases in vitro, and Factor Xa-like protease activity was observed within chloroplasts. Confocal microscopy studies showed that GFP fluorescence was located within chloroplasts. Organic extraction resulted in a 10.6-fold higher yield of RC101 than purification by affinity chromatography using the His-tag. In planta bioassays with Erwinia carotovora confirmed the antibacterial activity of RC101 and PG1 expressed in chloroplasts. RC101 transplastomic plants were resistant to tobacco mosaic virus infections, confirming antiviral activity. Because RC101 and PG1 have not yet been produced in other cell culture or microbial systems, chloroplasts can be used as bioreactors for producing these proteins. Adequate yield of purified antimicrobial peptides from transplastomic plants should facilitate further preclinical studies.
Full-text · Article · Jan 2011 · Plant Biotechnology Journal
ABSTRACT: Functional gene transfer from the plastid to the nucleus is rare among land plants despite evidence that DNA transfer to the nucleus is relatively frequent. During the course of sequencing plastid genomes from representative species from three rosid genera (Castanea, Prunus, Theobroma) and ongoing projects focusing on the Fagaceae and Passifloraceae, we identified putative losses of rpl22 in these two angiosperm families. We further characterized rpl22 from three species of Passiflora and one species of Quercus and identified sequences that likely represent pseudogenes. In Castanea and Quercus, both members of the Fagaceae, we identified a nuclear copy of rpl22, which consisted of two exons separated by an intron. Exon 1 encodes a transit peptide that likely targets the protein product back to the plastid and exon 2 encodes rpl22. We performed phylogenetic analyses of 97 taxa, including 93 angiosperms and four gymnosperm outgroups, using alignments of 81 plastid genes to examine the phylogenetic distribution of rpl22 loss and transfer to the nucleus. Our results indicate that within rosids there have been independent transfers of rpl22 to the nucleus in Fabaceae and Fagaceae and a putative third transfer in Passiflora. The high level of sequence divergence between the transit peptides in Fabaceae and Fagaceae strongly suggests that these represent independent transfers. Furthermore, Blast searches did not identify the "donor" genes of the transit peptides, suggesting a de novo origin. We also performed phylogenetic analyses of rpl22 for 87 angiosperms and four gymnosperms, including nuclear-encoded copies for five species of Fabaceae and Fagaceae. The resulting trees indicated that the transfer of rpl22 to the nucleus does not predate the origin of angiosperms as suggested in an earlier study.
Using previously published angiosperm divergence time estimates, we suggest that these transfers occurred approximately 56-58, 34-37, and 26-27 Ma for the Fabaceae, Fagaceae, and Passifloraceae, respectively.
Full-text · Article · Oct 2010 · Molecular Biology and Evolution
ABSTRACT: Chickpea (Cicer arietinum, Leguminosae), an important grain legume, is widely used for food and fodder throughout the world. We sequenced the complete plastid genome of chickpea, which is 125,319 bp in size and contains only one copy of the inverted repeat (IR). The genome encodes 108 genes, including 4 rRNAs, 29 tRNAs, and 75 proteins. The genes rps16, infA, and ycf4 are absent in the chickpea plastid genome, and ndhB has an internal stop codon in the 5' exon, similar to other legumes. Two genes have lost their introns: one in the 3' exon of the trans-spliced gene rps12, and one between exons 1 and 2 of clpP; this represents the first documented case of the loss of introns from both of these genes in the same plastid genome. An extensive phylogenetic survey of these intron losses was performed on 302 taxa across legumes and the related family Polygalaceae. The clpP intron has been lost exclusively in taxa from the temperate "IR-lacking clade" (IRLC), whereas the rps12 intron has been lost in most members of the IRLC (with the exception of Wisteria, Callerya, Afgekia, and certain species of Millettia, which represent the earliest diverging lineages of this clade), and in the tribe Desmodieae, which is closely related to the tribes Phaseoleae and Psoraleeae. Data provided here suggest that the loss of the rps12 intron occurred after the loss of the IR. The two new genomic changes identified in the present study provide additional support for the monophyly of the IR-loss clade, and resolution of the pattern of the earliest-branching lineages in this clade. The availability of the complete chickpea plastid genome sequence also provides valuable information on intergenic spacer regions among legumes and endogenous regulatory sequences for plastid genetic engineering.
ABSTRACT: The complete sequence of the chloroplast genome of cassava (Manihot esculenta, Euphorbiaceae) has been determined. The genome is 161,453 bp in length and includes a pair of inverted repeats (IR) of 26,954 bp. The genome includes 128 genes; 96 are single copy and 16 are duplicated in the IR. There are four rRNA genes and 30 distinct tRNAs, seven of which are duplicated in the IR. The infA gene is absent; expansion of IRb has duplicated 62 amino acids at the 3' end of rps19 and a number of coding regions have large insertions or deletions, including insertions within the 23S rRNA gene. There are 17 intron-containing genes in cassava, 15 of which have a single intron while two (clpP, ycf3) have two introns. The usually conserved atpF group II intron is absent and this is the first report of its loss from land plant chloroplast genomes. The phylogenetic distribution of the atpF intron loss was determined by a PCR survey of 251 taxa representing 34 families of Malpighiales and 16 taxa from closely related rosids. The atpF intron is missing not only in cassava but also in closely related Euphorbiaceae and other Malpighiales, suggesting that there have been at least seven independent losses. In cassava and all other sequenced Malpighiales, atpF gene sequences showed a strong association between C-to-T substitutions at nucleotide position 92 and the loss of the intron, suggesting that recombination between an edited mRNA and the atpF gene may be a possible mechanism for the intron loss.
ABSTRACT: Angiosperms are the largest and most successful clade of land plants with >250,000 species distributed in nearly every terrestrial habitat. Many phylogenetic studies have been based on DNA sequences of one to several genes, but, despite decades of intensive efforts, relationships among early diverging lineages and several of the major clades remain either incompletely resolved or weakly supported. We performed phylogenetic analyses of 81 plastid genes in 64 sequenced genomes, including 13 new genomes, to estimate relationships among the major angiosperm clades, and the resulting trees are used to examine the evolution of gene and intron content. Phylogenetic trees from multiple methods, including model-based approaches, provide strong support for the position of Amborella as the earliest diverging lineage of flowering plants, followed by Nymphaeales and Austrobaileyales. The plastid genome trees also provide strong support for a sister relationship between eudicots and monocots, and this group is sister to a clade that includes Chloranthales and magnoliids. Resolution of relationships among the major clades of angiosperms provides the necessary framework for addressing numerous evolutionary questions regarding the rapid diversification of angiosperms. Gene and intron content are highly conserved among the early diverging angiosperms and basal eudicots, but 62 independent gene and intron losses are limited to the more derived monocot and eudicot clades. Moreover, a lineage-specific correlation was detected between rates of nucleotide substitutions, indels, and genomic rearrangements.
Full-text · Article · Jan 2008 · Proceedings of the National Academy of Sciences
ABSTRACT: Comparisons of complete chloroplast genome sequences of Hordeum vulgare, Sorghum bicolor and Agrostis stolonifera to six published grass chloroplast genomes reveal that gene content and order are similar but two microstructural changes have occurred. First, the expansion of the IR at the SSC/IRa boundary that duplicates a portion of the 5' end of ndhH is restricted to the three genera of the subfamily Pooideae (Agrostis, Hordeum and Triticum). Second, a 6 bp deletion in ndhK is shared by Agrostis, Hordeum, Oryza and Triticum, and this event supports the sister relationship between the subfamilies Ehrhartoideae and Pooideae. Repeat analysis identified 19-37 direct and inverted repeats 30 bp or longer with a sequence identity of at least 90%. Seventeen of the 26 shared repeats are found in all the grass chloroplast genomes examined and are located in the same genes or intergenic spacer (IGS) regions. Examination of simple sequence repeats (SSRs) identified 16-21 potential polymorphic SSRs. Five IGS regions have 100% sequence identity among Zea mays, Saccharum officinarum and Sorghum bicolor, whereas no spacer regions were identical among Oryza sativa, Triticum aestivum, H. vulgare and A. stolonifera despite their close phylogenetic relationship. Alignment of EST sequences and DNA coding sequences identified six C-U conversions in both Sorghum bicolor and H. vulgare but only one in A. stolonifera. Phylogenetic trees based on DNA sequences of 61 protein-coding genes of 38 taxa using both maximum parsimony and likelihood methods provide moderate support for a sister relationship between the subfamilies Ehrhartoideae and Pooideae.
ABSTRACT: The chloroplast genome sequence of Coffea arabica L., the first sequenced member of the fourth largest family of angiosperms, Rubiaceae, is reported. The genome is 155,189 bp in length, including a pair of inverted repeats of 25,943 bp. Of the 130 genes present, 112 are distinct and 18 are duplicated in the inverted repeat. The coding region comprises 79 protein genes, 29 transfer RNA genes, four ribosomal RNA genes and 18 genes containing introns (three with three exons). Repeat analysis revealed five direct and three inverted repeats of 30 bp or longer with a sequence identity of 90% or more. Comparisons of the coffee chloroplast genome with sequenced genomes of the closely related family Solanaceae indicated that coffee has a portion of rps19 duplicated in the inverted repeat and an intact copy of infA. Furthermore, whole-genome comparisons identified large indels (> 500 bp) in several intergenic spacer regions and introns in the Solanaceae, including the trnE (UUC)-trnT (GGU) spacer, ycf4-cemA spacer, trnI (GAU) intron and rrn5-trnR (ACG) spacer. Phylogenetic analyses based on the DNA sequences of 61 protein-coding genes for 35 taxa, performed using both maximum parsimony and maximum likelihood methods, strongly supported the monophyly of several major clades of angiosperms, including monocots, eudicots, rosids, asterids, eurosids II, and euasterids I and II. Coffea (Rubiaceae, Gentianales) is only the second order sampled from the euasterid I clade. The availability of the complete chloroplast genome of coffee provides regulatory and intergenic spacer sequences for utilization in chloroplast genetic engineering to improve this important crop.
Full-text · Article · Mar 2007 · Plant Biotechnology Journal
ABSTRACT: The production of Citrus, the largest fruit crop of international economic value, has recently been imperiled due to the introduction of the bacterial disease Citrus canker. No significant improvements have been made to combat this disease by plant breeding and nuclear transgenic approaches. Chloroplast genetic engineering has a number of advantages over nuclear transformation; it not only increases transgene expression but also facilitates transgene containment, which is one of the major impediments for development of transgenic trees. We have sequenced the Citrus chloroplast genome to facilitate genetic improvement of this crop and to assess phylogenetic relationships among major lineages of angiosperms.
The complete chloroplast genome sequence of Citrus sinensis is 160,129 bp in length, and contains 133 genes (89 protein-coding, 4 rRNAs and 30 distinct tRNAs). Genome organization is very similar to the inferred ancestral angiosperm chloroplast genome. However, in Citrus the infA gene is absent. The inverted repeat region has expanded to duplicate rps19 and the first 84 amino acids of rpl22. The rpl22 gene in the IRb region has a nonsense mutation resulting in 9 stop codons. This was confirmed by PCR amplification and sequencing using primers that flank the IR/LSC boundaries. Repeat analysis identified 29 direct and inverted repeats 30 bp or longer with a sequence identity ≥ 90%. Comparison of protein-coding sequences with expressed sequence tags revealed six putative RNA edits, five of which resulted in non-synonymous modifications in petL, psbH, ycf2 and ndhA. Phylogenetic analyses using maximum parsimony (MP) and maximum likelihood (ML) methods of a dataset composed of 61 protein-coding genes for 30 taxa provide strong support for the monophyly of several major clades of angiosperms, including monocots, eudicots, rosids and asterids. The MP and ML trees are incongruent in three areas: the position of Amborella and Nymphaeales, relationship of the magnoliid genus Calycanthus, and the monophyly of the eurosid I clade. Both MP and ML trees provide strong support for the monophyly of eurosids II and for the placement of Citrus (Sapindales) sister to a clade including the Malvales/Brassicales.
This is the first complete chloroplast genome sequence for a member of the Rutaceae and Sapindales. Expansion of the inverted repeat region to include rps19 and part of rpl22 and presence of two truncated copies of rpl22 is unusual among sequenced chloroplast genomes. Availability of a complete Citrus chloroplast genome sequence provides valuable information on intergenic spacer regions and endogenous regulatory sequences for chloroplast genetic engineering. Phylogenetic analyses resolve relationships among several major clades of angiosperms and provide strong support for the monophyly of the eurosid II clade and the position of the Sapindales sister to the Brassicales/Malvales.
ABSTRACT: Carrot (Daucus carota) is a major food crop in the US and worldwide. Its capacity for storage and its lifecycle as a biennial make it an attractive species for the introduction of foreign genes, especially for oral delivery of vaccines and other therapeutic proteins. Until recently efforts to express recombinant proteins in carrot have had limited success in terms of protein accumulation in the edible tap roots. Plastid genetic engineering offers the potential to overcome this limitation, as demonstrated by the accumulation of BADH in chromoplasts of carrot taproots to confer exceedingly high levels of salt resistance. The complete plastid genome of carrot provides essential information required for genetic engineering. Additionally, the sequence data add to the rapidly growing database of plastid genomes for assessing phylogenetic relationships among angiosperms.
The complete carrot plastid genome is 155,911 bp in length, with 115 unique genes and 21 duplicated genes within the IR. There are four ribosomal RNAs, 30 distinct tRNA genes and 18 intron-containing genes. Repeat analysis reveals 12 direct and 2 inverted repeats ≥ 30 bp with a sequence identity ≥ 90%. Phylogenetic analyses of nucleotide sequences for 61 protein-coding genes using both maximum parsimony (MP) and maximum likelihood (ML) were performed for 29 angiosperms. Phylogenies from both methods provide strong support for the monophyly of several major angiosperm clades, including monocots, eudicots, rosids, asterids, eurosids II, euasterids I, and euasterids II.
The carrot plastid genome contains a number of dispersed direct and inverted repeats scattered throughout coding and non-coding regions. This is the first sequenced plastid genome of the family Apiaceae and only the second published genome sequence of the species-rich euasterid II clade. Both MP and ML trees provide very strong support (100% bootstrap) for the sister relationship of Daucus with Panax in the euasterid II clade. These results provide the best taxon sampling of complete chloroplast genomes and the strongest support yet for the sister relationship of Caryophyllales to the asterids. The availability of the complete plastid genome sequence should facilitate improved transformation efficiency and foreign gene expression in carrot through utilization of endogenous flanking sequences and regulatory elements.
ABSTRACT: The plastid transformation approach offers a number of unique advantages, including high-level transgene expression, multi-gene engineering, transgene containment, and a lack of gene silencing and position effects. The extension of plastid transformation technology to monocotyledonous cereal crops, including rice, bears great promise for the improvement of agronomic traits, and the efficient production of pharmaceutical or nutritional enhancement. Here, we report a promising step towards stable plastid transformation in rice. We produced fertile transplastomic rice plants and demonstrated transmission of the plastid-expressed green fluorescent protein (GFP) and aminoglycoside 3'-adenylyltransferase genes to the progeny of these plants. Transgenic chloroplasts were determined to have stably expressed the GFP, which was confirmed by both confocal microscopy and Western blot analyses. Although the produced rice plastid transformants were found to be heteroplastomic, and the transformation efficiency requires further improvement, this study has established a variety of parameters for the use of plastid transformation technology in cereal crops.
Full-text · Article · Jul 2006 · Molecules and Cells
ABSTRACT: Despite the agricultural importance of both potato and tomato, very little is known about their chloroplast genomes. Analysis of the complete sequences of tomato, potato, tobacco, and Atropa chloroplast genomes reveals significant insertions and deletions within certain coding regions or regulatory sequences (e.g., deletion of repeated sequences within 16S rRNA, ycf2 or ribosomal binding sites in ycf2). RNA, photosynthesis, and ATP synthase genes are the least divergent, and the most divergent genes are clpP, cemA, ccsA, and matK. Repeat analyses identified 33-45 direct and inverted repeats ≥ 30 bp with a sequence identity of at least 90%; all but five of the repeats shared by all four Solanaceae genomes are located in the same genes or intergenic regions, suggesting a functional role. A comprehensive genome-wide analysis of all coding sequences and intergenic spacer regions was done for the first time in chloroplast genomes. Only four spacer regions are fully conserved (100% sequence identity) among all genomes; deletions or insertions within some intergenic spacer regions result in less than 25% sequence identity, underscoring the importance of choosing appropriate intergenic spacers for plastid transformation and providing valuable new information on the phylogenetic utility of the chloroplast intergenic spacer regions. Comparison of coding sequences with expressed sequence tags showed a considerable amount of variation, resulting in amino acid changes; none of the C-to-U conversions observed in potato and tomato were conserved in tobacco and Atropa. It is possible that there has been a loss of conserved editing sites in potato and tomato.
Full-text · Article · Jun 2006 · Theoretical and Applied Genetics
ABSTRACT: Cotton (Gossypium hirsutum) is the most important fiber crop grown in 90 countries. In 2004-2005, US farmers planted 79% of the 5.7-million hectares of nuclear transgenic cotton. Unfortunately, genetically modified cotton has the potential to hybridize with other cultivated and wild relatives, resulting in geographical restrictions to cultivation. However, chloroplast genetic engineering offers the possibility of containment because of maternal inheritance of transgenes. The complete chloroplast genome of cotton provides essential information required for genetic engineering. In addition, the sequence data were used to assess phylogenetic relationships among the major clades of rosids using cotton and 25 other completely sequenced angiosperm chloroplast genomes.
The complete cotton chloroplast genome is 160,301 bp in length, with 112 unique genes and 19 duplicated genes within the IR, containing a total of 131 genes. There are four ribosomal RNAs, 30 distinct tRNA genes and 17 intron-containing genes. The gene order in cotton is identical to that of tobacco but lacks rpl22 and infA. There are 30 direct and 24 inverted repeats 30 bp or longer with a sequence identity ≥ 90%. Most of the direct repeats are within intergenic spacer regions and introns, and a 72 bp-long direct repeat is within the psaA and psaB genes. Comparison of protein coding sequences with expressed sequence tags (ESTs) revealed nucleotide substitutions resulting in amino acid changes in ndhC, rpl23, rpl20, rps3 and clpP. Phylogenetic analyses of a data set including 61 protein-coding genes using both maximum likelihood and maximum parsimony were performed for 28 taxa, including cotton and five other angiosperm chloroplast genomes that were not included in any previous phylogenies.
Cotton chloroplast genome lacks rpl22 and infA and contains a number of dispersed direct and inverted repeats. RNA editing resulted in amino acid changes with significant impact on their hydropathy. Phylogenetic analysis provides strong support for the position of cotton in the Malvales in the eurosids II clade sister to Arabidopsis in the Brassicales. Furthermore, there is strong support for the placement of the Myrtales sister to the eurosid I clade, although expanded taxon sampling is needed to further test this relationship.
ABSTRACT: The Vitaceae (grape) is an economically important family of angiosperms whose phylogenetic placement is currently unresolved. Recent phylogenetic analyses based on one to several genes have suggested several alternative placements of this family, including sister to Caryophyllales, asterids, Saxifragales, Dilleniaceae or to the rest of the rosids, though support for these different results has been weak. There has been a recent interest in using complete chloroplast genome sequences for resolving phylogenetic relationships among angiosperms. These studies have clarified relationships among several major lineages but they have also emphasized the importance of taxon sampling and the effects of different phylogenetic methods for obtaining accurate phylogenies. We sequenced the complete chloroplast genome of Vitis vinifera and used these data to assess relationships among 27 angiosperms, including nine taxa of rosids.
The Vitis vinifera chloroplast genome is 160,928 bp in length, including a pair of inverted repeats of 26,358 bp that are separated by small and large single copy regions of 19,065 bp and 89,147 bp, respectively. The gene content and order of Vitis is identical to many other unrearranged angiosperm chloroplast genomes, including tobacco. Phylogenetic analyses using maximum parsimony and maximum likelihood were performed on DNA sequences of 61 protein-coding genes for two datasets with 28 or 29 taxa, including eight or nine taxa from four of the seven currently recognized major clades of rosids. Parsimony and likelihood phylogenies of both data sets provide strong support for the placement of Vitaceae as sister to the remaining rosids. However, the position of the Myrtales and support for the monophyly of the eurosid I clade differs between the two data sets and the two methods of analysis. In parsimony analyses, the inclusion of Gossypium is necessary to obtain trees that support the monophyly of the eurosid I clade. However, maximum likelihood analyses place Cucumis as sister to the Myrtales and therefore do not support the monophyly of the eurosid I clade.
Phylogenies based on DNA sequences from complete chloroplast genome sequences provide strong support for the position of the Vitaceae as the earliest diverging lineage of rosids. Our phylogenetic analyses support recent assertions that inadequate taxon sampling and incorrect model specification for concatenated multi-gene data sets can mislead phylogenetic inferences when using whole chloroplast genomes for phylogeny reconstruction.
Full-text · Article · Feb 2006 · BMC Evolutionary Biology
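The region sizes reported in the Vitis vinifera abstract above can be cross-checked with simple arithmetic: a chloroplast genome with a pair of inverted repeats is made up of one large single copy region, one small single copy region, and two copies of the inverted repeat. The short Python sketch below is illustrative only and is not taken from the paper; the function name `quadripartite_length` is a hypothetical helper, and the figures are those quoted in the abstract.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total genome length in bp: LSC + SSC + two copies of the IR.

    Hypothetical helper for illustration; not from the cited paper.
    """
    return lsc + ssc + 2 * ir


# Region sizes (bp) as quoted in the Vitis vinifera abstract.
vitis_total = quadripartite_length(lsc=89_147, ssc=19_065, ir=26_358)

# Compare against the reported genome length of 160,928 bp.
print(vitis_total == 160_928)
```

The same check can be applied to any abstract in this list that reports all three region sizes.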
ABSTRACT: Post-transcriptional RNA processing and translational regulation are important steps in gene expression. To analyze the 5′UTR of psbA, which enhances translation of the sweet protein monellin in chloroplasts, we cloned the monellin gene, with and without the psbA 5′UTR, into a chloroplast expression vector for chloroplast transformation. Transgenic plants were identified as being transplastomic via PCR and Southern blot analyses. We also observed non-specific recombination during tobacco chloroplast transformation. Analyses of the transcription patterns showed that intercistronic cleavage of the psbA mRNA 5′ untranslated region (UTR) was functional at the mature stage, with the monocistronic mRNA of monellin increasing while its dicistronic mRNA decreased. Moreover, monellin accumulation accounted for 2.3% of the total soluble protein at the mature stage, but only 1.3% at the young stage in transplastomic lines that contained the 5′UTR of psbA. These results suggest that activation of the endonucleolytic cleavage of the psbA 5′UTR element depends on chloroplast developmental conditions, and that it enhances the accumulation of the sweet protein monellin in those chloroplasts.
Full-text · Article · Jan 2006 · Journal of Plant Biology
ABSTRACT: Lack of complete chloroplast genome sequences is still one of the major limitations to extending chloroplast genetic engineering technology to useful crops. Therefore, we sequenced the soybean chloroplast genome and compared it to the other completely sequenced legumes, Lotus and Medicago. The chloroplast genome of Glycine is 152,218 basepairs (bp) in length, including a pair of inverted repeats of 25,574 bp of identical sequence separated by a small single copy region of 17,895 bp and a large single copy region of 83,175 bp. The genome contains 111 unique genes, and 19 of these are duplicated in the inverted repeat (IR). Comparisons of Glycine, Lotus and Medicago confirm the organization of legume chloroplast genomes based on previous studies. Gene content of the three legumes is nearly identical. The rpl22 gene is missing from all three legumes, and Medicago is missing rps16 and one copy of the IR. Gene order in Glycine, Lotus, and Medicago differs from the usual gene order for angiosperm chloroplast genomes by the presence of a single, large inversion of 51 kilobases (kb). Detailed analyses of repeated sequences indicate that many of the Glycine repeats that are located in the intergenic spacer regions and introns occur in the same location in the other legumes and in Arabidopsis, suggesting that they may play some functional role. The presence of small repeats of psbA and rbcL in legumes that have lost one copy of the IR indicate that this loss has only occurred once during the evolutionary history of legumes.
Full-text · Article · Oct 2005 · Plant Molecular Biology