Complete Genomic Structure of the Cultivated Rice Endophyte
Azospirillum sp. B510
TAKAKAZU Kaneko1,2, KIWAMU Minamisawa3, TSUYOSHI Isawa3, HIROKI Nakatsukasa1, HISAYUKI Mitsui3,
YASUYUKI Kawaharada3, YASUKAZU Nakamura1, AKIKO Watanabe1, KUMIKO Kawashima1, AKIKO Ono1,
YOSHIMI Shimizu1, CHIKA Takahashi1, CHIHARU Minami1, TSUNAKAZU Fujishiro1, MITSUYO Kohara1,
MIDORI Katoh1, NAOMI Nakazaki1, SHINOBU Nakayama1, MANABU Yamada1, SATOSHI Tabata1,
and SHUSEI Sato1,*
Kazusa DNA Research Institute, 2-6-7 Kazusa-kamatari, Kisarazu, Chiba 292-0818, Japan1; Faculty of Engineering,
Kyoto Sangyo University, Motoyama, Kamigamo, Kita-Ku, Kyoto 603-8555, Japan2 and Graduate School of Life
Sciences, Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai, Miyagi 980-8577, Japan3
*To whom correspondence should be addressed. Tel. +81-438-52-3923. Fax. +81-438-52-3924.
E-mail: ssato@kazusa.or.jp
Edited by Katsumi Isono
(Received 28 September 2009; accepted 2 December 2009)
Abstract
We determined the nucleotide sequence of the entire genome of a diazotrophic endophyte, Azospirillum sp.
B510. Strain B510 is an endophytic bacterium isolated from stems of rice plants (Oryza sativa cv. Nipponbare).
The genome of B510 consisted of a single chromosome (3 311 395 bp) and six plasmids, designated as
pAB510a (1 455 109 bp), pAB510b (723 779 bp), pAB510c (681 723 bp), pAB510d (628 837 bp),
pAB510e (537 299 bp), and pAB510f (261 596 bp). The chromosome bears 2893 potential protein-encoding
genes, two sets of rRNA gene clusters (rrns), and 45 tRNA genes representing 37 tRNA species. The genomes of
the six plasmids contain a total of 3416 protein-encoding genes, seven sets of rrns, and 34 tRNAs representing
19 tRNA species. Eight genes for plasmid-specific tRNA species are located on either pAB510a or pAB510d.
Two of the eight genomic islands are inserted in the plasmids pAB510b and pAB510e, and one of these islands is
inserted into trnfM-CAU in the rrn located on pAB510e. In addition to the nif gene cluster, genes involved
in N2 fixation that are homologous to those of Bradyrhizobium japonicum USDA110 include fixABCX, fixNOQP, fixHIS,
fixG, and fixLJK. Three putative plant hormone-related genes were identified: genes encoding tryptophan
2-monooxygenase (iaaM) and indole-3-acetamide hydrolase (iaaH), which are involved in IAA biosynthesis, and
ACC deaminase (acdS), which reduces ethylene levels. Multiple gene clusters for tripartite ATP-independent
periplasmic-transport systems and a diverse set of malic enzymes were identified, suggesting that B510 utilizes
C4-dicarboxylate during its symbiotic relationship with the host plant.
Keywords: Azospirillum; endophyte; rice plant; N2 fixation; plant hormone
1. Introduction
Endophytes are microorganisms that are able to
colonize the intercellular, and sometimes also intra-
cellular, spaces of plant tissues, without causing
apparent damage to the host plant. Gram-positive
and Gram-negative bacterial endophytes have been
isolated from several tissues in numerous plant
species.1,2 Many endophytes have beneficial effects
on plant growth and health.3–5 N2-fixing bacterial
endophytes, such as Herbaspirillum seropedicae,
Gluconacetobacter diazotrophicus, and Azoarcus sp.,
have been found within the tissues of some crops
and grasses, and partially contribute to the nitrogen
requirement of the host plants.6 Azoarcus sp. strain
BH72, isolated from the salt marsh plant kallar
grass, is the best studied in terms of the molecular
mechanisms of establishment inside plants and of endophyte
functions.7
© The Author 2010. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://
creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium,
provided the original work is properly cited.
DNA RESEARCH 17, 37–50, (2010) doi:10.1093/dnares/dsp026
Advance Access Publication: 4 January 2010
Krause et al.8 reported the first full genome sequence
of an endophyte, strain BH72 of Azoarcus species
(4.38 Mb), and this sequence provided valuable
insights into the life of bacterial endophytes, including
information about interactions with host plants. Fouts
et al.9 also reported the whole genome sequence of an
N2-fixing endophyte, Klebsiella pneumoniae 342.
Comparative genomics of naturally occurring bacterial
endophytes provides information that can be used to
develop enhanced bacterial endophytes.10
The genus Azospirillum consists of spirillum-shaped,
N2-fixing, Gram-negative alpha-proteobacteria that
often live in the plant rhizosphere.11 Since Azospirillum
inoculation promotes plant growth, agronomic appli-
cations of this genus have been developed.12
Azospirillum sp. B510 was isolated on 23 August 1999
from the surface-sterilized stems of rice plants (Oryza
sativa cv. Nipponbare) that were cultivated in the
Kashimadai experimental paddy field of Tohoku
University (Miyagi, Japan).13 The B510 strain is closely
related to A. oryzae COC8, which was reported as a
paddy soil bacterium (with 97.7% identity in their 16S
rRNA gene sequences),14 and B510 is classified in the
same cluster of the phylogenetic tree as A. oryzae COC8
(Supplementary Fig. S1). In addition to being a diazotroph
under free-living conditions, B510 was found to
be motile and capable of degrading
plant cell walls.13 Inoculation with Azospirillum sp. B510
was shown to promote plant growth under both labora-
tory and field conditions (Isawa et al., unpublished
results). Specifically, a field experiment in
Hokkaido, Japan, indicated that B510 inoculation
increases stem number, resulting in higher seed
yield (Isawa et al., unpublished results). Moreover, B510
inoculation enhanced resistance to the virulent rice
blast fungus and to the bacterial pathogen Xanthomonas
oryzae.15 Thus, Azospirillum sp. B510 is likely a beneficial
bacterium with agronomic applications.
In this study, we demonstrated the endophytic
characteristics of Azospirillum sp. B510 and its ability
to fix N2 in planta. Then, we determined the complete
nucleotide sequence of the Azospirillum sp. B510
genome and deduced the gene repertoire in the
genome. This is the first report of the genome struc-
ture of the genus Azospirillum.
2. Materials and methods
2.1. A bacterial strain, inoculation of rice plants, and
estimation of N2 fixation ability and of the
internal Azospirillum sp. B510 population
Azospirillum sp. B510 is a diazotrophic endophyte
that was isolated from the stem of cultivated rice,
O. sativa cv. Nipponbare.13 Bacteria were cultured in
Nutrient Broth (Difco, Detroit, MI, USA), collected by
centrifugation at 5000 × g for 3 min, and washed twice
with sterile saline (0.85% w/v NaCl). The bacterial
cell suspension was adjusted to 2 × 10⁷ cells ml⁻¹ in
saline solution just before inoculation.16
The hulls of rice seeds were carefully removed using
forceps. After the hulled seeds were shaken in 10%
(w/v) Ca(OCl)2 for 30 min at 28°C, they were
washed more than three times with sterile distilled
water. A surface-sterilized seed was placed in a steri-
lized test tube (16.5 mm in diameter, 150 mm in
height) containing 9 ml of 0.325% (w/v) semi-solid
agar solution with the sterilized inorganic nutri-
ents,13,17 and the tube was covered with an alu-
minium cap. Each seed was inoculated with a
bacterial cell suspension of 1 × 10⁷ cells. The rice
plants were cultivated at 25°C under long-day
conditions (16-h light and 8-h dark) for 10 days in a
plant growth cabinet (LH300; NK Systems Co. Ltd,
Osaka, Japan) that provided 65 µmol photons m⁻²
s⁻¹ of photosynthetically active radiation.18
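The protocol fixes a suspension density (2 × 10⁷ cells ml⁻¹) and a dose per seed (1 × 10⁷ cells), but not the volume applied to each seed. A minimal sketch of the arithmetic, assuming the dose is delivered from the adjusted suspension (which would mean 0.5 ml per seed) and using a hypothetical stock density to illustrate the C1V1 = C2V2 dilution step:

```python
def dilution_volumes(stock_cells_per_ml, target_cells_per_ml, final_volume_ml):
    """Return (stock ml, saline ml) giving final_volume_ml at the target density.

    Uses C1 * V1 = C2 * V2. The stock density is whatever was measured
    (hypothetical here); it is not given in the text.
    """
    if target_cells_per_ml > stock_cells_per_ml:
        raise ValueError("stock is more dilute than the target density")
    stock_ml = target_cells_per_ml * final_volume_ml / stock_cells_per_ml
    return stock_ml, final_volume_ml - stock_ml


def inoculum_volume_ml(cells_per_seed, cells_per_ml):
    """Volume of suspension that delivers cells_per_seed cells."""
    return cells_per_seed / cells_per_ml


# Hypothetical stock of 1e9 cells/ml diluted to 50 ml at 2e7 cells/ml.
stock_ml, saline_ml = dilution_volumes(1e9, 2e7, 50.0)
print(stock_ml, saline_ml)            # 1.0 49.0
print(inoculum_volume_ml(1e7, 2e7))   # 0.5
```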
To estimate N2-fixing activity, acetylene was intro-
duced into test tubes, each containing a 10-day-old
rice seedling, and the tubes were enclosed with a ster-
ilized rubber stopper. After a 24-h incubation period,
the ethylene concentration was determined by gas
chromatography as described previously.13 Internal
populations of inoculated bacteria inside rice tissues
were estimated as follows. The 10-day-old rice seed-
lings were sampled from the test tubes. After the
seed parts of the seedlings were removed using
forceps, the remaining parts of the seedlings were
weighed. These parts were dipped in 70%
(v/v) ethanol and then immersed in 1% NaOCl solution
for 30 s. They were then quickly washed three
times with sterilized distilled water and then once
with sterilized saline solution. After the surface-sterilized
plants were aseptically macerated in 1 ml of
saline solution using a mortar and pestle, the macerates
were serially diluted with saline solution and
plated on Nutrient Agar (Difco) plates. After incubation
at 30°C for 7 days, colony forming units
(CFUs) were counted and expressed per gram fresh weight
of the rice plants. Uninoculated plants were grown
in parallel and subjected to CFU determination
as a negative control.
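Plate counts become a per-gram value through the dilution series and the 1 ml maceration volume. A minimal sketch, assuming a single countable plate; the plated volume is not stated in the text, so the 0.1 ml used below, like the colony count and tissue weight, is hypothetical:

```python
def cfu_per_gram(colonies, dilution_factor, plated_volume_ml,
                 macerate_volume_ml, fresh_weight_g):
    """CFU per gram fresh weight from one countable plate.

    colonies           -- colonies counted on the plate
    dilution_factor    -- fold dilution of the plated suspension (10 for 10^-1)
    plated_volume_ml   -- volume spread per plate (assumed; not in the text)
    macerate_volume_ml -- saline used for maceration (1 ml in the protocol)
    fresh_weight_g     -- fresh weight of the macerated seedling parts
    """
    if fresh_weight_g <= 0 or plated_volume_ml <= 0:
        raise ValueError("fresh weight and plated volume must be positive")
    cfu_per_ml_macerate = colonies * dilution_factor / plated_volume_ml
    return cfu_per_ml_macerate * macerate_volume_ml / fresh_weight_g


# Hypothetical plate: 30 colonies from the 10^-1 dilution, 0.1 ml plated,
# 0.05 g of seedling tissue macerated in 1 ml of saline.
print(cfu_per_gram(30, 10, 0.1, 1.0, 0.05))  # about 6e4 CFU per g
```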
2.2. Genome sequencing
Total cellular DNA was purified according to stan-
dard procedures, and three genomic libraries, based
on two types of cloning vectors, were constructed
for sequencing. The IB5100/1 library contained
inserts of ~3.0 kb cloned into pUC118 (Takara Bio
Inc., Japan), the IB5102/3 library contained inserts
of ~4.5 kb cloned into pUC118, and the IB510b
library contained inserts with an average size of
58 kb cloned into a BAC vector, pCC1BAC (Epicentre
Bio., USA).
38 Complete genomic structure of Azospirillum endophyte [Vol. 17,
Genome sequencing was performed using the
whole-genome shotgun method in combination
with BAC end-sequencing. The nucleotide sequences
of both ends of the clones from the IB5100/1,
IB5102/3, and IB510b libraries were analysed using
a Dye-terminator Cycle Sequencing Kit and the
3730XL Sequencer (Applied Biosystems, USA). The
end-sequence data from the BAC clones facilitated
the gap-closure process and provided the scaffolding
for reconstruction of the sequence of the entire
genome. We filled the remaining gaps in the sequence
by primer walking, using the plasmids or the BAC
clones as templates. The integrity of the reconstructed
genome sequence was assessed by chromosome
walking using the end sequences of the BAC clones.
2.3. Gene assignment, annotation, and information
analyses
RNA- and protein-encoding regions were assigned
by a combination of computer prediction and simi-
larity searches, as described previously.19
Genes for structural RNAs were identified by simi-
larity searches against an in-house structural RNA
database that had been constructed based on data
available in GenBank (rel. 167). tmRNA-, tRNA-, and
rRNA-encoding regions were predicted using the
ARAGORN 1.2.20 program,20 the tRNAscan-SE
1.23 program,21 and the RNAmmer ver. 1.2
program,22 respectively, in combination with
similarity searches.
The prediction of protein-encoding regions was
carried out with the Glimmer 2.13 prediction
program.23 Prior to prediction, a matrix was gener-
ated for the B510 genome by training with a data
set of 610 open reading frames that showed a high
degree of sequence similarity to a translated gene
set registered as the genomic data for both
Magnetospirillum magneticum AMB-1 (accession
number AP007255) and Rhodospirillum rubrum
ATCC 11170 (CP000230), which are bacteria
closely related to Azospirillum species.24 All the pre-
dicted protein-encoding regions of 150 bp or more
were translated into amino acid sequences, which
were then subjected to similarity searches against
the non-redundant (nr) protein database from NCBI
(GenBank database rel. 167.0) using the BLASTP
program.25 In parallel, all the predicted intergenic
sequences were compared with sequences in the nr
database using the BLASTX program to identify
genes that were not detected by the prediction
process. For predicted genes that did not show
sequence similarity to known genes, only those
equal to or longer than 150 bp were considered
candidates.
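Glimmer 2 scores candidate genes with interpolated Markov models trained on the 610 ORFs, and that model is not reproduced here. The sketch below illustrates only the 150-bp length rule: a naive six-frame scan for ATG-initiated ORFs, offered as a hypothetical stand-in for candidate generation rather than the gene caller used in the study. It ignores alternative start codons and ORFs that wrap around the origin of a circular replicon.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=150):
    """Return (strand, start, end) for ATG...stop ORFs of at least min_len bp.

    Coordinates are 0-based, half-open, on the forward strand, and include the
    stop codon. ORFs without an in-frame stop before the sequence end are dropped.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    end = i + 3
                    if end - start >= min_len:
                        if strand == "+":
                            orfs.append((strand, start, end))
                        else:
                            # map reverse-strand coordinates back to the forward strand
                            orfs.append((strand, n - end, n - start))
                    start = None
    return orfs


orf = "ATG" + "GCT" * 50 + "TAA"   # 156 bp
print(find_orfs(orf))              # [('+', 0, 156)]
```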
To annotate the functions of the assigned genes, the
KAAS system, which is based on bi-directional best-hit
information from sequence similarity against the
KEGG GENES database and on heuristics, was first
applied to all predicted protein-encoding genes.26
Next, the functions of genes not annotated by KAAS
were deduced based on the sequence similarity of their translated
protein products to those of genes of known function
and to the protein motifs in the InterPro database
(ver. 17.0).27 A BLAST E-value of 10⁻⁵ or less was considered
significant. Assignment of Clusters of Orthologous
Groups of proteins (COGs) of predicted gene products
was carried out by BLASTP analysis against the COG
reference data set (http://www.ncbi.nlm.nih.gov/
COG/).28 A BLAST E-value of less than 10⁻¹⁰ was
considered significant. After filtering, COG assignments of
the putative gene products were generated according
to COG identification, using the best-hit pair in the
reference data set.
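The two significance thresholds (10⁻⁵ for functional annotation, 10⁻¹⁰ for COG assignment) combined with a best-hit rule amount to a simple filter over tabular BLAST output. A minimal sketch, assuming hits are available in the 12-column BLAST+ tabular format (-outfmt 6, E-value in column 11, bit score in column 12) and that "best" means highest bit score; the query and subject IDs below are hypothetical:

```python
import csv
import io


def best_hits(tabular_text, max_evalue):
    """Best hit per query among hits with E-value strictly below max_evalue.

    Expects 12-column rows: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore. Ties keep the first hit seen.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


HITS = (
    "geneA\tsubj1\t80.0\t300\t60\t0\t1\t300\t1\t300\t1e-50\t200.0\n"
    "geneA\tsubj2\t50.0\t300\t150\t0\t1\t300\t1\t300\t1e-20\t90.0\n"
    "geneB\tsubj3\t30.0\t100\t70\t0\t1\t100\t1\t100\t1e-7\t45.0\n"
)
print(best_hits(HITS, 1e-5))   # geneA -> subj1, geneB -> subj3
print(best_hits(HITS, 1e-10))  # only geneA survives the stricter cutoff
```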
Comparison between two genomic nucleotide
sequences was performed using GenomeMatcher
V.1.270.29 The GC-skew analysis was performed as
described by Lobry.30 Phage_Finder ver. 4.6 was used
to detect the prophage region inserted into the
B510 genome.31
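GC skew is computed per window as (G − C)/(G + C); Fig. 1 uses a 10-kb window. Summing the window values gives a cumulative skew, and in many bacterial chromosomes its minimum lies near the replication origin and its maximum near the terminus. A minimal sketch of that calculation, assuming non-overlapping windows (the windowing in Lobry's method may differ):

```python
from itertools import accumulate


def gc_skew(seq, window=10_000):
    """(G - C) / (G + C) for consecutive, non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def origin_position_guess(seq, window=10_000):
    """0-based coordinate at the end of the window where cumulative skew is lowest."""
    cumulative = list(accumulate(gc_skew(seq, window)))
    return (cumulative.index(min(cumulative)) + 1) * window


toy = "C" * 100 + "G" * 100                  # skew flips sign at position 100
print(gc_skew(toy, window=50))               # [-1.0, -1.0, 1.0, 1.0]
print(origin_position_guess(toy, window=50)) # 100
```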
The FtsK Orienting Polar Sequences (KOPS) motif is
specifically oriented toward the replication terminus
of the genomic sequences in alpha-proteobacteria.32
The cumulative distribution of the KOPS sequence
patterns (GGGNAGGG) was calculated along each
replicon of B510, and the distribution of these
patterns in the genome was plotted. Multicopy DNA
elements longer than 500 bp that have the
capacity to encode a putative transposase were
identified as insertion sequences (ISs) using the BLAST2
program, and then classified using RECON (ver. 1.05)33
and ISfinder (www-is.biotoul.fr).
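GGGNAGGG read on the top strand appears as its reverse complement, CCCTNCCC, when the motif lies on the bottom strand. The text does not specify how the cumulative distribution was scored; one simple convention, assumed here, is a running total that adds one per top-strand motif and subtracts one per bottom-strand motif. If the motifs are oriented toward the terminus, such a total would be expected to rise along one replichore and fall along the other.

```python
import re

# Lookaheads allow overlapping matches to be counted.
KOPS_TOP = re.compile(r"(?=GGG[ACGT]AGGG)")     # motif on the top strand
KOPS_BOTTOM = re.compile(r"(?=CCCT[ACGT]CCC)")  # same motif on the bottom strand


def kops_cumulative(seq):
    """List of (position, running total): +1 per top-strand, -1 per bottom-strand motif."""
    seq = seq.upper()
    events = [(m.start(), 1) for m in KOPS_TOP.finditer(seq)]
    events += [(m.start(), -1) for m in KOPS_BOTTOM.finditer(seq)]
    events.sort()
    total, curve = 0, []
    for pos, step in events:
        total += step
        curve.append((pos, total))
    return curve


print(kops_cumulative("GGGAAGGG" + "TTTT" + "CCCTACCC"))  # [(0, 1), (12, 0)]
```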
3. Results and discussion
3.1. Colonization and N2 fixation in rice plants
The internal population of B510 was evaluated using
surface-sterilized rice seedlings and the plate counting
method. We calculated that there were 1.5–5.7 × 10⁴
CFU g⁻¹ fresh weight of inoculated seedlings. In contrast,
no colonies were detected in uninoculated rice
plants. These data indicate that B510 cells colonized
internal rice tissues, although the colonization level
was lower than that reported for other endophytes,
including Herbaspirillum sp. B501 (~10⁶ CFU g⁻¹
fresh weight).13,34 Yasuda et al.15 also observed
colonization of Azospirillum sp. B510 around the basal
parts of shoots of cv. Nipponbare using gusA-tagged
B510.
No. 1] T. Kaneko et al. 39
To evaluate the in planta N2-fixing activity of
Azospirillum sp. B510, acetylene reduction activity
was assayed using rice seedlings inoculated or not
with the bacterium. In the presence of acetylene,
the seedlings inoculated with Azospirillum sp. B510
showed marked acetylene reduction activity compared
with uninoculated plants and with inoculated plants
incubated in air (control) (Supplementary
Table S1). The acetylene reduction activity of
Azospirillum sp. B510 in planta (43 nmol h⁻¹ g⁻¹
fresh weight) was comparable to that of Herbaspirillum
sp. B501 (67 nmol h⁻¹ g⁻¹ fresh weight), a
well-characterized N2-fixing endophyte isolated from rice
plants.13,17,34
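The reported activities are normalized per hour and per gram fresh weight, but the conversion from the gas-chromatograph reading is not described. A minimal sketch, assuming the reading is ethylene in ppm (v/v) of a headspace of known volume treated as an ideal gas at 25°C and 1 atm; the headspace volume, reading, and tissue weight below are hypothetical. For reference, 43/67 ≈ 0.64, so the B510 rate is roughly two-thirds of the B501 rate.

```python
R = 8.314462618  # gas constant, J mol^-1 K^-1


def ethylene_nmol(ppm_v, headspace_ml, temp_c=25.0, pressure_pa=101_325.0):
    """nmol of ethylene in a headspace from a ppm (v/v) reading, ideal-gas assumption."""
    total_mol = pressure_pa * headspace_ml * 1e-6 / (R * (temp_c + 273.15))
    return ppm_v * 1e-6 * total_mol * 1e9


def ara_rate(ethylene_nmol_total, incubation_h, fresh_weight_g):
    """Acetylene reduction activity in nmol C2H4 h^-1 g^-1 fresh weight."""
    return ethylene_nmol_total / (incubation_h * fresh_weight_g)


# Hypothetical tube: 10 ppm ethylene in 20 ml headspace after 24 h, 0.1 g seedling.
n = ethylene_nmol(10, 20)
print(round(n, 2), round(ara_rate(n, 24, 0.1), 2))
```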
3.2. Sequencing and structural features of the genome
The nucleotide sequence of the entire B510
genome was deduced initially by assembling a total
of 66 554 sequence files, which corresponded to
approximately six genome equivalents, according to
the method described in the Materials and methods
section. To ensure that the nucleotide sequence was
sufficiently accurate for further analysis of gene struc-
ture and function, finishing was carried out by visually
editing the draft sequences and by additional sequen-
cing to close the gaps. The genome of B510 consists
of a single chromosome and six circular plasmids
designated as pAB510a, pAB510b, pAB510c,
pAB510d, pAB510e, and pAB510f. The total size of
the genome is 7 599 738 bp, and the average GC
content is 67.6%. The size and the percentage of GC
content of each replicon are summarized in Table 1.
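The totals in this paragraph can be cross-checked against Table 1, and the read count gives a rough idea of read length. A minimal sketch, assuming each of the 66 554 sequence files is one read and using the idealized Lander–Waterman model (uniformly random reads), under which a base is left uncovered with probability e^(−c) at coverage c:

```python
import math

# Per-replicon values transcribed from Table 1.
SIZE_BP = {"chromosome": 3_311_395, "pAB510a": 1_455_109, "pAB510b": 723_779,
           "pAB510c": 681_723, "pAB510d": 628_837, "pAB510e": 537_299,
           "pAB510f": 261_596}
GC_PERCENT = {"chromosome": 67.8, "pAB510a": 67.6, "pAB510b": 67.5,
              "pAB510c": 67.4, "pAB510d": 68.0, "pAB510e": 67.5, "pAB510f": 65.9}
PROTEIN_GENES = {"chromosome": 2893, "pAB510a": 1131, "pAB510b": 631,
                 "pAB510c": 533, "pAB510d": 519, "pAB510e": 415, "pAB510f": 187}
N_READS = 66_554  # sequence files assembled into the draft (assumed one read each)


def genome_size():
    return sum(SIZE_BP.values())


def weighted_gc():
    """Size-weighted mean of the per-replicon G+C values (%)."""
    return sum(SIZE_BP[r] * GC_PERCENT[r] for r in SIZE_BP) / genome_size()


def plasmid_protein_genes():
    return sum(n for r, n in PROTEIN_GENES.items() if r != "chromosome")


def implied_mean_read_length(coverage):
    # Coverage c = N * L / G, so the mean read length is L = c * G / N.
    return coverage * genome_size() / N_READS


def expected_uncovered_bp(coverage):
    # Lander-Waterman idealization: each base is missed with probability exp(-c).
    return genome_size() * math.exp(-coverage)


if __name__ == "__main__":
    print(genome_size())                        # 7599738
    print(round(weighted_gc(), 1))              # 67.6
    print(plasmid_protein_genes())              # 3416
    print(round(implied_mean_read_length(6)))   # 685
    print(round(expected_uncovered_bp(6)))      # 18838
```

Under these assumptions, six genome equivalents imply reads of roughly 685 bp on average and leave on the order of 19 kb uncovered, which is consistent with the need for the gap-closing steps described above.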
The integrity of 99.9% of the final genome sequence
was assessed by comparing the insert length of
anchored BAC clones with the computed distance
between the end sequences of the clones. The
integrity of the remaining region (334 299–
341 799 nt on the chromosome) where no BAC
clone was anchored was confirmed using the
sequence and insert length information of the
plasmid clones.
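The integrity check compares each BAC's sized insert (average 58 kb) with the distance between its two end sequences on the assembly, which on a circular replicon may wrap around position 1. A minimal sketch, assuming the end reads are already mapped in the expected orientation with left_end upstream of right_end; the 20% tolerance and the example coordinates are hypothetical:

```python
def forward_span(left_end, right_end, replicon_len):
    """Length from left_end to right_end (1-based, inclusive), wrapping past the origin."""
    return (right_end - left_end) % replicon_len + 1


def clone_is_consistent(left_end, right_end, replicon_len, insert_len, tolerance=0.2):
    """True if the mapped span agrees with the sized insert within a fractional tolerance."""
    span = forward_span(left_end, right_end, replicon_len)
    return abs(span - insert_len) <= tolerance * insert_len


CHROMOSOME_BP = 3_311_395
# Hypothetical clone whose ends straddle the chromosome's position 1.
print(forward_span(3_300_000, 47_000, CHROMOSOME_BP))                 # 58396
print(clone_is_consistent(3_300_000, 47_000, CHROMOSOME_BP, 58_000))  # True
print(clone_is_consistent(100, 200_100, CHROMOSOME_BP, 58_000))       # False
```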
The nucleotide position was numbered from one
nucleotide upstream of the predicted ATG start
codon, based on the predicted translational initiation
site of the homologue of hemE (AZL028930) in the
chromosome. Nucleotide positions for the plasmids
were assigned based on the predicted translational
initiation site of AZLa11310 in pAB510a,
AZLb06310 in pAB510b, AZLc05330 in pAB510c,
AZLe04150 in pAB510e, AZLf01870 in pAB510f,
and the termination of AZLd05190 in pAB510d,
respectively.
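Fixing position 1 on a circular replicon amounts to rotating the sequence and shifting every coordinate by the same offset. A minimal sketch, assuming the anchor gene lies on the forward strand and that "one nucleotide upstream" means position 1 is the base immediately 5′ of the A of ATG; anchors on the reverse strand, or the termination-based anchor used for pAB510d, would need a different offset and are not handled here. The toy sequence is hypothetical.

```python
def rotate_circular(seq, new_first_index):
    """Return seq re-indexed so that seq[new_first_index] becomes position 1."""
    k = new_first_index % len(seq)
    return seq[k:] + seq[:k]


def renumber(old_pos, new_first_index, length):
    """Convert a 1-based coordinate on the old sequence to the rotated sequence."""
    return (old_pos - 1 - new_first_index) % length + 1


def first_index_for_atg(atg_index):
    """0-based index of the nucleotide immediately 5' of an ATG starting at atg_index."""
    return atg_index - 1


toy = "AAACCATGTT"                        # ATG starts at 0-based index 5
start = first_index_for_atg(toy.find("ATG"))
print(rotate_circular(toy, start))         # CATGTTAAAC
print(renumber(1, start, len(toy)))        # 7
```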
Ten Azospirillum species have been examined for
their genome composition, using pulsed-field gel elec-
trophoresis.35 Multiple replicons were identified in 10
Azospirillum species, as with B510. However, the
chromosome of each of these Azospirillum strains
(<2.7 Mb) was smaller than that of B510 (3.3 Mb).
Linear plasmids were detected in several Azospirillum
strains, such as A. brasilense and A. lipoferum,35 but
similar structural units were not found in the B510
genome.
Table 1. Features of replicons in Azospirillum sp. B510 genome
| Feature | Chromosome | pAB510a | pAB510b | pAB510c | pAB510d | pAB510e | pAB510f |
|---|---|---|---|---|---|---|---|
| Size (bp) | 3 311 395 | 1 455 109 | 723 779 | 681 723 | 628 837 | 537 299 | 261 596 |
| G + C content (%) | 67.8 | 67.6 | 67.5 | 67.4 | 68.0 | 67.5 | 65.9 |
| Prophage | 2 | ND | ND | ND | ND | ND | ND |
| Genomic island | 6 | ND | 1 | ND | ND | 1 | ND |
| tRNA genes | 45 | 14 | 2 | 3 | 6 | 9 | ND |
| rRNA genes (a) | 2 (rrn1, 2) | 4 (rrn4, 5, 6, 7) | 1 (rrn8) | 1 (rrn9) | ND | 1 (rrn3) | ND |
| Protein genes | 2893 | 1131 | 631 | 533 | 519 | 415 | 187 |
| COG assignment (b) | 2020 | 896 | 525 | 441 | 389 | 309 | 138 |
| Not in COGs | 873 | 235 | 106 | 92 | 130 | 106 | 49 |
ND, not identified.
(a) The parenthetical IDs refer to the rRNA gene clusters (Supplementary Fig. S2).
(b) Number of genes classified into the 19 COG categories, excluding those in 'function unknown'.
3.3. Structural features of the genome
3.3.1. Putative replication origin A GC-skew
analysis was performed to locate the probable origin
of DNA replication. We found that the GC-skew values
shifted in two regions of the
chromosome, at coordinates 35 and 1710 kb, as
shown in Fig. 1 (the innermost circle). The hemE locus,
which is known to be associated with the origins of
replication in alpha-proteobacteria,36 was found to be
adjacent to the shift point of the GC skew. A cluster of
nine genes, rho–hypothetical–hemH–hemE–hypothetical–maf–aroE–coaE–dnaQ (AZL028900–
AZL028930–AZL000010–AZL000050: these codes
are hereinafter defined in the Protein-encoding genes
section), occurring at ~0 kb on the B510 chromosome
(Supplementary Fig. S2), is also found in
the Magnetospirillum sp. AMB-1 genome.37 parA
(AZL000140) and parB (AZL000150) were found
downstream of dnaQ (AZL000050; Supplementary
Fig. S2). These findings strongly suggest that the ori
region of the chromosome is located between
Figure 1. Schematic representation of the seven circular replicons in the Azospirillum sp. B510 genome. All plasmids are drawn to the same
scale, and the chromosome is drawn at one-half the scale of the plasmids. The scale outside each map indicates the location (in kb). The bars
in the two outer circles (the outermost and the second circle) show the positions of the putative protein-encoding genes in the clockwise
and counter-clockwise directions, respectively. The putative genes are represented by 25 colours based on COG assignments
(Supplementary Fig. S2). In the third circle from the outside, positions of structural RNA genes are indicated by black (tRNAs) and
red (rRNAs) bars. In the fourth circle from the outside, red bars indicate the positions of ISs, and the pale-green and blue areas
show the insertion of prophages and genomic islands, respectively. The innermost and second circles from the centre show the GC-skew
values (yellow and purple) and the average GC percent (blue and red), respectively, calculated using a window size of 10 kb.
The scales for GC percent are presented on the second circles; the top/bottom values of each scale are 73.9/58.9% in
the chromosome, 74.6/57.0% in pAB510a, 72.5/60.7% in pAB510b, 72.0/60.2% in pAB510c, 74.9/48.3% in pAB510d,
71.9/59.0% in pAB510e, and 71.1/52.7% in pAB510f.