
Genome-Wide TSS Distribution in Three Related Clostridia with Normalized Capp-Switch Sequencing


Rémi Hocq,*
Surabhi Jagtap,
Magali Boutard,
Andrew C. Tolonen,
Laurent Duval,
Aurélie Pirayre,
Nicolas Lopes Ferreira,
François Wasels
IFP Energies Nouvelles, Rueil-Malmaison, France
Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, Evry, France
ABSTRACT Transcription initiation is a tightly regulated process that is crucial for many
aspects of prokaryotic physiology. High-throughput transcription start site (TSS) mapping
can shed light on global and local regulation of transcription initiation, which in turn may
help us understand and predict microbial behavior. In this study, we used Capp-Switch
sequencing to determine the TSS positions in the genomes of three model solventogenic
clostridia: Clostridium acetobutylicum ATCC 824, C. beijerinckii DSM 6423, and C. beijerinckii
NCIMB 8052. We first refined the approach by implementing a normalization pipeline
accounting for gene expression, yielding a total of 12,114 mapped TSSs across the spe-
cies. We further compared the distributions of these sites in the three strains. Results indi-
cated similar distribution patterns at the genome scale, but also some sharp differences,
such as for the butyryl-CoA synthesis operon, particularly when comparing C. acetobutyli-
cum to the C. beijerinckii strains. Lastly, we found that promoter structure is generally
poorly conserved between C. acetobutylicum and C. beijerinckii. A few conserved
promoters across species are discussed, showing interesting examples of how TSS
determination and comparison can improve our understanding of gene expression regulation at
the transcript level.
IMPORTANCE Solventogenic clostridia have been employed in industry for more than
a century, initially being used in the acetone-butanol-ethanol (ABE) fermentation process
for acetone and butanol production. Interest in these bacteria has recently increased in
the context of green chemistry and sustainable development. However, our current
understanding of their genomes and physiology limits their optimal use as industrial sol-
vent production platforms. The gene regulatory mechanisms of solventogenesis are still
only partly understood, impeding efforts to increase rates and yields. Genome-wide map-
ping of transcription start sites (TSSs) for three model solventogenic Clostridium strains is
an important step toward understanding mechanisms of gene regulation in these indus-
trially important bacteria.
KEYWORDS Clostridium, butanol, solvents, transcriptional regulation
In bacteria, a key step in the regulation of gene expression is at the initiation of
transcription, specifically whether and where the RNA polymerase binds DNA at the
transcription start site (TSS) (1). In this regard, a precise description of the dynamics of TSS
usage patterns can provide critical information for the comprehension of bacterial
transcription regulation. Genome-wide determination of prokaryotic TSSs has been
greatly facilitated by RNA-seq-derived methods that take advantage of the characteristic
presence of a 5′ triphosphate on the initiating nucleotide of unprocessed RNAs
(such as mRNAs). In the most recent approaches, these RNAs are selectively targeted
by the vaccinia capping enzyme, which adds either a desthiobiotinylated (Cappable-
seq) or a biotinylated (Capp-Switch seq) guanosine cap on triphosphorylated RNAs,
permitting their capture on streptavidin-coupled beads (2–4).
Editor Jennifer M. Auchtung, University of
Copyright © 2022 Hocq et al. This is an open-
access article distributed under the terms of
the Creative Commons Attribution 4.0
International license.
Address correspondence to François Wasels,
*Present address: Rémi Hocq, Institute for
Chemical, Environmental and Bioscience
Engineering, Biochemical Engineering
Research Division, Vienna University of
Technology, Vienna, Austria.
The authors declare no conflict of interest.
Received 16 November 2021
Accepted 22 March 2022
Published 12 April 2022
March/April 2022 Volume 10 Issue 2 10.1128/spectrum.02288-21 1
In this study, we used the Capp-Switch sequencing methodology to map TSSs
alongside the genomes of three clostridial strains: C. acetobutylicum ATCC 824, C. bei-
jerinckii NCIMB 8052 and C. beijerinckii DSM 6423. These model solventogenic
Clostridium strains share the capacity to ferment plant polysaccharides into a mixture
of acetone, isopropanol, ethanol, and butanol (5). Despite this shared phenotypic trait,
as well as others (anaerobic metabolism, ability to ferment a wide range of carbohy-
drates, sporulation), the C. beijerinckii and C. acetobutylicum species are clearly geneti-
cally distinct (6). Genetic differences have arisen even between the same-species
strains NCIMB 8052 and DSM 6423, leading to distinct phenotypic characteristics and
behavior (e.g., reduction of acetone) (7). These similarities and differences, arising from
close but distinct evolutionary paths, are likely to be reflected at the genetic level.
Comparing TSS distributions on the genomes of these strains could therefore highlight
important gene regulation features, as functionally important TSSs theoretically have a
high chance to be conserved.
We observed that Capp-Switch sequencing data sets are strongly biased toward the
detection of TSSs for highly expressed genes, especially over 1,000 transcripts per kilo-
base million (TPM). We thus improved the Capp-Switch analysis pipeline by incorporat-
ing a normalization step based on RNA-seq expression. Using normalized Capp-Switch
data sets, we described TSS distribution in these three model solventogenic strains,
identifying primary and alternative TSSs at the genome-wide level. We experimentally
validated our pipeline with the example of the central butyryl-CoA synthesis operon,
for which secondary TSS positioning is species-specific. Finally, we identified conserved
promoters across strains and compared the resulting maps, hypothesizing on the
potential regulatory roles of these features and pinpointing how TSS identification can
be used to access potentially crucial regulatory features which may differ between
strains or species.
Manual TSS discovery in C. acetobutylicum from raw Capp-Switch data. The
Capp-Switch seq library preparation pipeline (Fig. 1a) (3) was applied to RNA samples
extracted from mid-log-phase C. beijerinckii DSM 6423, NCIMB 8052, and C. acetobutylicum
ATCC 824 cultures growing on glucose. C. acetobutylicum raw sequencing reads
were first examined manually using Geneious software (https://www.geneious.com)
to identify TSSs along the chromosome and pSOL plasmid sequences. Although
TSSs were easily identified in most cases, strong background signal, i.e., mapping of
reads which apparently did not correspond to the 5′ ends of transcripts, hindered the
precise identification of TSSs for some genes. In particular, highly expressed genes
were covered by reads on their whole sequence, not only at the 5′ end of the transcript.
This background noise was most likely detected because of nonspecific initiation
of transcription or the TSS enrichment step retaining processed transcripts.
A total of 1,250 TSSs were manually identified, of which 94.8% were purines (524 A and
661 G). This TSS mapping identified novel transcriptional features (Fig. 1b to d). For
example, the CA_C0303 gene, encoding a ferredoxin, is flanked by
inward-facing TSSs (Fig. 1b), suggesting that the antisense transcript could regulate
CA_C0303 expression. As a second example, transcription of CA_C2229, encoding the
pyruvate ferredoxin oxidoreductase, appears to be initiated at two upstream sites
(Fig. 1c). It can be hypothesized that alternative TSSs are the result of transcription initiated
by polymerases bound to alternative sigma factors, which might help regulate
gene expression depending on specific conditions such as environmental stimuli or redox
state. As a third example of how TSS data reveal novel transcript features, identification
of a TSS at position 17,288 in the pSOL megaplasmid (Fig. 1d) led to the identification
of an unannotated coding sequence (8, 9). The product of this gene is identical
to holin-like toxins identified in the Clostridium genus and shares 41.2% identity with
the antibacterial protein Tmp1 (10).
Expression normalization improves TSS detection. We found that genes which
are highly expressed in published RNA-seq data sets (7, 11) often had multiple TSSs.
TSS Mapping in 3 Clostridia with Capp-Switch Seq Microbiology Spectrum
Hence, we reasoned that the final threshold set on reads per million (RPM) values of
TSSs (i.e., 10 RPM for Capp-Switch sequencing [3]) needed to be dynamically adjusted to
consider the local gene expression downstream from each TSS. Each TSS is associated
with a single gene based on its localization (intragenic, positioned inside the associated
gene; intergenic, positioned outside the associated gene, defined as having the closest
gene start or end relative to the TSS). We performed RNA-seq on the same RNA samples
that were used to prepare Capp-Switch seq libraries. We then normalized the strength of
each TSS (in RPM) with the RNA-seq expression (in transcripts per kilobase million [TPM])
of its associated gene (Fig. S1 in the supplemental material). Because each identified TSS
is associated with a normalized RPM value which is dependent on gene expression, this
method allowed us to retain, in practice, an invariable, simple, normalized RPM cutoff at
the end of the data analysis pipeline (10 normalized RPM, or 25 for a higher confidence).
Next, we compared the data sets for each strain, with (normalized) and without (raw)
normalization (Fig. 2, Fig. S2 and S3). Adjusting TSS strength relative to local gene expression
significantly changed the data set distribution (Fig. 2a, Fig. S2). When we compared
the distribution of the expression of gene subsets with a detected TSS, this further
resulted in a significant shift from highly expressed genes (raw data set; >1,000 TPM) to
a distribution representative of the whole RNA-seq data set (Fig. 2b). This observation
suggests that our novel analysis detected TSSs more evenly along the bacterial genome.
Normalization also resulted in a greater number of TSSs (Fig. 2c), which probably
resulted from the higher sensitivity when additional data were considered.
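The gene-association rule and the expression normalization described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the exact normalization formula is given only in Fig. S1, so the scaling used here (TSS RPM multiplied by a reference TPM and divided by the gene's TPM) is an assumption, as are the TPM floor and all class, function, and gene names. The values are toy numbers, not data from this study.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based coordinate, start <= end on either strand
    end: int
    tpm: float   # RNA-seq expression of the gene


def associate_gene(tss_pos, genes):
    """Associate a TSS with one gene, following the rule stated in the text:
    intragenic if the TSS lies inside a gene, otherwise intergenic and linked
    to the gene whose start or end is closest to the TSS."""
    for gene in genes:
        if gene.start <= tss_pos <= gene.end:
            return gene, "intragenic"
    nearest = min(
        genes,
        key=lambda g: min(abs(tss_pos - g.start), abs(tss_pos - g.end)),
    )
    return nearest, "intergenic"


def normalized_rpm(tss_rpm, gene_tpm, reference_tpm=100.0, tpm_floor=1.0):
    """ASSUMED formula (hypothetical): rescale the TSS strength by the
    expression of its gene relative to a reference TPM. The floor avoids
    division by zero for unexpressed genes."""
    return tss_rpm * reference_tpm / max(gene_tpm, tpm_floor)


def call_tss(candidates, genes, cutoff=10.0):
    """Keep candidate TSSs whose normalized RPM exceeds a fixed cutoff.
    candidates is a list of (position, raw_rpm) tuples."""
    kept = []
    for pos, rpm in candidates:
        gene, location = associate_gene(pos, genes)
        score = normalized_rpm(rpm, gene.tpm)
        if score > cutoff:
            kept.append((pos, gene.name, location, score))
    return kept


# Toy data: one highly expressed gene and one weakly expressed gene.
genes = [Gene("geneA", 100, 1000, 2000.0), Gene("geneB", 1500, 2500, 50.0)]
candidates = [(90, 400.0), (500, 40.0), (1480, 8.0)]

kept_10 = call_tss(candidates, genes, cutoff=10.0)
kept_25 = call_tss(candidates, genes, cutoff=25.0)
```

In this toy case a raw 10 RPM cutoff would keep the intragenic site at position 500 (40 RPM) inside the highly expressed gene and discard the site at position 1480 (8 RPM) upstream of the weakly expressed gene. After the assumed normalization the outcome is reversed, which is the qualitative behavior the text describes.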
FIG 1 Capp-Switch library preparation protocol and manual data analysis. (a) Capp-Switch and RNA-seq
libraries are constructed starting from isolated bacterial RNA. For Capp-Switch libraries, primary transcripts
bearing a triphosphate at the 5′ end are first selectively purified using streptavidin beads. Both Capp-Switch
and RNA-seq libraries were reverse transcribed using a template-switching enzyme and sequenced on the
Illumina platform. (b to d) Raw Capp-Switch reads mapped on the genome of C. acetobutylicum ATCC 824.
Predicted sense transcription start sites (TSSs) are shown as blue triangles, antisense TSSs are orange triangles.
Predicted open reading frames are shown in yellow. (b) Opposing TSSs flank the ferredoxin gene CA_C0303. (c)
Two alternative TSSs are located upstream of the pyruvate ferredoxin oxidoreductase gene CA_C2229. (d)
Detection of a new coding sequence (orange) downstream from CA_P0016.
We subsequently tested our hypothesis that in the original data set, TSSs tend to
accumulate on a few genes (Fig. 2d). A high proportion of reads (>70%) contributed
to the detection of TSSs in genes that bore more than 4 TSSs, with a maximum of 244
TSSs for a single gene (Fig. S3). Conversely, expression normalization reduced the proportion
of genes with more than 4 TSSs to 15%, indicating that (i) a higher number
of genes overall were found with one or several TSSs and (ii) the normalization step
improved the data set by removing secondary TSSs which were linked either to perva-
sive transcription or to methodological noise.
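The accumulation measure used in Fig. 2d can be illustrated with a short sketch. Given TSSs that are already assigned to genes, it reports the share of total RPM carried by genes with more than 4 TSSs. The input layout and the function name are assumptions made for illustration, and the values are toy numbers rather than data from this study.

```python
from collections import defaultdict


def rpm_share_of_multi_tss_genes(tss_table, max_tss=4):
    """Return the percentage of total RPM contributed by genes that carry
    more than max_tss TSSs. tss_table is an iterable of (gene, rpm) pairs,
    one pair per detected TSS."""
    tss_count = defaultdict(int)
    rpm_sum = defaultdict(float)
    for gene, rpm in tss_table:
        tss_count[gene] += 1
        rpm_sum[gene] += rpm
    total = sum(rpm_sum.values())
    if total == 0:
        return 0.0
    crowded = sum(rpm_sum[g] for g in tss_count if tss_count[g] > max_tss)
    return 100.0 * crowded / total


# Toy data: geneA carries 5 TSSs of 10 RPM each, geneB and geneC one TSS each.
toy = [("geneA", 10.0)] * 5 + [("geneB", 30.0), ("geneC", 20.0)]
share = rpm_share_of_multi_tss_genes(toy)
```

Here geneA alone accounts for half of the total signal, so the function returns 50.0.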
Finally, we compared TSSs in both data sets (Fig. 2e). Most were data set-specific,
which underlines the impact of an additional normalization step on TSS identification.
To summarize, optimized expression normalization allows an even detection of
TSSs along the bacterial genome. Indeed, raw data sets tend to identify TSSs on very
highly expressed genes, which is detrimental because they only represent a fraction of
expressed genes. Our normalization pipeline enhances TSS detection in an expression-
independent manner and increases the number of genes with detected TSSs.
Capp-Switch identifies thousands of TSSs in 3 clostridial genomes. After optimizing
the TSS identification pipeline, we focused on the resulting data for each strain
(Fig. 3, Table S1, and Fig. S4). Analysis of duplicate cultures permitted the detection of
4,090, 4,583, and 3,441 TSSs in C. beijerinckii DSM 6423, NCIMB 8052, and C. acetobutylicum
ATCC 824, respectively, with a high confidence threshold (>25 normalized RPM,
Fig. 3a). Adjusting the threshold to 10 normalized RPM increased the number of TSSs
to over 13,000 and 9,000 for the C. beijerinckii and C. acetobutylicum strains, respectively.
TSSs identified in C. acetobutylicum were compared to previously experimentally
[Figure 2 graphic. Panels compare raw and normalized data sets. Axis labels: TSS number (×10^3), TSS per gene, cumulated TSS strength (% of total RPM), TSS strength (log2(RPM)), TSS-associated gene expression, with an RNA-seq control. Thresholds shown: 10 RPM and 25 RPM. Venn diagram counts: 2354, 632, 3458.]
FIG 2 Normalization of Capp-Switch reads enhances TSS identification for C. beijerinckii DSM 6423. (a) TSS
expression distribution with (normalized) and without (raw) expression normalization (10 reads per million
[RPM] [TSS] detection threshold shown). (b) RNA-seq expression distributions of (i) all genes (black line) and (ii)
the subset of genes for which TSSs have been found, in normalized (blue) and raw (red) data sets (25 RPM
detection threshold shown). (c) Number of TSSs found for normalized and raw data sets. (d) TSS number per
gene (with detected TSSs, 25 RPM detection threshold shown). Value corresponds to the number of reads
falling in each category (as percentage of the total RPM). (e) Venn diagram showing TSSs found for each data
set (25 RPM detection threshold shown) and the corresponding numbers of associated genes.
validated TSSs for 11 genes (with a primer extension or 5′ RACE; Table S2). All previously
identified TSSs were found in the normalized 10 RPM data set, and in 9 of 11
genes in the 25 normalized RPM data set. However, the latter data set limits the discov-
ery of novel, likely pervasive, lowly expressed secondary TSSs. Hence, we only used the
25 normalized RPM data sets in subsequent analyses to limit the false discovery rate.
TSSs were classified in 4 categories depending on their orientation and localization
relative to the associated genes: InterS (intergenic TSS with downstream gene in same
orientation), InterA (intergenic TSS with downstream gene in opposite orientation), IntraS
(intragenic TSS in gene with same orientation), or IntraA (intragenic TSS in gene with
opposite orientation) (Fig. 3b). In the 3 strains, the TSS distribution was relatively similar,
with most TSSs identified in the sense direction (InterS: 40 to 55%; IntraS: 40 to 55%).
Such an abundance of intragenic TSSs has been observed on several occasions using
different methodologies (4, 12), and this has been hypothesized to mainly be the result
of pervasive transcription, in some cases, however, with a conserved function (such as
driving the expression of truncated proteins or ncRNAs). This high number of IntraS
TSSs, however, must be considered in light of the high proportion of coding sequences
in bacterial genomes (88%, 81%, and 83% of the ATCC 824, NCIMB 8052, and DSM
6423 genomes, respectively). Even though the numbers of InterS and IntraS TSSs were
similar, most of the reads contributed to InterS TSSs (56 to 65% of total reads, Table
S3). These canonical, intergenic TSSs were found upstream from 1,468 genes in DSM
6423 (23% of genes), 1,525 genes in NCIMB 8052 (29%), and 976 genes in ATCC 824
(25%). For most of these genes, our data further revealed that, in our experimental con-
ditions, transcription was controlled by a single InterS TSS (in 65 to 80% of the cases,
FIG 3 General transcriptomic features of C. beijerinckii DSM 6423, C. beijerinckii NCIMB 8052, and C. acetobutylicum ATCC 824.
(a) Number of TSSs found for each strain with a confidence threshold of 25 RPM. (b) Classification of TSSs in 4 categories:
intergenic sense (InterS), intragenic sense (IntraS), intergenic antisense (InterA), and intragenic antisense (IntraA). (c) Number of
InterS TSSs per gene for each strain. Values correspond to the percentage of genes with detected TSSs. (d) The −35 and −10
motifs found upstream from InterS TSSs of the three strains. (e) 5′ UTR length distributions, calculated as the distance
between an InterS TSS and coding DNA sequence (CDS) starts.
depending on the strain; Fig. 3c). As expected, conserved −10 and −35 motifs were
found enriched upstream from detected TSSs in all three strains (Fig. 3d), confirming
these were bona fide TSSs. Less than 3% of the TSSs were observed in the antisense
direction, which supports previous results obtained for C. phytofermentans (3) (Fig. 3b).
In accordance with this study, we observed that these antisense transcripts may have
important biological functions (Table S4). Indeed, antisense transcription initiation
events were often detected for genes involved in transcriptional control, redox control,
and sugar uptake, suggesting that antisense transcription may regulate some of these
important cellular processes in vivo. Antisense transcription from some
InterA and IntraA TSSs was confirmed by visualizing the mapping of forward reads from
paired-read RNA-seq (Fig. S5).
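The four TSS categories defined above reduce to two binary choices: whether the TSS lies inside its associated gene, and whether it shares that gene's orientation. A minimal sketch follows. The function names and the "+"/"-" strand encoding are assumptions for illustration, and the labels in the example are toy values.

```python
from collections import Counter


def classify_tss(tss_strand, gene_strand, intragenic):
    """Return InterS, InterA, IntraS, or IntraA for a TSS and its gene.
    Strands are encoded as "+" or "-" (an assumed convention)."""
    location = "Intra" if intragenic else "Inter"
    orientation = "S" if tss_strand == gene_strand else "A"
    return location + orientation


def category_percentages(labels):
    """Percentage of TSSs falling in each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Toy example: four TSSs described as (TSS strand, gene strand, inside gene?).
toy_tss = [("+", "+", False), ("-", "-", True), ("+", "-", False), ("+", "+", True)]
labels = [classify_tss(*t) for t in toy_tss]
percentages = category_percentages(labels)
```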
The 5′ UTR lengths (measured as distances in bp between InterS TSS positions and
corresponding coding DNA sequence [CDS] starts) rarely exceeded 200 bp in the three
strains (Fig. 3e), exhibiting no correlation with gene expression (Data Set S8). Most 5′
UTRs were between 0 and 100 bp long (with a peak at −32 bp relative to the start
codon), with 2 to 3% of transcripts categorized as leaderless (transcripts not bearing
an upstream RBS; for this analysis, 5′ UTR length of <6 bp), suggesting that, despite
being common in other bacteria (13, 14), leaderless transcription seems to occur at relatively
low levels in solventogenic clostridia.
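The 5′ UTR measurement and the leaderless criterion above can be written as a short strand-aware sketch. The 1-based coordinate convention, the function names, and the example coordinates are assumptions for illustration. Only the <6 bp leaderless cutoff comes from the text.

```python
def utr5_length(tss_pos, cds_start, cds_end, strand):
    """Distance in bp between an InterS TSS and the first base of the CDS.
    Coordinates are 1-based with cds_start <= cds_end (assumed convention).
    On the minus strand the CDS begins at its highest coordinate."""
    if strand == "+":
        length = cds_start - tss_pos
    elif strand == "-":
        length = tss_pos - cds_end
    else:
        raise ValueError("strand must be '+' or '-'")
    if length < 0:
        raise ValueError("TSS lies downstream of the CDS start")
    return length


def is_leaderless(utr_length, cutoff=6):
    """Leaderless as defined in the text: 5' UTR shorter than 6 bp."""
    return utr_length < cutoff


# Toy coordinates, not taken from the study.
plus_utr = utr5_length(tss_pos=968, cds_start=1000, cds_end=1900, strand="+")
minus_utr = utr5_length(tss_pos=3149, cds_start=2000, cds_end=3000, strand="-")
short_utr = utr5_length(tss_pos=997, cds_start=1000, cds_end=1900, strand="+")
```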
Alternative sigma factor binding sites were also identified in the sequences upstream
from InterS TSSs using RSAT (15) and a collection of motifs described in the literature
(Table S5). Only a very small subset of genes is transcribed via alternative sigma factors in
the three strains, which is not surprising, as alternative sigma factor-based regulation has
mostly been shown to regulate sporulation with regulons typically constituted by a few
genes (16). Expression using alternative sigma factors may be more common under non-
ideal growth conditions.
Capp-Switch reveals that acetoacetyl-CoA conversion is differentially controlled in
C. acetobutylicum and C. beijerinckii. Solventogenic clostridia have an atypical metabolism
that allows them to convert multiple carbon sources to a panel of high-interest
industrial compounds (i.e., butanol, ethanol, acetone, isopropanol, 2,3-butanediol).
These metabolic pathways are centered around acetyl-CoA (Fig. 4a), which serves as
the fundamental building block for these metabolites. To synthesize acetone or isopropanol
and butanol, two molecules of acetyl-CoA are first condensed into one acetoacetyl-CoA,
which is transformed into acetone/isopropanol via the CtfAB-Adc-Sadh route,
or into butanol by enzymes encoded by the butyryl-CoA synthesis (BCS) operon and
subsequently by aldehyde/alcohol dehydrogenases.
The BCS pathway involves the products of 5 genes (hbd, crt, bcd, etfA, etfB) and was
described several decades ago as a single operon in C. acetobutylicum (17). Even though
previous transcriptomics analyses suggested that this organization was conserved in both
C. beijerinckii strains (7, 11), our comparative TSS analysis revealed that, in addition to the
TSS located upstream of the first gene (crt) in all three strains, there was a novel and highly
used TSS in C. beijerinckii strains located upstream from the hbd gene (i.e., the last gene)
coding for the first enzyme involved in the metabolic pathway (Fig. 4b).
These results were experimentally veried by Northern blotting (Fig. 4c and d). For
each strain, single-stranded radiolabeled DNA probes targeting either crt or hbd were
hybridized to nitrocellulose-transferred bacterial RNAs. The results show an hbd transcript
(1 kb) specific to C. beijerinckii strains. Anti-hbd Northern blots also revealed a high-molecular-weight
signal (between 4 and 6 kb) similar to anti-crt Northern blots, suggesting
that hbd is also transcribed as a part of the original BCS operon. However, this signal sur-
prisingly constitutes two distinct transcripts, detected in both C. beijerinckii assays. These
transcripts, therefore, contain both the anti-crt and anti-hbd probe binding sites, suggest-
ing either the existence of an upstream, lowly expressed TSS which was not detected by
our Capp-Switch approach, or a form of transcript processing (such as the ones described
by Gill et al. [18]) which precisely shortened some of the BCS transcripts.
To summarize, an experimental approach allowed us to check the biological rele-
vance of our Capp-Switch data, which indicated the presence of an alternative TSS for
hbd transcription in C. beijerinckii. Northern blot analysis confirmed its presence, and
further indicated an operonic/sub-operonic structure driving hbd expression.
Promoter comparison across strains. Promoters are likely to be conserved across
close species if they are under selective pressure (12). With this in mind, we compared
promoter sequences in the three bacterial genomes, focusing on InterS TSSs (filtered
so that the distance between the TSS and the start codon was less than 200 bp) and
IntraS TSSs (Fig. S7). For each strain, promoter sequences (50 bp upstream from TSSs)
were extracted and aligned using pairwise alignments. To do this, promoter sequences
associated with homologous genes were aligned (homology >60%). An alignment
score threshold was chosen for each pair of strains based on the distribution of alignment
scores. This pipeline allowed the recovery of 1,396 and 691 conserved InterS and
IntraS promoters, respectively (Table S6). Alignments were further filtered so that each
promoter was only associated with a single promoter from another strain (Fig. 5a and
b). Most strikingly, the number of associated C. beijerinckii-C. acetobutylicum promoters
was very low (10% of C. acetobutylicum promoters from genes which have orthologs
in C. beijerinckii were conserved in at least one of the C. beijerinckii strains), underlining
a poor conservation of promoters between these species. Despite their phenotypic
resemblance, these organisms might have diverged a sufficiently long time ago (as
suggested in a recent phylogenetic evaluation [6]), implying that promoter sequences
cannot be associated using our approach. On the other hand, comparison of the
two C. beijerinckii strains using our pipeline indicated that about half of InterS and a
third of IntraS promoters were conserved. Therefore, we performed functional enrich-
ment of genes with promoters conserved for the two C. beijerinckii strains (Fig. S8).
While some categories appear slightly depleted or enriched, the distribution of gene
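[Figure 4 graphic. Pathway labels: CtfA-CtfB, Adc, EtfA-EtfB. Gene order shown: crt, bcd, etfB, etfA, hbd, with a common TSS and a C. beijerinckii-specific TSS marked. Probe 1 and probe 2 hybridization sites; transcript sizes ≈ 1 kb and ≈ 5 kb; strain labels 8052 and 824; scale bar, 500 bp.]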
FIG 4 Northern blot analysis of the butyryl-CoA synthesis (BCS) transcript structures. (a) Metabolic pathways orienting solventogenic
metabolisms toward acetone/isopropanol or butanol formation. Pfor, pyruvate:ferredoxin oxidoreductase; Thl, thiolase; Hbd,
β-hydroxybutyryl-CoA dehydrogenase; Crt, crotonase; Bcd, butyryl-CoA dehydrogenase; Etf, electron transfer flavoprotein; Pta,
phosphotransacetylase; Ack, acetate kinase; Ptb, phosphate butyryltransferase; Buk, butyrate kinase; Ctf, acetoacetyl-CoA:acetate/
butyrate-CoA transferase; Adc, acetoacetate decarboxylase; Ald, aldehyde dehydrogenase; Adh, alcohol dehydrogenase; sAdh,
secondary alcohol dehydrogenase. (b) Raw Capp-Switch reads mapped on the genome of C. beijerinckii NCIMB 8052. Annotated open
reading frames are represented in yellow. (c) Hybridization sites of the probes used for Northern blotting are highlighted for the crt
and hbd genes. (d) Anti-crt and anti-hbd Northern blotting was performed for each strain.
categories with conserved promoters seems relatively similar to the gene category dis-
tribution of the genome, suggesting that gene function is not highly relevant to the
conservation of the considered promoters.
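The promoter comparison described above (extract 50 bp upstream of each TSS, align promoters of homologous genes pairwise, keep pairs above a score threshold) can be sketched as follows. The text does not name the alignment algorithm or its scoring, so the global alignment (Needleman-Wunsch) and the match, mismatch, and gap values used here are assumptions, as are the function names. The genome is also treated as linear for simplicity, and the sequences are toy strings.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def promoter_region(genome, tss_pos, strand, length=50):
    """Return the `length` bases immediately upstream of a TSS, written
    5'->3' on the TSS strand. tss_pos is 1-based; the genome is treated
    as a linear string (a simplification)."""
    if strand == "+":
        start = tss_pos - 1 - length
        if start < 0:
            raise ValueError("not enough upstream sequence")
        return genome[start:tss_pos - 1]
    if strand == "-":
        end = tss_pos + length
        if end > len(genome):
            raise ValueError("not enough upstream sequence")
        # Upstream on the minus strand means higher coordinates;
        # reverse-complement to read it 5'->3'.
        return genome[tss_pos:end].translate(COMPLEMENT)[::-1]
    raise ValueError("strand must be '+' or '-'")


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score with ASSUMED scoring parameters, computed
    row by row so that only two rows are held in memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, base_a in enumerate(a, start=1):
        curr = [i * gap]
        for j, base_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if base_a == base_b else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def is_conserved(promoter_a, promoter_b, threshold):
    """Call a promoter pair conserved if its score reaches the threshold
    chosen for that pair of strains."""
    return global_alignment_score(promoter_a, promoter_b) >= threshold


# Toy genome and TSSs, with a 3-bp window to keep the example readable.
toy_genome = "AAACCCGATTTT"
plus_promoter = promoter_region(toy_genome, 7, "+", length=3)
minus_promoter = promoter_region(toy_genome, 6, "-", length=3)
```

In practice the threshold would be read from the score distribution of each strain pair, as the text states, rather than fixed in advance.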
We subsequently looked at promoter conservation at the gene level and found some
interesting examples (Fig. 5c to f). Hsp18 is an important heat shock protein in solventogenic
clostridia, as it is induced at the onset of solventogenesis and presumably involved
in solvent resistance (19–21). The corresponding promoter is well conserved in all three
strains, but the 5′ UTR lengths are different (C. acetobutylicum, 149 bp, as previously
reported [19]; C. beijerinckii, 39 bp, Fig. 5c). This could be linked to additional post-transcriptional
gene regulation in C. acetobutylicum. Indeed, the 149-bp 5′ UTR from C. acetobutylicum
reveals an extensive secondary structure (as shown by using RNAfold with default
parameters [22]), which could regulate transcript translation via a riboswitch-like mechanism,
i.e., by promoting premature transcription termination or inhibiting translation initiation. In
contrast, the 5′ UTR length of ptb, a central metabolic gene involved in butyrate formation
and solventogenesis regulation (23, 24), is strictly the same among species, but the
FIG 5 Promoter comparison in the three strains. (a and b) Venn diagrams showing the number of (a) InterS and (b)
IntraS TSSs conserved and not conserved after ortholog pair identification, promoter alignment, and threshold-based
selection. (c to f) Selected examples of TSSs with or without conservation in the three strains.
promoters are species-specific, suggesting that a differential transcriptional regulation
exists for these genes in the two species.
Because of their putative regulatory role, intragenic promoters are interesting to ana-
lyze, since strong conservation might imply a relevant functional role. In the case of the
RNase Y gene, which codes for an essential protein involved in mRNA decay (25), the canonical
promoter was well conserved in all three strains, with similar 5′ UTR lengths
(Fig. 5e). However, in both C. beijerinckii strains, very strong internal promoters were also
detected. In particular, an antisense promoter strongly conserved in these two strains
might repress RNase Y expression via an RNA polymerase collision mechanism (26) (supported
by strand-specific RNA-seq reads; Fig. S5A). For the anti-sigma factor CsfB, an inhibitor
of the sporulation- and solventogenesis-related sigma factor, a
strong intragenic promoter is conserved in all three strains (Fig. 5f). Hence, this promoter
might be involved in the regulation of this gene in the three strains, and therefore in the
regulation of solvent production and spore formation.
As illustrated by these few examples, comparing promoter conservation across species
can highlight differential or similar transcriptional features, discovering potentially impor-
tant information on how various processes (such as solventogenesis) are regulated.
In this work, we investigated transcription initiation by determining TSSs in three
related solventogenic clostridia. In this regard, Capp-Switch sequencing (3) was used and
first refined to include an optimized normalization pipeline, in an attempt to better reflect
the biological relevance of identied TSSs. To do this, gene expression data analysis was
incorporated into the detection pipeline by performing RNA-seq on the same mRNA sam-
ples and using the resulting expression values to normalize Capp-Switch seq data. This
additional step limited the gene expression bias and hence enhanced results at the ge-
nome scale (expression bias, TSS number/gene) and the gene scale (Fig. S3), underlining
the importance of treating Capp-Switch data with a normalization step.
This improvement could be due to biological and/or technical causes, e.g., if strong
promoters increased permissive transcription events in the vicinity of the primary
TSS or if the purification technique was not fully efficient. Indeed, for highly expressed
genes, even a small proportion of co-purified undesirable RNAs (i.e., fragmented RNAs
for which the 5′ end does not correspond to the TSS) could bias the TSS detection
pipeline. In both cases, accounting for expression helps to mitigate this problem and
detect bona fide TSSs, particularly for low or moderately expressed genes.
These results therefore highlight the necessity of a second sequencing to treat Capp-
Switch seq data. In their original work, Ettwiller and colleagues (4) previously questioned
the necessity of a control sequencing for Cappable-Seq (in this case, a library where the
triphosphate (PPP) 5′ end purification step is omitted), similarly to what had been done
until then for dRNA-seq (2). However, they concluded that this control was unnecessary
under their conditions because it only allowed the elimination of a minority of TSSs.
These discrepancies with our conclusions might come from technical differences (i.e.,
the way RNAs are processed in Capp-Switch seq versus Cappable-seq) or the way TSS
data were compared to the control in both experiments. Indeed, local enrichment was
considered for Cappable-seq, whereas in Capp-Switch seq we accounted for the overall
expression of the associated gene for each TSS. This might considerably change the
observations, as Illumina RNA-seq produces uneven coverage along genes; in particular,
lower coverage at the 5′ end (28, 29).
Our genome and gene analyses (Fig. 3, Fig. S3) indicated that, for the three strains,
Capp-Switch accurately detects TSSs at the single-nucleotide resolution. Importantly,
TSSs have similar features in the three clostridia (total and per-gene number of TSSs,
TSS categories, 5′ UTR length, upstream motifs). One interesting feature is the very low
number of antisense TSSs, which was already found in C. phytofermentans using the
same method (3). Indeed, it is well known that, in clostridia, antisense transcription
exists and has a relevant biological role (30, 31). Our data, however, suggest that
TSS Mapping in 3 Clostridia with Capp-Switch Seq Microbiology Spectrum
March/April 2022 Volume 10 Issue 2 10.1128/spectrum.02288-21 9
antisense transcription initiation is uncommon in these organisms and might be restricted
to a few genes or linked to very weak promoters, making its detection uncertain
when using the same pipeline as for sense TSSs.
While we observed many inter-strain TSS similarities at the genome-wide scale, there were
interesting differences at the gene scale. In particular, the BCS operon is a striking example of
how TSS data can be readily used to spot differences between microorganisms that would
otherwise be difficult to distinguish using RNA-seq only. Indeed, for both C. beijerinckii
strains, previous transcriptomics studies predicted the same gene organization as for C.
acetobutylicum (7, 11). However, our data show that C. beijerinckii uses an operonic/sub-operonic
structure for the expression of BCS genes, with hbd under the control of its own promoter.
Given that Hbd catalyzes the initial step of the metabolic pathway, separate transcriptional
regulation could allow tighter control of the whole pathway, possibly to fine-tune the
balance of acetoacetyl-CoA conversion into butanol or into acetone/isopropanol.
This interesting interspecies difference prompted us to develop a pipeline to automatically
retrieve conserved promoters between the three strains. Intuitively, promoters under
strong selective pressure are more likely to be conserved because they are highly functional.
Conversely, non-conserved promoters may reflect weaker selective pressure or a different
evolutionary path of the compared organisms. This approach is similar to the one Shao
et al. adopted (12), which identified secondary intragenic TSSs as a highly conserved feature
in various Shewanella species. Our results, however, indicate that promoters diverged
sufficiently between C. acetobutylicum and C. beijerinckii that only a few of them were
conserved. These few promoters most likely drive crucial functions (see examples
in Fig. 5). Between both C. beijerinckii strains, intergenic and intragenic promoter conservation
is, as expected, much higher. Conservation of these promoters, however, does not
seem to be related to gene function (Fig. S8), which is surprising given our initial hypotheses.
This observation could be explained by the fact that crucial functions are distributed
over various categories, or that tight promoter control might only be necessary in some
instances, not at the gene category level.
Comparison of promoters between the three strains also has the potential to shed light
on unannotated genes and regulatory pathways. For example, this is the case for CsfB, an
anti-sigma factor targeting sporulation-specific sigma factors which has only been studied
in B. subtilis (26) but could also have a highly relevant biological function in solventogenic
clostridia. Indeed, the regulation of sporulation-specific sigma factors (via transcription
interference, as suggested by our data for csfB) is likely to have a strong impact on solventogenesis,
since both aspects are intertwined (32–35). This is especially important because,
even though alcohol production is the major reason for the study of solventogenic clostridia,
the solventogenesis regulatory circuitry is still only partly understood (36).
Overall, new aspects revealed by our TSS-mapping data can complement previous transcriptomic
studies of solventogenic clostridia. Making these maps available to the community
will undoubtedly further our comprehension of gene expression and help formulate
relevant metabolic engineering strategies for these industrially relevant microorganisms.
MATERIALS AND METHODS
Strains, media, and culture conditions. Clostridial strains (ATCC 824, NCIMB 8052, DSM 6423) were
grown anaerobically at 34°C in liquid 2YTG (16 g tryptone, 10 g yeast extract, 5 g NaCl, and 20 g glucose
per L [pH 5.2]). Solid medium was prepared with 15 g/L agar and 5 g/L glucose, without pH adjustment.
For RNA isolation, 10 mL of mid-log-phase duplicate cultures (optical density at 600 nm 0.5) was harvested
and stabilized on ice with 1.25 mL cold equilibration solution (a 1:18 proportion of acid phenol:
ethanol). After centrifugation, cell pellets were kept at −80°C until further use.
RNA extraction. Frozen pellets were suspended in TRIzol reagent (Invitrogen). Following cell lysis,
chloroform was added, and the aqueous phase was isopropanol-precipitated. Total RNA was then
treated with Turbo DNase (Invitrogen) and further purified (RNA Clean and Concentrator-5, >200-bp
protocol, Zymo Research). Ribosomal RNAs were depleted using the RiboZero kit (Illumina) and the
resulting RNAs were stored at −80°C until further use.
RNA-seq library preparation. RNA-seq libraries were prepared with a template-switching protocol.
After depletion of rRNA (RiboZero, Illumina), Moloney murine leukemia virus (MMLV) reverse transcriptase
(SMARTScribe, Takara Bio) was used to obtain cDNAs with the SMART Stranded N6 Primer mix (Takara Bio).
cDNAs were purified with AMPure magnetic beads (Beckman Coulter). Libraries were next obtained by PCR
using beads as the template (SeqAmp polymerase, Universal Forward and Indexed Reverse primers, Takara
Bio). After on-bead purification, libraries were sequenced with a MiSeq device (Illumina).
Capp-Switch library preparation. The Capp-Switch sequencing library preparation protocol was
used (3). Briefly, a 5′ biotinylated cap was first added to the 5′-PPP RNAs using vaccinia capping enzyme
(New England Biolabs). RNAs were fragmented and 5′-capped RNAs were enriched using streptavidin
beads. cDNA synthesis was performed directly on the beads with the template-switching method, using
the SMARTscribe MMLV RT (Clontech). Eluted cDNAs were further used as templates for PCR (Universal
Forward PCR primer and Indexed Reverse PCR primer, Clontech Laboratories), and the resulting libraries
were sequenced on an Illumina MiSeq.
Accession number(s). Sequencing reads from RNA-seq and Capp-Switch seq have been submitted
to the SRA Database (BioProject accession no. PRJNA767822).
Capp-Switch data treatment pipeline. Capp-Switch forward reads were trimmed to remove the 3-bp
reverse transcriptase extension derived from the template-switching library preparation protocol.
Capp-Switch and RNA-seq reads were then mapped to the relevant genomes (C. acetobutylicum ATCC
824, GCA_000008765.1; C. beijerinckii NCIMB 8052, GCA_000016965.1; C. beijerinckii DSM 6423,
GCA_900010805.1) using Geneious R10. Between 95 and 99% of reads were mapped to unique positions, yielding
between 0.73 million (rep. 2 ATCC 824 Capp-Switch) and 2.9 million (rep. 1 NCIMB 8052 RNA-seq) reads
per sample (Table S6). Capp-Switch data were then processed with a succession of custom Perl scripts (3).
Briefly, TSSs were first identified by counting the number of forward reads starting at each genomic
position. Genome annotations were used to associate TSSs with the closest genes and classify these
TSSs into four subcategories (3) (InterS, InterA, IntraS, or IntraA; see Results). TSS replicates were compared,
and positions not detected in both duplicates were discarded. Next, gene annotations were used to associate
TSS and RNA-seq data sets. Normalized data sets were obtained by dividing TSS raw read counts
by the mean RNA-seq expression values (in TPM) of TSS-associated genes and scaling the resulting values
back to a total of 1 million reads. In the case of the ATCC 824 strain, only one RNA-seq library could be
sequenced, as no RNA for replicate 1 was left after the initial round of Capp-Switch sequencing. In this case,
the TPM value of replicate 2 was used for normalization. TSS positions were next clustered in the raw
and normalized data sets by retaining positions with the highest numbers of reads in 5-bp sliding windows.
These positions were further filtered with a cutoff of 10 or 25 (high-confidence data set) read
starts per million reads.
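The normalization, clustering, and cutoff steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the custom Perl scripts used in the study: the function names, the dictionary layout keyed by (strand, position), the greedy peak selection used to interpret "highest numbers of reads in 5-bp sliding windows", the choice to skip TSSs whose gene has a TPM of zero, and the inclusive cutoff are all hypothetical.

```python
def normalize_tss(raw_counts, tss_gene, gene_tpm, total=1_000_000):
    """Divide each TSS read-start count by the TPM of its associated gene,
    then rescale so that the normalized values sum to `total`."""
    ratios = {}
    for key, count in raw_counts.items():
        tpm = gene_tpm.get(tss_gene.get(key), 0.0)
        if tpm > 0:  # assumption: TSSs of genes without measured expression are skipped
            ratios[key] = count / tpm
    if not ratios:
        return {}
    scale = total / sum(ratios.values())
    return {key: value * scale for key, value in ratios.items()}


def cluster_tss(counts, window=5):
    """Greedy peak picking (one possible reading of the 5-bp window rule):
    visit positions from highest to lowest value and keep a position only if
    no already-kept position lies on the same strand within `window` bp."""
    kept = {}
    for (strand, pos), value in sorted(counts.items(), key=lambda kv: -kv[1]):
        if all(s != strand or abs(pos - p) >= window for (s, p) in kept):
            kept[(strand, pos)] = value
    return kept


def apply_cutoff(counts, cutoff=10):
    """Keep positions with at least `cutoff` read starts per million
    (assumption: the threshold is inclusive)."""
    return {key: value for key, value in counts.items() if value >= cutoff}


# Toy data: two nearby read-start positions in a highly expressed gene and
# one position in a weakly expressed gene.
raw = {("+", 100): 50, ("+", 102): 10, ("+", 200): 40}
genes = {("+", 100): "geneA", ("+", 102): "geneA", ("+", 200): "geneB"}
tpm = {"geneA": 100.0, "geneB": 10.0}

normalized = normalize_tss(raw, genes, tpm)
high_confidence = apply_cutoff(cluster_tss(normalized), cutoff=25)
```

In this toy case the TSS of the weakly expressed gene ends up with the largest normalized value even though its raw count is lower, which is the effect the expression-based normalization is meant to produce.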
Motif analysis. The InterS subcategory of TSS was considered for motif analysis. The 50 bp upstream
from each InterS TSS were extracted from the genome sequences of the 3 strains and defined as promoter
sequences. These were fed into MEME (37) for motif discovery. All parameter settings were kept
at default and statistically significant motifs were selected based on their E values (<0.05).
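Extracting the 50 bp upstream of a TSS has to take the strand into account. The sketch below shows one way to do it, assuming 1-based TSS coordinates, a linear genome string (wrap-around at the origin of a circular chromosome is ignored), and hypothetical function names; the resulting FASTA text could then be submitted to a motif discovery tool such as MEME.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def promoter_sequence(genome, tss, strand, length=50):
    """Return the `length` bases immediately upstream of a TSS, read 5'->3'
    on the transcribed strand. `tss` is a 1-based coordinate. Returns None
    when the window would run past either end of the sequence."""
    if strand == "+":
        start = tss - 1 - length  # 0-based index of the first upstream base
        if start < 0:
            return None
        return genome[start:tss - 1]
    end = tss + length  # 0-based exclusive end of the upstream window
    if end > len(genome):
        return None
    return reverse_complement(genome[tss:end])


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Toy example with a 4-bp window on a 16-bp sequence.
genome = "AAAACCCCGGGGTTTT"
records = [
    ("tss_plus_9", promoter_sequence(genome, 9, "+", length=4)),
    ("tss_minus_8", promoter_sequence(genome, 8, "-", length=4)),
]
fasta_text = to_fasta(records)
```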
Detection of TSSs conserved in the different strains. The overall illustration of the pipeline is shown
in Fig. S7 in the supplemental material. To detect conserved promoters in the 3 strains, we first filtered the
TSS subcategory InterS based on the distance (d < 200 bp) between the TSSs and their associated gene
starts. The promoter sequences for the InterS and IntraS categories were extracted 50 bp upstream of
mapped TSSs. We performed all possible combinations of pairwise alignments, using the Needleman-
Wunsch algorithm, across all strains for the respective subcategories. Instead of giving an empirical threshold
of the alignment score to select conserved TSS, orthology-driven mapping of promoters was considered.
Orthology information was obtained from the MicroScope database (38). For each pair of strains, we split the
alignment scores into two groups (with alignment scores of promoters whose genes were orthologs or non-
orthologs). For these two lists of alignment scores, we observed that the probability distribution for the
scores of the orthologs group was different from that of the non-orthologs group. For each pair of strains, we
selected an alignment score threshold based on how these distributions overlapped. For RNase Y, the antisense
promoter was manually identified and aligned for the 3 strains.
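As an illustration of the approach, the following sketch computes a Needleman-Wunsch global alignment score for two promoter sequences and derives a score threshold from the non-ortholog score distribution. The scoring scheme (match +1, mismatch -1, gap -1) and the threshold rule (the smallest score reached by at most a fraction alpha of non-ortholog pairs) are assumptions made for this example; the text above does not specify either, and the function names are hypothetical.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty,
    computed row by row so that only two rows are kept in memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def threshold_from_background(non_ortholog_scores, alpha=0.05):
    """Illustrative rule: return the smallest score t such that at most a
    fraction `alpha` of the non-ortholog scores are greater than or equal to t."""
    ordered = sorted(non_ortholog_scores)
    n = len(ordered)
    for t in sorted(set(ordered)):
        if sum(s >= t for s in ordered) / n <= alpha:
            return t
    return ordered[-1] + 1


# Toy example: score one promoter pair and compare it with a threshold
# derived from a made-up background of non-ortholog scores.
background = [nw_score("ACGTACGT", other) for other in ("TTTTTTTT", "GGGGCCCC", "ACGTTTTT")]
threshold = threshold_from_background(background, alpha=0.34)
conserved = nw_score("ACGTACGT", "ACGTACGA") >= threshold
```

Promoter pairs from orthologous genes whose score reaches the threshold would then be called conserved; how the two distributions are actually compared in the study may differ from this simple rule.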
Study of specific sigma factors. Based on a list of 13 motifs related to different DNA-binding regulators
(Sigma A, Sigma 54, Sigma E, Sigma F, Sigma G, Sigma H, Sigma K, and Spo0A), we investigated, for
each strain, whether these motifs were recovered in the InterS promoter set (Table S5).
To this end, we applied the dna-pattern tool provided in the pattern-matching suite of RSAT (14) to
all promoter sequences for each strain. We parametrized the algorithm to allow degenerate bases
based on the IUPAC code, as well as indels in the pattern. A motif search was performed on both strands,
and overlapping of successive matches was prevented. For each motif, we obtained for each promoter
the number of matches and their positions. We then post-processed this information to obtain statistics on
the presence of specific sigma factor motifs in the InterS promoters of the three strains independently.
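The kind of search described here (degenerate IUPAC patterns, both strands, no overlapping matches) can be approximated with regular expressions. The sketch below is not RSAT's dna-pattern and does not handle indels; the function names are hypothetical and the patterns in the example are illustrative only, not the motifs of Table S5.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def iupac_to_regex(pattern):
    """Translate an IUPAC pattern into an equivalent regular expression."""
    return "".join(IUPAC[base] for base in pattern.upper())


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (IUPAC codes supported)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(pattern, promoter):
    """Return (strand, start) pairs for non-overlapping matches on both strands.
    `start` is the 0-based leftmost position of the match on the forward
    sequence, whichever strand the match was found on."""
    regex = re.compile(iupac_to_regex(pattern))
    forward = promoter.upper()
    hits = [("+", m.start()) for m in regex.finditer(forward)]
    # re.finditer scans left to right and never reports overlapping matches.
    reverse = reverse_complement(forward)
    hits += [("-", len(reverse) - m.end()) for m in regex.finditer(reverse)]
    return hits


# Count matches of an illustrative pattern in a small promoter set.
promoters = {"p1": "GGTATAATGG", "p2": "CCCCCCCCCC"}
match_counts = {name: len(find_motif("TATAAT", seq)) for name, seq in promoters.items()}
```

Reporting minus-strand hits in forward coordinates keeps positions from both strands directly comparable when computing per-promoter statistics.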
Northern blotting. For each lane, 10 μg heat-denatured total RNA was size-separated in a denaturing
1.2% agarose gel (formaldehyde 6.6%). Following electrophoresis, RNAs were transferred overnight
by capillarity with 10× SSC buffer (1×: 150 mM NaCl, 15 mM sodium citrate [pH 7.0]) onto a nitrocellulose
membrane (Hybond N+, GE Healthcare). Probes were obtained in two steps. First, 300- to 400-bp
fragments of target genes were amplified by PCR on genomic DNA. Second, these PCR products were
used as templates for unidirectional PCR (250- to 350-bp amplification of reverse strands, with incorporation
of P-radiolabeled dATP; Perkin Elmer). The primers used for these PCRs are shown in Table S8. RNAs were
UV-cross-linked onto the membranes and pre-blocked for 1 h at 65°C with salmon sperm DNA in Church
buffer (0.5 M sodium phosphate [pH 7.0], 7% SDS, 1 mM EDTA). Radioactive probes were subsequently
added for hybridization for 6 h at 65°C. After two washes in 2× SSC-0.5% SDS buffer and one wash in
0.1× SSC-0.5% SDS buffer at 65°C, radioactivity was revealed by exposure to a phosphor screen (GE
Healthcare) and analysis of this screen using a Typhoon imager.
Supplemental material is available online only.
This research received no specific grant from any funding agency in the public,
commercial, or not-for-profit sectors.
We thank Hervé le Hir for providing facilities, equipment, and technical assistance
adapted to Northern blot analysis, and Leila Bastianelli for fruitful discussions about
REFERENCES
1. Browning DF, Busby SJW. 2016. Local and global regulation of transcription initiation in bacteria. Nat Rev Microbiol 14:638–650.
2. Sharma CM, Hoffmann S, Darfeuille F, Reignier J, Findeiss S, Sittka A, Chabas S, Reiche K, Hackermüller J, Reinhardt R, Stadler PF, Vogel J. 2010. The primary transcriptome of the major human pathogen Helicobacter pylori. Nature 464:250–255.
3. Boutard M, Ettwiller L, Cerisy T, Alberti A, Labadie K, Salanoubat M, Schildkraut I, Tolonen AC. 2016. Global repositioning of transcription start sites in a plant-fermenting bacterium. Nat Commun 7:13783. https://doi
4. Ettwiller L, Buswell J, Yigit E, Schildkraut I. 2016. A novel enrichment strategy reveals unprecedented number of novel transcription start sites at single base resolution in a model prokaryote and the gut microbiome. BMC Genomics 17:199.
5. Jones DT, Woods DR. 1986. Acetone-butanol fermentation revisited. Microbiol Rev 50:484–524.
6. Poehlein A, Solano JDM, Flitsch SK, Krabben P, Winzer K, Reid SJ, Jones DT, Green E, Minton NP, Daniel R, Dürre P. 2017. Microbial solvent formation revisited by comparative genome analysis. Biotechnol Biofuels 10:58.
7. Máté de Gérando H, Wasels F, Bisson A, Clement B, Bidard F, Jourdier E, López-Contreras AM, Lopes Ferreira N. 2018. Genome and transcriptome of the natural isopropanol producer Clostridium beijerinckii DSM6423. BMC Genomics 19:242.
8. Nölling J, Breton G, Omelchenko MV, Makarova KS, Zeng Q, Gibson R, Lee HM, Dubois J, Qiu D, Hitti J, Wolf YI, Tatusov RL, Sabathe F, Doucette-Stamm L, Soucaille P, Daly MJ, Bennett GN, Koonin EV, Smith DR. 2001. Genome sequence and comparative analysis of the solvent-producing bacterium Clostridium acetobutylicum. J Bacteriol 183:4823–4838. https://
9. Ehsaan M, Kuit W, Zhang Y, Cartman ST, Heap JT, Winzer K, Minton NP. 2016. Mutant generation by allelic exchange and genome resequencing of the biobutanol organism Clostridium acetobutylicum ATCC 824. Biotechnol Biofuels 9:4.
10. Rajesh T, Anthony T, Saranya S, Pushpam PL, Gunasekaran P. 2011. Functional characterization of a new holin-like antibacterial protein coding gene tmp1 from goat skin surface metagenome. Appl Microbiol Biotechnol 89:1061–1073.
11. Wang Y, Li X, Mao Y, Blaschek HP. 2011. Single-nucleotide resolution analysis of the transcriptome structure of Clostridium beijerinckii NCIMB 8052 using RNA-Seq. BMC Genomics 12:479.
12. Shao W, Price MN, Deutschbauer AM, Romine MF, Arkin AP. 2014. Conservation of transcription start sites within genes across a bacterial genus. mBio 5:e01398-14.
13. Cortes T, Schubert OT, Rose G, Arnvig KB, Comas I, Aebersold R, Young DB. 2013. Genome-wide mapping of transcriptional start sites defines an extensive leaderless transcriptome in Mycobacterium tuberculosis. Cell Rep 5:1121–1131.
14. Shell SS, Wang J, Lapierre P, Mir M, Chase MR, Pyle MM, Gawande R, Ahmad R, Sarracino DA, Ioerger TR, Fortune SM, Derbyshire KM, Wade JT, Gray TA. 2015. Leaderless transcripts and small proteins are common features of the mycobacterial translational landscape. PLoS Genet 11:
15. Nguyen NTT, Contreras-Moreira B, Castro-Mondragon JA, Santana-Garcia W, Ossio R, Robles-Espinoza CD, Bahin M, Collombet S, Vincens P, Thieffry D, van Helden J, Medina-Rivera A, Thomas-Chollier M. 2018. RSAT 2018: regulatory sequence analysis tools 20th anniversary. Nucleic Acids Res 46:
16. Al-Hinai MA, Jones SW, Papoutsakis ET. 2015. The Clostridium sporulation programs: diversity and preservation of endospore differentiation. Microbiol Mol Biol Rev 79:19–37.
17. Boynton ZL, Bennet GN, Rudolph FB. 1996. Cloning, sequencing, and expression of clustered genes encoding beta-hydroxybutyryl-coenzyme A (CoA) dehydrogenase, crotonase, and butyryl-CoA dehydrogenase from Clostridium acetobutylicum ATCC 824. J Bacteriol 178:3015–3024.
18. Gill EE, Chan LS, Winsor GL, Dobson N, Lo R, Ho Sui SJ, Dhillon BK, Taylor PK, Shrestha R, Spencer C, Hancock REW, Unrau PJ, Brinkman FSL. 2018. High-throughput detection of RNA processing in bacteria. BMC Genomics
19. Pich A, Narberhaus F, Bahl H. 1990. Induction of heat shock proteins during initiation of solvent formation in Clostridium acetobutylicum. Appl Microbiol Biotechnol 33:697–704.
20. Bahl H, Müller H, Behrens S, Joseph H, Narberhaus F. 1995. Expression of heat shock genes in Clostridium acetobutylicum. FEMS Microbiol Rev 17:
21. Sauer U, Dürre P. 1993. Sequence and molecular characterization of a DNA region encoding a small heat shock protein of Clostridium acetobutylicum. J Bacteriol 175:3394–3400.
22. Gruber AR, Bernhart SH, Lorenz R. 2015. The ViennaRNA web services. Methods Mol Biol 1269:307–326.
23. Wiesenborn DP, Rudolph FB, Papoutsakis ET. 1989. Phosphotransbutyrylase from Clostridium acetobutylicum ATCC 824 and its role in acidogenesis. Appl Environ Microbiol 55:317–322.
24. Zhao Y, Tomas CA, Rudolph FB, Papoutsakis ET, Bennett GN. 2005. Intracellular butyryl phosphate and acetyl phosphate concentrations in Clostridium acetobutylicum and their implications for solvent formation. Appl Environ Microbiol 71:530–537.
25. Lehnik-Habrink M, Newman J, Rothe FM, Solovyova AS, Rodrigues C, Herzberg C, Commichau FM, Lewis RJ, Stülke J. 2011. RNase Y in Bacillus subtilis: a natively disordered protein that is the functional equivalent of RNase E from Escherichia coli. J Bacteriol 193:5431–5441.
26. Shearwin KE, Callen BP, Egan JB. 2005. Transcriptional interference: a crash course. Trends Genet 21:339–345.
27. Martínez-Lumbreras S, Alfano C, Evans NJ, Collins KM, Flanagan KA, Atkinson RA, Krysztonska EM, Vydyanath A, Jackter J, Fixon-Owoo S, Camp AH, Isaacson RL. 2018. Structural and functional insights into Bacillus subtilis sigma factor inhibitor, CsfB. Structure 26:640–648.e5. https://
28. Wellenreuther R, Schupp I, Poustka A, Wiemann S, German cDNA Consortium. 2004. SMART amplification combined with cDNA size fractionation in order to obtain large full-length clones. BMC Genomics 5:36. https://
29. Zhu YY, Machleder EM, Chenchik A, Li R, Siebert PD. 2001. Reverse transcriptase template switching: a SMART approach for full-length cDNA library construction. BioTechniques 30:892–897.
30. André G, Even S, Putzer H, Burguière P, Croux C, Danchin A, Martin-Verstraete I, Soutourina O. 2008. S-box and T-box riboswitches and antisense RNA control a sulfur metabolic operon of Clostridium acetobutylicum. Nucleic Acids Res 36:5955–5969.
31. Soutourina OA, Monot M, Boudry P, Saujet L, Pichon C, Sismeiro O, Semenova E, Severinov K, Le Bouguenec C, Coppée J-Y, Dupuy B, Martin-Verstraete I. 2013. Genome-wide identification of regulatory RNAs in the human pathogen Clostridium difficile. PLoS Genet 9:e1003493. https://doi
32. Janssen PJ, Jones DT, Woods DR. 1990. Studies on Clostridium acetobutylicum glnA promoters and antisense RNA. Mol Microbiol 4:1575–1583.
33. Bi C, Jones SW, Hess DR, Tracy BP, Papoutsakis ET. 2011. SpoIIE is necessary for asymmetric division, sporulation, and expression of σF, σE, and σG but does not control solvent production in Clostridium acetobutylicum ATCC 824. J Bacteriol 193:5130–5137.
34. Jones SW, Tracy BP, Gaida SM, Papoutsakis ET. 2011. Inactivation of σF in Clostridium acetobutylicum ATCC 824 blocks sporulation prior to asymmetric division and abolishes σE and σG protein expression but does not block solvent formation. J Bacteriol 193:2429–2440.
35. Tracy BP, Jones SW, Papoutsakis ET. 2011. Inactivation of σE and σG in Clostridium acetobutylicum illuminates their roles in clostridial-cell-form biogenesis, granulose synthesis, solventogenesis, and spore morphogenesis. J Bacteriol 193:1414–1426.
36. Xue Q, Yang Y, Chen J, Chen L, Yang S, Jiang W, Gu Y. 2016. Roles of three AbrBs in regulating two-phase Clostridium acetobutylicum fermentation. Appl Microbiol Biotechnol 100:9081–9089.
37. Bailey TL, Johnson J, Grant CE, Noble WS. 2015. The MEME Suite. Nucleic Acids Res 43:W39–W49.
38. Vallenet D, Calteau A, Dubois M, Amours P, Bazin A, Beuvin M, Burlot L, Bussell X, Fouteau S, Gautreau G, Lajus A, Langlois J, Planel R, Roche D, Rollin J, Rouy Z, Sabatet V, Médigue C. 2020. MicroScope: an integrated platform for the annotation and exploration of microbial gene functions through genomic, pangenomic and metabolic comparative analysis. Nucleic Acids Res 48:D579–D589.
TSS Mapping in 3 Clostridia with Capp-Switch Seq Microbiology Spectrum
March/April 2022 Volume 10 Issue 2 10.1128/spectrum.02288-21 13
Full-text available
Agrobacteria are a diverse, polyphyletic group of prokaryotes with multipartite genomes capable of transferring DNA into the genomes of host plants, making them an essential tool in plant biotechnology. Despite their utility in plant transformation, genome-wide transcriptional regulation is not well understood across the three main lineages of agrobacteria. Transcription start sites (TSSs) are a necessary component of gene expression and regulation. In this study, we used differential RNA-seq and a TSS identification algorithm optimized on manually annotated TSS, then validated with existing TSS to identify thousands of TSS with nucleotide resolution for representatives of each lineage. We extend upon the 356 TSSs previously reported in Agrobacterium fabrum C58 by identifying 1,916 TSSs. In addition, we completed genomes and phenotyping of Rhizobium rhizogenes C16/80 and Allorhizobium vitis T60/94, identifying 2,650 and 2,432 TSSs, respectively. Parameter optimization was crucial for an accurate, high-resolution view of genome and transcriptional dynamics, highlighting the importance of algorithm optimization in genome-wide TSS identification and genomics at large. The optimized algorithm reduced the number of TSSs identified internal and antisense to the coding sequence on average by 90.5% and 91.9%, respectively. Comparison of TSS conservation between orthologs of the three lineages revealed differences in cell cycle regulation of ctrA as well as divergence of transcriptional regulation of chemotaxis-related genes when grown in conditions that simulate the plant environment. These results provide a framework to elucidate the mechanistic basis and evolution of pathology across the three main lineages of agrobacteria. IMPORTANCE Transcription start sites (TSSs) are fundamental for understanding gene expression and regulation. 
Agrobacteria, a group of prokaryotes with the ability to transfer DNA into the genomes of host plants, are widely used in plant biotechnology. However, the genome-wide transcriptional regulation of agrobacteria is not well understood, especially in less-studied lineages. Differential RNA-seq and an optimized algorithm enabled identification of thousands of TSSs with nucleotide resolution for representatives of each lineage. The results of this study provide a framework for elucidating the mechanistic basis and evolution of pathology across the three main lineages of agrobacteria. The optimized algorithm also highlights the importance of parameter optimization in genome-wide TSS identification and genomics at large.
Full-text available
Large-scale genome sequencing and the increasingly massive use of high-throughput approaches produce a vast amount of new information that completely transforms our understanding of thousands of microbial species. However, despite the development of powerful bioinformatics approaches, full interpretation of the content of these genomes remains a difficult task. Launched in 2005, the MicroScope platform ( has been under continuous development and provides analysis for prokaryotic genome projects together with metabolic network reconstruction and post-genomic experiments allowing users to improve the understanding of gene functions. Here we present new improvements of the MicroScope user interface for genome selection, navigation and expert gene annotation. Automatic functional annotation procedures of the platform have also been updated and we added several new tools for the functional annotation of genes and genomic regions. We finally focus on new tools and pipeline developed to perform comparative analyses on hundreds of genomes based on pangenome graphs. To date, MicroScope contains data for >11 800 microbial genomes, part of which are manually curated and maintained by microbiologists (>4500 personal accounts in September 2019). The platform enables collaborative work in a rich comparative genomic context and improves community-based curation efforts.
Full-text available
RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets like ChIP-seq/ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations, (v) comparative genomics. Six public servers jointly support 10 000 genomes from all kingdoms. Six novel or refactored programs have been added since the 2015 NAR Web Software Issue, including updated programs to analyse regulatory variants (retrieve-variation-seq, variation-scan, convert-variations), along with tools to extract sequences from a list of coordinates (retrieve-seq-bed), to select motifs from motif collections (retrieve-matrix), and to extract orthologs based on Ensembl Compara (get-orthologs-compara). Three use cases illustrate the integration of new and refactored tools to the suite. This Anniversary update gives a 20-year perspective on the software suite. RSAT is well-documented and available through Web sites, SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services, virtual machines and stand-alone programs at
Full-text available
Background: There is a worldwide interest for sustainable and environmentally-friendly ways to produce fuels and chemicals from renewable resources. Among them, the production of acetone, butanol and ethanol (ABE) or Isopropanol, Butanol and Ethanol (IBE) by anaerobic fermentation has already a long industrial history. Isopropanol has recently received a specific interest and the best studied natural isopropanol producer is C. beijerinckii DSM 6423 (NRRL B-593). This strain metabolizes sugars into a mix of IBE with only low concentrations of ethanol produced (< 1 g/L). However, despite its relative ancient discovery, few genomic details have been described for this strain. Research efforts including omics and genetic engineering approaches are therefore needed to enable the use of C. beijerinckii as a microbial cell factory for production of isopropanol. Results: The complete genome sequence and a first transcriptome analysis of C. beijerinckii DSM 6423 are described in this manuscript. The combination of MiSeq and de novo PacBio sequencing revealed a 6.38 Mbp chromosome containing 6254 genomic objects. Three Mobile Genetic Elements (MGE) were also detected: a linear double stranded DNA bacteriophage (ϕ6423) and two plasmids (pNF1 and pNF2) highlighting the genomic complexity of this strain. A first RNA-seq transcriptomic study was then performed on 3 independent glucose fermentations. Clustering analysis allowed us to detect some key gene clusters involved in the main life cycle steps (acidogenesis, solvantogenesis and sporulation) and differentially regulated among the fermentation. These putative clusters included some putative metabolic operons comparable to those found in other reference strains such as C. beijerinckii NCIMB 8052 or C. acetobutylicum ATCC 824. Interestingly, only one gene was encoding for an alcohol dehydrogenase converting acetone into isopropanol, suggesting a single genomic event occurred on this strain to produce isopropanol. 
Conclusions: We present the full genome sequence of Clostridium beijerinckii DSM 6423, providing a complete genetic background of this strain. This offer a great opportunity for the development of dedicated genetic tools currently lacking for this strain. Moreover, a first RNA-seq analysis allow us to better understand the global metabolism of this natural isopropanol producer, opening the door to future targeted engineering approaches.
Full-text available
Background: Understanding the RNA processing of an organism's transcriptome is an essential but challenging step in understanding its biology. Here we investigate with unprecedented detail the transcriptome of Pseudomonas aeruginosa PAO1, a medically important and innately multi-drug resistant bacterium. We systematically mapped RNA cleavage and dephosphorylation sites that result in 5'-monophosphate terminated RNA (pRNA) using monophosphate RNA-Seq (pRNA-Seq). Transcriptional start sites (TSS) were also mapped using differential RNA-Seq (dRNA-Seq) and both datasets were compared to conventional RNA-Seq performed in a variety of growth conditions. Results: The pRNA-Seq library revealed known tRNA, rRNA and transfer-messenger RNA (tmRNA) processing sites, together with previously uncharacterized RNA cleavage events that were found disproportionately near the 5' ends of transcripts associated with basic bacterial functions such as oxidative phosphorylation and purine metabolism. The majority (97%) of the processed mRNAs were cleaved at precise codon positions within defined sequence motifs indicative of distinct endonucleolytic activities. The most abundant of these motifs corresponded closely to an E. coli RNase E site previously established in vitro. Using the dRNA-Seq library, we performed an operon analysis and predicted 3159 potential TSS. A correlation analysis uncovered 105 antiparallel pairs of TSS that were separated by 18 bp from each other and were centered on single palindromic TAT(A/T)ATA motifs (likely - 10 promoter elements), suggesting that, consistent with previous in vitro experimentation, these sites can initiate transcription bi-directionally and may thus provide a novel form of transcriptional regulation. TSS and RNA-Seq analysis allowed us to confirm expression of small non-coding RNAs (ncRNAs), many of which are differentially expressed in swarming and biofilm formation conditions. 
Conclusions: This study uses pRNA-Seq, a method that provides a genome-wide survey of RNA processing, to study the bacterium Pseudomonas aeruginosa and discover extensive transcript processing not previously appreciated. We have also gained novel insight into RNA maturation and turnover as well as a potential novel form of transcription regulation. NOTE: All sequence data has been submitted to the NCBI sequence read archive. Accession numbers are as follows: [NCBI sequence read archive: SRX156386, SRX157659, SRX157660, SRX157661, SRX157683 and SRX158075]. The sequence data is viewable using Jbrowse on .
Full-text available
Global changes in bacterial gene expression can be orchestrated by the coordinated activation/deactivation of alternative sigma (σ) factor subunits of RNA polymerase. Sigma factors themselves are regulated in myriad ways, including via anti-sigma factors. Here, we have determined the solution structure of anti-sigma factor CsfB, responsible for inhibition of two alternative sigma factors, σG and σE, during spore formation by Bacillus subtilis. CsfB assembles into a symmetrical homodimer, with each monomer bound to a single Zn²⁺ ion via a treble-clef zinc finger fold. Directed mutagenesis indicates that dimer formation is critical for CsfB-mediated inhibition of both σG and σE, and we have characterized these interactions in vitro. This work represents an advance in our understanding of how CsfB mediates inhibition of two alternative sigma factors to drive developmental gene expression in a bacterium. Martínez-Lumbreras, Alfano et al. have solved the structure of the anti-sigma factor CsfB and explored its role in inhibiting two alternative sigma factors during Bacillus subtilis spore formation. The results provide insight into the molecular mechanism underlying a gene expression switch in bacteria.
Background: Understanding the RNA processing of an organism's transcriptome is an essential but challenging step in understanding its biology. Here we investigate with unprecedented detail the transcriptome of Pseudomonas aeruginosa PAO1, a medically important and innately multi-drug resistant bacterium. We systematically mapped RNA cleavage and dephosphorylation sites that result in 5'-monophosphate terminated RNA using a new high-throughput methodology called monophosphate RNA-Seq (pRNA-Seq). Transcriptional start sites (TSS) were also mapped using differential RNA-Seq (dRNA-Seq) and both datasets were compared to conventional RNA-Seq performed in a variety of growth conditions. Results: The pRNA-Seq transcript library revealed known tRNA, rRNA and tmRNA processing sites, together with previously uncharacterized RNA cleavage events that were found disproportionately near the 5' ends of transcripts associated with basic bacterial functions such as oxidative phosphorylation and purine metabolism. The majority (97%) of the processed mRNAs were cleaved at precise codon positions within defined sequence motifs indicative of distinct endonucleolytic activities. The most abundant of these motifs corresponded closely to an E. coli RNase E site previously established in vitro. Using the dRNA-Seq library, we performed an operon analysis and predicted 3,159 potential TSS. A correlation analysis uncovered 105 antiparallel pairs of TSS that were separated by 18 bp from each other and that were centered on a palindromic TAT(A/T)ATA motif, suggesting that such sites may provide a novel form of transcriptional regulation. TSS and RNA-Seq analysis allowed us to confirm expression of small non-coding RNAs (ncRNAs), many of which are differentially expressed in swarming and biofilm formation conditions.
Conclusions: This study introduces pRNA-Seq methodology, which provides the first comprehensive, genome-wide survey of RNA processing in any organism. As a proof of concept, we have employed this technique to study the bacterium Pseudomonas aeruginosa and have discovered extensive transcript processing not previously appreciated. We have also gained novel insight into RNA maturation and turnover as well as a potential novel form of transcription regulation.
Background Microbial formation of acetone, isopropanol, and butanol is largely restricted to bacteria belonging to the genus Clostridium. This ability has been industrially exploited over the last 100 years. The solvents are important feedstocks for the chemical and biofuel industry. However, biological synthesis suffers from high substrate costs and competition from chemical synthesis supported by the low price of crude oil. To render the biotechnological production economically viable again, improvements in microbial and fermentation performance are necessary. However, no comprehensive comparisons of respective species and strains used and their specific abilities exist today. Results The genomes of a total of 30 saccharolytic Clostridium strains, representative of the species Clostridium acetobutylicum, C. aurantibutyricum, C. beijerinckii, C. diolis, C. felsineum, C. pasteurianum, C. puniceum, C. roseum, C. saccharobutylicum, and C. saccharoperbutylacetonicum, have been determined; 10 of them completely, and compared to 14 published genomes of other solvent-forming clostridia. Two major groups could be differentiated and several misclassified species were detected. Conclusions Our findings represent a comprehensive study of phylogeny and taxonomy of clostridial solvent producers that highlights differences in energy conservation mechanisms and substrate utilization between strains, and allow for the first time a direct comparison of sequentially selected industrial strains at the genetic level. Detailed data mining is now possible, supporting the identification of new engineering targets for improved solvent production.
Bacteria respond to their environment by regulating mRNA synthesis, often by altering the genomic sites at which RNA polymerase initiates transcription. Here, we investigate genome-wide changes in transcription start site (TSS) usage by Clostridium phytofermentans, a model bacterium for fermentation of lignocellulosic biomass. We quantify expression of nearly 10,000 TSS at single base resolution by Capp-Switch sequencing, which combines capture of synthetically capped 5′ mRNA fragments with template-switching reverse transcription. We find the locations and expression levels of TSS for hundreds of genes change during metabolism of different plant substrates. We show that TSS reveals riboswitches, non-coding RNA and novel transcription units. We identify sequence motifs associated with carbon source-specific TSS and use them for regulon discovery, implicating a LacI/GalR protein in control of pectin metabolism. We discuss how the high resolution and specificity of Capp-Switch enables study of condition-specific changes in transcription initiation in bacteria.
Characterization of the heat shock response in Clostridium acetobutylicum has indicated that at least 15 proteins are induced by a temperature upshift from 30 to 42°C. These so‐called heat shock proteins include DnaK and GroEL, two highly conserved molecular chaperones. Several genes encoding heat shock proteins of C. acetobutylicum have been cloned and analysed. The dnaK operon includes the genes orfA (a heat shock gene with an unknown function), grpE, dnaK, and dnaJ; and the groE operon the genes groES and groEL. The hsp18 gene coding for a member of the small heat shock protein family constitutes a monocistronic operon. Interestingly, the heat shock response in this bacterium is regulated by a mechanism, which is obviously different from that found in Escherichia coli. So far, no evidence for a heat shock‐specific sigma factor for the RNA polymerase in C. acetobutylicum has been found. In this bacterium, like in many Gram‐positive and several Gram‐negative bacteria, a conserved inverted repeat is located upstream of chaperone/chaperonin‐encoding stress genes such as dnaK and groEL and may be implicated as a cis‐acting regulatory site. The inverted repeat is not present in the promoter region of hsp18. Therefore, in C. acetobutylicum there are at least two classes of heat shock genes with respect to the type of regulation. Evidence has been found that a repressor is involved in the regulation of the heat shock response in C. acetobutylicum. However, this regulation seems to be independent of the inverted repeat motif, and the mechanism by which the inverted repeat motif mediates regulation remains to be elucidated. Another protein with a potential regulatory function might be the 21‐kDa heat shock protein, which is induced significantly earlier than the majority of heat shock proteins. This protein has similarity to the redox carrier rubredoxin. Interestingly, heat shock genes are expressed in C. acetobutylicum at an increased rate not only after heat stress but also during the initiation of solvent formation. The mRNA level of some heat shock genes, e.g. dnaK, reached a maximum at the same time during the metabolic shift as the mRNA levels of genes necessary for solvent production. Therefore, the heat shock response in C. acetobutylicum might be part of a global regulatory network including different stress responses like heat shock, metabolic switch, and also sporulation.