
Global repositioning of transcription start sites in a plant-fermenting bacterium



Bacteria respond to their environment by regulating mRNA synthesis, often by altering the genomic sites at which RNA polymerase initiates transcription. Here, we investigate genome-wide changes in transcription start site (TSS) usage by Clostridium phytofermentans, a model bacterium for fermentation of lignocellulosic biomass. We quantify expression of nearly 10,000 TSS at single base resolution by Capp-Switch sequencing, which combines capture of synthetically capped 5′ mRNA fragments with template-switching reverse transcription. We find the locations and expression levels of TSS for hundreds of genes change during metabolism of different plant substrates. We show that TSS reveals riboswitches, non-coding RNA and novel transcription units. We identify sequence motifs associated with carbon source-specific TSS and use them for regulon discovery, implicating a LacI/GalR protein in control of pectin metabolism. We discuss how the high resolution and specificity of Capp-Switch enables study of condition-specific changes in transcription initiation in bacteria.
Received 9 Aug 2016 |Accepted 1 Nov 2016 |Published 16 Dec 2016
Magali Boutard1,2, Laurence Ettwiller3, Tristan Cerisy1,2,4,5, Adriana Alberti1, Karine Labadie1,
Marcel Salanoubat1,2,4,5, Ira Schildkraut3& Andrew C. Tolonen1,2,4,5
DOI: 10.1038/ncomms13783 OPEN
1 CEA, DRF, IG, Genoscope, Évry 91000, France. 2 CNRS-UMR8030, Évry 91000, France.
3 New England Biolabs, Inc., Ipswich, Massachusetts 01938, USA.
4 Université Paris-Saclay, Évry 91000, France. 5 Université d'Évry, Évry 91000, France.
Correspondence and requests for materials should be addressed to
A.C.T. (email:
NATURE COMMUNICATIONS | 7:13783 | DOI: 10.1038/ncomms13783 | 1
Bacteria translate environmental signals into cellular
responses using a network of regulatory RNA and proteins
that control genome-wide transcription patterns. Many of
these regulators affect where RNA polymerase initiates messenger
RNA (mRNA) synthesis at transcription start sites (TSS). As
such, locating and quantifying changes in TSS usage is an
important step to understand bacterial gene regulation. Here, we
investigate TSS architecture in Clostridium phytofermentans ISDg,
a soil bacterium that ferments plant biomass into ethanol, H2 and
acetate1, and belongs to the Lachnospiraceae family that includes
gut commensals with important roles in host nutrition2,3. This
anaerobic mesophile metabolizes diverse plant components
including cellulose, hemicellulose and pectin by tailoring
expression of many carbohydrate-active enzymes (CAZymes)
and other metabolic enzymes to the available substrate4,5.
C. phytofermentans has a 4.8 Mb genome with 3,926 predicted
protein-encoding genes3, and its ability to alter gene expression in
response to carbon sources and other environmental cues is
mediated by over 300 transcription regulator proteins6 and
numerous non-coding RNA including metabolite-sensing riboswitches.
We investigate genome-wide patterns of C. phytofermentans
transcription initiation on heterogeneous plant substrates by
demonstrating an approach called Capp-Switch sequencing. The
initiating nucleotide of nascent mRNA is distinguished by a
5′ triphosphate (5′-PPP), which has been exploited for genome-wide
TSS identification with dRNA-seq8 by depleting rRNA and
other monophosphorylated transcripts using terminal
exonuclease (TEX). dRNA-seq has been applied to diverse
bacteria9–13, but incomplete and non-specific degradation of
processed RNA requires TSS identification to be based on
statistical comparison of read coverage in +TEX and −TEX
samples. Capp-Switch avoids these problems by capturing and
purifying 5′ mRNA fragments, which are reverse transcribed with
template-switching to tagged cDNA for high-throughput
sequencing (Fig. 1). The 5′-PPP of mRNA is modified by
vaccinia capping enzyme (VCE) to bear a biotinylated guanosine
cap that facilitates capture and purification using
streptavidin magnetic beads. Recently, TSS were identified by
Cappable-Seq14 using VCE to add a desthiobiotin cap for bead-based
capture of 5′ mRNA fragments, which were then eluted from the
beads and de-capped to ligate adapters for reverse transcription to
tagged cDNA. Capp-Switch streamlines this approach by reverse
transcribing the 5′ mRNA fragments using template-switching by
Moloney murine leukemia virus (MMLV) reverse transcriptase15.
Template-switching avoids adapter ligation and enables synthesis
of 5′-tagged cDNA without releasing RNA from the beads,
permitting use of an irreversible, biotinylated cap to increase
RNA capture affinity. In all, we show Capp-Switch is a robust
method that yields a genome-wide, strand-specific, quantitative
map of TSS at single nucleotide resolution.
We apply Capp-Switch sequencing to define a genome-wide
map of 9,457 TSS during C. phytofermentans growth on raw
biomass, heterogeneous polysaccharides (cellulose, hemicellulose
and pectin) and their constituent sugars. We use this TSS map to
investigate features controlling gene regulation, such as RNA
polymerase binding sites, 5′ untranslated region (UTR) structure,
alternative promoters, operons and non-standard (leaderless and
antisense) transcription. We identify sequence motifs associated
with groups of TSS that are differentially expressed on specific
carbon sources and show these motifs can be used to reconstruct
transcription factor regulons. By integrating Capp-Switch data
with an updated genome annotation, RNA-seq and proteomics,
we discover novel transcriptional units (TU) and protein-
encoding genes. Finally, we discuss how Capp-Switch sequencing
can be applied as a general approach to explore transcription
regulation in prokaryotes.
Figure 1 | Overview of the Capp-Switch sequencing approach. Capp-Switch includes (a–c) capture of 5′ mRNA fragments and (d–f) cDNA synthesis and
sequencing. (a) The mRNA 5′ triphosphate is capped with biotin-GTP by VCE. (b) RNA is fragmented and (c) the capped 5′ mRNA fragments are captured
on streptavidin magnetic beads and separated from other RNA. (d) The 5′ mRNA fragments are reverse transcribed to single-stranded cDNA using MMLV
reverse transcriptase. An oligonucleotide hybridizes to the 3′ overhang and the complementary sequence is synthesized by the MMLV template-switching
activity. (e) Double-stranded cDNA is synthesized using primers that hybridize to the single-stranded cDNA termini. (f) The cDNA is sequenced on a
high-throughput platform.

General transcriptome features. Capp-Switch sequencing
quantified TSS with high reproducibility between duplicate
model substrate (Fig. 2a) and raw biomass (Fig. 2b) cultures. We
identified 9,457 TSS across treatments (Supplementary Data 1),
one-third of which were expressed in both sugar and
polysaccharide cultures (Fig. 2c). Most reads (74%) contribute to
InterS (intergenic, sense) TSS (Fig. 2d), which we observed upstream of 898 genes.
Among these, 687 genes (77%) are predicted to start operons16
(Supplementary Data 2), supporting these operon predictions and
the existence of many sub-operons. The 5′ UTR, spanning from
the primary TSS to the start codon, is less than 100 bp for most
genes, but there is no correlation between 5′ UTR length and TSS
strength (Fig. 2e). Studies in other bacteria report many leaderless
mRNA without 5′ UTR and ribosome binding sites (RBS)11. Four
per cent of InterS TSS are potentially leaderless in
C. phytofermentans, but these genes generally have another
upstream TSS and retain a typical RBS similar to highly
expressed C. phytofermentans genes (Supplementary Fig. 1).
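
The 5′ UTR definition used above (primary TSS to start codon) and the 20 bp binning of TSS strength described for Fig. 2e can be illustrated with a minimal sketch. The `Gene` record, the function names and the 1-based coordinate convention are assumptions made for this example and are not taken from the analysis code of this study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record used only for this illustration."""
    name: str
    start_codon: int  # 1-based coordinate of the first base of the start codon
    strand: str       # "+" or "-"


def utr_length(tss: int, gene: Gene) -> int:
    """Transcribed bases between the TSS and the start codon (0 = leaderless)."""
    if gene.strand == "+":
        length = gene.start_codon - tss
    else:
        # On the reverse strand, upstream positions have larger coordinates.
        length = tss - gene.start_codon
    if length < 0:
        raise ValueError("TSS lies downstream of the start codon")
    return length


def mean_strength_by_bin(records, bin_size=20):
    """Average reads per million for TSS grouped by 5' UTR length interval.

    records: iterable of (utr_length, reads_per_million) pairs.
    Returns {bin_start: mean reads per million}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for length, rpm in records:
        bin_start = (length // bin_size) * bin_size
        sums[bin_start] += rpm
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```

Under this convention a TSS that coincides with the first base of the start codon has a 5′ UTR length of zero, which is how a leaderless candidate would be flagged.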
Most genes were expressed from a single, primary TSS on all
substrates (Fig. 2f), but 191 (21%) genes altered their primary TSS
in response to carbon source. Further, genes with substrate-
specific InterS TSS are often differentially expressed on that
carbon source (χ² test, P < 0.01 for all substrates relative to
glucose) (Fig. 2g), supporting that changing TSS is a widespread
means of transcription regulation. In total, more than a thousand
TSS are specific to each polysaccharide (Supplementary Fig. 2A).
Xylan-specific (Supplementary Fig. 2B) and pectin-specific
(Supplementary Fig. 2C) TSS are primarily associated with
carbohydrate metabolism genes, while the most abundant
functional category of cellulose-specific TSS is prophage genes
(Supplementary Fig. 2D). The C. phytofermentans genome
includes a large prophage island that is not predicted to encode
a viable phage3, but whose transcription is up-regulated on
cellulose and biomass (Supplementary Fig. 3). This burst of
transcriptional initiation at viral genes could indicate prophage
excision was triggered on cellulosic substrates, that is, by low
carbon stress, or that viral proteins contribute to bacterial
Sequences upstream of primary TSS generally contain the
sigma-A-type consensus −35 and −10 hexamers recognized by
RNA polymerase (RNAP) and associated elements that likely
contribute to promoter function in this organism. An A-rich
region upstream of the −35 hexamer (TTGACA) (Fig. 2h)
resembles the ‘UP element’ that stimulates transcription initiation
by interacting with the RNAP alpha subunit18. Also, the Pribnow
hexamer (TATAAT) has an upstream TG di-nucleotide (Fig. 2i),
which enhances transcription in certain other bacteria19–21 by
interacting with the RNAP sigma-A subunit22. In contrast,
searching upstream of IntraS TSS identified an AT-rich stretch
~10 bp upstream of the TSS lacking RNAP binding sites
(Supplementary Fig. 4A), suggesting IntraS TSS often result
from promiscuous initiation at AT-rich sequences. We observed
that IntraS TSS comprised more than 50% of TSS (Fig. 2d), albeit
with fewer reads per site than InterS TSS. dRNA-seq studies have
rationalized similarly abundant intragenic TSS as resulting from
incomplete TEX degradation12, but our data support that these TSS
bear 5′-PPP indicative of transcription initiation. IntraS TSS are
preferentially found in the 5′ end of genes (Supplementary
Fig. 4B), supporting that they are under selective pressure and may
have roles including expression of alternative protein isoforms or
as mimicry molecules to sequester other RNA and ribonucleases
from their mRNA targets9.

Figure 2 | General features of TSS identification by Capp-Switch sequencing. Capp-Switch reproducibly quantifies TSS usage in duplicate (a) glucose
(4,399 TSS; R² = 0.96) and (b) stover (1,532 TSS; R² = 0.99) cultures. (c) Venn diagram showing overlap of TSS identified in at least one monosaccharide
and one polysaccharide or biomass treatment. (d) Percentage of reads (purple) and TSS (yellow) classified as InterS, IntraS, InterA or IntraA summed
across treatments. (e) The length of most 5′ UTR (primary TSS to start codon) is <100 bp (blue bars with left Y axis), but UTR length does not correlate
with expression strength (black line with right Y axis). TSS strength is the average reads per million for all TSS in a 20 bp 5′ UTR size interval. Results show
glucose data. (f) Distribution of the number of InterS TSS per gene for data summed across treatments. (g) Genes with substrate-specific TSS are often
differentially expressed. The Y axis is the absolute value of log2(RPKM substrate/RPKM glucose) from RNA-seq for all genes with InterS TSS specific to
that substrate. Substrates are xylose (Xlo, n = 50 genes), galacturonic acid (GalA, n = 146 genes), cellulose (Cel, n = 94 genes), xylan (Xya, n = 91 genes),
pectin (Hg, n = 119 genes) and stover (Sto, n = 48 genes). Symbols: red triangles are differentially expressed genes, blue circles unchanged genes, box shows
median and interquartile range. Promoter regions upstream of TSS expressed on three sugars and polysaccharides show consensus (h) −35 and
(i) −10 motifs recognized by RNA polymerase.
Capp-Switch reads (Fig. 3a–d) start at specific positions with
respect to known genes showing TSS at single base resolution,
whereas RNA-seq reads begin throughout genes (Fig. 3e–h). We
observed four common TSS situations: genes with a single
upstream TSS, genes with both upstream and intragenic TSS,
genes with multiple TSS on a single substrate and genes
with substrate-specific TSS. For example, the glyceraldehyde
3-phosphate dehydrogenase (gapdh) gene is constitutively transcribed
from a single TSS (Fig. 3a). The pyruvate ferredoxin oxidoreductase
(pfor) gene is transcribed from a single, upstream TSS
and another, weaker TSS in the coding sequence (Fig. 3b).
The cel5A cellulase gene23 is simultaneously transcribed from
multiple TSS on cellulose (Fig. 3c), as are other cellulases
(Supplementary Fig. 5). CAZyme expression in C. phytofermentans
is controlled by carbon source24,25 and our
data support that their regulation involves multiple promoters. The
cphy1510 gene encoding the most active xylanase5 is transcribed
from three TSS on xylan and a different, upstream TSS on pectin
(Fig. 3d). Similarly, genes for other CAZymes including three
cellulases, one other xylanase, four pectinases and two glycosyl
transferases changed their primary TSS as a function of carbon
source. We confirmed the positions of the primary TSS identified
by Capp-Switch for gapdh, pfor (IntraS and primary TSS),
cphy2243 and cphy1510 (xylan and pectin) using 5′ RACE
(Supplementary Fig. 6).
Motifs associated with TSS clusters. We clustered TSS based
on expression across carbon sources and searched sequences
surrounding TSS for overrepresented motifs (Supplementary
Fig. 7; Supplementary Data 3), revealing TSS clusters that share
motifs with potential regulatory functions (Fig. 4). For example,
the TSS cluster up-regulated on galacturonic acid and homo-
galacturonan (HG) (Fig. 4c) has a palindromic motif resembling
the cre operator (TGAAAGCGCTTTCA) bound by B. subtilis
CcpA26,27, a LacI/GalR regulator of numerous carbon metabolism
genes. LacI/GalR genes often have upstream copies of their
operators to auto-repress transcription28, and we found three
copies of the galacturonic acid cluster motif in the 5′ UTR of
cphy2742, a LacI/GalR gene specifically up-regulated on
galacturonic acid (Fig. 5a). Further, three of the six LacI/GalR
genes with detected primary TSS have upstream variants of the
cre operator that are conserved in their orthologs from related
species (Fig. 5b–d), leading us to propose that C. phytofermentans
LacI/GalR regulators recognize related, but distinct, operators
to control separate regulons. In support of this, the putative
Cphy2742 operator (Fig. 5b) is upstream of 22 genes in the
C. phytofermentans genome (Supplementary Table 1) including
3 CAZymes (PL9 pectin lyases) that degrade HG to galacturonic
acid5 and transcription units containing all genes needed to
assimilate galacturonic acid29 (Supplementary Fig. 8).
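
As an illustration of how a palindromic operator such as the cre consensus quoted above could be located in a sequence, the following minimal sketch scans for near matches with a mismatch tolerance. The function names and the `max_mismatches` parameter are assumptions made for this example; the motif discovery in this study relied on dedicated motif-finding tools (Supplementary Fig. 7), not on this simple scan.

```python
CRE = "TGAAAGCGCTTTCA"  # cre operator consensus quoted in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


def find_sites(sequence: str, motif: str = CRE, max_mismatches: int = 2):
    """Return (0-based offset, matched window, mismatches) for near matches.

    Only the given strand is scanned; for a perfectly palindromic motif
    this also covers the opposite strand.
    """
    sequence = sequence.upper()
    k = len(motif)
    hits = []
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits
```

Because the cre consensus is its own reverse complement, a hit on one strand is automatically a hit on the other, so reporting positions relative to a TSS only requires the offset and the gene orientation.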
The putative Cphy2742 operator sites are co-located with
or downstream of TSS for HG degradation and galacturonic
acid metabolism genes (Fig. 5e), supporting Cphy2742 binds
these sites to block transcription. Transcription of the
pl9 genes cphy2919 and cphy3869 switches to upstream primary
TSS on galacturonic acid relative to HG, but all TSS are close
enough to be potentially regulated by Cphy2742 operators. The
pta-ackA (cphy1326-7) acetate synthesis operon also has a
Cphy2742 operator and both pta-ackA expression and acetate
formation are elevated on galacturonic acid (Supplementary
Fig. 9). While B. subtilis CcpA represses most of its targets,
it activates pta and ackA transcription30,31 by binding upstream
of their promoters32. The Cphy2742 operator is also upstream
of the pta gene TSS, suggesting Cphy2742 may similarly activate
transcription of the pta-ackA operon as well as the glycolytic
gene ppdK and the hydrolase gene cphy0367. Collectively, we
propose Cphy2742 represses a comprehensive set of pectin
fermentation genes by binding a conserved palindrome at
or downstream of their TSS to block transcription. In response
to a galacturonic acid-based signal, Cphy2742 de-represses itself
and its targets, and may activate transcription of acetate synthesis
and other aspects of carbon metabolism by binding upstream
of TSS.
Figure 3 | Capp-Switch reads start at specific genome positions corresponding to putative TSS. The number of reads starting at each genome position
is shown for Capp-Switch (a–d) and RNA-seq (e–h). The cphy2876 gapdh gene (a,e) has a single TSS (glucose data shown). The cphy3558 pfor gene
(b,f) has an upstream TSS and an intragenic sense TSS (glucose data shown). The cphy3202 cel5A cellulase gene (c,g) has three TSS during growth on
cellulose. The cphy1510 xylanase gene (d,h) is expressed from three TSS on xylan (red bars) and a single, upstream TSS on pectin (purple). Plots show the
number of reads starting at each genome position with forward strand reads on the positive Y axis and reverse strand reads on the negative Y axis. Distance
to the start codon is shown at the base of TSS peaks.
Figure 4 | TSS in carbon source-specific clusters share DNA sequence motifs. TSS clusters differentially expressed on (a,b) glucose, (c) galacturonic acid
and HG, (d) xylan and (e) stover and cellulose are shown along with their associated sequence motifs. Rows are expression of a TSS cluster member and
columns are duplicate glucose (Glc), xylose (Xlo), galacturonic acid (GalA), homogalacturonan (HG), xylan (Xya), stover (Sto) and cellulose (Cel) cultures.
Colours show TSS expression as log2-transformed read counts scaled to a median of zero for each TSS.

Figure 5 | The role of the LacI/GalR regulator Cphy2742 in galacturonic acid and pectin metabolism. (a) Transcription of the LacI/GalR gene cphy2742
is up-regulated on galacturonic acid relative to other carbon sources. Bars show average RNA-seq RPKM of duplicate cultures; error bars are one s.d.
(b–d) Upstream palindromes resembling cre operator sites found upstream of C. phytofermentans LacI/GalR genes and their orthologs from related
genomes: (b) cphy2742 (motif e = 1.1 × 10⁻⁸), (c) cphy2467 (motif e = 2.4 × 10⁻⁸) and (d) cphy1883 (motif e = 8.9 × 10⁻²). (e) Twelve genes have both
TSS (blue triangles) and putative Cphy2742 operators (red ovals) including genes for pectin lyases (green), galacturonic acid metabolism (purple), general
carbon metabolism (yellow) and other or unknown (grey). The distance from the translation start is shown for each site.

Antisense and novel transcripts. Recent studies found 30–40% of
TSS are antisense in other bacteria8,9,13. However, antisense
transcription appears rare in C. phytofermentans: <1% of TSS
were antisense either between (InterA) or within genes (IntraA)
(Fig. 2d). To further investigate whether diffuse antisense
transcription was underestimated by our TSS thresholds, we
classified all mapped read starts, including those not meeting TSS
thresholds. Even then, InterA and IntraA classes together
comprise <4% of reads. This dearth of antisense transcription
may relate to the early evolutionary divergence of the
Clostridiales33. Alternatively, we would not detect antisense
transcripts that were processed to remove 5′-PPP or that are
below the 200 bp size threshold of our cDNA libraries, but studies
in other bacteria using larger size thresholds found antisense TSS
in ~35% of genes10. While comparatively rare, antisense
transcription appears to have important cellular functions. For
example, we observed an antisense TSS in the 5′ UTR of the
sporulation regulator spo0A (cphy2497) that also opposes
transcription of the spoIVB peptidase (cphy2498) (Fig. 6a). This
TSS was expressed on all sugars, but not polysaccharides,
supporting that antisense transcription has a role in repressing
sporulation during log growth in sugar-replete conditions.
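
The four-way classification of read starts used above (InterS, IntraS, InterA and IntraA) can be sketched as follows. This is an illustrative reading of the categories, not the classification code of this study: the `Gene` record, the rule that an intergenic site is judged against the next gene in the direction of transcription, and the treatment of sites with no downstream gene are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record used only for this illustration."""
    name: str
    start: int   # leftmost coordinate, inclusive
    end: int     # rightmost coordinate, inclusive
    strand: str  # "+" or "-"


def classify_tss(pos: int, strand: str, genes) -> str:
    """Assign a read start to InterS, IntraS, InterA or IntraA."""
    overlapping = [g for g in genes if g.start <= pos <= g.end]
    if overlapping:
        # Inside a gene: sense if any overlapping gene shares the strand.
        if any(g.strand == strand for g in overlapping):
            return "IntraS"
        return "IntraA"

    # Intergenic: find the next gene in the direction of transcription.
    if strand == "+":
        downstream = [g for g in genes if g.start > pos]
        nearest = min(downstream, key=lambda g: g.start, default=None)
    else:
        downstream = [g for g in genes if g.end < pos]
        nearest = max(downstream, key=lambda g: g.end, default=None)

    # Assumption: with no downstream gene, the site is counted as InterA.
    if nearest is not None and nearest.strand == strand:
        return "InterS"
    return "InterA"
```

Applying such a function to every mapped read start, rather than only to positions passing the TSS thresholds, gives the per-class read fractions discussed in this section.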
TSS reveal novel transcriptional features such as a TU
downstream of the glycoside hydrolase cphy2658 that is
up-regulated to have the strongest initiation site in the genome
on cellulose and corn stover (Fig. 6b). This region contains a
hypothetical open-reading frame (ORF) in the MaGe annotation
(clops3132) that has no similar sequences in Genbank, but the
ORF lacks a ribosome binding site (RBS), and we did not detect
any expressed peptides from this region by mass spectrometry,
suggesting it is a non-coding RNA. The most highly expressed
ABC transporter on glucose is a putative operon (cphy2241-3)
with a single TSS (Supplementary Fig. 5C,F). On all other carbon
sources, we observed repression of cphy2241-3 along with
appearance of an upstream, antisense TU (Fig. 6c) that has no
mapped peptides or predicted ORF. Non-coding RNA are often
associated with ABC transporters in clostridia34, and they may
also regulate ABC transport in this organism.
The C. phytofermentans genome may encode significantly
more genes than in the NCBI Genbank annotation. Classifying
TSS using the MaGe annotation showed 735 (7%) TSS map to
MaGe-specific clops genes of unknown function (Supplementary
Data 4), including 64 clops genes with InterS TSS. We examined
which of these novel TU encode proteins by mapping
C. phytofermentans MS/MS peptide spectra to the genome
translated in all frames, identifying peptides outside the predicted
proteome in 21 InterS, 13 IntraS, 5 InterA and 25 IntraA regions
(Supplementary Data 5). The combination of TSS and expressed
peptides supports ORFs with N-terminal extensions such as
cphy0891 (Supplementary Fig. 10A) and the existence of novel
ORFs. Examples include clops3461, which overlaps with cphy2929 on
the opposite strand (Fig. 6d), and an antisense overlapping ORF
in cphy1953 encoding the ComEA competence protein
(Supplementary Fig. 10B).
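
Mapping peptides to the genome translated in all frames, as described above, rests on a six-frame translation. A minimal sketch is given below, assuming the standard codon assignments and exact string matching of a peptide against each frame; the function names are hypothetical, and real MS/MS searches use dedicated proteomics software with scoring, which this example does not attempt to reproduce.

```python
from itertools import product

# Standard codon table built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def six_frames(seq: str) -> dict:
    """Return translations keyed by frame label ('+1'..'+3', '-1'..'-3')."""
    seq = seq.upper()
    reverse = seq.translate(_COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def find_peptide(seq: str, peptide: str):
    """List (frame, amino-acid offset) for exact matches of a peptide."""
    hits = []
    for frame, protein in six_frames(seq).items():
        start = protein.find(peptide)
        while start != -1:
            hits.append((frame, start))
            start = protein.find(peptide, start + 1)
    return sorted(hits)
```

A peptide that matches only a reverse-strand frame inside an annotated gene would correspond to the antisense overlapping ORFs described in this section.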
TSS also show mechanisms of RNA-mediated gene regulation.
Comparative genomics with other clostridia detected a putative
T-box upstream of the C. phytofermentans trp operon34. In low
tryptophan conditions, the T-box promotes antitermination of
the trp operon by base pairing with uncharged tRNATrp (ref. 35).
We observed that transcription halted abruptly in the 5′ UTR of the
trp operon in glucose cultures (Fig. 6e), consistent with T-box-
mediated repression. In cellulose cultures, antitermination in the
T-box enabled trp operon mRNA expression, potentially enabling
translation of the tryptophan-rich carbohydrate binding modules
in cellulases and other CAZymes. TSS also support riboswitches
associated with genes for metabolism of flavin mononucleotide
(FMN), cobalamin, thiamine pyrophosphate (TPP) and lysine
(Supplementary Data 6). For example, C. phytofermentans is
auxotrophic for thiamine, which it takes up via a thiamine
transporter, Cphy0729 (ref. 36). The cphy0729 gene has a
single, constitutive TSS with an extended 5′ UTR containing a
putative TPP-sensing riboswitch (Fig. 6f) that could regulate
transporter expression in response to intracellular TPP levels37.
Figure 6 | TSS show genome features. (a) The cphy2497 spo0A gene has both a primary TSS and an antisense TSS in the 5′ UTR (grey arrow) that were
observed on all sugars (glucose data shown). (b) A novel transcription unit (grey arrow) is up-regulated to be the most highly expressed TSS on biomass.
(c) Induction of a transcription unit (grey arrow) upstream of the ABC transporter cphy2243 is associated with repression of the transporter. This TSS was
observed on all substrates except glucose (cellulose data shown). (d) A primary TSS, RNA-seq reads, and three in-frame peptides expressed on cellulose
support the MaGe-predicted clops3461 gene rather than the annotated cphy2929 gene. Positions of peptides detected by mass spectrometry (purple) are
shown. (e) The trpE (cphy3848) gene has an upstream T-box that terminates transcription in the 5′ UTR during log-phase growth on glucose. (f) The
thiamine transporter (cphy0729) has an extended 5′ UTR containing a TPP-binding riboswitch. All plots show the number of reads starting at each genome
position for RNA-seq (blue) and Capp-Switch (red). Numbers at base of TSS peaks are distances to start codons of (a) cphy2497, (c) cphy2243,
(d) clops3461, (e) cphy3848, (f) cphy0729 and (b) the cphy2659 stop codon.

The strategy presented here to quantify condition-specific
changes in transcription initiation by Capp-Switch sequencing
could be generally applied to dissect the regulation of complex
bacterial phenotypes. In this study, we explored the transcriptional
programme enabling C. phytofermentans to ferment the
cellulosic, hemicellulosic and pectic components of plant biomass.
We found that growth on these different carbon sources entailed
widespread TSS changes, including use of substrate-specific TSS
for genes encoding biomass-degrading enzymes such as cellulases,
xylanases and pectinases. Substrate-specific TSS could enable
tuning of expression by changing promoters or the regulatory
properties (that is, binding sites or secondary structure) of the
5′ UTR. We observed that genes encoding cellulases and other
enzymes are simultaneously expressed from more than one TSS.
Multiple regulators may control transcription of these genes,
reflecting the numerous transcription factors encoded by this
organism (Supplementary Data 7). Genes for biomass-degrading
enzymes in other Clostridiales are regulated by various
transcription factors including a two-component system for
hemicellulases38, a LacI/GalR protein for β-1,3 glucanases39 and
alternative sigma factors for cellulases40. We defined TSS clusters
that were differentially expressed on specific carbon sources and
used them to guide the discovery of sequence motifs with
potential regulatory function, leading us to identify the LacI/GalR
Cphy2742 as a putative regulator of pectin metabolism.
Combining TSS mapping with motif searching could be broadly
applied to LacI/GalR regulators and other types of transcription
factors. For example, each of the four TetR regulators for which we
detected TSS also has conserved, TSS-associated palindromes
that resemble operator sites (Supplementary Fig. 11).
We also gained insight into regulatory mechanisms such as
antisense transcription, leaderless transcription and non-coding
RNA. We observed that antisense and leaderless transcription are
much rarer than reported in other bacteria and it will be
interesting to see if they are similarly uncommon in closely
related bacteria. We also show that integration of Capp-Switch
TSS mapping with RNA-seq and proteomics enables discovery of
novel transcription units and protein-encoding genes. Transcrip-
tion initiation is a complex and important component of gene
regulation for which most of the underlying mechanisms in
C. phytofermentans are yet unknown. Further, these results
illustrate how little we know about gene regulation in plant-
fermenting clostridia, a group of bacteria with important roles in
soil and gut microbiomes that have significant potential to serve
as biocatalysts for industrial transformation of plant biomass.
Bacterial cultivation. C. phytofermentans ISDg (ATCC 700394) was cultured
anaerobically at 30 °C in GS2 medium41 containing 5 g l−1 of either D-(+)-glucose
(Sigma G5767), D-(+)-xylose (Sigma X3877), D-galacturonic acid sodium salt
(Sigma 73960), regenerated amorphous cellulose (RAC) from Avicel PH-101
(Sigma 11365), birchwood xylan (Sigma X0502), apple pectin (HG) (Sigma P8471)
or raw corn stover (Qteros Inc) cut in 0.5 × 3.0 cm strips. RAC was prepared by
phosphoric acid treatment42. Duplicate cultures were sampled in mid-log phase or
after 2 days (RAC) or 3 days (stover). Fermentation products were quantified by
Capp-Switch library preparation. Total RNA was extracted from duplicate
cultures for each treatment using TRI reagent (Sigma 93289) and treated with
Turbo DNase (Ambion AM2238) at 0.2 U µg−1 RNA for 30 min at 37 °C. RNA
was purified by Zymo Concentrator-5 (Zymo Research R1015) (>200 bp capture)
into 15 µl water. RNA was 5′ capped using VCE (NEB M2080) at 3 U µg−1 RNA
with 0.1 mM SAM and 0.5 mM 3′ biotin-GTP (NEB N0760) for 30 min at 37 °C
and purified by Zymo Concentrator-5 (>200 bp capture) with two additional
washes into 45 µl water. RNA was fragmented for 30 s at 94 °C using NEBNext
Magnesium-based RNA fragmentation buffer (NEB E6101) and purified by Zymo
Concentrator-5 (total RNA capture) into 100 µl water. Streptavidin magnetic
beads (NEB S1421S) were pre-washed twice with low-salt buffer (10 mM Tris,
50 mM NaCl, 1 mM EDTA), twice with binding buffer (10 mM Tris, 500 mM NaCl,
1 mM EDTA) and resuspended at 4 mg ml−1 beads in binding buffer. Capped
RNA fragments were bound to streptavidin beads for 20 min at room temperature
and magnetically separated from other RNA by washing twice with binding buffer
and twice with low-salt buffer to elute non-bound RNA. Beads were washed once
with 1 mM Tris–HCl pH 7.5 and resuspended in 1 mM Tris–HCl pH 7.5.
RNA was converted to single-strand cDNA by SMARTscribe MMLV reverse
transcriptase (Clontech 634836) at 10 U ml1with 2.5 mM DTT, 1 mM dNTP,
1.2 mM SMARTer stranded oligo and 0.6 mM SMART stranded N6 primer
(Clontech 634836) by incubating 90 min at 42 °C and 10 min at 70 °C. Beads were
collected and the supernatant was combined with the liquid fraction after the beads
were washed with 30 ml 1 mM Tris pH 7.5. The cDNA was twice purified using
1 volume of solid phase reversible immobilization (SPRI) beads (Beckman Coulter
A63880). cDNA was left on beads after the second purification and double-
stranded cDNA was synthesized by 18 cycles PCR using SeqAmp DNA polymerase
(Clontech 638504) with 0.25 mM primers (Universal Forward PCR primer and
indexed Reverse PCR primer) and then SPRI purified with 1 volume of beads.
DNA was sequenced on an Illumina MiSeq using 150 bp paired-end chemistry.
TSS identification and classification. Sequencing reads were quality filtered44
and the 3 bp MMLV reverse transcriptase 3′ non-template extension was
removed from the 5′ end of forward (R1) reads. Reads were mapped to the
C. phytofermentans ISDg genome (NCBI NC_010001.1) using Bowtie 2
(version 2.2.4)45. Alignments showed 87–98% of reads mapped to unique positions
in the C. phytofermentans genome, yielding between 0.4 million (corn stover)
and 3.4 million (glucose) reads per culture (Supplementary Table 2). TSS were
identified using R1 reads by calculating the number of reads starting at each
genomic position, clustering read counts within a 5 bp sliding window, and
retaining the position with the greatest number of reads. TSS were defined as
genome positions with greater than 10 read starts per million reads in both
duplicate cultures. Capp-Switch TSS were confirmed by 5′ RACE (Sigma
03353621001) using primers in Supplementary Table 3 to amplify PCR products,
which were resolved by electrophoresis, excised and sequenced.
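The TSS-calling procedure described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the input format (one `(strand, position)` tuple per R1 read start) and the greedy way windows are formed are assumptions made for illustration. The text does not say whether read counts within a window are summed before thresholding; the sketch assumes they are summed and assigned to the peak position. It also requires the peak position to be identical in both replicates, which is a simplification.

```python
from collections import Counter


def call_tss(read_starts, total_reads, window=5, min_rpm=10.0):
    """Call candidate TSS in one library.

    read_starts: iterable of (strand, position) tuples, one per R1 read start.
    total_reads: number of mapped reads used for per-million scaling (assumed).
    Returns {(strand, peak_position): reads_per_million}.
    """
    counts = Counter(read_starts)
    called = {}
    for strand in ("+", "-"):
        positions = sorted(pos for (s, pos) in counts if s == strand)
        i = 0
        while i < len(positions):
            # Gather all start positions within `window` bp of the cluster start.
            j = i
            while j < len(positions) and positions[j] - positions[i] < window:
                j += 1
            cluster = positions[i:j]
            # Keep the position with the most read starts (lowest coordinate on ties).
            peak = max(cluster, key=lambda p: (counts[(strand, p)], -p))
            # Assumption: reads in the cluster are summed and assigned to the peak.
            reads = sum(counts[(strand, p)] for p in cluster)
            rpm = reads * 1_000_000 / total_reads
            if rpm > min_rpm:
                called[(strand, peak)] = rpm
            i = j
    return called


def reproducible_tss(replicate_a, replicate_b):
    """Keep only TSS that pass the threshold in both duplicate cultures."""
    return sorted(set(replicate_a) & set(replicate_b))
```

For example, calling `call_tss` on each duplicate library and passing the two results to `reproducible_tss` returns the positions that exceed 10 read starts per million in both cultures.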
Genes in the NCBI and MicroScope (MaGe) annotations46 were used to divide
TSS into four categories: InterS (intergenic TSS with downstream gene in same
orientation), InterA (intergenic TSS with downstream gene opposite orientation),
IntraS (intragenic TSS in gene with same orientation) or IntraA (intragenic TSS
in gene with opposite orientation). The InterS TSS with the most reads for each
gene was defined as the primary TSS. Capp-Switch results were compared with
strand-specific (dUTP) RNA-seq of C. phytofermentans grown in the same
culture conditions5. RNA-seq gene expression was calculated as RPKM using
the Bioconductor47 package ‘easyRNASeq’ and differential expression was defined
as a DESeq48 (version 1.22.1) P-value <0.05 adjusted for multiple testing of the
3,902 genes in the C. phytofermentans genome by Bonferroni correction. Peptides
corresponding to novel ORFs were identified by mapping peptide MS/MS spectra
from glucose, xylan and cellulose cultures4 to the genome translated in all six
frames. Peptides were identified from spectra using SEQUEST and filtered to a
5% false discovery rate using a target-decoy approach49,50 including a target
database and a decoy of the reversed sequences.
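The four TSS categories and the primary TSS rule can be expressed as a short Python sketch. This is an illustration under stated assumptions, not the authors' code. The `Gene` record, the function names and the treatment of the genome as linear are hypothetical. "Downstream" is interpreted as the nearest gene in the direction of transcription from the TSS, and a TSS inside overlapping genes is called IntraS if any of them is on the same strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost coordinate, inclusive
    end: int     # rightmost coordinate, inclusive
    strand: str  # "+" or "-"


def classify_tss(pos, strand, genes):
    """Return (category, gene_name) for a TSS at `pos` on `strand`."""
    containing = [g for g in genes if g.start <= pos <= g.end]
    if containing:
        same = [g for g in containing if g.strand == strand]
        if same:
            return "IntraS", same[0].name
        return "IntraA", containing[0].name
    # Intergenic: find the nearest gene downstream in the direction of transcription.
    if strand == "+":
        nearest = min((g for g in genes if g.start > pos),
                      key=lambda g: g.start, default=None)
    else:
        nearest = max((g for g in genes if g.end < pos),
                      key=lambda g: g.end, default=None)
    if nearest is None:
        # Simplification: the circular chromosome is not wrapped around.
        return "unclassified", None
    category = "InterS" if nearest.strand == strand else "InterA"
    return category, nearest.name


def primary_tss(tss_reads, genes):
    """For each gene, return the InterS TSS with the most reads.

    tss_reads: {(strand, position): read_count}
    """
    best = {}
    for (strand, pos), reads in tss_reads.items():
        category, gene = classify_tss(pos, strand, genes)
        if category == "InterS" and (gene not in best or reads > best[gene][1]):
            best[gene] = ((strand, pos), reads)
    return {gene: site for gene, (site, _) in best.items()}
```

Note that an intragenic TSS is never selected as primary in this sketch, however many reads it has, because the definition in the text restricts primary TSS to the InterS category.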
Motif analysis. Sequence motifs were identified using MEME51 with a background
model of di-nucleotide frequencies in the C. phytofermentans genome. Searches for
RNA polymerase binding site motifs included positions 25–50 bp (−35 motif) and
5–20 bp (−10 motif) upstream of all primary TSS expressed on the three sugars
and polysaccharides. The top palindromic motifs associated with LacI/GalR and
TetR regulators were found by searching sequences from −250 (upstream) to
+50 bp (downstream) relative to the start codon of C. phytofermentans genes
and their putative orthologs from related genomes identified by top reciprocal
BLAST searches (Supplementary Table 4). These motifs were used for genome-wide
scans from −250 to +50 bp within all C. phytofermentans genes using
MAST52. To cluster TSS by expression, the 1,188 TSS with at least a 30-fold change
in read counts between two conditions were log-transformed and each TSS
was normalized to have a median value of 0 across conditions and scaled so the
sum of the squared expression levels is 1. TSS were separated into 24 clusters
by K-means using the city-block similarity metric. Significant motifs (E <0.001)
associated with individual K-means clusters were identified by searching
−100 to +10 bp with respect to each TSS.
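The per-TSS normalization applied before K-means clustering can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation. The function names and the pseudocount (added so that zero counts can be log-transformed) are assumptions. The logarithm base is also a free choice here: for a fixed pseudocount, changing the base multiplies every value by a constant, which the final scaling step removes.

```python
import math
from statistics import median


def normalize_profile(read_counts, pseudocount=1.0):
    """Log-transform one TSS profile, centre its median at 0 and scale it
    so that the sum of squared values is 1.

    read_counts: read counts for a single TSS across culture conditions.
    pseudocount: hypothetical offset that avoids log(0); not from the source.
    """
    logged = [math.log2(c + pseudocount) for c in read_counts]
    centre = median(logged)
    centred = [v - centre for v in logged]
    norm = math.sqrt(sum(v * v for v in centred))
    if norm == 0:
        # A flat profile cannot be scaled; return it centred at zero.
        return centred
    return [v / norm for v in centred]


def city_block(a, b):
    """City-block (Manhattan) distance between two normalized profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

After this normalization every profile has the same overall magnitude, so the city-block distance used for K-means compares the shape of expression across conditions rather than absolute read depth.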
Data availability. The authors confirm that all data underlying the findings are
fully available without restriction. RNA sequencing files in FASTQ format are
available in the European Nucleotide Archive under study accession PRJEB13063.
References
1. Warnick, T. A., Methé, B. A. & Leschine, S. B. Clostridium phytofermentans sp.
nov., a cellulolytic mesophile from forest soil. Int. J. Syst. Evol. Microbiol. 52,
1155–1160 (2002).
2. Meehan, C. J. & Beiko, R. G. A phylogenomic view of ecological specialization
in the Lachnospiraceae, a family of digestive tract-associated bacteria. Genome
Biol. Evol. 6, 703–713 (2014).
3. Petit, E. et al. Genome and transcriptome of Clostridium phytofermentans,
catalyst for the direct conversion of plant feedstocks to fuels. PLoS ONE 10,
e0118285 (2015).
4. Tolonen, A. C. et al. Proteome-wide systems analysis of a cellulosic biofuel-
producing microbe. Mol. Syst. Biol. 7, 461 (2011).
5. Boutard, M. et al. Functional diversity of carbohydrate-active enzymes enabling
a bacterium to ferment plant biomass. PLoS Genet. 10, e1004773 (2014).
6. Hunter, S. et al. InterPro in 2011: new developments in the family and domain
prediction database. Nucleic Acids Res. 40, D306–D312 (2012).
7. Nawrocki, E. P. et al. Rfam 12.0: updates to the RNA families database.
Nucleic Acids Res. 43, D130–D137 (2015).
8. Sharma, C. M. et al. The primary transcriptome of the major human pathogen
Helicobacter pylori. Nature 464, 250–255 (2010).
9. Mitschke, J. et al. An experimentally anchored map of transcriptional start sites
in the model cyanobacterium Synechocystis sp. PCC6803. Proc. Natl Acad. Sci.
USA 108, 2124–2129 (2011).
10. Schlüter, J.-P. et al. Global mapping of transcription start sites and promoter
motifs in the symbiotic α-proteobacterium Sinorhizobium meliloti 1021.
BMC Genomics 14, 156 (2013).
11. Cortes, T. et al. Genome-wide mapping of transcriptional start sites defines an
extensive leaderless transcriptome in Mycobacterium tuberculosis. Cell Rep. 5,
1121–1131 (2013).
12. Shao, W., Price, M. N., Deutschbauer, A. M., Romine, M. F. & Arkin, A. P.
Conservation of transcription start sites within genes across a bacterial genus.
MBio. 5, e01398-14 (2014).
13. Thomason, M. K. et al. Global transcriptional start site mapping using
differential RNA sequencing reveals novel antisense RNAs in Escherichia coli.
J. Bacteriol. 197, 18–28 (2015).
14. Ettwiller, L., Buswell, J., Yigit, E. & Schildkraut, I. A novel enrichment strategy
reveals unprecedented number of novel transcription start sites at single base
resolution in a model prokaryote and the gut microbiome. BMC Genomics 17,
199 (2016).
15. Zhu, Y. Y., Machleder, E. M., Chenchik, A., Li, R. & Siebert, P. D. Reverse
transcriptase template switching: a SMART approach for full-length cDNA
library construction. BioTechniques 30, 892–897 (2001).
16. Dehal, P. S. et al. MicrobesOnline: an integrated portal for comparative and
functional genomics. Nucleic Acids Res. 38, D396–D400 (2010).
17. Bondy-Denomy, J. & Davidson, A. R. When a virus is not a parasite: the
beneficial effects of prophages on bacterial fitness. J. Microbiol. 52, 235–242
18. Ross, W. et al. A third recognition element in bacterial promoters: DNA
binding by the alpha subunit of RNA polymerase. Science 262, 1407–1413
19. Graves, M. C. & Rabinowitz, J. C. In vivo and in vitro transcription of
the Clostridium pasteurianum ferredoxin gene. Evidence for ‘extended’
promoter elements in gram-positive organisms. J. Biol. Chem. 261,
11409–11415 (1986).
20. Helmann, J. D. Compilation and analysis of Bacillus subtilis sigma A-dependent
promoter sequences: evidence for extended contact between RNA polymerase
and upstream promoter DNA. Nucleic Acids Res. 23, 2351–2360 (1995).
21. Burns, H. D., Ishihama, A. & Minchin, S. D. Open complex formation during
transcription initiation at the Escherichia coli galP1 promoter: the role of the
RNA polymerase alpha subunit at promoters lacking an UP-element.
Nucleic Acids Res. 27, 2051–2056 (1999).
22. Barne, K. A., Bown, J. A., Busby, S. J. & Minchin, S. D. Region 2.5 of the
Escherichia coli RNA polymerase sigma70 subunit is responsible for
the recognition of the ‘extended-10’ motif at promoters. EMBO J. 16,
4034–4040 (1997).
23. Liu, W., Zhang, X.-Z., Zhang, Z. & Zhang, Y.-H. P. Engineering of Clostridium
phytofermentans Endoglucanase Cel5A for improved thermostability.
Appl. Environ. Microbiol. 76, 4914–4917 (2010).
24. Tolonen, A. C., Chilaka, A. C. & Church, G. M. Targeted gene inactivation in
Clostridium phytofermentans shows that cellulose degradation requires the
family 9 hydrolase Cphy3367. Mol. Microbiol. 74, 1300–1313 (2009).
25. Tolonen, A. C. et al. Fungal lysis by a soil bacterium fermenting cellulose.
Environ. Microbiol. 17, 2618–2627 (2015).
26. Weickert, M. J. & Chambliss, G. H. Site-directed mutagenesis of a catabolite
repression operator sequence in Bacillus subtilis. Proc. Natl Acad. Sci. USA 87,
6238–6242 (1990).
27. Marciniak, B. C. et al. High- and low-affinity cre boxes for CcpA binding in
Bacillus subtilis revealed by genome-wide analysis. BMC Genomics 13, 401
28. Francke, C., Kerkhoven, R., Wels, M. & Siezen, R. J. A generic approach to
identify transcription factor-specific operator motifs; Inferences for LacI-family
mediated regulation in Lactobacillus plantarum WCFS1. BMC Genomics 9, 145
29. Richard, P. & Hilditch, S. D-galacturonic acid catabolism in microorganisms
and its biotechnological relevance. Appl. Microbiol. Biotechnol. 82, 597–604
30. Grundy, F. J., Waters, D. A., Allen, S. H. & Henkin, T. M. Regulation of the
Bacillus subtilis acetate kinase gene by CcpA. J. Bacteriol. 175, 7348–7355
31. Presecan-Siedel, E. et al. Catabolite regulation of the pta gene as part of carbon
flow pathways in Bacillus subtilis. J. Bacteriol. 181, 6889–6897 (1999).
32. Fujita, Y. Carbon catabolite control of the metabolic network in Bacillus subtilis.
Biosci. Biotechnol. Biochem. 73, 245–259 (2009).
33. Paredes, C. J., Alsaker, K. V. & Papoutsakis, E. T. A comparative genomic view
of clostridial sporulation and physiology. Nat. Rev. Microbiol. 3, 969–978
34. Chen, Y., Indurthi, D. C., Jones, S. W. & Papoutsakis, E. T. Small RNAs in the
genus Clostridium. MBio. 2, e00340-10 (2011).
35. Merino, E. & Yanofsky, C. Transcription attenuation: a highly conserved
regulatory strategy used by bacteria. Trends Genet. 21, 260–264 (2005).
36. Tolonen, A. C., Petit, E., Blanchard, J. L., Warnick, T. & Leschine, S. B. in
Biological Conversion of Biomass for Fuels and Chemicals (eds Sun, J. et al.)
114–139 (Royal Society of Chemistry, 2013).
37. Winkler, W., Nahvi, A. & Breaker, R. R. Thiamine derivatives bind messenger
RNAs directly to regulate bacterial gene expression. Nature 419, 952–956
38. Celik, H. et al. A two-component system (XydS/R) controls the expression of
genes encoding CBM6-containing proteins in response to straw in Clostridium
cellulolyticum. PLoS ONE 8, e56063 (2013).
39. Newcomb, M., Chen, C.-Y. & Wu, J. H. D. Induction of the celC operon of
Clostridium thermocellum by laminaribiose. Proc. Natl Acad. Sci. USA 104,
3747–3752 (2007).
40. Nataf, Y. et al. Clostridium thermocellum cellulosomal genes are regulated by
extracytoplasmic polysaccharides via alternative sigma factors. Proc. Natl Acad.
Sci. USA 107, 18646–18651 (2010).
41. Cavedon, K., Leschine, S. B. & Canale-Parola, E. Cellulase system of a free-
living, mesophilic clostridium (strain C7). J. Bacteriol. 172, 4222–4230
42. Hong, J., Ye, X., Wang, Y. & Zhang, Y.-H. P. Bioseparation of
recombinant cellulose-binding module-proteins by affinity adsorption
on an ultra-high-capacity cellulosic adsorbent. Anal. Chim. Acta 621,
193–199 (2008).
43. Tolonen, A. C. et al. Physiology, genomics, and pathway engineering of an
ethanol-tolerant strain of Clostridium phytofermentans. Appl. Environ.
Microbiol. 81, 5440–5448 (2015).
44. Alberti, A. et al. Comparison of library preparation methods reveals their
impact on interpretation of metatranscriptomic data. BMC Genomics 15,
912 (2014).
45. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2.
Nat. Methods 9, 357–359 (2012).
46. Vallenet, D. et al. MicroScope—an integrated microbial resource for the
curation and comparative analysis of genomic and metabolic data. Nucleic
Acids Res. 41, D636–D647 (2013).
47. Delhomme, N., Padioleau, I., Furlong, E. E. & Steinmetz, L. M. easyRNASeq:
a bioconductor package for processing RNA-Seq data. Bioinformatics 28,
2532–2533 (2012).
48. Anders, S. & Huber, W. Differential expression analysis for sequence count
data. Genome Biol. 11, R106 (2010).
49. Elias, J. E. & Gygi, S. P. Target-decoy search strategy for increased confidence in
large-scale protein identifications by mass spectrometry. Nat. Methods 4,
207–214 (2007).
50. Tolonen, A. C. & Haas, W. Quantitative proteomics using reductive
dimethylation for stable isotope labeling. J. Vis. Exp. 89, e51416 (2014).
51. Bailey, T. L. & Elkan, C. Fitting a mixture model by expectation maximization
to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2,
28–36 (1994).
52. Bailey, T. L. & Gribskov, M. Combining evidence using p-values:
application to sequence homology searches. Bioinformatics 14, 48–54
Acknowledgements
This work was funded by a CNRS Chaire d’Excellence to A.C.T. and the Genoscope-
CEA. We thank NEB for providing reagents (biotin-GTP, vaccinia capping enzyme and
streptavidin beads), the Genoscope-CEA sequencing platform for RNA sequencing and
the LABGeM group for supporting the MicroScope (MaGe) annotation resource.
Author contributions
L.E., A.A., M.S., I.S. and A.C.T. conceived the project. M.B., T.C. and K.L. collected data.
M.B., L.E., I.S. and A.C.T. analysed the results. A.C.T. wrote the paper.
Additional information
Supplementary Information accompanies this paper at
Competing financial interests: The authors declare no competing financial interests.
Reprints and permission information is available online at
How to cite this article: Boutard, M. et al. Global repositioning of transcription start sites
in a plant-fermenting bacterium. Nat. Commun. 7, 13783 doi: 10.1038/ncomms13783
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
This work is licensed under a Creative Commons Attribution 4.0
International License. The images or other third party material in this
article are included in the article’s Creative Commons license, unless indicated otherwise
in the credit line; if the material is not included under the Creative Commons license,
users will need to obtain permission from the license holder to reproduce the material.
To view a copy of this license, visit
© The Author(s) 2016
... While environmental conditions change, several genes are differentially expressed in bacteria. TSSs of these genes may be changed accordingly (Mendoza-Vargas et al. 2009;Boutard et al. 2016;Vera et al. 2020). For the primary and secondary TSSs corresponding to 530 genes, we investigated the distribution of the TSSs number per gene (Fig. 7A). ...
... There were even up to ten TSSs for a single gene. The distribution was similar to that observed in E. coli and Clostridium phytofermentans (Mendoza-Vargas et al. 2009;Boutard et al. 2016). Some of these genes have been reported to be essential for the growth and virulence of Brucella species, such as quorum sensing-dependent transcriptional regulator VjbR, two-component system genes, and lipopolysaccharide-related genes (Guzman-Verri et al. 2002;Kleinman et al. 2017;Smith 2018). ...
... Bacteria could regulate the expression of genes in response to different environments by changing the position of transcription initiation (Mendoza-Vargas et al. 2009;Boutard et al. 2016;Vera et al. 2020). We found that 57 % genes are expressed from more than one TSS, some of which are essential for Brucella growth and virulence ( Fig. 7A and B). ...
Brucella melitensis (B. melitensis) is an important facultative intracellular bacterium that causes global zoonotic diseases. Continuous intracellular survival and replication are the main obstruction responsible for the accessibility of prevention and treatment of brucellosis. Bacteria respond to complex environment by regulating gene expression. Many regulatory factors function at loci where RNA polymerase initiates messenger RNA synthesis. However, limited gene annotation is a current obstacle for the research on expression regulation in bacteria. To improve annotation and explore potential functional sites, we proposed a novel genome-wide method called Capping-seq for transcription start site (TSS) mapping in B. melitensis. This technique combines capture of capped primary transcripts with Single Molecule Real-Time (SMRT) sequencing technology. We identified 2,369 TSSs at single nucleotide resolution by Capping-seq. TSSs analysis of Brucella transcripts showed a preference of purine on the TSS positions. Our results revealed that -35 and -10 elements of promoter contained consensus sequences of TTGNNN and TATNNN, respectively. The 5' ends analysis showed that 57% genes are associated with more than one TSS and 47% genes contain long leader regions, suggested potential complex regulation at the 5' ends of genes in B. melitensis. Moreover, we identified 52 leaderless genes that are mainly involved in the metabolic processes. Overall, Capping-seq technology provides a unique solution for TSS determination in prokaryotes. Our findings develop a systematic insight into the primary transcriptome characterization of B. melitensis. This study represents a critical basis for investigating gene regulation and pathogenesis of Brucella.
... Genome-wide determination of prokaryotic TSSs has been greatly facilitated by RNA-seq-derived methods that take advantage of the characteristic presence of a 59 triphosphate on the initiating nucleotide of unprocessed RNAs (such as mRNAs). In the most recent approaches, these RNAs are selectively targeted by the vaccinia capping enzyme, which adds either a desthiobiotinylated (Cappableseq) or a biotinylated (Capp-Switch seq) guanosine cap on triphosphorylated RNAs, permitting their capture on streptavidin-coupled beads (2)(3)(4). ...
... Hence, we reasoned that the final threshold set on reads per million (RPM) values of TSSs (i.e., 10 RPM for Capp-Switch sequencing [3]) needed to be dynamically adjusted to consider the local gene expression downstream from each TSS. Each TSS is associated with a single gene based on its localization (intragenic, positioned inside the associated gene; intergenic, positioned outside the associated gene, defined as having the closest gene start or end relative to the TSS). ...
... Libraries were next obtained by PCR Capp-Switch library preparation. The Capp-switch sequencing library preparation protocol was used (3). Briefly, a 59 biotinylated cap was first added to the 59-PPP RNAs using vaccinia capping enzyme (New England Biolabs). ...
Full-text available
Solventogenic clostridia have been employed in industry for more than a century, initially being used in the acetone-butanol-ethanol (ABE) fermentation process for acetone and butanol production. Interest in these bacteria has recently increased in the context of green chemistry and sustainable development.
... Changes in the transcription start site depending on two different electron acceptor have been reported in studies on Geobactor [35]. Also, genome-wide analysis of transcription start sites in Clostridium identified several metabolism-related genes with multiple transcription start sites that change depending on the substrate [36]. Although the importance of having multiple transcription start sites has not been fully elucidated, it is considered an important regulatory mechanism of gene expression because it largely influences transcription efficiency, translation initiation, and protein abundance [37]. ...
Full-text available
N 2 O is the major greenhouse gases influencing global warming, and agricultural land is the predominant (anthropogenic) source of N 2 O emissions. Here, we report the high N 2 O-reducing activity of Bradyrhizobium ottawaense , suggesting the potential for efficiently mitigating N 2 O emission from agricultural lands. Among the 15 B. ottawaense isolates examined, the N 2 O-reducing activities of most (13) strains were approximately 5-fold higher than that of Bradyrhizobium diazoefficiens USDA110 T under anaerobic free-living conditions. This robust N 2 O-reducing activity of B. ottawaense was confirmed by N 2 O reductase (NosZ) protein levels and in the soybean rhizosphere after nodule decomposition. While the NosZ of B. ottawaense and B. diazoefficiens showed high homology, nosZ gene expression in B . ottawaense was over 150-fold higher than that in B. diazoefficiens USDA110 T , suggesting the high N 2 O-reducing activity of B. ottawaense is achieved by high nos expression. Furthermore, we examined the nos operon transcription start sites and found that, unlike B. diazoefficiens , B . ottawaense has two transcription start sites under N 2 O-respiring conditions, which may contribute to the high nosZ expression. Our study proposes the potential of B. ottawaense for effective N 2 O reduction and unique regulation of nos gene expression that contributes to the high performance of N 2 O mitigation in the soil.
... database to construct the training and independent test datasets. These 13 bacterial species include B. amyloliquefaciens [35], C. jejuni [36,37], C. phytofermentans [38], C. pneumoniae [39], E. coli [40,41], H. pylori [36,42], L. interrogans [43], M. smegmatis [44], R. capsulatus [45], S. coelicolor [46], S. oneidensis [47], S. pyogenes [48] and S. Typhimurium [49]. The experimentally verified annotations of bacterial transcription start sites (TSSs) were taken from the corresponding references. ...
Background: Promoters are DNA regions that initiate the transcription of specific genes near the transcription start sites. In bacteria, promoters are recognized by RNA polymerases and associated sigma factors. Effective promoter recognition is essential for synthesizing the gene-encoded products by bacteria to grow and adapt to different environmental conditions. A variety of machine learning-based predictors for bacterial promoters have been developed; however, most of them were designed specifically for a particular species. To date, only a few predictors are available for identifying general bacterial promoters with limited predictive performance. Results: In this study, we developed TIMER, a Siamese neural network-based approach for identifying both general and species-specific bacterial promoters. Specifically, TIMER uses DNA sequences as the input and employs three Siamese neural networks with the attention layers to train and optimize the models for a total of 13 species-specific and general bacterial promoters. Extensive 10-fold cross-validation and independent tests demonstrated that TIMER achieves a competitive performance and outperforms several existing methods on both general and species-specific promoter prediction. As an implementation of the proposed method, the web server of TIMER is publicly accessible at
... For those genes with two TSSs, TSS with a higher value was considered. This result is in agreement with Boutard et al. [29] where most genes were expressed from a single TSS. Identification of transcription start sites enables identification of promoter regions [30]. ...
Full-text available
Background Mycobacterium colombiense is an acid-fast, non-motile, rod-shaped mycobacterium confirmed to cause respiratory disease and disseminated infection in immune-compromised patients, and lymphadenopathy in immune-competent children. It has virulence mechanisms that allow them to adapt, survive, replicate, and produce diseases in the host. To tackle the diseases caused by M . colombiense , understanding of the regulation mechanisms of its genes is important. This paper, therefore, analyzes transcription start sites, promoter regions, motifs, transcription factors, and CpG islands in TetR family transcriptional regulatory (TFTR) genes of M . colombiense CECT 3035 using neural network promoter prediction, MEME, TOMTOM algorithms, and evolutionary analysis with the help of MEGA-X. Results The analysis of 22 protein coding TFTR genes of M . colombiense CECT 3035 showed that 86.36% and 13.64% of the gene sequences had one and two TSSs, respectively. Using MEME, we identified five motifs (MTF1, MTF2, MTF3, MTF4, and MTF5) and MTF1 was revealed as the common promoter motif for 100% TFTR genes of M . colombiense CECT 3035 which may serve as binding site for transcription factors that shared a minimum homology of 95.45%. MTF1 was compared to the registered prokaryotic motifs and found to match with 15 of them. MTF1 serves as the binding site mainly for AraC, LexA, and Bacterial histone-like protein families. Other protein families such as MATP, RR, σ-70 factor, TetR, LytTR, LuxR, and NAP also appear to be the binding candidates for MTF1. These families are known to have functions in virulence mechanisms, metabolism, quorum sensing, cell division, and antibiotic resistance. Furthermore, it was found that TFTR genes of M . colombiense CECT 3035 have many CpG islands with several fragments in their CpG islands. Molecular evolutionary genetic analysis showed close relationship among the genes. 
Conclusion We believe these findings will provide a better understanding of the regulation of TFTR genes in M . colombiense CECT 3035 involved in vital processes such as cell division, pathogenesis, and drug resistance and are likely to provide insights for drug development important to tackle the diseases caused by this mycobacterium. We believe this is the first report of in silico analyses of the transcriptional regulation of M . colombiense TFTR genes.
... A previous study using a technical variant of Cappable-seq (Capp-switch) investigated genome-wide patterns of C. phy transcription initiation on various plant substrates and demonstrated conditiondependent transcription regulation modifications. Amongst these changes, interesting regulatory mechanisms such as antisense transcription, leaderless transcription and non-coding RNA were identified (Boutard et al., 2016). ...
Full-text available
The development of high-throughput DNA sequencing revolutionized the study of complex bacterial communities called “microbiomes'', in diverse environments, from the central oceans to the human intestine. The research aim of this thesis is to develop new sequencing-based technologies and apply them to provide further insights into changes to the composition and activities of microbiomes. Specifically, Chapter One presents RIMS-seq (Rapid Identification of Methylase Specificity), a method to simultaneously obtain the DNA sequence and 5-methylcytosine (m5C) profile of bacterial genomes. Modification by m5C has been described in the genomes of many bacterial species to modulate gene expression and protect from viral infection. Chapter Two introduces ONT-cappable-seq and Loop-Cappable-seq, two new techniques to reveal operon architecture through full-length transcript sequencing, using Nanopore and LoopSeq sequencing, respectively. In Chapter Three, we applied a multiomics approach using some of the tools developed in the previous chapters tostudy the dynamics of the response of a model human intestinal microbiome after treatment with ciprofloxacin, a widely used broad-spectrum antibiotic. Antibiotics are critical treatments to prevent pathogenic infections, but they also kill commensal species that promote health, enhance the spread of resistant strains, and may degrade the protective effect of microbiota against invasion by pathogens. Therefore, it is crucial to be able to characterize both the composition but also the functional response of a microbial community to antibiotic treatment. We examined both the short and long-term transcriptional and genomic responses of the synthetic community and explored how the immediate transcriptomic response correlates and potentially predicts the later changes of the microbiome composition. 
The goal is to try to identify a marker appearing a few minutes/hours after the treatment that could be used to potentially predict the outcome of an antibiotic treatment, opening up the path to a more personalized medicine.
... The genes in the RefSeq annotation divided the TSS into the categories InterS (intergenic TSS with downstream gene in same orientation), InterA (intergenic TSS with downstream gene in opposite orientation), IntraS (intragenic TSS in gene with same orientation), or IntraA (intragenic TSS in gene with opposite orientation) according to Boutard et al. [72]. The Bedtools toolset was used to search for the closest gene and determine its distance from the TSS [73]. ...
Full-text available
The bacterial pathogen Salmonella enterica, which causes enteritis, has a broad host range and extensive environmental longevity. In water and soil, Salmonella interacts with protozoa and multiplies inside their phagosomes. Although this relationship resembles that between Salmonella and mammalian phagocytes, the interaction mechanisms and bacterial genes involved are unclear. Here, we characterized global gene expression patterns of S. enterica serovar Typhimurium within Acanthamoeba castellanii at the early stage of infection by Cappable-Seq. Gene expression features of S. Typhimurium within A. castellanii were presented with downregulation of glycolysis-related, and upregulation of glyoxylate cycle-related genes. Expression of Salmonella Pathogenicity Island-1 (SPI-1), chemotaxis system, and flagellar apparatus genes was upregulated. Furthermore, expression of genes mediating oxidative stress response and iron uptake was upregulated within A. castellanii as well as within mammalian phagocytes. Hence, global S. Typhimurium gene expression patterns within A. castellanii help better understand the molecular mechanisms of Salmonella adaptation to an amoeba cell and intracellular persistence in protozoa inhabiting water and soil ecosystems.
... These findings suggest that similar evolutionary tuning may enable GCadaptation of σ70 stringency in concert with evolving GC content. In line with our proposed mechanism, promoters from GC-rich Rhizobium and Streptomyces species (61.5 and 72.1 %GC, respectively) [31,32] have higher levels of degeneracy in their σ70 motifs than low GC organisms like Clostridium fermentans (GC% = 35) [33], consistent with less stringent requirements for initiation of transcription. A study of synthetic promoter sequences in industrial Clostridium species with similarly low GC contents identified a strong preference for AT-rich promoters [23], providing further support for GC-adaptation of promoter stringency. ...
While horizontal gene transfer is prevalent across the biosphere, the regulatory features that enable expression and functionalization of foreign DNA remain poorly understood. Here, we combine high-throughput promoter activity measurements and large-scale genomic analysis of regulatory regions to investigate the cross-compatibility of regulatory elements (REs) in bacteria. Functional characterization of thousands of natural REs in three distinct bacterial species revealed distinct expression patterns according to RE and recipient phylogeny. Host capacity to activate foreign promoters was proportional to their genomic GC content, while many low GC regulatory elements were both broadly active and had more transcription start sites across hosts. The difference in expression capabilities could be explained by the influence of the host GC content on the stringency of the AT-rich canonical σ70 motif necessary for transcription initiation. We further confirm the generalizability of this model and find widespread GC content adaptation of the σ70 motif in a set of 1,545 genomes from all major bacterial phyla. Our analysis identifies a key mechanism by which the strength of the AT-rich σ70 motif relative to a host’s genomic GC content governs the capacity for expression of acquired DNA. These findings shed light on regulatory adaptation in the context of evolving genomic composition.
Control of gene expression is fundamental to cell engineering. Here we demonstrate a set of approaches to tune gene expression in Clostridia using the model Clostridium phytofermentans. Initially, we develop a simple benchtop electroporation method that we use to identify a set of replicating plasmids and resistance markers that can be cotransformed into C. phytofermentans. We define a series of promoters spanning a >100-fold expression range by testing a promoter library driving the expression of a luminescent reporter. By insertion of tet operator sites upstream of the reporter, its expression can be quantitatively altered using the Tet repressor and anhydrotetracycline (aTc). We integrate these methods into an aTc-regulated dCas12a system with which we show in vivo CRISPRi-mediated repression of reporter and fermentation genes in C. phytofermentans. Together, these approaches advance genetic transformation and experimental control of gene expression in Clostridia.
Promoters are genomic regions where the transcription machinery binds to initiate the transcription of specific genes. Computational tools for identifying bacterial promoters have been around for decades. However, most of these tools were designed to recognize promoters in one or few bacterial species. Here, we present Promotech, a machine-learning-based method for promoter recognition in a wide range of bacterial species. We compare Promotech’s performance with the performance of five other promoter prediction methods. Promotech outperforms these other programs in terms of area under the precision-recall curve (AUPRC) or precision at the same level of recall. Promotech is available at
*Motivation:* High-throughput nucleotide sequencing provides quantitative readouts in assays for RNA expression (RNA-Seq), protein-DNA binding (ChIP-Seq) or cell counting (barcode sequencing). Statistical inference of differential signal in such data requires estimation of their variability throughout the dynamic range. When the number of replicates is small, error modelling is needed to achieve statistical power. *Results:* We propose an error model that uses the negative binomial distribution, with variance and mean linked by local regression, to model the null distribution of the count data. The method controls type-I error and provides good detection power. *Availability:* A free open-source R software package, DESeq, is available from the Bioconductor project and from "":
*Background:* The initiating nucleotide found at the 5′ end of primary transcripts has a distinctive triphosphorylated end that distinguishes these transcripts from all other RNA species. Recognizing this distinction is key to deconvoluting the primary transcriptome from the plethora of processed transcripts that confound analysis of the transcriptome. The currently available methods do not use targeted enrichment for the 5′ end of primary transcripts, but rather attempt to deplete non-targeted RNA. *Results:* We developed a method, Cappable-seq, for directly enriching for the 5′ end of primary transcripts and enabling determination of transcription start sites at single base resolution. This is achieved by enzymatically modifying the 5′ triphosphorylated end of RNA with a selectable tag. We first applied Cappable-seq to E. coli, achieving up to 50-fold enrichment of primary transcripts and identifying an unprecedented 16539 transcription start sites (TSS) genome-wide at single base resolution. We also applied Cappable-seq to a mouse cecum sample and identified TSS in a microbiome. *Conclusions:* Cappable-seq allows for the first time the capture of the 5′ end of primary transcripts. This enables a unique robust TSS determination in bacteria and microbiomes. In addition to and beyond TSS determination, Cappable-seq depletes ribosomal RNA and reduces the complexity of the transcriptome to a single quantifiable tag per transcript enabling digital profiling of gene expression in any microbiome. *Electronic supplementary material:* The online version of this article (doi:10.1186/s12864-016-2539-z) contains supplementary material, which is available to authorized users.
The Rfam database (available at is a collection of non-coding RNA families represented by manually curated sequence alignments, consensus secondary structures and annotation gathered from corresponding Wikipedia, taxonomy and ontology resources. In this article, we detail updates and improvements to the Rfam data and website for the Rfam 12.0 release. We describe the upgrade of our search pipeline to use Infernal 1.1 and demonstrate its improved homology detection ability by comparison with the previous version. The new pipeline is easier for users to apply to their own data sets, and we illustrate its ability to annotate RNAs in genomic and metagenomic data sets of various sizes. Rfam has been expanded to include 260 new families, including the well-studied large subunit ribosomal RNA family, and for the first time includes information on short sequence- and structure-based RNA motifs present within families.
Clostridium phytofermentans was isolated from forest soil and is distinguished by its capacity to directly ferment plant cell wall polysaccharides into ethanol as the primary product, suggesting that it possesses unusual catabolic pathways. The objective of the present study was to understand the molecular mechanisms of biomass conversion to ethanol in a single organism, Clostridium phytofermentans, by analyzing its complete genome and transcriptome during growth on plant carbohydrates. The saccharolytic versatility of C. phytofermentans is reflected in a diversity of genes encoding ATP-binding cassette sugar transporters and glycoside hydrolases, many of which may have been acquired through horizontal gene transfer. These genes are frequently organized as operons that may be controlled individually by the many transcriptional regulators identified in the genome. Preferential ethanol production may be due to high levels of expression of multiple ethanol dehydrogenases and additional pathways maximizing ethanol yield. The genome also encodes three different proteinaceous bacterial microcompartments with the capacity to compartmentalize pathways that divert fermentation intermediates to various products. These characteristics make C. phytofermentans an attractive resource for improving the efficiency and speed of biomass conversion to biofuels.
Novel processing strategies for hydrolysis and fermentation of lignocellulosic biomass in a single reactor offer large potential cost savings for production of biocommodities and biofuels. One critical challenge is retaining high enzyme production in the presence of elevated product titers. Toward this goal, the cellulolytic, ethanol-producing bacterium Clostridium phytofermentans was adapted to increased ethanol concentrations. The resulting ethanol-tolerant strain (ET strain) has nearly doubled ethanol tolerance relative to wild-type, but also reduced ethanol yield and growth at low ethanol concentrations. The genome of the ET strain has coding changes in proteins involved in membrane biosynthesis, the Rnf complex, cation homeostasis, gene regulation, and ethanol production. In particular, purification of the mutant bi-functional acetaldehyde CoA/alcohol dehydrogenase showed that a G609D variant abolished its activities, including ethanol formation. Heterologous expression of Zymomonas mobilis pyruvate decarboxylase and alcohol dehydrogenase in the ET strain increased cellulose consumption and restored ethanol production, demonstrating how metabolic engineering can be used to overcome disadvantageous mutations incurred during adaptation to ethanol. We discuss how genetic changes in the ET strain reveal novel potential strategies for improving microbial solvent tolerance.
Microbial metabolism of plant polysaccharides is an important part of environmental carbon cycling, human nutrition, and industrial processes based on cellulosic bioconversion. Here we demonstrate a broadly applicable method to analyze how microbes catabolize plant polysaccharides that integrates carbohydrate-active enzyme (CAZyme) assays, RNA sequencing (RNA-seq), and anaerobic growth screening. We apply this method to study how the bacterium Clostridium phytofermentans ferments plant biomass components including glucans, mannans, xylans, galactans, pectins, and arabinans. These polysaccharides are fermented with variable efficiencies, and diauxies prioritize metabolism of preferred substrates. Strand-specific RNA-seq reveals how this bacterium responds to polysaccharides by up-regulating specific groups of CAZymes, transporters, and enzymes to metabolize the constituent sugars. Fifty-six up-regulated CAZymes were purified, and their activities show most polysaccharides are degraded by multiple enzymes, often from the same family, but with divergent rates, specificities, and cellular localizations. CAZymes were then tested in combination to identify synergies between enzymes acting on the same substrate with different catalytic mechanisms. We discuss how these results advance our understanding of how microbes degrade and metabolize plant biomass.
Clostridia are anaerobic, endospore-forming prokaryotes that include strains of importance to human and animal health and physiology, cellulose degradation, solvent production and bioremediation. Their differentiation and related developmental programmes are not well understood at the molecular level. Recent genome sequencing and transcriptional-profiling studies have offered a glimpse of their inner workings and indicate that a better understanding of the orchestration of the molecular events that underlie their unique physiology, capabilities and diversity will pay major dividends.