FIG 3 - uploaded by Andrew Tolonen
General transcriptomic features of C. beijerinckii DSM 6423, C. beijerinckii NCIMB 8052, and C. acetobutylicum ATCC 824. (a) Number of TSSs found for each strain with a confidence threshold of 25 RPM. (b) Classification of TSSs in 4 categories: intergenic sense (InterS), intragenic sense (IntraS), intergenic antisense (InterA), and intragenic antisense (IntraA). (c) Number of InterS TSSs per gene for each strain. Values correspond to the percentage of genes with detected TSSs. (d) The −35 and −10 motifs found upstream from InterS TSSs of the three strains. (e) 5′ UTR length distributions, calculated as the distance between an InterS TSS and coding DNA sequence (CDS) starts.
Source publication
Solventogenic clostridia have been employed in industry for more than a century, initially being used in the acetone-butanol-ethanol (ABE) fermentation process for acetone and butanol production. Interest in these bacteria has recently increased in the context of green chemistry and sustainable development.
Contexts in source publication
Context 1
... we compared the data sets for each strain, with (normalized) and without (raw) normalization (Fig. 2, Fig. S2 and 3). Adjusting TSS strength relative to local gene expression significantly changed the data set distribution (Fig. 2a, Fig. S2). When we compared the distribution of the expression of gene subsets with a detected TSS, this further resulted in a significant shift from highly expressed genes (raw data set; >1,000 TPM) to a distribution ...
Context 2
... resulted from the higher sensitivity when additional data were considered. We subsequently tested our hypothesis that in the original data set, TSSs tend to accumulate on a few genes (Fig. 2d). A high proportion of reads (>70%) contributed to the detection of TSSs in genes that bore more than 4 TSSs, with a maximum of 244 TSSs for a single gene (Fig. S3). Conversely, expression normalization reduced the proportion of genes with more than 4 TSSs to 15%, indicating that (i) a higher number of genes overall were found with one or several TSSs and (ii) the normalization step improved the data set by removing secondary TSSs which were linked either to pervasive transcription or to ...
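The per-gene statistic quoted above (the share of genes bearing more than 4 TSSs) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names and the input layout (one gene identifier per detected TSS) are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable


def tss_per_gene(assignments: Iterable[str]) -> Dict[str, int]:
    """Count TSSs per gene.

    `assignments` holds one gene identifier per detected TSS
    (hypothetical input layout, not taken from the source).
    """
    return dict(Counter(assignments))


def share_of_genes_above(counts: Dict[str, int], threshold: int = 4) -> float:
    """Fraction of genes (with at least one TSS) bearing more than `threshold` TSSs."""
    if not counts:
        return 0.0
    above = sum(1 for n in counts.values() if n > threshold)
    return above / len(counts)


# Toy data: gene "a" has 5 TSSs, "b" has 2, "c" and "d" have 1 each.
toy = ["a"] * 5 + ["b"] * 2 + ["c", "d"]
counts = tss_per_gene(toy)
share = share_of_genes_above(counts, threshold=4)
```

Running the same computation on the raw and the normalized TSS sets would give the two proportions being compared in the passage.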
Context 3
... in 4 categories depending on their orientation and localization relative to the associated genes: InterS (intergenic TSS with downstream gene in same orientation), InterA (intergenic TSS with downstream gene in opposite orientation), IntraS (intragenic TSS in gene with same orientation), or IntraA (intragenic TSS in gene with opposite orientation) (Fig. 3b). In the 3 strains, the TSS distribution across categories was relatively similar, with most TSSs identified in the sense direction (InterS: 40 to 55%; IntraS: 40 to 55%). Such an abundance of intragenic TSSs has been observed on several occasions using different methodologies (4,12), and this has been hypothesized to mainly be the result of pervasive ...
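The four categories defined above can be expressed as a small classifier. The sketch below is a hedged illustration, not the published implementation. It assumes 1-based inclusive gene coordinates, reads "downstream" as downstream in the TSS's own direction of transcription, and gives the sense label priority when a TSS lies inside overlapping genes on both strands. The `Gene` class and `classify_tss` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gene:
    start: int   # leftmost coordinate, 1-based inclusive (assumed convention)
    end: int     # rightmost coordinate, 1-based inclusive
    strand: str  # "+" or "-"


def classify_tss(pos: int, strand: str, genes: List[Gene]) -> Optional[str]:
    """Return InterS, InterA, IntraS or IntraA; None if no downstream gene exists."""
    # Intragenic: the TSS falls inside at least one annotated gene.
    overlapping = [g for g in genes if g.start <= pos <= g.end]
    if overlapping:
        # Assumption: a sense overlap wins over an antisense overlap.
        same = any(g.strand == strand for g in overlapping)
        return "IntraS" if same else "IntraA"

    # Intergenic: look for the nearest gene in the direction of transcription.
    if strand == "+":
        candidates = [g for g in genes if g.start > pos]
        nearest = min(candidates, key=lambda g: g.start, default=None)
    else:
        candidates = [g for g in genes if g.end < pos]
        nearest = max(candidates, key=lambda g: g.end, default=None)

    if nearest is None:
        return None
    return "InterS" if nearest.strand == strand else "InterA"


# Toy annotation: one forward gene and one reverse gene.
toy_genes = [Gene(100, 200, "+"), Gene(300, 400, "-")]
```

For example, with `toy_genes`, a forward-strand TSS at position 50 is intergenic and sense to the gene at 100 to 200, while a forward-strand TSS at position 250 faces the reverse-strand gene and is intergenic antisense.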
Context 4
... depending on the strain; Fig. 3c). As expected, conserved −10 and −35 motifs were found enriched upstream from detected TSSs in all three strains (Fig. 3d), confirming these were bona fide TSSs. Less than 3% of the TSSs were observed in the antisense direction, which supports previous results obtained for C. phytofermentans (3) (Fig. ... [doi:10.1128/spectrum.02288-21]
Context 7
... in vivo. Confirmation of antisense transcription from some InterA and IntraA TSSs was achieved by mapping visualization of forward reads from paired-read RNAseq (Fig. S5). The 5′ UTR lengths (measured as distances in bp between InterS TSS positions and corresponding coding DNA sequence [CDS] starts) rarely exceeded 200 bp in the three strains (Fig. 3e), exhibiting no correlation with gene expression (Data Set S8). Most 5′ UTRs were between 0 and 100 bp long (with a peak at −32 bp relative to the start codon), with ~2 to 3% of transcripts categorized as leaderless (transcripts not bearing an upstream RBS; for this analysis, 5′ UTR length of <6 bp), suggesting that, despite being ...
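The 5′ UTR measurement described above reduces to a strand-aware subtraction. The following is a minimal sketch under stated assumptions: coordinates are 1-based, `cds_start` and `cds_end` are the leftmost and rightmost CDS coordinates, and the start codon of a reverse-strand gene therefore sits at `cds_end`. The function names are hypothetical. The 6 bp leaderless cutoff is the one quoted in the passage.

```python
def utr5_length(tss_pos: int, cds_start: int, cds_end: int, strand: str) -> int:
    """Distance in bp between an InterS TSS and the first base of the start codon."""
    if strand == "+":
        # Forward strand: the start codon begins at the leftmost CDS coordinate.
        return cds_start - tss_pos
    # Reverse strand: the start codon begins at the rightmost CDS coordinate.
    return tss_pos - cds_end


def is_leaderless(utr_len: int, cutoff: int = 6) -> bool:
    """Flag transcripts whose 5' UTR is shorter than `cutoff` bp."""
    return 0 <= utr_len < cutoff


# A TSS 32 bp upstream of the start codon, on either strand.
forward_len = utr5_length(68, 100, 400, "+")
reverse_len = utr5_length(432, 100, 400, "-")
```

A TSS at −32 bp relative to the start codon, the peak reported in the passage, gives a length of 32 under this convention and is not flagged as leaderless.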
Context 8
... expression data analysis was incorporated into the detection pipeline by performing RNA-seq on the same mRNA samples and using the resulting expression values to normalize Capp-Switch seq data. This additional step limited the gene expression bias and hence enhanced results at the genome scale (expression bias, TSS number/gene) and the gene scale (Fig. S3), underlining the importance of treating Capp-Switch data with a normalization ...
Context 9
... genome and gene analyses (Fig. 3, Fig. S3) indicated that, for the three strains, Capp-Switch accurately detects TSSs at the single-nucleotide resolution. Importantly, TSSs have similar features in the three clostridia (total and per-gene number of TSSs, TSS categories, 5′ UTR length, upstream motifs). One interesting feature is the very low number of antisense TSSs, which was ...
Citations
Agrobacteria are a diverse, polyphyletic group of prokaryotes with multipartite genomes capable of transferring DNA into the genomes of host plants, making them an essential tool in plant biotechnology. Despite their utility in plant transformation, genome-wide transcriptional regulation is not well understood across the three main lineages of agrobacteria. Transcription start sites (TSSs) are a necessary component of gene expression and regulation. In this study, we used differential RNA-seq and a TSS identification algorithm optimized on manually annotated TSSs, then validated with existing TSSs, to identify thousands of TSSs with nucleotide resolution for representatives of each lineage. We extend the 356 TSSs previously reported in Agrobacterium fabrum C58 by identifying 1,916 TSSs. In addition, we completed genomes and phenotyping of Rhizobium rhizogenes C16/80 and Allorhizobium vitis T60/94, identifying 2,650 and 2,432 TSSs, respectively. Parameter optimization was crucial for an accurate, high-resolution view of genome and transcriptional dynamics, highlighting the importance of algorithm optimization in genome-wide TSS identification and genomics at large. The optimized algorithm reduced the number of TSSs identified internal and antisense to the coding sequence on average by 90.5% and 91.9%, respectively. Comparison of TSS conservation between orthologs of the three lineages revealed differences in cell cycle regulation of ctrA as well as divergence of transcriptional regulation of chemotaxis-related genes when grown in conditions that simulate the plant environment. These results provide a framework to elucidate the mechanistic basis and evolution of pathology across the three main lineages of agrobacteria.
IMPORTANCE
Transcription start sites (TSSs) are fundamental for understanding gene expression and regulation. Agrobacteria, a group of prokaryotes with the ability to transfer DNA into the genomes of host plants, are widely used in plant biotechnology. However, the genome-wide transcriptional regulation of agrobacteria is not well understood, especially in less-studied lineages. Differential RNA-seq and an optimized algorithm enabled identification of thousands of TSSs with nucleotide resolution for representatives of each lineage. The results of this study provide a framework for elucidating the mechanistic basis and evolution of pathology across the three main lineages of agrobacteria. The optimized algorithm also highlights the importance of parameter optimization in genome-wide TSS identification and genomics at large.