FIG 2 - uploaded by Andrew Tolonen
Content may be subject to copyright.
Normalization of Capp-Switch reads enhances TSS identification for C. beijerinckii DSM 6423. (a) TSS expression distribution with (normalized) and without (raw) expression normalization (10 reads per million [RPM] [TSS] detection threshold shown). (b) RNA-seq expression distributions of (i) all genes (black line) and (ii) the subset of genes for which TSSs have been found, in normalized (blue) and raw (red) data sets (25 RPM detection threshold shown). (c) Number of TSSs found for normalized and raw data sets. (d) TSS number per gene (with detected TSSs, 25 RPM detection threshold shown). Value corresponds to the number of reads falling in each category (as percentage of the total RPM). (e) Venn diagram showing TSSs found for each data set (25 RPM detection threshold shown) and the corresponding numbers of associated genes.
Source publication
Solventogenic clostridia have been employed in industry for more than a century, initially being used in the acetone-butanol-ethanol (ABE) fermentation process for acetone and butanol production. Interest in these bacteria has recently increased in the context of green chemistry and sustainable development.
Citations
Agrobacteria are a diverse, polyphyletic group of prokaryotes with multipartite genomes capable of transferring DNA into the genomes of host plants, making them an essential tool in plant biotechnology. Despite their utility in plant transformation, genome-wide transcriptional regulation is not well understood across the three main lineages of agrobacteria. Transcription start sites (TSSs) are a necessary component of gene expression and regulation. In this study, we used differential RNA-seq and a TSS identification algorithm optimized on manually annotated TSS, then validated with existing TSS to identify thousands of TSS with nucleotide resolution for representatives of each lineage. We extend upon the 356 TSSs previously reported in Agrobacterium fabrum C58 by identifying 1,916 TSSs. In addition, we completed genomes and phenotyping of Rhizobium rhizogenes C16/80 and Allorhizobium vitis T60/94, identifying 2,650 and 2,432 TSSs, respectively. Parameter optimization was crucial for an accurate, high-resolution view of genome and transcriptional dynamics, highlighting the importance of algorithm optimization in genome-wide TSS identification and genomics at large. The optimized algorithm reduced the number of TSSs identified internal and antisense to the coding sequence on average by 90.5% and 91.9%, respectively. Comparison of TSS conservation between orthologs of the three lineages revealed differences in cell cycle regulation of ctrA as well as divergence of transcriptional regulation of chemotaxis-related genes when grown in conditions that simulate the plant environment. These results provide a framework to elucidate the mechanistic basis and evolution of pathology across the three main lineages of agrobacteria.
IMPORTANCE
Transcription start sites (TSSs) are fundamental for understanding gene expression and regulation. Agrobacteria, a group of prokaryotes with the ability to transfer DNA into the genomes of host plants, are widely used in plant biotechnology. However, the genome-wide transcriptional regulation of agrobacteria is not well understood, especially in less-studied lineages. Differential RNA-seq and an optimized algorithm enabled identification of thousands of TSSs with nucleotide resolution for representatives of each lineage. The results of this study provide a framework for elucidating the mechanistic basis and evolution of pathology across the three main lineages of agrobacteria. The optimized algorithm also highlights the importance of parameter optimization in genome-wide TSS identification and genomics at large.