Article (Literature Review)

Metabolome-guided genome mining of RiPP natural products


Abstract

Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a chemically diverse class of metabolites. Many RiPPs show potent biological activities that make them attractive starting points for drug development. A promising approach for the discovery of new classes of RiPPs is genome mining. However, the accuracy of genome mining is hampered by the lack of signature genes shared across different RiPP classes. One way to reduce false-positive predictions is by complementing genomic information with metabolomics data. In recent years, several new approaches addressing such integrative genomics and metabolomics analyses have been developed. In this review, we provide a detailed discussion of RiPP-compatible software tools that integrate paired genomics and metabolomics data. We highlight current challenges in data integration and identify opportunities for further developments targeting new classes of bioactive RiPPs.
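The central idea here, using metabolomics data to weed out false-positive genome-mining predictions, can be illustrated with a simple mass check: compute the expected mass of a predicted core peptide and look for it among observed ions. The sketch below is a minimal illustration under stated assumptions, not a method from this review. It assumes the leader peptide has already been removed, considers only Ser/Thr dehydration (as in lanthipeptides) and singly protonated [M+H]+ ions, and uses hypothetical function names.

```python
from __future__ import annotations

# Monoisotopic residue masses (Da) of the 20 proteinogenic amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added once for the free peptide termini
PROTON = 1.007276   # mass added to form an [M+H]+ ion


def candidate_mz(core: str) -> list[float]:
    """[M+H]+ values for a core peptide carrying 0..n Ser/Thr dehydrations.

    Assumption: dehydration (-H2O) is the only modification considered.
    """
    base = sum(RESIDUE_MASS[aa] for aa in core) + WATER + PROTON
    n_sites = core.count("S") + core.count("T")
    return [base - k * WATER for k in range(n_sites + 1)]


def match_observed(core: str, observed_mz: list[float], ppm_tol: float = 5.0):
    """Return (n_dehydrations, observed m/z, error in ppm) for each match within ppm_tol."""
    hits = []
    for k, theo in enumerate(candidate_mz(core)):
        for mz in observed_mz:
            ppm = (mz - theo) / theo * 1e6
            if abs(ppm) <= ppm_tol:
                hits.append((k, mz, round(ppm, 2)))
    return hits


# Toy example: one observed ion matches the singly dehydrated "SA" dipeptide.
print(match_observed("SA", [159.0765, 300.1]))
```

Real RiPPs carry many other modifications (for example methylations or oxidations), each with its own mass shift, so a practical filter would enumerate those too and accept additional adducts and charge states.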


... However, traditional methods of discovering new antibiotics have proven inadequate, leading to a shortage of novel antibiotics to combat increasingly resistant bacteria worldwide [1,2]. Advances in sequencing techniques have enabled large-scale sequencing of microbial genomes, opening new opportunities for discovering novel peptide antibiotics through genome mining [14]. Among the vast array of natural products, RiPPs stand out as one of the most expansive families, offering a remarkable diversity of structures and bioactive potential for antibiotic discovery [15]. ...
Preprint
Antimicrobial resistance remains a significant global threat and a major contributor to mortality worldwide. Ribosomally synthesized and post-translationally modified peptides (RiPPs) have emerged as a promising source of novel peptide antibiotics due to their diverse chemical structures. Here, we report the discovery of new Avi(Me)Cys-containing cyclopeptide antibiotics through a synergistic approach that combines rule-based genome mining, automated metabolomic analysis, and heterologous expression. We first bioinformatically identified 1,172 RiPP biosynthetic gene clusters (BGCs) responsible for Avi(Me)Cys-containing cyclopeptides in a pool of over 50,000 bacterial genomes. We then linked three newly identified BGCs to the production of five new peptide antibiotics. Notably, massatide A displayed excellent activity against a spectrum of Gram-positive pathogens, including drug-resistant clinical isolates such as linezolid-resistant S. aureus and methicillin-resistant S. aureus, with a minimum inhibitory concentration (MIC) of 0.25 μg/mL. The strong performance of massatide A in an animal infection model, together with a low risk of resistance and a favorable safety profile, makes it a promising candidate for antibiotic development. Our study highlights the potential of Avi(Me)Cys-containing cyclopeptides to expand the arsenal of antibiotics against multidrug-resistant bacteria, offering promising drug leads in the ongoing battle against infectious diseases.
... Lasso peptides are a growing class of bioactive bacterial peptides with unique lasso topology, which differentiates them from other members within the much larger ribosomally synthesized and post-translationally modified peptide (RiPP) superfamily [9,10]. The compact and constrained topology endows most lasso peptides with remarkable thermal and proteolytic stability and favors peptide-protein interactions, accounting for the diverse biological activities of lasso peptides, mainly as enzyme inhibitors and receptor antagonists [7,11]. ...
Article
Aborycin is a type I lasso peptide with a stable interlocked structure, offering a favorable framework for drug development. The aborycin biosynthetic gene cluster gul from the marine sponge-associated Streptomyces sp. HNS054 was cloned and integrated into the chromosome of S. coelicolor hosts in different copy numbers. The three-copy gul-integration strain S. coelicolor M1346::3gul showed higher production than the one-copy or two-copy strains, with a total titer of approximately 10.4 mg/L, i.e., 2.1 times that of the native strain. Five regulatory genes reported to have negative effects on secondary metabolism, phoU (SCO4228), wblA (SCO3579), SCO1712, orrA (SCO3008), and gntR (SCO1678), were then knocked out of the M1346::3gul genome using CRISPR/Cas9. While the ΔSCO1712 mutant showed a significant decrease (4.6 mg/L) and the ΔphoU mutant showed no significant improvement (12.1 mg/L) in aborycin production, the ΔwblA, ΔorrA, and ΔgntR mutations significantly improved aborycin titers to approximately 23.6 mg/L, 56.3 mg/L, and 48.2 mg/L, respectively, which are among the highest heterologous yields reported for lasso peptides in either Escherichia coli or Streptomyces systems. This study thus provides important clues for future work on enhancing antibiotic production in Streptomyces.
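The reported titers can be put on a common scale with a short calculation. The sketch below assumes that the 10.4 mg/L titer of M1346::3gul is the appropriate baseline for every knockout, which the abstract implies but does not state explicitly; it uses only the numbers given above.

```python
# Titers (mg/L) taken from the abstract above.
baseline = 10.4  # three-copy strain S. coelicolor M1346::3gul
knockouts = {
    "ΔSCO1712": 4.6,
    "ΔphoU": 12.1,
    "ΔwblA": 23.6,
    "ΔorrA": 56.3,
    "ΔgntR": 48.2,
}

# Fold change of each knockout relative to the parent strain (assumed baseline).
fold_change = {name: titer / baseline for name, titer in knockouts.items()}

# The baseline was reported as ~2.1 times the native producer's titer.
native_estimate = baseline / 2.1

for name, fc in sorted(fold_change.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {fc:.1f}-fold")
print(f"estimated native titer: {native_estimate:.1f} mg/L")
```

Under that assumption, the ΔorrA knockout corresponds to roughly a 5.4-fold increase over the three-copy parent, and the native strain would have produced about 5.0 mg/L.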
Article
Microorganisms produce small bioactive compounds as part of their secondary or specialised metabolism. Such metabolites often have antimicrobial, anticancer, antifungal, antiviral, or other bioactivities and thus play an important role in applications in medicine and agriculture. In the past decade, genome mining has become a widely used method to explore, access, and analyse the available biodiversity of these compounds. Since 2011, the antibiotics and secondary metabolite analysis shell, antiSMASH (https://antismash.secondarymetabolites.org/), has supported researchers in their microbial genome mining tasks, both as a free-to-use web server and as a standalone tool under an OSI-approved open-source licence. It is currently the most widely used tool for detecting and characterising biosynthetic gene clusters (BGCs) in archaea, bacteria, and fungi. Here, we present version 7 of antiSMASH. antiSMASH 7 increases the number of supported cluster types from 71 to 81 and contains improvements in chemical structure prediction, enzymatic assembly-line visualisation, and gene cluster regulation.
Article
Metabolomics-driven discoveries in biological samples remain hampered by the grand challenge of metabolite annotation and identification. Only a small fraction of metabolites have an annotated spectrum in spectral libraries; hence, searching only for exact library matches generally returns few hits. An attractive alternative is to search for so-called analogues as a starting point for structural annotation; analogues are library molecules that are not exact matches but display high chemical similarity. However, current analogue search implementations are not yet very reliable and are relatively slow. Here, we present MS2Query, a machine learning-based tool that integrates mass spectral embedding-based chemical similarity predictors (Spec2Vec and MS2DeepScore) as well as detected precursor masses to rank potential analogues and exact matches. Benchmarking of MS2Query on reference mass spectra and in experimental case studies demonstrates improved reliability and scalability. MS2Query thus offers exciting opportunities to further increase the annotation rate of metabolomics profiles of complex metabolite mixtures and to discover new biology.
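The idea of combining embedding-based similarity with precursor mass information can be illustrated with a simplified ranking. This is a hedged sketch, not MS2Query's model: MS2Query relies on trained machine learning predictors, whereas the code below assumes precomputed embedding vectors and a hand-chosen, hypothetical linear penalty (`mass_weight`) on the precursor m/z difference.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(query_vec, query_mz, library, mass_weight=0.01):
    """Rank library spectra by embedding similarity minus a precursor-mass penalty.

    library: list of (name, embedding, precursor_mz) tuples.
    mass_weight: hypothetical penalty per Da of precursor m/z difference.
    """
    scored = [
        (cosine(query_vec, vec) - mass_weight * abs(query_mz - mz), name)
        for name, vec, mz in library
    ]
    scored.sort(reverse=True)
    return [(name, round(score, 3)) for score, name in scored]


# Toy 3-dimensional "embeddings"; real spectrum embeddings are much longer.
library = [
    ("exact_match", [1.0, 0.0, 0.0], 500.0),
    ("analogue", [0.9, 0.1, 0.0], 514.0),
    ("unrelated", [0.0, 1.0, 0.0], 500.0),
]
print(rank_candidates([1.0, 0.0, 0.0], 500.0, library))
```

The penalty lets a close analogue with a shifted precursor (here 14 Da, roughly one extra CH2 group) still rank above a spectrally unrelated compound of identical mass.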
Article
Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a large class of secondary metabolites that have garnered scientific attention due to their complex scaffolds with potential roles in medicine, agriculture, and chemical ecology. RiPPs derive from ribosomally synthesized precursor proteins that are cleaved and further modified by various enzymes that alter the peptide backbone or side chains. Among these enzymes, cytochromes P450 (P450s) are a superfamily of heme-thiolate proteins involved in many metabolic pathways, including RiPP biosynthesis. In this review, we focus on P450s involved in RiPP pathways and the unique chemical transformations they mediate. Previous studies have revealed a wealth of P450s distributed across all domains of life. Although the number of characterized P450s involved in RiPP biosynthesis is relatively small, they catalyze various enzymatic reactions, such as C–C or C–N bond formation. The formation of some RiPPs is catalyzed by more than one P450, enabling structural diversity. With continuous improvement of bioinformatic tools for RiPP prediction and advances in synthetic biology techniques, further cytochrome P450-mediated RiPP biosynthetic pathways are expected to be discovered. Summary: The presence of genes encoding P450s in gene clusters for ribosomally synthesized and post-translationally modified peptides expands the structural and functional diversity of these secondary metabolites, and here, we review the current state of this knowledge.
Article
Natural products research increasingly applies -omics technologies to guide molecular discovery. While the combined analysis of genomic and metabolomic datasets has proved valuable for identifying natural products and their biosynthetic gene clusters (BGCs) in bacteria, this integrated approach has rarely been applied to fungi. Because fungi are hyper-diverse and underexplored for new chemistry and bioactivities, we created a linked genomics–metabolomics dataset for 110 Ascomycetes and optimized both gene cluster family (GCF) networking parameters and correlation-based scoring for pairing fungal natural products with their BGCs. Using a network of 3,007 GCFs (organized from 7,020 BGCs), we examined 25 known natural products originating from 16 known BGCs and observed statistically significant associations between 21 of these compounds and their validated BGCs. Furthermore, the scalable platform identified the BGC for the pestalamides, shedding light on their biogenesis, and revealed more than 200 high-scoring natural product–GCF linkages to direct future discovery.
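Correlation-based scoring pairs a GCF with a metabolite when both occur in the same strains. The study's exact scoring function is not given in this abstract; the sketch below assumes plain presence/absence vectors across strains and uses the Pearson correlation (equivalent to the phi coefficient for binary data). The strain profiles and feature names are invented toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation; on 0/1 vectors this equals the phi coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


# Toy presence (1) / absence (0) profiles across six strains.
gcfs = {"GCF_1": [1, 1, 0, 0, 1, 0], "GCF_2": [0, 1, 1, 1, 0, 0]}
metabolites = {"feature_A": [1, 1, 0, 0, 1, 0], "feature_B": [0, 1, 1, 0, 0, 0]}

# Score every GCF-metabolite pair and sort from strongest to weakest.
links = sorted(
    ((pearson(g, m), g_name, m_name)
     for g_name, g in gcfs.items()
     for m_name, m in metabolites.items()),
    reverse=True,
)
for r, g_name, m_name in links:
    print(f"{g_name} <-> {m_name}: r = {r:+.2f}")
```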
Article
Microbial specialised metabolism is full of valuable natural products that are applied clinically, agriculturally, and industrially. The genes that encode their biosynthesis are often physically clustered on the genome in biosynthetic gene clusters (BGCs). Many BGCs consist of multiple groups of co-evolving genes called sub-clusters that are responsible for the biosynthesis of a specific chemical moiety in a natural product. Sub-clusters therefore provide an important link between the structures of a natural product and its BGC, which can be leveraged for predicting natural product structures from sequence, as well as for linking chemical structures and metabolomics-derived mass features to BGCs. While some initial computational methodologies have been devised for sub-cluster detection, current approaches are not scalable, have only been run on small and outdated datasets, or produce an impractically large number of possible sub-clusters to mine through. Here, we constructed a scalable method for unsupervised sub-cluster detection, called iPRESTO, based on topic modelling and statistical analysis of co-occurrence patterns of enzyme-coding protein families. iPRESTO was used to mine sub-clusters across 150,000 prokaryotic BGCs from antiSMASH-DB. After annotating a fraction of the resulting sub-cluster families, we could predict a substructure for 16% of the antiSMASH-DB BGCs. Additionally, our method was able to confirm 83% of the experimentally characterised sub-clusters in MIBiG reference BGCs. Based on iPRESTO-detected sub-clusters, we could correctly identify the BGCs for xenorhabdin and salbostatin biosynthesis (which had not yet been annotated in BGC databases), as well as propose a candidate BGC for akashin biosynthesis. Additionally, we show for a collection of 145 actinobacteria how substructures can aid in linking BGCs to molecules by correlating iPRESTO-detected sub-clusters to MS/MS-derived Mass2Motifs substructure patterns. 
This work paves the way for deeper functional and structural annotation of microbial BGCs by improved linking of orphan molecules to their cognate gene clusters, thus facilitating accelerated natural product discovery.
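The co-occurrence patterns that iPRESTO builds on can be illustrated with a simple frequent-pair count. The sketch below is deliberately simpler than iPRESTO, which uses topic modelling and statistical analysis rather than raw counts; the protein-family labels and the `min_support` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(bgcs, min_support=2):
    """Return protein-family pairs found together in at least `min_support` BGCs."""
    counts = Counter()
    for families in bgcs.values():
        # Count each unordered pair once per BGC.
        for pair in combinations(sorted(set(families)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Toy BGCs annotated with hypothetical protein-family labels.
bgcs = {
    "BGC_A": ["KS_N", "KS_C", "ACP", "FMO"],
    "BGC_B": ["KS_N", "KS_C", "ACP"],
    "BGC_C": ["FMO", "P450", "ACP"],
}
print(cooccurring_pairs(bgcs))
```

Raw counts favour ubiquitous families: ACP appears in three of the four retained pairs here, which is one reason dedicated methods rely on statistical tests or topic models instead.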
Article
Background: It is well known that the microbiome produces a myriad of specialised metabolites with diverse functions. To better characterise their structures and identify their producers in complex samples, integrative genome and metabolome mining is becoming increasingly popular. Metabologenomic co-occurrence-based correlation scoring methods facilitate the linking of metabolite mass fragmentation spectra (MS/MS) to their cognate biosynthetic gene clusters (BGCs) based on shared presence/absence patterns of metabolites and BGCs in paired omics datasets of multiple strains. Recently, these methods have been made more readily accessible through the NPLinker platform. However, co-occurrence-based approaches usually yield too many candidate links to validate manually. To address this issue, we introduce a generic feature-based correlation method that matches chemical compound classes between BGCs and MS/MS spectra. Results: To automatically shorten the long lists of potential BGC-MS/MS spectrum links, we matched natural product (NP) ontologies that had previously been developed independently for genomics and metabolomics, and developed NPClassScore: an empirical class matching score that we also implemented in the NPLinker platform. By applying NPClassScore to three paired omics datasets totalling 189 bacterial strains, we show that the number of links is reduced by 63% on average compared with a co-occurrence-based strategy alone. We further demonstrate that 96% of experimentally validated links in these datasets are retained and prioritised when using NPClassScore. Conclusion: Matching genome and metabolome class ontologies provides a starting point for selecting plausible BGC and MS/MS spectrum candidates. NPClassScore expedites genome/metabolome data integration, as relevant BGC-metabolite links are prioritised and researchers are faced with substantially fewer proposed BGC-MS/MS links to inspect manually.
We anticipate that our addition to the NPLinker platform will aid integrative omics mining workflows in discovering novel NPs and understanding complex metabolic interactions in the microbiome.
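NPClassScore prunes co-occurrence links whose BGC class and spectrum class are unlikely to match. The sketch below shows only the filtering step: the class names and compatibility scores in `CLASS_MATCH` are invented for illustration, whereas NPClassScore is an empirical score derived from the matched genomics and metabolomics ontologies.

```python
# Hypothetical compatibility scores between a BGC class (genome side) and a
# chemical class predicted for a spectrum (metabolome side); unknown pairs score 0.
CLASS_MATCH = {
    ("RiPP", "peptide"): 0.9,
    ("NRPS", "peptide"): 0.8,
    ("PKS", "macrolide"): 0.9,
    ("PKS", "peptide"): 0.1,
}


def filter_links(candidate_links, bgc_class, spectrum_class, threshold=0.5):
    """Keep (BGC, spectrum) links whose class-match score reaches `threshold`."""
    kept = []
    for bgc, spectrum in candidate_links:
        score = CLASS_MATCH.get((bgc_class[bgc], spectrum_class[spectrum]), 0.0)
        if score >= threshold:
            kept.append((bgc, spectrum, score))
    return kept


# Four co-occurrence candidates; class matching keeps two of them.
links = [("bgc1", "spec1"), ("bgc1", "spec2"), ("bgc2", "spec1"), ("bgc2", "spec2")]
bgc_class = {"bgc1": "RiPP", "bgc2": "PKS"}
spectrum_class = {"spec1": "peptide", "spec2": "macrolide"}
print(filter_links(links, bgc_class, spectrum_class))
```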
Preprint
Small molecules can selectively modulate biological processes and thus generate phenotypic variation. Biological samples are complex matrices, and liquid chromatography tandem mass spectrometry often detects hundreds of molecules, of which only a fraction may be associated with this variation. The challenge therefore lies in the prioritization of the most relevant molecules for further investigation. Tools are needed to effectively contextualize mass spectrometric data with phenotypical and environmental (meta)data. To accelerate this task, we developed FERMO, a dashboard application combining mass spectrometry data with qualitative and quantitative biological observations. FERMO's centralized interface enables users to rapidly inspect data, formulate hypotheses, and prioritize molecules of interest. We demonstrate the applicability of FERMO in a case study on antibiotic activity of bacterial extracts, where we successfully prioritized the bioactive molecule siomycin out of 143 molecular features. We expect that besides natural product discovery, FERMO will find application in a wide range of omics-driven fields.
Article
Ribosomally synthesized and post-translationally modified peptides (RiPPs) are of increasing interest in natural product research and drug discovery. This interest stems not only from their unique chemical structures and topologies but also from their excellent bioactivities, such as antibacterial, antifungal, and antiviral activities. Advances in genomics, bioinformatics, and chemical analytics have driven an exponential increase in the number of known RiPPs and in evaluations of their biological activities. Furthermore, owing to their relatively simple and conserved biosynthetic logic, RiPPs are amenable to engineering, yielding diverse analogues that exhibit distinct physiological activities and are difficult to synthesize. This Review systematically addresses the biological activities and/or modes of action of novel RiPPs discovered in the past decade, while selected structural features and biosynthetic mechanisms are also covered briefly. Almost half of the cases involve activity against Gram-positive bacteria. An increasing number of RiPPs with activity against Gram-negative bacteria, tumors, viruses, and other targets are also discussed in detail. Finally, we summarize some general principles of RiPP biological activities to guide future genome mining as well as drug discovery and optimization.
Article
Natural products are structurally highly diverse and exhibit a wide array of biological activities. As a result, they serve as an important source of new drug leads. Traditionally, natural products have been discovered by bioactivity-guided fractionation. The advent of genome sequencing technology has introduced an alternative route to novel natural product scaffolds: genome mining. Genome mining is an in silico natural product discovery strategy in which sequenced genomes are analyzed for the potential of the associated organism to produce natural products. For most natural product classes, seemingly universal biosynthetic principles have been deciphered, and these are used to detect biosynthetic gene clusters with pathway-encoded conserved key enzymes, domains, or motifs as bait. Several generations of highly sophisticated tools have been developed for the rule-based identification of natural product gene clusters. Apart from these hard-coded algorithms, multiple machine learning-based tools have been designed to complement the existing genome mining tool set and to focus on natural product gene clusters that lack genes with conserved signature sequences. In this perspective, we take a closer look at state-of-the-art genome mining tools based on either hard-coded rules or machine learning algorithms, with an emphasis on the confidence of their predictions and their potential to identify non-canonical natural product biosynthetic gene clusters. We highlight the current strengths and limitations of these genome mining pipelines by contrasting their advantages and disadvantages. Moreover, we introduce two indirect biosynthetic gene cluster identification strategies that complement current workflows. Combining all genome mining approaches will pave the way towards a more comprehensive understanding of the full biosynthetic repertoire encoded in microbial genome sequences.
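Rule-based detection of this kind can be sketched as follows: find genes carrying signature domains, then report a region only when all domains required by a rule occur within a distance cutoff. Everything specific below is an assumption for illustration: the rule, the domain names (LanB and LanC, the dehydratase and cyclase of class I lanthipeptides), the cutoff, and the flanking distance. Real tools obtain domain hits from profile HMM searches and apply far richer rule sets.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    name: str
    start: int                                  # gene start coordinate (bp)
    domains: set = field(default_factory=set)   # signature domains found in the gene


# Hypothetical rule: both core enzymes must occur within `cutoff` bp of each other.
RULE = {"label": "lanthipeptide-like", "required": {"LanB", "LanC"}, "cutoff": 20_000}


def detect_regions(genes, rule, flank=10_000):
    """Return merged (start, end) windows in which all required domains co-occur."""
    hits = [g for g in genes if g.domains & rule["required"]]
    windows = []
    for anchor in hits:
        nearby = [g for g in hits if abs(g.start - anchor.start) <= rule["cutoff"]]
        found = set().union(*(g.domains for g in nearby)) & rule["required"]
        if found == rule["required"]:
            start = max(min(g.start for g in nearby) - flank, 0)
            end = max(g.start for g in nearby) + flank
            windows.append((start, end))
    # Merge overlapping windows into single regions.
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
        else:
            merged.append((start, end))
    return merged


genes = [
    Gene("orf1", 50_000, {"LanB"}),
    Gene("orf2", 58_000, {"LanC"}),
    Gene("orf3", 70_000),
    Gene("orf4", 200_000, {"LanB"}),  # lone dehydratase: does not satisfy the rule
]
print(detect_regions(genes, RULE))
```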
Article
Background: Untargeted metabolomics approaches based on mass spectrometry obtain comprehensive profiles of complex biological samples. However, on average only 10% of the molecules can be annotated. This low annotation rate hampers biochemical interpretation and effective comparison of metabolomics studies. Furthermore, de novo structural characterization of mass spectral data remains a complicated and time-intensive process. Recently, the field of computational metabolomics has gained traction and novel methods have started to enable large-scale and reliable metabolite annotation. Molecular networking and machine learning-based in-silico annotation tools have been shown to greatly assist metabolite characterization in diverse fields such as clinical metabolomics and natural product discovery. Aim of review: We highlight recent advances in computational metabolite annotation workflows with a special focus on their evaluation and comparison with other tools. Whilst the progress is substantial and promising, we also argue that inconsistencies in benchmarking different tools hamper users from selecting the most appropriate and promising method for their research. We summarize benchmarking strategies of the different tools and outline several recommendations for benchmarking and comparing novel tools. Key scientific concepts of review: This review focuses on recent advances in mass spectral library-based and machine learning-supported metabolite annotation workflows. We discuss large-scale library matching and analogue search, the current bloom of mass spectral similarity scores, and how molecular networking has changed the field. In addition, the potentials and challenges of machine learning-supported metabolite annotation workflows are highlighted.
Overall, recent developments in computational metabolomics have started to fundamentally change metabolomics workflows, and we expect that as a community we will be able to overcome current method performance ambiguities and annotation bottlenecks.
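Mass spectral library matching, the starting point for the similarity scores discussed above, is commonly based on a cosine score between fragment spectra. The function below is a minimal greedy cosine under stated assumptions (centroided peaks, a fixed absolute m/z tolerance, raw intensities without scaling); dedicated libraries such as matchms provide tested implementations with more options.

```python
import math


def cosine_score(spec_a, spec_b, tolerance=0.02):
    """Greedy cosine similarity between two centroided MS/MS spectra.

    Each spectrum is a list of (mz, intensity) pairs. Peaks are paired one-to-one
    when their m/z values differ by at most `tolerance`, highest products first.
    """
    candidates = []
    for i, (mz_a, int_a) in enumerate(spec_a):
        for j, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= tolerance:
                candidates.append((int_a * int_b, i, j))
    candidates.sort(reverse=True)

    used_a, used_b, matched = set(), set(), 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            matched += product

    norm_a = math.sqrt(sum(inten ** 2 for _, inten in spec_a))
    norm_b = math.sqrt(sum(inten ** 2 for _, inten in spec_b))
    return matched / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = [(91.05, 40.0), (119.05, 100.0), (147.04, 25.0)]
library_hit = [(91.06, 35.0), (119.05, 100.0), (163.04, 20.0)]
print(round(cosine_score(query, library_hit), 3))
```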
Article
Microbial specialized metabolites are an important source of, and inspiration for, many pharmaceutical and biotechnological products, and they play key roles in ecological processes. Untargeted metabolomics using liquid chromatography coupled with tandem mass spectrometry is an efficient technique for accessing metabolites from fractions and even from environmental crude extracts. Nevertheless, metabolomics is limited in predicting structures or bioactivities for cryptic metabolites. Efficiently linking the biosynthetic potential inferred from (meta)genomics to the specialized metabolome would accelerate drug discovery programs by allowing metabolomics to make use of genetic predictions. Here, we present NPOmix, a k-nearest neighbor classifier that systematically connects mass spectrometry fragmentation spectra to their corresponding biosynthetic gene clusters, independent of their chemical class. Our new pattern-based genome mining pipeline links biosynthetic genes to the metabolites they encode, as detected via mass spectrometry from bacterial cultures or environmental microbiomes. Using paired datasets from the Paired omics Data Platform that include validated gene-mass spectrum links, we demonstrate this approach by automatically linking 18 known mass spectra to their corresponding experimentally validated biosynthetic genes (validated, e.g., via nuclear magnetic resonance or genetic engineering). We also provide a reproducible computational example of combining NPOmix with MassQL to mine siderophores. We conclude that NPOmix minimizes the need for culturing (it worked well on microbiomes) and facilitates specialized metabolite prioritization based on integrative omics mining.
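The k-nearest neighbor step can be illustrated as a majority vote among the closest labelled examples. This sketch does not reproduce NPOmix's feature representation or distance metric; it assumes each spectrum is already encoded as a numeric vector (here, hypothetical presence/absence profiles across five strains) and that training examples carry known GCF labels.

```python
import math
from collections import Counter


def knn_predict(query, training, k=3):
    """Label `query` by majority vote among its k nearest training examples.

    training: list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted((math.dist(query, features), label) for features, label in training)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training set: spectra with known GCF labels, encoded as presence/absence
# across five strains (a hypothetical representation).
training = [
    ([1, 1, 0, 0, 1], "GCF_A"),
    ([1, 1, 0, 0, 0], "GCF_A"),
    ([0, 0, 1, 1, 0], "GCF_B"),
    ([0, 1, 1, 1, 0], "GCF_B"),
]
print(knn_predict([1, 1, 0, 0, 1], training))
print(knn_predict([0, 0, 1, 1, 1], training))
```

An odd `k` with two labels avoids tied votes; with more labels, a tie-breaking rule (for example, by summed distance) would be needed.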
Article
With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2,021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/.
Article
Over recent decades, the pipeline of antibiotics acting against Gram-negative bacteria has been running dry, as most candidate antibiotics discovered suffer from insufficient potency, unfavorable pharmacokinetic properties, or toxicity. The darobactins, a promising new class of small peptide drug candidates, bind to BamA, an outer membrane protein and novel antibiotic target. Previously, we reported that biosynthetic engineering in a heterologous host generated novel darobactins with enhanced antibacterial activity. Here, we use an optimized purification method and present cryo-EM structures of the Bam complex with darobactin 9 (D9), which served as a blueprint for the biotechnological generation of twenty new darobactins, including halogenated analogs. The newly engineered darobactin 22 binds more tightly to BamA and is up to 32-fold more active than D9, which already has a favorable activity profile, against clinically relevant pathogens such as carbapenem-resistant Acinetobacter baumannii, without observable toxic effects.
Article
Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a promising source of new antimicrobials in the face of rising antibiotic resistance. Here, we report a scalable platform that combines high-throughput bioinformatics with automated biosynthetic gene cluster refactoring for rapid evaluation of uncharacterized gene clusters. As a proof of concept, 96 RiPP gene clusters from diverse bacterial phyla, comprising 383 biosynthetic genes, are refactored in a high-throughput manner using a biological foundry, with a success rate of 86%. Heterologous expression of all successfully refactored gene clusters in Escherichia coli enables the discovery of 30 compounds covering six RiPP classes: lanthipeptides, lasso peptides, graspetides, glycocins, linear azol(in)e-containing peptides, and thioamitides. A subset of the discovered lanthipeptides exhibits antibiotic activity, with one class II lanthipeptide showing low-µM activity against Klebsiella pneumoniae, an ESKAPE pathogen. Overall, this work provides a robust platform for rapidly discovering RiPPs.
Article
Current methods for structure elucidation of small molecules rely on finding similarity with spectra of known compounds, but do not predict structures de novo for unknown compound classes. We present MSNovelist, which combines fingerprint prediction with an encoder–decoder neural network to generate structures de novo solely from tandem mass spectrometry (MS²) spectra. In an evaluation with 3,863 MS² spectra from the Global Natural Product Social Molecular Networking site, MSNovelist predicted 25% of structures correctly on first rank, retrieved 45% of structures overall and reproduced 61% of correct database annotations, without having ever seen the structure in the training phase. Similarly, for the CASMI 2016 challenge, MSNovelist correctly predicted 26% and retrieved 57% of structures, recovering 64% of correct database annotations. Finally, we illustrate the application of MSNovelist in a bryophyte MS² dataset, in which de novo structure prediction substantially outscored the best database candidate for seven spectra. MSNovelist is ideally suited to complement library-based annotation in the case of poorly represented analyte classes and novel compounds.
Article
Bacterial specialized metabolites are a proven source of antibiotics and cancer therapies, but whether we have sampled all the secondary metabolite chemical diversity of cultivated bacteria is not known. We analysed ~170,000 bacterial genomes and ~47,000 metagenome assembled genomes (MAGs) using a modified BiG-SLiCE and the new clust-o-matic algorithm. We estimate that only 3% of the natural products potentially encoded in bacterial genomes have been experimentally characterized. We show that the variation in secondary metabolite biosynthetic diversity drops significantly at the genus level, identifying it as an appropriate taxonomic rank for comparison. Equal comparison of genera based on relative evolutionary distance revealed that Streptomyces bacteria encode the largest biosynthetic diversity by far, with Amycolatopsis, Kutzneria and Micromonospora also encoding substantial diversity. Finally, we find that several less-well-studied taxa, such as Weeksellaceae (Bacteroidota), Myxococcaceae (Myxococcota), Pleurocapsa and Nostocaceae (Cyanobacteria), have potential to produce highly diverse sets of secondary metabolites that warrant further investigation. A comprehensive survey of secondary metabolites encoded in bacteria identifies large differences in biosynthetic diversity among genera and pinpoints those that can be targeted for novel chemistries provisionally suitable as antimicrobials.
Article
Few tools exist in natural products discovery to integrate biological screening and untargeted mass spectrometry data at the library scale. Previously, we reported Compound Activity Mapping as a strategy for predicting compound bioactivity profiles directly from primary screening results on extract libraries. We now present NP Analyst, an open online platform for Compound Activity Mapping that accepts bioassay data of almost any type, and is compatible with mass spectrometry data from major instrument manufacturers via the mzML format. In addition, NP Analyst will accept processed mass spectrometry data from the MZmine 2 and GNPS open-source platforms, making it a versatile tool for integration with existing discovery workflows. We demonstrate the utility of this new tool for both the dereplication of known compounds and the discovery of novel bioactive natural products using a challenging low-resolution antimicrobial bioassay data set. This new platform is available at www.npanalyst.org.
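The core idea of relating MS features to bioassay results across an extract library can be illustrated with a difference-of-means score. This is a simplified sketch, not the Compound Activity Mapping algorithm used by NP Analyst: it scores each feature by the mean activity of extracts that contain it minus the mean activity of extracts that lack it, and all feature and extract names are hypothetical.

```python
def activity_association(feature_presence, activity):
    """Score MS features by how strongly their presence tracks bioactivity.

    feature_presence: {feature: set of extract IDs in which it is detected}
    activity: {extract ID: bioassay readout, e.g. percent growth inhibition}
    Score = mean activity of extracts with the feature - mean of those without.
    """
    scores = {}
    for feature, extracts in feature_presence.items():
        present = [value for ext, value in activity.items() if ext in extracts]
        absent = [value for ext, value in activity.items() if ext not in extracts]
        if present and absent:  # a feature found everywhere (or nowhere) is uninformative
            scores[feature] = sum(present) / len(present) - sum(absent) / len(absent)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


activity = {"E1": 90.0, "E2": 85.0, "E3": 10.0, "E4": 5.0}
feature_presence = {
    "feat_1": {"E1", "E2"},              # only in the two active extracts
    "feat_2": {"E1", "E3"},              # in one active and one inactive extract
    "feat_3": {"E1", "E2", "E3", "E4"},  # ubiquitous, so it is skipped
}
print(activity_association(feature_presence, activity))
```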
Article
The β-barrel assembly machinery (BAM) complex is an essential component of Escherichia coli that inserts and folds outer membrane proteins (OMPs). The natural antibiotic darobactin inhibits BamA, the central unit of BAM. Here, we employ dynamic single-molecule force spectroscopy (SMFS) to better understand the structure-function relationship of BamA and its inhibition by darobactin. The five N-terminal polypeptide transport-associated (POTRA) domains show low mechanical, kinetic, and energetic stabilities. In contrast, the structural region linking the POTRA domains to the transmembrane β-barrel exhibits the highest mechanical stiffness and lowest kinetic stability within BamA, indicating a mechano-functional role. Within the β-barrel, the four N-terminal β-hairpins H1–H4 exhibit the highest mechanical stabilities and stiffnesses, while the four C-terminal β-hairpins H5–H8 show lower stabilities and higher flexibilities. This asymmetry within the β-barrel suggests that substrates funneling into the lateral gate formed by β-hairpins H1 and H8 can force the flexible C-terminal β-hairpins to change conformation.
Article
Short-read sequencing of GC-rich genomes, such as those of actinomycetes, results in fragmented genome assemblies and truncated biosynthetic gene clusters (BGCs, which are often 10 to >100 kb long), which hinders our ability to understand the biosynthetic potential of a given strain and to predict the molecules it can produce. The current study demonstrates that contiguous DNA assemblies suitable for BGC analysis can be obtained through low-coverage, multiplexed sequencing on Flongle, providing a new low-cost workflow ($30 to 40 per strain) for sequencing actinomycete strain libraries.
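Fragmented assemblies matter for genome mining because a cluster that runs off the end of a contig is likely incomplete. The helper below flags such predictions; the coordinate convention, the toy assembly, and the `margin` value are assumptions, and genome mining tools such as antiSMASH report a comparable contig-edge flag themselves.

```python
def flag_contig_edge(bgcs, contig_lengths, margin=1_000):
    """Return IDs of predicted BGCs lying within `margin` bp of a contig end.

    bgcs: list of (bgc_id, contig_id, start, end), 0-based, end-exclusive coordinates.
    contig_lengths: {contig_id: length in bp}
    """
    flagged = []
    for bgc_id, contig, start, end in bgcs:
        if start < margin or contig_lengths[contig] - end < margin:
            flagged.append(bgc_id)
    return flagged


contig_lengths = {"ctg1": 8_200_000, "ctg7": 45_000}  # toy assembly
bgcs = [
    ("region1", "ctg1", 1_200_000, 1_260_000),  # well inside a long contig
    ("region2", "ctg7", 500, 44_800),           # spans almost a whole short contig
]
print(flag_contig_edge(bgcs, contig_lengths))
```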
Article
Within the natural products field there is an increasing emphasis on the study of compounds from microbial sources. This has been fuelled by interest in the central role that microorganisms play in mediating both interspecies interactions and host-microbe relationships. To support the study of natural products chemistry produced by microorganisms, we released the Natural Products Atlas, a database of known microbial natural product structures, in 2019. This paper reports the release of a new version of the database, which includes a full RESTful application programming interface (API), a new website framework, and an expanded database that includes 8128 new compounds, bringing the total to 32 552. In addition to these structural and content changes, we have added full taxonomic descriptions for all microbial taxa and chemical ontology terms from both NP Classifier and ClassyFire. We have also performed manual curation to review all entries with incomplete configurational assignments and have integrated data from external resources, including CyanoMetDB. Finally, we have improved the user experience by updating the Overview dashboard and creating a dashboard for taxonomic origin. The database can be accessed via the new interactive website at https://www.npatlas.org.
Article
Microbial natural products are a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class of natural products that include antibiotics, immunosuppressants, and anticancer agents. Recent breakthroughs in natural product discovery have revealed the chemical structure of several thousand NRPs. However, biosynthetic gene clusters (BGCs) encoding them are known only for a few hundred compounds. Here, we developed Nerpa, a computational method for the high-throughput discovery of novel BGCs responsible for producing known NRPs. After searching 13,399 representative bacterial genomes from the RefSeq repository against 8368 known NRPs, Nerpa linked 117 BGCs to their products. We further experimentally validated the predicted BGC of ngercheumicin from Photobacterium galatheae via mass spectrometry. Nerpa supports searching new genomes against thousands of known NRP structures, and novel molecular structures against tens of thousands of bacterial genomes. The availability of these tools can enhance our understanding of NRP synthesis and the function of their biosynthetic enzymes.
Article
Microbial gene clusters encoding the biosynthesis of primary and secondary metabolites play key roles in shaping microbial ecosystems and driving microbiome-associated phenotypes. Although effective approaches exist to evaluate the metabolic potential of such bacteria through identification of these metabolic gene clusters in their genomes, no automated pipelines exist to profile the abundance and expression levels of such gene clusters in microbiome samples to generate hypotheses about their functional roles, and to find associations with phenotypes of interest. Here, we describe BiG-MAP, a bioinformatic tool to profile abundance and expression levels of gene clusters across metagenomic and metatranscriptomic data and evaluate their differential abundance and expression under different conditions. To illustrate its usefulness, we analyzed 96 metagenomic samples from healthy and caries-associated human oral microbiome samples and identified 252 gene clusters, including unreported ones, that were significantly more abundant in either phenotype. Among them, we found the muc operon, a gene cluster known to be associated with tooth decay. Additionally, we found a putative reuterin biosynthetic gene cluster from a Streptococcus strain to be enriched but not exclusively found in healthy samples; metabolomic data from the same samples showed masses with fragmentation patterns consistent with (poly)acrolein, which is known to spontaneously form from the products of the reuterin pathway and has been previously shown to inhibit pathogenic Streptococcus mutans strains. Thus, we show how BiG-MAP can be used to generate new hypotheses on potential drivers of microbiome-associated phenotypes and prioritize the experimental characterization of relevant gene clusters that may mediate them.
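Gene cluster abundance in metagenomes is typically derived from read mapping and normalized for cluster length and sequencing depth before groups are compared. The sketch below assumes RPKM normalization and a simple log2 fold change of group means. The abstract does not state BiG-MAP's normalization or statistical test, so this is not its method, and the read counts are hypothetical.

```python
from math import log2
from statistics import mean

def rpkm(mapped_reads, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads in the sample."""
    return mapped_reads * 1e9 / (feature_length_bp * total_mapped_reads)

def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 of the ratio of group means; the pseudocount avoids division by zero."""
    return log2((mean(group_a) + pseudocount) / (mean(group_b) + pseudocount))

# Hypothetical read counts for one 40 kb gene cluster in four metagenomes.
healthy = [rpkm(2000, 40_000, 10_000_000), rpkm(2600, 40_000, 12_000_000)]
caries = [rpkm(300, 40_000, 9_000_000), rpkm(150, 40_000, 11_000_000)]
print(round(log2_fold_change(healthy, caries), 2))
```

A positive value means the cluster is more abundant in the first group. A real analysis would add a significance test across many samples.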
Article
Microbial specialized metabolites are key mediators in host-microbiome interactions. Most of the chemical space produced by the microbiome currently remains unexplored and uncharacterized. This situation calls for new and improved methods to exploit the growing publicly available genomic and metabolomic data sets and connect the outcomes to structural and functional knowledge inferred from transcriptomics and proteomics experiments. Here, we first describe currently available approaches that support the comprehensive mining of metabolomics and genomics data. Next, we provide our vision on how to move forward toward the automated linking of omics data of specialized metabolites to their structures, biosynthesis pathways, producers, and functions.
Article
This review highlights the key computational tools and emerging strategies for metabolite annotation, and discusses how these advances will enable integrated large-scale analysis to accelerate natural product discovery.
Article
Specialised metabolites from microbial sources are well known for their wide range of biomedical applications, particularly as antibiotics. When mining paired genomic and metabolomic data sets for novel specialised metabolites, establishing links between Biosynthetic Gene Clusters (BGCs) and metabolites represents a promising way of finding such novel chemistry. However, due to the lack of detailed biosynthetic knowledge for the majority of predicted BGCs, and the large number of possible combinations, this is not a simple task. This problem is becoming ever more pressing with the increased availability of paired omics data sets. Current tools are not effective at identifying valid links automatically, and manual verification is a considerable bottleneck in natural product research. We demonstrate that using multiple link-scoring functions together makes it easier to prioritise true links relative to others. We standardise a commonly used score, which yields a new, more effective score, and we introduce a novel score based on an Input-Output Kernel Regression approach. Finally, we present NPLinker, a software framework to link genomic and metabolomic data. Results are verified using publicly available data sets that include validated links.
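The standardised score mentioned above is, to our understanding, the Metcalf co-occurrence score. This score rewards a gene cluster family (GCF) and a spectrum that are present in the same strains. The sketch below assumes the weights shown in `WEIGHTS`. It standardises the raw score against a hypergeometric null model in which the strains showing the spectrum are drawn at random. Check both the weights and the null model against the NPLinker documentation before relying on them. The strain IDs are hypothetical.

```python
from math import comb, sqrt

# Assumed weights for the four presence/absence combinations of a
# (gene cluster family, spectrum) pair across strains; verify against NPLinker.
WEIGHTS = {"both": 10, "gcf_only": -10, "spec_only": 0, "neither": 1}

def metcalf_raw(k, n_strains, n_gcf, n_spec, w=WEIGHTS):
    """Raw co-occurrence score when k strains carry both the GCF and the spectrum."""
    return (w["both"] * k
            + w["gcf_only"] * (n_gcf - k)
            + w["spec_only"] * (n_spec - k)
            + w["neither"] * (n_strains - n_gcf - n_spec + k))

def metcalf_standardised(gcf_strains, spec_strains, n_strains, w=WEIGHTS):
    """Z-score of the observed raw score under a hypergeometric null model in
    which the n_spec strains showing the spectrum are drawn at random."""
    n_gcf, n_spec = len(gcf_strains), len(spec_strains)
    k_obs = len(gcf_strains & spec_strains)
    total = comb(n_strains, n_spec)
    ks = range(max(0, n_gcf + n_spec - n_strains), min(n_gcf, n_spec) + 1)
    probs = [comb(n_gcf, k) * comb(n_strains - n_gcf, n_spec - k) / total for k in ks]
    scores = [metcalf_raw(k, n_strains, n_gcf, n_spec, w) for k in ks]
    expected = sum(p * s for p, s in zip(probs, scores))
    variance = sum(p * (s - expected) ** 2 for p, s in zip(probs, scores))
    if variance == 0:
        return 0.0
    return (metcalf_raw(k_obs, n_strains, n_gcf, n_spec, w) - expected) / sqrt(variance)

# Hypothetical strains: the GCF and the spectrum occur in exactly the same 3 of 6 strains.
print(metcalf_standardised({"s1", "s2", "s3"}, {"s1", "s2", "s3"}, n_strains=6))
```

With these weights, the raw score is linear in the overlap k with a positive slope of 21. The standardised value therefore equals the z-score of the overlap count itself.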
Article
Genomics and metabolomics are widely used to explore specialized metabolite diversity. The Paired Omics Data Platform is a community initiative to systematically document links between metabolome and (meta)genome data, aiding the identification of natural product biosynthetic origins and metabolite structures.
Article
Despite an excellent track record, microbial drug discovery suffers from high rates of rediscovery. Better workflows for the rapid investigation of complex extracts are needed to increase throughput and to allow early prioritization of samples. In addition, systematic characterization of poorly explored strains is seldom performed. Here, we report a metabolomic study of 72 isolates belonging to the rare actinomycete genus Planomonospora, using a workflow of commonly used open-access tools to investigate its secondary metabolites. The results reveal a correlation between chemical diversity and strain phylogeny, with classes of metabolites exclusive to certain phylogroups. We were able to identify previously reported Planomonospora metabolites, including the ureylene-containing oligopeptide antipain, the thiopeptide siomycin including new congeners, and the ribosomally synthesized peptides sphaericin and lantibiotic 97518. In addition, we found that Planomonospora strains can produce the siderophore desferrioxamine or a salinichelin-like peptide. Analysis of the genomes of three newly sequenced strains led to the detection of 59 gene cluster families, of which three were connected to products found by LC-MS/MS profiling. This study demonstrates the value of metabolomic studies to investigate poorly explored taxa and provides a first picture of the biosynthetic capabilities of the genus Planomonospora.
Article
Microbial natural products constitute a wide variety of chemical compounds, many of which can have antibiotic, antiviral, or anticancer properties that make them interesting for clinical purposes. Natural product classes include polyketides (PKs), nonribosomal peptides (NRPs), and ribosomally synthesized and post-translationally modified peptides (RiPPs). While variants of biosynthetic gene clusters (BGCs) for known classes of natural products are easy to identify in genome sequences, BGCs for new compound classes escape attention. In particular, evidence is accumulating that for RiPPs, the subclasses known thus far may represent only the tip of the iceberg. Here, we present decRiPPter (Data-driven Exploratory Class-independent RiPP TrackER), a RiPP genome mining algorithm aimed at the discovery of novel RiPP classes. DecRiPPter combines a Support Vector Machine (SVM) that identifies candidate RiPP precursors with pan-genomic analyses to identify which of these are encoded within operon-like structures that are part of the accessory genome of a genus. Subsequently, it prioritizes such regions based on the presence of new enzymology and on patterns of gene cluster and precursor peptide conservation across species. We then applied decRiPPter to mine 1,295 Streptomyces genomes, which led to the identification of 42 new candidate RiPP families that could not be found by existing programs. One of these was studied further and elucidated as a representative of a novel subfamily of lanthipeptides, which we designate class V. The 2D structure of the new RiPP, which we name pristinin A3 (1), was solved using nuclear magnetic resonance (NMR), tandem mass spectrometry (MS/MS) data, and chemical labeling. Two previously unidentified modifying enzymes are proposed to create the hallmark lanthionine bridges.
Taken together, our work highlights how novel natural product families can be discovered by methods going beyond sequence similarity searches to integrate multiple pathway discovery criteria.
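decRiPPter keeps candidates that belong to the accessory genome of a genus. The abstract does not give the exact criterion, so the sketch below uses a hypothetical presence cut-off (`core_fraction`) to split gene families into core and accessory sets. The family names and genome IDs are placeholders.

```python
def partition_pangenome(presence, n_genomes, core_fraction=0.95):
    """Split gene families into core and accessory sets.

    presence: dict mapping gene family -> set of genome IDs carrying it
    n_genomes: number of genomes in the genus-level data set
    core_fraction: hypothetical cut-off; families present in at least this
        fraction of genomes are 'core', all others 'accessory'
    """
    core, accessory = set(), set()
    for family, carriers in presence.items():
        target = core if len(carriers) >= core_fraction * n_genomes else accessory
        target.add(family)
    return core, accessory

# Illustrative gene families across four genomes (names are placeholders).
presence = {
    "rpoB": {"g1", "g2", "g3", "g4"},
    "candidate_precursor_1": {"g2"},
    "unknown_modifying_enzyme": {"g1", "g3"},
}
core, accessory = partition_pangenome(presence, n_genomes=4)
print(sorted(core), sorted(accessory))
```

The housekeeping gene lands in the core set, while the sporadically distributed candidate precursor and enzyme fall into the accessory set that such a pipeline would examine further.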
Article
Novel antibiotics are urgently needed to address the looming global crisis of antibiotic resistance. Historically, the primary source of clinically used antibiotics has been microbial secondary metabolism. Microbial genome sequencing has revealed a plethora of uncharacterized natural antibiotics that remain to be discovered. However, the isolation of these molecules is hindered by the challenge of linking sequence information to the chemical structures of the encoded molecules. Here, we present PRISM 4, a comprehensive platform for prediction of the chemical structures of genomically encoded antibiotics, including all classes of bacterial antibiotics currently in clinical use. The accuracy of chemical structure prediction enables the development of machine-learning methods to predict the likely biological activity of encoded molecules. We apply PRISM 4 to chart secondary metabolite biosynthesis in a collection of over 10,000 bacterial genomes from both cultured isolates and metagenomic datasets, revealing thousands of encoded antibiotics. PRISM 4 is freely available as an interactive web application at http://prism.adapsyn.com.
Article
Microorganisms produce natural products that are frequently used in the development of antibacterial, antiviral, and anticancer drugs, pesticides, herbicides, or fungicides. In recent years, genome mining has evolved into a prominent method to access this potential. antiSMASH is one of the most popular tools for this task. Here, we present version 3 of the antiSMASH database, providing a means to access and query precomputed antiSMASH-5.2-detected biosynthetic gene clusters from representative, publicly available, high-quality microbial genomes via an interactive graphical user interface. In version 3, the database contains 147 517 high-quality BGC regions from 388 archaeal, 25 236 bacterial and 177 fungal genomes and is available at https://antismash-db.secondarymetabolites.org/.
Article
Metabolomics using nontargeted tandem mass spectrometry can detect thousands of molecules in a biological sample. However, structural molecule annotation is limited to structures present in libraries or databases, restricting analysis and interpretation of experimental data. Here we describe CANOPUS (class assignment and ontology prediction using mass spectrometry), a computational tool for systematic compound class annotation. CANOPUS uses a deep neural network to predict 2,497 compound classes from fragmentation spectra, including all biologically relevant classes. CANOPUS explicitly targets compounds for which neither spectral nor structural reference data are available and predicts classes lacking tandem mass spectrometry training data. In evaluation using reference data, CANOPUS reached very high prediction performance (average accuracy of 99.7% in cross-validation) and outperformed four baseline methods. We demonstrate the broad utility of CANOPUS by investigating the effect of microbial colonization in the mouse digestive system, through analysis of the chemodiversity of different Euphorbia plants and regarding the discovery of a marine natural product, revealing biological insights at the compound class level.
Article
Microbial and plant specialized metabolites constitute an immense chemical diversity, and play key roles in mediating ecological interactions between organisms. Also referred to as natural products, they have been widely applied in medicine, agriculture, cosmetic and food industries. Traditionally, the main discovery strategies have centered around the use of activity-guided fractionation of metabolite extracts. Increasingly, omics data is being used to complement this, as it has the potential to reduce rediscovery rates, guide experimental work towards the most promising metabolites, and identify enzymatic pathways that enable their biosynthetic production. In recent years, genomic and metabolomic analyses of specialized metabolic diversity have been scaled up to study thousands of samples simultaneously. Here, we survey data analysis technologies that facilitate the effective exploration of large genomic and metabolomic datasets, and discuss various emerging strategies to integrate these two types of omics data in order to further accelerate discovery.
Article
This review is an updated and expanded version of the five prior reviews that were published in this journal in 1997, 2003, 2007, 2012, and 2016. For all approved therapeutic agents, the time frame has been extended to cover the almost 39 years from the first of January 1981 to the 30th of September 2019 for all diseases worldwide and from ∼1946 (earliest so far identified) to the 30th of September 2019 for all approved antitumor drugs worldwide. As in earlier reviews, only the first approval of any drug is counted, irrespective of how many “biosimilars” or added approvals were subsequently identified. As in the 2012 and 2016 reviews, we have continued to utilize our secondary subdivision of a “natural product mimic”, or “NM”, to join the original primary divisions, and the designation “natural product botanical”, or “NB”, to cover those botanical “defined mixtures” now recognized as drug entities by the FDA (and similar organizations). From the data presented in this review, the utilization of natural products and/or synthetic variations using their novel structures, in order to discover and develop the final drug entity, is still alive and well. For example, in the area of cancer, over the time frame from 1946 to 1980, of the 75 small molecules, 40, or 53.3%, are N or ND. In the 1981 to date time frame the equivalent figures for the N* compounds of the 185 small molecules are 62, or 33.5%, though to these can be added the 58 S* and S*/NMs, bringing the figure to 64.9%. In other areas, the influence of natural product structures is quite marked with, as expected from prior information, the anti-infective area being dependent on natural products and their structures, though as can be seen in the review there are still disease areas (shown in Table 2) for which there are no drugs derived from natural products. 
Although combinatorial chemistry techniques have succeeded as methods of optimizing structures and have been used very successfully in the optimization of many recently approved agents, we are still able to identify only two de novo combinatorial compounds (one of which is a little speculative) approved as drugs in this 39-year time frame, though there is also one drug that was developed using the “fragment-binding methodology” and approved in 2012. We have also added a discussion of candidate drug entities currently in clinical trials as “warheads” and some very interesting preliminary reports on sources of novel antibiotics from Nature due to the absolute requirement for new agents to combat plasmid-borne resistance genes now in the general populace. We continue to draw the attention of readers to the recognition that a significant number of natural product drugs/leads are actually produced by microbes and/or microbial interactions with the “host from whence it was isolated”; thus we consider that this area of natural product research should be expanded significantly.
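The cancer-area percentages above follow directly from the quoted counts. The short check below reproduces them and makes explicit that the 64.9% figure adds the 58 S* and S*/NM compounds to the 62 N* compounds before dividing by the 185 small molecules. The helper name `share` is illustrative.

```python
def share(count, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)

# Counts quoted in the abstract (anticancer small molecules).
pre_1981 = share(40, 75)                     # N or ND, 1946-1980
post_1981_natural = share(62, 185)           # N* compounds, 1981 onward
post_1981_with_mimics = share(62 + 58, 185)  # N* plus S* and S*/NM
print(pre_1981, post_1981_natural, post_1981_with_mimics)  # 53.3 33.5 64.9
```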
Article
Significance: Natural products form the basis for most drugs in clinical use. Advances in genome sequencing and bioinformatic tools have revealed thousands of biosynthetic gene clusters encoding these products. However, linking natural products identified by genome mining to their corresponding products in untargeted metabolomics data remains a key challenge. Here we present a platform, DeepRiPP, which integrates genomic and metabolomic data to automate the discovery of new ribosomally synthesized and posttranslationally modified peptides (RiPPs), a subclass of natural products with diverse chemistry and activities. We apply DeepRiPP to discover three novel RiPPs.
Article
Genome mining has become a key technology to exploit natural product diversity. Although initially performed on a single-genome basis, the process is now being scaled up to mine entire genera, strain collections and microbiomes. However, no bioinformatic framework is currently available for effectively analyzing datasets of this size and complexity. In the present study, a streamlined computational workflow is provided, consisting of two new software tools: the ‘biosynthetic gene similarity clustering and prospecting engine’ (BiG-SCAPE), which facilitates fast and interactive sequence similarity network analysis of biosynthetic gene clusters and gene cluster families; and the ‘core analysis of syntenic orthologues to prioritize natural product gene clusters’ (CORASON), which elucidates phylogenetic relationships within and across these families. BiG-SCAPE is validated by correlating its output to metabolomic data across 363 actinobacterial strains and the discovery potential of CORASON is demonstrated by comprehensively mapping biosynthetic diversity across a range of detoxin/rimosamide-related gene cluster families, culminating in the characterization of seven detoxin analogues. Two bioinformatic tools, BiG-SCAPE and CORASON, enable sequence similarity network and phylogenetic analysis of gene clusters and their families across hundreds of strains and in large datasets, leading to the discovery of new natural products.
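The core of a sequence similarity network analysis can be sketched as computing a pairwise distance between BGCs, linking pairs below a cut-off, and reading off families. BiG-SCAPE combines several distance components and, as we understand it, defines families with affinity propagation. The sketch below swaps in only a Jaccard distance over Pfam domain sets and simple connected components. The Pfam lists and BGC names are illustrative, and the 0.3 cut-off is an assumption.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| over sets of protein-domain identifiers."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def gene_cluster_families(bgcs, cutoff=0.3):
    """Link BGC pairs whose distance is <= cutoff, then return the connected
    components (found with a small union-find) as families."""
    parent = {name: name for name in bgcs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (n1, d1), (n2, d2) in combinations(bgcs.items(), 2):
        if jaccard_distance(d1, d2) <= cutoff:
            parent[find(n1)] = find(n2)

    families = {}
    for name in bgcs:
        families.setdefault(find(name), set()).add(name)
    return sorted(families.values(), key=lambda fam: sorted(fam))

# Illustrative domain content: two polyketide-like BGCs and one NRPS-like BGC.
bgcs = {
    "bgc1": {"PF00109", "PF02801", "PF00550"},
    "bgc2": {"PF00109", "PF02801", "PF00550", "PF08659"},
    "bgc3": {"PF00501", "PF00668"},
}
print(gene_cluster_families(bgcs))
```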
Article
The current need for novel antibiotics is especially acute for drug-resistant Gram-negative pathogens [1,2]. These microorganisms have a highly restrictive permeability barrier, which limits the penetration of most compounds [3,4]. As a result, the last class of antibiotics acting against Gram-negative bacteria was developed in the 1960s [2]. We reason that useful compounds can be found in bacteria that share similar requirements for antibiotics with humans, and focus on Photorhabdus symbionts of entomopathogenic nematode microbiomes. Here we report a new antibiotic that we name darobactin, from a screen of Photorhabdus isolates. Darobactin is coded by a silent operon with little production under laboratory conditions, and is ribosomally synthesized. Darobactin has an unusual structure with two fused rings that form post-translationally. The compound is active against important Gram-negative pathogens both in vitro and in animal models of infection. Mutants resistant to darobactin map to BamA, an essential chaperone and translocator that folds outer membrane proteins. Our study suggests that bacterial symbionts of animals contain antibiotics that are particularly suitable for development into therapeutics.
Article
Significant progress has been made in the past few years on the computational identification of biosynthetic gene clusters (BGCs) that encode ribosomally synthesized and post-translationally modified peptides (RiPPs). This is done by identifying both RiPP tailoring enzymes (RTEs) and RiPP precursor peptides (PPs). However, identification of PPs, particularly for novel RiPP classes, remains challenging. To address this, machine learning has been used to accurately identify PP sequences. Current machine learning tools have limitations, since they are specific to the RiPP class they are trained for and are context-dependent, requiring information about the surrounding genetic environment of the putative PP sequences. NeuRiPP overcomes these limitations. It does this by leveraging the rich data set of high-confidence putative PP sequences from existing programs, along with experimentally verified PPs from RiPP databases. NeuRiPP uses neural network architectures that are suitable for peptide classification, with weights trained on PP data sets. It is able to identify known PP sequences, and sequences that are likely PPs. When tested on existing RiPP BGC data sets, NeuRiPP was able to identify PP sequences in significantly more putative RiPP clusters than current tools while maintaining the same HMM hit accuracy. Finally, NeuRiPP was able to successfully identify PP sequences from novel RiPP classes that were recently characterized experimentally, highlighting its utility in complementing existing bioinformatics tools.
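Neural peptide classifiers typically take a fixed-size numeric encoding of each sequence as input. The abstract does not describe NeuRiPP's encoding, so the sketch below shows one common choice, a zero-padded one-hot matrix, as an assumption rather than NeuRiPP's actual input. The maximum length of 120 residues is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(peptide, max_len=120):
    """Encode a peptide as a max_len x 20 one-hot matrix (list of lists).

    Longer peptides are truncated and shorter ones are zero-padded;
    non-standard residues (e.g. X) become all-zero rows.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(peptide.upper()[:max_len]):
        idx = INDEX.get(aa)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix

enc = one_hot("MKAYW", max_len=8)
print(len(enc), sum(map(sum, enc)))
```

Fixing the length lets peptides of different sizes be stacked into one batch. The truncation cut-off should exceed the longest precursor of interest.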
Article
Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The increase in full microbial genomes and similar resources has led to development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represents a major addition to in-silico BGC identification.
Article
Metabolomics has started to embrace computational approaches for chemical interpretation of large data sets. Yet, metabolite annotation remains a key challenge. Recently, molecular networking and MS2LDA emerged as molecular mining tools that find molecular families and substructures in mass spectrometry fragmentation data. Moreover, in silico annotation tools obtain and rank candidate molecules for fragmentation spectra. Ideally, all structural information obtained and inferred from these computational tools could be combined to increase the resulting chemical insight one can obtain from a data set. However, integration is currently hampered as each tool has its own output format and efficient matching of data across these tools is lacking. Here, we introduce MolNetEnhancer, a workflow that combines the outputs from molecular networking, MS2LDA, in silico annotation tools (such as Network Annotation Propagation or DEREPLICATOR), and the automated chemical classification through ClassyFire to provide a more comprehensive chemical overview of metabolomics data whilst at the same time illuminating structural details for each fragmentation spectrum. We present examples from four plant and bacterial case studies and show how MolNetEnhancer enables the chemical annotation, visualization, and discovery of the subtle substructural diversity within molecular families. We conclude that MolNetEnhancer is a useful tool that greatly assists the metabolomics researcher in deciphering the metabolome through combination of multiple independent in silico pipelines.
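The combination step described for MolNetEnhancer can be pictured as a join on spectrum IDs followed by a per-family summary. The sketch below assigns each molecular family its most frequent chemical class, plus the fraction of annotated spectra supporting that class. It is a hypothetical illustration of the idea, not MolNetEnhancer's code, and the spectrum IDs and class labels are invented.

```python
from collections import Counter

def family_consensus(family_of, class_of):
    """For each molecular family, return (most frequent chemical class,
    fraction of the family's annotated spectra that carry that class).

    family_of: dict spectrum_id -> molecular family (network component) id
    class_of:  dict spectrum_id -> chemical class (e.g. a ClassyFire term)
    """
    per_family = {}
    for spec, fam in family_of.items():
        if spec in class_of:
            per_family.setdefault(fam, Counter())[class_of[spec]] += 1
    result = {}
    for fam, counts in per_family.items():
        cls, n = counts.most_common(1)[0]
        result[fam] = (cls, n / sum(counts.values()))
    return result

family_of = {"s1": 1, "s2": 1, "s3": 1, "s4": 2}
class_of = {"s1": "Cyclic peptides", "s2": "Cyclic peptides",
            "s3": "Lipids", "s4": "Flavonoids"}
print(family_consensus(family_of, class_of))
```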
Article
Darobactins bind to the novel antibiotic target BamA in the outer membrane BAM‐complex of Gram‐negative pathogens. The renewal of the outer membrane proteins is prevented and bacteria are lysed. The new derivative D22 was created through cryo‐EM structure‐ and activity‐guided biosynthetic engineering as described by Thomas Marlovits, Jennifer Herrmann, Rolf Müller et al. in their Research Article (DOI: 10.1002/ange.202214094), showing superior potency against multi‐drug resistant clinical isolates, for which most antibiotics on the market are no longer effective.
Article
Biomacromolecules are known to feature complex three‐dimensional shapes that are essential for their function. Among natural products, ambiguous molecular shapes are a rare phenomenon. The hexapeptide tryptorubin A can adopt one of two unusual atropisomeric configurations. Initially hypothesized to be a non‐ribosomal peptide, we show that tryptorubin A is the first characterized member of a new family of ribosomally synthesized and posttranslationally modified peptides (RiPPs) that we named atropopeptides. The sole modifying enzyme encoded in the gene cluster, a cytochrome P450 monooxygenase, is responsible for the atropospecific formation of one carbon–carbon and two carbon–nitrogen bonds. The characterization of two additional atropopeptide biosynthetic pathways revealed a two‐step maturation process. Atropopeptides promote pro‐angiogenic cell functions as indicated by an increase in endothelial cell proliferation and undirected migration. Our study expands the biochemical space of RiPP‐modifying enzymes and paves the way towards the chemoenzymatic utilization of atropopeptide‐modifying P450s.
Article
Organisms in nature have evolved into proficient synthetic chemists, utilizing specialized enzymatic machinery to biosynthesize an inspiring diversity of secondary metabolites. Often serving to boost competitive advantage for their producers, these secondary metabolites have widespread human impacts as antibiotics, anti-inflammatories, and antifungal drugs. The natural products discovery field has begun a shift away from traditional activity-guided approaches and is beginning to take advantage of increasingly available metabolomics and genomics datasets to explore undiscovered chemical space. Major strides have been made and now enable -omics-informed prioritization of chemical structures for discovery, including the prospect of confidently linking metabolites to their biosynthetic pathways. Over the last decade, increasingly integrated strategies have provided researchers with pipelines for the simultaneous identification of expressed secondary metabolites and their biosynthetic machinery. However, continuous collaboration by the natural products community will be required to optimize strategies for effective evaluation of natural product biosynthetic gene clusters to accelerate discovery efforts. Here, we provide an evaluative guide to scientific literature as it relates to studying natural product biosynthesis using genomics, metabolomics, and their integrated datasets. Particular emphasis is placed on the unique insights that can be gained from large-scale integrated strategies, and we provide source organism-specific considerations to evaluate the gaps in our current knowledge.
Article
All organisms produce specialized organic molecules, ranging from small volatile chemicals to large gene-encoded peptides, that have evolved to provide them with diverse cellular and ecological functions. As natural products, they are broadly applied in medicine, agriculture and nutrition. The rapid accumulation of genomic information has revealed that the metabolic capacity of virtually all organisms is vastly underappreciated. Pioneered mainly in bacteria and fungi, genome mining technologies are accelerating metabolite discovery. Recent efforts are now being expanded to all life forms, including protists, plants and animals, and new integrative omics technologies are enabling the increasingly effective mining of this molecular diversity.
Article
Microbial natural products impress by their bioactivity, structural diversity, and ingenious biosynthesis. While screening the less exploited actinobacterial genus Planomonospora, two cyclopeptides were discovered, featuring an unusual Tyr-His biaryl bridge across a tripeptide scaffold, with the sequences N-acetyl-Tyr-Tyr-His and N-acetyl-Tyr-Phe-His. Planomonospora genomes pointed toward a ribosomal synthesis of the cyclopeptide from a pentapeptide precursor encoded by the 18-bp gene bytA, to our knowledge the smallest coding gene ever reported. Closely linked to bytA is bytO, encoding a cytochrome P450 monooxygenase likely responsible for installing the biaryl bond. In Streptomyces, the bytAO segment was sufficient to direct production of the crosslinked N-acetylated Tyr-Tyr-His tripeptide. Bioinformatic analysis of related cytochrome P450 monooxygenases indicated that they constitute a widespread family of enzymes, and the corresponding genes are closely linked to 5-amino-acid coding sequences in approximately 200 (actino)bacterial genomes, all with potential for biaryl linkage between amino acids 1 and 3. We propose the name biarylitides for this family of RiPPs.
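An 18-bp gene is five sense codons plus a stop codon. To illustrate how small such a coding sequence is, the sketch below scans one DNA strand for open reading frames that encode exactly five residues. The set of start codons, the forward-strand-only scan, and the test sequence are assumptions. A real search would also scan the reverse complement and, as the abstract describes, check linkage to a P450 gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_tiny_orfs(seq, n_codons=5, starts=("ATG", "GTG", "TTG")):
    """Find forward-strand ORFs encoding exactly n_codons residues (start codon
    included) followed by a stop codon, with no internal stop.

    Returns (start_index, orf_sequence) tuples; each ORF is
    3 * (n_codons + 1) bp long, i.e. 18 bp for a pentapeptide.
    """
    seq = seq.upper()
    length = 3 * (n_codons + 1)
    hits = []
    for i in range(len(seq) - length + 1):
        orf = seq[i:i + length]
        codons = [orf[j:j + 3] for j in range(0, length, 3)]
        if (codons[0] in starts and codons[-1] in STOP_CODONS
                and not any(c in STOP_CODONS for c in codons[1:-1])):
            hits.append((i, orf))
    return hits

# Hypothetical sequence containing one 18-bp ORF (ATG TAT TAT CAT TGG TAA).
print(find_tiny_orfs("GC" + "ATGTATTATCATTGGTAA" + "GC"))
```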
Article
Covering: up to June 2020

Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a large group of natural products. A community-driven review in 2013 described the emerging commonalities in the biosynthesis of RiPPs and the opportunities they offered for bioengineering and genome mining. Since then, the field has seen tremendous advances in understanding the mechanisms by which nature assembles these compounds, in engineering their biosynthetic machinery for a wide range of applications, and in discovering entirely new RiPP families using bioinformatic tools developed specifically for this compound class. The First International Conference on RiPPs was held in 2019, and the meeting participants assembled the current review describing new developments since 2013. The review discusses the new classes of RiPPs that have been discovered, advances in our understanding of how both primary and secondary post-translational modifications are installed, and the mechanisms by which the enzymes recognize the leader peptides in their substrates. In addition, genome mining tools used for RiPP discovery are discussed, as well as various strategies for RiPP engineering. An outlook section presents directions for future research.
Article
The toxicity of an extract of the cyanobacterium Microcystis aeruginosa EAWAG 127a was evaluated against the sensitive grazer Thamnocephalus platyurus, and the extract showed strong activity. To investigate the compounds responsible for this toxicity, a series of peptides from this cyanobacterium were studied using a combined genomic and molecular networking approach. This led to the isolation, structure elucidation, and biological evaluation of microviridin 1777, the most potent chymotrypsin inhibitor characterized from this family of peptides to date. Furthermore, the biosynthetic gene clusters of microviridin, anabaenopeptin, aeruginosin, and piricyclamide were located in the producing organism, and six additional natural products were identified by tandem mass spectrometry analyses. These results highlight the potential of modern techniques for the identification of natural products, demonstrate the ecological role of protease inhibitors produced by cyanobacteria, and raise questions about the presence of novel, as yet uncharacterized toxin families in cyanobacteria beyond microcystin.
Article
Microbial secondary metabolism is a reservoir of bioactive compounds of immense biotechnological and biomedical potential. The biosynthetic machinery responsible for the production of these secondary metabolites (SMs), also called natural products, is often encoded by colocated groups of genes called biosynthetic gene clusters (BGCs). High-throughput genome sequencing of both isolates and metagenomic samples, combined with the development of specialized computational workflows, is enabling systematic identification of BGCs and the discovery of novel SMs. To advance exploration of microbial secondary metabolism and its diversity, we developed the largest publicly available database of predicted BGCs combined with experimentally verified BGCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc-public). Here we describe the first major content update of the IMG-ABC knowledgebase since its initial release in 2015, which refreshes the BGC prediction pipeline with the latest version of antiSMASH (v5) and presents the data in the context of underlying environmental metadata sourced from GOLD (https://gold.jgi.doe.gov/). This update has greatly improved the quality and expanded the types of predicted BGCs compared with the previous version.
Article
Ribosomally synthesized and post-translationally modified peptides (RiPPs) are an important class of natural products that includes antibiotics and a variety of other bioactive compounds. Existing methods for discovering RiPPs by combining genome mining and computational mass spectrometry are limited to specific classes of RiPPs and small datasets, and they fail to handle unknown post-translational modifications. Here, we present MetaMiner, a software tool that addresses these challenges and is compatible with large-scale screening platforms for natural product discovery. After searching millions of spectra in the Global Natural Products Social (GNPS) molecular networking infrastructure against just eight genomic and metagenomic datasets, MetaMiner discovered 31 known and seven unknown RiPPs from diverse microbial communities, including the human and lichen microbiomes, and from microorganisms isolated from the International Space Station.
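The abstract does not describe MetaMiner's scoring, so the Python sketch below illustrates only the general idea behind searching spectra against genome-predicted peptides, not MetaMiner's algorithm. It computes the theoretical [M+H]+ of a core peptide, optionally shifted by user-supplied post-translational modification masses, and keeps candidates whose mass lies within a ppm tolerance of an observed precursor m/z. The candidate list, the 10 ppm tolerance, the singly charged ion assumption, and all function names are hypothetical; a real tool would also score MS/MS fragment ions.

```python
from typing import List, Optional, Sequence, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once for a linear peptide with free termini
PROTON = 1.00728   # charge carrier of the singly charged [M+H]+ ion


def protonated_mass(core: str, mass_shifts: Optional[Sequence[float]] = None) -> float:
    """Theoretical [M+H]+ of a linear core peptide plus optional PTM mass shifts."""
    neutral = sum(RESIDUE_MASS[aa] for aa in core) + WATER
    neutral += sum(mass_shifts or ())
    return neutral + PROTON


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_candidates(
    precursor_mz: float,
    candidates: Sequence[Tuple[str, str, Sequence[float]]],
    tol_ppm: float = 10.0,
) -> List[Tuple[str, float]]:
    """Return (name, ppm error) for candidates within tol_ppm of the precursor."""
    hits = []
    for name, core, shifts in candidates:
        err = ppm_error(precursor_mz, protonated_mass(core, shifts))
        if abs(err) <= tol_ppm:
            hits.append((name, err))
    return hits


# Hypothetical candidates: one core peptide, unmodified or with N-acetylation
# (+42.01057 Da) and a biaryl crosslink (loss of H2, -2.01565 Da).
candidates = [
    ("YYH, unmodified", "YYH", []),
    ("YYH, acetylated + crosslinked", "YYH", [42.01057, -2.01565]),
]
observed = protonated_mass("YYH", [42.01057, -2.01565]) + 0.002  # hypothetical m/z
print(match_candidates(observed, candidates))
```

Passing modifications as explicit mass shifts keeps the matcher agnostic to chemistry; handling truly unknown modifications, as the abstract describes, would require searching over unspecified shifts rather than a fixed list.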