Christoph Steinbeck

Friedrich Schiller University Jena | FSU · Department of Inorganic and Analytical Chemistry

Professor

About

326
Publications
108,471
Reads
13,152
Citations
Introduction
Christoph Steinbeck is Professor of Analytical Chemistry, Cheminformatics and Chemometrics at Friedrich Schiller University Jena, Germany. His research interests are the computer-assisted structure elucidation of natural products and computational metabolomics. Over the course of his career, he was the founding editor-in-chief of the Journal of Cheminformatics, a director of the Metabolomics Society, and chairman of the Computers-Information-Chemistry (CIC) division of the German Chemical Society, and he established the German Conference on Cheminformatics. He is a lifetime member of the World Association of Theoretically Oriented Chemists (WATOC), a member of the Metabolomics Society and the German Chemical Society, and a member of various editorial boards and committees.
Additional affiliations
January 2008 - present
EMBL-EBI
Position
  • Head of Cheminformatics and Metabolism
January 2008 - present
European Bioinformatics Institute (EMBL-EBI)
Position
  • EMBL-EBI
January 2004 - December 2007
University of Cologne
Education
November 1992 - December 1995
University of Bonn
Field of study
  • Natural Products, Cheminformatics
October 1986 - October 1992
University of Bonn
Field of study
  • Chemistry

Publications (326)
Article
Full-text available
The COCONUT (COlleCtion of Open Natural prodUcTs) database was launched in 2021 as an aggregation of openly available natural product datasets and has been one of the biggest open natural product databases since. Apart from the chemical structures of natural products, COCONUT contains information about names and synonyms, species and organism parts...
Poster
Full-text available
Developing computational algorithms for extracting specific substructures from molecular graphs (in silico molecule fragmentation) involves repeated sequences of implementing a rule set, applying it to relevant structures, inspecting the results, and adjusting the algorithm. The open MORTAR (MOlecule fRagmenTAtion fRamework) Java rich client applic...
Preprint
Full-text available
The COCONUT (COlleCtion of Open Natural prodUcTs) database was launched in 2021 as an aggregation of openly available natural product datasets and has been one of the biggest open natural product databases since. Apart from the chemical structures of natural products, COCONUT contains information about names and synonyms, species, and organism part...
Article
Full-text available
An automated pipeline for comprehensive calculation of intermolecular interaction energies based on molecular force-fields using the Tinker molecular modelling package is presented. Starting with non-optimized chemically intuitive monomer structures, the pipeline allows the approximation of global minimum energy monomers and dimers, configuration s...
Article
Full-text available
Accurate recognition of hand-drawn chemical structures is crucial for digitising hand-written chemical information in traditional laboratory notebooks or facilitating stylus-based structure entry on tablets or smartphones. However, the inherent variability in hand-drawn structures poses challenges for existing Optical Chemical Structure Recognition...
Article
Full-text available
The progress of the DFG-funded NFDI4Chem consortium (NFDI 4/1 - project number 441958208) in data management in chemistry is outlined in our latest report, highlighting the steps we have taken to integrate a data-centric approach within the chemistry community. This interim report offers a comprehensive overview of our data management activities, c...
Preprint
Accurate recognition of hand-drawn chemical structures is crucial for digitising hand-written chemical information found in traditional laboratory notebooks or for facilitating stylus-based structure entry on tablets or smartphones. However, the inherent variability in hand-drawn structures poses challenges for existing Optical Chemical Structure R...
Article
Full-text available
Scientific workflows facilitate the automation of data analysis tasks by integrating various software and tools executed in a particular order. To enable transparency and reusability in workflows, it is essential to implement the FAIR principles. Here, we describe our experiences implementing the FAIR principles for metabolomics workflows using the...
Preprint
An automated pipeline for comprehensive calculation of intermolecular interaction energies based on molecular force-fields using the Tinker molecular modelling package is presented. Starting with non-optimized chemically intuitive monomer structures, the pipeline allows the approximation of global minimum energy monomers and dimers, configuration s...
Article
Full-text available
Microbial communities thrive through interactions and communication, which are challenging to study as most microorganisms are not cultivable. To address this challenge, researchers focus on the extracellular space where communication events occur. Exometabolomics and interactome analysis provide insights into the molecules involved in communicatio...
Preprint
Full-text available
In marine ecosystems, microbial communities often interact using specialised metabolites, which play a central role in shaping the dynamics of the ecological networks and maintaining the balance of the ecosystem. With metabolomics and transcriptomics analyses, this study explores the interactions between two marine microalgae, Skeletonema marinoi a...
Article
In October 2003, 20 years ago, the open‐source and open‐content database NMRshiftDB was announced. Since then, the database, later renamed nmrshiftdb2, has been continuously available and is one of the longer‐running projects in the field of open data in chemistry. After 20 years, we evaluate the success of the project and present lessons learnt...
Article
Full-text available
Diatoms (Bacillariophyceae) are aquatic photosynthetic microalgae with an ecological role as primary producers in the aquatic food web. They account substantially for global carbon, nitrogen, and silicon cycling. Elucidating the chemical space of diatoms is crucial to understanding their physiology and ecology. To expand the known chemical space of...
Article
Full-text available
In recent years, cheminformatics has experienced significant advancements through the development of new open-source software tools based on various cheminformatics programming toolkits. However, adopting these toolkits presents challenges, including proper installation, setup, deployment, and compatibility management. In this work, we present the...
Preprint
Full-text available
In recent years, cheminformatics has experienced significant advancements through the development of new open-source software tools based on various cheminformatics programming toolkits. However, adopting these toolkits presents challenges, including proper installation, setup, deployment, and compatibility management. In this work, we present the...
Preprint
Full-text available
Scientific workflows facilitate the automation of different data analysis tasks by integrating various software and tools executed in a particular order. To enable transparency, accessibility, and reusability in workflows, it is essential to implement the 17 FAIR principles as much as possible. To do so, the research data management community has s...
Article
Full-text available
The collection of metadata for research data is an important aspect of the FAIR principles. The schema.org and Bioschemas initiatives created a vocabulary to embed markup for many different types, including BioChemEntity, ChemicalSubstance, Gene, MolecularEntity, Protein, and others relevant in the Natural and Life Sciences with immediate benefits...
Article
Full-text available
The Chemistry consortium NFDI4Chem aims to digitalise key steps in chemical research, supporting scientists in managing research data throughout its life cycle. The SmartLab, embedded in a federation of services, integrates various tools such as electronic lab notebooks, data repositories, and search services, to create a smart lab environment for...
Preprint
Full-text available
Diatoms (Bacillariophyceae) are aquatic photosynthetic microalgae with an ecological role as primary producers in the aquatic food web. They account substantially for global carbon, nitrogen, and silicon cycling. Elucidating the chemical space of diatoms is crucial to understanding their physiology and ecology. To expand the known chemical space of...
Article
Full-text available
The number of publications describing chemical structures has increased steadily over the last decades. However, the majority of published chemical information is currently not available in machine-readable form in public databases. It remains a challenge to automate the process of information extraction in a way that requires less manual intervent...
Thesis
Full-text available
Computational methodologies extracting specific substructures like functional groups or molecular scaffolds from input molecules can be grouped under the term “in silico molecule fragmentation”. They can be used to investigate what specifically characterises a heterogeneous compound class, like pharmaceuticals or Natural Products (NP) and in which...
Preprint
Full-text available
In recent years, cheminformatics has experienced significant advancements through the development of new open-source software tools based on various cheminformatics programming toolkits. However, adopting these toolkits presents challenges, including proper installation, setup, deployment, and compatibility management. In this work, we present Chem...
Poster
Full-text available
MORTAR (MOlecule fRagmenTAtion fRamework) [1] is an open-source client application designed to facilitate molecular fragmentation and substructure analysis workflows. No programming skills are required to perform in silico fragmentation studies with MORTAR, as its graphical features allow the visualisation of fragmentation results for individual co...
Poster
Full-text available
Bottom-up variants of Dissipative Particle Dynamics (DPD), where particles can be defined as small molecules with a molecular weight on the order of 100 Daltons, allow the study of large (bio)molecular systems and supramolecular phenomena on the nanometre length and microsecond time scale. The conservative interaction between two DPD particles i an...
Article
Full-text available
Mapping the chemical space of compounds to chemical structures remains a challenge in metabolomics. Despite the advancements in untargeted liquid chromatography-mass spectrometry (LC–MS) to achieve a high-throughput profile of metabolites from complex biological resources, only a small fraction of these metabolites can be annotated with confidence....
Article
Full-text available
The influence of molecular fragmentation and parameter settings on a mesoscopic dissipative particle dynamics (DPD) simulation of lamellar bilayer formation for a C10E4/water mixture is studied. A “bottom-up” decomposition of C10E4 into the smallest fragment molecules (particles) that satisfy chemical intuition leads to convincing simulation result...
Preprint
Full-text available
The number of publications describing chemical structures has increased steadily over the last decades. However, the majority of published chemical information is currently not available in machine-readable form in public databases. It remains a challenge to automate the process of information extraction in a way that requires less manual intervent...
Article
Full-text available
Recent years have seen a sharp increase in the development of deep learning and artificial intelligence-based molecular informatics. There has been a growing interest in applying deep learning to several subfields, including the digital transformation of synthetic chemistry, extraction of chemical information from the scientific literature, and AI...
Article
Full-text available
The structure elucidation of small organic molecules (<1500 Dalton) through 1D and 2D nuclear magnetic resonance (NMR) data analysis is a potentially challenging, combinatorial problem. This publication presents Sherlock, a free and open-source Computer-Assisted Structure Elucidation (CASE) software where the user controls the chain of elementary o...
Article
Full-text available
Research data provide evidence for the validation of scientific hypotheses in most areas of science. Open access to them is the basis for true peer review of scientific results and publications. Hence, research data are at the heart of the scientific method as a whole. The value of openly sharing research data has by now been recognized by scientis...
Preprint
Full-text available
Recent years have seen a sharp increase in the development of deep learning and artificial intelligence-based molecular informatics. There has been a growing interest in applying deep learning to several subfields, including the digital transformation of synthetic chemistry, extraction of chemical information from the scientific literature, and AI...
Article
Full-text available
Developing and implementing computational algorithms for the extraction of specific substructures from molecular graphs (in silico molecule fragmentation) is an iterative process. It involves repeated sequences of implementing a rule set, applying it to relevant structural data, checking the results, and adjusting the rules. This requires a computa...
Article
Full-text available
Homologous series are groups of related compounds that share the same core structure attached to a motif that repeats to different degrees. Compounds forming homologous series are of interest in multiple domains, including natural products, environmental chemistry, and drug design. However, many homologous compounds remain unannotated as such in co...
Preprint
Full-text available
Developing and implementing computational algorithms for the extraction of specific substructures from molecular graphs (in silico molecule fragmentation) is an iterative process. It involves repeated sequences of implementing a rule set, applying it to relevant structural data, checking the results, and adjusting the rules. This requires a computa...
Article
Full-text available
The concept of molecular scaffolds as defining core structures of organic molecules is utilised in many areas of chemistry and cheminformatics, e.g. drug design, chemical classification, or the analysis of high-throughput screening data. Here, we present Scaffold Generator, a comprehensive open library for the generation, handling, and display of m...
Article
Full-text available
Research data management (RDM) is needed to assist experimental advances and data collection in the chemical sciences. Many funders require RDM because experiments are often paid for by taxpayers and the resulting data should be deposited sustainably for posterity. However, paper notebooks are still common in laboratories and research data is often...
Article
Full-text available
Research data management (RDM) is required to promote scientific progress and the collection of data. Many funders require RDM because research is often financed by taxpayers' money and the resulting data should be deposited sustainably for future generations. In today's laboratories, paper lab notebooks...
Preprint
Full-text available
The concept of molecular scaffolds as defining core structures of organic molecules is utilised in many areas of chemistry and cheminformatics, e.g. drug design, chemical classification, or the analysis of high-throughput screening data. Here, we present Scaffold Generator, a comprehensive open library for the generation, handling, and display of m...
Preprint
Full-text available
Recent years have seen a sharp increase in the development of deep learning and artificial intelligence-based molecular informatics. The success of AlphaFold led to a growing interest in applying deep learning to a number of subfields, including the digital transformation of synthetic chemistry, extraction of chemical information from the scientifi...
Preprint
Full-text available
Mapping the chemical space of compounds to chemical structures remains a challenge in metabolomics. Despite the advancements in untargeted Liquid Chromatography Mass Spectrometry (LCMS) to achieve a high-throughput profile of metabolites from complex biological resources, only a small fraction of these metabolites can be annotated with confidence....
Preprint
Full-text available
Developing and implementing computational algorithms for the extraction of specific substructures from molecular graphs (in silico molecule fragmentation) is an iterative process. It involves repeated sequences of implementing a rule set, applying it to relevant structural data, checking the results, and adjusting the rules. This requires a computa...
Preprint
Full-text available
Homologous series are groups of related compounds that share the same core structure attached to a motif that repeats to different degrees. Compounds forming homologous series are of interest in multiple domains, including natural products, environmental chemistry, and drug design. However, many homologous compounds remain unannotated as such in co...
Preprint
Full-text available
The increasing volumes of data produced by high-throughput instruments coupled with advanced computational infrastructures for scientific computing have enabled what is often called a "Fourth Paradigm" for scientific research based on the exploration of large datasets. Current scientific research is often interdisciplinary, making data integrat...
Article
Full-text available
Different charge treatment approaches are examined for cyclotide-induced plasma membrane disruption by lipid extraction studied with dissipative particle dynamics. A pure Coulomb approach with truncated forces tuned to avoid individual strong ion pairing still reveals hidden statistical pairing effects that may lead to artificial membrane stabiliza...
Article
Full-text available
The development of deep learning-based optical chemical structure recognition (OCSR) systems has led to a need for datasets of chemical structure depictions. The diversity of the features in the training data is an important factor for the generation of deep learning systems that generalise well and are not overfit to a specific type of input. In t...
Article
Full-text available
The translation of images of chemical structures into machine-readable representations of the depicted molecules is known as optical chemical structure recognition (OCSR). There has been a lot of progress over the last three decades in this field, but the development of systems for the recognition of complex hand-drawn structure depictions is still...
Article
Full-text available
Contemporary bioinformatic and chemoinformatic capabilities hold promise to reshape knowledge management, analysis and interpretation of data in natural products research. Currently, reliance on a disparate set of non-standardized, insular, and specialized databases presents a series of challenges for data access, both within the discipline and for...
Poster
Full-text available
Glycosidic moieties are a common feature in natural product (NP) structures. They have been detected in 12 % of structures in the open NP database COCONUT [1]. While sugar units can be important for NP pharmacokinetic activities in some cases, they can also obstruct the analysis of the aglycone (molecule core without the glycoside) in cheminformati...
Poster
Full-text available
With MORTAR (MOlecule fRagmenTAtion fRamework) we present an open software project that supports workflows for molecular fragmentation and substructure analysis. MORTAR offers graphical functions for visualising the fragmentation results of individual compounds or entire compound sets. With several different views and analysis functions, MORTAR sup...
Article
Full-text available
Chemical structure generators are used in cheminformatics to produce or enumerate virtual molecules based on a set of boundary conditions. The result can then be tested for properties of interest, such as adherence to measured data or for their suitability as drugs. The starting point can be a potentially fuzzy set of fragments or a molecular formu...
Preprint
Full-text available
The translation of images of chemical structures into machine-readable representations of the depicted molecules is known as optical chemical structure recognition (OCSR). There has been a lot of progress over the last three decades in this field, but the development of systems for the recognition of complex hand-drawn structure depictions is still...
Preprint
Full-text available
The concept of molecular scaffolds as defining core structures of organic molecules is utilised in many areas of chemistry and cheminformatics, e.g. drug design, chemical classification, or the analysis of high-throughput screening data. Here, we present Scaffold Generator, a comprehensive open library for the generation, handling, and display of m...
Preprint
Full-text available
The development of deep learning-based optical chemical structure recognition (OCSR) systems has led to a need for datasets of chemical structure depictions. The diversity of the features in the training data is an important factor for the generation of deep learning systems that generalise well and are not overfit to a specific type of input. In t...
Article
Full-text available
Diatoms (Bacillariophyceae) are a major constituent of the phytoplankton and have a universally recognized ecological importance. Between 1,000 and 1,300 diatom genera have been described in the literature, but only 10 nuclear genomes have been published and made available to the public to date. Skeletonema costatum is a cosmopolitan marine diat...
Article
Full-text available
The use of molecular string representations for deep learning in chemistry has been steadily increasing in recent years. The complexity of existing string representations, and the difficulty in creating meaningful tokens from them, lead to the development of new string representations for chemical structures. In this study, the translation of chemi...
Preprint
Full-text available
Chemical structure generators are used in cheminformatics to produce or enumerate virtual molecules based on a set of boundary conditions. The result can then be tested for properties of interest, such as adherence to measured data or for their suitability as drugs. The starting point can be a potentially fuzzy set of fragments or a molecular formu...
Article
Full-text available
The open rich-client Molecule Set Comparator (MSC) application enables a versatile and fast comparison of large molecule sets with a unique inter-set molecule-to-molecule mapping obtained e.g. by molecular-recognition-oriented machine learning approaches. The molecule-to-molecule comparison is based on chemical descriptors obtained with the Chemist...
Presentation
Full-text available
With the recent explosion of information, Natural Products (NP) research critically needs efficient ways to access and share knowledge, and to prevent precious knowledge from being lost [1]. The reporting and sharing of NP occurrences in biological organisms are relevant to numerous scientific fields ranging from drug discovery to chemical ecology or chem...
Preprint
Full-text available
Chemical graph theory is a subfield of mathematical chemistry which applies classic graph theory to chemical entities and phenomena. Chemical graphs are the main data structures used to represent chemical structures in cheminformatics. Computable properties of graphs lay the foundation for (quantitative) structure activity and structure property predict...
Preprint
Full-text available
The use of molecular string representations for deep learning in chemistry has been steadily increasing in recent years. The complexity of existing string representations, and the difficulty in creating meaningful tokens from them, lead to the development of new string representations for chemical structures. In this study, the translation of chemi...
Preprint
Full-text available
The use of molecular string representations for deep learning in chemistry has been steadily increasing in recent years. The complexity of existing string representations, and the difficulty in creating meaningful tokens from them, lead to the development of new string representations for chemical structures. In this study, the translation of chemi...
Preprint
Full-text available
The use of molecular string representations for deep learning in chemistry has been steadily increasing in recent years. The complexity of existing string representations, and the difficulty in creating meaningful tokens from them, lead to the development of new string representations for chemical structures. In this study, the translation of chemi...
Preprint
The use of molecular string representations for deep learning in chemistry has been steadily increasing in recent years. The complexity of existing string representations, and the difficulty in creating meaningful tokens from them, lead to the development of new string representations for chemical structures. In this study, the translation of chemi...
Article
Full-text available
Background: The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open source community specifications and software tools for enabling discovery, exchange, and publication of metadata from experiments in the life sciences. The original ISA software suite provided a set of user-facing Java tools for creating...
Article
Full-text available
Sweet dessert watermelon (Citrullus lanatus) is one of the most important vegetable crops consumed throughout the world. The chemical composition of watermelon provides both high nutritional value and various health benefits. The present manuscript introduces a catalog of 1,679 small molecules occurring in the watermelon and their cheminformatics a...