About
Publications: 92
Reads: 40,339
Citations: 3,635
Introduction
You can find more about me at https://www.linkedin.com/in/stefan-kuhn-756bb74
Publications (92)
In this first of a two-part series, we introduce the concept of a FAIRSpec-ready spectroscopic data collection – that is, a collection of instrument data, chemical structure representations, and related digital items that is ready to be automatically or semi-automatically extracted for metadata that will allow the production of an IUPAC FAIRSpec Fi...
A method for data review in chemical sciences with a focus on data for the characterization of synthetic molecules is described. As current procedures for data curation in chemistry rely almost exclusively on manual checking or peer reviewing, a (semi-)automatic procedure for the evaluation of data assigned to molecular structures is proposed and d...
This review explores the current applications of artificial intelligence (AI) in nuclear magnetic resonance (NMR) spectroscopy, with a particular emphasis on small molecule chemistry. Applications of AI techniques, especially machine learning (ML) and deep learning (DL) in the areas of shift prediction, spectral simulations, spectral processing, st...
Cluster machines are gaining importance, for example in high-performance computing and eScience. For this, programs need to be parallelized and run with appropriate tools, typically MPI (Message Passing Interface) in a scientific context. Since writing programs for parallel computation is significantly more difficult than programming for sequential e...
Electronic health records (EHRs) are a critical tool in healthcare and capture a wide array of patient information that can inform clinical decision-making. However, the sheer volume and complexity of EHR data present challenges for healthcare providers, particularly in fast-paced environments such as intensive care units (ICUs). To address this pr...
In October 2003, 20 years ago, the open‐source and open‐content database NMRshiftDB was announced. Since then, the database, renamed as nmrshiftdb2 later, has been continuously available and is one of the longer‐running projects in the field of open data in chemistry. After 20 years, we evaluate the success of the project and present lessons learnt...
Prediction of chemical shift in NMR using machine learning methods is typically done with the maximum amount of data available to achieve the best results. In some cases, such large amounts of data are not available, e.g. for heteronuclei. We demonstrate a novel machine learning model that is able to achieve better results than other models for rel...
The detection of intrusions in computer networks, known as Network-Intrusion-Detection Systems (NIDSs), is a critical field in network security. Researchers have explored various methods to design NIDSs with improved accuracy, prevention measures, and faster anomaly identification. Safeguarding computer systems by quickly identifying external intru...
Recently, new technologies have been developed that allow physical and virtual space to converge. We reviewed a range of innovative technologies that enable immersive and 3D interaction, which we believe are of particular interest to apply Universal Design for Learning (UDL) principles in teaching and learning practices, with a particular interest...
[This corrects the article DOI: 10.1021/acsomega.9b00488.].
Prediction of chemical shift in NMR using machine learning methods is typically done with the maximum amount of data available to achieve the best results. In some cases, such large amounts of data are not available, e.g. for heteronuclei. We demonstrate a novel machine learning model which is able to achieve good results with comparatively low amo...
This paper presents a proof-of-concept method for classifying chemical compounds directly from NMR data without performing structure elucidation. This can help to reduce the time in finding good structure candidates, as in most cases matching must be done by a human engineer, or at the very least a process for matching must be meaningfully interpre...
The present article reports the creation and usage of a general natural product database for the structural dereplication of natural products. This database, acd_lotusv7, derives from the LOTUS natural products database as the sole source of chemical structures. Database construction also relies on the commercial "ACD/C + H Predictors and DB" softw...
This paper presents a proof-of-concept method for classifying chemical compounds directly from NMR data without doing structure elucidation. This can help to reduce time in finding good structure candidates, as in most cases matching must be done by a human engineer, or at the very least a process for matching must be meaningfully interpreted by on...
Introduction:
Data Fusion-based Discovery (DAFdiscovery) is a pipeline designed to help users combine mass spectrometry (MS), nuclear magnetic resonance (NMR), and bioactivity data in a notebook-based application to accelerate annotation and discovery of bioactive compounds. It applies Statistical Total Correlation Spectroscopy (STOCSY) and Statis...
The Covid-19 pandemic has led to the adoption of face masks in physical teaching spaces across the world. This has in turn presented a number of challenges for practitioners in the face-to-face delivery of content and in effectively engaging learners in practical settings, where face coverings are an ongoing requirement. Being unable to identify th...
We demonstrate that particle swarm optimisation (PSO) can be used to solve a variety of problems arising during operation of a digital inspection microscope. This is a use case for the feasibility of heuristics in a real-world product. We show solutions to four measurement problems, all based on PSO. This allows for a compact software implementatio...
Research data is an essential part of research and almost every publication in chemistry. The data itself can be valuable for reuse if sustainably deposited, annotated and archived. Thus, it is important to publish data following the FAIR principles, to make it findable, accessible, interoperable and reusable not only for humans but also in machine...
DAFdiscovery is a pipeline designed to help users combine NMR, MS and bioactivity data in a notebook-based application to accelerate annotation and discovery of bioactive compounds. It applies Statistical Total Correlation (STOCSY) and Statistical HeteroSpectroscopy (SHY) calculation in their data using an easy-to-follow Jupyter Notebook. Different...
The COVID-19 pandemic caused a shift in teaching practice towards blended learning for many higher education institutions. This led to the rapid adoption of certain digital technologies within existing teaching structures as a means to meet student access needs. This paper is an attempt to summarise and extend pre-COVID-19 pedagogical research to l...
Syzygium malaccense (L.) Merr. & L.M. Perry is a native tree to Malaysia, but also occurs in other tropical regions of the world, including Brazil. The increasing interest in the consumption of its leaves motivated the investigation of compounds of the plant. Metabolite profiling of S. malaccense leaves was achieved by high-speed countercurrent chr...
We have demonstrated in previous work that the Calculus of Covalent Bonding (CCB) can be used to simulate higher-level biochemical processes. This is significant since CCB was originally devised to model lower level organic chemical reactions. In this paper we extend the use of the calculus to model an important gene repair pathway, namely DNA Mism...
The Covid-19 pandemic caused a shift in teaching practice towards blended learning for many Higher Education institutions. This led to the rapid adoption of certain digital technologies within existing teaching structures as a means to meet student access needs and facilitate learning. Integration of these technologies caused numerous challenges fo...
Estimating accurate and reliable NMR chemical shift values, coupling patterns and constants within a reasonable timeframe remains significantly challenging, and the unavailability of reliable software strategies for the prediction of low‐field (e.g., 60 MHz) spectra from those acquired at higher operating frequencies hampers their direct compar...
Calculation of solution‐state NMR parameters, including chemical shift values and scalar coupling constants, is often a crucial step for unambiguous structure assignment. Data-driven (sometimes called “empirical”) methods leverage databases of known parameter values to estimate parameters for unknown or novel molecules. This is in contrast to popula...
This paper presents a proof of concept of a method to identify substructures in 2D NMR spectra of mixtures using a bespoke image‐based Convolutional Neural Network application. This is done using HSQC and HMBC spectra separately and in combination. The application can reliably detect substructures in pure compounds, using a simple network. Results...
Training effective simulation scenarios presents numerous challenges from a pedagogical point of view. Through application of the Conceptual Framework for e-Learning and Training (COFELET) as a pattern for designing serious games, we propose the use of the Simulated Critical Infrastructure Protection Scenarios (SCIPS) platform as a prospective tool...
Classical 1D 1H NMR spectra are prototypic for NMR spectroscopy in that they represent a wealth of chemical information encoded into convoluted graphs or patterns that contain complex features (aka multiplets), even for seemingly simple molecules. Accordingly, the utility of NMR depends on the theoretical and visual skills required to extract all t...
In this paper the reversibility of executable Interval Temporal Logic (ITL) specifications is investigated. ITL allows for the reasoning about systems in terms of behaviours which are represented as non-empty sequences of states. It allows for the specification of systems at different levels of abstraction. At a high level this specification is in...
The lack of machine‐readable data is a major obstacle in the application of NMR in artificial intelligence. As a way to overcome this, a procedure for capturing primary NMR Spectroscopic instrumental data annotated with rich metadata and publication in a FAIR data repository is described as part of an undergraduate student laboratory experiment in...
This article proposes a framework that automatically designs classifiers for the early detection of COVID-19 from chest X-ray images. To do this, our approach repeatedly makes use of a heuristic for optimisation to efficiently find the best combination of the hyperparameters of a convolutional deep learning model. The framework starts with optimisi...
This paper presents a method to identify substructures in NMR spectra of mixtures, specifically 2D spectra, using a bespoke image-based Convolutional Neural Network application. This is done using HSQC and HMBC spectra separately and in combination. The application can reliably detect substructures in pure compounds, using a simple network. It can...
The NMReDATA format has been proposed as a way to store, exchange, and to disseminate NMR data and physical and chemical metadata of chemical compounds. In this paper we report on analytical workflows that take advantage of the uniform and standardized NMReDATA format. We also give access to a repository of sample data, which can serve for validati...
The role and importance of the identification of natural products are discussed in the perspective of the study of secondary metabolites. The rapid identification of already reported compounds, or structural dereplication, is recognized as a key element in natural product chemistry. The biological taxonomy of metabolite producing organisms, the kn...
Introduction
Metabolomics is the approach of choice to guide the understanding of biological systems and their molecular intricacies, but compound identification is still a bottleneck to be overcome.
Objective
To assay the use of NMRfilter for confidence compound identification based on chemical shift predictions for different datasets.
Results
We...
This study combines two novel deterministic methods with a Convolutional Neural Network to develop a machine learning method that is aware of directionality of light in images. The first method detects shadows in terrestrial images by using a sliding-window algorithm that extracts specific hue and value features in an image. The second method inter...
The issue of detecting improvised explosive devices, henceforth IEDs, in rural or built-up urban environments is a persistent and serious concern for governments in the developing world. In many cases, such devices are plastic, or varied metallic objects containing rudimentary explosives, which are not visible to the naked eye and are difficult to...
In this chapter we give an overview of techniques for the modelling and reasoning about reversibility of systems, including out-of-causal-order reversibility, as it appears in chemical reactions. We consider the autoprotolysis of water reaction, and model it with the Calculus of Covalent Bonding, the Bonding Calculus, and Reversing Petri Nets. This...
PNMRNP is an SDF file that reports the structure, properties and classification of 211,280 natural products.
The starting point of this work (January 2019) was https://github.com/oolonek/ISDB/tree/master/Data/dbs which contains csv files of the UNPD database (Gu J et al., PLOS ONE 2013, 8, e62839, doi:10.1371/journal.pone.0062839) and which are pa...
Developments in artificial intelligence can be leveraged to support the diagnosis of degenerative disorders, such as epilepsy and Parkinson's disease. This study aims to provide a software solution, focused initially towards Parkinson's disease, which can positively impact medical practice surrounding degenerative diagnoses. Through the use of a d...
The largest free database for NMR spectroscopy of organic molecules, nmrshiftdb, enables not only the comparison and verification of spectra but also their prediction for new compounds.
We suggest an improved software pipeline for mixture analysis. The improvements include combining tandem MS and 2D NMR data for a reliable identification of the constituents in an algorithm based on network analysis aiming for a robust and reliable identification routine. An important part of this pipeline is the use of open-data repositories, alth...
Accurate calculation of specific spectral properties for NMR is an important step for molecular structure elucidation. Here we report the development of a novel machine learning technique for accurately predicting chemical shifts of both 1H and 13C nuclei which exceeds DFT-accessible accuracy for ...
We introduce a process calculus with a new action prefixing operator that allows us to model locally controlled reversibility. Since the observation of covalent bonding in chemical reactions is the starting point of our work, we call the process calculus the Calculus of Covalent Bonding (CCB). The calculus is based on CCSK, but adds an operator of the...
Descriptions of molecular environments have many applications in chemoinformatics, including chemical shift prediction. Hierarchically ordered spherical environment (HOSE) codes are the most popular such descriptions. We developed a method to extend these with stereochemistry information. It enables distinguishing atoms which would be considered id...
We suggest an improved software pipeline for mixture analysis. The improvements include combining tandem MS and 2D NMR data for a reliable identification of its constituents in an algorithm based on network analysis aiming for a robust and reliable identification routine. An important part of this pipeline is the use of open-data repositories, alth...
Even though NMR has found countless applications in the field of small molecule characterization, there is no standard file format available for the NMR data relevant to structure characterization of small molecules. A new format is therefore introduced to associate the NMR parameters extracted from 1D and 2D spectra of organic compounds to the pro...
NMR is a mature technique that is well established and adopted in a wide range of research facilities from laboratories to hospitals. This accounts for large amounts of valuable experimental data that may be readily exported into a standard and open format. Yet the publication of these data faces an important issue: raw data are not made available;...
We introduce a process calculus with a new prefixing operator that allows us to model locally controlled reversibility. Actions can be undone spontaneously, as in other reversible process calculi, or as pairs of concerted actions, where performing a weak action forces undoing of another action. The new operator in its full generality allows us to m...
Background
The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonical...
We introduce a simple process calculus with a new operator that allows us to model locally controlled reversibility. In our setting, actions can be undone spontaneously, as in other reversible process calculi, or as a part of pairs of the so-called concerted actions, where performing forwards weak action forces undoing of another action, without th...
We describe a new operator for reversible process calculi that allows us to model locally controlled reversibility. In our setting, actions can be undone spontaneously or as a part of pairs of so-called concerted actions, where performing forwards a weak action forces undoing of another action, without the need of a global control or a memory. We m...
With its laboratory information management system, nmrshiftdb2 supports the integration of electronic lab administration and management into academic NMR facilities. It also allows a local database to be set up, while full access to nmrshiftdb2's World Wide Web database is granted. This freely available system allows on the one hand the submissi...
The further-developed software nmrshiftdb2 offers freely accessible data and tools for assignment and structure search of and with NMR spectra. It also includes a module for electronic laboratory administration for university NMR departments.
nmrshiftdb2 and its predecessor NMRShiftDB [1,2] have been available as a community-based NMR database since 2002. During that time a continuously growing set of currently more than 40000 structures with 48600 spectra could be established. These data are freely available (http://nmrshiftdb.org) and can not only be searched but can also be downloaded...
Metabolomics studies the occurrence and change of concentrations of small molecular weight chemical compounds (metabolites) in organisms, organs, tissues, cells and ultimately cell compartments in the context of environmental changes, disease or other boundary conditions. It does this by means of spectroscopic and chromatographic techniques and by...
Contemporary biological research integrates neighboring scientific domains to answer complex questions in fields such as systems biology and drug discovery. This calls for tools that are intuitive to use, yet flexible to adapt to new tasks.
Bioclipse is a free, open source workbench with advanced features for the life sciences. Version 2.0 constitu...
Current efforts in Metabolomics, such as the Human Metabolome Project, collect structures of biological metabolites as well as data for their characterisation, such as spectra for identification of substances and measurements of their concentration. Still, only a fraction of existing metabolites and their spectral fingerprints are known. Computer-A...
Descriptor set. The data provides the complete set of descriptor values in matrix form for the H-NMR shifts upon which the prediction is based.
Supplement. Plots and Tables referenced by main article.
CMLSpect is an extension of Chemical Markup Language (CML) for managing spectral and other analytical data. It is designed to be flexible enough to contain a wide variety of spectral data. The paper describes the CMLElements used and gives practical examples for common types of spectra. In addition it demonstrates how different views of the data ca...
There is a need for software applications that provide users with a complete and extensible toolkit for chemo- and bioinformatics accessible from a single workbench. Commercial packages are expensive and closed source, hence they do not allow end users to modify algorithms and add custom functionality. Existing open source projects are more focused...
The Chemistry Development Kit (CDK) provides methods for common tasks in molecular informatics, including 2D and 3D rendering of chemical structures, I/O routines, SMILES parsing and generation, ring searches, isomorphism checking, structure diagram generation, etc. Implemented in Java, it is used both for server-side computational services, possib...
NMRShiftDB is a free database of organic compounds and their assigned NMR spectra. It offers spectrum and structure search as well as prediction of chemical shifts, and thus helps with the identification, characterization and structure elucidation of unknown compounds.
Compound identification and support for computer-assisted structure elucidation via a free community-built web database for organic structures and their NMR data is described. The new database NMRShiftDB is available on . As the first NMR database, NMRShiftDB allows not only open access to the database but also open and peer reviewed submission of...
The process of designing and implementing NMRShiftDB, an open-source, open-content database for chemical structures and their NMR data based solely on free software is described. NMRShiftDB is available to the community on http://www.nmrshiftdb.org. It allows for open submission and retrieval of data sets by its user community. The software and the...
The Chemistry Development Kit (CDK) is a freely available open-source Java library for Structural Chemo- and Bioinformatics. Its architecture and capabilities as well as the development as an open-source project by a team of international collaborators from academic and industrial institutions is described. The CDK provides methods for many common...
Various frameworks of the Apache Software Foundation offer developers of web applications an extremely useful scaffold for their projects. The frameworks Turbine, Jetspeed and Cocoon are based on the Java and XML standards and can therefore be extended at any point. This book offers a well-founded, practical overview ...