Matthias Rupp
Research interests
-
Machine Learning, Kernel Methods, Cheminformatics, Chemoinformatics, Computational Chemistry
Publications
-
3.09 Impact points
Optimizing transition states via kernel-based machine learning.
The Journal of chemical physics. 05/2012; 136(17):174101.
We present a method for optimizing transition state theory dividing surfaces with support vector machines. The resulting dividing surfaces require no a priori information or intuition about reaction mechanisms. To generate optimal dividing surfaces, we apply a cycle of machine-learning and refinement of the surface by molecular dynamics sampling. We demonstrate that the machine-learned surfaces contain the relevant low-energy saddle points. The mechanisms of reactions may be extracted from the machine-learned surfaces in order to identify unexpected chemically relevant processes. Furthermore, we show that the machine-learned surfaces significantly increase the transmission coefficient for an adatom exchange involving many coupled degrees of freedom on a (100) surface when compared to a distance-based dividing surface.
-
5.76 Impact points
DOGS: reaction-driven de novo design of bioactive compounds.
PLoS computational biology. 02/2012; 8(2):e1002380.
We present a computational method for the reaction-based de novo design of drug-like molecules. The software DOGS (Design of Genuine Structures) features a ligand-based strategy for automated 'in silico' assembly of potentially novel bioactive compounds. The quality of the designed compounds is assessed by a graph kernel method measuring their similarity to known bioactive reference ligands in terms of structural and pharmacophoric features. We implemented a deterministic compound construction procedure that explicitly considers compound synthesizability, based on a compilation of 25'144 readily available synthetic building blocks and 58 established reaction principles. This enables the software to suggest a synthesis route for each designed compound. Two prospective case studies are presented together with details on the algorithm and its implementation. De novo designed ligand candidates for the human histamine H₄ receptor and γ-secretase were synthesized as suggested by the software. The computational approach proved to be suitable for scaffold-hopping from known ligands to novel chemotypes, and for generating bioactive molecules with drug-like properties.
-
Finding Density Functionals with Machine Learning
12/2011;
Machine learning is used to approximate density functionals. For the model problem of the kinetic energy of non-interacting fermions in 1d, mean absolute errors below 1 kcal/mol on test densities similar to the training set are reached with fewer than 100 training densities. A predictor identifies if a test density is within the interpolation region. Via principal component analysis, a projected functional derivative finds highly accurate self-consistent densities. Challenges for application of our method to real electronic structure problems are discussed.
-
Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning
09/2011;
We introduce a machine learning model to predict atomization energies of a diverse set of organic molecules, based on nuclear charges and atomic positions only. The problem of solving the molecular Schrödinger equation is mapped onto a non-linear statistical regression problem of reduced complexity. Regression models are trained on and compared to atomization energies computed with hybrid density-functional theory. Cross-validation over more than seven thousand small organic molecules yields a mean absolute error of ~10 kcal/mol. Applicability is demonstrated for the prediction of molecular atomization potential energy curves.
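The abstract above does not spell out how a molecule is turned into regression input. One descriptor that uses nuclear charges and atomic positions only is the Coulomb matrix; the minimal sketch below assumes that descriptor, with positions in atomic units (bohr). The function name `coulomb_matrix` and the H2 example geometry are illustrative assumptions, not taken from this page.

```python
import math


def coulomb_matrix(charges, positions):
    """Build a Coulomb-matrix descriptor from nuclear charges and positions.

    Assumed form: diagonal entries 0.5 * Z_i**2.4, off-diagonal entries
    Z_i * Z_j / |R_i - R_j|, with distances in bohr.
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: per-atom term depending on the charge only.
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal: Coulomb repulsion between nuclei i and j.
                distance = math.dist(positions[i], positions[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


if __name__ == "__main__":
    # Hypothetical example: H2 with a bond length of 1.4 bohr.
    h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
    for row in h2:
        print(row)
```

A matrix like this would then serve as input to a non-linear regression model; the regression step itself is not sketched here.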
-
3.84 Impact points
Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information.
Journal of computer-aided molecular design. 06/2011; 25(6):533-54.
The Online Chemical Modeling Environment is a web-based platform that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. A user-contributed database contains a set of tools for easy input, search and modification of thousands of records. The OCHEM database is based on the wiki principle and focuses primarily on the quality and verifiability of the data. The database is tightly integrated with the modeling framework, which supports all the steps required to create a predictive model: data search, calculation and selection of a vast variety of molecular descriptors, application of machine learning methods, validation, analysis of the model and assessment of the applicability domain. As compared to other similar systems, OCHEM is not intended to re-implement the existing tools or models but rather to invite the original authors to contribute their results, make them publicly available, share them with other users and to become members of the growing research community. Our intention is to make OCHEM a widely used platform to perform the QSPR/QSAR studies online and share it with other users on the Web. The ultimate goal of OCHEM is collecting all possible chemoinformatics tools within one simple, reliable and user-friendly resource. The OCHEM is free for web users and it is available online at http://www.ochem.eu.
-
2.46 Impact points
Predicting the pKa of small molecules.
Combinatorial chemistry & high throughput screening. 04/2011; 14(5):307-27.
The biopharmaceutical profile of a compound depends directly on the dissociation constants of its acidic and basic groups, commonly expressed as the negative decadic logarithm pKa of the acid dissociation constant (Ka). We survey the literature on computational methods to predict the pKa of small molecules. In this, we address data availability (used data sets, data quality, proprietary versus public data), molecular representations (quantum mechanics, descriptors, structured representations), prediction methods (approaches, implementations), as well as pKa-specific issues such as mono- and multiprotic compounds. We discuss advantages, problems, recent progress, and challenges in the field.
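The definition in the abstract above, pKa as the negative decadic logarithm of Ka, can be written out in a few lines. The sketch below also shows, for the monoprotic case only, how the deprotonated fraction at a given pH follows from the pKa via the Henderson-Hasselbalch relation. Function names are illustrative assumptions; multiprotic compounds, which the survey names as a pKa-specific issue, are not covered.

```python
import math


def pka_from_ka(ka):
    """pKa is the negative base-10 logarithm of the acid dissociation constant Ka."""
    return -math.log10(ka)


def fraction_deprotonated(ph, pka):
    """Fraction of a monoprotic acid present in deprotonated form at a given pH.

    Follows from the Henderson-Hasselbalch relation:
    pH = pKa + log10([A-] / [HA]).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    pka = pka_from_ka(1e-5)
    print(pka)
    # At pH equal to pKa, half of the acid is deprotonated.
    print(fraction_deprotonated(pka, pka))
    # At physiological pH 7.4, the same acid is almost fully deprotonated.
    print(fraction_deprotonated(7.4, pka))
```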
-
4.41 Impact points
Spherical harmonics coefficients for ligand-based virtual screening of cyclooxygenase inhibitors.
PloS one. 01/2011; 6(7):e21554.
Molecular descriptors are essential for many applications in computational chemistry, such as ligand-based similarity searching. Spherical harmonics have previously been suggested as comprehensive descriptors of molecular structure and properties. We investigate a spherical harmonics descriptor for shape-based virtual screening. We introduce and validate a partially rotation-invariant three-dimensional molecular shape descriptor based on the norm of spherical harmonics expansion coefficients. Using this molecular representation, we parameterize molecular surfaces, i.e., isosurfaces of spatial molecular property distributions. We validate the shape descriptor in a comprehensive retrospective virtual screening experiment. In a prospective study, we virtually screen a large compound library for cyclooxygenase inhibitors, using a self-organizing map as a pre-filter and the shape descriptor for candidate prioritization. 12 compounds were tested in vitro for direct enzyme inhibition and in a whole blood assay. Active compounds containing a triazole scaffold were identified as direct cyclooxygenase-1 inhibitors. This outcome corroborates the usefulness of spherical harmonics for representation of molecular shape in virtual screening of large compound collections. The combination of pharmacophore and shape-based filtering of screening candidates proved to be a straightforward approach to finding novel bioactive chemotypes with minimal experimental effort.
-
3.77 Impact points
Pharmacophore alignment search tool: Influence of canonical atom labeling on similarity searching.
Journal of computational chemistry. 11/2010; 31(15):2810-26.
Previously, (Hähnke et al., J Comput Chem 2009, 30, 761) we presented the Pharmacophore Alignment Search Tool (PhAST), a ligand-based virtual screening technique representing molecules as strings coding pharmacophoric features and comparing them by global pairwise sequence alignment. To guarantee unambiguity during the reduction of two-dimensional molecular graphs to one-dimensional strings, PhAST employs a graph canonization step. Here, we present the results of the comparison of 11 different algorithms for graph canonization with respect to their impact on virtual screening. Retrospective screenings of a drug-like data set were evaluated using the BEDROC metric, which yielded averaged values between 0.4 and 0.14 for the best-performing and worst-performing canonization technique. We compared five scoring schemes for the alignments and found preferred combinations of canonization algorithms and scoring functions. Finally, we introduce a performance index that helps prioritize canonization approaches without the need for extensive retrospective evaluation.
-
2.65 Impact points
Truxillic acid derivatives act as peroxisome proliferator-activated receptor gamma activators.
Bioorganic & medicinal chemistry letters. 03/2010; 20(9):2920-3.
In previous studies, we identified a truxillic acid derivative as selective activator of the peroxisome proliferator-activated receptor gamma, which is a member of the nuclear receptor family and acts as ligand-activated transcription factor of genes involved in glucose metabolism. Herein we present the structure-activity relationships of 16 truxillic acid derivatives, investigated by a cell-based reporter gene assay guided by molecular docking analysis.
-
3.23 Impact points
From machine learning to natural product derivatives that selectively activate transcription factor PPARgamma.
ChemMedChem. 02/2010; 5(2):191-4.
-
3.77 Impact points
Distance phenomena in high-dimensional chemical descriptor spaces: Consequences for similarity-based approaches.
Journal of computational chemistry. 04/2009;
Measuring the (dis)similarity of molecules is important for many cheminformatics applications like compound ranking, clustering, and property prediction. In this work, we focus on real-valued vector representations of molecules (as opposed to the binary spaces of fingerprints). We demonstrate the influence which the choice of (dis)similarity measure can have on results, and provide recommendations for such choices. We review the mathematical concepts used to measure (dis)similarity in vector spaces, namely norms, metrics, inner products, and similarity coefficients, as well as the relationships between them, employing (dis)similarity measures commonly used in cheminformatics as examples. We present several phenomena (empty space phenomenon, sphere volume related phenomena, distance concentration) in high-dimensional descriptor spaces which are not encountered in two and three dimensions. These phenomena are theoretically characterized and illustrated on both artificial and real (bioactivity) data. (c) 2009 Wiley Periodicals, Inc. J Comput Chem 2009.
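Distance concentration, one of the phenomena named in the abstract above, can be illustrated on artificial data with a minimal sketch. It assumes points drawn uniformly from the unit hypercube and Euclidean distance, and measures the relative contrast (farthest minus nearest distance, divided by nearest) between a random query point and a random sample. The function name, sample size, and seed are illustrative assumptions, not taken from the paper.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Relative contrast (d_max - d_min) / d_min of Euclidean distances
    from one random query point to n_points random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


if __name__ == "__main__":
    # As the dimension grows, nearest and farthest neighbors become
    # nearly equidistant, so the relative contrast shrinks.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

A shrinking contrast means that nearest-neighbor rankings carry less information in high-dimensional descriptor spaces, which is why the choice of (dis)similarity measure matters for similarity-based approaches.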
-
3.77 Impact points
Shapelets: possibilities and limitations of shape-based virtual screening.
Journal of computational chemistry. 02/2008; 29(1):108-14.
Complementarity of molecular surfaces is crucial for molecular recognition. A method for representation of molecular shape is presented. We decompose the molecular surface into commensurate patches with defined shape by fitting hyperbolical paraboloids onto a triangulated isosurface of the Gaussian model of a molecule. As a result of this decomposition we obtain a 3D graph representation of the molecular shape, which can be used for complete and partial shape matching and isosteric group searching. To point out the possibilities and limitations of shape-only models, we challenged our method by three scenarios in a virtual screening contest: rigid body alignment, consensus shape filtering, and target-specific screening.
-
Predicting the pK(a) of Small Molecules
COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING. 14.
The biopharmaceutical profile of a compound depends directly on the dissociation constants of its acidic and basic groups, commonly expressed as the negative decadic logarithm pK(a) of the acid dissociation constant (K-a). We survey the literature on computational methods to predict the pK(a) of small molecules. In this, we address data availability (used data sets, data quality, proprietary versus public data), molecular representations (quantum mechanics, descriptors, structured representations), prediction methods (approaches, implementations), as well as pK(a)-specific issues such as mono- and multiprotic compounds. We discuss advantages, problems, recent progress, and challenges in the field.
-
3.88 Impact points
Kernel approach to molecular similarity based on iterative graph similarity.
Journal of chemical information and modeling. 47(6):2280-6.
Similarity measures for molecules are of basic importance in chemical, biological, and pharmaceutical applications. We introduce a molecular similarity measure defined directly on the annotated molecular graph, based on iterative graph similarity and optimal assignments. We give an iterative algorithm for the computation of the proposed molecular similarity measure, prove its convergence and the uniqueness of the solution, and provide an upper bound on the required number of iterations necessary to achieve a desired precision. Empirical evidence for the positive semidefiniteness of certain parametrizations of our function is presented. We evaluated our molecular similarity measure by using it as a kernel in support vector machine classification and regression applied to several pharmaceutical and toxicological data sets, with encouraging results.