Igor Tetko

Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH) | HZM · Institute of Structural Biology

Ph.D.


296
Publications
54,232
Reads
14,502
Citations
Introduction
Igor Tetko currently works at the Institute of Structural Biology, Helmholtz Zentrum München, and is CEO of BIGCHEM GmbH. His interests include chemoinformatics, machine learning, computer science (also known as "AI", "Big Data", etc.), and chemistry.
Additional affiliations
August 2005 - August 2005
University of Strasbourg
Position
  • Professor
September 2001 - present
September 1996 - August 2001
Education
September 1983 - June 1989


Publications (296)
Article
The working temperature range of Ionic Liquids (IL) is determined by their liquid state range, where the IL’s melting point/glass transition and decomposition temperatures define the lower and upper limits of the range, respectively. Computational prediction of the structure of new ILs with required properties, e.g. which can exist in a liquid stat...
Article
Full-text available
A previously developed model to predict antibacterial activity of ionic liquids against a resistant A. baumannii strain was used to assess activity of phosphonium ionic liquids. Their antioxidant potential was additionally evaluated with newly developed models, which were based on public data. The accuracy of the models was rigorously evaluated usi...
Article
Full-text available
Small-molecule drug design aims to identify inhibitors that can specifically bind to a functionally important region on the target, i.e., an active site of an enzyme. Identification of potential binding pockets is typically based on static three-dimensional structures. However, small molecules may induce and select a dynamic binding pocket that is...
Article
Full-text available
The development of new functional materials based on porphyrins requires fast and accurate prediction of their spectral properties. The available models in the literature for absorption wavelength and extinction coefficient of the Soret band have low accuracy for this class of compounds. We collected spectral data for porphyrins to extend the liter...
Article
Full-text available
We investigated the possibility of accurately predicting the absorption maximum wavelength of BODIPYs. We found that previously reported models had low accuracy (40-57 nm) for BODIPYs due to the limited dataset sizes and/or number of BODIPYs (a few hundred). New models developed in this study were based on data of 6000-plus fluorescent dyes (in...
Article
The melting point (MP) of an ionic liquid (IL) is one of the key physical properties as it determines the lower limit of the IL working temperature range. In this work, we analysed the recently published studies to predict MP of ILs. While we were able to reproduce the statistical parameters reported by the authors, we found that the performance of...
Article
AlphaScreen is one of the most widely used assay technologies in drug discovery due to its versatility, dynamic range and sensitivity. However, the presence of false positives and frequent hitters makes measured HTS data difficult to interpret. Although filters do exist to identify frequent hitters for AlphaScreen, they are f...
Article
Methicillin-resistant S. aureus (MRSA) is one of the most concerning multidrug-resistant bacteria, due to its role in life-threatening infections. There is an urgent need to develop new antibiotics against MRSA. In this study, we first compiled a data set of 2,3-diaminoquinoxalines by chemical synthesis and antibacterial screening against S. au...
Article
Full-text available
Background: Humans are exposed to tens of thousands of chemical substances that need to be assessed for their potential toxicity. Acute systemic toxicity testing serves as the basis for regulatory hazard classification, labeling, and risk management. However, it is cost- and time-prohibitive to evaluate all new and existing chemicals using traditi...
Article
Selecting a model in predictive toxicology often involves a trade-off between prediction performance and explainability: should we sacrifice model performance to gain explainability, or vice versa? Here we present a comprehensive study to assess algorithm and feature influences on model performance in chemical toxicity research. We conducted ove...
Article
Full-text available
The Online Chemical Modeling Environment (OCHEM) was used for QSAR analysis of a set of ionic liquids (ILs) tested against multi-drug resistant (MDR) clinical isolate Acinetobacter baumannii and Staphylococcus aureus strains. The regression models reach a coefficient of determination q2 = 0.66–0.79 with cross-validation and indepen...
Article
Full-text available
The increasing volume of biomedical data in chemistry and life sciences requires development of new methods and approaches for their analysis. Artificial Intelligence and machine learning, especially neural networks, are increasingly used in the chemical industry, in particular with respect to Big Data. This editorial highlights the main results pr...
Article
Full-text available
We investigated the effect of different training scenarios on predicting the (retro)synthesis of chemical compounds using a text-like representation of chemical reactions (SMILES) and the Transformer neural network architecture from Natural Language Processing (NLP). We showed that data augmentation, which is a powerful method used in image processing, elimin...
Preprint
Full-text available
In the present paper we evaluated the efficiency of the recent Transformer-CNN models in predicting target properties based on augmented stereochemical SMILES. We selected a well-known Cliff activity dataset as well as a Dipole moment dataset and compared the effect of three representations for R/S stereochemistry in SMILES. The considered representat...
Article
Full-text available
We present a Focused Library Generator that is able to create from scratch new molecules with desired properties. After training the Generator on the ChEMBL database, transfer learning was used to switch the generator to producing new Mdmx inhibitors that are a promising class of anticancer drugs. Lilly medicinal chemistry filters, molecular dockin...
Article
Full-text available
An affinity fingerprint is a vector consisting of a compound’s affinity or potency against a reference panel of protein targets. Here, we present the QAFFP fingerprint, a 440-element in silico QSAR-based affinity fingerprint, the components of which are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL datab...
Article
Full-text available
Correction for ‘QSAR without borders’ by Eugene N. Muratov et al., Chem. Soc. Rev., 2020, DOI: 10.1039/d0cs00098a.
Article
Full-text available
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure–activity relationships (QSAR) modeling, has developed many important a...
Article
Full-text available
Recurrent neural networks have been widely used to generate millions of de novo molecules in defined chemical spaces. Reported deep generative models are exclusively based on LSTM and/or GRU units and frequently trained using canonical SMILES. In this study, we introduce Generative Examination Networks (GEN) as a new approach to train deep...
Article
Full-text available
We show that a water envelope network plays a critical role in protein–protein interactions (PPI).
Article
Full-text available
We present SMILES-embeddings derived from the internal encoder state of a Transformer [1] model trained to canonize SMILES as a Seq2Seq problem. Using a CharNN [2] architecture upon the embeddings results in higher quality interpretable QSAR/QSPR models on diverse benchmark datasets including regression and classification tasks. The proposed Transf...
Preprint
Full-text available
We investigated the effect of different augmentation scenarios on predicting (retro)synthesis of chemical compounds using SMILES representation. We showed that augmentation of not only input sequences but also, importantly, of the target data eliminated the effect of data memorization by neural networks and improved their generalization performance...
Article
QSAR analysis of a set of previously synthesized phosphonium ionic liquids (PILs) tested against gram‐negative multi‐drug resistant clinical isolate Acinetobacter baumannii was done using the Online Chemical Modeling Environment (OCHEM). To overcome the problem of overfitting due to descriptor selection, five‐fold cross‐validation with variable sel...
Article
Full-text available
Objectives: This study investigated how different concentrations of doxorubicin (DOX) can affect the function of cardiac cells. This study also examined whether activation of prokineticin receptor (PKR)-1 by a nonpeptide agonist, IS20, prevents DOX-induced cardiovascular toxicity in mouse models. Background: High prevalence of heart failure duri...
Article
Full-text available
Background: Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need i...
Article
Full-text available
Trypanosoma protists are pathogens leading to a spectrum of devastating infectious diseases. The range of available chemotherapeutics against Trypanosoma is limited and the existing therapies are partially ineffective and cause serious adverse effects. Formation of the PEX14-PEX5 complex is essential for protein import into the parasites’ glycosome...
Article
Ionic liquids (ILs) are considered an alternative to traditional organic solvents due to their unique physical and chemical properties. On the one hand, they have promising solvating characteristics; on the other hand, they are considered environmentally friendly “green” solvents. Recent studies of IL toxicity, however, questioned the safety...
Preprint
Full-text available
Prediction of molecular properties, including physico-chemical properties, is a challenging task in chemistry. Herein we present a new state-of-the-art multitask prediction method based on existing graph neural network methods. We have used different architectures for our models and the results clearly demonstrate that multitask learning can improv...
Preprint
Full-text available
We present SMILES-embeddings derived from the internal encoder state of a Transformer[1] model trained to canonize SMILES as a Seq2Seq problem. Using CharNN[2] architecture upon the embeddings results in higher quality QSAR/QSPR models on diverse benchmark datasets, including regression and classification tasks. The proposed Transformer-CNN method...
Preprint
We present SMILES-embeddings derived from internal encoder state of a Transformer model trained to canonize SMILES as a Seq2Seq problem. Using CharNN architecture upon the embeddings results in a higher quality QSAR/QSPR models on diverse benchmark datasets including regression and classification tasks. The proposed Transformer-CNN method uses SMIL...
Preprint
Full-text available
Recurrent neural networks have been widely used to generate millions of de novo molecules in a known chemical space. These deep generative models are typically set up with LSTM or GRU units and trained with canonical SMILES. In this study, we introduce a new robust architecture, Generative Examination Network GEN, based on bidirectional RNNs with co...
Preprint
Full-text available
Recurrent neural networks have been widely used to generate millions of de novo molecules in a known chemical space. These deep generative models are typically set up with LSTM or GRU units and trained with canonical SMILES. In this study, we introduce a new robust architecture, Generative Examination Networks GEN, based on bidirectional RNNs with c...
Chapter
Full-text available
We investigate the effect of augmentation of SMILES to increase the performance of convolutional neural network models by extending the results of our previous study [1] to new methods and augmentation scenarios. We demonstrate that augmentation significantly increases performance and this effect is consistent across investigated methods. The convo...
Chapter
Full-text available
G-Protein Coupled Receptors (GPCR) are involved in all the major signaling pathways. As a result, they often serve as potential targets for therapeutic drugs. In this study we analyze publicly available assays involving different classes of GPCR to identify false positives. Using the latest developments in Machine Learning, we then build models that...
Chapter
Full-text available
We describe a Transformer model for a retrosynthetic reaction prediction task. The model is trained on 45 033 experimental reaction examples extracted from USA patents. It can successfully predict the reactants set for 42.7% of cases on the external test set. During the training procedure, we applied different learning rate schedules and snapshot l...
Chapter
Full-text available
Discrete time series can represent the occurrences of either a deterministic or a random process. Dynamical system theory provides powerful techniques to assess whether a set of equations (in a suitable embedding space) underlies the dynamics. In this case the trajectory can be predicted whenever the initial conditions are known with absolute preci...
Preprint
Full-text available
We describe a Transformer model for a retrosynthetic reaction prediction task. The model is trained on 45 033 experimental reaction examples extracted from USA patents. It can successfully predict the reactants set for 42.7% of cases on the external test set. During the training procedure, we applied different learning rate schedules and snap...
Article
Full-text available
A parametric t-SNE approach based on deep feed-forward neural networks was applied to the chemical space visualization problem. It is able to retain more information than certain dimensionality reduction techniques used for this purpose (principal component analysis (PCA), multidimensional scaling (MDS)). The applicability of this method to some ch...
Book
Full-text available
The proceedings set LNCS 11727, 11728, 11729, 11730, and 11731 constitute the proceedings of the 28th International Conference on Artificial Neural Networks, ICANN 2019, held in Munich, Germany, in September 2019. The total of 277 full papers and 43 short papers presented in these proceedings was carefully reviewed and selected from 494 submission...
Article
Acute toxicity is one of the most challenging properties to predict purely with computational methods due to its direct relationship to biological interactions. Moreover, toxicity can be represented by different endpoints: it can be measured for different species using different types of administration, etc., and it is questionable if the knowledge...
Preprint
Full-text available
In our study, we demonstrate the synergy effect between convolutional neural networks and the multiplicity of SMILES. The model we propose, the so-called Convolutional Neural Fingerprint (CNF) model, reaches the accuracy of traditional descriptors such as Dragon (Mauri et al. [22]), RDKit (Landrum [18]), CDK2 (Willighagen et al. [43]) and PyDescrip...
Article
Full-text available
Despite the increasing volume of available data, the proportion of experimentally measured data remains small compared to the virtual chemical space of possible chemical structures. Therefore, there is a strong interest in simultaneously predicting different ADMET and biological properties of molecules, which are frequently strongly correlated with...
Article
Here, we report the data visualization, analysis and modeling for a large set of 4830 SN2 reactions for which the rate constant (logk) was measured under different experimental conditions (solvent, temperature). The reactions were encoded by a single molecular graph, the Condensed Graph of Reactions, which allowed us to use conventional chemoinformatics...
Chapter
The knowledge of physical and chemical properties of a compound is required for understanding and modeling the action of a compound in drug discovery, environmental chemistry, and other chemical industries. This chapter first provides an overview of typical methods for the prediction of physicochemical properties and gives a more detailed analysis...
Article
Firefly luciferase is an enzyme that has found ubiquitous use in biological assays in high-throughput screening (HTS) campaigns. The inhibition of luciferase in such assays could lead to a false positive result. This issue has been known for a long time, and there have been significant efforts to identify luciferase inhibitors in order to enhance r...
Article
The problem of designing new anti-tubercular drugs against multiple-drug-resistant tuberculosis (MDR-TB) was addressed using advanced machine learning methods. Since there are only a few published measurements against MDR-TB, we collected a large literature dataset and developed models against the non-resistant H37Rv strain. The predictive accuracy o...