
Igor Tetko
Helmholtz Munich Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)
Ph.D.
Publications: 306
Reads: 63,302
Citations: 16,446
Introduction
Igor Tetko currently works at the Institute of Structural Biology, Helmholtz Zentrum München, and is CEO of BIGCHEM GmbH. His interests include Chemoinformatics, Machine Learning, Computer Science (also known as "AI", "Big Data", etc.) and Chemistry.
Additional affiliations
August 2005 - August 2005
September 2001 - present
September 1996 - August 2001
Education
September 1983 - June 1989
Publications (306)
Background. The bacterial pathogen Acinetobacter baumannii is one of the most dangerous multi-drug-resistant (MDR) microorganisms, which causes numerous bacterial infections. Nowadays, there is an urgent need for new broad-spectrum antibacterial agents with specific molecular mechanisms of action. Long-chain 1-alkylpyridinium salts are efficient ca...
“Unlocking the Secrets of the Prokineticins.” See the article by Vincenzi et al. (dx.doi.org/10.1124/pharmrev.122.000801).
Machine Learning techniques face significant challenges when predicting advanced chemical properties, such as yield, feasibility of chemical synthesis, and optimal reaction conditions. These challenges stem from the high-dimensional nature of the prediction task and the myriad essential variables involved, ranging from reactants and reagents to cat...
The prokineticins (PKs) were discovered approximately 20 years ago as small peptides inducing gut contractility. Today, they are established as angiogenic, anorectic and proinflammatory cytokines, chemokines, hormones and neuropeptides involved in a variety of physiological and pathophysiological pathways. Their altered expression or mutations implic...
Retrosynthesis consists of breaking down a chemical compound recursively step-by-step into molecular precursors until a set of commercially available molecules is found with the goal to provide a synthesis route. Its two primary research directions, single-step retrosynthesis prediction, which models the chemical reaction logic, and multi-step synt...
The goal of the EUOS/SLAS challenge was to develop reliable algorithms for predicting the solubility of small molecules, based on the experimentally measured aqueous solubility of 100k compounds. In total, a hundred teams took part in the challenge to predict low, medium and highly soluble compounds as measured by a nephelometry assay. This article describes the winning Top I...
The assessment of persistence (P), bioaccumulation (B), and toxicity (T) of a chemical is a crucial first step in ensuring chemical safety and is a cornerstone of the European Union’s chemicals regulation REACH (Registration, Evaluation, Authorization, and Restriction of Chemicals). Existing methods for PBT assessment are overly complex and cumbers...
Retrosynthesis is the task of breaking down a chemical compound recursively step-by-step into molecular precursors until a set of commercially available molecules is found. Consequently, the goal is to provide a valid synthesis route for a molecule. As more single-step models develop, we see increasing accuracy in the prediction of molecular discon...
The working temperature range of Ionic Liquids (IL) is determined by their liquid state range, where the IL’s melting point/glass transition and decomposition temperatures define the lower and upper limits of the range, respectively. Computational prediction of the structure of new ILs with required properties, e.g. which can exist in a liquid stat...
A previously developed model to predict antibacterial activity of ionic liquids against a resistant A. baumannii strain was used to assess activity of phosphonium ionic liquids. Their antioxidant potential was additionally evaluated with newly developed models, which were based on public data. The accuracy of the models was rigorously evaluated usi...
Small-molecule drug design aims to identify inhibitors that can specifically bind to a functionally important region on the target, i.e., an active site of an enzyme. Identification of potential binding pockets is typically based on static three-dimensional structures. However, small molecules may induce and select a dynamic binding pocket that is...
The development of new functional materials based on porphyrins requires fast and accurate prediction of their spectral properties. The available models in the literature for absorption wavelength and extinction coefficient of the Soret band have low accuracy for this class of compounds. We collected spectral data for porphyrins to extend the liter...
The possibility of accurately predicting the absorption maximum wavelength of BODIPYs was investigated. We found that previously reported models had a low accuracy (40-57 nm) for BODIPYs due to the limited dataset sizes and/or number of BODIPYs (a few hundred). New models developed in this study were based on data of 6000-plus fluorescent dyes (in...
The melting point (MP) of an ionic liquid (IL) is one of the key physical properties as it determines the lower limit of the IL working temperature range. In this work, we analysed the recently published studies to predict MP of ILs. While we were able to reproduce the statistical parameters reported by the authors, we found that the performance of...
AlphaScreen is one of the most widely used assay technologies in drug discovery due to its versatility, dynamic range and sensitivity. However, the presence of false positives and frequent hitters complicates the interpretation of measured HTS data. Although filters do exist to identify frequent hitters for AlphaScreen, they are f...
S. aureus resistant to methicillin (MRSA) is one of the most concerning multidrug-resistant bacteria, due to its role in life-threatening infections. There is an urgent need to develop new antibiotics against MRSA. In this study, we first compiled a data set of 2,3-diaminoquinoxalines by chemical synthesis and antibacterial screening against S. au...
Background:
Humans are exposed to tens of thousands of chemical substances that need to be assessed for their potential toxicity. Acute systemic toxicity testing serves as the basis for regulatory hazard classification, labeling, and risk management. However, it is cost- and time-prohibitive to evaluate all new and existing chemicals using traditi...
Selecting a model in predictive toxicology often involves a trade-off between prediction performance and explainability: should we sacrifice model performance to gain explainability, or vice versa? Here we present a comprehensive study to assess algorithm and feature influences on model performance in chemical toxicity research. We conducted ove...
The Online Chemical Modeling Environment (OCHEM) was used for QSAR analysis of a set of ionic liquids (ILs) tested against multi-drug resistant (MDR) clinical isolate Acinetobacter baumannii and Staphylococcus aureus strains. The regression models have a coefficient of determination q2 = 0.66–0.79 with cross-validation and indepen...
The increasing volume of biomedical data in chemistry and life sciences requires development of new methods and approaches for their analysis. Artificial Intelligence and machine learning, especially neural networks, are increasingly used in the chemical industry, in particular with respect to Big Data. This editorial highlights the main results pr...
We investigated the effect of different training scenarios on predicting the (retro)synthesis of chemical compounds using text-like representation of chemical reactions (SMILES) and Natural Language Processing (NLP) neural network Transformer architecture. We showed that data augmentation, which is a powerful method used in image processing, elimin...
In the present paper we evaluated the efficiency of the recent Transformer-CNN models in predicting target properties based on augmented stereochemical SMILES. We selected a well-known Cliff activity dataset as well as a Dipole moment dataset and compared the effect of three representations for R/S stereochemistry in SMILES. The considered representat...
We present a Focused Library Generator that is able to create from scratch new molecules with desired properties. After training the Generator on the ChEMBL database, transfer learning was used to switch the generator to producing new Mdmx inhibitors that are a promising class of anticancer drugs. Lilly medicinal chemistry filters, molecular dockin...
An affinity fingerprint is a vector consisting of a compound’s affinity or potency against a reference panel of protein targets. Here, we present the QAFFP fingerprint, a 440-element in silico QSAR-based affinity fingerprint, the components of which are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL datab...
Correction for ‘QSAR without borders’ by Eugene N. Muratov et al., Chem. Soc. Rev., 2020, DOI: 10.1039/d0cs00098a.
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure–activity relationships (QSAR) modeling, has developed many important a...
Recurrent neural networks have been widely used to generate millions of de novo molecules in defined chemical spaces. Reported deep generative models are exclusively based on LSTM and/or GRU units and frequently trained using canonical SMILES. In this study, we introduce Generative Examination Networks (GEN) as a new approach to train deep...
We show that a water envelope network plays a critical role in protein–protein interactions (PPI).
We present SMILES-embeddings derived from the internal encoder state of a Transformer [1] model trained to canonize SMILES as a Seq2Seq problem. Using a CharNN [2] architecture upon the embeddings results in higher quality interpretable QSAR/QSPR models on diverse benchmark datasets including regression and classification tasks. The proposed Transf...
We investigated the effect of different augmentation scenarios on predicting (retro)synthesis of chemical compounds using SMILES representation. We showed that augmentation of not only input sequences but also, importantly, of the target data eliminated the effect of data memorization by neural networks and improved their generalization performance...
QSAR analysis of a set of previously synthesized phosphonium ionic liquids (PILs) tested against gram‐negative multi‐drug resistant clinical isolate Acinetobacter baumannii was done using the Online Chemical Modeling Environment (OCHEM). To overcome the problem of overfitting due to descriptor selection, five‐fold cross‐validation with variable sel...
Objectives:
This study investigated how different concentrations of doxorubicin (DOX) can affect the function of cardiac cells. This study also examined whether activation of prokineticin receptor (PKR)-1 by a nonpeptide agonist, IS20, prevents DOX-induced cardiovascular toxicity in mouse models.
Background:
High prevalence of heart failure duri...
Background:
Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need i...
Trypanosoma protists are pathogens leading to a spectrum of devastating infectious diseases. The range of available chemotherapeutics against Trypanosoma is limited and the existing therapies are partially ineffective and cause serious adverse effects. Formation of the PEX14-PEX5 complex is essential for protein import into the parasites’ glycosome...
Ionic liquids (ILs) are considered an alternative to traditional organic solvents due to their unique physical and chemical properties. On the one hand, they have promising solvating characteristics; on the other hand, they are considered environmentally friendly “green” solvents. Recent studies of IL toxicity, however, questioned the safety...
Prediction of molecular properties, including physico-chemical properties, is a challenging task in chemistry. Herein we present a new state-of-the-art multitask prediction method based on existing graph neural network methods. We have used different architectures for our models and the results clearly demonstrate that multitask learning can improv...
We present SMILES-embeddings derived from the internal encoder state of a Transformer[1] model trained to canonize SMILES as a Seq2Seq problem. Using CharNN[2] architecture upon the embeddings results in higher quality QSAR/QSPR models on diverse benchmark datasets, including regression and classification tasks. The proposed Transformer-CNN method...
We present SMILES-embeddings derived from the internal encoder state of a Transformer model trained to canonize SMILES as a Seq2Seq problem. Using a CharNN architecture upon the embeddings results in higher quality QSAR/QSPR models on diverse benchmark datasets including regression and classification tasks. The proposed Transformer-CNN method uses SMIL...
Recurrent neural networks have been widely used to generate millions of de novo molecules in a known chemical space. These deep generative models are typically set up with LSTM or GRU units and trained with canonical SMILES. In this study, we introduce a new robust architecture, the Generative Examination Network (GEN), based on bidirectional RNNs with co...
Recurrent neural networks have been widely used to generate millions of de novo molecules in a known chemical space. These deep generative models are typically set up with LSTM or GRU units and trained with canonical SMILES. In this study, we introduce a new robust architecture, Generative Examination Networks (GEN), based on bidirectional RNNs with c...
We investigate the effect of augmentation of SMILES to increase the performance of convolutional neural network models by extending the results of our previous study [1] to new methods and augmentation scenarios. We demonstrate that augmentation significantly increases performance and this effect is consistent across investigated methods. The convo...
G-Protein Coupled Receptors (GPCRs) are involved in all the major signaling pathways. As a result, they often serve as potential targets for therapeutic drugs. In this study we analyze publicly available assays involving different classes of GPCRs to identify false positives. Using the latest developments in Machine Learning, we then build models that...
We describe a Transformer model for a retrosynthetic reaction prediction task. The model is trained on 45 033 experimental reaction examples extracted from USA patents. It can successfully predict the reactants set for 42.7% of cases on the external test set. During the training procedure, we applied different learning rate schedules and snapshot l...
Discrete time series can represent the occurrences of either a deterministic or a random process. Dynamical system theory provides powerful techniques to assess whether a set of equations (in a suitable embedding space) underlies the dynamics. In this case the trajectory can be predicted whenever the initial conditions are known with absolute preci...
A parametric t-SNE approach based on deep feed-forward neural networks was applied to the chemical space visualization problem. It is able to retain more information than certain dimensionality reduction techniques used for this purpose (principal component analysis (PCA), multidimensional scaling (MDS)). The applicability of this method to some ch...
The proceedings set LNCS 11727, 11728, 11729, 11730, and 11731 constitute the proceedings of the 28th International Conference on Artificial Neural Networks, ICANN 2019, held in Munich, Germany, in September 2019.
A total of 277 full papers and 43 short papers presented in these proceedings were carefully reviewed and selected from 494 submission...
Acute toxicity is one of the most challenging properties to predict purely with computational methods due to its direct relationship to biological interactions. Moreover, toxicity can be represented by different endpoints: it can be measured for different species using different types of administration, etc., and it is questionable if the knowledge...
In our study, we demonstrate the synergy effect between convolutional neural networks and the multiplicity of SMILES. The model we propose, the so-called Convolutional Neural Fingerprint (CNF) model, reaches the accuracy of traditional descriptors such as Dragon (Mauri et al. [22]), RDKit (Landrum [18]), CDK2 (Willighagen et al. [43]) and PyDescrip...
Despite the increasing volume of available data, the proportion of experimentally measured data remains small compared to the virtual chemical space of possible chemical structures. Therefore, there is a strong interest in simultaneously predicting different ADMET and biological properties of molecules, which are frequently strongly correlated with...
Here, we report the data visualization, analysis and modeling for a large set of 4830 SN2 reactions whose rate constants (logk) were measured under different experimental conditions (solvent, temperature). The reactions were encoded by a single molecular graph, the Condensed Graph of Reaction, which allowed us to use conventional chemoinformatics...
The knowledge of physical and chemical properties of a compound is required for understanding and modeling the action of a compound in drug discovery, environmental chemistry, and other chemical industries. This chapter first provides an overview of typical methods for the prediction of physicochemical properties and gives a more detailed analysis...
Firefly luciferase is an enzyme that has found ubiquitous use in biological assays in high-throughput screening (HTS) campaigns. The inhibition of luciferase in such assays could lead to a false positive result. This issue has been known for a long time, and there have been significant efforts to identify luciferase inhibitors in order to enhance r...
The problem of designing new anti-tubercular drugs against multiple-drug-resistant tuberculosis (MDR-TB) was addressed using advanced machine learning methods. Since there are only a few published measurements against MDR-TB, we collected a large literature dataset and developed models against the non-resistant H37Rv strain. The predictive accuracy o...