Iurii Sushko

Helmholtz-Zentrum Munich · IBIS

Research interests

  • Interests
    QSAR models and different machine-learning methods

Publications

  • Impact points: 3.88
    A comparison of different QSAR approaches to modeling CYP450 1A2 inhibition.

    Sergii Novotarskyi, Iurii Sushko, Robert Körner, Anil Kumar Pandey, Igor V Tetko

    Journal of chemical information and modeling. 06/2011; 51(6):1271-80.

    Prediction of the CYP450 inhibition activity of small molecules is an important task because of the high risk of drug-drug interactions. CYP1A2 is an important member of the CYP450 superfamily and accounts for 15% of the total CYP450 content in the human liver. This article compares 80 in silico QSAR models that were created by following the same procedure with different combinations of descriptors and machine-learning methods. The training and test sets consist of 3745 and 3741 inhibitors and noninhibitors, respectively, from the PubChem BioAssay database. A heterogeneous external test set of 160 inhibitors was collected from the literature. The studied descriptor sets comprise E-state, Dragon and ISIDA SMF descriptors. The machine-learning methods comprise Associative Neural Networks (ASNN), k-Nearest Neighbors (kNN), Random Tree (RT), C4.5 Tree (J48) and Support Vector Machines (SVM). The influence of descriptor selection on model accuracy was studied, and the benefits of the "bagging" modeling approach were shown. The applicability domain approach was successfully applied, ways of increasing model accuracy by using applicability domain measures were demonstrated, and fragment-based model interpretation was performed. The most accurate models achieved 83% and 68% correctly classified instances on the internal and external test sets, respectively. The applicability domain approach increased the prediction accuracy to 90% for 78% of the internal test set and 17% of the external test set. The most accurate models are available online at http://ochem.eu/models/Q5747 .
  • Impact points: 3.84
    Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information.

    Iurii Sushko, Sergii Novotarskyi, Robert Körner, Anil Kumar Pandey, Matthias Rupp, Wolfram Teetz, Stefan Brandmaier, Ahmed Abdelaziz, Volodymyr V Prokopenko, Vsevolod Y Tanchuk, [......], Dmitriy Chekmarev, Artem Cherkasov, Joao Aires-de-Sousa, Qing-You Zhang, Andreas Bender, Florian Nigsch, Luc Patiny, Antony Williams, Valery Tkachenko, Igor V Tetko

    Journal of computer-aided molecular design. 06/2011; 25(6):533-54.

    The Online Chemical Modeling Environment (OCHEM) is a web-based platform that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. The user-contributed database provides a set of tools for easy input, search and modification of thousands of records. The OCHEM database is based on the wiki principle and focuses primarily on the quality and verifiability of the data. The database is tightly integrated with the modeling framework, which supports all the steps required to create a predictive model: data search, calculation and selection of a vast variety of molecular descriptors, application of machine-learning methods, validation, analysis of the model and assessment of the applicability domain. Unlike other similar systems, OCHEM is not intended to re-implement existing tools or models but rather to invite the original authors to contribute their results, make them publicly available, share them with other users and become members of the growing research community. Our intention is to make OCHEM a widely used platform for performing QSPR/QSAR studies online and sharing them with other users on the Web. The ultimate goal of OCHEM is to collect all possible chemoinformatics tools within one simple, reliable and user-friendly resource. OCHEM is free for web users and is available online at http://www.ochem.eu.
  • Impact points: 3.88
    Applicability domains for classification problems: Benchmarking of distance to models for Ames mutagenicity set.

    Iurii Sushko, Sergii Novotarskyi, Robert Körner, Anil Kumar Pandey, Artem Cherkasov, Jiazhong Li, Paola Gramatica, Katja Hansen, Timon Schroeter, Klaus-Robert Müller, [......], Denis Fourches, Eugene Muratov, Alexander Tropsha, Igor Baskin, Dragos Horvath, Gilles Marcou, Christophe Muller, Alexander Varnek, Volodymyr V Prokopenko, Igor V Tetko

    Journal of chemical information and modeling. 10/2010; 50(12):2094-111.

    The estimation of the accuracy and applicability of QSAR and QSPR models for biological and physicochemical properties represents a critical problem. The developed "distance to model" (DM) parameter is defined as a metric of similarity between the training and test set compounds that have been subjected to QSAR/QSPR modeling. In our previous work, we demonstrated the utility and optimal performance of DM metrics based on the standard deviation within an ensemble of QSAR models. The current study applies such analysis to 30 QSAR models for the Ames mutagenicity data set that were previously reported within the 2009 QSAR challenge. We demonstrate that DMs based on an ensemble (consensus) model provide systematically better performance than other DMs. The presented approach identifies 30-60% of compounds whose prediction accuracy is similar to the interlaboratory accuracy of the Ames test, which is estimated to be 90%. Thus, the in silico predictions can be used to halve the cost of experimental measurements while providing a similar prediction accuracy. The developed model has been made publicly available at http://ochem.eu/models/1 .
  • Online chemical modeling environment

    S Novotarskyi, I Sushko, I Tetko

    Chemistry Central Journal. 01/2009;

  • Impact points: 3.88
    Critical Assessment of QSAR Models of Environmental Toxicity against Tetrahymena pyriformis: Focusing on Applicability Domain and Overfitting by Variable Selection.

    Igor Tetko, Iurii Sushko, Anil Pandey, Hao Zhu, Alexander Tropsha, Ester Papa, Tomas Öberg, Roberto Todeschini, Denis Fourches, Alexandre Varnek

    Journal of chemical information and modeling. 09/2008;

    The estimation of the accuracy of predictions is a critical problem in QSAR modeling. The "distance to model" can be defined as a metric of the similarity between the training set molecules and the test set compound for the given property in the context of a specific model. It can be expressed in many different ways, e.g., using the Tanimoto coefficient, leverage, correlation in the space of models, etc. In this paper we have used mixtures of Gaussian distributions as well as statistical tests to evaluate six types of distances to models with respect to their ability to discriminate compounds with small and large prediction errors. The analysis was performed for twelve QSAR models of aqueous toxicity against T. pyriformis obtained with different machine-learning methods and various types of descriptors. The distances to model based on the standard deviation of predicted toxicity calculated from the ensemble of models afforded the best results. This distance also successfully discriminated molecules with small and large prediction errors for a mechanism-based model developed using log P and the Maximum Acceptor Superdelocalizability descriptors. Thus, the distance to model metric could also be used to augment mechanistic QSAR models by estimating their prediction errors. Moreover, the accuracy of prediction is mainly determined by the training set data distribution in the chemistry and activity spaces rather than by the QSAR approaches used to develop the models. We have shown that incorrect validation of a model may result in a wrong estimate of its performance and suggested how this problem could be circumvented. The toxicity of 3182 and 48774 molecules from the EPA High Production Volume (HPV) Challenge Program and EINECS (European Inventory of Existing Commercial Chemical Substances), respectively, was predicted, and the accuracy of prediction was estimated. The developed models are available online at http://www.qspr.org.
  • Applicability Domain of QSAR models

    Iurii Sushko

    Over the past decades, the use of QSAR models to predict biological activities or physicochemical properties has become increasingly common. The biggest problem currently hindering the practical application of these models is the lack of a clear definition of a model's applicability domain (AD). This work presents new methods for determining the AD and compares them with existing approaches in a benchmarking analysis. A practical evaluation of the determined AD is illustrated through studies of molecular properties such as mutagenicity, toxicity and lipophilicity. The developed AD methods make it possible to estimate the prediction accuracy for a wide variety of chemical compounds and, furthermore, to identify precisely those compounds whose model-predicted values are nearly equivalent to an experimental measurement.
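The bagging and applicability-domain ideas in the CYP1A2 inhibition study listed above can be illustrated with a minimal Python sketch under simplifying assumptions. Each compound is represented by a hypothetical two-dimensional descriptor vector drawn from toy Gaussian clusters (not real molecular descriptors), the base learner is a plain k-nearest-neighbors classifier, and the fraction of bagged models agreeing with the majority vote serves as the applicability domain measure. The paper's own descriptors, learners and AD measures may differ; the names bagged_predict and accuracy_within_domain are illustrative only.

```python
import random
from collections import Counter

def knn_predict(train, x, k=3):
    """Predict a class label by majority vote of the k nearest training points."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def bagged_predict(train, x, n_models=25, k=3, seed=0):
    """Bagging: fit n_models kNN models on bootstrap resamples of the training set.

    Returns the majority label and the fraction of models that voted for it,
    which this sketch uses as a (hypothetical) applicability domain measure.
    """
    rng = random.Random(seed)  # fixed seed: every call rebuilds the same bootstrap ensemble
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]  # draw len(train) points with replacement
        votes.append(knn_predict(sample, x, k))
    label, count = Counter(votes).most_common(1)[0]
    return label, count / n_models

def accuracy_within_domain(train, test, threshold, **kwargs):
    """Coverage and accuracy on test compounds whose ensemble agreement >= threshold."""
    kept = correct = 0
    for x, y in test:
        label, agreement = bagged_predict(train, x, **kwargs)
        if agreement >= threshold:
            kept += 1
            correct += int(label == y)
    coverage = kept / len(test)
    accuracy = correct / kept if kept else float("nan")
    return coverage, accuracy

# Toy data: two overlapping Gaussian clusters standing in for inhibitors (1) and noninhibitors (0).
data_rng = random.Random(42)

def make_point(label):
    center = 0.0 if label == 0 else 2.0
    return [data_rng.gauss(center, 1.0), data_rng.gauss(center, 1.0)], label

train = [make_point(i % 2) for i in range(60)]
test = [make_point(i % 2) for i in range(40)]

for threshold in (0.0, 0.6, 0.9):
    coverage, accuracy = accuracy_within_domain(train, test, threshold)
    print(f"threshold={threshold:.1f} coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

Raising the agreement threshold shrinks the coverage (the share of compounds treated as inside the domain) and typically raises accuracy on the retained compounds, which is the kind of trade-off the abstract reports as an accuracy reached for a given percentage of a test set. With two classes and an odd ensemble size, majority agreement always exceeds 0.5, so thresholds at or below 0.5 keep every compound.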
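The distance-to-model (DM) metrics discussed in the Tetrahymena pyriformis and Ames mutagenicity studies can be sketched in a few lines of Python. This sketch assumes fingerprints are given as sets of "on" bit indices (hypothetical toy data), defines a Tanimoto-based DM as one minus the similarity to the nearest training compound, and defines an ensemble-based DM as the population standard deviation of the predictions of several models. The helper most_reliable, which keeps the fraction of compounds with the smallest DM, is a hypothetical illustration of selecting a reliably predicted subset; it is not taken from the papers, which evaluated the metrics with Gaussian mixtures and statistical tests.

```python
import statistics

def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    return len(a & b) / len(a | b)

def dm_tanimoto(train_fps: list, query_fp: set) -> float:
    """DM = 1 - Tanimoto similarity to the most similar training compound."""
    return 1.0 - max(tanimoto(fp, query_fp) for fp in train_fps)

def dm_ensemble_std(predictions: list) -> float:
    """DM = population standard deviation of the predictions of an ensemble of models."""
    return statistics.pstdev(predictions)

def most_reliable(names: list, dm, fraction: float) -> list:
    """Hypothetical helper: keep the given fraction of compounds with the smallest DM."""
    ranked = sorted(names, key=dm)
    return ranked[: int(fraction * len(ranked))]

# Toy data (hypothetical, for illustration only).
train_fps = [{1, 2, 3, 5}, {2, 3, 7}, {1, 4, 8, 9}]
print(dm_tanimoto(train_fps, {1, 2, 3}))  # 0.25: best match {1, 2, 3, 5} has similarity 3/4

ensemble_preds = {
    "cmpd_A": [0.9, 1.0, 1.1],
    "cmpd_B": [0.2, 1.5, 2.8],
    "cmpd_C": [1.9, 2.0, 2.0],
    "cmpd_D": [3.0, 3.0, 3.0],
}
reliable = most_reliable(list(ensemble_preds),
                         lambda name: dm_ensemble_std(ensemble_preds[name]), 0.5)
print(reliable)  # ['cmpd_D', 'cmpd_C']
```

A compound on which the ensemble members disagree (cmpd_B here) receives a large ensemble DM and drops out of the retained subset, even though its mean prediction may look unremarkable.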
