Predicting Drug-Induced Hepatotoxicity Using QSAR and Toxicogenomics Approaches

Laboratory for Molecular Modeling, University of North Carolina, Chapel Hill, North Carolina 27599, United States.
Chemical Research in Toxicology (Impact Factor: 3.53). 06/2011; 24(8):1251-62. DOI: 10.1021/tx200148a
Source: PubMed


Quantitative structure-activity relationship (QSAR) modeling and toxicogenomics are typically used independently as predictive tools in toxicology. In this study, we evaluated the power of several statistical models for predicting drug hepatotoxicity in rats using different descriptors of drug molecules, namely, their chemical descriptors and toxicogenomics profiles. The records were taken from the Toxicogenomics Project rat liver microarray database containing information on 127 drugs. The model end point was hepatotoxicity in the rat following 28 days of continuous exposure, established by liver histopathology and serum chemistry. First, we developed multiple conventional QSAR classification models using a comprehensive set of chemical descriptors and several classification methods (k nearest neighbor, support vector machines, random forests, and distance weighted discrimination). With chemical descriptors alone, external predictivity (correct classification rate, CCR) from 5-fold external cross-validation was 61%. Next, the same classification methods were employed to build models using only toxicogenomics data (24 h after a single exposure) treated as biological descriptors. The optimized models used only 85 selected toxicogenomics descriptors and had CCR as high as 76%. Finally, hybrid models combining both chemical descriptors and transcripts were developed; their CCRs were between 68 and 77%. Although the accuracy of hybrid models did not exceed that of the models based on toxicogenomics data alone, the use of both chemical and biological descriptors enriched the interpretation of the models. In addition to finding 85 transcripts that were predictive and highly relevant to the mechanisms of drug-induced liver injury, chemical structural alerts for hepatotoxicity were identified.
These results suggest that concurrent exploration of the chemical features and acute treatment-induced changes in transcript levels will both enrich the mechanistic understanding of subchronic liver injury and afford models capable of accurate prediction of hepatotoxicity from chemical structure and short-term assay results.
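The abstract reports external predictivity as a correct classification rate (CCR) from 5-fold external cross-validation. As a minimal sketch, assuming CCR is the mean of sensitivity and specificity (the usual balanced-accuracy definition in QSAR work; the abstract itself does not spell out the formula), the metric and a simple fold split could look like the following. The function names and the round-robin fold assignment are illustrative assumptions, not the authors' code.

```python
def correct_classification_rate(y_true, y_pred):
    """Return CCR = (sensitivity + specificity) / 2.

    Labels are assumed binary: 1 = hepatotoxic, 0 = non-hepatotoxic.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of toxic compounds recovered
    specificity = tn / (tn + fp)  # fraction of non-toxic compounds recovered
    return (sensitivity + specificity) / 2


def external_folds(n_items, n_folds=5):
    """Assign item indices to disjoint external test folds (round-robin).

    Hypothetical helper: the paper's actual fold assignment is not given here.
    """
    return [list(range(k, n_items, n_folds)) for k in range(n_folds)]
```

In such a scheme each fold is held out once as an external test set, a model is trained on the remaining four folds, and the CCR is computed on the held-out predictions. Unlike plain accuracy, the CCR is not inflated when one class dominates the data set.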

Available from: Yen Low
  • Source
    • "Zhu et al. (2008) first combined the in vitro assay data on cell viability with conventional chemical descriptors which greatly improved the prediction accuracy of rodent carcinogenicity. In subsequent studies, the hybrid classification models (i.e. using both biological and chemical descriptors) were developed in predicting an acute toxicity half-maximal lethal dose (Sedykh et al., 2011) and short-term drug hepatotoxicity (Low et al., 2011) in rats. Compared with traditional, purely chemical (e.g. "
    ABSTRACT: Drug-induced liver injury (DILI) is a major adverse drug reaction that accounts for one-third of post-marketing drug withdrawals. Several classifiers for human hepatotoxicity using chemical descriptors with limited prediction accuracies have been published. In this study, we developed predictive in silico models based on a set of 156 DILI positive and 136 DILI negative compounds for DILI prediction. First, models based on a chemical descriptor (CDK, Dragon and MOE) and in vitro cell-imaging endpoints [human hepatocyte imaging assay technology (HIAT) descriptors] were built using random forest (RF) and five-fold cross-validation procedure. Then three hybrid models were built using HIAT and a single type of chemical descriptors. Generally, the models based only on chemical descriptors were poor, with a correct classification rate (CCR) around 0.60 when the default threshold value (i.e. threshold = 0.50) was used. The hybrid models afforded a CCR of 0.73 with a specificity of 0.74 and a better true positive rate (sensitivity of 0.71), which is crucial in drug toxicity screening for the purpose of patient safety. The benefit of hybrid models was even more drastic when stricter classification thresholds were employed (e.g. CCR would be 0.83 when double thresholds (non-toxic < 0.40 and toxic > 0.60) were used for the hybrid model). We have developed rigorously validated hybrid models which can be used in virtual screening of lead compounds with potential hepatotoxicity. Our study also showed a chemical structure and in vitro biological data can be complementary in enhancing the prediction accuracy of human hepatotoxicity and can afford rational mechanistic interpretation. Copyright © 2013 John Wiley & Sons, Ltd.
    Full-text · Article · Mar 2014 · Journal of Applied Toxicology
  • Source
    • "Cheng et al. (2010) examined similarities between chemical structures and molecular targets of 37 drugs that were clustered based on their bioactivity profiles. Low et al. (2011) classified 127 rat liver samples to toxic versus non-toxic responses, based on combined drug-induced expression profiles and chemical descriptors, and identified chemical substructures and genes that were responsible for liver toxicity. In a broader setting, when the goal is to find dependencies between two data sources (chemical structures and genomic responses), correlation-type approaches match the goal directly, and have the additional advantage that a predefined classification is not required. "
    ABSTRACT: Motivation: Analysis of relationships of drug structure to biological response is key to understanding off-target and unexpected drug effects, and for developing hypotheses on how to tailor drug therapies. New methods are required for integrated analyses of a large number of chemical features of drugs against the corresponding genome-wide responses of multiple cell models. Results: In this article, we present the first comprehensive multi-set analysis on how the chemical structure of drugs impacts on genome-wide gene expression across several cancer cell lines [Connectivity Map (CMap) database]. The task is formulated as searching for drug response components across multiple cancers to reveal shared effects of drugs and the chemical features that may be responsible. The components can be computed with an extension of a recent approach called Group Factor Analysis. We identify 11 components that link the structural descriptors of drugs with specific gene expression responses observed in the three cell lines and identify structural groups that may be responsible for the responses. Our method quantitatively outperforms the limited earlier methods on CMap and identifies both the previously reported associations and several interesting novel findings, by taking into account multiple cell lines and advanced 3D structural descriptors. The novel observations include: previously unknown similarities in the effects induced by 15-delta prostaglandin J2 and HSP90 inhibitors, which are linked to the 3D descriptors of the drugs; and the induction by simvastatin of leukemia-specific response, resembling the effects of corticosteroids. Availability and implementation: Source Code implementing the method is available at: or samuel.kaski@aalto.fi Supplementary Information: Supplementary data are available at Bioinformatics online.
    Full-text · Article · Dec 2013 · Bioinformatics
  • Source
    • "A few studies have applied SVM in toxicology. However, most of these were for classifying or predicting specific pharmacodynamic, pharmacokinetic and toxicological properties of chemical or biological molecules using QSAR/QSPR or for examining information on biological or toxicological pathways (Yap et al., 2007;Cao et al., 2012;Uehara et al., 2011;Low et al., 2011). To the best of our knowledge, there are no examples of using SVM methods for assessing toxicological effects of real-world complex mixtures based on their pollutant composition. "
    ABSTRACT: Powerful, robust in silico approaches offer great promise for classifying and predicting biological effects of complex mixtures and for identifying the constituents of greatest concern. Support vector machine (SVM) methods can deal with high dimensional data and small sample size and examine multiple interrelationships among samples. In this work, we applied SVM methods to examine pollution profiles and mutagenicity of 60 water samples obtained from 6 cities in China during 2006-2011. Pollutant profiles were characterized in water extracts by gas chromatography-mass spectrometry (GC/MS) and mutagenicity examined by Ames assays. We encoded feature vectors of GC/MS peaks in the mixtures and used 48 samples as the training set, reserving 12 samples as the test set. The SVM model and regression were constructed from whole pollution profiles that ranked compounds in relation to their correlation to the mutagenicity. Both classification and prediction performance were evaluated. The SVM model based on whole pollution profiles showed lower performance (sensitivity, specificity, accuracy and correlation coefficient were 69.5%-70.7%, 70.6%-73.2%, 69.9%-72.1%, and 0.55-0.59, respectively) than one based on compounds with highest association with mutagenicity. An SVM model with the top 10 compounds had the highest performance (sensitivity, specificity, accuracy, and correlation coefficient were 89.8%-90.3%, 90.1%-92.1%, 90.1%-91.3%, and 0.80-0.82, respectively), with negligible decreases in performance between the test and training set. SVM can be a powerful, robust classifier of the relationship of pollutants and mutagenicity in complex real-world mixtures. The top 14 compounds have the greatest contribution to mutagenicity and deserve further studies to identify these constituents.
    Full-text · Article · Feb 2013 · Toxicology
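
The Journal of Applied Toxicology abstract listed above mentions stricter "double thresholds" (non-toxic < 0.40 and toxic > 0.60) applied to the hybrid model's output. A minimal sketch of that idea follows, assuming the model emits a probability-like score for the toxic class and that scores between the two cutoffs are left unclassified. The function names and the handling of the inconclusive zone are assumptions for illustration, not the published implementation.

```python
def classify_with_double_threshold(score, low=0.40, high=0.60):
    """Map a toxic-class score to a label, or None if inconclusive.

    score < low  -> "non-toxic"
    score > high -> "toxic"
    otherwise    -> None (assumed: no prediction is made)
    """
    if score < low:
        return "non-toxic"
    if score > high:
        return "toxic"
    return None


def coverage(scores, low=0.40, high=0.60):
    """Fraction of compounds that receive a definite label."""
    labels = [classify_with_double_threshold(s, low, high) for s in scores]
    return sum(label is not None for label in labels) / len(labels)
```

The trade-off is that accuracy is then measured only on the compounds that receive a label, so a higher CCR under double thresholds comes with reduced coverage of the data set.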