Statistical external validation and consensus modeling: a QSPR case study for Koc prediction.

Department of Structural and Functional Biology, QSAR Research Unit in Environmental Chemistry and Ecotoxicology, University of Insubria, via Dunant 3, 21100 Varese, Italy.
Journal of Molecular Graphics and Modelling (Impact Factor: 2.33). 04/2007; 25(6):755-66. DOI: 10.1016/j.jmgm.2006.06.005
Source: PubMed

ABSTRACT The soil sorption partition coefficient (log Koc) of a heterogeneous set of 643 organic non-ionic compounds, spanning a range of more than 6 log units, is predicted by a statistically validated QSAR modeling approach. The applied multiple linear regression (ordinary least squares, OLS) is based on a variety of theoretical molecular descriptors selected by the genetic algorithm-variable subset selection (GA-VSS) procedure. The models were validated for predictivity by different internal and external validation approaches. For external validation we applied self-organizing maps (SOM) to split the original data set: the best four-dimensional model, developed on a reduced training set of 93 chemicals, has a predictivity of 78% when applied to 550 validation chemicals (prediction set). The selected molecular descriptors, which could be interpreted through their mechanistic meaning, were compared with the more common physico-chemical descriptors log Kow and log Sw. The chemical applicability domain of each model was verified by the leverage approach so that only reliable predictions are proposed. The best predicted data were obtained by consensus modeling from 10 different models in the genetic algorithm model population.
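
The workflow summarized in the abstract (OLS models, external validation, a leverage-based applicability domain, and consensus prediction) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the data are synthetic, the split is random rather than SOM-based, the three descriptor subsets are hand-picked rather than GA-selected, the external Q2 is computed against the training-set mean, and the leverage cutoff h* = 3(p+1)/n is a commonly used convention that the abstract does not state.

```python
import numpy as np

def design(X):
    """Add an intercept column to a descriptor matrix."""
    return np.column_stack([np.ones(len(X)), X])

def fit_ols(X, y):
    """Ordinary least squares fit; returns coefficients, intercept first."""
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return coef

def predict(coef, X):
    return design(X) @ coef

def q2_ext(y_train, y_ext, y_ext_pred):
    """External explained variance, referenced to the training-set mean."""
    press = np.sum((y_ext - y_ext_pred) ** 2)
    ss = np.sum((y_ext - y_train.mean()) ** 2)
    return 1.0 - press / ss

def leverages(X_train, X_query):
    """Leverage h_i = x_i (A^T A)^-1 x_i^T of each query row, A = training design matrix."""
    A, Q = design(X_train), design(X_query)
    core = np.linalg.inv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, core, Q)

def consensus_predict(models, X):
    """Average the predictions of several (descriptor_columns, coef) models."""
    return np.mean([predict(coef, X[:, cols]) for cols, coef in models], axis=0)

# Synthetic stand-in for 643 compounds with 6 hypothetical descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(643, 6))
y = X @ np.array([1.0, -0.5, 0.8, 0.3, 0.0, 0.0]) + 0.3 * rng.normal(size=643)

# Random 93/550 split (the paper uses SOM; a random split is assumed here).
idx = rng.permutation(643)
train, ext = idx[:93], idx[93:]

# Three hypothetical four-descriptor models standing in for a GA population.
subsets = [[0, 1, 2, 3], [0, 1, 2, 4], [0, 2, 3, 5]]
models = [(cols, fit_ols(X[train][:, cols], y[train])) for cols in subsets]

q2_single = [q2_ext(y[train], y[ext], predict(c, X[ext][:, cols])) for cols, c in models]
q2_consensus = q2_ext(y[train], y[ext], consensus_predict(models, X[ext]))

# Applicability domain of the first model: flag high-leverage external compounds.
cols0 = subsets[0]
h_ext = leverages(X[train][:, cols0], X[ext][:, cols0])
h_star = 3 * (len(cols0) + 1) / len(train)
outside_domain = h_ext > h_star

print("Q2ext per model:", np.round(q2_single, 3))
print("Q2ext consensus:", round(float(q2_consensus), 3))
print("external compounds outside domain:", int(outside_domain.sum()))
```

In this sketch, predictions for compounds flagged in `outside_domain` would be treated as extrapolations rather than reported as reliable, which mirrors the role the abstract assigns to the leverage approach.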

  • ABSTRACT: Juvenile hormone esterase (JHE) plays a key role in the development and metamorphosis of holometabolous insects. Its inhibitors could possibly be targeted for insect control. Conversely, JHE may also be involved in endocrine disruption by xenobiotics, resulting in detrimental effects in beneficial insects. There is therefore a need to know the structural characteristics of the molecules able to monitor JHE activity, and to develop SAR and QSAR studies to estimate their effectiveness. For a large, diverse population of 181 trifluoromethylketones (TFKs), the most potent JHE inhibitors known to date, we recently proposed a binary classification (active/inactive) using a support vector machine and Codessa structural descriptors. We have now examined, using the same data set and the same descriptors, the applicability and performance of five other machine learning approaches. These have been shown to be able to handle high-dimensional data (with possibly irrelevant or redundant descriptors) and to cope with complex mechanisms, but without delivering explicit, directly exploitable models. Splitting the data into five batches (training set 80%, test set 20%) and carrying out leave-one-out cross-validation led to good results of comparable performance, consistent with our previous support vector classifier (SVC) results. Accuracy was greater than 0.80 for all approaches. A reduced set of 15 descriptors common to all the investigated approaches showed good predictive ability (confirmed using a three-layer perceptron) and gives some clues regarding a mechanistic interpretation.
    SAR and QSAR in Environmental Research. 06/2014;
  • ABSTRACT: In cases in which experimental data on chemical-specific input parameters are lacking, chemical regulations allow the use of alternatives to testing, such as in silico predictions based on quantitative structure–property relationships (QSPRs). Such predictions are often given as point estimates; however, little is known about the extent to which uncertainties associated with QSPR predictions contribute to uncertainty in fate assessments. In the present study, QSPR-induced uncertainty in overall persistence (POV) and long-range transport potential (LRTP) was studied by integrating QSPRs into probabilistic assessments of five polybrominated diphenyl ethers (PBDEs), using the multimedia fate model Simplebox. The uncertainty analysis considered QSPR predictions of the fate input parameters: melting point, water solubility, vapor pressure, organic carbon–water partition coefficient, hydroxyl radical degradation, biodegradation, and photolytic degradation. Uncertainty in POV and LRTP was dominated by the uncertainty in direct photolysis and the biodegradation half-life in water. However, the QSPRs developed specifically for PBDEs had a relatively low contribution to uncertainty. These findings suggest that the reliability of the ranking of PBDEs on the basis of POV and LRTP can be substantially improved by developing better QSPRs to estimate degradation properties. The present study demonstrates the use of uncertainty and sensitivity analyses in nontesting strategies and highlights the need for guidance when compounds fall outside the applicability domain of a QSPR. Environ. Toxicol. Chem. 2013;32:1069–1076. © 2013 SETAC
    Environmental Toxicology and Chemistry 05/2013; 32(5). · 2.62 Impact Factor
  • ABSTRACT: Regulatory agencies worldwide are committed to the objectives of the Strategic Approach to International Chemicals Management to ensure that by 2020 chemicals are used and produced in ways that lead to the minimization of significant adverse effects on human health and the environment. Under the Government of Canada's Chemicals Management Plan, the commitment to address a large number of substances, many with limited data, has highlighted the importance of pursuing alternative hazard assessment methodologies that are able to accommodate chemicals with varying toxicological information. One such methodology is the use of (Quantitative) Structure Activity Relationship ((Q)SAR) models. The current investigation into the predictivity of 20 (Q)SAR tools designed to model bacterial reverse mutation in Salmonella typhimurium is one of the first of this magnitude to be carried out using an external validation set composed mainly of industrial chemicals which represent a diverse group of aromatic and benzidine-based azo dyes and pigments. Overall, this study highlights the value in challenging the predictivity of existing models using a small but representative subset of data-rich chemicals. Furthermore, external validation revealed that only a handful of models gave satisfactory predictions for the test chemical space. The exercise also provides insight into using the Organisation for Economic Co-operation and Development (Q)SAR Toolbox as a read-across tool.
    Journal of Environmental Science and Health Part C Environmental Carcinogenesis & Ecotoxicology Reviews 01/2014; 32(1):46-82. · 3.23 Impact Factor