An in silico ensemble method for lead discovery: Decision forest

Z-Tech at National Center for Toxicological Research, U.S. Food and Drug Administration, Division of Bioinformatics, Jefferson, AR 72079, USA.
SAR and QSAR in Environmental Research (Impact Factor: 1.6). 09/2005; 16(4):339-47. DOI: 10.1080/10659360500203022
Source: PubMed


Recent progress in combinatorial chemistry and parallel synthesis has radically changed the approach to drug discovery in the pharmaceutical industry. At present, thousands of compounds can be made in a short period, creating a need for fast and effective in silico methods to select the most promising lead candidates. Decision forest is a novel pattern recognition method, which combines the results of multiple distinct but comparable decision tree models to reach a consensus prediction. In this article, a decision forest model was developed using a structurally diverse training data set containing 232 compounds whose estrogen receptor binding activity was tested at the U.S. Food and Drug Administration (FDA)'s National Center for Toxicological Research (NCTR). The model was subsequently validated using a test data set of 463 compounds selected from the literature, and then applied to a large data set with 57,145 compounds as a screening example. The results show that the decision forest method is a fast, reliable and effective in silico approach, which could be useful in drug discovery.
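The consensus idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each tree is built on its own descriptor, that each tree is reduced to a single split, and that the consensus is the mean of the per-tree probabilities. The abstract specifies none of these details. The descriptor values, labels, and function names below are hypothetical.

```python
from statistics import mean

def fit_stump(X, y, feature):
    """Fit a one-split tree on one descriptor (a simplification of a full tree)."""
    values = sorted({row[feature] for row in X})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between adjacent values
        left = [lab for row, lab in zip(X, y) if row[feature] <= t]
        right = [lab for row, lab in zip(X, y) if row[feature] > t]
        # Misclassifications if each side predicts its majority class.
        errors = (min(sum(left), len(left) - sum(left))
                  + min(sum(right), len(right) - sum(right)))
        if best is None or errors < best[0]:
            best = (errors, t, mean(left), mean(right))
    if best is None:  # descriptor is constant: fall back to the base rate
        return {"feature": feature, "threshold": values[0],
                "p_left": mean(y), "p_right": mean(y)}
    _, t, p_left, p_right = best
    return {"feature": feature, "threshold": t,
            "p_left": p_left, "p_right": p_right}

def tree_probability(tree, row):
    """Fraction of active training compounds in the leaf the row falls into."""
    if row[tree["feature"]] <= tree["threshold"]:
        return tree["p_left"]
    return tree["p_right"]

def forest_probability(forest, row):
    """Consensus score: mean of the individual tree probabilities (assumed rule)."""
    return mean(tree_probability(tree, row) for tree in forest)

def forest_predict(forest, row, cutoff=0.5):
    """Call a compound active (1) when the consensus score exceeds the cutoff."""
    return int(forest_probability(forest, row) > cutoff)

# Hypothetical toy data: 6 compounds, 3 descriptors, 1 = binder, 0 = non-binder.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 4.5, 0.9],
    [0.3, 5.5, 0.2],
    [0.8, 1.0, 0.8],
    [0.9, 1.5, 0.1],
    [0.7, 0.5, 0.3],
]
y = [0, 0, 0, 1, 1, 1]

# One tree per descriptor, so the trees are distinct but comparable.
forest = [fit_stump(X, y, feature) for feature in range(3)]

if __name__ == "__main__":
    for compound in ([0.85, 1.2, 0.5], [0.15, 5.2, 0.95]):
        print(compound, forest_probability(forest, compound),
              forest_predict(forest, compound))
```

In this sketch the third descriptor separates the classes only imperfectly, yet the consensus score for a new compound still rests mostly on the two trees that separate them well. Screening a large library would then amount to calling `forest_predict` on each compound's descriptor vector.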



Available from: Roger G Perkins
  • ABSTRACT: Toxicity databases have a special role in predictive toxicology, providing ready access to historical information throughout the workflow of discovery, development, and product safety processes in drug development as well as in review by regulatory agencies. To provide accurate information within a hypothesis-building environment, the content of the databases needs to be rigorously modeled using standards and controlled vocabulary. The utilitarian purposes of databases widely vary, ranging from a source for (Q)SAR datasets for modelers to a basis for "read-across" for regulators. Many tasks involved in the use of databases are closely tied to data mining, hence databases and data mining are essential technology pairs. To understand chemically-induced toxicity, chemical structures must be integrated into the toxicity databases. Data mining these "structure-integrated toxicity databases" requires techniques for handling both chemical structures and textual toxicity information. Structure data mining is similar, with some modifications, to that conventionally employed for large chemical databases, while data mining of toxicity endpoints is not well developed. This review presents a general strategy to data mine structure-integrated toxicity databases to link chemical structures to biological endpoints. Iterative probing of the chemical domain with toxicity endpoint descriptors and the biological domain with chemical descriptors enables linking of the two domains. Data mining steps to elucidate the hidden relationships between the target organs and chemical classes are presented as an example. Work is in progress in the public domain toward the linking of chemistry to biology by providing databases that can be mined.
    No preview · Article · May 2006 · Current Computer-Aided Drug Design
  • ABSTRACT: Although the literature is replete with QSAR models developed for many toxic effects caused by reversible chemical interactions, the development of QSARs for the toxic effects of reactive chemicals lacks a consistent approach. While limitations exist, an appropriate starting-point for modeling reactive toxicity is the applicability of the general rules of organic chemical reactions and the association of these reactions to cellular targets of importance in toxicology. The identification of plausible "molecular initiating events" based on covalent reactions with nucleophiles in proteins and DNA provides the unifying concept for a framework for reactive toxicity. This paper outlines the proposed framework for reactive toxicity. Empirical measures of the chemical reactivity of xenobiotics with a model nucleophile (thiol) are used to simulate the relative rates at which a reactive chemical is likely to bind irreversibly to cellular targets. These measures of intrinsic reactivity serve as correlates to a variety of toxic effects; what is more, they appear to be more appropriate endpoints for QSAR modeling than the toxicity endpoints themselves.
    No preview · Article · Sep 2006 · SAR and QSAR in Environmental Research
  • ABSTRACT: A number of xenobiotics, by mimicking natural hormones, can disrupt crucial functions in wildlife and humans. These chemicals, termed endocrine disruptors, are able to exert adverse effects through a variety of mechanisms. Fortunately, there is a growing interest in the study of these structurally diverse chemicals, mainly through research programs based on in vitro and in vivo experimentation but also by means of SAR and QSAR models. The goal of our study was to retrieve from the literature all the papers dealing with structure-activity models on endocrine disruptor xenobiotics. A critical analysis of these models was made, focusing our attention on the quality of the biological data, the significance of the molecular descriptors and the validity of the statistical tools used for deriving the models. The predictive power and domain of application of these models were also discussed.
    No preview · Article · Sep 2006 · SAR and QSAR in Environmental Research