An in silico ensemble method for lead discovery: decision forest.

Z-Tech at National Center for Toxicological Research, U.S. Food and Drug Administration, Division of Bioinformatics, Jefferson, AR 72079, USA.
SAR and QSAR in Environmental Research (Impact Factor: 1.92). 09/2005; 16(4):339-47. DOI: 10.1080/10659360500203022
Source: PubMed

ABSTRACT Recent progress in combinatorial chemistry and parallel synthesis has radically changed the approach to drug discovery in the pharmaceutical industry. At present, thousands of compounds can be made in a short period, creating a need for fast and effective in silico methods to select the most promising lead candidates. Decision forest is a novel pattern recognition method, which combines the results of multiple distinct but comparable decision tree models to reach a consensus prediction. In this article, a decision forest model was developed using a structurally diverse training data set containing 232 compounds whose estrogen receptor binding activity was tested at the U.S. Food and Drug Administration (FDA)'s National Center for Toxicological Research (NCTR). The model was subsequently validated using a test data set of 463 compounds selected from the literature, and then applied to a large data set with 57,145 compounds as a screening example. The results show that the decision forest method is a fast, reliable and effective in silico approach, which could be useful in drug discovery.
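The consensus idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-split trees, the rule that each tree uses a different descriptor column, the averaging of leaf probabilities, the 0.5 cutoff, and the toy data are all assumptions made for illustration, and every function name is hypothetical.

```python
from statistics import mean


def fit_stump(X, y, feature):
    """Fit a one-split decision tree on a single descriptor column.

    Returns (feature, threshold, p_left, p_right), where p_left and p_right
    are the fractions of active (label 1) training compounds in each leaf.
    """
    values = sorted({row[feature] for row in X})
    # Candidate thresholds: midpoints between consecutive distinct values,
    # so both leaves are always non-empty.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        left = [label for row, label in zip(X, y) if row[feature] <= t]
        right = [label for row, label in zip(X, y) if row[feature] > t]
        # Training errors if each leaf predicts its majority class.
        errors = sum(min(sum(s), len(s) - sum(s)) for s in (left, right))
        if best is None or errors < best[0]:
            best = (errors, (feature, t, mean(left), mean(right)))
    if best is None:
        # Constant column: both leaves fall back to the overall active rate.
        p = mean(y)
        return (feature, values[0], p, p)
    return best[1]


def predict_tree(stump, row):
    """Return the active-class probability of the leaf that `row` falls in."""
    feature, threshold, p_left, p_right = stump
    return p_left if row[feature] <= threshold else p_right


def fit_forest(X, y, features):
    """Build one tree per descriptor so that the trees are distinct (assumption)."""
    return [fit_stump(X, y, f) for f in features]


def predict_forest(forest, row, cutoff=0.5):
    """Consensus prediction: average the tree probabilities, then threshold."""
    p = mean([predict_tree(stump, row) for stump in forest])
    return p, int(p >= cutoff)


# Toy data (invented): rows are compounds, columns are three descriptors,
# labels are 1 for binders and 0 for non-binders.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 4.8, 0.1],
    [0.9, 5.5, 0.3],
    [3.0, 1.0, 0.9],
    [3.2, 1.2, 0.8],
    [2.9, 0.7, 0.7],
]
y = [0, 0, 0, 1, 1, 1]

forest = fit_forest(X, y, features=[0, 1, 2])
agree_p, agree_label = predict_forest(forest, [3.1, 0.9, 0.85])
split_p, split_label = predict_forest(forest, [1.0, 5.0, 0.8])
```

In the second query the trees disagree, and the averaged probability decides the call. Averaging is only one possible combination rule; majority voting would be an equally simple alternative.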


Available from: Roger G Perkins, Jun 02, 2015
  • ABSTRACT: Importance of the field: Virtual screening (VS) coupled with structural biology is an important approach for increasing the number and success of projects in the lead identification stage of the drug discovery process. Recent advances and future directions in estrogen therapy have created great demand for identifying potential estrogen receptor (ER) modulators with greater activity and selectivity. Areas covered in this review: This review presents the current state of the art in VS and the structure-activity relationships of recently discovered ER modulators, and discusses the strengths and weaknesses of the technology. What the reader will gain: Readers will gain an overview of the current platforms for in silico screening in the discovery of ER modulators; they will learn which structural information is significantly correlated with the bioactivity of ER modulators and what novel strategies should be considered for creating more effective chemical structures. Take home message: With the goal of reducing toxicity and/or improving efficacy, challenges to the successful modeling of endocrine agents are proposed, providing new paradigms for the design of ER inhibitors.
    Expert Opinion on Drug Discovery 01/2010; 5(1):21-31. DOI:10.1517/17460440903490395 · 3.47 Impact Factor
  • ABSTRACT: Toxicity databases have a special role in predictive toxicology, providing ready access to historical information throughout the workflow of discovery, development, and product safety processes in drug development as well as in review by regulatory agencies. To provide accurate information within a hypothesis-building environment, the content of the databases needs to be rigorously modeled using standards and controlled vocabulary. The utilitarian purposes of databases vary widely, ranging from a source of (Q)SAR datasets for modelers to a basis for "read-across" for regulators. Many tasks involved in the use of databases are closely tied to data mining; hence databases and data mining are an essential technology pair. To understand chemically induced toxicity, chemical structures must be integrated into the toxicity databases. Data mining these "structure-integrated toxicity databases" requires techniques for handling both chemical structures and textual toxicity information. Structure data mining is similar, with some modifications, to that conventionally employed for large chemical databases, while data mining of toxicity endpoints is not well developed. This review presents a general strategy for data mining structure-integrated toxicity databases to link chemical structures to biological endpoints. Iterative probing of the chemical domain with toxicity endpoint descriptors and of the biological domain with chemical descriptors enables linking of the two domains. Data mining steps to elucidate the hidden relationships between target organs and chemical classes are presented as an example. Work is in progress in the public domain toward linking chemistry to biology by providing databases that can be mined.
    Current Computer-Aided Drug Design 05/2006; 2(2):135-150. DOI:10.2174/157340906777441672 · 1.94 Impact Factor
  • ABSTRACT: Juvenile hormone esterase (JHE) plays a key role in the development and metamorphosis of holometabolous insects. Its inhibitors could possibly be targeted for insect control. Conversely, JHE may also be involved in endocrine disruption by xenobiotics, resulting in detrimental effects in beneficial insects. There is therefore a need to know the structural characteristics of the molecules able to monitor JHE activity, and to develop SAR and QSAR studies to estimate their effectiveness. For a large, diverse population of 181 trifluoromethylketones (TFKs), the most potent JHE inhibitors known to date, we recently proposed a binary classification (active/inactive) using a support vector machine and Codessa structural descriptors. We have now examined, using the same data set and the same descriptors, the applicability and performance of five other machine learning approaches. These have been shown to handle high-dimensional data (with possibly irrelevant or redundant descriptors) and to cope with complex mechanisms, but without delivering explicit, directly exploitable models. Splitting the data into five batches (training set 80%, test set 20%) and carrying out leave-one-out cross-validation led to good results of comparable performance, consistent with our previous support vector classifier (SVC) results. Accuracy was greater than 0.80 for all approaches. A reduced set of 15 descriptors common to all the investigated approaches showed good predictive ability (confirmed using a three-layer perceptron) and gives some clues regarding a mechanistic interpretation.
    SAR and QSAR in Environmental Research 06/2014; 25(7):1-28. DOI:10.1080/1062936X.2014.919959 · 1.92 Impact Factor