-
ABSTRACT: The use of large-scale compound screening has become a key component of drug discovery projects in both the pharmaceutical and the biotechnological industries. More recently, these activities have also been embraced by the academic community as a major tool for chemical genomic activities. High-throughput screening (HTS) activities constitute a major step in the initial drug discovery efforts and involve the use of large quantities of biological reagents, hundreds of thousands to millions of compounds, and the utilization of expensive equipment. All these factors make it very important to evaluate, in advance of the HTS campaign, any potential issues related to the reproducibility of the experimentation and the quality of the results obtained at the end of these very costly activities. In this article, the authors describe how GlaxoSmithKline (GSK) has addressed the need for a true validation of the HTS process before embarking on full HTS campaigns. They present 2 different aspects of the so-called validation process: (1) optimization of the HTS workflow and its validation as a quality process and (2) the statistical evaluation of the HTS, focusing on the reproducibility of results and the ability to distinguish active from nonactive compounds in a vast collection of samples. The authors describe a variety of reproducibility indexes that are either innovative or have been adapted from generic medical diagnostic screening strategies. In addition, they exemplify how these validation tools have been implemented in a number of case studies at GSK.
Journal of Biomolecular Screening 02/2009; 14(1):66-76. · 2.05 Impact Factor
-
ABSTRACT: More than 500 compounds chosen to represent kinase inhibitor space have been screened against a panel of over 200 protein kinases. Significant results include the identification of hits against new kinases including PIM1 and MPSK1, and the expansion of the inhibition profiles of several literature compounds. A detailed analysis of the data through the use of affinity fingerprints has produced findings with implications for biological target selection, the choice of tool compounds for target validation, and lead discovery and optimization. In a detailed examination of the tyrosine kinases, interesting relationships have been found between targets and compounds. Taken together, these results show how broad cross-profiling can provide important insights to assist kinase drug discovery.
Journal of Medicinal Chemistry 12/2008; 51(24):7898-914. · 4.80 Impact Factor
-
ABSTRACT: A new machine learning method is presented for extracting interpretable structure-activity relationships from screening data. The method is based on an evolutionary algorithm and reduced graphs and aims to evolve a reduced graph query (subgraph) that is present within the active compounds and absent from the inactives. The reduced graph representation enables heterogeneous compounds, such as those found in high-throughput screening data, to be captured in a single representation with the resulting query encoding structure-activity information in a form that is readily interpretable by a chemist. The application of the method is illustrated using data sets extracted from the well-known MDDR data set and GSK in-house screening data. Queries are evolved that are consistent with the known SARs, and they are also shown to be robust when applied to independent sets that were not used in training.
Journal of Chemical Information and Modeling 08/2008; 48(8):1543-57. · 4.68 Impact Factor
-
ABSTRACT: A multiobjective evolutionary algorithm (MOEA) is described for evolving multiple structure-activity relationships (SARs). The SARs are encoded in easy-to-interpret reduced graph queries which describe features that are preferentially present in active compounds compared to inactives. The MOEA addresses a limitation associated with many machine learning methods, namely the inherent tradeoff between recall and precision, which is usually handled by combining the two objectives into a single measure with a consequent loss of control. By simultaneously optimizing recall and precision, the MOEA generates a family of SARs that lie on the precision-recall (PR) curve. The user is then able to select a query with an appropriate balance between the two objectives: for example, a low recall-high precision query may be preferred when establishing the SAR, whereas a high recall-low precision query may be more appropriate in a virtual screening context. Each query on the PR curve aims at capturing the structure-activity information in a single representation, and each can be considered as an alternative (equally valid) solution. We then investigate combining individual queries into teams with the aim of capturing multiple SARs that may exist in a data set, for example, as is commonly seen in high-throughput screening data sets. Team formation is carried out iteratively as a postprocessing step following the evolution of the individual queries. The inclusion of uniqueness as a third objective within the MOEA provides an effective way of ensuring the queries are complementary in the active compounds they describe. Substantial improvements in both recall and precision are seen for some data sets. Furthermore, the resulting queries provide more detailed structure-activity information than is present in a single query.
Journal of Chemical Information and Modeling 08/2008; 48(8):1558-70. · 4.68 Impact Factor
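The idea in the abstract above of keeping a family of queries along the precision-recall curve, rather than a single combined score, can be illustrated with a small non-dominated filter. This is only a sketch under stated assumptions: the function name `pareto_front` and the example scores are hypothetical, and it shows the selection of trade-off solutions only, not the evolutionary algorithm or the reduced graph queries from the paper.

```python
from typing import List, Tuple

Score = Tuple[float, float]  # (recall, precision) for one candidate query


def pareto_front(scores: List[Score]) -> List[Score]:
    """Return the (recall, precision) pairs that no other pair dominates.

    A pair dominates another if it is at least as good on both objectives
    and strictly better on at least one of them.
    """
    front = []
    for i, (recall_i, precision_i) in enumerate(scores):
        dominated = any(
            recall_j >= recall_i
            and precision_j >= precision_i
            and (recall_j > recall_i or precision_j > precision_i)
            for j, (recall_j, precision_j) in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append((recall_i, precision_i))
    # Sort by recall so the result reads like points along a PR curve.
    return sorted(front)


# Hypothetical scores for four candidate queries (illustrative values only).
candidates = [(0.2, 0.9), (0.5, 0.6), (0.4, 0.5), (0.9, 0.2)]
front = pareto_front(candidates)
```

Here the pair (0.4, 0.5) is dropped because (0.5, 0.6) is better on both objectives; the remaining three represent different balances between recall and precision from which a user could choose.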
-
ABSTRACT: Data mining is a fast-growing field that is finding application across a wide range of industries. HTS is a crucial part of the drug discovery process at most large pharmaceutical companies. Accurate analysis of HTS data is, therefore, vital to drug discovery. Given the large quantity of data generated during an HTS, and the importance of analyzing those data effectively, it is unsurprising that data-mining techniques are now increasingly applied to HTS data analysis. Taking a broad view of both the HTS process and the data-mining process, we review recent literature that describes the application of data-mining techniques to HTS data.
Drug Discovery Today 09/2006; 11(15-16):694-9. · 6.83 Impact Factor
-
Journal of Chemical Information and Modeling. 01/2004; 44:2145-2156.
-
ABSTRACT: This paper describes the development of a drug rings database and Web-based search tools. The database contains ring structures from both corporate and commercial databases, along with characteristic descriptors including frequency of occurrence as an indicator of synthetic accessibility and calculated property and geometric parameters. Analysis of the rings in several major databases is described, with illustrations of applications of the database in lead discovery programs where bioisosteres and geometric isosteres are sought.
Journal of Medicinal Chemistry 08/2003; 46(15):3257-74. · 5.25 Impact Factor
-
Journal of Chemical Information and Computer Sciences. 01/2001; 41:1295-1300.
-
ABSTRACT: Reduced graph representations of chemical structures have been shown to be effective in similarity searching applications where they offer comparable performance to other 2D descriptors in terms of recall experiments. They have also been shown to complement existing descriptors and to offer potential to scaffold hop from one chemical series to another. Various methods have been developed for quantifying the similarity between reduced graphs including fingerprint approaches, graph matching, and an edit distance method. The edit distance approach quantifies the degree of similarity of two reduced graphs based on the number and type of operations required to convert one graph to the other. An attractive feature of the edit distance method is the ability to assign different weights to different operations. For example, the mutation of an aromatic ring node to an acyclic node may be assigned a higher weight than the mutation of an aromatic ring to an aliphatic ring node. In this paper, we describe a genetic algorithm (GA) for training the weights of the different edit distance operations. The method is applied to specific activity classes extracted from the MDDR database to derive activity-class specific weights. The GA-derived weights give substantially improved results in recall experiments as compared to using weights assigned on intuition. Furthermore, such activity specific weights may provide useful structure-activity information for subsequent design efforts. In a virtual screening setting when few active compounds are known, it may be more useful to have weights that perform well across a variety of different activity classes. Thus, the GA is also trained on multiple activity classes simultaneously to derive a generalized set of weights. These more generally applicable weights also represent a substantial improvement on previous work.
Journal of Chemical Information and Modeling 46(2):577-86. · 4.68 Impact Factor
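The weighted edit distance described in the abstract above can be sketched in a simplified form. The code below is an illustration under strong assumptions, not the published method: it treats a reduced graph as a linear sequence of node types instead of a graph, and the node-type names, the weight values, and the function names are all hypothetical. It only shows how assigning different weights to different operations changes the resulting distance.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical operation weights, chosen only for illustration. Mutating an
# aromatic ring to an acyclic node costs more than mutating it to an
# aliphatic ring, mirroring the example given in the abstract.
SUBSTITUTION_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("aromatic_ring", "aliphatic_ring"): 0.5,
    ("aromatic_ring", "acyclic"): 2.0,
    ("aliphatic_ring", "acyclic"): 1.0,
}
INDEL_WEIGHT = 1.0  # assumed cost of inserting or deleting one node


def substitution_cost(a: str, b: str) -> float:
    """Cost of mutating node type a into b (symmetric, 0 if identical)."""
    if a == b:
        return 0.0
    return SUBSTITUTION_WEIGHTS.get(
        (a, b), SUBSTITUTION_WEIGHTS.get((b, a), 1.0)
    )


def weighted_edit_distance(source: Sequence[str], target: Sequence[str]) -> float:
    """Dynamic-programming edit distance over sequences of node types."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * INDEL_WEIGHT  # delete every source node
    for j in range(1, cols):
        dist[0][j] = j * INDEL_WEIGHT  # insert every target node
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + INDEL_WEIGHT,  # deletion
                dist[i][j - 1] + INDEL_WEIGHT,  # insertion
                dist[i - 1][j - 1]
                + substitution_cost(source[i - 1], target[j - 1]),  # mutation
            )
    return dist[-1][-1]


# Two hypothetical node sequences that differ only in the ring type.
d = weighted_edit_distance(
    ["aromatic_ring", "acyclic"], ["aliphatic_ring", "acyclic"]
)
```

In a setting like the one the abstract describes, the entries of the weight table would be the parameters that a genetic algorithm tunes; here they are fixed by hand.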