-
ABSTRACT: A compound pathway model is introduced to monitor SAR progression in compound data sets. Pathways are formed by sequences of structurally analogous compounds with stepwise increasing potency that ultimately yield highly potent compounds. Hence, the model was designed to mimic compound optimization efforts. Different pathway categories were defined. Pathways originating from any active compound in a data set, including compounds forming activity cliffs, were systematically identified. The relative frequency of activity cliff-dependent and -independent pathways was determined and compared. In 23 of 39 different compound data sets that qualified for our analysis, significant differences in the relative frequency of activity cliff-dependent and -independent pathways were observed. In 17 of these 23 data sets, activity cliff-dependent pathways occurred with higher relative frequency than cliff-independent pathways. In addition, pathways originating from the majority of activity cliff compounds displayed desired SAR progression, reflecting SAR information gain associated with activity cliffs.
Journal of Chemical Information and Modeling 04/2013 · 4.68 Impact Factor
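The pathway criteria described above can be sketched as a depth-first search over an analog network. This is a minimal illustration, not the authors' implementation: the analog adjacency map, the potency scale (e.g., pIC50 values), and the hypothetical parameters `high_potency` and `min_step` are assumptions chosen for the example.

```python
from typing import Dict, List, Set


def find_pathways(
    start: str,
    analogs: Dict[str, Set[str]],
    potency: Dict[str, float],
    high_potency: float = 8.0,  # hypothetical cutoff for "highly potent"
    min_step: float = 0.0,      # hypothetical minimum potency gain per step
) -> List[List[str]]:
    """Enumerate analog sequences starting at `start` in which potency
    increases at every step and that end at a highly potent compound."""
    pathways: List[List[str]] = []

    def extend(path: List[str]) -> None:
        current = path[-1]
        # Record the pathway once it reaches a highly potent compound.
        if len(path) > 1 and potency[current] >= high_potency:
            pathways.append(list(path))
            return
        for nxt in sorted(analogs.get(current, set())):
            # Only move to more potent analogs; because potency strictly
            # increases along a path, no compound can appear twice.
            if potency[nxt] - potency[current] > min_step:
                path.append(nxt)
                extend(path)
                path.pop()

    extend([start])
    return pathways


# Toy analog network with hypothetical potency values (pIC50).
analogs = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
potency = {"A": 5.0, "B": 6.2, "C": 4.8, "D": 8.1}
print(find_pathways("A", analogs, potency))  # [['A', 'B', 'D']]
print(find_pathways("C", analogs, potency))  # [['C', 'A', 'B', 'D'], ['C', 'D']]
```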
-
ABSTRACT: Activity cliffs are defined as pairs of structurally similar compounds with a significant difference in potency. These compound pairs have high SAR information content because they represent small structural changes leading to large potency alterations. Accordingly, activity cliffs are of prime interest for SAR exploration and compound optimization. It is currently unknown to what extent activity cliff information is utilized in practical medicinal chemistry. Therefore, we have assembled 56 compound data sets that evolved over time and searched for analogs of activity cliff-forming compounds with further increased potency. For ~75% of all activity cliffs, there was no evidence for further chemical exploration. For ~25% of all cliffs, potency progression was detected. In total, for ~15% of all activity cliffs, positive cliff progression was observed that often involved multiple analogs. Given these findings, chemically unexplored activity cliffs should provide significant opportunities for further study in medicinal chemistry.
Journal of Medicinal Chemistry 03/2013 · 4.80 Impact Factor
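A minimal sketch of the two steps described above: detecting activity cliffs, then searching for analogs of the more potent cliff partner with further increased potency. Fingerprints are represented as sets of feature identifiers compared by Tanimoto similarity; the similarity threshold of 0.7 and the potency difference of at least two log units (100-fold) are assumed criteria for illustration, not necessarily those used in the study.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient between two sets of fingerprint features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def activity_cliffs(
    fps: Dict[str, Fingerprint],
    potency: Dict[str, float],
    sim_threshold: float = 0.7,  # assumed structural similarity criterion
    min_delta: float = 2.0,      # assumed potency difference in log units
) -> List[Tuple[str, str]]:
    """Return cliff pairs ordered as (less potent, more potent)."""
    cliffs = []
    for x, y in combinations(sorted(fps), 2):
        similar = tanimoto(fps[x], fps[y]) >= sim_threshold
        if similar and abs(potency[x] - potency[y]) >= min_delta:
            cliffs.append((x, y) if potency[x] < potency[y] else (y, x))
    return cliffs


def progression_analogs(
    cliff: Tuple[str, str],
    fps: Dict[str, Fingerprint],
    potency: Dict[str, float],
    sim_threshold: float = 0.7,
) -> List[str]:
    """Analogs of the more potent cliff partner with even higher potency."""
    _, best = cliff
    return sorted(
        c for c in fps
        if c not in cliff
        and tanimoto(fps[c], fps[best]) >= sim_threshold
        and potency[c] > potency[best]
    )


fps = {
    "a": frozenset({1, 2, 3, 4}),
    "b": frozenset({1, 2, 3, 4, 5}),
    "c": frozenset({1, 2, 3, 4, 5, 6}),
    "d": frozenset({7, 8, 9}),
}
potency = {"a": 5.0, "b": 7.5, "c": 8.0, "d": 9.0}
cliffs = activity_cliffs(fps, potency)
print(cliffs)                                        # [('a', 'b')]
print(progression_analogs(cliffs[0], fps, potency))  # ['c']
```

The sketch ignores the time dimension of the evolving data sets; a fuller version would restrict candidate analogs to compounds reported after the cliff was formed.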
-
ABSTRACT: Using support vector machine (SVM) ranking, a complex multi-class prediction task was investigated involving sets of compounds that were active against closely related targets and represented all possible combinations of single-, dual-, and triple-target activities. Standard SVM models were not capable of differentiating compounds with overlapping yet distinct activity profiles. To address this problem, we designed differentially weighted SVM linear combinations that were found to preferentially detect compounds with desired activity profiles and deprioritize others. Hence, combining independently derived SVM models using negative and positive linear weighting factors balanced relative contributions from individual reference sets and successfully distinguished between compounds with overlapping activity profiles.
Journal of Chemical Information and Modeling 03/2013 · 4.68 Impact Factor
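The weighting idea can be illustrated with a short sketch that assumes each compound has already been scored by independently trained SVM models, one per target reference set (for example, signed decision values). The target names, scores, and weights below are hypothetical; positive weights reward activity against desired targets and negative weights penalize activity against undesired ones.

```python
from typing import Dict, List, Mapping


def combined_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted linear combination of per-target SVM decision values."""
    return sum(w * scores[target] for target, w in weights.items())


def rank_compounds(
    model_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[str]:
    """Rank compounds by combined score, highest first."""
    return sorted(
        model_scores,
        key=lambda cpd: combined_score(model_scores[cpd], weights),
        reverse=True,
    )


# Desired profile (hypothetical): active against T1 and T2, inactive against T3.
weights = {"T1": 1.0, "T2": 1.0, "T3": -1.0}
model_scores = {
    "triple": {"T1": 1.1, "T2": 1.0, "T3": 1.3},
    "single_T1": {"T1": 1.0, "T2": -0.6, "T3": -0.7},
    "dual_T1_T2": {"T1": 1.2, "T2": 0.9, "T3": -0.5},
}
print(rank_compounds(model_scores, weights))
# ['dual_T1_T2', 'single_T1', 'triple']
```

With all three weights set to +1.0, the triple-target compound would rank first instead (combined score 3.4), which illustrates why a negative weight is needed to deprioritize compounds whose profile overlaps the desired one.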
-
ABSTRACT: We provide a future perspective on the virtual screening field. We highlight a number of challenges that virtual screening will likely face as compound data continue to grow at or beyond current rates and as much more target information becomes available. These challenges go beyond computational efficiency issues (which will, of course, also play a critical role). For example, for structure-based approaches, the accuracy of scoring functions and energy calculations will need to be improved. For ligand-based approaches, the compound class dependence of similarity methods needs to be further explored, and relationships between molecular similarity and activity similarity need to be established. We also comment on the current and future value of virtual screening. Opportunities for further development in a postgenome era are also discussed. It is hoped that some of the views and hypotheses we articulate might stimulate further discussion about the virtual screening field going forward.
Chemical Biology & Drug Design 01/2013; 81(1):33-40. · 2.28 Impact Factor
-
ABSTRACT: Activity cliffs are formed by pairs of structurally similar compounds that act against the same target but display a significant difference in potency. Such activity cliffs are the most prominent features of activity landscapes of compound data sets and a primary focal point of structure-activity relationship (SAR) analysis. The search for activity cliffs in various compound sets has been the topic of a number of previous investigations. So far, activity cliff analysis has concentrated on data mining for activity cliffs and on their graphical representation and has thus been descriptive in nature. By contrast, approaches for activity cliff prediction are currently not available. We have derived support vector machine (SVM) models to successfully predict activity cliffs. A key aspect of the approach has been the design of new kernels to enable SVM classification on the basis of molecule pairs, rather than individual compounds. In test calculations on different data sets, activity cliffs have been accurately predicted using specifically designed structural representations and kernel functions.
Journal of Chemical Information and Modeling 08/2012; 52(9):2354-65. · 4.68 Impact Factor
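To show how a kernel can operate on compound pairs rather than individual compounds, here is a sketch of one well-known pairwise construction, the tensor product pairwise kernel, built on a Tanimoto base kernel. This is a generic example and an assumption for illustration; the kernels designed in the study may differ.

```python
from typing import FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]
Pair = Tuple[Fingerprint, Fingerprint]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto kernel between two binary fingerprints given as feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pair_kernel(p: Pair, q: Pair) -> float:
    """Tensor product pairwise kernel. It is symmetric in the order within each
    pair, so (A, B) and (B, A) are treated as the same compound pair."""
    (a, b), (c, d) = p, q
    return tanimoto(a, c) * tanimoto(b, d) + tanimoto(a, d) * tanimoto(b, c)


def gram_matrix(pairs: List[Pair]) -> List[List[float]]:
    """Kernel matrix over compound pairs."""
    return [[pair_kernel(p, q) for q in pairs] for p in pairs]


x = frozenset({1, 2, 3})
y = frozenset({1, 2, 4})
z = frozenset({5, 6})
pairs = [(x, y), (y, x), (x, z)]
print(gram_matrix(pairs))
# [[1.25, 1.25, 0.0], [1.25, 1.25, 0.0], [0.0, 0.0, 1.0]]
```

A matrix like this can be passed to scikit-learn's `SVC(kernel="precomputed")` for training, with the kernel values between test and training pairs supplied at prediction time.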
-
ABSTRACT: A large-scale similarity search investigation has been carried out on 266 well-defined compound activity classes extracted from the ChEMBL database. The analysis was performed using two widely applied two-dimensional (2D) fingerprints that mark opposite ends of the current performance spectrum of these types of fingerprints, i.e., MACCS structural keys and the extended connectivity fingerprint with bond diameter four (ECFP4). For each fingerprint, three nearest neighbor search strategies were applied. On the basis of these search calculations, a similarity search profile of the ChEMBL database was generated. Overall, the fingerprint search campaign was surprisingly successful. In 203 of 266 test cases (∼76%), a compound recovery rate of at least 50% was observed with at least the better performing fingerprint and one search strategy. The similarity search profile also revealed several general trends. For example, fingerprint searching was often characterized by an early enrichment of active compounds in database selection sets. In addition, compound activity classes have been categorized according to different similarity search performance levels, which helps to put the results of benchmark calculations into perspective. Therefore, a compendium of activity classes falling into different search performance categories is provided. On the basis of our large-scale investigation, the performance range of state-of-the-art 2D fingerprinting has been delineated for compound data sets directed against a wide spectrum of pharmaceutical targets.
Journal of Chemical Information and Modeling 08/2011; 51(8):1831-9. · 4.68 Impact Factor
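The search-and-evaluate protocol can be sketched as follows: database compounds are ranked by similarity to a set of reference compounds, and the recovery rate is the fraction of active database compounds found in the top-ranked selection set. Representing fingerprints as feature sets compared by Tanimoto similarity, and scoring each compound by its mean similarity to the k most similar references, are assumptions for illustration; with k = 1 this reduces to the familiar nearest neighbor (maximum similarity) strategy. The study's three search strategies are not reproduced here.

```python
from typing import Dict, FrozenSet, List, Set

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient between two sets of fingerprint features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_score(query: Fingerprint, refs: List[Fingerprint], k: int = 1) -> float:
    """Mean Tanimoto similarity to the k most similar reference compounds."""
    sims = sorted((tanimoto(query, r) for r in refs), reverse=True)[:k]
    return sum(sims) / len(sims)


def recovery_rate(
    database: Dict[str, Fingerprint],
    actives: Set[str],
    refs: List[Fingerprint],
    n_select: int,
    k: int = 1,
) -> float:
    """Fraction of active database compounds ranked within the top n_select."""
    ranked = sorted(
        database, key=lambda cid: knn_score(database[cid], refs, k), reverse=True
    )
    return len(set(ranked[:n_select]) & actives) / len(actives)


refs = [frozenset({1, 2, 3, 4})]
database = {
    "act1": frozenset({1, 2, 3, 5}),
    "act2": frozenset({1, 2, 6, 7}),
    "dec1": frozenset({8, 9}),
    "dec2": frozenset({1, 10, 11}),
}
actives = {"act1", "act2"}
print(recovery_rate(database, actives, refs, n_select=1))  # 0.5
print(recovery_rate(database, actives, refs, n_select=2))  # 1.0
```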
-
ABSTRACT: In independent studies it has previously been demonstrated that two-dimensional (2D) fingerprints have scaffold hopping ability in virtual screening, although these descriptors primarily emphasize structural and/or topological resemblance of reference and database compounds. However, the mechanism by which such fingerprints enrich structurally diverse molecules in database selection sets is currently little understood. In order to address this question, similarity search calculations on 120 compound activity classes of varying structural diversity were carried out using atom environment fingerprints. Two feature selection methods, Kullback-Leibler divergence and gain ratio analysis, were applied to systematically reduce these fingerprints and generate alternative versions for searching. Gain ratio is a feature selection method from information theory that has thus far not been considered in fingerprint analysis. However, it is shown here to be an effective fingerprint feature selection approach. Following comparative feature selection and similarity searching, the compound recall characteristics of original and reduced fingerprint versions were analyzed in detail. Small sets of fingerprint features were found to distinguish subsets of active compounds from other database molecules. The compound recall of fingerprint similarity searching often resulted from a cumulative detection of distinct compound subsets by different fingerprint features, which provided a rationale for the scaffold hopping potential of these 2D fingerprints.
Journal of Chemical Information and Modeling 08/2011; 51(9):2254-65. · 4.68 Impact Factor
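The two feature selection criteria named above can be sketched for binary fingerprint features. Gain ratio is the information gain of a feature with respect to the active/inactive label, divided by the feature's own entropy (its split information). For Kullback-Leibler divergence, the sketch compares a feature's frequency among actives with its frequency in the database as two Bernoulli distributions; this per-feature formulation and the smoothing constant `eps` are assumptions, not necessarily the study's exact definitions.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def _label_entropy(labels: Sequence[int]) -> float:
    if not labels:
        return 0.0
    frac_active = sum(labels) / len(labels)
    return entropy([frac_active, 1.0 - frac_active])


def gain_ratio(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Information gain of a binary feature for binary labels, normalized by
    the feature's split information."""
    n = len(labels)
    on = [lab for bit, lab in zip(feature, labels) if bit]
    off = [lab for bit, lab in zip(feature, labels) if not bit]
    conditional = len(on) / n * _label_entropy(on) + len(off) / n * _label_entropy(off)
    split_info = entropy([len(on) / n, len(off) / n])
    if split_info == 0:  # constant feature carries no information
        return 0.0
    return (_label_entropy(labels) - conditional) / split_info


def kl_bernoulli(p: float, q: float, eps: float = 1e-6) -> float:
    """KL divergence (bits) between Bernoulli(p) for actives and Bernoulli(q)
    for the database; probabilities are clipped to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))


def top_features(matrix: List[List[int]], labels: List[int], k: int) -> List[int]:
    """Indices of the k features with the highest gain ratio (rows = compounds)."""
    n_features = len(matrix[0])
    scores = [gain_ratio([row[j] for row in matrix], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


matrix = [[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 1]]
labels = [1, 1, 0, 0]
print(top_features(matrix, labels, 1))    # [0]
print(round(kl_bernoulli(0.8, 0.2), 3))   # 1.2
```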
-
ABSTRACT: Support vector machine modeling has become increasingly popular in chemoinformatics. Recently, several advanced support vector machine applications have been reported including, among others, multitask learning for ligand-target prediction. Here, we introduce another support vector machine approach to add compound potency information to similarity searching and enrich database selection sets with potent hits. For this purpose, we introduce a structure-activity kernel function and a potency-oriented support vector machine linear combination approach. Using fingerprint descriptors, potency-directed support vector machine searching has been successfully applied to four high-throughput screening data sets, and different support vector machine strategies have been compared. For potency-balanced compound reference sets, potency-directed support vector machine searching meets or exceeds recall rates of standard support vector machine calculations but detects many more potent hits.
Chemical Biology & Drug Design 01/2011; 77(1):30-8. · 2.28 Impact Factor