Kinome-wide Activity Modeling from Diverse Public High-Quality Data Sets

Journal of Chemical Information and Modeling (Impact Factor: 3.74). 12/2012; 53(1). DOI: 10.1021/ci300403k
Source: PubMed


Large corpora of kinase small-molecule inhibitor data are accessible to public sector research from thousands of journal articles and patent publications. These data have been generated by numerous laboratories using a wide variety of assay methodologies and experimental procedures. Here we ask how applicable these heterogeneous datasets are for predicting kinase activities and which characteristics of the datasets contribute to their utility. We accessed almost 500,000 molecules from the Kinase Knowledge Base (KKB) and, after rigorous aggregation and standardization, generated over 180 distinct datasets covering all major groups of the human kinome. To assess the value of the datasets, we generated hundreds of classification and regression models. Rigorous cross-validation and characterization of these models demonstrated highly predictive classification and quantitative models for the majority of kinase targets, provided that a minimum required number of active compounds or structure-activity data points was available. We then applied the best classifiers to compounds most recently profiled in the NIH Library of Integrated Network-based Cellular Signatures (LINCS) program and found good agreement between profiling results and predicted activities. Our results indicate that, although heterogeneous in nature, the publicly accessible datasets are exceedingly valuable and well suited for developing highly accurate predictors for practical kinome-wide virtual screening applications and for complementing experimental kinase profiling.

Available from: Stephan Schurer, Nov 05, 2014
  • Source
    • "All these studies achieved good prediction performances: from 0.67 to 0.73 correlation coefficient in Lapins and Wikberg (2010); accuracy between 74 and 81% and matthews correlation coefficient (MCC) between 0.3 and 0.48 in different tested datasets and with different encodings and learning methods in Niijima et al. (2012); 94% accuracy and 0.98 area under the ROC curve (auROC) in Cao et al. (2013). In Schürer and Muskal (2013), the auROC for individual kinase models vary from around 0.93 to 1, and the prediction accuracy showed a positive correlation with the number of known inhibitors available for training. In Yabuuchi et al. (2011), some predicted novel inhibitors for the epidermal growth factor receptor kinase and the cyclin-dependent kinase 2 were experimentally confirmed, sometimes showing scaffold hopping (i.e., having radically different characteristics than known inhibitors). "
    ABSTRACT: The central role of kinases in virtually all signal transduction networks is the driving motivation for the development of compounds that modulate their activity. ATP-mimetic inhibitors are essential tools for elucidating signaling pathways and are emerging as promising therapeutic agents. However, off-target ligand binding and complex, sometimes unexpected kinase/inhibitor relationships can occur for seemingly unrelated kinases, which underlines the need for computational approaches that learn the interaction determinants and infer the effect of small compounds on a given kinase. Recently published high-throughput profiling studies assessed the effects of thousands of small compound inhibitors, covering a substantial portion of the kinome. This wealth of data paved the way for computational resources and methods that can make a major contribution to understanding the reasons for inhibition, helping in the rational design of more specific molecules, in the in silico prediction of inhibition for those neglected kinases for which no systematic analysis has been carried out yet, and in the selection of novel inhibitors with desired selectivity, and offering novel avenues for personalized therapies.
    Frontiers in Genetics 06/2014; 5:196. DOI:10.3389/fgene.2014.00196
  • Source
    • "Examples of such applications are the global mapping of pharmacological space by Paolini and co-workers, [3] the Similarity Ensemble Approach (SEA), [4] the Bayesian models for adverse drug reactions by Bender and coworkers, [5] the models used for polypharmacological optimization by Hopkins et al., [6] and the kinome-wide activity modeling studies by Schuerer and Muskal. [7] These methods can be used to predict off-target effects based on heterogeneous public activity data and chemical similarity analysis. Usually, public off-target toxicity models like human Ether-à-go-go-Related Gene (hERG) [8] and cytochrome P450 (CYP) models [9], [10] are based and validated on mixed public IC50 data, since there is not enough public data available that originates from one single assay. "
    ABSTRACT: The biochemical half maximal inhibitory concentration (IC50) is the most commonly used metric for on-target activity in lead optimization. It is used to guide lead optimization and to build large-scale chemogenomics analyses and off-target activity and toxicity models based on public data. However, the use of public biochemical IC50 data is problematic, because they are assay specific and comparable only under certain conditions. For large-scale analysis it is not feasible to check each data entry manually, and it is very tempting to mix all available IC50 values from public databases even if assay information is not reported. As previously reported for Ki database analysis, we first analyzed the types of errors, the redundancy, and the variability that can be found in the ChEMBL IC50 database. To assess the variability of IC50 data independently measured in two different labs, we searched ChEMBL for cases with at least ten IC50 values for identical protein-ligand systems against the same target. Because too few cases of this type are available, the variability of IC50 data was instead assessed by comparing all pairs of independent IC50 measurements on identical protein-ligand systems. The standard deviation of IC50 data is only 25% larger than the standard deviation of Ki data, suggesting that mixing IC50 data from different assays, even without knowing the details of the assay conditions, adds only a moderate amount of noise to the overall data. The standard deviation of public ChEMBL IC50 data was, as expected, greater than the standard deviation of in-house intra-laboratory/inter-day IC50 data. Augmenting mixed public IC50 data with public Ki data does not deteriorate the quality of the mixed IC50 data, if the Ki is corrected by an offset. For a broad dataset such as the ChEMBL database, a Ki-IC50 conversion factor of 2 was found to be the most reasonable.
    PLoS ONE 04/2013; 8(4):e61007. DOI:10.1371/journal.pone.0061007 · 3.23 Impact Factor
  • ABSTRACT: A fundamental impediment to functional recovery from spinal cord injury (SCI) and traumatic brain injury is the lack of sufficient axonal regeneration in the adult central nervous system. There is thus a need to develop agents that can stimulate axon growth to re-establish severed connections. Given the critical role played by protein kinases in regulating axon growth and the potential for pharmacological intervention, small molecule protein kinase inhibitors present a promising therapeutic strategy. Here, we report a robust cell-based phenotypic assay, utilizing primary rat hippocampal neurons, for identifying small molecule kinase inhibitors that promote neurite growth. The assay is highly reliable and suitable for medium-throughput screening, as indicated by its Z'-factor of 0.73. A focused, structurally diverse library of protein kinase inhibitors was screened, revealing several compound groups with the ability to strongly and consistently promote neurite growth. The best-performing bioassay hit robustly and consistently promoted axon growth in a postnatal cortical slice culture assay. This study can serve as a jumping-off point for structure-activity relationship (SAR) and other drug discovery approaches towards the development of drugs for treating SCI and related neurological pathologies.
    ACS Chemical Biology 03/2013; 8(5). DOI:10.1021/cb300584e · 5.33 Impact Factor