Computational Analysis of Novel Drug Opportunities (CANDO) platform benchmarking accuracies and putative drug predictions. Percent accuracies from large-scale benchmarking of the CANDO platform are shown for seven of the 1439 indications with two or more approved drugs. The highest-confidence putative drug predictions are shown as compound names over each accuracy bar, in purple (concurrence ≥ 5) or blue (concurrence = 4); PubChem IDs for the remaining predictions are shown within each of the four concurrence score categories. The percent accuracy measure reflects the ability of the platform to recognize related approved drugs in the top 25 ranked predictions for an indication, based on inferring homology between compound–proteome signatures; each of the 3733 compounds has a signature comprising its interaction scores with 48,278 proteins. The concurrence score is the number of times a particular compound occurs across the sets of top 25 predictions generated for all of the drugs approved for a particular indication (the number of such drugs is indicated by brown circles in the middle of each accuracy bar). The resulting predictions are drugs approved for other indications that are proteomic homologs (i.e. have compound–proteome signatures similar to those of drugs approved for the indication considered). The red medical plus sign on the right-hand side marks threshold accuracies for particular numbers of indications: 14 indications have 100% benchmarking accuracy in identifying related drugs approved for the same indication; 20 indications have 80% accuracy or more; 75 have 60% or more; 254 have 40% or more; 543 have 20% or more; and 657 have some measure of benchmarking success (i.e. greater than 0% accuracy). 
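The accuracy and concurrence measures described above can be sketched on a toy signature matrix. This is a minimal illustration under stated assumptions, not the CANDO implementation: signatures are rows of a NumPy matrix (compounds × proteins), similarity is the root-mean-square deviation (RMSD) between signatures (smaller means more similar), and an approved drug counts as a hit when at least one other drug approved for the same indication appears in its top-k list. All function names are hypothetical.

```python
from collections import Counter

import numpy as np


def rank_by_similarity(matrix, query):
    """Indices of all other compounds, ordered from most to least similar.

    Assumption: similarity is the RMSD between interaction signatures
    (rows of `matrix`); a smaller RMSD means more similar.
    """
    rmsd = np.sqrt(np.mean((matrix - matrix[query]) ** 2, axis=1))
    order = np.argsort(rmsd, kind="stable")
    return [int(i) for i in order if i != query]


def indication_accuracy(matrix, approved, k=25):
    """Fraction of approved drugs whose top-k list contains another drug
    approved for the same indication (hypothetical helper)."""
    approved = set(approved)
    hits = sum(
        1 for drug in approved
        if approved.intersection(rank_by_similarity(matrix, drug)[:k])
    )
    return hits / len(approved)


def concurrence_scores(matrix, approved, k=25):
    """How often each non-approved compound appears across the top-k lists
    of all drugs approved for the indication."""
    approved = set(approved)
    counts = Counter()
    for drug in approved:
        for compound in rank_by_similarity(matrix, drug)[:k]:
            if compound not in approved:
                counts[compound] += 1
    return counts


# Toy example: 4 compounds x 3 proteins; compounds 0 and 1 share an indication.
signatures = np.array([
    [1.00, 0.00, 0.0],
    [0.90, 0.10, 0.0],
    [0.95, 0.05, 0.0],  # candidate lying between the two approved drugs
    [0.00, 0.00, 1.0],
])
print(indication_accuracy(signatures, [0, 1], k=1))  # 0.0
print(indication_accuracy(signatures, [0, 1], k=2))  # 1.0
print(dict(concurrence_scores(signatures, [0, 1], k=2)))  # {2: 2}
```

With `k=25`, `indication_accuracy` plays the role of the top-25 benchmark in the caption, and a compound with a high `concurrence_scores` count is one that keeps appearing near many approved drugs at once, which is why it is treated as a higher-confidence prediction.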
The solid black lines represent the average accuracies of the CANDO platform based on the top 25 predictions, for all 1439 indications (17%) and for the 657 successful indications (36%). These seven indications were selected because they are among those for which validations are being undertaken by collaborators and contract research organizations; however, our prospective predictions could be validated by any researcher working on these indications and thus reflect real-use cases of the CANDO platform. By contrast, for randomly devised controls the accuracy never exceeds 0.2% (small dashed line), even when the CANDO matrix is replaced by more than 1000 matrices constructed by randomly swapping all compound and protein interaction values. Likewise, the best single-protein control (Argonaute), defined as the best-performing protein when each of the 48,278 proteins is considered individually by the CANDO platform, yields 2% average accuracy across all indications (long dashed line). This not only shows the value of using multiple proteins to increase the accuracy of drug predictions, but also points to the potential of the CANDO platform for dissecting the roles of particular proteins and protein classes in disease, using small molecules approved for particular indications as probes. PubChem IDs marked with asterisks are high-confidence drug predictions across multiple indications; 91/105 high-confidence drug predictions are shared between indications (see Table S1 in the supplementary material online), indicating the complex relation between small molecules, proteomes, and indications such as Alzheimer's disease, type 2 diabetes mellitus, and systemic lupus erythematosus. Our results indicate that holistic drug discovery based on compound–proteome signature homology inference could yield significantly higher success rates than blind high-throughput screening focused on singular disease etiologies. 
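The two controls described above (shuffled matrices and single proteins) can be sketched on a toy matrix. The following assumptions are not taken from the source: a control matrix is built by permuting every interaction value across all compounds and proteins at once; similarity is RMSD between signatures; an indication's accuracy is the fraction of its approved drugs with another approved drug in their top-k list; and all function names are hypothetical.

```python
import numpy as np


def _top_k(matrix, query, k):
    """Indices of the k compounds with the smallest RMSD to `query` (self excluded).
    For a single-column matrix, RMSD reduces to the absolute difference."""
    rmsd = np.sqrt(np.mean((matrix - matrix[query]) ** 2, axis=1))
    rmsd[query] = np.inf
    return set(np.argsort(rmsd, kind="stable")[: min(k, len(rmsd) - 1)].tolist())


def mean_accuracy(matrix, indications, k=25):
    """Average over indications of the fraction of approved drugs whose top-k
    list contains another drug approved for the same indication."""
    per_indication = []
    for approved in indications:
        approved = set(approved)
        hits = sum(1 for d in approved if _top_k(matrix, d, k) & (approved - {d}))
        per_indication.append(hits / len(approved))
    return float(np.mean(per_indication))


def shuffled_control(matrix, rng):
    """Control matrix: every interaction value permuted across all compounds and proteins."""
    return rng.permutation(matrix.ravel()).reshape(matrix.shape)


def random_control_accuracies(matrix, indications, k=25, n_controls=1000, seed=0):
    """Mean accuracy of each of `n_controls` shuffled control matrices."""
    rng = np.random.default_rng(seed)
    return [mean_accuracy(shuffled_control(matrix, rng), indications, k)
            for _ in range(n_controls)]


def best_single_protein(matrix, indications, k=25):
    """Benchmark each protein (column) on its own; return (column, mean accuracy) of the best."""
    scores = [mean_accuracy(matrix[:, [j]], indications, k) for j in range(matrix.shape[1])]
    best = int(np.argmax(scores))
    return best, scores[best]


# Toy data: 4 compounds x 2 proteins, two indications with two drugs each.
signatures = np.array([[0.0, 0.5], [0.1, 0.9], [1.0, 0.6], [1.1, 0.1]])
indications = [[0, 1], [2, 3]]
print(mean_accuracy(signatures, indications, k=1))  # 1.0
print(best_single_protein(signatures, indications, k=1))  # (0, 1.0)
controls = random_control_accuracies(signatures, indications, k=1, n_controls=100)
print(max(controls))
```

Permuting all values keeps the overall distribution of interaction scores while destroying which compound interacts with which protein, so any accuracy a shuffled matrix achieves reflects chance; the single-protein loop shows how much a lone column can explain on its own.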
The CANDO approach is applicable to any disease pathology that can be localized to a group of proteins (including whole-pathogen proteomes), as well as to the 2030 indications associated with at least one US Food and Drug Administration (FDA)-approved drug.

Source publication
Article
The Computational Analysis of Novel Drug Opportunities (CANDO) platform (http://protinfo.org/cando) uses similarity of compound–proteome interaction signatures to infer homology of compound/drug behavior. We constructed interaction signatures for 3733 human ingestible compounds covering 48,278 protein structures mapping to 2030 indications based on...

Citations

... Drug repurposing accelerates drug development by finding new uses for existing drugs with known safety profiles [19]. The Computational Analysis of Novel Drug Opportunities (CANDO) platform for multiscale therapeutic discovery, repurposing, and design has been developed to overcome current issues in drug discovery, such as the simultaneous optimization of efficacy and safety, by using a holistic multiscale approach to characterize drug/compound behaviors and functions [20][21][22][23][24][25][26][27][28][29][30][31][32][33]. ...
... The CANDO platform comprises various parallel pipelines that include protocols for large-scale protein–compound interaction scoring, analytics, benchmarking, and drug candidate generation. The platform is agnostic to the compound–proteome interaction scoring protocol used; however, the default scoring method for CANDO v2+ uses a bioanalytic docking protocol (BANDOCK) to generate interaction scores for a given protein binding site by comparing the structural similarity of known binding site ligands to a query compound [20][21][22][23][24][25][26][27][28][29][30][31]. A typical pipeline in CANDO calculates interactions between every compound and every protein from large libraries to generate a compound–protein interaction matrix, in which each compound (each row in the matrix) is described by its set of interactions with all the proteins (each column in the matrix) in a proteome. ...
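The excerpt describes BANDOCK as scoring a binding site by comparing known binding-site ligands to a query compound, and the pipeline as filling a compound × protein matrix with such scores. The sketch below is a hedged illustration of that idea, not BANDOCK's documented formula. It assumes that fingerprints are sets of on-bit indices, that ligand similarity is the Tanimoto coefficient, that each site's best similarity is weighted by a binding-site prediction confidence (for example, one from COACH), and that a protein's score is its best site. All names and values are hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def site_score(query_fp, site_confidence, site_ligands):
    """Assumed BANDOCK-like score for one binding site: the best ligand
    similarity, weighted by the confidence of the binding-site prediction."""
    if not site_ligands:
        return 0.0
    return site_confidence * max(tanimoto(query_fp, lig) for lig in site_ligands)


def interaction_matrix(compound_fps, proteins):
    """Rows are compounds, columns are proteins. Each protein is a list of
    (confidence, ligand fingerprints) tuples; its score is its best site."""
    return [
        [max((site_score(fp, conf, ligs) for conf, ligs in sites), default=0.0)
         for sites in proteins]
        for fp in compound_fps
    ]


query = {1, 2, 3, 4}  # hypothetical fingerprint on-bits
proteins = [
    [(0.8, [{1, 2, 3}, {5, 6}])],           # protein A: one predicted site
    [(0.5, [{1, 2, 3, 4}]), (0.9, [{9}])],  # protein B: two predicted sites
]
scores = interaction_matrix([query], proteins)
print([[round(s, 3) for s in row] for row in scores])  # [[0.6, 0.5]]
```

Each row of the result is a compound's interaction signature, which is the input that the similarity-based benchmarking and ranking steps operate on.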
... Figure 2 illustrates the precision medicine pipeline implemented within the CANDO platform for generating and validating putative NSCLC candidates. The CANDO platform for multiscale therapeutic discovery, repurposing, and design generates novel drug predictions and repurposes existing drugs for every indication, overcoming the limitations of traditional single-target approaches [20][21][22][23][24][25][26][27][28][29][30][31][32][33]. One of the key tenets of CANDO is that drugs that are safe for human use exert their therapeutic effects and undergo absorption, distribution, metabolism, and excretion (ADME) by interacting with multiple targets. ...
Preprint
Pharmacogenomics is a rapidly growing field with the goal of providing personalized care to every patient. Previously, we developed the Computational Analysis of Novel Drug Opportunities (CANDO) platform for multiscale therapeutic discovery to screen optimal compounds for any indication/disease by performing analytics on their interactions with large protein libraries. We implemented a comprehensive precision medicine drug discovery pipeline within the CANDO platform to determine which drugs are most likely to be effective against mutant phenotypes of non-small cell lung cancer (NSCLC) based on the supposition that drugs with similar interaction profiles (or signatures) will have similar behavior and therefore show synergistic effects. CANDO predicted that osimertinib, an EGFR inhibitor, is most likely to synergize with four KRAS inhibitors. Validation studies with cellular proliferation assays confirmed that osimertinib in combination with ARS-1620, a KRAS G12C inhibitor, and BAY-293, a pan-KRAS inhibitor, showed a synergistic effect on decreasing cellular proliferation by acting on mutant KRAS. Our precision medicine pipeline may be used to identify compounds capable of synergizing with inhibitors of KRAS G12C, and to assess their likelihood of becoming drugs by understanding their behavior at the proteomic/interactomic scales.
... In this study we describe and evaluate the performance of our Computational Analysis of Novel Drug Opportunities (CANDO) multiscale therapeutic drug discovery, repurposing, and design platform for identifying small molecules with the potential to inhibit the SARS-CoV-2 virus and treat COVID-19. CANDO was originally designed as a shotgun repurposing platform for exactly this type of epidemic/pandemic scenario, utilizing multiscale modeling techniques and adhering to multitarget drug theory, but has since been enhanced to carry out novel drug discovery against all indications Samudrala, 2003b, 2005;Jenwitheesuk et al., 2008;Horst et al., 2012;Minie et al., 2014;Sethi et al., 2015;Falls et al., 2019;Fine et al., 2019;Mangione and Samudrala, 2019;Mangione et al., 2020b;Hudson and Samudrala, 2021;Schuler et al., 2021) as well as novel drug design (Overhoff et al., 2021). The relatively recent introduction of higher-order biological data such as protein pathways, protein–protein interactions, drug side effects, and protein–disease associations has further augmented our ability to describe compound behavior holistically, with subsequently improved performance (Moukheiber et al., 2021;Schuler et al., 2021;Mangione, 2022;. ...
... We utilized our in-house bioinformatic analytics-based docking protocol BANDOCK to generate interaction scores between every compound and every protein structure; these scores serve as a proxy for binding strength/probability (Minie et al., 2014;Sethi et al., 2015;Falls et al., 2019;Hudson and Samudrala, 2021). The COACH algorithm from the I-TASSER suite (Yang et al., 2013) was used to predict binding sites for each protein. ...
Article
The worldwide outbreak of SARS-CoV-2 in early 2020 caused numerous deaths and unprecedented measures to control its spread. We employed our Computational Analysis of Novel Drug Opportunities (CANDO) multiscale therapeutic discovery, repurposing, and design platform to identify small molecule inhibitors of the virus to treat its resulting indication, COVID-19. Initially, few experimental studies existed on SARS-CoV-2, so we optimized our drug candidate prediction pipelines using results from two independent high-throughput screens against prevalent human coronaviruses. Ranked lists of candidate drugs were generated using our open source cando.py software based on viral protein inhibition and proteomic interaction similarity. For the former viral protein inhibition pipeline, we computed interaction scores between all compounds in the corresponding candidate library and eighteen SARS-CoV proteins using an interaction scoring protocol with extensive parameter optimization which was then applied to the SARS-CoV-2 proteome for prediction. For the latter similarity based pipeline, we computed interaction scores between all compounds and human protein structures in our libraries then used a consensus scoring approach to identify candidates with highly similar proteomic interaction signatures to multiple known anti-coronavirus actives. We published our ranked candidate lists at the very beginning of the COVID-19 pandemic. Since then, 51 of our 276 predictions have demonstrated anti-SARS-CoV-2 activity in published clinical and experimental studies. These results illustrate the ability of our platform to rapidly respond to emergent pathogens and provide greater evidence that treating compounds in a multitarget context more accurately describes their behavior in biological systems.
... In contrast, this study utilizes the Computational Analysis of Novel Drug Opportunities (CANDO) platform to obtain the protein feature descriptors to understand toxicity at the protein pathway level. CANDO is a multiscale shotgun drug discovery, repurposing, and design platform which employs multitargeting to generate proteomic scale interaction signatures for any small molecule, including approved drugs, against large libraries of protein structures from various organisms [36][37][38][39][40][41][42][43][44][45][46][47][48][49]. The proteomic interaction signatures are analyzed to computationally assess compound similarity, with the premise that drugs with similar signatures may treat the same diseases. ...
... The Computational Analysis of Novel Drug Opportunities (CANDO) therapeutic discovery, repurposing, and design platform [36][37][38][39][40][41][42][43][44][45][46][47][48][49] was used to generate protein interaction signatures for every molecule in its drug/compound library, which served as the feature extraction section in our pipeline. These protein interaction signatures were used as features in our machine learning development. ...
Article
Humans are exposed to numerous compounds daily, some of which have adverse effects on health. Computational approaches for modeling toxicological data in conjunction with machine learning algorithms have gained popularity over the last few years. Machine learning approaches have been used to predict toxicity-related biological activities using chemical structure descriptors. However, toxicity-related proteomic features have not been fully investigated. In this study, we construct a computational pipeline, implemented within the multiscale Computational Analysis of Novel Drug Opportunities (CANDO) therapeutic discovery platform, that uses machine learning models to predict the protein features most responsible for the toxicity of compounds taken from the Tox21 dataset. Tox21 is a highly imbalanced dataset consisting of twelve in vitro assays, seven from the nuclear receptor (NR) signaling pathway and five from the stress response (SR) pathway, for more than 10,000 compounds. For the machine learning model, we employed a random forest combined with the Synthetic Minority Oversampling Technique (SMOTE) and the Edited Nearest Neighbor (ENN) method (SMOTE+ENN), a resampling method that balances the activity class distribution. Within the NR and SR pathways, the activity of the aryl hydrocarbon receptor (NR-AhR) and the mitochondrial membrane potential (SR-MMP) were among the top-performing of the twelve toxicity endpoints, with AUROCs of 0.90 and 0.92, respectively. The top extracted features for evaluating compound toxicity were analyzed for enrichment to highlight the implicated biological pathways and proteins. We validated our enrichment results for the activity of the AhR using a thorough literature search. Our case study showed that the selected enriched pathways and proteins from our computational pipeline are not only correlated with AhR toxicity but also form a cascading upstream/downstream arrangement. 
Our work elucidates significant relationships between protein and compound interactions computed using CANDO and the associated biological pathways to which the proteins belong for twelve toxicity endpoints. This novel study uses machine learning not only to predict and understand toxicity but also elucidates therapeutic mechanisms at a proteomic level for a variety of toxicity endpoints.
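The abstract names SMOTE+ENN as the resampling step ahead of the random forest. The NumPy sketch below illustrates the general technique for a binary problem with Euclidean distances. It is a simplified illustration, not the authors' code. SMOTE interpolates new minority samples between existing minority neighbours until the classes are balanced. ENN then removes samples whose label disagrees with the majority of their k nearest neighbours, using Wilson's majority-vote rule. Maintained implementations exist, for example `imblearn.combine.SMOTEENN` in imbalanced-learn, and their defaults may differ from the choices made here.

```python
import numpy as np


def _nearest(X, i, k):
    """Indices of the k nearest rows to X[i] (Euclidean), excluding i itself."""
    dist = np.linalg.norm(X - X[i], axis=1)
    dist[i] = np.inf
    return np.argsort(dist, kind="stable")[:k]


def smote(X, y, minority=1, k=5, seed=0):
    """Oversample the minority class up to the majority class size by
    interpolating between a sampled minority point and one of its minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    n_new = int((y != minority).sum()) - len(X_min)
    synthetic = []
    for _ in range(max(n_new, 0)):
        i = rng.integers(len(X_min))
        j = rng.choice(_nearest(X_min, i, min(k, len(X_min) - 1)))
        synthetic.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    if not synthetic:
        return X, y
    X_out = np.vstack([X, np.array(synthetic)])
    y_out = np.concatenate([y, np.full(len(synthetic), minority)])
    return X_out, y_out


def enn(X, y, k=3):
    """Edited Nearest Neighbours (majority vote): drop every sample whose label
    disagrees with the majority of its k nearest neighbours."""
    keep = [i for i in range(len(X)) if 2 * (y[_nearest(X, i, k)] == y[i]).sum() > k]
    return X[keep], y[keep]


def smote_enn(X, y, minority=1, k_smote=5, k_enn=3, seed=0):
    """SMOTE oversampling followed by ENN cleaning."""
    return enn(*smote(X, y, minority, k_smote, seed), k=k_enn)


# Toy 1-D data: 8 majority samples (one mislabelled inside the minority cluster), 3 minority.
X = np.array([[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [0.6],
              [5.05],  # label-0 point sitting among the minority samples
              [5.0], [5.1], [5.2]])
y = np.array([0] * 8 + [1] * 3)
X_res, y_res = smote_enn(X, y)
print(np.bincount(y_res))  # [7 8]
```

The resampled arrays would then be passed to a classifier such as scikit-learn's `RandomForestClassifier`. Only the training split should be resampled, so that evaluation metrics such as AUROC are computed on the original class distribution.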
... We used the CANDO platform [62][63][64][65][66][67][68][69] to predict drugs that can be repurposed to treat stable COPD. In CANDO, a compound/drug is potentially repurposable for an indication when it is found to have similar binding interactions with a specific proteome or library of proteins as a drug with known approval for the indication of interest. ...
Article
Bronchoalveolar lavage of the epithelial lining fluid (BALF) can sample the profound changes in the airway lumen milieu prevalent in chronic obstructive pulmonary disease (COPD). We compared the BALF proteome of ex-smokers with moderate COPD who are not in exacerbation status to non-smoking healthy control subjects and applied proteome-scale translational bioinformatics approaches to identify potential therapeutic protein targets and drugs that modulate these proteins for the treatment of COPD. Proteomic profiles of BALF were obtained from (1) never-smoker control subjects with normal lung function (n = 10) or (2) individuals with stable moderate (GOLD stage 2, FEV1 50–80% predicted, FEV1/FVC < 0.70) COPD who were ex-smokers for at least 1 year (n = 10). After identifying potential crucial hub proteins, drug–proteome interaction signatures were ranked by the Computational Analysis of Novel Drug Opportunities (CANDO) platform for multiscale therapeutic discovery to identify potentially repurposable drugs. Subsequently, a literature-based knowledge graph was utilized to rank combinations of drugs that most likely ameliorate inflammatory processes. Proteomic network analysis demonstrated that 233 of the >1800 proteins identified in the BALF were significantly differentially expressed in COPD versus control. Functional annotation of the differentially expressed proteins was used to identify the canonical pathways containing them. Topological network analysis demonstrated that four putative proteins act as central node proteins in COPD. The drugs with the most similar interaction signatures to approved COPD drugs were extracted with the CANDO platform. The drugs identified using CANDO were subsequently analyzed using a knowledge-based technique to determine an optimal two-drug combination that had the most appropriate effect on the central node proteins. 
Network analysis of the BALF proteome identified targets that play critical roles in modulating COPD pathogenesis, for which we identified several drugs that could be repurposed to treat COPD using a multiscale shotgun drug discovery approach.
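The abstract does not say which topological measure identified the four central node proteins. One common choice is degree centrality, defined as a node's number of distinct interaction partners divided by n − 1. The sketch below assumes degree centrality and ranks the nodes of a hypothetical protein–protein edge list by it. The node names are placeholders, not proteins from the study.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality: number of distinct partners / (n - 1)."""
    partners = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            partners[u].add(v)
            partners[v].add(u)
    n = len(partners)
    if n < 2:
        return {node: 0.0 for node in partners}
    return {node: len(p) / (n - 1) for node, p in partners.items()}


def top_hubs(edges, n_hubs):
    """Nodes with the highest degree centrality; ties broken alphabetically."""
    centrality = degree_centrality(edges)
    return sorted(centrality, key=lambda node: (-centrality[node], node))[:n_hubs]


# Hypothetical network among differentially expressed proteins P1..P5.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P5", "P1")]
print(degree_centrality(edges)["P1"])  # 1.0
print(top_hubs(edges, 3))  # ['P1', 'P2', 'P3']
```

Betweenness or eigenvector centrality would be drop-in alternatives. Which one best captures "central node" proteins depends on the network and is not specified in the excerpt.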
... Knowing which of these interactions have therapeutic and adverse consequences would be a massive advantage when determining the chance of a novel therapy succeeding to approval in clinical trials. The primary means to achieve this goal has been through the use of the Computational Analysis of Novel Drug Opportunities (CANDO) platform, which is a multiscale shotgun drug discovery, repurposing, and design platform that assesses the similarity of small molecule compounds via their simulated interactions with various proteomes [47][48][49][50][51][52][53][54][55]. A major tenet of CANDO is that drugs that interact with proteins similarly will have similar behavior in biological systems and, therefore, may treat the same diseases. ...
... We have developed the Computational Analysis of Novel Drug Opportunities (CANDO) platform [47][48][49] to address these drug discovery challenges. One fundamental tenet of CANDO is that drugs interact with many different proteins and pathways to rectify disease states, and this promiscuous nature is exploited to relate drugs based on their proteomic signatures [49,55,[70][71][72] [55,70,73,74]. ...
... Protein libraries with fewer predicted ligand cluster binding partners yield much worse performance than those consisting of proteins interacting with a more structurally diverse range of ligands. Taken together with the finding that a minimum number of proteins is required to reach optimal benchmarking accuracies (Figure 2.3), which we also observed previously [47,48], this suggests that drugs should realistically be described in the context of their multitarget nature, treating both small molecule compounds and proteins as promiscuous, as they are in biological systems [60,96,97]. However, using libraries of proteins with too many diverse interactions in the CANDO platform also leads to suboptimal performance. ...
Thesis
The ability to accurately determine therapeutic and adverse effects of drugs using computational means would significantly reduce the failure rate of novel therapeutics in the clinic. As clinical trial success rates continue to plummet, innovative solutions that deviate from the strategies of modern methods are desperately needed. Modern methods typically consider drugs as “magic bullets” in which they assume the drug will modulate its desired target and that will be sufficient for treating the disease. This approach fails to consider the totality of interactions in which small molecule therapeutics participate due to their promiscuous nature, leading to two consequences: 1) the inability to foresee adverse side effects and 2) an incomplete picture of their therapeutic mechanisms. Multitarget theory establishes the framework through which we can better understand all possible impacts drugs have on biological systems and is paramount to solving the science of drug discovery. I investigated this in various applications, including protein promiscuity as a feature that more accurately describes drug behavior, efficiently predicting potent inhibitors of SARS-CoV-2, and integration of a heterogeneous biological network that not only allows for the accurate prediction of drug indications, but their potential side effects as well. My work highlights the superior efficacy of this holistic approach by demonstrating in multiple different diseases that compound behavior is better understood and predicted when considered through the lens of multitarget drug theory, and provides a basis for re-imagining the science of drug discovery.
... proteomic scale interaction signatures for approved and investigational small molecule therapeutics against large libraries of protein structures from various organisms [36][37][38][39][40][41][42][43]. These proteomic signatures are analyzed to computationally assess compound similarity, with the premise that drugs with similar signatures may treat the same diseases. ... (bioRxiv preprint doi: https://doi.org/10.1101/2021.12.13.472455; this version posted December 14, 2021)
... There were duplicates and inconsistent activity labels for the compounds across the twelve assays. [36][37][38][39][40][41][42][43]. Interaction scores between all compounds in the Tox21 dataset and each structure in the human protein library were computed using a rapid in-house bioanalytical docking protocol known as BANDOCK [71]. ...
Preprint
Humans are exposed to numerous compounds daily, some of which have adverse effects on health. Computational approaches for modeling toxicological data in conjunction with machine learning algorithms have gained popularity over the last few years. Machine learning methods have been used to predict toxicity-related biological activities using chemical structure descriptors. However, proteomic features have not been fully investigated. In this study, we construct a computational model that uses machine learning to select the proteins that are the most important features for predicting the toxicity of compounds in the Tox21 dataset, using the multiscale Computational Analysis of Novel Drug Opportunities (CANDO) platform for therapeutic discovery. Tox21 is a highly imbalanced dataset consisting of twelve in vitro assays, seven from the nuclear receptor (NR) signaling pathway and five from the stress response (SR) pathway, for more than 10,000 compounds. For our computational model, we employed a random forest (RF) combined with the Synthetic Minority Oversampling Technique (SMOTE) and the Edited Nearest Neighbor (ENN) method, also known as SMOTE+ENN, a resampling method that balances the activity class distribution. Within the NR and SR pathways, the activity of the aryl hydrocarbon receptor (NR-AhR), a toxicity-mediating transcription factor, and the mitochondrial membrane potential (SR-MMP) were among the top-performing of the twelve toxicity endpoints, with AUROCs of 0.90 and 0.92, respectively. The top extracted features for evaluating compound toxicity were passed into enrichment analysis to highlight the implicated biological pathways and proteins. We validated our enrichment results for the activity of the AhR using a thorough literature search. Our case study showed that the selected enriched pathways and proteins from our computational pipeline are not only correlated with NR-AhR toxicity but also form a cascading upstream/downstream arrangement. 
Our work elucidates significant relationships between protein and compound interactions computed using CANDO and the associated biological pathways to which the proteins belong for twelve toxicity endpoints. This novel study uses machine learning not only to predict and understand toxicity but also elucidates therapeutic mechanisms at a proteomic level for a variety of toxicity endpoints.
... Computational methods are efficient, accurate, holistic (i.e., take into account the entire interaction space of chemical entities), and have breadth in terms of chemical space exploration necessary to overcome the limitations of traditional approaches [2,6,12,13,[17][18][19][20][21][22][23][24][25][26][27][28][29][30][31][32][33][34]. To expand compound libraries utilized in screening, combinatorial chemistry and machine-learning design pipelines have been developed to generate libraries of compounds likely to bind to a given target [35][36][37]. ...
... We developed the Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multitarget drug discovery, repurposing, and design to overcome the aforementioned limitations of traditional single-target approaches [18][19][20][21][22][23][24][25][26][27][28][29]. The platform screens and ranks drugs/compounds for every disease/indication (and adverse event) through the large-scale modeling and analytics of the interactions between comprehensive libraries of drugs/compounds and protein structures. ...
... A critical aspect of verifying the utility of the designs generated was to compare their predicted behavior (i.e., proteomic interaction signatures) to their intended behavior, which was input to conditional generation. If the predicted behaviors of designed compounds were highly similar to the conditional objective across objectives relative to the corresponding controls, we concluded that the RCVAE design pipeline may be used to accurately design compounds that possess any desirable bioactivity and subsequent function, given the extensive benchmarking and validation the CANDO paradigm has undergone [18,21,26,29,47–50]. This is the primary motivation and goal for using the CVAE architecture in terms of accelerating drug discovery: design with respect to arbitrary numbers of on-, off-, and anti-targets (Figure 1). ...
Article
Computational approaches have accelerated novel therapeutic discovery in recent decades. The Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multitarget therapeutic discovery, repurposing, and design aims to improve their efficacy and safety by employing a holistic approach that computes interaction signatures between every drug/compound and a large library of non-redundant protein structures corresponding to the human proteome fold space. These signatures are compared and analyzed to determine if a given drug/compound is efficacious and safe for a given indication/disease. In this study, we used a deep learning-based autoencoder to first reduce the dimensionality of CANDO-computed drug–proteome interaction signatures. We then employed a reduced conditional variational autoencoder to generate novel drug-like compounds when given a target encoded “objective” signature. Using this approach, we designed compounds to recreate the interaction signatures for twenty approved and experimental drugs and showed that 16/20 designed compounds were predicted to be significantly (p-value ≤ 0.05) more behaviorally similar relative to all corresponding controls, and 20/20 were predicted to be more behaviorally similar relative to a random control. We further observed that redesigns of objectives developed via rational drug design performed significantly better than those derived from natural sources (p-value ≤ 0.05), suggesting that the model learned an abstraction of rational drug design. We also show that the designed compounds are structurally diverse and synthetically feasible when compared to their respective objective drugs despite consistently high predicted behavioral similarity. Finally, we generated new designs that enhanced thirteen drugs/compounds associated with non-small cell lung cancer and anti-aging properties using their predicted proteomic interaction signatures. 
This study represents a significant step forward in automating holistic therapeutic design with machine learning, enabling the rapid generation of novel, effective, and safe drug leads for any indication.
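The abstract compares designed compounds against their objectives and against controls. A short sketch can show what such a comparison looks like. It assumes cosine similarity as the behavioral-similarity measure and a one-sided empirical p-value over control compounds. Both are illustrative choices, not necessarily the metric or statistical test used in the study, and the signatures are toy values rather than CANDO output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length interaction signatures (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero signature carries no behavioral information
    return dot / (norm_a * norm_b)


def empirical_p_value(design_score: float, control_scores: Sequence[float]) -> float:
    """One-sided empirical p-value: how often a control is at least as similar
    to the objective as the design (add-one smoothing avoids p = 0)."""
    n_as_good = sum(1 for s in control_scores if s >= design_score)
    return (n_as_good + 1) / (len(control_scores) + 1)


if __name__ == "__main__":
    # Toy 4-protein signatures (hypothetical values, not CANDO output).
    objective = [0.9, 0.1, 0.4, 0.0]
    design = [0.8, 0.2, 0.5, 0.1]
    controls = [[0.1, 0.9, 0.0, 0.7], [0.0, 0.3, 0.9, 0.2], [0.5, 0.5, 0.5, 0.5]]

    d = cosine_similarity(design, objective)
    c = [cosine_similarity(ctrl, objective) for ctrl in controls]
    print(f"design similarity = {d:.3f}")            # 0.979
    print(f"empirical p-value = {empirical_p_value(d, c):.2f}")  # 0.25
```

With only three controls, the smallest attainable p-value is 0.25. A threshold such as p ≤ 0.05 needs at least 19 controls per objective under this smoothing.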
... In trypanosomatids, an exception could be the proteasome inhibitors (GNF6702, GSK3494245) [128,129], although their practical use is still awaited despite further development (LXE408) [130]. In silico and bioinformatic approaches have met a similar fate. Information on the impact of the next waves of innovation (e.g., artificial intelligence, predictive platforms [131], and automation of drug discovery [132]) is not yet abundant enough to envision their possibilities, and we are still waiting for a breakthrough in antiparasitic chemotherapy [121,133]. ...
Article
Leishmaniasis is a vector-borne parasitic disease caused by Leishmania species. The disease affects humans and animals, particularly dogs, provoking cutaneous, mucocutaneous, or visceral processes depending on the Leishmania sp. and the host immune response. No vaccine for humans is available, and control relies mainly on chemotherapy. However, the drugs currently in use are old, some are toxic, and the safer formulations are largely unaffordable for the most severely affected human populations. Moreover, their efficacy has shortcomings and has been challenged by growing reports of resistance and therapeutic failure. This manuscript presents an overview of the currently used drugs, the prevailing model for developing new antileishmanial drugs and its low efficiency, and the impact of the deconstruction of the drug pipeline on the high failure rate of potential drugs. To improve the predictive value of preclinical research in the chemotherapy of leishmaniasis, several proposals are presented to circumvent critical hurdles, namely: the lack of common goals in collaborative research, particularly in public–private partnerships; fragmented efforts; the use of inadequate surrogate models, especially for in vivo trials; and shortcomings of target product profile (TPP) guides.
... Computational methods are efficient, accurate, and holistic (i.e., they take into account the entire interaction space of chemical entities). They also offer the breadth of chemical space exploration necessary to overcome the limitations of traditional approaches [2,6,12,13,17–34]. To expand the compound libraries used in screening, combinatorial chemistry and machine learning design pipelines have been developed to generate libraries of compounds likely to bind to a given target [35–37]. ...
... We developed the Computational Analysis of Novel Drug Repurposing Opportunities (CANDO) platform for shotgun multitarget drug discovery, repurposing, and design to overcome the aforementioned limitations of traditional single-target approaches [18–29]. The platform screens and ranks drugs/compounds for every disease/indication (and adverse event) through large-scale modeling and analytics of interactions between comprehensive libraries of drugs/compounds and protein structures. ...
... Multiple pipelines for multiscale therapeutic drug discovery, repurposing, and design have been implemented in the CANDO platform [18–29]. Here we utilize CANDO to simulate the interactions between a given drug/compound and a library of protein structures to generate the corresponding proteomic interaction signature. ...
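A proteomic interaction signature can be treated as a fixed-order vector of interaction scores, with one score per protein structure. The sketch below assumes a caller-supplied scoring function, which stands in hypothetically for docking or another interaction-scoring protocol. It ranks compounds by predicted behavioral similarity using root-mean-square deviation (RMSD) between signatures. RMSD is an assumed metric here, and the platform may use other similarity measures.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def build_signature(compound: str, proteins: Sequence[str],
                    score: Callable[[str, str], float]) -> List[float]:
    """Proteomic interaction signature: one score per protein, in a fixed order.
    `score` is a hypothetical stand-in for docking or another scoring protocol."""
    return [score(compound, p) for p in proteins]


def rmsd(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square deviation between two signatures (lower = more similar)."""
    if len(a) != len(b) or not a:
        raise ValueError("signatures must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def rank_by_behavior(query: str,
                     signatures: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank every other compound by signature RMSD to the query, most similar first."""
    q = signatures[query]
    scored = [(name, rmsd(q, sig)) for name, sig in signatures.items() if name != query]
    return sorted(scored, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    proteins = ["P1", "P2", "P3"]
    # Hypothetical interaction scores (rows: compounds, columns: proteins).
    table = {"drugA": [0.9, 0.1, 0.5], "drugB": [0.8, 0.2, 0.5], "drugC": [0.1, 0.9, 0.0]}

    def score(compound: str, protein: str) -> float:
        return table[compound][proteins.index(protein)]

    signatures = {c: build_signature(c, proteins, score) for c in table}
    for name, d in rank_by_behavior("drugA", signatures):
        print(f"{name}: RMSD = {d:.3f}")
```

In this toy example, drugB ranks first (RMSD ≈ 0.082) and drugC second (RMSD ≈ 0.714). This matches the intuition that drugB's scores differ from drugA's only slightly on each protein.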
Preprint
Computational approaches have accelerated novel therapeutic discovery in recent decades. The Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multi-target therapeutic discovery, repurposing, and design aims to improve their efficacy and safety by employing a holistic approach that computes interaction signatures between every drug/compound and a large library of non-redundant protein structures corresponding to the human proteome fold space. These signatures are compared and analyzed to determine if a given drug/compound is efficacious and safe for a given indication/disease. In this study, we used a deep learning-based autoencoder to first reduce the dimensionality of CANDO-computed drug–proteome interaction signatures. We then employed a reduced conditional variational autoencoder to generate novel drug-like compounds when given a target encoded "objective" signature. Using this model, we designed compounds to recreate the interaction signatures for twenty approved and experimental drugs and showed that 16/20 designed compounds are predicted to be significantly (p-value ≤ 0.05) more behaviorally similar relative to all corresponding controls, and 20/20 are predicted to be more behaviorally similar relative to a random control. We further observed that redesigns of objectives developed via rational drug design perform significantly better than those derived from natural sources (p-value ≤ 0.05), suggesting that the model has learned an abstraction of rational drug design. We also show that designed compounds are structurally diverse and synthetically feasible when compared to their respective objective drugs despite consistently high predicted behavioral similarity. Finally, we generated new designs that enhance thirteen drugs/compounds associated with non-small cell lung cancer and anti-aging properties using their predicted proteomic interaction signatures.
This work represents a significant step forward in automating holistic therapeutic design with machine learning, and subsequently offers a reduction in the time needed to generate novel, effective, and safe drug leads for any indication.
... [7] Less-defined end goals (high risk) enable researchers to systematically disrupt the entire drug discovery and development process (high reward). Drug-repurposing technologies, such as our CANDO platform [8–18], are used to systematically predict the relative efficacy of every drug in their comprehensive libraries to treat every disease/indication, minimizing risks and amplifying rewards. In conjunction with mechanistic basic science analyses, these platforms may be used to better understand the science of drug behavior and thereby model reality with greater fidelity. ...
... We developed and deployed the CANDO platform to model the relationships between every disease/indication and every human-use drug/compound [8–18]. Built upon the premise of polypharmacology and multitargeting, CANDO has at its core the ability to infer the similarity of compound/drug behavior. Canonically, we use molecular-docking protocols to evaluate the interactions between large libraries of drugs/compounds and protein structures. ...
... Since the development and application of CANDO version 1 [8–11], we have continued to enhance our platform in several ways. These include analyzing the effect of protein subsets on drug behavior, implementing heterogeneous measures of drug/compound similarity, using multiple molecular-docking software packages to evaluate interactions, and refining nonsimilarity-based approaches for drug repurposing in situations where there is no approved drug for a disease/indication [12–18,39]. CANDO v2: Version 2 of the CANDO platform (v2) described here is used as a template for the rigorous evaluation of the performance of drug repurposing technologies. It implements updated drug/compound and protein structure libraries, indication lists, drug–indication mappings, interaction scoring protocols, and benchmarking and evaluation metrics, along with data fusion of multiple pipelines that mix and match between these choices. ...
Article
Drug-repurposing technologies are growing in number and maturing. However, comparisons to each other and to reality are hindered because of a lack of consensus with respect to performance evaluation. Such comparability is necessary to determine scientific merit and to ensure that only meaningful predictions from repurposing technologies carry through to further validation and eventual patient use. Here, we review and compare performance evaluation measures for these technologies using version 2 of our shotgun repurposing Computational Analysis of Novel Drug Opportunities (CANDO) platform to illustrate their benefits, drawbacks, and limitations. Understanding and using different performance evaluation metrics ensures robust cross-platform comparability, enabling us to continue to strive toward optimal repurposing by decreasing the time and cost of drug discovery and development.
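One example of a performance evaluation metric is a top-k indication accuracy. The sketch below takes each indication with two or more approved drugs and computes the fraction of its drugs whose k most similar compounds include another drug approved for the same indication. It then averages this fraction over indications. The sketch assumes precomputed ranked similarity lists that exclude the query drug, and uses k = 25 by default to mirror the top-25 benchmarking described for the platform. It is an illustrative reconstruction, not the platform's benchmarking code.

```python
from typing import Dict, List


def indication_accuracy(approved: List[str], ranked: Dict[str, List[str]],
                        k: int = 25) -> float:
    """Fraction of an indication's approved drugs whose top-k most similar compounds
    include at least one other drug approved for the same indication."""
    approved_set = set(approved)
    hits = sum(
        1 for drug in approved
        if any(c in approved_set and c != drug for c in ranked[drug][:k])
    )
    return hits / len(approved)


def average_indication_accuracy(indications: Dict[str, List[str]],
                                ranked: Dict[str, List[str]], k: int = 25) -> float:
    """Mean indication accuracy over indications with two or more approved drugs."""
    eligible = [drugs for drugs in indications.values() if len(drugs) >= 2]
    if not eligible:
        raise ValueError("no indication has two or more approved drugs")
    return sum(indication_accuracy(d, ranked, k) for d in eligible) / len(eligible)


if __name__ == "__main__":
    # Hypothetical ranked similarity lists (each excludes the query drug itself).
    ranked = {
        "A": ["B", "X", "Y"],
        "B": ["X", "A", "Y"],
        "C": ["X", "Y", "A"],
        "D": ["C", "X", "Y"],
    }
    indications = {"ind1": ["A", "B"], "ind2": ["C", "D"], "ind3": ["A"]}
    print(average_indication_accuracy(indications, ranked, k=1))  # 0.5
    print(average_indication_accuracy(indications, ranked, k=2))  # 0.75
```

In the toy data, ind3 is skipped because it has only one approved drug. Raising k from 1 to 2 lets drug B recover drug A, which lifts the average from 0.5 to 0.75.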