Article

NeoDTI: neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions

Abstract

Motivation: Accurately predicting drug-target interactions (DTIs) in silico can guide the drug discovery process and thus facilitate drug development. Computational approaches for DTI prediction that adopt the systems biology perspective generally exploit the rationale that the properties of drugs and targets can be characterized by their functional roles in biological networks.

Results: Inspired by recent advances in information passing and aggregation techniques that generalize convolutional neural networks to mine large-scale graph data and greatly improve the performance of many network-related prediction tasks, we develop a new nonlinear end-to-end learning model, called NeoDTI, that integrates diverse information from heterogeneous network data and automatically learns topology-preserving representations of drugs and targets to facilitate DTI prediction. The substantial improvement in prediction performance over other state-of-the-art DTI prediction methods, as well as several novel predicted DTIs supported by evidence from previous studies, demonstrates the superior predictive power of NeoDTI. In addition, NeoDTI is robust against a wide range of choices of hyperparameters and is ready to integrate more drug and target related information (e.g. compound-protein binding affinity data). All these results suggest that NeoDTI can offer a powerful and robust tool for drug development and drug repositioning.

Availability and implementation: The source code and data used in NeoDTI are available at: https://github.com/FangpingWan/NeoDTI.

Supplementary information: Supplementary data are available at Bioinformatics online.
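The neighbor-information integration described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the published model is nonlinear and learns its parameters end-to-end, whereas the sketch below has no trainable parameters. The node names, edge types, embedding values, and the helper names `aggregate` and `score` are all hypothetical, chosen only to show one round of per-edge-type neighbor averaging followed by inner-product scoring of a drug-target pair.

```python
def aggregate(embeddings, edges_by_type):
    """One round of neighbor aggregation (illustrative only).

    For each node, average the neighbor embeddings within each edge type,
    sum those averages across edge types, and add the node's own embedding.
    """
    updated = {}
    for node, vec in embeddings.items():
        new_vec = list(vec)
        for adjacency in edges_by_type.values():
            neighbors = adjacency.get(node, [])
            if not neighbors:
                continue  # node has no neighbors of this edge type
            for i in range(len(vec)):
                mean_i = sum(embeddings[n][i] for n in neighbors) / len(neighbors)
                new_vec[i] += mean_i
        updated[node] = new_vec
    return updated


def score(u, v):
    """Inner product of two embeddings, used here as an interaction score."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 2-dimensional starting embeddings for two drugs and two targets.
embeddings = {
    "d1": [1.0, 0.0],
    "d2": [0.0, 1.0],
    "t1": [1.0, 1.0],
    "t2": [0.0, 2.0],
}

# Hypothetical heterogeneous network: one adjacency map per edge type.
edges_by_type = {
    "drug-target": {"d1": ["t1"], "t1": ["d1"]},
    "drug-drug": {"d1": ["d2"], "d2": ["d1"]},
}

updated = aggregate(embeddings, edges_by_type)
known_pair = score(updated["d1"], updated["t1"])      # pair with an observed edge
candidate_pair = score(updated["d2"], updated["t1"])  # pair without an observed edge
```

In this toy network, `d2` has no edge to `t1`, yet it receives a nonzero score because it shares a drug-drug edge with `d1`. That is the general intuition behind predicting unseen interactions from network neighborhoods.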

... On the other hand, graph neural networks (GNNs), which are nonlinear machine learning models that generalize convolutional neural networks to graph data [19,20] with new information passing and aggregation techniques [21,22], have been successfully applied to heterogeneous network prediction tasks and yielded state-of-the-art performance [23,24,25]. The GNN methods have emerged as an important paradigm for advancing the drug discovery process [26,27,28]. For instance, NeoDTI [26] adopts such a GNN-based approach to exploit the heterogeneous biological networks and yields the state-of-the-art performance for target-drug interaction prediction. Despite the success stories of the aforementioned computational methods in target identification, they still have the following limitations: ...
... On the one hand, the structured data in the form of biological networks document the relations between different biological entities (e.g., the interactions or associations between targets, diseases, drugs, and their side effects). These networks enable the graph learning methods to capture the latent feature representations of entities to infer new relations [17,26]. On the other hand, the unstructured data in the form of literature evidence can add a new dimension to describe the known relations. ...
Preprint
Early identification of safe and efficacious targets is crucial to alleviating the tremendous cost of drug discovery projects. However, existing experimental methods for identifying new targets are generally labor-intensive and failure-prone. On the other hand, computational approaches, especially the machine learning-based frameworks, have shown remarkable application potential in drug discovery. In this work, we propose Progeni, a novel machine learning-based framework for target identification. In addition to fully exploiting the known heterogeneous biological networks from various sources, Progeni integrates literature evidence about the relations between biological entities to construct a probabilistic knowledge graph. Graph neural networks are then employed in Progeni to learn the feature embeddings of biological entities to facilitate the identification of biologically relevant target candidates. A comprehensive evaluation of Progeni demonstrated its superior predictive power over the baseline methods on the target identification task. In addition, our extensive tests showed that Progeni exhibited high robustness to the negative effect of exposure bias, a common phenomenon in recommendation systems, and effectively identified new targets that can be strongly supported by the literature. Moreover, our wet-lab experiments successfully validated the biological significance of the top target candidates predicted by Progeni for melanoma and colorectal cancer. All these results suggested that Progeni can identify biologically effective targets and thus provide a powerful and useful tool for advancing the drug discovery process.
... Wang et al. transformed new DTI prediction problems into a two-layer graphical model named the restricted Boltzmann machine (RBM). Wan et al. [20] developed a new nonlinear end-to-end learning model, called NeoDTI, which integrates different heterogeneous information of drugs and targets and learns the representations of drugs and targets to predict DTIs. However, note that these approaches have the disadvantage of treating all node relationships in the heterogeneous network equally, and they may not work when chemical pathways and protein interactions are unknown. ...
... Deep learning-based methods Y. Yamanishi [5] √ √ X. Zheng [10] √ H. Öztürk [6] √ √ Y.-B. Wang [7] √ √ A. Mayr [11] √ √ I. Lee [12] √ √ X. Chen [8] √ X. Zeng [9] √ F. Wan [20] √ √ ...
... NeoDTI [20] integrates diverse information from heterogeneous networks and uses a graph neural network to learn the representations of drugs and targets automatically. [36] propose an "end-to-end" learning framework based on heterogeneous graph convolutional networks to learn low-dimensional feature representations of drugs and targets. ...
Article
Predicting interactions between drugs and target proteins has become an essential task in the drug discovery process. Although validation via wet-lab experiments has become available, experimental methods for drug-target interaction (DTI) identification remain either time-consuming or heavily dependent on domain expertise. Therefore, various computational models have been proposed to predict possible interactions between drugs and target proteins. However, most prediction methods do not consider the topological characteristics of the relationships. In this paper, we propose a relational topology-based heterogeneous network embedding method to predict drug-target interactions, abbreviated as RTHNE_DTI. We first construct a heterogeneous information network based on the interactions between different types of nodes, to enhance the ability of association discovery by fully considering the topology of the network. Then drug and target protein nodes can be represented by the other types of nodes. According to the different topological structures of the relationships between the nodes, we divide the relationships in the heterogeneous network into two categories and model them separately. In extensive experiments on real-world drug datasets, RTHNE_DTI achieves high efficiency and outperforms other state-of-the-art methods. RTHNE_DTI can be further used to predict interactions for drug-target pairs whose interaction status is unknown.
... Over recent years, a substantial number of computational prediction methods have been developed for drug discovery (Zhu et al., 2005; Sousa et al., 2006; Keiser et al., 2007; Bleakley and Yamanishi, 2009; Buza and Peška, 2017; Luo et al., 2017; Cheng et al., 2018; Olayan et al., 2018; Wan et al., 2018; Yan et al., 2019; Zeng et al., 2019; Wang et al., 2020a; Tang et al., 2020; Zeng et al., 2020; An and Yu, 2021; Chu et al., 2021; Yan et al., 2021; Zong et al., 2021). Target-based (Sousa et al., 2006), ligand similarity-based (Keiser et al., 2007) and machine learning-based (Zhu et al., 2005) methods are the three mainstream categories of prediction methods. ...
... Integration of neighbor information from a heterogeneous network for discovering new drug-target interactions develops a new nonlinear end-to-end learning model (Wan et al., 2018). The model integrates various information from heterogeneous network data and automatically learns representations that preserve drug and target topologies to facilitate DTI prediction. It focuses on the topological information of drugs and targets, but the feature information it collects is limited to drug structure similarity and target sequence similarity. ...
... The multi-source data used in this paper included nine sources for the drugs and six for the targets. According to current studies (Wan et al., 2018; Zeng et al., 2020), data sources for drugs and targets are not limited to these and also include, for example, drug-induced gene expression profiles and drug pathway profiles. In the future, more data sources for drugs and targets will be studied to complement the richness of drugs and targets with multiple networks, and to further confirm our strategy's robustness. ...
Article
Accurate identification of drug-target interactions (DTIs) is of great significance for understanding the mechanism of drug treatment and discovering new drugs for disease treatment. Currently, computational methods of DTI prediction that combine drug and target multi-source data can effectively reduce the cost and time of drug development. However, in multi-source data processing, the contribution of different source data to DTIs is often not considered. Therefore, making full use of the contribution of different source data for efficient fusion is the key to improving the prediction accuracy of DTIs. In this paper, considering the contribution of different source data to DTI prediction, a DTI prediction approach based on an effective fusion of drug and target multi-source data is proposed, named EFMSDTI. EFMSDTI first builds 15 similarity networks based on multi-source information networks classified as topological and semantic graphs of drugs and targets according to their biological characteristics. Then, the multiple networks are fused by selective and entropy weighting based on similarity network fusion (SNF) according to their contribution to DTI prediction. A deep neural network model learns low-dimensional vector embeddings of drugs and targets. Finally, the LightGBM algorithm based on Gradient Boosting Decision Tree (GBDT) is used to complete DTI prediction. Experimental results show that EFMSDTI has better performance (AUROC and AUPR are 0.982) than several state-of-the-art algorithms. In addition, in an analysis of the top 1000 prediction results, 990 of the first 1000 DTIs were confirmed. Code and data are available at https://github.com/meng-jie/EFMSDTI.
... BMC Bioinformatics (2022) 23:372 As a result, the graph embedding approach, particularly the graph neural network method [19], is gradually applied to this issue. In order to anticipate probable drug-target interactions, Wan et al. [20] developed a neural integration of neighbor information from an HN (NeoDTI). NeoDTI automatically learns topology-preserving representations while integrating a variety of data from HN. ...
... Four recently proposed deep-learning models, including DeepDR, NeoDTI, LAGCN, and NIMGCN [18,20,21,36], are chosen as baseline approaches in order to demonstrate the superiority of GCMM's performance. They are also similarity-based graph neural network models. ...
... After the fully connected layer, the correlation coefficients of each drug-disease pair are obtained through a matrix completion decoder. Experimental results in 5FCCV demonstrate that GCMM performs better than the other four similarity-based graph neural network models, DeepDR, NeoDTI, LAGCN, and NIMGCN [18,20,21,36], on the majority of indices, and has a much higher accuracy. In addition, a case study on potential therapeutics for AD provides a specific application that reaffirms the medical validity of GCMM. ...
Article
Background: The main focus of in silico drug repurposing, which is a promising area for using artificial intelligence in drug discovery, is the prediction of drug–disease relationships. Although many computational models have been proposed recently, it is still difficult to reliably predict drug–disease associations from a variety of sources of data. Results: In order to identify potential drug–disease associations, this paper introduces a novel end-to-end model called Graph convolution network based on a multimodal attention mechanism (GCMM). In particular, GCMM incorporates known drug–disease relations, drug–drug chemical similarity, drug–drug therapeutic similarity, disease–disease semantic similarity, and disease–disease target-based similarity into a heterogeneous network. A Graph Convolution Network encoder is used to learn how diseases and drugs are embedded in various perspectives. Additionally, GCMM can enhance performance by applying a multimodal attention layer that assigns different weights to different features and by taking multi-source information as input. Conclusion: Five-fold cross-validation evaluations show that GCMM outperforms four recently proposed deep-learning models on the majority of the criteria. It shows that GCMM can predict drug–disease relationships reliably and suggests improvement in the desired metrics. Hyper-parameter analysis and exploratory ablation experiments are also provided to demonstrate the necessity of each module of the model and the highest possible level of prediction performance. Additionally, a case study on Alzheimer’s disease (AD) is presented. Four of the five medications indicated by GCMM to have the highest potential correlation coefficient with AD have been confirmed through literature or experimental research, demonstrating the viability of GCMM. All of these results imply that GCMM can provide a strong and effective tool for drug development and repositioning.
... Based on the data sources used as the input, two types of algorithms were used: network-based methods and structure- and sequence-based methods: (i) Network-based methods are the methods that used any graphical information from the proposed benchmark as the input, which includes multiple types of biomedical entities, such as drugs, targets, diseases, side effects and pathways, and the corresponding information from multipartite (including drug-target bipartite) networks. In practice, we used three state-of-the-art network-based methods: DTINet [21], Bio-Linked Network Embeddings (bioLNE) [64] and NEural integration of neighbOr information for DTI prediction (NeoDTI) [65]. For DTINet and NeoDTI, we used drug-target, drug-disease, protein-disease, drug-side effect, protein-protein, drug-drug interaction as well as drug-drug similarity, and protein-protein similarity matrices as the input data. ...
... Existing methods were categorized into two distinct categories for the purpose of evaluating our benchmark based on the input data used: (i) network-based and (ii) structure- and sequence-based methods. For network-based methods, DTINet [21], bioLNE [64] and NeoDTI [65] are considered state-of-the-art for comparison. For structure- and sequence-based methods, DeepPurpose [66], DeepDTA [67] and GraphDTA [68] were adopted. ...
... The proposed benchmark will provide a fine assessment of the effectiveness of the drug repurposing methods. Secondly, both DeepPurpose [66] and NeoDTI [65] perform worse when the training and testing nodes share more connections (e.g. the performance of CC > TA > TC > TT; SU > SS > DI). Although it was expected for a network-based method, such as NeoDTI, to be affected by the connectivity of the drugs and targets, it is a novel bias found for DeepPurpose. ...
Article
Internal validation is the most popular evaluation strategy used for drug-target predictive models. The simple random shuffling in the cross-validation, however, is not always ideal to handle large, diverse and copious datasets as it could potentially introduce bias. Hence, these predictive models cannot be comprehensively evaluated to provide insight into their general performance on a variety of use-cases (e.g. permutations of different levels of connectiveness and categories in drug and target space, as well as validations based on different data sources). In this work, we introduce a benchmark, BETA, that aims to address this gap by (i) providing an extensive multipartite network consisting of 0.97 million biomedical concepts and 8.5 million associations, in addition to 62 million drug-drug and protein-protein similarities and (ii) presenting evaluation strategies that reflect seven cases (i.e. general, screening with different connectivity, target and drug screening based on categories, searching for specific drugs and targets and drug repurposing for specific diseases), a total of seven Tests (consisting of 344 Tasks in total) across multiple sampling and validation strategies. Six state-of-the-art methods covering two broad input data types (chemical structure- and gene sequence-based and network-based) were tested across all the developed Tasks. The best-worst performing cases have been analyzed to demonstrate the ability of the proposed benchmark to identify limitations of the tested methods for running over the benchmark tasks. The results highlight BETA as a benchmark in the selection of computational strategies for drug repurposing and target discovery.
... makes this new database an important supplement to existing databases. With the rapid adoption of AI in drug discovery, it is found that a comprehensive 'interacting network' including a large number of drugs and their interacting molecules is highly favored when using AI methods (108)(109)(110)(111)(112)(113)(114). For example, some tools were constructed to predict drug-target interaction based on the network of heterogeneous drug-centered interactions (109), and other tools were made available to learn the topology-preserving representation of drugs and targets based on the heterogeneous network data of drug-target, drug-drug and target-target interaction (110). Because of these emerging demands on such interaction-based big data, DrugMAP made the first endeavor to weave a comprehensive network containing >200 000 interactions among >30 000 drugs/drug candidates and >5000 molecules of pharmacological importance. ...
Article
The efficacy and safety of drugs are widely known to be determined by their interactions with multiple molecules of pharmacological importance, and it is therefore essential to systematically depict the molecular atlas and pharma-information of studied drugs. However, our understanding of such information is neither comprehensive nor precise, which necessitates the construction of a new database providing a network containing a large number of drugs and their interacting molecules. Here, a new database describing the molecular atlas and pharma-information of drugs (DrugMAP) was therefore constructed. It provides a comprehensive list of interacting molecules for >30 000 drugs/drug candidates, gives the differential expression patterns for >5000 interacting molecules among different disease sites, ADME (absorption, distribution, metabolism and excretion)-relevant organs and physiological tissues, and weaves a comprehensive and precise network containing >200 000 interactions among drugs and molecules. With the great efforts made to clarify the complex mechanism underlying drug pharmacokinetics and pharmacodynamics and rapidly emerging interests in artificial intelligence (AI)-based network analyses, DrugMAP is expected to become an indispensable supplement to existing databases to facilitate drug discovery. It is now fully and freely accessible at: https://idrblab.org/drugmap/.
... This section gives the details of the drug-target interaction dataset used in our proposed work. The dataset was obtained from the study by Wan et al. [35]. They obtained the original datasets for DDI and DTI from DrugBank Version 3.0 [26], PPI from HPRD database Release 9 [27], drug structure similarity from Morgan fingerprints of radius 2 calculated by RDKit [28][29], and protein sequence similarity from ...
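As a rough illustration of how a structure similarity can be derived from binary fingerprints such as the Morgan fingerprints mentioned above, the sketch below computes a Tanimoto coefficient. The Tanimoto coefficient is a common choice for comparing such fingerprints, but the snippet does not say which measure was used, so treating it as the measure here is an assumption. The fingerprints are represented as sets of on-bit indices, the bit values are made up, and the function name `tanimoto` is hypothetical. No cheminformatics library is called.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as on-bit index sets."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Made-up on-bit indices standing in for two drugs' fingerprints.
fp_x = {3, 17, 42, 128}
fp_y = {3, 42, 99}

sim = tanimoto(fp_x, fp_y)  # 2 shared bits out of 5 distinct bits
```

Applying such a function to every pair of drugs would yield a symmetric drug-drug similarity matrix of the kind used as input by network-based DTI methods.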
... In this section, we detail the experiments and analysis performed using the proposed technique, comparing the performance of the proposed method with the baselines: MSCMF [32], TL_HGBI [33], DTINet [34], NeoDTI [35], and GADTI [36]. MSCMF is a factor model that uses multiple drug and target similarity matrices to automatically select similarities by estimating the weights of multiple similarity matrices from the data, which is effective in improving the performance of predicting drug-target interactions. ...
Preprint
Background Accurate prediction of drug-target interactions (DTIs) can guide the drug discovery process and thus facilitate drug development. Most existing computational models for machine learning tend to focus on integrating multiple data sources and combining them with popular embedding methods. However, researchers have paid less attention to the correlation between drugs and target proteins. In addition, recent studies have employed heterogeneous network graphs for DTI prediction, but there are limitations in obtaining rich neighborhood information among nodes in heterogeneous network graphs. Results Inspired by recent years of graph embedding and knowledge representation learning, we develop a new end-to-end learning model, called Graph-DTI, which integrates various information from heterogeneous network data and automatically learns topology-preserving representations of drugs and targets to facilitate DTI prediction. Our framework consists of three main building blocks. First, we integrate multiple data sources of drugs and target proteins and build a heterogeneous network from a collection of datasets. Second, the heterogeneous network is formed by extracting higher-order structural information using a GCN-inspired graph autoencoder to learn the nodes (drugs, proteins) and their topological neighborhood representations. The last part is to predict the potential DTIs and then send the trained samples to the classifier for binary classification. Conclusions The substantial improvement in prediction performance compared to other baseline DTI prediction methods demonstrates the superior predictive power of Graph-DTI. Moreover, the proposed framework has been successful in ranking drugs corresponding to different targets and vice versa. All these results suggest that Graph-DTI can provide a powerful tool for drug research, development and repositioning.
... In contrast, network-based approaches consider the diverse drug and target data as a (multiplex) heterogeneous DTI network that describes multiple aspects of drug and target relations, and learn topology-preserving representations of drugs and targets to facilitate DTI prediction. With deep neural networks showing consistently superior performance in the latest years in a plethora of different learning tasks, their adoption in the DTI prediction field, especially inferring new DTIs by mining DTI networks, is understandably rising [12,5,13,14]. Although deep learning models achieve improved performance, they require larger amounts of data and are computationally intensive [1]. ...
... WkNNIR cannot perform predictions in S1, as it is specifically designed to predict interactions involving new drugs or/and targets (S2, S3, S4) [6]. Furthermore, the proposed MDMF2A is also compared to four deep learning-based methods, namely NeoDTI [12], DTIP [5], DCFME [14], and SupDTI [13]. These deep learning competitors can only be applied to the Luo dataset in S1, because they formulate the DTI dataset as a heterogeneous network consisting of four types of nodes (drugs, targets, drug-side effects, and diseases) and learn embeddings for all types of nodes. ...
Article
The discovery of drug–target interactions (DTIs) is a very promising area of research with great potential. The accurate identification of reliable interactions among drugs and proteins via computational methods, which typically leverage heterogeneous information retrieved from diverse data sources, can boost the development of effective pharmaceuticals. Although random walk and matrix factorization techniques are widely used in DTI prediction, they have several limitations. Random walk-based embedding generation is usually conducted in an unsupervised manner, while the linear similarity combination in matrix factorization distorts individual insights offered by different views. To tackle these issues, we take a multi-layered network approach to handle diverse drug and target similarities, and propose a novel optimization framework, called Multiple similarity DeepWalk-based Matrix Factorization (MDMF), for DTI prediction. The framework unifies embedding generation and interaction prediction, learning vector representations of drugs and targets that not only retain higher order proximity across all hyper-layers and layer-specific local invariance, but also approximate the interactions with their inner product. Furthermore, we develop an ensemble method (MDMF2A) that integrates two instantiations of the MDMF model, optimizing the area under the precision-recall curve (AUPR) and the area under the receiver operating characteristic curve (AUC), respectively. The empirical study on real-world DTI datasets shows that our method achieves statistically significant improvement over current state-of-the-art approaches in four different settings. Moreover, the validation of highly ranked non-interacting pairs also demonstrates the potential of MDMF2A to discover novel DTIs.
... Based on integrated multiple drug and protein-related information sources, Luo et al. [43] developed a method called DTINet to predict potential drug-protein associations. The nonlinear end-to-end learning model NeoDTI was proposed by Wan et al. [44] to facilitate DTI prediction. The DTI-CNN model proposed by Peng et al. [40] obtained drug and target features in heterogeneous networks through random walks, and then used a deep neural network model to predict new drug-target interactions. ...
Article
Parkinson’s disease (PD) is a serious neurodegenerative disease. Most current treatments can only alleviate symptoms, but cannot stop the progress of the disease. Therefore, it is crucial to find medicines to completely cure PD. Finding new indications of existing drugs through drug repositioning can not only reduce risk and cost, but also improve the efficiency of research and development. A drug repurposing method was proposed to identify potential Parkinson’s disease-related drugs based on multi-source data integration and a convolutional neural network. Multi-source data were used to construct similarity networks, and topology information was utilized to characterize drugs and PD-associated proteins. Then, the diffusion component analysis method was employed to reduce the feature dimension. Finally, a convolutional neural network model was constructed to identify potential associations between existing drugs and LProts (PD-associated proteins). Based on 10-fold cross-validation, the developed method achieved an accuracy of 91.57%, specificity of 87.24%, sensitivity of 95.27%, Matthews correlation coefficient of 0.8304, area under the receiver operating characteristic curve of 0.9731 and area under the precision–recall curve of 0.9727. Compared with the state-of-the-art approaches, the current method demonstrates superiority in some aspects, such as sensitivity, accuracy and robustness. In addition, molecular docking of some of the predicted potential PD therapeutics further indicated that they can exert their efficacy by acting on the known targets of PD, and may be potential PD therapeutic drugs for further experimental research. It is anticipated that the current method may be considered as a powerful tool for drug repurposing and pathological mechanism studies.
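The threshold-based metrics reported in this abstract (accuracy, specificity, sensitivity and the Matthews correlation coefficient) all derive from the four counts of a binary confusion matrix. The sketch below uses their standard definitions. The counts are hypothetical and are not taken from the paper, and the function name `binary_metrics` is an illustrative choice. It assumes both classes are present, so the sensitivity and specificity denominators are nonzero.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true-positive rate (recall)
    specificity = tn / (tn + fp)  # true-negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
m = binary_metrics(tp=90, fp=15, tn=85, fn=10)
```

The two area-under-curve figures in the abstract are different in kind: they summarize performance across all decision thresholds and cannot be obtained from a single confusion matrix.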
... New trends in network-driven drug discoveries also include graph neural network-based approaches. For example, NeoDTI [81] implements an end-to-end neural network to learn the embeddings via graph convolutional networks and uses a topology-preserving objective to learn to reconstruct the weights of the input graph. The network can then be queried for drug-protein pairs to predict their interaction. ...
... A major advantage of these methods is the possibility to include any number of node and edge types and to predict any type of interaction. A disadvantage that is shared with all link prediction methods is that predictions depend on the completeness of the interaction graph and similarity information used for learning which can significantly impact predictions for novel drugs or targets [81]. ...
Article
The network approach is quickly becoming a fundamental building block of computational methods aiming at elucidating the mechanism of action (MoA) and therapeutic effect of drugs. By modeling the effect of drugs and diseases on different biological networks, it is possible to better explain the interplay between disease perturbations and drug targets as well as how drug compounds induce favorable biological responses and/or adverse effects. Omics technologies have been extensively used to generate the data needed to study the mechanisms of action of drugs and diseases. These data are often exploited to define condition-specific networks and to study whether drugs can reverse disease perturbations. In this review, we describe network data mining algorithms that are commonly used to study drug's MoA and to improve our understanding of the basis of chronic diseases. These methods can support fundamental stages of the drug development process, including the identification of putative drug targets, the in silico screening of drug compounds and drug combinations for the treatment of diseases. We also discuss recent studies using biological and omics-driven networks to search for possible repurposed FDA-approved drug treatments for SARS-CoV-2 infections (COVID-19).
... protein-associated phenotypes and diseases, drugs' side-effects, proteins' domain information) have been widely applied in traditional machine learning based drug-target interaction prediction works (Y. Luo et al., 2017; Wan et al., 2019) and other topics (e.g. drug repositioning). ...
Preprint
Due to the tremendous combinatorial search space of drug-protein pairs, deep learning algorithms have been utilized to facilitate the identification of novel drug-target interactions. In this paper, we proposed an end-to-end deep learning model, DeepERA, to identify drug-target interactions based on heterogeneous data. This model assembles three independent feature embedding modules (intrinsic embedding, relational embedding, and annotation embedding), each of which represents different attributes of the dataset and which jointly contribute to the comprehensive predictions. To our knowledge, this is the first work that applies deep learning models to learn intrinsic features, relational features, and annotation features separately and combines them to predict drug-protein interactions. Our results showed that DeepERA outperformed other deep learning approaches proposed recently. Using our DeepERA framework, we identified 45,603 novel drug-protein interactions for the whole human proteome, including 356 drug-protein interactions for the human proteins targeted by SARS-CoV-2 viral proteins. We also performed computational docking for the selected interactions and conducted a two-way statistical test to "normalize" the docking scores of different proteins/drugs to support our predictions.
... For example, Peng et al. and Shao et al. predicted DTIs by integrating various node information through Graph Convolution Network (GCN) [37,40]. Similarly, Wan et al. proposed NeoDTI to predict DTIs, based on GCN integrating multi-type neighborhood information to advanced features through the neural network [39]. These methods of features diffusing according to the network structure ignore the direct association behavior semantic information of the network structure. ...
Article
Background: Drug-target interaction (DTI) prediction has become a crucial prerequisite in drug design and drug discovery. However, the traditional biological experiment is time-consuming and expensive, as there are abundant complex interactions present in the large size of genomic and chemical spaces. To alleviate this problem, plenty of computational methods have been developed to effectively complement biological experiments and narrow the search spaces into a preferred candidate domain. However, most of the previous approaches cannot fully consider association behavior semantic information based on several schemas to represent the complex structure of heterogeneous biological networks. Additionally, the prediction of DTI based on single modalities cannot satisfy the demand for prediction accuracy. Methods: We propose a multi-modal representation framework of ‘DeepMPF’ based on meta-path semantic analysis, which effectively utilizes heterogeneous information to predict DTI. Specifically, we first construct protein–drug-disease heterogeneous networks composed of three entities. Then the feature information is obtained under three views, containing sequence modality, heterogeneous structure modality and similarity modality. We proposed six representative schemas of meta-path to preserve the high-order nonlinear structure and catch hidden structural information of the heterogeneous network. Finally, DeepMPF generates highly representative comprehensive feature descriptors and calculates the probability of interaction through joint learning. Results: To evaluate the predictive performance of DeepMPF, comparison experiments are conducted on four gold datasets. Our method can obtain competitive performance in all datasets. We also explore the influence of the different feature embedding dimensions, learning strategies and classification methods. Meaningfully, the drug repositioning experiments on COVID-19 and HIV demonstrate DeepMPF can be applied to solve problems in reality and help drug discovery. The further analysis of molecular docking experiments enhances the credibility of the drug candidates predicted by DeepMPF. Conclusions: All the results demonstrate the effective predictive capability of DeepMPF for drug-target interactions. It can be utilized as a useful tool to prescreen the most promising drug candidates for the protein. The web server of the DeepMPF predictor is freely available at http://120.77.11.78/DeepMPF/ , which can help relevant researchers to further study.
... Microarray gene expression [41], gene-disease interaction [42], disease-disease similarity, disease-variant network [43,44] CNN, GCN Although a lot of progress has been made in the relevant studies, there still remain some limitations, especially in the analysis of both the multiple trajectories of medical use and the patients' static data, such as electronic medical data. Researchers always employ two modes of feature learning for electronic medical data. ...
Article
How to use multi-dimensional time series data is a huge challenge for big data analysis. Multiple trajectories of medical use in electronic medical data are typical time series data. Although many artificial-intelligence techniques have been proposed to use the multiple trajectories of medical use in predicting the risk of concurrent medical use, most existing methods pay less attention to the temporal property of medical-use trajectory and the potential correlation between the different trajectories of medical use, resulting in limited concurrent multi-trajectory applications. To address the problem, we proposed a multi-stage neural network-based application mode of multi-dimensional time series data for feature learning of high-dimensional electronic medical data in adverse event prediction. We designed a synthetic factor for the multiple trajectories of medical use with the combination of a Long Short Term Memory–Deep Auto Encoder neural network and bisecting k-means clustering method. Then, we used a deep neural network to produce two kinds of feature vectors for risk prediction and risk-related factor analysis, respectively. We conducted extensive experiments on a real-world dataset. The results showed that our proposed method increased the accuracy by 5%~10%, and reduced the false rate by 3%~5% in the risk prediction of concurrent medical use. Our proposed method contributes not only to clinical research, where it helps clinicians make effective decisions and establish appropriate therapy programs, but also to the application optimization of multi-dimensional time series data for big data analysis.
... Zeng et al. presented a network-based model that combines neural networks with heterogeneous networks in which drugs and targets are represented as nodes, and their interactions are represented as edges to predict drug-target interactions. 97 Although system pharmacology is beyond the scope of this review and will not be further analyzed, it is interesting to note that the combination of ML methods with biological networks may be an efficient strategy to develop more accurate and understandable predictive models, as more data on AOPs become available. ...
Article
Machine learning (ML) models to predict the toxicity of small molecules have garnered great attention and have become widely used in recent years. Computational toxicity prediction is particularly advantageous in the early stages of drug discovery in order to filter out molecules with high probability of failing in clinical trials. This has been helped by the increase in the number of large toxicology databases available. However, being an area of recent application, a greater understanding of the scope and applicability of ML methods is still necessary. There are various kinds of toxic end points that have been predicted in silico. Acute oral toxicity, hepatotoxicity, cardiotoxicity, mutagenicity, and the 12 Tox21 data end points are among the most commonly investigated. Machine learning methods exhibit different performances on different data sets due to dissimilar complexity, class distributions, or chemical space covered, which makes it hard to compare the performance of algorithms over different toxic end points. The general pipeline to predict toxicity using ML has already been analyzed in various reviews. In this contribution, we focus on the recent progress in the area and the outstanding challenges, making a detailed description of the state-of-the-art models implemented for each toxic end point. The type of molecular representation, the algorithm, and the evaluation metric used in each research work are explained and analyzed. A detailed description of end points that are usually predicted, their clinical relevance, the available databases, and the challenges they bring to the field are also highlighted.
... WkNNIR [9], a neighborhood method, recovers possible missing interactions of known drugs (targets) and predicts interactions for new entities using proximity information characterized by chemical structure and protein sequence-based similarities. Apart from computing similarities, other solutions to describe characteristics of drug structures and target sequences include utilizing handcrafted molecular fingerprints and protein descriptors [10,11], as well as learning more robust high-level drug and target representations by graph, recurrent, and transformer neural networks [12,13,14]. ...
Preprint
The discovery of drug-target interactions (DTIs) is a pivotal process in pharmaceutical development. Computational approaches are a promising and efficient alternative to tedious and costly wet-lab experiments for predicting novel DTIs from numerous candidates. Recently, with the availability of abundant heterogeneous biological information from diverse data sources, computational methods have been able to leverage multiple drug and target similarities to boost the performance of DTI prediction. Similarity integration is an effective and flexible strategy to extract crucial information across complementary similarity views, providing a compressed input for any similarity-based DTI prediction model. However, existing similarity integration methods filter and fuse similarities from a global perspective, neglecting the utility of similarity views for each drug and target. In this study, we propose a Fine-Grained Selective similarity integration approach, called FGS, which employs a local interaction consistency-based weight matrix to capture and exploit the importance of similarities at a finer granularity in both similarity selection and combination steps. We evaluate FGS on five DTI prediction datasets under various prediction settings. Experimental results show that our method not only outperforms similarity integration competitors with comparable computational costs, but also achieves better prediction performance than state-of-the-art DTI prediction approaches by collaborating with conventional base models. Furthermore, case studies on the analysis of similarity weights and on the verification of novel predictions confirm the practical ability of FGS.
... GNN allows for generalizing DNN operations to graph-structured processing [40], [41]. By aggregating information from neighboring nodes, GNN models encode structural-relational information into the representation, which then is applied in a wide range of tasks, including biochemical structure discovery [42], [43], computer vision [44], and recommendation systems [45]. ...
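The neighbor aggregation mentioned in the excerpt above can be illustrated with a minimal sketch. It assumes a plain mean aggregator over each node and its neighbors, with no learned weights; the toy graph, the feature values, and the function name `aggregate_mean` are hypothetical and are not taken from any of the cited models.

```python
from typing import Dict, List


def aggregate_mean(
    adjacency: Dict[int, List[int]],
    features: Dict[int, List[float]],
) -> Dict[int, List[float]]:
    """One round of mean aggregation over each node and its neighbors."""
    updated = {}
    for node, neighbors in adjacency.items():
        # Include the node itself so its own features are not discarded.
        group = [features[node]] + [features[n] for n in neighbors]
        dim = len(features[node])
        updated[node] = [sum(vec[k] for vec in group) / len(group) for k in range(dim)]
    return updated


# Hypothetical toy graph: node 0 is linked to nodes 1 and 2.
adjacency = {0: [1, 2], 1: [0], 2: [0]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
embeddings = aggregate_mean(adjacency, features)
print(embeddings[1])
```

Practical GNN layers typically add a learned linear map and a nonlinearity after the aggregation step; those are omitted here to keep the structural idea visible.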
... Although the above-mentioned methods have shown high prediction accuracy, molecular docking methods rely on the three-dimensional structure of the target protein [17]. The results of ligand-based methods may be less than ideal when there are insufficient data on known ligands [18]. ...
Article
Background: Drug-target interaction (DTI) prediction is becoming more and more important for accelerating drug research and drug repositioning. The drug-target interaction network is a typical model for DTI prediction. As many different types of relationships exist between drugs and targets, a drug-target interaction network can be used for modeling these relationships. Recent works on drug-target interaction networks mostly concentrate on drug nodes or target nodes and neglect the relationships between drugs and targets. Results: We propose a novel prediction method for modeling the relationship between drug and target independently. First, we use relationships of drugs and targets at different levels to construct features of drug-target interactions. Then, we use a line graph to model drug-target interactions. After that, we introduce a graph transformer network to predict drug-target interactions. Conclusions: This method introduces a line graph to model the relationship between drug and target. After transforming drug-target interactions from links to nodes, a graph transformer network is used to accomplish the task of predicting drug-target interactions.
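The step of transforming drug-target interactions from links to nodes corresponds to the standard line graph construction, which can be sketched as follows. This is an illustration of the general graph-theoretic definition only, not the authors' implementation; the drug and target identifiers and the function name `line_graph` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def line_graph(edges: List[Edge]) -> Dict[Edge, Set[Edge]]:
    """Turn each edge into a node; link two such nodes if the edges share an endpoint."""
    adjacency: Dict[Edge, Set[Edge]] = {edge: set() for edge in edges}
    for first, second in combinations(edges, 2):
        if set(first) & set(second):
            adjacency[first].add(second)
            adjacency[second].add(first)
    return adjacency


# Hypothetical drug-target pairs (identifiers are placeholders).
pairs = [("drug_a", "target_x"), ("drug_a", "target_y"), ("drug_b", "target_y")]
graph = line_graph(pairs)
for pair, linked in graph.items():
    print(pair, "->", sorted(linked))
```

In the resulting graph, each drug-target pair is a node, so a node-level model can score interactions directly instead of scoring links.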
... Network-based approaches integrate data like drug-drug interactions, protein-protein interactions, drug-disease interactions, and drug-target interactions from multiple sources into a single unified framework to boost DTI prediction [3,12,16,17,35]. For instance, Wan et al. [28] devised an end-to-end technique entitled NeoDTI to combine data from omics networks and learn topology that preserves the information of drugs and targets. Recent years have seen a fast growth of ML models based on knowledge graphs (KG). ...
Preprint
Detecting probable Drug Target Interaction (DTI) is a critical task in drug discovery. Conventional DTI studies are expensive, labor-intensive, and take a lot of time, hence there are significant reasons to construct useful computational techniques that may successfully anticipate possible DTIs. Although certain methods have been developed for this cause, numerous interactions are yet to be discovered, and prediction accuracy is still low. To meet these challenges, we propose a DTI prediction model built on molecular structure of drugs and sequence of target proteins. In the proposed model, we use Simplified Molecular Input Line Entry System (SMILES) to create CDK descriptors, Molecular ACCess System (MACCS) fingerprints, Electrotopological state (Estate) fingerprints and amino acid sequences of targets to get Pseudo Amino Acid Composition (PseAAC). We target to evaluate performance of DTI prediction models using CDK descriptors. For comparison, we use benchmark data and evaluate models performance on two widely used fingerprints, MACCS fingerprints and Estate fingerprints. The evaluation of performances shows that CDK descriptors are superior at predicting DTIs. The proposed method also outperforms other previously published techniques significantly.
... Precision* = TPR / (TPR + FPR) (15), where TPR = TP/(TP+FN) and FPR = FP/(FP+TN). Because TPR and FPR have the same range, the prediction results are assessed properly regardless of data imbalances using this metric. ...
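A small worked example may help show why the rate-based metric in the excerpt above, TPR / (TPR + FPR), behaves differently from standard precision on imbalanced data. The confusion-matrix counts below are hypothetical, and the function names are illustrative only.

```python
def precision(tp: int, fp: int) -> float:
    """Standard precision from raw counts."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def balanced_precision(tp: int, fp: int, tn: int, fn: int) -> float:
    """Precision* = TPR / (TPR + FPR), computed from rates rather than counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr / (tpr + fpr) if (tpr + fpr) else 0.0


# Hypothetical imbalanced test set: 10 positives, 1000 negatives.
tp, fn, fp, tn = 8, 2, 50, 950
print(round(precision(tp, fp), 4))
print(round(balanced_precision(tp, fp, tn, fn), 4))
```

With these counts, standard precision is 8/58 (about 0.14) because the 50 false positives outnumber the 8 true positives, whereas Precision* is 0.8/0.85 (about 0.94) because both rates are normalized by their own class sizes.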
Article
Drug repositioning, which involves the identification of new therapeutic indications for approved drugs, considerably reduces the time and cost of developing new drugs. Recent computational drug repositioning methods use heterogeneous networks to identify drug–disease associations. This review reveals existing network-based approaches for predicting drug–disease associations in three major categories: graph mining, matrix factorization or completion, and deep learning. We selected eleven methods from the three categories to compare their predictive performances. The experiment was conducted using two uniform datasets on the drug and disease sides, separately. We constructed heterogeneous networks using drug–drug similarities based on chemical structures and ATC codes, ontology-based disease–disease similarities, and drug–disease associations. An improved evaluation metric was used to reflect data imbalance as positive associations are typically sparse. The prediction results demonstrated that methods in the graph mining and matrix factorization or completion categories performed well in the overall assessment. Furthermore, prediction on the drug side had higher accuracy than on the disease side. Selecting and integrating informative drug features in drug–drug similarity measurement are crucial for improving disease-side prediction.
... Inspired by recent deep learning techniques, several deep learning models have been applied to drug discovery and repositioning processes, including the convolution neural network (CNN) (Öztürk et al., 2018; Wan et al., 2019), graph convolution network (GCN) (Nguyen et al., 2021), transformer (Vaswani et al., 2017; Chen et al., 2020) and the deep neural network (Gawehn et al., 2016), etc. In CPI model architecture, the process is usually divided into compound feature extraction, protein feature extraction, and classifier. ...
Article
Compound-protein interaction (CPI) prediction is a foundational task for drug discovery, which process is time-consuming and costly. The effectiveness of CPI prediction can be greatly improved using deep learning methods to accelerate drug development. Large number of recent research results in the field of computer vision, especially in deep learning, have proved that the position, geometry, spatial structure and other features of objects in an image can be well characterized. We propose a novel molecular image-based model named CAT-CPI (combining CNN and transformer to predict CPI) for CPI task. We use Convolution Neural Network (CNN) to learn local features of molecular images and then use transformer encoder to capture the semantic relationships of these features. To extract protein sequence feature, we propose to use a k-gram based method and obtain the semantic relationships of sub-sequences by transformer encoder. In addition, we build a Feature Relearning (FR) module to learn interaction features of compounds and proteins. We evaluated CAT-CPI on three benchmark datasets—Human, Celegans, and Davis—and the experimental results demonstrate that CAT-CPI presents competitive performance against state-of-the-art predictors. In addition, we carry out Drug-Drug Interaction (DDI) experiments to verify the strong potential of the methods based on molecular images and FR module.
... For proteins represented as FASTA sequence, we use convolutional neural network for learning representations. Therefore, unlike previous works [2], [3], [7], [13], [17], [18], GraMDTA learns representations for drugs and proteins from structures and their corresponding knowledge graphs. We aggregate the multiple modalities of drugs and proteins using multi-head attention weighting mechanism to learn relevant information while eliminating the noisy information cascades. ...
Preprint
Finding novel drug-target associations is vital for drug discovery. However, screening millions of small molecules for a select target protein is challenging. Several computational approaches have been proposed in the past using machine learning methods to find the candidate drugs for proteins. Some of these works utilized structures of drugs and proteins for modeling. A few of the works utilized knowledge graph networks and identified the potential candidates through link prediction approaches. While structural learning offers molecular-based representations, the knowledge graph-based learning offers interaction-based representations. Such multimodal sources of information acting complementarily could improve the robustness of drug-target association (DTA) predictions. In this work, we propose multimodal graph neural network to learn both structural and knowledge graph representations while utilizing multi-head attention to fuse the multimodal representations and predict DTAs. We compare our proposed approach with existing works and show the benefits of multimodal fusion for DTA.
... In another study aimed at identifying drug-target interactions (DTIs), a CNN-based tool, NeoDTI, was developed. 179 NeoDTI mines large-scale graph data and automatically learns the topology-preserving representations of drugs and targets, to facilitate DTI prediction with compound-protein binding affinity. Using such approaches, the drug targets of NPs can be identified, which can accelerate the drug-discovery platform. ...
Article
Covering: up to the end of 2022. Microorganisms are exceptional sources of a wide array of unique natural products and play a significant role in drug discovery. During the golden era, several life-saving antibiotics and anticancer agents were isolated from microbes; moreover, they are still widely used. However, difficulties in the isolation methods and repeated discoveries of the same molecules have caused a setback in the past. Artificial intelligence (AI) has had a profound impact on various research fields, and its application allows the effective performance of data analyses and predictions. With the advances in omics, it is possible to obtain a wealth of information for the identification, isolation, and target prediction of secondary metabolites. In this review, we discuss drug discovery based on natural products from microorganisms with the help of AI and machine learning.
... The architecture of fast Text is different from most popular large neural networks with a complex hierarchical structure. Quick Text only contains three layers: input, implicit, and output [25]. The input layer is used as the upper part of the model, and the n-gram word vector is obtained by superimposing all the words of the Text and then averaged to generate a vector to characterize the Text. ...
Article
This paper addresses data mining and neural network model construction and analysis to design a data interaction process model based on data mining and topology visualization. This paper performs preprocessing data operations such as data filtering and cleaning of the collected data. A typical multichannel convolutional neural network (MCNN) in deep learning techniques is applied to alert students’ academic performance. In addition, the network topology of the CNN is optimized to improve the performance of the model. The CNN has many hyperparameters that need to be tuned to construct an optimal model that can effectively interact with the data. In this paper, we propose a method to visualize the network topology within unstable regions to address the current lack of an effective way to lay out the network topology within specified areas. The technique transforms the network topology layout problem within the unstable region into a circular topology diffusion problem within a convex polygon, ensuring a clear, logical topology connection and dramatically reducing the gaps in the area, making the layout more uniform and beautiful. This paper constructs a real-time data interaction model based on JSON format and database triggers using message queues for reliable delivery. A platform-based real-time data interaction solution is designed by combining the timer method with the original key. The solution designed in this paper considers the real-time accuracy, security and reliability of data interaction. It satisfies the platform’s initial and newly discovered requirements for data interaction.
... DeepScreen [7] is an individual predictor for a specific target using a deep convolutional neural network. NeoDTI [8] integrates various sources from heterogeneous network data and uses topology-preserving representations of drugs and targets to implement interaction prediction. Hu et al. [9] proposes a CNN-based method for drug-target interaction prediction, which takes 1D, 2D structural descriptors of drug and sequence of protein as the network inputs. ...
Article
Background: Affinity prediction between molecule and protein is an important step of virtual screening, which is usually called drug-target affinity (DTA) prediction. Its accuracy directly influences the progress of drug development. Sequence-based drug-target affinity prediction can predict the affinity according to protein sequence, which is fast and can be applied to large datasets. However, due to the lack of protein structure information, the accuracy needs to be improved. Results: The proposed model which is called WGNN-DTA can be competent in drug-target affinity (DTA) and compound-protein interaction (CPI) prediction tasks. Various experiments are designed to verify the performance of the proposed method in different scenarios, which proves that WGNN-DTA has the advantages of simplicity and high accuracy. Moreover, because it does not need complex steps such as multiple sequence alignment (MSA), it has fast execution speed, and can be suitable for the screening of large databases. Conclusion: We construct protein and molecular graphs through sequence and SMILES that can effectively reflect their structures. To utilize the detail contact information of protein, graph neural network is used to extract features and predict the binding affinity based on the graphs, which is called weighted graph neural networks drug-target affinity predictor (WGNN-DTA). The proposed method has the advantages of simplicity and high accuracy.
... Network-based approaches utilized the DTI network of identified edges between drugs and targets to identify new DTIs. Indeed, by constructing a heterogeneous network that includes information on drugs, proteins, diseases, and side-effects, the DTINet method can improve the accuracy of DTIs prediction (Luo et al., 2017), but the learning model only takes relatively simple log-bilinear functions, obtaining features may not be the inherent representations of drugs or targets for the final DTI prediction task (Wan et al., 2019). Supervised learning-based approaches are classified into similarity-based approaches and feature-based approaches (Chen et al., 2018). ...
Article
Drug–target interactions (DTIs) are regarded as an essential part of genomic drug discovery, and computational prediction of DTIs can accelerate the search for lead drugs for a target, compensating for the limitations of time-consuming and expensive wet-lab techniques. Currently, many computational methods predict DTIs based on sequential composition or physicochemical properties of drug and target, but further efforts are needed to improve them. In this article, we proposed a new sequence-based method for accurately identifying DTIs. For target protein, we explore using pre-trained Bidirectional Encoder Representations from Transformers (BERT) to extract sequence features, which can provide unique and valuable pattern information. For drug molecules, Discrete Wavelet Transform (DWT) is employed to generate information from drug molecular fingerprints. Then we concatenate the feature vectors of the DTIs, and input them into a feature extraction module consisting of a batch-norm layer, rectified linear activation layer and linear layer, called BRL block, and a Convolutional Neural Networks module to extract DTIs features further. Subsequently, a BRL block is used as the prediction engine. After optimizing the model based on contrastive loss and cross-entropy loss, it gave prediction accuracies of the target families of G Protein-coupled receptors, ion channels, enzymes, and nuclear receptors up to 90.1, 94.7, 94.9, and 89%, which indicated that the proposed method can outperform the existing predictors. To make it as convenient as possible for researchers, the web server for the new predictor is freely accessible at: https://bioinfo.jcu.edu.cn/dtibert or http://121.36.221.79/dtibert/. The proposed method may also be a potential option for other DTIs.
... The DTA value can also be combined with some biochemical information-rich heterogeneous data to further improve the accuracy of DTI prediction, such as drug-disease association information, protein-protein interaction information, etc. [16] Therefore, they have higher training efficiency and stronger expansion. Due to this advantage, machine learning methods based on DTA prediction are widely used in DTI prediction tasks. ...
Preprint
As a necessary process in drug development, finding a drug compound that can selectively bind to a specific protein is highly challenging and costly. Drug-target affinity (DTA), which represents the strength of drug-target interaction (DTI), has played an important role in the DTI prediction task over the past decade. Although deep learning has been applied to DTA-related research, existing solutions ignore fundamental correlations between molecular substructures in molecular representation learning of drug compound molecules/protein targets. Moreover, traditional methods lack the interpretability of the DTA prediction process. This results in missing feature information of intermolecular interactions, thereby affecting prediction performance. Therefore, this paper proposes a DTA prediction method with interactive learning and an autoencoder mechanism. The proposed model enhances the corresponding ability to capture the feature information of a single molecular sequence by the drug/protein molecular representation learning module and supplements the information interaction between molecular sequence pairs by the interactive information learning module. The DTA value prediction module fuses the drug-target pair interaction information to output the predicted value of DTA. Additionally, this paper theoretically proves that the proposed method maximizes evidence lower bound (ELBO) for the joint distribution of the DTA prediction model, which enhances the consistency of the probability distribution between the actual value and the predicted value. The experimental results confirm mutual transformer-drug target affinity (MT-DTA) achieves better performance than other comparative methods.
Drug-drug interaction refers to the reaction that may occur when two drugs are taken together, which may threaten patients' health or enhance efficacy in ways helpful for medical work. Therefore, it is necessary to study and predict it. Traditional experimental methods can be used for drug-drug interaction prediction, but they are time-consuming and costly, so we prefer to use more accurate and convenient computational methods to predict unknown drug-drug interactions. In this paper, we propose a deep learning framework called MSResG that considers multi-source features of drugs and combines them with a Graph Auto-Encoder for prediction. First, the model obtains four feature representations of drugs from the database, namely chemical substructure, target, pathway and enzyme, and then calculates the Jaccard similarity of the drugs. To balance the different drug features, we perform similarity integration by taking the mean value. Then the comprehensive similarity network is combined with the drug interaction network, and the result is encoded and decoded using a graph auto-encoder based on a residual graph convolution network. Encoding learns the latent feature vectors of drugs, which contain similarity information and interaction information. Decoding reconstructs the network to predict unknown drug-drug interactions. The experimental results show that our model has advanced performance and is superior to other existing advanced methods. A case study also shows that MSResG has practical significance.
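The similarity steps described in this abstract (Jaccard similarity per feature type, then integration by the mean) can be sketched as below. This is a minimal illustration under the assumption that each drug's features are given as sets of identifiers; the feature values, the two-view toy data, and all function names are hypothetical and are not taken from MSResG.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(profiles: List[Set[str]]) -> List[List[float]]:
    """Pairwise Jaccard similarity between all drugs for one feature type."""
    return [[jaccard(x, y) for y in profiles] for x in profiles]


def mean_integrate(matrices: List[List[List[float]]]) -> List[List[float]]:
    """Element-wise mean of several similarity matrices of equal size."""
    size = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / len(matrices) for j in range(size)]
        for i in range(size)
    ]


# Hypothetical feature sets for two drugs, one list per feature type.
views: Dict[str, List[Set[str]]] = {
    "substructure": [{"s1", "s2"}, {"s2", "s3"}],
    "target": [{"t1"}, {"t1"}],
}
fused = mean_integrate([similarity_matrix(p) for p in views.values()])
print(fused[0][1])
```

Averaging gives every feature type the same weight, which matches the abstract's stated goal of balancing the different drug features.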
Article
Drug–target interaction (DTI) prediction is an essential step in drug repositioning. A few graph neural network (GNN)-based methods have been proposed for DTI prediction using heterogeneous biological data. However, existing GNN-based methods only aggregate information from directly connected nodes restricted in a drug-related or a target-related network and are incapable of capturing high-order dependencies in the biological heterogeneous graph. In this paper, we propose a metapath-aggregated heterogeneous graph neural network (MHGNN) to capture complex structures and rich semantics in the biological heterogeneous graph for DTI prediction. Specifically, MHGNN enhances heterogeneous graph structure learning and high-order semantics learning by modeling high-order relations via metapaths. Additionally, MHGNN enriches high-order correlations between drug-target pairs (DTPs) by constructing a DTP correlation graph with DTPs as nodes. We conduct extensive experiments on three biological heterogeneous datasets. MHGNN favorably surpasses 17 state-of-the-art methods over 6 evaluation metrics, which verifies its efficacy for DTI prediction. The code is available at https://github.com/Zora-LM/MHGNN-DTI.
Chapter
Drug-disease association prediction is essential in drug development and repositioning. At present, drug-disease association prediction models based on graph convolution usually learn a characterization of the entire drug-disease heterogeneous network. However, the characterization obtained in this way comes mostly from the characteristics of neighboring nodes in the homogeneous network and lacks the attribute information of nodes in the heterogeneous network, which affects the model's predictive performance. In this paper, an end-to-end model named DAHNGC based on graph convolutional neural networks is proposed to predict drug-disease associations, which divides the characteristic learning of drug and disease nodes into two parts. The proposed model uses the graph convolutional network to learn the attribute characteristics of drug and disease nodes in the homogeneous network. Based on the known relationships between drugs and diseases, we design a method to automatically learn the characteristic information of drug and disease nodes in heterogeneous networks. Subsequently, the drug-disease association matrix is reconstructed using a bilinear decoder to obtain potential drug-disease associations. In addition, we adopt the DropEdge method to alleviate the over-smoothing problem of graph convolution. The experimental results show that the average AUC of DAHNGC is 0.9113 under five-fold cross-validation, which is superior to that of the comparison method.
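As a rough illustration of what a bilinear decoder does, the sketch below scores each drug-disease pair as sigmoid(z_drug^T W z_disease). The sigmoid output, the function names and the toy embeddings are assumptions of this sketch; the abstract does not specify how DAHNGC parameterizes its decoder.

```python
import math

def bilinear_score(z_drug, w, z_disease):
    """Score one pair as sigmoid(z_drug^T W z_disease)."""
    # W z_disease: one dot product per row of W.
    wz = [sum(w_ij * z_j for w_ij, z_j in zip(row, z_disease)) for row in w]
    logit = sum(a * b for a, b in zip(z_drug, wz))
    return 1.0 / (1.0 + math.exp(-logit))

def reconstruct(drug_emb, w, disease_emb):
    """Reconstructed association matrix: rows are drugs, columns are diseases."""
    return [[bilinear_score(zd, w, zs) for zs in disease_emb]
            for zd in drug_emb]

# Hypothetical toy embeddings; W would be a learned matrix in practice.
drug_emb = [[1.0, 0.0], [0.0, 0.0]]
disease_emb = [[1.0, 0.0]]
w = [[1.0, 0.0], [0.0, 1.0]]

scores = reconstruct(drug_emb, w, disease_emb)
```

With the identity matrix as W the score reduces to a sigmoid of the plain dot product, so an all-zero embedding yields exactly 0.5.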
Article
Networks, or graphs, are universal descriptors of systems of interacting elements. In biomedicine and healthcare, they can represent, for example, molecular interactions, signalling pathways, disease co-morbidities or healthcare systems. In this Perspective, we posit that representation learning can realize principles of network medicine, discuss successes and current limitations of the use of representation learning on graphs in biomedicine and healthcare, and outline algorithmic strategies that leverage the topology of graphs to embed them into compact vectorial spaces. We argue that graph representation learning will keep pushing forward machine learning for biomedicine and healthcare applications, including the identification of genetic variants underlying complex traits, the disentanglement of single-cell behaviours and their effects on health, the assistance of patients in diagnosis and treatment, and the development of safe and effective medicines.
Article
Motivation: Discovering drug-target interactions (DTIs) is a crucial step in drug development, for example in the identification of drug side effects and in drug repositioning. Since identifying DTIs by wet-lab biological experiments is time-consuming and costly, many computational approaches have been proposed and have become an efficient way to infer potential interactions. Although extensive effort has been invested in this task, the prediction accuracy still needs to be improved. In particular, heterogeneous network-based approaches do not fully consider the complex structure and rich semantic information in these heterogeneous networks. Therefore, predicting DTIs efficiently remains a challenge. Results: In this study, we develop a novel method via Multiview heterogeneous information network embedding with Hierarchical Attention mechanisms to discover potential Drug-Target Interactions (MHADTI). Firstly, MHADTI constructs different similarity networks for drugs and targets by utilizing their multisource information. Combined with the known DTI network, three drug-target heterogeneous information networks (HINs) with different views are established. Secondly, MHADTI learns embeddings of drugs and targets from multiview HINs with hierarchical attention mechanisms, which include node-level, semantic-level and graph-level attention. Lastly, MHADTI employs a multilayer perceptron to predict DTIs with the learned deep feature representations. The hierarchical attention mechanisms fully consider the importance of nodes, meta-paths and graphs in learning the feature representations of drugs and targets, which makes their embeddings more comprehensive. Extensive experimental results demonstrate that MHADTI performs better than other state-of-the-art (SOTA) prediction models. Moreover, analysis of prediction results for some drugs and targets of interest further indicates that MHADTI has advantages in discovering DTIs.
Availability and implementation: https://github.com/pxystudy/MHADTI.
Preprint
Drug repurposing is an active area of research because it decreases the cost and time of drug development. Most of these efforts are primarily concerned with the prediction of drug-target interactions. Many models, from matrix factorization to more cutting-edge deep neural networks, have been introduced to predict drug-target interactions. Most of the available information on drugs and targets is gathered and used as features to feed the prediction models. Some predictive models focus on prediction quality, and others focus on the efficiency of the predictive models, e.g., embedding generation. In this work, we propose two predictive models of drug-target interaction. To do this, we use the relations of drugs and targets and propose a method of similarity computation. Using these similarities, we generate an accumulative feature representation of these two objects. We propose two inductive, deep network models, IEDTI and DEDTI, for drug-target interaction prediction. The former uses triplet and maps the input feature vectors into meaningful embedding vectors. Then, it applies a deep predictive model to each drug-target pair to evaluate their interaction. DEDTI directly uses the feature vectors of drugs and targets and applies a predictive model to each pair to predict their interactions. The results show that both models outperform the state-of-the-art models.
Article
Motivation: Large-scale heterogeneous data provides diverse perspectives for predicting drug-protein interactions (DPIs). However, the available information on molecular interactions and clinical associations related to drugs or proteins is incomplete because there may be unproven interactions and associations. This incomplete information in the available data is presented in the form of non-interaction and non-correlation, which may mislead the prediction model. Existing methods fuse incomplete and complete information without considering their integrity, so the negative effects of incomplete information still exist. Results: We develop a network-based DPI prediction method named BRWCP, which uses the complete information network to correct the prediction results acquired by the incomplete information network. By integrating relevant heterogeneous information that may be incomplete, the feature similarities of drugs and proteins are obtained. Combining the feature similarities and known DPIs, an incomplete information-based drug-protein heterogeneous network is constructed. Then a bidirectional random walk with pruning algorithm is adopted in this heterogeneous network to predict potential DPIs. Next, the predicted DPIs are combined with the chemical fingerprint similarity of drugs and amino acid sequence similarity of proteins to construct the complete information network. The bidirectional random walk with pruning algorithm is applied in the new network to obtain the final prediction results until it converges. Experimental results show that BRWCP is superior to several state-of-the-art DPI prediction methods, and case studies further confirm its ability to tap potential drug-protein interactions. Availability: The code of BRWCP is available at https://github.com/lyfdomain/BRWCP. Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Drug-drug interactions (DDIs) prediction is a challenging task in drug development and clinical application. Due to the extremely large complete set of all possible DDIs, computer-aided DDIs prediction methods are getting lots of attention in the pharmaceutical industry and academia. However, most existing computational methods only use single perspective information and few of them conduct the task based on the biomedical knowledge graph (BKG), which can provide more detailed and comprehensive drug lateral side information flow. To this end, a deep learning framework, namely DeepLGF, is proposed to fully exploit BKG fusing local-global information to improve the performance of DDIs prediction. More specifically, DeepLGF first obtains chemical local information on drug sequence semantics through a natural language processing algorithm. Then a model of BFGNN based on graph neural network is proposed to extract biological local information on drug through learning embedding vector from different biological functional spaces. The global feature information is extracted from the BKG by our knowledge graph embedding method. In DeepLGF, for fusing local-global features well, we designed four aggregating methods to explore the most suitable ones. Finally, the advanced fusing feature vectors are fed into deep neural network to train and predict. To evaluate the prediction performance of DeepLGF, we tested our method in three prediction tasks and compared it with state-of-the-art models. In addition, case studies of three cancer-related and COVID-19-related drugs further demonstrated DeepLGF's superior ability for potential DDIs prediction. The webserver of the DeepLGF predictor is freely available at http://120.77.11.78/DeepLGF/.
Article
Determining the interaction between drugs and targets plays a key role in the process of drug development and discovery. Computational methods can predict new interactions and speed up the process of drug development. In recent studies, network-based approaches have been proposed to predict drug-target interactions. However, these methods cannot fully utilize the node information from heterogeneous networks. Therefore, we propose a method based on a heterogeneous graph convolutional neural network for drug-target interaction prediction, GCHN-DTI (Predicting drug-target interactions by graph convolution on heterogeneous networks), to predict potential DTIs. GCHN-DTI integrates network information from drug-target interactions, drug-drug interactions, drug similarities, target-target interactions, and target similarities. Then, the graph convolution operation is used in the heterogeneous network to obtain the node embeddings of the drugs and the targets. Furthermore, we incorporate an attention mechanism between graph convolutional layers to combine the node embeddings from each layer. Finally, the drug-target interaction score is predicted based on the node embeddings of the drugs and the targets. Our model uses fewer network types and achieves higher prediction performance. In addition, the prediction performance of the model improves significantly on datasets with a higher proportion of positive samples. The experimental evaluations show that GCHN-DTI outperforms several state-of-the-art prediction methods.
Chapter
Predicting the relationships between drugs and targets is a crucial step in the course of drug discovery and development. Computational prediction of associations between drugs and targets greatly enhances the probability of finding new interactions by reducing the cost of in vitro experiments. In this paper, a Meta-path-based Representation Learning model, namely MRLDTI, is proposed to predict unknown DTIs. Specifically, we first design a random walk strategy with a meta-path to collect the biological relations of drugs and targets. Then, the representations of drugs and targets are captured by a heterogeneous skip-gram algorithm. Finally, a machine learning classifier is employed by MRLDTI to discover novel DTIs. Experimental results indicate that MRLDTI performs better than several state-of-the-art models under ten-fold cross-validation on the gold standard dataset. Keywords: Drug repositioning; Computational prediction; Drugs; Targets; DTIs
Chapter
Predicting drug-target interactions plays an important role in shortening the cycle and reducing the cost of drug development. Although many existing approaches have been successful, most of them mainly start from one-dimensional sequences and do not take full advantage of the biological relationships between the drug and the target. In this paper, the heterogeneous networks are constructed based on biological properties such as drug, target, disease, side effects and their relationships. The features of drugs and targets are automatically learned using a neural network-based topology-preserving learning model. To improve the prediction accuracy of drug-target interactions, a RandSAS optimization strategy is designed, which introduces a stochastic factor based on drug-target binding affinity and drug-drug similarity to optimize the prediction process. The experimental result shows that the prediction results are relatively good when the similarity threshold is set to 0.6. In addition, based on the constructed heterogeneous network, RandSAS strategy could improve the accuracy of drug-target prediction to a certain extent. Keywords: Heterogeneous networks; Topology preservation; Random factor; Similarities; Drug-target interaction
Article
Drug repositioning identifies novel therapeutic potentials for existing drugs and is considered an attractive approach due to the opportunity for reduced development timelines and overall costs. Prior computational methods usually learned a drug's representation from an entire graph of drug-disease associations. Therefore, the learned drug representations are static and agnostic to the various diseases. However, a drug's mechanisms of action (MoAs) differ across diseases. The relevant context information should be differentiated for the same drug when it targets different diseases. Computational methods are thus required to learn different representations corresponding to different drug-disease associations for the given drug. In view of this, we propose an end-to-end partner-specific drug repositioning approach based on a graph convolutional network, named PSGCN. PSGCN first extracts specific context information around drug-disease pairs from an entire graph of drug-disease associations. Then, it applies a graph convolutional network to the extracted graph to learn a partner-specific graph representation. As the different layers of the graph convolutional network contribute differently to the representation of the partner-specific graph, we design a layer self-attention mechanism to capture multi-scale layer information. Finally, PSGCN utilizes a sortpool strategy to obtain the partner-specific graph embedding and formulates drug-disease association prediction as a graph classification task. A fully-connected module is established to classify the partner-specific graph representations. The experiments on three benchmark datasets show that the representation learning of partner-specific graphs can lead to superior performance over state-of-the-art methods. In particular, case studies on small cell lung cancer and breast carcinoma confirmed that PSGCN is able to retrieve more actual drug-disease associations in the top prediction results.
Moreover, in comparison with other static approaches, PSGCN can partly distinguish the different disease context information for the given drug.
Article
Motivation: Identifying drug-target interactions is a crucial step for drug discovery and design. Traditional biochemical experiments are credible to accurately validate drug-target interactions. However, they are also extremely laborious, time-consuming, and expensive. With the collection of more validated biomedical data and the advancement of computing technology, the computational methods based on chemogenomics gradually attract more attention, which guide the experimental verifications. Results: In this study, we propose an end-to-end deep learning-based method named IIFDTI to predict DTIs based on independent features of drug-target pairs and interactive features of their substructures. First, the interactive features of substructures between drugs and targets are extracted by the bidirectional encoder-decoder architecture. The independent features of drugs and targets are extracted by the graph neural networks and convolutional neural networks, respectively. Then, all extracted features are fused and inputted into fully connected dense layers in downstream tasks for predicting DTIs. IIFDTI takes into account the independent features of drugs/targets and simulates the interactive features of the substructures from the biological perspective. Multiple experiments show that IIFDTI outperforms the state-of-the-art methods in terms of AUC, AUPR, precision, and recall on benchmark datasets. In addition, the mapped visualizations of attention weights indicate that IIFDTI has learned the biological knowledge insights, and two case studies illustrate the capabilities of IIFDTI in practical applications. Availability and implementation: The codes of IIFDTI are available at https://github.com/czjczj/IIFDTI. Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Predicting drug-target interactions (DTIs) is crucial at many phases of drug discovery and repositioning. Many computational methods based on heterogeneous networks (HNs) have proved their potential to predict DTIs by capturing extensive biological knowledge and semantic information from meta-paths. However, existing methods manually customize meta-paths, which is overly dependent on some specific expertise. Such strategy heavily limits the scalability and flexibility of these models, and even affects their predictive performance. To alleviate this limitation, we propose a novel HN-based method with attentive meta-path extraction for DTI prediction, named HampDTI, which is capable of automatically extracting useful meta-paths through a learnable attention mechanism instead of pre-definition based on domain knowledge. Specifically, by scoring multi-hop connections across various relations in the HN with each relation assigned an attention weight, HampDTI constructs a new trainable graph structure, called meta-path graph. Such meta-path graph implicitly measures the importance of every possible meta-path between drugs and targets. To enable HampDTI to extract more diverse meta-paths, we adopt a multi-channel mechanism to generate multiple meta-path graphs. Then, a graph neural network is deployed on the generated meta-path graphs to yield the multi-channel embeddings of drugs and targets. Finally, HampDTI fuses all embeddings from different channels for predicting DTIs. The meta-path graphs are optimized along with the model training such that HampDTI can adaptively extract valuable meta-paths for DTI prediction. The experiments on benchmark datasets not only show the superiority of HampDTI in DTI prediction over several baseline methods, but also, more importantly, demonstrate the effectiveness of the model discovering important meta-paths.
Article
In drug development, unexpected side effects are the main reason for the failure of candidate drug trials. Discovering potential side effects of drugs in silico can improve the success rate of drug screening. However, most previous works extracted and utilized an effective representation of drugs from a single perspective. These methods merely considered the topological information of drugs in the biological entity network, or combined the association information (e.g. a knowledge graph, KG) between drugs and other biomarkers, or only used the chemical structure or sequence information of drugs. Consequently, to jointly learn drug features from both the macroscopic biological network and the microscopic drug molecules, we propose a hybrid embedding graph neural network model named idse-HE, which integrates a graph embedding module and a node embedding module. idse-HE can fuse the drug chemical structure information, the drug substructure sequence information and the drug network topology information. Our model treats the final representations of drugs and side effects as two implicit factors to reconstruct the original matrix and predicts the potential side effects of drugs. In the robustness experiment, idse-HE shows stable performance in all indicators. We reproduce the baselines under the same conditions, and the experimental results indicate that idse-HE is superior to other advanced methods. Finally, we also collect evidence to confirm several real drug side effect pairs in the predicted results, which were previously regarded as negative samples. For more detailed information, researchers can access the user-friendly web server of idse-HE at http://bioinfo.jcu.edu.cn/idse-HE. On this server, users can obtain the original data and source code, and will be guided to reproduce the model results.
Chapter
Drug-Target Interaction (DTI) prediction aims to accurately identify the potential binding targets on proteins so as to guide drug development. However, the sparsity and imbalance of known drug-target pairs remain a challenge for high-quality representation learning of drugs and targets, interfering with accurate prediction. The labeled drug-target pairs are far fewer than the missing ones, since the obtained DTIs are recorded with pathogenic proteins and sophisticated bio-experiments. Therefore, we propose a deep learning paradigm via Heterogeneous graph data Augmentation and node Similarity (HAS) to solve the sparse imbalanced problem in drug-target interaction prediction. Heterogeneous graph data augmentation is devised to generate multi-view augmented graphs through a heterogeneous neighbor sampling strategy. Then the consistency across different graph structures is captured using graph contrastive optimization. Node similarity is calculated on the heterogeneous entity association matrices, aiming to integrate similarity information and heterogeneous attribute gain for drug-target interaction prediction. Extensive experiments show that HAS offers superior performance in sparse imbalanced scenarios compared with state-of-the-art methods. Ablation studies demonstrate the effectiveness of heterogeneous graph data augmentation and node similarity. Keywords: Sparse imbalanced DTI prediction; Heterogeneous graph data augmentation; Graph contrastive optimization; Node similarity
Article
Drug-target interaction (DTI) prediction plays a crucial part in drug discovery and design. Although many computational approaches for such prediction have been proposed, current research still generally relies on the chemical similarities of drugs or the sequence similarities of targets. However, the valuable information in known interactions has been overlooked, and existing noise and useless information reduce the accuracy of DTI prediction. In addition, many existing computational approaches ignore the behavior information between nodes of the DTI network. In this paper, we develop an ensemble computational approach called integrated multi-similarity fusion and heterogeneous graph inference. First, based on the known DTI network, the degree distribution of drug and target similarities is analyzed, and the noise and useless information are removed to improve prediction accuracy. Second, based on drug and target similarities and known DTIs, a strategy of multi-similarity fusion is proposed to capture potentially useful information from known interactions, which is used to enhance drug and target similarities. Third, heterogeneous graph inference is used to predict the DTIs so as to capture the edge weight (closeness) and behavior information (diffusion) between nodes of a heterogeneous network. To assist the reproducibility of our work and its comparison to published results, we perform experiments on four benchmark datasets. Results show that our approach outperforms some existing approaches and can contribute to predicting potential DTIs.
Article
The emergence of large-scale genomic, chemical and pharmacological data provides new opportunities for drug discovery and repositioning. In this work, we develop a computational pipeline, called DTINet, to predict novel drug–target interactions from a constructed heterogeneous network, which integrates diverse drug-related information. DTINet focuses on learning a low-dimensional vector representation of features, which accurately explains the topological properties of individual nodes in the heterogeneous network, and then makes prediction based on these representations via a vector space projection scheme. DTINet achieves substantial performance improvement over other state-of-the-art methods for drug–target interaction prediction. Moreover, we experimentally validate the novel interactions between three drugs and the cyclooxygenase proteins predicted by DTINet, and demonstrate the new potential applications of these identified cyclooxygenase inhibitors in preventing inflammatory diseases. These results indicate that DTINet can provide a practically useful tool for integrating heterogeneous information to predict new drug–target interactions and repurpose existing drugs.
Conference Paper
We study the problem of representation learning in heterogeneous networks. Its unique challenges come from the existence of multiple types of nodes and links, which limit the feasibility of the conventional network embedding techniques. We develop two scalable representation learning models, namely metapath2vec and metapath2vec++. The metapath2vec model formalizes meta-path-based random walks to construct the heterogeneous neighborhood of a node and then leverages a heterogeneous skip-gram model to perform node embeddings. The metapath2vec++ model further enables the simultaneous modeling of structural and semantic correlations in heterogeneous networks. Extensive experiments show that metapath2vec and metapath2vec++ are able to not only outperform state-of-the-art embedding models in various heterogeneous network mining tasks, such as node classification, clustering, and similarity search, but also discern the structural and semantic correlations between diverse network objects.
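A meta-path-based random walk of the kind described here can be sketched as follows. This is a simplified illustration rather than the metapath2vec implementation: the function name, the toy drug-target graph and the uniform choice among type-matching neighbors are assumptions of this sketch.

```python
import random

def metapath_walk(neighbors, node_type, start, metapath, walk_length, rng):
    """Random walk whose every step lands on the next node type in the meta-path.

    The meta-path is treated as cyclic, so its first and last types must
    match, e.g. ["drug", "target", "drug"].
    """
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must start and end with the same type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node does not match the first meta-path type")
    cycle = metapath[:-1]
    walk = [start]
    while len(walk) < walk_length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [v for v in neighbors[walk[-1]] if node_type[v] == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk

# Hypothetical toy heterogeneous graph with two drugs and one target.
neighbors = {"d1": ["t1", "d2"], "d2": ["t1", "d1"], "t1": ["d1", "d2"]}
node_type = {"d1": "drug", "d2": "drug", "t1": "target"}

walk = metapath_walk(neighbors, node_type, "d1",
                     ["drug", "target", "drug"], 5, random.Random(0))
walk_types = [node_type[v] for v in walk]
```

The drug-drug edge between d1 and d2 is never followed, because the meta-path only permits drug-to-target and target-to-drug steps. Walks generated this way would then serve as the "sentences" for a skip-gram model.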
Article
Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.
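The sample-and-aggregate idea can be illustrated with a single layer that averages the features of sampled neighbors, concatenates the result with the node's own features and applies a linear map followed by ReLU. This is a minimal sketch, not the GraphSAGE reference code: the names, the fixed weight matrix and sampling with replacement are assumptions made for illustration.

```python
import random

def sample_neighbors(adj, node, k, rng):
    """Draw k neighbors uniformly with replacement; empty if the node is isolated."""
    nbrs = adj[node]
    return [rng.choice(nbrs) for _ in range(k)] if nbrs else []

def mean_vector(vectors, dim):
    """Element-wise mean of feature vectors; the zero vector if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def sage_layer(adj, feats, weight, k, rng):
    """One layer: h_v = ReLU(W . concat(x_v, mean of sampled neighbor features))."""
    dim = len(next(iter(feats.values())))
    out = {}
    for v in adj:
        sampled = sample_neighbors(adj, v, k, rng)
        agg = mean_vector([feats[u] for u in sampled], dim)
        z = feats[v] + agg  # list concatenation, length 2 * dim
        out[v] = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in weight]
    return out

# Hypothetical toy graph; "c" is isolated and falls back to a zero aggregate.
adj = {"a": ["b"], "b": ["a"], "c": []}
feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 2.0]}
# A fixed 2x4 matrix that adds self and neighbor features; learned in practice.
weight = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

out = sage_layer(adj, feats, weight, k=3, rng=random.Random(0))
```

Because the layer is a function of features rather than a per-node lookup table, it can be applied to any node that has features and neighbors, which is what makes the approach inductive.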
Article
Model: 2015, 55, 263-274). However, the applicability of these techniques has been limited by the requirement for large amounts of training data. In this work, we demonstrate how one-shot learning can be used to significantly lower the amounts of data required to make meaningful predictions in drug discovery applications. We introduce a new architecture, the iterative refinement long short-term memory, that, when combined with graph convolutional neural networks, significantly improves learning of meaningful distance metrics over small-molecules. We open source all models introduced in this work as part of DeepChem, an open-source framework for deep-learning in drug discovery (Ramsundar, B. deepchem.io. https://github.com/deepchem/deepchem, 2016).
Article
Motivation: Identifying drug–target interactions is an important task in drug discovery. To reduce the heavy time and financial cost of experimental approaches, many computational approaches have been proposed. Although these approaches have used many different principles, their performance is far from satisfactory, especially in predicting drug–target interactions of new candidate drugs or targets. Methods: Approaches based on machine learning for this problem can be divided into two types: feature-based and similarity-based methods. Learning to rank (LTR) is the most powerful technique in the feature-based methods. Similarity-based methods are well accepted, due to their idea of connecting the chemical and genomic spaces, represented by drug and target similarities, respectively. We propose a new method, DrugE-Rank, to improve the prediction performance by combining the advantages of the two different types of methods. That is, DrugE-Rank uses LTR, for which multiple well-known similarity-based methods can be used as components of ensemble learning. Results: The performance of DrugE-Rank is thoroughly examined by three main experiments using data from DrugBank: (i) cross-validation on FDA (US Food and Drug Administration) approved drugs before March 2014; (ii) independent test on FDA approved drugs after March 2014; and (iii) independent test on FDA experimental drugs. Experimental results show that DrugE-Rank outperforms competing methods significantly, especially achieving more than 30% improvement in Area under the Precision-Recall curve for FDA approved new drugs and FDA experimental drugs. Availability: http://datamining-iip.fudan.edu.cn/service/DrugE-Rank Contact: zhusf@fudan.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Motivation: Most existing methods for predicting causal disease genes rely on a specific type of evidence, and are therefore limited in terms of applicability. More often than not, the type of evidence available for diseases varies: for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies: for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive; it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods that are transductive. Results: Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has close to a one-in-four chance of recovering a true association in the top 100 predictions, compared to the recently proposed Catapult method (second best) that has <15% chance. We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations, and for predicting novel genes, i.e. genes that are previously not linked to diseases. Thus the method is capable of predicting novel genes even for well-characterized diseases.
We also validate the novelty of predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature. Availability: Source code and datasets can be downloaded from http://bigdata.ices.utexas.edu/project/gene-disease.
Article
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
Article
In silico discovery of interactions between drug compounds and target proteins is of core importance for improving the efficiency of the laborious and costly experimental determination of drug-target interaction. Drug-target interaction data are available for many classes of pharmaceutically useful target proteins including enzymes, ion channels, GPCRs and nuclear receptors. However, current drug-target interaction databases contain a small number of drug-target pairs which are experimentally validated interactions. In particular, for some drug compounds (or targets) there is no available interaction. This motivates the need for developing methods that predict interacting pairs with high accuracy also for these 'new' drug compounds (or targets). We show that a simple weighted nearest neighbor procedure is highly effective for this task. We integrate this procedure into a recent machine learning method for drug-target interaction we developed in previous work. Results of experiments indicate that the resulting method predicts true interactions with high accuracy also for new drug compounds and achieves results comparable or better than those of recent state-of-the-art algorithms. Software is publicly available at http://cs.ru.nl/~tvanlaarhoven/drugtarget2013/.
Article
Motivation: In silico prediction of drug-target interactions plays an important role toward identifying and developing new uses of existing or abandoned drugs. Network-based approaches have recently become a popular tool for discovering new drug-target interactions (DTIs). Unfortunately, most of these network-based approaches can only predict binary interactions between drugs and targets, and information about different types of interactions has not been well exploited for DTI prediction in previous studies. On the other hand, incorporating additional information about drug-target relationships or drug modes of action can improve prediction of DTIs. Furthermore, the predicted types of DTIs can broaden our understanding about the molecular basis of drug action. Results: We propose the first machine learning approach to integrate multiple types of DTIs and predict unknown drug-target relationships or drug modes of action. We cast the new DTI prediction problem into a two-layer graphical model, called a restricted Boltzmann machine, and apply a practical learning algorithm to train our model and make predictions. Tests on two public databases show that our restricted Boltzmann machine model can effectively capture the latent features of a DTI network and achieve excellent performance on predicting different types of DTIs, with the area under precision-recall curve up to 89.6. In addition, we demonstrate that integrating multiple types of DTIs can significantly outperform other predictions either by simply mixing multiple types of interactions without distinction or using only a single interaction type. Further tests show that our approach can infer a high fraction of novel DTIs that have been validated by known experiments in the literature or other databases. These results indicate that our approach can have highly practical relevance to DTI prediction and drug repositioning, and hence advance the drug discovery process. 
Availability: Software and datasets are available on request. Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Motivation: The identification of drug–target interaction (DTI) represents a costly and time-consuming step in drug discovery and design. Computational methods capable of predicting reliable DTI play an important role in the field. Recently, recommendation methods relying on network-based inference (NBI) have been proposed. However, such approaches implement naive topology-based inference and do not take into account important features within the drug–target domain. Results: In this article, we present a new NBI method, called domain tuned-hybrid (DT-Hybrid), which extends a well-established recommendation technique by domain-based knowledge including drug and target similarity. DT-Hybrid has been extensively tested using the last version of an experimentally validated DTI database obtained from DrugBank. Comparison with other recently proposed NBI methods clearly shows that DT-Hybrid is capable of predicting more reliable DTIs. Availability: DT-Hybrid has been developed in R and it is available, along with all the results on the predictions, through an R package at the following URL: http://sites.google.com/site/ehybridalgo/. Contact: apulvirenti@dmi.unict.it Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Motivation: In silico methods provide efficient ways to predict possible interactions between drugs and targets. A supervised learning approach, the bipartite local model (BLM), has recently been shown to be effective in the prediction of drug-target interactions. However, for drug-candidate compounds or target-candidate proteins that currently have no known interactions available, its purely 'local' model cannot be learned and hence BLM may fail to make correct predictions involving such new candidates. Results: We present a simple procedure called neighbor-based interaction-profile inferring (NII) and integrate it into the existing BLM method to handle the new candidate problem. Specifically, the inferred interaction profile is treated as label information and is used for model learning of new candidates. This functionality is particularly important in practice to find targets for new drug-candidate compounds and identify targeting drugs for new target-candidate proteins. Consistently good performance of the new BLM-NII approach has been observed in the experiment for the prediction of interactions between drugs and four categories of target proteins. Especially for nuclear receptors, BLM-NII achieves the most significant improvement as this dataset contains many drugs/targets with no interactions in the cross-validation. This demonstrates the effectiveness of the NII strategy and also shows the great potential of BLM-NII for prediction of compound-protein interactions. Contact: jpmei@ntu.edu.sg Supplementary information: Supplementary data are available at Bioinformatics online.
Article
The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) provides information about interactions between environmental chemicals and gene products and their relationships to diseases. Chemical–gene, chemical–disease and gene–disease interactions manually curated from the literature are integrated to generate expanded networks and predict many novel associations between different data types. CTD now contains over 15 million toxicogenomic relationships. To navigate this sea of data, we added several new features, including DiseaseComps (which finds comparable diseases that share toxicogenomic profiles), statistical scoring for inferred gene–disease and pathway–chemical relationships, filtering options for several tools to refine user analysis and our new Gene Set Enricher (which provides biological annotations that are enriched for gene sets). To improve data visualization, we added a Cytoscape Web view to our ChemComps feature, included color-coded interactions and created a ‘slim list’ for our MEDIC disease vocabulary (allowing diseases to be grouped for meta-analysis, visualization and better data management). CTD continues to promote interoperability with external databases by providing content and cross-links to their sites. Together, this wealth of expanded chemical–gene–disease data, combined with novel ways to analyze and view content, continues to help users generate testable hypotheses about the molecular mechanisms of environmental diseases.
Conference Paper
Receiver Operator Characteristic (ROC) curves are commonly used to present results for binary decision problems in machine learning. However, when dealing with highly skewed datasets, Precision-Recall (PR) curves give a more informative picture of an algorithm's performance. We show that a deep connection exists between ROC space and PR space, such that a curve dominates in ROC space if and only if it dominates in PR space. A corollary is the notion of an achievable PR curve, which has properties much like the convex hull in ROC space; we show an efficient algorithm for computing this curve. Finally, we also note differences in the two types of curves are significant for algorithm design. For example, in PR space it is incorrect to linearly interpolate between points. Furthermore, algorithms that optimize the area under the ROC curve are not guaranteed to optimize the area under the PR curve.
Article
The in silico prediction of potential interactions between drugs and target proteins is of core importance for the identification of new drugs or novel targets for existing drugs. However, only a tiny portion of all drug-target pairs in current datasets are experimentally validated interactions. This motivates the need for developing computational methods that predict true interaction pairs with high accuracy. We show that a simple machine learning method that uses the drug-target network as the only source of information is capable of predicting true interaction pairs with high accuracy. Specifically, we introduce interaction profiles of drugs (and of targets) in a network, which are binary vectors specifying the presence or absence of interaction with every target (drug) in that network. We define a kernel on these profiles, called the Gaussian Interaction Profile (GIP) kernel, and use a simple classifier, (kernel) Regularized Least Squares (RLS), for predicting drug-target interactions. We test comparatively the effectiveness of RLS with the GIP kernel on four drug-target interaction networks used in previous studies. The proposed algorithm achieves area under the precision-recall curve (AUPR) up to 92.7, significantly improving over results of state-of-the-art methods. Moreover, we show that also using kernels based on chemical and genomic information further increases accuracy, with a neat improvement on small datasets. These results substantiate the relevance of the network topology (in the form of interaction profiles) as a source of information for predicting drug-target interactions. Software and Supplementary Material are available at http://cs.ru.nl/~tvanlaarhoven/drugtarget2011/. Contact: tvanlaarhoven@cs.ru.nl; elenam@cs.ru.nl. Supplementary data are available at Bioinformatics online.
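The pipeline described in this abstract (binary interaction profiles, a Gaussian kernel on those profiles, then kernel regularized least squares) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so several details here are assumptions: the bandwidth is normalized by the mean squared profile norm, the names `gip_kernel`, `rls_scores`, `gamma_prime` and `sigma` are hypothetical, and the toy matrix is invented for illustration. Only the drug-side kernel is shown.

```python
import numpy as np

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian kernel on binary interaction profiles (one row per drug or target)."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    # Assumption: bandwidth is scaled by the mean squared norm of the profiles.
    mean_sq = sq_norms.mean()
    gamma = gamma_prime / mean_sq if mean_sq > 0 else gamma_prime
    # Pairwise squared Euclidean distances between profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def rls_scores(kernel, labels, sigma=1.0):
    """Kernel regularized least squares: K (K + sigma * I)^-1 Y."""
    n = kernel.shape[0]
    return kernel @ np.linalg.solve(kernel + sigma * np.eye(n), labels)

# Hypothetical toy drug-by-target interaction matrix (rows: drugs, columns: targets).
Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)

K_drug = gip_kernel(Y)          # drug-drug similarity from interaction profiles
scores = rls_scores(K_drug, Y)  # predicted interaction scores, same shape as Y
```

In this toy matrix, drugs 0 and 1 share a target, so their kernel value is higher than that of drugs 0 and 2, which share none. A target-side kernel could be built the same way from `Y.T`.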
Article
DrugBank (http://www.drugbank.ca) is a richly annotated database of drug and drug target information. It contains extensive data on the nomenclature, ontology, chemistry, structure, function, action, pharmacology, pharmacokinetics, metabolism and pharmaceutical properties of both small molecule and large molecule (biotech) drugs. It also contains comprehensive information on the target diseases, proteins, genes and organisms on which these drugs act. First released in 2006, DrugBank has become widely used by pharmacists, medicinal chemists, pharmaceutical researchers, clinicians, educators and the general public. Since its last update in 2008, DrugBank has been greatly expanded through the addition of new drugs, new targets and the inclusion of more than 40 new data fields per drug entry (a 40% increase in data 'depth'). These data field additions include illustrated drug-action pathways, drug transporter data, drug metabolite data, pharmacogenomic data, adverse drug response data, ADMET data, pharmacokinetic data, computed property data and chemical classification data. DrugBank 3.0 also offers expanded database links, improved search tools for drug-drug and food-drug interaction, new resources for querying and viewing drug pathways and hundreds of new drug entries with detailed patent, pricing and manufacturer data. These additions have been complemented by enhancements to the quality and quantity of existing data, particularly with regard to drug target, drug description and drug action data. DrugBank 3.0 represents the result of 2 years of manual annotation work aimed at making the database much more useful for a wide range of 'omics' (i.e. pharmacogenomic, pharmacoproteomic, pharmacometabolomic and even pharmacoeconomic) applications.
Article
Predicting drug-protein interactions from heterogeneous biological data sources is a key step for in silico drug discovery. The difficulty of this prediction task lies in the rarity of known drug-protein interactions and myriad unknown interactions to be predicted. To meet this challenge, a manifold regularization semi-supervised learning method is presented to tackle this issue by using labeled and unlabeled information which often generates better results than using the labeled data alone. Furthermore, our semi-supervised learning method integrates known drug-protein interaction network information as well as chemical structure and genomic sequence data. Using the proposed method, we predicted certain drug-protein interactions on the enzyme, ion channel, GPCRs, and nuclear receptor data sets. Some of them are confirmed by the latest publicly available drug targets databases such as KEGG. We report encouraging results of using our method for drug-protein interaction network reconstruction which may shed light on the molecular interaction inference and new uses of marketed drugs.
Article
The molecular understanding of phenotypes caused by drugs in humans is essential for elucidating mechanisms of action and for developing personalized medicines. Side effects of drugs (also known as adverse drug reactions) are an important source of human phenotypic information, but so far research on this topic has been hampered by insufficient accessibility of data. Consequently, we have developed a public, computer-readable side effect resource (SIDER) that connects 888 drugs to 1450 side effect terms. It contains information on frequency in patients for one-third of the drug-side effect pairs. For 199 drugs, the side effect frequency of placebo administration could also be extracted. We illustrate the potential of SIDER with a number of analyses. The resource is freely available for academic research at http://sideeffects.embl.de.
Article
In silico prediction of drug-target interactions from heterogeneous biological data is critical in the search for drugs for known diseases. This problem is currently being attacked from many different points of view, a strong indication of its current importance. Precisely, being able to predict new drug-target interactions with both high precision and accuracy is the holy grail, a fundamental requirement for in silico methods to be useful in a biological setting. This, however, remains extremely challenging due to, amongst other things, the rarity of known drug-target interactions. We propose a novel supervised inference method to predict unknown drug-target interactions, represented as a bipartite graph. We use this method, known as bipartite local models to first predict target proteins of a given drug, then to predict drugs targeting a given protein. This gives two independent predictions for each putative drug-target interaction, which we show can be combined to give a definitive prediction for each interaction. We demonstrate the excellent performance of the proposed method in the prediction of four classes of drug-target interaction networks involving enzymes, ion channels, G protein-coupled receptors (GPCRs) and nuclear receptors in human. This enables us to suggest a number of new potential drug-target interactions. An implementation of the proposed algorithm is available upon request from the authors. Datasets and all prediction results are available at http://cbio.ensmp.fr/~yyamanishi/bipartitelocal/.
Article
Human Protein Reference Database (HPRD--http://www.hprd.org/), initially described in 2003, is a database of curated proteomic information pertaining to human proteins. We have recently added a number of new features in HPRD. These include PhosphoMotif Finder, which allows users to find the presence of over 320 experimentally verified phosphorylation motifs in proteins of interest. Another new feature is a protein distributed annotation system--Human Proteinpedia (http://www.humanproteinpedia.org/)--through which laboratories can submit their data, which is mapped onto protein entries in HPRD. Over 75 laboratories involved in proteomics research have already participated in this effort by submitting data for over 15,000 human proteins. The submitted data includes mass spectrometry and protein microarray-derived data, among other data types. Finally, HPRD is also linked to a compendium of human signaling pathways developed by our group, NetPath (http://www.netpath.org/), which currently contains annotations for several cancer and immune signaling pathways. Since the last update, more than 5500 new protein sequences have been added, making HPRD a comprehensive resource for studying the human proteome.
Article
The identification of protein function based on biological information is an area of intense research. Here we consider a complementary technique that quantitatively groups and relates proteins based on the chemical similarity of their ligands. We began with 65,000 ligands annotated into sets for hundreds of drug targets. The similarity score between each set was calculated using ligand topology. A statistical model was developed to rank the significance of the resulting similarity scores, which are expressed as a minimum spanning tree to map the sets together. Although these maps are connected solely by chemical similarity, biologically sensible clusters nevertheless emerged. Links among unexpected targets also emerged, among them that methadone, emetine and loperamide (Imodium) may antagonize muscarinic M3, alpha2 adrenergic and neurokinin NK2 receptors, respectively. These predictions were subsequently confirmed experimentally. Relating receptors by ligand chemistry organizes biology to reveal unexpected relationships that may be assayed using the ligands themselves.
Article
Colony stimulating factor-1 (CSF1) and its receptor (CSF1-R) are important in mammary gland development and have been implicated in breast carcinogenesis. In a nested case-control study in the Nurses' Health Study of 726 breast cancer cases diagnosed between June 1, 1992, and June 1, 1998, and 734 matched controls, we prospectively evaluated whether circulating levels of CSF1 (assessed in 1989-1990) are associated with breast cancer risk. The association varied by menopausal status (P(heterogeneity) = 0.009). CSF1 levels in the highest quartile (versus lowest) were associated with an 85% reduced risk of premenopausal breast cancer [relative risk (RR), 0.15; 95% confidence interval (95% CI), 0.03-0.85; P(trend) = 0.02]. In contrast, CSF1 levels in the highest quartile conferred a 33% increased risk of postmenopausal breast cancer (RR, 1.33; 95% CI, 0.96-1.86; P(trend) = 0.11), with greatest risk for invasive (RR, 1.45; 95% CI, 1.02-2.07; P(trend) = 0.06) and ER+/PR+ tumors (RR, 1.72; 95% CI, 1.11-2.66; P(trend) = 0.04). Thus, the association of circulating CSF1 levels and breast cancer varies by menopausal status.
Article
Median lethal dose, LD50, is a general indicator of compound acute oral toxicity (AOT). Various in silico methods were developed for AOT prediction to reduce costs and time. In this study, we developed an improved molecular graph encoding convolutional neural networks (MGE-CNN) architecture to develop three types of high-quality AOT models: a regression model (deepAOT-R), a multi-classification model (deepAOT-C) and a multi-task model (deepAOT-CR). These predictive models substantially outperformed previously reported models. For the two external data sets containing 1673 (test set I) and 375 (test set II) compounds, the R² and mean absolute error (MAE) of deepAOT-R on test set I were 0.864 and 0.195, and the prediction accuracy of deepAOT-C was 95.5% and 96.3% on test sets I and II, respectively. The prediction accuracy of deepAOT-CR on the two external sets is 95.0% and 94.1%, while its R² and MAE are 0.861 and 0.204 for test set I, respectively. We then performed forward and backward exploration of deepAOT models for deep fingerprints, which could support shallow machine learning methods more efficiently than traditional fingerprints or descriptors. We further performed automatic feature learning, a key essence of deep learning, to map the corresponding activation values into fragment space and derive AOT-related chemical substructures by reverse mining of the features. Our deep learning architecture for AOT is generally applicable in predicting and exploring other toxicity or property endpoints of chemical compounds. The two deepAOT models are freely available at http://repharma.pku.edu.cn/DLAOT/DLAOThome.php.
Conference Paper
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of “Canada” and “Air” cannot be easily combined to obtain “Air Canada”. Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
Article
Supervised learning on molecules has incredible potential to be useful in chemistry, drug discovery, and materials science. Luckily, several promising and closely related neural network models invariant to molecular symmetries have already been described in the literature. These models learn a message passing algorithm and aggregation function to compute a function of their entire input graph. At this point, the next step is to find a particularly effective variant of this general approach and apply it to chemical prediction benchmarks until we either solve them or reach the limits of the approach. In this paper, we reformulate existing models into a single common framework we call Message Passing Neural Networks (MPNNs) and explore additional novel variations within this framework. Using MPNNs we demonstrate state of the art results on an important molecular property prediction benchmark, results we believe are strong enough to justify retiring this benchmark.
Article
Decades of costly failures in translating drug candidates from preclinical disease models to human therapeutic use warrant reconsideration of the priority placed on animal models in biomedical research. Following an international workshop attended by experts from academia, government institutions, research funding bodies, and the corporate and nongovernmental organisation (NGO) sectors, in this consensus report, we analyse, as case studies, five disease areas with major unmet needs for new treatments. In view of the scientifically driven transition towards a human pathway-based paradigm in toxicology, a similar paradigm shift appears to be justified in biomedical research. There is a pressing need for an approach that strategically implements advanced, human biology-based models and tools to understand disease pathways at multiple biological scales. We present recommendations to help achieve this.
Article
We present a scalable approach for semi-supervised learning on graph-structured data that is based on an efficient variant of convolutional neural networks which operate directly on graphs. We motivate the choice of our convolutional architecture via a localized first-order approximation of spectral graph convolutions. Our model scales linearly in the number of graph edges and learns hidden layer representations that encode both local graph structure and features of nodes. In a number of experiments on citation networks and on a knowledge graph dataset we demonstrate that our approach outperforms related methods by a significant margin.
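The layer-wise rule that results from the first-order approximation mentioned in this abstract is commonly written as H' = ReLU(D̃^(-1/2) (A + I) D̃^(-1/2) H W), where D̃ is the degree matrix of A + I. The abstract itself does not state the formula, so the NumPy sketch below should be read as an illustration under that assumption; the function name `gcn_layer` and the toy graph are hypothetical. It uses dense matrices for readability, whereas the linear scaling in the number of edges claimed in the abstract would require a sparse adjacency representation.

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One graph convolution layer with symmetric normalization and ReLU."""
    a_tilde = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    deg = a_tilde.sum(axis=1)                         # >= 1 for non-negative adjacency
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: D^-1/2 (A + I) D^-1/2
    a_hat = a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_hat @ features @ weights, 0.0)

# Hypothetical toy graph: a 3-node path 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.eye(3)          # one-hot node features
W = np.ones((3, 2))    # fixed weights, chosen only to keep the example deterministic

out = gcn_layer(A, H, W)  # hidden representation, shape (3, 2)
```

Each output row mixes a node's own features with those of its neighbors, which is how the hidden representations come to encode both local graph structure and node features.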
Article
Computational prediction of compound-protein interactions (CPIs) is of great importance for drug design as the first step in in silico screening. We previously proposed chemical genomics-based virtual screening (CGBVS), which predicts CPIs by using a support vector machine (SVM). However, the CGBVS has problems when training using more than a million datasets of CPIs since SVMs require an exponential increase in the calculation time and computer memory. To solve this problem, we propose the CGBVS-DNN, in which we use deep neural networks, a kind of deep learning technique, instead of the SVM. Deep learning does not require learning all input data at once because the network can be trained with small mini-batches. Experimental results show that the CGBVS-DNN outperformed the original CGBVS with a quarter million CPIs. Results of cross-validation show that the accuracy of the CGBVS-DNN reaches up to 98.2% (σ<0.01) with 4 million CPIs.
Article
The identification of interactions between compounds and proteins plays an important role in network pharmacology and drug discovery. However, experimentally identifying compound-protein interactions (CPIs) is generally expensive and time-consuming, so computational approaches have been introduced. Among these, machine-learning based methods have achieved considerable success. However, due to the nonlinear and imbalanced nature of biological data, many machine learning approaches have their own limitations. Recently, deep learning techniques have shown advantages over many state-of-the-art machine learning methods in some applications. In this study, we aim at improving the performance of CPI prediction based on deep learning, and propose a method called DL-CPI (the abbreviation of Deep Learning for Compound-Protein Interactions prediction), which employs a deep neural network (DNN) to effectively learn the representations of compound-protein pairs. Extensive experiments show that DL-CPI can learn useful features of compound-protein pairs by a layerwise abstraction, and thus achieves better prediction performance than existing methods on both balanced and imbalanced datasets.
Article
Convolutional neural networks (CNNs) have greatly improved state-of-the-art performances in a number of fields, notably computer vision and natural language processing. In this work, we are interested in generalizing the formulation of CNNs from low-dimensional regular Euclidean domains, where images (2D), videos (3D) and audio (1D) are represented, to high-dimensional irregular domains such as social networks or biological networks represented by graphs. This paper introduces a formulation of CNNs on graphs in the context of spectral graph theory. We borrow the fundamental tools from the emerging field of signal processing on graphs, which provides the necessary mathematical background and efficient numerical schemes to design localized graph filters that are efficient to learn and evaluate. As a matter of fact, we introduce the first technique that offers the same computational complexity as standard CNNs, while being universal to any graph structure. Numerical experiments on MNIST and 20NEWS demonstrate the ability of this novel deep learning system to learn local, stationary, and compositional features on graphs, as long as the graph is well-constructed.
Conference Paper
Network based prediction of interaction between drug compounds and target proteins is a core step in the drug discovery process. The availability of drug–target interaction data has boosted the development of machine learning methods for the in silico prediction of drug–target interactions. In this paper we focus on the crucial issue of data bias. We show that four popular datasets contain a bias because of the way they have been constructed: all drug compounds and target proteins have at least one interaction and some of them have only a single interaction. We show that this bias can be exploited by prediction methods to achieve an optimistic generalization performance as estimated by cross-validation procedures, in particular leave-one-out cross validation. We discuss possible ways to mitigate the effect of this bias, in particular by adapting the validation procedure. In general, results indicate that the data bias should be taken into account when assessing the generalization performance of machine learning methods for the in silico prediction of drug–target interactions. The datasets and source code for this article are available at http://cs.ru.nl/~tvanlaarhoven/bias2014/
Article
Many questions about the biological activity and availability of small molecules remain inaccessible to investigators who could most benefit from their answers. To narrow the gap between chemoinformatics and biology, we have developed a suite of ligand annotation, purchasability, target and biology association tools, incorporated into ZINC and meant for investigators who are not computer specialists. The new version contains over 120 million purchasable "drug-like" compounds - effectively all organic molecules that are for sale - a quarter of which are available for immediate delivery. ZINC connects purchasable compounds to high-value ones such as metabolites, drugs, natural products and annotated compounds from the literature. Compounds may be accessed by the genes they are annotated for, as well as the major and minor target classes to which those genes belong. It offers new analysis tools that are easy for non-specialists yet with few limitations for experts. ZINC retains its original 3D roots - all molecules are available in biologically relevant, ready-to-dock formats. ZINC is freely available at zinc15.docking.org.
Article
Motivation: The emergence of network medicine not only offers more opportunities for better and more complete understanding of the molecular complexities of diseases, but also serves as a promising tool for identifying new drug targets and establishing new relationships among diseases that enable drug repositioning. Computational approaches for drug repositioning by integrating information from multiple sources and multiple levels have the potential to provide great insights to the complex relationships among drugs, targets, disease genes and diseases at a system level. Results: In this article, we have proposed a computational framework based on a heterogeneous network model and applied the approach on drug repositioning by using existing omics data about diseases, drugs and drug targets. The novelty of the framework lies in the fact that the strength between a disease-drug pair is calculated through an iterative algorithm on the heterogeneous graph that also incorporates drug-target information. Comprehensive experimental results show that the proposed approach significantly outperforms several recent approaches. Case studies further illustrate its practical usefulness. Availability and implementation: http://cbc.case.edu Contact: jingli@cwru.edu Supplementary information: Supplementary data are available at Bioinformatics online.