Article

Mining node attributes for link prediction with a non-negative matrix factorization-based approach


... Recently, NMF algorithms have been widely used in different types of networks to perform link prediction tasks, mainly due to the advantages of NMF such as dimensionality reduction, interpretability and network reconfiguration [28]. The core idea of most of the published NMF-based link prediction models is to map the adjacency matrix of the network to a low-dimensional latent space, then maintain the network structural information by graph regularization, and finally reconstruct the original network with minimum error [29][30][31]. Wang et al. [29] proposed a non-negative matrix factorization model based on the kernel framework, which preserves both local and global network structure information. Mahmoodi et al. [30] proposed the adversarial nonnegative matrix factorization link prediction model, which uses the common neighbor algorithm to maintain the local network structure. ...
... Mahmoodi et al. [30] proposed the adversarial nonnegative matrix factorization link prediction model, which uses the common neighbor algorithm to maintain the local network structure. Zhao et al. [31] proposed a non-negative matrix factorization link prediction model that fuses node attributes and network topology. In addition, Symmetric Non-negative Matrix Factorization (SNMF) is an important variant of NMF, and SNMF and its variants have been widely applied to community detection [32] and link prediction [33]. ...
... Mahmoodi et al. [30] proposed an adversarial non-negative matrix factorization link prediction model that uses an adversarial training algorithm to reconstruct the network while preserving its local structure. Zhao et al. [31] propose a novel NMF-based link prediction model that integrates network topology and node attributes. In addition, in recent years, some (non-negative) matrix factorization variants have been widely used in link prediction to overcome the shortcomings of NMF; e.g., Duan et al. [33] were the first to perform the link prediction task using SNMF, and their experimental results show better performance than traditional methods. ...
Article
Full-text available
The aim of link prediction is to predict missing links, eliminate spurious links, and predict new links in a future network. Among the different types of link prediction algorithms, non-negative matrix factorization (NMF)-based methods have become promising and competitive, with the advantages of dimension reduction and higher prediction accuracy. However, existing NMF-based algorithms have the following problems: (1) NMF fails to directly capture the neighbor capability of a node; (2) most NMF-based methods do not guarantee that additional information effectively improves performance. To address these limitations, this paper proposes a novel link prediction model named Pairwisely Constrained Symmetric Non-Negative Matrix Factorization via Degree-related clustering coefficient and Improved closeness centrality (PCSNMF_DI). Specifically, PCSNMF_DI has the following features: (1) pairwisely constrained graph regularization is coupled with degree-related clustering coefficients and improved closeness centrality to maintain both local and global information; and (2) an ablation study is employed to analyse and discuss the necessity of attaching local and global information to the proposed model. In addition, a proof of convergence and the computational complexity of PCSNMF_DI are presented. PCSNMF_DI and baseline methods are evaluated on sixteen real-world networks, and the results demonstrate that PCSNMF_DI significantly outperforms state-of-the-art link prediction approaches.
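As a rough illustration of the symmetric NMF idea mentioned in this abstract, the sketch below factorizes an adjacency matrix A as H·Hᵀ with a damped multiplicative update and reads link scores off the reconstruction. It is a generic SNMF baseline written for this page, not PCSNMF_DI: the pairwise constraints, the clustering coefficient term, and the closeness centrality term are all omitted, and the function names, the rank, the iteration count, and the damping factor 0.5 are assumptions.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def reconstruction_error(A, H):
    """Squared Frobenius norm of A - H H^T."""
    R = matmul(H, transpose(H))
    n = len(A)
    return sum((A[i][j] - R[i][j]) ** 2 for i in range(n) for j in range(n))


def symmetric_nmf(A, rank=2, iterations=200, seed=0, eps=1e-9):
    """Approximate A by H H^T with H >= 0 (plain SNMF, no regularizers)."""
    rng = random.Random(seed)
    n = len(A)
    H = [[rng.random() for _ in range(rank)] for _ in range(n)]
    for _ in range(iterations):
        AH = matmul(A, H)
        HHtH = matmul(H, matmul(transpose(H), H))
        # Damped multiplicative update; every factor is positive, so H stays non-negative.
        H = [[H[i][k] * (0.5 + 0.5 * AH[i][k] / (HHtH[i][k] + eps))
              for k in range(rank)] for i in range(n)]
    return H


def link_scores(H):
    """Score every node pair by the reconstructed matrix entry."""
    return matmul(H, transpose(H))
```

Unobserved pairs with high reconstructed scores would be the predicted links; in a real evaluation, the test edges are removed from A before factorizing.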
... NMF generates a low-dimensional representation of the data, which can then be used for various downstream tasks such as classification and clustering. Specifically, in complex networks, NMF has shown remarkable performance across various applications, including link prediction [28][29][30][31][32][33][34], a classification problem, and community detection [35][36][37][38][39][40], a clustering problem. ...
... This model integrates both network topology and node attribute information, calculates high-order proximities between nodes using the structure-attribute random walk similarity (SARWS) method, and employs the $\ell_{2,1}$-norm to constrain the loss function and regularization terms. Zhao et al. [34] proposed a novel link prediction method, NMFLP, that integrates network topology and node attributes; it can be applied both to attributed networks, by using node semantic attributes, and to networks without attributes, by leveraging the topological structure as node attributes. ...
Article
Full-text available
Complex real-world systems, evolving over time, can be modeled as dynamic networks. Numerous studies have focused on utilizing information about the entities and relationships within networks. Temporal link prediction, a challenging yet critical task for dynamic networks, aims to forecast the appearance and disappearance of links in future snapshots based on the network structure observed in previous snapshots. However, existing works have not fully utilized information from historical networks, such as evolving structures and community data. Additionally, nonnegative matrix factorization (NMF) techniques are unable to automatically extract nonlinear spatial and temporal features from dynamic networks. In this paper, we introduce a unified temporal link prediction framework, EDeepEye, which leverages NMF and graph regularization to predict temporal links. Based on this framework, we propose three novel methods: SDeepEye, GDeepEye, and QDeepEye, which incorporate prior information, weighted matrices, and modularity matrices, respectively. Additionally, we provide effective multiplicative updating rules for the factors of the methods, which learn latent features from the temporal topological structure. Three evaluation metrics, i.e., area under the receiver operator characteristic curve, Precision and root mean squared error, are applied to verify the superiority of the proposed methods. The results of empirical study show that our proposed methods outperform the baseline methods on eight real-world networks and 16 synthetic networks.
... Matrix factorization methods (NMF [23], NMF-AP [11], NMFLP [38], ICP [37]): extract the key features of the network by optimizing the decomposition; widely used in recommendation systems. ...
Article
Full-text available
Link prediction is a key task in the analysis of complex networks, aiming to predict missing links or potential future links based on the network’s attributes and structure. A common approach to this problem is using node similarity, which evaluates the likelihood of two unconnected nodes forming a link. Traditional similarity methods often rely on the concept of common neighbors. However, these methods have limitations as they do not account for the influence of neighboring nodes with varying degrees of connectivity and overlook potential links between nodes that do not share common neighbors. This paper proposes a novel algorithm that improves existing link prediction methods by considering both the contributions of common neighbors and the efficiency of information transmission within the network. We introduce a power-law distribution model to capture the contributions of neighbors and select relevant network features to construct indicators that reflect information transmission efficiency. The proposed method does not require manual parameter tuning, making it more efficient and adaptable to different networks. Through experiments on multiple real-world networks, the algorithm demonstrates high prediction accuracy and achieves impressive results when evaluated using standard metrics. The findings underscore the effectiveness of this method in accurately predicting missing links, making it a useful tool for network analysis.
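To make the common-neighbor idea and its limitations concrete, here is a small sketch of two classical similarity indices. The Adamic-Adar index is used only as a well-known example of weighting neighbors by their degree; it is not the power-law model proposed in the abstract above, and the helper names and toy graph are assumptions.

```python
import math


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, x, y):
    """Number of neighbors shared by x and y."""
    return len(adj[x] & adj[y])


def adamic_adar(adj, x, y):
    """Shared neighbors weighted by 1 / log(degree): low-degree neighbors count more.

    A shared neighbor of two distinct nodes has degree >= 2, so the log is positive.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)
print(common_neighbors(adj, "a", "d"), adamic_adar(adj, "a", "d"))
print(common_neighbors(adj, "a", "e"))  # no shared neighbor, so the score is zero
```

The pair ("a", "e") shows the second limitation named in the abstract: both indices return zero for nodes without a common neighbor, even though the nodes are only three hops apart.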
... As research into complex networks deepens, researchers are paying increasing attention to areas like link prediction [1], community detection [2], and key node identification [3]. In complex networks, nodes represent individuals, and edges represent the connections between them. ...
Article
Full-text available
In the field of complex network analysis, accurately identifying key nodes is crucial for understanding and controlling information propagation. Although several local centrality methods have been proposed, their accuracy may be compromised if interactions between nodes and their neighbors are not fully considered. To address this issue, this paper proposes a key node identification method based on multilayer neighbor node gravity and information entropy (MNNGE). The method works as follows: First, the relative gravity of the nodes is calculated based on their weights. Second, the direct gravity of the nodes is calculated by considering the attributes of neighboring nodes, thus capturing interactions within local triangular structures. Finally, the centrality of the nodes is obtained by aggregating the relative and direct gravity of multilayer neighbor nodes using information entropy. To validate the effectiveness of the MNNGE method, we conducted experiments on various real-world network datasets, using evaluation metrics such as the susceptible-infected-recovered (SIR) model, Kendall τ correlation coefficient, Jaccard similarity coefficient, monotonicity, and complementary cumulative distribution function. Our results demonstrate that MNNGE can identify key nodes more accurately than other methods, without requiring parameter settings, and is suitable for large-scale complex networks.
... NMFLP can be used with a variety of networks, regardless of whether they include node semantic properties. In networks where node attributes are present, it employs these attributes directly to forecast links; in the absence of attributes, it substitutes the network's structure for the attributes [32]. ...
Article
Full-text available
Link prediction in social networks has become an important and essential task. The continued growth and evolution of this field will lead to new and improved methods for analyzing and understanding social networks. Link prediction is also helpful in various network applications in both academic and real-world contexts. This paper discusses how different algorithms predict links in a network graph, including the prediction of missing links. It presents a methodical study of the types of algorithms that are most informative for understanding connection prediction. The study concentrates on similarity approaches and the types of algorithms used to forecast the presence of missing links in social networks. The paper addresses various link prediction approaches that consider the structure of the network to reduce uncertainty. Evaluation measures for link prediction and their practical applications are also covered. Lastly, it discusses the difficulties and outlines plans for the future development of link prediction methods. This discussion may help researchers choose the proper network structure for predicting links.
Article
This paper presents a novel link prediction approach, termed Basic-Structural Similarity Link Prediction (BSSLP), designed to address the zero-similarity problem in complex networks. BSSLP integrates basic similarity, which establishes a nonzero baseline for all node pairs, with structural similarity that captures both local and intermediate topological features. This integration effectively mitigates challenges such as cold start and sparse network prediction. Through extensive experiments on nine real-world networks, BSSLP consistently outperforms seven benchmark methods, achieving an average AUC improvement of 5.17%. The method demonstrates robust performance across various network structures and maintains high prediction accuracy under different training set proportions. By providing nonzero similarity estimates for all potential edges, BSSLP significantly enhances the prediction of new connections and offers deeper insights into network dynamics.
Article
Compatibility among acupoints is a fundamental principle in acupuncture treatment within traditional Chinese medicine, playing a vital role in enhancing the effectiveness and scope of therapeutic interventions. With the increasing availability of acupuncture-related data, link prediction offers a data-driven approach that facilitates the evidence-based exploration and validation of acupoint compatibilities. However, existing link prediction methods often focus on mapping acupoints and their compatibility relationships into lower-dimensional spaces. These approaches can overlook essential acupoint features and make the predictions susceptible to noise interference. To address these challenges, we propose a novel acupoint compatibility prediction model based on a Feature-Aware Residual Graph Attention Network and Matrix Factorization (FRGATMF). Our model introduces a feature-aware connectivity fusion strategy that integrates acupoint attributes with structural information to enrich acupoint representations. Following this, a deep non-negative matrix factorization approach is employed to construct a denoised feature matrix. This matrix is processed through a residual graph attention network to derive comprehensive and effective node embeddings, which are crucial for accurate link prediction. Experimental results on the acupuncture dataset, along with three public datasets, demonstrate that FRGATMF significantly outperforms seven existing comparison models across various evaluation metrics. Additionally, link prediction can identify previously unconsidered or undocumented acupoint combinations that may offer better therapeutic results, thus expanding the range of treatment options and highlighting its potential in improving the prediction of acupoint compatibility relationships.
Article
Full-text available
Predicting links in complex networks has been one of the essential topics within the realm of data mining and science discovery over the past few years. This problem remains an attempt to identify future, deleted, and redundant links using the existing links in a graph. Local random walk is considered to be one of the most well-known algorithms in the category of quasi-local methods. It traverses the network using the traditional random walk with a limited number of steps, randomly selecting one adjacent node in each step among the nodes which have equal importance. Then this method uses the transition probability between node pairs to calculate the similarity between them. However, in most datasets this method is not able to perform accurately in scoring remarkably similar nodes. In the present article, an efficient method is proposed for improving local random walk by encouraging random walk to move, in every step, towards the node which has a stronger influence. Therefore, the next node is selected according to the influence of the source node. To do so, using mutual information, the concept of the asymmetric mutual influence of nodes is presented. A comparison between the proposed method and other similarity-based methods (local, quasi-local, and global) has been performed, and results have been reported for 11 real-world networks. It had a higher prediction accuracy compared with other link prediction approaches.
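For reference, the baseline local random walk described in this abstract can be sketched as follows: a walker starts at each endpoint, takes a fixed number of uniform steps, and the two transition probabilities are combined with degree-based weights. This is the plain, unbiased walk only; the influence-guided step selection and the mutual-information measure proposed in the paper are not reproduced, and the function name and default step count are assumptions.

```python
def lrw_similarity(adj, x, y, steps=3):
    """Local random walk score between x and y on an undirected graph.

    adj maps each node to the set of its neighbors; every node is assumed
    to have at least one neighbor.
    """
    total_degree = sum(len(nbrs) for nbrs in adj.values())  # equals 2|E|

    def walk(start):
        # Probability distribution of a uniform random walker after `steps` steps.
        dist = {node: 0.0 for node in adj}
        dist[start] = 1.0
        for _ in range(steps):
            nxt = {node: 0.0 for node in adj}
            for node, mass in dist.items():
                if mass:
                    share = mass / len(adj[node])  # equal importance for all neighbors
                    for nbr in adj[node]:
                        nxt[nbr] += share
            dist = nxt
        return dist

    qx = len(adj[x]) / total_degree
    qy = len(adj[y]) / total_degree
    return qx * walk(x)[y] + qy * walk(y)[x]


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(lrw_similarity(path, "a", "c", steps=2))
```

Replacing the uniform `share` with neighbor-specific weights is where a biased variant, such as the one the abstract proposes, would differ from this baseline.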
Article
Full-text available
We consider the graph link prediction task, which is a classic graph analytical problem with many real-world applications. With the advances of deep learning, current link prediction methods commonly compute features from subgraphs centered at two neighboring nodes and use the features to predict the label of the link between these two nodes. In this formalism, a link prediction problem is converted to a graph classification task. In order to extract fixed-size features for classification, graph pooling layers are necessary in the deep learning model, thereby incurring information loss. To overcome this key limitation, we propose to seek a radically different and novel path by making use of the line graphs in graph theory. In particular, each node in a line graph corresponds to a unique edge in the original graph. Therefore, link prediction problems in the original graph can be equivalently solved as a node classification problem in its corresponding line graph, instead of a graph classification task. Experimental results on fourteen datasets from different applications demonstrate that our proposed method consistently outperforms the state-of-the-art methods, while it has fewer parameters and high training efficiency.
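The line graph transformation at the centre of this abstract is easy to state in code: every edge of the original graph becomes a node, and two such nodes are connected when the original edges share an endpoint. The sketch below is only that transformation, for a simple undirected graph without duplicate edges; the learning model built on top of it is not shown, and the function name is an assumption.

```python
from itertools import combinations


def line_graph(edges):
    """Return (nodes, edges) of the line graph of a simple undirected graph.

    Each line-graph node is an original edge stored as a sorted tuple; each
    line-graph edge is a frozenset of two original edges sharing an endpoint.
    """
    nodes = [tuple(sorted(e)) for e in edges]
    incident = {}
    for e in nodes:
        for endpoint in e:
            incident.setdefault(endpoint, []).append(e)
    lg_edges = set()
    for edges_at_endpoint in incident.values():
        for e1, e2 in combinations(edges_at_endpoint, 2):
            lg_edges.add(frozenset((e1, e2)))
    return nodes, lg_edges


nodes, lg_edges = line_graph([(1, 2), (2, 3), (1, 3), (3, 4)])
print(len(nodes), len(lg_edges))
```

Under this view, deciding whether a candidate link exists amounts to classifying the corresponding node of the line graph.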
Article
Full-text available
Link prediction in a complex network is a problem of fundamental interest in network science and has attracted increasing attention in recent years. It aims to predict missing (or future) links between two entities in a complex system that are not already connected. Among existing methods, the most popular are local similarity indices, which take into account the information of common neighbours to estimate the likelihood that a connection exists between two nodes. In this paper, we propose global and quasi-local extensions of some commonly used local similarity indices. We have performed extensive numerical simulations on publicly available datasets from diverse domains demonstrating that the proposed extensions not only give superior performance, when compared to their respective local indices, but also outperform some of the current, state-of-the-art, local and global link-prediction methods.
Article
Full-text available
Link prediction provides methods for estimating potential connections in complex networks that have theoretical and practical relevance for personalized recommendations and various other applications. Traditional collaborative filtering algorithms treat similarity as a scalar value, causing some information loss. This paper presents a novel approach to calculating user similarity that uses a vector to measure user similarity across multiple dimensions based on the items’ characteristics. Our approach defines global similarity, local similarity and meta similarity to calculate vector similarity as indicators of similarity between users, revealing and measuring the difference between users’ general preferences in different scenarios. The experimental results show that the presented similarity methods improve prediction accuracy in recommender systems compared to some state-of-the-art approaches. Our results confirm that user similarity can be measured differently when considering different classes of items, which extends our understanding of similarity measurement.
Conference Paper
Full-text available
Predicting the link between two nodes is a fundamental problem for graph data analytics. In attributed graphs, both the structure and attribute information can be utilized for link prediction. Most existing studies focus on transductive link prediction where both nodes are already in the graph. However, many real-world applications require inductive prediction for new nodes having only attribute information. It is more challenging since the new nodes do not have structure information and cannot be seen during the model training. To solve this problem, we propose a model called DEAL, which consists of three components: two node embedding encoders and one alignment mechanism. The two encoders aim to output the attribute-oriented node embedding and the structure-oriented node embedding, and the alignment mechanism aligns the two types of embeddings to build the connections between the attributes and links. Our model DEAL is versatile in the sense that it works for both inductive and transductive link prediction. Extensive experiments on several benchmark datasets show that our proposed model significantly outperforms existing inductive link prediction methods, and also outperforms the state-of-the-art methods on transductive link prediction.
Article
Full-text available
Link prediction can be used to find missing or future links in a network, so it has become a hot research topic. A network generally contains two types of information: the topological structure formed by the connections between nodes, and the attribute information of the nodes. However, existing topology-based link prediction algorithms take little attribute information into account. In this paper, a novel algorithm called Network Embedding with Attribute Deep Fusion for Link Prediction (NEADF-LP) is proposed. We obtain embedded vectors for topological structure and attribute information using a structure encoder and an attribute encoder, respectively, and deeply fuse the two vectors. Compared with mainstream baselines on the CiteSeer and Cora datasets, the results show that the deep fusion of topological structure and attribute information effectively improves the accuracy of link prediction.
Article
Full-text available
Real-world complex networks are indirect representations of complex systems. They grow over time. In practice, these networks are fragmented and noisy. An important problem in complex networks is link prediction, which aims to determine the likelihood of probable edges. The need for link prediction often arises in social networks, for recommending new friends, and in recommender systems, for recommending new items (movies, gadgets, etc.) based on earlier shopping history. In this work, we propose a new link prediction algorithm, namely the “Common Neighbor and Centrality based Parameterized Algorithm” (CCPA), to suggest the formation of new links in complex networks. Using AUC (Area Under the receiver operating characteristic Curve) as the evaluation criterion, we perform an extensive experimental evaluation of our proposed algorithm on eight real-world data sets and against eight benchmark algorithms. The results validate the improved performance of our proposed algorithm.
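Since AUC is the evaluation criterion in this and several other abstracts on this page, a minimal sketch of how it is typically computed for link prediction may help. It compares the score of each held-out (true) link with the score of each nonexistent link and counts ties as half a win. The function name and the example scores are made up for illustration; many papers sample pairs at random rather than comparing all of them.

```python
def link_prediction_auc(missing_scores, nonexistent_scores):
    """Probability that a held-out link outscores a nonexistent link (ties count 0.5)."""
    wins = 0
    ties = 0
    for p in missing_scores:
        for q in nonexistent_scores:
            if p > q:
                wins += 1
            elif p == q:
                ties += 1
    return (wins + 0.5 * ties) / (len(missing_scores) * len(nonexistent_scores))


# Hypothetical similarity scores for two held-out links and two non-links.
print(link_prediction_auc([0.9, 0.4], [0.4, 0.1]))
```

A value of 0.5 corresponds to random scoring, and values closer to 1 indicate that the predictor ranks true links above non-links more consistently.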
Article
Full-text available
The goal of link prediction is to estimate the possibility of future links among nodes using known network information and the attributes of the nodes. Given the time-varying characteristics and node mobility of opportunistic networks, this paper proposes a novel link prediction method based on the Bayesian recurrent neural network (BRNN-LP) framework. The time series data of a dynamic opportunistic network are sliced into snapshots that contain correlation information and spatial location information. A vector of a snapshot is constructed based on such information, which represents the link information. Then, the vectors of multiple network snapshots constitute a spatiotemporal vector sequence. Benefiting from the BRNN’s ability to extract the features of time series data, the correlation between the spatiotemporal vector sequence and node connection states is learned, and the law of link evolution is captured to predict future links. The results on the MIT Reality dataset show that, compared with methods such as similarity-based indices, the support vector classifier, linear discriminant analysis and the recurrent neural network, the proposed prediction method is more accurate and stable.
Article
Full-text available
Link prediction in social networks has a long history in the complex network research area. The formation of links in networks has been approached by scientists from different backgrounds, ranging from physics to computer science. To predict the formation of new links, we consider measures which originate from network science and use them in the place of mass and distance within the formalism of Newton's Gravitational Law. The attraction force calculated in this way is treated as a proxy for the likelihood of link formation. In particular, we use three different measures of vertex centrality as mass, and 13 dissimilarity measures including shortest path and inverse Katz score in place of distance, leading to over 50 combinations that we evaluate empirically. Combining these through gravitational law allows us to couple popularity with similarity, two important characteristics for link prediction in social networks. Performance of our predictors is evaluated using Area Under the Precision–Recall Curve (AUC) for seven different real-world network datasets. The experiments demonstrate that this approach tends to outperform the setting in which vertex similarity measures like Katz are used on their own. Our approach also gives us the opportunity to combine a network's global and local properties for predicting future or missing links. Our study shows that the use of the physical law, which combines node importance with measures quantifying how distant the nodes are, is a promising research direction in social link prediction.
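One combination in the spirit of this gravitational formalism can be sketched as below: node degree plays the role of mass and shortest-path length plays the role of distance. Shortest path is named in the abstract as one of the dissimilarity measures; using degree as the centrality and squaring the distance are assumptions made here for illustration, as are the function names.

```python
from collections import deque


def shortest_path_length(adj, source, target):
    """Breadth-first search distance in an unweighted graph, or None if unreachable."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return seen[node]
        for nbr in adj[node]:
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return None


def gravity_score(adj, u, v):
    """mass(u) * mass(v) / distance(u, v)^2 with degree as mass (an assumption)."""
    d = shortest_path_length(adj, u, v)
    if d is None or d == 0:
        return 0.0  # disconnected pair, or u == v
    return len(adj[u]) * len(adj[v]) / d ** 2


adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
print(gravity_score(adj, "a", "d"), gravity_score(adj, "a", "e"))
```

The score rises with the popularity of both endpoints and falls with their distance, which is the coupling of popularity and similarity that the abstract describes.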
Article
Full-text available
Link prediction in dynamic networks aims to predict edges according to historical linkage status. It is inherently difficult because of the linear/non-linear transformation of underlying structures. The problem of efficiently performing dynamic link inference is extremely challenging due to the scale of networks and different evolving patterns. Most previous approaches for link prediction are based on members’ similarity and supervised learning methods. However, research investigating the hidden patterns of dynamic social networks is rarely conducted. In this paper, we propose a novel framework that incorporates a deep learning method, i.e., Temporal Restricted Boltzmann Machine, and a machine learning approach, i.e., Gradient Boosting Decision Tree. The proposed model is capable of modeling each link’s evolving patterns. We also propose a novel transformation for the input matrix, which significantly reduces the computational complexity and makes our algorithm scalable to large networks. Extensive experiments demonstrate that the proposed method outperforms existing state-of-the-art algorithms on real-world dynamic networks.
Article
Full-text available
Traditional methods for link prediction can be categorized into three main types: graph structure feature-based, latent feature-based, and explicit feature-based. Graph structure feature methods leverage some handcrafted node proximity scores, e.g., common neighbors, to estimate the likelihood of links. Latent feature methods rely on factorizing networks' matrix representations to learn an embedding for each node. Explicit feature methods train a machine learning model on two nodes' explicit attributes. Each of the three types of methods has its unique merits. In this paper, we propose SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction), a new framework for link prediction which combines the power of all the three types into a single graph neural network (GNN). GNN is a new type of neural network which directly accepts graphs as input and outputs their labels. In SEAL, the input to the GNN is a local subgraph around each target link. We prove theoretically that our local subgraphs also reserve a great deal of high-order graph structure features related to link existence. Another key feature is that our GNN can naturally incorporate latent features and explicit features. It is achieved by concatenating node embeddings (latent features) and node attributes (explicit features) in the node information matrix for each subgraph, thus combining the three types of features to enhance GNN learning. Through extensive experiments, SEAL shows unprecedentedly strong performance against a wide range of baseline methods, including various link prediction heuristics and network embedding methods.
Article
Full-text available
In recent years, a considerable amount of attention has been devoted to research on the link prediction (LP) problem in complex networks. This problem tries to predict the likelihood that an association between two unconnected nodes in a network will appear in the future. One of the most important approaches to the LP problem is based on supervised machine learning (ML) techniques for classification. Although many works have presented promising results with this approach, choosing the set of features (variables) to train the classifiers is still a major challenge. In this article, we report on the effects of three different automatic variable selection strategies (Forward, Backward and Evolutionary) applied to the feature-based supervised learning approach in LP applications. The results of the experiments show that the use of these strategies does lead to better classification models than classifiers built with the complete set of variables. Such experiments were performed over three datasets (Microsoft Academic Network, Amazon and Flickr) that contained more than twenty different features each, including topological and domain-specific ones. We also describe the specification and implementation of the process used to support the experiments. It combines the use of the feature selection strategies, six different classification algorithms (SVM, K-NN, naïve Bayes, CART, random forest and multilayer perceptron) and three evaluation metrics (Precision, F-Measure and Area Under the Curve). Moreover, this process includes a novel ML voting committee inspired approach that suggests sets of features to represent data in LP applications. It mines the log of the experiments in order to identify sets of features frequently selected to produce classification models with high performance. The experiments showed interesting correlations between frequently selected features and datasets.
Article
Full-text available
Networks have become increasingly important to model complex systems comprised of interacting elements. Network data mining has a large number of applications in many disciplines including protein-protein interaction networks, social networks, transportation networks, and telecommunication networks. Different empirical studies have shown that it is possible to predict new relationships between elements attending to the topology of the network and the properties of its elements. The problem of predicting new relationships in networks is called link prediction. Link prediction aims to infer the behavior of the network link formation process by predicting missed or future relationships based on currently observed connections. It has become an attractive area of study since it allows us to predict how networks will evolve. In this survey we will review the general-purpose techniques at the heart of the link prediction problem, which can be complemented by domain-specific heuristic methods in practice.
Conference Paper
Full-text available
Graph embedding algorithms embed a graph into a vector space where the structure and the inherent properties of the graph are preserved. The existing graph embedding methods cannot preserve the asymmetric transitivity well, which is a critical property of directed graphs. Asymmetric transitivity depicts the correlation among directed edges, that is, if there is a directed path from u to v, then there is likely a directed edge from u to v. Asymmetric transitivity can help in capturing structures of graphs and recovering from partially observed graphs. To tackle this challenge, we propose the idea of preserving asymmetric transitivity by approximating high-order proximity, which is based on asymmetric transitivity. In particular, we develop a novel graph embedding algorithm, High-Order Proximity preserved Embedding (HOPE for short), which is scalable to preserve high-order proximities of large scale graphs and capable of capturing the asymmetric transitivity. More specifically, we first derive a general formulation that covers multiple popular high-order proximity measurements, then propose a scalable embedding algorithm to approximate the high-order proximity measurements based on their general formulation. Moreover, we provide a theoretical upper bound on the RMSE (Root Mean Squared Error) of the approximation. Our empirical experiments on a synthetic dataset and three real-world datasets demonstrate that HOPE can approximate the high-order proximities significantly better than the state-of-the-art algorithms and outperform the state-of-the-art algorithms in tasks of reconstruction, link prediction and vertex recommendation.
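Asymmetric transitivity can be seen directly in a high-order proximity matrix computed on a directed graph. The sketch below uses a truncated Katz index, chosen here only as a familiar example of a high-order proximity; the abstract does not name the measures it covers. It sums decayed counts of directed paths and stops short of the embedding step, which would factorize this matrix. The function name, the decay value, and the truncation length are assumptions.

```python
def katz_proximity(n, directed_edges, beta=0.1, max_len=4):
    """Truncated Katz proximity: S = sum over l = 1..max_len of beta^l * A^l.

    S[u][v] accumulates directed paths from u to v, so S is generally not symmetric.
    """
    A = [[0.0] * n for _ in range(n)]
    for u, v in directed_edges:
        A[u][v] = 1.0

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    S = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for length in range(1, max_len + 1):
        power = matmul(power, A)  # power now holds A^length
        for i in range(n):
            for j in range(n):
                S[i][j] += beta ** length * power[i][j]
    return S


S = katz_proximity(3, [(0, 1), (1, 2)])
print(S[0][2], S[2][0])
```

On the path 0 → 1 → 2, the proximity from node 0 to node 2 is positive while the reverse direction stays at zero, which is the asymmetry an embedding for directed graphs needs to keep.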
Article
Networks are mathematical structures that are universally used to describe a large variety of complex systems, such as social, biological, and technological systems. The prediction of missing links in incomplete complex networks aims to estimate the likelihood of the existence of a link between a pair of nodes. Various topological features of networks have been applied to develop link prediction methods. However, the exploration of features of links is still limited. In this paper, we demonstrate the power of node and link clustering information in predicting top-L missing links. In the existing literature, link prediction algorithms have only been tested on small-scale and middle-scale networks. The network scale factor has not attracted the same level of attention. In our experiments, we test the proposed method on three groups of networks. For small-scale networks, since the structures are not very complex, advanced methods cannot perform significantly better than classical methods. For middle-scale networks, the proposed index, combining both node and link clustering information, starts to demonstrate its advantages. In many networks, combining both node and link clustering information can improve the link prediction accuracy a great deal. Large-scale networks with more than 100 000 links have rarely been tested previously. Our experiments on three large-scale networks show that local clustering information based methods outperform other methods, and link clustering information can further improve the accuracy of node clustering information based methods, in particular for networks with a broad distribution of the link clustering coefficient.
Article
Multiplex networks are very flexible at showing heterogeneous relationships between identical entities. Link prediction is a fundamental problem in network science. There are many studies on link prediction in complex networks, but few studies were conducted on link prediction in multiplex networks. This study proposes a method for estimating link likelihood in multiplex networks based on the Node-Accessibility-Distribution (NAD) and the co-evolving factors of layers. The NAD is introduced as a probabilistic measure to find local and pseudo-global structural features of nodes in layers of the multiplex network. The probabilistic distance among nodes is calculated using Jensen–Shannon divergence. Since the evolution of one layer subsequently affects the dynamics of other layers, this study introduces the co-evolving factors as criteria for determining the effect of the evolution of layers in the formation of new links in the target layer. In order to estimate the co-evolving factors, logistic regression and Maximum Likelihood Estimation (MLE) are employed. The proposed method is evaluated with six real-world datasets. The results show that the proposed approach has a better average AUC and precision than the state-of-the-art methods. Based on various datasets, the AUC and precision were improved by 1% to 5% compared with the state-of-the-art.
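As an illustration of the probabilistic distance mentioned in this abstract, the following is a minimal sketch of the Jensen–Shannon divergence between two discrete distributions. The two example distributions are hypothetical stand-ins for node accessibility distributions; how the NAD itself is built is not described here and is not reproduced.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical distributions of two nodes over four reachable targets.
p = [0.5, 0.5, 0.0, 0.0]
q = [0.0, 0.0, 0.5, 0.5]
r = [0.4, 0.4, 0.1, 0.1]

print(js_divergence(p, p))  # identical distributions
print(js_divergence(p, q))  # disjoint supports
print(js_divergence(p, r))  # partially overlapping
```

With base-2 logarithms the divergence is symmetric and bounded between 0 and 1, which makes it convenient as a distance-like score between nodes.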
Article
Learning good quality neural graph embeddings has long been achieved by minimizing the pointwise mutual information (PMI) for co-occurring nodes in simulated random walks. This design choice has been mostly popularized by the direct application of the highly-successful word embedding algorithm word2vec to predicting the formation of new links in social, co-citation, and biological networks. However, such a skeuomorphic design of graph embedding methods entails a truncation of information coming from pairs of nodes with low PMI. To circumvent this issue, we propose an improved approach to learning low-rank factorization embeddings that incorporate information from such unlikely pairs of nodes and show that it can improve the link prediction performance of baseline methods from 1.2% to 24.2%. Based on our results and observations, we outline further steps that could improve the design of next graph embedding algorithms that are based on matrix factorization.
Article
Community detection is an important research field of complex network analysis and focuses on the study of networks’ aggregation behaviours. Different from traditional methods that only consider network structural topology, many efforts have been put into combining network structural topology with node content attributes to achieve better community detection performance. However, it is critical to make an appropriate trade-off between structural topology and node content. In this paper, we propose an adaptive trade-off approach, called ANMF, which not only considers both structural topology and node content, but also provides a flexible parameter to balance their contribution. Compared with other related approaches, ANMF is a kind of non-negative matrix factorization (NMF)-based community detection method, but it imposes more constraints on the network reconstruction. More precisely, ANMF simultaneously employs a decoder that reconstructs a network from its community membership space and an encoder that transforms the network into the community membership space. Moreover, compared with the most related state-of-the-art effort adaptive semantic community detection (ASCD), which considers the topology part always has more contribution if there is a mismatch, ANMF considers the mismatch in two different situations, i.e., the topology part contributes more than the node content part and the node content part contributes more than the topology part. Based on the intensive evaluation on both real and artificial networks, ANMF provided 4.95%∼126.41% higher normalized mutual information (NMI) values than the models without considering node content information on 13 out of 14 experimental networks. ANMF also presented 7.38%∼201.01% higher NMI values than ASCD on 13 out of 14 experimental networks. In addition, ANMF showed good convergence performance, and it could converge after 100 iterations on all of the networks. 
ANMF also provided stability comparable to that of similar methods in terms of the average NMI standard deviation, which is 0.03 on all of the networks.
Article
As a classical problem in the field of complex networks, link prediction has attracted much attention from researchers, and it is of great significance in helping us understand the evolution and dynamic development mechanisms of networks. Although various network type-specific algorithms have been proposed to tackle the link prediction problem, most of them suppose that the network structure is dominated by the Triadic Closure Principle. We still lack an adaptive and comprehensive understanding of network formation patterns for predicting potential links. In addition, it is valuable to investigate how network local information can be better utilized. To this end, we proposed a novel method named Link prediction using Multiple Order Local Information (MOLI) that exploits the local information from the neighbors of different distances, with parameters that can be prior-driven, based on prior knowledge, or data-driven, obtained by solving an optimization problem on observed networks. MOLI defined a local network diffusion process via random walks on the graph, resulting in better use of network information. We show that MOLI outperforms the other 12 widely used link prediction algorithms on 15 different types of simulated and real-world networks. We also conclude that there are different patterns of local information utilization for different networks, including social networks, communication networks, biological networks, etc. In particular, the classical common neighbor-based algorithm is not as adaptable to all social networks as it is perceived to be; instead, some of the social networks obey the Quadrilateral Closure Principle which preferentially connects paths of length three.
Article
A network is a form of data representation and is widely used in many fields. For example, in social networks, we regard nodes as individuals or groups, and the edges between nodes are called links, that is, the interaction between people. By analyzing the interaction of nodes, we can learn more about network relationships. The core idea of link prediction is to predict whether there is a new relationship between a pair of nodes or to discover hidden links in the network. Link prediction has been applied to many fields such as social networking, e-commerce, bioinformatics, and so on. In addition, many studies have used graph embedding for link prediction, which effectively preserves the network structure and converts node information into a low-dimensional vector space. In this research, we used three graph embedding approaches: matrix decomposition-based methods, random walk-based methods, and deep learning-based methods. Since each method has its own advantages and disadvantages, we propose an ensemble model to combine these graph embeddings into a new representation of each node. Then, we designed a two-stage link prediction model based on a multi-classifier ensemble and took the new node representation as its input. Performance evaluation was conducted on multiple data sets. Experimental results show that the integration of multiple embedding methods and multiple classifiers can significantly improve the performance of link prediction.
Article
Quantifying and predicting the long-term impact of both scientific papers and individual authors have important implications for many academic policy decisions, from identifying emerging trends to assessing the merits of proposals for potential funding. This paper presents SI-HDGNN, a novel heterogeneous dynamical graph neural network that explicitly models a heterogeneous, weighted, directed and attributed academic graph, enabling a prediction of the cumulative scientific impact of papers and authors by a specifically designed aggregation method. Unlike the existing feature-based or homogeneous approaches, SI-HDGNN addresses the problem by capturing the temporal-structural characteristics of the papers and authors as well as their complex interactions and long-term dependencies. Extensive experiments conducted on three large-scale multidisciplinary academic datasets demonstrate its superior performance in predicting the long-term scientific impact of both scientific papers and authors.
Article
The rapid advance of online social networks and the tremendous growth in the number of participants and attention have led to information overload and increased the difficulty of making accurate recommendations of new friends. Existing recommendation methods based on semantic similarity, social graphs, or collaborative filtering are unsuitable for very large social networks because of their high computational cost or low effectiveness. We present an approach entitled Hybrid Recommendation Through Community Detection (HRTCD) for friend prediction with linear runtime complexity that makes full use of the characteristics of social media based on hybrid information fusion. It extracts the content topics of microblog for each participant along with the appraisal of domain-dependent user impact, builds a small-size heterogeneous network for each target user by fusing the interest similarity and social interaction between individuals, discovers all of the implicit clusters of target user via a community detection algorithm, and establishes the recommendation set consisting of a fixed number of potential friends. Experimental results on both the synthetic and real-world social networks demonstrate that our scheme provides a higher prediction rating and significantly improves the recommendation accuracy and offers much faster performance.
Article
Link prediction aims to predict missing links, eliminate spurious links, and predict new links in a future network from known network structure information. Most existing link prediction methods are shallow models and do not consider network noise. To address this issue, in this paper, we propose a novel link prediction model based on deep non-negative matrix factorization, which elegantly fuses topology and sparsity constraints to perform link prediction tasks. Specifically, our model fully exploits the observed link information for each hidden layer by deep non-negative matrix factorization. Then, we utilize the common neighbor method to calculate the similarity scores and map them to a multi-layer low-dimensional latent space to obtain the topological information of each hidden layer. Simultaneously, we employ the ℓ2,1-norm constrained factor matrix at each hidden layer to remove the random noise. Besides, we provide effective multiplicative updating rules to learn the parameters of this model with a convergence guarantee. Extensive experimental results on eight real-world datasets demonstrate that our proposed model significantly outperforms the state-of-the-art methods.
Article
Link prediction refers to predicting the connection probability between two nodes in terms of existing observable network information, such as network structural topology and node properties. Although traditional similarity-based methods are simple and efficient, their generalization performance varies widely in different networks. In this paper, we propose a novel link prediction approach ICP based on inductive matrix completion, which recovers the node connection probability matrix by applying node features to a low-rank matrix. The approach first explores a comprehensive node feature representation by combining different structural topology information with node importance properties via feature construction and selection. The selected node features are then used as the input of a supervised learning task for solving the low-rank matrix. The node connection probability matrix is finally recovered by a bi-linear function, which predicts the connection probability between two nodes with their features and the low-rank matrix. In order to demonstrate the superiority of ICP, we took eleven related efforts including two recent methods proposed in 2020 as baseline methods, and it is shown that ICP has stable performance and good universality in twelve different real networks. Compared with the baseline methods, the improvements of ICP in terms of the average AUC results range from 3.81% ∼ 12.77% and its AUC performance is improved by 0.08% ∼ 3.54% compared with the best baseline method. The limitation of ICP lies in its high computational complexity due to the feature construction, but the complexity can be reduced by replacing complex features with node semantic attributes if there are additional data available. Moreover, it provides a potential link prediction solution for large-scale networks, since inductive matrix completion is a supervised learning task, in which the underlying low-rank matrix can be solved by representative nodes instead of all their nodes.
Article
Link prediction is a technique to forecast future new or missing relationships between nodes based on the current network information. Although link prediction in monoplex networks has a long history, attempts to accomplish the same task on multiplex networks are not abundant, and it has often been a challenge to apply conventional similarity methods to multiplex networks. Link prediction in multiplex networks is the problem of predicting the links in one layer while taking structural information of other layers into account. One of the most important methods of link prediction in a monoplex network is the local random walk (LRW), which captures the network structure using pure random walking to measure the similarity of nodes in the graph and find unknown connections. The goal of this paper is to propose an extended version of local random walk based on pure random walking for solving link prediction in the multiplex network, referred to as the Multiplex Local Random Walk (MLRW). We explore approaches for leveraging information mined from inter-layer and intra-layer in a multiplex network to define a biased random walk for finding the probability of the appearance of a new link in one target layer. Experimental studies on seven real-world multiplex networks demonstrate that a multiplex biased local random walk performs better than the state-of-the-art methods of link prediction and the corresponding unbiased case and improves prediction accuracy.
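For orientation, here is a minimal sketch of the single-layer (monoplex) local random walk index that this abstract builds on, assuming its common form: after t steps, the score of a pair (x, y) is (k_x · π_xy(t) + k_y · π_yx(t)) / 2|E|, where π_xy(t) is the probability that a walker started at x is at y. The function name `lrw_scores` and the toy path graph are hypothetical, and the multiplex, biased extension (MLRW) is not reproduced.

```python
def lrw_scores(neighbors, steps=3):
    """Local random walk similarity for every unordered node pair of an undirected graph."""
    nodes = sorted(neighbors)
    total_degree = sum(len(neighbors[v]) for v in nodes)  # equals 2|E|

    def walk(start):
        # Distribution of an unbiased walker after `steps` steps, started at `start`.
        dist = {v: 0.0 for v in nodes}
        dist[start] = 1.0
        for _ in range(steps):
            nxt = {v: 0.0 for v in nodes}
            for v, mass in dist.items():
                for u in neighbors[v]:
                    nxt[u] += mass / len(neighbors[v])
            dist = nxt
        return dist

    pi = {v: walk(v) for v in nodes}
    return {
        (x, y): (len(neighbors[x]) * pi[x][y] + len(neighbors[y]) * pi[y][x]) / total_degree
        for i, x in enumerate(nodes)
        for y in nodes[i + 1:]
    }

# Toy path graph 0 - 1 - 2 - 3 (hypothetical data).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
scores = lrw_scores(path, steps=2)
print(scores[(0, 2)], scores[(0, 3)])
```

With two steps, the walker can reach node 2 from node 0 but not node 3, so the pair (0, 2) gets a positive score while (0, 3) stays at zero; a larger `steps` value lets more distant pairs be ranked.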
Article
Link prediction aims to predict the missing interactions in evolving networks that may appear in the future. It has practical importance in various real-world applications, ranging from friendship recommendation, knowledge graph completion, target advertising, and protein-protein interaction prediction. Most of the recent efforts focus on the structure of the network while ignoring many other essential factors. In this paper, we present a modified Latent Dirichlet Allocation (LDA), and Hidden Naive Bayesian (HNB) based link prediction technique named PILHNB model for link prediction in dynamic social networks by considering behavioral controlling elements like relationship network structure, nodes’ attributes, location-based information of nodes, nodes’ popularity, users’ interests, and learning the evolution pattern of these factors in the networks. Experimental results on six real-world networks demonstrate our proposed models’ effectiveness and efficiency compared with existing state-of-the-art link prediction techniques.
Article
In the field of ecommerce, most recommendation algorithms are based on user-item bipartite graph network (BGN). But this kind of recommendation algorithm is severely lacking in accuracy and diversity. In this paper, a novel ecommerce recommendation algorithm is proposed based on BGN link prediction. Firstly, all the user-item data were imported into distance formula to calculate the similarity between the attributes. Then, the BGN was projected into a single-mode network (SMN), making it more efficient to extract potential links from the BGN. On this basis, the potential links were predicted based on similarity. Through experiments on real ecommerce datasets, it was proved that our algorithm has a higher accuracy and coverage than typical recommendation algorithms.
Article
Link prediction is an important problem in topics of complex networks, which can be applied to many practical scenarios such as information retrieval and marketing analysis. Strategies based on random walk are commonly used to address this problem. In common practice of a random walk, a link predictor may move from one node to one of its neighbors with uniform transferring probability regardless of the characteristics of the local structure around that node, which, however, may contain useful information for a successful prediction. In this paper, we propose a refined random walk approach which incorporates graph embedding method. This approach may provide biased transferring probabilities to perform random walk so as to further exploit topological properties embedded in the network structure. The performance of proposed method is examined by comparing with other commonly used indexes. Results show that our method outperforms all these indexes reflected by better prediction accuracy.
Article
Link prediction is one of the most important and challenging tasks in complex network analysis, which aims to predict missing link based on existing ones in a network. This problem is of both theoretical interest and has applications in diverse scientific disciplines, including social network analysis, recommendation systems, and biological networks. In this paper we propose a novel link prediction method that aims at improving the accuracy of existing path-based methods by incorporating information about the nodes along local paths. We investigate the proposed framework empirically and conduct extensive experiments on real-world datasets obtained from diverse domains. Results show that the proposed method has achieved increased prediction accuracy when compared to existing state-of-the-art link prediction methods.
Article
Discovering knowledge combination has been considered an effective strategy for knowledge retrieval and knowledge discovery. Generally, knowledge combination driven by close cooperation can be achieved via modeling the process of knowledge transfer. However, existing studies have seldom built connections between knowledge transfer and the identification of knowledge combination; in particular, existing knowledge transfer models pay little attention to the effects of trust and knowledge similarity. Therefore, the research motivations of this paper are to model the process of knowledge transfer and to further discover knowledge combinations. To minimize the risks of knowledge transfer, both the knowledge similarity and the trust embodied within need to be taken into account, and a bi-layered network regarding knowledge similarity and trust is therefore proposed. First, a trust network is obtained through the proposed method of trust link prediction. Accordingly, a directed knowledge flow network is constructed through a proposed knowledge transfer model endowed with trust scores. Second, knowledge combinations in a knowledge flow network are acquired by adopting a community detection method. Third, various probabilities of knowledge combinations based on the maximum network modularity are calculated with respect to the influence of knowledge similarity on cooperation probability. The key contribution of this paper is an effective approach to identifying knowledge combinations that improves the efficiency of knowledge management. Related experiments and comparisons are presented to illustrate the practicality of the proposed method.
Article
Link prediction finds missing links (in static networks) or predicts the likelihood of future links (in dynamic networks). The latter definition is useful in network evolution (Wang et al., 2011; Barabasi and Albert, 1999; Kleinberg, 2000; Leskovec et al., 2005; Zhang et al., 2015). Link prediction is a fast-growing research area in both the physics and computer science domains. There exists a wide range of link prediction techniques like similarity-based indices, probabilistic methods, dimensionality reduction approaches, etc., which are extensively explored in different groups of this article. Learning-based methods are covered in addition to clustering-based and information-theoretic models in a separate group. The experimental results of similarity and some other representative approaches are tabulated and discussed. To make it general, this review also covers link prediction in different types of networks, for example, directed, temporal, bipartite, and heterogeneous networks. Finally, we discuss several applications with some recent developments and conclude our work with some future directions.
Article
In social network analysis, link prediction is a fundamental tool to determine new relationships among users which are most likely to occur in the future. Link prediction by means of a similarity metric, in which a pair of similar nodes is likely to be connected, is common. In this paper, we propose a similarity-based link prediction algorithm, referred to as CNDP, whose similarity score is determined according to the structure and specific characteristics of the network, as well as the topological characteristics. In the proposed method, a new metric for link prediction is introduced, considering clustering coefficient as a structural property of the network. Moreover, the presented method considers the neighbors of shared neighbors in addition to the shared neighbors of each pair of nodes, which leads to better performance than other similar link prediction methods. The empirical results of evaluation on synthetic and real-world networks demonstrate that the proposed algorithm achieves higher prediction accuracy with lower complexity, and performs better than other algorithms.
Article
The aim of link prediction is to disclose the underlying evolution mechanism of networks, which could be utilized to predict missing links or eliminate spurious links. However, real-world network data usually encounter challenges, such as missing links, spurious links and random noise, which seriously hamper the prediction accuracy of existing link prediction methods. Therefore, in this paper, we propose a novel Robust Non-negative Matrix Factorization via jointly Manifold regularization and Sparse learning (MS-RNMF) method for link prediction that addresses these problems. Compared to existing methods, MS-RNMF has three advantages. First, MS-RNMF employs manifold regularization and the k-medoids algorithm jointly to preserve the local and global topology information of the network. Second, MS-RNMF adopts the ℓ2,1-norm to constrain the loss function and regularization term, so that random noise and spurious links can be effectively removed. Finally, we employ multiplicative updating rules to learn the model parameters and prove the convergence of the algorithm. Extensive experimental results on eleven real-world networks demonstrate that MS-RNMF outperforms the state-of-the-art methods in predicting missing links, identifying spurious links and eliminating random noise.
Article
Link prediction, which aims to infer the missing or future connections between two nodes, is a key step in many complex network analysis areas such as social friend recommendation and protein function prediction. A majority of existing efforts are devoted to defining the influence of neighbor nodes. However, even though recent studies show that node attributes add value to network structure for accurate link prediction, the real node influence is still ignored. To address this problem, in this paper we investigate an influential node identification technique to formulate a node ranking-based link prediction metric. The general idea of our approach is to exploit the ranking score as the contribution of a common neighbor. This fundamental mechanism preserves both local structure and global information. Experimental results on real-world networks in two scenarios demonstrate that our proposed metrics achieve better performance than existing state-of-the-art local and global similarity methods.
Article
Networks have become increasingly important to model many complex systems. This powerful representation has been employed in different tasks of artificial intelligence including machine learning, expert and intelligent systems. Link prediction, a branch of network pattern recognition, is the most fundamental and essential problem for complex network analysis. However, most existing link-prediction methods only consider a network's topology structures, and in doing so, these methods miss the opportunity to use nodes’ attribute information. We present a combined approach here that uses nodes’ attribute information and topology structure to direct link prediction. First, we propose a discriminative feature combinations selection method. Specifically, we present a novel mathematics inference to detail discriminative feature combinations. Second, based on the selected feature combinations, we aggregate the network, and further compute each feature combination's contributing degree to the link's formation, called the strength of feature combination. Third, we apply discriminative feature combinations into a local random walk model; in particular, we compute and redistribute the random walk particle's transfer probability in terms of each feature combination's strength, which makes the transfer probability depend on feature combinations satisfied by each node's edges. Finally, we predict links in complex networks based on the improved random walk model. Experimental results on real-life complex network datasets demonstrate that, compared to other baseline methods, using discriminative feature combinations and topology structures in tandem strengthens prediction performance remarkably.
Article
The associations between genetic and environmental factors (EFs) are significant to understand the development and progression of many complex human diseases. There have been many research studies concerning genetic factors (protein-coding genes, microRNAs) and EFs but limited research addressing the associations between long noncoding RNAs (lncRNAs) and EFs. LncRNAs of more than 200 nucleotides are an important class of non-coding transcripts and are effective in the organization of gene expressions and, therefore, on the formation of diseases. Environmental factors can alter the expression patterns of some lncRNAs, so a thorough understanding of the associations between lncRNAs and environmental factors will contribute to the understanding of the mechanisms of many complex diseases at the molecular level. In this study, we have developed a model based on the KATZ measure to find potential new associations between lncRNAs and EFs by using the DLREFD database, which contains proven associations between lncRNAs and EFs. The KATZ measure and Gaussian interaction profile kernel similarity were used to predict new potential associations between lncRNAs and EFs. The AUC results obtained by global leave-one-out cross-validation and 2-fold and 5-fold cross-validations were 0.855, 0.827, 0.838, respectively. These results show that our model can predict new potential associations between lncRNAs and EFs with high reliability. Also, the results obtained in case studies demonstrate the effectiveness of our model.
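As a rough illustration of the Katz measure named in this abstract, the sketch below scores node pairs by summing walks of increasing length, each damped by a factor beta per step. It is a truncated version on a plain undirected toy graph; the function name `katz_scores`, the values `beta=0.1` and `max_len=4`, and the example graph are assumptions, and the lncRNA-EF network construction and the Gaussian interaction profile kernel are not reproduced.

```python
def mat_mul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def katz_scores(adj, beta=0.1, max_len=4):
    """Truncated Katz index: sum over l = 1..max_len of beta**l * A**l."""
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # current power A**l, starting at l = 1
    weight = beta                    # current damping beta**l
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = mat_mul(power, adj)
        weight *= beta
    return scores

# Toy path graph 0 - 1 - 2 - 3 (hypothetical data).
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
katz = katz_scores(adj)
print(katz[0][2], katz[0][3])
```

Pairs joined by many short walks receive higher scores than pairs joined only by long walks, so node 0 scores higher with node 2 than with node 3. For the untruncated series to converge, beta must be smaller than the reciprocal of the largest eigenvalue of the adjacency matrix.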
Article
Among the numerous link prediction algorithms in complex networks, similarity-based algorithms play an important role due to promising accuracy and low computational complexity. Apart from the classical CN-based indexes, several interdisciplinary methods provide new ideas to this problem and achieve improvements in some aspects. In this article, we propose a new model from the perspective of an intermediary process and introduce indexes under the framework, which show better performance for precision. Combined with k-shell decomposition, our deeper analysis gives a reasonable explanation and presents an insight on classical and proposed algorithms, which can further contribute to the understanding of link prediction problem.
Article
In social network analysis, community detection is a basic step to understand the structure and function of networks. Some conventional community detection methods may have limited performance because they merely focus on the networks’ topological structure. Besides topology, content information is another significant aspect of social networks. Although some state-of-the-art methods started to combine these two aspects of information for the sake of the improvement of community partitioning, they often assume that topology and content carry similar information. In fact, for some examples of social networks, the hidden characteristics of content may unexpectedly mismatch with topology. To better cope with such situations, we introduce a novel community detection method under the framework of non-negative matrix factorization (NMF). Our proposed method integrates topology as well as content of networks and has an adaptive parameter (with two variations) to effectively control the contribution of content with respect to the identified mismatch degree. Based on the disjoint community partition result, we also introduce an additional overlapping community discovery algorithm, so that our new method can meet the application requirements of both disjoint and overlapping community detection. The case study using real social networks shows that our new method can simultaneously obtain the community structures and their corresponding semantic description, which is helpful to understand the semantics of communities. Related performance evaluations on both artificial and real networks further indicate that our method outperforms some state-of-the-art methods while exhibiting more robust behavior when the mismatch between topology and content is observed.
Article
Real-world networks feature weights of interactions, where link weights often represent some physical attributes. In many situations, to recover missing data or predict the network evolution, we need to predict link weights in a network. In this paper, we first propose a series of new centrality indices for links in the line graph. Then, utilizing these line graph indices, as well as a number of original graph indices, we design three supervised learning methods to realize link weight prediction in both single-layer and multi-layer networks, which perform much better than several recently proposed baseline methods. We find that the resource allocation index (RA) plays a more important role in weight prediction than other topological properties, and that the line graph indices are at least as important as the original graph indices in link weight prediction. In particular, the successful application of our methods to the Yelp layered network suggests that we can indeed predict the offline co-foraging behaviors of users based solely on their online social interactions, which may open a new direction for link weight prediction algorithms and also provide insights for designing better restaurant recommendation systems.
Article
Link prediction aims to extract missing information and to identify spurious interactions and potential information in complex networks. Similarity-based methods, maximum likelihood methods, and probabilistic models are the mainstream classes of algorithms for link prediction. Meanwhile, low-rank matrix approximation has been widely used in network analysis, and it can extract more useful features hidden in the original data through some kernel-induced nonlinear mapping. In this paper, based on non-negative matrix factorization (NMF), we propose a kernel framework for link prediction and network reconstruction that uses different kernels to capture both global and local information of the network through kernel mapping. In detail, we map the adjacency matrix of the network to another feature space by two kernel functions, the Linear Kernel and the Covariance Kernel, which have principled interpretations for network analysis and link prediction. We test the AUC and Precision of widely used methods on a series of real-world networks with different proportions of the training set; experimental results show that our proposed framework is more robust and accurate than state-of-the-art methods. Remarkably, our approach also has the potential to address the problem of link prediction using a small fraction of the training set.
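As a rough illustration of the pipeline this abstract describes (kernel mapping of the adjacency matrix, NMF, then reconstruction), the sketch below scores unobserved pairs on a toy graph. It is not the authors' method: the linear kernel is assumed here to be K = A·Aᵀ, the factorization uses the standard Lee-Seung multiplicative updates for the Frobenius objective, and the graph, the rank and every function name are hypothetical.

```python
import random

def matmul(X, Y):
    """Plain-Python matrix product, adequate for a toy example."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def frob_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    R = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, R) for v, r in zip(rv, rr))

def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H

# Hypothetical toy network.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
idx = {v: i for i, v in enumerate(nodes)}
n = len(nodes)
A = [[0.0] * n for _ in range(n)]
for u, v in edges:
    A[idx[u]][idx[v]] = A[idx[v]][idx[u]] = 1.0

K = matmul(A, transpose(A))   # assumed form of the linear kernel
W, H = nmf(K, rank=2)
S = matmul(W, H)              # reconstructed matrix used as link scores

# Rank the unobserved pairs by their reconstructed score.
candidates = sorted(
    ((S[i][j], nodes[i], nodes[j])
     for i in range(n) for j in range(i + 1, n) if A[i][j] == 0),
    reverse=True,
)
```

The rank and the number of iterations are arbitrary choices for a five-node graph; on a real network both would need tuning, and a NumPy or sparse-matrix implementation would be the practical option.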
Article
Recently, a number of similarity-based methods have been proposed to predict the missing links in complex networks. Among these indices, the resource allocation index performs very well with lower time complexity. However, it ignores potential resources transferred by local paths between two endpoints. Motivated by the resource exchange taking place between endpoints, an extended resource allocation index is proposed. An empirical study on twelve real networks and three synthetic dynamic networks shows that the proposed index performs well compared with eight mainstream baselines.
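For orientation, the classical resource allocation (RA) index that this abstract extends scores a node pair by summing 1/degree over their common neighbours. The sketch below implements only that classical index, not the extended variant, whose definition the abstract does not give; the toy graph and the function names are hypothetical.

```python
from itertools import combinations

def resource_allocation(adj, x, y):
    """Classical RA score: sum of 1/degree(z) over common neighbours z of x and y."""
    common = adj[x] & adj[y]
    return sum(1.0 / len(adj[z]) for z in common)

def rank_missing_links(adj):
    """Score every unconnected pair and return the pairs sorted by descending RA."""
    scores = {}
    for x, y in combinations(sorted(adj), 2):
        if y not in adj[x]:
            scores[(x, y)] = resource_allocation(adj, x, y)
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Hypothetical toy network stored as an adjacency dict of sets.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

ranking = rank_missing_links(adj)
```

In this toy graph the pair (a, d) ranks first: its two common neighbours, b and c, each have degree 3, giving a score of 2/3.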
Conference Paper
Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node's network neighborhood and design a biased random walk procedure, which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations. We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way for efficiently learning state-of-the-art task-independent representations in complex networks.
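The biased random walk mentioned in this abstract can be sketched as a second-order walk whose next step depends on the previous node. The return parameter p and in-out parameter q used below come from the node2vec paper and are not named in the abstract; the toy graph, the function name and the default values are hypothetical, and the sketch covers only the walk generation, not the embedding training that follows it.

```python
import random

def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Generate one second-order biased walk of at most `length` nodes."""
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break  # dead end: stop early
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)   # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)       # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)   # move outward to distance 2
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Hypothetical toy network stored as an adjacency dict of sets.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

walk = biased_walk(adj, "a", length=10, p=0.5, q=2.0)
```

With q above 1 the walk tends to stay near its starting region, while q below 1 pushes it outward; a small p makes immediate backtracking more likely. In the full method, many such walks per node would be fed to a skip-gram style model to learn the embeddings.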
Article
We present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph. We develop both transductive and inductive variants of our method. In the transductive variant of our method, the class labels are determined by both the learned embeddings and input feature vectors, while in the inductive variant, the embeddings are defined as a parametric function of the feature vectors, so predictions can be made on instances not seen during training. On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many of the existing models.