Figure 8 - uploaded by Stephen C. North
Plot showing size of proximity graphs and their relation to captured proximity. The top plot shows the distribution of graph sizes for our sample of 2000 pairs. The bottom figure plots graph size against the percent captured proximity, with a smoothing spline plotted through the data.
Source publication
Measuring distance or some other form of proximity between objects is a standard data mining tool. Connection subgraphs were recently proposed as a way to demonstrate proximity between nodes in networks. We propose a new way of measuring and extracting proximity in networks called "cycle-free effective conductance" (CFEC). Importantly, the measure...
Context in source publication
Context 1
... is clear that α = 10 is not sufficient to provide captured proximity of 95% or more, probably because there are simply so many paths between nodes due to the density of the graph. In Figure 8, we explore how large the subgraphs need to be to capture a meaningful percentage of proximity. Looking at the resulting subgraphs for 2000 pairs using α = 10, we plot graph size against the percent overall captured proximity in the bottom figure. ...
Similar publications
Nowadays, with the growth of the Internet as a principal means of communication, through social media channels (Twitter, Facebook, etc.) and access to huge amounts of information such as news, a main research subject has emerged: helping users find their interests among vast amounts of relevant and irrelevant information. Recommender...
Conversational recommender systems (CRSs) have garnered increasing attention for their ability to provide personalized recommendations through natural language interactions. Although large language models (LLMs) have shown potential in recommendation systems owing to their superior language understanding and reasoning capabilities, extracting and u...
Matrix Factorization (MF) based approaches have proven to be efficient for rating-based recommendation systems. In this work, we propose several matrix factorization approaches with improved prediction accuracy. We introduce a novel and fast (semi)-positive MF approach that approximates the features by using positive values for either users or it...
Recommendation plays an essential part in our lives. Precise recommendations help users swiftly locate desirable items without being inundated by irrelevant information. In the last few years, the number of customers, products, and online information has risen rapidly, resulting in a huge data analysis problem for recomm...
This paper proposes a method to construct an evaluation dataset from microblogs for the development of recommendation systems. We extract the relationships among three main entities in a recommendation event, i.e., who recommends what to whom. User-to-user friend relationships and user-to-resource interesting relationships in social media and resou...
Citations
... In this paper, we propose a link-based picture semantic similarity search method, namely PictureSim, for effectively searching similar pictures by building a picture-tag network. We first build a picture-tag network based on "description" relationships between pictures and tags, and then exploit the object-to-object relationships [36,37] in picture-tag network. The intuition behind PictureSim is that "similar pictures contain similar tags, and similar tags describe similar pictures", which is consistent with the intuition of SimRank. ...
Searching for pictures similar to a given picture is an important task in numerous applications, including image recommendation systems, image classification, and image retrieval. Previous studies mainly focused on content similarity, which is measured from visual features such as color and shape, and few of them pay enough attention to semantics. In this paper, we propose a link-based semantic similarity search method, namely PictureSim, for effectively searching similar pictures by building a picture-tag network. The picture-tag network is built from “description” relationships between pictures and tags, in which tags and pictures are treated as nodes, and relationships between pictures and tags are regarded as edges. We then design a TF-IDF-based model to remove the noisy links, so that fewer links need to be traversed. We observe that “similar pictures contain similar tags, and similar tags describe similar pictures”, which is consistent with the intuition of SimRank. Consequently, we utilize the SimRank algorithm to compute the similarity scores between pictures. Compared with content-based methods, PictureSim can effectively search for semantically similar pictures. Extensive experiments on real datasets demonstrate the effectiveness and efficiency of PictureSim.
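The SimRank intuition quoted above can be illustrated with a minimal sketch. This is plain iterative SimRank on an undirected picture-tag graph, not the PictureSim implementation: the TF-IDF link filtering is omitted, and the function name `simrank`, the `decay` and `iterations` parameters, and the adjacency-dictionary layout are assumptions made for illustration.

```python
def simrank(neighbors, decay=0.8, iterations=5):
    """Plain iterative SimRank on an undirected graph.

    neighbors: dict mapping each node to a list of adjacent nodes
               (pictures link to tags, tags link to pictures).
    Returns a nested dict sim[a][b] of similarity scores.
    """
    nodes = list(neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                elif not neighbors[a] or not neighbors[b]:
                    new[a][b] = 0.0
                else:
                    # Average similarity over all neighbor pairs, damped by decay.
                    total = sum(sim[x][y] for x in neighbors[a] for y in neighbors[b])
                    new[a][b] = decay * total / (len(neighbors[a]) * len(neighbors[b]))
        sim = new
    return sim


# Hypothetical toy network: p1 and p2 share the tag "cat"; p3 only has "car".
graph = {
    "p1": ["cat"], "p2": ["cat"], "p3": ["car"],
    "cat": ["p1", "p2"], "car": ["p3"],
}
scores = simrank(graph)
print(scores["p1"]["p2"], scores["p1"]["p3"])
```

In this toy network the two pictures that share a tag receive a positive score, while pictures with no tag in common stay at zero. The all-pairs loop is quadratic in the number of nodes, so it is only suitable for small examples.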
... We call the task of searching for a desired semantic class of proximity w.r.t. a query node as semantic proximity search [4]. It is a new problem in the sense that previous studies on proximity search [2], [3], [5], [6], [7] neither intend to explicitly differentiate the semantic classes, nor can effectively accomplish so. Beyond proximity search, the closest problems to ours are social circle learning [8] and relationship profiling [9] on graphs. ...
... Proximity search. Most earlier research [2], [3], [5], [6] only measures a "generic" form of proximity on graphs. Different senses of proximity have also emerged, such as hub and authority [32], probabilistic precision and recall [33], [34], as well as importance and specificity [35], [36]. ...
Data in the form of graphs are prevalent, ranging from biological and social networks to citation graphs and the Web. In particular, most real-world graphs are heterogeneous, containing objects of multiple types, which present new opportunities for many problems on graphs. Consider a typical proximity search problem on graphs, which boils down to measuring the proximity between two given nodes. Most earlier studies on homogeneous or bipartite graphs only measure a generic form of proximity, without accounting for different "semantic classes"; for instance, on a social network two users can be close for different reasons, such as being classmates or family members, which represent two distinct semantic classes. Learning these semantic classes is made possible on heterogeneous graphs through the concept of metagraphs. In this study, we identify metagraphs as a novel and effective means to characterize the common structures for a desired class of proximity. Subsequently, we propose a family of metagraph-based proximity, and employ a learning-to-rank technique that automatically learns the right parameters to suit the desired semantic class. In terms of efficiency, we develop a symmetry-based matching algorithm to speed up the computation of metagraph instances. Empirically, extensive experiments reveal that our metagraph-based proximity substantially outperforms the best competitor by more than 10%, and our matching algorithm can reduce matching time by more than half. As a further generalization, we aim to derive a general node and edge representation for heterogeneous graphs, in order to support arbitrary machine learning tasks beyond proximity search. In particular, we propose the finer-grained anchored metagraph, which is capable of discriminating the roles of nodes within the same metagraph. Finally, further experiments on the general representation show that we can outperform the state of the art significantly and consistently across various machine learning tasks.
... Centerpiece subgraphs and community search. Perhaps closer to our approach is work related to the centerpiece subgraphs and the community-search problem [16,22,23]. In this class of problems, a set of source vertices S is given and the goal is to find a subgraph so that S belongs in the subgraph and the subgraph forms a tight community. ...
... In this class of problems, a set of source vertices S is given and the goal is to find a subgraph so that S belongs in the subgraph and the subgraph forms a tight community. The quality of the subgraph is measured with various objective functions, such as degree [22], conductance [16], or random-walk-based measures [23]. The difference of these methods with the one presented here is that these methods return only one community, while in this paper we deal with the problem of finding a sequence of nested communities. ...
Finding communities in graphs is one of the most well-studied problems in data mining and social-network analysis. In many real applications, the underlying graph does not have a clear community structure. In those cases, selecting a single community turns out to be a fairly ill-posed problem, as the optimization criterion has to make a difficult choice between selecting a tight but small community or a more inclusive but sparser community. In order to avoid the problem of selecting only a single community we propose discovering a sequence of nested communities. More formally, given a graph and a starting set, our goal is to discover a sequence of communities all containing the starting set, and each community forming a denser subgraph than the next. Discovering an optimal sequence of communities is a complex optimization problem, and hence we divide it into two subproblems: 1) discover the optimal sequence for a fixed order of graph vertices, a subproblem that we can solve efficiently, and 2) find a good order. We employ a simple heuristic for discovering an order and we provide empirical and theoretical evidence that our order is good.
... • Cycle-Free Effective Conductance (CFEC) [32]: The cycle-free effective conductance (CFEC) proposed in [32] measures the proximity between two nodes in a network. The CFEC between the nodes s and t is formulated as ...
... The cycle-free effective conductance (CFEC) proposed measures the proximity between two nodes in a network [32]. The CFEC between the nodes s and t is formulated as ...
In recent years, link prediction has been applied to a wide range of real-world applications which often generate massive dynamic networks that require an effective real-time approach to predicting the formation of future links. Traditionally, link prediction approaches utilize a single snapshot of a network to predict future links. However, real-world network data often evolves dynamically at a rapid pace by adding and removing links. Therefore, there is a need for a dynamic and online link prediction framework. This dissertation focuses on challenges and solutions with the aim of advancing a link prediction framework for use in real-time analytics. For real-time link prediction, the framework should 1) be reliable and accurate, 2) maintain learning models, and 3) calculate node similarities in real time. In a real-world application that deals with time-varying networks, it is important to understand predictive models in a time-varying context. In this work, we develop several guidelines for using prediction models in a dynamic network. We also propose an incremental support vector machine method for link prediction, which updates the model using the latest data available as well as historical information. While being able to forecast future links accurately is vital, another equally important problem is to identify the most important and relevant links among large numbers of future links. To address this problem, we propose a domain-independent, supervised method that predicts the rank of future links using objective interestingness measures. We also propose an iterative link classification method, which updates the network using only predicted links with a high confidence level at each iteration. Using this method, we observed a significant improvement in accuracy and recall over the baseline link prediction method. 
Our proposed solutions address two out of the three requirements defined above, by focusing on maintaining the learning models and increasing the reliability and accuracy of link prediction in a dynamic network. In our future work, we plan to extend this research to address the final requirement by developing the approximation algorithms for computing similarity measures in large dynamic and streaming networks, in real time, using distributed computing frameworks.
... To emphasize the prominence of capturing proximity among two non-adjacent nodes in a network using a subgraph rather than a single good path. For that, we will employ single good path-related proximity measures, such as shortest path [8], which represents the length of the shortest path connecting two non-adjacent nodes together. 3. Distance Function. ...
This paper studies the problem of learning large-scale graph representations (a.k.a. embeddings). Such representations encode the relations among distinct nodes in a continuous feature space. The learned representations generalize over various tasks, such as node classification, link prediction, and recommendation. Learning node representations aims to map proximate nodes close to one another in a low-dimensional vector space. Thus, embedding algorithms seek to preserve local and global network structure by identifying notions of node neighborhoods. However, the means that proposed methods employ to identify node neighborhoods fail to precisely capture network structure. In this paper, we propose a novel scalable graph embedding algorithmic framework called GECS, which aims to learn graph representations using connection subgraphs, where an analogy with electrical circuits is employed. The connection subgraphs are created to address the proximity between each pair of non-adjacent nodes, which are abundant in real-world networks, by maximizing the amount of flow between them. Although a subgraph captures proximity between two non-adjacent nodes, the formation of the subgraph addresses the direct connections with immediate neighbors as well. Therefore, our algorithm better preserves the local and global structure of a network. Further, despite the fact that non-adjacent nodes are numerous in real-world networks, our algorithm can scale to large graphs, because we do not deal with the graph as a whole but with much smaller extracted subgraphs. Since our algorithm has not yet been empirically examined, we here introduce a potential solution that can better learn graph representations compared to existing embedding methods, accompanied by rational reasoning.
... The Cycle Free Effective Conductance (CFEC) proposed in [34] measures the proximity between two nodes in a network. The CFEC between the nodes s and t is formulated as: Score_CFEC(s, t) = deg_s · ...
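The formula in the excerpt above is cut off, so the following is only a sketch under a stated assumption: that CFEC is the degree of s multiplied by the total probability of all cycle-free (simple) random-walk paths from s to t, where each step from node u to node v has probability w(u, v) / deg(u). The function name `cfec`, the edge-list input format, and the brute-force path enumeration are illustrative choices and not the branch-and-bound algorithm mentioned elsewhere on this page.

```python
from collections import defaultdict


def cfec(edges, s, t):
    """Brute-force cycle-free effective conductance between s and t.

    edges: iterable of (u, v, weight) for an undirected weighted graph.
    Assumed definition: deg(s) * sum over simple paths s -> t of the
    product of step probabilities w(u, v) / deg(u).
    Both s and t are assumed to appear in at least one edge.
    """
    w = defaultdict(dict)
    for u, v, weight in edges:
        w[u][v] = weight
        w[v][u] = weight
    # Weighted degree of every node.
    deg = {u: sum(nbrs.values()) for u, nbrs in w.items()}

    def walk(node, visited, prob):
        # Reaching t ends the path and contributes its probability.
        if node == t:
            return prob
        total = 0.0
        for nxt, weight in w[node].items():
            if nxt not in visited:  # never revisit a node: cycle-free paths only
                total += walk(nxt, visited | {nxt}, prob * weight / deg[node])
        return total

    return deg[s] * walk(s, {s}, 1.0)


# Two unit-weight edges in series: a - b - c.
print(cfec([("a", "b", 1.0), ("b", "c", 1.0)], "a", "c"))
```

Enumerating every simple path takes exponential time in general, so this is only practical on very small graphs. On the series example the result agrees with the ordinary effective conductance of two unit conductors in series.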
For the last decade, the automatic generation of hypothesis from the literature has been widely studied. One common approach is to model biomedical literature as a concept network; then a prediction model is applied to predict the future relationships (links) between pairs of concept. Typically, this link prediction task can be cast into in one of two forms: (a) predict the future links for a specific concept (node) or (b) predict the future links for the entire network. However, while being able to accurately forecast future relationships is vital, another, equally important question should be addressed: of the predicted links, which will be most important and/or most relevant? Attempts to answer these questions in the past have generally been domain specific. In this paper, we propose a domain-independent, supervised method that predicts the rank of future links utilizing objective interestingness measures. The results, based on analysis of thirteen common interestingness measures, indicate that, while predicting the specific future interestingness values is difficult, our approach allowed us to capture the relative ordering of the links with low error.
... Tong and Faloutsos [32] extend [10] by introducing the concept of Center-piece Subgraph dealing with query sets of any size, but again having a budget b of additional vertices. Koren et al. [17] redefine proximity using the notion of cycle-free effective conductance and propose a branch and bound algorithm. All the approaches described above require several parameters: common to all is the size of the required solution, plus all the usual parameters of PageRank methods, e.g., the jumpback probability, or the number of iterations. ...
We study the problem of extracting a selective connector for a given set of query vertices Q subset of V in a graph G = (V,E). A selective connector is a subgraph of G which exhibits some cohesiveness property, and contains the query vertices but does not necessarily connect them all. Relaxing the connectedness requirement allows the connector to detect multiple communities and to be tolerant to outliers. We achieve this by introducing the new measure of network inefficiency and by instantiating our search for a selective connector as the problem of finding the minimum inefficiency subgraph.
We show that the minimum inefficiency subgraph problem is NP-hard, and devise efficient algorithms to approximate it. By means of several case studies in a variety of application domains (such as human brain, cancer, and food networks), we show that our minimum inefficiency subgraph produces high-quality solutions, exhibiting all the desired behaviors of a selective connector.
... In [13], different proximity measures for link prediction in scientific coauthorship networks were evaluated and the Adamic-Adar was the most efficient algorithm. The Algorithm presented in [16] scores the nodes according to the probability of a random walker to reach from one node to another. In [17,18], random walk was used for link prediction. ...
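As a small illustration of the Adamic-Adar measure named in the excerpt above, the sketch below scores a node pair by summing 1 / log(degree) over their common neighbors, which is the commonly used form of the index. The function name `adamic_adar` and the adjacency-dictionary input are assumptions for illustration; the evaluation setup of [13] is not reproduced here.

```python
import math


def adamic_adar(neighbors, x, y):
    """Adamic-Adar score for the node pair (x, y).

    neighbors: dict mapping each node to a set of adjacent nodes.
    Common neighbors with few connections contribute more to the score.
    """
    common = set(neighbors[x]) & set(neighbors[y])
    # Degree-1 nodes are skipped to avoid dividing by log(1) = 0; a common
    # neighbor of two distinct nodes always has degree >= 2 anyway.
    return sum(1.0 / math.log(len(neighbors[z])) for z in common if len(neighbors[z]) > 1)


# Hypothetical co-authorship toy graph: a and b share co-authors c and d.
coauthors = {
    "a": {"c", "d"}, "b": {"c", "d"},
    "c": {"a", "b"}, "d": {"a", "b", "e"}, "e": {"d"},
}
print(adamic_adar(coauthors, "a", "b"))
```

A higher score suggests a more likely future link; the pair (a, e) shares only the better-connected node d and therefore scores lower than (a, b).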
Collaborative filtering is one of the most common methods used in recommender systems. In these methods, the preferences of users with similar interests are recommended to each other based on their past interests. On the other hand, one of the recommendation methods in social networks is to measure the proximity of two nodes in the graph. Although many researchers have dealt with friendship link prediction in different online social networks, very little attention has been paid to activity prediction based on different users' interactions. The main objective of this paper is to use collaborative filtering methods for activity prediction and recommendation, both for pairs of users without any interaction background and for user pairs with an activity background. In this regard, a new concept named “collaborative path” is first presented. Then, based on the collaborative path, four directed proximity measures are proposed. In addition, three new algorithms are presented: two based on collaborative random walks, one for mixed networks and one for multilayer networks, and the Collaborative-Association-Rule algorithm. Finally, to evaluate our proposed methods, we perform experiments on data sets of different Facebook activity networks, including like, comment, post, and share networks. The results show that the proposed collaborative methods handle activity prediction well without suffering from the cold start problem, and outperform the existing state-of-the-art methods.
... A method for measuring and extracting proximity graphs [18] uses academic co-authorship networks, movie recommendation systems and IMDB actor graphs. ...