
Multiperspective Graph-Theoretic Similarity Measure


ABSTRACT

Determining the similarity between two objects is pertinent to many applications. When the basis for similarity is a set of object-to-object relationships, it is natural to rely on graph-theoretic measures. One seminal technique for measuring the structural-context similarity between a pair of graph vertices is SimRank, whose underlying intuition is that two objects are similar if they are connected by similar objects. However, by design, SimRank as well as its variants capture only a single view or perspective of similarity. Meanwhile, in many real-world scenarios, there emerge multiple perspectives of similarity, i.e., two objects may be similar from one perspective, but dissimilar from another. For instance, human subjects may generate varied, yet valid, clusterings of objects. In this work, we propose a graph-theoretic similarity measure that is natively multiperspective. In our approach, the observed object-to-object relationships due to various perspectives are integrated into a unified graph-based representation, stylised as a hypergraph to retain the distinct perspectives. We then introduce a novel model for learning and reflecting diverse similarity perceptions given the hypergraph, yielding the similarity score between any pair of objects from any perspective. In addition to proposing an algorithm for computing the similarity scores, we also provide theoretical guarantees on the convergence of the algorithm. Experiments on public datasets show that the proposed model deals better with multiperspectivity than the baselines.
1 INTRODUCTION
CIKM ’18, October 22–26, 2018, Torino, Italy
Problem.
Proposed Approach.
Contributions.

2 RELATED WORK
Graph-eoretic Similarity.      
         
     G(V,E)   
         
       
S(a,b)   a,b   
S(a,b) =
C
|N(a)| |N(b)|
|N(a)|
i=1
|N(b)|
j=1
S(Ni(a),Nj(b)), a,b,
1, a=b

  C        N(a) N(b)
    a b    
   a,b      
           
          a  
   S(a,b) = 0    b,a
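The recursion above can be evaluated by fixed-point iteration. The sketch below is a naive, illustrative Python version. The function name `simrank`, the default `C=0.8` and the fixed iteration count are assumptions, not taken from this paper. It stores all $|V|^2$ scores, so it only suits small graphs.

```python
from itertools import product


def simrank(neighbors, C=0.8, iterations=10):
    """Naive iterative SimRank on a graph given as {vertex: list of neighbours N(v)}.

    Returns a dict mapping (a, b) to S(a, b) after a fixed number of iterations.
    """
    nodes = list(neighbors)
    # S^(0): 1 on the diagonal, 0 elsewhere
    S = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_S = {}
        for a, b in product(nodes, nodes):
            Na, Nb = neighbors[a], neighbors[b]
            if a == b:
                new_S[(a, b)] = 1.0
            elif not Na or not Nb:
                # convention: a vertex without neighbours is similar to no other vertex
                new_S[(a, b)] = 0.0
            else:
                total = sum(S[(x, y)] for x in Na for y in Nb)
                new_S[(a, b)] = C * total / (len(Na) * len(Nb))
        S = new_S
    return S


# u and v share their only neighbour w, so S(u, v) = C * S(w, w) = C
S = simrank({"w": [], "u": ["w"], "v": ["w"]}, C=0.8)
print(round(S[("u", "v")], 3))  # 0.8
```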
Multiperspective Similarity.    
3 OVERVIEW
3.1 Problem Formulation
Let $O = \{o_1, o_2, \ldots, o_n\}$ be a set of $n$ objects and $P = \{p_1, p_2, \ldots, p_m\}$ a set of $m$ perspectives. For each perspective $p \in P$, the observed relationships among the objects in $O$ form a graph $G_p(O, E_p)$ with $E_p \subseteq O \times O$. The input is therefore a collection $G = \{G_1, G_2, \ldots, G_m\}$ of $m$ graphs over the same object set $O$.
Figure 1: Illustration of the Hypergraph Representation
To retain the distinct perspectives, the graphs in $G$ are integrated into a hypergraph $H = (X, E)$ (Figure 1). The vertex set $X = P \cup O$ contains both perspective nodes and object nodes, and the hyperedge set is

$$E = \{(p_k, o_i, o_j) : (o_i, o_j) \in E_{p_k};\ 1 \le k \le m;\ 1 \le i, j \le n\},$$

that is, $(p_k, o_i, o_j) \in E$ if and only if $(o_i, o_j) \in E_{p_k}$ in $G_{p_k}$. For example, in Figure 1, $P = \{p_1, p_2\}$ and $O = \{o_1, o_2, o_3, o_4\}$. The hyperedges $(p_1, o_1, o_4)$ and $(p_1, o_2, o_3)$ indicate that, from perspective $p_1$, $o_1$ is related to $o_4$ and $o_2$ is related to $o_3$. Given $H(X, E)$, for any $o_i, o_j \in O$ and $p \in P$, the goal is a similarity score $S_p(o_i, o_j) \in [0, 1]$, with $S_p(o_i, o_j) = 1$ if $i = j$.
Problem 1 (Multiperspective Similarity). Given a multiperspective hypergraph $H$, determine the similarity score $S_p(o_i, o_j)$ for each perspective $p \in P$ and each pair of objects $o_i, o_j \in O$.
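As a concrete illustration of the hypergraph representation, the sketch below stores $E$ as a list of $(p, o_i, o_j)$ triples. From these it derives the neighbour sets $N_p(o_i) = \{o_j \mid (p, o_i, o_j) \in E\}$ used by the similarity updates. The helper `build_neighbors` and its `symmetric` flag are hypothetical. Whether the relationships are directed is an assumption here, not something the text specifies.

```python
from collections import defaultdict


def build_neighbors(hyperedges, symmetric=False):
    """Return N with N[p][o_i] = {o_j | (p, o_i, o_j) in E}.

    hyperedges: iterable of triples (p, o_i, o_j).
    symmetric: hypothetical option; if True, each (p, o_i, o_j) also adds
    o_i to N[p][o_j], for relationships assumed to be undirected.
    """
    N = defaultdict(lambda: defaultdict(set))
    for p, oi, oj in hyperedges:
        N[p][oi].add(oj)
        if symmetric:
            N[p][oj].add(oi)
    return N


# The two p1 hyperedges described for Figure 1
E = [("p1", "o1", "o4"), ("p1", "o2", "o3")]
N = build_neighbors(E, symmetric=True)
print(sorted(N["p1"]["o4"]))  # ['o1']
```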
3.2 Framework for Multiperspective Solutions
Table 1: List of Notations

| Symbol | Description |
|---|---|
| $P$ | $\{p_1, p_2, \ldots, p_m\}$, the set of perspectives |
| $O$ | $\{o_1, o_2, \ldots, o_n\}$, the set of objects |
| $X$ | $P \cup O$, the vertex set of the hypergraph |
| $E$ | the set of hyperedges $\{(p, o_i, o_j) : o_i \text{ and } o_j \text{ are related under } p\}$ |
| $m$ | number of perspectives |
| $n$ | number of objects |
| $N_p(o_i)$ | $\{o_j \in O \mid (p, o_i, o_j) \in E\}$, the neighbours of $o_i$ under $p$ |
| $S_p(o_i, o_j)$ | similarity between $o_i$ and $o_j$ from perspective $p$ |
| $S_p$ | $[S_p(o_i, o_j)]_{n \times n}$, the similarity matrix of perspective $p$ |
| $W_p$ | $n \times n$ column-normalised adjacency matrix of $G_p$, for $p \in P$ |
| $\mathrm{sim}(p, p')$ | similarity between perspectives $p$ and $p'$ |
| $C$ | decay factor |
| $I_n$ | $n \times n$ identity matrix |
The similarity between two perspectives $p, p' \in P$ is denoted by $\mathrm{sim}(p, p') \in [0, 1]$, with $\mathrm{sim}(p, p') = 1$ if $p = p'$.

The multiperspective similarity $S_p(o_i, o_j)$ between $o_i$ and $o_j$ from perspective $p$ aggregates structural evidence from every perspective $p' \in P$, weighted by $\mathrm{sim}(p, p')$. Let $N_p(o_i)$ denote the set of neighbours of $o_i$ in $G_p$. For $i \neq j$,

$$S_p(o_i,o_j) = \frac{C}{|P|}\sum_{p' \in P} \mathrm{sim}(p,p') \sum_{o_k \in N_{p'}(o_i)}\ \sum_{o_l \in N_{p'}(o_j)} \frac{S_{p'}(o_k,o_l)}{|N_{p'}(o_i)|\,|N_{p'}(o_j)|},$$

and $S_p(o_i, o_i) = 1$.

Let $S_p = [S_p(o_i, o_j)]_{n \times n}$. Let $W_p$ be the column-normalised adjacency matrix of $G_p$, that is, $W_p(k, i) = 1/|N_p(o_i)|$ if $o_k \in N_p(o_i)$ and $0$ otherwise. Then, for each $p \in P$ and ignoring the unit diagonal, the recursion can be written in matrix form as

$$S_p = \frac{C}{|P|}\sum_{p' \in P} \mathrm{sim}(p,p')\, W_{p'}^{\top} S_{p'} W_{p'}.$$
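The equivalence between the element-wise recursion and the matrix form can be checked numerically. This plain-Python sketch uses the hypothetical helpers `transition_matrix` and `matrix_step`. It builds the column-normalised $W_p$, with $W_p(k,i) = 1/|N_p(o_i)|$ when $o_k \in N_p(o_i)$, and applies one step of $S_p \leftarrow \frac{C}{|P|}\sum_{p'} \mathrm{sim}(p,p')\, W_{p'}^{\top} S_{p'} W_{p'}$. It deliberately leaves the diagonal untouched, because resetting $S_p(o_i,o_i) = 1$ is a separate step.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transition_matrix(objects, N_p):
    """W_p[k][i] = 1/|N_p(o_i)| if o_k is in N_p(o_i), else 0 (column-normalised)."""
    idx = {o: i for i, o in enumerate(objects)}
    n = len(objects)
    W = [[0.0] * n for _ in range(n)]
    for o_i in objects:
        nbrs = N_p.get(o_i, set())
        for o_k in nbrs:
            W[idx[o_k]][idx[o_i]] = 1.0 / len(nbrs)
    return W


def matrix_step(S, W, sim, C):
    """One step of S_p <- C/|P| * sum_{p'} sim(p, p') * W_{p'}^T S_{p'} W_{p'}.

    The unit diagonal is NOT restored here; the algorithms reset it separately.
    """
    P = list(S)
    n = len(S[P[0]])
    # M[q] = W_q^T S_q W_q does not depend on p, so compute it once per q
    M = {q: matmul(matmul(transpose(W[q]), S[q]), W[q]) for q in P}
    new = {}
    for p in P:
        new[p] = [[C / len(P) * sum(sim[(p, q)] * M[q][i][j] for q in P)
                   for j in range(n)] for i in range(n)]
    return new


objects = ["a", "b", "c"]
W = {"p": transition_matrix(objects, {"a": {"c"}, "b": {"c"}})}
S0 = {"p": [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]}
S1 = matrix_step(S0, W, {("p", "p"): 1.0}, C=0.8)
print(round(S1["p"][0][1], 3))  # 0.8: a and b share the neighbour c
```

Computing $M_{p'} = W_{p'}^{\top} S_{p'} W_{p'}$ once per perspective, rather than once per $(p, p')$ pair, reflects that this product does not depend on $p$.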
4 STRAIGHTFORWARD SOLUTION: PIPELINED-SIMRANK
A straightforward solution, Pipelined-SimRank, learns $\mathrm{sim}(p, p')$ and $S_p(o_i, o_j)$ in two consecutive stages. It first computes $\mathrm{sim}(p, p')$ from the hypergraph $H$, and then computes $S_p(o_i, o_j)$ with $\mathrm{sim}(p, p')$ held fixed.
4.1 Inter-Perspective Similarity
To compute the similarity between perspectives, we transform the hypergraph $H = (\{P, O\}, E)$ into a bipartite graph $B$ (Figure 2). One side of $B$ contains the perspective nodes $P$. The other side contains one node $o_{ij}$ for each object pair in $O \times O$. A perspective node $p$ is connected to a pair node $o_{ij}$ in $B$ if and only if $(p, o_i, o_j) \in E$ in $H$.




Figure 2: : Bipartite graph for computing
similarity between perspective nodes
Applying SimRank to $B$ yields $\mathrm{sim}(p, p')$ for every pair of perspectives. These values are then used to compute $S_p(o_i, o_j)$.
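The text does not spell out the internals of `bipartiteTransform` and `bipartiteSimRank`. The sketch below is one plausible reading. It assumes the standard bipartite form of SimRank with a single decay factor $C$: perspective scores are averaged over the scores of their pair-node neighbours, and vice versa. All Python names are hypothetical.

```python
from itertools import product


def bipartite_transform(hyperedges):
    """Build B: perspective p is linked to pair node (o_i, o_j) iff (p, o_i, o_j) in E."""
    persp_nbrs, pair_nbrs = {}, {}
    for p, oi, oj in hyperedges:
        persp_nbrs.setdefault(p, set()).add((oi, oj))
        pair_nbrs.setdefault((oi, oj), set()).add(p)
    return persp_nbrs, pair_nbrs


def identity_scores(nodes):
    return {(a, b): float(a == b) for a, b in product(nodes, nodes)}


def side_update(nbrs, other_scores, C):
    """SimRank update for one side of B, using the other side's current scores."""
    scores = {}
    for a, b in product(nbrs, nbrs):
        if a == b:
            scores[(a, b)] = 1.0
        elif nbrs[a] and nbrs[b]:
            total = sum(other_scores[(x, y)] for x in nbrs[a] for y in nbrs[b])
            scores[(a, b)] = C * total / (len(nbrs[a]) * len(nbrs[b]))
        else:
            scores[(a, b)] = 0.0
    return scores


def bipartite_simrank(persp_nbrs, pair_nbrs, C=0.8, iterations=10):
    """Return SimRank scores between the perspective nodes of B."""
    s_persp, s_pair = identity_scores(persp_nbrs), identity_scores(pair_nbrs)
    for _ in range(iterations):
        # both sides are updated from the previous iteration's scores
        s_persp, s_pair = (side_update(persp_nbrs, s_pair, C),
                           side_update(pair_nbrs, s_persp, C))
    return s_persp


E = [("p1", "o1", "o2"), ("p2", "o1", "o2"), ("p3", "o3", "o4")]
sim = bipartite_simrank(*bipartite_transform(E))
print(round(sim[("p1", "p2")], 3), sim[("p1", "p3")])  # 0.8 0.0
```

In this toy input, p1 and p2 share the pair node (o1, o2), so their similarity equals C = 0.8. By contrast, p3 shares no pair node with either of them.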
4.2 Learning Algorithm
Algorithm 1 summarises Pipelined-SimRank. After $\mathrm{sim}(p, p')$ has been obtained for all $p, p' \in P$ from the bipartite graph, the perspective-specific scores are computed iteratively. They start from $S_p^{(0)}(o_i, o_j) = 0$ if $i \neq j$ and $S_p^{(0)}(o_i, o_j) = 1$ if $i = j$, for all $p \in P$.
Algorithm 1: Pipelined-SimRank
Require: multiperspective hypergraph H
  -- create bipartite graph from hypergraph --
  B ← bipartiteTransform(H)
  -- compute the similarity between perspectives --
  $\{\mathrm{sim}(p, p')\}_{p, p' \in P}$ ← bipartiteSimRank(B)
  Initialize $S_p^{(0)} \leftarrow I_n$, for all $p \in P$
  while not converged do
    $S_p^{(t+1)}(o_i,o_j) = \frac{C}{|P|}\sum_{p' \in P} \mathrm{sim}(p,p') \times \sum_{o_k \in N_{p'}(o_i)} \sum_{o_l \in N_{p'}(o_j)} \frac{S_{p'}^{(t)}(o_k,o_l)}{|N_{p'}(o_i)|\,|N_{p'}(o_j)|}$, for all $p \in P$ and $1 \le i, j \le n$
    $S_p^{(t+1)}(o_i, o_i) = 1$, for all $p \in P$ and $1 \le i \le n$
  end while
  Return $\{S_p^{\text{converged}}(o_i, o_j) : p \in P;\ o_i, o_j \in O\}$ and $\{\mathrm{sim}(p, p') : p, p' \in P\}$.
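The second stage of Algorithm 1 can be sketched as follows, with $\mathrm{sim}(p,p')$ held fixed. The convergence test (maximum absolute change below `tol`), the iteration cap and all names are assumptions for illustration. Empty neighbourhoods contribute zero, matching the SimRank convention.

```python
def pipelined_scores(objects, N, sim, C=0.8, tol=1e-9, max_iter=100):
    """Iterate the Algorithm 1 update with sim(p, p') held fixed.

    objects: list of objects; N[p][o] = set of neighbours of o under p;
    sim[(p, q)]: inter-perspective similarity. Returns S[p][(o_i, o_j)].
    """
    P = list(N)
    S = {p: {(a, b): float(a == b) for a in objects for b in objects} for p in P}
    for _ in range(max_iter):
        new = {}
        for p in P:
            new[p] = {}
            for a in objects:
                for b in objects:
                    if a == b:
                        new[p][(a, b)] = 1.0  # S_p(o_i, o_i) = 1
                        continue
                    total = 0.0
                    for q in P:
                        Na, Nb = N[q].get(a, set()), N[q].get(b, set())
                        if Na and Nb:  # empty neighbourhoods contribute 0
                            avg = sum(S[q][(k, l)] for k in Na for l in Nb) / (len(Na) * len(Nb))
                            total += sim[(p, q)] * avg
                    new[p][(a, b)] = C / len(P) * total
        delta = max(abs(new[p][key] - S[p][key]) for p in P for key in S[p])
        S = new
        if delta < tol:  # stop once no score changes by more than tol
            break
    return S


N = {"p1": {"u": {"w"}, "v": {"w"}}, "p2": {}}
sim = {("p1", "p1"): 1.0, ("p2", "p2"): 1.0, ("p1", "p2"): 0.5, ("p2", "p1"): 0.5}
S = pipelined_scores(["u", "v", "w"], N, sim)
print(round(S["p1"][("u", "v")], 3), round(S["p2"][("u", "v")], 3))  # 0.4 0.2
```

Here p2 has no relationships of its own, so $S_{p_2}(u, v)$ comes entirely from p1's graph, scaled by sim(p2, p1) = 0.5.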
4.3 Convergence Property
Theorem. The sequence of perspective-specific similarity scores produced by Algorithm 1 is non-decreasing and bounded in $[0, 1]$, that is, for all $p \in P$, $o_i, o_j \in O$ and $t \ge 0$,

$$1 \ge S_p^{(t+1)}(o_i,o_j) \ge S_p^{(t)}(o_i,o_j) \ge 0.$$
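Proof: All terms in the update of Algorithm 1 are non-negative. Hence $S_p^{(1)}(o_i,o_j) \ge 0 = S_p^{(0)}(o_i,o_j)$ for all $p \in P$ and $o_i \neq o_j \in O$, and $S_p^{(1)}(o_i,o_i) = 1 = S_p^{(0)}(o_i,o_i)$ for all $p \in P$, $o_i \in O$. So the claim holds for $t = 0$.

Now suppose $S_{p'}^{(t)} \ge S_{p'}^{(t-1)}$ entrywise for all $p' \in P$ and some $t \ge 1$. Because $C$ and $\mathrm{sim}(p,p')$ are non-negative, the update is monotone in the previous scores, so $S_p^{(t+1)}(o_i,o_j) \ge S_p^{(t)}(o_i,o_j)$. By induction, the sequence is non-decreasing.

For the upper bound, assume inductively that the scores at step $t$ lie in $[0,1]$. Each normalised double sum is then an average of such scores, so for $i \neq j$

$$S_p^{(t+1)}(o_i,o_j) \le \frac{C}{|P|}\sum_{p' \in P} 1 = C \le 1,$$

since $C$ is a decay factor in $(0, 1)$, while the diagonal stays at 1. A non-decreasing sequence bounded above converges. Thus $\{S_p^{(t)}(o_i,o_j)\}_{t \ge 0}$ converges to some $S_p(o_i,o_j) \in [0,1]$, and Algorithm 1 returns $\{S_p(o_i,o_j)\}$ together with $\{\mathrm{sim}(p,p')\}$.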
5 JOINT SOLUTION: MP-SIMRANK
In contrast to Pipelined-SimRank, MP-SimRank learns $\mathrm{sim}(p, p')$ and $S_p(o_i, o_j)$ jointly: each is updated using the current estimate of the other.
5.1 Inter-Perspective Similarity
In MP-SimRank, $\mathrm{sim}(p, p')$ is derived from the perspective-specific similarity matrices $S_p$ and $S_{p'}$ themselves: two perspectives are considered similar if they induce similar object-to-object scores. Specifically,

$$\mathrm{sim}(p,p') = 1 - \frac{\lVert S_p - S_{p'} \rVert_F}{n},$$

where $\lVert \cdot \rVert_F$ denotes the Frobenius norm. Every entry of $S_p$ and $S_{p'}$ lies in $[0, 1]$, so $\lVert S_p - S_{p'} \rVert_F \le n$ and hence $\mathrm{sim}(p, p') \in [0, 1]$. The closer $S_p$ and $S_{p'}$ are, the smaller $\lVert S_p - S_{p'} \rVert_F / n$ and the larger $\mathrm{sim}(p, p')$; in particular, $\mathrm{sim}(p, p) = 1$.
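The Frobenius-based inter-perspective similarity needs only a few lines over two score matrices. A minimal sketch follows, with matrices as lists of lists and a hypothetical function name.

```python
import math


def perspective_similarity(S_p, S_q):
    """sim(p, p') = 1 - ||S_p - S_p'||_F / n for two n x n score matrices."""
    n = len(S_p)
    frob = math.sqrt(sum((S_p[i][j] - S_q[i][j]) ** 2
                         for i in range(n) for j in range(n)))
    return 1.0 - frob / n


I2 = [[1.0, 0.0], [0.0, 1.0]]
J2 = [[1.0, 1.0], [1.0, 1.0]]
print(perspective_similarity(I2, I2))            # 1.0
print(round(perspective_similarity(I2, J2), 4))  # 0.2929, i.e. 1 - sqrt(2)/2
```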
5.2 Learning Algorithm
Algorithm 2 summarises MP-SimRank. As in Algorithm 1, the scores are initialised as $S_p^{(0)} = I_n$ for all $p \in P$.
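Algorithm 2: MP-SimRank
Require: multiperspective hypergraph H
  Initialize $S_p^{(0)} \leftarrow I_n$, for all $p \in P$
  Initialize $\mathrm{sim}^{(0)}(p, p') = 1$ if $p = p'$, and $0$ if $p \neq p'$
  while not converged do
    $S_p^{(t+1)}(o_i,o_j) = \frac{C}{|P|}\sum_{p' \in P} \mathrm{sim}^{(t)}(p,p') \times \sum_{o_k \in N_{p'}(o_i)} \sum_{o_l \in N_{p'}(o_j)} \frac{S_{p'}^{(t)}(o_k,o_l)}{|N_{p'}(o_i)|\,|N_{p'}(o_j)|}$, for all $p \in P$ and $1 \le i, j \le n$
    $S_p^{(t+1)}(o_i, o_i) = 1$, for all $p \in P$ and $1 \le i \le n$
    $\mathrm{sim}^{(t+1)}(p, p') = 1 - \frac{\lVert S_p^{(t+1)} - S_{p'}^{(t+1)} \rVert_F}{n}$, for all $p, p' \in P$
  end while
  Return $\{S_p^{\text{converged}}(o_i, o_j) : p \in P;\ o_i, o_j \in O\}$ and $\{\mathrm{sim}^{\text{converged}}(p, p') : p, p' \in P\}$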
The inter-perspective similarities are initialised as $\mathrm{sim}^{(0)}(p, p') = 0$ if $p \neq p'$ and $1$ if $p = p'$. Consequently, in the first iteration each $S_p$ is updated from its own graph $G_p$ only. Information is then shared across perspectives as $\mathrm{sim}^{(t)}$ is learned in subsequent iterations.
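Putting the pieces together, Algorithm 2 alternates two updates: the score update, which uses the current $\mathrm{sim}^{(t)}$, and the Frobenius-based similarity update. The sketch below is illustrative only. The stopping rule, the iteration cap, the default $C = 0.8$ and the names are all assumptions.

```python
import math


def mp_simrank(objects, N, C=0.8, tol=1e-9, max_iter=200):
    """Joint iteration in the style of Algorithm 2.

    objects: list of objects; N[p][o] = set of neighbours of o under p.
    Returns (S, sim) with S[p][(o_i, o_j)] and sim[(p, q)].
    """
    P, n = list(N), len(objects)
    S = {p: {(a, b): float(a == b) for a in objects for b in objects} for p in P}
    sim = {(p, q): float(p == q) for p in P for q in P}  # sim^(0)
    for _ in range(max_iter):
        new = {}
        for p in P:
            new[p] = {}
            for a in objects:
                for b in objects:
                    if a == b:
                        new[p][(a, b)] = 1.0
                        continue
                    total = 0.0
                    for q in P:
                        Na, Nb = N[q].get(a, set()), N[q].get(b, set())
                        if Na and Nb:
                            avg = sum(S[q][(k, l)] for k in Na for l in Nb) / (len(Na) * len(Nb))
                            total += sim[(p, q)] * avg
                    new[p][(a, b)] = C / len(P) * total
        # sim^(t+1) from the freshly updated score matrices
        new_sim = {}
        for p in P:
            for q in P:
                frob = math.sqrt(sum((new[p][key] - new[q][key]) ** 2 for key in new[p]))
                new_sim[(p, q)] = 1.0 - frob / n
        delta = max(max(abs(new_sim[k] - sim[k]) for k in sim),
                    max(abs(new[p][key] - S[p][key]) for p in P for key in S[p]))
        S, sim = new, new_sim
        if delta < tol:
            break
    return S, sim


N = {"p1": {"u": {"w"}, "v": {"w"}}, "p2": {}}
S, sim = mp_simrank(["u", "v", "w"], N)
print(round(S["p1"][("u", "v")], 3), round(sim[("p1", "p2")], 3))  # 0.4 1.0
```

In this toy input, p2 has no relationships of its own, so its scores are driven entirely through sim(p2, p1). As a result, sim(p1, p2) rises towards 1 over the iterations.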
5.3 Convergence Property
Theorem. The sequence of similarities between perspectives produced by Algorithm 2 is non-decreasing and bounded in $[0, 1]$, that is, for $t \ge 1$,

$$1 \ge \mathrm{sim}^{(t+1)}(p,p') \ge \mathrm{sim}^{(t)}(p,p') \ge 0, \quad \forall p, p' \in P.$$
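Proof: We first show that, for $t \ge 1$ and all $p, p' \in P$,

$$\lVert S_p^{(t+1)} - S_{p'}^{(t+1)} \rVert_F \le \lVert S_p^{(t)} - S_{p'}^{(t)} \rVert_F.$$

Resetting the diagonal to 1 only zeroes the diagonal of the difference, so it cannot increase the norm. Write $M_{p''}^{(t)} = W_{p''}^{\top} S_{p''}^{(t)} W_{p''}$. Using the matrix form,

$$\begin{aligned}
\lVert S_p^{(t+1)} - S_{p'}^{(t+1)} \rVert_F
&\le \Big\lVert \frac{C}{|P|}\sum_{p'' \in P} \big(\mathrm{sim}^{(t)}(p,p'') - \mathrm{sim}^{(t)}(p',p'')\big)\, M_{p''}^{(t)} \Big\rVert_F \\
&\le \frac{C}{|P|}\sum_{p'' \in P} \big|\mathrm{sim}^{(t)}(p,p'') - \mathrm{sim}^{(t)}(p',p'')\big|\, \lVert M_{p''}^{(t)} \rVert_F \\
&= \frac{C}{|P|}\sum_{p'' \in P} \Big| \lVert S_p^{(t)} - S_{p''}^{(t)} \rVert_F - \lVert S_{p'}^{(t)} - S_{p''}^{(t)} \rVert_F \Big|\, \frac{\lVert M_{p''}^{(t)} \rVert_F}{n} \\
&\le \frac{C}{|P|}\sum_{p'' \in P} \big\lVert (S_p^{(t)} - S_{p''}^{(t)}) - (S_{p'}^{(t)} - S_{p''}^{(t)}) \big\rVert_F\, \frac{\lVert M_{p''}^{(t)} \rVert_F}{n} \\
&\le \frac{C}{|P|}\sum_{p'' \in P} \lVert S_p^{(t)} - S_{p'}^{(t)} \rVert_F
= C\, \lVert S_p^{(t)} - S_{p'}^{(t)} \rVert_F
\le \lVert S_p^{(t)} - S_{p'}^{(t)} \rVert_F .
\end{aligned}$$

The steps are justified as follows:

- The third line uses the definition of $\mathrm{sim}^{(t)}$, which applies for $t \ge 1$.
- The fourth line uses the reverse triangle inequality.
- The fifth line uses the fact that every entry of $M_{p''}^{(t)}$ is an average of scores in $[0, 1]$, so $\lVert M_{p''}^{(t)} \rVert_F \le n$.

Hence $\mathrm{sim}^{(t+1)}(p,p') \ge \mathrm{sim}^{(t)}(p,p')$. Moreover, $0 \le \lVert S_p^{(t)} - S_{p'}^{(t)} \rVert_F / n \le 1$ for all $t \ge 1$ and $p, p' \in P$, so $\mathrm{sim}^{(t)}(p,p') \in [0, 1]$ for all $t \ge 0$ and $p, p' \in P$. The sequence $\{\mathrm{sim}^{(t)}(p,p')\}_{t \ge 1}$ is therefore non-decreasing and bounded, and converges to some $\mathrm{sim}(p,p') \in [0, 1]$.

Finally, $\mathrm{sim}^{(1)} \ge \mathrm{sim}^{(0)}$, because $\mathrm{sim}^{(0)}$ is zero off the diagonal. Together with the monotonicity of $\mathrm{sim}^{(t)}$, the same induction as for Algorithm 1 shows that $\{S_p^{(t)}(o_i,o_j)\}_{t \ge 0}$ is non-decreasing and bounded in $[0, 1]$. It thus converges to some $S_p(o_i,o_j) \in [0, 1]$, and Algorithm 2 converges to $\{S_p(o_i,o_j)\}$ and $\{\mathrm{sim}(p,p')\}$.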
6 EXPERIMENTS ON EFFECTIVENESS
6.1 Experimental Settings
Datasets.

Zoo. Perspectives correspond to attributes of the animals, such as legs and type. A hyperedge $(p, o_i, o_j)$ relates animals $o_i$ and $o_j$ under perspective $p$, for instance elephant and giraffe under legs.

Congressional Voting Records (or HouseVote).

Paris Attractions.
Task and Metrics.
Recall: For each perspective $p \in P$, let $E_p$ denote the set of ground-truth object pairs related under $p$, and let $E'_p$ be the set of pairs retrieved by a method for $p$. Recall is averaged over the $m$ perspectives:

$$\mathrm{Recall} = \frac{1}{m}\sum_{p \in P} \frac{|E'_p \cap E_p|}{|E_p|}$$
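A direct transcription of the averaged recall follows. It assumes that $E_p$ is a non-empty ground-truth set of object pairs for perspective $p$ and that $E'_p$ is the set returned by a method. The function name and the dict-of-sets layout are hypothetical.

```python
def mean_recall(ground_truth, retrieved):
    """Recall = (1/m) * sum_p |E'_p intersect E_p| / |E_p|.

    ground_truth[p] = E_p (non-empty set of pairs); retrieved[p] = E'_p.
    """
    perspectives = list(ground_truth)
    total = 0.0
    for p in perspectives:
        hits = ground_truth[p] & retrieved.get(p, set())
        total += len(hits) / len(ground_truth[p])
    return total / len(perspectives)


E = {"p1": {("a", "b"), ("c", "d")}, "p2": {("a", "c")}}
E_ret = {"p1": {("a", "b")}, "p2": {("a", "c"), ("b", "d")}}
print(mean_recall(E, E_ret))  # 0.75 = (1/2 + 1/1) / 2
```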
PRES: The Patent Retrieval Evaluation Score (PRES) also takes into account the ranks at which the relevant items are retrieved. Let $n$ be the number of relevant items, $r_i$ the rank of the $i$-th relevant item, and $N_{max}$ the maximum number of retrieved items considered. Then

$$\mathrm{PRES} = 1 - \frac{\frac{\sum_{i} r_i}{n} - \frac{n+1}{2}}{N_{max}}$$
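PRES can likewise be computed from the ranks of the relevant items. In this sketch (hypothetical names), relevant items not retrieved within the top $N_{max}$ are placed at ranks $N_{max}+1, N_{max}+2, \ldots$. This handling of missing items is an assumption, so the original PRES definition should be checked before relying on values for partial retrieval.

```python
def pres(ranks, n_relevant, n_max):
    """PRES = 1 - (sum(r_i)/n - (n + 1)/2) / N_max.

    ranks: 1-based ranks of the relevant items retrieved within the top n_max.
    Assumption for this sketch: relevant items not retrieved are placed at
    ranks n_max + 1, n_max + 2, ... (so retrieving nothing gives PRES = 0).
    """
    found = sorted(r for r in ranks if r <= n_max)
    missing = n_relevant - len(found)
    all_ranks = found + [n_max + i for i in range(1, missing + 1)]
    n = n_relevant
    return 1.0 - (sum(all_ranks) / n - (n + 1) / 2) / n_max


print(pres([1, 2, 3], n_relevant=3, n_max=10))  # 1.0: perfect ranking
print(pres([], n_relevant=2, n_max=5))          # 0.0: nothing retrieved
```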

Methods.        
       
            
       
        
    C       
    uniperspective   Merged-
SimRank           
        Average-
SimRank        
         
         
         
Disjoint-SimRank     
          
          
          
         
           
      
  
   Personalized Collaborative Clustering  PCC
          
           
        
         
        
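Step 1 of Algorithm 3 describes Disjoint-SimRank as running SimRank on each perspective graph G_p separately. The sketch below is minimal and dense, and it rests on several assumptions:

- unweighted, undirected perspective graphs given as NumPy adjacency matrices;
- the standard recursion s(a, b) = C / (|N(a)| |N(b)|) Σ s(i, j) over neighbours i of a and j of b, with s(a, a) = 1;
- a decay factor C = 0.8 and a fixed number of iterations.

The matrix products cost O(n³) per iteration, so the sketch illustrates the recursion rather than the O(n² d_max) cost listed in Table 5.

```python
import numpy as np


def simrank(adj: np.ndarray, c: float = 0.8, iters: int = 10) -> np.ndarray:
    """Naive matrix-form SimRank on an undirected graph with adjacency matrix `adj`."""
    n = adj.shape[0]
    deg = adj.sum(axis=0)
    # Column-normalise: w[i, a] = 1/|N(a)| if i is a neighbour of a (isolated nodes keep a zero column).
    w = adj / np.where(deg == 0, 1, deg)
    s = np.eye(n)
    for _ in range(iters):
        s = c * (w.T @ s @ w)   # s(a,b) = C/(|N(a)||N(b)|) * sum of s(i,j)
        np.fill_diagonal(s, 1.0)  # s(a, a) = 1 by definition
    return s


def disjoint_simrank(perspective_adjs, c=0.8, iters=10):
    """Run SimRank independently on every perspective graph G_p."""
    return {p: simrank(a, c, iters) for p, a in perspective_adjs.items()}


adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
print(disjoint_simrank({"p1": adj})["p1"][1, 2])  # 0.8
```

Nodes 1 and 2 share their only neighbour, so their similarity is C times s(0, 0) = 1.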
6.2 Comparison to Baselines
         
       Recall   
    
Disjoint-SimRank      Recall  Zoo
HouseVote  Paris Attractions
         
        
Merged-SimRank    Recall  Disjoint-
SimRank      Zoo   HouseVote 
  Paris Attractions
     Average-SimRank  
  Recall        Zoo 
 HouseVote    Paris Attractions
     
Figure 3: Recall values of all models
[Figure 4 panels: Zoo, Paris Attractions, HouseVote. Methods: Merged-SimRank, Average-SimRank, Disjoint-SimRank, PCC, Pipelined-SimRank, MP-SimRank; y-axis: PRES (0 to 1).]
Figure 4: PRES values of all models
         
           
           
      
  Recall    Zoo   HouseVote 
  Paris Attractions      Recall
          
         
         
   
     PCC    
  Zoo   HouseVote    Paris Attractions
    Disjoint-SimRank     
         Merged-
SimRank      Average-SimRank
    PRES      
 Recall          
       Recall  PRES
       PRES    
           
         
          
       Paris Aractions 
       Recall  PRES  
            
 
6.3 Inter-Perspective Similarities
         
       
        
 sim(p, p′), p, p′ ∈ P
      p, p′
  sim(p, p′)
       sim(p, p′)
          
     Zoo  HouseVote   
         
           
          
      p∈ P    
        
    P     Paris Attractions
         
           
     
         
       
          
        
        
        
      
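Tables 2 and 3 report, for each perspective, a correlation between NMI scores and inter-perspective similarities. The sketch below shows one plausible way to obtain such an entry. It rests on assumptions not stated here:

- each perspective p is represented by a clustering (label vector) of the objects;
- NMI between the clusterings of p and p′ uses arithmetic-mean normalisation;
- the correlation is Pearson's r over all p′ ≠ p against sim(p, p′).

The function names are hypothetical.

```python
import numpy as np


def nmi(a, b) -> float:
    """Normalised mutual information of two labelings (arithmetic-mean normalisation)."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    joint = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(joint, (ai, bi), 1.0)  # contingency table of label co-occurrences
    joint /= ai.size
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = float(np.sum(joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz])))
    ha = -float(np.sum(pa * np.log(pa)))
    hb = -float(np.sum(pb * np.log(pb)))
    denom = (ha + hb) / 2
    return 1.0 if denom == 0 else mi / denom


def nmi_similarity_correlation(labels, sim) -> list:
    """For each perspective p: Pearson r over p' != p between NMI(p, p') and sim[p][p']."""
    m = len(labels)
    result = []
    for p in range(m):
        others = [q for q in range(m) if q != p]
        x = [nmi(labels[p], labels[q]) for q in others]
        y = [sim[p][q] for q in others]
        result.append(float(np.corrcoef(x, y)[0, 1]))
    return result
```

If a perspective's NMI values against all others are identical, Pearson's r is undefined (NaN). A rank correlation such as Spearman's would be an equally reasonable choice.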
Table 2: Correlation between NMI scores
and inter-perspective similarities for Zoo (17 perspectives)
  
p1 
p2 
p3 
p4 
p5 
p6 
p7 
p8 
p9 
p10  
p11  
p12  
p13  
p14  
p15  
p16  
p17  
6.4 Illustrative Case Study
           
         
     Paris Aractions    
    
           
          
   | |          
             
         
              
             
          
        
       Paris Aractions 
       
          
          
          
        
           
            
       
            
             
7 DISCUSSION ON EFFICIENCY
         
    
Table 3: Correlation between NMI scores and
inter-perspective similarities for HouseVote (16
perspectives)
  
p1 
p2 
p3 
p4 
p5 
p6 
p7 
p8 
p9 
p10  
p11  
p12  
p13  
p14  
p15  
p16  
Table 4: Cluster data of four users from Paris Attractions
  
  30  50 62 76 88       
       30 50 62 88    
         62   88   50  
      88         76   
   30        
Table 5: Complexity analysis (per iteration) of all
SimRank-based methods
Methods | Storage | Time
 | O(n²) | O(n² d_max)
 | O(n²) | O(mn² d_max)
 | O(mn²) | O(mn² d_max)
 | O(m² + n⁴ + mn²) | O((m² + n⁴ + mn²) d_bi + m² n² d_max)
 | O(m² + mn²) | O(m² n² d_max)
7.1 Complexity Analysis
         
         Merged-
SimRank         
          n²
   d_p
 |N_p(o_i)| · |N_p(o_j)|   o_i, o_j ∈ O  d_max
       p∈ P
        
Average-SimRank  Disjoint-SimRank     
     m    
        
         
     
         
[Figure 5 image: Paris Attractions objects with ids 30, 50, 62, 76, and 88]
Figure 5: Illustrative example of multiperspective similarity from Paris Attractions dataset.
       
 mn²
 m²         n⁴
d_bi           B
            
     
    m² n² d_max
      
       
        
    n⁴
      
          
      convergence rate   
    
$$D_t = \frac{1}{m}\sum_{p \in P}\frac{\lVert S_p^{(t+1)} - S_p^{(t)} \rVert_F}{n},$$
       D_t
t        
       Zoo  HouseVote 
     Paris Aractions
7.2 Heuristic for More Efficient MP-SimRank
         
         
         
         
      
       Disjoint-SimRank 
    S_p^disjoint, p ∈ P
   O(mn² d_max)
        k-medoids
 k        
            
          
          
O(m² + km)
H_c   cluster-specific inter-object  S_c
    O(k² n² d_max)

    O(mn² d_max + m² + km + k² n² d_max)
       O(m² n² d_max)
k ≪ m
       
          k  
           
       Recall    PRES
           
          k
     
  k=m      k 
         
       
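A rough per-iteration comparison of the two time bounds illustrates when the heuristic pays off: O(m² n² d_max) without the heuristic versus O(mn² d_max + m² + km + k² n² d_max) with it. The comparison ignores constant factors and any difference in iteration counts. It also assumes n² d_max is large relative to m, so that the m² + km terms are negligible. The value k = 4 is an arbitrary illustrative choice, and m = 17 is the number of Zoo perspectives.

$$\frac{m^2 n^2 d_{max}}{m n^2 d_{max} + m^2 + km + k^2 n^2 d_{max}} \approx \frac{m^2}{m + k^2}, \qquad m = 17,\ k = 4:\ \frac{289}{17 + 16} = \frac{289}{33} \approx 8.8$$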
8 CONCLUSION
          
       
        
       
          
         
        
      
        
       
       
       
         
       
   

      



  
 







[Figure 6 panels: HouseVote, Zoo, Paris Attractions; PRES, Recall, and running time (second) versus the number of clusters k]
Figure 6: PRES, Recall, and running time of  with different number of clusters k.
Algorithm 3 
Require: hypergraph H, number of clusters k
 – Step 1: run Disjoint-SimRank on each perspective graph –
 S_p^disjoint ← Disjoint-SimRank(G_p), p ∈ P
 – Step 2: compute Frobenius distances between perspectives –
 F = [F(p, p′)]_{p,p′∈P}, where
$$F(p,p') = \lVert S_p^{disjoint} - S_{p'}^{disjoint} \rVert_F$$
 – Step 3: cluster perspectives and merge graphs –
 C ← KMedoids(F, k); H_c ← MergeGraph(H, C)
 – Step 4: run MP-SimRank on the new hypergraph H_c –
 {S_c}_{c∈C} ← MP-SimRank(H_c)
 – Step 5: assign each perspective the inter-object similarity of the cluster it belongs to –
 S_p ← S_c for p ∈ P, c ∈ C, p ∈ c
 return {S_p, p ∈ P}
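Steps 2, 3 and 5 of Algorithm 3 are simple enough to sketch directly. The code below is a minimal illustration, not the authors' implementation. It assumes the per-perspective matrices S_p^disjoint are already available as NumPy arrays. For clustering it uses a simple alternating (Voronoi-iteration) k-medoids with a deterministic initialisation, rather than any particular published variant such as PAM. The graph-merging part of Step 3 and Step 4 (MP-SimRank itself) are not sketched. All function names are hypothetical.

```python
from typing import List, Sequence

import numpy as np


def frobenius_distances(mats: Sequence[np.ndarray]) -> np.ndarray:
    """Step 2: F[p, q] = ||S_p - S_q||_F for every pair of perspectives."""
    m = len(mats)
    f = np.zeros((m, m))
    for p in range(m):
        for q in range(p + 1, m):
            f[p, q] = f[q, p] = np.linalg.norm(mats[p] - mats[q], ord="fro")
    return f


def k_medoids(dist: np.ndarray, k: int, max_iter: int = 100) -> np.ndarray:
    """Step 3 (clustering only): alternating k-medoids on a precomputed distance matrix."""
    medoids = list(range(k))  # deterministic initialisation; an assumption
    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)  # assign each perspective to its nearest medoid
        new_medoids = []
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:  # keep the old medoid if its cluster emptied
                new_medoids.append(medoids[c])
                continue
            # new medoid: the member with the smallest total distance to the rest of its cluster
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(within)]))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return np.argmin(dist[:, medoids], axis=1)


def assign_cluster_similarities(labels: np.ndarray, cluster_sims: Sequence) -> List:
    """Step 5: perspective p receives S_c of the cluster c it belongs to."""
    return [cluster_sims[int(c)] for c in labels]
```

Because only the m × m matrix F is clustered, this part costs O(m²) for the distance matrix (once the Frobenius norms are computed) plus O(km) per k-medoids assignment pass. This is in line with the O(m² + km) term in the complexity discussion above.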
ACKNOWLEDGMENTS
        
        
   
REFERENCES
           
  ACM Transactions on Knowledge Discovery from Data (TKDD)  
 
          User
modeling and user-adapted interaction    
             
       Journal of Machine Learning
Research    
          
         
  Proceedings of the Workshop on Geometrical Models of Natural Language Semantics
           
     BMC bioinformatics    
             
     IEEE Transactions on Neural
Networks    
          
  Proceedings of the irtieth international conference on Very large data
bases-Volume 30   
            
        Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
           Proceedings of the sixth new zealand computer science research student conference (NZCSRSC2008), Christchurch, New Zealand
           
    Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL)
           
  Proceedings of the eighth ACM SIGKDD international conference on
Knowledge discovery and data mining  
         
      Proceedings of the 2014 ACM SIGMOD international conference on Management of data
              
           
  Proceedings of the 13th International Conference on Extending Database
Technology  
               
     Proceedings of the 2010 SIAM International Conference on Data Mining
             
      Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
            Introduction to information retrieval
            
    Proceedings of the 52nd Annual Meeting of the
Association for Computational Linguistics (Volume 1: Long Papers)   

              
        
Proceedings of the VLDB Endowment    
            
      6th International Conference on Data Mining,
ICDM 2006
         
      Advances in Neural Information Processing Systems
              
        
  Advances in neural information processing systems 
           
    WWW 