Edge Anonymity in Social Network Graphs
Lijie Zhang and Weining Zhang
Department of Computer Science, University of Texas at San Antonio
{lijez, wzhang}@cs.utsa.edu
Abstract—Edges in social network graphs can model sensitive relationships. In this paper, we consider the problem of edge anonymity in graphs. We propose a probabilistic notion of edge anonymity, called graph confidence, which is general enough to capture the privacy breach made by an adversary who is able to pinpoint target persons in a graph partition based on any given set of topological features of vertexes. We then focus on a special type of edge anonymity problem which uses vertex degree to partition a graph. We analyze edge disclosure in real-world social networks and show that even if graphs are anonymized to prevent vertex disclosure, they still do not guarantee edge anonymity. We present three heuristic algorithms that protect edge anonymity using edge swap or edge deletion. Our experimental results, based on three real-world social networks and several utility measures, show that these algorithms can effectively protect edge anonymity and obtain anonymous graphs of acceptable utility.
I. INTRODUCTION
Social networks emerge as an important platform for people to establish, discover, and maintain their relationships with others. These systems also provide enormous potential for e-business and present unique opportunities for social
behavior research. Conceptually, social networks are graphs
in which vertexes represent individuals and edges represent
relationships among individuals, such as friendship, trust, and
social contact. These graphs are extremely useful for studying
patterns of social influence [1], models of viral marketing [9],
collaborative filtering in recommendation systems [2], to name
a few. However, these applications of social networks also raise
serious concerns of privacy.
It has been pointed out [3], [7] that privacy can be breached
in published social networks even if they do not contain
personal identities. One type of attack on privacy is vertex re-identification [7], [11], [18], [19], whereby an adversary can
identify the vertex of a target person by analyzing topological
features of the vertex based on his background knowledge
about the person. For example, based on his knowledge about
a target person Alice, the adversary may know the degrees of
vertexes of Alice and her friends in a social network. Using
this knowledge, the adversary can analyze a published graph of
the social network and identify all vertexes whose surrounding
graphs (SGs) have these specific degrees. These vertexes form
an equivalence class (EC). Alice can be reidentified if her
equivalence class contains a single vertex. A fundamental
privacy notion with respect to vertex reidentification is vertex
kanonymity, which requires each equivalence class to contain
at least k vertexes.
Another type of attack is edge re-identification [12], [16], [17], [19], whereby the adversary can identify the edge incident to the vertexes of two target persons by analyzing topological features of the vertexes using his knowledge about the pair of persons. Edge re-identification is a breach of privacy because edges may represent sensitive information. Notice that edges can be re-identified even when vertexes cannot. For example,
the adversary may know the degrees of two target persons and
use this knowledge to identify ECs of the two persons. If every
vertex in one EC has an edge with every vertex in the other EC,
the adversary can infer with probability 1.0 that an edge exists
between the two target persons, even if the adversary may not
be able to identify the two persons within their respective ECs.
Unlike vertex anonymity, there is currently no well-established privacy notion of edge anonymity.
A. Our Contributions
In this paper, we focus on protecting edge anonymity in published social networks. First, we propose a probabilistic privacy notion of edge anonymity, called graph confidence, which is defined with respect to a partition of the graph using a vertex description type (VDT), a set of topological features of vertexes. This privacy notion captures the privacy breach made by an adversary who is able to pinpoint vertex ECs of target persons in a graph partition based on any given set of topological features. Although it is an open problem to determine the most appropriate assumption about the ability of the adversary, our notion of edge anonymity is general enough to accommodate any given VDT. We then focus on a special type of edge anonymity problem, which uses vertex degree as the VDT, and make the following contributions.
1) We analyze edge disclosure in two real-world social networks and show that edge disclosure can occur in these graphs, especially the denser one, both before and after applying algorithms of vertex anonymity.
2) We present three algorithms to obtain τ-confident graphs, i.e., graphs whose confidence is no less than a threshold τ. These algorithms use heuristics to perform either edge swap or edge deletion, so as to not only achieve edge anonymity but also preserve utility.
3) We study empirically the performance and utility of these algorithms. Our preliminary results, based on three real-world social networks and several utility measures, show that these algorithms can effectively protect edge anonymity and can obtain anonymous graphs of acceptable utility.
B. Related Work
Several methods have been proposed to prevent vertex re-identification through vertex k-anonymity, which places at least k vertexes in each equivalence class. These methods
differ in the types of the structural features that an adversary
might use to partition vertexes in anonymous graphs.
Liu and Terzi [11] studied the vertex re-identification attack assuming that the adversary knows only the degree of the vertex of a target person. Their method obtains a vertex k-anonymous graph by adding or deleting edges of the original graph, so that there are at least k vertexes of each degree.
Zhou and Pei [18] studied a neighborhood attack, in which the adversary re-identifies a target person by partitioning the graph according to isomorphism of vertex neighborhoods, a type of subgraph surrounding a vertex. Their method obtains a vertex k-anonymous graph by adding edges, so that at least k vertexes will have isomorphic neighborhoods.
Hay et al. [7] model the background knowledge of an adversary as the answer to a knowledge query and propose a method to prevent vertex re-identification attacks based on any type of knowledge query. Their method obtains a vertex k-anonymous supergraph by aggregating vertexes into supernodes and edges into superedges, so that each supernode represents at least k vertexes and each superedge represents all edges between vertexes in two supernodes. Since supernodes and superedges do not reveal internal topology, users of a supergraph have to generate conventional graphs by random sampling.
A number of methods were also proposed for edge
anonymity.
Singh and Zhan [14] briefly introduced the edge inference breach and proposed to measure graph anonymity in terms of vertex degree and vertex clustering coefficient. However, they did not measure privacy between individuals.
Zheleva et al. [17] considered link re-identification attacks in which the adversary infers sensitive relationships from non-sensitive ones in graphs that contain multiple types of edges. In addition to removing all sensitive edges, their method also removes some non-sensitive edges and aggregates vertexes/edges into clustered nodes/edges.
Ying and Wu [16] consider edge re-identification attacks in which the adversary does not have any background knowledge. Their methods obtain an anonymous graph by performing random edge swap, edge addition, and edge deletion. However, since their methods typically introduce very small random noise, the anonymous graphs obtained by these methods may not provide sufficient protection of edge anonymity.
We refer interested readers to the recent survey by Liu et al. [10], which gives a more detailed account of research on privacy-preserving data analysis on graphs and networks.
C. Road Map
The rest of the paper is organized as follows. In Section
II, we define the notion of graph confidence. In Section III,
we analyze edge disclosure in graphs of two real-world social
networks. In Section IV, we present three heuristic algorithms
for edge anonymity. In Section V, we present experimental
results on utility and performance of our algorithms. Section
VI draws conclusions.
II. EDGE ANONYMITY
In the following, we define a privacy notion of edge anonymity, which is based on a partition of a graph.
A. Vertex Description Type and Graph Partition
Let G = (V, E) be a simple undirected graph, where V = {v_1, ..., v_n} is a set of vertexes and E = {(v_i, v_j) | v_i, v_j ∈ V} is a set of edges. Each vertex represents an unidentified person. For convenience, we will not distinguish a vertex from a person.
Definition 1: (radius-r subgraph) For an integer r ≥ 0, the radius-r subgraph (or r-SG for short) of a vertex v ∈ V is sg_r(v) = (V′, E′), where V′ ⊆ V such that ∀v′ ∈ V′ the shortest path between v and v′ contains at most r edges, and E′ ⊆ E consists of the edges among vertexes in V′.
Many topological descriptions of graphs have been defined
in graph theory [15], including degree sequence, adjacency
matrix, shortest path, hub, etc. A topological description of
the subgraph surrounding a vertex can give a nontrivial
description of the vertex.
Definition 2: (vertex description type) A vertex description type (VDT) D is a set of finite topological descriptions of r-SGs of a given radius r ≥ 0, with an equivalence relation ∼_D and a function that maps each vertex v to D(v), the type-D description of vertex v.
For example, the set of degrees of vertexes is a VDT (here r = 0) whose equivalence relation is integer equality; the set of degree sequences of 1-SGs, with equality of integer sequences up to a permutation as the equivalence relation, is another VDT; and a set of adjacency matrices of 3-SGs with graph isomorphism is yet another VDT. Implicitly, VDTs have been used in previous studies of graph anonymization. Examples include vertex degree [7], [11], hubs [7], and neighborhood [18].
Definition 3: (graph partition) Let D be a VDT¹. The type-D vertex partition of a graph G = (V, E) is P_D(G) = {C | C ⊆ V, (∀v_1, v_2 ∈ C)(D(v_1) ∼_D D(v_2))}, where each C ∈ P_D(G) is a vertex equivalence class (or VEC). The type-D edge partition of G is {E_ij | (∀(v, v′) ∈ E_ij)(v ∈ C_i, v′ ∈ C_j, C_i ∈ P_D(G), C_j ∈ P_D(G))}, where E_ij is an edge equivalence class (or EEC) with end VECs C_i and C_j.
Intuitively, a VEC C contains all vertexes of an equivalent type-D description D(C), and an EEC E_ij contains all edges between VECs C_i and C_j.
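As an illustration, the degree-based instance of Definition 3 (r = 0, integer equality as ∼_D) can be sketched as follows. This is not the paper's implementation, only a minimal sketch; the function name and data layout are our own choices.

```python
from collections import defaultdict

def degree_partition(vertices, edges):
    """Partition a simple undirected graph by vertex degree.

    Returns (vecs, eecs): vecs maps each degree d to its vertex
    equivalence class C_d; eecs maps each degree pair (i, j) with
    i <= j to the edge equivalence class E_ij.
    """
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    vecs = defaultdict(set)           # VECs keyed by degree
    for v in vertices:
        vecs[degree[v]].add(v)

    eecs = defaultdict(set)           # EECs keyed by (i, j), i <= j
    for u, v in edges:
        i, j = sorted((degree[u], degree[v]))
        eecs[(i, j)].add((u, v))
    return dict(vecs), dict(eecs)
```

For a triangle {1, 2, 3} with a pendant vertex 4 attached to 3, this yields VECs {1, 2} (degree 2), {3} (degree 3), {4} (degree 1), and EECs E_22 = {(1, 2)}, E_23 = {(2, 3), (1, 3)}, E_13 = {(3, 4)}.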
B. Graph Confidence
Suppose that an adversary is able to determine, in a graph partitioned using a given VDT, the two (not necessarily distinct) VECs of target persons u and v, but cannot distinguish these target persons from other persons in their VECs. The adversary wants to determine the probability that there is an edge linking the two individuals.
Definition 4: (linking probability) Let C_i and C_j be (not necessarily distinct) VECs in a type-D vertex partition of a graph G. The probability that an edge in E_ij links a target person u hidden in C_i and a target person v hidden in C_j is

    p_ij = Pr[C_i, C_j] = α_ij / β_ij    (1)

where α_ij is the number of edges in E_ij, i.e., α_ij = |E_ij|, and β_ij is the number of pairs of vertexes between C_i and C_j, that is,

    β_ij = |C_i| × (|C_i| − 1) / 2,  if i = j;
    β_ij = |C_i| × |C_j|,            if i < j.

Intuitively, β_ij is the number of possible relationships between individuals in C_i and individuals in C_j. Since only α_ij out of β_ij relationships actually exist between C_i and C_j, the probability that any randomly chosen pair of individuals between C_i and C_j has a relationship is α_ij / β_ij, the ratio of actual edges to possible edges, assuming that each pair of individuals is equally likely to have a relationship.
Definition 5: (τ-confidence) Given a VDT D, the confidence of a graph G (that it protects edge anonymity) is defined as conf_D(G) = 1 − max P_{G,D}, where P_{G,D} = {p_ij | C_i, C_j ∈ P_D(G), i ≤ j} is the set of linking probabilities calculated based on the type-D partition of G. A graph G is τ-confident wrt D if conf_D(G) ≥ τ.

¹For the sake of presentation, we only consider vertex partitions based on a single VDT. However, our results apply to partitions based on multiple VDTs (such as P_{D1,D2}(G)).

[Fig. 1: Edge Disclosure in Social Networks — (a) COA, k = 1 to 5; (b) COA, k = 6 to 10; (c) EPINION]
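Equation (1) and Definition 5 can be combined into a small routine for the degree-based case. This is a sketch of the definitions rather than the authors' code; the function name is ours.

```python
from collections import defaultdict

def graph_confidence(vertices, edges):
    """Confidence conf_D(G) = 1 - max p_ij for the degree-based
    partition, with p_ij = alpha_ij / beta_ij as in Equation (1)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    vec_size = defaultdict(int)       # |C_i| for each degree i
    for v in vertices:
        vec_size[degree[v]] += 1

    alpha = defaultdict(int)          # alpha_ij = |E_ij|
    for u, v in edges:
        i, j = sorted((degree[u], degree[v]))
        alpha[(i, j)] += 1

    p_max = 0.0
    for (i, j), a in alpha.items():
        if i == j:
            beta = vec_size[i] * (vec_size[i] - 1) // 2
        else:
            beta = vec_size[i] * vec_size[j]
        p_max = max(p_max, a / beta)
    return 1.0 - p_max
```

For a triangle plus a pendant vertex, the confidence is 0.0: the single degree-3 vertex and single degree-1 vertex make edge (3, 4) fully disclosed, illustrating how small VECs expose edges. A 4-cycle, by contrast, has confidence 1 − 4/6 = 1/3.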
C. Discussion
Definition 4 assumes that the adversary has complete knowledge to locate a single VEC for each target person. In practice, the adversary may only have partial knowledge and cannot locate a single VEC. For example, the adversary may know that target persons have certain degrees and that the anonymous graph is obtained by deleting edges of the original graph, but does not know whether the degrees of target persons in the anonymous graph are reduced or not. In this case, if vertex degree is used as the VDT for graph partition, the adversary has to assume that each target person may be hidden in any VEC whose degree is equal to or less than the true degree of the person. Definition 4 can be easily extended as follows to cover this case.
Definition 6: (linking probability, generalized) Let S_u and S_v be two sets of VECs in a type-D vertex partition of a graph. The probability that there is an edge linking a target person u hidden in S_u and a target person v hidden in S_v is

    Pr[S_u, S_v] = (1 / (|S_u| × |S_v|)) × Σ_{C_i ∈ S_u, C_j ∈ S_v} Pr[C_i, C_j].
This definition assumes that target person u (or v) is equally likely to be hidden in any VEC in S_u (or S_v). With Definition 6, if an anonymous graph is τ-confident with respect to (wrt) single VECs under a VDT, it is also τ-confident wrt multiple VECs. This is obvious because if ∀C_i, C_j ∈ P_D(G), Pr[C_i, C_j] ≤ 1 − τ, then ∀S_u, S_v ⊆ P_D(G), Pr[S_u, S_v] ≤ 1 − τ. Thus, in the rest of this paper, we only consider confidence wrt single VECs.
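Definition 6 is simply an average of single-VEC linking probabilities, which is why it can never exceed their maximum. A minimal sketch (our own helper name; `pr` stands for any single-VEC probability function such as Equation (1)):

```python
def generalized_linking_prob(s_u, s_v, pr):
    """Definition 6: average Pr[C_i, C_j] over all VEC pairs,
    assuming each target is equally likely to be hidden in any
    VEC of its set. `pr(ci, cj)` is the single-VEC probability."""
    total = sum(pr(ci, cj) for ci in s_u for cj in s_v)
    return total / (len(s_u) * len(s_v))
```

Since an average of values bounded by 1 − τ is itself bounded by 1 − τ, this makes the claim in the text immediate.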
Our definition of graph confidence is general in the sense that
it allows for any given VDT. Of course, the more complex
the VDT is, the more powerful the adversary is, and the more
difficult the edge anonymity problem becomes. However, we believe that in practice it is not necessary to use a very complex VDT. The reason is that if the adversary is able to use a very complex VDT to attack, he perhaps already knows the sensitive information targeted by the attack. Nonetheless,
it is still an open problem to determine which VDT is the
most appropriate for protecting edge anonymity. We leave it
for future research. In the rest of this paper, we focus on
a special type of edge anonymity problem that uses vertex
degree as the VDT.
III. EDGE DISCLOSURE IN SOCIAL NETWORKS
In this section, we analyze through experiments the risk of edge disclosure in some real-world social networks. We consider two datasets: EPINION [6] and COA [4]. The EPINION dataset is the "web of trust" social network extracted from Epinion.com. The dataset is a directed graph in which vertexes represent members and edges represent the trust relationship among members. For the purpose of our experiments, we converted the data into an undirected graph by simply ignoring the direction of edges. The resulting graph contains 49,287 vertexes and 381,035 edges. The COA dataset is a social network used in [11]. The dataset is an undirected graph containing 7,955 vertexes and 10,055 edges.
To investigate edge disclosure in graphs produced by algorithms of vertex anonymity, we implemented a vertex k-anonymity algorithm described in [11]: the priority algorithm with the probing scheme, using degree as the VDT and edge deletion as the anonymization strategy. We applied this algorithm to the COA graph to generate anonymous graphs. However, we were unable to obtain a vertex-anonymous graph of the EPINION dataset using this algorithm due to the size and density of the graph. In fact, all existing vertex k-anonymity algorithms have some problems working on EPINION. For example, we also
[Fig. 2: Edge Swap]
implemented the simulated annealing algorithm of [7], which searches for an anonymous graph that optimizes a likelihood estimate. However, the computation of the likelihood of a single graph is O(|E|). Due to the size of the EPINION graph, the computation for splitting or moving one node can cause an out-of-memory exception on a computer with 2GB main memory and can take 2-3 minutes on a computer with 128GB main memory.
We partitioned each graph using degree as the VDT, measured the linking probabilities of each EEC, and counted the percentage of edges at various linking probabilities. The results are shown in Figure 1, in which the X-axis is the linking probability and the Y-axis is the percentage of edges. In Figures 1a and 1b, curves correspond to vertex k-anonymous graphs of COA, where k = 1 corresponds to the original graph. A point (x, y) on a curve means that y percent of edges have a linking probability of at least x. Based on the results, we have the following observations.
1) Edge disclosure is more likely to occur in dense graphs. As shown in Figure 1c, in the EPINION graph, 5% (or 17,064) of edges have a linking probability of 0.5 or higher, and 2% (or 6,280) of edges are completely disclosed. These numbers become 0.1% (9) and 0.03% (3), respectively, in the COA graph.
2) Edge disclosure can still occur in vertex k-anonymous graphs. As shown in Figures 1a and 1b, even though the original COA graph has a much lower risk of edge disclosure than EPINION, the vertex k-anonymity algorithm still cannot protect edge anonymity. Interestingly, for k = 4 or 6, the anonymous graphs have a higher risk of edge disclosure than the original graph.
Based on this analysis, we believe that algorithms specifically
designed for edge anonymity are needed.
IV. ALGORITHMS FOR DEGREEBASED EDGE ANONYMITY
Even for the special type of edge anonymity problem that uses vertex degree as the VDT, finding the optimal τ-confident edge anonymization is intractable. Therefore, in this section, we present heuristic algorithms.
There are four graph anonymization strategies: 1) random edge deletion, 2) random edge addition, 3) random edge swap, and 4) random edge addition/deletion. Edge swap is a special type of edge addition/deletion that deletes two edges and adds back two new edges connecting the four vertexes in one of the two specific ways illustrated in Figure 2. A basic operation of each strategy makes a minimum change to a graph. For example, deleting one edge is a basic operation of edge deletion, and swapping two edges is a basic operation of edge swap.

Algorithm 1 Degree-based Edge Swap
Input: graph G = (V, E) and threshold τ
Output: τ-confident graph G′
Method:
1. G′ = G;
2. partition G′ by vertex degree;
3. while (confidence of G′ is less than τ) do
4.   randomly select an edge e1 from the leading EEC;
5.   find a second edge e2 according to Theorem 1;
6.   if e2 exists, perform edge swap with e1 and e2;
7.   else return an empty graph;
8. end while
9. return G′;
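The basic edge-swap operation (the two ways of Figure 2) can be sketched as follows for a simple undirected graph. This is our own illustration, not the paper's implementation; a swap is rejected if it would create a self-loop or duplicate an existing edge, and vertex degrees are preserved by construction.

```python
def edge_swap(edges, e1, e2, way=0):
    """One basic edge-swap operation: delete edges (a, b) and
    (c, d), then add back (a, c) and (b, d) -- or, for the second
    way, (a, d) and (b, c). Vertex degrees are unchanged. Returns
    the new edge set (as sorted tuples), or None if the swap is
    invalid (self-loop or duplicate edge in the simple graph)."""
    (a, b), (c, d) = e1, e2
    new1, new2 = ((a, c), (b, d)) if way == 0 else ((a, d), (b, c))
    present = {frozenset(e) for e in edges}
    for x, y in (new1, new2):
        if x == y or frozenset((x, y)) in present:
            return None               # invalid swap
    result = present - {frozenset(e1), frozenset(e2)}
    result |= {frozenset(new1), frozenset(new2)}
    return {tuple(sorted(e)) for e in result}
```

For example, swapping (1, 2) and (3, 4) in the graph {(1, 2), (3, 4), (1, 3)} fails the first way (edge (1, 3) already exists) but succeeds the second way, yielding {(1, 3), (1, 4), (2, 3)} with every vertex keeping its degree.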
For edge anonymity, these strategies have different impact
on linking probability. For example, edge deletion can always
reduce linking probability, but edge addition may increase
linking probability. On the other hand, the effect of edge swap
is often difficult to predict. These anonymization strategies
also have different impact on different graph measurements
[5]. For example, edge swap does not alter vertex degree, but
may change centrality and shortest paths.
In the rest of this section, we present algorithms that
perform either edge swap or edge deletion. Intuitively, it is also
possible to obtain edge anonymity by adding edges. However,
directly adding counterfeit edges to the original graph will not
help, because adding more edges to EECs will increase linking
probabilities. (Although the added edges do not represent
real relationships, they cause those real relationships to be
identified more easily.) One option would be to start from the graph that contains all vertexes but no edges, and then add original edges into the graph one at a time as long as the edge anonymity requirement is still satisfied. To preserve utility, we would want to add back to the graph as many original edges as possible. This method, however, can suffer from poor performance if the original graph contains a large number of vertexes and edges and the majority of original edges will remain in the anonymous graph. Thus, for practical reasons,
we do not consider algorithms that use edge addition.
A. Degreebased Edge Swap
Algorithm 1 takes as input a graph and a confidence threshold, and uses edge swap to obtain a τ-confident anonymous graph if one can be found, or an empty graph otherwise. The goal is to find a graph that not only satisfies the privacy requirement but also has good utility. To achieve this goal, the algorithm uses a greedy strategy to improve graph confidence, namely, it focuses on reducing the size of the leading EEC, which corresponds to the maximum linking probability of the graph. Intuitively, reducing the size of the leading EEC may improve graph confidence more quickly than reducing the sizes of other EECs, and therefore result in fewer edges being swapped and better utility of anonymous graphs.
Algorithm 2 Edge Deletion with Maximum Choice
Input: graph G = (V, E) and threshold τ
Output: τ-confident graph G′
Method:
1. G′ = G;
2. partition G′ by vertex degree;
3. while (confidence of G′ is less than τ) do
4.   for each edge e in the leading EEC do
5.     compute RMLP and IOLP of deleting e;
6.   delete the edge of maximum RMLP and minimum IOLP;
7. end while
8. return G′;
Algorithm 3 Edge Deletion with Random Choice
Input: graph G = (V, E) and threshold τ
Output: τ-confident graph G′
Method:
1. G′ = G;
2. partition G′ by vertex degree;
3. while (confidence of G′ is less than τ) do
4.   randomly delete an edge in the leading EEC;
5. return G′;
In each iteration (steps 3-8), the algorithm attempts to swap the pair of edges that can lead to the biggest improvement of the graph confidence. If such a pair of edges does not exist, the algorithm will terminate with an empty graph. To choose the edges to swap, Algorithm 1 takes an edge from the leading EEC and finds a second edge from an EEC that satisfies the conditions of the following theorem.
Theorem 1: Let G be a graph partitioned using vertex degree and G′ be the graph obtained by a valid swap of two edges e_1 ∈ E_ij and e_2 ∈ E_st, where i ≤ j, s ≤ t. Then, for each EEC E_xy in G′ that receives any new edge, the corresponding linking probability p′_xy is less than p_ij if and only if the indexes i, j, s, t contain at least two distinct values, each of these values appears at most twice, and one of the following conditions holds:
1) i = j, s = t, and α_is < α_ii · β_is / β_ii − 2;
2) i < j or s < t, and for x ∈ {i, j}, y ∈ {s, t}, α_xy < α_ij · β_xy / β_ij − 1.
Proof: See Appendix.
Lemma 1: Given E_ij and E_st that satisfy the conditions of Theorem 1, any pair of edges e_1 ∈ E_ij and e_2 ∈ E_st will reduce p_ij by the same amount.
Proof: This follows from the proof of Theorem 1, which is independent of the choice of e_1 and e_2.
Intuitively, Theorem 1 guarantees that the swap of an appropriate pair of edges will always reduce the maximum linking probability p_ij and will not cause other linking probabilities to become larger than p_ij. Lemma 1 indicates that as long as the appropriate EECs are determined, the choice of edges within these EECs does not make any difference, provided a valid swap can be made.
B. Degreebased Edge Deletion
Algorithm 2 takes as input a graph and a confidence
threshold, and returns a τconfident anonymous graph using
edge deletion.
To preserve utility, Algorithm 2 also focuses on deleting edges of the leading EEC. To select an edge to delete, it estimates the amount of decrease in the maximum linking probability and the amount of increase in other linking probabilities that will result from the deletion of the edge. The edge selected should result in the largest reduction of the maximum linking probability (RMLP). If more than one such edge exists, the one that results in the smallest increase of other linking probabilities (IOLP) should be selected. Notice that it is possible that the deletion of the selected edge does not immediately improve the graph confidence. However, the algorithm will not stop if this situation occurs, because by deleting more edges the maximum linking probability will eventually be reduced, and a τ-confident graph will be obtained. Furthermore, this happens independently of the order in which edges are deleted.
To efficiently estimate the reduction of the maximum linking probability and the increase of other linking probabilities resulting from deleting an edge in the leading EEC, the algorithm does the following.
Suppose that the leading EEC is E_ij and the edge (u, v) is under consideration, where u ∈ C_i, v ∈ C_j, and VEC C_i contains vertexes of degree i. If (u, v) is deleted, u and v will be moved into VECs C_{i−1} and C_{j−1}, respectively. If i ≠ j, the deletion of the edge will decrease the sizes of C_i and C_j and increase the sizes of C_{i−1} and C_{j−1}, each by one. If i = j, the deletion of the edge will decrease the size of C_i and increase the size of C_{i−1}, each by two. As u and v are moved, so will the edges incident to u or v. To be specific, consider an edge (u, w), where w is in some VEC C_s. Once u is moved, this edge will also be moved from EEC E_is into EEC E_{(i−1)s}. This move will decrease the size of E_is and increase the size of E_{(i−1)s}, each by one. The size changes of EECs affected by moving v can be determined similarly. Thus, the size changes of VECs and EECs affected by the edge deletion can be efficiently determined without actually moving vertexes and edges. These size changes can then be used to calculate RMLP and IOLP.
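The bookkeeping described above can be sketched as a pure size-delta computation. This is our own illustration of the idea, not the paper's code; the function name and the delta representation are assumptions.

```python
from collections import defaultdict

def sizes_after_deletion(degree, adj, u, v):
    """Estimate VEC and EEC size changes from deleting edge (u, v)
    without modifying the graph. `degree` maps each vertex to its
    degree; `adj` maps each vertex to its neighbor set. Returns
    (vec_delta, eec_delta): size deltas keyed by degree and by
    degree pair (i, j) with i <= j."""
    i, j = degree[u], degree[v]
    vec_delta = defaultdict(int)
    eec_delta = defaultdict(int)

    # u and v each drop one degree class (if i == j, C_i shrinks by two).
    vec_delta[i] -= 1; vec_delta[i - 1] += 1
    vec_delta[j] -= 1; vec_delta[j - 1] += 1

    # The deleted edge leaves its EEC.
    eec_delta[tuple(sorted((i, j)))] -= 1

    # Remaining edges incident to u move from E_is to E_(i-1)s;
    # likewise for edges incident to v.
    for vert, old in ((u, i), (v, j)):
        other = v if vert == u else u
        for w in adj[vert]:
            if w == other:
                continue              # the deleted edge itself
            s = degree[w]
            eec_delta[tuple(sorted((old, s)))] -= 1
            eec_delta[tuple(sorted((old - 1, s)))] += 1
    return dict(vec_delta), dict(eec_delta)
```

Applying the resulting deltas to the current VEC and EEC sizes gives the new α and β values, from which RMLP and IOLP follow via Equation (1), all without actually moving vertexes or edges.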
Algorithm 3 is an alternative edge deletion method, which
chooses a random edge, instead of the best edge, from the
leading EEC for deletion.
V. EXPERIMENTS
In this section, we present results of our empirical study.
We implemented in Java the three algorithms described in
Section IV and used prefuse (http://prefuse.org/), an open
source graph package, for graph maintenance as well as for
graph operations, such as edge deletion and edge swap.
We performed experiments on three datasets. In addition to the EPINION and COA datasets, we also used the KDDCUP [8] dataset, which is a collection of abstracts of papers in high-energy physics published between 1992 and 2003. We extracted from the data a graph, which has a vertex for each