Edge Anonymity in Social Network Graphs

Lijie Zhang and Weining Zhang

Department of Computer Science, University of Texas at San Antonio

{lijez, wzhang}@cs.utsa.edu

Abstract—Edges in social network graphs can model sensitive

relationships. In this paper, we consider the problem of edge

anonymity in graphs. We propose a probabilistic notion of edge

anonymity, called graph confidence, which is general enough to

capture the privacy breach made by an adversary who is able to

pinpoint target persons in a graph partition based on any given

set of topological features of vertexes. We then focus on a special

type of edge anonymity problem which uses vertex degree to

partition a graph. We analyze edge disclosure in real-world social

networks and show that even if graphs are anonymized to prevent

vertex disclosure, they still do not guarantee edge anonymity. We

present three heuristic algorithms that protect edge anonymity

using edge swap or edge deletion. Our experimental results, based

on three real-world social networks and several utility measures,

show that these algorithms can effectively protect edge anonymity

and obtain anonymous graphs of acceptable utility.

I. INTRODUCTION

Social networks have emerged as an important platform for

people to establish, discover, and maintain their relationships

with others. These systems also provide enormous potential

for e-business and present unique opportunities for social

behavior research. Conceptually, social networks are graphs

in which vertexes represent individuals and edges represent

relationships among individuals, such as friendship, trust, and

social contact. These graphs are extremely useful for studying

patterns of social influence [1], models of viral marketing [9],

collaborative filtering in recommendation systems [2], to name

a few. However, these applications of social networks also raise

serious concerns of privacy.

It has been pointed out [3], [7] that privacy can be breached

in published social networks even if they do not contain

personal identities. One type of attack on privacy is vertex re-

identification [7], [11], [18], [19] whereby an adversary can

identify the vertex of a target person by analyzing topological

features of the vertex based on his background knowledge

about the person. For example, based on his knowledge about

a target person Alice, the adversary may know the degrees of

vertexes of Alice and her friends in a social network. Using

this knowledge, the adversary can analyze a published graph of

the social network and identify all vertexes whose surrounding

graphs (SGs) have these specific degrees. These vertexes form

an equivalence class (EC). Alice can be re-identified if her

equivalence class contains a single vertex. A fundamental

privacy notion with respect to vertex re-identification is vertex

k-anonymity, which requires each equivalence class to contain

at least k vertexes.

Another type of attack is edge re-identification [12], [16],

[17], [19] whereby the adversary can identify the edge incident

to vertexes of two target persons by analyzing topological

features of the vertexes using his knowledge about the pair of

persons. Edge re-identification is a breach of privacy because

edges may represent sensitive information. Notice that an edge

can be re-identified even when vertexes cannot. For example,

the adversary may know the degrees of two target persons and

use this knowledge to identify ECs of the two persons. If every

vertex in one EC has an edge with every vertex in the other EC,

the adversary can infer with probability 1.0 that an edge exists

between the two target persons, even if the adversary may not

be able to identify the two persons within their respective ECs.

Unlike vertex anonymity, there is currently no well-established

privacy notion of edge anonymity.

A. Our Contributions

In this paper, we focus on protecting edge anonymity in

published social networks. First, we propose a probabilistic

privacy notion of edge anonymity, called graph confidence,

which is defined with respect to a partition of the graph using

a vertex description type (VDT), i.e., a set of topological features of

vertexes. This privacy notion captures the privacy breach made

by an adversary who is able to pinpoint vertex ECs of target

persons in a graph partition based on any given set of topo-

logical features. Although it is an open problem to determine

what is the most appropriate assumption of the ability of the

adversary, our notion of edge anonymity is general enough to

accommodate any given VDT. We then focus on a special type

of edge anonymity problem, which uses vertex degree as the

VDT, and make the following contributions.

1) We analyze edge disclosure in two real-world social

networks and show that edge disclosure can occur in

these graphs, especially the denser one, both before and

after applying algorithms of vertex anonymity.

2) We present three algorithms to obtain τ-confidence

graphs, graphs whose confidence is no less than a

threshold τ. These algorithms use heuristics to perform

either edge swap or edge deletion, so as to achieve

edge anonymity while preserving utility.

3) We study empirically the performance and utility of

these algorithms. Our preliminary results, based on three

real-world social networks and several utility measures,

show that these algorithms can effectively protect edge

anonymity and can obtain anonymous graphs of accept-

able utility.

B. Related Work

Several methods have been proposed to prevent vertex re-

identification through vertex k-anonymity, which places at

least k vertexes in each equivalence class. These methods

differ in the types of the structural features that an adversary

might use to partition vertexes in anonymous graphs.

Liu and Terzi [11] studied the vertex re-identification attack

assuming that the adversary knows only the degree of the

vertex of a target person. Their method obtains a vertex k-

anonymous graph by adding or deleting edges of the original

graph so that there are at least k vertexes of each degree.

Zhou and Pei [18] studied a neighborhood attack, in which

the adversary re-identifies a target person by partitioning the

graph according to isomorphism of vertex neighborhoods, a

type of subgraph surrounding a vertex. Their method obtains

a vertex k-anonymous graph by adding edges so that at least

k vertexes will have isomorphic neighborhoods.

Hay et al. [7] modeled the background knowledge of an

adversary as the answer to a knowledge query and proposed

a method to prevent a vertex re-identification attack based

on any type of knowledge queries. Their method obtains a

vertex k-anonymous supergraph by aggregating vertexes into

supernodes and edges into superedges so that each supernode

represents at least k vertexes and each superedge represents all

edges between vertexes in two supernodes. Since supernodes

and superedges do not reveal internal topology, users of a

supergraph have to generate conventional graphs by random

sampling.

A number of methods were also proposed for edge

anonymity.

Singh and Zhan [14] briefly introduced edge inference

breach and proposed to measure graph anonymity in terms

of vertex degree and vertex clustering coefficient. But they

did not measure the privacy between individuals.

Zheleva et al. [17] considered link re-identification attacks

in which the adversary infers sensitive relationships from

non-sensitive ones in graphs that contain multiple types

of edges. In addition to removing all sensitive edges, their

method also removes some non-sensitive edges and aggregates

vertexes/edges into clustered nodes/edges.

Ying and Wu [16] considered edge re-identification attacks in

which the adversary does not have any background knowledge.

Their methods obtain an anonymous graph by performing

random edge swap, edge addition, and edge deletion. However,

since their methods typically introduce very small random

noise, the anonymous graphs obtained by these methods may

not provide sufficient protection to edge anonymity.

We refer interested readers to the recent survey by Liu et

al. [10], which gives a more detailed account on research of

privacy-preserving data analysis on graphs and networks.

C. Roadmap

The rest of the paper is organized as follows. In Section

II, we define the notion of graph confidence. In Section III,

we analyze edge disclosure in graphs of two real-world social

networks. In Section IV, we present three heuristic algorithms

for edge anonymity. In Section V, we present experimental

results on utility and performance of our algorithms. Section

VI draws conclusions.

II. EDGE ANONYMITY

In the following, we define a privacy notion of edge

anonymity, which is based on a partition of a graph.

A. Vertex Description Type and Graph Partition

Let G = (V,E) be a simple undirected graph, where V =

{v1,...,vn} is a set of vertexes and E = {(vi,vj)|vi,vj∈ V }

is a set of edges. Each vertex represents an unidentified person.

For convenience, we will not distinguish vertex from person.

Definition 1: (radius-r subgraph) For an integer r ≥ 0, the radius-r subgraph (or r-SG for short) of a vertex v ∈ V is $sg_r(v) = (V', E')$, where $V' \subseteq V$ such that for every $v' \in V'$ the shortest path between $v$ and $v'$ contains at most r edges, and $E' \subseteq E$ consists of the edges among vertexes in $V'$.
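As an illustration of Definition 1, the following is a minimal Python sketch that extracts an r-SG by breadth-first search. The function name `radius_subgraph` and the adjacency-dictionary representation `adj` are assumptions made for this example; they do not come from the paper.

```python
from collections import deque

def radius_subgraph(adj, v, r):
    """Return (V', E') of the radius-r subgraph of vertex v.

    adj maps each vertex to the set of its neighbours in a simple
    undirected graph (a hypothetical representation for this sketch).
    """
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if dist[x] == r:
            continue  # vertexes at distance r are kept but not expanded
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    vertices = set(dist)
    # E' holds every original edge whose two endpoints are both in V'.
    edges = {frozenset((x, y)) for x in vertices for y in adj[x] if y in vertices}
    return vertices, edges

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
one_sg_vertices, one_sg_edges = radius_subgraph(adj, 1, 1)
```

With r = 0 the sketch returns only the vertex itself and no edges.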

Many topological descriptions of graphs have been defined

in graph theory [15], including degree sequence, adjacency

matrix, shortest path, hub, etc. A topological description of

the subgraph surrounding a vertex can give a non-trivial

description of the vertex.

Definition 2: (vertex description type) A vertex description type (VDT) D is a set of finite topological descriptions of r-SGs of a given radius r ≥ 0, with an equivalence relation $\sim_D$ and a function that maps each vertex v to D(v), the type-D description of vertex v.

For example, the set of degrees of vertexes is a VDT (here

r = 0) whose equivalence relation is the integer equality;

the set of degree sequences of 1-SGs with the equality of

sequence of integers up to a permutation as the equivalence

relation is another VDT; and a set of adjacency matrices

of 3-SGs with the graph isomorphism is yet another VDT.

Implicitly, VDTs have been used in previous studies of graph

anonymization. Examples include vertex degree [7], [11], hubs

[7], and neighborhood [18].

Definition 3: (graph partition) Let D be a VDT.¹ The type-D vertex partition of a graph G = (V,E) is $P_D(G) = \{C \mid C \subseteq V, (\forall v_1, v_2 \in C)(D(v_1) \sim_D D(v_2))\}$, where each $C \in P_D(G)$ is a vertex equivalence class (or VEC). The type-D edge partition of G is $\{E_{ij} \mid (\forall (v, v') \in E_{ij})(v \in C_i, v' \in C_j, C_i \in P_D(G), C_j \in P_D(G))\}$, where $E_{ij}$ is an edge equivalence class (or EEC) with end VECs $C_i$ and $C_j$. Intuitively, a VEC C contains all vertexes of an equivalent type-D description D(C) and an EEC $E_{ij}$ contains all edges between VECs $C_i$ and $C_j$.
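For the degree VDT, Definition 3 can be sketched in a few lines of Python. The function name `degree_partition` and the choice to key each EEC by a sorted degree pair are assumptions of this example, not notation from the paper.

```python
from collections import defaultdict

def degree_partition(vertices, edges):
    """Partition vertexes into VECs by degree and edges into EECs.

    Returns (vecs, eecs): vecs maps a degree to the set of vertexes with
    that degree; eecs maps a sorted degree pair (i, j), i <= j, to the set
    of edges whose end vertexes have degrees i and j.
    """
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    vecs = defaultdict(set)
    for v, d in degree.items():
        vecs[d].add(v)
    eecs = defaultdict(set)
    for u, v in edges:
        key = tuple(sorted((degree[u], degree[v])))
        eecs[key].add((u, v))
    return dict(vecs), dict(eecs)

vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
vecs, eecs = degree_partition(vertices, edges)
```

In this toy graph, vertexes b and c share the degree-2 VEC, and the two edges from a to that VEC form a single EEC.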

B. Graph Confidence

Suppose that an adversary is able to determine in a graph

partitioned using a given VDT two (not necessarily distinct)

VECs for target persons u and v, but cannot distinguish these

target persons from other persons in their VECs. The adversary

wants to determine the probability that there is an edge linking

the two individuals.

¹ For the sake of presentation, we only consider vertex partitions based on a single VDT. However, our results apply to partitions based on multiple VDTs (such as $P_{D_1,D_2}(G)$).

Fig. 1: Edge Disclosure in Social Networks. (a) COA, k = 1 to 5; (b) COA, k = 6 to 10; (c) EPINION.

Definition 4: (linking probability) Let $C_i$ and $C_j$ be (not necessarily distinct) VECs in a type-D vertex partition of a graph G. The probability that an edge in $E_{ij}$ links a target person u hidden in $C_i$ and a target person v hidden in $C_j$ is

$$p_{ij} = \Pr[C_i, C_j] = \frac{\alpha_{ij}}{\beta_{ij}} \qquad (1)$$

where $\alpha_{ij}$ is the number of edges in $E_{ij}$, i.e., $\alpha_{ij} = |E_{ij}|$, and $\beta_{ij}$ is the number of pairs of vertexes between $C_i$ and $C_j$, that is,

$$\beta_{ij} = \begin{cases} \dfrac{|C_i| \times (|C_i| - 1)}{2}, & i = j; \\ |C_i| \times |C_j|, & i < j. \end{cases}$$

Intuitively, $\beta_{ij}$ is the number of possible relationships between individuals in $C_i$ and individuals in $C_j$. Since only $\alpha_{ij}$ out of $\beta_{ij}$ relationships actually exist between $C_i$ and $C_j$, the probability that any randomly chosen pair of individuals has a relationship is $\alpha_{ij}/\beta_{ij}$, the ratio of actual edges and possible edges between $C_i$ and $C_j$, assuming that each pair of individuals is equally likely to have a relationship.

Definition 5: (τ-confidence) Given a VDT D, the confidence of a graph G (that it protects edge anonymity) is defined as $conf_D(G) = 1 - \max P_{G,D}$, where $P_{G,D} = \{p_{ij} \mid C_i, C_j \in P_D(G), i \le j\}$ is the set of linking probabilities calculated based on the type-D partition of G. A graph G is τ-confident wrt D if $conf_D(G) \ge \tau$.
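The linking probability and graph confidence can be computed directly from the sizes of the VECs and EECs. The sketch below assumes vertex degree as the VDT; the function name `graph_confidence` and the use of exact fractions are choices made for this example only.

```python
from collections import Counter
from fractions import Fraction

def graph_confidence(vertices, edges):
    """Return (confidence, probs) for the degree-based partition.

    probs maps each degree pair (i, j), i <= j, that has at least one edge
    to its linking probability alpha_ij / beta_ij. Pairs without edges have
    probability 0 and cannot be the maximum, so they are omitted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    vec_size = Counter(degree[v] for v in vertices)  # |C_d| for each degree d
    alpha = Counter(tuple(sorted((degree[u], degree[v]))) for u, v in edges)
    probs = {}
    for (i, j), a in alpha.items():
        if i == j:
            beta = vec_size[i] * (vec_size[i] - 1) // 2
        else:
            beta = vec_size[i] * vec_size[j]
        probs[(i, j)] = Fraction(a, beta)
    max_p = max(probs.values(), default=Fraction(0))
    return 1 - max_p, probs

# A 4-cycle: all vertexes have degree 2, so alpha = 4 and beta = 6.
cycle_conf, cycle_probs = graph_confidence([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)])
# A path 1-2-3-4: the single edge between the two degree-2 vertexes is fully disclosed.
path_conf, path_probs = graph_confidence([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
```

The path example shows how a graph with few vertexes per VEC can have confidence 0 even though no vertex is uniquely identifiable by degree.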

C. Discussion

Definition 4 assumes that the adversary has complete knowl-

edge to locate a single VEC for each target person. In practice,

the adversary may only have partial knowledge and cannot

locate a single VEC. For example, the adversary may know

that target persons have certain degrees and the anonymous

graph is obtained by deleting edges of the original graph,

but does not know if the degrees of target persons in the

anonymous graph are reduced or not. In this case, if vertex

degree is used as the VDT for graph partition, the adversary

has to assume that each target person may be hidden in any

VEC whose degree is equal to or less than the true degree of

the person. Definition 4 can be easily extended as follows to

cover this case.

Definition 6: (linking probability generalized) Let $S_u$ and $S_v$ be two sets of VECs in a type-D vertex partition of a graph. The probability that there is an edge linking a target person u hidden in $S_u$ and a target person v hidden in $S_v$ is

$$\Pr[S_u, S_v] = \frac{1}{|S_u| \times |S_v|} \sum_{C_i \in S_u,\, C_j \in S_v} \Pr[C_i, C_j].$$

This definition assumes that target person u (or v) is equally likely to be hidden in any VEC in $S_u$ (or $S_v$). With Definition 6, if an anonymous graph is τ-confident with respect to (wrt) a single VEC under a VDT, it is also τ-confident wrt multiple VECs. This holds because $\Pr[S_u, S_v]$ is an average of the values $\Pr[C_i, C_j]$: if $\forall C_i, C_j \in P_D(G)$, $\Pr[C_i, C_j] \le 1 - \tau$, then $\forall S_u, S_v \subseteq P_D(G)$, $\Pr[S_u, S_v] \le 1 - \tau$. Thus, in the rest of this paper, we only consider confidence wrt a single VEC.
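The averaging in Definition 6 can be illustrated with a short Python sketch. The function name, the dictionary `pr` of single-VEC linking probabilities, and the sample values are all hypothetical and chosen only to make the arithmetic visible.

```python
from fractions import Fraction

def generalized_linking_probability(S_u, S_v, pr):
    """Average Pr[C_i, C_j] over all C_i in S_u and C_j in S_v.

    pr maps an unordered pair of VEC labels, stored as a sorted tuple, to
    its linking probability; pairs missing from pr count as probability 0.
    S_u and S_v are assumed to be non-empty.
    """
    total = sum(pr.get(tuple(sorted((ci, cj))), Fraction(0))
                for ci in S_u for cj in S_v)
    return total / (len(S_u) * len(S_v))

# Hypothetical single-VEC linking probabilities for VECs labelled 1 and 2.
pr = {(1, 2): Fraction(1, 2), (2, 2): Fraction(1, 4)}
# u may be hidden in VEC 1 or VEC 2; v is known to be in VEC 2.
p_general = generalized_linking_probability({1, 2}, {2}, pr)
```

Because the result is an average, it can never exceed the largest single-VEC linking probability, which is the observation used above to restrict attention to single VECs.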

Our definition of graph confidence is general in the sense that

it allows for any given VDT. Of course, the more complex

the VDT is, the more powerful the adversary is, and the more

difficult the edge anonymity problem becomes. However, we

believe that in practice, it is not necessary to use a very complex

VDT. The reason is that if the adversary is able to use a

very complex VDT to attack, he perhaps already knows the

sensitive information targeted by the very attack. Nonetheless,

it is still an open problem to determine which VDT is the

most appropriate for protecting edge anonymity. We leave it

for future research. In the rest of this paper, we focus on

a special type of edge anonymity problem that uses vertex

degree as the VDT.

III. EDGE DISCLOSURE IN SOCIAL NETWORKS

In this section, we analyze through experiments the risk of

edge disclosure in some real-world social networks. We consider

two datasets: EPINION [6] and COA [4]. The EPINION

dataset is the “web of trust” social network extracted from

Epinion.com. The dataset is a directed graph in which vertexes

represent members and edges represent the trust relationship

among members. For the purpose of our experiments, we

converted the data into an undirected graph by simply ignoring

the direction of edges. The resulting graph contains 49,287

vertexes and 381,035 edges. The COA dataset is a social

network used in [11]. The dataset is an undirected graph

containing 7,955 vertexes and 10,055 edges.

To investigate edge disclosure in graphs produced by al-

gorithms of vertex anonymity, we implemented a vertex k-

anonymity algorithm described in [11]: the priority algorithm

with the probing scheme using degree as a VDT and edge

deletion as the anonymization strategy. We applied this algorithm to the COA graph to generate anonymous graphs. However, we were unable to obtain a vertex anonymous graph of the EPINION dataset using this algorithm due to the size and density of the graph. In fact, all existing vertex k-anonymity algorithms have some problems working on EPINION. For example, we also implemented the simulated annealing algorithm of [7], which searches for an anonymous graph that optimizes a likelihood estimate. However, the computation of the likelihood of a single graph is O(|E|). Due to the size of the EPINION graph, the computation for splitting or moving one node can cause an out-of-memory exception on a computer with 2GB of main memory and can take 23 minutes on a computer with 128GB of main memory.

Fig. 2: Edge Swap

We partitioned each graph using degree as the VDT, measured

the linking probability of each EEC, and counted the

percentage of edges of various linking probabilities. The

results are shown in Figure 1, in which the X-axis is the

linking probability and the Y-axis is the percentage of edges. In

Figures 1a and 1b, curves correspond to vertex k-anonymous

graphs of COA, where k = 1 corresponds to the original graph.

A point (x,y) on a curve means that y percent of edges have

a linking probability of at least x. Based on the results, we

have the following observations.

1) Edge disclosure is more likely to occur in dense graphs.

As shown in Figure 1c, in the EPINION graph, 5% (or

17,064) of the edges have a linking probability of 0.5 or higher,

and 2% (or 6,280) of the edges are completely disclosed. These

numbers become 0.1% (9) and 0.03% (3), respectively,

in the COA graph.

2) Edge disclosure can still occur in vertex k-anonymous

graphs. As shown in Figures 1a and 1b, even though the

original COA graph has much less risk of edge disclo-

sure than EPINION, the vertex k-anonymity algorithm

still cannot protect edge anonymity. Interestingly, for

k = 4 or 6, the anonymous graphs have higher risk

of edge disclosure than the original graph.

Based on this analysis, we believe that algorithms specifically

designed for edge anonymity are needed.
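The measurement behind curves such as those in Figure 1 can be sketched as follows. This is only an illustrative reconstruction of the procedure described in this section (partition by degree, compute the linking probability of each EEC, count the percentage of edges at or above a threshold); the function name `disclosure_curve` and the toy graph are assumptions.

```python
from collections import Counter

def disclosure_curve(vertices, edges, thresholds):
    """For each threshold x, return (x, percentage of edges whose EEC has
    a linking probability of at least x) under the degree-based partition."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    vec_size = Counter(degree[v] for v in vertices)
    alpha = Counter(tuple(sorted((degree[u], degree[v]))) for u, v in edges)

    def prob(i, j):
        if i == j:
            beta = vec_size[i] * (vec_size[i] - 1) // 2
        else:
            beta = vec_size[i] * vec_size[j]
        return alpha[(i, j)] / beta

    curve = []
    for x in thresholds:
        covered = sum(a for (i, j), a in alpha.items() if prob(i, j) >= x)
        curve.append((x, 100.0 * covered / len(edges)))
    return curve

# Toy path graph 1-2-3-4, used only to show the shape of the output.
curve = disclosure_curve([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)], [0.5, 1.0])
```

In the toy graph, all edges have a linking probability of at least 0.5 and one of the three edges is completely disclosed.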

IV. ALGORITHMS FOR DEGREE-BASED EDGE ANONYMITY

Even for the special type of edge anonymity problem that

uses degree of vertex as the VDT, finding the optimal τ-

confident edge anonymization is intractable. Therefore, in this

section, we present heuristic algorithms.

There are four graph anonymization strategies: 1) random edge deletion, 2) random edge addition, 3) random edge swap, and 4) random edge addition/deletion. Edge swap is a special type of edge addition/deletion that deletes two edges and adds back two new edges connecting the four vertexes in one of the two specific ways illustrated in Figure 2. A basic operation of each strategy makes a minimum change to a graph. For example, deleting one edge is a basic operation of edge deletion, and swapping two edges is a basic operation of edge swap.

Algorithm 1 Degree-based Edge Swap
Input: graph G = (V,E) and threshold τ
Output: τ-confident graph G′
Method:
1. G′ = G;
2. partition G′ by vertex degree;
3. while (confidence of G′ is less than τ) do
4.     randomly select an edge e1 from the leading EEC;
5.     find second edge e2 according to Theorem 1;
6.     if e2 exists, perform edge swap with e1 and e2;
7.     else return an empty graph;
8. end while
9. return G′;

For edge anonymity, these strategies have different impact

on linking probability. For example, edge deletion can always

reduce linking probability, but edge addition may increase

linking probability. On the other hand, the effect of edge swap

is often difficult to predict. These anonymization strategies

also have different impact on different graph measurements

[5]. For example, edge swap does not alter vertex degree, but

may change centrality and shortest paths.
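The basic edge swap operation, and the fact that it leaves every vertex degree unchanged, can be sketched as follows. The function names and the validity checks (four distinct end vertexes, no duplicate edges) are assumptions of this example about what makes a swap valid in a simple graph; the paper's Figure 2 is not reproduced here.

```python
from collections import Counter

def swap_edges(edges, e1, e2, cross=False):
    """Return a new edge set with e1 = (a, b) and e2 = (c, d) replaced by
    (a, c), (b, d), or by (a, d), (b, c) when cross is True.
    Returns None if the swap would not yield a simple graph."""
    (a, b), (c, d) = e1, e2
    if len({a, b, c, d}) < 4:
        return None  # the two edges must not share a vertex
    current = {frozenset(e) for e in edges}
    old = {frozenset(e1), frozenset(e2)}
    new = {frozenset((a, d)), frozenset((b, c))} if cross \
        else {frozenset((a, c)), frozenset((b, d))}
    if not old <= current or new & current:
        return None  # a swapped edge is missing or a new edge already exists
    return (current - old) | new

def degrees(edge_set):
    """Degree of every vertex that appears in the edge set."""
    return Counter(v for e in edge_set for v in e)

edges = [(1, 2), (3, 4), (1, 3)]
blocked = swap_edges(edges, (1, 2), (3, 4))              # would duplicate edge (1, 3)
swapped = swap_edges(edges, (1, 2), (3, 4), cross=True)  # adds (1, 4) and (2, 3)
```

Since degrees are preserved, a swap never moves a vertex between degree-based VECs; it only moves edges between EECs.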

In the rest of this section, we present algorithms that

perform either edge swap or edge deletion. Intuitively, it is also

possible to obtain edge anonymity by adding edges. However,

directly adding counterfeit edges to the original graph will not

help, because adding more edges to EECs will increase linking

probabilities. (Although the added edges do not represent

real relationships, they cause those real relationships to be

identified more easily.) One option will be to start from the

graph that contains all vertexes but no edge, and then add

original edges into the graph one at a time as long as the edge

anonymity requirement is still satisfied. To preserve utility,

we will want to add back to the graph as many original

edges as possible. This method however can suffer from poor

performance if the original graph contains a large number of

vertexes and edges, and the majority of original edges will

remain in the anonymous graph. Thus, for practical reasons,

we do not consider algorithms that use edge addition.

A. Degree-based Edge Swap

Algorithm 1 takes as input a graph and a confidence threshold

and uses edge swap to obtain a τ-confident anonymous

graph if one can be found or an empty graph otherwise. The

goal is to find a graph that not only satisfies the privacy re-

quirement but also has a good utility. To achieve this goal, the

algorithm uses a greedy strategy to improve graph confidence,

namely, it focuses on reducing the size of the leading EEC,

which corresponds to the maximum linking probability of the

graph. Intuitively, reducing the size of the leading EEC may

improve graph confidence more quickly than reducing the size

of other EECs, and therefore result in fewer edges being swapped

and better utility of anonymous graphs.

Algorithm 2 Edge Deletion with Maximum Choice
Input: graph G = (V,E) and threshold τ
Output: τ-confident graph G′
Method:
1. G′ = G;
2. partition G′ by vertex degree;
3. while (confidence of G′ is less than τ) do
4.     for each edge e in the leading EEC do
5.         compute RMLP and IOLP of deleting e;
6.     delete the edge of maximum RMLP and minimum IOLP;
7. end while
8. return G′;

Algorithm 3 Edge Deletion with Random Choice
Input: graph G = (V,E) and threshold τ
Output: τ-confident graph G′
Method:
1. G′ = G;
2. partition G′ by vertex degree;
3. while (confidence of G′ is less than τ) do
4.     randomly delete an edge in the leading EEC;
5. return G′;

In each iteration (steps 3-8), the algorithm attempts to swap

the pair of edges that can lead to the biggest improvement of

the graph confidence. If such a pair of edges does not exist,

the algorithm terminates with an empty graph. To choose

the edges to swap, Algorithm 1 takes an edge from the leading

EEC and finds a second edge from an EEC that satisfies the

conditions of the following theorem.

Theorem 1: Let G be a graph partitioned using vertex degree and $G'$ be the graph obtained by a valid swap of two edges $e_1 \in E_{ij}$ and $e_2 \in E_{st}$, where $i \le j$, $s \le t$. Then, for each EEC $E_{xy}$ in $G'$ that receives any new edge, the corresponding linking probability $p'_{xy}$ is less than $p_{ij}$ if and only if indexes i, j, s, t contain at least two distinct values and each of these values appears at most twice, and one of the following conditions holds.

1) $i = j$, $s = t$, and $\alpha_{is} < \dfrac{\alpha_{ii} \cdot \beta_{is}}{\beta_{ii}} - 2$;

2) $i < j$ or $s < t$, and for $x \in \{i, j\}$, $y \in \{s, t\}$, $\alpha_{xy} < \dfrac{\alpha_{ij} \cdot \beta_{xy}}{\beta_{ij}} - 1$.

Proof: See Appendix.

Lemma 1: Given $E_{ij}$ and $E_{st}$ that satisfy the condition of Theorem 1, any pair of edges $e_1 \in E_{ij}$ and $e_2 \in E_{st}$ will reduce $p_{ij}$ by the same amount.

Proof: As indicated in the proof of Theorem 1, the reduction is independent of the choice of $e_1$ and $e_2$.

Intuitively, Theorem 1 guarantees that the swap of the appropriate pair of edges will always reduce the maximum linking probability $p_{ij}$ and will not cause other linking probabilities to become larger than $p_{ij}$. Lemma 1 indicates that as long as the appropriate EECs are determined, the choice of edges within these EECs does not make any difference provided a valid swap can be made.

B. Degree-based Edge Deletion

Algorithm 2 takes as input a graph and a confidence

threshold, and returns a τ-confident anonymous graph using

edge deletion.

To preserve utility, Algorithm 2 also focuses on deleting

edges of the leading EEC. To select an edge to delete, it

estimates the amount of decrease of the maximum linking

probability and the amount of increase of other linking

probabilities that will result from the deletion of the edge. The

edge that will be selected should result in the largest reduction

to the maximum linking probability (RMLP). If more than one

such edge exists, the one that results in the smallest increase of

other linking probabilities (IOLP) should be selected. Notice

that it is possible that the deletion of the selected edge

does not immediately improve the graph confidence. However,

the algorithm will not stop if this situation occurs because

by deleting more edges the maximum linking probability

will eventually be reduced, and a τ-confident graph will be

obtained. Furthermore, this can happen independent of the

order in which edges are deleted.

To efficiently estimate the reduction of the maximum linking

probability and the increase of other linking probabilities

resulting from deleting an edge in the leading EEC, the algorithm

does the following.

Suppose that the leading EEC is $E_{ij}$ and the edge (u,v) is under consideration, where $u \in C_i$, $v \in C_j$, and VEC $C_i$ contains vertexes of degree i. If (u,v) is deleted, u and v will be moved into VECs $C_{i-1}$ and $C_{j-1}$, respectively. If $i \ne j$, the deletion of the edge will decrease the sizes of $C_i$ and $C_j$ and increase the sizes of $C_{i-1}$ and $C_{j-1}$, each by one. If $i = j$, the deletion of the edge will decrease the size of $C_i$ and increase the size of $C_{i-1}$, each by two. As u and v are moved, so will edges incident to u or v. To be specific, consider an edge (u,w), where w is in some VEC $C_s$. Once u is moved, this edge will also be moved from EEC $E_{is}$ into EEC $E_{(i-1)s}$. This move will decrease the size of $E_{is}$ and increase the size of $E_{(i-1)s}$, each by one. The size changes of EECs affected by moving v can be determined similarly. Thus, the size changes of VECs and EECs affected by the edge deletion can be efficiently determined without actually moving vertexes and edges. These size changes can then be used to calculate RMLP and IOLP.
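The bookkeeping described above can be sketched in Python as a function that reports the size changes without modifying the graph. The function name `deletion_deltas` and the adjacency-dictionary input are assumptions of this example; how the paper turns these changes into RMLP and IOLP values is not shown here.

```python
from collections import Counter

def deletion_deltas(adj, u, v):
    """Size changes of VECs and EECs if edge (u, v) were deleted.

    adj maps each vertex to the set of its neighbours; it is not modified.
    Returns (vec_delta, eec_delta): vec_delta maps a degree to the change
    in the size of that VEC; eec_delta maps a sorted degree pair to the
    change in the size of that EEC.
    """
    deg = {x: len(ns) for x, ns in adj.items()}
    i, j = deg[u], deg[v]

    def key(a, b):
        return (a, b) if a <= b else (b, a)

    vec_delta = Counter()
    for d in (i, j):  # u leaves C_i for C_{i-1}; v leaves C_j for C_{j-1}
        vec_delta[d] -= 1
        vec_delta[d - 1] += 1

    eec_delta = Counter()
    eec_delta[key(i, j)] -= 1  # the deleted edge itself
    for x, other in ((u, v), (v, u)):
        for w in adj[x]:
            if w == other:
                continue
            # Edge (x, w) moves because the degree of x drops by one.
            eec_delta[key(deg[x], deg[w])] -= 1
            eec_delta[key(deg[x] - 1, deg[w])] += 1
    return vec_delta, eec_delta

adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
vec_delta, eec_delta = deletion_deltas(adj, "a", "b")
```

Only edges incident to u or v are visited, so apart from the degree table (which a real implementation would likely cache) the cost is proportional to the degrees of the two end vertexes.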

Algorithm 3 is an alternative edge deletion method, which

chooses a random edge, instead of the best edge, from the

leading EEC for deletion.
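A minimal Python sketch of the random-choice deletion loop is given below. It is not the authors' Java implementation; the helper `leading_eec`, the `seed` parameter, and the recomputation of the partition from scratch in every iteration are simplifying assumptions of this example.

```python
import random
from collections import Counter
from fractions import Fraction

def leading_eec(vertices, edges):
    """Return (maximum linking probability, edges of the leading EEC)
    under the degree-based partition."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    vec_size = Counter(degree[v] for v in vertices)
    groups = {}
    for u, v in edges:
        groups.setdefault(tuple(sorted((degree[u], degree[v]))), []).append((u, v))
    best_p, best_edges = Fraction(0), []
    for (i, j), es in groups.items():
        if i == j:
            beta = vec_size[i] * (vec_size[i] - 1) // 2
        else:
            beta = vec_size[i] * vec_size[j]
        p = Fraction(len(es), beta)
        if p > best_p:
            best_p, best_edges = p, es
    return best_p, best_edges

def random_edge_deletion(vertices, edges, tau, seed=None):
    """Delete random edges of the leading EEC until the graph is tau-confident."""
    rng = random.Random(seed)
    remaining = list(edges)
    while True:
        max_p, leading = leading_eec(vertices, remaining)
        if 1 - max_p >= tau:
            return remaining
        remaining.remove(rng.choice(leading))

vertices = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
anonymous = random_edge_deletion(vertices, edges, Fraction(1, 2), seed=0)
```

The loop always terminates for τ ≤ 1, because every iteration removes one edge and a graph without edges has confidence 1.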

V. EXPERIMENTS

In this section, we present results of our empirical study.

We implemented in Java the three algorithms described in

Section IV and used prefuse (http://prefuse.org/), an open

source graph package, for graph maintenance as well as for

graph operations, such as edge deletion and edge swap.

We performed experiments on three datasets. In addition

to the EPINION and COA datasets, we also used the KDDCUP [8]

dataset, which is a collection of abstracts of papers in high

energy physics published from 1992 through 2003. We

extracted from the data a graph, which has a vertex for each