Fast Change Point Detection on Dynamic Social Networks

Yu Wang∗, Aniket Chakrabarti∗, David Sivakoff#, Srinivasan Parthasarathy∗

∗Department of Computer Science and Engineering, #Department of Statistics

The Ohio State University, Columbus, Ohio, USA

Contact: wang.5205@osu.edu or srini@cse.ohio-state.edu

Abstract

A number of real world problems in many domains

(e.g. sociology, biology, political science and com-

munication networks) can be modeled as dynamic

networks with nodes representing entities of inter-

est and edges representing interactions among the

entities at different points in time. A common rep-

resentation for such models is the snapshot model -

where a network is deﬁned at logical time-stamps.

An important problem under this model is change

point detection. In this work we devise an effective and efficient three-step approach for detecting change points in dynamic networks under the

snapshot model. Our algorithm achieves up to 9X

speedup over the state-of-the-art while improving

quality on both synthetic and real world networks.

1 Introduction

Dynamic network analysis is increasingly used in complex

application domains ranging from social networks (Facebook network evolution [Leskovec and Rok Sosič, 2014])

to biological networks (protein-protein interaction [Shih and

Parthasarathy, 2012]), from political science (United Na-

tions General Assembly voting network [Voeten, 2012]) to

communication networks (Enron network [Klimt and Yang,

2004]). Such dynamic networks are often represented using

the snapshot model. Under this model, every network snap-

shot (represented by a graph) is deﬁned at a logical times-

tamp. Two questions of fundamental importance are – (i) how

does a network evolve? (ii) when does a network change significantly enough to arouse suspicion that something fundamentally different is happening?

Various generative models [Peixoto and Rosvall, 2015; Zhang et al., 2016] have been proposed to address question (i), that is, to explain the evolution of a network. They study network evolution under certain generative models [Erdős and Rényi, 1960; Karrer and Newman, 2011]. In reality, the generative model itself might change, as addressed in question (ii) above. Existing work [Akoglu et al., 2014; Ranshous et al., 2015] uses complex methods to detect such changes. One drawback of those delicate methods is that they

are time-consuming, and hence often not scalable (in terms of

both network size and number of snapshots). We seek to ﬁnd

an efﬁcient and effective solution that can scale up both with

network size and with number of snapshots.

In this paper, we present a simple and efﬁcient algo-

rithm based on likelihood maximization to detect change

points in dynamic networks under the snapshot model. We

demonstrate the utility of our algorithm on both synthetic

and real world networks drawn from political science (con-

gressional voting, UN voting), and show that it outperforms

two recent approaches (DeltaCon [Koutra et al., 2016] and LetoChange [Peel and Clauset, 2015]) in terms of both quality and efficiency. Our work makes the following contributions:

1. Our approach is general purpose – it can accommodate

various snapshot generative models (see Table 1).

2. We model network evolution as a ﬁrst order Markov pro-

cess and consequently our algorithm accounts for the

temporal dependency while computing the dissimilarity

between snapshots.

3. Our algorithm is efﬁcient and has constant memory over-

head that can be tuned by a user controlled parameter.

We extensively evaluate our approach on synthetic as well as real world networks and show that it is both fast (in running time) and accurate (in detection quality).

2 Related Work

Ranshous et al. [Ranshous et al., 2015] and Akoglu et al. [Akoglu et al., 2014] recently surveyed network anomaly detection. Our change point detection problem is similar to Type 4, “Event and Change Detection”, of the former: given

a network sequence, a dissimilarity scoring function, and a

threshold, a change is ﬂagged if the dissimilarity of two con-

secutive snapshots is above the threshold. We differ in that we

assume there is a latent generation model governing the net-

work dynamics, and we are trying to detect the change in the

latent space, while they did not explicitly mention the latent

generation model. Moreover, we consider the temporal de-

pendency across the snapshots while no work in the surveys

accounted for temporal dependency.

DeltaCon [Koutra et al., 2016] uses a graph similarity-based [Berlingerio et al., 2012] approach to detect change points in dynamic networks. It derives the features of a snapshot based on sociological theories, and then calculates the feature similarity of each pair of consecutive snapshots. That work is model agnostic (it makes no assumption about the generation model of networks), and is the state of the art in terms of efficiency. We compare our algorithm against it.

arXiv:1705.07325v2 [cs.SI] 4 Jun 2017

Moreno and Neville [Moreno and Neville, 2013], Bridges et al. [Bridges et al., 2015] and Peel and Clauset [Peel and Clauset, 2015] develop approaches based on network hypothesis testing. The advantage is that one can get a p-value of the

test, which quantiﬁes the conﬁdence of the conclusion. How-

ever, these approaches have two shortcomings: ﬁrstly, they

need to assume a speciﬁc generation model of the networks

(mKPGM, GBTER and GHRG respectively); secondly, they

are extremely slow, mostly due to the bootstrapping for p-

value calculation. La Fond et al.’s work [La Fond et al., 2014]

can also generate a p-value. It is tested against DeltaCon without reporting running time, and the paper itself mentions efficiency as a concern. These algorithms will not work in our setting, where detection is done in real time under bounded memory constraints. We compare our model-agnostic algorithm against [Peel and Clauset, 2015].

The DAPPER heuristic [Caceres and Berger-Wolf, 2013]

proposes a similar edge probability estimator as ours. How-

ever, it does not consider the temporal dependency of snap-

shots. Moreover, it focuses on temporal scale determination

while ours focuses on change point detection. Loglisci et al.

[Loglisci et al., 2015] study change point detection on relational networks using rule-based analysis. Our approach uses (hidden) parameter estimation instead of semantic rules to infer the structure. Li et al. [Li et al., 2016] propose an online algorithm and consider temporal dependency. The problem they study differs from ours in that they study information diffusion on a network with fixed structure and use continuous time. A recent work by Zhang et al. [Zhang et al., 2016] also studies dynamic networks in a Markov chain

setting. They focus on community detection while we focus

on change point detection.

3 Problem Formulation

This paper studies how to detect the times at which the funda-

mental evolution mechanism of a dynamic network changes.

We assume that there is some unknown underlying model that

governs the generative process. Our change point detection

algorithm is agnostic to this model. We assume that the ob-

served network snapshots are samples that depend on some

generative model and the previous snapshot. Networks have

ﬂuctuation across snapshots even when the generative model

stays unchanged. Only when the generative model changes

do we consider it a fundamental change. We represent the

evolutionary process as a Markov Network (Figure 1).

In Figure 1, $M_t$ is the network generation model at time $t$. It is a triple $M_t = \langle \mathrm{Type}_t, \Theta_t, \alpha_t \rangle$, where $\alpha_t$ is the continuity parameter at time $t$, $\mathrm{Type}_t$ specifies the model, and $\Theta_t$ represents the model parameters (Table 1 lists some generative models we experiment on). $G_t$ is the network (graph) observable at time $t$. We assume the number of vertices in $G_t$ is fixed to be $N$ for all snapshots (the union of all nodes is used when there is node addition/deletion, as in [Peel and Clauset, 2015]), so each $G_t$ has $2^{\binom{N}{2}}$ possible configurations, and $T$ is the total number of snapshots we observe.

[Figure 1: Representation of the underlying generative process. Latent row: $M_1, M_2, M_3, \ldots, M_t$; observed row: $G_1, G_2, G_3, \ldots, G_t$. Our inference is agnostic to the $M_t$s.]

As per Figure 1, the configuration of the network at time $t$, $G_t$, depends on the generation model at time $t$, $M_t$ (unobserved), and the network configuration at time $t-1$, $G_{t-1}$ (observed). Hence the networks in the observed sequence are samples from a conditional distribution (the samples are not independent). The continuity parameter $\alpha_t$ controls the fraction of edges and non-edges that are retained from the previous snapshot, $G_{t-1}$. The network at time $t$ is assumed to be generated in the following way: for each dyad, keep the connection status from time $t-1$ with probability $1-\alpha_t$, and with probability $\alpha_t$, resample the connection according to the generation model at time $t$. Consequently, the smaller $\alpha_t$ is, the more overlap there is between two consecutive snapshots. Note that two consecutive network configurations may differ substantially if $\alpha_t > 0$, even though the underlying generation model may be the same. Moreover, changes of the generation model are assumed to be rare across the time span ($M_t \neq M_{t+1}$ is a rare event).
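The generative step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `next_snapshot` and `simulate`, the representation of a snapshot as a set of node pairs, and the choice to draw the first snapshot directly from the model are ours, not the paper's.

```python
import random
from itertools import combinations


def next_snapshot(prev_edges, nodes, alpha, edge_prob, rng):
    """Generate G_t from G_{t-1}: each dyad keeps its status with
    probability 1 - alpha and is resampled from the model otherwise."""
    new_edges = set()
    for dyad in combinations(nodes, 2):
        if rng.random() < alpha:
            # Resample the dyad from the generative model M_t.
            if rng.random() < edge_prob(dyad):
                new_edges.add(dyad)
        elif dyad in prev_edges:
            # Keep the connection status from time t-1.
            new_edges.add(dyad)
    return new_edges


def simulate(n_nodes, n_snapshots, alpha, edge_prob, seed=0):
    """Return a list of snapshots, each a set of (i, j) pairs with i < j."""
    rng = random.Random(seed)
    nodes = range(n_nodes)
    # Assumption: the first snapshot is drawn directly from the model.
    current = {d for d in combinations(nodes, 2) if rng.random() < edge_prob(d)}
    snapshots = [current]
    for _ in range(n_snapshots - 1):
        current = next_snapshot(current, nodes, alpha, edge_prob, rng)
        snapshots.append(current)
    return snapshots


# Erdős–Rényi model: every dyad has the same edge probability p = 0.3.
snapshots = simulate(n_nodes=30, n_snapshots=5, alpha=0.49,
                     edge_prob=lambda dyad: 0.3)
```

Passing a different `edge_prob` callable (for example one that looks up a community pair) would give the other models of Table 1; with `alpha=0.0` every snapshot is a copy of the first.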

Problem Definition. Our goal is to efficiently find a set $S \subset \{2, \ldots, T\}$ such that $t \in S \iff M_t \neq M_{t-1}$, that is, to efficiently find all the time points at which the network generation model is different from the previous time point.

4 Methodology

Given the graphical formulation of the problem, exact infer-

ence is impossible since we do not know the underlying gen-

erative model, and our observations are stochastic. However,

even without prior knowledge of the generative model, we

can still design an approximate inference technique based on

MCMC sampling theory.

The framework is straightforward, as mentioned in Sec-

tion 2: we ﬁrst extract a “feature vector” from each snapshot,

then quantify the dissimilarity between consecutive snapshots, and flag a change point when the dissimilarity score is above a threshold. We use the joint edge probability as the “feature vector” (Section 4.1), use the Kolmogorov-Smirnov statistic, Kullback-Leibler divergence and Euclidean distance as dissimilarity measures (Section 4.2), and use a permutation-test-like approach to determine the threshold (Section 4.3).

4.1 Edge Probability Estimation

In this subsection, we discuss how to (approximately) estimate the joint distribution of the dyads¹. We track the presence or absence of a small fixed number of dyads throughout the entire observed sequence of network snapshots. We break down the observation sequence into fixed-length windows, and for each window we infer the joint distribution of the dyads in our sample. We model each dyad to be a conditionally independent two-state Markov chain (Figure 2; we use $\alpha$ instead of $\alpha_t$ in this section for brevity) given the sequence of generative models $(M_t)_{t \ge 0}$ (this conditional independence assumption is satisfied for the choices of models in Table 1). Note that even for generative processes that may result in greater dependence among dyads (such as the configuration model), in many cases such dependence will be local, and if the number of dyads sampled is small, then these dyads will be spread out enough to be considered independent. Moreover, the conditional independence assumption significantly improves computational efficiency ([Hunter et al., 2012]). The marginal probabilities of these dyads can then be estimated using the observed samples within each time window.

¹ We refer to node pairs, which may or may not be linked, as dyads.

Table 1: Edge probability between a dyad in each model

| Model | Edge probability | Explanation |
|---|---|---|
| Erdős–Rényi (ER) | $p(\langle n_i, n_j \rangle \mid M) = p$ | $p$: edge probability |
| Chung–Lu (CL) | $p(\langle n_i, n_j \rangle \mid M) = \beta w_i w_j / \sum_i w_i$ | $w_i$: weight of node $i$ ([Pfeiffer III et al., 2012]); $\beta$: edge density |
| Stochastic Block Model (SBM) | $p(\langle n_i, n_j \rangle \mid M) = p(c_i, c_j)$ | $c_i$: community assignment of node $n_i$; $p(r, s)$: probability of edges between communities $r$ and $s$ |
| SBM-CL | $p(\langle n_i, n_j \rangle \mid M) \propto p(c_i, c_j)\, w_i w_j$ | notation as above |
| BTER | $p(\langle n_i, n_j \rangle \mid M) = p_{ER}\, I[c_i = c_j] + p_{CL}\, I[c_i \neq c_j]$ | Intra-community edge probability follows ER, inter-community CL [Seshadhri et al., 2012]; $I[\cdot]$ is the indicator function |

Table 2: Notation Table

| Notation | Explanation |
|---|---|
| $N$ | network size, in terms of the number of nodes |
| $T$ | number of snapshots |
| $t$ | time stamp, $t \in \{1, \ldots, T\}$ |
| $S$ | set of all change points |
| $M_t$ | (unknown) generative model at time stamp $t$ |
| $G_t$ | snapshot at time stamp $t$ |
| $\alpha, \alpha_t$ | continuity rate (at time $t$) |
| $s$ | window size, or number of snapshots in a window |
| $W_t$ | a window of size $s$ ending at time $t$ |
| $\eta$ | step size of the sliding windows |
| $w$ | number of windows: $\eta = 1 \rightarrow w = T-1$; $\eta = s \rightarrow w = \lceil T/s \rceil$ |
| $e_j$ | connection status of dyad $j$ |
| $\bar{M}$ | averaged number of edges in each snapshot: $\bar{M} = \sum_{t=1}^{T} \sum_{j=1}^{\binom{N}{2}} e_j^{(t)} / T$ |
| $p_j, p_j^{(t)}$ | connection probability of dyad $j$ (at time $t$) |
| $k$ | number of dyads to be sampled and tracked |
| $N_{01}^{(j)}$ | number of flips from 0 to 1 of dyad $j$ during the period of interest |
| $N_{0*}^{(j)}$ | $N_{0*}^{(j)} := N_{00}^{(j)} + N_{01}^{(j)}$, $N_{0*}^{(j)} + N_{1*}^{(j)} \equiv s-1$ |
| $N_{**}^{(j)}$ | $N_{00}^{(j)}, N_{01}^{(j)}, N_{10}^{(j)}, N_{11}^{(j)}$ |
| $N_0^{(j)}$ | number of disconnected occurrences of dyad $j$ in the period of interest |

We formalize the estimation procedure below. Given a network sequence $G_1, \ldots, G_T$, we group the networks into sliding windows. We define $W_t$ to be a subsequence of $s$ consecutive observed networks ending at network $G_t$, so $W_t \equiv (G_{t-s+1}, G_{t-s+2}, \ldots, G_t)$. We use equal sized sliding windows with a step size $\eta$, and we obtain a window sequence $W_s, W_{s+\eta}, \ldots, W_{s+(i-1)\eta}, \ldots, W_T$. The non-overlapping window setting uses $\eta = s$. In each window $i$, we can estimate the joint edge distribution (for the selected dyads) $P(e_1, e_2, \ldots, e_k \mid M_{s+(i-1)\eta})$, where $e_j = 1$ indicates an undirected edge between the $j$-th dyad, and $k \ll \binom{N}{2}$ is the number of dyads tracked. For each of the models in Table 1, the joint distribution can be factorized into $P(e_1, e_2, \ldots, e_k \mid M_{s+(i-1)\eta}) = \prod_j P(e_j \mid M_{s+(i-1)\eta})$ (conditional independence, see the method description above).

[Figure 2: Two state Markov Chain. States 0 and 1; the transition $0 \to 1$ has probability $\alpha p$ and the transition $1 \to 0$ has probability $\alpha(1-p)$; the self-loops have probabilities $1-\alpha p$ (state 0) and $1-\alpha(1-p)$ (state 1).]

We can view a dyad across time as a two state Markov chain, and the chain length is the window size. We call a dyad across time a chain in the following text for brevity. Let $p_j \equiv P(e_j = 1 \mid M_{s+(i-1)\eta})$, $q_j \equiv 1 - p_j$, and suppose we are interested in $k$ chains.
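As a concrete illustration of the window and chain bookkeeping, the sketch below slices a snapshot sequence into windows and counts the four transition types of a chain. The helper names (`windows`, `dyad_chain`, `transition_counts`) and the set-of-pairs snapshot representation are assumptions made for this example only.

```python
def windows(snapshots, s, eta):
    """Yield windows of s consecutive snapshots, ending at positions
    s, s + eta, s + 2*eta, ... (1-based), i.e. W_s, W_{s+eta}, ..."""
    for end in range(s, len(snapshots) + 1, eta):
        yield snapshots[end - s:end]


def dyad_chain(window, dyad):
    """Connection status (0 or 1) of one dyad in every snapshot of a window."""
    return [1 if dyad in graph else 0 for graph in window]


def transition_counts(chain):
    """Count the transitions N_00, N_01, N_10, N_11 of a binary chain."""
    counts = {"00": 0, "01": 0, "10": 0, "11": 0}
    for prev, curr in zip(chain, chain[1:]):
        counts[f"{prev}{curr}"] += 1
    return counts


chain = [0, 0, 1, 1, 0, 1]
counts = transition_counts(chain)
# A chain of length s has s - 1 transitions in total.
assert sum(counts.values()) == len(chain) - 1
```

Setting `eta == s` gives the non-overlapping windows mentioned above.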

1) Maximum Likelihood Estimator (MLE)

The joint probability of the chains is (Figure 2)

$$P\big(N_{**}^{(j=1,\ldots,k)} \mid \alpha, \vec{p}\big) = c_k \prod_{j=1}^{k} (\alpha p_j)^{N_{01}^{(j)}} (\alpha q_j)^{N_{10}^{(j)}} (1-\alpha p_j)^{N_{00}^{(j)}} (1-\alpha q_j)^{N_{11}^{(j)}} \tag{1}$$

where $N_{01}^{(j)}$ is the number of transitions from 0 to 1 for a chain (non-edge to edge for the dyad $j$ within the window), $c_k$ stands for the combinatorial coefficients independent of $\alpha, \vec{p}$, and $\sum_{p,q \in \{0,1\}} N_{pq}^{(j)} = s-1$ for all $j$. Hence the log-likelihood (omitting the coefficient $c_k$) is:

$$L\big(\alpha, \vec{p} \mid N_{**}^{(j)}\big) = \sum_{j=1}^{k} \Big[ N_{01}^{(j)} \ln(\alpha p_j) + N_{10}^{(j)} \ln(\alpha q_j) + N_{00}^{(j)} \ln(1-\alpha p_j) + N_{11}^{(j)} \ln(1-\alpha q_j) \Big] \tag{2}$$
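For readers who want to evaluate Equation (2) numerically, a minimal sketch follows. The function name `log_likelihood` and the dictionary layout of the counts are illustrative choices, and the code assumes $0 < \alpha \le 1$ and $0 < p_j < 1$ so that every logarithm is defined.

```python
import math


def log_likelihood(alpha, p, counts):
    """Log-likelihood of Equation (2), omitting the constant ln(c_k).

    p      -- list of edge probabilities p_j, one per chain
    counts -- list of dicts with keys "00", "01", "10", "11" (one per chain)
    Assumes 0 < alpha <= 1 and 0 < p_j < 1.
    """
    total = 0.0
    for p_j, n in zip(p, counts):
        q_j = 1.0 - p_j
        total += (n["01"] * math.log(alpha * p_j)
                  + n["10"] * math.log(alpha * q_j)
                  + n["00"] * math.log(1.0 - alpha * p_j)
                  + n["11"] * math.log(1.0 - alpha * q_j))
    return total


example_counts = [{"00": 1, "01": 2, "10": 1, "11": 1}]
value = log_likelihood(0.5, [0.5], example_counts)
```

Such a function could be handed to a generic numerical optimizer, which is the (more expensive) route the text contrasts with the closed-form approximations.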

MLE for a single chain. First consider the case of a single chain. Solving the zero-derivative Equations (4) and (5) leads to estimators of $\alpha$ and $p$. These estimators indeed lead to a negative definite Hessian, and are therefore the MLE. Hence we have

$$\hat{\alpha}_{MLE} = \frac{N_{01} N_{1*} + N_{10} N_{0*}}{N_{0*} N_{1*}}, \qquad \hat{p}_{MLE} = \frac{N_{01} N_{1*}}{N_{01} N_{1*} + N_{10} N_{0*}} \tag{3}$$
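Equation (3) translates directly into code. The sketch below is a hedged illustration: the name `single_chain_mle` and the decision to return `None` when the chain never starts a transition from one of the two states (so that a denominator vanishes) are our assumptions, not part of the paper.

```python
def single_chain_mle(n00, n01, n10, n11):
    """Single-chain MLE of (alpha, p) following Equation (3).

    Returns None when N_{0*} or N_{1*} is zero, in which case the
    estimators are undefined (handling chosen for this sketch).
    """
    n0_star = n00 + n01  # transitions that start in state 0
    n1_star = n10 + n11  # transitions that start in state 1
    if n0_star == 0 or n1_star == 0:
        return None
    flips = n01 * n1_star + n10 * n0_star
    alpha_hat = flips / (n0_star * n1_star)
    p_hat = n01 * n1_star / flips
    return alpha_hat, p_hat


# 12 transitions: 8 start in state 0 (2 of them flip), 4 start in state 1 (1 flips).
estimate = single_chain_mle(n00=6, n01=2, n10=1, n11=3)
```

Equivalently, $\hat{\alpha}_{MLE}$ is the sum of the two empirical flip rates $N_{01}/N_{0*}$ and $N_{10}/N_{1*}$, so for short chains it is not guaranteed to fall inside $[0, 1]$.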

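MLE for multiple chains. The MLE for multiple chains essentially involves solving a high degree polynomial, which in general does not have a closed form solution.

$$\frac{\partial L}{\partial p_j} = \frac{N_{01}^{(j)}}{p_j} - \frac{N_{10}^{(j)}}{q_j} - \frac{\alpha N_{00}^{(j)}}{1-\alpha p_j} + \frac{\alpha N_{11}^{(j)}}{1-\alpha q_j} = 0 \tag{4}$$

$$\frac{\partial L}{\partial \alpha} = \sum_j \left[ \frac{N - N_{00}^{(j)} - N_{11}^{(j)}}{\alpha} - \frac{p_j N_{00}^{(j)}}{1-\alpha p_j} - \frac{q_j N_{11}^{(j)}}{1-\alpha q_j} \right] = 0 \;\overset{\alpha \neq 0}{\Longrightarrow}\; \sum_j \left[ \frac{N_{00}^{(j)}}{1-\alpha p_j} + \frac{N_{11}^{(j)}}{1-\alpha q_j} \right] = kN \tag{5}$$

where $1-\alpha$ is the continuity rate, and $N$ in Equation (5) denotes the number of transitions per chain, $N = s-1$ (so that $N - N_{00}^{(j)} - N_{11}^{(j)} = N_{01}^{(j)} + N_{10}^{(j)}$), not the network size. If $\alpha = 0$ then all snapshots are identical, which is uninteresting, so we have $\alpha \neq 0$ in Equation (5).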

Combining Equations (4) and (5), one gets a high order polynomial in $p_j$, which in general does not have closed-form solutions by the Abel-Ruffini theorem. We tried solving a special case with two chains using Wolfram Mathematica [mat, ]. The solutions (of two quartic functions) turn out to be very complicated and take over 40 pages. A common way to solve such maximization problems is to employ numerical methods such as gradient descent. The drawback of such an approach is that it can be computationally expensive with hundreds of dyads and windows. Therefore, we settle for an approximation of the MLE that empirically approximates the numerical values well². Intuitively, the estimator for $\alpha$ should depend on all the chains, but chains that spend more time in both states 0 and 1 provide more information about $\alpha$ than chains that spend most of their time in one state (the latter may be due to a small $\alpha$ or to a value of $p_j$ far from 1/2). Since we can easily compute the MLE of $\alpha$ for a single chain, we estimate $\alpha$ with a weighted average of the MLEs from the individual chains, with chains that spend more time in both states being weighted more heavily. We then estimate each $p_j$ by the MLE for the $j$th chain, since the chains are conditionally independent given $\alpha$. This results in the following estimators.

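$$\hat{\alpha}_{approx} = \sum_j w_j \hat{\alpha}_j^{MLE} = \sum_j w_j \frac{N_{01}^{(j)} N_{1*}^{(j)} + N_{10}^{(j)} N_{0*}^{(j)}}{N_{0*}^{(j)} N_{1*}^{(j)}}, \qquad \hat{p}_j^{approx} = \hat{p}_j^{MLE} = \frac{N_{01}^{(j)} N_{1*}^{(j)}}{N_{01}^{(j)} N_{1*}^{(j)} + N_{10}^{(j)} N_{0*}^{(j)}} \tag{6}$$

where $w_j = \dfrac{[N_{0*}^{(j)} N_{1*}^{(j)}]^p}{\sum_j [N_{0*}^{(j)} N_{1*}^{(j)}]^p}$, and $N_{0*}^{(j)} = N_{00}^{(j)} + N_{01}^{(j)}$.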

Empirically we find that the exponent $p = \infty$ works best (this exponent is unrelated to the edge probabilities $p_j$), which means $\hat{\alpha}_{approx}$ is a simple average of the $\hat{\alpha}_j^{MLE}$ corresponding to the chains with maximal value of $N_{0*}^{(j)} N_{1*}^{(j)}$. The continuity rate describes the temporal dependency among networks, and can help us determine a proper window size.

² At significance level 0.05, a two-sample t-test finds no significant difference between the approximated values and the numerical values.
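The $p = \infty$ case of Equation (6) can be sketched as follows. The function name `approx_estimators`, the tuple layout of the counts, and the treatment of degenerate chains (weight 0 and no $p_j$ estimate) are assumptions made for this illustration.

```python
def approx_estimators(counts):
    """Approximate multi-chain estimators of Equation (6) with exponent p = infinity.

    counts -- list of (n00, n01, n10, n11) tuples, one per chain.
    Returns (alpha_hat, [p_hat_j, ...]); chains whose single-chain MLE is
    undefined get weight 0 and p_hat_j = None (a choice made for this sketch).
    """
    per_chain = []  # (weight base N_{0*} N_{1*}, alpha_j, p_j)
    for n00, n01, n10, n11 in counts:
        n0_star, n1_star = n00 + n01, n10 + n11
        flips = n01 * n1_star + n10 * n0_star
        if n0_star * n1_star == 0 or flips == 0:
            per_chain.append((0, None, None))
            continue
        per_chain.append((n0_star * n1_star,
                          flips / (n0_star * n1_star),
                          n01 * n1_star / flips))
    best = max(weight for weight, _, _ in per_chain)
    p_hats = [p_j for _, _, p_j in per_chain]
    if best == 0:
        return None, p_hats
    # p = infinity: only the chains with maximal N_{0*} N_{1*} contribute.
    top = [alpha_j for weight, alpha_j, _ in per_chain if weight == best]
    return sum(top) / len(top), p_hats


alpha_hat, p_hats = approx_estimators([(6, 2, 1, 3), (10, 1, 0, 1), (4, 2, 2, 4)])
```

In this example the third chain has the largest product $N_{0*} N_{1*} = 36$, so it alone determines the estimate of $\alpha$, while each $p_j$ comes from its own chain.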

Drawbacks of MLE. Though MLEs are consistent in general, there is no guarantee of unbiasedness for these particular estimators with limited samples. Moreover, they involve three random quantities ($N_{01}, N_{10}, N_{1*}, N_{0*}$ in (6) have 3 degrees of freedom for fixed $s$) and hence require more samples to estimate, making them prohibitive in practice.

2) Simpliﬁed Estimator

To overcome the drawbacks of the MLE, we propose a simple estimator for the edge probability which is consistent and unbiased, has only one random quantity and therefore requires fewer samples. The simple estimator essentially estimates the edge frequency in each window. If we know that changes happen rarely, and the process stays in equilibrium most of the time, we can show the following estimator to be consistent and unbiased in equilibrium:

$$\hat{p}_j^{eq} \equiv \frac{N_{1*}^{(j)} + e_j^{[s+(i-1)\eta]}}{s} = \frac{\#\text{ of 1s in the chain}}{s} \equiv \frac{N_1^{(j)}}{s} \tag{7}$$

which is the proportion of snapshots within the window in which dyad $j$ is an edge.
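The edge-frequency estimator of Equation (7) needs only one pass over a window. A minimal sketch, assuming (as an illustrative convention) that each snapshot is stored as a set of node pairs and that `edge_frequency` is a hypothetical helper name:

```python
def edge_frequency(window, dyads):
    """Equation (7): for each tracked dyad, the fraction of snapshots in
    the window in which the dyad is an edge."""
    s = len(window)
    return [sum(1 for graph in window if dyad in graph) / s for dyad in dyads]


window = [{(0, 1)}, {(0, 1), (1, 2)}, set(), {(0, 1)}]
tracked = [(0, 1), (1, 2), (0, 2)]
p_hat = edge_frequency(window, tracked)
```

The resulting vector of per-dyad frequencies plays the role of the "feature vector" that is compared across consecutive windows.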

Proposition 1 In equilibrium, $\hat{p}_j^{eq}$ is consistent as chain length (window size) increases.

Proof. Let $\pi_0^{(j)} \equiv P(\text{non-edge of chain } j \text{ in equilibrium})$, with $\pi_1^{(j)}$ defined analogously for the edge state, $s \equiv N_0^{(j)} + N_1^{(j)}$ (fixed), and $N_1^{(j)} \equiv$ # of 1s in the chain. By the ergodic theorem [Givens and Hoeting, 2012],

$$\pi_1^{(j)} \overset{\text{a.s.}}{=} \lim_{s \to \infty} N_1^{(j)} / s = \lim_{s \to \infty} \hat{p}_j^{eq},$$

where the first equality means almost sure convergence, and implies convergence in probability (the estimator being consistent).

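Proposition 2 In equilibrium, $\hat{p}_j^{eq}$ is unbiased.

Proof. Let $p_{01}^{(j)} \equiv p_j \alpha$ and $p_{10}^{(j)} \equiv (1-p_j)\alpha$ (Figure 2). Then

$$\begin{aligned}
\pi_0^{(j)} p_{01}^{(j)} + \pi_1^{(j)} p_{11}^{(j)} = \pi_1^{(j)} &\implies \pi_0^{(j)} p_{01}^{(j)} = \pi_1^{(j)} p_{10}^{(j)} \\
&\implies \pi_0^{(j)} \alpha p_j = \pi_1^{(j)} \alpha (1-p_j) \\
&\implies p_j = \pi_1^{(j)} = E\, N_1^{(j)} / (N_1^{(j)} + N_0^{(j)}) = E\, N_1^{(j)} / s \\
&\implies E\, \hat{p}_j^{eq} = E\, N_1^{(j)} / s = p_j \implies \hat{p}_j^{eq} \text{ is unbiased}
\end{aligned}$$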

The above propositions imply that the larger the window

size the better the estimation, and that in equilibrium, the

temporal dependency (continuity rate) has no impact on es-

timating the onset probability of a Markov chain, and hence

no impact on estimating the edge probability of a snapshot.

Although the MLE is close to the true value when the chain is long enough (100 or longer), we do not use such a large window size in practice (usually 20, no larger than 50). Experiments

(Figure 3) show that the simpliﬁed estimator is much better

than MLE for change point detection in practice.

Figure 3: Comparison between MLE (red, top) and Simplified Estimator (green, bottom) for change point detection on the same sequence as Figure 4 and Table 3. Distance measure KL is used and other measures have consistent results; horizontal bars are corresponding thresholds. MLE has large fluctuation, and increasing window size reduces fluctuation; MLE misses true changes 3, 4 and 5, while the edge frequency estimator has perfect recall and precision (see Section 5 for detail). (re-scaled and shifted for visualization)

4.2 Distance measure

Now, we need to compare the probability distributions of edges across consecutive windows. The Kolmogorov-Smirnov (KS) statistic and the Kullback-Leibler (KL) divergence are two common measures for comparing distributions. Their calculation requires enumerating the whole state space and is hence exponential in the number of variables for joint distributions. Although the KS statistic is designed for univariate distributions, we can map the joint distribution, which has multivariate binary variables, to one dimension by encoding each binary vector as an integer, and then use the KS statistic. We bootstrap from the empirical distributions of two consecutive windows respectively and use the two-sample KS test to quantify the difference between the two distributions. We can use divide-and-conquer to alleviate the exponential complexity: partition the dyads into $g$ groups, compute the KL/KS dissimilarity within each small group, and record the median among all the $g$ groups as the final dissimilarity.

Both of the above measures have good quality in terms of change point detection (Figure 4), but the KS statistic is extremely slow (Table 4), mostly due to the large bootstrap sample drawn from each window. Euclidean distance, though it lacks a probabilistic interpretation, has linear complexity and reasonable quality in practice.
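To make the group-and-median idea concrete, here is a hedged sketch of the KL and Euclidean dissimilarities between the edge-probability vectors of two consecutive windows. Several choices are assumptions of this example rather than details given in the text: the joint distribution within a group is taken to factorize over dyads (so the group KL is a sum of Bernoulli KL terms), probabilities are clipped by a small `eps` to avoid `log(0)`, groups are formed round-robin, and the divergence is taken from the new window to the old one.

```python
import math
from statistics import median


def bernoulli_kl(p, q, eps=1e-6):
    """KL divergence between Bernoulli(p) and Bernoulli(q), with clipping."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def grouped_kl(p_old, p_new, n_groups):
    """Median over groups of the within-group KL divergence.

    Assumes independent dyads, so the KL divergence of a group's joint
    distribution is the sum of its per-dyad Bernoulli KL divergences.
    """
    k = len(p_old)
    scores = []
    for g in range(n_groups):
        members = range(g, k, n_groups)  # round-robin partition of the dyads
        scores.append(sum(bernoulli_kl(p_new[i], p_old[i]) for i in members))
    return median(scores)


def euclidean(p_old, p_new):
    """Euclidean distance between two edge-probability vectors (linear time)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_old, p_new)))


old = [0.2, 0.5, 0.7, 0.1]
new = [0.6, 0.5, 0.2, 0.1]
kl_score = grouped_kl(old, new, n_groups=2)
eu_score = euclidean(old, new)
```

Taking the median over groups damps the influence of a few dyads with extreme estimates, which is one plausible reason for the smaller fluctuation the paper reports for the KL variant.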

4.3 Threshold Determination

Suppose we have $w$ windows; then we compare $w-1$ pairs of distributions and get $w-1$ difference/distance scores. How do we choose a threshold to determine at which window the network changes? We use a permutation test [Pitman, 1937] based approach to determine the threshold. For a desired significance level $\alpha_s$, we bootstrap from the $w-1$ distance scores, and use the upper $100\alpha_s\%$ quantile as the threshold.
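One possible reading of this thresholding step is sketched below: resample the distance scores with replacement, take the upper $100\alpha_s\%$ quantile of each resample, and average these quantiles. The averaging, the number of resamples, and the names `bootstrap_threshold` and `flag_changes` are assumptions of this sketch; the text does not spell out these details.

```python
import random


def upper_quantile(values, alpha_s):
    """Value below which roughly a fraction 1 - alpha_s of the values fall."""
    ordered = sorted(values)
    index = min(int((1.0 - alpha_s) * len(ordered)), len(ordered) - 1)
    return ordered[index]


def bootstrap_threshold(scores, alpha_s=0.05, n_resamples=1000, seed=0):
    """Bootstrap estimate of the upper 100*alpha_s % quantile of the scores."""
    rng = random.Random(seed)
    estimates = [
        upper_quantile(rng.choices(scores, k=len(scores)), alpha_s)
        for _ in range(n_resamples)
    ]
    return sum(estimates) / len(estimates)


def flag_changes(scores, threshold):
    """Indices of consecutive-window comparisons whose score exceeds the threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


scores = [0.1] * 19 + [9.0]
threshold = bootstrap_threshold(scores)
flagged = flag_changes(scores, threshold)
```

Because change points are assumed to be rare, most scores come from the "no change" regime, so an upper quantile of the score distribution serves as a data-driven cut-off.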

4.4 Complexity Analysis

The algorithm is linear in the number of windows and constant in the network size for moderately large networks. Only a small fraction of dyads in the network is sampled and tracked. The sampling of the dyads is performed only once at the beginning, and its cost is hence independent of the number of snapshots. For each snapshot, selecting a specific set of dyads has a cost linear in the number of edges. Each window is scanned only once, and therefore the time cost is linear in the number of windows. Moreover, since the number of windows is linear in the total number of snapshots even in the worst case (windows are overlapping, and the window step is one), the algorithm is linear in the number of snapshots. Therefore, the time complexity is $O(\bar{M} T)$, where $\bar{M}$ is the average number of edges in each snapshot, and $T$ is the number of snapshots.

The memory cost is low, and can be viewed as constant: for each snapshot, only the information of the tracked dyads is stored; information of dyads within the same window is aggregated; dyad information in the old window is overwritten once it has been compared against the new window. The space complexity is $O(c)$, where $c$ is a prescribed sample size. Theoretically the sample size should be proportional to the network size for good estimation. Our experiments show that a fixed sample size (tracking 250 out of $\binom{50\text{k}}{2} \approx 1.2$G dyads) works well on a moderately large network.

[Figure 4, panel (a): Likelihood (ground truth). Panel (b): Algorithms comparison. Curves are dissimilarity scores and horizontal bars are thresholds, with corresponding colors and line shapes. (re-scaled and shifted for visualization)]

Figure 4: SBM, ground truth changes explained in Table 3. $1-\alpha = 0.51$ and window size = 20. DeltaCon (Fig b-top) and EM-KL (Fig b-bottom) have the smallest variance, but DeltaCon has two false negatives at changes 4 and 5.

Table 3: Model Change Explanation for an SBM-CL Experiment in Figure 4

| Order | Window Index | Type of Change |
|---|---|---|
| 1 | 15 | The weight sequence of 1/3 of the nodes is re-generated |
| 2 | 30 | The weight sequence of 2/3 of the nodes is re-generated |
| 3 | 60 | Half of the communities change their (inter- and intra-community) connection rate, overall density retained |
| 4 | 75 | All of the communities change their (inter- and intra-community) connection rate, overall density retained |
| 5 | 90 | Half of the communities change their (inter- and intra-community) connection rate, overall density changed |
| 6 | 105 | All of the communities change their (inter- and intra-community) connection rate, overall density changed |
| 7 | 135 | Community assignments of all the nodes are changed |

5 Experiments And Results

We thoroughly evaluated our edge probability estimation based change point detection algorithm (called EdgeMonitoring for simplicity) on synthetic and real world datasets. For the synthetic datasets, the generative process is known, and we can compute the ground truth in the form of likelihood, which is a natural baseline. We also use the state-of-the-art DeltaCon [Koutra et al., 2016] and LetoChange [Peel and Clauset, 2015] as two baselines.

Figure 5: Change point detection on US Senate co-sponsorship

network. Change points at the 100th and the 104th Congresses

(boxed) correspond to partisan domination shifts. Both EM-KL

(green) and LetoChange (cyan) have perfect recall and precision,

while DeltaCon (pink) has 3 false positives and 1 false negative.

5.1 Synthetic Data

Data generation³. We generate a sequence of networks from a fixed generation model. The snapshots are not independent: each snapshot depends on the preceding one through the continuity parameter $\alpha$ ($\alpha_t \equiv \alpha$). For each snapshot, each edge is selected independently with probability $\alpha$, and if selected, the edge is resampled from the generative model (Table 1). We introduce the change points by changing the generative model in the middle of the sequence of snapshots. Note that this change may be simply a change of parameter values for a given model (e.g. ER$_{0.4}$ to ER$_{0.6}$), or a change in the model type (e.g. SBM to ER). Since our algorithm makes no assumptions about model specifics, we are able to detect both kinds of changes. We only inject parameter changes in the synthetic experiment since the latter kind of change is easily detectable. Sample changes are displayed in Table 3. The likelihood of the snapshot sequence is also provided.

³ Generated using SNAP [Leskovec and Rok Sosič, 2014].

We ran experiments with network sizes ranging from 1k to 50k, window sizes from 10 to 100, and continuity rates $1-\alpha$ of 0.51 and 0.9. We generated a total of 5000 snapshots and sampled 250 edges uniformly at random to track. Both overlapping windows ($s = 2\eta$) and non-overlapping windows give similar results, yet the latter are faster simply due to fewer windows. Hence we display non-overlapping window results only. For KL and KS, edges are grouped into 25 equal-sized groups. We use the upper 5% quantile as the threshold.

Results. Figure 4 shows the qualitative comparison and Table 4 reports the efficiency. Figure 4a shows that the likelihood of the network drops dramatically after the generative model changes, and recovers to a new equilibrium afterwards. Our EdgeMonitoring (EM-Eu, EM-KL) approach can successfully identify all change points with a 5X speedup over DeltaCon. The changes are explained in Table 3. We can see that EM-KL has the best performance: little fluctuation and perfect precision and recall. DeltaCon, though it has the smallest fluctuation, misses two change points. Both EM-KS and EM-Eu have large fluctuation. The quality of EM-KS heavily relies on the joint probability estimation, and we do see smaller fluctuation and higher recall for larger window sizes. EM-KL has the best overall performance, in terms of both quality and time efficiency. We believe that grouping together with median selection contributes to its superiority.

Table 4: Time efficiency comparison (5k snapshots)

| Model | Network Size | Window Size | EM Time¹ | EM-KS Time | DC Time (speedup) | LC Time |
|---|---|---|---|---|---|---|
| CL | 1k | 20 | 18s | 11h | 91s (5X) | DNF |
| SBM-CL | 1k | 10 | 27s | 22h | 125s (5X) | DNF |
| SBM-CL | 1k | 50 | 9s | 4.5h | 43s (5X) | DNF |
| SBM-CL | 5k | 20 | 54s | 11h | 309s (6X) | DNF |
| SBM-CL | 10k | 20 | 232s | 10h | 32m (8X) | DNF |
| SBM-CL | 50k | 20 | 26m | 10h | 4h (9X) | DNF |
| BTER² | 1k | 20 | 3s | 87m | 12s (4X) | 6h |
| Figure 4 | 1k | 20 | 21s | 6.5h | 103s (5X) | DNF |
| Figure 5 | 100 | biennial | 4s | 43m | 16s (4X) | 13h |
| [Voeten, 2012] | ≈200 | annual | 10s | 3h | 93s (9X) | DNF |
| Enron | 150 | weekly | 1s | 7.5h | 1s (1X) | 60h |

¹ EM for EdgeMonitoring (running time includes both KL and Euclidean), EM-KS for EdgeMonitoring with KS test, DC for DeltaCon, LC for LetoChange. EM and DC are implemented in MATLAB while LC is implemented in Python. All run on a commercial desktop with 48hrs as time limit. Each running time is averaged over 5 runs.

² BTER dataset has 800 snapshots.

5.2 Real World Data

Senate cosponsorship network ([Fowler, 2006]). We construct a co-sponsorship network from bills (co-)sponsored in the US Senate during the 93rd-108th Congresses. An edge is formed between two congresspersons if they cosponsored the same bill. Each bill corresponds to a snapshot, and forms a clique of co-sponsors. A window is set to include all bills in a single Congress (biennial).

We randomly selected 250 dyads and tracked their fluctuations across the Congresses. We start from the 97th Congress since full amendments data is available only from the 97th session onwards. Figure 5 compares EdgeMonitoring+KL, DeltaCon and LetoChange. All methods were able to detect the most significant change point at the 104th Congress. Fowler [Fowler, 2006] points out that there was a “Republican Revolution” in the 104th Congress which “caused a dramatic change in the partisan and seniority compositions.” The author also points out the significance of the 100th (highest clustering coefficient, significant collaboration) and 104th Congress (lowest clustering coefficient, low point in collaboration) as inflection points in the Senate political process. Both our EdgeMonitoring approach and LetoChange classify these two Congresses as change points, but the latter takes much more time. DeltaCon picks up on one (104th) and not the other (100th). This provides evidence that our algorithm is able to capture the changes in network evolution effectively while being significantly faster than the state of the art.

6 Conclusion

In this paper, we develop a change point detection algorithm

for dynamic networks that is efﬁcient and accurate. Our ap-

proach relies on sampling and comparing the estimated joint

edge (dyad) distribution. We ﬁrst develop a maximum likeli-

hood estimator, and analyze its drawbacks for small window

sizes (the typical case). We then develop a consistent and un-

biased estimator that overcomes the drawbacks of the MLE,

resulting in signiﬁcant quality improvement over the MLE.

We conduct a thorough evaluation of our change point detection algorithm against two state-of-the-art methods, DeltaCon and LetoChange, on synthetic as well as real world datasets. Our results indicate that our method is up to 9X faster than DeltaCon while achieving better quality. In the future we plan to extend our work to track higher order structures of the network such as 3-profiles [Elenberg et al., 2015] or 4-profiles and see how they evolve over time.

Acknowledgments

This work is supported in part by NSF grants DMS-1418265, IIS-1550302, and IIS-1629548. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

[Akoglu et al., 2014] Leman Akoglu, Hanghang Tong, and Danai Koutra. Graph-based anomaly detection and description: A survey. Data Mining and Knowledge Discovery (DAMI), 28(4), 2014.

[Berlingerio et al., 2012] Michele Berlingerio, Danai Koutra, Tina Eliassi-Rad, and Christos Faloutsos. Netsimile: a scalable approach to size-independent network similarity. arXiv preprint arXiv:1209.2684, 2012.

[Bridges et al., 2015] Robert A Bridges, John P Collins, Erik M Ferragut, Jason A Laska, and Blair D Sullivan. Multi-level anomaly detection on time-varying graph data. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, pages 579–583. ACM, 2015.

[Caceres and Berger-Wolf, 2013] Rajmonda Sulo Caceres and Tanya Berger-Wolf. Temporal scale of dynamic networks. In Temporal Networks, pages 65–94. Springer, 2013.

[Elenberg et al., 2015] Ethan R Elenberg, Karthikeyan Shanmugam, Michael Borokhovich, and Alexandros G Dimakis. Beyond triangles: A distributed framework for estimating 3-profiles of large graphs. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 229–238. ACM, 2015.

[Erdős and Rényi, 1960] Paul Erdős and A Rényi. On the evolution of random graphs. Publ. Math. Inst. Hungar. Acad. Sci, 5:17–61, 1960.

[Fowler, 2006] James H Fowler. Legislative cosponsorship networks in the US House and Senate. Social Networks, 28(4):454–465, 2006.

[Givens and Hoeting, 2012] Geof H Givens and Jennifer A Hoeting. Computational statistics, volume 710. John Wiley & Sons, 2012.

[Hunter et al., 2012] David R Hunter, Pavel N Krivitsky, and Michael Schweinberger. Computational statistical methods for social network models. Journal of Computational and Graphical Statistics, 21(4):856–882, 2012.

[Karrer and Newman, 2011] Brian Karrer and Mark EJ Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83(1):016107, 2011.

[Klimt and Yang, 2004] Bryan Klimt and Yiming Yang. The enron corpus: A new dataset for email classification research. In Machine learning: ECML 2004, pages 217–226. Springer, 2004.

[Koutra et al., 2016] Danai Koutra, Neil Shah, Joshua T Vogelstein, Brian Gallagher, and Christos Faloutsos. Deltacon: Principled Massive-Graph Similarity Function with Attribution. ACM Transactions on Knowledge Discovery from Data (TKDD), 10(3):28, 2016.

[La Fond et al., 2014] Timothy La Fond, Jennifer Neville, and Brian Gallagher. Anomaly detection in networks with changing trends, 2014.

[Leskovec and Rok Sosič, 2014] Jure Leskovec and Rok Sosič. SNAP: A general purpose network analysis and graph mining library in C++. http://snap.stanford.edu/snap, Jun 2014.

[Li et al., 2016] Shuang Li, Yao Xie, Mehrdad Farajtabar, and Le Song. Detecting weak changes in dynamic events over networks. arXiv preprint arXiv:1603.08981, 2016.

[Loglisci et al., 2015] Corrado Loglisci, Michelangelo Ceci, and Donato Malerba. Relational mining for discovering changes in evolving networks. Neurocomputing, 150:265–288, 2015.

[mat, ] Wolfram Mathematica. https://www.wolfram.com/mathematica/. Accessed: 2017-06-03.

[Moreno and Neville, 2013] Sebastian Moreno and Jennifer Neville. Network hypothesis testing using mixed kronecker product graph models. In Data Mining (ICDM), 2013 IEEE 13th International Conference on, pages 1163–1168. IEEE, 2013.

[Peel and Clauset, 2015] Leto Peel and Aaron Clauset. Detecting change points in the large-scale structure of evolving networks. In Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

[Peixoto and Rosvall, 2015] Tiago P Peixoto and Martin Rosvall. Modeling sequences and temporal networks with dynamic community structures. arXiv preprint arXiv:1509.04740, 2015.

[Pfeiffer III et al., 2012] Joseph J Pfeiffer III, Timothy La Fond, Sebastian Moreno, and Jennifer Neville. Fast generation of large scale social networks with clustering. arXiv preprint arXiv:1202.4805, 2012.

[Pitman, 1937] Edwin JG Pitman. Significance tests which may be applied to samples from any populations. Supplement to the Journal of the Royal Statistical Society, 4(1):119–130, 1937.

[Ranshous et al., 2015] Stephen Ranshous, Shitian Shen, Danai Koutra, Steve Harenberg, Christos Faloutsos, and Nagiza F Samatova. Anomaly detection in dynamic networks: a survey. Wiley Interdisciplinary Reviews: Computational Statistics, 7(3):223–247, 2015.

[Seshadhri et al., 2012] C Seshadhri, Tamara G Kolda, and Ali Pinar. Community structure and scale-free collections of Erdős-Rényi graphs. Physical Review E, 85(5):056109, 2012.

[Shih and Parthasarathy, 2012] Yu-Keng Shih and Srinivasan Parthasarathy. Identifying functional modules in interaction networks through overlapping markov clustering. Bioinformatics, 28(18):i473–i479, 2012.

[Voeten, 2012] Erik Voeten. Data and analyses of voting in the UN general assembly. Available at SSRN 2111149, 2012.

[Zhang et al., 2016] Xiao Zhang, Cristopher Moore, and MEJ Newman. Random graph models for dynamic networks. arXiv preprint arXiv:1607.07570, 2016.