Fast Change Point Detection on Dynamic Social Networks
Yu Wang∗, Aniket Chakrabarti∗, David Sivakoff#, Srinivasan Parthasarathy∗
∗Department of Computer Science and Engineering, #Department of Statistics
The Ohio State University, Columbus, Ohio, USA
Contact: wang.5205@osu.edu or srini@cse.ohio-state.edu
Abstract
A number of real world problems in many domains
(e.g. sociology, biology, political science and com-
munication networks) can be modeled as dynamic
networks with nodes representing entities of inter-
est and edges representing interactions among the
entities at different points in time. A common rep-
resentation for such models is the snapshot model -
where a network is defined at logical time-stamps.
An important problem under this model is change
point detection. In this work we devise an effec-
tive and efficient three-step-approach for detect-
ing change points in dynamic networks under the
snapshot model. Our algorithm achieves up to 9X
speedup over the state-of-the-art while improving
quality on both synthetic and real world networks.
1 Introduction
Dynamic network analysis is increasingly used in complex
application domains ranging from social networks (Facebook
network evolution [Leskovec and Rok Sosič, 2014])
to biological networks (protein-protein interaction [Shih and
Parthasarathy, 2012]), from political science (United Na-
tions General Assembly voting network [Voeten, 2012]) to
communication networks (Enron network [Klimt and Yang,
2004]). Such dynamic networks are often represented using
the snapshot model. Under this model, every network snapshot
(represented by a graph) is defined at a logical timestamp.
Two questions of fundamental importance are: (i) how
does a network evolve? (ii) when does a network change
significantly enough to arouse suspicion that something
fundamentally different is happening?
Various generative models [Peixoto and Rosvall, 2015;
Zhang et al., 2016] have been proposed to address question (i),
that is, to explain the evolution of a network. They study
network evolution under certain generative models [Erdős
and Rényi, 1960; Karrer and Newman, 2011]. In reality,
the generative model itself might change, as addressed in
question (ii) above. Existing work [Akoglu et al., 2014;
Ranshous et al., 2015] uses complex methods to detect such
changes. One drawback of those methods is that they
are time-consuming, and hence often not scalable (in terms of
both network size and number of snapshots). We seek
an efficient and effective solution that scales with both
network size and number of snapshots.
In this paper, we present a simple and efficient algo-
rithm based on likelihood maximization to detect change
points in dynamic networks under the snapshot model. We
demonstrate the utility of our algorithm on both synthetic
and real world networks drawn from political science
(congressional voting, UN voting), and show that it outperforms
two recent approaches (DeltaCon [Koutra et al., 2016] and
LetoChange [Peel and Clauset, 2015]) in terms of both
quality and efficiency. Our work makes the following contributions:
1. Our approach is general purpose – it can accommodate
various snapshot generative models (see Table 1).
2. We model network evolution as a first order Markov pro-
cess and consequently our algorithm accounts for the
temporal dependency while computing the dissimilarity
between snapshots.
3. Our algorithm is efficient and has constant memory over-
head that can be tuned by a user controlled parameter.
We extensively evaluate our approach on synthetic as well as
real world networks and show that our approach is extremely
efficient (both in performance and quality).
2 Related Work
Ranshous et al. [Ranshous et al., 2015] and Akoglu et al.
[Akoglu et al., 2014] recently surveyed network anomaly
detection. Our change point detection problem is similar to Type
4, the “Event and Change Detection”, of the former: given
a network sequence, a dissimilarity scoring function, and a
threshold, a change is flagged if the dissimilarity of two con-
secutive snapshots is above the threshold. We differ in that we
assume there is a latent generation model governing the net-
work dynamics, and we are trying to detect the change in the
latent space, while they did not explicitly mention the latent
generation model. Moreover, we consider the temporal de-
pendency across the snapshots while no work in the surveys
accounted for temporal dependency.
DeltaCon [Koutra et al., 2016] uses a graph similarity-based
[Berlingerio et al., 2012] approach to detect change
points in dynamic networks. It derives the features of a
snapshot based on sociological theories, and then calculates the
feature similarity of each consecutive snapshot pair. That work is
model agnostic (it makes no assumption about the generation model
of networks), and is the state-of-the-art in terms of efficiency.
We compare our algorithm against it.
arXiv:1705.07325v2 [cs.SI] 4 Jun 2017
Moreno and Neville [Moreno and Neville, 2013], Bridges
et al. [Bridges et al., 2015] and Peel and Clauset [Peel and
Clauset, 2015] develop approaches based on network hypothesis
testing. The advantage is that one obtains a p-value for the
test, which quantifies the confidence of the conclusion.
However, these approaches have two shortcomings: first, they
need to assume a specific generation model of the networks
(mKPGM, GBTER and GHRG respectively); second, they
are extremely slow, mostly due to the bootstrapping for
p-value calculation. La Fond et al.'s work [La Fond et al., 2014]
can also generate a p-value. It is tested against DeltaCon
without reporting running time, and an efficiency concern is also
mentioned in that paper. These algorithms will not work in our
setting, where detection is done in real time under bounded
memory constraints. We compare our model agnostic
algorithm against [Peel and Clauset, 2015].
The DAPPER heuristic [Caceres and Berger-Wolf, 2013]
proposes an edge probability estimator similar to ours.
However, it does not consider the temporal dependency of
snapshots. Moreover, it focuses on temporal scale determination
while ours focuses on change point detection. Loglisci et al.
[Loglisci et al., 2015] study change point detection on
relational networks using rule-based analysis. Our approach uses
(hidden) parameter estimation instead of semantic rules to
infer the structure. Li et al. [Li et al., 2016] propose an online
algorithm and consider temporal dependency. The problem
they study differs from ours in that they study information
diffusion on a network with fixed structure and use
continuous time. A recent work by Zhang et al. [Zhang et al.,
2016] also studies dynamic networks in a Markov chain
setting. They focus on community detection while we focus
on change point detection.
3 Problem Formulation
This paper studies how to detect the times at which the funda-
mental evolution mechanism of a dynamic network changes.
We assume that there is some unknown underlying model that
governs the generative process. Our change point detection
algorithm is agnostic to this model. We assume that the ob-
served network snapshots are samples that depend on some
generative model and the previous snapshot. Networks have
fluctuation across snapshots even when the generative model
stays unchanged. Only when the generative model changes
do we consider it a fundamental change. We represent the
evolutionary process as a Markov Network (Figure 1).
In Figure 1, $M_t$ is the network generation model at time $t$. It is a triple $M_t = \langle \mathrm{Type}_t, \Theta_t, \alpha_t \rangle$, where $\alpha_t$ is the continuity parameter at time $t$, $\mathrm{Type}_t$ specifies the model, and $\Theta_t$ represents the model parameters (Table 1 lists some of the generative models we experiment on). $G_t$ is the network (graph) observable at time $t$. We assume the number of vertices in $G_t$ is fixed to be $N$ for all snapshots (the union of all nodes is used when there is node addition/deletion, as in [Peel and Clauset, 2015]), so each $G_t$ has $2^{\binom{N}{2}}$ possible configurations, and $T$ is the total number of snapshots we observe.
Figure 1: Representation of the underlying generative process, with latent models $M_1, M_2, M_3, \ldots, M_t$ and observed snapshots $G_1, G_2, G_3, \ldots, G_t$. Our inference is agnostic to the $M_t$s.
As per Figure 1, the configuration of the network at time $t$, $G_t$, depends on the generation model at time $t$, $M_t$ (unobserved), and the network configuration at time $t-1$, $G_{t-1}$ (observed). Hence the networks in the observed sequence are samples from a conditional distribution (the samples are not independent). The continuity parameter $\alpha_t$ controls the fraction of edges and non-edges that are retained from the previous snapshot, $G_{t-1}$. The network at time $t$ is assumed to be generated in the following way: for each dyad, keep the connection status from time $t-1$ with probability $1-\alpha_t$, and with probability $\alpha_t$, resample the connection according to the generation model at time $t$. Consequently, the smaller $\alpha_t$ is, the more overlap there is between two consecutive snapshots. Note that two consecutive network configurations may differ substantially if $\alpha_t > 0$, even though the underlying generation model may be the same. Moreover, changes of the generation model are assumed to be rare across the time span ($M_t \neq M_{t+1}$ is a rare event).
Problem Definition Our goal is to efficiently find a set $S \subset \{2, \ldots, T\}$ such that $t \in S \iff M_t \neq M_{t-1}$, that is,
to efficiently find all the time points at which the network
generation model is different from the previous time point.
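The per-dyad generation step described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `next_snapshot`, the dictionary representation of a snapshot, and the `edge_prob` callable are all hypothetical.

```python
import random

def next_snapshot(prev, edge_prob, alpha, rng):
    """Sample G_t from G_{t-1} for a set of dyads (illustrative sketch).

    prev      -- dict mapping a dyad (i, j) to its status in G_{t-1} (0 or 1)
    edge_prob -- callable giving the model's edge probability for a dyad
    alpha     -- probability that a dyad is resampled (kept with 1 - alpha)
    rng       -- a random.Random instance
    """
    current = {}
    for dyad, status in prev.items():
        if rng.random() < alpha:
            # Resample the connection from the generation model at time t.
            current[dyad] = 1 if rng.random() < edge_prob(dyad) else 0
        else:
            # Keep the connection status from time t-1.
            current[dyad] = status
    return current
```

With `alpha = 0` every snapshot is a copy of the previous one, and with `alpha = 1` each snapshot is an independent draw from the model, which matches the two extremes of the continuity parameter discussed above.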
4 Methodology
Given the graphical formulation of the problem, exact infer-
ence is impossible since we do not know the underlying gen-
erative model, and our observations are stochastic. However,
even without prior knowledge of the generative model, we
can still design an approximate inference technique based on
MCMC sampling theory.
The framework is straightforward, as mentioned in
Section 2: we first extract a “feature vector” from each snapshot,
then quantify the dissimilarity between consecutive
snapshots, and flag a change point when the dissimilarity score
is above a threshold. We use the joint edge probability as the
“feature vector” (Section 4.1), use the Kolmogorov-Smirnov
statistic, Kullback-Leibler divergence and Euclidean distance
as dissimilarity measures (Section 4.2), and use a
permutation-test-like approach to determine the threshold (Section 4.3).
4.1 Edge Probability Estimation
In this subsection, we discuss how to (approximately) estimate the joint distribution of the dyads¹. We track the presence or absence of a small fixed number of dyads throughout the entire observed sequence of network snapshots. We break down the observation sequence into fixed-length windows, and for each window we infer the joint distribution of the dyads in our sample. We model each dyad as a conditionally independent two-state Markov chain (Figure 2; we use $\alpha$ instead of $\alpha_t$ in this section for brevity) given the sequence of generative models $(M_t)_{t \ge 0}$ (this conditional independence assumption is satisfied for the choices of models in Table 1). Note that even for generative processes that may result in greater dependence among dyads (such as the configuration model), in many cases such dependence will be local, and if the number of dyads sampled is small, then these dyads will be spread out enough to be considered independent. Moreover, the conditional independence assumption significantly improves computational efficiency [Hunter et al., 2012]. The marginal probabilities of these dyads can then be estimated using the observed samples within each time window.
¹ We refer to node pairs, which may or may not be linked, as dyads.
Table 1: Edge probability between a dyad in each model
Model | Edge probability | Explanation
Erdős–Rényi (ER) | $p(\langle n_i, n_j \rangle \mid M) = p$ | $p$: edge probability
Chung–Lu (CL) | $p(\langle n_i, n_j \rangle \mid M) = \beta w_i w_j / \sum_i w_i$ | $w_i$: weight of node $i$ [Pfeiffer III et al., 2012]; $\beta$: edge density
Stochastic Block Model (SBM) | $p(\langle n_i, n_j \rangle \mid M) = p(c_i, c_j)$ | $c_i$: community assignment of node $n_i$; $p(r, s)$: probability of edges between communities $r$ and $s$
SBM-CL | $p(\langle n_i, n_j \rangle \mid M) \propto p(c_i, c_j)\, w_i w_j$ | notation as above
BTER | $p(\langle n_i, n_j \rangle \mid M) = p_{ER}\, I[c_i = c_j] + p_{CL}\, I[c_i \neq c_j]$ | intra-community edge probability follows ER, inter-community CL [Seshadhri et al., 2012]; $I[\cdot]$ is the indicator function
Table 2: Notation Table
Notation | Explanation
$N$ | network size, in terms of the number of nodes
$T$ | number of snapshots
$t$ | time stamp, $t \in \{1, \ldots, T\}$
$S$ | set of all change points
$M_t$ | (unknown) generative model at time stamp $t$
$G_t$ | snapshot at time stamp $t$
$\alpha, \alpha_t$ | continuity parameter (at time $t$); $1-\alpha$ is the continuity rate
$s$ | window size, or number of snapshots in a window
$W_t$ | a window of size $s$ ending at time $t$
$\eta$ | step size of the sliding windows
$w$ | number of windows: $\eta = 1 \to w = T-1$; $\eta = s \to w = \lceil T/s \rceil$
$e_j$ | connection status of dyad $j$
$\bar{M}$ | averaged number of edges in each snapshot: $\bar{M} = \sum_{t=1}^{T} \sum_{j=1}^{\binom{N}{2}} e_j^{(t)} / T$
$p_j, p_j^{(t)}$ | connection probability of dyad $j$ (at time $t$)
$k$ | number of dyads to be sampled and tracked
$N^{(j)}_{01}$ | number of flips from 0 to 1 of dyad $j$ during the period of interest
$N^{(j)}_{0*}$ | $N^{(j)}_{0*} := N^{(j)}_{00} + N^{(j)}_{01}$, with $N^{(j)}_{0*} + N^{(j)}_{1*} \equiv s-1$
$N^{(j)}_{**}$ | the collection $N^{(j)}_{00}, N^{(j)}_{01}, N^{(j)}_{10}, N^{(j)}_{11}$
$N^{(j)}_{0}$ | number of disconnected occurrences of dyad $j$ in the period of interest
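As a rough illustration of Table 1, the edge probabilities of three of the listed models could be written as small Python closures. This is a minimal sketch, not the authors' code: the function names and the list-based representations of node weights, community assignments and block probabilities are assumptions made for the example, and no clipping is applied if the Chung–Lu expression exceeds 1.

```python
def er_edge_prob(p):
    """Erdos-Renyi: every dyad has the same edge probability p."""
    return lambda i, j: p

def chung_lu_edge_prob(weights, beta):
    """Chung-Lu as written in Table 1: beta * w_i * w_j / sum_i w_i."""
    total = sum(weights)
    return lambda i, j: beta * weights[i] * weights[j] / total

def sbm_edge_prob(community, block_prob):
    """SBM: probability depends only on the communities of the two nodes."""
    return lambda i, j: block_prob[community[i]][community[j]]
```

Each function returns a callable `(i, j) -> probability`, so the different model types can be swapped without changing the code that consumes them.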
We formalize the estimation procedure below. Given a network sequence $G_1, \ldots, G_T$, we group the networks into sliding windows. We define $W_t$ to be a subsequence of $s$ consecutive observed networks ending at network $G_t$, so $W_t \equiv (G_{t-s+1}, G_{t-s+2}, \ldots, G_t)$. We use equal sized sliding windows with a step size $\eta$, and we obtain a window sequence $W_s, W_{s+\eta}, \ldots, W_{s+(i-1)\eta}, \ldots, W_T$. The non-overlapping window setting uses $\eta = s$. In each window $i$, we can estimate the joint edge distribution (for the selected dyads) $P(e_1, e_2, \ldots, e_k \mid M_{s+(i-1)\eta})$, where $e_j = 1$ indicates an undirected edge between the $j$-th dyad, and $k \ll \binom{N}{2}$ is the number of dyads tracked. For each of the models in Table 1, the joint distribution can be factorized into $P(e_1, e_2, \ldots, e_k \mid M_{s+(i-1)\eta}) = \prod_j P(e_j \mid M_{s+(i-1)\eta})$ (by conditional independence, see the method description above).
Figure 2: Two-state Markov chain for a single dyad, with states 0 (non-edge) and 1 (edge). Transition probabilities: $0 \to 1$ with $\alpha p$, $1 \to 0$ with $\alpha(1-p)$, $0 \to 0$ with $1-\alpha p$, and $1 \to 1$ with $1-\alpha(1-p)$.
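The sliding-window indexing above can be made concrete with a short Python sketch. The helper names `window_ends` and `window_members` are hypothetical, and the sketch assumes that a trailing partial window (when $T - s$ is not a multiple of $\eta$) is simply dropped, which the text does not specify.

```python
def window_ends(T, s, eta):
    """1-indexed end timestamps t of the windows W_s, W_{s+eta}, ..."""
    return list(range(s, T + 1, eta))

def window_members(t, s):
    """Timestamps of the snapshots in W_t = (G_{t-s+1}, ..., G_t)."""
    return list(range(t - s + 1, t + 1))
```

For example, `window_ends(12, 4, 4)` gives the non-overlapping setting ($\eta = s$) with windows ending at 4, 8 and 12, while `window_ends(10, 4, 2)` gives overlapping windows ending at 4, 6, 8 and 10.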
We can view a dyad across time as a two-state Markov chain whose length is the window size. We call a dyad across time a chain in the following text for brevity. Let $p_j \equiv P(e_j = 1 \mid M_{s+(i-1)\eta})$, $q_j \equiv 1 - p_j$, and suppose we are interested in $k$ chains.
1) Maximum Likelihood Estimator (MLE)
The joint probability of the chains is (Figure 2)
$$P\left(N^{(j=1,\ldots,k)}_{**} \mid \alpha, \vec{p}\right) = c_k \prod_{j=1}^{k} (\alpha p_j)^{N^{(j)}_{01}} (\alpha q_j)^{N^{(j)}_{10}} (1-\alpha p_j)^{N^{(j)}_{00}} (1-\alpha q_j)^{N^{(j)}_{11}} \tag{1}$$
where $N^{(j)}_{01}$ is the number of transitions from 0 to 1 for a chain (non-edge to edge for the dyad $j$ within the window), $c_k$ stands for the combinatorial coefficients independent of $\alpha$ and $\vec{p}$, and $\sum_{a,b \in \{0,1\}} N^{(j)}_{ab} = s-1$ for all $j$. Hence the log-likelihood (omitting the coefficient $c_k$) is:
$$L\left(\alpha, \vec{p} \mid N^{(j)}_{**}\right) = \sum_{j=1}^{k} \left[ N^{(j)}_{01} \ln(\alpha p_j) + N^{(j)}_{10} \ln(\alpha q_j) + N^{(j)}_{00} \ln(1-\alpha p_j) + N^{(j)}_{11} \ln(1-\alpha q_j) \right] \tag{2}$$
MLE for a single chain First consider the case of a single chain. Solving the zero-derivative Equations (4) and (5) leads to estimators of $\alpha$ and $p$. These estimators yield a negative definite Hessian, and are therefore the MLE. Hence we have
$$\hat{\alpha}_{MLE} = \frac{N_{01} N_{1*} + N_{10} N_{0*}}{N_{0*} N_{1*}}, \qquad \hat{p}_{MLE} = \frac{N_{01} N_{1*}}{N_{01} N_{1*} + N_{10} N_{0*}} \tag{3}$$
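Equation (3) can be evaluated directly from the transition counts of one chain. The following Python sketch is illustrative only: the function names are hypothetical, a chain is assumed to be a list of 0/1 statuses, and returning `None` when the closed form is undefined (the chain never leaves one state, or never flips) is a choice made for this example.

```python
def transition_counts(chain):
    """Count the transitions N_00, N_01, N_10, N_11 of a 0/1 chain."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for a, b in zip(chain, chain[1:]):
        counts[(a, b)] += 1
    return counts

def single_chain_mle(chain):
    """Closed-form (alpha, p) of Equation (3); None if undefined."""
    c = transition_counts(chain)
    n0 = c[(0, 0)] + c[(0, 1)]  # N_{0*}: transitions leaving state 0
    n1 = c[(1, 0)] + c[(1, 1)]  # N_{1*}: transitions leaving state 1
    numerator = c[(0, 1)] * n1 + c[(1, 0)] * n0
    if n0 == 0 or n1 == 0 or numerator == 0:
        return None
    alpha_hat = numerator / (n0 * n1)
    p_hat = c[(0, 1)] * n1 / numerator
    return alpha_hat, p_hat
```

For the chain `[0, 0, 0, 1, 1, 1, 1, 0, 0]` the counts are $N_{00} = 3$, $N_{01} = 1$, $N_{10} = 1$, $N_{11} = 3$, giving $\hat{\alpha} = 0.5$ and $\hat{p} = 0.5$. Note that the unconstrained closed form is not clipped here, so a short chain with many flips can produce a value of $\hat{\alpha}$ above 1.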
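MLE for multiple chains The MLE for multiple chains essentially involves solving a high degree polynomial, which in general does not have a closed form solution.
$$\frac{\partial L}{\partial p_j} = \frac{N^{(j)}_{01}}{p_j} - \frac{N^{(j)}_{10}}{q_j} - \frac{\alpha N^{(j)}_{00}}{1-\alpha p_j} + \frac{\alpha N^{(j)}_{11}}{1-\alpha q_j} = 0 \tag{4}$$
$$\frac{\partial L}{\partial \alpha} = \sum_j \left[ \frac{N - N^{(j)}_{00} - N^{(j)}_{11}}{\alpha} - \frac{p_j N^{(j)}_{00}}{1-\alpha p_j} - \frac{q_j N^{(j)}_{11}}{1-\alpha q_j} \right] = 0 \;\overset{\alpha \neq 0}{\Longrightarrow}\; \sum_j \left[ \frac{N^{(j)}_{00}}{1-\alpha p_j} + \frac{N^{(j)}_{11}}{1-\alpha q_j} \right] = kN \tag{5}$$
where $1-\alpha$ is the continuity rate, and $N = s-1$ here denotes the number of transitions in each chain (not the network size). If $\alpha = 0$ then all snapshots are identical, which is uninteresting, so we assume $\alpha \neq 0$ in Equation (5).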
Combining Equations (4) and (5), one obtains a high order polynomial in $p_j$, which in general does not have closed-form solutions by the Abel-Ruffini theorem. We tried solving a special case with two chains using Wolfram Mathematica [mat, ]. The solutions (of two quartic functions) turn out to be very complicated and take over 40 pages. A common way to solve such maximization problems is to employ numerical methods such as gradient descent. The drawback of such an approach is that it can be computationally expensive with hundreds of dyads and windows. Therefore, we settle for an approximation of the MLE that empirically approximates the numerical values well². Intuitively, the estimator for $\alpha$ should depend on all the chains, but chains that spend more time in both states 0 and 1 provide more information about $\alpha$ than chains that spend most of their time in one state (the latter may be due to small $\alpha$ or to a value of $p_j$ far from 1/2). Since we can easily compute the MLE of $\alpha$ for a single chain, we estimate $\alpha$ with a weighted average of the MLEs from the individual chains, with chains that spend more time in both states being weighted more heavily. We then estimate each $p_j$ by the MLE for the $j$th chain, since the chains are conditionally independent given $\alpha$. This results in the following estimators.
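$$\hat{\alpha}_{approx} = \sum_j w_j \hat{\alpha}^{MLE}_j = \sum_j w_j \frac{N^{(j)}_{01} N^{(j)}_{1*} + N^{(j)}_{10} N^{(j)}_{0*}}{N^{(j)}_{0*} N^{(j)}_{1*}}, \qquad \hat{p}^{approx}_j = \hat{p}^{MLE}_j = \frac{N^{(j)}_{01} N^{(j)}_{1*}}{N^{(j)}_{01} N^{(j)}_{1*} + N^{(j)}_{10} N^{(j)}_{0*}} \tag{6}$$
where $w_j = \frac{[N^{(j)}_{0*} N^{(j)}_{1*}]^p}{\sum_j [N^{(j)}_{0*} N^{(j)}_{1*}]^p}$, and $N^{(j)}_{0*} = N^{(j)}_{00} + N^{(j)}_{01}$.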
Empirically we find the exponent $p = \infty$ works best, which means $\hat{\alpha}_{approx}$ is a simple average of the $\hat{\alpha}^{MLE}_j$ corresponding to the chains with maximal value of $N^{(j)}_{0*} N^{(j)}_{1*}$. The continuity rate describes the temporal dependency among networks, and can help us determine a proper window size.
² At significance level 0.05, a two-sample t-test does not reject the hypothesis that the approximated values equal the numerical values.
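For the $p = \infty$ case, the weighted average in Equation (6) reduces to averaging the single-chain estimates over the chains that maximize $N^{(j)}_{0*} N^{(j)}_{1*}$. A minimal Python sketch of that reduction follows; the function names are hypothetical, chains are assumed to be lists of 0/1 statuses, and skipping chains with $N_{0*} N_{1*} = 0$ is an assumption made so that the single-chain estimate is defined.

```python
def chain_stats(chain):
    """Return (N_01, N_10, N_0*, N_1*) for a 0/1 chain."""
    n01 = n10 = n0 = n1 = 0
    for a, b in zip(chain, chain[1:]):
        if a == 0:
            n0 += 1
            n01 += b
        else:
            n1 += 1
            n10 += 1 - b
    return n01, n10, n0, n1

def approx_alpha(chains):
    """Average single-chain alpha over chains with maximal N_0* * N_1*."""
    scored = []
    for chain in chains:
        n01, n10, n0, n1 = chain_stats(chain)
        if n0 * n1 > 0:
            # Single-chain estimate: N_01/N_0* + N_10/N_1*, as in Equation (6).
            scored.append((n0 * n1, n01 / n0 + n10 / n1))
    if not scored:
        return None
    top = max(weight for weight, _ in scored)
    best = [alpha for weight, alpha in scored if weight == top]
    return sum(best) / len(best)
```

Chains that never leave one state carry no weight in this scheme, which matches the intuition that they provide little information about $\alpha$.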
Drawbacks of MLE Though MLEs are consistent in general, there is no guarantee of unbiasedness for these particular estimators with limited samples. Moreover, they involve three random quantities ($N_{01}, N_{10}, N_{1*}, N_{0*}$ in (6) have 3 degrees of freedom for fixed $s$) and hence require more samples to estimate, which makes them prohibitive in practice.
2) Simplified Estimator
To overcome the drawbacks of the MLE, we propose a simple estimator for the edge probability which is consistent and unbiased, has only one random quantity, and therefore requires fewer samples. The simple estimator essentially estimates the edge frequency in each window. If we know changes happen rarely, and the process stays in equilibrium most of the time, we can show the following estimator to be consistent and unbiased in equilibrium:
$$\hat{p}^{eq}_j \equiv \frac{N^{(j)}_{1*} + e^{[s+(i-1)\eta]}_j}{s} = \frac{\#\text{ of 1s in the chain}}{s} \equiv \frac{N^{(j)}_1}{s} \tag{7}$$
which is the proportion of snapshots within the window in which dyad $j$ is an edge.
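Equation (7) amounts to a per-dyad average over the snapshots of a window. A minimal Python sketch, assuming a window is given as a list of $s$ snapshots and each snapshot as a list of 0/1 statuses for the $k$ tracked dyads (the name `edge_frequency` and this layout are assumptions for illustration):

```python
def edge_frequency(window):
    """Estimate p_j for each tracked dyad as (# of 1s in its chain) / s."""
    s = len(window)
    k = len(window[0])
    return [sum(snapshot[j] for snapshot in window) / s for j in range(k)]
```

For instance, with the four snapshots `[[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 0, 1]]` the estimates for the three dyads are 0.75, 0.0 and 0.75.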
Proposition 1 In equilibrium, $\hat{p}^{eq}_j$ is consistent as the chain length (window size) increases.
Proof. Let $\pi^{(j)}_0 \equiv P(\text{non-edge of chain } j \text{ in equilibrium})$ and $\pi^{(j)}_1 = 1 - \pi^{(j)}_0$, $s \equiv N^{(j)}_0 + N^{(j)}_1$ (fixed), and $N^{(j)}_1 \equiv \#$ of 1s in the chain. By the ergodic theorem [Givens and Hoeting, 2012],
$$\pi^{(j)}_1 \overset{\text{almost surely}}{=} \lim_{s \to \infty} N^{(j)}_1 / s = \lim_{s \to \infty} \hat{p}^{eq}_j,$$
where the first equality denotes almost sure convergence, which implies convergence in probability (that is, the estimator is consistent).
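Proposition 2 In equilibrium, $\hat{p}^{eq}_j$ is unbiased.
Proof. Let $p^{(j)}_{01} \equiv p_j \alpha$ and $p^{(j)}_{10} \equiv (1-p_j)\alpha$ (Figure 2). Then
$$\begin{aligned}
\pi^{(j)}_0 p^{(j)}_{01} + \pi^{(j)}_1 p^{(j)}_{11} = \pi^{(j)}_1 &\implies \pi^{(j)}_0 p^{(j)}_{01} = \pi^{(j)}_1 p^{(j)}_{10} \\
&\implies \pi^{(j)}_0 \alpha p_j = \pi^{(j)}_1 \alpha (1-p_j) \\
&\implies p_j = \pi^{(j)}_1 = E\left[N^{(j)}_1 / (N^{(j)}_1 + N^{(j)}_0)\right] = E\left[N^{(j)}_1\right] / s \\
&\implies E\left[\hat{p}^{eq}_j\right] = E\left[N^{(j)}_1\right] / s = p_j,
\end{aligned}$$
so $\hat{p}^{eq}_j$ is unbiased.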
The above propositions imply that the larger the window
size the better the estimation, and that in equilibrium, the
temporal dependency (continuity rate) has no impact on es-
timating the onset probability of a Markov chain, and hence
no impact on estimating the edge probability of a snapshot.
Although MLE is close to the true value when the chain is
long enough (100 or longer), we do not use so large a window
size in practice (20 usually, no larger than 50). Experiments
(Figure 3) show that the simplified estimator is much better
than MLE for change point detection in practice.
Figure 3: Comparison between MLE (red, top) and Simplified
Estimator (green, bottom) for change point detection on the same
sequence as Figure 4 and Table 3. The KL distance measure is used;
other measures give consistent results. Horizontal bars are the
corresponding thresholds. MLE has large fluctuation, and increasing
window size reduces fluctuation; MLE misses the true changes 3, 4
and 5 (orders in Table 3), while the edge frequency estimator has
perfect recall and precision (see Section 5 for detail). (Re-scaled
and shifted for visualization.)
4.2 Distance measure
Now, we need to compare the probability distributions of edges across consecutive windows. The Kolmogorov-Smirnov (KS) statistic and the Kullback-Leibler (KL) divergence are two common measures for comparing distributions. Their calculation requires enumerating the whole state space, and hence is exponential in the number of variables for joint distributions. Although the KS statistic is designed for univariate distributions, we can map the joint distribution, which is over multivariate binary variables, to one dimension by decoding each binary vector as an integer, and then use the KS statistic. We bootstrap from the empirical distributions of two consecutive windows respectively and use the two-sample KS test to quantify the difference between the two distributions. We can use divide-and-conquer to alleviate the exponential complexity: partition the dyads into $g$ groups, compute the KL/KS dissimilarity within each small group, and record the median among all the $g$ groups as the final dissimilarity.
Both of the above measures have good quality in terms of change point detection (Figure 4), but the KS statistic is extremely slow (Table 4), mostly due to the large sample bootstrap from each window. Euclidean distance, though lacking a probabilistic interpretation, has linear complexity and reasonable quality in practice.
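The grouped KL dissimilarity and the Euclidean distance can be sketched as follows. This is a simplified illustration rather than the authors' procedure: it assumes the factorized (conditionally independent) form of the joint distribution, under which the KL divergence of a group equals the sum of the per-dyad Bernoulli KL divergences, so no state-space enumeration is needed. The direction KL(old || new), the clipping constant `eps` (used to avoid taking the logarithm of zero), the contiguous grouping, and all function names are assumptions.

```python
import math
from statistics import median

def bernoulli_kl(p, q, eps=1e-6):
    """KL divergence between Bernoulli(p) and Bernoulli(q), with clipping."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def grouped_kl(p_old, p_new, g):
    """Median over g contiguous groups of the summed per-dyad KL."""
    size = len(p_old) // g  # assumes g divides the number of tracked dyads
    scores = []
    for i in range(g):
        group = range(i * size, (i + 1) * size)
        scores.append(sum(bernoulli_kl(p_old[j], p_new[j]) for j in group))
    return median(scores)

def euclidean(p_old, p_new):
    """Euclidean distance between two edge-probability vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_old, p_new)))
```

Taking the median across groups makes the score less sensitive to a few groups with unusually noisy estimates, which is one plausible reading of why grouping with median selection helps.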
4.3 Threshold Determination
Suppose we have $w$ windows; then we compare $w-1$ pairs of
distributions and get $w-1$ difference/distance scores. How
do we choose a threshold to determine at which window the
network changes? We use a permutation test [Pitman, 1937]
based approach to determine the threshold. For a desired
significance level $\alpha_s$, we bootstrap from the $w-1$ distance
scores, and use the upper $100\alpha_s\%$ quantile as the threshold.
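One way to read the thresholding step is sketched below in Python. The text does not spell out the resampling details, so the number of bootstrap draws `n_boot`, the fixed `seed`, the quantile indexing, and the function names are all assumptions for illustration.

```python
import random

def bootstrap_threshold(scores, alpha_s=0.05, n_boot=10000, seed=0):
    """Upper 100*alpha_s % quantile of scores resampled with replacement."""
    rng = random.Random(seed)
    sample = sorted(rng.choice(scores) for _ in range(n_boot))
    index = int((1 - alpha_s) * (n_boot - 1))
    return sample[index]

def flag_changes(scores, threshold):
    """Indices of window pairs whose dissimilarity exceeds the threshold."""
    return [i for i, d in enumerate(scores) if d > threshold]
```

Because changes are assumed to be rare, most scores come from pairs of windows generated by the same model, so the upper quantile of the resampled scores stays close to the typical no-change level and only the unusually large scores are flagged.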
4.4 Complexity Analysis
The algorithm is linear in the number of windows and constant in the network size for moderately large networks. Only a small fraction of dyads in the network is sampled and tracked. The sampling of the dyads is performed only once at the beginning, and hence its cost is independent of the number of snapshots. For each snapshot, selecting a specific set of dyads has cost linear in the number of edges. Each window is only scanned once and therefore the time cost is linear in the number of windows. Moreover, since the number of windows is linear in the total number of snapshots even in the worst case (windows are overlapping, and the window step is one), the algorithm is linear in the number of snapshots. Therefore, the time complexity is $O(\bar{M} T)$, where $\bar{M}$ is the average number of edges in each snapshot, and $T$ is the number of snapshots.
The memory cost is low, and can be viewed as constant: for each snapshot, only the information of the tracked dyads is stored; information of dyads within the same window is aggregated; and dyad information from the old window is overwritten once it has been compared against the new window. The space complexity is $O(c)$, where $c$ is a prescribed sample size. Theoretically the sample size should be proportional to the network size for good estimation. Our experiments show that a fixed sample size (tracking 250 out of $\binom{50\text{k}}{2} \approx 1.2\text{G}$ dyads) works well on a moderately large network.
Figure 4: SBM, ground truth changes explained in Table 3. $1-\alpha = 0.51$ and window size = 20. Panel (a): likelihood (ground truth). Panel (b): algorithms comparison; curves are dissimilarity scores and horizontal bars are thresholds, with corresponding colors and line shapes (re-scaled and shifted for visualization). DeltaCon (Fig b-top) and EM-KL (Fig b-bottom) have the smallest variance, but DeltaCon has two false negatives, at changes 4 and 5.
Table 3: Model Change Explanation for an SBM-CL Experiment in Figure 4
Order | Window Index | Type of Change
1 | 15 | The weight sequence of 1/3 of the nodes is re-generated
2 | 30 | The weight sequence of 2/3 of the nodes is re-generated
3 | 60 | Half of the communities change their (inter- and intra-community) connection rate, overall density retained
4 | 75 | All of the communities change their (inter- and intra-community) connection rate, overall density retained
5 | 90 | Half of the communities change their (inter- and intra-community) connection rate, overall density changed
6 | 105 | All of the communities change their (inter- and intra-community) connection rate, overall density changed
7 | 135 | Community assignments of all the nodes are changed
5 Experiments And Results
We thoroughly evaluated our edge probability estimation
based change point detection algorithm (called
EdgeMonitoring for simplicity) on synthetic and real world
datasets. For the synthetic datasets, the generative process
is known, and we can compute the ground truth in the form
of likelihood, which is a natural baseline. We also
use the state-of-the-art DeltaCon [Koutra et al., 2016] and
LetoChange [Peel and Clauset, 2015] as two baselines.
Figure 5: Change point detection on US Senate co-sponsorship
network. Change points at the 100th and the 104th Congresses
(boxed) correspond to partisan domination shifts. Both EM-KL
(green) and LetoChange (cyan) have perfect recall and precision,
while DeltaCon (pink) has 3 false positives and 1 false negative.
5.1 Synthetic Data
Data generation³ We generate a sequence of networks from a fixed generation model. The snapshots are not independent: each snapshot depends on the preceding one through the continuity parameter $\alpha$ ($\alpha_t \equiv \alpha$). For each snapshot, each dyad is selected independently with probability $\alpha$, and if selected, it is resampled from the generative model (Table 1). We introduce the change points by changing the generative model in the middle of the sequence of snapshots. Note that this change may be simply a change of parameter values for a given model (e.g. $ER_{0.4}$ to $ER_{0.6}$), or a change in the model type (e.g. SBM to ER). Since our algorithm makes no assumptions about model specifics, we are able to detect both kinds of changes. We only inject parameter changes in the synthetic experiments since the latter kind of change is easily detectable. Sample changes are displayed in Table 3. The likelihood of the snapshot sequence is also provided.
³ Generated using SNAP [Leskovec and Rok Sosič, 2014].
We ran experiments with network sizes ranging from 1k to 50k, window sizes from 10 to 100, and continuity rates $1-\alpha$ of 0.51 and 0.9. We generated a total of 5000 snapshots and sampled 250 dyads uniformly at random to track. Overlapping windows ($s = 2\eta$) and non-overlapping windows give similar results, yet the latter is faster simply due to fewer windows. Hence we display non-overlapping window results only. For KL and KS, the tracked dyads are grouped into 25 equal-sized groups. We use the upper 5% quantile as the threshold.
Results Figure 4 shows the qualitative comparison and Table 4 reports the efficiency. Figure 4a shows that the likelihood of the network drops dramatically after the generative model changes, and recovers to a new equilibrium afterwards. Our EdgeMonitoring (EM-Eu, EM-KL) approach can successfully identify all change points with a 5X speedup over DeltaCon. The changes are explained in Table 3. We can see that EM-KL has the best performance: little fluctuation and perfect precision and recall. DeltaCon, though it has the smallest fluctuation, misses two change points. Both EM-KS and EM-Eu have large fluctuation. The quality of EM-KS relies heavily on the joint probability estimation, and we do see smaller fluctuation and higher recall for larger window sizes. EM-KL has the best overall performance, in terms of both quality and time efficiency. We believe that grouping together with median selection contributes to its superiority.
Table 4: Time efficiency comparison (5k snapshots)
Model | Network Size | Window Size | EM Time¹ | EM-KS Time | DC Time (speedup) | LC Time
CL | 1k | 20 | 18s | 11h | 91s (5X) | DNF
SBM-CL | 1k | 10 | 27s | 22h | 125s (5X) | DNF
SBM-CL | 1k | 50 | 9s | 4.5h | 43s (5X) | DNF
SBM-CL | 5k | 20 | 54s | 11h | 309s (6X) | DNF
SBM-CL | 10k | 20 | 232s | 10h | 32m (8X) | DNF
SBM-CL | 50k | 20 | 26m | 10h | 4h (9X) | DNF
BTER² | 1k | 20 | 3s | 87m | 12s (4X) | 6h
Figure 4 | 1k | 20 | 21s | 6.5h | 103s (5X) | DNF
Figure 5 | 100 | biennial | 4s | 43m | 16s (4X) | 13h
[Voeten, 2012] | ≈200 | annual | 10s | 3h | 93s (9X) | DNF
Enron | 150 | weekly | 1s | 7.5h | 1s (1X) | 60h
¹ EM for EdgeMonitoring (running time includes both KL and Euclidean), EM-KS for EdgeMonitoring with KS test, DC for DeltaCon, LC for LetoChange. EM and DC are implemented in MATLAB while LC is implemented in Python. All were run on a commercial desktop with a 48-hour time limit. Each running time is averaged over 5 runs.
² The BTER dataset has 800 snapshots.
5.2 Real World Data
Senate cosponsorship network [Fowler, 2006] We construct
a co-sponsorship network from bills (co-)sponsored
in the US Senate during the 93rd-108th Congresses. An edge is
formed between two congresspersons if they cosponsored the
same bill. Each bill corresponds to a snapshot, and forms a
clique of co-sponsors. A window is set to include all bills in
a single Congress (biennial).
We randomly selected 250 dyads and tracked their fluctua-
tions across the Congresses. We start from the 97th Congress
since full amendments data is available only from 97th ses-
sion onwards. Figure 5 compares EdgeMonitoring+KL,
DeltaCon and LetoChange. All methods were able to de-
tect the most significant change point at the 104th Congress.
Fowler [Fowler, 2006] points out that there was a “Republican
Revolution” in the 104th Congress which “caused a
dramatic change in the partisan and seniority compositions.”
The author also points out the significance of the 100th (high-
est clustering coefficient, significant collaboration) and 104th
Congress (lowest clustering coefficient, low point in collab-
oration) as inflection points in the Senate political process.
Both our EdgeMonitoring approach and LetoChange classify
these two Congresses as change points, but the latter takes
much more time. DeltaCon picks up on one (104th) and not
the other (100th). This provides evidence that our algorithm
is able to capture the changes in network evolution effectively
while being significantly faster than the state-of-the-art.
6 Conclusion
In this paper, we develop a change point detection algorithm
for dynamic networks that is efficient and accurate. Our ap-
proach relies on sampling and comparing the estimated joint
edge (dyad) distribution. We first develop a maximum likeli-
hood estimator, and analyze its drawbacks for small window
sizes (the typical case). We then develop a consistent and un-
biased estimator that overcomes the drawbacks of the MLE,
resulting in significant quality improvement over the MLE.
We conduct a thorough evaluation of our change point
detection algorithm against two state-of-the-art methods, DeltaCon and
LetoChange, on synthetic as well as real world datasets.
Our results indicate that our method is up to 9X faster than
DeltaCon while achieving better quality. In the future we plan
to extend our work to track higher order structures of the
network such as 3-profiles [Elenberg et al., 2015] or 4-profiles
and see how they evolve over time.
Acknowledgments
This work is supported in part by NSF grant DMS-1418265,
IIS-1550302 and IIS-1629548. Any opinions, findings, and
conclusions or recommendations expressed in this material
are those of the authors and do not necessarily reflect the
views of the National Science Foundation.
References
[Akoglu et al., 2014]Leman Akoglu, Hanghang Tong, and
Danai Koutra. Graph-based anomaly detection and de-
scription: A survey. Data Mining and Knowledge Dis-
covery (DAMI), 28(4), 2014.
[Berlingerio et al., 2012]Michele Berlingerio, Danai
Koutra, Tina Eliassi-Rad, and Christos Faloutsos. Net-
simile: a scalable approach to size-independent network
similarity. arXiv preprint arXiv:1209.2684, 2012.
[Bridges et al., 2015]Robert A Bridges, John P Collins,
Erik M Ferragut, Jason A Laska, and Blair D Sullivan.
Multi-level anomaly detection on time-varying graph data.
In Proceedings of the 2015 IEEE/ACM International Con-
ference on Advances in Social Networks Analysis and Min-
ing 2015, pages 579–583. ACM, 2015.
[Caceres and Berger-Wolf, 2013]Rajmonda Sulo Caceres
and Tanya Berger-Wolf. Temporal scale of dynamic net-
works. In Temporal Networks, pages 65–94. Springer,
2013.
[Elenberg et al., 2015]Ethan R Elenberg, Karthikeyan Shan-
mugam, Michael Borokhovich, and Alexandros G Di-
makis. Beyond triangles: A distributed framework for es-
timating 3-profiles of large graphs. In Proceedings of the
21th ACM SIGKDD International Conference on Knowl-
edge Discovery and Data Mining, pages 229–238. ACM,
2015.
[Erdős and Rényi, 1960]Paul Erdős and A Rényi. On the
evolution of random graphs. Publ. Math. Inst. Hungar.
Acad. Sci, 5:17–61, 1960.
[Fowler, 2006]James H Fowler. Legislative cosponsorship
networks in the US House and Senate. Social Networks,
28(4):454–465, 2006.
[Givens and Hoeting, 2012]Geof H Givens and Jennifer A
Hoeting. Computational statistics, volume 710. John Wi-
ley & Sons, 2012.
[Hunter et al., 2012]David R Hunter, Pavel N Krivitsky, and
Michael Schweinberger. Computational statistical meth-
ods for social network models. Journal of Computational
and Graphical Statistics, 21(4):856–882, 2012.
[Karrer and Newman, 2011]Brian Karrer and Mark EJ New-
man. Stochastic blockmodels and community structure in
networks. Physical Review E, 83(1):016107, 2011.
[Klimt and Yang, 2004]Bryan Klimt and Yiming Yang. The
enron corpus: A new dataset for email classification re-
search. In Machine learning: ECML 2004, pages 217–
226. Springer, 2004.
[Koutra et al., 2016]Danai Koutra, Neil Shah, Joshua T Vo-
gelstein, Brian Gallagher, and Christos Faloutsos. Delta-
con: Principled Massive-Graph Similarity Function with
Attribution. ACM Transactions on Knowledge Discovery
from Data (TKDD), 10(3):28, 2016.
[La Fond et al., 2014]Timothy La Fond, Jennifer Neville,
and Brian Gallagher. Anomaly detection in networks with
changing trends, 2014.
[Leskovec and Rok Sosič, 2014]Jure Leskovec and Rok
Sosič. SNAP: A general purpose network analysis
and graph mining library in C++.
http://snap.stanford.edu/snap, Jun 2014.
[Li et al., 2016]Shuang Li, Yao Xie, Mehrdad Farajtabar,
and Le Song. Detecting weak changes in dynamic events
over networks. arXiv preprint arXiv:1603.08981, 2016.
[Loglisci et al., 2015]Corrado Loglisci, Michelangelo Ceci,
and Donato Malerba. Relational mining for discovering
changes in evolving networks. Neurocomputing, 150:265–
288, 2015.
[mat, ]Wolfram Mathematica.
https://www.wolfram.com/mathematica/. Accessed: 2017-06-03.
[Moreno and Neville, 2013]Sebastian Moreno and Jennifer
Neville. Network hypothesis testing using mixed kro-
necker product graph models. In Data Mining (ICDM),
2013 IEEE 13th International Conference on, pages 1163–
1168. IEEE, 2013.
[Peel and Clauset, 2015]Leto Peel and Aaron Clauset. De-
tecting change points in the large-scale structure of evolv-
ing networks. In Twenty-Ninth AAAI Conference on Arti-
ficial Intelligence, 2015.
[Peixoto and Rosvall, 2015]Tiago P Peixoto and Martin
Rosvall. Modeling sequences and temporal networks
with dynamic community structures. arXiv preprint
arXiv:1509.04740, 2015.
[Pfeiffer III et al., 2012]Joseph J Pfeiffer III, Timothy
La Fond, Sebastian Moreno, and Jennifer Neville. Fast
generation of large scale social networks with clustering.
arXiv preprint arXiv:1202.4805, 2012.
[Pitman, 1937]Edwin JG Pitman. Significance tests which
may be applied to samples from any populations. Sup-
plement to the Journal of the Royal Statistical Society,
4(1):119–130, 1937.
[Ranshous et al., 2015]Stephen Ranshous, Shitian Shen,
Danai Koutra, Steve Harenberg, Christos Faloutsos, and
Nagiza F Samatova. Anomaly detection in dynamic net-
works: a survey. Wiley Interdisciplinary Reviews: Compu-
tational Statistics, 7(3):223–247, 2015.
[Seshadhri et al., 2012]C Seshadhri, Tamara G Kolda, and
Ali Pinar. Community structure and scale-free collections
of Erdős-Rényi graphs. Physical Review E, 85(5):056109,
2012.
[Shih and Parthasarathy, 2012]Yu-Keng Shih and Srini-
vasan Parthasarathy. Identifying functional modules in in-
teraction networks through overlapping markov clustering.
Bioinformatics, 28(18):i473–i479, 2012.
[Voeten, 2012]Erik Voeten. Data and analyses of voting in
the UN general assembly. Available at SSRN 2111149,
2012.
[Zhang et al., 2016]Xiao Zhang, Cristopher Moore, and
MEJ Newman. Random graph models for dynamic net-
works. arXiv preprint arXiv:1607.07570, 2016.