
Fast Change Point Detection on Dynamic Social Networks


Abstract

A number of real world problems in many domains (e.g. sociology, biology, political science and communication networks) can be modeled as dynamic networks with nodes representing entities of interest and edges representing interactions among the entities at different points in time. A common representation for such models is the snapshot model, where a network is defined at logical time-stamps. An important problem under this model is change point detection. In this work we devise an effective and efficient three-step approach for detecting change points in dynamic networks under the snapshot model. Our algorithm achieves up to 9X speedup over the state-of-the-art while improving quality on both synthetic and real world networks.
Yu Wang, Aniket Chakrabarti, David Sivakoff#, Srinivasan Parthasarathy
Department of Computer Science and Engineering, #Department of Statistics
The Ohio State University, Columbus, Ohio, USA
Contact: or
1 Introduction
Dynamic network analysis is increasingly used in complex
application domains ranging from social networks (Facebook
network evolution [Leskovec and Sosič, 2014])
to biological networks (protein-protein interaction [Shih and
Parthasarathy, 2012]), from political science (United Na-
tions General Assembly voting network [Voeten, 2012]) to
communication networks (Enron network [Klimt and Yang,
2004]). Such dynamic networks are often represented using
the snapshot model. Under this model, every network snapshot
(represented by a graph) is defined at a logical timestamp.
Two questions of fundamental importance are: (i) how
does a network evolve? (ii) when does a network change
significantly enough to arouse suspicion that something
fundamentally different is happening?
Various generative models [Peixoto and Rosvall, 2015;
Zhang et al., 2016] have been proposed to address question (i),
that is, to explain the evolution of a network. They study
network evolution under certain generative models [Erdős
and Rényi, 1960; Karrer and Newman, 2011]. In reality,
the generative model itself might change, as addressed in
question (ii) above. Existing works [Akoglu et al., 2014;
Ranshous et al., 2015] use complex methods to detect such
changes. One drawback of those sophisticated methods is that they
are time-consuming, and hence often not scalable (in terms of
both network size and number of snapshots). We seek
an efficient and effective solution that scales with both
network size and number of snapshots.
In this paper, we present a simple and efficient algo-
rithm based on likelihood maximization to detect change
points in dynamic networks under the snapshot model. We
demonstrate the utility of our algorithm on both synthetic
and real world networks drawn from political science (congressional
voting, UN voting), and show that it outperforms
two recent approaches (DeltaCon [Koutra et al., 2016] and
LetoChange [Peel and Clauset, 2015]) in terms of both quality
and efficiency. Our work has the following contributions:
1. Our approach is general purpose – it can accommodate
various snapshot generative models (see Table 1).
2. We model network evolution as a first order Markov pro-
cess and consequently our algorithm accounts for the
temporal dependency while computing the dissimilarity
between snapshots.
3. Our algorithm is efficient and has constant memory over-
head that can be tuned by a user controlled parameter.
We extensively evaluate our approach on synthetic as well as
real world networks and show that it performs well in terms of
both running time and detection quality.
2 Related Work
Ranshous et al. [Ranshous et al., 2015] and Akoglu et al.
[Akoglu et al., 2014] recently surveyed network anomaly detection.
Our change point detection problem is similar to Type
4, the “Event and Change Detection”, of the former: given
a network sequence, a dissimilarity scoring function, and a
threshold, a change is flagged if the dissimilarity of two consecutive
snapshots is above the threshold. We differ in that we
assume there is a latent generation model governing the network
dynamics, and we try to detect the change in the
latent space, while they do not explicitly mention a latent
generation model. Moreover, we consider the temporal dependency
across the snapshots, while no work in the surveys
accounts for temporal dependency.
DeltaCon [Koutra et al., 2016] uses a graph similarity-based
[Berlingerio et al., 2012] approach to detect change
points in dynamic networks. It derives the features of a snapshot
based on sociological theories and calculates the feature similarity
of each consecutive snapshot pair. That work is
model agnostic (it makes no assumption on the generation model
of networks), and is the state-of-the-art in terms of efficiency.
We compare our algorithm against it.
arXiv:1705.07325v2 [cs.SI] 4 Jun 2017
Moreno and Neville [Moreno and Neville, 2013], Bridges
et al. [Bridges et al., 2015] and Peel and Clauset [Peel and
Clauset, 2015] develop approaches based on network hypothesis testing.
The advantage is that one can get a p-value of the
test, which quantifies the confidence of the conclusion. However,
these approaches have two shortcomings: firstly, they
need to assume a specific generation model of the networks
(mKPGM, GBTER and GHRG respectively); secondly, they
are extremely slow, mostly due to the bootstrapping for p-value
calculation. La Fond et al.’s work [La Fond et al., 2014]
can also generate a p-value. It is tested against DeltaCon
without reporting running time, and an efficiency concern is also
mentioned in that paper. These algorithms will not work in our
setting, where the detection is done in real time under bounded
memory constraints. We compare our model agnostic algorithm
against [Peel and Clauset, 2015].
The DAPPER heuristic [Caceres and Berger-Wolf, 2013]
proposes an edge probability estimator similar to ours. However,
it does not consider the temporal dependency of snapshots.
Moreover, it focuses on temporal scale determination
while ours focuses on change point detection. Loglisci et al.
[Loglisci et al., 2015] study change point detection on relational
networks using rule-based analysis. Our approach uses
(hidden) parameter estimation instead of semantic rules to infer
the structure. Li et al. [Li et al., 2016] propose an online
algorithm and consider temporal dependency. The problem
they study is different from ours in that they study information
diffusion on a network with fixed structure and use continuous
time. A recent work by Zhang et al. [Zhang et al.,
2016] also studies the dynamic network in a Markov chain
setting. They focus on community detection while we focus
on change point detection.
3 Problem Formulation
This paper studies how to detect the times at which the funda-
mental evolution mechanism of a dynamic network changes.
We assume that there is some unknown underlying model that
governs the generative process. Our change point detection
algorithm is agnostic to this model. We assume that the ob-
served network snapshots are samples that depend on some
generative model and the previous snapshot. Networks
fluctuate across snapshots even when the generative model
stays unchanged. Only when the generative model changes
do we consider it a fundamental change. We represent the
evolutionary process as a Markov Network (Figure 1).
In Figure 1, $M_t$ is the network generation model at time $t$.
It is a triple $M_t = \langle \mathrm{Type}_t, \Theta_t, \alpha_t \rangle$, where $\alpha_t$ is the continuity
parameter at time $t$, $\mathrm{Type}_t$ specifies the model, and $\Theta_t$
represents the model parameters (Table 1 lists some
generative models we experiment on). $G_t$ is the network
(graph) observable at time $t$. We assume the number of vertices
in $G_t$ is fixed to be $N$ for all snapshots (the union of all
nodes is used when there is node addition/deletion, as in [Peel
and Clauset, 2015]), so each $G_t$ has $2^{\binom{N}{2}}$ possible configurations,
and $T$ is the total number of snapshots we observe.
As per Figure 1, the configuration of the network at time $t$,
$G_t$, depends on the generation model at time $t$, $M_t$ (unobserved),
and the network configuration at time $t-1$, $G_{t-1}$
(observed). Hence the networks in the observed sequence are
samples from a conditional distribution (samples are not independent).
The continuity rate parameter $\alpha_t$ controls the
fraction of edges and non-edges that are retained from the
previous snapshot, $G_{t-1}$. The network at time $t$ is assumed
to be generated in the following way: for each dyad, keep the
connection status from time $t-1$ with probability $1-\alpha_t$, and
with probability $\alpha_t$, resample the connection according to the
generation model at time $t$. Consequently, the smaller $\alpha_t$ is,
the more overlap there is between two consecutive snapshots.
Note that two consecutive network configurations may differ
substantially if $\alpha_t > 0$, even though the underlying generation
model may be the same. Moreover, the changes of the
generation model are assumed to be rare across the time span
($M_t \neq M_{t+1}$ is a rare event).

Figure 1: Representation of the underlying generative process (latent models $M_1, M_2, M_3, \ldots, M_t$). Our
inference is agnostic to the $M_t$s.
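The generation step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation; the function name `generate_snapshots`, the per-dyad probability vector `edge_prob`, and the use of NumPy are assumptions made for illustration. Each tracked dyad keeps its previous status with probability $1-\alpha$ and is resampled from the model with probability $\alpha$.

```python
import numpy as np


def generate_snapshots(edge_prob, alpha, num_snapshots, rng):
    """Simulate k dyads over time under the keep-or-resample rule.

    edge_prob: length-k sequence of model edge probabilities p_j (assumed input).
    alpha: probability that a dyad is resampled from the model at each step.
    Returns an array of shape (num_snapshots, k) with 0/1 entries.
    """
    edge_prob = np.asarray(edge_prob, dtype=float)
    k = edge_prob.size
    snapshots = np.empty((num_snapshots, k), dtype=np.int8)
    # First snapshot: sample every dyad directly from the model.
    snapshots[0] = rng.random(k) < edge_prob
    for t in range(1, num_snapshots):
        resample = rng.random(k) < alpha      # dyads redrawn from the model
        fresh = rng.random(k) < edge_prob     # candidate new statuses
        # Keep the old status unless the dyad was selected for resampling.
        snapshots[t] = np.where(resample, fresh, snapshots[t - 1])
    return snapshots
```

With `alpha=0` every snapshot repeats the first one, and with `alpha=1` the snapshots are independent draws from the model, which matches the two extremes of the continuity parameter.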
Problem Definition. Our goal is to efficiently find a set $S \subseteq \{2, \ldots, T\}$
such that $t \in S \iff M_t \neq M_{t-1}$, that is,
to efficiently find all the time points at which the network
generation model is different from the previous time point.
4 Methodology
Given the graphical formulation of the problem, exact infer-
ence is impossible since we do not know the underlying gen-
erative model, and our observations are stochastic. However,
even without prior knowledge of the generative model, we
can still design an approximate inference technique based on
MCMC sampling theory.
The framework is straightforward, as mentioned in Section 2:
we first extract a “feature vector” from each snapshot,
then quantify the dissimilarity between consecutive snapshots,
and flag a change point when the dissimilarity score
is above a threshold. We use the joint edge probability as the
“feature vector” (Section 4.1), use the Kolmogorov-Smirnov
statistic, Kullback-Leibler divergence and Euclidean distance
as dissimilarity measures (Section 4.2), and use a permutation-test-like
approach to determine the threshold (Section 4.3).
4.1 Edge Probability Estimation
In this subsection, we discuss how to (approximately) estimate
the joint distribution of the dyads¹. We track the presence
or absence of a small fixed number of dyads throughout
the entire observed sequence of network snapshots. We
break down the observation sequence into fixed-length windows,
and for each window we infer the joint distribution of
the dyads in our sample. We model each dyad as a conditionally
independent two-state Markov chain (Figure 2; we
use $\alpha$ instead of $\alpha_t$ in this section for brevity) given the sequence
of generative models $(M_t)_{t \ge 0}$ (this conditional independence
assumption is satisfied for the choices of models in
Table 1). Note that even for generative processes that may
result in greater dependence among dyads (such as the configuration
model), in many cases such dependence will be local,
and if the number of dyads sampled is small, then these
dyads will be spread out enough to be considered independent.
Moreover, the conditional independence assumption
significantly improves computational efficiency [Hunter et
al., 2012]. The marginal probabilities of these dyads can
then be estimated using the observed samples within each
time window.

¹ We refer to node pairs, which may or may not be linked, as dyads.

Table 1: Edge probability between a dyad in each model
Model | Edge probability | Explanation
Erdős–Rényi (ER) | $p(\langle n_i, n_j \rangle \mid M) = p$ | $p$: edge probability
Chung–Lu (CL) | $p(\langle n_i, n_j \rangle \mid M) = \beta w_i w_j / \sum_i w_i$ | $w_i$: weight of node $i$ ([Pfeiffer III et al., 2012]); $\beta$: edge density
Stochastic Block Model (SBM) | $p(\langle n_i, n_j \rangle \mid M) = p(c_i, c_j)$ | $c_i$: community assignment of node $n_i$; $p(r, s)$: probability of edges between communities $r$ and $s$
SBM-CL | $p(\langle n_i, n_j \rangle \mid M) \propto p(c_i, c_j)\, w_i w_j$ | notation as above
BTER | $p(\langle n_i, n_j \rangle \mid M) = p_{ER}\, I[c_i = c_j] + p_{CL}\, I[c_i \neq c_j]$ | Intra-community edge probability follows ER, inter-community CL [Seshadhri et al., 2012]; $I[\cdot]$ is the indicator function

Table 2: Notation Table
Notation | Explanation
$N$ | network size, in terms of the number of nodes
$T$ | number of snapshots
$t$ | time stamp, $t \in \{1, \ldots, T\}$
$S$ | set of all change points
$M_t$ | (unknown) generative model at time stamp $t$
$G_t$ | snapshot at time stamp $t$
$\alpha, \alpha_t$ | continuity rate (at time $t$)
$s$ | window size, or number of snapshots in a window
$W_t$ | a window of size $s$ ending at time $t$
$\eta$ | step size of the sliding windows
$w$ | number of windows (e.g. $\eta = 1 \Rightarrow w = T - 1$)
$e_j$ | connection status of dyad $j$
$\bar{M}$ | averaged number of edges in each snapshot: $\bar{M} = \frac{1}{T} \sum_{t=1}^{T} \sum_{j=1}^{\binom{N}{2}} e_j^{(t)}$
$p_j, p_j^{(t)}$ | connection probability of dyad $j$ (at time $t$)
$k$ | number of dyads to be sampled and tracked
$N^{(j)}_{01}$ | number of flips from 0 to 1 of dyad $j$ during the period of interest
$N^{(j)}_{**}$ | $N^{(j)}_{00}, N^{(j)}_{01}, N^{(j)}_{10}, N^{(j)}_{11}$
$N^{(j)}_{0}$ | number of disconnected occurrences of dyad $j$ in the period of interest, $N^{(j)}_{0} = N^{(j)}_{00} + N^{(j)}_{01}$
We formalize the estimation procedure below. Given a network
sequence $G_1, \ldots, G_T$, we group the networks into sliding
windows. We define $W_t$ to be a subsequence of $s$ consecutive
observed networks ending at network $G_t$, so
$W_t = (G_{t-s+1}, G_{t-s+2}, \ldots, G_t)$. We use equal-sized sliding windows
with a step size $\eta$, and we obtain a window sequence
$W_s, W_{s+\eta}, \ldots, W_{s+(i-1)\eta}, \ldots, W_T$. The non-overlapping window
setting uses $\eta = s$. In each window $i$, we can estimate
the joint edge distribution (for the selected dyads)
$P(e_1, e_2, \ldots, e_k \mid M_{s+(i-1)\eta})$, where $e_j = 1$ indicates an
undirected edge at the $j$-th dyad, and $k \ll \binom{N}{2}$ is the
number of dyads tracked. For each of the models in Table 1,
the joint distribution can be factorized into
$P(e_1, e_2, \ldots, e_k \mid M_{s+(i-1)\eta}) = \prod_j P(e_j \mid M_{s+(i-1)\eta})$
(conditional independence, see the method description above).

Figure 2: Two-state Markov chain. From state 0 the chain moves to state 1 with probability $\alpha p$ and stays with probability $1 - \alpha p$; from state 1 it moves to state 0 with probability $\alpha(1-p)$ and stays with probability $1 - \alpha(1-p)$.
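As a small illustration of the window indexing above, the following Python sketch lists the window end times $s, s+\eta, \ldots$ and extracts $W_t$ from a 1-indexed snapshot sequence. The helper names `window_ends` and `get_window` are hypothetical and chosen only for this example.

```python
def window_ends(T, s, eta):
    """End times t of the windows W_s, W_{s+eta}, ... that fit in 1..T."""
    return list(range(s, T + 1, eta))


def get_window(snapshots, t, s):
    """Return W_t = (G_{t-s+1}, ..., G_t) for a 1-indexed end time t.

    `snapshots` is any Python sequence holding G_1, ..., G_T in order,
    so G_t sits at position t - 1.
    """
    return snapshots[t - s:t]
```

For example, `window_ends(10, 4, 4)` gives the non-overlapping setting ($\eta = s$) with windows ending at 4 and 8, while `window_ends(10, 4, 2)` gives overlapping windows ending at 4, 6, 8 and 10.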
We can view a dyad across time as a two-state Markov
chain, and the chain length is the window size. We call a
dyad across time a chain in the following text for brevity. Let
$p_j \triangleq P(e_j = 1 \mid M_{s+(i-1)\eta})$, $q_j \triangleq 1 - p_j$, and suppose we
are interested in $k$ chains.
1) Maximum Likelihood Estimator (MLE)
The joint probability of the chains is (Figure 2)
$$P(N^{(j)}_{**} \mid \alpha, \vec{p}) = c_k \prod_{j=1}^{k} (\alpha p_j)^{N^{(j)}_{01}} (\alpha q_j)^{N^{(j)}_{10}} (1 - \alpha p_j)^{N^{(j)}_{00}} (1 - \alpha q_j)^{N^{(j)}_{11}} \qquad (1)$$
where $N^{(j)}_{01}$ is the number of transitions from 0 to 1 for a
chain (non-edge to edge for the dyad $j$ within the window),
$c_k$ stands for the combinatorial coefficients independent of
$\alpha, \vec{p}$, and $\sum_{p,q \in \{0,1\}} N^{(j)}_{pq} = s - 1$ for all $j$. Hence the
log-likelihood (omitting the coefficient $c_k$) is:
$$L(\alpha, \vec{p} \mid N^{(j)}_{**}) = \sum_{j=1}^{k} \left[ N^{(j)}_{01} \ln(\alpha p_j) + N^{(j)}_{10} \ln(\alpha q_j) + N^{(j)}_{00} \ln(1 - \alpha p_j) + N^{(j)}_{11} \ln(1 - \alpha q_j) \right] \qquad (2)$$
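MLE for a single chain. First consider the case of only one
chain. Solving the zero-derivative Equations (4) and (5)
leads to estimators of $\alpha$ and $p$. These estimators yield
a negative definite Hessian and are therefore the MLE. Hence
we have
$$\hat{\alpha}_{MLE} = \frac{N_{01} N_1 + N_{10} N_0}{N_0 N_1}, \qquad \hat{p}_{MLE} = \frac{N_{01} N_1}{N_{01} N_1 + N_{10} N_0} \qquad (3)$$
where $N_0 = N_{00} + N_{01}$ and $N_1 = N_{10} + N_{11}$.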
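The single-chain estimators can be checked numerically with a short Python sketch. It assumes a chain given as a list of 0/1 values; the names `transition_counts` and `single_chain_mle` are hypothetical. One way to see the closed form: with $a = \alpha p$ and $b = \alpha q$, the log-likelihood separates into two binomial terms, giving $\hat{a} = N_{01}/N_0$ and $\hat{b} = N_{10}/N_1$, so that $\hat{\alpha} = \hat{a} + \hat{b}$ and $\hat{p} = \hat{a}/(\hat{a} + \hat{b})$. The sketch returns `None` when the chain never leaves one state, and it does not clip $\hat{\alpha}$ to $[0, 1]$.

```python
from collections import Counter


def transition_counts(chain):
    """Count transitions (previous, next) along a 0/1 chain."""
    return Counter(zip(chain[:-1], chain[1:]))


def single_chain_mle(chain):
    """Closed-form single-chain estimates (alpha_hat, p_hat), or None.

    Uses N0 = N00 + N01 and N1 = N10 + N11. The estimate is undefined
    when the chain offers no transitions out of state 0 or out of state 1.
    """
    c = transition_counts(chain)
    n0 = c[(0, 0)] + c[(0, 1)]
    n1 = c[(1, 0)] + c[(1, 1)]
    if n0 == 0 or n1 == 0:
        return None
    numerator = c[(0, 1)] * n1 + c[(1, 0)] * n0
    alpha_hat = numerator / (n0 * n1)
    p_hat = c[(0, 1)] * n1 / numerator
    return alpha_hat, p_hat
```

For the chain `[0, 0, 1, 1, 0, 0, 1, 1, 1]` the counts are $N_{00} = 2$, $N_{01} = 2$, $N_{10} = 1$, $N_{11} = 3$, which gives $\hat{\alpha} = 12/16 = 0.75$ and $\hat{p} = 8/12 \approx 0.67$.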
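MLE for multiple chains. The MLE for multiple chains
essentially involves solving a high-degree polynomial, which
in general does not have a closed-form solution. Setting the derivatives of (2) to zero gives
$$\frac{\partial L}{\partial p_j} = \frac{N^{(j)}_{01}}{p_j} - \frac{N^{(j)}_{10}}{q_j} - \frac{\alpha N^{(j)}_{00}}{1 - \alpha p_j} + \frac{\alpha N^{(j)}_{11}}{1 - \alpha q_j} = 0 \qquad (4)$$
$$\frac{\partial L}{\partial \alpha} = \sum_j \left[ \frac{N^{(j)}_{01} + N^{(j)}_{10}}{\alpha} - \frac{p_j N^{(j)}_{00}}{1 - \alpha p_j} - \frac{q_j N^{(j)}_{11}}{1 - \alpha q_j} \right] = 0 \qquad (5)$$
where $1 - \alpha$ is the continuity rate. If $\alpha = 0$ then all snapshots
are identical, which is uninteresting, so we have $\alpha \neq 0$ in
Equation (5).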
Combining Equations (4) and (5), one can get a high-order
polynomial in $p_j$, which in general does not have closed-form
solutions by the Abel-Ruffini theorem. We tried solving a special
case with two chains using Wolfram Mathematica
[mat, ]. The solutions (of two quartic functions) turn out
to be very complicated and take over 40 pages. A common
way to solve such maximization problems is to employ numerical
methods such as gradient descent. The drawback of
such an approach is that it can be computationally expensive
with hundreds of dyads and windows. Therefore, we settle
for an approximation of the MLE that empirically approximates
the numerical values well². Intuitively, the estimator for $\alpha$
should depend on all the chains, but chains that spend more
time in both states 0 and 1 provide more information about $\alpha$
than chains that spend most time in one state (the latter may
be due to small $\alpha$ or to a value of $p_j$ far from 1/2). Since
we can easily compute the MLE for $\alpha$ for a single chain, we
estimate $\alpha$ with a weighted average of the MLEs from the
individual chains, with chains that spend more time in both
states being weighted more heavily. We then estimate each
$p_j$ by the MLE for the $j$th chain, since the chains are conditionally
independent given $\alpha$. This results in the following estimators:
$$\hat{\alpha}_{approx} = \sum_j w_j \hat{\alpha}_j^{MLE} = \sum_j w_j \frac{N^{(j)}_{01} N^{(j)}_1 + N^{(j)}_{10} N^{(j)}_0}{N^{(j)}_0 N^{(j)}_1}, \qquad \hat{p}_j^{approx} = \hat{p}_j^{MLE} = \frac{N^{(j)}_{01} N^{(j)}_1}{N^{(j)}_{01} N^{(j)}_1 + N^{(j)}_{10} N^{(j)}_0} \qquad (6)$$
where $w_j = [N^{(j)}_0 N^{(j)}_1]^p / \sum_i [N^{(i)}_0 N^{(i)}_1]^p$, $N^{(j)}_0 = N^{(j)}_{00} + N^{(j)}_{01}$ and $N^{(j)}_1 = N^{(j)}_{10} + N^{(j)}_{11}$.
Empirically we find the exponent $p = \infty$ works best,
which means $\hat{\alpha}_{approx}$ is a simple average of the $\hat{\alpha}_j^{MLE}$ corresponding
to the chains with maximal value of $N^{(j)}_0 N^{(j)}_1$. The
continuity rate describes the temporal dependency among
networks, and can help us determine a proper window size.

² At significance level 0.05, a two-sample t-test finds no significant difference between the approximated
values and the numerical values.
Drawbacks of MLE. Though MLEs are consistent in general,
there is no guarantee of unbiasedness for these particular
estimators with limited samples. Moreover, they have three
random quantities ($N_{01}, N_{10}, N_1, N_0$ in (6) have 3 degrees
of freedom for fixed $s$) and hence require more samples to
estimate, making them prohibitive in practice.
2) Simplified Estimator
To overcome the drawbacks of MLE, we propose a simple
estimator for the edge probability which is consistent and unbiased,
has only one random quantity and therefore requires
fewer samples. The simple estimator essentially estimates the
edge frequency in each window. If we know changes happen
rarely, and the process stays in equilibrium most of the time,
we can show the following estimator to be consistent and unbiased
in equilibrium:
$$\hat{p}_j^{eq} \triangleq \frac{N^{(j)}_1}{s} = \frac{\#\text{ of 1s in the chain}}{s} \qquad (7)$$
which is the proportion of snapshots within the window in which dyad $j$
is an edge.
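The edge-frequency estimator of Equation (7) amounts to a column mean over each window. A minimal NumPy sketch follows, assuming the tracked dyads are stored as a 0/1 array with one row per snapshot and one column per dyad; the function name `windowed_edge_frequency` is hypothetical.

```python
import numpy as np


def windowed_edge_frequency(snapshots, s, eta):
    """Estimate p_j in each window as (# of 1s in the chain) / s.

    snapshots: array-like of shape (T, k) with 0/1 entries.
    s: window size; eta: step size of the sliding windows.
    Returns an array of shape (num_windows, k).
    """
    snapshots = np.asarray(snapshots, dtype=float)
    T = snapshots.shape[0]
    if T < s:
        raise ValueError("need at least s snapshots to form one window")
    # Window ending at (1-indexed) time t covers rows t - s, ..., t - 1.
    ends = range(s, T + 1, eta)
    return np.stack([snapshots[t - s:t].mean(axis=0) for t in ends])
```

Only the count of 1s per dyad is needed, so in a streaming setting each window can be reduced to $k$ running counters rather than storing the snapshots themselves.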
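Proposition 1. In equilibrium, $\hat{p}_j^{eq}$ is consistent as the chain
length (window size) increases.
Proof. Let $\pi^{(j)}_0 \triangleq P(\text{non-edge of chain } j \text{ in equilibrium})$, $\pi^{(j)}_1 = 1 - \pi^{(j)}_0$, $s \gg 1$ (fixed),
and $N^{(j)}_1 \triangleq$ # of 1s in the chain.
By the ergodic theorem [Givens and Hoeting, 2012],
$$\pi^{(j)}_1 \overset{\text{a.s.}}{=} \lim_{s \to \infty} N^{(j)}_1 / s = \lim_{s \to \infty} \hat{p}_j^{eq},$$
where the first equality means almost sure convergence, and implies
convergence in probability. Since $\pi^{(j)}_1 = p_j$ (see the balance equation
in the proof of Proposition 2), the estimator is consistent.
Proposition 2. In equilibrium, $\hat{p}_j^{eq}$ is unbiased.
Proof. Let $p^{(j)}_{01} \triangleq p_j \alpha$ and $p^{(j)}_{10} \triangleq (1 - p_j) \alpha$ (Figure 2). Then
$$\pi^{(j)}_0 p^{(j)}_{01} + \pi^{(j)}_1 p^{(j)}_{11} = \pi^{(j)}_1 \implies \pi^{(j)}_0 p^{(j)}_{01} = \pi^{(j)}_1 p^{(j)}_{10} \implies \pi^{(j)}_0 p_j \alpha = \pi^{(j)}_1 \alpha (1 - p_j)$$
$$\implies p_j = \pi^{(j)}_1 \; (= 1 - \pi^{(j)}_0) = E N^{(j)}_1 / s \implies E \hat{p}_j^{eq} = E N^{(j)}_1 / s = p_j,$$
so $\hat{p}_j^{eq}$ is unbiased.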
The above propositions imply that the larger the window
size the better the estimation, and that in equilibrium, the
temporal dependency (continuity rate) has no impact on es-
timating the onset probability of a Markov chain, and hence
no impact on estimating the edge probability of a snapshot.
Although MLE is close to the true value when the chain is
long enough (100 or longer), we do not use so large a window
size in practice (20 usually, no larger than 50). Experiments
(Figure 3) show that the simplified estimator is much better
than MLE for change point detection in practice.
Figure 3: Comparison between MLE (red, top) and Simplified
Estimator (green, bottom) for change point detection on the same
sequence as Figure 4 and Table 3. The KL distance measure is used;
other measures give consistent results. Horizontal bars are the corresponding
thresholds. MLE has large fluctuation, and increasing the
window size reduces fluctuation; MLE misses true changes 3
and 5, while the edge frequency estimator has perfect recall and
precision (see Section 5 for detail). (re-scaled and shifted for visualization)
4.2 Distance measure
Now, we need to compare the probability distributions of
edges across consecutive windows. The Kolmogorov-Smirnov
(KS) statistic and the Kullback-Leibler (KL) divergence are two
common measures for comparing distributions. Their calculation
requires enumerating the whole state space and is
hence exponential in the number of variables for joint distributions.
Although the KS statistic is designed for univariate distributions,
we can map the joint distribution, which has multivariate
binary variables, to one dimension by interpreting the
binary vectors as integers and then use the KS statistic. We bootstrap
from the empirical distributions of two consecutive windows
respectively and use the two-sample KS test to quantify the difference
between the two distributions. We can use divide-and-conquer to
alleviate the exponential complexity: partition the dyads into
$g$ groups, compute the KL/KS dissimilarity within each small
group, and record the median among all the $g$ groups as the
final dissimilarity.
Both of the above measures have good quality in terms
of change point detection (Figure 4), but the KS statistic is extremely
slow (Table 4), mostly due to the large bootstrap sample
drawn from each window. Euclidean distance, though lacking a
probabilistic interpretation, has linear complexity and has reasonable
quality in practice.
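The grouped KL dissimilarity and the Euclidean alternative can be sketched in Python as follows. This is an illustrative reading of the description above rather than the authors' code: the function names are hypothetical, each window is assumed to be a 0/1 array of shape (s, k), and the additive `smoothing` constant (used to avoid division by zero for unobserved states) is an assumption that the text does not specify.

```python
import numpy as np


def group_distribution(window, smoothing=1e-3):
    """Empirical joint distribution of one dyad group within a window.

    Each snapshot's 0/1 pattern over the m dyads of the group is read as
    an integer in [0, 2**m); smoothing is an assumed regularizer.
    """
    window = np.asarray(window, dtype=np.int64)
    m = window.shape[1]
    codes = window @ (1 << np.arange(m))
    counts = np.bincount(codes, minlength=2 ** m).astype(float) + smoothing
    return counts / counts.sum()


def kl_divergence(p, q):
    """KL(p || q) for two strictly positive probability vectors."""
    return float(np.sum(p * np.log(p / q)))


def grouped_kl(window_a, window_b, num_groups):
    """Median over dyad groups of the KL divergence between two windows."""
    window_a = np.asarray(window_a)
    window_b = np.asarray(window_b)
    groups = np.array_split(np.arange(window_a.shape[1]), num_groups)
    scores = [
        kl_divergence(group_distribution(window_a[:, g]),
                      group_distribution(window_b[:, g]))
        for g in groups
    ]
    return float(np.median(scores))


def euclidean_distance(window_a, window_b):
    """Euclidean distance between per-dyad edge frequencies of two windows."""
    freq_a = np.asarray(window_a, dtype=float).mean(axis=0)
    freq_b = np.asarray(window_b, dtype=float).mean(axis=0)
    return float(np.linalg.norm(freq_a - freq_b))
```

With 250 dyads split into 25 groups, each group has 10 dyads and a state space of $2^{10}$ patterns, which is what keeps the enumeration tractable; the Euclidean variant only needs the $k$ per-dyad frequencies.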
4.3 Threshold Determination
Suppose we have $w$ windows; then we compare $w-1$ pairs of
distributions and get $w-1$ difference/distance scores. How
do we choose a threshold to determine at which window the
network changes? We use a permutation test [Pitman, 1937]
based approach to determine the threshold. For a desired
significance level $\alpha_s$, we bootstrap from the $w-1$ distance
scores, and use the upper $100\alpha_s\%$ quantile as the threshold.
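One possible reading of this thresholding step is sketched below in Python. The text does not spell out how the bootstrap samples are combined, so averaging the per-resample quantiles is an assumption, as are the function names `bootstrap_threshold` and `flag_changes` and the default of 1000 resamples.

```python
import numpy as np


def bootstrap_threshold(scores, significance=0.05, num_resamples=1000, rng=None):
    """Upper 100*significance % quantile of the scores, via bootstrap.

    Resamples the w-1 scores with replacement, takes the
    (1 - significance) quantile of each resample, and averages them
    (the averaging rule is an assumption of this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    resamples = rng.choice(scores, size=(num_resamples, scores.size), replace=True)
    quantiles = np.quantile(resamples, 1.0 - significance, axis=1)
    return float(quantiles.mean())


def flag_changes(scores, threshold):
    """Indices of window pairs whose dissimilarity exceeds the threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]
```

Index `i` in the result refers to the comparison between window `i` and window `i + 1`, so a flagged index marks the later window as the one where the change is detected.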
4.4 Complexity Analysis
The algorithm is linear in the number of windows and constant
in the network size for moderately large networks. Only a
small fraction of the dyads in the network is sampled and tracked.
The sampling of the dyads is performed only once at the beginning,
and hence is independent of the number of snapshots. For
each snapshot, selecting a specific set of dyads has cost linear
in the number of edges. Each window is scanned only once,
and therefore the time cost is linear in the number of windows.
Moreover, since the number of windows is linear in
the total number of snapshots even in the worst case (windows
are overlapping, and the window step is one), the algorithm
is linear in the number of snapshots. Therefore, the time complexity
is $O(\bar{M} T)$, where $\bar{M}$ is the averaged number of edges
in each snapshot, and $T$ is the number of snapshots.
The memory cost is low, and can be viewed as constant:
for each snapshot, only the information of the tracked dyads
is stored; information of dyads within the same window is aggregated;
dyad information in the old window is overwritten
once it is compared against the new window. The space
complexity is $O(c)$, where $c$ is a prescribed sample size. Theoretically,
the sample size should be proportional to the network
size for good estimation. Our experiments show that a
fixed sample size (tracking 250 out of the 1.2G dyads of a 50k-node network)
works well on a moderately large network.
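The sampling-and-tracking step behind this cost analysis can be sketched in Python. The sketch assumes nodes are labeled `0..num_nodes-1` and each snapshot arrives as an iterable of undirected edges `(u, v)`; the function names are hypothetical. Dyads are sampled once, and each snapshot is reduced to a length-$k$ 0/1 vector in time linear in its number of edges.

```python
import random


def sample_dyads(num_nodes, k, rng):
    """Sample k distinct unordered node pairs uniformly at random."""
    if k > num_nodes * (num_nodes - 1) // 2:
        raise ValueError("k exceeds the number of dyads")
    dyads = set()
    while len(dyads) < k:
        u, v = rng.sample(range(num_nodes), 2)
        dyads.add((min(u, v), max(u, v)))
    return sorted(dyads)


def track_snapshot(edges, dyads):
    """Connection status (0/1) of each tracked dyad in one snapshot.

    Building the edge set is linear in the number of edges; the k
    lookups afterwards take constant expected time each.
    """
    edge_set = {(min(u, v), max(u, v)) for u, v in edges}
    return [1 if dyad in edge_set else 0 for dyad in dyads]
```

Calling `track_snapshot` once per snapshot and accumulating the returned vectors into per-window counts keeps the stored state at $O(k)$, independent of the network size.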
(a) Likelihood (ground truth)
(b) Algorithms comparison. Curves are dissimilarity scores and
horizontal bars are thresholds, with matching
colors and line shapes. (re-scaled and shifted for visualization)
Figure 4: SBM, ground truth changes explained in Table 3. $1-\alpha$
= 0.51 and window size = 20. DeltaCon (Fig b-top) and EM-KL
(Fig b-bottom) have the smallest variance, but DeltaCon has two
false negatives at changes 4 and 5.
Table 3: Model Change Explanation for an SBM-CL Experiment in Figure 4
Order Window Index Type of Change
1 15 The weight sequence of 1/3 of the nodes is re-generated
2 30 The weight sequence of 2/3 of the nodes is re-generated
3 60 Half of the communities change their (inter- and intra-community) connection rate, overall density retained
4 75 All of the communities change their (inter- and intra-community) connection rate, overall density retained
5 90 Half of the communities change their (inter- and intra-community) connection rate, overall density changed
6 105 All of the communities change their (inter- and intra-community) connection rate, overall density changed
7 135 Community assignments of all the nodes are changed
5 Experiments And Results
We thoroughly evaluated our edge probability estimation
based change point detection algorithm (called
EdgeMonitoring for simplicity) on synthetic and real world
datasets. For the synthetic datasets, the generative process
is known, and we can compute the ground truth in the form
of likelihood, which is a natural baseline choice. We also
use the state-of-the-art DeltaCon [Koutra et al., 2016] and
LetoChange [Peel and Clauset, 2015] as two baselines.
Figure 5: Change point detection on US Senate co-sponsorship
network. Change points at the 100th and the 104th Congresses
(boxed) correspond to partisan domination shifts. Both EM-KL
(green) and LetoChange (cyan) have perfect recall and precision,
while DeltaCon (pink) has 3 false positives and 1 false negative.
5.1 Synthetic Data
Data generation³. We generate a sequence of networks from
a fixed generation model. The snapshots are not independent:
each snapshot depends on the preceding one through the continuity
parameter $\alpha$ ($\alpha_t \equiv \alpha$). For each snapshot, each dyad
is selected independently with probability $\alpha$, and if selected,
it is sampled again from the generative model (Table
1). We introduce the change points by changing the generative
model in the middle of the sequence of snapshots. Note
that this change may be simply a change of parameter values
for a given model (e.g. ER$_{0.4}$ to ER$_{0.6}$), or a change in
the model type (e.g. SBM to ER). Since our algorithm
makes no assumptions about model specifics, we are
able to detect both kinds of changes. We only inject parameter
changes in the synthetic experiments since the latter kind of change
is easily detectable. Sample changes are displayed in Table 3.
The likelihood of the snapshot sequence is also provided.

³ Generated using SNAP [Leskovec and Sosič, 2014].
We ran experiments with network sizes ranging from 1k to
50k, window sizes from 10 to 100, and continuity rates
$1-\alpha$ of 0.51 and 0.9. We generated a total of 5000 snapshots
and sampled 250 dyads uniformly at random to track.
Both overlapping windows ($s = 2\eta$) and non-overlapping windows
give similar results, yet the latter is faster simply due to
fewer windows. Hence we display non-overlapping window
results only. For KL and KS, the dyads are grouped into 25 equal-sized
groups. We use the upper 5% quantile as the threshold.
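A toy version of this synthetic setup can be sketched end to end in Python. It is only an illustration under stated assumptions: the tracked dyads follow an ER model, the change is from edge probability 0.2 to 0.8 (deliberately larger than the ER$_{0.4}$ to ER$_{0.6}$ example so that the effect is unmistakable in a short run), the sequence has 200 snapshots instead of 5000, and the helper names `simulate_with_change` and `euclidean_scores` are hypothetical. Windows are non-overlapping and the dissimilarity is the Euclidean distance between edge-frequency vectors.

```python
import numpy as np


def simulate_with_change(p_before, p_after, change_at, T, k, alpha, rng):
    """Track k ER dyads over T snapshots; the model changes at `change_at`."""
    snapshots = np.empty((T, k), dtype=np.int8)
    snapshots[0] = rng.random(k) < p_before
    for t in range(1, T):
        p = p_before if t < change_at else p_after
        resample = rng.random(k) < alpha          # dyads redrawn this step
        fresh = rng.random(k) < p                 # draws from the current model
        snapshots[t] = np.where(resample, fresh, snapshots[t - 1])
    return snapshots


def euclidean_scores(snapshots, s):
    """Euclidean distance between edge frequencies of consecutive
    non-overlapping windows of size s."""
    num_windows = snapshots.shape[0] // s
    trimmed = snapshots[: num_windows * s]
    freqs = trimmed.reshape(num_windows, s, -1).mean(axis=1)
    return np.linalg.norm(np.diff(freqs, axis=0), axis=1)


rng = np.random.default_rng(0)
snapshots = simulate_with_change(0.2, 0.8, change_at=100, T=200, k=250,
                                 alpha=0.49, rng=rng)
scores = euclidean_scores(snapshots, s=20)
# Score i compares window i with window i + 1; the model changes between
# windows 4 and 5, so the largest score is expected at index 4.
print(int(np.argmax(scores)))
```

Even with an unchanged model the scores are not zero, because the edge frequencies fluctuate from window to window; this is why a data-driven threshold is used rather than a fixed cutoff.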
Results. Figure 4 shows the qualitative comparison and
Table 4 reports the efficiency. Figure 4a shows that the likelihood
of the network drops dramatically after the generative
model changes, and recovers to a new equilibrium afterwards.
Our EdgeMonitoring (EM-Eu, EM-KL) approach
successfully identifies all change points with a 5X speedup over
DeltaCon. The changes are explained in Table 3. We can
see that EM-KL has the best performance: little fluctuation
and perfect precision and recall. DeltaCon, though it has the smallest
fluctuation, misses two change points. Both EM-KS and
EM-Eu have large fluctuation. The quality of EM-KS heavily
relies on the joint probability estimation, and we do see
smaller fluctuation and higher recall for larger window sizes.
EM-KL has the best
overall performance, in terms of both quality and time efficiency.
We believe grouping together with median selection
contributes to its superiority.
Table 4: Time efficiency comparison (5k snapshots)
Model | Network size | Window size | EM¹ time | EM-KS time | DC time (EM speedup) | LC time
CL | 1k | 20 | 18s | 11h | 91s (5X) | DNF
SBM-CL | 1k | 10 | 27s | 22h | 125s (5X) | DNF
SBM-CL | 1k | 50 | 9s | 4.5h | 43s (5X) | DNF
SBM-CL | 5k | 20 | 54s | 11h | 309s (6X) | DNF
SBM-CL | 10k | 20 | 232s | 10h | 32m (8X) | DNF
SBM-CL | 50k | 20 | 26m | 10h | 4h (9X) | DNF
BTER² | 1k | 20 | 3s | 87m | 12s (4X) | 6h
Figure 4 | 1k | 20 | 21s | 6.5h | 103s (5X) | DNF
Figure 5 | 100 | biennial | 4s | 43m | 16s (4X) | 13h
[Voeten, 2012] | 200 | annual | 10s | 3h | 93s (9X) | DNF
Enron | 150 | weekly | 1s | 7.5h | 1s (1X) | 60h
¹ EM for EdgeMonitoring (running time includes both KL and Euclidean),
EM-KS for EdgeMonitoring with KS test, DC for DeltaCon, LC for
LetoChange. EM and DC are implemented in MATLAB while LC is in
Python. All run on a commercial desktop with 48hrs as the time limit. Each
running time is averaged over 5 runs.
² The BTER dataset has 800 snapshots.
5.2 Real World Data
Senate cosponsorship network [Fowler, 2006]. We construct
a co-sponsorship network from bills (co-)sponsored
in the US Senate during the 93rd-108th Congresses. An edge is
formed between two congresspersons if they cosponsored the
same bill. Each bill corresponds to a snapshot, and forms a
clique of co-sponsors. A window is set to include all bills in
a single Congress (biennial).
We randomly selected 250 dyads and tracked their fluctuations
across the Congresses. We start from the 97th Congress
since full amendments data is available only from the 97th session
onwards. Figure 5 compares EdgeMonitoring+KL,
DeltaCon and LetoChange. All methods were able to detect
the most significant change point at the 104th Congress.
Fowler [Fowler, 2006] points out that there was a “Republican
Revolution” in the 104th Congress which “caused a
dramatic change in the partisan and seniority compositions.”
The author also points out the significance of the 100th (highest
clustering coefficient, significant collaboration) and 104th
Congress (lowest clustering coefficient, low point in collaboration)
as inflection points in the Senate political process.
Both our EdgeMonitoring approach and LetoChange classify
these two Congresses as change points, but the latter takes
much more time. DeltaCon picks up on one (104th) and not
the other (100th). This provides evidence that our algorithm
is able to capture the changes in network evolution effectively
while being significantly faster than the state-of-the-art.
6 Conclusion
In this paper, we develop a change point detection algorithm
for dynamic networks that is efficient and accurate. Our ap-
proach relies on sampling and comparing the estimated joint
edge (dyad) distribution. We first develop a maximum likeli-
hood estimator, and analyze its drawbacks for small window
sizes (the typical case). We then develop a consistent and un-
biased estimator that overcomes the drawbacks of the MLE,
resulting in significant quality improvement over the MLE.
We conduct a thorough evaluation of our change point detection
algorithm against two state-of-the-art methods, DeltaCon and
LetoChange, on synthetic as well as real world datasets.
Our results indicate that our method is up to 9X faster than
DeltaCon while achieving better quality. In the future we plan
to extend our work to track higher order structures of the network
such as 3-profiles [Elenberg et al., 2015] or 4-profiles
and see how they evolve over time.
Acknowledgments
This work is supported in part by NSF grants DMS-1418265,
IIS-1550302 and IIS-1629548. Any opinions, findings, and
conclusions or recommendations expressed in this material
are those of the authors and do not necessarily reflect the
views of the National Science Foundation.
[Akoglu et al., 2014] Leman Akoglu, Hanghang Tong, and
Danai Koutra. Graph-based anomaly detection and description:
A survey. Data Mining and Knowledge Discovery (DAMI),
28(4), 2014.
[Berlingerio et al., 2012] Michele Berlingerio, Danai Koutra,
Tina Eliassi-Rad, and Christos Faloutsos. Netsimile: a
scalable approach to size-independent network similarity.
arXiv preprint arXiv:1209.2684, 2012.
[Bridges et al., 2015] Robert A Bridges, John P Collins,
Erik M Ferragut, Jason A Laska, and Blair D Sullivan.
Multi-level anomaly detection on time-varying graph data.
In Proceedings of the 2015 IEEE/ACM International Conference
on Advances in Social Networks Analysis and Mining 2015,
pages 579–583. ACM, 2015.
[Caceres and Berger-Wolf, 2013] Rajmonda Sulo Caceres
and Tanya Berger-Wolf. Temporal scale of dynamic networks.
In Temporal Networks, pages 65–94. Springer, 2013.
[Elenberg et al., 2015] Ethan R Elenberg, Karthikeyan
Shanmugam, Michael Borokhovich, and Alexandros G Dimakis.
Beyond triangles: A distributed framework for estimating
3-profiles of large graphs. In Proceedings of the 21th ACM
SIGKDD International Conference on Knowledge Discovery and
Data Mining, pages 229–238. ACM, 2015.
[Erdős and Rényi, 1960] Paul Erdős and A Rényi. On the
evolution of random graphs. Publ. Math. Inst. Hungar.
Acad. Sci, 5:17–61, 1960.
[Fowler, 2006] James H Fowler. Legislative cosponsorship
networks in the US House and Senate. Social Networks,
28(4):454–465, 2006.
[Givens and Hoeting, 2012] Geof H Givens and Jennifer A
Hoeting. Computational statistics, volume 710. John Wiley
& Sons, 2012.
[Hunter et al., 2012] David R Hunter, Pavel N Krivitsky, and
Michael Schweinberger. Computational statistical methods
for social network models. Journal of Computational and
Graphical Statistics, 21(4):856–882, 2012.
[Karrer and Newman, 2011] Brian Karrer and Mark EJ Newman.
Stochastic blockmodels and community structure in
networks. Physical Review E, 83(1):016107, 2011.
[Klimt and Yang, 2004] Bryan Klimt and Yiming Yang. The
enron corpus: A new dataset for email classification
research. In Machine learning: ECML 2004, pages 217–226.
Springer, 2004.
[Koutra et al., 2016] Danai Koutra, Neil Shah, Joshua T
Vogelstein, Brian Gallagher, and Christos Faloutsos.
Deltacon: Principled Massive-Graph Similarity Function with
Attribution. ACM Transactions on Knowledge Discovery
from Data (TKDD), 10(3):28, 2016.
[La Fond et al., 2014] Timothy La Fond, Jennifer Neville,
and Brian Gallagher. Anomaly detection in networks with
changing trends, 2014.
[Leskovec and Rok Sosič, 2014] Jure Leskovec and Rok
Sosič. SNAP: A general purpose network analysis
and graph mining library in C++. http://snap., Jun 2014.
[Li et al., 2016] Shuang Li, Yao Xie, Mehrdad Farajtabar,
and Le Song. Detecting weak changes in dynamic events
over networks. arXiv preprint arXiv:1603.08981, 2016.
[Loglisci et al., 2015] Corrado Loglisci, Michelangelo Ceci,
and Donato Malerba. Relational mining for discovering
changes in evolving networks. Neurocomputing,
150:265–288, 2015.
[mat, ] Wolfram mathematica. https://www. Accessed: 2017-06-
[Moreno and Neville, 2013] Sebastian Moreno and Jennifer
Neville. Network hypothesis testing using mixed kronecker
product graph models. In Data Mining (ICDM), 2013 IEEE
13th International Conference on, pages 1163–1168.
IEEE, 2013.
[Peel and Clauset, 2015] Leto Peel and Aaron Clauset.
Detecting change points in the large-scale structure of
evolving networks. In Twenty-Ninth AAAI Conference on
Artificial Intelligence, 2015.
[Peixoto and Rosvall, 2015] Tiago P Peixoto and Martin
Rosvall. Modeling sequences and temporal networks
with dynamic community structures. arXiv preprint
arXiv:1509.04740, 2015.
[Pfeiffer III et al., 2012] Joseph J Pfeiffer III, Timothy
La Fond, Sebastian Moreno, and Jennifer Neville. Fast
generation of large scale social networks with clustering.
arXiv preprint arXiv:1202.4805, 2012.
[Pitman, 1937] Edwin JG Pitman. Significance tests which
may be applied to samples from any populations.
Supplement to the Journal of the Royal Statistical Society,
4(1):119–130, 1937.
[Ranshous et al., 2015] Stephen Ranshous, Shitian Shen,
Danai Koutra, Steve Harenberg, Christos Faloutsos, and
Nagiza F Samatova. Anomaly detection in dynamic networks:
a survey. Wiley Interdisciplinary Reviews: Computational
Statistics, 7(3):223–247, 2015.
[Seshadhri et al., 2012] C Seshadhri, Tamara G Kolda, and
Ali Pinar. Community structure and scale-free collections
of Erdős-Rényi graphs. Physical Review E, 85(5):056109,
2012.
[Shih and Parthasarathy, 2012] Yu-Keng Shih and Srinivasan
Parthasarathy. Identifying functional modules in
interaction networks through overlapping markov clustering.
Bioinformatics, 28(18):i473–i479, 2012.
[Voeten, 2012] Erik Voeten. Data and analyses of voting in
the UN general assembly. Available at SSRN 2111149, 2012.
[Zhang et al., 2016] Xiao Zhang, Cristopher Moore, and
MEJ Newman. Random graph models for dynamic networks.
arXiv preprint arXiv:1607.07570, 2016.