Preprint

Fast and Attributed Change Detection on Dynamic Graphs with Density of States

Authors:
Preprints and early-stage research may not have been peer reviewed yet.
To read the file of this research, you can request a copy directly from the authors.

Abstract

How can we detect traffic disturbances from international flight transportation logs or changes to collaboration dynamics in academic networks? These problems can be formulated as detecting anomalous change points in a dynamic graph. Current solutions do not scale well to large real-world graphs, lack robustness to large amounts of node additions/deletions, and overlook changes in node attributes. To address these limitations, we propose a novel spectral method: Scalable Change Point Detection (SCPD). SCPD generates an embedding for each graph snapshot by efficiently approximating the distribution of the Laplacian spectrum at each step. SCPD can also capture shifts in node attributes by tracking correlations between attributes and eigenvectors. Through extensive experiments using synthetic and real-world data, we show that SCPD (a) achieves state-of-the art performance, (b) is significantly faster than the state-of-the-art methods and can easily process millions of edges in a few CPU minutes, (c) can effectively tackle a large quantity of node attributes, additions or deletions and (d) discovers interesting events in large real-world graphs. The code is publicly available at https://github.com/shenyangHuang/SCPD.git

No file available

Request Full-text Paper PDF

To read the file of this research,
you can request a copy directly from the authors.

ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Networks provide a powerful formalism for modeling complex systems, by representing the underlying set of pairwise interactions. But much of the structure within these systems involves interactions that take place among more than two nodes at once; for example, communication within a group rather than person-to-person, collaboration among a team rather than a pair of co-authors, or biological interaction between a set of molecules rather than just two. We refer to these type of simultaneous interactions on sets of more than two nodes as higher-order interactions; they are ubiquitous, but the empirical study of them has lacked a general framework for evaluating higher-order models. Here we introduce such a framework, based on link prediction, a fundamental problem in network analysis. The traditional link prediction problem seeks to predict the appearance of new links in a network, and here we adapt it to predict which (larger) sets of elements will have future interactions. We study the temporal evolution of 19 datasets from a variety of domains, and use our higher-order formulation of link prediction to assess the types of structural features that are most predictive of new multi-way interactions. Among our results, we find that different domains vary considerably in their distribution of higher-order structural parameters, and that the higher-order link prediction problem exhibits some fundamental differences from traditional pairwise link prediction, with a greater role for local rather than long-range information in predicting the appearance of new interactions.
Conference Paper
Full-text available
Automatic Dependent Surveillance-Broadcast (ADS-B) is one of the key components of the next generation air transportation system. Since ADS-B will become mandatory by 2020 for most airspaces, it is important that aspects such as capacity, applications, and security are investigated by an independent research community. However, large-scale real-world data was previously only accessible to a few closed industrial and governmental groups because it required specialized and expensive equipment. To enable researchers to conduct experimental studies based on real data, we developed OpenSky, a sensor network based on low-cost hardware connected over the Internet. OpenSky is based on off-the-shelf ADS-B sensors distributed to volunteers throughout Central Europe. It covers 720,000 sq km2, is able to capture more than 30% of the commercial air traffic in Europe, and enables researchers to analyze billions of ADS-B messages. In this paper, we report on the challenges we faced during the development and deployment of this participatory network and the insights we gained over the last two years of operations as a service to academic research groups. We go on to provide real-world insights about the possibilities and limitations of such low-cost sensor networks concerning air traffic surveillance and further applications such as multilateration.
Conference Paper
Full-text available
We report on an automated runtime anomaly detection method at the application layer of multi-node computer systems. Although several network management systems are available in the market, none of them have sufficient capabilities to detect faults in multi-tier Web-based systems with redundancy. We model a Web-based system as a weighted graph, where each node represents a "service" and each edge represents a dependency between services. Since the edge weights vary greatly over time, the problem we address is that of anomaly detection from a time sequence of graphs.In our method, we first extract a feature vector from the adjacency matrix that represents the activities of all of the services. The heart of our method is to use the principal eigenvector of the eigenclusters of the graph. Then we derive a probability distribution for an anomaly measure defined for a time-series of directional data derived from the graph sequence. Given a critical probability, the threshold value is adaptively updated using a novel online algorithm.We demonstrate that a fault in a Web application can be automatically detected and the faulty services are identified without using detailed knowledge of the behavior of the system.
Conference Paper
Spectral analysis connects graph structure to the eigenvalues and eigenvectors of associated matrices. Much of spectral graph theory descends directly from spectral geometry, the study of differentiable manifolds through the spectra of associated differential operators. But the translation from spectral geometry to spectral graph theory has largely focused on results involving only a few extreme eigenvalues and their associated eigenvalues. Unlike in geometry, the study of graphs through the overall distribution of eigenvalues --- the \em spectral density --- is largely limited to simple random graph models. The interior of the spectrum of real-world graphs remains largely unexplored, difficult to compute and to interpret. In this paper, we delve into the heart of spectral densities of real-world graphs. We borrow tools developed in condensed matter physics, and add novel adaptations to handle the spectral signatures of common graph motifs. The resulting methods are highly efficient, as we illustrate by computing spectral densities for graphs with over a billion edges on a single compute node. Beyond providing visually compelling fingerprints of graphs, we show how the estimation of spectral densities facilitates the computation of many common centrality measures, and use spectral densities to estimate meaningful information about graph structure that cannot be inferred from the extremal eigenpairs alone.
Conference Paper
How do we spot interesting events from e-mail or transportation logs? How can we detect port scan or denial of service attacks from IP-IP communication data? In general, given a sequence of weighted, directed or bipartite graphs, each summarizing a snapshot of activity in a time window, how can we spot anomalous graphs containing the sudden appearance or disappearance of large dense subgraphs (e.g., near bicliques) in near real-time using sublinear memory? To this end, we propose a randomized sketching-based approach called SpotLight, which guarantees that an anomalous graph is mapped 'far' away from 'normal' instances in the sketch space with high probability for appropriate choice of parameters. Extensive experiments on real-world datasets show that SpotLight (a) improves accuracy by at least 8.4% compared to prior approaches, (b) is fast and can process millions of edges within a few minutes, (c) scales linearly with the number of edges and sketching dimensions and (d) leads to interesting discoveries in practice.
Conference Paper
A number of real world problems in many domains (e.g. sociology, biology, political science and communication networks) can be modeled as dynamic networks with nodes representing entities of interest and edges representing interactions among the entities at different points in time. A common representation for such models is the snapshot model - where a network is defined at logical time-stamps. An important problem under this model is change point detection. In this work we devise an effective and efficient three-step-approach for detecting change points in dynamic networks under the snapshot model. Our algorithm achieves up to 9X speedup over the state-of-the-art while improving quality on both synthetic and real world networks.
Conference Paper
In this paper we describe a new release of a Web scale entity graph that serves as the backbone of Microsoft Academic Service (MAS), a major production effort with a broadened scope to the namesake vertical search engine that has been publicly available since 2008 as a research prototype. At the core of MAS is a heterogeneous entity graph comprised of six types of entities that model the scholarly activities: field of study, author, institution, paper, venue, and event. In addition to obtaining these entities from the publisher feeds as in the previous effort, we in this version include data mining results from the Web index and an in-house knowledge base from Bing, a major commercial search engine. As a result of the Bing integration, the new MAS graph sees significant increase in size, with fresh information streaming in automatically following their discoveries by the search engine. In addition, the rich entity relations included in the knowledge base provide additional signals to disambiguate and enrich the entities within and beyond the academic domain. The number of papers indexed by MAS, for instance, has grown from low tens of millions to 83 million while maintaining an above 95% accuracy based on test data sets derived from academic activities at Microsoft Research. Based on the data set, we demonstrate two scenarios in this work: a knowledge driven, highly interactive dialog that seamlessly combines reactive search and proactive suggestion experience, and a proactive heterogeneous entity recommendation.
Article
A stochastic model is proposed for social networks in which the actors in a network are partitioned into subgroups called blocks. The model provides a stochastic generalization of the blockmodel. Estimation techniques are developed for the special case of a single relation social network, with blocks specified a priori. An extension of the model allows for tendencies toward reciprocation of ties beyond those explained by the partition. The extended model provides a one degree-of-freedom test of the model. A numerical example from the social network literature is used to illustrate the methods.
Article
In this paper the solution to the problem of placing n connected points (or nodes) in r-dimensional Euclidean space is given. The criterion for optimality is minimizing a weighted sum of squared distances between the points subject to quadratic constraints of the form X'X - 1, for each of the r unknown coordinate vectors. It is proved that the problem reduces to the minimization of a sum or r positive semi-definite quadratic forms which, under the quadratic constraints, reduces to the problem of finding r eigenvectors of a special "disconnection" matrix. It is shown, by example, how this can serve as a basis for cluster identification.
Conference Paper
Spectral graph theory is the study of the eigenvalues and eigenvectors of matrices associated with graphs. In this tutorial, we will try to provide some intuition as to why these eigenvectors and eigenvalues have combinatorial significance, and will sitn'ey some of their applications.
Robust random cut forest based anomaly detection on streams
  • S Guha
  • N Mishra
  • G Roy
  • O Schrijvers
S. Guha, N. Mishra, G. Roy, and O. Schrijvers. Robust random cut forest based anomaly detection on streams. In International conference on machine learning, pages 2712-2721. PMLR, 2016.
Density of states graph kernels
  • L Huang
  • A J Graven
  • D Bindel
L. Huang, A. J. Graven, and D. Bindel. Density of states graph kernels. In Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), 2021.
Tensorsplat: Spotting latent anomalies in time
  • D Koutra
  • E E Papalexakis
  • C Faloutsos
D. Koutra, E. E. Papalexakis, and C. Faloutsos. Tensorsplat: Spotting latent anomalies in time. In 2012 16th Panhellenic Conference on Informatics. IEEE.
Detecting change points in the large-scale structure of evolving networks
  • L Peel
  • A Clauset
L. Peel and A. Clauset. Detecting change points in the large-scale structure of evolving networks. In Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
Chartalist: Labeled graph datasets for utxo and account-based blockchains
  • K Shamsi
  • F Victor
  • M Kantarcioglu
  • Y Gel
  • C G Akcora
K. Shamsi, F. Victor, M. Kantarcioglu, Y. Gel, and C. G. Akcora. Chartalist: Labeled graph datasets for utxo and account-based blockchains. Advances in Neural Information Processing Systems, 35:34926-34939, 2022.
How powerful are graph neural networks?
  • K Xu
  • W Hu
  • J Leskovec
  • S Jegelka
K. Xu, W. Hu, J. Leskovec, and S. Jegelka. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826, 2018.