-
Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, August 21-24, 2011; 01/2011
-
ABSTRACT: Large bipartite graphs that evolve and grow over time (e.g., new links arrive, old links die out, or link weights change) arise in many settings, such as social networks, co-citations, market-basket analysis, and collaborative filtering. Our goal is to monitor (i) the centrality of an individual node (e.g., who are the most important authors?) and (ii) the proximity of two nodes or sets of nodes (e.g., who are the most important authors with respect to a particular conference?). Moreover, we want to do this efficiently and incrementally and to provide “any-time” answers. In this chapter, we propose pTrack, which is based on random walks with restart, together with some important modifications to adapt these measures to a dynamic, evolving setting. We also develop techniques for fast, incremental updates of these measures that allow us to track them continuously as link updates arrive, and we discuss variants of our method that can handle batch updates as well as place more emphasis on recent links. Based on proximity tracking, we further propose cTrack, which enables us to track the centrality of the nodes over time. We demonstrate the effectiveness and efficiency of our methods on several real data sets.
12/2009: pages 211-236;
-
ABSTRACT: How can we find communities in dynamic networks of social interactions, such as who calls whom, who emails whom, or who sells
to whom? How do we store a large volume of IP network source–destination connection graphs, which grow over time? In this
chapter, we study these two fundamental problems on time-evolving graphs and exploit the subtle connection between pattern
mining and compression. We propose a pattern mining method, GraphScope, that automatically reveals the underlying communities in the graphs, as well as the change points in time. Our method needs
no human intervention, and it is carefully designed to operate in a streaming fashion. Moreover, it is based on lossless compression
principles. Therefore, in addition to revealing the fundamental structure of the graphs, the discovered patterns naturally
lead to an excellent storage scheme for graph streams. Thus, our proposed GraphScope method unifies and solves both the mining and the compression problem (1) by producing meaningful time-evolving patterns
agreeing with human intuition and (2) by identifying key change points in several real large time-evolving graphs. We demonstrate
its efficiency and effectiveness on real data sets from several domains.
12/2009: pages 73-104;
-
ABSTRACT: Given a large bipartite graph (like a document-term or user-product graph), how can we find meaningful communities, quickly
and automatically? We propose to look for community hierarchies, with communities-within-communities. Our proposed method,
the Context-specific Cluster Tree (CCT), finds such communities at multiple levels, with no user intervention, based on information-theoretic principles (MDL). More
specifically, it partitions the graph into progressively more refined subgraphs, allowing users to quickly navigate from the
global, coarse structure of a graph to more focused and local patterns. As a fringe benefit, and also as an additional indication
of its quality, it also achieves better compression than typical, non-hierarchical methods. We demonstrate its scalability
and effectiveness on real, large graphs.
08/2008: pages 170-187;
-
ABSTRACT: Data stream values are often associated with multiple aspects. For example, each value observed at a given time-stamp from environmental sensors may have an associated type (e.g., temperature,
humidity, etc.) as well as location. Time-stamp, type and location are the three aspects, which can be modeled using a tensor
(high-order array). However, the time aspect is special, with a natural ordering, and with successive time-ticks having usually
correlated values. Standard multiway analysis ignores this structure. To capture it, we propose 2 Heads Tensor Analysis (2-heads), which provides a qualitatively different treatment of time. Unlike most existing approaches that use a PCA-like
summarization scheme for all aspects, 2-heads treats the time aspect carefully. 2-heads combines the power of classic multilinear
analysis with wavelets, leading to a powerful mining tool. Furthermore, 2-heads has several other advantages as well: (a)
it can be computed incrementally in a streaming fashion, (b) it has a provable error guarantee, and (c) it achieves a significant
compression ratio compared with competitors. Finally, we show experiments on real datasets, and we illustrate how 2-heads reveals
interesting trends in the data. This is an extended abstract of an article published in the Data Mining and Knowledge Discovery
journal.
Data Mining and Knowledge Discovery 07/2008; 17(1):111-128. · 1.54 Impact Factor
-
TKDD. 01/2008; 2.
-
Proceedings of the International Joint Conference on Neural Networks, IJCNN 2008, part of the IEEE World Congress on Computational Intelligence, WCCI 2008, Hong Kong, China, June 1-6, 2008; 01/2008
-
Machine Learning and Knowledge Discovery in Databases, European Conference, ECML/PKDD 2008, Antwerp, Belgium, September 15-19, 2008, Proceedings, Part I; 01/2008
-
IEEE Trans. Circuits Syst. Video Techn. 01/2008; 18:1397-1410.
-
Statistical Analysis and Data Mining. 01/2008; 1:6-22.
-
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, August 24-27, 2008; 01/2008
-
ABSTRACT: We consider the problem of capturing correlations and finding hidden variables corresponding to trends on collections of time
series streams. Our proposed method, SPIRIT, can incrementally find correlations and hidden variables, which summarise the
key trends in the entire stream collection. It can do this quickly, with no buffering of stream values and without comparing
pairs of streams. Moreover, it is any-time, single pass, and it dynamically detects changes. The discovered trends can also
be used to immediately spot potential anomalies, to do efficient forecasting and, more generally, to dramatically simplify
further data processing.
04/2007: pages 261-288;
-
Proceedings of the ACM SIGMOD International Conference on Management of Data, Beijing, China, June 12-14, 2007; 01/2007
-
Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 12-15, 2007; 01/2007
-
Proceedings of the Seventh SIAM International Conference on Data Mining, April 26-28, 2007, Minneapolis, Minnesota, USA; 01/2007
-
Neural Information Processing, 14th International Conference, ICONIP 2007, Kitakyushu, Japan, November 13-16, 2007, Revised Selected Papers, Part I; 01/2007
-
Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006; 01/2006
-
Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, September 12-15, 2006; 01/2006
-
Advances in Knowledge Discovery and Data Mining, 10th Pacific-Asia Conference, PAKDD 2006, Singapore, April 9-12, 2006, Proceedings; 01/2006
-
01/2006;