ABSTRACT: Energy consumption in hosting Internet services is becoming a pressing issue as these services scale up. Dynamic server provisioning techniques are effective in turning off unnecessary servers to save energy. Such techniques, mostly studied for request-response services, face challenges in the context of connection servers that host a large number of long-lived TCP connections. In this paper, we characterize unique properties, performance, and power models of connection servers, based on a real data trace collected from the deployed Windows Live Messenger. Using the models, we design server provisioning and load dispatching algorithms and study subtle interactions between them. We show that our algorithms can save a significant amount of energy without sacrificing user experiences.
5th USENIX Symposium on Networked Systems Design & Implementation, NSDI 2008, April 16-18, 2008, San Francisco, CA, USA, Proceedings; 01/2008
ABSTRACT: Sensor networks have been widely used to collect data about the environment. When analyzing data from these systems, people tend to ask exploratory questions: they want to find subsets of the data, namely signals, that reflect some characteristics of the environment. In this paper, we study the problem of searching for drops in sensor data. Specifically, the search is to find periods in history when a drop exceeding a given threshold occurs in the data within a given time span. We propose a framework, SegDiff, for extracting features, compressing them, and transforming the search into standard database queries. Approximate results are returned from the framework with the guarantee that no true events are missed and false positives are within a user-specified tolerance. The framework efficiently utilizes space and provides fast response to users' searches. Experimental results with real-world data demonstrate the efficiency of our framework with respect to feature size and search time.
EDBT 2008, 11th International Conference on Extending Database Technology, Nantes, France, March 25-29, 2008, Proceedings; 01/2008
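The drop search defined in the abstract above can be illustrated with a brute-force baseline. This is only a sketch of the problem statement, not the SegDiff framework: it scans raw samples directly instead of extracting and compressing features, and the names `find_drops`, `min_drop`, and `max_span` are hypothetical, chosen for this example. It assumes timestamps are sorted in ascending order.

```python
from typing import List, Sequence, Tuple


def find_drops(
    times: Sequence[float],
    values: Sequence[float],
    min_drop: float,
    max_span: float,
) -> List[Tuple[float, float]]:
    """Return (start_time, end_time) pairs where the value falls by at
    least min_drop within at most max_span time units.

    Brute-force baseline over raw samples; assumes times are sorted.
    """
    hits = []
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            if times[j] - times[i] > max_span:
                break  # later samples are even further away in time
            if values[i] - values[j] >= min_drop:
                hits.append((times[i], times[j]))
    return hits


if __name__ == "__main__":
    times = [0, 1, 2, 3, 4]
    values = [10, 9, 4, 5, 1]
    print(find_drops(times, values, min_drop=5, max_span=2))
```

The quadratic scan is exact but touches every sample pair inside the time window, which is the cost a feature-based index is meant to avoid.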
ABSTRACT: Each of us has a complex and reciprocal relationship with our environment. Based on limited knowledge of this interwoven set of influences and consequences, we constantly make choices: where to live, how to go to work, what brands to buy, what to do with our leisure time. These choices evolve into patterns, and these patterns become driving functions of our relationship with the world around us. With increasing ease, devices we carry can sense, process, and transmit data on these patterns for our own use or to share, carefully, with others. In particular, here we will focus on location time series, gathered from GPS-enabled personal mobile devices. From this capacity emerges a new class of hybrid mobile-web applications that, first, enable personal exploration of our own patterns and, second, use the same data to index our life into other available datasets about the world around us. Such applications, revealing the previously unobservable about our own lives, offer an opportunity to employ mobile technology to illuminate the ramifications of our choices on others and the effects of the "microenvironments" we move through on us [1, 10].
Proceedings of the 9th Workshop on Mobile Computing Systems and Applications, HotMobile 2008, Napa Valley, California, USA, February 25-26, 2008; 01/2008
ABSTRACT: Frequent pattern mining from data streams is an active research topic in data mining. Existing research efforts often rely on a two-phase framework to discover frequent patterns: (1) using internal data structures to store meta-patterns obtained by scanning the stream data; and (2) re-mining the meta-patterns to finalize and output frequent patterns. The weakness of such a two-phase framework is that the two stages act as barriers to finding frequent patterns dynamically and immediately with online functionality. A single-phase algorithm is expected to mine frequent patterns from data streams in such a way that users can see patterns immediately and dynamically, as soon as the patterns become frequent. In this paper, we propose INSTANT, a single-phase algorithm for discovering frequent itemsets from data streams. The theoretical foundation of INSTANT is based on a framework theory on a set of itemsets, which is also presented in the paper. The novel design of INSTANT ensures that it employs compact data structures to mine frequent patterns from data streams in a single phase. Our experimental results demonstrate the time and space efficiency of the proposed algorithm.
Journal of Information Science 03/2007; 33:251-262. · 1.09 Impact Factor
ABSTRACT: In this paper, we deal with mining sequential patterns in multiple time sequences. Building on PrefixSpan, a state-of-the-art sequential pattern mining algorithm for transaction databases, we propose MILE (MIning in muLtiple sEquences), an efficient algorithm to facilitate the mining process. MILE recursively utilizes the knowledge of existing patterns to avoid redundant data scanning, and can therefore effectively speed up the discovery of new patterns. Another unique feature of MILE is that it can incorporate prior knowledge of the data distribution in time sequences into the mining process to further improve the performance. Extensive empirical results show that MILE is significantly faster than PrefixSpan. As MILE consumes more memory than PrefixSpan, we also present a solution to trade time efficiency for reduced memory usage in memory-constrained environments.
New Generation Computing 01/2007; 26:75-96. · 0.59 Impact Factor
ABSTRACT: This paper defines a challenging problem of pattern matching between a pattern P and a text T, with wildcards and length constraints, and designs an efficient algorithm to return each pattern occurrence in an online manner. In this pattern matching problem, the user can specify constraints on the number of wildcards between each two consecutive letters of P and constraints on the length of each matching substring in T. We design a complete algorithm, SAIL, that returns each matching substring of P in T as soon as it appears in T, in O(n + klmg) time with O(lm) space overhead, where n is the length of T, k is the frequency of P's last letter occurring in T, l is the user-specified maximum length for each matching substring, m is the length of P, and g is the maximum difference between the user-specified maximum and minimum numbers of wildcards allowed between two consecutive letters in P.
Knowledge and Information Systems 10/2006; 10(4):399-419. · 2.64 Impact Factor
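The matching problem stated in the abstract above can be made concrete with a brute-force enumerator. This is a sketch of the problem definition only, not the SAIL algorithm, and it does not achieve the stated time or space bounds. As a simplifying assumption, one gap range (`min_gap`, `max_gap`) applies between every pair of consecutive pattern letters; the function and parameter names are hypothetical.

```python
from typing import List, Tuple


def match_with_gaps(
    text: str,
    pattern: str,
    min_gap: int,
    max_gap: int,
    max_len: int,
) -> List[Tuple[int, ...]]:
    """Enumerate occurrences of pattern in text as tuples of positions.

    Between consecutive pattern letters there must be between min_gap and
    max_gap wildcard characters, and the matched substring (first to last
    position, inclusive) must be at most max_len characters long.
    """
    results: List[Tuple[int, ...]] = []
    if not pattern:
        return results

    def extend(positions: List[int]) -> None:
        k = len(positions)
        if k == len(pattern):
            results.append(tuple(positions))
            return
        last = positions[-1]
        # Candidate positions for the next letter, skipping min_gap..max_gap wildcards.
        for nxt in range(last + 1 + min_gap, last + 1 + max_gap + 1):
            if nxt >= len(text):
                break
            if nxt - positions[0] + 1 > max_len:
                break  # length constraint violated; later candidates are longer
            if text[nxt] == pattern[k]:
                extend(positions + [nxt])

    for start, ch in enumerate(text):
        if ch == pattern[0]:
            extend([start])
    return results


if __name__ == "__main__":
    print(match_with_gaps("abab", "ab", min_gap=0, max_gap=2, max_len=4))
```

With these inputs, the occurrence spanning positions 0 and 3 uses two wildcards and has length 4, so it is accepted only because `max_len` is 4; lowering `max_len` to 3 excludes it.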
ABSTRACT: In this paper, we deal with mining sequential patterns in multiple data streams. Building on PrefixSpan, a state-of-the-art sequential pattern mining algorithm for transaction databases, we propose MILE, an efficient algorithm to facilitate the mining process. MILE recursively utilizes the knowledge of existing patterns to avoid redundant data scanning, and can therefore effectively speed up the discovery of new patterns. Another unique feature of MILE is that it can incorporate prior knowledge of the data distribution in data streams into the mining process to further improve the performance. Extensive empirical results show that MILE is significantly faster than PrefixSpan. As MILE consumes more memory than PrefixSpan, we also present a solution to balance memory usage and time efficiency in memory-constrained environments.
Data Mining, Fifth IEEE International Conference on; 12/2005
ABSTRACT: There have been extensive efforts toward mining frequent items or itemsets in a single data stream, but few efforts have been made to explore sequential patterns among literals in different data streams. In this paper, we define a challenging problem of mining frequent sequential patterns across multiple data streams. We propose an efficient algorithm, MILE, to manage the mining process. The proposed algorithm recursively utilizes the knowledge of existing patterns to speed up the mining of new patterns. We also apply PrefixSpan, a state-of-the-art sequential pattern mining algorithm designed for transaction databases, to solve our problem. Extensive empirical results show that MILE is significantly faster than PrefixSpan. One unique feature of our algorithm is that when some prior knowledge of the data distribution in the data streams is available, it can be incorporated into the mining process to further improve the performance of MILE. As MILE consumes more memory than PrefixSpan, we also propose a solution to balance memory usage and time efficiency in memory-limited environments.
ABSTRACT: Sensor systems are quickly becoming a flexible, inexpensive, and reliable platform to provide solutions for a wide variety of applications in real-world settings. For instance, sensor systems have been used for medical monitoring, detection and classification for defense purposes, and to perform environmental monitoring. The increase in the proliferation of sensor systems has paralleled the use of more heterogeneous systems in deployments. This paper presents a scalable, open middleware that enables interoperability between varied sensor systems and also details some of the fundamental services needed in such an environment. The ESP framework aims to provide a standard way to manage, query, and interact with sensor systems. We provide a method to describe sensor systems using ESPml, an XML schema, as the modeling language. The fundamental goal of the schema is to be able to describe sensor systems in a simple, compact manner while still having the ability to represent details such as the general setup, the type of data that can be provided, and the commands that are available. In addition, we present a web services based framework that enables system discovery using metadata information, interaction using system defined functional abstractions, and the ability to publish sensor data for future retrieval and aggregation purposes. By implementing several systems that scale in terms of complexity and capabilities, we show that the framework can accommodate various types of sensor systems. Also, performance tests of the framework show that the middleware is able to handle high demands on services.
ABSTRACT: Over the last decade, embedded sensing systems have been successfully deployed in a range of application areas, from education and science to military and industry. These systems are becoming more robust, capable, and widely adopted. Yet today, most sensor networks function in isolated patches, each with different mechanisms to deliver data to their users, and often have no formal methods to share data with others. As sensornets become more numerous and their data more valuable, it becomes increasingly important to have common means to share data over the Internet. In addition to simplifying use of a single sensornet, we seek to enable sharing of data across multiple systems, and ultimately slogging (sensornet logging), where a single user may discover, process, and republish data from thousands of independently operated sensors. To meet these goals we propose an architecture to interconnect, share, and search sensor data. This paper describes the building blocks of this architecture: sensor stores, search engines, and publishers, joined by a common sensor data streaming protocol. We then detail the research challenges that must be addressed to meet our goal of enabling sensor access for users ranging from scientists and data analysts to citizen scientists.
ABSTRACT: Mining data streams often requires real-time extraction of interesting patterns from dynamic and continuously growing data. This requirement makes it challenging to discover and output currently useful patterns instantly, a task commonly referred to as online streaming data mining. In this paper, we present INSTANT, a novel algorithm that explores maximal frequent itemsequences from streaming data in an online fashion. We first provide useful operators on the lattice of itemsequential sets, and then apply them to the algorithm design of INSTANT. In comparison with the most popular methods, such as closed-itemset-based mining algorithms, INSTANT has solid theoretical foundations to ensure that it employs more compact in-memory data structures than closed itemsequences. Experimental results show that our method can achieve better results than previous related methods in terms of both time and space efficiency.