Frequent Items Computation over Uncertain Wireless Sensor Network.
ABSTRACT There is an increasing interest in uncertain and probabilistic databases arising in application domains such as sensor networks, information retrieval, mobile object data management, information extraction, and data integration. A range of approaches has been proposed to find the frequent items in uncertain databases, but there is little work on processing such queries in distributed, in-network settings such as sensor networks. In a sensor network, communication is the primary concern because battery power is limited. In this paper, a synopsis with a minimum number of tuples is proposed, which is sufficient for answering the top-k query. This synopsis can be dynamically maintained as new tuples are added. A novel communication-efficient algorithm is presented that takes advantage of this synopsis. The test results confirm the effectiveness and efficiency of our approach.
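To make the notion of frequent items over uncertain data concrete, here is a minimal sketch, not the synopsis or algorithm proposed in the paper. It assumes each tuple is an (item, probability) pair with independent existence probabilities, and it ranks items by expected frequency, which is only one of several possible semantics. The function name and the sample readings are hypothetical.

```python
from collections import defaultdict
import heapq


def top_k_expected_frequency(tuples, k):
    """Return the k items with the highest expected frequency.

    tuples: iterable of (item, probability) pairs (assumed independent).
    The expected frequency of an item is the sum of the existence
    probabilities of the tuples that carry it.
    """
    expected = defaultdict(float)
    for item, prob in tuples:
        expected[item] += prob
    # Break ties on the item name so the result is deterministic.
    return heapq.nlargest(k, expected.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical sensor readings: (observed item, confidence of the reading).
readings = [("a", 0.9), ("b", 0.5), ("a", 0.6), ("c", 0.2), ("b", 0.25)]
result = top_k_expected_frequency(readings, 2)
print(result)
```

In a sensor network, each node would have to ship tuples or partial sums toward a sink for this computation, which is the communication cost the abstract aims to reduce.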
- Source available from: unibo.it
ABSTRACT: Assume that each object in a database has m grades, or scores, one for each of m attributes. For example, an object can have a color grade, that tells how red it is, and a shape grade, that tells how round it is. For each attribute, there is a sorted list, which lists each object and its grade under that attribute, sorted by grade (highest grade first). Each object is assigned an overall grade, that is obtained by combining the attribute grades using a fixed monotone aggregation function, or combining rule, such as min or average. To determine the top k objects, that is, k objects with the highest overall grades, the naive algorithm must access every object in the database, to find its grade under each attribute. Fagin has given an algorithm (“Fagin's Algorithm”, or FA) that is much more efficient. For some monotone aggregation functions, FA is optimal with high probability in the worst case. We analyze an elegant and remarkably simple algorithm (“the threshold algorithm”, or TA) that is optimal in a much stronger sense than FA. We show that TA is essentially optimal, not just for some monotone aggregation functions, but for all of them, and not just in a high-probability worst-case sense, but over every database. Unlike FA, which requires large buffers (whose size may grow unboundedly as the database size grows), TA requires only a small, constant-size buffer. TA allows early stopping, which yields, in a precise sense, an approximate version of the top k answers. We distinguish two types of access: sorted access (where the middleware system obtains the grade of an object in some sorted list by proceeding through the list sequentially from the top), and random access (where the middleware system requests the grade of an object in a list, and obtains it in one step). 
We consider the scenarios where random access is either impossible, or expensive relative to sorted access, and provide algorithms that are essentially optimal for these cases as well.
Journal of Computer and System Sciences. 01/2001
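The threshold algorithm described in this abstract can be sketched as follows. This is an illustrative reading of the description, not the authors' code. It assumes that every object appears in every attribute list, models each list as a hypothetical in-memory dict from object to grade, and uses a dict lookup to stand in for random access. The function name and sample grades are invented for the example.

```python
def threshold_algorithm(lists, aggregate, k):
    """Return the top-k (object, overall grade) pairs, best first.

    lists: one dict per attribute, mapping object -> grade.
    aggregate: monotone function over a list of grades (e.g. min).
    """
    # Sorted access: each attribute list ordered by grade, highest first.
    sorted_lists = [
        sorted(grades.items(), key=lambda kv: kv[1], reverse=True)
        for grades in lists
    ]
    best = {}  # bounded buffer holding at most k objects
    for depth in range(len(sorted_lists[0])):
        last_seen = []
        for column in sorted_lists:
            obj, grade = column[depth]
            last_seen.append(grade)
            if obj in best:
                continue
            # Random access: fetch this object's grade in every list.
            overall = aggregate([grades[obj] for grades in lists])
            if len(best) < k:
                best[obj] = overall
            else:
                worst = min(best, key=best.get)
                if overall > best[worst]:
                    del best[worst]
                    best[obj] = overall
        # No unseen object can have an overall grade above the threshold.
        threshold = aggregate(last_seen)
        if len(best) == k and min(best.values()) >= threshold:
            break
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


color = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
shape = {"a": 0.7, "b": 0.95, "c": 0.6, "d": 0.2}
print(threshold_algorithm([color, shape], min, 2))
```

With these sample grades the loop stops after reading two entries from each list, because the second-best buffered grade already matches the threshold, so objects "c" and "d" are never fetched.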
Conference Proceeding: Fast Algorithms for Mining Association Rules in Large Databases
Proceedings of the 20th International Conference on Very Large Data Bases; 01/1994
Conference Proceeding: Top-k Query Processing in Uncertain Databases
ABSTRACT: Top-k processing in uncertain databases is semantically and computationally different from traditional top-k processing. The interplay between score and uncertainty makes traditional techniques inapplicable. We introduce new probabilistic formulations for top-k queries. Our formulations are based on a "marriage" of traditional top-k semantics and possible worlds semantics. In the light of these formulations, we construct a framework that encapsulates a state space model and efficient query processing techniques to tackle the challenges of uncertain data settings. We prove that our techniques are optimal in terms of the number of accessed tuples and materialized search states. Our experiments show the efficiency of our techniques under different data distributions with orders of magnitude improvement over naive materialization of possible worlds.
Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on; 05/2007
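The naive baseline mentioned in this abstract, materializing every possible world, can be illustrated with a small sketch. It is not the paper's framework: it assumes independent tuples given as hypothetical (id, score, probability) triples, and it picks the top-k answer with the highest total probability across worlds as one possible formulation. Worlds holding fewer than k tuples contribute shorter answers, which is a simplification.

```python
from collections import defaultdict
from itertools import product


def most_probable_top_k(tuples, k):
    """Enumerate all 2^n possible worlds and return (answer, probability).

    tuples: list of (tuple_id, score, probability), assumed independent.
    answer: tuple of ids, highest score first, that is the top-k result
    in the largest total probability mass of worlds.
    """
    answer_prob = defaultdict(float)
    for present in product([False, True], repeat=len(tuples)):
        world_prob = 1.0
        world = []
        for (tid, score, prob), keep in zip(tuples, present):
            world_prob *= prob if keep else 1.0 - prob
            if keep:
                world.append((score, tid))
        top = tuple(tid for _, tid in sorted(world, reverse=True)[:k])
        answer_prob[top] += world_prob
    return max(answer_prob.items(), key=lambda kv: kv[1])


data = [("t1", 10, 0.3), ("t2", 8, 0.9), ("t3", 5, 0.6)]
answer, prob = most_probable_top_k(data, 1)
print(answer, prob)
```

Here "t1" has the highest score but exists with probability 0.3 only, so the most probable top-1 answer is "t2", which shows the interplay between score and uncertainty. The exponential number of worlds is what makes this approach impractical beyond toy inputs.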