-
PVLDB. 01/2010; 3:494-505.
-
PVLDB. 01/2010; 3:1125-1136.
-
ABSTRACT: Top-k queries on large multi-attribute data sets are fundamental operations in information retrieval and ranking applications.
In this article, we initiate research on the anytime behavior of top-k algorithms on exact and fuzzy data. In particular, given specific top-k algorithms (TA and TA-Sorted), we are interested in studying their progress toward identifying the correct result at any point during the algorithms’ execution. We adopt a probabilistic approach in which, at any point during the algorithm's operation, we report the confidence that the top-k result has been identified. Such functionality can be a valuable asset when one is interested in reducing the runtime cost of top-k computations. We present a thorough experimental evaluation that validates our techniques on both synthetic and real data sets.
The VLDB Journal 03/2009; 18(2):407-427. · 1.56 Impact Factor
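The abstract above names TA as one of the algorithms studied. For readers unfamiliar with it, here is a minimal sketch of the classic Threshold Algorithm (Fagin, Lotem and Naor) with SUM as the scoring function. It assumes every object appears in every sorted list, and the function and variable names are hypothetical. It does not implement the paper's probabilistic confidence estimation. It only shows the sorted-access / random-access loop and the threshold test whose progress the article studies.

```python
import heapq
from typing import Dict, List, Tuple

def threshold_algorithm(
    lists: List[List[Tuple[str, float]]], k: int
) -> List[Tuple[str, float]]:
    """Threshold Algorithm (TA) with SUM as the monotone scoring function.

    Each list holds (object_id, score) pairs for one attribute, sorted by
    descending score; every object is assumed to appear in every list.
    """
    lookup = [dict(lst) for lst in lists]  # simulates random access per list
    seen: Dict[str, float] = {}            # object_id -> full aggregate score
    for depth in range(len(lists[0])):
        # Sorted access: read the entry at this depth in every list.
        for lst in lists:
            obj = lst[depth][0]
            if obj not in seen:
                # Random access: fetch obj's score from every list and sum.
                seen[obj] = sum(table[obj] for table in lookup)
        # Threshold: the best aggregate any still-unseen object could reach.
        threshold = sum(lst[depth][1] for lst in lists)
        top = heapq.nlargest(k, seen.items(), key=lambda item: item[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top  # no unseen object can beat the current k-th score
    return heapq.nlargest(k, seen.items(), key=lambda item: item[1])
```

With lists `[("a", 0.9), ("b", 0.8), ("c", 0.3), ("d", 0.1)]` and `[("b", 0.9), ("c", 0.7), ("a", 0.2), ("d", 0.1)]` and `k = 2`, the loop stops at depth 2 and returns `b` then `a` without needing to read `d`. At each depth, the gap between the current k-th score and `threshold` is the kind of intermediate state an anytime variant could expose.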
-
PVLDB. 01/2009; 2:958-969.
-
VLDB J. 01/2009; 18:407-427.
-
Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15-20, 2007; 01/2007
-
Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23-27, 2007; 01/2007
-
Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23-27, 2007; 01/2007
-
ABSTRACT: XQuery path queries form the basis of complex matching and processing of XML data. Most current XML query processing techniques can be divided into two groups. Navigation-based algorithms compute results by analyzing an input stream of documents one tag at a time. In contrast, index-based algorithms take advantage of (precomputed or computed-on-demand) numbering schemes over each input XML document in the stream.
In this chapter, we present an index-based technique, Index-Filter, to answer multiple path queries. Index-Filter uses indexes built over the document tags to avoid processing large portions of an input document that are guaranteed not to be part of any match. We analyze Index-Filter, compare it against Y-Filter, a state-of-the-art navigation-based technique, and present the advantages of each technique.
03/2006: pages 59-81;
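The abstract refers to numbering schemes over each XML document. One widely used scheme, assumed here for illustration, is region encoding. Each element gets a `(start, end, level)` triple from a depth-first traversal, and ancestor/descendant relationships reduce to interval containment. The sketch below is not Index-Filter itself. It only builds a hypothetical per-tag index of regions and answers a `//a//b` step with a naive nested loop, whereas Index-Filter uses its indexes to skip irrelevant portions of the document.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Region = Tuple[int, int, int]  # (start, end, level)

def number_regions(root: ET.Element) -> Dict[str, List[Region]]:
    """Assign (start, end, level) to every element in one depth-first pass
    and group the regions by tag, mimicking a per-tag index."""
    index: Dict[str, List[Region]] = {}
    counter = 0

    def visit(node: ET.Element, level: int) -> None:
        nonlocal counter
        start = counter
        counter += 1
        for child in node:
            visit(child, level + 1)
        end = counter
        counter += 1
        index.setdefault(node.tag, []).append((start, end, level))

    visit(root, 0)
    return index

def is_ancestor(a: Region, d: Region) -> bool:
    # Region containment: a's interval strictly encloses d's interval.
    return a[0] < d[0] and d[1] < a[1]

def is_parent(a: Region, d: Region) -> bool:
    # Parent-child additionally requires adjacent nesting levels.
    return is_ancestor(a, d) and d[2] == a[2] + 1

def match_descendant(index: Dict[str, List[Region]],
                     anc_tag: str, desc_tag: str) -> List[Region]:
    """Regions of desc_tag elements with some anc_tag ancestor (//anc//desc)."""
    ancestors = index.get(anc_tag, [])
    return [d for d in index.get(desc_tag, [])
            if any(is_ancestor(a, d) for a in ancestors)]
```

For `<r><a><x><b/></x></a><b/></r>`, only the first `b` (region `(3, 4, 3)`) matches `//a//b`. It is a descendant of `a` but a child of `x`, which is why `is_parent` checks levels as well as containment.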
-
Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, USA, June 27-29, 2006; 01/2006
-
Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, September 12-15, 2006; 01/2006
-
Proceedings of the 22nd International Conference on Data Engineering, ICDE 2006, 3-8 April 2006, Atlanta, GA, USA; 01/2006
-
IEEE Trans. Knowl. Data Eng. 01/2006; 18:525-539.
-
ACM Trans. Database Syst. 01/2006; 31:161-207.
-
Proceedings of the 2nd Workshop on Data Management for Sensor Networks, in conjunction with VLDB, DMSN 2005, Trondheim, Norway, August 30, 2005; 01/2005
-
Proceedings of the ACM SIGMOD International Conference on Management of Data, Baltimore, Maryland, USA, June 14-16, 2005; 01/2005
-
Proceedings of the 21st International Conference on Data Engineering, ICDE 2005, 5-8 April 2005, Tokyo, Japan; 01/2005
-
ABSTRACT: Data cleaning is an important process that has been at the center of research interest in recent years. Poor data quality has a variety of causes, including data entry errors and multiple conventions for recording database fields, and it has a significant impact on a wide range of business issues. Hence, there is a pressing need for technologies that enable flexible (fuzzy) matching of string information in a database. Cosine similarity with tf-idf is a well-established metric for comparing text, and recent proposals have adapted this similarity measure for flexibly matching a query string with values in a single attribute of a relation.
10/2004;
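To make the tf-idf cosine measure mentioned above concrete, here is a minimal sketch that ranks the values of a single attribute against a query string. It assumes lower-cased whitespace tokens and `idf = ln(N / df)`. String-matching work in this area often uses q-grams instead, so treat the tokenizer as a placeholder. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def tokenize(s: str) -> List[str]:
    # Assumption: whitespace tokens; a q-gram tokenizer could be swapped in.
    return s.lower().split()

def build_idf(values: List[str]) -> Dict[str, float]:
    """idf(t) = ln(N / df(t)) over the attribute's values."""
    n = len(values)
    df = Counter(tok for v in values for tok in set(tokenize(v)))
    return {tok: math.log(n / cnt) for tok, cnt in df.items()}

def vectorize(s: str, idf: Dict[str, float]) -> Dict[str, float]:
    """Unit-length tf-idf vector; tokens unseen in the relation get weight 0."""
    tf = Counter(tokenize(s))
    vec = {tok: cnt * idf.get(tok, 0.0) for tok, cnt in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {tok: w / norm for tok, w in vec.items()} if norm > 0 else {}

def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    # Both vectors are already normalized, so the dot product is the cosine.
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())

def rank_matches(query: str, values: List[str], k: int) -> List[Tuple[str, float]]:
    idf = build_idf(values)
    q = vectorize(query, idf)
    scored = [(v, cosine(q, vectorize(v, idf))) for v in values]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable for ties
    return scored[:k]
```

For the values `["main street", "main avenue", "oak street", "elm road"]` and the query `"main street"`, the exact value scores 1.0. "main avenue" and "oak street" each score 1/√10 because they share one token with the query, and "elm road" scores 0.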
-
ABSTRACT: Data cleaning is an important process that has been at the center of research interest in recent years. An important end goal of effective data cleaning is to identify the relational tuple or tuples that are "most related" to a given query tuple. Various techniques have been proposed in the literature for efficiently identifying approximate matches to a query string against a single attribute of a relation. In addition to constructing a ranking (i.e., ordering) of these matches, the techniques often associate each match with a score that quantifies the extent of the match. Since the query tuple may contain multiple attributes, issuing a separate approximate match operation for each of them effectively creates a number of ranked lists of the relation tuples. Merging these lists to identify a final ranking and scoring, and returning the top-K tuples, is a challenging task.
10/2004;
-
ABSTRACT: We introduce and study a new class of queries that we refer to as OPAC (optimization under parametric aggregation constraints) queries. Such queries aim to identify sets of database tuples that constitute solutions of a large class of optimization problems involving the database tuples. The constraints and the objective function are specified in terms of aggregate functions of relational attributes, and the parameter values identify the constants used in the aggregation constraints.
02/2004;
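The following sketch gives one concrete, hypothetical instance of the kind of query described above. It assumes a single aggregation constraint, SUM(cost) ≤ c, where c is the parameter, and an objective that maximizes SUM(value) over a chosen set of tuples. With integer costs, this instance is the 0/1 knapsack problem and can be solved exactly by the pseudo-polynomial dynamic program below. This is not the technique proposed in the work above. Its one point of interest is that a single table answers the query for every parameter value c up to a bound, which hints at the parametric aspect. Names and data are illustrative.

```python
from typing import List, Tuple

Row = Tuple[str, int, float]  # (tuple_id, integer cost, value)

def best_subsets(rows: List[Row], max_budget: int) -> List[Tuple[float, List[str]]]:
    """For every budget c in 0..max_budget, find a subset of rows that
    maximizes SUM(value) subject to SUM(cost) <= c (0/1 knapsack DP)."""
    # best[c] = (best total value, ids of chosen rows) with total cost <= c
    best: List[Tuple[float, List[str]]] = [(0.0, []) for _ in range(max_budget + 1)]
    for tid, cost, value in rows:
        # Walk budgets downward so each row is used at most once.
        for c in range(max_budget, cost - 1, -1):
            candidate = best[c - cost][0] + value
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - cost][1] + [tid])
    return best
```

For rows `("t1", 5, 10.0)`, `("t2", 4, 40.0)`, `("t3", 6, 30.0)` and `("t4", 3, 50.0)`, the entry for c = 10 selects `t2` and `t4` (total value 90.0). The entry for c = 4 selects only `t4`.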