Conference Paper
Relaxing Join and Selection Queries.
Conference: Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, September 12-15, 2006
Source: DBLP

Conference Paper: Worst-case optimal join algorithms: [extended abstract]
ABSTRACT: Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a novel algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx, who gave bounds on the size of a full conjunctive query in terms of the sizes of the individual relations in the body of the query. These bounds, however, are not constructive: they rely on Shearer's entropy inequality, which is information-theoretic. Thus, the previous results leave open the question of whether there exist algorithms whose running times achieve these optimal bounds. An answer to this question may be interesting to database practice, as we show in this paper that any project-join plan is polynomially slower than the optimal bound for some queries. We construct an algorithm whose running time is worst-case optimal for all natural join queries. Our result may be of independent interest, as our algorithm also yields a constructive proof of the general fractional cover bound by Atserias, Grohe, and Marx without using Shearer's inequality. In addition, we show that this bound is equivalent to a geometric inequality by Bollobás and Thomason, one of whose special cases is the famous Loomis-Whitney inequality. Hence, our results algorithmically prove these inequalities as well. Finally, we discuss how our algorithm can be used to compute a relaxed notion of joins.
Proceedings of the 31st Symposium on Principles of Database Systems; 05/2012
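To make the gap between pairwise join plans and the size bound concrete, consider the triangle query Q(a, b, c) = R(a, b), S(b, c), T(a, c). The following is a minimal Python sketch, not the algorithm from the paper: it binds one attribute at a time and intersects the candidate values allowed by every relation, so it never materializes a two-relation intermediate result. The function name triangle_join and the sample relations are assumptions made for illustration, and no running-time guarantee is claimed for this sketch.

```python
from collections import defaultdict


def triangle_join(R, S, T):
    """Enumerate (a, b, c) with (a, b) in R, (b, c) in S and (a, c) in T.

    Illustrative attribute-at-a-time evaluation: each loop level intersects
    the values permitted by all relations that mention the attribute.
    """
    r_by_a = defaultdict(set)  # a -> {b : (a, b) in R}
    for a, b in R:
        r_by_a[a].add(b)
    s_by_b = defaultdict(set)  # b -> {c : (b, c) in S}
    for b, c in S:
        s_by_b[b].add(c)
    t_by_a = defaultdict(set)  # a -> {c : (a, c) in T}
    for a, c in T:
        t_by_a[a].add(c)

    out = []
    # Values of a must appear in both R and T.
    for a in r_by_a.keys() & t_by_a.keys():
        # Values of b must follow a in R and start a tuple in S.
        for b in s_by_b.keys() & r_by_a[a]:
            # Values of c must follow b in S and follow a in T.
            for c in s_by_b[b] & t_by_a[a]:
                out.append((a, b, c))
    return sorted(out)


R = {(1, 2), (1, 3), (2, 3)}
S = {(2, 3), (3, 1)}
T = {(1, 3), (2, 1)}
triangles = triangle_join(R, S, T)
```

For relations of size N each, the fractional cover bound for this query is N^(3/2) output tuples, whereas joining any two of the relations first can produce on the order of N^2 intermediate tuples.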
Article: An Efficient Trie-based Method for Approximate Entity Extraction with Edit-Distance Constraints
ABSTRACT: Dictionary-based entity extraction, which locates substrings of a document that match predefined entities (e.g., person names or locations), has attracted much attention from the database community recently. To improve extraction recall, a recent trend is to provide approximate matching between substrings of the document and entities by tolerating minor errors. In this paper we study dictionary-based approximate entity extraction with edit-distance constraints. Existing methods have several limitations. First, they need to tune many parameters to achieve high performance. Second, they are inefficient for large edit-distance thresholds. We propose a trie-based method to address these problems. We first partition each entity into a set of segments, and then use a trie structure to index the segments. To extract similar entities, we search for segments in the document, and extend the matching segments in both the entities and the document to find similar pairs. We develop an extension-based method to efficiently find similar string pairs by extending the matching segments. We optimize our partition scheme and select the best partition strategy to improve the extraction performance. Experimental results show that our method achieves much higher performance than state-of-the-art methods.
01/2012
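The partition-and-index idea in this abstract can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes an even split of each entity into tau + 1 segments (so any substring within edit distance tau must contain at least one segment unchanged), it assumes every entity is longer than tau characters, and it verifies candidates by brute force rather than with the extension-based method the abstract describes. All names (partition, build_trie, extract) are hypothetical.

```python
_END = object()  # sentinel key marking "a segment ends at this trie node"


def edit_distance(s, t):
    """Standard dynamic-programming Levenshtein distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def partition(entity, tau):
    """Split entity into tau + 1 contiguous segments of near-equal length."""
    k = tau + 1
    base, extra = divmod(len(entity), k)
    segments, pos = [], 0
    for idx in range(k):
        length = base + (1 if idx < extra else 0)
        segments.append((pos, entity[pos:pos + length]))
        pos += length
    return segments


def build_trie(entities, tau):
    """Index every segment in a nested-dict trie."""
    root = {}
    for eid, entity in enumerate(entities):
        for seg_start, seg in partition(entity, tau):
            node = root
            for ch in seg:
                node = node.setdefault(ch, {})
            node.setdefault(_END, []).append((eid, seg_start))
    return root


def extract(document, entities, tau):
    """Return sorted (start, end, entity) with ed(document[start:end], entity) <= tau."""
    trie = build_trie(entities, tau)
    results = set()
    for i in range(len(document)):
        node, j = trie, i
        while j < len(document) and document[j] in node:
            node = node[document[j]]
            j += 1
            for eid, seg_start in node.get(_END, []):
                e = entities[eid]
                # The matched segment can shift by at most tau positions.
                lo = max(0, i - seg_start - tau)
                hi = min(len(document), i - seg_start + tau)
                for start in range(lo, hi + 1):
                    for length in range(max(1, len(e) - tau), len(e) + tau + 1):
                        end = start + length
                        if end > len(document):
                            break
                        if edit_distance(document[start:end], e) <= tau:
                            results.add((start, end, e))
    return sorted(results)


entities = ["surajit", "chaudhuri"]
document = "paper by surajit chaudri et al"
matches = extract(document, entities, 2)
```

In this usage example the misspelled "chaudri" is reported as a match for "chaudhuri" because the segment "cha" occurs unchanged in the document and the full strings differ by two insertions. Overlapping substrings that also fall within the threshold are reported too; a real system would typically deduplicate them.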
ABSTRACT: This tutorial provides a comprehensive overview of recent research progress on the important problem of approximate search in string collections. We identify existing indexes, search algorithms, filtering strategies, selectivity-estimation techniques and other work, and comment on their respective merits and limitations.
1. MOTIVATION
Text data is ubiquitous. Management of string data in databases and information systems has taken on particular importance recently. This tutorial focuses on the following problem: Given a collection of strings, efficiently identify