Conference Paper

Relaxing Join and Selection Queries.

Conference: Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, September 12-15, 2006
Source: DBLP


Database users can be frustrated by receiving an empty answer to a query. In this paper, we propose a framework to systematically relax queries involving joins and selections. When considering relaxing a query condition, intuitively one seeks the 'minimal' amount of relaxation that yields an answer. We first characterize the types of answers that we return to relaxed queries. We then propose a lattice-based framework to aid query relaxation. Nodes in the lattice correspond to different ways to relax queries. We characterize the properties of relaxation at each node and present algorithms to compute the corresponding answer. We then discuss how to traverse this lattice so that a non-empty query answer is obtained with the minimum amount of query condition relaxation. We implemented this framework and present the results of a thorough performance evaluation using real and synthetic data. Our results indicate the practical utility of our framework.

Available from: Anthony Tung, Jul 29, 2015
    • ", [13], [2], [8], [6], [10], [16], [17], [15], [28], [29], [11], [12] have been proposed to study the approximate string search problem, which, given a set of strings and a query string, finds all similar strings of the query string from the set. Existing methods usually employ a filter-and-verify framework."
    ABSTRACT: Dictionary-based entity extraction, which maps substrings in a document to predefined entities (e.g., person names or locations), has attracted much attention from the database community recently. To improve extraction recall, a recent trend is to provide approximate matching between substrings of the document and entities by tolerating minor errors. In this paper we study dictionary-based approximate entity extraction with edit-distance constraints. Existing methods have several limitations. First, they need to tune many parameters to achieve high performance. Second, they are inefficient for large edit-distance thresholds. We propose a trie-based method to address these problems. We first partition each entity into a set of segments, and then use a trie structure to index the segments. To extract similar entities, we search for segments in the document, and extend the matching segments in both the entities and the document to find similar pairs. We develop an extension-based method to efficiently find similar string pairs by extending the matching segments. We optimize our partition scheme and select the best partition strategy to improve the extraction performance. Experimental results show that our method achieves much higher performance compared with state-of-the-art studies.
    Preview · Article · Apr 2012
    • "More recently there are even k-nearest neighbor considerations, like [5] [25] [4] [23] [22] [24], that are applicable in the setting of searching over a database. Similar to the k-nearest neighbors, but from a join relaxation problem in databases is the work described in [16]. We find such work very valuable in relaxing the user query and finding good quality results within a reasonable distance around what was specified in the query. "
    ABSTRACT: Web search engines and specialized online verticals are increasingly incorporating results from structured data sources to answer semantically rich user queries. For example, the query "Samsung 50 inch led tv" can be answered using information from a table of television data. However, the users are not domain experts and quite often enter values that do not match precisely the underlying data. Samsung makes 46- or 55-inch LED TVs, but not 50-inch ones. So a literal execution of the above-mentioned query will return zero results. For optimal user experience, a search engine would prefer to return at least a minimum number of results as close to the original query as possible. Furthermore, due to typical fast retrieval speeds in web search, a search engine query execution is time-bound. In this paper, we address these challenges by proposing algorithms that rewrite the user query in a principled manner, surfacing at least the required number of results while satisfying the low-latency constraint. We formalize these requirements and introduce a general formulation of the problem. We show that under a natural formulation, the problem is NP-Hard to solve optimally, and present approximation algorithms that produce good rewrites. We empirically validate our algorithms on large-scale data obtained from a commercial search engine's shopping vertical.
    Preview · Article · Aug 2011
    • "Other works have also considered the issue of skyline join computation. The first work that implicitly deals with skyline join is from Koudas et al. [8]. Nevertheless, the proposed algorithms either do not support early termination or require multiple indexes over each input relation."
    ABSTRACT: This paper addresses the problem of efficiently computing the skyline set of a relational join. Existing techniques either require access to all tuples of the input relations or demand specialized multi-dimensional access methods to generate the skyline join result. To avoid these inefficiencies, we introduce the novel SFSJ algorithm that fuses the identification of skyline tuples with the computation of the join. SFSJ is able to compute the correct skyline set by accessing only a subset of the input tuples, i.e., it has the property of early termination. SFSJ employs standard access methods for reading the input tuples and is readily implementable in an existing database system. Moreover, it can be used in pipelined execution plans, as it generates the skyline tuples progressively. Additionally, we formally analyze the performance of SFSJ and propose a novel strategy for accessing the input tuples that is proven to be optimal for SFSJ. Finally, we present an extensive experimental study that validates the effectiveness of SFSJ and demonstrates its advantages over existing techniques.
    Full-text · Conference Paper · Jan 2011