Cost-based optimization in DB2 XML

IBM Almaden Research Center, 650 Harry Road, San Jose, California 95120, USA
IBM Systems Journal (Impact Factor: 1.79). 01/2006; 45(2):299-320. DOI: 10.1147/sj.452.0299
Source: DBLP


DB2 XML is a hybrid database system that combines the relational capabilities of DB2 Universal Database™ (UDB) with comprehensive native XML support. DB2 XML augments DB2® UDB with a native XML store, XML indexes, and query processing capabilities for both XQuery and SQL/XML that are integrated with those of SQL. This paper presents the extensions made to the DB2 UDB compiler, and especially its cost-based query optimizer, to support XQuery and SQL/XML queries, using much of the same infrastructure developed for relational data queried by SQL. It describes the challenges to the relational infrastructure that supporting XQuery and SQL/XML poses and provides the rationale for the extensions that were made to the three main parts of the optimizer: the plan operators, the cardinality and cost model, and statistics collection.

Available from: Guy M. Lohman,
  • Source
    • "Unfortunately, they do not cover set-oriented SJ and HTJ operators. Balmin et al. [3] sketch the development of a hybrid cost-based optimizer for SQL and XQuery being part of DB2 XML. Compared to our approach, they evaluate every path expression using an HTJ operator and cannot decide on a fine-granular level whether to use SJ operators or not. "
    ABSTRACT: Even though an effective cost-based query optimizer is of utmost importance for the efficient evaluation of XQuery expressions in native XML database systems, such a component is currently out of sight, because former approaches do not pay attention to the latest advances in the area of physical operators (e.g., Holistic Twig Joins and advanced indexes) or focus on only some of them. To support the development of native XML query optimizers, we introduce an extensible cost-based optimization framework that integrates the cutting-edge XML query evaluation operators into a single system. Using the well-known plan generation techniques from the relational world and a novel set of plan equivalences---which allows for the generation of alternative query plans consisting of Structural Joins, Holistic Twig Joins, and numerous indexes (especially path indexes and content-and-structure indexes)---our optimizer can now benefit from the knowledge on native XML query evaluation to speed up query execution significantly.
    Fourteenth International Database Engineering and Applications Symposium (IDEAS 2010), August 16-18, 2010, Montreal, Quebec, Canada; 01/2010
  • Source
    • "Cost-based query optimization in XML databases, although not as well covered in the literature as selectivity estimation, has been employed successfully in commercial databases like IBM DB2 pureXML [2] [3]. Balmin et al. [2] [3] outline some of the cost models and optimization heuristics used in DB2 pureXML. Hidaka et al. [12] outline a cost model for XQuery. "
    ABSTRACT: The wide availability of commodity multi-core systems presents an opportunity to address the latency issues that have plagued XML query processing. However, simply executing multiple XML queries over multiple cores merely addresses the throughput issue: intra-query parallelization is needed to exploit multiple processing cores for better latency. Toward this effort, this paper investigates the parallelization of individual XPath queries over shared-address space multi-core processors. Much previous work on parallelizing XPath in a distributed setting failed to exploit the shared memory parallelism of multi-core systems. We propose a novel, end-to-end parallelization framework that determines the optimal way of parallelizing an XML query. This decision is based on a statistics-based approach that relies both on the query specifics and the data statistics. At each stage of the parallelization process, we evaluate three alternative approaches, namely, data-, query-, and hybrid-partitioning. For a given XPath query, our parallelization algorithm uses XML statistics to estimate the relative efficiencies of these different alternatives and find an optimal parallel XPath processing plan. Our experiments using well-known XML documents validate our parallel cost model and optimization framework, and demonstrate that it is possible to accelerate XPath processing using commodity multi-core systems.
    EDBT 2010, 13th International Conference on Extending Database Technology, Lausanne, Switzerland, March 22-26, 2010, Proceedings; 01/2010
  • Source
    • "Bloom filters are used in a wide variety of application areas, such as databases [1], distributed information retrieval [20], network computing [5], and bioinformatics [15]. Some of these applications require large Bloom filters to reduce the false positive rate. "
    ABSTRACT: Bloom Filters are widely used in many applications including database management systems. With a certain allowable error rate, this data structure provides an efficient solution for membership queries. The error rate is inversely proportional to the size of the Bloom filter. Currently, Bloom filters are stored in main memory because the low locality of operations makes them impractical on secondary storage. In multi-user database management systems, where there is a high contention for the shared memory heap, the limited memory available for allocating a Bloom filter may cause a high rate of false positives. In this paper we are proposing a technique to reduce the memory requirement for Bloom filters with the help of solid state storage devices (SSD). By using a limited memory space for buffering the read/write requests, we can afford a larger SSD space for the actual Bloom filter bit vector. In our experiments we show that with significantly less memory requirement and fewer hash functions the proposed technique reduces the false positive rate effectively. In addition, the proposed data structure runs faster than the traditional Bloom filters by grouping the inserted records with respect to their locality on the filter.