Article

An Extensible Index for XML Containment Queries

Abstract

Containment queries are among the most important query types for XML documents, so efficient support for them is crucial for XML databases. Recently, object-relational database management system (ORDBMS) vendors have begun to store and retrieve XML data in their products. In this paper, we propose an extensible index that supports containment queries over XML data stored as BLOBs in an ORDBMS. Specifically, we describe how to implement the index using the extensibility features of an ORDBMS and how to use it.

Conference Paper
Content-based retrieval of images is the ability to retrieve images that are similar to a query image. Oracle8i Visual Information Retrieval provides this facility based on technology licensed from Virage, Inc. This product is built on top of Oracle8i interMedia which enables storage, retrieval and management of images, audios and videos. Images are matched using attributes such as color, texture and structure and efficient content-based retrieval is provided using indexes of an image index type. The design of the index type is based on a multi-level filtering algorithm. The filters reduce the search space so that the expensive comparison algorithm operates on a small subset of the data. Bitmap indexes are used to evaluate the first filter resulting in a design which performs well and is scalable. The image index type is built using Oracle8i extensible indexing technology, allowing users to create, use, and drop instances of this index type as they would any other standard index. In this paper we present an overview of the product, the design of the image index type, and some performance results of our product.
Article
Data Types): If the underlying DBMS supports user-defined ADTs, we can define database schemas that exploit them. An example of such an ADT is one that represents regions. In the basic XRel schema, two separate attributes are used to represent regions. We can define an ADT REGION to manage regions of nodes. An instance of the REGION type is a pair of numbers $(r, s)$ such that $0 \le r \le s$. Useful functions that could be defined for this ADT include the following: BOOLEAN contain(REGION pos): for a given REGION instance $pos = (r_a, s_a)$, this function returns true if $(r < r_a) \land (s_a < s)$ holds, and returns false otherwise.
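The containment test described above can be illustrated with a minimal Python sketch. It is not the ADT definition from the cited work: the class name `Region`, the field names `start` and `end`, and the sample values are assumptions made for illustration, and a real ADT would be registered with the DBMS rather than written in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical stand-in for the REGION ADT: a pair (r, s) with 0 <= r <= s."""
    start: int  # r
    end: int    # s

    def __post_init__(self):
        # Enforce the invariant 0 <= r <= s stated in the text.
        if not (0 <= self.start <= self.end):
            raise ValueError("a region requires 0 <= start <= end")

    def contain(self, pos: "Region") -> bool:
        # True when pos = (r_a, s_a) lies strictly inside this region:
        # (r < r_a) and (s_a < s).
        return self.start < pos.start and pos.end < self.end


# Sample regions chosen for illustration only.
book = Region(0, 100)
title = Region(5, 20)
print(book.contain(title))   # True
print(title.contain(book))   # False
```

Because both comparisons are strict, a region does not contain itself under this definition.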
Article
A wide range of applications, including Publish/Subscribe, Workflow, and Web-site Personalization, require maintaining users' interests in expected data as conditional expressions. This paper proposes to manage such expressions as data in Relational Database Systems (RDBMS). This is accomplished 1) by allowing expressions to be stored in a column of a database table and 2) by introducing a SQL EVALUATE operator to evaluate expressions for given data. Expressions, when combined with predicates on other forms of data in a database, are a flexible and powerful way of expressing interest in a data item. The ability to evaluate expressions (via the EVALUATE operator) in SQL enables applications to take advantage of the expressive power of SQL to support complex subscription models. The paper describes the key concepts, presents our approach of managing expressions in Oracle RDBMS, discusses a novel indexing scheme that allows efficient filtering of a large set of expressions, and outlines future directions.
Article
In order to better support current and new applications, the major DBMS vendors are stepping beyond uninterpreted binary large objects, termed BLOBs, and are beginning to offer extensibility features that allow external developers to extend the DBMS with, e.g., their own data types and accompanying access methods. Existing solutions include DB2 extenders, Informix DataBlades, and Oracle cartridges. Extensible systems offer new and exciting opportunities for researchers and third-party developers alike. This paper reports on an implementation of an Informix DataBlade for the GR-tree, a new R-tree based index. This effort represents a stress test of the perhaps currently most extensible DBMS, in that the new DataBlade aims to achieve better performance, not just to add functionality. The paper provides guidelines for how to create an access method DataBlade, describes the sometimes surprising challenges that must be negotiated during DataBlade development, and evaluates the extensibility of the Informix Dynamic Server.
Article
XML is fast emerging as the dominant standard for representing data in the World Wide Web. Sophisticated query engines that allow users to effectively tap the data stored in XML documents will be crucial to exploiting the full power of XML. While there has been a great deal of activity recently proposing new semistructured data models and query languages for this purpose, this paper explores the more conservative approach of using traditional relational database engines for processing XML documents conforming to Document Type Descriptors (DTDs). To this end, we have developed algorithms and implemented a prototype system that converts XML documents to relational tuples, translates semi-structured queries over XML documents to SQL queries over tables, and converts the results to XML. We have qualitatively evaluated this approach using several real DTDs drawn from diverse domains. It turns out that the relational approach can handle most (but not all) of the semantic...
Article
Spatial indexing has been one of the active focus areas in recent database research. Several variants of Quadtree and R-tree indexes have been proposed in database literature. In this paper, we first describe briefly our implementation of Quadtree and R-tree index structures and related optimizations in Oracle Spatial. We then examine the relative merits of the two structures as implemented in Oracle Spatial and compare their performance for different types of queries and other operations. Finally, we summarize our experiences with these different structures in indexing large GIS datasets in Oracle Spatial.
Article
The World Wide Web Consortium has convened a working group to design a query language for Extensible Markup Language (XML) data sources. This new query language, called XQuery, is still evolving and has been described in a series of drafts published by the working group. XQuery is a functional language comprised of several kinds of expressions that can be nested and composed with full generality. It is based on the type system of XML Schema and is designed to be compatible with other XML-related standards. This paper explains the need for an XML query language, provides a tutorial overview of XQuery, and includes several examples of its use.
Article
The inverted index is widely used in the existing information retrieval field. In order to support containment queries for structured documents such as XML, it needs to be extended. Previous work suggested an extension in storing the inverted index for XML documents and processing containment queries, and compared two implementation options: using an RDBMS and using an Information Retrieval (IR) engine. However, the previous work has two drawbacks in extending the inverted index. One is that the RDBMS implementation is generally much worse in the performance than the IR engine implementation. The other is that when a containment query is processed in an RDBMS, the number of join operations increases in proportion to the number of containment relationships in the query and a join operation always occurs between large relations. In order to solve these problems, we propose in this paper a novel approach to extend the inverted index for containment query processing, and show its effectiveness through experimental results. In particular, our performance study shows that (1) our RDBMS approach almost always outperforms the previous RDBMS and IR approaches, (2) our RDBMS approach is not far behind our IR approach with respect to performance, and (3) our approach is scalable to the number of containment relationships in queries. Therefore, our results suggest that, without having to make any modifications on the RDBMS engine, a native implementation using an RDBMS can support containment queries as efficiently as an IR implementation.
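As a rough illustration of the general idea of extending an inverted index with position information for containment queries, the Python sketch below records a (document, begin, end) triple for each element and a (document, position) pair for each word, then answers "element contains word" by comparing positions. It is not the approach proposed in the abstract above: the function names, the token-based numbering, and the sample documents are assumptions, and the tokenizer only handles well-formed XML without attributes.

```python
import re
from collections import defaultdict

# Matches an opening tag, a closing tag, or a plain word (attributes unsupported).
TOKEN = re.compile(r"<(/?)(\w+)>|(\w+)")


def build_index(docs):
    """Build two hypothetical inverted lists from {doc_id: xml_text}."""
    elements = defaultdict(list)  # tag  -> [(doc_id, begin, end)]
    words = defaultdict(list)     # word -> [(doc_id, position)]
    for doc_id, text in docs.items():
        stack = []  # open elements as (tag, begin position)
        for pos, match in enumerate(TOKEN.finditer(text)):
            closing, tag, word = match.groups()
            if word:
                words[word.lower()].append((doc_id, pos))
            elif closing:
                open_tag, begin = stack.pop()
                elements[open_tag].append((doc_id, begin, pos))
            else:
                stack.append((tag, pos))
    return elements, words


def elements_containing_word(elements, words, tag, word):
    """Return occurrences of <tag> whose region encloses an occurrence of word."""
    hits = {
        (doc, begin, end)
        for (doc, begin, end) in elements.get(tag, [])
        for (word_doc, pos) in words.get(word.lower(), [])
        if doc == word_doc and begin < pos < end
    }
    return sorted(hits)


# Sample documents invented for illustration.
docs = {
    1: "<book><title>XML index</title><body>relational index</body></book>",
    2: "<book><title>Spatial data</title></book>",
}
elements, words = build_index(docs)
print(elements_containing_word(elements, words, "title", "index"))
```

The nested loop plays the role of the join between element and word postings. Each additional containment relationship in a query would add another such join, which is the cost the abstract attributes to the earlier RDBMS implementation.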
Conference Paper
Extensible indexing is a SQL-based framework that allows users to define domain-specific indexing schemes, and integrate them into the Oracle8i server. Users register a new indexing scheme, the set of related operators, and additional properties through SQL data definition language extensions. The implementation for an indexing scheme is provided as a set of Oracle Data Cartridge Interface (ODCIIndex) routines for index-definition, index-maintenance, and index-scan operations. An index created using the new indexing scheme, referred to as domain index, behaves and performs analogous to those built natively by the database system. The Oracle8i server implicitly invokes user-supplied index implementation code when domain index operations are performed, and executes user-supplied index scan routines for efficient evaluation of domain-specific operators. This paper provides an overview of the framework and describes the steps needed to implement an indexing scheme. The paper also presents a case study of Oracle Cartridges (intermedia text, spatial, and visual information retrieval), and Daylight (Chemical compound searching) Cartridge, which have implemented new indexing schemes using this framework and discusses the benefits and limitations
Conference Paper
In a high level query and data manipulation language such as SQL, requests are stated non-procedurally, without reference to access paths. This paper describes how System R chooses access paths for both simple (single relation) and complex queries (such as joins), given a user specification of desired data as a boolean expression of predicates. System R is an experimental database management system developed to carry out research on the relational model of data. System R was designed and built by members of the IBM San Jose Research Laboratory.
Conference Paper
Over the last decade, database system products have been extended to provide support for defining, storing, updating, indexing and retrieving complex data with full transaction semantics. Oracle, IBM, Informix and others have used extensibility technology to build database system extensions for text, image, spatial, audio/video, chemical, genetic and other types of complex data. Currently, we find database systems being deployed in support of e-commerce. In many cases, these e-commerce database applications use only simple SQL data types to represent items such as office supplies, computers, books and CDs. There is also a large and important set of e-commerce applications that employ complex data formats such as EDI, SWIFT and HL7. The database extensibility features initially developed to support text, spatial and similar forms of complex data are now being used to build e-commerce applications. Thus, database extensibility technology is evolving into an important mechanism to enable the development of e-commerce systems
Article
This paper explores a mechanism to support user-defined data types for columns in a relational data base system. Previous work suggested how to support new operators and new data types. The contribution of this work is to suggest ways to allow query optimization on commands which include new data types and operators and ways to allow access methods to be used for new data types.
1. INTRODUCTION
The collection of built-in data types in a data base system (e.g. integer, floating point number, character string) and built-in operators (e.g. +, -, *, /) were motivated by the needs of business data processing applications. However, in many engineering applications this collection of types is not appropriate. For example, in a geographic application a user typically wants points, lines, line groups and polygons as basic data types and operators which include intersection, distance and containment. In scientific application, one requires complex numbers and time series with appropriate operat...
Extensible Indexing Support in DB2 Universal Database
  • S Debloch
Oracle9i Data Cartridge Developer's Guide Release 2 (9.2)
  • Oracle Corp