Article

Mixing Querying and Navigation in MIX


Abstract

Web-based information systems let their users interleave querying and browsing during information discovery. The MIX system provides an API called QDOM (Querible Document Object Model) that supports interleaved querying and browsing of virtual XML views, which are specified in an XQuery-like language. QDOM is based on the DOM standard: client applications navigate into a view using standard DOM navigation commands, and can then use any visited node as the root of a query that creates a new view. The query/navigation processing algorithms of MIX perform decontextualization, i.e., they translate a query issued within the context of earlier queries and navigations into efficient queries that the source can understand outside of that context. In addition, MIX provides a navigation-driven query evaluation model, in which source data are retrieved only as needed by subsequent navigations.
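The two ideas in the abstract, navigation-driven retrieval and decontextualization, can be illustrated with a small Python sketch. It is not the QDOM API: `LazyNode`, `ToySource`, the path syntax, and the string-based "query" are all hypothetical stand-ins chosen for illustration. Children are fetched from the source only when a navigation asks for them, and a query issued at a visited node is rewritten into an absolute path that makes sense without the navigation history.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LazyNode:
    """Hypothetical view node; children are fetched on first navigation only."""
    label: str
    path: str  # absolute path from the view root, i.e. the node's context
    fetch: Callable[["LazyNode"], List["LazyNode"]]
    _children: Optional[List["LazyNode"]] = field(default=None, repr=False)

    def children(self) -> List["LazyNode"]:
        # Navigation-driven evaluation: touch the source only when asked.
        if self._children is None:
            self._children = self.fetch(self)
        return self._children

    def query(self, relative_path: str) -> str:
        # Decontextualization (toy version): turn a node-relative query
        # into one the source can answer with no knowledge of prior steps.
        return f"{self.path}/{relative_path}"


class ToySource:
    """Hypothetical source: maps an absolute path to its child labels."""

    def __init__(self, tree):
        self.tree = tree
        self.fetches = 0  # number of source accesses, for illustration

    def fetch(self, node):
        self.fetches += 1
        labels = self.tree.get(node.path, [])
        return [LazyNode(lab, f"{node.path}/{lab}[{i}]", self.fetch)
                for i, lab in enumerate(labels, start=1)]


source = ToySource({"/catalog": ["book", "book"],
                    "/catalog/book[2]": ["title", "price"]})
root = LazyNode("catalog", "/catalog", source.fetch)
second_book = root.children()[1]               # one source access
standalone_query = second_book.query("price")  # no source access
```

In this sketch, navigating to the second book costs a single source access, and the query rooted at that node is expressed as a standalone path. A real system would rewrite full queries rather than concatenate paths.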


Article
Abstract. In the context of OLAP analysis, the concept of navigation has never been formally defined. We show why this gap is harmful. We then propose a formalization of the concept of navigation, along with a first sketch of a language for defining navigations.
Conference Paper
We provide an overview of the Garlic project, a new project at the IBM Almaden Research Center. The goal of this project is to develop a system and associated tools for the management of large quantities of heterogeneous multimedia information. Garlic permits traditional and multimedia data to be stored in a variety of existing data repositories, including databases, files, text managers, image managers, video servers and so on; the data is seen through a unified schema expressed in an object-oriented data model and can be queried and manipulated using an object-oriented dialect of SQL, perhaps through an advanced query/browser tool that we are also developing. The Garlic architecture is designed to be extensible to new kinds of data repositories, and access efficiency is addressed via a "middleware" query processor that uses database query optimization techniques to exploit the native associative search capabilities of the underlying data repositories.
Conference Paper
Query processing and optimization in mediator systems that access distributed non-proprietary sources pose many novel problems. Cost-based query optimization is hard because the mediator does not have access to source statistics information and furthermore it may not be easy to model the source's performance. At the same time, querying remote sources may be very expensive because of high connection overhead, long computation time, financial charges, and temporary unavailability. We propose a cost-based optimization technique that caches statistics of actual calls to the sources and consequently estimates the cost of the possible execution plans based on the statistics cache. We investigate issues pertaining to the design of the statistics cache and experimentally analyze various tradeoffs. We also present a query result caching mechanism that allows us to effectively use results of prior queries when the source is not readily available. We employ the novel mechanism, which shows how semantic information about data sources may be used to discover cached query results of interest.
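A minimal Python sketch of the statistics-cache idea described above: observed costs of past source calls are cached, and candidate plans are ranked by the sum of their estimated call costs. The class and function names, the use of a plain mean as the estimator, and the pessimistic default for unseen calls are all assumptions made for illustration. The abstract does not specify the estimator or the cache design.

```python
from collections import defaultdict
from statistics import mean


class StatisticsCache:
    """Hypothetical cache of observed costs, keyed by (source, call)."""

    def __init__(self, default_cost=10.0):
        # Assumed pessimistic cost for calls that were never observed.
        self.default_cost = default_cost
        self._observed = defaultdict(list)

    def record(self, source, call, seconds):
        self._observed[(source, call)].append(seconds)

    def estimate(self, source, call):
        samples = self._observed.get((source, call))
        return mean(samples) if samples else self.default_cost


def plan_cost(cache, plan):
    # A plan is modelled as a list of (source, call) pairs.
    return sum(cache.estimate(source, call) for source, call in plan)


def cheapest_plan(cache, plans):
    return min(plans, key=lambda plan: plan_cost(cache, plan))


cache = StatisticsCache()
cache.record("A", "scan", 4.0)
cache.record("A", "scan", 6.0)
cache.record("B", "lookup", 1.0)

plan_scan = [("A", "scan")]
plan_lookups = [("B", "lookup"), ("B", "lookup")]
best = cheapest_plan(cache, [plan_scan, plan_lookups])
```

Here two cheap lookups are preferred over one expensive scan because the cached observations say so, not because the mediator knows anything about the sources' internals.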
Article
Introduction. The TSIMMIS system [1] integrates data from multiple heterogeneous sources and provides users with seamless integrated views of the data. It translates a user query on the integrated views into a set of source queries and postprocessing steps that compute the answer to the user query from the results of the source queries. TSIMMIS uses a mediation architecture [11] to accomplish this (Figure 1). [Figure 1: The TSIMMIS Architecture (User, Mediator, Source 1, Source 2, ..., Source N)] Many other data integration systems like Garlic [2, 9] and Information Manifold [4] employ a similar architecture. One of the distinguishing features of TSIMMIS is its use of a semi-structured data model (called the Object Exchange Model or OEM [7]) for dealing with the heterogeneity of the data sources. In particular, it employs source wrappers [3] that provide a uniform OEM interface to the mediator. In SIGMOD-97, we demonstrated a wrapper toolkit in TSIMMIS that helps in efficient development of w
Article
Since its introduction, XML, the eXtensible Markup Language, has quickly emerged as the universal format for publishing and exchanging data in the World Wide Web. As a result, data sources, including object-relational databases, are now faced with a new class of users: clients and customers who would like to deal directly with XML data rather than being forced to deal with the data source's particular (e.g., object-relational) schema and query language. The goal of the XPERANTO project at the IBM Almaden Research Center is to serve as a middleware layer that supports the publishing of XML data to this class of users. XPERANTO provides a uniform, XML-based query interface over an object-relational database that allows users to query and (re)structure the contents of the database as XML data, ignoring the underlying SQL tables and query language. In this paper, we give an overview of the XPERANTO system prototype, explaining how it translates XML-based queries into SQL requests, receives a...
Article
The MIX mediator system incorporates a novel framework for navigation-driven evaluation of virtual mediated views. Its architecture allows the on-demand computation of views and query results as the user navigates them. The evaluation scheme minimizes superfluous source access through the use of lazy mediators that translate incoming client navigations on virtual XML views into navigations on lower level mediators or wrapped sources. The proposed demand-driven approach is indispensable for handling up-to-date mediated views of large Web sources or query results. The non-materialization of the query answer is transparent to the client application since clients can navigate the query answer using a subset of the standard DOM API for XML documents. We elaborate on query evaluation in such a framework and show how algebraic plans can be implemented as trees of lazy mediators. Finally, we present a new buffering technique that can mediate between the fine granularity of DOM navigations and the coarse granularity of real world sources.
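The granularity mismatch mentioned at the end of this abstract can be pictured with a small Python sketch: the client asks for one child at a time, while the source is only willing to return pages of several items. This is an illustration of buffering in general, not the technique of the paper. `BufferedChildren`, `fetch_page`, and the page-based source interface are hypothetical.

```python
class BufferedChildren:
    """Hypothetical buffer between per-node navigation and paged sources."""

    def __init__(self, fetch_page, page_size):
        # fetch_page(offset, count) is an assumed coarse-grained source call
        # that returns up to `count` items starting at `offset`.
        self.fetch_page = fetch_page
        self.page_size = page_size
        self.calls = 0  # number of source calls, for illustration
        self._buffer = []
        self._exhausted = False

    def get(self, index):
        # Fetch further pages only when navigation runs past the buffer.
        while index >= len(self._buffer) and not self._exhausted:
            page = self.fetch_page(len(self._buffer), self.page_size)
            self.calls += 1
            self._buffer.extend(page)
            if len(page) < self.page_size:
                self._exhausted = True
        return self._buffer[index] if index < len(self._buffer) else None


data = list(range(10))
children = BufferedChildren(lambda offset, count: data[offset:offset + count],
                            page_size=4)
```

With a page size of 4, navigating to children 0 through 3 costs one source call in total, and the next call happens only when the client moves on to child 4.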
Article
We address the problem of efficiently constructing materialized XML views of relational databases. In our setting, the XML view is specified by a query in the declarative query language of a middle-ware system, called SilkRoute. The middle-ware system evaluates a query by sending one or more SQL queries to the target relational database, integrating the resulting tuple streams, and adding the XML tags. We focus on how to best choose the SQL queries, without having control over the target RDBMS.
Article
Many declarative query languages for object-oriented databases allow nested subqueries. This paper contains the first (to our knowledge) proposal to optimize them. A two-phase approach is used to optimize nested queries in the object-oriented context. The first phase, called dependency-based optimization, transforms queries at the query language level in order to treat common subexpressions and independent subqueries more efficiently. The transformed queries are translated to nested algebraic expressions. These entail nested loop evaluation which may be very inefficient. Hence, the second phase unnests nested algebraic expressions to allow for more efficient evaluation. 1 Introduction Many declarative query languages for object-oriented database management systems have been proposed in the last few years (e.g. [?, 4, 2, 13, 10]). To express complex conditions, access nested structure, or produce nested results, an essential feature found in these languages is the nesting of queries,...
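Why unnesting helps can be seen in a small Python sketch. It contrasts a nested-loop evaluation of a correlated subquery with an equivalent unnested form that groups the inner collection once. This is a generic illustration with made-up data and function names, not the algebraic rewriting proposed in the paper.

```python
from collections import defaultdict


def nested_loop(depts, emps):
    # Nested form: the inner query is re-evaluated for every department,
    # so the work grows with len(depts) * len(emps).
    return [(d["name"], [e["name"] for e in emps if e["dept"] == d["id"]])
            for d in depts]


def unnested(depts, emps):
    # Unnested form: group employees once, then look each department up,
    # so the work grows with len(depts) + len(emps).
    by_dept = defaultdict(list)
    for e in emps:
        by_dept[e["dept"]].append(e["name"])
    return [(d["name"], by_dept.get(d["id"], [])) for d in depts]


# Illustrative data only.
depts = [{"id": 1, "name": "R&D"}, {"id": 2, "name": "Sales"},
         {"id": 3, "name": "Legal"}]
emps = [{"name": "Ada", "dept": 1}, {"name": "Bob", "dept": 2},
        {"name": "Cy", "dept": 1}]
```

Both functions return the same nested result, including the empty group for a department with no employees. Only the evaluation strategy differs.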
Article
We describe the Enosys XML integration platform, focusing on the query language, algebra, and architecture of its query processor. The platform enables the development of eBusiness applications in customer relationship management, e-commerce, supply chain management, and decision support. These applications often require that data be integrated dynamically from multiple information sources. The Enosys platform allows one to build (virtual and/or materialized) integrated XML views of multiple sources, using XML queries as view definitions. During run-time, the application issues XML queries against the views. Queries and views are translated into the XCQL algebra and are combined into a single algebra expression/plan. Query plan composition and query plan decomposition challenges are faced in this process. Finally, the query processor lazily evaluates the result, using an appropriate adaptation of relational database iterator models to XML. The paper describes the platform architecture and components, the supported XML query language and the query processor architecture. It focuses on the underlying XML query algebra, which differs from the algebras that have been considered by W3C in that it is particularly tuned to semistructured data and to optimization and efficient evaluation in a system that follows the conventional architecture of database systems.
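The lazy, iterator-style evaluation mentioned above can be sketched with Python generators. Each operator pulls items from its child only when its own consumer asks for one, so a client that reads three results forces only as much source access as those three results need. The operator names and the counter are illustrative assumptions. The sketch does not reproduce the XCQL algebra.

```python
from itertools import islice


def scan(items, counter):
    # Leaf operator: counts how many source items are actually read.
    for item in items:
        counter["read"] += 1
        yield item


def select(predicate, child):
    # Passes through only the items that satisfy the predicate.
    for item in child:
        if predicate(item):
            yield item


def project(fn, child):
    # Applies a transformation to each item pulled from the child.
    for item in child:
        yield fn(item)


counter = {"read": 0}
plan = project(lambda x: x * x,
               select(lambda x: x % 2 == 0,
                      scan(range(1, 101), counter)))
first_three = list(islice(plan, 3))  # pull three results, then stop
```

Although the source holds 100 items, asking for the first three results reads only the six items needed to produce them.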
Article
XML is the standard format for data exchange between inter-enterprise applications on the Internet. To facilitate data exchange, industry groups define public document type definitions (DTDs) that specify the format of the XML data to be exchanged between their applications. In this paper, we address the problem of automating the conversion of relational data into XML. We describe SilkRoute, a general, dynamic, and efficient tool for viewing and querying relational data in XML. SilkRoute is general, because it can express mappings of relational data into XML that conforms to arbitrary DTDs. We call these mappings views. Applications express the data they need as an XML-QL query over the view. SilkRoute is dynamic, because it only materializes the fragment of an XML view needed by an application, and it is efficient, because it fully exploits the underlying RDBMS query engine whenever data items in an XML view need to be materialized.
Conference Paper
Access to large numbers of data sources introduces new problems for users of heterogeneous distributed databases. End users and application programmers must deal with unavailable data sources. Database administrators must deal with incorporating new sources into the model. Database implementers must deal with the translation of queries between query languages and schemas. The Distributed Information Search COmponent (Disco) addresses these problems. Query processing semantics are developed to process queries over data sources which do not return answers. Data modeling techniques manage connections to data sources. The component interface to data sources flexibly handles different query languages and translates queries. This paper describes (a) the distributed mediator architecture of Disco, (b) its query processing semantics, (c) the data model and its modeling of data source connections, and (d) the interface to underlying data sources.
Conference Paper
Due to the development of the World Wide Web, the integration of heterogeneous data sources has become a major concern of the database community. Appropriate architectures and query languages have been proposed. Yet the problem of data conversion, which is essential for the development of mediator/wrapper architectures, has remained largely unexplored. In this paper, we present the YAT system for data conversion. This system provides tools for the specification and the implementation of data conversions among heterogeneous data sources. It relies on a middleware model, a declarative language, a customization mechanism and a graphical interface. The model is based on named trees with ordered and labeled nodes. Like semistructured data models, it is simple enough to facilitate the representation of any data. Its main originality is that it allows reasoning at various levels of representation. The YAT conversion language (called YATL) is declarative and rule-based, and features enhanced pattern matching facilities and powerful restructuring primitives. It allows the order of collections to be preserved or reconstructed. The customization mechanism relies on program instantiations: an existing program may be instantiated into a more specific one, and then easily modified. We also present the architecture, implementation and practical use of the YAT prototype, currently under evaluation within the OPAL project.
Article
Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate the problem: In order to manipulate large sets of complex objects as efficiently as today's database systems manipulate simple records, query-processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
Article
XML has emerged as the standard data exchange format for Internet-based business applications. This has created the need to publish existing business data, stored in relational databases, as XML. A general way to publish relational data as XML is to provide XML views over relational data, and allow business partners to query these views using an XML query language. In this paper, we address the problem of evaluating XML queries over XML views of relational data. This paper makes two main contributions. The first is a general framework for processing arbitrarily complex queries specified using the XQuery query language. The second is a technique for efficiently evaluating XML queries by pushing most of the query computation down to the relational engine.
Article
In the past few years, query languages featuring generalized path expressions have been proposed. These languages allow the interrogation of both data and structure. They are powerful and essential for a number of applications. However, until now, their evaluation has relied on a rather naive and inefficient algorithm. In this paper, we extend an object algebra with two new operators and present some interesting rewriting techniques for queries featuring generalized path expressions. We also show how a query optimizer can integrate the new techniques. 1 Introduction In the past few years there has been a growing interest in query languages featuring generalized path expressions (GPE) [BRG88, KKS92, CACS94, QRS+95]. With these languages, one may issue queries on data without exact knowledge of its structure. A GPE queries data and structure at the same time. Although very useful for standard database applications, these languages are vital for new applications dedicated, for insta...