Querying Web Data - The WebQA Approach
ABSTRACT: The common paradigm of searching and retrieving information on the Web is based on keyword-based search using one or more search engines, and then browsing through the large number of returned URLs. This is significantly weaker than the declarative querying that is supported by DBMSs. The lack of a schema and the high volatility of the Web make "database-like" querying of Web data difficult. In this paper we report on our work in building a system, called WebQA, that provides a declarative query-based approach to Web data retrieval and uses question-answering technology to extract information from Web sites that are retrieved by search engines. The approach consists of first using meta-search techniques in an open environment to gather candidate responses from search engines and other on-line databases, and then using information extraction techniques to find the answer to the specific question from these candidates. A prototype system has been developed to test this approach. Testing includes evaluation of its performance as a question-answering system using a well-known evaluation system called TREC-9. Its accuracy using TREC-9 data for simple questions is high and its retrieval performance is good. The system employs an open system architecture allowing for on-going improvements in various aspects.
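The two-stage approach described in the abstract (meta-search to gather candidates, then information extraction to pick an answer) can be illustrated with a minimal sketch. This is not the WebQA implementation: the backends engine_a and engine_b are hypothetical stubs standing in for real search engines, and the keyword-overlap scoring and regular-expression extraction are assumptions chosen only to keep the example small.

```python
import re
from typing import Callable, List, Optional

# A backend maps a query string to a list of text snippets.
SearchBackend = Callable[[str], List[str]]


def engine_a(query: str) -> List[str]:
    # Hypothetical stub: a real backend would call a search engine here.
    return ["Mount Everest is 8848 meters high.", "Everest lies in the Himalayas."]


def engine_b(query: str) -> List[str]:
    # Hypothetical stub for a second source (e.g. an on-line database).
    return ["Everest lies in the Himalayas.", "K2 is 8611 meters high."]


def gather_candidates(query: str, backends: List[SearchBackend]) -> List[str]:
    """Stage 1 (meta-search): query every backend and merge results,
    dropping exact duplicates while preserving first-seen order."""
    seen = set()
    merged = []
    for backend in backends:
        for snippet in backend(query):
            if snippet not in seen:
                seen.add(snippet)
                merged.append(snippet)
    return merged


def extract_answer(keywords: List[str], pattern: str,
                   candidates: List[str]) -> Optional[str]:
    """Stage 2 (extraction): among candidates containing text that matches
    the expected answer pattern, return the match from the candidate that
    mentions the most question keywords. Returns None if no candidate
    both matches the pattern and mentions at least one keyword."""
    best, best_score = None, 0
    for text in candidates:
        match = re.search(pattern, text)
        if match is None:
            continue
        score = sum(1 for k in keywords if k.lower() in text.lower())
        if score > best_score:
            best, best_score = match.group(0), score
    return best


candidates = gather_candidates("height of Mount Everest", [engine_a, engine_b])
answer = extract_answer(["Everest"], r"\d+ meters", candidates)
print(answer)
```

Keeping the backends behind a plain function type means further sources can be added to the list without touching the extraction step, which is one simple way to read the "open system architecture" mentioned in the abstract.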
- Source: available from acberg.com
Conference Proceeding: Approximating Aggregate Queries about Web Pages via Random Walks, 01/2000
ABSTRACT: A new kind of data model has recently emerged in which the database is not constrained by a conventional schema. Systems like ACeDB, which has become very popular with biologists, and the recent Tsimmis proposal for data integration organize data in tree-like structures whose components can be used equally well to represent sets and tuples. Such structures allow great flexibility in data representation. What query language is appropriate for such structures? Here we propose a simple language UnQL for querying data organized as a rooted, edge-labeled graph. In this model, relational data may be represented as fixed-depth trees, and on such trees UnQL is equivalent to the relational algebra. The novelty of UnQL consists in its programming constructs for arbitrarily deep data and for cyclic structures. While strictly more powerful than query languages with path expressions like XSQL, UnQL can still be efficiently evaluated. We describe new optimization techniques for the deep or "vertical" dimension of UnQL queries. Furthermore, we show that known optimization techniques for operators on flat relations apply to the "horizontal" dimension of UnQL. 103/2001;
Article: To Weave the Web
ABSTRACT: The paper discusses the issue of views in the Web context. We introduce a set of languages for managing and restructuring data coming from the World Wide Web. We present a specific data model, called the Araneus Data Model, inspired by the structures typically present in Web sites. The model allows us to describe the scheme of a Web hypertext, in the spirit of databases. Based on the data model, we develop two languages to support a sophisticated view definition process: the first, called Ulixes, is used to build database views of the Web, which can then be analyzed and integrated using database techniques; the second, called Penelope, allows the definition of derived Web hypertexts from relational views. This can be used to generate hypertextual views over the Web. 1 Introduction As a consequence of the explosion of the World Wide Web, an increasing amount of information is stored in repositories organized according to loose structures, usually as hypertextual d... 07/1997;