Querying Web Data - The WebQA Approach

Article · December 2002
Source: CiteSeer


The common paradigm for searching and retrieving information on the Web is keyword-based search using one or more search engines, followed by browsing through the large number of returned URLs. This is significantly weaker than the declarative querying supported by DBMSs. The lack of a schema and the high volatility of the Web make "database-like" querying of Web data difficult. In this paper we report on our work building WebQA, a system that provides a declarative, query-based approach to Web data retrieval; it uses question-answering technology to extract information from Web sites retrieved by search engines. The approach first uses meta-search techniques in an open environment to gather candidate responses from search engines and other online databases, and then applies information extraction techniques to find the answer to the specific question among these candidates. A prototype system has been developed to test this approach. Testing includes evaluating its performance as a question-answering system using the well-known TREC-9 evaluation. On TREC-9 data, its accuracy for simple questions is high and its retrieval performance is good. The system uses an open architecture that allows ongoing improvement of its various components.
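The two-stage approach described above (meta-search to gather candidates, then extraction to pick the answer) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not WebQA's actual implementation: the search backends are hypothetical stand-in functions rather than real search-engine APIs, and the extraction step is reduced to regex-based answer-type matching plus frequency voting across candidates, a simple technique swapped in here for the paper's information extraction component. The names `meta_search`, `extract_answer`, and `ANSWER_PATTERNS` are hypothetical.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical backend type: takes a query string, returns text snippets.
SearchBackend = Callable[[str], List[str]]


def meta_search(query: str, backends: Dict[str, SearchBackend]) -> List[str]:
    """Stage 1 (assumed): send the query to every backend and merge the
    results, dropping duplicate snippets while keeping first-seen order."""
    seen = set()
    merged: List[str] = []
    for backend in backends.values():
        for snippet in backend(query):
            key = snippet.strip().lower()  # crude duplicate detection
            if key not in seen:
                seen.add(key)
                merged.append(snippet)
    return merged


# Illustrative answer-type patterns keyed by question word (an assumption,
# not the paper's extraction rules).
ANSWER_PATTERNS = {
    "when": re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),         # four-digit years
    "who": re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)+)\b"),  # capitalized name runs
}


def extract_answer(question: str, candidates: List[str]) -> Optional[str]:
    """Stage 2 (assumed): pull typed candidate answers out of the snippets
    and return the one that occurs most often across them."""
    words = question.split()
    if not words:
        return None
    pattern = ANSWER_PATTERNS.get(words[0].lower())
    if pattern is None:
        return None  # question type not covered by this sketch
    votes: Counter = Counter()
    for snippet in candidates:
        votes.update(pattern.findall(snippet))
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    # Two hypothetical engines returning canned snippets.
    def engine_a(q: str) -> List[str]:
        return ["The Eiffel Tower was completed in 1889.", "Paris landmark opened 1889"]

    def engine_b(q: str) -> List[str]:
        return ["The Eiffel Tower was completed in 1889.", "Construction began in 1887."]

    snippets = meta_search("Eiffel Tower completed", {"a": engine_a, "b": engine_b})
    print(extract_answer("When was the Eiffel Tower completed?", snippets))  # prints 1889
```

Voting across merged results reflects the intuition behind meta-search: an answer that recurs in snippets from several independent sources is more likely to be correct than one that appears once. A real system would need far richer question classification and extraction than two regular expressions.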