Conference Paper

Schema Mapping in P2P Networks Based on Classification and Probing.

DOI: 10.1007/978-3-540-71703-4_58 In proceedings of: Advances in Databases: Concepts, Systems and Applications, 12th International Conference on Database Systems for Advanced Applications, DASFAA 2007, Bangkok, Thailand, April 9-12, 2007, Proceedings
Source: DBLP

ABSTRACT In this paper, we address the problems of adaptive schema mappings between different peers in a peer-to-peer network and searching for interesting data residing at different peers based on such mappings. We begin by classifying the shared schema of each peer into a taxonomy of relation categories and attribute categories. We then propose our adaptive schema mapping by selectively probing the shared schema with query probes, which are generated by the classification rules. To improve the accuracy of schema mapping, we introduce the notions of a confusion matrix and prior knowledge. Finally, we present the query reformulation strategy for retrieving and integrating data from all relevant peers. We have implemented our proposed schema mapping and query processing methods in real settings with real datasets. The experimental results show that our method can be adopted effectively in practice.

  • Source
    ABSTRACT: In a Web database that dynamically provides information in response to user queries, two distinct schemas, interface schema (the schema users can query) and result schema (the schema users can browse), are presented to users. Each partially reflects the actual schema of the Web database. Most previous work only studied the problem of schema matching across query interfaces of Web databases. In this paper, we propose a novel schema model that distinguishes the interface and the result schema of a Web database in a specific domain. In this model, we address two significant Web database schema-matching problems: intra-site and inter-site. The first problem is crucial in automatically extracting data from Web databases, while the second problem plays a significant role in meta-retrieving and integrating data from different Web databases. We also investigate a unified solution to the two problems based on query probing and instance-based schema matching techniques. Using the model, a cross validation technique is also proposed to improve the accuracy of the schema matching. Our experiments on real Web databases demonstrate that the two problems can be solved simultaneously with high precision and recall.
    (e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, Toronto, Canada, August 31 - September 3 2004; 01/2004
  • Source
    ABSTRACT: "Peer-to-peer" systems like Napster and Gnutella have recently become popular for sharing information. In this paper, we study the relevant issues and tradeoffs in designing a scalable P2P system. We focus on a subset of P2P systems, known as "hybrid" P2P, where some functionality is still centralized. (In Napster, for example, indexing is centralized, and file exchange is distributed.) We model a file-sharing application, developing a probabilistic model to describe query behavior and expected query result sizes. We also develop an analytic model to describe system performance. Using experimental data collected from a running, publicly available hybrid P2P system, we validate both models. We then present several hybrid P2P system architectures and evaluate them using our model. We discuss the tradeoffs between the architectures and highlight the effects of key parameter values on system performance.
    09/2001;
  • Source
    ABSTRACT: This demo presents Hyperion, a prototype system that supports data sharing for a network of independent Peer Relational Database Management Systems (PDBMSs). The nodes of such a network are assumed to be autonomous PDBMSs that form acquaintances at run-time, and manage mapping tables to define value correspondences among different databases. They also use distributed Event-Condition-Action (ECA) rules to enable and coordinate data sharing. Peers perform local querying and update processing, and also propagate queries and updates to their acquainted peers. The demo illustrates the following key functionalities of Hyperion: (1) the use of (data level) mapping tables to infer new metadata as peers dynamically join the network, (2) the ability to answer queries using data in acquaintances, and (3) the ability to coordinate peers through update propagation.
    Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30 - September 2, 2005; 01/2005
