ABSTRACT: OpenII (openintegration.org) is a collaborative effort to create a suite of open-source tools for information integration (II). The project is leveraging the latest developments in II research to create a platform on which integration tools can be built and further research conducted. In addition to a scalable, extensible platform, OpenII includes industrial-strength components developed by MITRE, Google, UC-Irvine, and UC-Berkeley that interoperate through a common repository in order to solve II problems. Components of the toolkit have been successfully applied to several large-scale US government II challenges.
Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6-10, 2010; 10/2010
ABSTRACT: Many data sharing communities create data standards ("hub" schemata) to speed information integration by increasing reuse of both data definitions and mappings. Unfortunately, creating these standards and the mappings to the enterprise's implemented systems is both time-consuming and expensive. This paper presents Unity, a novel tool for speeding the development of a community vocabulary, which includes both a standard schema and the necessary mappings. We present Unity's scalable algorithms for creating vocabularies and its novel human-computer interface, which gives the integrator a powerful environment for refining the vocabulary. We then describe Unity's extensive reuse of data structures and algorithms from the OpenII information integration framework, which not only sped the construction of Unity but also enables reuse of the artifacts Unity produces: vocabularies serve as the basis of information exchanges and can also be reused as thesauri by other tools within the OpenII framework. Unity has been applied to real U.S. government information integration challenges.
Proceedings of the IEEE International Conference on Information Reuse and Integration, IRI 2011, 3-5 August 2011, Las Vegas, Nevada, USA; 01/2011
ABSTRACT: Generating new knowledge from scientific databases, merging the product information of businesses, and computing the overlap between data collections are a few examples of applications that require data integration. A crucial step in this integration process is the discovery of correspondences between the data sources and the evaluation of their quality. The overall metric was designed to compute the post-match effort, but it suffers from major drawbacks. We therefore present in this paper two related metrics for computing this effort. The first, post-match effort, measures the amount of work the user must do to correct the correspondences discovered by the tool. The second, human-spared resources, measures the rate of automation gained by using a matching tool.
On the Move to Meaningful Internet Systems: OTM 2011 - Confederated International Conferences: CoopIS, DOA-SVI, and ODBASE 2011, Hersonissos, Crete, Greece, October 17-21, 2011, Proceedings, Part I; 01/2011