Conference Paper

Curated databases.

Edinburgh Univ., UK;
DOI: 10.1109/WISE.2003.1254462 Conference: Proceedings of the Twenty-Seventh ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2008, June 9-11, 2008, Vancouver, BC, Canada
Source: DBLP

ABSTRACT Summary form only given. Scientists, notably biologists, are making increasing use of databases to publish both their data and their interpretation of data. These databases are valuable because of the human effort (curation) that goes into their construction and maintenance. They typically consist of a mixture of source data, metadata, annotations, and relevant data that has been extracted from other curated databases. Current database and data exchange technology does not serve database curation well. In this paper, the author addresses a number of issues connected with curated databases. Annotation of existing data now provides a new form of communication between scientists, but conventional database technology provides little support for attaching annotations. The author shows why new models of both data and query languages are needed. Closely related to annotation is provenance. Archiving is also important for verifying the basis of scientific research, yet few published scientific databases do a good job of it. Past "editions" of the database get lost. The author describes a system that allows frequent archiving and efficient retrieval with remarkably little space overhead. Finally, the author argues that we need a new model of how curated databases are constructed. The idea that such databases are constructed as views of other data through conventional query and update languages is unhelpful; formulating a "copy-and-paste" model of data construction may provide us with better curation tools.

  •
    ABSTRACT: With the increasing use of Web 2.0 to create, disseminate, and consume large volumes of data, more and more information is published and becomes available for potential data consumers, that is, applications/services, individual users and communities, outside their production site. The most representative example of this trend is Linked Open Data (LOD), a set of interlinked data and knowledge bases. The main challenge in this context is data governance within loosely coordinated organizations that are publishing added-value interlinked data on the Web, bringing together issues related to data management and data quality, in order to support the full lifecycle of data production, consumption, and management. In this article, we are interested in curation issues for RDF(S) data, which is the default data model for LOD. In particular, we are addressing change management for RDF(S) data maintained by large communities (scientists, librarians, etc.) which act as curators to ensure high quality of data. Such curated Knowledge Bases (KBs) are constantly evolving for various reasons, such as the inclusion of new experimental evidence or observations, or the correction of erroneous conceptualizations. Managing such changes poses several research problems, including the problem of detecting the changes (delta) between versions of the same KB developed and maintained by different groups of curators, a crucial task for assisting them in understanding the involved changes. This becomes all the more important as curated KBs are interconnected (through copying or referencing) and thus changes need to be propagated from one KB to another either within or across communities. This article addresses this problem by proposing a change language which allows the formulation of concise and intuitive deltas. 
The language is expressive enough to describe unambiguously any possible change encountered in curated KBs expressed in RDF(S), and can be efficiently and deterministically detected in an automated way. Moreover, we devise a change detection algorithm which is sound and complete with respect to the aforementioned language, and study appropriate semantics for executing the deltas expressed in our language in order to move backwards and forwards in a multiversion repository, using only the corresponding deltas. Finally, we evaluate through experiments the effectiveness and efficiency of our algorithms using real ontologies from the cultural, bioinformatics, and entertainment domains.
    ACM Transactions on Database Systems (TODS). 04/2013; 38(1).
  •
    ABSTRACT: We study the problem of query containment of conjunctive queries over annotated databases. Annotations are typically attached to tuples and represent metadata, such as probability, multiplicity, comments, or provenance. It is usually assumed that annotations are drawn from a commutative semiring. Such databases pose new challenges in query optimization, since many related fundamental tasks, such as query containment, have to be reconsidered in the presence of propagation of annotations. We axiomatize several classes of semirings for each of which containment of conjunctive queries is equivalent to existence of a particular type of homomorphism. For each of these types, we also specify all semirings for which existence of a corresponding homomorphism is a sufficient (or necessary) condition for the containment. We develop new decision procedures for containment for some semirings which are not in any of these classes. This generalizes and systematizes previous approaches.
    ACM Transactions on Database Systems (TODS). 01/2014; 39(1).
  •
    ABSTRACT: The transition from a paper-based work environment to a largely paperless environment is still in full swing, in healthcare as well as in other domains. Analysts predict that at least a further decade of effort is necessary. In reality, paperless IT-based workflows offer both advantages and disadvantages over paper-based solutions. This is in contrast to the naïve expectation that a paperless solution should be a strict improvement over paper-based processes. We identify a set of generic requirements that address common drawbacks of IT solutions, and we propose a system model that helps to create IT systems which preserve the advantages of paper-based processing. The main tenet is that the paperless solution should be based on a naturalistic paper metaphor. Our system model supports auditability of IT systems by direct reference to the paper metaphor and ensures that information is faithfully presented to the practitioner. The system model is intended for mission-critical applications such as health record management.
    Proceedings of the Fourth Australasian Workshop on Health Informatics and Knowledge Management - Volume 120; 01/2011
