Conference Paper

Curated databases.

Edinburgh Univ., UK
DOI: 10.1109/WISE.2003.1254462
Conference: Proceedings of the Twenty-Seventh ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2008, June 9-11, 2008, Vancouver, BC, Canada
Source: DBLP

ABSTRACT Summary form only given. Scientists, notably biologists, are making increasing use of databases to publish both their data and their interpretation of data. These databases are valuable because of the human effort (curation) that goes into their construction and maintenance. They typically consist of a mixture of source data, metadata, annotations, and relevant data that has been extracted from other curated databases. Current database and data exchange technology does not serve database curation well. In this paper, the author addresses a number of issues connected with curated databases. Annotation of existing data now provides a new form of communication between scientists, but conventional database technology provides little support for attaching annotations. The author shows why new models of both data and query languages are needed. Closely related to annotation is provenance. Archiving is also important for verifying the basis of scientific research, yet few published scientific databases do a good job of archiving: past "editions" of the database get lost. The author describes a system that allows frequent archiving and efficient retrieval with remarkably little space overhead. Finally, the author argues that we need a new model of how curated databases are constructed. The idea that such databases are constructed as views of other data through conventional query and update languages is unhelpful; formulating a "copy-and-paste" model of data construction may provide us with better curation tools.

  • ABSTRACT: Provenance is an increasing concern due to the ongoing revolution in sharing and processing scientific data on the Web and in other computer systems. It is proposed that many computer systems will need to become provenance-aware in order to provide satisfactory accountability, reproducibility, and trust for scientific or other high-value data. To date, there is no consensus concerning appropriate formal models or security properties for provenance. In previous work, we introduced a formal framework for provenance security and proposed formal definitions of properties called disclosure and obfuscation. In this article, we study refined notions of positive and negative disclosure and obfuscation in a concrete setting, that of a general-purpose programming language. Previous models of provenance have focused on special-purpose languages such as workflows and database queries. We consider a higher-order, functional language with sums, products, and recursive types and functions, and equip it with a tracing semantics in which traces themselves can be replayed as computations. We present an annotation-propagation framework that supports many provenance views over traces, including standard forms of provenance studied previously. We investigate some relationships among provenance views and develop some partial solutions to the disclosure and obfuscation problems, including correct algorithms for disclosure and positive obfuscation based on trace slicing.
    Journal of Computer Security. 10/2013; 21(6).
  • ABSTRACT: Geolinguistic systems explore the relationship between language and cultural adaptation and change, and they can be used as instructional tools, presenting complex data and relationships in a way accessible to all educational levels. However, the heterogeneity of geolinguistic projects has been recognised as a key problem limiting the reusability of linguistic tools and data collections. We propose an approach based on Linked Open Data (LOD), which moves the focus from the systems handling the data to the data themselves, with the main goal of increasing the level of interoperability of geolinguistic applications and the reuse of the data. We defined an extensible ontology for geolinguistic resources based on the common ground defined by current European linguistic projects. We provide a Geolinguistic Linked Open Dataset based on the data case study of a linguistic project named ASIt. Finally, we show a geolinguistic application that exploits this dataset to dynamically generate linguistic maps.
    International Journal of Metadata Semantics and Ontologies 02/2014; 9(1):29-41.
  • ABSTRACT: We study the problem of query containment of conjunctive queries over annotated databases. Annotations are typically attached to tuples and represent metadata, such as probability, multiplicity, comments, or provenance. It is usually assumed that annotations are drawn from a commutative semiring. Such databases pose new challenges in query optimization, since many related fundamental tasks, such as query containment, have to be reconsidered in the presence of propagation of annotations. We axiomatize several classes of semirings for each of which containment of conjunctive queries is equivalent to existence of a particular type of homomorphism. For each of these types, we also specify all semirings for which existence of a corresponding homomorphism is a sufficient (or necessary) condition for the containment. We develop new decision procedures for containment for some semirings which are not in any of these classes. This generalizes and systematizes previous approaches.
    ACM Transactions on Database Systems (TODS). 01/2014; 39(1).
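
The last abstract above assumes tuple annotations drawn from a commutative semiring. As a rough illustration of what "propagation of annotations" can mean in that setting, the sketch below evaluates one small conjunctive query, Q(x, z) :- R(x, y), S(y, z), over annotated relations: annotations of joined tuples are multiplied, and alternative derivations of the same output tuple are added. The names `Semiring`, `BAG`, `BOOL`, and `join_project` are hypothetical and chosen for this example only; the code does not implement the containment procedures the abstract describes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring (hypothetical helper type for this sketch)."""
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]


# Natural numbers with + and *: annotations act as tuple multiplicities.
BAG = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
# Booleans with OR and AND: annotations record plain set membership.
BOOL = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)

# An annotated relation maps each tuple to its annotation.
Relation = Dict[Tuple[str, str], Any]


def join_project(r: Relation, s: Relation, k: Semiring) -> Relation:
    """Evaluate Q(x, z) :- R(x, y), S(y, z) over K-annotated relations."""
    out: Relation = {}
    for (x, y1), a in r.items():
        for (y2, z), b in s.items():
            if y1 == y2:
                # Join multiplies annotations; projecting away y adds them.
                prev = out.get((x, z), k.zero)
                out[(x, z)] = k.add(prev, k.mul(a, b))
    return out


R = {("a", "b"): 2, ("a", "c"): 1}
S = {("b", "d"): 3, ("c", "d"): 1}

bag_result = join_project(R, S, BAG)
bool_result = join_project(
    {t: True for t in R}, {t: True for t in S}, BOOL
)
```

Under the multiplicity semiring the single output tuple ("a", "d") carries the annotation 2*3 + 1*1 = 7, while under the Boolean semiring it is simply marked as present. The same query code runs unchanged over both semirings; only the annotation domain differs.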
