Conference Paper

Curated databases

Edinburgh Univ., UK
DOI: 10.1109/WISE.2003.1254462
Conference: Proceedings of the Twenty-Seventh ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2008, June 9-11, 2008, Vancouver, BC, Canada
Source: DBLP

ABSTRACT Summary form only given. Scientists, notably biologists, are making increasing use of databases to publish both their data and their interpretation of data. These databases are valuable because of the human effort (curation) that goes into their construction and maintenance. They typically consist of a mixture of source data, metadata, annotations, and relevant data that has been extracted from other curated databases. Current database and data exchange technology does not serve database curation well. In this paper, the author addresses a number of issues connected with curated databases. Annotation of existing data now provides a new form of communication between scientists, but conventional database technology provides little support for attaching annotations. The author shows why new models of both data and query languages are needed. Closely related to annotation is provenance. Archiving is also important for verifying the basis of scientific research, yet few published scientific databases do a good job of archiving, and past "editions" of the database get lost. The author describes a system that allows frequent archiving and efficient retrieval with remarkably little space overhead. Finally, the author argues that we need a new model of how curated databases are constructed: the idea that such databases are constructed as views of other data through conventional query and update languages is unhelpful, and formulating a "copy-and-paste" model of data construction may provide us with better curation tools.
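The abstract does not say how the archiving system works, so the following is only an illustrative sketch of one way frequent archiving can stay cheap: every edition is merged into a single store, and a value that does not change between editions is stored once and merely tagged with the additional edition number. The class `VersionedArchive`, its methods, and the sample protein records are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedArchive:
    """Hypothetical merged archive: one store shared by all editions."""

    # key -> list of (value, set of edition numbers in which the key had that value)
    entries: dict = field(default_factory=dict)
    editions: int = 0

    def add_edition(self, snapshot: dict) -> int:
        """Archive a full snapshot and return its edition number."""
        self.editions += 1
        for key, value in snapshot.items():
            history = self.entries.setdefault(key, [])
            for old_value, seen_in in history:
                if old_value == value:
                    # Value already stored: record only the new edition number.
                    seen_in.add(self.editions)
                    break
            else:
                # Value not seen before for this key: store it once.
                history.append((value, {self.editions}))
        return self.editions

    def retrieve(self, edition: int) -> dict:
        """Reconstruct the snapshot that was archived as the given edition."""
        return {
            key: value
            for key, history in self.entries.items()
            for value, seen_in in history
            if edition in seen_in
        }

    def stored_values(self) -> int:
        """Number of distinct values physically kept in the archive."""
        return sum(len(history) for history in self.entries.values())


# Two editions of a made-up curated table (assumed data, for illustration only).
archive = VersionedArchive()
v1 = archive.add_edition({"P1": "kinase", "P2": "unknown"})
v2 = archive.add_edition({"P1": "kinase", "P2": "transporter", "P3": "ligase"})

first_edition = archive.retrieve(v1)
second_edition = archive.retrieve(v2)
```

In this sketch the two editions contain five values in total, but the archive keeps only four, because the unchanged entry for `P1` is shared. A key that is deleted in a later edition simply stops receiving new edition numbers, so earlier editions remain retrievable.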



  • Conference Paper: Query by output
    ABSTRACT: It has recently been asserted that the usability of a database is as important as its capability. Understanding the database schema and the hidden relationships among attributes in the data plays an important role in this context. Subscribing to this viewpoint, in this paper, we present a novel data-driven approach, called Query By Output (QBO), which can enhance the usability of database systems. The central goal of QBO is as follows: given the output of some query Q on a database D, denoted by Q(D), we wish to construct an alternative query Q′ such that Q(D) and Q′(D) are instance-equivalent. To generate instance-equivalent queries from Q(D), we devise a novel data classification-based technique that can handle the at-least-one semantics that is inherent in the query derivation. In addition to the basic framework, we design several optimization techniques to reduce processing overhead and introduce a set of criteria to rank order output queries by various notions of utility. Our framework is evaluated comprehensively on three real data sets and the results show that the instance-equivalent queries we obtain are interesting and that the approach is scalable and robust to queries of different selectivities.
    Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2009, Providence, Rhode Island, USA, June 29 - July 2, 2009; 06/2009
  • ABSTRACT: Provenance is information recording the source, derivation, or history of some information. Provenance tracking has been studied in a variety of settings; however, although many design points have been explored, the mathematical or semantic foundations of data provenance have received comparatively little attention. In this paper, we argue that dependency analysis techniques familiar from program analysis and program slicing provide a formal foundation for forms of provenance that are intended to show how (part of) the output of a query depends on (parts of) its input. We introduce a semantic characterization of such dependency provenance, show that this form of provenance is not computable, and provide dynamic and static approximation techniques. Comment: Long version of paper in 2007 Symposium on Database Programming Languages; revised November 2009.
    Mathematical Structures in Computer Science 08/2007; 21(06). DOI:10.1007/978-3-540-75987-4_10 · 0.35 Impact Factor
  • ABSTRACT: Computational transportation is an emerging discipline that poses many data management challenges. Computational transportation is characterized by the existence of a massive number of moving objects, moving sensors, and moving queries. This paper highlights important data management challenges for computational transportation and promising approaches towards addressing them.