Conference Paper

Curated Databases

Edinburgh Univ., UK
DOI: 10.1109/WISE.2003.1254462
Conference: Proceedings of the Twenty-Seventh ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2008, June 9-11, 2008, Vancouver, BC, Canada
Source: DBLP

ABSTRACT Curated databases are databases that are populated and updated with a great deal of human effort. Most reference works that one traditionally found on the reference shelves of libraries -- dictionaries, encyclopedias, gazetteers, etc. -- are now curated databases. Since it is now easy to publish databases on the web, there has been an explosion in the number of new curated databases used in scientific research. The value of curated databases lies in the organization and the quality of the data they contain. Like the paper reference works they have replaced, they usually represent the efforts of a dedicated group of people to produce a definitive description of some subject area.

Curated databases present a number of challenges for database research. Because they are heavily cross-referenced with, and include data from, other databases, the topics of provenance and citation are important, as is annotation, since much of the work of a curator is annotating existing data. Because these databases often evolve from semistructured representations, and because they have to accommodate new scientific discoveries, evolution of structure is important. Much of the work in these areas is in its infancy, but it is beginning to suggest new research for both theory and practice. We discuss some of this research and emphasize the need to find appropriate models of the processes associated with curated databases.

Available from: James Cheney, Dec 23, 2013
  • Source
    • "Annotating data with relevant metadata is essential in curated databases [6]. Such reverse engineering is also useful for generating concise query-based summaries of groups of tuples of interest to the user (e.g., dominant tuples selected by skyline queries [4]). "
    Conference Paper: Query by output
    ABSTRACT: It has recently been asserted that the usability of a database is as important as its capability. Understanding the database schema and the hidden relationships among attributes in the data both play an important role in this context. Subscribing to this viewpoint, we present in this paper a novel data-driven approach, called Query By Output (QBO), which can enhance the usability of database systems. The central goal of QBO is as follows: given the output of some query Q on a database D, denoted by Q(D), we wish to construct an alternative query Q′ such that Q(D) and Q′(D) are instance-equivalent. To generate instance-equivalent queries from Q(D), we devise a novel data classification-based technique that can handle the at-least-one semantics that is inherent in the query derivation. In addition to the basic framework, we design several optimization techniques to reduce processing overhead and introduce a set of criteria to rank-order the output queries by various notions of utility. Our framework is evaluated comprehensively on three real data sets, and the results show that the instance-equivalent queries we obtain are interesting and that the approach is scalable and robust to queries of different selectivities.
    Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2009, Providence, Rhode Island, USA, June 29 - July 2, 2009; 06/2009
  • Source
    ABSTRACT: Provenance is information recording the source, derivation, or history of some information. Provenance tracking has been studied in a variety of settings; however, although many design points have been explored, the mathematical or semantic foundations of data provenance have received comparatively little attention. In this paper, we argue that dependency analysis techniques familiar from program analysis and program slicing provide a formal foundation for forms of provenance that are intended to show how (part of) the output of a query depends on (parts of) its input. We introduce a semantic characterization of such dependency provenance, show that this form of provenance is not computable, and provide dynamic and static approximation techniques. Comment: Long version of paper in 2007 Symposium on Database Programming Languages; revised November 2009
    Mathematical Structures in Computer Science 08/2007; 21(06). DOI:10.1007/978-3-540-75987-4_10 · 0.45 Impact Factor
  • ABSTRACT: Computational transportation is an emerging discipline that poses many data management challenges. It is characterized by the existence of a massive number of moving objects, moving sensors, and moving queries. This paper highlights important data management challenges for computational transportation and promising approaches towards addressing them.