Provenance for cloud data accountability


Although cloud adoption has increased in recent years, it is still hampered by the lack of means for data accountability. Data provenance, the information that describes the historical events surrounding a datum, can potentially address the data accountability issue in the cloud. While provenance research has produced tools that can actively collect data provenance in a cloud environment, these tools incur a substantial amount of overhead in terms of time and storage. This overhead, along with other disadvantages, renders these tools untenable as a long-term solution.

Conference Paper
While provenance research is common in distributed systems, many proposed solutions do not address the security of systems and accountability of data stored in those systems. In this paper, we survey provenance solutions which were proposed to address the problems of system security and data accountability in distributed systems. From our survey, we derive a set of minimum requirements that are necessary for a provenance system to be effective in addressing the two problems. Finally, we identify several gaps in the surveyed solutions and present them as challenges that future provenance researchers should tackle. We argue that these gaps have to be addressed before a complete and fool-proof provenance solution can be arrived at in the future.
SUMMARY The Earth System Science Server (ES3) project is developing a local infrastructure for managing Earth science data products derived from satellite remote sensing. By 'local,' we mean the infrastructure that a scientist uses to manage the creation and dissemination of her own data products, particularly those that are constantly incorporating corrections or improvements based on the scientist's own research. Therefore, in addition to being robust and capacious enough to support public access, ES3 is intended to be flexible enough to manage the idiosyncratic computing ensembles that typify scientific research. Instead of specifying provenance explicitly with a workflow model, ES3 extracts provenance information automatically from arbitrary applications by monitoring their interactions with their execution environment. These interactions (arguments, file I/O, system calls, etc.) are logged to the ES3 database, which assembles them into provenance graphs. These graphs resemble workflow specifications, but are really reports: they describe what actually happened, as opposed to what was requested. The ES3 database supports forward and backward navigation through provenance graphs (i.e. ancestor/descendant queries), as well as graph retrieval. Copyright © 2007 John Wiley & Sons, Ltd.
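As a rough illustration of the ancestor/descendant queries mentioned in the summary above, the following is a minimal sketch of a provenance graph assembled from logged process events. It is not the ES3 implementation: the class `ProvenanceGraph`, its methods, and the file and process names in the usage note are all hypothetical, and it assumes that each logged event names one process together with the files it read and wrote.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Hypothetical in-memory provenance graph (not the ES3 database)."""

    def __init__(self):
        # parent -> children and child -> parents adjacency sets
        self._children = defaultdict(set)
        self._parents = defaultdict(set)

    def _link(self, parent, child):
        self._children[parent].add(child)
        self._parents[child].add(parent)

    def record(self, process, inputs, outputs):
        """Log one observed event: `process` read `inputs` and wrote `outputs`."""
        for src in inputs:
            self._link(src, process)
        for dst in outputs:
            self._link(process, dst)

    @staticmethod
    def _walk(start, edges):
        # Iterative depth-first traversal; returns every node reachable
        # from `start` along `edges`, excluding `start` itself unless
        # the graph contains a cycle back to it.
        seen = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def ancestors(self, node):
        """Backward navigation: everything `node` was derived from."""
        return self._walk(node, self._parents)

    def descendants(self, node):
        """Forward navigation: everything derived from `node`."""
        return self._walk(node, self._children)
```

With two hypothetical events, `record("calibrate", ["raw.hdf"], ["calibrated.hdf"])` and `record("regrid", ["calibrated.hdf"], ["gridded.nc"])`, calling `ancestors("gridded.nc")` would return both processes and both upstream files. Because the graph is built from what was logged, not from a declared workflow, it reports what actually ran.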