Towards a Threat Model for Provenance in e-Science
Luiz M. R. Gadelha Jr.1, Marta Mattoso1, Michael Wilde2, and Ian Foster2
1Computer and Systems Engineering Program
Federal University of Rio de Janeiro, Brazil
2University of Chicago / Argonne National Laboratory, USA
Abstract. Scientists increasingly rely on workflow management systems to per-
form large-scale computational scientific experiments. These systems often col-
lect provenance information that is useful in the analysis and reproduction of such
experiments. On the other hand, this provenance data may be exposed to security
threats which can result, for instance, in compromising the analysis of these
experiments, or in illegitimate claims of attribution. Here, we describe our
ongoing work to trace security requirements for provenance systems in the
context of e-Science, and propose some security controls to fulfill them.
1 Introduction
As an important paradigm of scientific research, computer simulations are increasingly
being used to perform computational scientific experiments. As the scale of these
experiments increases, scientific workflow management systems become a relevant tool
to specify, execute, and analyze them. These systems often collect provenance
information, frequently distributed across grids or remote clusters, that is useful in the analysis and
reproduction of such experiments. If the appropriate security controls are not in place,
provenance systems may be exposed to threats that may compromise the integrity,
confidentiality, or availability of provenance data. Here, we describe our ongoing
work to trace security requirements for provenance systems in the context of e-Science,
and propose some security controls to fulfill them. The study of security issues in
provenance systems is relatively recent. However, some important security
requirements, described in section 2, have not yet been identified in related academic
works, to our knowledge.
2 Requirements for Secure Provenance Management in e-Science
The typical execution of a workflow involves specifying its flow using some mech-
anism, such as a parallel scripting language or a GUI-based workflow specification
tool. Later on, it can be executed by a workflow management system; this involves
selecting appropriate computational resources, submitting tasks to these resources, and
transferring data. After the experiment is executed, scientists typically face the
challenge of analyzing a large number of output data files. Provenance systems are useful
in this context since they can help to determine, for instance, which tasks were
executed to generate a particular data object, and which parameters were used for these
tasks. This provenance data is usually collected and stored during workflow execution,
to describe causal relationships between tasks and data (retrospective provenance); or
during workflow specification, to describe the planned tasks, and data flow (prospective
provenance). In general, provenance data is accessed and analyzed by scientists using
a query language, such as SQL. In our ongoing threat modeling effort, we are enumer-
ating threats to each of these components of a provenance system. Many of these are
already taken into account by security frameworks for underlying technologies used
by provenance systems, such as databases and grids. Scientists, especially in the life
sciences, often avoid sharing details of experiments prior to publishing their results in
some academic journal or event, to assure correct attribution of scientific results. During
this interval, scientific collaboration is prevented. Therefore, security controls that
prevent illegitimate claims of attribution are an important security requirement for
provenance systems. [...] scientists can define which individuals can read or modify provenance data.
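To make the kind of provenance query described in this section concrete, the following is a minimal sketch in Python using the standard sqlite3 module. The schema (tables task, consumed, produced), the task and file names, and the helper functions are hypothetical and chosen only for illustration; they are not the data model of any particular provenance system. The query walks retrospective provenance backwards from a data object to every task that contributed to it, together with the parameters used.

```python
import sqlite3


def build_example_db():
    """Create an in-memory provenance store with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE task (id TEXT PRIMARY KEY, params TEXT);
        CREATE TABLE consumed (task_id TEXT, data_id TEXT);
        CREATE TABLE produced (task_id TEXT, data_id TEXT);

        INSERT INTO task VALUES
            ('align',  'gap_penalty=2'),
            ('filter', 'min_score=0.9'),
            ('plot',   'dpi=300');
        INSERT INTO consumed VALUES
            ('align',  'raw.fasta'),
            ('filter', 'aligned.dat'),
            ('plot',   'other.dat');
        INSERT INTO produced VALUES
            ('align',  'aligned.dat'),
            ('filter', 'hits.dat'),
            ('plot',   'figure.png');
    """)
    return conn


# Recursive query: start from the task that produced the data object, then
# repeatedly add the tasks that produced the inputs of tasks already found.
LINEAGE_QUERY = """
WITH RECURSIVE ancestry(task_id) AS (
    SELECT task_id FROM produced WHERE data_id = ?
    UNION
    SELECT p.task_id
    FROM ancestry a
    JOIN consumed c ON c.task_id = a.task_id
    JOIN produced p ON p.data_id = c.data_id
)
SELECT t.id, t.params
FROM task t JOIN ancestry a ON a.task_id = t.id
ORDER BY t.id;
"""


def lineage(conn, data_id):
    """Return (task, parameters) pairs for all tasks that led to data_id."""
    return conn.execute(LINEAGE_QUERY, (data_id,)).fetchall()


if __name__ == "__main__":
    conn = build_example_db()
    for task_id, params in lineage(conn, "hits.dat"):
        print(task_id, params)
```

In this sketch the query for hits.dat returns the filter task and, transitively, the align task, but not the unrelated plot task. Any threat to the integrity of the underlying tables would directly affect the answers such queries give, which is why the storage layer is one of the components considered in the threat model.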
3 Concluding Remarks
This work describes our progress in defining a threat model and proposing security con-
trols for provenance systems in the context of e-Science. We identify the assurance of
correct attribution of scientific results as an important security requirement for these
systems. For this purpose, we proposed Kairos [2], a security architecture for
provenance that uses cryptographic timestamps [3] and digital signatures. We are currently
working on the implementation of the proposed techniques in Swift [6], a
provenance-enabled parallel scripting system. As future work, we plan to investigate fine-grained
access control techniques, and a data model to store and query security properties of
provenance data.
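The role of timestamping in protecting attribution can be illustrated with a small hash-linking sketch in Python. This is not the Kairos protocol: it is a simplified linking scheme in the spirit of hash-chained timestamps, the class and function names are hypothetical, the time value is supplied by the caller rather than by a trusted timestamping authority, and the digital signature step is omitted because it would require a third-party cryptography library. Only the digest of a provenance record is logged, so the record itself can stay private until publication.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder link for the first entry (assumption)


def digest(record: dict) -> str:
    """SHA-256 digest of a record, serialized with sorted keys."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class LinkedTimestampLog:
    """Append-only log in which each entry commits to the previous one."""

    FIELDS = ("digest", "author", "time", "prev")

    def __init__(self):
        self.entries = []

    def stamp(self, record_digest: str, author: str, time: str) -> dict:
        """Append an entry binding a record digest to an author and a time."""
        prev = self.entries[-1]["link"] if self.entries else GENESIS
        body = {"digest": record_digest, "author": author,
                "time": time, "prev": prev}
        entry = dict(body, link=digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every link; any altered entry breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            body = {key: entry[key] for key in self.FIELDS}
            if entry["prev"] != prev or entry["link"] != digest(body):
                return False
            prev = entry["link"]
        return True


if __name__ == "__main__":
    log = LinkedTimestampLog()
    record = {"task": "filter", "output": "hits.dat"}
    log.stamp(digest(record), "alice", "2010-01-15T10:00:00Z")
    log.stamp(digest({"task": "plot"}), "bob", "2010-01-16T09:30:00Z")
    print(log.verify())
```

Because each entry includes the link of its predecessor, changing the author or time of an earlier entry invalidates the recomputed links, which is the property that makes a later, illegitimate claim of attribution detectable. A deployable design would additionally have each author sign the digest and have the timestamping service sign each entry.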
References
1. U. Braun, A. Shinnar, and M. Seltzer. Securing Provenance. In Proc. 3rd USENIX Workshop
on Hot Topics in Security (HotSec ’08), 2008.
2. L. Gadelha and M. Mattoso. Kairos: An Architecture for Securing Authorship and Temporal
Information of Provenance Data in Grid-Enabled Workflow Management Systems. In Proc.
4th IEEE International Conference on e-Science (e-Science 2008), pages 597–602, 2008.
3. S. Haber and W. Stornetta. How to Time-Stamp a Digital Document. Journal of Cryptology,
3(2):99–111, 1991.
4. R. Hasan, R. Sion, and M. Winslett. The Case of the Fake Picasso: Preventing History Forgery
with Secure Provenance. In Proc. 7th USENIX Conference on File and Storage Technologies
(FAST ’09), pages 1–14, 2009.
5. M. Nagappan and M. Vouk. A Model for Sharing of Confidential Provenance Information
in a Query Based System. In Proc. 2nd International Provenance and Annotation Workshop
(IPAW 2008), volume 5272 of LNCS, pages 62–69. Springer, 2008.
6. M. Wilde, I. Foster, K. Iskra, P. Beckman, A. Espinosa, M. Hategan, B. Clifford, and I. Raicu.
Parallel Scripting for Applications at the Petascale and Beyond. IEEE Computer, 42(11):50–
60, November 2009.