A Framework for Scientific Workflow Reproducibility in the Cloud
Workflow is a well-established means by which to capture scientific methods in an abstract graph of interrelated processing tasks. The reproducibility of scientific workflows is therefore fundamental to reproducible e-Science. However, the ability to record all the required details so as to make a workflow fully reproducible is a long-standing problem that is very difficult to solve. In this paper, we introduce an approach that integrates system description, source control, container management and automatic deployment techniques to facilitate workflow reproducibility. We have developed a framework that leverages this integration to support workflow execution, re-execution and reproducibility in the cloud and in a personal computing environment. We demonstrate the effectiveness of our approach by examining various aspects of repeatability and reproducibility on real scientific workflows. The framework allows workflow and task images to be captured automatically, which improves not only repeatability but also runtime performance. It also gives workflows portability across different cloud environments. Finally, the framework can also track changes in the development of tasks and workflows to protect them from unintentional failures.
Newcastle upon Tyne, UK; Mosul University, Iraq
I. INTRODUCTION
Workflows have become a valuable mechanism for specifying and automating scientific experiments running on distributed computing infrastructure. Researchers in different disciplines have embraced them to conduct a wide range of analyses and scientific pipelines, mainly because a workflow can be considered as a model defining the structure of the computational and/or data processing tasks necessary for the management of a scientific process.

However, workflows are not only useful in representing and managing the computation but also as a way of sharing knowledge and experimental methods. When shared, they can help users to understand the overall experiment, or they can become an essential building block in their new experiments. Lastly, workflows can also be used to repeat or reproduce the experiment and replicate the original results.
One of the major challenges in achieving workflow reproducibility, however, is the heterogeneity of workflow components, which demand different, sometimes conflicting sets of dependencies. Ensuring successful reproducibility of workflows requires more than simply sharing their specifications. It also depends on the ability to isolate necessary and sufficient computational artifacts and preserve them with adequate description for future re-use.

A number of analyses and research efforts have already been conducted to determine the salient issues and challenges in workflow reproducibility. In short, the issues can be summarized as: insufficient and non-portable description of a workflow, including missing details of the processing tools and execution environment; unavailable execution environments; missing third-party resources and data; and reliance on external dependencies, such as external web services, which add difficulty to reproducibility at a later time.
Currently, most of the approaches that address reproducibility of scientific workflows have focused either on their physical preservation, in which a workflow is conserved by packaging all of its components, so an identical replica is created and can be reused; or on logical preservation, in which the workflow and its components are described with enough information for others to reproduce a similar workflow in the future.

Although both packaging and describing play a vital role in supporting workflow re-use, alone they are not sufficient to effectively maintain reproducibility. On the one hand, physical preservation is limited to recreating the packaged components and resources, while it lacks a structured description of the workflow. Thus, it makes it easy to repeat exactly the same execution, yet it is often not enough to reproduce the experiment with different parameters or input data. On the other hand, logical preservation can provide a detailed description of various levels of the workflow. It is still not enough, however, in the absence of the necessary tools and dependencies.

A need to integrate these two forms of preservation becomes increasingly apparent. That, combined with a portable description of the workflow, which can be used in different environments, and an automated workflow deployment mechanism, has the potential to significantly improve workflow reproducibility.
In this paper we present a framework designed to address the challenges mentioned earlier. The framework integrates features of both logical and physical preservation approaches. Firstly, using the OASIS specification "Topology and Orchestration Specification for Cloud Applications" (TOSCA), it allows a workflow description to include the top-level structure of the abstract workflow together with details about its execution environment. The description is portable and may be used in automated deployment across different execution environments including the Cloud and a local VM.

Secondly, using Docker virtualisation and imaging, our framework offers portable packaging of whole workflows and their parts. By integration with TOSCA, the packaging is automated, hence users are freed from creating and managing Docker images. Additionally, our framework is built upon code repositories that natively support version control, which is crucial in tracking the evolution of workflows and their components over time.

We argue that these three elements: portable and comprehensive description, portable packaging and widely applied version control, play a fundamental role in maintaining reproducibility of scientific workflows over longer periods of time. They allowed us to build the framework which we present as the main contribution of this paper. We evaluate the framework using real scientific workflows developed in our previous projects to demonstrate that it can effectively realise its goal.
II. BACKGROUND AND RELATED WORK
Workflow reproducibility and repeatability have been discussed in a number of studies and are considered to be an essential part of the computational scientific method. As our approach to improving reproducibility of workflows is based on the TOSCA specification and Docker technology, in this section we present the three relevant areas.

A. Scientific Workflow Reproducibility

There have been various attempts, proposed in the literature or as software tools, to address repeatability and reproducibility of scientific workflows. As mentioned earlier, most of them follow one of two directions: (1) packaging the components of a workflow, known as physical preservation/conservation, or (2) describing a workflow and all its components, called logical preservation.
To implement packaging of workflows, Chirigati et al. proposed ReproZip. It tracks system calls during the execution of a workflow to capture the dependencies, data and configuration used at runtime, and to package them all together. The package can then be used to re-execute the archived workflow invocation.

Other researchers have used virtualization mechanisms to package workflows, specifically the ability to save the state of a virtual machine as an image (VMI). The main advantage in using VMIs is that they allow the complete experimental workflow and environment to be easily captured and shared with other scientists. However, the resulting images are large in size and costly to distribute publicly. And although packaging mechanisms allow workflows to be re-executed (i.e. they enable repeatability), they usually do not convey a detailed and structured description of the entire computation, relevant dependencies and execution environments, which would help in understanding the package contents. Therefore, their ability to reproduce or even reuse a packaged workflow in other contexts (e.g. using different input data, parameters or execution environments) is often limited.
The logical preservation techniques focus on capturing all the details required to repeat and potentially reproduce scientific workflows. A notable example is myExperiment, which offers a web interface to support social sharing of workflows with computational description and visualizations of their components. myExperiment, as a general repository for workflows, contributes to the improvement of workflow reproducibility.

Santana-Perez et al. proposed a semantic-based approach to preserve workflows with their execution environment. They use a set of semantic vocabularies to specify the resources involved in the execution of a workflow. However, other studies have shown that sharing only the specifications of a workflow is not enough to ensure successful reproducibility.

Another technique of logical preservation is capturing the provenance information of the workflow results. Retrospective provenance encapsulates the exact trace of a past workflow execution, which can then help in its re-execution. Nevertheless, provenance usually describes only the abstract layer of a workflow, because detailed traces of the use of the execution environment (e.g. at the OS level) quickly become unmanageable.
More recently, Hasham et al. presented a framework that captures information about the Cloud infrastructure of a workflow execution and interlinks it with data provenance of the workflow. They propose workflow reproducibility by re-provisioning similar execution infrastructure using the Cloud provenance and then re-executing the workflow. Although the approach enables re-execution, it is unable to track and address changes to the original workflow.

Belhajjame et al. proposed Research Objects as a preservation approach for scientific workflows. Research Objects can aggregate various types of data to enhance workflow reproducibility, such as workflow specifications, descriptions of workflow components and provenance traces. However, they do not include enough technical details about dependencies and the workflow execution environment to easily allow re-execution.

The specification-based mechanisms provide various details that can help in understanding the workflow and its components. Yet, they are still insufficient when some of the required dependencies change or become unavailable, in which case the ability to reconstruct the same execution environment is lost. Therefore, the integration of workflow specification and description of its components alongside a portable packaging mechanism that facilitates sharing becomes fundamental.
B. Topology and Orchestration Specification for Cloud Applications (TOSCA)
TOSCA is an OASIS specification for modeling a complete application stack, and automating its deployment and management in the Cloud. The main intent of TOSCA is to improve the portability of Cloud applications in the face of the growing diversity of Cloud environments.

The specification defines a meta-model for describing both the structure and management of IT applications. The structure of an application, its components and the relationships between them are represented by a Topology Template. The components and their relationships are defined as Node and Relationship Templates, instantiated from predefined Node and Relationship Types. The types are reusable entities that can be used to construct new Topology Templates for different applications.

In our previous work we proposed the use of TOSCA to describe the entire structure of a scientific workflow, together with all its components and the specification of a host environment. By adopting TOSCA, we can turn workflows into reusable entities that include not only the description of a scientific experiment but also all the details needed to deploy and execute it automatically. Therefore, we use TOSCA as the basis for the framework presented in this paper.
C. Reproducibility using Lightweight Virtualization
Container-based virtualization is a lightweight alternative to Virtual Machines. It is not new, but Docker, one of the recently developed tools for Linux systems, established a strong and open ecosystem that several Cloud providers support and promote in their offers. Importantly, Docker containers are portable and can run on different hosts, which makes them a suitable packaging tool to support the reproducibility of workflows.

Similarly to a VMI, a Docker image is a file that includes an Operating System together with a set of relevant libraries, software packages, configuration and data. It can later be used to create a container, a running instance of the system/application. That makes containers equally suitable to encapsulate and then re-execute scientific workflows. But the main attraction in using containers, when compared to Virtual Machines, is that images are smaller in size and starting a container is a few orders of magnitude faster than starting a VM. Therefore, in our work we integrated Docker images and containers in the deployment and reproducibility process.

Similarly to Virtual Machine hypervisors, Docker allows workflow applications along with all necessary dependencies to be encapsulated into a container image. But even if these approaches can offer a convenient mechanism to preserve workflows, they still lack a structured description of the aggregate. In addition, they are limited to packaged resources and dependencies, and lack flexibility to change the components or dependencies in an already packaged workflow.
III. IMPROVING WORKFLOW REPRODUCIBILITY
Clearly, the complete reproducibility of a workflow is hard to achieve due to possible changes at various levels of the software and hardware platforms used to run it. We can, however, significantly increase the degree of reproducibility by addressing the challenges discussed earlier. And our goal when designing our workflow reproducibility framework was to find ways in which we can effectively respond to these challenges.

Fig. 1: The architecture of our workflow reproducibility framework: the Core Repository (GitHub), the Workflow Deployment & Enactment Engine (TOSCA Runtime Environment: Cloudify) and the Target Execution Environment (Docker over Local VM, AWS, Azure, GCE, ...).
A. The Framework Architecture
The proposed workflow reproducibility framework consists of four main components: the Core repository, a set of Workflow and Task repositories, the Image repository supported by the Automatic Image Creation (AIC) facility, and the workflow enactment engine (Fig. 1). The Core repository includes a set of common and reusable TOSCA elements such as Node- and RelationshipTypes, and life cycle management scripts. They are a foundation for building tasks and workflows. The Workflow and Task repositories are used to store workflows and their components so they can be accessed during enactment and also shared and reused in designing new workflows. The Image repository contains workflow and task images that are used to improve reproducibility and also the performance of workflow enactment. Images are automatically captured by the AIC. Finally, the workflow enactment engine is implemented by a TOSCA-compliant runtime environment.

To implement logical preservation we rely on the TOSCA specification, which we previously adopted as a method to model portable workflows. With TOSCA we can describe workflows not only at the abstract level but together with the complete software stack required to deploy and enact them. And it is portable because we can use a TOSCA-compliant runtime environment to automatically deploy and enact our workflows on different Cloud platforms or in a local VM.

To control changes that can affect a workflow and its components we use a version control platform. It gives us the ability to track the complete history of developmental changes of workflows and tasks. The version control platform also supports the Automatic Image Creation facility. The AIC uses Docker to implement physical preservation of workflows and greatly helps in building and managing image libraries.

Moreover, instead of building yet another workflow repository and yet another workflow engine, we define our framework on top of open platforms like GitHub and DockerHub. The former allows workflow and task source code to be stored and maintained under version control; the latter can store workflows and tasks packaged as Docker images. Importantly, both platforms offer mechanisms which promote sharing and reuse.
B. The Framework in Use
To create a workflow the user needs to implement and model its essential components, including NodeTypes and task code. The NodeTypes are used to declare tasks and dependency libraries; they also refer to the task code, the actual software artifacts which will be deployed and executed. Currently, to facilitate building new tasks and workflows we implement a set of basic NodeTypes and tasks which others can reuse. Additionally, our Core repository provides RelationshipTypes and life cycle management scripts that are common to all workflows. They define and implement basic workflow functionality like passing data between tasks, configuration of library dependencies, etc. Given all these components, the workflow can be encoded as a TOSCA ServiceTemplate. The template includes Node- and RelationshipTemplates that are instances of the types developed earlier; these templates represent tasks and task links, respectively.
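As an illustration, the information such a ServiceTemplate carries can be sketched as a plain data structure. The type names, repository URLs and field names below are illustrative only, not the exact TOSCA or Cloudify syntax:

```python
# A sketch of the information a workflow ServiceTemplate carries.
# All names and URLs are hypothetical, not the framework's actual ones.
service_template = {
    "node_templates": {
        # tasks declared as instances of NodeTypes, referring to task code
        "clean": {"type": "SequenceCleanTask",
                  "code": "https://github.com/example/sc-task/tree/v1"},
        "join":  {"type": "NeighborJoinTask",
                  "code": "https://github.com/example/nj-task/tree/v1"},
    },
    # RelationshipTemplates: task links forming the workflow topology
    "relationship_templates": [
        {"source": "clean", "target": "join", "type": "feeds"},
    ],
}
```

The node templates carry the references to task code repositories, while the relationship templates encode the topology from which an execution plan can be inferred.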
Once the workflow ServiceTemplate has been prepared, it can be deployed by a TOSCA-compliant runtime environment. Currently, we support Cloudify, but there are other options available, such as OpenTOSCA and Alien4Cloud. The enactment of workflows follows the structure embedded in the TopologyTemplate, a part of the ServiceTemplate that in a declarative way combines components and dependencies. Using the TopologyTemplate, the runtime environment is able to infer the appropriate workflow execution plan.
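The inference of a linear plan from a declarative topology can be sketched as a topological sort over the task dependency graph. This is a simplified illustration; the actual planning logic inside Cloudify differs:

```python
def execution_plan(topology):
    """Derive a linear execution plan from a declarative topology.

    `topology` maps each task to the tasks whose outputs it consumes,
    mirroring the task links of a TopologyTemplate.
    """
    plan, resolved = [], set()
    pending = dict(topology)
    while pending:
        # tasks whose dependencies have all been scheduled already
        ready = sorted(t for t, deps in pending.items() if set(deps) <= resolved)
        if not ready:
            raise ValueError("cycle detected: the workflow is not a DAG")
        for task in ready:
            plan.append(task)
            resolved.add(task)
            del pending[task]
    return plan

# For a small diamond-shaped workflow:
plan = execution_plan({"load": [], "clean": ["load"],
                       "stats": ["load"], "zip": ["clean", "stats"]})
# -> ["load", "clean", "stats", "zip"]
```

Because scientific workflows are usually directed acyclic graphs, such a sort always succeeds and yields an order in which tasks can be deployed and run one at a time.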
IV. WORKFLOW AND TASK REPOSITORIES
Since we have been using publicly available platforms like GitHub to maintain the Workflow and Task repositories, these repositories can remain under users' control. We provide our own repositories with a set of basic reusable workflow tasks and example workflows mainly to illustrate how the framework can support reproducibility. But primarily, the ecosystem of workflows and tasks will be grown by researchers and scientists who want to develop their own workflow applications.

The choice of source version control platforms, such as GitHub, to host repositories of workflows and tasks was not accidental. These platforms offer great tools to support sharing and communication. But more importantly, they allow code developers and users to keep track of the developmental changes, and that can directly help to improve repeatability.

Our approach works on the principle that each single workflow and workflow task is maintained in a separate code repository. That brings multiple benefits: repositories mark clear boundaries between components, offer independent version control, allow for easy referencing and sharing, and additionally provide branches and tags to implement strict control of workflow and task interfaces. With multiple repositories it is also easy to encapsulate auxiliary information, such as sample data and a human readable description specific to each workflow and task, which helps to maintain long-term reproducibility.
A. Repository Structure
A repository aggregates various artifacts with information and resources related to the workflow or task. These artifacts include: TOSCA-based descriptors, workflow/task-specific life cycle scripts, sample data, a human readable description and the one-click deployment script. The key and mandatory artifact is a TOSCA-based descriptor. In the case of a workflow, it is a ServiceTemplate descriptor that encodes the structure of a workflow and references all the workflow components and life cycle scripts required for enactment. In the case of a task, the descriptor includes a TOSCA NodeType that defines the task interface and refers to the actual task implementation code.

Other artifacts, although optional, are helpful to maintain reproducibility. For example, provided with sample data and the one-click deployment script, users can easily test a workflow or task in their environment. The script starts a multi-step process which deploys the workflow together with basic dependencies such as Docker and Cloudify and then enacts it. Moreover, given a human readable description stored in a repository, users can better understand the purpose of the component and more easily use it. That also helps to recover from failures in the face of changes in the workflow or any of its components.

The structures of a workflow and task repository are very similar to each other. This is because our tasks also include a simple test workflow descriptor and sample data which allow users to easily run a task and test whether it actually meets their needs.
Usually, our repositories include two workflow descriptors that define the single- and multi-container configuration. The single-container workflows are executed within one Docker container, whereas in the multi-container configuration each task runs in its own container. The use of the single- or multi-container configuration also has an impact on the kind of images that will be generated by our Automatic Image Capture facility. We discuss this aspect later in Section VI.

These two default configurations describe, however, only two extremes out of the range of possible workflow deployments. For more specific, advanced scenarios developers can create workflows that include containers which group together a subset of tasks, for example due to security reasons.
B. Interface Control via Branches and Tags
One of the major sources of workflow decay is changes in the components of which a workflow is comprised. In a living system changes are inevitable because the components (tasks, libraries and other dependency workflows) undergo continuous development. Yet to maintain reproducibility we cannot forbid changes altogether. Instead, we need to control them, so they do not contribute to the decay.

The changes that occur naturally during workflow and task development can affect two layers: the interface and/or implementation of a component. By the workflow/task interface we mean the contract between the developer and user of a component. Specifically, it is the number and type of input data and properties that the workflow/task uses in processing, but also the number and type of output data it produces. Changes in the interface usually indicate some important modification to a component and need to be followed by changes in its implementation. Conversely, changes to the implementation only, if made carefully, are often merely improvements in the code which can remain unnoticed.
Since in our framework each component is maintained in a separate repository, we can control these two types of changes effectively. We use repository branches to denote changes in the interface, and tags to indicate significant improvements in the implementation. Minor implementation changes are simple commit events in the repository which do not need any special attention. All that, supported by an effective way to reference a specific branch or tag offered by GitHub, is enough to address the problem of changing components.

However, these mechanisms are not only important for our framework to maintain reproducibility of existing workflows but they are also crucial for users in creating new workflows. With repository branches users can easily see different flavours of a specific task or workflow and decide which one to use. On the other hand, tags help users to see major improvements of a component or workflow. Tags also indicate to our framework when there is a need to create a new component image.
To illustrate the use of branching and tagging in practice we show later, in the Evaluation section, a development scenario of one of our test workflows.
V. AUTOMATED WORKFLOW DEPLOYMENT AND ENACTMENT
The model of describing workflows using TOSCA proposed in our previous work is important because it not only supports logical preservation but also offers the ability to automatically deploy and enact our workflows. That facilitates repeatability and improves workflow reproducibility.

Currently, as a workflow engine we use Cloudify, a TOSCA-compliant runtime environment. To run a workflow, users need to clone its repository to a target machine on which they are going to run it. The repository includes sample data and the one-click deployment script. It is a simple script able to install the software stack required to run the workflow (Cloudify, Docker and some auxiliary tools) and then to submit the workflow to Cloudify with the default configuration.
The default configuration and sample data allow users to easily test the workflow. It is also a means to repeat the execution as well as a starting point to reproduce it. To repeat a workflow users can simply switch to a very specific version of the workflow in the repository history and run the one-click deployment script. Then, they can modify the default configuration and provide their own data. They can also switch to the latest version of the workflow to validate the output or compare it with output generated by previous versions.

Fig. 2: Steps in automatic workflow deployment (workflow deployment initialization, task image creation, workflow deployment finalization).
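The repeat path described above can be sketched as a short sequence of shell steps, built here as command lists. The repository URL and the `deploy.sh` script name are hypothetical; the framework's actual one-click script may be named differently:

```python
def repeat_commands(repo_url, version):
    """Build the commands needed to repeat a workflow at a pinned
    point in its history. `version` may be a branch, tag or commit."""
    name = repo_url.rstrip("/").split("/")[-1]
    if name.endswith(".git"):
        name = name[:-4]
    return [
        ["git", "clone", repo_url],                # fetch the workflow repository
        ["git", "-C", name, "checkout", version],  # pin the exact version
        ["bash", f"{name}/deploy.sh"],             # hypothetical one-click script
    ]

cmds = repeat_commands("https://github.com/example/nj-workflow.git", "v1.2")
```

Pinning the checkout to a branch, tag or commit is what makes the subsequent deployment repeat a specific point in the workflow's development history.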
The TOSCA descriptor of a workflow is a declarative specification that includes all tasks, dependency libraries and task links embedded in the workflow ServiceTemplate. The template also declares a dependency on the task execution environment, which may be composed of one or more Docker containers and VMs. Apart from the declaration of tasks and libraries, the workflow ServiceTemplate also encodes the topology of the workflow. For scientific workflows, usually implemented as directed acyclic graphs, this is enough information for a linear workflow execution plan to be automatically inferred (Fig. 2). Cloudify follows the generated plan, and deploys and runs one task at a time.
Crucial to workflow enactment are life cycle management scripts. They implement deployment operations that each workflow and task needs to go through, such as: initialization of a shared space used to exchange data between tasks, provisioning of the host environment (a container), and installation and configuration of library dependencies. As the majority of tasks follow a very similar pattern of deployment, we developed a set of common, reusable life cycle scripts and included them in the Core repository. Developers can refer to these scripts when building their own workflows and tasks.
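The common life cycle pattern can be sketched as a small driver that records which operations run for each task. The operation names, the shared-space path and the task fields are illustrative, not the framework's actual script names:

```python
def deploy_task(task, state):
    """Run the common life cycle operations for one task (a sketch).

    `state` is shared across the whole workflow deployment, so the
    shared data-exchange space is initialized only once.
    """
    steps = []
    if "shared_space" not in state:               # once per workflow
        state["shared_space"] = "/data/exchange"  # illustrative path
        steps.append("init-shared-space")
    steps.append(f"provision-container:{task['image']}")
    for lib in task.get("libraries", []):         # install dependencies
        steps.append(f"install:{lib}")
    steps.append(f"run:{task['name']}")
    return steps

state = {}
first = deploy_task({"name": "clean", "image": "ubuntu:14.04",
                     "libraries": ["biopython"]}, state)
second = deploy_task({"name": "join", "image": "ubuntu:14.04"}, state)
```

Because most tasks follow this same pattern, the scripts implementing it can live in the Core repository and be reused across workflows.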
VI. AUTOMATIC IMAGE CAPTURE
TOSCA-based descriptors are the fundamental element of our framework, partly because they are used to implement logical preservation, and partly because they allow workflows to be automatically deployed and enacted. But running workflows based only on these descriptors would incur significant runtime overheads. The framework would repeat the same, sometimes long running, steps to deploy a task every time it runs.

Fig. 3: Steps in automatic workflow deployment using the task images created by the AIC (workflow deployment initialization, task 1 deployment, task 2 deployment, workflow deployment finalization); cf. Fig. 2.
However, our framework is flexible enough to run the workflow and task deployment process using a variety of Docker images: starting from a pure OS image commonly available from DockerHub, to a specific user-defined image which includes some workflow/task dependencies, to a complete image that contains all of the required dependencies. If the image referred to in the workflow ServiceTemplate does not contain all the dependencies, they will be installed by the framework on demand during workflow enactment. That automation simplifies the development cycle because users are not forced to manually prepare and manage task or workflow images before they can use a workflow.
Yet, to simplify the use of the framework even more, we implemented the Automatic Image Capture facility. Using the Docker image manipulation operations, the AIC is able to create workflow and task images for the user automatically, so they can be deposited in a private or public Image Repository. The next time a task is executed, instead of the complete deployment cycle, the framework will use the images captured earlier (Fig. 3). As shown later in the Evaluation section, that simplification can have a very positive impact on runtime performance.
The workflows we implemented are usually described with two configuration options: single- and multi-container. That influences the way in which deployment and enactment of workflows is performed. But it also determines what image the AIC will create for the workflow. If the workflow uses the single-container configuration, the AIC will capture a single image that encapsulates the whole workflow with all its components. Conversely, if the workflow uses the multi-container configuration, many smaller task images will be created. Both options have their advantages: the former imposes less overhead in terms of storage and performance, whereas the latter promotes better reuse of task images and gives more flexibility if the workflow requires updates. Nonetheless, they support repeatability and reproducibility of workflows equally well.

Fig. 4: The structure of the Sequence Cleaning workflow in multi-container configuration described in TOSCA.
Yet, to realise that goal, images must be properly versioned. The AIC uses identifiers from the Image Repository and tags from the Workflow and Task Repositories to address this aspect. The workflow/task image identifier is generated based on the base Docker image identifier and the URL of a branch or tag of the workflow/task for which the image is built. That simple and unique mapping between code and image versions allows users to include only the code URL in their workflow ServiceTemplate, which is enough for the framework to fetch and use the correct image for a task or workflow. And in the case that the image does not yet exist, the workflow enactment will follow the full deployment cycle while the AIC will generate and deposit the relevant images for future use.
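The code-to-image mapping can be sketched as a deterministic function of the base image and the code URL. The hashing scheme and the `aic/` prefix below are illustrative; the paper does not specify the framework's actual identifier format:

```python
import hashlib

def image_identifier(base_image, code_url):
    """Map a base Docker image and a branch/tag URL to a unique,
    reproducible image tag. Same inputs always yield the same tag,
    so the framework can look the image up before rebuilding it."""
    digest = hashlib.sha256(f"{base_image}|{code_url}".encode()).hexdigest()
    return f"aic/{digest[:12]}"

tag_v1 = image_identifier("ubuntu:14.04",
                          "https://github.com/example/sc-task/tree/v1")
tag_v2 = image_identifier("ubuntu:14.04",
                          "https://github.com/example/sc-task/tree/v2")
```

Because the identifier changes whenever the branch or tag URL changes, a new component version automatically calls for a new image, matching the behaviour described above.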
VII. EVALUATION AND DISCUSSION
We describe the evaluation of our framework from three different angles. First, we present a set of experiments to show the portability of the workflow description, so it can be enacted in different environments. Second, we show the benefit of using the AIC to reduce a workflow's runtime. Finally, we describe a scenario of workflow and task development to illustrate how the framework can maintain reproducibility in the face of component changes.
A. Repeatability on Different Clouds
The goal of this set of experiments was to re-enact a workflow, initially designed in a local development environment, on three different Clouds and a local VM. We ran the experiment for four different workflows which were previously designed in e-Science Central. The workflows: Neighbor Joining (NJ), Sequence Cleaning (SC), Column Invert (CI) and File Zip (FZ), differ in terms of structure, the dependency libraries they require and the number of tasks they include (11, 8, 7 and 3 tasks, respectively). As an example, Fig. 4 depicts the structure of the Sequence Cleaning workflow used in an NGS pipeline and re-implemented using TOSCA.
TABLE I: Basic details about the execution environments.

  Environment      | vCPUs | RAM (GB) | Disk (GB) | OS
  Local VM         | 1     | 3        | 13        | Ubuntu 14.04
  Amazon EC2       | 1     | 1        | 8         | Ubuntu Srv 14.04
  Google Cloud     | 1     | 3.75     | 10        | Ubuntu Srv 14.04
  Microsoft Azure  | 1     | 3.5      | 7         | Ubuntu Srv 14.04
TABLE II: The average execution time (in minutes) for different workflows executed in different environments.

  Environment  | Neighbour Joining | Column Invert   | File Zip
               | Single | Multi    | Single | Multi  | Single | Multi
  Devel. Env.  | 2.13   | 2.54     | 0.9    | 1.3    | 0.6    | 0.94
  Amazon       | 1.74   | 2.27     | 0.66   | 1.18   | 0.5    | 0.84
  Azure        | 2.52   | 3.86     | 1.35   | 2.1    | 1.23   | 1.38
  Google       | 1.52   | 2.48     | 0.74   | 1.18   | 0.5    | 1.01
  Local VM     | 1.65   | 2.5      | 1.03   | 1.37   | 0.53   | 1.03
To illustrate the potential of our framework in supporting repeatability and reproducibility, and the value of the proposed workflow representation, each of the selected workflows was first developed, and then deployed and enacted in a local development environment. We recorded the execution time of that initial enactment, which also automatically created the workflow or task Docker images.

To conduct the rest of the experiment, we cloned the workflow repositories in four different environments: a local VM, and the Amazon AWS, Google Cloud and Microsoft Azure Clouds. Finally, we re-executed each workflow five times in each VM and collected the results. The configuration of the VMs is presented in Table I.
Each workflow was used in its two available configurations, single- and multi-container, to show the overheads of running multiple task containers. The output data of the workflows were the same in all executions and the average execution times were similar. Fig. 5 shows a chart with the results for the SC workflow, whereas Table II includes the results for the other tested workflows.

The experimental results show that our scientific workflows can be re-enacted, producing the same outputs in similar runtime. They also illustrate a common development pattern in which developers build and test a workflow in their local environment and, once it is ready, share it with others via the Workflow, Task and Image Repositories. Both the TOSCA representation and Docker packaging offer significant support for this pattern.
B. Automatic Image Capture for Improved Performance

As mentioned earlier, our framework is flexible enough to allow tasks and workflows to use pure OS images available from DockerHub, or custom, predefined task/workflow images created by users or the AIC. By using a predefined image we can avoid the installation of the dependency libraries and task artifacts required during workflow execution. And, as shown previously in Fig. 3, that can reduce the number of deployment steps required in workflow enactment.

Fig. 5: The average execution time for the Sequence Cleaning workflow executed in different environments.

Fig. 6: The average execution time of test workflows using different task images.
The elimination of some of the deployment tasks can have a very positive impact on the runtime of workflows. To show this, we prepared a set of experiments in which we ran our workflows using different images: the base image available on DockerHub, the base image with pre-installed dependency libraries, and the task images captured by the AIC. Fig. 6 depicts the average workflow execution time for the four tested workflows.

Clearly, there was a significant overhead in using the base image from DockerHub. The main reason was the time required to install dependency libraries such as the Java Runtime Environment or, in the case of the NJ workflow, the Wine environment.
The second and third options show small differences, with slightly shorter execution times for the experiments which used images created by the AIC. That is because the AIC captures everything the task needs to run (according to the task's TOSCA descriptor), whereas the second option included only the dependency libraries, while the task artifacts were downloaded and installed on demand.
The results clearly show that, from the performance perspective, the use of pre-packaged images is the most effective option. However, from the user perspective the quickest and easiest option is to use the base images already available on DockerHub instead of building images manually. Our framework supports such flexibility at the cost of some overhead incurred by the initial execution of a workflow. The first run will involve the complete deployment cycle and the creation of the images, whereas any subsequent executions will benefit from those images and run at full speed.
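This trade-off amounts to a simple build-or-reuse decision at enactment time, which might be sketched as follows. The function name and the use of a plain dictionary as a stand-in for the Image Repository are our illustrative assumptions.

```python
def enact(task_ref: str, registry: dict) -> str:
    """Enact a task, building and depositing its image on first use.
    `registry` is a hypothetical stand-in for the Image Repository,
    mapping task references to captured images."""
    if task_ref not in registry:
        # First execution: run the full deployment cycle (install
        # dependencies and task artifacts), then let the AIC capture
        # and deposit the resulting image for future use.
        registry[task_ref] = f"image-built-from:{task_ref}"
        return "full-deployment"
    # Subsequent executions reuse the captured image and skip
    # the installation steps entirely.
    return "fast-start"

registry = {}
assert enact("Zip/v1", registry) == "full-deployment"  # first run pays the cost
assert enact("Zip/v1", registry) == "fast-start"       # later runs reuse the image
```

Only the first enactment of a given code version pays the deployment cost; every later run, in any environment that shares the Image Repository, starts from the captured image.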
C. Reproducibility in the Face of Development Changes

One of the key factors that can reduce the decay of our workflows is their ability to embrace the changes that occur naturally during workflow and task development. These changes mainly affect two layers: the input/output interface of a workflow or task, and their implementation.

In Fig. 7 we illustrate a hypothetical evolution scenario of the Sequence Cleaning workflow shown earlier in Fig. 4. The left side depicts the timeline of development events that occurred in the scenario. It is accompanied by change trees from two repositories: the left tree represents the evolution of the workflow, while the one on the right shows the evolution of one of the workflow tasks.
We start the analysis with the version of the SC workflow presented earlier and tagged as v1 in Fig. 7 (event 1). By tagging we acknowledge that this version has been published and advertised, and so may be used by others.

Now, let us imagine that a new requirement for our workflow appeared (event 2): users of the workflow want to save storage space by compressing the workflow output files. In response, the developers created a new Zip task (cf. the right version tree) and wanted to add it to the workflow. Note, however, that changing the type of outputs generated by the workflow is a change of its interface. For example, it would likely break any external application that has used the uncompressed outputs provided by version v1. Thus, before we can add the Zip task to the workflow we need to create a new branch, named zipped in the figure (event 3).

The zipped branch of the workflow refers to the Zip/master branch of the task. By default, such a reference means that the workflow depends on the latest tagged version of the task coming from that branch. This is convenient because, as the task implementation is improved over time, the zipped workflow will use the task's latest tagged version (including v1.1). In this way workflows are updated automatically, without the need to change them when only implementation improvements are made to the tasks. However, if strict workflow repeatability is required, the reference to the Zip task would include a specific tag. That would prevent the automatic update of such a workflow.
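The resolution rule just described could be sketched as: a reference that names a branch resolves to that branch's latest tag, whereas a reference that names a tag is fixed. The data layout and function name below are our illustrative choices, not the framework's actual API.

```python
def resolve(ref: str, repo: dict) -> str:
    """Resolve a task reference to a concrete tagged version.
    `repo` maps branch names to their chronologically ordered tags
    (a hypothetical stand-in for the Task Repository)."""
    if ref in repo:
        # Branch reference: follow the branch's latest tag, so the
        # workflow picks up implementation improvements automatically.
        return repo[ref][-1]
    # Explicit tag reference: pinned, for strict repeatability.
    return ref

zip_repo = {"master": ["v1", "v1.1"], "password": ["v2"]}
assert resolve("master", zip_repo) == "v1.1"  # auto-updates as tags appear
assert resolve("v1", zip_repo) == "v1"        # pinned version never changes
```

The design choice mirrors common dependency-management practice: floating branch references favour currency, pinned tags favour exact repeatability.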
Next, event (4) denotes a new release of the Java library used by some tasks in the workflow. In our hypothetical scenario the new version of the library has improved performance and many errors fixed. Thus, the event is a signal for us to update the workflow as soon as possible. This change is compatible with the previous version of the workflow, so we do not need to create a new branch. Instead, we merge the changes from master into the zipped branch, so that both branches can benefit from the updated library.

After adding the Zip task and updating the Java library, we also tag and announce new, improved versions of our workflow (event 5). Specifically, SampleCleaning/v2 runs faster and produces smaller outputs, which is of great value to the users.
Event (6) marks the arrival of yet another requirement: users want the outputs of the workflow to be encrypted to avoid leakage of patients' raw genomic data. That requires, however, some improvements to the Zip task, including changes to the underlying tool used to compress the data.

After running some tests, it appeared that the new zip tool has much better performance, and so we quickly decided to swap the old implementation for the new tool and tag the task v1.1. Note that this simple act of tagging a version causes an automatic update of all workflows that rely on that branch. Therefore, from now on the SampleCleaning.v1.1 and .v2 workflows will use the updated implementation of the Zip task.
Continuing with the task update, we create a new password branch in the task repository (event 7). This new branch is needed due to the changes in the task's input interface: the new version has the extra password input property. But the use of encryption is optional, so to limit the number of branches we decided to discontinue the previous version of the Zip task and tag branch master as deprecated (event 8). That indicates to users that they should use other branches of the task in their new workflows. Nonetheless, the old version will need to remain in the repository because others may still use workflow SampleCleaning.v2, which relies on the Zip/master branch.
As proactive workflow developers, we noticed that the master branch of the Zip block had been deprecated, and so we decided to update the reference in the zipped version of the workflow to the active password branch. Note that this update does not require a new branch because the use of encryption in the Zip task is optional. Thus, the workflow's input and output interface can remain the same (event 9).

A new branch is created later (event 10), when the password property is exposed to end users as a workflow input property. We want users to be able to set a custom password for the output data, and that requires a change in the workflow interface which, in turn, requires a new branch.
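The branching policy running through the whole scenario can be condensed into one rule, sketched below. Representing an interface as a dictionary of input and output names is our illustrative choice.

```python
def required_action(old_interface: dict, new_interface: dict) -> str:
    """Branching policy from the scenario: a change to the input/output
    interface of a workflow or task requires a new branch; an
    implementation-only change merely gets a new tag on the same branch."""
    return "new-branch" if old_interface != new_interface else "new-tag"

v1 = {"inputs": ["sequences"], "outputs": ["cleaned"]}
v2 = {"inputs": ["sequences", "password"], "outputs": ["cleaned"]}
assert required_action(v1, v1) == "new-tag"     # e.g. the Java library update
assert required_action(v1, v2) == "new-branch"  # e.g. exposing 'password'
```

The rule keeps every published interface stable within its branch, so external applications that depend on a given branch are never broken by routine implementation updates.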
The presented hypothetical evolution shows very common patterns in the development of workflows and their components: changes can occur at different layers of workflows and tasks. However, by means of separate task and workflow repositories, and conscious tagging and branching of their code, we can maintain all workflow versions in a working state and also ensure that their evolution does not break external applications that rely on them.
VIII. CONCLUSIONS AND FUTURE WORK
Reproducibility is a crucial requirement for scientific experiments, enabling them to be verified, shared and further developed. Therefore, workflow reproducibility should be an important requirement in e-Science. In this paper we presented a design and prototype implementation of a framework that supports repeatability and reproducibility of scientific workflows. It combines two well-known techniques: logical and physical preservation. To implement the logical preservation technique we use the TOSCA specification as a means to describe workflows in a standardised way. To realise physical preservation we use lightweight virtualisation, which allows us to package workflows, tasks and all their dependencies as Docker images.

Fig. 7: A hypothetical evolution of the Sequence Cleaning workflow.
Moreover, our framework uniquely combines software repositories to manage the versioning of source code, an automated workflow deployment tool that facilitates workflow enactment and reuse, and automatic image creation to improve performance. Together, they significantly increase the degree of workflow reproducibility. And although our framework does not currently capture retrospective provenance traces, which has been left for future work, the proposed TOSCA-based workflow descriptors may be considered a detailed prospective provenance document. They describe the high-level structure of the workflow, which might also be encoded using, for example, the ProvONE specification, together with all the details needed to recreate the complete software stack for deployment and enactment.
Still, however, a considerable part of reproducibility is in the hands of workflow developers: the scientists and researchers who will use our tools. Only with their help and dedication can workflows be adequately described, have sample input and configuration data to facilitate testing, and be properly versioned with branches and tags indicating major development events. Our framework merely makes these tasks easier.

(The latest draft of the ProvONE specification, from May 2016, is available online.)
As for the future, the presented work opens a variety
of interesting research avenues. We plan to add a facility
to capture retrospective provenance information for work-
ﬂows and tasks that could complement the history of their
development. We consider implementing support for large-
scale, distributed workﬂow enactment. Finally, we also plan
to investigate to what extent our framework can model legacy
workﬂows designed in other scientiﬁc workﬂow management
systems like Pegasus and Taverna.
IX. ACKNOWLEDGEMENT
This work was partially supported by EPSRC grant no.
EP/N01426X/1 in the UK.
REFERENCES

[1] E. Deelman, D. Gannon, M. Shields, and I. Taylor, "Workflows and e-Science: An overview of workflow system features and capabilities," Future Generation Computer Systems, vol. 25, no. 5, pp. 528–540, May 2009.
[2] B. Liu, B. Sotomayor, R. Madduri, K. Chard, and I. Foster, "Deploying Bioinformatics Workflows on Clouds with Galaxy and Globus Provision," 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, pp. 1087–1095, Nov. 2012.
[3] G. Juve, A. Chervenak, E. Deelman, S. Bharathi, G. Mehta, and K. Vahi, "Characterizing and profiling scientific workflows," Future Generation Computer Systems, vol. 29, no. 3, pp. 682–692, Mar. 2013.
[4] H. Meng, R. Kommineni, Q. Pham, R. Gardner, T. Malik, and D. Thain, "An invariant framework for conducting reproducible computational science," Journal of Computational Science, vol. 9, pp. 137–142, 2015.
[5] J. Zhao, J. M. Gomez-Perez, K. Belhajjame, G. Klyne, E. Garcia-Cuesta, A. Garrido, K. Hettne, M. Roos, D. De Roure, and C. Goble, "Why workflows break: Understanding and combating decay in Taverna workflows," in 2012 IEEE 8th International Conference on E-Science. IEEE, Oct. 2012, pp. 1–9.
[6] J. Goecks, A. Nekrutenko, and J. Taylor, "Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences," Genome Biology, vol. 11, no. 8, p. R86, 2010.
[7] A. Banati, P. Kacsuk, and M. Kozlovszky, "Four level provenance support to achieve portable reproducibility of scientific workflows," in 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). IEEE, May 2015, pp. 241–244.
[8] J. Freire, P. Bonnet, and D. Shasha, "Computational reproducibility: state-of-the-art, challenges, and database research opportunities," Proceedings of the 2012 ACM SIGMOD . . . , pp. 593–596, 2012.
[9] I. Santana-Perez, R. F. da Silva, M. Rynge, E. Deelman, M. S. Pérez-Hernández, and O. Corcho, "Reproducibility of execution environments in computational science using Semantics and Clouds," Future Generation Computer Systems, 2016.
[10] OASIS Standard, "Topology and Orchestration Specification for Cloud Applications Version 1.0," pp. 1–114, 2013.
[11] S. Arabas, M. R. Bareford, L. R. De Silva, I. P. Gent, B. M. Gorman, M. Hajiarabderkani, T. Henderson, L. Hutton, A. Konovalov, L. Kotthoff, C. Mccreesh, M. A. Nacenta, R. R. Paul, K. E. J. Petrie, A. Razaq, D. Reijsbergen, and K. Takeda, "Case Studies and Challenges in Reproducibility in the Computational Sciences," pp. 1–14, 2014.
[12] F. Chirigati, D. Shasha, and J. Freire, "ReproZip: Using Provenance to Support Computational Reproducibility," USENIX Workshop on the Theory and Practice of Provenance, 2013.
[13] V. Stodden, F. Leisch, and R. D. Peng, Implementing Reproducible Research. CRC Press, 2014.
[14] B. Howe, "Virtual Appliances, Cloud Computing, and Reproducible Research," Computing in Science & Engineering, vol. 14, no. 4, pp. 36–41, Jul. 2012.
[15] F. Jiang, C. Castillo, C. Schmitt, A. Mandal, P. Ruth, and I. Baldin, "Enabling workflow repeatability with virtualization support," Proceedings of the 10th Workshop on Workflows in Support of Large-Scale Science (WORKS '15), pp. 1–10, 2015.
[16] O. Spjuth, M. Dahlö, F. Haziza, A. Kallio, E. Korpelainen, and E. Bongcam-Rudloff, "BioImg.org: A Catalog of Virtual Machine Images for the Life Sciences," Bioinformatics and Biology Insights, p. 125, Sep. 2015.
[17] P. Bonnet, S. Manegold, M. Bjørling, W. Cao, J. Gonzalez, J. Granados, N. Hall, S. Idreos, M. Ivanova, R. Johnson, D. Koop, T. Kraska, R. Müller, D. Olteanu, P. Papotti, C. Reilly, D. Tsirogiannis, C. Yu, J. Freire, and D. Shasha, "Repeatability and workability evaluation of SIGMOD 2011," SIGMOD Rec., vol. 40, no. 2, pp. 45–48, Sep. 2011.
[18] C. A. Goble, J. Bhagat, S. Aleksejevs, D. Cruickshank, D. Michaelides, D. Newman, M. Borkum, S. Bechhofer, M. Roos, P. Li, and D. De Roure, "myExperiment: A repository and social network for the sharing of bioinformatics workflows," Nucleic Acids Research, vol. 38, pp. 677–682, 2010.
[19] K. Belhajjame, J. Zhao, D. Garijo, M. Gamble, K. Hettne, R. Palma, E. Mina, O. Corcho, J. M. Gómez-Pérez, S. Bechhofer, G. Klyne, and C. Goble, "Using a suite of ontologies for preserving workflow-centric research objects," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 32, pp. 16–42, 2015.
[20] P. Missier, S. Woodman, H. Hiden, and P. Watson, "Provenance and data differencing for workflow reproducibility analysis," Concurrency and Computation: Practice and Experience, 2013.
[21] D. McGuinness, J. Michaelis, L. Moreau, O. Hartig, and J. Zhao, "Provenance and Annotation of Data and Processes," IPAW, vol. 5272, pp. 78–90, 2008.
[22] K. Hasham, K. Munir, R. McClatchey, and J. Shamdasani, "Re-provisioning of Cloud-Based Execution Infrastructure Using the Cloud-Aware Provenance to Facilitate Scientific Workflow Execution Reproducibility," in Cloud Computing and Services Science, 2016, pp. 74–94.
[23] T. Binz, G. Breiter, F. Leymann, and T. Spatzier, "Portable Cloud Services Using TOSCA," pp. 80–84, 2012.
[24] T. Binz, U. Breitenbücher, O. Kopp, and F. Leymann, TOSCA: Portable Automated Deployment and Management of Cloud Applications, A. Bouguettaya, Q. Z. Sheng, and F. Daniel, Eds. New York, NY: Springer New York, 2014.
[25] R. Qasha, J. Cala, and P. Watson, "Towards Automated Workflow Deployment in the Cloud Using TOSCA," in 2015 IEEE 8th International Conference on Cloud Computing. IEEE, Jun. 2015, pp. 1037–1040.
[26] D. Merkel, "Docker: lightweight Linux containers for consistent development and deployment," p. 2, 2014.
[27] R. Chamberlain and J. Schommer, "Using Docker to support Reproducible Research (submission to WSSSPE2)," pp. 1–4, 2014.
[28] C. Boettiger, "An introduction to Docker for reproducible research," ACM SIGOPS Operating Systems Review, vol. 49, no. 1, pp. 71–79, 2015.
[29] J. Cała, E. Marei, Y. Xu, K. Takeda, and P. Missier, "Scalable and efficient whole-exome data processing using workflows on the cloud," Future Generation Computer Systems, Jan. 2016.