Streamlining the Inclusion of Computer Experiments in a Research Paper
Sylvain Hallé, Raphaël Khoury, Mewena Awesso
Université du Québec à Chicoutimi, Canada
Designing clean, reusable and repeatable experiments for a research paper doesn't have to be hard. We report on the efforts we invested in creating an integrated toolchain for running, processing and including the results of computer experiments in scientific publications, saving us time along the way.
For authors of scientific research involving computer experiments, writing a new paper often feels like reinventing the wheel. We spend considerable amounts of time at the command line, writing shell scripts that run programs, shuffle around temporary data files, and crunch these files in various ways. For first-time authors, mastering the syntax of GnuPlot, learning to use Unix pipes and to parse CSV files is almost a rite of passage. Such skills are so ingrained in the computer culture that they have recently been regrouped into a corpus of basic know-how called software carpentry (https://software-carpentry.org).
Alas, not much of these scripts and temporary files survives until the next paper. More often than not, their content is so specific to our current experiments and data-crunching tasks that hardly anything is worth reusing. As one study has put it, we paper authors are very proficient at cooking up "hack-together, use-once, throw-away scripts" []. As far as carpentry is concerned, it is as if we had to reinvent hammers, nails and screwdrivers every time we wanted to build a house.
An important casualty of this state of things is reproducibility, or the capability for a group of people to confirm experimental claims in a more or less independent way (see sidebar). Through our poorly documented, throw-away scripts, we give anybody else a hard time trying to simply re-execute what we did; as a matter of fact, even the authors themselves may have trouble re-running their experiments after six months. Our experimental setup may be so messy that we don't even bother making it available to others. In a recent paper, Collberg and Proebsting reveal that, out of a sample of research papers that describe experiments, a substantial fraction had results that could not be reproduced, mostly due to code and data not being made publicly available [4]. This has led to what some in the field of computing research have called a credibility crisis []: one of the cornerstones of science is being compromised by our careless attitude towards empirical work. Recent efforts such as paper badges, open repositories, submission guidelines and artifact evaluation committees are all attempts to reverse this trend []. Unfortunately, complying with these guidelines and principles, from the author's point of view, often amounts to additional work, and is often met with resistance.
Not so long ago, this was what we were thinking too. That was before we decided to invest time and energy to develop generic and, most importantly, reusable tools to help us run, process and include experimental results in our research papers. The result is LabPal (https://liflab.github.io/labpal), an open source software library that streamlines many menial data-crunching tasks that we used to hand-code in single-use scripts.
Sidebar: Repeatability, Replicability, Reproducibility

Definitions of reproducibility, repeatability and other terms abound in the literature and vary subtly; here we follow the basic terminology proposed by the ACM (https://www.acm.org/publications/policies/artifact-review-badging). Repeatability means that a researcher can reliably repeat her own computation. Replicability means that an independent group can obtain the same result using the author's own artifacts. Finally, reproducibility means that an independent group can obtain the same result using artifacts which they develop completely independently.

Pushing data processing and manipulation functions outside of custom scripts, and into shared and reusable libraries, reduces the code footprint of what is considered the author's "own" artifacts, and hence the amount of code that must be re-implemented by an independent third party for a result to be considered "reproduced".
In the following, we show through a simple, step-by-step example how LabPal can be used to execute experiments and easily transfer their results into a scientific publication. Most importantly, we hope to convince potential authors that designing experiments for reproducibility, when using the right tools, does not necessarily represent more work, and can even provide tangible benefits for them.
A BASIC WORKFLOW
LabPal is a software library whose primary goal is to make the life of authors easier, by helping them run, process and include experimental results in a research paper. Contrary to many other tools and platforms (some of which will be discussed at the end of this paper), it is not focused on helping third parties reproduce and explore results. Rather, reproducibility comes as a by-product of our design goals: if running experiments and examining their results is easy and intuitive for paper authors, it will be equally easy for anyone else to do the same thing.
As a running example, suppose we are writing a paper that compares sorting algorithms. We would like our paper to include experimental results of running multiple existing algorithms on various random arrays, and use LabPal to help us run these experiments. We shall start with an empty template project, such as the one that is freely available online (https://github.com/liflab/labpal-project). The complete version of the lab used as an example in this paper can be found in the Examples folder of LabPal itself: https://github.com/liflab/labpal/tree/master/Source/Examples.
Structure of a lab
First, we have to understand the structure of LabPal, summarized in Figure 1. Its basic building block is an experiment, which is an object that can take input parameters, can be run, and produces one or more output values. Typically, many experiment copies (or "instances") are created, each with its own set of input parameters. There can even exist more than one type of experiment, and each type may expect different input parameters, perform different things when run, and produce different kinds of output parameters.

Figure 1: The basic structure of a laboratory in LabPal.

The results of experiments can be collected into tables, which are data structures that are populated by fetching the values of specific parameters in a pool of experiments. Many tables can be created, each being associated to different sets of experiments or fetching different parameters. Tables can be transformed into other tables using operations called transformations. For example, a table can be transposed, two tables can be joined on a common parameter name (similar to an SQL join), or the sum, average, or quartiles of a table's columns can be calculated.

Finally, tables can be given as input to plots, which are graphical representations of a table's content. By default, LabPal provides interchangeable connectors to two widely used plotting systems, GnuPlot (https://gnuplot.info) and GRAL (https://trac.erichseifert.de/gral). A laboratory (or "lab" for short) is an organized and documented collection of experiments, tables and plots.
In line with what other research groups have already revealed, we agree that "we can only achieve the necessary level of reliability and transparency by automating every step" []. To this end, the execution of a lab can be done interactively, but it can also run in a completely autonomous way. Perhaps more importantly, we believe in encapsulating important components into reusable units. Therefore, while the concepts of experiments and tables are implicitly present in virtually any experimental setup in some form or another, LabPal allows a user to manipulate them through objects of a high level of abstraction, resulting in a relatively small amount of code. Since LabPal is a Java library, all these concepts correspond concretely to Java classes and objects. One can also do the same thing in Python and Scala.
Creating experiments
To set up our sorting lab, we first need to create an experiment. In LabPal, experiments are all descendants of the Experiment class. In our example, an experiment takes a single input parameter, which is the size of the array we
wish to sort. Setting an input parameter is done by calling the method setInput, which associates a particular value to a parameter name. When run, the experiment will generate an array of the given size and then sort it. This is done in the method execute, which all experiments must implement. Finally, our experiment produces a single output value, which corresponds to the time it takes to sort that particular array. Writing an output value is done by calling the method write, which associates a particular value to a name.
Therefore, a sensible way to create our experiment
would be to write this:
class GnomeSort extends Experiment {
  transient int[] array;

  public GnomeSort(int[] array) {
    this.array = array;
    setInput("Size", array.length);
    setInput("Algorithm", "Gnome Sort");
  }

  public void execute() {
    long start = System.currentTimeMillis();
    // Sort the array using Gnome sort
    int pos = 0;
    while (pos < array.length) {
      if (pos == 0 || array[pos] >= array[pos - 1]) {
        pos++;
      } else {
        swap(array, pos, pos - 1);
        pos--;
      }
    }
    // Report the sorting time as an output parameter
    write("Time", System.currentTimeMillis() - start);
  }

  private static void swap(int[] a, int i, int j) {
    int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
  }
}
The constructor receives an array to sort, and sets its length as an input parameter of the experiment under the name "Size". When called, the method execute sorts this array using some algorithm (here Gnome sort). This last bit of code is surrounded by two calls that fetch the current system time. The duration of the sort operation is written as an output value under the name "Time".
We are now ready to create a laboratory ("lab" for short), which will be the environment in which these experiments will be run. In LabPal, a lab is a descendant of the Laboratory class. The template project already contains an empty laboratory called SortingLab. Experiments can be created in a method called setup, and are added to the lab by a call to the method add. Our lab could hence look like this:
class SortingLab extends Laboratory {
  public void setup() {
    for (int n : new int[]{10, 100, 1000}) {
      int[] array = generateRandomArray(n);
      add(new GnomeSort(array));
      add(new QuickSort(array));
      ...
    }
  }

  public static void main(String[] args) {
    initialize(args, SortingLab.class);
  }
}
This lab creates instances of the GnomeSort experiment with different array sizes, and adds them to the lab (it also adds instances of experiments that use other sorting algorithms on the same arrays, which we do not show here; a hypothetical sketch of one such experiment is given below). The main method is only there so that our lab can be executed from the command line.
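As an illustration of what one of these additional experiments might look like, here is a hypothetical QuickSort experiment that mirrors the structure of GnomeSort; it is a sketch on our part and not code taken from the template project:

class QuickSort extends Experiment {
  transient int[] array;

  public QuickSort(int[] array) {
    this.array = array;
    setInput("Size", array.length);
    setInput("Algorithm", "Quick Sort");
  }

  public void execute() {
    long start = System.currentTimeMillis();
    sort(array, 0, array.length - 1);
    write("Time", System.currentTimeMillis() - start);
  }

  // Textbook in-place quicksort (Lomuto partition, last element as pivot)
  private static void sort(int[] a, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; j++) {
      if (a[j] < pivot) {
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        i++;
      }
    }
    int tmp = a[i]; a[i] = a[hi]; a[hi] = tmp;
    sort(a, lo, i - 1);
    sort(a, i + 1, hi);
  }
}

Any such class only needs to respect the same contract as GnomeSort: declare its input parameters in the constructor, and write its output parameters at the end of execute.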
Our template project contains an Ant build script that compiles and bundles everything into a JAR file called (by default) my-lab.jar. This JAR is runnable and stand-alone: we can move it around without needing to worry about installed libraries and other dependencies. We can then start the lab by simply running the JAR file (for example, with java -jar my-lab.jar).
Using the web console
By default, running a lab starts a local web server, which can be accessed from a browser. The default address is http://localhost:21212, but this can be changed by a command-line setting. Opening this URL in a browser leads us to the home page of the lab. We can set up this page to give a text that provides a few details about what the lab is about, so that someone who retrieves our lab file and runs it has an idea of what it does. That same person can also click on the Help button at the top of the page to get more information about how the console works and how to run the experiments that the lab contains.
Clicking on the Experiments button at the top of the page brings us to the list of experiments, as shown in Figure 2a. For each, we see its unique number, as well as the input parameters that determine what the experiment does. In our case, our experiments are meant to compare sorting algorithms, so each of them sorts an array of a given size using a specific algorithm. For each experiment, we can also see a status icon that tells whether the experiment is queued, is currently running, etc.

Figure 2: Screenshots of LabPal's web interface. (a) The list of experiments included in a lab; the running status of each of them is shown as an icon. (b) Each experiment can show its own input/output parameters, as well as metadata automatically recorded by the lab assistant. (c) The auto-generated plots provide various buttons for exporting them to many formats.
We are now ready to run the experiments. This is done by sending them to what we call a lab assistant. The assistant is a process that merely runs, in a linear fashion, all the experiments that are sent to its queue. (Alternatively, we can choose a non-linear assistant that runs experiments using multiple threads. Obviously, this assistant is appropriate only for situations where the experiment results are not sensitive to the use of multi-threading.) To send experiments to the queue, one simply checks some of the experiments in the list, and sends them to the assistant with the appropriate button. The assistant has its own page, where the contents of its queue can be shown and modified. Other pages in the web console allow us to visualize the status of each experiment and its results in various ways, by means of auto-generated tables and plots.
Adding tables and plots
So far, our lab contains three experiments, each of which computes and generates a single output data element, namely the duration of the sorting operation. These can be viewed by clicking on each of the experiments in the web console. Let us now collect these results and display them.

To do so, we need to create a Table. A table is a collection of table entries, each of which is a set of key-value pairs. We would like to create a table from the results produced by our experiments: each entry should contain the Size of the array and the Time it took to sort it. This is done by creating a new ExperimentTable, that is, a table whose content is fetched from the data produced by
one or more experiments. We create the table by telling
it the names of the parameters we wish to fetch from the
experiments:
ExperimentTable t = new ExperimentTable(
    "Algorithm", "Size", "Time");
If we want the table to show up in the lab console, we must also add it to the lab by calling add. Once the table is created, experiments must be added to it; a sketch of the resulting setup method is shown below.
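For instance, the end of the lab's setup method could look like the following; the add method on the lab is described in the text, but the exact call for registering an experiment in a table (t.add(...) below) is not shown in this paper and is an assumption on our part:

public void setup() {
  // Create the table and make it show up in the lab's web console
  ExperimentTable t = new ExperimentTable(
      "Algorithm", "Size", "Time");
  add(t);
  for (int n : new int[]{10, 100, 1000}) {
    int[] array = generateRandomArray(n);
    GnomeSort g = new GnomeSort(array);
    add(g);    // register the experiment in the lab
    t.add(g);  // assumed call: register the experiment in the table
    ...
  }
}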
If we recompile and run this new lab, we will now see that a table shows up in the Tables page of the web console, with an automatically assigned name such as "Table 1". Clicking on it will show something like this:
Algorithm    Size   Time
Gnome Sort   10
             100
             1000
Quick Sort   10
             100
             1000
Each line of the table corresponds to the values fetched from one experiment we added to it. Table entries are automatically sorted in the order the columns are mentioned, and identical values are grouped. The Size parameter is filled, but the Time column shows nothing. This is normal: since we haven't run any experiment, these data elements have not yet been produced. If we run one of the experiments and go back to the table, we will see that the corresponding cell now has a value. As a matter of fact, when we run a lot of experiments, we can periodically refresh a table's page and see the cells being filled with data progressively.
It is sometimes better to display data graphically, so let us add a Plot to our lab. A plot is always created with respect to an existing table. In our case, we would like to trace a line showing the sorting time with respect to the size of the array, and have one such line for each sorting algorithm. The object we use for this is a Scatterplot. However, such a plot expects its input to be organized in a different way: the first column should contain the values of the x axis, and the remaining columns should contain the y values of each data series.
This can be done by applying a table transformation to the original experiment table. A transformation is an operation that takes one or more tables, and produces another table as an output. Here, the ExpandAsColumns transformation is instructed to make one column for each distinct value of "Algorithm" and, for each line of the original table, to use the value of "Time" as the value for this column. The transformed table can then be passed to the scatterplot, as follows:
TransformedTable t2 = new TransformedTable(
    ExpandAsColumns.get("Algorithm", "Time"), t);
Scatterplot plot = new Scatterplot(t2);
Table t2 now has the appropriate structure to be passed to our scatterplot:

Size   Gnome Sort   Quick Sort
10     …            …
100    …            …
1000   …            …
The concept of table transformations is very powerful; many common operations on tables can actually be done by chaining single-line instructions over existing tables. Moreover, if more usage-specific transformations are needed, users can write their own and compose them with the existing ones. If we recompile and restart the lab, we will now see a plot in the Plots page, called "Plot 1". Since the plot is created from a table, its contents are dynamically updated every time the page is refreshed.
Including results in a paper
We claimed earlier that LabPal could help us streamline the inclusion of experimental results in a research paper. We have seen how the web interface can simplify the execution and processing of experimental results. We shall now see how these results can be easily transplanted into a paper in progress, especially if we use LaTeX.
Through the web interface, each plot can be saved as a PNG image, exported as a PDF file, or have its raw data output as a GnuPlot input file that we can then process outside of LabPal if we wish. However, for a paper in progress, it may be tedious to re-download each plot every time the lab is re-run. This is why, in the Plots page, a button allows the user to download all plots at once as a single PDF document; each plot becomes one distinct page of that document. This means that in a LaTeX document, to show each plot, we always refer to the same file, but to a different page of that file. A big advantage is that if we update the lab, we simply re-download that PDF, and all the plots are updated at once. Another button on the page offers to download a set of macros that go with the plots. These macros are LaTeX commands one can use to summon each plot by an identifier, eliminating the need to explicitly mention a page number.
A table can also be exported in various ways from the web interface: we can copy-paste its contents into a word processor, which should normally preserve its formatting. Otherwise, in the Tables page of the web interface, we can click on one of the buttons to download the table as an HTML, plain-text (CSV) or LaTeX file. The LaTeX version, in particular, is already formatted with borders, headings, etc., which spares us from doing it by hand from raw text data. Similarly to plots, all tables can be downloaded into a single-file bundle, and macros can be used to include the table of our choosing with a simple one-line LaTeX command.
Sidebar: The Red (and Green and Blue) Badge of Courage

In 2016, the ACM Task Force on Data, Software and Reproducibility in Publication [2] set up a set of "badges" that can be attached to a research paper when it is accompanied by executable artifacts.

If the artifacts have been checked to be available on an archival platform, the paper receives the Artifacts Available badge. If the artifacts have been successfully executed by an external referee, it receives the Artifacts Evaluated – Functional badge, and the Artifacts Evaluated – Reusable badge if they exceed minimal functionality. The Results Replicated badge is given if the artifacts can be run to re-obtain the results mentioned in the paper, and the Results Reproduced badge indicates that the paper's claims have been verified through an entirely independent implementation.

A well-designed LabPal instance provides an easy way to claim the Functional, Reusable and even Replicated badges, by letting a referee use the web interface to run experiments and examine all the tables and plots that are generated from them.
So, not only does LabPal take care of some tedious formatting tasks, the single-file bundles of plots and tables also make it faster to update an existing paper with freshly computed data.
ADVANCED FEATURES
We now have a basic running laboratory with auto-generated tables, plots, and an interactive web interface. Let us remind the reader that we have written, all in all, only a few dozen lines of Java code.
Adding metadata
LabPal ofers various ways to define
metadata
for your
lab –that is, data about its data. A first metadata is about
the lab itself. You can enter a textual description for the
lab using method
setDescription
. If a description is
defined, it will be displayed in the web console in the
Home
page, replacing the default help text that shows up
otherwise. This description can include any valid HTML
markup, and can be loaded from an external file for con-
venience.
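For instance, the beginning of our lab's setup method could provide such a description; the HTML text below is purely illustrative:

public void setup() {
  // Displayed on the Home page of the web console,
  // replacing the default help text
  setDescription("<p>This lab compares the running time of several "
      + "sorting algorithms on randomly generated integer arrays.</p>");
  ...
}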
A description can also be entered for each experiment separately, using the experiment's setDescription method to set the text. Again, this description can contain any valid HTML, and it is displayed in the corresponding Experiment page. Finally, each individual parameter (input and output) can also be given a description. To this end, an experiment can use the method describe:
public GnomeSort(int[] array) {
  this.array = array;
  setInput("Size", array.length);
  describe("Size", "The size of the array to sort");
  describe("Time", "The sorting time (in ms)");
  setInput("Algorithm", "Gnome Sort");
  describe("Algorithm", "The algorithm used");
}
One can see how the describe method has been used to associate a short description to the input parameter "Size", as well as to the output parameter "Time" and the input parameter "Algorithm". A parameter does not need to have received a value in order to be given a description. In the web console, the description of a parameter shows up as a tooltip wherever the parameter name appears.
Saving, loading, merging labs
Our examples so far have involved relatively short experiments that run within a few seconds. However, the labs we create will likely contain more than a handful of experiments, whose running time may be minutes, if not hours. For all sorts of reasons, we might not want to run all these experiments in a single pass. It would be nice if we could select and run a few of them, close our setup (and even our computer), and run more experiments at a later time.

Saving the current state of a lab can be done by going to the Status page and clicking on the save button. For each experiment, the resulting file saves all of its input parameters, the output parameters it generated (if any), as well as its current status (finished or not). Loading a lab is the reverse operation: given a previously saved lab file, LabPal reads and restores each experiment in the state it was in when the lab was saved. A lab can be loaded either through the web interface, or when launching it at the command line.
If you write a research paper, it might be desirable to put online a copy of your lab, containing the exact results you refer to in the paper. One possible way is to also put online the save file corresponding to the lab, as described above. A user can then download the lab and, through the command line or the web interface, load the save file to retrieve the data. LabPal can also be set up to load this data automatically, without the need for user intervention. To do so, the save file simply needs to be placed within the JAR bundle that you create and distribute.
We shall note that lab authors do not need to write saving and loading code for their experiments themselves. LabPal takes care of serializing and deserializing any Experiment, Table and Plot object without user intervention.
Provenance Tracking
Provenance is the ability to retrace the sequence of oper-
ations that lead to the production of a specific data point.
In other words, it is computing an answer to the ques-
tion: where does this value come from? LabPal provides
facilities to keep track of the sequence of operations that
lead to the production of a specific data point. Using stan-
dard tables, table transformations and plot objects, this
means that a computed value can be traced all the way
back to the individual experiments that contributed to the
computation of that value.
For example, in the web interface, all cells of all tables are clickable, as shown in Figure 3a. Clicking on any of these cells leads to a graphical illustration of the chain of operations leading to this value, called a provenance tree (Figure 3c). In our example, we can see that cell (0,2) of Table 3 is the sum of the values in column 2 of Table 2. This entry in the page can be further expanded, in order to see where the cells of Table 2 come from, and so on. The provenance tree is also interactive; clicking on any element of this tree leads to the page where the corresponding datapoint lives inside the lab, where it is highlighted (Figure 3b).

Figure 3: Data provenance tracking features. Table cells are clickable elements (a); clicking on one of them shows a tree that describes the chain of operations leading to this value (c); clicking on a node of this tree summons the page of the corresponding datapoint for further examination (b).
Like many other features in LabPal, provenance tracking can be taken all the way down to the final research paper that presents these results. The LaTeX tables and plots generated by LabPal are not mere plain tables and plots. Rather, we can see by viewing the resulting PDF that each cell of each table, and each plot in itself, is actually a hyperlink. Hovering over one of these hyperlinks reveals a string of letters and numbers, such as "T.." (Figure 4a). This corresponds to what we call a LabPal Datapoint Identifier (LDI).

Figure 4: A LabPal-exported table contains hyperlinks to individual datapoints that are visible in the generated PDF (a). Pasting such a hyperlink in the Find box locates this datapoint inside the lab (b).
This link cannot be clicked, but its destination can be copied to the clipboard. One can then go back to the lab, open a page called Find, and paste this LDI. LabPal takes us to the corresponding element and highlights it. From then on, this element can be inspected using the provenance features described above. The same can be done for a plot, by copying the LDI corresponding to the plot and using the Find function to get to that same plot in the web console.
If the lab itself is hosted on some archival platform and given a DOI, the LDI can be seen as an extension of that DOI that refers to individual objects inside the lab. This means that other people can refer to our published results in a precise manner. Suppose for example that a lab has been given a DOI, and that some LDI designates a specific data value in this lab. It becomes possible for an author to write "our new algorithm performs faster than Hallé et al. for n = 2"*, with the asterisk (*) leading to a footnote containing the lab's DOI followed by the LDI.
An external referee can also use this feature to cross-
check experimental results. In the current state of things,
an artifact evaluation reviewer re-runs a set of scripts, and
then has to hunt in the generated results to find the figures
and numbers that match those in the paper —a task that is
more or less easy, depending on the structure of the scripts
and their documentation. The LDI hyperlink functionality
makes this search quicker: one can go directly to a table
or a particular table cell, and compare the re-computed
value with the one that appears in the paper.
We know of no other system, platform or library that provides such deep referencing and provenance tracking facilities at such a fine level of granularity. We shall also emphasize that this tracking comes for free: on a standard lab instance, no extra code is required on behalf of the author to assign LDIs and keep track of their relationships.
A FEW OTHER FEATURES
There are many other features of the LabPal library which, for space considerations, cannot be described in detail in this paper. Each of them is designed first and foremost to help authors run their experiments. More details on each of them can be found in the online User's Manual (https://www.gitbook.com/book/liflab/labpal-user-manual).
Macros
These are additional key-value pairs that can be added to a lab by the user. Like tables and plots, macros are assigned an LDI, can be given a textual description, and can be exported as LaTeX commands to be inserted in the text. For example, the following instruction adds to the lab a macro named maxSize with a value of 1000:

add(new ConstantNumberMacro(this, "maxSize",
    "The maximum array size", 1000));
Instead of hard-coding the number 1000 into the paper, this value can be inserted in the text by using the LaTeX command \maxSize, so that if the value ever changes in the lab, the change is automatically reflected in the paper.
This principle can be taken further; users can define their own custom Macro objects, whose value is arbitrary (a number, a piece of text, etc.) and can be computed on demand using arbitrary code. For instance, in the full sorting lab example, a macro goes through all experiments and fetches the name of the algorithm with the smallest cumulative sorting time; a hypothetical sketch of such a macro is shown below.
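The following is only a sketch: the Macro class exists in LabPal, but the name of the method computing the value, the getExperiments accessor on the lab and the read accessor on experiments are assumptions on our part rather than the documented API.

import java.util.*;

class FastestAlgorithmMacro extends Macro {
  SortingLab lab;

  FastestAlgorithmMacro(SortingLab lab) {
    this.lab = lab;
  }

  // Assumed: the method LabPal calls to obtain the macro's value
  public String getValue() {
    Map<String, Long> total = new HashMap<>();
    for (Experiment e : lab.getExperiments()) {  // assumed accessor
      Object time = e.read("Time");              // assumed accessor
      if (time == null)
        continue;  // this experiment has not been run yet
      String algo = (String) e.read("Algorithm");
      total.merge(algo, ((Number) time).longValue(), Long::sum);
    }
    // Return the algorithm with the smallest cumulative sorting time
    return Collections.min(total.entrySet(),
        Map.Entry.comparingByValue()).getKey();
  }
}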
Versioning and continuous integration
LabPal labs are just plain old Java files that reside on a computer. This means they can be hosted, versioned and compiled on
collaborative platforms such as GitHub (https://github.com) and BitBucket (https://bitbucket.org), like any other project. They are also easily amenable to continuous integration using services like Travis-CI (https://travis-ci.org). In such a workflow, every modification to the lab pushed to the repository triggers the re-execution of the whole Ant build script, which compiles, re-runs the whole lab and re-exports all tables and figures in a single pass. If the lab is associated to a working paper, the re-compilation of the paper can even be affixed to the build script, leading to a completely automated solution from raw results to publication. Used in this fashion, LabPal can be seen as a modern descendant of the pioneering approach first introduced by Schwab et al. more than twenty years ago [11].
Distributed computing
Multiple instances of the same lab can be started on distinct machines, and each of them can be configured to be responsible for a disjoint subset of all the experiments of the lab. At periodic intervals, each lab instance can report its current state (and data) to a master instance through a loosely-coupled protocol based on HTTP. This can prove useful for splitting very long experiments across multiple machines, and automatically merging their results into a single save file.
Shadow experiments
We have seen how existing results can be loaded inside a lab and explored interactively. Such a save file can also be loaded in shadow mode. In this case, pre-recorded experimental results are displayed side by side with the results being computed by the lab's current user, allowing for easy comparison.
OTHER ISSUES
LabPal is part of a larger ecosystem: there exist many other facets to the problem of experiment reproducibility that are deliberately not in its scope. In many cases, there already exist solutions for these different issues, and LabPal can be made to interact nicely with them instead of trying to duplicate their work.
Environment provisioning
This concerns the question of re-creating the environment (hardware or software) in which a set of experiments was run on a host system. LabPal itself can perform basic checks for the presence of some files or executables on its host machine, but many solutions address this issue much more thoroughly. Virtual machines and Docker containers (https://www.docker.com) can fill this role, and reliably reproduce a capsule of environment with all its dependencies. Other tools like ReproZip (https://www.reprozip.org) can automatically track all files and libraries accessed by a set of experiments, and package them into a bundle that can be deployed elsewhere.
Runtime hosting
Similarly, LabPal expects a lab instance to be downloaded and run on some machine. It does not, by itself, provide online resources for running labs. This, however, is nicely taken care of by platforms such as CodeOcean (https://codeocean.com). As a matter of fact, we are currently working with CodeOcean to facilitate the interplay of LabPal with the widgets already provided by the platform.
Artifact referencing and archival
LabPal provides a naming scheme for referencing resources inside a lab, but referencing the lab itself must be done through external means. There already exist many platforms, such as Dryad (https://datadryad.org), DataHub (https://datahub.io), Software Heritage (https://www.softwareheritage.org) and Zenodo (https://zenodo.org), that can store artifacts and assign them a unique DOI. Major publishers such as the IEEE and the ACM also support the uploading of files as auxiliary materials associated with a research paper. Since a lab instance is ultimately an "inert" JAR file, it can be treated as a piece of data and uploaded as a resource to any of these platforms.
Interactive notebooks
An alternate form of presentation for scientific data is the "notebook", where text is interspersed with code instructions that generate tables and plots on the fly. This is the case, for example, of Jupyter Notebooks (https://jupyter.org) and Scimax (http://kitchingroup.cheme.cmu.edu/scimax). We can also place in this category the "interactive widgets" (plots, tables, etc.) that can now be placed in the online version of a paper on some publishers' web sites (notably Elsevier). LabPal itself provides notebook-like facilities, as the user can write custom pages for its web interface, and these pages can include plots and tables generated by the lab itself. However, notebook solutions are generally not concerned with generating raw data in the first place, but rather with processing and displaying visualizations of this data.
Other languages
The choice of Java as the implementation language, and the various features offered by the library, are somewhat "personal": we wrote the tools that we needed. Virtually all work in our research lab uses Java, which made it natural to use Java when deciding to implement a scaffolding library to help us write papers. Of course, we are aware that a Java solution does not suit every possible use case, but neither does a solution based on any other language (we shall mention that we have developed template projects that allow users to write their labs in Python (https://github.com/liflab/labpal-python-project), connecting to LabPal through the Jython interpreter). Similarly, if an experimental tool chain requires heavy use of external programs, compilers, etc., an environment such as Collective Knowledge (CK, http://cknowledge.org/) is better suited to the task. However, since LabPal uses JSON as its native format for storing and exchanging data, it can easily be integrated into a CK workflow (which, too, uses JSON for communication between its components).
THE FUTURE
A few years ago, Sandve et al. proposed ten simple rules for reproducible computational research [10]. Let us see how LabPal allows us to put a checkmark beside each of them:
1. For every result, keep track of how it was produced. LabPal takes care of this through its provenance tracking features.

2. Avoid manual data-manipulation steps. LabPal offers high-level objects (experiments, tables) that provide many data-manipulation operations built in, removing the need for custom scripts.

3. Archive the exact versions of all external programs used. Through a feature called environment checks, LabPal can be made to check the presence and version of various pieces of software (although we have seen that some other tools can complement it in this regard).

4. Version-control all custom scripts. Since a lab is just Java (or Python) code, it can be hosted and versioned on GitHub like any other piece of software.

5. Record all intermediate results, when possible in standard formats. All tables (including intermediate tables) can be exported to various formats, including CSV.

6. For analyses that include randomness, note underlying random seeds. Although this was not discussed here, LabPal can be initialized with specific random seeds, which are saved along with the lab's state.

7. Always store raw data behind plots. LabPal saves all data generated by the experiments, not just the last table that produces a plot.

8. Generate hierarchical analysis output, allowing layers of increasing detail to be inspected. The provenance tree is just that.

9. Connect textual statements to underlying results. We have seen how macros can be used to insert snippets of data computed by the lab directly into a paper. Moreover, each macro has a hyperlink to its LDI.

10. Provide public access to scripts, runs, and results. The single-file bundles LabPal produces can be hosted online on various platforms.
LabPal is released under an open source license and is publicly available online. Due to space restrictions, many of its unique features have only been mentioned in passing; the online documentation provides much more information, as well as tutorial videos showing the framework in action.
In the "eat your own dog food" spirit, researchers at the Laboratoire d'informatique formelle (LIF), where LabPal is developed, have started systematically using it for all their papers (e.g., [7, 8, 9]), and provide links to downloadable lab instances wherever experimental data is mentioned. Now that the initial effort of developing LabPal is complete, creating sets of experiments for papers has been greatly streamlined, and we would never consider going back to the "Stone Age" of custom command-line scripts.
Over time, it is hoped that journals and conferences will help advertise the existence of LabPal, and encourage its use by researchers, along with the many other tools and platforms mentioned earlier. The features provided by the framework are directly in line with their long-term goals of facilitating the execution of computer experiments for research purposes, and of sharing executable artifacts for cross-verification and extension by other researchers.
References
[]
L. A. Barba. The hard road to reproducibility.
Science
,
():, .
[]
R. F. Boisvert. Incentivizing reproducibility.
Com-
mun. ACM, ():, .
[]
B. R. Childers and P. K. Chrysanthis. Artifact eval-
uation: Is it a real incentive? In
th IEEE Interna-
tional Conference on e-Science, e-Science , Auckland,
New Zealand, October -, 
, pages –. IEEE
Computer Society, .
[]
C. S. Collberg and T. A. Proebsting. Repeatability in
computer systems research.
Commun. ACM
, ():–
, .
[]
T. Crick, B. A. Hall, and S. Ishtiaq. “Can I implement
your algorithm?”: A model for reproducible research
sotware. In WSSSPE, . arXiv ..
[]
D. L. Donoho, A. Maleki, I. U. Rahman, M. Shahram,
and V. Stodden. Reproducible research in compu-
tational harmonic analysis.
Computing in Science and
Engineering, ():–, .
XXXX 2018 10
[]
S. Hallé, R. Khoury, A. El-Hokayem, and Y. Falcone.
Decentralized enforcement of artifact lifecycles. In
F. Matthes, J. Mendling, and S. Rinderle-Ma, edi-
tors,
th IEEE International Enterprise Distributed Ob-
ject Computing Conference, EDOC , Vienna, Austria,
September -, 
, pages –. IEEE Computer Soci-
ety, .
[]
S. Hallé, R. Khoury, and S. Gaboury. Event stream
processing with multiple threads. In S. K. Lahiri and
G. Reger, editors,
Runtime Verification - th Interna-
tional Conference, RV , Seattle, WA, USA, September
-, , Proceedings
, volume  of
Lecture Notes
in Computer Science, pages –. Springer, .
[]
R. Khoury, S. Hallé, and O. Waldmann. Execution
trace analysis using LTL-FO ˆ+. In T. Margaria and
B. Stefen, editors,
Leveraging Applications of Formal
Methods, Verification and Validation: Discussion, Dis-
semination, Applications - th International Symposium,
ISoLA , Imperial, Corfu, Greece, October -, ,
Proceedings, Part II
, volume  of
Lecture Notes in
Computer Science, pages –, .
[]
G. K. Sandve, A. Nekrutenko, J. Taylor, and E. Hovig.
Ten simple rules for reproducible computational re-
search. PLoS Comput. Biol., ():e, .
[]
M. Schwab, N. Karrenbach, and J. F. Claerbout. Mak-
ing scientific computations reproducible.
Computing
in Science and Engineering, ():–, .