Streamlining the Inclusion of
Computer Experiments In A
Research Paper
Sylvain Hallé, Raphaël Khoury, Mewena Awesso
Université du Québec à Chicoutimi, Canada
Designing clean, reusable and repeatable experiments for a research paper doesn't have to be
hard. We report on the efforts we invested in creating an integrated toolchain for running,
processing and including the results of computer experiments in scientific publications, saving
us time along the way.
For authors of scientific research involving computer
experiments, writing a new paper often feels like
reinventing the wheel. We spend considerable
amounts of time at the command line, writing shell scripts
that run programs, shuffle around temporary data files,
and crunch these files in various ways. For first-time authors,
mastering the syntax of GnuPlot, learning to use
Unix pipes and parsing CSV files is almost a rite of passage.
Such skills are so ingrained in the computer culture
that they have recently been regrouped into a corpus of
basic know-how called software carpentry.
Alas, not much of these scripts and temporary files survives
the next paper. More often than not, their content is so
specific to our current experiments and data-crunching
tasks that hardly anything is worth reusing. As one study
has put it, we paper authors are very proficient at cooking
up "hack-together, use-once, throw-away scripts" []. As
far as carpentry is concerned, it is as if we had to reinvent
hammers, nails and screwdrivers every time we wanted
to build a house.
An important casualty of this state of things is reproducibility,
or the capability for a group of people to confirm
experimental claims in a more or less independent way
(see sidebar). Through our poorly documented, throw-away
scripts, we give anybody else a hard time trying to
simply re-execute what we did; as a matter of fact, even
the authors themselves may have trouble re-running their
experiments after six months. Our experimental setup
may be so messy that we don't even bother making it
available to others. In a recent paper, Collberg and Proebsting
reveal that, out of a sample of research papers that
describe experiments, a large proportion had results that could not be
reproduced, mostly due to code and data not being made
publicly available []. This has led to what some in the field
of computing research have called a credibility crisis []: one
of the cornerstones of science is being compromised by
our careless attitude towards empirical work. Recent efforts,
such as paper badges, open repositories, submission
guidelines and artifact evaluation committees, are all
attempts to reverse this trend []. Unfortunately, complying
with these guidelines and principles often amounts, from the
author's point of view, to additional work, and is
often viewed with resistance.
Not so long ago, this is what we were thinking too. That
was before we decided to invest time and energy into developing
generic and, most importantly, reusable tools to help
us run, process and include experimental results in our
research papers. The result is LabPal, an open source software
library that streamlines many menial data-crunching
tasks that we used to hand-code in single-use scripts.
Sidebar: Repeatability, Replicability, Reproducibility

Definitions of reproducibility, repeatability and other
terms abound in the literature and vary subtly; here we
follow the basic terminology proposed by the ACM.a
Repeatability means that a researcher can reliably repeat
her own computation. Replicability means that an independent
group can obtain the same result using the
author's own artifacts. Finally, reproducibility means
that an independent group can obtain the same result
using artifacts which they develop completely independently.

Pushing data processing and manipulation functions
outside of custom scripts, and into shared and reusable
libraries, reduces the code footprint of what is considered
the author's "own" artifacts, and hence the
amount of code that must be re-implemented by an
independent third party for a result to be considered
"reproduced".

a. https://www.acm.org/publications/policies/artifact-review-badging
In the following, we show, through a simple step-by-step
example, how LabPal can be used to execute experiments
and easily transfer their results into a scientific
publication. Most importantly, we hope to convince
potential authors that designing experiments for reproducibility,
when using the right tools, does not necessarily
represent more work, and can even provide tangible benefits
for them.
LabPal is a sotware library whose primary goal is to
make the life of authors easier, by helping them run, pro-
cess and include experimental results in a research pa-
per. Contrary to many other tools and platforms (some
of which will be discussed at the end of this paper), it is
not focused on help third-parties reproduce and explore
results. Rather, reproducibility comes as a by-product of
our design goals: if running experiments and examining
their results is easy and intuitive for paper authors, it will
be equally easy for anyone else to do the same thing.
As a running example, suppose we are writing a paper
that compares sorting algorithms. We would like our paper
to include experimental results of running multiple
existing algorithms on various random arrays, and we use
LabPal to help us run these experiments. We shall start
with an empty template project, such as the one that is
freely available online. The complete version of the lab
used as an example in this paper can also be found online.
Structure of a lab
First, we have to understand the structure of LabPal,
summarized in Figure 1. Its basic building block is the
experiment, which is an object that can take input parameters,
can be run, and produces one or more output values.
Typically, many experiment copies (or "instances") are created,
each with its own set of input parameters. There can
even exist more than one type of experiment, and each
type may expect different input parameters, perform different
things when run, and produce different kinds of
output parameters.
The results of experiments can be collected into tables,
which are data structures that are populated by fetching the
values of specific parameters in a pool of experiments.
Many tables can be created, each being associated with different
sets of experiments or fetching different parameters.
Tables can be transformed into other tables using
operations called table transformations. For example, a table
can be transposed, two tables can be joined on a common
parameter name (similar to an SQL join), or the sum, average
or quartiles of a table's columns can be calculated.
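To make these table concepts concrete, here is a plain-Java sketch of a table as a list of key-value entries, with a column-sum operation in the spirit of the transformations just described. The class and method names below are ours, for illustration only; they are not LabPal's actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a table is a list of entries (key-value maps),
// and a transformation turns one table into another.
final class MiniTable {
    final List<Map<String, Object>> entries = new ArrayList<>();

    void add(Map<String, Object> entry) {
        entries.add(new LinkedHashMap<>(entry));
    }

    // A simple transformation: a one-entry table holding the sum
    // of a numeric column of this table.
    MiniTable sumOf(String column) {
        double sum = 0;
        for (Map<String, Object> e : entries) {
            sum += ((Number) e.get(column)).doubleValue();
        }
        MiniTable out = new MiniTable();
        out.add(Map.of(column, sum));
        return out;
    }
}
```

Because a transformation returns another table, operations like this one can be chained, which is the property LabPal's own transformations exploit.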
Finally, tables can be given as input to plots, which are
graphical representations of a table's content. By default,
LabPal provides interchangeable connectors to two widely
used plotting systems, GnuPlot and GRAL. A laboratory
(or "lab" for short) is an organized and documented collection
of experiments, tables and plots.
In line with what other research groups have already
revealed, we agree that “we can only achieve the necessary
level of reliability and transparency by automating every
step” []. To this end, the execution of a lab can be done
interactively, but can also run in a completely autonomous
way. Perhaps more importantly, we believe in encapsulat-
ing important components into reusable units. Therefore,
while the concepts of experiments and tables are implicitly
present in virtually any experimental setup in some
form or another, LabPal allows a user to manipulate them
through objects at a high level of abstraction, resulting in
a relatively small amount of code. Since LabPal is a Java
library, all these concepts correspond concretely to Java
classes and objects. One can also do the same thing in
Python and Scala.
Creating experiments
To set up our sorting lab, we first need to create an experiment.
In LabPal, experiments are all descendants of the
Experiment class. In our example, an experiment takes
a single input parameter, which is the size of the array we
XXXX 2018 2
u=http://... k=0.239
t z b
Figure 1: The basic structure of a laboratory in LabPal.
wish to sort. Setting an input parameter is done by calling
the method setInput(name, value), which associates a particular
value to a parameter name. When run, the experiment
will generate an array of the given size and then sort
it. This is done in the method execute(), which all experiments
must implement. Finally, our experiment produces
a single output value, which corresponds to the time it
takes to sort that particular array. Writing an output value
is done by calling the method write(name, value), which likewise
associates a particular value to a name.
Therefore, a sensible way to create our experiment
would be to write this:
class GnomeSort extends Experiment {
  transient int[] array;

  public GnomeSort(int[] array) {
    this.array = array;
    setInput("Size", array.length);
    setInput("Algorithm", "Gnome Sort");
  }

  public void execute() {
    long start = System.currentTimeMillis();
    int pos = 0;
    while (pos < array.length) {
      if (pos == 0 || array[pos] >= array[pos-1]) pos++;
      else { swap(array, pos - 1, pos); pos--; }
    }
    write("Time", System.currentTimeMillis() - start);
  }
}
The constructor receives an array to sort, and sets its
length as an input parameter of the experiment under the name
"Size". When called, the method execute() sorts this array using
some algorithm (here, Gnome sort). This last bit of
code is surrounded by two calls fetching the current system
time. The duration of the sort operation is written as
output data and is given the name "Time".
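The swap() helper used by the experiment above is not shown in the paper; the following is a plausible sketch of it, together with the same gnome sort loop written as a plain static method so that it can be tried outside of LabPal.

```java
// Illustrative sketch only: a plausible swap() helper and the
// gnome sort loop from execute() above, as plain static methods.
final class SortUtil {
    // Exchanges elements i and j of the array (assumed helper).
    static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    // Gnome sort: walk forward while the prefix is sorted,
    // otherwise swap backwards and step back one position.
    static void gnomeSort(int[] a) {
        int pos = 0;
        while (pos < a.length) {
            if (pos == 0 || a[pos] >= a[pos - 1]) {
                pos++;
            } else {
                swap(a, pos - 1, pos);
                pos--;
            }
        }
    }
}
```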
We are now ready to create a Laboratory ("lab" for short),
which will be the environment in which these experiments
are run. In LabPal, a lab is a descendant of the
Laboratory class. The template project already contains
an empty laboratory. Experiments can be created in a
method called setup(), and are added to the lab by calling
the method add(). Our lab could hence look
like this:
class SortingLab extends Laboratory {
  public void setup() {
    for (int n : new int[]{10, 100, 1000}) {
      int[] array = generateRandomArray(n);
      add(new GnomeSort(array));
      add(new QuickSort(array));
    }
  }

  public static void main(String[] args) {
    initialize(args, SortingLab.class);
  }
}
This lab creates instances of the GnomeSort experiment
with different array sizes, and adds them to the lab
(it also adds instances of experiments that use other sorting
algorithms on the same arrays, which we do not show
here). The main() method is only there so that our lab can
be executed from the command line.
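The helper generateRandomArray() called in setup() is not shown in the paper; the sketch below is a plausible stand-in. The deterministic seed is our own addition, chosen so that every run of the lab sorts the same arrays, which helps repeatability.

```java
import java.util.Random;

// Hypothetical helper (not shown in the paper): generates a random
// array of a given size. The seed is derived from the size, so
// repeated runs produce identical arrays -- our own assumption.
final class ArrayGen {
    static int[] generateRandomArray(int n) {
        Random rand = new Random(n);  // deterministic seed
        int[] array = new int[n];
        for (int i = 0; i < n; i++) {
            array[i] = rand.nextInt(10 * n + 1);
        }
        return array;
    }
}
```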
Our template project contains an Ant build script that
compiles and bundles everything into a JAR file. This JAR is runnable and stand-alone:
we can move it around without needing to worry
about installed libraries and other dependencies. We can
then start the lab by simply running the JAR file.
Using the web console
By default, running a lab starts a local web server, which
can be accessed from a browser. The server listens on a
default local address, which can be changed by
a command-line setting. Opening this URL in a browser
leads us to the home page of the lab. We can set up this
page to show a text that gives a few details about what
this lab is about, so that someone who retrieves our lab file
and runs it has an idea of what it does. That same person
can also click on the Help button at the top of the page to
get more information about how the console works and
how to run the experiments that the lab contains.
Clicking on the Experiments button at the top of the page
brings us to the list of experiments, as shown in Figure
2a. For each experiment, we see its unique number, as well as the input
parameters that determine what this experiment does. In
our case, our experiments are meant to compare sorting
algorithms, so each of them sorts an array of a given size
using a specific algorithm. For each experiment, we can
also see a status icon that tells whether the experiment is
queued, currently running, etc.
We are now ready to run the experiments. This is done
by sending them to what we call a lab assistant. The assistant
is a process that merely runs, in a linear fashion,
all the experiments that are sent to its queue. To send
experiments to the queue, one simply checks some of the
experiments in the list and sends them to the assistant
with the appropriate button. The assistant has its own
page, where the contents of its queue can be shown and
modified. Other pages in the web console allow us to visualize
the status of each experiment and its results in
various ways, by means of auto-generated tables and plots.
Adding tables and plots
So far, our lab contains three experiments, each of
which computes and generates a single output data element,
namely the duration of its sorting operation. These
can be viewed by clicking on each of the experiments in the
web console. Let us now collect these results and display
them.
To do so, we need to create a table. A table is a collection
of table entries, each of which is a set of key-value
pairs. We would like to create a table from the results
produced by our experiments: each entry should contain
the size of the array and the time it took to sort it. This
is done by creating a new ExperimentTable, that is, a
table whose content is fetched from the data produced by
one or more experiments. We create the table by telling
it the names of the parameters we wish to fetch from the
experiments:

ExperimentTable t = new ExperimentTable(
    "Algorithm", "Size", "Time");

(Alternatively, we can choose a non-linear assistant that runs experiments
using multiple threads. Obviously, such an assistant is appropriate
only for situations where the experiment results are not sensitive to the
use of multi-threading.)

Figure 2: Screenshots of LabPal's web interface.
(a) The list of experiments included in
a lab; the running status of each of them is
shown as an icon. (b) Each experiment can
show its own input/output parameters, as well
as metadata automatically recorded by the lab
assistant. (c) The auto-generated plots provide
various buttons for exporting them to many formats.
If we want the table to show up in the lab console, we
must also add it to the lab by calling add(). Once the table is
created, experiments must be added to it. If we recompile
and run this new lab, we will now see that a table shows
up in the Tables page of the web console, under an
automatically generated name. Clicking on it will show something like this:
Algorithm Size Time
Gnome Sort
Quick Sort
Each line of the table corresponds to the values fetched
from one experiment we added to it. Table entries are
automatically sorted in the order the columns are mentioned,
and identical values are grouped. The Size column
is filled, but the Time column shows nothing. This
is normal: since we haven't run any experiment, these
data elements have not yet been produced. If we run one
of the experiments and go back to the table, we will see
that the corresponding cell now has a value. As a matter of
fact, when we run a lot of experiments, we can periodically
refresh a table's page and see the cells being progressively filled with
data.
It is sometimes better to display data graphically, so
let's add a plot to our lab. A plot is always created with
respect to an existing table. In our case, we would like to
trace a line showing the sorting time with respect to the
size of the array, and to have one such line for each sorting
algorithm. The object we use for this is a Scatterplot.
However, such a plot expects its input to be organized in a
different way: the first column should contain the values
of the x axis, and the remaining columns should contain
the y values of each data series.
This can be done by applying a table transformation
to the original experiment table. A transformation is an
operation that takes one or more tables, and produces another
table as its output. Here, the ExpandAsColumns
transformation is instructed to make one column for each
distinct value of "Algorithm" and, for each line of the original
table, to use the value of "Time" as the value for this
column. The transformed table can then be passed to the scatterplot,
as follows:

TransformedTable t2 = new TransformedTable(
    ExpandAsColumns.get("Algorithm", "Time"), t);
Scatterplot plot = new Scatterplot(t2);
Table t2 now has the appropriate structure to be passed
to our scatterplot:

Size Gnome Sort Quick Sort
The concept of table transformations is very powerful;
many common operations on tables can actually be
done by chaining single-line instructions over existing
tables. Moreover, if more usage-specific transformations
are needed, users can write their own and compose them
with the existing ones. If we recompile and restart the
lab, we will now see a plot in the Plots page, under an
automatically generated name. Since the plot is created from a table, its contents are
dynamically updated every time the page is refreshed.
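The effect of ExpandAsColumns can be sketched in plain Java over a list-of-entries representation of tables. This is a simplified, hypothetical stand-in for LabPal's actual implementation; in particular, we hard-code "Size" as the grouping column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative pivot: for each entry, the value found in keyCol
// becomes a new column name, and the value found in valueCol becomes
// that column's value; entries are grouped by their "Size" value.
final class Pivot {
    static List<Map<String, Object>> expandAsColumns(
            List<Map<String, Object>> in, String keyCol, String valueCol) {
        Map<Object, Map<String, Object>> rows = new LinkedHashMap<>();
        for (Map<String, Object> e : in) {
            Map<String, Object> row = rows.computeIfAbsent(
                e.get("Size"), size -> {
                    Map<String, Object> m = new LinkedHashMap<>();
                    m.put("Size", size);
                    return m;
                });
            row.put(String.valueOf(e.get(keyCol)), e.get(valueCol));
        }
        return new ArrayList<>(rows.values());
    }
}
```

After the pivot, each distinct array size yields one row whose columns are the algorithm names, which is exactly the layout a scatterplot expects.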
Including results in a paper
We claimed earlier that LabPal could help us streamline
the inclusion of experimental results in a research paper.
We have seen how the web interface can simplify the exe-
cution and processing of experimental results. We shall
now see how these results can be easily transplanted into
a paper in progress, especially if we use LaTeX.
Through the web interface, each plot can be saved as a
PNG image, exported as a PDF file, or have its raw data
output as a GnuPlot input file that we can process outside
of LabPal if we wish. However, for a paper in progress,
it may be tedious to re-download each plot every time
the lab is re-run. This is why, in the Plots page, a button
allows the user to download all plots at once as a single
PDF document, in which each plot becomes one distinct
page. This means that, in a LaTeX document, to show each
plot we always refer to the same file, but to a different page
of that file. A big advantage is that if we update the lab,
we simply re-download that PDF, and all the plots are
updated at once. Another button on the page offers to
download a set of macros that go with the plots. These
macros are LaTeX commands one can use to summon each
plot by an identifier, eliminating the need to explicitly
mention a page number.
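For instance, with the standard graphicx package, a multi-page PDF of plots can be referenced by page number. The file name labpal-plots.pdf below is our own placeholder, not the actual name used by LabPal:

```latex
\usepackage{graphicx}
% ...
% Plot no. 2 of the bundle: always the same file, a different page.
\begin{figure}
  \includegraphics[page=2,width=.8\linewidth]{labpal-plots.pdf}
  \caption{Sorting time with respect to array size.}
\end{figure}
```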
A table can also be exported in various ways from the
web interface: we can copy-paste its contents into a word
processor, which should normally preserve its formatting.
Otherwise, in the Tables page of the web interface, we can
click on one of the buttons to download the table as an
HTML, plain-text (CSV) or LaTeX file. The LaTeX version, in
particular, is already formatted with borders, headings,
etc., which spares us from doing it by hand from raw text
data. Similarly to plots, all tables can be downloaded as
a single-file bundle, and macros can be used to include
the table of our choosing with a simple one-line LaTeX
Sidebar: The Red (and Green and Blue)
Badge of Courage

In 2016, the ACM Task Force on Data, Software and
Reproducibility in Publication [2] set up a set of "badges"
that can be attached to a research paper when it is
accompanied by executable artifacts.

If the artifacts have been checked to be available on
an archival platform, the paper receives the Artifacts Available
badge. If the artifacts have been successfully executed
by an external referee, it receives the Artifacts Evaluated
– Functional badge, and the Artifacts Evaluated –
Reusable badge if they exceed minimal functionality. The
Results Replicated badge is given if the artifact can be run
to re-obtain the results mentioned in the paper, and the
Results Reproduced badge indicates that the paper's
claims have been verified through an entirely independent
implementation.

A well-designed LabPal instance provides an easy way
to claim the Functional, Reusable and even Replicated
badges, by letting a referee use the web interface to
run experiments and examine all the tables and plots
that are generated from them.
command. So, not only does LabPal take care of some
tedious formatting tasks, the single-file bundle of plots
and tables actually makes it faster to update an existing
paper with freshly computed data.

We now have a basic running laboratory with auto-generated
tables, plots, and an interactive web interface.
Let us remind the reader that we have written, all in all,
remarkably few lines of Java code.
Adding metadata
LabPal ofers various ways to define
for your
lab –that is, data about its data. A first metadata is about
the lab itself. You can enter a textual description for the
lab using method
. If a description is
defined, it will be displayed in the web console in the
page, replacing the default help text that shows up
otherwise. This description can include any valid HTML
markup, and can be loaded from an external file for con-
A description can also be entered for each experiment
separately, using a similar method on the experiment
object to set the text. Again, this description
can contain any valid HTML, and it is displayed
in the corresponding experiment page. Finally, each individual
parameter (input and output) can also be given
a description. To this end, an experiment can use the
method describe():
public GnomeSort(int[] array) {
  this.array = array;
  setInput("Size", array.length);
  describe("Size", "The size of the array to sort");
  describe("Time", "The sorting time (in ms)");
  setInput("Algorithm", "Gnome Sort");
  describe("Algorithm", "The algorithm used");
}
One can see how the describe() method has been used
to associate a short description to the input parameter
"Size", as well as the output parameter "Time" and the input
parameter "Algorithm". These last two parameters
have not yet received a value, but they can already be given a
description. In the web console, the description of a parameter
shows up as a tooltip wherever the parameter name appears.
Saving, loading, merging labs
Our examples so far have involved relatively short experiments
that run within a few seconds. However,
the labs we create will likely contain more than a handful
of experiments, whose running times may stretch to minutes, if
not hours. For all sorts of reasons, we might not want to
run all these experiments in a single pass. It would be nice
if we could select and run a few of them, close our setup
(and even our computer), and run more experiments at a
later time.
Saving the current state of a lab can be done by clicking
on the save button in the web console. For each
experiment, the resulting file records all of its input parameters, the
output parameters it generated (if any), as well as its current
status (finished or not). Loading a lab is the reverse
operation: given a previously saved lab file, LabPal will
read and restore each experiment to the state it was in when
the lab was saved. A lab can be loaded either through the
web interface, or when launching the lab at the command line.
If you write a research paper, it may be desirable to
put online a copy of your lab that contains the exact
results you refer to in the paper. One possible way is to
also put online the save file corresponding to the lab, as
described above. A user can then download the lab and,
through the command line or the web interface, load
the save file to retrieve the data. LabPal can also be set up
to load this data automatically, without the need for user
intervention. To do so, the save file simply needs
to be placed inside the JAR bundle that you
create and distribute.
We shall note that lab authors do not need to write
saving and loading code for their experiments themselves:
LabPal takes care of serializing and deserializing any
experiment object without user intervention.
Provenance Tracking
Provenance is the ability to retrace the sequence of operations
that led to the production of a specific data point.
In other words, it amounts to answering the question:
where does this value come from? LabPal provides
facilities to keep track of such chains of operations. Since standard
tables, table transformations and plot objects are used throughout,
a computed value can be traced all the way
back to the individual experiments that contributed to the
computation of that value.
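The idea of a provenance tree can be sketched as a small recursive structure. This is purely illustrative and is not LabPal's actual API:

```java
import java.util.List;

// Illustrative: a provenance node explains one datapoint and links
// to the nodes it was computed from.
final class ProvNode {
    final String explanation;      // e.g. "sum of column 2 of Table #2"
    final List<ProvNode> parents;  // datapoints this one derives from

    ProvNode(String explanation, ProvNode... parents) {
        this.explanation = explanation;
        this.parents = List.of(parents);
    }

    // Renders the tree as an indented trace, one line per node.
    String trace(int depth) {
        StringBuilder sb = new StringBuilder();
        sb.append("  ".repeat(depth)).append(explanation).append('\n');
        for (ProvNode p : parents) {
            sb.append(p.trace(depth + 1));
        }
        return sb.toString();
    }
}
```

Walking such a tree from a table cell down to its leaves is what lets a value be traced back to the experiments that produced it.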
For example, in the web interface, all cells of all tables
are clickable, as shown in Figure 3a. Clicking on any
of these cells leads to a graphical illustration of the chain
of operations leading to this value, called a provenance tree
(Figure 3c). In our example, we can see that cell (0,2) of
Table #3 is the sum of the values in column 2 of Table #2. This entry
in the page can be further expanded, in order to see where
cells of Table #2 come from, and so on. This provenance
tree is also interactive: clicking on any element of the tree
leads to the page where the corresponding datapoint lives
inside the lab, where it is highlighted (Figure 3b).
Like many other features of LabPal, provenance tracking
can be taken all the way down to the final research
paper that presents the results. The LaTeX tables and
plots generated by LabPal are not mere plain tables and
plots. Rather, we can see, by viewing the resulting PDF, that
each cell of each table, and each plot in itself, is actually
a hyperlink. Hovering over one of these hyperlinks reveals
a string of letters and numbers, such as "T.." (Figure 4a).
This corresponds to what we call a LabPal Datapoint
Identifier (LDI).
This link cannot be clicked, but its destination can be
copied to the clipboard. One can then go back to the lab,
open the page called Find, and paste this LDI; LabPal then takes us to
the corresponding element and highlights it. From then
on, this element can be inspected using the provenance
features described above. The same can be done for a
plot: by copying the LDI corresponding to the plot, one can
use the Find function to get to that same plot in the
web console.
If the lab itself is hosted on some archival platform
and given a DOI, the LDI can be seen as an extension
of that DOI, referring to individual objects inside the lab.
This means that other people can refer to our published
results in a precise manner. Suppose, for example, that
./. is the DOI given to a lab, and that T.. is
a specific data value in this lab. It becomes possible for an
author to write "our new algorithm performs faster than
that of X et al. for n = 2*", with the asterisk (*) leading to a
footnote with the DOI ././T...
An external referee can also use this feature to cross-check
experimental results. In the current state of things,
an artifact evaluation reviewer re-runs a set of scripts, and
then has to hunt through the generated results to find the figures
and numbers that match those in the paper, a task that is
more or less easy depending on the structure of the scripts
and their documentation. The LDI hyperlink functionality
makes this search quicker: one can go directly to a table
or a particular table cell, and compare the re-computed
value with the one that appears in the paper.
We know of no other system, platform or library that
provides such deep referencing and provenance tracking
facilities at such a fine level of granularity. We shall
also emphasize that this tracking comes for free: on a standard
lab instance, no extra code is required on behalf of
the author to assign LDIs and keep track of their relationships.
There are many other features in the LabPal library
which, for space considerations, cannot be described in
detail in this paper. Each of them is designed first and
foremost to help authors run their experiments. More
details on each of them can be found in the online User's
Manual.

Macros These are additional key-value pairs that can be
added to a lab by the user. Like tables and plots, macros
are assigned an LDI, can be given a textual description,
and can be exported as L
X commands to be inserted in
the text. For example, the following instruction adds to
the lab a macro named maxSize with a value of 1000:
add(new ConstantNumberMacro(this, "maxSize",
"The maximum array size", 1000));
Instead of hard-coding the number 1000 into a paper,
this value can be inserted in the text by using the corresponding
LaTeX command, so that if the value ever changes in
the lab, this change is automatically reflected in the paper.
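For illustration, assuming the exported macro file is called labpal-macros.tex and that it defines a command named \maxSize (both names are our assumptions), the paper could contain:

```latex
\input{labpal-macros.tex} % file name is an assumption
% ...
We sorted arrays of up to \maxSize{} elements.
```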
This principle can be taken further: users can define their
own custom macro objects, whose value is arbitrary (a
number, a piece of text, etc.) and can be computed on
demand using arbitrary code. For instance, in the full
sorting lab example, a macro goes through all experiments
and fetches the name of the algorithm with the smallest
cumulative sorting time.
Versioning and continuous integration
LabPal labs are
just plain old Java files that reside on a computer. This
means they can be hosted, versioned and compiled on
Figure 3: Data provenance tracking features. Table cells are clickable elements (a); clicking on
one of them shows a tree that describes the chain of operations leading to this value (c); clicking
on a node of this tree summons the page of the corresponding datapoint for further examination (b).
collaborative platforms such as GitHub and BitBucket,
like any other project. They are also easily amenable to
continuous integration using services like Travis-CI. In
such a workflow, every modification to the lab pushed
to the repository triggers the re-execution of the whole
Ant build script, which compiles, re-runs the whole lab
and re-exports all tables and figures in a single pass. If the
lab is associated to a working paper, the re-compilation of
the paper can even be affixed to the build script, leading to
a completely automated pipeline from raw results to publication.
Used in this fashion, LabPal can be seen as a modern
descendant of the pioneering approaches first introduced
by Schwab et al. more than twenty years ago [].
Distributed computing Multiple instances of the same
lab can be started on distinct machines, and each of them
can be configured to be responsible for a disjoint set of
all the experiments of the lab. At periodic intervals, each
lab instance can report its current state (and data) to a
master instance in a loosely-coupled protocol based on
HTTP. This can prove useful for splitting very long ex-
periments across multiple machines, and automatically
merging their results in a single save file.
Shadow experiments We have seen how existing results
can be loaded inside a lab and explored interactively.
Such a save file can also be loaded in shadow mode:
in this case, pre-recorded experimental results are
displayed side by side with the results being computed
by the lab's current user, allowing for easy comparison.
Figure 4: A LabPal-exported table contains hyperlinks to individual datapoints that are visible
in the generated PDF (a). Pasting such a hyperlink in the Find box locates this datapoint
inside the lab (b).
LabPal is part of a larger ecosystem: there exist many
other facets to the problem of experiment reproducibility
that are deliberately out of its scope. In many cases, there
already exist solutions for these different issues, and LabPal
can be made to interact nicely with them instead of
trying to duplicate their work.
Environment provisioning
This concerns the question of re-creating the environment (hardware or software) in which a set of experiments was run on a host system. LabPal itself can perform basic checks for the presence of some files or executables on its host machine, but many solutions address this issue much more thoroughly. Virtual machines and Docker containers can fill this role, and reliably reproduce a capsule of the environment with all its dependencies. Other tools like ReproZip can automatically track all files and libraries accessed by a set of experiments, and package them into a bundle that can be deployed elsewhere.
Runtime hosting
Similarly, LabPal expects a lab instance to be downloaded and run on some machine. It does not, by itself, provide online resources for running labs. This, however, is nicely taken care of by platforms such as CodeOcean. As a matter of fact, we are currently working with CodeOcean to facilitate the interplay of LabPal with the widgets already provided by the platform.
Artifact referencing and archival
LabPal provides a naming scheme for referencing resources inside a lab, but referencing the lab itself must be done through external means. There already exist many platforms, such as Dryad, DataHub, Software Heritage and Zenodo, that can store artifacts and assign them a unique DOI. Major publishers such as the IEEE and the ACM also support the uploading of files as auxiliary materials associated with a research paper. Since a lab instance is ultimately an "inert" JAR file, it can be treated as a piece of data and uploaded as a resource to any of these platforms.
Interactive notebooks
An alternate form for the presentation of scientific data is the "notebook", where text is interspersed with code instructions that generate tables and plots on the fly. This is the case, for example, of Jupyter and Scimax. We can also place in this category the "interactive widgets" (plots, tables, etc.) that can now be placed in the online version of a paper on some publishers' web sites (notably Elsevier). LabPal itself provides notebook-like facilities, as the user can write custom pages for its web interface, and these pages can include plots and tables generated by the lab itself. However, notebook solutions are generally not concerned with generating raw data in the first place, but rather with processing and displaying visualizations of this data.
Other languages
The choice of Java as the implementation language, and the various features offered by the library, are somewhat "personal": we wrote the tools that we needed. Virtually all work in our research lab uses Java, which made it natural to use Java when deciding to implement a scaffolding library to help us write papers. Of course, we are aware that a Java solution does not suit every possible use case; but neither does a solution written in any other language. (We shall mention that we developed template projects that allow users to write their labs in Python, connecting to LabPal using the Jython interpreter.) Similarly, if an experimental tool chain requires heavy use of external programs, compilers, etc., an environment such as Collective Knowledge (CK) is better suited to the task. However, since LabPal uses JSON as its native format for storing and exchanging data, it can easily be integrated into a CK workflow (which, too, uses JSON for communication between its components).
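To illustrate how JSON can serve as the common currency between tools, here is a minimal sketch that serializes a flat map of experimental results into a JSON object; the hand-rolled writer below is for illustration only, and a real lab would rely on a proper JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JsonExport {
    /**
     * Serializes a flat map of named numeric results into a JSON object
     * string, preserving the map's iteration order.
     */
    public static String toJson(Map<String, Double> results) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Double> e : results.entrySet()) {
            if (!first) {
                sb.append(",");
            }
            sb.append("\"").append(e.getKey()).append("\":").append(e.getValue());
            first = false;
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        Map<String, Double> r = new LinkedHashMap<>();
        r.put("runtime", 12.5);
        r.put("memory", 348.0);
        System.out.println(toJson(r)); // {"runtime":12.5,"memory":348.0}
    }
}
```

A file in this shape can be read by any JSON-speaking component of a larger workflow, without that component needing to know anything about the lab that produced it.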
A few years ago, Sandve et al. proposed ten simple rules for reproducible computational research []. Let us see how LabPal allows us to put a checkmark next to each of them:

1. For every result, keep track of how it was produced. LabPal takes care of this through its provenance tracking.
2. Avoid manual data-manipulation steps. LabPal offers high-level objects (experiments, tables) that provide many built-in data-manipulation operations, removing the need for custom scripts.
3. Archive the exact versions of all external programs used. Through a feature called environment checks, LabPal can be made to check the presence and version of various pieces of software (although we have seen that some other tools can complement it in this regard).
4. Version-control all custom scripts. Since a lab is just Java (or Python) code, it can be hosted and versioned on GitHub like any other piece of software.
5. Record all intermediate results, when possible in standard formats. All tables (including intermediate tables) can be exported to various formats, including CSV.
6. For analyses that include randomness, note underlying random seeds. Although this was not discussed here, LabPal can be initialized with specific random seeds, which are saved along with the lab's state.
7. Always store raw data behind plots. LabPal saves all data generated by the experiments, not just the last table that produces a plot.
8. Generate hierarchical analysis output, allowing layers of increasing detail to be inspected. The provenance tree is just that.
9. Connect textual statements to underlying results. We have seen how macros can be used to insert snippets of data computed by the lab directly into a paper. Moreover, each macro has a hyperlink to its LDI.
10. Provide public access to scripts, runs, and results. The single-file bundles LabPal produces can be hosted online on various platforms.
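The macro mechanism of rule 9 can be sketched as a text substitution pass over the paper's source; the \macro{...} placeholder syntax and the names below are illustrative assumptions, not LabPal's actual macro format.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Macros {
    // Matches placeholders of the (hypothetical) form \macro{name}.
    private static final Pattern MACRO = Pattern.compile("\\\\macro\\{(\\w+)\\}");

    /**
     * Replaces every \macro{name} occurrence in the text with the value
     * computed by the lab; unknown names are flagged with "??".
     */
    public static String expand(String text, Map<String, String> values) {
        Matcher m = MACRO.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(
                values.getOrDefault(m.group(1), "??")));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String tex = "The tool processes \\macro{throughput} events per second.";
        System.out.println(expand(tex, Map.of("throughput", "2.4 million")));
        // The tool processes 2.4 million events per second.
    }
}
```

Because the substituted values come straight from the lab's saved results, a number quoted in the text can never drift out of sync with the experiment that produced it.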
LabPal is released under an open source license and is publicly available online. Due to space restrictions, many of its unique features have only been mentioned in passing; the online documentation provides much more information, along with tutorial videos showing the framework in action.

In the "eat your own dog food" spirit, researchers at the Laboratoire d'informatique formelle (LIF), where LabPal is developed, have started systematically using it for all their papers (e.g. []), and provide links to downloadable lab instances wherever experimental data is mentioned. Now that the initial effort of developing LabPal has been completed, creating sets of experiments for papers has been greatly streamlined, and we would never consider going back to the "Stone Age" of custom command-line scripts.
Over time, it is hoped that journals and conferences will help advertise the existence of LabPal, and encourage its use by researchers, along with the many other tools and platforms mentioned earlier. The features provided by the framework are directly in line with their long-term goals of facilitating the execution of computer experiments for research purposes, and of sharing executable artifacts for cross-verification and extension by other researchers.
L. A. Barba. The hard road to reproducibility. Science, 2016.
R. F. Boisvert. Incentivizing reproducibility. Commun. ACM, 2016.
B. R. Childers and P. K. Chrysanthis. Artifact evaluation: Is it a real incentive? In IEEE International Conference on e-Science, Auckland, New Zealand. IEEE Computer Society, 2017.
C. S. Collberg and T. A. Proebsting. Repeatability in computer systems research. Commun. ACM, 59(3):62–69, 2016.
T. Crick, B. A. Hall, and S. Ishtiaq. "Can I implement your algorithm?": A model for reproducible research software. In WSSSPE, 2014.
D. L. Donoho, A. Maleki, I. U. Rahman, M. Shahram, and V. Stodden. Reproducible research in computational harmonic analysis. Computing in Science and Engineering, 2009.
S. Hallé, R. Khoury, A. El-Hokayem, and Y. Falcone. Decentralized enforcement of artifact lifecycles. In F. Matthes, J. Mendling, and S. Rinderle-Ma, editors, IEEE International Enterprise Distributed Object Computing Conference, EDOC 2016, Vienna, Austria. IEEE Computer Society, 2016.
S. Hallé, R. Khoury, and S. Gaboury. Event stream processing with multiple threads. In S. K. Lahiri and G. Reger, editors, Runtime Verification, RV 2017, Seattle, WA, USA, Lecture Notes in Computer Science. Springer, 2017.
R. Khoury, S. Hallé, and O. Waldmann. Execution trace analysis using LTL-FO^+. In T. Margaria and B. Steffen, editors, Leveraging Applications of Formal Methods, Verification and Validation, ISoLA 2016, Corfu, Greece, Lecture Notes in Computer Science, 2016.
G. K. Sandve, A. Nekrutenko, J. Taylor, and E. Hovig. Ten simple rules for reproducible computational research. PLoS Comput. Biol., 9(10):e1003285, 2013.
M. Schwab, N. Karrenbach, and J. F. Claerbout. Making scientific computations reproducible. Computing in Science and Engineering, 2(6):61–67, 2000.
11 Computer