The Replication Tax: Shifting the Financial Burden to Incentivize Reproducibility in Computational Research

Working Paper · January 2018
Version: 20180105, DOI:10.1145/3093338.3093378e
Robert Nagler and David Bruhwiler
RadiaSoft LLC, Boulder, CO 80301 USA
By their very nature, software and its outputs are reproducible. We regularly make exact copies of software
and its outputs. For example, you are reading an exact copy of this position paper.
Software can also be easily created. Almost every scientist has to learn to code, be it programming in C++
or scripting in R. Creating software has never been easier, and there are myriad ways to do computational
geospatial science. The fact that software is so malleable is a good thing.
The flexibility of software is unfortunately what inhibits reproducibility. Computational scientists
sometimes create Gordian Knots of software that become impossible for others to untangle. All manner of
advances in software engineering tools and programming languages have not and will not change this
behavior, because the root cause is cultural, not technical.
The Geospatial Software Institute (GSI) can effect change in reproducibility of geospatial research, but it
needs to take a different approach: a tax on irreplicability, or more simply, a replication tax.¹ The harder it
is to replicate a research result, the greater the cost (tax) to the research project promoting the result.
Conversely, trivially replicable results would impose only a small financial burden. For example, projects
using the cybergis-jupyter framework can be easily replicated, because the software and inputs are
defined clearly in a Jupyter notebook. [1]
Taxes (regulations) are rarely popular, especially on scientific research budgets, but they exist for a
reason: to incentivize good behavior and to promote public goods. For example, this workshop taxes
participants by requiring them to write a position paper in order to attend. This helps ensure an active and
engaged audience.
Participants are also taxed on the number of words used: no more than two pages in Times New Roman
font type at a size of 11 points or larger with one inch or wider margins. [2] As with replication, concise
text is not free.
The replication tax needs to be well-defined. The Association for Computing Machinery (ACM) Artifact
Review and Badging Policy defines replicability as follows: "for computational experiments, this means
that an independent group can obtain the same result using the author's own artifacts." [3] Replication is a
first and practical step towards true scientific reproducibility, which means "that an independent group
can obtain the same result using artifacts which they develop completely independently." [3]
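The ACM definition of replicability quoted above can be made concrete with a small check. The following is a minimal sketch, not part of this proposal or of the ACM policy: it assumes that "the same result" means a byte-identical output file, and the function names sha256_of and is_replicated, as well as the file names, are hypothetical and chosen only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_replicated(author_output: Path, replica_output: Path) -> bool:
    """Hypothetical check: treat a result as replicated only if the
    independently regenerated file is byte-identical to the author's file."""
    return sha256_of(author_output) == sha256_of(replica_output)


if __name__ == "__main__":
    # Illustrative data only: stand-ins for an author's published output
    # and the output an independent group regenerated from the same artifacts.
    with tempfile.TemporaryDirectory() as tmp:
        author = Path(tmp) / "author_result.csv"
        replica = Path(tmp) / "replica_result.csv"
        author.write_bytes(b"cell,value\n1,0.25\n")
        replica.write_bytes(b"cell,value\n1,0.25\n")
        print("replicated:", is_replicated(author, replica))
```

Byte-identical comparison is stricter than the ACM wording requires. Results that differ only in floating-point rounding or file metadata would need a tolerance-based comparison instead.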
Footnote 1: Replication tax is also easier to say than replication regulation, which is how an economist
would phrase our proposal for incentivizing reproducibility.
The replication tax shifts the burden of proof of replicability from the readers to the authors. Currently,
journals ask authors to "provide sufficient details to allow the work to be reproduced by an independent
researcher." [4] We propose the GSI require a concrete demonstration: either a link to a screencast from a
third party showing the replication of results using the same software as the author, or a link to an
executable replica of the experiment, such as a Jupyter notebook on tmpnb.org. [5]
The term tax may be toxic to some. The GSI might label the tax as a subsidy. Economists argue about
what type of incentive is the best means to change behavior. [6] Our proposal is not about the name, but
about the need for a hard requirement that authors bear the financial burden to help motivate change in
reproducibility. The precise form of this requirement is not important. Rather, to effect real change, a
clear standard must be promoted: all computational geospatial science results must be demonstrably
replicable.
References
http://rsl.link/gsi18
[1] Yin D, Liu Y, Padmanabhan A, Terstriep J, Rush J, Wang S. A cybergis-jupyter framework for
geospatial analytics at scale. PEARC 2017, Vol. Part F128771, a18, DOI: 10.1145/3093338.3093378e
rsl.link/gsi18/1
[2] Geospatial Software: Connecting Big Data with Geospatial Discovery and Innovation rsl.link/gsi18/2
[3] Artifact Review and Badging Publication Policy of the ACM rsl.link/gsi18/3
[4] Article Structure: Material and Methods in Guide for Authors of Elsevier rsl.link/gsi18/4
[5] Public Jupyter Notebook Server hosted by Rackspace rsl.link/gsi18/5
[6] Carbon tax v cap-and-trade: which is better? rsl.link/gsi18/6