REVISTA DE CIENCIA POLÍTICA / VOLUMEN 36 / N° 3 / 2016 / 829-848
How to Improve Your Relationship with Your Future Self

Jake Bowers, University of Illinois
Maarten Voors, Wageningen University
This essay provides practical advice about how to do transparent and reproducible data analysis and writing. We note that doing research in this way today will not only improve the cumulation of knowledge within a discipline, but it will also improve the life of the researcher tomorrow. We organize the argument around a series of homilies that lead to concrete actions. (1) Data analysis is computer programming. (2) No data analyst is an island for long. (3) The territory of data analysis requires maps. (4) Version control prevents clobbering, reconciles history, and helps organize work. (5) Testing minimizes error. (6) Work *can* be reproducible. (7) Research ought to be credible communication.

Key words: research transparency, reproducible research, workflow, methodology
1 Many thanks to the EGAP Learning Days 2016 participants in Santiago de Chile, to the BITSS team, and to the Department of Economics and Ted Miguel at UC Berkeley, where Maarten was a visiting researcher during spring 2016. Maarten gratefully acknowledges financial support from N.W.O. grant 451-14-001. A previous version of this paper benefited from comments and discussions with Mark Fredrickson, Brian Gaines, Kieran Healy, Kevin Quinn, Cara Wong, Mika LaVaque-Manty and Ben Hansen. The source code for this document may be freely downloaded and modified from the authors' GitHub repository. This paper extends a previous version by Jake Bowers (2011b). While most of the text remains the same, we have expanded and updated the essay to reflect current developments in technology and thinking about transparency and reproducibility of research.
“If you tell the truth, you don’t have to remember anything.”
(Twain 1975)
Memory is tricky. Learning requires effort. When we do not practice and repeat something that we want to remember, we forget quickly, become overconfident in our ability to recall information later (Koriat and Bjork 2005), and may even recall events that never happened.1 Moreover, most of us live busy lives. We type text knowing that the laundry needs doing, hearing children play or cry, ignoring the news or the latest journal review, worrying about a friend. Our lives and minds are full, and we need to move efficiently from task to task. If we cannot count on memory, then how can we do science?
How long does it take from planning a study to publication and then to the first reproduction of it? Is three years too short? Is ten years too long? We suspect that few of our colleagues in the social and behavioral sciences conceive of, field, write and publish a data-driven study faster than about three years. We also suspect that, if some other scholar decides to re-examine the analyses of a published study, it will occur after publication. Moreover, this new scholarly activity of learning from one another's data and analyses can occur at any time, many years past the initial publication of the article.2
If we cannot count on our memories about why we made this or that design or analysis decision, then what should we do? How can we minimize regret over our past decisions? How can we improve our relationship with our future self? This essay is a heavily revised and updated version of Bowers (2011b) and provides some suggestions for practices that will make reproducible data analysis easy and quick. Specifically, this piece aims to amplify some of what we already ought to know and do, and to highlight some current practices, platforms and possibilities.3 We aim to provide practical advice about how to do work such that one complies with such recommendations as a matter of course and, in so doing, can focus personal regret on bad past decisions that have nothing to do with data analysis and the production of scholarly papers.
1 See the following site for a nice overview of what we know about memory, including the fact that learning requires practice: 10-things-most-people-get-wrong.php. On false memory, see Wikipedia and the studies linked there.
2 The process of reproducing past findings can occur when one researcher wants to build on the work of another. It can also occur within the context of classes: some professors assign reproduction tasks to students to aid learning about data analysis and statistics. In addition to those models, reproduction of research has recently been organized to enhance the quality of public policy in the field of economic development by the 3IE Replication Program (Brown, Cameron, and Wood 2014) and to assess the quality of scientific research within social psychology (Open Science Collaboration 2015 and others) and within experimental economics (Camerer et al. 2016). In another study, 29 research teams recently collaborated on a project in applied statistics to see whether the same answers would emerge from re-analyses of the same data set (Silberzahn and Uhlmann 2015). They didn't.
3 King (1995) and Nagler (1995) were two of the first pieces introducing these kinds of ideas to political scientists. Now, efforts to encourage transparency and ease of learning from the data and analyses of others have become institutionalized with the DA-RT initiative (see also Lupia and Elman 2014). These ideas are spreading beyond political science as well (see Freese 2007; Asendorpf et al. 2013).
We organize the paper around a series of homilies that lead to certain concrete actions:

1. Data analysis is computer programming.
2. No data analyst is an island for long.
3. The territory of data analysis requires maps.
4. Version control prevents clobbering, reconciles history, and helps organize work.
5. Testing minimizes error.
6. Work can be reproducible.
7. Research ought to be credible communication.
1. Data analysis is computer programming

All your results (numbers, comparisons, tables, plots, figures) should be produced from code, not from a series of mouse clicks or copying and pasting.4 Imagine you wanted to re-create a figure but include a new variable: you should be able to do so with just a few edits to the code, rather than having to remember how you used a pointing device in your graphical user interface all those years ago.
Let's look at an example. Using an open-source statistical programming language called R (R Development Core Team 2016), you might specify that a file called fig1.pdf is produced by the following set of commands in a file called makefig1.R. Let's look at some annotated (mock) R code:
# This le produces a plot relating the explanatory variable to
the outcome.
## Read the data
thedata <- read.csv(“Data/thedata-15-03-2011.csv”)
## begin writing to the pdf le
please-plot(outcome by explanatory using thedata. red lines
please-add-a-line(using model1)
## Note to self: a quadratic term does not add to the substance
## model2 <- please-t(outcome by explanatory+explanatory^2
using thedata
## summary(abs(tted(model1)-tted(model2)))
4 To our future selves: both of us are using a computer control device called a track-pad, and haven’t used
an older device called a mouse in some years. Our current track-pads do not click. We are not sure why we
talked about mouse clicks just now.
## stop writing to the pdf le
Now, in the future, if you wonder how "that plot on page 10" was created, you will know: (1) "that plot" is from a file called fig1.pdf, and (2) fig1.pdf was created in makefig1.R. Any changes to the file in the future will just require some quick edits of commands already written (provided R still exists).5 Even if in the future R ceases to exist, you (or someone else) will at least be able to read the plain text commands and use them to write code in a new favorite statistical computing language: R scripts are written in plain text, and plain text is a format that will be around as long as computer programmers write computer programs.
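To make this concrete, here is what such a makefig1.R file might look like in actual R (the model, variable names, and color choices are our illustrative sketch, not the original file):

## makefig1.R: read the data, fit a line, and write fig1.pdf
thedata <- read.csv("Data/thedata-15-03-2011.csv")
model1 <- lm(outcome ~ explanatory, data = thedata)   ## the line to add to the plot
pdf("fig1.pdf")                                       ## begin writing to the pdf file
plot(outcome ~ explanatory, data = thedata, col = "red")
abline(model1)                                        ## add the fitted line
dev.off()                                             ## stop writing to the pdf file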
Moreover, coding saves time. Typing commands into a file pays off quickly, especially as projects grow in the number of files, collaborators, or complexity of analysis. Manually importing one data file into R may be effortless. Importing 100 files is another issue, and the time costs of each manual action add up quickly (as does the probability of mistakes).
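For example, a short loop turns 100 manual imports into a few lines of code; a minimal sketch (the Data/ directory and the .csv extension are illustrative):

files <- list.files("Data", pattern = "\\.csv$", full.names = TRUE)  ## all .csv files in Data/
alldata <- lapply(files, read.csv)                                   ## read each file
names(alldata) <- basename(files)                                    ## label each dataset by its file name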
Also, realize that file names send messages to co-authors and to your future self. This means that if you name your files with evocative and descriptive names, your collaborators are less likely to call you at midnight asking for help, and you will remove some regret from your future self and protect your friendships and working relationships. For example, if you are studying inequality and protest, you might try naming a file something like inequality-and-protest-figures.R instead of temp9.R or supercalifragilisticexpialidocious.R. By the way, the extension .R tells us and the operating system that the file contains R commands. This part of the filename enables us to quickly search our antique hard drives for files containing R scripts.6
Coding also helps us avoid making mistakes. For example, in our example above we may be interested in how many people protest, using a data file containing all protests for several years. People often create a tabular display of such data by copying the results manually into the working paper document. In copying we can make mistakes. A better approach is to create the table automatically (in the format you like, with horizontal lines, 3 decimals, a title, etc.) and save it in whatever file type you need (.tex, .pdf, .rtf, etc.). Now when we obtain new data, a new table can be created quickly, so we make fewer mistakes and save time!
For example, the following table was created entirely with code using R and the xtable package (Dahl 2016).7 We then read this table into the current file using the LaTeX command \input{protesttable.tex}, which produces a nice looking table in the pdf output.

5 We use data from Norris (2015) throughout this paper.
6 We think that some method of tagging files by the purpose of the file will continue to help analysts find and organize their files long after the idea of a computer mouse ceases to make sense.
7 Nathaniel Beck's advice about making regression and related output more helpful inspired this particular presentation of a linear model.
# Run the regression
lm1 <- lm(protest05 ~ gini04 + meanpr, data = good.df)
# Make the table file
makebecktable(lmobj = lm1,
  vars = c("Intercept", "Income Inequality (lower = more equal)",
           "Mean Political Rights (lower = more rights)"),
  thecaption = "People living in countries with unequal income distributions report more protest activity to World Values Survey interviewers than people living in countries with relatively more equal income distributions, adjusting for average political rights as measured by Freedom House 1980--2010.",
  thelabel = "tab:protest", filename = "protesttable.tex")
                                              Coef   Std. Err.   95% CI
Intercept                                     43.3   11.5        20.0, 66.7
Income Inequality (lower = more equal)        43.0   28.9        -15.6, 101.5
Mean Political Rights (lower = more rights)   -8.5    1.7        -11.9, -5.2
n: 41, σ̂: 18, R²: 0.41
Table 1: People living in countries with unequal income distributions report more protest activity to World Values Survey interviewers than people living in countries with relatively more equal income distributions, adjusting for average political rights as measured by Freedom House 1980–2010.
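If you do not have a helper like makebecktable at hand, the xtable package alone can write such a file; a minimal sketch using the lm1 object from above (the caption is abbreviated here):

library(xtable)
tab <- xtable(lm1,
  caption = "Protest activity by income inequality, adjusting for political rights.",
  label = "tab:protest")
print(tab, file = "protesttable.tex")  ## write LaTeX to a file, ready for \input{}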
Notice that we might have made one file called makefig1.R and another called protestincomedisttab.R, which in turn produce a pdf file (fig1.pdf) and a LaTeX format table (protesttable.tex). This idea of modularity (Nagler 1995) in code enables us to quickly find errors and, if we want to make changes to improve our work, to change only one file at a time. Further, it enhances collaboration: Jake can work on the figure and Maarten can work on the table without worrying about conflicting files.
STEP 1 Code everything that can be coded. If we know the provenance of results, future or current collaborators make fewer mistakes and can quickly and easily reproduce (and thus change and improve upon) the work.
2. No data analyst is an island for long

Data analysis involves a long series of decisions. Each decision requires a justification. Some decisions will be too small and technical for inclusion in the published article itself. Still, these need to be documented in the code itself (Nagler 1995). Paragraphs and citations in the publication will justify the most important decisions, but the code itself documents the smaller but still consequential decisions. So, one must code to communicate with one's future self and with others. There are two main ways to avoid forgetting the reasons you did something with data: comment your code, and tightly link your code with your writing—making your code literate.
Code is communication: Comment code
Comments, unexecuted text inside of a script, are a message to collaborators (including your future self) and other consumers of your work. In the code chunk above, we used comments to explain the lines to readers unfamiliar with R and to remember that we had tried a different specification but decided not to use it, because adding the squared term did not really change the substantive story arising from the model.8 Messages left for your future self (or near-future others) help retrace and justify your decisions as the work moves from seminar paper to conference paper to poster back to paper to dissertation and onwards, maybe even to publication.
Notice one other benefit of coding for an audience: we learn by teaching. By assuming that others will look at your code, you will be more likely to write clearer code, and perhaps even to think more deeply about what you are doing as you do it, since you are explaining even as you write.

Comment liberally. Comments are discarded when R or Stata runs the analysis, so only those who dig into the source code of your work will see them.
Code to communicate: Literate programming.
“Let us change our traditional attitude to the construction of
programs: Instead of imagining that our main task is to instruct
a computer what to do, let us concentrate rather on explaining
to human beings what we want a computer to do.”
(Knuth 1984, 97)
Imagine you discover something new (or confirm something old). You produce a nice little report on your work for use in discussions of your working group, or as a memo for a website or reviewer appendix. The report itself is a pdf or html file or some other format which displays page images to ease reading rather than to encourage reanalysis and rewriting. Eventually pieces of that report (tables, graphs, paragraphs) ought to show up in, or at least inform, the publishable paper. Re-creating those analyses by pointing, clicking, copying, or pasting would invite typing error and waste time. Re-creating your arguments justifying your analysis decisions would also waste time.

8 R considers text marked with # as a comment. In Stata, simply add a * before the text.
More importantly, we and others want to know why we did what we did. Such
explanations may not be very clear if we have some pages of printed code in one
hand and a manuscript in the other. Keep in mind the distinction between the
“source code” of a document (i.e. what computation was required to produce
it) and the visible, type-set page image. Page images are great for reading, but
not great for reproducing or collaborating. The source code of any document
exchanged by the group must be available and executable.9
How might one avoid these problems? Literate programming is the practice of weaving code into a document such that paragraphs, equations, and diagrams can explain the code, and the code can produce numbers, figures, and tables (and diagrams and even equations and paragraphs). Literate programming is not merely fancy commenting, but is about enabling the practice of programming itself to facilitate easy reproduction and communication.

For example, in Section 1, we suggested that we knew where "that plot on page 10" comes from by making sure we had a fig1.pdf file produced from a clearly commented plain text file called something like makefig1.R. An even easier solution would be to directly include a chunk of code to produce the figure inside of the paper itself. This paper, for example, was written in plain text using markdown markup with R code chunks to make things like Figure 1.10 Here is an example of embedding a plot inside of a document, along with the source of this paragraph. In a document sent to a journal one would tend to hide the code chunks, and thus one would only see the plot.
## Make a scatterplot of Protest by Inequality
par(bty = "n", xpd = TRUE, pty = "s", tcl = -0.25)
with(good.df, plot(gini04, protest05,
  xlab = "Gini Coefficient 2004 (UNDP)",
  ylab = "Proportion Reporting Protest Activities\n(World Values Survey 2005)",
  cex = 0.8))
## Label a few interesting points
with(good.df[c("Brazil", "United Kingdom", "United States", "Sweden", "Chile"), ],
  text(gini04, protest05, labels = Nation, srt = 0, cex = 0.6, pos = 3, offset = 0.1))
9 As of the 2016 version of this paper, this idea is now widespread and made much easier than before via
online services for code sharing and collaboration such as GitHub, Open Science Framework and BitBucket.
As more data analysis moves online, it will become easier for cross-platform and geographically distant
collaboration to occur. For just one set of examples, see Docker or other services that make cloud computing
easier and more accessible.
10 This combination of Markdown and R is called R Markdown.
Figure 1: Average number of protest activities by income inequality across
countries in 2004–2005.
[Scatterplot: Gini Coefficient 2004 (UNDP) on the horizontal axis; Proportion Reporting Protest Activities (World Values Survey 2005) on the vertical axis; Brazil, the United Kingdom, the United States, Sweden and Chile labeled.]
Markdown (and LaTeX and HTML) all have ways to cross-reference within a document. For example, by using \label{fig:giniprot} in LaTeX or {#fig:giniprot} in the pandoc version of markdown built into RStudio, we do not need to keep track of the figure number, nor do extra work when we reorganize the document in response to reviewer suggestions. Nor do we need a separate makefig1.R file or fig1.pdf file. Tables and other numerical results can also be generated within the source code of a scholarly paper. For example, if we had omitted the filename in makebecktable above, the LaTeX formatted table would have appeared within this document itself. For Stata users, there is now Markdoc, which is very similar to R Markdown.
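To make the idea concrete, here is a minimal sketch of an R Markdown source file (the title, chunk name, and chunk options are illustrative); rendering it produces a pdf in which the figure, its caption, and its number are generated by the chunk itself:

---
title: "Inequality and Protest"
output: pdf_document
---

Protest activity rises with income inequality, as the figure below shows.

```{r giniprot, echo = FALSE, fig.cap = "Protest by inequality."}
with(good.df, plot(gini04, protest05))
```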
The R project has a task view devoted to reproducible research, listing many of the different approaches to literate programming for R. If your workflow does not involve R, you can still implement some of the principles here. Imagine creating a style in Microsoft Word called "code" which hides your code when you print your document, but which allows you to at least run each code chunk piece by piece (or perhaps there are ways to extract all text of style "code" from a Microsoft Word document for use in some other program). Or one could just use some other kind of indication linking paragraphs to specific places in code files. There are many ways that creative people can program in a literate way.
Literate programming need not go against the principle of modular data analysis (Nagler 1995). Jake routinely uses several different files that fulfill different functions: some of them create LaTeX code that he can \input into his main.tex file, while others set up the data, run simulations, or allow him to record his journeys down blind alleys. Of course, when we have flying cars running on autopilot, perhaps something other than combining R and Markdown or LaTeX will make our lives even easier. Then we'll change.
STEP 2 We analyze data in order to explain something about the world to other scholars and policy makers. If we focus on explaining how we got our computers to do data analysis, we will do a better job with the data analysis itself: we will learn as we focus on teaching others about why we did what we did, and we will avoid errors and save time as we ensure that others (including our future selves) can retrace our steps. A paper that can be "run" to reproduce all of its analyses also instills confidence in readers and can more effectively spur discussion, learning, and the cumulation of research.
3. The territory of data analysis requires maps

Data analysis tends to involve multiple people working with multiple files. Future collaborators will need a map to understand the flow of inputs (like raw data) and outputs (like figures, tables, and individual numbers).
Meaningful code requires data
All files containing commands operating on data must refer to a data file. A reference to a data file is a line of code that the analysis program will use to operate on ("load" / "open" / "get" / "use") the data file. One should not have to edit this line on different computers or platforms in order to execute the command. Using R, for example, all analysis files should have load("thedata.rda") or read.csv("thedata.csv") or some equivalent line in them, and the data file should be stored in some easy-to-find place (like the same directory as the file, or perhaps in "Data/thedata.rda"). Of course, it's even better to include a comment pointing to the data file in addition to the line loading the file itself. This means one should not see lines like setwd("C:/JakesFiles/theproject") or setwd("/Users/maartenvoors/theproject") at the beginning of documents that are meant to be shared with a future self or with present or future others.
File organization can be a map itself
Where should one store data files? An obvious solution is to make sure that the data file used by a command file is in the same directory as the command file. More elegant solutions require all co-authors to have the same directory structure, so that load("Data/thedata.rda") means the same thing on all computers used to work on the project. This kind of solution is one of the things that Dropbox and more formal version control systems do well (as discussed a bit more below in Section 4).

The principle of modularity suggests that you separate data cleaning, processing, recoding and merging from analysis, in different files (Nagler 1995). So, perhaps your analysis files will load("cleandata.rda"), and a comment in the code will alert the future you (among others) that cleandata.rda was created from create-cleandata.R, which in turn begins with a read.csv(url("...")) call pointing at the original source of the data. Such a data processing file will typically end with something like save(cleandata, file = "cleandata.rda"), so that we are doubly certain about the provenance of the data.11
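A minimal sketch of such a create-cleandata.R file (the URL, variable name, and recode rule are illustrative):

## create-cleandata.R: download, clean, and save the working data
rawdata <- read.csv(url("http://..."))         ## the original source of the data
rawdata$gini04[rawdata$gini04 < 0] <- NA       ## example recode: negative codes mean missing
cleandata <- na.omit(rawdata)                  ## drop incomplete rows
save(cleandata, file = "Data/cleandata.rda")   ## analysis files load() this file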
Now, if in the future we wonder where cleandata.rda came from, we might search for occurrences of cleandata in the files on our system. Of course, if it is difficult to find the right version of cleandata on the system, it might help to know where files like cleandata tend to be: that is, to have a system for project file organization. Best practice is to create a folder structure where you separate folders by function and separate input from output files. For example, below we show a folder structure for a paper where we look at inequality. The paper is written in .tex, and the data analysis is done in both Stata (which Maarten prefers, so we see .do and .dta file extensions) and in R (the program of choice for Jake, hence the .R extensions). Maarten likes to make sure that the directory structure in his projects is the same across projects (so, for example, the numbers on the directory names allow for easy default sorting). File naming with full dates also allows for easy sorting:
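A sketch of such a structure (the dates and file names are illustrative):

00_archive/
01_data_raw/
    20160622_worldvalues_raw.dta
02_data_clean/
    20160622_worldvalues_clean.dta
03_analysis/
    20160622_prep_data.do
    20160622_models.do
04_output/
    figures/
    tables/
05_paper/
    20160622_inequalitypaper.tex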
11 Of course, if you need math or paragraphs to explain what is happening in these files, you might prefer to make them into R+Markdown or R+LaTeX files, for which the conventional extensions are .Rmd and .Rnw respectively. So you'd have create-cleandata.Rmd, written as a mixture of Markdown and R, which might explain and explore the different coding decisions you made, perhaps involving some diagnostic plots.
There are a couple of things to note. See that files start with a number to give them an ordering. Note also that we create an 00_archive folder: here you can store older files without having to delete them permanently, and you can reduce the number of files in your main folders. Also, there is a clear separation between raw data (data that came straight from the field, downloaded data, someone else's replication files, etc.) and clean data (data ready to use). Note that in the analysis folder we created two files, one for cleaning and merging data called 20160622_prep_data and another for running the analysis called 20160622_models. See also the placeholder output folders, with subfolders for figures and tables.
The input-output map can also be in the form of files
Another solution to the problem of finding files and knowing how files relate is to maintain a file for each project called MANIFEST.txt or INDEX.txt or README.txt, which lists the data and command files with brief descriptions of their functions and relations. However, this file may be a burden to maintain. Jake tends to be less organized than Maarten and relies on README files to tell his future self what is going on in a given directory, a version control system to keep track of versions of files (so that he doesn't need to add dates to filenames), and a Makefile to keep track of relationships between files. Notice that the analysis subdirectory has its own README file.
Imagine that paper.Rmd shows the boxplot.pdf file and also reports on the total size of the data. So, the introduction to the paper requires the workingdata.rda file (to know the sample size) and boxplot.pdf. In turn, those files depend on workingdata.R, which takes rawdata.csv and produces the workingdata.rda file, and boxplot.R, which uses workingdata.rda to make the plots. A Makefile records this web of relationships in a structured way, such that one can say make manuscript/paper.pdf and all of the files required to produce paper.pdf will be re-made if they have not recently been built or if their inputs have recently changed.12 Here is an example that records the relationships just described:
manuscript/paper.pdf: manuscript/paper.Rmd figures/boxplot.pdf
	cd manuscript && Rscript -e "library(rmarkdown); render('paper.Rmd')"

figures/boxplot.pdf: figures/boxplot.R data/workingdata.rda
	R CMD BATCH figures/boxplot.R

data/workingdata.rda: data/rawdata.csv data/workingdata.R
	R CMD BATCH data/workingdata.R
To produce paper.pdf having only downloaded rawdata.csv and the command files, one would type make manuscript/paper.pdf, and the GNU make system would first run R CMD BATCH data/workingdata.R; then, if that succeeds, it would run R CMD BATCH figures/boxplot.R; and finally it would run the line required for the creation of paper.pdf. Later, if only paper.Rmd is edited, make manuscript/paper.pdf would run only the line for paper.pdf, because it would know that boxplot.pdf and workingdata.rda are up to date: the .csv and .R files required for them have not changed.
STEP 3 We should know where the data came from and what operations were performed on which set of data. The authors of this paper have two different versions of this workflow. Both fulfill their purpose as a map to the territory of a data analysis.
12 See the GNU Make documentation for more about Makefiles.
4. Version control prevents clobbering, reconciles history, and helps organize work

Group work requires knowing which versions of files are new, which are old, and what changed in between. Many people are familiar with the "track changes" feature in modern WYSIWYG (what you see is what you get) word processors, or with the fact that Dropbox and Google Docs allow one to recover previous versions of files. These are both kinds of version control. More generally, when we collaborate, we'd like to do a variety of actions with our shared files. Collaboration on data analytic projects is more productive and better when: (1) it is easy to see what has changed between versions of files; (2) members of the team feel free to experiment and then to dump parts of the experimentation in favor of previous work, while merging the successful parts into the main body of the paper; (3) the team can produce "releases" of the same document (one to MPSA, one to APSR, one to their parents) without spawning many possibly conflicting copies of the same document; and (4) people can work on the same files at the same time without conflicting with one another, and can reconcile their changes without too much confusion and clobbering. Clobbering is what happens when your future self or your current collaborator saves an old version of a file over a new version, erasing good work by accident (or creating a conflicted copy, resulting in an arduous back-and-forth between versions).
Using Dropbox, "track changes" in Word, or Google Docs is one way to manage your collaborative efforts. These are popular tools, but they have some downsides: they require you to communicate with the other folks in your group before you can edit existing files. On Dropbox, two people cannot edit and save a given file at the same time without creating a conflicted copy. Tracking changes with Word has similar limitations (merging the changes between two documents is not simple or built-in). Google Docs allows two people to edit the same document at the same time, but you must be online with a fairly fast internet connection.13 Communication and trading off while editing a file prevents your work (or your colleagues' work) from getting lost when you both try to save the same file on top of each other. One may also use Dropbox as a kind of server for version control, as in the example above. This system is fine until your collaboration or project grows large (in terms of MBs) and you run out of storage space on your computer or need to upgrade your Dropbox storage. You will, however, need to agree with your collaborators on an iron rule for file and directory naming, to ensure that files and directories are always named the same across machines and versions, plus a clear communication plan for trading off.
An excellent, simple, and robust version control system is to rename your files with the date and time of saving them: yyyymmdd_project.docx (for changes within the same day, just before a deadline for example, you may even add a time, so it becomes 20160623_4PM_inequalitypaper.docx). This communicates version and prevents clobbering, unless the old file has the same name or you forget to update the new file name. Remember the days when you received a file called inequalityMV4_APSRsubmit_reallyfinal2.tex? That should quickly become a thing of the past. We find ourselves preaching this gospel to our collaborators frequently and will keep renaming files until a collaborator has been successfully pacified. Be sure to include the year in the file names—remember, the life of an idea is measured in years. If you are wise enough to have saved your documents as plain text, then you can easily compare versions of the same document using the many utilities available for comparing text files.14 When you reach certain milestones you can rename the file accordingly: thedocAPSA2009.tex—for the one sent to discussants at APSA—or thedocAPSR2015.tex—for the version eventually sent to the APSR six years after you presented it at APSA. The formal version control systems mentioned above all allow this kind of thing and are much more elegant and capable, but you can do it by hand as long as you don't mind taking up disk space and having many "thedoc..." files around. Indeed, the number of files can add up quickly (especially around submission time!) and you may want to create an 00_archive folder to store older versions. If you do version control by hand, spend a little extra time to ensure that you do not clobber files when you make mistakes typing in the filenames. And, if you find yourself spending extra time reconciling changes made by different collaborators by hand, remember that this is a task that modern version control systems take care of quickly and easily.

13 Note to future selves: change this section every five years or so. Probably stop saying 'internet' in 2026.
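Comparing two dated versions of a plain text file can then be a single command; a minimal sketch using the standard Unix diff utility (the file names are illustrative):

diff 20160622_inequalitypaper.tex 20160623_inequalitypaper.tex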
We feel the best practice is to use formal version control approaches to organizing work. As of 2016, the standard for managing large collaborations with many files is the Git system. User-friendly free front-ends for Git currently make the process of learning and using Git very convenient and integrated into the kind of workflow that your future self will be proud of: GitHub and BitBucket make it easier to use Git, and the Open Science Framework integrates with GitHub to further ease this task. Jake uses GitHub for all of his collaborations with others as well as with his future self: his collaborators appreciate some of the nice extra features of GitHub, such as the ability to keep a shared task list. Learning about version control systems takes a bit of time. We suggest, however, that it is well worth the investment, as it will save lots of time later on.
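To give a flavor of the workflow, here is a minimal sketch of a Git session at the command line (the file names, commit message, and tag are illustrative):

git init                                 ## start tracking the project directory
git add 03_analysis/models.R paper.Rmd   ## stage the files you changed
git commit -m "Add protest models"       ## record a snapshot with a message
git diff                                 ## see what changed since the last snapshot
git log                                  ## review the history of the project
git tag APSA2016                         ## mark a milestone, such as a conference release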
STEP 4 Writing is rewriting. Thus, all writing involves versions. When we collaborate with ourselves and others, we want to avoid clobbering and we want to enable graceful reconciliation of rewriting. One can do these things with formal systems of software (like Git), or with formal systems of file naming, file comparing and communication—or, even better, with both. In either case, plain text files will make such tasks easier, take up less disk space, and be easier to read for the future you.

14 Adobe Acrobat allows one to compare differences in pdf files. OpenOffice supports a "Compare Documents" option. Word now does the same. And Google Docs will report the version history of a document. On a Mac, the FileMerge utility works for plain text files.
5. Testing minimizes error

Anyone writing custom code should worry about getting it right. The more code one writes, the more time one has to appreciate problems arising from bugs, errors, and typos in data analysis and code. One thing we can do to minimize and catch the inevitable mistakes is to include testing in our coding.
This idea is not new. The desire to avoid error looms large when large groups of programmers write code for multi-million dollar programs. The idea of test-driven development, and the idea that one ought to create tests of small parts of one's code, arose to address such concerns.15
For the social scientist collaborating with her future self and/or a small group of collaborators, here is an example of this idea in a very simple form. Say you want to write a function to multiply a number by 2. If the function works, when you give it the number 4 you should see it return the number 8, and when you give it -4, you should get -8.
## The test function:
test.times.2.fn <- function() {
  ## This function tests times.2.fn
  if (times.2.fn(thenumber = 4) == 8 &
      times.2.fn(thenumber = -4) == -8) {
    print("It works!")
  } else {
    print("It does not work!")
  }
}

## The actual function is:
times.2.fn <- function(thenumber) {
  ## This function multiplies a scalar number by 2
  ## thenumber is a scalar number
  thenumber + 2
}
Here we use the test function to make sure it works:

test.times.2.fn()
[1] "It does not work!"
15 There are now R packages to help package writers do this; see, for example, the testthat package and the R Journal article describing it (Wickham 2011).
Ack! We mistyped + instead of *. Good thing we wrote the test!16
That approach works well when one is paying close attention to messages within the code. Another approach, which we like better, is to make the code stop with an error whenever it doesn't pass a test. In R we use the stopifnot function for this. For example, say we wanted to re-run our analyses excluding the countries with extreme values of protest. The following code makes a new dataset excluding places where more than 90% of the respondents to the World Values Survey reported some protest activity. If any value of protest05 in the new dataset is not less than 90, the code will stop with an error.

smalldat <- subset(good.df, subset = protest05 < 90)
stopifnot(all(smalldat$protest05 < 90))
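The testthat package mentioned in footnote 15 wraps the same idea in a more structured form; a minimal sketch of our earlier test, assuming times.2.fn is defined as above:

library(testthat)
test_that("times.2.fn doubles its argument", {
  expect_equal(times.2.fn(thenumber = 4), 8)
  expect_equal(times.2.fn(thenumber = -4), -8)
})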
STEP 5 No one can foresee all of the ways that a computer program can fail. One can, however, at least make sure that it succeeds at the task that motivated its writing in the first place, or that it produces an error in the right place and at the right time.
6. Work can be reproducible

Over the past decades, more and more attention has been paid to "reproducible research" and to creating a "reproducible workflow". As always, the devil is in the details. Here we list a few of our own attempts at enabling reproducible research; you'll find many other inspiring examples on the web. Luckily, the open source ethos aligns nicely with academic incentives, so we are beginning to find more and more people offering their files and ideas about workflow online for copying and improvement. By the way, if you do copy and improve, it is polite to alert the person from whom you made the copy, and to credit them in some way, if not also to offer your ideas for improvement.17
We have experimented with several systems so far. (1) We wrote this paper in the R Markdown literate programming format, using two different text editors (Jake used vim, Maarten used atom). We organized our files and collaboration on GitHub. The source code for this document can be accessed, downloaded and modified from GitHub. We also used GitHub Issues to send notes to each other and maintain a task list. A nice thing about GitHub is that it enables you to make "releases" of a project, which smooth the reproduction of all of the products of your research (simulations, data analyses, tables, figures, etc.). Similar features are offered by the Open Science Framework (OSF). Maarten recently participated in a workshop on research transparency organized by the Berkeley Initiative for Transparency in the Social Sciences in which all course material was accessible through OSF. OSF integrates nicely with GitHub, and a key benefit is that both systems have been built to last (unlike, perhaps, your university drive). (2) For one paper, Jake simply mixed R and LaTeX code in one document (called the "Sweave" format of literate programming) and then added that document and data into a compressed archive (Bowers and Drake 2005). (3) For another, more computing-intensive paper, Jake's collaborator Mark Fredrickson assembled a set of files that enabled reproduction of results using the make system (Bowers, Hansen, and Fredrickson 2008). (4) Jake also tried the "compendium" approach (Gentleman 2005; Gentleman and Temple Lang 2007), which embeds an academic paper within the R package system (Bowers 2011a). The benefit of the compendium approach is that one is not required to have access to a command line for make: the compendium is downloadable from within R using install.packages() and is viewable using the vignette() function in any operating system that runs R.18 The idea that one ought to be able to install, run, and use an academic paper just as one installs and uses statistical software packages is very attractive, and we anticipate that it will become ever easier to turn papers into R packages as creative and energetic folks turn their attention to the question of reproducible research.19 Finally, (5) Maarten has been using Dropbox for most of his collaborative projects (see the comments above). As a tool for making files publicly available, Dropbox is less useful. You can, of course, make replication files available by creating a public folder, but you will still need to refer to this folder on a particular project website (an example is Maarten's project on conflict and football).

16 A more common example of this kind of testing occurs every day when we recode variables into new forms but look at a crosstab of the old versus the new variable before proceeding.
17 For a formal example of copying and improving code, see the GitHub-based fork and pull-request workflow.
There are also some noteworthy new developments. See, for example, Jupyter Notebook, a web app that enables you to make and share documents that include text, code, equations and visualizations in one literate programming environment. For researchers working across different platforms, there is Docker, which promises to enable all of the collaborators on one project to use the same computing environment even if they are using different laptops running different operating systems.
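As a flavor of how this works, the single command below starts a containerized RStudio session using the community-maintained rocker/rstudio image (the image name, port, and password are illustrative choices); every collaborator who runs it gets an identical R environment:

docker run -d -p 8787:8787 -e PASSWORD=mypassword rocker/rstudio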
There are also great note-taking programs that help you communicate with collaborators. Remember, while email is fantastic for fast communication, it is not a tool designed for project management. It may work for projects with few co-authors, but when the number of collaborators increases it becomes very difficult to keep track. Most online systems for task management have mobile apps, including, amongst others, Flow, Asana, Wrike, Basecamp, Simplenote, Evernote and, for Windows users, OneNote.

18 Notice that Jake's reproduction archives and/or instructions for using them are hosted on the Dataverse, which is another system designed to enhance academic collaboration across time and space.
19 See, for example, the approaches listed in the R task view on reproducible research mentioned above.
STEP 6 We all learn by doing. When we create a reproducible workflow and share reproduction materials, we improve both the cumulation of knowledge and our methods for doing social science (Freese 2007; King 1995).
7. Research ought to be credible communication

"[I]f the empirical basis for an article or book cannot be reproduced, of what use to the discipline are its conclusions? What purpose does an article like this serve?"
(King 1995, 445)
We all always collaborate. Many of us collaborate with groups of people at one moment in time as we race against a deadline. All of us collaborate with ourselves over time. The time-frames over which collaboration is required—whether among a group of people working together, or within a single scholar's productive life, or probably both—are much longer than any given version of any given software will exist. Plain text is the exception.
But what if no one ever hears of your work or, by some cruel fate, your article does not spawn debate? Why, then, would you spend time communicating with your future self and others? Our own answer to this question is that it is efficient, and that on principle we want our work to be credible and useful to ourselves and other scholars. What we report in our data analyses should have two main characteristics: (1) the findings of the work should not be a matter of opinion; and (2) other people should be able to reproduce the findings. That is, the work represents a shared experience—and an experience shared without respect to the identities of others (although requiring some common technical training and research resources).
Assume we want others to believe us when we say something. More narrowly, assume we want other people to believe us when we say something about data: "data" here can be words, numbers, musical notes, images, ideas, etc. The point is that we are making some claims about patterns in some collection of stuff. For example, when Jake was invited into the homes and offices of ordinary people in Chile in 1991, the stuff was recordings of long and semi-structured conversations about life during the first year of democracy. Now, it might be easy to convince others that "this collection of stuff" is different from "that collection of stuff" if those others were looking over our shoulders the whole time that we made decisions about collecting the stuff, broke it up into understandable parts, and reorganized and summarized it. Unfortunately, we can't assume that people are willing to shadow a researcher throughout her career. Rather, we do our work alone or in small groups and want to convince other, distant and future, people to take our analyses and findings seriously.
Now, say your collections of stuff are large or complex, and your chosen tools of analysis are computer programs. How can we convince people that what we did with some data and some code is credible, not a matter of whim or opinion, and reproducible by others who didn't shadow us as we wrote our papers? This essay has suggested a few concrete ways to enhance the believability of such scholarly work. In addition, these actions (as summarized in the section headings of this essay) make collaboration within research groups more effective. Believability comes in part from reproducibility, and researchers often need to be able to reproduce, in part or in whole, what different people in the group have done or what they themselves did in the past.
In the end, following these practices, and those recommended by Fredrickson, Testa, and Weidmann (2011) and Healy (2011) among others working on these topics, allows your computerized analyses of your collections of stuff to be credible. If someone then quibbles with your analyses, your future self can shoot them the archive required to reproduce your work.20 You can say, "Here is everything you need to reproduce my work." To be extra helpful you can add, "Read the README file for further instructions." And then you can get on with enjoying your life full of friends, children, laundry, news, or even journal reviews.

20 Since you used plain text, the files will still be intelligible; since you analyzed the data using commented code, folks can translate your work to whatever system succeeds R; and, since you used R, you can include a copy of R and all of the R packages you used in your final analyses in the archive itself. You can even throw in a copy of whatever version of Linux you used, and an open-source virtual machine running the whole environment.

References
Asendorpf, Jens B., Mark Conner, Filip De Fruyt, Jan De Houwer, Jaap J. A. Denissen, Klaus Fiedler, Susann Fiedler, David C. Funder, Reinhold Kliegl, Brian A. Nosek, Marco Perugini, Brent W. Roberts, Manfred Schmitt, Marcel A. G. Van Aken, Hannelore Weber and Jelte M. Wicherts. 2013. "Recommendations for Increasing Replicability in Psychology". European Journal of Personality 27(2): 108-19.
Bowers, Jake. 2011a. "Reproduction Compendium for: 'Making Effects Manifest in Randomized Experiments'". Dataverse reproduction archive.
Bowers, Jake. 2011b. "Six Steps to a Better Relationship with Your Future Self". The Political Methodologist 18(2): 2-8.
Bowers, Jake and Katherine W. Drake. 2005. "Reproduction Archive for: 'EDA for HLM: Visualization when Probabilistic Inference Fails'". Dataverse reproduction archive.
Bowers, Jake, Ben B. Hansen and Mark M. Fredrickson. 2008. "Reproduction Archive for: 'Attributing Effects to A Cluster Randomized Get-Out-The-Vote Campaign'". Dataverse reproduction archive.
Brown, Annette N., Drew B. Cameron and Benjamin D. K. Wood. 2014. "Quality Evidence for Policymaking: I'll Believe It When I See the Replication". Journal of Development Effectiveness 6(3): 215-35.
Camerer, Colin F., Anna Dreber, Eskil Forsell, Teck-Hua Ho, Jürgen Huber, Magnus Johannesson, Michael Kirchler, Johan Almenberg, Adam Altmejd, Taizan Chan, Emma Heikensten, Felix Holzmeister, Taisuke Imai, Siri Isaksson, Gideon Nave, Thomas Pfeiffer, Michael Razen and Hang Wu. 2016. "Evaluating Replicability of Laboratory Experiments in Economics". Science 351(6280): 1433-1436.
Dahl, David B. 2016. "xtable: Export Tables to LaTeX or HTML". R package.
Fredrickson, Mark M., Paul F. Testa and Nils B. Weidmann. 2011. "Collaboration for Social Scientists, or Software Is the Easy Part". The Political Methodologist 18(2): 19-23.
Freese, Jeremy. 2007. "Replication Standards for Quantitative Social Science: Why Not Sociology?". Sociological Methods & Research 36(2): 153-72.
Gentleman, Robert. 2005. "Reproducible Research: A Bioinformatics Case Study". Statistical Applications in Genetics and Molecular Biology 4(1): 1-23.
Gentleman, Robert and Duncan Temple Lang. 2007. "Statistical Analyses and Reproducible Research". Journal of Computational and Graphical Statistics 16(1): 1-23.
Healy, Kieran. 2011. "Choosing Your Workflow Applications". The Political Methodologist 18(2): 9-18.
King, Gary. 1995. "Replication, Replication". PS: Political Science and Politics 28(3): 444-452.
Knuth, Donald E. 1984. "Literate Programming". The Computer Journal 27(2): 97-111.
Koriat, Asher and Robert A. Bjork. 2005. "Illusions of Competence in Monitoring One's Knowledge During Study". Journal of Experimental Psychology: Learning, Memory, and Cognition 31(2): 187-194.
Lupia, Arthur and Colin Elman. 2014. "Openness in Political Science: Data Access and Research Transparency". PS: Political Science & Politics 47(1): 19-42.
Nagler, Jonathan. 1995. "Coding Style and Good Computing Practices". PS: Political Science and Politics 28(3): 488-92.
Norris, Pippa. 2015. "Democracy Crossnational Data, Release 4.0".
Open Science Collaboration and others. 2015. "Estimating the Reproducibility of Psychological Science". Science 349(6251): aac4716.
R Development Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
Silberzahn, Raphael and Eric L. Uhlmann. 2015. "Crowdsourced Research: Many Hands Make Tight Work". Nature 526(7572): 189.
Twain, Mark. 1975. Mark Twain's Notebooks & Journals, Volume I: (1855-1873). Berkeley and Los Angeles: University of California Press.
Wickham, Hadley. 2011. "testthat: Get Started with Testing". The R Journal 3(1): 5-10.
Jake Bowers is an Associate Professor in the Departments of Political Science and Statistics at the University of Illinois at Urbana-Champaign, and a Fellow of the White House Social and Behavioral Sciences Team.

Maarten Voors is an Assistant Professor in the Development Economics Group at Wageningen University, the Netherlands.