Hastings et al. GigaScience 2014, 3:31
COMMENTARY Open Access
Ten recommendations for software engineering in research
Janna Hastings*, Kenneth Haug and Christoph Steinbeck
Research in the context of data-driven science requires a backbone of well-written software, but scientific researchers
are typically not trained at length in software engineering, the principles for creating better software products. To
address this gap, in particular for young researchers new to programming, we give ten recommendations to ensure
the usability, sustainability and practicality of research software.
Keywords: Software engineering, Best practices
Scientific research increasingly harnesses computing as a platform, and the size, complexity, diversity and relatively high availability of research datasets in a variety of formats are strong drivers for well-designed, efficient and maintainable software and tools. As the frontier of science evolves, new tools constantly need to be written. However, scientists, in particular early-career researchers, might not have received training in software engineering, so their code is at risk of being difficult and costly to maintain and re-use.
To address this gap, we have compiled ten brief software engineering recommendations.
1. Keep it simple
Every software project starts somewhere. A rule of thumb is to start as simply as you possibly can. Significantly more problems are created by over-engineering than by under-engineering. Simplicity starts with design: a clean and elegant data model is a kind of simplicity that leads naturally to efficient algorithms.
Do the simplest thing that could possibly work, and then
double-check it really does work.
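As an illustration of "the simplest thing that could possibly work", here is a minimal sketch for a hypothetical tool that looks up compounds by identifier. The `Compound` record and `index_by_identifier` helper are illustrative assumptions, not part of any real library. The data model is a small frozen dataclass plus a dictionary index, so lookup needs no search algorithm at all. A quick check then confirms that the code really works.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Compound:
    """Hypothetical record holding only the fields the tool needs today."""

    identifier: str
    name: str


def index_by_identifier(compounds: Iterable[Compound]) -> Dict[str, Compound]:
    """Map identifier -> Compound so a lookup is one dictionary access, not a list scan."""
    return {compound.identifier: compound for compound in compounds}


compounds = [Compound("C001", "glucose"), Compound("C002", "lactate")]
index = index_by_identifier(compounds)

# Double-check that it really works. A duplicated identifier would silently
# overwrite an entry, so the index would end up shorter than the input.
assert index["C002"].name == "lactate"
assert len(index) == len(compounds), "duplicate identifiers in input"
```

Extra fields can be added to `Compound` later, when a real requirement calls for them.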
Cheminformatics and Metabolism, European Molecular Biology Laboratory – European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD Hinxton, UK
2. Test, test, test
For objectivity, large software development efforts assign different people to test software than those who develop it. This is a luxury not available in most research labs, but there are robust testing strategies available to even the smallest project.
Unit tests are software tests that are executed automatically on a regular basis. In test-driven development, the tests are written first. They serve as a specification and check every aspect of the intended functionality as it is developed. Make sure that unit tests cover the full range of possible inputs to each method, not only those that seem reasonable. That means testing empty, missing, malformed and extreme values as well as typical ones.
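Here is a minimal sketch of test-first unit testing with Python's built-in `unittest` module. The function `normalise_intensities` and its rules are hypothetical. They were chosen only to show tests that probe unreasonable inputs (empty, all-zero and negative values) alongside the ordinary case.

```python
import math
import unittest


def normalise_intensities(values):
    """Scale non-negative intensities so they sum to 1.0 (hypothetical example).

    Raises ValueError for empty input, negative values or an all-zero input,
    because none of these can be normalised meaningfully.
    """
    if not values:
        raise ValueError("no intensities given")
    if any(v < 0 for v in values):
        raise ValueError("intensities must be non-negative")
    total = sum(values)
    if total == 0:
        raise ValueError("intensities sum to zero")
    return [v / total for v in values]


class TestNormaliseIntensities(unittest.TestCase):
    # In test-driven development these tests are written before the function.

    def test_typical_input_sums_to_one(self):
        result = normalise_intensities([2.0, 3.0, 5.0])
        self.assertTrue(math.isclose(sum(result), 1.0))
        for got, expected in zip(result, [0.2, 0.3, 0.5]):
            self.assertAlmostEqual(got, expected)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            normalise_intensities([])

    def test_all_zero_input_is_rejected(self):
        with self.assertRaises(ValueError):
            normalise_intensities([0.0, 0.0])

    def test_negative_input_is_rejected(self):
        with self.assertRaises(ValueError):
            normalise_intensities([1.0, -1.0])
```

If the block is saved as, say, `test_normalise.py`, the tests run with `python -m unittest test_normalise`.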
3. Do not repeat yourself
Copying and pasting code is a tempting technique when you encounter similar requirements.
Even though this seems to be the simplest approach, it
will not remain simple, because important lines of code
will end up duplicated. When making changes, you will
have to do them twice, taking twice as long, and you may
forget an obscure place to which you copied that code,
leaving a bug.
Automated tools, such as Simian, can help to detect duplication in existing codebases so that it can be fixed. To fix duplicated code (and any bugs it contains) in one place, consider writing a library with methods that can be called wherever they are needed.
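Here is a minimal sketch of removing duplication by moving repeated logic into one function. The input format (tab-separated, with `#` comment lines) and the name `read_tab_separated` are hypothetical. A fix such as skipping comment lines now only has to be made once, rather than in every copied loop.

```python
import io


def read_tab_separated(handle):
    """Return a list of rows from a tab-separated text stream.

    Blank lines and lines starting with '#' are skipped; each row is a list
    of field strings with surrounding whitespace removed.
    """
    rows = []
    for line in handle:
        text = line.rstrip("\r\n")
        if not text.strip() or text.startswith("#"):
            continue
        rows.append([field.strip() for field in text.split("\t")])
    return rows


# Every caller reuses the same function instead of a pasted copy of the loop.
samples = read_tab_separated(io.StringIO("# id\tgroup\nS1\tcontrol\nS2\ttreated\n"))
peaks = read_tab_separated(io.StringIO("# mz\tintensity\n\n180.06\t1200\n"))
```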
4. Use a modular design
Modules act as building blocks that can be glued together to achieve overall system functionality. They hide the details of their implementation behind a public interface, which provides all the methods that should be used. Users should code (and test) to the interface rather than the implementation. Thus, concrete implementation details can change without impacting downstream users of the module. Application programming interfaces (APIs) can be shared between different implementations.
© 2014 Hastings et al.; licensee BioMed Central. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Figure 1 An example of incomprehensible code: What does this code actually do? It contains a bug; is it easy to spot?
Scrutinise modules and libraries that already exist for the functionality you need. Do not rewrite what you can profitably re-use, and do not be put off if the best candidate third-party library contains more functionality than you need (now).
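Here is a minimal sketch of coding to an interface rather than an implementation, using Python's `typing.Protocol` (Python 3.8 and later). The `SpectrumSource` interface, both implementations and `total_intensity` are hypothetical. The key point is that `total_intensity` depends only on the interface, so either implementation can be swapped in without changing downstream code.

```python
from typing import Dict, List, Protocol


class SpectrumSource(Protocol):
    """Public interface: the only method callers are allowed to rely on."""

    def intensities(self, spectrum_id: str) -> List[float]:
        ...


class InMemorySource:
    """One implementation: spectra held in a dictionary."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        self._data = data  # implementation detail, hidden behind the interface

    def intensities(self, spectrum_id: str) -> List[float]:
        return list(self._data[spectrum_id])


class CsvTextSource:
    """Another implementation: spectra parsed from 'id,value,value,...' lines."""

    def __init__(self, text: str) -> None:
        self._rows: Dict[str, List[float]] = {}
        for line in text.strip().splitlines():
            spectrum_id, *values = line.split(",")
            self._rows[spectrum_id] = [float(v) for v in values]

    def intensities(self, spectrum_id: str) -> List[float]:
        return list(self._rows[spectrum_id])


def total_intensity(source: SpectrumSource, spectrum_id: str) -> float:
    """Downstream code written against the interface, not an implementation."""
    return sum(source.intensities(spectrum_id))


memory = InMemorySource({"s1": [1.0, 2.5]})
text = CsvTextSource("s1,1.0,2.5\ns2,4.0")
print(total_intensity(memory, "s1"), total_intensity(text, "s1"))  # 3.5 3.5
```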
5. Involve your users
Users know what they need software to do. Let them try the software as early as possible, and make it easy for them to give feedback, via a mailing list or an issue tracker. In an open source software development paradigm, your users can become co-developers. In closed-source and commercial paradigms, you can offer early-access beta releases to interested users.
Many sophisticated methods have been developed for user experience analysis. For example, you could hold an interactive workshop.
6. Resist gold plating
Sometimes, users ask for too much, leading to feature creep or “gold plating”. Learn to tell the difference between essential features and the long list of wishes users may have. Prioritise aggressively with as broad a collection of stakeholders as possible, perhaps using “gamestorming” techniques.
Gold plating is a challenge in all phases of development, not only in the early stages of requirements analysis. In its most mischievous disguise, just a little something is added in every iterative project meeting. Those little somethings add up.
7. Document everything
Comprehensive documentation helps other developers who may take over your code, and it will also help you in the future. Provide inline documentation, especially for any technically challenging blocks and public interface methods. However, there is no need for comments that mirror the exact detail of code line by line. It is better to have two or three lines of code that are easy to understand than one incomprehensible line (see Figure 1 for an example).
Write clean code [8] that you would want to maintain long-term (Figure 2). Meaningful, readable variable and method names are a form of documentation.
Write an easily accessible module guide for each module, explaining the higher-level view: What is the purpose of this module? How does it fit together with other modules? How does one get started using it?
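Here is a hypothetical function written in two ways, as an illustration only (this is not the code shown in Figures 1 and 2). The first version is a dense one-liner. The second uses meaningful names and a docstring. A short module-level summary also answers the module-guide questions in miniature. The module name `gc_tools` and both functions are assumptions made for this sketch.

```python
"""gc_tools: hypothetical helpers for simple DNA sequence statistics.

Purpose: compute summary statistics for nucleotide sequences.
Getting started: call gc_content("ACGT") to get the fraction of G and C bases.
"""


# Hard to read: what does this return, and what happens for empty input?
def f(s):
    return sum(c in "GCgc" for c in s) / len(s)


def gc_content(sequence: str) -> float:
    """Return the fraction of bases in `sequence` that are G or C.

    Case is ignored. An empty sequence raises ValueError instead of
    failing with an obscure ZeroDivisionError.
    """
    if not sequence:
        raise ValueError("cannot compute GC content of an empty sequence")
    gc_count = sum(1 for base in sequence.upper() if base in ("G", "C"))
    return gc_count / len(sequence)


print(gc_content("ACGT"))  # 0.5
```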
8. Avoid spaghetti
Since GOTO-like commands fell justifiably out of favour several decades ago, you might believe that spaghetti code is a thing of the past. However, a similar phenomenon may be observed in inter-method and inter-module relationships (see Figures 3 and 4). Debugging (stepping through your code line by line as it executes) can help you diagnose modern-day spaghetti code. Beware of module designs where, for every unit of functionality, you have to step through several different modules to discover where the error is. Along the way, you may long since have lost track of what the original method was actually doing or what the erroneous input was. Effective and granular logging is another way to trace and diagnose problems in the flow through code modules.
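Here is a minimal sketch of granular logging with Python's standard `logging` module. The logger name `biotool.parsing`, the `parse_peak` function and its input format are hypothetical. Each step records its input at DEBUG level, and a failure logs the offending input before the exception propagates. As a result, the log still shows what the original method was doing and which input caused the problem, even when the error surfaces several modules away.

```python
import logging
from typing import Tuple

# One named logger per module keeps messages traceable to their source.
logger = logging.getLogger("biotool.parsing")


def parse_peak(line: str) -> Tuple[float, float]:
    """Parse 'mz<TAB>intensity' into an (mz, intensity) tuple of floats."""
    logger.debug("parse_peak received %r", line)
    try:
        mz_text, intensity_text = line.split("\t")
        peak = (float(mz_text), float(intensity_text))
    except ValueError:
        # Record the offending input before re-raising, so it is not lost.
        logger.error("could not parse peak line %r", line)
        raise
    logger.debug("parse_peak returning %r", peak)
    return peak


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    parse_peak("180.06\t1200")
    try:
        parse_peak("180.06 1200")  # space instead of tab: logged, then raised
    except ValueError:
        pass
```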
9. Optimise last
Beware of optimising too early. Research applications are often performance-critical. Even so, until you truly encounter the wide range of inputs that your software will eventually run against in the production environment, it may not be possible to anticipate where the real bottlenecks will lie. Develop the correct functionality first and deploy it. Then improve it continuously, using repeated evaluation of the system running time as a guide (while your unit tests keep checking that the system is doing what it should).
Figure 2 This code performs the same function, but is written more clearly.
Figure 3 An unhealthy module design for ‘biotool’ with multiple interdependencies between different packages. An addition of functionality to the system (such as supporting a new field) requires updating the software in many different places. Refactoring into a simpler architecture would improve maintainability.
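To evaluate running time with measurement rather than guesswork, here is a minimal sketch using Python's standard `cProfile` and `pstats` modules. The functions `slow_lookup` and `build_report` are hypothetical stand-ins for real pipeline code, and `profile_call` is an illustrative helper rather than a library function. The report lists functions by cumulative time, which shows where optimisation effort would actually pay off.

```python
import cProfile
import io
import pstats


def slow_lookup(items, targets):
    """Correct but quadratic: scans the list once per target."""
    return [t for t in targets if t in items]


def build_report(n):
    items = list(range(n))
    targets = list(range(0, 2 * n, 2))
    return len(slow_lookup(items, targets))


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile; return (result, text of the top entries)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


result, report = profile_call(build_report, 2000)
print(result)  # 1000
print(report)  # inspect which functions account for the most cumulative time
```

Once the profile confirms where the time goes, a change such as converting `items` to a `set` can be made, and the existing unit tests will check that the result is unchanged.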
10. Evolution, not revolution
Maintenance becomes harder as a system gets older. Take time on a regular basis to revisit the codebase, and consider whether it can be renovated and improved.
However, avoid the urge to rewrite an entire system from the beginning, unless it is really the only option or the system is very small. Be pragmatic: you may never finish the rewrite. This is especially true for systems that were written without following the preceding recommendations.
Figure 4 The functional units from the biotool architecture can be grouped together in a refactoring process, putting similar functions together. The result may resemble a Model-View-Controller architecture.
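Here is a minimal sketch of a Model-View-Controller grouping of the kind Figure 4 describes. The classes `SampleModel`, `TableView` and `SampleController`, and the record fields, are hypothetical. The model owns the data, the view only formats it, and the controller coordinates the two. Adding a column is then a change to the list passed to the view, rather than an edit scattered across many packages.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SampleModel:
    """Model: owns the data and the rules for changing it."""

    records: List[Dict[str, str]] = field(default_factory=list)

    def add(self, record: Dict[str, str]) -> None:
        if "id" not in record:
            raise ValueError("every record needs an 'id' field")
        self.records.append(record)


class TableView:
    """View: turns records into tab-separated text; knows nothing about storage."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns

    def render(self, records: List[Dict[str, str]]) -> str:
        lines = ["\t".join(self.columns)]
        for record in records:
            lines.append("\t".join(record.get(c, "") for c in self.columns))
        return "\n".join(lines)


class SampleController:
    """Controller: handles an action by updating the model, then asks the view to render."""

    def __init__(self, model: SampleModel, view: TableView) -> None:
        self.model = model
        self.view = view

    def import_record(self, record: Dict[str, str]) -> str:
        self.model.add(record)
        return self.view.render(self.model.records)


controller = SampleController(SampleModel(), TableView(["id", "group"]))
output = controller.import_record({"id": "S1", "group": "control"})
print(output)
```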
Use a good version control system (e.g., Git) and a central repository (e.g., GitHub). In general, commit early and commit often, and not only when refactoring.
Effective software engineering is a challenge in any enterprise, but may be even more so in the research context. Among other reasons, the research context can encourage a rapid turnover of staff, with the result that knowledge about legacy systems is lost. There can be a shortage of software engineering-specific training, and the “publish or perish” culture may incentivise taking shortcuts.
Table 1 Further reading
Resource | Link
Software Carpentry: scientific computing skills; learn online or in face-to-face workshops | http://software-carpentry.org/
The Software Sustainability Institute | http://software.ac.uk/
Learn more about what makes code easy to maintain | http://www.thc.org/root/phun/
How to write good unit tests | http://developer.salesforce.com/
What is clean code? | http://java.dzone.com/articles/
Introduction to refactoring | http://sourcemaking.com/
The danger of premature optimisation |
This table lists additional online resources where the interested reader can learn
more about software engineering best practices in the research context.
The recommendations above give a brief introduction to established best practices in software engineering that may serve as a useful reference. Some of these recommendations may be debated in some contexts, but they are nevertheless important to understand and master. To learn more, Table 1 lists some additional online and educational resources.
The authors declare that they have no competing interests.
JH prepared the initial draft. All authors contributed to, and have read and
approved, the final version.
This commentary is based on a presentation given by JH at a workshop on
Software Engineering held at the 2014 annual Metabolomics conference in
Tsuruoka, Japan. The authors would like to thank Saravanan Dayalan for
organising the workshop and giving JH the opportunity to present. We would
furthermore like to thank Robert P. Davey and Chris Mungall for their careful
and helpful reviews of an earlier version of this manuscript.
Received: 25 September 2014 Accepted: 19 November 2014
Published: 4 December 2014
1. Goble C: Better software, better research. IEEE Internet Comput 2014,
2. Wilson G, Aruliah DA, Brown CT, Chue Hong NP, Davis M, Guy RT, Haddock
SHD, Huff KD, Mitchell IM, Plumbley MD, Waugh B, White EP, Wilson P:
Best practices for scientific computing. PLoS Biol 2014, 12:e1001745.
3. Beck K: Test Driven Development. New York: Addison-Wesley; 2002.
4. Simian: Similarity Analyzer. [http://www.harukizaemon.com/simian/]
5. Bloch J: Effective Java. New York: Addison-Wesley; 2008.
6. Pavelin K, Pundir S, Cham JA: Ten simple rules for running interactive
workshops. PLoS Comput Biol 2014, 10:e1003485.
7. Gray D, Brown S, Macanufo J: Gamestorming: A playbook for innovators, rulebreakers and changemakers. Sebastopol, CA: O’Reilly; 2010.
8. Martin RC: Clean Code: A Handbook of Agile Software Craftsmanship.
New York: Prentice Hall; 2008.
9. Dijkstra EW: Letters to the editor: go to statement considered
harmful. Commun ACM 1968, 11(3):147–148.
10. Fowler M: Refactoring: Improving the Design of Existing Code. Reading,
Massachusetts: Addison Wesley; 1999.
11. Hunt A, Thomas D: The Pragmatic Programmer. Reading, Massachusetts:
Addison Wesley Longman; 2000.
12. Brooks FP: The Mythical Man-Month. New York: Addison-Wesley; 1975.
13. The Git distributed version control management system.
14. GitHub: An online collaboration platform for Git version-controlled
Cite this article as: Hastings et al.: Ten recommendations for software engineering in research. GigaScience 2014, 3:31.