Costanza et al. BMC Bioinformatics (2019) 20:301
METHODOLOGY ARTICLE Open Access
A comparison of three programming
languages for a full-fledged next-generation
Pascal Costanza *†, Charlotte Herzeel†and Wilfried Verachtert
Background: elPrep is an established multi-threaded framework for preparing SAM and BAM files in sequencing
pipelines. To achieve good performance, its software architecture makes only a single pass through a SAM/BAM file
for multiple preparation steps, and keeps sequencing data as much as possible in main memory. Similar to other
SAM/BAM tools, management of heap memory is a complex task in elPrep, and it became a serious productivity
bottleneck in its original implementation language during recent further development of elPrep. We therefore
investigated three alternative programming languages: Go and Java using a concurrent, parallel garbage collector on
the one hand, and C++17 using reference counting on the other hand for handling large amounts of heap objects.
We reimplemented elPrep in all three languages and benchmarked their runtime performance and memory use.
Results: The Go implementation performs best, yielding the best balance between runtime performance and
memory use. While the Java benchmarks report a somewhat faster runtime than the Go benchmarks, the memory use
of the Java runs is significantly higher. The C++17 benchmarks run significantly slower than both Go and Java, while
using somewhat more memory than the Go runs. Our analysis shows that concurrent, parallel garbage collection is
better at managing a large heap of objects than reference counting in our case.
Conclusions: Based on our benchmark results, we selected Go as our new implementation language for elPrep, and
recommend considering Go as a good candidate for developing other bioinformatics tools for processing SAM/BAM
data as well.
Keywords: Next-generation sequencing, Sequence analysis, SAM/BAM files, C++, Go, Java, Runtime performance,
Memory usage, Garbage collection, Reference counting
The sequence alignment/map format (SAM/BAM) is
the de facto standard in the bioinformatics community for
storing mapped sequencing data. There exists a large body
of work on tools for processing SAM/BAM files for anal-
ysis [1–15]. The SAMtools , Picard , and Genome
Analysis Toolkit (GATK)  software packages devel-
oped by the Broad and Sanger institutes are considered
to be reference implementations for many operations on
SAM/BAM files, examples of which include sorting reads,
marking polymerase chain reaction (PCR) and optical
†Pascal Costanza and Charlotte Herzeel contributed equally to this work.
imec, ExaScience Lab, Kapeldreef 75, 3001 Leuven, Belgium
duplicates, recalibrating base quality scores, indel realign-
ment, and various filtering options, which typically pre-
cede variant calling. Many alternative software packages
[4–10,12,14,15] focus on optimizing the computations
of these operations, either by providing alternative algo-
rithms, or by using parallelization, distribution, or other
optimization techniques specific to their implementation
language, which is often C, C++, or Java.
We have developed elPrep [8,16], an open-source,
multi-threaded framework for processing SAM/BAM
files in sequencing pipelines, especially designed for
optimizing computational performance. It can be used
as a drop-in replacement for many operations imple-
mented by SAMtools, Picard, and GATK, while pro-
ducing identical results [8,16]. elPrep allows users to
© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Costanza et al. BMC Bioinformatics (2019) 20:301 Page 2 of 10
specify arbitrary combinations of SAM/BAM opera-
tions as a single pipeline in one command line. elPrep’s
unique software architecture then ensures that running
such a pipeline requires only a single pass through
the SAM/BAM file, no matter how many operations
are specified. The framework takes care of merg-
ing and parallelizing the execution of the operations,
which significantly speeds up the overall execution of
In contrast, related work focuses on optimizing
individual SAM/BAM operations, but we have shown
that our approach of merging operations outperforms this
strategy . For example, compared to using GATK4,
elPrep executes the 4-step Broad Best Practices pipeline
 (consisting of sorting, marking PCR and optical dupli-
cates, and base quality score recalibration and application)
up to 13x faster on whole-exome data, and up to 7.4x
faster on whole-genome data, while utilizing fewer com-
pute resources .
All SAM/BAM tools have in common that they need
to manipulate large amounts of data, as SAM/BAM files
easily take up 10–100 gigabytes (GB) in compressed
form. Some tools implement data structures that spill
to disk when reaching a certain threshold on random
access memory (RAM) use, but elPrep uses a strategy
where data is split upfront into chunks that are processed
entirely in memory to avoid repeated file input/output
. Our benchmarks show that elPrep’s representation
of SAM/BAM data is more efficient than, for example,
GATK version 4 (GATK4), as elPrep uses less memory
for loading the same number of reads from a SAM/BAM
file in memory . However, since elPrep does not pro-
vide data structures that spill to disk, elPrep currently
requires a fixed minimum amount of RAM to process a
whole-exome or whole-genome file, whereas other tools
sometimes allow putting a cap on the RAM use by using
disk space instead. Nonetheless, for efficiency, it is rec-
ommended to use as much RAM as available, even when
spilling to disk [8,18]. This means that, in general, tools
for processing SAM/BAM data need to be able to manip-
ulate large amounts of allocated memory.
In most programming languages, there exist more or
less similar ways to explicitly or implicitly allocate mem-
ory for heap objects which, unlike stack values, are not
bound to the lifetimes of function or method invocations.
However, programming languages strongly differ in how
memory for heap objects is subsequently deallocated. A
detailed discussion can be found in “The Garbage Collec-
tion Handbook” by Jones, Hosking, and Moss . There
are mainly three approaches:
Manual memory management Memory has to be
explicitly deallocated in the program source code
(for example by calling free in C ).
Garbage collection Memory is automatically managed
by a separate component of the runtime library
called the garbage collector. At arbitrary points in
time, it traverses the object graph to determine
which objects are still directly or indirectly accessible
by the running program, and deallocates inacces-
sible objects. This ensures that object lifetimes do
not have to be explicitly modelled, and that point-
ers can be more freely passed around in a program.
Most garbage collector implementations interrupt
the running program and only allow it to continue
executing after garbage collection – they “stop the
world”  – and perform object graph traversal
using a sequential algorithm. However, advanced
implementation techniques, as employed by Java 
and Go , include traversing the object graph
concurrently with the running program while limit-
ing its interruption as far as possible; and using a
multi-threaded parallel algorithm that significantly
speeds up garbage collection on modern multicore
Reference counting Memory is managed by maintain-
ing a reference count with each heap object. When
pointers are assigned to each other, these reference
counts are increased or decreased to keep track of
how many pointers refer to each object. Whenever
a reference count drops to zero, the corresponding
object can be deallocated.1
elPrep was originally, up to version 2.6, implemented
in the Common Lisp programming language . Most
existing Common Lisp implementations use stop-the-
world, sequential garbage collectors. To achieve good per-
formance, it was therefore necessary to explicitly control
how often and when the garbage collector would run to
avoid needless interruptions of the main program, espe-
cially during parallel phases. As a consequence, we also
had to avoid unnecessary memory allocations, and reuse
already allocated memory as far as possible, to reduce
the number of garbage collector runs. However, our more
recent attempts to add more functionality to elPrep (like
optical duplicate marking, base quality score recalibra-
tion, and so on) required allocating additional memory for
these new steps, and it became an even more complex task
and a serious productivity bottleneck to keep memory
allocation and garbage collection in check. We there-
fore started to look for a different programming language
using an alternative memory management approach to
continue developing elPrep and still achieve good perfor-
Existing literature on comparing programming lan-
guages and their implementations for performance
typically focus on specific algorithms or kernels in isola-
tion, no matter whether they cover specific domains like
Costanza et al. BMC Bioinformatics (2019) 20:301 Page 3 of 10
bioinformatics , economics , or numerical com-
puting , or are about programming languages in gen-
eral [27–31]. Except for one of those articles , none of
them consider parallel algorithms. Online resources that
compare programming language performance also focus
on algorithms and kernels in isolation . elPrep’s per-
formance stems both from efficient parallel algorithms for
steps like parallel sorting or concurrent duplicate mark-
ing, but also from the overall software architecture that
organizes these steps into a single-pass, multi-threaded
pipeline. Since such software-architectural aspects are not
covered by the existing literature, it therefore became
necessary to perform the study described in this article.
elPrep is an open-ended software framework that allows
for arbitrary combinations of different functional steps in
a pipeline, like duplicate marking, sorting reads, replac-
ing read groups, and so on; additionally, elPrep also
accommodates functional steps provided by third-party
tool writers. This openness makes it difficult to precisely
determine the lifetime of allocated objects during a pro-
gram run. It is known that manual memory management
can contribute to extremely low productivity when devel-
oping such software frameworks. See for example the
IBM San Francisco project, where a transition from C++
with manual memory management to Java with garbage
collection led to an estimated 300% productivity increase
. Other open-ended software frameworks for process-
ing SAM/BAM files include GATK4 , Picard , and
Therefore, manual memory management is not a practi-
cal candidate for elPrep, and concurrent, parallel garbage
collection and reference counting are the only remain-
ing alternatives. By restricting ourselves to mature pro-
gramming languages where we can expect long-term
community support, we identified Java and Go as the
only candidates with support for concurrent, parallel
garbage collection2, and C++17  as the only candi-
date with support for reference counting (through the
std::shared_ptr library feature).3
The study consisted of reimplementations of elPrep in
C++17, Go, and Java, and benchmarking their runtime
performance and memory usage. These are full-fledged
applications in the sense that they fully support a typical
preparation pipeline for variant calling consisting of sort-
ing reads, duplicate marking, and a few other commonly
used steps. While these three reimplementations of elPrep
only support a limited set of functionality, in each case
the software architecture could be completed with addi-
tional effort to support all features of elPrep version 2.6
Running a typical preparation pipeline using elPrep’s soft-
ware architecture in the three selected programming lan-
guages shows that the Go implementation performs best,
followed by the Java implementation, and then the C++17
To determine this result, we used a five-step prepara-
tion pipeline, as defined in our previous article , on a
whole-exome data set (NA12878 ). This preparation
pipeline consists of the following steps:
•Sorting reads for coordinate order.
•Removing unmapped reads.
•Marking duplicate reads.
•Replacing read groups.
•Reordering and filtering the sequence dictionary.
We ran this pipeline 30 times for each implementation,
and recorded the elapsed wall-clock time and maximum
memory use for each run using the Unix time com-
mand. We then determined the standard deviation and
confidence intervals for each set of runs .
C++17 and Java allow for fine-grained tuning of their
memory management, leading to four variations each.
For the final ranking in this section, we have chosen the
best result from each set of variations, one for C++17
and one for Java. The other results are presented in the
“Discussion” section below. The Go benchmarks were
executed with default settings.
The benchmark results for the runtime performance of
the three selected implementations are shown in Fig. 1.Go
needs on average 7 mins 56.152 secs with a standard devi-
ation of 8.571 secs; Java needs on average 6 mins 54.546
secs with a standard deviation of 5.376 secs; and C++17
needs on average 10 mins 23.603 secs with a standard
deviation of 22.914 secs. The confidence intervals for Go
interval for C++17.
The benchmark results for the maximum memory use
are shown in Fig. 2. Go needs on average ca. 221.73 GB
with a standard deviation of ca. 6.15 GB; Java needs on
average ca. 335.46 GB with a standard deviation of ca. 0.13
GB; and C++17 needs on average ca. 255.48 GB with a
standard deviation of ca. 2.93 GB. Confidence intervals are
The goal of elPrep is to simultaneously keep both the
runtime and the memory use low. To determine the final
ranking, we therefore multiply the average elapsed wall-
clock time in hours (h) with the average maximum mem-
ory use in gigabytes (GB), with lower values in gigabyte
hours (GBh) being better. This yields the following values
(cf. Fig. 3):
•29.33 GBh for Go
•38.63 GBh for Java
•44.26 GBh for C++17
This appropriately reflects the results of the benchmarks:
While the Java benchmarks report a somewhat faster run-
Costanza et al. BMC Bioinformatics (2019) 20:301 Page 4 of 10
Fig. 1 Runtime performance. Average elapsed wall-clock times in minutes for the best Go, Java, and C++17 implementations, with confidence
time than the Go benchmarks, the memory use of the
Java runs is significantly higher, leading to a higher GBh
value than for the Go runs. The C++17 runs are sig-
nificantly slower than both Go and Java, explaining the
highest reported GBh value. We therefore consider Go to
be the best choice, yielding the best balance between run-
time performance and memory use, followed by Java and
Memory management issues in elPrep in more detail
sorting of reads and duplicate marking, among other steps
. Such a pipeline executes in two phases: In the first
phase, elPrep reads a BAM input file, parses the read
entries into objects, and performs duplicate marking and
some filtering steps on the fly. Once all reads are stored as
heap objects in RAM, they are sorted using a parallel sort-
ing algorithm. Finally, in the second phase, the modified
reads are converted back into entries for a BAM output
file and written back. elPrep splits the processing of reads
into these two phases because writing the reads back to an
output file can only commence once duplicates are fully
known and reads are fully sorted in RAM.
Phase 1 allocates various data structures while pars-
ing the read representations from BAM files into heap
objects. A subset of these objects become obsolete after
phase 1. The different memory management approaches
outlined in the “Background” section above deal with
these temporary objects in different ways.
A garbage collector needs to spend time to classify these
obsolete objects as inaccessible and deallocate them. A
stop-the-world, sequential garbage collector creates a sig-
nificant pause in which the main program cannot make
Costanza et al. BMC Bioinformatics (2019) 20:301 Page 5 of 10
Fig. 2 Maximum memory use. Average maximum memory use in GB for the best Go, Java, and C++17 implementations, with confidence intervals
Fig. 3 Final ranking of programming languages. Average elapsed wall-clock times multiplied by average maximum memory use in GBh
Costanza et al. BMC Bioinformatics (2019) 20:301 Page 6 of 10
progress. This was the case with the previous elPrep ver-
sions (up to version 2.6), which is why we provided an
option to users to disable garbage collection altogether
in those versions . In contrast, a concurrent, parallel
garbage collector can perform its job concurrently with
phase 2, which can therefore commence immediately.
With reference counting, objects are recognized as
obsolete due to their reference counts dropping to zero.
Deallocation of these objects leads to transitive dealloca-
tions of other objects because of their reference counts
transitively dropping to zero. Since this is an inherently
sequential process, this leads to a similar significant pause
as with a stop-the-world garbage collector.
C++17 performance in more detail
C and C++ typically perform much better than other pro-
gramming languages in most benchmarks that focus on
isolated algorithms or kernels [24–26,28–30]. Since our
C++17 implementation of elPrep uses reference counting,
this performance gap may be explained by the dealloca-
tion pause caused by reference counting, as described in
the previous subsection.
To verify this theory, we timed each phase and the deal-
location pause in the C++17 implementation of elPrep
separately, and repeated the benchmark another 30 times
to determine the timings, standard deviations, and confi-
dence intervals. The results are shown in Fig. 4.Thefirst
phase needs on average 4 mins 26.657 secs, with a stan-
dard deviation of 6.648 secs; the deallocation pause needs
on average 2 mins 18.633 secs, with a standard deviation
of 4.77 secs; and the second phase needs on average 3 mins
33.832 secs, with a standard deviation of 17.376 secs.
The average total sum of the 30 C++17 runtimes is 10
mins 19.122 secs with a standard deviation of 22.782 secs.
If we substract the timings of the deallocation pause from
the average total runtime, we get 8 mins 0.489 secs with a
standard deviation of 20.605 secs. This is indeed very close
to the Go benchmarks which, as reported above, need
on average 7 mins 56.152 secs. We therefore conclude
that the performance gap between the C++17 version
and the Go and Java versions can indeed be explained by
the deallocation pause caused by the reference counting
mechanism in C++17.
C++ provides many features for more explicit memory
management than is possible with reference counting. For
example, it provides allocators  to decouple memory
management from handling of objects in containers. In
principle, this may make it possible to use such an allo-
cator to allocate temporary objects that are known to
become obsolete during the deallocation pause described
above. Such an allocator could then be freed instantly,
removing the described pause from the runtime. However,
this approach would require a very detailed, error-prone
analysis which objects must and must not be managed
by such an allocator, and would not translate well to
other kinds of pipelines beyond this particular use case.
Since elPrep’s focus is on being an open-ended software
framework, this approach is therefore not practical.
Tuning of memory management in C++17
The performance of parallel C/C++ programs often suf-
fers from the low-level memory allocator provided by the
C/C++ standard libraries. This can be mitigated by linking
a high-level memory allocator into a program that reduces
Fig. 4 Runtimes of phases in the C++17 implementation. Average elapsed wall-clock times in minutes for the two main phases of an elPrep pipeline
in the C++17 implementation, and the deallocation pause in between phase 1 and 2 caused by the reference counting mechanism, with
confidence intervals. The second row depicts the same averages as in the first now, but without the deallocation pause. The sum of the two phases
in the second row is very close to the Go runtimes shown in Fig. 1
Costanza et al. BMC Bioinformatics (2019) 20:301 Page 7 of 10
synchronization, false sharing, and memory consumption,
among other things . Such a memory allocator also
groups objects of similar sizes into separate groups that
can be allocated from the operating system and freed
again in larger blocks, to efficiently handle even large
numbers of small-scale heap allocations in programs.
These are techniques which are also commonly found in
garbage-collected programming languages, but are largely
independent from whether memory management is auto-
matic or manual . In our study, we have benchmarked
the C++17 implementation using the default unmodi-
fied memory allocator, the tbbmalloc allocator from Intel
Threading Building Blocks , the tcmalloc allocator
from gperftools , and the jemalloc allocator . The
measurements are shown in Table 1. According to the
listed GBh values, jemalloc performs best.
Tuning of memory management in Java
Java provides a number of tuning options for its mem-
ory management . Since our Java implementation of
elPrep suffers from a significantly higher average maxi-
mum memory use than the C++17 and Go implementa-
tions, we have investigated two of these options in more
•The string deduplication option identifies strings
with the same contents during garbage collection,
and subsequently removes the redundancy by letting
these strings share the same underlying character
arrays. Since a significant portion of read data in
SAM/BAM files is represented by strings, it seemed
potentially beneficial to use this option.
•The minimum and maximum allowed percentage of
free heap space after garbage collection can be
configured using the “MinFreeHeap” and
“MaxFreeHeap” options to minimze the heap size.
We ran the Java benchmark 30 times each with the
following cofigurations: with the default options; with
just the string deduplication option; with just the free-
heap options; and with both the string deduplication
and the free-heap options. For the free-heap options, we
followed the recommendation of the Java documenta-
tion to reduce the heap size as far as possible without
Table 1 Performance results for the different memory allocators
used in the C++17 benchmarks: (1) default allocator,
(2) tbbmalloc, (3) tcmalloc, (4) jemalloc
Average runtime Average memory Product
(1) 16 mins 57.467 secs 233.63 GB 66.03 GBh
(2) 16 mins 26.450 secs 233.51 GB 63.96 GBh
(3) 11 mins 24.809 secs 246.78 GB 46.94 GBh
(4) 10 mins 23.603 secs 255.48 GB 44.26 GBh
causing too much performance regression. The mea-
surements are shown in Table 2: The free-heap options
show no observable impact on the runtime perfor-
mance or the memory use, and the string deduplica-
tion option increases the average elapsed wall-clock time
with a minor additional increase in memory use. Accord-
ing to the listed GBh values, Java with default options
Due to the concurrency and parallelism of Go’s and
Java’s garbage collectors, the elPrep reimplementations in
these programming languages perform significantly faster
than the C++17 implementation which relies on reference
counting. Since the Go implementation uses significantly
less heap memory than the Java implementation, we there-
fore decided to base the official elPrep implementation
since version 3.0 on Go.
Based on our positive experiences, we recommend
authors of other bioinformatics tools for processing
SAM/BAM data, and potentially also other sequencing
data formats, to also consider Go as an implementation
language. Previous bioinformatics tools that are imple-
mented in Go include bíogo , Fastcov , SeqKit ,
and Vcfanno , among others.
Existing literature on comparing programming languages
for performance strives to replicate algorithm or ker-
nel implementations as close to each other as possi-
ble across different programming languages, to ensure
fair comparisons of the underlying compiler and run-
time implementations. We focused on taking advantage
of the respective strengths of the different programming
languages and their libraries instead. Eventually, a reim-
plementation of elPrep would have to do this anyway to
achieve optimal performance, so this approach results in a
more appropriate assessment for our purpose. For exam-
ple, in C++17 we have used Intel’s Threading Building
Blocks as an advanced library for parallel programming,
and benchmarked different memory allocators optimized
for multi-threaded programs; in Go, we have relied on
Table 2 Performance results for the different memory
management options used in the Java benchmarks: (1) default
options, (2) with string deduplication, (3) with heap-free options,
(4) with string deduplication and heap-free options
Average runtime Average memory Product
(1) 6 mins 54.546 secs 335.46 GB 38.63 GBh
(2) 7 mins 30.815 secs 338.74 GB 42.42 GBh
(3) 6 mins 55.842 secs 335.45 GB 38.75 GBh
(4) 7 mins 25.415 secs 338.74 GB 41.91 GBh
Costanza et al. BMC Bioinformatics (2019) 20:301 Page 8 of 10
its concurrency support through goroutines and channels
for communicating between them; and in Java, we have
based elPrep on its framework to support functional-
style operations on streams of elements in the package
java.util.Stream introduced in Java 8.
The benchmarks have all been performed on a Supermi-
cro SuperServer 1029U-TR4T node with two Intel Xeon
Gold 6126 processors consisting of 12 processor cores
each, clocked at 2.6 gigahertz (GHz), with 384 GB RAM.
The operating system used for the benchmarks is the
CentOS 7 distribution of Linux.
We have used the following compilers and libraries:
•C++17: GNU g++ version 7.2.1
– Threading Building Blocks 2018 Update 2
– gperftools version 2.6.3
– jemalloc version 5.0.1
•Go: Official Go distribution version 1.9.5
•Java: Java Platform, Standard Edition (JDK) 10
For C++17, we additionally used the Intel Threading
Building Blocks, gperftools, and jemalloc libraries. The Go
and Java versions do not require additional libraries.
We verified that all implementations produce exactly
the same results by using the method described in our pre-
vious paper on elPrep . This method consists of the
1 We verify that the resulting BAM file is properly
sorted by coordinate order with samtools index.
2 We remove the program record identifier tag (PG)
and alphabetically sort the optional fields in each
read with biobambam.
3 We sort the BAM file by read name and store it in
SAM format with samtools sort.
4 Finally, we verify that the contents are identical with
the result of the original elPrep version with the Unix
1Object graphs with cycles cannot be easily reclaimed
using reference counting alone. However, such cyclic data
structures have not occurred yet in elPrep, which is why
we do not discuss this issue further in this paper.
2Specifically, Java uses concurrent, parallel Garbage-
firstgarbagecollection, whereas Go uses a more
traditional concurrent, parallel mark-and-sweep collector
3Other mature programming languages with support
for reference counting include Objective-C, Swift, and
Rust . However, in its algorithm for duplicate marking,
elPrep requires an atomic compare-and-swap operation
on reference-counted pointers, which does not exist in
those languages, but exists in C++17.
4We have not performed a detailed comparison against
the original version of elPrep implemented in Common
Lisp, but based on previous performance benchmarks,
the Go implementation seems to perform close to the
Common Lisp implementation.
BAM: The binary file format equivalent of SAM files; biobambam: A program
for processing BAM files; CentOS 7: A distribution of Linux; diff: A Unix tool for
reporting the differences between two text files; GATK: Genome Analysis
Toolkit: A software framework for variant discovery, genotyping, and other
kinds of analysis of sequencing data; GATK4: Genome Analysis Toolkit version
4; GBh: Gigabyte hours: A measure of memory size multiplied by time; GHz:
Gigahertz: A measure of computer processor speed; GNU g++: A C++
compiler; gperftools: Google Performance Tools: A C/C++ library that contains
tcmalloc and additional memory analysis tools; htsjdk: A Java library for
processing DNA sequencing data; jemalloc: A C/C++ library for managing
heap memory; NA12878: A publicly available DNA sequencing data set; PCR:
Polymerase chain reaction: A method for copying DNA segments; PG:
Program record identifier: Entries used in SAM/ BAM files for recording which
software programs were used to create them; RAM: Random access memory:
The main memory in a computer; SAM: Sequence Alignment/Map format: A
text file format for representing aligned DNA sequencing data; samtools: A
software tool for processing SAM/BAM files; Superserver 1029U-TR4T: A
particular server computer system sold by Supermicro; tbbmalloc: A C/C++
library for managing heap memory, part of Intel Threading Building Blocks;
tcmalloc: A C/C++ library for managing heap memory, part of gperftools;
Xeon Gold 6126: A particular server computer processor sold by Intel
The authors are grateful to the imec.icon GAP project members, and especially
Western Digital for providing the compute infrastructure for performing the
benchmarks described in this paper. The authors also thank Thomas J. Ashby
and Tom Haber for in-depth discussions about memory management
techniques in various programming languages.
No funding was received for this study.
Availability of data and materials
The source code for the different elPrep implementations are available at the
•Common Lisp: https://github.com/exascience/cl-elprep
•C++17, Java: https://github.com/exascience/elprep-bench
The five-step preparation pipeline benchmarked in this paper corresponds to
the pipeline implemented in the script run-wes-gatk.sh, which is
available at https://github.com/exascience/elprep/tree/v3.04/demo together
with its input files.
PC designed and performed the study, participated in the Common Lisp and
Go implementations of elPrep, implemented the C++17 and Java versions of
elPrep, and drafted the manuscript. CH designed the elPrep software
architecture and the benchmarked preparation pipeline, participated in the
Common Lisp and Go implementations of elPrep, and drafted the manuscript.
PC, CH and WV contributed to the final manuscript. All authors read and
approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
Costanza et al. BMC Bioinformatics (2019) 20:301 Page 9 of 10
The authors are employees of imec, Belgium, and declare that they have no
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Received: 21 February 2019 Accepted: 15 May 2019
1. Li H, Hansaker B, Wysoker A, Fennell T, Ruan J, Homer N, Abecasis G,
Durbin R. TheSequenceAlignment/Mapformat and SAMtools. Bioinformatics.
2. Broad Institute. Picard. http://broadinstitute.github.io/picard. Accessed 19
3. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A,
Garimella K, Altshuler D, Gabriel S, Daly M, DePristo M. The Genome
Analysis Toolkit: A MapReduce framework for analyzing nextgeneration
DNA sequencing data. Genome Res. 2010;20:1297–303. https://doi.org/
4. Tarasov A, Vilella A, Cuppen E, Nijman I, Prins P. Sambamba: fast
processing of NGS alignment formats. Bioinformatics. 2015;31(12):
5. Tischler G, Leonard S. biobambam: tools for read pair collation based
algorithms on BAM files. Source Code Biol Med. 2014;9(13):. https://doi.
6. Jun G, Wing M, Abecasis G , Kang H. An efficient and scalable analysisframe work
forvariant extraction and refinement from population scale DNA sequence
data. Genome Res. 2015. https://doi.org/10.1101/gr.176552.114.
7. Faust GG, Hall IM. SAMBLASTER: fast duplicate marking and structural
variant read extraction. Bioinformatics. 2014;30(17):2503–5. https://doi.
8. Herzeel C, Costanza P, Decap D, Fostier J, Verachtert W. elPrep 4: A
multithreaded framework for sequence analysis. PLoS ONE. 2019;14(2):.
9. Decap D, Reumers J, Herzeel C, Costanza P, Fostier J. Halvade: scalable
sequence analysis with MapReduce. Bioinformatics. 2015;31(15):2482–8.
10. Nothaft FA, Massie M, Danford T, Zhang Z, Laserson U, Yeksigian C,
Kottalam J, Ahuja A, Hammerbacher J, Linderman M, Franklin M,
Joseph AD, Patterson DA. Rethinking data-intensive science using
scalable analytics systems. In: Proceedings of the 2015 International
Conference on Management of Data (SIGMOD ’15). New York: ACM; 2015.
11. Guimera R. bcbio-nextgen: Automated, distributed next-gen sequencing
pipeline. EMBnet.journal. 2012;17:30. https://doi.org/10.14806/ej.17.B.286.
12. Niemenmaa M, Kallio A, Schumacher A, Klemela P, Korpelainen E,
Heljanko K. Hadoop-BAM: directly manipulating next generation
sequencing data in the cloud. Bioinformatics. 2012;28(6):876–7. https://
13. Deng L, Huang G, Zhuang Y, Wei J, Yan Y. Higene: A high-performance
platform for genomic data analysis; 2017. https://doi.org/10.1109/BIBM.
14. Decap D, Reumers J, Herzeel C, Costanza P, Fostier J. Halvade-RNA:
Parallel variant calling from transcriptomic data using MapReduce. PLOS
ONE. 2017;12(3):. https://doi.org/10.1371/journal.pone.0174575.
15. Weeks N, Luecke G. Cluster computing - The Journal of Networks
Software Tools and Applications. 2017;20(3):1869–80. https://doi.org/10.
16. Herzeel C, Costanza P, Decap D, Fostier J, Reumers J. elPrep:
High-performance preparation of sequence alignment/map files for
variant calling. PLoS ONE. 2015;10(7):. https://doi.org/10.1371/journal.
17. Van der Auwera GA, Carmeiro MO, Hartl C, Poplin R, del Angel G,
Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E,
Garimella KV, Altshuler D, Gabriel S, DePristo MA. From FastQ data to
high confidence variant calls: the Genome Analysis Toolkit best practices
pipeline. Curr Protoc Bioinform. 2013;43(1):11–101111033. https://doi.
18. Decap D, Reumers J, Herzeel C, Costanza P, Fostier J. Performance
analysis of a parallel, multi-node pipeline for DNA sequencing. In:
Proceedings of the 11th International Conference on Parallel Processing
and Applied Mathematics (PPAM):6-9 September 2015. Krakow: LNCS,
Springer; 2015. p. 233–42. https://doi.org/10.1007/978-3-319- 32152-3_22.
19. Jones R, Hosking A, Moss E. The Garbage Collection Handbook. Boca
Raton: CRC Press; 2012.
20. Harbison III SP, Steele Jr GL. C — A Reference Manual, Fifth Edition. Upper
Saddle River: Prentice Hall; 2002.
21. Gosling J, Joy B, Steele Jr GL, Bracha G, Buckley A. The Java Language
Specification, Java SE 8 Edition. Upper Saddle River: Addison-Wesley
22. Donovan AAA, Kernighan BW. The Go Programming Language. Upper
Saddle River: Addison-Wesley Professional; 2015.
23. Steele Jr GL. Common Lisp, The Language, Second Edition. Boston: Digital
24. Fourment M, Gillings MR. A comparison of common programming
languages used in bioinformatics. BMC Bioinformatics. 2008;9(1):. https://
gan Aruoba S, Fernández-Villaverde J. A comparison of programming
languages in economics. J Econ Dyn Control. 2015;58:265–73.
26. MoreiraJE, Midkiff SP, Gupta M. A comparison of Java, C/C++, and FORTRAN
for numerical computing. IEEE Antennas Propag Mag. 1998;40(5):102–5.
27. Biswa K, Jamatia B, Choudhury D, Borah P. Comparative analysis of C,
FORTRAN, C# and Java programming languages. Int J Comput Sci Inf
28. Hundt R. Loop recognition in C++/Java/Go/Scala. In: Proceedings of Scala
Days 2011; 2011. https://days2011.scalalang.org/sites/days2011/files/
ws3-1- Hundt.pdf. Accessed 19 Sept 2018.
29. Nanz S, Furia CA. A comparative study of programming languages in
Rosetta Code. In: 2015 IEEE/ACM 37th IEEE International Conference on
Software Engineering. Los Alamitos: IEEE; 2015. p. 778–88.https://doi.
30. Prechelt L. An empirical comparison of seven programming languages.
31. Togashi N, Klyuev V. Concurrency in Go and Java: Perf or ma nc e an al ys is . In :
2014 4th IEEEInternationalConferenceon Information Society and Technology.
Beijing: IEEE; 2014. https://doi.org/10.1109/ICIST.2014.6920368.
32 . Gouy I . The Computer Language Benchmarks Game . https://benchmarksgame-
team.pages.debian.net/benchmarksgame/. Accessed 19 Sept 2018.
33. Rubin BS, Christ AR, Bohrer KA. Java and the IBM San Francisco project.
IBM Syst J. 1998;37(3):365–71.
34. Samtools organisation. Htsjdk. https://github.com/samtools/htsjdk.
Accessed 19 Sept 2018.
35. Stroustrup B. A Tour of C++, Second Edition. Upper Saddle River:
Addison-Wesley Professional; 2018.
36. Icahn School of Medicine at Mount Sinai. High-coverage Whole Exome
Sequencing of CEPH/UTAH Female Individual (HapMap: NA12878).
https://www.ncbi.nlm.nih.gov/sra/SRX731649. Accessed 19 Sept 2018.
37. Georges A, Buytaert D, Eeckhout L. Adding rigorous statistics to the Java
benchmarker’s toolbox. In: Companion to the 22nd ACM SIGPLAN
Conference on Object-oriented Programming Systems and Applications.
New York: ACM; 2007. p. 793–4. https://doi.org/10.1145/1297846.
38. Herzeel C. elPrep – Execution Command Options. https://github.com/
ExaScience/elprep/tree/2.61#execution-command- options. Accessed 19
39. Berger ED, McKinley KS, Blumofe RD, Wilson PR. Hoard: a scalable
memory allocator for multithreaded applications. In: Proceedings of the
Ninth International Conference on Architectural Support for
Programming Languages and Operating Systems. New York: ACM; 2000.
40. Reinders J. Intel Threading Building Blocks. Sebastopol: O’Reilly; 2007.
41. gperftools. https://github.com/gperftools/gperftools. Accessed 19 Sept2018.
42. jemalloc. http://jemalloc.net. Accessed 19 Sept 2018.
43. Java Platform, Standard Edition Tools Reference. https://docs.oracle.com/
javase/10/tools/java.htm. Accessed 19 Sept 2018.
44. Kortschak RD, Snyder JB, Maragkakis M, Adelson DL. bíogo: a simple
high-performance bioinformatics toolkit for the Go language. J Open
Source Softw. 2017;2(10):167. https://doi.org/10.21105/joss.00167.
Costanza et al. BMC Bioinformatics (2019) 20:301 Page 10 of 10
45. Shen W, Li Y. A novel algorithm for detecting multiple covariance and
clustering of biological sequences. Sci Rep. 2016;6:. https://doi.org/10.
46. Shen W, Le S, Li Y, Hu F. Seqkit: A cross-platform and ultrafast toolkit for
FASTA/Q file manipulation. PLoS ONE. 2016;11(10):. https://doi.org/10.
47. Pedersen BS, Layer RM, Quinlan AR. Vcfanno: fast, flexible annotation of
genetic variants. Genome Biol. 2016;17(1):118. https://doi.org/10.1186/
48. Detlefs D, Flood C, Heller S, Printezis T. Garbage-first garbage collection.
In: Proceedings of the 4th International Symposium on Memory
Managament. New York: ACM; 2004. https://doi.org/10.1145/1029873.
49. Hudson RL. Getting to Go. https://blog.golang.org/ismmkeynote.
Accessed 19 Sept 2018.
50. Klabnik S, Nichols C. The Rust Programming Language. San Francisco: No
Starch Press; 2018.
Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
We collect and use personal data to provide access to the Springer Nature journal content. We may also use these personal data internally within
ResearchGate and Springer Nature and as agreed share it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not
otherwise disclose your personal data outside the ResearchGate or the Springer Nature group of companies unless we have your permission as
While Users may use the Springer Nature journal content for small scale, personal non-commercial use, it is important to note that Users may
use such content for the purpose of providing other users with access on a regular or large scale basis or as a means to circumvent access
use such content where to do so would be considered a criminal or statutory offence in any jurisdiction, or gives rise to civil liability, or is
falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association unless explicitly agreed to by Springer Nature in
use bots or other automated methods to access the content or redirect messages
override any security feature or exclusionary protocol; or
share the content in order to create substitute for Springer Nature products or services or a systematic database of Springer Nature journal
In line with the restriction against commercial use, Springer Nature does not permit the creation of a product or service that creates revenue,
royalties, rent or income from our content or its inclusion as part of a paid for service or for other commercial gain. Springer Nature journal
content cannot be used for inter-library loans and librarians may not upload Springer Nature journal content on a large scale into their, or any
other, institutional repository.
content on this website and may remove it or features or functionality at our sole discretion, at any time with or without notice. Springer Nature
may revoke this licence to you at any time and remove access to any copies of the Springer Nature journal content which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or guarantees to Users, either express or implied
with respect to the Springer nature journal content and all parties disclaim and waive any implied warranties or warranties imposed by law,
including merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published by Springer Nature that may be licensed
from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a regular basis or in any other manner not
expressly permitted by these Terms, please contact Springer Nature at