
# Worst-Case Efficient Sorting with QuickMergesort


## Abstract and Figures

The two most prominent solutions for the sorting problem are Quicksort and Mergesort. While Quicksort is very fast on average, Mergesort additionally gives worst-case guarantees, but needs extra space for a linear number of elements. Worst-case efficient in-place sorting, however, remains a challenge: the standard solution, Heapsort, suffers from bad cache behavior and is also not overly fast for in-cache instances. In this work we present median-of-medians QuickMergesort (MoMQuickMergesort), a new variant of QuickMergesort, which combines Quicksort with Mergesort, allowing the latter to be implemented in place. Our new variant applies the median-of-medians algorithm for selecting pivots in order to circumvent the quadratic worst case. Indeed, we show that it uses at most $n \log n + 1.6n$ comparisons for $n$ large enough. We experimentally confirm the theoretical estimates and show that the new algorithm outperforms Heapsort by far and is only around 10% slower than Introsort (the std::sort implementation of libstdc++), which has a rather poor guarantee for the worst case. We also simulate the worst case, which is only around 10% slower than the average case. In particular, the new algorithm is a natural candidate to replace Heapsort as a worst-case stopper in Introsort.
arXiv:1811.00833v1 [cs.DS] 2 Nov 2018
Stefan Edelkamp, Armin Weiß
keywords: in-place sorting, quicksort, mergesort, analysis of algorithms
1 Introduction
Sorting elements of some totally ordered universe has always been among the most important tasks carried out on computers. Comparison-based sorting of $n$ elements requires at least $\log n! \approx n \log n - 1.44n$ comparisons (where $\log$ is base 2). Up to constant factors this bound is achieved by the classical sorting algorithms Heapsort, Mergesort, and Quicksort. While Quicksort is usually considered the fastest one, the $O(n \log n)$ bound applies only to its average case (both for the number of comparisons and running time); in the worst case it deteriorates to a $\Theta(n^2)$ algorithm. The standard approach to prevent such a worst case is Musser's Introsort [29]: whenever the recursion depth of Quicksort becomes too large, the algorithm switches to Heapsort (we call this the worst-case stopper). This works well in practice for most instances. However, on small instances Heapsort is already considerably slower than Quicksort (in our experiments more than 30% for $n = 2^{10}$) and on larger instances it suffers from its poor cache behavior (in our experiments more than eight times slower than Quicksort for sorting $2^{28}$ elements). This is also the reason why in practice it is mainly used as a worst-case stopper in Introsort.
Another approach for preventing Quicksort's worst case is to use the median-of-medians algorithm [4] for pivot selection. However, choosing the pivot as median of the whole array yields a bad average (and worst-case) running time. On the other hand, when choosing the median of a smaller sample as pivot, the average performance becomes quite good [25], but the guarantees for the worst case become even worse.
King's College London, UK.
Universität Stuttgart, Germany. Supported by the DFG grant DI 435/7-1.
The third algorithm, Mergesort, is almost optimal in terms of comparisons: it uses only $n \log n - 0.91n$ comparisons in the worst case to sort $n$ elements. Moreover, it performs well in terms of running time. Nevertheless, it is not used as a worst-case stopper for Introsort because it needs extra space for a linear number of data elements. In recent years, several in-place variants of Mergesort have appeared (we use the term in-place for at most logarithmic extra space), both stable ones (meaning that the relative order of elements comparing equal is not changed) [18, 23, 16] and unstable ones [6, 13, 16, 22]. Two of the most efficient implementations of stable variants are Wikisort [28] (based on [23]) and Grailsort [2] (based on [18]). An example of an unstable in-place Mergesort implementation is in-situ Mergesort [13]. It uses Quick/Introselect [17] (std::nth_element) to find the median of the array. Then it partitions the array according to the median (i.e., it moves all smaller elements to the left and all greater elements to the right). Next, it sorts one half with Mergesort using the other half as temporary space and, finally, sorts the other half recursively. Since the elements in the temporary space get mixed up (they are used as "dummy" elements), this algorithm is not stable. In-situ Mergesort gives an $O(n \log n)$ bound for the worst case. As validated in our experiments, all the in-place variants are considerably slower than ordinary Mergesort.
When instead of the median an arbitrary element is chosen as the pivot, we obtain QuickMergesort [11], which is faster on average, at the price that the worst case can be quadratic. QuickMergesort follows the more general concept of QuickXsort [11]: first, choose a pivot element and partition the array according to it. Then, sort one part with X and, finally, the other part recursively with QuickXsort. As for QuickMergesort, the part which is currently not being sorted can be used as temporary space for X.
Other examples of QuickXsort are QuickHeapsort [5, 9], QuickWeakheapsort [10, 11], and Ultimate Heapsort [21]. QuickXsort with median-of-$\sqrt{n}$ pivot selection uses at most $n \log n + cn + o(n)$ comparisons on average to sort $n$ elements given that X also uses at most $n \log n + cn + o(n)$ comparisons on average [11]. Moreover, recently Wild [34] showed that, if the pivot is selected as median of some constant-size sample, then the average number of comparisons of QuickXsort is only some small linear term (depending on the sample size) above the average number of comparisons of X (for the median-of-three case see also [12]). However, as long as no linear-size samples are used for pivot selection, QuickXsort does not provide good bounds for the worst case. This defect is overcome in Ultimate Heapsort [21] by using the median of the whole array as pivot. In Ultimate Heapsort the median-of-medians algorithm [4] (which is linear in the worst case) is used for finding the median, leading to an $n \log n + O(n)$ bound for the number of comparisons. Unfortunately, due to the large constant of the median-of-medians algorithm, the $O(n)$ term is quite big.
Contribution. In this work we introduce median-of-medians QuickMergesort (MoMQuickMergesort) as a variant of QuickMergesort using the median-of-medians algorithm for pivot selection. The crucial observation is that it is not necessary to use the median of the whole array as pivot; it suffices to guarantee that the pivot is not very far off the median. This observation allows us to apply the median-of-medians algorithm to smaller samples, leading to both a better average-case and a better worst-case performance. Our algorithm is based on a merging procedure introduced by Reinhardt [32], which requires less temporary space than the usual merging. A further improvement, which we call undersampling (taking fewer elements into account for pivot selection), allows us to reduce the worst-case number of comparisons down to $n \log n + 1.59n + O(n^{0.8})$. Moreover, we heuristically estimate the average case as $n \log n + 0.275n + o(n)$ comparisons. The good average case comes partially from the fact that we introduce a new way of adaptive pivot selection for the median-of-medians algorithm (compare to [1]). Our experiments confirm the theoretical and heuristic estimates and also show that MoMQuickMergesort is competitive with other algorithms (for $n = 2^{28}$ more than 7 times faster than Heapsort and around 10% slower than Introsort (std::sort; throughout this refers to its libstdc++ implementation)). Moreover, we apply MoMQuickMergesort (instead of Heapsort) as a worst-case stopper for Introsort (std::sort). The results are striking: on special permutations, the new variant is up to six times faster than the original version of std::sort.
Outline. In Section 2, we recall QuickMergesort and the median-of-medians algorithm. In Section 3, we describe median-of-medians QuickMergesort, introduce the improvements, and analyze the worst-case and average-case behavior. Finally, in Section 4, we present our experimental results.
2 Preliminaries
Throughout we use standard $O$ and $\Theta$ notation as defined e.g. in [8]. The logarithm $\log$ always refers to base 2. For a background on Quicksort and Mergesort we refer to [8] or [24]. A pseudomedian of nine (resp. fifteen) elements is computed as follows: group the elements in groups of three elements and compute the median of each group. The pseudomedian is the median of these three (resp. five) medians.
Throughout, in our estimates we assume that the median of three (resp. five) elements is computed using three (resp. seven) comparisons regardless of the outcome of previous comparisons. This allows a branch-free implementation of the comparisons.
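The pseudomedian definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `median3` and `pseudomedian` are hypothetical, the median of three always evaluates the same three comparisons, and the median of five medians is taken with `sorted()` as a stand-in for a seven-comparison routine, which is not spelled out here.

```python
def median3(a, b, c):
    """Median of three values; always evaluates exactly three comparisons."""
    ab, bc, ac = a < b, b < c, a < c
    if ab == bc:
        return b                  # a < b < c  or  a >= b >= c
    return c if ab == ac else a   # b is the maximum or the minimum


def pseudomedian(block):
    """Pseudomedian of 9 or 15 elements: median of the medians of groups of three."""
    assert len(block) in (9, 15)
    medians = [median3(*block[i:i + 3]) for i in range(0, len(block), 3)]
    if len(medians) == 3:
        return median3(*medians)
    # Assumption: sorted() replaces a dedicated seven-comparison median of five.
    return sorted(medians)[2]
```

Note that the pseudomedian is in general not the true median of the block; it only guarantees a minimum number of elements on either side.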
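In this paper we have to deal with simple recurrences of two types, which both have straightforward solutions:
Lemma 2.1. Let $0 < \alpha, \beta, \delta$ with $\alpha + \beta < 1$, $\gamma = 1 - \alpha$, and $A, C, D, N_0 \in \mathbb{N}$ and
$$T(n) \le T(\alpha n + A) + T(\beta n + A) + Cn + D$$
$$Q(n) \le Q(\alpha n + A) + \gamma n \log(\gamma n) + Cn + O(n^{\delta})$$
for $n \ge N_0$ and $T(n), Q(n) \le D$ for $n \le N_0$ ($N_0$ large enough). Moreover, let $\zeta \in \mathbb{R}$ such that $\alpha^{\zeta} + \beta^{\zeta} = 1$ (notice that $\zeta < 1$). Then
$$T(n) \le \frac{Cn}{1 - \alpha - \beta} + O(n^{\zeta})$$
and
$$Q(n) \le n \log n + \left(\frac{\alpha \log \alpha}{\gamma} + \log \gamma + \frac{C}{\gamma}\right) n + O(n^{\delta}).$$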
Proof. It is well-known that $T(n)$ has a linear solution. Therefore (after replacing $T(n)$ by a reasonably smooth function), $T(\alpha n + A)$ and $T(\alpha n)$ differ by at most some constant. Thus, after increasing $D$, we may assume that $T(n)$ is of the simpler form
$$T(n) \le T(\alpha n) + T(\beta n) + Cn + D. \tag{1}$$
We can split (1) into two recurrences
$$T_C(n) \le T_C(\alpha n) + T_C(\beta n) + Cn \quad \text{and} \quad T_D(n) \le T_D(\alpha n) + T_D(\beta n) + D$$
with $T_C(n) = 0$ and $T_D(n) \le D$ for $n \le N_0$. For $T_C$ we get the solution $T_C(n) \le \frac{Cn}{1 - \alpha - \beta}$. By the generalized Master theorem [20], it follows that $T_D \in O(n^{\zeta})$ where $\zeta \in \mathbb{R}$ satisfies $\alpha^{\zeta} + \beta^{\zeta} = 1$. Thus,
$$T(n) \le \frac{Cn}{1 - \alpha - \beta} + O(n^{\zeta}).$$
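Now, let us consider the recurrence for $Q(n)$. With the same argument as before we have $Q(n) \le Q(\alpha n) + \gamma n \log(\gamma n) + Cn + O(n^{\delta})$. Thus, we obtain
$$\begin{aligned}
Q(n) &\le \sum_{i=0}^{\log_{1/\alpha} n} \left( \alpha^i \gamma n \log(\alpha^i \gamma n) + C \alpha^i n + O((\alpha^i n)^{\delta}) \right) \\
&= n \sum_{i \ge 0} \left( \alpha^i \gamma (\log n + i \log \alpha + \log \gamma) + C \alpha^i \right) + O(n^{\delta}) \\
&= \frac{\gamma}{1 - \alpha} n \log n + \left( \frac{\alpha \gamma \log \alpha}{(\alpha - 1)^2} + \frac{\gamma \log \gamma}{1 - \alpha} + \frac{C}{1 - \alpha} \right) n + O(n^{\delta}) \\
&= n \log n + \left( \frac{\alpha \log \alpha}{1 - \alpha} + \log \gamma + \frac{C}{1 - \alpha} \right) n + O(n^{\delta}).
\end{aligned}$$
This proves Lemma 2.1.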
2.1 QuickMergesort
QuickMergesort follows the design pattern of QuickXsort: let X be some sorting algorithm (in our case X = Mergesort). QuickXsort works as follows: first, choose some pivot element and partition the array according to this pivot, i.e., rearrange it such that all elements left of the pivot are less than or equal to the pivot and all elements on the right are greater than or equal to it. Then, choose one part of the array and sort it with the algorithm X. After that, sort the other part of the array recursively with QuickXsort. The main advantage of this procedure is that the part of the array that is not currently being sorted can be used as temporary memory for the algorithm X. This yields fast in-place variants for various external sorting algorithms such as Mergesort. The idea is that whenever a data element should be moved to the extra (additional or external) element space, it is instead swapped with the data element occupying the respective position in the part of the array which is used as temporary memory.
The most promising example of QuickXsort is QuickMergesort. For the Mergesort part we use standard (top-down) Mergesort, which can be implemented using $m$ extra element spaces to merge two arrays of length $m$: after the partitioning, one part of the array (for a simpler description we assume the first part) has to be sorted with Mergesort (note, however, that either of the two sides can be sorted with Mergesort as long as the other side contains at least $n/3$ elements). In order to do so, the second half of this first part is sorted recursively with Mergesort while moving the elements to the back of the whole array. The elements from the back of the array are inserted as dummy elements into the first part. Then, the first half of the first part is sorted recursively with Mergesort while being moved to the position of the former second half of the first part. Now, at the front of the array, there is enough space (filled with dummy elements) such that the two halves can be merged. The executed stages of the algorithm QuickMergesort are illustrated in Figure 1.
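The swap-based use of temporary space can be illustrated with a small Python sketch. It is a simplified variant and not the exact scheme described above: instead of moving the halves during the recursion, each merge first swaps its left run into the buffer and then merges back. All names are hypothetical, the pivot is simply the middle element (an arbitrary choice for illustration), and the worst case of this sketch remains quadratic.

```python
def _merge_with_buffer(a, lo, mid, hi, buf):
    """Merge sorted a[lo:mid] and a[mid:hi], using a[buf:buf+(mid-lo)] as swap space."""
    n_left = mid - lo
    for k in range(n_left):              # swap the left run into the buffer
        a[lo + k], a[buf + k] = a[buf + k], a[lo + k]
    i, i_end, j, out = buf, buf + n_left, mid, lo
    while i < i_end and j < hi:
        if a[j] < a[i]:
            a[out], a[j] = a[j], a[out]
            j += 1
        else:
            a[out], a[i] = a[i], a[out]
            i += 1
        out += 1
    while i < i_end:                     # leftover elements of the left run
        a[out], a[i] = a[i], a[out]
        i += 1
        out += 1


def _mergesort_with_buffer(a, lo, hi, buf):
    """Sort a[lo:hi]; needs (hi-lo)//2 buffer slots starting at index buf."""
    if hi - lo < 2:
        return
    mid = (lo + hi) // 2
    _mergesort_with_buffer(a, lo, mid, buf)
    _mergesort_with_buffer(a, mid, hi, buf)
    _merge_with_buffer(a, lo, mid, hi, buf)


def _partition(a, lo, hi):
    """Partition a[lo:hi] around its middle element; return the pivot index."""
    mid = (lo + hi) // 2
    a[mid], a[hi - 1] = a[hi - 1], a[mid]
    pivot, store = a[hi - 1], lo
    for k in range(lo, hi - 1):
        if a[k] < pivot:
            a[store], a[k] = a[k], a[store]
            store += 1
    a[store], a[hi - 1] = a[hi - 1], a[store]
    return store


def quick_mergesort(a):
    """Sort the list a in place and return it."""
    lo, hi = 0, len(a)
    while hi - lo > 1:
        p = _partition(a, lo, hi)
        big, small = (lo, p), (p + 1, hi)
        if big[1] - big[0] < small[1] - small[0]:
            big, small = small, big
        if small[1] - small[0] >= (big[1] - big[0]) // 2:
            # The smaller side is large enough to serve as buffer for the larger one.
            _mergesort_with_buffer(a, big[0], big[1], small[0])
            lo, hi = small
        else:
            _mergesort_with_buffer(a, small[0], small[1], big[0])
            lo, hi = big
    return a
```

Because every move is a swap, the buffer side is only permuted, never overwritten, which is what allows it to be sorted afterwards.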
2.2 The median-of-medians algorithm
The median-of-medians algorithm solves the selection problem: given an array $A[1, \dots, n]$ and an integer $k \in \{1, \dots, n\}$, find the $k$-th element in the sorted order of $A$. For simplicity let us assume that all elements are distinct; in Section 2.3 we show how to deal with the general case with duplicates.
Figure 1: Example for the execution of QuickMergesort. Here 7 is chosen as pivot. Array states, starting after the partitioning step:
3 2 4 5 6 0 1 7 9 10 11 8 (the second half 5 6 0 1 of the first part is sorted recursively with Mergesort)
3 2 4 11 9 10 8 7 0 1 5 6 (the first half 3 2 4 of the first part is sorted recursively with Mergesort)
9 10 8 11 2 3 4 7 0 1 5 6 (the two parts 2 3 4 and 0 1 5 6 are merged)
0 1 2 3 4 5 6 7 11 9 8 10 (the part 11 9 8 10 is sorted recursively with QuickMergesort)
The basic variant of the median-of-medians algorithm [4] (see also [8, Sec. 9.3]) works as follows: first, the array is grouped into blocks of five elements. From each of these blocks the median is selected and then the median of all these medians is computed recursively. This yields a provably good pivot for performing a partitioning step. Now, depending on which side the $k$-th element is, recursion takes place on the left or right side. It is well-known that this algorithm runs in linear time with a rather big constant in the $O$-notation. We use a slight improvement:
Repeated step algorithm. Instead of grouping into blocks of 5 elements, we follow [7] and group into blocks of 9 elements and take the pseudomedian ("ninther") into the sample for pivot selection. This method guarantees that every element in the sample has 4 elements less than or equal to it and 4 elements greater than or equal to it. Thus, when selecting the pivot as median of the sample of $n/9$ elements, the guarantee is that at least $2n/9$ elements are less than or equal to the pivot and the same number are greater than or equal to it. Since there might remain 8 elements outside the sample, we obtain the recurrence
$$T_{\mathrm{MoM}}(n) \le T_{\mathrm{MoM}}\left(\frac{7n}{9} + 8\right) + T_{\mathrm{MoM}}\left(\left\lfloor \frac{n}{9} \right\rfloor\right) + \frac{20n}{9},$$
where $4n/3$ of $20n/9$ is due to finding the pseudomedians and $8n/9$ is for partitioning the remaining (non-pseudomedian) elements according to the pivot (notice that some of the other elements are also already known to be greater/smaller than the pivot; however, using this information would introduce a huge bookkeeping overhead). Thus, by Lemma 2.1, we have:
Lemma 2.2 ([1, 7]). $T_{\mathrm{MoM}}(n) \le 20n + O(n^{\zeta})$ where $\zeta \approx 0.78$ satisfies $(7/9)^{\zeta} + (1/9)^{\zeta} = 1$.
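The two constants in Lemma 2.2 can be checked numerically. The following Python sketch evaluates the linear coefficient $C/(1 - \alpha - \beta)$ from Lemma 2.1 and solves $\alpha^{\zeta} + \beta^{\zeta} = 1$ by bisection; the function names are hypothetical and the bisection is just one convenient way to find the root.

```python
def linear_constant(alpha, beta, c):
    """Coefficient of n in the solution of T(n) <= T(alpha n) + T(beta n) + c n."""
    return c / (1 - alpha - beta)


def zeta(alpha, beta, tol=1e-12):
    """Solve alpha**z + beta**z = 1 for z in (0, 1) by bisection."""
    lo, hi = 0.0, 1.0          # the left side is 2 at z = 0 and alpha + beta < 1 at z = 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if alpha ** mid + beta ** mid > 1:
            lo = mid           # the sum is decreasing in z, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


mom_constant = linear_constant(7 / 9, 1 / 9, 20 / 9)   # repeated step recurrence
mom_zeta = zeta(7 / 9, 1 / 9)
```

With $\alpha = 7/9$, $\beta = 1/9$ and $C = 20/9$ this reproduces the factor 20 and an exponent between 0.78 and 0.79.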
Adaptive pivot selection. For our implementation we apply a slight improvement over the basic median-of-medians algorithm by using the approach of adaptive pivot selection, which was first used in the Floyd-Rivest algorithm [14, 15], later applied to smaller samples for Quickselect [26, 27], and recently applied to the median-of-medians algorithm [1]. However, we use a different approach than in [1]: in any case we choose the sample of size $n/9$ as pseudomedians of nine. Now, if the position we are looking for is on the far left (left of position $2n/9$), we do not choose the median of the sample as pivot but a smaller position: for searching the $k$-th element with $k \le 2n/9$, we take the $k/4$-th element of the sample as pivot. Notice that for $k = 2n/9$, this is exactly the median of the sample. Since every element of the sample carries at least four smaller elements with it, this guarantees that $\frac{k}{4} \cdot 4 = k$ elements are smaller than or equal to the pivot, so the $k$-th element will lie in the left part after partitioning (which is presumably the smaller one). Likewise, when searching for a far right position, we proceed symmetrically.
Notice that this optimization does not improve the worst case but the average case (see Section 3.4).
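A compact Python sketch of selection with ninther samples and adaptive pivot ranks may help to make the procedure concrete. It is not in-place and makes no attempt to match the comparison counts of the analysis: partitioning is done with list comprehensions, ninthers are computed with `sorted()`, the rank $k/4$ is rounded up, and the cutoff of 27 elements as well as all names are assumptions of this sketch.

```python
def ninther(block):
    """Pseudomedian of nine elements (median of three medians of three)."""
    medians = [sorted(block[i:i + 3])[1] for i in (0, 3, 6)]
    return sorted(medians)[1]


def select(a, k):
    """Return the k-th smallest element of a (k is 1-based); a is not modified."""
    a = list(a)
    while len(a) >= 27:
        n = len(a)
        sample = [ninther(a[i:i + 9]) for i in range(0, n - n % 9, 9)]
        m = len(sample)
        if 9 * k <= 2 * n:                       # far left: smaller sample rank
            j = -(-k // 4)                       # ceil(k / 4)
        elif 9 * (n - k + 1) <= 2 * n:           # far right: symmetric choice
            j = m + 1 + ((n - k + 1) // -4)      # m + 1 - ceil((n - k + 1) / 4)
        else:
            j = (m + 1) // 2                     # median of the sample
        pivot = select(sample, min(max(j, 1), m))
        smaller = [x for x in a if x < pivot]
        larger = [x for x in a if x > pivot]
        if k <= len(smaller):
            a = smaller
        elif k > n - len(larger):
            k -= n - len(larger)
            a = larger
        else:
            return pivot                         # k falls among elements equal to the pivot
    return sorted(a)[k - 1]
```

Since the sketch always checks on which side the $k$-th element lies, the adaptive rank affects only the sizes of the recursive calls, not the correctness of the result.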
2.3 Dealing with duplicate elements
By duplicates we mean that not all elements of the input array are distinct. The number of comparisons for finding the median of three (resp. five) elements does not change in the presence of duplicates. However, duplicates can lead to an uneven partition. The standard approach in Quicksort and Quickselect for dealing with duplicates is due to Bentley and McIlroy [3]: in each partitioning step the elements equal to the pivot are placed in a third partition in the middle of the array. Recently, another approach appeared in the Quicksort implementation pdqsort [30]. Instead of three-way partitioning it applies the usual two-way partitioning, always moving elements equal to the pivot to the right side. This method is also applied recursively, with one exception: if the new pivot is equal to an old pivot (this can be tested with one additional comparison), then all elements equal to the pivot are moved to the left side, which can then be excluded from recursion.
We propose to follow the latter approach: usually all elements equal to the pivot are moved to the right side, possibly leading to an even more unbalanced partitioning. However, whenever a partitioning step is very uneven (outside the guaranteed bounds for the pivot in the median-of-medians algorithm), we know that this must be due to many duplicate elements. In this case we immediately partition again with the same pivot but moving equal elements to the left.
3 Median-of-Medians QuickMergesort
Although QuickMergesort has an $O(n^2)$ worst-case running time, it is quite simple to guarantee a worst-case number of comparisons of $n \log n + O(n)$: just choose the median of the whole array as pivot. This is essentially how in-situ Mergesort [13] works. The most efficient way of finding the median is to use Quickselect [17], as applied in in-situ Mergesort. However, this does not allow the desired bound on the number of comparisons (not even when using Introselect as in [13]). Alternatively, we can use the median-of-medians algorithm described in Section 2.2, which, while having a linear worst-case running time, is quite slow on average. In this section we describe a variation of the median-of-medians approach which combines an $n \log n + O(n)$ worst-case number of comparisons with a good average performance (both in terms of running time and number of comparisons).
3.1 Basic version
The crucial observation is that it is not necessary to use the actual median as pivot (see also our preprint [12]). As remarked in Section 2.1, the larger of the two sides of the partitioned array can be sorted with Mergesort as long as the smaller side contains at least one third of the total number of elements. Therefore, it suffices to find a pivot which guarantees such a partition. For doing so, we can apply the idea of the median-of-medians algorithm: for sorting an array of $n$ elements, we first choose $n/3$ elements, each as the median of three elements. Then, the median-of-medians algorithm is used to find the median of those $n/3$ elements. This median becomes the next pivot. As for the median-of-medians algorithm, this ensures that at least $2 \cdot n/6$ elements are less than or equal to the pivot and at least the same number of elements are greater than or equal to it; thus, the larger part of the partitioned array can always be sorted with Mergesort and the recursion takes place on the smaller part. The advantage of this method is that the median-of-medians algorithm is applied to an array of size only $n/3$ instead of $n$ (at the cost of introducing a small overhead for finding the $n/3$ medians of three), giving less weight to its big constant for the linear number of comparisons. We call this algorithm basic MoMQuickMergesort (bMQMS).
For the median-of-medians algorithm, we use the repeated step method as described in Section 2.2. Notice that for the number of comparisons the worst case for MoMQuickMergesort happens if the pivot is exactly the median since this gives the most weight to the "slow" median-of-medians algorithm. Thus, the total number $T_{\mathrm{bMQMS}}(n)$ of comparisons of MoMQuickMergesort in the worst case to sort $n$ elements is bounded by
$$T_{\mathrm{bMQMS}}(n) \le T_{\mathrm{bMQMS}}\left(\frac{n}{2}\right) + T_{\mathrm{MS}}\left(\frac{n}{2}\right) + T_{\mathrm{MoM}}\left(\frac{n}{3}\right) + 3 \cdot \frac{n}{3} + \frac{2}{3} n + O(1)$$
where $T_{\mathrm{MS}}(n)$ is the number of comparisons of Mergesort and $T_{\mathrm{MoM}}(n)$ the number of comparisons of the median-of-medians algorithm. The $3 \cdot \frac{n}{3}$ term comes from finding $n/3$ medians of three elements, the $2n/3$ comparisons from partitioning the remaining elements (after finding the pivot, the correct side of the partition is known for $n/3$ elements).
By Lemma 2.2 we have $T_{\mathrm{MoM}}(n) \le 20n + O(n^{0.8})$ and by [33] we have $T_{\mathrm{MS}}(n) \le n \log n - 0.91n + 1$. Thus, we can use Lemma 2.1 to resolve the recurrence, which proves (notice that for every comparison there is only a constant number of other operations):
Theorem 3.1. Basic MoMQuickMergesort (bMQMS) runs in $O(n \log n)$ time and performs at most $n \log n + 13.8n + O(n^{0.8})$ comparisons.
3.2 Improved version
In [32], Reinhardt describes how to merge two subsequent sequences in an array using additional space for only half the number of elements in one of the two sequences. The additional space should be located in front of or after the two sequences. To be more precise, assume we are given an array $A$ with positions $A[1, \dots, t]$ being empty or containing dummy elements (to simplify the description, we assume the first case), and $A[t+1, \dots, t+\ell]$ and $A[t+\ell+1, \dots, t+\ell+r]$ containing two sorted sequences. We wish to merge the two sequences into the space $A[1, \dots, \ell+r]$ (so that $A[\ell+r+1, \dots, t+\ell+r]$ becomes empty). We require that $r/2 \le t < r$.
First we start from the left, merging the two sequences into the empty space until there remains no empty space between the last element of the already merged part and the first element of the left sequence (first step in Figure 2). At this point, we know that at least $t$ elements of the right sequence have been introduced into the merged part (because when introducing elements from the left part, the distance between the last element in the already merged part and the first element in the left part does not decrease). Thus, the positions $t+\ell+1$ through $\ell+2t$ are empty now. Since $\ell+t+1 \le \ell+r \le \ell+2t$, in particular, $A[\ell+r]$ is empty now. Therefore, we can start merging the two sequences right-to-left into the now empty space (where the right-most element is moved to position $A[\ell+r]$; see the second step in Figure 2). Once the empty space is filled, we know that all elements from the right part have been inserted, so $A[1, \dots, \ell+r]$ is sorted and $A[\ell+r+1, \dots, t+\ell+r]$ is empty (last step in Figure 2).
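The two-phase merge described above can be sketched in Python as follows. The sketch uses 0-based indices and swaps, so that the dummy elements end up behind the merged sequence; the function name `reinhardt_merge` and its signature are hypothetical, and the extra branch for the case that the left sequence runs out before the gap closes is an assumption of this sketch rather than part of the description.

```python
def reinhardt_merge(a, t, l, r):
    """Merge sorted a[t:t+l] and a[t+l:t+l+r] into a[0:l+r]; a[0:t] holds dummies.

    Requires r/2 <= t < r. Elements are only swapped, so the t dummies
    end up in a[l+r:t+l+r].
    """
    assert r <= 2 * t and t < r
    out, i, j = 0, t, t + l
    end_left, end_right = t + l, t + l + r
    # Step 1: merge left-to-right until the gap between out and i is closed.
    while out < i and i < end_left:
        if a[j] < a[i]:
            a[out], a[j] = a[j], a[out]
            j += 1
        else:
            a[out], a[i] = a[i], a[out]
            i += 1
        out += 1
    if i == end_left:
        # Left sequence used up: shift the rest of the right sequence into place.
        while j < end_right:
            a[out], a[j] = a[j], a[out]
            out += 1
            j += 1
        return
    # Step 2: merge right-to-left into the space freed by the right sequence.
    w, p, q = l + r - 1, end_left - 1, end_right - 1
    while q >= j:
        if p >= i and a[p] > a[q]:
            a[w], a[p] = a[p], a[w]
            p -= 1
        else:
            a[w], a[q] = a[q], a[w]
            q -= 1
        w -= 1
```

A small usage example with `None` as dummy value: for `a = [None] * 3 + [1, 4, 6, 9] + [2, 3, 5, 7, 8, 10]`, calling `reinhardt_merge(a, 3, 4, 6)` leaves the numbers 1 to 10 in order in `a[:10]` and the three dummies in `a[10:]`. Dummies are never compared, only swapped.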
When choosing $\ell = r$ (in order to have a balanced merging and so an optimal number of comparisons), we need one fifth of the array as temporary space. Moreover, by allowing a slightly imbalanced merge we can also tolerate slightly less temporary space. In the case that the temporary space is large ($t \ge r$), we apply the merging scheme from Section 2.1. The situation where the temporary space is located after the two sorted sequences is handled symmetrically (note that this changes the requirement to $\ell/2 \le t < \ell$).
Figure 2: In the first step the two sequences are merged starting with the smallest elements until the empty space is filled. Then there is enough empty space to merge the sequences from the right into their final position. (Panels: Step 1, Step 2, After.)
By applying this merging method in MoMQuickMergesort, we can use pivots having much weaker guarantees: instead of one third, we need only one fifth of the elements being less (resp. greater) than the pivot. We can find such pivots by applying an idea similar to the repeated step method for the median-of-medians algorithm: first we group into blocks of fifteen elements and compute the pseudomedian of each group. Then, the pivot is selected as median of these pseudomedians; it is computed using the median-of-medians algorithm. This guarantees that at least $2 \cdot 3 \cdot \frac{n}{3 \cdot 5 \cdot 2} \approx \frac{n}{5}$ elements are less than or equal to (resp. greater than or equal to) the pivot. Computing the pseudomedian of 15 elements requires 22 comparisons (five times three comparisons for the medians of three and then seven comparisons for the median of five). After that, partitioning requires $\frac{14}{15} n$ comparisons. Since in any case the larger half can still be sorted with Mergesort, we get the recurrence (we call this algorithm MoMQuickMergesort (MQMS))
$$\begin{aligned}
T_{\mathrm{MQMS}}(n) &\le T_{\mathrm{MQMS}}(n/2) + T_{\mathrm{MS}}(n/2) + T_{\mathrm{MoM}}(n/15) + \frac{22}{15} n + \frac{14}{15} n + O(1) \\
&\le T_{\mathrm{MQMS}}(n/2) + \frac{n}{2} \log(n/2) - 0.91 \frac{n}{2} + \frac{20}{15} n + \frac{36}{15} n + O(n^{0.8}) \\
&\le n \log n - 0.91n - 2n + \frac{112}{15} n + O(n^{0.8}) \quad \text{(by Lemma 2.1)}
\end{aligned}$$
This proves:
Theorem 3.2. MoMQuickMergesort (MQMS) runs in $O(n \log n)$ time and performs at most $n \log n + 4.57n + O(n^{0.8})$ comparisons.
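The linear terms in Theorems 3.1 and 3.2 follow from the closed form of Lemma 2.1 with $\alpha = 1/2$. As a quick arithmetic check, the following Python sketch evaluates $\frac{\alpha \log \alpha}{\gamma} + \log \gamma + \frac{C}{\gamma}$ for both recurrences; the function name `linear_term` and the variable names are hypothetical, and $C$ collects all linear contributions of one recursion step, including the $-0.91 \cdot n/2$ from Mergesort.

```python
from math import log2


def linear_term(alpha, c):
    """Coefficient of n in Q(n) <= n log n + (...) n from Lemma 2.1."""
    gamma = 1 - alpha
    return alpha * log2(alpha) / gamma + log2(gamma) + c / gamma


# bMQMS: Mergesort on n/2, median-of-medians on n/3, medians of three, partitioning.
c_basic = -0.91 / 2 + 20 / 3 + 3 / 3 + 2 / 3
# MQMS: Mergesort on n/2, median-of-medians on n/15, pseudomedians, partitioning.
c_improved = -0.91 / 2 + 20 / 15 + 22 / 15 + 14 / 15

basic = linear_term(0.5, c_basic)        # about 13.76, stated as 13.8
improved = linear_term(0.5, c_improved)  # about 4.557, stated as 4.57
```

Both values are slightly below the constants stated in the theorems, which are rounded up.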
Notice that when computing the median of pseudomedians of fifteen elements, in the worst case approximately the same effort goes into the calculation of the pseudomedians and into the median-of-medians algorithm. This indicates that it is an efficient method for finding a pivot with the guarantee that one fifth of the elements are greater than or equal (resp. less than or equal) to it.
3.3 Undersampling
In [1] Alexandrescu selects pivots for the median-of-medians algorithm not as medians of medians of the whole array but only of $n/\phi$ elements where $\phi$ is some large constant (similarly to [25] for Quicksort). While this improves the average case considerably and still gives a linear-time algorithm, the hidden constant for the worst case is large. In this section we follow the idea to a certain extent without losing a good worst-case bound.
As already mentioned in Section 3.2, Reinhardt's merging procedure [32] also works with less than one fifth of the whole array as temporary space if we do not require the merged sequences to be of equal length. Thus, we can allow the pivot to be even further off the median, at the cost of making the Mergesort part more expensive due to imbalanced merging. For $\theta \ge 1$ we describe a variant MQMS$_\theta$ of MoMQuickMergesort using only $n/\theta$ elements for sampling the pivot. Before we analyze this variant, let us look at the costs of Mergesort with imbalanced merging: in order to apply Reinhardt's merging algorithm, we need that one part is at most twice the length of the temporary space. We always apply linear merging (no binary insertion), meaning that merging two sequences of combined length $n$ costs at most $n - 1$ comparisons. Thus, we get the following estimate for the worst-case number of comparisons $T_{\mathrm{MS},b}(n, m)$ of Mergesort where $n$ is the number of elements to sort and $m$ is the temporary space (= "buffer"):
$$T_{\mathrm{MS},b}(n, m) \le \begin{cases} T_{\mathrm{MS}}(n) & \text{if } n \le 4m \\ n + T_{\mathrm{MS},b}(n - 2m, m) + T_{\mathrm{MS}}(2m) & \text{otherwise.} \end{cases}$$
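If $n > 2m$ (otherwise, there is nothing to do), this means
$$\begin{aligned}
T_{\mathrm{MS},b}(n, m) &\le \left( \left\lceil \frac{n}{2m} \right\rceil - 2 \right) \cdot T_{\mathrm{MS}}(2m) + T_{\mathrm{MS}}\left( n - \left( \left\lceil \frac{n}{2m} \right\rceil - 2 \right) \cdot 2m \right) + \sum_{i=0}^{\lceil \frac{n}{2m} \rceil - 3} (n - 2im) \\
&= \left( \left\lceil \frac{n}{2m} \right\rceil - 2 \right) \cdot T_{\mathrm{MS}}(2m) + T_{\mathrm{MS}}\left( n - \left( \left\lceil \frac{n}{2m} \right\rceil - 2 \right) \cdot 2m \right) \\
&\quad + n \cdot \left( \left\lceil \frac{n}{2m} \right\rceil - 2 \right) - m \cdot \left( \left\lceil \frac{n}{2m} \right\rceil - 2 \right) \left( \left\lceil \frac{n}{2m} \right\rceil - 3 \right).
\end{aligned} \tag{2}$$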
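For a moment let us assume that $\frac{n}{2m} = \ell \in \mathbb{Z}$ (with $\ell \ge 1$). In this case we have
$$\begin{aligned}
T_{\mathrm{MS},b}\left(n, \frac{n}{2\ell}\right) &\le (\ell - 2) \cdot T_{\mathrm{MS}}\left(\frac{n}{\ell}\right) + T_{\mathrm{MS}}\left(\frac{2n}{\ell}\right) + n \cdot (\ell - 2) - \frac{n}{2\ell} \cdot (\ell - 2)(\ell - 3) \\
&\le (\ell - 2) \cdot \frac{n}{\ell} \cdot \left( \log \frac{n}{\ell} - \kappa \right) + \frac{2n}{\ell} \cdot \left( \log \frac{2n}{\ell} - \kappa \right) + n \cdot (\ell - 2) \cdot \left( \frac{1}{2} + \frac{3}{2\ell} \right) \\
&\le n \log n + n \cdot f(\ell)
\end{aligned}$$
for $n$ large enough where $f : \mathbb{R}_{>0} \to \mathbb{R}$ is defined by
$$f(\ell) = \begin{cases} -\kappa - \log \ell + \ell/2 + 1/2 - 1/\ell & \text{for } \ell \ge 2 \\ -\kappa & \text{otherwise} \end{cases}$$
and $n \log n - \kappa n$ is a bound for the number of comparisons of Mergesort for $n$ large enough ($\kappa \approx 0.91$ by [33]). Now, for arbitrary $m$ we can use $f$ as an approximation, which turns out to be quite precise: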
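Lemma 3.3. Let $f$ be defined as above and write $\left\lceil \frac{n}{2m} \right\rceil = \frac{n}{2m} + \xi$. Then for $m$ and $n$ large enough we have
$$T_{\mathrm{MS},b}(n, m) \le n \log n + n \cdot f\left(\frac{n}{2m}\right) + m \cdot \epsilon(\xi)$$
where $\epsilon(\xi) = \max\{0,\; 5\xi - 4 + (4 - 2\xi) \log(2 - \xi) - \xi^2\} \le 0.015 =: \epsilon$.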
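To get a feeling for Lemma 3.3, the recurrence for $T_{\mathrm{MS},b}$ can be evaluated numerically and compared with the stated bound. The Python sketch below does this under an explicit modelling assumption: Mergesort is taken to cost exactly $x \log x - \kappa x$ with $\kappa = 0.91$, also for non-integer $x$. All function names are hypothetical.

```python
from math import ceil, log2

KAPPA = 0.91  # assumed Mergesort constant: T_MS(x) is modelled as x log x - KAPPA x


def t_ms(x):
    """Model for the number of comparisons of Mergesort on x elements."""
    return x * log2(x) - KAPPA * x if x > 0 else 0.0


def f(l):
    """The function f from the text."""
    return -KAPPA - log2(l) + l / 2 + 0.5 - 1 / l if l >= 2 else -KAPPA


def eps(xi):
    """The error term epsilon(xi) of Lemma 3.3."""
    return max(0.0, 5 * xi - 4 + (4 - 2 * xi) * log2(2 - xi) - xi ** 2)


def t_ms_b(n, m):
    """Unrolled recurrence for Mergesort on n elements with buffer size m."""
    total = 0.0
    while n > 4 * m:
        total += n + t_ms(2 * m)   # one imbalanced merge plus a block of size 2m
        n -= 2 * m
    return total + t_ms(n)


def lemma_bound(n, m):
    """Right-hand side of Lemma 3.3."""
    l = n / (2 * m)
    return n * log2(n) + n * f(l) + m * eps(ceil(l) - l)


max_eps = max(eps(i / 1000) for i in range(1000))   # stays below 0.015
```

Under this model the recurrence value never exceeds `lemma_bound(n, m)`, and the two coincide when $n/(2m)$ is an integer of size at least 2.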
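Proof. Let $m$ be large enough such that $T_{\mathrm{MS}}(m) \le m \log m - \kappa m$. For $n \le 2m$ we have $T_{\mathrm{MS},b}(n, m) = T_{\mathrm{MS}}(n)$ and so the lemma holds. Now let $n > 2m$. By (2) we obtain
$$T_{\mathrm{MS},b}(n, m) \le \left( \frac{n}{2m} + \xi - 2 \right) T_{\mathrm{MS}}(2m) + T_{\mathrm{MS}}((2 - \xi) 2m) \tag{3}$$
$$\qquad\qquad + n \cdot \left( \frac{n}{2m} + \xi - 2 \right) - m \cdot \left( \frac{n}{2m} + \xi - 2 \right) \left( \frac{n}{2m} + \xi - 3 \right). \tag{4}$$
We examine the two terms (3) and (4) separately using $T_{\mathrm{MS}}(n) \le n \log n - \kappa n$:
$$\begin{aligned}
(3) &= \left( \frac{n}{2m} + \xi - 2 \right) T_{\mathrm{MS}}(2m) + T_{\mathrm{MS}}((2 - \xi) 2m) \\
&\le \left( \frac{n}{2m} + \xi - 2 \right) 2m (\log 2m - \kappa) + (2 - \xi) 2m \left( \log((2 - \xi) 2m) - \kappa \right) \\
&= (n \log 2m - \kappa n) + (\xi - 2)\, 2m (\log 2m - \kappa) + (2 - \xi)\, 2m \left( \log((2 - \xi) 2m) - \kappa \right) \\
&= (n \log n - \kappa n) - n \log \frac{n}{2m} + (2 - \xi)\, 2m \cdot \left( (\log((2 - \xi) 2m) - \kappa) - (\log 2m - \kappa) \right) \\
&= (n \log n - \kappa n) - n \log \frac{n}{2m} + (2 - \xi)\, 2m \log(2 - \xi)
\end{aligned}$$
and
$$\begin{aligned}
(4) &= n \cdot \left( \frac{n}{2m} + \xi - 2 \right) - m \cdot \left( \frac{n}{2m} + \xi - 2 \right) \left( \frac{n}{2m} + \xi - 3 \right) \\
&= n \cdot \left( \frac{n}{2m} - 2 \right) - m \cdot \left( \frac{n}{2m} - 2 \right) \left( \frac{n}{2m} - 3 \right) + n \xi - m \xi \left( \left( \frac{n}{2m} - 3 \right) + \left( \frac{n}{2m} - 2 \right) + \xi \right) \\
&= n \cdot \left( \frac{n}{2m} - 2 - \frac{m}{n} \cdot \left( \left( \frac{n}{2m} \right)^2 - 5 \frac{n}{2m} + 6 \right) \right) + m \left( 5\xi - \xi^2 \right) \\
&= n \cdot \left( \frac{n}{2 \cdot 2m} + \frac{1}{2} - \frac{3 \cdot 2m}{n} \right) + m \left( 5\xi - \xi^2 \right).
\end{aligned}$$
Thus,
$$\begin{aligned}
T_{\mathrm{MS},b}(n, m) - \left( n \log n + n \cdot f\left(\frac{n}{2m}\right) \right) &\le (3) + (4) - \left( n \log n + n \cdot f\left(\frac{n}{2m}\right) \right) \\
&\le (2 - \xi)\, 2m \log(2 - \xi) + m \left( 5\xi - \xi^2 \right) - 4m \\
&= m \cdot \left( 5\xi - 4 + (4 - 2\xi) \log(2 - \xi) - \xi^2 \right).
\end{aligned}$$
This completes the proof of Lemma 3.3.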
For selecting the pivot in QuickMergesort, we apply the procedure of Section 3.2 to $n/\theta$ elements (for some parameter $\theta \in \mathbb{R}$, $\theta \ge 1$): we select $n/\theta$ elements from the array, group them into groups of fifteen elements, compute the pseudomedian of each group, and take the median of those pseudomedians as pivot. We call this algorithm MoMQuickMergesort with undersampling factor $\theta$ (MQMS$_\theta$). Note that MQMS$_1$ = MQMS. For its worst-case number of comparisons we have
$$T_{\mathrm{MQMS}_\theta}(n) \le \max_{\frac{1}{5\theta} \le \alpha \le \frac{1}{2}} \left( T_{\mathrm{MQMS}_\theta}(\alpha n) + T_{\mathrm{MS},b}(n(1 - \alpha), \alpha n) \right) + \frac{22}{15\theta} n + \frac{20}{15\theta} n + \left( 1 - \frac{1}{15\theta} \right) n + O(n^{0.8})$$
where the $\frac{22}{15\theta} n$ is for finding the pseudomedians of fifteen, the $\frac{20}{15\theta} n + O(n^{0.8})$ is for the median-of-medians algorithm called on $\frac{n}{15\theta}$ elements and $\left( 1 - \frac{1}{15\theta} \right) n$ is for partitioning the remaining elements.
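Now we plug in the bound of Lemma 3.3 for $T_{\mathrm{MS},b}(n, m)$ with $\ell = \frac{1 - \alpha}{2\alpha}$ and apply Lemma 2.1:
$$\begin{aligned}
T_{\mathrm{MQMS}_\theta}(n) &\le \max_{\frac{1}{5\theta} \le \alpha \le \frac{1}{2}} \left( T_{\mathrm{MQMS}_\theta}(\alpha n) + (1 - \alpha) n \log((1 - \alpha) n) + (1 - \alpha) n \left( f\left( \frac{1 - \alpha}{2\alpha} \right) + \epsilon \right) + n \cdot \left( 1 + \frac{41}{15\theta} \right) \right) + O(n^{0.8}) \\
&\le n \log n + n \cdot \max_{\frac{1}{5\theta} \le \alpha \le \frac{1}{2}} g(\alpha, \theta) + O(n^{0.8})
\end{aligned}$$
for
$$g(\alpha, \theta) = \frac{\alpha \log(\alpha)}{1 - \alpha} + \log(1 - \alpha) + f\left( \frac{1 - \alpha}{2\alpha} \right) + \frac{1}{1 - \alpha} \cdot \left( 1 + \frac{41}{15\theta} \right) + \epsilon.$$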
In order to find a good undersampling factor, we wish to find a value for $\theta$ minimizing $\max_{\frac{1}{5\theta} \le \alpha \le \frac{1}{2}} g(\alpha, \theta)$. While we do not have a formal proof, intuitively the maximum should be either reached for $\alpha = 1/2$ (if $\theta$ is small) or for $\alpha = 1/(5\theta)$ (if $\theta$ is large); see Figure 4 for a special value of $\theta$. Moreover, notice that we are dealing with an upper bound on $T_{\mathrm{MQMS}_\theta}(n)$ only (with a small error due to Lemma 3.3 and the bound $n \log n - \kappa n$ for $T_{\mathrm{MS}}(n)$), so even if we could find the $\theta$ which minimizes $\max_{\frac{1}{5\theta} \le \alpha \le \frac{1}{2}} g(\alpha, \theta)$, this $\theta$ might not be optimal.
We proceed as follows: first, we compute the point $\theta_{\mathrm{opt}}$ where the two curves in Figure 3 intersect. For this particular value of $\theta$, we then show that indeed $\max_{\frac{1}{5\theta} \le \alpha \le \frac{1}{2}} g(\alpha, \theta) = g(1/2, \theta)$. Since $\theta \mapsto g(1/2, \theta)$ is monotonically decreasing (this is obvious) and $\theta \mapsto g(1/(5\theta), \theta)$ is monotonically increasing for $\theta \ge 2.13$ (verified numerically), this together shows that $\max_{\frac{1}{5\theta} \le \alpha \le \frac{1}{2}} g(\alpha, \theta)$ is minimized at the intersection point.
Figure 3: $\theta \mapsto g(\alpha, \theta)$ for $\alpha = 1/2$ and $\alpha = 1/(5\theta)$. (Plot: undersampling factor $\theta$ from 1.0 to 3.5 on the horizontal axis, $g(\alpha, \theta)$ from 1.5 to 3.0 on the vertical axis.)
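We compute the intersection point numerically as $\theta_{\mathrm{opt}} \approx 2.219695$. For $\theta_{\mathrm{opt}}$ we verify (using Wolfram|Alpha [19]) that the maximum $\max_{\frac{1}{5\theta} \le \alpha \le \frac{1}{2}} g(\alpha, \theta)$ is attained at $\alpha = 1/(5\theta_{\mathrm{opt}})$ and that $g(1/(5\theta_{\mathrm{opt}}), \theta_{\mathrm{opt}}) \approx 1.56780$ and $g(1/2, \theta_{\mathrm{opt}}) \approx 1.56780$. Thus, we have established the optimality of $\theta_{\mathrm{opt}}$ even though we have not computed $g(\alpha, \theta)$ for $\alpha \notin \{1/2, 1/(5\theta)\}$ and $\theta \ne \theta_{\mathrm{opt}}$. (In the MathOverflow question [31], this value is verified analytically; notice that there $g(\alpha, \theta)$ is slightly different, giving a different $\theta$.)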
For implementation reasons we want $\theta$ to be a multiple of 1/30. Therefore, we propose $\theta = 11/5$ – a choice which is only slightly smaller than the optimal value and confirmed experimentally (Figure 5). Again for this fixed $\theta$, we verify that indeed the maximum is at $\alpha = 1/2$ and that $g(1/(5\theta), \theta) \approx 1.57$ and $g(1/2, \theta) \approx 1.59$, see Figure 4. Thus, up to the small difference 0.02, we know that $\theta = 11/5$ is optimal.
For this fixed value of $\theta = 11/5$ we have thus computed $\max_{\frac{1}{5\theta} \le \alpha \le \frac{1}{2}} g(\alpha, \theta) \le 1.59$, which in turn gives us a bound on $T_{\mathrm{MQMS}_{11/5}}(n)$.
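[Figure 4 plot: $g(\alpha, \theta)$ against $\alpha$ for $\alpha$ between 0.1 and 0.5.]
Figure 4: $\alpha \mapsto g(\alpha, \theta)$ for $\theta = 11/5$ with $\alpha \in [1/(5\theta), 1/2]$ reaches its maximum $\approx 1.59$ for $\alpha = 1/2$.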
Theorem 3.4. MoMQuickMergesort with undersampling factor $\theta = 11/5$ ($\mathrm{MQMS}_{11/5}$) runs in $O(n \log n)$ time and performs at most $n \log n + 1.59n + O(n^{0.8})$ comparisons.
3.4 Heuristic estimate of the average case
It is hard to calculate an exact average case, since only at the first stage of the execution of the algorithm are we dealing with random inputs. We still estimate the average case by assuming that all intermediate arrays are random and applying some more heuristic arguments.
Average of the median-of-medians algorithm. On average we can expect that the pivot returned from the median-of-medians procedure is very close to an actual median, which gives us an easy recurrence showing that $T_{\mathrm{av,MoM}}(n) \approx \frac{40}{7}n$. However, we have to take adaptive pivot selection into account. The first pivot is the $(n/2 \pm o(n))$-th element with very high probability. Thus, the recursive call is on $n/2 + o(n)$ elements with $k \in o(n)$ (or $k = n/2$ – by symmetry we assume the first case). Due to adaptive pivot selection, the array will also be split into a left part of size $o(n)$ (with the element we are looking for in it – this is guaranteed even in the worst case) and a larger right part. This is because an $o(n)$ order element of the $n/15$ pseudomedians of fifteen is also an $o(n)$ order element of the whole array. Thus, all successive recursive calls will be made on arrays of size $o(n)$. We denote the average number of comparisons of the median-of-medians algorithm recursing on an array of size $o(n)$ as $T^{\mathrm{noRec}}_{\mathrm{av,MoM}}(n)$.
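We also have to take the recursive calls for pivot selection into account. The first pivot is the median of the sample; thus, the same reasoning as for $T_{\mathrm{av,MoM}}(n)$ applies. The second pivot is an element of order $o(n)$ out of $n/18 + o(n)$ elements – so we are in the situation of $T^{\mathrm{noRec}}_{\mathrm{av,MoM}}(n)$. Thus, we get

$$
T_{\mathrm{av,MoM}}(n) = T^{\mathrm{noRec}}_{\mathrm{av,MoM}}\Big(\frac{n}{2}\Big) + T_{\mathrm{av,MoM}}\Big(\frac{n}{9}\Big) + \frac{20n}{9}
$$

and

$$
T^{\mathrm{noRec}}_{\mathrm{av,MoM}}(n) = T^{\mathrm{noRec}}_{\mathrm{av,MoM}}\Big(\frac{n}{9}\Big) + \frac{20n}{9} + o(n).
$$

Hence, by Lemma 2.1, we obtain $T^{\mathrm{noRec}}_{\mathrm{av,MoM}}(n) = \frac{20n}{9} \cdot \frac{9}{8} + o(n) = \frac{5n}{2} + o(n)$ and

$$
T_{\mathrm{av,MoM}}(n) = T_{\mathrm{av,MoM}}(n/9) + \frac{20n}{9} + \frac{5n}{4} + o(n) = \frac{125}{32}\, n + o(n) \le 4n + o(n).
$$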
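For comparison, the non-adaptive estimate of $\frac{40}{7}n$ mentioned above can be recovered by the same kind of arithmetic. This is a sketch, assuming the "easy recurrence" is the one that recurses on $n/2$ elements for the selection and on $n/9$ elements for the pivot, with the same linear cost $\frac{20n}{9}$; the second line repeats the adaptive calculation for a side-by-side view.

```latex
\begin{align*}
T(n) = T\left(\tfrac{n}{2}\right) + T\left(\tfrac{n}{9}\right) + \frac{20n}{9}
  \;&\Longrightarrow\;
  T(n) = \frac{20/9}{1 - \tfrac12 - \tfrac19}\, n = \frac{20}{9} \cdot \frac{18}{7}\, n
       = \frac{40}{7}\, n \approx 5.71\, n, \\
T_{\mathrm{av,MoM}}(n) = T_{\mathrm{av,MoM}}\left(\tfrac{n}{9}\right) + \frac{20n}{9} + \frac{5n}{4} + o(n)
  \;&\Longrightarrow\;
  T_{\mathrm{av,MoM}}(n) = \frac{125}{36} \cdot \frac{9}{8}\, n + o(n)
       = \frac{125}{32}\, n + o(n) \approx 3.91\, n.
\end{align*}
```

Under these assumptions, adaptive pivot selection lowers the heuristic estimate from about $5.71n$ to about $3.91n$ comparisons.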
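Average of MoMQuickMergesort. As for the median-of-medians algorithm, we can expect that the pivot in MoMQuickMergesort is always very close to the median. Using the bound for the adaptive version of the median-of-medians algorithm, we obtain

$$
\begin{aligned}
T_{\mathrm{av,MQMS}_\theta}(n) &= T_{\mathrm{av,MQMS}_\theta}(n/2) + \frac{n}{2} \log(n/2) - \frac{1.24\,n}{2} + \frac{22}{15\theta}\, n + \frac{4}{15\theta}\, n + \frac{15\theta - 1}{15\theta}\, n + o(n) \\
&\approx n \log n + n \cdot \Big({-1.24} + \frac{10}{3\theta}\Big) + o(n)
\end{aligned}
$$

by Lemma 2.1 (here the $\frac{4}{15\theta}n$ is for the average case of the median-of-medians algorithm, the other terms as before). This yields

$$
T_{\mathrm{av,MQMS}}(n) \approx n \log n + 2.094n + o(n)
$$

(for $\theta = 1$). For our proposed $\theta = \frac{11}{5} = 2.2$ we have

$$
T_{\mathrm{av,MQMS}_{11/5}}(n) \approx n \log n + 0.275n + o(n).
$$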
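As a quick arithmetic check of the two constants above, the linear-term coefficient $-1.24 + \frac{10}{3\theta}$ can be evaluated directly. The function name `average_linear_coefficient` is ours and purely illustrative; it is not part of the authors' code.

```cpp
#include <cassert>

// Heuristic average-case coefficient of the linear term,
//   kappa_ac(theta) = -1.24 + 10 / (3 * theta),
// taken from the estimate in the text. The function name is hypothetical.
double average_linear_coefficient(double theta) {
    assert(theta > 0.0);  // the undersampling factor must be positive
    return -1.24 + 10.0 / (3.0 * theta);
}
```

Evaluating it at `theta = 1.0` gives roughly 2.09 and at `theta = 2.2` roughly 0.275, in line with the values stated for MQMS and for the proposed undersampling factor.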
3.5 Hybrid algorithms
In order to achieve an even better average case, we can apply a trick similar to Introsort [29]. Be aware, however, that this deteriorates the worst case slightly. We fix some small $\delta > 0$. The algorithm starts by executing QuickMergesort with median-of-three pivot selection. Whenever the pivot is contained in the interval $[\delta n, (1-\delta)n]$, the next pivot is selected again as median of three, otherwise according to Section 3.3 (as median of pseudomedians of $n/\theta$ elements) – for the following pivots it switches back to median of three. When choosing $\delta$ not too small, the worst-case number of comparisons will be only approximately $2n$ more than that of MoMQuickMergesort with undersampling (because in the worst case before every partitioning step according to MoMQuickMergesort with undersampling, there will be one partitioning step with median-of-three using $n$ comparisons), while the average is almost the same as for QuickMergesort with median-of-three. We use $\delta = 1/16$. We call this algorithm hybrid QuickMergesort (HQMS).
Another possibility for a hybrid algorithm is to use MoMQuickMergesort (with undersampling)
instead of Heapsort as a worst-case stopper for Introsort. We test both variants in our experiments.
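The switching rule of hybrid QuickMergesort can be sketched as a small decision function. This is only our reading of the description above, not the authors' implementation: the names `PivotRule` and `next_pivot_rule` are hypothetical, and the pivot rank is assumed to be the 0-based position of the pivot after partitioning.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names; a sketch of the switching rule only.
enum class PivotRule { MedianOfThree, MedianOfMedians };

// Decide how the NEXT pivot is chosen, given how the previous pivot was
// chosen and at which rank it ended up in an array of n elements.
PivotRule next_pivot_rule(PivotRule previous, std::size_t pivot_rank,
                          std::size_t n, double delta) {
    assert(n > 0 && pivot_rank < n);
    assert(delta > 0.0 && delta < 0.5);
    // After a median-of-medians step the algorithm switches back to
    // median of three for the following pivot.
    if (previous == PivotRule::MedianOfMedians) {
        return PivotRule::MedianOfThree;
    }
    const double lo = delta * static_cast<double>(n);
    const double hi = (1.0 - delta) * static_cast<double>(n);
    const double r = static_cast<double>(pivot_rank);
    // A median-of-three pivot inside [delta*n, (1-delta)*n] is good enough;
    // otherwise the next pivot comes from the median-of-medians procedure.
    return (r >= lo && r <= hi) ? PivotRule::MedianOfThree
                                : PivotRule::MedianOfMedians;
}
```

With `delta = 1.0 / 16` and `n = 1000`, a median-of-three pivot at rank 500 keeps median-of-three selection, while one at rank 10 triggers a single median-of-medians step. This alternation is what limits the extra worst-case cost to roughly one additional partitioning pass per guarded step.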
3.6 Summary of algorithms
For the reader’s convenience we provide a short summary of the different versions of MoMQuickMergesort and the results we obtained in Table 1.

| Acronym | Algorithm | Results |
|---|---|---|
| bMQMS | basic MoMQuickMergesort | Theorem 3.1: $\kappa_{wc} \le 13.8$ |
| MQMS | MoMQuickMergesort (uses Reinhardt’s merging with balanced merges) | Theorem 3.2: $\kappa_{wc} \le 4.57$, Section 3.4: $\kappa_{ac} \approx 2.094$ |
| $\mathrm{MQMS}_{11/5}$ | MoMQuickMergesort with undersampling factor 11/5 (uses Reinhardt’s merging with imbalanced merges) | Theorem 3.4: $\kappa_{wc} \le 1.59$, Section 3.4: $\kappa_{ac} \approx 0.275$ |
| HQMS | hybrid QuickMergesort (combines median-of-3 QuickMergesort and $\mathrm{MQMS}_{11/5}$) | Section 3.5: $\kappa_{wc} \le 3.58$ for $\delta$ large enough, $\kappa_{ac}$ smaller |

Table 1: Overview of the algorithms in this paper. For the worst-case number of comparisons $n \log n + \kappa_{wc} n + O(n^{0.8})$ and average case of roughly $n \log n + \kappa_{ac} n$ the results on $\kappa_{wc}$ and $\kappa_{ac}$ are shown. The average cases are only heuristic estimates.

4 Experiments
Experimental setup. We ran thorough experiments with implementations in C++ with different kinds of input permutations. The experiments are run on an Intel Core i5-2500K CPU (3.30GHz, 4 cores, 32KB L1 instruction and data cache, 256KB L2 cache per core and 6MB L3 shared cache) with 16GB RAM and operating system Ubuntu Linux 64bit version 14.04.4. We used GNU’s g++ (4.8.4); optimized with flags -O3 -march=native. For time measurements, we used std::chrono::high_resolution_clock, for generating random inputs, the Mersenne Twister pseudo-random generator std::mt19937. All time measurements were repeated with the same 100 deterministically chosen seeds – the displayed numbers are the averages of these 100 runs. Moreover, for each time measurement, at least 128MB of data were sorted – if the array size is smaller, then for this time measurement several arrays have been sorted and the total elapsed time measured. If not specified explicitly, all experiments were conducted with 32-bit integers.
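A measurement loop in the spirit of this setup could look as follows. It is a minimal sketch, not the authors' benchmark code: the function name `mean_sort_time_ns` is hypothetical, the seeds 0, 1, ... are an assumption (the text only says the seeds are chosen deterministically), unsigned 32-bit keys are assumed, and the rule of sorting at least 128MB per measurement is left out for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Mean wall-clock time (in nanoseconds) that `sort_fn` needs to sort a random
// permutation of n 32-bit integers, averaged over `runs` seeded inputs.
// Assumption: seeds 0..runs-1 stand in for the paper's deterministic seeds.
template <typename SortFn>
double mean_sort_time_ns(std::size_t n, unsigned runs, SortFn sort_fn) {
    assert(runs > 0);
    using hr_clock = std::chrono::high_resolution_clock;
    double total_ns = 0.0;
    for (unsigned seed = 0; seed < runs; ++seed) {
        std::mt19937 gen(seed);
        std::vector<std::uint32_t> data(n);
        std::iota(data.begin(), data.end(), std::uint32_t{0});
        std::shuffle(data.begin(), data.end(), gen);  // input generation is not timed
        const auto start = hr_clock::now();
        sort_fn(data.begin(), data.end());
        const auto stop = hr_clock::now();
        total_ns += std::chrono::duration<double, std::nano>(stop - start).count();
        assert(std::is_sorted(data.begin(), data.end()));  // sanity check, not timed
    }
    return total_ns / runs;
}
```

A call such as `mean_sort_time_ns(1 << 20, 100, [](auto first, auto last) { std::sort(first, last); })` would then time `std::sort`; dividing the result by $n \log n$ gives the quantity shown on the vertical axes of the running-time plots.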
Implementation details. The code of our implementation of MoMQuickMergesort as well as the
other algorithms and our running time experiments is available at https://github.com/weissan/QuickXsort.
In our implementation of MoMQuickMergesort, we use the merging procedure from [13], which
avoids branch mispredictions. We use the partitioner from the libstdc++ implementation of std::sort.
For the running time experiments, base cases up to 42 elements are sorted with Insertionsort. For
the comparison measurements Mergesort is used down to size one arrays.
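The base case used in the running time experiments can be illustrated with a textbook Insertionsort and the cutoff of 42 elements stated above. The names `kInsertionSortCutoff`, `is_base_case` and `insertion_sort` are ours, and the authors' tuned implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Cutoff stated in the text: subarrays of up to 42 elements are base cases.
constexpr std::size_t kInsertionSortCutoff = 42;

// True if a subarray of n elements would be handled by Insertionsort.
bool is_base_case(std::size_t n) { return n <= kInsertionSortCutoff; }

// Plain insertion sort on the half-open index range [lo, hi) of `a`.
template <typename T>
void insertion_sort(std::vector<T>& a, std::size_t lo, std::size_t hi) {
    assert(lo <= hi && hi <= a.size());
    for (std::size_t i = lo + 1; i < hi; ++i) {
        T key = std::move(a[i]);
        std::size_t j = i;
        // Shift larger elements one slot to the right until key fits.
        while (j > lo && key < a[j - 1]) {
            a[j] = std::move(a[j - 1]);
            --j;
        }
        a[j] = std::move(key);
    }
}
```

For the comparison counts no such cutoff is used, since Insertionsort performs more comparisons than Mergesort on small inputs and would distort the measured linear term.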
Simulation of a worst case. In order to experimentally conﬁrm our worst case bounds for
MoMQuickMergesort, we simulate a worst case. Be aware that it is not even clear whether in reality
there are input permutations where the bounds for the worst case of Section 3 are tight since when
selecting pivots the array is already pre-sorted in a particular way (which is hard to understand for a
thorough analysis). Actually in [7] it is conjectured that similar bounds for diﬀerent variants of the
median-of-medians algorithm are not tight. Therefore, we cannot test the worst-case by designing
particularly bad inputs. Nevertheless, we can simulate a worst-case scenario where every pivot is
chosen the worst way possible (according to the theoretical analysis). More precisely, the simulation
of the worst case comprises the following aspects:
- For computing the k-th element of a small array (up to 30 elements) we additionally sort it with Heapsort. This is because our implementation uses Introselect (std::nth_element) for arrays of size up to 30.
- When measuring comparisons, we perform a random shuffle before every call to Mergesort. As the average case of Mergesort is close to its worst case (up to approximately 0.34n), this gives a fairly good approximation of the worst case. For measuring time we apply a simplified shuffling method, which shuffles only a few positions in the array.
- In the median-of-medians algorithm, we do not use the pivot selected by recursive calls, but use std::nth_element to find the worst pivot the recursive procedure could possibly select. We do not count comparisons incurred by std::nth_element. This is the main contribution to the worst case.
- As pivot for QuickMergesort (the basic and improved variant) we always use the real median (this is actually the worst possibility as the recursive call of QuickMergesort is guaranteed to be on the smaller half and Mergesort is not slower than QuickMergesort). In the version with undersampling we use the most extreme pivot (since this is worse than the median).
- We also make 100 measurements for each data point. When counting comparisons, we take the maximum over all runs instead of the mean. However, this makes only a negligible difference (as the small standard deviation in Table 2 suggests). When measuring running times we still take the mean since the maximum reflects only the large standard deviation of Quickselect (std::nth_element), which we use to find bad pivots.

The simulated worst cases are always drawn as dashed lines in the plots (except in Figure 5).
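Forcing a bad pivot with std::nth_element, as described in the list above, can be sketched as follows. The helper name `place_extreme_pivot` and the choice of rank $\lfloor \alpha n \rfloor$ are assumptions made for illustration; with undersampling one would pass $\alpha = 1/(5\theta)$, and for the variants without undersampling $\alpha = 1/2$.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Moves the element of rank floor(alpha * n) to its sorted position and
// returns that position, so it can be used as a deliberately bad pivot.
// Hypothetical helper: the paper only states that std::nth_element is used.
template <typename T>
std::size_t place_extreme_pivot(std::vector<T>& a, double alpha) {
    assert(!a.empty());
    assert(alpha >= 0.0 && alpha <= 0.5);
    const std::size_t n = a.size();
    const std::size_t k =
        std::min(n - 1, static_cast<std::size_t>(alpha * static_cast<double>(n)));
    // After this call a[k] is the k-th smallest element, everything before it
    // is not larger and everything after it is not smaller.
    std::nth_element(a.begin(), a.begin() + static_cast<std::ptrdiff_t>(k), a.end());
    return k;
}
```

Comparisons made inside `std::nth_element` would have to be excluded from the count, as the text notes, for example by calling it with an uncounted comparator.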
Diﬀerent undersampling factors. In Figure 5, we compare the (simulated) worst-case number
of comparisons for diﬀerent undersampling factors θ. The picture resembles the one in Figure 3.
However, all numbers are around 0.4 smaller than in Figure 3 because we used the average case of
Mergesort to simulate its worst case. Also, depending on the array size n, the point where the two
curves for α= 1/2 and α= 1/(5θ) meet diﬀers (αas in Section 3.3). Still the minimum is always
achieved between 2.1 and 2.3 (recall that we have to take the maximum of the two curves for the
same n) – conﬁrming the calculations in Section 3.3 and suggesting θ= 2.2 as a good choice for
further experiments.
[Figure 5 plot: (comparisons − n log n)/n against the undersampling factor θ, for n = 1048576, n = 4194304 and n = 33554432, each with one curve for α = 1/2 and one for α = 1/(5θ).]
Figure 5: Coefficient of the linear term of the number of comparisons in the simulated worst case for different undersampling factors.
Comparison of different variants. In Figure 6, we compare the running times (divided by $n \log n$) of the different variants of MoMQuickMergesort including the simulated worst cases. We see that the version with undersampling is the fastest both in the average and worst case. Moreover, while in the average case the differences are rather small, in the worst case the improved versions are considerably better than the very basic variant.
In Figure 7 we count the number of comparisons of the different versions. The plot shows the coefficient of the linear term of the number of comparisons (i.e. the total number of comparisons minus $n \log n$ and then divided by $n$). Table 2 summarizes the results for $n = 2^{28}$. We see that our theoretical estimates are close to the real values: for the average case, the difference is almost negligible; for the worst case, the gap is slightly larger because we use the average case of Mergesort as “simulation” for its worst case (notice that the difference between the average and our bound for the worst case is approximately 0.34n). Moreover, the data suggest that for the worst case of bMQMS we would have to experiment with even larger arrays in order to get a good estimate of the linear term of the number of comparisons. Also we see that the actual number of comparisons approaches the theoretically estimated values from below – thus, the $O(n^{0.8})$-terms in our estimates are most likely negative. Notice however that, as remarked above, we do not know whether the bounds for the worst case are tight for real inputs.

[Figure 6 plot: time per n log n [ns] against the number of elements n (from $2^{10}$ to $2^{28}$) for bMQMS, MQMS, $\mathrm{MQMS}_{11/5}$ and their simulated worst cases (wc).]
Figure 6: Running times divided by $n \log n$ of different MoMQuickMergesort variants and their simulated worst cases.

[Figure 7 plot: (comparisons − n log n)/n against the number of elements n (from $2^{10}$ to $2^{28}$) for bMQMS, MQMS, $\mathrm{MQMS}_{11/5}$, $\mathrm{MQMS}_{11/5}$ (wc) and MQMS (wc).]
Figure 7: Number of comparisons (linear term) of different MoMQuickMergesort variants and their simulated worst cases. The worst case of bMQMS is out of range.
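The quantity plotted in Figure 7, the coefficient of the linear term, is easy to compute from a raw comparison count. The sketch below assumes the binary logarithm and uses hypothetical names (`linear_term_coefficient`, `count_comparisons_std_sort`); it counts the comparisons of `std::sort` merely as a stand-in, since the authors' sorting routines are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Coefficient of the linear term: (comparisons - n * log2(n)) / n.
double linear_term_coefficient(std::uint64_t comparisons, std::size_t n) {
    assert(n > 0);
    const double nd = static_cast<double>(n);
    return (static_cast<double>(comparisons) - nd * std::log2(nd)) / nd;
}

// Counts the element comparisons std::sort performs on a copy of `input`
// by routing every comparison through a counting comparator.
std::uint64_t count_comparisons_std_sort(std::vector<int> input) {
    std::uint64_t count = 0;
    std::sort(input.begin(), input.end(), [&count](int a, int b) {
        ++count;
        return a < b;
    });
    return count;
}
```

For instance, 11264 comparisons on $n = 1024$ elements correspond to a coefficient of 1, because $n \log n = 10240$ and $(11264 - 10240)/1024 = 1$.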
| Algorithm | average case (exp.) | average case (theo.) | worst case (exp.) | worst case (theo.) |
|---|---|---|---|---|
| bMQMS | 2.772 ± 0.02 | – | 13.05 ± 0.17 | 13.8 |
| MQMS | 2.084 ± 0.001 | 2.094 | 4.220 ± 0.007 | 4.57 |
| $\mathrm{MQMS}_{11/5}$ | 0.246 ± 0.01 | 0.275 | 1.218 ± 0.011 | 1.59 |

Table 2: Experimentally established linear term of the average and worst case (simulated) number of comparisons for MoMQuickMergesort for $n = 2^{28}$. The ± values are the standard deviations. The respective second columns show our theoretical estimates.
[Figure 8 plot: time per n log n [ns] against the number of elements n (from $2^{10}$ to $2^{28}$) for in-situ Mergesort, hybrid QMS, $\mathrm{MQMS}_{11/5}$, std::sort, std::partial_sort, std::stable_sort, Wikisort and $\mathrm{MQMS}_{11/5}$ (wc).]
Figure 8: Running times of MoMQuickMergesort (average and simulated worst case), hybrid QMS and other algorithms for random permutations of 32-bit integers. Running times are divided by $n \log n$.
Comparison with other algorithms. We conducted experiments comparing MoMQuickMergesort with the following other algorithms: Wikisort [28], in-situ Mergesort [13], std::partial_sort (Heapsort), std::stable_sort (Mergesort) and std::sort (Introsort), see Figure 8. For the latter three algorithms we use the libstdc++ implementations (from GCC version 4.8). We also ran experiments with bottom-up Heapsort and Grailsort [2], but omitted the results because these algorithms behave similarly to Heapsort (resp. Wikisort). We see that MoMQuickMergesort with undersampling ($\mathrm{MQMS}_{11/5}$) performs better than all other algorithms except hybrid QuickMergesort and std::sort. Moreover, for $n = 2^{28}$ the gap between $\mathrm{MQMS}_{11/5}$ and std::sort is only roughly 10%, and the simulated worst case of $\mathrm{MQMS}_{11/5}$ is again only slightly over 10% worse than its average case.
Notice that while all algorithms have a worst-case guarantee of $O(n \log n)$, the good average case behavior of std::sort comes at the cost of a bad worst case (see Figure 11) – the same applies to in-situ Mergesort. Also notice that even the simulated worst case of MoMQuickMergesort is better than the running times of in-situ Mergesort, Wikisort and Heapsort, i.e. all the other non-hybrid in-place algorithms we tested.

[Figure 9 plot: time per n log n [ns] against the number of elements n (from $2^{10}$ to $2^{22}$) for in-situ Mergesort, hybrid QMS, $\mathrm{MQMS}_{11/5}$, std::sort, std::partial_sort, std::stable_sort, Wikisort and $\mathrm{MQMS}_{11/5}$ (wc).]
Figure 9: Running times of MoMQuickMergesort (average and simulated worst case), hybrid QMS and other algorithms for random permutations of 44-byte records with 4-byte keys. Running times are divided by $n \log n$.
In Figure 9 and Figure 10 we measure running times when sorting large objects: in Figure 9 we sort 44-byte records which are compared according to their first 4 bytes. Figure 10 shows the results when comparing pointers to such records which are allocated on the heap. In both cases std::sort is the fastest, but MoMQuickMergesort with undersampling is still faster than std::partial_sort, in-situ Mergesort and Wikisort (unfortunately the latter did not run for sorting pointers).
In all the experiments, the standard deviations of most algorithms were negligible. Only std::sort, hybrid QuickMergesort and the simulated worst cases showed a standard deviation which would be visible in the plots. For the worst cases the large standard deviation is because we use std::nth_element for choosing bad pivots – thus, it is only an artifact of the worst case simulation. The standard deviations of std::sort and hybrid QuickMergesort can be seen in Figure 11 below.
MQMS as worst-case stopper. In Figure 11 we compare the hybrid algorithms described in
Section 3.5: hybrid QuickMergesort and Introsort (std::sort) with MoMQuickMergesort as worst-
case stopper. We also include the original std::sort. We compare three diﬀerent kinds of inputs:
random permutations, merging two sorted runs of almost equal length (the ﬁrst run two elements
longer than the second run), and a median-of-three killer sequence, where the largest elements are
located in the middle and back of the array and in between the array is randomly shuﬄed.
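The "merge" input can be generated along the following lines. This is a sketch under stated assumptions: the text fixes only the run lengths (the first run is two elements longer than the second), so the interleaving of even and odd values, the even array size and the name `make_merge_input` are our choices for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds an array of n elements that consists of two ascending runs, the
// first run being two elements longer than the second.
// Assumption: the first run holds even values and the second odd values,
// so that merging the runs interleaves them; the paper does not say which
// values the runs contain.
std::vector<std::uint32_t> make_merge_input(std::size_t n) {
    assert(n >= 2 && n % 2 == 0);  // needed for lengths n/2 + 1 and n/2 - 1
    const std::size_t first_len = n / 2 + 1;
    const std::size_t second_len = n / 2 - 1;
    std::vector<std::uint32_t> a;
    a.reserve(n);
    for (std::size_t i = 0; i < first_len; ++i) {
        a.push_back(static_cast<std::uint32_t>(2 * i));      // 0, 2, 4, ...
    }
    for (std::size_t i = 0; i < second_len; ++i) {
        a.push_back(static_cast<std::uint32_t>(2 * i + 1));  // 1, 3, 5, ...
    }
    return a;
}
```

For `n = 10` this yields the runs 0, 2, 4, 6, 8, 10 and 1, 3, 5, 7.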
While for random inputs the difference between the two variants of std::sort is negligible and also hybrid QuickMergesort is only slightly slower (be aware of the scale of the plot), on the merge permutation we see a considerable difference for large n. Except for very small n, hybrid QuickMergesort is the fastest here. On the median-of-three killer input, the original std::sort is outperformed by the other algorithms by a wide margin. Also hybrid QuickMergesort is faster than both variants of std::sort.
[Figure 10 plot: time per n log n [ns] against the number of elements n (from $2^{10}$ to $2^{22}$) for in-situ Mergesort, hybrid QMS, $\mathrm{MQMS}_{11/5}$, std::sort, std::partial_sort, std::stable_sort and $\mathrm{MQMS}_{11/5}$ (wc).]
Figure 10: Running times of MoMQuickMergesort (average and simulated worst case), hybrid QMS and other algorithms for random permutations of pointers to records. Running times are divided by $n \log n$. Wikisort did not work with pointers.

[Figure 11 plots: three panels showing time per n log n [ns] against the number of elements n (from $2^{10}$ to $2^{25}$) for hybrid QMS, std::sort and std::sort (MQMS worst case stopper).]
Figure 11: Running times of Introsort (std::sort) with Heapsort (original) and MoMQuickMergesort as worst-case stopper and hybrid QuickMergesort. Left: random permutation, middle: merge, right: median-of-three killer sequence. The vertical bars represent the standard deviations.
Figure 11 also displays the standard deviation as error bars. We see that for the special permutations the standard deviations are negligible – which is no surprise. For random permutations, hybrid QuickMergesort shows a significantly higher standard deviation than the std::sort variants. This could be improved by using larger pivot samples (e.g. pseudomedian of 9 or 25). Notice that only for $n \ge 2^{25}$ the standard deviations are meaningful since for smaller n each measurement is already an average, so the calculated standard deviation is much smaller than the real standard deviation.
5 Conclusion
We have shown that by using the median-of-medians algorithm for pivot selection QuickMergesort turns into a highly efficient algorithm in the worst case, while remaining competitive on average and fast in practice. Future research might address the following points:
- Although pseudomedians of fifteen elements for sampling the pivot sample seem to be a good choice, other methods could be investigated (e.g. median of nine).
- The running time could be further improved by using Insertionsort to sort small subarrays.
- Since the main work is done by Mergesort, any tuning of the merging procedure would also directly affect MoMQuickMergesort.
- Also other methods for in-place Mergesort implementations are promising and should be developed further – in particular, the (unstable) merging procedure by Chen [6] seems to be a good starting point.
- To get the most performance out of modern multi-core processors, a parallel version of the algorithm is desirable. For both Quicksort and Mergesort efficient parallel implementations are known. Thus, an efficient parallel implementation of MoMQuickMergesort is not out of reach. However, there is one additional difficulty to overcome: while in the Quicksort recursion both parts can be sorted independently in parallel, in QuickMergesort this is not possible since one part is necessary as temporary memory for sorting the other part with Mergesort.
References
[1] Andrei Alexandrescu. Fast deterministic selection. In Costas S. Iliopoulos, Solon P. Pissis,
Simon J. Puglisi, and Rajeev Raman, editors, 16th International Symposium on Experimental
Algorithms, SEA 2017, June 21-23, 2017, London, UK, volume 75 of LIPIcs, pages 24:1–24:19.
Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017.
[2] Andrey Astrelin. Grailsort. Github repository at https://github.com/Mrrl/GrailSort.
[3] Jon Louis Bentley and M. Douglas McIlroy. Engineering a sort function. Softw., Pract. Exper.,
23(11):1249–1265, 1993.
[4] Manuel Blum, Robert W. Floyd, Vaughan R. Pratt, Ronald L. Rivest, and Robert E. Tarjan.
Time bounds for selection. Journal of Computer and System Sciences, 7(4):448–461, 1973.
[5] Domenico Cantone and Gianluca Cincotti. Quickheapsort, an eﬃcient mix of classical sorting
algorithms. Theor. Comput. Sci., 285(1):25–42, 2002.
[6] Jing-Chao Chen. A simple algorithm for in-place merging. Inf. Process. Lett., 98(1):34–40,
2006.
[7] Ke Chen and Adrian Dumitrescu. Select with groups of 3 or 4. In Frank Dehne, Jörg-Rüdiger Sack, and Ulrike Stege, editors, Algorithms and Data Structures - 14th International Symposium, WADS 2015, Victoria, BC, Canada, August 5-7, 2015. Proceedings, volume 9214 of Lecture Notes in Computer Science, pages 189–199. Springer, 2015.
[8] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. The MIT Press, 3rd edition, 2009.
[9] Volker Diekert and Armin Weiß. Quickheapsort: Modiﬁcations and improved analysis. In CSR,
pages 24–35, 2013.
[10] Stefan Edelkamp and Patrick Stiegeler. Implementing Heapsort with n log n − 0.9n and Quicksort with n log n + 0.2n comparisons. Journal of Experimental Algorithmics, 7:Article 5, 2002.
[11] Stefan Edelkamp and Armin Weiß. QuickXsort: Efficient Sorting with n log n − 1.399n + o(n) Comparisons on Average. In Edward A. Hirsch, Sergei O. Kuznetsov, Jean-Éric Pin, and Nikolay K. Vereshchagin, editors, CSR, volume 8476 of Lecture Notes in Computer Science, pages 139–152. Springer, 2014.
[12] Stefan Edelkamp and Armin Weiß. Quickmergesort: Practically eﬃcient constant-factor opti-
mal sorting. CoRR, abs/1804.10062, 2018.
[13] Amr Elmasry, Jyrki Katajainen, and Max Stenmark. Branch mispredictions don’t aﬀect merge-
sort. In SEA, pages 160–171, 2012.
[14] Robert W. Floyd and Ronald L. Rivest. The algorithm SELECT - for ﬁnding the ith smallest
of n elements [M1] (algorithm 489). Commun. ACM, 18(3):173, 1975.
[15] Robert W. Floyd and Ronald L. Rivest. Expected time bounds for selection. Commun. ACM,
18(3):165–172, 1975.
[16] Viliam Geffert, Jyrki Katajainen, and Tomi Pasanen. Asymptotically efficient in-place merging. Theor. Comput. Sci., 237(1-2):159–181, 2000.
[17] C. A. R. Hoare. Algorithm 65: Find. Commun. ACM, 4(7):321–322, July 1961.
[18] Bing-Chao Huang and Michael A. Langston. Fast stable merging and sorting in constant extra
space. Comput. J., 35(6):643–650, 1992.
[19] Wolfram Research, Inc. Wolfram|Alpha. Champaign, IL, 2018.
[20] Ming-Yang Kao. Multiple-size divide-and-conquer recurrences. SIGACT News, 28(2):67–69,
1997.
[21] Jyrki Katajainen. The Ultimate Heapsort. In CATS, pages 87–96, 1998.
[22] Jyrki Katajainen, Tomi Pasanen, and Jukka Teuhola. Practical in-place mergesort. Nord. J.
Comput., 3(1):27–40, 1996.
[23] Pok-Son Kim and Arne Kutzner. Ratio based stable in-place merging. In Manindra Agrawal,
Ding-Zhu Du, Zhenhua Duan, and Angsheng Li, editors, Theory and Applications of Models
of Computation, 5th International Conference, TAMC 2008, Xi’an, China, April 25-29, 2008.
Proceedings, volume 4978 of Lecture Notes in Computer Science, pages 246–257. Springer, 2008.
[24] Donald E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming.
Addison Wesley Longman, 2nd edition, 1998.
[25] Noriyuki Kurosawa. Quicksort with median of medians is considered practical. CoRR,
abs/1608.04852, 2016.
[26] Conrado Martinez, Daniel Panario, and Alfredo Viola. Adaptive sampling for quickselect. In
Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA
2004, New Orleans, Louisiana, USA, January 11-14, 2004, pages 447–455, 2004.
[27] Conrado Martínez, Daniel Panario, and Alfredo Viola. Adaptive sampling strategies for quickselects. ACM Trans. Algorithms, 6(3):53:1–53:45, 2010.
[28] Mike McFadden. WikiSort. Github repository at https://github.com/BonzaiThePenguin/WikiSort.
[29] David R. Musser. Introspective sorting and selection algorithms. Software—Practice and Ex-
perience, 27(8):983–993, 1997.
[30] Orson Peters. pdqsort. Github repository at https://github.com/orlp/pdqsort.
[31] Iosif Pinelis. Computing minimum / maximum of strange two variable function (answer).
MathOverﬂow. URL: https://mathoverflow.net/q/306757 (visited on: 2018-07-25).
[32] Klaus Reinhardt. Sorting in-place with a worst case complexity of n log n − 1.3n + o(log n) comparisons and εn log n + o(1) transports. In ISAAC, pages 489–498, 1992.
[33] Eric W. Weisstein. Merge sort. From MathWorld--A Wolfram Web Resource. URL: http://mathworld.wolfram.com/MergeSort.html. Last visited on 07/25/2018.
[34] Sebastian Wild. Average cost of QuickXsort with pivot sampling. In James Allen Fill and
Mark Daniel Ward, editors, 29th International Conference on Probabilistic, Combinatorial and
Asymptotic Methods for the Analysis of Algorithms, AofA 2018, June 25-29, 2018, Uppsala,
Sweden, volume 110 of LIPIcs, pages 36:1–36:19. Schloss Dagstuhl - Leibniz-Zentrum fuer In-
formatik, 2018.