
arXiv:1811.00833v1 [cs.DS] 2 Nov 2018

Worst-Case Efficient Sorting with QuickMergesort

Stefan Edelkamp∗  Armin Weiß†

Abstract

The two most prominent solutions for the sorting problem are Quicksort and Mergesort. While Quicksort is very fast on average, Mergesort additionally gives worst-case guarantees, but needs extra space for a linear number of elements. Worst-case efficient in-place sorting, however, remains a challenge: the standard solution, Heapsort, suffers from a bad cache behavior and is also not overly fast for in-cache instances.

In this work we present median-of-medians QuickMergesort (MoMQuickMergesort), a new variant of QuickMergesort, which combines Quicksort with Mergesort allowing the latter to be implemented in place. Our new variant applies the median-of-medians algorithm for selecting pivots in order to circumvent the quadratic worst case. Indeed, we show that it uses at most n log n + 1.6n comparisons for n large enough.

We experimentally confirm the theoretical estimates and show that the new algorithm outperforms Heapsort by far and is only around 10% slower than Introsort (the std::sort implementation of libstdc++), which has a rather poor guarantee for the worst case. We also simulate the worst case, which is only around 10% slower than the average case. In particular, the new algorithm is a natural candidate for replacing Heapsort as a worst-case stopper in Introsort.

keywords: in-place sorting, quicksort, mergesort, analysis of algorithms

1 Introduction

Sorting elements of some totally ordered universe has always been among the most important tasks carried out on computers. Comparison-based sorting of n elements requires at least log n! ≈ n log n − 1.44n comparisons (where log is base 2). Up to constant factors this bound is achieved by the classical sorting algorithms Heapsort, Mergesort, and Quicksort. While Quicksort is usually considered the fastest one, the O(n log n) bound applies only to its average case (both for the number of comparisons and the running time) – in the worst case it deteriorates to a Θ(n^2) algorithm. The standard approach to prevent such a worst case is Musser's Introsort [29]: whenever the recursion depth of Quicksort becomes too large, the algorithm switches to Heapsort (we call this the worst-case stopper). This works well in practice for most instances. However, on small instances Heapsort is already considerably slower than Quicksort (in our experiments more than 30% for n = 2^10) and on larger instances it suffers from its poor cache behavior (in our experiments more than eight times slower than Quicksort for sorting 2^28 elements). This is also the reason why in practice it is mainly used as a worst-case stopper in Introsort.

Another approach for preventing Quicksort's worst case is using the median-of-medians algorithm [4] for pivot selection. However, choosing the pivot as median of the whole array yields a bad average (and worst-case) running time. On the other hand, when choosing the median of a smaller sample as pivot, the average performance becomes quite good [25], but the guarantees for the worst case become even worse.

∗King's College London, UK.

†Universität Stuttgart, Germany. Supported by the DFG grant DI 435/7-1.

The third algorithm, Mergesort, is almost optimal in terms of comparisons: it uses only n log n − 0.91n comparisons in the worst case to sort n elements. Moreover, it performs well in terms of running time. Nevertheless, it is not used as a worst-case stopper for Introsort because it needs extra space for a linear number of data elements. In recent years, several in-place (we use the term for at most logarithmic extra space) variants of Mergesort appeared, both stable ones (meaning that the relative order of elements comparing equal is not changed) [18, 23, 16] and unstable ones [6, 13, 16, 22]. Two of the most efficient implementations of stable variants are Wikisort [28] (based on [23]) and Grailsort [2] (based on [18]). An example of an unstable in-place Mergesort implementation is in-situ Mergesort [13]. It uses Quick/Introselect [17] (std::nth_element) to find the median of the array. Then it partitions the array according to the median (i.e., it moves all smaller elements to the left and all greater elements to the right). Next, it sorts one half with Mergesort, using the other half as temporary space, and, finally, sorts the other half recursively. Since the elements in the temporary space get mixed up (they are used as "dummy" elements), this algorithm is not stable. In-situ Mergesort gives an O(n log n) bound for the worst case. As validated in our experiments, all these in-place variants are considerably slower than ordinary Mergesort.

When, instead of the median, an arbitrary element is chosen as the pivot, we obtain QuickMergesort [11], which is faster on average – at the price that the worst case can be quadratic. QuickMergesort follows the more general concept of QuickXsort [11]: first, choose a pivot element and partition the array according to it. Then, sort one part with X and, finally, the other part recursively with QuickXsort. As for QuickMergesort, the part which is currently not being sorted can be used as temporary space for X.

Other examples of QuickXsort are QuickHeapsort [5, 9], QuickWeakheapsort [10, 11], and Ultimate Heapsort [21]. QuickXsort with median-of-√n pivot selection uses at most n log n + cn + o(n) comparisons on average to sort n elements, given that X also uses at most n log n + cn + o(n) comparisons on average [11]. Moreover, recently Wild [34] showed that, if the pivot is selected as median of some constant-size sample, then the average number of comparisons of QuickXsort is only some small linear term (depending on the sample size) above the average number of comparisons of X (for the median-of-three case see also [12]). However, as long as no linear-size samples are used for pivot selection, QuickXsort does not provide good bounds for the worst case. This defect is overcome in Ultimate Heapsort [21] by using the median of the whole array as pivot. In Ultimate Heapsort the median-of-medians algorithm [4] (which is linear in the worst case) is used for finding the median, leading to an n log n + O(n) bound for the number of comparisons. Unfortunately, due to the large constant of the median-of-medians algorithm, the O(n) term is quite big.

Contribution. In this work we introduce median-of-medians QuickMergesort (MoMQuickMergesort) as a variant of QuickMergesort using the median-of-medians algorithm for pivot selection. The crucial observation is that it is not necessary to use the median of the whole array as pivot; only the guarantee that the pivot is not very far off the median is needed. This observation allows us to apply the median-of-medians algorithm to smaller samples, leading to both a better average- and worst-case performance. Our algorithm is based on a merging procedure introduced by Reinhardt [32], which requires less temporary space than the usual merging. A further improvement, which we call undersampling (taking fewer elements for pivot selection into account), allows us to reduce the worst-case number of comparisons down to n log n + 1.59n + O(n^0.8). Moreover, we heuristically estimate the average case as n log n + 0.275n + o(n) comparisons. The good average case comes partially from the fact that we introduce a new way of adaptive pivot selection for the median-of-medians algorithm (compare [1]). Our experiments confirm the theoretical and heuristic estimates and also show that MoMQuickMergesort is competitive with other algorithms (for n = 2^28 it is more than 7 times faster than Heapsort and around 10% slower than Introsort (std::sort – throughout, this refers to its libstdc++ implementation)). Moreover, we apply MoMQuickMergesort (instead of Heapsort) as a worst-case stopper for Introsort (std::sort). The results are striking: on special permutations, the new variant is up to six times faster than the original version of std::sort.

Outline. In Section 2, we recall QuickMergesort and the median-of-medians algorithm. In Section 3, we describe median-of-medians QuickMergesort, introduce the improvements, and analyze the worst-case and average-case behavior. Finally, in Section 4, we present our experimental results.

2 Preliminaries

Throughout, we use standard O and Θ notation as defined e.g. in [8]. The logarithm log always refers to base 2. For a background on Quicksort and Mergesort we refer to [8] or [24]. A pseudomedian of nine (resp. fifteen) elements is computed as follows: group the elements in groups of three elements and compute the median of each group. The pseudomedian is the median of these three (resp. five) medians.

Throughout, in our estimates we assume that the median of three (resp. five) elements is computed using three (resp. seven) comparisons, regardless of the outcome of previous comparisons. This allows a branch-free implementation of the comparisons.
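To make the definition concrete, the pseudomedian computation can be sketched as follows (a small Python illustration added here, not the authors' implementation; the helper names are invented):

```python
def median3(a, b, c):
    # Median of three elements.
    if (a <= b) == (b <= c):
        return b
    if (b <= a) == (a <= c):
        return a
    return c

def pseudomedian9(xs):
    # Pseudomedian of nine: median of the medians of the three triples.
    assert len(xs) == 9
    return median3(*(median3(*xs[i:i + 3]) for i in range(0, 9, 3)))

def pseudomedian15(xs):
    # Pseudomedian of fifteen: median of the medians of the five triples.
    assert len(xs) == 15
    medians = sorted(median3(*xs[i:i + 3]) for i in range(0, 15, 3))
    return medians[2]  # median of five
```

On distinct elements, a pseudomedian of nine has at least four elements less than or equal to it and four greater than or equal to it – the guarantee used for the repeated-step pivot selection below.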

In this paper we have to deal with simple recurrences of two types, which both have straightforward solutions:

Lemma 2.1. Let 0 < α, β, δ with α + β < 1, γ = 1 − α, and A, C, D, N₀ ∈ ℕ, and

T(n) ≤ T(⌈αn⌉ + A) + T(⌈βn⌉ + A) + Cn + D,
Q(n) ≤ Q(⌈αn⌉ + A) + γn log(γn) + Cn + O(n^δ)

for n ≥ N₀, and T(n), Q(n) ≤ D for n ≤ N₀ (N₀ large enough). Moreover, let ζ ∈ ℝ be such that α^ζ + β^ζ = 1 (notice that ζ < 1). Then

T(n) ≤ Cn/(1 − α − β) + O(n^ζ)   and
Q(n) ≤ n log n + (α log α/γ + log γ + C/γ)·n + O(n^δ).

Proof. It is well-known that T(n) has a linear solution. Therefore, (after replacing T(n) by a reasonably smooth function) T(⌈αn⌉ + A) and T(⌊αn⌋) differ by at most some constant. Thus, after increasing D, we may assume that T(n) is of the simpler form

T(n) ≤ T(αn) + T(βn) + Cn + D.    (1)

We can split (1) into two recurrences

T_C(n) ≤ T_C(αn) + T_C(βn) + Cn   and   T_D(n) ≤ T_D(αn) + T_D(βn) + D

with T_C(n) = 0 and T_D(n) ≤ D for n ≤ N₀. For T_C we get the solution T_C(n) ≤ Cn/(1 − α − β). By the generalized Master theorem [20], it follows that T_D ∈ O(n^ζ), where ζ ∈ ℝ satisfies α^ζ + β^ζ = 1. Thus,

T(n) ≤ Cn/(1 − α − β) + O(n^ζ).


Now, let us consider the recurrence for Q(n). With the same argument as before we have Q(n) ≤ Q(αn) + γn log(γn) + Cn + O(n^δ). Thus, we obtain

Q(n) ≤ Σ_{i=0}^{log_{1/α} n} ( α^i·γn·log(α^i·γn) + C·α^i·n + O((α^i·n)^δ) )
     = n·Σ_{i≥0} ( α^i·γ·(log n + i·log α + log γ) + C·α^i ) + O(n^δ)
     = (γ/(1−α))·n log n + ( α·γ·log α/(1−α)^2 + γ·log γ/(1−α) + C/(1−α) )·n + O(n^δ)
     = n log n + ( α·log α/(1−α) + log γ + C/(1−α) )·n + O(n^δ).

This proves Lemma 2.1.
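As a sanity check (an illustration added here, not from the paper), such a recurrence can be evaluated numerically. With the parameters of Lemma 2.2 below (α = 7/9, β = 1/9, C = 20/9, using floors and A = 0, which only changes lower-order terms), the ratio T(n)/n approaches the closed form C/(1 − α − β) = 20:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    # T(n) <= T(floor(7n/9)) + T(floor(n/9)) + (20/9) n
    if n <= 1:
        return 0.0
    return T(7 * n // 9) + T(n // 9) + 20.0 * n / 9.0

# the closed form predicts T(n) = 20 n - O(n^0.78)
ratio = T(10 ** 6) / 10 ** 6
```

For n = 10^6 the ratio is a bit below 20, the gap being the O(n^ζ) term of the lemma.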

2.1 QuickMergesort

QuickMergesort follows the design pattern of QuickXsort: let X be some sorting algorithm (in our

case X = Mergesort). QuickXsort works as follows: ﬁrst, choose some pivot element and partition

the array according to this pivot, i. e., rearrange it such that all elements left of the pivot are less

or equal and all elements on the right are greater than or equal to the pivot element. Then, choose

one part of the array and sort it with the algorithm X. After that, sort the other part of the array

recursively with QuickXsort. The main advantage of this procedure is that the part of the array

that is not being sorted currently can be used as temporary memory for the algorithm X. This

yields fast in-place variants for various external sorting algorithms such as Mergesort. The idea is

that whenever a data element should be moved to the extra (additional or external) element space,

instead it is swapped with the data element occupying the respective position in part of the array

which is used as temporary memory.

The most promising example of QuickXsort is QuickMergesort. For the Mergesort part we use standard (top-down) Mergesort, which can be implemented using m extra element spaces to merge two arrays of length m: after the partitioning, one part of the array – for a simpler description we assume the first part – has to be sorted with Mergesort (note, however, that either of the two sides can be sorted with Mergesort as long as the other side contains at least n/3 elements). In order to do so, the second half of this first part is sorted recursively with Mergesort while moving the elements to the back of the whole array. The elements from the back of the array are inserted as dummy elements into the first part. Then, the first half of the first part is sorted recursively with Mergesort while being moved to the position of the former second half of the first part. Now, at the front of the array, there is enough space (filled with dummy elements) such that the two halves can be merged. The executed stages of the algorithm QuickMergesort are illustrated in Figure 1.
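The overall control flow can be sketched compactly in Python (my own illustration, not the authors' C++ code: it uses a simple median-of-three pivot and a plain swap-based Mergesort with a disjoint half-size buffer instead of the exact memory layout of Figure 1; all function names are invented):

```python
def mergesort_swap(a, lo, hi, buf):
    """Sort a[lo:hi) with Mergesort, using a[buf:buf+(hi-lo)//2) as swap
    space; buffer elements act as 'dummies' and are permuted, never lost."""
    n = hi - lo
    if n <= 1:
        return
    mid = lo + n // 2
    mergesort_swap(a, lo, mid, buf)
    mergesort_swap(a, mid, hi, buf)
    h = mid - lo
    for x in range(h):                       # swap sorted first half into buffer
        a[buf + x], a[lo + x] = a[lo + x], a[buf + x]
    i, j, k = buf, mid, lo                   # merge back by swapping
    while i < buf + h and j < hi:
        if a[i] <= a[j]:
            a[k], a[i] = a[i], a[k]; i += 1
        else:
            a[k], a[j] = a[j], a[k]; j += 1
        k += 1
    while i < buf + h:                       # flush the rest of the first half
        a[k], a[i] = a[i], a[k]; i += 1; k += 1

def quick_merge_sort(a, lo=0, hi=None):
    """Partition, sort one part with Mergesort using the other part as
    temporary space, then continue on the remaining part."""
    if hi is None:
        hi = len(a)
    while hi - lo > 1:
        pivot = sorted((a[lo], a[(lo + hi) // 2], a[hi - 1]))[1]
        i, j = lo, hi - 1                    # Hoare-style partitioning
        while i <= j:
            while a[i] < pivot: i += 1
            while a[j] > pivot: j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]; i += 1; j -= 1
        # now a[lo:j+1) <= pivot <= a[i:hi), and a[j+1:i) == pivot
        left, right = j + 1 - lo, hi - i
        if left >= right:
            sort_left = left // 2 <= right   # larger part, if its buffer fits
        else:
            sort_left = right // 2 > left
        if sort_left:
            mergesort_swap(a, lo, j + 1, i)  # buffer lives in the right part
            lo = i
        else:
            mergesort_swap(a, i, hi, lo)     # buffer lives in the left part
            hi = j + 1
```

Replacing the median-of-three pivot by the median-of-medians selection is exactly what Section 3 adds to obtain worst-case guarantees.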

2.2 The median-of-medians algorithm

The median-of-medians algorithm solves the selection problem: given an array A[1,...,n] and an integer k ∈ {1,...,n}, find the k-th element in the sorted order of A. For simplicity let us assume that all elements are distinct – in Section 2.3 we show how to deal with the general case with duplicates.

The basic variant of the median-of-medians algorithm [4] (see also [8, Sec. 9.3]) works as follows: first, the array is grouped into blocks of five elements. From each of these blocks the median is

selected and then the median of all these medians is computed recursively. This yields a provably good pivot for performing a partitioning step. Now, depending on which side the k-th element lies in, recursion takes place on the left or the right side. It is well-known that this algorithm runs in linear time with a rather big constant in the O-notation. We use a slight improvement:

[Figure 1: Example of the execution of QuickMergesort with 7 chosen as pivot: after partitioning, one part is sorted with Mergesort (its two halves sorted recursively while being moved into the other part, which serves as temporary space, then merged), and the remaining part is sorted recursively with QuickMergesort.]

Repeated step algorithm. Instead of grouping into blocks of 5 elements, we follow [7] and group into blocks of 9 elements and take the pseudomedian ("ninther") into the sample for pivot selection. This method guarantees that every element in the sample has 4 elements less than or equal to and 4 elements greater than or equal to it. Thus, when selecting the pivot as median of the sample of n/9 elements, the guarantee is that at least 2n/9 elements are less than or equal to and the same number greater than or equal to the pivot. Since there might remain 8 elements outside the sample, we obtain the recurrence

T_MoM(n) ≤ T_MoM(⌈7n/9⌉ + 8) + T_MoM(⌊n/9⌋) + 20n/9,

where 4n/3 of the 20n/9 is due to finding the pseudomedians and 8n/9 is for partitioning the remaining (non-pseudomedian) elements according to the pivot (notice that some of the other elements are also already known to be greater/smaller than the pivot; however, using this information would introduce a huge bookkeeping overhead). Thus, by Lemma 2.1, we have:

Lemma 2.2 ([1, 7]). T_MoM(n) ≤ 20n + O(n^ζ), where ζ ≈ 0.78 satisfies (7/9)^ζ + (1/9)^ζ = 1.
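The repeated-step selection can be sketched as follows (a list-based Python illustration of the idea only – the paper's version works in place, uses the adaptive pivot selection described next, and counts comparisons exactly; `ninther` and `mom_select` are invented names):

```python
def ninther(block):
    # pseudomedian of nine: median of the medians of three triples
    m = [sorted(block[i:i + 3])[1] for i in (0, 3, 6)]
    return sorted(m)[1]

def mom_select(xs, k):
    """Return the k-th smallest element (0-based) of xs."""
    xs = list(xs)
    while len(xs) > 9:
        # sample: the ninthers of blocks of nine (leftover elements ignored)
        sample = [ninther(xs[i:i + 9]) for i in range(0, len(xs) - 8, 9)]
        pivot = mom_select(sample, len(sample) // 2)  # median of the sample
        lows = [x for x in xs if x < pivot]
        highs = [x for x in xs if x > pivot]
        n_pivots = len(xs) - len(lows) - len(highs)
        if k < len(lows):
            xs = lows
        elif k < len(lows) + n_pivots:
            return pivot
        else:
            k -= len(lows) + n_pivots
            xs = highs
    return sorted(xs)[k]
```

Since each ninther carries four elements on either side of it, the pivot here discards a constant fraction of the array in every round, which is what yields the linear bound of Lemma 2.2.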

Adaptive pivot selection. For our implementation we apply a slight improvement over the basic median-of-medians algorithm by using the approach of adaptive pivot selection, which was first used in the Floyd-Rivest algorithm [14, 15], later applied to smaller samples for Quickselect [26, 27], and recently applied to the median-of-medians algorithm [1]. However, we use a different approach than in [1]: in any case we choose the sample of size n/9 as pseudomedians of nine. Now, if the position we are looking for is on the far left (left of position 2n/9), we do not choose the median of the sample as pivot but a smaller position: for searching the k-th element with k ≤ 2n/9, we take the ⌈k/4⌉-th element of the sample as pivot. Notice that for k = 2n/9, this is exactly the median of the sample. Since every element of the sample carries at least four smaller elements with it, this guarantees that ⌈k/4⌉·4 ≥ k elements are smaller than or equal to the pivot – so the k-th element will lie in the left part after partitioning (which is presumably the smaller one). Likewise, when searching a far right position, we proceed symmetrically.

Notice that this optimization does not improve the worst case but the average case (see Section 3.4).

2.3 Dealing with duplicate elements

With duplicates we mean that not all elements of the input array are distinct. The number of

comparisons for ﬁnding the median of three (resp. ﬁve) elements does not change in the presence

of duplicates. However, duplicates can lead to an uneven partition. The standard approach in

Quicksort and Quickselect for dealing with duplicates is due to Bentley and McIlroy [3]: in each

partitioning step the elements equal to the pivot are placed in a third partition in the middle of the

array. Recently, another approach appeared in the Quicksort implementation pdqsort [30]. Instead

of three-way partitioning it applies the usual two-way partitioning moving elements equal to the

pivot always to the right side. This method is also applied recursively – with one exception: if the

new pivot is equal to an old pivot (this can be tested with one additional comparison), then all

elements equal to the pivot are moved to the left side, which then can be excluded from recursion.

We propose to follow the latter approach: usually all elements equal to the pivot are moved

to the right side – possibly leading to an even unbalanced partitioning. However, whenever a

partitioning step is very uneven (outside the guaranteed bounds for the pivot in the median-of-

medians algorithm), we know that this must be due to many duplicate elements. In this case we

immediately partition again with the same pivot but moving equal elements to the left.
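The re-partitioning rule can be sketched like this (list-based Python for clarity; the function names are mine, and the real implementation of course partitions in place):

```python
def partition_ties_right(a, pivot):
    # usual two-way partition: elements equal to the pivot go right
    return [x for x in a if x < pivot], [x for x in a if x >= pivot]

def partition_ties_left(a, pivot):
    # fallback partition: elements equal to the pivot go left, so the whole
    # block of pivot-equal elements can be excluded from recursion
    return [x for x in a if x <= pivot], [x for x in a if x > pivot]

def robust_partition(a, pivot, min_side):
    """Partition a around pivot; if the split is more uneven than the
    median-of-medians guarantee min_side allows, the imbalance must be
    caused by duplicates, so partition again with ties moved left."""
    left, right = partition_ties_right(a, pivot)
    if len(left) < min_side or len(right) < min_side:
        left, right = partition_ties_left(a, pivot)
    return left, right
```

A split outside the guaranteed bounds thus triggers exactly one extra partitioning pass with the same pivot.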

3 Median-of-Medians QuickMergesort

Although QuickMergesort has an O(n^2) worst-case running time, it is quite simple to guarantee a worst-case number of comparisons of n log n + O(n): just choose the median of the whole array as pivot. This is essentially how in-situ Mergesort [13] works. The most efficient way of finding the median is using Quickselect [17], as applied in in-situ Mergesort. However, this does not allow the desired bound on the number of comparisons (not even when using Introselect as in [13]). Alternatively, we can use the median-of-medians algorithm described in Section 2.2, which, while having a linear worst-case running time, is quite slow on average. In this section we describe a variation of the median-of-medians approach which combines an n log n + O(n) worst-case number of comparisons with a good average performance (both in terms of running time and number of comparisons).

3.1 Basic version

The crucial observation is that it is not necessary to use the actual median as pivot (see also our preprint [12]). As remarked in Section 2.1, the larger of the two sides of the partitioned array can be sorted with Mergesort as long as the smaller side contains at least one third of the total number of elements. Therefore, it suffices to find a pivot which guarantees such a partition. For doing so, we can apply the idea of the median-of-medians algorithm: for sorting an array of n elements, we first choose n/3 elements as medians of three elements each. Then, the median-of-medians algorithm is used to find the median of those n/3 elements. This median becomes the next pivot. As for the median-of-medians algorithm, this ensures that at least 2·⌊n/6⌋ elements are less than or equal to and at least the same number of elements are greater than or equal to the pivot – thus, the larger part of the partitioned array can always be sorted with Mergesort and the recursion takes place on the smaller part. The advantage of this method is that the median-of-medians algorithm is applied to an array of size only n/3 instead of n (at the cost of introducing a small overhead for finding the n/3 medians of three) – giving less weight to its big constant for the linear number of comparisons. We call this algorithm basic MoMQuickMergesort (bMQMS).

For the median-of-medians algorithm, we use the repeated step method as described in Section 2.2. Notice that, regarding the number of comparisons, the worst case for MoMQuickMergesort happens if the pivot is exactly the median, since this gives the most weight to the "slow" median-of-medians algorithm. Thus, the total number T_bMQMS(n) of comparisons of MoMQuickMergesort in the worst case to sort n elements is bounded by

T_bMQMS(n) ≤ T_bMQMS(n/2) + T_MS(n/2) + T_MoM(n/3) + 3·(n/3) + (2/3)·n + O(1),

where T_MS(n) is the number of comparisons of Mergesort and T_MoM(n) the number of comparisons of the median-of-medians algorithm. The 3·(n/3) term comes from finding the n/3 medians of three elements, the 2n/3 comparisons from partitioning the remaining elements (after finding the pivot, the correct side of the partition is known for n/3 elements).

By Lemma 2.2 we have T_MoM(n) ≤ 20n + O(n^0.8), and by [33] we have T_MS(n) ≤ n log n − 0.91n + 1. Thus, we can use Lemma 2.1 to resolve the recurrence, which proves (notice that for every comparison there is only a constant number of other operations):

Theorem 3.1. Basic MoMQuickMergesort (bMQMS) runs in O(n log n) time and performs at most n log n + 13.8n + O(n^0.8) comparisons.
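The constant in Theorem 3.1 can be reproduced numerically (my own sanity check, using only the stated bounds for T_MS and T_MoM and ignoring lower-order terms):

```python
import math
from functools import lru_cache

def t_ms(n):   # Mergesort bound n log n - 0.91 n  [33]
    return n * math.log2(n) - 0.91 * n if n > 1 else 0.0

def t_mom(n):  # Lemma 2.2, leading term only
    return 20.0 * n

@lru_cache(maxsize=None)
def t_bmqms(n):
    if n <= 64:
        return 300.0                       # some constant for small inputs
    return (t_bmqms(n // 2) + t_ms(n // 2) + t_mom(n // 3)
            + n + 2.0 * n / 3.0)           # n = 3*(n/3) medians, 2n/3 partition

n = 2 ** 24
excess = (t_bmqms(n) - n * math.log2(n)) / n
# Theorem 3.1 predicts an excess linear term of about 13.8
```

Unrolling the recurrence by hand gives the same picture: the Mergesort terms sum to n log n − 2.91n and the remaining per-level costs to about 16.67n, i.e., an excess of roughly 13.76n.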

3.2 Improved version

In [32], Reinhardt describes how to merge two subsequent sequences in an array using additional space for only half the number of elements of one of the two sequences. The additional space should be located in front of or after the two sequences. To be more precise, assume we are given an array A with positions A[1,...,t] being empty or containing dummy elements (to simplify the description, we assume the first case), and A[t+1,...,t+ℓ] and A[t+ℓ+1,...,t+ℓ+r] containing two sorted sequences. We wish to merge the two sequences into the space A[1,...,ℓ+r] (so that A[ℓ+r+1,...,t+ℓ+r] becomes empty). We require that r/2 ≤ t < r.

First we start from the left, merging the two sequences into the empty space until no empty space remains between the last element of the already merged part and the first element of the left sequence (first step in Figure 2). At this point, we know that at least t elements of the right sequence have been introduced into the merged part (because when introducing elements from the left part, the distance between the last element of the already merged part and the first element of the left part does not decrease). Thus, the positions t+ℓ+1 through ℓ+2t are empty now. Since ℓ+t+1 ≤ ℓ+r ≤ ℓ+2t, in particular, A[ℓ+r] is empty now. Therefore, we can start merging the two sequences right-to-left into the now empty space (where the right-most element is moved to position A[ℓ+r] – see the second step in Figure 2). Once the empty space is filled, we know that all elements from the right part have been inserted, so A[1,...,ℓ+r] is sorted and A[ℓ+r+1,...,t+ℓ+r] is empty (last step in Figure 2).

When choosing ℓ = r (in order to have a balanced merging and thus an optimal number of comparisons), we need one fifth of the array as temporary space. Moreover, by allowing a slightly imbalanced merge we can also tolerate slightly less temporary space. In the case that the temporary space is large (t ≥ r), we apply the merging scheme from Section 2.1. The situation where the temporary space is located after the two sorted sequences is handled symmetrically (note that this changes the requirement to ℓ/2 ≤ t < ℓ).

[Figure 2: In the first step the two sequences are merged, starting with the smallest elements, until the empty space is filled. Then there is enough empty space to merge the sequences from the right into their final position.]
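Reinhardt's trick can be sketched in Python as follows (an illustration I am adding under the conventions above, with 0-based indices: the dummies initially occupy a[0:t], followed by the two sorted runs of lengths l and r):

```python
def reinhardt_merge(a, t, l, r):
    """Merge sorted runs a[t:t+l] and a[t+l:t+l+r] into a[0:l+r], using
    a[0:t] (dummy elements) as buffer; requires r/2 <= t < r (larger t
    also works). Afterwards the dummies sit in a[l+r:t+l+r]."""
    end = t + l + r
    i, j, k = t, t + l, 0
    # Phase 1: merge from the left into the gap until no space remains
    # between the output and the left run (k == i), or a run is exhausted.
    while k < i and i < t + l and j < end:
        if a[i] <= a[j]:
            a[k], a[i] = a[i], a[k]; i += 1
        else:
            a[k], a[j] = a[j], a[k]; j += 1
        k += 1
    if i == t + l or j == end:          # one run exhausted: flush the other
        while j < end:
            a[k], a[j] = a[j], a[k]; j += 1; k += 1
        while i < t + l:
            a[k], a[i] = a[i], a[k]; i += 1; k += 1
        return
    # Phase 2: at least t right-run elements were consumed, so a[l+r-1] is
    # free; merge the remainders right-to-left into their final positions.
    p, q, w = t + l - 1, end - 1, l + r - 1
    while p >= i and q >= j:
        if a[p] >= a[q]:
            a[w], a[p] = a[p], a[w]; p -= 1
        else:
            a[w], a[q] = a[q], a[w]; q -= 1
        w -= 1
    while q >= j:
        a[w], a[q] = a[q], a[w]; q -= 1; w -= 1
    while p >= i:
        a[w], a[p] = a[p], a[w]; p -= 1; w -= 1
```

For example, with t = 2, l = r = 3 and dummies −1, merging [1, 3, 5] and [2, 4, 6] in [-1, -1, 1, 3, 5, 2, 4, 6] leaves the sorted run in the first six positions and the two dummies at the end.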

By applying this merging method in MoMQuickMergesort, we can use pivots with much weaker guarantees: instead of one third, we need only one fifth of the elements to be less (resp. greater) than the pivot. We can find such pivots by applying an idea similar to the repeated step method for the median-of-medians algorithm: first we group into blocks of fifteen elements and compute the pseudomedian of each group. Then, the pivot is selected as the median of these pseudomedians; it is computed using the median-of-medians algorithm. This guarantees that at least 2·3·n/(3·5·2) ≈ n/5 elements are less than or equal to (resp. greater than or equal to) the pivot. Computing the pseudomedian of 15 elements requires 22 comparisons (five times three comparisons for the medians of three and then seven comparisons for the median of five). After that, partitioning requires 14n/15 comparisons. Since in any case the larger half can still be sorted with Mergesort, we get the recurrence (we call this algorithm MoMQuickMergesort (MQMS))

T_MQMS(n) ≤ T_MQMS(n/2) + T_MS(n/2) + T_MoM(n/15) + (22/15)·n + (14/15)·n + O(1)
          ≤ T_MQMS(n/2) + (n/2)·log(n/2) − 0.91·(n/2) + (20/15)·n + (36/15)·n + O(n^0.8)
          ≤ n log n + (−0.91 − 2 + 112/15)·n + O(n^0.8)    (by Lemma 2.1)

This proves:

Theorem 3.2. MoMQuickMergesort (MQMS) runs in O(n log n) time and performs at most n log n + 4.57n + O(n^0.8) comparisons.

Notice that when computing the median of pseudomedians of fifteen elements, in the worst case approximately the same effort goes into the calculation of the pseudomedians and into the median-of-medians algorithm. This indicates that it is an efficient method for finding a pivot with the guarantee that one fifth of the elements are greater or equal (resp. less or equal).

3.3 Undersampling

In [1], Alexandrescu selects pivots for the median-of-medians algorithm not as medians of medians of the whole array but only of n/φ elements, where φ is some large constant (similar to [25] for Quicksort). While this improves the average case considerably and still gives a linear-time algorithm, the hidden constant for the worst case is large. In this section we follow this idea to a certain extent without losing a good worst-case bound.


As already mentioned in Section 3.2, Reinhardt's merging procedure [32] also works with less than one fifth of the whole array as temporary space if we do not require the merged sequences to be of equal length. Thus, we can allow the pivot to be even further off the median – at the cost of making the Mergesort part more expensive due to imbalanced merging. For θ ≥ 1 we describe a variant MQMS_θ of MoMQuickMergesort using only n/θ elements for sampling the pivot. Before we analyze this variant, let us look at the cost of Mergesort with imbalanced merging: in order to apply Reinhardt's merging algorithm, we need that one part is at most twice the length of the temporary space. We always apply linear merging (no binary insertion), meaning that merging two sequences of combined length n costs at most n − 1 comparisons. Thus, we get the following estimate for the worst-case number of comparisons T_MS,b(n, m) of Mergesort, where n is the number of elements to sort and m is the temporary space (= "buffer"):

T_MS,b(n, m) ≤ T_MS(n)                              if n ≤ 4m,
T_MS,b(n, m) ≤ n + T_MS,b(n − 2m, m) + T_MS(2m)     otherwise.

If n > 2m (otherwise, there is nothing to do), this means

T_MS,b(n, m) ≤ (⌈n/2m⌉ − 2)·T_MS(2m) + T_MS(n − (⌈n/2m⌉ − 2)·2m) + Σ_{i=0}^{⌈n/2m⌉−3} (n − 2im)
             = (⌈n/2m⌉ − 2)·T_MS(2m) + T_MS(n − (⌈n/2m⌉ − 2)·2m)
               + n·(⌈n/2m⌉ − 2) − m·(⌈n/2m⌉ − 2)·(⌈n/2m⌉ − 3).    (2)

For a moment let us assume that n/2m = ℓ ∈ ℤ (with ℓ ≥ 1). In this case we have

T_MS,b(n, n/2ℓ) ≤ (ℓ−2)·T_MS(n/ℓ) + T_MS(2n/ℓ) + n·(ℓ−2) − (n/2ℓ)·(ℓ−2)·(ℓ−3)
               ≤ (ℓ−2)·(n/ℓ)·(log(n/ℓ) − κ) + (2n/ℓ)·(log(2n/ℓ) − κ) + n·(ℓ−2)·(1/2 + 3/2ℓ)
               ≤ n log n + n·f(ℓ)

for n large enough, where f: ℝ_{>0} → ℝ is defined by

f(ℓ) = −κ − log ℓ + ℓ/2 + 1/2 − 1/ℓ   for ℓ ≥ 2,
f(ℓ) = −κ                             otherwise,

and n log n − κn is a bound for the number of comparisons of Mergesort for n large enough (κ ≈ 0.91 by [33]). Now, for arbitrary m we can use f as an approximation, which turns out to be quite precise:

Lemma 3.3. Let f be defined as above and write ⌈n/2m⌉ = n/2m + ξ. Then for m and n large enough we have

T_MS,b(n, m) ≤ n log n + n·f(n/2m) + m·ǫ(ξ),

where ǫ(ξ) = max(0, 5ξ − 4 + (4 − 2ξ)·log(2 − ξ) − ξ^2) ≤ 0.015 =: ǫ.


Proof. Let m be large enough such that T_MS(m) ≤ m log m − κm. If n ≤ 2m, we have T_MS,b(n, m) = T_MS(n), and so the lemma holds. Now let n > 2m. By (2) we obtain

T_MS,b(n, m) ≤ (n/2m + ξ − 2)·T_MS(2m) + T_MS((2 − ξ)·2m)                  (3)
              + n·(n/2m + ξ − 2) − m·(n/2m + ξ − 2)·(n/2m + ξ − 3).        (4)

We examine the two terms (3) and (4) separately using T_MS(n) ≤ n log n − κn:

(3) ≤ (n/2m + ξ − 2)·2m·(log 2m − κ) + (2 − ξ)·2m·(log((2 − ξ)·2m) − κ)
    = (n log 2m − κn) + (ξ − 2)·2m·(log 2m − κ) + (2 − ξ)·2m·(log((2 − ξ)·2m) − κ)
    = (n log n − κn) − n·log(n/2m) + (2 − ξ)·2m·((log((2 − ξ)·2m) − κ) − (log 2m − κ))
    = (n log n − κn) − n·log(n/2m) + (2 − ξ)·2m·log(2 − ξ)

and, using nξ = 2m·(n/2m)·ξ to cancel the mixed terms,

(4) = n·(n/2m + ξ − 2) − m·(n/2m + ξ − 2)·(n/2m + ξ − 3)
    = n·(n/2m − 2) − m·(n/2m − 2)·(n/2m − 3) + m·(5ξ − ξ^2)
    = n·(n/(2·2m) + 1/2 − 3·2m/n) + m·(5ξ − ξ^2).

Thus,

T_MS,b(n, m) − (n log n + n·f(n/2m)) ≤ (3) + (4) − n log n − n·f(n/2m)
    ≤ (2 − ξ)·2m·log(2 − ξ) + m·(5ξ − ξ^2) − 4m.

This completes the proof of Lemma 3.3.

For selecting the pivot in QuickMergesort, we apply the procedure of Section 3.2 to n/θ elements (for some parameter θ ∈ ℝ, θ ≥ 1): we select n/θ elements from the array, group them into groups of fifteen elements, compute the pseudomedian of each group, and take the median of those pseudomedians as pivot. We call this algorithm MoMQuickMergesort with undersampling factor θ (MQMS_θ). Note that MQMS_1 = MQMS. For its worst-case number of comparisons we have

T_MQMSθ(n) ≤ max_{1/5θ ≤ α ≤ 1/2} ( T_MQMSθ(αn) + T_MS,b((1−α)·n, αn) )
             + (22/15θ)·n + (20/15θ)·n + (1 − 1/15θ)·n + O(n^0.8),

where the (22/15θ)·n is for finding the pseudomedians of fifteen, the (20/15θ)·n + O(n^0.8) is for the median-of-medians algorithm called on n/15θ elements, and (1 − 1/15θ)·n is for partitioning the remaining elements. Now we plug in the bound of Lemma 3.3 for T_MS,b(n, m) with ℓ = (1−α)/2α and apply Lemma 2.1:

T_MQMSθ(n) ≤ max_{1/5θ ≤ α ≤ 1/2} ( T_MQMSθ(αn) + (1−α)·n·log((1−α)·n)
             + (1−α)·n·f((1−α)/2α) + ǫ·n ) + n·(1 + 41/15θ) + O(n^0.8)
           ≤ n log n + n·max_{1/5θ ≤ α ≤ 1/2} g(α, θ) + O(n^0.8)

for

g(α, θ) = α·log(α)/(1−α) + log(1−α) + f((1−α)/2α) + (1 + 41/15θ + ǫ)/(1−α).

In order to find a good undersampling factor, we wish to find a value of θ minimizing max_{1/5θ ≤ α ≤ 1/2} g(α, θ). While we do not have a formal proof, intuitively the maximum should be reached either for α = 1/2 (if θ is small) or for α = 1/(5θ) (if θ is large) – see Figure 4 for a special value of θ. Moreover, notice that we are dealing with an upper bound on T_MQMSθ(n) only (with a small error due to Lemma 3.3 and the bound n log n − κn for T_MS(n)), so even if we could find the θ which minimizes max_{1/5θ ≤ α ≤ 1/2} g(α, θ), this θ might not be optimal.

We proceed as follows: first, we compute the point θ_opt where the two curves in Figure 3 intersect. For this particular value of θ, we then show that indeed max_{1/5θ ≤ α ≤ 1/2} g(α, θ) = g(1/2, θ). Since θ ↦ g(1/2, θ) is monotonically decreasing (this is obvious) and θ ↦ g(1/(5θ), θ) is monotonically increasing for θ ≥ 2.13 (verified numerically), this together shows that max_{1/5θ ≤ α ≤ 1/2} g(α, θ) is minimized at the intersection point.

[Figure 3: θ ↦ g(α, θ) for α = 1/2 and α = 1/(5θ), plotted for undersampling factors θ between 1.0 and 3.5.]

We compute the intersection point numerically as θ_opt ≈ 2.219695. For θ_opt we verify (using Wolfram|Alpha [19]) that the maximum max_{1/5θ ≤ α ≤ 1/2} g(α, θ) is attained at α = 1/(5θ_opt), and that g(1/(5θ_opt), θ_opt) ≈ 1.56780 and g(1/2, θ_opt) ≈ 1.56780. Thus, we have established the optimality of θ_opt even though we have not computed g(α, θ) for α ∉ {1/2, 1/(5θ)} and θ ≠ θ_opt. (In the mathoverflow question [31], this value is verified analytically – notice that there g(α, θ) is slightly different, giving a different θ.)

For implementation reasons we want θ to be a multiple of 1/30. Therefore, we propose θ = 11/5 – a choice which is only slightly smaller than the optimal value and confirmed experimentally (Figure 5). Again, for this fixed θ, we verify that indeed the maximum is at α = 1/2 and that g(1/(5θ), θ) ≈ 1.57 and g(1/2, θ) ≈ 1.59, see Figure 4. Thus, up to the small difference 0.02, we know that θ = 11/5 is optimal.

For this fixed value of θ = 11/5 we have thus computed max_{1/5θ ≤ α ≤ 1/2} g(α, θ) ≤ 1.59, which in turn gives us a bound on T_MQMS11/5(n).


Figure 4: α ↦ g(α, θ) for θ = 11/5 with α ∈ [1/(5θ), 1/2] reaches its maximum ≈ 1.59 for α = 1/2.

Theorem 3.4. MoMQuickMergesort with undersampling factor θ = 11/5 (MQMS_{11/5}) runs in O(n log n) time and performs at most n log n + 1.59n + O(n^0.8) comparisons.

3.4 Heuristic estimate of the average case

It is hard to compute an exact average case since at every stage after the first one the algorithm no longer operates on random inputs. We nevertheless estimate the average case by assuming that all intermediate arrays are random and applying some further heuristic arguments.

Average of the median-of-medians algorithm. On average we can expect that the pivot returned by the median-of-medians procedure is very close to an actual median, which gives us an easy recurrence showing that T_{av,MoM}(n) ≈ (40/7)n. However, we have to take adaptive pivot selection into account. The first pivot is the (n/2 ± o(n))-th element with very high probability. Thus, the recursive call is on n/2 + o(n) elements with k ∈ o(n) (or k = n/2 – by symmetry we assume the first case). Due to adaptive pivot selection, the array will also be split into a left part of size o(n) (with the element we are looking for in it – this is guaranteed even in the worst case) and a larger right part. This is because an element of order o(n) among the n/15 pseudomedians of fifteen is also an element of order o(n) in the whole array. Thus, all successive recursive calls will be made on arrays of size o(n). We denote by T^{noRec}_{av,MoM}(n) the average number of comparisons of the median-of-medians algorithm recursing on an array of size o(n).

We also have to take the recursive calls for pivot selection into account. The first pivot is the median of the sample; thus, the same reasoning as for T_{av,MoM}(n) applies. The second pivot is an element of order o(n) out of n/18 + o(n) elements – so we are in the situation of T^{noRec}_{av,MoM}(n). Thus, we get

    T_{av,MoM}(n) = T^{noRec}_{av,MoM}(n/2) + T_{av,MoM}(n/9) + 20n/9

and

    T^{noRec}_{av,MoM}(n) = T^{noRec}_{av,MoM}(n/9) + 20n/9 + o(n).

Hence, by Lemma 2.1, we obtain T^{noRec}_{av,MoM}(n) = (20n/9) · (9/8) + o(n) = 5n/2 + o(n) and

    T_{av,MoM}(n) = T_{av,MoM}(n/9) + 20n/9 + 5n/4 + o(n)
                  = (125/32)n + o(n) ≤ 4n + o(n).


Average of MoMQuickMergesort. As for the median-of-medians algorithm, we can expect that the pivot in MoMQuickMergesort is always very close to the median. Using the bound for the adaptive version of the median-of-medians algorithm, we obtain

    T_{av,MQMS_θ}(n) = T_{av,MQMS_θ}(n/2) + (n/2)·log(n/2) − 1.24·(n/2)
                       + 22n/(15θ) + 4n/(15θ) + ((15θ − 1)/(15θ))·n + o(n)
                     ≤ n log n + (−1.24 + 10/(3θ))·n + o(n)

by Lemma 2.1 (here the term 4n/(15θ) accounts for the average case of the median-of-medians algorithm; the other terms are as before). This yields

    T_{av,MQMS}(n) ≤ n log n + 2.094n + o(n)

(for θ = 1). For our proposed θ = 11/5 = 2.2 we have

    T_{av,MQMS_{11/5}}(n) ≤ n log n + 0.275n + o(n).

3.5 Hybrid algorithms

In order to achieve an even better average case, we can apply a trick similar to Introsort [29]. Be

aware, however, that this deteriorates the worst case slightly. We ﬁx some small δ > 0. The

algorithms starts by executing QuickMergesort with median of three pivot selection. Whenever the

pivot is contained in the interval [δn, (1 −δ)n], the next pivot is selected again as median of three,

otherwise according to Section 3.3 (as median of pseudomedians of n/θ elements) – for the following

pivots it switches back to median of 3. When choosing δnot too small, the worst case number of

comparisons will be only approximately 2nmore than of MoMQuickMergesort with undersampling

(because in the worst case before every partitioning step according to MoMQuickMergesort with

undersampling, there will be one partitioning step with median-of-3 using ncomparisons), while the

average is almost as QuickMergesort with median-of-3. We use δ= 1/16. We call this algorithm

hybrid QuickMergesort (HQMS).

Another possibility for a hybrid algorithm is to use MoMQuickMergesort (with undersampling)

instead of Heapsort as a worst-case stopper for Introsort. We test both variants in our experiments.

3.6 Summary of algorithms

For the reader’s convenience we provide a short summary of the diﬀerent versions of MoMQuick-

Mergesort and the results we obtained in Table 1.

4 Experiments

Experimental setup. We ran thorough experiments with implementations in C++ with different kinds of input permutations. The experiments were run on an Intel Core i5-2500K CPU (3.30GHz, 4 cores, 32KB L1 instruction and data cache, 256KB L2 cache per core and 6MB L3 shared cache) with 16GB RAM and operating system Ubuntu Linux 64bit version 14.04.4. We used GNU's g++ (4.8.4), optimized with flags -O3 -march=native. For time measurements, we used std::chrono::high_resolution_clock; for generating random inputs, the Mersenne Twister pseudo-random generator std::mt19937. All time measurements were repeated with the same 100 deterministically chosen seeds – the displayed numbers are the averages of these 100 runs. Moreover,


Acronym     | Algorithm                                                                                             | Results
bMQMS       | basic MoMQuickMergesort                                                                               | Theorem 3.1: κ_wc ≤ 13.8
MQMS        | MoMQuickMergesort (uses Reinhardt's merging with balanced merges)                                     | Theorem 3.2: κ_wc ≤ 4.57; Section 3.4: κ_ac ≈ 2.094
MQMS_{11/5} | MoMQuickMergesort with undersampling factor 11/5 (uses Reinhardt's merging with imbalanced merges)    | Theorem 3.4: κ_wc ≤ 1.59; Section 3.4: κ_ac ≈ 0.275
HQMS        | hybrid QuickMergesort (combines median-of-3 QuickMergesort and MQMS_{11/5})                           | Section 3.5: κ_wc ≤ 3.58 for δ large enough; κ_ac smaller

Table 1: Overview of the algorithms in this paper. For a worst-case number of comparisons of n log n + κ_wc·n + O(n^0.8) and an average case of roughly n log n + κ_ac·n, the results on κ_wc and κ_ac are shown. The average cases are only heuristic estimates.

for each time measurement, at least 128MB of data were sorted – if the array size is smaller, then several arrays were sorted and the total elapsed time was measured. If not specified explicitly, all experiments were conducted with 32-bit integers.

Implementation details. The code of our implementation of MoMQuickMergesort, the other algorithms, and our running time experiments is available at https://github.com/weissan/QuickXsort. In our implementation of MoMQuickMergesort, we use the merging procedure from [13], which avoids branch mispredictions. We use the partitioner from the libstdc++ implementation of std::sort. For the running time experiments, base cases of up to 42 elements are sorted with Insertionsort. For the comparison measurements, Mergesort is used down to arrays of size one.

Simulation of a worst case. In order to experimentally confirm our worst-case bounds for MoMQuickMergesort, we simulate a worst case. Be aware that it is not even clear whether there actually are input permutations for which the worst-case bounds of Section 3 are tight, since when selecting pivots the array is already pre-sorted in a particular way (which is hard to capture in a thorough analysis). Indeed, in [7] it is conjectured that similar bounds for different variants of the median-of-medians algorithm are not tight. Therefore, we cannot test the worst case by designing particularly bad inputs. Nevertheless, we can simulate a worst-case scenario where every pivot is chosen in the worst way possible (according to the theoretical analysis). More precisely, the simulation of the worst case comprises the following aspects:

• For computing the k-th element of a small array (up to 30 elements) we additionally sort it with Heapsort. This is because our implementation uses Introselect (std::nth_element) for arrays of size up to 30.

• When measuring comparisons, we perform a random shuffle before every call to Mergesort. As the average case of Mergesort is close to its worst case (up to approximately 0.34n), this gives a fairly good approximation of the worst case. For measuring time we apply a simplified shuffling method, which shuffles only a few positions in the array.

• In the median-of-medians algorithm, we do not use the pivot selected by the recursive calls, but use std::nth_element to find the worst pivot the recursive procedure could possibly select. We do not count comparisons incurred by std::nth_element. This is the main contribution to the worst case.


• As pivot for QuickMergesort (the basic and the improved variant) we always use the real median (this is actually the worst choice, as the recursive call of QuickMergesort is then guaranteed to be on the smaller half and Mergesort is not slower than QuickMergesort). In the version with undersampling we use the most extreme pivot (since this is worse than the median).

• We also make 100 measurements for each data point. When counting comparisons, we take the maximum over all runs instead of the mean. However, this makes only a negligible difference (as the small standard deviation in Table 2 suggests). When measuring running times we still take the mean, since the maximum would only reflect the large standard deviation of Quickselect (std::nth_element), which we use to find bad pivots.

The simulated worst cases are always drawn as dashed lines in the plots (except in Figure 5).

Different undersampling factors. In Figure 5, we compare the (simulated) worst-case number of comparisons for different undersampling factors θ. The picture resembles the one in Figure 3. However, all numbers are around 0.4 smaller than in Figure 3 because we used the average case of Mergesort to simulate its worst case. Also, depending on the array size n, the point where the two curves for α = 1/2 and α = 1/(5θ) meet differs (α as in Section 3.3). Still, the minimum is always achieved between 2.1 and 2.3 (recall that we have to take the maximum of the two curves for the same n) – confirming the calculations in Section 3.3 and suggesting θ = 2.2 as a good choice for further experiments.

Figure 5: Coefficient of the linear term of the number of comparisons in the simulated worst case for different undersampling factors θ (curves for α = 1/2 and α = 1/(5θ), each for n = 2^20, 2^22, 2^25).

Comparison of different variants. In Figure 6, we compare the running times (divided by n log n) of the different variants of MoMQuickMergesort including the simulated worst cases. We see that the version with undersampling is the fastest both in the average and the worst case. Moreover, while in the average case the differences are rather small, in the worst case the improved versions are considerably better than the very basic variant.

In Figure 7 we count the number of comparisons of the different versions. The plot shows the coefficient of the linear term of the number of comparisons (i.e. the total number of comparisons minus n log n, divided by n). Table 2 summarizes the results for n = 2^28. We see that our theoretical estimates are close to the real values: for the average case, the difference is almost


Figure 6: Running times divided by n log n of different MoMQuickMergesort variants (bMQMS, MQMS, MQMS_{11/5}) and their simulated worst cases, for n from 2^10 to 2^28.

Figure 7: Number of comparisons (linear term) of different MoMQuickMergesort variants and their simulated worst cases, for n from 2^10 to 2^28. The worst case of bMQMS is out of range.

negligible; for the worst case, the gap is slightly larger because we use the average case of Mergesort as a "simulation" of its worst case (notice that the difference between the average case and our bound for the worst case is approximately 0.34n). Moreover, the data suggest that for the worst case of bMQMS we would have to experiment with even larger arrays in order to get a good estimate of the linear term of the number of comparisons. Also, we see that the actual number of comparisons approaches the theoretically estimated values from below – thus, the O(n^0.8) terms in our estimates are most likely negative. Notice, however, that, as remarked above, we do not know whether the bounds for the worst case are tight for real inputs.


Algorithm   | average case (exp.) | average case (theo.) | worst case (exp.) | worst case (theo.)
bMQMS       | 2.772 ± 0.02        | –                    | 13.05 ± 0.17      | 13.8
MQMS        | 2.084 ± 0.001       | 2.094                | 4.220 ± 0.007     | 4.57
MQMS_{11/5} | 0.246 ± 0.01        | 0.275                | 1.218 ± 0.011     | 1.59

Table 2: Experimentally established linear term of the average and (simulated) worst-case number of comparisons for MoMQuickMergesort for n = 2^28. The ± values are the standard deviations. The respective second columns show our theoretical estimates.

Figure 8: Running times of MoMQuickMergesort (average and simulated worst case), hybrid QMS and other algorithms (in-situ Mergesort, std::sort, std::partial_sort, std::stable_sort, Wikisort) for random permutations of 32-bit integers, for n from 2^10 to 2^28. Running times are divided by n log n.

Comparison with other algorithms. We conducted experiments comparing MoMQuickMergesort with the following other algorithms: Wikisort [28], in-situ Mergesort [13], std::partial_sort (Heapsort), std::stable_sort (Mergesort) and std::sort (Introsort), see Figure 8. For the latter three algorithms we use the libstdc++ implementations (from GCC version 4.8). We also ran experiments with bottom-up Heapsort and Grailsort [2], but omitted the results because these algorithms behave similarly to Heapsort (resp. Wikisort). We see that MoMQuickMergesort with undersampling (MQMS_{11/5}) performs better than all other algorithms except hybrid QuickMergesort and std::sort. Moreover, for n = 2^28 the gap between MQMS_{11/5} and std::sort is only roughly 10%, and the simulated worst case of MQMS_{11/5} is again only slightly over 10% worse than its average case.

Notice that while all these algorithms have a worst-case guarantee of O(n log n), the good average-case behavior of std::sort comes at the cost of a bad worst case (see Figure 11) – the same applies to in-situ Mergesort. Also notice that even the simulated worst case of MoMQuickMergesort is better than the running times of in-situ Mergesort, Wikisort and Heapsort, i.e. all the other non-hybrid


Figure 9: Running times of MoMQuickMergesort (average and simulated worst case), hybrid QMS and other algorithms for random permutations of 44-byte records with 4-byte keys, for n from 2^10 to 2^22. Running times are divided by n log n.

in-place algorithms we tested.

In Figure 9 and Figure 10 we measure running times when sorting large objects: in Figure 9 we sort 44-byte records which are compared according to their first 4 bytes. Figure 10 shows the results when comparing pointers to such records, which are allocated on the heap. In both cases std::sort is the fastest, but MoMQuickMergesort with undersampling is still faster than std::partial_sort, in-situ Mergesort and Wikisort (unfortunately, the latter did not run for sorting pointers).

In all experiments, the standard deviations of most algorithms were negligible. Only std::sort, hybrid QuickMergesort and the simulated worst cases showed a standard deviation that would be visible in the plots. For the worst cases the large standard deviation arises because we use std::nth_element for choosing bad pivots – thus, it is only an artifact of the worst-case simulation. The standard deviations of std::sort and hybrid QuickMergesort can be seen in Figure 11 below.

MQMS as worst-case stopper. In Figure 11 we compare the hybrid algorithms described in Section 3.5: hybrid QuickMergesort and Introsort (std::sort) with MoMQuickMergesort as worst-case stopper. We also include the original std::sort. We compare three different kinds of inputs: random permutations, merging two sorted runs of almost equal length (the first run two elements longer than the second run), and a median-of-three killer sequence, where the largest elements are located in the middle and at the back of the array and in between the array is randomly shuffled.

While for random inputs the difference between the two variants of std::sort is negligible and hybrid QuickMergesort is only slightly slower (be aware of the scale of the plot), on the merge permutation we see a considerable difference for large n. Except for very small n, hybrid QuickMergesort is the fastest here. On the median-of-three killer input, the original std::sort is outperformed by the other algorithms by a wide margin. Also, hybrid QuickMergesort is faster than both variants of std::sort.


Figure 10: Running times of MoMQuickMergesort (average and simulated worst case), hybrid QMS and other algorithms for random permutations of pointers to records, for n from 2^10 to 2^22. Running times are divided by n log n. Wikisort did not work with pointers.

Figure 11: Running times of Introsort (std::sort) with Heapsort (original) and with MoMQuickMergesort as worst-case stopper, and of hybrid QuickMergesort. Left: random permutation; middle: merge; right: median-of-three killer sequence. The vertical bars represent the standard deviations.

Figure 11 also displays the standard deviations as error bars. We see that for the special permutations the standard deviations are negligible – which is no surprise. For random permutations, hybrid QuickMergesort shows a significantly higher standard deviation than the std::sort variants. This could be improved by using larger pivot samples (e.g. pseudomedian of 9 or 25). Notice that only for n ≥ 2^25 are the standard deviations meaningful, since for smaller n each measurement is already an average, so the calculated standard deviation is much smaller than the real standard deviation.


5 Conclusion

We have shown that by using the median-of-medians algorithm for pivot selection, QuickMergesort turns into a highly efficient algorithm in the worst case, while remaining competitive on average and fast in practice. Future research might address the following points:

• Although pseudomedians of fifteen elements for selecting the pivot sample seem to be a good choice, other methods could be investigated (e.g. median of nine).

• The running time could be further improved by using Insertionsort to sort small subarrays.

• Since the main work is done by Mergesort, any tuning of the merging procedure would also directly benefit MoMQuickMergesort.

• Other methods for in-place Mergesort implementations are promising and should be developed further – in particular, the (unstable) merging procedure by Chen [6] seems to be a good starting point.

• To get the most performance out of modern multi-core processors, a parallel version of the algorithm is desirable. For both Quicksort and Mergesort, efficient parallel implementations are known. Thus, an efficient parallel implementation of MoMQuickMergesort is not out of reach. However, there is one additional difficulty to overcome: while in the Quicksort recursion both parts can be sorted independently in parallel, in QuickMergesort this is not possible, since one part is needed as temporary memory for sorting the other part with Mergesort.

References

[1] Andrei Alexandrescu. Fast deterministic selection. In Costas S. Iliopoulos, Solon P. Pissis, Simon J. Puglisi, and Rajeev Raman, editors, 16th International Symposium on Experimental Algorithms, SEA 2017, June 21-23, 2017, London, UK, volume 75 of LIPIcs, pages 24:1–24:19. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017.

[2] Andrey Astrelin. Grailsort. Github repository at https://github.com/Mrrl/GrailSort.

[3] Jon Louis Bentley and M. Douglas McIlroy. Engineering a sort function. Softw., Pract. Exper.,

23(11):1249–1265, 1993.

[4] Manuel Blum, Robert W. Floyd, Vaughan R. Pratt, Ronald L. Rivest, and Robert E. Tarjan.

Time bounds for selection. Journal of Computer and System Sciences, 7(4):448–461, 1973.

[5] Domenico Cantone and Gianluca Cincotti. Quickheapsort, an eﬃcient mix of classical sorting

algorithms. Theor. Comput. Sci., 285(1):25–42, 2002.

[6] Jing-Chao Chen. A simple algorithm for in-place merging. Inf. Process. Lett., 98(1):34–40,

2006.

[7] Ke Chen and Adrian Dumitrescu. Select with groups of 3 or 4. In Frank Dehne, J¨org-R¨udiger

Sack, and Ulrike Stege, editors, Algorithms and Data Structures - 14th International Sym-

posium, WADS 2015, Victoria, BC, Canada, August 5-7, 2015. Proceedings, volume 9214 of

Lecture Notes in Computer Science, pages 189–199. Springer, 2015.

[8] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. The MIT Press, 3rd edition, 2009.


[9] Volker Diekert and Armin Weiß. Quickheapsort: Modiﬁcations and improved analysis. In CSR,

pages 24–35, 2013.

[10] Stefan Edelkamp and Patrick Stiegeler. Implementing Heapsort with n log n − 0.9n and Quicksort with n log n + 0.2n comparisons. Journal of Experimental Algorithmics, 7:Article 5, 2002.

[11] Stefan Edelkamp and Armin Weiß. QuickXsort: Efficient Sorting with n log n − 1.399n + o(n) Comparisons on Average. In Edward A. Hirsch, Sergei O. Kuznetsov, Jean-Éric Pin, and Nikolay K. Vereshchagin, editors, CSR, volume 8476 of Lecture Notes in Computer Science, pages 139–152. Springer, 2014.

[12] Stefan Edelkamp and Armin Weiß. Quickmergesort: Practically eﬃcient constant-factor opti-

mal sorting. CoRR, abs/1804.10062, 2018.

[13] Amr Elmasry, Jyrki Katajainen, and Max Stenmark. Branch mispredictions don’t aﬀect merge-

sort. In SEA, pages 160–171, 2012.

[14] Robert W. Floyd and Ronald L. Rivest. The algorithm SELECT - for ﬁnding the ith smallest

of n elements [M1] (algorithm 489). Commun. ACM, 18(3):173, 1975.

[15] Robert W. Floyd and Ronald L. Rivest. Expected time bounds for selection. Commun. ACM,

18(3):165–172, 1975.

[16] Viliam Geffert, Jyrki Katajainen, and Tomi Pasanen. Asymptotically efficient in-place merging. Theor. Comput. Sci., 237(1-2):159–181, 2000.

[17] C. A. R. Hoare. Algorithm 65: Find. Commun. ACM, 4(7):321–322, July 1961.

[18] Bing-Chao Huang and Michael A. Langston. Fast stable merging and sorting in constant extra

space. Comput. J., 35(6):643–650, 1992.

[19] Wolfram Research, Inc. Wolfram|Alpha. Champaign, IL, 2018.

[20] Ming-Yang Kao. Multiple-size divide-and-conquer recurrences. SIGACT News, 28(2):67–69,

1997.

[21] Jyrki Katajainen. The Ultimate Heapsort. In CATS, pages 87–96, 1998.

[22] Jyrki Katajainen, Tomi Pasanen, and Jukka Teuhola. Practical in-place mergesort. Nord. J.

Comput., 3(1):27–40, 1996.

[23] Pok-Son Kim and Arne Kutzner. Ratio based stable in-place merging. In Manindra Agrawal,

Ding-Zhu Du, Zhenhua Duan, and Angsheng Li, editors, Theory and Applications of Models

of Computation, 5th International Conference, TAMC 2008, Xi’an, China, April 25-29, 2008.

Proceedings, volume 4978 of Lecture Notes in Computer Science, pages 246–257. Springer, 2008.

[24] Donald E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming.

Addison Wesley Longman, 2nd edition, 1998.

[25] Noriyuki Kurosawa. Quicksort with median of medians is considered practical. CoRR,

abs/1608.04852, 2016.

[26] Conrado Martinez, Daniel Panario, and Alfredo Viola. Adaptive sampling for quickselect. In

Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA

2004, New Orleans, Louisiana, USA, January 11-14, 2004, pages 447–455, 2004.


[27] Conrado Martínez, Daniel Panario, and Alfredo Viola. Adaptive sampling strategies for quickselects. ACM Trans. Algorithms, 6(3):53:1–53:45, 2010.

[28] Mike McFadden. WikiSort. Github repository at https://github.com/BonzaiThePenguin/WikiSort.

[29] David R. Musser. Introspective sorting and selection algorithms. Software—Practice and Ex-

perience, 27(8):983–993, 1997.

[30] Orson Peters. pdqsort. Github repository at https://github.com/orlp/pdqsort.

[31] Iosif Pinelis. Computing minimum / maximum of strange two variable function (answer).

MathOverﬂow. URL: https://mathoverflow.net/q/306757 (visited on: 2018-07-25).

[32] Klaus Reinhardt. Sorting in-place with a worst case complexity of n log n − 1.3n + o(log n) comparisons and εn log n + o(1) transports. In ISAAC, pages 489–498, 1992.

[33] Eric W. Weisstein. Merge sort. From MathWorld – A Wolfram Web Resource. URL: http://mathworld.wolfram.com/MergeSort.html. Last visited on 07/25/2018.

[34] Sebastian Wild. Average cost of QuickXsort with pivot sampling. In James Allen Fill and

Mark Daniel Ward, editors, 29th International Conference on Probabilistic, Combinatorial and

Asymptotic Methods for the Analysis of Algorithms, AofA 2018, June 25-29, 2018, Uppsala,

Sweden, volume 110 of LIPIcs, pages 36:1–36:19. Schloss Dagstuhl - Leibniz-Zentrum fuer In-

formatik, 2018.
