
QuickMergesort: Practically Efficient Constant-Factor Optimal Sorting

Stefan Edelkamp¹ and Armin Weiß²

1 King's College London, UK
stefan.edelkamp@kcl.ac.uk

2 Universität Stuttgart, Germany
armin.weiss@fmi.uni-stuttgart.de

Abstract

We consider the fundamental problem of internally sorting a sequence of $n$ elements. In its best theoretical setting QuickMergesort, a combination of Quicksort and Mergesort with a Median-of-√n pivot selection, requires at most $n \log n - 1.3999n + o(n)$ element comparisons on the average. The question addressed in this paper is how to make this algorithm practical. As refined pivot selection usually adds much overhead, we show that the Median-of-3 pivot selection of QuickMergesort leads to at most $n \log n - 0.75n + o(n)$ element comparisons on average, while running fast on elementary data. The experiments show that QuickMergesort outperforms state-of-the-art library implementations, including C++'s Introsort and Java's Dual-Pivot Quicksort. Further trade-offs between a low running time and a low number of comparisons are studied. Moreover, we describe a practically efficient version with $n \log n + O(n)$ comparisons in the worst case.

1998 ACM Subject Classification F.2.2 Nonnumerical Algorithms and Problems

Keywords and phrases in-place sorting, quicksort, mergesort, analysis of algorithms

Digital Object Identifier 10.4230/LIPIcs.xxx.yyy.p

1 Introduction

Sorting a sequence of $n$ elements remains one of the most fascinating topics in computer science, and runtime improvements to sorting have significant impact for many applications. The lower bound of $\log(n!) \approx n \log n - 1.44n + \Theta(\log n)$ element comparisons applies to the worst and the average case¹.
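For completeness, the constant $1.44$ comes from Stirling's approximation (a standard fact, not restated in the paper):

\[
\log(n!) = n \log n - n \log e + \Theta(\log n), \qquad \log e = \frac{1}{\ln 2} \approx 1.4427.
\]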

The sorting algorithms we propose in this paper are internal or in-place: they need at most $O(\log n)$ space (computer words) in addition to the array to be sorted. That means we consider Quicksort [15] an internal algorithm, whereas standard Mergesort is external because it needs a linear amount of extra space.

Based on QuickHeapsort [5, 7], Edelkamp and Weiß [9] developed the concept of QuickXsort and applied it to X = WeakHeapsort [8] and X = Mergesort. The idea – going back to UltimateHeapsort [17] – is very simple: as in Quicksort the array is partitioned into the elements greater and less than some pivot element, respectively. Then one part of the array is sorted by X and the other part is sorted recursively. The advantage is that, if X is an external algorithm, then in QuickXsort the part of the array which is not currently being sorted may be used as temporary space, which yields an internal variant of X.

¹ Logarithms denoted by log are base 2, and the term average case refers to a uniform distribution of all input permutations assuming all elements are different.


Using Mergesort as X, a Median-of-⌈√n⌉ pivot selection scheme (known to be optimal for classical Quicksort [22]), and Ford and Johnson's MergeInsertion as the base case [13], QuickMergesort requires at most $n \log n - 1.3999n + o(n)$ element comparisons on the average (and $n \log n - 1.4n + o(n)$ for $n = 2^k$), while preserving the worst-case bounds of $n \log n + O(n)$ element comparisons and $O(n \log n)$ time for all other operations [9]. To the authors' knowledge the average-case result is the best-known upper bound for sequential sorting with an $O(n \log n)$ overall time bound, matching the lower bound in the leading term and deviating from it by less than $0.045n$ in the linear term. The research question addressed in this paper is whether QuickMergesort can be made practical in relation to efficient library implementations for sorting, such as Introsort and Dual-Pivot Quicksort.

Introsort [23], implemented as std::sort in C++/STL, is a mix of Insertionsort, CleverQuicksort (the Median-of-3 variant of Quicksort) and Heapsort [12, 28], where the former and the latter are used as recursion stoppers (the one for improving the performance for small sets of data, the other one for improving worst-case performance). The average-case complexity, however, is dominated by CleverQuicksort.

Dual-Pivot Quicksort² by Yaroslavskiy et al. as implemented in current versions of Java (e.g., Oracle Java 7 and Java 8) is an interesting Quicksort variant using two (instead of one) pivot elements in the partitioning stage (recent proposals use three and more pivots [20]). It has been shown that – in contrast to ordinary Quicksort with an average case of $2 \cdot n \ln n + O(n)$ element comparisons – Dual-Pivot Quicksort requires at most $1.9 \cdot n \ln n + O(n)$ element comparisons on the average, and there are variants that give $1.8 \cdot n \ln n + O(n)$. For a rising number of samples for pivot selection, the leading factor decreases [2, 27, 3].
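To compare these $\ln$-based bounds with the $n \log n$ bounds used elsewhere in the paper, one can convert via $n \ln n = (\ln 2)\, n \log n$ (a routine conversion, added here for convenience):

\[
2\, n\ln n \approx 1.386\, n\log n, \qquad 1.9\, n\ln n \approx 1.317\, n\log n, \qquad 1.8\, n\ln n \approx 1.248\, n\log n.
\]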

So far there is no practical (competitive in performance to state-of-the-art library implementations) sorting algorithm that is internal and constant-factor-optimal (optimal in the leading term). Maybe closest is InSituMergesort [18, 11], but even though that algorithm improves greatly over the library implementation of in-place stable sort in STL, it could not match other internal sorting algorithms. Hence, the aim of the paper is to design fast QuickMergesort variants. Instead of using a Median-of-√n strategy, we will use the Median-of-3. For Quicksort, the Median-of-3 strategy is also known as CleverQuicksort. The leading constant in $c \cdot n \log n + O(n)$ for the average case of comparisons of CleverQuicksort is $c = (12/7) \ln 2 \approx 1.188$. As $c < 1.8 \ln 2$, CleverQuicksort is theoretically superior to the wider class of DualPivotQuicksort algorithms considered in [2, 3, 27].

Another sorting algorithm studied in this paper is a mix of QuickMergesort and CleverQuicksort: during the sorting with Mergesort, for small arrays CleverQuicksort is applied.

The contributions of the paper are as follows.

1. We derive a bound on the average number of comparisons in QuickMergesort when the Median-of-3 partitioning strategy is used instead of the Median-of-√n strategy, and show a surprisingly low upper bound of $n \log n - 0.75n + o(n)$ comparisons on average.

2. We analyze a variant of QuickMergesort where base cases of size at most $n^\beta$ for some $\beta \in [0,1]$ are sorted using yet another sorting algorithm X; otherwise the algorithm is identical to QuickMergesort. We show that if X is called for about √n elements and X uses at most $\alpha \cdot n \log n + O(n)$ comparisons on average, the average number of comparisons of the combined algorithm is $(1+\alpha)/2 \cdot n \log n + O(n)$, with $(1+\alpha)/2 \approx 1.094$ for X = Median-of-3 Quicksort. Other element size thresholds for invoking X lead to further trade-offs.

² Oracle states: The sorting algorithm is a Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch; see http://permalink.gmane.org/gmane.comp.java.openjdk.core-libs.devel/2628


3. We refine a trick suggested in [9] in order to obtain a bound of $n \log n + 16.1n$ comparisons in the worst case using the median-of-medians algorithm [4] with an adaptive pivot sampling strategy. On average the modified algorithm is only slightly slower than the Median-of-3 variant of QuickMergesort.

4. We compare the proposals empirically to other algorithms from the literature.

We start by revisiting QuickXsort and especially QuickMergesort, including theoretically important and practically relevant sub-cases. We derive an upper bound on the average number of comparisons in QuickMergesort with Median-of-3 pivot selection. In Section 3, we present changes to the algorithm that lead to the hybrid QuickMergeXsort. Next, we introduce the worst-case efficient variant MoMQuickMergesort, and, finally, we present experimental results.

2 QuickXsort and QuickMergesort

In this section we give a brief description of QuickXsort and extend a result concerning the

number of comparisons performed in the average case.

Let X be some sorting algorithm. QuickXsort works as follows: First, choose a pivot

element as the median of some sample (the performance will depend on the size of the

sample). Next, partition the array according to this pivot element, i.e., rearrange the array such that all elements left of the pivot are less than or equal to, and all elements on the right are greater than or equal to, the pivot element. Then, choose one part of the array and sort it with the algorithm X. After that, sort the other part of the array recursively with QuickXsort.

The main advantage of this procedure is that the part of the array that is not currently being sorted can be used as temporary memory for the algorithm X. This yields fast internal variants for various external sorting algorithms such as Mergesort. The idea is that whenever a data element should be moved to the extra (additional or external) element space, it is instead swapped with the data element occupying the respective position in the part of the array which is used as temporary memory. Of course, this works only if the algorithm needs additional storage only for data elements. Furthermore, the algorithm has to keep track of the positions of elements which have been swapped.
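As a minimal sketch of this buffer discipline (our illustration, not code from the paper): moving a sorted run into the buffer area is done by swapping rather than copying, so the buffer's previous occupants survive inside the vacated range.

#include <algorithm>
#include <vector>

typedef std::vector<int>::iterator iter;

// Move [begin, end) into the buffer at buf by swapping element-wise;
// the elements formerly in the buffer end up in [begin, end) as "dummies".
void swap_to_buffer(iter begin, iter end, iter buf) {
    while (begin != end) std::iter_swap(begin++, buf++);
}

Figure 2 below applies the same discipline pairwise inside its merge loop.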

For the number of comparisons some general results hold for a wide class of algorithms X. Under natural assumptions the average number of comparisons of X and of QuickXsort differ only by an $o(n)$-term: Let X be some sorting algorithm requiring at most $n \log n + cn + o(n)$ comparisons on average. Then, QuickXsort with a Median-of-√n pivot selection also needs at most $n \log n + cn + o(n)$ comparisons on average [9]. Sample sizes of approximately √n are likely to be optimal [7, 22].

If the unlikely case happens that always the √n smallest elements are chosen for pivot selection, $\Omega(n^{3/2})$ comparisons are performed. However, as we showed in [9], such a worst case is unlikely. Nevertheless, for improving the worst-case complexity, in [9] we suggested a trick similar to Introsort [23] leading to $n \log n + O(n)$ comparisons in the worst case (use the median of the whole array as pivot if the previous pivot was very bad). In Section 4 of this paper, we refine this method yielding a better average and worst-case performance.

One example for QuickXsort is QuickMergesort. For the Mergesort part we use standard (top-down) Mergesort, which can be implemented using $m$ extra element spaces to merge two arrays of length $m$. After the partitioning, one part of the array – for a simpler description we assume the first part – has to be sorted with Mergesort (note, however, that any of the two sides can be sorted with Mergesort as long as the other side contains at least $n/3$ elements). In order to do so, the second half of this first part is sorted recursively with Mergesort while


Figure 1 Example for the execution of QuickMergesort: the input array is partitioned around the pivot; one part is then sorted with Mergesort by recursively sorting its two halves (using the other part of the array as temporary space) and merging them; finally, the remaining part is sorted recursively with QuickMergesort. [Rendered as a step-by-step diagram in the original.]

moving the elements to the back of the whole array. The elements from the back of the array are inserted as dummy elements into the first part. Then, the first half of the first part is sorted recursively with Mergesort while being moved to the position of the former second half of the first part. Now, at the front of the array, there is enough space (filled with dummy elements) such that the two halves can be merged. The executed stages of the algorithm QuickMergesort (with no median pivot selection strategy applied) are illustrated in Fig. 1.

Mergesort requires approximately $n \log n - 1.26n$ comparisons on average, so that with a Median-of-√n we obtain an internal sorting algorithm with $n \log n - 1.26n + o(n)$ comparisons on average. One can do even better by sorting small subarrays with a more complicated algorithm requiring fewer comparisons – for details see [9].

Since the Median-of-3 variant (i.e., CleverQuickMergesort) shows a slightly better practical performance than with Median-of-√n (see [9]), we provide here a theoretical analysis of it by showing that CleverQuickMergesort performs at most $n \log n - 0.75n + o(n)$ comparisons on average. In fact, as in [9] we show a more general result for CleverQuickXsort for an arbitrary algorithm X.

Theorem 1 (Average Case CleverQuickXsort). Let the algorithm X perform at most $\alpha n \log n + cn + O(\log n)$ comparisons on average. Then, CleverQuickXsort performs at most $\alpha n \log n + (c + \kappa_\alpha)n + O(\log^2 n)$ comparisons on average with
\[
\kappa_\alpha = \frac{4}{15}\left(12 - \frac{7\alpha}{\ln 2}\right) \le 0.51.
\]

Since Mergesort requires at most $n \log n - 1.26n + o(n)$ comparisons on average, we obtain the following corollary:

Corollary 2 (Average Case CleverQuickMergesort). CleverQuickMergesort is an in-place algorithm that performs at most $n \log n - 0.75n + o(n)$ comparisons on average.
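For concreteness (a short arithmetic check, not spelled out in the text): Mergesort gives $\alpha = 1$ and $c = -1.26$, so

\[
\kappa_1 = \frac{4}{15}\left(12 - \frac{7}{\ln 2}\right) \approx \frac{4}{15}\,(12 - 10.10) \approx 0.51, \qquad c + \kappa_1 \approx -1.26 + 0.51 = -0.75.
\]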

Proof of Theorem 1. The probability of choosing the $k$-th element (in the ordered sequence) as pivot of a random $n$-element array is $\Pr[\mathrm{pivot} = k] = (k-1)(n-k)\binom{n}{3}^{-1}$ (one element of the three-element set has to be less than the $k$-th, one equal to the $k$-th, and one greater than the $k$-th element of the array). Note that this holds no matter whether we select the three


elements at random or we use fixed positions and average over all input permutations. Since probabilities sum up to 1, we have
\[
\sum_{k=1}^{n} (k-1)(n-k)\binom{n}{3}^{-1} = 1. \tag{1}
\]
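The identity can also be checked directly (a routine computation we add for the reader): substituting $j = k-1$,

\[
\sum_{k=1}^{n} (k-1)(n-k) = \sum_{j=0}^{n-1} j(n-1-j) = (n-1)\frac{n(n-1)}{2} - \frac{(n-1)n(2n-1)}{6} = \frac{n(n-1)(n-2)}{6} = \binom{n}{3}.
\]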

Moreover, partitioning preserves randomness of the two sides of the array – this includes the positions where the other two elements from the pivot sample are placed (since for a fixed pivot, every element smaller (resp. greater) than the pivot has the same probability of being part of the sample). Also, using the array as temporary space for Mergesort does not destroy randomness since the dummy elements are never compared.

Let $T(n)$ be the average-case number of comparisons of CleverQuickXsort for sorting an input of size $n$ and let
\[
S(n) = \alpha n \log n + cn + d(1 + \log n)
\]
be a bound for the average number of comparisons of the algorithm X (e.g. Mergesort). We will show by induction that
\[
T(n) \le \alpha n \log n + (c + \kappa_\alpha)n + D(1 + \log^2 n)
\]
for some constant $D \ge d$ (which we specify later such that the induction base is satisfied) and $\kappa_\alpha = \frac{4}{15}\left(12 - \frac{7\alpha}{\ln 2}\right) \le 0.51$ (since $\alpha \ge 1$ by the general lower bound on sorting). As induction hypothesis for $1 \le k \le n$ we assume that
\begin{align*}
&\max\{T(k-1) + S(n-k),\; T(n-k) + S(k-1)\}\\
&\quad\le \alpha(k-1)\log(k-1) + \alpha(n-k)\log(n-k) + cn + \kappa_\alpha\max\{k-1,\, n-k\}\\
&\qquad + D\bigl(1 + \log^2(\max\{k-1,\, n-k\})\bigr) + d\bigl(1 + \log(\min\{k-1,\, n-k\})\bigr)\\
&\quad =: f(n,k).
\end{align*}

In order to find the pivot element, three comparisons are needed. After that, for partitioning $n - 3$ comparisons are performed (all except the three elements of the pivot sample are compared with the pivot). Since after partitioning, one part of the array is sorted with X and the other recursively with CleverQuickXsort, we obtain the recurrence
\begin{align}
T(n) &\le n + \sum_{k=1}^{n} \Pr[\mathrm{pivot} = k]\cdot\max\{T(k-1)+S(n-k),\; T(n-k)+S(k-1)\}\notag\\
&\le n + \sum_{k=1}^{n} \frac{(k-1)(n-k)}{\binom{n}{3}}\, f(n,k)\notag\\
&\le n + \frac{1}{\binom{n}{3}}\sum_{k=1}^{n}(k-1)(n-k)\bigl(\alpha(k-1)\log(k-1)+\alpha(n-k)\log(n-k)\bigr)\tag{2}\\
&\quad+\frac{1}{\binom{n}{3}}\sum_{k=1}^{n}(k-1)(n-k)\,\kappa_\alpha\max\{k-1,\,n-k\}\tag{3}\\
&\quad+\frac{1}{\binom{n}{3}}\sum_{k=1}^{n}(k-1)(n-k)\,D\log^2(\max\{k-1,\,n-k\})\tag{4}\\
&\quad+\frac{1}{\binom{n}{3}}\sum_{k=1}^{n}(k-1)(n-k)\bigl(cn+D+d+d\log(\min\{k-1,\,n-k\})\bigr)\tag{5}
\end{align}


We simplify the terms (2)–(5) separately using http://www.wolframalpha.com/ for evaluating the sums and integrals. The function $x \mapsto g(x) = (x-1)^2(n-x)\log(x-1)$ is non-negative and has a single maximum for $1 \le x \le n$ at position $x = \xi$; on the left of $\xi$, it is monotonically increasing, on the right monotonically decreasing. Therefore,
\[
\sum_{k=1}^{n} g(k) = \sum_{k=1}^{\lfloor\xi\rfloor} g(k) + \sum_{k=\lfloor\xi\rfloor+1}^{n} g(k) \le \int_{1}^{\lfloor\xi\rfloor} g(x)\,dx + \int_{\lfloor\xi\rfloor+1}^{n} g(x)\,dx + 2g(\xi).
\]

Since the second term of (2) is obtained from the first one by the substitution $k \mapsto n+1-k$, it follows that
\begin{align*}
(2) - n &\le \frac{\alpha}{\binom{n}{3}}\cdot\left(\int_{1}^{n} g(x)\,dx + \int_{1}^{n} g(n+1-x)\,dx + 4g(\xi)\right)\\
&\le \frac{\alpha}{\binom{n}{3}}\cdot\left(2\int_{1}^{n} x^2(n-x)\log x\,dx + 4n^3\log n\right)\\
&= \frac{\alpha}{\binom{n}{3}}\cdot\left(\frac{2}{144\ln 2}\, n^4(12\ln n - 7) + 4n^3\log n\right) \le \alpha n\log n - \frac{7\alpha}{12\ln 2}\,n + c_3\,\alpha\log n
\end{align*}
for some properly chosen constant $c_3$. Now, first assume that $\kappa_\alpha \ge 0$. Then we have

\[
(3) \le \frac{2\kappa_\alpha}{\binom{n}{3}}\sum_{k=1}^{\lceil n/2\rceil}(k-1)(n-k)^2 \le \frac{2\kappa_\alpha\, n}{192\binom{n}{3}}\bigl(11n^3 - 20n^2 - 44n + 80\bigr) \le \frac{11}{16}\kappa_\alpha n + c_4
\]
for some constant $c_4$. On the other hand, if $\kappa_\alpha < 0$, we have
\[
(3) \le \frac{2\kappa_\alpha}{\binom{n}{3}}\sum_{k=1}^{\lfloor n/2\rfloor}(k-1)(n-k)^2 \le \frac{2\kappa_\alpha\, n}{192\binom{n}{3}}\bigl(11n^3 - 68n^2 - 100n + 16\bigr) \le \frac{11}{16}\kappa_\alpha n + c_4
\]
for some constant $c_4$. Thus, in any case, we have $(3) \le \frac{11}{16}\kappa_\alpha n + c_4$. With the same argument as for (2), we have

\begin{align*}
(4) &\le \frac{2D}{\binom{n}{3}}\sum_{k=\lfloor n/2\rfloor}^{n}(k-1)(n-k)\log^2(k-1)\\
&\le \frac{2D}{\binom{n}{3}}\int_{\lfloor n/2\rfloor}^{n}(x-1)(n-x)\log^2(x-1)\,dx + D\cdot c_5' \le D\log^2 n - \frac{5D}{3}\log n + D\cdot c_5
\end{align*}
for some constants $c_5'$ and $c_5$. Finally, by (1), we have
\[
(5) \le cn + D + d + d\log(n/2) = cn + D + d\log n.
\]

Now, we combine all the terms and obtain
\[
T(n) \le \alpha n\log n + n\left(1 - \frac{7\alpha}{12\ln 2} + c + \frac{11}{16}\kappa_\alpha\right) + c_3\alpha\log n + c_4 + D\log^2 n - \frac{5D}{3}\log n + Dc_5 + D + d\log n.
\]
We can choose $D$ such that $\frac{5D}{3}\log n \ge c_3\alpha\log n + c_4 + Dc_5 + D + d\log n$ for $n$ large enough


and $D \ge T(n)$ for all smaller $n$. Hence, we conclude the proof of Theorem 1:
\begin{align*}
T(n) &\le \alpha n\log n + n\left(1 - \frac{7\alpha}{12\ln 2} + c + \frac{11}{16}\kappa_\alpha\right) + D\log^2 n + D\\
&= \alpha n\log n + n\left(1 - \frac{7\alpha}{12\ln 2} + c + \frac{11}{16}\cdot\frac{4}{15}\left(12 - \frac{7\alpha}{\ln 2}\right)\right) + D\log^2 n + D\\
&= \alpha n\log n + (c + \kappa_\alpha)n + D\log^2 n + D. \qquad \blacktriangleleft
\end{align*}

Notice that in the case that in each recursion level always the smaller part is sorted with X, the inequalities in the proof of Theorem 1 are tight up to some lower order terms. Thus, the proof can be easily modified to provide a lower bound of $\alpha n\log n + (c + \kappa_\alpha)n - O(\log^2 n)$ comparisons in this special case.

3 QuickMergeXsort

QuickMergeXsort agrees with QuickMergesort up to the following change: for arrays of

size smaller than some threshold cardinality X_THRESH, the sorting algorithm X is called

(instead of Mergesort) and the sorted elements are moved to their target location expected

by QuickMergesort.

Fig. 2 provides the full implementation details of QuickMerge(X)sort (in C++). The

realization of the sorting algorithm X and the partitioning algorithm have to be added. The

listing shows that by dropping the base cases from QuickMergesort the code is short enough

for textbooks on algorithms and data structures. The general principle is that we have a

merging step that takes two sorted areas, merges and swaps them into a third one.

The program msort applies Mergesort with X as a stopper. It goes down the recursion tree and shrinks the size of the array accordingly. If the array is small enough, the algorithm calls X followed by a joint movement (memory copy) of array elements (the only change of code wrt. QuickMergesort). The algorithm out serves as an interface between the recursive procedure msort and the top-level procedure sort. Last, but not least, we have the overall internal sorting algorithm sort, which performs the partitioning. The following result is a generalization of the $1 \cdot n\log n + cn + o(n)$ average comparisons bound in [9]. Indeed, the proof is almost a verbatim copy of the proof of [9, Thm. 1] (compare to the role of $\alpha$ in the proof of Theorem 1).

Theorem 3 (Average-Case QuickXsort). For $\alpha \ge 1$ let X be some sorting algorithm requiring at most $\alpha \cdot n\log n + cn + o(n)$ comparisons on average. Then, QuickXsort with a Median-of-√n pivot selection also needs at most $\alpha \cdot n\log n + cn + o(n)$ comparisons on average.

We are now ready to analyze the average-case performance of QuickMergeXsort.

Theorem 4 (Average-Case QuickMergeXsort/CleverQuickMergeXsort). Let X be a sorting algorithm with $\alpha \cdot n\log n + cn + o(n)$ comparisons in the average case, called when reaching $n^\beta$ elements, $0 < \beta < 1$. Then, QuickMergeXsort with Median-of-√n pivot selection, as well as with Median-of-3 pivot selection, is a sorting algorithm that needs at most $(\alpha\beta + (1-\beta)) \cdot n\log n + O(n)$ comparisons in the average case.

Proof. To begin with, we analyze MergeXsort, i.e., Mergesort with recursion stopper X. We assume that every path of the recursion tree of Mergesort has the same length until the


typedef std::vector<t>::iterator iter;

// Merge the sorted run [begin1, end1) with the sorted run that starts at
// target + (end1 - begin1) and ends at endtarget, writing the result to
// [target, ...). Displaced elements are swapped back into the vacated
// slots, so no element is lost; temp patches the final hole.
void merge(iter begin1, iter end1, iter target, iter endtarget) {
    iter i1 = begin1, i2 = target + (end1 - begin1), ires = target;
    t temp = *target;
    while (i1 != end1 && i2 != endtarget) {
        iter tempit = (*i1 < *i2) ? i1++ : i2++;
        *ires++ = *tempit; *tempit = *ires;
    }
    while (i1 < end1) { *ires++ = *i1; *i1++ = *ires; }
    *(i1 - 1) = temp;
}

// Mergesort [begin, end), moving the sorted elements to [target, target + n);
// below the threshold, the recursion stopper X is called instead.
void msort(iter begin, iter end, iter target) {
    index n = end - begin;
    if (n < X_THRESH) {
        X(begin, end);
        for (index i = 0; i < n; i++) std::swap(begin[i], target[i]);
    }
    else {
        index q = n / 2;
        msort(begin + q, end, target + q);
        msort(begin, begin + q, begin + q);
        merge(begin + q, begin + n, target, target + n);
    }
}

// Sort [begin, end) in place, using the area starting at temp as buffer.
void out(iter begin, iter end, iter temp) {
    index n = end - begin;
    if (n > 1) {
        index q = n / 2, r = n - q;
        msort(begin + q, end, temp);
        msort(begin, begin + q, begin + r);
        merge(temp, temp + r, begin, end);
    }
}

// Partition, Mergesort the smaller side (the larger side provides the
// buffer), and iterate on the larger side.
void sort(std::vector<t> &a) {
    iter begin = a.begin(), end = a.end();
    while (begin < end) {
        iter b = partition(begin, end);
        if (b < begin + (end - begin) / 2) { out(begin, b, b + 1); begin = b + 1; }
        else { out(b + 1, end, begin); end = b; }
    }
}

Figure 2 Implementation of QuickMergeXsort.
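Figure 2 leaves t, index, X_THRESH, X, and partition unspecified. The following stand-in definitions are one way to make the listing compile (our assumptions, not the authors' setup: the paper pairs the listing with its own X and a tuned block partitioner, whereas here X is std::sort and the partitioner a plain Median-of-3 built on std::partition). They would precede the listing in an actual build.

#include <algorithm>
#include <cstddef>
#include <vector>

typedef int t;                       // element type assumed for the listing
typedef std::ptrdiff_t index;
const index X_THRESH = 16;           // plausible, untuned threshold

typedef std::vector<t>::iterator iter;

void X(iter begin, iter end) {       // stand-in recursion stopper
    std::sort(begin, end);
}

// Median-of-3 partitioner; returns an iterator to the pivot's final position.
iter partition(iter begin, iter end) {
    iter mid = begin + (end - begin) / 2, last = end - 1;
    if (*mid < *begin) std::iter_swap(begin, mid);
    if (*last < *begin) std::iter_swap(begin, last);
    if (*last < *mid) std::iter_swap(mid, last);   // median of three at mid
    std::iter_swap(mid, last);                     // park the pivot at the back
    t pivot = *last;
    iter split = std::partition(begin, last,
                                [&](const t &x) { return x < pivot; });
    std::iter_swap(split, last);                   // move pivot into place
    return split;
}

With these definitions in scope, sort(a) from Figure 2 sorts a std::vector<t> in place.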

algorithm switches to X. This can be easily implemented and guarantees that all calls to X

are made on arrays of almost identical size.
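One simple way to realize this (our sketch, using the conventions of Figure 2; the level-counting variant is not part of the published listing): count down a fixed number of halvings instead of testing a size threshold, so all leaves of the recursion tree sit at the same depth.

// Equal-depth variant of msort from Figure 2: switch to X after exactly
// `levels` halvings; the top-level call passes
// levels = (int)std::ceil((1 - beta) * std::log2((double)n)).
void msort_eqdepth(iter begin, iter end, iter target, int levels) {
    index n = end - begin;
    if (levels == 0) {
        X(begin, end);
        for (index i = 0; i < n; i++) std::swap(begin[i], target[i]);
    } else {
        index q = n / 2;
        msort_eqdepth(begin + q, end, target + q, levels - 1);
        msort_eqdepth(begin, begin + q, begin + q, levels - 1);
        merge(begin + q, begin + n, target, target + n);
    }
}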

First, we look at the $\lceil(\log n)\cdot(1-\beta)\rceil$ top layers of the recursion tree, which are sorted by Mergesort. In the worst case, in layer $i$ of the tree, Mergesort requires at most $n - 2^i < n$ comparisons, so that in total we have at most $C_{\mathrm{Merge}}(n) = n\cdot\lceil(1-\beta)\log n\rceil$ element comparisons. The average case differs only negligibly from the worst case.

In the $\lceil(\log n)\cdot(1-\beta)\rceil$ recursion levels of Mergesort, $2^{\lceil(1-\beta)\log n\rceil}$ sorted arrays are merged to one large sorted array. Each of the $g_\beta(n) = 2^{\lceil(1-\beta)\log n\rceil}$ arrays is of size at most
\[
f_\beta(n) = 2^{\log n - \lceil(1-\beta)\log n\rceil} \le n^\beta.
\]

Next, we look at the $g_\beta(n) = 2^{\lceil(1-\beta)\log n\rceil}$ calls to X. Let $C_X(n)$ denote the average number of element comparisons executed by all calls of X. Given that $g_\beta(n) f_\beta(n) = n + O(n^{1-\beta})$


and $\log f_\beta(n) = \log 2^{\log n - \lceil(1-\beta)\log n\rceil} = \log n - \lceil(1-\beta)\log n\rceil + O(1/n^\beta)$, we obtain
\begin{align*}
C_X(n) &= g_\beta(n)\cdot\bigl(\alpha\cdot f_\beta(n)\log f_\beta(n) + c f_\beta(n) + o(f_\beta(n))\bigr)\\
&= \alpha\cdot n\log f_\beta(n) + cn + o(n) = \alpha\cdot n\bigl(\log n - \lceil(1-\beta)\log n\rceil\bigr) + cn + o(n).
\end{align*}
In total, for the average-case number of comparisons of MergeXsort we have the following upper bound:
\begin{align*}
C_{\mathrm{MergeXsort}}(n) &= C_X(n) + C_{\mathrm{Merge}}(n)\\
&\le \alpha\cdot n\bigl(\log n - \lceil(1-\beta)\log n\rceil\bigr) + cn + o(n) + n\lceil(1-\beta)\log n\rceil\\
&= n\bigl(\alpha\log n - (\alpha-1)\lceil(1-\beta)\log n\rceil\bigr) + cn + o(n)\\
&= (\alpha\beta + (1-\beta))\cdot n\log n + O(n).
\end{align*}

Using Theorem 3 (resp. Theorem 1 for Median-of-3) we obtain the matching bound of at most $(\alpha\beta + (1-\beta))\cdot n\log n + O(n)$ element comparisons on average for QuickMergeXsort. ◀

Theorem 4 implies that CleverQuickMergeXsort implemented with CleverQuicksort as recursion stopper at √n elements ($\beta = 1/2$) is a sorting algorithm that needs at most $((\alpha+1)/2)\cdot n\log n + O(n) = 1.094\cdot n\log n + O(n)$ comparisons on average.
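The constant can be checked directly (routine arithmetic, added for convenience): for X = CleverQuicksort, $\alpha = \frac{12}{7}\ln 2 \approx 1.188$, hence

\[
\frac{\alpha + 1}{2} \approx \frac{1.188 + 1}{2} \approx 1.094.
\]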

4 Worst-Case Efficient QuickMergesort

Although QuickMergesort has an $O(n^2)$ worst-case running time, it is quite simple to guarantee a worst-case number of comparisons of $n\log n + O(n)$: just choose the median of the whole array as pivot. This is essentially how InSituMergesort [11] works. The most efficient way for finding the median is using Quickselect [14] as applied in InSituMergesort. However, this does not allow the desired bound on the number of comparisons (not even when using IntroSelect as in [11]). Alternatively, one could use the median-of-medians algorithm [4], which, while having a linear worst-case running time, is quite slow on average. In this section we describe a slight variation of the median-of-medians approach, which combines a linear worst-case running time with almost the same average performance as InSituMergesort.

Again, the crucial observation is that it is not necessary to use the actual median as pivot. As remarked in Section 2, the larger of the two sides of the partitioned array can be sorted with Mergesort as long as the smaller side contains at least one third of the total number of elements. Therefore, it suffices to find a pivot which guarantees such a partition. For doing so, we can apply the idea of the median-of-medians algorithm: for sorting an array of $n$ elements, we first choose $n/3$ elements, each as the median of three elements. Then, the median-of-medians algorithm is used to find the median of those $n/3$ elements. This median becomes the next pivot. As for the median-of-medians algorithm [4], this ensures that at least $2\cdot\lfloor n/6\rfloor$ elements are less than or equal to and at least the same number of elements are greater than or equal to the pivot – thus, always the larger part of the partitioned array can be sorted with Mergesort and the recursion takes place on the smaller part. The big advantage over the straightforward application of the median-of-medians algorithm is that it is called on an array of size only $n/3$ (at the cost of introducing a small overhead for finding the $n/3$ medians of three) – giving less weight to its big constant for the linear number of comparisons. We call this algorithm MoMQuickMergesort (MOMQMS).
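A compact sketch of this pivot selection (our illustration: the helper names are ours, and we substitute std::nth_element, which is only average-case linear, for the worst-case linear median-of-medians selection of [4] that the actual algorithm requires):

#include <algorithm>
#include <cstddef>
#include <vector>

typedef std::vector<int>::iterator iter;
typedef std::ptrdiff_t index;

// Rearrange {*a, *b, *c} so that *a holds their median (2-3 comparisons).
static void median3_to_first(iter a, iter b, iter c) {
    if (*b < *a) std::iter_swap(a, b);
    if (*c < *b) {
        std::iter_swap(b, c);
        if (*b < *a) std::iter_swap(a, b);
    }
    std::iter_swap(a, b);   // the median now sits at *b; move it to *a
}

// Pivot = median of the n/3 medians of disjoint triples.
iter choose_pivot(iter begin, iter end) {
    index n = end - begin, m = n / 3;
    for (index j = 0; j < m; ++j)   // median of triple j to position 3j
        median3_to_first(begin + 3 * j, begin + 3 * j + 1, begin + 3 * j + 2);
    for (index j = 0; j < m; ++j)   // compact the m medians to the front
        std::iter_swap(begin + j, begin + 3 * j);
    iter mid = begin + m / 2;
    std::nth_element(begin, mid, begin + m);   // stand-in selection
    return mid;   // at least 2*floor(n/6) elements on either side of *mid
}

The compaction pass is safe because position $3j$ still holds the median of triple $j$ when iteration $j$ reaches it.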

In our implementation of the median-of-medians algorithm, we select the pivot as the median of the medians of groups of five elements – we refer to [6, Sec. 9.3] for a detailed


description. The total number $T(n)$ of comparisons in the worst case of MoMQuickMergesort is bounded by
\[
T(n) \le T(n/2) + S(n/2) + M(n/3) + \frac{n}{3}\cdot 3 + \frac{2}{3}n,
\]

where $S(n)$ is the number of comparisons incurred by Mergesort and $M(n)$ the number of comparisons for the median-of-medians algorithm. We have $M(n) \le 22n$ (for the variant used in our implementation, which uses seven comparisons for finding the median of five elements). The $\frac{n}{3}\cdot 3$ term comes from finding the $n/3$ medians of three elements, the $2n/3$ comparisons from partitioning the remaining elements (after finding the pivot, the correct side of the partition is known for $n/3$ elements).

Since by [19] we have $S(n) \le n\log n - 0.9n$, this yields
\[
T(n) \le T(n/2) + \frac{n}{2}\log(n/2) - 0.9\,\frac{n}{2} + \frac{22}{3}n + \frac{5}{3}n,
\]
resolving to $T(n) \le n\log n + 16.1n$.
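To see how the constant arises, one can unroll the recurrence over the halving levels (our derivation, slightly coarser than necessary): the linear costs per call at size $n$ sum to $\bigl(\frac{22}{3} + \frac{5}{3} - 0.45\bigr)n = 8.55n$, and with $\sum_{i\ge1} i/2^i = 2$,

\[
T(n) \le \sum_{i\ge 1}\left(\frac{n}{2^i}\log\frac{n}{2^i} + 8.55\,\frac{n}{2^{i-1}}\right) = (n\log n - 2n) + 17.1n \le n\log n + 16.1n.
\]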

For our implementation we also use a slight improvement over the basic median-of-medians algorithm by using the approach of adaptation, which was first introduced in [21] for Quickselect and recently applied to the median-of-medians algorithm [1]. More specifically, whenever in a recursive call the $k$-th element is searched with $k$ far apart from $n/2$ (more precisely for $k \le 0.3n$ or $k \ge 0.7n$), we do not choose the median of the medians as pivot but an element proportional to $k$ (while still guaranteeing that at least $0.3n$ elements are discarded for the next recursive call as in [4]).

Notice that in the presence of duplicate elements, we need to apply three-way partitioning to guarantee that worst-case number of comparisons (that is, elements equal to the pivot are placed in the middle and included neither in the recursive call nor in Mergesort). With the usual partitioning (as in our experiments), we obtain a worse bound for the worst case since it might happen that the smaller part of the array has to be sorted with Mergesort.

In order to achieve the guarantee for the worst case together with the efficiency of the Median-of-3 pivot sampling, we can combine the two approaches using a trick similar to Introsort [23]: we fix some small $\delta > 0$. Whenever the pivot is contained in the interval $[\delta n, (1-\delta)n]$, the next pivot is selected as Median-of-3, otherwise according to the worst-case efficient procedure described above – for the following pivots we switch back to Median-of-3. When choosing $\delta$ not too small, the worst-case number of comparisons will be only approximately $2n$ more than that of MoMQuickMergesort (because in the worst case, before every partitioning step according to MoMQuickMergesort, there will be one partitioning step with Median-of-3 using $n$ comparisons), while the average is almost as for CleverQuickMergesort. We propose $\delta = 1/16$. We call this algorithm HybridQuickMergesort (HQMS).
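A sketch of this policy in the style of Figure 2 (our code, reusing out from the listing; partition_mo3 and partition_mom are assumed names for a Median-of-3 partitioner and one driven by the pivot selection described above):

const double delta = 1.0 / 16.0;

void hybrid_sort(iter begin, iter end) {
    bool median_of_3 = true;
    while (begin < end) {
        index n = end - begin;
        iter b = median_of_3 ? partition_mo3(begin, end)   // fast on average
                             : partition_mom(begin, end);  // worst-case safe
        // A pivot inside [delta*n, (1-delta)*n] keeps the fast strategy;
        // after a bad split the next pivot comes from the safe procedure.
        index rank = b - begin;
        median_of_3 = (rank >= delta * n && rank <= (1 - delta) * n);
        if (b < begin + n / 2) { out(begin, b, b + 1); begin = b + 1; }
        else                   { out(b + 1, end, begin); end = b; }
    }
}

Because the MoM-based pivot always lands in $[n/3, 2n/3] \subseteq [\delta n, (1-\delta)n]$ for $\delta = 1/16$, the policy automatically switches back to Median-of-3 after each safe step.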

5 Experiments

The collection of sorting algorithms we considered for comparison is much larger than the one we present here, but the bar of being competitive wrt. state-of-the-art library implementations in C++ and Java on basic data types is surprisingly high. For example, all Heapsort variants we are aware of fail this test; we checked refined implementations of Binary Heapsort [12, 28], Bottom-Up Heapsort [26], MDR Heapsort [25], QuickHeapsort [7], and Weak-Heapsort [8]. Some of these algorithms even use extra space. Timsort (by Tim Peters; used in Java for sorting non-elementary object sequences) was less performant on simple data types.

There are fast algorithms that exploit the set of keys to be sorted (like CountingSort or Radixsort), but we aim at a general algorithm.


We also experimented with Sanders and Winkel's SuperScalarSampleSort, which has a particular memory profile [24, 10]. The main reason not to include the results was that it allocates substantial amounts of space for the elements and, thus, is not internal. We found that it is fast on random data, but not as good on presorted inputs.

One remaining competitor was (Bottom-Up) Mergesort (std::stable_sort) in the C++ STL library, which on some inputs shows a very good performance. As this is an external algorithm, we chose a tuned version of in-place Mergesort (stl::inplace_stable_sort simply was too slow) called InSituMergesort (ISMS) [11] for our experiments.

According to [2, 27, 3], for the DualPivotQuicksort algorithm variants, there was no clear-cut winner, but the experiments suggested that the standard ones had a slight edge. For DualPivotQuicksort we translated Oracle's most recent (Java) version (the algorithm selects the 2nd and 4th element of the inner five pivot candidates of a split-into-7). As the full sorting algorithm is lengthy and contains many checks for special input types (with different code fragments and parameter settings for sorting arrays of bytes, shorts, ints, floats, doubles, etc.) we extracted the integer part.

TunedQuicksort [11] is an engineered implementation of CleverQuicksort, probably unnoticed by the public since it is contained in a paper on tuning Mergesort and studying branch mispredictions as in [16]. It applies Lomuto's uni-directional Median-of-3 partitioner [6], which works well for permutations and a limited number of duplicates in the element set. As with Introsort, the algorithm stops the recursion if fewer than a fixed number of elements are reached (16 in our case). These elements are then sorted together, calling STL's Insertionsort algorithm. The implementation utilizes a stack to avoid recursion, which is responsible for tracking the remaining array intervals to be processed. We dropped TunedQuicksort from the experiments as it failed on presorted data and data with duplicates, but we used parts of its efficient stack-based implementation. This advanced CleverQuicksort implementation and CleverQuickMergesort (QMS) are the two extremes, while CleverQuickMergeCleverQuicksort (QuickMergeCleverQuicksort with a modified TunedQuicksort implementation as recursion stopper at √n elements), QMQS for short, is our tested intermediate.

QMS uses hard-coded base cases for $n < 10$, while the recursion stopper in QMQS does not. Depending on the size of the arrays, the displayed numbers are averages over multiple runs (repeats)³. The arrays we sorted were random permutations of $\{1, \ldots, n\}$. The number of element comparisons was measured by increasing a counter for every comparison.

For CPU time experiments we used vectors of integers, as this is often most challenging for algorithms with a lower number of comparisons. All algorithms sort the same arrays. As counting the number of comparisons affects the speed of the sorting algorithms, for further measurements (e.g., moves and comparisons) we started another set of experiments.

We also made element comparisons more expensive (we experimented with logarithms, and with elements as vectors and records). Thanks to the lower number of comparisons, the results were then even better.

As a first empirical observation, for Introsort (Std) the number of element comparisons divided by $n\log n$ is larger than 1.18, due to higher lower-order terms. As theoretically shown, for QMS the number of element comparisons divided by $n\log n$ was below 1.

For our QuickMergesort implementations we used the block partitioner from [10], which improves the performance considerably over the standard Hoare partitioner. Figs. 3–4 show

³ Experiments were run on one core of an Intel Core i5-2520M CPU (2.50GHz, 3MB Cache) with 16GB RAM; operating system: Ubuntu Linux 64bit; compiler: GNU's g++ (4.8.2); optimized with flags -O3 -march=native -funroll-loops.


Figure 3 Time (left) and element comparisons (right) for sorting random integer data. [Plots omitted: running time in ns per element and (comparisons − N log N)/N versus the number of elements N, for Std, Java, ISMS, QMS, QMQS, HQMS, and MOMQMS.]

the results when sorting random integer data (with QMQS: CleverQuickMergeCleverQuicksort, QMS: CleverQuickMergesort, MOMQMS: worst-case-efficient QuickMergesort, HQMS: hybrid of worst-case- and average-case-efficient QuickMergesort, ISMS: InSituMergesort, Java: DualPivotQuicksort, and Std: std::sort). Times displayed are the total running times divided by the number of elements (in ns). We see that the QuickMergesort variants are fast. We also measured element moves (assignments of input data elements; e.g., a swap of two elements is counted as three moves).

Figure 4 Time for sorting random data with a comparator that applies the logarithm to the integer elements (left), and number of element moves (right). [Plots omitted: running time in ns per element and (moves − N log N)/N versus the number of elements N, for the same algorithms as in Figure 3.]

6 Conclusion

Sorting $n$ elements is one of the most frequently studied subjects in computer science, with applications in almost all areas in which efficient programs run.

With variants of QuickMergesort, we have contributed sorting algorithms which are able to run faster than Introsort and DualPivotQuicksort even for elementary data. Compared to Introsort, we reduced the leading term $\alpha$ in $\alpha\cdot n\log n + O(n)$ in the average number of comparisons from $\alpha \approx 1.18$ via $1.09$ to finally reaching 1. The algorithms are simple but effective: a) Median-of-3 pivot selection (as opposed to using a sample of √n elements), b) faster sorting for smaller element sets. Both modifications show empirical impact and are analyzed


theoretically to provide upper bounds on the average number of comparisons. We discussed options to warrant a constant-factor-optimal worst case.

In the theoretical part of our work we concentrated on average-case analyses, as we strongly believe that this reflects realistic behavior more closely than worst-case analyses. With very low overhead, QuickMergesort can be implemented in a way that it becomes constant-factor optimal in the worst case, too. We chose efficient deterministic median-of-medians strategies that are also of interest for further considerations.

For future research we propose the integration of QuickMergesort with multi-way merging, envisioning to scale the algorithm beyond main memory capacity, and effective parallelizations.

References

1 Andrei Alexandrescu. Fast deterministic selection. In 16th International Symposium on Experimental Algorithms, SEA 2017, June 21-23, 2017, London, UK, pages 24:1–24:19, 2017.
2 Martin Aumüller and Martin Dietzfelbinger. Optimal partitioning for dual pivot quicksort (extended abstract). In ICALP, pages 33–44, 2013.
3 Martin Aumüller, Martin Dietzfelbinger, and Pascal Klaue. How good is multi-pivot quicksort? CoRR, abs/1510.04676, 2015.
4 Manuel Blum, Robert W. Floyd, Vaughan R. Pratt, Ronald L. Rivest, and Robert E. Tarjan. Time bounds for selection. Journal of Computer and System Sciences, 7(4):448–461, 1973.
5 D. Cantone and G. Cinotti. QuickHeapsort, an efficient mix of classical sorting algorithms. Theoretical Computer Science, 285(1):25–42, 2002.
6 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. The MIT Press, 3rd edition, 2009.
7 Volker Diekert and Armin Weiß. QuickHeapsort: Modifications and improved analysis. In CSR, pages 24–35, 2013.
8 Ronald D. Dutton. Weak-heap sort. BIT, 33(3):372–381, 1993.
9 Stefan Edelkamp and Armin Weiß. QuickXsort: Efficient sorting with n log n − 1.399n + o(n) comparisons on average. In CSR, pages 139–152, 2014.
10 Stefan Edelkamp and Armin Weiß. BlockQuicksort: Avoiding branch mispredictions in Quicksort. In Piotr Sankowski and Christos D. Zaroliagis, editors, 24th Annual European Symposium on Algorithms, ESA 2016, August 22-24, 2016, Aarhus, Denmark, volume 57 of LIPIcs, pages 38:1–38:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2016.
11 Amr Elmasry, Jyrki Katajainen, and Max Stenmark. Branch mispredictions don't affect mergesort. In SEA, pages 160–171, 2012.
12 Robert W. Floyd. Algorithm 245: Treesort 3. Communications of the ACM, 7(12):701, 1964.
13 Lester R. Ford, Jr. and Selmer M. Johnson. A tournament problem. The American Mathematical Monthly, 66(5):387–389, 1959.
14 C. A. R. Hoare. Algorithm 65: Find. Communications of the ACM, 4(7):321–322, July 1961.
15 Charles A. R. Hoare. Quicksort. The Computer Journal, 5(1):10–16, 1962.
16 Kanela Kaligosi and Peter Sanders. How branch mispredictions affect quicksort. In ESA, pages 780–791, 2006.
17 Jyrki Katajainen. The ultimate heapsort. In Computing: The Fourth Australasian Theory Symposium (CATS), pages 87–96, 1998.
18 Jyrki Katajainen, Tomi Pasanen, and Jukka Teuhola. Practical in-place mergesort. Nordic Journal of Computing, 3(1):27–40, 1996.
19 Donald E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Addison Wesley Longman, 2nd edition, 1998.


20 Shrinu Kushagra, Alejandro López-Ortiz, Aurick Qiao, and J. Ian Munro. Multi-pivot quicksort: Theory and experiments. In ALENEX, pages 47–60, 2014.
21 Conrado Martínez, Daniel Panario, and Alfredo Viola. Adaptive sampling strategies for quickselects. ACM Transactions on Algorithms, 6(3):53:1–53:45, 2010.
22 Conrado Martínez and Salvador Roura. Optimal sampling strategies in Quicksort and Quickselect. SIAM Journal on Computing, 31(3):683–705, 2001.
23 David R. Musser. Introspective sorting and selection algorithms. Software—Practice and Experience, 27(8):983–993, 1997.
24 Peter Sanders and Sebastian Winkel. Super scalar sample sort. In ESA, pages 784–796, 2004.
25 Ingo Wegener. The worst case complexity of McDiarmid and Reed's variant of bottom-up heapsort is less than n log n + 1.1n. In STACS, pages 137–147, 1991.
26 Ingo Wegener. Bottom-up-Heapsort, a new variant of Heapsort beating, on an average, Quicksort (if n is not very small). Theoretical Computer Science, 118:81–98, 1993.
27 Sebastian Wild, Markus E. Nebel, and Ralph Neininger. Average case and distributional analysis of dual-pivot quicksort. ACM Transactions on Algorithms, 11(3):22:1–22:42, 2015.
28 J. W. J. Williams. Algorithm 232: Heapsort. Communications of the ACM, 7(6):347–348, 1964.
