
QuickXsort – A Fast Sorting Scheme in Theory and Practice∗

Stefan Edelkamp · Armin Weiß · Sebastian Wild

November 6, 2018

Abstract. QuickXsort is a highly efficient in-place sequential sorting scheme that mixes Hoare's Quicksort algorithm with X, where X can be chosen from a wide range of other known sorting algorithms, like Heapsort, Insertionsort and Mergesort. Its major advantage is that QuickXsort can be in-place even if X is not. In this work we provide general transfer theorems expressing the number of comparisons of QuickXsort in terms of the number of comparisons of X. More specifically, if pivots are chosen as medians of (not too fast) growing size samples, the average numbers of comparisons of QuickXsort and X differ only by o(n)-terms. For median-of-k pivot selection for some constant k, the difference is a linear term whose coefficient we compute precisely. For instance, median-of-three QuickMergesort uses at most n lg n − 0.8358n + O(log n) comparisons.

Furthermore, we examine the possibility of sorting base cases with some other algorithm using even fewer comparisons. By doing so, the average-case number of comparisons can be reduced down to n lg n − 1.4106n + o(n) for a remaining gap of only 0.0321n comparisons to the known lower bound (while using only O(log n) additional space and O(n log n) time overall).

Implementations of these sorting strategies show that the algorithms challenge

well-established library implementations like Musser’s Introsort.

Contents

1. Introduction
   1.1. Related work
   1.2. Contributions
2. QuickXsort
   2.1. QuickMergesort
   2.2. QuickHeapsort
3. Preliminaries
   3.1. Hölder continuity
   3.2. Concentration results
   3.3. Beta distribution
   3.4. Beta-binomial distribution
   3.5. Continuous Master Theorem
   3.6. Average costs of Mergesort
4. The QuickXsort recurrence
   4.1. Prerequisites
   4.2. The recurrence for the expected costs
   4.3. Distribution of subproblem sizes
5. Analysis for growing sample sizes
   5.1. Expected costs
   5.2. Large-deviation bounds
   5.3. Worst-case guarantees
6. Analysis for fixed sample sizes
   6.1. Transfer theorem for fixed k
   6.2. Approximation by beta integrals
   6.3. The toll function
   6.4. The shape function
   6.5. Which case of the CMT?
   6.6. Error bound
7. Analysis of QuickMergesort and QuickHeapsort
   7.1. QuickMergesort
   7.2. QuickHeapsort
8. Variance of QuickXsort
   8.1. Transfer theorem for variance
   8.2. Variance for methods with optimal leading term
   8.3. Variance in Mergesort
   8.4. Variance in QuickMergesort
9. QuickMergesort with base cases
   9.1. Insertionsort
   9.2. MergeInsertion
   9.3. Combination of (1,2)-Insertion and MergeInsertion
10. Experiments
   10.1. Comparison counts
   10.2. Running time experiments
11. Conclusion
A. Notation
   A.1. Generic mathematics
   A.2. Stochastics-related notation
   A.3. Specific notation for algorithms and analysis

∗ Parts of this article have been presented (in preliminary form) at the International Computer Science Symposium in Russia (CSR) 2014 [12] and at the International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA) 2018 [50].

arXiv:1811.01259v1 [cs.DS] 3 Nov 2018

1. Introduction

Sorting a sequence of n elements remains one of the most frequent tasks carried out by computers. In the comparison model, the well-known lower bound for sorting n distinct elements says that using fewer than lg(n!) = n lg n − lg e · n ± O(log n) ≈ n lg n − 1.4427n + O(log n)¹ comparisons is not possible, both in the worst case and in the average case. The average case refers to a uniform distribution of all input permutations (random-permutation model).

In many practical applications of sorting, element comparisons have a similar running-

time cost as other operations (

e.g.

, element moves or control-ﬂow logic). Then, a method has

to balance costs to be overall eﬃcient. This explains why Quicksort is generally considered

the fastest general purpose sorting method, despite the fact that its number of comparisons

is slightly higher than for other methods.

There are many other situations, however, where comparisons do have significant costs, in particular, when complex objects are sorted w.r.t. an order relation defined by a custom procedure. We are therefore interested in algorithms whose comparison count is optimal up to lower order terms, i.e., sorting methods that use n lg n + o(n log n) or better n lg n + O(n) comparisons; moreover, we are interested in bringing the coefficient of the linear term as close to the optimal −1.4427 as possible (since the linear term is not negligible for realistic input sizes). Our focus lies on practical methods whose running time is competitive to standard sorting methods even when comparisons are cheap. As a consequence, expected (rather than worst-case) performance is our main concern.

We propose QuickXsort as a general template for practical, comparison-efficient internal² sorting methods. QuickXsort uses the recursive scheme of ordinary Quicksort, but instead of doing two recursive calls after partitioning, first one of the segments is sorted by some other sorting method "X". Only the second segment is recursively sorted by QuickXsort. The key insight is that X can use the second segment as a temporary buffer area; so X can be an external method, but the resulting QuickXsort is still an internal method. QuickXsort only requires O(1) words of extra space, even when X itself requires a linear-size buffer.

¹ We write lg for log₂, but use log to denote an otherwise unspecified logarithm in the O-notation.

² Throughout the text, we avoid the (in our context somewhat ambiguous) terms in-place or in-situ. We instead call an algorithm internal if it needs at most O(log n) words of space (in addition to the array to be sorted). In particular, Quicksort is an internal algorithm whereas standard Mergesort is not (hence called external) since it uses a linear amount of buffer space for merges.

We discuss a few concrete candidates for X to illustrate the versatility of QuickXsort.

We provide a precise analysis of QuickXsort in the form of “transfer theorems”: we express

the costs of QuickXsort in terms of the costs of X, where generally the use of QuickXsort

adds a certain overhead to the lower order terms of the comparison counts. Unlike previous

analyses for special cases, our results give tight bounds.

A particularly promising (and arguably the most natural) candidate for X is Mergesort. Mergesort is both fast in practice and comparison-optimal up to lower order terms; but the linear extra-space requirement can make its usage impossible. With QuickMergesort we describe an internal sorting algorithm that is competitive in terms of the number of comparisons and running time.

Outline.

The remainder of this section surveys previous work and summarizes the contri-

butions of this article. We then describe QuickXsort in detail in Section 2. In Section 3,

we introduce mathematical notation and recall known results that are used in our analysis

of QuickXsort. In Section 4, we postulate the general recurrence for QuickXsort and

describe the distribution of subproblem sizes. Section 5 contains transfer theorems for

growing size samples and Section 6 for constant size samples. In Section 7, we apply these

transfer theorems to QuickMergesort and QuickHeapsort and discuss the results. Section 8 contains a transfer theorem for the variance of QuickXsort. Finally, in Section 10

we present our experimental results and conclude in Section 11 with some open questions.

1.1. Related work

We pinpoint selected relevant works from the vast literature on sorting; our overview cannot

be comprehensive, though.

Comparison-efficient sorting. There is a wide range of sorting algorithms achieving the bound of n lg n + O(n) comparisons. The most prominent is Mergesort, which additionally comes with a small coefficient in the linear term. Unfortunately, Mergesort requires linear extra space. Concerning space, UltimateHeapsort [28] does better, however, at the cost of a quite large linear term. Other algorithms provide even smaller linear terms than Mergesort. Table 1 lists some milestones in the race for reducing the coefficient in the linear term. Despite the fundamental nature of the problem, little improvement has been made (w.r.t. the worst-case comparison count) over Ford and Johnson's MergeInsertion algorithm [17] – which was published in 1959! MergeInsertion requires n lg n − 1.329n + O(log n) comparisons in the worst case [31].

MergeInsertion has a severe drawback that renders the algorithm completely impractical, though: in a naive implementation, the number of element moves is quadratic in n.



Table 1: Milestones of comparison-efficient sorting methods. The methods use (at most) n lg n + bn + o(n) comparisons for the given b in the worst (bwc) and/or average case (bac). Space is given in machine words (unless indicated otherwise).

Algorithm                      bac         bac empirical     bwc       Space      Time
Lower bound                    −1.44                         −1.44     O(1)       O(n log n)
Mergesort [31]                 −1.24                         −0.91     O(n)       O(n log n)
Insertionsort [31]             −1.38 #                       −0.91     O(1)       O(n²)
MergeInsertion [31]            −1.3999 #   [−1.43, −1.41]    −1.32     O(n)       O(n²)
MI+IS [27]                     −1.4106                                 O(n)       O(n²)*
BottomUpHeapsort [48]          ?           [0.35, 0.39]      ω(1)      O(1)       O(n log n)
WeakHeapsort [7, 9]            ?           [−0.46, −0.42]    0.09      O(n) bits  O(n log n)
RelaxedWeakHeapsort [8]        −0.91       −0.91             −0.91     O(n)       O(n log n)
InPlaceMergesort [41]          ?                             −1.32     O(1)       O(n log n)
QuickHeapsort [3]              −0.03 ≤     ≈ 0.20            ω(1)      O(1)       O(n log n)
Improved QuickHeapsort [4]     −0.99 ≤     ≈ −1.24           ω(1)      O(n) bits  O(n log n)
UltimateHeapsort [28]          O(1)        ≈ 6 [4]           O(1)      O(1)       O(n log n)
QuickMergesort                 −1.24 #     [−1.29, −1.27]    −0.32 †   O(1)       O(n log n)
QuickMergesort (IS)            −1.38 #⊥                      −0.32 †   O(log n)   O(n log n)
QuickMergesort (MI)            −1.3999 #⊥  [−1.41, −1.40]    −0.32 †   O(log n)   O(n log n)
QuickMergesort (MI+IS)         −1.4106 #⊥                    −0.32 †   O(log n)   O(n log n)

# in this paper
≤ only upper bound proven in cited source
† assuming InPlaceMergesort as a worst-case stopper; with median-of-medians fallback pivot selection: O(1), without worst-case stopper: ω(1)
⊥ using given method for small subproblems; MI = MergeInsertion, IS = Insertionsort
* using a rope data structure and allowing additional O(n) space: O(n log² n)

Its running time can be improved to O(n log² n) by using a rope data structure [2] (or a similar data structure which allows random access and insertions in O(log n) time) for the insertion of elements (which, of course, induces additional constant-factor overhead). The same is true for Insertionsort, which, unless explicitly indicated otherwise, refers to the algorithm that inserts elements successively into a sorted prefix, finding the insertion position by binary search – as opposed to the linear/sequential search in StraightInsertionsort. Note that MergeInsertion or Insertionsort can still be used as comparison-efficient subroutines to sort base cases for Mergesort (and QuickMergesort) of size O(log n) without affecting the overall running-time complexity of O(n log n).

Reinhardt [41] used this trick (and others) to design an internal Mergesort variant that needs n lg n − 1.329n ± O(log n) comparisons in the worst case. Unfortunately, implementations of this InPlaceMergesort algorithm have not been documented. Katajainen et al.'s [29, 19, 15] work inspired by Reinhardt is practical, but the number of comparisons is larger.

Improvements over MergeInsertion have been obtained for the average number of comparisons. A combination of MergeInsertion with a variant of Insertionsort (inserting two elements simultaneously) by Iwama and Teruyama uses ≤ n lg n − 1.41064n comparisons on average [27]; as for MergeInsertion, the overall complexity remains quadratic (resp. Θ(n log² n)), though. Notice that the analysis in [27] is based on our bound on MergeInsertion in Section 9.2.


Previous work on QuickXsort. Cantone and Cincotti [3] were the first to explicitly name the mixture of Quicksort with another sorting method; they proposed QuickHeapsort. However, the concept of QuickXsort (without calling it by that name) was first used in UltimateHeapsort by Katajainen [28]. Both versions use an external Heapsort variant in which a heap containing m elements is not stored compactly in the first m cells of the array, but may be spread out over the whole array. This allows restoring the heap property with ⌈lg n⌉ comparisons after extracting some element, by introducing a new gap (we can think of it as an element of infinite weight) and letting it sink down to the bottom of the heap. The extracted elements are stored in an output buffer.

In UltimateHeapsort, we first find the exact median of the array (using a linear-time algorithm) and then partition the array into subarrays of equal size; this ensures that with the above external Heapsort variant, the first half of the array (on which the heap is built) does not contain gaps (Katajainen calls this a two-level heap); the other half of the array is used as the output buffer. QuickHeapsort avoids the significant additional effort for exact median computations by choosing the pivot as the median of some smaller sample. In our terminology, it applies QuickXsort where X is external Heapsort. UltimateHeapsort is inferior to QuickHeapsort in terms of the average-case number of comparisons, although, unlike QuickHeapsort, it allows an n lg n + O(n) bound for the worst-case number of comparisons. Diekert and Weiß [4] analyzed QuickHeapsort more thoroughly and described some improvements requiring less than n lg n − 0.99n + o(n) comparisons on average (choosing the pivot as the median of √n elements). However, both the original analysis of Cantone and Cincotti and the improved analysis could not give tight bounds for the average case of median-of-k QuickHeapsort.

In [15], Elmasry, Katajainen and Stenmark proposed InSituMergesort, following the same principle as UltimateHeapsort, but with Mergesort replacing ExternalHeapsort. Also, InSituMergesort uses only an expected-linear-time algorithm for the median computation.

In the conference paper [12], the first and second authors introduced the name QuickXsort and first considered QuickMergesort as an application (including weaker forms of the results in Section 5 and Section 9 without proofs). In [50], the third author analyzed QuickMergesort with constant-size pivot sampling (see Section 6). A weaker upper bound for the median-of-3 case was also given by the first two authors in the preprint [14]. The present work is a full version of [12] and [50]; it unifies and strengthens these results (including all proofs) and complements the theoretical findings with extensive running-time experiments.

1.2. Contributions

In this work, we introduce QuickXsort as a general template for transforming an external

algorithm into an internal algorithm. As examples we consider QuickHeapsort and

QuickMergesort. For the readers convenience, we collect our results here (with references

to the corresponding sections).

• If X is some sorting algorithm requiring x(n) = n lg n + bn ± o(n) comparisons in expectation and k(n) ∈ ω(1) ∩ o(n), then median-of-k(n) QuickXsort needs x(n) ± o(n) comparisons in the average case (Theorem 5.1).


• Under reasonable assumptions, sample sizes of √n are optimal among all polynomial-size sample sizes.

• The probability that median-of-√n QuickXsort needs more than x_wc(n) + 6n comparisons decreases exponentially in ⁴√n (Proposition 5.5).

• We introduce median-of-medians fallback pivot selection (a trick similar to Introsort [39]), which guarantees n lg n + O(n) comparisons in the worst case while altering the average case only by o(n)-terms (Theorem 5.7).

• Let k be fixed and let X be a sorting method that needs a buffer of ⌊αn⌋ elements for some constant α ∈ [0, 1] to sort n elements and requires on average x(n) = n lg n + bn ± o(n) comparisons to do so. Then median-of-k QuickXsort needs

\[
c(n) \;=\; n \lg n + \bigl(P(k, \alpha) + b\bigr)\cdot n \;\pm\; o(n)
\]

comparisons on average, where P(k, α) is some constant depending on k and α (Theorem 6.1). We have P = 0.5070 for median-of-3 with α = 1 (QuickHeapsort or QuickMergesort) and P = 0.4050 for median-of-3 with α = 1/2 (QuickMergesort).

• We compute the standard deviation of the number of comparisons of median-of-k QuickMergesort for some small values of k. For k = 3 and α = 1/2, the standard deviation is 0.3268n (Section 8).

• When sorting small subarrays of size O(log n) in QuickMergesort with some sorting algorithm Z using z(n) = n lg n + (b ± ε)n + o(n) comparisons on average and other operations taking at most O(n²) time, QuickMergesort needs z(n) + o(n) comparisons on average (Corollary 9.2). In order to apply this result, we prove that
  – (Binary) Insertionsort needs n lg n − (1.3863 ± 0.005)n + o(n) comparisons on average (Proposition 9.3).
  – (A simplified version of) MergeInsertion [18] needs at most n lg n − 1.3999n + o(n) comparisons on average (Theorem 9.5).
Moreover, with Iwama and Teruyama's algorithm [27] this can be improved slightly to n lg n − 1.4106n + o(n) comparisons (Corollary 9.9).

•

We run experiments conﬁrming our theoretical (and heuristic) estimates for the average

number of comparisons of QuickMergesort and its standard deviation and verifying

that the sublinear terms are indeed negligible (Section 10).

•

From running-time studies comparing QuickMergesort with various other sorting

methods, we conclude that our QuickMergesort implementation is among the fastest

internal general-purpose sorting methods for both the regime of cheap and expensive

comparisons (Section 10).

To simplify the arguments, in all our analyses we assume that all elements in the input

are distinct. This is no severe restriction since duplicate elements can be handled well using

fat-pivot partitioning (which excludes elements equal to the pivot from recursive calls and

calls to X).


2. QuickXsort

In this section we give a more precise description of QuickXsort. Let X be a sorting method that requires buffer space for storing at most ⌊αn⌋ elements (for α ∈ [0, 1]) to sort n elements. The buffer may only be accessed by swaps so that once X has finished its work, the buffer contains the same elements as before, albeit (in general) in a different order than before.

Figure 1: Schematic steps of QuickXsort. The pictures show a sequence, where the vertical height corresponds to key values. We start with an unsorted sequence (top), and partition it around a pivot value (second from top). Then one part is sorted by X (second from bottom) using the other segment as buffer area (grey shaded area). Note that this in general permutes the elements there. Sorting is completed by applying the same procedure recursively to the buffer (bottom).

QuickXsort now works as follows: First, we choose a pivot element; typically we use the median of a random sample of the input. Next, we partition the array according to this pivot element, i.e., we rearrange the array so that all elements left of the pivot are less than or equal to, and all elements on the right are greater than or equal to, the pivot element. This results in two contiguous segments of J1 resp. J2 elements; we exclude the pivot here (since it will have reached its final position), so J1 + J2 = n − 1. Note that the (one-based) rank R of the pivot is random, and so are the segment sizes J1 and J2; we have R = J1 + 1 for the rank.

We then sort one segment by X using the other segment as a buffer. To guarantee a sufficiently large buffer for X when it sorts J_r elements (r = 1 or 2), we must make sure that J_{3−r} ≥ αJ_r. In case both segments could be sorted by X, we use the larger of the two. After one part of the array has been sorted with X, we move the pivot element to its correct position (right after/before the already sorted part) and recurse on the other segment of the array. The process is illustrated in Figure 1.
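To make this control flow concrete, here is a minimal C++ sketch of the loop for the case α = 1 (our illustration, not the paper's implementation): sortWithX(lo, hi, buf) is a hypothetical callback standing in for the method X, sorting [lo, hi) while using the cells starting at buf as a swap-only buffer, and the pivot is picked naively rather than as a median of a sample.

    #include <algorithm>

    // Sketch of the QuickXsort driver for alpha = 1: X sorts the smaller
    // segment, using the larger one as buffer; the larger segment is then
    // handled in the next loop iteration ("recursively" in the description).
    template <typename It, typename SortX>
    void quickXsort(It lo, It hi, SortX sortWithX) {
        while (hi - lo > 1) {
            std::iter_swap(lo + (hi - lo) / 2, hi - 1);     // naive pivot choice
            It mid = std::partition(lo, hi - 1,
                         [&](const auto& x) { return x < *(hi - 1); });
            std::iter_swap(mid, hi - 1);        // pivot lands at its final position
            // Segments: [lo, mid) and [mid+1, hi), sizes J1 and J2, J1 + J2 = n - 1.
            if (mid - lo <= hi - (mid + 1)) {   // J1 <= J2: X sorts the left segment
                sortWithX(lo, mid, mid + 1);    // right segment serves as buffer
                lo = mid + 1;                   // continue on the larger right part
            } else {                            // J1 > J2: X sorts the right segment
                sortWithX(mid + 1, hi, lo);
                hi = mid;
            }
        }
    }

Note that the loop is tail-recursive by construction, so QuickXsort itself only needs O(1) words on top of whatever X uses.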

The main advantage of this procedure is that the part of the array that is not currently being sorted can be used as temporary buffer area for algorithm X. This yields fast internal variants for various external sorting algorithms such as Mergesort. We have to make sure, however, that the contents of the buffer are not lost. A simple sufficient condition is to require that X maintain a permutation of the elements in the input and buffer: whenever a data element should be moved to the external storage, it is swapped with the data element occupying the respective position in the buffer area. For Mergesort, using swaps in the merge (see Section 2.1) is sufficient. For other methods, we need further modifications.

Remark 2.1 (Avoiding unnecessary copying):

For some X, it is convenient to have the

sorted sequence reside in the buﬀer area instead of the input area. We can avoid unnecessary

swaps for such X by partitioning “in reverse order”,

i.e.

, so that large elements are left of

the pivot and small elements right of the pivot.


Pivot sampling. It is a standard strategy for Quicksort to choose pivots as the median of some sample. This optimization is also effective for QuickXsort and we will study its effect in detail. We assume that in each recursive call, we choose a sample of k elements, where k = 2t + 1, t ∈ N₀, is an odd number. The sample can either be selected deterministically (e.g. some fixed positions) or at random. Usually, for the analysis we do not need random selection; only if the algorithm X does not preserve the randomness of the buffer elements do we have to assume randomness (see Section 4). However, notice that in any case random selection might be beneficial as it protects against a potential adversary who provides a worst-case input permutation.

Unlike for Quicksort, in QuickXsort pivot selection contributes only a minor term to the overall running time (at least in the usual case that k ≪ n). The reason is that QuickXsort only makes a logarithmic number of partitioning rounds in expectation (while Quicksort always makes a linear number of partitioning rounds) since in expectation after each partitioning round a constant fraction of the input is excluded from further consideration (after sorting it with X). Therefore, we do not care about the details of how pivots are selected, but simply assume that selecting the median of k elements needs s(k) = Θ(k) comparisons on average (e.g. using Quickselect [24]).
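For illustration, a short C++ sketch of such a sampling step (our own, hedged example: it samples positions with replacement for brevity, and uses std::nth_element, which runs in expected linear time like Quickselect):

    #include <algorithm>
    #include <random>
    #include <vector>

    // Pick a pivot as the median of a random sample of k = 2t+1 elements,
    // using s(k) = Theta(k) comparisons in expectation via std::nth_element.
    template <typename T>
    T samplePivot(const std::vector<T>& A, int k, std::mt19937& rng) {
        std::uniform_int_distribution<size_t> idx(0, A.size() - 1);
        std::vector<T> sample(k);
        for (int i = 0; i < k; ++i)
            sample[i] = A[idx(rng)];            // random positions (with replacement)
        std::nth_element(sample.begin(), sample.begin() + k / 2, sample.end());
        return sample[k / 2];                   // the sample median
    }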

We consider both the case where k is a fixed constant and where k = k(n) is an increasing function of the (sub)problem size. Previous results in [4, 35] for Quicksort suggest that sample sizes k(n) = Θ(√n) are likely to be optimal asymptotically, but most of the relative savings for the expected case are already realized for k ≤ 10. It is quite natural to expect similar behavior in QuickXsort, and it will be one goal of this article to precisely quantify these statements.

2.1. QuickMergesort

A natural candidate for X is Mergesort: it is comparison-optimal up to the linear term (and quite close to optimal in the linear term), and needs a Θ(n)-element-size buffer for practical implementations of merging.³

Figure 2: Usual merging procedure where one of the two runs fits into the buffer. (Step 1: swap the smaller run into the buffer; Step 2: merge.)

³ Merging can be done in place using more advanced tricks (see, e.g., [19, 34]), but those tend not to be competitive in terms of running time with other sorting methods. By changing the global structure, a "pure" internal Mergesort variant [29] can be achieved using part of the input as a buffer (as in QuickMergesort) at the expense of occasionally having to merge runs of very different lengths.


Algorithm 1: Simple merging procedure that uses the buffer only by swaps. We move the first run A[ℓ..m−1] into the buffer B[b..b+n1−1] and then merge it with the second run A[m..r] (still in the original array) into the empty slot left by the first run. By the time this first half is filled, we either have consumed enough of the second run to have space to grow the merged result, or the merging was trivial, i.e., all elements in the first run were smaller.

SimpleMergeBySwaps(A[ℓ..r], m, B[b..e])
// Merges sorted runs A[ℓ..m−1] and A[m..r] in place into A[ℓ..r], using B[b..e] as scratch space.
// Assumes n1 ≤ n2 and n1 ≤ e − b + 1.
    n1 := m − ℓ;  n2 := r − m + 1
    for i = 0, . . . , n1 − 1          // move the first run into the buffer
        Swap(A[ℓ + i], B[b + i])
    end for
    i1 := b;  i2 := m;  o := ℓ         // read pointers (buffer, second run), write pointer
    while i1 < b + n1 and i2 ≤ r
        if B[i1] ≤ A[i2]
            Swap(A[o], B[i1]);  o := o + 1;  i1 := i1 + 1
        else
            Swap(A[o], A[i2]);  o := o + 1;  i2 := i2 + 1
        end if
    end while
    while i1 < b + n1                  // flush the rest of the buffered run
        Swap(A[o], B[i1]);  o := o + 1;  i1 := i1 + 1
    end while

Simple swap-based merge. To be usable in QuickXsort, we use a swap-based merge procedure as given in Algorithm 1. Note that it suffices to move the smaller of the two runs to a buffer (see Figure 2); we use a symmetric version of Algorithm 1 when the second run is shorter. Using classical top-down or bottom-up Mergesort as described in any algorithms textbook (e.g. [46]), we thus get along with α = 1/2.
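As a concrete illustration of the α = 1/2 claim (our sketch, not the paper's code), consider a top-down Mergesort over A[lo..hi) that only ever buffers the shorter run; swapMerge is assumed to implement Algorithm 1.

    // Hypothetical: swapMerge(A, lo, mid, hi, B) implements Algorithm 1, i.e.,
    // it merges the sorted runs A[lo..mid) and A[mid..hi) in place, buffering
    // the shorter run in B via swaps only.
    template <typename T>
    void swapMerge(T* A, int lo, int mid, int hi, T* B);

    // Top-down Mergesort with alpha = 1/2: sorting n elements needs only
    // floor(n/2) buffer cells, because only the shorter run is moved to B.
    template <typename T>
    void mergesortHalfBuffer(T* A, int lo, int hi, T* B) {
        if (hi - lo <= 1) return;
        int mid = lo + (hi - lo) / 2;        // first run has <= (hi-lo)/2 elements
        mergesortHalfBuffer(A, lo, mid, B);
        mergesortHalfBuffer(A, mid, hi, B);
        swapMerge(A, lo, mid, hi, B);
    }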

The code in Algorithm 1 illustrates that very simple adaptations suﬃce for QuickMerge-

sort. This merge procedure leaves the merged result in the range previously occupied by

the two input runs. This “in-place”-style interface comes at the price of copying one run.

"Ping-pong" merge. Copying one run can be avoided if we instead write the merge result into an output buffer (and leave it there). This saves element moves, but uses buffer space for all n elements, so we have α = 1 here. The Mergesort scaffold has to take care to correctly orchestrate the merges, using the two arrays alternatingly; this alternating pattern resembles the ping-pong game.

"Ping-pong" merge with smaller buffer. It is also possible to implement the "ping-pong" merge with α = 1/2. Indeed, the copying in Algorithm 1 can be avoided by sorting the first run with the "ping-pong" merge. This will automatically move it to the desired position in the buffer, and the merging can proceed as in Algorithm 1. Figure 3 illustrates this idea, which is easily realized with a recursive procedure. Our implementation of QuickMergesort uses this variant.


Figure 3: Mergesort with α = 1/2 using ping-pong merges. (Steps 1 and 2: ping-pong sort the two halves; Step 3: merge.)

Figure 4: Reinhardt's merging procedure that needs only buffer space for half of the smaller run. In the first step the two sequences are merged starting with the smallest elements until the empty space is filled. Then there is enough empty space to merge the sequences from the right into the final position.

Reinhardt's merge. A third, less obvious alternative was proposed by Reinhardt [41], which allows to use an even smaller α for merges where input and buffer area form a contiguous region; see Figure 4. Assume we are given an array A with positions A[1, . . . , t] being empty or containing dummy elements (to simplify the description, we assume the first case), A[t + 1, . . . , t + ℓ] and A[t + ℓ + 1, . . . , t + ℓ + r] containing two sorted sequences. We wish to merge the two sequences into the space A[1, . . . , ℓ + r] (so that A[ℓ + r + 1, . . . , t + ℓ + r] becomes empty). We require that r/2 ≤ t < r. First we start from the left merging the two sequences into the empty space until there is no space left between the last element of the already merged part and the first element of the left sequence (first step in Figure 4). At this point, we know that at least t elements of the right sequence have been introduced into the merged part; so the positions t + ℓ + 1 through ℓ + 2t are empty now. Since ℓ + t + 1 ≤ ℓ + r ≤ ℓ + 2t, in particular, A[ℓ + r] is empty now and we can start merging the two sequences right-to-left into the now empty space (where the right-most element is moved to position A[ℓ + r] – see the second step in Figure 4).

In order to have a balanced merge, we need ℓ = r and so t ≥ (ℓ + r)/4. Therefore, when applying this method in QuickMergesort, we have α = 1/4.
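As a sanity check on the index arithmetic (our own toy instance, not from the paper), take the balanced case ℓ = r = 4 with t = 2, i.e., an array of t + ℓ + r = 10 cells with A[1..2] initially empty:

\[
\frac r2 \le t < r \;\;(2 \le 2 < 4)\ \checkmark, \qquad
\text{free after the first phase: } A[t+\ell+1 \,..\, \ell+2t] = A[7..8], \qquad
\ell + r = 8 \in \{7, 8\},
\]

so the right-to-left phase can indeed start by writing to A[ℓ + r] = A[8], and the buffer fraction is t/(ℓ + r) = 2/8 = 1/4 = α.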

Remark 2.2 (Even less buffer space?): Reinhardt goes even further: even with εn space, we can merge in linear time when ε is fixed, by moving one run whenever we run out of space. Even though no more comparisons are needed, this method is quickly dominated by the additional data movements when ε < 1/4, so we do not discuss it in this article.

Another approach for dealing with less buﬀer space is to allow imbalanced merges: for

both Reinhardt’s merge and the simple swap-based merge, we need only additional space for

(half) the size of the smaller run. Hence, we can merge a short run into a long run with a

relatively small buﬀer. The price of this method is that the number of comparisons increases,

while the number of additional moves is better than with the previous method. We shed

some more light on this approach in [10].

Avoiding Stack Space.

The standard version of Mergesort uses a top-down recursive

formulation. It requires a stack of logarithmic height, which is usually deemed acceptable

since it is dwarfed by the buﬀer space for merging. Since QuickMergesort removes the

need for the latter, one might prefer to also avoid the logarithmic stack space.

An elementary solution is bottom-up Mergesort, where we form pairs of runs and

merge them, except for, potentially, a lonely rightmost run. This variant occasionally merges

two runs of very diﬀerent sizes, which aﬀects the overall performance (see Section 3.6).
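For concreteness, a sketch of the plain bottom-up variant (our illustration; swapMerge stands in for a swap-based merge such as Algorithm 1), which shows where the lonely rightmost run arises:

    #include <algorithm>

    // Hypothetical: merges sorted runs A[lo..mid) and A[mid..hi) using buffer B.
    template <typename T>
    void swapMerge(T* A, int lo, int mid, int hi, T* B);

    // Plain bottom-up Mergesort: in each round, adjacent runs of length `width`
    // are merged pairwise, left to right. If n is not a multiple of 2*width,
    // the rightmost run is left alone in this round and may later be merged
    // with a much longer partner -- the imbalance boustrophedonic merging avoids.
    template <typename T>
    void bottomUpMergesort(T* A, int n, T* B) {
        for (int width = 1; width < n; width *= 2)
            for (int lo = 0; lo + width < n; lo += 2 * width)
                swapMerge(A, lo, lo + width, std::min(lo + 2 * width, n), B);
    }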

A simple (but less well-known) modification that we call boustrophedonic⁴ Mergesort allows us to get the best of both worlds [20]: instead of leaving a lonely rightmost run unmerged (and starting again at the beginning with the next round of merges), we start the next merging round at the same end, moving backwards through the array. We hence begin by merging the lonely run, and so avoid ever having two runs that differ by more than a factor of two in length. The logic for handling odd and even numbers of runs correctly is more involved, but constant extra space can be achieved without a loss in the number of comparisons.

2.2. QuickHeapsort

Another good option – and indeed the historically ﬁrst one – for X is Heapsort.

Why Heapsort?

In light of the fact that Heapsort is the only textbook method with

reasonable overall performance that already sorts with constant extra space, this suggestion

might be surprising. Heapsort rather appears to be the candidate least likely to proﬁt from

QuickXsort. Indeed, it is a reﬁned variant of Heapsort that is an interesting candidate for X.

To work in place, standard Heapsort has to maintain the heap in a very rigid shape to store it in a contiguous region of the array. And this rigid structure comes at the price of extra comparisons. Standard Heapsort requires up to 2(h − 1) comparisons to extract the maximum from a heap of height h, for an overall 2n lg n ± O(n) comparisons in the worst case.

Comparisons can be saved by first finding the cascade of promotions (a.k.a. the special path), i.e., the path from the root to a leaf, always choosing the larger of the two children. Then, in a second step, we find the correct insertion position along this path for the element currently occupying the last position of the heap area. The standard procedure corresponds to sequential search from the root. Floyd's optimization (a.k.a. bottom-up Heapsort [48]) instead uses sequential search from the leaf. It has a substantially higher chance to succeed early (in the second phase), and is probably optimal in that respect for the average case. If a better worst case is desired, one can use binary search on the special path, or even more sophisticated methods [21].

⁴ After boustrophedon, a type of bi-directional text seen in ancient manuscripts where lines alternate between left-to-right and right-to-left order; literally "turning like oxen in ploughing".

External Heapsort.

In ExternalHeapsort, we avoid any such extra comparisons by relaxing the heap's shape. Extracted elements go to an output buffer and we only promote the elements along the special path into the gap left by the maximum. This leaves a gap at the leaf level, which we fill with a sentinel value smaller than any element's value (in the case of a max-heap). ExternalHeapsort uses n lg n ± O(n) comparisons in the worst case, but requires a buffer to hold n elements. By using it as our X in QuickXsort, we can avoid the extra space requirement.

When using ExternalHeapsort as X, we cannot simply overwrite gaps with sentinel

values, though: we have to keep the buﬀer elements intact! Fortunately, the buﬀer elements

themselves happen to work as sentinel values. If we sort the segment of large elements

with ExternalHeapsort, we swap the max from the heap with a buﬀer element, which

automatically is smaller than any remaining heap element and will thus never be promoted

as long as any actual elements remain in the heap. We know when to stop since we know

the segment sizes; after that many extractions, the right segment is sorted and the heap area

contains only buﬀer elements.

We use a symmetric variant (with a min-oriented heap) if the left segment shall be sorted

by X. For detailed code for the above procedure, we refer to [3] or [4].
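To illustrate the mechanics (a minimal sketch under the setup just described, not the code from [3] or [4]): the extraction loop below empties a max-heap A[0..m) into the buffer, relying on the fact that every buffer element is smaller than every heap element and hence acts as a sentinel once swapped in.

    #include <utility>

    // Sketch: A[0..m) is a max-heap of the m "real" elements; buf[0..m) lies
    // in the other segment and contains only elements smaller than all heap
    // elements. Extracts the heap in increasing order into buf (buf[k]
    // receives the k-th smallest), about ceil(lg m) comparisons per round.
    void externalHeapsortExtract(int* A, int m, int* buf) {
        for (int k = m - 1; k >= 0; --k) {
            std::swap(A[0], buf[k]);      // max to output; buffer element enters as root
            int gap = 0;                  // sink the (small) buffer element: promote
            while (2 * gap + 1 < m) {     // the larger child along the special path
                int c = 2 * gap + 1;
                if (c + 1 < m && A[c + 1] > A[c]) ++c;   // one comparison per level
                std::swap(A[gap], A[c]);
                gap = c;
            }
        }
    }

Observe that once several buffer elements have entered the heap, the child comparison A[c+1] > A[c] may compare two buffer elements with each other; this is exactly why (basic) QuickHeapsort does not preserve the randomness of the buffer (see Section 4.1).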

Trading space for comparisons.

Many options to further reduce the number of comparisons

have been explored. Since these options demand extra space beyond an output buﬀer and

cannot restore the contents of that extra space, using them in QuickXsort does not yield

an internal sorting method, but we brieﬂy mention these variants here.

One option is to remember outcomes of sibling comparisons to avoid redundant compar-

isons in following steps [

37

]. In [

4

, Thm. 4], this is applied to QuickHeapsort together

with some further improvements using extra space.

Another option is to modify the heap property itself. In a weak heap, the root of a subtree is only larger than one of the subtrees, and we use an extra bit to store (and modify) which one it is. The more liberal structure makes construction of weak heaps more efficient: indeed, they can be constructed using n − 1 comparisons. WeakHeapsort has been introduced by Dutton [7] and applied to QuickWeakHeapsort in [8]. We introduced a refined version of ExternalWeakHeapsort in [12] that works by the same principle as ExternalHeapsort; more details on this algorithm, its application in QuickWeakHeapsort, and the relation to Mergesort can be found in our preprint [11]. Due to the additional bit-array, which is not only space-consuming but also costs time to access, WeakHeapsort and QuickWeakHeapsort are considerably slower than ordinary Heapsort, Mergesort, or Quicksort; see the experiments in [8, 12]. Therefore, we do not consider these variants here in more detail.

3. Preliminaries

In this section, we introduce some important notation and collect known results for reference. The reader who is only interested in the main results may skip this section. A comprehensive list of notation is given in Appendix A.

We use Iverson's bracket [stmt] to mean 1 if stmt is true and 0 otherwise. P[E] denotes the probability of event E, and E[X] the expectation of random variable X. We write X D= Y to denote equality in distribution.

With f(n) = g(n) ± h(n) we mean that |f(n) − g(n)| ≤ h(n) for all n, and we use similar notation f(n) = g(n) ± O(h(n)) to state asymptotic bounds on the difference, |f(n) − g(n)| = O(h(n)). We remark that both use cases are examples of "one-way equalities" that are in common use for notational convenience, even though ⊆ instead of = would be formally more appropriate. Moreover, f(n) ∼ g(n) means f(n) = g(n) ± o(g(n)).

Throughout, lg refers to the logarithm to base 2, while ln is the natural logarithm. Moreover, log is used for the logarithm with unspecified base (for use in O-notation).

We write $a^{\underline{b}}$ (resp. $a^{\overline{b}}$) for the falling (resp. rising) factorial power $a(a-1)\cdots(a-b+1)$ (resp. $a(a+1)\cdots(a+b-1)$).

3.1. Hölder continuity

A function f : I → R defined on a bounded interval I is Hölder-continuous with exponent η ∈ (0, 1] if

\[
\exists C \;\forall x, y \in I : \; |f(x) - f(y)| \;\le\; C\,|x - y|^{\eta}.
\]

Hölder-continuity is a notion of smoothness that is stricter than (uniform) continuity but slightly more liberal than Lipschitz-continuity (which corresponds to η = 1). f : [0, 1] → R with f(z) = z ln(1/z) is a stereotypical function that is Hölder-continuous (for any η ∈ (0, 1)) but not Lipschitz (see Lemma 3.5 below).

One useful consequence of Hölder-continuity is given by the following lemma: an error bound on the difference between an integral and its Riemann sum ([49, Proposition 2.12–(b)]).

Lemma 3.1 (Hölder integral bound): Let f : [0, 1] → R be Hölder-continuous with exponent η. Then

\[
\int_0^1 f(x)\,dx \;=\; \frac1n \sum_{i=0}^{n-1} f(i/n) \;\pm\; O(n^{-\eta}), \qquad (n \to \infty).
\]

Proof: The proof is a simple computation. Let C be the Hölder constant of f. We split the integral into small integrals over intervals of width 1/n and use Hölder-continuity to bound the difference to the corresponding summand:

\[
\left| \int_0^1 f(x)\,dx - \frac1n \sum_{i=0}^{n-1} f(i/n) \right|
\;=\; \left| \sum_{i=0}^{n-1} \int_{i/n}^{(i+1)/n} \bigl(f(x) - f(i/n)\bigr)\,dx \right|
\;\le\; \sum_{i=0}^{n-1} \int_{i/n}^{(i+1)/n} C \left| x - \frac in \right|^{\eta} dx
\;\le\; C \sum_{i=0}^{n-1} \int_{i/n}^{(i+1)/n} \frac{1}{n^{\eta}}\,dx
\;=\; C n^{-\eta} \int_0^1 1\,dx \;=\; O(n^{-\eta}). \qquad \square
\]

Remark 3.2 (Properties of Hölder-continuity): We considered only the unit interval as the domain of functions, but this is no restriction: Hölder-continuity (on bounded domains) is preserved by addition, subtraction, multiplication and composition (see, e.g., [47, Section 4.6] for details). Since any linear function is Lipschitz, the result above holds for Hölder-continuous functions f : [a, b] → R.

If our functions are defined on a bounded domain, Lipschitz-continuity implies Hölder-continuity, and Hölder-continuity with exponent η implies Hölder-continuity with any exponent η′ < η. A real-valued function is Lipschitz if its derivative is bounded.

3.2. Concentration results

We write X D= Bin(n, p) if X has a binomial distribution with n ∈ N₀ trials and success probability p ∈ [0, 1]. Since X is a sum of independent random variables with bounded influence on the result, Chernoff bounds imply strong concentration results for X. We will only need a very basic variant, given in the following lemma.

Lemma 3.3 (Chernoff Bound, Theorem 2.1 of [36]): Let X D= Bin(n, p) and δ ≥ 0. Then

\[
\mathbb{P}\left[ \left| \frac Xn - p \right| \ge \delta \right] \;\le\; 2\exp(-2\delta^2 n). \tag{1}
\]
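For instance (our own instantiation of (1)), choosing δ = n^{−1/3} shows that X/n is within n^{−1/3} of p with overwhelming probability:

\[
\mathbb{P}\left[ \left| \frac Xn - p \right| \ge n^{-1/3} \right] \;\le\; 2\exp\bigl(-2n^{1/3}\bigr) \;\longrightarrow\; 0 \quad (n \to \infty).
\]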

A consequence of this bound is that we can bound expectations of the form E[f(X/n)] by f(p) plus a small error term if f is "sufficiently smooth". Hölder-continuity (introduced above) is an example of such a criterion:

Lemma 3.4 (Expectation via Chernoff): Let p ∈ (0, 1) and X D= Bin(n, p), and let f : [0, 1] → R be a function that is bounded by |f(x)| ≤ A and Hölder-continuous with exponent η ∈ (0, 1] and constant C. Then it holds that

\[
\mathbb{E}\Bigl[f\Bigl(\frac Xn\Bigr)\Bigr] \;=\; f(p) \pm \rho,
\]

where we have for any δ ≥ 0 that

\[
\rho \;\le\; C\,\delta^{\eta}\bigl(1 - 2e^{-2\delta^2 n}\bigr) \;+\; 4A\,e^{-2\delta^2 n}.
\]

For any fixed ε > (1 − η)/2, we obtain ρ = o(n^{−1/2+ε}) as n → ∞ for a suitable choice of δ.

Proof of Lemma 3.4: By the Chernoff bound we have

\[
\mathbb{P}\left[ \left| \frac Xn - p \right| \ge \delta \right] \;\le\; 2\exp(-2\delta^2 n). \tag{2}
\]

To use this on |E[f(X/n)] − f(p)|, we divide the domain [0, 1] of X/n into the region of values with distance at most δ from p, and all others. This yields

\[
\Bigl|\mathbb{E}\bigl[f(X/n)\bigr] - f(p)\Bigr|
\;\overset{(2)}{\le}\; \sup_{\xi : |\xi| < \delta} \bigl|f(p+\xi) - f(p)\bigr| \cdot \bigl(1 - 2e^{-2\delta^2 n}\bigr)
\;+\; \sup_{x} \bigl|f(x) - f(p)\bigr| \cdot 2e^{-2\delta^2 n}
\;\le\; C\,\delta^{\eta}\bigl(1 - 2e^{-2\delta^2 n}\bigr) + 2A \cdot 2e^{-2\delta^2 n}.
\]

This proves the first part of the claim.

For the second part, we assume ε > (1 − η)/2 is given, so we can write η = 1 − 2ε + 4β for a constant β > 0, and η = (1 − 2ε)/(1 − 2β′) for another constant β′ > 0. We may further assume ε < 1/2; for larger values the claim is vacuous. We then choose δ = n^c with c = −1/4 − (1 − 2ε)/(4η). For large n we thus have

\[
\rho \cdot n^{1/2-\varepsilon}
\;\le\; C\,\delta^{\eta} n^{1/2-\varepsilon}\bigl(1 - 2\exp(-2\delta^2 n)\bigr) + 4A\, n^{1/2-\varepsilon}\exp(-2\delta^2 n)
\;=\; \underbrace{C n^{-\beta}}_{\to\, 0}\cdot\bigl(1 - 2\exp(-2n^{\beta'})\bigr)
\;+\; 4A \underbrace{\exp\bigl(-2n^{\beta'} + (\tfrac12 - \varepsilon)\ln n\bigr)}_{\to\, 0}
\;\longrightarrow\; 0
\]

for n → ∞, which implies the claim.

3.3. Beta distribution

The analysis in Section 6 makes frequent use of the beta distribution: For λ, ρ ∈ R_{>0}, X D= Beta(λ, ρ) if X admits the density f_X(z) = z^{λ−1}(1 − z)^{ρ−1}/B(λ, ρ), where B(λ, ρ) = ∫₀¹ z^{λ−1}(1 − z)^{ρ−1} dz is the beta function. It is a standard fact that for λ, ρ ∈ N≥1 we have

\[
\mathrm{B}(\lambda, \rho) \;=\; \frac{(\lambda-1)!\,(\rho-1)!}{(\lambda+\rho-1)!}; \tag{3}
\]

a generalization of this identity using the gamma function holds for any λ, ρ > 0 [5, Eq. (5.12.1)]. We will also use the regularized incomplete beta function

\[
I_{x,y}(\lambda, \rho) \;=\; \int_x^y \frac{z^{\lambda-1}(1-z)^{\rho-1}}{\mathrm{B}(\lambda, \rho)}\,dz, \qquad (\lambda, \rho \in \mathbb{R}_+,\; 0 \le x \le y \le 1). \tag{4}
\]

Clearly I_{0,1}(λ, ρ) = 1.
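For example (our own instantiation of (3)), λ = ρ = 2 gives B(2, 2) = 1!·1!/3! = 1/6, so Beta(2, 2) has density 6z(1 − z); this is the distribution that arises as the limit of the relative subproblem size under median-of-3 pivot sampling (t = 1, cf. Section 4.3).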

Let us denote by h the function h : [0, 1] → R≥0 with h(x) = −x lg x. For a beta-distributed random variable X D= Beta(λ, ρ) with λ, ρ ∈ N≥1 we have

\[
\mathbb{E}[h(X)] \;=\; \lg e \cdot \frac{\lambda}{\lambda+\rho}\,\bigl(H_{\lambda+\rho} - H_{\lambda}\bigr). \tag{5}
\]

This follows directly from a well-known closed form of a "logarithmic beta integral" (see, e.g., [49, Eq. (2.30)]):

\[
\int_0^1 \ln(z)\cdot z^{\lambda-1}(1-z)^{\rho-1}\,dz \;=\; \mathrm{B}(\lambda, \rho)\,\bigl(H_{\lambda-1} - H_{\lambda+\rho-1}\bigr).
\]
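In detail (our own one-line derivation, applying the displayed integral with λ + 1 in place of λ):

\[
\mathbb{E}[h(X)]
\;=\; -\lg e \cdot \frac{1}{\mathrm{B}(\lambda,\rho)} \int_0^1 \ln(z)\, z^{\lambda}(1-z)^{\rho-1}\,dz
\;=\; -\lg e \cdot \frac{\mathrm{B}(\lambda+1,\rho)}{\mathrm{B}(\lambda,\rho)}\bigl(H_{\lambda} - H_{\lambda+\rho}\bigr)
\;=\; \lg e \cdot \frac{\lambda}{\lambda+\rho}\bigl(H_{\lambda+\rho} - H_{\lambda}\bigr),
\]

using B(λ+1, ρ)/B(λ, ρ) = λ/(λ+ρ) from (3).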

We will make use of the following elementary properties of h later (towards applying Lemma 3.4).


Lemma 3.5 (Elementary Properties of h): Let h : [0, 1] → R≥0 with h(x) = −x lg(x).

(a) h is bounded by 0 ≤ h(x) ≤ (lg e)/e ≤ 0.54 for x ∈ [0, 1].

(b) g(x) := −x ln x = ln(2)·h(x) is Hölder-continuous in [0, 1] for any exponent η ∈ (0, 1), i.e., there is a constant C = C_η such that |g(y) − g(x)| ≤ C_η|y − x|^η for all x, y ∈ [0, 1]. A possible choice for C_η is given by

\[
C_\eta \;=\; \left( \int_0^1 \bigl|\ln(t) + 1\bigr|^{\frac{1}{1-\eta}}\,dt \right)^{1-\eta}. \tag{6}
\]

For example, η = 0.99 yields C_η ≈ 37.61.

A detailed proof of the second claim appears in [49, Lemma 2.13]. Hence, h is sufficiently smooth to be used in Lemma 3.4.

3.4. Beta-binomial distribution

Moreover, we use the beta-binomial distribution, which is a conditional binomial distribution with the success probability being a beta-distributed random variable. If X D= BetaBin(n, λ, ρ), then

\[
\mathbb{P}[X = i] \;=\; \binom ni \frac{\mathrm{B}(\lambda + i,\; \rho + (n - i))}{\mathrm{B}(\lambda, \rho)}.
\]

Beta-binomial distributions are precisely the distributions of subproblem sizes after partitioning in Quicksort. We detail this in Section 4.3.

A property that we repeatedly use here is a local limit law showing that the normalized beta-binomial distribution converges to the beta distribution. Using Chernoff bounds after conditioning on the beta-distributed success probability shows that BetaBin(n, λ, ρ)/n converges to Beta(λ, ρ) (in a specific sense); but we obtain stronger error bounds for fixed λ and ρ by directly comparing the probability density functions (PDFs). This yields the following result (a detailed proof appears in [49, Lemma 2.38]).

Lemma 3.6 (Local Limit Law for Beta-Binomial, [49]): Let (I^{(n)})_{n∈N≥1} be a family of random variables with beta-binomial distribution, I^{(n)} D= BetaBin(n, λ, ρ) where λ, ρ ∈ {1} ∪ R≥2, and let f_B(z) = z^{λ−1}(1 − z)^{ρ−1}/B(λ, ρ) be the density of the Beta(λ, ρ) distribution. Then we have uniformly in z ∈ (0, 1) that

\[
n \cdot \mathbb{P}\bigl[I = \lfloor z(n+1) \rfloor\bigr] \;=\; f_B(z) \pm O(n^{-1}), \qquad (n \to \infty).
\]

That is, I^{(n)}/n converges to Beta(λ, ρ) in distribution, and the probability weights converge uniformly to the limiting density at rate O(n^{−1}).

3.5. Continuous Master Theorem

For solving recurrences, we build upon Roura's master theorems [43]. The relevant continuous master theorem is restated here for convenience:

Theorem 3.7 (Roura's Continuous Master Theorem (CMT)): Let F_n be recursively defined by

\[
F_n \;=\;
\begin{cases}
\; b_n, & \text{for } 0 \le n < N; \\[0.5ex]
\; t_n + \displaystyle\sum_{j=0}^{n-1} w_{n,j}\, F_j, & \text{for } n \ge N,
\end{cases} \tag{7}
\]

where t_n, the toll function, satisfies t_n ∼ K n^σ log^τ(n) as n → ∞ for constants K ≠ 0, σ ≥ 0 and τ > −1. Assume there exists a function w : [0, 1] → R≥0, the shape function, with ∫₀¹ w(z) dz ≥ 1 and

\[
\sum_{j=0}^{n-1} \left| w_{n,j} - \int_{j/n}^{(j+1)/n} w(z)\,dz \right| \;=\; O(n^{-d}), \qquad (n \to \infty), \tag{8}
\]

for a constant d > 0. With H := 1 − ∫₀¹ z^σ w(z) dz, we have the following cases:

1. If H > 0, then F_n ∼ t_n / H.

2. If H = 0, then F_n ∼ (t_n ln n) / H̃ with H̃ = −(τ + 1) ∫₀¹ z^σ ln(z) w(z) dz.

3. If H < 0, then F_n = O(n^c) for the unique c ∈ R with ∫₀¹ z^c w(z) dz = 1.

Theorem 3.7 is the "reduced form" of the CMT, which appears as Theorem 1.3.2 in Roura's doctoral thesis [42], and as Theorem 18 of [35]. The full version (Theorem 3.3 in [43]) allows us to handle sublogarithmic factors in the toll function, as well, which we do not need here.
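As a quick illustration of how the CMT is applied (a standard textbook example, not from this paper): for plain Quicksort with a uniformly random pivot, F_n = n − 1 + (2/n)Σ_{j<n} F_j, so w_{n,j} = 2/n, the shape function is w(z) = 2, and the toll is t_n ∼ n (σ = 1, τ = 0). Then

\[
H = 1 - \int_0^1 2z\,dz = 0
\quad\Longrightarrow\quad
\tilde H = -(0+1)\int_0^1 2z \ln(z)\,dz = \frac12,
\quad\text{so}\quad
F_n \sim \frac{t_n \ln n}{\tilde H} = 2n\ln n \approx 1.39\,n\lg n,
\]

matching the classical average-case result for Quicksort.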

3.6. Average costs of Mergesort

We recapitulate some known facts about standard Mergesort. The average number of comparisons for Mergesort has the same – optimal – leading term n lg n in the worst and best case; this is true for both the top-down and bottom-up variants. The coefficient of the linear term of the asymptotic expansion, though, is not a constant, but a bounded periodic function with period lg n, and the functions differ for best, worst, and average case and for the variants of Mergesort [45, 16, 40, 25, 26].

For this paper, we confine ourselves to upper and lower bounds for the average case of the form x(n) = an lg n + bn ± O(n^{1−ε}) with constant b valid for all n. Setting b to the infimum resp. supremum of the periodic function, we obtain the following lower resp. upper bounds for top-down [26] and bottom-up [40] Mergesort:

\[
x_{\mathrm{td}}(n) \;=\; n \lg n - c_{\mathrm{td}}\,n + 2 \pm O(n^{-1}), \quad c_{\mathrm{td}} \in [1.2408,\, 1.2645],
\;\text{ i.e., }\;
x_{\mathrm{td}}(n) = n \lg n - (1.25265 \pm 0.01185)\,n + 2 \pm O(n^{-1}), \tag{9}
\]
\[
x_{\mathrm{bu}}(n) \;=\; n \lg n - c_{\mathrm{bu}}\,n \pm O(1), \quad c_{\mathrm{bu}} \in [0.2645,\, 1.2645],
\;\text{ i.e., }\;
x_{\mathrm{bu}}(n) = n \lg n - (0.7645 \pm 0.5)\,n \pm O(1).
\]
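The oscillating linear term is easy to observe numerically (a quick check we add here, not part of the paper): merging two random sorted runs of sizes m and k costs m + k − m/(k+1) − k/(m+1) comparisons on average, so the exact top-down average satisfies a simple recurrence.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Exact average comparison count of top-down Mergesort on a random
    // permutation: merging random runs of sizes m and k costs
    // m + k - m/(k+1) - k/(m+1) comparisons on average.
    double mergesortAvg(int n, std::vector<double>& memo) {
        if (n <= 1) return 0.0;
        if (memo[n] >= 0) return memo[n];
        int m = n / 2, k = n - m;
        double merge = (double)n - (double)m / (k + 1) - (double)k / (m + 1);
        return memo[n] = mergesortAvg(m, memo) + mergesortAvg(k, memo) + merge;
    }

    int main() {
        std::vector<double> memo(1 << 21, -1.0);
        for (int e = 10; e <= 20; ++e) {
            int n = 1 << e;                 // powers of two, plus off-power sizes,
            for (int m : {n, n + n / 3}) {  // make the oscillation visible
                double x = mergesortAvg(m, memo);
                std::printf("n=%8d  (x(n) - n lg n)/n = %.4f\n",
                            m, (x - m * std::log2((double)m)) / m);
            }
        }
    }

The printed values of (x_td(n) − n lg n)/n wander within the bracket [−1.2645, −1.2408] stated above.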


4. The QuickXsort recurrence

In this section, we set up a recurrence equation for the costs of QuickXsort. This recurrence

will be the basis for our analyses below. We start with some prerequisites and assumptions

about X.

4.1. Prerequisites

For simplicity we will assume that subproblems below a constant size w (with w ≥ k in the case of constant size-k samples for pivot selection) are sorted with X (using a constant amount of extra space). Nevertheless, we could use any other algorithm for that, as this only influences the constant term of the costs. A common choice in practice is to replace X by StraightInsertionsort to sort the small cases.

We further assume that selecting the pivot from a sample of size k costs s(k) comparisons, where we usually assume s(k) = Θ(k), i.e., an (expected-case) linear selection method is used.

Now, let c(n) be the expected number of comparisons in QuickXsort on arrays of size n, where the expectation is over the random choices for selecting the pivots for partitioning.

Preservation of randomness? Our goal is to set up a recurrence equation for c(n). We will justify here that such a recursive relation exists.

For the Quicksort part of QuickXsort, only the ranks of the chosen pivot elements have an influence on the costs; partitioning itself always needs precisely one comparison per element.⁵ Since we choose the pivot elements randomly (from a random sample), the order of the input does not influence the costs of the Quicksort part of QuickXsort.

For general X, the sorting costs do depend on the order of the input, and we would like to use the average-case bounds for X when it is applied to a random permutation. We may assume that our initial input is indeed a random permutation of the elements,⁶ but this is not sufficient! We also have to guarantee that the inputs for recursive calls are again random permutations of their elements.

A simple sufficient condition for this "randomness-preserving" property is that X may not compare buffer contents. This is a natural requirement, e.g., for our Mergesort variants. If no buffer elements are compared to each other and the original input is a random permutation of its elements, so are the segments after partitioning, and so will be the buffer after X has terminated. Then we can set up a recurrence equation for c(n) using the average-case cost of X. We may also replace the random sampling of pivots by choosing any fixed positions without affecting the expected costs c(n).

However, not all candidates for X meet this requirement. (Basic) QuickHeapsort does compare buffer elements to each other (see Section 2.2) and, indeed, the buffer elements are not in random order when the Heapsort part has finished. For such X, we assume that genuinely random samples for pivot selection are used. Moreover, we will have to use conservative bounds for the number of comparisons incurred by X, e.g., worst- or best-case results, as the input of X is not random anymore. This only allows us to derive upper or lower bounds for c(n), whereas for randomness-preserving methods, the expected costs can be characterized precisely by the recurrence.

⁵ We remark that this is no longer true for multiway partitioning methods, where the number of comparisons per element is not necessarily the same for all possible outcomes. Similarly, the number of swaps in the standard partitioning method depends not only on the rank of the pivot, but also on how "displaced" the elements in the input are.

⁶ It is a reasonable option to enforce this assumption in an implementation by an explicit random shuffle of the input before we start sorting. Sedgewick and Wayne, for example, do this for the implementation of Quicksort in their textbook [46].

In both cases, we use x(n) as (a bound for) the number of comparisons needed by X to sort n elements, and we will assume that

\[
x(n) \;=\; a\,n \lg n + b\,n \;\pm\; O(n^{1-\varepsilon}), \qquad (n \to \infty),
\]

for constants a, b and ε ∈ (0, 1].

4.2. The recurrence for the expected costs

We can now proceed to the recursive description of the expected costs c(n) of QuickXsort. The description follows the recursive nature of the algorithm. Recall that QuickXsort tries to sort the largest segment with X for which the other segment gives sufficient buffer space. We first consider the case α = 1, in which this largest segment is always the smaller of the two segments created.

Case α = 1. Let us consider the recurrence for c(n) (which holds for both constant and growing sample size k = k(n)). We distinguish two cases; first, let α = 1. We obtain the recurrence

\[
c(n) \;=\; x(n) \ge 0, \qquad (\text{for } n \le w)
\]
\[
\begin{aligned}
c(n) \;=\;& \underbrace{n - k(n)}_{\text{partitioning}} \;+\; \underbrace{s(k(n))}_{\text{pivot sampling}}
\;+\; \mathbb{E}\bigl[[J_1 \ge J_2]\,\bigl(x(J_2) + c(J_1)\bigr)\bigr] \\
&\;+\; \mathbb{E}\bigl[[J_1 < J_2]\,\bigl(x(J_1) + c(J_2)\bigr)\bigr] \qquad (\text{for } n > w) \\
\;=\;& \sum_{r=1}^{2} \mathbb{E}\bigl[A_r(J_r)\,c(J_r)\bigr] \;+\; t(n)
\end{aligned}
\]

where

\[
A_1(J) = [J \ge J'], \qquad A_2(J) = [J > J'] \qquad \text{with } J' = (n-1) - J,
\]
\[
t(n) \;=\; n - k + s(k) + \mathbb{E}[A_2(J_2)\,x(J_1)] + \mathbb{E}[A_1(J_1)\,x(J_2)].
\]

The expectation here is taken over the choice of the random pivot, i.e., over the segment sizes J1 resp. J2. Note that we use both J1 and J2 to express the conditions in a convenient form, but actually either one is fully determined by the other via J1 + J2 = n − 1. We call t(n) the toll function. Note how A1 and A2 change roles in recursive calls and toll functions, since we always sort one segment recursively and the other segment by X.

General α. For α < 1, we obtain two cases: When the split induced by the pivot is "uneven" – namely when min{J1, J2} < α·max{J1, J2}, i.e., max{J1, J2} > (n−1)/(1+α) – the smaller segment is not large enough to be used as buffer. Then we can only assign the large segment as a buffer and run X on the smaller segment. If however the split is "about even", i.e., both segments are ≤ (n−1)/(1+α), we can sort the larger of the two segments by X. These cases also show up in the recurrence of costs.

\[
c(n) \;=\; x(n) \ge 0, \qquad (\text{for } n \le w)
\]
\[
\begin{aligned}
c(n) \;=\;& (n - k) + s(k)
\;+\; \mathbb{E}\Bigl[\bigl[J_1, J_2 \le \tfrac{n-1}{1+\alpha}\bigr]\,[J_1 > J_2]\,\bigl(x(J_1) + c(J_2)\bigr)\Bigr] \\
&\;+\; \mathbb{E}\Bigl[\bigl[J_1, J_2 \le \tfrac{n-1}{1+\alpha}\bigr]\,[J_1 \le J_2]\,\bigl(x(J_2) + c(J_1)\bigr)\Bigr] \\
&\;+\; \mathbb{E}\Bigl[\bigl[J_2 > \tfrac{n-1}{1+\alpha}\bigr]\,\bigl(x(J_1) + c(J_2)\bigr)\Bigr]
\;+\; \mathbb{E}\Bigl[\bigl[J_1 > \tfrac{n-1}{1+\alpha}\bigr]\,\bigl(x(J_2) + c(J_1)\bigr)\Bigr] \qquad (\text{for } n > w) \\
\;=\;& \sum_{r=1}^{2} \mathbb{E}\bigl[A_r(J_r)\,c(J_r)\bigr] \;+\; t(n)
\end{aligned}
\]

where, with J' = (n − 1) − J,

\[
A_1(J) = \bigl[J, J' \le \tfrac{n-1}{1+\alpha}\bigr]\,[J \le J'] + \bigl[J > \tfrac{n-1}{1+\alpha}\bigr], \qquad
A_2(J) = \bigl[J, J' \le \tfrac{n-1}{1+\alpha}\bigr]\,[J < J'] + \bigl[J > \tfrac{n-1}{1+\alpha}\bigr],
\]
\[
t(n) \;=\; n - k + s(k) + \mathbb{E}[A_2(J_2)\,x(J_1)] + \mathbb{E}[A_1(J_1)\,x(J_2)].
\]

The above formulation actually covers α = 1 as a special case, so in both cases we have

\[
c(n) \;=\; \sum_{r=1}^{2} \mathbb{E}\bigl[A_r(J_r)\,c(J_r)\bigr] + t(n) \tag{10}
\]

where A1 (resp. A2) is the indicator random variable for the event "left (resp. right) segment sorted recursively" and

\[
t(n) \;=\; n - k + s(k) + \sum_{r=1}^{2} \mathbb{E}\bigl[A_r\,x(J_{3-r})\bigr]. \tag{11}
\]

We note that the expected number of partitioning rounds is only Θ(log n), and hence also the expected overall number of comparisons used in all pivot sampling rounds combined is only O(log n) when k is constant.

Recursion indicator variables. It will be convenient to rewrite A1(J1) and A2(J2) in terms of the relative subproblem size:

\[
A_1(J_1) \;=\; \Bigl[\tfrac{J_1}{n-1} \in \bigl[\tfrac{\alpha}{1+\alpha}, \tfrac12\bigr] \cup \bigl(\tfrac{1}{1+\alpha}, 1\bigr]\Bigr], \qquad
A_2(J_2) \;=\; \Bigl[\tfrac{J_2}{n-1} \in \bigl[\tfrac{\alpha}{1+\alpha}, \tfrac12\bigr) \cup \bigl(\tfrac{1}{1+\alpha}, 1\bigr]\Bigr].
\]

Graphically, if we view J1/(n − 1) as a point in the unit interval, the regions indicate which subproblem is sorted recursively for typical values of α (the other subproblem is sorted by X): for α = 1/2 and α = 1/4, reading the interval from 0 to 1, we have A2 = 1 on [0, α/(1+α)), A1 = 1 on [α/(1+α), 1/2], A2 = 1 again on (1/2, 1/(1+α)], and A1 = 1 on (1/(1+α), 1]. For α = 1, the regions collapse and the interval simply splits at 1/2: A2 = 1 on [0, 1/2) and A1 = 1 on [1/2, 1].

Obviously, we have A1 + A2 = 1 for any choice of J1, which corresponds to having exactly one recursive call in QuickXsort.

4.3. Distribution of subproblem sizes

A vital ingredient to our analyses below is to characterize the distribution of the subproblem sizes J1 and J2.

Without pivot sampling, we have J1 D= U[0..n − 1], a discrete uniform distribution. In this paper, though, we assume throughout that pivots are chosen as the median of a random sample of k = 2t + 1 elements, where t ∈ N₀. k may or may not depend on n; we write k = k(n) to emphasize a potential dependency.

By symmetry, the two subproblem sizes always have the same distribution, J1 D= J2. We will therefore in the following simply write J instead of J1 when the distinction between left and right subproblem is not important.

Combinatorial model. What is the probability P[J = j] of obtaining a certain subproblem size j? An elementary counting argument yields the result. For selecting the (j + 1)-st element as pivot, the sample needs to contain t elements smaller than the pivot and t elements larger than the pivot. There are \(\binom nk\) possible choices for the sample in total, of which \(\binom jt \cdot \binom{n-1-j}{t}\) will select the (j + 1)-st element as pivot. Thus,

\[
\mathbb{P}[J = j] \;=\; \frac{\binom jt \binom{n-1-j}{t}}{\binom nk}.
\]

Note that this probability is 0 for j < t or j > n − 1 − t, so we can always write J = I + t for a random variable I ∈ [0..n − k] with P[I = i] = P[J = i + t].
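For example (our own instantiation of the formula), for median-of-3 pivot selection (k = 3, t = 1) we get

\[
\mathbb{P}[J = j] \;=\; \frac{\binom j1 \binom{n-1-j}{1}}{\binom n3} \;=\; \frac{6\,j\,(n-1-j)}{n(n-1)(n-2)}, \qquad t \le j \le n-1-t,
\]

a parabola peaked at j ≈ (n−1)/2: balanced splits become more likely than with a uniform pivot. In the notation of Lemma 3.6, n·P[J = ⌊zn⌋] → 6z(1−z), the Beta(2, 2) density.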

The following lemma can be derived by direct elementary calculations, showing that J is concentrated around its expected value (n − 1)/2.

Lemma 4.1 ([4, Lemma 2]): Let 0 < δ < 1/2. If we choose the pivot as the median of a random sample of k = 2t + 1 elements where k ≤ n/2, then the rank of the pivot R = J1 + 1 satisfies

\[
\mathbb{P}\Bigl[R \le \frac n2 - \delta n\Bigr] \;<\; k\rho^{t}
\qquad\text{and}\qquad
\mathbb{P}\Bigl[R \ge \frac n2 + \delta n\Bigr] \;<\; k\rho^{t},
\]

where ρ = 1 − 4δ² < 1.


Proof: First note that the probability of choosing the r-th element as pivot satisfies

\[
\binom nk \cdot \mathbb{P}[R = r] \;=\; \binom{r-1}{t}\binom{n-r}{t}.
\]

We use the notation of the falling factorial power $x^{\underline{\ell}} = x(x-1)\cdots(x-\ell+1)$; thus, $\binom x\ell = x^{\underline{\ell}}/\ell!$. Hence

\[
\mathbb{P}[R = r] \;=\; \frac{k!\cdot (r-1)^{\underline{t}}\cdot (n-r)^{\underline{t}}}{(t!)^2 \cdot n^{\underline{k}}}
\;=\; \binom{2t}{t}\,\frac{k}{n-2t}\,\prod_{i=0}^{t-1}\frac{(r-1-i)(n-r-i)}{(n-2i-1)(n-2i)}.
\]

For r ≤ t we have P[R = r] = 0. So, let t < r ≤ n/2 − δn and let us consider an index i in the product with 0 ≤ i < t:

\[
\frac{(r-1-i)(n-r-i)}{(n-2i-1)(n-2i)} \;\le\; \frac{(r-i)(n-r-i)}{(n-2i)(n-2i)}
\;=\; \frac{\bigl(\frac n2 - i - (\frac n2 - r)\bigr)\bigl(\frac n2 - i + (\frac n2 - r)\bigr)}{(n-2i)^2}
\;=\; \frac{\bigl(\frac n2 - i\bigr)^2 - \bigl(\frac n2 - r\bigr)^2}{(n-2i)^2}
\;\le\; \frac14 - \frac{\bigl(\frac n2 - \bigl(\frac n2 - \delta n\bigr)\bigr)^2}{n^2} \;=\; \frac14 - \delta^2.
\]

We have \(\binom{2t}{t} \le 4^t\). Since k ≤ n/2, we obtain:

\[
\mathbb{P}[R = r] \;\le\; 4^{t}\,\frac{k}{n-2t}\,\Bigl(\frac14 - \delta^2\Bigr)^{t} \;<\; k\,\frac2n\,\rho^{t}.
\]

Now, we obtain the desired result:

\[
\mathbb{P}\Bigl[R \le \frac n2 - \delta n\Bigr] \;<\; \sum_{r=0}^{\lfloor n/2 - \delta n \rfloor} k\,\frac2n\,\rho^{t} \;\le\; k\rho^{t};
\]

the bound for P[R ≥ n/2 + δn] follows by symmetry. □

Uniform model. There is a second view on the distribution of J that will turn out convenient for our analysis. Suppose our input consists of n real numbers drawn i.i.d. uniformly from (0, 1). Since our algorithms are comparison based and the ranks of these numbers form a random permutation almost surely, this assumption is without loss of generality for expected-case considerations.

The vital aspect of this uniform model is that we can separate the value P ∈ (0, 1) of the (first) pivot from its rank R ∈ [1..n]. In particular,