All content in this area was uploaded by Armin Weiß on Nov 26, 2014


QuickXsort: Efficient Sorting with $n \log n - 1.399n + o(n)$ Comparisons on Average

Stefan Edelkamp

TZI, Universität Bremen, Am Fallturm 1, D-28239 Bremen, Germany

edelkamp@tzi.de

Armin Weiß

FMI, Universität Stuttgart, Universitätsstr. 38, D-70569 Stuttgart, Germany

armin.weiss@fmi.uni-stuttgart.de

Abstract

In this paper we generalize the idea of QuickHeapsort leading to the notion of QuickXsort. Given some external sorting algorithm X, QuickXsort yields an internal sorting algorithm if X satisfies certain natural conditions.

With QuickWeakHeapsort and QuickMergesort we present two examples for the QuickXsort construction. Both are efficient algorithms that incur approximately $n \log n - 1.26n + o(n)$ comparisons on average. A worst case of $n \log n + O(n)$ comparisons can be achieved without significantly affecting the average case.

Furthermore, we describe an implementation of MergeInsertion for small $n$. Taking MergeInsertion as a base case for QuickMergesort, we establish a worst-case efficient sorting algorithm calling for $n \log n - 1.3999n + o(n)$ comparisons on average. QuickMergesort with constant size base cases shows the best performance on practical inputs: when sorting integers it is only 15% slower than STL-Introsort.

arXiv:1307.3033v1 [cs.DS] 11 Jul 2013

1 Introduction

Sorting a sequence of $n$ elements remains one of the most frequent tasks carried out by computers. A lower bound for sorting by only pairwise comparisons is $\lfloor \log n! \rfloor \approx n \log n - 1.44n + O(\log n)$ comparisons for the worst and average case (logarithms denoted by $\log$ are base 2; the average case refers to a uniform distribution of all input permutations, assuming all elements are different). Sorting algorithms that are optimal in the leading term are called constant-factor-optimal.
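The approximation of the lower bound can be checked numerically. The following is a small sketch, not part of the paper: the helper names `log2_factorial` and `leading_terms` are hypothetical, and the constant 1.44 is taken to be $1/\ln 2 \approx 1.4427$, which is what Stirling's formula gives.

```python
import math


def log2_factorial(n: int) -> float:
    """log2(n!) computed via the log-gamma function (avoids huge integers)."""
    return math.lgamma(n + 1) / math.log(2)


def leading_terms(n: int) -> float:
    """n log n - (1/ln 2) n, i.e. the bound without its O(log n) term."""
    return n * math.log2(n) - n / math.log(2)


for n in (2**10, 10**6):
    gap = log2_factorial(n) - leading_terms(n)
    # The gap is the O(log n) remainder; compare it with log2(n).
    print(n, round(gap, 2), round(math.log2(n), 2))
```

The printed gap stays below $\log n$ for the two sizes tried, in line with the $O(\log n)$ remainder term.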

Table 1 lists some milestones in the race for reducing the coefficient in the linear term. One of the most efficient (in terms of number of comparisons) constant-factor-optimal algorithms for solving the sorting problem is Ford and Johnson's MergeInsertion algorithm [7]. It requires $n \log n - 1.329n + O(\log n)$ comparisons in the worst case [10]. MergeInsertion has a severe drawback that makes it uninteresting for practical purposes: similar to Insertionsort, the number of element moves is quadratic in $n$. By Insertionsort we mean the algorithm that inserts all elements successively into the already ordered sequence, finding the position for each element by binary search (not by linear search as is mostly done). However, MergeInsertion and Insertionsort can be used to sort small subarrays such that the quadratic running time for these subarrays is small in comparison to the overall running time.
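To make the notion of Insertionsort used here concrete, the following is a minimal sketch of insertion sort with binary search. The function name and the comparison counter are illustrative assumptions, not taken from the paper. The number of comparisons is small, but `list.insert` shifts elements, which is where the quadratic number of moves comes from.

```python
def binary_insertion_sort(items):
    """Return (sorted list, number of element comparisons)."""
    result = []
    comparisons = 0
    for x in items:
        lo, hi = 0, len(result)
        # Binary search for the insertion position of x in result.
        while lo < hi:
            mid = (lo + hi) // 2
            comparisons += 1
            if x < result[mid]:
                hi = mid
            else:
                lo = mid + 1
        # Inserting shifts up to len(result) elements: quadratic moves overall.
        result.insert(lo, x)
    return result, comparisons


print(binary_insertion_sort([5, 2, 4, 6, 1, 3]))
```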

Reinhardt [12] used this technique to design an internal Mergesort variant that needs in the worst case $n \log n - 1.329n + O(\log n)$ comparisons. Unfortunately, implementations of this InPlaceMergesort algorithm have not been documented. The work of Katajainen et al. [9, 6], inspired by Reinhardt, is practical, but the number of comparisons is larger.

Throughout the text we avoid the terms in-place or in-situ and prefer the term internal (as opposed to external). We call an algorithm internal if it needs at most $O(\log n)$ space in addition to the array to be sorted. That means we consider Quicksort an internal algorithm, whereas standard Mergesort is external because it needs a linear amount of extra space.

Based on QuickHeapsort [1], in this paper we develop the concept of QuickXsort and apply it to other sorting algorithms such as Mergesort or WeakHeapsort. This yields efficient internal sorting algorithms. The idea is very simple: as in Quicksort, the array is partitioned into the elements greater and less than some pivot element. Then one part of the array is sorted by some algorithm X and the other part is sorted recursively. The advantage of this procedure is that, if X is an external algorithm, then in QuickXsort the part of the array which is not currently being sorted may be used as temporary space, which yields an internal variant of X. We show that under natural assumptions QuickXsort performs, up to $o(n)$ terms, the same number of comparisons on average as X.

The concept of QuickXsort (without calling it that) was first applied in UltimateHeapsort by Katajainen [8]. In UltimateHeapsort, first the median of the array is determined, and then the array is partitioned into subarrays of equal size. Finding the median means significant additional effort. Cantone and Cincotti [1] weakened the requirement for the pivot and designed QuickHeapsort, which uses only a sample of smaller size to select the pivot for partitioning. UltimateHeapsort is inferior to QuickHeapsort in terms of average case running time, although, unlike QuickHeapsort, it allows an $n \log n + O(n)$ bound for the worst case number of comparisons. Diekert and Weiß [2] analyzed QuickHeapsort more thoroughly and showed that it needs less than $n \log n - 0.99n + o(n)$ comparisons in the average case when implemented with approximately $\sqrt{n}$ elements as sample for pivot selection and some other improvements.

Edelkamp and Stiegeler [4] applied the idea of QuickXsort to WeakHeapsort (which was first


Table 1: Constant-factor-optimal sorting with $n \log n + \kappa n + o(n)$ comparisons.

Algorithm                 Mem.       Other        κ Worst   κ Avg.       κ Exper.
Lower bound               O(1)       O(n log n)   -1.44     -1.44
BottomUpHeapsort [13]     O(1)       O(n log n)   ω(1)      –            [0.35,0.39]
WeakHeapsort [3, 5]       O(n/w)     O(n log n)   0.09      –            [-0.46,-0.42]
RelaxedWeakHeapsort [4]   O(n)       O(n log n)   -0.91     -0.91        -0.91
Mergesort [10]            O(n)       O(n log n)   -0.91     -1.26        –
ExternalWeakHeapsort #    O(n)       O(n log n)   -0.91     -1.26*       –
Insertionsort [10]        O(1)       O(n^2)       -0.91     -1.38 #      –
MergeInsertion [10]       O(n)       O(n^2)       -1.32     -1.3999 #    [-1.43,-1.41]
InPlaceMergesort [12]     O(1)       O(n log n)   -1.32     –            –
QuickHeapsort [1, 2]      O(1)       O(n log n)   ω(1)      -0.03        ≈ 0.20
                          O(n/w)     O(n log n)   ω(1)      -0.99        ≈ -1.24
QuickMergesort (IS) #     O(log n)   O(n log n)   -0.32     -1.38        –
QuickMergesort #          O(1)       O(n log n)   -0.32     -1.26        [-1.29,-1.27]
QuickMergesort (MI) #     O(log n)   O(n log n)   -0.32     -1.3999      [-1.41,-1.40]

Abbreviations: # in this paper, MI MergeInsertion, – not analyzed, * for $n = 2^k$, $w$: computer word width in bits; we assume $\log n \in O(n/w)$.

For QuickXsort we assume InPlaceMergesort as a worst-case stopper (without it, $\kappa_{\mathrm{worst}} \in \omega(1)$).

described by Dutton [3]) introducing QuickWeakHeapsort. The worst case number of comparisons of WeakHeapsort is $n \lceil \log n \rceil - 2^{\lceil \log n \rceil} + n - 1 \le n \log n + 0.09n$, and, following Edelkamp and Wegener [5], this bound is tight. In [4] an improved variant with $n \log n - 0.91n$ comparisons in the worst case and requiring extra space is presented. With ExternalWeakHeapsort we propose a further refinement with the same worst case bound, but on average requiring approximately $n \log n - 1.26n$ comparisons. Using ExternalWeakHeapsort as X in QuickXsort we obtain an improvement over QuickWeakHeapsort of [4].

As indicated above, Mergesort is another good candidate for applying the QuickXsort construction. With QuickMergesort we describe an internal variant of Mergesort which is almost as good as Mergesort not only in terms of the number of comparisons, but also in terms of running time. As mentioned before, MergeInsertion can be used to sort small subarrays. We study MergeInsertion and provide an implementation based on weak heaps. Furthermore, we give an average case analysis. When sorting small subarrays with MergeInsertion, we can show that the average number of comparisons performed by Mergesort is bounded by $n \log n - 1.3999n + o(n)$, and, therefore, QuickMergesort uses at most $n \log n - 1.3999n + o(n)$ comparisons in the average case.

2 QuickXsort

In this section we give a more precise description of QuickXsort and derive some results concerning

the number of comparisons performed in the average and worst case. Let X be some sorting algorithm.

QuickXsort works as follows: First, choose some pivot element as the median of some random sample. Next, partition the array according to this pivot element, i.e., rearrange the array such that all elements left of the pivot are less than or equal to the pivot and all elements on the right are greater than or equal to the pivot. Then, choose one part of the array and sort it with algorithm X. (In general, it does not matter whether the smaller or larger half of the array is chosen. However, for a specific sorting algorithm X like Heapsort, there might be a better and a worse choice.) After one part of the array has been sorted with X, move the pivot element to its correct position (right after/before the already sorted part) and sort the other part of the array recursively with QuickXsort.
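The control flow just described can be sketched as follows. This is an illustrative skeleton under several assumptions that are not from the paper: the names `partition`, `quick_x_sort` and `stand_in_x` are hypothetical, the sample median is found by simply sorting the sample, the smaller part is always handed to X, and the stand-in for X ignores the scratch range it is offered.

```python
import math
import random


def partition(a, lo, hi):
    """Partition a[lo:hi] around the median of about sqrt(n) sampled elements."""
    k = max(1, math.isqrt(hi - lo)) | 1           # odd sample size
    sample = sorted(random.sample(range(lo, hi), k), key=lambda t: a[t])
    m = sample[k // 2]
    a[m], a[hi - 1] = a[hi - 1], a[m]             # park the pivot at the end
    pivot, store = a[hi - 1], lo
    for t in range(lo, hi - 1):
        if a[t] < pivot:
            a[t], a[store] = a[store], a[t]
            store += 1
    a[store], a[hi - 1] = a[hi - 1], a[store]     # pivot to its final position
    return store


def quick_x_sort(a, sort_x):
    """Sort a in place; sort_x(a, lo, hi, scratch_lo, scratch_hi) plays X."""
    lo, hi = 0, len(a)
    while hi - lo > 1:
        p = partition(a, lo, hi)
        if p - lo <= hi - (p + 1):
            sort_x(a, lo, p, p + 1, hi)           # right part is scratch space
            lo = p + 1
        else:
            sort_x(a, p + 1, hi, lo, p)           # left part is scratch space
            hi = p


def stand_in_x(a, lo, hi, scratch_lo, scratch_hi):
    """Placeholder for X: sorts a[lo:hi] and does not use the scratch range."""
    a[lo:hi] = sorted(a[lo:hi])


data = [random.randrange(1000) for _ in range(200)]
quick_x_sort(data, stand_in_x)
print(data == sorted(data))
```

A real X would swap elements into `a[scratch_lo:scratch_hi]` instead of allocating memory; the recursion on the other part is written as a loop here.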

The main advantage of this procedure is that the part of the array that is not being sorted

currently can be used as temporary memory for the algorithm X. This yields fast internal variants

for various external sorting algorithms (such as Mergesort). The idea is that whenever a data

element should be moved to the external storage, instead it is swapped with some data element in

the part of the array which is not currently being sorted. Of course, this works only if the algorithm

needs additional storage only for data elements. Furthermore, the algorithm has to be able to keep

track of the positions of elements which have been swapped. As the speciﬁc method depends on the

algorithm X, we give some more details when we describe the examples for QuickXsort.

For the number of comparisons we can derive some general results which hold for a wide class of algorithms X. Under natural assumptions the average case number of comparisons of X and of QuickXsort differ only by an $o(n)$-term. For the rest of the paper, we assume that the pivot is selected as the median of approximately $\sqrt{n}$ randomly chosen elements. Sample sizes of approximately $\sqrt{n}$ are likely to be optimal, as the results in [2, 11] suggest.

Theorem 1 (QuickXsort Average-Case). Let X be some sorting algorithm requiring at most $n \log n + cn + o(n)$ comparisons in the average case. Then, QuickXsort implemented with $\Theta(\sqrt{n})$ elements as sample for pivot selection is a sorting algorithm that also needs at most $n \log n + cn + o(n)$ comparisons in the average case.

For the proofs we assume that the arrays are indexed starting with 1. The following lemma is

crucial for our estimates. It can be derived by applying Chernoﬀ bounds or by direct elementary

calculations.

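Lemma 1 ([2, Lm. 2]). Let $0 < \delta < \frac{1}{2}$. If we choose the pivot as median of $2\gamma + 1$ elements such that $2\gamma + 1 \le \frac{n}{2}$, then we have $\Pr\left[\text{pivot} \le \frac{n}{2} - \delta n\right] < (2\gamma + 1)\,\alpha^{\gamma}$ where $\alpha = 4\left(\frac{1}{4} - \delta^2\right) < 1$.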
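To get a feeling for how quickly the bound of Lemma 1 shrinks, it can be evaluated numerically. The sketch below is illustrative only: the function name `lemma1_bound` is hypothetical, and the choices $\gamma \approx \sqrt{n}/2$ and $\delta = n^{-0.2}$ are assumptions made for this example.

```python
import math


def lemma1_bound(gamma: int, delta: float) -> float:
    """Evaluate (2*gamma + 1) * alpha**gamma with alpha = 4 * (1/4 - delta**2)."""
    if not 0 < delta < 0.5:
        raise ValueError("delta must lie strictly between 0 and 1/2")
    alpha = 4 * (0.25 - delta * delta)
    return (2 * gamma + 1) * alpha ** gamma


for n in (10**4, 10**6, 10**8):
    gamma = math.isqrt(n) // 2        # sample of about sqrt(n) elements
    delta = n ** -0.2                 # assumed slowly decreasing delta(n)
    print(n, lemma1_bound(gamma, delta))
```

With these assumed parameters the bound decreases as $n$ grows, although slowly for moderate $n$.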

Proof of Thm. 1. Let $T(n)$ denote the average number of comparisons performed by QuickXsort on an input array of length $n$ and let $S(n) = n \log n + cn + s(n)$ with $s(n) \in o(n)$ be an upper bound for the average number of comparisons performed by the algorithm X on an input array of length $n$. Without loss of generality we may assume that $s(n)$ is monotone. We are going to show by induction that

$$T(n) \le n \log n + cn + t(n)$$

for some monotonically increasing $t(n) \in o(n)$ with $s(n) \le t(n)$ which we will specify later.

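Let $\delta(n) \in o(1) \cap \Omega(n^{-\frac{1}{4}+\varepsilon})$ with $1/n \le \delta(n) \le 1/4$, i.e., $\delta$ is some function tending slowly to zero for $n \to \infty$. Because of $\delta(n) \in \Omega(n^{-\frac{1}{4}+\varepsilon})$, we see that $(2\gamma + 1)\left(1 - 4\delta^2\right)^{\gamma}$ tends to zero if $\gamma \in \Theta(\sqrt{n})$. Hence, by Lem. 1 it follows that the probability that the pivot is more than $n \cdot \delta(n)$ off the median,

$$p(n) = \Pr\left[\text{pivot} < n\left(\tfrac{1}{2} - \delta(n)\right)\right] + \Pr\left[\text{pivot} > n\left(\tfrac{1}{2} + \delta(n)\right)\right],$$

tends to zero for $n \to \infty$. In the following we write $M = \left[n\left(\tfrac{1}{2} - \delta(n)\right),\ n\left(\tfrac{1}{2} + \delta(n)\right)\right] \cap \mathbb{N}$ and $\overline{M} = \{1, \ldots, n\} \setminus M$.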


We obtain the following recurrence relation:

$$\begin{aligned}
T(n) &\le n - 1 + T_{\mathrm{pivot}}(n) + \sum_{k=1}^{n} \Pr[\text{pivot} = k] \cdot \max\{T(k-1) + S(n-k),\ T(n-k) + S(k-1)\} \\
&\le n - 1 + T_{\mathrm{pivot}}(n) + \Pr[\text{pivot} \in M] \cdot \max_{k \in M}\{T(k) + S(n-k),\ T(n-k) + S(k)\} \\
&\quad + \Pr\left[\text{pivot} \in \overline{M}\right] \cdot \max_{k \in \overline{M}}\{T(k) + S(n-k),\ T(n-k) + S(k)\}.
\end{aligned}$$

The function $f(x) = x \log x + (n - x)\log(n - x)$, $f(0) = f(n) = n \log n$, has its only minimum in the interval $[0, n]$ at $x = n/2$, i.e., for $0 < x < n/2$ it decreases monotonically and for $n/2 < x < n$ it increases monotonically. We set $\beta = \frac{n}{2} + n \cdot \delta(n)$. That means that we have $f(x) \le f(\beta)$ for $x \in M$ and $f(x) \le f(n)$ for $x \in \overline{M}$. Using this observation, the induction hypothesis, and our assumptions, we conclude

$$\begin{aligned}
\max_{k \in M}\{T(k) + S(n-k),\ T(n-k) + S(k)\} &\le \max\{f(k) + cn + t(k) + s(n-k) \mid k \in M\} \le f(\beta) + cn + t(\beta) + s(n), \\
\max_{k \in \overline{M}}\{T(k) + S(n-k),\ T(n-k) + S(k)\} &\le \max\{f(k) + cn + t(k) + s(n-k) \mid k \in \overline{M}\} \le T(n) + s(n).
\end{aligned}$$

With $p(n)$ as above we obtain:

$$T(n) \le n - 1 + T_{\mathrm{pivot}}(n) + p(n) \cdot T(n) + s(n) + (1 - p(n)) \cdot \left(f(\beta) + cn + t(\beta)\right). \qquad (1)$$

We subtract $p(n) \cdot T(n)$ on both sides and then divide by $1 - p(n)$. Let $D$ be some constant such that $D \ge 1/(1 - p(n))$ for all $n$ (which exists since $p(n) \ne 1$ for all $n$ and $p(n) \to 0$ for $n \to \infty$).

Then, we obtain

$$\begin{aligned}
T(n) &\le (1 + D \cdot p(n)) \cdot (n - 1) + D \cdot (T_{\mathrm{pivot}}(n) + s(n)) + f(\beta) + cn + t(\beta) \\
&\le (1 + D \cdot p(n)) \cdot (n - 1) + D \cdot (T_{\mathrm{pivot}}(n) + s(n)) \\
&\quad + \left(\tfrac{n}{2} - n \cdot \delta(n)\right) \cdot \left(\log(n/2) + \log(1 + 2\delta(n))\right) \\
&\quad + \left(\tfrac{n}{2} + n \cdot \delta(n)\right) \cdot \left(\log(n/2) + \log(1 + 2\delta(n))\right) + cn + t(3n/4) \\
&\le n \log n + cn + (D \cdot p(n) + 2 \cdot \delta(n)/\ln 2) \cdot n + D \cdot (T_{\mathrm{pivot}}(n) + s(n)) + t(3n/4),
\end{aligned}$$

where the last inequality follows from $\log(1 + x) = \ln(1 + x)/\ln(2) \le x/\ln(2)$ for $x \in \mathbb{R}_{>0}$. We see that $T(n) \le n \log n + cn + t(n)$ if $t(n)$ satisfies the inequality

$$(D \cdot p(n) + 2 \cdot \delta(n)/\ln 2) \cdot n + D \cdot (T_{\mathrm{pivot}}(n) + s(n)) + t(3n/4) \le t(n).$$

We choose $t(n)$ as small as possible. Inductively, we can show that for every $\varepsilon$ there is some $D_\varepsilon$ such that $t(n) < \varepsilon n + D_\varepsilon$. Hence, the theorem follows.


Does QuickXsort provide a good bound for the worst case? The obvious answer is “no”. If always the $\sqrt{n}$ smallest elements are chosen for pivot selection, a running time of $\Theta(n^{3/2})$ is obtained. However, we can prove that such a worst case is very unlikely. In fact, let $R(n)$ be the worst case number of comparisons of the algorithm X. Prop. 1 states that the probability that QuickXsort needs more than $R(n) + 6n$ comparisons decreases exponentially in $\sqrt[4]{n}$. (This bound is not tight, but since we do not aim for exact probabilities, Prop. 1 is enough for us.)

Proposition 1. Let $\varepsilon > 0$. The probability that QuickXsort needs more than $R(n) + 6n$ comparisons is less than $(3/4 + \varepsilon)^{\sqrt[4]{n}}$ for $n$ large enough.

Proof. Let $n$ be the size of the input. We say that we are in a good case if an array of size $m$ is partitioned in the interval $[m/4, 3m/4]$, i.e., if the pivot is chosen in that interval. We can obtain a bound for the desired probability by estimating the probability that we are always in such a good case until the array contains only $\sqrt{n}$ elements. For smaller arrays, we can assume an upper bound of $(\sqrt{n})^2 = n$ comparisons for the worst case. For all partitioning steps that sums up to less than $n \cdot \sum_{i \ge 0} (3/4)^i = 4n$ comparisons if we are always in a good case. We also have to consider the number of comparisons required to find the pivot element. At any stage the pivot is chosen as median of at most $\sqrt{n}$ elements. Since the median can be determined in linear time, for all stages together this sums up to less than $n$ comparisons if we are always in a good case and $n$ is large enough. Finally, for all the sorting phases with X we need at most $R(n)$ comparisons in total (that is only a rough upper bound which can be improved as in the proof of Thm. 1). Hence, we need at most $R(n) + 6n$ comparisons if a good case always occurs.

Now, we only have to estimate the probability that a good case always occurs. By Lem. 1, the probability for a good case in the first partitioning step is at least $1 - d \cdot \sqrt{n} \cdot (3/4)^{\sqrt{n}}$ for some constant $d$. We have to choose a pivot in the interval $[m/4, 3m/4]$ at most $\log(\sqrt{n})/\log(4/3) < 1.21 \log n$ times; then the array has size less than $\sqrt{n}$. We only have to consider partitioning steps where the array has size greater than $\sqrt{n}$ (if the size of the array is already less than $\sqrt{n}$ we define the probability of a good case as 1). Hence, for each of these partitioning steps we obtain that the probability for a good case is greater than $1 - d \cdot \sqrt[4]{n} \cdot (3/4)^{\sqrt[4]{n}}$. Therefore, we obtain

$$\Pr[\text{always good case}] \ge \left(1 - d \cdot \sqrt[4]{n} \cdot (3/4)^{\sqrt[4]{n}}\right)^{1.21 \log(n)} \ge 1 - 1.21 \log(n) \cdot d \cdot \sqrt[4]{n} \cdot (3/4)^{\sqrt[4]{n}}$$

by Bernoulli's inequality. For $n$ large enough we have $1.21 \log(n) \cdot d \cdot \sqrt[4]{n} \cdot (3/4)^{\sqrt[4]{n}} \le (3/4 + \varepsilon)^{\sqrt[4]{n}}$.
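The constants used in this proof are easy to check numerically. The following sketch is not from the paper: the names are hypothetical and the constant $d$ from Lem. 1 is set to 1 purely for illustration, since its actual value is not specified.

```python
import math

# Shrinking an array by a factor of at most 3/4 per step, from n down to
# sqrt(n), takes log(sqrt n) / log(4/3) steps, i.e. this multiple of log n:
steps_per_log_n = 0.5 / math.log2(4 / 3)

# Partitioning costs in the good case sum up geometrically: n * sum (3/4)^i.
geometric_sum = sum(0.75 ** i for i in range(200))


def always_good_lower_bound(n: int, d: float = 1.0) -> float:
    """1 - 1.21 log(n) * d * n^(1/4) * (3/4)^(n^(1/4)); d = 1 is an assumption."""
    q = n ** 0.25
    return 1 - 1.21 * math.log2(n) * d * q * 0.75 ** q


print(steps_per_log_n, geometric_sum, always_good_lower_bound(10**12))
```

The first value is slightly below 1.21, and the second is 4 up to rounding, matching the $4n$ bound on the partitioning cost.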

To obtain a provable bound for the worst case complexity we apply a simple trick. We fix some worst case efficient sorting algorithm Y. This might be, e.g., InPlaceMergesort. Worst case efficient means that we have an $n \log n + O(n)$ bound for the worst case number of comparisons. We choose some slowly decreasing function $\delta(n) \in o(1) \cap \Omega(n^{-\frac{1}{4}+\varepsilon})$, e.g., $\delta(n) = 1/\log n$. Now, whenever the pivot is more than $n \cdot \delta(n)$ off the median, we switch to the algorithm Y. We call this QuickXYsort. To achieve a good worst case bound, of course, we also need a good bound for algorithm X. W.l.o.g. we assume the same worst case bounds for X as for Y. Note that QuickXYsort only makes sense if one needs a provably good worst case bound. Since QuickXsort is always expected to make at most as many comparisons as QuickXYsort (under the reasonable assumption that X on average is faster than Y; otherwise one would simply use Y), in every step of the recursion QuickXsort is the better choice for the average case.

In order to obtain an efficient internal sorting algorithm, of course, Y has to be internal and X may use at most $n$ extra spaces for an array of size $n$.

Theorem 2 (QuickXYsort Worst-Case). Let X be a sorting algorithm with at most $n \log n + cn + o(n)$ comparisons in the average case and $R(n) = n \log n + dn + o(n)$ comparisons in the worst case ($d \ge c$). Let Y be a sorting algorithm with at most $R(n)$ comparisons in the worst case. Then, QuickXYsort is a sorting algorithm that performs at most $n \log n + cn + o(n)$ comparisons in the average case and $n \log n + (d + 1)n + o(n)$ comparisons in the worst case.

Proof. Since the proof is very similar to the proof of Thm. 1, we provide only a sketch. By replacing $T(n)$ by $R(n) = n \log n + dn + r(n)$ with $r(n) \in o(n)$ in the right side of (1) in the proof of Thm. 1 we obtain for the average case:

$$\begin{aligned}
T_{\mathrm{av}}(n) &\le (n - 1) + T_{\mathrm{pivot}}(n) + s(n) + p(n) \cdot (n \log n + dn + r(n)) \\
&\quad + (1 - p(n)) \cdot \Big( \left(\tfrac{n}{2} - n \cdot \delta(n)\right) \cdot \left(\log(n/2) + \log(1 + 2\delta(n)) + c\right) \\
&\qquad + \left(\tfrac{n}{2} + n \cdot \delta(n)\right) \cdot \left(\log(n/2) + \log(1 + 2\delta(n)) + c\right) + t(3n/4) \Big) \\
&\le n \log n + cn + T_{\mathrm{pivot}}(n) + (p(n) \cdot (dn + r(n)) + 2\delta(n)/\ln 2) \cdot n + s(n) + t(3n/4).
\end{aligned}$$

As in Thm. 1 the statement for the average case follows.

For the worst case, there are two possibilities: either the algorithm already fails the condition $\text{pivot} \in \left[n\left(\tfrac{1}{2} - \delta(n)\right),\ n\left(\tfrac{1}{2} + \delta(n)\right)\right]$ in the first partitioning step or it does not. In the first case, it is immediate that we have a worst case bound of $n \log n + dn + n + o(n)$, which also is tight. Note that we assume that we can choose the pivot element in time $o(n)$, which is no real restriction, since the median of $\Theta(\sqrt{n})$ elements can be found in $\Theta(\sqrt{n})$ time. In the second case, we assume by induction that $T_{\mathrm{worst}}(m) \le m \log m + dm + m + u(m)$ for $m < n$ for some $u(m) \in o(m)$ and obtain a recurrence relation similar to (1) in the proof of Thm. 1:

$$T_{\mathrm{worst}}(n) \le n - 1 + T_{\mathrm{pivot}}(n) + R\left(\tfrac{n}{2} - n \cdot \delta(n)\right) + T_{\mathrm{worst}}\left(\tfrac{n}{2} + n \cdot \delta(n)\right).$$

By the same arguments as above the result follows.

3 QuickWeakHeapsort

In this section we consider QuickWeakHeapsort as a first example of QuickXsort. We start by introducing weak heaps and then continue by describing WeakHeapsort and a novel external version of it. This external version is a good candidate for QuickXsort and yields an efficient sorting algorithm that uses approximately $n \log n - 1.2n$ comparisons (this value is only a rough estimate and neither a bound from below nor above). A drawback of WeakHeapsort and its variants is that they require one extra bit per element. The exposition also serves as an intermediate step towards our implementation of MergeInsertion, where the weak-heap data structure will be used as a building block.

Conceptually, a weak heap (see Fig. 1) is a binary tree satisfying the following conditions:

(1) The root of the entire tree has no left child.

(2) Except for the root, the nodes that have at most one child are in the last two levels only. Leaves at the last level can be scattered, i.e., the last level is not necessarily filled from left to right.

(3) Each node stores an element that is smaller than or equal to every element stored in its right subtree.

Figure 1: A weak heap (reverse bits are set for grey nodes; above the nodes are array indices).

From the first two properties we deduce that the height of a weak heap that has $n$ elements is $\lceil \log n \rceil + 1$. The third property is called the weak-heap ordering or half-tree ordering. In particular, this property enforces no relation between an element in a node and those stored in its left subtree. On the other hand, it implies that any node together with its right subtree forms a weak heap on its own. In an array-based implementation, besides the element array $s$, an array $r$ of reverse bits is used, i.e., $r_i \in \{0, 1\}$ for $i \in \{0, \ldots, n - 1\}$. The root has index 0. The array index of the left child of $s_i$ is $2i + r_i$, the array index of the right child is $2i + 1 - r_i$, and the array index of the parent is $\lfloor i/2 \rfloor$ (assuming that $i \ne 0$). Using the fact that the indices of the left and right children of $s_i$ are exchanged when flipping $r_i$, subtrees can be reversed in constant time by setting $r_i \leftarrow 1 - r_i$.
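The index arithmetic of the array-based representation can be written down directly. This is a minimal sketch; the helper names are hypothetical, and `r` is a plain Python list of 0/1 reverse bits.

```python
def left_child(i, r):
    """Array index of the left child of node i: 2i + r_i."""
    return 2 * i + r[i]


def right_child(i, r):
    """Array index of the right child of node i: 2i + 1 - r_i."""
    return 2 * i + 1 - r[i]


def parent(i):
    """Array index of the parent of node i (requires i != 0)."""
    return i // 2


def reverse_subtrees(i, r):
    """Exchange left and right subtree of node i in constant time."""
    r[i] = 1 - r[i]


r = [0, 0, 1]
print(left_child(2, r), right_child(2, r))   # children of node 2 with r_2 = 1
reverse_subtrees(2, r)
print(left_child(2, r), right_child(2, r))   # roles exchanged after the flip
```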

The distinguished ancestor ($d\text{-}ancestor(j)$) of $s_j$ for $j \ne 0$ is recursively defined as the parent of $s_j$ if $s_j$ is a right child, and the distinguished ancestor of the parent of $s_j$ if $s_j$ is a left child. The distinguished ancestor of $s_j$ is the first element on the path from $s_j$ to the root which is known to be smaller than or equal to $s_j$ by (3). Moreover, any subtree rooted by $s_j$, together with the distinguished ancestor $s_i$ of $s_j$, forms again a weak heap with root $s_i$ by considering $s_j$ as right child of $s_i$.

The basic operation for creating a weak heap is the join operation, which combines two weak heaps into one. Let $i$ and $j$ be two nodes in a weak heap such that $s_i$ is smaller than or equal to every element in the left subtree of $s_j$. Conceptually, $s_j$ and its right subtree form a weak heap, while $s_i$ and the left subtree of $s_j$ form another weak heap. (Note that $s_i$ is not allowed to be in the subtree with root $s_j$.) The result of join is a weak heap with root at position $i$. If $s_j < s_i$, the two elements are swapped and $r_j$ is flipped. As a result, the new element $s_j$ will be smaller than or equal to every element in its right subtree, and the new element $s_i$ will be smaller than or equal to every element in the subtree rooted at $s_j$. To sum up, join requires constant time and involves one element comparison and a possible element swap in order to combine two weak heaps into a new one.

The construction of a weak heap consisting of $n$ elements requires $n - 1$ comparisons. In the standard bottom-up construction of a weak heap the nodes are visited one by one. Starting with the last node in the array and moving to the front, the two weak heaps rooted at a node and its distinguished ancestor are joined. The amortized cost to get from a node to its distinguished ancestor is $O(1)$ [5].
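The bottom-up construction can be sketched in a few lines. This is an illustrative sketch with hypothetical function names; it uses the fact that node $j$ is a left child exactly when its parity equals the reverse bit of its parent.

```python
def d_ancestor(j, r):
    """Distinguished ancestor of node j (j != 0): climb while j is a left child."""
    while (j & 1) == r[j >> 1]:
        j >>= 1
    return j >> 1


def join(s, r, i, j):
    """Join the weak heaps rooted at i and j using one element comparison."""
    if s[j] < s[i]:
        s[i], s[j] = s[j], s[i]
        r[j] = 1 - r[j]


def build_weak_heap(s):
    """Turn list s into a weak heap in place; return (reverse bits, comparisons)."""
    n = len(s)
    r = [0] * n
    comparisons = 0
    for j in range(n - 1, 0, -1):       # last node first, moving to the front
        join(s, r, d_ancestor(j, r), j)
        comparisons += 1
    return r, comparisons


s = [5, 3, 8, 1, 9, 2, 7]
r, c = build_weak_heap(s)
print(s[0], c)   # minimum at the root after n - 1 comparisons
```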

When using weak heaps for sorting, the minimum is removed and the weak heap condition restored until the weak heap becomes empty. After extracting an element from the root, first the special path from the root is traversed top-down, and then, in a bottom-up process the weak-heap property is restored using at most $\lceil \log n \rceil$ join operations. (The special path is established by going once to the right and then to the left as far as possible.) Hence, extracting the minimum requires at most $\lceil \log n \rceil$ comparisons.
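Putting construction and extraction together gives a compact sketch of sorting with a weak heap. Several choices here are assumptions made for illustration rather than the paper's implementation: the function names are hypothetical, the extracted minima are appended to a separate Python list, and the root is refilled with the last live element of the array (the classical variant), not with the hole-filling procedure of ExternalWeakHeapsort.

```python
import random


def d_ancestor(j, r):
    """Climb from node j while it is a left child; return the parent reached."""
    while (j & 1) == r[j >> 1]:
        j >>= 1
    return j >> 1


def join(a, r, i, j):
    """One comparison; swap and flip the reverse bit of j if a[j] is smaller."""
    if a[j] < a[i]:
        a[i], a[j] = a[j], a[i]
        r[j] = 1 - r[j]


def weak_heapsort(items):
    """Return a new sorted list using a min-ordered weak heap."""
    a = list(items)
    n = len(a)
    if n < 2:
        return a
    r = [0] * n
    for j in range(n - 1, 0, -1):            # bottom-up construction
        join(a, r, d_ancestor(j, r), j)
    out = []
    for m in range(n - 1, 0, -1):            # m: index of the last live node
        out.append(a[0])                     # extract the minimum
        a[0] = a[m]                          # refill the root, heap is a[0:m]
        if m > 1:
            x = 1                            # once to the right ...
            while 2 * x + r[x] < m:          # ... then left as far as possible
                x = 2 * x + r[x]
            while x > 0:                     # restore the ordering bottom-up
                join(a, r, 0, x)
                x //= 2
    out.append(a[0])
    return out


data = [random.randrange(100) for _ in range(20)]
print(weak_heapsort(data) == sorted(data))
```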

Now, we introduce a modification to the standard procedure described by Dutton [3], which has a slightly improved performance, but requires extra space. We call this modified algorithm ExternalWeakHeapsort. This is because it needs an extra output array, to which the elements extracted from the weak heap are moved. On average ExternalWeakHeapsort requires fewer comparisons than RelaxedWeakHeapsort [4]. Integrated in QuickXsort we can implement it without extra space other than the extra bits $r$ and some other extra bits. We introduce an additional array active and weaken the requirements of a weak heap: we also allow nodes on other than the last two levels to have fewer than two children. Nodes where the active bit is set to false are considered to have been removed. ExternalWeakHeapsort works as follows: First, a usual weak heap is constructed using $n - 1$ comparisons. Then, until the weak heap becomes empty, the root, which is the minimal element, is moved to the output array and the resulting hole has to be filled with the minimum of the remaining elements (so far the only difference to normal WeakHeapsort is that there is a separate output area).

The hole is filled by searching the special path from the root to a node $x$ which has no left child. Note that the nodes on the special path are exactly the nodes having the root as distinguished ancestor. Finding the special path does not need any comparisons, since one only has to follow the reverse bits. Next, the element of the node $x$ is moved to the root, leaving a hole. If $x$ has a right subtree (i.e., if $x$ is the root of a weak heap with more than one element), this hole is filled by applying the hole-filling algorithm recursively to the weak heap with root $x$. Otherwise, the active bit of $x$ is set to false. Now, the root of the whole weak heap together with the subtree rooted by $x$ forms a weak heap. However, it remains to restore the weak heap condition for the whole weak heap. Except for the root and $x$, all nodes on the special path together with their right subtrees form weak heaps. Following the special path upwards, these weak heaps are joined with their distinguished ancestor as during the weak heap construction (i.e., successively they are joined with the weak heap consisting of the root and the already treated nodes on the special path together with their subtrees). Once all the weak heaps on the special path are joined, the whole array forms a weak heap again.

Theorem 3. For $n = 2^k$ ExternalWeakHeapsort performs exactly the same comparisons as Mergesort applied on a fixed permutation of the same input array.

Proof. First, recall the Mergesort algorithm: The left half and the right half of the array are sorted recursively and then the two subarrays are merged together by always comparing the smallest elements of both arrays and moving the smaller one to the separate output area. Now, we move to WeakHeapsort. Consider the tree as it is initialized with all reverse bits set to false. Let $r$ be the root and $y$ its only child (not the elements but the positions in the tree). We call $r$ together with the left subtree of $y$ the left part of the tree and we call $y$ together with its right subtree the right part of the tree. That means the left part and the right part form weak heaps on their own. The only time an element is moved from the right to the left part or vice versa is when the data elements $s_r$ and $s_y$ are exchanged. However, always one of the data elements of $r$ and $y$ comes from the right part and one from the left part. After extracting the minimum $s_r$, it is replaced by the smallest remaining element of the part $s_r$ came from. Then, the new $s_r$ and $s_y$ are compared again and so on. Hence, for extracting the elements in sorted order from the weak heap the following happens. First, the smallest elements of the left and right part are determined, then they are compared, and finally the smaller one is moved to the output area. This procedure repeats until the weak heap is empty. This is exactly how the recursion of Mergesort works: always the smallest elements of the left and right part are compared and the smaller one is moved to the output area. If $n = 2^k$, then the left and right parts for Mergesort and WeakHeapsort have the same sizes.

Figure 2: First the two halves of the left part are sorted moving them from one place to another. Then, they are merged to the original place.

By [10, 5.2.4–13] we obtain the following corollary.

Corollary 1 (Average Case ExternalWeakHeapsort). For $n = 2^k$ the algorithm ExternalWeakHeapsort uses approximately $n \log n - 1.26n$ comparisons in the average case.

If $n$ is not a power of two, the sizes of left and right parts of WeakHeapsort are less balanced than the left and right parts of ordinary Mergesort and one can expect a slightly higher number of comparisons. For QuickWeakHeapsort, the half of the array which is not sorted by ExternalWeakHeapsort is used as output area. Whenever the root is moved to the output area, the element that occupied that place before is inserted as a dummy element at the position where the active bit is set to false. Applying Thm. 1, we obtain the rough estimate of $n \log n - 1.2n$ comparisons for the average case of QuickWeakHeapsort.

4 QuickMergesort

As another example for QuickXsort we consider QuickMergesort. For the Mergesort part we use standard (top-down) Mergesort, which can be implemented using $m$ extra spaces to merge two arrays of length $m$ (there are other methods like in [12] which require less space, but for our purposes this is good enough). The procedure is depicted in Fig. 2. We sort the larger half of the partitioned array with Mergesort as long as we have one third of the whole array as temporary memory left; otherwise we sort the smaller part with Mergesort. Hence, the part which is not sorted by Mergesort always provides enough temporary space. When a data element should be moved to or from the temporary space, it is swapped with the element occupying the respective position. Since Mergesort moves through the data from left to right, it is always known which are the elements to be sorted and which are the dummy elements. Depending on the implementation, the extra space needed is $O(\log n)$ words for the recursion stack of Mergesort. By avoiding recursion this can even be reduced to $O(1)$. Thm. 1 together with [10, 5.2.4–13] yields the next result.


Theorem 4 (Average Case QuickMergesort). QuickMergesort is an internal sorting algorithm that performs at most $n \log n - 1.26n + o(n)$ comparisons on average.
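The swap-based use of temporary space described before Theorem 4 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `merge_with_swaps`, its parameters, and the convention that the scratch area starts at index `buf` and does not overlap `a[lo:hi]` are assumptions made for this example.

```python
def merge_with_swaps(a, lo, mid, hi, buf):
    """Merge the sorted runs a[lo:mid] and a[mid:hi] in place.

    The slice a[buf:buf + (mid - lo)] serves as scratch space (assumed to be
    disjoint from a[lo:hi]). Every move is a swap, so the scratch ("dummy")
    elements are only permuted and never overwritten.
    """
    n_left = mid - lo
    # Move the left run into the scratch area by swapping.
    for i in range(n_left):
        a[lo + i], a[buf + i] = a[buf + i], a[lo + i]

    i, end_i = buf, buf + n_left   # read position in the scratch area
    j = mid                        # read position in the right run
    k = lo                         # write position in the output
    while i < end_i and j < hi:
        if a[j] < a[i]:
            a[k], a[j] = a[j], a[k]
            j += 1
        else:
            a[k], a[i] = a[i], a[k]
            i += 1
        k += 1
    # Remaining elements of the left run; a remaining right run is already in place.
    while i < end_i:
        a[k], a[i] = a[i], a[k]
        i += 1
        k += 1


if __name__ == "__main__":
    data = [1, 4, 7, 2, 3, 9, 100, 101, 102]
    merge_with_swaps(data, 0, 3, 6, 6)
    print(data[:6], sorted(data[6:]))
```

The slots between the write position `k` and the read position `j` always hold dummy elements, and their number equals the number of elements still waiting in the scratch area, so a swap never displaces an element that still has to be merged.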

We can do even better if we sort small subarrays with another algorithm Z requiring fewer comparisons but extra space and more moves, e.g., Insertionsort or MergeInsertion. If we use $O(\log n)$ elements for the base case of Mergesort, we have to call Z at most $O(n/\log n)$ times. In this case we can allow additional operations of Z like moves in the order of $O(n^2)$, given that $O((n/\log n) \cdot \log^2 n) = O(n \log n)$.

Note that for the next theorem we only need that the size of the base cases grows as $n$ grows. Nevertheless, $O(\log n)$ is the largest growing value we can choose if we apply a base case algorithm with $\Theta(n^2)$ moves and want to achieve an $O(n \log n)$ overall running time.

Theorem 5 (QuickMergesort with Base Case). Let Z be some sorting algorithm with $n \log n + en + o(n)$ comparisons on the average and other operations taking at most $O(n^2)$ time. If base cases of size $O(\log n)$ are sorted with Z, QuickMergesort uses at most $n \log n + en + o(n)$ comparisons and $O(n \log n)$ other instructions on the average.

Proof. By Thm. 1 and the preceding remark, the only thing we have to prove is that Mergesort with base case Z requires on average at most $n \log n + en + o(n)$ comparisons, given that Z needs at most $U(n) = n \log n + en + o(n)$ comparisons on average. The latter means that for every $\epsilon > 0$ we have $U(n) \le n \log n + (e + \epsilon) \cdot n$ for $n$ large enough.

Let $S_k(m)$ denote the average case number of comparisons of Mergesort with base cases of size $k$ sorted with Z and let $\epsilon > 0$. Since $\log n$ grows as $n$ grows, we have that $S_{\log n}(m) = U(m) \le m \log m + (e + \epsilon) \cdot m$ for $n$ large enough and $(\log n)/2 < m \le \log n$. For $m > \log n$ we have $S_{\log n}(m) \le 2 \cdot S_{\log n}(m/2) + m$ and by induction we see that $S_{\log n}(m) \le m \log m + (e + \epsilon) \cdot m$. Hence, also $S_{\log n}(n) \le n \log n + (e + \epsilon) \cdot n$ for $n$ large enough.
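The induction step in this proof can be written out explicitly. The following display is a sketch under the proof's own simplification that both halves have size $m/2$; $\epsilon$ denotes the arbitrary positive slack used in the proof, and the induction hypothesis is applied to $m/2$.

$$
\begin{aligned}
S_{\log n}(m) &\le 2 \cdot S_{\log n}(m/2) + m \\
&\le 2 \cdot \Bigl( \tfrac{m}{2} \log \tfrac{m}{2} + (e + \epsilon) \cdot \tfrac{m}{2} \Bigr) + m \\
&= m (\log m - 1) + (e + \epsilon) \cdot m + m \\
&= m \log m + (e + \epsilon) \cdot m .
\end{aligned}
$$

The term $-m$ coming from $\log(m/2) = \log m - 1$ cancels against the at most $m$ comparisons charged to the merge, which is why the linear coefficient $e + \epsilon$ of the base case carries over unchanged to all larger sizes.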

Using Insertionsort we obtain the following result. Here, $\ln$ denotes the natural logarithm. As we did not find a result in the literature, we also provide a proof. Recall that Insertionsort inserts the elements one by one into the already sorted sequence by binary search.

Proposition 2 (Average Case of Insertionsort). The sorting algorithm Insertionsort needs $n \log n - 2 \ln 2 \cdot n + c(n) \cdot n + O(\log n)$ comparisons on the average, where $c(n) \in [-0.005, 0.005]$.

Corollary 2 (QuickMergesort with Base Case Insertionsort). If we use Insertionsort as base case, QuickMergesort uses at most $n \log n - 1.38n + o(n)$ comparisons and $O(n \log n)$ other instructions on the average.

Proof of Prop. 2. First, we take a look at the average number of comparisons $T_{\mathrm{InsAvg}}(k)$ to insert one element into a sorted array of $k - 1$ elements by binary insertion.

To insert a new element into $k - 1$ elements either needs $\lceil \log k \rceil - 1$ or $\lceil \log k \rceil$ comparisons. There are $k$ positions where the element to be inserted can end up, each of which is equally likely. For $2^{\lceil \log k \rceil} - k$ of these positions $\lceil \log k \rceil - 1$ comparisons are needed. For the other $k - (2^{\lceil \log k \rceil} - k) = 2k - 2^{\lceil \log k \rceil}$ positions $\lceil \log k \rceil$ comparisons are needed. This means

$$T_{\mathrm{InsAvg}}(k) = \frac{(2^{\lceil \log k \rceil} - k) \cdot (\lceil \log k \rceil - 1) + (2k - 2^{\lceil \log k \rceil}) \cdot \lceil \log k \rceil}{k} = \lceil \log k \rceil + 1 - \frac{2^{\lceil \log k \rceil}}{k}$$

comparisons are needed on average. By [10, 5.3.1–(3)], we obtain for the average case for sorting $n$ elements:

$$T_{\mathrm{InsSortAvg}}(n) = \sum_{k=1}^{n} T_{\mathrm{InsAvg}}(k) = \sum_{k=1}^{n} \left( \lceil \log k \rceil + 1 - \frac{2^{\lceil \log k \rceil}}{k} \right) = n \cdot \lceil \log n \rceil - 2^{\lceil \log n \rceil} + 1 + n - \sum_{k=1}^{n} \frac{2^{\lceil \log k \rceil}}{k}.$$

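We examine the last sum separately. In the following we write $H(n) = \sum_{k=1}^{n} \frac{1}{k} = \ln n + \gamma + O(\frac{1}{n})$ for the harmonic sum with $\gamma$ the Euler constant.

$$
\begin{aligned}
\sum_{k=1}^{n} \frac{2^{\lceil \log k \rceil}}{k}
&= 1 + \sum_{i=0}^{\lceil \log n \rceil - 2} \sum_{\ell=1}^{2^i} \frac{2^{i+1}}{2^i + \ell} + \sum_{\ell = 2^{\lceil \log n \rceil - 1} + 1}^{n} \frac{2^{\lceil \log n \rceil}}{\ell} \\
&= 1 + \sum_{i=0}^{\lceil \log n \rceil - 2} 2^{i+1} \cdot \left( H(2^{i+1}) - H(2^i) \right) + 2^{\lceil \log n \rceil} \cdot \left( H(n) - H(2^{\lceil \log n \rceil - 1}) \right) \\
&= \sum_{i=0}^{\lceil \log n \rceil - 2} 2^{i+1} \cdot \left( \ln(2^{i+1}) + \gamma - \ln(2^i) - \gamma \right) + \left( \ln(n) + \gamma - \ln(2^{\lceil \log n \rceil - 1}) - \gamma \right) \cdot 2^{\lceil \log n \rceil} + O(1) \\
&= \ln 2 \cdot \sum_{i=0}^{\lceil \log n \rceil - 2} 2^{i+1} \cdot (i + 1 - i) + \left( \log(n) \cdot \ln 2 - (\lceil \log n \rceil - 1) \cdot \ln 2 \right) \cdot 2^{\lceil \log n \rceil} + O(1) \\
&= \ln 2 \cdot \left( 2 \cdot \left( 2^{\lceil \log n \rceil - 1} - 1 \right) + (\log n - \lceil \log n \rceil + 1) \cdot 2^{\lceil \log n \rceil} \right) + O(1) \\
&= \ln 2 \cdot (2 + \log n - \lceil \log n \rceil) \cdot 2^{\lceil \log n \rceil} + O(1).
\end{aligned}
$$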

Hence, we have

$$T_{\mathrm{InsSortAvg}}(n) = n \cdot \lceil \log n \rceil - 2^{\lceil \log n \rceil} + n - \ln 2 \cdot (2 + \log n - \lceil \log n \rceil) \cdot 2^{\lceil \log n \rceil} + O(1).$$

In order to obtain a numeric bound for $T_{\mathrm{InsSortAvg}}(n)$, we compute $(T_{\mathrm{InsSortAvg}}(n) - n \log n)/n$ and then replace $\lceil \log n \rceil - \log n$ by $x$. This yields a function

$$x \mapsto x - 2^x + 1 - \ln 2 \cdot (2 - x) \cdot 2^x,$$

which oscillates between $-1.381$ and $-1.389$ for $0 \le x < 1$. For $x = 0$, its value is $-2 \ln 2 \approx -1.386$.
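The quantities in this proof are easy to evaluate numerically. The following sketch sums the per-insertion average from the proof and compares the resulting linear coefficient with $-2 \ln 2$. The function names (`ceil_log2`, `t_ins_avg`, `t_ins_sort_avg`, `linear_coefficient`, `g`) are chosen for this illustration only and do not come from the paper.

```python
import math


def ceil_log2(k):
    """Smallest integer c with 2**c >= k, for k >= 1."""
    return (k - 1).bit_length()


def t_ins_avg(k):
    """Average comparisons to insert one element so that k elements result,
    assuming all k final positions are equally likely."""
    c = ceil_log2(k)
    return c + 1 - 2 ** c / k


def t_ins_sort_avg(n):
    """Average comparisons of Insertionsort with binary search on n elements."""
    return sum(t_ins_avg(k) for k in range(1, n + 1))


def linear_coefficient(n):
    """(T_InsSortAvg(n) - n log n) / n, the coefficient of the linear term."""
    return (t_ins_sort_avg(n) - n * math.log2(n)) / n


def g(x):
    """Limit function from the proof, with x = ceil(log n) - log n."""
    return x - 2 ** x + 1 - math.log(2) * (2 - x) * 2 ** x


if __name__ == "__main__":
    for n in (1000, 1024, 1500, 4096):
        print(n, round(linear_coefficient(n), 4))
    print("g(0) =", g(0), " -2 ln 2 =", -2 * math.log(2))
```

For finite $n$ the computed coefficient deviates from the limit function by the lower-order terms of the proposition, so small differences in the third decimal place are expected.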

Base cases of growing size always lead to a constant factor overhead in running time if an algorithm with a quadratic number of total operations is applied. Therefore, in the experiments we will also consider constant size base cases, which offer a slightly worse bound for the number of comparisons but are faster in practice. We do not analyze them separately, since the preferred choice for the size depends on the type of data to be sorted and the system on which the algorithms run.


5 MergeInsertion

MergeInsertion by Ford and Johnson [7] is one of the best sorting algorithms in terms of number of comparisons. Hence, it can be applied for sorting base cases of QuickMergesort, which yields even better results than Insertionsort. Therefore, we want to give a brief description of the algorithm and our implementation. While the description is simple, MergeInsertion is not easy to implement efficiently. Our implementation is based on weak heaps and uses $n \log n + n$ extra bits.

Algorithmically, MergeInsertion($s_0, \ldots, s_{n-1}$) can be described as follows (an intuitive example for $n = 21$ can be found in [10]).

1. Arrange the input such that $s_i \ge s_{i+\lfloor n/2 \rfloor}$ for $0 \le i < \lfloor n/2 \rfloor$ with one comparison per pair. Let $a_i = s_i$ and $b_i = s_{i+\lfloor n/2 \rfloor}$ for $0 \le i < \lfloor n/2 \rfloor$, and $b_{\lfloor n/2 \rfloor} = s_{n-1}$ if $n$ is odd.

2. Sort the values $a_0, \ldots, a_{\lfloor n/2 \rfloor - 1}$ recursively with MergeInsertion.

3. Rename the solution as follows: $b_0 \le a_0 \le a_1 \le \cdots \le a_{\lfloor n/2 \rfloor - 1}$ and insert the elements $b_1, \ldots, b_{\lceil n/2 \rceil - 1}$ via binary insertion, following the ordering $b_2, b_1, b_4, b_3, b_{10}, b_9, \ldots, b_5, \ldots, b_{t_{k-1}}, b_{t_{k-1}-1}, \ldots, b_{t_{k-2}+1}, b_{t_k}, \ldots$ into the main chain, where $t_k = (2^{k+1} + (-1)^k)/3$.
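The insertion order of step 3 can be generated mechanically from the numbers $t_k$. The sketch below is an illustration only: it assumes the 0-based indexing of the explicit prefix $b_2, b_1, b_4, b_3, b_{10}, \ldots, b_5$ given above, and the names `t` and `insertion_order` are hypothetical helpers, not part of the authors' implementation.

```python
def t(k):
    """t_k = (2^(k+1) + (-1)^k) / 3; the numerator is always divisible by 3."""
    return (2 ** (k + 1) + (-1) ** k) // 3


def insertion_order(num_b):
    """Indices i of the elements b_i (1 <= i < num_b) in insertion order.

    Block k consists of b_{t_k - 1}, ..., b_{t_{k-1}} (0-based indices),
    inserted from the highest index downwards. The last block may be
    incomplete and is cut off at num_b - 1.
    """
    order = []
    k = 2
    while t(k - 1) < num_b:
        highest = min(t(k), num_b) - 1
        order.extend(range(highest, t(k - 1) - 1, -1))
        k += 1
    return order


if __name__ == "__main__":
    print([t(k) for k in range(6)])
    print(insertion_order(11))
```

With this block structure every element of the same block is inserted into a chain of the same maximal length, which is the property that makes each block cost the same number of comparisons in the worst case.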

Due to the diﬀerent renamings, the recursion, and the change of link structure, the design of

an eﬃcient implementation is not immediate. Our proposed implementation of MergeInsertion

is based on a tournament tree representation with weak heaps as in Sect. 3. The pseudo-code

implementations for all the operations to construct a tournament tree with a weak heap and to

access the partners in each round are shown in Fig. 6 in the appendix. (Note that for simplicity in

the above formulation the indices and the order are reversed compared to our implementation.)

One main subroutine of MergeInsertion is binary insertion. The call binary-insert($x, y, z$) inserts the element at position $z$ between position $x - 1$ and $x + y$ by binary insertion. (The pseudo-code implementation of the binary search routine is shown in Fig. 7 in the appendix.) In this routine we do not move the data elements themselves, but we use an additional index array $\phi_0, \ldots, \phi_{n-1}$ to point to the elements contained in the weak heap tournament tree and move these indirect addresses. This approach has the advantage that the relations stored in the tournament tree are preserved.
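The idea of inserting through an index array instead of moving the data can be sketched as follows. This is not the routine of Fig. 7: the signature `binary_insert(s, phi, lo, hi, t)` and the half-open range convention are assumptions made for this example, and the weak heap itself is left out.

```python
def binary_insert(s, phi, lo, hi, t):
    """Insert the index stored at phi[t] into the index range phi[lo:hi].

    Assumes hi <= t and that s[phi[lo]], ..., s[phi[hi - 1]] is sorted.
    Only entries of phi are moved; the data array s is never modified.
    Returns the position in phi where the index was placed.
    """
    x = phi[t]
    left, right = lo, hi
    while left < right:
        mid = (left + right) // 2
        if s[x] > s[phi[mid]]:
            left = mid + 1
        else:
            right = mid
    # Shift phi[left:t] one slot to the right and place x at position left.
    phi[left + 1:t + 1] = phi[left:t]
    phi[left] = x
    return left


if __name__ == "__main__":
    s = [50, 10, 40, 20, 30]
    phi = [1, 3, 2, 0, 4]          # s[phi[0..3]] = 10, 20, 40, 50
    pos = binary_insert(s, phi, 0, 4, 4)
    print(pos, phi, [s[i] for i in phi])
```

Because `s` stays untouched, any structure that refers to positions in `s` (in the paper, the weak heap tournament tree) remains valid after the insertion.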

The most important procedure for MergeInsertion is the organization of the calls for binary-insert. After adapting the addresses for the elements $b_i$ (w.r.t. the above description) in the second part of the array, the algorithm calls the binary insertion routine with appropriate indices. Note that we always use $k$ comparisons for all elements of the $k$-th block (i.e., the elements $b_{t_k}, \ldots, b_{t_{k-1}+1}$) even if there might be the chance to save one comparison. By introducing an additional array, which for each $b_i$ contains the current index of $a_i$, we can exploit the observation that not always $k$ comparisons are needed to insert an element of the $k$-th block. In the following we call this the improved variant. The pseudo-code of the basic variant is shown in Fig. 3. The last sequence is not complete and is thus tackled in a special case.

Theorem 6 (Average Case of MergeInsertion). The sorting algorithm MergeInsertion needs $n \log n - c(n) \cdot n + O(\log n)$ comparisons on the average, where $c(n) \ge 1.3999$.

Corollary 3 (QuickMergesort with Base Case MergeInsertion). When using MergeInsertion as base case, QuickMergesort needs at most $n \log n - 1.3999n + o(n)$ comparisons and $O(n \log n)$ other instructions on the average.


procedure: merge(m: integer)
global: φ array of n integers imposed by weak-heap
for l ← 0 to ⌊m/2⌋ − 1
    φ_{m−odd(m)−l−1} ← d-child(φ_l, m − odd(m));
k ← 1; e ← 2^k; c ← f ← 0;
while e < m
    k ← k + 1; e ← 2e;
    l ← ⌈m/2⌉ + f; f ← f + (t_k − t_{k−1});
    for i ← 0 to (t_k − t_{k−1}) − 1
        c ← c + 1;
        if c = ⌈m/2⌉ then
            return;
        if t_k > ⌈m/2⌉ − 1 then
            binary-insert(i + 1 − odd(m), l, m − 1);
        else
            binary-insert(⌊m/2⌋ − f + i, e − 1, ⌊m/2⌋ + f);

Figure 3: Merging step in MergeInsertion with $t_k = (2^{k+1} + (-1)^k)/3$, odd($m$) = $m \bmod 2$, and d-child($\phi_i, n$) returns the highest index less than $n$ of a grandchild of $\phi_i$ in the weak heap (i.e., d-child($\phi_i, n$) = index of the bottommost element in the weak heap which has d-ancestor = $\phi_i$ and index < $n$).

Proof of Thm. 6. According to Knuth [10], MergeInsertion requires at most $W(n) = n \log n - (3 - \log 3) n + n (y + 1 - 2^y) + O(\log n)$ comparisons in the worst case, where $y = y(n) = \lceil \log(3n/4) \rceil - \log(3n/4) \in [0, 1)$. In the following we want to analyze the average savings relative to the worst case. Therefore, let $F(n)$ denote the average number of comparisons of the insertion steps of MergeInsertion, i.e., all comparisons minus the efforts for the weak heap construction, which always takes place. Then, we obtain the recurrence relation

$$F(n) = F(\lfloor n/2 \rfloor) + G(\lceil n/2 \rceil), \quad \text{with} \quad G(m) = (k_m - \alpha_m) \cdot (m - t_{k_m - 1}) + \sum_{j=1}^{k_m - 1} j \cdot (t_j - t_{j-1}),$$

with $k_m$ such that $t_{k_m - 1} \le m < t_{k_m}$ and some $\alpha_m \in [0, 1]$. As we do not analyze the improved version of the algorithm, the insertion of elements with index less than or equal to $t_{k_m - 1}$ always requires the same number of comparisons. Thus, the term $\sum_{j=1}^{k_m - 1} j \cdot (t_j - t_{j-1})$ is independent of the data. However, inserting an element after $t_{k_m - 1}$ may either need $k_m$ or $k_m - 1$ comparisons. This is where $\alpha_m$ comes from. Note that $\alpha_m$ only depends on $m$. We split $F(n)$ into $F'(n) + F''(n)$ with

$$F'(n) = F'(\lfloor n/2 \rfloor) + G'(\lceil n/2 \rceil) \quad \text{and} \quad G'(m) = (k_m - \alpha_m) \cdot (m - t_{k_m - 1}) \quad \text{with } k_m \text{ such that } t_{k_m - 1} \le m < t_{k_m},$$

and

$$F''(n) = F''(\lfloor n/2 \rfloor) + G''(\lceil n/2 \rceil) \quad \text{and} \quad G''(m) = \sum_{j=1}^{k_m - 1} j \cdot (t_j - t_{j-1}) \quad \text{with } k_m \text{ such that } t_{k_m - 1} \le m < t_{k_m}.$$

For the average case analysis, we have that $F''(n)$ is independent of the data. For $n = (4/3) \cdot 2^k$ we have $G'(n) = 0$, and hence, $F'(n) = 0$. Since otherwise $G'(n)$ is non-negative, this proves that exactly for $n = (4/3) \cdot 2^k$ the average case matches the worst case.

Now, we have to estimate $F'(n)$ for arbitrary $n$. We have to consider the calls to binary insertion more closely. To insert a new element into an array of $m - 1$ elements either needs $\lceil \log m \rceil - 1$ or $\lceil \log m \rceil$ comparisons. For a moment assume that the element is inserted at every position with the same probability. Under this assumption the analysis in the proof of Prop. 2 is valid, which states that

$$T_{\mathrm{InsAvg}}(m) = \lceil \log m \rceil + 1 - \frac{2^{\lceil \log m \rceil}}{m}$$

comparisons are needed on average.

The problem is that in our case the probability at which position an element is inserted is not

uniformly distributed. However, it is monotonically increasing with the index in the array (indices

as in our implementation). Informally speaking, this is because if an element is inserted further

to the right, then for the following elements there are more possibilities to be inserted than if the

element is inserted on the left.

Now,

binary-insert

can be implemented such that for an odd number of positions the next

comparison is made such that the larger half of the array is the one containing the positions with

lower probabilities. (In our case, this is the part with the lower indices – see Fig. 7.) That means

the less probable positions lie on rather longer paths in the search tree, and hence, the average path

length is better than in the uniform case. Therefore, we may assume a uniform distribution in the

following as an upper bound.

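In each of the recursion steps we have $\lceil n/2 \rceil - t_{k_{\lceil n/2 \rceil} - 1}$ calls to binary insertion into sets of size $\lceil n/2 \rceil + t_{k_{\lceil n/2 \rceil} - 1} - 1$ elements each. Hence, for inserting one element, the difference to the worst case is $\frac{2^{\lceil \log(\lceil n/2 \rceil + t_{k_{\lceil n/2 \rceil} - 1}) \rceil}}{\lceil n/2 \rceil + t_{k_{\lceil n/2 \rceil} - 1}} - 1$. Summing up, we obtain for the average savings $S(n) = W(n) - (F(n) + \text{weak-heap-construction}(n))$ w.r.t. the worst case number $W(n)$ the recurrence

$$S(n) \ge S(\lfloor n/2 \rfloor) + \left( \lceil n/2 \rceil - t_{k_{\lceil n/2 \rceil} - 1} \right) \cdot \left( \frac{2^{\lceil \log(\lceil n/2 \rceil + t_{k_{\lceil n/2 \rceil} - 1}) \rceil}}{\lceil n/2 \rceil + t_{k_{\lceil n/2 \rceil} - 1}} - 1 \right).$$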

For $m \in \mathbb{R}_{>0}$ we write $m = 2^{\ell_m - \log 3 + x}$ with $x \in [0, 1)$ and we set

$$f(m) = \left( m - 2^{\ell_m - \log 3} \right) \cdot \left( \frac{2^{\ell_m}}{m + 2^{\ell_m - \log 3}} - 1 \right).$$


Recall that we have $t_k = (2^{k+1} + (-1)^k)/3$. Thus, $k_m$ and $\ell_m$ coincide for most $m$ and differ by at most 1 for a few values where $m$ is close to $t_{k_m}$ or $t_{k_m - 1}$. Since in both cases $f(m)$ is smaller than some constant, this implies that $f(m)$ and $(m - t_{k_m - 1}) \cdot \left( \frac{2^{\lceil \log(m + t_{k_m - 1}) \rceil}}{m + t_{k_m - 1}} - 1 \right)$ differ by at most a constant. Furthermore, $f(m)$ and $f(m + 1/2)$ differ by at most a constant. Hence, we have:

$$S(n) \ge S(n/2) + f(n/2) + O(1).$$

Since we have $f(n/2) = f(n)/2$, this resolves to

$$S(n) \ge \sum_{i>0} f(n/2^i) + O(\log n) = \sum_{i>0} f(n)/2^i + O(\log n) = f(n) + O(\log n).$$

With $n = 2^{k - \log 3 + x}$ this means up to $O(\log n / n)$-terms

$$\frac{S(n)}{n} \approx \frac{2^{k - \log 3 + x} - 2^{k - \log 3}}{2^{k - \log 3 + x}} \cdot \left( \frac{2^k}{2^{k - \log 3 + x} + 2^{k - \log 3}} - 1 \right) = (1 - 2^{-x}) \cdot \left( \frac{3}{2^x + 1} - 1 \right).$$

Writing $F(n) = n \log n - c(n) \cdot n + O(\log n)$ we obtain with [10]

$$c(n) \ge -(F(n) - n \log n)/n = (3 - \log 3) - (y + 1 - 2^y) + S(n)/n,$$

where $y = \lceil \log(3n/4) \rceil - \log(3n/4) \in [0, 1)$, i.e., $n = 2^{\ell - \log 3 - y}$ for some $\ell \in \mathbb{Z}$. With $y = 1 - x$ it follows

$$c(n) \ge (3 - \log 3) - (1 - x + 1 - 2^{1-x}) + (1 - 2^{-x}) \cdot \left( \frac{3}{2^x + 1} - 1 \right) > 1.3999.$$

This function reaches its minimum in $[0, 1)$ for $x = \log\left( \ln 8 - 1 + \sqrt{(1 - \ln 8)^2 - 1} \right)$.
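The bound on $c(n)$ and the location of its minimum can be checked numerically. The sketch below evaluates the right-hand side of the last inequality on a grid over $[0, 1)$ and compares the grid minimizer with the closed form stated above. The function names and the grid resolution are choices made for this illustration.

```python
import math

LOG2_3 = math.log2(3)


def c_lower_bound(x):
    """Right-hand side of the bound on c(n), as a function of x in [0, 1)."""
    y = 1 - x
    worst_case_part = (3 - LOG2_3) - (y + 1 - 2 ** y)
    savings_part = (1 - 2 ** (-x)) * (3 / (2 ** x + 1) - 1)
    return worst_case_part + savings_part


def grid_minimum(steps=10000):
    """Minimizer and minimum of c_lower_bound on the grid i/steps, 0 <= i < steps."""
    xs = [i / steps for i in range(steps)]
    best_x = min(xs, key=c_lower_bound)
    return best_x, c_lower_bound(best_x)


def closed_form_minimizer():
    """x = log2(ln 8 - 1 + sqrt((1 - ln 8)^2 - 1)) as stated in the text."""
    ln8 = math.log(8)
    return math.log2(ln8 - 1 + math.sqrt((1 - ln8) ** 2 - 1))


if __name__ == "__main__":
    best_x, best_value = grid_minimum()
    print("grid minimizer:", best_x, "minimum:", best_value)
    print("closed form   :", closed_form_minimizer())
    # x = log2(3/2) corresponds to n being a power of two.
    print("value for n = 2^k:", c_lower_bound(math.log2(1.5)))
```

Evaluating the function at $x = \log 3 - 1$, which corresponds to $n = 2^k$, reproduces the value $1.4$ discussed next.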

It is not difficult to observe that $c(2^k) = 1.4$. For the factor $e(n)$ of the linear term in the worst case $W(n) = n \log n + e(n) \cdot n + O(\log n)$ we have $e(2^k) = -(3 - \log 3) + (x + 1 - 2^x)$, where $x = \lceil \log((3/4) \cdot 2^k) \rceil - \log((3/4) \cdot 2^k)$. We know that $x$ can be rewritten as $x = \lceil \log 3 + \log(2^k/4) \rceil - (\log 3 + \log(2^k/4)) = \lceil \log 3 \rceil - \log 3 = 2 - \log 3$. Hence, we have $e(2^k) = -3 + \log 3 + (3 - \log 3 - 2^{2 - \log 3}) = -4/3$. Finally, we are interested in the linear coefficient of $W(n) - S(n)$ for $n = 2^k$, which is $-4/3 - 1/15 = -1.4$.

6 Experiments

Our experiments consist of two parts. First, we compare the different algorithms we use as base cases, i.e., MergeInsertion, its improved variant, and Insertionsort. The results can be seen in Fig. 4. Depending on the size of the arrays the displayed numbers are averages over 10-10000 runs¹. The data elements we sorted were randomly chosen 64-bit integers².

¹ Our experiments were run on one core of an Intel Core i7-3770 CPU (3.40GHz, 8MB Cache) with 32GB RAM; Operating system: Ubuntu Linux 64bit; Compiler: GNU's g++ (version 4.6.3) optimized with flag -O3.

² To rely on objects being handled we avoided the flattening of the array structure by the compiler. Hence, for the running time experiments, and in each comparison taken, we left the counter increase operation intact.


The outcome in Fig. 4 shows that our improved MergeInsertion implementation achieves results for the constant $\kappa$ of the linear term in the range of $[-1.43, -1.41]$ (for some values of $n$ they are even smaller than $-1.43$). Moreover, the standard implementation with slightly more comparisons is faster than Insertionsort. Due to the $O(n^2)$ work, the resulting runtimes for all three implementations rise quickly, so that only moderate values of $n$ can be handled.

[Figure 4, two panels. Left panel "Small-Scale Comparison Experiment": number of element comparisons − n log n per n, plotted against n (logarithmic scale, 2^10 to 2^16) for Lower Bound, Insertionsort, Merge Insertion Improved, and Merge Insertion. Right panel "Small-Scale Runtime Experiment": execution time per (#elements)^2 in µs, plotted against n (logarithmic scale, 2^10 to 2^16) for Insertionsort, Merge Insertion Improved, and Merge Insertion.]

Figure 4: Comparison of MergeInsertion, its improved variant and Insertionsort. For the number of comparisons $n \log n + \kappa n$ the value of $\kappa$ is displayed.

The second part of our experiments (shown in Fig. 5) consists of the comparison of QuickMergesort (with base cases of constant and growing size) and QuickWeakHeapsort with state-of-the-art algorithms such as STL-Introsort (i.e., Quicksort), STL-stable-sort (an implementation of Mergesort) and Quicksort with median of $\sqrt{n}$ elements for pivot selection. For QuickMergesort with base cases, the improved variant of MergeInsertion is used to sort subarrays of size up to $40 \log_{10} n$. For the normal QuickMergesort we used base cases of size $\le 9$. We also implemented QuickMergesort with median of three for pivot selection, which turns out to be practically efficient, although it needs slightly more comparisons than QuickMergesort with median of $\sqrt{n}$. However, since also the larger half of the partitioned array can be sorted with Mergesort, the difference to the median of $\sqrt{n}$ version is not as big as in QuickHeapsort [2].

As suggested by the theory, we see that our improved QuickMergesort implementation with growing size base cases MergeInsertion yields a result for the constant in the linear term that is in the range of $[-1.41, -1.40]$, close to the lower bound. However, for the running time, normal QuickMergesort as well as the STL-variants Introsort (std::sort) and BottomUpMergesort (std::stable_sort) are slightly better. At about 15%, the time gap, however, is not overly big, and may be bridged with additional efforts like skewed pivots and refined partitioning. Also, if comparisons are more expensive, QuickMergesort should perform significantly faster than Introsort.


[Figure 5, two panels. Left panel "Large-Scale Comparison Experiment": number of element comparisons − n log n per n, plotted against n (logarithmic scale, 2^10 to 2^22) for Quicksort Median Sqrt, STL Introsort (out of range), STL Mergesort, QuickMergesort (MI) Median Sqrt, QuickMergesort Median 3, QuickMergesort Median Sqrt, QuickWeakHeapsort Median Sqrt, and Lower Bound. Right panel "Large-Scale Runtime Experiment": execution time per element in µs, plotted against n (logarithmic scale, 2^12 to 2^22) for the same algorithms without the lower bound.]

Figure 5: Comparison of QuickMergesort (with base cases of constant and growing size) and QuickWeakHeapsort with other sorting algorithms; (MI) is short for including growing size base cases derived from MergeInsertion. For the number of comparisons $n \log n + \kappa n$ the value of $\kappa$ is displayed.

7 Concluding Remarks

Sorting $n$ elements remains a fascinating topic for computer scientists both from a theoretical and from a practical point of view. With QuickXsort we have described a procedure how to convert an external sorting algorithm into an internal one introducing only $o(n)$ additional comparisons on average. We presented QuickWeakHeapsort and QuickMergesort as two examples for this construction. QuickMergesort is close to the lower bound for the average number of comparisons and at the same time is practically efficient, even when the comparisons are fast.

Using MergeInsertion to sort base cases of growing size for QuickMergesort, we derive an upper bound of $n \log n - 1.3999n + o(n)$ comparisons for the average case. As far as we know a better result has not been published before. Our experimental results validate the theoretical considerations and indicate that the factor $-1.43$ can be beaten. Of course, there is still room in closing the gap to the lower bound of $n \log n - 1.44n + O(\log n)$ comparisons.


References

[1] D. Cantone and G. Cinotti. QuickHeapsort, an efficient mix of classical sorting algorithms. Theoretical Comput. Sci., 285(1):25–42, 2002.

[2] V. Diekert and A. Weiß. Quickheapsort: Modifications and improved analysis. In A. A. Bulatov and A. M. Shur, editors, CSR, volume 7913 of Lecture Notes in Computer Science, pages 24–35. Springer, 2013.

[3] R. D. Dutton. Weak-heap sort. BIT, 33(3):372–381, 1993.

[4] S. Edelkamp and P. Stiegeler. Implementing HEAPSORT with $n \log n - 0.9n$ and QUICKSORT with $n \log n + 0.2n$ comparisons. ACM Journal of Experimental Algorithmics, 10(5), 2002.

[5] S. Edelkamp and I. Wegener. On the performance of Weak-Heapsort. In 17th Annual Symposium on Theoretical Aspects of Computer Science, volume 1770, pages 254–266. Springer-Verlag, 2000.

[6] A. Elmasry, J. Katajainen, and M. Stenmark. Branch mispredictions don't affect mergesort. In SEA, pages 160–171, 2012.

[7] L. R. Ford, Jr. and S. M. Johnson. A tournament problem. The American Mathematical Monthly, 66(5):387–389, 1959.

[8] J. Katajainen. The Ultimate Heapsort. In CATS, pages 87–96, 1998.

[9] J. Katajainen, T. Pasanen, and J. Teuhola. Practical in-place mergesort. Nord. J. Comput., 3(1):27–40, 1996.

[10] D. E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Addison Wesley Longman, 2nd edition, 1998.

[11] C. Martínez and S. Roura. Optimal Sampling Strategies in Quicksort and Quickselect. SIAM J. Comput., 31(3):683–705, 2001.

[12] K. Reinhardt. Sorting in-place with a worst case complexity of $n \log n - 1.3n + o(\log n)$ comparisons and $n \log n + o(1)$ transports. In ISAAC, pages 489–498, 1992.

[13] I. Wegener. Bottom-up-Heapsort, a new variant of Heapsort beating, on an average, Quicksort (if n is not very small). Theoretical Comput. Sci., 118:81–98, 1993.


A Pseudocode

procedure: construct(s: array of elements, r: array of n bits, m bound)

for k = m − 1 downto 1

if i + 1 = k then

if even(k) then

join(d-ancestor(i), i)

k ← ⌊k/2⌋

else

join(d-ancestor(i), i)

procedure: d-ancestor(j: index)
    while (j bitand 1) = r_{⌊j/2⌋}
        j ← ⌊j/2⌋
    return ⌊j/2⌋

procedure: join(i, j: indices)
    if s_j < s_i then
        swap(s_i, s_j)
        r_j ← 1 − r_j

procedure: d-child(i, j: indices)
    x ← secondchild(i)
    while firstchild(x) < j
        x ← firstchild(x)
    return x

Figure 6: Constructing a weak heap for MergeInsertion.


procedure: binary-insert(s: array of n elements, φ: array of n integers, r: array of n bits, f, d, t: integers)
    for j = t downto f + d + 1
        swap(φ_{j−1}, φ_j)
    l ← f
    r ← f + d
    while l < r
        m ← (l + r)/2
        if s_{φ_{f+d}} > s_{φ_m} then
            l ← m + 1
        else
            r ← m
    for j = f + d downto l
        swap(φ_{j−1}, φ_j)

Figure 7: Binary insertion of elements in MergeInsertion algorithm.

procedure: mergeinsertionrecursive( s: array of n elements, φ: array of n integers, r: array

of n bits )

if k > 2 then

mergeinsertionrecursive(k div 2)

merge(k)

procedure: mergeinsertion(s: array of n elements, φ: array of n integers, r: array of n bits)

construct(n)

mergeinsertionrecursive(n)

Figure 8: Main routine and recursive call for MergeInsertion algorithm.
