QuickXsort: Efficient Sorting with n log n − 1.399n + o(n)
Comparisons on Average
Stefan Edelkamp
TZI, Universität Bremen, Am Fallturm 1, D-28239 Bremen, Germany
edelkamp@tzi.de
Armin Weiß
FMI, Universität Stuttgart, Universitätsstr. 38, D-70569 Stuttgart, Germany
armin.weiss@fmi.uni-stuttgart.de
Abstract
In this paper we generalize the idea of QuickHeapsort leading to the notion of QuickXsort.
Given some external sorting algorithm X, QuickXsort yields an internal sorting algorithm if X
satisfies certain natural conditions.
With QuickWeakHeapsort and QuickMergesort we present two examples for the
QuickXsort-construction. Both are efficient algorithms that incur approximately n log n − 1.26n + o(n) comparisons on the average. A worst case of n log n + O(n) comparisons can be achieved without significantly affecting the average case.
Furthermore, we describe an implementation of MergeInsertion for small n. Taking MergeInsertion as a base case for QuickMergesort, we establish a worst-case efficient sorting algorithm calling for n log n − 1.3999n + o(n) comparisons on average. QuickMergesort with constant size base cases shows the best performance on practical inputs: when sorting integers it is only about 15% slower than STL-Introsort.
1 Introduction
Sorting a sequence of n elements remains one of the most frequent tasks carried out by computers. A lower bound for sorting by only pairwise comparisons is ⌊log n!⌋ ≈ n log n − 1.44n + O(log n) comparisons for the worst and average case (logarithms referred to by log are base 2; the average case refers to a uniform distribution of all input permutations, assuming all elements are different).
Sorting algorithms that are optimal in the leading term are called constant-factor-optimal.
Table 1 lists some milestones in the race for reducing the coefficient in the linear term. One of the most efficient (in terms of number of comparisons) constant-factor-optimal algorithms for solving the sorting problem is Ford and Johnson's MergeInsertion algorithm [7]. It requires n log n − 1.329n + O(log n) comparisons in the worst case [10]. MergeInsertion has a severe drawback that makes it uninteresting for practical purposes: similar to Insertionsort, the number of element moves is quadratic in n. By Insertionsort we mean the algorithm that inserts all elements successively into the already ordered sequence, finding the position for each element by binary search (not by linear search as mostly done). However, MergeInsertion and Insertionsort can be used to sort small subarrays such that the quadratic running time for these subarrays is small in comparison to the overall running time.
Reinhardt [12] used this technique to design an internal Mergesort variant that needs in the worst case n log n − 1.329n + O(log n) comparisons. Unfortunately, implementations of this InPlaceMergesort algorithm have not been documented. Katajainen et al.'s [9, 6] work inspired by Reinhardt is practical, but the number of comparisons is larger.
Throughout the text we avoid the terms in-place or in-situ and prefer the term internal (as opposed to external). We call an algorithm internal if it needs at most O(log n) space in addition to the array to be sorted. That means we consider Quicksort an internal algorithm, whereas standard Mergesort is external because it needs a linear amount of extra space.
Based on QuickHeapsort [1], in this paper we develop the concept of QuickXsort and apply it to other sorting algorithms such as Mergesort or WeakHeapsort. This yields efficient internal sorting algorithms. The idea is very simple: as in Quicksort the array is partitioned into the elements greater and less than some pivot element. Then one part of the array is sorted by some algorithm X and the other part is sorted recursively. The advantage of this procedure is that, if X is an external algorithm, then in QuickXsort the part of the array which is not currently being sorted may be used as temporary space, which yields an internal variant of X. We show that under natural assumptions QuickXsort performs, up to o(n) terms, on average the same number of comparisons as X.
The concept of QuickXsort (without naming it as such) was first applied in UltimateHeapsort by Katajainen [8]. In UltimateHeapsort, first the median of the array is determined, and then the array is partitioned into subarrays of equal size. Finding the median means significant additional effort. Cantone and Cincotti [1] weakened the requirement for the pivot and designed QuickHeapsort, which uses only a sample of smaller size to select the pivot for partitioning. UltimateHeapsort is inferior to QuickHeapsort in terms of average case running time, although, unlike QuickHeapsort, it allows an n log n + O(n) bound for the worst case number of comparisons. Diekert and Weiß [2] analyzed QuickHeapsort more thoroughly and showed that it needs less than n log n − 0.99n + o(n) comparisons in the average case when implemented with approximately √n elements as sample for pivot selection and some other improvements.
Edelkamp and Stiegeler [4] applied the idea of QuickXsort to WeakHeapsort (which was first described by Dutton [3]), introducing QuickWeakHeapsort.
Table 1: Constant-factor-optimal sorting with n log n + κn + o(n) comparisons.

                              Mem.      Other       κ Worst   κ Avg.      κ Exper.
Lower bound                   O(1)      O(n log n)  -1.44     -1.44
BottomUpHeapsort [13]         O(1)      O(n log n)  ω(1)      –           [0.35, 0.39]
WeakHeapsort [3, 5]           O(n/w)    O(n log n)  0.09      –           [-0.46, -0.42]
RelaxedWeakHeapsort [4]       O(n)      O(n log n)  -0.91     -0.91       -0.91
Mergesort [10]                O(n)      O(n log n)  -0.91     -1.26       –
ExternalWeakHeapsort #        O(n)      O(n log n)  -0.91     -1.26 *     –
Insertionsort [10]            O(1)      O(n^2)      -0.91     -1.38 #     –
MergeInsertion [10]           O(n)      O(n^2)      -1.32     -1.3999 #   [-1.43, -1.41]
InPlaceMergesort [12]         O(1)      O(n log n)  -1.32     –           –
QuickHeapsort [1, 2]          O(1)      O(n log n)  ω(1)      -0.03       ≈ 0.20
                              O(n/w)    O(n log n)  ω(1)      -0.99       ≈ -1.24
QuickMergesort (IS) #         O(log n)  O(n log n)  -0.32     -1.38       –
QuickMergesort #              O(1)      O(n log n)  -0.32     -1.26       [-1.29, -1.27]
QuickMergesort (MI) #         O(log n)  O(n log n)  -0.32     -1.3999     [-1.41, -1.40]

Abbreviations: # in this paper, MI MergeInsertion, IS Insertionsort, – not analyzed, * for n = 2^k;
w: computer word width in bits; we assume log n ∈ O(n/w).
For QuickXsort we assume InPlaceMergesort as a worst-case stopper (thus avoiding κ Worst ∈ ω(1)).
The worst case number of comparisons of WeakHeapsort is n⌈log n⌉ − 2^⌈log n⌉ + n − 1 ≤ n log n + 0.09n, and, following Edelkamp and Wegener [5], this bound is tight. In [4] an improved variant with n log n − 0.91n comparisons in the worst case and requiring extra space is presented. With ExternalWeakHeapsort we propose a further refinement with the same worst case bound, but on average requiring approximately n log n − 1.26n comparisons. Using ExternalWeakHeapsort as X in QuickXsort we obtain an improvement over the QuickWeakHeapsort of [4].
As indicated above, Mergesort is another good candidate for the QuickXsort-construction. With QuickMergesort we describe an internal variant of Mergesort which is almost as good as Mergesort, not only in terms of the number of comparisons but also in terms of running time. As mentioned before, MergeInsertion can be used to sort small subarrays. We study MergeInsertion and provide an implementation based on weak heaps. Furthermore, we give an average case analysis. When sorting small subarrays with MergeInsertion, we can show that the average number of comparisons performed by Mergesort is bounded by n log n − 1.3999n + o(n), and, therefore, QuickMergesort uses at most n log n − 1.3999n + o(n) comparisons in the average case.
2 QuickXsort
In this section we give a more precise description of QuickXsort and derive some results concerning
the number of comparisons performed in the average and worst case. Let X be some sorting algorithm.
QuickXsort works as follows: First, choose some pivot element as median of some random sample.
Next, partition the array according to this pivot element, i. e., rearrange the array such that all
elements left of the pivot are less or equal and all elements on the right are greater or equal than
2
the pivot element. Then, choose one part of the array and sort it with algorithm X. (In general, it
does not matter whether the smaller or larger half of the array is chosen. However, for a specific
sorting algorithm X like Heapsort, there might be a better and a worse choice.) After one part of
the array has been sorted with X, move the pivot element to its correct position (right after/before
the already sorted part) and sort the other part of the array recursively with QuickXsort.
The main advantage of this procedure is that the part of the array that is not being sorted
currently can be used as temporary memory for the algorithm X. This yields fast internal variants
for various external sorting algorithms (such as Mergesort). The idea is that whenever a data
element should be moved to the external storage, instead it is swapped with some data element in
the part of the array which is not currently being sorted. Of course, this only works if the algorithm needs additional storage only for data elements. Furthermore, the algorithm has to be able to keep
track of the positions of elements which have been swapped. As the specific method depends on the
algorithm X, we give some more details when we describe the examples for QuickXsort.
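To make the control flow concrete, the following C++ sketch outlines one possible realization of this scheme. It is only an illustration, not the implementation used in our experiments; partition_around_sample_median and sort_with_X are hypothetical helpers standing for a standard partitioning step (returning the final pivot position) and for the external algorithm X (sorting its range while using the given buffer range for data elements only).

#include <iterator>

// Sketch of the QuickXsort control flow (illustrative names only).
template <typename It>
void quick_x_sort(It first, It last) {
    while (std::distance(first, last) > 1) {
        // Partition around the median of a sample; pivot ends at its final position.
        It pivot = partition_around_sample_median(first, last);
        if (std::distance(first, pivot) <= std::distance(pivot + 1, last)) {
            sort_with_X(first, pivot, pivot + 1);   // right part serves as buffer
            first = pivot + 1;                      // recurse on the right part
        } else {
            sort_with_X(pivot + 1, last, first);    // left part serves as buffer
            last = pivot;                           // recurse on the left part
        }
    }
}

In this sketch the part sorted with X is never larger than the remaining part, so the buffer always suffices for an algorithm X that needs at most as many extra element slots as it sorts; for a specific X such as Mergesort the larger part can be sorted instead (see Sect. 4).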
For the number of comparisons we can derive some general results which hold for a wide class of algorithms X. Under natural assumptions the average case number of comparisons of X and of QuickXsort differs only by an o(n)-term. For the rest of the paper, we assume that the pivot is selected as the median of approximately √n randomly chosen elements. Sample sizes of approximately √n are likely to be optimal as the results in [2, 11] suggest.
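For concreteness, pivot selection from a sample of about √n elements can be sketched as follows; this is only an illustration under the stated assumptions (sampling with replacement, median found in linear time via std::nth_element), not the exact routine we benchmarked.

#include <algorithm>
#include <cmath>
#include <iterator>
#include <random>
#include <vector>

// Pick a pivot as the median of roughly sqrt(n) randomly chosen elements.
template <typename It>
typename std::iterator_traits<It>::value_type
sample_median_pivot(It first, It last, std::mt19937 &gen) {
    using T = typename std::iterator_traits<It>::value_type;
    std::size_t n = static_cast<std::size_t>(std::distance(first, last));
    std::size_t k = std::max<std::size_t>(1, static_cast<std::size_t>(std::sqrt(n)));
    if (k % 2 == 0) ++k;                                // odd sample size
    std::vector<T> sample;
    sample.reserve(k);
    std::uniform_int_distribution<std::size_t> pos(0, n - 1);
    for (std::size_t i = 0; i < k; ++i)
        sample.push_back(*(first + pos(gen)));          // sample with replacement
    std::nth_element(sample.begin(), sample.begin() + k / 2, sample.end());
    return sample[k / 2];                               // median of the sample
}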
Theorem 1 (QuickXsort Average-Case). Let X be some sorting algorithm requiring at most n log n + cn + o(n) comparisons in the average case. Then, QuickXsort implemented with Θ(√n) elements as sample for pivot selection is a sorting algorithm that also needs at most n log n + cn + o(n) comparisons in the average case.
For the proofs we assume that the arrays are indexed starting with 1. The following lemma is
crucial for our estimates. It can be derived by applying Chernoff bounds or by direct elementary
calculations.
Lemma 1 ([2, Lm. 2]). Let 0 < δ < 1/2. If we choose the pivot as median of 2γ + 1 elements such that 2γ + 1 ≤ n/2, then we have Pr[pivot ≤ n/2 − δn] < (2γ + 1)·α^γ, where α = 4·(1/4 − δ²) < 1.
Proof of Thm. 1. Let T(n) denote the average number of comparisons performed by QuickXsort on an input array of length n and let S(n) = n log n + cn + s(n) with s(n) ∈ o(n) be an upper bound for the average number of comparisons performed by the algorithm X on an input array of length n. Without loss of generality we may assume that s(n) is monotone. We are going to show by induction that
T (n) ≤ n log n + cn + t(n)
for some monotonically increasing t(n) ∈ o(n) with s(n) ≤ t(n) which we will specify later.
Let δ(n) ∈ o(1) ∩ Ω(n^{−1/4+ε}) with 1/n ≤ δ(n) ≤ 1/4, i.e., δ is some function tending slowly to zero for n → ∞. Because of δ(n) ∈ Ω(n^{−1/4+ε}), we see that (2γ + 1)·(1 − 4δ²)^γ tends to zero if γ ∈ Θ(√n). Hence, by Lem. 1 it follows that the probability that the pivot is more than n·δ(n) off the median,

p(n) = Pr[pivot < n·(1/2 − δ(n))] + Pr[pivot > n·(1/2 + δ(n))],

tends to zero for n → ∞. In the following we write M = [n·(1/2 − δ(n)), n·(1/2 + δ(n))] ∩ ℕ and M̄ = {1, ..., n} \ M.
We obtain the following recurrence relation:

T(n) ≤ n − 1 + T_pivot(n) + Σ_{k=1}^{n} Pr[pivot = k] · max{T(k − 1) + S(n − k), T(n − k) + S(k − 1)}
     ≤ n − 1 + T_pivot(n) + Pr[pivot ∈ M] · max_{k∈M} {T(k) + S(n − k), T(n − k) + S(k)}
                          + Pr[pivot ∈ M̄] · max_{k∈M̄} {T(k) + S(n − k), T(n − k) + S(k)}.
The function f(x) = x log x + (n − x) log(n − x), f(0) = f(n) = n log n, has its only minimum in the interval [0, n] at x = n/2, i.e., for 0 < x < n/2 it decreases monotonically and for n/2 < x < n it increases monotonically. We set β = n/2 + n·δ(n). That means that we have f(x) ≤ f(β) for x ∈ M and f(x) ≤ f(n) for x ∈ M̄. Using this observation, the induction hypothesis, and our assumptions, we conclude
max_{k∈M} {T(k) + S(n − k), T(n − k) + S(k)} ≤ max{f(k) + cn + t(k) + s(n − k) | k ∈ M} ≤ f(β) + cn + t(β) + s(n),
max_{k∈M̄} {T(k) + S(n − k), T(n − k) + S(k)} ≤ max{f(k) + cn + t(k) + s(n − k) | k ∈ M̄} ≤ T(n) + s(n).

With p(n) as above we obtain:

T(n) ≤ n − 1 + T_pivot(n) + p(n)·T(n) + s(n) + (1 − p(n))·(f(β) + cn + t(β)).   (1)
We subtract p(n)·T(n) on both sides and then divide by 1 − p(n). Let D be some constant such that D ≥ 1/(1 − p(n)) for all n (which exists since p(n) ≠ 1 for all n and p(n) → 0 for n → ∞).
Then, we obtain

T(n) ≤ (1 + D·p(n))·(n − 1) + D·(T_pivot(n) + s(n)) + f(β) + cn + t(β)
     ≤ (1 + D·p(n))·(n − 1) + D·(T_pivot(n) + s(n))
       + (n/2 − n·δ(n))·(log(n/2) + log(1 + 2δ(n)))
       + (n/2 + n·δ(n))·(log(n/2) + log(1 + 2δ(n))) + cn + t(3n/4)
     ≤ n log n + cn + (D·p(n) + 2·δ(n)/ln 2)·n + D·(T_pivot(n) + s(n)) + t(3n/4),
where the last inequality follows from log(1 + x) = ln(1 + x)/ln 2 ≤ x/ln 2 for x ∈ ℝ_{>0}. We see that T(n) ≤ n log n + cn + t(n) if t(n) satisfies the inequality

(D·p(n) + 2·δ(n)/ln 2)·n + D·(T_pivot(n) + s(n)) + t(3n/4) ≤ t(n).
We choose t(n) as small as possible. Inductively, we can show that for every ε > 0 there is some D_ε such that t(n) < εn + D_ε. Hence, the theorem follows.
Does QuickXsort provide a good bound for the worst case? The obvious answer is "no". If always the √n smallest elements are chosen for pivot selection, a running time of Θ(n^{3/2}) is obtained. However, we can prove that such a worst case is very unlikely. In fact, let R(n) be the worst case number of comparisons of the algorithm X. Prop. 1 states that the probability that QuickXsort needs more than R(n) + 6n comparisons decreases exponentially in n^{1/4}. (This bound is not tight, but since we do not aim for exact probabilities, Prop. 1 is enough for us.)
Proposition 1. Let ε > 0. The probability that QuickXsort needs more than R(n) + 6n comparisons is less than (3/4 + ε)^{n^{1/4}} for n large enough.
Proof. Let n be the size of the input. We say that we are in a good case if an array of size m is partitioned in the interval [m/4, 3m/4], i.e., if the pivot is chosen in that interval. We can obtain a bound for the desired probability by estimating the probability that we always are in such a good case until the array contains only √n elements. For smaller arrays, we can assume an upper bound of (√n)² = n comparisons for the worst case. For all partitioning steps this sums up to less than n·Σ_{i≥0}(3/4)^i = 4n comparisons if we are always in a good case. We also have to consider the number of comparisons required to find the pivot element. At any stage the pivot is chosen as median of at most √n elements. Since the median can be determined in linear time, for all stages together this sums up to less than n comparisons if we are always in a good case and n is large enough. Finally, for all the sorting phases with X we need at most R(n) comparisons in total (this is only a rough upper bound which can be improved as in the proof of Thm. 1). Hence, we need at most R(n) + 6n comparisons if always a good case occurs.
Now, we only have to estimate the probability that always a good case occurs. By Lem. 1, the probability for a good case in the first partitioning step is at least 1 − d·√n·(3/4)^{√n} for some constant d. After choosing a pivot in the interval [m/4, 3m/4] at most log(√n)/log(4/3) < 1.21·log n times, the array has size less than √n. We only have to consider partitioning steps where the array has size greater than √n (if the size of the array is already less than √n, we define the probability of a good case as 1). Hence, for each of these partitioning steps we obtain that the probability for a good case is greater than 1 − d·n^{1/4}·(3/4)^{n^{1/4}}. Therefore, we obtain

Pr[always good case] ≥ (1 − d·n^{1/4}·(3/4)^{n^{1/4}})^{1.21·log n} ≥ 1 − 1.21·log(n)·d·n^{1/4}·(3/4)^{n^{1/4}}

by Bernoulli's inequality. For n large enough we have 1.21·log(n)·d·n^{1/4}·(3/4)^{n^{1/4}} ≤ (3/4 + ε)^{n^{1/4}}.
To obtain a provable bound for the worst case complexity we apply a simple trick. We fix some worst case efficient sorting algorithm Y; this might be, e.g., InPlaceMergesort. Worst case efficient means that we have an n log n + O(n) bound for the worst case number of comparisons. We choose some slowly decreasing function δ(n) ∈ o(1) ∩ Ω(n^{−1/4+ε}), e.g., δ(n) = 1/log n. Now, whenever the pivot is more than n·δ(n) off the median, we switch to the algorithm Y. We call this QuickXYsort. To achieve a good worst case bound, of course, we also need a good bound for algorithm X. W.l.o.g. we assume the same worst case bounds for X as for Y. Note that QuickXYsort only makes sense if one needs a provably good worst case bound. Since QuickXsort is always expected to make at most as many comparisons as QuickXYsort (under the reasonable assumption that X on average is faster than Y – otherwise one would simply use Y), in every step of the recursion QuickXsort is the better choice for the average case.

In order to obtain an efficient internal sorting algorithm, of course, Y has to be internal and X may use at most n extra spaces for an array of size n.
Theorem 2 (QuickXYsort Worst-Case). Let X be a sorting algorithm with at most n log n + cn + o(n) comparisons in the average case and R(n) = n log n + dn + o(n) comparisons in the worst case (d ≥ c). Let Y be a sorting algorithm with at most R(n) comparisons in the worst case. Then, QuickXYsort is a sorting algorithm that performs at most n log n + cn + o(n) comparisons in the average case and n log n + (d + 1)n + o(n) comparisons in the worst case.
Proof. Since the proof is very similar to the proof of Thm. 1, we provide only a sketch. By replacing T(n) by R(n) = n log n + dn + r(n) with r(n) ∈ o(n) on the right side of (1) in the proof of Thm. 1 we obtain for the average case:

T_av(n) ≤ (n − 1) + T_pivot(n) + s(n) + p(n)·(n log n + dn + r(n))
        + (1 − p(n))·[ (n/2 − n·δ(n))·(log(n/2) + log(1 + 2δ(n)) + c)
                     + (n/2 + n·δ(n))·(log(n/2) + log(1 + 2δ(n)) + c) + t(3n/4) ]
        ≤ n log n + cn + T_pivot(n) + (p(n)·(dn + r(n)) + 2δ(n)/ln 2)·n + s(n) + t(3n/4).
As in Thm. 1 the statement for the average case follows.

For the worst case, there are two possibilities: either the algorithm already fails the condition pivot ∈ [n·(1/2 − δ(n)), n·(1/2 + δ(n))] in the first partitioning step or it does not. In the first case, it is immediate that we have a worst case bound of n log n + dn + n + o(n), which also is tight. Note that we assume that we can choose the pivot element in time o(n), which is no real restriction, since the median of Θ(√n) elements can be found in Θ(√n) time. In the second case, we assume by induction that T_worst(m) ≤ m log m + dm + m + u(m) for m < n for some u(m) ∈ o(m) and obtain a recurrence relation similar to (1) in the proof of Thm. 1:

T_worst(n) ≤ n − 1 + T_pivot(n) + R(n/2 − n·δ(n)) + T_worst(n/2 + n·δ(n)).

By the same arguments as above the result follows.
3 QuickWeakHeapsort
In this section we consider QuickWeakHeapsort as a first example of QuickXsort. We start by introducing weak heaps and then continue by describing WeakHeapsort and a novel external version of it. This external version is a good candidate for QuickXsort and yields an efficient sorting algorithm that uses approximately n log n − 1.2n comparisons (this value is only a rough estimate and neither a bound from below nor above). A drawback of WeakHeapsort and its variants is that they require one extra bit per element. The exposition also serves as an intermediate step towards our implementation of MergeInsertion, where the weak-heap data structure will be used as a building block.
Conceptually, a weak heap (see Fig. 1) is a binary tree satisfying the following conditions:
(1) The root of the entire tree has no left child.
(2) Except for the root, the nodes that have at most one child are in the last two levels only. Leaves at the last level can be scattered, i.e., the last level is not necessarily filled from left to right.

(3) Each node stores an element that is smaller than or equal to every element stored in its right subtree.

Figure 1: A weak heap (reverse bits are set for grey nodes; above the nodes are array indices).
From the first two properties we deduce that the height of a weak heap that has n elements is ⌈log n⌉ + 1. The third property is called the weak-heap ordering or half-tree ordering. In particular, this property enforces no relation between an element in a node and those stored in its left subtree. On the other hand, it implies that any node together with its right subtree forms a weak heap on its own. In an array-based implementation, besides the element array s, an array r of reverse bits is used, i.e., r_i ∈ {0, 1} for i ∈ {0, ..., n − 1}. The root has index 0. The array index of the left child of s_i is 2i + r_i, the array index of the right child is 2i + 1 − r_i, and the array index of the parent is ⌊i/2⌋ (assuming that i ≠ 0). Using the fact that the indices of the left and right children of s_i are exchanged when flipping r_i, subtrees can be reversed in constant time by setting r_i ← 1 − r_i.
The distinguished ancestor (d-ancestor(j)) of s_j for j ≠ 0 is recursively defined as the parent of s_j if s_j is a right child, and the distinguished ancestor of the parent of s_j if s_j is a left child. The distinguished ancestor of s_j is the first element on the path from s_j to the root which is known to be smaller than or equal to s_j by (3). Moreover, any subtree rooted at s_j, together with the distinguished ancestor s_i of s_j, forms again a weak heap with root s_i by considering s_j as right child of s_i.
The basic operation for creating a weak heap is the join operation which combines two weak heaps into one. Let i and j be two nodes in a weak heap such that s_i is smaller than or equal to every element in the left subtree of s_j. Conceptually, s_j and its right subtree form a weak heap, while s_i and the left subtree of s_j form another weak heap. (Note that s_i is not allowed to be in the subtree with root s_j.) The result of join is a weak heap with root at position i. If s_j < s_i, the two elements are swapped and r_j is flipped. As a result, the new element s_j will be smaller than or equal to every element in its right subtree, and the new element s_i will be smaller than or equal to every element in the subtree rooted at s_j. To sum up, join requires constant time and involves one element comparison and a possible element swap in order to combine two weak heaps into a new one.
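The index arithmetic and the join operation are small enough to state directly. The following C++ fragment is a sketch under the array conventions above (element array s, reverse-bit array r, root at index 0); it is an illustration, not the code used in our experiments.

#include <utility>
#include <vector>

struct WeakHeap {
    std::vector<int>  s;   // elements
    std::vector<char> r;   // reverse bits

    int left_child(int i)  const { return 2 * i + r[i]; }
    int right_child(int i) const { return 2 * i + 1 - r[i]; }
    int parent(int i)      const { return i / 2; }

    // Distinguished ancestor of i (i != 0): climb while i is a left child,
    // i.e., while the low bit of i equals the reverse bit of its parent.
    int d_ancestor(int i) const {
        while ((i & 1) == r[i / 2]) i /= 2;
        return i / 2;
    }

    // Join the weak heap rooted at j with its distinguished ancestor i:
    // one comparison, possibly one swap plus a bit flip.
    void join(int i, int j) {
        if (s[j] < s[i]) { std::swap(s[i], s[j]); r[j] ^= 1; }
    }
};

This mirrors the pseudo-code of Fig. 6 in the appendix.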
The construction of a weak heap consisting of n elements requires n − 1 comparisons. In the standard bottom-up construction of a weak heap the nodes are visited one by one. Starting with the last node in the array and moving to the front, the two weak heaps rooted at a node and its distinguished ancestor are joined. The amortized cost to get from a node to its distinguished ancestor is O(1) [5].
When using weak heaps for sorting, the minimum is removed and the weak heap condition is restored until the weak heap becomes empty. After extracting an element from the root, first the special path from the root is traversed top-down, and then, in a bottom-up process, the weak-heap property is restored using at most ⌈log n⌉ join operations. (The special path is established by going once to the right and then to the left as far as possible.) Hence, extracting the minimum requires at most ⌈log n⌉ comparisons.
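Continuing the sketch above (same assumptions, again not the experimental code), the bottom-up construction and the restoration step after the element at the root has been replaced look as follows; the restoration walks the special path top-down and then joins bottom-up, using at most ⌈log n⌉ comparisons.

// Bottom-up construction: join every node with its distinguished ancestor
// (n - 1 comparisons in total).
void construct(WeakHeap &h) {
    for (int i = static_cast<int>(h.s.size()) - 1; i >= 1; --i)
        h.join(h.d_ancestor(i), i);
}

// Restore the weak-heap property after the root element has been replaced
// (e.g., by the last element in an internal variant): descend the special
// path (right once, then left as far as possible), then join each node on
// it with the root, bottom-up.
void sift_down(WeakHeap &h, int n) {
    if (n <= 1) return;
    int j = 1;                                   // the root's only child
    while (h.left_child(j) < n) j = h.left_child(j);
    for (; j > 0; j /= 2) h.join(0, j);
}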
Now, we introduce a modification to the standard procedure described by Dutton [3], which has a slightly improved performance, but requires extra space. We call this modified algorithm ExternalWeakHeapsort. This is because it needs an extra output array, where the elements which are extracted from the weak heap are moved to. On average ExternalWeakHeapsort requires fewer comparisons than RelaxedWeakHeapsort [4]. Integrated into QuickXsort we can implement it without extra space other than the reverse bits r and some other extra bits. We introduce an additional array active and weaken the requirements of a weak heap: we also allow nodes on other than the last two levels to have less than two children. Nodes where the active bit is set to false are considered to have been removed. ExternalWeakHeapsort works as follows: First, a usual weak heap is constructed using n − 1 comparisons. Then, until the weak heap becomes empty, the root – which is the minimal element – is moved to the output array and the resulting hole has to be filled with the minimum of the remaining elements (so far the only difference to normal WeakHeapsort is that there is a separate output area).
The hole is filled by searching the special path from the root to a node x which has no left child. Note that the nodes on the special path are exactly the nodes having the root as distinguished ancestor. Finding the special path does not need any comparisons, since one only has to follow the reverse bits. Next, the element of the node x is moved to the root, leaving a hole. If x has a right subtree (i.e., if x is the root of a weak heap with more than one element), this hole is filled by applying the hole-filling algorithm recursively to the weak heap with root x. Otherwise, the active bit of x is set to false. Now, the root of the whole weak heap together with the subtree rooted at x forms a weak heap. However, it remains to restore the weak heap condition for the whole weak heap. Except for the root and x, all nodes on the special path together with their right subtrees form weak heaps. Following the special path upwards, these weak heaps are joined with their distinguished ancestor as during the weak heap construction (i.e., successively they are joined with the weak heap consisting of the root and the already treated nodes on the special path together with their subtrees). Once all the weak heaps on the special path are joined, the whole array forms a weak heap again.
Theorem 3. For n = 2^k ExternalWeakHeapsort performs exactly the same comparisons as Mergesort applied on a fixed permutation of the same input array.
Proof. First, recall the Mergesort algorithm: the left half and the right half of the array are sorted recursively and then the two subarrays are merged together by always comparing the smallest elements of both arrays and moving the smaller one to the separate output area. Now, we move to WeakHeapsort. Consider the tree as it is initialized with all reverse bits set to false. Let r be the root and y its only child (not the elements but the positions in the tree). We call r together with the left subtree of y the left part of the tree, and we call y together with its right subtree the right part of the tree. That means the left part and the right part form weak heaps on their own. The only time an element is moved from the right to the left part or vice-versa is when the data elements s_r and s_y are exchanged. However, always one of the data elements of r and y comes from the right part and one from the left part. After extracting the minimum s_r, it is replaced by the smallest remaining element of the part s_r came from. Then, the new s_r and s_y are compared again and so on. Hence, for extracting the elements in sorted order from the weak heap the following happens. First, the smallest elements of the left and right part are determined, then they are compared, and finally the smaller one is moved to the output area. This procedure repeats until the weak heap is empty. This is exactly how the recursion of Mergesort works: always the smallest elements of the left and right part are compared and the smaller one is moved to the output area. If n = 2^k, then the left and right parts for Mergesort and WeakHeapsort have the same sizes.

Figure 2: First the two halves of the left part are sorted, moving them from one place to another. Then, they are merged to the original place.
By [10, 5.2.4–13] we obtain the following corollary.

Corollary 1 (Average Case ExternalWeakHeapsort). For n = 2^k the algorithm ExternalWeakHeapsort uses approximately n log n − 1.26n comparisons in the average case.
If n is not a power of two, the sizes of the left and right parts of WeakHeapsort are less balanced than the left and right parts of ordinary Mergesort and one can expect a slightly higher number of comparisons. For QuickWeakHeapsort, the half of the array which is not sorted by ExternalWeakHeapsort is used as output area. Whenever the root is moved to the output area, the element that occupied that place before is inserted as a dummy element at the position where the active bit is set to false. Applying Thm. 1, we obtain the rough estimate of n log n − 1.2n comparisons for the average case of QuickWeakHeapsort.
4 QuickMergesort
As another example for QuickXsort we consider QuickMergesort. For the Mergesort part we use standard (top-down) Mergesort, which can be implemented using m extra spaces to merge two arrays of length m (there are other methods, like the one in [12], which require less space – but for our purposes this is good enough). The procedure is depicted in Fig. 2. We sort the larger half of the partitioned array with Mergesort as long as we have one third of the whole array as temporary memory left; otherwise we sort the smaller part with Mergesort. Hence, the part which is not sorted by Mergesort always provides enough temporary space. When a data element should be moved to or from the temporary space, it is swapped with the element occupying the respective position. Since Mergesort moves through the data from left to right, it is always known which are the elements to be sorted and which are the dummy elements. Depending on the implementation, the extra space needed is O(log n) words for the recursion stack of Mergesort. By avoiding recursion this can even be reduced to O(1). Thm. 1 together with [10, 5.2.4–13] yields the next result.
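A sketch of the swap-based merge may help; it is only an illustration of the idea, not the exact routine we benchmarked. Two adjacent sorted runs are merged while an equally large range of the array serves as the temporary area; every move is a swap, so the dummy elements merely change places with the data and nothing is duplicated or lost.

#include <algorithm>

// Merge the sorted runs [l, m) and [m, r); buf points to at least m - l
// positions holding dummy elements that may be freely permuted.
template <typename It>
void merge_with_buffer(It l, It m, It r, It buf) {
    It end1 = std::swap_ranges(l, m, buf);             // move first run into the buffer
    It i = buf, j = m, out = l;
    while (i != end1 && j != r)
        std::iter_swap(out++, (*j < *i) ? j++ : i++);  // one comparison per output element
    while (i != end1)
        std::iter_swap(out++, i++);                    // rest of the first run
    // remaining elements of the second run are already in place
}

Because the first run is swapped into the buffer up front, the merged output can be written over the positions it vacated without overtaking unread elements.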
Theorem 4 (Average Case QuickMergesort). QuickMergesort is an internal sorting algorithm that performs at most n log n − 1.26n + o(n) comparisons on average.
We can do even better if we sort small subarrays with another algorithm Z requiring fewer comparisons but extra space and more moves, e.g., Insertionsort or MergeInsertion. If we use O(log n) elements for the base case of Mergesort, we have to call Z at most O(n/log n) times. In this case we can allow Z to perform a quadratic number of additional operations (e.g., moves), since O((n/log n) · log² n) = O(n log n).
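As an illustration of how the base case enters, consider the following sketch (again ours, not the benchmarked code); merge_with_buffer is the swap-based merge sketched above and sort_Z is a placeholder for Insertionsort or MergeInsertion.

#include <cstddef>
#include <iterator>

// Top-down Mergesort that hands subarrays of size <= threshold to the base
// case sorter Z; the threshold is chosen in O(log n) for the original input size n.
template <typename It, typename BaseCase>
void mergesort_with_base_case(It l, It r, It buf, std::size_t threshold, BaseCase sort_Z) {
    std::size_t len = static_cast<std::size_t>(std::distance(l, r));
    if (len <= threshold) { sort_Z(l, r); return; }
    It m = l + len / 2;
    mergesort_with_base_case(l, m, buf, threshold, sort_Z);
    mergesort_with_base_case(m, r, buf, threshold, sort_Z);
    merge_with_buffer(l, m, r, buf);                    // swap-based merge from above
}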
Note that for the next theorem we only need that the size of the base cases grows as n grows. Nevertheless, O(log n) is the largest growing value we can choose if we apply a base case algorithm with a quadratic number of moves and want to achieve an O(n log n) overall running time.
Theorem 5 (QuickMergesort with Base Case). Let Z be some sorting algorithm with n log n + en + o(n) comparisons on the average and other operations taking at most O(n²) time. If base cases of size O(log n) are sorted with Z, QuickMergesort uses at most n log n + en + o(n) comparisons and O(n log n) other instructions on the average.
Proof. By Thm. 1 and the preceding remark, the only thing we have to prove is that Mergesort with base case Z requires on average at most n log n + en + o(n) comparisons, given that Z needs at most U(n) = n log n + en + o(n) comparisons on average. The latter means that for every ε > 0 we have U(n) ≤ n log n + (e + ε)·n for n large enough.
Let S_k(m) denote the average case number of comparisons of Mergesort with base cases of size k sorted with Z and let ε > 0. Since log n grows as n grows, we have that S_{log n}(m) = U(m) ≤ m log m + (e + ε)·m for n large enough and (log n)/2 < m ≤ log n. For m > log n we have S_{log n}(m) ≤ 2·S_{log n}(m/2) + m, and by induction we see that S_{log n}(m) ≤ m log m + (e + ε)·m. Hence, also S_{log n}(n) ≤ n log n + (e + ε)·n for n large enough.
Using Insertionsort we obtain the following result. Here, ln denotes the natural logarithm. As we did not find a result in the literature, we also provide a proof. Recall that Insertionsort inserts the elements one by one into the already sorted sequence by binary search.

Proposition 2 (Average Case of Insertionsort). The sorting algorithm Insertionsort needs n log n − 2 ln 2 · n + c(n) · n + O(log n) comparisons on the average, where c(n) ∈ [−0.005, 0.005].
Corollary 2 (QuickMergesort with Base Case Insertionsort). If we use Insertionsort as the base case, QuickMergesort uses at most n log n − 1.38n + o(n) comparisons and O(n log n) other instructions on the average.
Proof of Prop. 2. First, we take a look at the average number of comparisons T_InsAvg(k) to insert one element into a sorted array of k − 1 elements by binary insertion.

To insert a new element into k − 1 elements needs either ⌈log k⌉ − 1 or ⌈log k⌉ comparisons. There are k positions where the element to be inserted can end up, each of which is equally likely. For 2^⌈log k⌉ − k of these positions ⌈log k⌉ − 1 comparisons are needed. For the other k − (2^⌈log k⌉ − k) = 2k − 2^⌈log k⌉ positions ⌈log k⌉ comparisons are needed. This means

T_InsAvg(k) = ((2^⌈log k⌉ − k) · (⌈log k⌉ − 1) + (2k − 2^⌈log k⌉) · ⌈log k⌉) / k = ⌈log k⌉ + 1 − 2^⌈log k⌉/k

comparisons are needed on average. By [10, 5.3.1–(3)], we obtain for the average case for sorting n elements:

T_InsSortAvg(n) = Σ_{k=1}^{n} T_InsAvg(k) = Σ_{k=1}^{n} (⌈log k⌉ + 1 − 2^⌈log k⌉/k) = n·⌈log n⌉ − 2^⌈log n⌉ + 1 + n − Σ_{k=1}^{n} 2^⌈log k⌉/k.
We examine the last sum separately. In the following we write H(n) = Σ_{k=1}^{n} 1/k = ln n + γ + O(1/n) for the harmonic sum with γ the Euler constant.

Σ_{k=1}^{n} 2^⌈log k⌉/k = 1 + Σ_{i=0}^{⌈log n⌉−2} Σ_{ℓ=1}^{2^i} 2^{i+1}/(2^i + ℓ) + Σ_{ℓ=2^{⌈log n⌉−1}+1}^{n} 2^⌈log n⌉/ℓ
 = 1 + Σ_{i=0}^{⌈log n⌉−2} 2^{i+1} · (H(2^{i+1}) − H(2^i)) + 2^⌈log n⌉ · (H(n) − H(2^{⌈log n⌉−1}))
 = Σ_{i=0}^{⌈log n⌉−2} 2^{i+1} · (ln(2^{i+1}) + γ − ln(2^i) − γ) + (ln n + γ − ln(2^{⌈log n⌉−1}) − γ) · 2^⌈log n⌉ + O(1)
 = ln 2 · Σ_{i=0}^{⌈log n⌉−2} 2^{i+1} · (i + 1 − i) + (log(n) · ln 2 − (⌈log n⌉ − 1) · ln 2) · 2^⌈log n⌉
 = ln 2 · (2 · (2^{⌈log n⌉−1} − 1) + (log n − ⌈log n⌉ + 1) · 2^⌈log n⌉) + O(1)
 = ln 2 · (2 + log n − ⌈log n⌉) · 2^⌈log n⌉ + O(1).

Hence, we have

T_InsSortAvg(n) = n·⌈log n⌉ − 2^⌈log n⌉ + n − ln 2 · (2 + log n − ⌈log n⌉) · 2^⌈log n⌉ + O(1).
In order to obtain a numeric bound for T_InsSortAvg(n), we compute (T_InsSortAvg(n) − n log n)/n and then replace ⌈log n⌉ − log n by x. This yields the function

x ↦ x − 2^x + 1 − ln 2 · (2 − x) · 2^x,

which oscillates between −1.381 and −1.389 for 0 ≤ x < 1. For x = 0, its value is −2 ln 2 ≈ −1.386.
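The oscillation claim can be checked numerically; the following small program (ours, purely for verification) evaluates the function on a grid over [0, 1).

#include <algorithm>
#include <cmath>
#include <cstdio>

// g(x) = x - 2^x + 1 - ln(2) * (2 - x) * 2^x on [0, 1); its extreme values lie
// in the stated range of roughly [-1.389, -1.381].
int main() {
    double lo = 10.0, hi = -10.0;
    for (int k = 0; k < 1000; ++k) {
        double x = k / 1000.0;
        double g = x - std::pow(2.0, x) + 1.0 - std::log(2.0) * (2.0 - x) * std::pow(2.0, x);
        lo = std::min(lo, g);
        hi = std::max(hi, g);
    }
    std::printf("min %.4f  max %.4f\n", lo, hi);   // approx. -1.3894 and -1.3818
}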
Base cases of growing size always lead to a constant factor overhead in running time if an algorithm with a quadratic number of total operations is applied. Therefore, in the experiments we will also consider constant size base cases, which offer a slightly worse bound for the number of comparisons but are faster in practice. We do not analyze them separately, since the preferred choice for the size depends on the type of data to be sorted and the system on which the algorithms run.
5 MergeInsertion
MergeInsertion by Ford and Johnson [7] is one of the best sorting algorithms in terms of number of comparisons. Hence, it can be applied for sorting base cases of QuickMergesort, which yields even better results than Insertionsort. Therefore, we want to give a brief description of the algorithm and our implementation. While the description is simple, MergeInsertion is not easy to implement efficiently. Our implementation is based on weak heaps and uses n log n + n extra bits. Algorithmically, MergeInsertion(s_0, ..., s_{n−1}) can be described as follows (an intuitive example for n = 21 can be found in [10]).
1. Arrange the input such that s_i ≥ s_{i+⌊n/2⌋} for 0 ≤ i < ⌊n/2⌋ with one comparison per pair. Let a_i = s_i and b_i = s_{i+⌊n/2⌋} for 0 ≤ i < ⌊n/2⌋, and b_{⌊n/2⌋} = s_{n−1} if n is odd.

2. Sort the values a_0, ..., a_{⌊n/2⌋−1} recursively with MergeInsertion.

3. Rename the solution as follows: b_0 ≤ a_0 ≤ a_1 ≤ ··· ≤ a_{⌊n/2⌋−1} and insert the elements b_1, ..., b_{⌈n/2⌉−1} via binary insertion, following the ordering b_2, b_1, b_4, b_3, b_10, b_9, ..., b_5, ..., b_{t_{k−1}}, b_{t_{k−1}−1}, ..., b_{t_{k−2}+1}, b_{t_k}, ..., into the main chain, where t_k = (2^{k+1} + (−1)^k)/3.
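The block boundaries t_k and the resulting block sizes can be tabulated directly; the small illustrative program below (ours) prints, for each block, how many elements are inserted and with how many comparisons each, which is exactly what the analysis in the proof of Thm. 6 sums up.

#include <cstdio>

// t_k = (2^{k+1} + (-1)^k) / 3 gives 1, 1, 3, 5, 11, 21, 43, 85, 171, ...
int main() {
    long long t[9];
    for (int k = 0; k < 9; ++k)
        t[k] = ((1LL << (k + 1)) + ((k % 2 == 0) ? 1 : -1)) / 3;
    for (int k = 2; k < 9; ++k)
        std::printf("block %d: %lld elements, %d comparisons each\n",
                    k, t[k] - t[k - 1], k);
}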
Due to the different renamings, the recursion, and the change of link structure, the design of an efficient implementation is not immediate. Our proposed implementation of MergeInsertion is based on a tournament tree representation with weak heaps as in Sect. 3. The pseudo-code implementations for all the operations to construct a tournament tree with a weak heap and to access the partners in each round are shown in Fig. 6 in the appendix. (Note that for simplicity in the above formulation the indices and the order are reversed compared to our implementation.)

One main subroutine of MergeInsertion is binary insertion. The call binary-insert(x, y, z) inserts the element at position z between positions x − 1 and x + y by binary insertion. (The pseudo-code implementation of the binary search routine is shown in Fig. 7 in the appendix.) In this routine we do not move the data elements themselves, but we use an additional index array φ_0, ..., φ_{n−1} to point to the elements contained in the weak heap tournament tree and move these indirect addresses. This approach has the advantage that the relations stored in the tournament tree are preserved.

The most important procedure for MergeInsertion is the organization of the calls for binary-insert. After adapting the addresses for the elements b_i (w.r.t. the above description) in the second part of the array, the algorithm calls the binary insertion routine with appropriate indices. Note that we always use k comparisons for all elements of the k-th block (i.e., the elements b_{t_k}, ..., b_{t_{k−1}+1}), even if there might be the chance to save one comparison. By introducing an additional array, which for each b_i contains the current index of a_i, we can exploit the observation that not always k comparisons are needed to insert an element of the k-th block. In the following we call this the improved variant. The pseudo-code of the basic variant is shown in Fig. 3. The last sequence is not complete and is thus tackled as a special case.
Theorem 6 (Average Case of MergeInsertion). The sorting algorithm MergeInsertion needs n log n − c(n)·n + O(log n) comparisons on the average, where c(n) ≥ 1.3999.
Corollary 3 (QuickMergesort with Base Case MergeInsertion). When using MergeInsertion as base case, QuickMergesort needs at most n log n − 1.3999n + o(n) comparisons and O(n log n) other instructions on the average.
procedure: merge(m: integer)
global: φ array of n integers imposed by weak heap
    for l ← 0 to ⌊m/2⌋ − 1
        φ_{m−odd(m)−l−1} ← d-child(φ_l, m − odd(m));
    k ← 1; e ← 2^k; c ← f ← 0;
    while e < m
        k ← k + 1; e ← 2e;
        l ← ⌈m/2⌉ + f; f ← f + (t_k − t_{k−1});
        for i ← 0 to (t_k − t_{k−1}) − 1
            c ← c + 1;
            if c = ⌈m/2⌉ then
                return;
            if t_k > ⌈m/2⌉ − 1 then
                binary-insert(i + 1 − odd(m), l, m − 1);
            else
                binary-insert(⌊m/2⌋ − f + i, e − 1, ⌊m/2⌋ + f);
Figure 3: Merging step in MergeInsertion with t_k = (2^{k+1} + (−1)^k)/3, odd(m) = m mod 2, and d-child(φ_i, n) returning the highest index less than n of a grandchild of φ_i in the weak heap (i.e., d-child(φ_i, n) = index of the bottommost element in the weak heap which has d-ancestor = φ_i and index < n).
Proof of Thm. 6. According to Knuth [10], MergeInsertion requires at most W(n) = n log n − (3 − log 3)n + n(y + 1 − 2^y) + O(log n) comparisons in the worst case, where y = y(n) = ⌈log(3n/4)⌉ − log(3n/4) ∈ [0, 1). In the following we want to analyze the average savings relative to the worst case. Therefore, let F(n) denote the average number of comparisons of the insertion steps of MergeInsertion, i.e., all comparisons minus the effort for the weak heap construction, which always takes place. Then, we obtain the recurrence relation

F(n) = F(⌊n/2⌋) + G(⌈n/2⌉), with
G(m) = (k_m − α_m) · (m − t_{k_m−1}) + Σ_{j=1}^{k_m−1} j · (t_j − t_{j−1}),

with k_m such that t_{k_m−1} ≤ m < t_{k_m} and some α_m ∈ [0, 1]. As we do not analyze the improved version of the algorithm, the insertion of elements with index less than or equal to t_{k_m−1} always requires the same number of comparisons. Thus, the term Σ_{j=1}^{k_m−1} j · (t_j − t_{j−1}) is independent of the data. However, inserting an element after t_{k_m−1} may either need k_m or k_m − 1 comparisons. This is where α_m comes from. Note that α_m only depends on m. We split F(n) into F′(n) + F″(n) with

F′(n) = F′(⌊n/2⌋) + G′(⌈n/2⌉) and G′(m) = (k_m − α_m) · (m − t_{k_m−1}) with k_m such that t_{k_m−1} ≤ m < t_{k_m},
and

F″(n) = F″(⌊n/2⌋) + G″(⌈n/2⌉) and G″(m) = Σ_{j=1}^{k_m−1} j · (t_j − t_{j−1}) with k_m such that t_{k_m−1} ≤ m < t_{k_m}.

For the average case analysis, we have that F″(n) is independent of the data. For n = (4/3)·2^k we have G′(n) = 0, and hence, F′(n) = 0. Since otherwise G′(n) is non-negative, this proves that exactly for n = (4/3)·2^k the average case matches the worst case.
Now, we have to estimate F′(n) for arbitrary n. We have to consider the calls to binary insertion more closely. To insert a new element into an array of m − 1 elements needs either ⌈log m⌉ − 1 or ⌈log m⌉ comparisons. For a moment assume that the element is inserted at every position with the same probability. Under this assumption the analysis in the proof of Prop. 2 is valid, which states that T_InsAvg(m) = ⌈log m⌉ + 1 − 2^⌈log m⌉/m comparisons are needed on average.

The problem is that in our case the probability at which position an element is inserted is not uniformly distributed. However, it is monotonically increasing with the index in the array (indices as in our implementation). Informally speaking, this is because if an element is inserted further to the right, then for the following elements there are more possibilities to be inserted than if the element is inserted on the left.

Now, binary-insert can be implemented such that for an odd number of positions the next comparison is made such that the larger half of the array is the one containing the positions with lower probabilities. (In our case, this is the part with the lower indices – see Fig. 7.) That means the less probable positions lie on the longer paths in the search tree, and hence, the average path length is better than in the uniform case. Therefore, we may assume a uniform distribution in the following as an upper bound.
In each of the recursion steps we have ⌈n/2⌉ − t_{k_{⌈n/2⌉}−1} calls to binary insertion into sets of ⌈n/2⌉ + t_{k_{⌈n/2⌉}−1} − 1 elements each. Hence, for inserting one element, the difference to the worst case is 2^{⌈log(⌈n/2⌉ + t_{k_{⌈n/2⌉}−1})⌉}/(⌈n/2⌉ + t_{k_{⌈n/2⌉}−1} − 1) − 1. Summing up, we obtain for the average savings S(n) = W(n) − (F(n) + weak-heap-construction(n)) w.r.t. the worst case number W(n) the recurrence

S(n) ≥ S(⌊n/2⌋) + (⌈n/2⌉ − t_{k_{⌈n/2⌉}−1}) · (2^{⌈log(⌈n/2⌉ + t_{k_{⌈n/2⌉}−1})⌉}/(⌈n/2⌉ + t_{k_{⌈n/2⌉}−1} − 1) − 1).

For m ∈ ℝ_{>0} we write m = 2^{ℓ_m − log 3 + x} with x ∈ [0, 1) and we set

f(m) = (m − 2^{ℓ_m − log 3}) · (2^{ℓ_m}/(m + 2^{ℓ_m − log 3}) − 1).
Recall that we have t_k = (2^{k+1} + (−1)^k)/3. Thus, k_m and ℓ_m coincide for most m and differ by at most 1 for a few values where m is close to t_{k_m} or t_{k_m−1}. Since in both cases f(m) is smaller than some constant, this implies that f(m) and (m − t_{k_m−1}) · (2^{⌈log(m + t_{k_m−1})⌉}/(m + t_{k_m−1} − 1) − 1) differ by at most a constant. Furthermore, f(m) and f(m + 1/2) differ by at most a constant. Hence, we have:

S(n) ≥ S(n/2) + f(n/2) + O(1).

Since we have f(n/2) = f(n)/2, this resolves to

S(n) ≥ Σ_{i>0} f(n/2^i) + O(log n) = Σ_{i>0} f(n)/2^i + O(log n) = f(n) + O(log n).
With n = 2^{k−log 3+x} this means, up to O(log n/n) terms,

S(n)/n ≈ ((2^{k−log 3+x} − 2^{k−log 3})/2^{k−log 3+x}) · (2^k/(2^{k−log 3+x} + 2^{k−log 3}) − 1) = (1 − 2^{−x}) · (3/(2^x + 1) − 1).

Writing F(n) = n log n − c(n)·n + O(log n) we obtain with [10]

c(n) ≥ −(F(n) − n log n)/n = (3 − log 3) − (y + 1 − 2^y) + S(n)/n,

where y = ⌈log(3n/4)⌉ − log(3n/4) ∈ [0, 1), i.e., n = 2^{ℓ−log 3−y} for some ℓ ∈ ℤ. With y = 1 − x it follows

c(n) ≥ (3 − log 3) − (1 − x + 1 − 2^{1−x}) + (1 − 2^{−x}) · (3/(2^x + 1) − 1) > 1.3999.

This function reaches its minimum in [0, 1) for x = log(ln 8 − 1 + √((1 − ln 8)² − 1)).
It is not difficult to observe that c(2^k) = 1.4. Writing the worst case as W(n) = n log n − e(n)·n + O(log n), we have e(2^k) = (3 − log 3) − (y + 1 − 2^y) with y = ⌈log((3/4)·2^k)⌉ − log((3/4)·2^k). Since y = ⌈log 3 + log(2^k/4)⌉ − (log 3 + log(2^k/4)) = ⌈log 3⌉ − log 3 = 2 − log 3 and 2^{2−log 3} = 4/3, we obtain e(2^k) = 4/3. Together with the average savings S(2^k)/2^k = 1/15, this yields an average-case linear-term coefficient of −4/3 − 1/15 = −1.4.
6 Experiments
Our experiments consist of two parts. First, we compare the different algorithms we use as base cases, i.e., MergeInsertion, its improved variant, and Insertionsort. The results can be seen in Fig. 4. Depending on the size of the arrays, the displayed numbers are averages over 10–10000 runs¹. The data elements we sorted were randomly chosen 64-bit integers².

¹ Our experiments were run on one core of an Intel Core i7-3770 CPU (3.40 GHz, 8 MB Cache) with 32 GB RAM; operating system: Ubuntu Linux 64 bit; compiler: GNU's g++ (version 4.6.3) optimized with flag -O3.
² To rely on objects being handled we avoided the flattening of the array structure by the compiler. Hence, for the running time experiments, and in each comparison taken, we left the counter increase operation intact.
The outcome in Fig. 4 shows that our improved MergeInsertion implementation achieves results for the constant κ of the linear term in the range of [−1.43, −1.41] (for some values of n it is even smaller than −1.43). Moreover, the standard implementation, with slightly more comparisons, is faster than Insertionsort. Due to the O(n²) work, the running times of all three implementations rise quickly, so that only moderate values of n can be handled.
Figure 4: Comparison of MergeInsertion, its improved variant, and Insertionsort. For the number of comparisons n log n + κn the value of κ is displayed. (Left panel: Small-Scale Comparison Experiment – number of element comparisons minus n log n per n, with the lower bound shown for reference; right panel: Small-Scale Runtime Experiment – execution time per (#elements)² in µs; both plotted over n on a logarithmic scale.)
The second part of our experiments (shown in Fig. 5) consists of the comparison of QuickMergesort (with base cases of constant and growing size) and QuickWeakHeapsort with state-of-the-art algorithms such as STL-Introsort (i.e., Quicksort), STL-stable-sort (an implementation of Mergesort) and Quicksort with median of √n elements for pivot selection. For QuickMergesort with base cases, the improved variant of MergeInsertion is used to sort subarrays of size up to 40·log₁₀ n. For the normal QuickMergesort we used base cases of size ≤ 9. We also implemented QuickMergesort with median of three for pivot selection, which turns out to be practically efficient, although it needs slightly more comparisons than QuickMergesort with median of √n. However, since also the larger half of the partitioned array can be sorted with Mergesort, the difference to the median-of-√n version is not as big as in QuickHeapsort [2].

As suggested by the theory, we see that our improved QuickMergesort implementation with growing-size MergeInsertion base cases yields a result for the constant in the linear term that is in the range of [−1.41, −1.40] – close to the lower bound. However, for the running time, normal QuickMergesort as well as the STL variants Introsort (std::sort) and BottomUpMergesort (std::stable_sort) are slightly better. At about 15%, the time gap, however, is not overly big, and may be bridged with additional efforts like skewed pivots and refined partitioning. Also, if comparisons are more expensive, QuickMergesort should perform significantly faster than Introsort.
Figure 5: Comparison of QuickMergesort (with base cases of constant and growing size) and QuickWeakHeapsort with other sorting algorithms; (MI) is short for including growing-size base cases derived from MergeInsertion. For the number of comparisons n log n + κn the value of κ is displayed. (Left panel: Large-Scale Comparison Experiment – number of element comparisons minus n log n per n; right panel: Large-Scale Runtime Experiment – execution time per element in µs; both plotted over n on a logarithmic scale. Algorithms shown: Quicksort with median of √n, STL Introsort (out of range in the comparison plot), STL Mergesort, QuickMergesort (MI) with median of √n, QuickMergesort with median of 3, QuickMergesort with median of √n, QuickWeakHeapsort with median of √n, and the lower bound.)
7 Concluding Remarks
Sorting n elements remains a fascinating topic for computer scientists both from a theoretical and from a practical point of view. With QuickXsort we have described a procedure to convert an external sorting algorithm into an internal one, introducing only o(n) additional comparisons on average. We presented QuickWeakHeapsort and QuickMergesort as two examples for this construction. QuickMergesort is close to the lower bound for the average number of comparisons and at the same time is practically efficient, even when the comparisons are fast.

Using MergeInsertion to sort base cases of growing size for QuickMergesort, we derive an upper bound of n log n − 1.3999n + o(n) comparisons for the average case. As far as we know a better result has not been published before. Our experimental results validate the theoretical considerations and indicate that the factor −1.43 can be beaten. Of course, there is still room for closing the gap to the lower bound of n log n − 1.44n + O(log n) comparisons.
References
[1] D. Cantone and G. Cincotti. QuickHeapsort, an efficient mix of classical sorting algorithms. Theoretical Comput. Sci., 285(1):25–42, 2002.
[2] V. Diekert and A. Weiß. QuickHeapsort: Modifications and improved analysis. In A. A. Bulatov and A. M. Shur, editors, CSR, volume 7913 of Lecture Notes in Computer Science, pages 24–35. Springer, 2013.
[3] R. D. Dutton. Weak-heap sort. BIT, 33(3):372–381, 1993.
[4] S. Edelkamp and P. Stiegeler. Implementing HEAPSORT with n log n − 0.9n and QUICKSORT with n log n + 0.2n comparisons. ACM Journal of Experimental Algorithmics, 10(5), 2002.
[5] S. Edelkamp and I. Wegener. On the performance of Weak-Heapsort. In 17th Annual Symposium on Theoretical Aspects of Computer Science, volume 1770, pages 254–266. Springer-Verlag, 2000.
[6] A. Elmasry, J. Katajainen, and M. Stenmark. Branch mispredictions don't affect mergesort. In SEA, pages 160–171, 2012.
[7] L. R. Ford, Jr. and S. M. Johnson. A tournament problem. The American Mathematical Monthly, 66(5):387–389, 1959.
[8] J. Katajainen. The Ultimate Heapsort. In CATS, pages 87–96, 1998.
[9] J. Katajainen, T. Pasanen, and J. Teuhola. Practical in-place mergesort. Nord. J. Comput., 3(1):27–40, 1996.
[10] D. E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Addison Wesley Longman, 2nd edition, 1998.
[11] C. Martínez and S. Roura. Optimal sampling strategies in Quicksort and Quickselect. SIAM J. Comput., 31(3):683–705, 2001.
[12] K. Reinhardt. Sorting in-place with a worst case complexity of n log n − 1.3n + o(log n) comparisons and n log n + o(1) transports. In ISAAC, pages 489–498, 1992.
[13] I. Wegener. Bottom-up-Heapsort, a new variant of Heapsort beating, on an average, Quicksort (if n is not very small). Theoretical Comput. Sci., 118:81–98, 1993.
A Pseudocode
procedure: construct(s: array of n elements, r: array of n bits, m: integer)
    for i ← m − 1 downto 1
        join(d-ancestor(i), i)
procedure: d-ancestor(j: index)
    while (j bitand 1) = r_{⌊j/2⌋}
        j ← ⌊j/2⌋
    return ⌊j/2⌋
procedure: join(i, j: indices)
    if s_j < s_i then
        swap(s_i, s_j)
        r_j ← 1 − r_j
procedure: d-child(i, j: indices)
    x ← secondchild(i)
    while firstchild(x) < j
        x ← firstchild(x)
    return x
Figure 6: Constructing a weak heap for MergeInsertion.
procedure: binary-insert(s: array of n elements, φ: array of n integers, r: array of n bits, f, d, t: integers)
    for j = t downto f + d + 1
        swap(φ_{j−1}, φ_j)
    l ← f
    r ← f + d
    while l < r
        m ← (l + r)/2
        if s_{φ_{f+d}} > s_{φ_m} then
            l ← m + 1
        else
            r ← m
    for j = f + d downto l + 1
        swap(φ_{j−1}, φ_j)
Figure 7: Binary insertion of elements in MergeInsertion algorithm.
procedure: mergeinsertionrecursive(s: array of n elements, φ: array of n integers, r: array of n bits, k: integer)
    if k > 2 then
        mergeinsertionrecursive(k div 2)
        merge(k)
procedure: mergeinsertion(s: array of n elements, φ: array of n integers, r: array of n bits)
    construct(n)
    mergeinsertionrecursive(n)
Figure 8: Main routine and recursive call for MergeInsertion algorithm.