
# QuickMergesort: Practically Efficient Constant-Factor Optimal Sorting


## Abstract

We consider the fundamental problem of internally sorting a sequence of $n$ elements. In its best theoretical setting QuickMergesort, a combination of Quicksort with Mergesort with a Median-of-$\sqrt{n}$ pivot selection, requires at most $n \log n - 1.3999n + o(n)$ element comparisons on the average. The question addressed in this paper is how to make this algorithm practical. As refined pivot selection usually adds much overhead, we show that the Median-of-3 pivot selection of QuickMergesort leads to at most $n \log n - 0.75n + o(n)$ element comparisons on average, while running fast on elementary data. The experiments show that QuickMergesort outperforms state-of-the-art library implementations, including C++'s Introsort and Java's Dual-Pivot Quicksort. Further trade-offs between a low running time and a low number of comparisons are studied. Moreover, we describe a practically efficient version with $n \log n + O(n)$ comparisons in the worst case.
Stefan Edelkamp¹ and Armin Weiß²
¹ King’s College London, UK
stefan.edelkamp@kcl.ac.uk
² Universität Stuttgart, Germany
armin.weiss@fmi.uni-stuttgart.de
1998 ACM Subject Classification F.2.2 Nonnumerical Algorithms and Problems
Keywords and phrases in-place sorting, quicksort, mergesort, analysis of algorithms
Digital Object Identifier 10.4230/LIPIcs.xxx.yyy.p
## 1 Introduction
Sorting a sequence of $n$ elements remains one of the most fascinating topics in computer science, and runtime improvements to sorting have significant impact for many applications. The lower bound of $\log(n!) \approx n \log n - 1.44n + \Theta(\log n)$ element comparisons applies to the worst and the average case.¹
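The size of the lower-order term in this bound can be checked numerically. The following is a minimal sketch, assuming the helper names `log2_factorial` and `stirling_leading_terms` (both hypothetical, not from the paper); it evaluates $\log(n!)$ through `std::lgamma` and compares it with $n \log n - n/\ln 2$, where $1/\ln 2 \approx 1.44$.

```cpp
#include <cassert>
#include <cmath>

// Information-theoretic lower bound log2(n!) on the number of comparisons,
// computed via the log-gamma function: ln(n!) = lgamma(n + 1).
double log2_factorial(double n) {
    return std::lgamma(n + 1.0) / std::log(2.0);
}

// The two leading terms n log2 n - n / ln 2 (1 / ln 2 is about 1.44).
double stirling_leading_terms(double n) {
    return n * std::log2(n) - n / std::log(2.0);
}
```

By Stirling's formula the difference between the two functions is $\tfrac12 \log(2\pi n)$ up to a vanishing term, which is the $\Theta(\log n)$ part of the bound.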
The sorting algorithms we propose in this paper are internal or in-place: they need at most $O(\log n)$ space (computer words) in addition to the array to be sorted. That means we consider Quicksort [15] an internal algorithm, whereas standard Mergesort is external because it needs a linear amount of extra space.
Based on QuickHeapsort [5, 7], Edelkamp and Weiß [9] developed the concept of QuickXsort and applied it to X = WeakHeapsort [8] and X = Mergesort. The idea – going back to UltimateHeapsort [17] – is very simple: as in Quicksort the array is partitioned into the elements greater and less than some pivot element, respectively. Then one part of the array is sorted by X and the other part is sorted recursively. The advantage is that, if X is an external algorithm, then in QuickXsort the part of the array which is not currently being sorted may be used as temporary space, which yields an internal variant of X.
¹ Logarithms denoted by $\log$ are base 2, and the term average case refers to a uniform distribution of all input permutations assuming all elements are different.
arXiv:1804.10062v1 [cs.DS] 26 Apr 2018
Using Mergesort as X, a partitioning scheme with $\lceil\sqrt{n}\rceil$ pivots, known to be optimal for classical Quicksort [22], and Ford and Johnson’s MergeInsertion as the base case [13], QuickMergesort requires at most $n \log n - 1.3999n + o(n)$ element comparisons on the average (and $n \log n - 1.4n + o(n)$ for $n = 2^k$), while preserving worst-case bounds of $n \log n + O(n)$ element comparisons and $O(n \log n)$ time for all other operations [9]. To the authors’ knowledge the average-case result is the best-known upper bound for sequential sorting with $O(n \log n)$ overall time bound, in the leading term matching, and in the linear term being less than $0.045n$ away from the lower bound. The research question addressed in this paper is whether QuickMergesort can be made practical in relation to efficient library implementations for sorting, such as Introsort and Dual-Pivot Quicksort.
Introsort [23], implemented as std::sort in C++/STL, is a mix of Insertionsort, CleverQuicksort (the Median-of-3 variant of Quicksort) and Heapsort [12, 28], where the former and latter are used as recursion stoppers (the one for improving the performance for small sets of data, the other one for improving worst-case performance). The average-time complexity, however, is dominated by CleverQuicksort.
Dual-Pivot Quicksort² by Yaroslavskiy et al. as implemented in current versions of Java (e.g., Oracle Java 7 and Java 8) is an interesting Quicksort variant using two (instead of one) pivot elements in the partitioning stage (recent proposals use three and more pivots [20]). It has been shown that – in contrast to ordinary Quicksort with an average case of $2 \cdot n \ln n + O(n)$ element comparisons – Dual-Pivot Quicksort requires at most $1.9 \cdot n \ln n + O(n)$ element comparisons on the average, and there are variants that give $1.8 \cdot n \ln n + O(n)$. For a rising number of samples for pivot selection, the leading factor decreases [2, 27, 3].
So far there is no practical (competitive in performance to state-of-the-art library implementations) sorting algorithm that is internal and constant-factor-optimal (optimal in the leading term). Maybe closest is InSituMergesort [18, 11], but even though that algorithm improves greatly over the library implementation of in-place stable sort in STL, it could not match other internal sorting algorithms. Hence, the aim of the paper is to design fast QuickMergesort variants. Instead of using a Median-of-$\sqrt{n}$ strategy, we will use the Median-of-3. For Quicksort, the Median-of-3 strategy is also known as CleverQuicksort. The constant $c$ in the bound $c \cdot n \log n + O(n)$ for the average case of comparisons of CleverQuicksort is $c = (12/7) \ln 2 \approx 1.188$. As $c < 1.8 \ln 2$, CleverQuicksort is theoretically superior to the wider class of DualPivotQuicksort algorithms considered in [2, 3, 27].
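The comparison of leading constants mixes natural and binary logarithms, so a small conversion helps when reading it. The sketch below assumes a hypothetical helper `per_log2`; it rewrites a constant stated for $n \ln n$ as the equivalent constant for $n \log n$ by multiplying with $\ln 2$.

```cpp
#include <cassert>
#include <cmath>

// Converts a leading constant stated for n ln n into the equivalent
// constant for n log2 n, using n ln n = (ln 2) * n log2 n.
double per_log2(double per_ln) {
    return per_ln * std::log(2.0);
}
```

With this conversion, `per_log2(12.0 / 7.0)` is the CleverQuicksort constant, `per_log2(1.8)` the Dual-Pivot variant and `per_log2(2.0)` ordinary Quicksort, so all three can be compared against the optimal leading constant 1.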
Another sorting algorithm studied in this paper is a mix of QuickMergesort and CleverQuicksort: during the sorting with Mergesort, for small arrays CleverQuicksort is applied.
The contributions of the paper are as follows.
1. We derive a bound on the average number of comparisons in QuickMergesort when the Median-of-3 partitioning strategy is used instead of the Median-of-$\sqrt{n}$ strategy, and show a surprisingly low upper bound of $n \log n - 0.75n + o(n)$ comparisons on average.
2. We analyze a variant of QuickMergesort where base cases of size at most $n^\beta$ for some $\beta \in [0, 1]$ are sorted using yet another sorting algorithm X; otherwise the algorithm is identical to QuickMergesort. We show that if X is called for about $\sqrt{n}$ elements and X uses at most $\alpha \cdot n \log n + O(n)$ comparisons on average, the average number of comparisons is $(1 + \alpha)/2 \cdot n \log n + O(n)$, with $(1 + \alpha)/2 \approx 1.094$ for X = Median-of-3 Quicksort.
² Oracle states: The sorting algorithm is a Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley,
Conference title on which this volume is based on.
Editors: Billy Editor and Bill Editors; pp. 2–14
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
3. We refine a trick suggested in [9] in order to obtain a bound of $n \log n + 16.1n$ comparisons in the worst case using the median-of-median algorithm [4] … sampling strategy. On average the modified algorithm is only slightly slower than the Median-of-3 variant of QuickMergesort.
4. We compare the proposals empirically to other algorithms from the literature.
important and practically relevant sub-cases. We derive an upper bound on the average number of comparisons in QuickMergesort with Median-of-3 pivot selection. In Section 3, we present changes to the algorithm that lead to the hybrid QuickMergeXsort. Next, we introduce the worst-case efficient variant MoMQuickMergesort, and, finally, we present experimental results.
## 2 QuickXsort and QuickMergesort
In this section we give a brief description of QuickXsort and extend a result concerning the
number of comparisons performed in the average case.
Let X be some sorting algorithm. QuickXsort works as follows: First, choose a pivot
element as the median of some sample (the performance will depend on the size of the
sample). Next, partition the array according to this pivot element, i. e., rearrange the array
such that all elements left of the pivot are less or equal and all elements on the right are
greater or equal than the pivot element. Then, choose one part of the array and sort it with
the algorithm X. After that, sort the other part of the array recursively with QuickXsort.
The main advantage of this procedure is that the part of the array that is not being
sorted currently can be used as temporary memory for the algorithm X. This yields fast
internal variants for various external sorting algorithms such as Mergesort. The idea is that
whenever a data element should be moved to the extra (additional or external) element space,
instead it is swapped with the data element occupying the respective position in part of the
array which is used as temporary memory. Of course, this works only if the algorithm needs
additional storage only for data elements. Furthermore, the algorithm has to keep track of
the positions of elements which have been swapped.
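The swap-instead-of-copy idea can be illustrated with a merge step that borrows part of the same array as scratch space. This is a minimal sketch under stated assumptions: the function name `merge_with_borrowed_buffer` and the index-based interface are hypothetical, and the routine is a plain swap-based merge rather than the tuned one used in the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Merges the sorted runs a[lo, mid) and a[mid, hi), borrowing
// a[buf, buf + (mid - lo)) as scratch space. The scratch range must not
// overlap [lo, hi). Every move is a swap, so the scratch elements survive
// (in permuted order) instead of being overwritten.
void merge_with_borrowed_buffer(std::vector<int>& a, std::size_t lo,
                                std::size_t mid, std::size_t hi,
                                std::size_t buf) {
    const std::size_t len = mid - lo;
    // Move the left run into the scratch area; scratch elements fill its place.
    for (std::size_t i = 0; i < len; ++i) std::swap(a[lo + i], a[buf + i]);
    std::size_t i = buf, i_end = buf + len;  // left run, now in the scratch area
    std::size_t j = mid;                     // right run, still in place
    std::size_t out = lo;                    // next output slot
    while (i < i_end && j < hi) {
        if (a[j] < a[i]) std::swap(a[out++], a[j++]);
        else             std::swap(a[out++], a[i++]);
    }
    while (i < i_end) std::swap(a[out++], a[i++]);
    // Remaining right-run elements are already in their final positions.
}
```

In QuickXsort the scratch range would be the part of the partitioned array that is not currently being sorted; its elements are only permuted, which is harmless because that part is sorted afterwards anyway.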
For the number of comparisons some general results hold for a wide class of algorithms X. Under natural assumptions the average number of comparisons of X and of QuickXsort differ only by an $o(n)$-term: Let X be some sorting algorithm requiring at most $n \log n + cn + o(n)$ comparisons on average. Then, QuickXsort with a Median-of-$\sqrt{n}$ pivot selection also needs at most $n \log n + cn + o(n)$ comparisons on average [9]. Sample sizes of approximately $\sqrt{n}$ are likely to be optimal [7, 22].
If the unlikely case happens that always the $\sqrt{n}$ smallest elements are chosen for pivot selection, $\Omega(n^{3/2})$ comparisons are performed. However, as we showed in [9], such a worst case is unlikely. Nevertheless, for improving the worst-case complexity, in [9] we suggested a trick similar to Introsort [23], leading to $n \log n + O(n)$ comparisons in the worst case (use the median of the whole array as pivot if the previous pivot was very bad). In Section 4 of this paper, we refine this method yielding a better average and worst-case performance.
One example for QuickXsort is QuickMergesort. For the Mergesort part we use standard (top-down) Mergesort, which can be implemented using $m$ extra element spaces to merge two arrays of length $m$. After the partitioning, one part of the array – for a simpler description we assume the first part – has to be sorted with Mergesort (note, however, that any of the two sides can be sorted with Mergesort as long as the other side contains at least $n/3$ elements). In order to do so, the second half of this first part is sorted recursively with Mergesort while
[Figure: snapshots of a 12-element example array; the marked steps are "sort recursively with Mergesort" (twice), "merge two parts", and "sort recursively with QuickMergesort".]
Figure 1 Example for the execution of QuickMergesort.
moving the elements to the back of the whole array. The elements from the back of the array are inserted as dummy elements into the first part. Then, the first half of the first part is sorted recursively with Mergesort while being moved to the position of the former second half of the first part. Now, at the front of the array, there is enough space (filled with dummy elements) such that the two halves can be merged. The executed stages of the algorithm QuickMergesort (with no median pivot selection strategy applied) are illustrated in Fig. 1.
Mergesort requires approximately $n \log n - 1.26n$ comparisons on average, so that with a Median-of-$\sqrt{n}$ we obtain an internal sorting algorithm with $n \log n - 1.26n + o(n)$ comparisons on average. One can do even better by sorting small subarrays with a more complicated algorithm requiring fewer comparisons – for details see [9].
Since the Median-of-3 variant (i.e. CleverQuickMergesort) shows a slightly better practical performance than with Median-of-$\sqrt{n}$ (see [9]), we provide here a theoretical analysis of it by showing that CleverQuickMergesort performs at most $n \log n - 0.75n + o(n)$ comparisons on average. In fact, as in [9] we show a more general result for CleverQuickXsort for an arbitrary algorithm X.
Theorem 1 (Average Case CleverQuickXsort). Let the algorithm X perform at most $\alpha n \log n + cn + O(\log n)$ comparisons on average. Then, CleverQuickXsort performs at most $\alpha n \log n + (c + \kappa_\alpha) n + O(\log^2 n)$ comparisons on average with $\kappa_\alpha = \frac{4}{15}\left(12 - \frac{7\alpha}{\ln 2}\right) \le 0.51$.
Since Mergesort requires at most $n \log n - 1.26n + o(n)$ comparisons on average, we obtain the following corollary:
Corollary 2 (Average Case CleverQuickMergesort). CleverQuickMergesort is an in-place algorithm that performs at most $n \log n - 0.75n + o(n)$ comparisons on average.
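The step from Theorem 1 to Corollary 2 is a short numeric evaluation, sketched below. The function names `kappa` and `clever_linear_coefficient` are hypothetical; the formula is the one stated in Theorem 1, with $\alpha = 1$ and $c = -1.26$ for Mergesort.

```cpp
#include <cassert>
#include <cmath>

// kappa_alpha = (4/15) * (12 - 7 * alpha / ln 2), as stated in Theorem 1.
double kappa(double alpha) {
    return 4.0 / 15.0 * (12.0 - 7.0 * alpha / std::log(2.0));
}

// Linear-term coefficient c + kappa_alpha of CleverQuickXsort when X needs
// alpha * n log n + c * n + O(log n) comparisons on average.
double clever_linear_coefficient(double alpha, double c) {
    return c + kappa(alpha);
}
```

For Mergesort, `clever_linear_coefficient(1.0, -1.26)` evaluates to slightly below $-0.75$, which is the coefficient claimed in Corollary 2.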
Proof of Theorem 1. The probability of choosing the $k$-th element (in the ordered sequence) as pivot of a random $n$-element array is $\Pr[\text{pivot} = k] = (k-1)(n-k)\binom{n}{3}^{-1}$ (one element of the three element set has to be less than the $k$-th, one equal to the $k$-th, and one greater than the $k$-th element of the array). Note that this holds no matter whether we select the three
elements at random or we use fixed positions and average over all input permutations. Since probabilities sum up to 1, we have
$$\sum_{k=1}^{n} (k-1)(n-k)\binom{n}{3}^{-1} = 1. \tag{1}$$
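Equation (1) is equivalent to the counting identity $\sum_{k=1}^{n}(k-1)(n-k) = \binom{n}{3}$, which can be checked exactly in integer arithmetic. The sketch below uses hypothetical function names and is only a sanity check, not part of the proof.

```cpp
#include <cassert>
#include <cstdint>

// Number of 3-element samples (out of n ranks) whose median has rank k:
// one of the k-1 smaller ranks, rank k itself, one of the n-k larger ranks.
std::uint64_t samples_with_median(std::uint64_t n, std::uint64_t k) {
    return (k - 1) * (n - k);
}

// Binomial coefficient C(n, 3), the total number of 3-element samples.
std::uint64_t choose3(std::uint64_t n) {
    return n * (n - 1) * (n - 2) / 6;
}

// Checks equation (1): the counts over all ranks add up to C(n, 3).
bool pivot_probabilities_sum_to_one(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t k = 1; k <= n; ++k) total += samples_with_median(n, k);
    return total == choose3(n);
}
```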
Moreover, partitioning preserves randomness of the two sides of the array – this includes the positions where the other two elements from the pivot sample are placed (since for a fixed pivot, every element smaller (resp. greater) than the pivot has the same probability of being part of the sample). Also, using the array as temporary space for Mergesort does not destroy randomness since the dummy elements are never compared.
Let $T(n)$ be the average-case number of comparisons of CleverQuickXsort for sorting an input of size $n$ and let
$$S(n) = \alpha n \log n + cn + d(1 + \log n)$$
be a bound for the average number of comparisons of the algorithm X (e.g. Mergesort). We will show by induction that
$$T(n) \le \alpha n \log n + (c + \kappa_\alpha) n + D(1 + \log^2 n)$$
for some constant $D \ge d$ (which we specify later such that the induction base is satisfied) and $\kappa_\alpha = \frac{4}{15}\left(12 - \frac{7\alpha}{\ln 2}\right) \le 0.51$ (since $\alpha \ge 1$ by the general lower bound on sorting). As induction hypothesis for $1 \le k \le n$ we assume that
$$\begin{aligned}
\max\{T(k-1) + S(n-k),\; T(n-k) + S(k-1)\}
&\le \alpha(k-1)\log(k-1) + \alpha(n-k)\log(n-k) + cn + \kappa_\alpha \max\{k-1, n-k\} \\
&\quad + D\left(1 + \log^2(\max\{k-1, n-k\})\right) + d\left(1 + \log(\min\{k-1, n-k\})\right) \\
&=: f(n, k).
\end{aligned}$$
In order to find the pivot element, three comparisons are needed. After that, for partitioning $n - 3$ comparisons are performed (all except the three elements of the pivot sample are compared with the pivot). Since after partitioning, one part of the array is sorted with X and the other recursively with CleverQuickXsort, we obtain the recurrence
$$T(n) \le n + \sum_{k=1}^{n} \Pr[\text{pivot} = k] \cdot \max\{T(k-1) + S(n-k),\; T(n-k) + S(k-1)\}$$
$$\le n + \sum_{k=1}^{n} \frac{(k-1)(n-k)}{\binom{n}{3}} f(n, k)$$
$$\le n + \frac{1}{\binom{n}{3}} \sum_{k=1}^{n} (k-1)(n-k)\left(\alpha(k-1)\log(k-1) + \alpha(n-k)\log(n-k)\right) \tag{2}$$
$$+ \frac{1}{\binom{n}{3}} \sum_{k=1}^{n} (k-1)(n-k)\,\kappa_\alpha \max\{k-1, n-k\} \tag{3}$$
$$+ \frac{1}{\binom{n}{3}} \sum_{k=1}^{n} (k-1)(n-k)\,D \log^2(\max\{k-1, n-k\}) \tag{4}$$
$$+ \frac{1}{\binom{n}{3}} \sum_{k=1}^{n} (k-1)(n-k)\left(cn + D + d + d \log(\min\{k-1, n-k\})\right) \tag{5}$$
We simplify the terms (2)–(5) separately using http://www.wolframalpha.com/ for evaluating the sums and integrals. The function $x \mapsto g(x) = (x-1)^2 (n-x) \log(x-1)$ is non-negative and has a single maximum for $1 \le x \le n$ at position $x = \xi$; on the left of $\xi$, it is monotonically increasing, on the right monotonically decreasing. Therefore,
$$\sum_{k=1}^{n} g(k) = \sum_{k=1}^{\lfloor \xi \rfloor} g(k) + \sum_{k=\lfloor \xi \rfloor + 1}^{n} g(k) \le \int_{1}^{\lfloor \xi \rfloor} g(x)\,dx + \int_{\lfloor \xi \rfloor + 1}^{n} g(x)\,dx + 2g(\xi).$$
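Since the second term of (2) is obtained from the first one by a substitution $k \mapsto n + 1 - k$, it follows that
$$\begin{aligned}
(2) - n &\le \frac{\alpha}{\binom{n}{3}} \cdot \left( \int_{1}^{n} g(x)\,dx + \int_{1}^{n} g(n + 1 - x)\,dx + 4g(\xi) \right) \\
&\le \frac{\alpha}{\binom{n}{3}} \cdot \left( 2\int_{1}^{n} x^2 (n - x) \log x \,dx + 4n^3 \log n \right) \\
&= \frac{\alpha}{\binom{n}{3}} \cdot \left( \frac{2}{144 \ln 2}\, n^4 (12 \ln n - 7) + 4n^3 \log n \right) \le \alpha n \log n - \frac{7\alpha}{12 \ln 2}\, n + c_3 \alpha \log n
\end{aligned}$$
for some properly chosen constant $c_3$. Now, first assume that $\kappa_\alpha \ge 0$. Then we have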
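$$(3) \le \frac{2\kappa_\alpha}{\binom{n}{3}} \sum_{k=1}^{\lceil n/2 \rceil} (k-1)(n-k)^2 \le \frac{2\kappa_\alpha n}{192\binom{n}{3}} \left(11n^3 - 20n^2 - 44n + 80\right) \le \frac{11}{16}\kappa_\alpha n + c_4$$
for some constant $c_4$. On the other hand, if $\kappa_\alpha < 0$, we have
$$(3) \le \frac{2\kappa_\alpha}{\binom{n}{3}} \sum_{k=1}^{\lfloor n/2 \rfloor} (k-1)(n-k)^2 \le \frac{2\kappa_\alpha n}{192\binom{n}{3}} \left(11n^3 - 68n^2 - 100n + 16\right) \le \frac{11}{16}\kappa_\alpha n + c_4$$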
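for some constant $c_4$. Thus, in any case, we have $(3) \le \frac{11}{16}\kappa_\alpha n + c_4$. With the same argument as for (2), we have
$$\begin{aligned}
(4) &\le \frac{2D}{\binom{n}{3}} \sum_{k=\lfloor n/2 \rfloor}^{n} (k-1)(n-k) \log^2(k-1) \\
&\le \frac{2D}{\binom{n}{3}} \int_{\lfloor n/2 \rfloor}^{n} (x-1)(n-x) \log^2(x-1)\,dx + D \cdot c_5' \\
&\le D \log^2 n - \frac{5D}{3} \log n + D \cdot c_5
\end{aligned}$$
for some constants $c_5'$ and $c_5$. Finally, by (1), we have
$$(5) \le cn + D + d + d \log(n/2) = cn + D + d \log n.$$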
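Now, we combine all the terms and obtain
$$T(n) \le \alpha n \log n + n\left(1 - \frac{7\alpha}{12 \ln 2} + c + \frac{11}{16}\kappa_\alpha\right) + c_3 \alpha \log n + c_4 + D \log^2 n - \frac{5D}{3} \log n + D c_5 + D + d \log n.$$
We can choose $D$ such that $\frac{5D}{3} \log n \ge c_3 \alpha \log n + c_4 + D c_5 + D + d \log n$ for $n$ large enough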
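and $D \ge T(n)$ for all smaller $n$. Hence, we conclude the proof of Theorem 1:
$$\begin{aligned}
T(n) &\le \alpha n \log n + n\left(1 - \frac{7\alpha}{12 \ln 2} + c + \frac{11}{16}\kappa_\alpha\right) + D \log^2 n + D \\
&= \alpha n \log n + n\left(1 - \frac{7\alpha}{12 \ln 2} + c + \frac{11}{16} \cdot \frac{4}{15}\left(12 - \frac{7\alpha}{\ln 2}\right)\right) + D \log^2 n + D \\
&= \alpha n \log n + (c + \kappa_\alpha) n + D \log^2 n + D.
\end{aligned}$$
∎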
Notice that in the case that in each recursion level always the smaller part is sorted with X, the inequalities in the proof of Theorem 1 are tight up to some lower order terms. Thus, the proof can be easily modified to provide a lower bound of $\alpha n \log n + (c + \kappa_\alpha) n - O(\log^2 n)$ comparisons in this special case.
## 3 QuickMergeXsort
QuickMergeXsort agrees with QuickMergesort up to the following change: for arrays of
size smaller than some threshold cardinality X_THRESH, the sorting algorithm X is called
(instead of Mergesort) and the sorted elements are moved to their target location expected
by QuickMergesort.
Fig. 2 provides the full implementation details of QuickMerge(X)sort (in C++). The
realization of the sorting algorithm X and the partitioning algorithm have to be added. The
listing shows that by dropping the base cases from QuickMergesort the code is short enough
for textbooks on algorithms and data structures. The general principle is that we have a
merging step that takes two sorted areas, merges and swaps them into a third one.
The program msort applies Mergesort with X as a stopper. It goes down the recursion
tree and shrinks the size of the array accordingly. If the array is small enough, the algorithm
calls X followed by a joint movement (memory copy) of array elements (the only change of
code wrt. QuickMergesort). The algorithm out serves as an interface between the recursive
procedure msort and top-level procedure sort. Last, but not least, we have the overall
internal sorting algorithm sort, that performs the partitioning. The following result is a generalization of the $1 \cdot n \log n + cn + o(n)$ average comparisons bound in [9]. Indeed, the proof is almost a verbatim copy of the proof of [9, Thm. 1] (compare to the role of $\alpha$ in the proof of Theorem 1).
Theorem 3 (Average-Case QuickXsort). For $\alpha \ge 1$ let X be some sorting algorithm requiring at most $\alpha \cdot n \log n + cn + o(n)$ comparisons on average. Then, QuickXsort with a Median-of-$\sqrt{n}$ pivot selection also needs at most $\alpha \cdot n \log n + cn + o(n)$ comparisons on average.
We are now ready to analyze the average-case performance of QuickMergeXsort.
Theorem 4 (Average-Case QuickMergeXsort/CleverQuickMergeXsort). Let X be a sorting algorithm with $\alpha \cdot n \log n + cn + o(n)$ comparisons in the average case, called when reaching $n^\beta$ elements, $0 < \beta < 1$. Then, QuickMergeXsort with Median-of-$\sqrt{n}$ pivot selection, as well as with Median-of-3 pivot selection, is a sorting algorithm that needs at most $(\alpha\beta + (1 - \beta)) \cdot n \log n + O(n)$ comparisons in the average case.
Proof.
To begin with we analyze MergeXsort, i.e., Mergesort, with recursion stopper X. We
assume that every path of the recursion tree of Mergesort has the same length until the
    // To be supplied by the user: the element type t, a signed integer type
    // index, the constant X_THRESH (at least 2), the sorting routine X, and
    // partition (which returns the final position of the pivot).
    typedef std::vector<t>::iterator iter;

    // Merges the run [begin1, end1) with the run stored at the end of
    // [target, endtarget); displaced elements end up in [begin1, end1).
    void merge(iter begin1, iter end1, iter target, iter endtarget) {
      iter i1 = begin1, i2 = target + (end1 - begin1), ires = target;
      t temp = *target;
      while (i1 != end1 && i2 != endtarget) {
        iter tempit = (*i1 < *i2) ? i1++ : i2++;
        *ires++ = *tempit; *tempit = *ires;
      }
      while (i1 < end1) {
        *ires++ = *i1;
        if (ires != endtarget) *i1 = *ires;  // avoid reading past the end
        ++i1;
      }
      *(i1 - 1) = temp;
    }

    // Sorts [begin, end) into [target, target + n); the old contents of the
    // target area are moved to [begin, end).
    void msort(iter begin, iter end, iter target) {
      index n = end - begin;
      if (n < X_THRESH) {
        X(begin, end);
        for (index i = 0; i < n; i++) std::swap(begin[i], target[i]);
      }
      else {
        index q = n / 2;
        msort(begin + q, end, target + q);
        // n - q (instead of q) keeps the two runs aligned when n is odd
        msort(begin, begin + q, begin + (n - q));
        merge(begin + (n - q), begin + n, target, target + n);
      }
    }

    // Sorts [begin, end) in place, using the area starting at temp as buffer.
    void out(iter begin, iter end, iter temp) {
      index n = end - begin;
      if (n > 1) {
        index q = n / 2, r = n - q;
        msort(begin + q, end, temp);
        msort(begin, begin + q, begin + r);
        merge(temp, temp + r, begin, end);
      }
    }

    void sort(std::vector<t> &a) {
      iter begin = a.begin(), end = a.end();
      while (begin < end) {
        iter b = partition(begin, end);
        // Mergesort the smaller side, continue with the larger one
        if (b < begin + (end - begin) / 2) { out(begin, b, b + 1); begin = b + 1; }
        else { out(b + 1, end, begin); end = b; }
      }
    }

Figure 2 Implementation of QuickMergeXsort.
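Since the listing leaves X and the partitioner open, a self-contained variant may help to experiment with it. The following is a sketch under stated assumptions, not the authors' implementation: the namespace `qmx`, all function names, the threshold value 8, insertion sort as X, and the Lomuto-style Median-of-3 partitioner are choices made here for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace qmx {  // hypothetical namespace, not from the paper

using T = int;
using Iter = std::vector<T>::iterator;
using Index = std::ptrdiff_t;
const Index kXThresh = 8;  // assumed cut-off for X; must be at least 2

// Assumed stand-in for X: insertion sort.
void x_sort(Iter begin, Iter end) {
    for (Iter i = begin; i != end; ++i)
        for (Iter j = i; j != begin && *j < *(j - 1); --j) std::iter_swap(j, j - 1);
}

// Assumed stand-in partitioner (Median-of-3 pivot, Lomuto scan); needs at
// least two elements and returns the final position of the pivot.
Iter partition_median3(Iter begin, Iter end) {
    Iter mid = begin + (end - begin) / 2, last = end - 1;
    if (*mid < *begin) std::iter_swap(mid, begin);
    if (*last < *begin) std::iter_swap(last, begin);
    if (*last < *mid) std::iter_swap(last, mid);  // *mid is now the median
    std::iter_swap(mid, last);                    // park the pivot at the end
    Iter store = begin;
    for (Iter it = begin; it != last; ++it)
        if (*it < *last) std::iter_swap(it, store++);
    std::iter_swap(store, last);
    return store;
}

// Merges run [begin1, end1) with the run at the end of [target, end_target).
// Buffer elements displaced from the target area end up in [begin1, end1).
void merge_runs(Iter begin1, Iter end1, Iter target, Iter end_target) {
    Iter i1 = begin1, i2 = target + (end1 - begin1), res = target;
    T hole = *target;  // first displaced buffer element
    while (i1 != end1 && i2 != end_target) {
        Iter src = (*i1 < *i2) ? i1++ : i2++;
        *res++ = *src;
        *src = *res;   // next buffer element fills the slot just vacated
    }
    while (i1 != end1) {
        *res++ = *i1;
        if (res != end_target) *i1 = *res;
        ++i1;
    }
    *(end1 - 1) = hole;
}

// Sorts [begin, end) into [target, target + n); the previous contents of the
// target area are moved to [begin, end).
void msort(Iter begin, Iter end, Iter target) {
    Index n = end - begin;
    if (n < kXThresh) {
        x_sort(begin, end);
        for (Index i = 0; i < n; ++i) std::swap(begin[i], target[i]);
    } else {
        Index q = n / 2;
        msort(begin + q, end, target + q);
        msort(begin, begin + q, begin + (n - q));
        merge_runs(begin + (n - q), end, target, target + n);
    }
}

// Sorts [begin, end) in place, borrowing the area starting at temp as buffer.
void out(Iter begin, Iter end, Iter temp) {
    Index n = end - begin;
    if (n > 1) {
        Index q = n / 2, r = n - q;
        msort(begin + q, end, temp);
        msort(begin, begin + q, begin + r);
        merge_runs(temp, temp + r, begin, end);
    }
}

// Top level: partition, Mergesort the smaller side, iterate on the larger one.
void quick_merge_x_sort(std::vector<T>& a) {
    Iter begin = a.begin(), end = a.end();
    while (end - begin > 1) {
        Iter b = partition_median3(begin, end);
        if (b < begin + (end - begin) / 2) { out(begin, b, b + 1); begin = b + 1; }
        else { out(b + 1, end, begin); end = b; }
    }
}

}  // namespace qmx
```

Calling `qmx::quick_merge_x_sort(v)` on a `std::vector<int>` sorts it in place. The Lomuto scan is chosen for brevity and degrades on inputs with many equal keys; a production partitioner would handle duplicates differently.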
algorithm switches to X. This can be easily implemented and guarantees that all calls to X
are made on arrays of almost identical size.
First, we look at the $\lceil(\log n) \cdot (1 - \beta)\rceil$ top layers of the recursion tree, which are sorted by Mergesort. In the worst-case, in layer $i$ of the tree, Mergesort requires at most $n - 2^i < n$ comparisons, so that in total we have at most $C'_{\mathrm{MergeXsort}}(n) = n \cdot \lceil(1 - \beta) \cdot \log n\rceil$ element comparisons. The average case differs only negligibly from the worst case.
In the $\lceil(\log n) \cdot (1 - \beta)\rceil$ recursion levels of Mergesort, $2^{\lceil(1-\beta)\log n\rceil}$ sorted arrays are merged to one large sorted array. Each of the $g_\beta(n) = 2^{\lceil(1-\beta)\log n\rceil}$ arrays is of size at most $f_\beta(n) = 2^{\log n - \lceil(1-\beta)\log n\rceil} \le n^\beta$.
Next, we look at the $g_\beta(n) = 2^{\lceil(1-\beta)\log n\rceil}$ calls to X. Let $C_X(n)$ denote the average number of element comparisons executed by all calls of X. Given that $g_\beta(n) f_\beta(n) = n + O(n^{1-\beta})$
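and $\log f_\beta(n) = \log 2^{\log n - \lceil(1-\beta)\log n\rceil} = \log n - \lceil(1 - \beta)\log n\rceil + O(1/n^\beta)$, we obtain
$$\begin{aligned}
C_X(n) &= g_\beta(n) \cdot \left(\alpha \cdot f_\beta(n) \log f_\beta(n) + c f_\beta(n) + o(f_\beta(n))\right) \\
&= \alpha \cdot n \log f_\beta(n) + cn + o(n) = \alpha \cdot n\left(\log n - \lceil(1 - \beta)\log n\rceil\right) + cn + o(n).
\end{aligned}$$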
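In cumulation, for the average-case number of comparisons of MergeXsort we have the following upper bound
$$\begin{aligned}
C_{\mathrm{MergeXsort}}(n) &= C_X(n) + C'_{\mathrm{MergeXsort}}(n) \\
&\le \alpha \cdot n\left(\log n - \lceil(1 - \beta)\log n\rceil\right) + cn + o(n) + n\lceil(1 - \beta)\log n\rceil \\
&= n\left(\alpha \cdot \log n - (\alpha - 1)\lceil(1 - \beta)\log n\rceil\right) + cn + o(n) \\
&= (\alpha\beta + (1 - \beta)) \cdot n \log n + O(n).
\end{aligned}$$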
Using Theorem 3 (resp. Theorem 1 for Median-of-3) we obtain the matching bound of at most $(\alpha\beta + (1 - \beta)) \cdot n \log n + O(n)$ element comparisons on average for QuickMergeXsort.
∎
Theorem 4 implies that CleverQuickMergeXsort implemented with CleverQuicksort as recursion stopper at $\sqrt{n}$ elements ($\beta = 1/2$) is a sorting algorithm that needs at most $((\alpha + 1)/2) \cdot n \log n + O(n) = 1.094 \cdot n \log n + O(n)$ comparisons on average.
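The leading constant of Theorem 4 is a convex combination of $\alpha$ and 1, which the following small sketch evaluates. The function name `leading_constant` is hypothetical; the value of $\alpha$ for CleverQuicksort, $(12/7)\ln 2$, is the one given in the introduction.

```cpp
#include <cassert>
#include <cmath>

// Leading constant alpha * beta + (1 - beta) of QuickMergeXsort (Theorem 4),
// where X costs alpha * n log n + O(n) and is called on about n^beta elements.
double leading_constant(double alpha, double beta) {
    return alpha * beta + (1.0 - beta);
}
```

Moving the cut-off exponent $\beta$ towards 0 approaches the constant 1 of Mergesort, while $\beta$ close to 1 approaches the constant $\alpha$ of X itself.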
## 4 Worst-Case Efficient QuickMergeSort
Although QuickMergesort has an $O(n^2)$ worst-case running time, it is quite simple to guarantee a worst-case number of comparisons of $n \log n + O(n)$: just choose the median of the whole array as pivot. This is essentially how InSituMergesort [11] works. The most efficient way for finding the median is using Quickselect [14] as applied in InSituMergesort. However, this does not allow the desired bound on the number of comparisons (not even when using IntroSelect as in [11]). Alternatively, one could use the median-of-medians algorithm [4], which, while having a linear worst-case running time, on average is quite slow. In this section we describe a slight variation of the median-of-medians approach, which combines a linear worst-case running time with almost the same average performance as InSituMergesort.
Again, the crucial observation is that it is not necessary to use the actual median as pivot. As remarked in Section 2, the larger of the two sides of the partitioned array can be sorted with Mergesort as long as the smaller side contains at least one third of the total number of elements. Therefore, it suffices to find a pivot which guarantees such a partition. For doing so, we can apply the idea of the median-of-medians algorithm: for sorting an array of $n$ elements, we choose first $n/3$ elements as median of three elements each. Then, the median-of-medians algorithm is used to find the median of those $n/3$ elements. This median becomes the next pivot. Like for the median-of-medians algorithm [4], this ensures that at least $2 \cdot \lfloor n/6 \rfloor$ elements are less or equal and at least the same number of elements are greater or equal than the pivot – thus, always the larger part of the partitioned array can be sorted with Mergesort and the recursion takes place on the smaller part. The big advantage over the straightforward application of the median-of-medians algorithm is that it is called on an array of only size $n/3$ (with the cost of introducing a small overhead for finding the $n/3$ medians of three) – giving less weight on its big constant for the linear number of comparisons. We call this algorithm MoMQuickMergesort (MOMQMS).
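The pivot guarantee can be observed with a short sketch. It is not the paper's implementation: `median3` and `sample_pivot` are hypothetical names, and `std::nth_element` is swapped in for the median-of-medians routine, so the sketch finds the same kind of pivot (the median of the $\lfloor n/3 \rfloor$ medians of three) but without the worst-case comparison bound.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Median of three values using at most three comparisons.
int median3(int a, int b, int c) {
    if (b < a) std::swap(a, b);      // now a <= b
    if (c < b) b = (c < a) ? a : c;  // median is max(a, c) when c < b
    return b;
}

// Pivot = median of the floor(n/3) medians of consecutive triples.
// Precondition: a.size() >= 3.
// Assumption: std::nth_element stands in for the median-of-medians routine;
// the standard only promises linear time on average for it.
int sample_pivot(const std::vector<int>& a) {
    std::vector<int> medians;
    for (std::size_t i = 0; i + 2 < a.size(); i += 3)
        medians.push_back(median3(a[i], a[i + 1], a[i + 2]));
    auto mid = medians.begin() + medians.size() / 2;
    std::nth_element(medians.begin(), mid, medians.end());
    return *mid;
}
```

Counting the elements on either side of the returned pivot shows at least $2 \cdot \lfloor n/6 \rfloor$ elements on each side, which is the property the partition step relies on.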
In our implementation of the median-of-medians algorithm, we select the pivot as median of the medians of groups of five elements – we refer to [6, Sec. 9.3] for a detailed
description. The total number $T(n)$ of comparisons in the worst case of MoMQuickMergesort is bounded by
$$T(n) \le T(n/2) + S(n/2) + M(n/3) + \frac{n}{3} \cdot 3 + \frac{2}{3}n$$
where $S(n)$ is the number of comparisons incurred by Mergesort and $M(n)$ the number of comparisons for the median-of-medians algorithm. We have $M(n) \le 22n$ (for the variant used in our implementation, which uses seven comparisons for finding the median of five elements). The $\frac{n}{3} \cdot 3$-term comes from finding $n/3$ medians of three elements, the $2n/3$ comparisons from partitioning the remaining elements (after finding the pivot, the correct side of the partition is known for $n/3$ elements).
Since by [19] we have $S(n) \le n \log n - 0.9n$, this yields
$$T(n) \le T(n/2) + \frac{n}{2}\log(n/2) - 0.9\,\frac{n}{2} + \frac{22}{3}n + \frac{5}{3}n$$
resolving to $T(n) \le n \log n + 16.1n$.
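The stated bound can be sanity-checked by unrolling the recurrence numerically. The sketch below is restricted to $n = 2^k$ and assumes the base case $T(1) = 0$, which the text does not specify; the function name `worst_case_bound` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Unrolls T(n) = T(n/2) + (n/2) log2(n/2) - 0.9 n/2 + (22/3) n + (5/3) n
// for n = 2^k, with T(1) = 0 as an assumed base case.
double worst_case_bound(int k) {
    double t = 0.0;
    for (int j = 1; j <= k; ++j) {
        double n = std::ldexp(1.0, j);  // n = 2^j, so log2(n/2) = j - 1
        t += (n / 2) * (j - 1) - 0.9 * n / 2 + 22.0 / 3.0 * n + 5.0 / 3.0 * n;
    }
    return t;
}
```

Under these assumptions the unrolled value stays below $n \log n + 16.1n$ for every power of two tried, in line with the bound above.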
For our implementation we also use a slight improvement over the basic median-of-medians algorithm by using the approach of adaption, which was first introduced in [21] for Quickselect and recently applied to the median-of-medians algorithm [1]. More specifically, whenever in a recursive call the $k$-th element is searched with $k$ far apart from $n/2$ (more precisely for $k \le 0.3n$ or $k \ge 0.7n$), we do not choose the median of the medians as pivot but an element proportional to $k$ (while still guaranteeing that at least $0.3n$ … next recursive call as in [4]).
Notice that in the presence of duplicate elements, we need to apply three-way partitioning
for guaranteeing that worst-case number of comparisons (that is elements equal to the pivot
are placed in the middle and not included into the recursive call nor into Mergesort). With
the usual partitioning (as in our experiments), we obtain a worse bound for the worst case
since it might happen that the smaller part of the array has to be sorted with Mergesort.
In order to achieve the guarantee for the worst case together with the efficiency of the Median-of-3 pivot sampling, we can combine the two approaches using a trick similar to Introsort [23]: we fix some small $\delta > 0$. Whenever the pivot is contained in the interval $[\delta n, (1 - \delta)n]$, the next pivot is selected as Median-of-3, otherwise according to the worst-case efficient procedure described in the previous section – for the following pivots switch back to Median-of-3. When choosing $\delta$ not too small, the worst case number of comparisons will be only approximately $2n$ more than that of MoMQuickMergesort (because in the worst case before every partitioning step according to MoMQuickMergesort, there will be one partitioning step with Median-of-3 using $n$ comparisons), while the average is almost as CleverQuickMergesort. We propose $\delta = 1/16$. We call this algorithm HybridQuickMergesort (HQMS).
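The switching rule can be written as a small decision function. This is a sketch only: the enum `PivotRule`, the function `next_pivot_rule`, and the convention that `pos` is the 0-based rank at which the previous pivot landed are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

enum class PivotRule { MedianOf3, MedianOfMedians };

// Rule for choosing the next pivot, given the rank `pos` at which the
// previous pivot landed in an array of `n` elements. Inside the interval
// [delta * n, (1 - delta) * n] the cheap Median-of-3 rule is kept; outside,
// the worst-case efficient rule is used for the next step.
PivotRule next_pivot_rule(std::size_t pos, std::size_t n, double delta = 1.0 / 16.0) {
    double lo = delta * static_cast<double>(n);
    double hi = (1.0 - delta) * static_cast<double>(n);
    double p = static_cast<double>(pos);
    return (p >= lo && p <= hi) ? PivotRule::MedianOf3 : PivotRule::MedianOfMedians;
}
```

With the proposed $\delta = 1/16$ and $n = 1000$, a pivot landing at rank 500 keeps Median-of-3, while ranks 10 or 990 trigger one step of the worst-case efficient procedure.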
## 5 Experiments
The collection of sorting algorithms we considered for comparison is much larger than the one we present here, but the bar of being competitive wrt. state-of-the-art library implementations in C++ and Java on basic data types is surprisingly high. For example, all Heapsort variants we are aware of fail this test; we checked refined implementations of Binary Heapsort [12, 28], Bottom-Up Heapsort [26], MDR Heapsort [25], QuickHeapsort [7], and Weak-Heapsort [8]. Some of these algorithms even use extra space. Timsort (by Tim Peters; used in Java for sorting non-elementary object sequences) was less performant on simple data types.
There are fast algorithms that exploit the set of keys to be sorted (like CountingSort or Radixsort), but we aim at a general algorithm.
We also experimented with Sanders and Winkel’s SuperScalarSampleSort that has a particular memory profile [24, 10]. The main reason not to include the results was that it allocates substantial amounts of space for the elements and, thus, is not internal. We experienced that it acts fast on random data, but not as good on presorted inputs.
One remaining competitor was (Bottom-Up) Mergesort (std::stable_sort) in the C++ STL,
which shows very good performance on some inputs. As this is an external algorithm, we
chose a tuned version of in-place Mergesort called InSituMergesort (ISMS) [11] for our
experiments (stl::inplace_stable_sort simply was too slow).
According to [2, 27, 3], there was no clear-cut winner among the DualPivotQuicksort algorithm
variants, but the experiments suggested that the standard ones had a slight edge.
For DualPivotQuicksort we translated Oracle’s most recent (Java) version (the algorithm
selects the 2nd and 4th element of the inner five pivot candidates of a split-into-7). As the
full sorting algorithm is lengthy and contains many checks for special input types (with
different code fragments and parameter settings for sorting arrays of bytes, shorts, ints, floats,
doubles etc.), we extracted the integer part.
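The pivot choice mentioned in parentheses can be sketched as follows. This is only an illustration, not Oracle's code: the function name is hypothetical, the spacing `n // 7` is an assumed simplification of the split-into-7, and the sketch returns the pivots instead of rearranging the array.

```python
def dual_pivot_candidates(a, lo, hi):
    """Return two pivots for a[lo..hi] (hi inclusive, at least 5 elements).

    Five candidates are taken around the middle, roughly one seventh of
    the range apart; the 2nd and 4th smallest of them become the pivots.
    """
    n = hi - lo + 1
    step = max(1, n // 7)          # assumed approximation of "one seventh"
    mid = (lo + hi) // 2
    positions = [mid - 2 * step, mid - step, mid, mid + step, mid + 2 * step]
    candidates = sorted(a[p] for p in positions)
    return candidates[1], candidates[3]
```

On the already sorted input `list(range(70))` the candidates sit at positions 14, 24, 34, 44 and 54, so the two pivots are 24 and 44.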
TunedQuicksort [11] is an engineered implementation of CleverQuicksort, probably unnoticed
by the public, contained in a paper on tuning Mergesort for studying branch
misprediction as in [16]. It applies Lomuto’s uni-directional Median-of-3 partitioner [6],
which works well for permutations and a limited number of duplicates in the element set.
As with Introsort, the algorithm stops the recursion once fewer than a fixed number of elements
(16 in our case) remain. These elements are then sorted together by calling STL’s Insertionsort
algorithm. The implementation uses a stack to avoid recursion; the stack tracks the remaining
array intervals to be processed. We dropped TunedQuicksort from
the experiments as it failed on presorted data and data with duplicates, but we used parts of
its efficient stack-based implementation. This advanced CleverQuicksort implementation and
CleverQuickMergesort (QMS) are the two extremes, while CleverQuickMergeCleverQuicksort
(QuickMergeCleverQuicksort with a modified TunedQuicksort implementation at $n$
elements), QMQS for short, is our tested intermediate.
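The ingredients named in this paragraph (a uni-directional Median-of-3 partitioner, an explicit stack instead of recursion, and a final Insertionsort pass over all small leftover ranges) can be combined as in the following sketch. It is not the TunedQuicksort code from [11]; all function names are hypothetical, and only the cutoff of 16 is taken from the text.

```python
CUTOFF = 16  # ranges shorter than this are left for the final insertion sort


def median_of_3_index(a, lo, hi):
    """Index of the median of a[lo], a[mid], a[hi] (hi inclusive)."""
    mid = (lo + hi) // 2
    trio = sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])
    return trio[1][1]


def lomuto_partition(a, lo, hi):
    """Uni-directional partition of a[lo..hi]; returns the pivot's final index."""
    p = median_of_3_index(a, lo, hi)
    a[p], a[hi] = a[hi], a[p]            # park the pivot at the right end
    pivot = a[hi]
    store = lo
    for i in range(lo, hi):              # single left-to-right scan
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]    # move the pivot into its final place
    return store


def insertion_sort(a):
    """Plain insertion sort; cheap here because every element is near its place."""
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def stack_quicksort(a):
    """Sort the list a in place without recursion."""
    stack = [(0, len(a) - 1)]            # intervals still to be processed
    while stack:
        lo, hi = stack.pop()
        while hi - lo + 1 >= CUTOFF:
            p = lomuto_partition(a, lo, hi)
            # Defer the larger side, continue with the smaller one,
            # which keeps the stack depth logarithmic.
            if p - lo < hi - p:
                stack.append((p + 1, hi))
                hi = p - 1
            else:
                stack.append((lo, p - 1))
                lo = p + 1
    insertion_sort(a)                    # one pass sorts all small leftovers
```

Calling `stack_quicksort(data)` sorts the list `data` in place. With many equal keys the scan in `lomuto_partition` produces very unbalanced splits, so this sketch shares the weakness on duplicates that a plain uni-directional partitioner has.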
QMS uses hard-coded base cases for $n < 10$, while the recursion stopper in QMQS does
not. Depending on the size of the arrays, the displayed numbers are averages over multiple
runs (repeats)³. The arrays we sorted were random permutations of $\{1, \ldots, n\}$. The number
of element comparisons was measured by increasing a counter for every comparison.
For CPU time experiments we used vectors of integers, as this is often most challenging
for algorithms with a lower number of comparisons. All algorithms sort the same arrays. As
counting the number of comparisons affects the speed of the sorting algorithms, we ran a
separate set of experiments for further measurements (e.g., moves and comparisons).
We also made element comparisons more expensive (we experimented with logarithms, and
with elements as vectors and records). Owing to the lower number of comparisons, the results
were even better.
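One way to make comparisons artificially expensive, in the spirit of the logarithm experiment, is sketched below. The names `log_compare` and `sort_with_log_comparator` are hypothetical, Python's built-in sort stands in for the algorithms under test, and the inputs are assumed to be positive integers.

```python
import math
from functools import cmp_to_key


def log_compare(x, y):
    """Three-way comparison of two positive integers via their logarithms."""
    lx, ly = math.log(x), math.log(y)
    return (lx > ly) - (lx < ly)         # -1, 0 or 1


def sort_with_log_comparator(values):
    """Sort with the expensive comparator; every comparison costs two logarithms."""
    return sorted(values, key=cmp_to_key(log_compare))
```

Because each comparison now dominates the cost of a move, an algorithm that performs fewer comparisons is expected to gain more in such a setting.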
As a first empirical observation, for Introsort (Std) the number of element comparisons
divided by $n \log n$ is larger than 1.18, due to larger lower-order terms. As theoretically
shown, for QMS the number of element comparisons divided by $n \log n$ was below 1.
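The normalized comparison count can be measured along the following lines. This is a sketch under assumptions, not the instrumentation used in the experiments: the class `CountedKey` and the function `comparisons_per_n_log_n` are hypothetical names, the logarithm is taken to base 2, and Python's built-in sort is only a stand-in for the algorithm being measured.

```python
import math


class CountedKey:
    """Wraps a value and counts every `<` comparison made on it."""

    comparisons = 0                      # shared counter, reset per measurement

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        CountedKey.comparisons += 1
        return self.value < other.value


def comparisons_per_n_log_n(values, sort=sorted):
    """Sort `values` (at least 2 elements) and return the pair
    (comparisons / (n * log2 n), sorted values)."""
    CountedKey.comparisons = 0
    result = sort([CountedKey(v) for v in values])
    n = len(values)
    ratio = CountedKey.comparisons / (n * math.log2(n))
    return ratio, [k.value for k in result]
```

Passing a different callable as `sort` measures another algorithm on the same input, provided it compares elements only with `<`.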
For our QuickMergesort implementations we used the block partitioner from [10], which
improves the performance considerably over the standard Hoare partitioner. Figs. 3–4 show
the results when sorting random integer data (with QMQS: CleverQuickMergeCleverQuicksort,
QMS: CleverQuickMergesort, MOMQMS: worst-case-efficient QuickMergesort, HQMS: hybrid
of worst-case- and average-case-efficient QuickMergesort, ISMS: InSituMergesort, Java:
DualPivotQuicksort, and Std: std::sort). Times displayed are the total running times
divided by the number of elements (in ns). We see that the QuickMergesort variants are fast.
Element moves are measured as assignments of input data elements; e.g., a swap of two
elements is counted as three moves.
³ Experiments were run on one core of an Intel Core i5-2520M CPU (2.50GHz, 3MB Cache) with 16GB
RAM; Operating system: Ubuntu Linux 64bit; Compiler: GNU’s g++ (4.8.2); optimized with flags
-O3 -march=native -funroll-loops.
Figure 3 Time (left) and element comparisons (right) for sorting random integer data.
Figure 4 Time for sorting random data with a comparator that applies the logarithm to the
integer elements (left), and number of element moves (right).
6 Conclusion
Sorting $n$ elements is one of the most frequently studied subjects in computer science, with
applications in almost all areas in which efficient programs run.
With variants of QuickMergesort, we contributed sorting algorithms that are able to
run faster than Introsort and DualPivotQuicksort even for elementary data. Compared to
Introsort, we reduced the leading term $\alpha$ in $\alpha \cdot n \log n + O(n)$ in the average
number of comparisons from $\alpha \approx 1.18$ via $1.09$ to finally reaching $1$. The algorithms
are simple but effective: a) Median-of-3 pivot selection (as opposed to using a sample of
$\sqrt{n}$), b) faster sorting for smaller element sets. Both modifications show empirical impact
and are analyzed theoretically to provide upper bounds on the average number of comparisons.
We discussed options to guarantee a constant-factor optimal worst case.
In the theoretical part of our work we concentrated on average-case analyses, as we
strongly believe that this reflects realistic behavior more closely than worst-case analyses.
With very low overhead, QuickMergesort has been implemented in a way that it becomes
constant-factor optimal in the worst case, too. We chose efficient deterministic median-of-medians
strategies that are also of interest for further considerations.
For future research we propose the integration of QuickMergesort with multi-way merging,
with the aim of scaling the algorithm beyond main-memory capacity and of enabling effective
parallelization.
References
1 Andrei Alexandrescu. Fast deterministic selection. In 16th International Symposium on Experimental Algorithms, SEA 2017, June 21-23, 2017, London, UK, pages 24:1–24:19, 2017.
2 Martin Aumüller and Martin Dietzfelbinger. Optimal partitioning for dual pivot quicksort - (extended abstract). In ICALP, pages 33–44, 2013.
3 Martin Aumüller, Martin Dietzfelbinger, and Pascal Klaue. How good is multi-pivot quicksort? CoRR, abs/1510.04676, 2015.
4 Manuel Blum, Robert W. Floyd, Vaughan R. Pratt, Ronald L. Rivest, and Robert E. Tarjan. Time bounds for selection. Journal of Computer and System Sciences, 7(4):448–461, 1973.
5 D. Cantone and G. Cinotti. QuickHeapsort, an efficient mix of classical sorting algorithms. Theoretical Computer Science, 285(1):25–42, 2002.
6 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. The MIT Press, 3rd edition, 2009.
7 Volker Diekert and Armin Weiß. Quickheapsort: Modifications and improved analysis. In CSR, pages 24–35, 2013.
8 Ronald D. Dutton. Weak-heap sort. BIT, 33(3):372–381, 1993.
9 Stefan Edelkamp and Armin Weiß. QuickXsort: Efficient sorting with n log n - 1.399n + o(n) comparisons on average. In CSR, pages 139–152, 2014.
10 Stefan Edelkamp and Armin Weiß. Blockquicksort: Avoiding branch mispredictions in Quicksort. In Piotr Sankowski and Christos D. Zaroliagis, editors, 24th Annual European Symposium on Algorithms, ESA 2016, August 22-24, 2016, Aarhus, Denmark, volume 57 of LIPIcs, pages 38:1–38:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.
11 Amr Elmasry, Jyrki Katajainen, and Max Stenmark. Branch mispredictions don’t affect mergesort. In SEA, pages 160–171, 2012.
12 Robert W. Floyd. Algorithm 245: Treesort 3. Comm. of the ACM, 7(12):701, 1964.
13 Lester R. Ford, Jr. and Selmer M. Johnson. A tournament problem. The American Mathematical Monthly, 66(5):387–389, 1959.
14 C. A. R. Hoare. Algorithm 65: Find. Commun. ACM, 4(7):321–322, July 1961.
15 Charles A. R. Hoare. Quicksort. The Computer Journal, 5(1):10–16, 1962.
16 Kanela Kaligosi and Peter Sanders. How branch mispredictions affect quicksort. In ESA, pages 780–791, 2006.
17 Jyrki Katajainen. The ultimate heapsort. In Computing: The Fourth Australasian Theory Symposium (CATS), pages 87–96, 1998.
18 Jyrki Katajainen, Tomi Pasanen, and Jukka Teuhola. Practical in-place mergesort. Nord. J. Comput., 3(1):27–40, 1996.
19 Donald E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Addison Wesley Longman, 2nd edition, 1998.
20 Shrinu Kushagra, Alejandro López-Ortiz, Aurick Qiao, and J. Ian Munro. Multi-pivot quicksort: Theory and experiments. In ALENEX, pages 47–60, 2014.
21 Conrado Martínez, Daniel Panario, and Alfredo Viola. Adaptive sampling strategies for quickselects. ACM Trans. Algorithms, 6(3):53:1–53:45, 2010.
22 Conrado Martínez and Salvador Roura. Optimal Sampling Strategies in Quicksort and Quickselect. SIAM J. Comput., 31(3):683–705, 2001.
23 David R. Musser. Introspective sorting and selection algorithms. Software—Practice and Experience, 27(8):983–993, 1997.
24 Peter Sanders and Sebastian Winkel. Super scalar sample sort. In ESA, pages 784–796, 2004.
25 Ingo Wegener. The worst case complexity of McDiarmid and Reed’s variant of bottom-up-heap sort is less than n log n + 1.1n. In STACS, pages 137–147, 1991.
26 Ingo Wegener. Bottom-up-Heapsort, a new variant of Heapsort beating, on an average, Quicksort (if n is not very small). Theoretical Computer Science, 118:81–98, 1993.
27 Sebastian Wild, Markus E. Nebel, and Ralph Neininger. Average case and distributional analysis of dual-pivot quicksort. ACM Transactions on Algorithms, 11(3):22:1–22:42, 2015.
28 J. W. J. Williams. Algorithm 232: Heapsort. Communications of the ACM, 7(6):347–348, 1964.