QuickXsort: Efficient Sorting with n log n − 1.399n + o(n) Comparisons on Average

Stefan Edelkamp¹ and Armin Weiß²

¹ TZI, Universität Bremen, Am Fallturm 1, D-28239 Bremen, Germany
² FMI, Universität Stuttgart, Universitätsstr. 38, D-70569 Stuttgart, Germany
Abstract. In this paper we generalize the idea of QuickHeapsort, leading to the notion of QuickXsort. Given some external sorting algorithm X, QuickXsort yields an internal sorting algorithm if X satisfies certain natural conditions. We show that, up to o(n) terms, the average number of comparisons incurred by QuickXsort is equal to the average number of comparisons of X.

We also describe a new variant of WeakHeapsort. With QuickWeakHeapsort and QuickMergesort we present two examples for the QuickXsort construction. Both are efficient algorithms that perform approximately n log n − 1.26n + o(n) comparisons on average. Moreover, we show that this bound also holds for a slight modification which guarantees an n log n + O(n) bound for the worst case number of comparisons.

Finally, we describe an implementation of MergeInsertion and analyze its average case behavior. Taking MergeInsertion as a base case for QuickMergesort, we establish an efficient internal sorting algorithm calling for at most n log n − 1.3999n + o(n) comparisons on average. QuickMergesort with constant size base cases shows the best performance on practical inputs and is competitive to STL-Introsort.

Keywords: in-place sorting, quicksort, mergesort, analysis of algorithms
1 Introduction
Sorting a sequence of n elements remains one of the most frequent tasks carried out by computers. A lower bound for sorting by only pairwise comparisons is log(n!) ≈ n log n − 1.44n + O(log n) comparisons for the worst and average case (logarithms denoted by log are always base 2; the average case refers to a uniform distribution of all input permutations, assuming all elements are different). Sorting algorithms that are optimal in the leading term are called constant-factor-optimal. Tab. 1 lists some milestones in the race for reducing the coefficient in the linear term. One of the most efficient (in terms of number of comparisons) constant-factor-optimal algorithms for solving the sorting problem is Ford and Johnson's MergeInsertion algorithm [9]. It requires n log n − 1.329n + O(log n) comparisons in the worst case [12]. MergeInsertion has a severe drawback that makes it uninteresting for practical issues: similar to Insertionsort, the number of element moves is quadratic in n, i.e., it has quadratic running time. By Insertionsort we mean the algorithm that inserts all elements successively into the already ordered sequence, finding the position for each element by binary search (not by linear search as frequently done). However, MergeInsertion and Insertionsort can be used to sort small subarrays such that the quadratic running time for these subarrays is small in comparison to the overall running time. Reinhardt [15] used this technique to design an internal Mergesort variant that needs in the worst case n log n − 1.329n + O(log n) comparisons. Unfortunately, implementations of this InPlaceMergesort algorithm have not been documented. The work of Katajainen et al. [11, 8], inspired by Reinhardt, is practical, but the number of comparisons is larger.
Throughout the text we avoid the terms in-place or in-situ and prefer the term internal (as opposed to external). We call an algorithm internal if it needs at most O(log n) space (computer words) in addition to the array to be sorted. That means we consider Quicksort an internal algorithm, whereas standard Mergesort is external because it needs a linear amount of extra space.
Based on QuickHeapsort [2], we develop the concept of QuickXsort in this paper and apply it to Mergesort and WeakHeapsort, which yields efficient internal sorting algorithms. The idea is very simple: as in Quicksort, the array is partitioned into the elements greater and less than some pivot element. Then one part of the array is sorted by some algorithm X and the other part is sorted recursively. The advantage of this procedure is that, if X is an external algorithm, then in QuickXsort the part of the array which is not currently being sorted may be used as temporary space, which yields an internal variant of X. We give an elementary proof that under natural assumptions QuickXsort performs, up to o(n) terms, on average the same number of comparisons as X. Moreover, we introduce a trick similar to Introsort [14] which guarantees n log n + O(n) comparisons in the worst case.
The concept of QuickXsort (without calling it by that name) was first applied in UltimateHeapsort by Katajainen [10]. In UltimateHeapsort, first the median of the array is determined, and then the array is partitioned into subarrays of equal size. Finding the median means significant additional effort. Cantone and Cincotti [2] weakened the requirement for the pivot and designed QuickHeapsort, which uses only a sample of smaller size to select the pivot for partitioning. UltimateHeapsort is inferior to QuickHeapsort in terms of the average case number of comparisons, although, unlike QuickHeapsort, it allows an n log n + O(n) bound for the worst case number of comparisons. Diekert and Weiß [3] analyzed QuickHeapsort more thoroughly and described some improvements requiring less than n log n − 0.99n + o(n) comparisons on average.

Edelkamp and Stiegeler [5] applied the idea of QuickXsort to WeakHeapsort (which was first described by Dutton [4]), introducing QuickWeakHeapsort. The worst case number of comparisons of WeakHeapsort is n⌈log n⌉ − 2^⌈log n⌉ + n − 1 ≤ n log n + 0.09n, and, following Edelkamp and Wegener [6], this bound is tight. In [5] an improved variant with n log n − 0.91n comparisons in the worst case, but requiring extra space, is presented.
Table 1. Constant-factor-optimal sorting with n log n + κn + o(n) comparisons.

                           Mem.      Other       κ Worst  κ Avg.     κ Exper.
Lower bound                O(1)      O(n log n)  -1.44    -1.44
BottomUpHeapsort [16]      O(1)      O(n log n)  ω(1)     –          [0.35, 0.39]
WeakHeapsort [4, 6]        O(n/w)    O(n log n)  0.09     –          [-0.46, -0.42]
RelaxedWeakHeapsort [5]    O(n)      O(n log n)  -0.91    -0.91      -0.91
Mergesort [12]             O(n)      O(n log n)  -0.91    -1.26
ExternalWeakHeapsort #     O(n)      O(n log n)  -0.91    -1.26*
Insertionsort [12]         O(1)      O(n^2)      -0.91    -1.38 #
MergeInsertion [12]        O(n)      O(n^2)      -1.32    -1.3999 #  [-1.43, -1.41]
InPlaceMergesort [15]      O(1)      O(n log n)  -1.32    –
QuickHeapsort [2, 3]       O(1)      O(n log n)  ω(1)     -0.03      0.20
                           O(n/w)    O(n log n)  ω(1)     -0.99      -1.24
QuickMergesort (IS) #      O(log n)  O(n log n)  -0.32    -1.38
QuickMergesort #           O(1)      O(n log n)  -0.32    -1.26      [-1.29, -1.27]
QuickMergesort (MI) #      O(log n)  O(n log n)  -0.32    -1.3999    [-1.41, -1.40]

Abbreviations: # established in this paper, MI MergeInsertion, IS Insertionsort, – not analyzed, * for n = 2^k; w: computer word width in bits; we assume log n ∈ O(n/w). For QuickXsort we assume InPlaceMergesort as a worst-case stopper (without it, κ Worst is ω(1)). The column "Mem." gives the number of computer words of memory needed in addition to the data. "Other" gives the order of operations other than comparisons performed during sorting.
With ExternalWeakHeapsort we propose a further refinement with the same worst case bound, but requiring approximately n log n − 1.26n comparisons on average. Using ExternalWeakHeapsort as X in QuickXsort, we obtain an improvement over the QuickWeakHeapsort of [5].
Mergesort is another good candidate for applying the QuickXsort construction. With QuickMergesort we describe an internal variant of Mergesort which competes with standard Mergesort not only in terms of the number of comparisons, but also in terms of running time. As mentioned before, MergeInsertion can be used to sort small subarrays. We study MergeInsertion and provide an implementation based on weak heaps. Furthermore, we give an average case analysis. When sorting small subarrays with MergeInsertion, we can show that the average number of comparisons performed by Mergesort is bounded by n log n − 1.3999n + o(n), and, therefore, QuickMergesort uses at most n log n − 1.3999n + o(n) comparisons in the average case. To the best of our knowledge this is better than any previously known bound.
The paper is organized as follows: in Sect. 2 the concept of QuickXsort is described and our main theorems about the average and worst case number of comparisons are stated. The following sections are devoted to presenting examples for X in QuickXsort: in Sect. 3 we develop ExternalWeakHeapsort, analyze it, and show how it can be used for QuickWeakHeapsort. The next section treats QuickMergesort and the modification that small base cases are sorted with some other algorithm, e.g., MergeInsertion, which is then described in Sect. 5. Finally, we present our experimental results in Sect. 6. Due to space limitations most proofs can be found in the arXiv version [7].
2 QuickXsort
In this section we give a more precise description of QuickXsort and derive some results concerning the number of comparisons performed in the average and worst case. Let X be some sorting algorithm. QuickXsort works as follows: First, choose some pivot element as the median of some random sample. Next, partition the array according to this pivot element, i.e., rearrange the array such that all elements left of the pivot are less than or equal to and all elements on the right are greater than or equal to the pivot element. (If the algorithm X outputs the sorted sequence in the extra memory, the partitioning is performed such that all elements left of the pivot are greater than or equal to and all elements on the right are less than or equal to the pivot element.) Then, choose one part of the array and sort it with algorithm X. (The preferred choice depends on the sorting algorithm X.) After one part of the array has been sorted with X, move the pivot element to its correct position (right after/before the already sorted part) and sort the other part of the array recursively with QuickXsort.

The main advantage of this procedure is that the part of the array that is not currently being sorted can be used as temporary memory for the algorithm X. This yields fast internal variants of various external sorting algorithms such as Mergesort. The idea is that whenever a data element should be moved to the external storage, it is instead swapped with the data element occupying the respective position in the part of the array which is used as temporary memory. Of course, this works only if the algorithm needs additional storage only for data elements. Furthermore, the algorithm has to be able to keep track of the positions of elements which have been swapped. As the specific method depends on the algorithm X, we give some more details when we describe the examples for QuickXsort.
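To make the control flow concrete, the following minimal C++ sketch shows the recursion structure. It is only an illustration under simplifying assumptions, not the implementation evaluated in Sect. 6: the pivot here is the exact median obtained via std::nth_element rather than the median of a sample, and sort_with_X is a placeholder for any algorithm X that sorts [begin, mid) while using [mid, end) as temporary element storage (only swapping elements, never losing any).

    #include <algorithm>

    // Illustrative skeleton of QuickXsort. SortX is assumed to sort [begin, mid)
    // using [mid, end) as a buffer, permuting but never losing elements.
    template <class Iter, class SortX>
    void quickXsort(Iter begin, Iter end, SortX sort_with_X) {
        while (end - begin > 1) {
            Iter mid = begin + (end - begin) / 2;
            // Partition: [begin, mid) <= *mid <= (mid, end). (The paper instead
            // partitions around the median of a sample of ~sqrt(n) elements.)
            std::nth_element(begin, mid, end);
            // Sort the left part with X; X may use [mid, end) as temporary
            // space as long as it only swaps elements there.
            sort_with_X(begin, mid, end);
            begin = mid;  // everything left of mid is final; continue right
        }
    }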
For the number of comparisons we can derive some general results which hold for a wide class of algorithms X. Under natural assumptions the average number of comparisons of X and of QuickXsort differ only by an o(n)-term. For the rest of the paper, we assume that the pivot is selected as the median of approximately √n randomly chosen elements. Sample sizes of approximately √n are likely to be optimal, as the results in [3, 13] suggest.
Theorem 1 (QuickXsort Average-Case). Let X be some sorting algorithm requiring at most n log n + cn + o(n) comparisons in the average case. Then, QuickXsort implemented with Θ(√n) elements as sample for pivot selection is a sorting algorithm that also needs at most n log n + cn + o(n) comparisons in the average case.
Does QuickXsort provide a good bound for the worst case? The obvious answer is "no". If always the √n smallest elements are chosen for pivot selection, Θ(n^{3/2}) comparisons are performed. However, we can prove that such a worst case is very unlikely. Let R(n) be the worst case number of comparisons of the algorithm X.
Proposition 1. Let ε > 0. The probability that QuickXsort needs more than R(n) + 6n comparisons is less than (3/4 + ε)^{n^{1/4}} for n large enough.
In order to obtain a provable bound for the worst case complexity we apply a simple trick similar to the one used in Introsort [14]. We fix some worst case efficient sorting algorithm Y. This might be, e.g., InPlaceMergesort. (In order to obtain an efficient internal sorting algorithm, Y has to be internal.) Worst case efficient means that we have an n log n + O(n) bound for the worst case number of comparisons. We choose some slowly decreasing function δ(n) ∈ o(1) ∩ Ω(n^{−1/4+ε}), e.g., δ(n) = 1/log n. Now, whenever the pivot is more than n · δ(n) off the median, we stop with QuickXsort and continue by sorting both parts of the partitioned array with the algorithm Y. We call this QuickXYsort. To achieve a good worst case bound, of course, we also need a good bound for algorithm X. W.l.o.g. we assume the same worst case bounds for X as for Y. Note that QuickXYsort only makes sense if one needs a provably good worst case bound. Since QuickXsort is always expected to make at most as many comparisons as QuickXYsort (under the reasonable assumption that X on average is faster than Y – otherwise one would simply use Y), in every step of the recursion QuickXsort is the better choice for the average case.
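The skewness check itself is a one-liner. As an assumption-level sketch (with δ(n) = 1/log n as above): after partitioning, compare the pivot's final position against the allowed deviation from n/2, and fall back to Y on both parts if the split is too skewed.

    #include <cmath>
    #include <cstdlib>

    // Returns true if the pivot's final position is within n * delta(n) of the
    // median, with delta(n) = 1/log n; otherwise QuickXYsort switches to Y.
    bool pivot_close_to_median(long long pivot_pos, long long n) {
        const double delta = 1.0 / std::log2(static_cast<double>(n));
        return std::llabs(pivot_pos - n / 2) <= static_cast<long long>(n * delta);
    }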
Theorem 2 (QuickXYsort Worst-Case). Let X be a sorting algorithm with at most n log n + cn + o(n) comparisons in the average case and R(n) = n log n + dn + o(n) comparisons in the worst case (d ≥ c). Let Y be a sorting algorithm with at most R(n) comparisons in the worst case. Then, QuickXYsort is a sorting algorithm that performs at most n log n + cn + o(n) comparisons in the average case and n log n + (d + 1)n + o(n) comparisons in the worst case.
In order to keep the implementation of QuickXYsort simple, we propose the following algorithm Y: find the median with some linear time algorithm (see e.g. [1]), then apply QuickXYsort with this median as the first pivot element. Note that this algorithm is well defined because by induction the algorithm Y is already defined for all smaller instances. The proof of Thm. 2 shows that Y, and thus QuickXYsort, has a worst case number of comparisons in n log n + O(n).
3 QuickWeakHeapsort
In this section we consider QuickWeakHeapsort as a first example of QuickXsort. We start by introducing weak heaps and then continue by describing WeakHeapsort and a novel external version of it. This external version is a good candidate for QuickXsort and yields an efficient sorting algorithm that uses approximately n log n − 1.2n comparisons (this value is only a rough estimate and neither a bound from below nor above). A drawback of WeakHeapsort and its variants is that they require one extra bit per element. The exposition also serves as an intermediate step towards our implementation of MergeInsertion, where the weak-heap data structure will be used as a building block. Conceptually, a weak heap (see Fig. 1) is a binary tree satisfying the following conditions:
[Figure: a weak heap with element values in the nodes and array indices above them; grey nodes have their reverse bits set.]
Fig. 1. A weak heap (reverse bits are set for grey nodes; above the nodes are array indices).
(1) The root of the entire tree has no left child.
(2) Except for the root, the nodes that have at most one child are in the last two levels only. Leaves at the last level can be scattered, i.e., the last level is not necessarily filled from left to right.
(3) Each node stores an element that is smaller than or equal to every element stored in its right subtree.

From the first two properties we deduce that the height of a weak heap that has n elements is ⌈log n⌉ + 1. The third property is called the weak-heap ordering or half-tree ordering. In particular, this property enforces no relation between an element in a node and those stored in its left subtree. On the other hand, it implies that any node together with its right subtree forms a weak heap on its own. In an array-based implementation, besides the element array s, an array r of reverse bits is used, i.e., r_i ∈ {0, 1} for i ∈ {0, ..., n − 1}. The root has index 0. The array index of the left child of s_i is 2i + r_i, the array index of the right child is 2i + 1 − r_i, and the array index of the parent is ⌊i/2⌋ (assuming that i ≠ 0). Using the fact that the indices of the left and right children of s_i are exchanged when flipping r_i, subtrees can be reversed in constant time by setting r_i ← 1 − r_i.
The distinguished ancestor (d-ancestor(j)) of s_j, for j ≠ 0, is recursively defined as the parent of s_j if s_j is a right child, and as the distinguished ancestor of the parent of s_j if s_j is a left child. The distinguished ancestor of s_j is the first element on the path from s_j to the root which is known to be smaller than or equal to s_j by (3). Moreover, any subtree rooted at s_j, together with the distinguished ancestor s_i of s_j, forms again a weak heap with root s_i by considering s_j as the right child of s_i.
The basic operation for creating a weak heap is the join operation, which combines two weak heaps into one. Let i < j be two nodes in a weak heap such that s_i is smaller than or equal to every element in the left subtree of s_j. Conceptually, s_j and its right subtree form a weak heap, while s_i and the left subtree of s_j form another weak heap. (Note that s_i is not part of the subtree with root s_j.) The result of join is a weak heap with root at position i. If s_j < s_i, the two elements are swapped and r_j is flipped. As a result, the new element s_j will be smaller than or equal to every element in its right subtree, and the new element s_i will be smaller than or equal to every element in the subtree rooted at s_j. To sum up, join requires constant time and involves one element comparison and a possible element swap in order to combine two weak heaps into a new one.
The construction of a weak heap consisting of n elements requires n − 1 comparisons. In the standard bottom-up construction of a weak heap the nodes are visited one by one. Starting with the last node in the array and moving to the front, the two weak heaps rooted at a node and its distinguished ancestor are joined. The amortized cost to get from a node to its distinguished ancestor is O(1) [6].
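The array-based operations just described are compact enough to state in code. The following C++ sketch is our reading of the text, not the authors' implementation; it keeps the element array s and the reverse bits r and implements join, d-ancestor, and the bottom-up construction with exactly n − 1 comparisons.

    #include <utility>
    #include <vector>

    struct WeakHeap {
        std::vector<int>  s;  // elements; the root is s[0]
        std::vector<bool> r;  // reverse bits, initially all 0

        int left(int i)  const { return 2 * i + r[i]; }      // left child index
        int right(int i) const { return 2 * i + 1 - r[i]; }  // right child index

        // d-ancestor(j): climb while j is a left child of its parent; the
        // parent finally reached is the distinguished ancestor.
        int d_ancestor(int j) const {
            while ((j & 1) == static_cast<int>(r[j / 2])) j /= 2;
            return j / 2;
        }

        // join: assumes s[i] <= every element in the left subtree of s[j];
        // one comparison, plus a swap and a flip of r[j] if s[j] < s[i].
        void join(int i, int j) {
            if (s[j] < s[i]) { std::swap(s[i], s[j]); r[j] = !r[j]; }
        }

        // Bottom-up construction: visit nodes back to front and join each
        // with its distinguished ancestor (n - 1 comparisons in total).
        void construct() {
            for (int j = static_cast<int>(s.size()) - 1; j > 0; --j)
                join(d_ancestor(j), j);
        }
    };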
When using weak heaps for sorting, the minimum is removed and the weak heap condition restored until the weak heap becomes empty. After extracting an element from the root, first the special path from the root is traversed top-down, and then, in a bottom-up process, the weak-heap property is restored using at most ⌈log n⌉ join operations. (The special path is established by going once to the right and then to the left as far as possible.) Hence, extracting the minimum requires at most ⌈log n⌉ comparisons.
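Continuing the WeakHeap sketch above, one such restore step can be written as follows; again this is our hedged reading of the description, not code from the paper.

    // Restore the weak-heap ordering after the element at the root has been
    // replaced (e.g., by the last array element): descend the special path
    // (once right, then left children as far as possible), then join every
    // node on it with the root bottom-up -- at most ceil(log n) comparisons.
    void sift_down_root(WeakHeap& h, int size) {
        if (size < 2) return;
        int j = 1;                                         // the root's only child
        while (2 * j + h.r[j] < size) j = 2 * j + h.r[j];  // follow left children
        for (; j > 0; j /= 2) h.join(0, j);                // bottom-up joins
    }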
Now, we introduce a modification of the standard procedure described by Dutton [4], which has a slightly improved performance, but requires extra space. We call this modified algorithm ExternalWeakHeapsort because it needs an extra output array to which the elements extracted from the weak heap are moved. On average, ExternalWeakHeapsort requires fewer comparisons than RelaxedWeakHeapsort [5]. Integrated into QuickXsort we can implement it without extra space other than the reverse bits r and some other extra bits. We introduce an additional array active and weaken the requirements of a weak heap: we also allow nodes on levels other than the last two to have fewer than two children. Nodes whose active bit is set to false are considered to have been removed. ExternalWeakHeapsort works as follows: First, a usual weak heap is constructed using n − 1 comparisons. Then, until the weak heap becomes empty, the root – which is the minimal element – is moved to the output array and the resulting hole has to be filled with the minimum of the remaining elements (so far the only difference from normal WeakHeapsort is that there is a separate output area).
The hole is filled by searching the special path from the root to a node x which has no left child. Note that the nodes on the special path are exactly the nodes having the root as distinguished ancestor. Finding the special path does not need any comparisons since one only has to follow the reverse bits. Next, the element of the node x is moved to the root, leaving a hole. If x has a right subtree (i.e., if x is the root of a weak heap with more than one element), this hole is filled by applying the hole-filling algorithm recursively to the weak heap with root x. Otherwise, the active bit of x is set to false. Now, the root of the whole weak heap together with the subtree rooted at x forms a weak heap. However, it remains to restore the weak heap condition for the whole weak heap. Except for the root and x, all nodes on the special path together with their right subtrees form weak heaps. Following the special path upwards, these weak heaps are joined with their distinguished ancestor as during the weak heap construction (i.e., successively they are joined with the weak heap consisting of the root and the already treated nodes on the special path together with their subtrees). Once all the weak heaps on the special path are joined, the whole array forms a weak heap again.
Theorem 3. For n = 2^k, ExternalWeakHeapsort performs exactly the same comparisons as Mergesort applied to a fixed permutation of the same input array.
By [12, 5.2.4–13] we obtain the following corollary.
Corollary 1 (Average Case ExternalWeakHeapsort). For n = 2^k, the algorithm ExternalWeakHeapsort uses approximately n log n − 1.26n comparisons in the average case.
If n is not a power of two, the sizes of the left and right parts of WeakHeapsort are less balanced than the left and right parts of ordinary Mergesort, and one can expect a slightly higher number of comparisons. For QuickWeakHeapsort, the half of the array which is not sorted by ExternalWeakHeapsort is used as output area. Whenever the root is moved to the output area, the element that occupied that place before is inserted as a dummy element at the position where the active bit is set to false. Applying Thm. 1, we obtain the rough estimate of n log n − 1.2n comparisons for the average case of QuickWeakHeapsort.
4 QuickMergesort
As another example for QuickXsort we consider QuickMergesort. For the Mergesort part we use standard (top-down) Mergesort, which can be implemented using m extra element spaces to merge two arrays of length m. After the partitioning, one part of the array – we assume the first part – has to be sorted with Mergesort. In order to do so, the second half of this first part is sorted recursively with Mergesort while moving the elements to the back of the whole array. The elements from the back of the array are inserted as dummy elements into the first part. Then, the first half of the first part is sorted recursively with Mergesort while being moved to the position of the former second part. Now, at the front of the array, there is enough space (filled with dummy elements) such that the two halves can be merged. The procedure is depicted in Fig. 2. As long as at least one third of the whole array is left as temporary memory, the larger part of the partitioned array is sorted with Mergesort; otherwise the smaller part is sorted with Mergesort. Hence, the part which is not sorted by Mergesort always provides enough temporary space. Whenever a data element is moved to or from the temporary space, it is swapped with the dummy element occupying the respective position. Since Mergesort moves through the data from left to right, it is always clear which elements are the dummy elements. Depending on the implementation, the extra space needed is O(log n) words for the recursion stack of Mergesort. By avoiding recursion this can be reduced to O(1). Thm. 1 together with [12, 5.2.4–13] yields the next result.
[Figure: three snapshots of the array around the pivot while the two halves of the left part are sorted and then merged.]
Fig. 2. First the two halves of the left part are sorted, moving them from one place to another. Then they are merged to the original place.
Theorem 4 (Average Case QuickMergesort). QuickMergesort is an internal sorting algorithm that performs at most n log n − 1.26n + o(n) comparisons on average.
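The key primitive of this scheme is a merge that swaps instead of copies, so the dummy elements flow back into the temporary area. Below is a minimal sketch under our own assumptions (plain int data; the first run already swapped into the buffer tmp, the second run lying directly behind the output area); it is an illustration, not the paper's code.

    #include <algorithm>

    // Merge the sorted run tmp[0..n1) with the sorted run s[n1..n1+n2) into
    // s[0..n1+n2), using swaps only: the dummy elements that occupied the
    // output positions end up in tmp, so no element is duplicated or lost.
    void merge_with_buffer(int* s, int* tmp, int n1, int n2) {
        int i = 0, j = n1, out = 0;
        while (i < n1 && j < n1 + n2) {
            if (s[j] < tmp[i]) std::swap(s[out++], s[j++]);
            else               std::swap(s[out++], tmp[i++]);
        }
        while (i < n1) std::swap(s[out++], tmp[i++]);
        // If the first run is exhausted first, the tail of the second run is
        // already in its final place.
    }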
We can do even better if we sort small subarrays with another algorithm Z requiring fewer comparisons but extra space and more moves, e.g., Insertionsort or MergeInsertion. If we use O(log n) elements for the base cases of Mergesort, we have to call Z at most O(n/log n) times. In this case we can allow additional operations of Z like moves in the order of O(n^2), given that O((n/log n) · log^2 n) = O(n log n). Note that for the next result we only need that the size of the base cases grows as n grows. Nevertheless, when applying an algorithm which uses Θ(n^2) moves, the size of the base cases has to be in O(log n) in order to achieve an O(n log n) overall running time.
Theorem 5 (QuickMergesort with Base Case). Let Z be some sorting algorithm with n log n + en + o(n) comparisons on average and other operations taking at most O(n^2) time. If base cases of size O(log n) are sorted with Z, QuickMergesort uses at most n log n + en + o(n) comparisons and O(n log n) other instructions on average.
Proof. By Thm. 1 and the preceding remark, the only thing we have to prove is that Mergesort with base case Z requires on average at most n log n + en + o(n) comparisons, given that Z needs U(n) = n log n + en + o(n) comparisons on average. The latter means that for every ε > 0 we have U(n) ≤ n log n + (e + ε) · n for n large enough.

Let S_k(m) denote the average case number of comparisons of Mergesort with base cases of size k sorted with Z, and let ε > 0. Since log n grows as n grows, we have that S_{log n}(m) = U(m) ≤ m log m + (e + ε) · m for n large enough and (log n)/2 < m ≤ log n. For m > log n we have S_{log n}(m) ≤ 2 · S_{log n}(m/2) + m, and by induction we see that S_{log n}(m) ≤ m log m + (e + ε) · m. Hence, also S_{log n}(n) ≤ n log n + (e + ε) · n for n large enough. ⊓⊔
Recall that Insertionsort inserts the elements one by one into the already
sorted sequence by binary search. Using Insertionsort we obtain the following
result. Here, ln denotes the natural logarithm.
Proposition 2 (Average Case of Insertionsort). The sorting algorithm Insertionsort needs n log n − 2 ln 2 · n + c(n) · n + O(log n) comparisons on average, where c(n) ∈ [−0.005, 0.005].
Corollary 2 (QuickMergesort with Base Case Insertionsort). If we use Insertionsort as the base case, QuickMergesort uses at most n log n − 1.38n + o(n) comparisons and O(n log n) other instructions on average.
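For reference, the binary Insertionsort used here takes only a few lines; this sketch assumes plain int data, with std::upper_bound performing the binary search.

    #include <algorithm>

    // Insert a[i] into the sorted prefix a[0..i) by binary search; the search
    // costs at most ceil(log2(i + 1)) comparisons, the rotate costs O(i) moves
    // (hence the quadratic number of moves overall).
    void binary_insertion_sort(int* a, int n) {
        for (int i = 1; i < n; ++i) {
            int* pos = std::upper_bound(a, a + i, a[i]);
            std::rotate(pos, a + i, a + i + 1);
        }
    }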
Base cases of growing size always lead to a constant factor overhead in running time if an algorithm with a quadratic number of total operations is applied. Therefore, in the experiments we also consider constant size base cases, which offer a slightly worse bound for the number of comparisons, but are faster in practice. We do not analyze them separately since the preferred choice for the size depends on the type of data to be sorted and the system on which the algorithms run.
5 MergeInsertion
MergeInsertion by Ford and Johnson [9] is one of the best sorting algorithms in terms of number of comparisons. Hence, it can be applied for sorting base cases of QuickMergesort, which yields even better results than Insertionsort. Therefore, we want to give a brief description of the algorithm and our implementation. Algorithmically, MergeInsertion(s_0, ..., s_{n−1}) can be described as follows (an intuitive example for n = 21 can be found in [12]):
1. Arrange the input such that s_i ≥ s_{i+⌊n/2⌋} for 0 ≤ i < ⌊n/2⌋, with one comparison per pair. Let a_i = s_i and b_i = s_{i+⌊n/2⌋} for 0 ≤ i < ⌊n/2⌋, and b_{⌊n/2⌋} = s_{n−1} if n is odd.
2. Sort the values a_0, ..., a_{⌊n/2⌋−1} recursively with MergeInsertion.
3. Rename the solution as follows: b_0 ≤ a_0 ≤ a_1 ≤ ··· ≤ a_{⌊n/2⌋−1} and insert the elements b_1, ..., b_{⌈n/2⌉−1} via binary insertion, following the ordering b_2, b_1, b_4, b_3, b_10, b_9, ..., b_5, ..., b_{t_{k−1}}, b_{t_{k−1}−1}, ..., b_{t_{k−2}+1}, b_{t_k}, ... into the main chain, where t_k = (2^{k+1} + (−1)^k)/3 (the insertion order is illustrated by the sketch below).
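The following small C++ program is an illustration we add here (not part of the paper): it generates this insertion order from the formula for t_k. Every element of the k-th group is inserted into a chain of length at most 2^k − 1 and therefore costs at most k comparisons.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    int main() {
        const long long m = 16;            // b_0, ..., b_{m-1} (example size)
        // t_k = (2^{k+1} + (-1)^k) / 3: the sequence 1, 1, 3, 5, 11, 21, ...
        std::vector<long long> t = {1, 1};
        for (long long k = 2; t.back() < m; ++k)
            t.push_back(((1LL << (k + 1)) + (k % 2 == 0 ? 1 : -1)) / 3);
        // Group k inserts b_{t_k - 1} down to b_{t_{k-1}}: b_2, b_1, b_4, b_3, ...
        std::printf("insertion order:");
        for (std::size_t k = 2; k < t.size(); ++k)
            for (long long j = std::min(t[k] - 1, m - 1); j >= t[k - 1]; --j)
                std::printf(" b_%lld", j);
        std::printf("\n");
        return 0;
    }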
While the description is simple, MergeInsertion is not easy to implement efficiently because of the different renamings, the recursion, and the change of link structure. Our proposed implementation of MergeInsertion is based on a tournament tree representation with weak heaps as in Sect. 3. It uses n log n + n extra bits and works as follows: First, step 1 is performed for all recursion levels by constructing a weak heap. (Pseudo-code implementations for all the operations to construct a tournament tree with a weak heap and to access the partners in each round can be found in [7] – note that for simplicity in the above formulation the indices and the order are reversed compared to our implementation.) Then, in a second phase, step 3 is executed for all recursion levels, see Fig. 3. One main subroutine of MergeInsertion is binary insertion. The call binary-insert(x, y, z) inserts the element at position z between positions x − 1 and x + y by binary insertion. In this routine we do not move the data elements themselves, but we use an additional index array φ_0, ..., φ_{n−1} to point to the elements contained in the weak heap tournament tree and move these indirect addresses. This approach has the advantage that the relations stored in the tournament tree are preserved.
procedure: merge(m: integer)
global: φ array of n integers imposed by weak-heap
    for l ← 0 to ⌊m/2⌋ − 1
        φ_{m − odd(m) − l − 1} ← d-child(φ_l, m − odd(m));
    k ← 1; e ← 2^k; c ← f ← 0;
    while e < m
        k ← k + 1; e ← 2e;
        l ← ⌈m/2⌉ + f; f ← f + (t_k − t_{k−1});
        for i ← 0 to (t_k − t_{k−1}) − 1
            c ← c + 1;
            if c = ⌈m/2⌉ then
                return;
            if t_k > ⌈m/2⌉ − 1 then
                binary-insert(i + 1 − odd(m), l, m − 1);
            else
                binary-insert(⌊m/2⌋ − f + i, e − 1, ⌊m/2⌋ + f);
Fig. 3. Merging step in MergeInsertion with t_k = (2^{k+1} + (−1)^k)/3, odd(m) = m mod 2, and d-child(φ_i, n) returning the highest index less than n of a grandchild of φ_i in the weak heap (i.e., d-child(φ_i, n) = index of the bottommost element in the weak heap which has d-ancestor = φ_i and index < n).
The most important procedure for MergeInsertion is the organization of the calls for binary-insert. After adapting the addresses for the elements b_i (w.r.t. the above description) in the second part of the array, the algorithm calls the binary insertion routine with appropriate indices. Note that we always use k comparisons for all elements of the k-th block (i.e., the elements b_{t_k}, ..., b_{t_{k−1}+1}), even if there might be the chance to save one comparison. By introducing an additional array, which for each b_i contains the current index of a_i, we can exploit the observation that not always k comparisons are needed to insert an element of the k-th block. In the following we call this the improved variant. The pseudo-code of the basic variant is shown in Fig. 3. The last sequence is not complete and is thus handled as a special case.
Theorem 6 (Average Case of MergeInsertion). The sorting algorithm MergeInsertion needs n log n − c(n) · n + O(log n) comparisons on average, where c(n) ≥ 1.3999.
Corollary 3 (QuickMergesort with Base Case MergeInsertion). When using MergeInsertion as the base case, QuickMergesort needs at most n log n − 1.3999n + o(n) comparisons and O(n log n) other instructions on average.
6 Experiments
Our experiments consist of two parts. First, we compare the different algorithms we use as base cases, i.e., MergeInsertion, its improved variant, and Insertionsort. The results can be seen in Fig. 4. Depending on the size of the arrays, the displayed numbers are averages over 10–10000 runs³. The data elements we sorted were randomly chosen 32-bit integers. The number of comparisons was measured by increasing a counter in every comparison⁴.
The outcome in Fig. 4 shows that our improved MergeInsertion implementation achieves results for the constant κ of the linear term in the range of [−1.43, −1.41] (for some values of n it is even smaller than −1.43). Moreover, the standard implementation, with slightly more comparisons, is faster than Insertionsort. Due to the O(n^2) work, the running times of all three implementations rise quickly, so that only moderate values of n can be handled.
[Figure: two plots over n (logarithmic scale, 2^10 to 2^16) – the number of element comparisons minus n log n per n for Insertionsort, simple MergeInsertion, MergeInsertion, and the lower bound; and the execution time per (#elements)^2 in µs for the three implementations.]
Fig. 4. Comparison of MergeInsertion, its improved variant, and Insertionsort. For the number of comparisons n log n + κn the value of κ is displayed.
The second part of our experiments (shown in Fig. 5) consists of the comparison of QuickMergesort (with base cases of constant and growing size) and QuickWeakHeapsort with state-of-the-art algorithms such as STL-Introsort (i.e., Quicksort), STL-stable-sort (BottomUpMergesort), and Quicksort with median of √n elements for pivot selection. For QuickMergesort with base cases, the improved variant of MergeInsertion is used to sort subarrays of size up to 40 log_{10} n. For the normal QuickMergesort we used base cases of size ≤ 9. We also implemented QuickMergesort with median of three for pivot selection, which turns out to be practically efficient, although it needs slightly more comparisons than QuickMergesort with median of √n.
³ Our experiments were run on one core of an Intel Core i7-3770 CPU (3.40 GHz, 8 MB cache) with 32 GB RAM; operating system: Ubuntu Linux 64 bit; compiler: GNU g++ (version 4.6.3) optimized with flag -O3.
⁴ To rely on objects being handled, we avoided the flattening of the array structure by the compiler. Hence, for the running time experiments, and in each comparison taken, we left the counter increase operation intact.
However, since also the larger half of the partitioned array can be sorted with Mergesort, the difference to the median-of-√n version is not as big as in QuickHeapsort [3]. As suggested by the theory, we see that our improved QuickMergesort implementation with growing-size MergeInsertion base cases yields a result for the constant in the linear term that is in the range of [−1.41, −1.40] – close to the lower bound. However, for the running time, normal QuickMergesort as well as the STL variants Introsort (std::sort) and BottomUpMergesort (std::stable_sort) are slightly better. At about 15%, the time gap, however, is not overly big, and may be bridged with additional optimizations. Also, when comparisons are more expensive, QuickMergesort performs faster than Introsort and BottomUpMergesort; see the arXiv version [7].
[Figure: two plots over n (logarithmic scale, 2^14 to 2^26) – the number of element comparisons minus n log n per n, and the execution time per element in µs – for STL Introsort, STL Mergesort, Quicksort with median of √n, QuickWeakHeapsort, the QuickMergesort variants, and the lower bound.]
Fig. 5. Comparison of QuickMergesort (with base cases of constant and growing size) and QuickWeakHeapsort with other sorting algorithms; (MI) is short for including growing-size base cases derived from MergeInsertion. For the number of comparisons n log n + κn the value of κ is displayed.
7 Concluding Remarks
Sorting n elements remains a fascinating topic for computer scientists both from a theoretical and from a practical point of view. With QuickXsort we have described a procedure to convert an external sorting algorithm into an internal one, introducing only o(n) additional comparisons on average. We presented QuickWeakHeapsort and QuickMergesort as two examples of this construction. QuickMergesort is close to the lower bound for the average number of comparisons and at the same time is practically efficient, even when the comparisons are fast.
Using MergeInsertion to sort base cases of growing size for QuickMergesort, we derive an upper bound of n log n − 1.3999n + o(n) comparisons for the average case. As far as we know, a better result has not been published before. We emphasize that the average of our best implementation has a proven gap of at most 0.05n + o(n) comparisons to the lower bound. The value n log n − 1.4n for n = 2^k matches one side of Reinhardt's conjecture that an optimized in-place algorithm can have n log n − 1.4n + O(log n) comparisons on average [15]. Moreover, our experimental results validate the theoretical considerations and indicate that the factor 1.43 can be beaten. Of course, there is still room for closing the gap to the lower bound of n log n − 1.44n + O(log n) comparisons.
References

1. M. Blum, R. W. Floyd, V. Pratt, R. L. Rivest, and R. E. Tarjan. Time bounds for selection. J. Comput. Syst. Sci., 7(4):448–461, 1973.
2. D. Cantone and G. Cincotti. QuickHeapsort, an efficient mix of classical sorting algorithms. Theoretical Comput. Sci., 285(1):25–42, 2002.
3. V. Diekert and A. Weiß. QuickHeapsort: Modifications and improved analysis. In A. A. Bulatov and A. M. Shur, editors, CSR, volume 7913 of Lecture Notes in Computer Science, pages 24–35. Springer, 2013.
4. R. D. Dutton. Weak-heap sort. BIT, 33(3):372–381, 1993.
5. S. Edelkamp and P. Stiegeler. Implementing HEAPSORT with n log n − 0.9n and QUICKSORT with n log n + 0.2n comparisons. ACM Journal of Experimental Algorithmics, 10(5), 2002.
6. S. Edelkamp and I. Wegener. On the performance of Weak-Heapsort. In 17th Annual Symposium on Theoretical Aspects of Computer Science, volume 1770, pages 254–266. Springer-Verlag, 2000.
7. S. Edelkamp and A. Weiß. QuickXsort: Efficient Sorting with n log n − 1.399n + o(n) Comparisons on Average. ArXiv e-prints, abs/1307.3033, 2013.
8. A. Elmasry, J. Katajainen, and M. Stenmark. Branch mispredictions don't affect mergesort. In SEA, pages 160–171, 2012.
9. L. R. Ford, Jr. and S. M. Johnson. A tournament problem. The American Mathematical Monthly, 66(5):387–389, 1959.
10. J. Katajainen. The Ultimate Heapsort. In CATS, pages 87–96, 1998.
11. J. Katajainen, T. Pasanen, and J. Teuhola. Practical in-place mergesort. Nord. J. Comput., 3(1):27–40, 1996.
12. D. E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Addison Wesley Longman, 2nd edition, 1998.
13. C. Martínez and S. Roura. Optimal sampling strategies in Quicksort and Quickselect. SIAM J. Comput., 31(3):683–705, 2001.
14. D. R. Musser. Introspective sorting and selection algorithms. Software—Practice and Experience, 27(8):983–993, 1997.
15. K. Reinhardt. Sorting in-place with a worst case complexity of n log n − 1.3n + O(log n) comparisons and εn log n + O(1) transports. In ISAAC, pages 489–498, 1992.
16. I. Wegener. Bottom-up-Heapsort, a new variant of Heapsort beating, on an average, Quicksort (if n is not very small). Theoretical Comput. Sci., 118:81–98, 1993.
... Regarding the average case (with respect to a uniform distribution over all input permutations) not much is known: in [8] Knuth calculated the number of comparisons required on average for n ∈ {1, . . . , 8}; an upper bound of n log n − 1.3999n + o(n) has been given in [3,Theorem 6] (the proof can be found in [4,Theorem 6.5]). Most recently, Iwama and Teruyama [7] showed that in the average case MergeInsertion can be improved by combining it with their (1,2)-Insertion algorithm resulting in an upper bound of n log n − 1.4106n + O(log n). ...
... -We compute the exact average number of comparisons for n up to 148 -thus, going much further than [8] (Section 4.2). -We improve the bound of [3,4] to n log n − 1.4005n + o(n) (Theorem 3). This partially answers a conjecture from [12] which asks for an in-place algorithm with n log n − 1.4n comparisons on average and n log n − 1.3n comparisons in the worst case. ...
... This partially answers a conjecture from [12] which asks for an in-place algorithm with n log n − 1.4n comparisons on average and n log n − 1.3n comparisons in the worst case. Although MergeInsertion is not in-place, the the techniques from [3,4] or [12] can be used to make it so. -We evaluate a slightly different insertion order decreasing the gap between the lower bound and the average number of comparisons of MergeInsertion by roughly 30% for n ≈ 2 k /3 (Section 5.2). ...
Article
Full-text available
MergeInsertion, also known as the Ford-Johnson algorithm, is a sorting algorithm which, up to today, for many input sizes achieves the best known upper bound on the number of comparisons. Indeed, it gets extremely close to the information-theoretic lower bound. While the worst-case behavior is well understood, only little is known about the average case. This work takes a closer look at the average case behavior. In particular, we establish an upper bound of \(n \log n - 1.4005n + o(n)\) comparisons. We also give an exact description of the probability distribution of the length of the chain a given element is inserted into and use it to approximate the average number of comparisons numerically. Moreover, we compute the exact average number of comparisons for n up to 148. Furthermore, we experimentally explore the impact of different decision trees for binary insertion. To conclude, we conduct experiments showing that a slightly different insertion order leads to a better average case and we compare the algorithm to Manacher’s combination of merging and MergeInsertion as well as to the recent combined algorithm with (1,2)-Insertionsort by Iwama and Teruyama.
... Also InSituMergesort only uses an expected-case lineartime algorithm for the median computation. 3 In the conference paper [10], the first and second author introduced the name QuickXsort and first considered QuickMergesort as an application (including weaker forms of the results in Sect. 4.2 and Sect. ...
... A weaker upper bound for the median-of-3 case was also given by the first two authors in the preprint [12]. The present work is an extended version of [10] and [52]; it unifies and strengthens these results, includes detailed proofs, and it complements the theoretical findings with extensive running-time experiments. ...
... WeakHeapsort has been introduced by Dutton [6] and applied to QuickWeakHeapsort in [7]. We introduced a refined version of ExternalWeakHeapsort in [10] that works by the same principle as ExternalHeapsort; more details on this algorithm, its application in QuickWeakHeapsort, and the relation to Mergesort can be found in our preprint [9]. ...
Article
Full-text available
QuickXsort is a highly efficient in-place sequential sorting scheme that mixes Hoare’s Quicksort algorithm with X, where X can be chosen from a wider range of other known sorting algorithms, like Heapsort, Insertionsort and Mergesort. Its major advantage is that QuickXsort can be in-place even if X is not. In this work we provide general transfer theorems expressing the number of comparisons of QuickXsort in terms of the number of comparisons of X. More specifically, if pivots are chosen as medians of (not too fast) growing size samples, the average number of comparisons of QuickXsort and X differ only by o(n)-terms. For median-of-k pivot selection for some constant k, the difference is a linear term whose coefficient we compute precisely. For instance, median-of-three QuickMergesort uses at most nlgn-0.8358n+O(logn) comparisons. Furthermore, we examine the possibility of sorting base cases with some other algorithm using even less comparisons. By doing so the average-case number of comparisons can be reduced down to nlgn-1.4112n+o(n) for a remaining gap of only 0.0315n comparisons to the known lower bound (while using only O(logn) additional space and O(nlogn) time overall). Implementations of these sorting strategies show that the algorithms challenge well-established library implementations like Musser’s Introsort.
... Regarding the average case not much is known: in [7] Knuth calculated the number of comparisons required on average for n ∈ {1, . . . , 8}; an upper bound of n log n − 1.3999n + o(n) has been established in [3]. Most recently, Iwama and Teruyama [6] showed that in the average case MergeInsertion can be improved by combining it with their (1,2)-Insertion algorithm resulting in an upper bound of n log n − 1.4106n + O(log n). ...
... -We compute the exact average number of comparisons for n up to 148thus, going much further than [7]. -We improve the bound of [3] to n log n − 1.4005n + o(n) (Theorem 3). This partially answers a conjecture from [11] which asks for an in-place algorithm with n log n + 1.4n comparisons on average and n log n − 1.3n comparisons in the worst case. ...
... This partially answers a conjecture from [11] which asks for an in-place algorithm with n log n + 1.4n comparisons on average and n log n − 1.3n comparisons in the worst case. Although MergeInsertion is not in-place, the the techniques from [3] or [11] can be used to make it so. -We evaluate a slightly different insertion order decreasing the gap between the lower bound and the average number of comparisons of MergeInsertion by roughly 30% for n ≈ 2 k /3. ...
Preprint
Full-text available
MergeInsertion, also known as the Ford-Johnson algorithm, is a sorting algorithm which, up to today, for many input sizes achieves the best known upper bound on the number of comparisons. Indeed, it gets extremely close to the information-theoretic lower bound. While the worst-case behavior is well understood, only little is known about the average case. This work takes a closer look at the average case behavior. In particular, we establish an upper bound of $n \log n - 1.4005n + o(n)$ comparisons. We also give an exact description of the probability distribution of the length of the chain a given element is inserted into and use it to approximate the average number of comparisons numerically. Moreover, we compute the exact average number of comparisons for $n$ up to 148. Furthermore, we experimentally explore the impact of different decision trees for binary insertion. To conclude, we conduct experiments showing that a slightly different insertion order leads to a better average case and we compare the algorithm to the recent combination with (1,2)-Insertionsort by Iwama and Teruyama.
... In the conference paper [12], the first and second author introduced the name QuickX- sort and first considered QuickMergesort as an application (including weaker forms of the results in Section 5 and Section 9 without proofs). In [50], the third author analyzed ...
... A weaker upper bound for the median-of-3 case was also given by the first two authors in the preprint [14]. The present work is a full version of [12] and [50]; it unifies and strengthens these results (including all proofs) and it complements the theoretical findings with extensive running-time experiments. ...
... WeakHeapsort has been introduced by Dutton [7] and applied to QuickWeakHeapsort in [8]. We introduced a refined version of ExternalWeakHeapsort in [12] that works by the same principle as ExternalHeapsort; more details on this algorithm, its application in QuickWeakHeapsort, and the relation to Mergesort can be found in our preprint [11]. ...
Preprint
Full-text available
QuickXsort is a highly efficient in-place sequential sorting scheme that mixes Hoare's Quicksort algorithm with X, where X can be chosen from a wider range of other known sorting algorithms, like Heapsort, Insertionsort and Mergesort. Its major advantage is that QuickXsort can be in-place even if X is not. In this work we provide general transfer theorems expressing the number of comparisons of QuickXsort in terms of the number of comparisons of X. More specifically, if pivots are chosen as medians of (not too fast) growing size samples, the average number of comparisons of QuickXsort and X differ only by $o(n)$-terms. For median-of-$k$ pivot selection for some constant $k$, the difference is a linear term whose coefficient we compute precisely. For instance, median-of-three QuickMergesort uses at most $n \lg n - 0.8358n + O(\log n)$ comparisons. Furthermore, we examine the possibility of sorting base cases with some other algorithm using even less comparisons. By doing so the average-case number of comparisons can be reduced down to $n \lg n- 1.4106n + o(n)$ for a remaining gap of only $0.0321n$ comparisons to the known lower bound (while using only $O(\log n)$ additional space and $O(n \log n)$ time overall). Implementations of these sorting strategies show that the algorithms challenge well-established library implementations like Musser's Introsort.
... Based on QuickHeapsort [5,7], Edelkamp and Weiß [9] developed the concept of QuickXsort and applied it to X = WeakHeapsort [8] and X = Mergesort. The idea -going back to UltimateHeapsort [17] -is very simple: as in Quicksort the array is partitioned into the elements greater and less than some pivot element, respectively. ...
... Using Mergesort as X, a partitioning scheme with √ n pivots, known to be optimal for classical Quicksort [22], and Ford and Johnson's MergeInsertion as the base case [13], QuickMergesort requires at most n log n − 1.3999n + o(n) element comparisons on the average (and n log n − 1.4n + o(n) for n = 2 k ), while preserving worst-case bounds n log n + O(n) element comparisons and O(n log n) time for all other operations [9]. To the authors' knowledge the average-case result is the best-known upper bound for sequential sorting with O(n log n) overall time bound, in the leading term matching, and in the linear term being less than 0.045n away from the lower bound. ...
... We refine a trick suggested in [9] in order to obtain a bound of n log n + 16.1n comparisons in the worst case using the median-of-median algorithm [4] with an adaptive pivot sampling strategy. On average the modified algorithm is only slightly slower than the Median-of-3 variant of QuickMergesort. ...
Preprint
Full-text available
We consider the fundamental problem of internally sorting a sequence of $n$ elements. In its best theoretical setting QuickMergesort, a combination Quicksort with Mergesort with a Median-of-$\sqrt{n}$ pivot selection, requires at most $n \log n - 1.3999n + o(n)$ element comparisons on the average. The questions addressed in this paper is how to make this algorithm practical. As refined pivot selection usually adds much overhead, we show that the Median-of-3 pivot selection of QuickMergesort leads to at most $n \log n - 0{.}75n + o(n)$ element comparisons on average, while running fast on elementary data. The experiments show that QuickMergesort outperforms state-of-the-art library implementations, including C++'s Introsort and Java's Dual-Pivot Quicksort. Further trade-offs between a low running time and a low number of comparisons are studied. Moreover, we describe a practically efficient version with $n \log n + O(n)$ comparisons in the worst case.
... In QuickXSort [5], we use the recursive scheme of ordinary Quicksort, but instead of doing two recursive calls after partitioning, we first sort one of the segments by some other sorting method X. Only the second segment is recursively sorted by QuickXSort. ...
... Edelkamp and Weiß [5] explicitly describe QuickXSort as a general design pattern and, among others, consider using Mergesort as 'X'. They use the median of √ n elements in each round throughout to guarantee good splits with high probability. ...
... They show by induction that when X uses at most n lg n + cn + o(n) comparisons on average for some constant c, the number of comparisons in QuickXSort is also bounded by n lg n + cn + o(n). By combining QuickMergesort with Ford and Johnson's MergeInsertion [7] for subproblems of logarithmic size, Edelkamp and Weiß obtained an in-place sorting method that uses on the average a close to minimal number of comparisons of n lg n − 1.3999n + o(n). 2 Edelkamp and Weiß do consider this version of QuickMergesort [5], but only analyze it for median-of-√ n pivots. In this case the behavior coincides with the simpler strategy to always sort the smaller segment by Mergesort since the segments are of almost equal size with high probability. ...
Article
Full-text available
QuickXSort is a strategy to combine Quicksort with another sorting method X, so that the result has essentially the same comparison cost as X in isolation, but sorts in place even when X requires a linear-size buffer. We solve the recurrence for QuickXSort precisely up to the linear term including the optimization to choose pivots from a sample of k elements. This allows to immediately obtain overall average costs using only the average costs of sorting method X (as if run in isolation). We thereby extend and greatly simplify the analysis of QuickHeapsort and QuickMergesort with practically efficient pivot selection, and give the first tight upper bounds including the linear term for such methods.
... We aim to complete this list by the integration of various data structures found in the AI/planning literature. For instance we may cite weak-heapsort [ES02], QuickXsort [EW14], and various data structures found in the heuristic search textbook [ES11]. ...
Thesis
Full-text available
Tree search algorithms are used in a large variety of applications (MIP, CP, SAT, metaheuristics with Ant Colony Optimization and GRASP) and also in AI/planning communities. All of these techniques present similar components and many of those components can be transferred from one community to another. Preliminary results indicate that anytime tree search techniques are competitive compared to commonly used metaheuristics in operations research.In this work, we detail a state of the art and a classification of the different tree search techniques that one can find in metaheuristics, exact methods and AI/planning. Then, we present a generic framework that allows the rapid prototyping of tree search algorithms. Finally, we use this framework to develop anytime tree search algorithms that are competitive with the commonly-used metaheuristics in operations research. We report new tree search applications for some combinatorial optimization problems and new best-known solutions.
... As the default sorting algorithm of Java 2 and Python 3 , Tim Sort [2] took the advantages of Merge Sort and Insert Sort [15] to achieve fewer than nloд(n) comparisons when running on partially sorted arrays. Stefan Edelkamp et al. introduced Quickx Sort [20] which uses at most nloдn − 0.8358 + O(loдn) operations to sort n data elements in place. The authors also introduced median-of-medians Quick Merge sort as a variant of Quick Merge Sort using the median-of-medians algorithms for pivot selection [21], which further reduces the number of operations down to nloдn + 1.59n + O(n 0.8 ). ...
Research
Full-text available
Sorting is a fundamental operation in computing. However, the speed of state-of-the-art sorting algorithms on a single thread have reached their limits. Meanwhile, deep learning has demonstrated its potential to provide significant performance improvements on data mining and machine learning tasks. Therefore, it is interesting to explore whether sorting can also be speed up by deep learning techniques. In this paper, a neural network based data distribution aware sorting method named NN-sort is presented. Compared to traditional comparison-based sorting algorithms, which need to compare the data elements in pairwise, NN-sort leverages the neural network model to learn the data distribution and uses it to map disordered data elements into ordered ones. Although the complexity of NN-sort is nloдn in theory, it can run in near-linear time as being observed in most of the cases. Experimental results on both synthetic and real-world datasets show that NN-sort yields performance improvement by up to 10.9x over traditional sorting algorithms
Conference Paper
MergeInsertion, also known as the Ford-Johnson algorithm, is a sorting algorithm which, to this day, achieves the best known upper bound on the number of comparisons for many input sizes. Indeed, it gets extremely close to the information-theoretic lower bound. While the worst-case behavior is well understood, only little is known about the average case. This work takes a closer look at the average-case behavior. In particular, we establish an upper bound of \(n \log n - 1.4005n + o(n)\) comparisons. We also give an exact description of the probability distribution of the length of the chain a given element is inserted into and use it to approximate the average number of comparisons numerically. Moreover, we compute the exact average number of comparisons for n up to 148. Furthermore, we experimentally explore the impact of different decision trees for binary insertion. To conclude, we conduct experiments showing that a slightly different insertion order leads to a better average case, and we compare the algorithm to the recent combination with (1,2)-Insertionsort by Iwama and Teruyama.
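As a small illustration of why the Ford-Johnson insertion order is cheap: the batch endpoints t_k = (2^(k+1) + (−1)^k)/3 guarantee that every element of batch k is binary-inserted into a sorted chain of length at most 2^k − 1, which costs exactly k comparisons. The formula is the classical one; the C++ helper below, generating t_1, t_2, ..., is merely our illustration.

    #include <cstdint>
    #include <vector>

    // Endpoints t_k = (2^(k+1) + (-1)^k) / 3 of the insertion batches in
    // MergeInsertion: t_1 = 1, t_2 = 3, t_3 = 5, t_4 = 11, t_5 = 21, ...
    std::vector<std::uint64_t> batch_endpoints(int count) {
        std::vector<std::uint64_t> t;
        for (int k = 1; k <= count; ++k) {
            std::uint64_t two = 1ULL << (k + 1);
            t.push_back((k % 2 == 0) ? (two + 1) / 3 : (two - 1) / 3);
        }
        return t;
    }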
Conference Paper
This paper studies the average complexity of sorting algorithms in terms of the number of comparisons. The information-theoretic lower bound is \(n \lg n - 1.4427n + O(\log n)\). For many efficient algorithms, the first \(n\lg n\) term is easy to achieve, and our focus is on the (negative) constant factor of the linear term. The current best value is \(-1.3999\), for the MergeInsertion sort. Our new value is \(-1.4106\), narrowing the gap by some \(25\%\). An important building block of our algorithm is "two-element insertion," which inserts two numbers A and B, \(A<B\), into a sorted sequence T. This insertion algorithm is still sufficiently simple for rigorous mathematical analysis and works well for a certain range of lengths of T for which simple binary insertion does not, thus allowing us to take a complementary approach together with binary insertion.
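The following C++ sketch shows the straightforward form of such a two-element insertion, assuming distinct elements and A < B: B is binary-searched in all of T, and A then only in the prefix strictly left of B. The paper's algorithm chooses its comparisons more carefully to save on average; this naive variant, with names of our own choosing, only illustrates the idea.

    #include <algorithm>
    #include <cassert>
    #include <vector>

    // Insert A and B (with A < B) into the sorted vector T. Because A < B,
    // the search range for A shrinks to the part left of B's final position.
    template <typename E>
    void insert_two(std::vector<E>& T, const E& A, const E& B) {
        assert(A < B);
        auto posB = T.insert(std::upper_bound(T.begin(), T.end(), B), B);
        T.insert(std::upper_bound(T.begin(), posB, A), A);
    }

For example, with T = {1, 4, 9}, insert_two(T, 2, 7) yields {1, 2, 4, 7, 9}; the second binary search inspects only the prefix {1, 4}.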
Conference Paper
Full-text available
In quicksort, due to branch mispredictions, a skewed pivot-selection strategy can lead to better performance than the exact-median pivot-selection strategy, even if the exact median is given for free. In this paper we investigate the effect of branch mispredictions on the behaviour of mergesort. By decoupling element comparisons from branches, we can avoid most negative effects caused by branch mispredictions. When sorting a sequence of n elements, our fastest version of mergesort performs n log₂ n + O(n) element comparisons and induces at most O(n) branch mispredictions. We also describe an in-situ version of mergesort that provides the same bounds, but uses only O(log₂ n) words of extra memory. On our test computers, when sorting integer data, mergesort was the fastest sorting method, then came quicksort, and in-situ mergesort was the slowest of the three. We did a similar kind of decoupling for quicksort, but the transformation made it slower.
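A minimal C++ sketch of the decoupling idea as we read it from the abstract (the authors' actual code is more elaborate): the comparison outcome is stored as a 0/1 value that drives both the output selection and the cursor advancement, so the inner merge loop contains no data-dependent conditional jump.

    #include <cstddef>

    // Merge sorted arrays a[0..na) and b[0..nb) into out. The comparison
    // result take_a is used as data, not as a branch target; the ternary
    // typically compiles to a conditional move.
    void merge_branchfree(const int* a, std::size_t na,
                          const int* b, std::size_t nb, int* out) {
        std::size_t i = 0, j = 0, k = 0;
        while (i < na && j < nb) {
            bool take_a = (a[i] <= b[j]);    // 0/1 outcome of the comparison
            out[k++] = take_a ? a[i] : b[j];
            i += take_a;                     // exactly one cursor advances
            j += !take_a;
        }
        while (i < na) out[k++] = a[i++];    // copy whichever run remains
        while (j < nb) out[k++] = b[j++];
    }

Only the loop-bound tests remain as branches, and those are almost always predicted correctly.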
Preprint
In this paper we generalize the idea of QuickHeapsort leading to the notion of QuickXsort. Given some external sorting algorithm X, QuickXsort yields an internal sorting algorithm if X satisfies certain natural conditions. With QuickWeakHeapsort and QuickMergesort we present two examples for the QuickXsort construction. Both are efficient algorithms that incur approximately n log n − 1.26n + o(n) comparisons on average. A worst case of n log n + O(n) comparisons can be achieved without significantly affecting the average case. Furthermore, we describe an implementation of MergeInsertion for small n. Taking MergeInsertion as a base case for QuickMergesort, we establish a worst-case efficient sorting algorithm calling for n log n − 1.3999n + o(n) comparisons on average. QuickMergesort with constant size base cases shows the best performance on practical inputs: when sorting integers it is slower than STL-Introsort by only 15%.
Conference Paper
We present a new analysis for QuickHeapsort, splitting it into the analysis of the partition phases and the analysis of the heap phases. This enables us to consider samples of non-constant size for the pivot selection and leads to better theoretical bounds for the algorithm. Furthermore, we introduce some modifications of QuickHeapsort, both in-place and using n extra bits. We show that on every input the expected number of comparisons is n lg n − 0.03n + o(n) (in-place) and n lg n − 0.997n + o(n) (with n extra bits), respectively. Both estimates improve the previously known best results. (It is conjectured in Wegener93 that the in-place algorithm Bottom-Up-Heapsort uses at most n lg n + 0.4n comparisons on average, and for Weak-Heapsort, which uses n extra bits, the average number of comparisons is at most n lg n − 0.42n according to EdelkampS02.) Moreover, our non-in-place variant can even compete with index-based Heapsort variants (e.g., Rank-Heapsort in WangW07) and Relaxed-Weak-Heapsort (n lg n − 0.9n + o(n) comparisons in the worst case), for which no O(n) bound on the number of extra bits is known.
Conference Paper
First we present a new variant of Merge-sort, which needs only 1.25n space, because it reuses the space that becomes available within the current stage. It does not need more comparisons than classical Merge-sort. The main result is an easy to implement method of iterating the procedure in-place, starting by sorting 4/5 of the elements. Hereby we can keep the additional transport costs linear and only very few comparisons get lost, so that n log n − 0.8n comparisons are needed. We show that we can improve the number of comparisons if we sort blocks of constant length with Merge-Insertion before starting the algorithm. Another improvement is to start the iteration with a better version, which needs only (1+ε)n space and again additional O(n) transports. The result is that we can improve this theoretically up to n log n − 1.3289n comparisons in the worst case. This is close to the theoretical lower bound of n log n − 1.443n. The total number of transports in all these versions can be reduced to εn log n + O(1) for any ε > 0.
Article
An improved solution is presented for the problem of finding the smallest number of direct pairwise comparisons which will always suffice to rank n objects according to some transitive characteristic. In his book Mathematical Snapshots, Steinhaus discusses the problem of ranking n objects according to some transitive characteristic by means of successive pairwise comparisons. In this paper, the terminology of a tennis tournament among n players is adopted. The problem may be briefly stated: 'What is the smallest number of matches which will always suffice to rank all n players?'
Article
With refinements to the WEAK-HEAPSORT algorithm we establish the general and practically relevant sequential sorting algorithm INDEX-WEAK-HEAPSORT with exactly n⌈log n⌉ − 2^⌈log n⌉ + 1 ≤ n log n − 0.9n comparisons and at most n log n + 0.1n transpositions on any given input. It comprises an integer array of size n and is best used to generate an index for the data set. With RELAXED-WEAK-HEAPSORT and GREEDY-WEAK-HEAPSORT we discuss modifications for a smaller set of pending element transpositions. If extra space to create an index is not available, with QUICK-WEAK-HEAPSORT we propose an efficient QUICKSORT variant with n log n + 0.2n + o(n) comparisons on the average. Furthermore, we present data showing that WEAK-HEAPSORT, INDEX-WEAK-HEAPSORT and QUICK-WEAK-HEAPSORT compete with other high-performance QUICKSORT and HEAPSORT variants.
Article
A variant of HEAPSORT, called BOTTOM-UP-HEAPSORT, is presented. It is based on a new reheap procedure. This sequential sorting algorithm is easy to implement and beats, on average, QUICKSORT if n ⩾ 400, and a clever version of QUICKSORT (where the split object is the median of 3 randomly chosen objects) if n ⩾ 16000. The worst-case number of comparisons is bounded by 1.5n log n + O(n). Moreover, the new reheap procedure improves the delete procedure for the heap data structure for all n.
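A C++ sketch of such a bottom-up reheap step, in our own illustrative formulation for a max-heap stored in heap[1..n]: first descend along the path of larger children to a leaf (one comparison per level), then climb back up to where the displaced root belongs, and finally rotate that path up by one position.

    #include <cstddef>
    #include <utility>

    // Sift the (out-of-place) element at heap[1] down, bottom-up style.
    // Typical use in the selection phase of heapsort: swap heap[1] with
    // heap[n], then call bottom_up_sift_down(heap, n - 1).
    void bottom_up_sift_down(int* heap, std::size_t n) {
        std::size_t i = 1;
        while (2 * i <= n) {                          // follow larger children
            std::size_t child = 2 * i;
            if (child < n && heap[child + 1] > heap[child]) ++child;
            i = child;
        }
        int root = heap[1];
        while (heap[i] < root) i /= 2;                // climb back up
        int carry = heap[i];                          // rotate the path:
        heap[i] = root;                               // root lands at i,
        while (i > 1) {                               // ancestors shift up
            i /= 2;
            std::swap(carry, heap[i]);
        }
    }

The descent costs one comparison per level instead of the two used by the textbook sift-down, which is where the savings in the total comparison count come from.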
Article
We present an efficient and practical algorithm for the internal sorting problem. Our algorithm works in-place and, on the average, has a running time of O(n log n) in the size n of the input. More specifically, the algorithm performs n log n + O(n) comparisons and n log n + O(n) element moves on the average. An experimental comparison of our proposed algorithm with the most efficient variants of Quicksort and Heapsort is carried out and its results are discussed.
Article
The number of comparisons required to select the i-th smallest of n numbers is shown to be at most a linear function of n by analysis of a new selection algorithm, PICK. Specifically, no more than 5.4305n comparisons are ever required. This bound is improved for extreme values of i, and a new lower bound on the requisite number of comparisons is also proved.
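The following C++ sketch captures the PICK principle, taking the pivot as the median of the medians of groups of five, in a deliberately simple form; it favors clarity over the carefully optimized comparison count of the article, and all names are our own.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Selection in the spirit of PICK: the pivot is the median of the group
    // medians (groups of five), which guarantees a constant-fraction split
    // and hence a linear number of comparisons overall.
    int select_kth(std::vector<int> a, std::size_t k) {  // 0-based rank k
        while (a.size() > 10) {
            std::vector<int> medians;                    // medians of groups of 5
            for (std::size_t i = 0; i < a.size(); i += 5) {
                std::size_t j = std::min(i + 5, a.size());
                std::sort(a.begin() + i, a.begin() + j);
                medians.push_back(a[i + (j - i) / 2]);
            }
            std::size_t mid = medians.size() / 2;
            int pivot = select_kth(std::move(medians), mid);
            std::vector<int> lo, eq, hi;                 // three-way partition
            for (int x : a) {
                if (x < pivot) lo.push_back(x);
                else if (x > pivot) hi.push_back(x);
                else eq.push_back(x);
            }
            if (k < lo.size()) a = std::move(lo);
            else if (k < lo.size() + eq.size()) return pivot;
            else { k -= lo.size() + eq.size(); a = std::move(hi); }
        }
        std::sort(a.begin(), a.end());
        return a[k];
    }

Because at least roughly 3n/10 elements are guaranteed to fall on each side of the pivot, the two recursive calls work on sizes n/5 and at most about 7n/10, and the total work solves to O(n).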