Average Cost of QuickXsort with Pivot Sampling
Sebastian Wild
August 15, 2018
Abstract: QuickXsort is a strategy to combine Quicksort with another sorting method X, so that the result has essentially the same comparison cost as X in isolation, but sorts in place even when X requires a linear-size buffer. We solve the recurrence for QuickXsort precisely up to the linear term, including the optimization to choose pivots from a sample of k elements. This allows us to immediately obtain overall average costs using only the average costs of sorting method X (as if run in isolation). We thereby extend and greatly simplify the analysis of QuickHeapsort and QuickMergesort with practically efficient pivot selection, and give the first tight upper bounds including the linear term for such methods.
1. Introduction
In QuickXsort [5], we use the recursive scheme of ordinary Quicksort, but instead of doing two recursive calls after partitioning, we first sort one of the segments by some other sorting method X. Only the second segment is recursively sorted by QuickXsort. The key insight is that X can use the second segment as a temporary buffer for elements. By that, QuickXsort is sorting in-place (using O(1) words of extra space) even when X itself is not.
Not every method makes a suitable 'X'; it must use the buffer in a swap-like fashion: After X has sorted its segment, the elements originally stored in our buffer must still be intact, i.e., they must still be stored in the buffer, albeit in a different order. Two possible examples that use extra space in such a way are Mergesort (see Section 6 for details) and a comparison-efficient Heapsort variant [1] with an output buffer. With QuickXsort we can make those methods sort in-place while retaining their comparison efficiency. (We lose stability, though.)
While other comparison-efficient in-place sorting methods are known (e.g. [18, 12, 9]), the ones based on QuickXsort and elementary methods X are particularly easy to implement¹ since one can adapt existing implementations for X. In such an implementation, the tried and tested optimization to choose the pivot as the median of a small sample suggests itself to improve QuickXsort. In previous works [1, 5, 3, 6], the influence of QuickXsort on the performance of X was either studied by ad-hoc techniques that do not easily apply with general pivot sampling

David R. Cheriton School of Computer Science, University of Waterloo, Email: wild@uwaterloo.ca
This work was supported by the Natural Sciences and Engineering Research Council of Canada and the Canada Research Chairs Programme.
¹ See for example the code for QuickMergesort that was presented for discussion on code review stack exchange, https://codereview.stackexchange.com/q/149443, and the succinct C++ code in [6].
or it was studied for the case of very good pivots: exact medians or medians of a sample of √n elements. Both are typically detrimental to the average performance since they add significant overhead, whereas most of the benefit of sampling is realized already for samples of very small constant sizes like 3, 5, or 9. Indeed, in a very recent manuscript [6], Edelkamp and Weiß describe an optimized median-of-3 QuickMergesort implementation in C++ that outperformed the library Quicksort in std::sort.
The contribution of this paper is a general transfer theorem (Theorem 5.1) that expresses the costs of QuickXsort with median-of-k sampling (for any odd constant k) directly in terms of the costs of X (i.e., the costs that X needs to sort n elements in isolation). We thereby obtain the first analyses of QuickMergesort and QuickHeapsort with best possible constant-coefficient bounds on the linear term under realistic sampling schemes.
Since Mergesort only needs a buffer for one of the two runs, QuickMergesort should not simply give Mergesort the smaller of the two segments to sort, but rather the larger one for which the other segment still offers sufficient buffer space. (This will be the larger segment of the two if the smaller one contains at least a third of the elements; see Section 6 for details.) Our transfer theorem covers this refined version of QuickMergesort as well, which had not been analyzed before.²
The rest of the paper is structured as follows: In Section 2, we summarize previous work on QuickXsort with a focus on contributions to its analysis. Section 3 collects mathematical facts and notations used later. In Section 4 we define QuickXsort and formulate a recurrence for its cost. Its solution is stated in Section 5. Section 6 presents QuickMergesort as our stereotypical instantiation of QuickXsort. The proof of the transfer theorem spreads over Sections 7 and 8. In Section 9, we apply our result to QuickHeapsort and QuickMergesort and discuss some algorithmic implications.
2. Previous Work
The idea to combine Quicksort and a secondary sorting method was suggested by Cantone and Cincotti [2, 1]. They study Heapsort with an output buffer (external Heapsort),³ and combine it with Quicksort to QuickHeapsort. They analyze the average costs for external Heapsort in isolation and use a differencing trick for dealing with the QuickXsort recurrence; however, this technique is hard to generalize to median-of-k pivots.
Diekert and Weiß [3] suggest optimizations for QuickHeapsort (some of which need extra space again), and they give better upper bounds for QuickHeapsort with random pivots and median-of-3. Their results are still not tight since they upper bound the total cost of all Heapsort calls together (using ad hoc arguments on the form of the costs for one Heapsort round), without taking into account the actual subproblem sizes that Heapsort is used on. In particular, their bound on the overall contribution of the Heapsort calls does not depend on the sampling strategy.
Edelkamp and Weiß [5] explicitly describe QuickXsort as a general design pattern and, among others, consider using Mergesort as 'X'. They use the median of √n elements in each round throughout to guarantee good splits with high probability. They show by induction
² Edelkamp and Weiß do consider this version of QuickMergesort [5], but only analyze it for median-of-√n pivots. In this case, the behavior coincides with the simpler strategy to always sort the smaller segment by Mergesort since the segments are of almost equal size with high probability.
³ Not having to store the heap in a consecutive prefix of the array allows to save comparisons over classic in-place Heapsort: After a delete-max operation, we can fill the gap at the root of the heap by promoting the larger child and recursively moving the gap down the heap. (We then fill the gap with a −∞ sentinel value.) That way, each delete-max needs exactly ⌊lg n⌋ comparisons.
that when X uses at most n lg n + cn + o(n) comparisons on average for some constant c, the number of comparisons in QuickXsort is also bounded by n lg n + cn + o(n). By combining QuickMergesort with Ford and Johnson's MergeInsertion [8] for subproblems of logarithmic size, Edelkamp and Weiß obtained an in-place sorting method that uses on the average a close to minimal number of comparisons of n lg n − 1.3999n + o(n).
In a recent follow-up manuscript [6], Edelkamp and Weiß investigated the practical performance of QuickXsort and found that a tuned median-of-3 QuickMergesort variant indeed outperformed the C++ library Quicksort. They also derive an upper bound for the average costs of their algorithm using an inductive proof; their bound is not tight.
3. Preliminaries
A comprehensive list of used notation is given in Appendix A; we mention the most important here. We use Iverson's bracket [stmt] to mean 1 if stmt is true and 0 otherwise. P[E] denotes the probability of event E, and E[X] the expectation of random variable X. We write X =ᵈ Y to denote equality in distribution.
We heavily use the beta distribution: For α, β ∈ ℝ₊, X =ᵈ Beta(α, β) if X admits the density f_X(z) = z^{α−1}(1−z)^{β−1} / B(α, β), where B(α, β) = ∫₀¹ z^{α−1}(1−z)^{β−1} dz is the beta function.
Moreover, we use the beta-binomial distribution, which is a conditional binomial distribution with the success probability being a beta-distributed random variable. If X =ᵈ BetaBin(n, α, β), then P[X = i] = $\binom{n}{i}$ B(α + i, β + (n − i)) / B(α, β). For a collection of its properties see [23], Section 2.4.7; one property that we use here is a local limit law showing that the normalized beta-binomial distribution converges to the beta distribution. It is reproduced as Lemma C.1 in the appendix.
For solving recurrences, we build upon Roura's master theorems [20]. The relevant continuous master theorem is restated in the appendix (Theorem B.1).
4. QuickXsort
Let X be a sorting method that requires buffer space for storing at most ⌊αn⌋ elements (for α ∈ [0, 1]) to sort n elements. The buffer may only be accessed by swaps so that once X has finished its work, the buffer contains the same elements as before, but in arbitrary order. Indeed, we will assume that X does not compare any buffer contents; then QuickXsort preserves randomness: if the original input is a random permutation, so will be the segments after partitioning and so will be the buffer after X has terminated.⁴
We can then combine⁵ X with Quicksort as follows: We first randomly choose a pivot and partition the input around that pivot. This results in two contiguous segments containing the J₁ elements that are smaller than the pivot and the J₂ elements that are larger than the pivot, respectively. We exclude the space for the pivot, so J₁ + J₂ = n − 1; note that since the rank of the pivot is random, so are the segment sizes J₁ and J₂. We then sort one segment by X using the other segment as a buffer, and afterwards sort the buffer segment recursively by QuickXsort.
To guarantee a sufficiently large buffer for X when it sorts J_r (r = 1 or 2), we must make sure that J_{3−r} ≥ αJ_r. In case both segments could be sorted by X, we use the larger one. The motivation behind this is that we expect an advantage from reducing the subproblem size for the recursive call as much as possible.
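To make this control flow concrete, here is a minimal Python sketch of the QuickXsort driver (not from the paper; partition, insertion_sort, and the placeholder sort_with_x are illustrative helpers, and a real X must touch its buffer only via swaps, like the Mergesort of Section 6):

```python
import random

def partition(A, lo, hi, piv_idx):
    # Lomuto partition of A[lo..hi] around A[piv_idx]; returns the pivot's final index.
    A[piv_idx], A[hi] = A[hi], A[piv_idx]
    p, store = A[hi], lo
    for i in range(lo, hi):
        if A[i] < p:
            A[i], A[store] = A[store], A[i]
            store += 1
    A[store], A[hi] = A[hi], A[store]
    return store

def insertion_sort(A, lo, hi):
    for i in range(lo + 1, hi + 1):
        v, j = A[i], i - 1
        while j >= lo and A[j] > v:
            A[j + 1] = A[j]; j -= 1
        A[j + 1] = v

def sort_with_x(A, lo, hi, blo, bhi):
    # Placeholder for the method X, sorting A[lo..hi] with buffer A[blo..bhi].
    # A real X (e.g. the swap-based Mergesort of Algorithm 1) only swaps with the buffer.
    A[lo:hi + 1] = sorted(A[lo:hi + 1])

def quickxsort(A, lo, hi, alpha=0.5, t=1):
    k = 2 * t + 1
    while hi - lo + 1 > k:
        sample = random.sample(range(lo, hi + 1), k)
        sample.sort(key=lambda i: A[i])           # median-of-k pivot
        m = partition(A, lo, hi, sample[t])
        j1, j2 = m - lo, hi - m                   # segment sizes, J1 + J2 = n - 1
        cut = (hi - lo) / (1 + alpha)             # a segment of size <= cut fits X's buffer
        if (j1 <= cut and j2 <= cut and j1 >= j2) or j2 > cut:
            sort_with_x(A, lo, m - 1, m + 1, hi)  # X sorts segment 1 ...
            lo = m + 1                            # ... recurse on segment 2
        else:
            sort_with_x(A, m + 1, hi, lo, m - 1)  # X sorts segment 2 ...
            hi = m - 1                            # ... recurse on segment 1
    insertion_sort(A, lo, hi)                     # base case b(n)
```

Calling quickxsort(A, 0, len(A) - 1) sorts A in place; the two branches implement exactly the case distinction above: in an "even" split X gets the larger segment, in an "uneven" split it gets the smaller one.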
⁴ We assume throughout this paper that the input contains pairwise distinct elements.
⁵ Depending on details of X, further precautions might have to be taken, e.g., in QuickHeapsort [1]. We assume here that those have already been taken care of and solely focus on the analysis of QuickXsort.
We consider the practically relevant version of QuickXsort, where we use as pivot the median of a sample of k = 2t + 1 elements, where t ∈ ℕ₀ is constant w.r.t. n. We think of t as a design parameter of the algorithm that we have to choose. Setting t = 0 corresponds to selecting pivots uniformly at random.
4.1. Recurrence for Expected Costs
Let c(n) be the expected number of comparisons in QuickXsort on arrays of size n and x(n) be (an upper bound for) the expected number of comparisons in X. We will assume that x(n) fulfills

$$x(n) = a\,n\lg n + b\,n \pm O(n^{1-\varepsilon}), \qquad (n\to\infty),$$

for constants a, b and ε ∈ (0, 1].
For α < 1, we obtain two cases: When the split induced by the pivot is "uneven" – namely when min{J₁, J₂} < α·max{J₁, J₂}, i.e., max{J₁, J₂} > (n−1)/(1+α) – the smaller segment is not large enough to be used as buffer. Then we can only assign the large segment as a buffer and run X on the smaller segment. If however the split is about "even", i.e., both segments are ≤ (n−1)/(1+α), we can sort the larger of the two segments by X. These cases also show up in the recurrence of costs:
$$c(n) = b(n) \ge 0, \qquad (n \le k)$$
$$\begin{aligned}
c(n) &= (n-k) + b(k)
 + \mathbb{E}\Big[\big[J_1, J_2 \le \tfrac{1}{1+\alpha}(n-1)\big]\,[J_1 > J_2]\,\big(x(J_1) + c(J_2)\big)\Big]\\
&\quad + \mathbb{E}\Big[\big[J_1, J_2 \le \tfrac{1}{1+\alpha}(n-1)\big]\,[J_1 \le J_2]\,\big(x(J_2) + c(J_1)\big)\Big]\\
&\quad + \mathbb{E}\Big[\big[J_2 > \tfrac{1}{1+\alpha}(n-1)\big]\,\big(x(J_1) + c(J_2)\big)\Big]
 + \mathbb{E}\Big[\big[J_1 > \tfrac{1}{1+\alpha}(n-1)\big]\,\big(x(J_2) + c(J_1)\big)\Big] \qquad (n > k)\\
&= \sum_{r=1}^{2} \mathbb{E}\big[A_r(J_r)\, c(J_r)\big] + t(n) \tag{1}
\end{aligned}$$

where

$$A_1(J) = \big[J, J' \le \tfrac{1}{1+\alpha}(n-1)\big]\cdot[J \le J'] + \big[J > \tfrac{1}{1+\alpha}(n-1)\big] \quad\text{with } J' = (n-1) - J,$$
$$A_2(J) = \big[J, J' \le \tfrac{1}{1+\alpha}(n-1)\big]\cdot[J < J'] + \big[J > \tfrac{1}{1+\alpha}(n-1)\big],$$
$$t(n) = (n-1) + \mathbb{E}[A_2(J_2)\,x(J_1)] + \mathbb{E}[A_1(J_1)\,x(J_2)].$$
The expectation here is taken over the choice for the random pivot, i.e., over the segment sizes J₁ resp. J₂. Note that we use both J₁ and J₂ to express the conditions in a convenient form, but actually either one is fully determined by the other via J₁ + J₂ = n − 1. Note how A₁ and A₂ change roles in recursive calls and toll functions, since we always sort one segment recursively and the other segment by X.
The base cases b(n) are the costs to sort inputs that are too small to sample k elements. A practical choice is to switch to Insertionsort for these, which is also used for sorting the samples. Unlike for Quicksort itself, b(n) only influences the logarithmic term of costs (for constant k). For our asymptotic transfer theorem, we only assume b(n) ≥ 0; the actual values are immaterial.
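For concreteness, the recurrence can be evaluated exactly and checked against the closed form of Theorem 5.1 below; the following sketch (my own check, not from the paper) does so for t = 0, α = 1/2, base cases b(n) = 0 and x(n) = x_td(n) from Equation (2):

```python
import math

def x_td(n):
    # Upper bound for top-down Mergesort, Equation (2); trivial sizes cost 0.
    return n * math.log2(n) - 1.24 * n + 2 if n >= 2 else 0.0

N = 2000
c = [0.0] * (N + 1)                    # c[0] = c[1] = 0: base cases b(n) = 0 for n <= k = 1
for n in range(2, N + 1):
    cut = 2 * (n - 1) / 3              # threshold (n-1)/(1+alpha) for alpha = 1/2
    total = n - 1                      # partitioning cost
    for j in range(n):                 # J1 = j is uniform on 0..n-1 for t = 0
        big, small = max(j, n - 1 - j), min(j, n - 1 - j)
        if big <= cut:                 # "even" split: X sorts the larger segment
            total += (x_td(big) + c[small]) / n
        else:                          # "uneven" split: X sorts the smaller segment
            total += (x_td(small) + c[big]) / n
    c[n] = total

# Theorem 5.1 with a = 1, b = -1.24 gives penalty q = 0.9120 for alpha = 1/2, t = 0;
# the two values should agree up to lower-order (logarithmic) terms.
print(c[N], N * math.log2(N) + (0.9120 - 1.24) * N)
```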
Distribution of Subproblem Sizes. If pivots are chosen as the median of a random sample of size k = 2t + 1, the subproblem sizes have the same distribution, J₁ =ᵈ J₂. Without pivot sampling, we have J₁ =ᵈ U[0..n−1], a discrete uniform distribution. If we choose pivots as medians of a sample of k = 2t + 1 elements, the value for J₁ consists of two summands: J₁ = t + I₁. The first summand, t, accounts for the part of the sample that is smaller than the pivot. Those t elements do not take part in the partitioning round (but they have to be included in the subproblem). I₁ is the number of elements that turned out to be smaller than the pivot during partitioning.
This latter number I₁ is random, and its distribution is I₁ =ᵈ BetaBin(n − k, t + 1, t + 1), a so-called beta-binomial distribution. The connection to the beta distribution is best seen by assuming n independent and uniformly in (0, 1) distributed reals as input. They are almost surely pairwise distinct and their relative ranking is equivalent to a random permutation of [n], so this assumption is w.l.o.g. for our analysis. Then, the value P of the pivot in the first partitioning step has a Beta(t + 1, t + 1) distribution by definition. Conditional on that value P = p, I₁ =ᵈ Bin(n − k, p) has a binomial distribution; the resulting mixture is the so-called beta-binomial distribution.
For t = 0, i.e., no sampling, we have t + BetaBin(n − k, t + 1, t + 1) = BetaBin(n − 1, 1, 1) =ᵈ U[0..n − 1], so we recover the uniform case.
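The following small sketch (an illustration, not from the paper) evaluates the closed form of the beta-binomial weights and compares n·P[I = ⌊zn⌋] with the Beta(t+1, t+1) density, as promised by the local limit law (Lemma C.1):

```python
from math import comb, exp, lgamma, log

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def betabin_pmf(i, n, a, b):
    # P[I = i] for I ~ BetaBin(n, a, b), computed in log-space to avoid overflow.
    return exp(log(comb(n, i)) + log_beta(a + i, b + (n - i)) - log_beta(a, b))

n, t = 1000, 1
k = 2 * t + 1
assert abs(sum(betabin_pmf(i, n - k, t + 1, t + 1) for i in range(n - k + 1)) - 1) < 1e-9
for z in (0.25, 0.5, 0.75):
    approx = n * betabin_pmf(int(z * n), n - k, t + 1, t + 1)
    density = z**t * (1 - z)**t / exp(log_beta(t + 1, t + 1))
    print(z, approx, density)   # the two columns agree up to O(1/n)
```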
5. The Transfer Theorem
We now state the main result of the paper: an asymptotic approximation for c(n).
Theorem 5.1 (Total Cost of QuickXsort): The expected number of comparisons needed to sort a random permutation with QuickXsort using median-of-k pivots, k = 2t + 1, and a sorting method X that needs a buffer of ⌊αn⌋ elements for some constant α ∈ [0, 1] to sort n elements and requires on average x(n) = a n lg n + b n ± O(n^{1−ε}) comparisons to do so as n → ∞ for some ε ∈ (0, 1] is

$$c(n) = a\,n\lg n + \left(\frac{1 - a\,(H_{k+1}-H_{t+1})/\ln 2}{H} + b\right)\cdot n \pm O\big(n^{1-\varepsilon} + \log n\big),$$

where

$$H = I_{0,\frac{\alpha}{1+\alpha}}(t+2,\,t+1) + I_{\frac12,\frac{1}{1+\alpha}}(t+2,\,t+1)$$

is the expected relative subproblem size that is sorted by X. Here I_{x,y}(α, β) is the regularized incomplete beta function

$$I_{x,y}(\alpha,\beta) = \int_x^y \frac{z^{\alpha-1}(1-z)^{\beta-1}}{B(\alpha,\beta)}\,dz, \qquad (\alpha,\beta\in\mathbb{R}_+,\ 0\le x\le y\le 1).$$
We prove Theorem 5.1 in Sections 7 and 8. To simplify the presentation, we will restrict ourselves to a stereotypical algorithm for X and its value α = 1/2; the given arguments, however, immediately extend to the general statement above.
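As a worked example of the theorem (with numbers that can be checked against Table 1 in Section 9), take QuickMergesort with median-of-3 pivots, i.e., α = 1/2 and t = 1. The distribution function of Beta(3, 2) is $I_{0,x}(3,2) = \int_0^x 12z^2(1-z)\,dz = 4x^3 - 3x^4$, so

$$H = I_{0,\frac13}(3,2) + I_{\frac12,\frac23}(3,2) = \tfrac19 + \big(\tfrac{16}{27} - \tfrac{5}{16}\big) = \tfrac{169}{432} \approx 0.3912,$$

and with a = 1, H₄ = 25/12 and H₂ = 3/2, the coefficient added to b in the linear term is

$$\frac{1 - (H_4 - H_2)/\ln 2}{H} = \frac{1 - \tfrac{7}{12}/\ln 2}{169/432} \approx 0.4050.$$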
6. QuickMergesort
A natural candidate for X is Mergesort: It is comparison-optimal up to the linear term (and quite close to optimal in the linear term), and needs a Θ(n)-element buffer for practical implementations of merging.⁶
To be usable in QuickXsort, we use a swap-based merge procedure as given in Algorithm 1. Note that it suffices to move the smaller of the two runs to a buffer; we use a symmetric version of Algorithm 1 when the second run is shorter. Using classical top-down or bottom-up Mergesort as described in any algorithms textbook (e.g. [22]), we thus get along with α = 1/2.

Merge(A[ℓ..r], m, B[b..e])
  // Merges runs A[ℓ..m−1] and A[m..r] in-place into A[ℓ..r] using scratch space B[b..e].
  // Assumes A[ℓ..m−1] and A[m..r] are sorted, n₁ ≤ n₂ and n₁ ≤ e − b + 1.
  n₁ := m − ℓ; n₂ := r − m + 1
  for i = 0, …, n₁ − 1 do Swap(A[ℓ + i], B[b + i]) end for    // move first run into buffer
  i₁ := b; i₂ := m; o := ℓ
  while i₁ < b + n₁ and i₂ ≤ r do
      if B[i₁] ≤ A[i₂] then Swap(A[o], B[i₁]); o := o + 1; i₁ := i₁ + 1
      else Swap(A[o], A[i₂]); o := o + 1; i₂ := i₂ + 1 end if
  end while
  while i₁ < b + n₁ do Swap(A[o], B[i₁]); o := o + 1; i₁ := i₁ + 1 end while

Algorithm 1: A simple merging procedure that uses the buffer only by swaps. We move the first run A[ℓ..m−1] into the buffer B[b..b+n₁−1] and then merge it with the second run A[m..r] (still stored in the original array) into the empty slot left by the first run. By the time this first half is filled, we either have consumed enough of the second run to have space to grow the merged result, or the merging was trivial, i.e., all elements in the first run were smaller.
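For experimentation, here is a Python transcription of Algorithm 1 (a sketch; the pseudocode above is the authoritative version). It makes the swap discipline easy to verify: after the call, B holds exactly the elements it started with, permuted.

```python
def merge(A, l, m, r, B, b):
    # Merge sorted runs A[l..m-1] and A[m..r] into A[l..r]; buffer touched only by swaps.
    n1 = m - l                                   # length of the first (shorter) run
    assert n1 <= r - m + 1 and n1 <= len(B) - b
    for i in range(n1):                          # move first run into the buffer
        A[l + i], B[b + i] = B[b + i], A[l + i]
    i1, i2, o = b, m, l
    while i1 < b + n1 and i2 <= r:
        if B[i1] <= A[i2]:
            A[o], B[i1] = B[i1], A[o]; i1 += 1
        else:
            A[o], A[i2] = A[i2], A[o]; i2 += 1
        o += 1
    while i1 < b + n1:                           # flush the rest of the first run
        A[o], B[i1] = B[i1], A[o]; i1 += 1; o += 1

A = [1, 4, 7, 2, 3, 9]; B = ["x", "y", "z"]
merge(A, 0, 3, 5, B, 0)
print(A, B)   # [1, 2, 3, 4, 7, 9] and a permutation of ['x', 'y', 'z']
```

Note that the buffer contents (here placeholder strings) are never compared, only swapped, matching the randomness-preservation assumption of Section 4.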
6.1. Average Case of Mergesort
The average number of comparisons for Mergesort has the same – optimal – leading term n lg n as in the worst and best case; and this is true for both the top-down and bottom-up variants. The coefficient of the linear term of the asymptotic expansion, though, is not a constant, but a bounded periodic function with period lg n, and the functions differ for best, worst, and average case and the variants of Mergesort [21, 7, 17, 10, 11].
In this paper, we will confine ourselves to an upper bound for the average case x(n) = a n lg n + b n ± O(n^{1−ε}) with constant b valid for all n, so we will set b to the supremum of the periodic function. We leave the interesting challenge open to trace the precise behavior of the fluctuations through the recurrence, where Mergesort is used on a logarithmic number of subproblems with random sizes.
We use the following upper bounds for top-down [11] and bottom-up [17] Mergesort:⁷

$$x_{td}(n) = n\lg n - 1.24n + 2 \qquad\text{and} \tag{2}$$
$$x_{bu}(n) = n\lg n - 0.26n \pm O(1). \tag{3}$$
⁶ Merging can be done in place using more advanced tricks (see, e.g., [15]), but those tend not to be competitive in terms of running time with other sorting methods. By changing the global structure, a pure in-place Mergesort variant [13] can be achieved using part of the input as a buffer (as in QuickMergesort) at the expense of occasionally having to merge runs of very different lengths.
⁷ Edelkamp and Weiß [5] use x(n) = n lg n − 1.26n ± o(n); Knuth [14, 5.2.4–13] derived this formula for n a power of 2 (a general analysis is sketched, but no closed result for general n is given). Flajolet and Golin [7] and Hwang [11] continued the analysis in more detail; they find that the average number of comparisons is n lg n − (1.25 ± 0.01)n ± O(1), where the linear term oscillates in the given range.
7. Solving the Recurrence: Leading Term
We start with Equation (1). Since α = 1/2 for our Mergesort, we have α/(1+α) = 1/3 and 1/(1+α) = 2/3. (The following arguments are valid for general α, including the extreme case α = 1, but in an attempt to de-clutter the presentation, we stick to α = 1/2 here.) We rewrite A₁(J₁) and A₂(J₂) explicitly in terms of the relative subproblem size:

$$A_1(J_1) = \Big[\tfrac{J_1}{n-1} \in \big[\tfrac13,\tfrac12\big] \cup \big(\tfrac23,1\big]\Big], \qquad
A_2(J_2) = \Big[\tfrac{J_2}{n-1} \in \big[\tfrac13,\tfrac12\big) \cup \big(\tfrac23,1\big]\Big].$$

Graphically, if we view J₁/(n−1) as a point in the unit interval, the following picture shows which subproblem is sorted recursively (the other subproblem is sorted by Mergesort):

  0 —— A₂ = 1 —— 1/3 —— A₁ = 1 —— 1/2 —— A₂ = 1 —— 2/3 —— A₁ = 1 —— 1

Obviously, we have A₁ + A₂ = 1 for any choice of J₁, which corresponds to having exactly one recursive call in QuickMergesort.
7.1. The Shape Function
The expectations E[A_r(J_r) c(J_r)] in Equation (1) are actually finite sums over the values 0, …, n − 1 that J := J₁ can attain. Recall that J₂ = n − 1 − J₁ and A₁(J₁) + A₂(J₂) = 1 for any value of J. With J = J₁ =ᵈ J₂, we find

$$\sum_{r=1}^{2} \mathbb{E}[A_r(J_r)\,c(J_r)]
= \mathbb{E}\Big[\big[\tfrac{J}{n-1}\in[\tfrac13,\tfrac12]\cup(\tfrac23,1]\big]\cdot c(J)\Big]
+ \mathbb{E}\Big[\big[\tfrac{J}{n-1}\in[\tfrac13,\tfrac12)\cup(\tfrac23,1]\big]\cdot c(J)\Big]
= \sum_{j=0}^{n-1} w_{n,j}\,c(j),$$

where

$$w_{n,j} = \mathbb{P}[J=j]\cdot\Big(\big[\tfrac{j}{n-1}\in[\tfrac13,\tfrac12]\cup(\tfrac23,1]\big] + \big[\tfrac{j}{n-1}\in[\tfrac13,\tfrac12)\cup(\tfrac23,1]\big]\Big)
= \begin{cases}
2\,\mathbb{P}[J=j] & \text{if } \tfrac{j}{n-1}\in[\tfrac13,\tfrac12)\cup(\tfrac23,1],\\
1\cdot\mathbb{P}[J=j] & \text{if } \tfrac{j}{n-1}=\tfrac12,\\
0 & \text{otherwise.}
\end{cases}$$
We thus have a recurrence of the form required by Roura's continuous master theorem (CMT) (see Theorem B.1 in Appendix B) with the weights w_{n,j} from above (Figure 1 shows an example of how these weights look).
It remains to determine P[J = j]. Recall that we choose the pivot as the median of k = 2t + 1 elements for a fixed constant t ∈ ℕ₀, and the subproblem size J fulfills J = t + I
Figure 1: The weights w_{n,j} for n = 101, t = 1; note the singular point at j = 50.
with I =ᵈ BetaBin(n − k, t + 1, t + 1). So we have, for i ∈ [0, n − 1 − t], by definition

$$\mathbb{P}[I=i] = \binom{n-k}{i}\frac{B\big(i+t+1,\ (n-k-i)+t+1\big)}{B(t+1,\,t+1)}
= \binom{n-k}{i}\frac{(t+1)^{\overline{i}}\,(t+1)^{\overline{n-k-i}}}{(k+1)^{\overline{n-k}}}.$$

(For details, see [23, Section 2.4.7].) Now the local limit law for beta-binomials (Lemma C.1 in Appendix C) says that the normalized beta-binomial I/n converges to a beta variable "in density", and the convergence is uniform. With the beta density f_P(z) = z^t(1−z)^t / B(t+1, t+1), we thus find by Lemma C.1 that

$$\mathbb{P}[J=j] = \mathbb{P}[I=j-t] = \frac{1}{n}f_P(j/n) \pm O(n^{-2}), \qquad (n\to\infty).$$
The shift by the small constant t from (j − t)/n to j/n only changes the function value by O(n^{−1}) since f_P is Lipschitz-continuous on [0, 1]. (Details of that calculation are also given in [23], page 208.)
The first step towards applying the CMT is to identify a shape function w(z) that approximates the relative subproblem size probabilities, w(z) ≈ n·w_{n,⌊zn⌋} for large n. With the above observation, a natural choice is

$$w(z) = 2\,\Big[\tfrac13 < z < \tfrac12 \ \vee\ z > \tfrac23\Big]\cdot\frac{z^t(1-z)^t}{B(t+1,\,t+1)}. \tag{4}$$

We show in Appendix D that this is indeed a suitable shape function, i.e., it fulfills Equation (11) from the CMT.
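A quick numerical check (not from the paper) that (4) is a plausible shape function: it integrates to 1, and ∫₀¹ z·w(z) dz gives the expected relative size of the recursive subproblem discussed in Section 7.4 (e.g. 25/36 ≈ 0.6944 for t = 0):

```python
from math import lgamma, exp

def w(z, t):
    # Shape function (4); 1/B(t+1, t+1) = Gamma(2t+2) / Gamma(t+1)^2.
    if 1/3 < z < 1/2 or 2/3 < z < 1:
        return 2 * z**t * (1 - z)**t * exp(lgamma(2*t + 2) - 2 * lgamma(t + 1))
    return 0.0

def integrate(f, steps=200_000):
    # midpoint rule; the jump discontinuities keep the error at O(1/steps)
    return sum(f((i + 0.5) / steps) for i in range(steps)) / steps

for t in (0, 1, 20):
    print(t, integrate(lambda z: w(z, t)), integrate(lambda z: z * w(z, t)))
# t = 0: mass 1.0 and ~0.6944; t = 20: ~0.449 (cf. Figure 2 in Section 7.4)
```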
7.2. Computing the Toll Function
The next step in applying the CMT is a leading-term approximation of the toll function. We consider a general function x(n) = a n lg n + b n ± O(n^{1−ε}) where the error term holds for any constant ε > 0 as n → ∞. We start with the simple observation that

$$J\lg J = J\big(\lg\tfrac{J}{n} + \lg n\big) = \tfrac{J}{n}\,n\lg n + \tfrac{J}{n}\lg\big(\tfrac{J}{n}\big)\,n \tag{5}$$
$$\phantom{J\lg J} = \tfrac{J}{n}\,n\lg n \pm O(n). \tag{6}$$
For the leading term of E[x(J)], we thus only have to compute the expectation of J/n, which is essentially a relative subproblem size. In t(n), we also have to deal with the conditionals A₁(J) resp. A₂(J), though. By approximating J/n with a beta-distributed variable, the conditionals translate to bounds of an integral. Details are given in Lemma E.1 (see Appendix E). This yields

$$\begin{aligned}
t(n) &= n - 1 + \mathbb{E}[A_2(J_2)\,x(J_1)] + \mathbb{E}[A_1(J_1)\,x(J_2)]\\
&= a\,\mathbb{E}[A_2(J_2)\,J_1\lg J_1] + a\,\mathbb{E}[A_1(J_1)\,J_2\lg J_2] \pm O(n)\\
&\underset{\text{Lemma E.1(a)}}{=} 2a\cdot\frac{t+1}{2t+2}\cdot\Big(I_{0,\frac13}(t+2,\,t+1) + I_{\frac12,\frac23}(t+2,\,t+1)\Big)\cdot n\lg n \pm O(n)\\
&= \underbrace{a\,\Big(I_{0,\frac13}(t+2,\,t+1) + I_{\frac12,\frac23}(t+2,\,t+1)\Big)}_{\bar a}\cdot n\lg n \pm O(n), \qquad (n\to\infty). \tag{7}
\end{aligned}$$

Here we use the regularized incomplete beta function

$$I_{x,y}(\alpha,\beta) = \int_x^y \frac{z^{\alpha-1}(1-z)^{\beta-1}}{B(\alpha,\beta)}\,dz, \qquad (\alpha,\beta\in\mathbb{R}_+,\ 0\le x\le y\le 1)$$

for concise notation. (I_{x,y}(α, β) is the probability that a Beta(α, β) distributed random variable falls into (x, y) ⊆ [0, 1], and I_{0,x}(α, β) is its cumulative distribution function.)
7.3. Which Case of the CMT?
We are now ready to apply the CMT (Theorem B.1). As shown in Section 7.2, our toll function is Θ(n log n), so we have α = 1 and β = 1. We hence compute

$$\begin{aligned}
H &= 1 - \int_0^1 z\,w(z)\,dz\\
&= 1 - \int_0^1 2\,\Big[\tfrac13 < z < \tfrac12 \ \vee\ z > \tfrac23\Big]\,\frac{z^{t+1}(1-z)^t}{B(t+1,\,t+1)}\,dz\\
&= 1 - 2\,\frac{t+1}{k+1}\int_0^1 \Big[\tfrac13 < z < \tfrac12 \ \vee\ z > \tfrac23\Big]\,\frac{z^{t+1}(1-z)^t}{B(t+2,\,t+1)}\,dz\\
&= 1 - \Big(I_{\frac13,\frac12}(t+2,\,t+1) + I_{\frac23,1}(t+2,\,t+1)\Big)\\
&= I_{0,\frac13}(t+2,\,t+1) + I_{\frac12,\frac23}(t+2,\,t+1). \tag{8}
\end{aligned}$$

For any sampling parameters, we have H > 0, so by Case 1 of Theorem B.1 the overall costs satisfy

$$c(n) \sim \frac{t(n)}{H} \sim \frac{\bar a\,n\lg n}{H}, \qquad (n\to\infty). \tag{9}$$
7.4. Cancellations
Combining Equations (7) and (9), we find

$$c(n) \sim a\,n\lg n, \qquad (n\to\infty),$$

since I_{0,1/3} + I_{1/3,1/2} + I_{1/2,2/3} + I_{2/3,1} = 1. The leading term of the number of comparisons in QuickXsort is the same as in X itself, regardless of how the pivot elements are chosen! This is not as surprising as it might first seem. We are typically sorting a constant fraction of the input by X and thus only do a logarithmic number of recursive calls on a geometrically decreasing number of elements, so the linear contribution of Quicksort (partitioning and recursion cost) is dominated by even the first call of X, which has linearithmic cost. This remains true even if we allow asymmetric sampling, e.g., by choosing the pivot as the smallest (or any other order statistic) of a random sample.
Edelkamp and Weiß [5] give the above result for the case of using the median of √n elements, where we effectively have exact medians from the perspective of the analysis. In this case, the informal reasoning given above is precise, and in fact, the same form of cancellations then also happens for the linear term [5, Thm. 1]. (See also the "exact ranks" result in Section 9.) We will show in the following that for practical schemes of pivot sampling, i.e., with fixed sample sizes, these cancellations happen only for the leading-term approximation. The pivot sampling scheme does affect the linear term significantly; and to measure the benefit of sampling, the analysis thus has to continue to the next term of the asymptotic expansion of c(n).
Relative Subproblem Sizes. The integral ∫₀¹ z·w(z) dz is precisely the expected relative subproblem size for the recursive call, whereas for t(n) we are interested in the subproblem that is sorted using X, whose relative size is given by ∫₀¹ (1−z)·w(z) dz = 1 − ∫₀¹ z·w(z) dz. We can thus write ā = aH.
Figure 2: ∫₀¹ z·w(z) dz, the relative recursive subproblem size, as a function of t.
The quantity ∫₀¹ z·w(z) dz, the average relative size of the recursive call, is of independent interest. While it is intuitively clear that for t → ∞, i.e., the case of exact medians as pivots, we must have a relative subproblem size of exactly 1/2, this convergence is not apparent from the behavior for finite t: the mass of the integral ∫₀¹ z·w(z) dz concentrates at z = 1/2, a point of discontinuity in w(z). It is also worthy of note that the expected subproblem size is initially larger than 1/2 (0.6944… for t = 0), then decreases to ≈ 0.449124 around t = 20 and then starts to slowly increase again (see Figure 2).
8. Solving the Recurrence: The Linear Term
Since c(n) ∼ a n lg n for any choice of t, the leading term alone does not allow us to make distinctions to judge the effect of sampling schemes. To compute the next term in the asymptotic expansion of c(n), we consider the values c′(n) = c(n) − a n lg n. c′(n) has essentially the same recursive structure as c(n), only with a different toll function:

$$\begin{aligned}
c'(n) &= c(n) - a\,n\lg n\\
&= \sum_{r=1}^{2}\mathbb{E}\big[A_r(J_r)\,c(J_r)\big] - a\,n\lg n + t(n)\\
&= \sum_{r=1}^{2}\mathbb{E}\Big[A_r(J_r)\big(c(J_r) - a\,J_r\lg J_r\big)\Big] + a\sum_{r=1}^{2}\mathbb{E}\big[A_r(J_r)\,J_r\lg J_r\big] - a\,n\lg n\\
&\qquad + (n-1) + \mathbb{E}\big[A_2(J_2)\,x(J_1)\big] + \mathbb{E}\big[A_1(J_1)\,x(J_2)\big]\\
&= \sum_{r=1}^{2}\mathbb{E}\big[A_r(J_r)\,c'(J_r)\big] + (n-1) - a\,n\lg n\\
&\qquad + a\,\mathbb{E}\Big[\big(A_1(J_1)+A_2(J_2)\big)J_1\lg J_1\Big] + b\,\mathbb{E}[A_2(J_2)\,J_1]\\
&\qquad + a\,\mathbb{E}\Big[\big(A_2(J_2)+A_1(J_1)\big)J_2\lg J_2\Big] + b\,\mathbb{E}[A_1(J_1)\,J_2] \pm O(n^{1-\varepsilon}).
\end{aligned}$$

Since J₁ =ᵈ J₂, we can simplify

$$\begin{aligned}
&\mathbb{E}\Big[\big(A_1(J_1)+A_2(J_2)\big)J_1\lg J_1\Big] + \mathbb{E}\Big[\big(A_2(J_2)+A_1(J_1)\big)J_2\lg J_2\Big]\\
&\qquad = \mathbb{E}\Big[\big(A_1(J_1)+A_2(J_2)\big)J_1\lg J_1\Big] + \mathbb{E}\Big[\big(A_2(J_1)+A_1(J_2)\big)J_1\lg J_1\Big]\\
&\qquad = \mathbb{E}\Big[J_1\lg J_1\cdot\big(A_1(J_1)+A_1(J_2)+A_2(J_1)+A_2(J_2)\big)\Big]\\
&\qquad = 2\,\mathbb{E}[J\lg J]\\
&\qquad \underset{(5)}{=} 2\,\mathbb{E}\big[\tfrac{J}{n}\big]\cdot n\lg n + 2\cdot\frac{1}{\ln 2}\,\mathbb{E}\big[\tfrac{J}{n}\ln\tfrac{J}{n}\big]\cdot n\\
&\qquad \underset{\text{Lemma E.1(b)}}{=} n\lg n - \frac{1}{\ln 2}\big(H_{k+1}-H_{t+1}\big)\,n \pm O(n^{1-\varepsilon}).
\end{aligned}$$
Plugging this back into our equation for c′(n), we find

$$\begin{aligned}
c'(n) &= \sum_{r=1}^{2}\mathbb{E}\big[A_r(J_r)\,c'(J_r)\big] + (n-1) - a\,n\lg n + a\Big(n\lg n - \frac{1}{\ln 2}\big(H_{k+1}-H_{t+1}\big)n\Big)\\
&\qquad + b\,\Big(I_{0,\frac13}(t+2,\,t+1) + I_{\frac12,\frac23}(t+2,\,t+1)\Big)\cdot n \pm O(n^{1-\varepsilon})\\
&= \sum_{r=1}^{2}\mathbb{E}\big[A_r(J_r)\,c'(J_r)\big] + t'(n),
\end{aligned}$$

where

$$t'(n) = b'\,n \pm O(n^{1-\varepsilon}), \qquad b' = 1 - \frac{a}{\ln 2}\big(H_{k+1}-H_{t+1}\big) + b\cdot H.$$

Apart from the smaller toll function t′(n), this recurrence has the very same shape as the original recurrence for c(n); in particular, we obtain the same shape function w(z) and the same H > 0, and obtain

$$c'(n) \sim \frac{t'(n)}{H} \sim \frac{b'\,n}{H}.$$
8.1. Error Bound
Since our toll function is not given precisely, but only up to an error term O(n^{1−ε}) for a given fixed ε ∈ (0, 1], we also have to estimate the overall influence of this term. For that we consider the recurrence for c(n) again, but replace t(n) (entirely) by C·n^{1−ε}. If ε > 0, then ∫₀¹ z^{1−ε} w(z) dz < ∫₀¹ w(z) dz = 1, so we still find H > 0 and apply Case 1 of the CMT. The overall contribution of the error term is then O(n^{1−ε}). For ε = 0, we have H = 0 and Case 2 applies, giving an overall error term of O(log n).
This completes the proof of Theorem 5.1.
9. Discussion
Since all our choices for X are leading-term optimal, so will QuickXsort be. We can thus fix a = 1 in Theorem 5.1; only b (and the allowable α) still depend on X. We then basically find that going from X to QuickXsort adds a "penalty" q in the linear term that depends only on the sampling size (and α), but not on X. Table 1 shows that this penalty is roughly n without sampling, but can be reduced drastically when choosing pivots from a sample of 3 or 5 elements (see also the sketch after Table 1). Note that the overall costs for pivot sampling are O(log n) for constant t.
           t = 0    t = 1    t = 2    t = 3    t = 10    t → ∞
  α = 1    1.1146   0.5070   0.3210   0.2328   0.07705   0
  α = 1/2  0.9120   0.4050   0.2526   0.1815   0.05956   0

Table 1: QuickXsort penalty. QuickXsort with x(n) = n lg n + bn yields c(n) = n lg n + (q + b)n, where q, the QuickXsort penalty, is given in the table.
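Table 1 can be reproduced directly from Theorem 5.1 with a = 1; here is a small sketch using SciPy, whose betainc(a, b, x) is the regularized incomplete beta function I_{0,x}(a, b):

```python
from math import log
from scipy.special import betainc

def harmonic(n):
    return sum(1 / i for i in range(1, n + 1))

def penalty(t, alpha):
    # q = (1 - (H_{k+1} - H_{t+1}) / ln 2) / H  with H as in Theorem 5.1
    k = 2 * t + 1
    I = lambda x, y: betainc(t + 2, t + 1, y) - betainc(t + 2, t + 1, x)
    H = I(0, alpha / (1 + alpha)) + I(1 / 2, 1 / (1 + alpha))
    return (1 - (harmonic(k + 1) - harmonic(t + 1)) / log(2)) / H

for alpha in (1, 1 / 2):
    print(alpha, [round(penalty(t, alpha), 5) for t in (0, 1, 2, 3, 10)])
# reproduces the two rows of Table 1 up to rounding
```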
As we increase the sample size, we converge to the situation studied by Edelkamp and Weiß using median-of-√n, where no linear-term penalty is left [5]. Given that q is less than 0.08 already for a sample of 21 elements, these large-sample versions are mostly of theoretical interest. It is noteworthy that the improvement from no sampling to median-of-3 yields a reduction of q by more than 50%, which is much more than its effect on Quicksort itself (where it reduces the leading term of costs by 15% from 2n ln n to (12/7) n ln n).
We now apply our transfer theorem to the two most well-studied choices for X, Heapsort
and Mergesort, and compare the results to analyses and measured comparison counts from
previous work. The results confirm that solving the QuickXsort recurrence exactly yields much
more accurate predictions for the overall number of comparisons than previous bounds that
circumvented this.
9.1. QuickHeapsort
The basic external Heapsort of Cantone and Cincotti [1] always traverses one path in the heap from root to bottom and does one comparison for each edge followed, i.e., ⌊lg n⌋ or ⌊lg n⌋ − 1 many per deleteMax. By counting how many leaves we have on each level, Diekert and Weiß found [3, Eq. 1]

$$n\big(\lfloor\lg n\rfloor - 1\big) + 2\big(n - 2^{\lfloor\lg n\rfloor}\big) \pm O(\log n) \;\le\; n\lg n - 0.913929\,n \pm O(\log n)$$

comparisons for the sort-down phase. (The constant of the linear term is 1 − 1/ln 2 − lg(2 ln 2), the supremum of the periodic function in the linear term.) Using the classical heap construction method adds on average 1.8813726n comparisons [4], so here

$$x(n) = n\lg n + 0.967444\,n \pm O(n^{\varepsilon})$$

for any ε > 0.
Both [1] and [3] report averaged comparison counts from running-time experiments. We compare them in Table 2 against the estimates from our result and previous analyses (a recomputation of the W column follows Table 2). While the approximation is not very accurate for n = 100 (for all analyses), for larger n, our estimate is correct up to the first three digits, whereas previous upper bounds have errors almost one order of magnitude bigger. Note that it is expected for our bound to still be on the conservative side since we used the supremum of the periodic linear term for Heapsort.
  Instance                          observed          W           CC           DW
  Fig. 4 [1], n = 10², k = 1             806        +67         +158         +156
  Fig. 4 [1], n = 10², k = 3             714        +98         +168            —
  Fig. 4 [1], n = 10⁵, k = 1       1 869 769       +600      +90 795      +88 795
  Fig. 4 [1], n = 10⁵, k = 3       1 799 240     +9 165      +79 324            —
  Fig. 4 [1], n = 10⁶, k = 1      21 891 874   +121 748   +1 035 695   +1 015 695
  Fig. 4 [1], n = 10⁶, k = 3      21 355 988    +49 994     +751 581            —
  Tab. 2 [3], n = 10⁴, k = 1         152 573     +1 125      +10 264      +10 064
  Tab. 2 [3], n = 10⁴, k = 3         146 485     +1 136       +8 152            —
  Tab. 2 [3], n = 10⁶, k = 1      21 975 912    +37 710     +951 657     +931 657
  Tab. 2 [3], n = 10⁶, k = 3      21 327 478    +78 504     +780 091            —

Table 2: Comparison of estimates from this paper (W), Theorem 6 of [1] (CC) and Theorem 1 of [3] (DW); shown is the difference between the estimate and the observed average.
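The W column is simply Theorem 5.1 instantiated with a = 1, b = 0.967444 and α = 1; the following sketch recomputes two of the estimates (small deviations from the table stem from the rounded penalty values):

```python
from math import log2

b = 0.967444                       # external Heapsort, see x(n) above
q = {1: 1.1146, 3: 0.5070}         # QuickXsort penalty for alpha = 1 (Table 1)

def estimate(n, k):
    return n * log2(n) + (q[k] + b) * n

print(round(estimate(10**6, 1)) - 21_891_874)   # ~ +121_7xx, cf. row n = 10^6, k = 1
print(round(estimate(10**4, 3)) - 146_485)      # ~ +1_13x,  cf. row n = 10^4, k = 3
```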
9.2. QuickMergesort
For QuickMergesort, Edelkamp and Weiß [5, Fig. 4] report measured average comparison counts for a median-of-3 version using top-down Mergesort: the linear term is shown to be between −0.8n and −0.9n. In a recent manuscript [6], they also analytically consider the simplified median-of-3 QuickMergesort which always sorts the smaller segment by Mergesort (i.e., α = 1). It uses n lg n − 0.7330n + o(n) comparisons on average (using b = −1.24). They use this as a (conservative) upper bound for the original QuickMergesort.
Our transfer theorem shows that this bound is off by roughly 0.1n: median-of-3 QuickMergesort uses at most c(n) = n lg n − 0.8350n ± O(log n) comparisons on average. Going to median-of-5 reduces the linear term to −0.9874n, which is better than the worst case of top-down Mergesort for most n.
Skewed Pivots for Mergesort? For Mergesort with α = 1/2, the largest fraction of elements we can sort by Mergesort in one step is 2/3; this suggests that using a slightly skewed pivot might be beneficial since it will increase the subproblem size for Mergesort and decrease the size for recursive calls. Indeed, Edelkamp and Weiß allude to this variation: "With about 15% the time gap, however, is not overly big, and may be bridged with additional efforts like skewed pivots and refined partitioning." (The statement appears in the arXiv version of [5], arxiv.org/abs/1307.3033.) And the above-mentioned StackExchange post actually chooses pivots as the second tertile.
Our analysis above can be extended to skewed sampling schemes (omitted due to space constraints), but to illustrate this point it suffices to pay a short visit to "wishful-thinking land" and assume that we can get exact quantiles for free. We can show (e.g., with Roura's discrete master theorem [20]) that if we always pick the exact ρ-quantile of the input, for ρ ∈ (0, 1), the overall costs are

$$c_\rho(n) = \begin{cases}
n\lg n + \Big(\dfrac{1+h(\rho)}{1-\rho} + b\Big)\,n \pm O(n^{1-\varepsilon}) & \text{if } \rho\in\big(\tfrac13,\tfrac12\big)\cup\big(\tfrac23,1\big),\\[2ex]
n\lg n + \Big(\dfrac{1+h(\rho)}{\rho} + b\Big)\,n \pm O(n^{1-\varepsilon}) & \text{otherwise,}
\end{cases}$$

for h(x) = x lg x + (1 − x) lg(1 − x). The coefficient of the linear term has a strict minimum at ρ = 1/2: Even for α = 1/2, the best choice is to use the median of a sample. (The result is the same for fixed-size samples.) For QuickMergesort, skewed pivots turn out to be a pessimization, despite the fact that we sort a larger part by Mergesort. A possible explanation is that skewed pivots significantly decrease the amount of information we obtain from the comparisons during partitioning, but do not make partitioning any cheaper.
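Evaluating the linear-term coefficient from the display above numerically (a quick sketch) confirms the strict minimum at ρ = 1/2:

```python
from math import log2

def h(x):
    return x * log2(x) + (1 - x) * log2(1 - x)

def q_exact_quantile(rho):
    # linear-term penalty over b for exact rho-quantile pivots
    if 1/3 < rho < 1/2 or 2/3 < rho < 1:
        return (1 + h(rho)) / (1 - rho)
    return (1 + h(rho)) / rho

for rho in (0.40, 0.45, 0.50, 0.55, 0.60, 0.70):
    print(rho, round(q_exact_quantile(rho), 4))
# rho = 0.5 gives penalty 0; every skewed quantile is strictly worse
```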
9.3. Future Work
More promising than skewed pivot sampling is the use of several pivots. The resulting
MultiwayQuickXsort would be able to sort all but one segment using X and recurse on only
one subproblem. Here, determining the expected subproblem sizes becomes a challenge, in
particular for α < 1; we leave this for future work.
We also confined ourselves to the expected number of comparisons here, but more details about the distribution of costs can be obtained. The variance follows a similar recurrence to the one studied in this paper, and a distributional recurrence for the costs can be given. The discontinuities in the subproblem sizes add a new facet to these analyses.
Finally, it is a typical phenomenon that constant-factor optimal sorting methods exhibit
periodic linear terms. QuickXsort inherits these fluctuations but smooths them through the
random subproblem sizes. Explicitly accounting for these effects is another interesting challenge
for future work.
Acknowledgements. I would like to thank three anonymous referees for many helpful comments, references and suggestions that helped improve the presentation of this paper.
A. Notation
A.1. Generic Mathematics
ℕ, ℕ₀, ℤ, ℝ: natural numbers ℕ = {1, 2, 3, …}, ℕ₀ = ℕ ∪ {0}, integers ℤ = {…, −2, −1, 0, 1, 2, …}, real numbers ℝ.
ℝ_{>1}, ℕ_{≥3} etc.: restricted sets X_{pred} = {x ∈ X : x fulfills pred}.
$0.\overline{3}$: repeating decimal; $0.\overline{3} = 0.333\ldots = 1/3$; the marked numerals form the repeated part of the decimal number.
ln(n), lg(n): natural and binary logarithm; ln(n) = log_e(n), lg(n) = log₂(n).
X: to emphasize that X is a random variable, it is capitalized.
[a, b): real intervals; endpoints with round parentheses are excluded, those with square brackets are included.
[m..n], [n]: integer intervals, [m..n] = {m, m + 1, …, n}; [n] = [1..n].
[stmt], [x = y]: Iverson bracket, [stmt] = 1 if stmt is true, [stmt] = 0 otherwise.
H_n: nth harmonic number; H_n = ∑_{i=1}^{n} 1/i.
x ± y: x with absolute error |y|; formally the interval x ± y = [x − |y|, x + |y|]; as with O-terms, we use one-way equalities z = x ± y instead of z ∈ x ± y.
B(α, β): the beta function, B(α, β) = ∫₀¹ z^{α−1}(1−z)^{β−1} dz.
I_{x,y}(α, β): the regularized incomplete beta function; I_{x,y}(α, β) = ∫ₓʸ z^{α−1}(1−z)^{β−1}/B(α, β) dz for α, β ∈ ℝ₊, 0 ≤ x ≤ y ≤ 1.
$a^{\underline{b}}$, $a^{\overline{b}}$: factorial powers; "a to the b falling resp. rising".
A.2. Stochastics-related Notation
P[E], P[X = x]: probability of an event E resp. probability for random variable X to attain value x.
E[X]: expected value of X; we write E[X | Y] for the conditional expectation of X given Y, and E_X[f(X)] to emphasize that expectation is taken w.r.t. random variable X.
X =ᵈ Y: equality in distribution; X and Y have the same distribution.
U(a, b): uniformly in (a, b) ⊂ ℝ distributed random variable.
Beta(α, β): Beta-distributed random variable with shape parameters α ∈ ℝ_{>0} and β ∈ ℝ_{>0}.
Bin(n, p): binomial-distributed random variable with n ∈ ℕ₀ trials and success probability p ∈ [0, 1].
BetaBin(n, α, β): beta-binomial-distributed random variable; n ∈ ℕ₀, α, β ∈ ℝ_{>0}.
A.3. Notation for the Algorithm
n: length of the input array, i.e., the input size.
k, t: sample size k ∈ ℕ_{≥1}, odd; k = 2t + 1, t ∈ ℕ₀.
x(n), a, b: average costs of X; x(n) = a n lg n + b n ± O(n^{1−ε}).
t(n), $\bar a$, $\bar b$: toll function t(n) = $\bar a$ n lg n + $\bar b$ n ± O(n^{1−ε}).
J₁, J₂: (random) subproblem sizes; J₁ + J₂ = n − 1; J₁ = t + I₁.
I₁, I₂: (random) segment sizes in partitioning; I₁ =ᵈ BetaBin(n − k, t + 1, t + 1); I₂ = n − k − I₁.
B. The Continuous Master Theorem
We restate Roura's CMT here for convenience.

Theorem B.1 (Roura's Continuous Master Theorem (CMT)): Let F_n be recursively defined by

$$F_n = \begin{cases}
b_n, & \text{for } 0 \le n < N;\\[1ex]
t_n + \displaystyle\sum_{j=0}^{n-1} w_{n,j}\,F_j, & \text{for } n \ge N,
\end{cases} \tag{10}$$

where t_n, the toll function, satisfies t_n ∼ K n^α log^β(n) as n → ∞ for constants K ≠ 0, α ≥ 0 and β > −1. Assume there exists a function w : [0, 1] → ℝ_{≥0}, the shape function, with ∫₀¹ w(z) dz ≥ 1 and

$$\sum_{j=0}^{n-1}\left| w_{n,j} - \int_{j/n}^{(j+1)/n} w(z)\,dz \right| = O(n^{-d}), \qquad (n\to\infty), \tag{11}$$

for a constant d > 0. With H := 1 − ∫₀¹ z^α w(z) dz, we have the following cases:

1. If H > 0, then F_n ∼ t_n / H.
2. If H = 0, then F_n ∼ (t_n ln n) / H̃ with H̃ = −(β + 1) ∫₀¹ z^α ln(z) w(z) dz.
3. If H < 0, then F_n = O(n^c) for the unique c ∈ ℝ with ∫₀¹ z^c w(z) dz = 1.

Theorem B.1 is the "reduced form" of the CMT, which appears as Theorem 1.3.2 in Roura's doctoral thesis [19], and as Theorem 18 of [16]. The full version (Theorem 3.3 in [20]) allows us to handle sublogarithmic factors in the toll function as well, which we do not need here.
C. Local Limit Law for the Beta-Binomial Distribution
Since the binomial distribution is sharply concentrated, one can use Chernoff bounds on beta-binomial variables after conditioning on the beta-distributed success probability. That already implies that BetaBin(n, α, β)/n converges to Beta(α, β) (in a specific sense). We can obtain stronger error bounds, though, by directly comparing the PDFs. Doing that gives the following result; a detailed proof is given in [23], Lemma 2.38.

Lemma C.1 (Local Limit Law for Beta-Binomials, [23], Lemma 2.38): Let (I⁽ⁿ⁾)_{n∈ℕ} be a family of random variables with beta-binomial distribution, I⁽ⁿ⁾ =ᵈ BetaBin(n, α, β) where α, β ∈ {1} ∪ ℝ_{≥2}, and let f_B(z) be the density of the Beta(α, β) distribution. Then we have, uniformly in z ∈ (0, 1), that

$$n\cdot\mathbb{P}\big[I = \lfloor z(n+1)\rfloor\big] = f_B(z) \pm O(n^{-1}), \qquad (n\to\infty).$$

That is, I⁽ⁿ⁾/n converges to Beta(α, β) in distribution, and the probability weights converge uniformly to the limiting density at rate O(n^{−1}).
D. Smoothness of the Shape Function
D. Smoothness of the Shape Function
In this appendix we show that
w
(
z
)as given in Equation
(4)
on page 8 fulfills Equation
(11)
on page 16, the approximation-rate criterion of the CMT. We consider the following ranges for
bznc
n1=j
n1separately:
bznc
n1<1
3and 1
2<bznc
n1<2
3.
Here
wn,bznc
= 0 and so is
w
(
z
). So actual value and approximation are exactly the same.
1
3<bznc
n1<1
2and bznc
n1>2
3.
Here
wn,j
= 2
P[J
=
j]
and
w
(
z
) = 2
fP
(
z
)where
fP
(
z
) =
zt
(1
z
)
t/
B(
t
+ 1
, t
+ 1) is twice
the density of the beta distribution
Beta
(
t
+ 1
, t
+ 1). Since
fP
is Lipschitz-continuous on
the bounded interval [0
,
1] (it is a polynomial) the uniform pointwise convergence from
above is enough to bound the sum of
wn,j R(j+1)/n
j/n w
(
z
)
dz
over all
j
in the range by
O(n1).
bznc
n1∈ {1
3,1
2,2
3}.
At these boundary points, the difference between
wn,bznc
and
w
(
z
)does not vanish (in
particularly
1
2
is a singular point for
wn,bznc
), but the absolute difference is bounded.
Since this case only concerns 3out of
n
summands, the overall contribution to the error
is O(n1).
Together, we find that Equation (11) is fulfilled as claimed:
n1
X
j=0 wn,j Z(j+1)/n
j/n
w(z)dz=O(n1) (n→ ∞).(12)
E. Approximation by (Incomplete) Beta Integrals
Lemma E.1: Let J =ᵈ BetaBin(n − c₁, α, β) + c₂ be a random variable that differs by fixed constants c₁ and c₂ from a beta-binomial variable with parameters n ∈ ℕ and α, β ∈ ℕ_{≥1}. Then the following holds:

(a) For fixed constants 0 ≤ x ≤ y ≤ 1, it holds that

$$\mathbb{E}\big[[xn \le J \le yn]\cdot J\lg J\big] = \frac{\alpha}{\alpha+\beta}\,I_{x,y}(\alpha+1,\,\beta)\cdot n\lg n \pm O(n), \qquad (n\to\infty).$$

The result holds also when either or both of the inequalities in [xn ≤ J ≤ yn] are strict.

(b) $\mathbb{E}\big[\tfrac{J}{n}\ln\tfrac{J}{n}\big] = \dfrac{\alpha}{\alpha+\beta}\big(H_\alpha - H_{\alpha+\beta}\big) \pm O(n^{-h})$ for any h ∈ (0, 1).

Proof: We start with part (a). By the local limit law for beta-binomials (Lemma C.1) it is plausible to expect a reasonably small error when we replace E[[xn ≤ J ≤ yn]·J lg J] by E[[x ≤ P ≤ y]·(Pn) lg(Pn)], where P =ᵈ Beta(α, β) is beta distributed. We bound the error in the following.
We have E[[xn ≤ J ≤ yn]·J lg J] = E[[xn ≤ J ≤ yn]·(J/n)]·n lg n ± O(n) by Equation (5); it thus suffices to compute E[[xn ≤ J ≤ yn]·(J/n)]. We first replace J by I =ᵈ BetaBin(n, α, β)
and argue later that this results in a sufficiently small error. We expand

$$\begin{aligned}
\mathbb{E}\Big[\big[x \le \tfrac{I}{n} \le y\big]\cdot\tfrac{I}{n}\Big]
&= \sum_{i=\lceil xn\rceil}^{\lfloor yn\rfloor} \frac{i}{n}\cdot\mathbb{P}[I=i]
= \frac{1}{n}\sum_{i=\lceil xn\rceil}^{\lfloor yn\rfloor} \frac{i}{n}\cdot n\,\mathbb{P}[I=i]\\
&\underset{\text{Lemma C.1}}{=} \frac{1}{n}\sum_{i=\lceil xn\rceil}^{\lfloor yn\rfloor} \frac{i}{n}\cdot\frac{(i/n)^{\alpha-1}\big(1-(i/n)\big)^{\beta-1}}{B(\alpha,\beta)} \pm O(n^{-1})\\
&= \frac{1}{B(\alpha,\beta)}\cdot\frac{1}{n}\sum_{i=\lceil xn\rceil}^{\lfloor yn\rfloor} f(i/n) \pm O(n^{-1}),
\end{aligned}$$

where f(z) = z^α(1 − z)^{β−1}. Note that f(z) is Lipschitz-continuous on the bounded interval [x, y] since it is continuously differentiable (it is a polynomial). Integrals of Lipschitz functions are well-approximated by finite Riemann sums; see Lemma 2.12 (b) of [23] for a formal statement. We use that on the sum above:

$$\frac{1}{n}\sum_{i=\lceil xn\rceil}^{\lfloor yn\rfloor} f(i/n) = \int_x^y f(z)\,dz \pm O(n^{-1}), \qquad (n\to\infty).$$

Inserting above and using B(α + 1, β)/B(α, β) = α/(α + β) yields

$$\mathbb{E}\Big[\big[x \le \tfrac{I}{n} \le y\big]\cdot\tfrac{I}{n}\Big]
= \frac{\int_x^y z^{\alpha}(1-z)^{\beta-1}\,dz}{B(\alpha,\beta)} \pm O(n^{-1})
= \frac{\alpha}{\alpha+\beta}\,I_{x,y}(\alpha+1,\,\beta) \pm O(n^{-1}); \tag{13}$$

recall that

$$I_{x,y}(\alpha,\beta) = \int_x^y \frac{z^{\alpha-1}(1-z)^{\beta-1}}{B(\alpha,\beta)}\,dz = \mathbb{P}\big[x < P < y\big]$$

denotes the regularized incomplete beta function.
Changing from I back to J has no influence on the given approximation. To compensate for the difference in the number of trials (n − c₁ instead of n), we use the above formulas with n − c₁ instead of n; since we let n go to infinity anyway, this does not change the result. Moreover, replacing I by I + c₂ changes the value of the argument z = I/n of f by O(n^{−1}); since f is smooth, namely Lipschitz-continuous, this also changes f(z) by at most O(n^{−1}). The result is thus not affected by more than the given error term:

$$\mathbb{E}\Big[\big[x \le \tfrac{J}{n} \le y\big]\cdot\tfrac{J}{n}\Big] = \mathbb{E}\Big[\big[x \le \tfrac{I}{n} \le y\big]\cdot\tfrac{I}{n}\Big] \pm O(n^{-1}).$$

We obtain the claim by multiplying with n lg n. Versions with strict inequalities in [xn ≤ J ≤ yn] only affect the bounds of the sums above by one, which again gives a negligible error of O(n^{−1}). This concludes the proof of part (a).
For part (b), we follow a similar route. The function we integrate is no longer Lipschitz-continuous, but a weaker form of smoothness is sufficient to bound the difference between the integral and its Riemann sums. Indeed, the above-cited Lemma 2.12 (b) of [23] is formulated for the weaker notion of Hölder-continuity: A function f : I → ℝ defined on a bounded interval I is called Hölder-continuous with exponent h ∈ (0, 1] when

$$\exists C\ \forall x,y \in I: \quad |f(x) - f(y)| \le C\,|x-y|^h.$$

This generalizes Lipschitz-continuity (which corresponds to h = 1).
As above, we replace J by I =ᵈ BetaBin(n, α, β), which affects the overall result by O(n^{−1}). We compute

$$\begin{aligned}
\mathbb{E}\Big[\tfrac{I}{n}\ln\tfrac{I}{n}\Big]
&= \sum_{i=0}^{n} \frac{i}{n}\ln\frac{i}{n}\cdot\mathbb{P}[I=i]\\
&\underset{\text{Lemma C.1}}{=} \frac{1}{n}\sum_{i=0}^{n} \frac{i}{n}\ln\frac{i}{n}\cdot\frac{(i/n)^{\alpha-1}\big(1-(i/n)\big)^{\beta-1}}{B(\alpha,\beta)} \pm O(n^{-1})\\
&= \frac{-1}{B(\alpha,\beta)}\cdot\frac{1}{n}\sum_{i=0}^{n} f(i/n) \pm O(n^{-1}),
\end{aligned}$$

where now f(z) = ln(1/z)·z^α(1 − z)^{β−1}. Since the derivative of f is unbounded at z = 0, f cannot be Lipschitz-continuous, but it is Hölder-continuous on [0, 1] for any exponent h ∈ (0, 1): z ↦ ln(1/z)·z is Hölder-continuous (see, e.g., [23], Prop. 2.13), products of Hölder-continuous functions remain such on bounded intervals, and the remaining factor of f is a polynomial in z, which is Lipschitz- and hence Hölder-continuous.
By Lemma 2.12 (b) of [23] we then have

$$\frac{1}{n}\sum_{i=0}^{n} f(i/n) = \int_0^1 f(z)\,dz \pm O(n^{-h}).$$

Recall that we can choose h as close to 1 as we wish; this will only affect the constant hidden by the O(n^{−h}). It remains to actually compute the integral; fortunately, this "logarithmic beta integral" has a well-known closed form (see, e.g., [23], Eq. (2.30)):

$$\int_0^1 \ln(z)\cdot z^{\alpha}(1-z)^{\beta-1}\,dz = B(\alpha+1,\,\beta)\big(H_\alpha - H_{\alpha+\beta}\big).$$

Inserting above, we finally find

$$\mathbb{E}\Big[\tfrac{J}{n}\ln\tfrac{J}{n}\Big] = \mathbb{E}\Big[\tfrac{I}{n}\ln\tfrac{I}{n}\Big] \pm O(n^{-1}) = \frac{\alpha}{\alpha+\beta}\big(H_\alpha - H_{\alpha+\beta}\big) \pm O(n^{-h})$$

for any h ∈ (0, 1). ∎
References
[1] Domenico Cantone and Gianluca Cincotti. QuickHeapsort, an efficient mix of classical sorting algorithms. Theoretical Computer Science, 285(1):25–42, August 2002. doi:10.1016/S0304-3975(01)00288-2.
[2] Domenico Cantone and Gianluca Cincotti. QuickHeapsort, an efficient mix of classical sorting algorithms. In Italian Conference on Algorithms and Complexity (CIAC), pages 150–162, 2000. doi:10.1007/3-540-46521-9_13.
[3] Volker Diekert and Armin Weiß. QuickHeapsort: Modifications and improved analysis. Theory of Computing Systems, 59(2):209–230, August 2016. doi:10.1007/s00224-015-9656-y.
[4] Ernst E. Doberkat. An average case analysis of Floyd's algorithm to construct heaps. Information and Control, 61(2):114–131, May 1984. doi:10.1016/S0019-9958(84)80053-4.
[5] Stefan Edelkamp and Armin Weiß. QuickXsort: Efficient sorting with n log n − 1.399n + o(n) comparisons on average. In International Computer Science Symposium in Russia, pages 139–152. Springer, 2014. doi:10.1007/978-3-319-06686-8_11.
[6] Stefan Edelkamp and Armin Weiß. QuickMergesort: Practically efficient constant-factor optimal sorting, 2018. arXiv:1804.10062.
[7] Philippe Flajolet and Mordecai Golin. Mellin transforms and asymptotics. Acta Informatica, 31(7):673–696, July 1994. doi:10.1007/BF01177551.
[8] Lester R. Ford and Selmer M. Johnson. A tournament problem. The American Mathematical Monthly, 66(5):387, May 1959. doi:10.2307/2308750.
[9] Viliam Geffert and Jozef Gajdoš. In-place sorting. In SOFSEM 2011: Theory and Practice of Computer Science, pages 248–259. Springer, 2011. doi:10.1007/978-3-642-18381-2_21.
[10] Hsien-Kuei Hwang. Limit theorems for mergesort. Random Structures and Algorithms, 8(4):319–336, July 1996. doi:10.1002/(sici)1098-2418(199607)8:4<319::aid-rsa3>3.0.co;2-0.
[11] Hsien-Kuei Hwang. Asymptotic expansions of the mergesort recurrences. Acta Informatica, 35(11):911–919, November 1998. doi:10.1007/s002360050147.
[12] Jyrki Katajainen. The ultimate heapsort. In Proceedings of Computing: The 4th Australasian Theory Symposium, Australian Computer Science Communications, pages 87–96. Springer-Verlag Singapore Pte. Ltd., 1998. URL: http://www.diku.dk/~jyrki/Myris/Kat1998C.html.
[13] Jyrki Katajainen, Tomi Pasanen, and Jukka Teuhola. Practical in-place mergesort. Nordic Journal of Computing, 3(1):27–40, 1996. URL: http://www.diku.dk/~jyrki/Myris/KPT1996J.html.
[14] Donald E. Knuth. The Art of Computer Programming: Searching and Sorting. Addison Wesley, 2nd edition, 1998.
[15] Heikki Mannila and Esko Ukkonen. A simple linear-time algorithm for in situ merging. Information Processing Letters, 18(4):203–208, May 1984. doi:10.1016/0020-0190(84)90112-1.
[16] Conrado Martínez and Salvador Roura. Optimal sampling strategies in Quicksort and Quickselect. SIAM Journal on Computing, 31(3):683–705, 2001. doi:10.1137/S0097539700382108.
[17] Wolfgang Panny and Helmut Prodinger. Bottom-up mergesort – a detailed analysis. Algorithmica, 14(4):340–354, October 1995. doi:10.1007/BF01294131.
[18] Klaus Reinhardt. Sorting in-place with a worst case complexity of n log n − 1.3n + O(log n) comparisons and εn log n + O(1) transports. In International Symposium on Algorithms and Computation (ISAAC), pages 489–498, 1992. doi:10.1007/3-540-56279-6_101.
[19] Salvador Roura. Divide-and-Conquer Algorithms and Data Structures. PhD thesis, Universitat Politècnica de Catalunya, 1997.
[20] Salvador Roura. Improved master theorems for divide-and-conquer recurrences. Journal of the ACM, 48(2):170–205, 2001. doi:10.1145/375827.375837.
[21] Robert Sedgewick and Philippe Flajolet. An Introduction to the Analysis of Algorithms. Addison-Wesley-Longman, 2nd edition, 2013.
[22] Robert Sedgewick and Kevin Wayne. Algorithms. Addison-Wesley, 4th edition, 2011.
[23] Sebastian Wild. Dual-Pivot Quicksort and Beyond: Analysis of Multiway Partitioning and Its Practical Potential. PhD thesis, Technische Universität Kaiserslautern, 2016. ISBN 978-3-00-054669-3. URL: http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:hbz:386-kluedo-44682.