Sorting In-Place with a Worst Case Complexity of n log n − 1.3n + O(log n) Comparisons and εn log n + O(1) Transports∗

Klaus Reinhardt

Institut für Informatik, Universität Stuttgart
Breitwiesenstr. 22, D-7000 Stuttgart-80, Germany
e-mail: reinhard@informatik.uni-stuttgart.de

Abstract

First we present a new variant of Merge-sort, which needs only 1.25n space, because it reuses the space that becomes available within the current stage. It does not need more comparisons than classical Merge-sort.

The main result is an easy-to-implement method of iterating the procedure in-place, starting by sorting 4/5 of the elements. Hereby we can keep the additional transport costs linear, and only very few comparisons are lost, so that n log n − 0.8n comparisons are needed.

We show that we can improve the number of comparisons if we sort blocks of constant length with Merge-Insertion before starting the algorithm. Another improvement is to start the iteration with a better version, which needs only (1+ε)n space and again additional O(n) transports. As a result, we can improve this theoretically up to n log n − 1.3289n comparisons in the worst case. This is close to the theoretical lower bound of n log n − 1.443n.

The total number of transports in all these versions can be reduced to εn log n + O(1) for any ε > 0.

1 Introduction

In regard to well-known sorting algorithms, there appears to be a kind of trade-off between space and time complexity: methods for sorting like Merge-sort, Merge-Insertion or Insertion-sort, which have a small number of comparisons, either need O(n²) transports or work with a data structure that needs 2n places (see [Kn72], [Me84]). Although the price for storage is decreasing, it is a desirable property for an algorithm to be in-place, that means to use only the storage needed for the input (except for a constant amount).

In [HL88] Huang and Langston gave an upper bound of 3n log n for the number of comparisons of their in-place variant of Merge-sort. Heap-sort needs 2n log n comparisons, and the upper bound of 1.5n log n for the comparisons in Bottom-up-Heapsort [We90] is tight [Fl91]. Carlsson's variant of Heap-sort [Ca87] needs n log n + Θ(n log log n) comparisons. The first algorithm which is nearly in-place and has n log n + O(n) as the number of comparisons is McDiarmid and Reed's variant of Bottom-up-Heapsort. Wegener showed

∗ This research has been partially supported by the EBRA working group No. 3166 ASMICS.


in [We91] that it needs n log n + 1.1n comparisons, but the algorithm is not in-place, since n additional bits are used.

For in-place sorting algorithms, we can find a trade-off between the number of comparisons and the number of transports: in [MR91] Munro and Raman describe an algorithm which sorts in-place with only O(n) transports but needs O(n^{1+ε}) comparisons.

How the time complexity is composed of transports and essential¹ comparisons depends on the data structure of the elements. If an element is a block of fixed size and the key field is a small part of it, then a transport can be more expensive than a comparison. On the other hand, if an element is a small set of pointers to a (possibly external) database, then a transport is much cheaper than a comparison. But in any case an O(n log n)-time algorithm is asymptotically faster than the algorithm in [MR91].

In this paper we describe the first sorting algorithm which fulfills the following properties:

• It is general. (Any kind of elements of an ordered set can be sorted.)

• It is in-place. (The storage used except for the input is constant.)

• It has the worst-case time complexity O(n log n).

• It has n log n + O(n) as the number of (essential) comparisons in the worst case (the constant factor 1 for the term n log n).

The negative linear constant in the number of comparisons and the possibility of reducing the number of transports to εn log n + O(1) for every fixed ε > 0 are further advantages of our algorithm.

In our description some parameters of the algorithm are left open. One of them depends on the given ε. Other parameters influence the linear constant in the number of comparisons. Regarding these parameters, we can again find a trade-off between the linear component in the number of comparisons and a linear amount of transports. Although a linear amount of transports does not influence the asymptotic transport cost, it is surely important for a practical application. We will see that by choosing appropriate parameters we obtain a negative linear component in the number of comparisons as close as we want to 1.328966, but for a practical application we should be satisfied with at most 1. The main issue of the paper is theoretical, but we expect that a good choice of the parameters leads to an efficient algorithm for practical application.

2 A Variant of Merge-sort in 1.25n Places

The usual way of merging is to move the elements from one array of length n to another one. We use only one array, but we add a gap of length n/4 and move the elements from one side of the gap to the other. In each step pairs of lists are merged together by repeatedly moving the bigger head element to the tail of the new list. Hereby the space of former pairs of lists in the same stage can be used for the resulting lists. As long as the lists are short, there is enough space to do this (see Figure 1). The horizontal direction in a figure shows the indices of the array and the vertical direction expresses the size of the contained elements.

¹ Comparisons of pointers are not counted.


Figure 1: Merging two short lists on the left side to one list on the right side

In the last step it can happen that in the worst case (for the place) there are two lists of length n/2. At some time during the last step the tail of the new list will hit the head of the second list. In this case half of the first list (of length n/4) is already merged and the other half is not yet moved. This means that the gap begins at the point where the left end of the resulting list will be. We can then start to merge from the other side (see Figure 2). This run ends exactly when all elements of the first list are moved into the new list.

Figure 2: The last step, merging two lists of length n/2 using a gap of length n/4

Remark: In general it suffices to have a gap of half the size of the shorter list (if the number of elements is not a power of 2, it may happen that the other list has up to twice this length), or of 1/(2+r) of it, if we do not mind the additional transports for shifting the second list r times. The longer list has to be between the shorter list and the gap.

Remark: This can easily be performed in a stable way by regarding the element in the left list as smaller if two elements are equal.
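The merge step described above can be sketched in a few lines of Python. This is a simplified illustration, not the paper's exact memory layout: the function name and the `(start, length)` run descriptors are ours, and it moves the smaller head element first (building the result left to right), whereas the text moves the bigger head to the tail. The point is that both sorted runs and the result live in one array, with the result written into a gap of free slots:

```python
def merge_runs_into_gap(a, run1, run2, dest):
    """Merge two sorted runs of the array `a` into the free region
    (the gap) starting at index `dest`.  A run is a (start, length)
    pair; the gap must not overlap the unread parts of the runs."""
    (s1, n1), (s2, n2) = run1, run2
    i, j, d = s1, s2, dest
    while i < s1 + n1 and j < s2 + n2:
        if a[i] <= a[j]:              # ties prefer the left run: stable
            a[d] = a[i]; i += 1
        else:
            a[d] = a[j]; j += 1
        d += 1
    # copy the leftovers of whichever run is not yet exhausted
    while i < s1 + n1:
        a[d] = a[i]; i += 1; d += 1
    while j < s2 + n2:
        a[d] = a[j]; j += 1; d += 1
```

After such a step the slots formerly occupied by the two runs become free and can serve as the gap for a later pair of lists, which is what keeps the total space at 1.25n.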


3 In-Place Sorting by Iteration

Using the procedure described in Section 2 we can also sort 0.8n elements of the input in-place by treating the 0.2n elements like the gap. Whenever an element is moved by the procedure, it is swapped with an unsorted element from the gap, similar to the method in [HL88].

Figure 3: Reducing the number of unsorted elements to 1/3

In one iteration we sort 2/3 of the unsorted elements with the procedure and merge them into the sorted list, where we again use 1/3 of the unsorted part of the array, which still remains unsorted, as the gap (see Figure 3). This means that we merge 2/15 n elements to 4/5 n elements, 2/45 n to 14/15 n, 2/135 n to 44/45 n, and so on.
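These fractions can be verified with exact arithmetic; the following short Python sketch (variable names ours) checks that each sorted piece is 2/3 of the remaining unsorted part and that a fraction of 1/(5·3^k) of the input remains unsorted after k iterations:

```python
from fractions import Fraction

sorted_part = Fraction(4, 5)           # the first step sorts 4/5 of the input
for k in range(1, 6):
    piece = Fraction(2, 5 * 3**k)      # piece sorted in iteration k
    # each iteration sorts 2/3 of what is still unsorted ...
    assert piece == (1 - sorted_part) * Fraction(2, 3)
    sorted_part += piece
    # ... so 1/(5*3^k) of the input remains unsorted afterwards
    assert 1 - sorted_part == Fraction(1, 5 * 3**k)
```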

Remark: In one iteration step we must either shift the long list once, which costs

additional transports, or address the storage in a cyclic way in the entire algorithm.

Remark: This method mixes the unsorted elements completely, which makes it im-

possible to perform the algorithm in a stable way.

3.1 Asymmetric Merging

If we want to merge a long list of length l and a short list of length h together, we can compare the first element of the short list with the 2^k-th element of the long list; then we can either move 2^k elements of the long list or we need another k comparisons to insert and move the element of the short list. Hence, in order to merge the two lists, one needs in the worst case ⌊l/2^k⌋ + (k+1)h − 1 comparisons. This was first described in [HL72].
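This probing pattern can be sketched with an explicit comparison counter. The function below is our own illustration (it merges into a fresh output list rather than in-place as in the paper): one comparison per 2^k-jump over the long list, and at most k more to binary-insert a short-list element among the fewer than 2^k remaining candidates:

```python
def asymmetric_merge(long_lst, short_lst, k):
    """Merge a short sorted list into a long sorted list, probing the
    long list in jumps of 2**k.  Returns (merged_list, comparisons)."""
    step = 1 << k
    out, comps, i = [], 0, 0
    for x in short_lst:
        # one comparison per jump over a block of 2**k long elements
        while i + step <= len(long_lst):
            comps += 1
            if long_lst[i + step - 1] <= x:
                out.extend(long_lst[i:i + step])
                i += step
            else:
                break
        # at most k comparisons to binary-insert x into the remaining
        # fewer than 2**k candidates of the long list
        lo, hi = i, min(i + step - 1, len(long_lst))
        while lo < hi:
            comps += 1
            mid = (lo + hi) // 2
            if long_lst[mid] <= x:
                lo = mid + 1
            else:
                hi = mid
        out.extend(long_lst[i:lo])
        out.append(x)
        i = lo
    out.extend(long_lst[i:])
    return out, comps
```

In this sketch the counter stays within ⌊l/2^k⌋ + (k+1)h comparisons, matching the bound above up to the constant term.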


3.2 The Number of Additional Comparisons

Let us assume that we can sort n elements with n log n − dn + O(1) comparisons; then we get the following: sorting (4/5)n elements needs (4/5)n(log((4/5)n) − d) + O(1) comparisons. Sorting 2/(5·3^k) n elements for all k needs

\[
\sum_{k=1}^{\log n} \frac{2}{5\cdot 3^k}n\Bigl(\log\Bigl(\frac{2}{5\cdot 3^k}n\Bigr)-d\Bigr)+O(1)
= \frac{2}{5}n\Biggl(\sum_{k=1}^{\log n}\frac{1}{3^k}\Bigl(\log\frac{2}{5}n-d\Bigr)-\log 3\sum_{k=1}^{\log n}\frac{k}{3^k}\Biggr)+O(\log n)
\]
\[
= \frac{2}{5}n\Bigl(\frac{1}{2}\Bigl(\log\frac{2}{5}n-d\Bigr)-\log 3\cdot\frac{3}{4}\Bigr)+O(\log n)
= \frac{1}{5}n\Bigl(\log n-d+\log\frac{2}{5}-\log 3\cdot\frac{3}{2}\Bigr)+O(\log n)
\]

comparisons. Together this yields n(log n − d + (4/5)log(4/5) + (1/5)log(2/5) − (3/10)log 3) + O(log n) = n(log n − d − 0.9974) + O(log n) comparisons. Merging 2/15 n elements to 4/5 n elements, 2/(5·3^{2m}) n to (1 − 1/(5·3^{2m−1}))n with k = 3m+1 and 2/(5·3^{2m+1}) n to (1 − 1/(5·3^{2m}))n with k = 3m+3 needs

\[
\frac{1}{5}n+\frac{3\cdot 2}{15}n+n\sum_{m=1}^{\log n}\Biggl(\frac{1-\frac{1}{5\cdot 3^{2m-1}}}{2^{3m+1}}+(3m+2)\frac{2}{5\cdot 3^{2m}}+\frac{1-\frac{1}{5\cdot 3^{2m}}}{2^{3m+3}}+(3m+4)\frac{2}{5\cdot 3^{2m+1}}\Biggr)+O(\log n)
\]
\[
= n\Biggl(\frac{3}{5}+\frac{5}{8}\sum_{m=1}^{\log n}\frac{1}{8^m}-\frac{13}{40}\sum_{m=1}^{\log n}\frac{1}{72^m}+\frac{8}{5}\sum_{m=1}^{\log n}\frac{m}{9^m}+\frac{4}{3}\sum_{m=1}^{\log n}\frac{1}{9^m}\Biggr)+O(\log n)
\]
\[
= n\Bigl(\frac{3}{5}+\frac{5}{8}\cdot\frac{1}{7}-\frac{13}{40}\cdot\frac{1}{71}+\frac{8}{5}\cdot\frac{9}{64}+\frac{4}{3}\cdot\frac{1}{8}\Bigr)+O(\log n)
= 1.076n+O(\log n)
\]

comparisons. Thus we need only n log n − d_ip n + O(log n) = n log n − (d − 0.0785)n + O(log n) comparisons for the in-place algorithm. For the algorithm described so far d_ip = 0.8349; the following section will explain this and show how this can be improved.
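The two linear constants appearing in this calculation can be checked numerically; a small Python sketch (variable names ours) evaluating the closed forms:

```python
import math

# Linear term lost by sorting the pieces:
#   4/5*log(4/5) + 1/5*log(2/5) - 3/10*log(3)  ~  -0.9974
sort_term = 0.8 * math.log2(0.8) + 0.2 * math.log2(0.4) - 0.3 * math.log2(3)

# Linear term of the asymmetric merge steps, using the geometric sums
#   sum 1/8^m = 1/7, sum 1/72^m = 1/71, sum m/9^m = 9/64, sum 1/9^m = 1/8:
merge_term = 3/5 + (5/8)*(1/7) - (13/40)*(1/71) + (8/5)*(9/64) + (4/3)*(1/8)
# merge_term ~ 1.076
```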

4 Methods for Reducing the Number of Comparisons

The idea is, that we can sort blocks of chosen length bwith ccomparisons by Merge-

Insertion. If the time costs for one block are O(b2), then the time we need is O(bn) = O(n)

since bis ﬁxed.

4.1 Good Applications of Merge-Insertion

Merge-Insertion needs \(\sum_{k=1}^{b}\lceil\log\frac{3}{4}k\rceil\) comparisons to sort b elements [Kn72]. So it works well for b_i := (4^i − 1)/3, where it needs c_i := (2i·4^i + i)/3 − 4^i + 1 comparisons. We prove this in the following by induction: clearly b_1 = 1 element is sorted with c_1 = 0 comparisons. To sort b_{i+1} = 4b_i + 1 elements we need

\[
c_{i+1} = c_i + b_i\Bigl\lceil\log\frac{3}{2}b_i\Bigr\rceil + (2b_i+1)\Bigl\lceil\log\frac{3}{4}b_{i+1}\Bigr\rceil
= c_i + b_i\Bigl\lceil\log\frac{4^i-1}{2}\Bigr\rceil + (2b_i+1)\Bigl\lceil\log\frac{4^{i+1}-1}{4}\Bigr\rceil
\]
\[
= \frac{2i\cdot 4^i+i}{3}-4^i+1+\frac{4^i-1}{3}(2i-1)+\frac{2\cdot 4^i+1}{3}\,2i
= \frac{8i\cdot 4^i+i-4\cdot 4^i+4}{3}
= \frac{2(i+1)4^{i+1}+i+1}{3}-4^{i+1}+1.
\]

Table 1 shows some instances for b_i and c_i.
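The closed forms can be cross-checked against the comparison count of Merge-Insertion; a small Python sketch (function names ours):

```python
import math

def merge_insertion_cost(b):
    """Worst-case comparisons of Merge-Insertion for b elements:
    sum_{k=1}^{b} ceil(log2(3k/4))  (see [Kn72])."""
    return sum(math.ceil(math.log2(3 * k / 4)) for k in range(1, b + 1))

def block_params(i):
    """Closed forms b_i = (4^i - 1)/3 and c_i = (2i*4^i + i)/3 - 4^i + 1
    from the induction above."""
    b = (4**i - 1) // 3
    c = (2 * i * 4**i + i) // 3 - 4**i + 1
    return b, c

# the closed forms agree with the direct sum, e.g. b_3 = 21 needs c_3 = 66
for i in range(1, 6):
    b, c = block_params(i)
    assert merge_insertion_cost(b) == c
```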

  i      b_i      c_i      d_best,i    d_i         d_ip,i
  1        1        0      1           0.9139      0.8349
  2        5        7      1.1219      1.0358      0.9768
  3       21       66      1.2971      1.2110      1.1320
  4       85      429      1.3741      1.2880      1.2090
  5      341     2392      1.4019      1.3158      1.2368
  6     1365    12290      1.4118      1.3257      1.2467
  ...    ...      ...      ...         ...         ...
  ∞        ∞        ∞      1.415037    1.328966    1.250008

Table 1: The negative linear constant depending on the block size.

4.2 The Complete Number of Comparisons

In order to merge two lists of lengths l and h one needs l + h − 1 comparisons. In the best case, where n = b2^k, we would need 2^k c comparisons to sort the blocks first and then the comparisons to merge 2^{k−i} pairs of lists of length 2^{i−1}b in the i-th of the k steps. These are

\[
2^k c+\sum_{i=1}^{k}2^{k-i}(2^{i-1}b+2^{i-1}b-1) = 2^k c+kn-2^k+1
= n\log n-n\underbrace{\Bigl(\log b+\frac{1-c}{b}\Bigr)}_{d_{best}:=}+1
= n\log n-nd_{best}+1
\]

comparisons. In the general case the negative linear factor d is not as good as d_best. Let the number of elements be n = (2^k + m)b − j for j < b and m < 2^k; then we need (2^k + m)c comparisons to sort the blocks, m(2b − 1) − j comparisons to merge m pairs of blocks together and \(\sum_{i=1}^{k}(n-2^{k-i}) = kn-2^k+1\) comparisons to merge everything together in k steps. The total number of comparisons is

\[
c_b(n) := (2^k+m)c+m(2b-1)-j+kn-2^k+1
= n\log n-n\log\frac{n}{2^k}+(2^k+m)(c+2b-1)-2b\,2^k-j+1
\]
\[
= n\log n-n\Bigl(\log\frac{n}{2^k}+\frac{2b\,2^k}{n}-\frac{c+2b-1}{b}\Bigr)+\frac{c+2b-1}{b}j-j+1
\]
\[
\le n\log n-n\underbrace{\Bigl(\log(2b\ln 2)+\frac{1}{\ln 2}-\frac{c+2b-1}{b}\Bigr)}_{d:=}+\frac{c+b-1}{b}j+1
= n\log n-nd+\frac{c+b-1}{b}j+1.
\]

The inequality follows from the fact that the expression \(\log x+\frac{2b}{x}\) has its minimum at x = 2b ln 2, since \((\log x+\frac{2b}{x})' = \frac{1}{x\ln 2}-\frac{2b}{x^2} = 0\). We have used x := n/2^k. Hence d − d_best = log(2 ln 2) − 2 + 1/ln 2 = −0.086071, i.e. we lose at most 0.086071n comparisons in contrast to the case of an ideal value of n.
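The constants d_best and d defined above can be evaluated directly; the following Python sketch (function names ours) reproduces the corresponding columns of Table 1:

```python
import math

def d_best(b, c):
    # ideal case n = b*2^k:  d_best = log2(b) + (1 - c)/b
    return math.log2(b) + (1 - c) / b

def d_general(b, c):
    # arbitrary n:  d = log2(2b ln 2) + 1/ln 2 - (c + 2b - 1)/b
    ln2 = math.log(2)
    return math.log2(2 * b * ln2) + 1 / ln2 - (c + 2 * b - 1) / b

# (b_i, c_i, d_best_i, d_i) rows taken from Table 1
for b, c, dbest, d in [(1, 0, 1.0, 0.9139), (5, 7, 1.1219, 1.0358),
                       (21, 66, 1.2971, 1.2110)]:
    assert abs(d_best(b, c) - dbest) < 1e-3
    assert abs(d_general(b, c) - d) < 1e-3
```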

Table 1 shows the influence of the block size b_i and the number of comparisons c_i needed to sort it by Merge-Insertion on the negative linear constant d_i for the comparisons of the algorithm in Section 2 and d_ip,i for the algorithm in Section 3. It shows that d_ip can be improved as close as we want to 1.250008. As one can see by looking at Table 1, most of the possible improvement is already reached with relatively small blocks.

Another idea is that we can reduce the additional comparisons in Section 3 to a very small amount if we start with the algorithm in Section 4.3. This allows us to improve the linear constant for the in-place algorithm as close as we want to d_i, and in combination with the block size we can improve it as close as we want to 1.328966.

4.3 A Variant of Merge-sort in (1+ε)n Places

We can change the algorithm of Section 2 in the following way: for a given ε > 0 we have to choose appropriate r's for the last ⌈log 1/ε⌉ − 2 steps according to the first remark in that section. Because the number of the last steps and the r's are constants determined by ε, the additional transports are O(n) (of course the constant becomes large for small ε's).


5 The Reduction of the Number of Transports

The algorithms in Section 2 and Section 4.3 perform n log n + O(n) transports. We can improve this to εn log n + O(1) for all constants ε > 0 if we combine ⌈1/ε + 1⌉ steps into one by merging 2^{⌈1/ε+1⌉} (constantly many) lists in each step, as long as the lists are short enough. Hereby we keep the number of comparisons exactly the same, using for example the following technique: we use a binary tree of that (constant) size, which contains on every leaf node the pointer to the head of a list ('nil' if it is empty, and additionally a pointer to the tail) and on every other node that pointer of a son node which points to the bigger element. After moving one element we need ⌈1/ε + 1⌉ comparisons to repair the tree, as shown in this example:

(Diagram: a binary tree over four lists whose head elements are 7, 3, 8 and 9; after elements 9 and 8 have been moved to the output, the pointers on the paths from the affected leaves to the root are repaired.)

Note that if we want to reduce the number of transports to o(n log n), then the size of the tree must increase depending on n, but then the algorithm would not be in-place.

Remark: This method can also be used to reduce the total number of I/Os (for transports and comparisons) to a slow storage to εn log n + O(1), if we can keep the 2^{⌈1/ε+1⌉} elements in a faster storage.
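Such a selection tree can be sketched as follows. This Python version (names ours) stores at every inner node the index of the child list with the *smaller* head, using a sentinel of +∞ for exhausted lists, whereas the text's variant tracks the bigger element; either way, emitting one element costs only one comparison per tree level:

```python
INF = float('inf')

def kway_merge(lists):
    """Merge sorted lists with a binary tree of winners: every inner
    node stores the index of the child list whose head is smaller, so
    emitting one element costs only log2(#lists) comparisons to repair
    the path from the changed leaf to the root."""
    m = 1
    while m < len(lists):
        m *= 2                        # round up to a power of two
    lists = list(lists) + [[] for _ in range(m - len(lists))]
    pos = [0] * m                     # read position in each list

    def head(i):                      # +inf once list i is exhausted
        return lists[i][pos[i]] if pos[i] < len(lists[i]) else INF

    tree = [0] * (2 * m)              # tree[1] = root, tree[m+i] = leaf i
    for i in range(m):
        tree[m + i] = i
    for v in range(m - 1, 0, -1):
        a, b = tree[2 * v], tree[2 * v + 1]
        tree[v] = a if head(a) <= head(b) else b

    out = []
    for _ in range(sum(len(l) for l in lists)):
        w = tree[1]                   # overall winner
        out.append(head(w))
        pos[w] += 1
        v = (m + w) // 2              # repair the path leaf -> root
        while v >= 1:
            a, b = tree[2 * v], tree[2 * v + 1]
            tree[v] = a if head(a) <= head(b) else b
            v //= 2
    return out
```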

5.1 Transport Costs in the In-Place Procedure

Choosing ε half as large eliminates the factor 2 for swapping instead of moving. But there are still the additional transports in the iteration during the asymmetric merging in Section 3:

Since ≈ log₃ n iterations have to be performed, we would get O(n log n) additional transports if we performed only iterations of the described kind, which need O(n) transports each time. But we can reduce these additional transports to O(n) if we perform a second kind of iteration after every i iterations for any chosen i. Each time it halves the number of elements which have to be moved in later iterations. This second kind of iteration works as follows (see Figure 4):

One Quick-sort partition step is performed on the unsorted elements, where the middle element of the sorted list is chosen as reference. Then one of the lists is sorted in the usual way (the other unsorted list is used as a gap) and merged with the corresponding half of the sorted list. These elements will never have to be moved again.

Remark: It is easy to see that we can reduce the number of additional comparisons to εn if we choose i big enough. Although a formal estimation is quite hard, we conjecture that only a few more comparisons are needed if we mainly apply iterations of the second kind, even in the worst case, where all unsorted elements are on one side. Clearly this iteration is good on average.

Figure 4: Moving half of the elements to their final position.

6 Possible Further Improvements

The algorithm in Section 2 works optimally in the case that n = b2^m and has its worst behavior (losing 0.086n) in the case that n = 2 ln 2 · b2^m = 1.386 · b2^m. But we could avoid this for the start of the iteration if we were not fixed to sorting 0.8 of the elements in the first step but exactly b2^m elements with 2/5 < b2^m/n ≤ 4/5. A simpler method would be to always sort the biggest possible b2^m in each iteration step. In both cases the k's for the asymmetric merging in the further iterations have to be found dynamically, which makes a proper formal estimation difficult.

For practical applications it is worth noting that the average number of comparisons for asymmetric merging is better than in the worst case, because for each element of the smaller list, which is inserted among 2^k − 1 elements, on average some elements of the longer list can also be moved to the new list, so that the average number of comparisons is less than ⌊l/2^k⌋ + (k+1)h − 1. So we conjecture that an optimized in-place algorithm can have n log n − 1.4n + O(log n) comparisons on average and n log n − 1.3n + O(log n) comparisons in the worst case, avoiding large constants for the linear amount of transports.

If there exists a constant c > 0 and an in-place algorithm which sorts cn of the input with O(n log n) comparisons and O(n) transports, then the iteration method could be used to solve the open problem in [MR91].


6.0.1 Open Problems:

• Does there exist an in-place sorting algorithm with O(n log n) comparisons and O(n) transports [MR91]?

• Does there exist an in-place sorting algorithm which needs only n log n + O(n) comparisons and only o(n log n) transports?

• Does there exist a stable in-place sorting algorithm which needs only n log n + O(n) comparisons and only O(n log n) transports?

References

[Ca87] S. Carlsson: A variant of HEAPSORT with almost optimal number of comparisons. Information Processing Letters 24:247-250, 1987.

[Fl91] R. Fleischer: A tight lower bound for the worst case of bottom-up-heapsort. Technical Report MPI-I-91-104, Max-Planck-Institut für Informatik, D-W-6600 Saarbrücken, Germany, April 1991.

[HL88] B.-C. Huang and M.A. Langston: Practical in-place merging. CACM, 31:348-352, 1988.

[HL72] F.K. Hwang and S. Lin: A simple algorithm for merging two disjoint linearly ordered sets. SIAM J. Computing 1:31-39, 1972.

[Kn72] D.E. Knuth: The Art of Computer Programming, Volume 3 / Sorting and Searching. Addison-Wesley, 1972.

[Me84] K. Mehlhorn: Data Structures and Algorithms, Vol. 1: Sorting and Searching. Springer-Verlag, Berlin/Heidelberg, 1984.

[MR91] J.I. Munro and V. Raman: Fast stable in-place sorting with O(N) data moves. Proceedings of the FST&TCS, LNCS 560:266-277, 1991.

[We90] I. Wegener: BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating on average QUICKSORT (if n is not very small). Proceedings of the MFCS90, LNCS 452:516-522, 1990.

[We91] I. Wegener: The worst case complexity of McDiarmid and Reed's variant of BOTTOM-UP-HEAPSORT is less than n log n + 1.1n. Proceedings of the STACS91, LNCS 480:137-147, 1991.
