Sorting In-Place with a Worst Case Complexity of n log n − 1.3n + O(log n) Comparisons and εn log n + O(1) Transports∗

Klaus Reinhardt

Institut für Informatik, Universität Stuttgart
Breitwiesenstr. 22, D-7000 Stuttgart-80, Germany
e-mail: reinhard@informatik.uni-stuttgart.de

Abstract

First we present a new variant of Merge-sort, which needs only 1.25n space, because it reuses the space that becomes available within the current stage. It does not need more comparisons than classical Merge-sort.

The main result is an easy-to-implement method of iterating the procedure in-place, starting by sorting 4/5 of the elements. Hereby we can keep the additional transport costs linear, and only very few comparisons are lost, so that n log n − 0.8n comparisons are needed.

We show that we can improve the number of comparisons if we sort blocks of constant length with Merge-Insertion before starting the algorithm. Another improvement is to start the iteration with a better version, which needs only (1+ε)n space and again additional O(n) transports. As a result, we can improve this theoretically up to n log n − 1.3289n comparisons in the worst case. This is close to the theoretical lower bound of n log n − 1.443n.

The total number of transports in all these versions can be reduced to εn log n + O(1) for any ε > 0.

1 Introduction

In regard to well-known sorting algorithms, there appears to be a kind of trade-off between space and time complexity: methods for sorting like Merge-sort, Merge-Insertion or Insertion-sort, which have a small number of comparisons, either need O(n²) transports or work with a data structure that needs 2n places (see [Kn72], [Me84]). Although the price for storage is decreasing, it is a desirable property for an algorithm to be in-place, that means to use only the storage needed for the input (except for a constant amount).

In [HL88] Huang and Langston gave an upper bound of 3n log n for the number of comparisons of their in-place variant of Merge-sort. Heap-sort needs 2n log n comparisons, and the upper bound of 1.5n log n for the comparisons in Bottom-up-Heapsort [We90] is tight [Fl91]. Carlsson's variant of Heap-sort [Ca87] needs n log n + Θ(n log log n) comparisons. The first algorithm which is nearly in-place and has n log n + O(n) as the number of comparisons is McDiarmid and Reed's variant of Bottom-up-Heapsort. Wegener showed

∗ This research has been partially supported by the EBRA working group No. 3166 ASMICS.


in [We91] that it needs n log n + 1.1n comparisons, but the algorithm is not in-place, since n additional bits are used.

For in-place sorting algorithms, we can find a trade-off between the number of comparisons and the number of transports: in [MR91] Munro and Raman describe an algorithm which sorts in-place with only O(n) transports but needs O(n^{1+ε}) comparisons.

How the time complexity is composed of transports and essential¹ comparisons depends on the data structure of the elements. If an element is a block of fixed size and the key field is a small part of it, then a transport can be more expensive than a comparison. On the other hand, if an element is a small set of pointers to a (possibly external) database, then a transport is much cheaper than a comparison. But in any case an O(n log n)-time algorithm is asymptotically faster than the algorithm in [MR91].

In this paper we describe the first sorting algorithm which fulfills the following properties:

• It is general. (Any kind of elements of an ordered set can be sorted.)

• It is in-place. (The storage used except for the input is constant.)

• It has the worst-case time complexity O(n log n).

• It has n log n + O(n) as the number of (essential) comparisons in the worst case (the constant factor 1 for the term n log n).

The negative linear constant in the number of comparisons and the possibility of reducing the number of transports to εn log n + O(1) for every fixed ε > 0 are further advantages of our algorithm.

In our description some parameters of the algorithm are left open. One of them depends on the given ε. Other parameters influence the linear constant in the number of comparisons. Regarding these parameters, we can again find a trade-off between the linear component in the number of comparisons and a linear amount of transports. Although a linear amount of transports does not influence the asymptotic transport cost, it is surely important for a practical application. We will see that by choosing appropriate parameters we obtain a negative linear component in the number of comparisons as close as we want to 1.328966, but for a practical application we should be satisfied with at most 1. The main issue of the paper is theoretical, but we expect that a good choice of the parameters leads to an efficient algorithm for practical application.

2 A Variant of Merge-sort in 1.25n Places

The usual way of merging is to move the elements from one array of length n to another one. We use only one array, but we add a gap of length n/4 and move the elements from one side of the gap to the other. In each step pairs of lists are merged together by repeatedly moving the bigger head element to the tail of the new list. Hereby the space of former pairs of lists in the same stage can be used for the resulting lists. As long as the lists are short, there is enough space to do this (see Figure 1). The horizontal direction in a figure shows the indices of the array and the vertical direction expresses the size of the contained elements.

¹ Comparisons of pointers are not counted.


Figure 1: Merging two short lists on the left side to one list on the right side

In the last step it can happen that in the worst case (for the place) there are two lists of length n/2. At some time during the last step the tail of the new list will hit the head of the second list. In this case half of the first list (of length n/4) is already merged and the other half is not yet moved. This means that the gap begins at the point where the left end of the resulting list will be. We can then start to merge from the other side (see Figure 2). This run ends exactly when all elements of the first list are moved into the new list.

Figure 2: The last step, merging two lists of length n/2 using a gap of length n/4

Remark: In general it suffices to have a gap of half the size of the shorter list (if the number of elements is not a power of 2, it may happen that the other list has up to twice this length), or of 1/(2+r) of it, if we do not mind the additional transports for shifting the second list r times. The longer list has to be between the shorter list and the gap.

Remark: This can easily be performed in a stable way by regarding the element in the left list as smaller if two elements are equal.
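The merge step described above can be sketched in a few lines of Python. This is a simplified illustration, not the paper's exact memory layout: the function name and the `(start, length)` run descriptors are ours, and it moves the smaller head element first (building the result left to right), whereas the text moves the bigger head to the tail. The point is that both sorted runs and the result live in one array, with the result written into a gap of free slots:

```python
def merge_runs_into_gap(a, run1, run2, dest):
    """Merge two sorted runs of the array `a` into the free region
    (the gap) starting at index `dest`.  A run is a (start, length)
    pair; the gap must not overlap the unread parts of the runs."""
    (s1, n1), (s2, n2) = run1, run2
    i, j, d = s1, s2, dest
    while i < s1 + n1 and j < s2 + n2:
        if a[i] <= a[j]:              # ties prefer the left run: stable
            a[d] = a[i]; i += 1
        else:
            a[d] = a[j]; j += 1
        d += 1
    # copy the leftovers of whichever run is not yet exhausted
    while i < s1 + n1:
        a[d] = a[i]; i += 1; d += 1
    while j < s2 + n2:
        a[d] = a[j]; j += 1; d += 1
```

After such a step the slots formerly occupied by the two runs become free and can serve as the gap for a later pair of lists, which is what keeps the total space at 1.25n.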


3 In-Place Sorting by Iteration

Using the procedure described in Section 2 we can also sort 0.8n elements of the input in-place by treating the 0.2n elements like the gap. Whenever an element is moved by the procedure, it is swapped with an unsorted element from the gap, similar to the method in [HL88].

Figure 3: Reducing the number of unsorted elements to 1/3

In one iteration we sort 2/3 of the unsorted elements with the procedure and merge them into the sorted list, where we again use 1/3 of the unsorted part of the array, which still remains unsorted, as the gap (see Figure 3). This means that we merge 2/15 n elements to 4/5 n elements, 2/45 n to 14/15 n, 2/135 n to 44/45 n, and so on.
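These fractions can be verified with exact arithmetic; the following short Python sketch (variable names ours) checks that each sorted piece is 2/3 of the remaining unsorted part and that a fraction of 1/(5·3^k) of the input remains unsorted after k iterations:

```python
from fractions import Fraction

sorted_part = Fraction(4, 5)           # the first step sorts 4/5 of the input
for k in range(1, 6):
    piece = Fraction(2, 5 * 3**k)      # piece sorted in iteration k
    # each iteration sorts 2/3 of what is still unsorted ...
    assert piece == (1 - sorted_part) * Fraction(2, 3)
    sorted_part += piece
    # ... so 1/(5*3^k) of the input remains unsorted afterwards
    assert 1 - sorted_part == Fraction(1, 5 * 3**k)
```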

Remark: In one iteration step we must either shift the long list once, which costs

additional transports, or address the storage in a cyclic way in the entire algorithm.

Remark: This method mixes the unsorted elements completely, which makes it im-

possible to perform the algorithm in a stable way.

3.1 Asymmetric Merging

If we want to merge a long list of length l and a short list of length h together, we can compare the first element of the short list with the 2^k-th element of the long list; then we can either move 2^k elements of the long list or we need another k comparisons to insert and move the element of the short list. Hence, in order to merge the two lists, one needs in the worst case ⌊l/2^k⌋ + (k+1)h − 1 comparisons. This was first described in [HL72].
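This probing pattern can be sketched with an explicit comparison counter. The function below is our own illustration (it merges into a fresh output list rather than in-place as in the paper): one comparison per 2^k-jump over the long list, and at most k more to binary-insert a short-list element among the fewer than 2^k remaining candidates:

```python
def asymmetric_merge(long_lst, short_lst, k):
    """Merge a short sorted list into a long sorted list, probing the
    long list in jumps of 2**k.  Returns (merged_list, comparisons)."""
    step = 1 << k
    out, comps, i = [], 0, 0
    for x in short_lst:
        # one comparison per jump over a block of 2**k long elements
        while i + step <= len(long_lst):
            comps += 1
            if long_lst[i + step - 1] <= x:
                out.extend(long_lst[i:i + step])
                i += step
            else:
                break
        # at most k comparisons to binary-insert x into the remaining
        # fewer than 2**k candidates of the long list
        lo, hi = i, min(i + step - 1, len(long_lst))
        while lo < hi:
            comps += 1
            mid = (lo + hi) // 2
            if long_lst[mid] <= x:
                lo = mid + 1
            else:
                hi = mid
        out.extend(long_lst[i:lo])
        out.append(x)
        i = lo
    out.extend(long_lst[i:])
    return out, comps
```

In this sketch the counter stays within ⌊l/2^k⌋ + (k+1)h comparisons, matching the bound above up to the constant term.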


3.2 The Number of Additional Comparisons

Let us assume that we can sort n elements with n log n − dn + O(1) comparisons; then we get the following: sorting (4/5)n elements needs (4/5)n(log((4/5)n) − d) + O(1) comparisons. Sorting 2/(5·3^k) n elements for all k needs

\[
\sum_{k=1}^{\log n} \frac{2}{5\cdot 3^k}n\Bigl(\log\Bigl(\frac{2}{5\cdot 3^k}n\Bigr)-d\Bigr)+O(1)
= \frac{2}{5}n\Biggl(\sum_{k=1}^{\log n}\frac{1}{3^k}\Bigl(\log\frac{2}{5}n-d\Bigr)-\log 3\sum_{k=1}^{\log n}\frac{k}{3^k}\Biggr)+O(\log n)
\]
\[
= \frac{2}{5}n\Bigl(\frac{1}{2}\Bigl(\log\frac{2}{5}n-d\Bigr)-\log 3\cdot\frac{3}{4}\Bigr)+O(\log n)
= \frac{1}{5}n\Bigl(\log n-d+\log\frac{2}{5}-\log 3\cdot\frac{3}{2}\Bigr)+O(\log n)
\]

comparisons. Together this yields n(log n − d + (4/5)log(4/5) + (1/5)log(2/5) − (3/10)log 3) + O(log n) = n(log n − d − 0.9974) + O(log n) comparisons. Merging 2/15 n elements to 4/5 n elements, 2/(5·3^{2m}) n to (1 − 1/(5·3^{2m−1}))n with k = 3m+1 and 2/(5·3^{2m+1}) n to (1 − 1/(5·3^{2m}))n with k = 3m+3 needs

\[
\frac{1}{5}n+\frac{3\cdot 2}{15}n+n\sum_{m=1}^{\log n}\Biggl(\frac{1-\frac{1}{5\cdot 3^{2m-1}}}{2^{3m+1}}+(3m+2)\frac{2}{5\cdot 3^{2m}}+\frac{1-\frac{1}{5\cdot 3^{2m}}}{2^{3m+3}}+(3m+4)\frac{2}{5\cdot 3^{2m+1}}\Biggr)+O(\log n)
\]
\[
= n\Biggl(\frac{3}{5}+\frac{5}{8}\sum_{m=1}^{\log n}\frac{1}{8^m}-\frac{13}{40}\sum_{m=1}^{\log n}\frac{1}{72^m}+\frac{8}{5}\sum_{m=1}^{\log n}\frac{m}{9^m}+\frac{4}{3}\sum_{m=1}^{\log n}\frac{1}{9^m}\Biggr)+O(\log n)
\]
\[
= n\Bigl(\frac{3}{5}+\frac{5}{8}\cdot\frac{1}{7}-\frac{13}{40}\cdot\frac{1}{71}+\frac{8}{5}\cdot\frac{9}{64}+\frac{4}{3}\cdot\frac{1}{8}\Bigr)+O(\log n)
= 1.076n+O(\log n)
\]

comparisons. Thus we need only n log n − d_ip n + O(log n) = n log n − (d − 0.0785)n + O(log n) comparisons for the in-place algorithm. For the algorithm described so far d_ip = 0.8349; the following section will explain this and show how this can be improved.
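The two linear constants appearing in this calculation can be checked numerically; a small Python sketch (variable names ours) evaluating the closed forms:

```python
import math

# Linear term lost by sorting the pieces:
#   4/5*log(4/5) + 1/5*log(2/5) - 3/10*log(3)  ~  -0.9974
sort_term = 0.8 * math.log2(0.8) + 0.2 * math.log2(0.4) - 0.3 * math.log2(3)

# Linear term of the asymmetric merge steps, using the geometric sums
#   sum 1/8^m = 1/7, sum 1/72^m = 1/71, sum m/9^m = 9/64, sum 1/9^m = 1/8:
merge_term = 3/5 + (5/8)*(1/7) - (13/40)*(1/71) + (8/5)*(9/64) + (4/3)*(1/8)
# merge_term ~ 1.076
```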

4 Methods for Reducing the Number of Comparisons

The idea is, that we can sort blocks of chosen length bwith ccomparisons by Merge-

Insertion. If the time costs for one block are O(b2), then the time we need is O(bn) = O(n)

since bis ﬁxed.

4.1 Good Applications of Merge-Insertion

Merge-Insertion needs \(\sum_{k=1}^{b}\lceil\log\frac{3}{4}k\rceil\) comparisons to sort b elements [Kn72]. So it works well for b_i := (4^i − 1)/3, where it needs c_i := (2i·4^i + i)/3 − 4^i + 1 comparisons. We prove this in the following by induction: clearly b_1 = 1 element is sorted with c_1 = 0 comparisons. To sort b_{i+1} = 4b_i + 1 elements we need

\[
c_{i+1} = c_i + b_i\Bigl\lceil\log\frac{3}{2}b_i\Bigr\rceil + (2b_i+1)\Bigl\lceil\log\frac{3}{4}b_{i+1}\Bigr\rceil
= c_i + b_i\Bigl\lceil\log\frac{4^i-1}{2}\Bigr\rceil + (2b_i+1)\Bigl\lceil\log\frac{4^{i+1}-1}{4}\Bigr\rceil
\]
\[
= \frac{2i\cdot 4^i+i}{3}-4^i+1+\frac{4^i-1}{3}(2i-1)+\frac{2\cdot 4^i+1}{3}\,2i
= \frac{8i\cdot 4^i+i-4\cdot 4^i+4}{3}
= \frac{2(i+1)4^{i+1}+i+1}{3}-4^{i+1}+1.
\]

Table 1 shows some instances for b_i and c_i.
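The closed forms can be cross-checked against the comparison count of Merge-Insertion; a small Python sketch (function names ours):

```python
import math

def merge_insertion_cost(b):
    """Worst-case comparisons of Merge-Insertion for b elements:
    sum_{k=1}^{b} ceil(log2(3k/4))  (see [Kn72])."""
    return sum(math.ceil(math.log2(3 * k / 4)) for k in range(1, b + 1))

def block_params(i):
    """Closed forms b_i = (4^i - 1)/3 and c_i = (2i*4^i + i)/3 - 4^i + 1
    from the induction above."""
    b = (4**i - 1) // 3
    c = (2 * i * 4**i + i) // 3 - 4**i + 1
    return b, c

# the closed forms agree with the direct sum, e.g. b_3 = 21 needs c_3 = 66
for i in range(1, 6):
    b, c = block_params(i)
    assert merge_insertion_cost(b) == c
```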

  i      b_i      c_i      d_best,i    d_i         d_ip,i
  1        1        0      1           0.9139      0.8349
  2        5        7      1.1219      1.0358      0.9768
  3       21       66      1.2971      1.2110      1.1320
  4       85      429      1.3741      1.2880      1.2090
  5      341     2392      1.4019      1.3158      1.2368
  6     1365    12290      1.4118      1.3257      1.2467
  ...    ...      ...      ...         ...         ...
  ∞        ∞        ∞      1.415037    1.328966    1.250008

Table 1: The negative linear constant depending on the block size.

4.2 The Complete Number of Comparisons

In order to merge two lists of lengths l and h one needs l + h − 1 comparisons. In the best case, where n = b2^k, we would need 2^k c comparisons to sort the blocks first and then the comparisons to merge 2^{k−i} pairs of lists of length 2^{i−1}b in the i-th of the k steps. These are

\[
2^k c+\sum_{i=1}^{k}2^{k-i}(2^{i-1}b+2^{i-1}b-1) = 2^k c+kn-2^k+1
= n\log n-n\underbrace{\Bigl(\log b+\frac{1-c}{b}\Bigr)}_{d_{best}:=}+1
= n\log n-nd_{best}+1
\]

comparisons. In the general case the negative linear factor d is not as good as d_best. Let the number of elements be n = (2^k + m)b − j for j < b and m < 2^k; then we need (2^k + m)c comparisons to sort the blocks, m(2b − 1) − j comparisons to merge m pairs of blocks together and \(\sum_{i=1}^{k}(n-2^{k-i}) = kn-2^k+1\) comparisons to merge everything together in k steps. The total number of comparisons is

\[
c_b(n) := (2^k+m)c+m(2b-1)-j+kn-2^k+1
= n\log n-n\log\frac{n}{2^k}+(2^k+m)(c+2b-1)-2b\,2^k-j+1
\]
\[
= n\log n-n\Bigl(\log\frac{n}{2^k}+\frac{2b\,2^k}{n}-\frac{c+2b-1}{b}\Bigr)+\frac{c+2b-1}{b}j-j+1
\]
\[
\le n\log n-n\underbrace{\Bigl(\log(2b\ln 2)+\frac{1}{\ln 2}-\frac{c+2b-1}{b}\Bigr)}_{d:=}+\frac{c+b-1}{b}j+1
= n\log n-nd+\frac{c+b-1}{b}j+1.
\]

The inequality follows from the fact that the expression \(\log x+\frac{2b}{x}\) has its minimum at x = 2b ln 2, since \((\log x+\frac{2b}{x})' = \frac{1}{x\ln 2}-\frac{2b}{x^2} = 0\). We have used x := n/2^k. Hence d − d_best = log(2 ln 2) − 2 + 1/ln 2 = −0.086071, i.e. we lose at most 0.086071n comparisons in contrast to the case of an ideal value of n.
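The constants d_best and d defined above can be evaluated directly; the following Python sketch (function names ours) reproduces the corresponding columns of Table 1:

```python
import math

def d_best(b, c):
    # ideal case n = b*2^k:  d_best = log2(b) + (1 - c)/b
    return math.log2(b) + (1 - c) / b

def d_general(b, c):
    # arbitrary n:  d = log2(2b ln 2) + 1/ln 2 - (c + 2b - 1)/b
    ln2 = math.log(2)
    return math.log2(2 * b * ln2) + 1 / ln2 - (c + 2 * b - 1) / b

# (b_i, c_i, d_best_i, d_i) rows taken from Table 1
for b, c, dbest, d in [(1, 0, 1.0, 0.9139), (5, 7, 1.1219, 1.0358),
                       (21, 66, 1.2971, 1.2110)]:
    assert abs(d_best(b, c) - dbest) < 1e-3
    assert abs(d_general(b, c) - d) < 1e-3
```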

Table 1 shows the influence of the block size b_i and the number of comparisons c_i needed to sort it by Merge-Insertion on the negative linear constant d_i for the comparisons of the algorithm in Section 2 and d_ip,i for the algorithm in Section 3. It shows that d_ip can be improved as close as we want to 1.250008. As one can see by looking at Table 1, most of the possible improvement is already reached with relatively small blocks.

Another idea is that we can reduce the additional comparisons in Section 3 to a very small amount if we start with the algorithm in Section 4.3. This allows us to improve the linear constant for the in-place algorithm as close as we want to d_i, and in combination with the block size we can improve it as close as we want to 1.328966.

4.3 A Variant of Merge-sort in (1+ε)n Places

We can change the algorithm of Section 2 in the following way: for a given ε > 0 we have to choose appropriate r's for the last ⌈log 1/ε⌉ − 2 steps according to the first remark in that section. Because the number of the last steps and the r's are constants determined by ε, the additional transports are O(n) (of course the constant becomes large for small ε's).


5 The Reduction of the Number of Transports

The algorithms in Section 2 and Section 4.3 perform n log n + O(n) transports. We can improve this to εn log n + O(1) for all constants ε > 0 if we combine ⌈1/ε + 1⌉ steps into one by merging 2^{⌈1/ε+1⌉} (constantly many) lists in each step, as long as the lists are short enough. Hereby we keep the number of comparisons exactly the same, using for example the following technique: we use a binary tree of that (constant) size, which contains on every leaf node the pointer to the head of a list ('nil' if it is empty, and additionally a pointer to the tail) and on every other node that pointer of a son node which points to the bigger element. After moving one element we need ⌈1/ε + 1⌉ comparisons to repair the tree, as shown in this example:

(Diagram: a binary tree over four lists whose head elements are 7, 3, 8 and 9; after elements 9 and 8 have been moved to the output, the pointers on the paths from the affected leaves to the root are repaired.)

Note that if we want to reduce the number of transports to o(n log n), then the size of the tree must increase depending on n, but then the algorithm would not be in-place.

Remark: This method can also be used to reduce the total number of I/Os (for transports and comparisons) to a slow storage to εn log n + O(1), if we can keep the 2^{⌈1/ε+1⌉} elements in a faster storage.
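Such a selection tree can be sketched as follows. This Python version (names ours) stores at every inner node the index of the child list with the *smaller* head, using a sentinel of +∞ for exhausted lists, whereas the text's variant tracks the bigger element; either way, emitting one element costs only one comparison per tree level:

```python
INF = float('inf')

def kway_merge(lists):
    """Merge sorted lists with a binary tree of winners: every inner
    node stores the index of the child list whose head is smaller, so
    emitting one element costs only log2(#lists) comparisons to repair
    the path from the changed leaf to the root."""
    m = 1
    while m < len(lists):
        m *= 2                        # round up to a power of two
    lists = list(lists) + [[] for _ in range(m - len(lists))]
    pos = [0] * m                     # read position in each list

    def head(i):                      # +inf once list i is exhausted
        return lists[i][pos[i]] if pos[i] < len(lists[i]) else INF

    tree = [0] * (2 * m)              # tree[1] = root, tree[m+i] = leaf i
    for i in range(m):
        tree[m + i] = i
    for v in range(m - 1, 0, -1):
        a, b = tree[2 * v], tree[2 * v + 1]
        tree[v] = a if head(a) <= head(b) else b

    out = []
    for _ in range(sum(len(l) for l in lists)):
        w = tree[1]                   # overall winner
        out.append(head(w))
        pos[w] += 1
        v = (m + w) // 2              # repair the path leaf -> root
        while v >= 1:
            a, b = tree[2 * v], tree[2 * v + 1]
            tree[v] = a if head(a) <= head(b) else b
            v //= 2
    return out
```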

5.1 Transport Costs in the In-Place Procedure

Choosing ε half as large eliminates the factor 2 for swapping instead of moving. But there are still the additional transports in the iteration during the asymmetric merging in Section 3:

Since ≈ log₃ n iterations have to be performed, we would get O(n log n) additional transports if we performed only iterations of the described kind, which need O(n) transports each time. But we can reduce these additional transports to O(n) if we perform a second kind of iteration after every i iterations for any chosen i. Each time it halves the number of elements which have to be moved in later iterations. This second kind of iteration works as follows (see Figure 4):

One Quick-sort partition step is performed on the unsorted elements, where the middle element of the sorted list is chosen as reference. Then one of the lists is sorted in the usual way (the other unsorted list is used as a gap) and merged with the corresponding half of the sorted list. These elements will never have to be moved again.

Remark: It is easy to see that we can reduce the number of additional comparisons to εn if we choose i big enough. Although a formal estimation is quite hard, we conjecture that only a few more comparisons are needed if we mainly apply iterations of the second kind, even in the worst case, where all unsorted elements are on one side. Clearly this iteration is good on average.

Figure 4: Moving half of the elements to their final position.

6 Possible Further Improvements

The algorithm in Section 2 works optimally in the case that n = b2^m and has its worst behavior (losing 0.086n) in the case that n = 2 ln 2 · b2^m = 1.386 · b2^m. But we could avoid this for the start of the iteration if we were not fixed to sorting 0.8 of the elements in the first step but exactly b2^m elements with 2/5 < b2^m/n ≤ 4/5. A simpler method would be to always sort the biggest possible b2^m in each iteration step. In both cases the k's for the asymmetric merging in the further iterations have to be found dynamically, which makes a proper formal estimation difficult.

For practical applications it is worth noting that the average number of comparisons for asymmetric merging is better than in the worst case, because for each element of the smaller list, which is inserted among 2^k − 1 elements, on average some elements of the longer list can also be moved to the new list, so that the average number of comparisons is less than ⌊l/2^k⌋ + (k+1)h − 1. So we conjecture that an optimized in-place algorithm can have n log n − 1.4n + O(log n) comparisons on average and n log n − 1.3n + O(log n) comparisons in the worst case, avoiding large constants for the linear amount of transports.

If there exists a constant c > 0 and an in-place algorithm which sorts cn of the input with O(n log n) comparisons and O(n) transports, then the iteration method could be used to solve the open problem in [MR91].


6.0.1 Open Problems:

• Does there exist an in-place sorting algorithm with O(n log n) comparisons and O(n) transports [MR91]?

• Does there exist an in-place sorting algorithm which needs only n log n + O(n) comparisons and only o(n log n) transports?

• Does there exist a stable in-place sorting algorithm which needs only n log n + O(n) comparisons and only O(n log n) transports?

References

[Ca87] S. Carlsson: A variant of HEAPSORT with almost optimal number of comparisons. Information Processing Letters 24:247-250, 1987.

[Fl91] R. Fleischer: A tight lower bound for the worst case of bottom-up-heapsort. Technical Report MPI-I-91-104, Max-Planck-Institut für Informatik, D-W-6600 Saarbrücken, Germany, April 1991.

[HL88] B.-C. Huang and M.A. Langston: Practical in-place merging. CACM, 31:348-352, 1988.

[HL72] F.K. Hwang and S. Lin: A simple algorithm for merging two disjoint linearly ordered sets. SIAM J. Computing 1:31-39, 1972.

[Kn72] D.E. Knuth: The Art of Computer Programming, Volume 3 / Sorting and Searching. Addison-Wesley, 1972.

[Me84] K. Mehlhorn: Data Structures and Algorithms, Vol. 1: Sorting and Searching. Springer-Verlag, Berlin/Heidelberg, 1984.

[MR91] J.I. Munro and V. Raman: Fast stable in-place sorting with O(N) data moves. Proceedings of the FST&TCS, LNCS 560:266-277, 1991.

[We90] I. Wegener: BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating on average QUICKSORT (if n is not very small). Proceedings of the MFCS90, LNCS 452:516-522, 1990.

[We91] I. Wegener: The worst case complexity of McDiarmid and Reed's variant of BOTTOM-UP-HEAPSORT is less than n log n + 1.1n. Proceedings of the STACS91, LNCS 480:137-147, 1991.
