Ratio based stable in-place merging
Pok-Son Kim¹ and Arne Kutzner²
¹ Kookmin University, Department of Mathematics, Seoul 136-702, Rep. of Korea
pskim@kookmin.ac.kr
² Seokyeong University, Department of Computer Science, Seoul 136-704, Rep. of Korea
kutzner@skuniv.ac.kr
Abstract. We investigate the problem of stable in-place merging from a ratio k = n/m based point of view, where m, n are the sizes of the input sequences with m ≤ n. We introduce a novel algorithm for this problem that is asymptotically optimal regarding the number of assignments as well as comparisons. Our algorithm uses knowledge about the ratio of the input sizes to gain optimality and, in contrast to all other stable in-place merging algorithms proposed so far, does not stay in the tradition of Mannila and Ukkonen's work [8]. It has a simple modular structure and does not demand the additional extraction of a movement imitation buffer as needed by its competitors. For its core components we give concrete implementations in the form of pseudocode. Benchmarks show that our algorithm almost always performs better than its direct competitor proposed in [6].
As an additional subresult we show that stable in-place merging is quite a simple problem for every ratio k ≥ √m by proving that there exists a primitive algorithm that is asymptotically optimal for such ratios.
1 Introduction
Merging denotes the operation of rearranging the elements of two adjacent sorted
sequences of size m and n, so that the result forms one sorted sequence of m+n
elements. An algorithm merges two sequences in place when it relies on a fixed
amount of extra space. It is regarded as stable if it preserves the initial relative order
of elements with equal value.
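To make the notions of merging and stability concrete, the following Python sketch shows the simple standard merging algorithm that relies on external space of size m (the "Linear Standard Alg." of Section 4). It is deliberately not in place; the function name and the optional key parameter are our own illustrative choices.

```python
def merge_with_buffer(a, first, mid, last, key=lambda x: x):
    """Stably merge the sorted runs u = a[first:mid] and v = a[mid:last].

    Not in place: a copy of u (extra space m = mid - first) is kept in an
    external buffer, as in the simple standard merging algorithm.
    """
    buf = a[first:mid]
    i, j, out = 0, mid, first
    while i < len(buf) and j < last:
        # An element of v is taken only if it is strictly smaller, so equal
        # elements of u stay in front of those of v: the merge is stable.
        if key(a[j]) < key(buf[i]):
            a[out] = a[j]
            j += 1
        else:
            a[out] = buf[i]
            i += 1
        out += 1
    # Remaining buffer elements go to the end; remaining v elements are in place.
    a[out:out + len(buf) - i] = buf[i:]
    return a
```

For example, merging [(1, 'u0'), (2, 'u1')] with [(1, 'v0'), (2, 'v1')] by the first component yields u0, v0, u1, v1: ties are always resolved in favor of u.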
There are two significant lower bounds for merging. The lower bound for the number of assignments is m + n, because every element of the input sequences can change its position in the sorted output. As shown, e.g., in Knuth [7], the lower bound for the number of comparisons is Ω(m log(n/m + 1)), where m ≤ n.
A merging algorithm is called asymptotically fully optimal if it is asymptotically optimal regarding the number of comparisons as well as assignments.
We will inspect the merging problem on the foundation of a ratio based approach. In the following, k always denotes the ratio k = n/m of the sizes of the input sequences. The lower bounds for merging can be expressed on the foundation of such a ratio as well: we get Ω(m log(k + 1)) as lower bound for the number of comparisons and m · (k + 1) as lower bound for the number of assignments.
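A quick numerical check of these bounds, as a sketch: the information-theoretic comparison bound is log2 of the number C(m+n, m) of possible interleavings, and since C(m+n, m) ≥ ((m+n)/m)^m = (k+1)^m it is always at least m log2(k+1). The helper name below is hypothetical.

```python
import math


def merge_lower_bounds(m, n):
    """Lower bounds for merging sequences of sizes m <= n, in ratio form (k = n/m)."""
    k = n / m
    return {
        # information-theoretic bound: log2 of the number of interleavings C(m+n, m)
        "comparisons_exact": math.log2(math.comb(m + n, m)),
        # the ratio-based form m*log2(k+1) used in the text (up to constant factors)
        "comparisons_ratio": m * math.log2(k + 1),
        # every element may have to move: m + n = m*(k+1)
        "assignments": m * (k + 1),
    }
```

For m = 4 and n = 8, log2 C(12, 4) = log2 495 ≈ 8.95, compared with 4 log2 3 ≈ 6.34 and an assignment bound of 12.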
In the first part of this paper we show that there is a simple asymptotically fully optimal stable in-place merging algorithm for every ratio k ≥ √m. Afterwards we introduce a novel stable in-place merging algorithm that is asymptotically fully optimal for any ratio k. The new algorithm has a modular structure and, in contrast to all other works known to us ([10,4,2,6]), does not rely on the techniques described by Mannila and Ukkonen [8]. Instead it exploits knowledge about the ratio of the input sizes to achieve optimality. At its core our algorithm consists of two separate operations named "block rearrangement" and "local merges". This separation makes it possible to omit the extraction of an additional movement imitation buffer, as is necessary e.g. in [6]. For core parts of the new algorithm we give an implementation in pseudocode. Some benchmarks show that it performs better than its competitor proposed in [6] for a wide range of inputs.
A first conceptual description of a stable asymptotically fully optimal in-place merging algorithm can be found in the work of Symvonis [10]. Further work was done by Geffert et al. [4] and Chen [2], where Chen presented a simplified variant of Geffert et al.'s algorithm. None of these three publications provided a pseudocode implementation or benchmarks. Recently Kim and Kutzner [6] published a further algorithm together with benchmarks. These benchmarks showed that stable asymptotically fully optimal in-place merging algorithms are competitive and need not be viewed merely as theoretical models.
2 A simple asymptotically optimal algorithm for k ≥ √m
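Table 1. Complexity of the toolbox algorithms
Hwang and Lin. Arguments: u, v with |u| ≤ |v|, let m = |u|, n = |v|. Comparisons: m(t + 1) + n/2^t, where t = ⌊log(n/m)⌋. Assignments: (1) with external buffer 2m + n; (2) with m rotations n + m² + m.
Block Swapping. Arguments: u, v with |u| = |v|. Comparisons: –. Assignments: 3|u|.
Block Rotation. Arguments: u, v. Comparisons: –. Assignments: |u| + |v| + gcd(|u|, |v|) ≤ 2(|u| + |v|).
Binary Search. Arguments: u, x (searched element). Comparisons: ⌊log |u|⌋ + 1. Assignments: –.
Minimum Search. Arguments: u. Comparisons: |u| − 1. Assignments: –.
Insertion Sort. Arguments: u, let m = |u|. Comparisons: m(m+1)/2 − 1. Assignments: m(m−1)/2 + (m − 1).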
We now introduce some notation that will be used throughout the paper.
Let u and v be two ascending sorted sequences. We define u ≤ v (u < v) iff x ≤ y (x < y) for all elements x ∈ u and for all elements y ∈ v. |u| denotes the size of the sequence u. Unless stated otherwise, m and n (m ≤ n) are the sizes of the two input sequences u and v, respectively. δ always denotes some blocksize with δ ≤ m.
Tab. 1 lists the numbers of comparisons and assignments for six elementary algorithms that we use throughout this paper. Brief descriptions of these algorithms, except for "Minimum Search", can be found in [6]. For "Minimum Search" we assume that u is unsorted, so a linear search is
necessary.
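The assignment counts of the Block Swapping and Block Rotation rows in Tab. 1 can be reproduced with the following sketch. The rotation follows the gcd(|u|, |v|) cycles of the permutation (a cycle-leader or "juggling" rotation); we assume it matches the count of the algorithm of Dudzinski and Dydek [3] used in this paper, but do not claim it is their exact formulation. The function names are ours.

```python
import math


def block_swap_counting(a, pos1, pos2, length):
    """BlockSwap of a[pos1:pos1+length] and a[pos2:pos2+length]; each exchange
    costs 3 assignments via a temporary, giving 3|u| in total."""
    assignments = 0
    for i in range(length):
        tmp = a[pos1 + i]
        a[pos1 + i] = a[pos2 + i]
        a[pos2 + i] = tmp
        assignments += 3
    return a, assignments


def rotate_counting(a, first1, first2, last):
    """Turn u v = a[first1:first2] a[first2:last] into v u by following the
    gcd(|u|, |v|) cycles of the permutation i -> (i + |u|) mod (|u| + |v|)."""
    u, v = first2 - first1, last - first2
    if u == 0 or v == 0:
        return a, 0
    n = u + v
    assignments = 0
    for start in range(math.gcd(u, v)):
        tmp = a[first1 + start]          # open a hole at the cycle leader
        assignments += 1
        cur = start
        while True:
            nxt = (cur + u) % n          # the element that belongs at position cur
            if nxt == start:
                break
            a[first1 + cur] = a[first1 + nxt]
            assignments += 1
            cur = nxt
        a[first1 + cur] = tmp            # close the cycle
        assignments += 1
    return a, assignments
```

Each of the gcd(|u|, |v|) cycles costs its length plus one assignment, so for |u| = 4 and |v| = 6 the rotation needs 10 + gcd(4, 6) = 12 assignments.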
We first show that there is a simple stable merging algorithm, called BlockRotationMerge, that is asymptotically fully optimal for any ratio k ≥ √m. Afterwards we prove a relation between the number of distinct elements in the shorter input sequence u and the number of assignments performed by the rotation based variant of Hwang and Lin's algorithm [5].
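Algorithm 1: BlockRotationMerge (u, v, δ)
1. We split the sequence u into blocks $u_1 u_2 \dots u_{\lceil m/\delta\rceil}$ so that all sections $u_2$ to $u_{\lceil m/\delta\rceil}$ are of equal size δ and $u_1$ is of size m mod δ. Let $x_i$ be the last element of $u_i$ ($i = 1, \dots, \lceil m/\delta\rceil$). Using binary searches we compute a splitting of v into sections $v_1 v_2 \dots v_{\lceil m/\delta\rceil} v_{\lceil m/\delta\rceil+1}$ so that $v_i < x_i \le v_{i+1}$ ($i = 1, \dots, \lceil m/\delta\rceil$).
2. $u_1 u_2 \dots u_{\lceil m/\delta\rceil} v_1 v_2 \dots v_{\lceil m/\delta\rceil}$ is reorganized to $u_1 v_1 u_2 v_2 \dots u_{\lceil m/\delta\rceil} v_{\lceil m/\delta\rceil}$ using $\lceil m/\delta\rceil - 1$ rotations.
3. We locally merge all pairs $u_i v_i$ using $\lceil m/\delta\rceil$ calls of the rotation based variant of Hwang and Lin's algorithm ([5]).
Steps 2 and 3 are interlaced as follows: after creating a new pair $u_i v_i$ ($i = 1, \dots, \lceil m/\delta\rceil - 1$) as part of the second step, we immediately merge this pair locally as described in step 3.
Lemma 1. BlockRotationMerge performs at most $\frac{m^2}{2\delta} + 2n + 6m + m\cdot\delta$ assignments if we use the optimal algorithm of Dudzinski and Dydek [3] for all block rotations.
Proof. For the first rotation from $u_1 u_2 \cdots u_{\lceil m/\delta\rceil} v_1$ to $u_1 v_1 u_2 \cdots u_{\lceil m/\delta\rceil}$ the algorithm performs $|u_2| + \cdots + |u_{\lceil m/\delta\rceil}| + |v_1| + \gcd(|u_2| + \cdots + |u_{\lceil m/\delta\rceil}|, |v_1|)$ assignments. The second rotation from $u_2 u_3 \cdots u_{\lceil m/\delta\rceil} v_2$ to $u_2 v_2 u_3 \cdots u_{\lceil m/\delta\rceil}$ requires $|u_3| + \cdots + |u_{\lceil m/\delta\rceil}| + |v_2| + \gcd(|u_3| + \cdots + |u_{\lceil m/\delta\rceil}|, |v_2|)$ assignments, and so on. For the last rotation from $u_{\lceil m/\delta\rceil-1} u_{\lceil m/\delta\rceil} v_{\lceil m/\delta\rceil-1}$ to $u_{\lceil m/\delta\rceil-1} v_{\lceil m/\delta\rceil-1} u_{\lceil m/\delta\rceil}$ the algorithm requires $|u_{\lceil m/\delta\rceil}| + |v_{\lceil m/\delta\rceil-1}| + \gcd(|u_{\lceil m/\delta\rceil}|, |v_{\lceil m/\delta\rceil-1}|)$ assignments. Additionally, $\frac{m}{\delta}(3\delta + 3\delta + \delta^2) = 6m + m\cdot\delta$ assignments are required for the local merges. Since the blocks $u_2, u_3, \dots$ have size δ, the sizes $|v_i|$ sum up to at most n, and every gcd term is bounded by the corresponding $|v_i|$, the algorithm performs altogether at most
$$\delta\Big(\big(\tfrac{m}{\delta}-1\big) + \big(\tfrac{m}{\delta}-2\big) + \cdots + 1\Big) + n + n + 6m + m\cdot\delta = \frac{m^2}{2\delta} - \frac{m}{2} + 2n + 6m + m\cdot\delta \le \frac{m^2}{2\delta} + 2n + 6m + m\cdot\delta$$
assignments.
□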
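For illustration, here is a minimal Python sketch of Algorithm 1 (BlockRotationMerge) with steps 2 and 3 interlaced as described; the splits of step 1 are computed lazily, one binary search per block. Assumptions: rotations use list slicing rather than the algorithm of [3], $u_1$ may be empty when δ divides m, and the local merges use a plain binary-search-and-rotate merge as a stand-in for the rotation based variant of Hwang and Lin's algorithm. All helper names are hypothetical.

```python
def rotate(a, lo, mid, hi):
    """Turn a[lo:mid] a[mid:hi] into a[mid:hi] a[lo:mid] (slicing, for brevity)."""
    a[lo:hi] = a[mid:hi] + a[lo:mid]


def lower_bound(a, x, lo, hi):
    """First index p in [lo, hi) with a[lo:p] < x <= a[p:hi] (binary search)."""
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo


def rotation_merge(a, first1, first2, last):
    """Stand-in local merge: for each element x of u, one rotation moves the
    v-elements smaller than x in front of it (stable, in place)."""
    while first1 < first2 < last:
        p = lower_bound(a, a[first1], first2, last)
        rotate(a, first1, first2, p)
        first1 += p - first2 + 1   # moved v-elements and x are now final
        first2 = p


def block_rotation_merge(a, first, mid, last, delta):
    """Sketch of BlockRotationMerge for u = a[first:mid] and v = a[mid:last]."""
    m = mid - first
    lo, hi = first, first + m % delta   # current block u_i = a[lo:hi]; |u_1| = m mod delta
    u_end = mid                         # end of the blocks u_i u_{i+1} ... not yet placed
    while lo < u_end:
        if hi == lo:                    # u_1 is empty when delta divides m
            hi = lo + delta
            continue
        x = a[hi - 1]                              # x_i: last element of u_i
        p = lower_bound(a, x, u_end, last)         # v_i = a[u_end:p], all < x_i
        rotate(a, hi, u_end, p)                    # u_i u_{i+1}... v_i -> u_i v_i u_{i+1}...
        size_v = p - u_end
        rotation_merge(a, lo, hi, hi + size_v)     # step 3: merge the new pair u_i v_i
        lo, hi, u_end = hi + size_v, hi + size_v + delta, p
    return a
```

Following Corollary 1, a caller would pass delta = ⌈√m⌉; small values are used in the checks only to exercise several blocks.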
Lemma 2. BlockRotationMerge is asymptotically optimal regarding the
number of comparisons.
Corollary 1. If we assume a blocksize of ⌈√m⌉, then BlockRotationMerge
is asymptotically fully optimal for all k ≥ √m.
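The step from Lemma 1 to the assignment part of Corollary 1 is left implicit; under the assumption δ = ⌈√m⌉, so that √m ≤ δ ≤ √m + 1, it can be spelled out as follows (comparisons are covered by Lemma 2).

```latex
\frac{m^2}{2\delta} + 2n + 6m + m\delta
  \;\le\; \frac{m^{3/2}}{2} + 2n + 6m + m^{3/2} + m
  \;=\; \tfrac{3}{2}\, m^{3/2} + 2n + 7m .
% If k \ge \sqrt{m}, then n = k\,m \ge m^{3/2}, hence
\tfrac{3}{2}\, m^{3/2} + 2n + 7m \;\le\; \tfrac{7}{2}\, n + 7m \;=\; O(m + n) \;=\; O\big(m\,(k+1)\big).
```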
So, for k ≥ √m there is a rather primitive asymptotically fully optimal stable in-place merging algorithm. For the complexity considerations in the next section we will rely on the following lemma.
Lemma 3. Let λ be the number of distinct elements in u. Then the number of assignments performed by the rotation based variant of Hwang and Lin's algorithm is O(λ · m + n) = O((λ + k) · m).
Proof. Let u = u1 u2 ... uλ, where every ui (i = 1, ..., λ) is a maximally sized section of equal elements. We split v into sections v1 v2 ... vλ vλ+1 so that we get vi < ui ≤ vi+1 (i = 1, ..., λ). (Some vi can be empty.) Assume that Hwang and Lin's algorithm has already merged some sections and arrives at the first element of the section ui (i = 1, ..., λ). The algorithm now computes the section vi and moves it in front of ui using one rotation of the form ··· ui ... uλ vi ··· to ··· vi ui ... uλ ···. This requires |ui| + ··· + |uλ| + |vi| + gcd(|ui| + ··· + |uλ|, |vi|) ≤ 2(m + |vi|) assignments. Afterwards the algorithm continues with the second element of ui. Obviously there is nothing to move at this stage, because all elements in ui are equal and the smaller elements from v were already moved in the step before; the same holds for all further elements of ui. Since there are only λ sections, all rotations together need at most 2(m + |v1|) + ··· + 2(m + |vλ|) ≤ 2λ · m + 2n assignments, which proves the claim.
□
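The rotation based variant of Hwang and Lin's algorithm, which Lemma 3 analyzes, can be sketched as follows. This is a left-to-right adaptation written under stated assumptions: groups of 2^t elements of v (t = ⌊log(n/m)⌋, recomputed for the current remaining sizes) are skipped before a binary search inside the last group, and each element of u triggers one rotation. It is not claimed to be the exact formulation of [5] or [6], and rotations use list slicing instead of an assignment-optimal rotation. Function names are ours.

```python
import math


def rotate(a, lo, mid, hi):
    """Turn a[lo:mid] a[mid:hi] into a[mid:hi] a[lo:mid] (slicing, for brevity)."""
    a[lo:hi] = a[mid:hi] + a[lo:mid]


def lower_bound(a, x, lo, hi):
    """First index p in [lo, hi) with a[lo:p] < x <= a[p:hi] (binary search)."""
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo


def hwang_lin_rotation_merge(a, first1, first2, last):
    """Stable merge of u = a[first1:first2] and v = a[first2:last] (sketch)."""
    while first1 < first2 < last:
        m, n = first2 - first1, last - first2
        # Step width 2^t with t = floor(log2(n/m)); t = 0 once v is shorter than u.
        t = max(0, math.floor(math.log2(n / m)))
        step = 1 << t
        x = a[first1]
        # Jump over whole groups of 2^t elements of v that are smaller than x ...
        j = first2
        while j + step <= last and a[j + step - 1] < x:
            j += step
        # ... then a binary search inside the last group finds the split point.
        p = lower_bound(a, x, j, min(j + step, last))
        # One rotation moves the v-elements smaller than x in front of u.
        rotate(a, first1, first2, p)
        first1 += p - first2 + 1   # moved v-elements and x are in their final place
        first2 = p
    return a
```

As in the proof of Lemma 3, once the first element of a run of equal elements in u has been placed, the following equal elements find no smaller v-elements and their rotations are empty.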
Corollary 2. The rotation based variant of Hwang and Lin's algorithm is asymptotically fully optimal if we have either k ≥ m or k ≥ λ, where λ is the number of distinct elements in the shorter input sequence u.
3 Novel asymptotically optimal stable in-place merging algorithm
We now propose a novel stable in-place merging algorithm called StableOptimalBlockMerge that is asymptotically fully optimal for any ratio. Notable properties of our algorithm are: In contrast to all other such algorithms proposed so far, it does not rely on the block management techniques described in Mannila and Ukkonen's work [8]. It degenerates to the simple BlockRotationMerge algorithm for roughly k ≥ √m/2. The internal buffer for local merges and the movement imitation buffer share a common buffer area. The two operations "block rearrangement" and "local merges" stay separated and communicate via a common block distribution storage. There is no lower bound regarding the size of the shorter input sequence.
Algorithm 2: StableOptimalBlockMerge
Step 1: Block distribution storage assignment
Let δ = ⌈√m⌉ be our blocksize. We split the input sequence u into u = s1 t s2 u′ so that s1 and s2 are two sequences of size ⌈m/δ⌉ + ⌈n/δ⌉ and t is a sequence of maximal size with elements equal to the last element of s1. We assume that there are enough elements to get a nonempty u′ and call s1 together with s2 our block distribution storage (in the following shortened to bd-storage).
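A minimal sketch of this split, under the assumption δ = ⌈√m⌉: because t absorbs every further copy of the last element of s1, each element of s2 is strictly larger than the element of s1 at the same index, which is the property the bd-storage relies on later. Both helper names are hypothetical.

```python
import math
from bisect import bisect_right


def bd_storage_size(m, n):
    """Blocksize delta = ceil(sqrt(m)) and section size ceil(m/delta) + ceil(n/delta)."""
    delta = math.isqrt(m - 1) + 1          # ceil(sqrt(m)) for m >= 1
    return delta, -(-m // delta) + -(-n // delta)


def split_u(u, size):
    """Split the sorted sequence u into s1 t s2 u' (returned as index ranges),
    or return None if there are not enough elements for a nonempty u'."""
    if size < 1 or size > len(u):
        return None
    s1_end = size
    # t: all further copies of the last element of s1
    t_end = bisect_right(u, u[s1_end - 1], s1_end)
    s2_end = t_end + size
    if s2_end >= len(u):                   # u' would be empty: "Lack of Space in Step 1"
        return None
    return (0, s1_end), (s1_end, t_end), (t_end, s2_end), (s2_end, len(u))
```

For u = [1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9] and section size 3 we get s1 = [1, 1, 2], t = [2, 2], s2 = [3, 4, 5], and the pairs (1, 3), (1, 4), (2, 5) are all strictly ascending.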
Step 2: Buffer extraction
Fig. 1. Segmentation after the buffer extraction: s1 t s2 b w v. All of s1, t, s2, b, w are elements originating from u; s1 and s2 form the block distribution storage, and b is the buffer (all elements in this area are distinct).
In front of the remaining sequence u′ we extract an ascending sorted buffer b of size δ so that all elements inside b are pairwise distinct. Simple techniques for doing so are proposed, e.g., in [9] or [6]. Once more we assume that there are enough elements to do so. Now let w be the remaining right part of u′ after the buffer extraction.
The segmentation of our input sequences after the buffer extraction is shown in
Fig. 1.
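One possible realization of the buffer extraction is sketched below, corresponding to the variant "starting from the left combined with binary search" mentioned in Section 4; it is our own illustration and not necessarily the technique of [9] or [6]. The buffer floats to the right through the sorted run, skipped duplicates are rotated in front of it, and finally the buffer is rotated back to the front. Names are hypothetical, and rotations use list slicing.

```python
from bisect import bisect_right


def rotate(a, lo, mid, hi):
    """Turn a[lo:mid] a[mid:hi] into a[mid:hi] a[lo:mid] (slicing, for brevity)."""
    a[lo:hi] = a[mid:hi] + a[lo:mid]


def extract_buffer(a, lo, hi, size):
    """Collect up to `size` pairwise distinct elements of the sorted run a[lo:hi]
    into an ascending buffer at a[lo:lo + count] and return count.

    The remaining elements keep their relative order and stay sorted.
    """
    if size <= 0 or lo >= hi:
        return 0
    start, end = lo, lo + 1                 # buffer a[start:end], seeded with a[lo]
    while end - start < size:
        nxt = bisect_right(a, a[end - 1], end, hi)   # first element > buffer maximum
        if nxt == hi:
            break                           # fewer than `size` distinct values exist
        rotate(a, start, end, nxt)          # skipped duplicates move in front of the buffer
        start, end = start + (nxt - end), nxt + 1
    rotate(a, lo, start, end)               # bring the buffer to the front of the run
    return end - start
```

If fewer than δ distinct elements exist, the returned count is the λ < ⌈√m⌉ of the exceptional case "Extracted buffer smaller than ⌈√m⌉ in Step 2".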
Step 3: Block rearrangement
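Fig. 2. Graphical remarks to the block rearrangement process. (a) The sequence wv split into blocks $w_1 w_2 \dots w_{r-1} w_r v_1 \dots v_{\lceil n/\delta\rceil} v_{\lceil n/\delta\rceil+1}$, next to the buffer area (here used as movement imitation buffer). (b) Relations between neighboring blocks: the last element of a v-block is smaller than the first element of a following w-block; the first element of a w-block is smaller than or equal to the last element of a following v-block. (c) First and second section of the block distribution storage; for w-blocks the elements of the corresponding pair are exchanged.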
We logically split the sequence wv into blocks of equal size δ as shown in Fig. 2 (a). The two blocks $w_1$ and $v_{\lceil n/\delta\rceil+1}$ are undersized and can even be empty. In the following we call every block originating from w a w-block and every block originating from v a v-block. The minimal w-block of a sequence of w-blocks is always the w-block with the lowest order (smallest elements) regarding the original order of these blocks.
We rearrange all blocks except the two undersized blocks $w_1$ and $v_{\lceil n/\delta\rceil+1}$ so that the following 3 properties hold:
(1) If a v-block is followed by a w-block, then the last element of the v-block must be smaller than the first element of the w-block (Fig. 2(b)).
(2) If a w-block is followed by a v-block, then the first element of the w-block must be smaller than or equal to the last element of the v-block (Fig. 2(b)).
(3) The relative order of the v-blocks as well as of the w-blocks stays unchanged.
This rearrangement can easily be realized by "rolling" the w-blocks through the v-blocks and "dropping" minimal w-blocks so that the above properties are fulfilled. During this rolling the w-blocks stay together as a group, but they can get out of order. So, due to the need for stability, we have to track their positions. For this reason we mirror all block replacements in the buffer area using a technique called movement imitation (described, e.g., in [10] and [6]). Every time a minimal w-block has been dropped, we can find the position of the next minimal block using this buffer area.
Later we will have to find the positions of w-blocks in the block sequence created as output of the rearrangement process. For this purpose we store the positions of w-blocks in the block distribution storage as follows:
The block distribution storage consists of two sections of size ⌈m/δ⌉ + ⌈n/δ⌉, and the ith element of the first section together with the ith element of the second section belongs to the ith block in the result of the rearrangement process. Note that, due to the technique used for constructing the bd-storage, the two elements of such a pair are always distinct, with the first one smaller than the second one. If the ith block originates from w, we exchange the corresponding elements in the bd-storage; otherwise we leave them untouched. Fig. 2(c) shows this graphically.
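The marking scheme needs no extra memory, because every pair is strictly ascending before the rearrangement starts. A minimal sketch of encoding and decoding (function names are ours):

```python
def mark_blocks(a, bds1, bds2, origins):
    """Record the origin of the i-th rearranged block in the pair (a[bds1+i], a[bds2+i]).

    Initially a[bds1 + i] < a[bds2 + i] holds for every i; exchanging the pair
    marks a w-block, leaving it untouched marks a v-block.
    """
    for i, origin in enumerate(origins):
        if origin == "w":
            a[bds1 + i], a[bds2 + i] = a[bds2 + i], a[bds1 + i]
    return a


def is_w_block(a, bds1, bds2, i):
    """Decode the mark: an exchanged (descending) pair denotes a w-block."""
    return a[bds1 + i] > a[bds2 + i]
```

Exchanging a marked pair a second time restores the original order, which is exactly what the local merges do in Step 4.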
Step 4: Local merges
We visit every w-block and proceed as follows:
Let p be the w-block to be merged and let q be the sequence of all v-originating elements immediately to the right of p that are still unmerged. Further let x be the first element of p.
(1) Using a binary search we split q into q = q1 q2 so that we get q1 < x ≤ q2. Due to the block rearrangement applied before, |q1| < δ holds. (2) We rotate p q1 q2 to q1 p q2. (3) We locally merge p and q2 by Hwang and Lin's algorithm, where we use the buffer area as internal buffer.
This visiting process starts with the rightmost w-block and moves sequentially, w-block by w-block, to the left. The positions of the w-blocks are detected using the information held in the bd-storage. Every time we locate the position of a w-block in the bd-storage, we bring the corresponding bd-storage elements back to their original order. So, after all local merges have finished, both sections of the bd-storage are restored to their original form.
Step 5: Final sweeping up
On the left there is a still unmerged subsequence $s_1 t s_2 b w_1 v'$, where $v'$ is the subsection of v that consists of the remaining unmerged elements. We proceed as follows:
(1) We split $v'$ into $v' = v'_1 v'_2$ so that $v'_1 < x \le v'_2$, where x is the last element of $s_2$. Afterwards we rotate $b w_1 v'_1 v'_2$ to $v'_1 b w_1 v'_2$ and locally merge $w_1$ and $v'_2$ using Hwang and Lin's algorithm with the internal buffer.
(2) In the same way we split $v'_1$ into $v'_1 = v'_{1,1} v'_{1,2}$ so that we get $v'_{1,1} < y \le v'_{1,2}$, where y is the last element of $s_1$. We rotate $s_1 t s_2 v'_{1,1} v'_{1,2}$ to $s_1 v'_{1,1} t s_2 v'_{1,2}$ and locally merge $s_1$ with $v'_{1,1}$ and $s_2$ with $v'_{1,2}$ using the BlockRotationMerge algorithm with a blocksize of ⌈√m⌉.
(3) We sort the buffer area using InsertionSort and merge it with all elements to its right using the rotation based variant of Hwang and Lin's algorithm.
Lack of Space in Step 1:
The inputs are so asymmetric that u′ becomes empty. Using a binary search we split v into v = v1 v2 so that we get v1 < t ≤ v2, and rotate s1 t s2 v1 v2 to s1 v1 t s2 v2. Using the BlockRotationMerge algorithm with a blocksize of ⌈√m⌉ we locally merge s1 with v1 and s2 with v2. If s2 is empty, we ignore it and directly merge s1 with v in the same style.
Extracted buffer smaller than ⌈√m⌉ in Step 2:
We assume that we could only extract a buffer of size λ with λ < ⌈√m⌉. We change our blocksize δ to ⌈|u|/λ⌉ and apply the algorithm as described, with the modification that we use the rotation based variant of Hwang and Lin's algorithm for all local merges.
Corollary 3. StableOptimalBlockMerge is stable.
Theorem 1. The StableOptimalBlockMerge algorithm requires O(m + n) = O(m · (k + 1)) assignments.
Proof. It is enough to prove that every step is performed with O(m + n) assignments. In the first step no assignments occur at all. The buffer extraction in step 2 requires O(m) assignments, as shown in [6]. In step 3 the "rolling" of the w-blocks through the v-blocks together with the "dropping" of the minimal w-blocks requires $3\sqrt{m}\cdot(\sqrt{m} + \frac{n}{\sqrt{m}}) = O(m+n)$ assignments. The rotations for the integrated "movement imitation" contribute $O(\sqrt{m}\cdot(\sqrt{m} + \frac{n}{\sqrt{m}})) = O(m+n)$ assignments. The marking of the positions of the w-blocks in the bd-storage needs O(√m) assignments. So, altogether step 3 requires O(m + n) assignments. In step 4 each w-block rotation requires at most √m + √m + gcd(√m, √m) = 3√m assignments, so all w-block rotations together need 3√m · √m = O(m) assignments. The local merges using Hwang and Lin's algorithm consume 2m + n assignments altogether. The reconstruction of the original order of the exchanged elements in the bd-storage contributes O(√m) assignments. In step 5 the first rotation requires at most 4√m assignments and the local merge of $w_1$ and $v'_2$ needs at most 3√m assignments. The second rotation requires at most $3\sqrt{m} + \frac{n}{\sqrt{m}}$ assignments. The success of step 1 implies that roughly k ≤ √m/2, so we get k · √m ≤ m. Further, ⌈m/δ⌉ + ⌈n/δ⌉ is roughly equal to $(k+1)\cdot\sqrt{m} = \frac{m+n}{\sqrt{m}}$. So, according to Lemma 1, each local merge with BlockRotationMerge needs at most
$$\frac{(k\sqrt{m})\cdot k\sqrt{m}}{2\cdot\sqrt{m}} + 2\cdot n + 6k\sqrt{m} + k\sqrt{m}\cdot\sqrt{m} \;\le\; \frac{k\cdot m}{2} + 2n + 6m + k\cdot m$$
assignments, which is O(m + n) because k · m = n. The buffer sorting using InsertionSort contributes O(m) assignments, and the final call of Hwang and Lin's algorithm requires n + m + √m assignments. So, step 5 needs O(m + n) assignments as well.
In the first exceptional case "Lack of Space in Step 1" we have roughly k ≥ √m/2 and directly switch to BlockRotationMerge. According to Corollary 1, BlockRotationMerge is asymptotically fully optimal for such k (the argument of Corollary 1 carries over to k ≥ √m/2, since then m^{3/2} ≤ 2n).
In the second exceptional case "Extracted buffer smaller than ⌈√m⌉" we change the blocksize to ⌈|u|/λ⌉ with λ < √m and use the rotation based variant of Hwang and Lin's algorithm for the local merges. A recalculation of steps 3 to 5, where we use Lemma 3 in the context of all local merges, proves that the number of assignments is still O(m + n).
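Lemma 4. If $k = \sum_{i=1}^{n} k_i$ for any $k_i > 0$ and integer n > 0, then $\sum_{i=1}^{n} \log k_i \le n \log(k/n)$.
Proof. It holds because the function log x is concave: by Jensen's inequality, $\frac{1}{n}\sum_{i=1}^{n} \log k_i \le \log\big(\frac{1}{n}\sum_{i=1}^{n} k_i\big) = \log(k/n)$.
□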
Theorem 2. The StableOptimalBlockMerge algorithm requires O(m log(n/m + 1)) = O(m log(k + 1)) comparisons.
Proof. As in the case of the assignments, it is enough to show that every step keeps the asymptotic optimality. Step 1 merely contains one binary search over the m elements of u. The buffer extraction in step 2 requires m comparisons at most, as shown in [6]. The rearrangement of all blocks except the two undersized blocks $w_1$ and $v_{\lceil n/\delta\rceil+1}$ in step 3 requires at most $2\sqrt{m} + \frac{n}{\sqrt{m}}$ comparisons. The detection of the minimal element in the movement imitation buffer demands at most √m · √m comparisons. In step 4 the binary searches for splitting the q-sequences cost at most √m · log √m comparisons. Now let $(m_1, n_1), (m_2, n_2), \dots, (m_r, n_r)$ be the sizes of all r groups that are locally merged by Hwang and Lin's algorithm, where every $m_i$ is the size √m of one w-block. According to Lemma 4, Table 1, and since r < √m, this task requires
$$\begin{aligned}
\sum_{i=1}^{r}\Big(m_i\big(\log\tfrac{n_i}{m_i}+1\big)+m_i\Big)
&= \sum_{i=1}^{r}\Big(m_i\log\tfrac{n_i}{m_i}+2m_i\Big)
\le \sum_{i=1}^{r} m_i\log\tfrac{n_i}{m_i} + 2m\\
&= \sum_{i=1}^{r}\big(m_i\log n_i - m_i\log m_i\big) + 2m
= \sqrt{m}\sum_{i=1}^{r}\big(\log n_i - \log m_i\big) + 2m\\
&\le \sqrt{m}\Big(\sqrt{m}\log\tfrac{n}{r} - \sqrt{m}\log\tfrac{m}{r}\Big) + 2m
\le m\log\big(\tfrac{n}{m}+1\big) + 2m = O\big(m\log(\tfrac{n}{m}+1)\big)
\end{aligned}$$
comparisons. The asymptotic optimality in step 5 as well as in the exceptional case "Lack of Space in Step 1" is obvious due to Lemma 2. The change of the blocksize in the second exceptional case "Extracted buffer smaller than ⌈√m⌉" triggers a simple recalculation of step 3 and step 4, where we leave the details to the reader.
□
Corollary 4. StableOptimalBlockMerge is an asymptotically fully optimal stable in-place merging algorithm.
Pseudocode implementations of the core operations "block rearrangement" and "local merges" are given in Alg. 1 and Alg. 2, respectively. Both code segments call the toolbox algorithms mentioned in Section 2. The pseudocode definitions of these toolbox algorithms are summarized in Tab. 2.
Table 2. Pseudocode definitions of the toolbox algorithms
HwangAndLin(A, first1, first2, last): u is in A[first1 : first2 − 1], v is in A[first2 : last − 1]
BinarySearch(A, first, last, x): delivers the position of the first element in A[first : last − 1] that is not smaller than x (i.e. the first occurrence of x, if x occurs)
Minimum(A, pos1, pos2): delivers the index of the minimal element in A[pos1 : pos2 − 1]
BlockSwap(A, pos1, pos2, len): u is in A[pos1 : pos1 + len − 1], v is in A[pos2 : pos2 + len − 1]
BlockRotate(A, first1, first2, last): u, v as in HwangAndLin
Exchange(A, pos1, pos2) is equal to BlockSwap(A, pos1, pos2, 1).
Algorithm 1 Pseudocode of the procedure for the block rearrangement
RearrangeBlocks(A, first1, first2, last, buf, bds1, bds2, blockSize)
 1  ▷ w2 ... wx is in A[first1 : first2 − 1], v1 ... vy−1 is in A[first2 : last − 1]
 2  ▷ buffer b is in A[buf : buf + ⌈√m⌉ − 1]
 3  ▷ bd-storage s1 is in A[bds1 : bds1 + ⌈√m⌉ + ⌈n/√m⌉ − 1]
 4  ▷ bd-storage s2 is in A[bds2 : bds2 + ⌈√m⌉ + ⌈n/√m⌉ − 1]
 5  bufEnd ← buf + (first2 − first1) / blockSize
 6  minBlock ← first1
 7  while first1 < first2
 8     do if first2 + blockSize ≤ last and A[first2 + blockSize − 1] < A[minBlock]
 9           then BlockSwap(A, first1, first2, blockSize)
10                BlockRotate(A, buf, buf + 1, bufEnd)
11                if minBlock = first1
12                   then minBlock ← first2
13                first2 ← first2 + blockSize
14           else BlockSwap(A, minBlock, first1, blockSize)
15                Exchange(A, buf, buf + (minBlock − first1) / blockSize)
16                Exchange(A, bds1, bds2)
17                buf ← buf + 1
18                if buf < bufEnd
19                   then minIndex ← Minimum(A, buf, bufEnd)
20                        minBlock ← first1 + (minIndex − buf + 1) ∗ blockSize
21        bds1 ← bds1 + 1; bds2 ← bds2 + 1
22        first1 ← first1 + blockSize
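A Python transliteration of RearrangeBlocks, given as a sketch for experimentation rather than as the authors' implementation; the function names are ours and the rotation of the movement imitation buffer uses list slicing. One detail is worth spelling out: after buf has been advanced in the else-branch, buffer entry buf belongs to the block at first1 + blockSize, so the next minimal block lies at first1 + (minIndex − buf + 1) · blockSize. The test below rolls three w-blocks through three v-blocks; the w-blocks temporarily get out of order and the buffer restores the correct one.

```python
def block_swap(a, pos1, pos2, length):
    """BlockSwap: exchange a[pos1:pos1+length] and a[pos2:pos2+length]."""
    for i in range(length):
        a[pos1 + i], a[pos2 + i] = a[pos2 + i], a[pos1 + i]


def rearrange_blocks(a, first1, first2, last, buf, bds1, bds2, block_size):
    """Sketch of RearrangeBlocks: full w-blocks in a[first1:first2], full v-blocks
    in a[first2:last]; a[buf:...] holds one distinct, ascending entry per w-block."""
    buf_end = buf + (first2 - first1) // block_size
    min_block = first1
    while first1 < first2:
        if first2 + block_size <= last and a[first2 + block_size - 1] < a[min_block]:
            # the next v-block goes in front of the group of w-blocks
            block_swap(a, first1, first2, block_size)
            a[buf:buf_end] = a[buf + 1:buf_end] + a[buf:buf + 1]   # mirror in the buffer
            if min_block == first1:
                min_block = first2
            first2 += block_size
        else:
            # drop the minimal w-block at position first1
            block_swap(a, min_block, first1, block_size)
            j = buf + (min_block - first1) // block_size
            a[buf], a[j] = a[j], a[buf]              # mirror the swap in the buffer
            a[bds1], a[bds2] = a[bds2], a[bds1]      # mark this position as a w-block
            buf += 1
            if buf < buf_end:
                min_index = min(range(buf, buf_end), key=a.__getitem__)
                min_block = first1 + (min_index - buf + 1) * block_size
        bds1 += 1
        bds2 += 1
        first1 += block_size
    return a
```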
3.1 Optimizations
We now report on several optimizations that improve the performance of the algorithm without any impact on its asymptotic properties. The immediate mirroring of all w-block movements in the movement imitation buffer (Step 3) triggers a rotation (line 10 in Alg. 1) every time a v-block is moved in front of the group of w-blocks. The number of necessary rotations can be reduced by first counting the v-blocks moved in front of the w-blocks and then performing a single update of the movement imitation buffer when a minimal w-block is placed. For the movement of v-blocks in front of w-blocks (Step 3), the floating hole technique (for a description see [4] or [6]) can be applied to reduce the number of assignments. Similarly, the floating hole technique can be applied during the local merges (Step 4) by combining the block swap to the internal buffer with the rotation that moves smaller v-originating elements in front of the w-block. In the special case "Extracted buffer smaller than ⌈√m⌉", sorting the buffer b in Step 5 is unnecessary, because the buffer is already sorted after Step 3 and stays unchanged during Step 4. InsertionSort can be replaced by a more efficient sorting algorithm. Note that the buffer sorting need not be stable, because all buffer elements are distinct.
Algorithm 2 Pseudocode of the function for local merges
LocalMerges(A, first, last, buf, bds1, bds2, blockSize, numBlocks)
 1  ▷ A[first : last − 1] contains all blocks in distributed form
 2
 3  index ← ((last − first) / blockSize) − 1
 4  while numBlocks > 0
 5     do while A[bds1 + index] < A[bds2 + index]
 6           do index ← index − 1
 7        first2 ← first + ((index + 1) ∗ blockSize)
 8        if first2 < last
 9           then b ← BinarySearch(A, first2, last, A[first2 − blockSize])
10                BlockRotate(A, first2 − blockSize, first2, b)
11                HwangAndLin(A, b − blockSize, b, last, buf)
12                last ← b − blockSize
13        Exchange(A, bds1 + index, bds2 + index)
14        numBlocks ← numBlocks − 1; index ← index − 1
15  return last
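LocalMerges (Alg. 2) can likewise be transliterated into Python as a sketch. Assumptions: HwangAndLin with the internal buffer is replaced by a stable merge through a temporary list (so this sketch is not in place and drops the buf argument), and BinarySearch returns the first position whose element is not smaller than the key. The helper names are ours. The test input is the state after a rearrangement with two w-blocks, marked at block positions 0 and 2 of the bd-storage.

```python
def lower_bound(a, x, lo, hi):
    """BinarySearch: first index p in [lo, hi) with a[lo:p] < x <= a[p:hi]."""
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo


def stable_merge(a, first1, first2, last):
    """Stand-in for HwangAndLin with internal buffer: stable, but uses a temporary list."""
    u, v = a[first1:first2], a[first2:last]
    out, i, j = [], 0, 0
    while i < len(u) and j < len(v):
        if v[j] < u[i]:
            out.append(v[j])
            j += 1
        else:
            out.append(u[i])
            i += 1
    a[first1:last] = out + u[i:] + v[j:]


def local_merges(a, first, last, bds1, bds2, block_size, num_blocks):
    """Sketch of LocalMerges: visit the w-blocks from right to left."""
    index = (last - first) // block_size - 1
    while num_blocks > 0:
        while a[bds1 + index] < a[bds2 + index]:     # skip v-blocks (unmarked pairs)
            index -= 1
        first2 = first + (index + 1) * block_size    # end of the w-block p
        if first2 < last:
            b = lower_bound(a, a[first2 - block_size], first2, last)
            # rotate p q1 -> q1 p
            a[first2 - block_size:b] = a[first2:b] + a[first2 - block_size:first2]
            stable_merge(a, b - block_size, b, last)  # merge p with q2
            last = b - block_size
        # restore the original order of the bd-storage pair
        a[bds1 + index], a[bds2 + index] = a[bds2 + index], a[bds1 + index]
        num_blocks -= 1
        index -= 1
    return last
```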
4 Experimental work
We carried out some experiments with our algorithm in order to get an impression of its performance. We compared it with the stable asymptotically fully optimal algorithm presented in [6] as well as with the simple standard algorithm that relies on external space of size m. Tab. 3 summarizes the results; every row shows average values over 50 runs with different data. As hardware platform we used a standard desktop computer with a 2 GHz processor. All code was written in the C programming language. For measuring the number of assignments we applied the optimal block rotation algorithm presented in [3]. Although this algorithm is optimal regarding the number of assignments, it is quite slow in practice due to its high computational demands. Therefore, for the time measurements we applied a block-swap based rotation algorithm, presented e.g. in [1], using identical data.
Table 3. Practical comparison of various merge algorithms
n | m | StableOptimalBlockMerge: #comp | #assign | te | SOFSEM 2006 Alg.: #comp | #assign | te | Linear Standard Alg.: #comp | #assign | te
2^21 | 2^21 | 5843212 | 37551852 | 227 | 5961524 | 49666369 | 335 | 4194239 | 8388608 | 121
2^21 | 2^18 | 1500433 | 15866835 | 100 | 1505766 | 17182008 | 122 | 2359288 | 4718592 | 71
2^21 | 2^15 | 280611 | 17350896 | 87 | 280412 | 12681115 | 68 | 2129890 | 4259840 | 64
2^21 | 2^12 | 43611 | 4422493 | 35 | 47330 | 10512479 | 53 | 2100804 | 4202496 | 63
2^23 | 2^9 | 8057 | 16350956 | 133 | 8589 | 38150052 | 202 | 8373039 | 16778240 | 251
2^23 | 2^6 | 1200 | 15459824 | 131 | 1271 | 30749720 | 161 | 8234508 | 16777344 | 254
2^23 | 2^3 | 172 | 11322991 | 119 | 170 | 7535160 | 68 | 7572307 | 16777232 | 301
2^23 | 2^0 | 23 | 4163489 | 55 | 24 | 4163489 | 55 | 4225121 | 16777218 | 261
te: execution time in ms; #comp: number of comparisons; #assign: number of assignments; m, n: lengths of the input sequences.
Regarding the buffer extraction (Step 2) there are several alternatives. The extraction process can be started from the left end as well as from the right end of the input, and we can choose between a binary search and a linear search for determining the next element. All 4 possible combinations keep the asymptotic optimality. However, there is no clear "best choice" among them, because the most advantageous combination can vary depending on the structure of the input. For the StableOptimalBlockMerge algorithm we chose the variant "starting from the left combined with binary search", while the SOFSEM 2006 algorithm originally used "starting from the right combined with linear search".
Except for two combinations of input sizes, our new algorithm is always faster than its predecessor. The poor performance in the case (2^21, 2^15) reflects the missing implementation of the floating hole technique mentioned in the section on optimizations. The application of BlockRotationMerge triggers unnecessary rotations in the case (2^23, 2^3). This can be fixed by introducing a check whether k ≥ m and a direct switch to the rotation based variant of Hwang
and Lin's algorithm if true.
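The suggested check could sit in a small dispatch function in front of the algorithm. The following is a hypothetical sketch: the k ≥ m test follows the fix proposed above (justified by Corollary 2, since λ ≤ m), the second test mirrors the case "Lack of Space in Step 1" but ignores the size of t, and the strategy names are illustrative labels only.

```python
import math


def select_merge_strategy(m, n):
    """Hypothetical top-level dispatch for input sizes m <= n."""
    if m == 0:
        return "nothing_to_merge"
    if n >= m * m:
        # k = n/m >= m: the rotation based Hwang-Lin variant is asymptotically
        # fully optimal by Corollary 2 (lambda <= m <= k)
        return "hwang_lin_rotation"
    delta = math.isqrt(m - 1) + 1                 # blocksize ceil(sqrt(m))
    bd_size = -(-m // delta) + -(-n // delta)     # size of s1 and of s2
    if 2 * bd_size >= m:
        # u' would be empty even for an empty t: "Lack of Space in Step 1"
        return "block_rotation_merge"
    return "stable_optimal_block_merge"
```

For the input sizes of Tab. 3, (m, n) = (2^3, 2^23) would be routed to the rotation based Hwang and Lin variant, while the symmetric case (2^21, 2^21) keeps using StableOptimalBlockMerge.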
5 Conclusion
We investigated the problem of stable in-place merging from a ratio based point of view by introducing a ratio k = n/m, where m, n are the sizes of the input sequences with m ≤ n. We showed that there is a simple asymptotically fully optimal (optimal regarding the number of comparisons as well as assignments) stable in-place merging algorithm for any ratio k ≥ √m.
In the second part of this paper we introduced a novel asymptotically fully optimal stable in-place merging algorithm that is built on considerations regarding the ratio of the input sizes. Highlights of this algorithm are: It has a modular structure and, in contrast to all its known competitors ([10,4,6]), does not rely on techniques described by Mannila and Ukkonen [8]. The tasks "block distribution" and "local block merges" are separated in a modular way. As a side effect they can share a common buffer area, and the extraction of a separate movement imitation buffer is not necessary. The algorithm demands no lower bound for the size of the shorter input sequence (32 elements in case of the algorithm in [4] and 10 elements for the algorithm in [6]).
For a wide range of inputs our algorithm performs remarkably better than its direct competitor presented in [6]. Its superiority is particularly pronounced for symmetrically sized inputs, a fact that is important in the context of the Mergesort algorithm.
The numbers of comparisons and assignments are good measures for the efficiency of merging algorithms. However, the impact of other operations, e.g. numerical calculations and index comparisons, deserves investigation as well. As motivation we refer to a well-known effect with the optimal block rotation algorithm introduced by Dudzinski and Dydek [3]: their algorithm is optimal regarding the number of assignments but performs badly due to the included computation of a greatest common divisor. In our further work we plan to take such so far uncounted operations into account.
References
1. J. Bentley. Programming Pearls. Addison-Wesley, Inc., 2nd edition, 2000.
2. J. Chen. Optimizing stable in-place merging. Theoretical Computer Science, 302(1/3):191–210, 2003.
3. K. Dudzinski and A. Dydek. On a stable storage merging algorithm. Information Processing Letters, 12(1):5–8, February 1981.
4. V. Geffert, J. Katajainen, and T. Pasanen. Asymptotically efficient in-place merging. Theoretical Computer Science, 237(1/2):159–181, 2000.
5. F. K. Hwang and S. Lin. A simple algorithm for merging two disjoint linearly ordered sets. SIAM Journal on Computing, 1(1):31–39, 1972.
6. Pok-Son Kim and Arne Kutzner. On optimal and efficient in place merging. In Jiří Wiedermann, Gerard Tel, Jaroslav Pokorný, Mária Bieliková, and Julius Stuller, editors, SOFSEM 2006, volume 3831 of Lecture Notes in Computer Science, pages 350–359. Springer, 2006.
7. D. E. Knuth. The Art of Computer Programming, Vol. 3: Sorting and Searching. Addison-Wesley, 1973.
8. H. Mannila and E. Ukkonen. A simple linear-time algorithm for in situ merging. Information Processing Letters, 18:203–208, 1984.
9. L. T. Pardo. Stable sorting and merging with optimal space and time bounds. SIAM Journal on Computing, 6(2):351–372, June 1977.
10. A. Symvonis. Optimal stable merging. Computer Journal, 38:681–690, 1995.