Spin-the-bottle Sort and Annealing Sort:
Oblivious Sorting via Round-robin Random Comparisons
Michael T. Goodrich
University of California, Irvine
Abstract
We study sorting algorithms based on randomized round-robin comparisons. Specifically, we study Spin-the-bottle sort, where comparisons are unrestricted, and Annealing sort, where comparisons are restricted to a distance bounded by a temperature parameter. Both algorithms are simple, randomized, data-oblivious sorting algorithms, which are useful in privacy-preserving computations, but, as we show, Annealing sort is much more efficient. We show that there is an input permutation that causes Spin-the-bottle sort to require $\Omega(n^2 \log n)$ expected time in order to succeed, and that in $O(n^2 \log n)$ time this algorithm succeeds with high probability for any input. We also show there is an implementation of Annealing sort that runs in $O(n \log n)$ time and succeeds with very high probability.
1 Introduction
The sorting problem is classic in computer science, with well over a fifty-year history (e.g., see [3, 20, 24, 39, 42]). In this problem, we are given an array, $A$, of $n$ elements taken from some total order and we are interested in permuting $A$ so that the elements are listed in order¹. In this paper, we are interested in randomized sorting algorithms based on simple round-robin strategies of scanning the array $A$ while performing, for each $i = 1, 2, \ldots, n$, a compare-exchange operation between $A[i]$ and $A[s]$, where $s$ is a randomly-chosen index not equal to $i$.
In addition to its simplicity, sorting via round-robin compare-exchange operations, in this manner, is data-oblivious. That is, if we view compare-exchange operations as a black-box primitive, then the sequence of operations performed by such a randomized sorting algorithm is independent of the input permutation.

Any data-oblivious sorting algorithm can also be viewed as a sorting network [26], where the elements in the input array are provided on $n$ input wires and internal gates are compare-exchange operations. Ajtai, Komlós, and Szemerédi (AKS) [1] give a sorting network with $O(n \log n)$ compare-exchange gates, but their method is quite complicated and has a very large constant factor, even with known improvements [32, 38]. Leighton and Plaxton [27] and Goodrich [17] describe alternative randomized sorting networks that use $O(n \log n)$ compare-exchange gates and sort any given input array with very high probability. None of these previous approaches are based on simple round-robin comparison strategies, however.
Data-oblivious sorting algorithms are often motivated from their ability to be implemented in special-purpose hardware modules [24], but such algorithms also have applications in secure multi-party computation (SMC) protocols (e.g., see [4, 10, 14, 15, 28, 29]). In such protocols, two or more parties separately hold different portions of a set of data values, $\{x_1, x_2, \ldots, x_n\}$, and are interested in computing some function, $f(x_1, x_2, \ldots, x_n)$, without revealing their respective data values (e.g., see [4, 28, 40]). Thus, the design of simpler data-oblivious sorting algorithms can lead to simpler SMC protocols.

¹Since we are focusing on comparison-based algorithms here, let us assume, without loss of generality, that the elements of $A$ are distinct, e.g., by a mapping $A[i] \mapsto (A[i], i)$ and then using lexicographic ordering for comparisons.
1.1 Previous Related Work
In spite of their simplicity, we are not familiar with previous work on data-oblivious sorting algorithms
based on round-robin random comparisons. So we review below some of the previous work on sorting that
is related to the various properties that are of interest in this paper.
Sorting via Random Comparisons. Biedl et al. [5] analyze a simple algorithm, Guess-sort, which iteratively picks two elements in the input array at random and performs a compare-exchange for them, and they show that this method runs in expected time $\Theta(n^2 \log n)$. In addition, Gruber et al. [19] perform a more exact analysis of this algorithm, which they call Bozo-sort. Neither of these papers considers round-robin random comparisons, however.
Quicksort. Of course, the randomized Quicksort algorithm sorts via round-robin comparisons against a randomly-chosen element, known as a pivot (e.g., see [11, 18, 36]), and this leads to a sorting algorithm that runs in $O(n \log n)$ time with high probability. Even so, the set of comparisons is highly dependent on input values. Thus, randomized Quicksort is not a data-oblivious algorithm based on random round-robin compare-exchange operations.
Shellsort. Sorting via data-oblivious round-robin random comparisons has a similar flavor to randomized Shellsort [17], which sorts via random matchings between various subarrays of the input array. Nevertheless, there are some important differences between randomized Shellsort and sorting via round-robin random compare-exchange operations. For instance, the analysis of randomized Shellsort requires an extensive postprocessing step, which we avoid in the analysis of our randomized round-robin sorting algorithms. We also avoid the complexity of previous analyses of deterministic variants of Shellsort (e.g., see [12, 23, 33]), such as that by Pratt [34], which leads to the best known performance for deterministic Shellsort, namely, a worst-case running time of $O(n \log^2 n)$. (See also the excellent survey of Sedgewick [37].)
Sorting via Round-robin Passes. Sorting by deterministic round-robin passes is, of course, a classic approach, as in the well-known Bubble-sort algorithm (e.g., see [11, 18, 36]). For instance, Dobosiewicz [13] proposes sorting via various bubble-sort passes, doing a left-to-right sequence of compare-exchanges between elements at offset-distances apart. In addition, Incerpi and Sedgewick [21, 22] study a version of Shellsort that replaces the inner loop with a round-robin "shaker" pass (see also [9, 41]), which is a left-to-right bubble-sort pass followed by a right-to-left bubble-sort pass. These algorithms do not ultimately lead to a time performance that is $O(n \log n)$, however.
1.2 Our Results
In this paper, we study two sorting algorithms based on randomized round-robin comparisons. Specifically,
we study an algorithm we are calling “Spin-the-bottle sort,” where comparisons in each round are arbitrary,
and an algorithm we are calling “Annealing sort,” where comparisons are restricted to a distance bounded
by a temperature parameter. These algorithms are therefore similar to one another, with both being simple,
data-oblivious sorting algorithms based on round-robin random compare-exchange operations.
Their respective performance is quite different, however, in that we show there is an input permutation that causes Spin-the-bottle sort to require an expected running time that is $\Omega(n^2 \log n)$ in order to succeed, and that Spin-the-bottle sort succeeds with high probability for any input permutation in $O(n^2 \log n)$ time. That is, Spin-the-bottle sort has an asymptotic expected running time that is actually worse than Bubble sort!

Thus, it is perhaps a bit surprising that, with just a couple of minor changes, Spin-the-bottle sort can be transformed into Annealing sort, which is much more efficient. In particular, Annealing sort is derived by applying the simulated annealing [25] meta-heuristic to Spin-the-bottle sort. There are, of course, multiple ways to apply this meta-heuristic, but we show there is a version of Annealing sort that runs in $O(n \log n)$ time and succeeds with very high probability².
²We say an algorithm succeeds with very high probability if success occurs with probability $1 - 1/n^\rho$, for some constant $\rho \geq 1$.
2 Spin-the-bottle Sort
The simplest sorting algorithm we consider in this paper is Spin-the-bottle sort³, which is given in Figure 1.

    while A is not sorted do
        for i = 1 to n do
            Choose s uniformly and independently at random from {1, 2, ..., i-1, i+1, ..., n}.
            if (i < s and A[i] > A[s]) or (i > s and A[i] < A[s]) then
                Swap A[i] and A[s].

Figure 1: Spin-the-bottle sort.
The test for $A$ being sorted is either done via a straightforward linear-time scan of $A$ or by a heuristic based on counting the number of rounds needed until it is highly likely that $A$ is sorted. In the latter case, this leads to a data-oblivious sorting algorithm, that is, a sorting algorithm for which the sequence of compare-exchange operations is independent of the values of the input, depending only on its size.
2.1 A Lower Bound on the Expected Running Time of Spin-the-bottle Sort
Our analysis of Spin-the-bottle sort is fairly straightforward and shows that this algorithm is asymptotically
worse than almost all other published sorting algorithms. Nevertheless, let us go through some details of
this analysis, as it provides some intuition of how improvements can be made, which in turn leads to a much
more efficient algorithm, Annealing sort.
Let us begin with a lower bound on the expected running time for Spin-the-bottle sort. As was done in the analysis of Guess-sort [5], let us consider the input array $A = (2, 1, 4, 3, \ldots, n, n-1)$, albeit now with a different argument as to why this is a difficult input instance.
This array has $N = n/2$ inversions, with each element participating in exactly one inversion. During any scan of $A$, each element that has yet to have its inversion resolved has a probability of $1/(n-1)$ of resolving its inversion. Considering the sequence of compare-exchange operations that Spin-the-bottle sort performs until $A$ is sorted, let us divide this sequence into maximal epochs of comparisons that do not resolve an inversion followed by one that does. Let $X_1, X_2, \ldots, X_N$ be a set of random variables where $X_i$ denotes the number of comparisons performed in epoch $i$, and observe that there are $N - i$ inversions remaining in $A$ after epoch $i$. Likewise, let $Y_1, Y_2, \ldots, Y_N$ be a set of random variables where $Y_i$ denotes the number of comparisons performed in epoch $i$, but only counting each comparison done such that its element, $A[i]$, has not had its inversion resolved in a previous epoch. Note that

$$X_i \geq \frac{n Y_i}{n - 2(i-1)},$$

since one full round performed in epoch $i$ involves $n$ comparisons, of which $n - 2(i-1)$ are for elements that have yet to have their inversions resolved.

The running time of Spin-the-bottle sort is proportional to

$$X = \sum_{i=1}^{N} X_i.$$
³The name comes from a party game, Spin the bottle, where a group of players sit in a circle and take turns, in a round-robin fashion, spinning a bottle in the middle of the circle. When it is a player's turn, he or she spins the bottle and then kisses the person of the appropriate gender nearest to where the bottle points.
Each $Y_i$ is a geometric random variable with parameter $p = 1/(n-1)$; hence, $E(Y_i) = n-1$. Thus,

$$E(X) = E\left(\sum_{i=1}^{N} X_i\right) \geq E\left(\sum_{i=1}^{N} \frac{n Y_i}{n - 2(i-1)}\right) \geq n \sum_{i=1}^{N} \left(\frac{E(Y_i)}{n - 2(i-1)} - 1\right) = n(n-1) \sum_{i=1}^{N} \frac{1}{n - 2(i-1)} - nN \geq n(n-1) H_{n/4}/2 - n^2/2,$$

where $H_m$ denotes the $m$th Harmonic number. Thus, $E(X)$ is $\Omega(n^2 \log n)$ for this input array, giving us the following.

Theorem 2.1: There is an input causing Spin-the-bottle sort to have an expected running time of $\Omega(n^2 \log n)$.
An important lesson to take away from the proof of the above theorem is that a set of inversions between pairs of close-by elements in $A$ is sufficient to cause Spin-the-bottle sort to have a relatively large expected running time. Intuitively, the algorithm is spending a lot of time for each element $A[i]$ looking throughout the entire array for an inversion that is caused by an element right "next door" to $A[i]$. Interestingly, this same intuition applies to our upper bound for the running time of Spin-the-bottle sort.
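As a quick empirical sanity check (our addition, not part of the paper's argument), one can count compare-exchange operations on this hard input using the sketch from Section 2; the growth rate between successive doublings of $n$ should exceed the factor of 4 that purely quadratic behavior would give:

    import random

    def comparisons_on_hard_input(n, trials=5):
        """Average number of compare-exchange operations Spin-the-bottle
        sort performs on A = (2, 1, 4, 3, ..., n, n-1); n must be even."""
        totals = []
        for _ in range(trials):
            A = list(range(1, n + 1))
            for i in range(0, n, 2):      # swap each adjacent pair
                A[i], A[i + 1] = A[i + 1], A[i]
            count = 0
            while any(A[i] > A[i + 1] for i in range(n - 1)):
                for i in range(n):
                    count += 1
                    s = random.randrange(n - 1)
                    if s >= i:
                        s += 1
                    lo, hi = (i, s) if i < s else (s, i)
                    if A[lo] > A[hi]:
                        A[lo], A[hi] = A[hi], A[lo]
            totals.append(count)
        return sum(totals) / trials

For example, comparing the averages for $n = 256$ and $n = 512$ should show a ratio somewhat above 4, reflecting the extra $\log n$ factor.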
2.2 An Upper Bound on the Running Time of Spin-the-bottle Sort
Let us now consider an upper bound on the running time of Spin-the-bottle sort. Our analysis is based on characterizations involving $M$, the number of inversions present in $A$ when it is given as input to the algorithm. Let $M_j$ denote the number of inversions that exist in $A$ at the beginning of round $j$ (where a round involves a complete scan of $A$), so $M_1 = M$. In addition, let $m_{i,j}$ denote the number of inversions that exist at the beginning of round $j$ and involve $A[i]$, and observe that

$$\sum_{i=1}^{n} m_{i,j} = 2 M_j.$$

We divide the course of the algorithm into three phases, depending on the value of $M_j$:

• Phase 1: $M_j \geq 12 n \log n$
• Phase 2: $12 n \leq M_j < 12 n \log n$
• Phase 3: $M_j < 12 n$.
Theorem 2.2: Given an array $A$ of $n$ elements, the three phases of Spin-the-bottle sort run in $O(n^2 \log n)$ time and sort $A$ with very high probability.

Proof: See Appendix A.

This, of course, is no great achievement, since there are several simple deterministic data-oblivious sorting algorithms that run in $O(n \log^2 n)$ time, and even Bubble sort itself is faster than Spin-the-bottle sort, running in $O(n^2)$ time. But the above three-phase characterization nevertheless gives us some intuition that leads to a more efficient sorting algorithm, which we discuss next.
3 Annealing Sort
The sorting algorithm we discuss in this section is based on applying the simulated annealing [25] meta-heuristic to the sorting problem. Following an analogy from metallurgy, simulated annealing involves solving an optimization problem by a sequence of choices, such that choice $j$ is made from among some $r_j$ neighbors of a current state that are confined to be within a distance bounded from above by a parameter $T_j$ (according to an appropriate metric). Given the metallurgical analogy, the parameter $T_j$ is called the temperature, which is gradually decreased during the algorithm according to an annealing schedule, until it is 0, at which point the algorithm halts.

Let us apply this meta-heuristic to sorting, which is admittedly not an optimization problem, so some adaptation is required. That is, let us view each round in a sorting algorithm that is similar to Spin-the-bottle sort as a step in a simulated annealing algorithm. Since each compare-exchange operation is chosen at random, let us now limit, in round $j$, the distance between candidate comparison elements to a parameter $T_j$, so as to implement the temperature metaphor, and let us also repeat the random choices for each element $r_j$ times, so as to implement a notion of neighbors of the current state under consideration. The sequence of $T_j$ and $r_j$ values defines the annealing schedule for our Annealing sort.

Formally, let us assume we are given an annealing schedule defined by the following:

• A temperature sequence, $\mathcal{T} = (T_1, T_2, \ldots, T_t)$, where $T_i \geq T_{i+1}$, for $i = 1, \ldots, t-1$, and $T_t = 0$.
• A repetition sequence, $\mathcal{R} = (r_1, r_2, \ldots, r_t)$, for $i = 1, \ldots, t$.
Given these two sequences, Annealing sort is as given in Figure 2.

    for j = 1 to t do
        for i = 1 to n-1 do
            for k = 1 to r_j do
                Let s be a random integer in the range [i+1, min{n, i+T_j}].
                if A[i] > A[s] then
                    Swap A[i] and A[s]
        for i = n downto 2 do
            for k = 1 to r_j do
                Let s be a random integer in the range [max{1, i-T_j}, i-1].
                if A[s] > A[i] then
                    Swap A[i] and A[s]

Figure 2: Annealing sort. It takes as input an array, $A$, of $n$ elements and an annealing schedule defined by sequences, $\mathcal{T} = (T_1, T_2, \ldots, T_t)$ and $\mathcal{R} = (r_1, r_2, \ldots, r_t)$. Note that if the compare-exchange operations are performed as a black box, then the algorithm is data-oblivious.
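A direct Python transcription of Figure 2 might look as follows (a sketch; the function and parameter names are ours, and the array is 0-indexed rather than 1-indexed):

    import random

    def annealing_sort(A, temps, reps):
        """Annealing sort, as in Figure 2: temps and reps are the equal-length
        annealing-schedule sequences (T_1, ..., T_t) and (r_1, ..., r_t)."""
        n = len(A)
        for T, r in zip(temps, reps):
            if T <= 0:                        # the schedule ends at temperature 0
                break
            for i in range(n - 1):            # left-to-right pass
                for _ in range(r):
                    s = random.randint(i + 1, min(n - 1, i + T))
                    if A[i] > A[s]:
                        A[i], A[s] = A[s], A[i]
            for i in range(n - 1, 0, -1):     # right-to-left pass
                for _ in range(r):
                    s = random.randint(max(0, i - T), i - 1)
                    if A[s] > A[i]:
                        A[i], A[s] = A[s], A[i]
        return A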
The running time of Annealing sort is $O(n \sum_{j=1}^{t} r_j)$ and its effectiveness depends on the annealing schedule, defined by $\mathcal{T} = (T_1, T_2, \ldots, T_t)$ and $\mathcal{R} = (r_1, r_2, \ldots, r_t)$. Fortunately, there is a three-phase annealing schedule that causes Annealing sort to run in $O(n \log n)$ time and succeed with very high probability:

• Phase 1. For this phase, let $\mathcal{T}_1 = (2n, 2n, n, n, n/2, n/2, n/4, n/4, \ldots, q \log^6 n, q \log^6 n)$ be the temperature sequence and let $\mathcal{R}_1 = (c, c, \ldots, c)$ be an equal-length repetition sequence (of all $c$'s), where $q \geq 1$ and $c > 1$ are constants.

• Phase 2. For this phase, let $\mathcal{T}_2 = (q \log^6 n, (q/2) \log^6 n, (q/4) \log^6 n, \ldots, g \log n)$ be the temperature sequence and let $\mathcal{R}_2 = (r, r, \ldots, r)$ be an equal-length repetition sequence, where $q$ is the constant from Phase 1, $g \geq 1$ is a constant determined in the analysis, and $r$ is $\Theta(\log n / \log \log n)$.

• Phase 3. For this phase, let $\mathcal{T}_3$ and $\mathcal{R}_3$ be sequences of length $g \log n$ of all 1's.

Given the annealing schedule defined by $\mathcal{T} = (\mathcal{T}_1, \mathcal{T}_2, \mathcal{T}_3, 0)$ and $\mathcal{R} = (\mathcal{R}_1, \mathcal{R}_2, \mathcal{R}_3, 0)$, note that the running time of Annealing sort is $O(n \log n)$. Let us therefore analyze its success probability.
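For concreteness, here is a sketch of how this three-phase schedule could be generated to feed the annealing_sort function above. The constants q, c, and g and the rounding choices are illustrative placeholders, not values dictated by the analysis; note also that for moderate $n$ the quantity $q \log^6 n$ can exceed $n$, so the schedule is meaningful mainly asymptotically:

    import math

    def make_schedule(n, q=1, c=4, g=1):
        """Build the three-phase annealing schedule (T, R) described above.
        Constants are illustrative and would need tuning to match the proof."""
        logn = max(2.0, math.log2(n))
        t_mid = min(n, max(1, int(q * logn ** 6)))   # ~q log^6 n, capped at n
        t_low = max(1, int(g * logn))                # ~g log n
        temps, reps = [], []
        # Phase 1: 2n, 2n, n, n, n/2, n/2, ..., each value twice, with
        # constant repetition factor c.
        T = 2 * n
        while T >= t_mid:
            temps += [T, T]
            reps += [c, c]
            T //= 2
        # Phase 2: halve from ~q log^6 n down to ~g log n, with
        # r = Theta(log n / log log n) repetitions per round.
        r = max(1, int(logn / math.log2(logn)))
        T = t_mid
        while T >= t_low:
            temps.append(T)
            reps.append(r)
            T //= 2
        # Phase 3: g log n rounds at temperature 1, one repetition each.
        temps += [1] * t_low
        reps += [1] * t_low
        temps.append(0)                              # terminating temperature
        reps.append(0)
        return temps, reps

    # usage: annealing_sort(A, *make_schedule(len(A)))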
3.1 Analysis of Phase 1
Our analysis for Phase 1 borrows some elements from our analysis of randomized Shellsort [17], as this algorithm has a somewhat similar structure of a schedule of random choices that gradually reduce in scope.

The Probabilistic Zero-One Principle. We begin our analysis with a probabilistic version of the zero-one principle (e.g., see Knuth [24]).

Lemma 3.1 [6, 17, 35]: If a randomized data-oblivious sorting algorithm sorts any array of 0's and 1's of size $n$ with failure probability at most $\epsilon$, then it sorts any array of size $n$ with failure probability at most $(n+1)\epsilon$.

This lemma is clearly only of effective use for randomized data-oblivious algorithms that have failure probabilities that are $O(n^{-\rho})$, for some constant $\rho > 1$, i.e., algorithms that succeed with very high probability.
Shrinking Lemmas. As we move up and down $A$ in a single pass, let us assume that we are considering the effect of this pass on an array $A$ of zeroes and ones, reasoning about how this pass impacts the ones "moving up" in $A$. We can prove a number of useful "shrinking" lemmas for the number of ones that remain in various regions (i.e., subarrays) of $A$ during this pass. (Symmetric lemmas hold for the 0's with respect to their downward movement in $A$.)
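For reference (our addition; these are the standard forms, e.g., see [30, 31]), the multiplicative Chernoff bounds invoked repeatedly below state that, for a sum $X$ of independent 0-1 random variables with mean $\mu = E(X)$,

$$\Pr\big(X > (1+\delta)\mu\big) \leq \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}, \quad \text{for } \delta > 0, \qquad \Pr\big(X < (1-\delta)\mu\big) \leq \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}, \quad \text{for } 0 < \delta < 1.$$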
Lemma 3.2 (Sliding-Window Lemma): Let $B$ be a subarray of $A$ of size $N$, and let $C$ be the subarray of $A$ of size $4N$ immediately after $B$. Suppose further there are $k \leq 4\beta N$ ones in $B \cup C$, for $0 < \beta < 1$. Let $k^{(c)}_1$ be the number of ones in $B$ after a single up-and-down pass of Annealing sort with temperature $4N$ and repetition factor $c$. Then

$$\Pr\left(k^{(c)}_1 > \max\{2\beta^c N,\ 8e \log n\}\right) \leq \min\{2^{-\beta^c N/2},\ n^{-4}\}.$$

Proof: For a one to remain in a given location in $B$ it must be matched with a one in each of its $c$ compare-exchange operations in $B \cup C$ (and note that this is the extent of possibilities, since the temperature is $4N$). Moreover, we may pessimistically assume each such $c$-ary test will occur independently for each possible position in $B$ with probability at most $\beta^c$. Thus,

$$E(k^{(c)}_1) \leq \beta^c N.$$

Since $k^{(c)}_1$ can, in this case, be viewed as the sum of $N$ independent 0-1 random variables, we can apply a Chernoff bound (e.g., see [30, 31]) to establish

$$\Pr\left(k^{(c)}_1 > 2\beta^c N\right) \leq 2^{-\beta^c N/2},$$

for the case when our bound on $E(k^{(c)}_1)$ is greater than $4e \log n$. When this bound is less than or equal to $4e \log n$, we can use a Chernoff bound to establish

$$\Pr\left(k^{(c)}_1 > 8e \log n\right) \leq 2^{-2e \log n} \leq n^{-4}.$$
Lemma 3.3: Suppose we are given two regions, $B$ and $C$, of $A$, of size $N$ and $\alpha N$, respectively, for $0 < \alpha < 4$, that are contained inside a subarray of $A$ of size $4N$, with $B$ to the left of $C$, and let $k = k_1 + k_2$, where $k_1$ (resp., $k_2$) is the number of ones in $B$ (resp., $C$). Let $k^{(c)}_1$ be the number of ones in $B$ after a single up-and-down pass of Annealing sort with temperature $4N$ and repetition factor $c$. Then

$$E\left(k^{(c)}_1\right) \leq k_1 \left(1 - \frac{\alpha}{4} + \frac{k_2}{4N}\right)^c.$$

Proof: A one may possibly remain in $B$ after a single (up) pass of Annealing sort with temperature $4N$, with respect to a single random choice, if it is matched with a one in $C$ or not matched with an element in $C$ at all. In a single random choice, with probability $1 - \alpha/4$, it is not matched with an element in $C$, and, if matched with an element in $C$, which occurs with probability $\alpha/4$, the probability that it is matched with a one is $k_2/(\alpha N)$.
Lemma 3.4 (Fractional-Depletion Lemma): Given two regions, $B$ and $C$, in $A$, of size $N$ and $\alpha N$, respectively, for $0 < \alpha < 4$, such that $B$ and $C$ are contained in a subarray of $A$ of size $4N$, with $B$ to the left of $C$, let $k = k_1 + k_2$, where $k_1$ and $k_2$ are the respective number of ones in $B$ and $C$, and suppose $k \leq 4\beta N$, for $0 < \beta < 1$. Let $k^{(c)}_1$ be the number of ones in $B$ after a single up-pass of Annealing sort with temperature $4N$ and repetition factor $c$. Then

$$\Pr\left(k^{(c)}_1 > \max\left\{2\left(1 - \frac{\alpha}{4} + \beta\right)^c N,\ 8e \log n\right\}\right) \leq \min\{2^{-(1 - \alpha/4 + \beta)^c N/2},\ n^{-4}\}.$$

Proof: By Lemma 3.3, applied to this scenario,

$$E(k^{(c)}_1) \leq k_1 \left(1 - \frac{\alpha}{4} + \frac{4\beta N}{4N}\right)^c \leq \left(1 - \frac{\alpha}{4} + \beta\right)^c N.$$

Since $k^{(c)}_1$ can be viewed as the sum of $k_1$ independent 0-1 random variables, we can apply a standard Chernoff bound (e.g., see [30, 31]) to establish

$$\Pr\left(k^{(c)}_1 > 2\left(1 - \frac{\alpha}{4} + \beta\right)^c N\right) \leq 2^{-(1 - \alpha/4 + \beta)^c N/2},$$

for the case when our bound on $E(k^{(c)}_1)$ is greater than $4e \log n$. When this bound is less than or equal to $4e \log n$, we can use a Chernoff bound to establish

$$\Pr\left(k^{(c)}_1 > 8e \log n\right) \leq 2^{-2e \log n} \leq n^{-4}.$$
Lemma 3.5 (Startup Lemma): Given two regions, $B$ and $C$, in $A$, of size $N$ and $\alpha N$, respectively, for $0 < \alpha < 4$, contained in a subarray of $A$ of size $4N$, with $B$ to the left of $C$, let $k = k_1 + k_2$, where $k_1$ and $k_2$ are the respective number of ones in $B$ and $C$, and suppose $k \leq 4\beta N$, for $0 < \beta < 1$. Let $k^{(c)}_1$ be the number of ones in $B$ after one up-pass of Annealing sort with temperature $4N$ and repetition factor $c$. Then, for any constant $\lambda > 0$ such that $1 - \alpha/4 + \beta - \lambda \leq 1 - \epsilon$, for some constant $0 < \epsilon < 1$, there is a constant $c > 1$ such that $k^{(c)}_1 \leq \lambda N$, with very high probability, provided $N$ is $\Omega(\log n)$.

Proof: By Lemma 3.3, so long as $k_1 \geq \lambda N$, then

$$E(k^{(c)}_1) \leq \left(1 - \frac{\alpha}{4} + \frac{4\beta N - \lambda N}{4N}\right)^c N \leq \left(1 - \frac{\alpha}{4} + \beta - \lambda\right)^c N \leq (1 - \epsilon)^c N.$$

Of course, we are done as soon as $k_1 \leq \lambda N$, and note that, for $c \geq \log_{1/(1-\epsilon)}(2/\lambda)$, we have $E(k^{(c)}_1) \leq \lambda N/2$. Thus, by a Chernoff bound, for such a constant $c$,

$$\Pr\left(k^{(c)}_1 > \lambda N\right) = \Pr\left(k^{(c)}_1 > 2 \cdot \lambda N/2\right) \leq 2^{-\lambda N/4}.$$

The proof then follows from the fact that $N$ is $\Omega(\log n)$.
Having proven the essential properties for the compare-exchange passes done in each round of Phase 1
of Annealing sort, let us now turn to the actual analysis of Phase 1.
Bounding Dirtiness after each Iteration. In the $2d$-th iteration of Phase 1, imagine that we partition the array $A$ into $2^d$ regions, $A_0, A_1, \ldots, A_{2^d - 1}$, each of size $n/2^d$. Moreover, every two iterations with the same temperature split a region from the previous iteration into two equal-sized halves. Thus, the algorithm can be visualized in terms of a complete binary tree, $B$, with $n$ leaves. The root of $B$ corresponds to a region consisting of the entire array $A$ and each leaf⁴ of $B$ corresponds to an individual cell, $a_i$, in $A$, of size 1. Each internal node $v$ of $B$ at depth $d$ corresponds with a region, $A_i$, created in the $2d$-th iteration of the algorithm, and the children of $v$ are associated with the two regions that $A_i$ is split into during iteration $2(d+1)$.

The desired output, of course, is to have each leaf value, $a_i = 0$, for $i < n - k$, and $a_i = 1$, otherwise, where $k$ is the number of ones. We therefore refer to the transition from cell $n-k-1$ to cell $n-k$ on the last level of $B$ as the crossover point. We refer to any leaf-level region to the left of the crossover point as a low region and any leaf-level region to the right of the crossover point as a high region. We say that a region, $A_i$, corresponding to an internal node $v$ of $B$, is a low region if all of $v$'s descendants are associated with low regions. Likewise, a region, $A_i$, corresponding to an internal node $v$ of $B$, is a high region if all of $v$'s descendants are associated with high regions. Thus, we desire that low regions eventually consist of only zeroes and high regions eventually consist of only ones. A region that is neither high nor low is mixed, since it is an ancestor of both low and high regions. Note that there are no mixed leaf-level regions, however.

Also note that, since Phase 1 is data-oblivious, the algorithm does not behave differently depending on whether a region is high, low, or mixed. Nevertheless, given the shrinking lemmas presented above, we can reason about the actions of our algorithm on different regions in terms of any one of these cases.
With each high (resp., low) region, $A_i$, define the dirtiness of $A_i$ to be the number of zeroes (resp., ones) that are present in $A_i$, that is, values of the wrong type for $A_i$. With each region, $A_i$, we associate a dirtiness bound, $\delta(A_i)$, which is a desired upper bound on the dirtiness of $A_i$. For each region, $A_i$, at depth $d$ in $B$, let $j$ be the number of regions from $A_i$ to the crossover point or mixed region on that level. That is, if $A_i$ is next to the mixed region, then $j = 1$, and if $A_i$ is next to a region next to the mixed region, then $j = 2$, and so on. In general, if $A_i$ is a low leaf-level region, then $j = n - k - i - 1$, and if $A_i$ is a high leaf-level region, then $j = i - n + k$. We define the desired dirtiness bound, $\delta(A_i)$, of $A_i$ as follows:

• If $j \geq 2$, then $\delta(A_i) = n/2^{d+j+3}$.
• If $j = 1$, then $\delta(A_i) = n/(5 \cdot 2^d)$.
• If $A_i$ is a mixed region, then $\delta(A_i) = |A_i|$.

⁴This is a slight exaggeration, of course, since we terminate Phase 1 when regions have size $O(\log^6 n)$.
Thus, every mixed region trivially satisfies its desired dirtiness bound.

Because of our need for a high-probability bound, we will guarantee that each region $A_i$ satisfies its desired dirtiness bound, w.v.h.p., only if $\delta(A_i) \geq 8e \log n$. If $\delta(A_i) < 8e \log n$, then we say $A_i$ is an extreme region, for, during our algorithm, this condition implies that $A_i$ is relatively far from the crossover point. We will show that the total dirtiness of all extreme regions is $O(\log^3 n)$ w.v.h.p. This motivates our termination of Phase 1 when the temperature is $O(\log^6 n)$.
Lemma 3.6: Suppose $A_i$ is a low (resp., high) region and $\Delta$ is the cumulative dirtiness of all regions to the left (right) of $A_i$. Then any compare-exchange pass over $A$ can increase the dirtiness of $A_i$ by at most $\Delta$.

Proof: If $A_i$ is a low (resp., high) region, then its dirtiness is measured by the number of ones (resp., zeroes) it contains. During any compare-exchange pass, ones can only move right, exchanging themselves with zeroes, and zeroes can only move left, exchanging themselves with ones. Thus, the only ones that can move into a low region are those to the left of it and the only zeroes that can move into a high region are those to the right of it.
The inductive claim that we show in Appendix B to hold with very high probability is the following.

Claim 3.7: After iteration $d$, for each region $A_i$, the dirtiness of $A_i$ is at most $\delta(A_i)$, provided $A_i$ is not extreme. The total dirtiness of all extreme regions is at most $8ed \log^2 n$.
3.2 Analysis of Phase 2
Claim 3.7 is the essential condition we need to hold at the start of Phase 2. In this section, we analyze the degree to which Phase 2 increases the sortedness of the array $A$ further from this point.

At the beginning of Phase 2, the total dirtiness of all extreme regions is at most $8e \log^3 n$, and the size of each such region is $g \log^6 n$, for $g = 64e^2$. Without loss of generality, let us consider a one in an extreme low region. The probability that such a one fails to be compared with a zero to its right in a round of Phase 2 is at most $1/N^{1/2}$, provided $g$ is large enough. Thus, with $r = h \log n / \log \log n$, the probability such a one fails to be compared with a 0 after $r$ random comparisons at distance $N$ is at most

$$\left(\frac{1}{N^{1/2}}\right)^{h \log n / \log \log n} = \frac{1}{N^{(h/2) \log n / \log \log n}} \leq \frac{1}{(\log n)^{(h/2) \log n / \log \log n}} = \frac{1}{n^{h/2}},$$

since $N \geq \log n$ during Phase 2. Thus, with very high probability, there are no dirty extreme regions after one round of Phase 2.
Consider next a non-extreme low region that is not mixed. By Claim 3.7, the dirtiness of such a region, and all regions to its left, is, with very high probability, at most $7N/10$. Thus,

$$E(k^{(r)}_1) \leq \left(\frac{13}{20}\right)^r N \leq e^{-(3/20) r} N.$$

Therefore, by a Chernoff bound, for $d$ and $n$ large enough,

$$\Pr\left(k^{(r)}_1 > d \log N\right) \leq \left(e \cdot e^{-(3/20) d \log n / \log \log n}\right)^{d \log N} \leq \frac{1}{e^{d \log n}} \leq \frac{1}{n^d}.$$

Note that in the next round after this, such a region will become completely clean, w.v.h.p., since its dirtiness is below $N^{1/2}$ w.v.h.p.

In addition, by Lemma 3.5, since $N$ is $\Omega(\log n)$ throughout Phase 2, then, w.v.h.p., the dirtiness of regions separate from a mixed region is at most $N/6$. Thus, the above analysis applies to them as well, once they are separate from a mixed region.

Therefore, by the end of Phase 2, w.v.h.p., the only dirty regions are either mixed or within distance 2 of a mixed region. In other words, the total dirtiness of the array $A$ at the end of Phase 2 is $O(\log n)$.
3.3 Analysis of Phase 3
Each round of Phase 3 is guaranteed to decrease the dirtiness of $A$ by at least 1 so long as $A$ is not completely clean. This property is similar to the reason why Bubble sort works. Namely, using the zero-one principle, note that the leftmost one in $A$ will always move right until it encounters another one. Thus, a single up-pass in $A$ eliminates the leftmost one having a zero somewhere to its right. Likewise, a single down-pass in $A$ eliminates the rightmost zero having a one somewhere to its left. Thus, since the total dirtiness of $A$ is $O(\log n)$ w.v.h.p., Phase 3 will completely sort $A$ w.v.h.p.

Therefore, we have the following.

Theorem 3.8: Given an array $A$ of $n$ elements, there is an annealing schedule that causes the three phases of Annealing sort to run in $O(n \log n)$ time and leave $A$ sorted with very high probability.
4 Conclusion
We have given two related data-oblivious sorting algorithms based on iterated passes of round-robin random comparisons. The first, Spin-the-bottle sort, requires an expected $\Omega(n^2 \log n)$ time to sort some inputs, and in $O(n^2 \log n)$ time it will sort any given input sequence with very high probability. The second, Annealing sort, on the other hand, can be designed to run in $O(n \log n)$ time and sort with very high probability.

Some interesting open problems include the following.

• Our analysis is, in many ways, overly pessimistic, in order to show that Annealing sort succeeds with very high probability. Is there a simpler and shorter annealing sequence that causes Annealing sort to run in $O(n \log n)$ time and sort with very high probability?

• Both Spin-the-bottle sort and Annealing sort are highly sequential. Is there a simple⁵ randomized sorting network with depth $O(\log n)$ and size $O(n \log n)$ that sorts any given input sequence with very high probability?

• Throughout this paper, we have assumed that compare-exchange operations always return the correct answer. But there are some scenarios when one would want to be tolerant of faulty compare-exchange operations (e.g., see [2, 8, 16]). Is there a version of Annealing sort that runs in $O(n \log n)$ time and sorts with high probability even if comparisons return a faulty answer uniformly at random with probability strictly less than 1/2?

⁵Leighton and Plaxton [27] describe a randomized sorting network that sorts with very high probability, which is simpler than the AKS sorting network [1], but is still somewhat complicated. So the open problem would be to design a sorting network construction that is clearly simpler than the construction of Leighton and Plaxton.
Acknowledgments
This research was supported in part by the National Science Foundation under grants 0724806, 0713046,
and 0847968, and by the Office of Naval Research under MURI grant N00014-08-1-1015.
References
[1] M. Ajtai, J. Komlós, and E. Szemerédi. Sorting in $c \log n$ parallel steps. Combinatorica, 3:1–19, 1983.
[2] S. Assaf and E. Upfal. Fault tolerant sorting networks. SIAM J. Discrete Math., 4(4):472–480, 1991.
[3] K. E. Batcher. Sorting networks and their applications. In Proc. 1968 Spring Joint Computer Conf.,
pages 307–314, Reston, VA, 1968. AFIPS Press.
[4] A. Ben-David, N. Nisan, and B. Pinkas. FairplayMP: A system for secure multi-party computation.
In CCS ’08: Proceedings of the 15th ACM conference on Computer and communications security,
pages 257–266, New York, NY, USA, 2008. ACM.
[5] T. Biedl, T. Chan, E. D. Demaine, R. Fleischer, M. Golin, J. A. King, and J. I. Munro. Fun-sort–or the
chaos of unordered binary search. Discrete Appl. Math., 144(3):231–236, 2004.
[6] D. T. Blackston and A. Ranade. Snakesort: A family of simple optimal randomized sorting
algorithms. In ICPP ’93: Proceedings of the 1993 International Conference on Parallel Processing,
pages 201–204, Washington, DC, USA, 1993. IEEE Computer Society.
[7] A. Boneh and M. Hofri. The coupon-collector problem revisited — a survey of engineering problems
and computational methods. Communications in Statistics — Stochastic Models, 13(1):39–66, 1997.
[8] M. Braverman and E. Mossel. Noisy sorting without resampling. In SODA ’08: Proceedings of the
19th ACM-SIAM Symposium on Discrete algorithms, pages 268–276, Philadelphia, PA, USA, 2008.
Society for Industrial and Applied Mathematics.
[9] B. Brejová. Analyzing variants of Shellsort. Information Processing Letters, 79(5):223–227, 2001.
[10] R. Canetti, Y. Lindell, R. Ostrovsky, and A. Sahai. Universally composable two-party and multi-party secure computation. In STOC '02: Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pages 494–503, New York, NY, USA, 2002. ACM.
[11] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press,
Cambridge, MA, 2nd edition, 2001.
[12] R. Cypher. A lower bound on the size of Shellsort sorting networks. SIAM J. Comput., 22(1):62–71,
1993.
[13] W. Dobosiewicz. An efficient variation of bubble sort. Inf. Process. Lett., 11(1):5–6, 1980.
[14] W. Du and M. J. Atallah. Secure multi-party computation problems and their applications: a review
and open problems. In NSPW ’01: Proceedings of the 2001 workshop on New security paradigms,
pages 13–22, New York, NY, USA, 2001. ACM.
[15] W. Du and Z. Zhan. A practical approach to solve secure multi-party computation problems. In
NSPW ’02: Proceedings of the 2002 workshop on New security paradigms, pages 127–135, New
York, NY, USA, 2002. ACM.
[16] U. Feige, P. Raghavan, D. Peleg, and E. Upfal. Computing with noisy information. SIAM J. Comput.,
23(5):1001–1018, 1994.
[17] M. T. Goodrich. Randomized Shellsort: A simple oblivious sorting algorithm. In Proceedings of the
ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1–16. SIAM, 2010.
[18] M. T. Goodrich and R. Tamassia. Algorithm Design: Foundations, Analysis, and Internet Examples.
John Wiley & Sons, New York, NY, 2002.
[19] H. Gruber, M. Holzer, and O. Ruepp. Sorting the slow way: an analysis of perversely awful
randomized sorting algorithms. In FUN’07: Proceedings of the 4th international conference on Fun
with algorithms, pages 183–197, Berlin, Heidelberg, 2007. Springer-Verlag.
[20] C. A. R. Hoare. Quicksort. Comput. J., 5(1):10–15, 1962.
[21] J. Incerpi and R. Sedgewick. Improved upper bounds on Shellsort. J. Comput. Syst. Sci.,
31(2):210–224, 1985.
[22] J. Incerpi and R. Sedgewick. Practical variations of Shellsort. Inf. Process. Lett., 26(1):37–43, 1987.
[23] T. Jiang, M. Li, and P. Vitányi. A lower bound on the average-case complexity of Shellsort. J. ACM, 47(5):905–911, 2000.
[24] D. E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming.
Addison-Wesley, Reading, MA, 1973.
[25] P. J. M. Laarhoven and E. H. L. Aarts, editors. Simulated annealing: theory and applications. Kluwer
Academic Publishers, Norwell, MA, USA, 1987.
[26] F. T. Leighton. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes.
Morgan-Kaufmann, San Mateo, CA, 1992.
[27] T. Leighton and C. G. Plaxton. Hypercubic sorting networks. SIAM J. Comput., 27(1):1–47, 1998.
[28] D. Malkhi, N. Nisan, B. Pinkas, and Y. Sella. Fairplay—a secure two-party computation system. In
SSYM’04: Proceedings of the 13th conference on USENIX Security Symposium, pages 20–20,
Berkeley, CA, USA, 2004. USENIX Association.
[29] U. Maurer. Secure multi-party computation made simple. Discrete Appl. Math., 154(2):370–381,
2006.
[30] M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and
Probabilistic Analysis. Cambridge University Press, New York, NY, USA, 2005.
[31] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, New York, NY,
1995.
[32] M. Paterson. Improved sorting networks with $O(\log N)$ depth. Algorithmica, 5(1):75–92, 1990.
[33] C. G. Plaxton and T. Suel. Lower bounds for Shellsort. J. Algorithms, 23(2):221–240, 1997.
[34] V. R. Pratt. Shellsort and sorting networks. PhD thesis, Stanford University, Stanford, CA, USA,
1972.
[35] S. Rajasekaran and S. Sen. PDM sorting algorithms that take a small number of passes. In IPDPS
’05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium
(IPDPS’05) - Papers, page 10, Washington, DC, USA, 2005. IEEE Computer Society.
[36] R. Sedgewick. Algorithms in C++. Addison-Wesley, Reading, MA, 1992.
[37] R. Sedgewick. Analysis of Shellsort and related algorithms. In ESA ’96: Proceedings of the Fourth
Annual European Symposium on Algorithms, pages 1–11, London, UK, 1996. Springer-Verlag.
[38] J. Seiferas. Sorting networks of logarithmic depth, further simplified. Algorithmica, 53(3):374–384,
2009.
[39] D. L. Shell. A high-speed sorting procedure. Commun. ACM, 2(7):30–32, 1959.
[40] G. Wang, T. Luo, M. T. Goodrich, W. Du, and Z. Zhu. Bureaucratic protocols for secure two-party
sorting, selection, and permuting. In 5th ACM Symposium on Information, Computer and
Communications Security (ASIACCS), pages 226–237. ACM, 2010.
[41] M. A. Weiss and R. Sedgewick. Bad cases for shaker-sort. Information Processing Letters, 28(3):133–136, 1988.
[42] J. Williams. Algorithm 232: Heapsort. Commun. ACM, 7:347–348, 1964.
A Proving the Correctness of Spin-the-bottle Sort
In this appendix, we prove Theorem 2.2, which states that, given an array $A$ of $n$ elements, the three phases of Spin-the-bottle sort run in $O(n^2 \log n)$ time and sort $A$ with very high probability.

The proof is based on showing that we can achieve each of the milestones marking each phase in $O(n^2 \log n)$ time or better.

Phase 1. Let $X_j$ be a random variable that equals the number of inversions resolved in round $j$ of Phase 1, and let $X_{i,j}$ denote an indicator random variable that is 1 iff we perform a comparison in iteration (round) $j$ of the algorithm between $A[i]$ and an element that caused an inversion with $A[i]$ at the beginning of round $j$. Thus,

$$X_j \geq \frac{\sum_{i=1}^{n} X_{i,j}}{2},$$
since each inversion involves two elements of $A$. The $X_{i,j}$'s are independent. Furthermore,

$$E(X_{i,j}) = \frac{m_{i,j}}{n-1},$$

where $m_{i,j}$ denotes the number of inversions that exist at the beginning of round $j$ and involve $A[i]$. Therefore,

$$E(X_j) \geq (1/2) \sum_{i=1}^{n} \frac{m_{i,j}}{n-1} = M_j/(n-1),$$

where $M_j$ is the number of inversions in $A$ that exist at the beginning of round $j$. Thus, by a well-known Chernoff bound,

$$\Pr\left(X_j < M_j/2(n-1)\right) \leq \left(\frac{e^{-1/2}}{(1/2)^{1/2}}\right)^{M_j/(n-1)} \leq 2^{-M_j/3(n-1)} \leq n^{-4},$$

since we are in Phase 1. So we may assume with probability at least $1 - c/n^3$ that the following recurrence relation holds during Phase 1, for all $1 \leq j \leq cn$, for any constant $c \geq 1$:

$$M_{j+1} \leq M_j - \frac{M_j}{2n}.$$

Therefore, with probability at least $1 - 4/n^3$, there are at most $4n$ rounds during Phase 1 of Spin-the-bottle sort, since $M_1 = M < n^2$ and $M_j \geq 12 n \log n$, for all $j$ during Phase 1. That is, with very high probability, Phase 1 runs in $O(n^2)$ time.
Phase 2. For this phase, let $X_j$ and $X_{i,j}$ denote random variables defined as in our analysis of Phase 1, with the index $j$ reset to 1 for Phase 2. In this case,

$$E(X_j) \geq M_j/(n-1) \geq 12.$$

Thus, by a similar Chernoff bound used for analyzing Phase 1,

$$\Pr(X_j < 6) \leq \Pr\left(X_j < M_j/2(n-1)\right) \leq 2^{-M_j/3(n-1)} \leq 2^{-4},$$

since we are in Phase 2. That is, with probability at most 1/16 we resolve fewer than 6 inversions in round $j$ of Phase 2. Call round $j$ a failure in this case, and call it a success if it resolves at least 6 inversions. Let $Y_j$ be an indicator random variable that is 1 iff we resolve fewer than 6 inversions in round $j$ of Phase 2, or, if $j$ is larger than the number of rounds in Phase 2, then let $Y_j$ be an independent random variable that is 1 with probability 1/16. Thus, the number of failure rounds in the first at most $4n \log n$ rounds of Phase 2 is at most

$$Y = \sum_{j=1}^{4n \log n} Y_j.$$

Note that $E(Y) = (1/4) n \log n$. Thus, by a standard Chernoff bound,

$$\Pr(Y > 2n \log n) = \Pr\left(Y > 8 \cdot (1/4) n \log n\right) \leq \left(\frac{e^7}{8^8}\right)^{(1/4) n \log n} \leq 2^{-2n \log n} = n^{-2n}.$$

Note, in addition, that there can be, in total, at most $2n \log n$ successful rounds in Phase 2. Thus, with very high probability, there are only $O(n \log n)$ rounds in Phase 2. That is, with very high probability, Phase 2 runs in $O(n^2 \log n)$ time.
Phase 3. The analysis for this phase is similar to that for the coupon collector's problem (e.g., see [7]). At the start of this phase, there are fewer than $12n$ inversions that remain in $A$. Note that, for any such inversion, $\chi$, the probability that $\chi$ is resolved in a round of Phase 3 is at least⁶ $1/n$. Let $Z^r_\chi$ be the event that $\chi$ is not resolved after $r$ rounds of Phase 3. Thus,

$$\Pr(Z^r_\chi) \leq \left(1 - \frac{1}{n}\right)^r \leq e^{-r/n}.$$

Let $R$ denote the number of rounds needed to resolve all the inversions in Phase 3. Then, for $c \geq 2$,

$$\Pr(R > cn \ln n) \leq \Pr\left(\bigcup_\chi Z^{cn \ln n}_\chi\right) \leq \sum_\chi \Pr\left(Z^{cn \ln n}_\chi\right) \leq \frac{12}{n^{c-1}}.$$

Thus, with very high probability, $R$ is $O(n \log n)$; hence, with very high probability, Phase 3 runs in $O(n^2 \log n)$ time. This completes the proof.

⁶In fact, the probability that $\chi$ is resolved in a round of Phase 3 is equal to $2/(n-1) - 1/(n-1)^2$, since each inversion has two chances of being resolved during a round.
B Proof of the Inductive Claim for Phase 1 of Annealing Sort
In this appendix, we prove Claim 3.7, which states that, after iteration $d$, for each region $A_i$, the dirtiness of $A_i$ is at most $\delta(A_i)$, provided $A_i$ is not extreme, and that the total dirtiness of all extreme regions is at most $8ed \log^2 n$. As mentioned above, this analysis for Phase 1 of Annealing sort borrows from our analysis of randomized Shellsort [17], as there is a similar structure to our inductive argument even though the fine details are quite different.

Let us begin at the first round, which we are viewing in terms of two regions, $A_1$ and $A_2$, of size $N = n/2$ each. Suppose that $k \leq n - k$, where $k$ is the number of ones, so that $A_1$ is a low region and $A_2$ is either a high region (i.e., if $k = n - k$) or $A_2$ is mixed (the case when $k > n - k$ is symmetric). Let $k_1$ (resp., $k_2$) denote the number of ones in $A_1$ (resp., $A_2$), so $k = k_1 + k_2$. By the Startup Lemma (3.5), the dirtiness of $A_1$ will be at most $n/12$, with very high probability, since in this case (using the notation of that lemma and viewing $A$ as existing inside a larger array of size $2n$), $\alpha = 1$, $\beta \leq 1/4$, and $\lambda = 1/6$, so $1 - \alpha/4 + \beta - \lambda \leq 1 - 1/6$. Note that this satisfies the desired dirtiness of $A_1$, since $\delta(A_1) = n/10$ in this case. A similar argument applies to $A_2$ if it is a high region, and if $A_2$ is mixed, it trivially satisfies its desired dirtiness bound. Also, assuming $n$ is large enough, there are no extreme regions (if $n$ is so small that $A_1$ is extreme, we can immediately switch to Phase 2). The next round of Annealing sort (with temperature $2n$) can only improve the dirtiness in $A$. Thus, we satisfy the base case of our inductive argument: the dirtiness bounds for the two children of the root of $B$ are satisfied with (very) high probability, and similar arguments prove the inductive claim for iterations 3 and 4, for $N = n/2^2$ and temperature $n$, and iterations 5 and 6, for $N = n/2^3$ and temperature $n/2$.
Let us now consider a general inductive step. Let us assume that, with very high probability, we have satisfied Claim 3.7 for the regions on level $d \geq 3$, and let us now consider the transition to level $d+1$, which occurs in iterations $2d+1$ and $2d+2$. In addition, we terminate this line of reasoning when the region size, $n/2^d$, becomes less than $64e^2 \log^6 n$.
Extreme Regions. Let us begin with the bound for the dirtiness of extreme regions at depth $d+1$, considering the effect of iteration $2d+1$. Note that, by Lemma 3.6, regions that were extreme after iteration $2d$ will be split into regions in iteration $2d+1$ that contribute no new amounts of dirtiness to pre-existing extreme regions. That is, extreme regions get split into extreme regions. Thus, the new dirtiness for extreme regions can come only from regions that were not extreme on level $d$ of $B$ that are now splitting into extreme regions on level $d+1$, which we call freshly extreme regions. Suppose, then, that $A_i$ is such a region, say, with a parent, $A_p$, which is $j$ regions from the mixed region on level $d$. Then the desired dirtiness bound of $A_i$'s parent region, $A_p$, is $\delta(A_p) = n/2^{d+j+3} \geq 8e \log n$, by Claim 3.7, since $A_p$ is not extreme. $A_p$ has (low-region) children, $A_i$ and $A_{i+1}$, that have desired dirtiness bounds of $\delta(A_i) = n/2^{d+1+2j+4}$ or $\delta(A_i) = n/2^{d+1+2j+3}$ and of $\delta(A_{i+1}) = n/2^{d+1+2j+3}$ or $\delta(A_{i+1}) = n/2^{d+1+2j+2}$, depending on whether the mixed region on level $d+1$ has an odd or even index. Moreover, $A_i$ (and possibly $A_{i+1}$) is freshly extreme, so $n/2^{d+1+2j+4} < 8e \log n$, which implies that $j > (\log n - d - \log \log n - 10)/2$. Nevertheless, note also that there are $O(\log n)$ new regions on this level that are just now becoming extreme, since $n/2^d > 64e^2 \log^6 n$ and $n/2^{d+j+3} \geq 8e \log n$ implies $j \leq \log n - d$. So let us consider the two freshly extreme regions, $A_i$ and $A_{i+1}$, in turn, and how a pass of Annealing sort affects them (for after that they will collectively satisfy the extreme-region part of Claim 3.7).
• Region $A_i$: Consider the worst case for $\delta(A_i)$, namely, that $\delta(A_i) = n/2^{d+1+2j+4}$. Since $A_i$ is a left child of $A_p$, $A_i$ could get at most $n/2^{d+j+3} + 8ed \log^2 n$ ones from regions left of $A_i$, by Lemma 3.6. In addition, $A_i$ and $A_{i+1}$ could inherit at most $\delta(A_p) = n/2^{d+j+3}$ ones from $A_p$. Thus, if we let $N$ denote the size of $A_i$, i.e., $N = n/2^{d+1}$, then $A_i$ and $A_{i+1}$ together have at most $N/2^{j+1} + 3N^{1/2} \leq N/2^j$ ones, since we stop Phase 1 when $N < 64e^2 \log^6 n$. In addition, assuming $j \geq 4$, regions $A_{i+2}$ and $A_{i+3}$ may inherit at most $n/2^{d+j+2}$ ones from their parent and region $A_{i+4}$ may inherit at most $n/2^{d+j+1}$ ones from its parent. Therefore, by the Sliding-Window Lemma (3.2), with $\beta = 5/2^{j+3} < 1/2^j$, the following condition holds with probability at least $1 - cn^{-4}$,

$$k^{(c)}_1 \leq \max\{2\beta^c N,\ 8e \log n\},$$

where $k^{(c)}_1$ is the number of ones left in $A_i$ after an up-pass of Annealing sort with temperature $4N$ and repetition factor $c$. Note that, if $k^{(c)}_1 \leq 8e \log n$, then we have satisfied the desired dirtiness for $A_i$. Alternatively, so long as $c \geq 4$ and $j \geq 4$, then w.v.h.p.,

$$k^{(c)}_1 \leq 2\beta^c N \leq \frac{n}{2^{d+jc}} \leq \frac{n}{2^{d+1+2j+4}} \leq 8e \log n.$$
• Region $A_{i+1}$: Consider the worst case for $\delta(A_{i+1})$, namely $\delta(A_{i+1}) = n/2^{d+1+2j+3}$. Since, in this case, $A_{i+1}$ is a right child of $A_p$, $A_{i+1}$ could get at most $n/2^{d+j+3} + 8ed \log^2 n$ ones from regions left of $A_{i+1}$, by Lemma 3.6, plus $A_{i+1}$ could inherit at most $\delta(A_p) = n/2^{d+j+3}$ ones from $A_p$ itself. In addition, since $j \geq 3$, $A_{i+2}$ and $A_{i+3}$ could inherit at most $n/2^{d+j+2}$ ones from their parent, and $A_{i+4}$ and $A_{i+5}$ could inherit at most $n/2^{d+j+1}$ ones from their parent. Thus, if we let $N$ denote the size of $A_{i+1}$, i.e., $N = n/2^{d+1}$, then $A_{i+1}$ through $A_{i+5}$ together have at most $N/2^{j+1} + 3N^{1/2} + N/2^{j+1} + N/2^j \leq 4N/2^j$ ones, since we stop Phase 1 when $N < 64e^2 \log^6 n$, and $j \geq 4$. By the Sliding-Window Lemma (3.2), applied with $\beta = 1/2^j$, the following condition holds with probability at least $1 - cn^{-4}$,

$$k^{(c)}_1 \leq \max\{2\beta^c N,\ 8e \log n\},$$

where $k^{(c)}_1$ is the number of ones left in $A_{i+1}$ after a pass of Annealing sort with repetition factor $c$ and temperature $4N$. Note that, if $k^{(c)}_1 \leq 8e \log n$, then we have satisfied the desired dirtiness bound for $A_{i+1}$. Alternatively, so long as $c \geq 4$ and $j \geq 4$, then w.v.h.p.,

$$k^{(c)}_1 \leq 2\beta^c N \leq \frac{n}{2^{d+jc}} \leq \frac{n}{2^{d+1+2j+4}} \leq 8e \log n.$$
Therefore, if a low region $A_i$ or $A_{i+1}$ becomes freshly extreme in iteration $2d+1$, then, w.v.h.p., its dirtiness is at most $8e \log n$. Since there are at most $\log n$ freshly extreme regions created in iteration $2d+1$, this implies that the total dirtiness of all extreme low regions in iteration $2d+1$ is at most $8e(d+1) \log^2 n$, w.v.h.p., after the right-moving pass of Phase 1, by Claim 3.7. Likewise, by symmetry, a similar claim applies to the high regions after the left-moving pass of Phase 1. Moreover, by Lemma 3.6, these extreme regions will continue to satisfy Claim 3.7 after this.
Non-extreme Regions not too Close to the Crossover Point. Let us now consider non-extreme regions on level $d+1$ that are at least two regions away from the crossover point on level $d+1$. Consider, wlog, a low region, $A_p$, on level $d$, which is $j$ regions from the crossover point on level $d$, with $A_p$ having (low-region) children, $A_i$ and $A_{i+1}$, that have desired dirtiness bounds of $\delta(A_i) = n/2^{d+1+2j+4}$ or $\delta(A_i) = n/2^{d+1+2j+3}$ and of $\delta(A_{i+1}) = n/2^{d+1+2j+3}$ or $\delta(A_{i+1}) = n/2^{d+1+2j+2}$, depending on whether the mixed region on level $d+1$ has an odd or even index. By Lemma 3.6, if we can show w.v.h.p. that the dirtiness of each such $A_i$ (resp., $A_{i+1}$) is at most $\delta(A_i)/3$ (resp., $\delta(A_{i+1})/3$), after the up-and-down pass of Phase 1, then no matter how many more ones come into $A_i$ or $A_{i+1}$ from the left during the rest of iteration $2d+1$ (and $2d+2$), they will satisfy their desired dirtiness bounds.

Let us consider the different region types (always taking the most difficult choice for each desired dirtiness in order to avoid additional cases):
• Type 1: $\delta(A_i) = n/2^{d+1+2j+4}$, with $j \geq 4$. Since $A_i$ is a left child of $A_p$, in this case, $A_i$ could get at most $n/2^{d+j+3} + 8ed \log^2 n$ ones from regions left of $A_i$, by Lemma 3.6. In addition, $A_i$ and $A_{i+1}$ could inherit at most $\delta(A_p) = n/2^{d+j+3}$ ones from $A_p$. Thus, if we let $N$ denote the size of $A_i$, i.e., $N = n/2^{d+1}$, then $A_i$ and $A_{i+1}$ together have at most $N/2^{j+1} + 3N^{1/2} \leq N/2^j$ ones, since we stop Phase 1 when $N < 64e^2 \log^6 n$. In addition, $A_{i+2}$ and $A_{i+3}$ inherit at most $n/2^{d+j+2}$ ones from their parent. Likewise, $A_{i+4}$ inherits at most $n/2^{d+j+1}$ ones from its parent. Thus, $A_i$ through $A_{i+4}$ inherit at most $N/2^j + N/2^{j+1} + N/2^j \leq N/2^{j-2}$ ones. Thus, we can apply the Sliding-Window Lemma (3.2), with $\beta = 1/2^j$, so that the following condition holds with probability at least $1 - n^{-4}$, provided $c \geq 4$ and $j \geq 4$:

$$k^{(c)}_1 \leq 2\beta^c N \leq \frac{n}{2^{d+jc}} \leq \frac{n}{3 \cdot 2^{d+1+2j+4}} = \delta(A_i)/3,$$

where $k^{(c)}_1$ is the number of ones left in $A_i$ after a pass of Annealing sort with repetition factor $c$.
• Type 2: $\delta(A_{i+1}) = n/2^{d+1+2j+3}$, with $j \geq 4$. Since $A_{i+1}$ is a right child of $A_p$, in this case, $A_{i+1}$ could get at most $n/2^{d+j+3} + 8ed \log^2 n$ ones from regions left of $A_{i+1}$, by Lemma 3.6, plus $A_{i+1}$ could inherit at most $\delta(A_p) = n/2^{d+j+3}$ ones from $A_p$. In addition, since $j > 2$, $A_{i+2}$ and $A_{i+3}$ could inherit at most $n/2^{d+j+2}$ ones from their parent. Thus, if we let $N$ denote the size of $A_{i+1}$, i.e., $N = n/2^{d+1}$, then $A_{i+1}$, $A_{i+2}$, and $A_{i+3}$ together have at most $N/2^j + 3N^{1/2} \leq N/2^{j-1}$ ones, since we stop Phase 1 when $N < 64e^2 \log^6 n$. In addition, $A_{i+4}$ and $A_{i+5}$ may inherit $n/2^{d+j+1}$ ones from their parent. Thus, $A_{i+1}$ through $A_{i+5}$ may receive $N/2^{j-1} + N/2^j \leq N/2^{j-2}$ ones. Therefore, with $\beta = 1/2^j$, we may apply the Sliding-Window Lemma (3.2) to show that, with probability at least $1 - n^{-4}$, for $j \geq 4$ and $c \geq 4$,

$$k^{(c)}_1 \leq 2\beta^c N \leq \frac{n}{2^{d+jc}} \leq \frac{n}{3 \cdot 2^{d+1+2j+3}} = \delta(A_{i+1})/3,$$

where $k^{(c)}_1$ is the number of ones left in $A_{i+1}$ after a pass of Annealing sort with repetition factor $c$.
• Type 3: $\delta(A_i) = n/2^{d+1+2j+4}$, with $j = 3$. Since $A_i$ is a left child of $A_p$, in this case, $A_i$ could get at most $n/2^{d+j+3} + 8ed \log^2 n$ ones from regions left of $A_i$, by Lemma 3.6. In addition, $A_i$ and $A_{i+1}$ could inherit at most $\delta(A_p) = n/2^{d+j+3}$ ones from $A_p$. Thus, if we let $N$ denote the size of $A_i$, i.e., $N = n/2^{d+1}$, then $A_i$ and $A_{i+1}$ together have at most $N/2^{j+1} + 3N^{1/2} \leq N/2^j = N/2^3$ ones, since we stop Phase 1 when $N < 64e^2 \log^6 n$. In addition, $A_{i+2}$ and $A_{i+3}$ inherit at most $n/2^{d+j+2} = N/2^4$ ones from their parent. Finally, $A_{i+4}$ inherits at most $n/(5 \cdot 2^d) = 2N/5$ ones from its parent. Thus, $A_i$ through $A_{i+4}$ inherit at most $N/2^3 + N/2^4 + 2N/5 \leq 5N/2^3 = 5N/2^j$ ones. Thus, we can apply the Sliding-Window Lemma (3.2), with $\beta = 5/2^{j+2}$, so that the following condition holds with probability at least $1 - n^{-4}$, for $c \geq 5$ and $j = 3$:

$$k^{(c)}_1 \leq 2\beta^c N \leq \frac{5^c n}{2^{d+(j+2)c}} \leq \frac{n}{3 \cdot 2^{d+1+2j+4}} = \delta(A_i)/3,$$

where $k^{(c)}_1$ is the number of ones left in $A_i$ after a pass of Annealing sort with repetition factor $c$ and temperature $4N$.
• Type 4: $\delta(A_{i+1}) = n/2^{d+1+2j+3}$, with $j = 3$. Since $A_{i+1}$ is a right child of $A_p$, in this case, $A_{i+1}$ could get at most $n/2^{d+j+3} + 8ed \log^2 n$ ones from regions left of $A_{i+1}$, by Lemma 3.6, plus $A_{i+1}$ could inherit at most $\delta(A_p) = n/2^{d+j+3}$ ones from $A_p$. In addition, since $j > 2$, $A_{i+2}$ and $A_{i+3}$ could inherit at most $n/2^{d+j+2}$ ones from their parent. Thus, if we let $N$ denote the size of $A_{i+1}$, i.e., $N = n/2^{d+1}$, then $A_{i+1}$, $A_{i+2}$, and $A_{i+3}$ together have at most $N/2^j + 3N^{1/2} \leq N/2^{j-1}$ ones, since we stop Phase 1 when $N < 64e^2 \log^6 n$. In addition, $A_{i+4}$ and $A_{i+5}$ may inherit $n/(5 \cdot 2^d)$ ones from their parent. Thus, $A_{i+1}$ through $A_{i+5}$ may receive $N/2^{j-1} + 2N/5 < (2/3)N$ ones. Therefore, with $\beta < 1/6$, we may apply the Sliding-Window Lemma (3.2) to show that, with probability at least $1 - n^{-4}$, for $j = 3$ and $c \geq 6$,

$$k^{(c)}_1 \leq 2\beta^c N \leq \frac{n}{3^c 2^{d+1}} \leq \frac{n}{3 \cdot 2^{d+1+2j+3}} = \delta(A_{i+1})/3,$$

where $k^{(c)}_1$ is the number of ones left in $A_{i+1}$ after a pass of Annealing sort with repetition factor $c$.
• Type 5: $\delta(A_i) = n/2^{d+1+2j+4}$, with $j = 2$. Since $A_i$ is a left child of $A_p$, in this case, $A_i$ could get at most $n/2^{d+j+3} + 8ed \log^2 n$ ones from regions left of $A_i$, by Lemma 3.6. In addition, $A_i$ and $A_{i+1}$ could inherit at most $\delta(A_p) = n/2^{d+j+3}$ ones from $A_p$. Thus, if we let $N$ denote the size of $A_i$, i.e., $N = n/2^{d+1}$, then $A_i$ and $A_{i+1}$ together have at most $N/2^{j+1} + 3N^{1/2} \leq N/2^j = N/2^2$ ones, since we stop Phase 1 when $N < 64e^2 \log^6 n$. In addition, $A_{i+2}$ and $A_{i+3}$ inherit at most $2N/5$ ones from their parent. Thus, we can apply the Fractional-Depletion Lemma (3.4), with $\alpha = 3$ and $\beta < 1/6$, so that the following condition holds with probability at least $1 - n^{-4}$, for $c \geq 9$ and $j = 2$:

$$k^{(c)}_1 \leq 2\left(\frac{1}{4} + \frac{1}{6}\right)^c N \leq \frac{n}{3 \cdot 2^{d+1+2j+4}} = \delta(A_i)/3,$$

where $k^{(c)}_1$ is the number of ones left in $A_i$ after a pass of Annealing sort with repetition factor $c$ and temperature $4N$.
• Type 6: $\delta(A_{i+1}) = n/2^{d+1+2j+3}$, with $j = 2$. Since $A_{i+1}$ is a right child of $A_p$, in this case, $A_{i+1}$ could get at most $n/2^{d+j+3} + 8ed \log^2 n$ ones from regions left of $A_{i+1}$, by Lemma 3.6, plus $A_{i+1}$ could inherit at most $\delta(A_p) = n/2^{d+j+3}$ ones from $A_p$. In addition, since $j = 2$, $A_{i+2}$ and $A_{i+3}$ could inherit at most $2N/5$ ones from their parent, where we let $N$ denote the size of $A_{i+1}$, i.e., $N = n/2^{d+1}$. Thus, $A_{i+1}$, $A_{i+2}$, and $A_{i+3}$ together have at most $N/2^{j+1} + 3N^{1/2} + 2N/5 \leq (2/3)N$ ones, since we stop Phase 1 when $N < 64e^2 \log^6 n$. Thus, $A_{i+1}$ through $A_{i+5}$ may receive $N/2^{j-1} + 2N/5 < (2/3)N$ ones. Therefore, with $\alpha = 3$ and $\beta < 1/6$, we may apply the Fractional-Depletion Lemma to show that, with probability at least $1 - n^{-4}$, for $c \geq 9$ and $j = 2$:

$$k^{(c)}_1 \leq 2\left(\frac{1}{4} + \frac{1}{6}\right)^c N \leq \frac{n}{3 \cdot 2^{d+1+2j+3}} = \delta(A_{i+1})/3,$$

where $k^{(c)}_1$ is the number of ones left in $A_{i+1}$ after a pass of Annealing sort with repetition factor $c$ and temperature $4N$.
• Type 7: $\delta(A_i) = n/2^{d+1+2j+4}$, with $j = 1$. Since $A_i$ is a left child of $A_p$, in this case, $A_i$ could get at most $n/2^{d+j+2} + 8ed \log^2 n$ ones from regions left of $A_i$, by Lemma 3.6, plus $A_i$ and $A_{i+1}$ could inherit at most $\delta(A_p) = n/(5 \cdot 2^d)$ ones from $A_p$. Thus, if we let $N$ denote the size of $A_i$, i.e., $N = n/2^{d+1}$, then $A_i$ and $A_{i+1}$ together have at most $N/2^{j+1} + 2N/5 + 3N^{1/2} \leq 7N/10$ ones, since we stop Phase 1 when $N < 64e^2 \log^6 n$. Thus, we may apply the Fractional-Depletion Lemma (3.4), with $\alpha = 1$ and $\beta = 0.175$, so that the following condition holds with probability at least $1 - n^{-4}$, for a suitably-chosen constant $c$, with $j = 1$,

$$k^{(c)}_1 \leq 2(0.925)^c N \leq \frac{n}{3 \cdot 2^{d+1+2j+4}} = \delta(A_i)/3,$$

where $k^{(c)}_1$ is the number of ones left in $A_i$ after a pass of Annealing sort with repetition factor $c$.

Thus, $A_i$ and $A_{i+1}$ satisfy their respective desired dirtiness bounds w.v.h.p., provided they are at least two regions from the mixed region or crossover point.
Regions near the Crossover Point. Consider now regions near the crossover point. That is, each region with a parent that is mixed, bordering the crossover point, or next to a region that either contains or borders the crossover point. Let us focus specifically on the case when there is a mixed region on levels $d$ and $d+1$, as it is the most difficult of these scenarios.

So, having dealt with all the other regions, which have their desired dirtiness satisfied after a single up-and-down pass of Phase 1, with temperature $4N$, we are left with four regions near the crossover point, each of size $N = n/2^{d+1}$, which we will refer to as $A_1$, $A_2$, $A_3$, and $A_4$. One of $A_2$ or $A_3$ is mixed; without loss of generality, let us assume $A_3$ is mixed. At this point in the algorithm, we perform another up-and-down pass with temperature $4N$. So, let us consider how this pass impacts the dirtiness of these four regions. Note that, by the results of the previous pass with temperature $4N$ (which were proved above), we have at this point pushed to these four regions all but at most $n/2^{d+7} + 8e(d+1) \log^2 n$ of the ones and all but at most $n/2^{d+6} + 8e(d+1) \log^2 n$ of the zeroes. Moreover, these bounds will continue to hold (and could even improve) as we perform the second up-and-down pass with temperature $4N$. Thus, at the beginning of this second pass, we know that the four regions hold between $2N - N/32 - 3N^{1/2}$ and $3N + N/64 + 3N^{1/2}$ zeroes and between $N - N/64 - 3N^{1/2}$ and $2N + N/32 + 3N^{1/2}$ ones, where $N = n/2^{d+1} > 64e^2 \log^6 n$. Let us therefore consider the impact of the second pass with temperature $4N$ for each of these four regions:
$A_1$: this region is compared to $A_2$, $A_3$, and $A_4$ during the up-pass. Thus, we may apply the Fractional-Depletion Lemma (3.4) with $\alpha = 3$. Note, in addition, that, for $N$ large enough, since there are at most $2N + N/32 + 3N^{1/2} \le 2.2N$ ones in all of these four regions, we may apply the Fractional-Depletion Lemma with $\beta = 0.55$. Thus, the following condition holds with probability at least $1 - n^{-4}$, for a suitably-chosen constant $c$:
$$k_1^{(c)} \;\le\; 2(0.8)^{c} N \;\le\; \frac{N}{32} \;=\; \delta(A_1),$$
where $k_1^{(c)}$ is the number of ones left in $A_1$ after a pass of Annealing sort with repetition factor $c$ and temperature $4N$.
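Concretely, $2(0.8)^{c} N \le N/32$ holds once $(4/5)^{c} \le 1/64$, i.e., once
$$c \;\ge\; \frac{\ln 64}{\ln(5/4)} \;\approx\; 18.6,$$
so $c = 19$ suffices for this region.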
$A_2$: each element of this region is compared to elements in $A_3$ and $A_4$ in the up-pass and $A_1$ in the down-pass. Note, however, that even if $A_1$ receives $N$ zeroes in the up-pass, there are still at most $2N + N/32 + 3N^{1/2} \le 2.2N$ ones in $A_2 \cup A_3 \cup A_4$. Thus, even under this worst-case scenario (from $A_2$'s perspective), we may apply the Startup Lemma (3.5), with $\alpha = 2$, $\beta = 0.55$, and $\lambda = 1/6$, which implies that
$$1 - \alpha/4 + \beta - \lambda \;\le\; 1 - 1/10,$$
i.e., we can take $\epsilon = 1/10$ and show that there is a constant $c$ such that, w.v.h.p.,
$$k_1^{(c)} \;\le\; \frac{N}{6} \;<\; \delta(A_2),$$
where $k_1^{(c)}$ is the number of ones left in $A_2$ after an up-pass of Annealing sort with repetition factor $c$ and temperature $4N$.
$A_3$: by assumption, $A_3$ is mixed, so it automatically satisfies its desired dirtiness bound.
$A_4$: this region is compared to $A_1$, $A_2$, and $A_3$ in the down-pass. Note further that, w.v.h.p., there are at most $3N + N/64 + 3N^{1/2} \le 3.2N$ zeroes in these four regions, for large enough $N$. Thus, we may apply a symmetric version of the Startup Lemma (3.5), with $\alpha = 3$, $\beta = 0.8$, and $\lambda = 1/6$, which implies
$$1 - \alpha/4 + \beta - \lambda \;\le\; 1 - 1/10,$$
i.e., we can take $\epsilon = 1/10$ and show that there is a constant $c$ such that, w.v.h.p.,
$$k_1^{(c)} \;\le\; \frac{N}{6} \;<\; \delta(A_4),$$
where $k_1^{(c)}$ is the number of ones left in $A_4$ after a down-pass of Annealing sort with repetition factor $c$ and temperature $4N$.
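It is worth noting that the two applications of the Startup Lemma reduce to the same arithmetic: for $A_2$ we get $1 - 2/4 + 0.55 - 1/6 = 53/60$, and for $A_4$ we get $1 - 3/4 + 0.8 - 1/6 = 53/60$, and in both cases
$$\frac{53}{60} \;\le\; \frac{54}{60} \;=\; 1 - \frac{1}{10},$$
which is why the single choice $\epsilon = 1/10$ serves both regions.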
Thus, after the two up-and-down passes of Annealing sort with temperature $4N$, we will have satisfied Claim 3.7 w.v.h.p. In particular, we have proved that each region satisfies Claim 3.7 after iteration $2(d+1)$ of Phase 1 of Annealing sort with a failure probability of at most $O(n^{-4})$ per region. Thus, since there are $O(n)$ such regions per iteration, any iteration will fail with probability at most $O(n^{-3})$. Therefore, since there are $O(\log n)$ iterations, and we lose only an $O(n)$ factor in our failure probability when we apply the probabilistic zero-one principle (Lemma 3.1), when we complete the first phase of Annealing sort, w.v.h.p., at the beginning of Phase 2, the total dirtiness of all extreme regions is at most $8e\log^3 n$, and the size of each such region is $g\log^6 n$, for $g = 64e^2$.
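Composing this accounting in one chain, the overall failure probability for Phase 1 is at most
$$\underbrace{O(n^{-4})}_{\text{per region}} \cdot \underbrace{O(n)}_{\text{regions per iteration}} \cdot \underbrace{O(\log n)}_{\text{iterations}} \cdot \underbrace{O(n)}_{\text{zero-one principle}} \;=\; O\!\left(\frac{\log n}{n^{2}}\right),$$
which is polynomially small, as required for the w.v.h.p. conclusion.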