
Spin-the-bottle Sort and Annealing Sort: Oblivious Sorting via Round-robin Random Comparisons

Michael T. Goodrich

University of California, Irvine

Abstract

We study sorting algorithms based on randomized round-robin comparisons. Specifically, we study Spin-the-bottle sort, where comparisons are unrestricted, and Annealing sort, where comparisons are restricted to a distance bounded by a temperature parameter. Both algorithms are simple, randomized, data-oblivious sorting algorithms, which are useful in privacy-preserving computations, but, as we show, Annealing sort is much more efficient. We show that there is an input permutation that causes Spin-the-bottle sort to require Ω(n^2 log n) expected time in order to succeed, and that in O(n^2 log n) time this algorithm succeeds with high probability for any input. We also show there is an implementation of Annealing sort that runs in O(n log n) time and succeeds with very high probability.

1 Introduction

The sorting problem is classic in computer science, with well over a fifty-year history (e.g., see [3, 20, 24, 39, 42]). In this problem, we are given an array, A, of n elements taken from some total order and we are interested in permuting A so that the elements are listed in order¹. In this paper, we are interested in randomized sorting algorithms based on simple round-robin strategies of scanning the array A while performing, for each i = 1, 2, ..., n, a compare-exchange operation between A[i] and A[s], where s is a randomly-chosen index not equal to i.

In addition to its simplicity, sorting via round-robin compare-exchange operations, in this manner, is data-oblivious. That is, if we view compare-exchange operations as a blackbox primitive, then the sequence of operations performed by such a randomized sorting algorithm is independent of the input permutation.
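Data-obliviousness can be seen concretely: with the randomness fixed, the sequence of index pairs examined by one round-robin round depends only on n, never on the array's contents. The helper below is a hypothetical illustration of this point, not an algorithm from the paper:

```python
import random

def round_robin_pairs(n, seed):
    """Index pairs (i, s) touched by one round of random round-robin choices.

    The pairs depend only on n and the seed, not on any array contents,
    which is exactly the data-obliviousness property described above.
    """
    rng = random.Random(seed)
    pairs = []
    for i in range(n):
        # choose s uniformly from the n - 1 indices other than i
        s = rng.randrange(n - 1)
        if s >= i:
            s += 1
        pairs.append((i, s))
    return pairs
```

Running it twice with the same seed yields the identical comparison schedule, regardless of what data would be sorted.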

Any data-oblivious sorting algorithm can also be viewed as a sorting network [26], where the elements in the input array are provided on n input wires and internal gates are compare-exchange operations. Ajtai, Komlós, and Szemerédi (AKS) [1] give a sorting network with O(n log n) compare-exchange gates, but their method is quite complicated and has a very large constant factor, even with known improvements [32, 38]. Leighton and Plaxton [27] and Goodrich [17] describe alternative randomized sorting networks that use O(n log n) compare-exchange gates and sort any given input array with very high probability. None of these previous approaches are based on simple round-robin comparison strategies, however.

Data-oblivious sorting algorithms are often motivated from their ability to be implemented in special-purpose hardware modules [24], but such algorithms also have applications in secure multi-party computation (SMC) protocols (e.g., see [4, 10, 14, 15, 28, 29]). In such protocols, two or more parties separately hold different portions of a set of data values, {x_1, x_2, ..., x_n}, and are interested in computing some function, f(x_1, x_2, ..., x_n), without revealing their respective data values (e.g., see [4, 28, 40]). Thus, the design of simpler data-oblivious sorting algorithms can lead to simpler SMC protocols.

¹ Since we are focusing on comparison-based algorithms here, let us assume, without loss of generality, that the elements of A are distinct, e.g., by a mapping A[i] → (A[i], i) and then using lexicographic ordering for comparisons.


arXiv:1011.2480v1 [cs.DS] 10 Nov 2010

1.1 Previous Related Work

In spite of their simplicity, we are not familiar with previous work on data-oblivious sorting algorithms based on round-robin random comparisons. So we review below some of the previous work on sorting that is related to the various properties that are of interest in this paper.

Sorting via Random Comparisons. Biedl et al. [5] analyze a simple algorithm, Guess-sort, which iteratively picks two elements in the input array at random and performs a compare-exchange for them, and they show that this method runs in expected time Θ(n^2 log n). In addition, Gruber et al. [19] perform a more exact analysis of this algorithm, which they call Bozo-sort. Neither of these papers considers round-robin random comparisons, however.

Quicksort. Of course, the randomized Quicksort algorithm sorts via round-robin comparisons against a randomly-chosen element, known as a pivot (e.g., see [11, 18, 36]), and this leads to a sorting algorithm that runs in O(n log n) time with high probability. Even so, the set of comparisons is highly dependent on input values. Thus, randomized Quicksort is not a data-oblivious algorithm based on random round-robin compare-exchange operations.

Shellsort. Sorting via data-oblivious round-robin random comparisons has a similar flavor to randomized Shellsort [17], which sorts via random matchings between various subarrays of the input array. Nevertheless, there are some important differences between randomized Shellsort and sorting via round-robin random compare-exchange operations. For instance, the analysis of randomized Shellsort requires an extensive postprocessing step, which we avoid in the analysis of our randomized round-robin sorting algorithms. We also avoid the complexity of previous analyses of deterministic variants of Shellsort (e.g., see [12, 23, 33]), such as that by Pratt [34], which leads to the best known performance for deterministic Shellsort, namely, a worst-case running time of O(n log^2 n). (See also the excellent survey of Sedgewick [37].)

Sorting via Round-robin Passes. Sorting by deterministic round-robin passes is, of course, a classic approach, as in the well-known Bubble-sort algorithm (e.g., see [11, 18, 36]). For instance, Dobosiewicz [13] proposes sorting via various bubble-sort passes: doing a left-to-right sequence of compare-exchanges between elements at offset-distances apart. In addition, Incerpi and Sedgewick [21, 22] study a version of Shellsort that replaces the inner loop with a round-robin "shaker" pass (see also [9, 41]), which is a left-to-right bubble-sort pass followed by a right-to-left bubble-sort pass. These algorithms do not ultimately lead to a time performance that is O(n log n), however.

1.2 Our Results

In this paper, we study two sorting algorithms based on randomized round-robin comparisons. Specifically, we study an algorithm we are calling "Spin-the-bottle sort," where comparisons in each round are arbitrary, and an algorithm we are calling "Annealing sort," where comparisons are restricted to a distance bounded by a temperature parameter. These algorithms are therefore similar to one another, with both being simple, data-oblivious sorting algorithms based on round-robin random compare-exchange operations.

Their respective performance is quite different, however, in that we show there is an input permutation that causes Spin-the-bottle sort to require an expected running time that is Ω(n^2 log n) in order to succeed, and that Spin-the-bottle sort succeeds with high probability for any input permutation in O(n^2 log n) time. That is, Spin-the-bottle sort has an asymptotic expected running time that is actually worse than Bubble sort!

Thus, it is perhaps a bit surprising that, with just a couple of minor changes, Spin-the-bottle sort can be transformed into Annealing sort, which is much more efficient. In particular, Annealing sort is derived by applying the simulated annealing [25] meta-heuristic to Spin-the-bottle sort. There are, of course, multiple ways to apply this meta-heuristic, but we show there is a version of Annealing sort that runs in O(n log n) time and succeeds with very high probability².

² We say an algorithm succeeds with very high probability if success occurs with probability 1 − 1/n^ρ, for some constant ρ ≥ 1.


2 Spin-the-bottle Sort

The simplest sorting algorithm we consider in this paper is Spin-the-bottle sort³, which is given in Figure 1.

while A is not sorted do
    for i = 1 to n do
        Choose s uniformly and independently at random from {1, 2, ..., i − 1, i + 1, ..., n}.
        if (i < s and A[i] > A[s]) or (i > s and A[i] < A[s]) then
            Swap A[i] and A[s].

Figure 1: Spin-the-bottle sort.
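Figure 1 translates almost line-for-line into code. The sketch below is illustrative only (0-based indexing, and a straightforward linear-time scan as the sortedness test); it is not tuned for performance:

```python
import random

def spin_the_bottle_sort(A, rng=None):
    """In-place Spin-the-bottle sort, following Figure 1 (0-based indices)."""
    rng = rng or random.Random()
    n = len(A)

    def is_sorted():
        # straightforward linear-time scan of A
        return all(A[i] <= A[i + 1] for i in range(n - 1))

    while not is_sorted():
        for i in range(n):
            # choose s uniformly at random from the n - 1 indices other than i
            s = rng.randrange(n - 1)
            if s >= i:
                s += 1
            # compare-exchange A[i] and A[s] toward sorted order
            if (i < s and A[i] > A[s]) or (i > s and A[i] < A[s]):
                A[i], A[s] = A[s], A[i]
    return A
```

As the lower bound below makes clear, the expected number of rounds is large, so this is practical only for very small n.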

The test for A being sorted is either done via a straightforward linear-time scan of A or by a heuristic based on counting the number of rounds needed until it is highly likely that A is sorted. In the latter case, this leads to a data-oblivious sorting algorithm, that is, a sorting algorithm for which the sequence of compare-exchange operations is independent of the values of the input, depending only on its size.

2.1 A Lower Bound on the Expected Running Time of Spin-the-bottle Sort

Our analysis of Spin-the-bottle sort is fairly straightforward and shows that this algorithm is asymptotically worse than almost all other published sorting algorithms. Nevertheless, let us go through some details of this analysis, as it provides some intuition for how improvements can be made, which in turn leads to a much more efficient algorithm, Annealing sort.

Let us begin with a lower bound on the expected running time for Spin-the-bottle sort. As was done in the analysis of Guess-sort [5], let us consider the input array A = (2, 1, 4, 3, ..., n, n − 1), albeit now with a different argument as to why this is a difficult input instance.

This array has N = n/2 inversions, with each element participating in exactly one inversion. During any scan of A, each element that has yet to have its inversion resolved has a probability of 1/(n − 1) of resolving its inversion. Considering the sequence of compare-exchange operations that Spin-the-bottle sort performs until A is sorted, let us divide this sequence into maximal epochs of comparisons that do not resolve an inversion followed by one that does. Let X_1, X_2, ..., X_N be a set of random variables where X_i denotes the number of comparisons performed in epoch i, and observe that there are N − i inversions remaining in A after epoch i. Likewise, let Y_1, Y_2, ..., Y_N be a set of random variables where Y_i denotes the number of comparisons performed in epoch i, but only counting each comparison done such that its element, A[i], has not had its inversion resolved in a previous epoch. Note that

    X_i ≥ n Y_i / (n − 2(i − 1)),

since one full round performed in epoch i involves n comparisons, of which n − 2(i − 1) are for elements that have yet to have their inversions resolved.

The running time of Spin-the-bottle sort is proportional to

    X = Σ_{i=1}^{N} X_i.

³ The name comes from a party game, Spin the bottle, where a group of players sit in a circle and take turns, in a round-robin fashion, spinning a bottle in the middle of the circle. When it is a player's turn, he or she spins the bottle and then kisses the person of the appropriate gender nearest to where the bottle points.


Each Y_i is a geometric random variable with parameter p = 1/(n − 1); hence, E(Y_i) = n − 1. Thus,

    E(X) = E( Σ_{i=1}^{N} X_i )
         ≥ E( Σ_{i=1}^{N} n Y_i / (n − 2(i − 1)) )
         ≥ n Σ_{i=1}^{N} ( E(Y_i) / (n − 2(i − 1)) − 1 )
         = n(n − 1) Σ_{i=1}^{N} 1 / (n − 2(i − 1)) − nN
         = n(n − 1) H_{n/4} / 2 − n^2/2,

where H_m denotes the m-th Harmonic number. Thus, E(X) is Ω(n^2 log n) for this input array, giving us the following.

Theorem 2.1: There is an input causing Spin-the-bottle sort to have an expected running time of Ω(n^2 log n).

An important lesson to take away from the proof of the above theorem is that a set of inversions between pairs of close-by elements in A is sufficient to cause Spin-the-bottle sort to have a relatively large expected running time. Intuitively, the algorithm is spending a lot of time for each element A[i] looking throughout the entire array for an inversion that is caused by an element right "next door" to A[i]. Interestingly, this same intuition applies to our upper bound for the running time of Spin-the-bottle sort.

2.2 An Upper Bound on the Running Time of Spin-the-bottle Sort

Let us now consider an upper bound on the running time of Spin-the-bottle sort. Our analysis is based on characterizations involving M, the number of inversions present in A when it is given as input to the algorithm. Let M_j denote the number of inversions that exist in A at the beginning of round j (where a round involves a complete scan of A), so M_1 = M. In addition, let m_{i,j} denote the number of inversions that exist at the beginning of round j and involve A[i], and observe that

    Σ_{i=1}^{n} m_{i,j} = 2 M_j.

We divide the course of the algorithm into three phases, depending on the value of M_j:

• Phase 1: M_j ≥ 12 n log n

• Phase 2: 12n ≤ M_j < 12 n log n

• Phase 3: M_j < 12n.
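Since the phases are defined by the current inversion count, it is worth noting (for experiments; this helper is not part of the algorithm itself) that an inversion count can be computed in O(n log n) time with a standard merge-sort-based counter. A sketch:

```python
def count_inversions(A):
    """Return the number of pairs i < j with A[i] > A[j] (merge-sort based)."""
    def rec(xs):
        if len(xs) <= 1:
            return xs, 0
        mid = len(xs) // 2
        left, a = rec(xs[:mid])
        right, b = rec(xs[mid:])
        merged, inv = [], a + b
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] precedes all remaining left elements,
                # contributing len(left) - i inversions
                inv += len(left) - i
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv
    return rec(list(A))[1]
```

For instance, the lower-bound input (2, 1, 4, 3, ..., n, n − 1) has exactly n/2 inversions.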

Theorem 2.2: Given an array A of n elements, the three phases of Spin-the-bottle sort run in O(n^2 log n) time and sort A with very high probability.

Proof: See Appendix A.

This, of course, is no great achievement, since there are several simple deterministic data-oblivious sorting algorithms that run in O(n log^2 n) time, and even Bubble sort itself is faster than Spin-the-bottle sort, running in O(n^2) time. But the above three-phase characterization nevertheless gives us some intuition that leads to a more efficient sorting algorithm, which we discuss next.


3 Annealing Sort

The sorting algorithm we discuss in this section is based on applying the simulated annealing [25] meta-heuristic to the sorting problem. Following an analogy from metallurgy, simulated annealing involves solving an optimization problem by a sequence of choices, such that choice j is made from among some r_j neighbors of a current state that are confined to be within a distance bounded from above by a parameter T_j (according to an appropriate metric). Given the metallurgical analogy, the parameter T_j is called the temperature, which is gradually decreased during the algorithm according to an annealing schedule, until it is 0, at which point the algorithm halts.

Let us apply this meta-heuristic to sorting, which is admittedly not an optimization problem, so some adaptation is required. That is, let us view each round in a sorting algorithm that is similar to Spin-the-bottle sort as a step in a simulated annealing algorithm. Since each compare-exchange operation is chosen at random, let us now limit, in round j, the distance between candidate comparison elements to a parameter T_j, so as to implement the temperature metaphor, and let us also repeat the random choices for each element r_j times, so as to implement a notion of neighbors of the current state under consideration. The sequence of T_j and r_j values defines the annealing schedule for our Annealing sort.

Formally, let us assume we are given an annealing schedule defined by the following:

• A temperature sequence, T = (T_1, T_2, ..., T_t), where T_i ≥ T_{i+1}, for i = 1, ..., t − 1, and T_t = 0.

• A repetition sequence, R = (r_1, r_2, ..., r_t), of repetition counts r_i, for i = 1, ..., t.

Given these two sequences, Annealing sort is as given in Figure 2.

for j = 1 to t do
    for i = 1 to n − 1 do
        for k = 1 to r_j do
            Let s be a random integer in the range [i + 1, min{n, i + T_j}].
            if A[i] > A[s] then
                Swap A[i] and A[s].
    for i = n downto 2 do
        for k = 1 to r_j do
            Let s be a random integer in the range [max{1, i − T_j}, i − 1].
            if A[s] > A[i] then
                Swap A[i] and A[s].

Figure 2: Annealing sort. It takes as input an array, A, of n elements and an annealing schedule defined by sequences T = (T_1, T_2, ..., T_t) and R = (r_1, r_2, ..., r_t). Note that if the compare-exchange operations are performed as a blackbox, then the algorithm is data-oblivious.
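Figure 2 also translates directly into code. The sketch below is illustrative (0-based indexing, with Python's inclusive `randint` matching the closed ranges in the figure):

```python
import random

def annealing_sort(A, temps, reps, rng=None):
    """In-place Annealing sort, following Figure 2 (0-based indices).

    temps[j] is the temperature T_j bounding the comparison distance in
    round j; reps[j] is the repetition count r_j for each element.
    """
    rng = rng or random.Random()
    n = len(A)
    for T, r in zip(temps, reps):
        if T < 1:
            continue  # a zero temperature ends the schedule; nothing to do
        for i in range(n - 1):                 # up-pass
            for _ in range(r):
                s = rng.randint(i + 1, min(n - 1, i + T))
                if A[i] > A[s]:
                    A[i], A[s] = A[s], A[i]
        for i in range(n - 1, 0, -1):          # down-pass
            for _ in range(r):
                s = rng.randint(max(0, i - T), i - 1)
                if A[s] > A[i]:
                    A[i], A[s] = A[s], A[i]
    return A
```

Note that at T_j = 1 the random choice is forced, so each such round is a deterministic pass of adjacent compare-exchanges.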

The running time of Annealing sort is O(n Σ_{j=1}^{t} r_j), and its effectiveness depends on the annealing schedule, defined by T = (T_1, T_2, ..., T_t) and R = (r_1, r_2, ..., r_t). Fortunately, there is a three-phase annealing schedule that causes Annealing sort to run in O(n log n) time and succeed with very high probability:

• Phase 1. For this phase, let T^(1) = (2n, 2n, n, n, n/2, n/2, n/4, n/4, ..., q log^6 n, q log^6 n) be the temperature sequence and let R^(1) = (c, c, ..., c) be an equal-length repetition sequence (of all c's), where q ≥ 1 and c > 1 are constants.

• Phase 2. For this phase, let T^(2) = (q log^6 n, (q/2) log^6 n, (q/4) log^6 n, ..., g log n) be the temperature sequence and let R^(2) = (r, r, ..., r) be an equal-length repetition sequence, where q is the constant from Phase 1, g ≥ 1 is a constant determined in the analysis, and r is Θ(log n / log log n).

• Phase 3. For this phase, let T^(3) and R^(3) be sequences of length g log n of all 1's.
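To make the schedule concrete, the sketch below builds the three phase sequences. The specific constants (q, c, g, and the constant h inside r) are placeholders of our choosing, not the values fixed by the analysis; moreover, for moderate n the boundary q log^6 n can exceed 2n, so this is for illustration only:

```python
import math

def make_annealing_schedule(n, q=1, c=4, g=1, h=4):
    """Build (temps, reps) for the three-phase schedule described above.

    q, c, g, h are illustrative constants; the paper's analysis determines
    suitable values.
    """
    log_n = max(1.0, math.log2(n))
    loglog_n = max(1.0, math.log2(log_n))
    temps, reps = [], []
    # Phase 1: 2n, 2n, n, n, n/2, n/2, ..., down to q*log^6 n (each value twice)
    boundary = q * log_n ** 6
    T = 2.0 * n
    while T > boundary:
        temps += [int(T)] * 2
        reps += [c] * 2
        T /= 2
    temps += [int(boundary)] * 2
    reps += [c] * 2
    # Phase 2: halve from q*log^6 n down to g*log n, r = Theta(log n / log log n)
    r = max(1, round(h * log_n / loglog_n))
    T = boundary
    while T > g * log_n:
        temps.append(int(T))
        reps.append(r)
        T /= 2
    # Phase 3: g*log n rounds at temperature 1 (deterministic adjacent passes)
    rounds = int(math.ceil(g * log_n))
    temps += [1] * rounds
    reps += [1] * rounds
    # terminating zero temperature
    temps.append(0)
    reps.append(0)
    return temps, reps
```

The resulting temperature sequence is non-increasing and ends at 0, as the schedule's definition requires.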


Given the annealing schedule defined by concatenating the temperature sequences of the three phases (followed by a final temperature of 0), and likewise their repetition sequences, note that the running time of Annealing sort is O(n log n). Let us therefore analyze its success probability.

3.1 Analysis of Phase 1

Our analysis for Phase 1 borrows some elements from our analysis of randomized Shellsort [17], as this algorithm has a somewhat similar structure of a schedule of random choices that gradually reduce in scope.

The Probabilistic Zero-One Principle. We begin our analysis with a probabilistic version of the zero-one principle (e.g., see Knuth [24]).

Lemma 3.1 [6, 17, 35]: If a randomized data-oblivious sorting algorithm sorts any array of 0's and 1's of size n with failure probability at most ε, then it sorts any array of size n with failure probability at most (n + 1)ε.

This lemma is clearly only of effective use for randomized data-oblivious algorithms that have failure probabilities that are O(n^{-ρ}), for some constant ρ > 1, i.e., algorithms that succeed with very high probability.

Shrinking Lemmas. As we move up and down A in a single pass, let us assume that we are considering the effect of this pass on an array A of zeroes and ones, reasoning about how this pass impacts the ones "moving up" in A. We can prove a number of useful "shrinking" lemmas for the number of ones that remain in various regions (i.e., subarrays) of A during this pass. (Symmetric lemmas hold for the 0's with respect to their downward movement in A.)

Lemma 3.2 (Sliding-Window Lemma): Let B be a subarray of A of size N, and let C be the subarray of A of size 4N immediately after B. Suppose further there are k ≤ 4βN ones in B ∪ C, for 0 < β < 1. Let k_1^(c) be the number of ones in B after a single up-and-down pass of Annealing sort with temperature 4N and repetition factor c. Then

    Pr( k_1^(c) > max{2 β^c N, 8e log n} ) ≤ min{ 2^{-β^c N/2}, n^{-4} }.

Proof: For a one to remain in a given location in B it must be matched with a one in each of its c compare-exchange operations in B ∪ C (and note that this is the extent of possibilities, since the temperature is 4N). Moreover, we may pessimistically assume each such c-ary test will occur independently for each possible position in B with probability at most β^c. Thus,

    E(k_1^(c)) ≤ β^c N.

Since k_1^(c) can, in this case, be viewed as the sum of N independent 0-1 random variables, we can apply a Chernoff bound (e.g., see [30, 31]) to establish

    Pr( k_1^(c) > 2 β^c N ) ≤ 2^{-β^c N/2},

for the case when our bound on E(k_1^(c)) is greater than 4e log n. When this bound is less than or equal to 4e log n, we can use a Chernoff bound to establish

    Pr( k_1^(c) > 8e log n ) ≤ 2^{-2e log n} ≤ n^{-4}.


Lemma 3.3: Suppose we are given two regions, B and C, of A, of size N and αN, respectively, for 0 < α < 4, that are contained inside a subarray of A of size 4N, with B to the left of C, and let k = k_1 + k_2, where k_1 (resp., k_2) is the number of ones in B (resp., C). Let k_1^(c) be the number of ones in B after a single up-and-down pass of Annealing sort with temperature 4N and repetition factor c. Then

    E( k_1^(c) ) ≤ k_1 ( 1 − α/4 + k_2/(4N) )^c.

Proof: A one may possibly remain in B after a single (up) pass of Annealing sort with temperature 4N, with respect to a single random choice, if it is matched with a one in C or not matched with an element in C at all. In a single random choice, with probability 1 − α/4, it is not matched with an element in C, and, if matched with an element in C, which occurs with probability α/4, the probability that it is matched with a one is k_2/(αN).

Lemma 3.4 (Fractional-Depletion Lemma): Given two regions, B and C, in A, of size N and αN, respectively, for 0 < α < 4, such that B and C are contained in a subarray of A of size 4N, with B to the left of C, let k = k_1 + k_2, where k_1 and k_2 are the respective numbers of ones in B and C, and suppose k ≤ 4βN, for 0 < β < 1. Let k_1^(c) be the number of ones in B after a single up-pass of Annealing sort with temperature 4N and repetition factor c. Then

    Pr( k_1^(c) > max{ 2(1 − α/4 + β)^c N, 8e log n } ) ≤ min{ 2^{-(1 − α/4 + β)^c N/2}, n^{-4} }.

Proof: By Lemma 3.3, applied to this scenario,

    E(k_1^(c)) ≤ k_1 ( 1 − α/4 + 4βN/(4N) )^c ≤ (1 − α/4 + β)^c N.

Since k_1^(c) can be viewed as the sum of k_1 independent 0-1 random variables, we can apply a standard Chernoff bound (e.g., see [30, 31]) to establish

    Pr( k_1^(c) > 2(1 − α/4 + β)^c N ) ≤ 2^{-(1 − α/4 + β)^c N/2},

for the case when our bound on E(k_1^(c)) is greater than 4e log n. When this bound is less than or equal to 4e log n, we can use a Chernoff bound to establish

    Pr( k_1^(c) > 8e log n ) ≤ 2^{-2e log n} ≤ n^{-4}.

Lemma 3.5 (Startup Lemma): Given two regions, B and C, in A, of size N and αN, respectively, for 0 < α < 4, contained in a subarray of A of size 4N, with B to the left of C, let k = k_1 + k_2, where k_1 and k_2 are the respective numbers of ones in B and C, and suppose k ≤ 4βN, for 0 < β < 1. Let k_1^(c) be the number of ones in B after one up-pass of Annealing sort with temperature 4N and repetition factor c. Then, for any constant λ > 0 such that 1 − α/4 + β − λ ≤ 1 − ε, for some constant 0 < ε < 1, there is a constant c > 1 such that k_1^(c) ≤ λN, with very high probability, provided N is Ω(log n).

Proof: By Lemma 3.3, so long as k_1 ≥ λN, then

    E(k_1^(c)) ≤ ( 1 − α/4 + (4βN − λN)/(4N) )^c N
              ≤ ( 1 − α/4 + β − λ )^c N
              ≤ (1 − ε)^c N.

Of course, we are done as soon as k_1 ≤ λN, and note that, for c ≥ log_{1/(1−ε)}(2/λ), we have E(k_1^(c)) ≤ λN/2. Thus, by a Chernoff bound, for such a constant c,

    Pr( k_1^(c) > λN ) = Pr( k_1^(c) > 2 · λN/2 ) ≤ 2^{-λN/4}.

The proof then follows from the fact that N is Ω(log n).

Having proven the essential properties for the compare-exchange passes done in each round of Phase 1 of Annealing sort, let us now turn to the actual analysis of Phase 1.

Bounding Dirtiness after each Iteration. In the 2d-th iteration of Phase 1, imagine that we partition the array A into 2^d regions, A_0, A_1, ..., A_{2^d − 1}, each of size n/2^d. Moreover, every two iterations with the same temperature split a region from the previous iteration into two equal-sized halves. Thus, the algorithm can be visualized in terms of a complete binary tree, B, with n leaves. The root of B corresponds to a region consisting of the entire array A, and each leaf⁴ of B corresponds to an individual cell, a_i, in A, of size 1. Each internal node v of B at depth d corresponds with a region, A_i, created in the 2d-th iteration of the algorithm, and the children of v are associated with the two regions that A_i is split into during iteration 2(d + 1).

The desired output, of course, is to have each leaf value a_i = 0, for i < n − k, and a_i = 1, otherwise, where k is the number of ones. We therefore refer to the transition from cell n − k − 1 to cell n − k on the last level of B as the crossover point. We refer to any leaf-level region to the left of the crossover point as a low region and any leaf-level region to the right of the crossover point as a high region. We say that a region, A_i, corresponding to an internal node v of B, is a low region if all of v's descendants are associated with low regions. Likewise, a region, A_i, corresponding to an internal node v of B, is a high region if all of v's descendants are associated with high regions. Thus, we desire that low regions eventually consist of only zeroes and high regions eventually consist of only ones. A region that is neither high nor low is mixed, since it is an ancestor of both low and high regions. Note that there are no mixed leaf-level regions, however.

Also note that, since Phase 1 is data-oblivious, the algorithm does not behave any differently depending on whether a region is high, low, or mixed. Nevertheless, given the shrinking lemmas presented above, we can reason about the actions of our algorithm on different regions in terms of any one of these cases.

With each high (resp., low) region, A_i, we define the dirtiness of A_i to be the number of zeroes (resp., ones) that are present in A_i, that is, values of the wrong type for A_i. With each region, A_i, we associate a dirtiness bound, δ(A_i), which is a desired upper bound on the dirtiness of A_i. For each region, A_i, at depth d in B, let j be the number of regions from A_i to the crossover point or mixed region on that level. That is, if A_i is next to the mixed region, then j = 1, and if A_i is next to a region next to the mixed region, then j = 2, and so on. In general, if A_i is a low leaf-level region, then j = n − k − i − 1, and if A_i is a high leaf-level region, then j = i − n + k. We define the desired dirtiness bound, δ(A_i), of A_i as follows:

• If j ≥ 2, then δ(A_i) = n / 2^{d+j+3}.

• If j = 1, then δ(A_i) = n / (5 · 2^d).

• If A_i is a mixed region, then δ(A_i) = |A_i|.

⁴ This is a slight exaggeration, of course, since we terminate Phase 1 when regions have size O(log^6 n).


Thus, every mixed region trivially satisfies its desired dirtiness bound.

Because of our need for a high probability bound, we will guarantee that each region A_i satisfies its desired dirtiness bound, w.v.h.p., only if δ(A_i) ≥ 8e log n. If δ(A_i) < 8e log n, then we say A_i is an extreme region, for, during our algorithm, this condition implies that A_i is relatively far from the crossover point. We will show that the total dirtiness of all extreme regions is O(log^3 n) w.v.h.p. This motivates our termination of Phase 1 when the temperature is O(log^6 n).

Lemma 3.6: Suppose A_i is a low (resp., high) region and ∆ is the cumulative dirtiness of all regions to the left (resp., right) of A_i. Then any compare-exchange pass over A can increase the dirtiness of A_i by at most ∆.

Proof: If A_i is a low (resp., high) region, then its dirtiness is measured by the number of ones (resp., zeroes) it contains. During any compare-exchange pass, ones can only move right, exchanging themselves with zeroes, and zeroes can only move left, exchanging themselves with ones. Thus, the only ones that can move into a low region are those to the left of it, and the only zeroes that can move into a high region are those to the right of it.

The inductive claim that we show, in Appendix B, to hold with very high probability is the following.

Claim 3.7: After iteration d, for each region A_i, the dirtiness of A_i is at most δ(A_i), provided A_i is not extreme. The total dirtiness of all extreme regions is at most 8ed log^2 n.

3.2 Analysis of Phase 2

Claim 3.7 is the essential condition we need to hold at the start of Phase 2. In this section, we analyze the degree to which Phase 2 increases the sortedness of the array A further from this point.

At the beginning of Phase 2, the total dirtiness of all extreme regions is at most 8e log^3 n, and the size of each such region is g log^6 n, for g = 64e^2. Without loss of generality, let us consider a one in an extreme low region. The probability that such a one fails to be compared with a zero to its right in a round of Phase 2 is at most 1/N^{1/2}, provided g is large enough. Thus, with r = h log n / log log n, the probability such a one fails to be compared with a 0 after r random comparisons at distance N is at most

    (1/N^{1/2})^{h log n / log log n} = 1/N^{(h/2) log n / log log n} ≤ 1/(log n)^{(h/2) log n / log log n} = 1/n^{h/2},

since N ≥ log n during Phase 2. Thus, with very high probability, there are no dirty extreme regions after one round of Phase 2.

Consider next a non-extreme low region that is not mixed. By Claim 3.7, the dirtiness of such a region, and all regions to its left, is, with very high probability, at most 7N/10. Thus,

    E(k_1^(r)) ≤ (1 − 3/20)^r N ≤ e^{-(3/20) r} N.

Therefore, by a Chernoff bound, for d and n large enough,

    Pr( k_1^(r) > d log N ) ≤ (eN)^{d log N} / ( e^{(3/20) h log n / log log n} )^{d log N} ≤ 1/e^{d log n} ≤ 1/n^d.

Note that in the next round after this, such a region will become completely clean, w.v.h.p., since its dirtiness is below N^{1/2} w.v.h.p.

In addition, by Lemma 3.5, since N is Ω(log n) throughout Phase 2, then, w.v.h.p., the dirtiness of regions separate from a mixed region is at most N/6. Thus, the above analysis applies to them as well, once they are separate from a mixed region.

Therefore, by the end of Phase 2, w.v.h.p., the only dirty regions are either mixed or within distance 2 of a mixed region. In other words, the total dirtiness of the array A at the end of Phase 2 is O(log n).

3.3 Analysis of Phase 3

Each round of Phase 3 is guaranteed to decrease the dirtiness of Aby at least 1so long as Ais not completely

clean. This property is similar to the reason why Bubble sort works. Namely, using the zero-one principle,

note that the leftmost one in Awill always move right until it encounters another one. Thus, a single up-pass

in Aeliminates the leftmost one having a zero somewhere to its right. Likewise, a single down-pass in A

eliminates the rightmost zero having a one somewhere to its left. Thus, since the total dirtiness of Ais

O(log n)w.v.h.p., Phase 3 will completely sort Aw.v.h.p.

Therefore, we have the following.

Theorem 3.8: Given an array A of n elements, there is an annealing schedule that causes the three phases of Annealing sort to run in O(n log n) time and leave A sorted with very high probability.

4 Conclusion

We have given two related data-oblivious sorting algorithms based on iterated passes of round-robin random comparisons. The first, Spin-the-bottle sort, requires an expected Ω(n^2 log n) time to sort some inputs, and in O(n^2 log n) time it will sort any given input sequence with very high probability. The second, Annealing sort, on the other hand, can be designed to run in O(n log n) time and sort with very high probability.

Some interesting open problems include the following.

• Our analysis is, in many ways, overly pessimistic, in order to show that Annealing sort succeeds with very high probability. Is there a simpler and shorter annealing sequence that causes Annealing sort to run in O(n log n) time and sort with very high probability?

• Both Spin-the-bottle sort and Annealing sort are highly sequential. Is there a simple⁵ randomized sorting network with depth O(log n) and size O(n log n) that sorts any given input sequence with very high probability?

• Throughout this paper, we have assumed that compare-exchange operations always return the correct answer. But there are some scenarios when one would want to be tolerant of faulty compare-exchange operations (e.g., see [2, 8, 16]). Is there a version of Annealing sort that runs in O(n log n) time and sorts with high probability even if comparisons return a faulty answer uniformly at random with probability strictly less than 1/2?

⁵ Leighton and Plaxton [27] describe a randomized sorting network that sorts with very high probability, which is simpler than the AKS sorting network [1], but is still somewhat complicated. So the open problem would be to design a sorting network construction that is clearly simpler than the construction of Leighton and Plaxton.


Acknowledgments

This research was supported in part by the National Science Foundation under grants 0724806, 0713046, and 0847968, and by the Office of Naval Research under MURI grant N00014-08-1-1015.

References

[1] M. Ajtai, J. Komlós, and E. Szemerédi. Sorting in c log n parallel steps. Combinatorica, 3:1–19, 1983.

[2] S. Assaf and E. Upfal. Fault tolerant sorting networks. SIAM J. Discrete Math., 4(4):472–480, 1991.

[3] K. E. Batcher. Sorting networks and their applications. In Proc. 1968 Spring Joint Computer Conf., pages 307–314, Reston, VA, 1968. AFIPS Press.

[4] A. Ben-David, N. Nisan, and B. Pinkas. FairplayMP: A system for secure multi-party computation. In CCS '08: Proceedings of the 15th ACM Conference on Computer and Communications Security, pages 257–266, New York, NY, USA, 2008. ACM.

[5] T. Biedl, T. Chan, E. D. Demaine, R. Fleischer, M. Golin, J. A. King, and J. I. Munro. Fun-sort–or the chaos of unordered binary search. Discrete Appl. Math., 144(3):231–236, 2004.

[6] D. T. Blackston and A. Ranade. Snakesort: A family of simple optimal randomized sorting

algorithms. In ICPP ’93: Proceedings of the 1993 International Conference on Parallel Processing,

pages 201–204, Washington, DC, USA, 1993. IEEE Computer Society.

[7] A. Boneh and M. Hofri. The coupon-collector problem revisited — a survey of engineering problems

and computational methods. Communications in Statistics — Stochastic Models, 13(1):39–66, 1997.

[8] M. Braverman and E. Mossel. Noisy sorting without resampling. In SODA ’08: Proceedings of the

19th ACM-SIAM Symposium on Discrete algorithms, pages 268–276, Philadelphia, PA, USA, 2008.

Society for Industrial and Applied Mathematics.

[9] B. Brejová. Analyzing variants of Shellsort. Information Processing Letters, 79(5):223–227, 2001.

[10] R. Canetti, Y. Lindell, R. Ostrovsky, and A. Sahai. Universally composable two-party and multi-party secure computation. In STOC '02: Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pages 494–503, New York, NY, USA, 2002. ACM.

[11] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press,

Cambridge, MA, 2nd edition, 2001.

[12] R. Cypher. A lower bound on the size of Shellsort sorting networks. SIAM J. Comput., 22(1):62–71,

1993.

[13] W. Dobosiewicz. An efﬁcient variation of bubble sort. Inf. Process. Lett., 11(1):5–6, 1980.

[14] W. Du and M. J. Atallah. Secure multi-party computation problems and their applications: a review

and open problems. In NSPW ’01: Proceedings of the 2001 workshop on New security paradigms,

pages 13–22, New York, NY, USA, 2001. ACM.

[15] W. Du and Z. Zhan. A practical approach to solve secure multi-party computation problems. In

NSPW ’02: Proceedings of the 2002 workshop on New security paradigms, pages 127–135, New

York, NY, USA, 2002. ACM.

[16] U. Feige, P. Raghavan, D. Peleg, and E. Upfal. Computing with noisy information. SIAM J. Comput.,

23(5):1001–1018, 1994.

[17] M. T. Goodrich. Randomized Shellsort: A simple oblivious sorting algorithm. In Proceedings of the

ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1–16. SIAM, 2010.


[18] M. T. Goodrich and R. Tamassia. Algorithm Design: Foundations, Analysis, and Internet Examples.

John Wiley & Sons, New York, NY, 2002.

[19] H. Gruber, M. Holzer, and O. Ruepp. Sorting the slow way: an analysis of perversely awful

randomized sorting algorithms. In FUN’07: Proceedings of the 4th international conference on Fun

with algorithms, pages 183–197, Berlin, Heidelberg, 2007. Springer-Verlag.

[20] C. A. R. Hoare. Quicksort. Comput. J., 5(1):10–15, 1962.

[21] J. Incerpi and R. Sedgewick. Improved upper bounds on Shellsort. J. Comput. Syst. Sci.,

31(2):210–224, 1985.

[22] J. Incerpi and R. Sedgewick. Practical variations of Shellsort. Inf. Process. Lett., 26(1):37–43, 1987.

[23] T. Jiang, M. Li, and P. Vitányi. A lower bound on the average-case complexity of Shellsort. J. ACM, 47(5):905–911, 2000.

[24] D. E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming.

Addison-Wesley, Reading, MA, 1973.

[25] P. J. M. Laarhoven and E. H. L. Aarts, editors. Simulated annealing: theory and applications. Kluwer

Academic Publishers, Norwell, MA, USA, 1987.

[26] F. T. Leighton. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes.

Morgan-Kaufmann, San Mateo, CA, 1992.

[27] T. Leighton and C. G. Plaxton. Hypercubic sorting networks. SIAM J. Comput., 27(1):1–47, 1998.

[28] D. Malkhi, N. Nisan, B. Pinkas, and Y. Sella. Fairplay—a secure two-party computation system. In

SSYM’04: Proceedings of the 13th conference on USENIX Security Symposium, pages 20–20,

Berkeley, CA, USA, 2004. USENIX Association.

[29] U. Maurer. Secure multi-party computation made simple. Discrete Appl. Math., 154(2):370–381,

2006.

[30] M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and

Probabilistic Analysis. Cambridge University Press, New York, NY, USA, 2005.

[31] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, New York, NY,

1995.

[32] M. Paterson. Improved sorting networks with O(log N) depth. Algorithmica, 5(1):75–92, 1990.

[33] C. G. Plaxton and T. Suel. Lower bounds for Shellsort. J. Algorithms, 23(2):221–240, 1997.

[34] V. R. Pratt. Shellsort and sorting networks. PhD thesis, Stanford University, Stanford, CA, USA,

1972.

[35] S. Rajasekaran and S. Sen. PDM sorting algorithms that take a small number of passes. In IPDPS

’05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium

(IPDPS’05) - Papers, page 10, Washington, DC, USA, 2005. IEEE Computer Society.

[36] R. Sedgewick. Algorithms in C++. Addison-Wesley, Reading, MA, 1992.

[37] R. Sedgewick. Analysis of Shellsort and related algorithms. In ESA ’96: Proceedings of the Fourth

Annual European Symposium on Algorithms, pages 1–11, London, UK, 1996. Springer-Verlag.

[38] J. Seiferas. Sorting networks of logarithmic depth, further simpliﬁed. Algorithmica, 53(3):374–384,

2009.

[39] D. L. Shell. A high-speed sorting procedure. Commun. ACM, 2(7):30–32, 1959.

[40] G. Wang, T. Luo, M. T. Goodrich, W. Du, and Z. Zhu. Bureaucratic protocols for secure two-party

sorting, selection, and permuting. In 5th ACM Symposium on Information, Computer and

Communications Security (ASIACCS), pages 226–237. ACM, 2010.


[41] M. A. Weiss and R. Sedgewick. Bad cases for shaker-sort. Information Processing Letters, 28(3):133–136, 1988.

[42] J. Williams. Algorithm 232: Heapsort. Commun. ACM, 7:347–348, 1964.


A Proving the Correctness of Spin-the-bottle Sort

In this appendix, we prove Theorem 2.2, which states that, given an array A of n elements, the three phases of Spin-the-bottle sort run in O(n^2 log n) time and sort A with very high probability.

The proof is based on showing that we can achieve each of the milestones marking each phase in O(n^2 log n) time or better.
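To fix notation for the analysis below, recall that a round of Spin-the-bottle sort is a round-robin pass in which each position is compare-exchanged with a uniformly random partner. The following is a minimal Python sketch of this primitive (illustrative only; the precise phase boundaries used in the proof depend on inversion counts and are omitted here):

```python
import random

def compare_exchange(A, i, s):
    """Put the pair (A[i], A[s]) in order; positions are normalized so lo < hi."""
    lo, hi = min(i, s), max(i, s)
    if A[lo] > A[hi]:
        A[lo], A[hi] = A[hi], A[lo]

def spin_the_bottle_round(A, rng=random):
    """One round: for each i = 0, ..., n-1, compare-exchange A[i] with a
    uniformly random A[s], s != i."""
    n = len(A)
    for i in range(n):
        s = rng.randrange(n - 1)
        if s >= i:
            s += 1  # shift to guarantee s != i while keeping s uniform
        compare_exchange(A, i, s)

def spin_the_bottle_sort(A, rng=random):
    """Repeat rounds until A is sorted; the analysis shows O(n log n) rounds
    suffice with high probability, for O(n^2 log n) total time."""
    while any(A[i] > A[i + 1] for i in range(len(A) - 1)):
        spin_the_bottle_round(A, rng)
```

Note that each round performs exactly n compare-exchanges regardless of the data, which is what makes the sequence of operations data-oblivious.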

Phase 1. Let X_j be a random variable that equals the number of inversions resolved in round j of Phase 1, and let X_{i,j} denote an indicator random variable that is 1 iff we perform a comparison in iteration (round) j of the algorithm between A[i] and an element that caused an inversion with A[i] at the beginning of round j. Thus,

    X_j ≥ (1/2) ∑_{i=1}^{n} X_{i,j},

since each inversion involves two elements of A. Each of the X_{i,j}'s is independent. Furthermore,

    E(X_{i,j}) = m_{i,j}/(n − 1),

where m_{i,j} denotes the number of inversions that exist at the beginning of round j and involve A[i]. Therefore,

    E(X_j) ≥ (1/2) ∑_{i=1}^{n} m_{i,j}/(n − 1) = M_j/(n − 1),

where M_j is the number of inversions in A that exist at the beginning of round j. Thus, by a well-known Chernoff bound,

    Pr(X_j < M_j/(2(n − 1))) ≤ (e^{−1/2}/(1/2)^{1/2})^{M_j/(n−1)} ≤ 2^{−M_j/(3(n−1))} ≤ n^{−4},

since we are in Phase 1. So we may assume with probability at least 1 − c/n^3 that the following recurrence relation holds during Phase 1, for all 1 ≤ j ≤ cn, for any constant c ≥ 1:

    M_{j+1} ≤ M_j − M_j/(2n).
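For completeness, the "well-known Chernoff bound" applied here is the standard multiplicative lower-tail bound (e.g., see [30, 31]), instantiated with δ = 1/2 and with μ set to the lower bound E(X_j) ≥ M_j/(n − 1):

```latex
\Pr\bigl(X < (1-\delta)\mu\bigr) \;\le\;
\left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu},
\qquad \text{with } \delta = \tfrac{1}{2},\ \mu = \frac{M_j}{n-1}.
```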

Therefore, with probability at least 1 − 4/n^3, there are at most 4n rounds during Phase 1 of Spin-the-bottle sort, since M_1 = M < n^2 and M_j ≥ 12n log n for all j during Phase 1. That is, with very high probability, Phase 1 runs in O(n^2) time.
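The per-round decay driving this recurrence is easy to observe empirically. The sketch below (an illustration, not part of the proof) counts inversions before and after one round of random compare-exchanges; note that a compare-exchange never creates an inversion, and the analysis predicts that at least M_j/(n − 1) inversions are resolved in expectation:

```python
import random

def count_inversions(A):
    """O(n^2) inversion count, adequate for a small illustration."""
    return sum(1 for i in range(len(A))
                 for j in range(i + 1, len(A)) if A[i] > A[j])

def random_round(A, rng):
    """One round of round-robin random compare-exchanges."""
    n = len(A)
    for i in range(n):
        s = rng.randrange(n - 1)
        if s >= i:
            s += 1  # ensure s != i
        lo, hi = min(i, s), max(i, s)
        if A[lo] > A[hi]:
            A[lo], A[hi] = A[hi], A[lo]

rng = random.Random(7)
A = list(range(200, 0, -1))   # reverse-sorted: M_1 = n(n-1)/2 inversions
before = count_inversions(A)
random_round(A, rng)
after = count_inversions(A)
# One round should resolve a noticeable fraction of the inversions.
assert after < before
```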

Phase 2. For this phase, let X_j and X_{i,j} denote random variables defined as in our analysis of Phase 1, with the index j reset to 1 for Phase 2. In this case,

    E(X_j) ≥ M_j/(n − 1) ≥ 12.

Thus, by a Chernoff bound similar to the one used for analyzing Phase 1,

    Pr(X_j < 6) ≤ Pr(X_j < M_j/(2(n − 1))) ≤ 2^{−M_j/(3(n−1))} ≤ 2^{−4},

since we are in Phase 2. That is, with probability at most 1/16 we resolve fewer than 6 inversions in round j of Phase 2. Call round j a failure in this case, and call it a success if it resolves at least 6 inversions. Let Y_j

be an indicator random variable that is 1 iff we resolve fewer than 6 inversions in round j of Phase 2, or, if j is larger than the number of rounds in Phase 2, then let Y_j be an independent random variable that is 1 with probability 1/16. Thus, the number of failure rounds in the first at most 4n log n rounds of Phase 2 is at most

    Y = ∑_{j=1}^{4n log n} Y_j.

Note that E(Y) = (1/4)n log n. Thus, by a standard Chernoff bound,

    Pr(Y > 2n log n) = Pr(Y > 8 · (1/4)n log n) ≤ (e^7/8^8)^{(1/4)n log n} ≤ 2^{−2n log n} = n^{−2n}.

Note, in addition, that there can be, in total, at most 2n log n successful rounds in Phase 2. Thus, with very high probability, there are only O(n log n) rounds in Phase 2. That is, with very high probability, Phase 2 runs in O(n^2 log n) time.

Phase 3. The analysis for this phase is similar to that for the coupon collector's problem (e.g., see [7]). At the start of this phase, there are fewer than 12n inversions that remain in A. Note that, for any such inversion, χ, the probability that χ is resolved in a round of Phase 3 is at least^6 1/n. Let Z^r_χ be the event that χ is not resolved after r rounds of Phase 3. Thus,

    Pr(Z^r_χ) ≤ (1 − 1/n)^r ≤ e^{−r/n}.

Let R denote the number of rounds needed to resolve all the inversions in Phase 3. Then, for c ≥ 2,

    Pr(R > cn ln n) ≤ Pr(∪_χ Z^{cn ln n}_χ) ≤ ∑_χ Pr(Z^{cn ln n}_χ) ≤ 12/n^{c−1}.

Thus, with very high probability, R is O(n log n); hence, with very high probability, Phase 3 runs in O(n^2 log n) time. This completes the proof.
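This union-bound calculation can be sanity-checked numerically. The following sketch (with illustrative values of n and c, not taken from the paper) verifies that (1 − 1/n)^r ≤ e^{−r/n} and that after r = cn ln n rounds the total failure probability stays below 12/n^{c−1}:

```python
import math

def survival_upper_bounds(n, r):
    """Return the survival bound (1 - 1/n)^r and its relaxation e^{-r/n}."""
    exact = (1.0 - 1.0 / n) ** r
    relaxed = math.exp(-r / n)
    return exact, relaxed

n, c = 1024, 3                    # illustrative values; the proof takes c >= 2
r = int(c * n * math.log(n))      # r = c * n * ln(n) rounds of Phase 3
exact, relaxed = survival_upper_bounds(n, r)
assert exact <= relaxed           # (1 - 1/n)^r <= e^{-r/n}
union_bound = 12 * n * exact      # < 12n inversions, each surviving w.p. <= exact
assert union_bound <= 12 / n ** (c - 1)
```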

^6 In fact, the probability that χ is resolved in a round of Phase 3 is equal to 2/(n − 1) − 1/(n − 1)^2, since each inversion has two chances of being resolved during a round.


B Proof of the Inductive Claim for Phase 1 of Annealing Sort

In this appendix, we prove Claim 3.7, which states that, after iteration d, for each region A_i, the dirtiness of A_i is at most δ(A_i), provided A_i is not extreme, and that the total dirtiness of all extreme regions is at most 8ed log^2 n. As mentioned above, this analysis for Phase 1 of Annealing sort borrows from our analysis of randomized Shellsort [17], as there is a similar structure to our inductive argument even though the fine details are quite different.
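For orientation, the up-passes and down-passes referred to throughout this appendix have roughly the following shape: each position is compare-exchanged a fixed number of times (the repetition factor) with a random partner at distance at most the current temperature. The Python sketch below is illustrative only; the helper names, the exact partner distribution, and the boundary clamping are assumptions, and the precise annealing schedule is specified earlier in the paper:

```python
import random

def compare_exchange(A, lo, hi):
    """Put the pair (A[lo], A[hi]), lo < hi, in order."""
    if A[lo] > A[hi]:
        A[lo], A[hi] = A[hi], A[lo]

def up_and_down_pass(A, temperature, reps, rng=random):
    """One up-pass followed by one down-pass: each position is
    compare-exchanged `reps` times with a random partner at distance
    at most `temperature` (a sketch of the pass structure only; the
    full algorithm runs such passes over a decreasing temperature
    schedule)."""
    n = len(A)
    for i in range(n):                      # up-pass: partners to the right
        for _ in range(reps):
            j = min(i + rng.randint(1, temperature), n - 1)
            if j > i:
                compare_exchange(A, i, j)
    for i in range(n - 1, -1, -1):          # down-pass: partners to the left
        for _ in range(reps):
            j = max(i - rng.randint(1, temperature), 0)
            if j < i:
                compare_exchange(A, j, i)
```

With temperature 1 a pass degenerates to a deterministic bubble-sort sweep, which is why a low final temperature can clean up the few remaining inversions.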

Let us begin at the first round, which we are viewing in terms of two regions, A_1 and A_2, of size N = n/2 each. Suppose that k ≤ n − k, where k is the number of ones, so that A_1 is a low region and A_2 is either a high region (i.e., if k = n − k) or A_2 is mixed (the case when k > n − k is symmetric). Let k_1 (resp., k_2) denote the number of ones in A_1 (resp., A_2), so k = k_1 + k_2. By the Startup Lemma (3.5), the dirtiness of A_1 will be at most n/12, with very high probability, since in this case (using the notation of that lemma and viewing A as existing inside a larger array of size 2n), α = 1, β ≤ 1/4, and λ = 1/6, so 1 − α/4 + β − λ ≤ 1 − 1/6. Note that this satisfies the desired dirtiness of A_1, since δ(A_1) = n/10 in this case. A similar argument applies to A_2 if it is a high region, and if A_2 is mixed, it trivially satisfies its desired dirtiness bound. Also, assuming n is large enough, there are no extreme regions (if n is so small that A_1 is extreme, we can immediately switch to Phase 2). The next round of Annealing sort (with temperature 2n) can only improve the dirtiness in A. Thus, we satisfy the base case of our inductive argument: the dirtiness bounds for the two children of the root of B are satisfied with (very) high probability, and similar arguments prove the inductive claim for iterations 3 and 4, for N = n/2^2 and temperature n, and iterations 5 and 6, for N = n/2^3 and temperature n/2.

Let us now consider a general inductive step. Assume that, with very high probability, we have satisfied Claim 3.7 for the regions on level d ≥ 3, and let us now consider the transition to level d + 1, which occurs in iterations 2d + 1 and 2d + 2. In addition, we terminate this line of reasoning when the region size, n/2^d, becomes less than 64e^2 log^6 n.

Extreme Regions. Let us begin with the bound for the dirtiness of extreme regions at depth d + 1, considering the effect of iteration 2d + 1. Note that, by Lemma 3.6, regions that were extreme after iteration 2d will be split into regions in iteration 2d + 1 that contribute no new amounts of dirtiness to pre-existing extreme regions. That is, extreme regions get split into extreme regions. Thus, the new dirtiness for extreme regions can come only from regions that were not extreme on level d of B that are now splitting into extreme regions on level d + 1, which we call freshly extreme regions. Suppose, then, that A_i is such a region, say, with a parent, A_p, which is j regions from the mixed region on level d. Then the desired dirtiness bound of A_i's parent region, A_p, is δ(A_p) = n/2^{d+j+3} ≥ 8e log n, by Claim 3.7, since A_p is not extreme. A_p has (low-region) children, A_i and A_{i+1}, that have desired dirtiness bounds of δ(A_i) = n/2^{d+1+2j+4} or δ(A_i) = n/2^{d+1+2j+3} and of δ(A_{i+1}) = n/2^{d+1+2j+3} or δ(A_{i+1}) = n/2^{d+1+2j+2}, depending on whether the mixed region on level d + 1 has an odd or even index. Moreover, A_i (and possibly A_{i+1}) is freshly extreme, so n/2^{d+1+2j+4} < 8e log n, which implies that j > (log n − d − log log n − 10)/2. Nevertheless, note also that there are O(log n) new regions on this level that are just now becoming extreme, since n/2^d > 64e^2 log^6 n and n/2^{d+j+3} ≥ 8e log n implies j ≤ log n − d. So let us consider the two freshly extreme regions, A_i and A_{i+1}, in turn, and how a pass of Annealing sort affects them (for after that they will collectively satisfy the extreme-region part of Claim 3.7).

• Region A_i: Consider the worst case for δ(A_i), namely, that δ(A_i) = n/2^{d+1+2j+4}. Since A_i is a left child of A_p, A_i could get at most n/2^{d+j+3} + 8ed log^2 n ones from regions left of A_i, by Lemma 3.6. In addition, A_i and A_{i+1} could inherit at most δ(A_p) = n/2^{d+j+3} ones from A_p. Thus, if we let N denote the size of A_i, i.e., N = n/2^{d+1}, then A_i and A_{i+1} together have at most N/2^{j+1} + 3N^{1/2} ≤ N/2^j ones, since we stop Phase 1 when N < 64e^2 log^6 n. In addition, assuming j ≥ 4, regions A_{i+2} and A_{i+3} may inherit at most n/2^{d+j+2} ones from their parent and region A_{i+4} may inherit at most n/2^{d+j+1} ones from its parent. Therefore, by the Sliding-Window Lemma (3.2), with β = 5/2^{j+3} < 1/2^j, the following condition holds with probability at least 1 − cn^{−4}:

    k^{(c)}_1 ≤ max{2β^c N, 8e log n},

where k^{(c)}_1 is the number of ones left in A_i after an up-pass of Annealing sort with temperature 4N and repetition factor c. Note that, if k^{(c)}_1 ≤ 8e log n, then we have satisfied the desired dirtiness for A_i. Alternatively, so long as c ≥ 4 and j ≥ 4, then w.v.h.p.,

    k^{(c)}_1 ≤ 2β^c N ≤ n/2^{d+jc} ≤ n/2^{d+1+2j+4} ≤ 8e log n = δ(A_i).

• Region A_{i+1}: Consider the worst case for δ(A_{i+1}), namely δ(A_{i+1}) = n/2^{d+1+2j+3}. Since, in this case, A_{i+1} is a right child of A_p, A_{i+1} could get at most n/2^{d+j+3} + 8ed log^2 n ones from regions left of A_{i+1}, by Lemma 3.6, plus A_{i+1} could inherit at most δ(A_p) = n/2^{d+j+3} ones from A_p. In addition, since j ≥ 3, A_{i+2} and A_{i+3} could inherit at most n/2^{d+j+2} ones from their parent, and A_{i+4} and A_{i+5} could inherit at most n/2^{d+j+1} ones from their parent. Thus, if we let N denote the size of A_{i+1}, i.e., N = n/2^{d+1}, then A_{i+1} through A_{i+5} together have at most N/2^{j+1} + 3N^{1/2} + N/2^{j+1} + N/2^j ≤ 4N/2^j ones, since we stop Phase 1 when N < 64e^2 log^6 n and j ≥ 4. By the Sliding-Window Lemma (3.2), applied with β = 1/2^j, the following condition holds with probability at least 1 − cn^{−4}:

    k^{(c)}_1 ≤ max{2β^c N, 8e log n},

where k^{(c)}_1 is the number of ones left in A_{i+1} after a pass of Annealing sort with repetition factor c and temperature 4N. Note that, if k^{(c)}_1 ≤ 8e log n, then we have satisfied the desired dirtiness bound for A_{i+1}. Alternatively, so long as c ≥ 4 and j ≥ 4, then w.v.h.p.,

    k^{(c)}_1 ≤ 2β^c N ≤ n/2^{d+jc} ≤ n/2^{d+1+2j+4} ≤ 8e log n = δ(A_{i+1}).

Therefore, if a low region A_i or A_{i+1} becomes freshly extreme in iteration 2d + 1, then, w.v.h.p., its dirtiness is at most 8e log n. Since there are at most log n freshly extreme regions created in iteration 2d + 1, this implies that the total dirtiness of all extreme low regions in iteration 2d + 1 is at most 8e(d + 1) log^2 n, w.v.h.p., after the right-moving pass of Phase 1, by Claim 3.7. Likewise, by symmetry, a similar claim applies to the high regions after the left-moving pass of Phase 1. Moreover, by Lemma 3.6, these extreme regions will continue to satisfy Claim 3.7 after this.

Non-extreme Regions not too Close to the Crossover Point. Let us now consider non-extreme regions on level d + 1 that are at least two regions away from the crossover point on level d + 1. Consider, wlog, a low region, A_p, on level d, which is j regions from the crossover point on level d, with A_p having (low-region) children, A_i and A_{i+1}, that have desired dirtiness bounds of δ(A_i) = n/2^{d+1+2j+4} or δ(A_i) = n/2^{d+1+2j+3} and of δ(A_{i+1}) = n/2^{d+1+2j+3} or δ(A_{i+1}) = n/2^{d+1+2j+2}, depending on whether the mixed region on level d + 1 has an odd or even index. By Lemma 3.6, if we can show w.v.h.p. that the dirtiness of each such A_i (resp., A_{i+1}) is at most δ(A_i)/3 (resp., δ(A_{i+1})/3), after the up-and-down pass of Phase 1, then no matter how many more ones come into A_i or A_{i+1} from the left during the rest of iteration 2d + 1 (and 2d + 2), they will satisfy their desired dirtiness bounds.

Let us consider the different region types (always taking the most difficult choice for each desired dirtiness in order to avoid additional cases):

• Type 1: δ(A_i) = n/2^{d+1+2j+4}, with j ≥ 4. Since A_i is a left child of A_p, in this case, A_i could get at most n/2^{d+j+3} + 8ed log^2 n ones from regions left of A_i, by Lemma 3.6. In addition, A_i and A_{i+1} could inherit at most δ(A_p) = n/2^{d+j+3} ones from A_p. Thus, if we let N denote the size of A_i, i.e., N = n/2^{d+1}, then A_i and A_{i+1} together have at most N/2^{j+1} + 3N^{1/2} ≤ N/2^j ones, since we stop Phase 1 when N < 64e^2 log^6 n. In addition, A_{i+2} and A_{i+3} inherit at most n/2^{d+j+2} ones from their parent. Likewise, A_{i+4} inherits at most n/2^{d+j+1} ones from its parent. Thus, A_i through A_{i+4} inherit at most N/2^j + N/2^{j+1} + N/2^j ≤ N/2^{j−2} ones. Thus, we can apply the Sliding-Window Lemma (3.2), with β = 1/2^j, so that the following condition holds with probability at least 1 − n^{−4}, provided c ≥ 4 and j ≥ 4:

    k^{(c)}_1 ≤ 2β^c N ≤ n/2^{d+1+jc−1} ≤ n/(3 · 2^{d+1+2j+4}) = δ(A_i)/3,

where k^{(c)}_1 is the number of ones left in A_i after a pass of Annealing sort with repetition factor c.

• Type 2: δ(A_{i+1}) = n/2^{d+1+2j+3}, with j ≥ 4. Since A_{i+1} is a right child of A_p, in this case, A_{i+1} could get at most n/2^{d+j+3} + 8ed log^2 n ones from regions left of A_{i+1}, by Lemma 3.6, plus A_{i+1} could inherit at most δ(A_p) = n/2^{d+j+3} ones from A_p. In addition, since j > 2, A_{i+2} and A_{i+3} could inherit at most n/2^{d+j+2} ones from their parent. Thus, if we let N denote the size of A_{i+1}, i.e., N = n/2^{d+1}, then A_{i+1}, A_{i+2}, and A_{i+3} together have at most N/2^j + 3N^{1/2} ≤ N/2^{j−1} ones, since we stop Phase 1 when N < 64e^2 log^6 n. In addition, A_{i+4} and A_{i+5} may inherit n/2^{d+j+1} ones from their parent. Thus, A_{i+1} through A_{i+5} may receive N/2^{j−1} + N/2^j ≤ N/2^{j−2} ones. Therefore, with β = 1/2^j, we may apply the Sliding-Window Lemma (3.2) to show that, with probability at least 1 − n^{−4}, for j ≥ 4 and c ≥ 4,

    k^{(c)}_1 ≤ 2β^c N ≤ n/2^{d+1+jc} ≤ n/(3 · 2^{d+1+2j+3}) = δ(A_{i+1})/3,

where k^{(c)}_1 is the number of ones left in A_{i+1} after a pass of Annealing sort with repetition factor c.

• Type 3: δ(A_i) = n/2^{d+1+2j+4}, with j = 3. Since A_i is a left child of A_p, in this case, A_i could get at most n/2^{d+j+3} + 8ed log^2 n ones from regions left of A_i, by Lemma 3.6. In addition, A_i and A_{i+1} could inherit at most δ(A_p) = n/2^{d+j+3} ones from A_p. Thus, if we let N denote the size of A_i, i.e., N = n/2^{d+1}, then A_i and A_{i+1} together have at most N/2^{j+1} + 3N^{1/2} ≤ N/2^j = N/2^3 ones, since we stop Phase 1 when N < 64e^2 log^6 n. In addition, A_{i+2} and A_{i+3} inherit at most n/2^{d+j+2} = N/2^4 ones from their parent. Finally, A_{i+4} inherits at most n/(5 · 2^d) = 2N/5 ones from its parent. Thus, A_i through A_{i+4} inherit at most N/2^3 + N/2^4 + 2N/5 ≤ 5N/2^3 = 5N/2^j ones. Thus, we can apply the Sliding-Window Lemma (3.2), with β = 5/2^{j+2}, so that the following condition holds with probability at least 1 − n^{−4}, for c ≥ 5 and j = 3:

    k^{(c)}_1 ≤ 2β^c N ≤ 5^c n/2^{d+(j+2)c} ≤ n/(3 · 2^{d+1+2j+4}) = δ(A_i)/3,

where k^{(c)}_1 is the number of ones left in A_i after a pass of Annealing sort with repetition factor c and temperature 4N.

• Type 4: δ(A_{i+1}) = n/2^{d+1+2j+3}, with j = 3. Since A_{i+1} is a right child of A_p, in this case, A_{i+1} could get at most n/2^{d+j+3} + 8ed log^2 n ones from regions left of A_{i+1}, by Lemma 3.6, plus A_{i+1} could inherit at most δ(A_p) = n/2^{d+j+3} ones from A_p. In addition, since j > 2, A_{i+2} and A_{i+3} could inherit at most n/2^{d+j+2} ones from their parent. Thus, if we let N denote the size of A_{i+1}, i.e., N = n/2^{d+1}, then A_{i+1}, A_{i+2}, and A_{i+3} together have at most N/2^j + 3N^{1/2} ≤ N/2^{j−1} ones, since we stop Phase 1 when N < 64e^2 log^6 n. In addition, A_{i+4} and A_{i+5} may inherit n/(5 · 2^d) ones from their parent. Thus, A_{i+1} through A_{i+5} may receive N/2^{j−1} + 2N/5 < (2/3)N ones. Therefore, with β < 1/6, we may apply the Sliding-Window Lemma (3.2) to show that, with probability at least 1 − n^{−4}, for j = 3 and c ≥ 6,

    k^{(c)}_1 ≤ 2β^c N ≤ n/(3^c 2^{d+1}) ≤ n/(3 · 2^{d+1+2j+3}) = δ(A_{i+1})/3,

where k^{(c)}_1 is the number of ones left in A_{i+1} after a pass of Annealing sort with repetition factor c.

• Type 5: δ(A_i) = n/2^{d+1+2j+4}, with j = 2. Since A_i is a left child of A_p, in this case, A_i could get at most n/2^{d+j+3} + 8ed log^2 n ones from regions left of A_i, by Lemma 3.6. In addition, A_i and A_{i+1} could inherit at most δ(A_p) = n/2^{d+j+3} ones from A_p. Thus, if we let N denote the size of A_i, i.e., N = n/2^{d+1}, then A_i and A_{i+1} together have at most N/2^{j+1} + 3N^{1/2} ≤ N/2^j = N/2^2 ones, since we stop Phase 1 when N < 64e^2 log^6 n. In addition, A_{i+2} and A_{i+3} inherit at most 2N/5 ones from their parent. Thus, we can apply the Fractional-Depletion Lemma (3.4), with α = 3 and β < 1/6, so that the following condition holds with probability at least 1 − n^{−4}, for c ≥ 9 and j = 2:

    k^{(c)}_1 ≤ 2(1/4 + 1/6)^c N ≤ n/(3 · 2^{d+1+2j+4}) = δ(A_i)/3,

where k^{(c)}_1 is the number of ones left in A_i after a pass of Annealing sort with repetition factor c and temperature 4N.

• Type 6: δ(A_{i+1}) = n/2^{d+1+2j+3}, with j = 2. Since A_{i+1} is a right child of A_p, in this case, A_{i+1} could get at most n/2^{d+j+3} + 8ed log^2 n ones from regions left of A_{i+1}, by Lemma 3.6, plus A_{i+1} could inherit at most δ(A_p) = n/2^{d+j+3} ones from A_p. In addition, since j = 2, A_{i+2} and A_{i+3} could inherit at most 2N/5 ones from their parent, where we let N denote the size of A_{i+1}, i.e., N = n/2^{d+1}. Thus, A_{i+1}, A_{i+2}, and A_{i+3} together have at most N/2^{j+1} + 3N^{1/2} + 2N/5 ≤ (2/3)N ones, since we stop Phase 1 when N < 64e^2 log^6 n. Thus, A_{i+1} through A_{i+5} may receive N/2^{j−1} + 2N/5 < (2/3)N ones. Therefore, with α = 3 and β < 1/6, we may apply the Fractional-Depletion Lemma to show that, with probability at least 1 − n^{−4}, for c ≥ 9 and j = 2:

    k^{(c)}_1 ≤ 2(1/4 + 1/6)^c N ≤ n/(3 · 2^{d+1+2j+3}) = δ(A_{i+1})/3,

where k^{(c)}_1 is the number of ones left in A_{i+1} after a pass of Annealing sort with repetition factor c and temperature 4N.

• Type 7: δ(A_i) = n/2^{d+1+2j+4}, with j = 1. Since A_i is a left child of A_p, in this case, A_i could get at most n/2^{d+j+2} + 8ed log^2 n ones from regions left of A_i, by Lemma 3.6, plus A_i and A_{i+1} could inherit at most δ(A_p) = n/(5 · 2^d) ones from A_p. Thus, if we let N denote the size of A_i, i.e., N = n/2^{d+1}, then A_i and A_{i+1} together have at most N/2^{j+1} + 2N/5 + 3N^{1/2} ≤ 7N/10 ones, since we stop Phase 1 when N < 64e^2 log^6 n. Thus, applying the Fractional-Depletion Lemma (3.4), with α = 1 and β = 0.175, the following condition holds with probability at least 1 − n^{−4}, for a suitably-chosen constant c, with j = 1:

    k^{(c)}_1 ≤ 2(0.925)^c N ≤ n/(3 · 2^{d+1+2j+4}) = δ(A_i)/3,

where k^{(c)}_1 is the number of ones left in A_i after a pass of Annealing sort with repetition factor c.

Thus, A_i and A_{i+1} satisfy their respective desired dirtiness bounds w.v.h.p., provided they are at least two regions from the mixed region or crossover point.

Regions near the Crossover Point. Consider now regions near the crossover point, that is, each region with a parent that is mixed, bordering the crossover point, or next to a region that either contains or borders the crossover point. Let us focus specifically on the case when there is a mixed region on levels d and d + 1, as it is the most difficult of these scenarios.

So, having dealt with all the other regions, which have their desired dirtiness satisfied after a single up-and-down pass of Phase 1, with temperature 4N, we are left with four regions near the crossover point, each of size N = n/2^{d+1}, which we will refer to as A_1, A_2, A_3, and A_4. One of A_2 or A_3 is mixed; without loss of generality, let us assume A_3 is mixed. At this point in the algorithm, we perform another up-and-down pass with temperature 4N. So, let us consider how this pass impacts the dirtiness of these four regions. Note that, by the results of the previous pass with temperature 4N (which were proved above), we have at this point pushed to these four regions all but at most n/2^{d+7} + 8e(d + 1) log^2 n of the ones and all but at most n/2^{d+6} + 8e(d + 1) log^2 n of the zeroes. Moreover, these bounds will continue to hold (and could even improve) as we perform the second up-and-down pass with temperature 4N. Thus, at the beginning of this second pass, we know that the four regions hold between 2N − N/32 − 3N^{1/2} and 3N + N/64 + 3N^{1/2} zeroes and between N − N/64 − 3N^{1/2} and 2N + N/32 + 3N^{1/2} ones, where N = n/2^{d+1} > 64e^2 log^6 n. Let us therefore consider the impact of the second pass with temperature 4N for each of these four regions:

• A_1: this region is compared to A_2, A_3, and A_4 during the up-pass. Thus, we may apply the Fractional-Depletion Lemma (3.4) with α = 3. Note, in addition, that, for N large enough, since there are at most 2N + N/32 + 3N^{1/2} ≤ 2.2N ones in all of these four regions, we may apply the Fractional-Depletion Lemma with β = 0.55. Thus, the following condition holds with probability at least 1 − n^{−4}, for a suitably-chosen constant c:

    k^{(c)}_1 ≤ 2(0.8)^c N ≤ N/32 = δ(A_1),

where k^{(c)}_1 is the number of ones left in A_1 after a pass of Annealing sort with repetition factor c and temperature 4N.

• A_2: each element of this region is compared to elements in A_3 and A_4 in the up-pass and A_1 in the down-pass. Note, however, that even if A_1 receives N zeroes in the up-pass, there are still at most 2N + N/32 + 3N^{1/2} ≤ 2.2N ones in A_2 ∪ A_3 ∪ A_4. Thus, even under this worst-case scenario (from A_2's perspective), we may apply the Startup Lemma (3.5), with α = 2, β = 0.55, and λ = 1/6, which implies that

    1 − α/4 + β − λ ≤ 1 − 1/10,

i.e., we can take ε = 1/10 and show that there is a constant c such that, w.v.h.p.,

    k^{(c)}_1 ≤ N/6 < δ(A_2),

where k^{(c)}_1 is the number of ones left in A_2 after an up-pass of Annealing sort with repetition factor c and temperature 4N.

• A_3: by assumption, A_3 is mixed, so it automatically satisfies its desired dirtiness bound.

• A_4: this region is compared to A_1, A_2, and A_3 in the down-pass. Note further that, w.v.h.p., there are at most 3N + N/64 + 3N^{1/2} ≤ 3.2N zeroes in these four regions, for large enough N. Thus, we may apply a symmetric version of the Startup Lemma (3.5), with α = 3, β = 0.8, and λ = 1/6, which implies

    1 − α/4 + β − λ ≤ 1 − 1/10,

i.e., we can take ε = 1/10 and show that there is a constant c such that, w.v.h.p.,

    k^{(c)}_1 ≤ N/6 < δ(A_4),

where k^{(c)}_1 is the number of zeroes left in A_4 after a down-pass of Annealing sort with repetition factor c and temperature 4N.

Thus, after the two up-and-down passes of Annealing sort with temperature 4N, we will have satisfied Claim 3.7 w.v.h.p. In particular, we have proved that each region satisfies Claim 3.7 after iteration 2(d + 1) of Phase 1 of Annealing sort with a failure probability of at most O(n^{−4}) for each region. Thus, since there are O(n) such regions per iteration, this implies any iteration will fail with probability at most O(n^{−3}). Therefore, since there are O(log n) iterations, and we lose only an O(n) factor in our failure probability when we apply the probabilistic zero-one principle (Lemma 3.1), when we complete the first phase of Annealing sort, w.v.h.p., at the beginning of Phase 2, the total dirtiness of all extreme regions is at most 8e log^3 n, and the size of each such region is g log^6 n, for g = 64e^2.
