Article

Abstract

A method is presented for the analysis of various generalizations of quicksort. The average asymptotic number of comparisons needed is shown to be $\alpha n \log_2 n$. A formula is derived expressing $\alpha$ in terms of the probability distribution of the "bound" of a partition. This formula assumes a particularly simple form for a generalization already considered by Hoare, namely, choice of the bound as median of a random sample. The main contribution of this paper is another generalization of quicksort, which uses a bounding interval instead of a single element as bound. This generalization turns out to be easy to implement in a computer program. A numerical approximation shows that $\alpha = 1.140$ for this version of quicksort compared with 1.386 for the original. This implies a decrease in number of comparisons of 18 percent; actual tests showed about 15 percent saving in computing time.
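As a short check of the arithmetic (not part of the original abstract): the constant 1.386 for the original quicksort is the familiar $2 n \ln n$ leading term rewritten in base-2 logarithms, and the quoted 18 percent follows from the ratio of the two constants.

$$
2\, n \ln n = (2 \ln 2)\, n \log_2 n \approx 1.386\, n \log_2 n,
\qquad
1 - \frac{1.140}{1.386} \approx 0.177 \approx 18\%.
$$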
... Often this sample contains only a few elements, say 3 or 5. The first theoretical analysis of this strategy is due to van Emden [1970]. Martínez and Roura [2001] settled the exact analysis of the leading term of this strategy in 2001. ...
... as known from [Kushagra et al. 2014]. This improves on classical quicksort (2n ln n + O(n) comparisons on average), but is worse than optimal dual-pivot quicksort (1.8n ln n + O(n) comparisons on average [Aumüller and Dietzfelbinger 2015]) or median-of-3 quicksort (1.714n ln n + O(n) comparisons on average [van Emden 1970]). ...
... In this section we use the theory developed so far to discuss the optimal average comparison count of k-pivot quicksort. We compare the result to the well known median-of-k strategy of classical quicksort [van Emden 1970]. ...
Article
Multi-Pivot Quicksort refers to variants of classical quicksort where in the partitioning step $k$ pivots are used to split the input into $k + 1$ segments. For many years, multi-pivot quicksort was regarded as impractical, but in 2009 a 2-pivot approach by Yaroslavskiy, Bentley, and Bloch was chosen as the standard sorting algorithm in Sun's Java 7. In 2014 at ALENEX, Kushagra et al. introduced an even faster algorithm that uses three pivots. This paper studies what possible advantages multi-pivot quicksort might offer in general. The contributions are as follows: Natural comparison-optimal algorithms for multi-pivot quicksort are devised and analyzed. The analysis shows that the benefits of using multiple pivots with respect to the average comparison count are marginal and these strategies are inferior to simpler strategies such as the well known median-of-$k$ approach. A substantial part of the partitioning cost is caused by rearranging elements. A rigorous analysis of an algorithm for rearranging elements in the partitioning step is carried out, observing mainly how often array cells are accessed during partitioning. The algorithm behaves best if 3 or 5 pivots are used. Experiments show that this translates into good cache behavior and is closest to predicting observed running times of multi-pivot quicksort algorithms. Finally, it is studied how choosing pivots from a sample affects sorting cost.
... The classical model for the analysis of sorting algorithms considers the average number of key comparisons on random permutations. Quicksort has been extensively studied under this model, including variations like choosing the pivot as median of a sample [7,4,15,10,3]: Let $c_n$ denote the expected number of comparisons used by classic Quicksort (as given in [16]), when each pivot is chosen as median of a sample of $2t + 1$ random elements. $c_n$ fulfills the recurrence ...
... since $n - 1$ comparisons are needed in the first partitioning step, and we have two recursive calls of random sizes, where the probability of having sizes $j_1$ and $j_2$ is given by the fraction of binomials (see [9] for details). This recurrence can be solved asymptotically [4,15] to ...
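A minimal numerical sketch of such a recurrence, under stated assumptions: the excerpt elides the recurrence itself, so the form below is an assumed textbook one, not necessarily the cited paper's. It charges $n - 1$ comparisons per partitioning step, gives subproblem sizes $(j_1, j_2)$ with $j_1 + j_2 = n - 1$ the probability $\binom{j_1}{t}\binom{j_2}{t} / \binom{n}{2t+1}$, ignores the cost of finding the sample median, and shrinks the sample for subarrays too small to hold $2t + 1$ elements. The function name `expected_comparisons` is hypothetical.

```python
from math import comb


def expected_comparisons(n_max, t=0):
    """Return a list c with c[n] = expected comparisons on n elements.

    Assumed model (not taken from the source): n - 1 comparisons per
    partitioning step, pivot = median of 2t + 1 sampled elements,
    sample-median cost ignored, smaller sample used when n < 2t + 1.
    """
    c = [0.0] * (n_max + 1)
    for n in range(2, n_max + 1):
        s = min(t, (n - 1) // 2)           # effective t for small subarrays
        total = comb(n, 2 * s + 1)         # number of possible samples
        acc = 0.0
        for j1 in range(s, n - s):         # j1 elements smaller than the pivot
            j2 = n - 1 - j1                # j2 elements larger than the pivot
            prob = comb(j1, s) * comb(j2, s) / total
            acc += prob * (c[j1] + c[j2])
        c[n] = (n - 1) + acc
    return c


if __name__ == "__main__":
    from math import log

    n = 1000
    for t in (0, 1):
        c = expected_comparisons(n, t)
        print(f"t={t}: c_n / (n ln n) = {c[n] / (n * log(n)):.3f}")
```

For $t = 0$ this reduces to the classic recurrence, whose exact solution is $2(n+1)H_n - 4n$. The printed ratios can be compared with the leading constants 2 and 1.714 quoted above, although they approach those values slowly because of the linear term.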
... In Yaroslavskiy's partitioning, indices k and g together scan A once, but index ℓ scans the leftmost segment a second time. On average, the latter contains a third of all elements, yielding $\frac{4}{3} n$ scanned elements in total. ...
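To make the roles of the three indices concrete, here is a minimal sketch of a dual-pivot quicksort in the style usually attributed to Yaroslavskiy. It is an illustration written for this page, not the Java library code: the function name `dual_pivot_quicksort` is hypothetical, the pivots are simply the first and last elements, and there is no insertion-sort cutoff.

```python
def dual_pivot_quicksort(a, left=0, right=None):
    """Sort list a in place between indices left and right (inclusive)."""
    if right is None:
        right = len(a) - 1
    if right <= left:
        return
    # Use the two end elements as pivots p <= q.
    if a[left] > a[right]:
        a[left], a[right] = a[right], a[left]
    p, q = a[left], a[right]

    l = left + 1      # a[left+1 .. l-1] holds elements < p
    g = right - 1     # a[g+1 .. right-1] holds elements > q
    k = l             # a[l .. k-1] holds elements between p and q
    while k <= g:
        if a[k] < p:
            a[k], a[l] = a[l], a[k]
            l += 1
        elif a[k] >= q:
            # Skip over elements that already belong on the right.
            while a[g] > q and k < g:
                g -= 1
            a[k], a[g] = a[g], a[k]
            g -= 1
            # The element swapped in from the right may be small.
            if a[k] < p:
                a[k], a[l] = a[l], a[k]
                l += 1
        k += 1

    # Move the pivots to their final positions.
    l -= 1
    g += 1
    a[left], a[l] = a[l], a[left]
    a[right], a[g] = a[g], a[right]

    dual_pivot_quicksort(a, left, l - 1)
    dual_pivot_quicksort(a, l + 1, g - 1)
    dual_pivot_quicksort(a, g + 1, right)


if __name__ == "__main__":
    data = [5, 2, 9, 1, 5, 6, 0, 3]
    dual_pivot_quicksort(data)
    print(data)
```

In this sketch `k` and `g` move toward each other and together pass over the array once, while `l` trails behind `k` and revisits the cells that end up in the leftmost segment, which is the second scan the excerpt refers to.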
Article
I discuss the new dual-pivot Quicksort that is nowadays used to sort arrays of primitive types in Java. I sketch theoretical analyses of this algorithm that offer a possible, and in my opinion plausible, explanation why (a) dual-pivot Quicksort is faster than the previously used (classic) Quicksort and (b) why this improvement was not already found much earlier.
... The first asymptotic analysis of the expectation of the number of comparisons for the $(2K + 1)$-version is given in [40]. He showed that $\frac{E(X_n)}{n \ln n} \to_n e_K := 1$ ...
... Here $g_K$ (see (1.30)) is the density of the $(K + 1)$-th order statistic of $2K + 1$ independent random variables with a uniform distribution. (This is a beta$(K + 1, K + 1)$ distribution on $[0, 1]$.) Van Emden [40] derived that $E(X_n) = e_K n \ln n + o(n)$; [34] showed by means of an example that this is not correct. ...
Thesis
In this thesis Quicksort and random walks on the nonnegative integers are studied. The connection between the algorithm and the random walk was initiated by Louchard [25]. Whether a random walk is transient or recurrent is one of the most important questions in the general study of random walks as Markov chains. The random walk is said to be recurrent if, starting from some state, the return time to that state is finite almost surely, and transient if there is a positive probability of never returning to the starting state. A recurrent random walk is positive recurrent if the mean return time is also finite; otherwise it is null recurrent. Quicksort was invented by C.A.R. Hoare [15, 16] in the early 1960s and is the most widely used sorting algorithm (see also [20], [36], [37], [38], and [39]). It is, for instance, the standard sorting procedure in Unix systems, and in a special issue of Computing in Science & Engineering, guest editors Jack Dongarra and Francis Sullivan ([10]; see also [18]) chose Quicksort as one of the ten algorithms "with the greatest influence on the development and practice of science and engineering in the 20th century".
Article
We study a rule for growing a sequence $\{t_n\}$ of finite subtrees of an infinite $m$-ary tree $T$. Independent copies $\{\omega(n)\}$ of a Bernoulli-type process $\omega$ on $m$ letters are used to trace out a sequence of paths in $T$. The tree $t_n$ is obtained by cutting each path at the first node such that at most $\sigma$ of the paths pass through it. Denote by $H_n$ the length of the longest path, $h_n$ the length of the shortest path, and $L_n$ the length of a randomly chosen path in $t_n$. It is shown that, in probability, $H_n - \log_a n = O(1)$, $h_n - \log_b(n/\log n) = O(1)$ (or $h_n - \log_b(n/\log\log n) = O(1)$), and that $L_n$, suitably normalized, is asymptotically normal. The parameters $a$, $b$, $c$ depend on the distribution of $\omega$ and, in the case of $a$, also on $\sigma$. These estimates describe respectively the worst, the best and the typical case behavior of a 'trie' search algorithm for a dictionary-type information retrieval system, with $\sigma$ being the capacity of a page.
Conference Paper
Quicksort may be the most familiar and important randomised algorithm studied in computer science. It is well known that the expected number of comparisons on any input of n distinct keys is Θ(n ln n), and the probability of a large deviation above the expected value is very small. This probability was well estimated some time ago, with an ad-hoc proof: we shall revisit this result in the light of further work on concentration.
Chapter
This chapter consists mainly of an extensive set of examples that illustrate the use of the data structures treated in the preceding chapter and show how strongly the choice of structure for the underlying data influences the algorithms that carry out a given task.
Article
A TopN sort algorithm based on multiple filtering was developed to improve the performance of the conventional TopN sort algorithm. The algorithm first constructs a sampling set of k × N elements by randomly sampling the original dataset. The algorithm then finds the Mth element (in decreasing order) from the sampling set, which is used to filter out elements of the original dataset that are smaller than this element. This process is repeated till the number of elements in the original dataset is less than k × N. The algorithm then sorts the remaining elements in the original dataset using quicksort and outputs the first N elements. A theoretical analysis and sample comparisons show that the temporal performance of this TopN algorithm is about 50% better than conventional TopN algorithms, such as heapsort.
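The filtering loop described in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function name `top_n` and its parameters are hypothetical, the threshold rank defaults to M = N (any M ≥ N keeps at least N candidates), Python's built-in sort stands in for the final quicksort pass, and a guard stops the loop when filtering no longer shrinks the data (for example, when many elements are equal).

```python
import heapq
import random


def top_n(data, n, k=4, m=None, rng=None):
    """Return the n largest elements of data in decreasing order.

    Assumptions (not from the source): m defaults to n, k >= 2, and
    list.sort replaces the quicksort used in the described algorithm.
    """
    if n <= 0:
        return []
    m = n if m is None else m
    if not (n <= m <= k * n):
        raise ValueError("need n <= m <= k * n")
    rng = rng or random.Random(0)

    items = list(data)
    while len(items) >= k * n:
        # Sample k*n elements and take the m-th largest as the threshold.
        sample = rng.sample(items, k * n)
        threshold = heapq.nlargest(m, sample)[-1]
        # At least m >= n elements are >= threshold, so no top-n
        # element is ever discarded by this filter.
        kept = [x for x in items if x >= threshold]
        if len(kept) == len(items):    # no progress; avoid looping forever
            break
        items = kept

    items.sort(reverse=True)
    return items[:n]


if __name__ == "__main__":
    values = random.Random(1).sample(range(1_000_000), 10_000)
    print(top_n(values, 5))
```

Because the sample is drawn from the current data, its M-th largest element can never exceed the M-th largest element of the data itself, which is why the filter is safe for any M ≥ N.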
Article
We revisit classical textbook sorting or selecting algorithms under a complexity model that fully takes into account the elementary comparisons between symbols composing the records to be processed. Our probabilistic models belong to a broad category of information sources that encompasses memoryless (i.e., independent-symbols) and Markov sources, as well as many unbounded-correlation sources. Under this perspective, commonly accepted assertions, such as "the complexity of Quicksort is $O(n \log n)$", are to be challenged, and the relative merits of sorting and searching methods relying on different principles (e.g., radix-based versus comparison-based) can be precisely assessed. For instance we establish that, under our conditions, the average-case complexity of QuickSort is $O(n \log^2 n)$ (rather than $O(n \log n)$, classically), whereas that of QuickSelect remains $O(n)$. In fact we propose a framework which allows revisiting three sorting algorithms (QuickSort, Insertion Sort, Bubble Sort) and two selection algorithms (QuickSelect and Minimum Selection). For each algorithm a precise asymptotic estimate for the dominant term of the mean number of symbol comparisons is given, where the constants involve various notions of coincidence depending on the algorithm. Explicit expressions for the implied constants are provided by methods from analytic combinatorics. As an aside, in our setting, we are able to derive a lower bound for the average number of symbol comparisons for algorithms solving the sorting problem and using usual comparisons between strings.
Article
An interactive program with a graphical display has been developed for the approximation of data by means of a linear combination of functions (including splines) selected by the user. The coefficients of the approximation are determined by linear programming ...
Article
Many algebraic translators provide the programmer with a limited ability to allocate storage. Of course one of the most desirable features of these translators is the extent to which they remove the burden of storage allocation from the programmer. Nevertheless, ...