Conference Paper

# Worst-Case Efficient Sorting with QuickMergesort


... However, there are few (mostly negative) results of transferring the theory work into practice. Implementations of non-stable in-place mergesort [22,23,44] are reported to be slower than quicksort from the C++ standard library. Katajainen and Teuhola report that their implementation [44] is even slower than heapsort, which is quite slow for big inputs due to its cache-inefficiency. ...
... Katajainen and Teuhola report that their implementation [44] is even slower than heapsort, which is quite slow for big inputs due to its cache-inefficiency. The fastest non-stable in-place mergesort implementation we have found is QuickMergesort (QMSort) from Edelkamp et al. [22]. Relevant implementations of stable in-place mergesort are WikiSort (derived from [45]) and GrailSort (derived from [38]). ...
... Relevant implementations of stable in-place mergesort are WikiSort (derived from [45]) and GrailSort (derived from [38]). However, Edelkamp et al. [22] report that WikiSort is a factor of more than 1.5 slower than QMSort for large inputs and that GrailSort performs similar to WikiSort. Edelkamp et al. also state that non-in-place mergesort is considerably faster than in-place mergesort. ...
Preprint
We present sorting algorithms that represent the fastest known techniques for a wide range of input sizes, input distributions, data types, and machines. A part of the speed advantage is due to the feature to work in-place. Previously, the in-place feature often implied performance penalties. Our main algorithmic contribution is a blockwise approach to in-place data distribution that is provably cache-efficient. We also parallelize this approach taking dynamic load balancing and memory locality into account. Our comparison-based algorithm, In-place Superscalar Samplesort (IPS$^4$o), combines this technique with branchless decision trees. By taking cases with many equal elements into account and by adapting the distribution degree dynamically, we obtain a highly robust algorithm that outperforms the best in-place parallel comparison-based competitor by almost a factor of three. IPS$^4$o also outperforms the best comparison-based competitors in the in-place or not in-place, parallel or sequential settings. IPS$^4$o even outperforms the best integer sorting algorithms in a wide range of situations. In many of the remaining cases (often involving near-uniform input distributions, small keys, or a sequential setting), our new in-place radix sorter turns out to be the best algorithm. Claims to have the, in some sense, "best" sorting algorithm can be found in many papers which cannot all be true. Therefore, we base our conclusions on extensive experiments involving a large part of the cross product of 21 state-of-the-art sorting codes, 6 data types, 10 input distributions, 4 machines, 4 memory allocation strategies, and input sizes varying over 7 orders of magnitude. This confirms the robust performance of our algorithms while revealing major performance problems in many competitors outside the concrete set of measurements reported in the associated publications.
... The price of this method is that the number of comparisons increases, while the number of additional moves is better than with the previous method. We shed some more light on this approach in [13]. ...
... Note, however, that the O(n)-term for the worst case of QuickXsort is rather large because of the median-of-medians algorithm. In [13], the first two authors further elaborate on the technique of median-of-medians pivot selection and show how to bring down the O(n)-term for the worst case to 3.58n for QuickMergesort. ...
... In Theorem 4.7, we did a first step towards such guarantees. Moreover, in [13], we examined the same approach in more detail. Still there are many possibilities for good worst-case guarantees to investigate. ...
Article
QuickXsort is a highly efficient in-place sequential sorting scheme that mixes Hoare’s Quicksort algorithm with X, where X can be chosen from a wide range of other known sorting algorithms, like Heapsort, Insertionsort and Mergesort. Its major advantage is that QuickXsort can be in-place even if X is not. In this work we provide general transfer theorems expressing the number of comparisons of QuickXsort in terms of the number of comparisons of X. More specifically, if pivots are chosen as medians of (not too fast) growing size samples, the average number of comparisons of QuickXsort and X differ only by o(n)-terms. For median-of-k pivot selection for some constant k, the difference is a linear term whose coefficient we compute precisely. For instance, median-of-three QuickMergesort uses at most n lg n − 0.8358n + O(log n) comparisons. Furthermore, we examine the possibility of sorting base cases with some other algorithm using even fewer comparisons. By doing so the average-case number of comparisons can be reduced down to n lg n − 1.4112n + o(n) for a remaining gap of only 0.0315n comparisons to the known lower bound (while using only O(log n) additional space and O(n log n) time overall). Implementations of these sorting strategies show that the algorithms challenge well-established library implementations like Musser’s Introsort.
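The abstract above says that QuickXsort can be in-place even if X is not. A minimal sketch of how that can work for X = Mergesort follows: after partitioning, one side is sorted by a mergesort that borrows the other side as swap space, and the loop then continues on the borrowed side. The function names, the Lomuto partition, and the rule for choosing which side to mergesort are illustrative assumptions, not the authors' implementation.

```python
def _partition(a, lo, hi):
    """Median-of-three Lomuto partition of a[lo:hi]; returns the pivot's final index."""
    mid = (lo + hi) // 2
    # Pick the median of three sampled positions (ties broken by index).
    trio = sorted([(a[lo], lo), (a[mid], mid), (a[hi - 1], hi - 1)])
    m = trio[1][1]
    a[m], a[hi - 1] = a[hi - 1], a[m]
    pivot = a[hi - 1]
    store = lo
    for i in range(lo, hi - 1):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi - 1] = a[hi - 1], a[store]
    return store


def _merge_with_buffer(a, lo, mid, hi, buf):
    """Merge sorted runs a[lo:mid] and a[mid:hi], using a[buf:buf + (mid - lo)] as swap space."""
    n_left = mid - lo
    for k in range(n_left):  # swap the left run out into the buffer area
        a[lo + k], a[buf + k] = a[buf + k], a[lo + k]
    i, end_i, j, out = buf, buf + n_left, mid, lo
    while i < end_i and j < hi:
        if a[j] < a[i]:
            a[out], a[j] = a[j], a[out]
            j += 1
        else:
            a[out], a[i] = a[i], a[out]
            i += 1
        out += 1
    while i < end_i:  # leftovers of the left run; right leftovers are already in place
        a[out], a[i] = a[i], a[out]
        i += 1
        out += 1


def _mergesort_with_buffer(a, lo, hi, buf):
    """Top-down mergesort of a[lo:hi]; needs (hi - lo) // 2 buffer slots starting at buf."""
    if hi - lo < 2:
        return
    mid = (lo + hi) // 2
    _mergesort_with_buffer(a, lo, mid, buf)
    _mergesort_with_buffer(a, mid, hi, buf)
    _merge_with_buffer(a, lo, mid, hi, buf)


def quick_mergesort(a):
    """Sort list a in place (sketch of the QuickMergesort idea)."""
    lo, hi = 0, len(a)
    while hi - lo > 1:
        p = _partition(a, lo, hi)
        small, large = (lo, p), (p + 1, hi)
        if small[1] - small[0] > large[1] - large[0]:
            small, large = large, small
        if 2 * (small[1] - small[0]) >= large[1] - large[0]:
            # The smaller side can hold half of the larger side: mergesort the larger side.
            _mergesort_with_buffer(a, large[0], large[1], small[0])
            lo, hi = small
        else:
            # Otherwise mergesort the smaller side, borrowing space from the larger one.
            _mergesort_with_buffer(a, small[0], small[1], large[0])
            lo, hi = large
```

Because the buffer is only ever touched by swaps, it keeps exactly its own elements (in some permuted order), so it still lies on the correct side of the pivot when the loop moves on to it.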
... Traditional comparison-based sorting algorithms such as Quick Sort, Merge Sort, and Heap Sort require at least log n! ≈ n log n − 1.44n operations to sort n data elements [21]. Among these algorithms, Quick Sort can achieve O(n log n) complexity on average to sort n data elements, but its performance drops to O(n²) in the worst case. ...
... Among these algorithms, Quick Sort can achieve O(n log n) complexity on average to sort n data elements, but its performance drops to O(n²) in the worst case. Although Merge Sort gives a worst-case guarantee of n log n − 0.91n operations to sort n data elements, it requires larger space which is linear to the number of data elements [21]. To avoid the drawbacks of these algorithms and further reduce the complexity of sorting, researchers tried to combine different sorting algorithms to leverage their strengths and circumvent their weaknesses. ...
... Stefan Edelkamp et al. introduced Quickx Sort [20] which uses at most n log n − 0.8358n + O(log n) operations to sort n data elements in place. The authors also introduced median-of-medians Quick Merge sort as a variant of Quick Merge Sort using the median-of-medians algorithms for pivot selection [21], which further reduces the number of operations down to n log n + 1.59n + O(n^0.8). Non-comparative sorting algorithms, such as Bucket Sort [13], Counting Sort, and Radix Sort [18], are not restricted by the O(n log n) boundary, and can reach O(n) complexity. ...
Research
Sorting is a fundamental operation in computing. However, the speed of state-of-the-art sorting algorithms on a single thread has reached its limit. Meanwhile, deep learning has demonstrated its potential to provide significant performance improvements on data mining and machine learning tasks. Therefore, it is interesting to explore whether sorting can also be sped up by deep learning techniques. In this paper, a neural network based data distribution aware sorting method named NN-sort is presented. Compared to traditional comparison-based sorting algorithms, which need to compare data elements pairwise, NN-sort leverages the neural network model to learn the data distribution and uses it to map disordered data elements into ordered ones. Although the complexity of NN-sort is n log n in theory, it can run in near-linear time, as observed in most cases. Experimental results on both synthetic and real-world datasets show that NN-sort yields performance improvement by up to 10.9x over traditional sorting algorithms.
... The price of this method is that the number of comparisons increases, while the number of additional moves is better than with the previous method. We shed some more light on this approach in [10]. ...
... Notice that the O(n)-term for the worst case of QuickMergesort is rather large because of the median-of-medians algorithm. Nevertheless, in [10], we elaborate the technique of median-of-medians pivot selection in more detail. In particular, we show how to reduce the O(n)-term for the worst case down to 3.58n for QuickMergesort. ...
... In Theorem 5.7, we did a first step towards such guarantees. Moreover, in [10], we examined the same approach in more detail. Still there are many possibilities for good worst-case guarantees to investigate. ...
Preprint
QuickXsort is a highly efficient in-place sequential sorting scheme that mixes Hoare's Quicksort algorithm with X, where X can be chosen from a wide range of other known sorting algorithms, like Heapsort, Insertionsort and Mergesort. Its major advantage is that QuickXsort can be in-place even if X is not. In this work we provide general transfer theorems expressing the number of comparisons of QuickXsort in terms of the number of comparisons of X. More specifically, if pivots are chosen as medians of (not too fast) growing size samples, the average number of comparisons of QuickXsort and X differ only by $o(n)$-terms. For median-of-$k$ pivot selection for some constant $k$, the difference is a linear term whose coefficient we compute precisely. For instance, median-of-three QuickMergesort uses at most $n \lg n - 0.8358n + O(\log n)$ comparisons. Furthermore, we examine the possibility of sorting base cases with some other algorithm using even fewer comparisons. By doing so the average-case number of comparisons can be reduced down to $n \lg n - 1.4106n + o(n)$ for a remaining gap of only $0.0321n$ comparisons to the known lower bound (while using only $O(\log n)$ additional space and $O(n \log n)$ time overall). Implementations of these sorting strategies show that the algorithms challenge well-established library implementations like Musser's Introsort.
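The "remaining gap" quoted above can be checked with a few lines of arithmetic. The sketch assumes that the lower bound meant is the information-theoretic one, $\lg n! = n \lg n - n/\ln 2 + O(\log n)$ (Stirling's approximation), so the linear coefficient to compare against is $1/\ln 2 \approx 1.4427$. The helper name `gap_to_lower_bound` is made up for illustration.

```python
import math

# Linear-term coefficient of the lower bound lg(n!) = n lg n - (1/ln 2) n + O(log n).
LOWER_BOUND_COEFF = 1 / math.log(2)


def gap_to_lower_bound(achieved_coeff):
    """Gap, in units of n comparisons, between n lg n - achieved_coeff * n and the lower bound."""
    return LOWER_BOUND_COEFF - achieved_coeff


# 1.4106 is the constant in this preprint abstract; 1.4112 appears in the journal version.
for coeff in (1.4106, 1.4112):
    print(coeff, round(gap_to_lower_bound(coeff), 4))
```

Rounded to four decimals, the two gaps come out as 0.0321 and 0.0315, matching the figures stated in the two versions of the abstract.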
... In many applications (e.g., sorting), it is not important to find an exact median, or any other precise order statistic, for that matter, and an approximate median suffices [18]. For instance, quick-sort type algorithms aim at finding a (not necessarily perfect) balanced partition rather quickly; see e.g., [5,22]. ...
Preprint
Given a sequence $A$ of $n$ numbers and an integer (target) parameter $1\leq i\leq n$, the (exact) selection problem asks to find the $i$-th smallest element in $A$. An element is said to be $(i,j)$-mediocre if it is neither among the top $i$ nor among the bottom $j$ elements of $A$. The approximate selection problem asks to find an $(i,j)$-mediocre element for some given $i,j$; as such, this variant allows the algorithm to return any element in a prescribed range. In the first part, we revisit the selection problem in the two-party model introduced by Andrew Yao (1979) and then extend our study of exact selection to the multiparty model. In the second part, we deduce some communication complexity benefits that arise in approximate selection.
Article
We present new sequential and parallel sorting algorithms that now represent the fastest known techniques for a wide range of input sizes, input distributions, data types, and machines. Somewhat surprisingly, part of the speed advantage is due to the additional feature of the algorithms to work in-place, i.e., they do not need a significant amount of space beyond the input array. Previously, the in-place feature often implied performance penalties. Our main algorithmic contribution is a blockwise approach to in-place data distribution that is provably cache-efficient. We also parallelize this approach taking dynamic load balancing and memory locality into account. Our new comparison-based algorithm, In-place Parallel Super Scalar Samplesort (IPS⁴o), combines this technique with branchless decision trees. By taking cases with many equal elements into account and by adapting the distribution degree dynamically, we obtain a highly robust algorithm that outperforms the best previous in-place parallel comparison-based sorting algorithms by almost a factor of three. That algorithm also outperforms the best comparison-based competitors regardless of whether we consider in-place or not in-place, parallel or sequential settings. Another surprising result is that IPS⁴o even outperforms the best (in-place or not in-place) integer sorting algorithms in a wide range of situations. In many of the remaining cases (often involving near-uniform input distributions, small keys, or a sequential setting), our new In-place Parallel Super Scalar Radix Sort (IPS²Ra) turns out to be the best algorithm. Claims to have the – in some sense – “best” sorting algorithm can be found in many papers which cannot all be true. Therefore, we base our conclusions on an extensive experimental study involving a large part of the cross product of 21 state-of-the-art sorting codes, 6 data types, 10 input distributions, 4 machines, 4 memory allocation strategies, and input sizes varying over 7 orders of magnitude. This confirms the claims made about the robust performance of our algorithms while revealing major performance problems in many competitors outside the concrete set of measurements reported in the associated publications. This is particularly true for integer sorting algorithms giving one reason to prefer comparison-based algorithms for robust general-purpose sorting.
Article
The linear-time pivot selection algorithm known as median-of-medians brings the worst-case complexity of quicksort down to $\mathrm{O}(n\ln n)$. Nevertheless, it has often been said that this algorithm is too expensive to use in quicksort. In this article, we show that quicksort with this kind of pivot selection can be made efficient.
Conference Paper
The Median of Medians (also known as BFPRT) algorithm, although a landmark theoretical achievement, is seldom used in practice because it and its variants are slower than simple approaches based on sampling. The main contribution of this paper is a fast linear-time deterministic selection algorithm QuickselectAdaptive based on a refined definition of MedianOfMedians. The algorithm's performance brings deterministic selection---along with its desirable properties of reproducible runs, predictable run times, and immunity to pathological inputs---in the range of practicality. We demonstrate results on independent and identically distributed random inputs and on normally-distributed inputs. Measurements show that QuickselectAdaptive is faster than state-of-the-art baselines.
Conference Paper
In quicksort, due to branch mispredictions, a skewed pivot-selection strategy can lead to a better performance than the exact-median pivot-selection strategy, even if the exact median is given for free. In this paper we investigate the effect of branch mispredictions on the behaviour of mergesort. By decoupling element comparisons from branches, we can avoid most negative effects caused by branch mispredictions. When sorting a sequence of n elements, our fastest version of mergesort performs n log₂ n + O(n) element comparisons and induces at most O(n) branch mispredictions. We also describe an in-situ version of mergesort that provides the same bounds, but uses only O(log₂ n) words of extra memory. In our test computers, when sorting integer data, mergesort was the fastest sorting method, then came quicksort, and in-situ mergesort was the slowest of the three. We did a similar kind of decoupling for quicksort, but the transformation made it slower.
Conference Paper
We present a new analysis for QuickHeapsort splitting it into the analysis of the partition-phases and the analysis of the heap-phases. This enables us to consider samples of non-constant size for the pivot selection and leads to better theoretical bounds for the algorithm. Furthermore we introduce some modifications of QuickHeapsort, both in-place and using n extra bits. We show that on every input the expected number of comparisons is n lg n − 0.03n + o(n) (in-place) and n lg n − 0.997n + o(n) (using n extra bits), respectively. Both estimates improve the previously known best results. (It is conjectured in Wegener93 that the in-place algorithm Bottom-Up-Heapsort uses at most n lg n + 0.4n on average and for Weak-Heapsort which uses n extra-bits the average number of comparisons is at most n lg n − 0.42n in EdelkampS02.) Moreover, our non-in-place variant can even compete with index based Heapsort variants (e.g. Rank-Heapsort in WangW07) and Relaxed-Weak-Heapsort (n lg n − 0.9n + o(n) comparisons in the worst case) for which no O(n)-bound on the number of extra bits is known.
Conference Paper
First we present a new variant of Merge-sort, which needs only 1.25n space, because it reuses space that becomes available within the current stage. It does not need more comparisons than classical Merge-sort. The main result is an easy to implement method of iterating the procedure in-place starting to sort 4/5 of the elements. Hereby we can keep the additional transport costs linear and only very few comparisons get lost, so that n log n − 0.8n comparisons are needed. We show that we can improve the number of comparisons if we sort blocks of constant length with Merge-Insertion, before starting the algorithm. Another improvement is to start the iteration with a better version, which needs only (1+ε)n space and again additional O(n) transports. The result is that we can improve this theoretically up to n log n − 1.3289n comparisons in the worst case. This is close to the theoretical lower bound of n log n − 1.443n. The total number of transports in all these versions can be reduced to ε n log n + O(1) for any ε > 0.
Conference Paper
Quickselect with median-of-3 is largely used in practice and its behavior is fairly well understood. However, the following natural adaptive variant, which we call proportion-from-3, had not been previously analyzed: "choose as pivot the smallest of the sample if the rank of the sought element is small, the largest if the rank is large, and the median if the rank is medium". We first analyze proportion-from-2 and then proportion-from-3. We also analyze ν-find, a generalization of proportion-from-3 with interval breakpoints at ν and 1 − ν. We show that there exists an optimal value of ν and we also provide the range of values of ν where ν-find outperforms median-of-3. Our results strongly suggest that a suitable implementation of this variant could be the method of choice in a practical setting. Finally, we also show that proportion-from-s and similar strategies are optimal when s → ∞.
Article
In an earlier research paper [HL1], we presented a novel, yet straightforward linear-time algorithm for merging two sorted lists in a fixed amount of additional space. Constant of proportionality estimates and empirical testing reveal that this procedure is reasonably competitive with merge routines free to squander unbounded additional memory, making it particularly attractive whenever space is a critical resource. In this paper, we devise a relatively simple strategy by which this efficient merge can be made stable, and extend our results in a nontrivial way to the problem of stable sorting by merging. We also derive upper bounds on our algorithms' constants of proportionality, suggesting that in some environments (most notably external file processing) their modest run-time premiums may be more than offset by the dramatic space savings achieved.
Article
This short note reports a master theorem on tight asymptotic solutions to divide-and-conquer recurrences with more than one recursive term: for example, T(n) = 1/4 T(n/16) + 1/3 T(3n/5) + 4 T(n/100) + 10 T(n/300) + n^2.
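As a rough sanity check on the example recurrence, one can apply the well-known Akra–Bazzi method, which may or may not be the same statement as the theorem in the note. The method looks for the exponent p with Σ aᵢ·bᵢᵖ = 1 and compares it with the exponent of the driving term n². The names `TERMS`, `weight_sum` and `critical_exponent` below are illustrative assumptions.

```python
# Coefficients (a_i, b_i) of T(n) = sum_i a_i * T(b_i * n) + n^2, taken from the example above.
TERMS = [(1 / 4, 1 / 16), (1 / 3, 3 / 5), (4, 1 / 100), (10, 1 / 300)]


def weight_sum(p):
    """Left-hand side of the Akra-Bazzi characteristic equation sum_i a_i * b_i**p = 1."""
    return sum(a * b ** p for a, b in TERMS)


def critical_exponent(lo=0.0, hi=2.0, iterations=100):
    """Bisection for the root p; weight_sum is strictly decreasing because every b_i < 1."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if weight_sum(mid) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


p = critical_exponent()
print(p, weight_sum(2.0))
```

Since `weight_sum(2.0)` is below 1, the root p is smaller than 2, so under the Akra–Bazzi conditions the driving term dominates and T(n) = Θ(n²).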
Article
The early algorithms for in-place merging were mainly focused on the time complexity, whereas their structures themselves were ignored. Most of them therefore are elusive and of only theoretical significance. For this reason, the paper simplifies the unstable in-place merge by Geffert et al. [V. Geffert, J. Katajainen, T. Pasanen, Asymptotically efficient in-place merging, Theoret. Comput. Sci. 237 (2000) 159–181]. The simplified algorithm is simple yet practical, and has a small time complexity.
Article
We present an efficient and practical algorithm for the internal sorting problem. Our algorithm works in-place and, on the average, has a running-time of in the size n of the input. More specifically, the algorithm performs comparisons and element moves on the average. An experimental comparison of our proposed algorithm with the most efficient variants of Quicksort and Heapsort is carried out and its results are discussed.
Article
Two new linear-time algorithms for in-place merging are presented. Both algorithms perform at most (1 + t)m + n/2^t + o(m) element comparisons, where m and n are the sizes of the input sequences, m ⩽ n, and t = ⌊log₂(n/m)⌋. The first algorithm is for unstable merging and it carries out no more than 4(m + n) + o(n) element moves. The second algorithm is for stable merging and it accomplishes at most 15m + 13n + o(n) moves.
Article
The number of comparisons required to select the i-th smallest of n numbers is shown to be at most a linear function of n by analysis of a new selection algorithm—PICK. Specifically, no more than 5.4305 n comparisons are ever required. This bound is improved for extreme values of i, and a new lower bound on the requisite number of comparisons is also proved.
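To illustrate worst-case linear selection, here is a minimal sketch of the textbook median-of-medians scheme with groups of five. It is not the exact PICK procedure analysed above and makes no attempt to reach the 5.4305 n comparison bound. It works on copies rather than in place, and the function name `mom_select` is made up for this example.

```python
def mom_select(values, k):
    """Return the k-th smallest element (0-based) of values by median-of-medians selection."""
    a = list(values)
    if not 0 <= k < len(a):
        raise IndexError("k out of range")
    while True:
        if len(a) <= 5:
            return sorted(a)[k]
        # Median of each group of (at most) five elements.
        groups = [sorted(a[i:i + 5]) for i in range(0, len(a), 5)]
        medians = [g[len(g) // 2] for g in groups]
        # The pivot is the median of those medians, found recursively.
        pivot = mom_select(medians, len(medians) // 2)
        lows = [x for x in a if x < pivot]
        highs = [x for x in a if x > pivot]
        n_equal = len(a) - len(lows) - len(highs)
        if k < len(lows):
            a = lows
        elif k < len(lows) + n_equal:
            return pivot
        else:
            k -= len(lows) + n_equal
            a = highs
```

The three-way split keeps the sketch correct when the input contains many equal keys, because elements equal to the pivot are never passed on to the next round.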
Article
A new selection algorithm is presented which is shown to be very efficient on the average, both theoretically and practically. The number of comparisons used to select the ith smallest of n numbers is n + min(i,n-i) + o(n). A lower bound within 9 percent of the above formula is also derived.
Article
Quickselect with median-of-3 is largely used in practice and its behavior is fairly well understood. However, the following natural adaptive variant, which we call proportion-from-3, had not been previously analyzed: “choose as pivot the smallest of the sample if the relative rank of the sought element is below 1/3, the largest if the relative rank is above 2/3, and the median if the relative rank is between 1/3 and 2/3.” We first analyze the average number of comparisons made when using proportion-from-2 and then for proportion-from-3. We also analyze ν-find, a generalization of proportion-from-3 with interval breakpoints at ν and 1-ν. We show that there exists an optimal value of ν and we also provide the range of values of ν where ν-find outperforms median-of-3. Then, we consider the average total cost of these strategies, which takes into account the cost of both comparisons and exchanges. Our results strongly suggest that a suitable implementation of ν-find could be the method of choice in a practical setting. We also study the behavior of proportion-from-s with s>3 and in particular we show that proportion-from-s-like strategies are optimal when s→∞.
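The quoted proportion-from-3 rule can be sketched as follows. The thresholds 1/3 and 2/3 come from the abstract; everything else (random sampling of three elements, the list-based three-way split, the name `proportion_from_3_select`, the `seed` parameter, and using k/n as the relative rank) is an assumption made for illustration.

```python
import random


def proportion_from_3_select(values, k, seed=0):
    """Return the k-th smallest element (0-based) using proportion-from-3 pivot selection."""
    a = list(values)
    if not 0 <= k < len(a):
        raise IndexError("k out of range")
    rng = random.Random(seed)
    while True:
        n = len(a)
        if n <= 3:
            return sorted(a)[k]
        sample = sorted(rng.sample(a, 3))
        relative_rank = k / n
        if relative_rank < 1 / 3:
            pivot = sample[0]  # sought element is near the bottom: smallest of the sample
        elif relative_rank > 2 / 3:
            pivot = sample[2]  # sought element is near the top: largest of the sample
        else:
            pivot = sample[1]  # otherwise: median of the sample
        lows = [x for x in a if x < pivot]
        highs = [x for x in a if x > pivot]
        n_equal = n - len(lows) - len(highs)
        if k < len(lows):
            a = lows
        elif k < len(lows) + n_equal:
            return pivot
        else:
            k -= len(lows) + n_equal
            a = highs
```

Replacing the two breakpoints by ν and 1 − ν would give the ν-find generalization mentioned in the abstract.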
Article
SUMMARY We recount the history of a new qsort function for a C library. Our function is clearer, faster and more robust than existing sorts. It chooses partitioning elements by a new sampling scheme; it partitions by a novel solution to Dijkstra's Dutch National Flag problem; and it swaps efficiently. Its behavior was assessed with timing and debugging testbeds, and with a program to certify performance. The design techniques apply in domains beyond sorting.
Article
Quicksort is the preferred in-place sorting algorithm in many contexts, since its average computing time on uniformly distributed inputs is Theta(N log N) and it is in fact faster than most other sorting algorithms on most inputs. Its drawback is that its worst-case time bound is Theta(N^2). Previous attempts to protect against the worst case by improving the way quicksort chooses pivot elements for partitioning have increased the average computing time too much---one might as well use heapsort, which has a Theta(N log N) worst-case time bound but is on the average 2 to 5 times slower than quicksort. A similar dilemma exists with selection algorithms (for finding the i-th largest element) based on partitioning. This paper describes a simple solution to this dilemma: limit the depth of partitioning, and for subproblems that exceed the limit switch to another algorithm with a better worst-case bound. Using heapsort as the "stopper" yields a sorting algorithm that is just as fast as quicksort in the average case but also has a Theta(N log N) worst-case time bound. For selection, a hybrid of Hoare's find algorithm, which is linear on average but quadratic in the worst case, and the Blum-Floyd-Pratt-Rivest-Tarjan algorithm is as fast as Hoare's algorithm in practice, yet has a linear worst-case time bound. Also discussed are issues of implementing the new algorithms as generic algorithms and accurately measuring their performance in the framework of the C++ Standard Template Library.
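The depth-limiting idea described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's generic C++ implementation: the depth limit of 2·⌊log₂ N⌋, the middle-element Lomuto partition, and the `heapq`-based stand-in for heapsort (which copies the subrange and is therefore not in-place) are all choices made here for brevity.

```python
import heapq


def _heapsort_range(a, lo, hi):
    """Stand-in 'stopper' with an O(n log n) worst case; copies the range, so not in-place."""
    heap = a[lo:hi]
    heapq.heapify(heap)
    for i in range(lo, hi):
        a[i] = heapq.heappop(heap)


def _partition(a, lo, hi):
    """Lomuto partition of a[lo:hi] around the middle element; returns the pivot's index."""
    mid = (lo + hi) // 2
    a[mid], a[hi - 1] = a[hi - 1], a[mid]
    pivot = a[hi - 1]
    store = lo
    for i in range(lo, hi - 1):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi - 1] = a[hi - 1], a[store]
    return store


def _introsort(a, lo, hi, depth_left):
    while hi - lo > 1:
        if depth_left == 0:
            # Partitioning went too deep: hand the subproblem to the stopper.
            _heapsort_range(a, lo, hi)
            return
        depth_left -= 1
        p = _partition(a, lo, hi)
        _introsort(a, lo, p, depth_left)  # recurse on the left part
        lo = p + 1                        # iterate on the right part


def introsort(a):
    """Sort list a; quicksort with a depth limit and a heapsort fallback."""
    n = len(a)
    depth_limit = 2 * (n.bit_length() - 1) if n > 0 else 0  # 2 * floor(log2 n)
    _introsort(a, 0, n, depth_limit)
```

An input of many equal keys drives this particular partition scheme into maximally unbalanced splits, so the fallback is exercised after `depth_limit` levels instead of the sort degrading to quadratic time.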
Article
A variant of Heapsort---named Ultimate Heapsort---is presented that sorts n elements in-place in Theta(n log₂(n + 1)) worst-case time by performing at most n log₂ n + Theta(n) key comparisons and n log₂ n + Theta(n) element moves. The secret behind Ultimate Heapsort is that it occasionally transforms the heap it operates with to a two-layer heap which keeps small elements at the leaves. Basically, Ultimate Heapsort is like Bottom-Up Heapsort but, due to the two-layer heap property, an element taken from a leaf has to be moved towards the root only O(1) levels, on average. Let a[1..n] be an array of n elements each consisting of a key and some information associated with this key. This array is a (maximum) heap if, for all i ∈ {2, ..., n}, the key of element a[⌊i/2⌋] is larger than or equal to that of element a[i]. That is, a heap is a pointer-free representation of a left complete binary tree, where the elements stored are partially ordered according to their key...
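The heap condition stated above (each element at 1-based position i ≥ 2 is no larger than its parent at position ⌊i/2⌋) translates directly into a short check. The function name `is_max_heap` is an illustrative assumption, and plain comparable values stand in for the keys of the elements.

```python
def is_max_heap(a):
    """Check the 1-based heap condition a[i // 2] >= a[i] for all i in 2..n on a Python list."""
    n = len(a)
    # Python lists are 0-based, so 1-based position i lives at index i - 1.
    return all(a[i // 2 - 1] >= a[i - 1] for i in range(2, n + 1))


print(is_max_heap([9, 7, 8, 3, 5]))  # parents 9, 7 dominate their children
print(is_max_heap([1, 2]))           # child 2 exceeds its parent 1
```

The first call prints `True` and the second prints `False`.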
Article
Two in-place variants of the classical mergesort algorithm are analysed in detail. The first, straightforward variant performs at most N log₂ N + O(N) comparisons and 3N log₂ N + O(N) moves to sort N elements. The second, more advanced variant requires at most N log₂ N + O(N) comparisons and εN log₂ N moves, for any fixed ε > 0 and any N > N(ε). In theory, the second one is superior to advanced versions of heapsort. In practice, due to the overhead in the index manipulation, our fastest in-place mergesort behaves still about 50 per cent slower than the bottom-up heapsort. However, our implementations are practical compared to mergesort algorithms based on in-place merging. Key words: sorting, mergesort, in-place algorithms. CR Classification: F.2.2.