Figure 1 - uploaded by Sebastian Wild
Control flow graph of the main partitioning loop of JRE7 (lines 20-34 of Listing 1 on page 13). These blocks are the only ones that are executed a linearithmic number of times, so they determine the leading term of costs. In the upper right corner of each block, the number of Bytecode instructions is given. Backward arcs are highlighted.


Source publication
Conference Paper
Recent results on Java 7's dual pivot Quicksort have revealed its highly asymmetric nature. These insights suggest that asymmetric pivot choices are preferable to symmetric ones for this Quicksort variant. From a theoretical point of view, this should allow us to improve on the current implementation in Oracle's Java 7 runtime library. In this pape...

Contexts in source publication

Context 1
... also correctly identifies the hot spots of the algorithm, i.e., the basic blocks which are executed asymptotically most often. They are shown in Figure 1 on the preceding page. Execution of the loop is terminated once pointers k and g have crossed (exit condition of block 1). ...
Context 2
... the overall number of loop iterations only depends on how balanced the recursion tree globally is, but not on the direction of asymmetry, i.e., whether pivots are larger or smaller than exact tertiles in expectation. The direction of asymmetry does however influence which paths through Figure 1 the iterations take: The ranks of the chosen pivots determine the odds for outcomes of comparisons in branching blocks. In total, there are five different cycles in Figure 1: ...
Context 3
... direction of asymmetry does however influence which paths through Figure 1 the iterations take: The ranks of the chosen pivots determine the odds for outcomes of comparisons in branching blocks. In total, there are five different cycles in Figure 1: ...
Context 4
... have to take into account that C3 and C4 actually count as two iterations, since k and g move two steps closer to each other on these paths. Ordering the cycles by costs implies the following preference: Figure 11 on page 9). We can now use the branching information from the control flow graph to make more iterations choose cheap cycles. ...
Context 5
... 10: Relative basic block running times for all blocks that are executed Θ(n log n) times. The numbers correspond to the block IDs used in Figure 1. It is clearly visible that the running time contribution of some basic blocks is heavily influenced by the pivot choice. ...
Context 6
... the asymptotically dominating basic blocks (i.e., those with a linearithmic number of executions), surprisingly, the picture changes, as shown in Figure 10. ...
Context 7
... carries over to the costs of the five cycles C1, ..., C5 identified in the control flow graph of the partitioning method; see Figure 11. A closer inspection of the figure explains why JRE7 (1,3) performs worse than expected based on the number of executed Bytecodes: For JRE7 block times, C5 is the cheapest cycle by far, whereas C1 is rather expensive. ...
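
To make the contexts above more concrete, here is a minimal Java sketch of a Yaroslavskiy-style dual-pivot partitioning loop that counts how often each of five loop paths is taken. It is an illustration under assumptions, not the JRE7 source and not Listing 1 of the paper: pivots are simply the first and last element (no sampling, no Insertionsort cutoff), and the class name, path labels, and counters are hypothetical. The labels are descriptive only; no exact mapping to the paper's C1, ..., C5 is claimed.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only: NOT the JRE7 source and NOT the paper's Listing 1.
class DualPivotPaths {
    // Descriptive labels chosen for this sketch; they are not the paper's C1..C5.
    static final int SMALL = 0, MEDIUM = 1, LARGE_SWAP_SMALL = 2,
                     LARGE_SWAP_MEDIUM = 3, G_SKIPS_LARGE = 4;
    final long[] pathCount = new long[5];

    void sort(int[] a, int left, int right) {
        if (left >= right) return;
        if (a[left] > a[right]) { int t = a[left]; a[left] = a[right]; a[right] = t; }
        final int p = a[left], q = a[right];   // the two pivots, p <= q (assumed choice)
        int l = left + 1, g = right - 1;
        outer:
        for (int k = l; k <= g; k++) {         // ends once pointers k and g have crossed
            final int ak = a[k];
            if (ak < p) {                      // small element: append to the left part
                a[k] = a[l]; a[l] = ak; l++;
                pathCount[SMALL]++;
            } else if (ak > q) {               // large element: look for a partner at g
                while (a[g] > q) {             // g skips elements that are already large
                    pathCount[G_SKIPS_LARGE]++;
                    if (g-- == k) break outer;
                }
                if (a[g] < p) {                // partner is small: three-way exchange
                    a[k] = a[l]; a[l] = a[g]; l++;
                    pathCount[LARGE_SWAP_SMALL]++;
                } else {                       // partner is medium: plain exchange
                    a[k] = a[g];
                    pathCount[LARGE_SWAP_MEDIUM]++;
                }
                a[g] = ak; g--;
            } else {                           // medium element: nothing to move
                pathCount[MEDIUM]++;
            }
        }
        // Move the pivots to their final positions l-1 and g+1, then recurse.
        a[left] = a[l - 1]; a[l - 1] = p;
        a[right] = a[g + 1]; a[g + 1] = q;
        sort(a, left, l - 2);
        sort(a, l, g);
        sort(a, g + 2, right);
    }

    public static void main(String[] args) {
        int n = 100_000;
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        Random rnd = new Random(42);
        for (int i = n - 1; i > 0; i--) {      // Fisher-Yates shuffle
            int j = rnd.nextInt(i + 1);
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
        DualPivotPaths s = new DualPivotPaths();
        s.sort(a, 0, n - 1);
        System.out.println("path counts: " + Arrays.toString(s.pathCount));
    }
}
```

In the two "large" paths both k and g advance, which fits the remark in Context 4 that some cycles count as two iterations. Changing how the pivots are chosen shifts the relative frequencies in `pathCount`, which is the effect the contexts describe.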


Citations

... And for the other sums in Equation (3): [15] and [16]). This is the same value as the expected number of comparisons when one pivot is chosen in the classical Quicksort [17]. ...
Article
Sorting an array of objects such as integers, bytes, floats, etc., is considered one of the most important problems in Computer Science. Quicksort is an effective and widely studied sorting algorithm to sort an array of n distinct elements using a single pivot. Recently, a modified version of the classical Quicksort, due to Vladimir Yaroslavskiy, was chosen as the standard sorting algorithm for Oracle's Java 7 runtime library. The purpose of this paper is to present the different behavior of the classical Quicksort and the Dual-pivot Quicksort in complexity. In particular, we discuss the convergence of the Dual-pivot Quicksort process by using the contraction method. Moreover, we show that the distribution of the number of comparisons done by the dual-pivot process converges to a unique fixed point.
... This however leads to very unbalanced distributions of sizes for the recursive calls, such that a trade-off between partitioning costs and balance of subproblem sizes has to be found. We have demonstrated experimentally that there is potential to tune dual-pivot Quicksort using skewed pivots (Wild et al., 2013c), but only considered a small part of the parameter space. It will be the purpose of this paper to identify the optimal way to sample pivots by means of a precise analysis of the resulting overall costs, and to validate (and extend) the empirical findings that way. ...
... the code to count key comparisons, swaps and scanned elements. For counting the number of executed Java Bytecode instructions, we used our tool MaLiJAn, which can automatically generate code to count the number of Bytecodes (Wild et al., 2013c). All reported counts are averages of runs on 1000 random permutations of the same size. ...
... is the optimal sampling parameter for most k (and all values for k in with t = (0, 1, 2), despite using the same sample size k = 5. Whether this also results in a performance gain in practice, however, depends on details of the runtime environment (Wild et al., 2013c). (One should also note that the savings are only 2% respectively 4%.) Since these two cost measures (Bytecodes and scanned elements) are arguably the ones with highest impact in the running time, it is very good news from the practitioner's point of view that the optimal choice for one of them is also reasonably good for the other; such choice should yield a close-to-optimal running time (as far as sampling is involved). ...
Article
The new dual-pivot Quicksort by Vladimir Yaroslavskiy - used in Oracle's Java runtime library since version 7 - features intriguing asymmetries in its behavior. They were shown to cause a basic variant of this algorithm to use fewer comparisons than classic single-pivot Quicksort implementations. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample. Surprisingly, dual-pivot Quicksort then needs more comparisons than a corresponding version of classic Quicksort, so it is clear that counting comparisons is not sufficient to explain the running time advantages observed for Yaroslavskiy's algorithm in practice. Consequently, we take a more holistic approach in this paper and also give the precise leading term of the average number of swaps, the number of executed Java Bytecode instructions and the number of I/Os in the external-memory model and determine the optimal order statistics for each of these cost measures. It turns out that - unlike for classic Quicksort, where it is optimal to choose the pivot as median of the sample - the asymmetries in Yaroslavskiy's algorithm render pivots with a systematic skew more efficient than the symmetric choice. Moreover, we finally have a convincing explanation for the success of Yaroslavskiy's algorithm in practice: Compared with corresponding versions of classic single-pivot Quicksort, dual-pivot Quicksort needs significantly fewer I/Os, both with and without pivot sampling.
... However, the classic partitioning methods treat elements smaller and larger than the pivot in symmetric ways, unlike Yaroslavskiy's partitioning algorithm: Depending on how elements relate to the two pivots, one of five different execution paths is taken in the partitioning loop, and these have highly different costs (Wild et al., 2013c)! How often each of these five paths is taken thus depends on the ranks of the two pivots, which we can push in a certain direction by selecting other order statistics of a sample than the tertiles. ...
... It is interesting to note in this context that the implementation in Oracle's Java 7 runtime library, which uses t = (1, 1, 1), executes asymptotically more Bytecodes (on random permutations) than Y w t with t = (0, 1, 2), despite using the same sample size k = 5. Whether this also results in a performance gain in practice, however, depends on details of the runtime environment (Wild et al., 2013c). ...
... In this paper, we gave the precise leading term asymptotic of the average costs of Quicksort with Yaroslavskiy's dual-pivot partitioning method and selection of pivots as arbitrary order statistics of a constant size sample. Our results confirm earlier empirical findings (Yaroslavskiy, 2010; Wild et al., 2013c) that the inherent asymmetries of the partitioning algorithm call for a systematic skew in selecting the pivots, the tuning of which requires a quantitative understanding of the delicate trade-off between partitioning costs and the distribution of subproblem sizes for recursive calls. Moreover, we have demonstrated that this tuning process is very sensitive to the choice of suitable cost measures, which firmly suggests detailed analyses in the style of Knuth, instead of focusing on the number of comparisons and swaps only. ...
Article
The new dual-pivot Quicksort by Vladimir Yaroslavskiy - used in Oracle's Java runtime library since version 7 - features intriguing asymmetries in its behavior. They were shown to cause a basic variant of this algorithm to use fewer comparisons than classic single-pivot Quicksort implementations. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample and give the precise leading term of the average number of comparisons, swaps and executed Java Bytecode instructions. It turns out that - unlike for classic Quicksort, where it is optimal to choose the pivot as median of the sample - the asymmetries in Yaroslavskiy's algorithm render pivots with a systematic skew more efficient than the symmetric choice. Moreover, the optimal skew heavily depends on the employed cost measure; most strikingly, abstract costs like the number of swaps and comparisons yield a very different result than counting Java Bytecode instructions, which can be assumed most closely related to actual running time.
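
The excerpts above refer to pivots chosen as "fixed order statistics of a random sample", parameterized by t = (t1, t2, t3) with sample size k = 5. A minimal sketch of one common reading of that parameterization follows: from a sample of k = t1 + t2 + t3 + 2 elements, the smaller pivot has t1 sample elements below it, t2 lie between the pivots, and t3 lie above the larger pivot. The class and method names are hypothetical, and sorting the sample is just the simplest way to obtain the order statistics, not necessarily what any library does.

```java
import java.util.Arrays;

// Illustrative sketch; names are hypothetical and not taken from the papers.
class PivotSampling {
    /**
     * Returns {p, q}: the (t1+1)-th and (t1+t2+2)-th smallest elements of the
     * sample, assuming sample.length == t1 + t2 + t3 + 2.
     */
    static int[] choosePivots(int[] sample, int t1, int t2, int t3) {
        if (sample.length != t1 + t2 + t3 + 2) {
            throw new IllegalArgumentException("sample size must be t1 + t2 + t3 + 2");
        }
        int[] s = sample.clone();   // leave the caller's array untouched
        Arrays.sort(s);
        return new int[] { s[t1], s[t1 + t2 + 1] };
    }

    public static void main(String[] args) {
        int[] sample = {42, 7, 19, 88, 63};   // sorted: 7, 19, 42, 63, 88
        // t = (1, 1, 1): tertiles of five, i.e. the 2nd and 4th smallest.
        System.out.println(Arrays.toString(choosePivots(sample, 1, 1, 1))); // [19, 63]
        // t = (0, 1, 2): a skewed choice, the 1st and 3rd smallest.
        System.out.println(Arrays.toString(choosePivots(sample, 0, 1, 2))); // [7, 42]
    }
}
```

With t = (0, 1, 2) both pivots are smaller in expectation than with t = (1, 1, 1), which is the kind of systematic skew the abstract describes.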
... Future research may focus on this scenario, trying to identify an optimal choice for the pivots. Related results are known for classic Quickselect [26, 28] and Yaroslavskiy's algorithm in Quicksorting [48]. Furthermore, it would be interesting to extend our analysis to the number of bit comparisons instead of atomic key comparisons. ...
Article
There is excitement within the algorithms community about a new partitioning method introduced by Yaroslavskiy. This algorithm renders Quicksort slightly faster than the case when it runs under classic partitioning methods. We show that this improved performance in Quicksort is not sustained in Quickselect, a variant of Quicksort for finding order statistics. We investigate the number of comparisons made by Quickselect to find a key with a randomly selected rank under Yaroslavskiy's algorithm. This grand averaging is a smoothing operator over all individual distributions for specific fixed order statistics. We give the exact grand average. The grand distribution of the number of comparisons (when suitably scaled) is given as the fixed-point solution of a distributional equation of a contraction in the Zolotarev metric space. Our investigation shows that Quickselect under older partitioning methods slightly outperforms Quickselect under Yaroslavskiy's algorithm, for an order statistic of a random rank. Similar results are obtained for extremal order statistics, where again we find the exact average, and the distribution for the number of comparisons (when suitably scaled). Both limiting distributions are of perpetuities (a sum of products of independent mixed continuous random variables).
... The number of executed Bytecode instructions has been shown to resemble actual running time [Camesi et al., 2006], even though just in time compilation can have a tremendous influence [Wild et al., 2013] and some aspects of modern processor architectures are neglected. ...
... The actual Java 7 runtime library implementation uses M = 46, which seems far from optimal at first sight. Note however that the implementation uses the more elaborate pivot selection scheme tertiles of five [Wild et al., 2013], which implies additional constant overhead per partitioning step. ...
... Then, for each block the number of Bytecode instructions was counted; the result is given in Table 5. We have automated this process as part of our tool MaLiJAn (Maximum Likelihood Java Analyzer), which provides a means of automating empirical studies of algorithms based on their control flow graphs [Laube and Nebel, 2010; Wild et al., 2013]. ...
Article
In 2009, Oracle replaced the long-serving sorting algorithm in its Java 7 runtime library by a new dual-pivot Quicksort variant due to Vladimir Yaroslavskiy. The decision was based on the strikingly good performance of Yaroslavskiy's implementation in running time experiments. At that time, no precise investigations of the algorithm were available to explain its superior performance; on the contrary, previous theoretical studies of other dual-pivot Quicksort variants even discouraged the use of two pivots. In 2012, two of the authors gave an average case analysis of a simplified version of Yaroslavskiy's algorithm, proving that savings in the number of comparisons are possible. However, Yaroslavskiy's algorithm needs more swaps, which renders the analysis inconclusive. To force the issue, we herein extend our analysis to the fully detailed style of Knuth: we determine the exact number of executed Java Bytecode instructions. Surprisingly, Yaroslavskiy's algorithm needs slightly more Bytecode instructions than a simple implementation of classic Quicksort, contradicting observed running times. As in Oracle's library implementation, we incorporate the use of Insertionsort on small subproblems and show that it indeed speeds up Yaroslavskiy's Quicksort in terms of Bytecodes; but even with optimal Insertionsort thresholds, the new Quicksort variant needs slightly more Bytecode instructions on average. Finally, we show that the (suitably normalized) costs of Yaroslavskiy's algorithm converge to a random variable whose distribution is characterized by a fixed-point equation. From that, we compute variances of costs and show that for large n, costs are concentrated around their mean.
... As noted by Wild et al. [18], considering only key comparisons and swap operations does not suffice for evaluating the practicability of sorting algorithms. In Section 9, we will present preliminary experimental results that indicate the following: When sorting integers, the "optimal" method of Section 5 is slower than Yaroslavskiy's algorithm. ...
... We choose the second-largest and fourth-largest as pivots. (This is the pivot choice that is used in Yaroslavskiy's algorithm in the JRE7 implementation, see [18] for further discussion.) The probability that p and q, p < q, are chosen as pivots is exactly (s · m · )/ n 5 . ...
... Applying (12), we get E(C_n^Y) = 1.704 n ln n + o(n ln n) key comparisons. (Note that Wild et al. [18] calculated this leading coefficient as well.) This is slightly better than "clever quicksort", which uses the median of a sample of three elements as a single pivot element and achieves 1.714 n ln n + O(n) key comparisons on average [7]. ...
Conference Paper
Dual pivot quicksort refers to variants of classical quicksort where in the partitioning step two pivots are used to split the input into three segments. This can be done in different ways, giving rise to different algorithms. Recently, a dual pivot algorithm due to Yaroslavskiy received much attention, because it replaced the well-engineered quicksort algorithm in Oracle's Java 7 runtime library. Nebel and Wild (ESA 2012) analyzed this algorithm and showed that on average it uses 1.9n ln n + O(n) comparisons to sort an input of size n, beating standard quicksort, which uses 2n ln n + O(n) comparisons. We introduce a model that captures all dual pivot algorithms, give a unified analysis, and identify new dual pivot algorithms that minimize the average number of key comparisons among all possible algorithms up to lower order or linear terms. This minimum is 1.8n ln n + O(n). For the case that the pivots are chosen from a small sample, we include a comparison of dual pivot quicksort and classical quicksort. We also present results about minimizing the average number of swaps.
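
To get a feel for the size of the differences quoted in this abstract, the following small sketch evaluates only the leading terms c · n ln n for the three coefficients mentioned (2, 1.9 and 1.8). It deliberately ignores the O(n) terms, so the numbers are rough indications rather than exact comparison counts; the class and method names are hypothetical.

```java
// Illustrative sketch: evaluates leading terms only and ignores the O(n) terms.
class LeadingTerms {
    /** Leading term c * n * ln(n) of an average comparison count. */
    static double leadingTerm(double c, long n) {
        return c * n * Math.log(n);
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        double classic = leadingTerm(2.0, n);   // standard quicksort
        double yaro    = leadingTerm(1.9, n);   // Yaroslavskiy's algorithm
        double optimal = leadingTerm(1.8, n);   // minimum over dual pivot algorithms
        System.out.printf("classic      : %.4e%n", classic);
        System.out.printf("Yaroslavskiy : %.4e (ratio %.3f)%n", yaro, yaro / classic);
        System.out.printf("optimal      : %.4e (ratio %.3f)%n", optimal, optimal / classic);
    }
}
```

Because all three terms share the factor n ln n, the ratios are simply 1.9/2 = 0.95 and 1.8/2 = 0.90 for every n, i.e. savings of about 5% and 10% in the leading term.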
Article
We present original average-case results on the performance of the Ford-Fulkerson maxflow algorithm on grid graphs (sparse) and random geometric graphs (dense). The analysis technique combines experiments with probability generating functions, stochastic context free grammars and an application of the maximum likelihood principle enabling us to make statements about the performance, where a purely theoretical approach has little chance of success. The method lends itself to automation allowing us to study more variations of the Ford-Fulkerson maxflow algorithm with different graph search strategies and several elementary operations. A simple depth-first search enhanced with random iterators provides the best performance on grid graphs. For random geometric graphs a simple priority-first search with a maximum-capacity heuristic provides the best performance. Notable is the observation that randomization improves the performance even when the inputs are created from a random process.