ABSTRACT: Response time quantiles reflect user-perceived quality of service more accurately than mean response time measures. Consequently, on-line transaction processing benchmarks, telecommunications Service Level Agreements and emergency services legislation all feature stringent 90th percentile response time targets. This chapter describes a range of techniques for extracting response time densities and quantiles from large-scale Markov and semi-Markov models of real-life systems. We describe a method for the computation of response time densities or cumulative distribution functions which centres on the calculation and subsequent numerical inversion of their Laplace transforms. This can be applied to both Markov and semi-Markov models. We also review the use of uniformization to calculate such measures more efficiently in purely Markovian models. We demonstrate these techniques by using them to generate response time quantiles in a semi-Markov model of a high-availability web server. We show how these techniques can be used to analyse models with state spaces of O(10^7) states and above.
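The uniformization technique mentioned above can be sketched in a few lines. The example below is a minimal illustration on a made-up 3-state CTMC (not the web-server model from the chapter): to obtain a passage-time CDF, the target state is made absorbing and the transient probability of occupying it by time t is computed from the uniformized discrete-time chain weighted by Poisson probabilities.

```python
import numpy as np

# Illustrative 3-state generator; state 2 is the target, made absorbing.
Q = np.array([[-2.0, 1.5, 0.5],
              [ 1.0, -3.0, 2.0],
              [ 0.0,  0.0, 0.0]])

q = max(-Q.diagonal()) * 1.05   # uniformisation rate q >= max |Q_ii|
P = np.eye(3) + Q / q           # DTMC of the uniformised chain

def passage_cdf(t, pi0, n_terms=200):
    """P(target reached by time t) = sum_k Poisson(k; qt) * (pi0 P^k)[target]."""
    pi, cdf, pois = pi0.copy(), 0.0, np.exp(-q * t)
    for k in range(n_terms):
        cdf += pois * pi[2]          # mass already absorbed in the target
        pi = pi @ P                  # advance the uniformised DTMC
        pois *= q * t / (k + 1)      # next Poisson weight
    return cdf

pi0 = np.array([1.0, 0.0, 0.0])
print(passage_cdf(1.0, pi0))
```

The truncation point `n_terms` must comfortably exceed the Poisson mean qt; in the large-scale setting described above, the matrix-vector products `pi @ P` over sparse transition matrices dominate the cost.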
ABSTRACT: Linear least squares problems are commonly solved by QR factorization. When multiple solutions need to be computed with only minor changes in the underlying data, knowledge of the difference between the old data set and the new can be used to update an existing factorization at reduced computational cost. We investigate the viability of implementing QR updating algorithms on GPUs and demonstrate that GPU-based updating for removing columns achieves speed-ups of up to 13.5x compared with full GPU QR factorization. We characterize the conditions under which other types of updates also achieve speed-ups.
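The column-removal update works because deleting column k of A leaves the old R upper triangular except for one subdiagonal entry per trailing column, which a short sequence of Givens rotations eliminates at O(n^2) cost instead of a full O(mn^2) refactorization. A textbook CPU sketch (our own naming, not the paper's GPU kernels):

```python
import numpy as np

def qr_delete_col(Q, R, k):
    """Update a full QR factorisation A = Q R after deleting column k of A.

    Each Givens rotation G zeroes one subdiagonal entry of R; applying G to
    R and G^T to Q preserves the product Q @ R.
    """
    R = np.delete(R, k, axis=1)
    n = R.shape[1]
    Q = Q.copy()
    for j in range(k, n):
        a, b = R[j, j], R[j + 1, j]
        r = np.hypot(a, b)
        if r == 0.0:
            continue
        c, s = a / r, b / r
        G = np.array([[c, s], [-s, c]])
        R[j:j + 2, j:] = G @ R[j:j + 2, j:]   # zero the subdiagonal entry
        Q[:, j:j + 2] = Q[:, j:j + 2] @ G.T   # compensate so Q @ R is unchanged
    return Q, np.triu(R)

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
Q, R = np.linalg.qr(A, mode="complete")
Q2, R2 = qr_delete_col(Q, R, 1)
print(np.allclose(Q2 @ R2, np.delete(A, 1, axis=1)))  # True
```

The speed-up reported above comes from mapping exactly this kind of narrow, dependency-chained update efficiently onto GPU hardware, where avoiding the full refactorization matters most for tall, skinny problems.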
ABSTRACT: Ever-increasing core counts create the need to develop parallel algorithms that avoid closely coupled execution across all cores. We present performance analysis of several parallel asynchronous implementations of Jacobi's method for solving systems of linear equations, using MPI, SHMEM and OpenMP. In particular we have solved systems of over 4 billion unknowns using up to 32,768 processes on a Cray XE6 supercomputer. We show that the precise implementation details of asynchronous algorithms can strongly affect the resulting performance and convergence behaviour of our solvers in unexpected ways, discuss how our specific implementations could be generalised to other classes of problem, and suggest how existing parallel programming models might be extended to allow asynchronous algorithms to be expressed more easily.
International Journal of High Performance Computing Applications 02/2014; 28(1):97-111. DOI:10.1177/1094342013493123 · 1.48 Impact Factor
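For reference, the synchronous baseline that the asynchronous variants relax is the plain Jacobi sweep below (a minimal serial sketch, our own toy system): the asynchronous implementations studied in the paper drop the barrier between sweeps, so each process iterates with whatever neighbour values have arrived.

```python
import numpy as np

def jacobi(A, b, tol=1e-10, max_iter=10_000):
    """Synchronous Jacobi iteration: x_new = D^{-1} (b - (A - D) x)."""
    D = A.diagonal()
    x = np.zeros_like(b)
    for _ in range(max_iter):
        x_new = (b - A @ x + D * x) / D
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

# Strictly diagonally dominant system, for which Jacobi is guaranteed to converge.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = jacobi(A, b)
print(np.allclose(A @ x, b))  # True
```

In the asynchronous setting the update for each block uses possibly stale values of `x`, which is why, as the abstract notes, low-level implementation details can change both performance and convergence behaviour.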
[Show abstract][Hide abstract] ABSTRACT: It is a widespread but little-noticed phenomenon that the normwise relative error ‖x - y‖/‖x‖ of vectors x and y of floating point numbers of the same precision, where y is an approximation to x, can be many orders of magnitude smaller than the unit roundoff. We analyze this phenomenon and show that in the ∞-norm it happens precisely when x has components of widely varying magnitude and every component of x of largest magnitude agrees with the corresponding component of y. Performance profiles are a popular way to compare competing algorithms according to particular measures of performance. We show that performance profiles based on normwise relative errors can give a misleading impression due to the influence of zero or tiny normwise relative errors. We propose a transformation that reduces the influence of these extreme errors in a controlled manner, while preserving the monotonicity of the underlying data and leaving the performance profile unchanged at its left end-point. Numerical examples with both artificial and genuine data illustrate the benefits of the transformation.
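The phenomenon is easy to reproduce (a two-component toy example of our own): when the largest component of x agrees exactly with y, the ∞-norm relative error is governed entirely by the small components, and can sit far below the unit roundoff even though the small component is completely wrong.

```python
import numpy as np

# x has components of widely varying magnitude; y matches the dominant
# component exactly but its tiny component is 100% wrong.
x = np.array([1.0, 1e-30])
y = np.array([1.0, 2e-30])

err = np.linalg.norm(x - y, np.inf) / np.linalg.norm(x, np.inf)
u = np.finfo(float).eps / 2   # unit roundoff for IEEE double precision

print(err, err < u)           # error is ~1e-30, far below u ~ 1.1e-16
```

A performance profile built on such errors would rate the algorithm that produced y as essentially exact, which is the misleading impression the proposed transformation is designed to control.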
ABSTRACT: We explore the relationship between official rankings of professional tennis players and rankings computed using a variant of the PageRank algorithm as proposed by Radicchi in 2011. We show Radicchi's equations follow a natural interpretation of the PageRank algorithm and present up-to-date comparisons of official rankings with PageRank-based rankings for both the Association of Tennis Professionals (ATP) and Women's Tennis Association (WTA) tours. For top-ranked players these two rankings are broadly in line; however, there is wide variation in the tail which leads us to question the degree to which the official ranking mechanism reflects true player ability. For a 390-day sample of recent tennis matches, PageRank-based rankings are found to be better predictors of match outcome than the official rankings.
Proceedings of the 9th European conference on Computer Performance Engineering; 07/2012
ABSTRACT: Passage time densities and quantiles are important performance metrics which are increasingly used in specifying service level agreements (SLAs) and benchmarks. PEPA is a popular stochastic process algebra and a powerful formalism for describing performance models of communication and computer systems. We present a case study passage time analysis of an 82,944-state PEPA model using the HYDRA tool. HYDRA specialises in passage time analysis of large Markov systems based on stochastic Petri nets. By using the new Imperial PEPA compiler (ipc), we can construct a HYDRA model from a PEPA model and obtain passage time densities based on the original PEPA description.
ABSTRACT: RAID systems are ubiquitously deployed in storage environments, both as standalone storage solutions and as fundamental components of virtualized storage platforms. Accurate models of their performance are crucial to delivering storage infrastructures that meet given quality of service requirements. To this end, this paper presents a flexible fork-join queueing simulation model of RAID systems that are composed of zoned disk drives and which operate under RAID levels 01 or 5. The simulator takes as input I/O workloads that are heterogeneous in terms of request size and that exhibit burstiness, and its primary output metric is I/O request response time distribution. We also study the effects of heavy workload, taking into account the request-reordering optimizations employed by modern disk drives. All simulation results are validated against device measurements and compared with existing analytical queueing network models.
The Computer Journal 05/2011; 54(5):691-707. DOI:10.1093/comjnl/bxq053 · 0.79 Impact Factor
ABSTRACT: High-level semi-Markov modelling paradigms such as semi-Markov stochastic Petri nets and process algebras are used to capture realistic performance models of computer and communication systems but often have the drawback of generating huge underlying semi-Markov processes. Extraction of performance measures such as steady-state probabilities and passage-time distributions therefore relies on sparse matrix–vector operations involving very large transition matrices. Previous studies have shown that exact state-by-state aggregation of semi-Markov processes can be applied to reduce the number of states. This can, however, lead to a dramatic increase in matrix density caused by the creation of additional transitions between remaining states. Our paper addresses this issue by presenting the concept of state space partitioning for aggregation. We present a new deterministic partitioning method which we term barrier partitioning. We show that barrier partitioning is capable of splitting very large semi-Markov models into a number of partitions such that first passage-time analysis can be performed more quickly and using up to 99% less memory than existing algorithms.
ABSTRACT: The precision of location tracking technology has improved greatly over the last few decades. We aim to show that by tracking the locations of individuals in a closed environment, it is now possible to record the nature and frequency of interactions between them. We further show that it is possible to use such data to predict the way in which an infection will spread throughout such a population, given parameters such as transmission and recovery rates. We accordingly present a software package that is capable of recording and then replaying location data provided by a high-precision location tracking system. The software then employs a combination of SIR modelling and the epidemiological technique of contact tracing in order to predict the spread of an infection. We use this software to conduct a number of experiments using a sample data set, and compare the SIR graphs generated from these to similar graphs generated using the traditional SIR differential equations.
International Journal of Healthcare Technology and Management 11/2010; 11(6). DOI:10.1504/IJHTM.2010.036925
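The traditional SIR differential equations referred to above are compact enough to sketch directly (parameter values here are ours and purely illustrative): a simple Euler integration of dS/dt = -βSI, dI/dt = βSI - γI, dR/dt = γI produces the comparison curves.

```python
# Classical SIR model with a forward-Euler step; beta and gamma are
# illustrative transmission and recovery rates, not fitted values.
beta, gamma = 0.3, 0.1        # basic reproduction number R0 = beta/gamma = 3
S, I, R = 0.99, 0.01, 0.0     # initial fractions of the population
dt, T = 0.1, 200.0

for _ in range(int(T / dt)):
    dS = -beta * S * I
    dI = beta * S * I - gamma * I
    dR = gamma * I
    S, I, R = S + dt * dS, I + dt * dI, R + dt * dR

print(round(S + I + R, 6))    # the population fractions always sum to 1
```

The contact-tracing approach in the paper replaces the homogeneous-mixing term βSI with the actually observed interaction graph, which is precisely what the location data makes possible.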
ABSTRACT: Calculation of performance metrics such as steady-state probabilities and response time distributions in large Markov and semi-Markov models can be accomplished using parallel implementations of well-known numerical techniques. In the past these implementations have usually been run on dedicated computational clusters and networks of workstations, but the recent rise of cloud computing offers an alternative environment for executing such applications. It is important, however, to understand what effect moving to a cloud-based infrastructure will have on the performance of the analysis tools themselves. In this paper we investigate the scalability of two existing parallel performance analysis tools (one based on Laplace transform inversion and the other on uniformisation) on Amazon's Elastic Compute Cloud, and compare this with their performance on traditional dedicated hardware. This provides insight into whether such tools can be used effectively in a cloud environment, and suggests factors which must be borne in mind when designing next-generation performance tools specifically for the cloud.
ABSTRACT: Useful analytical models of storage system performance must support the characteristics exhibited by real I/O workloads. Two essential features are the ability to cater for bursty arrival streams and to support a given distribution of I/O request size. This paper develops and applies the theory of bulk arrivals in queueing networks to support these phenomena in models of I/O request response time in zoned disks and RAID systems, with a specific focus on RAID levels 01 and 5. We represent a single disk as an M^X/G/1 queue, and a RAID system as a fork-join queueing network of M^X/G/1 queues. We find the response time distribution for a randomly placed request within a random bulk arrival. We also use the fact that the response time of a random request with size sampled from some distribution will be the same as that of an entire batch whose size has the same distribution. In both cases, we validate our models against measurements from a zoned disk drive and a RAID platform.
Proceedings of the Fourth International ICST Conference on Performance Evaluation Methodologies and Tools; 10/2009
ABSTRACT: Service Level Agreements (SLAs) are widely used throughout industry but suffer from specification ambiguities and difficulties in predicting and monitoring compliance. To address these issues, we propose the use of the Performance Tree formalism for the specification and monitoring of Service Level Agreements (SLAs). Specifically, we show how the basic Performance Tree formalism can be adapted to provide a rigorous yet accessible and expressive means to specify common SLA metrics. Using established performance analysis tools that support Performance Trees, this allows system designers to check SLA compliance on formal models of their system before implementation. We also propose an architecture for a system of measurement agents that enables the same performance requirements to be monitored in the context of a live implementation.
ABSTRACT: Since tokens in Generalised Stochastic Petri Net (GSPN) models are indistinguishable, it is not always possible to reason about customer-centric performance measures. To remedy this, we propose “tagged tokens” – a variant of the “tagged customer” technique used in the analysis of queueing networks. Under this scheme, one token in a structurally restricted net is “tagged” and its position tracked as it moves around the net. Performance queries can then be phrased in terms of the position of the tagged token. To date, the tagging of customers or tokens has been a time-consuming, manual and model-specific process. By contrast, we present here a completely automated methodology for the tagged token analysis of GSPNs. We first describe an intuitive graphical means of specifying the desired tagging configuration, along with the constraints on GSPN structure which must be observed for tagged tokens to be incorporated. We then present the mappings required for automatically converting a GSPN with a user-specified tagging structure into a Coloured GSPN (CGSPN), and thence into an unfolded GSPN which can be analysed for performance measures of interest by existing tools. We further show how our methodology integrates with Performance Trees, a formalism for the specification of performance queries. We have implemented our approach in the open source PIPE Petri net tool, and use this to illustrate the extra expressibility granted by tagged tokens through the analysis of a GSPN model of a hospital's Accident and Emergency department.
Electronic Notes in Theoretical Computer Science 03/2009; 232(232):75-88. DOI:10.1016/j.entcs.2009.02.051
ABSTRACT: This paper presents an overview of Platform-Independent Petri Net Editor 2 (PIPE2), an open-source tool that supports the design and analysis of Generalised Stochastic Petri Net (GSPN) models. PIPE2's extensible design enables developers to add functionality via pluggable analysis modules. It also acts as a front-end for a parallel and distributed performance evaluation environment. With PIPE2, users are able to design and evaluate performance queries expressed in the Performance Tree formalism.
ABSTRACT: Disk drives are a common performance bottleneck in modern storage systems. To alleviate this, disk manufacturers employ a variety of I/O request scheduling strategies which aim to reduce disk head positioning time by dynamically reordering queueing requests. An analytical model of this phenomenon is best represented by an M/G/1 queue with queue length dependent service times. However, there is no general exact result for the response time distribution of this variety of queue with generalised service time distributions. In this paper, we present a novel approximation for the response time distribution of such a queue. We then apply this method to the specific case of a zoned disk drive which implements I/O request reordering. A key contribution is the derivation of realistic service time distributions with minimised positioning time. We derive analytical results for calculating not only the mean but also higher moments and the full distribution of I/O request response time. We validate our model against measurements from a real disk to demonstrate the accuracy of our approximation.
QEST 2009, Sixth International Conference on the Quantitative Evaluation of Systems, Budapest, Hungary, 13-16 September 2009; 01/2009
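The classical baseline that the queue-length-dependent approximation generalises is the plain M/G/1 queue, whose mean waiting time is given by the Pollaczek-Khinchine formula. A minimal numerical sketch (parameter values are ours and purely illustrative, using exponential service with unit rate):

```python
# Pollaczek-Khinchine mean waiting time for an M/G/1 queue:
#   W = lambda * E[S^2] / (2 * (1 - rho)),  rho = lambda * E[S] < 1.
lam = 0.5                 # Poisson arrival rate
ES, ES2 = 1.0, 2.0        # first two moments of service time (exp., rate 1)
rho = lam * ES            # utilisation; must be below 1 for stability

W = lam * ES2 / (2.0 * (1.0 - rho))   # mean waiting time in the queue
T = W + ES                            # mean response time

print(T)
```

For these values T = 2.0, which cross-checks against the M/M/1 result 1/(μ - λ) = 1/(1 - 0.5); the paper's contribution is extending beyond such mean-value results to higher moments and the full response time distribution when service times depend on queue length.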
ABSTRACT: Imperial College London was host to the 24th Annual UK Performance Engineering Workshop in July 2008. UKPEW is an enjoyable workshop that brings together researchers in the performance engineering community to discuss quantitative aspects of, for instance, Grid computing, web and e-commerce, performance modelling techniques, power management and wireless network performance. In 2008, we had 29 papers presented over the two days of the workshop and this IET Software Special Issue represents significantly extended versions of the best selected papers from that workshop.
IET Software 01/2009; 3:443-444. DOI:10.1049/iet-sen.2009.9049 · 0.60 Impact Factor
ABSTRACT: RAID systems are ubiquitously deployed in storage environments, both as standalone storage solutions and as fundamental components of virtualised storage platforms. Accurate models of their performance are crucial to delivering storage infrastructures that meet given quality of service requirements. To this end, this paper presents a flexible fork-join queueing simulation model of RAID systems that are comprised of zoned disk drives and which operate under RAID levels 01 or 5. The simulator takes as input I/O workloads that are heterogeneous in terms of request size and that exhibit burstiness, and its primary output metric is I/O request response time distribution. We also study the effects of heavy workload, taking into account the request-reordering optimisations employed by modern disk drives. All simulation results are validated against device measurements.
Computer Performance Engineering, 6th European Performance Engineering Workshop, EPEW 2009, London, UK, July 9-10, 2009, Proceedings; 01/2009