ABSTRACT: Response time quantiles reflect user-perceived quality of service more accurately than mean or average response time measures. Consequently, on-line transaction processing benchmarks, telecommunications Service Level Agreements and emergency services legislation all feature stringent 90th percentile response time targets. This chapter describes a range of techniques for extracting response time densities and quantiles from large-scale Markov and semi-Markov models of real-life systems. We describe a method for the computation of response time densities or cumulative distribution functions which centres on the calculation and subsequent numerical inversion of their Laplace transforms. This can be applied to both Markov and semi-Markov models. We also review the use of uniformization to calculate such measures more efficiently in purely Markovian models. We demonstrate these techniques by using them to generate response time quantiles in a semi-Markov model of a high-availability web-server. We show how these techniques can be used to analyse models with state spaces of O(10^7) states and above.
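As a rough illustration of the uniformization approach mentioned in this abstract, the sketch below computes a first-passage (response) time CDF and a quantile for a tiny continuous-time Markov chain. It is a minimal dense-matrix sketch, not the authors' tool: the function names, the 2% inflation of the uniformization rate, the truncation tolerance and the bisection search are all assumptions made for the example, and a model with millions of states would need sparse, parallel data structures.

```python
import math


def passage_time_cdf(Q, start, targets, t, eps=1e-12, max_terms=100000):
    """P(first passage from `start` into `targets` happens by time t).

    Q is the CTMC generator as a list of rows. Uniformization turns the CTMC
    into a discrete-time chain P = I + Q/q observed at Poisson(q*t) epochs.
    """
    n = len(Q)
    # Make target states absorbing so that mass reaching them stays there.
    absorbing = [[0.0] * n if i in targets else list(Q[i]) for i in range(n)]
    q = 1.02 * max(-absorbing[i][i] for i in range(n))  # assumed safety margin
    if q == 0.0:
        return 1.0 if start in targets else 0.0
    P = [[(1.0 if i == j else 0.0) + absorbing[i][j] / q for j in range(n)]
         for i in range(n)]
    pi = [0.0] * n
    pi[start] = 1.0
    weight = math.exp(-q * t)  # Poisson weight for 0 jumps; underflows if q*t is huge
    cdf, mass, k = 0.0, 0.0, 0
    while mass < 1.0 - eps and k < max_terms:
        cdf += weight * sum(pi[j] for j in targets)
        mass += weight
        k += 1
        weight *= q * t / k
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return cdf


def passage_time_quantile(Q, start, targets, p, hi=1.0):
    """Smallest t (to bisection accuracy) with CDF(t) >= p.

    Assumes the targets are reached with probability at least p.
    """
    while passage_time_cdf(Q, start, targets, hi) < p:
        hi *= 2.0
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if passage_time_cdf(Q, start, targets, mid) < p:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical three-state example: 0 -> 1 -> 2, both at rate 1.
Q = [[-1.0, 1.0, 0.0],
     [0.0, -1.0, 1.0],
     [0.0, 0.0, 0.0]]
print(passage_time_quantile(Q, 0, {2}, 0.9))
```

In the example the passage time is Erlang-2 distributed, so the result can be checked against the closed form 1 - exp(-t)(1 + t).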
ABSTRACT: Ever-increasing core counts create the need to develop parallel algorithms that avoid closely coupled execution across all cores. We present performance analysis of several parallel asynchronous implementations of Jacobi's method for solving systems of linear equations, using MPI, SHMEM and OpenMP. In particular we have solved systems of over 4 billion unknowns using up to 32,768 processes on a Cray XE6 supercomputer. We show that the precise implementation details of asynchronous algorithms can strongly affect the resulting performance and convergence behaviour of our solvers in unexpected ways, discuss how our specific implementations could be generalised to other classes of problem, and suggest how existing parallel programming models might be extended to allow asynchronous algorithms to be expressed more easily.
International Journal of High Performance Computing Applications 02/2014; 28(1):97-111. · 1.30 Impact Factor
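The idea of asynchronous Jacobi iteration described in the abstract above can be imitated in a single process. The sketch below is only a sequential simulation under stated assumptions: two hypothetical "processes" each own half of the unknowns and see the other half's values only every `lag` sweeps, so most updates use stale remote data. The function name, the block split and the `lag` parameter are invented for illustration and do not reflect the paper's MPI, SHMEM or OpenMP implementations.

```python
def jacobi_with_stale_halo(A, b, sweeps=200, lag=3):
    """Jacobi iteration in which remote values are refreshed only every `lag` sweeps."""
    n = len(b)
    x = [0.0] * n
    half = n // 2
    blocks = [range(0, half), range(half, n)]
    # Each simulated process keeps its own (possibly stale) view of the full vector.
    views = [list(x), list(x)]
    for s in range(sweeps):
        new = list(x)
        for p, block in enumerate(blocks):
            view = views[p]
            for i in block:
                sigma = sum(A[i][j] * view[j] for j in range(n) if j != i)
                new[i] = (b[i] - sigma) / A[i][i]
        x = new
        for p, block in enumerate(blocks):
            for i in block:
                views[p][i] = x[i]      # a process always sees its own latest values
            if s % lag == 0:
                views[p] = list(x)      # remote values arrive only occasionally
    return x


# Small strictly diagonally dominant system, chosen for illustration.
A = [[4.0, -1.0, 0.0, 0.0],
     [-1.0, 4.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, -1.0],
     [0.0, 0.0, -1.0, 4.0]]
b = [1.0, 2.0, 3.0, 4.0]
x = jacobi_with_stale_halo(A, b)
```

For a strictly diagonally dominant matrix such as this one, the iteration still converges despite the stale data, although typically in more sweeps than the synchronous method.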
ABSTRACT: It is a widespread but little-noticed phenomenon that the normwise relative error ‖x - y‖/‖x‖ of vectors x and y of floating point numbers of the same precision, where y is an approximation to x, can be many orders of magnitude smaller than the unit roundoff. We analyze this phenomenon and show that in the ∞-norm it happens precisely when x has components of widely varying magnitude and every component of x of largest magnitude agrees with the corresponding component of y. Performance profiles are a popular way to compare competing algorithms according to particular measures of performance. We show that performance profiles based on normwise relative errors can give a misleading impression due to the influence of zero or tiny normwise relative errors. We propose a transformation that reduces the influence of these extreme errors in a controlled manner, while preserving the monotonicity of the underlying data and leaving the performance profile unchanged at its left end-point. Numerical examples with both artificial and genuine data illustrate the benefits of the transformation.
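The phenomenon described in this abstract is easy to reproduce. The sketch below uses a made-up two-component vector whose largest component is matched exactly by the approximation, so the ∞-norm relative error falls far below the double-precision unit roundoff even though the small component is wrong in its fourth significant digit. The helper name and the data are assumptions for illustration; the transformation proposed in the paper is not reproduced here.

```python
import sys


def rel_err_inf(x, y):
    """Normwise relative error ||x - y|| / ||x|| in the infinity norm."""
    return max(abs(a - b) for a, b in zip(x, y)) / max(abs(a) for a in x)


unit_roundoff = sys.float_info.epsilon / 2  # 2**-53 for IEEE double precision

x = [1e20, 1.0]
y = [1e20, 1.001]  # agrees with x in the largest component only

normwise = rel_err_inf(x, y)
componentwise = max(abs(a - b) / abs(a) for a, b in zip(x, y))

print(normwise < unit_roundoff)   # normwise error is tiny
print(componentwise)              # componentwise error is about 1e-3
```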
ABSTRACT: We explore the relationship between official rankings of professional tennis players and rankings computed using a variant of the PageRank algorithm as proposed by Radicchi in 2011. We show Radicchi's equations follow a natural interpretation of the PageRank algorithm and present up-to-date comparisons of official rankings with PageRank-based rankings for both the Association of Tennis Professionals (ATP) and Women's Tennis Association (WTA) tours. For top-ranked players these two rankings are broadly in line; however, there is wide variation in the tail which leads us to question the degree to which the official ranking mechanism reflects true player ability. For a 390-day sample of recent tennis matches, PageRank-based rankings are found to be better predictors of match outcome than the official rankings.
Proceedings of the 9th European conference on Computer Performance Engineering; 07/2012
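To make the PageRank-based ranking idea concrete, the sketch below runs a generic weighted PageRank over a match list in which each loser "links" to the player who beat them. It is an assumption-laden illustration rather than Radicchi's exact formulation: the damping value of 0.85, the uniform redistribution of rank from undefeated players, the function name and the three placeholder players are all choices made for the example.

```python
def pagerank_from_matches(matches, damping=0.85, iters=100):
    """Rank players from (winner, loser) pairs using weighted PageRank."""
    players = sorted({p for match in matches for p in match})
    n = len(players)
    # losses[loser][winner] = number of times `winner` beat `loser`
    losses = {p: {} for p in players}
    for winner, loser in matches:
        losses[loser][winner] = losses[loser].get(winner, 0) + 1
    rank = {p: 1.0 / n for p in players}
    for _ in range(iters):
        # Undefeated players have no outgoing links; spread their rank evenly.
        dangling = sum(rank[p] for p in players if not losses[p])
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in players}
        for loser, beaten_by in losses.items():
            total = sum(beaten_by.values())
            for winner, count in beaten_by.items():
                new[winner] += damping * rank[loser] * count / total
        rank = new
    return rank


# Placeholder data: A beats B and C, B beats C.
ranks = pagerank_from_matches([("A", "B"), ("A", "C"), ("B", "C")])
print(sorted(ranks, key=ranks.get, reverse=True))
```

Because a win over a highly ranked opponent transfers more rank than a win over a weak one, the ordering can differ from a points-based table built on the same matches.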
ABSTRACT: RAID systems are ubiquitously deployed in storage environments, both as standalone storage solutions and as fundamental components of virtualized storage platforms. Accurate models of their performance are crucial to delivering storage infrastructures that meet given quality of service requirements. To this end, this paper presents a flexible fork-join queueing simulation model of RAID systems that are composed of zoned disk drives and which operate under RAID levels 01 or 5. The simulator takes as input I/O workloads that are heterogeneous in terms of request size and that exhibit burstiness, and its primary output metric is I/O request response time distribution. We also study the effects of heavy workload, taking into account the request-reordering optimizations employed by modern disk drives. All simulation results are validated against device measurements and compared with existing analytical queueing network models.
ABSTRACT: High-level semi-Markov modelling paradigms such as semi-Markov stochastic Petri nets and process algebras are used to capture realistic performance models of computer and communication systems but often have the drawback of generating huge underlying semi-Markov processes. Extraction of performance measures such as steady-state probabilities and passage-time distributions therefore relies on sparse matrix-vector operations involving very large transition matrices. Previous studies have shown that exact state-by-state aggregation of semi-Markov processes can be applied to reduce the number of states. This can, however, lead to a dramatic increase in matrix density caused by the creation of additional transitions between remaining states. Our paper addresses this issue by presenting the concept of state space partitioning for aggregation. We present a new deterministic partitioning method which we term barrier partitioning. We show that barrier partitioning is capable of splitting very large semi-Markov models into a number of partitions such that first passage-time analysis can be performed more quickly and using up to 99% less memory than existing algorithms.
ABSTRACT: The precision of location tracking technology has improved greatly over the last few decades. We aim to show that by tracking the locations of individuals in a closed environment, it is now possible to record the nature and frequency of interactions between them. We further aim to show that such data can be used to predict the way in which an infection will spread throughout such a population, given parameters such as transmission and recovery rates. We accordingly present a software package that is capable of recording and then replaying location data provided by a high-precision location tracking system. The software then employs a combination of SIR modelling and the epidemiological technique of contact tracing in order to predict the spread of an infection. We use this software to conduct a number of experiments using a sample data set, and compare the SIR graphs generated from these to similar graphs generated using the traditional SIR differential equations.
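For reference, the traditional SIR differential equations mentioned in this abstract are dS/dt = -βSI/N, dI/dt = βSI/N - γI and dR/dt = γI, where β is the transmission rate and γ the recovery rate. The sketch below integrates them with a simple forward Euler scheme; the function name, step size and parameter values are assumptions picked for illustration and are not taken from the paper's experiments.

```python
def sir_euler(s0, i0, r0, beta, gamma, days, dt=0.01):
    """Integrate the classic SIR equations with forward Euler steps.

    Returns a list of (time, S, I, R) tuples.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((k * dt, s, i, r))
    return trajectory


# Hypothetical population of 100 with one initial infective.
trajectory = sir_euler(s0=99, i0=1, r0=0, beta=0.5, gamma=0.1, days=100)
peak_time, _, peak_infected, _ = max(trajectory, key=lambda row: row[2])
print(peak_time, peak_infected)
```

Curves produced this way assume homogeneous mixing, which is the assumption a contact-based simulation driven by recorded location data relaxes.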
ABSTRACT: Calculation of performance metrics such as steady-state probabilities and response time distributions in large Markov and semi-Markov models can be accomplished using parallel implementations of well-known numerical techniques. In the past these implementations have usually been run on dedicated computational clusters and networks of workstations, but the recent rise of cloud computing offers an alternative environment for executing such applications. It is important, however, to understand what effect moving to a cloud-based infrastructure will have on the performance of the analysis tools themselves. In this paper we investigate the scalability of two existing parallel performance analysis tools (one based on Laplace transform inversion and the other on uniformisation) on Amazon's Elastic Compute Cloud, and compare this with their performance on traditional dedicated hardware. This provides insight into whether such tools can be used effectively in a cloud environment, and suggests factors which must be borne in mind when designing next-generation performance tools specifically for the cloud.
ABSTRACT: Useful analytical models of storage system performance must support the characteristics exhibited by real I/O workloads. Two essential features are the ability to cater for bursty arrival streams and to support a given distribution of I/O request size. This paper develops and applies the theory of bulk arrivals in queueing networks to support these phenomena in models of I/O request response time in zoned disks and RAID systems, with a specific focus on RAID levels 01 and 5. We represent a single disk as an M^X/G/1 queue, and a RAID system as a fork-join queueing network of M^X/G/1 queues. We find the response time distribution for a randomly placed request within a random bulk arrival. We also use the fact that the response time of a random request with size sampled from some distribution will be the same as that of an entire batch whose size has the same distribution. In both cases, we validate our models against measurements from a zoned disk drive and a RAID platform.
Proceedings of the Fourth International ICST Conference on Performance Evaluation Methodologies and Tools; 10/2009
ABSTRACT: Traditional methods for deriving performance models of customer flow in real-life systems are manual, time-consuming and prone to human error. This paper proposes an automated four-stage data processing pipeline which takes as input raw high-precision location tracking data and which outputs a queueing network model of customer flow. The pipeline estimates both the structure of the network and the underlying interarrival and service time distributions of its component service centres. We evaluate our method's effectiveness and accuracy in four experimental case studies.
ABSTRACT: Imperial College London was host to the 24th Annual UK Performance Engineering Workshop in July 2008. UKPEW is an enjoyable workshop that brings together researchers in the performance engineering community to discuss quantitative aspects of, for instance, Grid computing, web and e-commerce, performance modelling techniques, power management and wireless network performance. In 2008, we had 29 papers presented over the two days of the workshop and this IET Software Special Issue represents significantly extended versions of the best selected papers from that workshop.
IET Software 01/2009; 3:443-444. · 0.66 Impact Factor
ABSTRACT: RAID systems are ubiquitously deployed in storage environments, both as standalone storage solutions and as fundamental components
of virtualised storage platforms. Accurate models of their performance are crucial to delivering storage infrastructures that
meet given quality of service requirements. To this end, this paper presents a flexible fork-join queueing simulation model
of RAID systems that are comprised of zoned disk drives and which operate under RAID levels 01 or 5. The simulator takes as
input I/O workloads that are heterogeneous in terms of request size and that exhibit burstiness, and its primary output metric
is I/O request response time distribution. We also study the effects of heavy workload, taking into account the request-reordering
optimisations employed by modern disk drives. All simulation results are validated against device measurements.
Computer Performance Engineering, 6th European Performance Engineering Workshop, EPEW 2009, London, UK, July 9-10, 2009, Proceedings; 01/2009
ABSTRACT: Since tokens in Generalised Stochastic Petri Net (GSPN) models are indistinguishable, it is not always possible to reason about customer-centric performance measures. To remedy this, we propose “tagged tokens” – a variant of the “tagged customer” technique used in the analysis of queueing networks. Under this scheme, one token in a structurally restricted net is “tagged” and its position tracked as it moves around the net. Performance queries can then be phrased in terms of the position of the tagged token. To date, the tagging of customers or tokens has been a time-consuming, manual and model-specific process. By contrast, we present here a completely automated methodology for the tagged token analysis of GSPNs. We first describe an intuitive graphical means of specifying the desired tagging configuration, along with the constraints on GSPN structure which must be observed for tagged tokens to be incorporated. We then present the mappings required for automatically converting a GSPN with a user-specified tagging structure into a Coloured GSPN (CGSPN), and thence into an unfolded GSPN which can be analysed for performance measures of interest by existing tools. We further show how our methodology integrates with Performance Trees, a formalism for the specification of performance queries. We have implemented our approach in the open source PIPE Petri net tool, and use this to illustrate the extra expressibility granted by tagged tokens through the analysis of a GSPN model of a hospital's Accident and Emergency department.
Electronic Notes in Theoretical Computer Science. 01/2009;
ABSTRACT: Disk drives are a common performance bottleneck in modern storage systems. To alleviate this, disk manufacturers employ a variety of I/O request scheduling strategies which aim to reduce disk head positioning time by dynamically reordering queueing requests. An analytical model of this phenomenon is best represented by an M/G/1 queue with queue length dependent service times. However, there is no general exact result for the response time distribution of this variety of queue with generalised service time distributions. In this paper, we present a novel approximation for the response time distribution of such a queue. We then apply this method to the specific case of a zoned disk drive which implements I/O request reordering. A key contribution is the derivation of realistic service time distributions with minimised positioning time. We derive analytical results for calculating not only the mean but also higher moments and the full distribution of I/O request response time. We validate our model against measurements from a real disk to demonstrate the accuracy of our approximation.
model on an existing zoned disk model (6), (7) in which each disk drive is represented as a first-come first-served (FCFS) M/G/1 queue with a fixed service time distribution. The present work models the operation of a disk drive with Shortest Access Time First (SATF) scheduling by using an M/G/1 queue with queue-length dependent service time distributions. There does not currently exist a generally applicable exact result for the response time distribution of this variety of queue. We present a novel approximation for the response time distribution of such a queue. It is a non-trivial challenge to derive realistic service time distributions for each queue length such that expected positioning time is minimised. We demonstrate the accuracy of our model by comparing model predictions with real device measurements.
The remainder of this paper is organised as follows. In Section II we discuss prior work in this area. We survey existing scheduling strategies, modelling techniques and existing approaches for modelling queues with state-dependent service times. We also briefly recap the existing zoned disk model. In Section III we present a new approximation for calculating the response time distribution of M/G/1 queues with state-dependent service times. We then apply this method to the zoned disk model, deriving a queue length dependent service time distribution for the disk drive. Section IV validates our model against real device measurements. Finally, Section V concludes and considers directions for future work.
QEST 2009, Sixth International Conference on the Quantitative Evaluation of Systems, Budapest, Hungary, 13-16 September 2009; 01/2009
ABSTRACT: This paper presents an overview of Platform-Independent Petri Net Editor 2 (PIPE2), an open-source tool that supports the design and analysis of Generalised Stochastic Petri Net (GSPN) models. PIPE2's extensible design enables developers to add functionality via pluggable analysis modules. It also acts as a front-end for a parallel and distributed performance evaluation environment. With PIPE2, users are able to design and evaluate performance queries expressed in the Performance Tree formalism.
ABSTRACT: Performance Trees are a unifying framework for the specification of performance queries involving measures and requirements. This paper describes an evaluation environment for Performance Trees comprising a client-side Performance Query Editor, incorporated as a module of the PIPE2 Petri net tool, and a cluster-based server-side evaluation engine. The latter combines the capabilities of a number of parallel and distributed analysis tools.
Quantitative Evaluation of Systems, 2008. QEST '08. Fifth International Conference on; 10/2008
ABSTRACT: We present and validate an enhanced analytical queueing network model of zoned RAID. The model focuses on RAID levels 01 and 5, and yields the distribution of I/O request response time. Whereas our previous work could only support arrival streams of I/O requests of the same type, the model presented here supports heterogeneous streams with a mixture of read and write requests. This improved realism is made possible through multiclass extensions to our existing model. When combined with priority queueing, this development also enables more accurate modelling of the way subtasks of RAID 5 write requests are scheduled. In all cases we derive analytical results for calculating not only the mean but also higher moments and the full distribution of I/O request response time. We validate our model against measurements from a real RAID system.
Modeling, Analysis and Simulation of Computers and Telecommunication Systems, 2008. MASCOTS 2008. IEEE International Symposium on; 10/2008
ABSTRACT: RAID systems are widely deployed, both as standalone storage solutions and as the building blocks of modern virtualised storage
platforms. An accurate model of RAID system performance is therefore critical to understanding storage system performance.
To this end, this paper presents a queueing network-based model of RAID systems comprised of zoned disks and operating at
RAID level 0-1 or 5. The contribution over previous work is twofold. Firstly, our analysis approximates full I/O request response
time distributions rather than just mean values. This provides the ability to reason about response time quantiles and higher
moments of response time – both of which are useful in the context of modern quality of service requirements. Secondly, we
validate our model against measurements from a real RAID system rather than a software simulation. The close agreement between
predicted and observed response time distributions gives a high level of confidence in the validity of our model.
Analytical and Stochastic Modeling Techniques and Applications, 15th International Conference, ASMTA 2008, Nicosia, Cyprus, June 4-6, 2008, Proceedings; 01/2008
ABSTRACT: The accessible specification of performance queries is a key challenge in performance analysis. To this end, we seek to combine the intuitive aspects of natural language query specification with the expressive power and flexibility of the Performance Tree formalism. Specifically, we present a structured English grammar for Performance Trees, and use it to implement a Natural Language Query Builder (NLQB) for the Platform Independent Petri net Editor (PIPE). The NLQB guides users in the construction of performance queries in an iterative fashion, presenting at each step a range of natural language alternatives that are appropriate in the query context. We demonstrate our technique in the specification of performance queries on a model of a hospital's Accident and Emergency department.
Computer Performance Engineering, 5th European Performance Engineering Workshop, EPEW 2008, Palma de Mallorca, Spain, September 24-25, 2008. Proceedings; 01/2008