Conference Paper

A Simple Statistical Model of Cache Reference Locality, and its Application to Cache Planning, Measurement and Control.


Abstract

The causes of data reuse in storage control cache are often hierarchical. For example, reuse may be caused by repeated requests in the same subroutine; by different routines called to process the same transaction; or by multiple transactions needed to carry out some overall task at the user level. This paper develops a model of cache locality based on the assumption of hierarchical behavior. The model applies the concept of statistical self-similarity, which arises often in the study of fractals, to infer a very simple method of approximating cache miss ratios. The model's approximations are checked empirically against a large number of I/O traces and are shown to be highly serviceable. The model is then applied to the problems of cache miss ratio projection, trace-based measurement of cache miss ratios, and allocation and dynamic management of cache resources.
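As a rough illustration of the kind of approximation a self-similar locality model suggests, the sketch below fits a power-law miss-ratio curve to a few measured points and uses it for projection. The functional form, parameter names, and measurement values are assumptions chosen for illustration, not the exact method derived in the paper.

# Minimal sketch: fit and apply a power-law miss-ratio curve m(c) ~ k * c**(-theta).
# The power-law form is an illustrative assumption suggested by statistical
# self-similarity; it is not necessarily the exact formula derived in the paper.
import math

def fit_power_law(points):
    """Least-squares fit of log(miss) = log(k) - theta * log(cache_mb)."""
    xs = [math.log(c) for c, _ in points]
    ys = [math.log(m) for _, m in points]
    n = len(points)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return math.exp(intercept), -slope          # k, theta

def project_miss_ratio(k, theta, cache_mb):
    return k * cache_mb ** (-theta)

# Hypothetical measurements: (cache size in MB, observed miss ratio).
measured = [(64, 0.20), (128, 0.14), (256, 0.10)]
k, theta = fit_power_law(measured)
print("theta = %.2f, projected miss ratio at 512 MB = %.3f"
      % (theta, project_miss_ratio(k, theta, 512)))

With the hypothetical points above the fitted exponent is about 0.5, and the projected miss ratio at twice the largest measured cache size is about 0.07; only the shape of the curve, not the specific numbers, is the point of the sketch.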


... In the computation of GCU, whenever this event occurs after the initial filling process it is viewed as though a segment of utilization 0 was collected by the garbage collector, since if a segment becomes empty through track writing the garbage collection algorithm should get "credit" for this. ...

... (Although other parameters such as the number of segments enter into the analysis, after making a reasonable approximation all parameters except ASU and GCU drop out.) We recall some of the parameters introduced above and give shorter names to ASU and GCU: S = number of segments, C = number of tracks per segment, T = number of tracks, a = T/(CS) (= ASU), and g = GCU. [Table fragment: GCU from analysis vs. simulation at various ASU values; e.g., at ASU = .6 the analysis gives .324 and the simulation gives .322.] ...
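To make the parameter definitions in the excerpt concrete, a small numeric example (the specific values are hypothetical) shows how the overall storage utilization a is obtained:

# Hypothetical numbers illustrating the parameter definitions quoted above.
S = 1000          # number of segments
C = 16            # tracks per segment
T = 9600          # tracks of live data stored
a = T / (C * S)   # overall storage utilization (ASU)
print("ASU a = %.2f" % a)   # 0.60, the utilization value shown in the table fragment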
Article
In this paper, we propose and study a new algorithm for choosing segments for garbage collection in Log-Structured File Systems (LFS) and Log-Structured Arrays (LSA). We compare the performance of our new algorithm against previously known algorithms such as greedy and cost-benefit through simulation. The basic idea of our algorithm is that segments which have been recently filled by writes from the system should be forced to wait for a certain amount of time (the age-threshold) before they are allowed to become candidates for garbage collection. The expectation is that if the age-threshold is properly chosen, segments that have reached the age-threshold are unlikely to get significantly emptier due to future rewrites. Among segments that pass the age-threshold and become candidates for garbage collection, we select ones that will yield the most free space. We show, through simulation, that our age-threshold algorithm is more efficient at garbage collection (produces more f...
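A minimal sketch of the selection rule this abstract describes is given below. The segment fields, the threshold value, and the data structure are illustrative assumptions, not the paper's implementation.

# Sketch of age-threshold candidate selection for garbage collection.
# A segment becomes eligible only after it has aged past AGE_THRESHOLD seconds;
# among eligible segments, the emptiest (most reclaimable space) are chosen.
AGE_THRESHOLD = 300.0     # illustrative value; the paper studies how to choose this

def pick_segments(segments, now, how_many):
    """segments: list of dicts with 'filled_at' timestamp and 'live_fraction'."""
    eligible = [s for s in segments if now - s["filled_at"] >= AGE_THRESHOLD]
    # Most free space first, i.e. lowest fraction of still-live data.
    eligible.sort(key=lambda s: s["live_fraction"])
    return eligible[:how_many]

segs = [{"filled_at": 0.0,   "live_fraction": 0.35},
        {"filled_at": 0.0,   "live_fraction": 0.80},
        {"filled_at": 900.0, "live_fraction": 0.05}]   # too young at t = 1000
print(pick_segments(segs, now=1000.0, how_many=1))     # picks the aged, emptiest segment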
... There would appear to be an analogy between the chance of being able to satisfy the application requests, as just described, and the chance of being able to satisfy other well-defined types of requests that may occur within the storage hierarchy -for example, a request for a track in cache, or a request for a data set in primary storage. A power law formulation has proven to be useful as a way to describe the probability of being able to satisfy such track or data set requests [3,4,5]. It does not seem so far-fetched to reason, by analogy, that a power law formulation may also apply to the probability of being able to satisfy the overall needs of applications that have some given, fixed transaction rate. ...
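Written out in generic notation (the symbols and the exact form below are illustrative; the cited works [3,4,5] develop the specific versions), the power-law formulation amounts to a satisfaction probability that depends on the allocated resource through a single exponent:

P(\text{request satisfied}) \;\approx\; 1 - k\,x^{-\theta}, \qquad k > 0,\ \theta > 0,

where x denotes the amount of the relevant resource (cache size, primary-storage allocation, or, by the analogy drawn above, capability relative to a fixed transaction rate), and k and \theta are workload-dependent constants.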
Article
Given a family of similar disks with various options for storage capacity, many planners consistently adopt the largest disk so as to hold down storage costs; others, aiming for performance, opt for a disk capacity toward the low end of the available range. This paper tries to reconcile both of these views, by developing a quantitative framework within which disk capacity, performance, and cost can all be considered. We gain a better understanding of the recent history of disk storage, and show how disk performance objectives can be matched with cost and capacity.

1 Introduction

Rapid change has become the everyday routine for those engaged in deploying disk storage technology. Every so often, it is necessary to step back and reassess assumptions. This paper reexamines long-held assumptions about disk capacity, performance and cost. As disk storage has evolved over the past three decades, a curious tension has developed between two key players in the capacity planning game. On ...
Article
The log-structured disk subsystem is a new concept for the use of disk storage whose future application has enormous potential. In such a subsystem, all writes are organized into a log, each entry of which is placed into the next available free storage. A directory indicates the physical location of each logical object (e.g., each file block or track image) as known to the processor originating the I/O request. For those objects that have been written more than once, the directory retains the location of the most recent copy. Other work with log- structured disk subsystems has shown that they are capable of high write throughputs. However, the fragmentation of free storage due to the scattered locations of data that become out of date can become a problem in sustained operation. To control fragmentation, it is necessary to perform ongoing garbage collection, in which the location of stored data is shifted to release unused storage for re-use. This paper introduces a mathematical model of garbage collection, and shows how collection load relates to the utilization of storage and the amount of locality present in the pattern of updates. A realistic statistical model of updates, based upon trace data analysis, is applied. In addition, alternative policies are examined for determining which data areas to collect. The key conclusion of our analysis is that in environments with the scattered update patterns typical of database I/O, the utilization of storage must be controlled in order to achieve the high write throughput of which the subsystem is capable. In addition, the presence of data locality makes it important to take the past history of data into account in determining the next area of storage to be garbage-collected.
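A standard first-order way to see why storage utilization must be controlled is to count how many live tracks must be relocated per unit of free space reclaimed. The accounting below is a common generic approximation for log-structured storage, not the article's detailed statistical model.

# Rough accounting for garbage-collection load in a log-structured subsystem.
# If a collected segment is still a fraction u full of live data, freeing
# (1 - u) of a segment requires relocating u of a segment, i.e. about
# u / (1 - u) tracks moved per track of free space produced.
def gc_load(u):
    return u / (1.0 - u)

for u in (0.5, 0.7, 0.9):
    print("utilization %.1f -> %.1f tracks moved per track freed" % (u, gc_load(u)))

The sharp rise of this ratio as utilization approaches 1 is the generic version of the article's conclusion that utilization must be held down to sustain high write throughput.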
Article
Efficient management of cached storage control resources has been important since the introduction of cached controllers in the early 1980s, and it continues to grow more important as technology advances. The need for cache resource management is due to the diversity of workloads that may coexist under a given controller. Some workloads may continually require the staging of new data into cache memory, with almost no benefit in terms of performance; other workloads may reap major performance benefits while requiring relatively little data staging. The sharing of resources among various workloads must therefore be controlled to ensure that workloads in the former group do not interfere too much with those in the latter. Management of cache functions is often viewed as the job of the host system to which the controller is attached. But it is now also possible for advanced controllers to perform such management functions in a stand-alone manner. Caching algorithms can change adaptively to match the workloads presented. This enables the controller to be ported across multiple platforms without dependencies on software support. This paper surveys the variety of techniques that have been used for cache resource control, and examines the rapid evolution in such techniques that is now occurring.
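One simple way to act on the observation that some workloads stage heavily for little benefit is sketched below: rank workloads by the hit benefit they obtain per unit of staging and bypass the worst offenders. The counter names and the threshold are illustrative assumptions, not any of the specific techniques surveyed.

# Sketch: decide which workloads deserve cache, given per-workload statistics.
# 'hits' and 'stages' per interval are assumed to be available from the
# controller's counters; the benefit threshold is an illustrative cutoff.
BENEFIT_THRESHOLD = 0.2   # hits gained per track staged

def classify(workloads):
    allowed, bypassed = [], []
    for name, hits, stages in workloads:
        benefit = hits / stages if stages else float("inf")
        (allowed if benefit >= BENEFIT_THRESHOLD else bypassed).append(name)
    return allowed, bypassed

stats = [("online_db", 900, 1000), ("batch_scan", 50, 5000), ("logging", 10, 20)]
print(classify(stats))   # batch_scan is bypassed: it stages heavily for few hits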
Article
I/O subsystem configurations are dictated by the storage and I/O requirements of the specific applications that use the disk hardware. Treating the latter requirement as a given, however, draws a boundary at the channel interface that is not well-suited to the capabilities of the Enterprise Systems Architecture (ESA). This architecture allows hardware expenditures in the I/O subsystem to be managed, while at the same time improving transaction response time and system throughput capability, by a strategy of processor buffering coupled with storage control cache. The key is to control the aggregate time per transaction spent waiting for physical disk motion. This paper investigates how to think about and accomplish such an objective. A case study, based on data collected at a large Multiple Virtual Storage installation, is used to investigate the potential types and amounts of memory use by individual files, both in storage control cache and in processor buffers. The mechanism of interaction between the two memory types is then examined and modeled so as to develop broad guidelines for how best to deploy an overall memory budget. These guidelines tend to contradict the usual metrics of storage control cache effectiveness, underscoring the need for an adjustment in pre-ESA paradigms.
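A back-of-the-envelope version of the quantity the paper proposes to control, the aggregate time per transaction spent waiting for physical disk motion, can be written down directly. The numbers and the independence assumption below are purely illustrative.

# Illustrative accounting of disk wait per transaction with two memory levels:
# references first try processor buffers, then storage control cache, and only
# the remaining misses wait for physical disk motion.  Hit ratios are treated
# as independent here purely for illustration.
refs_per_txn = 40        # I/O references issued per transaction (hypothetical)
buffer_hit   = 0.70      # fraction resolved in processor buffers
cache_hit    = 0.60      # fraction of the remainder resolved in control cache
disk_ms      = 15.0      # average physical disk service time, ms

misses = refs_per_txn * (1 - buffer_hit) * (1 - cache_hit)
print("disk wait per transaction ~= %.1f ms" % (misses * disk_ms))   # ~72 ms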
Article
In this paper, we develop analytical models and evaluate the performance of RAID5 disk arrays in normal mode (all disks operational), in degraded mode (one disk broken, rebuild not started) and in rebuild mode (one disk broken, rebuild started but not finished). Models for estimating rebuild time under the assumption that user requests get priority over rebuild activity have also been developed. Separate models were developed for cached and uncached disk controllers. Particular emphasis is on the performance of cached arrays, where the caches are built of Non-Volatile memory and support write caching in addition to read caching. Using these models, we evaluate the performance of arrayed and unarrayed disk subsystems when driven by a database workload such as those seen on systems running any of several popular database managers. In particular, we assume single-block accesses, flat device skew and little seek affinity. With the above assumptions, we find six significant results. First, in normal mode, we find there is no difference in performance between subsystems built out of either small arrays or large arrays as long as the total number of disks used is the same. Second, we find that if our goal is to minimize the average response time of a subsystem in degraded and rebuild modes, it is better to use small arrays rather than large arrays in the subsystem. Third, we find the counter-intuitive result that if our goal is to minimize the average response time of requests to any one array in the subsystem, it is better to use large arrays than small arrays in the subsystem. We call this the best worst-case phenomenon. Fourth, we find that when no caching is used in the disk controller, subsystems built out of arrays have a normal mode performance that is significantly worse than an equivalent unarrayed subsystem built of the same drives. For the specific drive, controller, workload and system parameters we used for our calculations, we find that, without a cache in the controller and operating at typical I/O rates, the normal mode response time of a subsystem built out of arrays is 50% higher than that of an unarrayed subsystem. In rebuild mode, we find that a subsystem built out of arrays can have anywhere from 100% to 200% higher average response time than an equivalent unarrayed subsystem. Our fifth result is that, with cached controllers, the performance differences between arrayed and equivalent unarrayed subsystems shrink considerably. We find that the normal mode response time in a subsystem built out of arrays is only 4.1% higher than that of an equivalent unarrayed system. In degraded (rebuild) mode, a subsystem built out of small arrays has a response time 11% (13%) higher and a subsystem built out of large arrays has a response time 15% (19%) higher than an unarrayed subsystem. Our sixth and last result is that cached arrays have significantly better response times and throughputs than equivalent uncached arrays. For one workload, a cached array with good hit ratios had 5 times the throughput and 10 to 40 times lower response times than the equivalent uncached array. With poor hit ratios, the cached array is still a factor of 2 better in throughput and a factor of 4 to 10 better in response time for this same workload. We conclude that 3 design decisions are important when designing disk subsystems built out of RAID level 5 arrays.
First, it is important that disk subsystems built out of arrays have disk controllers with caches, in particular Non-Volatile caches that cache writes in addition to reads. Second, if one were trying to minimize the worst response time seen by any user, one would choose disk array subsystems built out of large RAID level 5 arrays because of the best worst-case phenomenon. Third, if average subsystem response time is the most important design metric, the subsystem should be built out of small RAID level 5 arrays.
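The flavor of the accounting behind such models can be seen from the standard per-request disk-operation counts for RAID5 single-block accesses. These are the textbook small-write and degraded-read rules, shown only to suggest why degraded mode and uncached small writes are costly; they are not the article's analytic queueing model.

# Textbook RAID5 per-request disk-operation counts for single-block accesses.
N = 8                      # disks in the array, including parity (hypothetical)

normal_read   = 1          # read the target disk
degraded_read = N - 1      # reconstruct a lost block from all surviving disks
normal_write  = 4          # read old data + old parity, write new data + new parity

print("normal read:", normal_read,
      "| degraded read of a lost block:", degraded_read,
      "| normal small write:", normal_write)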
Article
While it is known that lowering the associativity of caches degrades cache performance, little is understood about the degree of this effect or how to lessen the effect, especially in very large caches. Most existing works on cache performance are simulation or emulation based and there is a lack of analytical models characterizing performance in terms of different configuration parameters such as line size, cache size, associativity and workload specific parameters. We develop analytical models to study performance of large cache architectures by capturing the dependence of miss ratio on associativity and other configuration parameters. While high associativity may decrease cache misses, for very large caches the corresponding increase in hardware cost and power may be significant. We use our models as well as simulation to study different proposals for reducing misses in low associativity caches, specifically, address space randomization and victim caches. Our analysis provides specific detail on the impact of these proposals, and a clearer understanding of why they do or do not work.
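A compact way to observe the associativity effect that the authors model analytically is a toy set-associative LRU simulation. The cache geometry and the synthetic skewed reference stream below are illustrative assumptions, not the configurations studied in the article.

import random
from collections import OrderedDict

# Toy set-associative LRU cache: miss ratio vs. associativity at fixed capacity.
def miss_ratio(trace, num_sets, ways):
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        s = sets[addr % num_sets]
        if addr in s:
            s.move_to_end(addr)               # LRU update on a hit
        else:
            misses += 1
            if len(s) >= ways:
                s.popitem(last=False)         # evict least recently used
            s[addr] = True
    return misses / len(trace)

random.seed(1)
# Skewed synthetic trace: most references go to a small hot region.
trace = [random.randrange(200) if random.random() < 0.8 else random.randrange(20000)
         for _ in range(50000)]
for ways in (1, 2, 4, 8):
    print("%2d-way: miss ratio %.3f" % (ways, miss_ratio(trace, 1024 // ways, ways)))

Total capacity is held at 1024 lines while associativity varies, so the drop in miss ratio with higher associativity reflects conflict misses alone, the effect the analytical models aim to capture.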
Conference Paper
We present the results of a variety of trace-driven simulations of disk cache designs using traces from a variety of mainframe timesharing and database systems in production use. We compute miss ratios, run lengths, traffic ratios, cache residency times, degree of memory pollution and other statistics for a variety of designs, varying block size, prefetching algorithm and write algorithm. We find that for this workload, sequential prefetching produces a significant (about 20%) but still limited improvement in the miss ratio, even using a powerful technique for detecting sequentiality. Copy-back writing decreased write traffic relative to write-through by more than 50%; periodic flushing of the dirty blocks increased write traffic only slightly compared to pure write-back, and then only for large cache sizes. Write-allocate had little effect compared to no-write-allocate. Block sizes of over a track don't appear to be useful. Limiting cache occupancy by a single process or transaction appears to have little effect. This study is unique in the variety and quality of the data used in the studies.
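The kind of experiment described here can be approximated in a few lines. The trace, block granularity, and the naive one-block-ahead prefetch rule below are illustrative stand-ins for the study's far richer designs, and the deliberately sequential synthetic trace exaggerates the prefetch benefit relative to the real traces.

from collections import OrderedDict

# Toy trace-driven disk cache: LRU with optional one-block-ahead sequential prefetch.
def simulate(trace, capacity, prefetch=False):
    cache, misses = OrderedDict(), 0
    def touch(block, demand):
        nonlocal misses
        if block in cache:
            cache.move_to_end(block)
        else:
            if demand:
                misses += 1                   # only demand references count as misses
            if len(cache) >= capacity:
                cache.popitem(last=False)     # evict least recently used block
            cache[block] = True
    for blk in trace:
        touch(blk, demand=True)
        if prefetch:
            touch(blk + 1, demand=False)      # naive sequential prefetch
    return misses / len(trace)

trace = [b for start in (0, 500, 0) for b in range(start, start + 300)]  # sequential runs
print("no prefetch:", simulate(trace, 128), "prefetch:", simulate(trace, 128, True))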