Conference Paper

A table-based method for single-pass cache optimization.

DOI: 10.1145/1366110.1366129 Conference: Proceedings of the 18th ACM Great Lakes Symposium on VLSI 2008, Orlando, Florida, USA, May 4-6, 2008
Source: DBLP


Due to the large contribution of the memory subsystem to total system power, the memory subsystem is highly amenable to customization for reduced power/energy and/or improved performance. Cache parameters such as total size, line size, and associativity can be specialized to the needs of an application for system optimization. In order to determine the best values for cache parameters, most methodologies utilize repetitious application execution to individually analyze each configuration explored. In this paper we propose a simplified yet efficient technique to accurately estimate the miss rate of many different cache configurations in just one single pass of execution. The approach utilizes simple data structures in the form of a multi-layered table and elementary bitwise operations to capture the locality characteristics of an application's addressing behavior. The proposed technique intends to ease miss rate estimation and reduce cache exploration time.

  • Source
    • "Thompson and Smith [18] simulated write-back caches by introducing dirty-level analysis and write-back counters. Since most previous works varied only two cache parameters while holding the remaining parameters fixed, Viana et al. [19] proposed SPCE, a stack-based algorithm that simultaneously evaluated all cache configurations with varying size, block size, and associativity in a single pass. Gordon-Ross et al. [8] implemented SPCE in hardware for runtime cache tuning. "
    ABSTRACT: Cache tuning is the process of determining the optimal cache configuration given an application's requirements for reducing energy consumption and improving performance. As embedded systems trend towards unified second-level caches for improved performance, the need for fast cache tuning methodologies for multi-level cache hierarchies is becoming more critical. In this paper, we present U-SPaCS, a single-pass cache simulation methodology for design-time tuning of two-level cache hierarchies with a unified second-level cache. To afford fast simulation time, U-SPaCS maintains unique cache block addresses in a set of stacks, which enables simulation of all cache configurations for the level one instruction and data caches and the level two unified cache simultaneously in a single pass of an application's time-ordered instruction and data access trace. Experiments show that U-SPaCS can accurately determine the miss rates for a configurable cache design space consisting of 2,187 cache configurations with a 41X speedup in average simulation time compared to the most widely used trace-driven cache simulation.
  • Source
    • "Addr's presence in a cache set (the set that Addr maps to) depends on the cache configuration and the number of conflicts in the stack before Addr B (previous access to Addr's cache block). A stack address A is recorded as a conflict in {SConfl} for the cache configuration with block size B (B = 2^b) and number of sets S when (A >> b) mod S = (Addr >> b) mod S. Fig. 2 illustrates the stack-based algorithm [18] for processing each Addr in the access trace. For every combination of B (state 1) and S (state 2), stack processing determines the conflicts {SConfl} in the stack addresses before Addr B (state 3). "
    ABSTRACT: The cache hierarchy's large contribution to total microprocessor system power makes caches a good optimization candidate. We propose a single-pass trace-driven cache simulation methodology -- T-SPaCS -- for a two-level exclusive instruction cache hierarchy. Instead of storing and simulating numerous stacks repeatedly as in direct adaptation of a conventional trace-driven cache simulation to two level caches, T-SPaCS simulates both the level one and level two caches simultaneously using one stack. Experimental results show T-SPaCS efficiently and accurately determines the optimal cache configuration (lowest energy).
    Proceedings of the 16th Asia South Pacific Design Automation Conference, ASP-DAC 2011, Yokohama, Japan, January 25-27, 2011; 01/2011
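
The conflict condition quoted above, (A >> b) mod S = (Addr >> b) mod S, can be illustrated with a minimal Python sketch. This is not the cited authors' implementation: the function names `conflicts_before` and `is_hit` are hypothetical, the stack is assumed to be a list of byte addresses ordered most recent first, and the hit rule (fewer distinct conflicting blocks than ways) assumes LRU replacement.

```python
from typing import List, Optional


def conflicts_before(stack: List[int], addr: int, b: int, num_sets: int) -> Optional[List[int]]:
    """Collect distinct conflicting blocks seen before the previous access to addr's block.

    stack is assumed to hold byte addresses, most recent first.
    Returns None when addr's block was never accessed (a cold miss).
    """
    target_block = addr >> b                  # block number for block size B = 2^b
    target_set = target_block % num_sets      # (Addr >> b) mod S
    seen: List[int] = []
    for a in stack:
        block = a >> b
        if block == target_block:
            return seen                       # reached the previous access to this block
        if block % num_sets == target_set and block not in seen:
            seen.append(block)                # (A >> b) mod S matches: a conflict
    return None


def is_hit(stack: List[int], addr: int, b: int, num_sets: int, ways: int) -> bool:
    """Assumed LRU rule: hit if fewer than `ways` distinct blocks conflict."""
    found = conflicts_before(stack, addr, b, num_sets)
    return found is not None and len(found) < ways


# Example: 16-byte blocks (b = 4), stack ordered most recent first.
stack = [0x40, 0x50, 0x80, 0x00]
print(conflicts_before(stack, 0x04, 4, 4))
print(is_hit(stack, 0x04, 4, 4, 2), is_hit(stack, 0x04, 4, 8, 2))
```

Because the same stack scan is repeated for each (b, S) pair, one pass over the trace can score many configurations; only the shift amount and the modulus change between them.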