Balanced instruction cache: reducing conflict misses of direct-mapped caches through balanced subarray accesses

Dept. of Electr. & Comput. Eng., San Diego State Univ., CA, USA;
IEEE Computer Architecture Letters (Impact Factor: 0.85). 02/2006; 5(1):2-5. DOI: 10.1109/L-CA.2006.3
Source: IEEE Xplore

ABSTRACT It is observed that the limited memory space of direct-mapped caches is not used in a balanced manner, which incurs extra conflict misses. We propose a novel cache organization, the balanced cache, which balances accesses to cache sets at the granularity of cache subarrays. The key technique of the balanced cache is a programmable subarray decoder, through which the mapping of memory reference addresses to cache subarrays can be optimized so that conflict misses of direct-mapped caches are reduced. The experimental results show that the miss rate of the balanced cache is lower on average than that of same-sized two-way set-associative caches and can be as low as that of same-sized four-way set-associative caches for particular applications. Compared with previous techniques, the balanced cache requires only one cycle for all cache hits and has the same access time as direct-mapped caches.
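The idea in the abstract can be illustrated with a small simulation. This is a hypothetical sketch, not the paper's exact hardware design: all parameters (`num_subarrays`, `sets_per_subarray`, `block_bits`) and the `program` interface are illustrative assumptions. It models a direct-mapped cache split into subarrays, where a small programmable decoder decides which subarray each address group maps to.

```python
# Hedged sketch of a balanced-cache-style organization (illustrative only):
# a direct-mapped cache split into subarrays, with a programmable decoder
# remapping subarray-select bits so that over-used index groups can be
# steered toward under-used subarrays.

class BalancedCache:
    """Direct-mapped cache whose subarray selection is programmable."""

    def __init__(self, num_subarrays=4, sets_per_subarray=64, block_bits=5):
        self.num_subarrays = num_subarrays
        self.sets_per_subarray = sets_per_subarray
        self.block_bits = block_bits
        # Programmable subarray decoder: starts as the identity mapping.
        self.decoder = list(range(num_subarrays))
        # One tag per set per subarray; the full block address is stored as
        # the tag so remapped entries remain unambiguous.
        self.tags = [[None] * sets_per_subarray for _ in range(num_subarrays)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr >> self.block_bits
        set_idx = block % self.sets_per_subarray
        select = (block // self.sets_per_subarray) % self.num_subarrays
        subarray = self.decoder[select]           # programmable remapping
        if self.tags[subarray][set_idx] == block:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[subarray][set_idx] = block  # fill on miss

    def program(self, select, subarray):
        """Reprogram one decoder entry (modelled here as instantaneous)."""
        self.decoder[select] = subarray
```

Because the remapping is a single table lookup in the index path, a hit still takes one access, which is the property the abstract claims over set-associative alternatives.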

  • ABSTRACT: We implement and undertake an empirical study of the cache-oblivious variant in (2) of the polygon indecomposability testing algorithm of (11), based on a DFS traversal of the computation tree. According to (2), the cache-oblivious variant exhibits improved spatial and temporal locality over the original one, and its spatial locality is optimal. Our implementation revolves around eight different variants of the DFS-based algorithm, tailored to assess the trade-offs between computation and memory performance as originally proposed in (2). We analyse the sensitivity of performance to manipulations of the several parameters comprising the input size. We describe how to construct suitably random families of input that elicit such variations, and how to handle redundancies in vector computations at no asymptotic increase in the work and cache complexities. We report extensively on our experimental results. In all eight variants, the DFS-based algorithm achieves excellent performance in terms of L1 and L2 cache misses as well as total run time, when compared to the original variant in (11). We also benchmark the DFS variant against the powerful computer algebra system MAGMA, in the context of bivariate polynomial irreducibility testing using polygons. For sufficiently high-degree polynomials, MAGMA either runs out of memory or fails to terminate after about four hours of execution. In contrast, the DFS-based version processes such input in a couple of seconds. In particular, we report on absolute irreducibility testing of bivariate polynomials of total degree reaching 19,000 in about 2 seconds for the DFS variant, using a single processor.
    Computing 01/2010; 88:55-78. · 0.81 Impact Factor
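The locality argument behind the DFS-based variant can be shown generically. The following is our own illustration, not the cited paper's implementation: evaluating a computation tree in DFS (post-order) finishes one subtree entirely before starting the next, so each intermediate result is consumed soon after it is produced, instead of a whole level of intermediates staying live as in level-order evaluation.

```python
# Generic DFS evaluation of a computation tree (illustrative sketch):
# post-order traversal consumes each intermediate result shortly after
# producing it, which is the source of the improved temporal locality.

class Node:
    def __init__(self, value=0, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def dfs_eval(node):
    """Post-order (DFS) evaluation: the whole left subtree is computed
    and consumed before the right subtree is touched."""
    if node.left is None and node.right is None:
        return node.value                         # leaf
    # '+' stands in for whatever combine step the real algorithm uses.
    return dfs_eval(node.left) + dfs_eval(node.right)
```

With a level-order evaluation, by contrast, every node of a level must be buffered before any parent can be computed, so the working set grows with the width of the tree rather than its depth.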
  • ABSTRACT: In this work we study cache peak temperature variation under different cache access patterns. In particular, we show that unbalanced cache accesses result in higher cache peak temperature, a consequence of the frequent accesses made to overused cache sets. Moreover, we study cache peak temperature under cache access balancing techniques and show that exploiting such techniques not only reduces the cache miss rate but also results in lower peak temperature. Our study shows that balancing cache accesses reduces peak temperature by up to 20% and 12% for instruction and data caches, respectively. This temperature reduction also lowers peak temperature in neighboring components by up to 7%.
    The 9th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2011, Sharm El-Sheikh, Egypt, December 27-30, 2011; 01/2011
  • ABSTRACT: Based on a summary of existing hard disk cache management algorithms and the characteristics of hard disk performance, a page miss cost (PMC) cache management algorithm is proposed. Most cache management algorithms focus on maximizing the hit rate; our analysis shows that cache misses carry a large time cost, so the aim of the PMC scheme is to minimize the time consumed when a cache miss occurs. The PMC algorithm keeps a reserved area for each cache working set: a page that is expensive to swap into the cache is kept in this area for future accesses instead of being evicted by the least recently used (LRU) algorithm. Simulations indicate that PMC clearly improves disk throughput and enhances system performance.
    Information Engineering and Computer Science, 2009. ICIECS 2009. International Conference on; 01/2010
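The reserved-area policy described in the PMC abstract can be sketched in a few lines. This is a hedged reconstruction from the description above: the class name, the fixed cost threshold, and the sizing rule are our assumptions, not the paper's exact design.

```python
# Illustrative sketch of a PMC-style policy: an LRU cache with a small
# reserved area; pages whose reload cost exceeds a threshold are kept in
# the reserved area instead of competing under plain LRU replacement.

from collections import OrderedDict

class PMCCache:
    """LRU cache with a reserved area for pages that are costly to reload."""

    def __init__(self, capacity, reserved_capacity, cost_threshold):
        self.capacity = capacity
        self.reserved_capacity = reserved_capacity
        self.cost_threshold = cost_threshold
        self.lru = OrderedDict()       # ordinary pages, LRU-ordered
        self.reserved = OrderedDict()  # high-cost pages, protected from LRU
        self.hits = 0
        self.misses = 0

    def access(self, page, load_cost):
        if page in self.reserved or page in self.lru:
            self.hits += 1
            if page in self.lru:
                self.lru.move_to_end(page)       # refresh LRU position
            return
        self.misses += 1                         # miss: pay load_cost
        if (load_cost >= self.cost_threshold
                and len(self.reserved) < self.reserved_capacity):
            self.reserved[page] = load_cost      # protect expensive page
            return
        if len(self.lru) >= self.capacity - len(self.reserved):
            self.lru.popitem(last=False)         # plain LRU eviction
        self.lru[page] = load_cost
```

For example, a page loaded at cost 20 survives a burst of cheap pages that thrashes the LRU portion, so its next access is a hit; under pure LRU it would have been evicted and the large miss cost paid again.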
