Article

Balanced instruction cache: reducing conflict misses of direct-mapped caches through balanced subarray accesses

Dept. of Electr. & Comput. Eng., San Diego State Univ., CA, USA
IEEE Computer Architecture Letters (Impact Factor: 0.85). 02/2006; 5(1):2-5. DOI: 10.1109/L-CA.2006.3
Source: IEEE Xplore

ABSTRACT It is observed that the limited memory space of direct-mapped caches is not used in a balanced manner, which incurs extra conflict misses. We propose a novel cache organization, the balanced cache, which balances accesses to cache sets at the granularity of cache subarrays. The key technique of the balanced cache is a programmable subarray decoder, through which the mapping of memory reference addresses to cache subarrays can be optimized and conflict misses of direct-mapped caches can thus be reduced. Experimental results show that the miss rate of the balanced cache is on average lower than that of same-sized two-way set-associative caches, and for particular applications can be as low as that of same-sized four-way set-associative caches. Compared with previous techniques, the balanced cache requires only one cycle to access all cache hits and has the same access time as a direct-mapped cache.
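As a rough behavioral illustration of the mechanism, the Python sketch below models a cache whose subarray-selection bits pass through a small programmable decoder table instead of being hard-wired. It is a hypothetical reconstruction from the abstract alone: the class name, sizes, and remap interface are illustrative assumptions, not the paper's hardware design.

    class BalancedCacheModel:
        """Toy model of a direct-mapped cache whose subarray is chosen
        through a programmable decoder table (a behavioral sketch only;
        all names and sizes here are assumptions, not the paper's design)."""

        def __init__(self, num_subarrays=8, sets_per_subarray=64, block_bits=5):
            self.block_bits = block_bits
            # Assumes sets_per_subarray is a power of two.
            self.set_bits = sets_per_subarray.bit_length() - 1
            self.num_subarrays = num_subarrays
            # Programmable decoder: maps the subarray-index bits of an
            # address to a physical subarray.  Starts as the conventional
            # identity mapping; entries can be reprogrammed to spread
            # heavily used indices across underutilized subarrays.
            self.decoder = list(range(num_subarrays))
            # One tag per set per subarray (data payload omitted).
            self.tags = [[None] * sets_per_subarray for _ in range(num_subarrays)]

        def _decode(self, addr):
            line = addr >> self.block_bits
            set_idx = line & ((1 << self.set_bits) - 1)
            decoder_in = (line >> self.set_bits) % self.num_subarrays
            tag = line >> self.set_bits  # tag covers the decoder bits too
            return self.decoder[decoder_in], set_idx, tag

        def access(self, addr):
            sub, set_idx, tag = self._decode(addr)
            hit = self.tags[sub][set_idx] == tag
            if not hit:
                self.tags[sub][set_idx] = tag  # fill on miss, direct-mapped style
            return hit

        def remap(self, decoder_in, subarray):
            # Reprogram one decoder entry, e.g. to steer a conflict-heavy
            # address stream away from an overused subarray.
            self.decoder[decoder_in] = subarray

Because the stored tag covers the decoder-input bits, lookups stay correct even after two decoder inputs are remapped to the same subarray; only one subarray is probed per access, which is why hits keep the single-cycle, direct-mapped access time.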

  • ABSTRACT: Network-on-chip (NoC) plays an important role in many-core systems. Recent NoC research has focused on designing and optimizing the network separately from the cores. This paper describes a tightly coupled NoC router architecture in which the router and the core are designed as a whole. The router uses on-chip storage to improve network performance, and several optimizations are introduced to make better use of on-chip resources and information. In theory, this design saves 9.3% of chip area. Experimental results show that the optimization of the ejection process can reduce latency by up to 75% and energy consumption by 31.5% under heavy traffic load, and that it improves latency by about 20% and energy consumption by nearly 25% across different buffer depths. The results also show that this tightly coupled router architecture achieves better performance in large-scale networks.
    Scalable Computing and Communications / Eighth International Conference on Embedded Computing (SCALCOM-EMBEDDEDCOM'09); 10/2009
  • ABSTRACT: Based on a survey of existing hard-disk cache management algorithms and the characteristics of hard-disk performance, a page miss cost (PMC) cache management algorithm is proposed. Most cache management algorithms focus on maximizing the hit rate; our analysis shows that cache misses can incur a large time cost, and the aim of the PMC scheme is to minimize the time consumed when a cache miss occurs. The PMC algorithm keeps a reserved area for each cache working set: a page that is expensive to swap back into the cache is retained in this area for future accesses instead of being evicted by the least-recently-used (LRU) algorithm (a minimal sketch of this policy appears after this list). Simulations indicate that PMC noticeably improves disk throughput and enhances system performance.
    International Conference on Information Engineering and Computer Science (ICIECS 2009); 01/2010
  • ABSTRACT: The level-one cache normally resides on a processor's critical path, which determines the clock frequency. Direct-mapped caches exhibit fast access times but poor hit rates compared with same-sized set-associative caches because of non-uniform accesses to the cache sets, which generate more conflict misses in some sets while other sets are underutilized. We propose a technique to reduce the miss rate of direct-mapped caches by balancing the accesses to cache sets: we increase the decoder length, thereby reducing accesses to heavily used sets without dynamically detecting cache-set usage information, and we introduce a replacement policy to the direct-mapped cache design that increases accesses to underutilized cache sets with the help of programmable decoders. On average, the proposed balanced cache, or B-Cache, achieves 64.5% and 37.8% miss-rate reductions across all 26 SPEC2K benchmarks for the instruction and data caches, respectively. This translates into an average IPC improvement of 5.9%. The B-Cache consumes 10.5% more power per access but exhibits a 2% saving in total memory-access-related energy, owing to the miss-rate reductions and the resulting reduction in application execution time. Compared with previous techniques that aim at reducing the miss rate of direct-mapped caches, our technique requires only one cycle to access all cache hits and has the same access time as a direct-mapped cache.
    01/2006
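To make the PMC eviction idea above concrete, here is a minimal Python sketch of a cost-aware page cache in the spirit of the scheme the second abstract describes. It is reconstructed from the abstract alone; the class name, the fixed reserve size, and the cost threshold are illustrative assumptions rather than the paper's actual parameters.

    from collections import OrderedDict

    class PMCCache:
        """Toy page cache: LRU eviction, except that up to `reserve`
        expensive-to-refetch pages are retained past an LRU sweep
        (a hypothetical sketch of the PMC idea, not the paper's code)."""

        def __init__(self, capacity, reserve=4, cost_threshold=5.0):
            self.capacity = capacity
            self.reserve = reserve            # slots set aside for costly pages
            self.threshold = cost_threshold   # fetch cost above this is "expensive"
            self.pages = OrderedDict()        # page -> fetch cost, in LRU order

        def access(self, page, fetch_cost):
            if page in self.pages:
                self.pages.move_to_end(page)  # hit: refresh recency
                return True
            if len(self.pages) >= self.capacity:
                self._evict()
            self.pages[page] = fetch_cost     # miss: fetch and fill
            return False

        def _evict(self):
            # Walk from least to most recently used; let up to `reserve`
            # expensive pages survive instead of evicting them by pure LRU.
            skipped = 0
            for page, cost in self.pages.items():
                if cost > self.threshold and skipped < self.reserve:
                    skipped += 1
                    continue
                del self.pages[page]
                return
            self.pages.popitem(last=False)    # all were reserved: plain LRU

The point of the policy is that a miss on a cheap page costs little to repair, while a miss on an expensive page (e.g. one requiring a long seek) dominates total time, so recency alone is the wrong eviction signal.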
