Conference Paper

Way guard: A segmented counting Bloom filter approach to reducing energy for set-associative caches

DOI: 10.1145/1594233.1594276 Conference: Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009, San Francisco, CA, USA, August 19-21, 2009
Source: DBLP


The design trend of caches in modern processors continues to increase their capacity with higher associativity to cope with large data footprints and take advantage of feature-size shrink, which, unfortunately, also leads to higher energy consumption. This paper presents a technique using segmented counting Bloom filters, called "Way Guard", to reduce the number of redundant way lookups in large set-associative caches and achieve dynamic energy savings. The Way Guard mechanism looks up only 25-30% of the cache ways on average and saves up to 65% of the L2 cache energy and up to 70% of the L1 cache energy.
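The paper's segmented organization is specific to its hardware design, but the underlying mechanism is a counting Bloom filter per cache way: a query that returns "absent" is guaranteed correct, so that way need not be probed at all. The sketch below illustrates only this core idea; the filter size, number of hash functions, hash choice, and the four-way example are illustrative assumptions, not the paper's parameters.

```python
import hashlib

class CountingBloomFilter:
    """Counting Bloom filter: supports insert, remove, and membership test.
    A result of False guarantees the key was never inserted (no false
    negatives); True may occasionally be a false positive."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _indexes(self, key):
        # Derive num_hashes counter indexes from one keyed hash.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def insert(self, key):
        for idx in self._indexes(key):
            self.counters[idx] += 1

    def remove(self, key):
        # Counters (rather than single bits) make deletion possible,
        # which a cache needs on line eviction.
        for idx in self._indexes(key):
            self.counters[idx] -= 1

    def may_contain(self, key):
        return all(self.counters[idx] > 0 for idx in self._indexes(key))

# One filter per cache way: on a lookup, probe only the ways whose
# filter reports a possible hit (illustrative 4-way example).
ways = [CountingBloomFilter() for _ in range(4)]
ways[2].insert(0xDEADBEEF)  # line fill into way 2
probed = [w for w in range(4) if ways[w].may_contain(0xDEADBEEF)]
# probed is guaranteed to include way 2; the other ways are skipped
# unless a (rare) false positive occurs
```

In hardware the filters are small counter arrays consulted in parallel with (or before) the tag lookup, so skipped ways save the dynamic energy of their tag and data array accesses.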



  • ABSTRACT: Tag comparisons occupy a significant portion of cache power consumption in highly associative caches such as the L2 cache. In our work, we propose a novel tag access scheme that applies a partial tag-enhanced Bloom filter to reduce tag comparisons by detecting per-way cache misses. The proposed scheme also classifies cache data into hot and cold data, and the tags of hot data are compared earlier than those of cold data, exploiting the fact that most cache hits go to hot data. In addition, the power consumption of each tag comparison can be further reduced by dividing the comparison into two micro-steps, where a partial tag comparison is performed first and, only if it gives a partial hit, the remaining tag bits are compared. We applied the proposed scheme to an L2 cache with 10 programs from SPEC2000 and SPEC2006. Experimental results show average reductions of 23.69% and 8.58% in cache energy consumption compared with conventional serial tag-data access and other existing methods, respectively.
    Design, Automation and Test in Europe, DATE 2011, Grenoble, France, March 14-18, 2011; 03/2011
  • ABSTRACT: The snoopy protocol is a widely used scheme to maintain cache coherence. However, the protocol requires a broadcast scheme and forces substantial unnecessary data searches at the local cache. This paper proposes a novel Double Layer Counting Bloom Filter (DLCBF) to significantly reduce redundant data searches and transmission. The DLCBF implements an extra layer of hash function and a counting feature at each filter entry. By using the hierarchical structure of the hash function, the DLCBF can effectively increase the successful filter rate while requiring less memory than conventional Bloom filters. Experimental results show that the DLCBF can screen out 4.05X more unnecessary cache searches while using 18.75% less memory than conventional Bloom filters. The DLCBF is also used to filter out redundant data transmission on a hierarchical shared bus. Simulation results show that the DLCBF outperforms conventional filters by 58% for local transmissions and 1.86X for remote transmissions.
    Parallel Architectures, Algorithms and Programming (PAAP), 2012 Fifth International Symposium on; 01/2012
  • ABSTRACT: Intelligently partitioning the last-level cache within a chip multiprocessor can bring significant performance improvements. Resources are given to the applications that can benefit most from them, restricting each core to a number of logical cache ways. However, although overall performance is increased, existing schemes fail to consider energy saving when making their partitioning decisions. This paper presents Cooperative Partitioning, a runtime partitioning scheme that reduces both dynamic and static energy while maintaining high performance. It works by enforcing cached data to be way-aligned, so that a way is owned by a single core at any time. Cores cooperate with each other to migrate ways between themselves after partitioning decisions have been made. Upon access to the cache, a core needs only to consult the ways that it owns to find its data, saving dynamic energy. Unused ways can be power-gated for static energy saving. We evaluate our approach on two-core and four-core systems, showing that we obtain average dynamic and static energy savings of 35% and 25% compared to a fixed partitioning scheme. In addition, Cooperative Partitioning maintains high performance while transferring ways five times faster than an existing state-of-the-art technique.
    High Performance Computer Architecture (HPCA), 2012 IEEE 18th International Symposium on; 01/2012
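The two micro-step tag comparison described in the first citing abstract above can be sketched in software as follows. The partial-tag width, the tag values, and the function name are illustrative assumptions, not parameters from that paper; the point is only that a cheap comparison of a few low-order tag bits filters out most ways before the wider (and in hardware, more expensive) full-tag comparison.

```python
PARTIAL_BITS = 6  # width of the partial tag; an illustrative choice
PARTIAL_MASK = (1 << PARTIAL_BITS) - 1

def two_step_tag_match(stored_tags, lookup_tag):
    """Return (hit_way, partial_comparisons, full_comparisons).

    Step 1 compares only the PARTIAL_BITS low-order bits of each
    stored tag; step 2 compares the remaining bits only for ways
    that partially hit, so a partial miss avoids the full compare."""
    partial = lookup_tag & PARTIAL_MASK
    partial_compares = full_compares = 0
    for way, tag in enumerate(stored_tags):
        partial_compares += 1
        if (tag & PARTIAL_MASK) != partial:
            continue  # partial miss: this way cannot hold the line
        full_compares += 1
        if (tag >> PARTIAL_BITS) == (lookup_tag >> PARTIAL_BITS):
            return way, partial_compares, full_compares  # full hit
    return None, partial_compares, full_compares  # cache miss

# 8-way set: only ways whose low 6 bits match the lookup tag
# trigger a full comparison.
tags = [0x1A3, 0x2B7, 0x0E3, 0x3FF, 0x1E3, 0x245, 0x09C, 0x111]
hit, partials, fulls = two_step_tag_match(tags, 0x1E3)
# hit is way 4; only 3 of the 5 probed ways needed a full comparison
```

On a true miss, full comparisons can drop to zero whenever no stored tag shares the partial bits, which is where the energy saving of the two-step scheme comes from.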