Conference Paper

Way Guard: A segmented counting Bloom filter approach to reducing energy for set-associative caches

DOI: 10.1145/1594233.1594276 Conference: Proceedings of the 2009 International Symposium on Low Power Electronics and Design, San Francisco, CA, USA, August 19-21, 2009
Source: DBLP

ABSTRACT

The design trend of caches in modern processors continues to increase their capacity with higher associativity to cope with large data footprints and take advantage of feature-size shrink, which, unfortunately, also leads to higher energy consumption. This paper presents a technique using segmented counting Bloom filters, called "Way Guard", to reduce the number of redundant way lookups in large set-associative caches and thereby achieve dynamic energy savings. Our Way Guard mechanism looks up only 25-30% of the cache ways on average and saves up to 65% of the L2 cache energy and up to 70% of the L1 cache energy.
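
As a rough illustration of the way-filtering idea in this abstract, the sketch below keeps one counting Bloom filter per cache way and consults it before probing that way: a zero count guarantees the block is not resident in that way, so the tag/data lookup for that way can be skipped. This is a minimal sketch only; the segmented filter organization the paper proposes, as well as the hash function, counter widths, sizes, and all names here, are assumptions for illustration rather than the paper's design.

/*
 * Minimal sketch: one counting Bloom filter per cache way. Before a way
 * is probed for a block address, its filter is consulted; a zero count
 * guarantees the block is not in that way, so the lookup can be skipped.
 * Sizes, names, and the single hash function are illustrative assumptions.
 */
#include <stdint.h>
#include <stdbool.h>

#define NUM_WAYS    16
#define FILTER_SIZE 4096                 /* counters per way filter (assumed) */

typedef struct {
    uint8_t count[FILTER_SIZE];          /* small saturating counters */
} way_filter_t;

static way_filter_t filters[NUM_WAYS];

/* Illustrative hash of the block address into a counter index. */
static inline uint32_t hash_addr(uint64_t block_addr)
{
    block_addr ^= block_addr >> 17;
    block_addr *= 0x9E3779B97F4A7C15ULL;
    return (uint32_t)(block_addr >> 32) % FILTER_SIZE;
}

/* Keep the counters consistent on allocation into / eviction from a way. */
void filter_insert(int way, uint64_t block_addr)
{
    uint8_t *c = &filters[way].count[hash_addr(block_addr)];
    if (*c < UINT8_MAX)
        (*c)++;
}

void filter_remove(int way, uint64_t block_addr)
{
    uint8_t *c = &filters[way].count[hash_addr(block_addr)];
    if (*c > 0)
        (*c)--;
}

/* True if the way might hold the block; false means it can be skipped. */
bool filter_may_contain(int way, uint64_t block_addr)
{
    return filters[way].count[hash_addr(block_addr)] != 0;
}

/* On a cache access, build a bit mask of the ways that need a real lookup. */
uint32_t ways_to_probe(uint64_t block_addr)
{
    uint32_t mask = 0;
    for (int w = 0; w < NUM_WAYS; w++)
        if (filter_may_contain(w, block_addr))
            mask |= 1u << w;
    return mask;
}

Because the counters only over-approximate residency, the way holding a resident block is always included in the mask, while ways whose counter is zero are skipped; skipping those lookups is where the dynamic-energy saving comes from.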

  • ABSTRACT: Tag comparisons occupy a significant portion of cache power consumption in highly associative caches such as the L2 cache. In our work, we propose a novel tag access scheme which applies a partial tag-enhanced Bloom filter to reduce tag comparisons by detecting per-way cache misses. The proposed scheme also classifies cache data into hot and cold data, and the tags of hot data are compared earlier than those of cold data, exploiting the fact that most cache hits go to hot data. In addition, the power consumption of each tag comparison can be further reduced by dividing it into two micro-steps: a partial tag comparison is performed first and, only if it gives a partial hit, the remaining tag bits are compared (a minimal sketch of this two-step comparison appears after this list). We applied the proposed scheme to an L2 cache with 10 programs from SPEC2000 and SPEC2006. Experimental results show average reductions of 23.69% and 8.58% in cache energy consumption compared with conventional serial tag-data access and other existing methods, respectively.
    Conference Paper · Mar 2011
  • ABSTRACT: Intelligently partitioning the last-level cache within a chip multiprocessor can bring significant performance improvements. Resources are given to the applications that can benefit most from them, restricting each core to a number of logical cache ways. However, although overall performance is increased, existing schemes fail to consider energy saving when making their partitioning decisions. This paper presents Cooperative Partitioning, a runtime partitioning scheme that reduces both dynamic and static energy while maintaining high performance. It works by enforcing way-aligned placement of cached data, so that a way is owned by a single core at any time. Cores cooperate with each other to migrate ways between themselves after partitioning decisions have been made. Upon access to the cache, a core needs only to consult the ways that it owns to find its data, saving dynamic energy. Unused ways can be power-gated for static energy saving (a sketch of this way-ownership idea appears after this list). We evaluate our approach on two-core and four-core systems, showing that we obtain average dynamic and static energy savings of 35% and 25% compared to a fixed partitioning scheme. In addition, Cooperative Partitioning maintains high performance while transferring ways five times faster than an existing state-of-the-art technique.
    Conference Paper · Jan 2012
  • ABSTRACT: Tag comparison in a highly associative cache consumes a significant portion of the cache energy. Existing methods for tag comparison reduction are based on predicting either cache hits or cache misses. In this paper, we present novel ideas for both cache hit and miss predictions. We present a partial tag-enhanced Bloom filter to improve the accuracy of the cache miss prediction method and hot/cold checks that control data liveness to reduce the tag comparisons of the cache hit prediction method. We also combine both methods so that their order of application can be dynamically adjusted to adapt to changing cache access behavior, which further reduces tag comparisons (a sketch of such adaptive ordering appears after this list). To overcome the common limitation of multistep tag comparison methods, we propose a method that reduces tag comparisons while meeting a given performance bound. Experimental results showed that the proposed method reduces the energy consumption of tag comparison by an average of 88.40%, which translates to an average reduction of 35.34% (40.19% with low-power data access) in the total energy consumption of the L2 cache and a further reduction of 8.86% (10.07% with low-power data access) when compared with existing methods.
    Article · Apr 2012 · IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
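
The first citing abstract above describes a two-step (partial-then-full) tag comparison. Below is a minimal sketch of that micro-step idea, assuming a 6-bit partial tag and invented names; the Bloom-filter miss detection and hot/cold ordering of that work are not reproduced here.

#include <stdint.h>
#include <stdbool.h>

#define PARTIAL_BITS 6                               /* assumed width */
#define PARTIAL_MASK ((1u << PARTIAL_BITS) - 1)

typedef struct {
    uint32_t tag;
    bool     valid;
} tag_entry_t;

/* Returns true on a full tag hit. The wide comparison in step 2 is only
 * performed when the cheap partial comparison in step 1 already matches. */
bool two_step_tag_match(const tag_entry_t *e, uint32_t addr_tag)
{
    if (!e->valid)
        return false;
    /* Step 1: compare only a few low-order tag bits (low energy). */
    if ((e->tag & PARTIAL_MASK) != (addr_tag & PARTIAL_MASK))
        return false;
    /* Step 2: compare the remaining tag bits only on a partial hit. */
    return (e->tag >> PARTIAL_BITS) == (addr_tag >> PARTIAL_BITS);
}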
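
The Cooperative Partitioning abstract above keys its savings to way-aligned ownership: a core probes only the ways it owns, and ownerless ways can be power-gated. The following is a small sketch of that bookkeeping, with assumed names and sizes and without the write-back/invalidation details a real way migration would need.

#include <stdint.h>

#define NUM_WAYS 16
#define NO_OWNER (-1)

static int way_owner[NUM_WAYS];          /* owning core id, or NO_OWNER */

void partition_init(void)
{
    for (int w = 0; w < NUM_WAYS; w++)
        way_owner[w] = NO_OWNER;         /* all ways start unowned */
}

/* Bit mask of the ways core `core_id` must consult on an access. */
uint32_t owned_way_mask(int core_id)
{
    uint32_t mask = 0;
    for (int w = 0; w < NUM_WAYS; w++)
        if (way_owner[w] == core_id)
            mask |= 1u << w;
    return mask;
}

/* Hand one way from one core to another after a repartitioning decision;
 * in hardware the way's contents would be written back or invalidated first. */
void migrate_way(int way, int from_core, int to_core)
{
    if (way_owner[way] == from_core)
        way_owner[way] = to_core;
}

/* Ways no core owns are candidates for power-gating (static energy). */
uint32_t gateable_way_mask(void)
{
    uint32_t mask = 0;
    for (int w = 0; w < NUM_WAYS; w++)
        if (way_owner[w] == NO_OWNER)
            mask |= 1u << w;
    return mask;
}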
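
The last abstract dynamically adjusts whether the hit-prediction or miss-prediction check is applied first. One conceivable policy, shown below purely as an illustration and not as that paper's method, is a small saturating counter that tracks which check has pruned more tag reads recently.

#include <stdint.h>
#include <stdbool.h>

typedef enum { MISS_CHECK_FIRST, HIT_CHECK_FIRST } filter_order_t;

static int8_t order_counter = 0;         /* positive favours the miss check */

/* Which check to apply first on the next access. */
filter_order_t choose_order(void)
{
    return (order_counter >= 0) ? MISS_CHECK_FIRST : HIT_CHECK_FIRST;
}

/* Called after each access with which check actually avoided tag reads. */
void update_order(bool miss_check_helped, bool hit_check_helped)
{
    if (miss_check_helped && order_counter < 8)
        order_counter++;
    if (hit_check_helped && order_counter > -8)
        order_counter--;
}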