Conference Paper

Virtual Ways: Efficient Coherence for Architecturally Visible Storage in Automatic Instruction Set Extensions

DOI: 10.1007/978-3-642-11515-8_11 Conference: High Performance Embedded Architectures and Compilers, 5th International Conference, HiPEAC 2010, Pisa, Italy, January 25-27, 2010. Proceedings
Source: DBLP

ABSTRACT Customizable processors augmented with application-specific Instruction Set Extensions (ISEs) have begun to gain traction in recent years. The most effective ISEs include Architecturally Visible Storage (AVS): compiler-controlled memories accessible exclusively to the ISEs. Unfortunately, the use of AVS memories creates a coherence problem with the data cache. A multiprocessor coherence protocol can solve the problem, but it is an expensive solution when applied in a uniprocessor context. Instead, we solve the problem by modifying the cache controller so that the AVS memories function as extra ways of the cache with respect to coherence, while remaining inaccessible as extra ways under normal software execution. This solution, which we call Virtual Ways, is less costly than a hardware coherence protocol and eliminates coherence messages from the system bus, which reduces energy consumption. Moreover, eliminating these messages makes Virtual Ways significantly more robust to performance degradation when there is a significant disparity in clock frequency between the processor and main memory.
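The core idea of the abstract can be illustrated with a toy software model: the AVS participates in coherence lookups as if it were one extra way of the cache, but normal loads and stores never allocate into it. This is only a sketch of the concept; the class and method names are invented for illustration and do not come from the paper.

```python
# Toy model of the "virtual way" idea: AVS tags take part in coherence
# checks like an extra cache way, but the AVS is never a target for
# normal allocation. All names here are illustrative, not the paper's.

class VirtualWaysCache:
    def __init__(self):
        self.ways = [{} for _ in range(2)]  # real ways: addr -> value
        self.avs = {}                       # AVS "virtual way": addr -> value

    def cpu_store(self, addr, value):
        # A normal store allocates only into a real way...
        self.ways[addr % 2][addr] = value
        # ...but the coherence check also covers the virtual way:
        # any matching AVS copy is invalidated locally, with no
        # coherence message ever placed on the system bus.
        self.avs.pop(addr, None)

    def ise_store(self, addr, value):
        # An ISE writes its private AVS; coherence invalidates any
        # stale copy held in the real ways.
        self.avs[addr] = value
        self.ways[addr % 2].pop(addr, None)

    def cpu_load(self, addr):
        # A load may hit in the virtual way for correctness,
        # but never allocates into it.
        way = self.ways[addr % 2]
        if addr in way:
            return way[addr]
        return self.avs.get(addr)
```

In this sketch, invalidation is resolved entirely inside the cache controller's tag lookup, which is why no bus-level coherence traffic is needed.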

  • ABSTRACT: Way Stealing is a simple architectural modification to a cache-based processor that increases the data bandwidth to and from application-specific instruction set extensions (ISEs), which increases performance and reduces energy consumption. Way Stealing offers higher bandwidth than interfacing the ISEs to the processor's register file, and eliminates the need to allocate separate memories, called architecturally visible storage (AVS), that are dedicated to the ISEs, as well as the need to ensure coherence between the AVS memories and the processor's data cache. Our results show that Way Stealing is competitive in terms of performance and energy consumption with other techniques that use AVS memories in conjunction with a data cache.
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems 01/2014; 22(1):62-75. DOI:10.1109/TVLSI.2012.2236689 · 1.14 Impact Factor
  • ABSTRACT: Historically, hardware acceleration technologies have either been application-specific, therefore lacking in flexibility, or fully programmable, thereby suffering from notable inefficiencies on an application-by-application basis. To address the growing need for domain-specific acceleration technologies, this paper describes a design methodology (i) to automatically generate a domain-specific coarse-grained array from a set of representative applications and (ii) to introduce limited forms of architectural generality to increase the likelihood that additional applications can be successfully mapped onto it. In particular, coarse-grained arrays generated using our approach are intended to be integrated into customizable processors that use application-specific instruction set extensions to accelerate performance and reduce energy; rather than implementing these extensions using application-specific integrated circuit (ASIC) logic, which lacks flexibility, they can be synthesized onto our reconfigurable array instead, allowing the processor to be used for a variety of applications in related domains. Results show that our array is around 2× slower and 15× larger than an ultimately efficient ASIC implementation, and thus far more efficient than field-programmable gate arrays (FPGAs), which are known to be 3-4× slower and 20-40× larger. Additionally, we estimate that our array is usually around 2× larger and 2× slower than an accelerator synthesized using traditional datapath merging, which has very limited flexibility, if any, beyond the design set of DFGs.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 05/2013; 32(5):681-694. DOI:10.1109/TCAD.2012.2235127 · 1.20 Impact Factor
  • ABSTRACT: Customized instructions (CIs) implemented using custom functional units (CFUs) have been proposed as a way of improving the performance and energy efficiency of software while minimizing the cost of designing and verifying accelerators from scratch. However, previous work allows CIs to communicate with the processor only through registers or with limited memory operations. In this work we propose an architecture that allows CIs to seamlessly execute memory operations without any special synchronization operations to guarantee the program order of instructions. Our results show that our architecture can provide 24% energy savings with 14% performance improvement for 2-issue and 4-issue superscalar processor cores.
    Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays; 02/2013

Full-text (2 sources) available from May 17, 2014.