The Impact of Speculative Execution on SMT Processors

International Journal of Parallel Programming (Impact Factor: 0.49). 08/2008; 36(4):361-385. DOI: 10.1007/s10766-007-0052-3
Source: DBLP


By executing two or more threads concurrently, Simultaneous MultiThreading (SMT) architectures are able to exploit both Instruction-Level
Parallelism (ILP) and Thread-Level Parallelism (TLP) from the increased number of in-flight instructions that are fetched
from multiple threads. However, due to incorrect control speculations, a significant number of these in-flight instructions
are discarded from the pipelines of SMT processors, a direct consequence of these pipelines becoming wider and deeper.
Although increasing the accuracy of branch predictors may reduce the number of instructions discarded from the pipelines,
prediction accuracy cannot easily be scaled up, since aggressive branch prediction schemes depend strongly on the
predictability inherent in the application programs. In this paper, we present an efficient thread scheduling mechanism
for SMT processors called SAFE-T (Speculation-Aware Front-End Throttling). It is easy to implement and allows an SMT processor
to perform speculative execution of threads selectively, according to the confidence level of branch predictions, thereby preventing
wrong-path instructions from being fetched. SAFE-T provides an average reduction of 57.9% in the number of discarded instructions
and improves the instructions per cycle (IPC) performance by 14.7% on average over the ICOUNT policy across the multi-programmed
workloads we simulate.
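
The abstract gives no implementation details for SAFE-T, so the following is only a minimal sketch of the general idea of confidence-gated fetch layered on an ICOUNT-style policy; it is not the authors' mechanism. ICOUNT is the baseline named above and favors the thread with the fewest in-flight front-end instructions. The names `ThreadState`, `low_conf_branches`, `max_low_conf`, and `throttled_pick`, and the choice to fetch nothing when every thread is gated, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThreadState:
    tid: int
    in_flight: int          # instructions currently in the front-end stages
    low_conf_branches: int  # unresolved branches predicted with low confidence (assumed counter)


def icount_pick(threads: List[ThreadState]) -> int:
    """ICOUNT-style choice: the thread with the fewest in-flight instructions."""
    return min(threads, key=lambda t: (t.in_flight, t.tid)).tid


def throttled_pick(threads: List[ThreadState], max_low_conf: int = 0) -> Optional[int]:
    """Hypothetical confidence gate: skip threads with too many unresolved
    low-confidence branches, then apply ICOUNT to the remaining threads.
    Returns None (no fetch this cycle) if every thread is gated; this
    fallback is an assumption, not something the abstract specifies."""
    eligible = [t for t in threads if t.low_conf_branches <= max_low_conf]
    if not eligible:
        return None
    return icount_pick(eligible)


threads = [ThreadState(tid=0, in_flight=5, low_conf_branches=2),
           ThreadState(tid=1, in_flight=9, low_conf_branches=0)]
baseline_choice = icount_pick(threads)   # thread 0 has fewer in-flight instructions
gated_choice = throttled_pick(threads)   # thread 0 is gated, so thread 1 is chosen
```

In this toy case the baseline would keep fetching from thread 0 even though it is likely on a wrong path, while the gated variant redirects fetch bandwidth to thread 1.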

  • ABSTRACT: One major obstacle faced by designers when entering the multicore era is how to harness the massive computing power which these cores provide. Since Instruction-Level Parallelism (ILP) is inherently limited, a single thread is not capable of efficiently utilizing the resources of a single core. Hence, Simultaneous MultiThreading (SMT) microarchitecture can be introduced in an effort to achieve improved system resource utilization and a correspondingly higher instruction throughput through the exploitation of Thread-Level Parallelism (TLP) as well as ILP. However, when multiple threads execute concurrently in a single core, they automatically compete for system resources. Our research shows that, without control over the number of entries each thread can occupy in system resources such as the instruction fetch queue and/or reorder buffer, a scenario called “mutual-hindrance” execution takes place. Conversely, introducing active resource sharing control mechanisms causes the opposite situation (“mutual-benefit” execution), with a potentially significant performance improvement and lower cache miss frequency. This demonstrates that active resource sharing control is essential for future multicore multithreading microprocessor design.
    Algorithms and Architectures for Parallel Processing, 9th International Conference, ICA3PP 2009, Taipei, Taiwan, June 8-11, 2009. Proceedings; 01/2009
  • ABSTRACT: Due to the conventional sequential programming model, the Instruction-Level Parallelism (ILP) that modern superscalar processors can exploit is inherently limited. Hence, multithreading architectures have been proposed to exploit Thread-Level Parallelism (TLP) in addition to conventional ILP. By issuing and executing instructions from multiple threads at each clock cycle, Simultaneous MultiThreading (SMT) achieves some of the best possible system resource utilization and accordingly higher instruction throughput. In this chapter, the authors describe the origin of SMT microarchitecture, comparing it with other multithreading microarchitectures. They identify several key aspects for high-performance SMT design: fetch policy, handling long-latency instructions, resource sharing control, synchronization and communication. They also describe some potential benefits of SMT microarchitecture: SMT for fault tolerance and SMT for secure communications. Given the need to support sequential legacy code and the emergence of new parallel programming models, we believe SMT microarchitecture will play a vital role as we enter the multi-thread multi/many-core processor design era.
    Handbook of Research on Scalable Computing Technologies, 07/2009: chapter 24: pages 552-582; IGI Global., ISBN: 9781605666617
  • ABSTRACT: In the last few years, the landscape of parallel computing has been subject to profound and highly dynamic changes. The paradigm shift towards multicore and manycore technologies, coupled with accelerators in a heterogeneous environment, offers great potential in computing power for scientific and industrial applications. However, to take full advantage of these new technologies, holistic approaches that couple expertise ranging from hardware architecture and software design to numerical algorithms are a pressing necessity. Parallel computing is no longer limited to supercomputers and is now much more diversified, with a multitude of technologies, architectures, and programming approaches leading to increased complexity for developers and engineers. In this work, we give, from the perspective of numerical simulation and applications, an overview of existing and emerging multicore and manycore technologies as well as accelerator concepts. We emphasize the challenges associated with high-performance heterogeneous computing and discuss the interfaces needed to fill the gap between the hardware architecture and the implementation of efficient numerical algorithms. By means of this short survey, which stresses the necessity of hardware-aware computing, we aim to assist users in scientific computing who are entering this fascinating field and to help them understand the associated issues and capabilities.
    Concurrency and Computation Practice and Experience 05/2012; 24(7). DOI:10.1002/cpe.1904 · 1.00 Impact Factor