The Impact of Speculative Execution on SMT Processors
By executing two or more threads concurrently, Simultaneous MultiThreading (SMT) architectures are able to exploit both Instruction-Level
Parallelism (ILP) and Thread-Level Parallelism (TLP) from the increased number of in-flight instructions that are fetched
from multiple threads. However, due to incorrect control speculation, a significant number of these in-flight instructions
are discarded from the pipelines of SMT processors, a problem that grows worse as these pipelines become wider and deeper.
Although increasing the accuracy of branch predictors may reduce the number of instructions so discarded from the pipelines,
the prediction accuracy cannot easily be scaled up, since aggressive branch prediction schemes depend strongly on the
predictability inherent to the application programs. In this paper, we present an efficient thread scheduling mechanism
for SMT processors, called SAFE-T (Speculation-Aware Front-End Throttling): it is easy to implement and allows an SMT processor
to selectively perform speculative execution of threads according to the confidence level of branch predictions, hence preventing
wrong-path instructions from being fetched. SAFE-T provides an average reduction of 57.9% in the number of discarded instructions
and improves the instructions per cycle (IPC) performance by 14.7% on average over the ICOUNT policy across the multi-programmed
workloads we simulate.
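The gating idea in the abstract can be illustrated with a small simulation-style sketch. This is not the paper's actual SAFE-T hardware design; the class names, the single confidence threshold, and the all-or-nothing eligibility rule are illustrative assumptions standing in for a hardware confidence estimator.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Branch:
    confidence: float  # assumed confidence-estimator output in [0, 1]

@dataclass
class Thread:
    tid: int
    unresolved_branches: List[Branch] = field(default_factory=list)

def select_fetch_threads(threads, threshold=0.9):
    """Confidence-aware throttling sketch: a thread may fetch this cycle
    only if every one of its unresolved branch predictions is above the
    threshold; low-confidence threads are held back until their branches
    resolve, keeping likely wrong-path instructions out of the front end."""
    eligible = []
    for t in threads:
        # a thread with no pending branches is trivially safe to fetch from
        worst = min((b.confidence for b in t.unresolved_branches), default=1.0)
        if worst >= threshold:
            eligible.append(t)
    return eligible
```

With this policy, a thread stalled behind a low-confidence branch yields its fetch bandwidth to threads on more certain paths, which is the source of the reduction in discarded instructions the abstract reports.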
Available from: Chen Liu
ABSTRACT: One major obstacle faced by designers when entering the multicore era is how to harness the massive computing power which
these cores provide. Since Instruction-Level Parallelism (ILP) is inherently limited, a single thread is not capable of
efficiently utilizing the resources of a single core. Hence, Simultaneous MultiThreading (SMT) microarchitecture can be introduced
in an effort to achieve improved system resource utilization and a correspondingly higher instruction throughput through the
exploitation of Thread-Level Parallelism (TLP) as well as ILP. However, when multiple threads execute concurrently in a single
core, they automatically compete for system resources. Our research shows that, without control over the number of entries
each thread can occupy in system resources such as the instruction fetch queue and/or the reorder buffer, a scenario called “mutual-hindrance”
execution takes place. Conversely, introducing active resource sharing control mechanisms causes the opposite situation (“mutual-benefit”
execution), with a possible significant performance improvement and lower cache miss frequency. This demonstrates that active
resource sharing control is essential for future multicore multithreading microprocessor design.
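The resource-sharing control the abstract argues for can be sketched as a per-thread occupancy cap on a shared structure such as the reorder buffer or instruction fetch queue. The static even split below is one simple policy chosen for illustration; the paper's actual control mechanism is not specified here, and the class and method names are hypothetical.

```python
class SharedQueue:
    """Sketch of a shared pipeline structure (e.g. ROB or IFQ) with
    per-thread occupancy limits to prevent mutual-hindrance execution."""

    def __init__(self, capacity, num_threads):
        self.capacity = capacity
        # static even partition: one simple sharing-control policy
        self.cap_per_thread = capacity // num_threads
        self.occupancy = {t: 0 for t in range(num_threads)}

    def try_allocate(self, tid):
        """Admit an entry for thread `tid` only if neither the thread's
        own cap nor the total capacity would be exceeded."""
        if self.occupancy[tid] >= self.cap_per_thread:
            return False  # thread at its cap: it cannot starve the others
        if sum(self.occupancy.values()) >= self.capacity:
            return False  # structure full
        self.occupancy[tid] += 1
        return True

    def release(self, tid):
        """Free one entry when the instruction retires or is squashed."""
        self.occupancy[tid] -= 1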
Algorithms and Architectures for Parallel Processing, 9th International Conference, ICA3PP 2009, Taipei, Taiwan, June 8-11, 2009. Proceedings; 01/2009
ABSTRACT: Due to the conventional sequential programming model, the Instruction-Level Parallelism (ILP) that modern superscalar processors can explore is inherently limited. Hence, multithreading architectures
have been proposed to exploit Thread-Level Parallelism (TLP) in addition to conventional ILP. By issuing
and executing instructions from multiple threads at each clock cycle, Simultaneous MultiThreading
(SMT) achieves high system resource utilization and accordingly higher instruction
throughput. In this chapter, the authors describe the origin of SMT microarchitecture, comparing it with
other multithreading microarchitectures. They identify several key aspects for high-performance SMT
design: fetch policy, handling long-latency instructions, resource sharing control, synchronization and
communication. They also describe some potential benefits of SMT microarchitecture: SMT for fault tolerance and SMT for secure communications. Given the need to support sequential legacy code and
the emergence of new parallel programming models, they believe SMT microarchitecture will play a vital role as
we enter the multi-thread, multi/many-core processor design era.
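Among the fetch policies this chapter identifies as key to SMT design, ICOUNT (also referenced by the SAFE-T abstract above) is the classic baseline. The one-line sketch below is an illustrative rendering of its priority rule, not code from the chapter; the function name and dictionary representation are assumptions.

```python
def icount_order(in_flight):
    """ICOUNT-style fetch priority: given a mapping from thread id to the
    number of that thread's instructions in the pre-issue pipeline stages,
    threads with the fewest in-flight instructions fetch first, so
    fast-moving threads are favored and no single thread can clog the
    front end."""
    return sorted(in_flight, key=in_flight.get)
```

For example, with thread instruction counts `{0: 12, 1: 3, 2: 7}`, thread 1 gets top fetch priority because it has the fewest instructions occupying the front-end stages.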
Handbook of Research on Scalable Computing Technologies, 07/2009: chapter 24: pages 552-582; IGI Global., ISBN: 9781605666617
Available from: Jan-Philipp Weiss
ABSTRACT: In the last few years, the landscape of parallel computing has been subject to profound and highly dynamic changes. The paradigm shift towards multicore and manycore technologies coupled with accelerators in a heterogeneous environment is offering a great potential of computing power for scientific and industrial applications. However, for one to take full advantage of these new technologies, holistic approaches coupling the expertise ranging from hardware architecture and software design to numerical algorithms are a pressing necessity. Parallel computing is no longer limited to supercomputers and is now much more diversified – with a multitude of technologies, architectures, and programming approaches leading to increased complexity for developers and engineers.
In this work, we give – from the perspective of numerical simulation and applications – an overview of existing and emerging multicore and manycore technologies as well as accelerator concepts. We emphasize the challenges associated with high-performance heterogeneous computing and discuss the interfaces needed to fill the gap between the hardware architecture and the implementation of efficient numerical algorithms. By means of this short survey – which stresses the necessity of hardware-aware computing – we aim to assist users in scientific computing who are entering this fascinating field and to help them understand the associated issues and capabilities.
Concurrency and Computation: Practice and Experience, 05/2012; 24(7). DOI:10.1002/cpe.1904