The Impact of Speculative Execution on SMT Processors.
ABSTRACT: By executing two or more threads concurrently, Simultaneous MultiThreading (SMT) architectures are able to exploit both Instruction-Level
Parallelism (ILP) and Thread-Level Parallelism (TLP) from the increased number of in-flight instructions that are fetched
from multiple threads. However, because of incorrect control speculation, a significant number of these in-flight instructions
are discarded from the pipelines of SMT processors, a direct consequence of these pipelines becoming wider and deeper.
Although increasing the accuracy of branch predictors may reduce the number of instructions discarded from the pipelines,
prediction accuracy cannot be scaled up easily, since aggressive branch prediction schemes depend strongly on the
predictability inherent in the application programs. In this paper, we present an efficient thread scheduling mechanism
for SMT processors called SAFE-T (Speculation-Aware Front-End Throttling). It is easy to implement and allows an SMT processor
to selectively perform speculative execution of threads according to the confidence level of branch predictions, hence preventing
wrong-path instructions from being fetched. SAFE-T provides an average reduction of 57.9% in the number of discarded instructions
and improves the instructions per cycle (IPC) performance by 14.7% on average over the ICOUNT policy across the multi-programmed
workloads we simulate.
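
The idea of gating fetch on branch-prediction confidence can be illustrated with a minimal sketch. It is not the SAFE-T implementation, whose details the abstract does not give. The names `ThreadState`, `icount_pick`, `throttled_pick`, and `max_low_conf` are hypothetical, and the rule used here (skip any thread with too many unresolved low-confidence branches, then fall back to ICOUNT among the rest) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThreadState:
    tid: int                 # hardware thread id
    in_flight: int           # instructions in the front end (the ICOUNT metric)
    low_conf_branches: int   # unresolved branches predicted with low confidence


def icount_pick(threads: List[ThreadState]) -> int:
    """Baseline ICOUNT: fetch from the thread with the fewest in-flight instructions."""
    return min(threads, key=lambda t: (t.in_flight, t.tid)).tid


def throttled_pick(threads: List[ThreadState], max_low_conf: int = 0) -> Optional[int]:
    """Assumed throttling rule: exclude threads that are likely on a wrong path,
    then apply ICOUNT to the remaining threads. Returns None if all are gated."""
    eligible = [t for t in threads if t.low_conf_branches <= max_low_conf]
    if not eligible:
        return None  # no thread fetches this cycle (an assumption of this sketch)
    return icount_pick(eligible)
```

With two threads where thread 0 has fewer in-flight instructions but two unresolved low-confidence branches, `icount_pick` chooses thread 0 while `throttled_pick` chooses thread 1, so fetch bandwidth goes to the thread less likely to be squashed.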
Source available from: Jan-Philipp Weiss
ABSTRACT: In the last few years, the landscape of parallel computing has been subject to profound and highly dynamic changes. The paradigm shift towards multicore and manycore technologies, coupled with accelerators in a heterogeneous environment, offers great potential computing power for scientific and industrial applications. However, to take full advantage of these new technologies, holistic approaches that combine expertise ranging from hardware architecture and software design to numerical algorithms are a pressing necessity. Parallel computing is no longer limited to supercomputers and is now much more diversified – with a multitude of technologies, architectures, and programming approaches leading to increased complexity for developers and engineers. In this work, we give – from the perspective of numerical simulation and applications – an overview of existing and emerging multicore and manycore technologies as well as accelerator concepts. We emphasize the challenges associated with high-performance heterogeneous computing and discuss the interfaces needed to fill the gap between the hardware architecture and the implementation of efficient numerical algorithms. By means of this short survey – which stresses the necessity of hardware-aware computing – we aim to assist users in scientific computing who are entering this fascinating field and to help them understand the associated issues and capabilities. Copyright © 2011 John Wiley & Sons, Ltd.
Concurrency and Computation Practice and Experience 05/2012; 24(7). DOI:10.1002/cpe.1904 · 0.78 Impact Factor
ABSTRACT: One major obstacle faced by designers when entering the multicore era is how to harness the massive computing power which these cores provide. Since Instruction-Level Parallelism (ILP) is inherently limited, a single thread is not capable of efficiently utilizing the resources of a single core. Hence, Simultaneous MultiThreading (SMT) microarchitecture can be introduced in an effort to achieve improved system resource utilization and a correspondingly higher instruction throughput through the exploitation of Thread-Level Parallelism (TLP) as well as ILP. However, when multiple threads execute concurrently in a single core, they automatically compete for system resources. Our research shows that, without control over the number of entries each thread can occupy in system resources such as the instruction fetch queue and/or reorder buffer, a scenario called “mutual-hindrance” execution takes place. Conversely, introducing active resource sharing control mechanisms causes the opposite situation (“mutual-benefit” execution), with a possibly significant performance improvement and lower cache miss frequency. This demonstrates that active resource sharing control is essential for future multicore multithreading microprocessor design.
Algorithms and Architectures for Parallel Processing, 9th International Conference, ICA3PP 2009, Taipei, Taiwan, June 8-11, 2009. Proceedings; 01/2009
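
The contrast between uncontrolled and controlled sharing of a structure such as the fetch queue or reorder buffer can be sketched as follows. This is an illustrative model only, not the mechanism evaluated in the cited work. The class `SharedQueue` and its `per_thread_cap` parameter are hypothetical names, and a fixed per-thread cap is just one assumed form of "active resource sharing control".

```python
from collections import Counter


class SharedQueue:
    """Toy model of a hardware queue shared by several SMT threads."""

    def __init__(self, capacity, per_thread_cap=None):
        self.capacity = capacity
        # With no cap given, any one thread may fill the whole queue.
        self.per_thread_cap = capacity if per_thread_cap is None else per_thread_cap
        self.occupancy = Counter()  # entries currently held, keyed by thread id

    def try_allocate(self, tid):
        """Claim one entry for thread `tid`; return False if it must stall."""
        if sum(self.occupancy.values()) >= self.capacity:
            return False  # queue is full
        if self.occupancy[tid] >= self.per_thread_cap:
            return False  # this thread has reached its assumed share
        self.occupancy[tid] += 1
        return True

    def release(self, tid):
        """Free one entry held by thread `tid`, for example at commit."""
        if self.occupancy[tid] > 0:
            self.occupancy[tid] -= 1
```

In this model, `SharedQueue(8)` lets thread 0 take all eight entries and stall thread 1, while `SharedQueue(8, per_thread_cap=4)` stops thread 0 at four entries so thread 1 can still make progress.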