Conference Paper

Speculative synchronization and thread management for fine granularity threads

Technion - Israel Institute of Technology, Haifa, Israel
DOI: 10.1109/HPCA.2006.1598136 · Conference: The Twelfth International Symposium on High-Performance Computer Architecture (HPCA), 2006
Source: DBLP


Performance of multithreaded programs is heavily influenced by the latencies of thread management and synchronization operations. Improving these latencies becomes especially important when parallelization is performed at fine granularity. In this work we examine the interaction of speculative execution with thread-related operations. We develop a unified framework that allows all such operations to be executed speculatively and provides efficient recovery mechanisms for handling misspeculation of branches that affect instructions in several threads. The framework was evaluated in the context of Inthreads, a programming model designed for very fine-grain parallelization. Our measurements show that the speedup obtained by speculative execution of thread-related instructions can reach 25%.
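To see why thread management and synchronization latency matters most at fine granularity, consider the following toy sketch (purely illustrative, unrelated to the Inthreads ISA, which addresses these costs at the microarchitectural level): when the work per thread is small, the per-thread spawn and join overhead can rival or exceed the useful computation, which is exactly the overhead the paper targets with speculation.

```python
import threading

# Hypothetical illustration: split a small reduction across many short-lived
# threads. At fine granularity, the thread-management (start) and
# synchronization (join) costs are paid once per tiny chunk of work.

def run_chunks(n, chunks):
    """Compute sum(range(n)) split across `chunks` threads (n % chunks == 0)."""
    results = [0] * chunks

    def worker(i, lo, hi):
        results[i] = sum(range(lo, hi))  # the actual useful work is tiny

    step = n // chunks
    threads = [threading.Thread(target=worker, args=(i, i * step, (i + 1) * step))
               for i in range(chunks)]
    for t in threads:   # thread-management latency, paid per chunk
        t.start()
    for t in threads:   # synchronization latency, paid per chunk
        t.join()
    return sum(results)
```

The finer the chunks, the larger the fraction of total time spent in `start`/`join` rather than in `worker`, motivating hardware support that hides these latencies speculatively.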



Available from: Avi Mendelson, Aug 17, 2015
  • Conference Paper · Oct 2007
    ABSTRACT: Code generation for a multithreaded register-sharing architecture is inherently complex and involves issues absent from conventional code compilation. To approach the problem, we define a consistency contract between the program and the hardware and require the compiler to preserve the contract during code transformations. To apply the contract to compiler implementation, we develop a correctness framework that ensures preservation of the contract and use it to adjust code optimizations for correctness under parallel code. One area naturally affected by register sharing is register allocation. We discuss the adaptation of existing coloring-based algorithms for shared code and show how they benefit from the consistency contract. Another benefit affects general compiler optimizations: we show that these optimizations require very few restrictions to be correct for parallel code, allowing the compiler to realize its potential to a high degree.
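The coloring-based register allocation mentioned above can be illustrated with a generic greedy sketch (this is the textbook approach, not the paper's contract-aware adaptation; all names and numbers are hypothetical). Virtual registers are graph nodes, and edges mark interference; under register sharing, values live in concurrently running threads would add further interference edges.

```python
# Generic greedy graph-coloring register allocation sketch (hypothetical
# example, not the paper's algorithm). `interference` maps each virtual
# register to the set of registers it cannot share a physical register with.

def allocate(interference, k):
    """Assign one of k physical registers to each node; None means spill."""
    # Color highest-degree nodes first (a common heuristic).
    order = sorted(interference, key=lambda n: len(interference[n]), reverse=True)
    color = {}
    for node in order:
        taken = {color[m] for m in interference[node] if m in color}
        free = [c for c in range(k) if c not in taken]
        color[node] = free[0] if free else None  # no free register => spill
    return color

# Example: v1, v2, v3 mutually interfere (a triangle), so with k = 2 one of
# them must spill; v4 only interferes with v1.
g = {
    "v1": {"v2", "v3", "v4"},
    "v2": {"v1", "v3"},
    "v3": {"v1", "v2"},
    "v4": {"v1"},
}
regs = allocate(g, 2)
```

A contract-aware allocator would additionally forbid assignments that violate the program/hardware consistency contract, which this sketch does not model.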
  • Article · Jan 2010 · IEEE Transactions on Parallel and Distributed Systems
    ABSTRACT: Thread-level parallelism (TLP) has been studied extensively as a way to overcome the limits of exploiting instruction-level parallelism (ILP) on high-performance superscalar processors. One promising method of exploiting TLP is dynamic speculative multithreading (D-SpMT), which extracts multiple threads from a sequential program without compiler support or instruction-set extensions. This paper introduces Cascadia, a D-SpMT multicore architecture that provides multigrain thread-level support and is used to evaluate the performance of several benchmarks. Cascadia applies a unique sustainable IPC (sIPC) metric on a comprehensive loop tree to select the best-performing nested loop level to multithread. The paper also discusses the relationships loops have with one another, in particular how loop nesting levels can be extended through procedures. In addition, it provides a detailed study of the effects that thread granularity and interthread dependencies have on the entire system.
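The loop-level selection described above can be sketched abstractly as a search over a loop tree for the node with the highest score (the class, the scores, and the tree below are hypothetical; Cascadia's actual sIPC computation is not reproduced here).

```python
# Hypothetical sketch: pick the loop nesting level to multithread by walking
# a loop tree and keeping the node with the best estimated score. The sIPC
# values are made-up placeholders, not Cascadia's measured metric.

class Loop:
    def __init__(self, name, sipc, children=()):
        self.name = name
        self.sipc = sipc            # estimated payoff of threading this level
        self.children = list(children)

def best_level(root):
    """Return the loop node with the highest score anywhere in the tree."""
    best = root
    stack = [root]
    while stack:
        node = stack.pop()
        if node.sipc > best.sipc:
            best = node
        stack.extend(node.children)
    return best

# A small nest: an outer loop, an inner loop, and a loop reached through a
# procedure call (nesting levels extended across procedures, as in the paper).
tree = Loop("outer", 1.4, [
    Loop("inner_a", 2.1),
    Loop("callee_loop", 1.8),
])
```

Here `best_level(tree)` selects `inner_a`, the level with the highest estimated score; choosing one level per nest avoids oversubscribing the cores with overlapping speculative threads.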