Conference Proceeding
Uncovering hidden loop level parallelism in sequential applications
Adv. Comput. Archit. Lab., Univ. of Michigan, Ann Arbor, MI
03/2008
DOI:10.1109/HPCA.2008.4658647
pp. 290-301. In proceedings of: High Performance Computer Architecture, 2008. HPCA 2008. IEEE 14th International Symposium on
Source: IEEE Xplore
Citations (0)
Cited In (5)
Conference Proceeding: Dynamic parallelization of JavaScript applications using an ultra-lightweight speculation mechanism
ABSTRACT: As the web becomes the platform of choice for execution of more complex applications, a growing portion of computation is handed off by developers to the client side to reduce network traffic and improve application responsiveness. Therefore, the client-side component, often written in JavaScript, is becoming larger and more compute-intensive, increasing the demand for high-performance JavaScript execution. This has led to many recent efforts to improve the performance of JavaScript engines in web browsers. Furthermore, considering the widespread deployment of multi-cores in today's computing systems, exploiting parallelism in these applications is a promising approach to meet their performance requirements. However, JavaScript has traditionally been treated as a sequential language with no support for multithreading, limiting its potential to make use of the extra computing power in multicore systems. In this work, to exploit hardware concurrency while retaining the traditional sequential programming model, we develop ParaScript, an automatic runtime parallelization system for JavaScript applications in the client's browser. First, we propose an optimistic runtime scheme for identifying parallelizable regions, generating the parallel code on-the-fly, and speculatively executing it. Second, we introduce an ultra-lightweight software speculation mechanism to manage parallel execution. This speculation engine consists of a selective checkpointing scheme and a novel runtime dependence detection mechanism based on reference counting and range-based array conflict detection. Our system is able to achieve an average of 2.18× speedup over the Firefox browser using 8 threads on commodity multi-core systems, while performing all required analyses and conflict detection dynamically at runtime.
High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on; 03/2011
Article: Compiler Assisted Out-Of-Order Instruction Commit
ABSTRACT: This paper proposes an out-of-order instruction commit mechanism using a novel compiler/architecture interface. The compiler provides information about instruction "blocks" and the processor uses the block information to decide which instructions can be committed out of order and when. Some blocks are guaranteed to be data-independent, which allows instructions from different such blocks to be committed simultaneously and out of order. Other blocks have data or control dependencies and require in-order execution and in-order commit. The micro-architectural support required for the new commit mode is built on top of the standard, ROB-based commit and includes out-of-order instruction commit, early register release, support for committing loads and stores out of order, and exception handling. All of these are driven by the block information, which simplifies the hardware. Results for a 4-wide processor model based on the Alpha 21264 and a set of 6 SPEC2000 and 2006 benchmarks show that, on average, 52% of instructions are committed out of order, resulting in 10% to 26% speedups over in-order commit with minimal hardware overhead.
12/2010
Conference Proceeding: Runtime parallelization of legacy code on a transactional memory system.
High Performance Embedded Architectures and Compilers, 6th International Conference, HiPEAC 2011, Heraklion, Crete, Greece, January 24-26, 2011. Proceedings; 01/2011
Keywords
additional cores
automatic parallelization
control dependences
difficult task
explicit thread-level parallelism
exploiting loop-level parallelism
four core system
general-purpose applications
general-purpose software
hidden parallelism
legacy single-threaded software
loop-level parallelism
memory dependence analysis
modest performance gains
scientific applications
scientific parallelization communities
single-threaded applications
traditional thread-level speculation techniques
unlikely dependences
vast amount