Conference Paper

Understanding performance, power and energy behavior in asymmetric multiprocessors.

School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
DOI: 10.1109/ICCD.2008.4751903
Conference: 26th International Conference on Computer Design (ICCD 2008), 12-15 October 2008, Lake Tahoe, CA, USA
Source: IEEE Xplore

ABSTRACT: Multiprocessor architectures are becoming popular in both desktop and mobile processors. Among multiprocessor architectures, asymmetric architectures show promise for saving energy and power. However, the performance and energy-consumption behavior of asymmetric multiprocessors running desktop-oriented multithreaded applications has not been widely studied. In this study, we measure performance and power consumption on real 8- and 16-processor asymmetric and symmetric systems to understand the relationships between thread interactions and performance/power behavior. We find that when the workload is asymmetric, using an asymmetric multiprocessor can save energy, but for most symmetric workloads, using a symmetric multiprocessor (at the highest clock frequency) consumes less energy.
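
The trade-off described in the abstract follows from the relation between energy, average power, and runtime, E = P × t: an asymmetric machine draws less average power, while a symmetric machine at the highest clock finishes the same symmetric workload sooner. A minimal sketch of that arithmetic, using hypothetical power and runtime figures rather than the paper's measurements:

```python
# Illustrative energy arithmetic (E = P * t) behind the abstract's conclusion.
# All power and runtime figures below are hypothetical, NOT measurements from the paper.

def energy_j(avg_power_w: float, runtime_s: float) -> float:
    """Energy in joules for a run at a given average power and runtime."""
    return avg_power_w * runtime_s

# Asymmetric workload (one dominant thread, several light threads):
# the slow cores of an asymmetric multiprocessor (AMP) add little runtime
# but cut average power.
amp_asym = energy_j(avg_power_w=90.0, runtime_s=105.0)   # hypothetical AMP run
smp_asym = energy_j(avg_power_w=130.0, runtime_s=100.0)  # hypothetical SMP run

# Symmetric workload (all threads equally loaded):
# the symmetric multiprocessor (SMP) at the highest clock finishes much sooner,
# so it can use less energy despite its higher power draw.
amp_sym = energy_j(avg_power_w=90.0, runtime_s=180.0)
smp_sym = energy_j(avg_power_w=130.0, runtime_s=100.0)

print(f"asymmetric workload:  AMP {amp_asym:.0f} J vs SMP {smp_asym:.0f} J")
print(f"symmetric workload:   AMP {amp_sym:.0f} J vs SMP {smp_sym:.0f} J")
```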

Related publications (illustrative sketches of the techniques they describe follow the list):

  • ABSTRACT: With the shift to chip multiprocessors, managing shared resources has become a critical issue in realizing their full potential. Previous research has shown that thread mapping is a powerful tool for resource management. However, the difficulty of simultaneously managing multiple hardware resources and the varying nature of the workloads have impeded the efficiency of thread mapping algorithms. To overcome the difficulties of simultaneously managing multiple resources with thread mapping, the interaction between various microarchitectural resources and thread characteristics must be well understood. This paper presents an in-depth analysis of the PARSEC benchmarks running under different thread mappings to investigate the interaction of various thread mappings with microarchitectural resources including the L1 I/D-caches, I/D TLBs, L2 caches, hardware prefetchers, off-chip memory interconnects, branch predictors, memory disambiguation units, and the cores. For each resource, the analysis provides guidelines for improving its utilization when mapping threads with different characteristics. We also analyze how the relative importance of the resources varies depending on the workload. Our experiments show that when only memory resources are considered, thread mapping improves an application's performance by as much as 14% over the default Linux scheduler. In contrast, when both memory and processor resources are considered, the mapping algorithm achieves performance improvements of as much as 28%. Additionally, we demonstrate that thread mapping should consider L2 caches, prefetchers, and off-chip memory interconnects as one resource, and we present a new metric, the L2-misses-memory-latency product (L2MP), for evaluating their aggregated performance impact.
    01/2012
  • ABSTRACT: For higher processing and computing power, chip multiprocessors (CMPs) have become the new mainstream architecture. This shift to CMPs has created many challenges for fully utilizing the power of multiple execution cores. One of these challenges is managing contention for shared resources. Most recent research addresses contention for shared resources by single-threaded applications. However, as CMPs scale up to many cores, application design has shifted towards multi-threaded programming and new parallel models to fully utilize the underlying hardware. There are differences between how single- and multi-threaded applications contend for shared resources. Therefore, to develop approaches that reduce shared-resource contention for emerging multi-threaded applications, it is crucial to understand how their performance is affected by contention for a particular shared resource. In this research, we propose and evaluate a general methodology for characterizing multi-threaded applications by determining the effect of shared-resource contention on performance. To demonstrate the methodology, we characterize the applications in the widely used PARSEC benchmark suite for shared-memory resource contention. The characterization reveals several interesting aspects of the benchmark suite. Three of the twelve PARSEC benchmarks exhibit no contention for cache resources. Nine of the benchmarks exhibit contention for the L2 cache. Of these nine, only three exhibit contention between their own threads; most contention is due to competition with a co-runner. Interestingly, contention for the Front Side Bus is a major factor for all but two of the benchmarks and degrades performance by more than 11%.
    2011 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS); 05/2011
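
The first abstract above compares explicit thread mappings against the default Linux scheduler and proposes the L2-misses-memory-latency product (L2MP) as an aggregate metric for the L2 caches, prefetchers, and off-chip memory interconnect. A minimal sketch of both ideas, assuming a Linux system; the counter values below are hypothetical placeholders, not data from the paper:

```python
import os

def pin_to_cores(pid: int, cores: set[int]) -> None:
    """Pin a process (or a thread, via its TID) to an explicit set of cores.

    This is what an explicit thread-mapping policy does instead of leaving
    placement to the default Linux scheduler (Linux-only API).
    """
    os.sched_setaffinity(pid, cores)

def l2mp(l2_misses: int, avg_mem_latency_cycles: float) -> float:
    """L2-misses-memory-latency product: treats the L2 caches, prefetchers and
    the off-chip memory interconnect as one aggregated resource, as the
    abstract suggests."""
    return l2_misses * avg_mem_latency_cycles

# Hypothetical counter readings for two candidate mappings of the same application.
mapping_a = l2mp(l2_misses=4_200_000, avg_mem_latency_cycles=210.0)
mapping_b = l2mp(l2_misses=3_100_000, avg_mem_latency_cycles=260.0)
print("prefer mapping", "A" if mapping_a < mapping_b else "B")
```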
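
The second abstract characterizes each benchmark by how much its performance degrades when a co-runner contends for a particular shared resource. A minimal sketch of that slowdown measurement, assuming placeholder benchmark and co-runner commands:

```python
import subprocess
import time

def runtime_s(cmd: list[str]) -> float:
    """Wall-clock runtime of a command, in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

def contention_slowdown(benchmark: list[str], co_runner: list[str]) -> float:
    """Slowdown of `benchmark` when `co_runner` stresses a shared resource.

    A value near 1.0 means the benchmark is insensitive to that resource;
    the abstract reports Front Side Bus contention degrading most PARSEC
    benchmarks by more than 11% (a slowdown above 1.11).
    """
    alone = runtime_s(benchmark)
    rival = subprocess.Popen(co_runner)  # start the resource-stressing co-runner
    try:
        together = runtime_s(benchmark)
    finally:
        rival.terminate()
    return together / alone

# Hypothetical invocation: both command names are placeholders.
# print(contention_slowdown(["./parsec_app"], ["./l2_streaming_co_runner"]))
```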
