Understanding Performance, Power and Energy
Behavior in Asymmetric Multiprocessors
Nagesh B Lakshminarayana
School of Computer Science
Georgia Institute of Technology
Abstract—Multiprocessor architectures are becoming pop-
ular in both desktop and mobile processors. Among multipro-
cessor architectures, asymmetric architectures show promise
in saving energy and power. However, the performance and
energy consumption behavior of asymmetric multiprocessors
with desktop-oriented multithreaded applications has not been
studied widely. In this study, we measure performance and power consumption
in asymmetric and symmetric multiprocessors using real
8 and 16 processor systems to understand the relationships
between thread interactions and performance/power behavior.
We find that when the workload is asymmetric, using an
asymmetric multiprocessor can save energy, but for most of
the symmetric workloads, using a symmetric multiprocessor
(with the highest clock frequency) consumes less energy.
Asymmetric multiprocessor architectures have been proposed
as power-efficient multiprocessor architectures. Research
has shown that these architectures provide
power-performance effective platforms for
both throughput-oriented applications and applications that
would benefit from having high performance processors.
Unfortunately, the performance and energy behavior of
multithreaded applications in asymmetric architectures has
not been studied widely. Balakrishnan et al. evaluated
the performance of applications in an asymmetric multipro-
cessor (AMP). However, in their work, they only showed
performance effects in an AMP. Grant and Afsahi 
studied power-performance efficiency but they only focused
on scientific applications.
In this study, we evaluate the performance and power con-
sumption behavior of multithreaded applications in an AMP.
We emphasize understanding thread interactions, since
many modern applications contain many locks and barriers.
To understand the overall power consumption behavior, we
measure the power consumption of two systems (8 proces-
sors and 16 processors). We measure the power consump-
tion of whole systems rather than the power consumption
of only processors, since performance and energy trade-offs
should consider the entire system, including DRAM memory.
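Because the measurements are of whole-system power, the energy of a run follows from integrating the sampled power over the execution time. The following is a minimal sketch of that step, assuming a power meter that reports (timestamp, watts) samples. The sample format, the function name, and the readings are illustrative assumptions, not the tooling used in this study.

```python
from typing import Sequence, Tuple


def energy_joules(samples: Sequence[Tuple[float, float]]) -> float:
    """Integrate (time_s, power_w) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by time")
        # Area of one trapezoid: average power times elapsed time.
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


# Hypothetical readings, one sample per second (not measured data).
run = [(0.0, 300.0), (1.0, 320.0), (2.0, 320.0), (3.0, 300.0)]
print(energy_joules(run))  # 940.0
```

Dividing the result by the energy of a baseline configuration gives a normalized energy value that can be compared across machine configurations.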
We use PARSEC, the recently released multithreaded
benchmark suite for desktops, for our evaluations. We also
design several microbenchmarks to understand thread inter-
actions better. Furthermore, we modify the Linux scheduler
to evaluate asymmetry-aware scheduling algorithms on an AMP.
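The structure of a thread-interaction microbenchmark can be sketched as follows. This is a hypothetical example, not one of the microbenchmarks used in this study: each thread alternates independent work, a lock-protected critical section, and a barrier. The function name and all parameters are assumptions. Note that CPython serializes bytecode execution across threads, so the sketch shows the synchronization pattern only; a timing study would use native threads.

```python
import threading


def run_microbenchmark(n_threads, iterations, cs_work, parallel_work):
    """Run a lock-plus-barrier pattern and return the shared counter."""
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)
    shared = {"count": 0}

    def spin(n):
        # Stand-in for computation; n controls the amount of work.
        x = 0
        for i in range(n):
            x += i
        return x

    def worker():
        for _ in range(iterations):
            spin(parallel_work)      # independent (parallel) work
            with lock:               # critical section, one thread at a time
                spin(cs_work)
                shared["count"] += 1
            barrier.wait()           # all threads synchronize each iteration

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]


print(run_microbenchmark(n_threads=4, iterations=3, cs_work=10, parallel_work=100))  # 12
```

Raising `cs_work` relative to `parallel_work` lengthens the critical section, which is one way to vary the intensity of thread interaction.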
Our experiments yield three major conclusions. First,
when threads do not interact intensively and when all
threads have similar amounts of work, a symmetric mul-
tiprocessor (SMP) with fast processors (i.e., the highest
clock frequency) consumes the least amount of energy.
Second, when thread interactions increase, an SMP with
slow processors or an AMP could provide the best energy
savings. Third, when the workload is strongly asymmetric
(i.e., each thread in the workload has a different amount
of work), an AMP consumes the least amount of energy.
Hence, depending on the thread characteristics in mul-
tithreaded applications, a different machine configuration
would provide the best energy savings.
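A first-order model helps illustrate the first and third conclusions. The sketch below assumes that threads run one per processor and join at a final barrier, that whole-system power is a fixed base plus a per-processor term, and that processors waiting at the barrier still draw active power. The speeds, power values, and workloads are hypothetical, and the model ignores locks, so it does not cover the second conclusion or reproduce our measurements.

```python
FAST, SLOW = 2.0, 1.0                 # assumed speeds (work units per second)
CORE_W = {FAST: 20.0, SLOW: 8.0}      # assumed active power per processor (W)
BASE_W = 200.0                        # assumed rest-of-system power (W)


def run_time(work, speeds):
    """The run ends when the slowest (work / speed) pair finishes."""
    return max(w / s for w, s in zip(work, speeds))


def energy(work, speeds, base_w, core_w):
    """Whole-system energy = (base power + processor power) * time."""
    power = base_w + sum(core_w[s] for s in speeds)
    return power * run_time(work, speeds)


smp_fast = [FAST] * 4
amp = [FAST, SLOW, SLOW, SLOW]

symmetric = [10, 10, 10, 10]
asymmetric = [20, 5, 5, 5]            # longest thread placed on the fast processor

print(energy(symmetric, smp_fast, BASE_W, CORE_W))   # 1400.0
print(energy(symmetric, amp, BASE_W, CORE_W))        # 2440.0
print(energy(asymmetric, smp_fast, BASE_W, CORE_W))  # 2800.0
print(energy(asymmetric, amp, BASE_W, CORE_W))       # 2440.0
```

Under these assumptions the fast SMP wins for the symmetric workload because it finishes sooner and the base power dominates, while the AMP wins for the asymmetric workload because the slow processors draw less power without lengthening the run.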
The contributions of our paper are:
1) To our knowledge, this is the first work that evaluates
performance and the overall system power consumption
behavior in an AMP for multithreaded desktop applications.
2) We thoroughly evaluate thread interaction behavior to
study performance and energy trade-offs in an AMP.
3) We propose a new, simple, but effective job schedul-
ing algorithm for an AMP and show that it provides
the best energy savings for asymmetric workloads.
A. Evaluation System
We use two systems as shown in Table I to measure
performance and energy consumption.¹ Applications running
on machine-I have 8 threads and applications running
on machine-II have 16 threads. We use SpeedStep
technology with cpufreq governors to emulate
an AMP. Table II describes three machine configurations.
Machine-I runs RHEL 5 Desktop (Linux Kernel 2.6.18),
while Machine-II runs RHEL 4 WS (Linux Kernel 2.6.9).
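Emulating an AMP with cpufreq can be sketched as below. The sketch assumes the standard Linux cpufreq sysfs files `scaling_governor` and `scaling_setspeed`, that the `userspace` governor is available, and that the script runs with root privileges. The helper names, the number of fast processors, and the frequencies in the usage comment are placeholders and do not reproduce the configurations in Table II.

```python
from pathlib import Path

CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def set_core_khz(cpu, khz, root=CPUFREQ_ROOT):
    """Fix one logical CPU at a given frequency via the userspace governor."""
    cpufreq_dir = Path(root) / f"cpu{cpu}" / "cpufreq"
    (cpufreq_dir / "scaling_governor").write_text("userspace")
    (cpufreq_dir / "scaling_setspeed").write_text(str(khz))


def emulate_amp(n_cpus, n_fast, fast_khz, slow_khz, root=CPUFREQ_ROOT):
    """Run the first n_fast CPUs at fast_khz and the rest at slow_khz."""
    for cpu in range(n_cpus):
        khz = fast_khz if cpu < n_fast else slow_khz
        set_core_khz(cpu, khz, root)


# Hypothetical usage (frequencies are placeholders, requires root):
# emulate_amp(n_cpus=8, n_fast=2, fast_khz=2000000, slow_khz=1000000)
```

On some platforms several processors share one frequency domain, in which case a per-processor setting may not take effect independently.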
¹ Since machine-I and machine-II show similar trends, we mainly report
results from machine-II except in Section VI.
978-1-4244-2658-4/08/$25.00 ©2008 IEEE
[Figure: The performance and energy consumption behavior of the ITK benchmark (top: normalized execution time, bottom: normalized energy consumption)]
performance data in real systems and used software simu-
lations to predict the benefit of their throttling mechanism.
Their work focused on dynamic voltage/frequency scaling
mechanisms. However, our work focuses on understanding
the effects of thread interactions in an AMP.
Li et al. also measured the performance of AMPs
by changing the clock frequencies. However, their work
focused on proposing thread migration policies in AMPs,
rather than understanding the performance/power behavior.
Both Ge et al. and Grant and Afsahi also
used a real system to measure performance and power
consumption behavior in AMPs. Both works presented only
the trade-offs between power and energy consumption in
multithreaded scientific applications, whereas we evaluate
thread interaction effects thoroughly.
In this work, we evaluate the performance and energy
consumption behavior of desktop-oriented multithreaded
applications in AMPs. We also evaluate the effects of criti-
cal sections and barriers thoroughly to understand thread in-
teraction behavior on AMPs. We use real 8 and 16 processor
systems to measure performance and energy consumption.
The conclusions of our experiments are that (1) when
the workload is symmetric, it is usually better to use an
SMP with fast processors than an AMP to reduce both the
execution time and the energy consumption, (2) when an
application has frequent and long critical sections, using an
AMP could be better than using all fast processors to save
energy, and (3) when the workload is highly asymmetric,
using an AMP provides the lowest energy consumption.
We also propose and evaluate a new, simple scheduling
algorithm for an AMP. The scheduling algorithm simply
sends the longest thread to a fast processor. Using knowl-
edge of the application and processor characteristics, this
simple scheduling algorithm can reduce energy consumption
by up to 4% on an AMP compared to the most energy-efficient
SMP configuration.
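The idea of sending the longest thread to a fast processor can be sketched at user level as follows. This is an illustrative generalization that ranks all threads by length and all processors by speed, not the modified kernel scheduler evaluated in this work. The function name, the thread identifiers, and the length and speed values are assumptions.

```python
def assign_longest_to_fast(thread_lengths, core_speeds):
    """Map threads to processors: longest thread to fastest processor.

    thread_lengths: {thread_id: estimated amount of work}
    core_speeds:    {core_id: relative speed}
    """
    if len(thread_lengths) > len(core_speeds):
        raise ValueError("this sketch assumes one thread per processor")
    # Sort both sides in descending order; ties keep insertion order.
    threads = sorted(thread_lengths, key=thread_lengths.get, reverse=True)
    cores = sorted(core_speeds, key=core_speeds.get, reverse=True)
    return dict(zip(threads, cores))


# Hypothetical asymmetric workload on an AMP with one fast processor (core 2).
lengths = {"t0": 5, "t1": 20, "t2": 5, "t3": 5}
speeds = {0: 1.0, 1: 1.0, 2: 2.0, 3: 1.0}
print(assign_longest_to_fast(lengths, speeds))
# {'t1': 2, 't0': 0, 't2': 1, 't3': 3}
```

On Linux, a mapping like this could be applied from user space with a CPU affinity call such as Python's `os.sched_setaffinity`. The sketch relies on thread lengths being known in advance, which matches the reliance on application knowledge noted above.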
In future work, we will focus on predicting application
characteristics (e.g., the length of a task) without requir-
ing information from the programmer and designing task
scheduling algorithms that use the predicted information for
an AMP to reduce energy consumption.
We thank Min Lee and Sushma Rao for helping us understand
the Linux Kernel. We also thank Richard Vuduc and Onur Mutlu
for insightful discussions and Jaekyu Lee and Sunpyo Hong for
initial settings for the benchmarks. We thank Aemen Lodhi, Sung-
bae Kim and the anonymous reviewers for their comments and
suggestions. This research is supported by gifts from Microsoft
 M. Annavaram, E. Grochowski, and J. Shen, “Mitigating Amdahl’s
Law through EPI throttling,” in ISCA-32, 2005.
 S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai, “The impact
of performance asymmetry in emerging multicore architectures,” in
 A. Baniasadi and A. Moshovos, “Asymmetric-frequency clustering: a
power-aware back-end for high-performance processors,” in ISLPED,
 C. Bienia, S. Kumar, J. P. Singh, and K. Li, “The PARSEC
benchmark suite: Characterization and architectural implications,”
Princeton University, Tech. Rep. TR-811-08, 2008.
 R. Chandra, R. Menon, L. Dagum, D. Kohr, D. Maydan, and J. McDonald,
Parallel Programming in OpenMP.
 C.-K. Luk et al., “Pin: Building customized program analysis tools
with dynamic instrumentation,” in PLDI, 2005.
 “Extech 380801,” http://www.extech.com/instrument/products/310
399/380801.html, Extech Instruments Corporation.
 H. Franke, R. Russell, and M. Kirkwood, “Fuss, futexes and furwocks:
Fast userlevel locking in Linux,” in Ottawa Linux Symposium,
 GCC-4.0, “GNU compiler collection,” http://gcc.gnu.org/.
 R. Ge, X. Feng, and K. W. Cameron, “Improvement of power-
performance efficiency for high-end computing,” in IPDPS’05, 2005.
 R. Grant and A. Afsahi, “Power-performance efficiency of asym-
metric multiprocessors for multi-threaded scientific applications,” in
 “Enhanced Intel SpeedStep Technology for the Intel Pentium M
Processor–White Paper,” Intel, March 2004.
 R. Kumar, K. Farkas, N. Jouppi, P. Ranganathan, and D. Tullsen,
“Single-ISA heterogeneous multi-core architectures: The potential
for processor power reduction,” in MICRO-36, 2003.
 T. Li, D. Baumberger, D. A. Koufaty, and S. Hahn, “Efficient
operating system scheduling for performance-asymmetric multi-core
architecture,” in Proceedings of Supercomputing 07, 2007.
 T. Y. Morad, U. C. Weiser, A. Kolodny, M. Valero, and E. Ayguadé,
“Performance, power efficiency and scalability of asymmetric cluster
chip multiprocessors,” vol. 5, no. 1, 2006.
 National Library of Medicine, “Insight segmentation and registration
toolkit (ITK),” http://www.itk.org/.
 M. A. Suleman, M. K. Qureshi, and Y. N. Patt, “Feedback driven
threading: Power-efficient and high-performance execution of multi-threaded
workloads on CMPs,” in ASPLOS-XIII, 2008.
 “OpenMP,” http://openmp.org/wp/, The OpenMP Architecture Re-