PowerNap: Eliminating Server Idle Power
David Meisner, Brian T. Gold, Thomas F. Wenisch
meisner@umich.edu, bgold@cmu.edu, twenisch@umich.edu
Advanced Computer Architecture Lab, The University of Michigan
Computer Architecture Lab, Carnegie Mellon University
Abstract
Data center power consumption is growing to unprece-
dented levels: the EPA estimates U.S. data centers will con-
sume 100 billion kilowatt hours annually by 2011. Much of
this energy is wasted in idle systems: in typical deployments,
server utilization is below 30%, but idle servers still con-
sume 60% of their peak power draw. Typical idle periods—
though frequent—last seconds or less, confounding simple
energy-conservation approaches.
In this paper, we propose PowerNap, an energy-conservation
approach where the entire system transitions rapidly be-
tween a high-performance active state and a near-zero-
power idle state in response to instantaneous load. Rather
than requiring fine-grained power-performance states and
complex load-proportional operation from each system com-
ponent, PowerNap instead calls for minimizing idle power
and transition time, which are simpler optimization goals.
Based on the PowerNap concept, we develop requirements
and outline mechanisms to eliminate idle power waste in en-
terprise blade servers. Because PowerNap operates in low-
efficiency regions of current blade center power supplies, we
introduce the Redundant Array for Inexpensive Load Shar-
ing (RAILS), a power provisioning approach that provides
high conversion efficiency across the entire range of Power-
Nap’s power demands. Using utilization traces collected
from enterprise-scale commercial deployments, we demon-
strate that, together, PowerNap and RAILS reduce average
server power consumption by 74%.
Categories and Subject Descriptors C.5.5 [Computer Sys-
tem Implementation]: Servers
General Terms Design, Measurement
Keywords power management, servers
1. Introduction
Data center power consumption is undergoing alarming
growth. By 2011, U.S. data centers will consume 100 bil-
lion kWh at a cost of $7.4 billion per year [27]. Unfortu-
nately, much of this energy is wasted by systems that are
idle. At idle, current servers still draw about 60% of peak
power [1, 6, 13]. In typical data centers, average utilization
is only 20-30% [1, 3]. Low utilization is endemic to data
center operation: strict service-level agreements force oper-
ators to provision for redundant operation under peak load.
Idle-energy waste is compounded by losses in the power
delivery and cooling infrastructure, which increase power
consumption requirements by 50-100% [18].
Ideally, we would like to simply turn idle systems off. Un-
fortunately, a large fraction of servers exhibit frequent but
brief bursts of activity [2, 3]. Moreover, user demand often
varies rapidly and/or unpredictably, making dynamic consol-
idation and system shutdown difficult. Our analysis shows
that server workloads, especially interactive services, exhibit
frequent idle periods of less than one second, which cannot
be exploited by existing mechanisms.
Concern over idle-energy waste has prompted calls for a
fundamental redesign of each computer system component
to consume energy in proportion to utilization [1]. Proces-
sor dynamic frequency and voltage scaling (DVFS) exem-
plifies the energy-proportional concept, providing up to cu-
bic energy savings under reduced load. Unfortunately, pro-
cessors account for an ever-shrinking fraction of total server
power, only 25% in current systems [6, 12, 13], and control-
ling DVFS remains an active research topic [17, 30]. Other
subsystems incur many fixed power overheads when active
and do not yet offer energy-proportional operation.
We propose an alternative energy-conservation approach,
called PowerNap, that is attuned to server utilization pat-
terns. With PowerNap, we design the entire system to tran-
sition rapidly between a high-performance active state and a
minimal-power nap state in response to instantaneous load.
Rather than requiring components that provide fine-grain
power-performance trade-offs, PowerNap simplifies the sys-
tem designer’s task to focus on two optimization goals:
(1) optimizing energy efficiency while napping, and (2) min-
imizing transition time into and out of the low-power nap
state.
Based on the PowerNap concept, we develop requirements
and outline mechanisms to eliminate idle power waste in
a high-density blade server system.

[Figure 1: Server Utilization Histogram. Real data centers are under 20% utilized. Histogram of time (%) spent at each utilization level (%) for the IT and Web 2.0 traces.]

Table 1: Enterprise Data Center Utilization Traces.
Workload   Avg. Utilization   Description
Web 2.0    7.4%               "Web 2.0" application servers
IT         14.2%              Enterprise IT infrastructure apps

Whereas many mechanisms required by PowerNap can be
adapted from mobile and handheld devices, one critical
subsystem of current blade chassis falls short of meeting
PowerNap's energy-efficiency requirements: the power conversion system. Power-
Nap reduces total ensemble power consumption when all
blades are napping to only 6% of the peak when all are ac-
tive. Power supplies are notoriously inefficient at low loads,
typically providing conversion efficiency below 70% under
20% load [5]. These losses undermine PowerNap's energy
efficiency.
Directly improving power supply efficiency implies a sub-
stantial cost premium. Instead, we introduce the Redundant
Array for Inexpensive Load Sharing (RAILS), a power pro-
visioning approach where power draw is shared over an ar-
ray of low-capacity power supply units (PSUs) built with
commodity components. The key innovation of RAILS is
to size individual power modules such that the power de-
livery solution operates at high efficiency across the entire
range of PowerNap’s power demands. In addition, RAILS
provides N+1 redundancy, graceful compute capacity degra-
dation in the face of multiple power module failures, and
reduced component costs relative to conventional enterprise-
class power systems. Through modeling and analysis of ac-
tual data center workload traces, we demonstrate:
- Analysis of idle/busy intervals in actual data centers. We analyze utilization traces from production servers and data centers to determine the distribution of idle and active periods. Though interactive servers are typically over 60% idle, most idle intervals are under one second.
- Energy-efficiency and response time bounds. Through queuing analysis, we establish bounds on PowerNap's energy efficiency and response time impact. Using our models, we determine that PowerNap is effective if state transition time is below 10ms, and incurs no overheads below 1ms. Furthermore, we show that PowerNap provides greater energy efficiency and lower response time than solutions based on DVFS.
- Efficient PowerNap power provisioning with RAILS. Our analysis of commercial data center workload traces demonstrates that RAILS improves average power conversion efficiency from 68% to 86% in PowerNap-enabled servers.

[Figure 2: Server Power Breakdown. No single component dominates total system power. Per-server power shares (CPU, fans, I/O & disk, memory, other) for IBM, Sun, and Google systems.]
2. Understanding Server Utilization
It has been well-established in the research literature that the
average server utilization of data centers is low, often below
30% [2, 3, 6]. In facilities that provide interactive services
(e.g., transaction processing, file servers, Web 2.0), average
utilization is often even worse, sometimes as low as 10% [3].
Figure 1 depicts a histogram of utilization for two production
workloads from enterprise-scale commercial deployments.
Table 1 describes the workloads running on these servers.
We derive this data from utilization traces collected over
many days, aggregated over more than 120 servers (produc-
tion utilization traces were provided courtesy of HP Labs).
The most striking feature of this data is that the servers spend
the vast majority of time under 10% utilization.
Data center utilization is unlikely to increase for two reasons.
First, data center operators must provision for peak rather
than average load. For interactive services, peak utilization
often exceeds average utilization by more than a factor of
three [3]. Second, to provide redundancy in the event of
failures, operators usually deploy more systems than are
actually needed. Though server consolidation can improve
average utilization, performance isolation, redundancy, and
service robustness concerns often preclude consolidation of
mission-critical services.
Low utilization creates an energy efficiency challenge be-
cause conventional servers are notoriously inefficient at low
loads.

[Figure 3: Busy and Idle Period Cumulative Distributions. CDFs of busy-period and idle-period lengths (ms, log scale) for the Web, Mail, DNS, Shell, Backup, and Cluster traces.]

Table 2: Fine-Grain Utilization Traces.
Workload   Utilization   Avg. Busy   Avg. Idle   Description
Web        26.5%         38 ms       106 ms      Department web server
Mail       55.0%         115 ms      94 ms       Department POP and SMTP servers
DNS        17.4%         194 ms      923 ms      Department DNS and DHCP server
Shell      32.0%         51 ms       108 ms      Interactive shell and IMAP support
Backup     22.2%         31 ms       108 ms      Continuous incremental backup server
Cluster    64.3%         3.25 s      1.8 s       600-node scientific computing cluster

Although power-saving features like clock gating and
dynamic voltage and frequency scaling (DVFS) nearly elim-
inate processor power consumption in idle systems, present-
day servers still dissipate about 60% as much power when
idle as when fully loaded [4, 6,13]. Processors often account
for only a quarter of system power; main memory and cool-
ing fans contribute larger fractions [14]. Figure 2 reproduces
typical server power breakdowns for the IBM p670 [14],
Sun UltraSparc T2000 [12], and a generic server specified
by Google [6], respectively.
2.1 Frequent Brief Utilization
Clearly, eliminating server idle power waste is critical to im-
proving data center energy efficiency. Engineers have been
successful in reducing idle power in mobile platforms, such
as cell phones and laptops. However, servers pose a funda-
mentally different challenge than these platforms. The key
observation underlying our work is that, although servers
have low utilization, their activity occurs in frequent, brief
bursts. As a result, they appear to be under a constant, light
load.
To investigate the time scale of servers’ idle and busy peri-
ods, we have instrumented a series of interactive and batch
processing servers to collect utilization traces at 10ms gran-
ularity. To our knowledge, our study is the first to report
server utilization data measured at such fine granularity. We
classify an interval as busy or idle based on how the OS
scheduler accounted the period in its utilization tracking.
The traces were collected over a period of a week from seven
departmental IT servers and a scientific computing cluster
comprising over 600 servers. We present the mean idle and
busy period lengths, average utilization, and a brief descrip-
tion of each trace in Table 2.
Figure 3 shows the cumulative distribution for the busy and
idle period lengths in each trace. The key result of our traces
is that the vast majority of idle periods are shorter than 1s,
with mean lengths in the 100’s of milliseconds. Busy periods
are even shorter, typically only 10’s of milliseconds.
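To make the trace methodology concrete, the Python sketch below shows one way to recover busy and idle period lengths from a fine-grained utilization trace. It is an illustration, not the collection tooling used for Table 2: the 10ms sample period matches the traces above, but the trace representation and the busy threshold are assumptions.

```python
SAMPLE_MS = 10  # assumed sampling granularity of the trace

def period_lengths(samples, busy_threshold=0.0):
    """Split a utilization trace (one value per interval) into busy and
    idle period lengths, in milliseconds."""
    busy, idle = [], []
    run_len, run_busy = 0, None
    for u in samples:
        is_busy = u > busy_threshold
        if run_busy is None or is_busy == run_busy:
            run_len += 1
        else:
            (busy if run_busy else idle).append(run_len * SAMPLE_MS)
            run_len = 1
        run_busy = is_busy
    if run_busy is not None:
        (busy if run_busy else idle).append(run_len * SAMPLE_MS)
    return busy, idle

# Toy trace: 40 ms bursts separated by 110 ms gaps (cf. the Web workload).
trace = ([1.0] * 4 + [0.0] * 11) * 100
busy, idle = period_lengths(trace)
print(sum(busy) / len(busy), sum(idle) / len(idle))  # 40.0 110.0
```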
2.2 Existing Energy-Conservation Techniques
The rapid transitions and brief intervals of server activity
make it difficult to conserve idle power with existing ap-
proaches. The recent trend towards server consolidation [20]
is partly motivated by the high energy cost of idle sys-
tems. By moving services to virtual machines, several ser-
vices can be time-multiplexed on a single physical server,
increasing average utilization. Consolidation allows the to-
tal number of physical servers to be reduced, thereby re-
ducing idle inefficiency. However, server consolidation, by
itself, does not close the gap between peak and average uti-
lization. Data centers still require sufficient capacity for peak
demand, which inevitably leaves some servers idle in the av-
erage case. Furthermore, consolidation does not save energy
automatically; system administrators must actively consoli-
date services and remove unneeded systems.
Although support for sleep states is widespread in handheld,
laptop and desktop machines, these states are rarely used
in current server systems.

[Figure 4: PowerNap. System components (CPUs, DRAM, NIC, SSD, fans) nap while the server is idle; the NIC detects the arrival of work; the server returns to full performance to finish work as quickly as possible, then transitions back to the nap state.]

Unfortunately, the high restart latency typical of current
sleep states renders them unacceptable for interactive
services; current laptops and desktops
require several seconds to suspend using operating system
interfaces (e.g., ACPI). Moreover, unlike consumer devices,
servers cannot rely on the user to transition between power
states; they must have an autonomous mechanism that man-
ages state transitions.
Recent server processors include CPU throttling solutions
(e.g., Intel SpeedStep, AMD Cool'n'Quiet) to reduce the
large overhead of light loads. These processors use DVFS
to reduce their operating frequency linearly while gaining
cubic power savings. DVFS relies on operating system sup-
port to tune processor frequency to instantaneous load. In
Linux, the kernel continues lowering frequency until it ob-
serves 20% idle time. Improving DVFS control algorithms
remains an active research area [17, 30]. Nonetheless, DVFS
can be highly effective in reducing CPU power. However, as
Figure 2 shows, CPUs account for a small portion of total
system power.
Energy proportional computing [6] seeks to extend the suc-
cess of DVFS to the entire system. In this scheme, each sys-
tem component is redesigned to consume energy in propor-
tion to utilization. In an energy-proportional system, explicit
power management is unnecessary, as power consumption
varies naturally with utilization. However, as many compo-
nents incur fixed power overheads when active (e.g., clock
power on synchronous memory busses, leakage power in
CPUs, etc.), designing energy-proportional subsystems re-
mains a research challenge.
Energy-proportional operation can be approximated with
non-energy-proportional systems through dynamic virtual
machine consolidation over a large server ensemble [25].
However, such approaches do not address the performance
isolation concerns of dynamic consolidation and operate at
coarse time scales (minutes). Hence, they cannot exploit the
brief idle periods found in servers.
3. PowerNap
Although servers spend most of their time idle, conven-
tional energy-conservation techniques are unable to exploit
these brief idle periods. Hence, we propose an approach to
power management that enables the entire system to tran-
sition rapidly into and out of a low-power state where all
activity is suspended until new work arrives. We call our ap-
proach PowerNap.
Figure 4 illustrates the PowerNap concept. Each time the
server exhausts all pending work, it transitions to the nap
state. In this state, nearly all system components enter sleep
modes, which are already available in many components (see
Section 4). While in the nap state, power consumption is
low, but no processing can occur. System components that
signal the arrival of new work, expiration of a software timer,
or environmental changes, remain partially powered. When
new work arrives, the system wakes and transitions back
to the active state. When the work is complete, the system
returns to the nap state.
PowerNap is simpler than many other energy conservation
schemes because it requires system components to support
only two operating modes: an active mode that provides
maximum performance and a nap mode that minimizes
power draw. For many devices, providing a low-power nap
mode is far easier than providing multiple active modes that
trade performance for power savings. Any level of activity
often implies fixed power overheads (e.g., bus clock switch-
ing, power distribution losses, leakage power, mechanical
components, etc.). We outline mechanisms required to im-
plement PowerNap in Section 4.
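The two-mode discipline can be summarized in a few lines of control logic. The sketch below is a conceptual illustration only; the methods it calls (pending_work, run_until_idle, enter_nap, wait_for_wake_event, exit_nap) are hypothetical placeholders for hardware/firmware operations, not a real interface.

```python
# Conceptual sketch of PowerNap's two-state control loop.
# All server.* methods are hypothetical placeholders.

def powernap_loop(server):
    while True:
        if server.pending_work():
            server.run_until_idle()       # active: finish all queued work
        else:
            server.enter_nap()            # suspend all but wake sources
            server.wait_for_wake_event()  # NIC packet, timer, or sensor
            server.exit_nap()             # wake transition back to active
```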
3.1 PowerNap Performance and Power Model
To assess PowerNap’s potential, we develop a queuing
model that relates its key performance measures—energy
savings and response time penalty—to workload parameters
and PowerNap implementation characteristics. We contrast
PowerNap with a model of the upper-bound energy-savings
possible with DVFS. The goal of our model is threefold:
(1) to gain insight into PowerNap behavior, (2) to derive re-
quirements for PowerNap implementations, and (3) to con-
trast PowerNap and DVFS.
[Figure 5: PowerNap and DVFS Analytic Models.]
We model both PowerNap and DVFS under the assump-
tion that each seeks to minimize the energy required to
serve the offered load. Hence, both schemes provide iden-
tical throughput (matching the offered load) but differ in re-
sponse time and energy consumption.
PowerNap Model. We model PowerNap as an M/G/1 queuing system with arrival rate λ and a generalized service time distribution with known first and second moments E[S] and E[S²]. Figure 5(a) shows the work in the queue for three job arrivals. Note that, in this context, work also includes time spent in the wake and suspend states. Average server utilization is given by ρ = λE[S]. To model the effects of PowerNap suspend and wake transitions, we extend the conventional M/G/1 model with an exceptional first service time [29]. We assume PowerNap transitions are symmetric with latency T_t. Service of the first job in each busy period is delayed by an initial setup time I. The setup time includes the wake transition and may include the remaining portion of a suspend transition, as shown for the rightmost arrival in Figure 5(a). Hence, for an arrival x time units from the start of the preceding idle period, the initial setup time is given by:

I = \begin{cases} 2T_t - x & \text{if } 0 \le x < T_t \\ T_t & \text{if } x \ge T_t \end{cases}
The first and second moments E[I] and E[I²] are:

E[I] = \int_0^\infty I\,\lambda e^{-\lambda x}\,dx = 2T_t + \frac{1}{\lambda}e^{-\lambda T_t} - \frac{1}{\lambda}

E[I^2] = \int_0^\infty I^2\,\lambda e^{-\lambda x}\,dx = 4T_t^2 - 2T_t^2 e^{-\lambda T_t} - \frac{4T_t}{\lambda} + \frac{2}{\lambda^2}\left(1 - (1 + \lambda T_t)e^{-\lambda T_t}\right)
We compute average power as

P_{avg} = P_{nap}\,F_{nap} + P_{max}(1 - F_{nap}),

where the fraction of time spent napping F_nap is given by the ratio of the expected length of each nap period E[N] to the expected busy-idle cycle length E[C]:

F_{nap} = \frac{\int_0^{T_t} 0\cdot\lambda e^{-\lambda t}\,dt + \int_{T_t}^{\infty}(t - T_t)\,\lambda e^{-\lambda t}\,dt}{\frac{E[S] + E[I]}{1 - \lambda E[S]} + \frac{1}{\lambda}} = \frac{e^{-\lambda T_t}(1 - \lambda E[S])}{1 + \lambda E[I]}
The response time for an M/G/1 server with exceptional first service is due to Welch [29]:

E[R] = \frac{\lambda E[S^2]}{2(1 - \lambda E[S])} + \frac{2E[I] + \lambda E[I^2]}{2(1 + \lambda E[I])} + E[S]

Note that the first term of E[R] is the Pollaczek-Khinchin formula for the expected queuing delay in a standard M/G/1 queue, the second term is the additional residual delay caused by the initial setup time I, and the final term is the expected service time E[S]. The second term vanishes when T_t = 0.
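These closed forms are straightforward to evaluate numerically. The sketch below is a direct transcription of the formulas above (Poisson arrivals, symmetric transitions of latency T_t), not the authors' tooling; the example parameters echo the Web-trace values used in Section 3.2, with E[S²] = 3.7·E[S] interpreted in seconds.

```python
import math

def powernap_model(lam, ES, ES2, Tt, P_nap, P_max):
    """Evaluate the M/G/1-with-setup model. Returns (P_avg, E[R]).
    lam: arrival rate; ES, ES2: service-time moments; Tt: transition
    latency. All times must use consistent units (seconds here)."""
    # Moments of the initial setup time I (exceptional first service).
    e = math.exp(-lam * Tt)
    EI = 2 * Tt + e / lam - 1 / lam
    EI2 = (4 * Tt**2 - 2 * Tt**2 * e - 4 * Tt / lam
           + (2 / lam**2) * (1 - (1 + lam * Tt) * e))
    # Fraction of time napping and the resulting average power.
    F_nap = e * (1 - lam * ES) / (1 + lam * EI)
    P_avg = P_nap * F_nap + P_max * (1 - F_nap)
    # Welch's response time for M/G/1 with exceptional first service.
    ER = (lam * ES2 / (2 * (1 - lam * ES))
          + (2 * EI + lam * EI2) / (2 * (1 + lam * EI))
          + ES)
    return P_avg, ER

# Example: Web-trace-like parameters at 30% utilization, Tt = 10 ms.
ES = 0.038                       # E[S] = 38 ms
lam = 0.30 / ES                  # rho = lam * E[S] = 0.30
print(powernap_model(lam, ES, 3.7 * ES, Tt=0.010, P_nap=0.05, P_max=1.0))
```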
DVFS model. Rather than model a real DVFS frequency control algorithm, we instead model the upper bound of energy savings possible with DVFS. For each job arrival, we scale instantaneous frequency f to stretch the job to fill any idle time until the next job arrival, as illustrated in Figure 5(b), which gives E[f] = f_max·ρ. This scheme maximizes power savings, but cannot be implemented in practice because it requires knowledge of future arrival times. We base power savings estimates on the theoretical formulation of processor dynamic power consumption P_CPU = ½CV²Af. We assume C and A are fixed, and choose the optimal f for each job within the range f_min < f < f_max. We impose a lower bound f_min = f_max/2.4 to prevent response time from growing asymptotically when utilization is low. We chose a factor of 2.4 between f_min and f_max based on the frequency range provided by a 2.4 GHz AMD Athlon. We assume voltage scales linearly with frequency (i.e., V = V_max(f/f_max)), which is optimistic with respect to current DVFS implementations. Finally, as DVFS only reduces the CPU's contribution to system power, we include a parameter F_CPU to control the fraction of total system power affected by DVFS. Under these assumptions, average power P_avg is given by:

P_{avg} = P_{max}\left(1 - F_{CPU}\left(1 - \left(\frac{E[f]}{f_{max}}\right)^3\right)\right)
[Figure 6: PowerNap and DVFS Power and Response Time Scaling. (a) Power Scaling: average power (% of max) versus utilization for DVFS with F_CPU = 100%, 40%, 20% and PowerNap with T_t = 100ms, 10ms, 1ms. (b) Response Time Scaling: relative response time versus utilization for the same configurations.]
Response time is given by:

E[R] = E\!\left[\frac{R_{base}}{f}\right]

where R_base is the response time without DVFS.
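The DVFS bound is equally easy to evaluate. The sketch below is an illustration under the stated assumptions (oracle stretching, cubic CPU power, f_min = f_max/2.4), with one simplification: the per-job frequency is approximated by its clamped mean.

```python
def dvfs_bound(rho, F_cpu, f_floor=1 / 2.4):
    """Oracle DVFS bound: returns (P_avg / P_max, relative response time).
    Approximates E[f]/f_max by its clamped mean, max(rho, f_floor)."""
    f_rel = max(rho, f_floor)            # frequency floor at f_max / 2.4
    p_avg = 1 - F_cpu * (1 - f_rel**3)   # cubic savings on the CPU share
    slowdown = 1 / f_rel                 # jobs stretched by 1 / f
    return p_avg, slowdown

for rho in (0.1, 0.3, 0.5, 0.9):
    print(rho, dvfs_bound(rho, F_cpu=0.25))
```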
3.2 Analysis
Power Savings. Figure 6(a) shows the average power (as a fraction of peak) required under PowerNap and DVFS as a function of utilization. For DVFS, we show power savings for three values of F_CPU. F_CPU = 100% represents the upper bound if DVFS were applicable to all system power; 20% < F_CPU < 40% bounds the typical range in current servers. For PowerNap, we construct the graphs with E[S] = 38ms and E[S²] = 3.7·E[S], which are both estimated from the observed busy period distribution in our Web trace. We assume P_nap is 5% of P_max. We vary λ to adjust utilization, and present results for three values of T_t: 1ms, 10ms, and 100ms. We expect 10ms to be a conservative estimate for achievable PowerNap transition time. For transition
times below 1ms, transition time becomes negligible and the
power savings from PowerNap varies linearly with utiliza-
tion for all workloads. We discuss transition times further in
Section 4.
When F_CPU is high, DVFS clearly outperforms PowerNap, as it provides cubic power savings while PowerNap's savings are at best linear in utilization. However, for realistic values of F_CPU and transition times in our expected range (T_t ≤ 10ms), PowerNap's savings rapidly overtake DVFS.
As transition time increases, the break-even point between
DVFS and PowerNap shifts towards lower utilization. Even
for a transition time of 100 ms, PowerNap can provide sub-
stantial energy savings when utilization is below 20%.
Table 3: Per-Workload Energy Savings.
Workload   PowerNap Energy Savings   DVFS Energy Savings
Web        59%                       23%
Mail       35%                       21%
DNS        77%                       23%
Shell      55%                       23%
Backup     61%                       23%
Cluster    34%                       18%
Response time. In Figure 6(b), we compare the response
time impact of DVFS and PowerNap. The vertical axis
shows response time normalized to a system without power management (i.e., one that always operates at f_max). For DVFS, response time grows rapidly when the gap between job arrivals is large, and reaches the f_min floor below 40% utilization. The DVFS response time penalty is independent of F_CPU, and is bounded at 2.4 by the ratio f_max/f_min. For PowerNap, the response time penalty is negligible if T_t is small relative to average service time E[S], which we expect to be the common case (i.e., most jobs last longer than 10ms). However, if T_t is significant relative to E[S], the PowerNap response time penalty grows as utilization shrinks. When utilization is high, the server is rarely idle and few jobs are delayed by transitions. As utilization drops, the additional delay seen by each job converges to T_t (i.e., every job must wait for wake-up).
Per-Workload Energy Savings. Finally, we report the en-
ergy savings under simulated PowerNap and DVFS schemes
for our workload traces. Because these traces only contain
busy and idle periods, and not individual job arrivals, we
cannot estimate response time impact. For each workload,
we perform a trace-based simulation that assumes busy pe-
riods will start at the same time, independent of the current
PowerNap state (i.e., new work still arrives during wake or
suspend transitions). We assume a PowerNap transition time
of 10ms and nap power at 5% of active power, which we be-
lieve to be conservative estimates (see Section 4). For DVFS,
we assume F_CPU = 25%. Table 3 shows the results of these
simulations. All workloads except Mail and Cluster hit the
DVFS frequency floor, and, hence, achieve a 23% energy
savings. In all cases, PowerNap achieves greater energy sav-
ings. Additionally, we extracted the average arrival rate (as-
suming a Poisson arrival process) and compared the results
in Table 3 with the M/G/1 model of F_nap derived above. We
found that for these traces, the analytic model was within
2% of our simulated results in all cases. When arrivals are
more deterministic (e.g., Backup) than the exponential we
assume, the model slightly overestimates PowerNap savings.
For more variable arrival processes (e.g., Shell), the model
underestimates the energy savings.
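The per-workload results above can be approximated with a simple trace-driven computation. The sketch below is a simplified reconstruction, not the simulator used for Table 3: it charges full power during the 2·T_t of suspend-plus-wake around each sufficiently long idle gap and keeps the server awake through shorter gaps.

```python
def powernap_savings(busy_periods, Tt=0.010, p_nap=0.05, p_max=1.0):
    """Fractional energy savings vs. an always-on server for a trace of
    (start, length) busy periods in seconds, sorted by start time."""
    energy = 0.0
    prev_end = busy_periods[0][0]
    for start, length in busy_periods:
        gap = start - prev_end
        if gap > 2 * Tt:                  # long enough to suspend and wake
            energy += 2 * Tt * p_max + (gap - 2 * Tt) * p_nap
        else:                             # too short: stay at full power
            energy += gap * p_max
        energy += length * p_max          # serve the work itself
        prev_end = start + length
    span = prev_end - busy_periods[0][0]
    return 1 - energy / (span * p_max)

# Toy trace shaped like the Web workload: 38 ms bursts every 144 ms.
trace = [(i * 0.144, 0.038) for i in range(1000)]
print(powernap_savings(trace))  # ~0.57, cf. 59% for Web in Table 3
```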
3.3 Implementation Requirements
Based on the results of our analytic model, we identify two
key PowerNap implementation requirements:
Fast transitions. Our model demonstrates that transition
speed is the dominant factor in determining both the power
savings potential and response time impact of PowerNap.
Our results show that transition time must be less than
one tenth of average busy period length. Although a 10ms
transition speed is sufficient to obtain significant savings,
1ms transitions are necessary for PowerNap’s overheads to
become negligible. To achieve these transition periods, a
PowerNap implementation must preserve volatile system
state (e.g., memory) while napping—mass storage device
transfer rates are insufficient to transfer multiple GB of
memory state in milliseconds (even at a few hundred MB/s,
saving gigabytes of DRAM takes tens of seconds).
Minimizing power draw in nap state. Given the low uti-
lization in most enterprise deployments, servers will spend
a majority of time in the nap state, making PowerNap’s
power requirements the key factor affecting average sys-
tem power. Hence, it is critical to minimize the power draw
of napping system components. As a result of eliminating
idle power, PowerNap drastically increases the range be-
tween the minimum and maximum power demands on a
blade chassis. Existing blade-chassis power-conversion sys-
tems are inefficient in the common case, where all blades are
napping. Hence, to maximize PowerNap potential, we must
re-architect the blade chassis power subsystem to increase
its efficiency at low loads.
Although PowerNap requires system-wide modifications, it
demands only two states from each subsystem: active and
nap states. Hence, implementing PowerNap is substantially
simpler than developing energy-proportional components.
Table 4: Component Power Consumption.
Component       Active     Idle        Nap      Transition    Sources
CPU chip        80-150 W   12-20 W     3.4 W    30 µs         [10] [9]
DRAM DIMM       3.5-5 W    1.8-2.5 W   0.2 W    <1 µs         [16] [8]
NIC             0.7 W      0.3 W       0.3 W    no trans.     [24]
SSD             1 W        0.4 W       0.4 W    no trans.     [22]
Fan             10-15 W    1-3 W       -        independent   [15]
PSU             50-60 W    25-35 W     0.5 W    300 µs        [19]
Typical Blade   450 W      270 W       10.4 W   300 µs

Because no computation occurs while napping, many fixed
power draws, such as clocks and leakage power, can be
conserved.
4. PowerNap Mechanisms
We outline the design of a PowerNap-enabled blade server
system and enumerate required implementation mecha-
nisms. PowerNap requires nap support in all hardware sub-
systems that have non-negligible idle power draws, and soft-
ware/firmware support to identify and maximize idle periods
and manage state transitions.
4.1 Hardware Mechanisms
Most of the hardware mechanisms required by PowerNap
already exist in components designed for mobile devices.
However, few of these mechanisms are exploited in existing
servers, and some are omitted in current-generation server-
class components. For each hardware subsystem, we identify
existing mechanisms or outline requirements for new mech-
anisms necessary to implement PowerNap. Furthermore, we
provide estimates of power dissipation while napping and
transition speed. We summarize these estimates, along with
our sources, in Table 4. Our estimates for a "Typical Blade"
are based on HP’s c-series half-height blade designs; our
PowerNap power estimate assumes a two-CPU system with
eight DRAM DIMMs.
Processor: ACPI S3 “Sleep” state. The ACPI standard de-
fines the S3 “Sleep” state for processors that is intended to
allow low-latency transitions. Although the ACPI standard
does not specify power or performance requirements, some
implementations of S3 are ideal for PowerNap. For exam-
ple, in Intel’s mobile processor line, S3 preserves last-level
cache state and consumes only 3.4W [10]. These processors
require approximately 30 µs for PLL stabilization to transi-
tion from sleep back to active execution [9].
If S3 is unavailable, clock gating can provide substantial en-
ergy savings. For example, Intel’s Xeon 5400-series power
requirements drop from 80W to 16W upon executing a halt
instruction [11]. From this state, resuming execution re-
quires only nanosecond-scale delays.
DRAM: Self-refresh. DRAM is typically the second-most
power-hungry system component when active. However,
several recent DRAM specifications feature an operating
mode, called self-refresh, where the DRAM is isolated from
the memory controller and autonomously refreshes DRAM
content. In this mode, the memory bus clock and PLLs are
disabled, as are most of the DRAM interface circuitry. Self-
refresh saves more than an order of magnitude of power.
For example, a 2GB SODIMM (designed for laptops) with
a peak power draw above 5W uses only 202mW of power
during self-refresh [16]. Transitions into and out of self-
refresh can be completed in less than a microsecond [8].
Mass Storage: Solid State Disks. Solid state disks draw
negligible power when idle, and, hence, do not need to tran-
sition to a sleep state for PowerNap. A recent 64GB Sam-
sung SSD consumes only 0.32W while idle [22].
Network Interface: Wake-on-LAN. The key responsibility
PowerNap demands of the network interface card (NIC) is
to wake the system upon arrival of a packet. Existing NICs
already provide support for Wake-on-LAN to perform this
function. Current implementations of Wake-on-LAN pro-
vide a mode to wake on any physical activity. This mode
forms a basis for PowerNap support. Current NICs consume
only 400mW while in this mode [24].
Environmental Monitoring & Service Processors: Power-
Nap transition management. Servers typically include
additional circuitry for environmental monitoring, remote
management (e.g., remote power on), power capping, power
regulation, and other functionality. These components typ-
ically manage ACPI state transitions and would coordinate
PowerNap transitions. A typical service processor draws less
than 10mW when idle.
Fans: Variable Speed Operation. Fans are a dominant
power consumer in many recent servers. Modern servers
employ variable-speed fans where cooling capacity is con-
stantly tuned based on observed temperature or power draw.
Fan power requirements typically grow cubically with aver-
age power. Thus, PowerNap’s average power savings yield
massive reductions in fan power requirements. In most blade
designs, cooling systems are centralized in the blade chassis,
amortizing their energy cost over many blades. Because ther-
mal conduction progresses at drastically different timescales
than PowerNap’s transition frequency, chassis-level fan con-
trol is independent of PowerNap state (i.e., fans may con-
tinue operating during nap and may spin down during active
operation depending on temperature conditions).
Power Provisioning: RAILS. PowerNap fundamentally al-
ters the range of currents over which a blade chassis must ef-
ficiently supply power. In Section 5, we explain why conven-
tional power delivery schemes are unable to provide efficient
AC to DC conversion over this range, and present RAILS,
our power conversion solution.
4.2 Software Mechanisms
For schemes like PowerNap, the periodic timer interrupt
used by legacy OS kernels to track the passage of time and
implement software timers poses a challenge. As the timer
interrupt is triggered every 1ms, conventional OS time keep-
ing precludes the use of PowerNap. The periodic clock tick
also poses a challenge for idle-power conservation on lap-
tops and for virtualization platforms that consolidate hun-
dreds of OS images on a single hardware platform. Hence,
the Linux kernel has recently been enhanced to support
“tickless” operation, where the periodic timer interrupt is es-
chewed in favor of hardware timers for scheduling and time
keeping [23]. PowerNap depends on a kernel that provides
tickless operation.
PowerNap’s effectiveness increases with longer idle periods
and less frequent state transitions. Some existing hardware
devices (e.g., legacy keyboard controllers) require polling
to detect input events. Current operating systems often per-
form maintenance tasks (e.g., flushing disk buffers, zeroing
memory) when the OS detects significant idle periods. These
maintenance tasks may interact poorly with PowerNap and
can induce additional state transitions. However, efforts are
already underway (e.g., as described in [23]) to redesign de-
vice drivers and improve background task scheduling.
5. RAILS
AC to DC conversion losses in computer systems have re-
cently become a major concern, leading to a variety of re-
search proposals [7, 15], product announcements (e.g., HP's
Blade System c7000), and standardization efforts [5] to im-
prove power supply efficiency. The concern is particularly
acute in data centers, where each watt wasted in the power
delivery infrastructure implies even more loss in cooling.
Because PowerNap’s power draw is substantially lower than
the idle power in conventional servers, PowerNap demands
conversion efficiency over a wide power range, from as few
as 300W to as much as 7.2kW in a fully-populated enclo-
sure.
In this section, we discuss why existing power solutions are
inadequate for PowerNap and present RAILS, our power
solution. RAILS provides high conversion efficiency across
PowerNap’s power demand spectrum, provides N+1 redun-
dancy, allows for graceful degradation of compute capacity
when PSUs fail, and minimizes costs by using commodity
PSUs in an efficient arrangement.
5.1 Power Supply Unit Background
Poor Efficiency at Low Loads. Although manufacturers of-
ten report only a single efficiency value, most PSUs do not
have a constant efficiency across electrical load. A recent
survey of server and desktop PSUs reported their efficiency
across loads [5]. Figure 7 reproduces the range of efficien-
cies reported in that study. Though PSUs are often over 90%
efficient at their optimal operating point (usually near 75% load), efficiency drops off rapidly below 40% load, sometimes dipping below 50% (i.e., >2W in for 1W out).

[Figure 7: Power Supply Efficiency. Conversion efficiency (%) versus load (%), divided into red, yellow, and green operating zones.]

We
divide the operating efficiency of power supplies into three
zones based on electrical load. Above 40% load, the PSUs
operate in the “green” zone, where their efficiency is at or
above 80%. In the 20-40% “yellow” zone, PSU efficiency
begins to drop, but typically exceeds 70%. However, in the
“red” zone below 20%, efficiency drops off precipitously.
Two factors cause servers to frequently operate in the “yel-
low” or “red” efficiency zones. First, servers are highly con-
figurable, which leads to a large range of power require-
ments. The same server model might be sold with only one
or as many as 20 disks installed, and the amount of installed
DRAM might vary by a factor of 10. Furthermore, peripher-
als may be added after the system is assembled. To simplify
ordering, upgrades, testing, and safety certification, manu-
facturers typically install a power supply rated to exceed the
power requirements of the most extreme configuration. Sec-
ond, servers are often configured with 2N redundant power
supplies (i.e., twice as many as are required for a worst-case
configuration). The redundant supplies typically share the
electrical load to minimize PSU temperature and to ensure
current flow remains uninterrupted if a PSU fails. However,
the EPRI study [5] concluded that this load-sharing arrange-
ment often shifts PSUs from “yellow”-zone to “red”-zone
operation.
Recent Efficiency Improvements. A variety of recent ini-
tiatives seek to improve server power efficiency:
- 80+ certification. The EPA Energy Star program has defined the "80+" certification standard [26] to incentivize PSU manufacturers to improve efficiency at low loads. The 80+ incentive program is primarily targeted at the low-peak-power desktop PSU market. 80+ supplies require considerably higher design complexity than conventional PSUs, which may pose a barrier to widespread adoption in the reliability-conscious server PSU market. Furthermore, despite its name, the 80+ specification does not require energy efficiency above 80% across all loads, but only within the typical operating range of conventional systems. This specified efficiency range is not wide enough for PowerNap.
- Single voltage supplies. Unlike desktop machines, which
require five different DC output voltages to support
legacy components, server PSUs typically provide only
a single DC output voltage, simplifying their design and
improving reliability and efficiency [7]. Although Power-
Nap benefits from this feature, a single output voltage
does not directly address inefficiency at low loads.
- DC distribution. Recent research [7] has called for dis-
tributing DC power among data center racks, eliminating
AC-to-DC conversion efficiency concerns at the blade en-
closure level. However, the efficiency advantages of DC
distribution are unclear [21] and deploying DC power
will require multi-industry coordination.
- Dynamic load-sharing. Blade enclosures create a fur-
ther opportunity to improve efficiency through dynamic
load-sharing. HP’s Dynamic Power Saver [15] feature
in the HP Blade Center c7000 employs up to six high-
efficiency 2.2kW PSUs in a single enclosure, and dy-
namically varies the number of PSUs that are engaged,
ensuring that all active supplies operate in their “green”
zone while maintaining redundancy. Although HP’s so-
lution is ideal for the idle and peak power range of the
c-class blades, it requires expensive PSUs and provides
insufficient granularity for PowerNap.
While all these solutions improve efficiency for their target
markets, none achieve all our goals of efficiency for Power-
Nap, redundancy, and low cost.
5.2 RAILS Design
We introduce a new power delivery solution tuned for
PowerNap: the Redundant Array for Inexpensive Load Shar-
ing (RAILS). The central idea of our scheme is to load-
share over multiple inexpensive, small PSUs to provide the
efficiency and reliability of larger, more expensive units.
Through intelligent sizing and load-sharing, we ensure that
active PSUs operate in their efficiency sweet spots. Our
scheme provides 80+ efficiency and enterprise-class redun-
dancy with commodity components.
RAILS targets three key objectives: (1) efficiency across the
entire PowerNap dynamic power range; (2) N+1 reliability
and graceful degradation of compute capacity under multiple
PSU failure; and (3) minimal cost.
Figure 8 illustrates RAILS. As in conventional blade enclosures, power is provided by multiple PSUs connected in parallel. A conventional load-sharing control circuit continuously monitors and controls the PSUs to ensure load is divided evenly among them.

[Figure 8: RAILS PSU Design. An array of N+1 small commodity PSUs shares the load; supplies that are not needed carry no current and are electrically isolated.]

As in HP's Dynamic Power Saver [15],
RAILS disables and electrically isolates PSUs that are not
necessary to supply the load. However, our key departure
from prior designs is in the granularity of the individual
PSUs. We select PSUs from the economic sweet spot of the
high-sales-volume market for low-wattage commodity sup-
plies.
We choose a power supply granularity to satisfy two criteria:
(1) A single supply must be operating in its “green” zone
when all blades are napping. This criterion establishes an
upper bound on the PSU capacity based on the minimum
chassis power draw when all blades are napping. (2) Subject
to this bound, we size PSUs to match the incremental power
draw of activating a blade. Thus, as each blade awakens, one
additional PSU is brought on line. Because of intelligent
sizing, each of these PSUs will operate in their optimal
efficiency region. Whereas current blade servers use multi-
kilowatt PSUs, a typical RAILS PSU might supply 500W.
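The two sizing criteria reduce to a small calculation, sketched below for illustration; the 40% "green"-zone floor comes from Section 5.1, and the example numbers are the Section 5.3 configuration.

```python
def rails_psu_size(chassis_nap_w, blade_active_w, green_floor=0.40):
    """Largest PSU capacity (W) meeting both RAILS criteria. The chosen
    size must also be at least chassis_nap_w so one PSU can carry the
    all-napping load."""
    green_cap = chassis_nap_w / green_floor  # (1) nap load >= 40% of one PSU
    return min(green_cap, blade_active_w)    # (2) one PSU per awakened blade

# Section 5.3 configuration: 16 blades napping at 10.4 W plus 270 W chassis.
nap_w = 16 * 10.4 + 270                      # ~436 W enclosure nap draw
print(rails_psu_size(nap_w, blade_active_w=450))  # 450 -> a ~500 W PSU
```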
RAILS meets its cost goals by incorporating high-volume
commodity components. Although the form-factor of com-
modity PSUs may prove awkward for rack-mount blade en-
closures, precluding the use of off-the-shelf PSUs, the power
density of high-sales-volume PSUs differs little from high-
end server supplies. Hence, with appropriate mechanical
modifications, it is possible to pack RAILS PSUs in roughly
the same physical volume as conventional blade enclosure
power systems.
RAILS meets its reliability goals by providing fine-grain
degradation of the system’s peak power capacity as PSUs
fail. In any N+1 design, the first PSU failure does not af-
fect compute capacity. However, in conventional blade en-
closures, a subsequent failure may force shutdown of several
(possibly all) blades. Multiple-failure tolerance typically re-
quires 2N redundancy, which is expensive. In contrast, in
RAILS, where PSU capacity is matched to the active power
draw of a single blade, the second and subsequent failures each require the shutdown of only one blade.

[Figure 9: Power Supply Pricing. Price per watt ($/W) versus maximum output (W) for commodity, 80+, and blade PSUs.]
5.3 Evaluation
We evaluate the power efficiency and cost of PowerNap with
four power supply designs, commodity supplies (“Commod-
ity”), high-efficiency 80+ supplies (“80+”), dynamic load
sharing (“Dynamic”), and RAILS (“RAILS”). We evalu-
ate all four designs in the context of a PowerNap-enabled
blade system similar to HP’s Blade Center c7000. We as-
sume a fully populated chassis with 16 half-height blades.
Each blade consumes 450W at peak, 270W at idle without
PowerNap, and 10.4W in PowerNap (see Table 4). We as-
sume the blade enclosure draws 270W (we neglect any vari-
ation in chassis power as a function of the number of active
blades). The non-RAILS systems employ four 2250W PSUs
(sufficient to provide N+1 redundancy). The RAILS design
uses 17 500W PSUs. We assume the average efficiency char-
acteristic from Figure 7 for commodity PSUs.
Cost. Server components are sold in relatively low vol-
umes compared to desktop or embedded products, and thus,
command premium prices. Some Internet companies (e.g.,
Google), have eschewed enterprise servers and instead as-
semble systems from commodity components to avoid these
premiums. PSUs present another opportunity to capitalize
on low-cost commodity components. Because desktop ATX
PSUs are sold in massive volumes, their constituent compo-
nents are cheap. A moderately-sized supply can be obtained
at extremely low cost. Figure 9 shows a survey of PSU prices
in dollars per watt for a wide range of PSUs across market
segments. Price per Watt increases rapidly with power deliv-
ery capacity. This rise can be attributed to the proportional
increase in required size for power components such as in-
ductors and capacitors. Also, the price of discrete power
components grows with size and maximum current rating.
Presently, the market sweet spot is around 500W supplies.
Table 5: Relative PSU Density.
Form factor                   microATX   ATX    Custom Blade
Density (normalized W/vol.)   675.5      1000   1187

Both 80+ and blade server PSUs are substantially more expensive than commodity parts. Because RAILS uses com-
modity PSUs with small maximum outputs, it takes advan-
tage of PSU market economics, making RAILS far cheaper
than proprietary blade PSUs.
Power Density. In data centers, rack space is at a premium,
and, hence, the physical volume occupied by a blade en-
closure is a key concern. RAILS drastically increases the
number of distinct PSUs in the enclosure, but each PSU is
individually smaller. To confirm the feasibility of RAILS,
we have compared the highest power density available in
commodity PSUs, which conform to one of several stan-
dard form-factors, with that of PSUs designed for blade cen-
ters, which may have arbitrary dimensions. Table 5 com-
pares the power density of two commodity form factors with
the power density of HP’s c7000 PSUs. We report density
in terms of Watts per unit volume normalized to the volume
of one ATX power supply. The highly-compact microATX
form factor exhibits the worst power density—these units
have been optimized for small dimensions but are employed
in small form-factor devices that do not require high peak
power. Though they are not designed for density, commod-
ity ATX supplies are only 16% less dense than enterprise-
class supplies. Furthermore, as RAILS requires only a single
output voltage, eliminating the need for many of a standard
ATX PSU’s components, we conclude that RAILS PSUs fit
within blade enclosure volumetric constraints.
Power Savings and Energy Efficiency. To evaluate each
power system, we calculate expected power draw and con-
version efficiency across blade ensemble utilizations. As
noted in Section 2, low average utilization manifests as brief
bursts of activity where a subset of blades draw near-peak
power. The efficiency of each power delivery solution de-
pends on how long blades are active and how many are
simultaneously active. For each utilization, we construct a
probability mass function for the number of simultaneously
active blades, assuming utilization across blades is uncorre-
lated. Hence, the number of active blades follows a bino-
mial distribution. From the distribution of active blades, we
compute an expected power draw and determine conversion
losses from the power supply's efficiency-versus-load curve.
We obtain efficiency curves from the Energy Star Bronze
80+ specification [26] for 80+ PSUs and [5] for commodity
PSUs.
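The sketch below illustrates this calculation for a RAILS-style system that engages only as many PSUs as the load requires. The step-function efficiency curve is a crude stand-in for Figure 7 (the evaluation itself uses the measured curves from [5] and [26]), so the absolute numbers are illustrative only.

```python
from math import ceil, comb

def expected_efficiency(u, n=16, p_active=450.0, p_nap=10.4,
                        chassis=270.0, psu_w=500.0):
    """Expected conversion efficiency when each blade is independently
    active with probability u and only the needed PSUs are engaged."""
    def eff(load):  # crude stand-in for the Figure 7 efficiency curve
        return 0.55 if load < 0.2 else (0.75 if load < 0.4 else 0.88)
    dc = ac = 0.0
    for k in range(n + 1):
        pk = comb(n, k) * u**k * (1 - u)**(n - k)    # binomial PMF
        power = chassis + k * p_active + (n - k) * p_nap
        n_psu = max(1, ceil(power / psu_w))          # engaged PSUs
        load = min(power / (n_psu * psu_w), 1.0)     # per-PSU load share
        dc += pk * power                             # delivered DC power
        ac += pk * power / eff(load)                 # drawn AC power
    return dc / ac

print(expected_efficiency(0.2))   # expected efficiency at 20% utilization
```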
Figure 10 compares the relative efficiency of PowerNap un-
der each power delivery solution. Using commodity (“Com-
modity”) or high efficiency (“80+”) PSUs results in the low-
est efficiency, as PowerNap’s low power draw will operate
these power supplies in the "red" zone.

[Figure 10: Power Delivery Solution Comparison. Conversion efficiency (%) versus utilization for Commodity, 80+, Dynamic, and RAILS power delivery.]

RAILS ("RAILS")
and Dynamic Load-Sharing (“Dynamic”) both improve
PSU performance because they increase average PSU load.
RAILS outperforms all of the other options because its fine-
grain sizing best matches PowerNap’s requirements.
6. Conclusion
We presented PowerNap, a method for eliminating idle
power in servers by quickly transitioning in and out of
an ultra-low power state. We have constructed an analytic
model to demonstrate that, for typical server workloads,
PowerNap far exceeds DVFS’s power savings potential with
better response time. Because of PowerNap’s unique power
requirements, we introduced RAILS, a novel power delivery
system that improves power conversion efficiency, provides
graceful degradation in the event of PSU failures, and re-
duces costs.
To conclude, we present a projection of the effectiveness
of PowerNap with RAILS in real commercial deployments.
We construct our projections using the commercial high-
density server utilization traces described in Table 1. Ta-
ble 6 presents the power requirements, energy-conversion ef-
ficiency and total power costs for three server configurations:
an unmodified, modern blade center such as the HP c7000;
a PowerNap-enabled system with large, conventional PSUs
(“PowerNap”); and PowerNap with RAILS. The power costs
include the estimated purchase price of the power delivery
system (conventional high-wattage PSUs or RAILS), 3-year
power costs assuming California’s commercial rate of 11.15
cents/kWh [28], and a cooling burden of 0.5W per 1W of IT
equipment [18].
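For reference, the 3-year energy portion of these power costs is simple arithmetic, sketched below; Table 6 additionally folds in the estimated purchase price of the power delivery system, so the printed figures differ modestly from the table's rounded totals.

```python
RATE_USD_KWH = 0.1115       # California commercial rate [28]
HOURS_3YR = 3 * 365 * 24    # three years of operation
COOLING = 1.5               # 0.5 W cooling burden per 1 W of IT power [18]

def power_cost(avg_kw):
    return avg_kw * HOURS_3YR * RATE_USD_KWH * COOLING

for name, kw in [("Blade", 6.4), ("PowerNap", 1.9), ("PowerNap+RAILS", 1.4)]:
    print(f"{name}: ${power_cost(kw):,.0f}")
# -> ~$28k, ~$8.4k, ~$6.2k (energy only; Table 6 also includes PSU cost)
```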
PowerNap yields a striking reduction in average power rela-
tive to Blade of nearly 70% for Web 2.0 servers. Improving
the power system with RAILS shaves another 26%. Our total
power cost estimates demonstrate the true value of Power-
Nap with RAILS: our solution provides power cost reduc-
tions of nearly 80% for Web 2.0 servers and 70% for Enter-
prise IT.
Table 6: Power and Cost Comparison.
                      Web 2.0                           Enterprise IT
                      Power    Efficiency  Power cost   Power    Efficiency  Power cost
Blade                 6.4 kW   87%         $29k         6.6 kW   87%         $30k
PowerNap              1.9 kW   67%         $10k         2.6 kW   70%         $13k
PowerNap with RAILS   1.4 kW   86%         $6k          2.0 kW   86%         $9k
Acknowledgements
The authors would like to thank Partha Ranganathan and HP
Labs for the real-world data center utilization traces, An-
drew Caird and the staff at the Michigan Academic Com-
puter Center for assistance in collecting the Cluster utiliza-
tion trace, Laura Falk for assistance in collecting the depart-
mental server utilization traces, Mor Harchol-Balter for her
input on our queuing models, and the anonymous reviewers
for their feedback. This work was supported by an equip-
ment grant from Intel, and NSF grant CCF-0811320.
References
[1] L. Barroso and U. Hölzle, "The case for energy-proportional computing," IEEE Computer, Jan 2007.
[2] C. Bash and G. Forman, “Cool job allocation: Measuring the power
savings of placing jobs at cooling-efficient locations in the data
center,” in Proc. of the 2007 USENIX Annual Technical Conference,
Jan 2007.
[3] P. Bohrer, E. Elnozahy, T. Keller, M. Kistler, C. Lefurgy, and
R. Rajamony, "The case for power management in web servers," Power Aware Computing, Jan 2002.
[4] J. Chase, D. Anderson, P. Thakar, and A. Vahdat, “Managing energy
and server resources in hosting centers,” in Proc. of the 18th ACM
Symposium on Operating Systems Principles, Jan 2001.
[5] ECOS and EPRI, "Efficient power supplies for data centers," Tech. Rep., Feb. 2008.
[6] X. Fan, W.-D. Weber, and L. A. Barroso, “Power provisioning for a
warehouse-sized computer,” in Proc. of the 34th Annual International
Symposium on Computer Architecture, 2007.
[7] U. Hölzle and B. Weihl, "PSU white paper," Google, Tech. Rep., Sep 2006.
[8] Hynix, “Hynix-DDR2-1Gb,” Aug 2008.
[9] Intel, “Intel Pentium M processor with 2-MB L2 cache and 533-MHz
front side bus,” Jul 2005.
[10] Intel, “Intel Pentium dual-core mobile processor,” Jun 2007.
[11] Intel, “Quad-core Intel Xeon processor 5400 series,” Apr 2008.
[12] J. Laudon, "UltraSPARC T1: A 32-threaded CMP for servers," invited talk, Apr 2006.
[13] C. Lefurgy, X. Wang, and M. Ware, “Server-level power control,”
in Proc. of the IEEE International Conference on Autonomic
Computing, Jan 2007.
[14] C. Lefurgy, K. Rajamani, F. Rawson, W. Felter, M. Kistler, and
T. W. Keller, “Energy management for commercial servers,” IEEE
Computer, vol. 36, no. 12, 2003.
[15] K. Leigh and P. Ranganathan, “Blades as a general-purpose
infrastructure for future system architectures: Challenges and
solutions,” HP Labs, Tech. Rep. HPL-2006-182, Jan 2007.
[16] Micron, “DDR2 SDRAM SODIMM,” Jul 2004.
[17] A. Miyoshi, C. Lefurgy, E. V. Hensbergen, R. Rajamony, and
R. Rajkumar, “Critical power slope: understanding the runtime
effects of frequency scaling,” in Proc. of the 16th International
Conference on Supercomputing, Jan 2002.
[18] J. Moore, J. Chase, P. Ranganathan, and R. Sharma, “Making
scheduling ‘cool’: Temperature-aware workload placement in data
centers,” in Proc. of the 2005 USENIX Annual Technical Conference,
Jan 2005.
[19] National Semiconductor, “Introduction to power supplies,” National
Semiconductor, Tech. Rep. AN-556, 2002.
[20] P. Padala, X. Zhu, Z. Wang, S. Singhal, and K. Shin, "Performance
evaluation of virtualization technologies for server consolidation,”
HP Labs, Tech. Rep. HPL-2007-59, 2007.
[21] N. Rasmussen, “AC vs. DC power distribution for data centers,”
American Power Conversion, Tech. Rep. #63, 2007.
[22] Samsung, "SSD SATA 3.0Gbps 2.5" data sheet," Mar 2008.
[23] S. Siddha, V. Pallipadi, and A. V. D. Ven, “Getting maximum mileage
out of tickless,” in Proc. of the 2007 Linux Symposium, 2007.
[24] SMSC, “LAN9420/LAN9420i single-chip ethernet controller with
HP Auto-MDIX support and PCI interface,” 2008.
[25] N. Tolia, Z. Wang, M. Marwah, C. Bash, P. Ranganathan, and X. Zhu,
“Delivering energy proportionality with non energy-proportional
systems – optimizing the ensemble,” in Proc. of the 1st Workshop on
Power Aware Computing and Systems (HotPower ’08), Dec 2008.
[26] U.S. EPA, “Energy Star computer specification v. 4.0,” U.S.
Environmental Protection Agency, Tech. Rep., July 2007.
[27] U.S. EPA, “Report to congress on server and data center energy
efficiency,” U.S. Environmental Protection Agency, Tech. Rep., Aug.
2007.
[28] U.S. Energy Information Administration, "Average retail price of electricity to ultimate customers by end-use sector, by state," Jul 2008.
[29] P. D. Welch, "On a generalized M/G/1 queuing process in which the first customer of each busy period receives exceptional service," Operations Research, vol. 12, pp. 736-752, 1964.
[30] Q. Wu, P. Juang, M. Martonosi, L. Peh, and D. Clark, "Formal control techniques for power-performance management," IEEE Micro, no. 5, Jan. 2005.
... The typical draw is then divided by 0.66 to obtain a maximum power draw of 750 W [38]. To adjust for the efficiency increase in hardware from 2005 to 2011, the maximum power draw is set to an educated guess of 500 W. The power consumption of idle machines is then set to 60% of the maximum power draw [39], at 300 watts. Power usage effectiveness (PUE) is a metric often used to measure the efficiency of data centers [40], by comparing total power consumption with IT power consumption. ...
... To address the high power consumption of idle machines, a secondary low-power state is introduced, which will give the servers running no tasks the option to "shut down", while still being available. In this state, the server can wake up and start processing quickly if new processing tasks arrive [39]. In this "nap"-state, the servers consume 10 W [39]. ...
... In this state, the server can wake up and start processing quickly if new processing tasks arrive [39]. In this "nap"-state, the servers consume 10 W [39]. ...
Article
Full-text available
The growing number of data centers consumes a vast amount of energy for processing. There is a desire to reduce the environmental footprint of the IT industry, and one way to achieve this is to use renewable energy sources. A challenge with using renewable resources is that the energy output is irregular as a consequence of the intermittent nature of this form of energy. In this paper, we propose a simple yet efficient latency-aware workload scheduler that creates an energy-agile workload by deferring tasks with low latency sensitivity to periods with excess renewable energy. The scheduler also increases the overall efficiency of the data center by packing the workload into as few servers as possible, using neural-network-based predictions of resource usage on an individual task basis to avoid provisioning an unnecessary excess of servers. The scheduler was tested on a subset of real-world workload traces and real-world wind-power generation data, simulating a small-scale data center co-located with a wind turbine. Extensive experimental results show that the devised scheduler reduced the number of active servers in periods of low wind-power production up to 93% of the time, by postponing tasks with low latency sensitivity to a later interval.
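A minimal sketch of the deferral policy this abstract describes, with hypothetical names and thresholds (the paper's actual scheduler also uses neural-network usage predictions, omitted here):

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    latency_sensitive: bool   # e.g., interactive vs. batch

def schedule(tasks, renewable_surplus_kw, threshold_kw=0.0):
    """Run latency-sensitive tasks now; defer batch tasks unless
    there is excess renewable power available."""
    run_now, deferred = [], []
    for t in tasks:
        if t.latency_sensitive or renewable_surplus_kw > threshold_kw:
            run_now.append(t)
        else:
            deferred.append(t)   # wait for a greener interval
    return run_now, deferred

# e.g., one interactive and one batch task during a lull in wind power:
print(schedule([Task("web", True), Task("backup", False)], 0.0))
```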
... This method places VMs on as few physical machines (PMs) as possible, and a PM hosting no VMs is put into sleep mode to reduce energy consumption, as in the research of [3]. VM consolidation technology, however, must also consider Quality of Service (QoS). ...
... 2. Overload detection: decide whether a host is overloaded. 3. VM selection: select the VMs to migrate from hosts detected as overloaded. 4. VM placement: decide a placement mapping between the migrating VMs and hosts. ...
... LRR-1.0 and MHOD-0.3 ...
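A skeletal sketch of the consolidation round these steps describe (step 1 is elided in the excerpt; the Host type, threshold, and selection/placement policies below are illustrative assumptions, not the cited paper's algorithms):

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    cpu_used: float                          # utilization fraction, 0..1
    vms: list = field(default_factory=list)  # (vm_name, cpu_share) pairs

OVERLOAD = 0.8   # hypothetical utilization threshold

def consolidation_round(hosts):
    """One pass over steps 2-4 from the excerpt above."""
    migrations = {}
    for host in hosts:
        if host.cpu_used > OVERLOAD:                        # 2. overload detection
            vm_name, _ = max(host.vms, key=lambda v: v[1])  # 3. select busiest VM
            target = min((h for h in hosts if h is not host),
                         key=lambda h: h.cpu_used)          # 4. least-loaded target
            migrations[vm_name] = (host.name, target.name)
    return migrations

hosts = [Host("h1", 0.9, [("vm1", 0.5), ("vm2", 0.4)]), Host("h2", 0.2)]
print(consolidation_round(hosts))   # {'vm1': ('h1', 'h2')}
```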
Article
Full-text available
With the increasing use of cloud computing, high energy consumption has become one of the major challenges in cloud data centers. Virtual Machine (VM) consolidation has been proven to be an efficient way to optimize energy consumption in data centers, and many research works have proposed to optimize VM consolidation. However, the performance of different algorithms depends on the characteristics of the workload and the system status; some algorithms are suitable for Central Processing Unit (CPU)-intensive workloads and some for web-application workloads. Therefore, an adaptive VM consolidation framework is necessary to fully explore the potential of these algorithms. Neat is an open-source dynamic VM consolidation framework, which is well integrated into OpenStack. However, it cannot conduct dynamic algorithm scheduling, and the VM consolidation algorithms in Neat are few and basic, which results in low performance for energy saving and Service-Level Agreement (SLA) avoidance. In this paper, an Intelligent Neat framework (I-Neat) is proposed, which adds an intelligent scheduler using reinforcement learning and a framework manager to improve the usability of the system. The scheduler can select appropriate algorithms for the local manager from an algorithm library with many load detection algorithms. The algorithm library is designed based on a template, and in addition to the algorithms of Neat, I-Neat adds six new algorithms to the algorithm library. Furthermore, the framework manager helps users add self-defined algorithms to I-Neat without modifying the source code. Our experimental results indicate that the intelligent scheduler and these novel algorithms can effectively reduce energy consumption with SLA assurance.
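One simple form such a reinforcement-learning scheduler could take is a bandit that learns which detection algorithm earns the best reward. A minimal epsilon-greedy sketch, with hypothetical algorithm names and an externally supplied reward (e.g., energy saved minus an SLA-violation penalty):

```python
import random

class EpsilonGreedyScheduler:
    """Pick a load-detection algorithm per round; learn from reward."""
    def __init__(self, algorithms, epsilon=0.1):
        self.algorithms = list(algorithms)
        self.epsilon = epsilon
        self.value = {a: 0.0 for a in self.algorithms}
        self.count = {a: 0 for a in self.algorithms}

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(self.algorithms)        # explore
        return max(self.algorithms, key=self.value.get)  # exploit

    def update(self, algorithm, reward):
        """Incremental running average of observed reward."""
        self.count[algorithm] += 1
        n = self.count[algorithm]
        self.value[algorithm] += (reward - self.value[algorithm]) / n

sched = EpsilonGreedyScheduler(["THR", "LR", "LRR", "MHOD"])
algo = sched.choose()
sched.update(algo, reward=1.0)   # reward would come from monitoring
```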
... Among the existing approaches to thermal and energy monitoring in data centers, the predominant ones are highly intrusive (e.g., [5]), i.e., they are not transparent to the applications running on the servers. This is the case for data-center power-management policies that change the processor's operating frequency or shut down parts of the hardware, for example [6], [7]. ...
Conference Paper
Full-text available
Abstract—Society is increasingly dependent on large data centers. Search systems, e-commerce systems, and cloud computing are examples of applications that require large data centers. One of the great challenges of Green Computing is to reconcile the computational demand of these centers with the need to reduce their operating cost and the environmental impact caused by their high energy consumption. Among the main contributors to this consumption is the cooling system of these facilities. To increase the efficiency of these cooling systems, one first needs to monitor them in order to obtain information about their behavior. To do this indirectly, and as unobtrusively as possible for the facility, we propose a wireless sensor network for thermal monitoring, distributed throughout the environment. The network was deployed in a real data center and has an infrastructure that allows it to provide the facility's thermal data in real time, with delivery guarantees above 99% and millisecond precision for the instant of collection. I. INTRODUCTION: Information and Communication Technologies (ICT) contribute directly to more than 2% of global CO2 emissions, an amount expected to double by 2020 [1]. In this scenario, ICT would surpass the emissions of highly polluting industries, such as aviation. A study conducted by the U.S. Environmental Protection Agency [2] in 2007 estimated the peak demand of these systems at 7 gigawatts. Data centers worldwide emitted approximately 116 million tonnes of carbon annually by 2008, more than the entire emissions of Nigeria [3]. Because of the heat dissipated by the various pieces of equipment in a data center, servers need adequate cooling to operate reliably. Thus, cooling systems in many data centers are set to operate at very low temperatures. Large commercial clusters require thousands of processors and a large installation area, and most of the time the facility, the servers, and the load distribution are heterogeneous. As a result, thermal imbalance naturally arises in various parts of the data center. This kind of phenomenon (i.e., heat islands) is quite complex and hard to predict without a large amount of precise information about the facility. It is estimated that for each watt spent on data processing, another watt is consumed on cooling [4]. Moreover, data-center managers, lacking the information to diagnose the real cause of the problem, tend to increase the cooling system's power even further when some servers approach an operational thermal threshold. A monitoring system based on wireless sensor networks (WSNs) has the key advantage of being independent of the environment being monitored (non-intrusive), since the devices are powered autonomously by batteries. Additionally, thanks to their low consumption, they have a long lifetime. In data centers, especially those handling sensitive information (e.g., data centers of financial institutions), low intrusiveness is a fundamental condition for adopting any monitoring solution. WSNs also operate efficiently and generically regardless of the characteristics of the monitored data center, overcoming numerous restrictions that may be found there (e.g., accessibility, interaction).
Furthermore, a WSN provides good granularity in the collected data regardless of the state of the data center's servers (e.g., on, sleeping, or off). The data-center environment is a broad source of interference for any kind of wireless communication, especially for low-power devices such as those used in this work. There is also the problem common to all battery-powered WSNs, namely the short lifetime of the nodes. Finally, there is the need to guarantee the temporal precision of the instant at which the data were collected. In this context, the goal of this work is to investigate the feasibility of a wireless sensor network for thermal monitoring of data centers. To this end, a working sensor network installed in a real data center collected thermal information during the facility's normal operation. The network infrastructure is based on the aggregation of available WSN technologies, chosen to prioritize wide adoption and good support from the developer community. The solution is generic, as it can serve a wide variety of data-center types because it requires no direct interaction with the facility's systems. The proposed network can collect thermal data about the data-center environment with millisecond precision. The following sections present related work using WSNs as a solution to the problem of thermal monitoring of data centers, followed by the network topology, its components and configurations, ...
... A normal server needs about 500 W [121], so 238 RPi2s draw as much power as one server. Table 10.2: Power consumption observed using a multimeter for several benchmarks of a single RPi2. ...
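For scale, the per-node figure implied by this comparison works out to 500 W ÷ 238 ≈ 2.1 W per RPi2.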
Thesis
Full-text available
Nowadays businesses, governments and industries rely heavily on ICT solutions. Since these ICT solutions often have high space, security, availability and performance requirements, data centres provide physical locations to facilitate networks of servers for processing and storage purposes of these ICT solutions. The increasing worldwide energy consumption of data centres has a significant impact on the world's ecosystem through an increase in greenhouse gases for the generation of necessary electricity. This has led to an increased attention in the global political agenda. Moreover, the high energy consumption in data centres has also led to high operational costs with the consequence that even the smallest improvements in currently active systems could significantly ease the financial burden. These reasons have led to a greater need for energy-efficient data centres. In this thesis, we propose that model-based analysis of power and performance can assist energy saving techniques with meaningful insights in data centres that strive for energy-efficiency. Therefore, two sets of power and performance models for energy-efficient data centres are proposed and analysed. We show that exchanging power at the expense of performance caused by energy saving techniques can lead to so-called power-performance trade-offs, which offers additional flexibility to data centre design. Consequently, power management is studied by modelling this feature and proposing an evaluation method for power management strategies. Also, the potential of combining power management with advanced cooling is analysed, in order to save even more energy. To determine the degree to which the models correspond to the real world, our models are experimentally validated. For this reason, the simulation models are calibrated with parameters obtained through workload modelling using workload traces from a real data centre. Moreover, a cross-model validation is used to compare power and performance estimates of the same system with two different modelling and analysis techniques. Furthermore, we propose an experimental micro data centre and compare it with a real data centre and apply the experimental setup for validation purposes.
... This aimed to achieve a trade-off between power consumption and latency with lower overhead. The study presented in [20] introduced an approach in which nodes enter a sleep state once all pending tasks have been processed; only when a new task arrives do the sleeping components transition back to their active state. ...
Article
Full-text available
Fog computing could potentially cause the next paradigm shift by extending cloud services to the edge of the network, bringing resources closer to the end-user. With its close proximity to end-users and its distributed nature, fog computing can significantly reduce latency. With the appearance of more and more latency-stringent applications, in the near future we will witness an unprecedented amount of demand for fog computing. Undoubtedly, this will lead to an increase in the energy footprint of the network edge and access segments. To reduce energy consumption in fog computing without compromising performance, in this paper we propose the Green-Demand-Aware Fog Computing (GDAFC) solution. Our solution uses a prediction technique to identify the working fog nodes (nodes that serve requests as they arrive), standby fog nodes (nodes that take over when the computational capacity of the working fog nodes is no longer sufficient), and idle fog nodes in a fog computing infrastructure. Additionally, it assigns an appropriate sleep interval to the fog nodes, taking into account the delay requirements of the applications. Results obtained from the mathematical formulation show that our solution can save up to 65% of energy without violating the applications' delay requirements.
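A minimal sketch of the sleep-interval assignment this abstract outlines, under the simplifying assumption that a node may sleep only as long as the tightest delay budget among its applications allows (the function name, budget inputs, and wake-latency term are hypothetical):

```python
def sleep_interval_ms(app_delay_budgets_ms, wake_latency_ms):
    """Longest sleep that still lets a fog node wake in time for its
    most delay-stringent application."""
    tightest = min(app_delay_budgets_ms)
    return max(0.0, tightest - wake_latency_ms)

# Apps tolerating 20, 50, and 100 ms on a node that wakes in 5 ms:
print(sleep_interval_ms([20, 50, 100], 5))   # 15
```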
Chapter
In recent decades, massive economic growth and an accelerating global transition have driven heavy use of natural resources, while computer technology has become a standard feature of everyday life. The IT sector's energy use has risen significantly as computing and IT services have become widely popular. Cloud computing is arguably the most significant ICT platform, and nearly every web user has used it directly or indirectly. At ultra scale, it comprises enormous data centers with many thousands of servers and supporting equipment, and these infrastructures account for 1.1% to 1.5% of total electricity production, a share that may grow further. We also survey recent advances in cloud computing with respect to energy efficiency, discussing state-of-the-art strategies from the literature covering servers, networks, cloud management platforms, and end-user applications. This chapter describes the benefits and trade-offs of adopting energy-conservation measures, identifies the challenges of adoption, and outlines recommendations for future research.
Article
Due to the rapid increase in the number and scale of data centers, the information and communication technology (ICT) equipment in data centers consumes an enormous amount of power. A power prediction model is therefore essential for decision‐making optimization and power management of ICT equipment. However, it is difficult to predict the power consumption of data centers accurately due to the complex power patterns and nonlinear interdependencies among components. Existing methods either rely on standard formulas, or simply treat it as time series, both leading to poor power prediction accuracy. To overcome those limitations, in this article, we present a systematic power prediction framework called characteristic aware attention‐augmented deep learning‐based prediction method. In particular, we first analyze the different power consumption series to illustrate their different temporal characteristics. Second, we perform different data processing for the corresponding characteristics of power series samples. Third, we propose an accurate and efficient neural network model to predict future power consumption with the pretreated data. The experimental results show that the proposed model is able to achieve superior prediction accuracy.
Article
Full-text available
This report was prepared in response to the request from Congress stated in Public Law 109-431 (H.R. 5646), "An Act to Study and Promote the Use of Energy Efficient Computer Servers in the United States." This report assesses current trends in energy use and energy costs of data centers and servers in the U.S. (especially Federal government facilities) and outlines existing and emerging opportunities for improved energy efficiency. It also makes recommendations for pursuing these energy-efficiency opportunities broadly across the country through the use of information and incentive-based programs.
Article
Full-text available
Server consolidation has become an integral part of IT planning to reduce cost and improve efficiency in today's enterprise datacenters. The advent of resource virtualization allows consolidation of multiple applications into virtual servers hosted on a single or multiple physical servers. However, despite its salient features, this poses new challenges, including selection of the right virtualization technology and consolidation configuration for a particular set of applications. In this paper, we evaluate two representative virtualization technologies, Xen and OpenVZ, in various configurations. We consolidate one or more multi-tiered systems onto one or two nodes and drive the system with an auction workload called RUBiS. We compare both technologies with a base system in terms of application performance, resource consumption, scalability, low-level system metrics like cache misses, and virtualization-specific metrics like Domain-0 consumption in Xen. Our experiments indicate that the average response time can increase by more than 400% in Xen and by a more modest 100% in OpenVZ as the number of application instances grows from one to four. This large discrepancy is found to come from the higher virtualization overhead in Xen, which is likely caused by higher L2 cache misses and larger number of misses per instruction. A similar trend is observed in the CPU consumptions of the virtual servers. We analyze the overhead with kernel-symbol-specific information generated by Oprofile and suggest possible remedies for these problems.
Article
Full-text available
Internet hosting centers serve multiple service sites from a common hardware base. This paper presents the design and implementation of an architecture for resource management in a hosting center operating system, with an emphasis on energy as a driving resource management issue for large server clusters. The goals are to provision server resources for co-hosted services in a way that automatically adapts to offered load, improve the energy efficiency of server clusters by dynamically resizing the active server set, and respond to power supply disruptions or thermal events by degrading service in accordance with negotiated Service Level Agreements (SLAs). Our system is based on an economic approach to managing shared server resources, in which services "bid" for resources as a function of delivered performance. The system continuously monitors load and plans resource allotments by estimating the value of their effects on service performance. A greedy resource allocation algorithm adjusts resource prices to balance supply and demand, allocating resources to their most efficient use. A reconfigurable server switching infrastructure directs request traffic to the servers assigned to each service. Experimental results from a prototype confirm that the system adapts to offered load and resource availability, and can reduce server energy usage by 29% or more for a typical Web workload.
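One standard way to realize the greedy, price-balancing allocation this abstract describes is to have services submit diminishing marginal bids and grant units to the highest remaining bid; a minimal sketch with hypothetical service names and bid values:

```python
import heapq

def greedy_allocate(bids, capacity):
    """bids: {service: [marginal value of 1st unit, 2nd unit, ...]}.
    Repeatedly grant one resource unit to the service with the highest
    remaining marginal bid until capacity is exhausted."""
    heap = [(-vals[0], svc, 0) for svc, vals in bids.items() if vals]
    heapq.heapify(heap)
    alloc = {svc: 0 for svc in bids}
    while capacity > 0 and heap:
        neg_val, svc, i = heapq.heappop(heap)
        alloc[svc] += 1
        capacity -= 1
        if i + 1 < len(bids[svc]):      # next (diminishing) marginal bid
            heapq.heappush(heap, (-bids[svc][i + 1], svc, i + 1))
    return alloc

print(greedy_allocate({"web": [9, 7, 2], "batch": [5, 4, 3]}, 4))
# {'web': 2, 'batch': 2}
```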
Article
Full-text available
Bladed servers are increasingly being adopted in enterprise data centers by virtue of the improved benefits they offer in form factor density, modularity, and more robust management for control and maintenance with respect to rack-optimized servers. In the future, such servers are likely to form the key foundational blocks for a variety of system architectures in future data centers. However, designing a commodity blade system environment that can serve as a general-purpose infrastructure platform for a wide variety of future system architectures poses several challenges. This paper discusses these challenges and presents some specific solutions in the context of the HP BladeSystem™ c-Class products.
Conference Paper
Full-text available
Energy efficiency is becoming an increasingly important feature for both mobile and high-performance server systems. Most processors designed today include power management features that provide processor operating points which can be used in power management algorithms. However, existing power management algorithms implicitly assume that lower performance points are more energy efficient than higher performance points. Our empirical observations indicate that for many systems, this assumption is not valid. We introduce a new concept called critical power slope to explain and capture the power-performance characteristics of systems with power management features. We evaluate three systems - a clock-throttled Pentium laptop, a frequency-scaled PowerPC platform, and a voltage-scaled system - to demonstrate the benefits of our approach. Our evaluation is based on empirical measurements of the first two systems, and publicly available data for the third. Using critical power slope, we explain why on the Pentium-based system, it is energy efficient to run only at the highest frequency, while on the PowerPC-based system, it is energy efficient to run at the lowest frequency point. We confirm our results by measuring the behavior of a web serving benchmark. Furthermore, we extend the critical power slope concept to understand the benefits of voltage scaling when combined with frequency scaling. We show that in some cases, it may be energy efficient not to reduce voltage below a certain point.
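A worked toy instance of the effect described here (all numbers invented): with a fixed amount of work, whether a lower operating point saves energy hinges on how the frequency-dependent power compares with the frequency-independent part, and on whether voltage scales with frequency.

```python
def energy_per_job(f, p_dyn_coeff, p_static_w, work, exponent=1):
    """Energy = power * time, with time = work / f.
    exponent=1 models frequency scaling alone (dynamic power linear in f);
    exponent=3 approximates combined voltage + frequency scaling."""
    power_w = p_dyn_coeff * f ** exponent + p_static_w
    return power_w * (work / f)

# Frequency scaling alone: the higher point always wins, because dynamic
# energy is fixed while static energy grows with runtime.
print(energy_per_job(2.0, 10, 40, 10))   # 300.0 J at the high point
print(energy_per_job(1.0, 10, 40, 10))   # 500.0 J at the low point

# Voltage + frequency scaling with small static power: the low point wins.
print(energy_per_job(2.0, 5, 2, 10, exponent=3))   # 210.0 J
print(energy_per_job(1.0, 5, 2, 10, exponent=3))   # 70.0 J
```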
Conference Paper
Full-text available
We present a technique that controls the peak power consumption of a high-density server by implementing a feedback controller that uses precise, system-level power measurement to periodically select the highest performance state while keeping the system within a fixed power constraint. A control theoretic methodology is applied to systematically design this control loop with analytic assurances of system stability and controller performance, despite unpredictable workloads and running environments. In a real server we are able to control power over a 1 second period to within 1 W. Additionally, we have observed that power over an 8 second period can be controlled to within 0.1 W. We believe that we are the first to demonstrate such precise control of power in a real server. Conventional servers respond to power supply constraint situations by using simple open-loop policies to set a safe performance level in order to limit peak power consumption. We show that closed-loop control can provide higher performance under these conditions and test this technique on an IBM BladeCenter HS20 server. Experimental results demonstrate that closed-loop control provides up to 82% higher application performance compared to open-loop control and up to 17% higher performance compared to a widely used ad-hoc technique.
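A minimal sketch of the closed-loop idea described above, reduced to a bang-bang controller stepping through discrete performance states (the step rule, margin, and callback names are hypothetical simplifications of the paper's control-theoretic design):

```python
import time

def power_cap_loop(measure_power_w, set_pstate, budget_w,
                   n_states, period_s=1.0, rounds=60):
    """Periodically pick the highest performance state that keeps the
    measured power under budget_w. Callers supply measure_power_w()
    and set_pstate(i) for their platform."""
    state = n_states - 1                     # start at highest performance
    for _ in range(rounds):
        power = measure_power_w()
        if power > budget_w and state > 0:
            state -= 1                       # over budget: step down
        elif power < 0.95 * budget_w and state < n_states - 1:
            state += 1                       # headroom: step back up
        set_pstate(state)
        time.sleep(period_s)
```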
Article
Now that the tickless (dynticks) infrastructure is integrated into the base kernel, this paper discusses various add-on changes that make tickless kernels more effective. Tickless kernels pose some hardware challenges, primarily exposed by the requirement of a continuously running per-CPU timer. We discuss how this issue was resolved by using the HPET in a new mode. Eliminating idle periodic ticks causes the kernel process scheduler not to perform idle balancing as frequently as it otherwise would. We provide insight into how this tricky issue of saving power with minimal impact on performance is resolved in the tickless kernel. We also look at kernel- and user-level daemons and drivers that poll with their own timers, their side effect on overall system idle time, and suggestions on how to make these daemons and drivers tickless-friendly.
Article
The following generalization of the M/G/1 queue is considered. If a customer arrives when the server is busy, his service time has a distribution function G_b(x); while if he arrives when the server is idle, his service time has a different distribution function, G_e(x). Results are obtained that characterize the transient and asymptotic distributions of the queue size, waiting time, and waiting-plus-service time. These results are then applied to the special case of a queue with a single service-time distribution function, but with additional independent delay times that have one distribution function for customers arriving when the server is idle and another for customers arriving when the server is busy.
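This model maps directly onto a napping server: the first job of each busy period pays the wake-up transition in addition to its service. A minimal simulation sketch, assuming Poisson arrivals and deterministic service times (parameter values are illustrative):

```python
import random

def mg1_exceptional(lam, s_busy, s_first, n_jobs=100_000, seed=1):
    """Mean response time for an M/G/1 queue in which the first job of
    each busy period takes s_first (e.g., service plus wake-up latency)
    and all other jobs take s_busy. FCFS, deterministic service."""
    rng = random.Random(seed)
    clock = 0.0          # arrival clock
    free_at = 0.0        # time the server next becomes idle
    total_resp = 0.0
    for _ in range(n_jobs):
        clock += rng.expovariate(lam)
        if clock >= free_at:             # server idle: busy period starts
            start, service = clock, s_first
        else:                            # server busy: queue behind others
            start, service = free_at, s_busy
        free_at = start + service
        total_resp += free_at - clock
    return total_resp / n_jobs

# Half-utilized server whose wake-up adds 0.3 to the first job's service:
print(mg1_exceptional(lam=0.5, s_busy=1.0, s_first=1.3))
```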