Optimizing Concurrency Levels in the .NET ThreadPool: A
Case Study of Controller Design and Implementation
Joseph L. Hellerstein, Microsoft Developer Division, One Microsoft Way, Redmond, WA, USA (joehe@microsoft.com)
Vance Morrison, Microsoft Developer Division, One Microsoft Way, Redmond, WA, USA (vancem@microsoft.com)
Eric Eilebrecht, Microsoft Developer Division, One Microsoft Way, Redmond, WA, USA (ericeil@microsoft.com)
ABSTRACT
This paper presents a case study of developing a hill climb-
ing concurrency controller (HC3) for the .NET ThreadPool.
The intent of the case study is to provide insight into soft-
ware considerations for controller design, testing, and imple-
mentation. The case study is structured as a series of issues
encountered and approaches taken to their resolution. Ex-
amples of issues and approaches include: (a) addressing the
need to combine a hill climbing control law with rule-based
techniques by the use of hybrid control; (b) increasing the ef-
ficiency and reducing the variability of the test environment
by using resource emulation; and (c) effectively assessing
design choices by using test scenarios for which the optimal
concurrency level can be computed analytically and hence
desired test results are known a priori. We believe that these
issues and approaches have broad application to controllers
for resource management of software systems.
1. INTRODUCTION
Over the last decade, many researchers have demonstrated
the benefits of using control theory to engineer resource man-
agement solutions. Such benefits have been demonstrated
for controlling quality of service in web servers [15], regu-
lating administrative utilities in database servers [11], con-
trolling utilizations in real time systems [14], and optimizing
TCP/IP [7]. Despite these results and the availability of in-
troductory control theory texts for computing practitioners
(e.g., [6]), control theory is rarely used by software practi-
tioners. We believe that one reason for this is that deploying
closed loop systems for software products has a number of
challenges related to software design, testing, and implemen-
tation that are not considered in existing research publica-
tions. This paper provides insights into these considerations
through a case study of the development of a controller for
optimizing concurrency levels in the Microsoft .NET Com-
mon Language Runtime (CLR) ThreadPool.
The problem of concurrency management occurs frequently
in software systems. Examples include: determining the set
Figure 1: Concurrency-throughput curve for a synthetic workload (throughput versus concurrency level). Throughput degrades if the concurrency level exceeds 20 due to the overhead of context switching.

Figure 2: Block diagram for controlling concurrency levels in the .NET ThreadPool. Work enters via QueueUserWorkItem(); the controller sets the concurrency level, and the ThreadPool reports throughput (completions/sec) back to the controller.
of active jobs in virtual memory systems, selecting the set
of active transactions in optimistic protocols for database
locking, and determining the number of nodes enabled for
transmission on a shared communications medium. Con-
currency management deals with the trade-off between (a)
increasing performance by having more activities happen-
ing concurrently and (b) reducing performance because of
interference between concurrent activities.
We use the term active set to refer to the collection of
activities that take place concurrently, and we use the term
concurrency level to refer to the size of the active set that
is specified by the concurrency controller. To illustrate the
trade-offs in concurrency management, consider the effect
on throughput as we increase the concurrency level of executing
threads for work items that each require 10ms of CPU time and
90ms of wait time on a dual-processor computer. As shown in the
concurrency-throughput curve in Figure 1, throughput ini-
tially increases with concurrency level since some threads in
the active set use the CPU while others in the active set are
waiting. However, when the concurrency level is too high,
throughput decreases because threads in the active set inter-
rupt one another causing context switching overheads. We
use the term thrashing to refer to situations in which such
interference occurs.
Figure 3: Performance of concurrency controllers for the CLR ThreadPool for a dynamic workload. Panel (a), the current concurrency controller, plots #threads and throughput under the fixed-max policy and throughput under the automatic-adjustment policy; panel (b), HC3, the hill climbing concurrency controller, plots #threads and throughput. Both panels show throughput / #threads versus time (sec).
Our focus is the CLR ThreadPool [12], a feature that is
widely used in servers running the Windows Operating Sys-
tem. The ThreadPool exposes an interface called
QueueUserWorkItem() whereby programmers place work into
a queue for asynchronous execution. The ThreadPool as-
signs work items to threads up to the concurrency level
specified by its control logic. Figure 2 depicts the closed
loop system used by the ThreadPool to dynamically adjust
the concurrency level to maximize throughput (measured in
work item completions per second) with the secondary objective
of minimizing the number of executing threads so as to
reduce overall resource consumption.
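For concreteness, the following minimal C# sketch (ours, not the ThreadPool test code) queues work items matching the example above: each consumes 10ms of CPU and then waits 90ms, and completions per second give the throughput signal that the controller observes.

    using System;
    using System.Diagnostics;
    using System.Threading;

    class ThreadPoolExample
    {
        const int WorkItems = 100;
        static int pending = WorkItems;
        static ManualResetEvent allDone = new ManualResetEvent(false);

        static void Main()
        {
            Stopwatch clock = Stopwatch.StartNew();
            for (int i = 0; i < WorkItems; i++)
            {
                // Hand a work item to the pool; the pool's controller
                // decides how many items execute concurrently.
                ThreadPool.QueueUserWorkItem(delegate
                {
                    SpinFor(TimeSpan.FromMilliseconds(10)); // 10ms of CPU
                    Thread.Sleep(90);                       // 90ms of wait
                    if (Interlocked.Decrement(ref pending) == 0)
                        allDone.Set();
                });
            }
            allDone.WaitOne();
            // Throughput as the controller measures it: completions/sec.
            Console.WriteLine("Throughput: {0:F1} items/sec",
                              WorkItems / clock.Elapsed.TotalSeconds);
        }

        static void SpinFor(TimeSpan duration)
        {
            Stopwatch sw = Stopwatch.StartNew();
            while (sw.Elapsed < duration) { /* burn CPU */ }
        }
    }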
The .NET 3.5 ThreadPool concurrency controller (here-
after, current ThreadPool controller) is very effective
for short-running, CPU-intensive workloads. This is because
the current controller makes use of CPU utilizations in its
decision logic. Unfortunately, this information is less use-
ful (and maybe even counter-productive) for workloads that
are not CPU-intensive. For example, Figure 3(a) plots the
throughput of the current ThreadPool controller for a dy-
namically changing, non-CPU intensive workload. Two con-
trol policies are considered: automatic-adjustment, where
the controller tries to maximize throughput, and fixed-max,
where the controller maximizes the number of executing
threads up to a fixed maximum (injecting new threads only
if the ThreadPool queue is non-empty). We see that
automatic-adjustment has very low throughput. Fixed-max
achieves much higher throughput, but also greatly increases
the number of threads and hence increases memory con-
tention.
These measurements motivate us to develop a new ap-
proach to concurrency management. This approach, the
hill climbing concurrency controller (HC3), uses hill
climbing to maximize throughput, exploiting the concave
structure of the concurrency-throughput curve illustrated
in Figure 1. HC3 differs from the current ThreadPool controller
in another way as well: it does not make use of CPU
utilizations. The rationale for this is that CPU is only one
of many resources consumed by work items. Further, the re-
lationships between resource utilizations, controller actions,
and work item throughputs are complex since resources may
be shared with non-ThreadPool threads.
There are two areas of work related to this paper. The
first is software considerations for controller design, test, and
implementation in software products. Unfortunately, there
are few reports of software products built using control engi-
neering. IBM’s DB2 v8.2 uses regulatory control to manage
background work [11]; IBM’s DB2 v9.1 employs a control
optimization technique to dynamically size buffer pools [5];
and Hewlett Packard’s Global Workload Manager uses su-
pervisory control to optimize performance for multi-tier ap-
plications [1]. But these papers focus almost exclusively on
control laws and their assessments, not on software consid-
erations for building closed loop systems. The ControlWare
framework [16] describes middleware for building controllers,
but it does not address controller design, testing, and im-
plementation.
A second area of related work is control engineering for
optimizing concurrency levels. An early example is [3], which
uses dynamic programming to minimize thrashing in a virtual
memory system based on information about virtual
memory and database resources. More recently, [4] uses
fuzzy control to optimize concurrency levels in a web server.
Unfortunately, neither approach addresses our requirements
in that the first uses knowledge of resource utilizations and
the second converges slowly. Beyond this, there are well-understood
mathematical techniques for optimizing convex
functions with stochastics [13], although these techniques are
not prescriptive in that many engineering constants must be
determined.
This paper presents a case study of developing a hill climb-
ing concurrency controller (HC3) for the .NET ThreadPool.
Our purpose is to provide insight into controller design, test-
ing, and implementation. While HC3 contains many innovations,
controller assessment is not the focus of this paper.
Rather, this paper presents a series of issues encountered and
approaches taken to their resolution. Figure 8 summarizes
the case study. Examples of issues and approaches include:
(a) addressing the need to combine a hill climbing control
law with rule-based techniques by the use of hybrid control;
(b) increasing the efficiency and reducing the variability of
the test environment by using resource emulation; and (c)
effectively assessing design choices by using test scenarios for
which the optimal concurrency level can be computed ana-
lytically and hence desired test results are known a priori.
We believe that these issues and approaches have broad ap-
plication to controllers for resource management of software
systems.
The remainder of this paper is organized as follows. Sec-
tion 2 discusses controller design, Section 3 addresses test-
ing, and Section 4 details implementation considerations.
Our conclusions and summary of the case study are con-
tained in Section 5.
2. CONTROLLER DESIGN
The primary objective of the ThreadPool controller is to
adjust the concurrency level (number of executing threads)
to maximize throughput as measured by work item comple-
tions per second. However, there are a number of secondary
objectives. First, if there are two concurrency levels that
produce the maximum throughput, we prefer a smaller con-
currency level to reduce memory contention. In addition, the
controller should have: (a) short settling times so that cu-
mulative throughput is maximized, (b) minimal oscillations
since changing control settings incurs overheads that reduce
throughput, and (c) fast adaptation to changes in workloads
and resource characteristics.
We assume that the concurrency-throughput curve is con-
cave, as in Figure 1, and so our approach is based on hill
climbing. We assume that time is discrete and is indexed by $k$ and $m$: $k$ indexes the setting of the concurrency level, and $m$ indexes the throughput value collected at that concurrency level. Let $u_k$ be the concurrency level and $y_{km}$ be the measured throughput. Then, our system model is:

$$y_{km} = f(u_k, u_{k-1}, \cdots, y_{k,m-1}, \cdots, y_{k-1,m_j}, \cdots) + \epsilon_{km} \qquad (1)$$

where $f$ is concave and the $\epsilon_{km}$ are i.i.d. with mean 0 and variance $\sigma^2$. (The i.i.d. assumption is reasonable within a modest range of concurrency levels.) We seek the optimal concurrency level $u^*$ such that $\left.\partial f / \partial u\right|_{u^*} = 0$. Unfortunately, $f$ is unknown, $f$ changes over time, and $f$ cannot be measured directly because of the $\epsilon_{km}$.
Stochastic gradient approximation using finite differences [13] provides a way to find $u^*$ in Equation (1). The control law is

$$u_{k+1} = u_k + a_k g_k, \qquad a_k = \frac{a}{(1 + k + A)^{\alpha}}, \qquad g_k = \frac{y(u_k + c_k) - y(u_k - c_k)}{2 c_k}.$$

This control law requires choosing values for several engineering constants: $a$, $\alpha$, $A$, and $c_k$. Even more problematic is that the concurrency level must be changed twice (i.e., $u_k - c_k$, $u_k + c_k$) before selecting a new concurrency level. Doing so adds to variability and slows settling times.
Our approach is to adapt the above control law in several ways: (a) reduce the number of changes in concurrency levels, (b) address the fact that the concurrency level is a discrete actuator, and (c) exploit the concave nature of the concurrency-throughput curve. We use the control law:

$$u_{k+1} = u_k + \operatorname{sign}(\Delta_{km}) \lceil a_{km} |\Delta_{km}| \rceil, \qquad (2)$$

where $|x|$ is the absolute value of $x$ and $\lceil x \rceil$ is the ceiling function (with $\lceil 0 \rceil = 1$). Note that the concurrency level changes by at least 1 from $u_k$ to $u_{k+1}$.
Figure 4: HC3 state diagram. States: State 1 (initializing history), State 1a (InTransition), State 2 (looking for a move), and State 2a (InTransition), connected by transitions Ta through Tf.

Figure 5: Description of HC3 state transitions in Figure 4.

  Transition  Description
  Ta          Completed initialization
  Tb          Change point while looking for a move
  Tc          Changed concurrency level
  Td          End of ThreadPool transient
  Te          Changed concurrency level
  Tf          End of ThreadPool transient
We estimate the derivative of $f$ using:

$$\Delta_{km} = \frac{\bar{y}_{km} - \bar{y}_{k-1}}{u_k - u_{k-1}}, \qquad (3)$$

where $\bar{y}_{km}$ is the average throughput at concurrency level $u_k$ after $m$ measurements. (No second index is used for $\bar{y}_{k-1}$ since no measurements are being collected at previous concurrency levels.) Equation (3) avoids the problem of computing throughputs at two additional concurrency levels, as in Spall's approach, since the curve tangent is approximated by the line through the throughputs at $u_k$ and $u_{k-1}$. Further, observe that if we do not collect throughputs during the transient introduced by changing concurrency levels, then

$$E(\Delta_{km}) = E(\Delta_k) = \frac{f(u_k) - f(u_{k-1})}{u_k - u_{k-1}}.$$
Equation (2) contains the term $a_{km}$, which has the same form as Spall's $a_k$. We use $a = g e^{-s_{km}}$, $\alpha = 0.5$, and $A = 0$, where $s_{km}$ is the standard deviation of the sample mean of the $m$ throughput values collected at $u_k$, and $g > 0$ is the control gain. We include a term that decreases with the standard deviation of throughput so that the controller moves more slowly when throughput variance is large. Thus,

$$a_{km} = \frac{g e^{-s_{km}}}{\sqrt{k + 1}}. \qquad (4)$$

Observe that $a_{km}$ converges to $g / \sqrt{k+1}$ as $m$ becomes large, since $s_{km}$ converges to 0. Further, $a_{km} \to 0$ as $k$ becomes large.
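A minimal C# sketch of the resulting update step follows; it assumes the two-point history described above, and the names and structure are illustrative rather than the production implementation. The caller must ensure $u_k \ne u_{k-1}$ so that the slope in Equation (3) is defined.

    using System;

    // Sketch of the HC3 move defined by Equations (2)-(4). Names and
    // structure are illustrative, not the production controller code.
    sealed class HillClimbingStep
    {
        public double Gain = 5.0; // control gain g (the value chosen in Section 2)

        // Compute u_{k+1} from averaged throughputs at u_{k-1} and u_k.
        // Requires uCur != uPrev so the slope estimate is defined.
        public int NextConcurrency(
            int uPrev, double yBarPrev, // u_{k-1}, ybar_{k-1}
            int uCur, double yBarCur,   // u_k, ybar_{km}
            double stdErrCur,           // s_{km}: std dev of the sample mean at u_k
            int k)                      // index of the current concurrency setting
        {
            // Equation (3): two-point estimate of the slope of f.
            double delta = (yBarCur - yBarPrev) / (uCur - uPrev);

            // Equation (4): step size shrinks when throughput is noisy
            // (e^{-s}) and as more moves are taken (1/sqrt(k+1)).
            double a = Math.Exp(-stdErrCur) * Gain / Math.Sqrt(k + 1);

            // Equation (2): move at least one level in the uphill direction.
            int step = (int)Math.Max(1.0, Math.Ceiling(a * Math.Abs(delta)));
            return uCur + Math.Sign(delta) * step;
        }
    }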
While the approach described above resolves several issues with using stochastic gradient approximation, it deviates from Spall's assumptions, and hence his convergence results no longer apply. We address this by combining Equation (2) with a set of rules. But this raises the following issue and motivates the approach taken:

Issue 1: How do we combine hill climbing with rules?

Approach 1: Use hybrid control.

Hybrid control [8] allows us to combine the control law in Equation (2) with rules to ensure convergence to $u^*$ and adaptation to changes in $f$. We refer to the result as HC3, the hill climbing concurrency controller. Figure 4 displays the states used in HC3, and Figure 5 describes the transitions between these states. HC3 estimates the slope of the concurrency-throughput curve based on two points.
Figure 6: Test scenario used in controller evaluation studies. Work items enter via QueueUserWorkItem(); the ThreadPool admits u work items into the active set, each of which allocates a fraction q of memory and consumes emulated resources in the ResourceEmulator (N CPUs with execution time XC and disks with execution time XD).
State 1 collects data to estimate $f(u_{k-1})$, and State 2 does the same for $f(u_k)$. In addition, states 1a and 2a are used to address the dynamics of changing concurrency levels. Considerations for changes in the concurrency-throughput curve are addressed in part by transition Tb, which is described in more detail in Section 4.
The core of HC3 is the set of rules associated with transition Te:

$R_{e,1}$: If $\bar{y}_{k-1}$ is significantly less than $\bar{y}_{km}$, then apply Equation (2).

$R_{e,2}$: If $\bar{y}_{k-1}$ is significantly greater than $\bar{y}_{km}$, then $u_{k+1} = u_{k-1}$.

$R_{e,3}$: If $\bar{y}_{k-1}$ is statistically identical to $\bar{y}_{km}$, sufficient data have been collected, and $u_{k-1} < u_k$, then $u_{k+1} = u_{k-1}$ (to minimize the number of threads).

$R_{e,4}$: If the controller is "stuck in State 2", then make an exploratory move.

Note that the term "significantly" refers to the use of a statistical test to detect differences in population means, as in [9].
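The sketch below shows one way the Te rules could be encoded. The significance test is approximated with a Welch-style t statistic as a stand-in for the test of [9], and the thresholds, helper types, and the direction of the exploratory move are all assumptions for illustration.

    using System;

    // Sketch of the rules on transition Te. The Welch-style t test stands
    // in for the statistical test of [9]; thresholds, types, and the
    // exploratory-move direction are illustrative assumptions.
    enum Verdict { SignificantlyLess, SignificantlyGreater, Identical }

    struct Sample { public double Mean; public double Var; public int Count; }

    sealed class TransitionTe
    {
        const int MinSamples = 10;     // "sufficient data" bound (assumed)
        const int StuckThreshold = 20; // checks before an exploratory move

        // prev summarizes throughput at u_{k-1}; cur at u_k.
        public int Decide(Sample prev, Sample cur, int uPrev, int uCur,
                          int ticksInState2, Func<int> hillClimbMove)
        {
            Verdict v = Compare(prev, cur);
            if (v == Verdict.SignificantlyLess)     // R_{e,1}: ybar_{k-1} < ybar_{km}
                return hillClimbMove();             //   apply Equation (2)
            if (v == Verdict.SignificantlyGreater)  // R_{e,2}: ybar_{k-1} > ybar_{km}
                return uPrev;                       //   retreat to u_{k-1}
            if (cur.Count >= MinSamples && uPrev < uCur)
                return uPrev;                       // R_{e,3}: prefer fewer threads
            if (ticksInState2 >= StuckThreshold)
                return uCur + 1;                    // R_{e,4}: exploratory move
            return uCur;                            // otherwise keep collecting data
        }

        static Verdict Compare(Sample a, Sample b)
        {
            // Welch-style t statistic; |t| > ~2.6 approximates the 0.01
            // significance level used by HC3.
            double t = (b.Mean - a.Mean) /
                       Math.Sqrt(a.Var / a.Count + b.Var / b.Count);
            if (t > 2.6) return Verdict.SignificantlyLess;     // prev < cur
            if (t < -2.6) return Verdict.SignificantlyGreater; // prev > cur
            return Verdict.Identical;
        }
    }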
We mention in passing that hybrid control lends itself to
proving various properties such as convergence (see [8]). Un-
fortunately, space limitations preclude providing proof de-
tails for HC3.
We address one further issue:
Issue 2: How do we obtain values of the engineer-
ing constants?
These constants are: the control gain $g$ in Equation (4), the statistical significance level (which is used in the statistical tests for transitions Tb and Te), and the constant used in $R_{e,4}$ (to detect "stuck in state"). We resolve this by:
Approach 2: Use a test environment to evaluate
engineering constants for a large number of scenar-
ios.
Based on experiments conducted using the test environ-
ment described in Section 3, we determined the following:
$g = 5$, a significance level of 0.01, and a threshold of 20 for "stuck in state." Figure 3(b) plots HC3 performance for the same dynamic workload applied to the current ThreadPool controller in Figure 3(a). We see that HC3 produces much larger throughputs than the current algorithm using the automatic-adjustment policy. Further, HC3 achieves throughputs comparable to (and sometimes greater than) the current algorithm with the fixed-max policy, and HC3 uses many fewer threads.
Figure 7: Throughput (circles) at control settings specified by a cyclic ramp (line). The y-axis shows throughput / #threads; the x-axis shows time (sec) from 0 to 8,000.
3. TEST ENVIRONMENT
This section describes the test environment used to ad-
dress a number of considerations for which theory cannot
be applied. These considerations include: determining the
engineering constants identified in Section 2, debugging the
controller implementation, and doing comparisons of con-
troller designs and implementations.
Our focus is on unit testing. Unit tests are conducted
on specific components or features early in the development
cycle to eliminate logical errors and to assess performance
characteristics of the component under test. We use unit
tests to focus on controller robustness to “corner cases” that
occur infrequently but can lead to very poor performance
or even instabilities. In contrast, system test, which we do
not address, is conducted on a complete product (e.g., the
Windows Operating System) late in the development cycle,
and emphasizes representative customer scenarios.
The approach taken for unit test of the ThreadPool is to
generate synthetic work items according to a workload pro-
file. The profile describes the type and amount of resources
to consume, such as CPU, memory, and web services. Resources
such as CPU and memory are of particular interest since
excessive utilization of these resources leads to thrashing,
and optimizing concurrency levels in the presence of thrash-
ing is a particular challenge for controllers. We use the term
workload to refer to a set of work items with the same work
profile. In our controller assessments, we vary the workloads
dynamically to see how quickly the controller finds the op-
timal concurrency level.
There are two parts to our test environment—the test
harness and the test scenarios. The test harness is the in-
frastructure for running tests, which encompasses generat-
ing synthetic work and reporting measurement results. The
test harness should produce efficient, low variance through-
put measurement to facilitate comparisons between a large
number of design alternatives.
In our initial design, the test harness executed synthetic
workloads on physical resources on the test machine and
so consumed CPU, memory, and other resources on these
machines. This resulted in long execution times and highly
variable test results, both of which limited our ability to
explore a large number of scenarios.
Issue 3: How can we do efficient, low variance test-
ing?
Approach 3: Use resource emulation.
By resource emulation, we mean that threads sleep for the
time that they would have spent consuming the resource. The
controller does not know that resource consumption is emulated,
and so its control logic is unchanged. However, resource
emulation greatly reduces the load on the test machine, which
allows us to greatly scale up the test scenarios.
Resource emulation is provided by the ResourceEmulator
that exposes an interface to create resource types (e.g., CPU,
memory, and disk) and to consume instances of resource
types. A resource may be active or passive. Active resources
perform work, like CPU executions and network transfers.
Passive resources, such as memory and database locks, are
required in order to use an active resource. The emulation
includes thrashing resulting from contention for passive
resources. Thrashing is incorporated as expansion
factors for active resource execution times.
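The following C# sketch illustrates the emulation idea under the expansion-factor model described above; the interface and names are ours, not the actual ResourceEmulator API.

    using System;
    using System.Threading;

    // Sketch of resource emulation: "consuming" a resource means sleeping
    // for the time the real resource would have been held. The interface
    // and names are illustrative, not the actual ResourceEmulator API.
    sealed class EmulatedActiveResource
    {
        private readonly object gate = new object();
        private double memoryDemand; // sum of q_i * u_i over the active set

        // A work item acquires a passive resource (fraction of memory).
        public void AllocatePassive(double fraction)
        {
            lock (gate) { memoryDemand += fraction; }
        }

        public void ReleasePassive(double fraction)
        {
            lock (gate) { memoryDemand -= fraction; }
        }

        // Consume the active resource for nominalMs of service time.
        public void Consume(double nominalMs)
        {
            double e;
            lock (gate)
            {
                // Thrashing enters as an expansion factor: overcommitted
                // passive resources stretch active execution times.
                e = Math.Max(1.0, memoryDemand);
            }
            Thread.Sleep(TimeSpan.FromMilliseconds(nominalMs * e));
        }
    }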
Using resource emulation improved the efficiency of the
test harness by a factor of twenty, and it almost eliminated
measurement variability. To see the latter, Figure 7 displays
results of an open loop test in which concurrency level is var-
ied from 5 to 50 over 7,500 seconds for a dynamic workload.
Because of the low measurement variability, we can clearly
see the effects of thrashing, such as the drop in throughput
around time 2,000 as concurrency level is increased beyond
27. The increased efficiency and reduced variability of using
resource emulation meant that we could run a large number
of scenarios in our evaluations of engineering constants and
candidate algorithms.
The second part of the test environment is the scenarios
executed on the test harness. These scenarios are described
in terms of when workloads enter and leave and the synthetic
resources that they consume. An important consideration is
the issue below:
Issue 4: How do we know the optimal concurrency
level for a scenario?
Approach 4: Construct scenarios for which the
optimal concurrency level can be computed analyti-
cally so that expected controller performance is known
a priori.
One such scenario is the widely-used central server model (e.g., [10]) that describes work flows in operating systems. Figure 6 depicts the specifics for our problem. Let $M_i$ be the number of profile $i$ work items that enter the ThreadPool, so that $M = \sum_i M_i$ is the total number of work items. Once the concurrency level permits, work items from profile $i$ enter the active set and acquire a fraction $q_i$ of available memory. In the sequel, the concurrency level $u$ is equal to the size of the active set since actuator dead time is irrelevant for this optimization problem. There are $N$ synthetic CPUs. Let $X_{S,i}$ be the nominal CPU execution time of a work item from workload $i$. Its actual execution time is expanded by the overcommitment of memory. That is, if there are $I$ workloads and $u_i$ is the number of profile $i$ work items in the active set, then the expansion factor is $e = \max\{1,\; q_1 u_1 + \cdots + q_I u_I\}$, where $u = u_1 + \cdots + u_I$. So, the actual CPU execution time of a profile $i$ work item is $e X_{S,i}$. Work items consume a synthetic disk for $X_{P,i}$ seconds. We use synthetic disks to model requirements for external resources whose service times do not depend on local resources (e.g., web service accesses). As such, memory contention does not affect execution times for synthetic disks.
The simplicity of the central server model makes it easy to develop an analytic solution for the optimal concurrency level, under certain simplifying assumptions. Consider a single workload and deterministic execution times. There are two cases. If $Mq \le 1$, then $e = 1$ and so $u^* = M$. Now, consider $uq > 1$ for all concurrency levels under consideration, so that $e > 1$. Clearly, we want $u$ large enough that we obtain the benefits of concurrent execution of CPU and disk resources, but we do not want $u$ so large that work items wait for CPU, since this increases execution times by a factor of $e$ without an increase in disk throughput. Since execution times are deterministic, no queue forms at the CPUs as long as the flow out from the disks equals the flow out from the CPUs. That is, optimal throughput is achieved when

$$\frac{N}{u q X_S} = \frac{u - 1}{X_P} \approx \frac{u}{X_P}.$$

Solving, we have:

$$u^* \approx \sqrt{\frac{rN}{q}}, \qquad (5)$$
where $r = X_P / X_S$. This is easily extended to multiple workloads by taking $q = \sum_i \frac{M_i}{M} q_i$, $X_S = \sum_i \frac{M_i}{M} X_{S,i}$, and $X_P = \sum_i \frac{M_i}{M} X_{P,i}$. For example, in Figure 7 there are two workloads during Region III (time 3,000 to 4,500) with $M_1 = 20$, $X_{S,1} = 0$, $X_{P,1} = 1000$ms, $M_2 = 40$, $q_2 = 0.04$, $X_{S,2} = 50$ms, $X_{P,2} = 950$ms. Equation (5) produces the estimate $u^* = 33$, which corresponds closely to the concurrency level at which peak throughput occurs in Figure 7.
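As an arithmetic check of this example, the sketch below evaluates Equation (5) for the Region III parameters; it assumes $N = 1$ synthetic CPU and $q_1 = 0$ for the first workload (neither value is stated above), which reproduces $u^* = 33$.

    using System;

    // Worked check of Equation (5) for Region III of Figure 7.
    // Assumptions (not stated in the text): N = 1 synthetic CPU, q_1 = 0.
    class OptimalConcurrency
    {
        static void Main()
        {
            double M1 = 20, q1 = 0.00, XS1 = 0,  XP1 = 1000; // workload 1 (ms)
            double M2 = 40, q2 = 0.04, XS2 = 50, XP2 = 950;  // workload 2 (ms)
            double M = M1 + M2;
            int N = 1;

            // Workload-weighted parameters for the mixed workload.
            double q  = (M1 / M) * q1  + (M2 / M) * q2;   // ~0.0267
            double XS = (M1 / M) * XS1 + (M2 / M) * XS2;  // ~33.3 ms
            double XP = (M1 / M) * XP1 + (M2 / M) * XP2;  // ~966.7 ms
            double r  = XP / XS;                          // ~29

            // Equation (5): u* ~ sqrt(r * N / q)
            double uStar = Math.Sqrt(r * N / q);
            Console.WriteLine("u* = {0:F0}", uStar);      // prints 33
        }
    }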
To summarize our insights on test scenarios, using the
central server model allows us to assess controller perfor-
mance based on the scenario’s (analytically computed) opti-
mal concurrency levels. Further, the fact that central server
scenarios have a simple parameterization ($M_i$, $q_i$, $X_{S,i}$, and $X_{P,i}$) provides a way to construct a test matrix that systematically explores design alternatives.
4. CONTROLLER IMPLEMENTATION
This section describes key implementation considerations
in HC3. Among these considerations are the structure of the
controller code and techniques for managing the variance of
measured throughputs.
The controller is implemented in C#, an object-oriented
language similar to Java™. An object-oriented design
helps us address certain implementation requirements. For
example, we want to experiment with multiple controller
implementations, many of which have features in common
(e.g., logging). We use inheritance so that features common
to several controllers are implemented in classes from which
other controllers inherit. The controller code is structured
into three parts: implementation of the state machine in
Figure 4, implementation of the "if" part of transition rules
such as $R_{e,1}, \cdots, R_{e,4}$, and implementation of the "then"
part of transition rules (e.g., Equation (2)).
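A skeletal C# illustration of this structure follows; class and method names are hypothetical. Shared services such as logging live in a base class, and the three parts (state machine, rule predicates, rule actions) are kept separate.

    using System;

    // Skeleton of the controller code structure (names are hypothetical).
    // Features shared by several experimental controllers, such as logging,
    // live in a base class from which concrete controllers inherit.
    abstract class ConcurrencyController
    {
        protected void Log(string message)
        {
            Console.WriteLine("{0:o} {1}", DateTime.UtcNow, message);
        }

        // Invoked per throughput sample; returns the concurrency level.
        public abstract int OnSample(double throughput);
    }

    sealed class Hc3Controller : ConcurrencyController
    {
        enum State { InitializingHistory, LookingForMove, InTransition }
        State state = State.InitializingHistory;
        int concurrency = 1;

        public override int OnSample(double throughput)
        {
            // Part 1: the state machine of Figure 4 drives the controller.
            // Part 2: the "if" parts of rules decide whether a transition fires.
            // Part 3: the "then" parts compute the new setting, e.g. Equation (2).
            if (state == State.LookingForMove && ShouldMove(throughput))
            {
                concurrency = ApplyMove(throughput);
                state = State.InTransition;
                Log("concurrency -> " + concurrency);
            }
            return concurrency;
        }

        // "if" part of rules R_{e,1}..R_{e,4} (predicates only, elided here).
        bool ShouldMove(double y) { return false; }

        // "then" part of rules, e.g. the move of Equation (2) (elided here).
        int ApplyMove(double y) { return concurrency; }
    }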
The effectiveness of hill climbing with stochastics is largely
determined by the variability of the estimates of the through-
puts at each concurrency level.
Issue 5: How can the effects of throughput vari-
ance be minimized?
This issue is addressed in two ways. The first is:
Approach 5a: Do not collect throughputs during
concurrency transitions.
  #  Area       Issue                                        Approach
  1  Design     How combine hill climbing and rules?         Use hybrid control.
  2  Design     How obtain values of engineering constants?  Use test environment to run many scenarios.
  3  Test       How do efficient, low-variance testing?      Use resource emulation in the test harness.
  4  Test       How know optimal concurrency level?          Use scenarios with analytic solutions.
  5  Implement  How minimize effects of variance?            (a) Do not collect throughputs during transitions. (b) Discard dissimilar throughputs.

Figure 8: Summary of issues and approaches to their resolution used in the development of HC3.

To elaborate, one source of throughput variability is changes in the concurrency level as a result of controller actions. We manage this by including states 1a and 2a in Figure 4. The controller enters an "InTransition" state when it changes the concurrency level, and it leaves an "InTransition" state under either of two conditions: (1) the observed number of threads equals the controller-specified concurrency level; or (2) the observed concurrency level is less than the number of threads and the ThreadPool queue is empty.
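Stated as code, the exit test might look like the following sketch, where the observable names are ours:

    // Sketch of the "InTransition" exit test; the observable names are
    // ours, not the ThreadPool's internal ones.
    static class TransitionProbe
    {
        public static bool CanLeaveInTransition(int observedConcurrency,
                                                int threadCount,
                                                int targetConcurrency,
                                                int queueLength)
        {
            // (1) The number of threads matches the controller-specified level.
            if (threadCount == targetConcurrency)
                return true;

            // (2) Fewer work items are executing than threads exist and no
            // work is queued, so the pool has settled below the target.
            return observedConcurrency < threadCount && queueLength == 0;
        }
    }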
The second approach to Issue 5 is:
Approach 5b: Discard dissimilar throughput ob-
servations.
The concurrency-throughput curve changes under several conditions: (a) a new workload arrives; (b) the existing workloads change their profiles (e.g., move from a CPU-intensive phase to an I/O-intensive phase); and (c) there is competition with threads in other processes that reduces the effective bandwidth of resources. Transition Tb in Figure 4 detects these situations by using change point detection [2]. Change point detection is an on-line statistical test that is widely used in manufacturing to detect process changes. For example, change point detection is used in wafer fabrication to detect anomalous changes in line widths. We use change point detection in two ways. First, we prune older throughputs in the measurement history if they differ greatly from later measurements, since the older measurements may be due to transitions between concurrency levels. Second, we look for change points evident in recently observed throughputs at the same concurrency level.
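As a simplified illustration of the first use, the sketch below prunes older throughput samples when the recent window's mean shifts; a real implementation would use the sequential tests of [2], and the window and threshold here are arbitrary.

    using System;
    using System.Collections.Generic;

    // Simplified stand-in for change point detection: prune older
    // throughput samples whose mean differs greatly from the recent
    // window. The real method follows [2]; thresholds are illustrative.
    static class ThroughputHistory
    {
        public static void PruneOnChangePoint(List<double> samples,
                                              int window = 8,
                                              double threshold = 0.2)
        {
            if (samples.Count < 2 * window) return;

            double oldMean = Mean(samples, 0, samples.Count - window);
            double newMean = Mean(samples, samples.Count - window, window);

            // A relative mean shift beyond the threshold suggests the
            // older samples belong to a different operating regime.
            if (Math.Abs(newMean - oldMean) > threshold * Math.Max(oldMean, 1e-9))
                samples.RemoveRange(0, samples.Count - window);
        }

        static double Mean(List<double> xs, int start, int count)
        {
            double sum = 0;
            for (int i = start; i < start + count; i++) sum += xs[i];
            return sum / count;
        }
    }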
5. CONCLUSIONS
This paper aids in making control engineering more ac-
cessible to software practitioners by addressing software is-
sues in controller design, testing, and implementation. This
is done through a case study of a controller for optimizing
concurrency levels in the .NET ThreadPool.
We structure the case study in terms of issues encoun-
tered and approaches taken to their resolution. Figure 8
summarizes the results. We believe that many of the is-
sues and approaches have broad application. For example,
Issue 1 concerns how to systematically combine formal con-
trol laws with engineering-based insights expressed as rules.
Our approach, using hybrid control, is a very general solu-
tion that is easily implemented in software. Issue 3 also has
broad application. This issue concerns the development of
an efficient, low variance test environment, a common chal-
lenge in controller development for software systems. Our
approach uses resource emulation, which improves efficiency
by a factor of twenty in our experiments. Issue 4, know-
ing the desired outcome of a performance test, is a broad
concern in assessing resource management solutions of soft-
ware systems. Our approach, using scenarios for which an-
alytic solutions can be obtained, improves the effectiveness
of testing and provides a systematic approach to test case
construction based on the parameters of the analytic model.
In terms of future work, we are exploring broader appli-
cations of HC3, such as to load balancing and configuration
optimization. These potential applications in turn motivate
some new directions for HC3, such as extending it to
multiple-input, multiple-output control.
6. REFERENCES
[1] T. Abdelzaher, Y. Diao, J. L. Hellerstein, C. Yu, and
X. Zhu. Introduction to control theory and its application
to computing systems. In Z. Liu and C. Xia, editors,
Performance Modeling and Engineering, pages 185–216.
Springer-Verlag, 2008.
[2] M. Basseville and I. Nikiforov. Detection of Abrupt
Changes: Theory and Applications. Prentice Hall, 1993.
[3] R. Blake. Optimal control of thrashing. In SIGMETRICS
Performance Evaluation Review, volume 11, pages 1–10,
1982.
[4] Y. Diao, J. L. Hellerstein, and S. Parekh. Optimizing
quality of service using fuzzy control. In Distributed
Systems Operations and Management, pages 42–53, 2002.
[5] Y. Diao, J. L. Hellerstein, A. Storm, M. Surendra,
S. Lightstone, S. Parekh, and C. Garcia-Arellano. Using
MIMO linear control for load balancing in computing
systems. In Proceedings of the American Control
Conference, pages 2045–2050, June 2004.
[6] J. L. Hellerstein, Y. Diao, S. Parekh, and D. M. Tilbury.
Feedback Control of Computing Systems. John Wiley &
Sons, 2004.
[7] C. V. Hollot, V. Misra, D. Towsley, and W. B. Gong. A
control theoretic analysis of RED. In Proceedings of IEEE
INFOCOM, pages 1510–1519, Anchorage, Alaska, Apr.
2001.
[8] E. A. Lee and P. Varaiya. Signals and Systems. Addison
Wesley, 1st edition, 2003.
[9] B. W. Lindgren. Statistical Theory. The Macmillan
Company, 4th edition, 1968.
[10] D. A. Menasce, V. A. Almeida, and L. W. Dowdy. Capacity
Planning and Performance Modeling. Prentice Hall, 1994.
[11] S. Parekh, K. Rose, Y. Diao, V. Chang, J. L. Hellerstein,
S. Lightstone, and M. Huras. Throttling utilities in the IBM
DB2 Universal Database server. In Proceedings of the
American Control Conference, June 2004.
[12] S. Pratschner. Common Language Runtime. Microsoft
Press, 1st edition, 2005.
[13] J. C. Spall. Introduction to Stochastic Search and
Optimization. Wiley-Interscience, 1st edition, 2003.
[14] X. Wang, D. Jia, C. Lu, and X. Koutsoukos. Deucon:
Decentralized end-to-end utilization control for distributed
real-time systems. IEEE Transactions on Parallel and
Distributed Systems, 18(7):996–1009, 2007.
[15] C.-Z. Xu and B. Liu. Model predictive feedback control for
QoS assurance in web servers. IEEE Computer, 41(3):66–72,
2008.
[16] R. Zhang, C. Lu, T. F. Abdelzaher, and J. A. Stankovic.
ControlWare: A middleware architecture for feedback
control of software performance. In International Conference
on Distributed Computing Systems, pages 301–310, 2002.