Predictive Auto-scaling of Multi-tier Applications
Using Performance Varying Cloud Resources
Waheed Iqbal, Abdelkarim Erradi, Muhammad Abdullah, Arif Mahmood
Abstract—The performance of the same type of cloud resources, such as virtual machines (VMs), varies over time mainly due to
hardware heterogeneity, resource contention among co-located VMs, and virtualization overhead. The performance variation can be
significant, introducing challenges to learn workload-specific resource provisioning policies to automatically scale the cloud-hosted
applications to maintain the desired response time. Moreover, auto-scaling multi-tier applications using minimal resources is even more
challenging because bottlenecks may occur on multiple tiers concurrently. In this paper, we address the problem of using performance
varying VMs for gracefully auto-scaling a multi-tier application using minimal resources to handle dynamically increasing workloads
and satisfy the response time requirements. The proposed system uses a supervised learning method to identify the appropriate
resources provisioning for multi-tier applications based on the prediction of the application response time and the request arrival rate.
The supervised learning method learns a state transition configuration map which encodes an ordering of resource allocation states invariant to the
underlying VMs' performance variations. This configuration map helps to use performance varying resources in a predictive auto-scaling
method. Our experimental evaluation using a real-world multi-tier web application hosted on a public cloud shows improved
application performance with minimal resources compared to conventional predictive auto-scaling methods.
Index Terms—Cloud Computing, Dynamic Scalability, Performance Varying VMs, Predictive Auto-scaling, Machine Learning, Service
Level Objective
1 INTRODUCTION
APPLICATIONS with bursty and dynamically increas-
ing workloads can leverage the on-demand resource
provisioning feature of cloud computing to maintain the
desired response time service level objective (SLO). This
requires identifying the need for timely resource provisioning
and then allocating the appropriate resources to maintain
the required response time SLO. A single-tier application is
easier to manage: the response time SLO can be satisfied by
scaling the entire application based on the response time
prediction. However, meeting the response time SLO for a
multi-tier application under a time-varying workload is
challenging, mainly due to the complexity of the multi-tier
architecture and because bottlenecks may occur at the same
time on multiple tiers. Therefore, automatic identification
and scaling of bottleneck tiers are vital for minimizing the
operational cost while maintaining the desired application
performance.
Theoretically, it is possible to detect bottlenecks in multi-
tier applications using low-level hardware profiling infor-
mation such as CPU, memory, bandwidth, and I/O usage.
However, practically obtaining such fine-grained, low-level
hardware resource utilization may incur performance
degradation, increase the complexity of virtualization, and
in some cases may not be possible due to the security
concerns of application owners.
•W. Iqbal and M. Abdullah are with the College of Information Technology, University of the Punjab, Lahore, Pakistan. Emails: {waheed.iqbal, muhammad.abdullah}@pucit.edu.pk.
•A. Erradi is with the Department of Computer Science and Engineering, College of Engineering, Qatar University, Doha, Qatar. Email: erradi@qu.edu.qa.
•A. Mahmood is with the Department of Computer Science, Information Technology University, Lahore, Pakistan. Email: arif.mahmood@itu.edu.pk.
Fig. 1: Scalable multi-tier application architecture.
Moreover, bottleneck detection and resolution techniques
based on low-level hardware utilization would not identify
software misconfiguration issues. A
typical example of software misconfiguration is an inappro-
priate number of threads or number of connections from the
web server to the database server. These misconfigurations
may significantly impact the response time; however, fine-grained
methods are not able to detect and resolve such
misconfigurations automatically. Therefore,
we advocate using coarse-grained application performance
data such as response time and throughput to identify and
resolve the performance issues in cloud-hosted multi-tier
web applications.
A typical scalable multi-tier application may consist of N
tiers where each tier is load balanced independently using
a set of dynamically provisioned VMs as shown in Figure
1. Multi-tier web applications can be automatically scaled
to maintain performance using different methods including
user-defined policies, machine learning methods [1], [2], [3],
[4], and heuristics [5], [6]. However, the dynamically chang-
ing workload behaviors and the performance variations of
VMs introduce challenges in building predictive auto-scaling
methods for multi-tier web applications that maintain the
application performance SLOs.
Fig. 2: Ten different iterations to scale the benchmark multi-tier
web application hosted on the AWS cloud for increasing
workloads. (a) Response time for each iteration. (b) SLO
violations, maximum throughput, and total VMs used.
Infrastructure as a Service (IaaS) cloud provides on-
demand resources specifically a wide range of VMs offering
different configurations in terms of the number of cores,
memory, storage, bandwidth, and I/O capacity. The com-
mon types of VMs are small, medium, large, and extra large
depending upon the allocated hardware resources. A user
can instantiate multiple instances of the same or different
VM types. Several studies reported inconsistent and unpre-
dictable performance of the same type of VMs mainly due
to the complexity of virtualization, the resources sharing
among multiple tenants, and the underlying heterogeneous
physical infrastructure [7], [8], [9], [10], [11], [12]. Predictive
auto-scaling methods are widely used to learn resource
provisioning policies for cloud-hosted applications [2], [3],
[4], [13], [14], [15], [16]. However, due to inconsistent per-
formance of VM instances, these predictive models may not
yield the desired performance. In this paper, we investigate
the performance varying behavior of VMs in multi-tier web
applications and then propose a method to effectively use
these performance varying VMs in predictive auto-scaling
methods to gracefully scale the hosted application.
To highlight the challenges associated with performance
varying VMs in the cloud, we conducted some experiments
using a multi-tier benchmark web application on Amazon
Web Services (AWS). We used RUBiS [17] benchmark web
application and hosted it on AWS using t2.small instances.
Then we implemented a multi-tier auto-scaling method
similar to that of Urgaonkar et al. [18]. Their method used an
analytical model to identify the need to scale an application
and then scaled-out all tiers of that application. In our
implementation of their method, instead of the analytical
model, we use a reactive approach to determine the need
for scaling and then scale-out all tiers similar to their pro-
posed method. Our implementation dynamically increases
the number of VMs allocated to each tier of the application
when a response time saturation occurs. We considered 200
milliseconds as the response time SLO threshold to scale-
out all tiers. We evaluated this scaling method using a
linearly increasing workload and repeated the experiment
ten times. Figure 2(a) shows the box plot for the response
time observed during each iteration. The box plot shows
the variation in response time for different iterations. For
example, the application during iterations 2 and 6 shows
the worst performance, while during iterations 3, 4, 8-10 it
shows the best performance. Figure 2(b) shows total SLO
violations, maximum throughput observed, and the total
number of VMs used in the ten different iterations. High
throughput and a low number of SLO violations are ex-
pected from a scalable application. Iterations 2 and 6 exhib-
ited the highest SLO violations and the lowest throughput.
Moreover, iteration 6 consumed more CPU hours than
all other iterations. Furthermore, we observed performance
variations across all the iterations. This behavior confirms
that using the same type of VMs yields different application
performance. Hence, learning auto-scaling policies
from historical logs of applications hosted on performance
varying VMs, based on a specific set of observations, would
not yield the desired application performance.
In this paper, we address the problem of using perfor-
mance varying VMs for auto-scaling multi-tier web appli-
cations. Our proposed solution assumes that the system
monitors the application traffic using access logs captured
every ξ seconds. Then it uses the arrival rate of the last
k intervals to predict the arrival rate for the next interval.
We then use the predicted arrival rate and the observed
response time of the last k intervals to learn a response
time prediction model using polynomial regression. Based
on the predicted response time, the system automatically
provisions the required resources based on the application
configuration map learned from a Random Decision Forest
(RDF) [19] that was trained using the historical performance
logs obtained under different administrator-defined provisioning
policies. Unlike typical resource provisioning methods
based on machine learning techniques [3], [13], [14],
[15], which are trained on historical input parameters (possibly
arrival rate, response time, and hardware utilization
of VMs), our proposed solution does not depend on such
input parameters. The proposed method learns an ordering of
resource provisioning configurations, which is maintained
despite performance variations in the same type of VM
instances. The main contributions of this paper include:
i. Studying the effect of performance varying cloud VM
instances on the scalability of a multi-tier web application.
ii. Proposing a new method to gracefully scale a multi-tier web
application using performance varying VMs.
iii. Evaluating the proposed method on a public cloud
platform using a multi-tier web application.
The rest of the paper is organized as follows. Related
work is presented in Section 2. Section 3 gives an overview
of the proposed system. We explain the auto-scaling policy
learning in Section 4. Our proposed auto-scaling method to
address performance varying VMs is presented in Section 5.
The subsequent sections discuss the experimental setup, the
results and evaluations, and finally the conclusions and
future work.
Fig. 3: Proposed system design to dynamically provision resources to satisfy the response time SLO requirements using
performance varying cloud VMs.
2 RELATED WORK
Several research efforts focused on maintaining the per-
formance of typical single tier cloud-hosted applications
using auto-scaling methods. For example, Bodik et al. [20]
presented a statistical machine learning approach to predict
the system performance for a single tier application. Liu
et al. [21] proposed a method to scale applications hosted
on the cloud by monitoring CPU and bandwidth utilization.
The proposed method dynamically upgrades the VM
types to sustain unexpected workloads. Recently, Michael et
al. [22] used auto-scaling methods for cloud-hosted bioin-
formatics and biomedical applications to speed up the ap-
plication execution time. Victor et al. [23] used a regression-
based performance model for predicting the throughput of
NoSQL-based databases and then auto-scaled the resources
to satisfy the service level agreement (SLA) metrics. Wajahat
et al. [24] presented a machine learning-based auto-scaling
method. Their method uses neural networks to learn an
application performance model and then uses linear regression
to predict the impact of scaling decisions on application
metrics. Persico et al. [25] proposed a fuzzy logic controller
using CPU and bandwidth utilization to horizontally scale
cloud-hosted applications. Alexey et al. [26] presented a com-
parative study of different auto-scaling policies for applica-
tions hosted on the cloud using complex workflows. Salah
et al. [27] proposed an analytical model using a Markov
chain and queuing theory to horizontally scale firewall services
to satisfy the performance SLO. Tania et al. [28] present a
comprehensive survey of auto-scaling methods for cloud
applications. Another survey [29] discussed cloud resource
management methods and identified relevant challenges.
The authors identified achieving predictable performance of
cloud-hosted applications as one of the primary challenges.
Recently, few researchers have started to address the
problem of automatic provisioning of resources for multi-
tier applications. For example, Song et al. [30] presented
a hybrid auto-scaling method for multi-tier applications
hosted on the cloud. They used horizontal scaling for long-
term workload using historical traffic of the application and
used vertical scaling to decrease SLO violations for unex-
pected bursting workloads. Chenhao et al. [31] presented
a fault-tolerance model to use AWS EC2 spot instances
to reduce the operational cost of web applications. The
authors show cost reduction using the fault-tolerance model
and auto-scaling policies for the web applications hosted
on AWS Cloud. Adnan et al. [32] presented different VM
provisioning methods for multiple applications running on
the same infrastructure. The authors used three different
techniques including reactive, hybrid mixture of reactive
and proactive, and session-based adaptive admission con-
trol for resource provisioning.
There have been few efforts to devise machine learning-
based methods to scale cloud hosted applications automat-
ically. For example, Rao et al. [15] used a reinforcement
learning method for the provisioning of resources for web
applications to maximize the performance. Iqbal et al. [2]
also used a reinforcement learning approach to identify
workload specific resource provisioning policies for a multi-
tier web application. Lenar et al. [1] proposed a framework
for multi-tier applications to scale vertically. The proposed
framework used reinforcement learning (RL) to learn poli-
cies to vertically scale CPU and memory of VMs to satisfy
the application response time requirements. Bodik et al. [3]
presented a local regression-based approach for learning
a model for applications hosted on the cloud to allocate
resources efficiently.
The same type of cloud VMs provide different per-
formance due to many reasons, including hardware het-
erogeneity, virtualization overhead, and resource sharing
effects [11], [12], [33]. Currently, to the best of our knowl-
edge, there is no work that considers performance-varying
VMs in auto-scaling methods for cloud-hosted multi-tier
web applications. However, there are relatively few efforts
that consider performance varying VMs in batch processing
workloads. For example, Adam et al. [10] proposed an
optimization model using mixed-integer programming to
obtain predictable application performance using perfor-
mance varying VM instances.
The existing state-of-the-art predictive auto-scaling
methods for multi-tier web applications either use
analytical modeling [34], [35] or reinforcement learning [1],
[2], [15], or depend on fine-grained low-level hardware
resource monitoring [36]. Moreover, these methods assume
similar performance for the same type of VMs. The research
reported in the current manuscript considers performance
inconsistencies and variations of the same type of VMs to
automatically scale multi-tier web applications hosted on
the cloud with minimal resources.
3 PROPOSED SYSTEM OVERVIEW
The overall design of the proposed system is illustrated in
Figure 3. Different steps are labeled to show the overall flow
of the system. The proposed approach works in three steps
as follows:
1) First, a Random Decision Forest (RDF) classifier is
trained offline using historical administrator-defined
provisioning policies. Then, the system extracts a state
transition configuration map from the RDF classifier,
named the RDF Map, encoding an ordering of the configurations
that is invariant to the underlying VMs' performance
variations. At any time interval, the RDF Map is used
to identify the appropriate resources required by the
application to satisfy the application performance in a
relative fashion. In Section 4, we explain the learning of
the RDF Map based on the administrator provisioning
policies.
2) Second, the system continuously monitors the application
access logs at regular time intervals and identifies
the arrival rate and response time of the application
for each time interval to predict possible performance
violations. For the current time interval $t$, the system
predicts the arrival rate for the next time interval $t+1$
using the $k$ previous arrival rate observations and a linear
regression model. Then the predicted arrival rate $\hat{a}_{t+1}$
is used to predict the application response time $\hat{r}_{t+1}$
for the next time interval. We explain this process in
Section 5.1.
3) Finally, if the predicted response time is higher than the
user-defined response time SLO threshold $\tau_{slo}$, then the
system looks up the next configuration from the RDF
Map and automatically provisions the additional resources
required to satisfy the desired application response
time.
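The decision logic of step 3 can be sketched as follows; this is a minimal illustration, assuming the RDF Map is represented as an ordered list of configuration tuples (web VMs, DB VMs). All names here are illustrative, not part of the implementation:

```python
def scaling_decision(pred_response_time, tau_slo, rdf_map, current_config):
    """If the predicted response time violates the SLO threshold, transition to
    the next configuration in the RDF Map; otherwise keep the current one."""
    if pred_response_time >= tau_slo:
        idx = rdf_map.index(current_config)
        # Move to the next (larger) configuration, if one exists.
        return rdf_map[min(idx + 1, len(rdf_map) - 1)]
    return current_config

# Hypothetical RDF Map: an ordered list of (web VMs, DB VMs) configurations.
rdf_map = [(1, 1), (2, 1), (2, 2), (3, 2)]
print(scaling_decision(250, 200, rdf_map, (1, 1)))  # scales to the next configuration
```

Because the lookup depends only on the current configuration's position in the ordering, the decision is unaffected by how fast the underlying VMs happen to be at any moment.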
The traditional machine learning based auto-scaling
methods are trained in a rigid fashion based on histori-
cal observations including arrival rate, response time, and
resources utilization. Since the performance of VMs is
inconsistent, such approaches do not accurately learn the
relationship between the number of required VMs and the
arrival rate. Therefore as the performance of VMs varies,
previously trained models become obsolete. Hence, these
methods require frequent retraining to accurately predict
the appropriate required resources to handle the incoming
workload. In contrast, our proposed system identifies the
resource configurations required to maintain the applica-
tion’s performance despite variations of VMs performance.
If the current resources configuration is not yielding the
required performance, then the system transitions to the
next configuration retrieved from the RDF Map to provision
additional resources to the application. The RDF Map is
learned from the historical autoscaling policies enacted by
the administrator such that the relative ordering between
different provisioning configurations remains valid despite
the performance variations of the underlying VMs. The
intuition behind the proposed system is to eliminate the
dependency on input parameters such as the arrival rate and
response time when learning the auto-scaling policies. This
enables graceful application auto-scaling using performance
varying VMs without frequent model retraining.
4 LEARNING MULTI-TIER APPLICATIONS AUTO-SCALING POLICY
It is challenging to identify the best configurations to re-
solve performance issues in cloud-hosted applications [37].
Traditionally, action policies are used to define the best
configurations (number of VMs for each tier) for different
workloads to provision the resources experiencing bottle-
necks [38], [39]. In the proposed approach, we learn the ap-
propriate configurations to scale different workloads based
on the administrator-defined scaling policies. Table 1 briefly
explains the action policies used to obtain the training
datasets used for predictive auto-scaling methods. Alter-
natively, reinforcement learning-based methods can also be
used to explore different resource provisioning policies for
automatic predictive auto-scaling [2], [40]. During the ex-
ploration phase of reinforcement learning, random actions
are taken over time, and system response is recorded to
identify the best actions for a given state. With random
actions, the exploration phase may require a significantly
long time to cover the whole action state-space. In contrast,
we propose predetermined actions that span the action
state-space in minimal time.
TABLE 1: Initial administrator-defined scaling policies.
P-1: Scale-out all tiers on every SLO violation. Whenever
we observe an SLO violation, we scale-out both the web
and DB tiers simultaneously.
P-2: Scale-out only the web tier on every SLO violation.
Whenever we observe an SLO violation, we only scale-out
the web tier.
P-3: Scale-out only the database tier on every SLO violation.
Whenever we observe an SLO violation, we only scale-out
the database (DB) tier.
P-4: Scale-out the web and DB tiers alternately on every
SLO violation. For example, scale-out the web tier on the
first SLO violation, the DB tier on the second, the web tier
on the third, and the DB tier on the fourth.
P-5: Scale-out with a specific policy derived from the
previous four policies. Scale-out the web tier on the first
violation, both tiers on the second violation, the web tier on
the third violation, and both tiers on the fourth SLO
violation.
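The five policies can be summarized as a small decision function. This sketch is our own illustrative encoding of Table 1, mapping the n-th SLO violation to the set of tiers to scale out:

```python
def tiers_to_scale(policy, violation_count):
    """Return which tiers to scale out on the n-th SLO violation (1-indexed),
    following the administrator-defined policies of Table 1."""
    if policy == "P-1":
        return {"web", "db"}          # both tiers on every violation
    if policy == "P-2":
        return {"web"}                # web tier only
    if policy == "P-3":
        return {"db"}                 # database tier only
    if policy == "P-4":               # alternate: web, db, web, db, ...
        return {"web"} if violation_count % 2 == 1 else {"db"}
    if policy == "P-5":               # derived: web, both, web, both, ...
        return {"web"} if violation_count % 2 == 1 else {"web", "db"}
    raise ValueError(f"unknown policy: {policy}")
```

Running each policy against an increasing workload produces the labeled training intervals used in Section 4.1.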
4.1 Using Random Decision Forest for Learning Provi-
sioning Configurations
We generated an increasing workload and scaled a bench-
mark web application using various administrator-defined
action policies. For each policy, we gathered data including
the number of requests, the 95th percentile of response
time, and the configuration (the number of VMs allocated
to each tier) for each time interval.
Fig. 4: 95th percentile of response time using the five
administrator-defined scaling policies. Each policy is suspended
once two consecutive scale-out decisions fail to
restore the desired response time.
Fig. 5: Prediction accuracy of AdaBoost (ADB), Support
Vector Machine (SVM), Naive Bayes (NB), K-Nearest
Neighbors (KNN), and Random Decision Forest (RDF)
classifiers on the test dataset for learning resource
configurations.
Then we use
this data to learn the provisioning configurations using a
supervised learning approach based on Random Decision
Forests (RDF). We discarded all the intervals showing SLO
violations to ensure the learner only learns the relationship
of the number of requests and the response time with
appropriate configurations (the number of VMs for each
tier). For an $n$-tier application, each configuration is labelled
as $C_{t_1,t_2,\cdots,t_n}$, where $t_1, t_2, \cdots, t_n$ are
the numbers of VMs allocated to the specific tiers. For example,
the label $C_{5,3}$ for a two-tier web application represents
five web server VMs and three database server VMs.
We consider each configuration as a distinct class, and
use RDF for classification using the resources configura-
tion, number of requests, and response time as the feature
vector. The use of RDF is motivated by many favorable
attributes. For example, RDFs are very fast and highly accurate
multi-class classifiers [41], trainable with relatively small
datasets. The workload classification problem addressed in
this manuscript fits well with the RDF framework because
the number of distinct configurations is quite large and each
configuration is considered as an independent class. The
other commonly used classifiers such as SVM, AdaBoost,
Naive Bayes, and KNN have exhibited a lower performance
on our dataset as shown in Figure 5.
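As a concrete illustration, a classifier of this kind can be sketched with scikit-learn's `RandomForestClassifier`. The tiny dataset below is fabricated purely for illustration; the hyperparameters (10 trees of depth 4) follow the values shown in Figure 3:

```python
from sklearn.ensemble import RandomForestClassifier

# Each row: (requests per interval, 95th-percentile response time in ms).
# Each label: the configuration "C{web},{db}" that satisfied the SLO at that load
# (illustrative values, not measurements from the paper's experiments).
X = [[100, 80], [200, 120], [300, 150], [400, 170], [500, 180], [600, 190]]
y = ["C1,1", "C1,1", "C2,1", "C2,1", "C2,2", "C3,2"]

# Hyperparameters as in Fig. 3: 10 trees, each of maximum depth 4.
rdf = RandomForestClassifier(n_estimators=10, max_depth=4, random_state=0)
rdf.fit(X, y)

# Predict a configuration for an unseen (arrival rate, response time) pair.
print(rdf.predict([[250, 140]])[0])
```

Each distinct configuration label is treated as its own class, so the forest's majority vote directly yields a provisioning configuration.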
RDFs are ensembles of multiple classifiers, where each
classifier is a decision tree [42], [43]. Each decision tree
contains two types of nodes: split nodes and leaf nodes.
Binary partitioning is performed at a split node using the
value of a specific feature dimension $v[j]$ such that if
$v[j] \le \tau$, where $\tau$ is the learned threshold for
that node, the sample is assigned to the left partition;
otherwise it is assigned to the right partition. For the case
of linearly separable classes, $\log_2(c)$ decisions are
required to separate each class from the remaining $c-1$
classes. For a test feature vector, each tree in the forest
independently predicts its label, and the final decision is
taken by majority voting. We train each tree in the RDF by
randomly selecting $4/5$ of the training data, while the
remaining $1/5$ is used for validation.
For each split node, a random feature subset of size
$\sqrt{m}$ is selected, where $m$ is the feature cardinality.
Then we search for the best feature $feat[j]$ in this subset
along with an associated threshold $\tau_j$ so that we minimize
the number of configurations divided by the partition. More
precisely, we want the partition boundaries to maximally
match the actual configuration (class) boundaries, and
not to cross or divide any configuration. We ensure this
criterion by maximizing the entropy reduction after each
partition. Reduction in entropy is also known as information
gain. Let $h(q)$ be the entropy of the raw training data,

$$h(q) = -\sum_{i}^{c} \frac{|cfg_i|}{|q|} \log_2 \frac{|cfg_i|}{|q|}, \tag{1}$$

where $|cfg_i|$ is the number of samples in the $i$-th configuration and
$|q|$ is the number of samples in the training data. Let
$h(q \mid \{feat[j], \tau_j\})$ be the entropy after partitioning the data into two
partitions, $q_r$ and $q_l$:

$$h(q \mid \{feat[j], \tau_j\}) = \frac{|q_r|}{|q|} h(q_r) + \frac{|q_l|}{|q|} h(q_l), \tag{2}$$

where $|q_r|$ and $|q_l|$ are the numbers of data points in the right
and the left partitions. The entropy $h(q_r)$ is given by

$$h(q_r) = -\sum_{j}^{c} \frac{|cfg_j|}{|q_r|} \log_2 \frac{|cfg_j|}{|q_r|}, \tag{3}$$

where $|cfg_j|$ is the number of samples of configuration $j$ in
the right partition $q_r$. The information gain given a partition is
defined as

$$g(q \mid \{feat[j], \tau_j\}) = h(q) - h(q \mid \{feat[j], \tau_j\}). \tag{4}$$
The feature and threshold selection is performed to
maximize the gain. Partitions containing a single configura-
tion are considered as the leaf nodes, and the partitioning
process stops. The partitions containing more than one
configuration are further considered. If the maximum tree
height is reached while some partitions still contain multi-
class labels, majority voting is used to find the partition
label.
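The entropy and information-gain computations of Eqs. (1)-(4) can be sketched directly in Python, treating each partition as a list of configuration labels (function names are ours):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy h(q) of a set of configuration labels, as in Eq. (1)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, left, right):
    """Information gain g(q | {feat[j], tau_j}) of splitting `labels` into
    the left and right partitions, following Eqs. (2)-(4)."""
    n = len(labels)
    split_entropy = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - split_entropy
```

A split whose boundary coincides with a configuration boundary drives the per-partition entropies to zero, so the gain equals the parent entropy, which is exactly the criterion the split search maximizes.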
Compared to deep neural networks, an RDF can be
trained on significantly smaller datasets. The size of the
training data depends on the maximum possible set of system
configurations (that is the number of classes), the number of
trees in the RDF, and the depth of each tree. The sufficiency
of training data can be easily established by using a test/validation
set, which is not used during the training phase.
If the accuracy of RDF on the training data is high but low
on the test/validation set, that means the generalization of
RDF is poor and more training data is required. RDF can
easily scale to hundreds of classes and can achieve quite a
good accuracy depending upon the separability of the data.
The training process of RDF is optimized and can be done
in a few minutes. Online retraining of RDF has also been
proposed and can be used to improve the learned classifier
based on the system response [44], [45].
5 PROPOSED AUTO-SCALING METHOD
An automatic multi-tier application scaling method needs
to address two critical problems. First, it has to identify an
appropriate time to initiate the scaling process. Second, it
has to scale specific tiers to resolve the bottleneck to satisfy
the response time SLO using minimal resources. In our
proposed method of multi-tier web application scaling, we
address both of these problems as explained below. Algo-
rithm 1 presents the overall proposed auto-scaling method.
5.1 The Scaling Trigger
The scaling trigger or ‘when to scale?’ decision can be per-
formed using a reactive or a predictive trigger as explained
below.
5.1.1 Reactive Trigger
A reactive scale-out trigger relies on specific application
metrics such as CPU, memory, I/O, bandwidth, and applica-
tion response time. Thresholds on one or more of these met-
rics can be used to trigger the scaling operations. We observe
that the application response time is the most appropriate
metric to trigger a scale-out decision, because the response
time is one of the most important SLOs to maintain for
cloud-hosted web applications and it incurs low overhead
compared to monitoring low-level hardware utilization.
Reactive triggering occurs whenever the 95th percentile of
the response time exceeds a pre-defined threshold ($\tau_{slo}$) for
a specific time interval.
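A minimal sketch of the reactive trigger, assuming a nearest-rank 95th percentile over the response times observed in one monitoring interval (the 200 ms default mirrors the threshold used in our experiments):

```python
from math import ceil

def reactive_trigger(response_times_ms, tau_slo_ms=200):
    """Reactive scale-out trigger: fires when the nearest-rank 95th percentile
    of response times in the monitoring interval exceeds the SLO threshold."""
    ordered = sorted(response_times_ms)
    p95 = ordered[ceil(0.95 * len(ordered)) - 1]  # nearest-rank percentile
    return p95 > tau_slo_ms
```

Using the 95th percentile rather than the mean prevents a few fast requests from masking a saturated tail.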
5.1.2 Predictive Trigger
A predictive trigger relies on predicting in advance the
saturation of the application response time and initiating the
process to provision resources before the bottleneck occurs.
The predictive trigger has a definite advantage over the reactive
one because, once a bottleneck occurs, the application
response time increases exponentially.
In predictive triggering, we predict that a bottleneck is
going to occur in the next interval and we start provisioning
resources in advance to avoid SLO violation.
Our proposed predictive triggering method for scale-out
consists of two main steps. First, we predict the arrival rate
of the application for the next time interval based on the
last $k$ monitoring intervals using the least squares regression
method. Then we apply a polynomial regression model to
predict the application response time using data from the last
$k$ intervals. More specifically, our arrival rate and response
time prediction methods are as follows.
Let the arrival rates for the last $k$ time windows, each
considered as an interval, be $\mathbf{a}^{t}_{t-k} = [a_{t-k} \cdots a_{t-2}\ a_{t-1}\ a_t]^{\top}$.
Assuming that the variations in the arrival rate are locally
linear, using the value for the last interval we can estimate
the arrival rate for the next interval as $\hat{a}_{t+1} = \alpha_t a_t + \beta_t$, where
the regression parameters $\alpha_t$ and $\beta_t$ are estimated over the
last $k$ intervals:

$$\begin{bmatrix} 1 & a_{t-1} \\ 1 & a_{t-2} \\ \vdots & \vdots \\ 1 & a_{t-k} \end{bmatrix} \begin{bmatrix} \beta_t \\ \alpha_t \end{bmatrix} = \begin{bmatrix} a_t \\ a_{t-1} \\ \vdots \\ a_{t-k+1} \end{bmatrix} \tag{5}$$

Applying the pseudo-inverse solution to find the parameters
minimizing the sum of squared errors:

$$\begin{bmatrix} \beta_t \\ \alpha_t \end{bmatrix} = \left( \begin{bmatrix} 1 & a_{t-1} \\ 1 & a_{t-2} \\ \vdots & \vdots \\ 1 & a_{t-k} \end{bmatrix}^{\top} \begin{bmatrix} 1 & a_{t-1} \\ 1 & a_{t-2} \\ \vdots & \vdots \\ 1 & a_{t-k} \end{bmatrix} \right)^{-1} \begin{bmatrix} 1 & a_{t-1} \\ 1 & a_{t-2} \\ \vdots & \vdots \\ 1 & a_{t-k} \end{bmatrix}^{\top} \begin{bmatrix} a_t \\ a_{t-1} \\ \vdots \\ a_{t-k+1} \end{bmatrix} \tag{6}$$

Using the learned regression parameters, we can estimate
the arrival rate in the next interval as

$$\hat{a}_{t+1} = \alpha_t a_t + \beta_t \tag{7}$$

For the response time estimation, we observe that the ordinary
least squares solution does not yield satisfactory results.
We therefore use polynomial regression to estimate the next-interval
response time from the next-interval estimated
arrival rate. The response time at the $t+1$ interval, $\hat{r}_{t+1}$, can
be estimated from the estimated arrival rate $\hat{a}_{t+1}$ using the
polynomial given by

$$\hat{r}_{t+1} = \sum_{x=0}^{n} \alpha_x (\hat{a}_{t+1})^{x} \tag{8}$$

where $n$ is the order of the polynomial. The coefficients of
the polynomial are computed using the pseudo-inverse solution
minimizing the sum of squared errors over the last $k$ intervals:

$$\begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix} = \left[ R^{\top} R \right]^{-1} R^{\top} \begin{bmatrix} r_t \\ r_{t-1} \\ \vdots \\ r_{t-k} \end{bmatrix} \tag{9}$$

where the matrix $R$ is defined as

$$R = \begin{bmatrix} 1 & a_t & a_t^2 & \dots & a_t^n \\ 1 & a_{t-1} & a_{t-1}^2 & \dots & a_{t-1}^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & a_{t-k} & a_{t-k}^2 & \dots & a_{t-k}^n \end{bmatrix} \tag{10}$$
If the predicted response time approaches the user-defined
response time SLO threshold ($\hat{r}_{t+1} \ge \tau_{slo}$), then the
scale-out process is triggered.
5.2 Learning Scaling Decisions: What to Scale?
In Section 4 we proposed a solution to predict the
required provisioning configuration satisfying the response
time SLO using a Random Decision Forest (RDF) classifier.
The classifier is trained on historical access logs collected
under the different administrator-defined policies, shown in
Table 1, used to build the training set. The resulting data is
used to learn the classifier to predict the appropriate configuration.
Each configuration has a particular set of associated
resources, in terms of VMs to provision for each tier, on scale-out
triggers. The scaling policy is learned in the form
of an RDF based on the application's arrival rate, response
time, and a label representing the currently provisioned VMs
for each tier.
We prepare a state transition map (RDF Map) extracted from the learned RDF. To prepare it, we linearly increase the arrival rate of the application up to a specified maximum and use a constant response time value to obtain the configuration label representing the number of VMs required for each tier to satisfy the given response time and arrival rate. We keep increasing the number of requests while using the same response time to identify the configuration at each step. Whenever a new configuration is identified, it is stored in the RDF Map. After reaching the maximum arrival rate limit, we obtain a map that identifies the next configuration of the application using only the current configuration. This RDF Map does not depend on any specific arrival rates and represents the application state transitions extracted from the RDF learned using different resource provisioning policies. Therefore, the RDF Map can handle the performance variations of the VMs because, at every step, we can move to the next resource configuration that has historically yielded better performance.
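The RDF Map construction described above can be sketched as follows. Here `predict_config` stands in for the trained RDF classifier, and `toy_classifier` with its thresholds is purely illustrative, not learned from any data:

```python
def build_rdf_map(predict_config, max_rate, step, target_rt):
    """Sweep the arrival rate up to max_rate at a fixed target response
    time and record each newly identified configuration as a transition:
    current configuration -> next configuration."""
    transitions = {}
    current = predict_config(0, target_rt)
    for rate in range(step, max_rate + 1, step):
        cfg = predict_config(rate, target_rt)
        if cfg != current:            # a new configuration appears
            transitions[current] = cfg
            current = cfg
    return transitions

# Stand-in for the trained RDF classifier: maps (arrival rate, response
# time) to a (web VMs, DB VMs) configuration.  Thresholds are illustrative.
def toy_classifier(rate, rt):
    if rate < 100:
        return (1, 1)
    if rate < 200:
        return (2, 1)
    return (2, 2)

rdf_map = build_rdf_map(toy_classifier, max_rate=300, step=10, target_rt=200)
# rdf_map == {(1, 1): (2, 1), (2, 1): (2, 2)}
```

The resulting map is keyed only by the current configuration, which is what makes it insensitive to the specific arrival rate at which a scale-out trigger fires.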
5.3 Overall Proposed Algorithm
Algorithm 1 presents the pseudocode for our proposed auto-scaling method using the predictive scale-out trigger. The algorithm accepts a time interval in seconds to profile and monitor the proxy logs ($\xi$), a user-defined response time SLO threshold ($\tau_{slo}$) used to identify SLO violations, a set of initially provisioned VMs ($C^0_{t_1,t_2,\dots,t_n}$) representing the current allocation to each tier of the application, and the size of the look-back window ($k$) used for predicting the arrival rate and the response time.
In each iteration, it waits for $\xi$ seconds to monitor the application access logs and computes the total number of requests (arrival rate) and the 95th percentile of the response time for all requests received during the last $\xi$ seconds. These steps (lines 3-5) are repeated $k$ times to collect enough data to predict the arrival rate and the response time for the next interval. Once the current interval index exceeds $k$, the algorithm predicts the arrival rate using Equation (7) and then uses this predicted arrival rate to predict the application response time using Equation (8). If the predicted response time is greater than the user-specified response time SLO threshold, we obtain the next application configuration using the RDF Map extracted from the learned RDF, as mentioned at line 10 and explained in Section 4. Using the difference between the provisioned VMs and the VMs of the next configuration, we identify the resources required for each tier to update the current configuration (line 11). Finally, the algorithm automatically provisions these resources to each application tier (line 12).
In our experimental evaluation, we used $\xi = 60$ seconds, $\tau_{slo} = 200$ milliseconds, a two-tier application with initially 1 VM allocated to each tier ($C^0_{t_1,t_2,\dots,t_n} = C^0_{1,1}$), and $k = 3$ as the size of the look-back window used for predicting the arrival rate and the application response time.
Algorithm 1: Proposed Auto-scaling Method with Predictive Trigger.

Input:  $\xi$: application monitoring time interval in seconds
        $\tau_{slo}$: response time SLO threshold
        $C^0_{t_1,t_2,\dots,t_n}$: set of initially provisioned VMs
        $k$: size of look-back window for prediction
Output: $C^{t+1}_{t_1,t_2,\dots,t_n}$: updated set of provisioned resources

1   t ← 1
2   while true do
3       Wait for ξ seconds and monitor the application access logs
4       a_t ← compute the arrival rate for the current interval
5       r_t ← compute the 95th percentile of response time for the current interval
6       if t > k then
7           â_{t+1} ← predict the arrival rate using Equation (7)
8           r̂_{t+1} ← predict the response time using Equation (8)
9           if r̂_{t+1} > τ_slo (SLO violation) then
10              C^{(t+1)}_{t1,t2,...,tn} ← next configuration from the RDF Map
11              (Δt1, Δt2, ..., Δtn) ← C^{(t+1)}_{t1,t2,...,tn} − C^{(t)}_{t1,t2,...,tn}   (identify more resources to provision)
12              Provision (Δt1, Δt2, ..., Δtn) more VMs for each tier
13              C^{(t)}_{t1,t2,...,tn} ← C^{(t+1)}_{t1,t2,...,tn}
14          end
15      end
16      t ← t + 1
17  end
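Lines 6-13 of Algorithm 1 reduce to a single decision step per interval. A hedged sketch, with names of our own choosing; `provision` stands in for the actual cloud provisioning call:

```python
def autoscale_step(r_pred, tau_slo, current_cfg, rdf_map, provision):
    """One pass of Algorithm 1's lines 6-13: on a predicted SLO violation,
    look up the next configuration in the RDF Map and provision the
    per-tier difference in VM counts."""
    if r_pred > tau_slo and current_cfg in rdf_map:            # line 9
        next_cfg = rdf_map[current_cfg]                        # line 10
        deltas = tuple(n - c for c, n in zip(current_cfg, next_cfg))  # line 11
        provision(deltas)                                      # line 12
        return next_cfg                                        # line 13
    return current_cfg

# Illustrative usage with a two-tier configuration (web VMs, DB VMs):
provisioned = []
new_cfg = autoscale_step(240, 200, (1, 1), {(1, 1): (3, 2)},
                         provisioned.append)
# new_cfg == (3, 2); provisioned == [(2, 1)]
```

Because the step consumes only the current configuration and the map, it does not need to re-learn anything when the underlying VMs' performance drifts.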
6 EXPERIMENTAL SETUP AND DESIGN
6.1 Benchmark Web Application
We used RUBiS [17] as a sample web application to evaluate
our proposed auto-scaling method. It is an open-source
benchmark web application widely used for experimental
evaluations. RUBiS is an auction application similar to eBay. It enables users to browse, bid, buy, and sell items. Three user roles are available in RUBiS: buyer, seller, and visitor. The visitor role allows users to browse items without creating a user account in the application. We used the PHP and MySQL multi-tier implementation of RUBiS for our experimental evaluation.
6.2 Workload Generation
We used httperf [46], an open-source HTTP load generation
tool, to generate the workload for the sample benchmark
web application. Our workload generation method emulates increasing workload behavior, generating HTTP traffic for a specific duration with a required number of concurrent user sessions per second in a step-up fashion.
Fig. 6: Test-bed cloud infrastructure used for multi-tier web
application auto-scaling experiments.
A user session represents a visitor that browses items available for auction in different categories and geographical regions. Each user generates 22 HTTP requests to the benchmark web application. We started the workload generation with 40 concurrent users per second and then increased the number of concurrent users every 60 seconds. We generated this workload for 55 minutes.
6.3 Testbed Infrastructure
Amazon Web Services (AWS) offers cloud data centers in multiple geographical locations around the world. We used the AWS us-east-2 data center for our experimental evaluations.
Figure 6 shows the experimental infrastructure used during
the experiments. We use AWS general purpose on-demand
T2 VM instances for different roles. For example, a dedi-
cated EC2 t2.large VM instance is used to generate the
workload for the application. Another VM instance of type
t2.medium is used as a load balancer for the web tier.
We used a set of dynamically provisioned EC2 instances
of type t2.small for the web tier. A dedicated VM instance of type t2.medium is used as a load balancer for the database tier. Another set of dynamically provisioned EC2 instances of type t2.small is used for the database tier. The dynamically provisioned pools for the web and database tiers always contain at least one VM instance. The auto-scaling methods used in the different experiments control the number of VMs in these dynamically provisioned pools.
any resource saturation for the workload generator, the web
tier load balancer, and the database tier load balancer VMs
during all of the experiments. The t2.large VM instance
provides 2 vCPU with 8 GB memory, the t2.medium
VM instance provides 2 vCPU with 4 GB memory, and
the t2.small VM instance provides 1 vCPU with 2 GB
memory.
Performance variations are minimal in larger VM instance types, while smaller instances are more prone to performance variations [10]. Therefore, in our experimental testbed, we used the medium and small VM instance types.
6.4 Experimental Details
We performed four experiments using different methods for auto-scaling multi-tier web applications on the AWS cloud. We repeated each experiment three times. Table 2 briefly explains each experiment.
In Experiment 1, we used a method similar to that of Urgaonkar et al. [18], which dynamically scales out both tiers whenever the 95th percentile of the application response time exceeds the SLO threshold. We consider this the Baseline Auto-Scaling (BAS) method.
In Experiment 2, we developed a scaling policy based on a Random Decision Forest (RDF) classifier mapping the arrival rate to the number of VMs required for each tier to satisfy the response time SLO. The RDF classifier was trained on the administrator-defined scaling policies dataset. At every time interval, we decide the resources to provision based on the arrival rate and the learned RDF classifier. We name this the Policy-based Auto-Scaling (PAS) method.
In Experiment 3, we use our proposed auto-scaling method with a Reactive Trigger (ART) to initiate the scaling process. The proposed method dynamically provisions the required resources for each tier using the next application configuration obtained from the RDF Map extracted from the learned RDF, as explained in Section 4. The method initiates the scaling process whenever the 95th percentile of the application response time exceeds the SLO threshold.
In Experiment 4, we use our proposed auto-scaling method with a Predictive Trigger (APT) to initiate the scaling process. The proposed method dynamically provisions the required resources for each tier using the next application configuration obtained from the RDF Map extracted from the learned RDF, as explained in Section 4. The method triggers the scaling process whenever the predicted response time approaches the response time SLO threshold.
7 RESULTS AND EVALUATIONS
7.1 Experiment 1: Baseline Auto-Scaling (BAS) Method
Figure 7 shows the 95th percentile response time, the throughput, and the dynamic allocation of VMs during Experiment 1 using the BAS method. We repeated the experiment three times under the same increasing workload. Iteration 1 of this experiment shows the first response time SLO violation at the 9th time interval, iteration 2 at the 19th, and iteration 3 at the 24th. At every SLO violation, the system automatically scales up both tiers and the response time is restored to an acceptable level. Whenever we observe a response time saturation,
TABLE 2: Summary of conducted experiments

| Experiment/Method | Description |
|---|---|
| Experiment 1: Baseline Auto-Scaling (BAS) method | Dynamically scale out both tiers whenever the 95th percentile of the application response time exceeds the SLO threshold. |
| Experiment 2: Policy-based Auto-Scaling (PAS) using RDF | Dynamically scale out specific tiers using a scaling policy based on a Random Decision Forest (RDF) classifier mapping the arrival rate to the number of VMs required for each tier to satisfy the response time SLO. |
| Experiment 3: Proposed Auto-Scaling Reactive Trigger (ART) | Dynamically scale out specific tiers using the proposed method to decide the appropriate VMs required whenever the 95th percentile of the application response time exceeds the SLO threshold. |
| Experiment 4: Proposed Auto-Scaling Predictive Trigger (APT) | Dynamically scale out specific tiers using the proposed method to decide the appropriate VMs required whenever the system predicts that the application response time will exceed the SLO threshold. |
[Figure omitted: panels show the 95th percentile response time (ms) with the SLO threshold, throughput (requests/min x 1000), and the number of web and DB VMs over 55 minutes for Iterations 1-3.]
Fig. 7: Experiment 1 results using Baseline Auto-Scaling (BAS) method.
the application throughput also decreases. However, after the dynamic provisioning of more VMs, the application throughput starts growing linearly again. In iteration 1, a total of 10 VMs were dynamically provisioned: five VMs allocated to the web tier and five to the DB tier. In iteration 2, a total of six VMs were dynamically provisioned, with three VMs added to each tier. In iteration 3, a total of four VMs were dynamically provisioned, with two VMs allocated to each tier.
We observe different performance across the three iterations of this experiment, mainly due to the performance-varying behavior of the VMs. However, the baseline scaling method appropriately provisioned the resources to restore the desired application response time. Iteration 3 gives the best performance, with minimal SLO violations while using the minimal number of VMs. It also yielded approximately linear growth in the application throughput. This shows that the VMs allocated in iteration 3 performed better than the VMs allocated in the other iterations.
[Figure omitted: panels show the 95th percentile response time (ms) with the SLO threshold, throughput (requests/min x 1000), and the number of web and DB VMs over 55 minutes; VM allocations were identical across all three iterations.]
Fig. 8: Experiment 2 results using Policy-based Auto-Scaling (PAS) method.
7.2 Experiment 2: Policy-based Auto-Scaling (PAS) Using RDF
Figure 8 shows the 95th percentile response time, the throughput, and the dynamic allocation of VMs during Experiment 2 using the PAS method. We repeated the experiment three times under the same increasing workload. Iterations 1 and 2 of this experiment showed the first saturation at the 22nd time interval, while iteration 3 showed the first saturation at the 14th time interval. The system scales the tiers using the learned policy based on the number of requests. The scaling decisions in all three iterations remained the same because the resource provisioning policy depends only on the number of requests to predict the required number of VMs for each tier. Whenever we observe a response time saturation, the application throughput also decreases. However, once more VMs were provisioned, the throughput starts increasing linearly again. In each iteration of this experiment, a total of 9 VMs were allocated: 5 VMs to the web tier and 4 VMs to the DB tier. We observed the best performance in the second iteration of this experiment, which shows approximately linear growth in
[Figure omitted: panels show the 95th percentile response time (ms) with the SLO threshold, throughput (requests/min x 1000), and the number of web and DB VMs over 55 minutes for Iterations 1-3.]
Fig. 9: Experiment 3 results using the proposed Auto-Scaling with Reactive Trigger (ART) method.
the throughput and minimal response time SLO violations.
7.3 Experiment 3: Proposed Auto-Scaling with Reactive Trigger (ART)
Figure 9 shows the 95th percentile response time, the throughput, and the dynamic allocation of VMs during Experiment 3 using the ART method. We repeated the experiment three times under the same workload. Iteration 1 of this experiment showed the first saturation at the 15th time interval, iteration 2 at the 21st, and iteration 3 at the 14th. The system automatically scales up the application according to the policy learned by the classifier on every saturation. Whenever we observe a response time saturation, the application throughput also decreases. However, the proposed auto-scaling method provisioned appropriate resources, after which the application throughput starts increasing linearly. In each iteration of this experiment, a total of 9 VMs were allocated: 5 to the web tier and 4 to the DB tier. However, the time at which VMs were provisioned varied across the three iterations for both tiers. We observed the best performance in the third iteration of this experiment, which showed approximately linear growth in the throughput and minimal response time SLO violations compared to the other iterations.
7.4 Experiment 4: Proposed Auto-Scaling with Predictive Trigger (APT)
Figure 10 shows the 95th percentile response time, the throughput, and the dynamic allocation of VMs during Experiment 4 using the APT method. We repeated the experiment three times under the same workload. Iteration 1 showed the first saturation at the 12th time interval, iteration 2 at the 20th, and iteration 3 at the 15th. Whenever the predicted response time approached the response time SLO threshold, the system automatically scaled up the application according to the proposed auto-scaling method. We observed fewer SLO violations in all iterations compared to the other experiments. In iteration 1, a total of 9 VMs were dynamically provisioned: 5 VMs to the web tier and 4 to the DB tier. In iterations 2 and 3, a total of 5 VMs were dynamically provisioned: 3 VMs to the web tier and 2 VMs to the DB tier. The best performance was observed during the third iteration, which showed approximately linear growth in the throughput and minimal response time SLO violations.

[Figure omitted: panels show the 95th percentile response time (ms) with the SLO threshold, throughput (requests/min x 1000), and the number of web and DB VMs over 55 minutes for Iterations 1-3.]
Fig. 10: Experiment 4 results using the proposed Auto-Scaling with Predictive Trigger (APT) method.
Figure 11 shows the actual vs. the predicted response time in Experiment 4 for all three iterations using the APT method. The Root Mean Squared Error (RMSE) between the predicted and actual response times is 459.85, 260.37, and 218.08 for iterations 1, 2, and 3, respectively.
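The reported RMSE values follow the standard definition over the per-interval actual and predicted response times; as a short sketch:

```python
from math import sqrt

def rmse(actual, predicted):
    """Root Mean Squared Error between per-interval actual and
    predicted response times (in ms)."""
    assert len(actual) == len(predicted)
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
```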
7.5 Experimental Summary
To compare the performance of all methods included in this study, we compute the response time SLO violations, the total VMs provisioned, the maximum throughput achieved, the total served requests, and the total rejected requests for all three iterations of each auto-scaling method. The comparison is shown in Table 3. The best auto-scaling method is expected to yield the minimum response time SLO violations. Our proposed APT method, using a predictive trigger, has only 1.27% of requests missing the response time SLO. Thus, our proposed method is 40.93% better than the baseline auto-scaling (BAS) method, which is the second-best performing method for SLO violations.
TABLE 3: Experimental results summary. Percentage of SLO violations, maximum achieved throughput, total served requests, total rejected requests, and total allocated VMs for all three iterations of experiments 1, 2, 3, and 4

| Method | Iteration | SLO Violation (%) | Max Throughput | Total Completions | Total Rejections | Total VMs |
|---|---|---|---|---|---|---|
| Exp 1: BAS | 1 | 3.74 | 200,572 | 7,873,375 | 14,125 | 10 |
| | 2 | 1.69 | 237,825 | 8,051,573 | 7,561 | 6 |
| | 3 | 1.03 | 233,614 | 8,226,283 | 5,496 | 6 |
| | Average | 2.15 | 224,003 | 8,050,410 | 9,060 | 7.33 |
| Exp 2: PAS | 1 | 15.17 | 196,143 | 7,631,841 | 23,119 | 9 |
| | 2 | 3.94 | 231,537 | 7,955,545 | 14,682 | 9 |
| | 3 | 18.28 | 207,195 | 7,146,180 | 36,209 | 9 |
| | Average | 12.46 | 211,625 | 7,577,855 | 24,670 | 9.00 |
| Exp 3: ART | 1 | 12.84 | 187,386 | 6,848,224 | 18,670 | 9 |
| | 2 | 7.95 | 214,316 | 7,875,660 | 13,073 | 9 |
| | 3 | 5.09 | 220,256 | 7,612,900 | 18,340 | 9 |
| | Average | 8.63 | 207,319 | 7,445,594 | 16,694 | 9.00 |
| Exp 4: APT | 1 | 2.14 | 212,342 | 7,970,951 | 11,977 | 9 |
| | 2 | 1.63 | 239,531 | 8,207,207 | 5,214 | 5 |
| | 3 | 0.05 | 241,676 | 8,467,888 | 1,820 | 5 |
| | Average | 1.27 | 231,183 | 8,215,348 | 6,337 | 6.33 |
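The per-method "Average" rows in Table 3 are the arithmetic means over the three iterations. A quick sketch reproducing, for example, the BAS averages for SLO violations and total VMs:

```python
def averages(rows):
    """Column-wise arithmetic mean of per-iteration result rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

# BAS iterations from Table 3 as (SLO violation %, total VMs) pairs
bas = [(3.74, 10), (1.69, 6), (1.03, 6)]
avg_slo, avg_vms = averages(bas)
# round(avg_slo, 2) == 2.15 ; round(avg_vms, 2) == 7.33
```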
[Figure omitted: one panel per iteration showing the actual vs. predicted response time (ms) with the SLO threshold over 55 minutes.]
Fig. 11: Actual vs. predicted response times for each monitoring interval for all three iterations during Experiment 4.
A method yielding the maximum throughput is considered the best, as it can serve more concurrent requests than the other methods. Our proposed APT method, using a predictive trigger, processed 231,183 requests per minute, which is 3.1% better than the baseline auto-scaling (BAS) method, the second-best performing method for application throughput. The total number of requests served using our proposed method is 8,215,348, higher than all other methods and 2.05% better than the baseline method, the second-best performer in the total number of requests served.
The total number of rejected requests should be minimal for the best-performing auto-scaling method. The proposed APT method, using a predictive trigger, rejected only 6,337 requests, which is 30.05% better than the next competing method.
Our proposed APT method, using a predictive trigger, used only 6.33 VMs on average over the three iterations of the experiment. This is 13.64% fewer VMs than the baseline method, which is the second best in minimizing the number of VMs. Thus, our proposed method is more cost-effective than all other methods.
The proposed APT method, using a predictive trigger, outperformed all other methods by yielding the minimum SLO violations, the maximum throughput, and the maximum served requests while using the minimum number of VMs among all auto-scaling methods included in this study. These results demonstrate the merits of our proposed approach.
Although the experimental evaluations were carried out using a two-tier system, the proposed auto-scaling method is generic and can be easily applied to n-tier systems. The system needs to be trained on a larger number of tiers; in this case, acquiring the training data for identifying the state transition map will require relatively more time. Increasing the number of tiers increases the number of classes in our context. RDF is a multi-class classifier that can easily handle a large number of classes. Therefore, we do not observe any limitation of RDF in handling the larger number of configurations required for complex systems.
VMs running on cloud infrastructure show performance variations for reasons including virtualization overhead, resource contention, multi-tenancy, and heterogeneous physical infrastructure. Learning auto-scaling policies in this situation becomes complicated. Therefore, our proposed solution, as shown in Experiment 4 (APT), can help to auto-scale applications gracefully on performance-varying VMs. However, initial administrator-defined scaling policies are required to collect sufficient application access logs to train the state transition configuration map. The training process could be improved by employing reinforcement learning to explore different policies automatically.
8 CONCLUSION AND FUTURE WORK
Designing new methods to automatically scale cloud applications using minimal resources to satisfy the desired response time requirements is an active and challenging research topic. Cloud computing features such as on-demand resource provisioning and the pay-per-use model attract application owners to use the cloud to deliver good application performance with minimal resources and reduced operational cost. However, choosing an appropriate resource management method is always a difficult task for multi-tier applications. The task is even harder because it requires application domain knowledge and must effectively handle the performance-varying cloud VMs hosting the application.
In the current work, we address this problem by propos-
ing an approach to dynamically provision multi-tier appli-
cations based on a state transition map extracted from a
predictive classifier trained using some initial administrator-
defined policies. Our proposed method performs auto-
scaling using recent workload behavior to predict the ap-
plication performance. If the predicted response time ap-
proaches the user-defined response time SLO threshold,
then the scale-out process is triggered. We use the learned
application state transition map (RDF Map) to decide the
next resource provisioning configuration. The proposed
method works gracefully in the presence of performance
varying cloud VMs as it does not learn the scaling decisions
based on the arrival rate and the performance of specific VM
instances. Instead, it decides the next provisioning configu-
ration based on a map of optimal configurations extracted
from a predictive classifier trained under similar run-time
conditions using administrator-defined provisioning poli-
cies. The next provisioning configuration is computed by
the map (extracted from predictive classifier) based solely
on the current provisioning configuration (i.e., the current
number of VMs per tier).
Our experimental evaluation shows that the proposed
auto-scaling method using a predictive trigger (APT) out-
performed all other methods, including the baseline and the
traditional machine learning approach. It produced the best results in minimizing the response time SLO violations and maximizing the application throughput and completions, all while reducing the number of VMs used.
We are currently investigating replacing or complementing the administrator-defined policies required for training the RDF classifier with reinforcement learning methods. Moreover, integrating comprehensive workload modeling and an analytic study with the proposed auto-scaling method is also a promising future direction.
ACKNOWLEDGEMENT
This work was made possible by NPRP grant # 9-224-1-
049 from the Qatar National Research Fund (a member of
Qatar Foundation). The statements made herein are solely
the responsibility of the authors.
REFERENCES
[1] L. Yazdanov and C. Fetzer, “Lightweight automatic resource scal-
ing for multi-tier web applications,” in Cloud Computing (CLOUD),
2014 IEEE 7th International Conference on. IEEE, 2014, pp. 466–473.
[2] W. Iqbal, M. Dailey, and D. Carrera, “Unsupervised learning of
dynamic resource provisioning policies for cloud-hosted multitier
web applications,” IEEE Sys J., vol. 10, no. 4, pp. 1435–1446, 2016.
[3] P. Bodik, R. Griffith, C. Sutton, A. Fox, M. I. Jordan, and D. A.
Patterson, “Automatic exploration of datacenter performance
regimes,” in Workshop on Automated control for datacenters and
clouds, ser. ACDC ’09. ACM, 2009, pp. 1–6.
[4] M. Amiri and L. Mohammad-Khanli, “Survey on prediction mod-
els of applications for resources provisioning in cloud,” Journal of
Network and Computer Applications, vol. 82, pp. 93–113, 2017.
[5] W. Iqbal, M. N. Dailey, D. Carrera, and P. Janecek, “Adaptive
resource provisioning for read intensive multi-tier applications in
the cloud,” Future Gen Comp Sys, vol. 27, no. 6, pp. 871–879, 2011.
[6] R. Han, M. M. Ghanem, L. Guo, Y. Guo, and M. Osmond,
“Enabling cost-aware and adaptive elasticity of multi-tier cloud
applications,” Future Gen Comp Sys, vol. 32, pp. 82–98, 2014.
[7] F. Xu, F. Liu, H. Jin, and A. Vasilakos, “Managing performance
overhead of virtual machines in cloud computing: A survey, state
of the art, and future directions,” Proceedings of the IEEE, vol. 102,
no. 1, pp. 11–31, 2014.
[8] F. Xu, F. Liu, and H. Jin, “Heterogeneity and interference-aware
virtual machine provisioning for predictable performance in the
cloud,” IEEE Trans. Computers, vol. 65, no. 8, pp. 2470–2483, 2016.
[9] T. Chen and R. Bahsoon, “Self-adaptive and online QoS modeling
for cloud-based software services,” IEEE Transactions on Software
Engineering, vol. 43, no. 5, pp. 453–475, 2017.
[10] O. Adam, Y. Lee, and A. Zomaya, “Constructing performance-
predictable clusters with performance-varying resources of
clouds,” IEEE Trans Computers, vol. 65, no. 9, pp. 2709–2724, 2016.
[11] Z. Ou, H. Zhuang, J. K. Nurminen, A. Ylä-Jääski, and P. Hui, "Exploiting hardware heterogeneity within the same instance type of amazon ec2," in HotCloud, 2012.
[12] S. Amri, H. Hamdi, and Z. Brahmi, “Inter-vm interference in cloud
environments: A survey,” in Computer Systems and Applications
(AICCSA), International Conference on. IEEE, 2017, pp. 154–159.
[13] M. Wajahat, A. Gandhi, A. Karve, and A. Kochut, "Using machine learning for black-box autoscaling," in Green and Sustainable Computing Conference (IGSC). IEEE, 2016, pp. 1–8.
[14] K. Kim, W. Wang, Y. Qi, and M. Humphrey, “Empirical evaluation
of workload forecasting techniques for predictive cloud resource
scaling,” in Int. Conf on Cloud Computing. IEEE, 2016, pp. 1–10.
[15] X. Bu, J. Rao, and C.-Z. Xu, “A reinforcement learning approach to
online web systems auto-configuration,” in Int. Conf on Distributed
Computing Systems, ser. ICDCS ’09. IEEE, 2009, pp. 2–11.
[16] D. Huang, B. He, and C. Miao, “A survey of resource management
in multi-tier web applications,” IEEE Communications Surveys &
Tutorials, vol. 16, no. 3, pp. 1574–1590, 2014.
[17] OW2 Consortium, “RUBiS: An auction site prototype,” 1999, http:
//rubis.ow2.org/.
[18] B. Urgaonkar, P. Shenoy, A. Chandra, P. Goyal, and T. Wood,
“Agile dynamic provisioning of multi-tier internet applications,”
ACM Transactions on Autonomous and Adaptive Systems, vol. 3, no. 1,
pp. 1–39, 2008.
[19] T. K. Ho, “Random decision forests,” in proceedings of the third
international conference on Document analysis and recognition, 1995.,
vol. 1. IEEE, 1995, pp. 278–282.
[20] P. Bodik, R. Griffith, C. Sutton, A. Fox, M. Jordan, and D. Patterson, "Statistical machine learning makes automatic control practical for internet datacenters," in HotCloud'09: Proceedings of the Workshop on Hot Topics in Cloud Computing, 2009.
[21] H. Liu and S. Wee, “Web server farm in the cloud: Performance
evaluation and dynamic architecture,” in CloudCom ’09: Proceed-
ings of the 1st International Conference on Cloud Computing. Berlin,
Heidelberg: Springer-Verlag, 2009, pp. 369–380.
[22] M. T. Krieger, O. Torreno, O. Trelles, and D. Kranzlmüller, "Building an open source cloud environment with auto-scaling resources for executing bioinformatics and biomedical workflows," Future Generation Computer Systems, vol. 67, pp. 329–340, 2017.
[23] V. A. Farias, F. R. Sousa, J. G. R. Maia, J. P. P. Gomes, and J. C.
Machado, “Regression based performance modeling and provi-
sioning for nosql cloud databases,” Future Generation Computer
Systems, vol. 79, pp. 72–81, 2018.
[24] M. Wajahat, A. Karve, A. Kochut, and A. Gandhi, “Mlscale: A ma-
chine learning based application-agnostic autoscaler,” Sustainable
Computing: Informatics and Systems, 2017, 2017.
[25] V. Persico, D. Grimaldi, A. Pescapè, A. Salvi, and S. Santini, "A fuzzy approach based on heterogeneous metrics for scaling out public clouds," IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 8, pp. 2117–2130, Aug 2017.
[26] A. Ilyushkin, A. Ali-Eldin, N. Herbst, A. V. Papadopoulos, B. Ghit,
D. Epema, and A. Iosup, “An experimental performance evalua-
tion of autoscaling policies for complex workflows,” in Proceedings
of the 8th ACM/SPEC on International Conference on Performance
Engineering. New York, NY, USA: ACM, 2017, pp. 75– 86.
[27] K. Salah, P. Calyam, and R. Boutaba, “Analytical model for elastic
scaling of cloud-based firewalls,” IEEE Transactions on Network and
Service Management, vol. 14, no. 1, pp. 136–146, 2017.
[28] T. Lorido-Botran, J. Miguel-Alonso, and J. A. Lozano, “A review of
auto-scaling techniques for elastic applications in cloud environ-
ments,” J. of grid computing, vol. 12, no. 4, pp. 559–592, 2014.
[29] B. Jennings and R. Stadler, “Resource management in clouds:
Survey and research challenges,” Journal of Network and Systems
Management, vol. 23, no. 3, pp. 567–619, 2015.
[30] S. Wu, B. Li, X. Wang, and H. Jin, “Hybridscaler: Handling
bursting workload for multi-tier web applications in cloud,” in
Parallel and Distributed Computing (ISPDC), 2016 15th International
Symposium on. IEEE, 2016, pp. 141–148.
[31] C. Qu, R. N. Calheiros, and R. Buyya, “A reliable and cost-
efficient auto-scaling system for web applications using heteroge-
neous spot instances,” Journal of Network and Computer Applications,
vol. 65, pp. 167–180, 2016.
[32] A. Ashraf, B. Byholm, and I. Porres, “Prediction-based vm pro-
visioning and admission control for multi-tier web applications,”
Journal of Cloud Computing, vol. 5, no. 1, p. 15, 2016.
[33] J. Dejun, G. Pierre, and C.-H. Chi, “Resource provisioning of
web applications in heterogeneous clouds,” in Proceedings of the
2nd USENIX conference on Web application development. USENIX
Association, 2011, pp. 5–5.
[34] B. Urgaonkar, G. Pacifici, P. Shenoy, M. Spreitzer, and A. Tantawi,
“An analytical model for multi-tier internet services and its ap-
plications,” in ACM SIGMETRICS Performance Evaluation Review,
vol. 33, no. 1. ACM, 2005, pp. 291–302.
[35] Q. Zhang, L. Cherkasova, and E. Smirni, “A regression-based
analytic model for dynamic resource provisioning of multi-tier
applications,” in Autonomic Computing, 2007. ICAC’07. Fourth In-
ternational Conference on. IEEE, 2007, pp. 27–27.
[36] J. Rao and C.-Z. Xu, “Online capacity identification of multitier
websites using hardware performance counters,” IEEE Trans on
Parallel and Distributed Systems, vol. 22, no. 3, pp. 426–438, 2011.
[37] O. Alipourfard, H. H. Liu, J. Chen, S. Venkataraman, M. Yu, and
M. Zhang, “Cherrypick: Adaptively unearthing the best cloud
configurations for big data analytics.” in NSDI, 2017, pp. 469–482.
[38] Y. Jiang, L. R. Sivalingam, S. Nath, and R. Govindan, “Webperf:
Evaluating what-if scenarios for cloud-hosted web applications,”
in Proceedings of the 2016 conference on ACM SIGCOMM 2016
Conference. ACM, 2016, pp. 258–271.
[39] R. Singh, P. Shenoy, M. Natu, V. Sadaphal, and H. Vin, “Analyt-
ical modeling for what-if analysis in complex cloud computing
applications,” SIGMETRICS Perf Eval Rev, vol. 40, pp. 53–62, 2013.
[40] S. M. R. Nouri, H. Li, S. Venugopal, W. Guo, M. He, and W. Tian,
“Autonomic decentralized elasticity based on a reinforcement
learning controller for cloud applications,” Future Generation Com-
puter Systems, vol. 94, pp. 765–780, 2019.
[41] J. Shotton, T. Sharp, A. Kipman, A. Fitzgibbon, M. Finocchio,
A. Blake, M. Cook, and R. Moore, “Real-time human pose recog-
nition in parts from single depth images,” Communications of the
ACM, vol. 56, no. 1, pp. 116–124, 2013.
[42] J. R. Quinlan, “Induction of decision trees,” Machine learning,
vol. 1, no. 1, pp. 81–106, 1986.
[43] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp.
5–32, 2001.
[44] A. Saffari, C. Leistner, J. Santner, M. Godec, and H. Bischof, “On-
line random forests,” in Int. conference on computer vision workshops,
ICCV. IEEE, 2009, pp. 1393–1400.
[45] B. Lakshminarayanan, D. M. Roy, and Y. W. Teh, “Mondrian
forests: Efficient online random forests,” in Advances in neural
information processing systems, 2014, pp. 3140–3148.
[46] D. Mosberger and T. Jin, “httperf: A tool for measuring Web server
performance,” in In First Workshop on Internet Server Performance.
ACM, 1998, pp. 59–67.
Waheed Iqbal is an Assistant Professor at Punjab University College of Information Technology, University of the Punjab, Lahore, Pakistan. He worked as a Postdoc researcher with the Department of Computer Science and Engineering, Qatar University, during 2017–2018. His research interests lie in cloud computing, distributed systems, machine learning, and large-scale system performance evaluation. Waheed received his Ph.D. degree from the Asian Institute of Technology, Thailand.
Abdelkarim Erradi is an Assistant Professor in the Computer Science and Engineering Department at Qatar University. His research and development activities and interests focus on autonomic computing, self-managing systems, and cybersecurity. He leads several funded research projects in these areas. He has authored several scientific papers in international conferences and journals. He received his Ph.D. in computer science from the University of New South Wales, Sydney, Australia. Besides his academic experience, he has 12 years of professional experience as a designer and developer of large-scale enterprise applications.
Muhammad Abdullah received his M.Phil. and BS degrees in Computer Science from the University of the Punjab, Lahore, Pakistan, in 2016 and 2014, respectively. He is currently pursuing a Ph.D. in Computer Science at the University of the Punjab, Lahore, Pakistan. His research interests include cloud computing, machine learning, scalable applications, and system performance management.
Arif Mahmood is an Associate Professor with the Department of Computer Science, Information Technology University, Pakistan. He received his Master's and Ph.D. degrees in Computer Science from the Lahore University of Management Sciences in 2003 and 2011, respectively, with a Gold Medal and academic distinction. He also worked as a Postdoc researcher with Qatar University, and as a Research Assistant Professor with the School of Mathematics and Statistics and with the School of Computer Science and Software Engineering at The University of Western Australia (UWA). His research interests are in computer vision and machine learning. He is currently interested in applications of machine learning in cloud and fog computing.