Proactive Workload Management in
Hybrid Cloud Computing
Hui Zhang, Guofei Jiang, Kenji Yoshihira, and Haifeng Chen
Abstract—The hindrances to the adoption of public cloud computing services include service reliability, data security and privacy, regulatory compliance requirements, and so on. To address those concerns, we propose a hybrid cloud computing model which users may adopt as a viable and cost-saving methodology to make the best use of public cloud services along with their privately-owned (legacy) data centers.
As the core of this hybrid cloud computing model, an intelligent workload factoring service is designed for proactive workload management. It enables federation between on- and off-premise infrastructures for hosting Internet-based applications, and the intelligence lies in the explicit segregation of base workload and flash crowd workload, the two naturally different components composing the application workload. The core technology of the intelligent workload factoring service is a fast frequent data item detection algorithm, which enables factoring incoming requests not only on volume but also on data content, upon a changing application data popularity.
Through analysis and extensive evaluation with real-trace-driven simulations and experiments on a hybrid testbed consisting of a local computing platform and the Amazon Cloud service platform, we show that the proactive workload management technology can enable reliable workload prediction in the base workload zone (with simple statistical methods), achieve resource efficiency (e.g., 78% higher server capacity than that in the base workload zone) and reduce data cache/replication overhead (by up to two orders of magnitude) in the flash crowd workload zone, and react fast (with an X^2 speed-up factor) to the changing application data popularity upon the arrival of load spikes.
Index Terms—Cloud computing, hybrid cloud, workload management, algorithms, load balancing.
I. INTRODUCTION

Cloud Computing, known either as online services such as Amazon AWS [1] and Google App Engine [2], or as the technology portfolio behind such services, features a shared elastic computing infrastructure hosting multiple applications in which IT management complexity is hidden and resource multiplexing leads to efficiency: more computing resources can be allocated on demand to an application when its current workload incurs more resource demand than it was allocated.
Despite its advantages in management simplification and the pay-per-use utility model, Cloud Computing remains in doubt regarding enterprise IT adoption. The concerns about current cloud computing services include service availability & reliability, lack of Service Level Agreements, customer data security & privacy, government compliance regulation requirements, and more¹. For example, a Payment Card Industry Data Security Standard (PCI-DSS) audit is required for e-commerce systems with payment cards involved, and the auditors need a clear physical demonstration of the server infrastructure, software configuration, and network deployment; the outages of the state-of-the-art Amazon cloud services (e.g., the S3 service outage on February 5, 2008 [4]) refresh concerns with the ballyhooed approach of any fully cloud-based computing.

Manuscript received January 31, 2013; revised October 3 and November 26, 2013. The special issue guest editors coordinating the review of this paper and approving it for publication were G. Martinez, R. Campbell, and J. Alcaraz.
The authors are with NEC Laboratories America, Princeton, NJ 08540 (e-mail: {huizhang, gfj, kenji, haifeng}).
Digital Object Identifier 10.1109/TNSM.2013.122313.130448
This paper proposes a hybrid cloud computing model on which enterprise IT customers can base the design and planning of their computing platforms for hosting Internet-based applications with highly dynamic workloads. The hybrid cloud computing model features a two-zone system architecture where the two naturally different components in the aggregated workload of an Internet-based application, base load and flash crowd load, are explicitly separated for individual management. Base load refers to the smaller and smoother workload experienced by the application all the time, while flash crowd load refers to the much larger but transient load spikes experienced at rare times (e.g., the 5-percentile heavy-load time). The base load platform can be set up and managed in the (small) enterprise data center with the expectation of effective planning and high utilization, while the flash crowd load platform can be provisioned on demand through a cloud service by taking advantage of the elastic nature of the cloud infrastructure.
An intelligent workload factoring service is designed as an enabling technology of the hybrid cloud computing model. Its basic function is to split the workload into two parts upon (unpredictable) load spikes, assuring that the base load part remains within plan in volume, and that the flash crowd load part incurs minimal cache/replication demand on the application data it requires. This simplifies the system architecture for the flash crowd load zone and significantly increases the server performance within it. As for the base load zone, workload dynamics are reduced significantly; this makes possible capacity planning with a low over-provisioning factor and/or efficient dynamic provisioning with reliable workload prediction.
We built a video streaming service testbed as a concept system of the hybrid cloud computing model. It has a local cluster serving as the base load zone and the Amazon EC2 infrastructure [1] as the flash crowd zone; the workload factoring service was implemented as a load controller to arbitrate the stream load distribution between the two zones. With analysis, trace-driven simulations, and testbed experiments, we showed that the workload factoring technology can enable reliable workload prediction in the base load zone (with simple statistical methods), achieve resource efficiency (e.g., 78% higher server capacity than that in the base load zone), reduce data cache/replication overhead (by up to two orders of magnitude) in the flash crowd load zone, and react fast (with an X^2 speed-up factor) to the changing application data popularity upon the arrival of load spikes.

¹Please refer to [3] for an interesting discussion.
1932-4537/14/$31.00 © 2014 IEEE
Note that the technologies presented in this paper are by no means a complete solution for the hybrid cloud computing model. There are many technical components omitted from the discussion, such as the load balancing schemes in the two zones, data replication & consistency management in the flash crowd zone, security management for a hybrid platform, and more. We focus on the workload factoring component in this paper as it is a unique functionality requirement in the hybrid cloud computing architecture. In addition, for concreteness of presentation, we describe and evaluate our technologies in the context of video streaming applications throughout the paper.
The rest of the paper is organized as follows. Section II describes the design rationale and architecture of the hybrid cloud computing model. We present the problem model and technical details of the workload factoring mechanism in Section III, and analytic results of the fast frequent data item detection algorithm used by the workload factoring mechanism in Section IV. Section V shows the evaluation results. We present the related work in Section VI, and conclude this paper in Section VII.
II. HYBRID CLOUD COMPUTING MODEL

Our target applications are Internet-based applications with a scale-out architecture; they can duplicate service instances on demand to distribute and process increasing workload. Examples include stateless applications such as the YouTube video streaming service [5] and stateful applications such as GigaSpaces XAP web applications [6]. The design goal of the hybrid cloud computing model is to achieve both resource efficiency and a QoS guarantee upon highly dynamic workloads when hosting these scale-out applications.
A. Design Rationale

For concreteness of presentation, we discuss the design rationale through our observations on the measured workload of a real Internet web service.

Figure 1 shows the dynamics of the hourly workload measured² during a 46-day time period on Yahoo! Video [8], the 2nd largest U.S. online video sharing website [9]. Applying statistical analysis techniques including auto-regressive integrated moving average (ARIMA) modeling and Fourier transform analysis, we observed that there were clear periodic components (exemplified by the workload between July 23 and July 27 in Figure 1) most of the time; however, the big spikes shown in Figure 1 were not predictable. Therefore, resource planning could be effective most of the time, but not always. We also observed that the ratio of the maximum workload to the average load is as high as 5.3 (12.9 if the workload is measured in half-hour intervals), which makes overprovisioning for the peak load highly inefficient. Based on the above observations, we believe an integration of proactive and reactive resource management schemes should be applied to different components of the aggregated application workload over time: proactive management opportunistically achieves resource efficiency through reliable prediction of the base workload seen most of the time, while reactive management quickly responds to sudden workload surges at rare times with the requirement of agile and responsive provisioning.

²The measured workload was the number of video streams served per hour on the Yahoo! Video site. For further details, please refer to [7].

Fig. 1. Video stream workload evolution on the Yahoo! Video site (hourly stream request number, July 17 – August 31, 2008; the marked interval is July 23 – July 27, 2008).

Fig. 2. Popularity comparison (number of requests within 30 minutes vs. file rank) of the stable interval and the bursty interval of the load in Figure 1.
While we cannot predict these unexpected spikes in the workload, it is necessary to learn the nature of the burstiness and find an efficient way to handle it once such events happen. The comparison of the data popularity distribution during one spike interval in Figure 1 with that in the normal interval right before the spike is shown in Figure 2. Clearly, the bursty workload can be seen as two parts: a base workload similar to the workload in the previous normal period, and a flash crowd load that is caused by a few very popular data items (video clips). Actually, this phenomenon is not
Fig. 3. Application hosting platform architecture in the hybrid Cloud Computing Model: the workload factoring component dispatches application requests to the base zone (a dedicated application platform in the local data center, running 100% of the time) and to the flash crowd zone (a shared resource platform on a cloud infrastructure, used x% of the time), each with its own load balancing.
limited to our measurement; it is a typical pattern in flash crowd traffic and is explained by the slash-dot effect (or the Digg effect). This pattern indicates that we may not need a complicated workload dispatching scheme for the flash crowd load part, because most requests will be for a small number of unique data items. As the flash crowd load has extremely high content locality, the operator can make the best of data caching and simply provision extra servers with a best-case capacity estimation based on the maximal cache hit ratio (much higher server capacity than that with a low cache hit ratio, as we will show in Section V-B2). The challenge of workload management here lies in the responsiveness of workload decomposition upon a changing data popularity distribution.
B. Architecture

Figure 3 shows the application hosting platform architecture in the proposed hybrid Cloud Computing Model. It includes two resource zones: a base zone, which is a dedicated application platform in a local data center, and a flash crowd zone, which is an application platform hosted on a cloud infrastructure. The base zone runs all the time and processes the base load of the application. As the base load volume does not vary dramatically after removing the sporadic spikes, the local data center is expected to run in a compact and highly utilized mode, even though small-margin resource overprovisioning is necessary for the application QoS guarantee (Section V-A3 gives evaluation results on this argument). The flash crowd zone is provisioned on demand and expected to be on for transient periods. While the resources in the flash crowd zone might have to be overprovisioned by a large factor (e.g., several times larger than the compute resources provisioned in the base zone), it is expected to be utilized only rarely (X% of the time, e.g., 5%). Each resource zone has its own load balancing scheme for managing the separated workload, and we do not discuss it further in this paper.
At the entry point lies the workload factoring component. The design goals of the workload factoring component are twofold: 1) smoothing the workload dynamics in the base zone application platform and avoiding overloading scenarios through load redirection; 2) making the flash crowd zone application platform agile through load decomposition not only on the volume but also on the requested application data. By selectively dispatching requests for similar (hot) data objects into the flash crowd zone, the workload factoring scheme aims at minimizing the resulting application data cache/replication overhead. This brings multiple benefits for the architecture and performance of the flash crowd zone platform:

- With only a small set of active application data accessed, the data storage component at the flash crowd zone can be designed as a data cache and decoupled from that at the base zone; therefore, the former can be a floating platform and does not have to be tied to the latter through shared physical resources (e.g., a shared SAN, which otherwise has to be provisioned for the peak load).
- With only a small set of active application data served, application servers can reduce their warm-up time significantly with a cold cache, and therefore speed up the overall dynamic provisioning process.
- With only a small set of active application data cached, application servers can maximize their capacity with a high cache hit ratio. Section V-B3 gives some evaluation results on this argument.
- With only a small set of active application data requested, simple load balancing schemes like random or round-robin can perform as well as more complicated schemes exploiting content locality, such as job-size-based dispatching [10].
We will describe the workload factoring scheme in detail in Section III.
C. Discussion

While the motivation of the hybrid cloud computing model originates from dynamic workload management, it also addresses many concerns about the full Cloud Computing model in which customers completely rely on public cloud services for application hosting. For example, enterprise IT legacy infrastructures do not need to be abolished; instead they are powered with the capability to handle the long tail of their workload. Public cloud service availability is no longer so critical given the sporadic utilization (a two-9s availability translates into four 9s if load spikes are defined as the 1-percentile peak load); data security & privacy concerns become less severe as application data are only cached temporarily on public clouds for a short time; and data transfer bottlenecks can be largely alleviated as only a small portion of the application data is replicated on the public cloud.
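The availability argument above reduces to a one-line calculation: if the public cloud is engaged only during load spikes, the application is exposed to cloud outages only for that fraction of time. A minimal sketch (the function name and parameters are illustrative, not from the paper):

```python
def effective_unavailability(cloud_unavailability: float, cloud_usage_fraction: float) -> float:
    """Unavailability contributed by the cloud when it serves only a fraction of the time.

    The application suffers a cloud outage only if the outage overlaps the
    spike periods offloaded to the cloud (assuming outages and spikes are
    independent).
    """
    return cloud_unavailability * cloud_usage_fraction

# A two-9s cloud (1% unavailability) used 1% of the time contributes
# 0.01 * 0.01 = 0.0001 unavailability, i.e., four 9s of availability.
```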
III. INTELLIGENT WORKLOAD FACTORING

A. Problem Model

We model the general workload factoring process as a hypergraph partition problem [11]. Each object (e.g., a video clip or a DB table) in the application data³ is modeled as a vertex, each service request type (e.g., all stream requests for a video clip x, or all HTTP requests of the "shopping cart" interaction category) is modeled as a net, and a link between a net and a vertex shows the access relationship between the request type and the application data object. This leads to the definition of a hypergraph H = (V, N), where V is the vertex set and N is the net set. The weight w_i of a vertex v_i ∈ V is the expected workload caused by this data object, calculated as its popularity multiplied by the average workload per request to access/process this data; another weight s_i of a vertex v_i ∈ V is the data size of this object; the cost c_j of a net n_j ∈ N is the expected data access overhead caused by this request type, calculated as the expected request number of this type multiplied by the sum of its neighboring data objects' sizes.

³The definition of data objects is specific to applications. For example, in video streaming the data items are video clips; in web services the data items can be URLs, HTML pages, database tables, or even database table entries.

Fig. 4. Logic view of the Workload Factoring component: workload profiling and a base load threshold drive fast factoring, which splits incoming requests into base load for the base zone and the remainder for the flash crowd zone.
The K-way hypergraph partition, an NP-hard problem [11], is to assign all vertexes (data objects) to K (K = 2 in our case) disjoint nonempty locations without the expected workload exceeding their capacities, while achieving the minimal partition cost

    cost = Σ_{n_j ∈ N_E} c_j + γ Σ_{v_i ∈ V_flashcrowd} s_i,

where Σ_{n_j ∈ N_E} c_j is the net cut cost (the total weights of the nets that span more than one location, therefore incurring remote data access/consistency overhead); Σ_{v_i ∈ V_flashcrowd} s_i is the total size of the data objects in the flash crowd zone and represents the data transfer/replication overhead; and γ is a factor to assign different weights to the two overhead components.
There are fast partition solutions proposed, such as the bisection partition scheme [11]. For video streaming services, where the request-data relationship is simple and there is no net cut since one request accesses only one data item, the partition problem degenerates to the knapsack problem; our greedy scheme moves vertexes from the base zone one by one, ranked by their popularity, until reaching the flash crowd zone's capacity. This is equivalent to redirecting the requests for the most popular data items in a top-k list into the flash crowd zone, and the remaining question is how to quickly generate the correct top-k list during a popularity transition time disturbed by the workload burst. Next we give the details of the workload factoring process.
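The greedy degenerate case above amounts to sorting data items by expected workload and redirecting the heaviest ones until the flash crowd zone's capacity is filled. A minimal sketch, with illustrative names and a simplified capacity model not taken from the paper:

```python
def greedy_factoring(popularity, flash_capacity):
    """Pick the data items whose requests go to the flash crowd zone.

    popularity: dict mapping data item id -> expected request load.
    flash_capacity: workload the flash crowd zone should absorb.
    Returns the top-k list of item ids redirected to the flash crowd zone.
    """
    redirected, absorbed = [], 0.0
    # move the most popular items first (the knapsack-degenerate greedy rule)
    for item, load in sorted(popularity.items(), key=lambda kv: kv[1], reverse=True):
        if absorbed >= flash_capacity:
            break
        redirected.append(item)
        absorbed += load
    return redirected
```

Requests for items in the returned list are sent to the flash crowd zone; all others stay in the base zone.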
B. Logic View

As shown in Figure 4, the intelligent workload factoring (IWF) scheme has three basic components: workload profiling, a base load threshold, and fast factoring. The workload profiling component updates the current system load upon incoming requests, and compares it to the base load threshold to decide whether the system is in a normal mode or a factoring mode. The base load threshold specifies the maximum load that the base zone can handle; it may be manually configured by operators according to the base zone capacity, or automatically set based on load history information (e.g., the 95-percentile arrival rate), which is then also input into the base zone for resource provisioning decisions. When the current system load is not higher than the base load threshold, the fast factoring process is in the "normal" mode, and it simply forwards incoming requests into the base zone. When the current system load is higher than the base load threshold, it is in the factoring mode and queries a fast frequent data item detection algorithm to check whether an incoming request asks for data in a set of hot data objects; if yes, the request is forwarded to the flash crowd zone; otherwise, it is forwarded to the base zone.

Fig. 5. Schematic description of the fastTopK algorithm: an incoming request is matched against the historical top-k list and, through m comparisons of data ids against a historical request queue, filtered into the current top-k list.
C. Fast Frequent Data Item Detection Algorithm

We call the fast frequent data item detection algorithm fastTopK. As shown in Figure 5, it has the following data structures: a FIFO queue to record the last c requests, a list to record the current top-k popular data items, a list to record the historical top-k popular data items, and a list of counters to record the data item access frequencies. Given a request r, the algorithm outputs "base" if r should go to the base zone, and "flash crowd" otherwise. It works as follows:

1) If the system is in the "normal" mode, the historical top-k list is always set as empty; go to step 4).
2) If the system is in the "factoring" mode and r is the first request since entering this mode, we copy the current top-k list into the historical top-k list, reset all frequency counters to 0, and empty the current top-k list.
3) If r matches any entry of the historical top-k list (i.e., asks for the same data item), we increase the frequency counter of that data item by 1 in the counter list, and update the historical top-k list based on the counter values.
4) Otherwise, we randomly draw m requests from the FIFO queue and compare them with r; if r matches any of the m requests (i.e., asks for the same data item), we increase the frequency counter of that data item by 1 in the counter list, and update the current top-k list based on the counter values.
5) In the "normal" mode, the algorithm always answers "base".
6) In the "factoring" mode, the algorithm combines the two top-k lists by calculating the estimated request rate of each data item: for each item in the historical top-k list, the rate is its frequency counter value divided by the total requests arrived since entering the "factoring" mode; for each item in the current top-k list, the rate is given in Theorem 1.
7) If r's data item is in the top k of the 2k joint items, the algorithm answers "flash crowd"; otherwise it answers "base".
8) If r's data item does not belong to the historical top-k list, the algorithm adds r into the FIFO queue for request history, and returns.

The key ideas in the fastTopK algorithm for speeding up frequent data item detection are twofold: speeding up the top-k detection under a changing data popularity distribution by pre-filtering old popular data items in the new distribution, and speeding up the top-k detection under a given data popularity distribution by pre-filtering unpopular data items in this new distribution.
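The steps above can be sketched in a few dozen lines. The class and parameter names below are ours, and the rate estimate for current top-k items is simplified to the Theorem 1 coincidence-based form sqrt(count/(n·m)); this is a sketch under those assumptions, not the paper's implementation:

```python
import random
from collections import Counter, deque

class FastTopK:
    """Illustrative sketch of the fastTopK request dispatcher."""

    def __init__(self, c=1000, k=10, m=5):
        self.history = deque(maxlen=c)  # FIFO queue of the last c request data ids
        self.k, self.m = k, m
        self.cur_counts = Counter()     # frequency counters behind the current top-k list
        self.hist_counts = Counter()    # frequency counters of the historical top-k list
        self.factoring = False
        self.n = 0                      # requests seen since entering factoring mode

    def enter_factoring(self):
        # step 2: the current top-k becomes the historical top-k; counters reset
        self.hist_counts = Counter(dict(self.cur_counts.most_common(self.k)))
        self.cur_counts.clear()
        self.factoring, self.n = True, 0

    def dispatch(self, data_id):
        if self.factoring:
            self.n += 1
        if self.factoring and data_id in self.hist_counts:
            # step 3: pre-filter requests for previously popular items
            self.hist_counts[data_id] += 1
        else:
            # step 4: compare against m random requests from the FIFO queue
            sample = random.sample(list(self.history), min(self.m, len(self.history)))
            self.cur_counts[data_id] += sum(1 for s in sample if s == data_id)
            # step 8: only non-historical requests enter the request history
            self.history.append(data_id)
        if not self.factoring:
            return "base"  # step 5
        # steps 6-7: merge the two lists by estimated request rate
        n = max(self.n, 1)
        rates = {d: cnt / n for d, cnt in self.hist_counts.items()}
        for d, cnt in self.cur_counts.items():
            rates[d] = max(rates.get(d, 0.0), (cnt / (n * self.m)) ** 0.5)
        top = sorted(rates, key=rates.get, reverse=True)[: self.k]
        return "flash crowd" if data_id in top else "base"
```

In a deployment, `enter_factoring` would be triggered by the workload profiling component when the system load crosses the base load threshold.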
IV. PERFORMANCE ANALYSIS

In this section, we present the performance analysis results of the fastTopK algorithm.
A. Accuracy Requirement and Performance Metric

The correctness of the fastTopK algorithm relies on the accuracy of the frequency counter information, which provides the estimates of the request rates on the corresponding data items. Formally, for a data item T, we define its actual request rate

    p(T) = (total requests for T) / (total requests).

fastTopK will determine an estimate p̂(T) such that p̂(T) ∈ (p(T)(1 − β/2), p(T)(1 + β/2)) with probability greater than α. For example, if we set β = 0.01 and α = 99.9%, then the accuracy requirement states that with probability greater than 99.9%, fastTopK will estimate p(T) with a relative error range of 1%. We use Z_α to denote the α-percentile of the unit normal distribution; for example, if α = 99.75%, then Z_α ≈ 2.81.

Given the specified accuracy requirement, we measure the performance of the algorithm through its

Sample Size: the number of request arrivals needed to perform the estimation. We use the terms estimation time and sample size interchangeably.
B. Request Rate Estimation

We assume that request arrivals to the system follow an independent and identically distributed (i.i.d.) process. The following result is used to estimate the request rate p_f of a data item f from its frequency counter value.

Let M_k(N, f) represent the frequency counter value for the target data item f after N arrivals, for fastTopK with k comparisons. We label the requests 1 to N based on the arrival sequence, and let

    C_ij(f) = 1 if both requests i and j ask for data item f, and 0 otherwise.

Clearly, the frequency counter of data item f is increased by 1 whenever C_ij(f) = 1; we call this a coincidence in the rest of the paper. Therefore,

    M_k(N, f) = Σ_{(i,j) compared} C_ij(f).
Before we study the correlation structure of the comparisons, we first state the following elementary results. Let C_ij(f) be as defined above. Then

    E[C_ij(f)] = p_f^2,    Var[C_ij(f)] = p_f^2 (1 − p_f^2).

The results follow directly from the assumption that arrivals are independent and that the probability that an arriving request asks for data item f is p_f.
In fastTopK, the comparisons are not always independent of each other. To see this, consider the comparisons C_ij(f) and C_im(f) (i ≠ j ≠ m) as an example. Note that P(C_ij(f) = 1) = p_f^2 due to the independence of arrivals, but P(C_im(f) = 1 | C_ij(f) = 1) = p_f, because the conditioning already implies that request i asks for data item f. In general, two comparisons C_ij(f) and C_lm(f) are independent if and only if all four indices are distinct; if any two of the indices are identical, the comparisons are dependent. For example, C_ij(f) and C_im(f) are dependent. The next result gives the correlation between such random variables.

Lemma 1: Consider C_ij(f) and C_im(f) with i, j, m distinct. Then

    Cov(C_ij(f), C_im(f)) = p_f^3 (1 − p_f).

Proof: Let τ = Cov(C_ij(f), C_im(f)). Then

    τ = E[(C_ij(f) − E[C_ij(f)])(C_im(f) − E[C_im(f)])]
      = E[C_ij(f) C_im(f)] − E[C_ij(f)] E[C_im(f)]
      = E[C_ij(f) C_im(f)] − p_f^4
      = p_f^3 − p_f^4 = p_f^3 (1 − p_f),

where the second equality follows from the fact that E[C_ij(f)] = E[C_im(f)] = p_f^2, and the last step follows from the fact that C_ij(f) and C_im(f) are both one if and only if requests i, j and m all ask for data item f, which happens with probability p_f^3.
Following the above lemma, the expectation and variance of M_k(N, f) are:

Lemma 2: Let M_k(N, f) denote the number of coincidences for data item f after N arrival requests to the system. Then

    E[M_k(N, f)] = Nk p_f^2,
    Var[M_k(N, f)] = Nk p_f^2 (1 − p_f^2) (1 + 2(2k−1) p_f / (1 + p_f)).

Proof: Since each of the N arrivals triggers k comparisons, each a coincidence with probability p_f^2,

    E[M_k(N, f)] = E[Σ C_ij(f)] = Nk p_f^2.

To simplify the notation, we index the comparisons by a single index i, where I_i(f) is set to one if comparison i results in a coincidence for data item f. The variance can be computed as follows:

    Var[M_k(N, f)] = E[M_k(N, f)^2] − (E[M_k(N, f)])^2
                   = Σ_i E[I_i(f)^2] + Σ_{i ≠ l} E[I_i(f) I_l(f)] − (Nk p_f^2)^2.

Here Σ_i E[I_i(f)^2] = Nk p_f^2. Among the ordered pairs of distinct comparisons, each comparison shares an index with 2(2k−1) others (each such pair contributing p_f^3, by the argument of Lemma 1), while the remaining pairs are independent (each contributing p_f^4):

    Σ_{i ≠ l} E[I_i(f) I_l(f)] = Nk [2(2k−1) p_f^3 + (Nk − 1 − 2(2k−1)) p_f^4].

Combining the terms,

    Var[M_k(N, f)] = Nk [p_f^2 + 2(2k−1) p_f^3 − (1 + 2(2k−1)) p_f^4]
                   = Nk p_f^2 (1 − p_f^2) (1 + 2(2k−1) p_f / (1 + p_f)).
Now that we know the mean and the variance of the number of coincidences, we use the central limit theorem to obtain a normal approximation for the number of coincidences, and then use the result to estimate the request rates. The next theorem gives the expression for the estimator of the rate along with its variance.

Theorem 1:

    sqrt(M_k(N, f) / (Nk)) − p_f ∼ N(0, σ_f^2 / (4Nk)),
    where σ_f^2 = (1 − p_f^2) (1 + 2(2k−1) p_f / (1 + p_f)).

Proof: Though the comparisons are not independent, they form a stationary k^2-dependent sequence with finite expectation and variance. Following the central limit theorem for dependent sequences [12], we can show that for large N,

    M_k(N, f) / (Nk) − p_f^2 ∼ N(0, p_f^2 (1 − p_f^2) (1 + 2(2k−1) p_f / (1 + p_f)) / (Nk)).   (5)

The above result can be shown as in Theorem 5 of [13]. Theorem 1 says that in fastTopK, the estimated request rate for data item f is sqrt(M_k(N, f) / (Nk)).
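Theorem 1's estimator can be sanity-checked numerically: generate an i.i.d. request stream, perform k random comparisons per arrival against earlier requests, count the coincidences M for the target item, and verify that sqrt(M/(Nk)) recovers its rate. A small simulation sketch (function name and parameters are ours):

```python
import random

def estimate_rate(p_f, N=20000, k=5, seed=0):
    """Estimate the request rate of a target item via coincidence counting.

    The target item (id 0) arrives with probability p_f; all other requests
    get effectively unique ids, so coincidences essentially only involve the
    target. Returns sqrt(M / (N*k)), the Theorem 1 estimator.
    """
    rng = random.Random(seed)
    stream = [0 if rng.random() < p_f else rng.randrange(1, 10**9) for _ in range(N)]
    M = 0
    for i in range(1, N):
        for _ in range(k):  # k comparisons against random earlier arrivals
            j = rng.randrange(i)
            if stream[i] == stream[j] == 0:
                M += 1
    return (M / (N * k)) ** 0.5
```

With p_f = 0.2, N = 20000, and k = 5, the estimate should land within a few thousandths of 0.2, consistent with the σ_f/(2·sqrt(Nk)) standard deviation predicted by Theorem 1.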
C. Estimation Time

The historical top-k list serves as a filter on the requests entering the FIFO queue. We call fastTopK a basic version when it does not use the historical top-k list (as in the normal mode), and a filtering version when it actively uses the historical top-k list. For the estimation time of basic fastTopK, we have the following result:

Lemma 3: Given the accuracy requirement (α, β) on p_f described in Section IV-A, let N^C_basic be the number of arrivals required by basic fastTopK. Then

    N^C_basic = 4k Z_α^2 / ((4k−1) β^2 p_f^2).

Proof: First, we consider the variance of the estimated request rate and derive an upper bound on its value that holds over the entire [0, 1] range as a function of k and p_f. As

    σ_f^2 = (1 − p_f^2) (1 + 2(2k−1) p_f / (1 + p_f)),

setting the derivative of the variance with respect to p_f to zero gives us

    σ_f^2 ≤ 4k^2 / (4k−1),  attained at p_f = (2k−1)/(4k−1).

The above bound on the variance can now be used to compute the sample size given the accuracy requirement. Let p_f β be the desired estimation accuracy and Z_α the desired α-percentile. To achieve the accuracy requirement, by Theorem 1 we need

    Z_α σ_f / (2 sqrt(Nk)) ≤ p_f β / 2.

Therefore, the minimum sample size N that satisfies the accuracy requirement is N = Z_α^2 σ_f^2 / (k β^2 p_f^2) ≤ 4k Z_α^2 / ((4k−1) β^2 p_f^2).

Now, let us define an amplification factor X for the rate change of a data item f before and after the historical top-k filtering as

    X = p_after / p_before.

For example, if a data item takes 0.1% of the total requests, and takes 0.2% of the requests filtered by the historical top-k data items, the amplification factor is X = 2 for this data item. We have the following speedup result for the fastTopK algorithm given the rate amplification factor X.
TABLE I
ANNUAL COST OF THE THREE HOSTING SOLUTIONS

Hosting solution         Annual cost
-----------------------  -----------------------------------
Local data center        running a 790-server DC
Full cloud computing     US$1.384 million
Hybrid cloud computing   US$58.96K + running a 99-server DC
Theorem 2: Given the accuracy requirement (α, β) on p_f described in Section IV-A, let N^C_basic be the number of arrivals required by basic fastTopK, and N^C_filtering be the number of arrivals required by filtering fastTopK. Then

    N^C_filtering = N^C_basic / X^2.

Proof: Following Lemma 3,

    N^C_basic = 4k Z_α^2 / ((4k−1) β^2 (p_before)^2),
    N^C_filtering = 4k Z_α^2 / ((4k−1) β^2 (p_after)^2)
                  = (p_before / p_after)^2 N^C_basic = N^C_basic / X^2.

Theorem 2 shows that we have an X^2 speedup of the detection process with an X-factor rate amplification due to request filtering based on historical information. For example, if a data item takes 0.1% of the total requests, and takes 0.2% of the requests filtered by the historical top-k data items, the estimation time is reduced by 4 times while still accurately estimating its request rate with the frequency counter.
V. EVALUATION

Through the evaluation, we aim to answer the following questions:

1) What is the economical advantage of application hosting solutions based on the hybrid cloud computing model?
2) What is the benefit to base zone workload management with the intelligent workload factoring (IWF) scheme?
3) What is the benefit to flash crowd zone resource management with the IWF scheme?
4) What is the performance of the IWF scheme upon a changing data popularity distribution?

For the first two questions, we rely on trace-driven simulations to evaluate the hybrid cloud computing model in a large-scale system setting. For the remaining two questions, we rely on testbed experiments to evaluate the IWF scheme in a dynamic workload setting.
A. Trace-Driven Simulations

1) Yahoo! Video workload traces: We use the Yahoo! Video workload traces presented in [7], which contain a total of 32,064,496 video stream requests over the collection period of 46 days. The hourly request arrival rates are shown in Figure 1.
2) Economical cost comparison of three hosting solutions: We compare three application hosting solutions to host the measured Yahoo! Video stream load⁴:
- Local data center solution. In this solution, a local data center is overprovisioned over the peak load in the measurement.
- Full cloud computing solution. In this solution, a rented platform on Amazon EC2 infrastructure is provisioned over the peak load in the measurement. The rent price is US$0.10 per machine hour based on the Amazon EC2 pricing policy [1] at the time when the paper was written.
- Our hybrid cloud computing model based solution. In this solution, a local data center is provisioned over the 95-percentile workload, and an Amazon EC2 based platform is rented on demand for the top 5-percentile workload.
Table I shows the annual economical cost of the three
solutions when we repeated the 46-day workload through one
year. For presentation simplicity, we use a simple cost
model which only includes the server cost even though there
are many other cost factors such as bandwidth, storage, power,
cooling, physical plant, and operation costs.
As we can see, the local data center solution requires
the running of a 790-server medium-sized infrastructure. The
full cloud computing solution, seemingly cheap at the 10-cent
unit price, results in a bill of millions of dollars simply for
computing cost. Lastly, our hybrid cloud computing model
offers an economical solution with a 99-server small-sized
local data center (affordable even for most SMB customers)
and an annual bill of only US$58.96K for handling sporadic
load spikes.
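The cost accounting behind Table I can be sketched as follows. The trace here is synthetic (not the Yahoo! Video trace), only server rental cost is modeled (matching the simplified cost model above), and the per-server capacity of 300 streams follows footnote 4:

```python
import random
random.seed(0)

PRICE = 0.10   # US$ per machine-hour (EC2 pricing at the time of writing)
CAP = 300      # concurrent streams per server (footnote 4)

# Illustrative hourly request-rate trace: a base load with sporadic spikes.
trace = [random.randint(5000, 20000) for _ in range(24 * 365)]
for h in random.sample(range(len(trace)), 200):
    trace[h] *= 10  # flash crowd hours

servers = [-(-r // CAP) for r in trace]           # ceil(load / capacity)
peak = max(servers)                               # size for peak provisioning
base = sorted(servers)[int(0.95 * len(servers))]  # 95th-percentile DC size

full_cloud = peak * len(trace) * PRICE                        # rent peak 24/7
burst_rent = sum(max(0, s - base) * PRICE for s in servers)   # hybrid spillover
print(peak, base, full_cloud, burst_rent)
```

The hybrid solution pays the on-demand price only for the roughly 5% of hours that exceed the local data center's capacity, which is why its cloud bill is a small fraction of the full-cloud bill.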
Nowadays content providers on the Internet rely on content
distribution networks (CDNs) to leverage their presence across
different geographical locations to serve video content, and
lower the TCO (Total Cost of Ownership). While we do not
include the CDN solution into the Cloud-based cost compar-
ison, we note that some of these systems offer quite flexible
rules to split CDN traffic among multiple CDNs; there are
many CDN load balancers commercially available, including
Level 3 intelligent traffic management [14], Akamai Cotendo
CDN balancer [15], and LimeLight traffic load balancer [16].
However, a missing component of these existing systems is the
algorithm to compute the allocation automatically (e.g., the
percentages and/or the served content) [17]. Our IWF scheme
can be applied to automatically configure these systems, where
the CDNs serve the same purpose as the flash crowd zone
defined in our hybrid Cloud Computing model does.
3) Workload smoothing effect in the base zone: We define
the load spikes as the top 5-percentile data points in terms
of request rates in Figure 1. By removing them with the
workload factoring mechanism, we observed that the ratio
of the maximum load to the average load was reduced to
1.84 in the base zone, where overprovisioning over peak
load became a reasonable choice. We also applied statistical
techniques on the short-term workload prediction in the base
zone. Figure 6 shows the CDF of the prediction error using a
simple heuristic: it uses the arrival rate from last interval as the
⁴ In all hosting solutions, we assume the capacity of a video streaming server is 300 (i.e., it supports 300 concurrent streaming threads).
[Figure: CDF of the prediction error |est. - real|/real, on a log-scale axis from 1e-04 to 100; titled "Workload Predictability Evaluation".]
Fig. 6. Prediction error with a simple heuristic on the base load.
[Figure: stream clients send requests to an Apache server with IWF, which dispatches to Darwin stream servers in the local data center or on the Amazon AWS cloud.]
Fig. 7. The video streaming service testbed with a hybrid platform.
prediction of the following interval's arrival rate, with a 30-minute interval length. It turned out that 82% of the time the prediction had an error of no more than 10%, and 90% of the time the prediction had an error of no more than 17%. Therefore, simple statistical prediction techniques with a small margin factor on the predicted value could be reliable for dynamic provisioning in the base zone.
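The last-value heuristic is simple enough to sketch directly; the arrival rates below are illustrative, not the Yahoo! trace:

```python
# Last-value predictor: the previous 30-minute interval's arrival rate is
# the estimate for the next interval; error = |est. - real| / real.
rates = [100, 104, 98, 101, 300, 110, 107]  # illustrative arrival rates
errors = [abs(rates[i - 1] - rates[i]) / rates[i] for i in range(1, len(rates))]
within_10 = sum(e <= 0.10 for e in errors) / len(errors)
print(within_10)
```

Note how the single spike (300) produces two large errors; with spikes factored out into the flash crowd zone, such outliers disappear from the base zone and the error CDF tightens, which is exactly the smoothing effect reported above.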
B. Testbed experiments
1) Testbed: We set up an application testbed which hosts
YouTube-like video streaming service. The testbed consists
of two parts: a local data center and a platform set up at
Amazon AWS infrastructure utilizing the EC2 and S3 services.
In the local data center, 20 open source Darwin streaming
servers [18] are provisioned all the time, while the streaming
server instances at Amazon EC2 are activated on demand.
The IWF component was implemented as a load controller
for the streaming workload. When a client request for a video
clip comes into the testbed as a HTTP request, the Apache web
server parses the request and asks IWF for which streaming
[Figure: each dot marks a streaming request for video ID y arriving at time x; the time axis spans roughly 98000 to 105000.]
Fig. 8. Incoming streaming requests.
server (either a local server or a remote server at Amazon EC2) the video clip will be served; it then returns to the client a dynamic HTML page which automatically starts the media player for video streaming at the client machine.
The load controller also contains the implementation of
two well-known load balancing algorithms [19]: the Least
Connections balancing algorithm for the local server farm and
the Round-Robin load balancing algorithm for the server farm
on Amazon Cloud.
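The load controller's two balancing policies can be sketched as follows; the class and server names are illustrative, not the testbed implementation:

```python
import itertools

class LoadController:
    """Least Connections for the local farm, Round-Robin for the cloud farm."""

    def __init__(self, local_servers, cloud_servers):
        self.conns = {s: 0 for s in local_servers}    # active streams per server
        self.cloud_rr = itertools.cycle(cloud_servers)

    def pick_local(self):
        # Least Connections: choose the server with the fewest active streams
        server = min(self.conns, key=self.conns.get)
        self.conns[server] += 1
        return server

    def release_local(self, server):
        # called when a local stream finishes
        self.conns[server] -= 1

    def pick_cloud(self):
        # Round-Robin over the EC2 instances
        return next(self.cloud_rr)

lc = LoadController(["local-1", "local-2"], ["ec2-a", "ec2-b"])
```

IWF decides which zone a request belongs to; the policy above then picks a concrete server within that zone.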
We developed a distributed workload generator based on
openRTSP [20] to generate real video streaming load. Depend-
ing on the test scenarios, up to 20 machines were provisioned
for client-side workload generation.
2) Methodology: We evaluate the fast factoring algorithm by running experiments with synthetic load traces. In the traces, we generated workload with a stable data popularity distribution D1 before time t, and then suddenly changed it to another distribution D2 after t, where D2 is the sum of D1 and another distribution D3. We generated D1 and D3 with uniform and Zipf distributions (different α values), and also changed the volume ratio of D1 to D3 with different numbers (1:k, where 1 ≤ k ≤ 10). For the FastTopK algorithm, the goal is to decompose the aggregated workload D2 into two parts so that their volume ratio is the same as that of D1 to D3, and to minimize the number of unique data items contained in the load part with the volume of D3.
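The trace construction can be sketched as follows; the constants mirror the experiment's 1600-clip library, 30 hot items, and 1:1 volume ratio, but the generator itself is an assumption:

```python
import random
random.seed(1)

N_ITEMS, HOT = 1600, 30   # library size; number of hot items in the D3 spike

def zipf_item(n, alpha=1.0):
    # draw one item ID from a Zipf(alpha) popularity distribution
    weights = [1.0 / (i + 1) ** alpha for i in range(n)]
    return random.choices(range(n), weights=weights)[0]

before_t = [zipf_item(N_ITEMS) for _ in range(500)]       # D1 only, before t
d1_after = [zipf_item(N_ITEMS) for _ in range(500)]       # D1 continues after t
d3_spike = [random.randrange(HOT) for _ in range(500)]    # D3: uniform on hot items
# D2 = D1 + D3 at a 1:1 volume ratio, interleaved request-by-request
after_t = [x for pair in zip(d1_after, d3_spike) for x in pair]
trace = before_t + after_t
```

A perfect factoring algorithm would route the d3_spike requests (30 unique items) to the flash crowd zone while leaving the long-tailed D1 traffic in the base zone.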
Figure 8 shows one example trace where D1 is a Zipf distribution and D3 is a uniform distribution, and the volume ratio is 1:1. One data point at the coordinate (x, y) in the graph represents one streaming request asking for the video file with ID y at time x. The changing time is t = 100000, when a load spike on a few hot data items jumped in.
We compared IWF with 3 other workload factoring algorithms:
- random: the random factoring algorithm decides with the probability D3/(D1+D3) that a request will go to the load group with the volume of D3.
- Choke: the Choke factoring algorithm is based on the ChoKe active queue management scheme [21]. While ChoKe was originally proposed for approximating fair bandwidth allocation, it is a reasonable candidate for workload factoring when the optimization goal is minimizing the unique data items in one part (dropping the packets to the top popular IP destinations is similar to finding the top popular data items).
- RATE: the RATE factoring algorithm acts the same as IWF except that it uses the RATE scheme [22] to detect the top-K data items.

Fig. 9. IWF performance: D1 - Zipf distribution, D3 - uniform distribution.
Fig. 10. IWF performance: D1 - uniform distribution, D3 - uniform distribution.
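The content-oblivious random baseline is trivial to sketch; the function and zone names below are illustrative:

```python
import random

def random_factoring(vol_d1, vol_d3, rng=random.random):
    """Random baseline: route a request to the flash crowd zone with
    probability D3 / (D1 + D3), ignoring which data item it asks for."""
    p = vol_d3 / (vol_d1 + vol_d3)
    return "flash_crowd" if rng() < p else "base"

# with a 1:1 volume ratio, p = 0.5; rng is injectable for deterministic tests
zone = random_factoring(vol_d1=100, vol_d3=100, rng=lambda: 0.3)
```

Because it ignores content, this baseline scatters requests for the whole library across both zones, which is exactly why it pulls two orders of magnitude more unique items into the flash crowd zone than IWF does.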
3) Results: We generated one load trace over a video library of 1600 unique video clips, all with a video bit rate of 450 Kb/s. D1 is a Zipf distribution with α = 1 and D3 is a uniform distribution, and the volume ratio is 1:1. One dot at the coordinate (x, y) in Figure 8 represents one streaming request asking for video file y arriving at time x. The changing time is t = 100000, when a load on a few hot data items jumped in. Figure 9 shows the factoring performance in terms of the number of unique data items contained in the load part with the volume fraction D3/(D1+D3). When all factoring algorithms dispatched the same amount of requests into the flash crowd zone, IWF outperformed the other three significantly in terms of unique video files requested (two orders of magnitude better than random dispatching); in the flash crowd zone a total of 5000 streams were served on only 30 unique video clips.
In Figure 10 we used another trace where both D1 and D3 are uniform distributions, and the volume ratio is 1:1. In this case IWF still outperformed the other three schemes; actually, it was quite close to the optimal performance (21 vs. 10) in terms of unique video clips requested.

[Figure: client-side perceived quality (0-100) versus concurrent stream connections (0-160), for a base zone server and a flash crowd zone server.]
Fig. 11. IWF performance: client-side perceived streaming quality at two servers.
Figure 11 shows the client-side perceived streaming quality
from a base zone server and a flash crowd zone server in the
above workload factoring experiment. For fair comparison,
in this case both servers have the same hardware/software
configuration and reside in the local testbed. The client-side
perceived streaming quality is a metric reported by the Darwin Streaming Server itself; it is a score between 0 and 100 that mainly reflects the packet loss ratio during the streaming process. We observed that as the number of concurrent connections went up, the flash crowd zone server delivered more reliable streaming quality than
the base zone server. It could support up to 160 concurrent
streaming connections while keeping the client-side quality
at above 80, while the base load zone server could only
support around 90 concurrent streaming connections to keep
the client-side quality steadily at above 80. In the testbed
configuration, we set the base zone server capacity at 90
concurrent connections and that for flash crowd zone servers
at 160 (78% higher than that in the base zone), and enforced these limits during dispatching.
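The per-zone capacity enforcement amounts to a simple admission check at dispatch time; a minimal sketch with illustrative names, using the measured capacities above:

```python
# Capacities from the testbed measurements: 90 concurrent streams keeps a
# base zone server's quality above 80, and 160 does so for a flash crowd
# zone server (78% higher, since its requests hit only a few hot clips).
CAPACITY = {"base": 90, "flash_crowd": 160}

def can_admit(zone, active_streams):
    # admit a new stream only if the server stays under its zone's capacity
    return active_streams < CAPACITY[zone]
```

The flash crowd zone's higher limit is a caching effect, not better hardware: serving many streams of a few hot clips is cheaper than serving the long tail.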
In November 2008, Amazon launched CloudFront [23]
for its AWS customers who can now deliver part or all
application load through Amazon’s global network; around
the same time period, VMWare also proposed in its Virtual
Data Center Operation System blueprint the vCloud service
concept [24], which helps enterprise customers expand their
internal IT infrastructure into an internal cloud model or
leverage off-premise computing capacity. As current IT systems evolve from the dedicated platform model to the
shared platform model along the cloud computing trend, we
believe a core technology component in need is a flexible
workload management scheme working for both models, and
our workload factoring technology is proposed as one answer
for it.
Berkeley researchers [25] offer a ranked list of obstacles to
the growth of Cloud Computing. Similar to the points we made
in the introduction, the concerns on public cloud computing
services include service availability, data confidentiality and
auditability, performance unpredictability, and so on. While some concerns could be addressed technically, others arise naturally from physical limitations. On the service architecture
level, we believe a hybrid cloud computing model makes sense
to enterprise IT and can eliminate (or significantly alleviate)
many issues raised from a full Cloud Computing model.
In Content Distribution Network (CDN) and web caching,
workload factoring happens between a primary web server and
proxy servers. The typical method is DNS redirecting and the
workload factoring decision is predefined manually over a set
of “hot” web objects. Automatic solutions [26] [27] [28] exist
for cache hit ratio improvement through locality exploration,
and their focus is on compact data structure design to exchange
data item access information.
Many content publishers, such as Netflix, Hulu, Microsoft,
Apple, Facebook, and MSNBC, use multiple CDNs to dis-
tribute and cache their digital content [17]. This allows them
to aggregate the diversity of individual CDN providers on
features, resources, and performance. Accordingly, two new
strategies, telco-CDN federation and hybrid P2P-CDN, are
emerging to augment existing CDN infrastructures [29]. Telco-
CDN federation is based on the recent development among
various CDNs operated by telecommunication companies to
federate by interconnecting their networks, ensure better avail-
ability, and benefit the participating ISPs in terms of provi-
sioning costs [30], [31]. A hybrid strategy of serving content
from dedicated CDN servers using P2P technology provides
the scalability advantage of P2P along with the reliability and
manageability of CDNs [32]–[34]. Like the traditional CDN
infrastructures, workload factoring happens between a primary
server and proxy servers. Our IWF scheme can be applied to
automatically configure these systems, and decide the load
allocation (e.g., the percentages and/or the served content) for
the primary server and proxy servers.
For fast frequent data item detection in data streams, many
schemes have been proposed for fast rate estimation in traffic
monitoring [22] [35] [36], and fast data item counting in
CDN [37]; their focus is on compact data structure design to
memorize request historical information at a static distribution.
In this paper, we present the design of a hybrid cloud
computing model. With the proposed proactive workload
management technology, the hybrid cloud computing model
allows users to develop a new architecture where a dedicated
resource platform runs for hosting base service workload, and
a separate and shared resource platform serves flash crowd
peak load. Given the elastic nature of the cloud infrastructure,
it creates a situation where cloud resources are used as an
extension of existing infrastructures.
[1] “Amazon web services,”
[2] “Google app engine,”
[3] C. Goolsbee, “Don’t buy cloud computing hype: business model will
evaporate,” in, 2008.
[4] “Massive (500) Internal Server Error. Outage started 35 minutes
ago,” February 2008. Available: http://developer.amazonwebservices.
[5] “Youtube,”
[6] “Gigaspaces,”
[7] X. Kang, H. Zhang, G. Jiang, H. Chen, X. Meng, and K. Yoshihira,
“Measurement, modeling, and analysis of Internet video sharing site
workload: a case study,” in Proc. 2008 IEEE International Conference
on Web Services, pp. 278–285.
[8] “Yahoo! video,”
[9] “ComScore Video Metrix report: U.S. viewers watched
an average of 3 hours of online video in July,” July 2007.
[10] M. Harchol-Balter, M. E. Crovella, and C. D. Murta, “On choosing a
task assignment policy for a distributed server system,” pp. 231–242,
[11] G. Karypis and V. Kumar, “Multilevel k-way hypergraph partitioning,”
in Proc. 1999 ACM/IEEE Conference on Design Automation, pp. 343–
[12] T. S. Ferguson, A Course in Large Sample Theory. Chapman & Hall,
[13] M. S. Kodialam, T. V. Lakshman, and S. Mohanty, “Runs based traffic
estimator (rate): a simple, memory efficient scheme for per-flow rate
estimation,” in 2004 INFOCOM.
[14] “Level 3 intelligent traffic management,”
[15] “Akamai cotendo cdn balancer,”
[16] “Limelight traffic load balancer,”
[17] H. H. Liu, Y. Wang, Y. R. Yang, H. Wang, and C. Tian, “Optimizing
cost and performance for content multihoming,” in Proc. 2012 ACM
SIGCOMM Conference on Applications, Technologies, Architectures,
and Protocols for Computer Communication, pp. 371–382.
[18] “Darwin streaming server,”
[19] M. Arregoces and M. Portolani, Data Center Fundamentals. Cisco Press,
[20] “openrtsp,”
[21] R. Pan, B. Prabhakar, and K. Psounis, “Choke—a stateless active queue
management scheme for approximating fair bandwidth allocation,” in
Proc. 2000 IEEE INFOCOM, vol. 2, pp. 942–951. Available: http://dx.
[22] F. Hao, M. Kodialam, T. V. Lakshman, and H. Zhang, “Fast payload-
based flow estimation for traffic monitoring and network security,”
in Proc. 2005 ACM Symposium on Architecture for Networking and
Communications Systems, pp. 211–220.
[23] “Amazon,”
[24] “Vmware cloud vservices,”
[25] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Kon-
winski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia,
“Above the clouds: a Berkeley view of cloud computing,” EECS Depart-
ment, University of California, Berkeley, Tech. Rep. UCB/EECS-2009-
28, Feb 2009. Available:
2009/EECS-2009-28.html
[26] E. Casalicchio, V. Cardellini, and M. Colajanni, “Content-aware dis-
patching algorithms for cluster-based web servers,” Cluster Computing,
vol. 5, no. 1, pp. 65–74, 2002.
[27] S. Jin and A. Bestavros, “Greedydual* web caching algorithm: exploit-
ing the two sources of temporal locality in web request streams,” in
Proc. 2000 International Web Caching and Content Delivery Workshop,
pp. 174–183.
[28] A. Wolman, M. Voelker, N. Sharma, N. Cardwell, A. Karlin, and H. M.
Levy, “On the scale and performance of cooperative web proxy caching,”
SIGOPS Oper. Syst. Rev., vol. 33, no. 5, pp. 16–31, 1999.
[29] A. Balachandran, V. Sekar, A. Akella, and S. Seshan, “Analyzing the
potential benefits of CDN augmentation strategies for Internet video
workloads,” 2013.
[30] “Cisco report on CDN federation—solutions for SPS and content
providers to scale a great customer experience.”
[31] D. Rayburn, “Telcos and carriers forming new federated CDN group
called OCX (operator carrier exchange),”
[32] “Akamai netsession,”
[33] C. Huang, A. Wang, J. Li, and K. W. Ross, “Understanding hybrid
CDN-p2p: why limelight needs its own red swoosh,” in Proc. 2008
International Workshop on Network and Operating Systems Support for
Digital Audio and Video, pp. 75–80. Available:
[34] H. Yin, X. Liu, T. Zhan, V. Sekar, F. Qiu, C. Lin, H. Zhang, and B. Li,
“Design and deployment of a hybrid CDN-p2p system for live video
streaming: experiences with livesky,” in Proc. 2009 ACM International
Conference on Multimedia, pp. 25–34.
[35] A. Kumar, M. Sung, J. Xu, and J. Wang, “Data streaming algorithms
for efficient and accurate estimation of flow distribution,” in Proc. 2004
[36] N. G. Duffield and M. Grossglauser, “Trajectory sampling for direct
traffic observation,” SIGCOMM Comput. Commun. Rev., vol. 30, no. 4,
pp. 271–282, 2000.
[37] A. Manjhi, V. Shkapenyuk, K. Dhamdhere, and C. Olston, “Finding
(recently) frequent items in distributed data streams,” in Proc. 2005
International Conference on Data Engineering, pp. 767–778.
Hui Zhang is a senior researcher at NEC Lab-
oratories America at Princeton, New Jersey. He
received the B.Eng. degree in Electrical Engineering
from Hunan University, China in 1996, the M.Eng.
degree in Electrical Engineering from the Institute of
Automation, Chinese Academy of Sciences, China,
in 1999, and the Ph.D. degree in Computer Science
from the University of Southern California in 2005.
His research interests include next-generation data
centers, Peer-to-Peer and overlay networks, design
and analysis of algorithms.
Guofei Jiang is the Vice President of NEC Labo-
ratories America at Princeton, New Jersey. He leads
a large research group consisting of members from
the global network of NEC R&D units. His group
conducts fundamental and applied research in the
areas of Big Data Analytics, Distributed System
and Cloud Platforms, Software-defined Networking,
and Computer Security. He has published over 100
technical papers and also has over 40 patents granted
or applied. His inventions have been successfully commercialized as award-winning NEC products and solutions, and have significantly contributed to NEC’s business.
Kenji Yoshihira received the B.E. in EE at the University of Tokyo in 1996 and designed processor chips for enterprise computers at Hitachi Ltd. for five years. He served as CTO at Investoria Inc. in Japan through 2002, developing an Internet service system for financial information distribution, and received the M.S. in CS at New York University in 2004. He is currently Associate Director, Solutions Incubation at NEC Laboratories America, Inc. in NJ.
Haifeng Chen received the BEng and MEng degrees
in automation from Southeast University, China, in
1994 and 1997, respectively, and the Ph.D. degree
in computer engineering from Rutgers University,
New Jersey, in 2004. He has worked as a researcher
in the Chinese National Research Institute of Power
Automation. He is currently a senior researcher at
NEC Laboratory America, Princeton, New Jersey.
His research interests include data mining, autonomic computing, pattern recognition, and robust
... The potential workload in the second group is uncertain. It is therefore important to predict future demand and to optimize the trade-off between application requirements, such as response time and throughput, and resource economics, such as overhead costs and configuration, e.g. in the case of video streaming services (such as e-learning based systems) [28]. In [23], the authors present a cloud bursting approach focused on long-term and short-term projections of requests for a business-critical web system to classify the optimum resources of the system deployed in private and public data centers. ...
... Authors use a series of metrics, including the amount of workload needed, storage space, and transaction delays, to provide efficient hybrid cloud deployment. Another interesting work is discussed in [28], where a hybrid cloud computing model is proposed featuring a two-zone system architecture. The Internet-based application that must be deployed on the architecture, is divided into two naturally different components, namely the base load and flash crowd load. ...
The COVID-19 emergency suddenly obliged schools and universities around the world to deliver on-line lectures and services. While the urgency of response resulted in a fast and massive adoption of standard, public on-line platforms, generally owned by big players in the digital services market, this does not sufficiently take into account privacy-related and security-related issues and potential legal problems about the legitimate exploitation of the intellectual rights about contents. However, the experience brought to attention a vast set of issues, which have been addressed by implementing these services by means of private platforms. This work presents a modeling and evaluation framework, defined on a set of high-level, management-oriented parameters and based on a Vectorial Auto Regressive Fractional (Integrated) Moving Average based approach, to support the design of distance learning architectures. The purpose of this framework is to help decision makers to evaluate the requirements and the costs of hybrid cloud technology solutions. Furthermore, it aims at providing a coarse grain reference organization integrating low-cost, long-term storage management services to implement a viable and accessible history feature for all materials. The proposed solution has been designed bearing in mind the ecosystem of Italian universities. A realistic case study has been shaped on the needs of an important, generalist, polycentric Italian university, where some of the authors of this paper work.
... One data center resides in each region, and different DCs are interconnected by wide area network. It is realistic for most of content distribution providers (CDPs) to apply such distributed clouds to build the system cost-effectively [37], [38]. In particular, a CDP may operate several geographically dispersed DCs as the private cloud based on the application scale. ...
... In each public DC, the CDPs can provide approximately 500 VMs and each VM is assumed to deal with one request per time slot, while in each private DC, there are 200 servers which can handle two service requests per time slot. Each time slot is set to one hour, with the same granularity of service provided by Amazon [38]. According to the measurement result of YouTube videos in [40], the average size of a video is 8 MB and over 95% of videos are less than 15 MB. ...
A growing number of social video distribution services are turning toward to the cloud-based architecture for lower cost and better scalability. Although appealing, such a cloud content delivery network (cloud-based CDN) involves two key tasks: to dynamically migrate the contents across multiple data centers on diverse locations and to place user requests in the proper sites such that the monetary cost and service latency are targeted appropriately. In this paper, we formulate these two objectives into a combinational optimization problem and develop a context-aware community-based computing model to discover the contextual information of video propagation. In particular, we explore the social graph that people created to estimate the potential demands of each video and to propose a basic scheme for video migration and request placement under the heterogeneous cloud paradigm. Since the basic scheme adopts a greedy algorithm per step, it only obtains the short-term optimality. To solve this problem, we design an advanced scheme that can align to the long-term optimum solution and can ensure that the total monetary cost is minimized while meeting the predefined service level agreements (SLAs). Compared with present scheduling techniques, we found that the designed algorithm achieves lower service cost yet guarantees that the response time of requests is all within the specified quality of service (QoS) requirements.
... Bittencourt et al. [15] and Abrishami et al. [16] describe DAG-influenced heuristic scheduling algorithms that keep sensitive tasks in the private cloud but the sensitivity of the task is not automatically determined by the algorithms. Statistical and artificial intelligence methods are employed by [21] and [22] to aid task scheduling algorithms. The studies, however, do not discriminate between the sensitivity of tasks being processed. ...
—The hybrid cloud inherits the best aspects of both the public and private clouds. One such benefit is maintaining control of data processing in a private cloud whilst having nearly elastic resource availability in the public cloud. However, the public and private cloud combination introduces complexities such as incompatible security and control mechanisms, among others. The result is a reduced consistency of data processing and control policies in the different cloud deployment models. Cloud load-balancing is one control mechanism for routing applications to appropriate processing servers in compliance with the policies of the adopting organization. This paper presents a process-mining influenced load-balancer for routing applications and data according to dynamically defined business rules. We use a high-level Colored Petri Net (CPN) to derive a model for the process mining-influenced load-balancer and validate the model employing live data from a selected hospital.
... The platform's operation will be inefficient under pressure when the user scale reaches a specific number [2]. Therefore, the platform needs to invest in capacity expansion [47]. Considering the investment in capacity expansion has a decreasing return to scale, we assume that the platform invests an expansion cost I = ε 1 ∆M M 2 + ε 2 ∆M S 2 at the start of the upgrade stage. ...
This paper aims to study the evolution mechanism of the third-party platform ecosystem. A multi-value chain network ecosystem composed of multiple manufacturers, multiple suppliers, several logistics providers and a third-party platform for manufacturing is considered. The system dynamics method is used to build the model, and this paper collects relevant industry and platform data to simulate the evolution of user scale and participants' revenues. Furthermore, the influence of platform subsidy and matching service level on the evolution is studied. The results show that the platform's evolution can be divided into four stages: emergence, growth, maturity and upgrade. This paper also finds that, at the emergence stage and the growth stage, the augmentation of the subsidies to manufacturers makes the manufacturers' scale expand but let their revenues decline. Meanwhile, the platform's revenues reduce at the emergence stage while increase at the growth stage. When the subsidy amount is high and continues to augment, its positive effect on the user scale is weakened while its negative effect on manufacturers' revenues is enhanced. Besides, improving the matching service level is not conducive to the platform's revenues at the emergence stage, but after entering the growth stage, it can increase user scale and the platform's revenues simultaneously.
... Auto-scaling of web applications with time series forecasting methods are discussed in [15][16][17]. A workload factoring technique was discussed in [18] for a hybrid cloud where a threshold-based technique was used to classify into two classes. The classes capture the base workload and flash workload. ...
Cloud applications heavily use resources and generate more traffic specifically during specific events. In order to achieve quality in service provisioning, the elasticity of resources is a major requirement. With the use of a hybrid cloud model, organizations combine the private and public cloud services to deploy applications for the elasticity of resources. For elasticity, a traditional adaptive policy implements threshold-based auto-scaling approaches that are adaptive and simple to follow. However, during a high dynamic and unpredictable workload, such a static threshold policy may not be effective. An efficient auto-scaling technique that predicts the system load is highly necessary. Balancing a dynamism of load through the best auto-scale policy is still a challenging issue. In this paper, we suggest an algorithm using Deep learning and queuing theory concepts that proactively indicate an appropriate number of future computing resources for short term resource demand. Experiment results show that the proposed model predicts SLA violation with higher accuracy 5% than the baseline model. The suggested model enhances the elasticity of resources with performance metrics.
The smooth operation of a cloud data center along with the best user experience is one of the prime objectives of a resource management scheme that must be achieved at low cost in terms of resource wastage, electricity consumption, security and many others. The workload prediction has proved to be very useful in improving these schemes as it provides the prior estimation of upcoming demands. These predictions help a cloud system in assigning the resources to new and existing applications on low cost. Machine learning has been extensively used to design the predictive models. This article aims to study the performance of different nature-inspired based metaheuristic algorithms on workload prediction in cloud environment. We conducted an in-depth analysis using eight widely used algorithms on five different data traces. The performance of each approach is measured using Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). In addition, the statistical analysis is also carried out using Wilcoxon signed rank and Friedman with Finner post-hoc multiple comparison tests. The study finds that Blackhole Algorithm (BhA) reduced the RMSE by 23.60%, 6.51%, 21.21%, 60.45% and 38.30% relative to the worst performing algorithm for 5 min forecasts of all five data traces correspondingly. Moreover, Friedman test confirms that the results of these approaches have a significant difference with 95% confidence interval (CI) and ranks show that the BhA and FSA received best ranks for Google Cluster trace (CPU and Memory Requests) while second best ranks for NASA and Saskatchewan HTTP server requests.
Failures in computer systems can be often tracked down to software anomalies of various kinds. In many scenarios, it might be difficult, unfeasible, or unprofitable to carry out extensive debugging activity to spot the cause of anomalies and remove them. In other cases, taking corrective actions may led to undesirable service downtime. In this article, we propose an alternative approach to cope with the problem of software anomalies in cloud‐based applications, and we present the design of a distributed autonomic framework that implements our approach. It exploits the elastic capabilities of cloud infrastructures, and relies on machine learning models, proactive rejuvenation techniques, and a new load balancing approach. By putting together all these elements, we show that it is possible to improve both availability and performance of applications deployed to heterogeneous cloud regions and subject to frequent failures. Overall, our study demonstrates the viability of our approach, thus opening the way towards its adoption, and encouraging further studies and practical experiences to evaluate and improve it.
To ensure the reliability of information and communication technology (ICT) systems, it is important to analyze resource usage for purposes such as provisioning resources and detecting failures. Because multiple business processes generally run on an ICT system, it is more useful to understand the resource usage trend of each business process; for example, we can then detect an increase in resource usage for a specific business process. Conventional methods, however, have not been able to analyze such trends because resource usage data is usually mixed and cannot be separated. In previous work, we therefore proposed an analysis method that decomposes the data into per-business-process components. That method, however, could analyze only a single source of data, whereas an actual system consists of multiple resources and multiple devices. In this paper, we enhance the method so that it can analyze multiple sources of data, by incorporating a technique that unifies multiple sources of data into a single source on the basis of a workload dependency model. The proposed method can also analyze the relationship between resources and application workloads, and can therefore identify the specific applications that cause resource usage to increase. We evaluated the proposed method using data from on-premise and actual commercial systems, and we show that it can extract useful business trends. The method could extract system-wide processes, such as a file copy between two servers, and identify a business event corresponding to a resource usage increase.
Conference Paper
Full-text available
Many large content publishers use multiple content distribution networks to deliver their content, and many commercial systems have become available to help a broader set of content publishers to benefit from using multiple distribution networks, which we refer to as content multihoming. In this paper, we conduct the first systematic study on optimizing content multihoming, by introducing novel algorithms to optimize both performance and cost for content multihoming. In particular, we design a novel, efficient algorithm to compute assignments of content objects to content distribution networks for content publishers, considering both cost and performance. We also design a novel, lightweight client adaptation algorithm executing at individual content viewers to achieve scalable, fine-grained, fast online adaptation to optimize the quality of experience (QoE) for individual viewers. We prove the optimality of our optimization algorithms and conduct systematic, extensive evaluations, using real charging data, content viewer demands, and performance data, to demonstrate the effectiveness of our algorithms. We show that our content multihoming algorithms reduce publishing cost by up to 40%. Our client algorithm executing in browsers reduces viewer QoE degradation by 51%.
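The object-to-CDN assignment problem can be hinted at with a deliberately simplified sketch; the CDN names, per-GB prices, and performance scores below are invented, and the paper's actual algorithm optimizes cost and performance jointly per object rather than applying a fixed performance threshold:

```python
# Hypothetical per-GB prices and measured performance scores per CDN.
cdns = {
    "cdn1": {"price": 0.08, "perf": 0.95},
    "cdn2": {"price": 0.05, "perf": 0.80},
    "cdn3": {"price": 0.03, "perf": 0.60},
}

def assign(objects, min_perf):
    """Send every object to the cheapest CDN meeting the performance bar."""
    eligible = [name for name, c in cdns.items() if c["perf"] >= min_perf]
    cheapest = min(eligible, key=lambda name: cdns[name]["price"])
    return {obj: cheapest for obj in objects}

assignment = assign(["video-1", "video-2"], min_perf=0.75)
print(assignment)
```

Even this crude threshold-then-cheapest rule illustrates the core tension of content multihoming: the lowest-cost network is usually not the best-performing one, so the assignment must trade the two off.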
Knowing the distribution of the sizes of traffic flows passing through a network link helps a network operator to characterize network resource usage, infer traffic demands, detect traffic anomalies, and accommodate new traffic demands through better traffic engineering. Previous work on estimating the flow size distribution has been focused on making inferences from sampled network traffic. Its accuracy is limited by the (typically) low sampling rate required to make the sampling operation affordable. In this paper we present a novel data streaming algorithm to provide much more accurate estimates of flow distribution, using a "lossy data structure" which consists of an array of counters fitted well into SRAM. For each incoming packet, our algorithm only needs to increment one underlying counter, making the algorithm fast enough even for 40 Gbps (OC-768) links. The data structure is lossy in the sense that sizes of multiple flows may collide into the same counter. Our algorithm uses Bayesian statistical methods such as Expectation Maximization to infer the most likely flow size distribution that results in the observed counter values after collision. Evaluations of this algorithm on large Internet traces obtained from several sources (including a tier-1 ISP) demonstrate that it has very high measurement accuracy (within 2%). Our algorithm not only dramatically improves the accuracy of flow distribution measurement, but also contributes to the field of data streaming by formalizing an existing methodology and applying it to the context of estimating the flow-distribution.
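The per-packet update path of such a lossy counter array might look like the following sketch; the counter count and flow identifiers are ours, and the paper's Bayesian estimator (Expectation Maximization over the observed counter values) is omitted:

```python
import hashlib

NUM_COUNTERS = 8  # real deployments use SRAM arrays far larger than this toy size

def counter_index(flow_id: str) -> int:
    # Hash the flow identifier into one counter; distinct flows may collide,
    # which is exactly why the structure is "lossy".
    digest = hashlib.sha256(flow_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_COUNTERS

counters = [0] * NUM_COUNTERS

# One counter increment per packet -- the only per-packet work the scheme needs,
# which is what makes it fast enough for 40 Gbps links.
packets = ["flowA"] * 3 + ["flowB"] * 2 + ["flowC"]
for flow_id in packets:
    counters[counter_index(flow_id)] += 1

print(counters)
```

All packets of a flow land in the same counter, so flow sizes are preserved up to collisions; recovering the most likely flow size distribution from the collided counter values is the inference step the paper contributes.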
In this paper we measured and analyzed the workload on Yahoo! Video, the 2nd largest U.S. video sharing site, to understand its nature and the impact on online video data center design. We discovered interesting statistical properties on both static and temporal dimensions of the workload; they include file duration and popularity distributions, arrival rate dynamics and predictability, and workload stationarity and burstiness. Complemented with queueing-theoretic techniques, we extended our understanding of the measurement data with a virtual data center design assuming the same workload as measured, which reveals results regarding the impact of workload arrival distribution, Service Level Agreements (SLAs) and workload scheduling schemes on the design and operations of such large-scale video distribution systems.
Video viewership over the Internet is rising rapidly, and market predictions suggest that video will comprise over 90% of Internet traffic in the next few years. At the same time, there have been signs that the Content Delivery Network (CDN) infrastructure is being stressed by ever-increasing amounts of video traffic. To meet these growing demands, the CDN infrastructure must be designed, provisioned and managed appropriately. Federated telco-CDNs and hybrid P2P-CDNs are two content delivery infrastructure designs that have gained significant industry attention recently. In our unique dataset, consisting of 30 million video sessions spanning around two months of viewership from two large Internet video providers, we observed several user access patterns with important implications for these two designs, including partial interest in content, regional interests, temporal shifts in peak load, and patterns in the evolution of interest. We analyze the impact of our findings on the two designs by performing a large-scale measurement study. Surprisingly, we find a significant amount of synchronous viewing behavior for Video On Demand (VOD) content, which makes the hybrid P2P-CDN approach feasible for VOD and suggests new strategies for CDNs to reduce their infrastructure costs. We also find that federation can reduce telco-CDN provisioning costs by as much as 95%.
While algorithms for cooperative proxy caching have been widely studied, little is understood about cooperative-caching performance in the large-scale World Wide Web environment. This paper uses both trace-based analysis and analytic modelling to show the potential advantages and drawbacks of inter-proxy cooperation. With our traces, we evaluate quantitatively the performance-improvement potential of cooperation between 200 small-organization proxies within a university environment, and between two large-organization proxies handling 23,000 and 60,000 clients, respectively. With our model, we extend beyond these populations to project cooperative caching behavior in regions with millions of clients. Overall, we demonstrate that cooperative caching has performance benefits only within limited population bounds. We also use our model to examine the implications of future trends in Web-access behavior and traffic.
Traffic measurement is a critical component for the control and engineering of communication networks. We argue that traffic measurement should make it possible to obtain the spatial flow of traffic through the domain, i.e., the paths followed by packets between any ingress and egress point of the domain. Most resource allocation and capacity planning tasks can benefit from such information. Also, traffic measurements should be obtained without a routing model and without knowledge of network state. This allows the traffic measurement process to be resilient to network failures and state uncertainty. We propose a method that allows the direct inference of traffic flows through a domain by observing the trajectories of a subset of all packets traversing the network. The key advantages of the method are that (i) it does not rely on routing state, (ii) its implementation cost is small, and (iii) the measurement reporting traffic is modest and can be controlled precisely. The key idea of the method is to sample packets based on a hash function computed over the packet content. Using the same hash function will yield the same sample set of packets in the entire domain, and enables us to reconstruct packet trajectories.
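The core sampling idea can be sketched as follows; the hash function and sampling fraction are illustrative assumptions, not the paper's exact construction:

```python
import hashlib

SAMPLE_FRACTION = 0.25  # sample roughly 1 in 4 packets

def should_sample(packet_bytes: bytes) -> bool:
    """Deterministic, content-based sampling decision.

    Every router applying the same hash to the same invariant packet
    content makes the same decision, so a sampled packet is sampled at
    every hop it traverses -- which is what lets the operator
    reconstruct its trajectory through the domain.
    """
    h = int.from_bytes(hashlib.sha256(packet_bytes).digest()[:8], "big")
    return h / 2**64 < SAMPLE_FRACTION

packets = [f"payload-{i}".encode() for i in range(1000)]
router_a = {p for p in packets if should_sample(p)}
router_b = {p for p in packets if should_sample(p)}
print(router_a == router_b)  # identical sample set at both observation points
```

Tuning `SAMPLE_FRACTION` controls the reporting traffic precisely, which is one of the method's stated advantages; a real implementation would hash only the invariant packet fields, since headers such as TTL change hop by hop.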
We consider a distributed server system in which each host processes tasks in First-Come-First-Served order and each task's service demand is known immediately upon task arrival. We consider four task assignment policies commonly proposed for such distributed server systems: Round Robin; Random; Size-Based, in which all tasks within a given size range are assigned to a particular host; and Dynamic-Least-Work-Remaining, in which a task is assigned to the host with the least outstanding work. Using analysis and simulation, we explore the influence of task size variability on which task assignment policy is best. Surprisingly, we find that not one of the above task assignment policies is best. In particular, we find that when the task sizes are not highly variable, the Dynamic policy is preferable. However, when task sizes show the degree of variability more characteristic of empirically measured workloads, the size-based policy is the best choice. We use the resulting observations to argue in favor of a specific size-based policy, SITA-E, that shows very good performance for realistic task size distributions.
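A minimal sketch contrasting a size-based policy with Dynamic-Least-Work-Remaining; the task sizes and size-range boundaries below are hypothetical, and the SITA-E rule for choosing boundaries (equalizing expected load per host) is not implemented here:

```python
def assign_size_based(task_size, boundaries):
    """Size-based policy: host i serves tasks whose size falls in range i."""
    for host, upper in enumerate(boundaries):
        if task_size <= upper:
            return host
    return len(boundaries)  # largest tasks go to the last host

def assign_least_work(task_size, outstanding):
    """Dynamic policy: send the task to the host with least work remaining."""
    host = min(range(len(outstanding)), key=outstanding.__getitem__)
    outstanding[host] += task_size
    return host

# Toy task stream with a few very large tasks, mimicking high variability.
tasks = [1, 2, 1, 50, 3, 1, 200, 2]
boundaries = [5, 60]          # host 0: size <= 5, host 1: <= 60, host 2: rest
outstanding = [0.0, 0.0, 0.0]

size_hosts = [assign_size_based(t, boundaries) for t in tasks]
dyn_hosts = [assign_least_work(t, outstanding) for t in tasks]
print(size_hosts)  # short tasks never queue behind the 200-unit giant
```

Under the size-based policy the small tasks share a host that the 200-unit task can never reach, which is the intuition behind its advantage when task sizes are highly variable.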
The relative importance of long-term popularity and short-term temporal correlation of references for Web cache replacement policies has not been studied thoroughly. This is partially due to the lack of an accurate characterization of temporal locality that enables the identification of the relative strengths of these two sources of temporal locality in a reference stream. In [ACM Sigmetrics’00, June, 2000 (to appear), Computer Science Technical Report BUCS1999-014, Boston University], we have proposed such a metric and have shown that Web reference streams differ significantly in the prevalence of these two sources of temporal locality. These findings underscore the importance of a Web caching strategy that can adapt in a dynamic fashion to the prevalence of these two sources of temporal locality. In this paper, we propose a novel cache replacement algorithm, GreedyDual*, which is a generalization of GreedyDual-Size. GreedyDual* uses the proposed metrics to adjust the relative worth of long-term popularity versus short-term temporal correlation of references. Our trace-driven simulation experiments show the superior performance of GreedyDual* when compared to other Web cache replacement policies proposed in the literature.
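As a rough illustration of the base policy that the paper generalizes, here is a minimal Python sketch of GreedyDual-Size; the class name and toy sizes/costs are ours, and the paper's actual algorithm additionally weights the credit by its popularity-versus-correlation metric:

```python
class GreedyDualSizeCache:
    """Minimal GreedyDual-Size sketch.

    On each hit or insertion an object's credit is set to L + cost/size;
    eviction removes the object with the lowest credit and raises the
    global inflation value L to that credit, aging everything else.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.L = 0.0
        self.objects = {}  # key -> (size, cost, credit)

    def access(self, key, size, cost):
        if key in self.objects:
            s, c, _ = self.objects[key]
            self.objects[key] = (s, c, self.L + c / s)  # refresh credit on hit
            return True
        # Evict lowest-credit objects until the new one fits.
        while self.used + size > self.capacity and self.objects:
            victim = min(self.objects, key=lambda k: self.objects[k][2])
            v_size, _, v_credit = self.objects.pop(victim)
            self.used -= v_size
            self.L = v_credit
        if size <= self.capacity:
            self.objects[key] = (size, cost, self.L + cost / size)
            self.used += size
        return False
```

The `cost/size` term is what makes the policy size-aware: small or expensive-to-fetch objects earn higher credit per byte and thus survive longer, which GreedyDual-Size exploits and GreedyDual* refines.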