A Component-Based Performance
Comparison of Four Hypervisors
Jinho Hwang
The George Washington University
jinho10@gwu.edu
Sai Zeng and Frederick Y. Wu
IBM T. J. Watson Research Center
{saizeng, fywu}@us.ibm.com
Timothy Wood
The George Washington University
timwood@gwu.edu
Abstract—Virtualization has become a popular way to make
more efficient use of server resources within both private data
centers and public cloud platforms. While recent advances in CPU
architectures and new virtualization techniques have reduced the
performance cost of using virtualization, overheads still exist,
particularly when multiple virtual machines are competing for
resources. We have performed an extensive performance compar-
ison under hardware-assisted virtualization settings considering
four popular virtualization platforms, Hyper-V, KVM, vSphere
and Xen, and find that the overheads incurred by each hypervisor
can vary significantly depending on the type of application and
the resources assigned to it. We also find dramatic differences
in the performance isolation provided by different hypervisors.
However, we find no single hypervisor always outperforms the
others. This suggests that effectively managing hypervisor diver-
sity in order to match applications to the best platform is an
important, yet unstudied, challenge.
Keywords—Cloud Computing, Hypervisor, Benchmark
I. INTRODUCTION
Virtualization technologies have dramatically changed how
businesses run their applications, first by enabling much greater
consolidation within private data centers, and more recently
as a driving technology behind cloud computing. Whether
it is used with public or private resources, virtualization
simplifies management, speeds up deployment, and improves
resource efficiency. Virtualization enables new features such
as performance management and reliability services to be
applied without requiring modifications to applications or
operating systems. While the overheads of the virtualization
layer still prevent it from being used in performance critical
domains such as high performance computing, virtualization
has become the norm for running a growing set of applications.
The uptake of virtualization has been driven recently by
the ease with which Infrastructure as a Service (IaaS) cloud
platforms can rent users virtual machines (VMs). Many enter-
prises appreciate the rapid deployment and flexible scalability
of these clouds—traits which rely in part on the ease with
which virtual servers can be created and adjusted on the fly.
However, the main purpose of using virtualization technology
is to consolidate workloads so that one physical machine can
be multiplexed for many different users. This improves the
efficiency of an overall data center by allowing more work
to be done on a smaller set of physical nodes [1], and also
improves the per-server energy efficiency because even idle
servers consume a great deal of energy [2].
While virtualization provides many conveniences, it comes
at a cost. The hypervisor which manages the virtualization
platform incurs some overhead simply because of the layer
of abstraction it must add between a VM and the physical
resources it makes use of [3]. Further, since many VMs may
run on one physical machine, performance isolation is critical
to ensure that competing VMs do not negatively impact one
another. For example, a CPU scheduler in the hypervisor must
give each VM a fair share of time and prevent a greedy
VM from hurting others [4, 5].
There are numerous virtualization platforms ranging from
open-source hypervisors such as KVM and Xen, to commercial
hypervisors such as VMware vSphere and Microsoft Hyper-V.
While the goals of these platforms are the same, the underlying
technologies vary, leaving system administrators responsible
for picking the ideal virtualization platform based on its
performance, features, and price. The choice of hypervisor
does not only apply to an enterprise’s private data center—
different cloud services make use of different virtualization
platforms. Amazon EC2, the largest infrastructure cloud, uses
Xen as a hypervisor, but Microsoft Azure uses Hyper-V and
VMware partners use ESX. Recently, Google launched its own
IaaS cloud that uses KVM as a hypervisor [6]. This kind
of hypervisor diversity causes new challenges in managing
resources due to the different APIs and feature sets supported
by each cloud and virtualization platform, but it also promises
new opportunities if applications can be carefully matched to
the best hypervisor.
The combination of new CPU architectures with embedded
virtualization support and advances in hypervisor design have
eliminated many of their performance overheads. Despite this,
popular hypervisors still exhibit different levels of perfor-
mance. As a motivating example, we have configured one
blade server with four different hypervisors in different disk
partitions: Hyper-V, KVM, vSphere, and Xen. We create an
identical Ubuntu VM with 1 virtual CPU (VCPU) and 2 GB
of RAM under each platform and measure the time required to
compile the Linux 2.6 kernel. Even with this straightforward
operation, we find significant performance differences under
each hypervisor, as illustrated in Figure 1.
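For concreteness, the timing measurement can be scripted as in the following minimal sketch, which assumes an already-configured kernel source tree in a hypothetical linux-2.6/ directory inside the guest; it is an illustration, not the exact harness used in our experiments.

```python
import subprocess
import time

# Hypothetical path to an already-configured kernel source tree in the guest VM.
KERNEL_DIR = "linux-2.6"

start = time.monotonic()
subprocess.run(["make", "-j1"], cwd=KERNEL_DIR, check=True)  # single-VCPU guest
minutes = (time.monotonic() - start) / 60.0
print(f"Total compile time: {minutes:.1f} min")
```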
While these results suggest that Xen is a poor choice for
a hypervisor, our study reveals a more complicated scenario
where each hypervisor has different strengths and weaknesses.
The result is a far more complicated management decision
than many system administrators are currently aware of. The
virtualization platform (and by extension, cloud platform) best
suited for a given application is dependent on both the nature
of its resource requirements and on the types of applications it
will run alongside. Therefore, situational resource management
needs to be considered in highly scalable cloud architectures.

[Fig. 1. Linux Compile Workloads: total compile time (min) under Hyper-V, KVM, vSphere, and Xen.]
To understand the relative strengths and weaknesses of
different hypervisors, in this paper we perform an exten-
sive performance comparison of four popular virtualization
platforms. We use a component-based approach that isolates
performance by resource such as CPU, memory, disk, and
network. We also study the level of performance isolation
provided by each hypervisor to measure how competing VMs
may interfere with each other [7–10]. Our results suggest that
performance can vary between 3% and 140% depending on
the type of resource stressed by an application, and that the
level of interference caused by competing VMs can range from
2X to 10X depending on the hypervisor. We believe that this
study can act as a foundation for driving new research into
management systems designed to take advantage of hypervisor
diversity to better match applications to the platforms they will
run the most efficiently on.
II. BACKGROUND & RELATED WORK
Virtualization technology provides a way to share com-
puting resources among VMs by using hardware/software
partitioning, emulation, time-sharing, and dynamic resource
sharing. Traditionally, the operating system (OS) controls the
hardware resources, but virtualization technology adds a new
layer between the OS and hardware. A virtualization layer
provides infrastructural support so that multiple VMs (or guest
OS) can be created and kept independent of and isolated from
each other. Often, a virtualization layer is called a hypervisor
or virtual machine monitor (VMM). While virtualization has
long been used in mainframe systems [11], VMware has
been the pioneer in bringing virtualization to commodity x86
platforms, followed by Xen and a variety of other virtualization
platforms [12, 13].
Figure 2 shows three different approaches to virtualiza-
tion: para-virtualization (PV), full virtualization (FV), and
hardware-assisted virtualization (HVM). Paravirtualization re-
quires modification to the guest OS, essentially teaching the
OS how to make requests to the hypervisor when it needs
access to restricted resources. This simplifies the level of
hardware abstraction that must be provided, but version control
between the hypervisor and paravirtualized OS is difficult
since they are controlled by different organizations. Full virtu-
alization supports running unmodified guests through binary
translation.

[Fig. 2. Different Approaches to Providing the Virtualization Layer: the paravirtualization, full virtualization, and hardware-assisted virtualization software stacks.]

VMware uses binary translation and direct
execution techniques to create VMs capable of running propri-
etary operating systems such as Windows [12]. Unfortunately,
these techniques can incur large overheads since instructions
that manipulate protected resources must be intercepted and
rewritten. As a result, Intel and AMD have begun adding
virtualization support to hardware so that the hypervisor can
more efficiently delegate access to restricted resources [12].
Some hypervisors support several of these techniques; in our
study we focus solely on hypervisors using hardware-assisted
virtualization as this promises to offer the greatest performance
and flexibility.
Our target hypervisors are Hyper-V, KVM, VMware
vSphere, and Xen. Each of these hypervisors uses a different
architecture, even when restricted to HW-assisted virtualization
mode. The Windows-based Hyper-V has a significantly
different architecture from the others, which all derive from
Linux. While Xen and KVM use open-source modifications of
the Linux kernel, VMware uses a custom build with proprietary
features [12]. Xen was initially based on the paravirtualization
technique, but it now supports HVM as well [13]. However,
it still retains a separate management domain (dom0) which
controls VMs and can negotiate access to custom block and
network drivers. KVM runs as a kernel module, which means
it uses most of the features of the Linux kernel itself. For
example, rather than providing its own CPU
scheduler for VMs, KVM treats each VM as a process and
uses the default Linux scheduler to allocate resources to them.
A variety of software and operating system aspects can
affect the performance of hypervisors and VMs. In partic-
ular, how the hypervisor schedules resources such as CPU,
memory, disk, and network are critical factors. Each of these
resources requires different techniques to virtualize, leading
to performance differences in each hypervisor depending on
the types of activities being performed. Table I summarizes
performance-related elements for each hypervisor.
Both research and development efforts have gone into
reducing the overheads incurred by virtualization layers. Prior
work by Apparao et al. and Menon et al. has focused on
network virtualization overheads and ways to reduce them [14,
15]. Others have examined overheads and performance vari-
ations caused by different CPU schedulers within the Xen
hypervisor [4]. Our prior work attempted to build models of
virtualization layer overheads to facilitate the migration from
native to virtual environments [16]. Benchmark comparisons
between different hypervisors have been performed both by
companies such as VMware [17] and by independent tech
bloggers [18]; however, these studies have simply tried to
find the fastest hypervisor, rather than to understand the
strengths and weaknesses of each as we do.

TABLE I. FEATURE COMPARISONS OF HYPERVISORS (* default or used in this paper)

Base OS — Hyper-V: Windows Server; KVM: Linux (+QEMU); vSphere: vmkernel (Linux-based); Xen: Linux (+QEMU)

Latest release version — Hyper-V: 2008 R2; KVM: 2.6.32-279; vSphere: 5.0; Xen: 4.1.2

Architecture — Hyper-V: bare-metal; KVM: bare-metal (controversial); vSphere: bare-metal; Xen: hosted (controversial)

Supported virtualization technologies — Hyper-V: para-virtualization, hardware-assisted virtualization*; KVM: para-virtualization, full virtualization, hardware-assisted virtualization*; vSphere: full virtualization, hardware-assisted virtualization*; Xen: para-virtualization, full virtualization, hardware-assisted virtualization*

CPU scheduling features — Hyper-V: control with VM reserve, VM limit, and relative weight; KVM: Linux schedulers (completely fair queuing scheduler*, round-robin, fair queuing, proportionally fair scheduling, maximum throughput, weighted fair queuing); vSphere: proportional share-based algorithm, relaxed co-scheduling, distributed locking with scheduler cell; Xen: SEDF (Simple Earliest Deadline First), Credit*

SMP scheduling features — Hyper-V: CPU topology-based scheduling; KVM: SMP-aware scheduling; vSphere: CPU topology-aware load balancing; Xen: work-nonconserving, work-conserving*

Memory address translation mechanism — Hyper-V: shadow pagetable, hardware-assisted pagetable (Second Level Address Translation)*; KVM: shadow pagetable, hardware-assisted pagetable*; vSphere: emulated TLB, shadow pagetable, hardware-assisted pagetable (nested pagetable)*; Xen: direct pagetable (PV mode), shadow pagetable (HVM mode), hardware-assisted pagetable*

Disk management features — Hyper-V: fixed disks, pass-through disks, dynamic disks*; KVM: no-op, anticipatory, deadline, completely fair queue (CFQ)*; vSphere: latency-aware priority-based scheduler, Storage DRS; Xen: no-op, anticipatory, deadline, completely fair queue (CFQ)*

Network management features — Hyper-V: TCP offload, large send offload, VM queue; KVM: FIFO-based scheduling; vSphere: priority-based NetIOC (Network I/O Control), TCP segmentation offload, NetQueue, distributed virtual switch; Xen: FIFO-based scheduling
III. METHODOLOGY
The methodology for our performance comparison of hy-
pervisors is to drill down into each resource component, one
by one, with a specific benchmark workload. The components include
CPU, memory, disk I/O, and network I/O. Each component
has different virtualization requirements that need to be tested
with different workloads. We follow this with a set of more
general workloads representative of higher-level applications.
When a VM is created, it is assigned a certain number of
virtual CPUs (VCPUs). The VCPU count determines how many
cores the VM can use. However, a VCPU does not guarantee
that a specific physical CPU is dedicated to the VM; rather, it
represents a flexible assignment of physical to virtual CPUs, which may be
further subdivided based on the scheduling weights of different
VMs. The CPU scheduling parameters used by the hypervisor
can impact the overhead added for handling CPU-intensive
tasks. We study cases where VMs are assigned a single VCPU
or four VCPUs (the max on our test system).
The hypervisor must provide a layer of indirection between
the guest OS and the system’s memory to ensure both per-
formance isolation and data integrity. With hardware-assisted
virtualization, this mapping is done through Extended Page
Table (Intel) or Rapid Virtualization Indexing (AMD) support
built into the Memory Management Unit (MMU), which
provides a significant performance boost compared to doing
memory translation in software [12]. Although all of the
hypervisors we compare use this hardware, each takes advantage
of it in a different way, leading to varying performance levels.
Disk IO is a common source of overhead in virtualization
platforms. If paravirtualization is used, then the IO path
between the guest VM and hypervisor can be optimized. With
full virtualization, this is not possible, and there is not yet wide
support for hardware-assisted disk device virtualization. As a
result, the hypervisor must emulate the functionality of the disk
device, potentially leading to large overheads. To characterize
these overheads, we test a range of IO types and sizes, as well
as higher level IO behavior such as that from an email, file,
or web server.
Network performance is also a critical source of over-
head for many virtualization platforms since many VMs run
network-based applications such as web sites. Therefore, the
two major performance factors are network bandwidth
(throughput) and latency. Preliminary support for network
card-assisted virtualization is being developed [19], but this
feature is not standardized enough for our tests. Our network
benchmarks stress both low-level network characteristics and
web server performance.
In addition to the component-level tests described above,
we run several application level benchmarks. These bench-
marks illustrate how the overhead of different components
interact to determine overall performance. Finally, we also test
scenarios where multiple interfering VMs run simultaneously.
Performance isolation is an important goal for the virtual-
ization layer, particularly when used in cloud environments.
If performance isolation fails, customers may complain
that VM performance varies depending on other tenants'
usage patterns. In our composite benchmarking experiments,
we explore how well the hypervisor schedules or manages the
physical resources shared by VMs.
IV. BENCHMARK RESULTS
A. Experimental Setup
The goal of our system setup is to provide complete fair-
ness in all the aspects that can affect the system performance.
Hardware Setting: For a fair comparison, the hardware settings
are exactly the same for all hypervisors: we use one server
machine, which has two 147GB disks divided into three
partitions. Hyper-V occupies one partition, VMware vSphere
occupies another, and KVM and Xen share the same Linux
installation, which can be booted with either the Xen or the
KVM kernel. The machine has an Intel Xeon 5160
3.00GHz/800MHz four-core CPU, 8GB of memory, and a shared
3MB L2 cache per core (12MB total). The disk controller is an
IBM-ESXS LSI Logic 1064E SAS 3Gbps model, and the network is
dual Broadcom 5708S gigabit Ethernet.

[Fig. 3. Bytemark benchmark results: relative score to best (higher is better) for each workload, with (a) 1 VCPU and (b) 4 VCPUs. CI = 99%.]
Guest OS: The base guest VM runs Ubuntu 10.04 LTS
Lucid Lynx (Linux kernel 2.6.32) with a 10GB disk image
and 2048MB of assigned memory. Each hypervisor hosts this
base guest VM with exactly the same environment setup.
Interference generator VMs use the same settings as the base
guest VM, but are assigned only 1024MB of memory.
B. Bytemark
Bytemark [20] is primarily designed to stress the ca-
pabilities of the CPU. It tests a *nix machine in different
ways, running both integer and floating point arithmetic
tests. The 10 workloads are: 1) numeric sort sorts an array
of 32-bit integers; 2) string sort sorts an array of strings of
arbitrary length; 3) bitfield executes a variety of bit manipula-
tion functions; 4) emulated floating-point is a small software
floating-point package; 5) Fourier coefficients is a numerical
analysis routine for calculating series approximations of
waveforms; 6) assignment algorithm is a well-known task
allocation algorithm; 7) Huffman compression is a well-known
text and graphics compression algorithm; 8) IDEA encryption
is a relatively new block cipher algorithm; 9) Neural Net is a
small but functional back-propagation network simulator; 10)
LU Decomposition is a robust algorithm for solving linear
equations.
As seen in Figure 3, we find that all of the hypervisors
perform quite similarly when running these benchmarks. This
is because basic CPU operations require no help from the
hypervisor at all, allowing them to run at near-native speeds.

[Fig. 4. Ramspeed benchmark results: relative score to best (higher is better) for INT and FLOAT Copy, Scale, Add, and Triad, with (a) 1 VCPU and (b) 4 VCPUs. CI = 99%.]
Increasing the number of VCPUs running the benchmarks from
one to four gives similar results for all of the hypervisors, with
only minor fluctuations. The confidence interval (CI) is 99% for
both cases. The CI is calculated with E = t(α/2) × s/√n, where
E is the maximum error, α is one minus the degree of confidence,
t(·) is the t-value for a two-tailed distribution, s is the standard
deviation, and n is the number of samples. To find t(·), we use the
Student t distribution table with the corresponding confidence level.
We obtain µ ± E, where µ is the mean of the samples.
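For reference, this interval computation can be reproduced in a few lines of Python; the sketch below uses SciPy's t-distribution and hypothetical sample values, not our actual measurement data.

```python
import math
import statistics
from scipy.stats import t

def confidence_interval(samples, confidence=0.99):
    """Return (mean, E) so the reported interval is mean ± E."""
    n = len(samples)
    mu = statistics.mean(samples)
    s = statistics.stdev(samples)                         # sample standard deviation
    t_val = t.ppf(1.0 - (1.0 - confidence) / 2.0, n - 1)  # two-tailed t-value
    e = t_val * s / math.sqrt(n)                          # maximum error E = t(alpha/2) * s / sqrt(n)
    return mu, e

mu, e = confidence_interval([101.2, 99.8, 100.4, 100.9, 99.5])  # hypothetical scores
print(f"{mu:.2f} ± {e:.2f}")
```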
C. Ramspeed
Ramspeed (a.k.a. ramsmp for symmetric multiprocess-
ing) [21] is a tool to measure cache and memory bandwidth.
Workloads include integer and float operations. Copy simply
transfers data from one memory location to another. Scale
modifies the data before writing by multiplying with a certain
constant value. Add reads data from the first memory location,
then reads from the second, adds them, and writes the result
to a third location. Triad is a combination of Add and Scale.
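These four access patterns can be written out as simple array kernels. The sketch below illustrates them with NumPy arrays standing in for the benchmark's memory buffers; the array length and the constant k are arbitrary choices, and the real benchmark uses tight native loops and reports bandwidth rather than results.

```python
import numpy as np

n = 1_000_000            # arbitrary buffer length for illustration
a = np.ones(n)
b = np.full(n, 2.0)
c = np.zeros(n)
k = 3.0                  # the constant used by Scale and Triad

c[:] = a                 # Copy:  transfer one memory location to another
b[:] = k * c             # Scale: multiply by a constant before writing
c[:] = a + b             # Add:   read two locations, write the sum to a third
a[:] = b + k * c         # Triad: Add and Scale combined
```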
The benchmark results shown in Figure 4(a) illustrate
that the memory access performance of all hypervisors running
on a single VCPU is within 3%. However, Figure 4(b)
shows up to a 25% performance difference in the FLOAT Triad
case for KVM. While all of the other hypervisors exhibit
increased performance from the availability of multiple VCPUs
(e.g., vSphere rises from 2977 MB/s to 4086 MB/s), KVM
performance stays relatively flat (a rise from 2963 MB/s to
only 3210 MB/s). This suggests that KVM is not able to fully
saturate the memory bus, perhaps because dedicating a full 4
VCPUs to the guest OS causes greater interference with the
primary Linux system under which KVM is installed. Notably,
Xen sees a similar, though less significant, performance gap
when moving to four VCPUs—not a surprise, since both KVM
and Xen have heavyweight management systems, requiring a
full Linux OS to be loaded to run the hypervisor.

[Fig. 5. Bonnie++ benchmark results: relative score to best (higher is better) for Char Output, Block Output, Rewrite, Char Input, Block Input, and Random Seek, with (a) 1 VCPU and (b) 4 VCPUs. CI = 95%.]
D. Bonnie++ & FileBench
Bonnie++ [22] is a disk throughput benchmark. Our system
does not have any form of hardware-assisted disk virtualiza-
tion, so all of the hypervisors must take part in disk opera-
tions. As a result, we expect a larger performance difference
between the hypervisors in this test. The benchmark results
in Figure 5(a) illustrate that while most of the hypervisors
have relatively similar performance across the different IO
types, Xen has significantly lower performance in the tests
involving character level IO. We believe that the performance
drop of up to 40% may be caused by higher overhead
incurred by each IO request. Surprisingly, KVM beats out
all competitors, even though it uses the same QEMU based
backend as Xen for disk processing. Figure 5(b) shows that
increasing the VCPU count and adding more disk-intensive
threads in the benchmark causes a negligible performance gain
for Xen, KVM, and vSphere, but surprisingly causes a dramatic
decrease in throughput for Hyper-V (from 50.8 Kops/s to only
9.8 Kops/s for the Char Output test). We see this anomalous
behavior whenever more than one VCPU is assigned to
the VM, perhaps indicating that Hyper-V's IO scheduler can
experience thrashing when too many simultaneous IO requests
are issued.
FileBench [23] is a model-based file system workload
generator that can mimic mail server, file server, and web
server workloads. The results in Figure 6 confirm those from
Bonnie++ since we again find Xen with the lowest performance
and also see Hyper-V's performance drop when the number
of cores is increased. Xen has the worst performance under
the mail server workload, which makes sense because the
workload contains a large number of small disk writes. In con-
trast, the web workload is read intensive, closer to Bonnie++'s
random-seek workload, in which Xen performed well.

[Fig. 6. Filebench benchmark results: relative score to best (higher is better) for mail server, file server, and web server workloads, with (a) 1 VCPU and (b) 4 VCPUs. CI = 95%.]
E. Netperf
Netperf [24] is a network performance benchmark that can
be used to measure various aspects of networking features. The
primary foci are bulk (aka unidirectional) data transfer and
request/response performance using either TCP or UDP and
the Berkeley Sockets interface. Our test is based on the Netperf
TCP_STREAM test with a single VCPU. Without defining
MessageSize and SocketSize, the maximum throughput is
measured. Figure 7 shows the throughput when using Netperf
to transfer data from the test machine using
TCP sockets. The network throughput of vSphere is about 22%
more than that of Xen. Xen may have more overhead due to the
network transmission using the network backend driver located
in dom0. This requires more levels of indirection compared to
other hypervisors, which in turn affects overall throughput.
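To illustrate what a bulk-transfer test measures, the following minimal sketch implements a TCP_STREAM-style sender in Python. It is not Netperf itself: the receiver address is hypothetical, and a discard-style sink is assumed to be listening on the other end.

```python
import socket
import time

RECEIVER, PORT = "192.0.2.10", 5001   # hypothetical discard-style sink
DURATION = 10.0                       # seconds to stream for
buf = b"\x00" * 65536                 # send buffer of a fixed, arbitrary size

with socket.create_connection((RECEIVER, PORT)) as sock:
    sent = 0
    start = time.monotonic()
    while time.monotonic() - start < DURATION:
        sent += sock.send(buf)        # count bytes actually handed to the kernel
    elapsed = time.monotonic() - start

print(f"Throughput: {sent * 8 / elapsed / 1e6:.1f} Mbit/s")
```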
F. Application Workloads
We have already demonstrated the Linux kernel compile
performance in Figure 1. This illustrated a performance gap
between Xen and the other hypervisors as high as 40%. From
our component based benchmarks, this difference makes sense
and is probably a result of Xen’s moderate disk overheads.
We test a larger set of applications using the freebench
software [25]. As shown in Table II, freebench is an open-
source multi-platform benchmarking suite, providing game,
audio encoding, compression, scientific, HTML processing,
photo processing, encryption, and decompression workloads.

[Fig. 7. Netperf benchmark: relative score to best (higher is better). CI = 95%.]
Figure 8(a) shows the benchmark results for the 1 VCPU
case, while Figure 8(b) shows results with 4 VCPUs. The re-
sults show significant inconsistencies—the bzip2.decompress,
scimark, tidy, and openssl benchmarks all have dramatically
different results depending on the hypervisor and number of
cores used. While the bzip benchmark includes a modest
amount of IO, the others are all dominated by CPU and
memory activity. This makes the variability we observe all
the more surprising since the CPU and memory benchmarks
indicated a relatively consistent ranking of hypervisor speed.
G. Multi-Tenant Interference
While our previous tests have only considered a single VM
running in isolation, it is far more common for each server to
run multiple VMs simultaneously. As virtualization platforms
attempt to minimize the interference between these VMs, mul-
tiplexing inevitably leads to some level of resource contention.
That is, if there is more than one virtual machine which tries
to use the same hardware resource, the performance of one
virtual machine can be affected by other virtual machines.
Even though the schedulers in hypervisors mostly isolate
each virtual machine within its assigned hardware
resources, interference still remains in most hypervisors [7–
10].
As the web server is a popular service in cloud infrastruc-
tures, we want to see how its performance changes when other
VMs run applications on the same host. In order to see the
impact of each component, CPU, Memory, Disk, and network,
we measure the HTTP response while stressing each of the
resource components with different benchmarks.
Figure 9 shows the impact of interference in each hypervi-
sor. There are four VMs: one VM runs a simple web service
being accessed by a client, and the other three are used for
interference generators. The experiment is divided into four
phases: first a CPU based benchmark is run, followed by mem-
ory, disk, and finally a network intensive application. During
each phase, all three interfering VMs run the same benchmark
workload and we measure the performance impact on the web
VM. Note that due to benchmark timing constraints, the start
and end of some phases have short periods where no interfering
VMs are running. With no interference, all hypervisors have a
base web response time of approximately 775 ms.
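The client-side measurement loop can be as simple as the following sketch, which records per-request response times in milliseconds; the URL is a hypothetical stand-in for the web-server VM's address.

```python
import time
import urllib.request

URL = "http://192.0.2.20/index.html"   # hypothetical web-server VM

def measure(n_requests=600):
    """Record per-request HTTP response times in milliseconds."""
    times_ms = []
    for _ in range(n_requests):
        start = time.monotonic()
        with urllib.request.urlopen(URL) as resp:
            resp.read()                # fully read the response body
        times_ms.append((time.monotonic() - start) * 1000.0)
    return times_ms

if __name__ == "__main__":
    samples = measure()
    print(f"mean response time: {sum(samples) / len(samples):.0f} ms")
```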
[Fig. 8. Freebench benchmark results: relative score to best (higher is better) for gnugo.tactics, ogg.tibetan-chant, bzip2.decompress-9, scimark2.small, tidy.xml, dcraw.d300, povray.reduced, openssl.aes, and bzip2.compress-9, with (a) 1 VCPU and (b) 4 VCPUs. CI = 95%.]
Figure 9(a) illustrates that Hyper-V is sensitive to CPU, mem-
ory, and network interference. Not surprisingly, the interfering
disk benchmarks have little impact on the web server since
it is able to easily cache the files it is serving in memory.
Figure 9(b) shows the interference sensitivity of KVM; while
KVM shows a high degree of variability in response time, none
of the interfering benchmarks significantly hurt performance.
Figure 9(c) shows the interference sensitivity of vSphere to
memory is high, whereas the sensitivity to CPU, disk, and
network is very small. Finally, Figure 9(d) shows the interfer-
ence sensitivity of Xen on memory and network is extremely
high compared to the other hypervisors.
Figure 9(e) shows a direct comparison of the four hypervi-
sors. The baseline shows the average response time without
virtualization. As we can also see from Figure 7, Xen
has inherent network overheads, so other workloads affect
its application performance more.
V. DISCUSSION
Our experimental results paint a complicated picture about
the relative performance of different hypervisors. Clearly, there
is no perfect hypervisor that is always the best choice; different
applications will benefit from different hypervisors depending
on their performance needs and the precise features they
require. Overall, vSphere performs the best in our tests, not
surprisingly, since VMware's products have been the longest
in development and have the largest group of dedicated devel-
opers behind them. However, the other three hypervisors all
perform respectably, and each of the tested hypervisors has at
least one benchmark for which it outperforms all of the others.

[Fig. 9. Interference impact for web requests under (a) Hyper-V, (b) KVM, (c) vSphere, and (d) Xen, with (e) a performance comparison of average response times against a baseline. Four VMs are used (one web server, three workload generators); the three interfering VMs run the same workload at the same time, in the sequence CPU, memory, disk, and network, so four interference sections are visible in each graph.]

TABLE II. FREEBENCH BENCHMARKS

gnugo.tactics — Game: GNU Go is a free program that plays the game of Go.
ogg.tibetan-chant — Audio encoding: encoding a song, Tibetan Chant.
bzip2.decompress-9 — Decompression: decompress a bzip2 file.
scimark2.small — Scientific: SciMark 2.0 is a Java benchmark for scientific and numerical computing; it measures several computational kernels and reports a composite score in approximate Mflops/s.
tidy.xml — HTML processing: HTML syntax checker and reformatter.
dcraw.d300 — Photo rendering: Dcraw is a photo rendering tool that has become a standard tool within and outside the open-source world.
povray.reduced — Ray tracing: the Persistence of Vision Ray-Tracer creates three-dimensional, photo-realistic images using a rendering technique called ray-tracing.
openssl.aes — Encryption: OpenSSL AES encryption.
bzip2.compress-9 — Compression: compress a file with bzip2.
In general, we find that CPU- and memory-related tasks
experience the lowest levels of overhead, although KVM
experiences higher memory overheads when all of the sys-
tem’s cores are active. Performance diverges more strongly
for IO activities, where Xen exhibits high overheads when
performing small disk operations. Hyper-V also experiences
a dramatic slowdown when multiple cores are dedicated to
running small, sequential reads and writes. Xen also suffers in
network throughput. It is worth noting that we test Xen using
hardware-assisted full virtualization, whereas the hypervisor
was originally developed for paravirtualization. In practice,
public clouds such as Amazon EC2 use Xen in paravirtualized
mode for all but their high-end instance types.
Our application level tests match these results, with differ-
ent hypervisors exhibiting different overheads depending on
the application and the number of cores assigned to them. All
of this variation suggests that properly matching an application
to the right hypervisor is difficult, but may well be worth the
effort since performance variation can be as high as 140%. We
believe that future management systems should be designed to
exploit this diversity. To do so, researchers must overcome the
inherent challenges in managing multiple systems with differ-
ent APIs, and the difficulty in determining what hypervisor best
matches an application’s needs. VM interference also remains
a challenge for all of the hypervisors tested, and is another area
where properly designed management systems may be able to
help.
While we have made every effort to configure the physical
systems and the VMs running on them identically, it is true that
the performance of each hypervisor can vary significantly
depending on how it is configured. This implies that
there may be even greater potential for variability between
hypervisors if they are configured away from their default
settings. Thus the goal of our work is not to definitively show
one hypervisor to be better than the others, but to show that
each has its own strengths and weaknesses.
VI. CONCLUSION
In this paper we have extensively compared four hyper-
visors: Hyper-V, KVM, vSphere, and Xen. We show their
performance differences and similarities in a variety of sit-
uations. Our results indicate that there is no perfect hyper-
visor, and that different workloads may be best suited for
different hypervisors. We believe that the results of our study
demonstrate the benefits of building highly heterogeneous
data center and cloud environments that support a variety
of virtualization and hardware platforms. While this has the
potential to improve efficiency, it will also introduce a number
of new management challenges before system administrators
and automated systems can properly make use of this diversity.
Our results also illustrate how competing VMs can suffer a high
degree of performance interference. Properly determining how
to place and allocate resources to virtual servers will remain
an important management challenge due to the shared nature
of virtualization environments. Our future research will address
these problems.
REFERENCES
[1] Vijayaraghavan Soundararajan and Kinshuk Govil, "Challenges in building scalable virtualized datacenter management," SIGOPS Oper. Syst. Rev., vol. 44, no. 4, pp. 95–102, Dec. 2010.
[2] Luiz André Barroso and Urs Hölzle, "The case for energy-proportional computing," Computer, vol. 40, no. 12, pp. 33–37, Dec. 2007.
[3] Timothy Wood, Ludmila Cherkasova, Kivanc Ozonat, and Prashant Shenoy, "Profiling and modeling resource usage of virtualized applications," in Proceedings of the 9th ACM/IFIP/USENIX International Conference on Middleware (Middleware '08), New York, NY, USA, 2008, pp. 366–387, Springer-Verlag New York, Inc.
[4] Ludmila Cherkasova, Diwaker Gupta, and Amin Vahdat, "Comparison of the three CPU schedulers in Xen," SIGMETRICS Perform. Eval. Rev., vol. 35, no. 2, pp. 42–51, Sept. 2007.
[5] Diego Ongaro, Alan L. Cox, and Scott Rixner, "Scheduling I/O in virtual machine monitors," in Proceedings of the Fourth ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE '08), New York, NY, USA, 2008, pp. 1–10, ACM.
[6] Google Compute Engine, http://cloud.google.com/compute, 2012.
[7] Melanie Kambadur, Tipp Moseley, Rick Hank, and Martha A. Kim, "Measuring interference between live datacenter applications," in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12), Los Alamitos, CA, USA, 2012, pp. 51:1–51:12, IEEE Computer Society Press.
[8] Jason Mars, Neil Vachharajani, Robert Hundt, and Mary Lou Soffa, "Contention aware execution: online contention detection and response," in Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '10), New York, NY, USA, 2010, pp. 257–265, ACM.
[9] Jason Mars, Lingjia Tang, and Mary Lou Soffa, "Directly characterizing cross core interference through contention synthesis," in Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers (HiPEAC '11), New York, NY, USA, 2011, pp. 167–176, ACM.
[10] Gang Ren, Eric Tune, Tipp Moseley, Yixin Shi, Silvius Rus, and Robert Hundt, "Google-wide profiling: A continuous profiling infrastructure for data centers," IEEE Micro, vol. 30, no. 4, pp. 65–79, July 2010.
[11] Stuart Devenish, Ingo Dimmer, Rafael Folco, Mark Roy, Stephane Saleur, Oliver Stadler, and Naoya Takizawa, "IBM PowerVM virtualization introduction and configuration," IBM Redbooks, 1999.
[12] VMware, "Understanding full virtualization, paravirtualization, and hardware assist," VMware White Paper, 2007.
[13] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the art of virtualization," in ACM SOSP, 2003.
[14] Padma Apparao, Srihari Makineni, and Don Newell, "Characterization of network processing overheads in Xen," in Proceedings of the 2nd International Workshop on Virtualization Technology in Distributed Computing (VTDC '06), Washington, DC, USA, 2006, pp. 2–, IEEE Computer Society.
[15] Aravind Menon, Jose Renato Santos, Yoshio Turner, G. (John) Janakiraman, and Willy Zwaenepoel, "Diagnosing performance overheads in the Xen virtual machine environment," in Proceedings of the 1st ACM/USENIX International Conference on Virtual Execution Environments (VEE '05), New York, NY, USA, 2005, pp. 13–23, ACM.
[16] Jinho Hwang and Timothy Wood, "Adaptive dynamic priority scheduling for virtual desktop infrastructures," in IWQoS, 2012, pp. 1–9, IEEE.
[17] VMware, "A performance comparison of hypervisors," VMware White Paper, 2007.
[18] "VMware vs VirtualBox vs KVM vs Xen," http://www.ilsistemista.net/index.php/virtualization/1-virtual-machines-performance-comparison.html, 2010.
[19] Sunay Tripathi, Nicolas Droux, Thirumalai Srinivasan, and Kais Belgaied, "Crossbow: from hardware virtualized NICs to virtualized networks," in Proceedings of the 1st ACM Workshop on Virtualized Infrastructure Systems and Architectures (VISA '09), New York, NY, USA, 2009, pp. 53–62, ACM.
[20] Bytemark, http://www.tux.org/~mayer/linux/bmark.html, 2003.
[21] Ramspeed, http://www.alasir.com/software/ramspeed, 2009.
[22] Bonnie++, http://www.textuality.com/bonnie, 2004.
[23] Filebench, http://sourceforge.net/projects/filebench, 2004.
[24] Netperf, http://www.netperf.org/netperf, 2012.
[25] Freebench, http://code.google.com/p/freebench, 2008.
... At the same time, Hyper-V is a Windows-based kernel that requires more abstraction layers which reduces the overall performance [18][19][20]. The authors in [19,21] compared the performance of different hypervisors such as KVM, Hyper-V, Xen, and vSphere. They found that no hypervisor is superior in terms of performance, and each hypervisor should be examined to meet a particular need. ...
... The authors state that KVM is, theoretically, better than Hyper-V because of its lightweight kernel and open-source nature. [19,21] In these papers, the authors compared the performance of different hypervisor types such as KVM, Hyper-V, Xen, and vSphere. ...
... Choosing the right hypervisor will continue to be an important challenge for proper virtualization management. [21,23] In these papers, the authors compare KVM and Hyper-V as well as other hypervisors using various criteria, such as responsiveness to SQL workloads, file server workloads, and web server workloads. ...
Article
Full-text available
As the extensive use of cloud computing raises questions about the security of any personal data stored there, cryptography is being used more frequently as a security tool to protect data confidentiality and privacy in the cloud environment. A hypervisor is a virtualization software used in cloud hosting to divide and allocate resources on various pieces of hardware. The choice of hypervisor can significantly impact the performance of cryptographic operations in the cloud environment. An important issue that must be carefully examined is that no hypervisor is completely superior in terms of performance; Each hypervisor should be examined to meet specific needs. The main objective of this study is to provide accurate results to compare the performance of Hyper-V and Kernel-based Virtual Machine (KVM) while implementing different cryptographic algorithms to guide cloud service providers and end users in choosing the most suitable hypervisor for their cryptographic needs. This study evaluated the efficiency of two hypervisors, Hyper-V and KVM, in implementing six cryptographic algorithms: Rivest, Shamir, Adleman (RSA), Advanced Encryption Standard (AES), Triple Data Encryption Standard (TripleDES), Carlisle Adams and Stafford Tavares (CAST-128), BLOWFISH, and TwoFish. The study’s findings show that KVM outperforms Hyper-V, with 12.2% less Central Processing Unit (CPU) use and 12.95% less time overall for encryption and decryption operations with various file sizes. The study’s findings emphasize how crucial it is to pick a hypervisor that is appropriate for cryptographic needs in a cloud environment, which could assist both cloud service providers and end users. Future research may focus more on how various hypervisors perform while handling cryptographic workloads.
... This classification is illustrated in figure 1. [20], includes two categories: Type-1, which runs directly on hardware, and Type-2, which runs on top of another OS. Hwang et al. [21] further identifies three approaches for providing the virtualization layer: Full Virtualization, Paravirtualization, and Hardware-Assisted (HWA) Virtualization. These approaches differ in their use of hardware functionalities and the number of system calls intercepted by the Virtual Machine Monitor (VMM). ...
... Anyway, these models must be fed with temporal information, which have been the focus of previous works. Performance evaluation has been done mainly for traditional virtual machine execution in Clouds [9]. In [4,10], Virtual Machines and containers are compared attending several performance metrics. ...
... The VMware vSphere 5.5 (Lowe, 2011) hypervisor has been used for virtualization services, which also comprises VMware vCenter Server for remote management of the ESXi servers. VMware vSphere was chosen for the development of the V-Network testbed due to its strong performance in comparison to KVM, Xen and Microsoft Hyper-V in the utilization of CPU, memory disk I/O and network I/O as determined by Hwang et al. (2013). The V-Network testbed has features such as: ...
Article
Full-text available
This paper presents a virtualised network environment that serves as a stable and re-usable platform for the analysis of malware propagation. The platform, which has been developed using VMware virtualisation technology, enables the use of either a graphical user interface or scripts to create virtual networks, clone, restart and take snapshots of virtual machines, reset experiments, clean virtual machines and manage the entire infrastructure remotely. The virtualised environment uses open source routing software to support the deployment of intrusion detection systems and other malware attack sensors, and is therefore suitable for evaluating countermeasure systems before deployment on live networks. An empirical analysis of network worm propagation has been conducted using worm outbreak experiments on Class A size networks to demonstrate the capability of the developed platform. INTRODUCTION Malicious software (malware) is a significant risk to the security of computer systems, particularly self-propagating malware (termed a worm) because of its highly virulent nature (Ahmad and Woodhead, 2015). To fully understand the propagation behaviour and infection patterns of computer network worms, security researchers need to have a safe and convenient environment that is isolated from the Internet in order to analyse the behaviour of this malicious software. Large scale network worm outbreak scenarios are difficult to simulate due to the complexity and resources required in setting up a controlled environment for worm propagation and countermeasure testing (Floyd and Paxson, 2001). Thus, this article presents a virtualised network environment termed V-Network. V-Network has been developed with the aim of studying the infection and propagation patterns of network worms and testing a range of countermeasure systems. V-Network is platform independent, which makes it convenient for UNIX-based or Windows-based experimentation. Finally, V-Network has been designed with the capability of resetting and re-running experiments from a standard baseline in a controlled environment either using a graphical user interface or command line scripts. The remainder of this article is organized as follows. Section 2 presents a summary of the related work to the study. Section 3 presents the design of the V-Network testbed. Section 4 presents a small range of worm experiments conducted using the V-Network testbed. Section 5 discusses the impact of the V-Network testbed. Section 6 concludes the study and identifies possible future work.
... Initially, system VMs had significant overhead, but this has been reduced over the years due to software optimizations and hardware advances. By recognizing this, extensive performance evaluations of hypervisors and non-virtualized execution have been conducted and found, in some scenarios, significant overhead [22][23][24][25][26]. ...
Article
Full-text available
Traditional hypervisor-assisted virtualization is a leading virtualization technology in data centers, providing cost savings (CapEx and OpEx), high availability, and disaster recovery. However, its inherent overhead may hinder performance and seems not scale or be flexible enough for certain applications, such as microservices, where deploying an application using a virtual machine is a longer and resource-intensive process. Container-based virtualization has received attention, especially with Docker, as an alternative, which also facilitates continuous integration/continuous deployment (CI/CD). Meanwhile, LXD has reactivated the interest in Linux LXC containers, which provides unique operations, including live migration and full OS emulation. A careful analysis of both options is crucial for organizations to decide which best suits their needs. This study revisits key concepts about containers, exposes the advantages and limitations of each container technology, and provides an up-to-date performance comparison between both types of containers (applicational vs. system). Using extensive benchmarks and well-known workload metrics such as CPU scores, disk speed, and network throughput, we assess their performance and quantify their virtualization overhead. Our results show a clear overall trend toward meritorious performance and the maturity of both technologies (Docker and LXD), with low overhead and scalable performance. Notably, LXD shows greater stability with consistent performance variability.
... Recognizing this, extensive performance evaluations of hypervisors and non-virtualized execution have been conducted [34][35][36][37][38]. Nevertheless, previous comparison of VMs mostly relied on older software like Xen, KVM, or even VMware and out-of-tree patches. ...
Article
Full-text available
Virtualization technologies are indispensable in operating data centers and supporting cloud infrastructures, providing cost reduction (CapEx and OpEx), high availability, and disaster recovery. Hypervisor-assisted virtualization is one of the leading virtualization technologies, with the hypervisor being the software layer responsible for presenting the virtualized view of the hardware to system-level VMs. However, the virtualization overhead it introduces has implications into the computing infrastructure performance. This paper revisits key concepts about virtualization, technologies and techniques, types of VMs and hypervisors, and provides an up-to-date comparison between native and VM environments using workload metrics such as CPU and memory scores, disk speed, and network throughput to determine virtualization overhead. Our results show a clear overall trend toward meritorious performance and the maturity of the technologies used to create system-level VMs.
... They reported results such as CPU utilization, disk utilization, and response time during live migrations. In [24], the authors assessed Microsoft Hyper-V, KVM, VMware vSphere, and Xen in a server that had an Intel Xeon 5160 quad-core processor using several benchmarking tools (e.g., RAMspeed, Bonnie++, Filebench, and Netperf). Kumar and Singh [27] benchmarked VMware ESXi and Citrix XenServer. ...
Preprint
This paper provides a global picture about the deployment of networked processing services for genomic data sets. Many current research make an extensive use genomic data, which are massive and rapidly increasing over time. They are typically stored in remote databases, accessible by using Internet. For this reason, a significant issue for effectively handling genomic data through data networks consists of the available network services. A first contribution of this paper consists of identifying the still unexploited features of genomic data that could allow optimizing their networked management. The second and main contribution of this survey consists of a methodological classification of computing and networking alternatives which can be used to offer what we call the Genomic-as-a-Service (GaaS) paradigm. In more detail, we analyze the main genomic processing applications, and classify not only the main computing alternatives to run genomics workflows in either a local machine or a distributed cloud environment, but also the main software technologies available to develop genomic processing services. Since an analysis encompassing only the computing aspects would provide only a partial view of the issues for deploying GaaS system, we present also the main networking technologies that are available to efficiently support a GaaS solution. We first focus on existing service platforms, and analyze them in terms of service features, such as scalability, flexibility, and efficiency. Then, we present a taxonomy for both wide area and datacenter network technologies that may fit the GaaS requirements. It emerges that virtualization, both in computing and networking, is the key for a successful large-scale exploitation of genomic data, by pushing ahead the adoption of the GaaS paradigm. Finally, the paper illustrates a short and long-term vision on future research challenges in the field.
Conference Paper
Full-text available
Virtual Desktop Infrastructures (VDIs) are gaining popularity in cloud computing by allowing companies to deploy their office environments in a virtualized setting instead of relying on physical desktop machines. Consolidating many users into a VDI environment can significantly lower IT management expenses and enables new features such as "available-anywhere" desktops. However, barriers to broad adoption include the slow performance of virtualized I/O, CPU scheduling interference problems, and shared-cache contention. In this paper, we propose a new soft real-time scheduling algorithm that employs flexible priority designations (via utility functions) and automated scheduler class detection (via hypervisor monitoring of user behavior) to provide a higher quality user experience. We have implemented our scheduler within the Xen virtualization platform, and demonstrate that the overheads incurred from co-locating large numbers of virtual machines can be reduced from 66% with existing schedulers to under 2% in our system. We evaluate the benefits and overheads of using a smaller scheduling time quantum in a VDI setting, and show that the average overhead time per scheduler call is on the same order as the existing SEDF and Credit schedulers.
Conference Paper
Full-text available
In this paper, we present a direct methodology and framework for the measurement and characterization of an application's cross-core interference sensitivity on multicore microarchitectures. While prior works use indirect indicators, such as last level cache miss rate, to infer an application's cross-core interference sensitivity, our approach is direct, in that it characterizes the application's cross-core interference sensitivity using the performance impact due to actual contention. Our methodology and framework, the Cross-core interference Profiling Environment, or CiPE, is composed of a lightweight runtime environment on which a host application runs, along with a carefully designed contention synthesis engine that executes on a neighboring core. CiPE manipulates the co-running contention synthesis engine, while monitoring and analyzing the resulting dynamic impact on the host application. CiPE is able to characterize the cross-core interference sensitivity of the entire application, its individual phases, or source level code regions. To demonstrate the effectiveness of CiPE, we use CiPE characterizations to address two pressing problems. First, we use CiPE characterizations to perform contention conscious batch scheduling that minimizes cross-core interference, resulting in a 12% performance improvment on average when applied to the SPEC2006 benchmark suite, and beyond 20% in the case of mcf and omnetpp. Second, we use CiPE to design a performance analysis tool that is capable identifying contentious bottlenecks in application code.
Conference Paper
Full-text available
Virtual Machine (VM) environments (e.g., VMware and Xen) are experiencing a resurgence of interest for diverse uses including server consolidation and shared hosting. An application's performance in a virtual machine environment can differ markedly from its performance in a non-virtualized environment because of interactions with the underlying virtual machine monitor and other virtual machines. However, few tools are currently available to help debug performance problems in virtual machine environments.In this paper, we present Xenoprof, a system-wide statistical profiling toolkit implemented for the Xen virtual machine environment. The toolkit enables coordinated profiling of multiple VMs in a system to obtain the distribution of hardware events such as clock cycles and cache and TLB misses. The toolkit will facilitate a better understanding of performance characteristics of Xen's mechanisms allowing the community to optimize the Xen implementation.We use our toolkit to analyze performance overheads incurred by networking applications running in Xen VMs. We focus on networking applications since virtualizing network I/O devices is relatively expensive. Our experimental results quantify Xen's performance overheads for network I/O device virtualization in uni- and multi-processor systems. With certain Xen configurations, networking workloads in the Xen environment can suffer significant performance degradation. Our results identify the main sources of this overhead which should be the focus of Xen optimization efforts. We also show how our profiling toolkit was used to uncover and resolve performance bugs that we encountered in our experiments which caused unexpected application behavior.
Conference Paper
This paper explores the relationship between domain scheduling in a virtual machine monitor (VMM) and I/O performance. Traditionally, VMM schedulers have focused on fairly sharing the processor resources among domains while leaving the scheduling of I/O resources as a secondary concern. However, this can result in poor and/or unpredictable application performance, making virtualization less desirable for applications that require efficient and consistent I/O behavior. This paper is the first to study the impact of the VMM scheduler on performance using multiple guest domains concurrently running different types of applications. In particular, different combinations of processor-intensive, bandwidth-intensive, and latency-sensitive applications are run concurrently to quantify the impacts of different scheduler configurations on processor and I/O performance. These applications are evaluated on 11 different scheduler configurations within the Xen VMM. These configurations include a variety of scheduler extensions aimed at improving I/O performance. This cross product of scheduler configurations and application types offers insight into the key problems in VMM scheduling for I/O and motivates future innovation in this area.
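One representative extension in this design space gives recently woken, I/O-blocked domains a short-lived priority boost. The C sketch below models that idea in miniature; the priority classes and struct fields are simplified assumptions, not Xen's credit scheduler implementation:

```c
#include <stdio.h>

/* Simplified model of credit-scheduler priorities with an I/O boost.
 * Not Xen's implementation; states and ordering are illustrative. */
enum prio { BOOST = 0, UNDER = 1, OVER = 2 };  /* lower runs first */

struct vcpu {
    const char *name;
    int credits;       /* remaining time-slice credits */
    int just_woken;    /* woke up on an I/O event this period */
};

static enum prio classify(const struct vcpu *v) {
    if (v->just_woken)  return BOOST;  /* run latency-sensitive vCPU first */
    if (v->credits > 0) return UNDER;  /* still within its fair share */
    return OVER;                       /* exhausted its share */
}

int main(void) {
    struct vcpu vcpus[] = {
        { "net-server", 5, 1 },
        { "cpu-hog",   -3, 0 },
        { "web-app",   10, 0 },
    };
    for (int i = 0; i < 3; i++)
        printf("%s -> priority class %d\n", vcpus[i].name,
               classify(&vcpus[i]));
    return 0;
}
```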
Conference Paper
Cross-core application interference due to contention for shared on-chip and off-chip resources poses a significant challenge to providing application-level quality of service (QoS) guarantees on commodity multicore micro-architectures. Unexpected cross-core interference is especially problematic when considering latency-sensitive applications that are present in the web service data center application domains, such as web-search. The commonly used solution is to simply disallow the co-location of latency-sensitive applications and throughput-oriented batch applications on a single chip, leaving much of the processing capabilities of multicore micro-architectures underutilized. In this work we present a Contention Aware Execution Runtime (CAER) environment that provides a lightweight runtime solution that minimizes cross-core interference due to contention, while maximizing utilization. CAER leverages the ubiquitous performance monitoring capabilities present in current multicore processors to infer and respond to contention and requires no added hardware support. We present the design and implementation of the CAER environment, two separate contention detection heuristics, and approaches to respond to contention online. We evaluate our solution using the SPEC2006 benchmark suite. Our experiments show that when allowing co-location with CAER, as opposed to disallowing co-location, we are able to increase the utilization of the multicore CPU by 58% on average. Meanwhile, CAER brings the overhead due to allowing co-location from 17% down to just 4% on average.
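A CAER-style detection heuristic can be sketched as comparing an observed miss rate to a solo-run baseline and throttling the co-runner when the inflation crosses a threshold. Everything below, including read_miss_rate() and throttle_corunner(), is a hypothetical stand-in for real perf-counter reads and a real response mechanism, not the paper's heuristics:

```c
#include <stdio.h>

/* Sketch of a contention heuristic in the CAER spirit: compare the
 * host application's LLC miss rate against its solo baseline and
 * throttle the batch co-runner when inflation exceeds a threshold. */
#define THRESHOLD 1.5   /* illustrative: 50% miss-rate inflation */

static double read_miss_rate(void) { return 0.012; }  /* stub value */

static void throttle_corunner(int on) {
    printf(on ? "throttling co-runner\n" : "resuming co-runner\n");
}

int main(void) {
    double solo_baseline = 0.006;   /* assumed measured offline */
    double now = read_miss_rate();  /* would poll perf counters */
    throttle_corunner(now > THRESHOLD * solo_baseline);
    return 0;
}
```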
Conference Paper
Application interference is prevalent in datacenters due to contention over shared hardware resources. Unfortunately, understanding interference in live datacenters is more difficult than in controlled environments or on simpler architectures. Most approaches to mitigating interference rely on data that cannot be collected efficiently in a production environment. This work exposes eight specific complexities of live datacenters that constrain measurement of interference. It then introduces new, generic measurement techniques for analyzing interference in the face of these challenges and restrictions. We use the measurement techniques to conduct the first large-scale study of application interference in live production datacenter workloads. Data is measured across 1000 12-core Google servers observed to be running 1102 unique applications. Finally, our work identifies several opportunities to improve performance that use only the available data; these opportunities are applicable to any datacenter.
Article
Google-Wide Profiling (GWP), a continuous profiling infrastructure for data centers, provides performance insights for cloud applications. With negligible overhead, GWP provides stable, accurate profiles and a datacenter-scale tool for traditional performance analyses. Furthermore, GWP introduces novel applications of its profiles, such as application-platform affinity measurements and identification of platform-specific, microarchitectural peculiarities.
Conference Paper
This paper describes a new architecture for achieving network virtualization using virtual NICs (VNICs) as the building blocks. The VNICs can be associated with dedicated and independent hardware lanes that consist of dedicated NIC and kernel resources. Hardware lanes support dynamic polling, which enables the fair sharing of bandwidth with no performance penalty. VNICs ensure full separation of traffic for virtual machines within the host. A collection of VNICs on one or more physical machines can be connected to create a Virtual Wire by assigning them a common attribute such as a VLAN tag.
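These building blocks can be modeled with a toy structure: each VNIC owns a hardware lane and carries a VLAN tag, and two VNICs sit on the same virtual wire exactly when their tags match. The field names below are illustrative, not taken from the described system:

```c
#include <stdio.h>
#include <stdint.h>

/* Toy model of the described building blocks: a VNIC with its own
 * hardware lane (dedicated ring/kernel resources) and a VLAN tag. */
struct vnic {
    const char *vm;
    int         lane_id;    /* dedicated NIC + kernel resources */
    uint16_t    vlan_tag;   /* shared tag => same virtual wire */
};

/* Two VNICs belong to the same virtual wire iff their tags match. */
static int same_virtual_wire(const struct vnic *a, const struct vnic *b) {
    return a->vlan_tag == b->vlan_tag;
}

int main(void) {
    struct vnic a = { "vm-1", 0, 100 }, b = { "vm-2", 1, 100 },
                c = { "vm-3", 2, 200 };
    printf("vm-1/vm-2 same wire: %d\n", same_virtual_wire(&a, &b));
    printf("vm-1/vm-3 same wire: %d\n", same_virtual_wire(&a, &c));
    return 0;
}
```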