Conference Paper

A Component-Based Performance Comparison of Four Hypervisors


Jinho Hwang
The George Washington University
Sai Zeng and Frederick Y. Wu
IBM T. J. Watson Research Center
{saizeng, fywu}
Timothy Wood
The George Washington University
Abstract: Virtualization has become a popular way to make more efficient use of server resources within both private data centers and public cloud platforms. While recent advances in CPU architectures and new virtualization techniques have reduced the performance cost of using virtualization, overheads still exist, particularly when multiple virtual machines are competing for resources. We have performed an extensive performance comparison under hardware-assisted virtualization settings considering four popular virtualization platforms, Hyper-V, KVM, vSphere and Xen, and find that the overheads incurred by each hypervisor can vary significantly depending on the type of application and the resources assigned to it. We also find dramatic differences in the performance isolation provided by different hypervisors. However, we find that no single hypervisor always outperforms the others. This suggests that effectively managing hypervisor diversity in order to match applications to the best platform is an important, yet unstudied, challenge.
Keywords: Cloud Computing, Hypervisor, Benchmark
Virtualization technologies have dramatically changed how businesses run their applications, first by enabling much greater consolidation within private data centers, and more recently as a driving technology behind cloud computing. Whether it is used with public or private resources, virtualization simplifies management, speeds up deployment, and improves resource efficiency. Virtualization enables new features such as performance management and reliability services to be applied without requiring modifications to applications or operating systems. While the overheads of the virtualization layer still prevent it from being used in performance-critical domains such as high-performance computing, virtualization has become the norm for running a growing set of applications.
The uptake of virtualization has been driven recently by the ease with which Infrastructure as a Service (IaaS) cloud platforms can rent virtual machines (VMs) to users. Many enterprises appreciate the rapid deployment and flexible scalability of these clouds, traits that rely in part on the ease with which virtual servers can be created and adjusted on the fly. However, the main purpose of using virtualization technology is to consolidate workloads so that one physical machine can be multiplexed among many different users. This improves the efficiency of an overall data center by allowing more work to be done on a smaller set of physical nodes [1], and also improves per-server energy efficiency because even idle servers consume a great deal of energy [2].
While virtualization provides many conveniences, it comes at a cost. The hypervisor that manages the virtualization platform incurs some overhead simply because of the layer of abstraction it must add between a VM and the physical resources it uses [3]. Further, since many VMs may run on one physical machine, performance isolation is critical to ensure that competing VMs do not negatively impact one another. For example, a CPU scheduler in the hypervisor must provide a fair amount of time to each VM and prevent a greedy VM from hurting others [4, 5].
There are numerous virtualization platforms, ranging from open-source hypervisors such as KVM and Xen to commercial hypervisors such as VMware vSphere and Microsoft Hyper-V. While the goals of these platforms are the same, the underlying technologies vary, leaving system administrators responsible for picking the ideal virtualization platform based on its performance, features, and price. The choice of hypervisor does not only apply to an enterprise's private data center; different cloud services make use of different virtualization platforms. Amazon EC2, the largest infrastructure cloud, uses Xen as a hypervisor, but Microsoft Azure uses Hyper-V and VMware partners use ESX. Recently, Google launched its own IaaS cloud that uses KVM as a hypervisor [6]. This kind of hypervisor diversity causes new challenges in managing resources due to the different APIs and feature sets supported by each cloud and virtualization platform, but it also promises new opportunities if applications can be carefully matched to the best hypervisor.
The combination of new CPU architectures with embedded virtualization support and advances in hypervisor design has eliminated many of the performance overheads of virtualization. Despite this, popular hypervisors still exhibit different levels of performance. As a motivating example, we have configured one blade server with four different hypervisors in different disk partitions: Hyper-V, KVM, vSphere, and Xen. We create an identical Ubuntu VM with 1 virtual CPU (VCPU) and 2 GB of RAM under each platform and measure the time required to compile the Linux 2.6 kernel. Even with this straightforward operation, we find significant performance differences under each hypervisor, as illustrated in Figure 1.
Fig. 1. Linux Compile Workloads: total compile time (min) under Hyper-V, KVM, vSphere, and Xen.
While these results suggest that Xen is a poor choice of hypervisor, our study reveals a more complicated scenario in which each hypervisor has different strengths and weaknesses. The result is a far more complicated management decision than many system administrators currently realize. The virtualization platform (and by extension, cloud platform) best suited for a given application depends both on the nature of its resource requirements and on the types of applications it will run alongside. Therefore, situational resource management needs to be considered in highly scalable cloud architectures.
To understand the relative strengths and weaknesses of different hypervisors, in this paper we perform an extensive performance comparison of four popular virtualization platforms. We use a component-based approach that isolates performance by resource, such as CPU, memory, disk, and network. We also study the level of performance isolation provided by each hypervisor to measure how competing VMs may interfere with each other [7–10]. Our results suggest that performance can vary between 3% and 140% depending on the type of resource stressed by an application, and that the level of interference caused by competing VMs can range from 2X to 10X depending on the hypervisor. We believe that this study can act as a foundation for driving new research into management systems designed to take advantage of hypervisor diversity to better match applications to the platforms on which they will run most efficiently.
Virtualization technology provides a way to share computing resources among VMs by using hardware/software partitioning, emulation, time-sharing, and dynamic resource sharing. Traditionally, the operating system (OS) controls the hardware resources, but virtualization technology adds a new layer between the OS and hardware. A virtualization layer provides infrastructural support so that multiple VMs (or guest OSes) can be created and kept independent of and isolated from each other. A virtualization layer is often called a hypervisor or virtual machine monitor (VMM). While virtualization has long been used in mainframe systems [11], VMware pioneered bringing virtualization to commodity x86 platforms, followed by Xen and a variety of other virtualization platforms [12, 13].
Figure 2 shows three different approaches to virtualization: para-virtualization (PV), full virtualization (FV), and hardware-assisted virtualization (HVM). Paravirtualization requires modification to the guest OS, essentially teaching the OS how to make requests to the hypervisor when it needs access to restricted resources. This simplifies the level of hardware abstraction that must be provided, but version control between the hypervisor and the paravirtualized OS is difficult since they are controlled by different organizations. Full virtualization supports running unmodified guests through binary translation. VMware uses binary translation and direct execution techniques to create VMs capable of running proprietary operating systems such as Windows [12]. Unfortunately, these techniques can incur large overheads since instructions that manipulate protected resources must be intercepted and rewritten. As a result, Intel and AMD have begun adding virtualization support to hardware so that the hypervisor can more efficiently delegate access to restricted resources [12]. Some hypervisors support several of these techniques; in our study we focus solely on hypervisors using hardware-assisted virtualization, as this promises to offer the greatest performance and flexibility.
Fig. 2. Different Approaches to Providing the Virtualization Layer
Our target hypervisors are Hyper-V, KVM, VMware vSphere, and Xen. Each of these hypervisors uses a different architecture, even when restricted to HW-assisted virtualization mode. The Windows-based Hyper-V has a significantly different architecture than the others, which all derive from Linux. While Xen and KVM use open-source modifications of the Linux kernel, VMware uses a custom build with proprietary features [12]. Xen was initially based on the paravirtualization technique, but it now supports HVM as well [13]. However, it still retains a separate management domain (dom0), which controls VMs and can negotiate access to custom block and network drivers. KVM runs as a kernel module, which means it uses most of the features of the Linux kernel itself. For example, rather than providing its own CPU scheduler for VMs, KVM treats each VM as a process and uses the default Linux scheduler to allocate resources to them.
A variety of software and operating system aspects can affect the performance of hypervisors and VMs. In particular, how the hypervisor schedules resources such as CPU, memory, disk, and network is a critical factor. Each of these resources requires different techniques to virtualize, leading to performance differences in each hypervisor depending on the types of activities being performed. Table I summarizes performance-related elements for each hypervisor.
Both research and development efforts have gone into reducing the overheads incurred by virtualization layers. Prior work by Apparao et al. and Menon et al. has focused on network virtualization overheads and ways to reduce them [14, 15]. Others have examined overheads and performance variations caused by different CPU schedulers within the Xen hypervisor [4]. Our prior work attempted to build models of virtualization layer overheads to facilitate the migration from native to virtual environments [16]. Benchmark comparisons between different hypervisors have been performed both by companies such as VMware [17] and by independent tech bloggers [18]; however, these studies have only tried to find the fastest hypervisor, not to understand the strengths and weaknesses of each as we do.
Table I. Performance-related elements of each hypervisor
Features | Hyper-V | KVM | vSphere | Xen
Base OS | Windows Server | Linux (+QEMU) | vmkernel (linux-based) | Linux (+QEMU)
Latest Release Version | 2008 R2 | 2.6.32-279 | 5.0 | 4.1.2
Architecture | Bare-Metal | Bare-Metal (controversial) | Bare-Metal | Hosted (controversial)
Supported Virtualization | Para-Virtualization, Hardware-Assisted Virtualization* | Para-Virtualization, Full Virtualization, Hardware-Assisted Virtualization* | Full Virtualization, Hardware-Assisted Virtualization* | Para-Virtualization, Full Virtualization, Hardware-Assisted Virtualization*
CPU Scheduling | Control with VM reserve, VM limit, relative weight | Linux schedulers (Completely Fair Queuing Scheduler*, round-robin, fair queuing, proportionally fair scheduling, maximum throughput, weighted fair queuing) | Proportional Share-based Algorithm, Relaxed Co-Scheduling, Distributed Locking with Scheduler Cell | SEDF (Simple Earliest Deadline First), Credit*
SMP Scheduling | CPU Topology-based Scheduling | SMP-Aware Scheduling | CPU Topology-aware Load Balancing | Work-Nonconserve, Work-Conserve
Memory Address Translation | Shadow Pagetable, Hardware-Assisted Pagetable (Second Level Address Translation)* | Shadow Pagetable, Hardware-Assisted Pagetable* | Emulated TLB, Shadow Pagetable, Hardware-Assisted Pagetable (nested pagetable)* | Direct Pagetable (PV mode), Shadow Pagetable (HVM mode), Hardware-Assisted Pagetable*
Disk Management | Fixed disks, pass-through disks, dynamic disks* | No-op, anticipatory, deadline, completely fair queue (CFQ)* | Latency-aware Priority-based scheduler, storage DRS | No-op, anticipatory, deadline, completely fair queue (CFQ)*
Network Management | TCP offload, large send offload, VM queue | FIFO-based scheduling | Priority-based NetIOC (Network I/O Control), TCP segmentation offload, netqueue, distributed virtual switch | FIFO-based scheduling
The methodology for our performance comparison of hypervisors is to drill down into each resource component one by one with a specific benchmark workload. The components include CPU, memory, disk I/O, and network I/O. Each component has different virtualization requirements that need to be tested with different workloads. We follow this with a set of more general workloads representative of higher-level applications.
When a VM is created, it is assigned a certain number of virtual CPUs (VCPUs). The VCPU count represents how many cores the VM can use. However, a VCPU does not guarantee that a specific physical CPU is dedicated to the VM; rather, it represents a flexible assignment of physical to virtual CPUs, which may be further subdivided based on the scheduling weights of different VMs. The CPU scheduling parameters used by the hypervisor can impact the overhead added for handling CPU-intensive tasks. We study cases where VMs are assigned a single VCPU or four VCPUs (the maximum on our test system).
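The following Python sketch shows one way weight-based subdivision can work. It splits a host's physical CPU capacity among VMs in proportion to hypothetical scheduling weights. Each VM is capped at its VCPU count, and any capacity a capped VM cannot use is redistributed to the others (water-filling). This is an illustrative proportional-share scheme under those assumptions, not the actual algorithm of any of the four hypervisors; Table I lists their real schedulers. The function name `proportional_share` and the example VMs are hypothetical.

```python
from typing import Dict


def proportional_share(capacity: float,
                       weights: Dict[str, float],
                       caps: Dict[str, float]) -> Dict[str, float]:
    """Split `capacity` physical CPUs among VMs in proportion to their
    (positive) scheduling weights, never giving a VM more than its cap,
    e.g. its VCPU count. Capacity a capped VM cannot use is handed to
    the remaining VMs (a simple water-filling scheme)."""
    alloc: Dict[str, float] = {}
    active = set(weights)
    remaining = capacity
    while active:
        total_weight = sum(weights[vm] for vm in active)
        # VMs whose proportional share would meet or exceed their cap.
        capped = [vm for vm in active
                  if remaining * weights[vm] / total_weight >= caps[vm]]
        if not capped:
            # Every remaining VM fits under its cap: split the rest by weight.
            for vm in active:
                alloc[vm] = remaining * weights[vm] / total_weight
            break
        for vm in capped:
            alloc[vm] = caps[vm]
            remaining -= caps[vm]
            active.remove(vm)
    return alloc


if __name__ == "__main__":
    # Hypothetical host with 4 physical cores shared by three VMs.
    shares = proportional_share(
        capacity=4.0,
        weights={"vm_a": 2, "vm_b": 1, "vm_c": 1},
        caps={"vm_a": 1, "vm_b": 4, "vm_c": 4},  # cap = VCPU count
    )
    print(shares)
```

In the example, `vm_a` would be entitled to 2 cores by weight but has only 1 VCPU. It is therefore capped at 1 core, and the remaining 3 cores are split evenly between `vm_b` and `vm_c` (1.5 each).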
The hypervisor must provide a layer of indirection between the guest OS and the system's memory to ensure both performance isolation and data integrity. With hardware-assisted virtualization, this mapping is done through Extended Page Table (Intel) or Rapid Virtualization Indexing (AMD) support built into the Memory Management Unit (MMU), which provides a significant performance boost compared to doing memory translation in software [12]. Although all of the hypervisors we compare use this hardware, each takes advantage of it in different ways, leading to varying performance levels.
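The two-stage mapping described above can be sketched as a toy Python model with two tables. The guest's page table maps guest-virtual pages to guest-physical frames. A hypervisor-owned nested table (EPT on Intel, RVI on AMD) then maps guest-physical frames to host-physical frames. The flat dictionaries, the 4 KiB page size, and the `translate` helper are simplifying assumptions. Real MMUs walk multi-level tables in hardware and cache the results in the TLB.

```python
PAGE_SIZE = 4096  # assume 4 KiB pages


def translate(gva: int, guest_page_table: dict, nested_page_table: dict) -> int:
    """Translate a guest-virtual address (GVA) to a host-physical address.

    Stage 1: the guest OS's page table maps guest-virtual pages to
             guest-physical frames.
    Stage 2: the hypervisor's nested table (EPT/RVI) maps guest-physical
             frames to host-physical frames.
    Both tables are flat dicts here; hardware walks multi-level trees.
    """
    page, offset = divmod(gva, PAGE_SIZE)
    gpa_frame = guest_page_table[page]        # KeyError ~ guest page fault
    hpa_frame = nested_page_table[gpa_frame]  # KeyError ~ exit to the hypervisor
    return hpa_frame * PAGE_SIZE + offset     # the page offset is never translated


if __name__ == "__main__":
    guest_pt = {0x10: 0x2}   # hypothetical mapping chosen by the guest OS
    nested_pt = {0x2: 0x7F}  # hypothetical mapping chosen by the hypervisor
    print(hex(translate(0x10 * PAGE_SIZE + 0x123, guest_pt, nested_pt)))  # 0x7f123
```

Because the guest manages stage 1 without trapping into the hypervisor, guest page-table updates no longer need the software shadowing listed in Table I for non-hardware-assisted modes.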
Disk IO is a common source of overhead in virtualization platforms. If paravirtualization is used, then the IO path between the guest VM and hypervisor can be optimized. With full virtualization, this is not possible, and there is not yet wide support for hardware-assisted disk device virtualization. As a result, the hypervisor must emulate the functionality of the disk device, potentially leading to large overheads. To characterize these overheads, we test a range of IO types and sizes, as well as higher-level IO behavior such as that from an email, file, or web server.
Network performance is also a critical source of overhead for many virtualization platforms, since many VMs run network-based applications such as web sites. The two major factors for network performance are bandwidth (throughput) and latency. Preliminary support for network card-assisted virtualization is being developed [19], but this feature is not standardized enough for our tests. Our network benchmarks stress both low-level network characteristics and web server performance.
In addition to the component-level tests described above, we run several application-level benchmarks. These benchmarks illustrate how the overheads of different components interact to determine overall performance. Finally, we also test scenarios where multiple interfering VMs run simultaneously. Performance isolation is an important goal for the virtualization layer, particularly when used in cloud environments. If performance isolation fails, customers may complain that their VM performance varies depending on other tenants' usage patterns. In our composite benchmarking experiments, we explore how well each hypervisor schedules or manages the physical resources shared by VMs.
A. Experimental Setup
The goal of our system setup is to provide complete fairness in all the aspects that can affect system performance.
Hardware Setting: For a fair comparison, the hardware settings are exactly the same for all the hypervisors because we use a single server machine, which has two 147GB disks divided into three partitions. Hyper-V occupies one partition, VMware vSphere occupies another, and KVM and Xen share the same Linux installation, which can be booted using either the Xen or the KVM kernel. The machine has an Intel(R) Xeon(R) 5160 3.00GHz/800MHz four-core CPU, 8GB of memory, and a shared 3MB L2 cache per core (12MB total). The disk has an LSI Logic 1064E SAS 3 Gb/s controller (IBM-ESXS model), and the network is dual Broadcom 5708S Gigabit Ethernet.
Fig. 3. Bytemark Benchmark Result (CI = 99%): relative score to best (higher is better) for Hyper-V, KVM, vSphere, and Xen; (a) 1 VCPU case, (b) 4 VCPUs case.
Guest OS: The base guest VM OS is Ubuntu 10.04 LTS Lucid Lynx (Linux kernel 2.6.32), with a 10GB disk image and 2048MB of memory assigned. Each hypervisor has this base guest VM with exactly the same environment setup. Interference generator VMs use the same settings as the base guest VM, but each is assigned only 1024MB of memory.
B. Bytemark
Bytemark [20] is primarily designed to stress the capabilities of the CPU. It tests a *nix machine in different ways, running both integer and floating-point arithmetic tests. Its 10 workloads are: 1) numeric sort, which sorts an array of 32-bit integers; 2) string sort, which sorts an array of strings of arbitrary length; 3) bitfield, which executes a variety of bit manipulation functions; 4) emulated floating-point, a small software floating-point package; 5) Fourier coefficients, a numerical analysis routine for calculating series approximations of waveforms; 6) assignment algorithm, a well-known task allocation algorithm; 7) Huffman compression, a well-known text and graphics compression algorithm; 8) IDEA encryption, a relatively new block cipher algorithm; 9) Neural Net, a small but functional back-propagation network simulator; and 10) LU Decomposition, a robust algorithm for solving linear equations.
As seen in Figure 3, we find that all of the hypervisors perform quite similarly when running these benchmarks. This is because basic CPU operations require no help from the hypervisor at all, allowing them to run at near-native speeds. Increasing the number of VCPUs running the benchmarks from one to four gives similar results for all of the hypervisors, with only minor fluctuations. The confidence interval (CI) is 99% for both cases.
Fig. 4. Ramspeed Benchmark Result (CI = 99%): relative score to best (higher is better) for Hyper-V, KVM, vSphere, and Xen on the Ramspeed workloads (e.g., INT Copy, INT Scale, INT Triad); (a) 1 VCPU case, (b) 4 VCPUs case.
The CI is calculated as $E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$, where $E$ is the maximum error, $\alpha$ is 1 minus the confidence level, $t_{\alpha/2}$ is the critical t-value for a two-tailed test, $s$ is the sample standard deviation, and $n$ is the number of samples. To find $t_{\alpha/2}$, we use a Student's t distribution table (with $n - 1$ degrees of freedom) at the corresponding confidence level. We report $\mu \pm E$, where $\mu$ is the mean of the samples.
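The CI computation above can be written as a short Python function. This sketch assumes at most 11 samples per configuration. Under that assumption, a small hard-coded table of standard two-tailed t critical values (df = 1 to 10) stands in for the printed Student's t table. The function name, the table layout, and the example scores are illustrative, not the authors' scripts.

```python
import math
import statistics

# Two-tailed Student's t critical values t_{alpha/2} for df = 1..10
# (standard table values), keyed by confidence level.
T_CRITICAL = {
    0.95: [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228],
    0.99: [63.657, 9.925, 5.841, 4.604, 4.032, 3.707, 3.499, 3.355, 3.250, 3.169],
}


def confidence_interval(samples, confidence=0.99):
    """Return (mu, E) so that the interval is mu +/- E, where
    E = t_{alpha/2} * s / sqrt(n) with n - 1 degrees of freedom."""
    n = len(samples)
    table = T_CRITICAL[confidence]        # KeyError for untabulated levels
    df = n - 1
    if not 1 <= df <= len(table):
        raise ValueError("this sketch only tabulates 2 to 11 samples")
    mu = statistics.mean(samples)
    s = statistics.stdev(samples)         # sample standard deviation (n - 1 denominator)
    e = table[df - 1] * s / math.sqrt(n)
    return mu, e


if __name__ == "__main__":
    runs = [41.2, 40.8, 41.5, 40.9, 41.1]  # hypothetical benchmark scores
    mu, e = confidence_interval(runs, confidence=0.99)
    print(f"{mu:.2f} +/- {e:.2f}")
```

If SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, n - 1)` could replace the lookup table and remove the sample-count limit.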
C. Ramspeed
Ramspeed (a.k.a. ramsmp for symmetric multiprocessing) [21] is a tool to measure cache and memory bandwidth. Its workloads include integer and floating-point operations. Copy simply transfers data from one memory location to another. Scale modifies the data before writing by multiplying it by a constant value. Add reads data from the first memory location, then reads from the second, adds them up, and writes the result to a third location. Triad is a merge of Add and Scale.
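The four operations follow the same access patterns as the STREAM benchmark's kernels, sketched below in Python. The kernels show which arrays each operation reads and writes. `bandwidth_mb_s` applies the common convention of counting every element read or written when converting a timing into MB/s; taking MB as 10^6 bytes is an assumption. Pure Python is far too slow to saturate a memory bus, so this illustrates only the access patterns and the accounting, not how Ramspeed itself is implemented. All names are hypothetical.

```python
import time
from array import array

# Elements read plus elements written per index, for each kernel.
WORDS_TOUCHED = {"copy": 2, "scale": 2, "add": 3, "triad": 3}


def copy(dst, a):
    """Copy: dst[i] = a[i] (1 read, 1 write)."""
    for i in range(len(a)):
        dst[i] = a[i]


def scale(dst, a, q):
    """Scale: dst[i] = q * a[i] (1 read, 1 write)."""
    for i in range(len(a)):
        dst[i] = q * a[i]


def add(dst, a, b):
    """Add: dst[i] = a[i] + b[i] (2 reads, 1 write)."""
    for i in range(len(a)):
        dst[i] = a[i] + b[i]


def triad(dst, a, b, q):
    """Triad = Add + Scale: dst[i] = a[i] + q * b[i] (2 reads, 1 write)."""
    for i in range(len(a)):
        dst[i] = a[i] + q * b[i]


def bandwidth_mb_s(kernel, n, seconds, itemsize=8):
    """Convert a kernel timing over n elements into MB/s (MB = 10**6 bytes)."""
    return WORDS_TOUCHED[kernel] * n * itemsize / seconds / 1e6


if __name__ == "__main__":
    n = 200_000
    a = array("d", [1.0]) * n   # 8-byte doubles, as in a FLOAT test
    b = array("d", [2.0]) * n
    dst = array("d", [0.0]) * n
    start = time.perf_counter()
    triad(dst, a, b, 3.0)
    elapsed = time.perf_counter() - start
    print(f"triad: {bandwidth_mb_s('triad', n, elapsed):.1f} MB/s (interpreter-bound)")
```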
The benchmark results shown in Figure 4(a) illustrate that the memory access performance of all hypervisors running on a single VCPU is within 3%. However, Figure 4(b) shows up to a 25% performance difference in the FLOAT Triad case for KVM. While all of the other hypervisors exhibit increased performance from the availability of multiple VCPUs (e.g., vSphere rises from 2977 MB/s to 4086 MB/s), KVM performance stays relatively flat (a rise from 2963 MB/s to only 3210 MB/s). This suggests that KVM is not able to fully saturate the memory bus, perhaps because dedicating all 4 VCPUs to the guest OS causes greater interference with the primary Linux system under which KVM is installed. Notably, Xen sees a similar, though less significant, performance gap when moving to four VCPUs. This is not a surprise, since both KVM and Xen have heavyweight management systems that require a full Linux OS to be loaded to run the hypervisor.
Fig. 5. Bonnie++ Benchmark Result (CI = 95%): relative score to best (higher is better) for Hyper-V, KVM, vSphere, and Xen on Char Output, Block Output, Char Input, Block Input, and Random Seek; (a) 1 VCPU case, (b) 4 VCPUs case.
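Two calculations can be reproduced with simple arithmetic: the VCPU scaling comparison above, and the "Relative Score to Best" normalization used on the y-axes of Figures 3 through 8. The Python sketch below uses only the two pairs of bandwidths quoted in the text. It assumes that "relative score to best" means each platform's score divided by the best score for that benchmark (higher is better). That is our reading of the axis label rather than something the text states. Because only two platforms are included, the output is not a reproduction of Figure 4(b).

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


def relative_to_best(scores: dict) -> dict:
    """Divide each higher-is-better score by the best one, so the best is 1.0."""
    best = max(scores.values())
    return {name: value / best for name, value in scores.items()}


if __name__ == "__main__":
    # 1-VCPU -> 4-VCPU bandwidths (MB/s) quoted in the text.
    print(f"vSphere: +{pct_change(2977, 4086):.1f}%")  # +37.3%
    print(f"KVM:     +{pct_change(2963, 3210):.1f}%")  # +8.3%
    print(relative_to_best({"vSphere": 4086, "KVM": 3210}))
```

By this arithmetic, vSphere gains roughly 37% from the extra VCPUs while KVM gains only about 8%.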
D. Bonnie++ & FileBench
Bonnie++ [22] is a disk throughput benchmark. Our system does not have any form of hardware-assisted disk virtualization, so all of the hypervisors must take part in disk operations. As a result, we expect a larger performance difference between the hypervisors in this test. The benchmark results in Figure 5(a) illustrate that while most of the hypervisors have relatively similar performance across the different IO types, Xen has significantly lower performance in the tests involving character-level IO. We believe that this performance drop of up to 40% may be caused by higher overhead incurred by each IO request. Surprisingly, KVM beats out all competitors, even though it uses the same QEMU-based backend as Xen for disk processing. Figure 5(b) shows that increasing the VCPU count and adding more disk-intensive threads in the benchmark yields a negligible performance gain for Xen, KVM, and vSphere, but causes a dramatic decrease in throughput for Hyper-V (from 50.8 Kops/s to only 9.8 Kops/s for the Char Output test). We see this anomalous behavior whenever more than one VCPU is assigned to the VM, perhaps indicating that Hyper-V's IO scheduler can experience thrashing when too many simultaneous IO requests are issued.
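The per-request overhead explanation can be illustrated with a simple service-time model. Each request pays a fixed software cost (for example, trapping into the hypervisor and emulating the device), plus the time to move its bytes at the device's raw bandwidth. The Python sketch below is a back-of-the-envelope model, not a measurement. The overhead and bandwidth values in the example are hypothetical and are not taken from our experiments. The model shows why a fixed extra cost per request hurts small requests far more than large ones.

```python
def effective_throughput(request_bytes: float, overhead_s: float,
                         bandwidth_bytes_s: float) -> float:
    """Bytes/s delivered when each request costs a fixed overhead plus
    its transfer time at the raw device bandwidth."""
    service_time = overhead_s + request_bytes / bandwidth_bytes_s
    return request_bytes / service_time


def slowdown(request_bytes: float, base_overhead_s: float,
             virt_overhead_s: float, bandwidth_bytes_s: float) -> float:
    """Fractional throughput loss when the per-request overhead grows
    from `base_overhead_s` to `virt_overhead_s`."""
    base = effective_throughput(request_bytes, base_overhead_s, bandwidth_bytes_s)
    virt = effective_throughput(request_bytes, virt_overhead_s, bandwidth_bytes_s)
    return 1.0 - virt / base


if __name__ == "__main__":
    BW = 100e6  # hypothetical 100 MB/s device
    for size in (4 * 1024, 1024 * 1024):
        loss = slowdown(size, base_overhead_s=20e-6, virt_overhead_s=50e-6,
                        bandwidth_bytes_s=BW)
        print(f"{size:>8} B requests: {loss:.1%} slower")
```

With these made-up numbers, 4 KiB requests lose about 33% of their throughput while 1 MiB requests lose well under 1%.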
FileBench [23] is a model-based file system workload generator that can mimic mail server, file server, and web server workloads. The results in Figure 6 confirm those from Bonnie++: we again find Xen with the lowest performance, and we again see Hyper-V's performance drop when the number of cores is increased. Xen has the worst performance under the mail server workload, which makes sense because the workload contains a large number of small disk writes. In contrast, the web workload is read intensive, closer to Bonnie++'s random-seek workload, in which Xen performed well.
Fig. 6. Filebench Benchmark Result (CI = 95%): relative score to best (higher is better) for Hyper-V, KVM, vSphere, and Xen on the Mail Server, File Server, and Web Server workloads; (a) 1 VCPU case, (b) 4 VCPUs case.
E. Netperf
Netperf [24] is a network performance benchmark that can be used to measure various aspects of networking features. Its primary foci are bulk (aka unidirectional) data transfer and request/response performance using either TCP or UDP and the Berkeley Sockets interface. Our test is based on the Netperf TCP_STREAM test with a single VCPU. Without defining MessageSize and SocketSize (i.e., leaving them at their defaults), we measure the maximum throughput per second. Figure 7 shows the throughput per second when using Netperf to transfer data from the test machine using TCP sockets. The network throughput of vSphere is about 22% higher than that of Xen. Xen may have more overhead because network transmission goes through the network backend driver located in dom0. This requires more levels of indirection compared to other hypervisors, which in turn reduces overall throughput.
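The Python sketch below illustrates what a TCP_STREAM-style test measures: it pushes data over one TCP connection for a fixed time and reports receiver-side throughput. It is a hypothetical stand-in, not Netperf. It runs over loopback, so it measures the host's network stack rather than a NIC, and it lacks Netperf's calibration and CPU-utilization reporting. To mirror a real test, the sender and receiver would run on different machines; Netperf is typically invoked as `netperf -H <remote-host> -t TCP_STREAM`.

```python
import socket
import threading
import time


def tcp_stream_throughput(duration_s: float = 1.0, chunk: int = 64 * 1024) -> float:
    """Stream bytes over one loopback TCP connection for about `duration_s`
    seconds and return receiver-side throughput in Mbit/s."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = 0

    def sink() -> None:
        nonlocal received
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(chunk)
                if not data:  # empty read: the sender closed the connection
                    break
                received += len(data)

    receiver = threading.Thread(target=sink)
    receiver.start()
    payload = b"\0" * chunk
    start = time.perf_counter()
    with socket.create_connection(("127.0.0.1", port)) as sender:
        while time.perf_counter() - start < duration_s:
            sender.sendall(payload)
    receiver.join()  # wait until every sent byte has been read
    elapsed = time.perf_counter() - start
    server.close()
    return received * 8 / elapsed / 1e6


if __name__ == "__main__":
    print(f"{tcp_stream_throughput(1.0):.0f} Mbit/s over loopback")
```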
F. Application Workloads
We have already demonstrated the Linux kernel compile performance in Figure 1, which showed a performance gap as high as 40% between Xen and the other hypervisors. In light of our component-based benchmarks, this difference makes sense and is probably a result of Xen's moderate disk overheads.
We test a larger set of applications using the freebench software [25]. As shown in Table II, freebench is an open-source multi-platform benchmarking suite, providing game, audio encoding, compression, scientific, HTML processing, photo processing, encryption, and decompression workloads.
Fig. 7. Netperf Benchmark (CI = 95%): relative score to best (higher is better) for Hyper-V, KVM, vSphere, and Xen.
Figure 8(a) shows the benchmark results for the 1 VCPU case, while Figure 8(b) shows results with 4 VCPUs. The results show significant inconsistencies: the bzip2.decompress, scimark, tidy, and openssl benchmarks all have dramatically different results depending on the hypervisor and number of cores used. While the bzip benchmark includes a modest amount of IO, the others are all dominated by CPU and memory activity. This makes the variability we observe all the more surprising, since the CPU and memory benchmarks indicated a relatively consistent ranking of hypervisor speed.
G. Multi-Tenant Interference
While our previous tests have only considered a single VM running in isolation, it is far more common for each server to run multiple VMs simultaneously. Although virtualization platforms attempt to minimize the interference between these VMs, multiplexing inevitably leads to some level of resource contention. That is, if more than one virtual machine tries to use the same hardware resource, the performance of one virtual machine can be affected by other virtual machines. Even though the schedulers in hypervisors mainly isolate each virtual machine within the amount of assigned hardware resources, interference still remains in most hypervisors [7–10].
As the web server is a popular service in cloud infrastructures, we want to see how its performance changes when other VMs run applications on the same host. In order to see the impact of each component (CPU, memory, disk, and network), we measure the HTTP response time while stressing each of the resource components with different benchmarks.
Figure 9 shows the impact of interference in each hypervisor. There are four VMs: one VM runs a simple web service being accessed by a client, and the other three are used as interference generators. The experiment is divided into four phases: first a CPU-based benchmark is run, followed by memory, disk, and finally a network-intensive application. During each phase, all three interfering VMs run the same benchmark workload, and we measure the performance impact on the web VM. Note that due to benchmark timing constraints, the start and end of some phases have short periods where no interfering VMs are running. With no interference, all hypervisors have a base web response time of approximately 775 ms.
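A per-phase analysis like the one behind Figure 9 can be sketched in three steps. First, label each response-time sample with the phase in which it was collected. Then average the samples per phase, and divide each average by the no-interference baseline of about 775 ms. The phase labels, the `phase_slowdowns` function, and the input format are assumptions for illustration; the text does not describe how the measurements were post-processed. Samples from the short idle gaps at phase boundaries can be given their own label so they do not dilute the phase averages.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

BASELINE_MS = 775.0  # no-interference response time reported in the text


def phase_slowdowns(samples: Iterable[Tuple[str, float]],
                    baseline_ms: float = BASELINE_MS) -> Dict[str, float]:
    """Group (phase, response_ms) samples by phase and return each phase's
    mean response time as a multiple of the no-interference baseline."""
    by_phase: Dict[str, List[float]] = defaultdict(list)
    for phase, response_ms in samples:
        by_phase[phase].append(response_ms)
    return {phase: mean(times) / baseline_ms for phase, times in by_phase.items()}


if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    samples = [("cpu", 800.0), ("cpu", 900.0), ("memory", 2400.0),
               ("idle", 770.0), ("net", 5000.0)]
    for phase, ratio in phase_slowdowns(samples).items():
        print(f"{phase:>6}: {ratio:.2f}x baseline")
```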
Fig. 8. Freebench Benchmark Result (CI = 95%): relative score to best (higher is better) for Hyper-V, KVM, vSphere, and Xen; (a) 1 VCPU case, (b) 4 VCPUs case.
Figure 9(a) illustrates that Hyper-V is sensitive to CPU, memory, and network interference. Not surprisingly, the interfering disk benchmarks have little impact on the web server, since it is able to easily cache the files it is serving in memory. Figure 9(b) shows the interference sensitivity of KVM; while KVM shows a high degree of variability in response time, none of the interfering benchmarks significantly hurts performance. Figure 9(c) shows that vSphere's sensitivity to memory interference is high, whereas its sensitivity to CPU, disk, and network interference is very small. Finally, Figure 9(d) shows that Xen's sensitivity to memory and network interference is extremely high compared to the other hypervisors.
Figure 9(e) shows a direct comparison of the four hypervisors. A base line shows the average response time without using hypervisors. As Figure 7 also shows, Xen has inherent network overheads, so other workloads have a greater effect on its application performance.
Our experimental results paint a complicated picture of the relative performance of different hypervisors. Clearly, there is no perfect hypervisor that is always the best choice; different applications will benefit from different hypervisors depending on their performance needs and the precise features they require. Overall, vSphere performs the best in our tests, which is not surprising, since VMware's products have been in development the longest and have the largest group of dedicated developers behind them. However, the other three hypervisors all perform respectably, and each of the tested hypervisors has at least one benchmark for which it outperforms all of the others.
Fig. 9. Interference Impact for Web Requests: 4 VMs (1 web server, 3 workload generators) are used. The 3 generator VMs run the same workload at the same time. The workloads run in the sequence of CPU, memory, disk, and network workloads over time, so 4 interference sections can be identified in each graph. Panels (a) Hyper-V, (b) KVM, (c) vSphere, and (d) Xen plot response time (ms) against HTTP request number across the CPU, Memory, Disk, and Net phases; panel (e) compares the average response time (sec) of each hypervisor against the base line.
Table II. Freebench workloads
Items | Category | Notes
gnugo.tactics | Game | GNU Go is a free program that plays the game of Go
ogg.tibetan-chant | Audio Encoding | Encoding a song, Tibetan Chant
bzip2.decompress-9 | Decompression | Decompress a file in bzip2
scimark2.small | Scientific | SciMark 2.0 is a Java benchmark for scientific and numerical computing. It measures several computational kernels and reports a composite score in approximate Mflops
tidy | HTML processing tool | HTML syntax checker and reformatter
dcraw.d300 | Photo Rendering | Dcraw is a photo rendering tool and has become a standard tool within and without the Open Source world
 | Persistence of Vision Ray-Tracer | Persistence of Vision Ray-Tracer creates three-dimensional, photo-realistic images using a rendering technique called ray-tracing
openssl.aes | Encryption | OpenSSL AES encryption
bzip2.compress-9 | Compression | Compress a file in bzip2
In general, we find that CPU- and memory-related tasks experience the lowest levels of overhead, although KVM experiences higher memory overheads when all of the system's cores are active. Performance diverges more strongly for I/O activities: Xen exhibits high overheads when performing small disk operations, and Hyper-V experiences a dramatic slowdown when multiple cores are dedicated to running small, sequential reads and writes. Xen also lags the other hypervisors in network throughput. It is worth noting that we test Xen using hardware-assisted full virtualization, whereas the hypervisor was originally developed for paravirtualization. In practice, public clouds such as Amazon EC2 use Xen in paravirtualized mode for all but their high-end instance types.
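Comparing overheads across CPU, memory, disk, and network tests requires care with the direction of each metric: for throughput a lower virtualized number is worse, while for latency or run time a higher number is worse. The sketch below shows one common way to normalize both into a percentage slowdown relative to bare metal. The paper does not spell out its exact formula here, so treat `overhead_pct` as an assumed definition and the sample values as hypothetical.

```python
def overhead_pct(native: float, virtualized: float, higher_is_better: bool) -> float:
    """Percentage slowdown of a virtualized run relative to native (bare metal).

    Assumed definition, for illustration only:
      * throughput-style metrics (higher is better): fraction of native
        performance lost, (native - virtualized) / native;
      * time-style metrics (lower is better): extra time relative to native,
        (virtualized - native) / native.
    A negative result means the virtualized run beat the native one.
    """
    if native <= 0 or virtualized <= 0:
        raise ValueError("measurements must be positive")
    if higher_is_better:
        return (native - virtualized) / native * 100.0
    return (virtualized - native) / native * 100.0


# Hypothetical measurements, not taken from this paper.
net_overhead = overhead_pct(native=940.0, virtualized=705.0, higher_is_better=True)  # Mbit/s
disk_overhead = overhead_pct(native=2.0, virtualized=3.0, higher_is_better=False)    # ms per op
```

Keeping both cases in one function avoids a common pitfall: applying the throughput formula to a latency metric would report the 50% latency increase above as a negative overhead.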
Our application-level tests match these results, with different hypervisors exhibiting different overheads depending on the application and the number of cores assigned to it. All of this variation suggests that properly matching an application to the right hypervisor is difficult, but may well be worth the effort, since performance variation can be as high as 140%. We believe that future management systems should be designed to exploit this diversity. To do so, researchers must overcome the inherent challenges of managing multiple systems with different APIs, and the difficulty of determining which hypervisor best matches an application's needs. VM interference also remains a challenge for all of the hypervisors tested, and is another area where properly designed management systems may be able to help.
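One simple way a management system might approach the matching problem is to weight each hypervisor's per-resource overhead by how heavily an application uses that resource, then pick the hypervisor with the lowest weighted overhead. The sketch below is a hypothetical heuristic, not a method proposed or evaluated in this paper: the linear weighting ignores core counts and co-located VMs, and the `OVERHEAD` table is made-up placeholder data for unnamed hypervisors.

```python
from typing import Dict

# Hypothetical per-resource overheads (% slowdown vs. native) for four
# unnamed hypervisors. Placeholder values only, NOT results from this paper.
OVERHEAD: Dict[str, Dict[str, float]] = {
    "hv_a": {"cpu": 2.0, "memory": 3.0, "disk": 20.0, "network": 5.0},
    "hv_b": {"cpu": 3.0, "memory": 8.0, "disk": 6.0, "network": 4.0},
    "hv_c": {"cpu": 1.0, "memory": 2.0, "disk": 10.0, "network": 3.0},
    "hv_d": {"cpu": 2.0, "memory": 2.0, "disk": 30.0, "network": 25.0},
}


def expected_overhead(profile: Dict[str, float], overheads: Dict[str, float]) -> float:
    """Weighted mean overhead, using the app's relative demand on each resource."""
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile needs at least one positive weight")
    return sum(weight / total * overheads[res] for res, weight in profile.items())


def best_hypervisor(profile: Dict[str, float],
                    table: Dict[str, Dict[str, float]] = OVERHEAD) -> str:
    """Return the hypervisor whose weighted overhead is lowest for this profile."""
    return min(table, key=lambda hv: expected_overhead(profile, table[hv]))
```

With these placeholder numbers, a CPU-bound profile such as `{"cpu": 1.0}` maps to `hv_c`, while a disk-heavy profile such as `{"cpu": 1.0, "disk": 3.0}` maps to `hv_b`, illustrating how the best choice can shift with the workload.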
While we have made every effort to configure the physical systems and the VMs running on them identically, the performance of each hypervisor can vary significantly depending on how it is configured. However, this implies that there may be even greater potential for variability between hypervisors if they are configured away from their default settings. Thus, the goal of our work is not to definitively show one hypervisor to be better than the others, but to show that each has its own strengths and weaknesses.
In this paper we have extensively compared four hypervisors: Hyper-V, KVM, vSphere, and Xen. We show their performance differences and similarities in a variety of situations. Our results indicate that there is no perfect hypervisor, and that different workloads may be best suited to different hypervisors. We believe that the results of our study demonstrate the benefits of building highly heterogeneous data center and cloud environments that support a variety of virtualization and hardware platforms. While this has the potential to improve efficiency, it will also introduce a number of new management challenges that must be solved before system administrators and automated systems can properly make use of this diversity. Our results also illustrate how competing VMs can have a high degree of performance interference. Properly determining how to place and allocate resources to virtual servers will remain an important management challenge due to the shared nature of virtualization environments. Our future research aims to solve these problems.
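The per-phase comparison summarized in Fig. 9(e) can be approximated by tagging each web request with the interference workload the co-located VMs were running when it was served, then dividing the mean response time of each phase by the mean of a baseline run without interference. This Python sketch assumes such phase tagging is available and uses hypothetical response times; it is not the measurement code used in this study.

```python
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Interference workloads, in the order the generator VMs run them (Fig. 9).
PHASES = ("cpu", "memory", "disk", "network")


def phase_slowdowns(baseline_ms: Iterable[float],
                    samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Mean response time in each interference phase, relative to the baseline mean.

    `samples` holds (phase, response_time_ms) pairs. A value of 2.0 means requests
    served during that phase took twice as long, on average, as without interference.
    """
    base = mean(baseline_ms)
    per_phase: Dict[str, List[float]] = {}
    for phase, rt in samples:
        per_phase.setdefault(phase, []).append(rt)
    return {p: mean(per_phase[p]) / base for p in PHASES if p in per_phase}


# Hypothetical response times in ms, not data from Fig. 9.
baseline = [10.0, 12.0, 11.0]
tagged = [("cpu", 11.0), ("cpu", 13.0), ("memory", 22.0),
          ("disk", 55.0), ("disk", 33.0), ("network", 16.5)]
slowdown = phase_slowdowns(baseline, tagged)
```

Phases that never appear in the samples are simply omitted, so a run that stops before the network phase yields only three entries.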
[1] Vijayaraghavan Soundararajan and Kinshuk Govil, “Challenges in building scalable virtualized datacenter management,” SIGOPS Oper. Syst. Rev., vol. 44, no. 4, pp. 95–102, Dec. 2010.
[2] Luiz André Barroso and Urs Hölzle, “The case for energy-proportional computing,” Computer, vol. 40, no. 12, pp. 33–37, Dec. 2007.
[3] Timothy Wood, Ludmila Cherkasova, Kivanc Ozonat, and Prashant Shenoy, “Profiling and modeling resource usage of virtualized applications,” in Proceedings of the 9th ACM/IFIP/USENIX International Conference on Middleware, New York, NY, USA, 2008, Middleware ’08, pp. 366–387, Springer-Verlag New York, Inc.
[4] Ludmila Cherkasova, Diwaker Gupta, and Amin Vahdat, “Comparison of the three CPU schedulers in Xen,” SIGMETRICS Perform. Eval. Rev., vol. 35, no. 2, pp. 42–51, Sept. 2007.
[5] Diego Ongaro, Alan L. Cox, and Scott Rixner, “Scheduling I/O in virtual machine monitors,” in Proceedings of the Fourth ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, New York, NY, USA, 2008, VEE ’08, pp. 1–10, ACM.
[6] Google Compute Engine, “…,” 2012.
[7] Melanie Kambadur, Tipp Moseley, Rick Hank, and Martha A. Kim, “Measuring interference between live datacenter applications,” in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Los Alamitos, CA, USA, 2012, SC ’12, pp. 51:1–51:12, IEEE Computer Society Press.
[8] Jason Mars, Neil Vachharajani, Robert Hundt, and Mary Lou Soffa, “Contention aware execution: online contention detection and response,” in Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization, New York, NY, USA, 2010, CGO ’10, pp. 257–265, ACM.
[9] Jason Mars, Lingjia Tang, and Mary Lou Soffa, “Directly characterizing cross core interference through contention synthesis,” in Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers, New York, NY, USA, 2011, HiPEAC ’11, pp. 167–176, ACM.
[10] Gang Ren, Eric Tune, Tipp Moseley, Yixin Shi, Silvius Rus, and Robert Hundt, “Google-wide profiling: A continuous profiling infrastructure for data centers,” IEEE Micro, vol. 30, no. 4, pp. 65–79, July 2010.
[11] Stuart Devenish, Ingo Dimmer, Rafael Folco, Mark Roy, Stephane Saleur, Oliver Stadler, and Naoya Takizawa, “IBM PowerVM virtualization introduction and configuration,” Redbooks, 1999.
[12] VMware, “Understanding full virtualization, paravirtualization, and hardware assist,” VMware White Paper.
[13] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, “Xen and the art of virtualization,” ACM SOSP, 2003.
[14] Padma Apparao, Srihari Makineni, and Don Newell, “Characterization of network processing overheads in Xen,” in Proceedings of the 2nd International Workshop on Virtualization Technology in Distributed Computing, Washington, DC, USA, 2006, VTDC ’06, pp. 2–, IEEE Computer Society.
[15] Aravind Menon, Jose Renato Santos, Yoshio Turner, G. (John) Janakiraman, and Willy Zwaenepoel, “Diagnosing performance overheads in the Xen virtual machine environment,” in Proceedings of the 1st ACM/USENIX International Conference on Virtual Execution Environments, New York, NY, USA, 2005, VEE ’05, pp. 13–23, ACM.
[16] Jinho Hwang and Timothy Wood, “Adaptive dynamic priority scheduling for virtual desktop infrastructures,” in IWQoS, 2012, pp. 1–9, IEEE.
[17] VMware, “A performance comparison of hypervisors,” VMware White Paper, 2007.
[18] VMware vs VirtualBox vs KVM vs Xen, “…”.
[19] Sunay Tripathi, Nicolas Droux, Thirumalai Srinivasan, and Kais Belgaied, “Crossbow: from hardware virtualized NICs to virtualized networks,” in Proceedings of the 1st ACM Workshop on Virtualized Infrastructure Systems and Architectures, New York, NY, USA, 2009, VISA ’09, pp. 53–62, ACM.
[20] Bytemark, “…mayer/linux/bmark.html”.
[21] Ramspeed, “…”.
[22] Bonnie++, “…,” 2004.
[23] Filebench, “…”.
[24] Netperf, “…,” 2012.
[25] Freebench, “…,” 2008.