virtFlow: Guest Independent Execution Flow Analysis Across Virtualized Environments

An agent-less technique for understanding virtual machine (VM) behavior and its changes during the VM life-cycle is essential for many performance analysis and debugging tasks in the cloud environment. For reasons of privacy, security, ease of deployment, and execution overhead, such a method should preferably limit its data collection to the physical host level, without internal access to the VMs. We propose a precise host-based method to recover the execution flow of virtualized environments, regardless of the level of virtualization. Given a VM, the Any-Level VM Detection Algorithm (ADA) and the Nested VM State Detection (NSD) Algorithm compute its execution path, along with the state of its virtual CPUs (vCPUs), from the host kernel trace. The state of the vCPUs is displayed in an interactive trace viewer (TraceCompass) for further inspection. Then, a new approach for profiling threads and processes inside the VMs is proposed. Our proposed VM trace analysis algorithms have been open-sourced for further enhancements and for the benefit of other developers. Our new techniques are being evaluated with workloads generated by different benchmarking tools. These approaches are based on host hypervisor tracing, which brings a lower overhead (around 1%) than other approaches.
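To make the idea of recovering vCPU state from a host kernel trace concrete, here is a minimal Python sketch. It is not the ADA or NSD algorithm from the paper. It assumes a simplified, hypothetical event record `(timestamp_ns, event_name, vcpu_tid)` in which `kvm_entry` and `kvm_exit` stand for the Linux KVM tracepoints of the same names, and `sched_out` is an illustrative placeholder for the vCPU thread being scheduled out on the host.

```python
from collections import defaultdict

# Assumed mapping from (simplified) host events to a vCPU state.
# "sched_out" is a hypothetical label, not a real tracepoint name.
STATE_OF_EVENT = {
    "kvm_entry": "guest",        # vCPU starts running guest code
    "kvm_exit": "hypervisor",    # control returns to the host hypervisor
    "sched_out": "descheduled",  # vCPU thread leaves the physical CPU
}

def vcpu_intervals(events):
    """Turn time-ordered events into per-vCPU (start, end, state) intervals."""
    current = {}                      # vcpu_tid -> (state, since)
    intervals = defaultdict(list)
    for ts, name, tid in events:
        new_state = STATE_OF_EVENT.get(name)
        if new_state is None:
            continue                  # ignore events this sketch does not model
        if tid in current:
            old_state, since = current[tid]
            intervals[tid].append((since, ts, old_state))
        current[tid] = (new_state, ts)
    return dict(intervals)

def time_in_state(intervals):
    """Sum the duration spent in each state."""
    totals = defaultdict(int)
    for start, end, state in intervals:
        totals[state] += end - start
    return dict(totals)

# Hypothetical trace for one vCPU thread (tid 4242), timestamps in ns.
trace = [
    (100, "kvm_entry", 4242),
    (160, "kvm_exit", 4242),
    (175, "kvm_entry", 4242),
    (300, "kvm_exit", 4242),
    (310, "sched_out", 4242),
    (400, "kvm_entry", 4242),
]
totals = time_in_state(vcpu_intervals(trace)[4242])
print(totals)
```

A real analysis would also need to handle nested levels and distinguish why a vCPU was descheduled; the sketch only shows how state intervals fall out of consecutive host-side events.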


Cloud computing is a fast-growing technology that provides on-demand access to a pool of shared resources. This type of distributed and complex environment requires advanced resource management solutions that can model virtual machine (VM) behavior. Different workload measurements, such as CPU, memory, disk, and network usage, are usually derived from each VM to model resource utilization and group similar VMs. However, these coarse workload metrics require internal access to each VM with the available performance analysis toolkit, which is not feasible under the privacy policies of many cloud environments. In this article, we propose a non-intrusive host-based virtual machine workload characterization using hypervisor tracing. VM blocking durations, along with virtual interrupt injection rates, are derived as features to reveal multiple levels of resource intensiveness. In addition, the VM exit reason is considered, as well as the resource contention rate due to the host and other VMs. Moreover, the preemption rates of processes and threads in each VM are extracted from the collected tracing logs. Our proposed approach further improves the selected features by exploiting a page-ranking-based algorithm to filter out unimportant processes running on each VM. Once the metric features are defined, a two-stage VM clustering technique is employed to perform both coarse- and fine-grained workload characterization. The inter-cluster and intra-cluster similarity metrics of the silhouette score are used to reveal distinct VM workload groups, as well as the ones with significant overlap. The proposed framework can provide a detailed view of the underlying behavior of the running VMs. This can assist infrastructure administrators in efficient resource management, as well as in root cause analysis.
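The silhouette score mentioned above combines an intra-cluster distance a and a nearest-other-cluster distance b for each point as s = (b - a) / max(a, b). The following pure-Python sketch shows that computation under stated assumptions: the two-dimensional feature vectors are hypothetical stand-ins for per-VM features, Euclidean distance is assumed, and at least two clusters must be present. It illustrates the metric only, not the article's two-stage clustering.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient; requires at least two distinct labels."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)        # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so it does not affect the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical per-VM feature vectors (values are illustrative only).
vm_features = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = silhouette(vm_features, [0, 0, 1, 1])  # well-separated groups
bad_score = silhouette(vm_features, [0, 1, 0, 1])   # groups that overlap
print(good_score, bad_score)
```

A score close to 1 indicates distinct workload groups, while a score near or below 0 indicates significant overlap between groups.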
Performance analysis of Java applications requires a deep understanding of the Java virtual machine and the system on which it is running. An unexpected latency can be caused by a bug in the source code, a misconfiguration, or an external factor such as CPU or disk contention. Existing tools have difficulty finding the root cause of some latencies because they do not efficiently collect performance data from the different layers of the system. In this paper, we propose a multilevel analysis framework that uses kernel and userspace tracing to help developers understand and evaluate the performance of their applications. Kernel tracing is used to gather information about thread scheduling, system calls, I/O operations, etc., and userspace tracing is used to monitor the internal components of the JVM, such as the garbage collectors and the JIT compilers. By bridging the gap between kernel and userspace traces, our tool provides full visibility to developers and helps them diagnose difficult performance issues. We show the usefulness of our approach by using it to detect problems in different Java applications.
The dynamic nature of applications in Virtual Machines (VMs) and the increasing demand for virtualized systems make the analysis of dynamic environments critical to achieving efficient operation of such complex distributed systems. In this paper, we propose a precise host-based tracing and analysis method to retrieve execution flows and dependency flows from virtualized environments, regardless of the level of nested virtualization. Given a host operating system level trace, the Any-Level vCPU Detection (ASD) algorithm and the Guest Thread-state Analysis (GTA) algorithm detect the different states of vCPUs and threads for arbitrary nesting depths. Then, the Execution-graph Construction (HEC) algorithm extracts the waiting/wake-up dependency chains of the running processes across VMs, for any level of virtualization, in a transparent manner. The process dependency graph, vCPU state, and VM process state are displayed in an interactive trace viewer, Trace Compass, for further inspection. Our proposed VM trace analysis algorithms have been open-sourced for further enhancements and collaborative research and development. Our new techniques were evaluated with workloads generated using several well-known server applications (e.g., Hadoop, Apache, MySQL, Linux apt-get, and IMS network). The proposed approaches are based on host hypervisor tracing, which brings a lower tracing overhead (around 1%), is easier to deploy, and presents fewer security issues than other approaches.
This paper studies the preemption between programs running in different virtual machines on the same computer. One of the current monitoring methods consists of updating the average steal time through collaboration with the hypervisor. However, the average is insufficient to diagnose abnormal latencies in time-sensitive applications. Moreover, the added latency is not directly visible from the virtual machine's point of view. The main challenge is to recover the cause of preemption of a task running in a virtual machine, whether it is a task on the host computer or in another virtual machine. We propose a new method to study thread preemption across virtual machine boundaries using kernel tracing. The host computer and each monitored virtual machine are traced simultaneously. We developed an efficient and portable trace synchronization method, which is required to account for the time offset and drift that occur within each virtual machine. We then devised an algorithm to recover the root cause of preemption between threads at every level. The algorithm successfully detected interactions between multiple competing threads in distinct virtual machines on a multi-core machine.
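One common way to account for both offset and drift between two clocks is a linear model, host_ts = rate * guest_ts + offset, fitted on events observed in both traces. The Python sketch below uses ordinary least squares for that fit. It is only an assumption-laden illustration: the abstract does not describe the paper's synchronization method in this detail, and the matched timestamp pairs are hypothetical.

```python
def fit_clock(pairs):
    """Least-squares fit of host_ts = rate * guest_ts + offset.

    pairs: list of (guest_ts, host_ts) for events seen in both traces.
    Needs at least two pairs with distinct guest timestamps.
    """
    n = len(pairs)
    mean_g = sum(g for g, _ in pairs) / n
    mean_h = sum(h for _, h in pairs) / n
    cov = sum((g - mean_g) * (h - mean_h) for g, h in pairs)
    var = sum((g - mean_g) ** 2 for g, _ in pairs)
    rate = cov / var                    # drift: host ticks per guest tick
    offset = mean_h - rate * mean_g     # constant shift between the clocks
    return rate, offset

def to_host_time(guest_ts, rate, offset):
    """Map a guest timestamp onto the host time axis."""
    return rate * guest_ts + offset

# Hypothetical matched events: the guest clock starts 1000 units behind
# the host and the gap grows by 1 unit for every 1000 guest units.
matched = [(0, 1000), (1000, 2001), (2000, 3002)]
rate, offset = fit_clock(matched)
print(rate, offset, to_host_time(4000, rate, offset))
```

Once every guest timestamp is mapped onto the host time axis, events from the host and from each virtual machine can be ordered on a single timeline, which is a prerequisite for attributing a preemption to its cause.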
Performance analysis and troubleshooting of cloud applications are challenging. In particular, identifying the root causes of performance problems is quite difficult. This is because profiling tools based on processor performance counters do not yet work well for an entire virtualized environment, which is the underlying infrastructure in cloud computing. In this work, we explore an approach for unified performance profiling of an entire virtual environment by sampling only at the virtual machine monitor (VMM) level and applying common-time-based analysis across the entire virtual environment from a VMM to all guests on a host machine. Our approach involves three steps: centralized data sampling at VMM-level, generation of symbol map for running programs in guests, and unified analysis of the entire virtualized environment with common time by the host-time-axis. We also describe the design of unified profiling for an entire virtual machine (VM) environment, and we actually implement a unified VM profiler based on hardware performance counters. Finally, our results demonstrate accurate profiling. In addition, we achieved a lower overhead than in a previous study as a result of having no additional context switches by the virtual interrupt injection into the guest during measurement.
Infrastructure-as-a-service clouds are becoming widely adopted. However, resource sharing and multi-tenancy have made performance anomalies a top concern for users. Timely debugging of those anomalies is paramount for minimizing the performance penalty for users. Unfortunately, this debugging often takes a long time due to the inherent complexity and sharing nature of cloud infrastructures. When an application experiences a performance anomaly, it is important to distinguish between faults with a global impact and faults with a local impact, as the diagnosis and recovery steps for the two kinds of faults are quite different. In this paper, we present PerfCompass, an online performance anomaly fault debugging tool that can quantify whether a production-run performance anomaly has a global impact or local impact. PerfCompass can use this information to suggest the root cause as either an external fault (e.g., environment-based) or an internal fault (e.g., software bugs). Furthermore, PerfCompass can identify top affected system calls to provide useful diagnostic hints for detailed performance debugging. PerfCompass does not require source code or runtime application instrumentation, which makes it practical for production systems. We have tested PerfCompass by running five common open source systems (e.g., Apache, MySQL, Tomcat, Hadoop, Cassandra) inside a virtualized cloud testbed. Our experiments use a range of common infrastructure sharing issues and real software bugs. The results show that PerfCompass accurately classifies 23 out of the 24 tested cases without calibration and achieves 100 percent accuracy with calibration. PerfCompass provides useful diagnosis hints within several minutes and imposes negligible runtime overhead to the production system during normal execution time.
Conference Paper
This paper provides a detailed explanation of the capabilities, challenges, and future direction of Linux perf based virtualization performance monitoring in a KVM cloud environment. It also explores various ways of utilizing the collected performance data to detect, analyse, and correct a guest virtual machine's performance problems.
Conference Paper
The realization of cloud computing has been possible due to the availability of virtualization technologies on commodity platforms. Measuring resource usage on virtualized servers is difficult because the performance counters used for resource accounting are not virtualized. Hence, many of the prevalent virtualization technologies, such as Xen, VMware, and KVM, use host-specific CPU usage monitoring, which is coarse-grained. In this paper, we present a performance monitoring tool for KVM-based virtualized machines, which measures the CPU overhead incurred by the hypervisor on behalf of the virtual machine along with the CPU usage of the virtual machine itself. The fine-grained resource usage information provided by this tool can be used in diverse situations, such as resource provisioning to support performance-related QoS requirements, identification of bottlenecks during VM placement, and resource profiling of applications in cloud environments. We demonstrate a use case of this tool by measuring the performance of web servers hosted on a KVM-based virtualized server.
Efficient tracing of system-wide execution, allowing integrated analysis of both kernel space and user space, is difficult to achieve. This article presents a new tracer core, Linux Trace Toolkit Next Generation (LTTng), which has taken over from the previous version known as "LTT". It has the same goals of low system disturbance and architecture independence while being fully reentrant, scalable, precise, extensible, modular, and easy to use. For instance, LTTng allows tracepoints in NMI code, multiple simultaneous traces, and a flight recorder mode. LTTng reuses and enhances the existing LTT instrumentation and RelayFS. This paper will focus on the approaches taken by LTTng to fulfill these goals. It will present the modular architecture of the project. It will then explain how NMI reentrancy requires atomic operations for writing and RCU lists for tracing behavior control. It will show how these techniques are inherently scalable to multiprocessor systems. Then, time precision limitations in the kernel will be discussed, followed by an explanation of the motives behind LTTng's own monotonic timestamps.
Conference Paper
Multi-tenant cloud, which usually leases resources in the form of virtual machines, has been commercially available for years. Unfortunately, with the adoption of commodity virtualized infrastructures, software stacks in typical multi-tenant clouds are non-trivially large and complex, and thus are prone to compromise or abuse by adversaries, including the cloud operators, which may lead to leakage of security-sensitive data. In this paper, we propose a transparent, backward-compatible approach that protects the privacy and integrity of customers' virtual machines on commodity virtualized infrastructures, even in the face of a total compromise of the virtual machine monitor (VMM) and the management VM. The key to our approach is the separation of resource management from security protection in the virtualization layer. A tiny security monitor is introduced underneath the commodity VMM using nested virtualization and provides protection to the hosted VMs. As a result, our approach allows virtualization software (e.g., VMM, management VM and tools) to handle the complex tasks of managing leased VMs for the cloud, without breaking the security of users' data inside the VMs. We have implemented a prototype by leveraging commercially available hardware support for virtualization. The prototype system, called CloudVisor, comprises only 5.5K LOCs and supports the Xen VMM with multiple Linux and Windows guest OSes. Performance evaluation shows that CloudVisor incurs a moderate slowdown for I/O-intensive applications and a very small slowdown for other applications.
Deepdive: Transparently identifying and managing performance interference in virtualized environments
  • D Novaković
  • N Vasić
  • S Novaković
  • D Kostić
  • R Bianchini
Performance profiling of virtual machines
  • J Du
  • N Sehrawat
  • W Zwaenepoel