Karsten Schwan
Georgia Institute of Technology | GT · College of Computing

About

585 Publications
47,594 Reads
12,820 Citations

Publications (585)
Article
Heterogeneous memory management combined with server virtualization in datacenters is expected to increase the software and OS management complexity. State-of-the-art solutions rely exclusively on the hypervisor (VMM) for expensive page hotness tracking and migrations, limiting the benefits from heterogeneity. To address this, we design HeteroOS, a...
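A minimal sketch of the page-hotness idea this abstract describes, assuming a simple access-count model over two memory tiers; the threshold, tier names, and decay policy are illustrative assumptions, not taken from HeteroOS:

# Illustrative sketch: classify pages as hot or cold by access count,
# keep hot pages in the fast tier and cold pages in the slow tier.
HOT_THRESHOLD = 8   # assumed cutoff, purely for illustration

class TieredMemory:
    def __init__(self):
        self.fast_tier = set()    # e.g., local DRAM
        self.slow_tier = set()    # e.g., NVM or far memory
        self.access_count = {}

    def touch(self, page):
        # Record one tracked access to a page.
        self.access_count[page] = self.access_count.get(page, 0) + 1

    def rebalance(self):
        # Periodically migrate pages between tiers based on hotness,
        # then halve counts so hotness reflects recent behavior.
        for page, count in self.access_count.items():
            if count >= HOT_THRESHOLD:
                self.slow_tier.discard(page)
                self.fast_tier.add(page)
            else:
                self.fast_tier.discard(page)
                self.slow_tier.add(page)
        self.access_count = {p: c // 2 for p, c in self.access_count.items()}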
Article
Full-text available
Integrated GPU systems are a cost-effective and energy-efficient option for accelerating data-intensive applications. While these platforms offer reduced overhead for offloading computation to the GPU and the potential for fine-grained resource scheduling, there remain several open challenges: (1) the distinct execution models inherent in the heterogeneo...
Conference Paper
This paper argues for the utility of back-end driven onloading to the edge as a way to address bandwidth use and latency challenges for future device-cloud interactions. Supporting such edge functions (EFs) requires solutions that can provide (i) fast and scalable EF provisioning and (ii) strong guarantees for the integrity of the EF execution and...
Conference Paper
Next generation byte addressable nonvolatile memories (NVMs) such as PCM, Memristor, and 3D X-Point are attractive solutions for mobile and other end-user devices, as they offer memory scalability as well as fast persistent storage. However, NVM's limitations of slow writes and high write energy are magnified for applications that require atomic, c...
Conference Paper
Full-text available
The massive explosion in social networks has led to a significant growth in graph analytics and specifically in dynamic, time-varying graphs. Most prior work processes dynamic graphs by first storing the updates and then repeatedly running static graph analytics on saved snapshots. To handle the extreme scale and fast evolution of real-world graphs...
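As a toy illustration of the incremental approach this abstract argues for, the sketch below maintains vertex degrees over a stream of edge insertions and deletions rather than recomputing from saved snapshots (the event format is an assumption):

from collections import defaultdict

# Toy incremental analytic: maintain vertex degrees over an edge
# stream instead of recomputing from a stored snapshot per batch.
degree = defaultdict(int)

def apply_update(event, u, v):
    delta = 1 if event == "add" else -1
    degree[u] += delta
    degree[v] += delta

for event, u, v in [("add", 1, 2), ("add", 2, 3), ("del", 1, 2)]:
    apply_update(event, u, v)

print(dict(degree))  # {1: 0, 2: 1, 3: 1}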
Conference Paper
Integrated GPU platforms are a cost-effective and energy-efficient option for accelerating data-intensive applications. While these platforms offer reduced overhead for offloading computation to the GPU and the potential for fine-grained resource scheduling, there remain several open challenges. First, substantial application knowledge is required to lev...
Conference Paper
Full-text available
Next-generation byte-addressable nonvolatile memories (NVMs), such as phase change memory (PCM) and Memristors, promise fast data storage, and more importantly, address DRAM scalability issues. State-of-the-art OS mechanisms for NVMs have focused on improving the block-based virtual file system (VFS) to manage both persistence and the memory capaci...
Article
Memory-centric computing demands careful organization of the virtual address space, but traditional methods for doing so are inflexible and inefficient. If an application wishes to address larger physical memory than virtual address bits allow, if it wishes to maintain pointer-based data structures beyond process lifetimes, or if it wishes to share...
Conference Paper
Memory-centric computing demands careful organization of the virtual address space, but traditional methods for doing so are inflexible and inefficient. If an application wishes to address larger physical memory than virtual address bits allow, if it wishes to maintain pointer-based data structures beyond process lifetimes, or if it wishes to share...
Article
Recent integrated CPU-GPU processors like Intel's Broadwell and AMD's Kaveri support hardware CPU-GPU shared virtual memory, atomic operations, and memory coherency. This enables fine-grained CPU-GPU work-stealing, but architectural differences between the CPU and GPU hurt the performance of traditionally-implemented work-stealing on such processor...
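For context, the generic work-stealing discipline this paper builds on can be sketched as follows; this is the textbook owner/thief deque pattern, not the paper's CPU-GPU implementation:

import collections, threading

class WorkStealingDeque:
    # The owner pushes and pops at one end; thieves steal from the
    # other end, so the owner keeps cache-hot tasks and contention
    # between owner and thieves stays low.
    def __init__(self):
        self._tasks = collections.deque()
        self._lock = threading.Lock()

    def push(self, task):      # owner end
        with self._lock:
            self._tasks.append(task)

    def pop(self):             # owner end (LIFO, cache-friendly)
        with self._lock:
            return self._tasks.pop() if self._tasks else None

    def steal(self):           # thief end (FIFO, takes oldest task)
        with self._lock:
            return self._tasks.popleft() if self._tasks else None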
Conference Paper
Recent integrated CPU-GPU processors like Intel's Broadwell and AMD's Kaveri support hardware CPU-GPU shared virtual memory, atomic operations, and memory coherency. This enables fine-grained CPU-GPU work-stealing, but architectural differences between the CPU and GPU hurt the performance of traditionally-implemented work-stealing on such processor...
Conference Paper
Task-based runtimes like OCR, X10 and Charm++ promise to address scalability challenges on Exascale machines due to their finegrained parallelism, inherent asynchrony, and consequent efficient localized synchronization. Although such runtimes are typically used to run a single application at a time, a common HPC scenario involves running a producer...
Conference Paper
Online "big data" processing applications have seen increasing importance in the high performance computing domain, including online analytics of large volumes of data output by various scientific applications. This work contributes to answering the question of how to promote efficient collaborative science in face of unpredictable analytics worklo...
Conference Paper
Recent work on real-world graph analytics has sought to leverage the massive amount of parallelism offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithms and limitations in GPU-resident memory for storing large graphs. We present GraphReduce, a highly efficient and scalable GPU-based framework that operat...
Conference Paper
Full-text available
With total app installs touching 100 billion in 2015, the increasing number of active devices that support apps is poised to result in 200 billion downloads by 2017. Datacenter-based app stores offering users convenient app access, however, cause congestion in the last mile of the Internet, despite use of content delivery networks (CDNs) or ISP...
Conference Paper
Full-text available
The number of apps downloaded for smart devices has surpassed 80 billion, with trends suggesting continued substantial increases. The resulting volume of app installs and updates puts pressure on the existing app-delivery infrastructure due to interactions of end user devices with app stores via the Internet that involve app stores’ data center, con...
Conference Paper
Over the last decade, Hadoop has evolved into a widely used platform for Big Data applications. Acknowledging its wide-spread use, we present a comprehensive analysis of the solved issues with applied patches in the Hadoop ecosystem. The analysis is conducted with a focus on Hadoop's two essential components: HDFS (storage) and MapReduce (computati...
Article
Next generation byte addressable nonvolatile memory (NVM) technologies like PCM are attractive for end-user devices as they offer memory scalability as well as fast persistent storage. In such environments, NVM's limitations of slow writes and high write energy are magnified for applications that need atomic, consistent, isolated and durable (ACID)...
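A standard route to atomic, durable updates on persistent memory (or any storage) is write-ahead logging: persist the intent first, then apply it. A sketch with ordinary files standing in for NVM; the file names and record format are illustrative assumptions:

import json, os

def atomic_update(key, value, store_path="store.json", log_path="store.log"):
    # 1. Append and flush the log record before touching the store.
    with open(log_path, "a") as log:
        log.write(json.dumps({"key": key, "value": value}) + "\n")
        log.flush()
        os.fsync(log.fileno())
    # 2. Apply the update to the store and flush it.
    store = {}
    if os.path.exists(store_path):
        with open(store_path) as f:
            store = json.load(f)
    store[key] = value
    with open(store_path, "w") as f:
        json.dump(store, f)
        f.flush()
        os.fsync(f.fileno())
    # 3. Clear the log; if a crash hits before step 2 finishes,
    #    recovery would replay the log to restore consistency.
    open(log_path, "w").close()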
Conference Paper
As scientific simulation applications evolve on the path towards exascale, a new model of scientific inquiry is required in which online analytics operate, concurrently with the running simulation, on the data it produces. By avoiding offline data storage except when absolutely necessary, this enables speeding up the scientific discovery process by prov...
Article
This paper presents HeteroVisor, a heterogeneity-aware hypervisor that exploits resource heterogeneity to enhance the elasticity of cloud systems. Introducing the notion of 'elasticity' (E) states, HeteroVisor permits applications to manage their changes in resource requirements as state transitions that implicitly move their execution among heter...
Article
Applications can map data on SSDs into virtual memory to transparently scale beyond DRAM capacity, permitting them to leverage high SSD capacities with few code changes. Obtaining good performance for memory-mapped SSD content, however, is hard because the virtual memory layer, the file system and the flash translation layer (FTL) perform address t...
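The memory-mapping pattern this abstract refers to is a one-call idiom; a self-contained sketch (the file name is illustrative):

import mmap

# Create a small SSD-backed file, then map it into the address space
# so its bytes can be read and written like ordinary memory.
with open("data.bin", "wb") as f:
    f.write(b"\x00" * 4096)

with open("data.bin", "r+b") as f, mmap.mmap(f.fileno(), 0) as mem:
    mem[0] = 0x42     # looks like a memory store; the OS writes the
    value = mem[0]    # dirty page back to the file lazily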
Article
Full-text available
In creating the CS 2022 Report, we were searching for a meta-innovation that would tie all these technology areas together. We found a unifying theme in seamless intelligence, where everything is connected through ubiquitous networks, interfaces, and so on. While similar to previous pervasive and ubiquitous computing scenarios, seamless intelligenc...
Conference Paper
Full-text available
Emerging byte-addressable, non-volatile memory technologies (NVRAM) like phase-change memory can increase the capacity of future memory systems by orders of magnitude. Compared to systems that rely on disk storage, NVRAM-based systems promise significant improvements in performance for key applications like online transaction processing (OLTP)....
Article
Workload consolidation, whether via use of virtualization or with lightweight, container-based methods, is critically important for current and future datacenter and cloud computing systems. Yet such consolidation challenges the ability of current systems to meet application resource needs and isolate their resource shares, particularly for high co...
Article
The MapReduce programming model, in which the data nodes perform both the data storing and the computation, was introduced for big-data processing. Thus, we need to understand the different resource requirements of data storing and computation tasks and schedule these efficiently over multi-core processors. In particular, the provision of high-perf...
Conference Paper
Full-text available
High-end computing systems are becoming increasingly heterogeneous, with nodes comprised of multiple CPUs and accelerators, like GPGPUs, and with potential additional heterogeneity in memory configurations and network connectivities. Further, as we move to exascale systems, the view of their future use is one in which simulations co-run with onli...
Article
Applications running on leadership platforms are more and more bottlenecked by storage input/output (I/O). In an effort to combat the increasing disparity between I/O throughput and compute capability, we created Adaptable IO System (ADIOS) in 2005. Focusing on putting users first with a service oriented architecture, we combined cutting edge resea...
Conference Paper
Collaborative science demands global sharing of scientific data. But it cannot leverage universally accessible cloud-based infrastructures like Dropbox, as those offer limited interfaces and inadequate levels of access bandwidth. We present the Scibox cloud facility for online sharing of scientific data. It uses standard cloud storage solutions, but...
Conference Paper
As high-end systems move toward exascale sizes, a new model of scientific inquiry being developed is one in which online data analytics run concurrently with the high end simulations producing data outputs. Goals are to gain rapid insights into the ongoing scientific processes, assess their scientific validity, and/or initiate corrective or supplem...
Conference Paper
Full-text available
With an ever increasing number of networked devices used in mobile settings, or residing in homes, offices, and elsewhere, there is a plethora of potential computational infrastructure available for providing end users with new functionality and improved experiences for their interactions with the cyber physical world. The goal of our research is t...
Conference Paper
End user experiences on mobile devices with their rich sets of sensors are constrained by limited device battery lives and restricted form factors, as well as by the 'scope' of the data available locally. The 'Personal Cloud' distributed software abstractions address these issues by enhancing the capabilities of a mobile device via seamless use of...
Conference Paper
Dynamic instrumentation of GPGPU binaries makes possible real-time introspection methods for performance debugging, correctness checks, workload characterization, and runtime optimization. Such instrumentation involves inserting code at the instruction level of an application while the application is running, and is thereby able to accurately profile dat...
Conference Paper
In recent years, streaming-based data processing has been gaining substantial traction for dealing with overwhelming data generated by real-time applications, from both enterprise sources and scientific computing. In this work, however, we look at an emerging class of scientific data with Near Real-Time (NRT) requirement, in which data is typically...
Conference Paper
This paper explores the performance implications of using future byte addressable non-volatile memory (NVM) like PCM in end client devices. We explore how to obtain dual benefits - increased capacity and faster persistence - with low overhead and cost. Specifically, while increasing memory capacity can be gained by treating NVM as virtual memory, i...
Article
Full-text available
Cloud computing has become a very hot topic over the past few years. One of the main requirements in cloud computing environments is a high degree of automation for provisioning and dynamic management of IT resources (compute, storage and network resources) ...
Conference Paper
The MapReduce programming model was introduced for big-data processing, where the data nodes perform both data storing and computation. Thus, we need to understand the different resource requirements of data storing and computation tasks and schedule these efficiently over multi-core processors. Core affinity defines the mapping between a set of cores a...
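Core affinity in this sense can be expressed directly on Linux; the sketch below pins forked workers to disjoint core sets (Linux-only, and the core numbers and the split between I/O and compute cores are arbitrary choices for illustration):

import os

IO_CORES = {0, 1}         # assumed cores for data-storing tasks
COMPUTE_CORES = {2, 3}    # assumed cores for computation tasks

def run_worker(cores, work):
    pid = os.fork()
    if pid == 0:                        # child process
        os.sched_setaffinity(0, cores)  # restrict worker to 'cores'
        work()
        os._exit(0)
    return pid

pids = [run_worker(IO_CORES, lambda: None),
        run_worker(COMPUTE_CORES, lambda: None)]
for pid in pids:
    os.waitpid(pid, 0)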
Conference Paper
Severe I/O bottlenecks on High End Computing platforms call for running data analytics in situ. Demonstrating that there exist considerable resources in compute nodes un-used by typical high end scientific simulations, we leverage this fact by creating an agile runtime, termed GoldRush, that can harvest those otherwise wasted, idle resources to eff...
Conference Paper
The growth in browser-based computations is raising the need for efficient local storage for browser-based applications. A standard approach to control how such applications access and manipulate the underlying platform resources is to run in-browser applications in a sandbox environment. Sandboxing works by static code analysis and system call in...
Conference Paper
Multicore platforms are moving from small numbers of homogeneous cores to 'scale out' designs with multiple tiles or 'islands' of cores residing on a single chip, each with different resources and potentially controlled by their own resource managers. Applications running on such machines, however, operate across multiple such resource islands, and...
Conference Paper
FastMR is a graph-style framework for stream-oriented applications to realize near real-time streaming data record processing and, more importantly, complex coordination between those applications. We introduce two components --- compressed buffer trees (CBTs) and shared reducer trees (SRTs) --- to assist with this task. CBTs address the problem of ma...
Conference Paper
The rapid growth of fast analytics systems, that require data processing in memory, makes memory capacity an increasingly-precious resource. This paper introduces a new compressed data structure called a Compressed Buffer Tree (CBT). Using a combination of techniques including buffering, compression, and serialization, CBTs improve the memory effic...
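The buffering-plus-compression idea can be shown compactly: collect updates in a small in-memory buffer and compress each serialized batch when the buffer fills. This is a one-node caricature of the CBT, not its actual tree-structured design:

import json, zlib

class CompressedBuffer:
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.buffer = []     # recent updates, uncompressed
        self.segments = []   # older batches, compressed blobs

    def insert(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) >= self.capacity:
            # Serialize and compress the full batch to save memory.
            blob = zlib.compress(json.dumps(self.buffer).encode())
            self.segments.append(blob)
            self.buffer = []

    def aggregate(self):
        # Decompress all batches and sum values per key.
        totals = {}
        items = list(self.buffer)
        for blob in self.segments:
            items.extend(json.loads(zlib.decompress(blob)))
        for key, value in items:
            totals[key] = totals.get(key, 0) + value
        return totals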
Article
Mobile devices and applications exhibit highly diverse behavior in their usage and power/performance requirements. In order to accommodate such diversity, this paper presents the ‘HeteroMates’ system, which uses heterogeneous processors to extend the dynamic power/performance range of client devices, i.e., offer both high performance and reduced power co...
Conference Paper
Distributed key-value stores employ large main memory caches to mitigate the high costs of disk access. A challenge for such caches is that large scale distributed stores simultaneously face multiple workloads, often with drastically different characteristics. Interference between such competing workloads leads to performance degradation through in...
Conference Paper
Full-text available
The commoditization of high performance interconnects, like 40+ Gbps InfiniBand, and the emergence of low-overhead I/O virtualization solutions based on SR-IOV, is enabling the proliferation of such fabrics in virtualized datacenters and cloud computing platforms. As a result, such platforms are better equipped to execute workloads with diverse I/O...
Conference Paper
The remote visual exploration of live data generated by scientific simulations is useful for scientific discovery, performance monitoring, and online validation for the simulation results. Online visualization methods are challenged, however, by the continued growth in the volume of simulation output data that has to be transferred from its source...
Conference Paper
Accelerated and in-core implementations of Big Data applications typically require large amounts of host and accelerator memory as well as efficient mechanisms for transferring data to and from accelerators in heterogeneous clusters. Scheduling for heterogeneous CPU and GPU clusters has been investigated in depth in the high-performance computing (...
Conference Paper
Data Stream Processing is an important class of data intensive applications in the "Big Data" era. Chip Multi-Processors (CMPs) are the standard hosting platforms in modern data centers. Gaining high performance for stream processing applications on CMPs is therefore of great interest. Since the performance of stream processing applications largely...
Conference Paper
While GPUs have become prominent both in high performance computing and in online or cloud services, they still appear as explicitly selected 'devices' rather than as first class schedulable entities that can be efficiently shared by diverse server applications. To combat the consequent likely under-utilization of GPUs when used in modern server or...
Conference Paper
This paper presents a software-controlled technique for managing the heterogeneous memory resources of next generation multicore platforms with fast 3D die-stacked memory and additional slow off-chip memory. Implemented for virtualized server systems, the technique detects the 'hot' pages critical to program performance in order to then maintain th...
Article
We present CCM (Cloud Capacity Manager) - a prototype system and its methods for dynamically multiplexing the compute capacity of virtualized datacenters at scales of thousands of machines, for diverse workloads with variable demands. Extending prior studies primarily concerned with accurate capacity allocation and ensuring acceptable application p...
Conference Paper
Full-text available
On-chip heterogeneity has become key to balancing performance and power constraints, resulting in disparate (functionally overlapping but not equivalent) cores on a single die. Requiring developers to deal with such heterogeneity can impede adoption through increased programming effort and result in cross-platform incompatibility. We propose that s...
Conference Paper
To deal with the inordinate output data volumes of current and future high end simulations, researchers are turning to online methods in which multiple software components that implement desired data analytics and visualization are run on 'staging resources' of the petascale machine, concurrently and coupled with the simulations producing these out...
Conference Paper
Increasingly severe I/O bottlenecks on High-End Computing machines are prompting scientists to process simulation output data online while simulations are running and before storing data on disk. There are several options to place data analytics along the I/O path: on compute nodes, on separate nodes dedicated to analytics, or after data is stored...
Conference Paper
Heterogeneous processors, consisting of a mix of high-performance 'brawny' processors and low-power 'wimpy' processors, have been proposed to achieve higher energy-efficiency by making it possible for different applications within a diverse mix of workloads to be run on the most appropriate cores. This paper performs a comparative analysis of such...
Conference Paper
Exponentially increasing transistor density with each processor generation, along with constant chip-level power budgets and a slower rate of improvement in transistor power dissipation, exponentially decreases the percentage of transistors that can switch on simultaneously. Such 'over-provisioned multicore' systems require active power management...
Conference Paper
Lack of I/O scalability is known to cause measurable slowdowns for large-scale scientific applications running on high end machines. This is prompting researchers to devise 'I/O staging' methods in which outputs are processed via online analysis and visualization methods to support desired science outcomes. Organized as online workflows and carried...
Conference Paper
Full-text available
Rapid checkpointing will remain key functionality for next generation high end machines. This paper explores the use of node-local nonvolatile memories (NVM) such as phase-change memory, to provide frequent, low overhead checkpoints. By adapting existing multi-level checkpoint techniques, we devise new methods, termed NVM-checkpoints, that efficien...
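Multi-level checkpointing, which this paper adapts to node-local NVM, follows a simple pattern: frequent cheap local checkpoints plus occasional expensive global ones. A schematic sketch, with directories standing in for NVM and the parallel file system:

import os, pickle

LOCAL_DIR, GLOBAL_DIR, K = "ckpt_local", "ckpt_global", 10

def checkpoint(step, state):
    os.makedirs(LOCAL_DIR, exist_ok=True)
    with open(os.path.join(LOCAL_DIR, f"step{step}.ckpt"), "wb") as f:
        pickle.dump(state, f)          # frequent, low-overhead level
    if step % K == 0:                  # every Kth step, go global
        os.makedirs(GLOBAL_DIR, exist_ok=True)
        with open(os.path.join(GLOBAL_DIR, f"step{step}.ckpt"), "wb") as f:
            pickle.dump(state, f)      # slower, survives node loss

state = {"iteration": 0}
for step in range(1, 21):
    state["iteration"] = step
    checkpoint(step, state)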
Article
Full-text available
In order to curtail the continuous increase in power consumption of modern datacenters, researchers are responding with sophisticated energy-aware workload management methods. This increases the complexity and cost of the management operation, and may lead to increases in failure rates. The goal of this paper is to illustrate that there exists cons...
Conference Paper
Full-text available
Data-Intensive infrastructures are increasingly used for on-line processing of live data to guide operations and decision making. VScope is a flexible monitoring and analysis middleware for troubleshooting such large-scale, time-sensitive, multi-tier applications. With VScope, lightweight anomaly detection and interaction tracking methods can be ru...
Conference Paper
Flume [1] is a widely used open-source real-time data processing framework. We have proposed, in a Middleware'12 research track paper [2], a middleware named VScope for troubleshooting complex big data systems like Flume. This poster introduces the recent evolution from Flume Old Generation (OG) to Flume New Generation (NG) [4], and research improv...
Conference Paper
Increasingly larger scale simulations are generating an unprecedented amount of output data, causing researchers to explore new 'data staging' methods that buffer, use, and/or reduce such data online rather than simply pushing it to disk. Leveraging the capabilities of data staging, this study explores the potential for data reduction via online da...
Conference Paper
Real-time data processing frameworks like S4 and Flume have become scalable and reliable solutions for acquiring, moving, and processing voluminous amounts of data continuously produced by large numbers of online sources. Yet these frameworks lack the elasticity to horizontally scale up or scale down based on current rates of input events and...
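The missing elasticity amounts to a control loop that compares the observed input rate to per-worker capacity and resizes the worker pool; a threshold-based sketch, with made-up rates and bounds:

# Elasticity sketch: size the worker pool to the input event rate.
# PER_WORKER_RATE and the bounds are illustrative values, not taken
# from any of the frameworks named above.
PER_WORKER_RATE = 1000          # events/sec one worker can absorb
MIN_WORKERS, MAX_WORKERS = 1, 64

def target_workers(input_rate):
    needed = -(-input_rate // PER_WORKER_RATE)   # ceiling division
    return max(MIN_WORKERS, min(MAX_WORKERS, needed))

for rate in [500, 4200, 90000]:
    print(rate, "->", target_workers(rate))   # 1, 5, 64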