Kate Keahey

University of Chicago | UC · Computation Institute

Ph.D.

About

145 Publications · 19,140 Reads · 5,795 Citations

Publications (145)
Preprint
Distributed infrastructures for computation and analytics are now evolving towards an interconnected ecosystem allowing complex scientific workflows to be executed across hybrid systems spanning from IoT Edge devices to Clouds, and sometimes to supercomputers (the Computing Continuum). Understanding the performance trade-offs of large-scale workflo...
Poster
Full-text available
With the help of proper preemption policies and proactive resource schedulers, combining academic cloud and High Throughput Computing (HTC) systems through preemptible instances would help increase the utilization rate of the clouds and reduce energy costs at the same time. We propose a data-driven simulator, CHISim, with which we then evaluate a...
Article
Clouds are shareable scientific instruments that create the potential for reproducibility by ensuring that all investigators have access to a common execution platform on which computational experiments can be repeated and compared. By virtue of the interface they present, they also lead to the creation of digital artifacts compatible with the clou...
Conference Paper
Isolation is a desirable property for applications executing in multi-tenant computing systems. On the performance side, hardware resource isolation via partitioning mechanisms is commonly applied to achieve QoS, a necessary property for many noise-sensitive parallel workloads. Conversely, on the software side, partitioning is used, usually in the...
Conference Paper
Chameleon is a large-scale, deeply reconfigurable testbed built to support Computer Science experimentation. Unlike traditional systems of this kind, Chameleon has been configured using an adaptation of a mainstream open source infrastructure cloud system called OpenStack. In this paper, we discuss operational challenges for experimental testbeds a...
Preprint
The third Global Experimentation for Future Internet (GEFI 2018) workshop was held October 25-26, 2018 in Tokyo, Japan, hosted by the University of Tokyo. A total of forty-four participants attended, representing Belgium, Brazil, China, Denmark, France, Ireland, Japan, the Republic of Korea, and the United States. The workshop employed a mixed form...
Chapter
Storage elasticity on the cloud is a crucial feature in the age of data-intensive computing, especially when considering fluctuations of I/O throughput. In this chapter, the authors explore how to transparently boost the I/O bandwidth during peak utilization to deliver high performance without over-provisioning storage resources. The proposal relie...
Conference Paper
The last several years have seen an unprecedented growth in data availability, with dynamic data streams from sources ranging from social networks to small, inexpensive sensing devices. This new data availability creates an opportunity, especially in geospatial data science where this new, dynamic, data allows novel insight into phenomena ranging f...
Article
In this paper we address the problem of network contention between the migration traffic and the Virtual Machine (VM) application traffic for the live migration of co-located Virtual Machines. When VMs are migrated with pre-copy, they run at the source host during the migration. Therefore the VM applications with predominantly outbound traffic cont...
Article
The papers in this special section contribute important advances towards leveraging clouds for scientific applications. The contributions focus on a broad range of topics, including: performance modeling and optimization, data management, resource allocation and scheduling, elasticity, reconfiguration, cost prediction and optimization. Most papers...
Article
Full-text available
Storage elasticity on IaaS clouds is a crucial feature in the age of data-intensive computing, especially when considering fluctuations of I/O throughput. This paper provides a transparent solution that automatically boosts I/O bandwidth during peaks for underlying virtual disks, effectively avoiding over-provisioning without performance loss. Our...
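The adaptive boosting idea described above can be caricatured as a simple threshold controller over observed I/O throughput (a hypothetical sketch with invented names and thresholds, not the paper's actual mechanism, which operates on virtual disks):

```python
# Hypothetical threshold controller illustrating elastic I/O boosting:
# attach extra bandwidth during sustained peaks, release it afterwards.
def plan_boost(observed_mbps, baseline_mbps, boost_factor=2.0, peak_ratio=0.9):
    plan = []
    boosted = False
    for mbps in observed_mbps:
        if not boosted and mbps >= peak_ratio * baseline_mbps:
            boosted = True            # demand near the cap: boost bandwidth
        elif boosted and mbps < 0.5 * baseline_mbps:
            boosted = False           # demand subsided: release the boost
        plan.append(boost_factor if boosted else 1.0)
    return plan

print(plan_boost([50, 95, 120, 90, 30, 20], baseline_mbps=100))
# → [1.0, 2.0, 2.0, 2.0, 1.0, 1.0]
```

The point of the sketch is only that boosting during peaks avoids provisioning for the worst case all the time.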
Article
In this paper we address the problem of network contention between the migration traffic and the VM application traffic for the live migration of co-located Virtual Machines (VMs). When VMs are migrated with pre-copy, they run at the source host during the migration. Therefore the VM applications with predominantly outbound traffic contend with the...
Conference Paper
Full-text available
As small, specialized sensor devices become more ubiquitous, reliable, and cheap, increasingly more domain sciences are creating "instruments at large": dynamic, often self-organizing, groups of sensors whose outputs are capable of being aggregated and correlated to support experiments organized around specific questions. This calls for an infras...
Article
The Iowa Flood Information System (IFIS) is a web-based platform developed by the Iowa Flood Center (IFC) to provide access to flood inundation maps, real-time flood conditions, flood warnings and forecasts, flood-related data, information and interactive visualizations for communities in Iowa. The key elements of the IFIS are: (1) flood inundation...
Article
Full-text available
The Iowa Flood Information System (IFIS) is a web-based platform developed by the Iowa Flood Center (IFC) to provide access to flood inundation maps, real-time flood conditions, flood warnings and forecasts, flood-related data, information and interactive visualizations for communities in Iowa. The key elements of the IFIS are: (1) flood inundation...
Conference Paper
Full-text available
Storage elasticity on IaaS clouds is a crucial feature in the age of data-intensive computing. However, the traditional provisioning model of leveraging virtual disks of fixed capacity and performance characteristics has limited ability to match the increasingly dynamic nature of I/O application requirements. This mismatch is particularly problemat...
Conference Paper
Full-text available
Small specialized sensor devices capable of both reporting on environmental factors and interacting with the environment are becoming increasingly ubiquitous, reliable and inexpensive. This transformation has enabled domain sciences to create "instruments at large" – dynamic and often self-organizing groups of sensors whose outputs are capable of b...
Article
Spatial data analysis has become ubiquitous as geographic information systems (GIS) are widely used to support scientific investigations and decision making in many fields of science, engineering, and humanities (e.g., ecology, emergency management, environmental engineering and sciences, geosciences, and social sciences). Tremendous data and compu...
Conference Paper
Full-text available
Storage elasticity on IaaS clouds is an important feature for data-intensive workloads: storage requirements can vary greatly during application runtime, making worst-case over-provisioning a poor choice that leads to unnecessarily tied-up storage and extra costs for the user. While the ability to adapt dynamically to storage requirements is thus a...
Conference Paper
Infrastructure clouds revolutionized the way in which we approach resource procurement by providing an easy way to lease compute and storage resources on short notice, for a short amount of time, and on a pay-as-you-go basis. This new opportunity, however, introduces new performance trade-offs. Making the right choices in leveraging different types...
Article
The advantages of on-demand resource availability are making cloud computing a viable platform option for research and education that may enable new practices in science and engineering.
Conference Paper
With the proliferation of infrastructure clouds it is now possible to consider developing applications capable of leveraging multi-cloud environments. Such environments provide users a unique opportunity to tune their deployments to meet specific needs (e.g., cost, reliability, performance, etc.). Open source multi-cloud scaling tools, such as Nimb...
Article
Infrastructure clouds offer tremendous potential for scientific users, however, they face numerous challenges that must be addressed before they are widely adopted by scientific communities.
Article
Full-text available
As map-reduce emerges as a leading programming paradigm for data-intensive computing, today's frameworks which support it still have substantial shortcomings that limit its potential scalability. In this paper, we discuss several directions where there is room for such progress: they concern storage efficiency under massive data access concurrency,...
Conference Paper
The relatively recent introduction of infrastructure-as-a-service (IaaS) clouds, such as Amazon Elastic Compute Cloud (EC2), provide users with the ability to deploy custom software stacks in virtual machines (VMs) across different cloud providers. Users can leverage IaaS clouds to create elastic environments that outsource compute and storage as n...
Conference Paper
The emergence of Cloud computing has given rise to numerous attempts to study the portability of scientific applications to this new paradigm. Tightly-coupled applications are a common class of scientific HPC applications, which exhibit specific requirements previously addressed by supercomputers. A key challenge towards the adoption of the Cloud p...
Conference Paper
Infrastructure clouds created ideal conditions for users to outsource their infrastructure needs by offering on-demand, short-term access, a pay-as-you-go business model, the use of virtualization technologies which provide a safe and cost-effective way for users to manage and customize their environments, and sheer convenience, as users and instituti...
Article
Full-text available
The ability to conduct consistent, controlled, and repeatable large-scale experiments in all areas of computer science related to parallel, large-scale, or distributed computing and networking is critical to the future and development of computer science. Yet conducting such experiments is still too often a challenge for researchers, students, and...
Article
Full-text available
The Virtual Machine framework was used to assemble the STAR computing environment, validated once, deployed on over 100 8-core VMs at NERSC and Argonne National Lab, and used as a homogeneous Virtual Farm processing events acquired in real time by the STAR detector located at Brookhaven National Lab. To provide time-dependent calibration, a database sn...
Conference Paper
Resources experience dynamic load as demand fluctuates. Therefore, resource providers must estimate the appropriate amount of resources to purchase in order to meet variable user demand. With the relatively recent introduction of infrastructure-as-a-service (IaaS) clouds (e.g. Amazon EC2) resource providers may choose to outsource demand as needed....
Conference Paper
Infrastructure-as-a-service (IaaS) clouds, such as Amazon EC2, offer pay-for-use virtual resources on-demand. This allows users to outsource computation and storage when needed and create elastic computing environments that adapt to changing demand. However, existing services, such as cluster resource managers (e.g. Torque), do not include support f...
Article
Infrastructure cloud computing introduces a significant paradigm shift that has the potential to revolutionize how scientific computing is done. However, while it is actively adopted by a number of scientific communities, it is still lacking a well-developed and mature ecosystem that will allow the scientific community to better leverage the capabi...
Article
Full-text available
Infrastructure-as-a-Service (IaaS) cloud computing has revolutionized the way we think of acquiring resources by introducing a simple change: allowing users to lease computational resources from the cloud provider's datacenter for a short time by deploying virtual machines (VMs) on those resources. This new model raises new challenges in the...
Conference Paper
A key advantage of infrastructure-as-a-service (IaaS) clouds is providing users on-demand access to resources. To provide on-demand access, however, cloud providers must either significantly overprovision their infrastructure (and pay a high price for operating resources with low utilization) or reject a large proportion of user requests (in which...
Article
Full-text available
The MapReduce programming model, proposed by Google, offers a simple and efficient way to perform distributed computation over large data sets. The Apache Hadoop framework is a free and open-source implementation of MapReduce. To simplify the usage of Hadoop, Amazon Web Services provides Elastic MapReduce, a web service that enables users to s...
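The MapReduce model the abstract refers to can be illustrated with a minimal in-process word-count sketch (real Hadoop or Elastic MapReduce jobs run distributed over HDFS or S3; the function names here are illustrative, not any framework's API):

```python
from collections import defaultdict

# Minimal single-process illustration of the MapReduce model:
# a word-count job expressed as a map phase, a shuffle, and a reduce phase.

def map_phase(record):
    # Emit (word, 1) pairs, as a mapper would.
    for word in record.split():
        yield word, 1

def reduce_phase(word, counts):
    # Sum all counts for one key, as a reducer would.
    return word, sum(counts)

def run_job(records):
    shuffled = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            shuffled[key].append(value)       # shuffle: group values by key
    return dict(reduce_phase(k, v) for k, v in shuffled.items())

print(run_job(["a b a", "b c"]))  # → {'a': 2, 'b': 2, 'c': 1}
```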
Conference Paper
Full-text available
FutureGrid provides novel computing capabilities that enable reproducible experiments while simultaneously supporting dynamic provisioning. This paper describes the FutureGrid experiment management framework to create and execute large scale scientific experiments for researchers around the globe. The experiments executed are performed by the vario...
Article
Full-text available
Infrastructure-as-a-Service (IaaS) cloud computing has revolutionized the way we think of acquiring resources by introducing a simple change: allowing users to lease computational resources from the cloud provider's datacenter for a short time by deploying virtual machines (VMs) on those resources. This new model raises new challenges in the design...
Article
Full-text available
We describe the work on enabling one click deployment of Grid sites of AliEn Grid framework on the Nimbus 'science cloud' at the University of Chicago. The integration of computing resources of the cloud with the resource pool of AliEn Grid is achieved by leveraging two mechanisms: the Nimbus Context Broker developed at Argonne National Laboratory...
Article
Full-text available
By using cloud computing it is possible to provide on-demand resources for epidemic analysis using compute-intensive applications like SaTScan. Using 15 virtual machines (VMs) on the Nimbus cloud we were able to reduce the total execution time for the same ensemble run from 8896 seconds on a single machine to 842 seconds in the cloud. Using the ca...
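The runtimes reported above imply roughly a 10.6x speedup on 15 VMs, i.e. about 70% of ideal linear scaling. A quick back-of-the-envelope check, assuming both runs process the same ensemble:

```python
# Back-of-the-envelope check of the speedup implied by the reported
# SaTScan runtimes: 8896 s on one machine vs. 842 s on 15 Nimbus VMs.
single_machine_s = 8896
cloud_s = 842
vms = 15

speedup = single_machine_s / cloud_s   # observed speedup
efficiency = speedup / vms             # fraction of ideal linear scaling

print(f"speedup:    {speedup:.1f}x")   # → speedup:    10.6x
print(f"efficiency: {efficiency:.0%}") # → efficiency: 70%
```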
Conference Paper
Infrastructure-as-a-Service (IaaS) cloud computing offers new possibilities to scientific communities. One of the most significant is the ability to elastically provision and relinquish new resources in response to changes in demand. In our work, we develop a model of an “elastic site” that efficiently adapts services provided within a site, such a...
Article
Amazon's S3 protocol has emerged as the de facto interface for storage in the commercial data cloud. However, it is closed source and unavailable to the numerous science data centers all over the country. Just as Amazon's Simple Storage Service (S3) provides reliable data cloud access to commercial users, scientific data centers must provide their...
Article
Full-text available
Management of complex, distributed, and dynamically changing job executions is a central problem in computational Grids. These executions often span multiple heterogeneous resources, cross administrative domains, and need to adjust to the changing resource availability to leverage opportunities and account for failures or policy indu...
Article
Full-text available
The primary motivations for the uptake of virtualization have been resource isolation, capacity management, and resource customization: isolation and capacity management allow providers to isolate users from the site and control their resource usage, while customization allows end-users to easily project the required environment onto a variety of sites....
Conference Paper
Infrastructure-as-a-Service (IaaS) style cloud computing is emerging as a viable alternative to the acquisition and management of physical resources. This raises several questions. How can we take advantage of the opportunities it offers? Are the current commercial offerings suitable for science? What cloud capabilities are required by scientific a...
Conference Paper
As virtual appliances become more prevalent, we encounter the need to stop manually adapting them to their deployment context each time they are deployed. We examine appliance contextualization needs and present an architecture for secure, consistent, and dynamic contextualization, in particular for groups of appliances that must work together in a sh...
Conference Paper
Full-text available
This paper explores the use of cloud computing for scientific workflows, focusing on a widely used astronomy application-Montage. The approach is to evaluate from the point of view of a scientific workflow the tradeoffs between running in a local environment, if such is available, and running in a virtual environment via remote, wide-area network r...
Conference Paper
Full-text available
As the use of virtual machines (VMs) for scientific applications becomes more common, we encounter the need to integrate VM provisioning models into the existing resource management infrastructure as seamlessly as possible. To address such requirements, we describe an approach to VM management that uses multi-level scheduling to integrate VM prov...
Article
Full-text available
The Center for Enabling Distributed Petascale Science is developing services to enable researchers to manage large, distributed datasets. The center's projects focus on three areas: tools for reliable placement of data, issues involving failure detection and failure diagnosis in distributed systems, and scalable services that process requests to acce...
Conference Paper
As cluster computers are used for a wider range of applications, we encounter the need to deliver resources at particular times, to meet particular deadlines, and/or at the same time as other resources are provided elsewhere. To address such requirements, we describe a scheduling approach in which users request resource leases, where leases can req...
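A resource lease of the kind described above might be modeled minimally as follows (hypothetical field and method names for illustration, not the paper's actual interface): a request is either best-effort, with no fixed start, or an advance reservation tied to a start time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical sketch of a lease request: either best-effort (no start time)
# or an advance reservation (fixed start), as the abstract describes.
@dataclass
class LeaseRequest:
    nodes: int                        # number of cluster nodes requested
    duration: timedelta               # how long the resources are needed
    start: Optional[datetime] = None  # None => best-effort, as soon as possible

    def is_advance_reservation(self) -> bool:
        return self.start is not None

deadline_run = LeaseRequest(nodes=32, duration=timedelta(hours=2),
                            start=datetime(2026, 3, 1, 9, 0))
print(deadline_run.is_advance_reservation())  # → True
```

Separating the two request types is what lets a scheduler backfill best-effort leases around reserved windows.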
Article
An attribute-based authorisation infrastructure developed for the Open Science Grid (OSG) is presented. The infrastructure integrates existing identity-mapping and group-membership services using concepts prototyped in the PRIMA system. Authorisation scenarios for requests to compute and data resources are detailed. A new SAML obligated authorisati...
Article
Full-text available
The Science Clouds provide EC2-style cycles to scientific projects. This document contains a description of technologies enabling this project and an early summary of its experiences.
Article
Full-text available
Petascale science is an end-to-end endeavour, involving not only the creation of massive datasets at supercomputers or experimental facilities, but the subsequent analysis of that data by a user community that may be distributed across many laboratories and universities. The new SciDAC Center for Enabling Distributed Petascale Science (CEDPS) is de...
Article
Scientists often face the need for more computing power than is available locally, but are constrained by the fact that even if the required resources were available remotely, their complex software stack would not be easy to port to those resources. Many applications are dependency-rich and complex, making it hard to run them on anything but a ded...
Article
The use of virtualization in Grid computing has seen a lot of interest lately. However, while much effort has been expended on developing the capabilities of Virtual Machine Monitors (VMMs) and associated tools and services, relatively little has been done to investigate the requirements underlying the scalable production, deployment, and manageme...
Conference Paper
Virtual machines provide a promising vehicle for controlled sharing of physical resources, allowing us to instantiate a precisely defined virtual resource, configured with desired software configuration and hardware properties, on a set of physical resources. We describe a model of virtual machine provisioning in a Grid environment that allows us t...
Conference Paper
Full-text available
To enable Grid scalability and growth, a usage model has evolved whereby resource providers make resources available not to individual users directly, but rather to larger units, called virtual organizations. In this paper, we describe abstractions that allow resource providers to delegate the usage of remote resources dynamically to virtual organi...
Conference Paper
Full-text available
This paper describes the recent results of the GridShib and MyProxy projects to integrate the public key infrastructure (PKI) deployed for Grids with different site authentication mechanisms and the Shibboleth identity federation software. The goal is to enable multi-domain PKIs to be built on existing site services in order to reduce the PKI deplo...
Article
Full-text available
This paper describes the recent results of the GridShib and MyProxy projects to integrate the public key infrastructure (PKI) deployed for Grids with different site authentication mechanisms and the Shibboleth identity federation software. The goal is to enable multi-domain PKIs to be built on existing site services in order to reduce the PKI deplo...
Conference Paper
A challenging issue facing Grid communities is that while Grids can provide access to many heterogeneous resources, the resources to which access is provided often do not match the needs of a specific application or service. In an environment in which both resource availability and software requirements evolve rapidly, this disconnect can lead to r...
Article
We report on first experiences with building and operating an edge services framework (ESF) based on Xen virtual machines instantiated via the workspace service in Globus toolkit, and developed as a joint project between EGEE, LCG, and OSG. Many computing facilities are architected with their compute and storage clusters behind firewalls. Edge serv...
Conference Paper
In this work we present a novel market-based resource allocation framework named Virtual Workspaces Market. Our approach is to integrate Virtual Workspaces and some Tycoon components while developing a new hybrid resource allocation policy which combines the advantages of leasing- and auction-based mechanisms. The rationale behind...
Conference Paper
Large Grid deployments increasingly require abstractions and methods decoupling the work of resource providers and resource consumers to implement scalable management methods. We proposed the abstraction of a Virtual Workspace (VW) describing a virtual execution environment that can be made dynamically available to authorized Grid clients by using...
Article
Full-text available
We report on first experiences with building and operating an Edge Services Framework (ESF) based on Xen virtual machines instantiated via the Workspace Service in Globus Toolkit, and developed as a joint project between EGEE, LCG, and OSG. Many computing facilities are architected with their compute and storage clusters behind firewalls. Edge Serv...
Article
The various ways of implementation of virtual workspaces and their use to build layered deployment environments in the Grid, are described. A virtual workspace is an abstraction of an execution environment that can be made dynamically available to authorized clients by using well-defined protocols. Virtual machines allow a client to create a custom...