Andrew Grimshaw

University of Virginia | UVa · Department of Computer Science

PhD

About

210
Publications
13,163
Reads
10,516
Citations

Publications (210)
Conference Paper
Full-text available
The XSEDE project seeks to provide “a single virtual system that scientists can use to interactively share computing resources, data and experience.” The potential compute resources in XSEDE are diverse in many dimensions: node architectures, interconnects, memory, local queue management systems, and authentication policies, to name a few. The diver...
Conference Paper
Universities struggle to provide both the quantity and diversity of compute resources that their researchers need when their researchers need them. Purchasing resources to meet peak demand for all resource types is cost prohibitive for all but a few institutions. Renting capacity on commercial clouds is seen as an alternative to owning. Commercial...
Article
Full-text available
Emerging challenges for scientific communities are to efficiently process big data obtained by experimentation and computational simulations. Supercomputing architectures are available to support scalable and high performant processing environment, but many of the existing algorithm implementations are still unable to cope with its architectural co...
Article
Computational and data scientists at universities have different job resource requirements. Most universities maintain a set of shared resources to support these computational needs. Access to these resources is often free and the access policy is First Come First Serve (FCFS). However, FCFS policies on shared resources often lead to sub-optimal v...
Article
Breadth-First Search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data dependent. Recent work has demonstrated the plausibility of GPU sparse graph traversal,...
Conference Paper
Ever since the end of the era of single-processor performance improvement, we have observed a proliferation of multi- and many-core architectures in almost all spheres of computing. Tablets, desktop PCs, workstation clusters, and supercomputers are rife with multi-core CPUs and/or accelerators. Although there is considerable architectural heterogeneity and...
Article
The UltraScan data analysis application is a software package that is able to take advantage of computational resources in order to support the interpretation of analytical ultracentrifugation experiments. Since 2006, the UltraScan scientific gateway has been used with Web browsers in TeraGrid by scientists studying the solution properties of biolo...
Article
Computing in science and engineering is now ubiquitous: digital technologies underpin, accelerate, and enable new, even transformational, research in all domains. Access to an array of integrated and well-supported high-end digital services is critical for the advancement of knowledge. Driven by community needs, the Extreme Science and Engineering...
Article
In this issue of IEEE Cloud Computing, EIC Mazin Yousif talks with experts from US and European universities about the current state of cloud computing as well as where the technology is heading over the next 5 to 10 years. They cover critical cloud topics, such as security, privacy, and standardization.
Conference Paper
With ever-expanding datasets, efficient data management in grids becomes important. This paper describes Cabinet, which employs two techniques for efficiently managing data in grids: a caching system and a new file staging approach called coordinated staging. The caching system is designed based on the characteristics of grid applications. Coordinate...
Conference Paper
Parameter sweeps are used by researchers with scientific domain-specific tools or workflows to submit a large collection of computational jobs in which each job varies only in certain parts. They require a more fine-grained distribution of jobs across resources, which also raises a significant challenge for efficient resource management...
Conference Paper
As computing resources have become ubiquitous, computational research initiatives have spread into a wider variety of disciplines. With the variety of computing environments dramatically expanded, using available compute resources can be a much more complicated proposition. Additionally, users in disciplines that are not traditionally compute-heavy...
Conference Paper
Full-text available
The UltraScan data analysis application is a software package that is able to take advantage of computational resources in order to support the interpretation of analytical ultracentrifugation (AUC) experiments. Since 2006, the UltraScan scientific gateway has been used with ordinary Web browsers in TeraGrid by scientists studying the solution prop...
Article
Certain scientific use cases have complex requirements for executing Grid jobs in collections where the jobs' requests differ only in certain parts. These scenarios can easily be tackled by a single job request that abstracts this variation and represents the same collection. The Open Grid Forum (OGF) standards community...
Conference Paper
This paper presents a user study of the process of migrating MPI applications manually. The gathered data quantifies the scale of the challenge that researchers face when attempting to use shared computing resources. Migrating to one site took on average 2.5 hours where the majority of the time was spent on learning, compiling, and debugging. Less...
Article
A federated, secure, standardized, scalable, and transparent mechanism to access and share resources, particularly data resources, across organizational boundaries that does not require application modification and does not disrupt existing data access patterns has been needed for some time in the computational science community. The Global Federated...
Article
In distributed shared resource environments, such as grids or recent clouds, one of the major challenges is how to meet users’ QoS requirements and rationally distribute resources at the same time. Computational economy has long been studied as an effective solution to address such resource-allocation problems. However, price alone has limitations...
Conference Paper
Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data-dependent. Recent work has demonstrated the plausibility of GPU sparse graph traversal,...
Article
The term "campus bridging" was first used in the charge given to an NSF Advisory Committee for Cyberinfrastructure task force. That task force developed this description of campus bridging: "Campus bridging is the seamlessly integrated use of cyberinfrastructure operated by a scientist or engineer with other cyberinfrastructure on the scientist's c...
Conference Paper
Although modular programming is a fundamental software development practice, software reuse within contemporary GPU kernels is uncommon. For GPU software assets to be reusable across problem instances, they must be inherently flexible and tunable. To illustrate, we survey the performance-portability landscape for a suite of common GPU primitives, e...
Article
International e-Infrastructures offer a wide variety of Information and Communications Technology (ICT) services that federate computing, storage, networking and other hardware in order to create an 'innovative toolset' for multidisciplinary research and engineering. UNICORE services are known to be secure, reliable, and fast, providing researchers...
Conference Paper
We have added support for replication of stateful resources in a Web services based grid platform. Replication allows resources to be highly available for both reading and writing. The contributions of this work are algorithms for update propagation, conflict detection, and conflict resolution for generic resources in a decentralized environment. I...
Article
The need to rank and order data is pervasive, and many algorithms are fundamentally dependent upon sorting and partitioning operations. Prior to this work, GPU stream processors have been perceived as challenging targets for problems with dynamic and global data-dependences such as sorting. This paper presents: (1) a family of very efficient parall...
Book
The mission of the National Science Foundation (NSF) Advisory Committee on Cyberinfrastructure (ACCI) is to advise the NSF as a whole on matters related to vision and strategy regarding cyberinfrastructure (CI) [1]. In early 2009 the ACCI charged six task forces with making recommendations to the NSF in strategic areas of cyberinfrastructure: Campu...
Article
Large organizations always have a strong demand for storage from data-intensive applications and instruments. In this paper, we present the design, implementation, and evaluation of a new virtual storage system, Storage@desk, which can aggregate a large number of distributed machines within an organization to provide storage services with quality o...
Conference Paper
This poster presents efficient strategies for sorting large sequences of fixed-length keys (and values) using GPGPU stream processors. Compared to the state-of-the-art, our radix sorting methods exhibit speedup of at least 2x for all generations of NVIDIA GPGPUs, and up to 3.7x for current GT200-based models. Our implementations demonstrate sorting...
Article
While it is true that the modern computer is many orders of magnitude faster than that of yesteryear, this tremendous growth in CPU clock rates is now over. Unfortunately, however, the growth in demand for computational power has not abated; whereas researchers a decade ago could simply wait for computers to get faster, today the only solution to t...
Chapter
Introduction; Background; Storage@desk; Storage Market Model; Evaluations; Future Research Directions; Conclusion; Acknowledgment; References
Conference Paper
Data grids, such as the ones used by the high energy physics community, are used to share vast amounts of data across geographic locations. However, interactions with grid data are generally limited by the interfaces provided by the corresponding grid's infrastructure. The standardization of grid interfaces is one way to expand the reach of grid da...
Article
The lack of a single authority in the Grid environment is perhaps the biggest source of security and interoperability challenges faced by Grid systems designers. A strong commitment to meaningful, interoperable security is crucial for fostering Grid adoption and buy-in. The issues of security interoperability are twofold: (a) grids require federat...
Article
Naming transparencies, i.e., abstracting the name and binding of the entity being used from the endpoints that are actually doing the work, are used in distributed systems to simplify application development by hiding the complexity of the environment. In this paper we demonstrate how to apply traditional distributed systems naming and binding tech...
Article
To expand the use of distributed computer infrastructures as well as facilitate grid interoperability, OGSA has developed standards and specifications that address a range of scenarios, including high-throughput computing, federated data management, and service mobility.
Conference Paper
While grid computing promises unparalleled secure sharing of computation and resources in a heterogeneous environment, it has not yet found widespread adoption. Without a large user base, the promises of grid computing cannot be fulfilled. One reason grid computing has lacked wide adoption is that users are unwilling to suffer the cost of learnin...
Conference Paper
Storage demand in large organizations has grown rapidly in the last few decades. In this paper, we describe Storage@desk, a new virtual distributed storage system that utilizes a large number of distributed machines to provide storage services with quality of service guarantees. We present the Storage@desk architecture and core components...
Article
The average PC now contains a large and increasing amount of storage with an ever greater amount left unused. We believe there is an opportunity for organizations to harness the vast unused storage capacity on their PCs to create a very large, low cost, shared storage system. What is needed is the proper storage system architecture and soft...
Article
Full-text available
Models of parallel computation based upon message passing are in widespread use today, yet the message passing primitives available on different architectures are often different in subtle ways. The situation on distributed systems is even worse; not only are there different interfaces, but the services provided are not sufficient for data drive...
Conference Paper
We have designed a new storage grid called Storage@desk to harness unused storage available on desktop machines and turn it into a useful resource for clients. Given the complexity of managing client-specific QoS requirements, and the dynamism inherent in supply and demand for resources, even a highly experienced system administrator cannot effectiv...
Conference Paper
One of the major challenges in managing resources of computational Grids with diverse shared resources is how to meet users' QoS requirements and rationally distribute resources at the same time. In particular, even though less reliable desktop PCs are dominant resource providers of computational Grids, they are often underutilized because they do...
Conference Paper
As Grids become increasingly relied upon as critical infrastructure, it is imperative to ensure the highly-available and secure day-to-day operation of the Grid infrastructure. The current approach for Grid management is generally to have geographically-distributed system administrators contact each other by phone or email to debug Grid behavior an...
Conference Paper
Full-text available
This document presents a specification for a Basic Execution Service (BES): a service to which clients can send requests to initiate, monitor, and manage computational activities. The specification defines an extensible state model for activities; an extensible information model for a BES and the activities that it creates; and two port-types; BES-...
Article
The U.S. National Cancer Institute has used a panel of 60 diverse human cancer cell lines (the NCI-60) to screen >100,000 chemical compounds for anticancer activity. However, not all important cancer types are included in the panel, nor are drug responses of the panel predictive of clinical efficacy in patients. We asked, therefore, whether it woul...
Conference Paper
In the past years, hype over Web services and their uses in emerging software applications has prompted the creation of many standards and proto-standards. The OGF has seen a number of standards making their way through design and edit pipelines. While this standards process progresses, it is important that implementations of these standards develo...
Conference Paper
Accurate failure prediction in grids is critical for reasoning about QoS guarantees such as job completion time and availability. Statistical methods can be used but they suffer from the fact that they are based on assumptions, such as time-homogeneity, that are often not true. In particular, periodic failures are not modeled well by statistical me...
Conference Paper
The average PC contains increasingly large amounts of storage with an ever greater amount left unused. There is an opportunity for organizations to harness the vast unused storage capacity on their PCs to create a very large, low cost, shared storage system. What is needed is a virtual storage system to exploit and manage the unused portions of exi...
Conference Paper
Grid computing has been a hot topic for a decade. Several systems have been developed. Despite almost a decade of research and tens of millions of dollars spent, uptake of grid technology has been slow. Most deployed grids are based on a toolkit approach that requires significant software modification or development. An operating system technique u...
Conference Paper
The open grid services architecture (OGSA) addresses the need for standardization of diverse grid services by defining a set of core capabilities and behaviors needed by loosely coupled, service-oriented grid architectures. These OGSA standards and interfaces are based on ubiquitous, platform-neutral, technologies like SOAP, XML, and Web services....
Conference Paper
Grids should not just be facilitating advances in science and engineering; rather they should also be making an impact on our daily lives by enabling sophisticated applications such as new consumer services and support for homeland defense. This is not possible today because the poor grid dependability, which is tolerated by scientific users, w...
Article
Full-text available
Legion was the first integrated grid middleware architected from first principles to address the complexity of grid environments. Just as a traditional operating system provides an abstract interface to the underlying physical resources of a machine, Legion was designed to provide a powerful virtual machine interface layered over the distributed, h...
Article
Full-text available
Successful realization of the Open Grid Services Architecture (OGSA) vision of a broadly applicable and adopted framework for distributed system integration, virtualization, and management requires the definition of a core set of interfaces, behaviors, resource models, and bindings. This document, produced by the OGSA working group within the Globa...
Conference Paper
Full-text available
Presents the welcome message from the conference proceedings.
Article
The distinguishing feature of a metasystem is middleware that facilitates viewing a collection of large, distributed, heterogeneous resources as a single virtual machine, where each user of the metasystem is identified by a unique metasystem-level identity. The physical resources of the metasystem can exist in multiple administrative domains, each...
Article
To date, grids (a form of distributed system) have been used to aggregate resources for performance-starved applications typically resulting from scientific enquiry. Grids should not just be facilitating advances in science and engineering; rather they should also be making an impact on our daily lives by enabling sophisticated applications such as...
Article
Full-text available
One benefit of a computational grid is the ability to run high-performance applications over distributed resources simply and securely. We demonstrate this benefit with an experiment in which we studied the protein folding process with the CHARMM molecular simulation package over a grid managed by Legion, a grid operating system. High-performance a...
Article
Full-text available
Grids are collections of interconnected resources harnessed to satisfy various needs of users. Legion and Globus are pioneering grid technologies. Several of the aims and goals of both projects are similar, yet their underlying architectures and philosophies differ substantially. The scope of both projects is the creation of worldwide grids; in tha...
Article
Full-text available
Grid resource management is not just about scheduling jobs on the fastest machines, but rather about scheduling all compute objects and all data objects on machines whose capabilities match the requirements, while preserving site autonomy, recognizing usage policies and respecting conditions for use. In this chapter, we present the Grid resourc...
Article
Real-time multimedia applications such as video and audio streaming, video conferencing and online collaboration are becoming increasingly popular. In order to guarantee effective support of many of these applications, the Internet must provide absolute ...
Chapter
Grids have metamorphosed from academic projects to commercial ventures. Avaki, a leading commercial vendor of Grids, has its roots in Legion, a Grid project at the University of Virginia begun in 1993. In this chapter, we present fundamental challenges and requirements for Grid architectures that we believe are universal, our architectural philosop...
Article
The object-oriented paradigm is a powerful tool for managing software complexity. A key question when the paradigm is applied to parallel computing is whether the associated overhead is so large as to defeat the high-performance objectives that motivate parallel computing. We show that high-performance and dynamic object-oriented parallel processin...
Article
As CPU performance has rapidly improved, increased pressure has been placed on the performance of accessing external data in order to keep up with demand. Increasingly often the I/O subsystem and related software is unable to meet this demand and valuable CPU resources are left underutilized while users are forced to wait longer than necessary for...
Article
Large metasystems comprised of a variety of interconnected high-performance architectures are becoming available to researchers. To fully exploit these new systems, software must be provided that is easy to use, supports large degrees of parallelism in applications code, and manages the complexity of the underlying physical architecture for the use...
Article
Full-text available
ADAMS provides a mechanism for applications programs, written in many languages, to define and access common persistent databases. The basic constructs are element, class, set, map, attribute, and codomain. From these the user may define new data structures and new data classes belonging to a semantic hierarchy that supports multiple inheritance. T...
Article
Archetype data parallel or task parallel applications are well served by contemporary languages. However, for applications containing a balance of task and data parallelism the choice of language is less clear. While there are languages that enable both forms of parallelism, e.g., one can write data parallel programs using a task parallel language,...
Article
Process Introspection is a fundamentally new solution to the process checkpoint/restart problem suitable for use in high-performance heterogeneous distributed systems. A process checkpoint/restart mechanism for such an environment has the primary requirement that it must be platform-independent: process checkpoints produced on a computer system of o...
Article
that provide the illusion of a single virtual machine to users, a virtual machine that provides both improved response time via parallel execution and greater throughput. Legion is targeted towards both workstation clusters and towards larger, wide-area, assemblies of workstations, supercomputers, and parallel supercomputers. Rather than construct...
Article
ADAMS provides a mechanism for applications programs, written in many languages, to define and access common persistent databases. The basic constructs are element, class, set, map, attribute, and codomain. From these the user may define new data structures and new data classes belonging to a semantic hierarchy that supports multiple inheritance.
Article
this paper discusses our approach and the current NRAO environment in more detail and then presents the details of phase one of the project, including a brief discussion of the file structure chosen, a sketch of the implementation and performance results and observations. The final section presents our future plans for the remainder of the project...
Article
Writing portable applications for parallel architectures has proven to be more difficult than writing sequential software. This is due in large part to the lack of easy-to-use, high-level abstractions. Mentat is a portable object-oriented parallel processing system that extends object encapsulation to include parallelism encapsulation. In Mentat, p...
Article
Fortran is the most widely used programming language for high-performance scientific computing applications, yet in the past the Legion system has not supported objects implemented in Fortran. This paper describes the design and interface of the Legion Basic Fortran Support (BFS) system. This system consists of a compiler and a runtime library that all...
Article
Grid computing is the use of large collections of heterogeneous, distributed resources (including machines, databases, devices, and users) to support large-scale computations and wide-area data access. The Legion system is an implementation of a software architecture for grid computing. The basic philosophy underlying this architecture is the prese...
Article
The Legion Grid Portal is an interface to a Grid system. Users interact with the portal, and hence a Grid, through an intuitive interface from which they can view files, submit and monitor runs, and view accounting information. The architecture of the portal is designed to accommodate multiple diverse Grid infrastructures, legacy systems, and applic...
Article
ether through enhanced support for collaboration and sharing. Why don't we use metasystems? The fundamental difficulty is lack of software---specifically, an inadequate conceptual model for metasystem software design. Faced with accelerating changes in hardware and networking, the computing community has sought to stretch the existing paradigm for...
Article
Parameter-space (p-space) studies involve running a single application several times with different parameter sets. Since the jobs are mutually independent, many computing resources can be recruited to conduct an entire study in a distributed manner. The p-space studies are attractive applications for grids, which are networked collections of compu...
Article
Wide-area operating systems, or grids, present users with access to a broad range of computational resources and storage facilities. To cope with the resulting heterogeneity, current solutions such as Legion operate at user level. While this provides desirable portability, lower levels of a host such as the kernel and device drivers are necessarily...
Conference Paper
Full-text available
Realizing that current file systems cannot cope with the diverse requirements of wide-area collaborations, researchers have developed data access facilities to meet their needs. Recent work has focused on comprehensive data access architectures. In order to fulfill the evolving requirements in this environment, we suggest a more fully-integrated a...
Conference Paper
Full-text available
In a Computational Grid, it is not easy to maintain grid-wide control over the number of executing jobs, as well as a global view of the status of submitted jobs, due to the heterogeneity in resource type, availability, and access policies. This paper describes the design and implementation of JobQueue, which is a Computational Grid-wide queuing sy...
Article
Realizing that current file systems cannot cope with the diverse requirements of wide-area collaborations, researchers have developed data access facilities to meet their needs. Recent work has focused on comprehensive data access architectures. In order to fulfill the evolving requirements in this environment, we suggest a more fully-integrated a...
Article
One of the benefits of a computational grid is the ability to run high-performance applications over distributed resources simply and securely. We demonstrated this benefit with an experiment in which we studied the protein-folding process with the CHARMM molecular simulation package over a grid managed by a grid operating system, Legion. High-perf...
Conference Paper
Full-text available
Computational Scientists often cannot easily access the large amounts of resources their applications require. Legion is a collection of software services that facilitate the secure and easy use of local and non-local resources by providing the illusion of a single virtual machine from heterogeneous, geographically-distributed resources. This paper...
Article
The continued growth and widespread deployment of gigabit networks and high-speed network protocols will provide significant communications bandwidth between increasingly powerful computers, enabling systems containing billions of objects and millions of machines. Harnessing the power of this vast array of raw hardware requires well-organized compl...
Article
Full-text available
The domain of file system usage spans a wide range of geographic environments, usage scenarios, and security requirements. Existing file systems generally have been designed for a particular set of scenarios; however, no one system works well across all environments simultaneously. LegionFS is a file system infrastructure that allows multiple pol...
Article
Full-text available
Legion is a large-scale metacomputing project at the University of Virginia. Legion users have requirements in many dimensions, including scheduling, security, fault tolerance, programming languages and environments, and performance. Not all users have the same needs. Further, as higher levels of services generally imply higher costs, users should...
Article
Full-text available
Grids are becoming ubiquitous platforms for high-performance computing and distributed collaboration. A grid benefits users by permitting them to access heterogeneous resources, such as machines, data, people and devices, that are distributed geographically and organisationally. It benefits organisations by permitting them to offer unu...
Article
Full-text available
This paper attempts to answer this question from an applications perspective, pointing out concrete ways in which the current practices of high performance scientific computing can possibly be improved.
Article
The unprecedented scale, heterogeneity, and varied usage patterns of grids pose significant technical challenges to any underlying file system that will support them. While grids present a host of new concerns for file access, we focus on two issues: performance and usability. We discuss the Legion I/O model and interface to address the latter area...
Article
Full-text available
The ability to capture the state of a process and later recover that state in the form of an equivalent running process is the basis for a number of important features in parallel and distributed systems. Adaptive load sharing and fault tolerance are well-known examples. Traditional state capture mechanisms have employed an external agent (such as...
Article
A metacomputing environment, or metasystem, is a collection of geographically separated resources (people, computers, devices, databases) connected by one or more high-speed networks. The distinguishing feature of a metasystem is middleware that facilitates viewing the collection of resources as a single virtual machine. The traditional requirement...
Article
ed. The result is a collection of partial solutions --- some quite good in isolation, but lacking coherence and scalability --- that make the development of even a single wide-area application demanding at best. Thus, the challenge to the computer science community is to provide a solid, integrated, conceptual foundation on which to build applicati...
Article
Full-text available
The distinguishing feature of a metasystem is middleware that facilitates viewing a collection of large, distributed, heterogeneous resources as a single virtual machine, where each user of the metasystem is identified by a unique metasystem-level identity. The physical resources of the metasystem can exist in multiple administrative domains, each...
Conference Paper
The unprecedented scale, heterogeneity and varied usage patterns of computational grids pose significant technical challenges to any underlying file system that supports them. While grids present a host of new concerns for file access, we focus on two issues: performance and usability. We discuss the Legion I/O model and interface to address the la...
Conference Paper
The importance of large-scale electrical business processing is increasing today as recent Internet technologies build on the basic infrastructure. Simply integrating existing technologies and resources to form a platform that satisfies large-scale electrical business processing requirements is not enough, however. Legion is a wide-area distributed...