Daniel Nurmi

University of California, Santa Barbara | UCSB · Department of Computer Science

PhD


44
Publications
12,456
Reads
3,918
Citations


Publications (44)
Article
Deadline-sensitive workflows require careful coordination of user constraints with resource availability. Current distributed resource access models provide varying degrees of resource control: from limited or none in grid batch systems to explicit in cloud systems. Additionally, applications experience variability due to competing user loads, perfo...
Conference Paper
Today's scientific workflows use distributed heterogeneous resources through diverse grid and cloud interfaces that are often hard to program. In addition, especially for time-sensitive critical applications, predictable quality of service is necessary across these distributed resources. VGrADS' virtual grid execution system (vgES) provides an...
Article
Utility computing, elastic computing, and cloud computing are all terms that refer to the concept of dynamically provisioning processing time and storage space from a ubiquitous "cloud" of computational resources. Such systems allow users to acquire and release the resources on demand and provide ready access to data from processing elements, while...
Article
In high-performance computing (HPC) settings, in which multiprocessor machines are shared among users with potentially competing resource demands, processors are allocated to user workload using space sharing. Typically, users interact with a given machine by submitting their jobs to a centralized batch scheduler that implements a site-specific, an...
Conference Paper
Cloud computing systems fundamentally provide access to large pools of data and computational resources through a variety of interfaces similar in spirit to existing grid and HPC resource management and programming systems. These types of systems offer a new programming target for scalable application developers and have gained popularity o...
Article
We present a framework for making computation offloading decisions in computational grid settings in which schedulers determine when to move parts of a computation to more capable resources to improve performance. Such schedulers must predict when an offloaded computation will outperform one that is local by forecasting the local cost (execution ti...
Conference Paper
Providing QoS (quality of service) in batch resources against the uncertainty of resource availability due to the space-sharing nature of scheduling policies is a critical capability required for high-performance computing. This paper introduces a technique called personal cluster which reserves a partition of batch resources on user's demand in a...
Conference Paper
We present a framework for making computation offloading decisions in computational grid settings in which schedulers determine when to move parts of a computation to more capable resources to improve performance. Such schedulers must predict when an offloaded computation will outperform one that is local by forecasting the local cost (execution ti...
Conference Paper
In high-performance computing (HPC) settings, in which multiprocessor machines are shared among users with potentially competing resource demands, processors are allocated to user workload using space sharing. Typically, users interact with a given machine by submitting their jobs to a centralized batch scheduler that implements a site-specific pol...
Conference Paper
In high-performance computing (HPC) settings, in which multiprocessor machines are shared among users with potentially competing resource demands, processors are allocated to user workload using space sharing. Typically, users interact with a given machine by submitting their jobs to a centralized batch scheduler that implements a site-spec...
Article
Most space-sharing parallel computers presently operated by production high-performance computing centers use batch-queuing systems to manage processor allocation. In many cases, users wishing to use these batch-queued resources may choose among different queues (charging different amounts) potentially on a number of machines to which they have acc...
Conference Paper
In this article, we investigate the dynamics exhibited by the production Condor pool at the University of Wisconsin with the goal of understanding its distributional properties. Condor is a cycle-harvesting service originally designed to launch and control "guest" user jobs (in batch mode) on idle workstations. Since its inception in 1985, however,...
Conference Paper
Large-scale distributed systems offer computational power at unprecedented levels. In the past, HPC users typically had access to relatively few individual supercomputers and, in general, would assign a one-to-one mapping of applications to machines. Modern HPC users have simultaneous access to a large number of individual machines and are beginnin...
Article
Large-scale distributed systems offer computational power at unprecedented levels. In the past, HPC users typically had access to relatively few individual supercomputers and, in general, would assign a one-to-one mapping of applications to machines. Modern HPC users have simultaneous access to a large number of individual machines and are b...
Article
Most space-sharing resources presently operated by high performance computing centers employ some sort of batch queueing system to manage resource allocation to multiple users. In this work, we explore a new method for providing end-users with predictions of the bounds on queuing delay individual jobs will experience when waiting to be scheduled to...
Conference Paper
Desktop Grids have proved to be a suitable platform for the execution of Bag-of-Tasks applications but, being characterized by a high resource volatility, require the availability of scheduling techniques able to effectively deal with resource failures and/or unplanned periods of unavailability. In this paper we present a set of fault-aware sched...
Conference Paper
Most space-sharing parallel computers presently operated by high-performance computing centers use batch-queuing systems to manage processor allocation. In many cases, users wishing to use these batch-queued resources have accounts at multiple sites and have the option of choosing at which site or sites to submit a parallel job. In such a situatio...
Chapter
In this paper, we describe methods for predicting the performance of Computational Grid resources (machines, networks, storage systems, etc.) using computationally inexpensive statistical techniques. The predictions generated in this manner are intended to support adaptive application scheduling in Grid settings, and on-line fault detection. We d...
Article
Cycle-harvesting systems such as Condor have been developed to make desktop machines in a local area (which are often similar to clusters in hardware configuration) available as a compute platform. To provide a dual-use capability, opportunistic jobs harvesting cycles from the desktop must be checkpointed before the desktop resources are reclaimed...
Article
Recent research results and infrastructure efforts demonstrate the potential effectiveness of large-scale distributed computing. Effective scheduling based on empirically verifiable models has emerged as a key factor in these successes. Moving to a new truly global computing capability will similarly depend critically on new models and scheduling t...
Conference Paper
In this paper we examine the problem of predicting machine availability in desktop and enterprise computing environments. Predicting the duration that a machine will run until it restarts (availability duration) is critically useful to application scheduling and resource characterization in federated systems. We describe one parametric model fittin...
Article
In this paper, we examine the problem of predicting machine availability in desktop and enterprise computing environments. Predicting the duration that a machine will run until it restarts (availability duration) is critically useful to application scheduling and resource characterization in federated systems. We describe one parametric mode...
Conference Paper
In this paper, we consider the problem of modeling machine availability in enterprise-area and wide-area distributed computing settings. Using availability data gathered from three different environments, we detail the suitability of four potential statistical distributions for each data set: exponential, Pareto, Weibull, and hyperexponential. In e...
Conference Paper
We present ARWin, a single user 3D augmented reality desktop. We explain our design considerations and system architecture and discuss a variety of applications and interaction techniques designed to take advantage of this new platform.
Conference Paper
We present a generic software framework enabling inter-application interaction in a 3D augmented reality environment. Our framework is built within a 3D AR window manager centered around ARToolkit. The user interface presents users with a simple visual mechanism to establish communications among applications in a generic way. The application interf...
Article
In this paper, we examine the problem of predicting machine availability in desktop and enterprise computing environments. Predicting the duration that a machine will run until it restarts (availability duration) is critically useful to application scheduling and resource characterization in federated systems. We describe one parametric model fitti...
Article
Two resource sharing environments have polarized the distributed systems research in the last years: computational Grids and Peer-to-Peer communities. They seem likely to converge into a large-scale, decentralized, and self-configuring environment that provides complex functionalities. Resource discovery is discussed in this context: we propose a f...
Article
Computational grids provide mechanisms for sharing and accessing large and heterogeneous collections of remote resources such as computers, online instruments, storage space, data, and applications. Resources are requested ("discovered") by specifying a set of desired attributes. Resource attributes have various degrees of dynamism, from mostly sta...
Conference Paper
Computational grids provide mechanisms for sharing and accessing large and heterogeneous collections of remote resources such as computers, online instruments, storage space, data, and applications. Resources are requested by specifying a set of desired attributes. Resource attributes have various degrees of dynamism, from mostly static attributes,...
Conference Paper
Systems administrators of large clusters often need to perform the same administrative task hundreds or thousands of times. Administrators have traditionally performed some time-consuming tasks, such as operating system installation, configuration, and maintenance, manually. By combining network services such as DHCP, TFTP, FTP, HTTP, and NFS with...
Conference Paper
In this paper, we describe the use of a cluster as a generalized facility for development. A development facility is a system used primarily for testing and development activities while being operated reliably for many users. We are in the midst of a project to build and operate a large-scale development facility. We discuss our motivation for usin...
Conference Paper
A critical but often ignored component of system performance is the I/O system. Today’s applications demand a great deal from underlying storage systems and software, and both high-performance distributed storage and high level interfaces have been developed to fill these needs. In this paper we discuss the I/O performance of a parallel scientific...
Conference Paper
Most space-sharing parallel computers presently operated by high-performance computing centers use batch-queuing systems to manage processor allocation. Because these machines are typically “space-shared,” each job must wait in a queue until sufficient processor resources become available to service it. In production computing settings, the queuing...
Article
New science in many important disciplines now requires complex, time-dependent access to distributed computing resources. We address these problems by creating a higher level resource abstraction that layers on top of existing Cyberinfrastructure that allows an application to target stable resource aggregations spanning sites through a well-de...
Article
Utility computing, elastic computing, and cloud computing are all terms that refer to the concept of dynamically provisioning processing time and storage space from a ubiquitous "cloud" of computational resources. Such systems allow users to acquire and release the resources on demand and provide ready access to data from processing elements,...
Article
In this paper, we consider the problem of modeling machine availability in enterprise-area and wide-area distributed computing settings. Using availability data gathered from three different environments, we detail the suitability of four potential statistical distributions for each data set: exponential, Pareto, Weibull, and hyperexponen...
Article
In high-performance computing (HPC) settings, in which multiprocessor machines are shared among users with potentially competing resource demands, processors are allocated to user workload using space sharing. Typically, users interact with a given machine by submitting their jobs to a centralized batch scheduler that implements a site-specific, and...
Article
We present ARWin, a single user 3D augmented reality desktop window manager, placing 3D user interfaces into a physical desktop workspace. We explain our design considerations and system architecture, exhibiting the ease with which such a system can be developed and used. We showcase a number of novel 3D applications, which take advantage of the...
Article
In this paper, we describe a system for application checkpoint scheduling in volatile resource environments. Our approach combines historical measurements of resource availability with an estimate of checkpoint/recovery delay to generate checkpoint intervals that minimize overhead. When executing in a desktop computing or resource harvest...
Article
Most space-sharing parallel computers presently operated by high-performance computing centers use batch-queuing systems to manage processor allocation. In many cases, users wishing to use these batch-queued resources have the option of choosing between different queues (having different charging rates) potentially on a number of different mac...
