Francine Berman

Rensselaer Polytechnic Institute | RPI · Department of Computer Science

About

151 Publications · 10,095 Reads
8,075 Citations


Publications (151)
Article
Full-text available
In the 21st century, digital data drive innovation and decision-making in nearly every field. However, little is known about the total size, characteristics, and sustainability of these data. In the scholarly sphere, it is widely suspected that there is a gap between the amount of valuable digital data that is produced and the amount that is effect...
Article
Data science promises new insights, helping transform information into knowledge that can drive science and industry.
Article
Full-text available
The emergent field of data science is a critical driver for innovation in all sectors, a focus of tremendous workforce development, and an area of increasing importance within science, technology, engineering, and math (STEM). In all of its aspects, data science has the potential to narrow the gender gap and set a new bar for inclusion. To evolve d...
Article
When economic models and infrastructure are not in place to ensure access and preservation, federally funded research data are "at risk."
Article
The increasing volume of research data highlights the need for reliable, cost-effective data storage and preservation at the national scale.
Conference Paper
The new Ken Kennedy Award recognizes substantial contributions to programmability and productivity in computing and substantial community service or mentoring contributions. The award honors the remarkable research, service, and mentoring contributions of the late Ken Kennedy. It includes a $5,000 honorarium, and the first presentation of this awar...
Technical Report
Full-text available
The views and opinions expressed herein represent the rough consensus among the members of the Task Force and should not be construed to represent those of the U.S. Government, the U.S. National Science Foundation, the Library of Congress, JISC, or any of the other sponsoring agencies and organizations. Much of the information from which these prel...
Article
The article discusses data preservation, exploring the issues and trends associated with preserving digital data. The author also examines the manner in which such data can be kept manageable, available, accessible, and secure. Examples of vital information stored digitally include medical records, financial data, and photos. Other topics include c...
Chapter
Full-text available
TeraGrid is a national-scale computational science facility supported through a partnership among thirteen institutions, with funding from the US National Science Foundation [1]. Initially created through a Major Research Equipment Facilities Construction (MREFC [2]) award in 2001, the TeraGrid facility began providing production computing, storage...
Conference Paper
The 20th century brought about an "information revolution" which has forever altered the way we work, communicate, and live. In the 21st century, it is hard to imagine working without an increasingly broad array of supporting technologies and the digital data they provide. The care, management, and preservation of this tidal wave of data has become...
Article
The 20th century brought about an "information revolution" which has forever altered the way we work, communicate, and live. In the 21st century, it is hard to imagine working without an increasingly broad array of supporting technologies and the digital data they provide.
Conference Paper
Petascale computing is now a realizable goal that will impact all scientific and engineering applications. Reaching the full potential of petascale science demands that we tackle challenging problems of both hardware and software as we develop and deploy new computing systems and scale science and engineering applications to use them to their full...
Conference Paper
Full-text available
The NSF-DIGARCH is building digital preservation lifecycle management infrastructure for the preservation of large-scale multimedia collections. The infrastructure consists of interfaces to TV production lifecycle systems, metadata definition and capture systems, and a persistent archive workflow which preserves the material in an SRB data grid. Kep...
Conference Paper
Increasingly, intellectual content is “born digital.” In order to make it as easy as possible for content creators to preserve their content for the long-term, preservation processes should be integrated into the content production lifecycle. Our project takes an existing video production workflow and integrates it with a digital preservation life-...
Conference Paper
Full-text available
There is a critical need to organize, preserve, and make accessible the increasing number of digital holdings that represent intellectual capital. This intellectual capital contains scientific records that are the basis for current research, future scientific advances, and education source materials for use by the public, educators, scientists and...
Article
Full-text available
The goal of the Grid Application Development Software (GrADS) Project is to provide programming tools and an execution environment to ease program development for the Grid. This paper presents recent extensions to the GrADS software framework: (1) A new approach to scheduling workflow computations, applied to a 3-D image reconstruction application;...
Article
This chapter discusses the revolutionary changes in technology and methodology driving scientific and engineering communities to embrace Grid technologies. Today, the scientific community still leads the way as early attempts in Grid computing evolve to the more sophisticated and ubiquitous “virtual organization.” The UK e-Science concept, the NSF...
Conference Paper
Scientists have long relied on abstract models to study phenomena that are too complex for direct observation and experimentation. As new scientific modeling methodologies emerge, new computing technologies must be developed. In this paper, we focus on entity-level modeling, a modeling approach that is gaining prevalence in many scientific fields....
Article
Full-text available
The ongoing global effort of genome sequencing is making large scale comparative proteomic analysis an intriguing task. The Encyclopedia of Life (EOL; http://eol.sdsc.edu) project aims to provide current functional and structural annotations for all available proteomes, a computational challenge never seen before in biology. Using an integrative ge...
Article
Full-text available
Ensembles of widely distributed, heterogeneous resources, or Grids, have emerged as popular platforms for large-scale scientific applications. In this paper we present the Virtual Instrument project, which provides an integrated application execution environment that enables end-users to run and interact with running scientific simulations on Grids...
Article
Today, technology is ubiquitous, and nowhere more so than within the science and engineering community. Today's scientists and engineers can draw from a rich spectrum of resources--from increasingly powerful and prevalent university laboratory and departmental clusters, to scientific instruments providing a deluge of valuable data, to high performa...
Conference Paper
Full-text available
The goal of the Grid Application Development Software (GrADS) project is to provide programming tools and an execution environment to ease program development for the Grid. In this paper, we describe several recent extensions to the GrADS software framework that were demonstrated at the SC2003 conference: (1) A new approach to scheduling workflow comput...
Article
Computational Grids lend themselves well to parameter sweep applications, in which independent tasks calculate results for points in a parameter space. However, it is possible for a parameter space to become so large as to pose prohibitive system requirements. In these cases, user-guided searches promise to reduce overall computation time. In this p...
Chapter
Summary of the Book · Part A: Overview · Part B: Architecture and Technologies of the Grid · Part C: Grid Computing Environments · Part D: Grid Applications · References
Chapter
Parameter sweep applications consist of large sets of independent tasks and arise in many fields of science and engineering. Due to their flexible task synchronization requirements, these applications are ideally suited to large-scale distributed platforms such as the Computational Grid. However, for users to readily benefit from such platf...
Article
In this paper we propose an adaptive scheduling approach designed to improve the performance of parallel applications in Computational Grid environments. A primary contribution of our work is that our design is decoupled, thus providing a separation of the scheduler itself from the application-specific components needed for the scheduling process....
Article
Full-text available
Ensembles of distributed, heterogeneous resources, also known as computational grids, have emerged as critical platforms for high-performance and resource-intensive applications. Such platforms provide the potential for applications to aggregate enormous bandwidth, computational power, memory, secondary storage, and other resources during a single...
Article
Developing grid applications is a challenging endeavor, which at the moment requires both extensive labor and expertise. The Grid Application Development Software Project (GrADS) provides a system to simplify grid application development. This system incorporates tools at all stages of the application development and execution cycle. In this chapte...
Article
In this paper, we show that distributed parallel application performance in clusters of workstations benefits from knowledge about network topology in cases where LAN resources can become potential performance bottlenecks. These performance bottlenecks often appear in common networking technologies that employ highly shared resources, such as ether...
Article
Full-text available
In most parallel supercomputers, submitting a job for execution involves specifying how many processors are to be allocated to the job. When the job is moldable (i.e., there is a choice on how many processors the job uses), an application scheduler called SA can significantly improve job performance by automatically selecting how many processors to...
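
The selection problem described above can be illustrated with a minimal sketch in Python: pick the processor count whose estimated queue wait plus estimated run time is smallest. The Amdahl-style speedup model and the wait-time table below are illustrative assumptions, not SA's actual models.

    # Illustrative sketch: pick a processor count for a moldable job by
    # minimizing estimated turnaround time (queue wait + run time).
    # The speedup model and wait-time table are assumptions, not SA's models.

    def run_time(base_seconds, n, parallel_fraction=0.95):
        # Amdahl-style estimate of run time on n processors (assumed model).
        return base_seconds * ((1 - parallel_fraction) + parallel_fraction / n)

    def estimated_wait(n, wait_table):
        # Assumed lookup of the current expected queue wait for n processors.
        return wait_table.get(n, float("inf"))

    def choose_request(base_seconds, candidate_sizes, wait_table):
        # Return the processor count with the smallest estimated turnaround.
        return min(candidate_sizes,
                   key=lambda n: estimated_wait(n, wait_table) + run_time(base_seconds, n))

    # Example: with these made-up queue waits, 32 processors gives the best turnaround.
    waits = {16: 2000.0, 32: 900.0, 64: 7200.0}
    print(choose_request(base_seconds=3600.0, candidate_sizes=[16, 32, 64], wait_table=waits))
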
Conference Paper
Advances across many fields of study are driving changes in the basic nature of scientific computing applications. Scientists have recognized a growing need to study phenomena by explicitly modeling interactions among individual entities, rather than by simply modeling approximate collective behavior. This entity-level approach has emerged as...
Conference Paper
Full-text available
Program development environments are instrumental in providing users with easy and efficient access to parallel computing platforms. While a number of such environments have been widely accepted and used for traditional HPC systems, there are currently no widely used environments for Grid programming. The goal of the Grid Application Development So...
Article
The Logistical Computing and Internetworking (LoCI) project is a reflection of the way that the next generation internetworking fundamentally changes our definition of high performance wide area computing. A key to achieving this aim is the development of middleware that can provide reliable, flexible, scalable, and cost-effective delivery of data...
Conference Paper
Computational Grids lend themselves well to parameter sweep applications, in which independent tasks calculate results for points in a parameter space. It is possible for a parameter space to become so large as to pose prohibitive system requirements. In these cases, user-directed steering promises to reduce overall computation time. In this paper,...
Article
In most parallel supercomputers, submitting a job for execution involves specifying (i) how many processors are to be allocated to the job, and (ii) for how long these processors are to be available to the job. Since most jobs are moldable (i.e., there is a choice on how many processors the job uses), the user typically has to decide how many proce...
Article
Full-text available
This paper presents the Virtual Instrument project which targets those platforms. More specifically, the project seeks to provide an integrated application execution environment that enables end-users to run and interact with running scientific simulations on the Grid. This work is performed in the specific context of a computational biology applic...
Article
Advances in networking technologies will soon make it possible to use the global information infrastructure in a qualitatively different way---as a computational as well as an information resource. As described in the recent book "The Grid: Blueprint for a New Computing Infrastructure," this "Grid" will connect the nation's computers, databases, inst...
Article
Information Infrastructure has become a first-class tool for computational science. Today, computational and data management resources are key enablers for modeling, analyzing, and visualizing scientific phenomena as well as managing and mining immense amounts of scientific data. Over the last ten years, the ability to link computational and data s...
Article
There is a current need for scheduling policies that can leverage the performance variability observed when scheduling parallel computations on multi-user clusters. We develop one solution to this problem called stochastic scheduling that utilizes a distribution of application execution performance on the target resources to determine a performance...
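
To convey the flavor of scheduling against performance variability, the sketch below ranks candidate resources by a conservative upper quantile of their observed execution times rather than by the mean alone. The normal approximation and the 95th-percentile criterion are assumptions for illustration, not the stochastic scheduling policy developed in the paper.

    # Illustrative sketch: choose a resource using a conservative quantile of its
    # observed execution-time distribution instead of the mean. The roughly normal
    # model and the 95th-percentile criterion are assumptions for illustration.

    import statistics
    from math import inf

    def conservative_estimate(samples, z=1.645):
        # Mean plus z standard deviations approximates an upper quantile
        # of observed execution times (assumed near-normal behavior).
        mu = statistics.mean(samples)
        sigma = statistics.stdev(samples) if len(samples) > 1 else 0.0
        return mu + z * sigma

    def pick_resource(history):
        # history maps resource name -> list of past execution times (seconds).
        best, best_score = None, inf
        for resource, samples in history.items():
            score = conservative_estimate(samples)
            if score < best_score:
                best, best_score = resource, score
        return best

    # The lower-variance cluster wins even though its mean is not the smallest.
    print(pick_resource({"cluster-a": [100, 110, 400], "cluster-b": [150, 155, 160]}))
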
Article
Full-text available
This paper describes the program execution framework being developed by the Grid Application Development Software (GrADS) Project. The goal of this framework is to provide good resource allocation for Grid applications and to support adaptive reallocation if performance degrades because of changes in the availability of Grid resources. At the heart...
Article
The achievement of parallel application performance on non-dedicated workstation clusters requires careful attention to the scheduling of tasks and communication on the underlying platform.
Article
Resource selection is fundamental to the performance of master/slave applications. In this paper, we address the problem of promoting performance for distributed master/slave applications targeted to distributed, heterogeneous "Grid" resources. We present a work-rate-based model of master/slave application performance which utilizes both system an...
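
A rough sketch of a work-rate-style selection rule, under assumed rate formulas and an assumed master dispatch limit (the paper's model draws on richer system and application information): estimate each worker's sustainable work rate and add workers until the master can no longer keep them busy.

    # Illustrative sketch of work-rate-based worker selection for a master/slave
    # application. The rate formulas and the master's dispatch limit are assumed
    # for illustration; they are not the paper's exact model.

    def effective_rate(flops, bandwidth, work_flop=1e9, work_bytes=1e6):
        # Units of work per second a worker can sustain, limited either by its
        # compute speed or by the bandwidth needed to ship each unit's input.
        return min(flops / work_flop, bandwidth / work_bytes)

    def select_workers(candidates, master_dispatch_rate):
        # candidates: list of (name, flops, bandwidth_bytes_per_s).
        # Greedily add the fastest workers until the master cannot feed more.
        ranked = sorted(candidates, key=lambda c: effective_rate(c[1], c[2]), reverse=True)
        chosen, total = [], 0.0
        for name, flops, bw in ranked:
            rate = effective_rate(flops, bw)
            if total + rate > master_dispatch_rate:
                break
            chosen.append(name)
            total += rate
        return chosen, total

    print(select_workers([("w1", 2e9, 5e6), ("w2", 1e9, 1e5), ("w3", 4e9, 2e6)], 4.0))
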
Conference Paper
Projects like SETI@home have demonstrated the tremendous capabilities of Internet-connected commodity resources. The rapid improvement of commodity components makes the global computing platform increasingly viable for other large-scale data and compute-intensive applications. In this paper, we study how global computing can accommodate new types o...
Conference Paper
Full-text available
As with any computer system, the performance of supercomputers depends upon the workloads that serve as their input. Unfortunately, however, there are many important aspects of the supercomputer workloads that have not been modeled, or that have been modeled only incipiently. This paper attacks this problem by considering requested time (and its re...
Article
Full-text available
Tomography is a popular technique to reconstruct the three-dimensional structure of an object from a series of two-dimensional projections. Tomography is resource-intensive and deployment of a parallel implementation onto Computational Grid platforms has been studied in previous work. In this work, we address on-line execution of the application wh...
Article
The continuing deployment of high-performance network technology enables the development of computing platforms that aggregate widely distributed hardware resources. The vision for such a Computational Grid promises computational platforms of unprecedented power for scientific applications. However, application developers need to rethink implement...
Conference Paper
Tomography is a popular technique to reconstruct the three-dimensional structure of an object from a series of two-dimensional projections. Tomography is resource-intensive and deployment of a parallel implementation onto Computational Grid platforms has been studied in previous work. In this work, we address on-line execution of the application wh...
Article
Full-text available
Advances in networking technologies will soon make it possible to use the global information infrastructure in a qualitatively different way---as a computational as well as an information resource. As described in the recent book "The Grid: Blueprint for a New Computing Infrastructure," this "Grid" will connect the nation's computers, databases, instruments, and...
Article
Over the last decade, technologists have continued to push the envelope by creating more powerful computers and greater disk storage capacities. Increases in network technology have allowed programmers to link resources at increasingly sophisticated levels. Understanding and making scientific inferences with enormous distributed data collections th...
Article
Distributed applications executing on clustered environments typically share resources (computers and network links) with other applications. In such systems, application execution may be retarded by the competition for these shared resources. In this paper, we define a model that calculates the slowdown imposed on applications in time-shared multi...
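
A toy version of the slowdown idea, assuming each shared resource's time scales linearly with the number of competitors (a simplification; the paper defines a more detailed model): scale the dedicated CPU and communication times by the observed contention and report the ratio to the dedicated execution time.

    # Illustrative sketch of a contention slowdown estimate. The assumption that
    # each resource's time scales linearly with the number of competing users is
    # a simplification for illustration, not the paper's exact model.

    def contended_time(dedicated_cpu, dedicated_comm, cpu_load, link_load):
        # cpu_load / link_load: processes or flows sharing each resource,
        # including this application (>= 1).
        return dedicated_cpu * cpu_load + dedicated_comm * link_load

    def slowdown(dedicated_cpu, dedicated_comm, cpu_load, link_load):
        dedicated = dedicated_cpu + dedicated_comm
        return contended_time(dedicated_cpu, dedicated_comm, cpu_load, link_load) / dedicated

    # Example: 60 s of CPU and 20 s of communication, sharing the CPU with one
    # other process and the link with three other flows -> slowdown of 2.5.
    print(slowdown(60.0, 20.0, cpu_load=2, link_load=4))
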
Article
Full-text available
The Computational Grid is a promising platform for the efficient execution of parameter sweep applications over large parameter spaces. To achieve performance on the Grid, such applications must be scheduled so that shared data files are strategically placed to maximize reuse, and so that the application execution can adapt to the deliverable perfo...
Conference Paper
Full-text available
The performance of supercomputer schedulers is influenced by the workloads that serve as their input. Realistic workloads are therefore critical to evaluate how supercomputer schedulers perform in practice. There has been much written in the literature about rigid parallel jobs, i.e. jobs that require partitions of a fixed size to run. However the...
Conference Paper
Full-text available
The Computational Grid is a promising platform for the deployment of various high-performance computing applications. A number of projects have addressed the idea of software as a service on the network. These systems usually implement client-server architectures with many servers running on distributed Grid resources and have commonly been referre...
Article
The Computational Grid (21) is a promising platform for the deployment of large-scale scientific and engineering applications. Parameter Sweep Applications (PSAs) arise in many fields of science and engineering and are structured as sets of "experiments", each of which is executed with a distinct set of parameters. Given that structure, PSAs are pa...
Conference Paper
Full-text available
The Logistical Computing and Internetworking (LoCI) project is a reflection of the way that the next generation internetworking fundamentally changes our definition of high performance wide area computing. A key to achieving this aim is the development of middleware that can provide reliable, flexible, scalable, and cost-effective delivery of data...
Conference Paper
The Computational Grid is a promising platform for the efficient execution of parameter sweep applications over large parameter spaces. To achieve performance on the Grid, such applications must be scheduled so that shared data files are strategically placed to maximize reuse, and so that the application execution can adapt to the deliverable perfo...
Article
Full-text available
The Computational Grid provides a promising platform for the efficient execution of parameter sweep applications over very large parameter spaces. Scheduling such applications is challenging because target resources are heterogeneous, because their load and availability varies dynamically, and because independent tasks may share common data files....
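
To give a flavor of the data-reuse consideration, the sketch below greedily assigns each task to the site where the least input data would have to be transferred, updating each site's cache as files arrive. The greedy rule and the single-bandwidth cost model are assumptions for illustration; the scheduling heuristics studied in this work are richer.

    # Illustrative sketch of placing parameter-sweep tasks to maximize reuse of
    # shared input files. The greedy rule and cost model are assumptions for
    # illustration, not the heuristics evaluated in the papers above.

    def assign_tasks(tasks, sites, bandwidth):
        # tasks: dict task -> set of (file_name, size_bytes) it needs.
        # sites: dict site -> set of file_name already cached there.
        # bandwidth: bytes/s assumed for any wide-area transfer.
        schedule = {}
        for task, files in tasks.items():
            def transfer_cost(site):
                missing = [size for name, size in files if name not in sites[site]]
                return sum(missing) / bandwidth
            site = min(sites, key=transfer_cost)
            schedule[task] = site
            sites[site].update(name for name, _ in files)  # files become reusable
        return schedule

    tasks = {"t1": {("a.dat", 1e9), ("b.dat", 1e9)}, "t2": {("a.dat", 1e9)}}
    sites = {"site1": {"a.dat"}, "site2": set()}
    print(assign_tasks(tasks, sites, bandwidth=1e8))
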
Conference Paper
Full-text available
In this paper, we show how application scheduling can be used to reduce the turn-around time of supercomputer jobs. Our approach focuses on the use of SA, an AppLeS application scheduler, to adaptively craft the request to be submitted to the supercomputer based on the current state of the system. We demonstrate that SA significantly improves a job...
Article
Full-text available
Computational Grids are becoming an increasingly important and powerful platform for the execution of large-scale, resource-intensive applications. However, it remains a challenge for applications to tap into the potential of Grid resources in order to achieve performance. In this paper, we illustrate how work queue applications can leverage Grids t...
Article
Full-text available
There is a current need for scheduling policies that can leverage the performance variability of resources on multiuser clusters. We develop one solution to this problem called stochastic scheduling that utilizes a distribution of application execution performance on the target resources to determine a performance-efficient schedule. In this paper,...
Conference Paper
Resource selection is fundamental to the performance of master/slave applications. In this paper, we address the problem of promoting performance for distributed master/slave applications targeted to distributed, heterogeneous “Grid” resources. We present a work-rate-based model of master/slave application performance which utilizes both system and...
Conference Paper
Full-text available
The computational grid provides a promising platform for the efficient execution of parameter sweep applications over very large parameter spaces. Scheduling such applications is challenging because target resources are heterogeneous, because their load and availability varies dynamically, and because independent tasks may share common data files....
Conference Paper
Full-text available
Computational grids are becoming an increasingly important and powerful platform for the execution of large-scale, resource-intensive applications. However, it remains a challenge for applications to tap into the potential of grid resources in order to achieve performance. In this paper, we illustrate how work queue applications can leverage grids...
Conference Paper
Full-text available
Computational Grids have become an important and popular computing platform for both scientific and commercial distributed computing communities. However, users of such systems typically find achievement of application execution performance remains challenging. Although Grid infrastructures such as Legion and Globus provide basic resource sel...
Conference Paper
The computational grid is becoming the platform of choice for large-scale distributed data-intensive applications. Accurately predicting the transfer times of remote data files, a fundamental component of such applications, is critical to achieving application performance. In this paper, we introduce a performance prediction method, AdRM (Adaptive...
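
As a generic illustration of predicting transfer times from observed network behavior (not the AdRM method itself, whose details are not given here), one can keep a window of recent bandwidth samples and divide the file size by their average.

    # Illustrative sketch of predicting a remote file transfer time from recent
    # bandwidth observations (e.g., probes or past transfers). This is a generic
    # moving-average predictor, not the AdRM method described in the paper.

    from collections import deque

    class TransferPredictor:
        def __init__(self, window=10):
            self.samples = deque(maxlen=window)  # recent bandwidth samples, bytes/s

        def observe(self, bytes_moved, seconds):
            self.samples.append(bytes_moved / seconds)

        def predict_seconds(self, file_size_bytes):
            if not self.samples:
                return None  # no history yet
            avg_bw = sum(self.samples) / len(self.samples)
            return file_size_bytes / avg_bw

    p = TransferPredictor()
    p.observe(5e8, 40.0)   # 12.5 MB/s
    p.observe(4e8, 50.0)   # 8 MB/s
    print(p.predict_seconds(1e9))  # predicted seconds for a 1 GB file
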
Article
Full-text available
Current distributed parallel platforms can provide the resources required to execute a scientific application efficiently. However, when these platforms are shared by multiple users, performance prediction becomes increasingly difficult due to the dynamic behavior of the system. This paper addresses the use of stochastic values, represented by inte...
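
If the stochastic values here are carried as intervals (the abstract is truncated at that point), the basic idea can be sketched as interval arithmetic over predicted task times, so scheduling decisions see a range rather than a single point estimate. The propagation rules below are standard interval operations used only to convey the idea.

    # Illustrative sketch of carrying predicted execution times as (low, high)
    # intervals instead of point values. The propagation rules are the standard
    # interval operations, used here only to convey the idea.

    def add(a, b):
        # Sequential composition: both stages run one after another.
        return (a[0] + b[0], a[1] + b[1])

    def parallel(a, b):
        # Parallel composition: finish when the slower branch finishes.
        return (max(a[0], b[0]), max(a[1], b[1]))

    # Two tasks in parallel followed by a reduction step, all with uncertain times.
    t1, t2, reduce_step = (50.0, 80.0), (60.0, 120.0), (10.0, 15.0)
    makespan = add(parallel(t1, t2), reduce_step)
    print(makespan)  # (70.0, 135.0)
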
Article
The concept of logistical quality of service (QoS) is a generalization of the typical end-to-end model for reserving QoS, permitting much more flexible use of buffering of messages in order to achieve QoS delivery without difficult end-to-end requirements. Logistical QoS is tested as an enabling technology for the Next Generation Internet (NGI) com...