Henri Casanova

  • University of Hawaiʻi at Mānoa

About

259
Publications
53,493
Reads
10,035
Citations
Introduction
parallel and distributed computing
Current institution
University of Hawaiʻi at Mānoa

Publications (259)
Preprint
Full-text available
The Workflows Community Summit gathered 111 participants from 18 countries to discuss emerging trends and challenges in scientific workflows, focusing on six key areas: time-sensitive workflows, AI-HPC convergence, multi-facility workflows, heterogeneous HPC environments, user experience, and FAIR computational workflows. The integration of AI and...
Article
Full-text available
Predicting the performance of various infrastructure design options in complex federated infrastructures with computing sites distributed over a wide area network that support a plethora of users and workflows, such as the Worldwide LHC Computing Grid (WLCG), is not trivial. Due to the complexity and size of these infrastructures, it is not feasibl...
Preprint
Full-text available
Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given th...
Chapter
Runtime systems that automate the execution of applications on distributed cyberinfrastructures need to make scheduling decisions. Researchers have proposed many scheduling algorithms, but most of them are designed based on analytical models and assumptions that may not hold in practice. The literature is thus rife with algorithms that have been ev...
Preprint
Full-text available
The prevalence of scientific workflows with high computational demands calls for their execution on various distributed computing platforms, including large-scale leadership-class high-performance computing (HPC) clusters. To handle the deployment, monitoring, and optimization of workflow executions, many workflow systems have been developed over t...
Article
We present parallel algorithms to efficiently permute a sorted array into the level-order binary search tree (BST), level-order B-tree (B-tree), and van Emde Boas (vEB) layouts in-place. We analytically determine the complexity of our algorithms and empirically measure their performance. When considering the total time to permute the data in-place...
Article
Scientific workflows are a cornerstone of modern scientific computing. They are used to describe complex computational applications that require efficient and robust management of large volumes of data, which are typically stored/processed on heterogeneous, distributed resources. The workflow research and development community has employed a number...
Preprint
Full-text available
The importance of workflows is highlighted by the fact that they have underpinned some of the most significant discoveries of the past decades. Many of these workflows have significant computational, storage, and communication demands, and thus must execute on a range of large-scale computer systems, from local clusters to public clouds and upcomin...
Conference Paper
This paper summarizes the WORKS 2021 lightning talks, which cover four broad topics: (i) libEnsemble, a Python library to coordinate the concurrent evaluation of dynamic ensembles of calculations; (ii) EduWRENCH, a set of online pedagogic modules that provides simulation-driven hands-on activity in the browser; (iii) VisDict, an envisioned visual...
Chapter
Many scientific workflows have computational demands that require the use of compute platforms managed by batch schedulers, which are unfortunately poorly suited to these applications. This work proposes GLUME, a strategy for partitioning a workflow into batch jobs. The novelty is that these jobs are explicitly constructed to minimize overall workf...
Preprint
Full-text available
The landscape of workflow systems for scientific applications is notoriously convoluted with hundreds of seemingly equivalent workflow systems, many isolated research claims, and a steep learning curve. To address some of these challenges and lay the groundwork for transforming workflows research and development, the WorkflowsRI and ExaWorks projec...
Preprint
Full-text available
Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale H...
Chapter
Improving energy efficiency has become necessary to enable sustainable computational science. At the same time, scientific workflows are key in facilitating distributed computing in virtually all domain sciences. As data and computational requirements increase, I/O-intensive workflows have become prevalent. In this work, we evaluate the ability of...
Article
Teaching parallel and distributed computing topics in a hands-on manner is challenging, especially at introductory, undergraduate levels. Participation challenges arise due to the need to provide students with an appropriate compute platform, which is not always possible. Even if a platform is provided to students, not all relevant learning objecti...
Preprint
Full-text available
Scientific workflows are a cornerstone of modern scientific computing. They are used to describe complex computational applications that require efficient and robust management of large volumes of data, which are typically stored/processed on heterogeneous, distributed resources. The workflow research and development community has employed a number...
Preprint
Full-text available
Scientific workflow applications have become mainstream and their automated and efficient execution on large-scale compute platforms is the object of extensive research and development. For these efforts to be successful, a solid experimental methodology is needed to evaluate workflow algorithms and systems. A foundation for this methodology is the...
Preprint
Full-text available
Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upco...
Preprint
The emergence of Big Data in recent years has resulted in a growing need for efficient data processing solutions. While infrastructures with sufficient compute power are available, the I/O bottleneck remains. The Linux page cache is an efficient approach to reduce I/O overheads, but few experimental studies of its interactions with Big Data applica...
Article
Full-text available
Lightning talks of EduHPC are a venue where HPC educators discuss work in progress. This paper summarizes the EduHPC 2020 lightning talks, which cover four very different areas: (i) The simulation-based pedagogy of the EduWRENCH project, including motivations for using simulation to teach High Performance Computing, the design principles underlying EduWR...
Preprint
Scientific workflows are a cornerstone of modern scientific computing. They are used to describe complex computational applications that require efficient and robust management of large volumes of data, which are typically stored/processed at heterogeneous, distributed resources. The workflow research and development community has employed a number...
Article
Full-text available
This paper focuses on scheduling problems related to the execution of computational jobs in datacenters with thermal constraints. Mixed integer linear programming (MILP) formulations are proposed that encompass both spatial and temporal aspects of the temperature evolution under a unified model. This model takes into account the dynamics of heat pr...
Article
While distributed computing infrastructures can provide infrastructure-level techniques for managing energy consumption, application-level energy consumption models have also been developed to support energy-efficient scheduling and resource provisioning algorithms. In this work, we analyze the accuracy of a widely-used application-level model that...
Article
Scientific workflows are used routinely in numerous scientific domains, and Workflow Management Systems (WMSs) have been developed to orchestrate and optimize workflow executions on distributed platforms. WMSs are complex software systems that interact with complex software infrastructures. Most WMS research and development activities rely on empir...
Conference Paper
Wireless interconnects based on inductive coupling technology are compelling propositions for designing 3-D integrated chips. This work addresses the heat dissipation problem on such systems. Although effective cooling technologies have been proposed for systems designed based on Through Silicon Via (TSV), their application to systems that use indu...
Chapter
While distributed computing infrastructures can provide infrastructure-level techniques for managing energy consumption, application-level energy consumption models have also been developed to support energy-efficient scheduling and resource provisioning algorithms. In this work, we analyze the accuracy of a widely-used application-level model that...
Conference Paper
Scientific workflows are used routinely in numerous scientific domains, and Workflow Management Systems (WMSs) have been developed to orchestrate and optimize workflow executions on distributed platforms. WMSs are complex software systems that interact with complex software infrastructures. Most WMS research and development activities rely on empir...
Article
Let T be a terrain and P be a set of points on its surface. An important problem in Geographic Information Science (GIS) is computing the visibility index of a point p on P, that is, the number of points in P that are visible from p. The total visibility-index problem asks for the visibility index of every point in P. We present the first subquadra...
Conference Paper
We study the relationship between memory accesses, bank conflicts, thread multiplicity (also known as over-subscription) and instruction-level parallelism in comparison-based sorting algorithms for Graphics Processing Units (GPUs). We experimentally validate a proposed formula that relates these parameters with asymptotic analysis of the number of...
Article
Applications structured as Directed Acyclic Graphs (DAGs) of tasks occur in many domains, including popular scientific workflows. DAG scheduling has thus received an enormous amount of attention. Many of the popular DAG scheduling heuristics make scheduling decisions based on path lengths. At large scale, compute platforms are subject to various typ...
Article
Full-text available
We consider the problem of orchestrating the execution of workflow applications structured as Directed Acyclic Graphs (DAGs) on parallel computing platforms that are subject to fail-stop failures. The objective is to minimize expected overall execution time, or makespan. A solution to this problem consists of a schedule of the workflow tasks on the...
Article
Full-text available
Sorting is a primitive operation that is a building block for countless algorithms. As such, it is important to design sorting algorithms that approach peak performance on a range of hardware architectures. Graphics Processing Units (GPUs) are particularly attractive architectures as they provide massive parallelism and computing power. However, t...
Conference Paper
We present an O(n log²n) algorithm that solves the 1D total visibility-index problem in the RAM model. Our algorithm is based on a geometric dualization technique, which reduces the problem into a set of instances of the red-blue line segment intersection counting problem. We also present a parallel version of this algorithm, which requires O(log²n...
Conference Paper
Designing low-latency network topologies of switches is a key objective for next-generation parallel computing platforms. Low latency is preconditioned on low hop counts, but existing network topologies have hop counts much larger than theoretical lower bounds. The degree diameter problem (DDP) has been studied for decades and consists in generatin...
Article
Full-text available
We study the scheduling of computational workflows on compute resources that experience exponentially distributed failures. When a failure occurs, rollback and recovery is used to resume the execution from the last checkpointed state. The scheduling problem is to minimize the expected execution time by deciding in which order to execute the tasks i...
Article
The off-line (or post-mortem) analysis of execution event traces is a popular approach to understand the performance of HPC applications that use the message passing paradigm. Combining this analysis with simulation makes it possible to “replay” the application execution to explore “what if?” scenarios, e.g., assessing application performance in a...
Article
The processing of moving object trajectories arises in many application domains. We focus on a trajectory similarity search, the distance threshold search, which finds all trajectories within a given distance of a query trajectory over a time interval. A multithreaded CPU implementation that makes use of an in-memory R-tree index can achieve high p...
Article
Processor failures in post-petascale parallel computing platforms are common occurrences. The traditional fault-tolerance solution, checkpoint-rollback-recovery, severely limits parallel efficiency. One solution is to replicate application processes so that a processor failure does not necessarily imply an application failure. Process replication,...
Article
Various network topologies can be used for deploying High Performance Computing (HPC) clusters. The network topology, which connects switches in cabinets on a machine room floor, is typically defined once and for all at system deployment time. For a diverse application workload, there are downsides to having a single wired topology. In this work, w...
Article
Applications in many domains perform searches over datasets that contain moving object trajectories. A common class of searches are similarity searches that attempt to identify trajectories with similar characteristics. In this work, we focus on the distance threshold similarity search that finds all trajectories within a given distance of a query...
Article
Numerical linear algebra libraries provide many kernels that can be composed to perform complex computations. For a given computation, there is typically a large number of functionally equivalent kernel compositions. Some of these compositions achieve better response times than others for particular data and when executed on a particular computer a...
Article
Applications in many domains require processing moving object trajectories. In this work, we focus on a trajectory similarity search that finds all trajectories within a given distance of a query trajectory over a time interval, which we call the distance threshold similarity search. We develop three indexing strategies with spatial, temporal and s...
Article
The study of parallel and distributed applications and platforms, whether in the cluster, grid, peer-to-peer, volunteer, or cloud computing domain, often mandates empirical evaluation of proposed algorithmic and system solutions via simulation. Unlike direct experimentation via an application deployment on a real-world testbed, simulation enables f...
Article
Full-text available
We study the scheduling of computational workflows on compute resources that experience exponentially distributed failures. When a failure occurs, rollback and recovery is used to resume the execution from the last checkpointed state. The scheduling problem is to minimize the expected execution time by deciding in which order to execute the tasks in t...
Article
Processing moving object trajectories arises in many application domains and has been addressed by practitioners in the spatiotemporal database and Geographical Information System communities. In this work, we focus on a trajectory similarity search, the distance threshold query, which finds all trajectories within a given distance d of a search tr...
Conference Paper
With low-delay switches on the horizon, end-to-end latency in large-scale High Performance Computing (HPC) interconnects will be dominated by cable delays. In this context we define a new network topology, Skywalk, for deploying low-latency interconnects in upcoming HPC systems. Skywalk uses randomness to achieve low latency, but does so in a way t...
Conference Paper
Full-text available
The processing of queries expressed as trees of boolean operators applied to predicates on sensor data streams has several applications in mobile computing. Sensor data must be retrieved from the sensors, which incurs a cost, e.g., an energy expense that depletes the battery of a mobile query processing device. The objective is to determine the ord...
Article
Analyzing and understanding the performance behavior of parallel applications on parallel computing platforms is a long-standing concern in the High Performance Computing community. When the targeted platforms are not available, simulation is a reasonable approach to obtain objective performance indicators and explore various hypothetical scenarios...
Article
Full-text available
In this paper, we study the execution of iterative applications on volatile processors such as those found on desktop grids. We envision two models, one where all tasks are assumed to be independent, and another where all tasks are tightly coupled and keep exchanging information throughout the iteration. These two models cover the two extreme point...
Article
Random network topologies have been proposed to create low-diameter, low-latency interconnection networks in large-scale computing systems. However, these topologies are difficult to deploy in practice, especially when re-designing existing systems, because they lead to increased total cable length and cable packaging complexity. In this work we pr...
Article
Full-text available
Researchers in the area of grid/cloud computing perform many of their experiments using simulations that must capture network behavior. In this context, packet-level simulations, which are widely used to study network protocols, are too costly given the typical large scales of simulated systems and applications. An alternative is to implement netwo...
Article
Full-text available
In this paper we present SimGrid, a toolkit for the versatile simulation of large scale distributed systems, whose development effort has been sustained for the last fifteen years. Over this time period SimGrid has evolved from a one-laboratory project in the U.S. into a scientific instrument developed by an international collaboration. The keys to...
Conference Paper
Full-text available
Platforms that comprise volatile processors, such as desktop grids, have been traditionally used for executing independent-task applications. In this work we study the scheduling of tightly-coupled iterative master-worker applications onto volatile processors. The main challenge is that workers must be simultaneously available for the application t...
Article
Full-text available
High performance computing applications must be resilient to faults. The traditional fault-tolerance solution is checkpoint-recovery, by which application state is saved to and recovered from secondary storage throughout execution. It has been shown that, even when using an optimal checkpointing strategy, the checkpointing overhead precludes high p...
Conference Paper
Full-text available
Platforms that comprise volatile processors, such as desktop grids, have been traditionally used for executing independent-task applications. In this work we study the scheduling of tightly-coupled iterative master-worker applications onto volatile processors. The main challenge is that workers must be simultaneously available for the application t...
Conference Paper
As the scales of parallel applications and platforms increase, the negative impact of communication latencies on performance becomes large. Random network topologies can be used to achieve low hop counts between nodes and thus low latency. However, random topologies lead to increased aggregate cable length and cable packaging complexity on a machine...
Conference Paper
As the scales of supercomputers increase, total cable length becomes enormous, e.g., up to thousands of kilometers. Recent high-radix switches with dozens of ports make switch layout and system packaging more complex. In this work, we study the optimization of the physical layout of topologies of switches on a machine room floor with the goal of re...
Article
As the scales of parallel applications and platforms increase, the negative impact of communication latencies on performance becomes large. Fortunately, modern High Performance Computing (HPC) systems can exploit low-latency topologies of high-radix switches. In this context, we propose the use of random shortcut topologies, which are generated by a...
Article
Problem and Motivation; Energy-Aware Infrastructures; Current Resource Management Practices; Scientific and Technical Challenges; Energy-Aware Job Placement Algorithms; Discussion; Conclusion; References
Article
Full-text available
Processor failures in post-petascale settings are common occurrences. The traditional fault-tolerance solution, checkpoint-rollback, severely limits parallel efficiency. One solution is to replicate application processes so that a processor failure does not necessarily imply an application failure. Process replication, combined with checkpoint-roll...
Conference Paper
Full-text available
We propose algorithms for allocating multiple resources to competing services running in virtual machines on heterogeneous distributed platforms. We develop a theoretical problem formulation and compare these algorithms via simulation experiments based in part on workload data supplied by Google. Our main finding is that vector packing approaches p...
Article
In this paper we study the problem of energy-aware resource allocation for hosting long-term services or on-demand computing jobs in clusters, e.g., deployed as part of computing infrastructures. We formalize the problem as three constrained optimization problems: maximize job performance under power consumption constraints, minimize power consumpt...
Article
We present an approach to generate shots for 3D computer graphics cinematic sequences from event-based descriptions of scenes of conversations between groups of actors. Our approach creates camera setups using a combination of geometric constraints and aesthetic parameters, while ensuring that the resulting cinematic sequence obeys the heuristics o...
Article
Full-text available
We propose a novel job scheduling approach for homogeneous cluster computing platforms. Its key feature is the use of virtual machine technology to share fractional node resources in a precise and controlled manner. Other VM-based scheduling approaches have focused primarily on technical issues or extensions to existing batch scheduling systems, wh...
Article
Full-text available
Researchers in the area of distributed computing conduct many of their experiments in simulation. While packet-level simulation is often used to study network protocols, it can be too costly to simulate network communications for large-scale systems and applications. The alternative is to simulate the network based on less costly flow-level models....
Technical Report
Full-text available
The study of parallel and distributed applications and platforms, whether in the cluster, grid, peer-to-peer, volunteer, or cloud computing domain, often mandates empirical evaluation of proposed algorithm and system solutions via simulation. Unlike direct experimentation via an application deployment on a real-world testbed, simulation enables ful...
Conference Paper
Full-text available
Placing compute jobs on clustered hosts in a way that optimizes both performance and power consumption has become a crucial issue. Most solutions to the power-aware job placement problem boil down to consolidating workload on a small number of hosts so as to reduce power consumption while achieving acceptable performance levels. The question we inv...
Conference Paper
Full-text available
In this paper we study the execution of iterative applications on volatile processors such as those found on desktop grids. We develop master-worker scheduling schemes that attempt to achieve good trade-offs between worker speed and worker availability. A key feature of our approach is that we consider a communication model where the bandwidth capa...
Article
Full-text available
An alternative to classical fault-tolerant approaches for large-scale clusters is failure avoidance, by which the occurrence of a fault is predicted and a preventive measure is taken. We develop analytical performance models for two types of preventive measures: preventive checkpointing and preventive migration. We instantiate these models for plat...
Conference Paper
Full-text available
Simulation is a popular approach for empirically evaluating the performance of algorithms and applications in the parallel computing domain. Most published works present results without quantifying simulation error. In this work we investigate accuracy issues when simulating the execution of parallel applications. This is a broad question, and we f...
Conference Paper
Full-text available
Simulation is a popular approach for predicting the performance of MPI applications for platforms that are not at one's disposal. It is also a way to teach the principles of parallel programming and high-performance computing to students without access to a parallel computer. In this work we present SMPI, a simulator for MPI applications that uses...
Article
Full-text available
This work provides a rigorous analysis of checkpointing strategies for sequential and parallel jobs. The objective is to minimize the expected job execution time in an environment that is subject to processor failures. For sequential jobs, we give the optimal solution if failure inter-arrival times are exponentially distributed. To the best of our...
