Bruce Hendrickson

  • Sandia National Laboratories

About

154
Publications
26,515
Reads
10,519
Citations
Current institution
Sandia National Laboratories


Publications (154)
Article
Full-text available
It is our view that the state of the art in constructing a large collection of graph algorithms in terms of linear algebraic operations is mature enough to support the emergence of a standard set of primitive building blocks. This paper is a position paper defining the problem and announcing our intention to launch an open effort to define this sta...
Article
The Communications Web site, http://cacm.acm.org, features more than a dozen bloggers in the BLOG@CACM community. In each issue of Communications, we'll publish selected posts or excerpts.
Article
Given its leading role in high-performance computing for modeling and simulation and its many experimental facilities, the US Department of Energy has a tremendous need for data-intensive science. Locating the challenges and commonalities among three case studies illuminates, in detail, the technical challenges involved in realizing data-intensive...
Article
Communities of vertices within a giant network such as the World Wide Web are likely to be vastly smaller than the network itself. However, Fortunato and Barthélemy have proved that modularity maximization algorithms for community detection may fail to resolve communities with fewer than √(L/2) edges, where L is the number of edges in the entire netw...
Article
Full-text available
Graphs are a general approach for representing information that spans the widest possible range of computing applications. They are particularly important to computational biology, web search, and knowledge discovery. As the sizes of graphs increase, the need to apply advanced mathematical and computational techniques to solve these problems is gro...
Article
In the past two decades, computational methods have emerged as an essential component of the scientific and engineering enterprise. A diverse assortment of scientific applications has been simulated and explored via advanced computational techniques. Computer vendors have built enormous parallel machines to support these activities, and the researc...
Conference Paper
Despite decades of activity, parallel computing remains immature. Like much of computer science, advances in the field are driven by a mixture of theoretical insights and technological advances. But in parallel computing, the gap between theory and practice remains disconcertingly wide. Key theoretical concepts in parallel computing were developed...
Article
Large, complex graphs arise in many settings including the Internet, social networks, and communication networks. To study such data sets, the authors explored the use of high-performance computing (HPC) for graph algorithms. They found that the challenges in these applications are quite different from those arising in traditional HPC applications...
Article
Full-text available
BlueGene/L (BG/L), developed through a partnership between IBM and Lawrence Livermore National Laboratory (LLNL), is currently the world's largest system both in terms of scale, with 131,072 processors, and absolute performance, with a peak rate of 367 Tflop/s. BG/L has led the last four Top500 lists with a Linpack rate of 280.6 Tflop/s for the ful...
Article
Over the past half-century, the Applied Mathematics program in the U.S. Department of Energy's Office of Advanced Scientific Computing Research has made significant, enduring advances in applied mathematics that have been essential enablers of modern computational science. Motivated by the scientific needs of the Department of Energy and its predec...
Conference Paper
Since the early days of supercomputing, numerical routines have caused the highest demand for computing power anywhere, making their efficient parallelization one of the core methodical tasks in high-performance computing. And still, many of today’s fastest computers in the world are mostly used for the solution of huge systems of equations as they...
Article
Full-text available
In this paper we apply theoretical and practical results from facility location theory to the problem of community detection in networks. The result is an algorithm that computes bounds on a minimization variant of local modularity. We also define the concept of an edge support and a new measure of the goodness of community structures with respect...
Conference Paper
Full-text available
Search-based graph queries, such as finding short paths and isomorphic subgraphs, are dominated by memory latency. If input graphs can be partitioned appropriately, large cluster-based computing platforms can run these queries. However, the lack of compute-bound processing at each vertex of the input graph and the constant need to retrieve neighbor...
Article
Full-text available
Graph algorithms are becoming increasingly important for solving many problems in scientific computing, data mining and other domains. As these problems grow in scale, parallel computing resources are required to meet their computational and memory requirements. Unfortunately, the algorithms, software, and hardware that have worked well for develop...
Article
Latent semantic analysis (LSA) is a method for information retrieval and processing which is based upon the singular value decomposition. It has a geometric interpretation in which objects (e.g. documents and keywords) are placed in a low-dimensional geometric space. In this paper, we derive an alternative algebraic/geometric method for placing obj...
Article
Full-text available
Support theory is a methodology for bounding eigenvalues and generalized eigenvalues of matrices and matrix pencils; such bounds have been stated both in algebraic terms and in combinatorial terms based on embeddings of the underlying graphs of the matrices. In this paper, we present a theorem that demonstrates the connection between these various...
Conference Paper
Full-text available
The Cray MTA-2 system provides exceptional performance on a variety of sparse graph algorithms. Unfortunately, it was an extremely expensive platform. Cray is preparing an Eldorado platform that leverages the Cray XT3 network and system infrastructure while integrating a new revision of the MTA-2 processors that is pin compatible with the AMD O...
Chapter
This paper addresses the problem of partitioning the nonzeros of sparse nonsymmetric and nonsquare matrices in order to efficiently compute parallel matrix-vector and matrix-transpose-vector multiplies. Our goal is to balance the work per processor while keeping communications costs low. Although the symmetric partitioning problem has been well-stu...
Article
Full-text available
Many emerging applications are built upon large, unstructured datasets that exhibit highly irregular (or even nearly random) memory access patterns. Examples include informatics applications, and other problems that are often represented by unstructured graph-based data structures. It is well known that these applications are challenging for conven...
Article
Full-text available
Combinatorial algorithms have long played a crucial enabling role in scientific and engineering computations. The importance of discrete algorithms continues to grow with the demands of new applications and advanced architectures. This paper surveys some recent developments in this rapidly changing and highly interdisciplinary field.
Conference Paper
Combinatorial algorithms have long played a crucial, albeit under-recognized role in scientific computing. This impact ranges well beyond the familiar applications of graph algorithms in sparse matrices to include mesh generation, optimization, computational biology and chemistry, data analysis and parallelization. Trends in science and in comput...
Chapter
Sparse matrix-vector multiplication is the kernel for many scientific computations. Parallelizing this operation requires the matrix to be divided among processors. This division is commonly phrased in terms of graph partitioning. Although this abstraction has proved to be very useful, it has significant flaws and limitations. The cost model implic...
Article
Full-text available
We will discuss our experiences in designing and using a software infrastructure for processing semantic graphs on massively multithreaded computers. We have developed implementations of several algorithms for connected components, subgraph isomorphism, and s-t connectivity. We will discuss their performance on the existing Cray MTA-2, and thei...
Conference Paper
Full-text available
A new trend in processor design is increased on-chip support for multithreading in the form of both chip multiprocessors and simultaneous multithreading. Recent research in database systems has begun to explore increased thread-level parallelism made possible by these new multicore and multithreaded processors. The question of how best to use t...
Article
Latent semantic analysis (LSA) is a method for information retrieval and processing which is based upon the singular value decomposition. It has a geometric interpretation in which objects (e.g. documents and keywords) are placed in a low-dimensional geometric space. In this paper, we derive an alternative algebraic/geometric method for placing obj...
Conference Paper
Many emerging large-scale data science applications require searching large graphs distributed across multiple memories and processors. This paper presents a distributed breadth-first search (BFS) scheme that scales for random graphs with up to three billion vertices and 30 billion edges. Scalability was tested on IBM BlueGene/L with 32,768 nodes...
Article
Full-text available
In many applications of parallel computing, distribution of the data unambiguously implies distribution of work among processors. But, there are exceptions where some tasks can be assigned to one of several processors without altering the total volume of communication. In this paper, we study the problem of exploiting this flexibility in assignment...
Article
Full-text available
The method of discrete ordinates is commonly used to solve the Boltzmann transport equation. The solution in each ordinate direction is most efficiently computed by sweeping the radiation flux across the computational grid. For unstructured grids this poses many challenges, particularly when implemented on distributed-memory parallel machines where...
Article
The traditional, serial, algorithm for finding the strongly connected components in a graph is based on depth first search and has complexity which is linear in the size of the graph. Depth first search is difficult to parallelize, which creates a need for a different parallel algorithm for this problem. We describe the implementation of a recently...
Article
Data partitioning and load balancing are important components of parallel computations. Many different partitioning strategies have been developed, with great effectiveness in parallel applications. But the load-balancing problem is not yet solved completely; new applications and architectures require new partitioning features. Existing algorithms...
Article
As the need for complex parallel simulation software grows, better strategies for efficient and effective software development become important. We advocate a toolkit-or 'tinkertoy'-approach to parallel application development. By providing efficient implementations of basic services commonly needed by applications, toolkits allow application devel...
Article
Keywords: design and analysis of algorithms; graph algorithms; parallel algorithms; strongly connected components; divide-and-conquer; discrete ordinates method. Abstract: Strongly connected components of a directed graph can be found in an optimal linear time, by algorithms based on depth first search. Unfortunately, depth first search is difficult to paralleli...
Article
Full-text available
Combinatorial algorithms have long played a pivotal enabling role in many applications of parallel computing. Graph algorithms in particular arise in load balancing, scheduling, mapping and many other aspects of the parallelization of irregular applications. These are still active research areas, mostly due to evolving computational techniques and...
Article
This paper analyses a novel method for constructing preconditioners for diagonally dominant symmetric positive-definite matrices. The method discussed here is based on a simple idea: we construct M by simply dropping off-diagonal nonzeros from A and modifying the diagonal elements to maintain a certain row-sum property. The preconditioners are exte...
Article
Full-text available
We consider linear systems arising from the use of the finite element method for solving scalar linear elliptic problems. Our main result is that these linear systems, which are symmetric and positive semidefinite, are well approximated by symmetric diagonally dominant matrices. Our framework for defining matrix approximation is support theory. Sig...
Article
We show in this note how support preconditioners can be applied to a class of linear systems arising from use of the finite element method to solve linear elliptic problems. Our technique reduces the problem, which is symmetric and positive definite, to a symmetric positive definite diagonally dominant problem. Significant theory has already been d...
Article
Full-text available
Many parallel applications require periodic redistribution of workloads and associated data. In a distributed memory computer, this redistribution can be difficult if limited memory is available for receiving messages. We propose a model for optimizing the exchange of messages under such circumstances which we call the minimum phase remapping probl...
Article
Full-text available
We present support theory, a set of techniques for bounding extreme eigenvalues and condition numbers for matrix pencils. Our intended application of support theory is to enable proving condition number bounds for preconditioners for symmetric, positive definite systems. One key feature sets our approach apart from most other works: We use support...
Article
Discrete ordinates methods are commonly used to simulate radiation transport for fire or weapons modeling. The computation proceeds by sweeping the flux across a grid. A particular cell can't be computed until all the cells immediately upwind of it are finished. If the directed dependence graph for the grid cells contains a cycle then sweeping meth...
Article
Full-text available
The Zoltan library is a collection of data management services for parallel, unstructured, adaptive, and dynamic applications that is available as open-source software from www.cs.sandia.gov/zoltan. It simplifies the load-balancing, data movement, unstructured-communication, and memory usage difficulties that arise in dynamic applications such as a...
Article
The explosive growth in the availability of information is overwhelming traditional information management systems. Although individual pieces of information have become easy to find, the larger context in which they exist has become harder to track. These contextual questions are ideally suited to visualization since the human visual system is re...
Conference Paper
In many applications of parallel computing, distribution of the data unambiguously implies distribution of work among processors. But there are exceptions where some tasks can be assigned to one of several processors without altering the total volume of communication. In this paper, we study the problem of exploiting this flexibility in assignment...
Article
Envelope methods for solving sparse systems of linear equations require the matrix to be reordered so that the nonzeros are near the diagonal. Optimal reorderings are known to be NP-complete, but a variety of heuristics have been proposed. In this paper we describe a multilevel approach for finding small envelope orderings and related ordering prob...
Article
Introduction and General Principles; Philosophy of Zoltan; Coding Principles in Zoltan; Include files; Global Variables; Function Names; Parallel Communication; Memory Management; Errors, Warnings and Return Codes; Zoltan Distribution; CVS; Layout of Directories; Compilation and Makefiles; Load-Balancing Interface and Data Structures; Interface Functions; ID Data...
Article
Full-text available
We present a little-known preconditioning technique, called support-graph preconditioning, and use it to analyze two classes of preconditioners. The technique was first described in a talk by Pravin Vaidya, who did not formally publish his results. Vaidya used the technique to devise and analyze a class of novel preconditioners. The technique was...
Conference Paper
Full-text available
Graph partitioning is an important tool for dividing work amongst processors of a parallel machine, but it is unsuitable for some important applications. Specifically, graph partitioning requires the work per processor to be a simple sum of vertex weights. For many applications, this assumption is not true --- the work (or memory) is a complex func...
Conference Paper
Developing parallel software for unstructured problems continues to be a difficult undertaking, particularly for distributed memory machines. Framework and library support are limited for non-standard applications and developers are often forced to code from scratch. This is particularly true for complex, unstructured applications. In this paper, we s...
Conference Paper
Full-text available
This memory cannot be utilized in subsequent phases, decreasing the total memory which is usable for communication, thus potentially increasing the number of phases. Instead, another processor can temporarily move some of its data to this processor to free up space for messages. An example is illustrated in Fig. 3. In this simple example, the top two pr...
Conference Paper
The method of discrete ordinates is commonly used to solve the Boltzmann radiation transport equation for applications ranging from simulations of fires to weapons effects. The equations are most efficiently solved by sweeping the radiation flux across the computational grid. For unstructured grids this poses several interesting challenges, particu...
Article
Calculations can naturally be described as graphs in which vertices represent computation and edges reflect data dependencies. By partitioning the vertices of a graph, the calculation can be divided among processors of a parallel computer. However, the standard methodology for graph partitioning minimizes the wrong metric and lacks expressibility....
Article
Three classes of parallel algorithms for short-range classical molecular dynamics are presented and contrasted and their suitability for simulation of molecular systems is discussed. Performance of the algorithms on the Intel Paragon and Cray T3D in benchmark simulations of Lennard-Jones systems and of a macromolecular system is also highlighted....
Article
Parallel computing offers new capabilities for using molecular dynamics (MD) to simulate larger numbers of atoms and longer time scales. In this paper we discuss two methods we have used to implement the embedded atom method (EAM) formalism for molecular dynamics on multiple-instruction/multiple-data (MIMD) parallel computers. The first method (ato...
Article
Full-text available
Grid partitioning is the method of choice for decomposing a wide variety of computational problems into naturally parallel pieces. In problems where computational load on the grid or the grid itself changes as the simulation progresses, the ability to repartition dynamically and in parallel is attractive for achieving higher performance. We describ...
Article
Effective use of a parallel computer requires that a calculation be carefully divided among the processors. This load balancing problem appears in many guises and has been a fervent area of research for the past decade or more. Although great progress has been made, and useful software tools developed, a number of challenges remain. It is the convi...
Conference Paper
The standard serial algorithm for strongly connected components is based on depth first search, which is difficult to parallelize. We describe a divide-and-conquer algorithm for this problem which has significantly greater potential for parallelization. For a graph with n vertices in which degrees are bounded by a constant, we show the expected ser...
Article
In many important computational mechanics applications, the computation adapts dynamically during the simulation. Examples include adaptive mesh refinement, particle simulations and transient dynamics calculations. When running these kinds of simulations on a parallel computer, the work must be assigned to processors in a dynamic fashion to keep th...
Conference Paper
The design of general-purpose dynamic load-balancing tools for parallel applications is more challenging than the design of static partitioning tools. Both algorithmic and software engineering issues arise. We have addressed many of these issues in the design of the Zoltan dynamic load-balancing library. Zoltan has an object-oriented interface that...
Conference Paper
Full-text available
Many parallel applications require periodic redistribution of workloads and associated data. In a distributed memory computer, this redistribution can be difficult if limited memory is available for receiving messages. We propose a model for optimizing the exchange of messages under such circumstances which we call the minimum phase remapping probl...
Article
The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is...
Article
A common operation in scientific computing is the multiplication of a sparse, rectangular or structurally nonsymmetric matrix and a vector. In many applications the matrix-transpose-vector product is also required. This paper addresses the efficient parallelization of these operations. We show that the problem can be expressed in terms of partition...
Article
A common operation in scientific computing is the multiplication of a sparse, rectangular or structurally nonsymmetric matrix and a vector. In many applications the matrix-transpose-vector product is also required. This paper addresses the efficient parallelization of these operations. We show that the problem can be expressed in terms of partitio...
Conference Paper
Algorithms for finding the prime factors of large composite numbers are of practical importance because of the widespread use of public key cryptosystems whose security depends on the presumed difficulty of the factorisation problem. In recent years ...
Article
A method of data mining represents related items in a multidimensional space. Distance between items in the multidimensional space corresponds to the extent of relationship between the items. The user can select portions of the space to perceive. The user also can interact with and control the communication of the space, focusing attention on aspec...
Article
We describe a general strategy we have found effective for parallelizing solid mechanics simulations. Such simulations often have several computationally intensive parts, including finite element integration, detection of material contacts, and particle interaction if smoothed particle hydrodynamics is used to model highly deforming materials. The...
Article
A common operation in scientific computing is the multiplication of a sparse, rectangular or structurally nonsymmetric matrix and a vector. In many applications the matrix-transpose-vector product is also required. This paper addresses the efficient parallelization of these operations. We show that the problem can be expressed in terms of partition...
Article
A number of computational procedures employ multiple grids on which solutions are computed. For example, in multi-physics simulations a primary grid may be used to compute mechanical deformation of an object while a secondary grid is used for thermal conduction calculations. When modeling coupled thermo-mechanical effects, solution data must be int...
Article
This paper addresses the problem of partitioning the nonzeros of sparse nonsymmetric and nonsquare matrices in order to efficiently compute parallel matrix-vector and matrix-transpose-vector multiplies. Our goal is to balance the work per processor while keeping communications costs low. Although the symmetric partitioning problem has been well-s...
Article
Full-text available
The explosive growth in the availability of information is overwhelming traditional information management systems. Although individual pieces of information have become easy to find, the larger context in which they exist has become harder to track. These contextual questions are ideally suited to visualization since the human visual system is rem...
Article
An efficient, scalable, parallel algorithm for treating material surface contacts in solid mechanics finite element programs has been implemented in a modular way for multiple-instruction, multiple-data (MIMD) parallel computers. The serial contact detection algorithm that was developed previously for the transient dynamics finite element code PRON...
Article
Transient dynamics simulations are commonly used to model phenomena such as car crashes, underwater explosions, and the response of shipping containers to high-speed impacts. Physical objects in such a simulation are typically represented by Lagrangian meshes because the meshes can move and deform with the objects as they undergo stress. Fluids (ga...
Article
We describe a parallel algorithm for finding the eigenvalues and eigenvectors of a dense symmetric matrix, with an emphasis on the dense linear algebra operations. We follow the traditional three-step process: reduce to tridiagonal form, solve the tridiagonal problem, then backtransform the result. Since the different steps have different algorithm...
Conference Paper
Full-text available
We describe our parallelization of PRONTO, Sandia's transient solid dynamics code, via a novel algorithmic approach that utilizes multiple decompositions for different key segments of the computations, including the material contact calculation. This latter calculation is notoriously difficult to perform well in parallel, because it involves dynami...
Article
Many important macroscopic properties of materials depend upon the number of microscopic degrees of freedom. The task of counting the number of such degrees of freedom can be computationally very expensive. We describe a new approach for this calculation which is appropriate for two-dimensional, glass-like networks, building upon recent work in gra...
Conference Paper
An efficient, scalable, parallel algorithm for treating contacts in solid mechanics has been applied to interactions between particles in smooth particle hydrodynamics (SPH). The algorithm uses three different decompositions within a single timestep: (1) a static FE-decomposition of mesh elements; (2) a dynamic SPH-decomposition of SPH particles; (...
Conference Paper
Full-text available
Graph partitioning is an important abstraction used in solving many scientific computing problems. Unfortunately, the standard partitioning model does not incorporate considerations that are important in many settings. We address this by describing a generalized partitioning model which incorporates the notion of partition skew and is applicable to...
Article
Full-text available
Transient dynamics simulations are commonly used to model phenomena such as car crashes, underwater explosions, and the response of shipping containers to high-speed impacts. Physical objects in such a simulation are typically represented by Lagrangian meshes because the meshes can move and deform with the objects as they undergo stress. Fluids (...
Article
Full-text available
Envelope methods for solving sparse systems of linear equations require the matrix to be reordered so that the nonzeros are near the diagonal. Optimal reorderings are known to be NP-complete, but a variety of heuristics have been proposed. In this paper we describe a multilevel approach for finding small envelope orderings and related ordering pr...
Conference Paper
Transient dynamics simulations are commonly used to model phenomena such as car crashes, underwater explosions, and the response of shipping containers to high-speed impacts. Physical objects in such a simulation are typically represented by Lagrangian meshes because the meshes can move and deform with the objects as they undergo stress. Fluids (ga...
Conference Paper
Full-text available
Terminal propagation is a method developed in the circuit placement community for adding constraints to graph partitioning problems. This paper adapts and expands this idea, and applies it to the problem of partitioning data structures among the processors of a parallel computer. We show how the constraints in terminal propagation can be used to en...
Article
Short-range molecular dynamics simulations of molecular systems are commonly parallelized by replicated-data methods, where each processor stores a copy of all atom positions. This enables computation of bonded 2-, 3-, and 4-body forces within the molecular topology to be partitioned among processors straightforwardly. A drawback to such metho...
Article
Simulations of interacting particles are common in science and engineering, appearing in such diverse disciplines as astrophysics, fluid dynamics, molecular physics, and materials science. These simulations are often computationally intensive and so are natural candidates for massively parallel computing. Many-body simulations that directly compute...
Article
Full-text available
The multiplication of a vector by a matrix is the kernel operation in many algorithms used in scientific computation. A fast and efficient parallel algorithm for this calculation is therefore desirable. This paper describes a parallel matrix-vector multiplication algorithm which is particularly well suited to dense matrices or matrices with an i...
Article
Full-text available
Efficient use of a distributed memory parallel computer requires that the computational load be balanced across processors in a way that minimizes interprocessor communication. A new domain mapping algorithm is presented that extends recent work in which ideas from spectral graph theory have been applied to this problem. The generalization of spect...
Conference Paper
Full-text available
The graph partitioning problem is that of dividing the vertices of a graph into sets of specified sizes such that few edges cross between sets. This NP-complete problem arises in many important scientific and engineering problems. Prominent examples include the decomposition of data structures for parallel computation, the placement of circuit elem...
Article
Full-text available
Many scientific and engineering applications require a detailed analysis of complex systems with strongly coupled fluid flow, thermal energy transfer, mass transfer, and nonequilibrium chemical reactions. Here we describe the performance of a newly developed application code, SALSA, designed to simulate these complex flows on large-scale parallel mac...
Conference Paper
Given a set of objects and a correlation function f reflecting the desire for two items to be near each other, find all sequences π of the items so that correlation preferences are preserved; that is, if π(i) < π(j) < π(k) then f(i,j) ≥ f(i,k) and f(j,k) ≥ f(i,k). This seriation problem has numerous applications, for instance, solv...
Conference Paper
Full-text available
Grid partitioning is the method of choice for decomposing a wide variety of computational problems into naturally parallel pieces. In problems where computational load on the grid or the grid itself changes as the simulation progresses, the ability to repartition dynamically and in parallel is attractive for achieving higher performance. We describ...
