Jack Dongarra
University of Tennessee, Knoxville (UTK) · Department of Electrical Engineering and Computer Science

Doctor of Philosophy

About

1,772 Publications
204,431 Reads
69,117 Citations
Additional affiliations
  • April 2007 - present: The University of Manchester, Turing Fellow
  • January 2002 - present: Rice University, Professor (Associate)
  • June 1993 - August 1993: Ecole normale supérieure de Lyon, Visiting Scientist

Publications (1,772)
Article
Full-text available
Convex optimization solvers are widely used in embedded systems that require sophisticated optimization algorithms, including model predictive control (MPC). In this paper, we aim to reduce the online solve time of such convex optimization solvers so as to reduce the total runtime of the algorithm and make it suitable for real-time convex optimi...
Article
This article describes a standard API for a set of Batched Basic Linear Algebra Subprograms (Batched BLAS or BBLAS). The focus is on many independent BLAS operations on small matrices that are grouped together and processed by a single routine, called a Batched BLAS routine. The matrices are grouped together in uniformly sized groups, with just one...
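The grouped, uniformly sized batching described above can be illustrated in a few lines; the sketch below uses NumPy's stacked matmul as a stand-in for a Batched BLAS routine (names and sizes are illustrative, not the BBLAS API itself).

```python
# Illustration of the Batched BLAS idea: many small, independent GEMMs of
# uniform size, grouped together and processed by a single routine call.
import numpy as np

rng = np.random.default_rng(0)
batch, m, k, n = 1000, 8, 8, 8
A = rng.standard_normal((batch, m, k))
B = rng.standard_normal((batch, k, n))

C_batched = np.matmul(A, B)                              # one call handles the whole group
C_loop = np.stack([A[i] @ B[i] for i in range(batch)])   # equivalent one-at-a-time loop
assert np.allclose(C_batched, C_loop)
```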
Article
Full-text available
The generalized minimum residual method (GMRES) is a commonly used iterative Krylov solver for sparse, non-symmetric systems of linear equations. Like other iterative solvers, data movement dominates its run time. To improve this performance, we propose running GMRES in reduced precision with key operations remaining in full precision. Additionally...
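A minimal NumPy/SciPy sketch of the general idea, assuming a simple outer/inner split: the inner GMRES correction runs in float32 while the residual and solution update stay in float64. This is illustrative only, not the paper's actual scheme or implementation.

```python
# Mixed-precision sketch: low-precision inner GMRES, full-precision outer update.
import numpy as np
from scipy.sparse import random as sparse_random, eye
from scipy.sparse.linalg import gmres

rng = np.random.default_rng(0)
n = 200
A = (sparse_random(n, n, density=0.05, random_state=0) + 10 * eye(n)).tocsr()
b = rng.standard_normal(n)

A32 = A.astype(np.float32)          # low-precision copy used by the inner solver
x = np.zeros(n)                     # float64 solution accumulator
for _ in range(5):                  # outer refinement loop in full precision
    r = b - A @ x                   # residual computed in float64
    if np.linalg.norm(r) <= 1e-10 * np.linalg.norm(b):
        break
    d32, _ = gmres(A32, r.astype(np.float32), maxiter=50)
    x += d32.astype(np.float64)     # correction applied in full precision
```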
Article
Efficient exploitation of exascale architectures requires rethinking of the numerical algorithms used in many large-scale applications. These architectures favor algorithms that expose ultra fine-grain parallelism and maximize the ratio of floating point operations to energy intensive data movement. One of the few viable approaches to achieve high...
Article
Geostatistical modeling is an efficient technique for climate and environmental analysis and prediction. A primary computational kernel of stationary spatial statistics is the evaluation of the Gaussian maximum log-likelihood estimation (MLE) function, whose central data structure is a dense, symmetric, and positive-definite covariance matrix of th...
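The kernel in question, evaluating the Gaussian log-likelihood through a factorization of a dense SPD covariance matrix, can be sketched as follows (a minimal NumPy/SciPy version; the function name and interface are illustrative).

```python
# Gaussian log-likelihood via Cholesky of a dense SPD covariance matrix.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gaussian_loglik(z, Sigma):
    """log N(z | 0, Sigma) for a dense symmetric positive-definite Sigma."""
    n = z.shape[0]
    c, low = cho_factor(Sigma, lower=True)          # O(n^3) dominant cost
    logdet = 2.0 * np.sum(np.log(np.diag(c)))       # log|Sigma| from the factor
    quad = z @ cho_solve((c, low), z)               # z^T Sigma^{-1} z
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)
```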
Article
Full-text available
The efficient utilization of mixed-precision numerical linear algebra algorithms can offer attractive acceleration to scientific computing applications. Especially with the hardware integration of low-precision special-function units designed for machine learning applications, the traditional numerical algorithms community urgently needs to reconsi...
Book
The six-volume set LNCS 12742, 12743, 12744, 12745, 12746, and 12747 constitutes the proceedings of the 21st International Conference on Computational Science, ICCS 2021, held in Krakow, Poland, in June 2021.* The total of 260 full papers and 57 short papers presented in this book set were carefully reviewed and selected from 635 submissions. 48 fu...
Chapter
Full-text available
The GMRES method is used to solve sparse, non-symmetric systems of linear equations arising from many scientific applications. The solver performance within a single node is memory bound, due to the low arithmetic intensity of its computational kernels. To reduce the amount of data movement, and thus improve performance, we investigated the eff...
Chapter
This paper presents some of the current challenges in designing deep learning artificial intelligence (AI) and integrating it with traditional high-performance computing (HPC) simulations. We evaluate existing packages for their ability to run deep learning models and applications on large-scale HPC systems efficiently, identify challenges, and pro...
Preprint
Full-text available
This paper presents some of the current challenges in designing deep learning artificial intelligence (AI) and integrating it with traditional high-performance computing (HPC) simulations. We evaluate existing packages for their ability to run deep learning models and applications on large-scale HPC systems efficiently, identify challenges, and pro...
Preprint
Full-text available
The GMRES method is used to solve sparse, non-symmetric systems of linear equations arising from many scientific applications. The solver performance within a single node is memory bound, due to the low arithmetic intensity of its computational kernels. To reduce the amount of data movement, and thus improve performance, we investigated the eff...
Article
Full-text available
Double-precision floating-point arithmetic (FP64) has been the de facto standard for engineering and scientific simulations for several decades. Problem complexity and the sheer volume of data coming from various instruments and sensors motivate researchers to mix and match various approaches to optimize compute resources, including different level...
Conference Paper
Full-text available
As the scale of high-performance computing (HPC) systems continues to grow, researchers have devoted themselves to exploring increasing levels of parallelism to achieve optimal performance. The design of modern CPUs, including features such as hierarchical memory and SIMD/vectorization capability, governs algorithms' efficiency. The recent introduction o...
Article
Each successive generation of computer architecture has brought new challenges to achieving high performance mathematical solvers, necessitating development and analysis of new algorithms, which are then embodied in software libraries. These libraries hide architectural details from applications, allowing them to achieve a level of portability acro...
Preprint
Full-text available
In recent years, hardware vendors have started designing low-precision special-function units in response to the machine learning community's demand for high compute power in low-precision formats. Server-line products are also increasingly featuring low-precision special-function units, such as the NVIDIA tensor cor...
Article
With the acquisition and widespread use of more resources that rely on accelerator/wide vector–based computing, there has been a strong demand for science and engineering applications to take advantage of these latest assets. This, however, has been extremely challenging due to the diversity of systems to support their extreme concurrency, complex...
Article
Machine learning and artificial intelligence (AI) applications often rely on performing many small matrix operations—in particular general matrix-matrix multiplication (GEMM). These operations are usually performed in a reduced precision, such as the 16-bit floating-point format (i.e., half precision or FP16). The GEMM operation is also very import...
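As a rough illustration of the reduced-precision GEMM pattern, the sketch below stores operands in FP16 and performs the multiply in FP32, then compares against an FP64 reference; it demonstrates only the numerical pattern, not tensor-core execution.

```python
# FP16 storage with wider-precision compute, compared against an FP64 reference.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64)).astype(np.float16)     # FP16 operands
B = rng.standard_normal((64, 64)).astype(np.float16)

C_mixed = A.astype(np.float32) @ B.astype(np.float32)    # FP16 inputs, FP32 compute
C_ref = A.astype(np.float64) @ B.astype(np.float64)      # full-precision reference
print("max abs deviation from FP64:", np.max(np.abs(C_mixed - C_ref)))
```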
Chapter
Half-precision computation refers to performing floating-point operations in a 16-bit format. While half-precision has been driven largely by machine learning applications, recent algorithmic advances in numerical linear algebra have discovered beneficial use cases for half precision in accelerating the solution of linear systems of equations at hi...
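One such use case, mixed-precision iterative refinement for linear systems, can be sketched in a few lines. Here float32 stands in for the half-precision factorization that would run on tensor-core hardware; the example is illustrative rather than the chapter's implementation.

```python
# Mixed-precision iterative refinement: factor once in low precision,
# then refine the solution with cheap full-precision residual corrections.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)     # well-conditioned test matrix
b = rng.standard_normal(n)

lu, piv = lu_factor(A.astype(np.float32))           # O(n^3) work in low precision
x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
for _ in range(10):                                 # O(n^2) refinement steps
    r = b - A @ x                                   # residual in full precision
    if np.linalg.norm(r) <= 1e-12 * np.linalg.norm(b):
        break
    x += lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
```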
Chapter
Exascale computing aspires to meet the increasing demands of large scientific applications. Software targeting exascale is typically designed for heterogeneous architectures; hence, it is important not only to develop well-designed software, but also to make it aware of the hardware architecture and efficiently exploit its power. Currently, sev...
Conference Paper
Full-text available
As the scale of high-performance computing (HPC) systems continues to grow, increasing levels of parallelism must be exploited to achieve optimal performance. As recent processors support wide vector extensions, vectorization has become much more important for exploiting the potential peak performance of the target architecture. Novel processor architectur...
Chapter
This chapter outlines a vision for how best to harness the computing continuum of interconnected sensors, actuators, instruments, and computing systems, from small numbers of very large devices to large numbers of very small devices. The hypothesis is that only via a continuum perspective one can intentionally specify desired continuum actions and...
Article
We propose two acceleration methods, namely, Fused and Gram, for reducing out‐of‐core data access when performing randomized singular value decomposition (RSVD) on graphics processing units (GPUs). Out‐of‐core data here are data that are too large to fit into the GPU memory at once. Both methods accelerate GPU‐enabled RSVD using the following three...
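For reference, the basic RSVD building blocks being accelerated (random projection, orthonormalization, small dense SVD) look roughly like the NumPy sketch below; the out-of-core and GPU aspects are omitted, and the function name is illustrative.

```python
# Basic randomized SVD: project, orthonormalize, factor the small projected matrix.
import numpy as np

def rsvd(A, k, oversample=10, n_iter=2, seed=None):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + oversample))   # random test matrix
    Y = A @ Omega                                      # sample the range of A
    for _ in range(n_iter):                            # power iterations for accuracy
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                             # orthonormal range basis
    B = Q.T @ A                                        # small projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]
```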
Technical Report
Full-text available
The main goal of this milestone was to help CEED-enabled ECP applications, including ExaSMR, MARBL, ExaWind and ExaAM, to improve their performance and capabilities on GPU systems like Summit and Lassen/Sierra. In addition, the CEED team also worked to: add and improve support for additional hardware and programming models in the CEED software comp...
Article
Efficient processing of Irregular Matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational techniques, and implementations that strike a balance between thread divergence, which is inherent for Irregular Matrices, and paddin...
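The divergence-versus-padding trade-off mentioned above can be seen in a toy conversion from CSR to a padded ELLPACK-style layout, where every row is widened to the longest row so SIMD lanes do uniform work; the format choice here is illustrative, not the paper's proposed format.

```python
# CSR -> padded ELLPACK-style layout: uniform row lengths at the cost of stored zeros.
import numpy as np
from scipy.sparse import random as sparse_random

A = sparse_random(8, 8, density=0.3, format="csr", random_state=0)
max_nnz = int(np.diff(A.indptr).max())                 # widest row sets the padding

ell_vals = np.zeros((A.shape[0], max_nnz))
ell_cols = np.zeros((A.shape[0], max_nnz), dtype=np.int64)
for i in range(A.shape[0]):
    start, end = A.indptr[i], A.indptr[i + 1]
    ell_vals[i, : end - start] = A.data[start:end]
    ell_cols[i, : end - start] = A.indices[start:end]

# SpMV on the padded layout: every row does max_nnz lanes of uniform work.
x = np.ones(A.shape[1])
y = (ell_vals * x[ell_cols]).sum(axis=1)
assert np.allclose(y, A @ x)
```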
Article
A number of features of today’s high-performance computers make it challenging to exploit these machines fully for computational science. These include increasing core counts but stagnant clock frequencies; the high cost of data movement; use of accelerators (GPUs, FPGAs, coprocessors), making architectures increasingly heterogeneous; and multi-pl...
Technical Report
We present a parallel asynchronous Stochastic Gradient Descent algorithm for shared memory architectures. Unlike previous asynchronous algorithms, we consider the case where the gradient updates are not particularly sparse. In the context of the MagmaDNN framework, we compare the parallel efficiency of the asynchronous implementation with t...
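A toy sketch of the lock-free asynchronous update pattern (in the style of Hogwild) is shown below using Python threads; because of the GIL it only illustrates the update scheme, not shared-memory speedup, and it is not the report's implementation.

```python
# Lock-free asynchronous SGD on a shared parameter vector (least-squares toy problem).
import numpy as np
import threading

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))
true_w = rng.standard_normal(10)
y = X @ true_w + 0.01 * rng.standard_normal(1000)

w = np.zeros(10)                     # shared parameters, updated without locks
lr = 0.01

def worker(seed, steps=500):
    local_rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = local_rng.integers(len(X))              # sample one example
        grad = (X[i] @ w - y[i]) * X[i]             # least-squares gradient
        w[:] = w - lr * grad                        # asynchronous in-place update

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("parameter error:", np.linalg.norm(w - true_w))
```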
Book
Chapters "Parallel Adaptive Cross Approximation for the Multi-trace Formulation of Scattering Problems" and "A High-Order Discontinuous Galerkin Solver with Dynamic Adaptive Mesh Refinement to Simulate Cloud Formation Processes" are available open access under a Creative Commons Attribution 4.0 International License via link.springer.com.
Conference Paper
Full-text available
This paper presents a quantitative evaluation of the power usage over time in data-intensive applications that use MapReduce over MPI. We leverage the PAPI powercap tool to identify ideal conditions for execution of our miniapplications in terms of (1) dataset characteristics (e.g., unique words in datasets); (2) system characteristics (e.g., KNL a...
Book
The seven-volume set LNCS 12137, 12138, 12139, 12140, 12141, 12142, and 12143 constitutes the proceedings of the 20th International Conference on Computational Science, ICCS 2020, held in Amsterdam, The Netherlands, in June 2020.* The total of 101 papers and 248 workshop papers presented in this book set were carefully reviewed and selected from 71...
Chapter
This paper describes a hands-on Research Experiences for Computational Science, Engineering, and Mathematics (RECSEM) program in high-performance data sciences, data analytics, and machine learning on emerging computer architectures. RECSEM is a Research Experiences for Undergraduates (REU) site program supported by the USA National Science Foundat...
Chapter
In this paper, we present work towards the development of a new data analytics and machine learning (ML) framework, called MagmaDNN. Our main goal is to provide scalable, high-performance data analytics and ML solutions for scientific applications running on current and upcoming heterogeneous many-core GPU-accelerated architectures. To this end, si...
Conference Paper
The SLATE (Software for Linear Algebra Targeting Exascale) library is being developed to provide fundamental dense linear algebra capabilities for current and upcoming distributed high-performance systems, both accelerated CPU-GPU based and CPU based. SLATE will provide coverage of existing ScaLAPACK functionality, including the parallel BLAS; line...
Technical Report
Full-text available
The goal of this milestone was the performance tuning of the CEED software, as well as the use and tuning of CEED to accelerate the first and second wave of targeted ECP applications. In this milestone, the CEED team developed optimization techniques and tuned the CEED software for performance to accelerate the first and second wave target ECP appl...
Article
Full-text available
Fast Fourier transforms (FFTs) are used in applications ranging from molecular dynamics and spectrum estimation to machine learning, fast convolution and correlation, signal modulation, wireless multimedia applications, and others. However, FFTs are memory bound, and therefore, to accelerate them, it is crucial to avoid and optimize the FFTs' comm...
Article
Full-text available
The objective of the Programming with OpenMP4 for Exascale Investigations (POMPEI) project is to explore new task-based programming techniques, together with data-structure-centric programming, for scientific applications to harness the potential of extreme-scale systems. Tasking is by now a well-established approach on such systems, as it has been us...