Doru Thom Popovici
Lawrence Berkeley National Laboratory | LBL · Computational Research Division (CRD)
Doctor of Philosophy
31 Publications
2,434 Reads
298 Citations
Publications (31)
Multi-dimensional Fourier transforms are key mathematical building blocks that appear in a wide range of applications, from materials science, physics, and chemistry to machine learning. Over the past years, a multitude of software packages targeting distributed multi-dimensional Fourier transforms have been developed. Most variants attempt to off...
With the growing reliance of modern supercomputers on accelerator-based architectures such as graphics processing units (GPUs), the development and optimization of electronic structure methods to exploit these massively parallel resources has become a recent priority. While significant strides have been made in the development of GPU-accelerated, distri...
Transformer-based models, such as BERT and ViT, have achieved state-of-the-art results across different natural language processing (NLP) and computer vision (CV) tasks. However, these models are extremely memory intensive during their fine-tuning process, making them difficult to deploy on GPUs with limited memory resources. To address this issue,...
With the growing reliance of modern supercomputers on accelerator-based architectures such as GPUs, the development and optimization of electronic structure methods to exploit these massively parallel resources has become a recent priority. While significant strides have been made in the development of GPU-accelerated, distributed-memory algorithms...
The information-theoretic community discovery method (popularly known as Infomap) is known for delivering better-quality results on the Lancichinetti–Fortunato–Radicchi (LFR) benchmark than modularity-based algorithms. Parallel algorithms have been developed for Infomap due to the computational challenge of analyzing massive graphs resulting fro...
Small prime-sized discrete Fourier transforms appear in various applications in quantum mechanics, materials science, and machine learning. The typical implementation of the discrete Fourier transform for such problem sizes is done as a cyclic convolution using algorithms like Rader or Bluestein. However, these approaches exhibit extra computation...
We present the design of a solver for the efficient and high-throughput computation of the marginalized graph kernel on General Purpose GPUs. The graph kernel is computed using conjugate gradient to solve a generalized Laplacian of the tensor product between a pair of graphs. To cope with the large gap between the instruction throughput and the mem...
Multi-dimensional discrete Fourier transforms (DFTs) are typically decomposed into multiple 1D transforms. Hence, parallel implementations of any multi-dimensional DFT focus on parallelizing within or across the 1D DFTs. Existing DFT packages exploit the inherent parallelism across the 1D DFTs and offer rigid frameworks that cannot be extended to in...
In this paper, we address the question of how to automatically map computational kernels to highly efficient code for a wide range of computing platforms and establish the correctness of the synthesized code. More specifically, we focus on two fundamental problems that software developers are faced with: performance portability across the ever-chan...
Achieving high performance for compute-bound numerical kernels typically requires an expert to hand-select an appropriate set of single-instruction multiple-data (SIMD) instructions and then statically schedule them to hide their latency while avoiding register spilling. Unfortunately, this level of control over the code fo...
Upsampling of a multi-dimensional dataset is an operation with wide application in image processing and in quantum mechanical calculations using density functional theory. For small upsampling factors, as seen in the quantum chemistry code ONETEP, a time-shift-based implementation that shifts samples by a fraction of the original grid spacing to fill...
Real-time system-level implementations of complex Synthetic Aperture Radar (SAR) image reconstruction algorithms have always been challenging due to their data-intensive characteristics. In this paper, we propose a novel algorithm based on a basis vector transform to alleviate the data intensity and a 3D-stacked logic-in-memory-based hardware accelerato...
Formal behavioral models of software services are used as input by analysis tools, which check the services' properties against the given models. However, there is a gap between the real systems which have to be validated and their abstract models. This work proposes to bridge this gap with tools that extract behavioral models from software services imple...
Many techniques used for discovering faults and vulnerabilities in distributed systems and services require formal behavioral models of the systems under validation as inputs. Such models are traditionally written by hand, according to the known specifications, leading to a gap between the real systems that have to be validated and their...
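Several of the abstracts above note that multi-dimensional DFTs are typically decomposed into multiple 1D transforms. The following is a minimal sketch of that row-column decomposition for the 2D case, assuming NumPy is available. The function name `dft2_row_column` is hypothetical and chosen for illustration only; this is not the implementation used in any of the listed publications.

```python
import numpy as np


def dft2_row_column(x):
    """Compute a 2D DFT as two stages of independent 1D DFTs.

    Hypothetical helper for illustration; it relies on NumPy's 1D FFT
    rather than any package described in the abstracts.
    """
    # Stage 1: one 1D DFT per row (transform along axis 1).
    stage1 = np.fft.fft(x, axis=1)
    # Stage 2: one 1D DFT per column (transform along axis 0).
    return np.fft.fft(stage1, axis=0)


# Compare against NumPy's built-in 2D FFT on random complex input.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6)) + 1j * rng.standard_normal((8, 6))
assert np.allclose(dft2_row_column(x), np.fft.fft2(x))
```

Within each stage the 1D transforms are independent of one another, which is the parallelism "across the 1D DFTs" that the abstract refers to.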