Purushotham Bangalore
University of Alabama | UA · Department of Computer Science

Doctor of Philosophy

About

110
Publications
10,067
Reads
886
Citations
Citations since 2016
23 Research Items
375 Citations
[Citations per year, 2016–2022]
Introduction
Dr. Bangalore’s current research focuses on designing higher-level abstractions to support parallel programming for heterogeneous architectures and next-generation High Performance Computing (HPC) platforms, with an emphasis on predictive performance and portability. He also conducts research on improving the fault tolerance of message-passing middleware and on the security and reliability of storage systems for exascale systems.

Publications (110)
Article
Partitioned point-to-point communication primitives provide a performance-oriented mechanism to support a hybrid parallel programming model and have been included in the upcoming MPI-4.0 standard. These primitives enable an MPI library to transfer parts of the data buffer while the application provides partial contributions using multiple threads o...
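To make the partitioned point-to-point model concrete, below is a minimal sender/receiver sketch using the MPI-4.0 partitioned interface (MPI_Psend_init, MPI_Pready, MPI_Precv_init). The partition count, element counts, and ranks are illustrative assumptions, not values taken from the paper.

/* Minimal sketch of MPI-4.0 partitioned point-to-point communication.
 * Partition count and message size are illustrative assumptions. */
#include <mpi.h>
#include <stdlib.h>

#define PARTITIONS 8
#define COUNT_PER_PARTITION 1024   /* doubles per partition (assumed) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(PARTITIONS * COUNT_PER_PARTITION * sizeof(double));
    MPI_Request req;

    if (rank == 0) {
        /* Set up the partitioned send once; it can be started repeatedly. */
        MPI_Psend_init(buf, PARTITIONS, COUNT_PER_PARTITION, MPI_DOUBLE,
                       1, 0, MPI_COMM_WORLD, MPI_INFO_NULL, &req);
        MPI_Start(&req);
        /* In a threaded code, each thread would fill its own partition(s)
         * and mark them ready independently. */
        for (int p = 0; p < PARTITIONS; p++) {
            /* ... compute partition p of buf ... */
            MPI_Pready(p, req);    /* partition p may now be transferred */
        }
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Request_free(&req);
    } else if (rank == 1) {
        MPI_Precv_init(buf, PARTITIONS, COUNT_PER_PARTITION, MPI_DOUBLE,
                       0, 0, MPI_COMM_WORLD, MPI_INFO_NULL, &req);
        MPI_Start(&req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Request_free(&req);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}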
Preprint
Full-text available
Over the past two decades, C++ has been adopted as a major HPC language (displacing C to a large extent, and Fortran to some degree as well). Idiomatic C++ is clearly how C++ is being used nowadays. But MPI's syntax and semantics are defined and extended with C and Fortran interfaces that align with the capabilities and limitations of C89 and Fortran-77...
Article
The Exascale Computing Project (ECP) focuses on the development of future exascale‐capable applications. Most ECP applications use the message passing interface (MPI) as their parallel programming model with mini‐apps serving as proxies. This paper explores the explicit usage of MPI in such ECP proxy applications. We empirically analyze 14 proxy ap...
Article
Full-text available
Background Currently recommended traditional spirometry outputs do not reflect their relative contributions to airflow, and we hypothesized that machine learning algorithms can be trained on spirometry data to identify these structural phenotypes. Methods Participants enrolled in a large multicenter study (COPDGene) were included. The data points fr...
Conference Paper
This paper offers a timely study and proposed clarifications, revisions, and enhancements to the Message Passing Interface's (MPI's) Semantic Terms and Conventions. To enhance MPI, a clearer understanding of the meaning of the key terminology has proven essential, and, surprisingly, important concepts remain underspecified, ambiguous and, in some c...
Preprint
Full-text available
This paper documents the experience improving the performance of a data processing workflow for analysis of the Human Connectome Project's HCP900 data set. It describes how network and compute bottlenecks were discovered and resolved during the course of a science engagement. A series of computational enhancements to the stock FSL BedpostX workflow...
Conference Paper
Pregel-like systems are popular for iterative graph processing thanks to their user-friendly vertex-centric programming model. However, existing Pregel-like systems only adopt a naïve checkpointing approach for fault tolerance, which saves a large amount of data about the state of computation and significantly degrades the failure-free execution pe...
Conference Paper
The rapid advancements in high performance computing hardware and corresponding rise in deep convolutional neural network (CNN) architectures have led to state-of-the-art results in several biomedical image segmentation tasks. Recently, U-Net, a modified fully convolutional network, has become the state-of-the-art in various two-dimensional and thr...
Conference Paper
Exascale computing demands high bandwidth and low latency I/O on the computing edge. Object storage systems can provide higher bandwidth and lower latencies than tape archive. File transfer nodes present a single point of mediation through which data moving between these storage systems must pass. By increasing the performance of erasure coding, st...
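As a simple illustration of the erasure-coding idea referenced here (not the coding scheme evaluated in the paper), the sketch below computes a single XOR parity block over K data blocks, which allows any one lost block to be reconstructed; the block count and block size are assumed values.

/* Simplest erasure code: one XOR parity block over K data blocks. */
#include <stddef.h>
#include <string.h>

#define K 4                 /* data blocks (assumed) */
#define BLOCK 4096          /* bytes per block (assumed) */

/* parity[j] = data[0][j] ^ data[1][j] ^ ... ^ data[K-1][j] */
void encode_parity(unsigned char data[K][BLOCK], unsigned char parity[BLOCK])
{
    memset(parity, 0, BLOCK);
    for (int i = 0; i < K; i++)
        for (size_t j = 0; j < BLOCK; j++)
            parity[j] ^= data[i][j];
}

/* Rebuild one missing data block by XOR-ing parity with the survivors. */
void reconstruct(unsigned char data[K][BLOCK], unsigned char parity[BLOCK],
                 int missing)
{
    memcpy(data[missing], parity, BLOCK);
    for (int i = 0; i < K; i++)
        if (i != missing)
            for (size_t j = 0; j < BLOCK; j++)
                data[missing][j] ^= data[i][j];
}

int main(void)
{
    unsigned char data[K][BLOCK] = {{0}}, parity[BLOCK];
    data[2][7] = 0xAB;                   /* some example content */
    encode_parity(data, parity);
    memset(data[2], 0, BLOCK);           /* simulate losing block 2 */
    reconstruct(data, parity, 2);        /* data[2][7] is 0xAB again */
    return 0;
}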
Conference Paper
This paper addresses performance-portability and overall performance issues when derived datatypes are used with four MPI implementations: Open MPI, MPICH, MVAPICH2, and Intel MPI. These comparisons are particularly relevant today since most vendor implementations are now based on Open MPI or MPICH rather than on vendor proprietary code as was more...
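For readers unfamiliar with derived datatypes, the following minimal sketch builds a column type for a row-major matrix with MPI_Type_vector and uses it in a point-to-point transfer. The matrix size and ranks are illustrative; the example is independent of the benchmarks compared in the paper.

/* Sketch: send one column of a row-major NxN matrix using a derived
 * datatype. N is an illustrative assumption. */
#include <mpi.h>

#define N 4

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double a[N][N];
    MPI_Datatype column;
    /* N blocks of 1 element, stride N elements apart = one matrix column */
    MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    if (rank == 0) {
        MPI_Send(&a[0][2], 1, column, 1, 0, MPI_COMM_WORLD);   /* column 2 */
    } else if (rank == 1) {
        MPI_Recv(&a[0][2], 1, column, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Type_free(&column);
    MPI_Finalize();
    return 0;
}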
Article
Advantages of nonblocking collective communication in MPI have been established over the past quarter century, even predating MPI-1. For regular computations with fixed communication patterns, significant additional optimizations can be revealed through the use of persistence (planned transfers) not currently available in the MPI-3 API except for a...
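A minimal sketch of the nonblocking-collective pattern the abstract refers to: an MPI_Iallreduce is started, independent work is overlapped, and the result is used after MPI_Wait. Persistent collective variants (e.g., MPI_Allreduce_init) were standardized later in MPI-4.0; the values below are illustrative.

/* Overlap computation with a nonblocking collective (MPI-3). */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = (double)rank, global = 0.0;
    MPI_Request req;

    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* ... independent computation overlapped with the reduction ... */

    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* 'global' is now valid */

    MPI_Finalize();
    return 0;
}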
Conference Paper
Full-text available
This paper documents the experience improving the performance of a data processing workflow for analysis of the Human Connectome Project's HCP900 data set. It describes how network and compute bottlenecks were discovered and resolved during the course of a science engagement. A series of computational enhancements to the stock FSL BedpostX workflow...
Conference Paper
High-performance computing (HPC) demands high bandwidth and low latency in I/O performance leading to the development of storage systems and I/O software components that strive to provide greater and greater performance. However, capital and energy budgets along with increasing storage capacity requirements have motivated the search for lower cost,...
Conference Paper
This paper describes Petal, a prototype tool that uses compiler-analysis techniques to automate code transformations to hide communication costs behind computation by replacing blocking MPI functions with corresponding nonblocking and persistent collective operations while maintaining legacy applications' correctness. In earlier work, we have alrea...
Conference Paper
Advantages of nonblocking collective communication in MPI have been established over the past quarter century, even predating MPI-1. For regular computations with fixed communication patterns, more optimizations can be revealed through the use of persistence (planned transfers) not currently available in the MPI-3 API except for a limited form of p...
Conference Paper
Graph analytics systems have gained significant popularity due to the prevalence of graph data. Many of these systems are designed to run in a shared-nothing architecture whereby a cluster of machines can process a large graph in parallel. In more recent proposals, others have argued that a single-machine system can achieve better performance and/o...
Conference Paper
Full-text available
Multi-threaded performance in MPI is of concern for future systems, particularly at Exascale, where massive concurrency will be necessary to leverage the full power of systems. While MPI provides generalized solutions and additional proposals like endpoints expand this general model, examining common use cases that have good solutions that may not...
Conference Paper
MPI is insufficient when confronting failures. FA-MPI (Fault-Aware MPI) provides extensions to the MPI standard designed to enable data-parallel applications to achieve resilience without sacrificing scalability. FA-MPI introduces transactions as a novel extension to the MPI message-passing model. Transactions support failure detection, isolation,...
Article
We compare and contrast the approaches and key features of two proposals for fault-tolerant MPI: User-Level Failure Mitigation (ULFM) and Fault-Aware MPI (FA-MPI). We show how they are complementary and also how they could leverage each other through modifications and/or extensions. We show how to "weaken" and extend ULFM to help integrate it with...
Conference Paper
Full-text available
A parallel file system (PFS) is often used to store intermediate results and checkpoint/restart files in a high performance computing (HPC) system. Multiple applications running on an HPC system often access PFSs concurrently resulting in degraded and variable I/O performance. By managing PFS accesses, these sharing induced inefficiencies can be co...
Article
Full-text available
A systems perspective on diverse phenotypes, mechanisms of infection, and responses to environmental stresses can lead to considerable advances in agriculture and medicine. A significant promise of systems biology within plants is the development of disease-resistant crop varieties, which would maximize yield output for food, clothing, building mat...
Chapter
There are several ongoing research efforts in the High Performance Computing (HPC) domain that are employing Domain-Specific Languages (DSLs) as the means of augmenting end-user productivity. A discussion on some of the research efforts that can positively impact the end-user productivity without negatively impacting the application performance is...
Conference Paper
The tremendous growth and diversification in the area of computer architectures has contributed towards an upsurge in the number of parallel programming paradigms, languages, and environments. However, it is often difficult for domain-experts to develop expertise in multiple programming paradigms and languages in order to write performance-oriented...
Article
Full-text available
The highly collaborative research sponsored by the NSF-funded Assembling the Porifera Tree of Life (PorToL) project is providing insights into some of the most difficult questions in metazoan systematics. Our understanding of phylogenetic relationships within the phylum Porifera has changed considerably with increased taxon sampling and data from a...
Article
Full-text available
As the computation power in desktops advances, parallel programming has emerged as one of the essential skills needed by next generation software engineers. However, programs written in popular parallel programming paradigms have a substantial amount of sequential code mixed with the parallel code. Several such versions supporting different platfor...
Article
This paper presents an overview of our experiments in integrating modern software engineering tools and techniques with the process of developing parallel applications for distributed memory architectures. The main goal was to determine the methods that have the potential of reducing the complexities associated with explicit parallelization. We exp...
Article
While creating a parallel version of a sequential program, some code sections may be duplicated in the translated version, which can hinder the evolution of the newly created program. This can be prevented if parallel sections of a program can be separated from the sequential sections. In this paper, we introduce a transformation language, called M...
Article
Full-text available
Growth in availability of data collection devices has allowed individual researchers to gain access to large quantities of data that needs to be analyzed. As a result, many labs and departments have acquired considerable compute resources. However, effective and efficient utilization of those resources remains a barrier for the individual researche...
Article
Message Passing Interface (MPI) is the most popular standard for writing portable and scalable parallel applications for distributed memory architectures. Writing efficient parallel applications using MPI is a complex task, mainly due to the extra burden on programmers to explicitly handle all the complexities of message passing (viz., inter-proces...
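For context, even the simplest explicit message exchange requires the programmer to manage ranks, tags, and matching send/receive calls, as in this minimal sketch (the payload is an arbitrary example value, not taken from the paper).

/* Minimal explicit message passing with MPI point-to-point calls. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 42;
    if (rank == 0) {
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}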
Article
There are several ongoing research efforts in the High Performance Computing (HPC) domain that are employing Domain-Specific Languages (DSLs) as the means of augmenting end-user productivity. A discussion on some of the research efforts that can positively impact the end-user productivity without negatively impacting the application performance is...
Chapter
Grid computing environments are dynamic and heterogeneous in nature. In order to realize application-specific Quality of Service agreements within a grid, specifications at the level of an application are required. This chapter introduces an XML-based schema language (called the Application Specification Language, ASL) and a corresponding modeling...
Conference Paper
Full-text available
Programming languages that can utilize the underlying parallel architecture in shared memory, distributed memory or Graphics Processing Units (GPUs) are used extensively for solving scientific problems. However, from our observation of studying multiple parallel programs from various domains, such programming languages have a substantial amount of...
Article
Grid computing environments are characterized by resource heterogeneity that leads to heterogeneous application execution characteristics. This is due to application-resource dependencies and the changing availability of the underlying resources. In order to recognize and understand such dependencies, there is a need to capture and study the beha...
Article
Full-text available
Huffman compression is a statistical, lossless, data compression algorithm that compresses data by assigning variable-length codes to symbols, with the more frequently appearing symbols given shorter codes than the less frequent ones. This work is a modification of the Huffman algorithm which permits uncompressed data to be decomposed into independently compre...
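A hedged sketch of the block-decomposition idea only: the input is split into fixed-size blocks that can be compressed (and later decompressed) independently, for example each with its own Huffman table. The per-block coder below is a placeholder copy, not a Huffman implementation and not the paper's method; the block size is assumed.

/* Block decomposition for independent (and hence parallel) compression. */
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4096          /* bytes per block (assumed) */

/* Placeholder for a real per-block Huffman coder (hypothetical helper). */
static size_t compress_block(const unsigned char *in, size_t n,
                             unsigned char *out)
{
    memcpy(out, in, n);          /* a real coder would emit table + codes */
    return n;
}

/* Compress the input block by block; blocks are mutually independent and
 * could therefore be processed by separate threads or processes. */
static void compress_in_blocks(const unsigned char *in, size_t n,
                               unsigned char *out, size_t *out_sizes)
{
    size_t nblocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    for (size_t b = 0; b < nblocks; b++) {
        size_t off = b * BLOCK_SIZE;
        size_t len = (n - off < BLOCK_SIZE) ? n - off : BLOCK_SIZE;
        out_sizes[b] = compress_block(in + off, len, out + off);
    }
}

int main(void)
{
    unsigned char in[3 * BLOCK_SIZE + 100] = {0};
    unsigned char out[sizeof in];
    size_t sizes[4];
    compress_in_blocks(in, sizeof in, out, sizes);
    return 0;
}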
Article
Sequence analysis has become essential to the study of genomes and biological research in general. Basic Local Alignment Search Tool (BLAST) leads the way as the most accepted method for performing necessary query searches and analysis of discovered genes. Combating growing data sizes, with the goal of speeding up job runtimes, scientists are resort...
Conference Paper
Full-text available
Graphical Processing Unit (GPU) programming languages are used extensively for general-purpose computations. However, GPU programming languages are at a level of abstraction suitable only for use by expert parallel programmers. This paper presents a new approach through which C or Java programmers can access these languages without having to focu...
Article
One of the key elements required for writing self-healing applications for distributed and dynamic computing environments is checkpointing. Checkpointing is a mechanism by which an application is made resilient to failures by storing its state periodically to the disk. The main goal of this research is to enable non-invasive reengineering of existi...
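As a rough illustration of the application-level checkpointing pattern discussed here (not the paper's reengineering approach), the sketch below periodically writes the loop index and state array to disk and resumes from the latest checkpoint on restart; the file name, state size, and checkpoint interval are assumed values.

/* Application-level checkpoint/restart sketch. */
#include <stdio.h>
#include <stdlib.h>

#define N 1000
#define CKPT_INTERVAL 100
#define CKPT_FILE "state.ckpt"

static int restore(double *state, int n, int *iter)
{
    FILE *f = fopen(CKPT_FILE, "rb");
    if (!f) return 0;                       /* no checkpoint: cold start */
    int ok = fread(iter, sizeof(int), 1, f) == 1 &&
             fread(state, sizeof(double), n, f) == (size_t)n;
    fclose(f);
    return ok;
}

static void checkpoint(const double *state, int n, int iter)
{
    FILE *f = fopen(CKPT_FILE, "wb");
    if (!f) return;
    fwrite(&iter, sizeof(int), 1, f);
    fwrite(state, sizeof(double), n, f);
    fclose(f);
}

int main(void)
{
    double state[N] = {0};
    int iter = 0;
    restore(state, N, &iter);               /* resume if a checkpoint exists */

    for (; iter < 10000; iter++) {
        /* ... one step of the computation updating 'state' ... */
        if (iter % CKPT_INTERVAL == 0)
            checkpoint(state, N, iter);
    }
    return 0;
}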
Conference Paper
Full-text available
Performance of any one application is more often than not intimately related to the hardware and software characteristics of the resource the application is being executed on, as well as the use of application parameters during job instantiation. As a result, execution of applications and associated user jobs in heterogeneous grid environments e...
Article
Full-text available
Aspect-oriented programming (AOP) provides assistance in modularizing concerns that crosscut the boundaries of system decomposition. Aspects have the potential to interact with many different kinds of language constructs in order to modularize crosscutting concerns. Although several aspect languages have demonstrated advantages in applying aspects...
Conference Paper
Full-text available
Although access to grid resources is realized through a standardized interface, independent grid resources are not only managed autonomously but are also accessed as independent entities. Such an environment results in configuration differences among individual resources, forcing users that access those resources to deal with the variability in resourc...
Conference Paper
In this research, a Framework for Synthesizing Parallel Applications (FraSPA) in a user-guided manner is being developed. The FraSPA would facilitate the synthesis of parallel applications from existing sequential applications and middleware components for multiple-platforms and diverse domains. The framework design is based upon design patterns an...
Article
Scientific applications usually involve large number of distributed and dynamic resources and huge datasets. A mechanism like checkpointing is essential to make these applications resilient to failures. Using checkpointing as an example, this paper presents an approach for integrating the latest software engineering techniques with the development...
Article
Full-text available
The development of a fully coupled and parallel hydraulics, sediment, and wave model for simulation of physical marine processes has impact on many national defense and security operations. This report describes the development of a parallel version of a combined hydraulics and sediment model - a key component for the fully coupled model. The paral...
Article
This paper describes our experiences building and working with the reference implementation of myVocs (my Virtual Organization Collaboration System). myVocs provides a flexible environment for exploring new approaches to security, application development, and access control built from Internet services without a central identity repository. The myV...
Article
Grid computing has emerged as the next generation computing platform. Because of the resource heterogeneity that exists in the grid environment, user jobs experience variable performance. Grid job scheduling, or selection of appropriate mappings between resources and the application, with the goal of leveraging available capacity and imposed requir...
Chapter
Grid computing environments are dynamic and heterogeneous in nature. In order to realize application- specific Quality of Service agreements within a grid, specifications at the level of an application are required. This chapter introduces an XML-based schema language (called the Application Specification Language, ASL) and a corresponding modeling...
Chapter
Grid computing environments are dynamic and heterogeneous in nature. In order to realize application- specific Quality of Service agreements within a grid, specifications at the level of an application are required. This chapter introduces an XML-based schema language (called the Application Specification Language, ASL) and a corresponding modeling...
Chapter
Grid computing environments are dynamic and heterogeneous in nature. In order to realize application-specific Quality of Service agreements within a grid, specifications at the level of an application are required. This chapter introduces an XML-based schema language (called the Application Specification Language, ASL) and a corresponding modeling...
Conference Paper
Full-text available
Embarrassingly parallel applications represent an important workload in today's grid environments. Scheduling and execution of this class of applications is considered mostly a trivial and well-understood process on homogeneous clusters. However, while grid environments provide the necessary computational resources, associated resource heterogeneit...
Conference Paper
Checkpointing is one of the key requirements for writing fault-tolerant and flexible applications for dynamic and distributed environments like the Grid. Certain patterns are observed in the implementation of the application-level Checkpointing and Restart (CaR) mechanism across a myriad of applications. These patterns indicate that a higher level of...
Conference Paper
Full-text available
Basic Local Alignment Search Tool (BLAST) is a heavily used bioinformatics application that has gotten significant attention from the high performance computing community. The authors have taken BLAST execution a step further and enabled it to execute on grid resources. Adapting BLAST to execute on the grid brings up concerns regarding grid resourc...
Conference Paper
Full-text available
From our experience in developing and deploying research applications on a regional grid infrastructure (SURAgrid, www.sura.org/suragrid), we observe that there are significant entry-barriers to "grid-enable" applications, and therefore, to realize the full benefit of a grid environment. In order to increase both the number and the variety of appli...
Conference Paper
Many computationally intense workflows are composed of the same algorithm applied to many data sets. For example, it is good practice in statistical genetics to assess the validity of a method by simulating thousands of datasets of known properties. Further, each simulation may involve using permutation tests that necessitate repeating analyses thou...
Article
Full-text available
Predominantly, the Grid world has focused on discovering and using hardware solutions for executing scientific and mainstream applications. The applications for Grid are typically handcrafted and also assume the presence of an expert user, thus, making this process error-prone. This paper presents a framework, called GridFrame, whose vision is to r...
Article
Grid computing environments are dynamic and heterogeneous in nature. In order to realize application-specific Quality of Service agreements within a grid, specifications at the level of an application are required. This chapter introduces an XML-based schema language (called the Application Specification Language, ASL) and a corresponding modeling...
Conference Paper
Full-text available
BLAST is a commonly used bioinformatics application for performing query searches and analysis of biological data. As the amount of search data increases so does job search times. As means of reducing job turnaround times, scientists are resorting to new technologies such as grid computing to obtain needed computational and storage resources. Inher...
Conference Paper
Full-text available
Computational grid is an upcoming environment based on open standards and virtualization of individual resources, created with the goal of supporting collaboration and resource sharing. As the technology advances and the user base grows, grid computing will advance beyond research institutions and find its place in the commercial sector, in need of support...
Conference Paper
Full-text available
Grid computing is the computing infrastructure of the next century where unlimited hardware and software resources are delivered to the user's fingertips. Much of the power delivered by grid computing is realized through application software made readily available to its users. The process of application deployment and deliverance to the end-users...
Conference Paper
Developing and debugging parallel programs particularly for distributed memory architectures is still a difficult task. The most popular approach to developing parallel programs for distributed memory architectures requires adding explicit message passing calls into existing sequential programs for data distribution, coordination, and communication...