Shantenu Jha
Rutgers, The State University of New Jersey | Rutgers · Department of Electrical and Computer Engineering

355
Publications
53,531
Reads
4,474
Citations

Publications (355)
Preprint
As quantum hardware continues to scale, managing the heterogeneity of resources and applications -- spanning diverse quantum and classical hardware and software frameworks -- becomes increasingly critical. Pilot-Quantum addresses these challenges as a middleware designed to provide unified application-level management of resources and work...
Preprint
Exascale computers offer transformative capabilities to combine data-driven and learning-based approaches with traditional simulation applications to accelerate scientific discovery and insight. However, these software combinations and integrations are difficult to achieve due to the challenges of coordinating and deploying heterogeneous software c...
Preprint
Full-text available
The Workflows Community Summit gathered 111 participants from 18 countries to discuss emerging trends and challenges in scientific workflows, focusing on six key areas: time-sensitive workflows, AI-HPC convergence, multi-facility workflows, heterogeneous HPC environments, user experience, and FAIR computational workflows. The integration of AI and...
Preprint
Full-text available
Geological records of past environmental change provide crucial information for assessing long-term climate variability, non-stationarity, and nonlinearities. However, reconstructing spatio-temporal fields from these records is statistically challenging due to their sparse, indirect, and noisy nature. Here, we present PaleoSTeHM, a scalable and mod...
Article
We address the increasing complexity of scientific workflows in the context of high-performance computing (HPC) and their associated need for robust, adaptable, and flexible computational support systems. We explore five key trends as well as future challenges and opportunities for scientific workflows and HPC technologies.
Preprint
Full-text available
Scientific discovery increasingly requires executing heterogeneous scientific workflows on high-performance computing (HPC) platforms. Heterogeneous workflows contain different types of tasks (e.g., simulation, analysis, and learning) that need to be mapped, scheduled, and launched on different computing resources. That requires a software stack that enables...
Preprint
Full-text available
Scientific discovery increasingly depends on middleware that enables the execution of heterogeneous workflows on heterogeneous platforms. One of the main challenges is to design software components that integrate within the existing ecosystem to enable scale and performance across cloud and high-performance computing (HPC) platforms. Researchers are me...
Preprint
When running at scale, modern scientific workflows require middleware to handle allocated resources, distribute computing payloads and guarantee a resilient execution. While individual steps might not require sophisticated control methods, bringing them together as a whole workflow requires advanced management mechanisms. In this work, we used RADI...
Article
Full-text available
Radiation exposure poses a significant threat to human health. Emerging research indicates that even low-dose radiation, once believed to be safe, may have harmful effects. This perception has spurred a growing interest in investigating the potential risks associated with low-dose radiation exposure across various scenarios. To comprehensively explo...
Article
Full-text available
Future sea-level rise projections are characterized by both quantifiable uncertainty and unquantifiable structural uncertainty. Thorough scientific assessment of sea-level rise projections requires analysis of both dimensions of uncertainty. Probabilistic sea-level rise projections evaluate the quantifiable dimension of uncertainty; comparison of a...
Article
Full-text available
The formation of biomolecular materials via dynamical interfacial processes, such as self-assembly and fusion, for diverse compositions and external conditions can be efficiently probed using ensemble Molecular Dynamics (MD). However, this approach requires many simulations when investigating a large composition phase space. In addition, there is d...
Article
Full-text available
The need for efficient computational screening of molecular candidates that possess desired properties frequently arises in various scientific and engineering problems, including drug discovery and materials design. However, the enormous search space containing the candidates and the substantial computational cost of high-fidelity property predicti...
Article
Recent advances in cryo-electron microscopy (cryo-EM) have enabled modeling macromolecular complexes that are essential components of the cellular machinery. The density maps derived from cryo-EM experiments are often integrated with manual, knowledge-driven or artificial intelligence-driven and physics-guided computational methods to build, fit, a...
Preprint
Quantum computing promises potential for science and industry by solving certain computationally complex problems faster than classical computers. Quantum computing systems evolved from monolithic systems towards modular architectures comprising multiple quantum processing units (QPUs) coupled to classical computing nodes (HPC). With the increasing...
Preprint
Full-text available
It is generally desirable for high-performance computing (HPC) applications to be portable between HPC systems, for example to make use of more performant hardware, make effective use of allocations, and to co-locate compute jobs with large datasets. Unfortunately, moving scientific applications between HPC systems is challenging for various reason...
Conference Paper
Full-text available
Heterogeneous scientific workflows consist of numerous types of tasks that require executing on heterogeneous resources. Asynchronous execution of those tasks is crucial to improve resource utilization, task throughput and reduce workflows' makespan. Therefore, middleware capable of scheduling and executing different task types across heterogeneous...
Preprint
Full-text available
Scientific workflows have become integral tools in broad scientific computing use cases. Scientific discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given th...
Article
Full-text available
Protein-ligand docking is a computational method for identifying drug leads. The method is capable of narrowing a vast library of compounds down to a tractable size for downstream simulation or experimental testing and is widely used in drug discovery. While there has been progress in accelerating scoring of compounds with artificial intelligence,...
Preprint
Full-text available
Future sea-level rise projections are characterized by both quantifiable uncertainty and unquantifiable, structural uncertainty. Thorough scientific assessment of sea-level rise projections requires analysis of both dimensions of uncertainty. Probabilistic sea-level rise projections evaluate the quantifiable dimension of uncertainty; comparison of...
Chapter
Geoscience is now facing the huge potential enabled by the cyberinfrastructure, sensor network, big data, cloud computing, and data science. In this new era, what skills should geoscientists know and what actions can they take to foster new research topics? Are there already successful stories of data science in geosciences and what are the experie...
Chapter
Execution of heterogeneous workflows on high-performance computing (HPC) platforms presents unprecedented resource management and execution coordination challenges for runtime systems. Task heterogeneity increases the complexity of resource and execution management, limiting the scalability and efficiency of workflow execution. Resource partitioning...
Article
Full-text available
We seek to completely revise current models of airborne transmission of respiratory viruses by providing never-before-seen atomic-level views of the SARS-CoV-2 virus within a respiratory aerosol. Our work dramatically extends the capabilities of multiscale computational microscopy to address the significant gaps that exist in current experimental m...
Preprint
Full-text available
We describe the design, implementation and performance of the RADICAL-Pilot task overlay (RAPTOR). RAPTOR enables the execution of heterogeneous tasks -- i.e., functions and executables with arbitrary duration -- on HPC platforms, providing high throughput and high resource utilization. RAPTOR supports the high throughput virtual screening requirem...
Preprint
Full-text available
Workflow applications are becoming increasingly important to support scientific discovery. That is leading to a proliferation of workflow management systems and, thus, to a fragmented software ecosystem. Integration among existing workflow tools can improve development efficiency and, ultimately, increase the sustainability of scientific workflow...
Preprint
Full-text available
The importance of ensemble computing is well established. However, executing ensembles at scale introduces interesting performance fluctuations that have not been well investigated. In this paper, we trace our experience uncovering performance fluctuations of ensemble applications (primarily constituting a workflow of GROMACS tasks), and unsuccessf...
Preprint
Increasingly, scientific discovery requires sophisticated and scalable workflows. Workflows have become the "new applications," wherein multi-scale computing campaigns comprise multiple and heterogeneous executable tasks. In particular, the introduction of AI/ML models into the traditional HPC workflows has been an enabler of highly accurate mode...
Preprint
Heterogeneous scientific workflows consist of numerous types of tasks and dependencies between them. Middleware capable of scheduling and submitting different task types across heterogeneous platforms must permit asynchronous execution of tasks for improved resource utilization, task throughput, and reduced makespan. In this paper we present an ana...
Preprint
Full-text available
The formation of biomolecular materials via dynamical interfacial processes such as self-assembly and fusion, for diverse compositions and external conditions, can be efficiently probed using ensemble Molecular Dynamics. However, this approach requires a large number of simulations when investigating a large composition phase space. In addition, th...
Preprint
Full-text available
The importance of workflows is highlighted by the fact that they have underpinned some of the most significant discoveries of the past decades. Many of these workflows have significant computational, storage, and communication demands, and thus must execute on a range of large-scale computer systems, from local clusters to public clouds and upcomin...
Article
Full-text available
This new dataset is an ensemble of solar photovoltaic energy production simulations over the continental US. The simulations are carried out in three steps. First, a weather forecast system is used for the predictions of incoming insolation; then, forecast ensembles with 21 members are generated using the Analog Ensemble technique; finally, each en...
Preprint
Full-text available
This chapter proposes and provides an in-depth discussion of a scalable solution for running ensemble simulation for solar energy production. Generating a forecast ensemble is computationally expensive. But with the help of Analog Ensemble, forecast ensembles can be generated with a single deterministic run of a weather forecast model. Weather ense...
Preprint
Recent advances in cryo-electron microscopy (cryo-EM) have enabled modeling macromolecular complexes that are essential components of life. The density maps obtained from cryo-EM experiments are often integrated with ab-initio, knowledge-driven or first principles-based computational methods to build, fit and refine protein structures inside the elec...
Article
Full-text available
Despite the recent availability of vaccines against the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel noncovalent small-molecule inhibitor, MCULE-5948770040, that...
Preprint
Full-text available
We seek to completely revise current models of airborne transmission of respiratory viruses by providing never-before-seen atomic-level views of the SARS-CoV-2 virus within a respiratory aerosol. Our work dramatically extends the capabilities of multiscale computational microscopy to address the significant gaps that exist in current experimental m...
Article
Full-text available
The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient and slow. There is a major bottleneck screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie...
Preprint
Full-text available
The landscape of workflow systems for scientific applications is notoriously convoluted with hundreds of seemingly equivalent workflow systems, many isolated research claims, and a steep learning curve. To address some of these challenges and lay the groundwork for transforming workflows research and development, the WorkflowsRI and ExaWorks projec...
Article
Rapid growth in data, computational methods, and computing power is driving a remarkable revolution in what variously is termed machine learning (ML), statistical learning, computational learning, and artificial intelligence. In addition to highly visible successes in machine-based natural language translation, playing the game Go, and self-driving...
Preprint
Full-text available
Effective selection of the potential candidates that meet certain conditions in a tremendously large search space has been one of the major concerns in many real-world applications. In addition to the nearly infinitely large search space, rigorous evaluation of a sample based on the reliable experimental or computational platform is often prohibiti...
Preprint
Full-text available
Exascale computers will offer transformative capabilities to combine data-driven and learning-based approaches with traditional simulation applications to accelerate scientific discovery and insight. These software combinations and integrations, however, are difficult to achieve due to challenges of coordination and deployment of heterogeneous soft...
Article
Many extreme scale scientific applications have workloads comprised of a large number of individual high-performance tasks. The Pilot abstraction decouples workload specification, resource management, and task execution via job placeholders and late-binding. As such, suitable implementations of the Pilot abstraction can support the collective execut...
Article
Full-text available
COVID-19 has claimed more than 2.7 × 10⁶ lives and resulted in over 124 × 10⁶ infections. There is an urgent need to identify drugs that can inhibit SARS-CoV-2. We discuss innovations in computational infrastructure and methods that are accelerating and advancing drug design. Specifically, we describe several methods that integrate artificial intel...
Preprint
Full-text available
We propose a benchmark to study surrogate model accuracy for protein-ligand docking. We share a dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million "in-stock" molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. Our work shows surrogate docking models have...
Preprint
Full-text available
Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale H...
Article
Full-text available
Very High Resolution satellite and aerial imagery are used to monitor and conduct large scale surveys of ecological systems. Convolutional Neural Networks have successfully been employed to analyze such imagery to detect large animals and salient features. As the datasets increase in volume and number of images, utilizing High Performance Computing...
Chapter
Molecular dynamics or MD simulation is gradually maturing into a tool for constructing in vivo models of living cells in atomistic details. The feasibility of such models is bolstered by integrating the simulations with data from microscopic, tomographic and spectroscopic experiments on exascale supercomputers, facilitated by the use of deep learni...
Article
Full-text available
We develop a generalizable AI-driven workflow that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems. We use this workflow to investigate the mechanisms of infectivity of the SARS-CoV-2 spike protein, the main viral infection machinery. Our workflow enables more efficient investigation of spike dynami...
Preprint
Full-text available
The use of ML methods to dynamically steer ensemble-based simulations promises significant improvements in the performance of scientific applications. We present DeepDriveMD, a tool for a range of prototypical ML-driven HPC simulation scenarios, and use it to quantify improvements in the scientific performance of ML-driven ensemble-based applicatio...
Preprint
Full-text available
Many science and industry IoT applications necessitate data processing across the edge-to-cloud continuum to meet performance, security, cost, and privacy requirements. However, diverse abstractions and infrastructures for managing resources and tasks across the edge-to-cloud scenario are required. We propose Pilot-Edge as a common abstraction for...
Preprint
A vast and growing number of IoT applications connect physical devices, such as scientific instruments, technical equipment, machines, and cameras, across heterogeneous infrastructure from the edge to the cloud to provide responsive, intelligent services while complying with privacy and security requirements. However, the integration of heterogeneou...
Preprint
Full-text available
Despite the recent availability of vaccines against the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel non-covalent small-molecule inhibitor, MCULE-5948770040, that...
Preprint
Full-text available
Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upco...
Preprint
Full-text available
The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient and slow. There is a major bottleneck screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie...
Preprint
Full-text available
Many extreme scale scientific applications have workloads comprised of a large number of individual high-performance tasks. The Pilot abstraction decouples workload specification, resource management, and task execution via job placeholders and late-binding. As such, suitable implementations of the Pilot abstraction can support the collective execu...
Preprint
Full-text available
We develop a generalizable AI-driven workflow that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems. We use this workflow to investigate the mechanisms of infectivity of the SARS-CoV-2 spike protein, the main viral infection machinery. Our workflow enables more efficient investigation of spike dynami...
Article
Full-text available
The accurate sampling of protein dynamics is an ongoing challenge despite the utilization of high-performance computer (HPC) systems. Utilizing only "brute force" molecular dynamics (MD) simulations requires an unacceptably long time to solution. Adaptive sampling methods allow a more effective sampling of protein dynamics than standard MD simulati...