Ewa Deelman
  • University of Southern California

About

437 Publications
81,071 Reads
23,038 Citations
Current institution
University of Southern California
Additional affiliations
January 2002 - December 2009
University of Southern California
January 1998 - present
University of California, Los Angeles
January 1993 - December 1997
Rensselaer Polytechnic Institute

Publications

Publications (437)
Preprint
Full-text available
Quantum computing exhibits the unique capability to natively and efficiently encode various natural phenomena, promising theoretical speedups of several orders of magnitude. However, not all computational tasks can be efficiently executed on quantum machines, giving rise to hybrid systems, where some portions of an application run on classical mach...
Poster
Full-text available
This work explores the development and implementation of Communities of Practice (CoP) within “Big Science” Organizations. A CoP is a group of professionals who share a common interest, profession, or passion and actively engage in collaborative learning and knowledge sharing (Wenger, 1998). In the context of “big science” (e.g., large-scale observ...
Conference Paper
Full-text available
Introduction: The declaration of COVID-19 as a global pandemic by the World Health Organization (WHO) in March 2020, and the subsequent imposition of lockdowns, have had important and lasting implications for the operations and communication of businesses and industries across the globe. One of these implications is the fast adoption of technologies to achi...
Preprint
Anomaly detection in computational workflows is critical for ensuring system reliability and security. However, traditional rule-based methods struggle to detect novel anomalies. This paper leverages large language models (LLMs) for workflow anomaly detection by exploiting their ability to learn complex data patterns. Two approaches are investigate...
Technical Report
Full-text available
The 2024 Cyberinfrastructure for Major Facilities (CI4MFs) Workshop, organized by U.S. National Science Foundation (NSF) CI Compass, the NSF Cyberinfrastructure Center of Excellence, brought together cyberinfrastructure (CI) professionals from the NSF major and midscale research facilities along with participants from the broader CI ecosystem to di...
Article
Full-text available
The Statewide (formerly Southern) California Earthquake Center (SCEC) conducts multidisciplinary earthquake system science research that aims to develop predictive models of earthquake processes, and to produce accurate seismic hazard information that can improve societal preparedness and resiliency to earthquake hazards. As part of this program, S...
Poster
Full-text available
2023 Student Fellows Learning about Parallel and Distributed Computing. Engagements: Major Facilities (MFs) rely on complex cyberinfrastructure (CI) to transform raw data into more interoperable and integration-ready data products. 1. Recognize the expertise, experience, and mission-focus of MFs. 2. Contribute knowledge and expertise to the MF Data Lif...
Conference Paper
Full-text available
During the COVID-19 pandemic, organizations, including science organizations, struggled to find reliable information and effective adaptations. This study explores the information sources and decision-making strategies utilized by science organizations, more specifically, major facilities (MF, e.g., Academic Research Fleets, National Radio Astr...
Poster
Full-text available
Introduction: NSF major facilities are complex scientific organizations spanning oceanography, astronomy, atmospheric science, geosciences, astrophysics, arctic research, and cyberinfrastructure, each comprising data managers, scientists, and university professors. Their daily work involves discussions, meetings, budgeting, communication with funders...
Poster
Full-text available
Overview: In this study, we explored the information sources and decision-making strategies utilized by science organizations, more specifically, major facilities. We focused on studying MFs as an example of science organizations because the origin of the crisis, COVID-19, is inherently a scientific problem in medical sciences. Through 3 phases of dat...
Article
Full-text available
NASA’s Neutron Star Interior Composition Explorer (NICER) observed X-ray emission from the pulsar PSR J0030+0451 in 2018. Riley et al. reported Bayesian parameter measurements of the star's mass and radius using pulse-profile modeling of the X-ray data. This paper reproduces their result using the open-source software X-PSI and publicly availab...
Preprint
A computational workflow, often simply called a workflow, consists of tasks that must be executed in a specific order to attain a specific goal. Often, in fields such as biology, chemistry, physics, and data science, among others, these workflows are complex and are executed in large-scale, distributed, and heterogeneous computing environments that are pro...
Article
Identifying and addressing anomalies in complex, distributed systems can be challenging for reliable execution of scientific workflows. We model these workflows as directed acyclic graphs (DAGs), where the nodes and edges of the DAGs represent jobs and their dependencies, respectively. We develop graph neural networks (GNNs) to learn patterns in th...
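To make the DAG formulation above concrete, here is a minimal sketch (illustrative only, not the paper's implementation) that models a small workflow as a directed acyclic graph using the networkx library; the job names and runtime attributes are hypothetical:

    # Hedged sketch: a scientific workflow as a DAG, with jobs as nodes and
    # dependencies as edges. Job names and attributes are made up.
    import networkx as nx

    workflow = nx.DiGraph()

    # Node attributes stand in for the per-job features a GNN could learn from.
    workflow.add_node("stage_in", runtime_s=12.0)
    workflow.add_node("simulate", runtime_s=340.0)
    workflow.add_node("analyze", runtime_s=95.0)
    workflow.add_node("stage_out", runtime_s=8.0)

    # An edge (a, b) means job b may start only after job a finishes.
    workflow.add_edges_from([
        ("stage_in", "simulate"),
        ("simulate", "analyze"),
        ("analyze", "stage_out"),
    ])

    assert nx.is_directed_acyclic_graph(workflow)
    print(list(nx.topological_sort(workflow)))  # a valid execution order

A GNN-based detector would consume many such graphs, using node features such as runtimes to flag executions whose patterns deviate from normal runs.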
Article
Over the past few years, owing to rapid advances in image processing, edge computing, and wireless networking, unpiloted aerial vehicles, often referred to as drones, have become an important enabler for a wide variety of scientific applications, ranging from environmental monitoring, disaster response, and wildfire monitoring to the sur...
Preprint
Full-text available
NASA's Neutron Star Interior Composition Explorer (NICER) observed X-ray emission from the pulsar PSR J0030+0451 in 2018. Riley et al. reported Bayesian parameter measurements of the mass and the radius of the star using pulse-profile modeling of the X-ray data. In this paper, we reproduce their result using the open-source software X-PSI...
Chapter
Data-driven application systems often depend on complex, data-intensive programs operating on distributed datasets that originate from a variety of scientific instruments and repositories to provide time-critical responses for observed phenomena in different areas of science, e.g., weather warning systems, seismology, and ocean sciences, among othe...
Article
Full-text available
Effective communication is vital for academic project success, particularly in multidisciplinary teams with diverse backgrounds and disciplines. Misunderstandings can arise from differing interpretations of terms, which may go unnoticed. VisDict aims to bridge this gap by creating a visual dictionary within a science gateway to facilitate clear com...
Conference Paper
Full-text available
Visual Cloud Computing (VCC) applications provide highly efficient solutions in video data processing pipelines on edge/cloud infrastructures. These applications and their infrastructures demand end-to-end monitoring and fine-grained application traffic control to meet user quality of experience requirements. In this paper, we propose a novel netwo...
Article
Full-text available
This paper presents an interdisciplinary effort to develop and share sustainable knowledge necessary to analyze, understand, and use published scientific results to advance reproducibility in multi-messenger astrophysics. Specifically, we target the breakthrough work associated with generating the first image of a black hole called M87. The Event H...
Preprint
Full-text available
Molecular dynamics (MD) simulations are widely used to study large-scale molecular systems. HPC systems are ideal platforms to run these studies; however, reaching the necessary simulation timescale to detect rare processes is challenging, even with modern supercomputers. To overcome the timescale limitation, the simulation of a long MD trajectory...
Article
Full-text available
Computational science depends on complex, data-intensive applications operating on datasets from a variety of scientific instruments. A major challenge is the integration of data into the scientist’s workflow. Recent advances in dynamic, networked cloud resources provide the building blocks to construct reconfigurable, end-to-end infrastructure th...
Preprint
Most recent network failure diagnosis systems have focused on data center networks, where complex measurement systems can be deployed to derive routing information and ensure network coverage in order to achieve accurate and fast fault localization. In this paper, we target wide-area networks that support data-intensive distributed applications. We first...
Article
Scientific breakthroughs in biomolecular methods and improvements in hardware technology have shifted simulation practice from a single long-running simulation to a large set of shorter simulations running simultaneously, called an ensemble. In an ensemble, simulations are usually coupled with analyses of the data they produce. In situ methods can be used to analy...
Preprint
Full-text available
This paper presents an interdisciplinary effort aiming to develop and share sustainable knowledge necessary to analyze, understand, and use published scientific results to advance reproducibility in multi-messenger astrophysics. Specifically, we target the breakthrough work associated with the generation of the first image of a black hole, called M...
Article
Scientific workflows are a cornerstone of modern scientific computing. They are used to describe complex computational applications that require efficient and robust management of large volumes of data, which are typically stored/processed on heterogeneous, distributed resources. The workflow research and development community has employed a number...
Preprint
The amount of data generated by numerical simulations in scientific domains such as molecular dynamics, climate modeling, biology, and astrophysics has led to a fundamental redesign of application workflows. The throughput and capacity of storage subsystems have not evolved as fast as the computing power of extreme-scale supercomputers. As...
Conference Paper
Image processing at scale is a powerful tool for creating new data sets, integrating them with existing data sets, and performing analysis and quality-assurance investigations. Workflow managers offer advantages in this type of processing, which involves multiple data access and processing steps. Generally, they enable automation of the workflow...
Preprint
Full-text available
Image processing at scale is a powerful tool for creating new data sets, integrating them with existing data sets, and performing analysis and quality-assurance investigations. Workflow managers offer advantages in this type of processing, which involves multiple data access and processing steps. Generally, they enable automation of the workflow...
Conference Paper
This paper summarizes the WORKS 2021 lightning talks, which cover four broad topics: (i) libEnsemble, a Python library to coordinate the concurrent evaluation of dynamic ensembles of calculations; (ii) eduWRENCH, a set of online pedagogic modules that provide simulation-driven, hands-on activities in the browser; (iii) VisDict, an envisioned visual...
Article
Major societal and environmental challenges involve complex systems that have diverse multi-scale interacting processes. Consider, for example, how droughts and water reserves affect crop production and how agriculture and industrial needs affect water quality and availability. Preventive measures, such as delaying planting dates and adopting new a...
Preprint
Full-text available
Scientific workflows are a cornerstone of modern scientific computing. They are used to describe complex computational applications that require efficient and robust management of large volumes of data, which are typically stored/processed on heterogeneous, distributed resources. The workflow research and development community has employed a number...
Preprint
Full-text available
Modern scientific workflows are data-driven and are often executed on distributed, heterogeneous, high-performance computing infrastructures. Anomalies and failures in the workflow execution cause loss of scientific productivity and inefficient use of the infrastructure. Hence, detecting, diagnosing, and mitigating these anomalies are immensely imp...
Preprint
Full-text available
Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upco...
Preprint
Full-text available
In 2018, NSF funded an effort to pilot a Cyberinfrastructure Center of Excellence (CI CoE or Center) that would serve the cyberinfrastructure (CI) needs of the NSF Major Facilities (MFs) and large projects with advanced CI architectures. The goal of the CI CoE Pilot project (Pilot) effort was to develop a model and a blueprint for such a CoE by eng...
Article
In 2016, LIGO and Virgo announced the first observation of gravitational waves from a binary black hole merger, known as GW150914. To establish the confidence of this detection, large-scale scientific workflows were used to measure the event's statistical significance. These workflows used code written by the LIGO/Virgo collaborations and were executed on the LIGO Data Grid...
Article
Performance evaluation is crucial to understanding the behavior of scientific workflows. In this study, we target an emerging type of workflow, called in situ workflows. These workflows tightly couple components such as simulation and analysis to improve overall workflow performance. To understand the tradeoffs of various configurable parameters fo...
Article
With the increased prevalence of workflows in scientific computing and the push towards exascale computing, it has become paramount to analyze the characteristics of scientific applications in order to better understand their impact on the underlying infrastructure, and vice versa. Such analysis can help drive the design, development, a...
Chapter
Convective weather events pose a challenge to the burgeoning low altitude aviation industry. Small aircraft are sensitive to winds and precipitation, but the uncertainty associated with forecasting and the frequency with which impactful weather occurs require an active detect and response system. In this paper, we propose a dynamic, data-driven dec...
Preprint
Full-text available
The increasing popularity of serverless computing has led to the emergence of new cloud infrastructures that follow the Container-as-a-Service (CaaS) model, such as AWS Fargate, Google Cloud Run, and Azure Container Instances. They introduce an innovative approach to running cloud containers in which developers are freed from managing the underlying resour...
Preprint
In February 2016, LIGO and Virgo announced the first observation of gravitational waves from a binary black hole merger, known as GW150914. To establish the confidence of this detection, large-scale scientific workflows were used to measure the event's statistical significance. These workflows used code written by the LIGO Scientific Collaboration...
Preprint
Scientific workflows are a cornerstone of modern scientific computing. They are used to describe complex computational applications that require efficient and robust management of large volumes of data, which are typically stored/processed at heterogeneous, distributed resources. The workflow research and development community has employed a number...
Article
Translational research (TR) has been extensively used in the health science domain, where results from laboratory research are translated to human studies and where evidence-based practices are adopted in real-world settings to reach broad communities. In computer science, much research stops at the result publication and dissemination stage withou...
Chapter
Performance evaluation is crucial to understanding the behavior of scientific workflows and efficiently utilizing resources on high-performance computing architectures. In this study, we target an emerging type of workflow, called in situ workflows. Through an analysis of the state-of-the-art research on in situ workflows, we model a theoretical fr...
Article
While distributed computing infrastructures can provide infrastructure-level techniques for managing energy consumption, application-level energy consumption models have also been developed to support energy-efficient scheduling and resource provisioning algorithms. In this work, we analyze the accuracy of a widely-used application-level model that...
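For context, one commonly used application-level formulation (a hedged sketch; the specific model analyzed in this work may differ) estimates a task's energy from its runtime and a linear, CPU-utilization-based power model:

    # Hedged sketch of a linear, utilization-based power model often used in
    # application-level energy estimates; the paper's exact model may differ.
    def task_energy_joules(runtime_s: float,
                           cpu_utilization: float,
                           p_idle_watts: float = 70.0,   # illustrative idle power
                           p_max_watts: float = 250.0):  # illustrative peak power
        # Energy = runtime * (P_idle + u * (P_max - P_idle))
        power_watts = p_idle_watts + cpu_utilization * (p_max_watts - p_idle_watts)
        return runtime_s * power_watts

    # Example: a 300 s task at 80% CPU utilization on the assumed host.
    print(task_energy_joules(300.0, 0.8))  # 300 * (70 + 0.8 * 180) = 64200 J

Accuracy analyses of this kind compare such model estimates against measured energy consumption under varying utilization and concurrency.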
Article
Full-text available
Large-scale scientific workflows rely heavily on high-performance file transfers. These transfers require strict quality parameters, such as guaranteed bandwidth and no packet loss or data duplication. For file transfers to succeed, methods such as predetermined thresholds and statistical analysis are needed to detect abnormal patterns. Net...
Article
Full-text available
Realistic, relevant, and reproducible experiments often need input traces collected from real-world environments. In this work we focus on traces of workflows, which are common in datacenters, clouds, and HPC infrastructures. We show that the state of the art in using workflow traces raises important issues: (1) the use of realistic traces is infrequent, an...
Article
Full-text available
We performed a hypothesis-generating phenome-wide association study (PheWAS) to identify and characterize cross-phenotype associations, where one SNP is associated with two or more phenotypes, between thousands of genetic variants assayed on the Metabochip and hundreds of phenotypes in 5,897 African Americans as part of the Population Architecture...
Poster
Understanding the impacts of climate change on natural and human systems poses major challenges as it requires the integration of models and data across various disciplines, including hydrology, agriculture, ecosystem modeling, and econometrics. While tactical situations arising from an extreme weather event require rapid responses, integrating the...
Article
FABRIC is a unique national research infrastructure that enables cutting-edge, exploratory research at scale in networking, cybersecurity, distributed computing and storage systems, machine learning, and science applications. It is an everywhere-programmable, nationwide instrument composed of novel extensible network elements equipped with large am...
