Ewa Deelman
University of Southern California · Information Sciences Institute

About

386 Publications
62,591 Reads
19,165 Citations
Additional affiliations
January 2002 - December 2009
University of Southern California
January 1998 - present
University of California, Los Angeles
January 1993 - December 1997
Rensselaer Polytechnic Institute

Publications

Publications (386)
Article
Full-text available
Computational science depends on complex, data-intensive applications operating on datasets from a variety of scientific instruments. A major challenge is the integration of data into the scientist’s workflow. Recent advances in dynamic, networked cloud resources provide the building blocks to construct reconfigurable, end-to-end infrastructure th...
Preprint
Most recent network failure diagnosis systems have focused on data center networks, where complex measurement systems can be deployed to derive routing information and ensure network coverage in order to achieve accurate and fast fault localization. In this paper, we target wide-area networks that support data-intensive distributed applications. We first...
Article
Scientific breakthroughs in biomolecular methods and improvements in hardware technology have shifted the field from a single long-running simulation to a large set of shorter simulations running simultaneously, called an ensemble. In an ensemble, simulations are usually coupled with analyses of data produced by the simulations. In situ methods can be used to analy...
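The coupling described here can be illustrated with a minimal sketch (hypothetical function names, not the paper's actual code): each ensemble member runs a short simulation and streams its output directly into an analysis routine rather than through intermediate files.

```python
from multiprocessing import Pool

def simulate(member_id):
    """Stand-in for one short simulation in the ensemble (hypothetical)."""
    # In practice this would call an MD or climate code and yield trajectory frames.
    return [member_id * 0.1 * step for step in range(10)]

def analyze_in_situ(frames):
    """Analysis coupled to the simulation: consumes frames as they are produced."""
    return sum(frames) / len(frames)

def run_member(member_id):
    # Simulation and analysis are coupled in the same process,
    # avoiding a round trip through the parallel file system.
    return analyze_in_situ(simulate(member_id))

if __name__ == "__main__":
    with Pool(4) as pool:                      # the ensemble: many short runs at once
        results = pool.map(run_member, range(8))
    print(results)
```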
Preprint
Full-text available
This paper presents an interdisciplinary effort aiming to develop and share sustainable knowledge necessary to analyze, understand, and use published scientific results to advance reproducibility in multi-messenger astrophysics. Specifically, we target the breakthrough work associated with the generation of the first image of a black hole, called M...
Article
Scientific workflows are a cornerstone of modern scientific computing. They are used to describe complex computational applications that require efficient and robust management of large volumes of data, which are typically stored/processed on heterogeneous, distributed resources. The workflow research and development community has employed a number...
Preprint
The amount of data generated by numerical simulations in various scientific domains, such as molecular dynamics, climate modeling, biology, or astrophysics, has led to a fundamental redesign of application workflows. The throughput and the capacity of storage subsystems have not evolved as fast as the computing power in extreme-scale supercomputers. As...
Conference Paper
Image processing at scale is a powerful tool for creating new data sets, integrating them with existing data sets, and performing analysis and quality assurance investigations. Workflow managers offer advantages in this type of processing, which involves multiple data access and processing steps. Generally, they enable automation of the workflow...
Preprint
Full-text available
Image processing at scale is a powerful tool for creating new data sets, integrating them with existing data sets, and performing analysis and quality assurance investigations. Workflow managers offer advantages in this type of processing, which involves multiple data access and processing steps. Generally, they enable automation of the workflow...
Article
Major societal and environmental challenges involve complex systems that have diverse multi-scale interacting processes. Consider, for example, how droughts and water reserves affect crop production and how agriculture and industrial needs affect water quality and availability. Preventive measures, such as delaying planting dates and adopting new a...
Preprint
Full-text available
Scientific workflows are a cornerstone of modern scientific computing. They are used to describe complex computational applications that require efficient and robust management of large volumes of data, which are typically stored/processed on heterogeneous, distributed resources. The workflow research and development community has employed a number...
Preprint
Full-text available
Modern scientific workflows are data-driven and are often executed on distributed, heterogeneous, high-performance computing infrastructures. Anomalies and failures in the workflow execution cause loss of scientific productivity and inefficient use of the infrastructure. Hence, detecting, diagnosing, and mitigating these anomalies are immensely imp...
Preprint
Full-text available
Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upco...
Preprint
Full-text available
In 2018, NSF funded an effort to pilot a Cyberinfrastructure Center of Excellence (CI CoE or Center) that would serve the cyberinfrastructure (CI) needs of the NSF Major Facilities (MFs) and large projects with advanced CI architectures. The goal of the CI CoE Pilot project (Pilot) effort was to develop a model and a blueprint for such a CoE by eng...
Article
In 2016, LIGO and Virgo announced the first observation of gravitational waves from a binary black hole merger, known as GW150914. To establish the confidence of this detection, large-scale scientific workflows were used to measure the event's statistical significance. These workflows used code written by the LIGO/Virgo collaborations and were executed on the LIGO Data Grid...
Article
Performance evaluation is crucial to understanding the behavior of scientific workflows. In this study, we target an emerging type of workflow, called in situ workflows. These workflows tightly couple components such as simulation and analysis to improve overall workflow performance. To understand the tradeoffs of various configurable parameters fo...
Article
With the increased prevalence of employing workflows for scientific computing and a push towards exascale computing, it has become paramount that we are able to analyze characteristics of scientific applications to better understand their impact on the underlying infrastructure and vice-versa. Such analysis can help drive the design, development, a...
Chapter
Convective weather events pose a challenge to the burgeoning low-altitude aviation industry. Small aircraft are sensitive to winds and precipitation, but the uncertainty associated with forecasting and the frequency with which impactful weather occurs require an active detect-and-respond system. In this paper, we propose a dynamic, data-driven dec...
Preprint
Full-text available
The increasing popularity of the serverless computing approach has led to the emergence of new cloud infrastructures working in the Container-as-a-Service (CaaS) model, such as AWS Fargate, Google Cloud Run, or Azure Container Instances. They introduce an innovative approach to running cloud containers where developers are freed from managing underlying resour...
Preprint
In February 2016, LIGO and Virgo announced the first observation of gravitational waves from a binary black hole merger, known as GW150914. To establish the confidence of this detection, large-scale scientific workflows were used to measure the event's statistical significance. These workflows used code written by the LIGO Scientific Collaboration...
Preprint
Scientific workflows are a cornerstone of modern scientific computing. They are used to describe complex computational applications that require efficient and robust management of large volumes of data, which are typically stored/processed at heterogeneous, distributed resources. The workflow research and development community has employed a number...
Article
Translational research (TR) has been extensively used in the health science domain, where results from laboratory research are translated to human studies and where evidence-based practices are adopted in real-world settings to reach broad communities. In computer science, much research stops at the result publication and dissemination stage withou...
Chapter
Performance evaluation is crucial to understanding the behavior of scientific workflows and efficiently utilizing resources on high-performance computing architectures. In this study, we target an emerging type of workflow, called in situ workflows. Through an analysis of the state-of-the-art research on in situ workflows, we model a theoretical fr...
Article
While distributed computing infrastructures can provide infrastructure-level techniques for managing energy consumption, application-level energy consumption models have also been developed to support energy-efficient scheduling and resource provisioning algorithms. In this work, we analyze the accuracy of a widely-used application-level model that...
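As a point of reference, application-level models of this kind often estimate power from CPU utilization with a linear interpolation between idle and peak power; the sketch below uses that common form purely as an illustration (the specific model evaluated in the paper may differ, and the wattages are placeholders).

```python
def estimated_energy_joules(utilization, runtime_s,
                            p_idle_w=70.0, p_max_w=250.0):
    """Linear utilization-based power model: P = P_idle + u * (P_max - P_idle).

    The idle/peak wattages here are illustrative placeholders, not measured values.
    """
    power_w = p_idle_w + utilization * (p_max_w - p_idle_w)
    return power_w * runtime_s

# Example: a task at 80% CPU utilization for 120 seconds.
print(estimated_energy_joules(0.8, 120.0))   # -> 25680.0 J
```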
Article
Full-text available
Large-scale scientific workflows rely heavily on high-performance file transfers. These transfers require strict quality parameters, such as guaranteed bandwidth and no packet loss or data duplication. To ensure successful file transfers, methods such as predetermined thresholds and statistical analysis are needed to detect abnormal patterns. Net...
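A simple version of the statistical approach mentioned here flags transfers whose throughput deviates from the historical mean by more than a fixed number of standard deviations; the sketch below is an assumption-laden illustration, not the paper's detector.

```python
import statistics

def flag_abnormal_transfer(throughputs_mbps, new_sample_mbps, z_threshold=3.0):
    """Flag a transfer whose throughput is a statistical outlier.

    throughputs_mbps: history of observed throughputs for this link (illustrative).
    """
    mean = statistics.mean(throughputs_mbps)
    stdev = statistics.stdev(throughputs_mbps)
    z = abs(new_sample_mbps - mean) / stdev if stdev > 0 else 0.0
    return z > z_threshold

history = [940, 955, 948, 962, 951, 939, 958]   # Mbps on a healthy link
print(flag_abnormal_transfer(history, 310))     # True: likely loss or congestion
print(flag_abnormal_transfer(history, 949))     # False: within normal range
```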
Article
Realistic, relevant, and reproducible experiments often need input traces collected from real-world environments. We focus in this work on traces of workflows - common in datacenters, clouds, and HPC infrastructures. We show that the state-of-the-art in using workflow traces raises important issues: (1) the use of realistic traces is infrequent, an...
Article
Full-text available
We performed a hypothesis-generating phenome-wide association study (PheWAS) to identify and characterize cross-phenotype associations, where one SNP is associated with two or more phenotypes, between thousands of genetic variants assayed on the Metabochip and hundreds of phenotypes in 5,897 African Americans as part of the Population Architecture...
Poster
Understanding the impacts of climate change on natural and human systems poses major challenges as it requires the integration of models and data across various disciplines, including hydrology, agriculture, ecosystem modeling, and econometrics. While tactical situations arising from an extreme weather event require rapid responses, integrating the...
Article
FABRIC is a unique national research infrastructure to enable cutting-edge and exploratory research at scale in networking, cybersecurity, distributed computing and storage systems, machine learning, and science applications. It is an everywhere-programmable nationwide instrument composed of novel extensible network elements equipped with large am...
Conference Paper
With the continued rise of scientific computing and the enormous increases in the size of data being processed, scientists must consider whether the processes for transmitting and storing data sufficiently assure the integrity of the scientific data. When integrity is not preserved, computations can fail and result in increased computational cost d...
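One standard building block for the integrity checks discussed here is an end-to-end checksum: compute a digest at the source, recompute it after transfer, and compare. The sketch below (with hypothetical file paths) shows the idea with SHA-256.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_transfer(source_path, destination_path):
    """End-to-end integrity check: digests must match after the transfer."""
    return sha256_of(source_path) == sha256_of(destination_path)

# Hypothetical usage after staging a dataset to a compute site:
# assert verify_transfer("/data/run42/input.h5", "/scratch/run42/input.h5")
```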
Preprint
Realistic, relevant, and reproducible experiments often need input traces collected from real-world environments. We focus in this work on traces of workflows---common in datacenters, clouds, and HPC infrastructures. We show that the state-of-the-art in using workflow traces raises important issues: (1) the use of realistic traces is infrequent, an...
Conference Paper
Cybersecurity, which serves to protect computer systems and data from malicious and accidental abuse and changes, both supports and challenges the reproducibility of computational science. This position paper explores a research agenda by enumerating two types of challenges that emerge at the intersection of cybersecurity and reproducibili...
Conference Paper
The PRIMAD model, with its six components (i.e., Platform, Research Objective, Implementation, Methods, Actors, and Data), provides an abstract taxonomy to represent computational experiments and promote reproducibility by design. In this paper, we employ a post-hoc assessment of the model's applicability to a set of Laser Interferometer Gravitational-...
Article
Scientific workflows have become mainstream for conducting large-scale scientific research. As a result, many workflow applications and Workflow Management Systems (WMSs) have been developed as part of the cyberinfrastructure to allow scientists to execute their applications seamlessly on a range of distributed platforms. Although the scientific co...
Chapter
While distributed computing infrastructures can provide infrastructure-level techniques for managing energy consumption, application-level energy consumption models have also been developed to support energy-efficient scheduling and resource provisioning algorithms. In this work, we analyze the accuracy of a widely-used application-level model that...
Article
Science applications frequently produce and consume large volumes of data, but delivering this data to and from compute resources can be challenging, as parallel file system performance is not keeping up with compute and memory performance. To mitigate this I/O bottleneck, some systems have deployed burst buffers, but their impact on performance fo...
Article
Full-text available
Genetic analyses of ancestrally diverse populations show evidence of heterogeneity across ancestries and provide insights into clinical implications, highlighting the importance of including ancestrally diverse populations to maximize genetic discovery and reduce health disparities.
Article
Machine learning (ML) is being applied in a number of everyday contexts from image recognition, to natural language processing, to autonomous vehicles, to product recommendation. In the science realm, ML is being used for medical diagnosis, new materials development, smart agriculture, DNA classification, and many others. In this article, we descri...
Article
Since 2001 the Pegasus Workflow Management System has evolved into a robust and scalable system that automates the execution of a number of complex applications running on a variety of heterogeneous, distributed high-throughput, and high-performance computing environments. Pegasus was built on the principle of separation between the workflow descri...
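The separation principle mentioned here, an abstract, resource-independent workflow description that a planner later maps onto concrete execution resources, can be sketched in a few lines. This is a generic illustration of the idea, not the Pegasus API.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

@dataclass
class AbstractWorkflow:
    """Resource-independent description: only tasks and their data dependencies."""
    tasks: list = field(default_factory=list)

def plan(workflow, site):
    """Toy 'planner': binds each abstract task to a concrete execution site."""
    return [f"run {t.name} on {site} (in: {t.inputs}, out: {t.outputs})"
            for t in workflow.tasks]

wf = AbstractWorkflow(tasks=[
    Task("preprocess", inputs=["raw.dat"], outputs=["clean.dat"]),
    Task("analyze", inputs=["clean.dat"], outputs=["result.txt"]),
])
# The same description can be planned onto different platforms without change.
print(plan(wf, "campus-cluster"))
print(plan(wf, "cloud"))
```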
Preprint
Full-text available
The growing popularity of workflows in the cloud domain promoted the development of sophisticated autoscaling policies that allow automatic allocation and deallocation of resources. However, many state-of-the-art autoscaling policies for workflows are mostly plan-based or designed for batches (ensembles) of workflows. This reduces their flexibility...
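A reactive (non-plan-based) policy of the kind contrasted here can be as simple as sizing the worker pool to the number of tasks currently ready to run; the sketch below is only a conceptual illustration with made-up bounds, not the policy proposed in the paper.

```python
def reactive_scale(ready_tasks, current_workers, min_workers=1, max_workers=32):
    """Pick a worker count from the current queue length rather than a static plan."""
    desired = max(min_workers, min(max_workers, ready_tasks))
    if desired > current_workers:
        return desired, "scale out"
    if desired < current_workers:
        return desired, "scale in"
    return current_workers, "hold"

print(reactive_scale(ready_tasks=20, current_workers=8))   # (20, 'scale out')
print(reactive_scale(ready_tasks=2,  current_workers=8))   # (2, 'scale in')
```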
Preprint
Science reproducibility is a cornerstone feature in scientific workflows. In most cases, this has been implemented as a way to exactly reproduce the computational steps taken to reach the final results. While these steps are often completely described, including the input parameters, datasets, and codes, the environment in which these steps are exe...
Preprint
The PRIMAD model, with its six components (i.e., Platform, Research Objective, Implementation, Methods, Actors, and Data), provides an abstract taxonomy to represent computational experiments and enforce reproducibility by design. In this paper, we assess the model's applicability to a set of Laser Interferometer Gravitational-Wave Observatory (LIGO)...
Conference Paper
Understanding the interactions between natural processes and human activities poses major challenges as it requires the integration of models and data across disparate disciplines. It typically takes many months and even years to create valid end-to-end simulations as different models need to be configured in consistent ways and generate data that...
Preprint
Full-text available
Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development, and clinical guidelines. However, the dominance of European-ancestry populations in GWAS creates a biased view of the role of human variation in disease, and hinders the equitable translation of genetic associatio...
Conference Paper
Full-text available
The function of a protein depends on its three-dimensional structure. Current approaches based on homology for predicting a given protein's function do not work well at scale. In this work, we propose a representation of proteins that explicitly encodes secondary and tertiary structure into fixed-size images. In addition, we present a neural network...
Conference Paper
Full-text available
Computational science researchers running large-scale scientific workflow applications often want to run their workflows on the largest available compute systems to improve time to solution. Workflow tools used in distributed, heterogeneous, high-performance computing environments typically rely on either a push-based or a pull-based approach for r...
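The distinction referenced here: in a push-based scheme a central manager assigns tasks to workers, while in a pull-based scheme idle workers request their next task themselves. A minimal pull-based sketch, purely illustrative, follows.

```python
import queue
import threading

task_queue = queue.Queue()

def worker(worker_id, results):
    # Pull-based: each idle worker fetches its next task from the shared queue.
    while True:
        task = task_queue.get()
        if task is None:            # sentinel: no more work
            task_queue.task_done()
            return
        results.append((worker_id, task))
        task_queue.task_done()

results = []
threads = [threading.Thread(target=worker, args=(i, results)) for i in range(3)]
for t in threads:
    t.start()
for task in range(9):
    task_queue.put(task)
for _ in threads:
    task_queue.put(None)            # one sentinel per worker
for t in threads:
    t.join()
print(sorted(results, key=lambda r: r[1]))
```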
Conference Paper
Full-text available
Science applications frequently produce and consume large volumes of data, but delivering this data to and from compute resources can be challenging, as parallel file system performance is not keeping up with compute and memory performance. To mitigate this I/O bottleneck, some systems have deployed burst buffers, but their impact on performance fo...