Ryan Chard

Argonne National Laboratory | ANL · Data Science and Learning

PhD Computer Science

About

77
Publications
16,084
Reads
1,042
Citations
Introduction
Ryan Chard currently works at the Data Science and Learning division, Argonne National Laboratory. Ryan does research in Distributed Computing, Parallel Computing, High Performance Computing, and Computer Communications (Networks).
Additional affiliations
June 2015 - August 2015
Argonne National Laboratory
Position
  • Research Associate
February 2015 - present
Victoria University of Wellington
Position
  • Lecturer
August 2014 - December 2014
University of Chicago
Position
  • Research Associate

Publications (77)
Preprint
Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods like ptychography are p...
Article
Full-text available
Serial synchrotron crystallography enables the study of protein structures under physiological temperature and reduced radiation damage by collection of data from thousands of crystals. The Structural Biology Center at Sector 19 of the Advanced Photon Source has implemented a fixed-target approach with a new 3D-printed mesh-holder optimized for sam...
Preprint
Full-text available
Research process automation--the reliable, efficient, and reproducible execution of linked sets of actions on scientific instruments, computers, data stores, and other resources--has emerged as an essential element of modern science. We report here on new services within the Globus research data management platform that enable the specification of...
Preprint
A concise and measurable set of FAIR (Findable, Accessible, Interoperable and Reusable) principles for scientific data is transforming the state-of-practice for data management and stewardship, supporting and enabling discovery and innovation. Learning from this initiative, and acknowledging the impact of artificial intelligence (AI) in the practi...
Preprint
Advancements in scientific instrument sensors and connected devices provide unprecedented insight into ongoing experiments and present new opportunities for control, optimization, and steering. However, the diversity of sensors and heterogeneity of their data make it challenging to fully realize these new opportunities. Organizing and syn...
Preprint
Full-text available
Powerful detectors at modern experimental facilities routinely collect data at multiple GB/s. Online analysis methods are needed to enable the collection of only interesting subsets of such massive data streams, such as by explicitly discarding some data elements or by directing instruments to relevant areas of experimental space. Such online analy...
Preprint
Full-text available
Serial synchrotron crystallography enables studies of protein structures under physiological temperature and reduced radiation damage by collection of data from thousands of crystals. The Structural Biology Center at Sector 19 of the Advanced Photon Source has implemented a fixed-target approach with a new 3D printed mesh-holder optimized for sampl...
Preprint
Beamlines at synchrotron light source facilities are powerful scientific instruments used to image samples and observe phenomena at high spatial and temporal resolutions. Typically, these facilities are equipped only with modest compute resources for the analysis of generated experimental datasets. However, high data rate experiments can easily gen...
Article
Full-text available
Despite the recent availability of vaccines against the acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel noncovalent small-molecule inhibitor, MCULE-5948770040, that...
Presentation
Full-text available
A presentation I gave as part of the ML in HPC Environments workshop in November 2021 (https://ornl.github.io/MLHPC/cfp.html). It describes a library, Colmena, built for writing complex HPC applications that mix different types of computations. We also show how we've used it to find new molecules for batteries faster. Talk recording on YouTube: ht...
Preprint
Full-text available
Scientific applications that involve simulation ensembles can be accelerated greatly by using experiment design methods to select the best simulations to perform. Methods that use machine learning (ML) to create proxy models of simulations show particular promise for guiding ensembles but are challenging to deploy because of the need to coordinate...
Article
Full-text available
The development of reusable artificial intelligence (AI) models for wider use and rigorous validation by the community promises to unlock new opportunities in multi-messenger astrophysics. Here we develop a workflow that connects the Data and Learning Hub for Science, a repository for publishing AI models, with the Hardware-Accelerated Learning (HA...
Article
Rapid growth in data, computational methods, and computing power is driving a remarkable revolution in what variously is termed machine learning (ML), statistical learning, computational learning, and artificial intelligence. In addition to highly visible successes in machine-based natural language translation, playing the game Go, and self-driving...
Preprint
Full-text available
Recent advances in scientific instruments have resulted in a dramatic increase in the volumes and velocities of data being generated in everyday laboratories. Scanning electron microscopy is one such example where technological advancements are now overwhelming scientists with critical data for montaging, alignment, and image segmentation -- key pra...
Conference Paper
Full-text available
We introduce Xtract, an automated and scalable system for bulk metadata extraction from large, distributed research data repositories. Xtract orchestrates the application of metadata extractors to groups of files, determining which extractors to apply to each file and, for each extractor and file, where to execute. A hybrid computing model, built o...
Article
Full-text available
Significance: The 2′-O methyl group in Cap-1 is essential to protect viral RNA from host interferon-induced response. We determined crystal structures of SARS-CoV-2 Nsp10/16 heterodimer in complex with substrates (Cap-0 analog and S-adenosyl methionine) and products (Cap-1 analog and S-adenosyl-L-homocysteine) at room temperature using synchrotron s...
Preprint
Full-text available
Despite the recent availability of vaccines against the acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel non-covalent small-molecule inhibitor, MCULE-5948770040, that...
Preprint
Full-text available
Finding new ways to use artificial intelligence (AI) to accelerate the analysis of gravitational wave data, and ensuring the developed models are easily reusable promises to unlock new opportunities in multi-messenger astrophysics (MMA), and to enable wider use, rigorous validation, and sharing of developed models by the community. In this work, we...
Preprint
Full-text available
Flame Spray Pyrolysis (FSP) is a manufacturing technique to mass produce engineered nanoparticles for applications in catalysis, energy materials, composites, and more. FSP instruments are highly dependent on a number of adjustable parameters, including fuel injection rate, fuel-oxygen mixtures, and temperature, which can greatly affect the quality...
Preprint
Full-text available
The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2-3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In silico methodologies need to be improved to better select lead compounds that can proceed to later s...
Preprint
Full-text available
The genome of the SARS-CoV-2 coronavirus contains 29 proteins, of which 15 are nonstructural. Nsp10 and Nsp16 form a complex responsible for the capping of mRNA at the 5′ terminus. In the methylation reaction the S-adenosyl-L-methionine serves as the donor of the methyl group that is transferred to Cap-0 at the first transcribed nucleotide to creat...
Article
Machine Learning (ML) has become a critical tool enabling new methods of analysis and driving deeper understanding of phenomena across scientific disciplines. There is a growing need for “learning systems” to support various phases in the ML lifecycle. While others have focused on supporting model development, training, and inference, few have focu...
Preprint
IoT devices and sensor networks present new opportunities for measuring, monitoring, and guiding scientific experiments. Sensors, cameras, and instruments can be combined to provide previously unachievable insights into the state of ongoing experiments. However, IoT devices can vary greatly in the type, volume, and velocity of data they generate, m...
Preprint
Full-text available
Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of...
Preprint
Full-text available
Exploding data volumes and velocities, new computational methods and platforms, and ubiquitous connectivity demand new approaches to computation in the sciences. These new approaches must enable computation to be mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), be offloaded to specialized acc...
Article
Full-text available
We explore how the function as a service paradigm can be used to address the computing challenges in experimental high-energy physics at CERN. As a case study, we use funcX—a high-performance function as a service platform that enables intuitive, flexible, efficient, and scalable remote function execution on existing infrastructure—to parallelize a...
Conference Paper
Full-text available
The use and reuse of scientific data is ultimately dependent on the ability to understand what those data represent, how they were captured, and how they can be used. In many ways, data are only as useful as the metadata available to describe them. Unfortunately, due to growing data volumes, large and distributed collaborations, and a desire to sto...
Conference Paper
Full-text available
The variety of instance types available on cloud platforms offers enormous flexibility to match the requirements of applications with available resources. However, selecting the most suitable instance type and configuring an application to optimally execute on that instance type can be complicated and time-consuming. For example, application parall...
Article
Facilitating the application of machine learning (ML) to materials science problems requires enhancing the data ecosystem to enable discovery and collection of data from many sources, automated dissemination of new data across the ecosystem, and the connecting of data with materials-specific ML models. Here, we present two projects, the Materials D...
Preprint
Full-text available
Growing data volumes and velocities are driving exciting new methods across the sciences in which data analytics and machine learning are increasingly intertwined with research. These new methods require new approaches for scientific computing in which computation is mobile, so that, for example, it can occur near data, be triggered by events (e.g....
Conference Paper
Full-text available
We report on our experiences deploying and operating Petrel, a data service designed to support science projects that must organize and distribute large quantities of data. Building on a high-performance 3.2 PB parallel file system and embedded in Argonne National Laboratory's 100+ Gbps network fabric, Petrel leverages Science DMZ concepts and Glob...
Conference Paper
Full-text available
In this paper we introduce the Data and Learning Hub for Science (DLHub). DLHub serves as a nexus for publishing, sharing, discovering, and reusing machine learning models. It provides a flexible publication platform that enables researchers to describe and deposit models by associating publication and model-specific metadata and assigning a persis...
Conference Paper
Full-text available
High-level programming languages such as Python are increasingly used to provide intuitive interfaces to libraries written in lower-level languages and for assembling applications from various components. This migration towards orchestration rather than implementation, coupled with the growing need for parallel computing (e.g., due to big data and...
Conference Paper
Full-text available
The Amazon Web Services spot market sells excess computing capacity at a reduced price and with reduced reliability guarantees. The low cost nature of the spot market has led to widespread adoption in industry and science. However, one of the challenges with using the spot market is that it is intentionally opaque and thus users have little underst...
Preprint
Full-text available
High-level programming languages such as Python are increasingly used to provide intuitive interfaces to libraries written in lower-level languages and for assembling applications from various components. This migration towards orchestration rather than implementation, coupled with the growing need for parallel computing (e.g., due to big data and...
Preprint
Full-text available
Facilitating the application of machine learning to materials science problems will require enhancing the data ecosystem to enable discovery and collection of data from many sources, automated dissemination of new data across the ecosystem, and the connecting of data with materials-specific machine learning models. Here, we present two projects, th...
Conference Paper
Rapidly growing data volumes at light sources demand increasingly automated data collection, distribution, and analysis processes, in order to enable new scientific discoveries while not overwhelming finite human capabilities. We present here the case for automating and outsourcing light source science using cloud-hosted data automation and enrichm...
Conference Paper
Full-text available
Cloud providers continue to expand and diversify their collection of leasable resources to meet the needs of an increasingly wide range of applications. While this flexibility is a key benefit of the cloud, it also creates a complex landscape in which users are faced with many resource choices for a given application. Suboptimal selections can both...
Preprint
Full-text available
While the Machine Learning (ML) landscape is evolving rapidly, there has been a relative lag in the development of the "learning systems" needed to enable broad adoption. Furthermore, few such systems are designed to support the specialized requirements of scientific ML. Here we present the Data and Learning Hub for science (DLHub), a multi-tenant...
Conference Paper
Full-text available
Data publication systems are typically tailored to the requirements and processes of a specific domain, collaboration, and/or use case. We propose here an alternative approach to engineering such systems, based on customizable compositions of simple, independent platform services, each of which provides a distinct function such as identification, m...
Conference Paper
Full-text available
Exponential increases in data volumes and velocities are overwhelming finite human capabilities. Continued progress in science and engineering demands that we automate a broad spectrum of currently manual research data manipulation tasks, from data transfer and sharing to acquisition, publication, and analysis. These needs are particularly evident...
Conference Paper
Full-text available
We introduce an automated and scalable data collection, transfer, and processing pipeline for tomographic measurement of centimeter-sized mouse brains with sub-micrometer resolution. Parallelized computation is implemented in data processing to allow fast handling of tera-voxel sized datasets.
Conference Paper
Full-text available
As research processes become yet more collaborative and increasingly data-oriented, new techniques are needed to efficiently manage and automate the crucial, yet tedious, aspects of the data life-cycle. Researchers now spend considerable time replicating, cataloging, sharing, analyzing, and purging large amounts of data, distributed over vast storag...