Zhuozhao Li

Zhuozhao Li
University of Virginia | UVa · Department of Computer Science

About

46
Publications
5,690
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
658
Citations
Introduction
Skills and Expertise

Publications

Publications (46)
Preprint
funcX is a distributed function as a service (FaaS) platform that enables flexible, scalable, and high performance remote function execution. Unlike centralized FaaS systems, funcX decouples the cloud-hosted management functionality from the edge-hosted execution functionality. funcX's endpoint software can be deployed, by users or administrators,...
Preprint
Full-text available
Parsl is a parallel programming library for Python that aims to make it easy to specify parallelism in programs and to realize that parallelism on arbitrary parallel and distributed computing systems. Parsl relies on developers annotating Python functions-wrapping either Python or external applications-to indicate that these functions may be execut...
Article
Full-text available
Despite the recent availability of vaccines against the acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel noncovalent small-molecule inhibitor, MCULE-5948770040, that...
Conference Paper
Full-text available
We introduce Xtract, an automated and scalable system for bulk metadata extraction from large, distributed research data repositories. Xtract orchestrates the application of metadata extractors to groups of files, determining which extractors to apply to each file and, for each extractor and file, where to execute. A hybrid computing model, built o...
Preprint
Full-text available
Despite the recent availability of vaccines against the acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel non-covalent small-molecule inhibitor, MCULE-5948770040, that...
Preprint
Full-text available
The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2-3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In silicomethodologies need to be improved to better select lead compounds that can proceed to later s...
Article
Machine Learning (ML) has become a critical tool enabling new methods of analysis and driving deeper understanding of phenomena across scientific disciplines. There is a growing need for “learning systems” to support various phases in the ML lifecycle. While others have focused on supporting model development, training, and inference, few have focu...
Preprint
Full-text available
Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of...
Preprint
Full-text available
Exploding data volumes and velocities, new computational methods and platforms, and ubiquitous connectivity demand new approaches to computation in the sciences. These new approaches must enable computation to be mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), be offloaded to specialized acc...
Article
Full-text available
We explore how the function as a service paradigm can be used to address the computing challenges in experimental high-energy physics at CERN. As a case study, we use funcX—a high-performance function as a service platform that enables intuitive, flexible, efficient, and scalable remote function execution on existing infrastructure—to parallelize a...
Conference Paper
Full-text available
The use and reuse of scientific data is ultimately dependent on the ability to understand what those data represent, how they were captured, and how they can be used. In many ways, data are only as useful as the metadata available to describe them. Unfortunately, due to growing data volumes, large and distributed collaborations, and a desire to sto...
Conference Paper
Full-text available
The variety of instance types available on cloud platforms offers enormous flexibility to match the requirements of applications with available resources. However, selecting the most suitable instance type and configuring an application to optimally execute on that instance type can be complicated and time-consuming. For example, application parall...
Preprint
Full-text available
Growing data volumes and velocities are driving exciting new methods across the sciences in which data analytics and machine learning are increasingly intertwined with research. These new methods require new approaches for scientific computing in which computation is mobile, so that, for example, it can occur near data, be triggered by events (e.g....
Conference Paper
Full-text available
Python is increasingly the lingua franca of scientific computing. It is used as a higher level language to wrap lower-level libraries and to compose scripts from various independent components. However, scaling and moving Python programs from laptops to supercomputers remains a challenge. Here we present Parsl, a parallel scripting library for Pyth...
Conference Paper
Full-text available
In this paper we introduce the Data and Learning Hub for Science (DLHub). DLHub serves as a nexus for publishing, sharing, discovering, and reusing machine learning models. It provides a flexible publication platform that enables researchers to describe and deposit models by associating publication and model-specific metadata and assigning a persis...
Conference Paper
Full-text available
High-level programming languages such as Python are increasingly used to provide intuitive interfaces to libraries written in lower-level languages and for assembling applications from various components. This migration towards orchestration rather than implementation, coupled with the growing need for parallel computing (e.g., due to big data and...
Preprint
Full-text available
High-level programming languages such as Python are increasingly used to provide intuitive interfaces to libraries written in lower-level languages and for assembling applications from various components. This migration towards orchestration rather than implementation, coupled with the growing need for parallel computing (e.g., due to big data and...
Preprint
Full-text available
While the Machine Learning (ML) landscape is evolving rapidly, there has been a relative lag in the development of the "learning systems" needed to enable broad adoption. Furthermore, few such systems are designed to support the specialized requirements of scientific ML. Here we present the Data and Learning Hub for science (DLHub), a multi-tenant...
Article
Full-text available
For electric taxicabs, the idle time spent on cruising for passengers, seeking chargers, and charging is wasteful. Previous works can only save cruising time through better routing, or charger seeking and charging time through proper charger deployment, but not for both. With the advancement of wireless charging techniques, efficient opportunistic...
Conference Paper
In spite of many advantages of hybrid electrical/optical datacenter networks (Hybrid-DCN), current job schedulers for data-parallel frameworks are not suitable for Hybrid-DCN, since the schedulers do not aggregate data traffic to facilitate using optical circuit switch (OCS). We propose SchedOCS, a job scheduler for data-parallel frameworks in Hybr...
Article
DISboards is a discussion forum that provides a platform for knowledge sharing on planning and resources for Disney-related travel. Since no previous work has been devoted to studying the online social networks (SNs) in the forums, we examine the SN and knowledge sharing activities in DISboards as a case study of discussion forums. Based on a large...
Article
MapReduce is a popular computing model for parallel data processing on large-scale datasets, which can vary from gigabytes to terabytes and petabytes. Though Hadoop MapReduce normally uses Hadoop Distributed File System (HDFS) local file system, it can be configured to use a remote file system. Then, an interesting question is raised: for a given a...
Conference Paper
Many high-performance computing (HPC) sites extend their clusters to support Hadoop MapReduce for a variety of applications. However, HPC cluster differs from Hadoop cluster on the configurations of storage resources. In the Hadoop Distributed File System (HDFS), data resides on the compute nodes, while in the HPC cluster, data is stored on separat...
Article
For predictable application performance or fairness in network sharing in clouds, many bandwidth allocation policies have been proposed. However, with these policies, tenants are not incentivized to use idle bandwidth or prevent link congestion, and may even take advantage of the policies to gain unfair bandwidth allocation. Increasing network util...
Article
Scale-up machines perform better for jobs with small and median (KB, MB) data sizes, while scale-out machines perform better for jobs with large (GB, TB) data size. Since a workload usually consists of jobs with different data size levels, we propose building a hybrid Hadoop architecture that includes both scale-up and scale-out machines, which how...
Conference Paper
For predictable application performance or fairness in network sharing in clouds, many bandwidth allocation policies have been proposed. However, with these policies, tenants are not incentivized to use idle bandwidth or prevent link congestion, and may even take advantage of the policies to gain unfair bandwidth allocation. Increasing network util...

Network

Cited By

Projects