Roy Campbell
University of Illinois, Urbana-Champaign | UIUC · College of Engineering, Department of Computer Science

PhD

About

636
Publications
125,097
Reads
18,199
Citations

Publications (636)
Preprint
Full-text available
Background Alzheimer's disease and related dementias (ADRD) and Parkinson's disease (PD) are the most common neurodegenerative conditions. These central nervous system disorders impact both the structure and function of the brain and may lead to imaging changes that precede symptoms. Patients with ADRD or PD have long asymptomatic phases that exhib...
Preprint
Full-text available
Focused ultrasound (FUS) represents an innovative, non-invasive method for modulating the permeability of the blood-brain barrier (BBB), allowing transient openings for therapeutic delivery. Yet, excessive BBB disruption risks cerebral damage and neurological symptoms. Current imaging techniques typically lack the ability to provide detailed hemody...
Article
Photoacoustic (PA) imaging can map the physiological conditions of tissues and track the biodistribution of contrast agents. Ultrasound localization microscopy (ULM) with microbubbles provides deep-tissue super-resolution blood vessel images and blood velocity maps. The integration of these techniques offers a potential tool for oncological applica...
Article
High-dimensional data analysis starts with projecting the data to low dimensions to visualize and understand the underlying data structure. Several methods have been developed for dimensionality reduction, but they are limited to cross-sectional datasets. The recently proposed Aligned-UMAP, an extension of the uniform manifold approximation and pro...
Chapter
The COrona VIrus Disease (COVID-19) pandemic led to the occurrence of several variants with time. This has led to an increased importance of understanding sequence data related to COVID-19. In this chapter, we propose an alignment-free k-mer based LSTM (Long Short-Term Memory) deep learning model that can classify 20 different variants of COVID-19....
Article
Full-text available
The clinical manifestations of Parkinson’s disease (PD) are characterized by heterogeneity in age at onset, disease duration, rate of progression, and the constellation of motor versus non-motor features. There is an unmet need for the characterization of distinct disease subtypes as well as improved, individualized predictions of the disease cours...
Preprint
Full-text available
Longitudinal multi-dimensional biological datasets are ubiquitous and highly abundant. These datasets are essential to understanding disease progression, identifying subtypes, and drug discovery. Discovering meaningful patterns or disease pathophysiologies in these datasets is challenging due to their high dimensionality, making it difficult to vis...
Article
Full-text available
Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multimodal data is key moving forward. We build upon previous work to deliver multimodal predictions of Parkinson’s disease (PD) risk and systematically develop a model using GenoML, an automated ML package, to make...
Preprint
Full-text available
Background The clinical manifestations of Parkinson’s disease (PD) are characterized by heterogeneity in age at onset, disease duration, rate of progression, and the constellation of motor versus non-motor features. There is an unmet need for the characterization of distinct disease subtypes as well as improved, individualized predictions of the di...
Preprint
Full-text available
Background: The clinical manifestations of Parkinson’s disease (PD) are characterized by heterogeneity in age at onset, disease duration, rate of progression, and the constellation of motor versus non-motor features. There is an unmet need for the characterization of distinct disease subtypes as well as improved, individualized predictions of the d...
Preprint
Various learning models distinguish between an electroencephalogram (EEG) record of a normal patient and one having a seizure. In this paper, we propose a deep-learning based long short-term memory (LSTM) model to identify whether an EEG record belongs to a seizure-prone patient with a non-seizure record or to a normal patient. The study builds on two d...
Chapter
Full-text available
The COVID-19 pandemic has caused millions of infections and deaths worldwide in an ongoing pandemic. With the passage of time, several variants of this virus have surfaced. Machine learning methods and algorithms have been very useful in understanding the virus and its implications so far. In this paper, we have studied a set of novelty detection a...
Article
Full-text available
Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multimodal data is key moving forward. We build upon previous work to deliver multimodal predictions of Parkinson's disease (PD) risk and systematically develop a model using GenoML, an automated ML package, to make...
Article
Background Amyotrophic lateral sclerosis (ALS) is known to represent a collection of overlapping syndromes. Various classification systems based on empirical observations have been proposed, but it is unclear to what extent they reflect ALS population substructures. We aimed to use machine-learning techniques to identify the number and nature of AL...
Preprint
Full-text available
The COrona VIrus Disease (COVID-19) pandemic led to the occurrence of several variants with time. This has led to an increased importance of understanding sequence data related to COVID-19. In this chapter, we propose an alignment-free k-mer based LSTM (Long Short-Term Memory) deep learning model that can classify 20 different variants of COVID-19....
Preprint
Full-text available
Background The disease entity known as amyotrophic lateral sclerosis (ALS) is now known to represent a collection of overlapping syndromes. A better understanding of this heterogeneity and the ability to distinguish ALS subtypes would improve the clinical care of patients and enhance our understanding of the disease. Subtype profiles could be incor...
Preprint
Full-text available
Background Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multi-modal data is key moving forward. We build upon previous work to deliver multi-modal predictions of Parkinson’s Disease (PD). Methods We performed automated ML on multi-modal data from the Parkinso...
Preprint
GenoML is a Python package automating machine learning workflows for genomics (genetics and multi-omics) with an open science philosophy. Genomics data require significant domain expertise to clean, pre-process, harmonize and perform quality control of the data. Furthermore, tuning, validation, and interpretation involve taking into account the bio...
Article
The COrona VIrus Disease (COVID-19) pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV2) has resulted in a challenging number of infections and deaths worldwide. In order to combat the pandemic, several countries worldwide enforced mitigation measures in the forms of lockdowns, social distancing, and disinfection measu...
Preprint
Full-text available
The COrona VIrus Disease (COVID-19) pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV2) has resulted in a challenging number of infections and deaths worldwide. In order to combat the pandemic, several countries worldwide enforced mitigation measures in the forms of lockdowns, social distancing and disinfection measur...
Preprint
The method of choice for parameter aggregation in Deep Neural Network (DNN) training, a network-intensive task, is shifting from the Parameter Server model to decentralized aggregation schemes (AllReduce) inspired by theoretical guarantees of better performance. However, current implementations of AllReduce overlook the interdependence of communica...
Preprint
Full-text available
Background Alzheimer’s disease (AD) is a common, age-related, neurodegenerative disease that impairs a person's ability to perform day to day activities. Diagnosing AD is difficult, especially in the early stages, and many individuals go undiagnosed, partly due to the complex heterogeneity in disease progression. This highlights a need for early predict...
Article
Full-text available
Background Unplanned readmission of a hospitalized patient is an indicator of patients’ exposure to risk and an avoidable waste of medical resources. In addition to hospital readmission, intensive care unit (ICU) readmission brings further financial risk, along with morbidity and mortality risks. Identification of high-risk patients who are likely...
Preprint
The era of big data has led to the emergence of new systems for real-time distributed stream processing; Apache Storm, for example, is one of the most popular stream processing systems in industry today. However, Storm, like many other stream processing systems, lacks an intelligent scheduling mechanism. The default round-robin scheduling currently deployed...
Preprint
Full-text available
Model-free reinforcement learning (RL) can be used to learn effective policies for complex tasks, such as Atari games, even from image observations. However, this typically requires very large amounts of interaction -- substantially more, in fact, than a human would need to learn the same games. How can people learn so quickly? Part of the answer m...
Chapter
This chapter explores the problems of scalability of cloud computing systems. Scalability allows a cloud application to change in size, volume, or geographical distribution while meeting the needs of the cloud customer. A practical approach to scaling cloud applications is to improve the availability of the application by replicating the resources...
Preprint
Full-text available
Alzheimer's disease (AD) is a degenerative brain disease impairing a person's ability to perform day to day activities. The clinical manifestations of Alzheimer's disease are characterized by heterogeneity in age, disease span, progression rate, impairment of memory and cognitive abilities. Due to these variabilities, personalized care and treatmen...
Chapter
The adoption of cloud computing by the U.S. government, including the Department of Defense, is proceeding quickly [1, 4, 8, 9] and is likely to become widespread [5]. As government becomes more comfortable with the technology, mission‐oriented cloud computing seems inevitable. However, security remains a top threat to the use of clouds for dependa...
Article
Full-text available
This paper describes the rationale for and implementation of an experimental graduate-level cybersecurity ethics course curriculum recently piloted at the University of Illinois at Urbana-Champaign. This case study-based ethics curriculum immerses students in real life ethical dilemmas within cybersecurity and engages in open dialogue and de...
Preprint
Full-text available
Background Unplanned readmission of a hospitalized patient is an extremely undesirable outcome as the patient may have been exposed to additional risks. The rates of unplanned readmission are, therefore, regarded as an important performance indicator for the medical quality of a hospital and healthcare system. Identifying high-risk patients likely...
Preprint
Full-text available
Background: The clinical manifestations of Parkinson disease are characterized by heterogeneity in age at onset, disease duration, rate of progression, and constellation of motor versus nonmotor features. Due to these variable presentations, counseling of patients about their individual risks and prognosis is limited. There is an unmet need for pre...
Article
State-of-the-art machine learning systems rely on graph-based models, with the distributed training of these models being the norm in AI-powered production pipelines. The performance of these communication-heavy systems depends on the effective overlap of communication and computation. While the overlap challenge has been addressed in systems with...
Article
Full-text available
Predicting the future in real-world settings, particularly from raw sensory observations such as images, is exceptionally challenging. Real-world events can be stochastic and unpredictable, and the high dimensionality and complexity of natural images require the predictive model to build an intricate understanding of the natural world. Many existi...
Article
We present a mechanism that puts users in the center of control and empowers them to dictate the access to their collections of data. Revisiting the fundamental mechanisms in security for providing protection, our solution uses capabilities, access lists, and access rights following well-understood formal notions for reasoning about access. This co...
Article
In an effort to overcome the data deluge in computational biology and bioinformatics and to facilitate bioinformatics research in the era of big data, we identify some of the most influential algorithms that have been widely used in the bioinformatics community. These top data mining and machine learning algorithms cover classification, clustering,...
Article
While there exist many isolation mechanisms that are available to cloud service providers, including virtual machines, containers, etc., the problem of side-channel increases in importance as a remaining security vulnerability, particularly in the presence of shared caches and multicore processors. In this paper we present a hardware-software mecha...
Article
Distributed stream processing systems need to support stateful processing, recover quickly from failures to resume such processing, and reprocess an entire data stream quickly. We present Apache Samza, a distributed system for stateful and fault-tolerant stream processing. Samza utilizes a partitioned local state along with a low-overhead backgroun...
Article
Genetics has proven to be a powerful approach in neurodegenerative diseases research, resulting in the identification of numerous causal and risk variants. Previously, we introduced the NeuroX Illumina genotyping array, a fast and efficient genotyping platform designed for the investigation of genetic variation in neurodegenerative diseases. Here,...
Conference Paper
This paper extends the concepts behind cloud services to offer hypervisor-based reliability and security monitors for cloud virtual machines. Cloud VMs can be heterogeneous and as such guest OS parameters needed for monitoring can vary across different VMs and must be obtained in some way. Past work involves running code inside the VM, which is una...
Article
This paper extends the concepts behind cloud services to offer hypervisor-based reliability and security monitors for cloud virtual machines. Cloud VMs can be heterogeneous and as such guest OS parameters needed for monitoring can vary across different VMs and must be obtained in some way. Past work involves running code inside the VM, which is una...
Conference Paper
The great diffusion of cloud computing applications and services in the last years has brought new threats to security of information. IT Certification and authorization mechanisms try to provide assurance against those threats by leveraging high security standards and controls. Two examples of such certification based on IT security controls are...
Article
Full-text available
Data locality is a fundamental problem to data-parallel applications where data-processing tasks consume different amounts of time and resources at different locations. The problem is especially prominent under stressed conditions such as hot spots. While replication based on data popularity relieves hot spots due to contention for a single file, h...
Conference Paper
The infrastructure beneath a worldwide social network has to continually serve billions of variable-sized media objects such as photos, videos, and audio clips. These objects must be stored and served with low latency and high throughput by a system that is geo-distributed, highly scalable, and load-balanced. Existing file systems and object stores...
Conference Paper
Many in the networking community believe that Software-Defined Networking, in which entire networks are managed centrally, has the potential to revolutionize the field. However, SDN faces several challenges that have prevented its wide-spread adoption. Current SDN technologies, such as OpenFlow, provide powerful and flexible APIs, but can be unreas...
Technical Report
Full-text available
In this paper, we propose a data acquisition and analysis framework for materials-to-devices processes, named 4CeeD, that focuses on the immense potential of capturing, accurately curating, correlating, and coordinating materials-to-devices digital data in a real-time and trusted manner before fully archiving and publishing them for wide access and...
Conference Paper
The era of big data has led to the emergence of new systems for real-time distributed stream processing; Apache Storm, for example, is one of the most popular stream processing systems in industry today. However, Storm, like many other stream processing systems, lacks an intelligent scheduling mechanism. The default round-robin scheduling currently deployed...
Conference Paper
This paper reports experiences and lessons learned in the process of developing and implementing an undergraduate curriculum for digital forensics over the last three years at the University of Illinois at Urbana-Champaign. The project addresses the challenges of developing a higher-education standardized curriculum for digital forensics that meets...
Conference Paper
Distributed graph processing systems largely rely on proactive techniques for failure recovery. Unfortunately, these approaches (such as checkpointing) entail a significant overhead. In this paper, we argue that distributed graph processing systems should instead use a reactive approach to failure recovery. The reactive approach trades off complete...
Article
Full-text available
Genomics is a Big Data science and is going to get much bigger, very soon, but it is not known whether the needs of genomics will exceed other Big Data domains. Projecting to the year 2025, we compared genomics with three other major generators of Big Data: astronomy, YouTube, and Twitter. Our estimates show that genomics is a "four-headed beast"-i...
Article
Full-text available
The core business of many companies depends on the timely analysis of large quantities of new data. MapReduce clusters that routinely process petabytes of data represent a new entity in the evolving landscape of clouds and data centers. During the lifetime of a datacenter, old hardware needs to be eventually replaced by new hardware. The hardware s...
Article
Full-text available
Virtualization has demonstrated its importance in both public and private cloud computing solutions. In such environments, multiple virtual instances run on the same physical machine concurrently. Thus, the isolation in the system is not guaranteed by the physical infrastructure anymore. Reliance on logical isolation makes a system vulnerable to at...
Conference Paper
Full-text available
Efficient namespace metadata management is increasingly important as next-generation storage systems are designed for peta and exascales. New schemes have been proposed; however, their evaluation has been insufficient due to a lack of an appropriate namespace metadata benchmark. We describe MimesisBench, a novel namespace metadata benchmark for nex...
Conference Paper
Numerous file systems have been implemented to meet the needs of today's big data era; however, many of them require specific configurations or frameworks for data processing. This paper presents CouchFS, a POSIX-compliant distributed file system for large data sets. We build CouchFS on top of CouchDB, which grants us flexibility to handle semistruc...
Article
The ten papers in this special issue focus on the management of cloud computing services, with special emphasis on Quality-of-Service (QoS), cloud-based resources, security management, data storage, and computer architecture to support cloud services.
Conference Paper
Virtualization techniques are widely used in cloud computing environments today. Such environments are installed with a large number of similar virtual instances sharing the same physical infrastructure. In this paper, we focus on the memory usage optimization across virtual machines by automatically de-duplicating the memory on per-page basis. Our...
Article
The performance evaluation of large file systems, such as storage and media streaming, motivates scalable generation of representative traces. We focus on two key characteristics of traces, popularity and temporal locality. The common practice of using a system-wide distribution obscures per-object behavior, which is important for system e...
Article
Astronomy, as is the case with many scientific domains, has entered the realm of being a data rich science. Nowhere is this reflected more clearly than in the growth of large area surveys, such as the recently completed Sloan Digital Sky Survey (SDSS) or the Dark Energy Survey, which will soon obtain PB of imaging data. The data processing on these...
Conference Paper
Vulnerabilities in key communication protocols that drive the daily operations of the power grid may lead to exploits that could potentially disrupt its safety-critical operation and may result in loss of power, consequent financial losses, and disruption of crucial power-dependent services. This paper focuses on the Inter Control Center Communicat...
Article
Full-text available
Cloud computing offers an attractive option for businesses to rent a suitable size MapReduce cluster, consume resources as a service, and pay only for resources that were consumed. A key challenge in such environments is to increase the utilization of MapReduce clusters to minimize their cost. One way of achieving this goal is to optimize the execu...
Conference Paper
In hybrid- or multi-cloud systems, security information and event management systems often work with abstract level information provided by the service providers. Privacy and confidentiality requirements discourage sharing of the raw data. With access to only the partial information, detecting anomalies and policy violations becomes much more diffi...