Hongtu Zhu

University of North Carolina at Chapel Hill | UNC · Biomedical Research Imaging Center

PhD

About

543
Publications
84,526
Reads
16,242
Citations
Citations since 2017
237 Research Items
10,260 Citations
[Chart: citations per year, 2017 to 2023; vertical axis 0 to 2,000]
Additional affiliations
May 2016 - present
University of Texas MD Anderson Cancer Center
Position
  • Professor
August 2006 - present
University of North Carolina at Chapel Hill
Position
  • Professor (Full)
January 2004 - August 2006
New York State Psychiatric Institute
Position
  • Research Scientist VII


Publications (543)
Preprint
Full-text available
Unsupervised domain adaptation (UDA) via deep learning has attracted considerable attention for tackling domain-shift problems caused by distribution discrepancy across different domains. Existing UDA approaches depend heavily on the accessibility of source domain data, which is usually limited in practical scenarios due to privacy protection, data sto...
Article
An online marketplace is a digital platform that connects buyers (demand) and sellers (supply) and provides exposure opportunities that individual participants would not otherwise have access to. The KDD-22 Workshop on Decision Intelligence and Analytics for Online Marketplaces: Jobs, Ridesharing, Retail, and Beyond brought together academics and prac...
Article
In this paper, we present a comprehensive, in-depth survey of the literature on reinforcement learning approaches to decision optimization problems in a typical ridesharing system. Papers on the topics of rideshare matching, vehicle repositioning, ride-pooling, routing, and dynamic pricing are covered. Most of the literature has appeared in the las...
Preprint
Full-text available
Mounting evidence shows that the complex interaction between genes and environmental exposures is one of the major reasons for the diverse trajectories of micro- and macro-structural alterations across individuals. However, it is largely elusive how the multi-factorial mechanism, through which the gene-by-environment interaction exerts pathogenetic...
Preprint
Full-text available
The aim of this paper is to provide a comprehensive review of statistical challenges in neuroimaging data analysis from neuroimaging techniques to large-scale neuroimaging studies to statistical learning methods. We briefly review eight popular neuroimaging techniques and their potential applications in neuroscience research and clinical translatio...
Article
Motivated by the analysis of longitudinal neuroimaging studies, we study the longitudinal functional linear regression model under asynchronous data setting for modeling the association between clinical outcomes and functional (or imaging) covariates. In the asynchronous data setting, both covariates and responses may be measured at irregular and m...
Preprint
This paper is motivated by the joint analysis of genetic, imaging, and clinical (GIC) data collected in many large-scale biomedical studies, such as the UK Biobank study and the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. We propose a regression framework based on partially functional linear regression models to map high-dimensional G...
Preprint
Full-text available
Brain ventricular and subcortical structures are heritable both in size and shape. Genetic influences on brain region size have been studied using conventional volumetric measures, but little is known about the genetic basis of ventricular and subcortical shapes. Here we developed pipelines to extract seven complementary shape measures for lateral...
Preprint
Full-text available
Sleep is essential for the health of the brain and heart. Although sleep has been identified as a factor in a few specific clinical outcomes, a systematic analysis of the relationship between sleep and brain/heart and their genetic underpinnings is lacking. Medical images can provide useful clinical endophenotypes for organ structures and functions...
Article
Background: The objective of this study was to determine the prevalence of pyridoxine deficiency, measured by pyridoxal phosphate (PLP) levels, in patients admitted to the hospital with established (benzodiazepine-resistant) status epilepticus (SE) (eSE) and to compare to three control groups: intensive care unit (ICU) patients without SE (ICU-noSE)...
Article
Full-text available
Early dietary exposure via human milk nutrients offers a window of opportunity to support cognitive and temperament development. While several studies have focused on associations of few pre-selected human milk nutrients with cognition and temperament, it is highly plausible that human milk nutrients synergistically and jointly support cognitive an...
Article
Full-text available
This article is concerned with constructing a confidence interval for a target policy’s value offline based on pre-collected observational data in infinite horizon settings. Most existing works assume that no unmeasured variables exist that confound the observed actions. This assumption, however, is likely to be violated in real applications su...
Article
Full-text available
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused over six million deaths in the ongoing COVID-19 pandemic. SARS-CoV-2 uses ACE2 protein to enter human cells, raising a pressing need to characterize proteins/pathways that interact with ACE2. Large-scale proteomic profiling technology is not mature at single-cell resolution to exa...
Article
Full-text available
We consider reinforcement learning (RL) methods in offline domains without additional online data collection, such as mobile health applications. Most existing policy optimization algorithms in the computer science literature are developed in online settings where data are easy to collect or simulate. Their generalizations to mobile health appli...
Article
Full-text available
Attention-deficit/hyperactivity disorder (ADHD) is one of the most common neurodevelopmental disorders of childhood, and is often characterized by altered executive functioning. Executive function has been found to be supported by flexibility in dynamic brain reconfiguration. Thus, we applied multilayer community detection to resting-state fMRI dat...
Article
Over the past 30 years, magnetic resonance imaging has become a ubiquitous tool for accurately visualizing the change and development of the brain’s subcortical structures (e.g., hippocampus). Although subcortical structures act as information hubs of the nervous system, their quantification is still in its infancy due to many challenges in shape e...
Article
This paper aims to propose an intrinsic partial linear modelling (IPLM) framework for characterizing the complex relationship between manifold-valued response data and a set of explanatory variables such as age, education years, or gender. Such manifold-valued data are widespread in medical imaging, gesture recognition, computer vision, feature...
Article
Full-text available
Fiber photometry is an emerging technique for recording fluorescent sensor activity in the brain. However, significant hemoglobin absorption artifacts in fiber photometry data may be misinterpreted as sensor activity changes. Because hemoglobin exists widely in the brain, and its concentration varies temporally, such artifacts could impede the accu...
Article
Full-text available
Spatial transcriptomics (ST) technologies allow researchers to examine transcriptional profiles along with maintained positional information. Such spatially resolved transcriptional characterization of intact tissue samples provides an integrated view of gene expression in its natural spatial and functional context. However, high-throughput sequenc...
Article
Full-text available
Single-cell RNA sequencing studies have suggested that total mRNA content correlates with tumor phenotypes. Technical and analytical challenges, however, have so far impeded at-scale pan-cancer examination of total mRNA content. Here we present a method to quantify tumor-specific total mRNA expression (TmS) from bulk sequencing data, taking into ac...
Article
Alzheimer’s disease is a progressive form of dementia that results in problems with memory, thinking, and behavior. It often starts with abnormal aggregation and deposition of β amyloid and tau, followed by neuronal damage such as atrophy of the hippocampi, leading to Alzheimer’s disease (AD). The aim of this article is to map the genetic-imaging-c...
Preprint
Full-text available
Early dietary exposure via human milk (HM) components offers a window of opportunity to support cognitive and temperamental development. While several studies have focused on associations of few pre-selected HM components with cognition and temperament, it is highly plausible that HM components synergistically and jointly support cognitive and beha...
Article
Full-text available
This paper investigates the central limit theorem for linear spectral statistics of high-dimensional sample covariance matrices of the form $B_n = n^{-1} \sum_{j=1}^n Q x_j x_j^* Q^*$ under the assumption that $p/n \to y > 0$, where $Q$ is a $p \times k$ nonrandom matrix and $\{x_j\}_{j=1}^n$ is a sequence of independent $k$-dimensional random vectors with i...
Article
Osteoarthritis (OA) is the most common disabling joint disease. Magnetic resonance (MR) imaging has been commonly used to assess knee joint degeneration due to its distinct advantage in detecting morphologic cartilage changes. Although several statistical methods over conventional radiography have been developed to perform quantitative cartilage an...
Article
Full-text available
The human brain forms functional networks of correlated activity, which have been linked with both cognitive and clinical outcomes. However, the genetic variants affecting brain function are largely unknown. Here, we used resting-state functional magnetic resonance images from 47,276 individuals to discover and validate common genetic variants infl...
Preprint
Full-text available
The aim of this paper is to propose a novel estimation method that uses genetic-predicted observations to estimate trans-ancestry genetic correlations, which describe how the genetic architecture of complex traits varies among populations, in genome-wide association studies (GWAS). Our new estimator corrects for prediction errors caused by high-dimensi...
Preprint
Full-text available
Genetic prediction of complex traits and diseases has attracted enormous attention in precision medicine, mainly because it has the potential to translate discoveries from genome-wide association studies (GWAS) into medical advances. As the high dimensional covariance matrix (or the linkage disequilibrium (LD) pattern) of genetic variants has a blo...
Preprint
Uplift modeling is a rapidly growing approach that utilizes machine learning and causal inference methods to estimate the heterogeneous treatment effects. It has been widely adopted and applied to online marketplaces to assist large-scale decision-making in recent years. The existing popular methods, like forest-based modeling, either work only for...
Article
In the era of big data, univariate models have been widely used as a workhorse tool for quickly producing marginal estimators, and this is true even in a high-dimensional dense setting, in which many features are ‘true’ but weak signals. Genome-wide association studies (GWAS) epitomize this type of setting. Although the GWAS marginal estimato...
Preprint
We consider reinforcement learning (RL) methods in offline domains without additional online data collection, such as mobile health applications. Most existing policy optimization algorithms in the computer science literature are developed in online settings where data are easy to collect or simulate. Their generalizations to mobile health appli...
Preprint
Full-text available
Functional magnetic resonance imaging (fMRI) has been widely used to identify brain regions linked to critical functions, such as language and vision, and to detect tumors, strokes, brain injuries, and diseases. It is now known that large sample sizes are necessary for fMRI studies to detect small effect sizes and produce reproducible results. Here...
Preprint
Full-text available
Spatial transcriptomic (ST) technologies allow researchers to examine high-quality RNA-sequencing data along with maintained two-dimensional positional information as well as a co-registered histology image. A popular use of ST omics data is to provide insights about tissue structure and spatially unique features. However, due to the technical natu...
Preprint
Policy evaluation based on A/B testing has attracted considerable interest in digital marketing, but such evaluation in ride-sourcing platforms (e.g., Uber and Didi) is not well studied primarily due to the complex structure of their temporal and/or spatial dependent experiments. Motivated by policy evaluation in ride-sourcing platforms, the aim of...
Preprint
This paper is concerned with constructing a confidence interval for a target policy's value offline based on pre-collected observational data in infinite horizon settings. Most existing works assume that no unmeasured variables exist that confound the observed actions. This assumption, however, is likely to be violated in real applications such...
Preprint
Two-sided markets, such as ride-sharing companies, often involve a group of subjects who are making sequential decisions across time and/or location. With the rapid development of smartphones and the Internet of Things, they have substantially transformed the transportation landscape. In this paper we consider large-scale fleet manag...
Article
This paper develops a functional hybrid factor regression modelling framework to handle the heterogeneity of many large-scale imaging studies, such as the Alzheimer’s disease neuroimaging initiative study. Despite the numerous successes of those imaging studies, such heterogeneity may be caused by the differences in study environment, population, d...
Article
Full-text available
A/B testing, or online experimentation, is a standard business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries. Major challenges arise in online experiments of two-sided marketplace platforms (e.g., Uber) where there is only one unit that receives a sequence of treatments over time. In those...
Article
Modern biomedical studies often collect multi-view data, that is, multiple types of data measured on the same set of objects. A popular model in high-dimensional multi-view data analysis is to decompose each view's data matrix into a low-rank common-source matrix generated by latent factors common across all data views, a low-rank distinctive-sourc...
Preprint
Tech companies (e.g., Google or Facebook) often use randomized online experiments and/or A/B testing primarily based on the average treatment effects to compare their new product with an old one. However, it is also critically important to detect qualitative treatment effects such that the new one may significantly outperform the existing one only...
Preprint
Full-text available
Cardiovascular health interacts with cognitive and psychological health in complex ways. Yet, little is known about the phenotypic and genetic links of heart-brain systems. Using cardiac and brain magnetic resonance imaging (CMR and brain MRI) data from over 40,000 UK Biobank subjects, we developed detailed analyses of the structural and functional...
Article
Tech companies (e.g., Google or Facebook) often use randomized online experiments and/or A/B testing primarily based on the average treatment effects to compare their new product with an old one. However, it is also critically important to detect qualitative treatment effects such that the new one may significantly outperform the existing one only...
Article
As demands for convenient and comfortable mobility grow, transportation network companies (TNCs) begin to diversify the ride-hailing services they offer. Modes that are offered now include ride-pooling (RP), non-ride-pooling (NP), and a third “bundled” option, which combines RP and NP. This emerging bundled option allows riders to be served via eit...
Article
We present a new practical framework based on deep reinforcement learning and decision-time planning for real-world vehicle repositioning on ride-hailing (a type of mobility-on-demand, MoD) platforms. Our approach learns the spatiotemporal state-value function using a batch training algorithm with deep value networks. The optimal repositioning acti...
Preprint
Full-text available
Fiber-photometry is an emerging technique for recording fluorescent sensor activity in the brain. However, significant hemoglobin-absorption artifacts in fiber-photometry data may be misinterpreted as sensor activity changes. Because hemoglobin exists in nearly every location in the brain and its concentration varies over time, such artifacts could...
Article
Despite interest in the joint modeling of multiple functional responses such as diffusion properties in neuroimaging, robust statistical methods appropriate for this task are lacking. To address this need, we propose a varying coefficient quantile regression model able to handle bivariate functional responses. Our work supports innovative insights...
Article
Full-text available
Individual variations of white matter (WM) tracts are known to be associated with various cognitive and neuropsychiatric traits. Diffusion tensor imaging (DTI) and genome-wide single-nucleotide polymorphism (SNP) data from 17,706 UK Biobank participants offer the opportunity to identify novel genetic variants of WM tracts and explore the genetic ov...
Article
This paper proposes a macroscopic fluid modeling framework to assist with strategic decision making of a platform for operating a large-scale on-demand app-based ride-hailing system. The framework captures the spatiotemporal characteristics of a ride-hailing system, and is flexible in representing control policies that a platform is implementing. I...
Preprint
Full-text available
The human cerebral cortex plays a crucial role in brain functions. However, genetic influences on the human cortical functional organizations are not well understood. Using a parcellation-based approach with resting-state and task-evoked functional magnetic resonance imaging (fMRI) from 40,253 individuals, we identified 47 loci associated with func...
Article
The aim of this paper is to develop a weighted functional linear Cox regression model that accounts for the association between a failure time and a set of functional and scalar covariates. We formulate the weighted functional linear Cox regression by incorporating a comprehensive three-stage estimation procedure as a unified methodology. Specifica...
Article
Connecting the dots on white matter: The white matter of the brain, which is composed of axonal tracts connecting different brain regions, plays key roles in both normal brain function and a variety of neurological disorders. Zhao et al. combined detailed magnetic resonance imaging–based assessment of brain structures with genetic data on nearly 44,...
Preprint
Full-text available
Recent works on ride-sharing order dispatching have highlighted the importance of taking into account both the spatial and temporal dynamics in the dispatching process for improving the transportation system efficiency. At the same time, deep reinforcement learning has advanced to the point where it achieves superhuman performance in a number of fi...
Preprint
Full-text available
Cancers can vary greatly in their transcriptomes. In contrast to alterations in specific genes or pathways, differences in tumor cell total mRNA content have not been comprehensively assessed. Technical and analytical challenges have impeded examination of total mRNA expression at scale across cancers. To address this, we developed a model for quan...
Preprint
Full-text available
Large ride-hailing platforms, such as DiDi, Uber and Lyft, connect tens of thousands of vehicles in a city to millions of ride demands throughout the day, providing great promises for improving transportation efficiency through the tasks of order dispatching and vehicle repositioning. Existing studies, however, usually consider the two tasks in sim...
Article
Full-text available
Structural variations of the human brain are heritable and highly polygenic traits, with hundreds of associated genes identified in recent genome-wide association studies (GWAS). Transcriptome-wide association studies (TWAS) can both prioritize these GWAS findings and also identify additional gene-trait associations. Here we perform cross-tissue TW...
Preprint
Full-text available
In this paper, we present a comprehensive, in-depth survey of the literature on reinforcement learning approaches to ridesharing problems. Papers on the topics of rideshare matching, vehicle repositioning, ride-pooling, and dynamic pricing are covered. Popular data sets and open simulation environments are also introduced. Subsequently, we discuss...
Chapter
Children’s urban air pollution exposures result in systemic and brain inflammation and the early hallmarks of Alzheimer’s disease (AD). The apolipoprotein E (APOE) ε4 allele is the most prevalent genetic risk for AD. We assessed whether APOE in healthy children modulates cognition, olfaction, and metabolic brain indices. The Wechsler Intelligence S...
Preprint
Full-text available
Subpopulations of tumor cells characterized by mutation profiles may confer differential fitness and consequently influence prognosis of cancers. Understanding subclonal architecture has the potential to provide biological insight in tumor evolution and advance precision cancer treatment. Recent methods comprehensively integrate single nucleotide v...
Article
Full-text available
Intra-tumor heterogeneity (ITH) is a mechanism of therapeutic resistance and therefore an important clinical challenge. However, the extent, origin, and drivers of ITH across cancer types are poorly understood. To address this, we extensively characterize ITH across whole-genome sequences of 2,658 cancer samples spanning 38 cancer types. Nearly all...
Chapter
Tumor segmentation is an important research topic in medical image segmentation. With the fast development of deep learning in computer vision, automated segmentation of brain tumors using deep neural networks has become increasingly popular. U-Net is the most widely used network in the applications of automated image segmentation. Many well-performing...
Article
The cross-trait polygenic risk score (PRS) method has gained popularity for assessing genetic correlation of complex traits using summary statistics from biobank-scale genome-wide association studies (GWAS). However, empirical evidence has shown a common bias phenomenon that highly significant cross-trait PRS can only account for a very small amount of...
Preprint
Full-text available
We present a new practical framework based on deep reinforcement learning and decision-time planning for real-world vehicle repositioning on ride-hailing (a type of mobility-on-demand, MoD) platforms. Our approach learns the spatiotemporal state-value function using a batch training algorithm with deep value networks. The optimal repositioning acti...
Article
How to dynamically measure the local-to-global spatio-temporal coherence between demand and supply networks is a fundamental task for ride-sourcing platforms, such as DiDi. Such coherence measurement is critically important for the quantification of the market efficiency and the comparison of different platform policies, such as dispatching. The ai...
Article
We thank all the discussants for sharing their valuable viewpoints on the proposed statistical disease mapping (SDM) framework. In our article, we addressed the issue of imaging heterogeneity at both the global and local scales by efficiently borrowing common information shared among a large number of diseased and normal subjects. Understanding...
Preprint
Full-text available
The aim of this paper is to introduce a novel graph-based equilibrium metric (GEM) to quantify the distance between two discrete measures with possibly different masses on a weighted graph structure. This development is primarily motivated by dynamically measuring the local-to-global spatio-temporal coherence between demand and supply networks obta...
Article
Full-text available
Classical clusterwise linear regression is a useful method for investigating the relationship between scalar predictors and scalar responses with heterogeneous variation of regression patterns for different subgroups of subjects. This paper extends the classical clusterwise linear regression to incorporate multiple functional predictors by represen...
Article
Many cancers and neuro-related diseases display significant phenotypic and genetic heterogeneity across subjects and subpopulations. Characterizing such heterogeneity could transform our understanding of the etiology of these conditions and inspire new approaches to urgently needed prevention, diagnosis, treatment, and prognosis. However, most e...