Xianyuan Zhan
Tsinghua University | TH

Assistant Professor

About

85
Publications
27,308
Reads
1,809
Citations
Introduction
Offline reinforcement learning, data-driven methods for transportation applications
Additional affiliations
August 2011 - present
Purdue University West Lafayette
Position
  • Research Assistant

Publications (85)
Chapter
Semi-supervised learning holds great promise for many real-world applications, due to its ability to leverage both unlabeled and expensive labeled data. However, most semi-supervised learning algorithms still heavily rely on the limited labeled data to infer and utilize the hidden information from unlabeled data. We note that any semi-supervised le...
Preprint
Full-text available
Multimodal task specification is essential for enhanced robotic performance, where Cross-modality Alignment enables the robot to holistically understand complex task instructions. Directly annotating multimodal instructions for model training proves impractical, due to the sparsity of paired multimodal data. In this study, we demonstrate t...
Preprint
Full-text available
Reusing pre-collected data from different domains is an appealing solution for decision-making tasks that have insufficient data in the target domain but are relatively abundant in other related domains. Existing cross-domain policy transfer methods mostly aim at learning domain correspondences or corrections to facilitate policy learning, such as...
Conference Paper
Full-text available
The burgeoning fields of robot learning and embodied AI have triggered an increasing demand for large quantities of data. However, collecting sufficient unbiased data from the target domain remains a challenge due to costly data collection processes and stringent safety requirements. Consequently, researchers often resort to data from easily access...
Preprint
Full-text available
One important property of DIstribution Correction Estimation (DICE) methods is that the solution is the optimal stationary distribution ratio between the optimized and data collection policy. In this work, we show that DICE-based methods can be viewed as a transformation from the behavior distribution to the optimal policy distribution. Based on th...
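As background for the distribution-ratio view described in this entry, here is a minimal sketch in standard DICE notation (symbols are generic, not taken from this preprint): d^π denotes the normalized discounted state-action occupancy of the optimized policy π, and d^D that of the data-collection policy.

    \zeta(s,a) = \frac{d^{\pi}(s,a)}{d^{D}(s,a)}, \qquad
    J(\pi) = \frac{1}{1-\gamma}\,\mathbb{E}_{(s,a)\sim d^{D}}\big[\zeta(s,a)\, r(s,a)\big]

Estimating ζ lets the policy value J(π) be computed entirely from offline data, which is the property the abstract highlights.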
Preprint
Full-text available
Hierarchical reinforcement learning (HRL) addresses complex long-horizon tasks by skillfully decomposing them into subgoals. Therefore, the effectiveness of HRL is greatly influenced by subgoal reachability. Typical HRL methods only consider subgoal reachability from the unilateral level, where a dominant level enforces compliance to the subordinat...
Preprint
Full-text available
Instruction following is crucial in contemporary LLMs. However, when extended to multimodal settings, it often suffers from misalignment between the specific textual instruction and the targeted local region of an image. To achieve more accurate and nuanced multimodal instruction following, we introduce Instruction-guided Visual Masking (IVM), a new versatil...
Preprint
Full-text available
Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge. Existing works often overlook the distribution discrepancies induced by policy or dynamics shifts, or rely on specialized algorithms with task priors, thus often resulting in suboptimal policy per...
Preprint
Full-text available
Off-policy reinforcement learning (RL) has achieved notable success in tackling many complex real-world tasks, by leveraging previously collected data for policy learning. However, most existing off-policy RL algorithms fail to maximally exploit the information in the replay buffer, limiting sample efficiency and policy performance. In this work, w...
Preprint
Full-text available
Multimodal pretraining has emerged as an effective strategy for the trinity of goals of representation learning in autonomous robots: 1) extracting both local and global task progression information; 2) enforcing temporal consistency of visual representation; 3) capturing trajectory-level language grounding. Most existing methods approach these via...
Conference Paper
Full-text available
In this study, we investigate DIstribution Correction Estimation (DICE) methods, an important line of work in offline reinforcement learning (RL) and imitation learning (IL). DICE-based methods impose a state-action-level behavior constraint, which is an ideal choice for offline learning. However, they typically perform much worse than current...
Preprint
Full-text available
The burgeoning fields of robot learning and embodied AI have triggered an increasing demand for large quantities of data. However, collecting sufficient unbiased data from the target domain remains a challenge due to costly data collection processes and stringent safety requirements. Consequently, researchers often resort to data from easily access...
Conference Paper
Full-text available
Safe offline reinforcement learning is a promising way to bypass risky online interactions towards safe policy learning. Most existing methods only enforce soft constraints, i.e., constraining safety violations in expectation below predetermined thresholds. This can lead to potentially unsafe outcomes, which is unacceptable in safety-critical scenarios...
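For reference, the soft (expectation-level) constraint this entry contrasts against can be written in standard constrained-MDP notation (c and κ are generic cost and threshold symbols, not taken from the paper):

    \max_{\pi}\ \mathbb{E}_{\tau\sim\pi}\Big[\textstyle\sum_{t}\gamma^{t} r(s_t,a_t)\Big]
    \quad \text{s.t.} \quad
    \mathbb{E}_{\tau\sim\pi}\Big[\textstyle\sum_{t}\gamma^{t} c(s_t,a_t)\Big] \le \kappa

A hard (state-wise) constraint would instead require the cost bound to hold at every reachable state rather than merely in expectation, which is the stricter notion the abstract alludes to.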
Preprint
Full-text available
Solving real-world complex tasks using reinforcement learning (RL) without high-fidelity simulation environments or large amounts of offline data can be quite challenging. Online RL agents trained in imperfect simulation environments can suffer from severe sim-to-real issues. Offline RL approaches, although they bypass the need for simulators, often pose...
Preprint
Full-text available
Offline reinforcement learning (RL) has received considerable attention in recent years due to its attractive capability of learning policies from offline datasets without environmental interactions. Despite some success in the single-agent setting, offline multi-agent RL (MARL) remains a challenge. The large joint state-action space and the...
Preprint
Full-text available
Offline reinforcement learning (RL) that learns policies from offline datasets without environment interaction has received considerable attention in recent years. Compared with the rich literature in the single-agent case, offline multi-agent RL is still a relatively underexplored area. Most existing methods directly apply offline RL ingredients i...
Preprint
Full-text available
Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets without interacting with the environment. However, the performance of existing offline RL algorithms heavily depends on the scale and state-action space coverage of datasets. Real-world data collection is often expen...
Preprint
Full-text available
Learning high-quality Q-value functions plays a key role in the success of many modern off-policy deep reinforcement learning (RL) algorithms. Previous works focus on addressing the value overestimation issue, an outcome of adopting function approximators and off-policy learning. Deviating from the common viewpoint, we observe that Q-values are ind...
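For background on the value-overestimation issue this entry mentions, below is a minimal PyTorch-style sketch of the standard clipped double-Q target; this is the common remedy from TD3, not the method proposed in the preprint, and all names are illustrative.

    import torch

    def td_target(reward, next_state, done, q1_target, q2_target, policy, gamma=0.99):
        """Clipped double-Q target: taking the min of two target critics
        counteracts the upward bias of max-based bootstrapping."""
        with torch.no_grad():
            next_action = policy(next_state)
            q_next = torch.min(q1_target(next_state, next_action),
                               q2_target(next_state, next_action))
            return reward + gamma * (1.0 - done) * q_next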
Preprint
Full-text available
Preference-based reinforcement learning (PbRL) provides a natural way to align RL agents’ behavior with human desired outcomes, but is often restrained by costly human feedback. To improve feedback efficiency, most existing PbRL methods focus on selecting queries to maximally improve the overall quality of the reward model, but counter-intuitively,...
Preprint
Full-text available
Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining and online finetuning, promises enhanced sample efficiency and policy performance. However, existing methods, effective as they are, suffer from suboptimal performance, limited adaptability, and unsatisfactory computational efficiency. We propose a novel...
Conference Paper
Full-text available
In offline reinforcement learning (RL), one detrimental issue to policy learning is the error accumulation of deep Q function in out-of-distribution (OOD) areas. Unfortunately, existing offline RL methods are often over-conservative, inevitably hurting generalization performance outside data distribution. In our study, one interesting observation i...
Preprint
Full-text available
In the field of artificial intelligence for science, a persistent and essential challenge is the limited amount of labeled data for real-world problems. The prevailing approach is to pretrain a powerful task-agnostic model on a large unlabeled corpus, but such models may struggle to transfer knowledge to downstream tasks. In this study, we propose Instr...
Preprint
Full-text available
Most offline reinforcement learning (RL) methods suffer from the trade-off between improving the policy to surpass the behavior policy and constraining the policy to limit its deviation from the behavior policy, since computing Q-values using out-of-distribution (OOD) actions will suffer from errors due to distributional shift. The recently proposed...
Conference Paper
Full-text available
Most offline reinforcement learning (RL) methods suffer from the trade-off between improving the policy to surpass the behavior policy and constraining the policy to limit its deviation from the behavior policy, since computing Q-values using out-of-distribution (OOD) actions will suffer from errors due to distributional shift. The recently proposed In-s...
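One representative way to sidestep OOD actions entirely, as discussed in this entry, is in-sample value learning via expectile regression (the formulation popularized by IQL; whether it matches the truncated method name above is an assumption). A minimal sketch:

    import torch

    def expectile_loss(q_values, v_values, tau=0.7):
        """Expectile regression on in-sample data (as in IQL): V is pushed
        toward an upper expectile of Q, never querying OOD actions."""
        diff = q_values - v_values
        weight = torch.abs(tau - (diff < 0).float())
        return (weight * diff.pow(2)).mean()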
Preprint
Full-text available
In offline reinforcement learning (RL), one detrimental issue to policy learning is the error accumulation of deep Q function in out-of-distribution (OOD) areas. Unfortunately, existing offline RL methods are often over-conservative, inevitably hurting generalization performance outside data distribution. In our study, one interesting observation i...
Preprint
Full-text available
The reward function is essential in reinforcement learning (RL), serving as the guiding signal that incentivizes agents to solve given tasks; however, it is also notoriously difficult to design. In many cases, only imperfect rewards are available, which inflicts substantial performance loss on RL agents. In this study, we propose a unified offline policy op...
Chapter
Full-text available
Contrastive learning (CL) has recently been applied to adversarial learning tasks. Such practice considers adversarial samples as additional positive views of an instance, and by maximizing their agreements with each other, yields better adversarial robustness. However, this mechanism can be potentially flawed, since adversarial perturbations may c...
Conference Paper
Full-text available
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-based and imitation-based. RL-based methods could in principle enjoy out-of-distribution generalization but suffer from erroneous off-policy evaluation. Imitation-based methods avoid off-policy evaluation but are too conservative to surpass the dataset. In t...
Preprint
Full-text available
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution generalization but suffer from erroneous off-policy evaluation. Imitation-based methods avoid off-policy evaluation but are too conservative to surpass the dataset. In t...
Article
Full-text available
Improving the efficiency of coal-fired power plants has numerous benefits. The control strategy is one of the major factors affecting such efficiency. However, due to the complex and dynamic environment inside power plants, it is hard to extract and evaluate control strategies and their cascading impact across massive numbers of sensors. Existing manual a...
Preprint
Full-text available
Heated debates continue over the best solution for autonomous driving. The classic modular pipeline is widely adopted in the industry owing to its great interpretability and stability, whereas the fully end-to-end paradigm has demonstrated considerable simplicity and learnability along with the rise of deep learning. As a way of marrying the advant...
Conference Paper
Full-text available
Learning effective reinforcement learning (RL) policies to solve real-world complex tasks can be quite challenging without a high-fidelity simulation environment. In most cases, we are only given imperfect simulators with simplified dynamics, which inevitably lead to severe sim-to-real gaps in RL policy learning. The recently emerged field of offli...
Preprint
Full-text available
We study the problem of offline Imitation Learning (IL) where an agent aims to learn an optimal expert behavior policy without additional online environment interactions. Instead, the agent is provided with a supplementary offline dataset from suboptimal behaviors. Prior works that address this problem either require that expert data occupies the m...
Preprint
Full-text available
Contrastive learning (CL) has recently been applied to adversarial learning tasks. Such practice considers adversarial samples as additional positive views of an instance, and by maximizing their agreements with each other, yields better adversarial robustness. However, this mechanism can be potentially flawed, since adversarial perturbations may c...
Preprint
Full-text available
Offline imitation learning (IL) is a powerful method to solve decision-making problems from expert demonstrations without reward labels. Existing offline IL methods suffer from severe performance degeneration under limited expert data due to covariate shift. Including a learned dynamics model can potentially improve the state-action space coverage...
Conference Paper
Full-text available
The recent offline reinforcement learning (RL) studies have achieved much progress to make RL usable in real-world systems by learning policies from pre-collected datasets without environment interaction. Unfortunately, existing offline RL methods still face many practical challenges in real-world system control tasks, such as computational restric...
Preprint
Full-text available
Learning effective reinforcement learning (RL) policies to solve real-world complex tasks can be quite challenging without a high-fidelity simulation environment. In most cases, we are only given imperfect simulators with simplified dynamics, which inevitably lead to severe sim-to-real gaps in RL policy learning. The recently emerged field of offli...
Article
Full-text available
Optimizing the combustion efficiency of a thermal power generating unit (TPGU) is a highly challenging and critical task in the energy industry. We develop a new data-driven AI system, namely DeepThermal, to optimize the combustion control strategy for TPGUs. At its core is a new model-based offline reinforcement learning (RL) framework, called MO...
Article
Full-text available
We study the problem of safe offline reinforcement learning (RL), where the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment. This problem is more appealing for real-world RL applications, in which data collection is costly or dangerous....
Conference Paper
Full-text available
We study the problem of offline Imitation Learning (IL) where an agent aims to learn an optimal expert behavior policy without additional online environment interactions. Instead, the agent is provided with a supplementary offline dataset from suboptimal behaviors. Prior works that address this problem either require that expert data occupies the m...
Article
Full-text available
Detecting anomalies in large complex systems is a critical and challenging task. The difficulties arise from several aspects. First, collecting ground-truth labels or prior knowledge for anomalies is hard in real-world systems, which often leads to limited or no anomaly labels in the dataset. Second, anomalies in large systems usually occur in a col...
Article
Offline reinforcement learning (RL) enables learning policies from pre-collected datasets without online data collection. Although it offers the possibility to surpass the performance of the datasets, most existing offline RL algorithms struggle to compete with behavior cloning policies in many dataset settings due to trading off policy improvement...
Preprint
Full-text available
End-to-end learning of robotic manipulation with high data efficiency is one of the key challenges in robotics. The latest methods that utilize human demonstration data and unsupervised representation learning have proven to be a promising direction for improving RL learning efficiency. The use of demonstration data also allows "warming-up" the RL policie...
Preprint
Most prior approaches to offline reinforcement learning (RL) utilize behavior regularization, typically augmenting existing off-policy actor-critic algorithms with a penalty measuring divergence between the policy and the offline data. However, these approaches lack guaranteed performance improvement over the behavior policy. In this work,...
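The behavior-regularization recipe described in this entry can be made concrete with a minimal TD3+BC-style actor loss; this is one illustrative instance of the general penalty, not this preprint's method, and alpha and all names are assumptions.

    import torch
    import torch.nn.functional as F

    def regularized_actor_loss(policy, critic, states, dataset_actions, alpha=2.5):
        """Maximize Q under a behavior-cloning penalty that keeps the policy
        close to the offline data (TD3+BC-style scaling of the trade-off)."""
        actions = policy(states)
        q = critic(states, actions)
        lam = alpha / q.abs().mean().detach()   # normalize Q-term magnitude
        return -(lam * q).mean() + F.mse_loss(actions, dataset_actions)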
Preprint
Heated debates continue over the best autonomous driving framework. The classic modular pipeline is widely adopted in the industry owing to its great interpretability and stability, whereas the end-to-end paradigm has demonstrated considerable simplicity and learnability along with the rise of deep learning. We introduce a new modularized end-to-en...
Preprint
Full-text available
We study the problem of safe offline reinforcement learning (RL), where the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment. This problem is more appealing for real-world RL applications, in which data collection is costly or dangerous....
Conference Paper
Full-text available
Purchase prediction is an essential task in both the online and offline retail industries, especially during major shopping festivals, when strong promotions boost consumption dramatically. It is important for merchants to forecast such surges in sales and prepare accordingly. This is a challenging problem, as the purchase patterns during shopping fe...
Conference Paper
Full-text available
Accurate network-wide traffic state estimation is vital to many transportation operations and urban applications. However, existing methods often suffer from scalability issues when performing real-time inference at the city level, or are not robust enough under limited data. Currently, GPS trajectory data from probe vehicles has become a popular da...
Preprint
Full-text available
Detecting anomalies in large complex systems is a critical and challenging task. The difficulties arise from several aspects. First, collecting ground-truth labels or prior knowledge for anomalies is hard in real-world systems, which often leads to limited or no anomaly labels in the dataset. Second, anomalies in large systems usually occur in a col...
Article
Full-text available
Purchase prediction is an essential task in both the online and offline retail industries, especially during major shopping festivals, when strong promotions boost consumption dramatically. It is important for merchants to forecast such surges in sales and prepare accordingly. This is a challenging problem, as the purchase patterns during shopping fes...
Preprint
Full-text available
Offline reinforcement learning (RL) enables learning policies using pre-collected datasets without environment interaction, which provides a promising direction to make RL useable in real-world systems. Although recent offline RL studies have achieved much progress, existing methods still face many practical challenges in real-world system control...
Preprint
Full-text available
Thermal power generation plays a dominant role in the world's electricity supply. It consumes large amounts of coal worldwide, and causes serious air pollution. Optimizing the combustion efficiency of a thermal power generating unit (TPGU) is a highly challenging and critical task in the energy industry. We develop a new data-driven AI system, name...
Article
Full-text available
A key issue in understanding urban systems is characterizing the activity dynamics in a city: when, where, what, and how activities happen. To better understand urban activity dynamics, city-wide and multiday activity participation sequence data, namely activity chains, as well as suitable spatiotemporal models, are needed. The commonly us...
Article
License-plate recognition (LPR) data are an emerging, information-rich data source in urban transportation systems. Large-scale LPR systems have seen rapid development in many parts of the world. However, limited by privacy considerations, LPR data are seldom available to the research community, which leads to a huge research gap in data-dr...
Article
The spatial correlation between urban sprawl and the underlying road network has long been recognized in urban studies. Accessibility to road networks is often considered an approximation for the measurement of human mobility, which is a key factor in determining potential urban sprawl in the future. Despite the close relationship between urban dev...
Article
Full-text available
In this paper, we used complex network analysis approaches to investigate topological coevolution over a century for three different urban infrastructure networks. We applied network analyses to a unique time-stamped network data set of an Alpine case study, representing the historical development of the town and its infrastructure over the past 10...
Preprint
Full-text available
Effective evacuation of residents in hurricane-affected areas is essential in reducing the overall damage and ensuring public safety. However, traffic flow patterns in evacuation contexts are far more complex than normal traffic and are usually accompanied by severe congestion due to the presence of evacuees. In such scenarios, agent-based simulati...
Article
Full-text available
Just as natural river networks are known to be globally self-similar, recent research has shown that human-built urban networks, such as road networks, are also functionally self-similar, and have fractal topology with power-law node-degree distributions (p(k) = a k^(-γ)). Here we show, for the first time, that other urban infrastructure networks (sanit...
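As a side note on the power-law form p(k) = a k^(-γ) above, a quick (and statistically naive) way to eyeball the exponent from a degree sequence is a log-log least-squares fit; maximum-likelihood estimators are preferred in practice, and all names below are illustrative.

    import numpy as np

    def powerlaw_exponent(degrees):
        """Rough log-log least-squares estimate of gamma in p(k) ~ a * k**(-gamma)."""
        ks, counts = np.unique(np.asarray(degrees), return_counts=True)
        mask = ks > 0                       # log undefined for k = 0
        pk = counts[mask] / counts.sum()    # empirical degree distribution
        slope, _intercept = np.polyfit(np.log(ks[mask]), np.log(pk), 1)
        return -slope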
Article
Full-text available
We propose a new framework for modeling the evolution of functional failures and recoveries in complex networks, with traffic congestion on road networks as the case study. Unlike conventional approaches, we transform the evolution of functional states into an equivalent dynamic structural process: dual-vertex splitting and coalescing emb...
Article
Full-text available
This paper develops a complementarity formulation for a multi-user class, simultaneous route and departure time choice dynamic user equilibrium (DUE) model. A path-based multiclass cell transmission model (mCTM) is embedded to propagate the traffic flow on the network. Heterogeneous user classes are incorporated in the new formulation and heterogen...
Article
Full-text available
Hard shoulder running (HSR) and queue warning are two active traffic management (ATM) strategies commonly used to alleviate highway traffic congestion. This study proposes an optimisation model for HSR operation in coordination with queue warning service during non-recurring traffic accident conditions, using an updated cell transmission mode...
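For context on the cell transmission model (CTM) referenced here, the basic Daganzo update is sketched below in generic notation; the paper uses an updated variant, so this is background rather than its exact formulation.

    y_i(t) = \min\Big\{ n_{i-1}(t),\; Q_i(t),\; \tfrac{w}{v}\big[N_i(t) - n_i(t)\big] \Big\}, \qquad
    n_i(t+1) = n_i(t) + y_i(t) - y_{i+1}(t)

Here n_i(t) is the number of vehicles in cell i, N_i(t) its holding capacity, Q_i(t) the maximum flow into the cell, v the free-flow speed, and w the backward wave speed.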
Article
Full-text available
Personal mobility carbon allowance (PMCA) schemes are designed to reduce carbon consumption from transportation networks. PMCA schemes influence the travel decision process of users and accordingly impact the system metrics including travel time and greenhouse gas (GHG) emissions. We develop a multi-user class dynamic user equilibrium model to eval...
Conference Paper
Full-text available
In this paper, we investigate the historical development of complex network topologies in urban water distribution networks (WDNs) and urban drainage networks (UDNs). The analyses were performed on time-stamped network data of an Alpine case study, which represent the evolution of the town and its infrastructure over the past 106 years. We use the...
Article
Full-text available
Household behavior and dynamic traffic flows are the two most important aspects of hurricane evacuations. However, current evacuation models largely overlook the complexity of household behavior, leading to oversimplified traffic assignments and, as a result, inaccurate evacuation clearance times in the network. In this paper, we present a high fide...
Article
Full-text available
We examine high-resolution urban infrastructure data using every pipe for the water distribution network (WDN) and sanitary sewer network (SSN) in a large Asian city (≈4 million residents) to explore the structure as well as the spatial and temporal evolution of these infrastructure networks. Network data were spatially disaggregated into multiple...