Wenjun Zou

  • Tsinghua University

About

13 Publications
1,451 Reads
82 Citations
Current institution: Tsinghua University

Publications (13)
Preprint
Full-text available
Due to their expressive capacity, diffusion models have shown great promise in offline RL and imitation learning. Diffusion Actor-Critic with Entropy Regulator (DACER) extended this capability to online RL by using the reverse diffusion process as a policy approximator, trained end-to-end with policy gradient methods, achieving strong performance....
Poster
Full-text available
Goal-reaching tasks with safety constraints are common control problems in the real world, such as intelligent driving and robot manipulation. The difficulty of this kind of problem comes from the exploration termination caused by safety constraints and the sparse rewards caused by goals. Existing safe RL avoids unsafe exploration by restrictin...
Data
https://icml.cc/virtual/2024/poster/33234
Preprint
Full-text available
Reinforcement learning (RL) has proven highly effective in addressing complex decision-making and control tasks. However, in most traditional RL algorithms, the policy is typically parameterized as a diagonal Gaussian distribution with learned mean and variance, which constrains their capability to acquire complex policies. In response to this prob...
Conference Paper
Full-text available
Goal-reaching tasks with safety constraints are common control problems in the real world, such as intelligent driving and robot manipulation. The difficulty of this kind of problem comes from the exploration termination caused by safety constraints and the sparse rewards caused by goals. Existing safe RL avoids unsafe exploration by restrictin...
Article
In the paper, an integrated decision control (IDC) architecture has been introduced, seamlessly integrating autonomous decision-making and motion control into a unified processing framework. This architecture primarily comprises two key modules: a static path planner and an MPC-based dynamic optimal tracker. The former exclusively utilizes static in...
Article
Full-text available
Though policy evaluation error profoundly affects the direction of policy optimization and the convergence property, it is usually ignored in policy iteration methods. This work incorporates practical inexact policy evaluation into a simultaneous policy update paradigm to reach the Nash equilibrium of nonlinear zero-sum games. In the propos...
Article
Safety is a critical concern when applying reinforcement learning (RL) to real-world control problems. A widely used method for ensuring safety is to learn a control barrier function with heuristic feasibility labels that come from expert demonstrations [1] or constraint functions [2]. However, their forward invariant sets fall short of the maximum...
Article
Safe reinforcement learning (RL), which solves for constraint-satisfying policies, offers a promising route to broader safety-critical applications of RL in real-world problems such as robotics. Among all safe RL approaches, model-based methods further reduce training-time violations due to their high sample efficiency. However, lacking safety robus...
Preprint
Full-text available
Safe reinforcement learning (RL), which solves for constraint-satisfying policies, offers a promising route to broader safety-critical applications of RL in real-world problems such as robotics. Among all safe RL approaches, model-based methods further reduce training-time violations due to their high sample efficiency. However, lacking safety robus...
