Figure 8 (available under a Creative Commons Attribution 4.0 International license): Architecture of (a) the policy networks (transformer encoders) π_distil(·) and π_adapt(·); (b) the action decoder (transformer encoder) dec(·); (c) the transformer backbone (encoder-only).
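A minimal sketch of such an encoder-only transformer module used as a policy head, assuming a standard PyTorch implementation; the token layout, dimensions, and names (obs_dim, d_model, n_tokens) are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class EncoderPolicy(nn.Module):
    """Encoder-only transformer policy head, in the spirit of panel (c).

    Hypothetical sketch: sizes and token layout are illustrative only.
    """
    def __init__(self, obs_dim: int, n_actions: int,
                 d_model: int = 128, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)      # per-token observation embedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)  # shared encoder backbone
        self.head = nn.Linear(d_model, n_actions)    # action logits / Q-values

    def forward(self, obs_tokens: torch.Tensor) -> torch.Tensor:
        # obs_tokens: (batch, n_tokens, obs_dim), e.g. one token per observed vehicle
        h = self.backbone(self.embed(obs_tokens))
        return self.head(h.mean(dim=1))               # pool over tokens, project to actions
```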
Source publication
The integration of Large Language Models (LLMs) into autonomous driving systems brings strong common-sense and reasoning abilities, effectively addressing the pitfalls of purely data-driven methods. However, current LLM-based agents require lengthy inference times and struggle to interact with real-time autonomous driving environments. A key...
Context in source publication
Context 1
... framework. [43] utilises reverse Kullback-Leibler divergence to ensure that the student model does not overestimate the low-probability regions of the teacher distribution. Knowledge distillation from an LLM to RL for autonomous driving currently remains unexplored. To verify the contribution of the RAPID (Attentive) architecture (shown in Fig. 8) in the offline training phase, we conduct additional experiments in this section. As shown in Tab. 3, we compare DQN, DDQN, and CQL under the MLP and RAPID architectures, respectively. As environment complexity increases, the performance gap between RAPID and MLP narrows, suggesting RAPID handles simpler environments more ...
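The reverse KL objective cited above, KL(student || teacher), is mode-seeking: the student is penalised heavily wherever it places probability mass on actions the teacher deems unlikely. A minimal sketch of such a loss over discrete action distributions, assuming both models expose logits over the same action set; the function and variable names are illustrative, not taken from [43]:

```python
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL(P_student || P_teacher) over a discrete action distribution.

    Illustrative sketch: penalises the student for assigning mass to
    actions the teacher considers unlikely (mode-seeking behaviour).
    """
    log_p_s = F.log_softmax(student_logits, dim=-1)   # log student probabilities
    log_p_t = F.log_softmax(teacher_logits, dim=-1)   # log teacher probabilities
    p_s = log_p_s.exp()
    # KL(P_s || P_t) = sum_a P_s(a) * (log P_s(a) - log P_t(a))
    return (p_s * (log_p_s - log_p_t)).sum(dim=-1).mean()
```

In an LLM-to-RL distillation setting, student_logits would come from the RL policy (e.g. the transformer policy above) and teacher_logits from the LLM scored over the same discrete action set.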