Cumulative reward on LANE-3-DENSITY-2 under progressively increasing observation (state s) attack strength. The LLM-based agent π_LLM exhibits improved robustness compared with vanilla Offline DQN.

Source publication
Preprint
Integrating Large Language Models (LLMs) into autonomous driving systems brings strong common-sense and reasoning abilities, effectively addressing the pitfalls of purely data-driven methods. However, current LLM-based agents require lengthy inference times and face challenges in interacting with real-time autonomous driving environments. A key...

Contexts in source publication

Context 1
... [Fig. 1 legend: Uniform, Gaussian, FGSM, PGD attacks] Recall the offline RL objective in Eqs. (3) and (4). Let the LLM-distilled policy be π_distill, with the collected dataset D_LLM; offline training optimizes J(Q, π_distill, D_LLM) for an improved Q function, then updates the policy w.r.t. Q. Empirically, as shown in Fig. 1, the LLM-based agent π_LLM is more robust against malicious state injection in the autonomous driving setting. However, a distilled offline policy is not as robust as the LLM itself, as demonstrated by [25], where the Q value can change drastically over neighbouring states, leading to an unstable policy. Therefore, vanilla offline RL ...
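The exact form of J(Q, π_distill, D_LLM) is given by Eqs. (3) and (4) of the paper, which are not reproduced in this excerpt. Purely as an illustration of the general setup described above, the sketch below shows a generic offline, DQN-style Q update over a transition dataset collected by the LLM agent, with the distilled policy taken greedily w.r.t. the learned Q; the network architecture, loss, and dataset format are assumptions, not the paper's implementation.

```python
# Minimal sketch, assuming discrete actions and transitions collected by the LLM agent.
# This is NOT the paper's Eqs. (3)-(4); it is a generic offline DQN-style surrogate for J(Q, D_LLM).
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

def offline_q_update(q, q_target, optimizer, batch, gamma=0.99):
    """One gradient step minimizing the TD error over a batch sampled from D_LLM."""
    obs, act, rew, next_obs, done = batch
    q_sa = q(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rew + gamma * (1.0 - done) * q_target(next_obs).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)   # surrogate for J(Q, D_LLM)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def pi_distill(q, obs):
    """Distilled policy: act greedily w.r.t. the learned Q function."""
    return q(obs).argmax(dim=-1)
```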
Context 2
... Prompt. As shown in Fig. 10, the Prefix Prompt part primarily consists of an introduction to the autonomous driving task, a description of the scenario, common sense rules, and instructions for the output format. The previous decision and explanation are obtained from the experience buffer. The current scenario information plays an important role while making ...
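The actual prompt template is shown in Fig. 10 and is not reproduced in this excerpt. As a rough sketch of the structure described above (a prefix prompt with task introduction, scenario description, common-sense rules, and output-format instructions, followed by the previous decision and explanation from the experience buffer and the current scenario), the prompt might be assembled as follows; all function and field names are hypothetical.

```python
# Hypothetical prompt assembly following the structure described above.
# Field names and wording are illustrative, not the paper's actual template (Fig. 10).
def build_prompt(task_intro: str,
                 scenario_description: str,
                 common_sense_rules: list,
                 output_format: str,
                 experience_buffer: list,
                 current_scenario: str) -> str:
    prefix = "\n".join([
        task_intro,                       # introduction to the autonomous driving task
        scenario_description,             # description of the scenario
        "Rules:\n" + "\n".join(f"- {r}" for r in common_sense_rules),
        "Output format:\n" + output_format,
    ])
    # Previous decision and explanation retrieved from the experience buffer.
    history = "\n".join(
        f"Previous decision: {e['decision']}\nExplanation: {e['explanation']}"
        for e in experience_buffer[-1:]   # most recent entry only
    )
    return f"{prefix}\n\n{history}\n\nCurrent scenario:\n{current_scenario}"
```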
Context 3
... We give one example to help readers better understand the reasoning process of GPT-3.5. As shown in Fig. 11, the ego car first checks the available actions and the related safety outcomes. In the first round of thought, GPT-3.5 tries to understand the ego car's situation and checks the available lanes for decision-making. After several rounds of interaction, it checks whether the action 'keep speed' is safe with respect to vehicle 7. Finally, it ...
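The concrete GPT-3.5 dialogue is shown in Fig. 11 and not reproduced here. Purely as an illustration of the multi-round procedure described above (inspect available actions and lanes, then query the safety of a candidate action against a surrounding vehicle), a reasoning loop might look like the following; the tool names and the llm_call function are hypothetical, not the paper's interface.

```python
# Hypothetical multi-round reasoning loop; tool names and llm_call() are illustrative only.
from typing import Callable, Dict

def reasoning_loop(llm_call: Callable[[str], dict],
                   tools: Dict[str, Callable[..., str]],
                   prompt: str,
                   max_rounds: int = 5) -> str:
    """Iterate: the LLM either requests a check (e.g. available lanes, whether
    'keep speed' is safe w.r.t. a nearby vehicle) or returns a final action."""
    context = prompt
    for _ in range(max_rounds):
        reply = llm_call(context)                 # e.g. {"tool": "is_action_safe", "args": {...}}
        if reply.get("final_action"):
            return reply["final_action"]          # e.g. "keep speed"
        tool_name, args = reply["tool"], reply.get("args", {})
        observation = tools[tool_name](**args)    # run the requested safety/lane check
        context += f"\nThought: called {tool_name}({args}) -> {observation}"
    return "idle"                                 # fall back to a conservative action
```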