Fig 5 - uploaded by Zhengbing He
Co-Pilot for path tracking control. The system consists of three modules: the Reference State Determination module, the Co-Pilot module, and the Feedback Controller module. The Reference State Determination module calculates the next target and reference state based on the pre-determined path. The Co-Pilot module uses LLMs to process human intentions and memories, providing a Controller ID. The Feedback Controller module generates the control commands.

... simulation experiments and rank them according to their ability to fit two types of human intentions. Based on this ranking, we use a performance score as the evaluation criterion; see details in Table II. To demonstrate the effect of adjusting the prompt content on performance, we manually update the memories sequentially to prompt-tune the Co-Pilot. Since the selection task is straightforward but information-heavy, we tune only the part of the prompt containing the controller descriptions as an initial attempt. Human experts start from a simple semantic memory (Co-Pilot A1) that provides only the controller names, which results in a blind preference for the popular NMPC controller. The experts speculate that, lacking memories of specific scenarios, the model favors NMPC because it has received higher ratings in the relevant literature and reference data. Therefore, the experts add qualitative descriptions of each controller's performance in specific scenarios to update the semantic memory in Co-Pilot A2. However, the term "curvature" in the road-state description is ambiguous for A2: it erroneously assumes that any segment mentioning curvature is curved, leading to poor accuracy on curvature-related information. Hence, the prompt of Co-Pilot A3 supplies specific data to update the episodic memory, enabling the LLMs to autonomously learn and synthesize knowledge from the added experiences. This grants it access to abundant and precise ...
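The three-module loop described in the caption can be sketched in a few lines. This is a hypothetical illustration only: all names (`build_prompt`, `select_controller`, `CONTROLLERS`, the controller pool itself) are invented for the sketch and are not specified by the paper; the point is merely that the Co-Pilot assembles intentions and memories into a prompt and returns a Controller ID for the Feedback Controller module to act on.

```python
# Hypothetical sketch of the Co-Pilot selection loop. The controller pool,
# prompt layout, and parsing rule are all invented examples, not the paper's API.

CONTROLLERS = {1: "PID", 2: "LQR", 3: "NMPC"}  # example controller pool

def build_prompt(intention, semantic_memory, episodic_memory, road_state):
    """Assemble the Co-Pilot prompt from the human intention and memories."""
    return (
        f"Human intention: {intention}\n"
        f"Controller descriptions (semantic memory): {semantic_memory}\n"
        f"Past experiences (episodic memory): {episodic_memory}\n"
        f"Road state: {road_state}\n"
        "Reply with the ID of the best controller."
    )

def select_controller(llm_reply):
    """Parse a Controller ID out of the LLM's free-text reply."""
    for cid in CONTROLLERS:
        if str(cid) in llm_reply:
            return cid
    return 3  # fall back to NMPC if no ID is found in the reply

# One control step: reference state -> Co-Pilot -> feedback controller.
prompt = build_prompt("track the path smoothly", "...", "...", "straight segment")
cid = select_controller("Controller 2 fits a straight segment best.")
print(CONTROLLERS[cid])  # the Feedback Controller module would use this ID
```

Updating A1 to A2 to A3 then amounts to changing only the `semantic_memory` and `episodic_memory` strings while the rest of the loop stays fixed, which is what makes the memory-only prompt tuning described above possible.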

Source publication
Article
Full-text available
One of the most challenging problems in human-machine co-work is the gap between human intention and the machine's understanding and execution. Large Language Models (LLMs) have shown superior abilities in solving such issues. In this paper, we design a universal framework that embeds LLMs as a vehicle "Co-Pilot" of driving, which can accompl...

Context in source publication

Context 1
... detailed illustration of the path-tracking control system is presented in Fig. 5. ...

Citations

... For most collected scenario datasets, the lack of sufficient labels and attributes is itself a bottleneck in the implementation and generalization of the methods. The rise of foundation models such as GPT [2], [48] and Sora [49]-[51], which are pretrained on massive amounts of data, offers the potential to recognize vehicle behaviors automatically. These models have strong scenario reasoning and understanding abilities for handling complex scenarios. ...
Article
Intelligent vehicles and autonomous driving systems rely on scenario engineering for intelligence and index (I&I), calibration and certification (C&C), and verification and validation (V&V). To extract and index scenarios, the various vehicle interactions are worthy of attention and deserve refined descriptions and labels. However, existing methods cannot cope well with the problem of scenario classification and labeling with vehicle interactions at the core. In this paper, we propose the VistaScenario framework to conduct interaction scenario engineering for vehicles with intelligent systems for transport automation. Based on the summarized basic types of vehicle interactions, we slice the scenario data stream into a series of segments via a spatiotemporal scenario evolution tree. We also propose the scenario metric Graph-DTW, based on the Graph Computation Tree and Dynamic Time Warping, to conduct refined scenario comparison and labeling. Extreme interaction scenarios and corner cases can thus be efficiently filtered and extracted. Moreover, testing examples on a trajectory prediction model with naturalistic scenario datasets demonstrate the effectiveness and advantages of our framework. VistaScenario can provide solid support for the usage and indexing of scenario data, further promoting the development of intelligent vehicles and transport automation.
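The Graph-DTW metric in the abstract above builds on Dynamic Time Warping. As a point of reference, the standard DTW recurrence on two scalar sequences can be sketched as follows; note this is only the generic alignment step, not the paper's graph variant, which additionally derives per-frame features from a Graph Computation Tree.

```python
# Minimal sketch of plain Dynamic Time Warping on scalar sequences.
# Graph-DTW (per the abstract) would replace abs(a[i] - b[j]) with a
# distance between graph-derived scenario features; that part is omitted.
def dtw_distance(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local frame-to-frame distance
            # Best of insertion, deletion, and match moves on the warping path.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

print(dtw_distance([0, 1, 2], [0, 1, 2]))  # identical sequences -> 0.0
```

Because DTW aligns sequences of different lengths and speeds, a metric built on it can compare interaction segments that unfold at different rates, which is what makes it suitable for labeling scenario segments of varying duration.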
... As shown in Fig. 1, we can take corner driving scenarios as prompts to generate more challenging scenarios via an LLM in a parallel framework [27], [28], [29], which achieves virtual-real interaction with descriptive learning, prescriptive learning, and predictive learning [30], [31]. LLMs have demonstrated powerful semantic understanding and reasoning capabilities, which have great potential in the field of autonomous driving [32], [33]. The benefits of scenario generation via LLMs mainly include the following. ...
Article
Scenario engineering plays a vital role in various Industry 5.0 applications. In the field of autonomous driving systems, driving scenario data are important for the training and testing of critical modules. However, corner cases are usually rare and need to be extended. Existing methods cannot handle the interpretation and reasoning of the generation process well, which reduces the reliability and usability of the generated scenarios. With the rapid development of foundation models, especially the large language model (LLM), we can conduct scenario generation with more powerful tools. In this article, we propose LLMScenario, a novel LLM-driven scenario generation framework composed of scenario prompt engineering, LLM scenario generation, and evaluation feedback tuning. The minimum scenario description specific to the LLM is given by scenario analysis and ablation studies. We also design score functions in terms of reality and rarity to evaluate the generated scenarios. Model performance is further enhanced through chain-of-thought prompting and experiences. Different LLMs are also compared within our framework. Experimental results on naturalistic datasets demonstrate the effectiveness of LLMScenario, which can provide solid support for scenario engineering in Industry 5.0.
... Most recently, some pioneering studies have effectively utilized Large Language Models (LLMs) in diverse applications such as perceiving driving risks from traffic images (Driessen et al. 2023), analyzing traffic data (Zhang et al. 2023b, Zhang et al. 2024), and enhancing the functionality of autonomous vehicles (Mao et al. 2023, Wang et al. 2023b). While these studies represent significant advances, there is still room to further harness the full potential of LLMs in these areas. ...
Article
Driver behavior is a critical factor in driving safety, making the development of sophisticated distraction classification methods essential. Our study presents a Distracted Driving Classification (DDC) approach utilizing a visual Large Language Model (LLM), named the Distracted Driving Language Model (DDLM). The DDLM introduces whole-body human pose estimation to isolate and analyze key postural features (head, right hand, and left hand) for precise behavior classification and better interpretability. Recognizing the inherent limitations of LLMs, particularly their lack of logical reasoning abilities, we have integrated a reasoning chain framework within the DDLM, allowing it to generate clear, reasoned explanations for its assessments. Tailored specifically with relevant data, the DDLM demonstrates enhanced performance, providing detailed, context-aware evaluations of driver behaviors and corresponding risk levels. Notably outperforming standard models in both zero-shot and few-shot learning scenarios, as evidenced by tests on the 100-Driver dataset, the DDLM stands out as an advanced tool that promises significant contributions to driving safety by accurately detecting and analyzing driving distractions.
Article
The advent of Scenarios Engineering (SE) paves the way to a new era of intelligent vehicles (IVs), driven by Artificial Intelligence (AI)-enabled strategies. It aims at shaping IVs into a form that is more relevant to the underlying scenario, thereby accomplishing validation and verification (V&V) and calibration and certification (C&C) of each vehicle. However, such improved capabilities rely on the accumulation and analysis of an unprecedented volume of scenarios. Recently, Sora and other video generation models have opened up new horizons for Imaginative Intelligence. As an extension of the TIV-DHW (Distributed/Decentralized Hybrid Workshop) on SE, this letter discusses the potential of Sora to change the scenario generation process by reducing physical shooting and increasing extreme scenario generation, thereby enabling more comprehensive training and testing of IVs. This letter also analyzes the limitations of Sora in accurately modeling physics and understanding cause and effect, which may affect its effectiveness in SE applications. Last, through a comprehensive outlook, this letter aims to provide a potential direction for the development of Sora-like AI technology, thereby promoting the safety, efficiency, reliability, and sustainability of IVs.
Article
The integration of language descriptions or prompts with Large Language Models (LLMs) into visual tasks is currently a focal point in the advancement of autonomous driving, and has showcased notable advancements across various standard datasets. Nevertheless, progress in integrating language prompts faces challenges in unstructured scenarios, primarily due to the limited availability of paired data. To address this challenge, we introduce a language prompt set called "UnstrPrompt," derived from three prominent unstructured autonomous driving datasets: IDD, ORFD, and AutoMine, collectively comprising a total of 6K language descriptions. In response to the distinctive features of unstructured scenarios, we have developed a structured approach for prompt generation encompassing three key components: scene, road, and instance. Additionally, we provide a detailed overview of the language generation process and the validation procedures. We conduct tests on segmentation tasks, and our experiments demonstrate that text-image fusion can improve accuracy by more than 3% on unstructured data, while our description architecture outperforms the generic urban architecture by more than 0.1%. This work holds the potential to advance various aspects such as interaction and foundation models in such scenarios.
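The three-component prompt structure mentioned above (scene, road, instance) can be pictured as a small record type joined into one language prompt. The field contents and the joining format below are invented examples for illustration, not taken from the UnstrPrompt dataset.

```python
# Hypothetical sketch of a scene/road/instance prompt record; the field
# values and output format are invented, not the UnstrPrompt schema.
from dataclasses import dataclass

@dataclass
class UnstructuredPrompt:
    scene: str      # global setting, e.g. weather and environment type
    road: str       # road surface and geometry description
    instance: str   # salient objects or agents in view

    def to_text(self):
        """Join the three components into a single language prompt."""
        return f"Scene: {self.scene}. Road: {self.road}. Instance: {self.instance}."

p = UnstructuredPrompt("foggy open-pit mine", "unpaved, rutted track", "haul truck ahead")
print(p.to_text())
```

Splitting the description this way keeps each component short and independently editable, which matches the abstract's motivation for a structured generation approach in unstructured scenarios.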
Article
The latest developments in parallel driving foreshadow the possibility of delivering intelligence across organizations using foundation models. As is well known, there are limitations in scenario acquisition in the field of intelligent vehicles (IVs), such as efficiency, diversity, and complexity, which hinder in-depth research on vehicle intelligence. To address this issue, this manuscript draws inspiration from scenario engineering and parallel driving, and introduces a pioneering framework for scenario generation leveraging ChatGPT, denoted SeGPT. Within this framework, we define a trajectory scenario and design prompt engineering to generate complex and challenging scenarios. Furthermore, SeGPT, in combination with the "Three Modes", foundation models, a vehicle operating system, and other advanced infrastructure, holds the potential to achieve higher levels of autonomous driving. Experimental outcomes substantiate SeGPT's adeptness at producing a spectrum of varied scenarios, underscoring its potential to augment the development of trajectory prediction algorithms. These findings mark significant progress in the domain of scenario generation and also point towards new directions in the research of vehicle intelligence and scenario engineering.
Article
This letter is the third report from a series of IEEE TIV's decentralized and hybrid workshops (DHWs) on intelligent vehicles for education (IV4E). Autonomous racing serves as a vital platform for nurturing engineering talents among university students, contributing to the development of skills essential for the intelligent vehicle industry. This letter investigates how recent emerging techniques, such as large language models (LLMs) and the Metaverse, can contribute to organizing IV4E-oriented autonomous racing events. Among these DHWs, scholars from diverse fields have collectively explored the integration of LLMs and the Metaverse into autonomous racing for educational purposes. The discussions emphasize the role of Metaverse in creating dynamic and immersive training virtual reality platforms and the role of LLMs in enhancing race commentary and the spectator experience. Within this context, the Metaverse introduces complex scenarios to the racetrack, maintaining suspense about the winning team until a race's final moment. This dynamic feature excites the race and motivates the participating teams to intensify their competition efforts. LLMs facilitate personalized commentary, inspiring spectators to become future participants in these races. Our DHWs highlighted a future in which technology, autonomy, and education intersect, fostering inclusive, educational, and engaging autonomous racing events.