From Words💬 to Wheels🚗: Automated Style-Customized Policy Generation for Autonomous Driving

1Data Science and Analytics Thrust, The Hong Kong University of Science and Technology (Guangzhou), China
2Intelligent Transportation Thrust, The Hong Kong University of Science and Technology (Guangzhou), China
3Department of Automation, Tianjin University, China
4Shanghai Artificial Intelligence Laboratory, China


The Words2Wheels framework automatically customizes driving policies based on user commands.
It employs a Style-Customized Reward Function (Style Reward) to generate a Style-Customized Driving Policy (Style Policy).

Abstract

Autonomous driving technology has witnessed rapid advancements, with foundation models improving interactivity and user experiences. However, current autonomous vehicles (AVs) face significant limitations in delivering command-based driving styles. Most existing methods either rely on predefined driving styles that require expert input or use data-driven techniques like Inverse Reinforcement Learning to extract styles from driving data. These approaches, though effective in some cases, face challenges: difficulty obtaining specific driving data for style matching (e.g., in Robotaxis), inability to align driving style metrics with user preferences, and limitations to pre-existing styles, restricting customization and generalization to new commands. This paper introduces Words2Wheels, a framework that automatically generates customized driving policies based on natural language user commands. Words2Wheels employs a Style-Customized Reward Function to generate a Style-Customized Driving Policy without relying on prior driving data. By leveraging large language models and a Driving Style Database, the framework efficiently retrieves, adapts, and generalizes driving styles. A Statistical Evaluation module ensures alignment with user preferences. Experimental results demonstrate that Words2Wheels outperforms existing methods in accuracy, generalization, and adaptability, offering a novel solution for customized AV driving behavior.

Words2Wheels Framework


  1. Workflow of Words2Wheels: When a natural language command is received, the system matches it with a style from the database that is ready for immediate online use. Style Reward generation and policy training run simultaneously in the backend, producing a new Style Policy that may outperform, and then replace, the existing one;
  2. Driving Style Database: This conceptual database stores Style Rewards (initially from both data-driven and human-designed methods), Style Policies, and their statistics. It manages the growing variety of driving styles and supports automated policy customization;
  3. Statistical Evaluation Module: This module ensures that the generated driving styles closely align with user commands by evaluating them against natural driving behaviors.
As Words2Wheels operates, new Style Policies generated by the LLM expand the database, creating a broader range of driving styles.

Automated Policy Generation


Words2Wheels generates a driving policy from user commands by applying Retrieval-Augmented Generation (RAG) principles to select relevant Style Rewards from the Driving Style Database, which keeps the LLM focused and improves task performance. The system first checks the command against past commands handled by the Statistical Evaluation module; if no suitable match is found, the LLM generates a new reward function. Candidate policies are trained and evaluated for effectiveness, and a new policy is produced via Reinforcement Learning only when necessary. Automating this selection process minimizes subjective bias and improves efficiency.
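The loop below is a minimal Python sketch of this retrieve-or-generate process, assuming a simple in-memory database and injected helper functions (embed, generate_style_reward, train_policy); these names are illustrative assumptions, not the released code.

from dataclasses import dataclass

@dataclass
class StyleEntry:
    command: str       # the user command this style was created for
    reward_code: str   # Style Reward, stored as source code
    policy_path: str   # path to the pre-trained Style Policy
    stats: dict        # driving statistics used by Statistical Evaluation

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def customize_policy(command, database, embed, generate_style_reward, train_policy,
                     match_threshold=0.85):
    """Return a Style Policy for `command`, reusing the database when possible."""
    query = embed(command)
    # Retrieval-augmented lookup: rank stored styles by similarity to the command.
    ranked = sorted(database, key=lambda e: cosine(query, embed(e.command)), reverse=True)
    if ranked and cosine(query, embed(ranked[0].command)) >= match_threshold:
        return ranked[0].policy_path   # a stored Style Policy can be served immediately
    # No close match: ask the LLM for a new Style Reward, using top entries as templates.
    reward_code = generate_style_reward(command, templates=[e.reward_code for e in ranked[:3]])
    policy_path = train_policy(reward_code)   # RL training runs in the backend
    database.append(StyleEntry(command, reward_code, policy_path, stats={}))
    return policy_path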




Driving Style Database


The Driving Style Database in Words2Wheels stores Style Rewards, Style Policies, and analytical data. Style Rewards are stored as program code, and Style Policies as pre-trained neural networks. Analytical data is saved in JSON format and can be embedded as high-dimensional vectors for efficient retrieval. The LLM can select existing Style Policies or generate new ones using the stored reward functions as templates; these pre-existing reward functions, together with prior research on driving reward design, improve the quality of the LLM's output. As Words2Wheels operates, newly generated Style Policies expand the database, improving efficiency and reducing reliance on RL training. User commands are also stored to support the fuzzy memory functionality.
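As an illustration, one database entry might be laid out as below; the field names and values are assumptions consistent with the description above (Style Reward as code, Style Policy as a checkpoint, analytical data in JSON), not the authors' actual schema.

import json

entry = {
    "style_name": "conservative_v1",
    "source": "human-designed",                     # or "data-driven"
    "reward_file": "rewards/conservative_v1.py",    # Style Reward stored as code
    "policy_file": "policies/conservative_v1.pt",   # pre-trained Style Policy
    "stats": {                                      # analytical data from Statistical Evaluation
        "mean_speed_mps": 9.8,
        "mean_gap_m": 32.5,
        "max_abs_accel_mps2": 1.6
    },
    "past_commands": ["Safety first. I have plenty of time"]   # supports fuzzy memory
}

# The JSON record (or an embedding of it) can then be indexed for retrieval.
print(json.dumps(entry, indent=2))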




Statistical Evaluation


The Statistical Evaluation module produces data on driving behavior so that the LLM can assess how well a Style Policy aligns with a user command. The module simulates driving on a reserved test dataset and collects metrics such as speed, acceleration, and spacing, chosen with reference to prior research. Using a Chain-of-Thought approach, the LLM selects the relevant metrics and compares each policy against natural driving data. Customizing the test set enables extended functionality and more precise analyses, such as fine-tuning based on a specific Style Policy or spatio-temporal filtering.
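The snippet below sketches the kind of statistics such a module could compute from a simulated rollout and how they might be normalized against natural driving behavior; the specific metric set and the z-score normalization are assumptions, not the paper's exact procedure.

import numpy as np

def rollout_statistics(speeds, gaps, dt=0.1):
    """Summarize one simulated trajectory: speeds in m/s and spacing gaps in m, sampled every dt seconds."""
    speeds = np.asarray(speeds, dtype=float)
    gaps = np.asarray(gaps, dtype=float)
    accel = np.diff(speeds) / dt
    return {
        "mean_speed_mps": float(speeds.mean()),
        "mean_gap_m": float(gaps.mean()),
        "max_abs_accel_mps2": float(np.abs(accel).max()),
    }

def normalize_against_natural(stats, natural_mean, natural_std):
    """Express each metric as a z-score relative to natural driving behavior,
    giving the LLM a consistent scale for judging 'more aggressive' or 'more conservative'."""
    return {k: (v - natural_mean[k]) / natural_std[k] for k, v in stats.items()}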

Results

Customizing Driving Style


We evaluated whether the driving styles produced by Words2Wheels align with user commands, starting from an initial database of 8 styles. The styles were derived from data-driven and human-designed reward functions and trained with the Proximal Policy Optimization (PPO) algorithm. We tested commands for aggressive, normal, and conservative driving, running each 5 times and averaging the results, with style metrics normalized against natural driving behavior data. Aggressive styles drove at higher speeds, while conservative styles maintained larger gaps, showing that Words2Wheels effectively adapts to user-specified driving styles.
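For intuition, the sketch below shows how a Style Reward could encode the aggressive-versus-conservative trade-off as weighted speed, gap, and comfort terms; the weights and terms are illustrative assumptions, not the 8 reward functions actually used in this experiment.

# Illustrative Style Reward: larger w_speed favors aggressive driving,
# larger w_gap and min_gap favor conservative driving.
def style_reward(speed, gap, accel, target_speed=15.0, min_gap=2.0,
                 w_speed=1.0, w_gap=1.0, w_comfort=0.5):
    speed_term = -abs(speed - target_speed) / target_speed   # track the desired speed
    gap_term = min(gap - min_gap, 0.0)                       # penalize closing below the minimum gap
    comfort_term = -abs(accel)                                # penalize harsh acceleration or braking
    return w_speed * speed_term + w_gap * gap_term + w_comfort * comfort_term

# An aggressive style might use w_speed=2.0, w_gap=0.5 with a higher target_speed,
# while a conservative style might use w_speed=0.5, w_gap=2.0 with a larger min_gap.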




Generation Capability


In this experiment, Words2Wheels was tested on its ability to generate driving styles from user commands even though the database contained only mismatched styles. Commands such as "I'm going to be late for the train" led to more aggressive driving, while "Safety first. I have plenty of time" resulted in more conservative driving. This demonstrates Words2Wheels' capacity to create styles that align with user commands, which is crucial for adapting to new scenarios.




Human-in-the-Loop Comparisons


Research suggests that human feedback is more consistent than direct quantification, which is crucial for a fair evaluation. We built a visualization tool for comparing driving-style judgments and benchmarked Words2Wheels against the Intelligent Driver Model (IDM), with 10 volunteers assessing 1,000 events. Words2Wheels aligned better with the user command in 72.0% of cases, tied with IDM in 18.8%, and was surpassed by IDM in 9.2%, underscoring its adaptability to user preferences.




Generalization Capability


To evaluate how the phrasing of user commands affects driving style customization, we categorized commands into three levels of directness and tested 40 specific commands, assessing the LLM's interpretation, the logical soundness of its reasoning, and the reasonableness of the generated reward functions. Words2Wheels performed well overall, especially on Level I commands. See the Supplementary for more details.




Fuzzy Memory


Fuzzy memory is a practical feature of Words2Wheels. We tested it by storing predefined commands in the Driving Style Database and checking recall with similar inputs. The system successfully matched most inputs, demonstrating strong adaptability in recognizing and responding to varied natural language instructions, which improves the user experience and overall system efficiency.
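A minimal sketch of such a fuzzy recall step is shown below, matching a new command to stored ones by embedding similarity; the embedding function and the 0.8 threshold are assumptions, not the system's actual configuration.

def fuzzy_recall(new_command, stored_commands, embed, threshold=0.8):
    """Return the stored command most similar to `new_command`, or None if nothing is close."""
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return num / den if den else 0.0

    q = embed(new_command)
    scored = [(cos(q, embed(c)), c) for c in stored_commands]
    best_score, best_cmd = max(scored, default=(0.0, None))
    return best_cmd if best_score >= threshold else None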

BibTeX

@misc{han2024words,
  title={From Words to Wheels: Automated Style-Customized Policy Generation for Autonomous Driving},
  author={Xu Han and Xianda Chen and Zhenghan Cai and Pinlong Cai and Meixin Zhu and Xiaowen Chu},
  year={2024},
  eprint={2409.11694},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}