Decision-Making with Lightweight Confidence-Aware Language Model for Autonomous Driving
2026-05-25 • Robotics
AI summary
The authors propose a method that lets self-driving cars make decisions faster and with less computing power. A small language model learns from a team of agents that together choose actions, assess confidence, and explain their choices. This distillation lets the small model handle complex situations much as large models do while running faster. Closed-loop tests on the nuPlan benchmark show high success rates in both regular and long-tail scenarios at low inference latency, which suits real-world driving systems with limited computing power.
Large Language Models, Multimodal LLMs, Autonomous Driving, Decision-Making Framework, Confidence-Aware Model, Chain-of-Thought Reasoning, Model Distillation, Dual-Head Architecture, Retrieval Augmented Generation, nuPlan Benchmark
Authors
Ruoyu Yao, Ruiguo Zhong, Pei Liu, Mingxing Peng, Rui Yang, Jun Ma
Abstract
Large Language Models (LLMs) and Multimodal LLMs (MLLMs) have demonstrated immense potential in autonomous driving (AD) by offering human-like reasoning and open-world generalization. However, the excessive computational overhead and high inference latency of these massive models severely hinder their deployment in resource-constrained AD systems. To address this challenge, we propose a novel decision-making framework utilizing a lightweight confidence-aware language model, which bridges the gap between complex multimodal intention reasoning and efficient inference. Specifically, we design a multi-agent collaborative workflow, comprising action voting, confidence assessment, and summarization agents, to generate high-quality, confidence-annotated decision demonstrations via explicit Chain-of-Thought (CoT) reasoning. These demonstrations are then distilled into a lightweight language model featuring a dual-head architecture, enabling the joint prediction of decision probabilities and the generation of textual rationales. The distillation is realized via a confidence-aware fine-tuning strategy coupled with Retrieval Augmented Generation (RAG) to enhance the model's adaptability and data efficiency. Comprehensive closed-loop experiments on the nuPlan benchmark demonstrate that our approach achieves state-of-the-art (SOTA) success rates in both regular and long-tail scenarios while maintaining low inference latency.
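As a rough illustration of how confidence-annotated demonstrations might feed a confidence-aware fine-tuning objective, the sketch below pairs a simple majority vote with a confidence-weighted negative log-likelihood over the decision head's logits. Everything here is an assumption made for illustration: the abstract does not specify the loss, the action set, or how the agents aggregate votes, and the paper's agents are LLM-based rather than plain label lists. The names `ACTIONS`, `vote`, and `confidence_weighted_nll` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical discrete action set; the paper's actual decision space may differ.
ACTIONS = ["keep_lane", "change_left", "change_right", "decelerate"]


def vote(agent_votes):
    """Majority vote over agent proposals.

    Returns the winning action and, as an assumed confidence proxy,
    the fraction of agents that agreed with it.
    """
    counts = Counter(agent_votes)
    action, n_agree = counts.most_common(1)[0]
    return action, n_agree / len(agent_votes)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_weighted_nll(logits, target_action, confidence):
    """Negative log-likelihood of the target action, scaled by confidence.

    Low-confidence demonstrations contribute less to the loss
    (an assumed weighting scheme, not the paper's stated one).
    """
    probs = softmax(logits)
    idx = ACTIONS.index(target_action)
    return -confidence * math.log(probs[idx])


# Four hypothetical agents propose actions for one scenario.
action, confidence = vote(["decelerate", "decelerate", "keep_lane", "decelerate"])

# Logits a decision head might emit before any training (uniform here).
loss = confidence_weighted_nll([0.0, 0.0, 0.0, 0.0], action, confidence)
```

In a dual-head setup, a term like this would presumably be combined with the usual token-level language-modeling loss on the textual rationale; how the two are balanced is not stated in the abstract.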