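Computer science
Artificial intelligence
Robustness (evolution)
Block (permutation group theory)
Adaptability
Outlier
Computer vision
Trajectory
Segmentation
Machine learning
Visual reasoning
Perception
Advanced driver-assistance systems
Terrain
Motion planning
Visualization
Reasoning system
Spatial relation
Feature (linguistics)
Contingency
Case-based reasoning
Visual perception
Authors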
Liangdong Zhang,Yiming Nie,Haoyang Li,Fanjie Kong,Baobao Zhang,Shigui Huang,Kai Fu,Chen Min,Liang Xiao
Source
Journal: Cornell University - arXiv
Date: 2026-01-07
Identifier
DOI:10.48550/arxiv.2601.03519
Abstract
Efficient trajectory planning in off-road terrains presents a formidable challenge for autonomous vehicles, often necessitating complex multi-step pipelines. However, traditional approaches exhibit limited adaptability in dynamic environments. To address these limitations, this paper proposes OFF-EMMA, a novel end-to-end multimodal framework designed to overcome insufficient spatial perception and unstable reasoning in visual-language-action (VLA) models for off-road autonomous driving scenarios. The framework explicitly annotates input images through a visual prompt block and introduces a chain-of-thought with self-consistency (COT-SC) reasoning strategy to enhance the accuracy and robustness of trajectory planning. The visual prompt block uses semantic segmentation masks as visual prompts, enhancing the spatial understanding of pre-trained visual-language models in complex terrains. The COT-SC strategy mitigates the impact of outliers on planning performance through a multi-path reasoning mechanism. Experimental results on the RELLIS-3D off-road dataset demonstrate that OFF-EMMA significantly outperforms existing methods, reducing the average L2 error of the Qwen backbone model by 13.3% and decreasing the failure rate from 16.52% to 6.56%.
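
The abstract does not specify how COT-SC combines its multiple reasoning paths, so the following is only a minimal sketch of one plausible self-consistency step, under stated assumptions: each reasoning path yields a candidate trajectory of (x, y) waypoints, a pointwise median serves as a robust reference, the candidates farthest from it are discarded as outliers, and the rest are averaged. The names `l2_error`, `aggregate_trajectories`, and `keep_ratio` are hypothetical and are not taken from the paper.

```python
from math import hypot
from statistics import fmean, median
from typing import List, Tuple

Trajectory = List[Tuple[float, float]]  # sequence of (x, y) waypoints


def l2_error(pred: Trajectory, target: Trajectory) -> float:
    """Mean Euclidean distance between corresponding waypoints."""
    if len(pred) != len(target):
        raise ValueError("trajectories must have the same number of waypoints")
    return fmean(hypot(px - tx, py - ty) for (px, py), (tx, ty) in zip(pred, target))


def aggregate_trajectories(candidates: List[Trajectory], keep_ratio: float = 0.75) -> Trajectory:
    """Fuse candidate trajectories while discarding outliers.

    Assumed scheme (not the paper's): rank candidates by L2 distance to the
    pointwise median trajectory, keep the closest `keep_ratio` fraction,
    and return the pointwise mean of the kept candidates.
    """
    if not candidates:
        raise ValueError("need at least one candidate trajectory")
    horizon = len(candidates[0])
    if any(len(c) != horizon for c in candidates):
        raise ValueError("all candidates must share the same horizon")

    # Pointwise median is insensitive to a minority of wild candidates.
    reference = [
        (median(c[t][0] for c in candidates), median(c[t][1] for c in candidates))
        for t in range(horizon)
    ]

    # Keep the candidates that agree best with the reference.
    ranked = sorted(candidates, key=lambda c: l2_error(c, reference))
    n_keep = max(1, round(keep_ratio * len(candidates)))
    kept = ranked[:n_keep]

    return [
        (fmean(c[t][0] for c in kept), fmean(c[t][1] for c in kept))
        for t in range(horizon)
    ]


if __name__ == "__main__":
    paths = [
        [(1.0, 0.0), (2.0, 0.0)],
        [(1.0, 0.2), (2.0, 0.2)],
        [(1.0, -0.2), (2.0, -0.2)],
        [(1.0, 5.0), (2.0, 9.0)],  # an outlier reasoning path
    ]
    print(aggregate_trajectories(paths))
```

In this toy input the fourth path deviates strongly from the other three, so it is dropped before averaging; a plain mean over all four paths would be pulled toward it.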