计算机科学
人工智能
杠杆(统计)
单眼
计算机视觉
任务(项目管理)
语义学(计算机科学)
集合(抽象数据类型)
钥匙(锁)
语义映射
任务分析
工程类
计算机安全
系统工程
程序设计语言
作者
Haihong Xiao,Hongbin Xu,Wenxiong Kang,Yuqiong Li
标识
DOI:10.1109/tits.2023.3344806
摘要
We study outdoor 3D scene understanding, a challenging task demanding the intelligent system to infer both geometry and semantics from a single-view image – a critical skill for autonomous vehicles to navigate in the real 3D world. Towards this end, we present an instance-aware monocular semantic scene completion framework. To the best of our knowledge, this is the first endeavor specifically targeting the challenge of instance perception in the camera-based semantic scene completion task. Our method consists of two stages. In stage I, we design a region-based VQ-VAE network, providing an effective solution for 3D occupancy prediction. In stage II, we first introduce an instance-aware attention module, explicitly incorporating instance-level cues captured from mask images to enhance the instance features in RGB images. Then we leverage the deformable cross-attention to aggregate image features corresponding to each voxel query and utilize the deformable self-attention to refine query proposals. We combine these key ingredients and evaluate our method on two challenging datasets, namely SemanticKITTI and SSCBench-KITTI-360. The results unequivocally demonstrate the superiority of our proposed method over the state-of-the-art VoxFormer-S. Specifically, our method surpasses VoxFormer-S by 0.22 IoU and 0.72 mIoU on the validation set and achieves an impressive improvement of 3.04 IoU and 1.06 mIoU on the SSCBench-KITTI-360 validation set. Meanwhile, our approach ensures accurate perception of critical instances, thereby exhibiting its exceptional performance and potential for practical deployment.
科研通智能强力驱动
Strongly Powered by AbleSci AI