Artificial intelligence
Computer science
Voxel
Computer vision
Object detection
Pattern recognition (psychology)
Object (grammar)
Authors
Lue Fan,Wang Feng,Naiyan Wang,Zhaoxiang Zhang
Identifier
DOI:10.1109/tpami.2024.3502456
Abstract
LiDAR-based fully sparse architectures have gained increasing attention. FSDv1 stands out as a representative work, achieving impressive efficacy and efficiency, albeit with intricate structures and handcrafted designs. In this paper, we present FSDv2, an evolution that aims to simplify FSDv1 and eliminate the ad-hoc heuristics in its handcrafted instance-level representation, thus promoting better universality. To this end, we introduce virtual voxels, which take over the role of the clustering-based instance segmentation in FSDv1. Virtual voxels not only address the notorious Center Feature Missing issue in fully sparse detectors but also give the framework a more elegant and streamlined design. In addition, we develop a suite of components to complement the virtual voxel mechanism, including a virtual voxel encoder, a virtual voxel mixer, and a virtual voxel assignment strategy. We conduct experiments on three large-scale datasets: the Waymo Open Dataset, the Argoverse 2 dataset, and the nuScenes dataset. Our results show state-of-the-art performance on all three datasets, highlighting the superiority of FSDv2 in long-range scenarios and its universality in achieving competitive performance across diverse scenarios. Moreover, we provide comprehensive experimental analysis to understand the workings of FSDv2. To facilitate further research, we have open-sourced the full code at https://github.com/tusen-ai/SST.