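Pillar
Scaling
Object (grammar)
Artificial intelligence
Computer science
Computer vision
Pattern recognition (psychology)
Mathematics
Engineering
Geometry
Structural engineering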
Authors
Weixin Mao,Tiancai Wang,Diankun Zhang,Junjie Yan,Osamu Yoshie
Identifier
DOI:10.1109/tiv.2024.3386576
Abstract
Pillar-based 3D object detectors mainly employ a randomly initialized 2D convolutional neural network (ConvNet) for feature extraction and therefore miss the benefits of backbone scaling and pretraining that are well established in the image domain. This paper shows the effectiveness of 2D backbone scaling and pretraining for pillar-based 3D object detectors. For better backbone scaling, we first introduce several design principles for point cloud backbones that address the sparsity of point clouds and enlarge the effective receptive field. Backbone scaling is then achieved by adapting the backbone design to the model size. For backbone pretraining, we propose a weight adaptation module that transfers the image knowledge obtained by pretraining on large-scale image datasets to the point cloud domain. Our proposed pillar-based detector, termed PillarNeSt, outperforms existing 3D object detectors by a large margin on the nuScenes and Argoverse 2 datasets. Code is released at https://github.com/WayneMao/PillarNeSt.
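To make the setting concrete, the following is a minimal sketch of generic pillarization, the step that turns a point cloud into a 2D bird's-eye-view grid that a 2D ConvNet can consume. It is not taken from the PillarNeSt code base: the function names `pillarize` and `occupancy`, the grid ranges, the pillar size, and the per-pillar features (point count and maximum height) are all assumptions chosen for illustration. It also shows why sparsity matters, since most pillars end up empty.

```python
from collections import defaultdict


def pillarize(points, x_range=(0.0, 4.0), y_range=(0.0, 4.0), pillar_size=1.0):
    """Group (x, y, z) points into vertical pillars on a bird's-eye-view grid.

    Returns a dense 2D grid (rows = y bins, columns = x bins) in which each
    cell holds (point_count, max_z). Empty pillars hold (0, 0.0).
    All parameters are illustrative assumptions, not values from the paper.
    """
    n_cols = int((x_range[1] - x_range[0]) / pillar_size)
    n_rows = int((y_range[1] - y_range[0]) / pillar_size)

    # Collect the heights of all points that fall into each pillar.
    buckets = defaultdict(list)
    for x, y, z in points:
        inside_x = x_range[0] <= x < x_range[1]
        inside_y = y_range[0] <= y < y_range[1]
        if not (inside_x and inside_y):
            continue  # drop points outside the grid
        col = int((x - x_range[0]) / pillar_size)
        row = int((y - y_range[0]) / pillar_size)
        buckets[(row, col)].append(z)

    # Scatter the per-pillar summaries into a dense pseudo-image.
    grid = [[(0, 0.0)] * n_cols for _ in range(n_rows)]
    for (row, col), heights in buckets.items():
        grid[row][col] = (len(heights), max(heights))
    return grid


def occupancy(grid):
    """Fraction of pillars that contain at least one point."""
    cells = [cell for row in grid for cell in row]
    return sum(1 for count, _ in cells if count > 0) / len(cells)


points = [(0.2, 0.3, 1.0), (0.8, 0.1, 1.5), (3.5, 3.9, 0.4), (9.0, 9.0, 2.0)]
grid = pillarize(points)
print(grid[0][0], grid[3][3], occupancy(grid))
```

In this toy input only 2 of the 16 pillars are occupied, which is the kind of sparsity the backbone design principles in the abstract are meant to address. Real detectors typically learn per-pillar features with a small network rather than using hand-picked statistics as done here.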