Robust LiDAR-Camera Alignment With Modality Adapted Local-to-Global Representation

Computer Science · Artificial Intelligence · LiDAR · Computer Vision · Leverage (statistics) · Pattern Recognition (Psychology) · Remote Sensing · Geology
Authors
Angfan Zhu, Yang Xiao, Chengxin Liu, Zhiguo Cao
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology [Institute of Electrical and Electronics Engineers]
Volume/Issue: 33 (1): 59-73; Cited by: 10
Identifier
DOI: 10.1109/tcsvt.2022.3197212
Abstract

LiDAR-camera alignment (LCA) is an important preprocessing step for fusing LiDAR and camera data. A key issue is extracting a unified cross-modality representation that characterizes the heterogeneous LiDAR and camera data effectively and robustly. The main challenge is to resist the modality gap and visual data degradation during feature learning while maintaining strong representative power. To address this, a novel modality-adapted local-to-global representation learning method is proposed. The research effort is twofold: modality adaptation and capturing global spatial context. First, to resist the modality gap, LiDAR and camera data are projected into the same depth-map domain for unified representation learning. In particular, LiDAR data is converted to a depth map according to pre-acquired extrinsic parameters. Thanks to recent advances in deep-learning-based monocular depth estimation, camera data is transformed into a depth map in a data-driven manner, jointly optimized with LCA. Second, to capture global spatial context, a vision transformer (ViT) is introduced to LCA. The concept of an LCA token is proposed to aggregate local spatial patterns into a global spatial representation via transformer encoding. The token is shared by all samples, so it can incorporate global sample-level information to improve generalization. Experiments on the KITTI dataset verify the superiority of our proposition. Furthermore, the proposed approach is more robust to the camera data degradation (e.g., image blurring and noise) often faced in practical applications. Under some challenging test cases, our method's advantage exceeds 1.9 cm / 4.1° in translation / rotation error, while our model size (8.77 M parameters) is much smaller than that of existing methods (e.g., LCCNet at 66.75 M). The source code will be released at https://github.com/Zaf233/RLCA upon acceptance.
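The first step the abstract describes, converting LiDAR data to a depth map via pre-acquired extrinsic parameters, is a standard pinhole projection. Below is a minimal sketch of that step, not the paper's implementation: the function name and arguments (`K` intrinsics, `R`/`t` extrinsics mapping LiDAR coordinates into the camera frame) are our own illustrative choices.

```python
import numpy as np

def lidar_to_depth_map(points, K, R, t, h, w):
    """Project (N, 3) LiDAR points into an (h, w) depth map.

    points: XYZ coordinates in the LiDAR frame.
    K: (3, 3) camera intrinsics; R (3, 3), t (3,): extrinsics
    taking LiDAR coordinates into the camera frame.
    """
    cam = points @ R.T + t                # LiDAR frame -> camera frame
    cam = cam[cam[:, 2] > 0]              # keep points in front of the camera
    uvw = cam @ K.T                       # pinhole projection
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    z = cam[:, 2]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth = np.zeros((h, w), dtype=np.float32)
    for ui, vi, zi in zip(u[inside], v[inside], z[inside]):
        # keep the nearest return when several points hit the same pixel
        if depth[vi, ui] == 0 or zi < depth[vi, ui]:
            depth[vi, ui] = zi
    return depth
```

With identity extrinsics, a point on the optical axis at 5 m lands at the principal point with depth 5; a point behind the camera is discarded. In the paper, the analogous map for the camera branch comes from a learned monocular depth estimator rather than from geometry.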