Point cloud
Computer science
Autoencoder
Artificial intelligence
Transformer
Deep learning
Decoding methods
Pattern recognition
Feature learning
Computer vision
Machine learning
Algorithm
Authors
Jiaming Liu,Yue Wu,Maoguo Gong,Zhixiao Liu,Qiguang Miao,Wenping Ma
Identifier
DOI:10.1109/tmm.2023.3317998
Abstract
Masked autoencoder (MAE) is a recently widely used self-supervised learning method that has achieved great success in NLP and computer vision. However, the potential advantages of masked pre-training for point cloud understanding have not been fully explored. Preliminary work on MAE-based point clouds uses the Transformer architecture to explore low-level geometric representations in 3D space, which is insufficient for fine-grained decoding and completion and for downstream tasks. Inspired by multimodality, we propose Inter-MAE, an inter-modal MAE method for self-supervised learning on point clouds. Specifically, we first use Point-MAE as a baseline to randomly partition point clouds into a low percentage of visible point patches and a high percentage of masked point patches. Then, a standard Transformer-based autoencoder is built with an asymmetric design and shifted-mask operations, and latent features are learned from the visible point patches with the aim of recovering the masked ones. In addition, we render the point cloud to images and extract image features with ViT, forming inter-modal contrastive learning with the decoded features of the completed point patches. Extensive experiments show that the proposed Inter-MAE produces pre-trained models that are effective and achieve superior results on various downstream tasks. For example, it achieves an accuracy of 85.4% on ScanObjectNN and 86.3% on ShapeNetPart, outperforming other state-of-the-art self-supervised learning methods. Notably, our work establishes for the first time the feasibility of applying the image modality to masked point clouds. The code is publicly available at https://github.com/ywu0912/TeamCode.git
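The patchify-and-mask step described in the abstract (visible vs. masked point patches) can be sketched roughly as follows. This is a minimal illustrative implementation assuming farthest point sampling for patch centers and kNN grouping, as in Point-MAE; the function name, default patch counts, and mask ratio are assumptions, not the paper's exact configuration.

```python
import numpy as np

def mask_point_patches(points, num_patches=64, patch_size=32, mask_ratio=0.6, seed=0):
    """Partition a point cloud (N, 3) into patches and randomly mask most of them.
    Illustrative sketch only; defaults are assumptions, not the paper's config."""
    rng = np.random.default_rng(seed)

    # Farthest point sampling: greedily pick patch centers that are far apart.
    center_idx = [int(rng.integers(len(points)))]
    dist = np.full(len(points), np.inf)
    for _ in range(num_patches - 1):
        dist = np.minimum(dist, np.linalg.norm(points - points[center_idx[-1]], axis=1))
        center_idx.append(int(np.argmax(dist)))
    centers = points[np.array(center_idx)]

    # kNN grouping: each patch is the patch_size nearest points to its center.
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)  # (N, P)
    patches = points[np.argsort(d, axis=0)[:patch_size].T]  # (P, patch_size, 3)

    # Random masking: the encoder only ever sees the visible patches.
    perm = rng.permutation(num_patches)
    num_masked = int(mask_ratio * num_patches)
    return patches[perm[num_masked:]], patches[perm[:num_masked]]  # visible, masked

pts = np.random.default_rng(1).normal(size=(1024, 3)).astype(np.float32)
visible, masked = mask_point_patches(pts)
print(visible.shape, masked.shape)  # (26, 32, 3) (38, 32, 3)
```

In a real pipeline the visible patches would be embedded and fed to the asymmetric Transformer encoder, while positions of the masked patches are given only to the lightweight decoder.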
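The inter-modal contrastive objective between decoded point-patch features and ViT features of the rendered images could look like a symmetric InfoNCE loss. The sketch below is an assumption about the loss form; the function name, feature shapes (N, D), and temperature are all illustrative, not taken from the paper.

```python
import numpy as np

def inter_modal_info_nce(point_feats, image_feats, temperature=0.07):
    """Symmetric InfoNCE between two L2-normalized feature sets (N, D).
    Hypothetical sketch; matching point/image pairs share a row index."""
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    v = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    logits = p @ v.T / temperature          # (N, N); positives on the diagonal
    idx = np.arange(len(p))

    def cross_entropy(l):                   # row-wise softmax cross-entropy
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return float(-logp[idx, idx].mean())

    # Average the point-to-image and image-to-point directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
print(inter_modal_info_nce(feats, feats.copy()))  # near zero: modalities aligned
```

The loss drops toward zero when each point-patch feature is closest to its own rendered-image feature, which is the alignment the pre-training stage aims for.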