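Computer science
Encoder
Segmentation
Residual
Artificial intelligence
Point cloud
Transformer
Discriminant
Pattern recognition (psychology)
Computer vision
Algorithm
Engineering
Voltage
Electrical engineering
Operating system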
Authors
Hui-Xian Cheng,Xuefei Han,Guoqiang Xiao
Identifier
DOI:10.1109/tits.2023.3248117
Abstract
Effective and efficient 3D semantic segmentation of large-scale LiDAR point clouds is a fundamental problem in autonomous driving. In this paper, we present the Transformer-Range-View Network (TransRVNet), a novel and powerful projection-based CNN-Transformer architecture for inferring point-wise semantics. First, a Multi Residual Channel Interaction Attention Module (MRCIAM) is introduced to capture channel-level multi-scale features and to model intra-channel and inter-channel correlations with an attention mechanism. Then, in the encoder stage, we use a Residual Context Aggregation Module (RCAM), which combines a residual dilated convolution structure with a context aggregation module, to fuse information from different receptive fields while reducing the impact of missing points. Finally, a Balanced Non-square-Transformer Module (BNTM) serves as the fundamental component of the decoder; by introducing a non-square shifted window strategy, it captures local feature dependencies for more discriminative feature learning. Extensive qualitative and quantitative experiments on the challenging SemanticKITTI and SemanticPOSS benchmarks verify the effectiveness of the proposed technique. TransRVNet outperforms most existing state-of-the-art approaches. The source code and trained model are available at https://github.com/huixiancheng/TransRVNet.
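The abstract calls the method "projection-based" and "range-view" but does not spell out the projection itself. As a rough illustration only, the sketch below shows the spherical projection commonly used to turn a LiDAR scan into a range image; it is not taken from the TransRVNet code. The function name `project_to_range_view`, the 64 x 2048 image size, and the +3 / -25 degree vertical field of view are all assumptions (values often used for 64-beam sensors), and the nearest-point-wins rule is one common convention among several.

```python
import math

def project_to_range_view(points, height=64, width=2048,
                          fov_up_deg=3.0, fov_down_deg=-25.0):
    """Spherically project (x, y, z) points onto a height x width range image.

    Hypothetical helper for illustration; image size and field of view are
    assumed defaults, not values stated in the abstract.
    Returns (range_image, index_image); cells with no point hold -1.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down  # total vertical field of view in radians

    range_image = [[-1.0] * width for _ in range(height)]
    index_image = [[-1] * width for _ in range(height)]

    for i, (x, y, z) in enumerate(points):
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # a point at the sensor origin has no defined direction

        yaw = math.atan2(y, x)    # horizontal angle in [-pi, pi]
        pitch = math.asin(z / r)  # vertical angle

        # Normalise the angles to image coordinates, then clamp to the grid.
        u = 0.5 * (1.0 - yaw / math.pi) * width
        v = (1.0 - (pitch - fov_down) / fov) * height
        col = min(width - 1, max(0, int(math.floor(u))))
        row = min(height - 1, max(0, int(math.floor(v))))

        # When several points fall into one cell, keep the nearest one.
        if range_image[row][col] < 0 or r < range_image[row][col]:
            range_image[row][col] = r
            index_image[row][col] = i

    return range_image, index_image
```

Cells that stay at -1 are pixels no laser return landed on, which is the kind of gap the "missing points" remark in the abstract refers to. The index image lets per-pixel predictions be mapped back to the original points.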