Authors
Bo Guo, Liwei Deng, Ruisheng Wang, Wenchao Guo, Alex Hay-Man Ng, Wenfeng Bai
Identifier
DOI: 10.1109/TGRS.2023.3322579
Abstract
In this work, we present a hybrid hierarchical network for point cloud semantic segmentation that aggregates both fine-grained and global contextual features. To overcome the limitation of convolution operations, which mainly capture low-level features, we combine them with a higher-level cross-attention-based Transformer and position embeddings to model long-range relations for multiscale feature representation. Specifically, a learnable token is added to the feature sequence of each layer, and a Transformer encoder with limited scope is first applied to embed these features. Instead of performing all-to-all attention, we fuse only the tokens spanning the various scales. To improve efficiency, we propose a simple yet effective token-fusing architecture based on cross-attention, in which only the token is used to compute the query, so the attention maps can be computed in linear time. The cross-attention module can be efficiently stacked in a multiscale network to further enlarge the attention receptive field. Experiments show that our MCTNet achieves promising results on three large point cloud datasets: DALES, DublinCity, and S3DIS. On the DALES benchmark, MCTNet improves the mean intersection-over-union (mIoU) to 83.3% and the overall accuracy (OA) to 98.3%, outperforming existing baselines. We also conduct extensive ablation studies on various attention and normalization modules, and discuss the effect of parameters, to validate the descriptive power of the cross-attention module and to provide an understanding of how long-range dependency can be used to learn fair and unbiased features.
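The linear-time behavior described above follows from using only the learnable token as the query: the attention map then has one row rather than one row per point. The following minimal PyTorch sketch illustrates that idea under our own assumptions (class name, head count, and shapes are hypothetical and not taken from the authors' code).

import torch
import torch.nn as nn

class TokenCrossAttention(nn.Module):
    """Sketch of token-to-sequence cross-attention: the learnable token is the
    only query, the point features of one scale are the keys and values, so the
    attention map has shape (1, N) instead of (N, N), i.e. linear in N."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.token = nn.Parameter(torch.zeros(1, 1, dim))  # learnable token

    def forward(self, feats):
        # feats: (B, N, dim) point features of one scale
        b = feats.size(0)
        q = self.token.expand(b, -1, -1)        # (B, 1, dim): query built from the token only
        fused, _ = self.attn(q, feats, feats)   # attention map is (B, 1, N)
        return fused                            # (B, 1, dim): token summarizing this scale

# Hypothetical usage: summarize one scale, then fuse tokens across scales
# instead of running all-to-all attention over every point.
if __name__ == "__main__":
    x = torch.randn(2, 1024, 64)                # 2 batches, 1024 points, 64-dim features
    module = TokenCrossAttention(dim=64)
    print(module(x).shape)                      # torch.Size([2, 1, 64])

In a multiscale network, one such module per scale would yield a small set of tokens whose pairwise fusion is cheap, which is consistent with the paper's stated motivation for restricting attention-map computation to linear time.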