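Granularity
Benchmark (surveying)
Computer science
Segmentation
Artificial intelligence
Natural language processing
Pixel
Deep learning
Computer vision
Image segmentation
Pattern recognition (psychology)
Geography
Cartography
Operating system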
Authors
Yu Zheng, Fugen Zhou, Shangying Liang, Wentao Song, Xiangzhi Bai
Identifier
DOI:10.1109/tits.2023.3300038
Abstract
Video semantic segmentation has achieved great success and is significant for road scene understanding. However, semantic segmentation remains challenging in poor illumination and inclement weather. Thermal cameras, which are highly invariant to light and penetrate rain and fog well, enable semantic segmentation to work under these challenging conditions. This paper therefore explores semantic segmentation in thermal videos to broaden the application scope of road scene understanding. We offer the first thermal video semantic segmentation dataset, TVSS, which includes 1695 thermal videos with 50850 frames of road scenes. It is available at: https://xzbai.buaa.edu.cn/datasets.html . TVSS is finely annotated with 17 categories at a frame rate of 1 fps, with a labeled pixel density of 98.9%. Existing video semantic segmentation methods rely on the amount of labels and the representation power of backbones, and cannot achieve ideal results on thermal videos. We therefore introduce a multi-granularity contrastive learning based thermal video semantic segmentation model (MGCL), which exploits the abundant unlabeled frames to boost supervised segmentation. Specifically, MGCL constructs multi-granularity self-supervised signals on unlabeled thermal videos through contrastive learning, including an intra-frame context generalization loss, an intra-clip temporal consistency loss, and an inter-video category discrimination loss. In addition, a hard anchor sampling strategy is introduced to focus on hard-to-classify pixels for further performance improvement. Extensive experiments on TVSS demonstrate the superior performance of MGCL in both accuracy and efficiency. Compared with 12 state-of-the-art semantic segmentation methods, MGCL achieves mIoU gains of 2.8% to 8.1% while maintaining inference speed.
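The abstract does not give the form of the contrastive losses or of the hard anchor sampling rule, so the following is only a rough sketch of the general idea, not the MGCL implementation. It assumes a standard InfoNCE-style loss over pixel embeddings and assumes that "hard" anchors are the pixels with the lowest predicted probability for their (pseudo-)label. The function names `info_nce` and `select_hard_anchors`, the `temperature` parameter, and the toy vectors are all hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor embedding, averaged over its positives.

    Assumption: this is the generic contrastive form, not the exact loss
    used by MGCL, which the abstract does not specify.
    """
    a = normalize(anchor)
    neg_logits = [dot(a, normalize(n)) / temperature for n in negatives]
    losses = []
    for p in positives:
        pos_logit = dot(a, normalize(p)) / temperature
        logits = [pos_logit] + neg_logits
        # Numerically stable log-sum-exp over the positive and all negatives.
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(lse - pos_logit)
    return sum(losses) / len(losses)


def select_hard_anchors(probs, labels, k):
    """Return indices of the k pixels with the lowest probability for their label.

    Assumption: "hard" means low confidence on the (pseudo-)label; the
    paper's actual criterion may differ.
    """
    confidence = [p[y] for p, y in zip(probs, labels)]
    order = sorted(range(len(labels)), key=lambda i: confidence[i])
    return order[:k]


# Toy usage: three pixels, two classes, 2-D embeddings.
probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
labels = [0, 0, 1]
hard = select_hard_anchors(probs, labels, k=1)

aligned = info_nce([1.0, 0.0], [[1.0, 0.0]], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [[0.0, 1.0]], [[1.0, 0.0]])
```

In this toy setup the loss is small when the anchor's embedding matches its positive and large when it matches a negative instead, which is the behavior a contrastive objective relies on. In a multi-granularity setting, the positives and negatives would be drawn from the same frame, the same clip, or other videos, depending on which loss is being computed.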