Semantic Segmentation in Thermal Videos: A New Benchmark and Multi-Granularity Contrastive Learning-Based Framework

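Granularity, Benchmark (surveying), Computer science, Segmentation, Artificial intelligence, Natural language processing, Pixel, Deep learning, Computer vision, Image segmentation, Pattern recognition (psychology), Geography, Cartography, Operating system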

Authors

Yu Zheng, Fugen Zhou, Shangying Liang, Wentao Song, Xiangzhi Bai

Source

Journal: IEEE Transactions on Intelligent Transportation Systems [Institute of Electrical and Electronics Engineers]
Date: 2023-08-09 Volume (Issue): 24 (12): 14783-14799 Citations: 1

Identifiers

DOI: 10.1109/tits.2023.3300038

Abstract

Video semantic segmentation, which is important for road scene understanding, has achieved great success. However, semantic segmentation remains challenging in poor illumination and inclement weather. Thermal cameras, which are highly invariant to lighting and penetrate rain and fog well, enable semantic segmentation to work under these challenging conditions. Thus, this paper explores semantic segmentation in thermal videos to broaden the scope of application of road scene understanding. We offer the first thermal video semantic segmentation dataset, TVSS, which includes 1695 thermal videos with 50850 frames of road scenes. It is available at: https://xzbai.buaa.edu.cn/datasets.html . TVSS is finely annotated with 17 categories at a frame rate of 1 fps, with a labeled pixel density of 98.9%. Existing video semantic segmentation methods depend on the amount of labels and the representation power of backbones, and they do not achieve ideal results on thermal videos. Thus, we introduce a multi-granularity contrastive learning-based thermal video semantic segmentation model (MGCL), which exploits the abundant unlabeled frames to boost supervised segmentation. Specifically, MGCL constructs multi-granularity self-supervised signals on unlabeled thermal videos by contrastive learning, including an intra-frame context generalization loss, an intra-clip temporal consistency loss, and an inter-video category discrimination loss. In addition, a hard anchor sampling strategy is introduced to focus on hard-to-classify pixels for further performance improvement. Extensive experiments on TVSS demonstrate the superior performance of MGCL in both accuracy and efficiency. Compared with 12 state-of-the-art semantic segmentation methods, MGCL achieves gains of 2.8% to 8.1% in mIoU while maintaining inference speed.
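
The abstract does not give the form of the three contrastive losses or of the hard anchor sampling rule. As a rough illustration only, the sketch below shows a generic pixel-embedding InfoNCE loss and one possible way to pick hard anchors, assuming that "hard" means lowest predicted confidence. The function names `info_nce` and `sample_hard_anchors`, the temperature value, and the confidence-based rule are all assumptions made for this example and are not taken from the paper; how positives and negatives are chosen at each granularity (frame, clip, video) is left to the caller.

```python
import numpy as np


def info_nce(anchors, positives, negatives, temperature=0.1):
    """Mean InfoNCE loss over N anchor embeddings (generic form, not MGCL's exact loss).

    anchors:   (N, D) anchor pixel embeddings
    positives: (N, D) one positive embedding per anchor
    negatives: (M, D) negatives shared by all anchors
    """
    def l2_normalize(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)

    a = l2_normalize(np.asarray(anchors, dtype=float))
    p = l2_normalize(np.asarray(positives, dtype=float))
    n = l2_normalize(np.asarray(negatives, dtype=float))

    pos_logits = np.sum(a * p, axis=1, keepdims=True) / temperature  # (N, 1)
    neg_logits = (a @ n.T) / temperature                             # (N, M)
    logits = np.concatenate([pos_logits, neg_logits], axis=1)

    # Subtract the row maximum for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob_pos = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return float(-log_prob_pos.mean())


def sample_hard_anchors(confidence, k):
    """Return flat indices of the k pixels with the lowest confidence.

    Assumption for this sketch: low predicted confidence marks a
    hard-to-classify pixel. The paper's actual criterion may differ.
    """
    confidence = np.asarray(confidence, dtype=float)
    k = min(k, confidence.size)
    return np.argsort(confidence.ravel())[:k]
```

In such a setup, the indices returned by `sample_hard_anchors` would select rows of the pixel-embedding matrix to pass as `anchors`, so that the contrastive loss concentrates on pixels the segmentation head is least sure about.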
