Artificial intelligence
Computer science
Object detection
Modal verb
Computer vision
Object (grammar)
Pattern recognition (psychology)
Salience
Chemistry
Polymer chemistry
Authors
Jie Wang, Xiangji Kong, Nana Yu, Zihao Zhang, Yahong Han
Identifier
DOI:10.1109/tcsvt.2024.3514897
Abstract
Bi-modal (RGB-T and RGB-D) salient object detection (SOD) aims to enhance detection performance by leveraging the complementary information between modalities. While significant progress has been made, two major limitations persist. First, mainstream fully supervised methods carry a substantial manual-annotation burden, while weakly supervised or unsupervised methods struggle to achieve satisfactory performance. Second, the indiscriminate modeling of local detailed information (object edges) and global contextual information (object bodies) often yields predicted objects with incomplete edges or inconsistent internal representations. In this work, we propose a novel paradigm to effectively alleviate the above limitations. Specifically, we first enhance the consistency regularization strategy to build a basic semi-supervised architecture for the bi-modal SOD task, which ensures that the model can benefit from massive unlabeled samples while effectively alleviating the annotation burden. Second, to ensure detection performance (i.e., complete edges and consistent bodies), we disentangle the SOD task into two parallel sub-tasks: edge integrity fusion prediction and body consistency fusion prediction. Achieving these tasks involves two key steps: 1) the explicit disentangling scheme decouples salient object features into edge and body features, and 2) the exclusive fusing scheme performs dedicated integrity or consistency fusion for each of them. Ultimately, our approach demonstrates significant competitiveness compared to 26 fully supervised methods while eliminating 90% of the annotation burden, and it holds a substantial advantage over 15 non-fully supervised methods.
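The edge/body split described in the abstract is often bootstrapped by deriving separate supervision targets from a saliency mask: a "body" map via morphological erosion of the mask, and an "edge" map as the residual ring. A minimal NumPy sketch of that mask-level decomposition follows; it is illustrative only, since the paper's actual disentangling scheme operates on learned features, and all function names here are hypothetical:

```python
import numpy as np

def erode(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary erosion with a 3x3 cross structuring element:
    a pixel survives only if it and its 4 neighbors are all foreground."""
    out = mask.astype(bool)
    for _ in range(iterations):
        padded = np.pad(out, 1, constant_values=False)
        out = (padded[1:-1, 1:-1]            # center
               & padded[:-2, 1:-1]           # up
               & padded[2:, 1:-1]            # down
               & padded[1:-1, :-2]           # left
               & padded[1:-1, 2:])           # right
    return out

def disentangle(mask: np.ndarray, iterations: int = 1):
    """Split a binary saliency mask into a body (interior) map
    and an edge (boundary residual) map."""
    body = erode(mask, iterations)
    edge = mask.astype(bool) & ~body
    return body.astype(np.uint8), edge.astype(np.uint8)

# Toy example: a 4x4 salient square inside an 8x8 frame.
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 2:6] = 1
body, edge = disentangle(mask)   # body: inner 2x2; edge: 12-pixel ring
```

In a training pipeline, `body` and `edge` would serve as the targets for the two parallel sub-task heads, whose fused outputs reconstruct the full saliency map.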