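Computer science
Artificial intelligence
Object detection
Machine learning
Generalization
Redundancy (engineering)
Process (computing)
Feature learning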
Salience
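Robustness (evolution)
Cognitive neuroscience of visual object recognition
Domain (mathematical analysis)
Feature (linguistics)
Feature extraction
Exploit
Object (grammar)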
Mode
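Deep learning
Context model
Data modeling
AdaBoost
Domain knowledge
Supervised learning
Pattern recognition (psychology)
Learning object
Binary classification
Computer vision
Pairwise comparison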
Reduction
Contextual image classification
Active learning (machine learning)
Authors
Ziyang Luo, Nian Liu, Xuguang Yang, Dingwen Zhang, Deng-Ping Fan, Fahad Shahbaz Khan, Junwei Han
Identifier
DOI: 10.1109/tpami.2025.3635136
Abstract
Salient object detection (SOD) and camouflaged object detection (COD) are related but distinct binary mapping tasks, each involving multiple modalities that share commonalities while maintaining unique characteristics. Existing approaches often rely on complex, task-specific architectures, leading to redundancy and limited generalization. Our previous work, VSCode, introduced a generalist model that effectively handles four SOD tasks and two COD tasks. VSCode leveraged VST as its foundation model and incorporated 2D prompts within an encoder-decoder framework to capture domain and task-specific knowledge, utilizing a prompt discrimination loss to optimize the model. Building upon the proven effectiveness of our previous work VSCode, we identify opportunities to further strengthen generalization capabilities through focused modifications in model design and optimization strategy. To unlock this potential, we propose VSCode-v2, an extension that introduces a Mixture of Prompt Experts (MoPE) layer to generate adaptive prompts. We also redesign the training process into a two-stage approach: first learning shared features across tasks, then capturing specific characteristics. To preserve knowledge during this process, we incorporate distillation from our conference version model. Furthermore, we propose a contrastive learning mechanism with data augmentation to strengthen the relationships between prompts and feature representations. VSCode-v2 demonstrates balanced performance improvements across six SOD and COD tasks. Moreover, VSCode-v2 effectively handles various multimodal inputs and exhibits zero-shot generalization capability to novel tasks, such as RGB-D Video SOD.