Computer vision
Computer science
Artificial intelligence
Machine vision
Natural language processing
Authors
Xiangkai Shen,Lei Li,Yushan Ma,Shaofeng Xu,Jinhai Liu,Zhiming Yang,Yan Shi
Identifier
DOI:10.1109/tim.2025.3583364
Abstract
Accurate defect detection is an important element in ensuring product quality and safe equipment operation. However, because existing methods lack deep cross-modal interaction during vision feature extraction, they often suffer from attention bias, which ultimately limits detection accuracy. To address this issue, this paper proposes a Vision-Language Cyclic Interaction Model (VLCIM), which progressively optimizes vision feature extraction by integrating domain prior knowledge with a generic large model, bridging the dual-domain barriers between “generic-specific” and “vision-language”. Specifically, progressive cyclic interaction learning is proposed for the first time; it integrates a recursive guidance module (RGM) and a cross-modal interaction (CMI) strategy to realize bidirectional dynamic fusion and collaborative optimization of vision and language features. Furthermore, the proposed dual-view synergistic detection mechanism enhances discriminative decision responses, significantly improving the model’s boundary perception and decision-making accuracy in complex scenarios. VLCIM thus achieves high-precision defect detection by establishing a cyclic interaction mechanism between domain-specific language features and vision representations. Experimental results on three industrial datasets show that VLCIM improves mIoU by 5.9%, 5.6%, and 4.1% over state-of-the-art (SOTA) methods, indicating its effectiveness and generalization across different scenarios.
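The abstract reports its gains in mIoU (mean intersection over union). As a reminder of what that metric measures, the following is a minimal sketch of the standard definition, not the authors' evaluation code. The function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes, using the standard per-class definition.

    pred, target: equal-length sequences of integer class labels,
    one per pixel (hypothetical flattened masks).
    Classes that appear in neither sequence are skipped (an assumption;
    evaluation protocols differ on this point).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: 0 = background, 1 = defect.
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case each class has an intersection of 2 pixels and a union of 4, so both per-class IoUs are 0.5 and so is their mean.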