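Computer science
Decoding methods
Fuse (electrical)
Image (mathematics)
Semantics (computer science)
Encoder
Semantic computing
Artificial intelligence
Benchmark (surveying)
Fusion
Natural language processing
Semantic compression
Dual (grammatical number)
Pattern recognition (psychology)
Information retrieval
Algorithm
Semantic technology
Semantic Web
Programming language
Art
Linguistics
Philosophy
Literature
Geodesy
Geography
Electrical engineering
Engineering
Operating system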
Authors
Pufen Zhang, Peng Shi, Song Zhang
Identifier
DOI: 10.1109/icme55011.2023.00012
Abstract
In previous fine-grained image recognition (FGIR) methods, a single global or local view of semantic fusion may not be comprehensive enough to reveal the semantic associations between image and text. In addition, the encoding-based fusion strategy cannot fuse semantics finely, because it fuses low-order text semantic dependencies together with irrelevant semantic concepts. To address these issues, this work proposes Dual-Semantic Decoding Fusion Networks (2S-DFN) for FGIR. Specifically, a multilayer text semantic encoder is first constructed to extract higher-order semantic dependencies within the text. To capture sufficient semantic associations, two decoding semantic fusion streams are designed symmetrically from the global and local perspectives. Moreover, by injecting text features into the semantic fusion layers in a decoding manner and cascading these layers deeply, the two streams fuse text and image semantics finely. Extensive experiments demonstrate the effectiveness of the proposed method, with 2S-DFN achieving state-of-the-art results on two benchmark datasets.
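The abstract does not say how text features are "implanted" into the fusion layers. A minimal sketch follows, assuming the decoding fusion resembles transformer-decoder cross-attention, in which image features act as queries and encoded text features act as keys and values. It shows a stacked self-attention text encoder, a fusion layer with a residual connection, and a cascade of such layers applied to one pooled feature (global stream) and to several region features (local stream). All function names and the toy features are hypothetical; learned projections, multi-head attention, normalization and feed-forward sublayers are omitted, so this is an illustration of the general idea, not the authors' 2S-DFN.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention (no learned projections)."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    # Element-wise residual connection.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def encode_text(tokens: Matrix, depth: int) -> Matrix:
    """Hypothetical multilayer text encoder: stacked self-attention, so deeper
    layers combine token dependencies formed in earlier layers."""
    h = tokens
    for _ in range(depth):
        h = add(h, attention(h, h, h))
    return h


def fusion_stream(image: Matrix, text: Matrix, depth: int) -> Matrix:
    """Hypothetical decoding fusion stream: at every cascaded layer the image
    features (queries) attend over the same encoded text (keys/values)."""
    h = image
    for _ in range(depth):
        h = add(h, attention(h, text, text))
    return h


# Toy 2-D features standing in for real backbone outputs (assumed values).
text = encode_text([[1.0, 0.0], [0.0, 1.0]], depth=2)
global_out = fusion_stream([[0.5, 0.5]], text, depth=2)  # global stream
local_out = fusion_stream(
    [[1.0, 0.0], [0.0, 1.0], [0.2, 0.8]], text, depth=2  # local stream
)
# global_out is approximately [[4.5, 4.5]]: the pooled query is equally
# similar to both encoded text tokens, so it attends to them equally.
print(global_out, len(local_out))
```

Both streams reuse the same encoded text at every layer, which is one plausible reading of "cascading deeply"; the paper may instead use different text features per layer.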