Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval

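Computer science, Simple (philosophy), Raw data, Image fusion, Artificial intelligence, Fusion, Sensor fusion, Image retrieval, Image (mathematics), Computer vision, Pattern recognition (psychology), Information retrieval, Philosophy, Linguistics, Epistemology, Programming language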
Authors
Haokun Wen,Xuemeng Song,Xiaolin Chen,Yinwei Wei,Liqiang Nie,Tat‐Seng Chua
Identifier
DOI:10.1145/3626772.3657727
Abstract

Composed image retrieval (CIR) aims to retrieve the target image based on a multimodal query, i.e., a reference image paired with corresponding modification text. Recent CIR studies leverage vision-language pre-trained (VLP) methods as the feature extraction backbone, and perform nonlinear feature-level multimodal query fusion to retrieve the target image. Despite the promising performance, we argue that their nonlinear feature-level multimodal fusion may lead to the fused feature deviating from the original embedding space, potentially hurting the retrieval performance. To address this issue, in this work, we propose shifting the multimodal fusion from the feature level to the raw-data level to fully exploit the VLP model's multimodal encoding and cross-modal alignment abilities. In particular, we introduce a Dual Query Unification-based Composed Image Retrieval framework (DQU-CIR), whose backbone simply involves a VLP model's image encoder and a text encoder. Specifically, DQU-CIR first employs two training-free query unification components: text-oriented query unification and vision-oriented query unification, to derive a unified textual and visual query based on the raw data of the multimodal query, respectively. The unified textual query is derived by concatenating the modification text with the extracted reference image's textual description, while the unified visual query is created by writing the key modification words onto the reference image. Ultimately, to address diverse search intentions, DQU-CIR linearly combines the features of the two unified queries encoded by the VLP model to retrieve the target image. Extensive experiments on four real-world datasets validate the effectiveness of our proposed method.
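The abstract describes two training-free query unifications followed by a linear combination of the encoded queries. Below is a minimal, stdlib-only Python sketch of two of those pieces under stated assumptions. It is not the authors' implementation. Text-oriented unification is modeled as plain string concatenation. The fusion step is modeled as a weighted sum of two L2-normalized feature vectors, followed by cosine-similarity ranking. The caption text, the `", "` separator, the mixing weight `alpha`, and every function name (`unify_text_query`, `fuse_queries`, `rank_gallery`) are hypothetical. The toy 3-d vectors stand in for features a VLP image/text encoder would produce. Vision-oriented unification (writing key words onto the reference image) is left out because it requires an image library.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def unify_text_query(caption: str, modification_text: str) -> str:
    """Text-oriented query unification (sketch): join the reference image's
    caption and the modification text into one string. The order and the
    ", " separator are assumptions, not taken from the paper."""
    return f"{caption.strip()}, {modification_text.strip()}"


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def fuse_queries(text_feat: Sequence[float], visual_feat: Sequence[float],
                 alpha: float = 0.5) -> Vector:
    """Linearly combine the unified textual and visual query features.

    Both inputs are L2-normalized first so that alpha (a hypothetical mixing
    weight in [0, 1]) sets their relative contribution. No nonlinear layer
    is applied: the result is a weighted sum of two vectors taken from the
    encoder's shared embedding space."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(text_feat) != len(visual_feat):
        raise ValueError("dimension mismatch")
    t = l2_normalize(text_feat)
    v = l2_normalize(visual_feat)
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(t, v)]


def rank_gallery(query: Sequence[float],
                 gallery: Dict[str, Sequence[float]]) -> List[str]:
    """Return gallery image IDs sorted by descending cosine similarity."""
    return sorted(gallery, key=lambda k: cosine(query, gallery[k]), reverse=True)


if __name__ == "__main__":
    print(unify_text_query("a red dress", "make it blue and sleeveless"))
    # Toy 3-d "features" standing in for VLP encoder outputs.
    text_feat = [1.0, 0.0, 0.0]
    visual_feat = [0.0, 1.0, 0.0]
    gallery = {"a": [1.0, 1.0, 0.0], "b": [1.0, 0.0, 0.0], "c": [0.0, 0.0, 1.0]}
    query = fuse_queries(text_feat, visual_feat, alpha=0.5)
    print(rank_gallery(query, gallery))  # ['a', 'b', 'c']
```

Because the fusion is a weighted sum rather than a learned nonlinear layer, setting `alpha` to 1.0 or 0.0 reduces it to a pure text query or a pure visual query. The abstract does not say how the paper actually weights the two features.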