Syntactic-Conditional Diffusion Networks for Controllable Image Captioning

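Closed captioning, Computer science, Diffusion, Image (mathematics), Artificial intelligence, Computer vision, Natural language processing, Thermodynamics, Physics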
Authors
Bing Liu, Wenjie Yang, Mingming Liu, Hao Liu, Yong Zhou, Peng Liu
Source
Journal: ACM Transactions on Multimedia Computing, Communications, and Applications [Association for Computing Machinery]
Identifier
DOI: 10.1145/3748653
Abstract

Current diffusion-based image captioning methods generally generate descriptions non-autoregressively. Nevertheless, it is not trivial to use such generative models to control the generation of discrete words while balancing diversity and accuracy. Inspired by the success of continuous diffusion models in image captioning, we introduce Part-of-Speech (POS) information and classifier-free guidance into the diffusion model and propose a novel controllable image captioning model, POS-Conditional Diffusion Networks (POSCD-Net), which consists of a Diffusion-based POS Generator (DPG) and a Diffusion-based Caption Generator (DCG). The DPG produces diverse syntactic structures for each input image. These POS sequences then serve as control signals for the DCG, which generates the output sentences through a conditional diffusion process. Within the DCG, a syntactic control module (SCM) progressively strengthens the alignment between words and their corresponding POS tags in a cascaded manner. Furthermore, to improve the controllability of POSCD-Net, classifier-free guidance with learnable parameters is used to jointly optimize the DPG and DCG in a non-autoregressive manner. Extensive experiments on the MSCOCO dataset demonstrate that the proposed method outperforms state-of-the-art non-autoregressive counterparts and achieves promising performance compared with autoregressive models.
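The abstract says that classifier-free guidance conditions the caption generator on POS sequences but does not give the exact formulation. The following is therefore only a minimal sketch of the standard classifier-free guidance rule with a fixed guidance scale `w`, applied to a continuous (embedding-space) denoiser. It does not model the authors' learnable guidance parameters. The denoiser signature, the use of `None` as the "null" POS condition, and the names `drop_condition`, `guided_noise` and `toy_denoiser` are hypothetical illustrations and do not come from the paper.

```python
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]
# Hypothetical denoiser: (noisy embedding x_t, timestep t, POS tags or None) -> predicted noise.
Denoiser = Callable[[Vector, int, Optional[Sequence[str]]], Vector]


def drop_condition(pos_tags: Sequence[str], p_uncond: float,
                   rng: random.Random) -> Optional[Sequence[str]]:
    """Training-time condition dropout (assumed setup): with probability p_uncond,
    replace the POS condition with None so one network also learns the
    unconditional prediction."""
    return None if rng.random() < p_uncond else pos_tags


def guided_noise(denoiser: Denoiser, x_t: Vector, t: int,
                 pos_tags: Sequence[str], w: float) -> Vector:
    """Sampling-time classifier-free guidance with a fixed scale w:
    eps_hat = eps_uncond + w * (eps_cond - eps_uncond).
    w = 0 gives the unconditional prediction, w = 1 the conditional one,
    and w > 1 pushes the sample harder toward the POS condition."""
    eps_cond = denoiser(x_t, t, pos_tags)
    eps_uncond = denoiser(x_t, t, None)
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("conditional and unconditional predictions differ in size")
    return [u + w * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def toy_denoiser(x_t: Vector, t: int, pos_tags: Optional[Sequence[str]]) -> Vector:
    """Stand-in network for illustration only: ignores t and adds a bias
    that depends on how many POS tags are given."""
    bias = 0.0 if pos_tags is None else 0.1 * len(pos_tags)
    return [0.5 * v + bias for v in x_t]


if __name__ == "__main__":
    eps = guided_noise(toy_denoiser, [1.0, 2.0], 5, ["DET", "NOUN"], w=2.0)
    print([round(v, 6) for v in eps])  # [0.9, 1.4]
```

Both branches share one network, so guidance costs two forward passes per denoising step. A learnable variant, as the abstract describes, would replace the constant `w` with trained parameters, whose form is not specified here.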
