
Transformer-based local-global guidance for image captioning

Keywords: Computer science · Captioning · Transformer · Sentence · Word embedding · Embedding · Artificial intelligence · Natural language processing · Pattern recognition (psychology) · Image (mathematics)
Authors: Hashem Parvin, Ahmad Reza Naghsh‑Nilchi, Hossein Mahvash Mohammadi
Source: Expert Systems With Applications [Elsevier BV], Volume 223, Article 119774. Cited by: 24
DOI: 10.1016/j.eswa.2023.119774
Abstract:

Image captioning is the challenging task of compressing the rich visual content of an image into a descriptive sentence. Recurrent models are widely used as decoders and achieve strong performance, but they are complex and inherently sequential over time. Transformers, by contrast, can model long-range dependencies and support parallel processing of sequences. However, recent transformer-based models assign attention weights to all candidate vectors under the assumption that every vector is relevant, and they ignore intra-object relationships. Moreover, a single attention mechanism cannot capture the complex relationships between key and query vectors. This paper proposes a new transformer-based image captioning architecture, free of recurrence and convolution, to address these issues. To this end, a generator network and a selector network are designed to produce textual descriptions collaboratively. The work comprises three main steps: (1) design a transformer-based generator network that provides word-level guidance, generating the next word from the current state; (2) train a latent space that maps captions and images into a shared embedding space, learning the text-image relation; (3) design a selector network that provides sentence-level guidance, evaluating candidate next words by assigning fitness scores to partial captions through the embedding space. Unlike existing architectures, the proposed approach uses an attention mechanism without temporal dependencies: at each decoding step it selects the next best word using local-global guidance. In addition, the model preserves dependencies between sequences and can be trained in parallel. Experiments on the COCO and Flickr datasets show that the proposed approach outperforms a range of state-of-the-art models on well-known evaluation measures.
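As a rough illustration of the local-global guidance idea, the sketch below combines a generator's word-level probabilities (local guidance) with a selector's sentence-level fitness, approximated here as cosine similarity between the image embedding and each candidate partial caption's embedding in a shared space. This is a minimal toy sketch under our own assumptions, not the authors' released code; the vocabulary, embeddings, and mixing weight `alpha` are all hypothetical.

```python
import numpy as np

# Hypothetical toy vocabulary; in the paper the generator is a full
# transformer decoder over a real vocabulary.
VOCAB = ["a", "dog", "car", "runs"]

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def select_next_word(gen_probs, image_emb, caption_embs, alpha=0.5):
    """Pick the next word by mixing local and global guidance.

    gen_probs    -- generator's probability for each candidate word (local)
    image_emb    -- image embedding in the shared latent space
    caption_embs -- embedding of the partial caption extended by each
                    candidate word (one vector per vocabulary entry)
    alpha        -- hypothetical weight balancing the two signals
    """
    # Sentence-level fitness: how well each extended caption matches the image.
    fitness = np.array([cosine(image_emb, c) for c in caption_embs])
    # Rescale cosine from [-1, 1] to [0, 1] so both terms are comparable.
    fitness = (fitness + 1.0) / 2.0
    scores = alpha * np.asarray(gen_probs) + (1.0 - alpha) * fitness
    return VOCAB[int(np.argmax(scores))]
```

For example, if two candidate words are equally likely under the generator, the selector's image-text similarity breaks the tie in favor of the caption that better matches the image embedding.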