Token-Mixer: Bind Image and Text in One Embedding Space for Medical Image Reporting

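Embedding; Image (mathematics); Computer science; Security token; Space (punctuation); Computer vision; Artificial intelligence; Medical imaging; Image processing; Computer security; Operating system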
Authors
Yan Yang,Jun Yu,Zhenqi Fu,Ke Zhang,Ting Yu,Xianyun Wang,Hanliang Jiang,Junhui Lv,Qingming Huang,Weidong Han
Source
Journal: IEEE Transactions on Medical Imaging [Institute of Electrical and Electronics Engineers]
Volume/issue: 1-1 Cited by: 1
Identifier
DOI:10.1109/tmi.2024.3412402
Abstract

Medical image reporting, which aims to automatically generate diagnostic reports from medical images, has garnered growing research attention. In this task, learning cross-modal alignment between images and reports is crucial. However, the exposure bias problem in autoregressive text generation poses a notable challenge, because the model is optimized with a word-level loss function under the teacher-forcing strategy. To address this, we propose a novel Token-Mixer framework that learns to bind image and text in one embedding space for medical image reporting. Concretely, Token-Mixer enhances cross-modal alignment by matching image-to-text generation with text-to-text generation, which suffers less from exposure bias. The framework contains an image encoder, a text encoder, and a text decoder. During training, images and their paired reports are first encoded into image tokens and text tokens, and these tokens are randomly mixed to form mixed tokens. The text decoder then accepts image tokens, text tokens, or mixed tokens as prompt tokens and performs text generation for network optimization. Furthermore, we introduce a tailored text decoder and an alternative training strategy that integrate well with the Token-Mixer framework. Extensive experiments on three publicly available datasets demonstrate that Token-Mixer successfully enhances image-text alignment and thereby attains state-of-the-art performance. Code is available at https://github.com/yangyan22/Token-Mixer.
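The token-mixing step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not say how the tokens are mixed, so the sketch assumes that image and text token sequences have equal length and a shared embedding dimension, and that each position independently keeps either the image token or the text token. The names `mix_tokens`, `choose_prompt`, and `p_image` are hypothetical and exist only for illustration.

```python
import random


def mix_tokens(image_tokens, text_tokens, p_image=0.5, rng=None):
    """Per position, keep the image token with probability p_image, else the text token.

    Assumption (not stated in the abstract): both sequences have the same
    length and live in the same embedding space.
    """
    if len(image_tokens) != len(text_tokens):
        raise ValueError("this sketch assumes equal-length token sequences")
    rng = rng or random.Random()
    return [
        img if rng.random() < p_image else txt
        for img, txt in zip(image_tokens, text_tokens)
    ]


def choose_prompt(image_tokens, text_tokens, mode, rng=None):
    """Return the prompt tokens handed to the text decoder for one training step."""
    if mode == "image":
        return image_tokens
    if mode == "text":
        return text_tokens
    if mode == "mixed":
        return mix_tokens(image_tokens, text_tokens, rng=rng)
    raise ValueError(f"unknown mode: {mode}")


# Toy embeddings: 4 tokens of dimension 2 per modality (hypothetical values).
image_tokens = [[1.0, 0.0] for _ in range(4)]
text_tokens = [[0.0, 1.0] for _ in range(4)]
mixed = choose_prompt(image_tokens, text_tokens, "mixed", rng=random.Random(0))
```

In a real training loop the selected prompt would be passed to the text decoder, and the report-generation loss would be computed in the same way for all three prompt types. The repository linked above is the place to check how the published method actually performs the mixing.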