Improving Fine-Grained Image Classification With Multimodal Information

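Computer science; Artificial intelligence; Pattern recognition (psychology); Feature extraction; Preprocessor; Normalization (sociology); Feature (linguistics); Image fusion; Image (mathematics); Linguistics; Philosophy; Sociology; Anthropology
Authors
Jie Xu, Xiaoqian Zhang, Changming Zhao, Zili Geng, Yuren Feng, Ke Miao, Yunji Li
Source
Journal: IEEE Transactions on Multimedia [Institute of Electrical and Electronics Engineers]
Volume and pages: 26: 2082-2095    Cited by: 20
Identifier
DOI: 10.1109/tmm.2023.3291819
Abstract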

Fine-grained image datasets have small inter-class differences and large intra-class differences, which makes fine-grained image classification difficult. Traditional fine-grained image classification methods focus only on the visual features of images, a limitation that can be overcome by incorporating multimodal information. This paper proposes an improved fine-grained image classification method with multimodal information, comprising multimodal data preprocessing, multimodal feature extraction, multi-temporal feature fusion and decision correction. The proposed preprocessing method addresses the scattered distribution, difficult processing and uneven contribution to prediction of multimodal data through normalization, phrase packing and weighted concatenation. For multimodal feature extraction, the proposed SAMLP (Self-Attention MLP) module combines self-attention with an MLP to capture the internal correlations of multimodal information. The proposed multi-temporal feature fusion is divided into early and late feature fusion: the former adds multimodal information markers to the original image, and the latter uses a multi-cascade dynamic MLP structure to fuse visual features and multimodal features. To address the limitations of feature fusion, a decision strategy is proposed that revises the predictions made from fused features according to the predictions made from multimodal features. Ablation experiments on the INAT18-1K and INAT21-1K datasets show that the method is effective in improving classification with multimodal information. Experiments on the large INAT2021_mini dataset show that the complete method achieves higher accuracy than the state-of-the-art method with negligible loss of efficiency.
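
The abstract gives no formulas, so the following is only a minimal sketch of how two of the described steps (normalization with weighted concatenation, and decision correction) could look. It assumes the multimodal data are latitude, longitude and observation day, which the abstract does not state. All function names, the cyclic date encoding, the group weights and the confidence threshold are hypothetical illustrations, not the authors' implementation.

```python
import math


def normalize_location(lat_deg, lon_deg):
    """Scale latitude and longitude to the range [-1, 1] (assumed metadata)."""
    return [lat_deg / 90.0, lon_deg / 180.0]


def encode_day_of_year(day, days_in_year=365):
    """Cyclic encoding of the day, so the last day sits next to day 1.

    This encoding is an assumption; the abstract only says 'normalization'.
    """
    angle = 2.0 * math.pi * (day - 1) / days_in_year
    return [math.sin(angle), math.cos(angle)]


def weighted_concat(groups, weights):
    """Scale each feature group by its weight, then concatenate the groups."""
    if len(groups) != len(weights):
        raise ValueError("one weight is required per feature group")
    out = []
    for group, weight in zip(groups, weights):
        out.extend(weight * value for value in group)
    return out


def correct_decision(fused_probs, meta_probs, threshold=0.9):
    """Hypothetical correction rule: keep the fused prediction unless the
    metadata-only branch is very confident about a different class."""
    fused_cls = max(range(len(fused_probs)), key=fused_probs.__getitem__)
    meta_cls = max(range(len(meta_probs)), key=meta_probs.__getitem__)
    if meta_cls != fused_cls and meta_probs[meta_cls] >= threshold:
        return meta_cls
    return fused_cls


# Usage with made-up values (weights 2.0 and 0.5 are arbitrary).
location = normalize_location(45.0, -90.0)           # [0.5, -0.5]
date = encode_day_of_year(1)                         # [0.0, 1.0]
print(weighted_concat([location, date], [2.0, 0.5]))  # [1.0, -1.0, 0.0, 0.5]
print(correct_decision([0.5, 0.3, 0.2], [0.02, 0.95, 0.03]))  # 1
print(correct_decision([0.5, 0.3, 0.2], [0.3, 0.4, 0.3]))     # 0
```

In this sketch the weights let one metadata group contribute more than another to the concatenated vector, and the threshold keeps the correction conservative: the fused prediction changes only when the metadata-only prediction disagrees with high confidence.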