Authors
Dichao Liu, Longjiao Zhao, Yu Wang, Jien Kato
Identifier
DOI: 10.1016/j.patcog.2023.109550
Abstract
Fine-grained visual classification (FGVC) is valuable yet challenging. The difficulty of FGVC mainly lies in its intrinsic inter-class similarity, intra-class variation, and limited training data. Moreover, with the popularity of deep convolutional neural networks, researchers have mainly used deep, abstract, semantic information for FGVC, while shallow, detailed information has been neglected. This work proposes a cross-layer mutual attention learning network (CMAL-Net) to solve the above problems. Specifically, this work views the shallow to deep layers of CNNs as “experts” knowledgeable about different perspectives. We let each expert give a category prediction and an attention region indicating the found clues. Attention regions are treated as information carriers among experts, bringing three benefits: (i) helping the model focus on discriminative regions; (ii) providing more training data; (iii) allowing experts to learn from each other to improve the overall performance. CMAL-Net achieves state-of-the-art performance on three competitive datasets: FGVC-Aircraft, Stanford Cars, and Food-11. The source code is available at https://github.com/Dichao-Liu/CMAL
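The core idea, treating each CNN stage as an "expert" that emits both a category prediction and an attention region, can be sketched as follows. This is a minimal illustrative toy, not the authors' implementation: the feature maps are random stand-ins for a CNN's shallow, middle, and deep stages, the linear classifier weights are untrained, and the attention region is reduced to the peak location of the channel-averaged activation map. The fusion by summing logits is one simple choice, not necessarily the paper's.

```python
import numpy as np

def expert_head(feature_map, num_classes, rng):
    """Hypothetical expert head: returns (class logits, attention peak)."""
    c, h, w = feature_map.shape
    pooled = feature_map.mean(axis=(1, 2))            # global average pool -> (C,)
    weights = rng.standard_normal((num_classes, c))   # untrained stand-in for an FC layer
    logits = weights @ pooled                         # (num_classes,)
    # Channel-averaged activation map as a crude attention map;
    # its argmax locates the most salient spatial position.
    attn = feature_map.mean(axis=0)                   # (H, W)
    y, x = np.unravel_index(attn.argmax(), attn.shape)
    return logits, (y, x)

rng = np.random.default_rng(0)
num_classes = 5
# Three "experts": toy feature maps standing in for shallow, middle,
# and deep CNN stages (channels grow, spatial size shrinks).
experts = [
    rng.random((16, 32, 32)),
    rng.random((32, 16, 16)),
    rng.random((64, 8, 8)),
]
outputs = [expert_head(f, num_classes, rng) for f in experts]
# Simple fusion: sum the experts' logits into one combined prediction.
combined = np.sum([logits for logits, _ in outputs], axis=0)
```

In the full method, the attention regions would additionally be cropped and fed back as extra training views, which is how the experts exchange information; this sketch only shows each expert's two outputs and one way to fuse the predictions.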