Authors
Jongbin Ryu, Dongil Han, Jongwoo Lim
Identifier
DOI: 10.1109/iccv51070.2023.00537
Abstract
We introduce a novel architecture design that enhances expressiveness by incorporating multiple head classifiers (i.e., classification heads) instead of relying on channel expansion or additional building blocks. Our approach employs attention-based aggregation, utilizing pairwise feature similarity to enhance multiple lightweight heads with minimal resource overhead. We compute the Gramian matrices to reinforce class tokens in an attention layer for each head. This enables the heads to learn more discriminative representations, enhancing their aggregation capabilities. Furthermore, we propose a learning algorithm that encourages heads to complement each other by reducing correlation for aggregation. Our models eventually surpass state-of-the-art CNNs and ViTs regarding the accuracy-throughput trade-off on ImageNet-1K and deliver remarkable performance across various downstream tasks, such as COCO object instance segmentation, ADE20k semantic segmentation, and fine-grained visual classification datasets. The effectiveness of our framework is substantiated by practical experimental results and further underpinned by generalization error bound. We release the code publicly at: https://github.com/Lab-LVM/imagenet-models.
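The two core ideas in the abstract, reinforcing each head's class token via a Gramian (pairwise feature similarity) matrix, and a decorrelation objective that pushes the heads to complement one another, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see their repository for that); the function names, shapes, and the exact way the Gram matrix weights the tokens are assumptions made for illustration only.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def gram_attention_head(tokens, cls_token):
    """Reinforce one head's class token with Gram-weighted tokens.

    tokens:    (n, d) patch/feature vectors
    cls_token: (d,)   this head's class token
    """
    G = tokens @ tokens.T          # (n, n) Gramian: pairwise feature similarity
    sim = tokens @ cls_token       # (n,)   token-to-class-token similarity
    w = softmax(G @ sim)           # (n,)   similarity propagated through the Gram matrix
    return cls_token + w @ tokens  # class token reinforced by aggregated tokens

def decorrelation_penalty(head_outputs):
    """Penalize correlation between heads so they complement each other.

    head_outputs: (H, d) one refined class token per head
    Returns the mean squared off-diagonal correlation (>= 0).
    """
    Z = head_outputs - head_outputs.mean(axis=1, keepdims=True)
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    C = Z @ Z.T                    # (H, H) cross-head correlation matrix
    off = C - np.diag(np.diag(C))  # zero out the diagonal (self-correlation)
    H = C.shape[0]
    return float((off ** 2).sum() / (H * (H - 1)))
```

In this reading, each lightweight head refines its own class token through the shared Gram matrix, the refined tokens are aggregated for classification, and the decorrelation penalty is added to the training loss so the heads learn diverse, complementary representations rather than redundant ones.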