Multimodal Isotropic Neural Architecture with Patch Embedding

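Computer science, Embedding, Convolutional neural network, Artificial intelligence, Transformer, Pattern recognition (psychology), Architecture, Scalability, Computer engineering, Database, Art, Physics, Quantum mechanics, Voltage, Visual arts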
Authors
Hubert Truchan,Evgenii Naumov,Rezaul Abedin,Gregory Palmer,Zahra Ahmadi
Source
Journal: Lecture Notes in Computer Science, pp. 173-187. Cited by: 2
Identifier
DOI:10.1007/978-981-99-8079-6_14
Abstract

Patch embedding has been a significant advance in Transformer-based models, particularly the Vision Transformer (ViT): it enables handling larger images and mitigates the quadratic runtime of self-attention layers. It also allows the model to capture global dependencies and relationships between patches, which improves image understanding and analysis. However, Convolutional Neural Networks (CNNs) continue to excel when data is limited, and their efficiency in memory usage and latency makes them particularly suitable for deployment on edge devices. Building on this, we propose Minape, a novel multimodal isotropic convolutional neural architecture that applies patch embedding to both time series and image data for classification. By employing isotropic models, Minape addresses the challenges posed by varying data sizes and complexities. It groups samples by modality type, creating two-dimensional representations that undergo linear embedding before being processed by a scalable isotropic convolutional network architecture. The outputs of these pathways are merged and fed to a temporal classifier. Experimental results demonstrate that Minape significantly outperforms existing approaches in accuracy while requiring fewer than 1M parameters and occupying less than 12 MB. This performance was observed on multimodal benchmark datasets and on the authors' newly collected multi-dimensional multimodal dataset, Mudestreda, obtained from real industrial processing devices. Code and dataset: https://github.com/hubtru/Minape
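The patch-embedding step mentioned in the abstract can be illustrated with a minimal, generic sketch. This is not the authors' implementation (their code is in the linked repository); the function names `patchify`, `linear_embed`, and `attention_pairs`, the 8x8 toy input, and the random weights are all assumptions made for illustration. It shows how a two-dimensional input is cut into non-overlapping patches, how each patch is projected by a shared linear map, and why fewer tokens reduce the quadratic cost of self-attention.

```python
import random


def patchify(image, patch):
    """Split an H x W grid (list of rows) into non-overlapping
    patch x patch tiles, each flattened row by row into a vector."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image sides must be divisible by the patch size")
    tiles = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tiles.append([image[top + i][left + j]
                          for i in range(patch)
                          for j in range(patch)])
    return tiles


def linear_embed(tiles, weights):
    """Project every flattened tile with one shared weight matrix
    of shape (embed_dim, patch * patch)."""
    return [[sum(w * x for w, x in zip(row, tile)) for row in weights]
            for tile in tiles]


def attention_pairs(h, w, patch):
    """Number of token pairs a full self-attention layer would compare:
    the square of the token count (h / patch) * (w / patch)."""
    tokens = (h // patch) * (w // patch)
    return tokens * tokens


# Toy usage (hypothetical sizes): an 8 x 8 input, 4 x 4 patches, embedding dim 3.
random.seed(0)
image = [[r * 8 + c for c in range(8)] for r in range(8)]
tiles = patchify(image, 4)                      # 4 tiles of length 16
weights = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(3)]
tokens = linear_embed(tiles, weights)           # 4 tokens of length 3
print(len(tokens), len(tokens[0]))
print(attention_pairs(224, 224, 16), "pairs with 16 x 16 patches")
```

In an isotropic design, the token grid produced here would keep the same resolution and width through every subsequent block; that part of the architecture is not sketched.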