Accelerating CNN Training With Concurrent Execution of GPU and Processing-in-Memory

Subjects: Computer Science, Parallel Computing, Memory Management, Computer Architecture, Operating Systems, Semiconductor Memory
Authors
Jung-Woo Choi, Hyuk-Jae Lee, Kyomin Sohn, Hak-soo Yu, Chae Eun Rhee
Source
Journal: IEEE Access [Institute of Electrical and Electronics Engineers]
Volume 12, pp. 160190-160204 · Citations: 1
Identifier
DOI: 10.1109/access.2024.3488004
Abstract

Training convolutional neural networks (CNNs) consumes substantial time and resources. While most prior work has focused on accelerating the convolutional (CONV) layers, the share of training time spent in non-convolutional (non-CONV) layers, such as batch normalization, is steadily growing. Non-CONV layers have low cache reuse and low arithmetic intensity, so their performance is limited by memory bandwidth. Processing-in-memory (PIM) can exploit wide internal memory bandwidth, making it well suited to accelerating non-CONV layers. It therefore makes sense to run the compute-intensive CONV layers on the host and to handle the memory bottleneck of the non-CONV layers on the PIM. Further gains can be expected if the two run concurrently. However, memory access conflicts between the host and the PIM are the biggest obstacle to such gains. Prior studies proposed bank partitioning to alleviate these conflicts, but it is ineffective because CNN training shares a large amount of data between CONV and non-CONV layers. In this paper, we propose a memory scheduling scheme and a CNN training flow for the pipelined execution of CONV layers on the host and non-CONV layers on the PIM. First, instead of bank partitioning, the host and the PIM take turns holding exclusive access to memory for a certain period, which avoids moving shared data between host memory and PIM memory. The conditions for switching memory access authority between the host and the PIM are set per layer, taking into account memory access characteristics and the number of queued memory requests. Second, in the training flow, CONV and non-CONV layers are pipelined in units of output feature map channels. Specifically, in the backward pass, the non-CONV tasks of the feature map gradient calculation phase and the weight gradient update phase are rearranged so that they can be performed within the CONV layers. Experimental results show that the proposed pipelined execution achieves an average network-level speedup of 18.1% over serial operation of the host and the PIM.
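To make the bandwidth argument concrete, the back-of-the-envelope sketch below compares the arithmetic intensity (FLOPs per byte of DRAM traffic) of a convolution and a batch normalization layer. The layer shapes, the 4-byte element size, and the roughly 10-ops-per-element cost of batch normalization are illustrative assumptions, not figures from the paper; the point is only that convolution lands orders of magnitude above batch normalization, which sits far below the balance point of any modern accelerator and is therefore bandwidth-bound.

    # Back-of-the-envelope roofline comparison (illustrative numbers, not
    # from the paper): arithmetic intensity in FLOPs per byte of DRAM
    # traffic for a 3x3 convolution versus batch normalization.

    def conv_intensity(n, c_in, c_out, h, w, k):
        """FLOPs per byte for a k x k convolution (stride 1, same padding)."""
        flops = 2 * n * c_out * c_in * k * k * h * w            # 2 ops per MAC
        bytes_moved = 4 * (n * c_in * h * w                     # read input
                           + c_out * c_in * k * k               # read weights
                           + n * c_out * h * w)                 # write output
        return flops / bytes_moved

    def bn_intensity(n, c, h, w):
        """FLOPs per byte for batch normalization, assuming ~10 ops/element."""
        elems = n * c * h * w
        flops = 10 * elems                  # mean, variance, scale, shift
        bytes_moved = 4 * 2 * elems         # each element read once, written once
        return flops / bytes_moved

    # A mid-network layer: batch 32, 128 -> 128 channels, 28x28 maps.
    print(f"CONV 3x3:  {conv_intensity(32, 128, 128, 28, 28, 3):6.1f} FLOPs/byte")
    print(f"BatchNorm: {bn_intensity(32, 128, 28, 28):6.1f} FLOPs/byte")

Under these assumptions the convolution reaches a few hundred FLOPs per byte while batch normalization stays near 1, which is why the abstract assigns CONV to the host and non-CONV to the PIM.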
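The abstract's first idea, switching exclusive memory access authority between the host and the PIM instead of partitioning banks, can be pictured as a small arbiter. Everything below (the MemoryArbiter class, the queue-length threshold, the drain condition) is a hypothetical illustration of such a coarse-grained turn-taking scheme, not the paper's actual switching logic, which additionally accounts for per-layer memory access characteristics.

    from collections import deque

    class MemoryArbiter:
        """Turn-taking memory ownership between host and PIM (hypothetical)."""

        def __init__(self, switch_threshold):
            self.owner = "host"                        # current access authority
            self.switch_threshold = switch_threshold   # set per layer offline
            self.queues = {"host": deque(), "pim": deque()}

        def submit(self, agent, request):
            self.queues[agent].append(request)

        def step(self):
            waiting = "pim" if self.owner == "host" else "host"
            # Hand over the authority when the waiting side has accumulated
            # enough requests, or when the owner has drained its own queue.
            if (len(self.queues[waiting]) >= self.switch_threshold
                    or not self.queues[self.owner]):
                self.owner = waiting
            if self.queues[self.owner]:
                return self.owner, self.queues[self.owner].popleft()
            return None

Because the shared feature maps stay in place and only the access right moves, no copy between host memory and PIM memory is needed, which is the stated advantage over bank partitioning.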
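The second idea, pipelining CONV and non-CONV work at the granularity of output feature map channels, follows a producer-consumer pattern: each finished output channel is handed to the non-CONV stage while the next channel is still being computed. The thread-based sketch below (the function names host_conv and pim_nonconv are invented) only illustrates the overlap; in the paper the producer is the host and the consumer is the PIM.

    import threading, queue
    import numpy as np

    def host_conv(out_channels, ready):
        # Stand-in for the host: produce the CONV output one channel at a
        # time and hand each finished channel to the non-CONV stage.
        for ch in range(out_channels):
            fmap = np.random.rand(32, 28, 28)    # batch x H x W for channel ch
            ready.put((ch, fmap))
        ready.put(None)                          # end-of-layer marker

    def pim_nonconv(ready):
        # Stand-in for the PIM: a per-channel BN-like normalization runs as
        # soon as a channel arrives, overlapping the next CONV channel.
        while (item := ready.get()) is not None:
            ch, fmap = item
            fmap = (fmap - fmap.mean()) / (fmap.std() + 1e-5)

    ready = queue.Queue(maxsize=2)               # small buffer keeps stages in step
    consumer = threading.Thread(target=pim_nonconv, args=(ready,))
    consumer.start()
    host_conv(64, ready)
    consumer.join()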
