Accelerating CNN Training With Concurrent Execution of GPU and Processing-in-Memory

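Computer science, Parallel computing, Memory management, Computer architecture, Operating system, Semiconductor memory
Authors
Jung-Woo Choi, Hyuk-Jae Lee, Kyomin Sohn, Hak-soo Yu, Chae Eun Rhee
Source
Journal: IEEE Access [Institute of Electrical and Electronics Engineers]
Volume: 12, pages 160190-160204; Cited by: 1
Identifier
DOI:10.1109/access.2024.3488004
Abstract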

Training convolutional neural networks (CNNs) consumes substantial time and resources. While most previous works have focused on accelerating the convolutional (CONV) layer, the share of training taken up by non-convolutional (non-CONV) layers, such as batch normalization, is gradually increasing. Non-CONV layers have low cache reuse and low arithmetic intensity, so their performance is limited by memory bandwidth. Processing-in-memory (PIM) can exploit wide memory bandwidth, which makes it well suited to accelerating non-CONV layers. It therefore makes sense to run the compute-intensive CONV layers on the host and to handle the memory-bound non-CONV layers on PIM. Performance can improve further if the two run concurrently. However, memory access conflicts between the host and PIM are the biggest obstacle to this improvement. Prior studies proposed bank partitioning to alleviate memory conflicts, but it is ineffective here because CNN training involves substantial data sharing between CONV and non-CONV layers.

In this paper, we propose a memory scheduling scheme and a CNN training flow for pipelined execution of CONV layers on the host and non-CONV layers on PIM. First, instead of bank partitioning, the host and PIM each take exclusive access to memory for a period of time, which avoids moving shared data between host memory and PIM memory. The conditions for switching memory access authority between the host and PIM are set per layer, based on the layer's memory access characteristics and the number of queued memory requests. Second, in the training flow, CONV and non-CONV layers are pipelined in units of output feature map channels. Specifically, in the backward pass, the non-CONV tasks of the feature map gradient calculation phase and the weight gradient update phase are rearranged so that they can be performed readily within the CONV layers.
Experimental results show that the proposed pipelined execution achieves an average speedup of 18.1% at the network level compared to the serial operation of the host and PIM.
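The abstract says the host and PIM take turns holding exclusive memory access, with per-layer switching conditions that consider memory access characteristics and the number of queued requests, but it does not give the exact rule. The Python sketch below is a hypothetical arbiter illustrating that idea, not the authors' scheduler. The names `LayerPolicy`, `AuthorityArbiter`, `max_window`, and `waiting_threshold` are assumptions introduced here; the rule assumed is: hand over authority when the current owner is idle and the other side is waiting, when the waiting side's queue reaches a per-layer threshold, or when the owner's time window expires.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerPolicy:
    # Hypothetical per-layer knobs (assumptions, not taken from the paper).
    max_window: int         # cycles the owner may hold memory while others wait
    waiting_threshold: int  # switch early once the waiting side queues this many


class AuthorityArbiter:
    """Toy arbiter: exactly one of 'host' or 'pim' owns memory in each cycle."""

    def __init__(self, policy: LayerPolicy, owner: str = "host"):
        self.policy = policy
        self.owner = owner
        self.held_for = 0  # cycles since the current owner took authority
        self.queues = {"host": deque(), "pim": deque()}

    def enqueue(self, side: str, request) -> None:
        self.queues[side].append(request)

    def _other(self) -> str:
        return "pim" if self.owner == "host" else "host"

    def _should_switch(self) -> bool:
        waiting = len(self.queues[self._other()])
        if waiting == 0:
            return False  # nobody is waiting: keep the current owner
        if not self.queues[self.owner]:
            return True   # owner is idle while the other side has work
        if waiting >= self.policy.waiting_threshold:
            return True   # waiting side's queue is too long
        return self.held_for >= self.policy.max_window  # window expired

    def step(self):
        """Advance one cycle and serve at most one request from the owner."""
        if self._should_switch():
            self.owner = self._other()
            self.held_for = 0
        served = None
        if self.queues[self.owner]:
            served = (self.owner, self.queues[self.owner].popleft())
        self.held_for += 1
        return served
```

With `LayerPolicy(max_window=2, waiting_threshold=3)`, three host requests `h0`-`h2`, and one PIM request `p0`, five calls to `step()` serve `h0`, `h1`, then switch to PIM for `p0` when the window expires, hand authority back for `h2` because PIM has gone idle, and finally return `None`. Making both knobs per-layer fields mirrors the abstract's point that switching conditions differ by layer; how the authors actually choose them is not stated here.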
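To see why pipelining CONV and non-CONV work in units of output feature map channels can beat serial host-then-PIM execution, here is a toy two-stage timing model in Python. It assumes per-channel costs are known in advance and ignores memory access conflicts entirely, so it gives an optimistic bound rather than a prediction; `serial_time` and `pipelined_time` are hypothetical helper names, and the 18.1% figure above comes from the authors' experiments, not from this model.

```python
from typing import Sequence


def serial_time(conv: Sequence[float], non_conv: Sequence[float]) -> float:
    """Host runs the whole CONV layer, then PIM runs the whole non-CONV layer."""
    return sum(conv) + sum(non_conv)


def pipelined_time(conv: Sequence[float], non_conv: Sequence[float]) -> float:
    """Two-stage pipeline at output-channel granularity.

    PIM can start the non-CONV work for channel i once the host has produced
    channel i and PIM has finished channel i-1. Memory conflicts are ignored.
    """
    if len(conv) != len(non_conv):
        raise ValueError("need one CONV and one non-CONV cost per output channel")
    host_done = 0.0
    pim_done = 0.0
    for c_cost, n_cost in zip(conv, non_conv):
        host_done += c_cost                            # host finishes channel i
        pim_done = max(host_done, pim_done) + n_cost   # PIM then handles channel i
    return pim_done
```

For three channels with CONV cost 2 and non-CONV cost 1 each, `serial_time` gives 9 while `pipelined_time` gives 7, because PIM's work on each channel overlaps the host's work on the next. In a real system the host and PIM share memory, so exclusive-access periods like those sketched by the arbiter eat into this overlap.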