Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs

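InfiniBand, Remote direct memory access, Computer science, Bottleneck, PCI Express, Parallel computing, Supercomputer, Message passing, Low latency, Operating system, Embedded system, Computer network, Data acquisition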
Authors
Sreeram Potluri, Khaled Hamidouche, Akshay Venkatesh, Devendar Bureddy, Dhabaleswar K. Panda
Identifier
DOI: 10.1109/icpp.2013.17
Abstract

GPUs and accelerators have become ubiquitous in modern supercomputing systems. Scientific applications from a wide range of fields are being modified to take advantage of their compute power. However, data movement continues to be a critical bottleneck in harnessing the full potential of a GPU. Data in GPU memory has to be copied into host memory before it can be sent over the network. MPI libraries such as MVAPICH2 have alleviated this bottleneck using techniques such as pipelining. GPUDirect RDMA is a feature introduced in CUDA 5.0 that allows third-party devices, such as network adapters, to access data in GPU device memory directly over the PCIe bus. NVIDIA has partnered with Mellanox to make this solution available for InfiniBand clusters. In this paper, we evaluate the first version of GPUDirect RDMA for InfiniBand and propose designs in the MVAPICH2 MPI library that take advantage of this feature efficiently. We highlight the limitations that current-generation architectures pose to the effective use of GPUDirect RDMA and address these issues through novel designs in MVAPICH2. To the best of our knowledge, this is the first work to demonstrate a solution for internode GPU-to-GPU MPI communication using GPUDirect RDMA. Results show that the proposed designs improve the latency of internode GPU-to-GPU communication using MPI_Send/MPI_Recv by 69% and 32% for 4-byte and 128-KByte messages, respectively. The designs boost the unidirectional bandwidth achieved with 4-KByte and 64-KByte messages by 2x and 35%, respectively. We demonstrate the impact of the proposed designs using two end applications, LBMGPU and AWP-ODC, in which they reduce communication time by up to 35% and 40%, respectively.
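The abstract contrasts two ways of moving a message between GPUs on different nodes: staging it through host memory (which MVAPICH2 pipelines) and letting the InfiniBand adapter access GPU memory directly with GPUDirect RDMA. The sketch below is a simple analytical cost model of both paths, assuming each transfer stage costs a fixed startup latency plus size divided by bandwidth. It is not the paper's design or MVAPICH2 code: the `Link` class, the helper functions, the size-based `choose_path` rule, and every latency and bandwidth value are hypothetical, and the low bandwidth given to the direct path is an assumption standing in for the architectural limitation the abstract mentions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Link:
    """One transfer stage: fixed startup latency plus size divided by bandwidth."""
    latency_us: float    # per-transfer startup cost in microseconds (hypothetical)
    bytes_per_us: float  # sustained bandwidth in bytes per microsecond (hypothetical)

    def time_us(self, nbytes: int) -> float:
        return self.latency_us + nbytes / self.bytes_per_us


def staged_time_us(size: int, stages: Sequence[Link]) -> float:
    """Move the whole message through each stage in turn (GPU->host, network, host->GPU)."""
    return sum(stage.time_us(size) for stage in stages)


def pipelined_time_us(size: int, chunk: int, stages: Sequence[Link]) -> float:
    """Split the message into chunks so different stages work on different chunks.

    Each stage handles one chunk at a time, and a chunk enters a stage only after
    leaving the previous one; the result is when the last chunk leaves the last stage.
    """
    chunk_sizes = [min(chunk, size - offset) for offset in range(0, size, chunk)]
    free_at = [0.0] * len(stages)  # time at which each stage becomes idle
    for nbytes in chunk_sizes:
        ready = 0.0  # every chunk is already in GPU memory at t = 0
        for i, stage in enumerate(stages):
            start = max(ready, free_at[i])
            free_at[i] = start + stage.time_us(nbytes)
            ready = free_at[i]  # the chunk may now enter the next stage
    return free_at[-1]


def direct_time_us(size: int, rdma: Link) -> float:
    """Simplified direct path: the adapter accesses GPU memory, so no staging copies."""
    return rdma.time_us(size)


def choose_path(size: int, chunk: int, stages: Sequence[Link], rdma: Link) -> str:
    """Hypothetical size-based selection between the direct and pipelined paths."""
    direct = direct_time_us(size, rdma)
    pipelined = pipelined_time_us(size, chunk, stages)
    return "direct" if direct <= pipelined else "pipelined"


if __name__ == "__main__":
    KIB = 1024
    # Made-up parameters for illustration only; these are not measurements.
    staging = [Link(8.0, 6000.0), Link(1.5, 6000.0), Link(8.0, 6000.0)]
    gdr = Link(2.0, 1000.0)  # deliberately low bandwidth: assumed architectural limit
    for size in (4, 4 * KIB, 128 * KIB, 4096 * KIB):
        print(size, choose_path(size, 128 * KIB, staging, gdr))
```

Under these assumed parameters the direct path wins for small messages, where the staging copies' startup cost dominates, while chunked staging wins for large messages because the copies overlap with network transfers. A message-size threshold is one plausible way to combine the two; where any real crossover falls depends entirely on measured hardware behavior.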