Computer science
Server
Latency
Inference
Throughput
Testbed
Distributed computing
Deep learning
Real-time computing
Computer engineering
Computer networks
Artificial intelligence
Operating system
Telecommunications
Wireless
Authors
Di Liu, Zimo Ma, Aolin Zhang, Kuangyu Zheng
Identifier
DOI: 10.1109/mass58611.2023.00074
Abstract
Recent rapid development of deep learning (DL) applications imposes stringent requirements on DL inference services provided by GPU servers. On one hand, a high volume of diverse DL workloads demands ever-higher processing throughput. On the other hand, GPU servers must satisfy both latency and power constraints: each inference request must be served in real time under strict latency requirements, and GPU servers must operate within a fixed power cap to prevent system failures caused by power overload or overheating. How to efficiently manage GPU resources to achieve better throughput under both latency and power constraints has therefore become a key challenge. To address this issue, we first perform comprehensive measurements of inference tasks and study the impact of several critical knobs, including batch size, GPU frequency, and GPU spatial sharing, on system throughput, latency, and power. We then propose Morak, a multi-knob resource management framework for DL inference under latency and power constraints. A key mechanism of Morak is GPU resource partitioning with efficient spatial multiplexing across DL models. To further improve throughput, Morak efficiently explores the search space of GPU frequency and batch size under these constraints. Experimental results on a hardware testbed show that Morak achieves up to 67.7% throughput improvement over several state-of-the-art baselines under tight latency and power constraints.
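The abstract describes a search over GPU frequency and batch size subject to latency and power constraints. Below is a minimal, hypothetical Python sketch of a brute-force variant of such a search, using NVML to lock the GPU clock and read power draw. The latency SLO, power cap, candidate grids, and the run_batch() stub are all illustrative assumptions, not details from the paper; Morak's actual exploration is described as more efficient than exhaustive enumeration.

```python
# Hypothetical sketch of a frequency/batch-size search in the spirit of
# Morak's multi-knob exploration. run_batch() and all constants below are
# illustrative stand-ins, not the paper's code.
import time
import pynvml

LATENCY_SLO_MS = 50.0            # assumed per-request latency bound
POWER_CAP_W = 250.0              # assumed server power cap
FREQS_MHZ = [900, 1200, 1500]    # candidate GPU core clocks (illustrative)
BATCH_SIZES = [1, 4, 8, 16, 32]  # candidate batch sizes (illustrative)

def run_batch(batch_size):
    """Stand-in for one DL inference batch; replace with a real model call."""
    time.sleep(0.001 * batch_size)  # placeholder cost model

def measure(handle, freq_mhz, batch_size, iters=20):
    """Lock the GPU clock, run a few batches, and report
    (throughput in req/s, worst-case latency in ms, mean power in W)."""
    # Locking clocks typically requires root/admin privileges.
    pynvml.nvmlDeviceSetGpuLockedClocks(handle, freq_mhz, freq_mhz)
    lat_ms, power_w = [], []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_batch(batch_size)
        lat_ms.append((time.perf_counter() - t0) * 1e3)
        power_w.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1e3)  # mW -> W
    thr = batch_size * iters / (sum(lat_ms) / 1e3)
    return thr, max(lat_ms), sum(power_w) / len(power_w)

def search():
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    best = None
    try:
        for f in FREQS_MHZ:
            for b in BATCH_SIZES:
                thr, lat, pwr = measure(handle, f, b)
                # Keep only configurations satisfying both constraints,
                # and among those pick the highest throughput.
                if lat <= LATENCY_SLO_MS and pwr <= POWER_CAP_W:
                    if best is None or thr > best[0]:
                        best = (thr, f, b)
    finally:
        pynvml.nvmlDeviceResetGpuLockedClocks(handle)
        pynvml.nvmlShutdown()
    return best  # (throughput, frequency MHz, batch size) or None

if __name__ == "__main__":
    print(search())
```

Morak's other key mechanism, GPU resource partitioning with spatial multiplexing of co-located models, is not shown in this sketch; on NVIDIA GPUs such spatial sharing is commonly realized with mechanisms like MPS or MIG, though the paper's specific partitioning scheme may differ.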