Performance and Power Prediction for Concurrent Execution on GPUs

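Computer science; SIMD; Cloud computing; Server; Virtualization; Parallel computing; Enhanced Data Rates for GSM Evolution; Distributed computing; Operating system; Artificial intelligence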
Authors
Diksha Moolchandani,Anshul Kumar,Smruti R. Sarangi
Source
Journal: ACM Transactions on Architecture and Code Optimization [Association for Computing Machinery]
Volume (issue): 19 (3): 1-27 Citations: 1
Identifier
DOI:10.1145/3522712
Abstract

The unprecedented growth of edge computing and 5G has led to an increased offloading of mobile applications to cloud servers or edge cloudlets. The most prominent workloads comprise computer vision applications. Conventional wisdom suggests that computer vision workloads perform significantly well on SIMD/SIMT architectures such as GPUs owing to the dominance of linear algebra kernels in their composition. In this work, we debunk this popular belief by performing a lot of experiments with the concurrent execution of these workloads, which is the most popular pattern in which these workloads are executed on cloud servers. We show that the performance of these applications on GPUs does not scale well with an increase in the number of concurrent applications primarily because of contention at the shared resources and lack of efficient virtualization techniques for GPUs. Hence, there is a need to accurately predict the performance and power of such ensemble workloads on a GPU. Sadly, most of the prior work in the area of performance/power prediction is for only a single application. To the best of our knowledge, we propose the first machine learning-based predictor to predict the performance and power of an ensemble of applications on a GPU. In this article, we show that by using the execution statistics of stand-alone workloads and the fairness of execution when these workloads are executed with three representative microbenchmarks, we can get a reasonably accurate prediction. This is the first such work in the direction of performance and power prediction for concurrent applications that does not rely on the features extracted from concurrent executions or GPU profiling data. Our predictors achieve an accuracy of 91% and 96% in estimating the performance and power of executing two applications concurrently, respectively. We also demonstrate a method to extend our models to four or five concurrently running applications on modern GPUs.