DAG-based workflows scheduling using Actor–Critic Deep Reinforcement Learning

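Computer science; Reinforcement learning; Workflow; Scheduling (production processes); Distributed computing; Directed acyclic graph; Job shop scheduling; Metro train timetable; Artificial intelligence; Theoretical computer science; Algorithm; Mathematical optimization; Mathematics; Database; Operating system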
Authors
Guilherme Koslovski, Kleiton Pereira, Paulo Roberto Albuquerque
Source
Journal: Future Generation Computer Systems [Elsevier]
Volume: 150, pp. 354-363; Citations: 26
Identifier
DOI: 10.1016/j.future.2023.09.018
Abstract

© 2023 Elsevier B.V. High-Performance Computing (HPC) is essential to support advances in multiple research and industrial fields. Despite the recent growth in processing and networking power, HPC Data Centers (DCs) are finite and should be carefully managed to host multiple jobs. Scheduling the tasks that compose a job is a crucial and complex problem, since the effects of the scheduler's decisions are perceptible both to users (e.g., slowdown) and to infrastructure administrators (e.g., resource usage and queue length). In fact, the process of scheduling workflows atop a DC can be modeled as a graph mapping problem. While an undirected graph represents the DC, a Directed Acyclic Graph (DAG) expresses the task dependencies. Each vertex and edge of both graphs can have associated weights, denoting the residual capacities of DC resources as well as the computing and networking demands of workflows. Motivated by the combinatorial explosion of this scheduling problem, Machine Learning (ML) is already being integrated to generate or improve scheduling policies. However, most proposals in the specialized literature either use simplified models to reduce the search space or are trained for specific scenarios, which leads to policies that eventually fall short of the expectations of real DCs. Given this challenge, this work applies Actor–Critic (AC) Reinforcement Learning (RL) to schedule DAG-based workflows. Instead of proposing a new policy, the AC RL is used to select the appropriate scheduling policy from a pool of consolidated algorithms, guided by the DAG workload and DC usage. The AC RL-based scheduler analyzes the DAG queue and the DC status to define which algorithms are better suited to improve the overall performance indicators in each scenario instance.
The simulation protocol comprises multiple analyses with distinct workload configurations, numbers of jobs, queue ordering policies, and strategies for selecting the target DC servers. The results demonstrated that the AC RL selects the scheduling policy that fits the current workload and DC status.