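Yarn
Reinforcement learning
Computer science
Workflow
Scheduling (production processes)
Cluster (spacecraft)
Artificial intelligence
Distributed computing
Computer network
Engineering
Database
Mechanical engineering
Operations management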
Authors
Jing Xue,Ting Wang,Puyu Cai
Identifier
DOI:10.1109/globecom54140.2023.10436820
Abstract
Hadoop Yarn is an open-source cluster manager responsible for resource management and job scheduling. However, data-driven applications are typically organized into workflows that consist of a series of jobs with dependencies. Yarn does not manage users' workflows and only considers the current job rather than the entire workflow when scheduling. In practice, multiple workflows share the same Yarn cluster and are pre-assigned separate Yarn resource queues to avoid mutual interference. However, this coarse-grained resource division can sometimes result in low resource utilization and increased pending time of jobs on the Yarn queue. For instance, one resource queue may have exhausted its quota while still having pending jobs, while other queues may have available resources but cannot begin executing any jobs due to unfulfilled data dependencies. To address this problem, we propose a deep reinforcement learning-based workflow scheduling scheme that takes into account job dependencies, job priorities, and dynamic resource usage. The proposed approach can intelligently identify and utilize free windows of different resource queues. Our simulation results demonstrate that the proposed DRL-based workflow scheduling scheme can significantly reduce the average job latency compared to existing approaches.
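The queue scenario described in the abstract can be illustrated with a small sketch. This is not the authors' DRL scheduler: it is a simple greedy stand-in, written only to show what "using free windows of other resource queues" might look like once job dependencies are taken into account. The `Job` fields, the resource units, and the `place` function are all hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    queue: str   # home resource queue of the job's workflow (hypothetical field)
    demand: int  # resource units the job needs (hypothetical unit)
    deps: set = field(default_factory=set)  # names of jobs that must finish first


def ready_jobs(jobs, finished):
    """Return unfinished jobs whose dependencies have all finished."""
    return [j for j in jobs if j.name not in finished and j.deps <= finished]


def place(jobs, finished, free):
    """Greedy placement: try the home queue first, then borrow spare capacity.

    `free` maps queue name -> unused resource units and is updated in place.
    Returns a list of (job name, queue used) pairs.
    """
    placements = []
    for job in ready_jobs(jobs, finished):
        # Home queue first, then the other queues in name order.
        candidates = [job.queue] + sorted(q for q in free if q != job.queue)
        for q in candidates:
            if free[q] >= job.demand:
                free[q] -= job.demand
                placements.append((job.name, q))
                break
    return placements


jobs = [
    Job("a1", "A", 4),
    Job("a2", "A", 4),
    Job("b2", "B", 4, deps={"b1"}),  # blocked: b1 has not finished yet
]
free = {"A": 4, "B": 8}
result = place(jobs, finished=set(), free=free)
print(result)
```

Here queue A can hold only one of its two ready jobs, while queue B has capacity but its only job is blocked on a dependency, so the second A job borrows B's idle window. A learned policy, as proposed in the paper, would replace the fixed greedy rule with decisions that also weigh job priorities and dynamic resource usage.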