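Computer science
Virtualization
Parallel computing
Distributed computing
Load balancing (electrical power)
Reduction (mathematics)
Domain (mathematical analysis)
Theoretical computer science
Operating system
Cloud computing
Geometry
Mathematics
Mathematical analysis
Grid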
Authors
Sam White, Laxmikant V. Kalé
Identifier
DOI:10.1109/ipdpsw55747.2022.00085
Abstract
Dynamic load balancing can be difficult for MPI-based applications. Application logic and algorithms are often rewritten to enable dynamic repartitioning of the domain. An alternative approach is to virtualize the MPI ranks as threads, rather than operating system processes, and to migrate those threads around the system to balance the computational load. Adaptive MPI is one such implementation. It supports virtualization of MPI ranks as migratable user-level threads. However, this migratability itself can introduce new performance overheads to applications. In this paper, we identify non-commutative reduction operations as problematic for any runtime supporting either user-defined initial mapping of ranks or dynamic migration of ranks among the cores or nodes of a machine. We investigate the challenges associated with supporting efficient non-commutative reduction operations, and explore algorithmic alternatives such as recursive doubling and halving in combination with a novel adaptive message combining technique. We explore tradeoffs in the different algorithms for various message sizes and mappings of ranks to cores, demonstrating our performance improvements using microbenchmarks.
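To make the ordering constraint in the abstract concrete, the following is a minimal single-process sketch of a recursive-doubling allreduce with a non-commutative but associative operator. It is not the paper's implementation and not Adaptive MPI code: the function name `recursive_doubling_allreduce`, the `ordered` flag, and the use of string concatenation as the operator are all assumptions chosen for illustration, and the sketch assumes a power-of-two number of ranks. It does not model migration or the adaptive message combining technique.

```python
def recursive_doubling_allreduce(values, op, ordered=True):
    """Simulate an allreduce across len(values) ranks in one process.

    values[r] is the contribution of rank r. `op` must be associative
    but need not be commutative. Hypothetical helper for illustration.
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two number of ranks")
    current = list(values)
    distance = 1
    while distance < p:
        nxt = []
        for rank in range(p):
            partner = rank ^ distance  # exchange partner at this step
            if ordered and partner < rank:
                # The lower-ranked block must be the left operand.
                nxt.append(op(current[partner], current[rank]))
            else:
                # Own value on the left: only safe when op commutes
                # or when this rank is the lower of the pair.
                nxt.append(op(current[rank], current[partner]))
        current = nxt
        distance *= 2
    return current


def concat(a, b):
    # String concatenation: associative, not commutative.
    return a + b


ranks = ["a", "b", "c", "d", "e", "f", "g", "h"]
correct = recursive_doubling_allreduce(ranks, concat, ordered=True)
naive = recursive_doubling_allreduce(ranks, concat, ordered=False)
```

With `ordered=True`, every rank ends with the contributions combined in rank order. With `ordered=False`, each rank places its own partial result on the left, so different ranks finish with different answers. The required order is defined by rank number rather than by where a rank happens to run, which is why arbitrary or changing rank-to-core mappings make such reductions harder to optimize.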