计算机科学
加速
试验台
服务器
切片
调度(生产过程)
并行计算
卷积神经网络
缩放比例
分布式计算
分拆(数论)
人工智能
计算机网络
数学优化
组合数学
万维网
数学
几何学
作者
Shuai Wang,Dan Li,Jinkun Geng
标识
DOI:10.1109/infocom41043.2020.9155282
摘要
Increasingly rich data sets and complicated models make distributed machine learning more and more important. However, the cost of extensive and frequent parameter synchronizations can easily diminish the benefits of distributed training across multiple machines. In this paper, we present Geryon, a network-level flow scheduling scheme to accelerate distributed Convolutional Neural Network (CNN) training. Geryon leverages multiple flows with different priorities to transfer parameters of different urgency levels, which can naturally coordinate multiple parameter servers and prioritize the urgent parameter transfers in the entire network fabric. Geryon requires no modification in CNN models and does not affect the training accuracy. Based on the experimental results of four representative CNN models on a testbed of 8 GPU servers, Geryon achieves up to 95.7% scaling efficiency even with 10GbE bandwidth. In contrast, for most models, the scaling efficiency of vanilla TensorFlow is no more than 37% and that of TensorFlow with parameter partition and slicing is around 80%. In terms of training throughput, Geryon enhanced with parameter partition and slicing achieves up to 4.37x speedup, where the flow scheduling algorithm itself achieves up to 1.2x speedup over parameter partition and slicing.
科研通智能强力驱动
Strongly Powered by AbleSci AI