Computer science
Bottleneck
Computation
Transformer
Parallel computing
Computer engineering
Quantization (signal processing)
Security token
Distributed computing
Correctness
Algorithm
Embedded system
Computer network
Physics
Quantum mechanics
Voltage
Authors
Zihao Zeng, Chubo Liu, Zhuo Tang, Kenli Li, Keqin Li
Identifier
DOI: 10.1109/tpds.2022.3187815
Abstract
Transformer-based deep neural networks have recently swept the field of natural language processing due to their outstanding performance, and are gradually spreading to further applications such as image and video processing. However, compared with general DNNs, training a sizeable transformer-based model is even more time-consuming and memory-hungry. Existing distributed training strategies for general DNNs are not appropriate for, or cannot efficiently handle, transformer-based networks. In view of this, we propose an intra-layer model parallelization optimization strategy, AccTFM, which introduces a novel fine-grained pipeline execution and hybrid communication compression strategy to overcome the synchronization bottleneck. Specifically, on one hand, it first decouples the inter-layer computation and communication dependencies, and then searches for the optimal partitioning strategy to maximize the overlap of computation and communication. On the other hand, the hybrid communication compression module consists of token-level top-$k$ sparsification and piecewise quantization methods aimed at minimizing communication traffic. Experimental results show that AccTFM accelerates the training of transformer-based DNNs by up to 2.08x compared to state-of-the-art distributed training techniques.
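The hybrid compression module pairs token-level top-$k$ sparsification with piecewise quantization. The abstract does not spell out the exact algorithm, so the following is only a minimal sketch of how such a compressor might look, assuming per-token L2-norm scoring, a fixed number of quantization buckets, and 8-bit codes; the function names, bucket count, and bit width are illustrative assumptions, not AccTFM's actual design.

```python
# Hypothetical sketch of the hybrid compression idea described in the abstract:
# token-level top-k sparsification followed by piecewise (bucket-wise) quantization.
# Names and parameters below are illustrative assumptions, not AccTFM's implementation.
import torch


def topk_sparsify_tokens(x: torch.Tensor, k: int):
    """Keep only the k tokens with the largest L2 norm.

    x: activation/gradient tensor of shape (num_tokens, hidden_dim).
    Returns the selected token indices and their values.
    """
    token_norms = x.norm(dim=1)            # one importance score per token
    _, idx = torch.topk(token_norms, k)    # indices of the k largest tokens
    return idx, x[idx]


def piecewise_quantize(values: torch.Tensor, num_buckets: int = 4, bits: int = 8):
    """Quantize each contiguous bucket with its own scale (piecewise quantization)."""
    flat = values.flatten()
    buckets = flat.chunk(num_buckets)
    qmax = 2 ** (bits - 1) - 1             # symmetric range, e.g. [-127, 127] for int8
    scales, codes = [], []
    for b in buckets:
        scale = b.abs().max().clamp(min=1e-8) / qmax   # per-bucket scale
        codes.append(torch.round(b / scale).to(torch.int8))
        scales.append(scale)
    return scales, codes


def dequantize(scales, codes):
    """Reverse the piecewise quantization by rescaling each bucket."""
    return torch.cat([c.float() * s for s, c in zip(scales, codes)])


if __name__ == "__main__":
    grad = torch.randn(128, 768)                  # e.g. per-token gradients of one layer
    idx, kept = topk_sparsify_tokens(grad, k=16)  # transmit only 16 of 128 tokens
    scales, codes = piecewise_quantize(kept)      # then 8-bit piecewise quantization
    restored = dequantize(scales, codes).view_as(kept)
    print("max reconstruction error:", (restored - kept).abs().max().item())
```

The usual motivation for per-bucket scales, as opposed to a single global scale, is that outlier values in one region of the tensor do not inflate the quantization error everywhere else, which matters when gradients are heavy-tailed.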