Computer science
Quantization (signal processing)
Acceleration
Inference
Memory footprint
Graph
Software deployment
PageRank
Theoretical computer science
Network topology
Distributed computing
Artificial intelligence
Algorithm
Parallel computing
Computer network
Operating system
Authors
Yuxuan Chen, Yilong Guo, Zeng Zeng, Xiaofeng Zou, Yangfan Li, Cen Chen
Identifier
DOI:10.1109/smartworld-uic-atc-scalcom-digitaltwin-pricomp-metaverse56740.2022.00143
Abstract
Graph neural networks (GNNs) have shown strong performance on many tasks with graph-structured data. However, GNNs also face challenges: existing typical GNNs follow a neighborhood aggregation strategy, which leads to very high complexity and thus limits their deployment on resource-limited devices such as mobile devices. An efficient GNN is therefore essential. Quantization is a very effective and important technique for inference acceleration in deep neural networks (DNNs), but very little work has explored quantization algorithms for GNNs. In this paper, we propose a Topology-aware Quantization strategy via Personalized PageRank (TQPP) that topologically perceives the importance of each node within the overall structure of the graph. Protective masks generated from these importance levels ensure that sensitive nodes perform full-precision operations while the remaining nodes are quantized. This mixed-precision approach efficiently partitions nodes between quantized and full-precision operation, yielding greater acceleration and a smaller memory footprint while retaining better model performance. We validate our algorithm on three different datasets and demonstrate gains of up to 27.5% over the base quantization baseline, as well as a speedup of up to $3.6\times$ on CPU deployment.
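The abstract describes TQPP's mechanism only at a high level, so the sketch below is an illustrative reconstruction rather than the paper's implementation: it computes Personalized PageRank scores by plain power iteration, builds a "protective mask" that keeps the top-scoring fraction of nodes at full precision, and applies a simulated uniform int8 round-trip to the features of the remaining nodes. The damping factor `alpha`, the `keep_ratio` threshold, and the quantizer itself are all assumptions not specified in the abstract.

```python
import numpy as np

def personalized_pagerank(adj, restart, alpha=0.85, iters=100, tol=1e-8):
    """Power-iteration Personalized PageRank on a dense adjacency matrix.

    adj     : (N, N) adjacency matrix
    restart : (N,) personalization (teleport) distribution
    alpha   : damping factor (assumed value; not given in the abstract)
    """
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                       # guard isolated nodes
    trans = adj / deg                         # row-stochastic transition matrix
    r = restart / restart.sum()
    scores = np.full(adj.shape[0], 1.0 / adj.shape[0])
    for _ in range(iters):
        new = alpha * trans.T @ scores + (1 - alpha) * r
        if np.abs(new - scores).sum() < tol:
            return new
        scores = new
    return scores

def protective_mask(scores, keep_ratio=0.2):
    """Mark the top `keep_ratio` fraction of nodes as full precision (True)."""
    k = max(1, int(keep_ratio * scores.size))
    mask = np.zeros(scores.size, dtype=bool)
    mask[np.argsort(scores)[-k:]] = True
    return mask

def mixed_precision_features(x, mask, num_bits=8):
    """Quantize features of unprotected nodes; protected nodes stay float."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    if scale == 0:
        scale = 1.0
    x_q = np.round(x / scale).clip(-qmax, qmax) * scale   # simulated int8 round-trip
    return np.where(mask[:, None], x, x_q)

# Toy usage: a 5-node path graph with uniform personalization.
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
scores = personalized_pagerank(adj, restart=np.ones(5))
mask = protective_mask(scores, keep_ratio=0.4)
x = np.random.randn(5, 8).astype(np.float32)
x_mixed = mixed_precision_features(x, mask)
print("PPR scores:", np.round(scores, 3))
print("full-precision nodes:", np.where(mask)[0])
```

In this toy example the interior nodes of the path receive higher PPR mass and are kept at full precision, matching the abstract's idea that structurally important (sensitive) nodes are the ones protected from quantization error.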