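Computer science
CUDA
Recursion (computer science)
Parallel computing
Graphics
Computation
Computational science
Scheme (mathematics)
Graphics processing unit
Coding (set theory)
General-purpose computing on graphics processing units
Basis (linear algebra)
Heuristic
Set (abstract data type)
Algorithm
Mathematics
Computer graphics (images)
Mathematical analysis
Artificial intelligence
Geometry
Programming language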
Authors
Jorge L. Gálvez Vallejo, Giuseppe M. J. Barca, Mark S. Gordon
Identifier
DOI:10.1080/00268976.2022.2112987
Abstract
A novel methodology for evaluating two-electron integrals up to f functions on Graphics Processing Units (GPUs) is presented. The Head-Gordon-Pople (HGP) recursion relations are solved via a simple heuristic that minimizes the number of intermediates evaluated in the recursion trees. Automatic code generation is used to produce highly optimized CUDA kernels. A novel approach for f functions is presented in which integral classes are split into smaller subclasses to minimize register pressure and exploit additional parallelism, at the cost of recomputing a small number of intermediates. Alongside the optimized kernels, the electron repulsion integral (ERI) evaluation scheme works in conjunction with an efficient work distribution scheme that guarantees load balancing during computation. The new HGP scheme shows speedups of 2× to above 60× against existing GPU code. Additionally, when coupled with digestion into the Fock matrix, the scheme scales to 7 GPUs with an 85% parallel efficiency for the 6-31G(d) basis set.
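
To put the scaling figure in context, the short sketch below converts a parallel efficiency into the speedup it implies. It assumes the conventional definition (efficiency = speedup over one device, divided by the number of devices); the abstract does not state which definition the authors use, and the function name `parallel_speedup` is hypothetical.

```python
def parallel_speedup(efficiency: float, n_devices: int) -> float:
    """Speedup over a single device implied by a parallel efficiency.

    Assumes efficiency = speedup / n_devices (the conventional definition,
    not confirmed by the abstract).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    return efficiency * n_devices


# Figures quoted in the abstract: 85% efficiency on 7 GPUs.
speedup = parallel_speedup(0.85, 7)
print(f"Implied speedup on 7 GPUs: {speedup:.2f}x")
```

Under that assumption, 85% efficiency on 7 GPUs corresponds to roughly a 5.95× speedup over a single GPU.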