Computer science
Architecture
Random-access memory
Computer architecture
Computer hardware
Parallel computing
Art
Visual arts
Source
Journal: IEEE Transactions on Circuits and Systems I: Regular Papers
[Institute of Electrical and Electronics Engineers]
Date: 2023-08-25
Volume (Issue): 70 (11): 4505-4515
Citations: 12
Identifier
DOI:10.1109/tcsi.2023.3306347
Abstract
Attacks mounted with quantum computers are an enormous threat to conventional public-key cryptography, so it is crucial to study quantum-resistant cryptosystems. After four rounds of evaluation, the National Institute of Standards and Technology (NIST) has decided to standardize CRYSTALS-Kyber as one of the public-key post-quantum cryptography (PQC) algorithms. In hardware designs of CRYSTALS-Kyber, the polynomial-related calculations are the most time-consuming. In this paper, we present a highly efficient hardware architecture for CRYSTALS-Kyber. First, we propose a CRYSTALS-Kyber-oriented conflict-free memory mapping scheme with two modes. Based on this scheme, we construct, for the first time, a mixed radix-2/4 NTT/INTT algorithm that requires no pre- or post-processing. The "lazy-last-layer" trick temporarily increases the memory bandwidth available to the NTT and improves its average performance. In addition, point-wise multiplication (PWM) is performed in a single memory bank by exploiting the two modes of our memory mapping scheme. This avoids wasting memory bandwidth and therefore removes the need for large FIFOs to buffer the sampled data. Last, we propose an efficient modular multiplier for CRYSTALS-Kyber, and we merge the divide-by-2 operations in the finite field into the modular adders and subtractors to reduce resource consumption. The design supports all three security levels and is implemented on a Xilinx Artix-7 FPGA with 7.3k LUTs, 3.2k FFs, 2.2k slices, 5 BRAMs, and 4 DSPs. Its area-time product is 12% better than that of other leading designs in the literature.
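The idea of merging divide-by-2 operations into modular adders and subtractors can be illustrated with a small software sketch. It is not the paper's circuit: the function names (`half_mod`, `add_half`, `sub_half`, `gs_butterfly_halved`) are hypothetical, and the Gentleman-Sande butterfly form is an assumption made for illustration. The sketch relies only on the Kyber modulus q = 3329 being odd, so halving in the field needs at most one conditional addition of q followed by a one-bit shift.

```python
Q = 3329  # Kyber modulus: an odd prime, so 2 is invertible mod Q


def half_mod(x: int) -> int:
    """Return x * 2^{-1} mod Q using a conditional add and a 1-bit shift."""
    x %= Q
    # If x is odd, x + Q is even and represents the same residue class.
    return x >> 1 if x % 2 == 0 else (x + Q) >> 1


def add_half(a: int, b: int) -> int:
    """Modular addition with the halving folded in: (a + b) / 2 mod Q."""
    return half_mod(a + b)


def sub_half(a: int, b: int) -> int:
    """Modular subtraction with the halving folded in: (a - b) / 2 mod Q."""
    return half_mod(a - b)


def gs_butterfly_halved(a: int, b: int, zeta: int) -> tuple[int, int]:
    """Illustrative inverse-NTT butterfly (assumed Gentleman-Sande form).

    Each output already carries one factor of 2^{-1}, so no separate
    scaling pass is needed once all layers have been applied.
    """
    return add_half(a, b), (sub_half(a, b) * zeta) % Q
```

Kyber's inverse NTT has seven layers, so applying the halving once per layer accumulates a factor of 2^{-7} = 128^{-1} mod q, which is the scaling a separate final multiplication would otherwise supply.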