NTT Multiplication for NTT-unfriendly Rings

New Speed Records for Saber and NTRU on Cortex-M4 and AVX2

Authors

  • Chi-Ming Marvin Chung, Academia Sinica, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan
  • Vincent Hwang, Academia Sinica, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan
  • Matthias J. Kannwischer, Max Planck Institute for Security and Privacy, Bochum, Germany
  • Gregor Seiler, IBM Research – Zurich, Rüschlikon, Switzerland; ETH Zurich, Zurich, Switzerland
  • Cheng-Jhih Shih, Academia Sinica, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan
  • Bo-Yin Yang, Academia Sinica, Taipei, Taiwan

DOI:

https://doi.org/10.46586/tches.v2021.i2.159-188

Keywords:

Polynomial Multiplication, NTT Multiplication, Saber, NTRU, Cortex-M4, AVX2

Abstract

In this paper, we show how multiplication for polynomial rings used in the NIST PQC finalists Saber and NTRU can be efficiently implemented using the number-theoretic transform (NTT). We obtain superior performance compared to the previous state-of-the-art implementations using Toom–Cook multiplication on both of NIST’s primary software optimization targets, AVX2 and Cortex-M4. Interestingly, these two platforms require different approaches: on the Cortex-M4, we use 32-bit NTT-based polynomial multiplication, while on Intel we use two 16-bit NTT-based polynomial multiplications and combine the products using the Chinese Remainder Theorem (CRT).
For Saber, the performance gain is particularly pronounced. On Cortex-M4, the Saber NTT-based matrix-vector multiplication is 61% faster than the Toom–Cook multiplication, resulting in 22% fewer cycles for Saber encapsulation. For NTRU, the speed-up is smaller, but NTT-based multiplication still performs better than Toom–Cook for all parameter sets on Cortex-M4. The NTT-based polynomial multiplication for NTRU-HRSS is 10% faster than Toom–Cook, which results in a 6% cost reduction for encapsulation. On AVX2, we obtain speed-ups for three out of four NTRU parameter sets.
As a further illustration, we also include AVX2 and Cortex-M4 code for LAC, the Chinese Association for Cryptologic Research competition award winner (also a NIST round 2 candidate), which outperforms existing code.
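The two-modulus approach mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the ring size n = 8, the primes 769 and 7681, the coefficient ranges, and all function names are toy assumptions chosen for readability, and the transform is written as a naive quadratic-time evaluation rather than a fast butterfly NTT. The idea it shows is that a product in Z_q[x]/(x^n + 1) with a power-of-two q (here q = 2^13, as in Saber) can be computed exactly over the integers by multiplying modulo two NTT-friendly primes, recombining with the CRT, and only then reducing modulo q.

```python
# Toy sketch: negacyclic multiplication modulo a power of two via two
# small prime moduli and the CRT. All parameters below are assumptions.

def find_psi(n, p):
    """Return a primitive 2n-th root of unity mod prime p (n a power of two)."""
    assert (p - 1) % (2 * n) == 0
    for x in range(2, p):
        psi = pow(x, (p - 1) // (2 * n), p)
        if pow(psi, n, p) == p - 1:  # psi^n = -1 implies order exactly 2n
            return psi
    raise ValueError("no primitive root found")

def ntt_mul(a, b, p):
    """Multiply a and b in Z_p[x]/(x^n + 1) by evaluating at the n roots
    of x^n + 1, multiplying pointwise, and interpolating back.
    Naive O(n^2) transform for clarity, not a fast NTT."""
    n = len(a)
    psi = find_psi(n, p)
    roots = [pow(psi, 2 * i + 1, p) for i in range(n)]  # roots of x^n + 1
    fa = [sum(a[j] * pow(r, j, p) for j in range(n)) % p for r in roots]
    fb = [sum(b[j] * pow(r, j, p) for j in range(n)) % p for r in roots]
    fc = [x * y % p for x, y in zip(fa, fb)]
    n_inv = pow(n, -1, p)  # modular inverse, Python 3.8+
    return [n_inv * sum(fc[i] * pow(roots[i], -j, p) for i in range(n)) % p
            for j in range(n)]

def crt_mul(a, b, q, p1, p2):
    """Multiply in Z_q[x]/(x^n + 1) using two prime moduli and the CRT.
    Requires p1 * p2 to exceed twice the largest possible |coefficient|
    of the integer product, so the centered lift is exact."""
    c1, c2 = ntt_mul(a, b, p1), ntt_mul(a, b, p2)
    m = p1 * p2
    p1_inv = pow(p1, -1, p2)
    out = []
    for r1, r2 in zip(c1, c2):
        r = r1 + p1 * (((r2 - r1) * p1_inv) % p2)  # r in [0, m)
        if r > m // 2:                             # centered representative
            r -= m
        out.append(r % q)
    return out

def schoolbook_mul(a, b, q):
    """Reference negacyclic schoolbook multiplication modulo q."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]  # x^n = -1
    return [x % q for x in c]

Q, P1, P2 = 2**13, 769, 7681  # toy moduli; both primes are 1 mod 16
A = [4095, -4096, 17, -1234, 0, 2048, -1, 3000]  # centered mod-q values
B = [4, -4, 0, 1, -3, 2, -1, 3]                  # small coefficients
```

With these toy ranges every coefficient of the integer product is bounded in absolute value by 8 · 4096 · 4 = 131072, well below P1 · P2 / 2, so `crt_mul(A, B, Q, P1, P2)` agrees with `schoolbook_mul(A, B, Q)`. A single sufficiently large prime would serve the same purpose without the CRT step, which corresponds to the single 32-bit modulus variant described above.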

Published

2021-02-23

Issue

Vol. 2021 No. 2

Section

Articles

How to Cite

Chung, C.-M. M., Hwang, V., Kannwischer, M. J., Seiler, G., Shih, C.-J., & Yang, B.-Y. (2021). NTT Multiplication for NTT-unfriendly Rings: New Speed Records for Saber and NTRU on Cortex-M4 and AVX2. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2021(2), 159-188. https://doi.org/10.46586/tches.v2021.i2.159-188