Batching CSIDH Group Actions using AVX-512

Authors

  • Hao Cheng, DCS and SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg
  • Georgios Fotiadis, DCS and SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg
  • Johann Großschädl, DCS and SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg
  • Peter Y. A. Ryan, DCS and SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg
  • Peter B. Rønne, DCS and SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg

DOI:

https://doi.org/10.46586/tches.v2021.i4.618-649

Keywords:

Post-Quantum Cryptography, Isogeny-Based Cryptography, CSIDH, AVX-512IFMA, Software Optimization, Constant-Time Implementation

Abstract

Commutative Supersingular Isogeny Diffie-Hellman (CSIDH for short) is a recently proposed post-quantum key establishment scheme that belongs to the family of isogeny-based cryptosystems. The CSIDH protocol is based on the action of an ideal class group on a set of supersingular elliptic curves and comes with some very attractive features, e.g. the ability to serve as a “drop-in” replacement for the standard elliptic curve Diffie-Hellman protocol. Unfortunately, the execution time of CSIDH is prohibitively high for many real-world applications, mainly due to the enormous computational cost of the underlying group action. Consequently, there is a strong demand for optimizations that increase the efficiency of the class group action evaluation, which is important not only for CSIDH but also for related cryptosystems such as the signature schemes CSI-FiSh and SeaSign. In this paper, we explore how the AVX-512 vector extensions (incl. AVX-512F and AVX-512IFMA) can be utilized to optimize constant-time evaluation of the CSIDH-512 class group action with two separate goals: maximizing throughput and minimizing latency. We introduce different approaches for batching group actions and computing them in SIMD fashion on modern Intel processors. In particular, we present a hybrid batching technique that, when combined with optimized (8 × 1)-way prime-field arithmetic, increases the throughput by a factor of 3.64 compared to a state-of-the-art (non-vectorized) x64 implementation. On the other hand, 2-way vectorization aimed at reducing latency makes our AVX-512 implementation of the group action evaluation about 1.54 times faster than the state of the art. To the best of our knowledge, this paper is the first to demonstrate the high potential of using vector instructions to increase the throughput (resp. decrease the latency) of constant-time CSIDH.
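
The (8 × 1)-way batching mentioned in the abstract can be pictured with a small model. The sketch below is not the paper's implementation: plain Python lists stand in for 512-bit vector registers, and the 52-bit limb width and the limb count of 10 are assumptions chosen to match the multiplier width of AVX-512IFMA and the 512-bit operand size. All function names are hypothetical. It shows only the data layout and a lane-wise addition with carry propagation, with no modular reduction.

```python
# Illustrative model only: Python lists stand in for 512-bit vector registers.
LANES = 8          # a 512-bit register holds eight 64-bit lanes
RADIX_BITS = 52    # assumption: 52-bit limbs, the AVX-512IFMA multiplier width
NLIMBS = 10        # assumption: 10 limbs x 52 bits = 520 bits >= 512 bits
MASK = (1 << RADIX_BITS) - 1


def to_batched(values):
    """Pack eight integers limb-wise: vector i holds limb i of every operand."""
    assert len(values) == LANES
    return [[(v >> (RADIX_BITS * i)) & MASK for v in values]
            for i in range(NLIMBS)]


def from_batched(vectors):
    """Recombine the limb vectors into eight integers, one per lane."""
    return [sum(vectors[i][lane] << (RADIX_BITS * i) for i in range(NLIMBS))
            for lane in range(LANES)]


def batched_add(a, b):
    """Limb-wise addition; each inner list models a single vector add."""
    return [[x + y for x, y in zip(va, vb)] for va, vb in zip(a, b)]


def batched_carry(a):
    """Propagate carries from low to high limbs, all eight lanes in lockstep."""
    out, carry = [], [0] * LANES
    for vec in a:
        t = [x + c for x, c in zip(vec, carry)]
        out.append([x & MASK for x in t])
        carry = [x >> RADIX_BITS for x in t]
    return out, carry


if __name__ == "__main__":
    # Eight independent operand pairs, one pair per lane.
    xs = [(3**300 + k) % (1 << 511) for k in range(LANES)]
    ys = [(5**200 * (k + 1)) % (1 << 511) for k in range(LANES)]
    limbs, carry = batched_carry(batched_add(to_batched(xs), to_batched(ys)))
    print(from_batched(limbs) == [x + y for x, y in zip(xs, ys)])
```

In this layout each lane carries one complete, independent field operation, so the eight instances never exchange data. That independence is what makes such a batch suited to raising throughput rather than lowering the latency of a single operation.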

Published

2021-08-11

Issue

Vol. 2021 No. 4

Section

Articles

How to Cite

Cheng, H., Fotiadis, G., Großschädl, J., Ryan, P. Y. A., & Rønne, P. B. (2021). Batching CSIDH Group Actions using AVX-512. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2021(4), 618-649. https://doi.org/10.46586/tches.v2021.i4.618-649