Quantum Free-Start Collision Attacks on Double Block Length Hashing with Round-Reduced AES-256

Recently, Hosoyamada and Sasaki (EUROCRYPT 2020) and Xiaoyang Dong et al. (ASIACRYPT 2020) proposed quantum collision attacks against the AES-like hashing modes AES-MMO and AES-MP. Their collision attacks are based on a quantum version of the rebound attack technique, exploiting differential trails whose probabilities are too low to be useful in the classical setting but large enough in the quantum setting. In this work, we present dedicated quantum free-start collision attacks on Hirose's double block length compression function instantiated with AES-256, namely HCF-AES-256. The best publicly known classical attack against HCF-AES-256 covers up to 9 out of 14 rounds. We present a new 10-round differential trail for HCF-AES-256 with probability $2^{-160}$, and use it to find collisions with a quantum version of the rebound attack. Our attack succeeds with a time complexity of $2^{85.11}$ and requires $2^{16}$ qRAM in the quantum-attack setting, where an attacker can make only classical queries to the oracle and perform offline computations. We also present a quantum free-start collision attack on HCF-AES-256 with a time complexity of $2^{86.07}$, which outperforms Chailloux, Naya-Plasencia, and Schrottenloher's generic quantum collision attack (ASIACRYPT 2017) in a model where large qRAM is not available.


Introduction
The prospect of large-scale quantum computers has prompted scrutiny of the post-quantum security of cryptographic primitives. In the public-key setting, Shor's seminal algorithms [Sho94] for factoring integers and computing discrete logarithms break public-key schemes such as RSA, ECDSA, and ECDH in polynomial time. In the symmetric-key setting, it was generally believed that Grover's algorithm [Gro96] would provide a quadratic speedup for exhaustive search against symmetric-key schemes such as block ciphers and hash functions, so that doubling the key length addresses the concern. Interestingly, this belief has now been challenged by several dedicated quantum attacks on block ciphers [BNS19b], hash functions [HS20, DSS+20], message authentication codes, and authenticated encryption schemes [KM10, KLLN16, Bon17, LM17, HS18, BNS19a, IHM+19, DDW20]. These attacks primarily rely on Simon's algorithm [Sim97], which requires online quantum superposition queries, except in [BHN+19], where offline queries are performed. However, the practical relevance of making online quantum superposition queries to keyed primitives is controversial.
In contrast, finding collisions for hash functions does not require any online quantum superposition queries, since all computations are public and can be performed offline. In the classical setting, the generic attack complexity to find collisions for an n-bit hash function is $O(2^{n/2})$ by the birthday paradox. In the quantum setting, the BHT algorithm [BHT98] finds collisions with a query complexity of $O(2^{n/3})$, provided that an $O(2^{n/3})$-qubit quantum random access memory (qRAM) is available. Therefore, when such a qRAM is available, any dedicated attack with quantum complexity below $O(2^{n/3})$ is regarded as a meaningful attack. However, given the current state of development of quantum computers, it is generally admitted that large qRAMs are not feasible in the near future, so quantum algorithms using small or no qRAM are preferable. Chailloux, Naya-Plasencia, and Schrottenloher [CNS17] presented a collision-finding algorithm with a query complexity of $O(2^{2n/5})$, a classical memory of $O(2^{n/5})$, and only $O(n)$ quantum memory. This is the only algorithm in the literature that beats the classical birthday bound without using a large qRAM.
In this work, we target double block length hash functions in the quantum setting. Double block length (DBL) hashing is a well-established method of constructing a compression function with a 2n-bit output based only on an n-bit block cipher. DBL hash functions have an obvious advantage over classical block-cipher-based functions such as the PGV hash modes [PGV93, BRSS10], in that the same type of underlying primitive yields a compression function with a larger output. The original idea dates back to the designs of MDC-2 and MDC-4 in 1988 by Meyer and Schilling [MS88]. Since then, many schemes following this approach have been presented [Mer89, LM92, HLMW93, Hir04, Hir06, Sta08, Sta09, FGL09, AFK+11, Men17]. In particular, Hirose suggested a more efficient DBL construction using two different provably secure block ciphers inside the compression function. Armknecht et al. [AFK+11] showed the preimage resistance and collision resistance of the Hirose [Hir06], Abreast-DM [LM92], and Tandem-DM [FGL09] compression functions.
In 2009, Mendel et al. [MRST09] introduced the rebound attack as a variant of differential cryptanalysis and applied it to the hash function Whirlpool, standardized by ISO/IEC. Lamberger et al. [LMS+15] further improved rebound attacks by introducing multiple inbound phases. Building on rebound attack techniques, Chen et al. [CHKM14] proposed the first free-start collision attack on a DBL compression function whose underlying block cipher is instantiated with AES-256. Their attacks work for 6, 8, and 9 rounds of the construction, with time complexities of $2^{8}$, $2^{96}$, and $2^{120}$, respectively, in the classical setting. Recently, Hosoyamada and Sasaki [HS20] presented a dedicated quantum collision attack on 7-round AES-MMO and 6-round Whirlpool when a large qRAM is available. Later, Xiaoyang Dong et al. [DSS+20] presented an improved quantum version of rebound attacks on 7-round AES-MMO and 5-round Grøstl-512 in the setting where only a small qRAM is available. Their collision attacks use the rebound attack technique to exploit differential trails whose probabilities are too low to be useful in the classical setting but large enough in the quantum setting. Motivated by their work, we apply a quantum version of rebound attacks with multiple inbound phases to find collisions on Hirose's DBL compression function when the underlying block cipher is instantiated with AES-256.

Our Contribution
This paper describes the first dedicated quantum collision attacks against double block length hash functions. We apply a quantum version of the rebound attack that finds free-start collisions on Hirose's DBL compression function instantiated with AES-256 (HCF-AES-256 for short). The proposed attack covers up to 10 rounds of HCF-AES-256 in the quantum setting. Our rebound attack uses two inbound phases, which allow us to mount dedicated quantum collision attacks against 10-round HCF-AES-256 that are faster than the generic quantum collision attacks even when only a small qRAM or no qRAM is available. However, the success of our attack largely depends on the configuration of the 16-byte constant c used in the design of HCF-AES-256 (cf. § 2.2): the number of its non-zero bytes and their positions. By comparison, the best publicly known classical attack covers up to 9 rounds of HCF-AES-256 when 4 bytes of the constant c are non-zero. Moreover, our attack is more flexible than the previously best known attack, since it allows 8 bytes of c to be non-zero at some specific positions.
In addition, we propose a MILP-based method to systematically explore the search space of useful differential trails for the rebound attack with multiple inbound phases. Using this method, we find a differential trail for 10-round AES-256 with differential probability $2^{-160}$. We demonstrate that this trail can be used to mount free-start collision attacks on 10-round HCF-AES-256 in the quantum setting when a small qRAM is available and the 16-byte constant has eight non-zero bytes at some specific positions. We also present quantum collision attacks on 10-round HCF-AES-256 when no qRAM is available.
In the time-space tradeoff (TSTO) setting, if a quantum computer of size S qubits is available, then we can find collisions with time complexity $2^{88.61}/\sqrt{S/2^4}$, where $2^4 \le S < 2^{76}$. A summary of free-start collision attacks against HCF-AES-256 is given in Table 1.

Organization of the Paper
In § 2, we give preliminaries on Hirose's DBL compression function, AES, quantum computation, generic quantum collision attack settings, and the rebound attack. In § 3, following the rebound attack procedure, we present a quantum free-start collision attack on 10-round HCF-AES-256 when only a small qRAM is available. In § 4, we show that this attack can be slightly modified so that it remains valid when no qRAM is available. In § 4.4, we briefly discuss the quantum free-start collision attack on HCF-AES-192. In § 5, we describe how to search for useful truncated differential trails with multiple inbound phases using MILP methods. Finally, in § 6, we conclude and outline directions for future work.

Preliminaries
This section briefly introduces AES-256, Hirose's double block length compression function, basic quantum computation and quantum random access memories (qRAMs), the frameworks for generic quantum collision-finding attacks, and the quantum version of rebound attacks.

Description of AES-256
AES-256 is a NIST/ISO standardized iterated block cipher that encrypts 128-bit plaintexts with 256-bit keys. The 128-bit block is arranged into a 4 × 4 byte matrix, whose bytes are numbered as described in Figure 1. AES-256 has 14 rounds, where each round function, except the last (which omits MixColumns), consists of the following four subroutines, applied in this order:
• SubBytes (SB) is a non-linear byte-wise substitution that applies the same 8 × 8 S-box S to every byte.
• ShiftRows (SR) cyclically shifts the i-th row (i = 0, 1, 2, 3) by i bytes to the left.
• MixColumns (MC) multiplies each column by a fixed 4 × 4 matrix over the finite field $\mathbb{F}_{2^8}$.
• AddRoundKey (ARK) XORs the state with the round subkey.
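The four round subroutines can be sketched in Python as follows. This is a minimal illustration under our own conventions, not a vetted AES implementation: the state is a list of four rows, `state[r][c]` being the byte in row r and column c, and the S-box is derived from its algebraic definition (inversion in $\mathbb{F}_{2^8}$ followed by the FIPS-197 affine map) instead of a lookup table. All function names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(x: int) -> int:
    """x^254 = x^(-1) in GF(2^8); maps 0 to 0 as AES requires."""
    r, e = 1, 254
    while e:
        if e & 1:
            r = gf_mul(r, x)
        x = gf_mul(x, x)
        e >>= 1
    return r


def rotl8(x: int, s: int) -> int:
    return ((x << s) | (x >> (8 - s))) & 0xFF


def sbox_entry(x: int) -> int:
    """AES S-box: field inversion followed by the affine map with constant 0x63."""
    v = gf_inv(x)
    return v ^ rotl8(v, 1) ^ rotl8(v, 2) ^ rotl8(v, 3) ^ rotl8(v, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def sub_bytes(state):
    return [[SBOX[b] for b in row] for row in state]


def shift_rows(state):
    # row r is rotated left by r byte positions
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def mix_single_column(col):
    a0, a1, a2, a3 = col
    return [
        gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
        a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
        a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
        gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2),
    ]


def mix_columns(state):
    cols = [mix_single_column([state[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]


def add_round_key(state, round_key):
    return [[s ^ k for s, k in zip(srow, krow)] for srow, krow in zip(state, round_key)]


def aes_round(state, round_key, last=False):
    """One AES round: SB, SR, MC (skipped in the last round), ARK."""
    state = shift_rows(sub_bytes(state))
    if not last:
        state = mix_columns(state)
    return add_round_key(state, round_key)
```

Passing `last=True` to `aes_round` drops MixColumns, matching the structure of the final round.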

Hirose's Double Block Length Compression Function
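Hirose's DBL compression function [Hir04, Hir06] internally evaluates a block cipher $E: \{0,1\}^{2n} \times \{0,1\}^n \to \{0,1\}^n$ with a 2n-bit key and an n-bit block by calling it twice. The first cipher call already compresses the entire input to the compression function, and the second cipher call compresses the same input independently of the first call, so that together they produce a 2n-bit output. Formally, Hirose's compression function can be defined as follows.
Definition 1. Let $c \in \{0,1\}^n$ be a non-zero constant. The compression function $CF: \{0,1\}^n \times \{0,1\}^{2n} \to \{0,1\}^{2n}$ maps $(h_0, (h_1, M))$ to $(f_0(h_0, (h_1, M)), f_1(h_0, (h_1, M)))$, where $f_0(h_0, (h_1, M)) = E_{h_1\|M}(h_0) \oplus h_0$ and $f_1(h_0, (h_1, M)) = E_{h_1\|M}(h_0 \oplus c) \oplus h_0 \oplus c$.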
The DBL compression function given in Definition 1 is also shown in Figure 2.
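To make the structure concrete, the following Python sketch evaluates Hirose's compression function as we read it from [Hir06] and from the collision fact used in § 3: $g_0 = E_{h_1\|M}(h_0) \oplus h_0$ and $g_1 = E_{h_1\|M}(h_0 \oplus c) \oplus h_0 \oplus c$. To keep it runnable without external libraries, AES-256 is replaced by a hypothetical 8-bit toy cipher (a 4-round Feistel network whose round function is derived from SHA-256), which has no cryptographic value; the constant `C` is likewise an arbitrary illustrative choice.

```python
import hashlib

C = 0x5A   # hypothetical non-zero constant c (8-bit toy size)


def toy_e(key: bytes, x: int) -> int:
    """Hypothetical 8-bit block cipher: 4-round Feistel network on two 4-bit halves,
    round function taken from SHA-256(key || round || right half). Insecure, illustrative."""
    left, right = x >> 4, x & 0xF
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd, right])).digest()[0] & 0xF
        left, right = right, left ^ f
    return (left << 4) | right


def f0(h0: int, h1: int, m: int) -> int:
    """First half: E_{h1||M}(h0) XOR h0 (encryption followed by feed-forward)."""
    return toy_e(bytes([h1, m]), h0) ^ h0


def hirose_cf(h0: int, h1: int, m: int) -> tuple:
    """(g0, g1) = (E_K(h0) ^ h0, E_K(h0 ^ c) ^ h0 ^ c) with key K = h1 || M."""
    key = bytes([h1, m])
    g0 = toy_e(key, h0) ^ h0
    g1 = toy_e(key, h0 ^ C) ^ h0 ^ C
    return g0, g1
```

The checks below exercise the relation $g_1(h_0) = f_0(h_0 \oplus c)$ and the fact that replacing $h_0$ by $h_0 \oplus c$ swaps the two output halves.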

Quantum Computation and Quantum RAM
A quantum computer applies quantum gates to input qubits to obtain new quantum states. A qubit is a two-level quantum system whose computational basis states $|0\rangle$ and $|1\rangle$ are indexed by the finite set $B = \{0, 1\}$. The state of a single qubit is a superposition $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, where $\alpha, \beta \in \mathbb{C}$ and $|\alpha|^2 + |\beta|^2 = 1$. In general, the states of an n-qubit quantum system can be described as unit vectors in $\mathbb{C}^{2^n}$ under the orthonormal basis $\{|0\ldots00\rangle, |0\ldots01\rangle, \ldots, |1\ldots11\rangle\}$, alternatively written as $\{|i\rangle : 0 \le i < 2^n\}$. Any quantum algorithm is described by a sequence of gates in the form of a quantum circuit, and all quantum computations are reversible. We use the standard quantum circuit model and adopt the Clifford group generated by $\{H, CNOT, S = T^2\}$ together with T gates (the Clifford+T gate set). Here, H is the single-qubit Hadamard gate $H: |b\rangle \mapsto \frac{1}{\sqrt{2}}(|0\rangle + (-1)^b|1\rangle)$, CNOT is the two-qubit controlled-NOT gate $CNOT: |a\rangle|b\rangle \mapsto |a\rangle|b \oplus a\rangle$, and T is the π/8 gate defined by $T: |0\rangle \mapsto |0\rangle$ and $T: |1\rangle \mapsto e^{i\pi/4}|1\rangle$. The identity operator on n-qubit states is denoted by $I_n$.
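As a quick sanity check of the gate definitions above, the following Python sketch represents gates as complex matrices (plain nested lists, an illustrative choice) and verifies that $T^2 = S$, that H is self-inverse, that $H|1\rangle = (|0\rangle - |1\rangle)/\sqrt{2}$, and that CNOT maps $|1\rangle|0\rangle$ to $|1\rangle|1\rangle$. The helper names are hypothetical.

```python
import cmath
import math

R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]
S = [[1, 0], [0, 1j]]
I2 = [[1, 0], [0, 1]]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def apply(gate, state):
    """Apply a gate matrix to a state vector."""
    return [sum(gate[i][k] * state[k] for k in range(len(state))) for i in range(len(gate))]


def cnot(state):
    """CNOT on a 2-qubit state indexed by 2a + b: |a>|b> -> |a>|b xor a>."""
    out = [0j] * 4
    for idx, amp in enumerate(state):
        a, b = idx >> 1, idx & 1
        out[2 * a + (b ^ a)] += amp
    return out


def close(u, v, eps=1e-12):
    """Entry-wise comparison of scalars, vectors, or matrices."""
    if isinstance(u, list):
        return all(close(x, y, eps) for x, y in zip(u, v))
    return abs(u - v) < eps
```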
When estimating the time complexity of an attack on a primitive, we take as the unit of time the time required to run the primitive once (e.g., the time for one encryption if the primitive is a block cipher). Similarly, when estimating the space complexity of a quantum attack on a primitive, we take as the unit of space the number of qubits required to implement the target primitive.

Superposition Oracles for Classical Circuit.
Considering a Boolean function $f: \{0,1\}^n \to \{0,1\}$, the quantum oracle for f is the unitary transformation $U_f$ acting on the (n + 1)-qubit system that maps a standard basis vector $|x\rangle|y\rangle \mapsto |x\rangle|y \oplus f(x)\rangle$, where $x \in \{0,1\}^n$ and $y \in \{0,1\}$. By linearity, $U_f$ acts on superposition states as
$$U_f\Big(\sum_{x,y} \alpha_{x,y}|x\rangle|y\rangle\Big) = \sum_{x,y} \alpha_{x,y}|x\rangle|y \oplus f(x)\rangle. \qquad (2)$$
Note that $U_f$ can be implemented efficiently in the standard quantum circuit model as long as there is an efficient reversible classical circuit that computes f. To build the quantum circuit for the unitary operator $U_f$, we first construct an efficient reversible circuit for f and then substitute quantum gates for each of the reversible gates involved.
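The compute, copy, and uncompute pattern behind such a reversible circuit can be sketched on classical bits. In this hypothetical example, $f(x_0, x_1, x_2) = x_0 \wedge x_1 \wedge x_2$ is computed with Toffoli gates and one ancilla wire; the ancilla is uncomputed so that the circuit realizes exactly $|x\rangle|y\rangle \mapsto |x\rangle|y \oplus f(x)\rangle$ on basis states. Replacing each gate by its quantum counterpart would give $U_f$.

```python
from itertools import product


def toffoli(bits: list, c1: int, c2: int, t: int) -> None:
    """Reversible AND: flips wire t iff wires c1 and c2 are both 1."""
    bits[t] ^= bits[c1] & bits[c2]


def u_f(x: tuple, y: int) -> tuple:
    """Reversible circuit for the example f(x0, x1, x2) = x0 AND x1 AND x2.
    Wires: x0, x1, x2 | ancilla a (starts at 0) | target y."""
    A, Y = 3, 4
    bits = list(x) + [0, y]
    toffoli(bits, 0, 1, A)   # compute:   a = x0 AND x1
    toffoli(bits, A, 2, Y)   # copy:      y = y XOR (a AND x2) = y XOR f(x)
    toffoli(bits, 0, 1, A)   # uncompute: restore the ancilla to 0
    assert bits[A] == 0
    return tuple(bits[:3]), bits[Y]


# every basis state |x>|y> is mapped to |x>|y XOR f(x)>
for x in product((0, 1), repeat=3):
    for y in (0, 1):
        assert u_f(x, y) == (x, y ^ (x[0] & x[1] & x[2]))
```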
Grover's Search Algorithm. Given a search space of $2^n$ elements, say $\{x : x \in \{0,1\}^n\}$, and a Boolean function (predicate) $f: \{0,1\}^n \to \{0,1\}$, the best classical algorithm with black-box access to f requires about $2^n$ evaluations of the black-box oracle to identify x such that f(x) = 1 with probability one. In the quantum setting, Grover's search algorithm [Gro96] solves this problem with about $O(\sqrt{2^n})$ calls to a quantum oracle $U_f$ that outputs $\sum_x a_x|x\rangle|y \oplus f(x)\rangle$ on input $\sum_x a_x|x\rangle|y\rangle$. First, we construct the uniform superposition $|\psi\rangle$ by applying the Hadamard transformation $H^{\otimes n}$ to $|0\rangle^{\otimes n}$. We then iteratively apply the Grover operator $(2|\psi\rangle\langle\psi| - I)U_f$ to $|\psi\rangle$ so that the amplitudes of the values x with f(x) = 1 are amplified. Finally, we measure the resulting state, which yields x such that f(x) = 1 with overwhelming probability.
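The amplitude dynamics of Grover's algorithm can be simulated classically for small n. The toy state-vector simulation below uses the phase form of the oracle ($|x\rangle \mapsto (-1)^{f(x)}|x\rangle$, which is what $U_f$ induces when the target qubit is prepared in $(|0\rangle - |1\rangle)/\sqrt{2}$) and implements $2|\psi\rangle\langle\psi| - I$ as an inversion about the mean. It uses the standard choice of about $\lfloor (\pi/4)\sqrt{2^n/M} \rfloor$ iterations for M marked elements; the function name `grover` is ours.

```python
import math


def grover(n: int, marked: set, iterations: int = None):
    """Toy state-vector simulation of Grover search over 2^n items."""
    size = 2 ** n
    amps = [1 / math.sqrt(size)] * size          # Hadamard on every qubit of |0...0>
    if iterations is None:
        iterations = math.floor(math.pi / 4 * math.sqrt(size / len(marked)))
    for _ in range(iterations):
        for x in marked:                         # phase oracle: |x> -> -|x> if f(x) = 1
            amps[x] = -amps[x]
        mean = sum(amps) / size                  # 2|psi><psi| - I = inversion about the mean
        amps = [2 * mean - a for a in amps]
    return amps, iterations


amps, k = grover(4, {11})
print(k, f"{amps[11] ** 2:.4f}")                 # 3 0.9613
```

With 16 items and one marked element, three iterations already push the success probability above 0.96.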
The exact complexity of a Grover search depends on the cost of implementing the oracle circuit. It is thus essential to have a precise estimate of the quantum resources needed to implement the oracle; for example, the oracle circuit might require a large or a small qRAM.
Quantum Random Access Memories (qRAMs). The quantum random access memory (qRAM) is the quantum analogue of classical random access memory (RAM); it uses n qubits to address a quantum superposition of $2^n$ memory cells. Given a list of classical data $L = \{x_0, \ldots, x_{2^n-1}\}$ with $x_i \in \{0,1\}^m$, the qRAM for L is modeled as a unitary operator $U^L_{qRAM}$ defined by
$$U^L_{qRAM}: |i\rangle_{Address}|y\rangle_{Output} \mapsto |i\rangle_{Address}|y \oplus x_i\rangle_{Output}, \qquad (3)$$
where $i \in \{0,1\}^n$, $y \in \{0,1\}^m$, and $|\cdot\rangle_{Address}$ and $|\cdot\rangle_{Output}$ denote the address and output registers, respectively. Therefore, we can access any quantum superposition of the data cells by using the corresponding superposition of addresses:
$$U^L_{qRAM}\Big(\sum_i \alpha_i|i\rangle_{Address}|y\rangle_{Output}\Big) = \sum_i \alpha_i|i\rangle_{Address}|y \oplus x_i\rangle_{Output}.$$
When we say that qRAM is available, we assume that a quantum gate realizing the unitary operation (3) (for a list of classical data) is available in addition to the basic quantum gates.
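Equation (3) can be mimicked on a sparse classical representation of a quantum state, mapping each basis state $(i, y)$ with its amplitude to $(i, y \oplus x_i)$. This hypothetical helper only illustrates the linear action of $U^L_{qRAM}$ on superpositions; it says nothing about how a physical qRAM would be built.

```python
def qram(table: list, state: dict) -> dict:
    """Apply U_qRAM^L to a sparse state {(i, y): amplitude}:
    |i>_Address |y>_Output -> |i>_Address |y XOR x_i>_Output."""
    out = {}
    for (i, y), amp in state.items():
        key = (i, y ^ table[i])
        out[key] = out.get(key, 0) + amp
    return out


L = [3, 1, 4, 1]                                  # x_0..x_3, with m = 3-bit cells
uniform = {(i, 0): 0.5 for i in range(len(L))}    # sum_i (1/2)|i>|0>
print(qram(L, uniform))
# {(0, 3): 0.5, (1, 1): 0.5, (2, 4): 0.5, (3, 1): 0.5}
```

Applying the operator twice returns the original state, reflecting that (3) is its own inverse.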

Frameworks for Quantum Collision-Finding Attacks
This section reviews the frameworks for quantum collision attacks, each characterized by the generic quantum collision-finding algorithm that a dedicated attack must beat. Suppose that we have a differential trail with probability p. In the classical setting, we can mount a collision attack requiring at least 1/p operations, and such a collision attack is faster than the generic attack (birthday paradox) only if $1/p < 2^{n/2}$, i.e., $p > 2^{-n/2}$. In the quantum setting, similarly to [HS20], we consider the following scenarios:
1. BHT Algorithm (the setting with qRAM). Brassard, Høyer, and Tapp [BHT98] developed the generic quantum collision-finding algorithm. It finds collisions in time $O(2^{n/3})$ by making $O(2^{n/3})$ quantum queries when an exponentially large qRAM is available. Let $f: \{0,1\}^n \to \{0,1\}^n$ be a random function. BHT consists of two steps. The first step performs a classical precomputation that chooses a subset $X \subset \mathbb{F}_2^n$ of size $|X| = 2^{n/3}$ and computes $f(x)$ for all $x \in X$ (which requires $O(2^{n/3})$ queries and $O(2^{n/3})$ time). The $2^{n/3}$ pairs $L = \{(x, f(x))\}_{x \in X}$ are stored in qRAM so that they can be accessed in quantum superposition. The second step then performs a Grover search to find $x' \in \mathbb{F}_2^n \setminus X$ such that $f(x') = f(x)$ for some $x \in X$; since a random $x'$ satisfies this with probability about $2^{n/3}/2^n$, this takes about $\sqrt{2^{2n/3}} = 2^{n/3}$ iterations. Hence, if we have a differential trail with probability p, then we can mount a collision attack in time $\approx \sqrt{1/p}$ using Grover search. Such an attack is faster than the generic attack (BHT algorithm) if $\sqrt{1/p} < 2^{n/3}$. In other words, the attack is better than generic if $p > 2^{-2n/3}$ and a large qRAM is available.
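The two steps of BHT can be sketched classically. In the Python sketch below, the Grover search of the second step is replaced by a classical random scan, so it needs about $2^{2n/3}$ evaluations instead of about $2^{n/3}$ Grover iterations; the random function (SHA-256 truncated to `n_bits`) and all names are illustrative assumptions.

```python
import hashlib
import random


def f(x: int, n_bits: int) -> int:
    """Hypothetical random function {0,1}^n -> {0,1}^n: SHA-256 truncated to n_bits."""
    digest = hashlib.sha256(x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << n_bits) - 1)


def bht_collision(n_bits: int, seed: int = 0):
    rng = random.Random(seed)
    # Step 1: classical precomputation of L = {(x, f(x))} for a random X of size 2^(n/3).
    xs = rng.sample(range(2 ** n_bits), 2 ** (n_bits // 3))
    table = {}                                  # BHT stores this list in qRAM
    for x in xs:
        y = f(x, n_bits)
        if y in table:                          # collision already inside X
            return table[y], x
        table[y] = x
    # Step 2: BHT runs Grover (about 2^(n/3) iterations) for x' outside X whose image
    # is in L; this classical stand-in samples x' at random (about 2^(2n/3) tries).
    members = set(xs)
    while True:
        x2 = rng.randrange(2 ** n_bits)
        if x2 in members:
            continue
        y = f(x2, n_bits)
        if y in table:
            return table[y], x2


a, b = bht_collision(18)
```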

2. Tradeoffs between Time and Space.
From the viewpoint of time-space complexity, BHT [BHT98] is worse than the classical parallel rho method by van Oorschot and Wiener [vOW94]. Roughly speaking, when P classical processors are available, the parallel rho method finds a collision in time $O(2^{n/2}/P)$. If a quantum computer of size $2^{n/3}$ without qRAM is available, then we can run the parallel rho method on such a quantum computer and find a collision in time $2^{n/6}$, which is faster than using BHT. Let S denote the size of the computational resources required for a quantum algorithm (i.e., S is the maximum of the size of the quantum computer and of the classical memory), and let T denote its time complexity. The tradeoff $T \cdot S = 2^{n/2}$ given by the parallel rho method is the best one even in the quantum setting.
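The parallel rho method of van Oorschot and Wiener is usually realized with distinguished points. The sketch below runs, one after another, the walks that P processors would run in parallel: each walk iterates f until it hits a point whose low `DP_BITS` bits are zero, and two walks ending at the same distinguished point are retraced to locate the collision. The toy function (SHA-256 truncated to 24 bits) and the parameter values are assumptions made to keep the example small.

```python
import hashlib

N_BITS = 24                      # toy output size of f
DP_BITS = 6                      # distinguished point: low DP_BITS bits are all zero
MASK = (1 << N_BITS) - 1


def f(x: int) -> int:
    """Hypothetical random function: SHA-256 truncated to N_BITS bits."""
    digest = hashlib.sha256(x.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big") & MASK


def is_distinguished(x: int) -> bool:
    return (x & ((1 << DP_BITS) - 1)) == 0


def walk(start: int, max_len: int):
    """Iterate f from start until a distinguished point; return (dp, length) or None."""
    x, length = start, 0
    while not is_distinguished(x):
        x, length = f(x), length + 1
        if length > max_len:
            return None          # likely trapped in a cycle without distinguished points
    return x, length


def locate(a: int, la: int, b: int, lb: int):
    """Retrace two trails that end in the same distinguished point to find where they merge."""
    while la > lb:
        a, la = f(a), la - 1
    while lb > la:
        b, lb = f(b), lb - 1
    if a == b:
        return None              # one start point lies on the other trail
    while f(a) != f(b):
        a, b = f(a), f(b)
    return a, b                  # a != b and f(a) == f(b)


def parallel_rho(max_len: int = 20 << DP_BITS):
    """Each walk is what one of the P processors would do; here they run sequentially
    and share the dictionary of distinguished points."""
    seen = {}
    start = 0
    while True:
        start += 1
        result = walk(start, max_len)
        if result is None:
            continue
        dp, length = result
        if dp in seen:
            hit = locate(*seen[dp], start, length)
            if hit is not None:
                return hit
        seen[dp] = (start, length)


a, b = parallel_rho()
```

Only the distinguished points are stored, which is what lets the P walks share memory cheaply and gives the $T \cdot S = 2^{n/2}$ tradeoff.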
Thus, if we have a differential trail with probability p, then we can mount a collision attack using the rebound technique in time $T \approx T_{in} \cdot \sqrt{1/p}$, where $T_{in}$ is the time to perform the inbound phase with a quantum computer of size $S_0$. Such an attack is faster than the generic attack (parallel rho method) if $p > T_{in}^2 S_0 2^{-n}$ holds. In addition, if a quantum computer of size $S \ge S_0$ is available, then by parallelizing the Grover search for the outbound phase we obtain the tradeoff $T = T_{in} \cdot \sqrt{1/p} \cdot \sqrt{S_0/S}$, which is better than the generic tradeoff $T = 2^{n/2}/S$ as long as $S < 2^n \cdot p/(T_{in}^2 S_0)$.
3. Small Quantum Computer with Large Classical Memory. Suppose that only a small quantum computer, of size polynomial in the number of qubits required to implement the circuit, is available, but we can use an exponentially large classical memory. In this scenario, Chailloux et al. [CNS17] showed that we can find a collision in time $O(2^{2n/5})$ with a quantum computer of size $O(1)$ and $O(2^{n/5})$ classical memory. The product of T and S is then around $2^{3n/5}$, which is larger than $2^{n/2}$, but a classical memory of size $O(2^{n/5})$ is usually considered available. The algorithm of Chailloux et al. [CNS17] shows that we can obtain a better tradeoff between time and space if we treat the sizes of quantum hardware and classical hardware separately.
Therefore, if we have a differential trail with probability p, then we can mount a collision attack in time $\approx \sqrt{1/p}$. Such an attack is faster than the generic attack (the algorithm of Chailloux et al.) if $\sqrt{1/p} < 2^{2n/5}$ (or $p > 2^{-4n/5}$), even when no qRAM is available.
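The comparisons in these settings reduce to simple exponent arithmetic. The hypothetical helper below works in log2 units, ignores constant factors and the per-iteration cost $T_{in}$, and checks whether about $\sqrt{1/p}$ Grover iterations beat the BHT bound $2^{N/3}$ and the bound $2^{2N/5}$ of Chailloux et al. for an N-bit output (N = 256 for HCF-AES-256).

```python
def rebound_beats_generic(log2_p: float, out_bits: int) -> dict:
    """Return which generic quantum bounds a Grover-based rebound search with
    about sqrt(1/p) iterations beats, for an out_bits-bit output.
    Simplification: constant factors and the inbound cost T_in are ignored."""
    log2_time = -log2_p / 2                      # log2 of sqrt(1/p)
    return {
        "bht": log2_time < out_bits / 3,         # large qRAM available
        "cns": log2_time < 2 * out_bits / 5,     # no qRAM, small quantum computer
    }


# p = 2^-160 (the 10-round trail) against a 256-bit output
print(rebound_beats_generic(-160, 256))          # {'bht': True, 'cns': True}
```

Since $256/3 \approx 85.33$, the margin against BHT is small once the cost of each Grover iteration is included, which is why the cost of the oracle matters in the analysis.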

Rebound Attacks with Quantum Computers
The rebound attack consists of two phases, called inbound and outbound phases. The inbound phase is an efficient meet-in-the-middle phase, which exploits the truncated differences and the available degrees of freedom in the internal state to fulfill the low probability parts in the middle of a differential characteristic. In the probabilistic outbound phase, the matches of the inbound phase are computed backward and forward to obtain an attack on the hash or compression function. Usually, the inbound phase is repeated many times to generate enough starting points (data pairs) respecting the inbound differential, which then propagates to the outbound differential to satisfy the full truncated differential trail.
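To mount a rebound attack in the quantum setting, we apply Grover's algorithm to a search space by defining a Boolean function F that marks the elements of interest. Let $(\Delta_{in}, \Delta_{out})$ be the input-output difference of the inbound differential, and let $F(\Delta_{in}, \Delta_{out}) = 1$ if and only if the starting point computed from $(\Delta_{in}, \Delta_{out})$ satisfies the outbound differential. The quantum oracle $U_F$ can then be implemented as follows:
1. Compute the starting point from $(\Delta_{in}, \Delta_{out})$.
2. Compute into an ancilla bit whether the starting point satisfies the outbound differential, and XOR this bit into the output register.
3. Uncompute steps 1 and 2.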
Extended Inbound Phases. The idea is to use all available degrees of freedom (from both the states and the subkeys) to extend the rebound attack to more rounds. Extended inbound phases consist of more than one independent inbound phase, which can be connected by choosing the subkeys accordingly. In the outbound phase, we further extend the differential trail backward and forward by propagating the matching differences of the inbound phases to obtain a truncated differential path in each direction. The quantum version of the rebound attack remains the same as described above, except for connecting the inbound phases through the subkeys generated from the master key.

Quantum Collision Attacks on 10-Round HCF-AES-256 with Small qRAM
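This section gives the proposed free-start collision attack against HCF-AES-256. To find a collision for the compression function CF, the attack uses the following fact from [CHKM14, CHKM16]: $CF(h_0, (h_1, M)) = \big(f_0(h_0, (h_1, M)),\ f_0(h_0 \oplus c, (h_1, M))\big)$. Consequently, if $f_0(h_0, (h_1, M)) = f_0(h_0 \oplus c, (h_1, M))$, then both halves of the output coincide and $CF(h_0, (h_1, M)) = CF(h_0 \oplus c, (h_1, M))$.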
Using the above fact, the goal of finding collisions on CF reduces to finding collisions on $f_0$, for which we can proceed as follows:
1. Find a colliding pair of inputs $(h_0, (h_1, M))$ and $(h_0 \oplus \Delta h_0, (h_1, M))$ for $f_0$.
2. Check whether $\Delta h_0 = c$; if so, the pair also collides on CF.
Chen et al. [CHKM14, CHKM16] instantiate CF with AES-256 and find collisions for $f_0$ using the rebound attack procedure. The attack technique returns a pair of colliding inputs $(h_0, h_1, M)$ and $(h_0', h_1, M)$ with difference $\Delta h_0 = h_0 \oplus h_0' = c$, where the non-zero bytes of $\Delta h_0$ are at the same positions as the non-zero bytes of the constant c.
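The reduction can be exercised end to end on a toy instance. The Python sketch below uses a hypothetical 8-bit Feistel cipher in place of AES-256 (it has no cryptographic value), brute-forces a key $h_1 \| M$ and an $h_0$ with $f_0(h_0) = f_0(h_0 \oplus c)$, which is the kind of pair the rebound attack is meant to produce far more cheaply at full size, and then confirms that the pair collides on the whole Hirose-style compression function. The constant `C` and all names are illustrative.

```python
import hashlib

C = 0x5A   # hypothetical non-zero constant c


def toy_e(key: bytes, x: int) -> int:
    """Hypothetical insecure 8-bit Feistel cipher standing in for AES-256."""
    left, right = x >> 4, x & 0xF
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd, right])).digest()[0] & 0xF
        left, right = right, left ^ f
    return (left << 4) | right


def f0(h0: int, key: bytes) -> int:
    return toy_e(key, h0) ^ h0


def cf(h0: int, key: bytes) -> tuple:
    """Hirose-style output (E_K(h0) ^ h0, E_K(h0 ^ c) ^ h0 ^ c), key K = h1 || M."""
    return toy_e(key, h0) ^ h0, toy_e(key, h0 ^ C) ^ h0 ^ C


def find_f0_collision_with_difference_c(max_keys: int = 4096):
    """Brute-force (h0, h1||M) with f0(h0) == f0(h0 ^ c)."""
    for k in range(max_keys):
        key = k.to_bytes(2, "big")          # 8-bit h1 followed by 8-bit M
        for h0 in range(256):
            if f0(h0, key) == f0(h0 ^ C, key):
                return h0, key
    return None


h0, key = find_f0_collision_with_difference_c()
assert cf(h0, key) == cf(h0 ^ C, key)       # the f0 collision is a CF collision
```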
In this section, we present a new differential trail for 10-round AES-256 and demonstrate how to use the differential trail to mount rebound attacks on HCF-AES-256 in the small-qRAM quantum setting. Our attack finds a colliding pair of inputs if the constant c has eight non-zero bytes at some specific positions.

A New Differential Trail for 10-Round AES-256
Here, we give a new differential trail for 10-round AES-256 with differential probability $p_{out} = 2^{-96}$, which can be used to find collisions on 10-round HCF-AES-256. With some effort, we came up with the 10-round differential trail shown in Figure 3, where each 4 × 4 square matrix shows the active byte pattern of the AES state. This trail gives $p_{out} = 2^{-96}$, of which the 8-byte cancellation in the feed-forward operation accounts for $2^{-64}$. We then use this trail to mount rebound attacks on 10-round HCF-AES-256 in the quantum setting, which return a pair of colliding inputs $(h_0, h_1, M)$ and $(h_0 \oplus \Delta h_0, h_1, M)$. Further, we need the condition $\Delta h_0 = c$, where c has 8 non-zero bytes at some specific positions; this holds with probability $2^{-64}$. Therefore, the overall complexity of the attack in the classical sense is $2^{96} \times 2^{64} = 2^{160}$, i.e., the combined probability is $2^{-160}$.
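The probability bookkeeping of this paragraph can be written out in log2 units. The sketch assumes, as is standard for such estimates, that each fixed byte condition holds independently with probability $2^{-8}$; the value $p_{out} = 2^{-96}$ is taken from the text, and the variable names are ours.

```python
BYTES_OF_C = 8                          # non-zero bytes of the constant c

log2_p_out = -96                        # trail of Figure 3, feed-forward included (from the text)
log2_cancel = -8 * BYTES_OF_C           # the 8-byte feed-forward cancellation, part of p_out
log2_delta_is_c = -8 * BYTES_OF_C       # additionally requiring Delta h0 = c on those bytes

log2_p_total = log2_p_out + log2_delta_is_c
log2_classical = -log2_p_total          # about 1/p trials classically
log2_grover = log2_classical / 2        # about sqrt(1/p) Grover iterations

print(log2_cancel, log2_p_total, log2_classical, log2_grover)   # -64 -160 160 80.0
```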

Differential Distribution Table of S-box
We precompute the differential distribution table (DDT) of the S-box as a table T using Algorithm 1 and load it into RAM. Given an input-output difference of a cell, one access to the DDT returns the corresponding input-output data pairs. Since the S-box can be implemented with RAM, we regard one random access to a classical memory or qRAM as equivalent to one S-box application.
Algorithm 1: The differential distribution table of S with data pairs
1: Let T be an empty dictionary.
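A minimal Python sketch of such a table with data pairs, under our reading of Algorithm 1: for every input difference $\delta_{in}$ and input x, it records x under the key $(\delta_{in}, S(x) \oplus S(x \oplus \delta_{in}))$, which gives $2^{16}$ entries in total. The dictionary layout and function names are assumptions; the S-box is computed from its algebraic definition.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(x: int) -> int:
    r, e = 1, 254                        # x^254 = x^(-1); 0 maps to 0
    while e:
        if e & 1:
            r = gf_mul(r, x)
        x = gf_mul(x, x)
        e >>= 1
    return r


def rotl8(x: int, s: int) -> int:
    return ((x << s) | (x >> (8 - s))) & 0xFF


def sbox_entry(x: int) -> int:
    v = gf_inv(x)
    return v ^ rotl8(v, 1) ^ rotl8(v, 2) ^ rotl8(v, 3) ^ rotl8(v, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def ddt_with_pairs(sbox: list) -> dict:
    """T[(d_in, d_out)] = sorted list of x with S(x) ^ S(x ^ d_in) == d_out."""
    table = {}
    for d_in in range(256):
        for x in range(256):
            d_out = sbox[x] ^ sbox[x ^ d_in]
            table.setdefault((d_in, d_out), []).append(x)
    return table


T = ddt_with_pairs(SBOX)
print(sum(len(v) for v in T.values()))                          # 65536
print(max(len(v) for (d_in, _), v in T.items() if d_in != 0))   # 4
```

The second line reflects the differential uniformity 4 of the AES S-box: a non-trivial input-output difference pair admits at most four inputs.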

A Small-qRAM Collision Attack on 10-Round HCF-AES-256
At the core of the attack, we apply Grover's algorithm to a search space in which an efficiently computable Boolean function marks the elements of interest as possible solutions. We now define this Boolean function F. We denote the instantiated input-output difference pair of the inbound differential in Figure 3 by $(\Delta_{in}, \Delta_{out})$. The goal of the inbound phase of a rebound attack is to generate data pairs (starting points) that respect the multiple inbound differentials. For two inbound phases, we let $(\Delta^1_{in}, \Delta^1_{out})$ be the input-output difference pair for the first inbound differential, and $(\Delta^2_{in}, \Delta^2_{out})$ be the input-output difference pair for the second inbound differential. For the complete inbound differential, we define F such that $F(\Delta^1_{in}, \Delta^2_{in}, \Delta^1_{out}, \Delta^2_{out}) = 1$ if and only if the starting point computed from $(\Delta^1_{in}, \Delta^2_{in}, \Delta^1_{out}, \Delta^2_{out})$ fulfils the backward and forward outbound differentials. Therefore, if $F(\Delta^1_{in}, \Delta^2_{in}, \Delta^1_{out}, \Delta^2_{out}) = 1$, we can produce two different colliding inputs $h_0$ and $h_0'$ such that $CF(h_0, (h_1, M)) = CF(h_0', (h_1, M))$, where $h_0$ and $h_0'$ are obtained from the starting point, and $(h_1, M)$ is obtained from the keys derived when connecting the rounds of inbound phases 1 and 2. Given $(\Delta_{in}, \Delta_{out})$, $F(\Delta_{in}, \Delta_{out})$ can be computed on a classical computer as follows:
1. Compute the differential (∆X).
If there are no admissible inputs for the pair $(\Delta X_4, \Delta Y_4)$, then return to Step 1. If there are no admissible inputs for the pair $(\Delta X_7, \Delta Y_7)$, then return to Step 3.
5. Select $\Delta Y_5$ compatible with $\Delta X_5$ with the help of the precomputed DDT lookup table.

Compute the differential (∆X
Calculate $\Delta X_6$ from $\Delta Y_5$ and check whether $\Delta X_6$ and $\Delta Y_6$ are compatible for each of the eight active bytes. If there are no admissible inputs for the pairs $(\Delta X_5, \Delta Y_5)$ and $(\Delta X_6, \Delta Y_6)$, then repeat Step 5.
6. Connect the results of the two inbound phases so that the differences in the eight active bytes of round 5 and the actual values of $Y_5$ and $X_6$ match, by choosing the subkeys $K_4$, $K_5$, and $K_6$ accordingly.
7. Using the key schedule of AES-256, compute the round key $K_3$ from $K_4$ and $K_5$, $K_2$ from $K_3$ and $K_4$, $K_1$ from $K_2$ and $K_3$, and $K_0$ from $K_1$ and $K_2$. Similarly, compute $K_7$ from $K_5$ and $K_6$, $K_8$ from $K_6$ and $K_7$, $K_9$ from $K_7$ and $K_8$, and $K_{10}$ from $K_8$ and $K_9$.
8. Compute the starting points $X_5$ and $X_6$ from key $K_4$ with state $W_4$ and from key $K_6$ with state $W_5$, respectively. We now have a consistent path for the starting points $X_4 \to X_5 \to X_6 \to X_7$.
9. If the starting point $(X_4, X_4 \oplus \Delta X_4)$ obtained in Step 2 respects the backward outbound differential, and the starting point $(X_7, X_7 \oplus \Delta X_7)$ obtained in Step 4 respects the forward outbound differential, then $F(\Delta_{in}, \Delta_{out})$ returns 1; otherwise it returns 0.
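Step 7 relies on the fact that the AES-256 key schedule can be run in both directions from any two consecutive 128-bit round keys. The Python sketch below illustrates this with the FIPS-197 word recurrence $w_i = w_{i-8} \oplus g(w_{i-1}, i)$, where round key $K_r$ is taken to be the words $w_{4r}, \ldots, w_{4r+3}$; the helper names are hypothetical and the S-box is computed from its algebraic definition.

```python
def gf_mul(a: int, b: int) -> int:
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r


def gf_inv(x: int) -> int:
    r, e = 1, 254                                   # x^254 = x^(-1); 0 maps to 0
    while e:
        if e & 1:
            r = gf_mul(r, x)
        x, e = gf_mul(x, x), e >> 1
    return r


def rotl8(x: int, s: int) -> int:
    return ((x << s) | (x >> (8 - s))) & 0xFF


def sbox_entry(x: int) -> int:
    v = gf_inv(x)
    return v ^ rotl8(v, 1) ^ rotl8(v, 2) ^ rotl8(v, 3) ^ rotl8(v, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40]


def g(word: list, i: int) -> list:
    """Term XORed into w[i-8] when computing w[i] (AES-256, Nk = 8)."""
    if i % 8 == 0:
        t = [SBOX[b] for b in word[1:] + word[:1]]  # SubWord(RotWord(w[i-1]))
        t[0] ^= RCON[i // 8 - 1]
        return t
    if i % 8 == 4:
        return [SBOX[b] for b in word]              # SubWord(w[i-1])
    return list(word)


def xor(a: list, b: list) -> list:
    return [x ^ y for x, y in zip(a, b)]


def words(k: list) -> list:
    return [k[4 * j:4 * j + 4] for j in range(4)]


def expand_key(key: bytes, rounds: int = 14) -> list:
    w = [list(key[4 * j:4 * j + 4]) for j in range(8)]
    for i in range(8, 4 * (rounds + 1)):
        w.append(xor(w[i - 8], g(w[i - 1], i)))
    return [sum(w[4 * r:4 * r + 4], []) for r in range(rounds + 1)]


def next_round_key(k_prev2: list, k_prev: list, r: int) -> list:
    """K_r from K_{r-2} and K_{r-1} (e.g. K_7 from K_5 and K_6)."""
    w = dict(zip(range(4 * (r - 2), 4 * r), words(k_prev2) + words(k_prev)))
    for i in range(4 * r, 4 * r + 4):
        w[i] = xor(w[i - 8], g(w[i - 1], i))
    return sum((w[i] for i in range(4 * r, 4 * r + 4)), [])


def previous_round_key(k_next: list, k_next2: list, r: int) -> list:
    """K_r from K_{r+1} and K_{r+2} (e.g. K_3 from K_4 and K_5)."""
    w = dict(zip(range(4 * (r + 1), 4 * (r + 3)), words(k_next) + words(k_next2)))
    return sum((xor(w[i + 8], g(w[i + 7], i + 8)) for i in range(4 * r, 4 * r + 4)), [])
```

Because each direction only needs two consecutive round keys, fixing $K_4$, $K_5$, and $K_6$ in Step 6 determines every other round key.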
At the end, if $F(\Delta_{in}, \Delta_{out}) = 1$, we compute the corresponding inputs $(X, X \oplus \Delta X)$ from $X_4$ and $\Delta X_4$, and $(W_{12}, \Delta W_{12})$ from $Y_7$ and $\Delta Y_7$. If $\Delta X = \Delta W_{12}$, then we output the pair of inputs (K, X) and $(K, X \oplus \Delta X)$, which are mapped to the same value by $f_0$ instantiated with 10-round AES-256, where $K = K_0 \| K_1$. Therefore, by applying Grover's search with the quantum oracle $U_F$, which maps $|\Delta_{in}, \Delta_{out}, \alpha\rangle|y\rangle$ to $|\Delta_{in}, \Delta_{out}, \alpha\rangle|y \oplus F(\Delta_{in}, \Delta_{out}, \alpha)\rangle$, we can find a collision with around $\sqrt{2^{160}} = 2^{80}$ queries. To estimate the overall complexity, we need the exact cost of $U_F$.

Implementation of the Quantum Oracle U F
To implement the quantum oracle $U_F$, we first define an additional function $D^{(i)}$ for $0 \le i < 16$ that computes, by accessing the precomputed DDT, the actual input-output data pair respecting the differential of the i-th S-box S in round j. For example, the function $D^{(i)}$ outputs min{X. Since the computation of $D^{(i)}$ in the classical setting uses the table T computed by Algorithm 1, implementing a quantum oracle for $D^{(i)}$ requires a qRAM of size $2^{16}$. Thus, the oracle $U_F$ can be constructed from the quantum circuit of $D^{(i)}$, as presented in Algorithm 2.

Computing Round Key K 6
The following eight conditions are deduced from the AES-256 key expansion algorithm: Now, since $X_7$ is given to the algorithm as input, the above equations (14) and (15)
{0, 1, . . . , 15} do
3 Compute the corresponding differential ∆X from the corresponding bytes of $W_5$ and $X_6$.
22 Compute the round key $K_6$ satisfying the conditions obtained so far by expanding them into 16 linear equations as described in § 3.5.
23 Compute the remaining bytes of $K_4$ from $K_5[2]$, $K_5[3]$, and $K_6$.
24 Compute $W_5$ from $Y_4$ with $K_4$, and $X_6$ from $X_7$ with $K_6$. Then compute for 0 ≤ j ≤ 1.
25 Compute the round keys $K_0$, $K_1$, $K_2$, $K_3$, $K_7$, $K_8$, $K_9$, and $K_{10}$.
26 /* Create starting points derived from 4 ) and X 4 ← (X ) and X 7 ← (X

Complexity Analysis
First, we describe the following facts and assumptions used in our quantum collision attack on HCF-AES-256.
• The complexity of one access to the qRAM that stores a table of input-output differences is equivalent to one S-box computation.
• The complexity of solving linear equations involved in computing K 6 , which are conditioned on K 4 and K 5 , is ignored.
• One computation of inverse S-box is about the same as computing two S-boxes [JNRV20].
• Uncomputing is taken into account to free-up the wires after executing a task.
In our attack setting, we first precompute the differential distribution table (DDT) with 2 16 classical data for the S-box (see Algorithm 1) and then load this table into a qRAM in advance. This qRAM is accessed by the quantum circuit for D (i) .

Complexity of $D^{(i)}$. $D^{(i)}$ is used to compute input-output data pairs through accesses to the precomputed DDT given in Algorithm 1. One DDT access is equivalent to one S-box evaluation. Hence, we need only one S-box evaluation, which is about $2 \times \frac{1}{200} \approx 2^{-6.64}$ 10-round AES-256 computations.
Complexity of $U_F$. In Algorithm 2, Steps 2-4 as well as Steps 6-8 make 16 calls to $D^{(i)}$.
requiring a dedicated qRAM.
We adopt three methods to remove the need for a qRAM when accessing $D^{(i)}$. The first two methods are similar to the work of Dong et al. [DSS+20], while the third method relies on a time-space tradeoff. Essentially, we re-implement $D^{(i)}$ without using the DDT stored in qRAM, while keeping the functional behavior of $D^{(i)}$ unchanged.

Method 1: Using Grover's Search for S-box
The idea is to generate data pairs by online search instead of table lookups, given a specific input-output differential $(\delta_{in}, \delta_{out})$ for an 8 × 8 S-box. Specifically, we replace the table lookups of $D^{(i)}$ in Algorithm 2 by Grover's search, while the other parts of the algorithm remain the same.
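The primitive replaced here is solving $S(x) \oplus S(x \oplus \delta_{in}) = \delta_{out}$ for one S-box. The Python sketch below solves it by classical exhaustive search over the 256 inputs, as a stand-in for the Grover search of Method 1, and reports the standard iteration count $\lfloor (\pi/4)\sqrt{256/M} \rfloor$ for M solutions. `d_func` is a hypothetical stand-in for $D^{(i)}$ that returns the smallest admissible input.

```python
import math


def gf_mul(a: int, b: int) -> int:
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r


def gf_inv(x: int) -> int:
    r, e = 1, 254                                   # x^254 = x^(-1); 0 maps to 0
    while e:
        if e & 1:
            r = gf_mul(r, x)
        x, e = gf_mul(x, x), e >> 1
    return r


def rotl8(x: int, s: int) -> int:
    return ((x << s) | (x >> (8 - s))) & 0xFF


def sbox_entry(x: int) -> int:
    v = gf_inv(x)
    return v ^ rotl8(v, 1) ^ rotl8(v, 2) ^ rotl8(v, 3) ^ rotl8(v, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def solve_sbox_differential(d_in: int, d_out: int) -> list:
    """All x with S(x) ^ S(x ^ d_in) == d_out, by exhaustive search
    (classical stand-in for the Grover search of Method 1)."""
    return [x for x in range(256) if SBOX[x] ^ SBOX[x ^ d_in] == d_out]


def grover_iterations(num_solutions: int, domain: int = 256) -> int:
    """About (pi/4) * sqrt(domain / num_solutions) Grover iterations."""
    return math.floor(math.pi / 4 * math.sqrt(domain / num_solutions))


def d_func(d_in: int, d_out: int):
    """Hypothetical D^(i): smallest admissible input, or None if the pair is impossible."""
    sols = solve_sbox_differential(d_in, d_out)
    return min(sols) if sols else None
```

A non-trivial pair has 0, 2, or 4 solutions, so each call needs only a handful of Grover iterations (8 for two solutions, 6 for four).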

Method 3: A Time-Space Tradeoff
Recall that the generic collision-finding algorithm in this setting is the parallel rho method, which gives the tradeoff $T \cdot S = 2^{n/2}$, or equivalently $T = 2^{n/2}/S$. We regard the size (the number of qubits) required to implement the attack target (here, 10-round AES-256) as the unit of space. We again use the same differential trail with probability $2^{-160}$ for the outbound phase, and thus the domain size of F is $2^{160}$. Following § 4.1, the cost of $U_F$ is $\approx 2^{8.96}$ encryptions. In addition, we require some ancillary quantum registers to realize $U_F$. In Algorithm 2, Steps 4, 8, 12, and 15 require ancillary quantum registers to solve the S-box differential equation $S(x_i) \oplus S(x_i \oplus \delta^{(i)}_{in}) = \delta^{(i)}_{out}$; each such register has size $\approx 1/16$ units (as the block size of the S-box is 1/16 of the internal state size of AES). To compute and store the values $(\delta^{(i)}_{in}, \delta^{(i)}_{out})$ for $1 \le i \le 16$ in Steps 3, 7, 12, and 15, we use ancillary quantum registers of size $\approx 4 \times (16 \times (2 \times 1/16)) = 8$. We also need a quantum register to store $x_i$ and another quantum register to compute $D^{(i)}$, each of size $\approx 1/16$ units. Thus, additional quantum registers of size $16 \times (1/16 + 1/16) = 2$ are required for Steps 2-18. In Steps 29 and 31, we use additional quantum registers of size $\approx 5$. The 2 qubits required for flag 1 and flag 2 are negligible. In total, we use additional quantum registers of size $8 + 2 + 5 = 15 \approx 2^4$. Therefore, we can implement $U_F$ as a quantum circuit that runs in time around $2^{8.96}$ 10-round AES encryptions, using ancillary quantum registers of size around $2^4$.
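The ancilla tally above can be checked with exact rational arithmetic, taking one AES state (16 bytes) as the unit of space so that a one-byte register counts as 1/16. The register counts are those stated in the text; the variable names are ours.

```python
import math
from fractions import Fraction

BYTE = Fraction(1, 16)                  # a one-byte register, in units of one AES state

delta_regs = 4 * (16 * (2 * BYTE))      # (delta_in, delta_out) for 16 S-boxes, four step groups
sbox_regs = 16 * (BYTE + BYTE)          # x_i and the D^(i) workspace, for 16 S-boxes
key_regs = 5                            # Steps 29 and 31, as stated in the text
total = delta_regs + sbox_regs + key_regs

print(delta_regs, sbox_regs, total, round(math.log2(total), 2))   # 8 2 15 3.91
```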
When a quantum computer of size S ($S \ge 2^4$) is available and we use it to parallelize the Grover search, our rebound attack runs in time $T \approx 2^{88.61}/\sqrt{S/2^4} = 2^{90.61}/\sqrt{S}$. Therefore, our attack is better than the generic attack in the setting where the efficiency of a quantum algorithm is measured by the tradeoff between time T and space S as long as $2^4 \le S < 2^{76}$, but it is worse than the generic attack in other settings.

Discussion on Quantum Collision Attack on HCF-AES-192
In § 2.2, we presented Hirose's double block length compression function instantiated with AES-256, namely HCF-AES-256 (see Figure 2). Similarly, we can define HCF-AES-192 by reducing the message length from 128 bits to 64 bits.
To mount quantum free-start collision attacks on HCF-AES-192, the MILP methods readily yield a 10-round differential trail for AES-192 with the same probability as the one for AES-256. More specifically, a valid trail for AES-192 is obtained by modifying the part of the AES-256 trail in Figure 3 that connects the two inbound phases (from round 4 to round 6). The positions of the active bytes after the AddRoundKey operation must be chosen such that the round keys $K_4$, $K_5$, $K_6$ can be recovered efficiently using the key schedule of AES-192. As a result, both the attack strategy against HCF-AES-192 and its time complexity remain the same as described in § 3, i.e. $2^{85.11}$ in the setting of Q-Model-I. Note that this method attacks 10 out of the 12 rounds of HCF-AES-192.

Searching for Differential Trail with MILP Methods
We now describe the Mixed Integer Linear Programming (MILP) based tool proposed by Mouha et al. [MWGP11], which we adapt to find the optimal differential trail for AES-256.

MILP Model.
In order to find the optimal differential trail for our attack, we use a tool based on MILP. The model describes the propagation of difference patterns with linear inequalities and defines an objective function that minimizes the complexity of the collision attack. Specifically, to find collisions for hash functions with the rebound attack technique, we modify the MILP model of Mouha et al. [MWGP11] by adding constraints that force the active byte patterns of the first-round input and the last-round output to be identical.
Assume that there is a differential trail for $E_K$ of HCF-AES-256 with probability $p$ whose input and output differences share a common value $\Delta$, that is, $E_{h_1\|M}(h_0) \oplus E_{h_1\|M}(h_0 \oplus \Delta) = \Delta$. Given around $1/p$ input pairs with difference $\Delta$, we expect one pair $((h_0, h_1, M), (h_0 \oplus \Delta, h_1, M))$ to follow this trail. For this pair, the feed-forward cancels the difference, so the output difference of HCF-AES-256 becomes zero, which yields a collision.
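In Hirose's construction the two block-cipher calls process $h_0$ and $h_0 \oplus c$ under the same key $h_1 \| M$, so a single differential cancels both output halves only when $\Delta = c$ (the condition $\Delta h_0 = c$ on the constant $c$). The toy sketch below illustrates this with an 8-bit keyed random permutation in place of AES-256. The block size, `toy_cipher`, `hirose_cf` and the constant `C` are hypothetical, and we assume Figure 2 follows Hirose's standard definition $g = E_{h_1\|M}(h_0) \oplus h_0$, $h = E_{h_1\|M}(h_0 \oplus c) \oplus h_0 \oplus c$.

```python
import random

BITS = 8                  # toy block size so that brute force is instant
SIZE = 1 << BITS


def toy_cipher(key: int) -> list:
    """Keyed random permutation of range(SIZE); a stand-in for E_{h1||M}, NOT AES."""
    table = list(range(SIZE))
    random.Random(key).shuffle(table)
    return table


def hirose_cf(E: list, h0: int, c: int) -> tuple:
    """Hirose's DBL compression function with the key h1||M already fixed in E:
    g = E(h0) ^ h0 and h = E(h0 ^ c) ^ h0 ^ c."""
    return E[h0] ^ h0, E[h0 ^ c] ^ h0 ^ c


def free_start_collision(c: int, max_keys: int = 10_000):
    """Try keys (free choices of h1||M) and chaining values h0 until the pair
    (h0, h0 ^ c) follows a differential with input and output difference c."""
    for key in range(max_keys):
        E = toy_cipher(key)
        for h0 in range(SIZE):
            if E[h0] ^ E[h0 ^ c] == c:
                return key, h0
    return None


C = 0x5A                  # hypothetical nonzero constant c (toy-sized)
key, h0 = free_start_collision(C)
E = toy_cipher(key)
print(hirose_cf(E, h0, C) == hirose_cf(E, h0 ^ C, C))  # True
```

Because $\Delta = c$, feeding $h_0 \oplus c$ makes the two block-cipher calls swap roles, so one differential condition makes both outputs collide. With $\Delta \ne c$, the second half would need a second, independent differential to hold.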
Since the key $K$ is known to the attacker in the hash-function setting, it is possible to generate many data pairs which conform to one particular segment of the desired trail. These pairs are then tested to find one that also fulfills the remaining part of the trail. This is the basic strategy of the rebound attack [MRST09].
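The inbound phase generates conforming pairs by solving S-box differential equations $S(x) \oplus S(x \oplus \delta_{in}) = \delta_{out}$, the same equations that $U_F$ solves in superposition. The classical sketch below computes the AES S-box from its definition (inversion in $GF(2^8)$ followed by the affine map) rather than from a lookup table; the function names are ours.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a: int) -> int:
    """Inverse a^254 in GF(2^8); maps 0 to 0 as AES does."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result


def rotl8(v: int, n: int) -> int:
    return ((v << n) | (v >> (8 - n))) & 0xFF


def sbox(x: int) -> int:
    """AES S-box: field inversion followed by the affine map with constant 0x63."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sbox(x) for x in range(256)]


def solve_sbox_diff(d_in: int, d_out: int) -> list:
    """All x with S(x) ^ S(x ^ d_in) == d_out (empty if the pair is impossible)."""
    return [x for x in range(256) if SBOX[x] ^ SBOX[x ^ d_in] == d_out]


print(hex(SBOX[0x53]))                   # 0xed
print(len(solve_sbox_diff(0x01, 0x1F)))  # 4
```

For every nonzero $\delta_{in}$, exactly 127 of the 256 output differences are possible (126 with two solutions and one with four), so a random pair $(\delta_{in}, \delta_{out})$ is solvable with probability about 1/2, and each solvable pair yields at least two conforming values.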
For each model, we fix the positions of the two inbound phases. In the first inbound phase, we fix the round index $r+1$ such that the MixColumns operations in rounds $r+1$ and $r+2$ are satisfied with cost one on average. Similarly, in the second inbound phase, we fix the round index $r+4$ such that the MixColumns operations in rounds $r+4$ and $r+5$ are satisfied with cost one on average. We connect these two inbound phases in round $r+3$. Because the last round does not have MixColumns, there are only 4 choices for the 10-round attack: $r \in \{1, 2, 3, 4\}$, counting rounds from 0. For example, the 10-round trail introduced in § 3.1 corresponds to $r = 2$. The probability of the outbound phase is affected by two factors: (1) the number of difference cancellations in MixColumns, and (2) the number of difference cancellations in the feed-forward.

Variables and Constraints.
For an $N$-round primitive, we first introduce an integer variable $r$, which determines the two inbound phases, from round $r+1$ to $r+2$ and from round $r+4$ to $r+5$. These inbound phases are connected in round $r+3$. The backward outbound phase runs from round $r$ back to round 0, and the forward outbound phase runs from round $r+5$ to round $N-1$.
We introduce a set of 0-1 variables $x_j$ for all cells of the states involved, where $x_j = 1$ if and only if the corresponding cell is differentially active. For each column $i$, let $x_{i0}, x_{i1}, x_{i2}, x_{i3}$ denote the input bytes and $y_{i0}, y_{i1}, y_{i2}, y_{i3}$ the output bytes of the MixColumns transformation. We also introduce a 0-1 dummy variable $d$ that indicates whether the column is active, and another variable $b$ ($0 \le b \le 3$) that counts the number of inactive bytes in an active column. The relationship between these variables is modeled by the equality
$$\sum_{k=0}^{3} (x_{ik} + y_{ik}) + b = 8d.$$
Additionally, we use the following set of inequalities to model the behaviour of the linear transformation of AES:
$$\sum_{k=0}^{3} (x_{ik} + y_{ik}) \ge B_D \cdot d, \qquad d \ge x_{ik}, \qquad d \ge y_{ik} \quad (0 \le k \le 3),$$
where $B_D$ denotes the differential branch number of the AES MixColumns transformation ($B_D = 5$). Together, these constraints force $d = 1$ exactly when at least one of $x_{i0}, \dots, x_{i3}, y_{i0}, \dots, y_{i3}$ is nonzero, in which case at least $B_D$ bytes of the column are active.
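The column constraints can be sanity-checked without a solver by enumerating all $2^8$ activity patterns of one MixColumns column. The sketch below is our own brute-force check of the branch-number constraints, not the authors' MILP code and not a MILP solver; `column_feasible` is a hypothetical helper that returns the values of $d$ and $b$ for a feasible pattern.

```python
from itertools import product

B_D = 5  # differential branch number of AES MixColumns


def column_feasible(x: tuple, y: tuple):
    """Return (d, b) if the 0-1 activity pattern (x, y) of one column satisfies
    sum(x) + sum(y) >= B_D * d, d >= each x_k and y_k, and
    sum(x) + sum(y) + b == 8 * d with 0 <= b <= 3; otherwise return None."""
    active = sum(x) + sum(y)
    for d in (0, 1):
        if active < B_D * d:
            continue
        if any(v > d for v in (*x, *y)):
            continue
        b = 8 * d - active          # inactive bytes in an active column
        if 0 <= b <= 3:
            return d, b
    return None


patterns = list(product((0, 1), repeat=4))
feasible = [(x, y) for x in patterns for y in patterns if column_feasible(x, y)]
print(len(feasible))  # 94: the all-inactive column plus the 93 with >= 5 active bytes
```

The count 94 is $1 + \binom{8}{5} + \binom{8}{6} + \binom{8}{7} + \binom{8}{8}$, i.e. exactly the patterns allowed by branch number 5; a MILP solver explores the same feasible set implicitly across all columns and rounds.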
Finally, we add constraints ensuring that the active byte patterns of the first-round input and the last-round output are identical. For example, to ensure the feed-forward cancellation for 10-round HCF-AES-256, we add the constraints
$$x^{X}_{j} = x^{W_{12}}_{j}, \qquad 0 \le j \le 15,$$
where $X$ and $W_{12}$ (refer to Figure 3) denote the input and output differences of the 10-round differential trail for AES-256, and $x^{X}_{j}$, $x^{W_{12}}_{j}$ are the activity variables of their $j$-th bytes.
The Objective Function. To minimize the time complexity of the outbound phase, our objective function minimizes the sum of the $b$'s and of the variables $x_0$ to $x_{15}$ of the round-0 input. Hence, our goal is to
$$\text{minimize} \quad \sum b + \sum_{j=0}^{15} x_j .$$

Conclusions and Open Problems
In this work, we presented quantum free-start collision attacks on the DBL compression function [Hir06] instantiated with 10-round AES-256, namely HCF-AES-256, when small qRAM or no qRAM is available. This is achieved by performing a quantum version of the rebound attack with extended inbound phases. Our attack on HCF-AES-256 outperforms the generic attack of Chailloux, Naya-Plasencia, and Schrottenloher [CNS17] in a model where large qRAM is not available. However, our attack has two limitations: (1) it is a free-start collision attack, and (2) it requires the constant $c$ to have a low Hamming weight. More precisely, $c$ must have 8 non-zero bytes at some specific positions for our attack to be valid. Extending these attacks to more than 10 rounds would be interesting. Another interesting direction is to extend the quantum collision attacks to the other variants of Hirose's double block length compression function given in [Hir06]. One might also revisit previous differential trail searches in order to construct more efficient dedicated quantum collision-finding attacks against hash functions. Extending the collision attack to the actual IVs remains an open problem.

Discussion on Related-Key Differential Cryptanalysis of Hash Functions based on AES.
In a related-key attack against a block cipher, the attacker is given access to the encryption oracle under keys that differ from the target key by a known difference. In differential cryptanalysis, the attacker is allowed to introduce a difference $\Delta X = X \oplus X'$ in plaintext pairs, whereas in related-key differential cryptanalysis, the attacker is additionally allowed to introduce a difference $\Delta K = K \oplus K'$ in the keys, such that $\Delta X$ becomes $\Delta X_r$ after $r$ rounds with high probability.
At present, in the classical setting, the best related-key differential attack [BKN09] breaks the full 14-round AES-256 with a total complexity of $2^{131}$ time and $2^{65}$ memory. Biryukov et al. [BKN09] also discussed how the related-key differential trail of AES-256 can be used to find free-start collisions for the Davies-Meyer compression function: the differences in the IV can be cancelled by the feed-forward operation of the Davies-Meyer mode if the difference in the plaintexts equals the difference in the ciphertexts.
Note that Hirose's double block length compression function uses the Davies-Meyer mode, which allows us to use a related-key differential for the underlying block cipher. However, the related-key differential trail for full AES-256 from [BKN09] does not have the same difference in the plaintext and ciphertext pairs, which is required for cancellation by the feed-forward operation in this mode. Assuming that a related-key differential trail with equal plaintext and ciphertext differences can be constructed for full AES-256 with probability $2^{-131}$ (the same as in [BKN09]), we can find collisions on full 14-round HCF-AES-256. To mount the collision attack on HCF-AES-256, we additionally need to satisfy the condition $\Delta h_0 = c$. If the attack requires $c$ to have 8 non-zero bytes at some specific positions, as in the attacks described earlier, this contributes an extra factor of $2^{-64}$ to the success probability. Therefore, the time complexity of the quantum free-start collision attack will be $\approx \sqrt{2^{195}} = 2^{97.5}$. On the other hand, applying the CNS algorithm [CNS17] to full HCF-AES-256 finds collisions with time complexity $2^{102.4}$ and $2^{51.2}$ classical memory. Quantifying the exact time/memory/data complexity of this attack, and further reducing its cost for full HCF-AES-256 in the related-key attack model, remains an interesting problem for future research.
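The comparison above is simple log2 arithmetic, reproduced by the sketch below (our own illustration; the function names are hypothetical). It inherits the text's assumption of a related-key trail with equal plaintext and ciphertext differences and probability $2^{-131}$, and it ignores the constant factors and per-iteration cost of Grover's algorithm.

```python
def log2_related_key_quantum(log2_p_trail: float = -131.0,
                             nonzero_c_bytes: int = 8) -> float:
    """log2 time of the hypothetical quantum free-start attack on full HCF-AES-256."""
    # matching each nonzero byte of c costs a factor 2^-8 in probability
    log2_p = log2_p_trail - 8 * nonzero_c_bytes
    # amplitude amplification needs about sqrt(1/p) iterations
    # (constant factors and the cost of one iteration are ignored here)
    return -log2_p / 2


def log2_cns(n: int = 256) -> tuple:
    """CNS17 generic quantum collision search: time 2^(2n/5), classical memory 2^(n/5)."""
    return 2 * n / 5, n / 5


print(log2_related_key_quantum())  # 97.5
print(log2_cns())                  # (102.4, 51.2)
```

Changing `nonzero_c_bytes` shows how strongly the result depends on the shape of $c$: each additional constrained byte adds 4 to the exponent of the quantum attack.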