Mixture Differential Cryptanalysis: a New Approach to Distinguishers and Attacks on round-reduced AES

At Eurocrypt 2017 the first secret-key distinguisher for 5-round AES, based on the “multiple-of-8” property, was presented. Although it allows one to distinguish a random permutation from an AES-like one, it seems rather hard to turn such a distinguisher into a key-recovery attack other than brute force. In this paper we introduce “Mixture Differential Cryptanalysis” on round-reduced AES-like ciphers, a way to translate the (complex) “multiple-of-8” 5-round distinguisher into a simpler and more convenient one (though on a smaller number of rounds). Given a pair of chosen plaintexts, the idea is to construct new pairs of plaintexts by mixing the generating variables of the original pair. We prove theoretically that for 4-round AES the two ciphertexts of the original pair belong to the same coset of a particular subspace if and only if the two ciphertexts of each new pair have the same property. This secret-key distinguisher, which is independent of the secret key, of the details of the S-Box and of the MixColumns matrix (except for the branch number equal to 5), can be used as a starting point for new key-recovery attacks on round-reduced AES. Besides a theoretical explanation, we also provide a practical verification of both the distinguisher and the attack.


Introduction
Block ciphers are certainly among the most important cryptographic primitives. They are designed by iterating an efficiently implementable round function many times in the hope that the resulting composition behaves like a randomly drawn permutation. As a compromise between efficiency and security, the round function is iterated just enough times to make sure that any symmetries and structural properties that might exist in the round function vanish.
One of the most important tools that a cryptanalyst has at hand when evaluating the security of ciphers or hash functions is, without doubt, differential cryptanalysis. Since its conception by Biham and Shamir [BS90,BS91] in their effort to break the Data Encryption Standard (DES), it has been successfully applied in many cases, so that any modern cipher is expected to come with strong security arguments against this attack.
With today's knowledge, designing a secure block cipher is a problem that is largely considered solved. With the AES in particular, we have at hand a very well analyzed and studied cipher that, after more than 20 years of investigation, still withstands all cryptanalytic attacks. However, new results on the AES still appear regularly, especially within the last couple of years (e.g. polytopic cryptanalysis [Tie16], the "multiple-of-8" distinguisher [GRR17a] and the yoyo distinguisher [RBH17]). While those papers do not pose any practical threat to the AES, they do give new insights into the internals of what is arguably the cipher that is responsible for the largest fraction of encrypted data worldwide.
The "multiple-of-8" distinguisher [GRR17a], proposed at Eurocrypt 2017 by Grassi, Rechberger and Rønjom, is the first 5-round secret-key distinguisher for AES that exploits a property which is independent of the secret key and of the details of the S-Box. It is based on a new structural property for up to 5 rounds of AES: by appropriate choices of a number of input pairs, it is possible to make sure that the number of times that the difference of the resulting output pairs lies in a particular subspace is always a multiple of 8. This distinguisher separates an AES permutation from a random one with a success probability greater than 99% using $2^{32}$ chosen texts and a computational cost of $2^{35.6}$ look-ups. On the other hand, since it is based on a property of the whole state at the output of AES, it is challenging to convert it into a key-recovery attack over more rounds, since e.g. it requires guessing the whole subkey of the last round.
In this paper we introduce "mixture differential cryptanalysis" on round-reduced AES-like ciphers, a way to translate the (complex) "multiple-of-8" 5-round distinguisher [GRR17a] into a simpler and more convenient one (though on a smaller number of rounds). As we are going to show, this new technique leads to a new distinguisher and to key-recovery attacks on 4- and 5-round AES (respectively) with data and computational complexity similar to those of other attacks in the literature.
This distinguisher and attack, both fully verified in practice, are also general enough to be applied to any AES-like cipher, and they might be valuable as a reference framework. In particular, many constructions employ reduced-round AES as part of their design (e.g. among many others, AEGIS [WP], one of the finalists of the on-going CAESAR competition [CAE], uses five AES round functions in the state update functions). Reduced versions of AES have nice and well-studied properties that can be favorably used as components of larger designs (see for instance Simpira [GM16]). As a result, distinguishers and attacks on 4-/5-round AES can also be useful in analyzing those primitives. To give a concrete example, in [BEK16] the authors exploit, in a new way, known properties of round-reduced AES to set up a new attack on ELmD [DN], another finalist of the on-going CAESAR competition.

Related Work
To the best of our knowledge, the concept of mixture differential cryptanalysis is new and has not been used in cryptanalysis before. Nonetheless there are other works that share some similarities with mixture differential cryptanalysis.
Before going on, we first recall the notion of secret-key distinguisher, one of the weakest attacks that can be launched against a secret-key cipher. In this attack, there are two oracles: one that simulates the cipher for which the cryptographic key has been chosen at random and one that simulates a truly random permutation. The adversary can query both oracles and her task is to decide which oracle is the cipher and which is the random permutation. The attack is considered to be successful if the number of queries required to make a correct decision is below a well defined level.
Differential Attacks. Differential attacks [BS90] exploit the fact that couples of plaintexts with certain differences yield other differences in the corresponding ciphertexts with a non-uniform probability distribution. The resulting pair of differences is called a differential. Such a property can be used both to distinguish a cipher permutation from a random one, and to recover the secret key. Possible variants of this attack/distinguisher are the truncated differential attack [Knu95], in which the attacker considers only part of the difference between pairs of texts (i.e. a differential attack where only part of the difference in the ciphertexts can be predicted), and the impossible differential attack [Knu98,BBS99], in which the attacker considers differentials with zero probability.
Table 1: Secret-Key Distinguishers for 4-round AES. The complexity is measured in minimum number of chosen plaintexts/ciphertexts (CP/CC) or/and adaptive chosen plaintexts/ciphertexts (ACP/ACC) which are needed to distinguish the AES permutation from a random one with probability higher than 95% (all distinguishers work both in the encryption and in the decryption mode). Time complexity is measured in equivalent encryptions (E), memory accesses (M) or XOR operations (XOR), using the common approximation 20 M ≈ 1 Round of Encryption. The distinguisher of this paper is in bold.
In the original version of differential cryptanalysis [BS90], a unique differential is exploited. A generalization of this attack is multiple differential cryptanalysis [BG11], where several input differences are considered together and the corresponding output differences may differ from one input difference to another, that is, the set of considered differentials has no particular structure.
The common feature of all these distinguishers/attacks is that the attacker focuses on the probability that a single pair of plaintexts with a certain input difference yields a certain difference in the corresponding pair of ciphertexts, working independently on each pair of texts.
Recent Results. Recently, new differential distinguishers have been proposed in the literature, namely polytopic cryptanalysis [Tie16] at Eurocrypt 2016 and the yoyo distinguisher on SPN constructions [RBH17] at Asiacrypt 2017, which present an important difference with respect to the previously recalled attacks. Instead of working on each couple of two (plaintext, ciphertext) pairs independently of the others as in the previous scenario, in these cases the attacker works on the relations that hold among the couples of pairs of texts. In other words, given a couple of two (plaintext, ciphertext) pairs with certain input/output differences, one studies how such a couple influences other couples of two (plaintext, ciphertext) pairs to satisfy particular input/output differences.
More precisely, polytopic cryptanalysis is similar to multiple differential cryptanalysis. However, as opposed to assuming independence of the differentials (which does not hold in general, as shown in [Mur11]), the authors explicitly take their correlation into account and use it in their framework, considering interdependencies between larger sets of texts as they traverse through the cipher.
The strategy exploited by the yoyo game on SPN constructions proposed at Asiacrypt 2017 is similar to the one that we are going to exploit to set up our new distinguisher. Given a pair of chosen plaintexts and the corresponding ciphertexts, the attacker constructs a new pair of ciphertexts related to the original ones by linear and differential relations. The authors prove that the pair of plaintexts corresponding to this new pair of ciphertexts satisfies, with prob. 1, a difference related "in some sense" to the input difference of the original pair of plaintexts, independently of the secret key. This allows one to distinguish e.g. round-reduced AES from a random permutation, or to set up a key-recovery attack.
Fig. 1: Consider N (plaintext, ciphertext) pairs (a). In a "classical" differential attack (b), one works independently on each couple of two (plaintext, ciphertext) pairs and exploits the probability that it satisfies a certain differential trail. In our attack (c), one divides the couples into non-random sets, and exploits particular relationships (based on differential trails) that hold among the couples that belong to the same set in order to set up a distinguisher.

Our Contribution
In this paper, we present "mixture differential cryptanalysis" on 4-round AES. This 4-round secret-key distinguisher, proposed in Sect. 4, is similar in nature to polytopic cryptanalysis and to the yoyo distinguishers just recalled.
Given plaintexts in the same coset of a subspace $\mathcal{C}$, the attacker first divides the couples of two (plaintext, ciphertext) pairs into sets of N ≥ 2 non-independent couples. These sets are defined such that particular relationships (that involve differential and linear relationships) hold among the plaintexts of the couples that belong to the same set. Due to the particular way in which these sets are defined (explained in detail in the following), we call our new technique Mixture Differential Cryptanalysis. As already pointed out, the way in which these sets are constructed resembles the "multiple-of-8" distinguisher [GRR17a] recently proposed at Eurocrypt 2017.
Such sets have the property that the two ciphertexts of a certain couple belong to the same coset of a particular subspace $\mathcal{M}$ if and only if the two ciphertexts of all the other couples in that set have the same property. In other words, it is not possible that the two ciphertexts of some couples belong to the same coset of $\mathcal{M}$ while the two ciphertexts of other couples in the same set do not. Since this last event can occur for a random permutation, it is possible to distinguish 4-round AES from a random permutation.
In more detail and referring to Fig. 1, given n chosen (plaintext, ciphertext) pairs, in a "classical" (differential) attack one works on each couple of two (plaintext, ciphertext) pairs independently of the others, case (b). In our distinguishers/attacks instead, one first divides the couples into (non-random) sets of N ≥ 2 couples, case (c), and then works on each set of couples independently of the other sets, exploiting the property just given.
We remark that our new mixture differential distinguisher is independent of the secret key (and of the key-schedule), of the details of the S-Box and of the MixColumns matrix. This distinguisher works both in the encryption and in the decryption direction, and it is general enough to be applied to any AES-like cipher. Compared to the yoyo distinguisher proposed at Asiacrypt 2017, which requires adaptive chosen plaintexts/ciphertexts, ours requires only (non-adaptive) chosen plaintexts/ciphertexts. Furthermore, we highlight that our 4-round distinguisher might be used as a starting point to set up new 5-round secret-key distinguishers on AES, as suggested in detail in [Gra17].
A New Key-Recovery Attack on 5-round AES-128. Finally, we show that mixture differential cryptanalysis is not only theoretically intriguing, but indeed relevant for practical cryptanalysis. In particular, in Sect. 5 we propose an attack on 5-round AES that exploits the distinguisher on 4 rounds proposed in Sect. 4. This attack has since been improved in [BODK+18], becoming the one with the lowest computational cost among the attacks currently present in the literature (that don't use adaptive chosen plaintexts/ciphertexts). In this attack, the attacker chooses plaintexts in the same coset of a particular subspace $\mathcal{D}$ which is mapped after one round into a coset of another subspace $\mathcal{C}$. Using the mixture differential distinguisher just introduced and the facts that
• the way in which the couples of two (plaintext, ciphertext) pairs are divided into sets depends on the (partially) guessed key,
• the behavior of a set for a wrongly guessed key is (approximately) the same as in the case of a random permutation,
she can filter out wrong key candidates and finally find the right one.

Description of AES
The Advanced Encryption Standard [DR02] is a Substitution-Permutation network that supports key sizes of 128, 192 and 256 bits. The 128-bit plaintext initializes the internal state, represented by a 4 × 4 matrix of bytes seen as values in the finite field $\mathbb{F}_{256}$, defined using the irreducible polynomial $x^8 + x^4 + x^3 + x + 1$. Depending on the version of AES, $N_r$ rounds are applied to the state: $N_r = 10$ for AES-128, $N_r = 12$ for AES-192 and $N_r = 14$ for AES-256. An AES round applies four operations to the state matrix:
• SubBytes (S-Box): applying the same 8-bit to 8-bit invertible S-Box 16 times in parallel on each byte of the state (provides non-linearity in the cipher);
• ShiftRows (SR): cyclic shift of each row (the i-th row is shifted by i bytes to the left);
• MixColumns (MC): multiplication of each column by a constant 4 × 4 invertible matrix over the field $GF(2^8)$ (together with the ShiftRows operation, it provides diffusion in the cipher);
• AddRoundKey (ARK): XORing the state with a 128-bit subkey.
One round of AES can be described as $R(x) = K \oplus MC \circ SR \circ \text{S-Box}(x)$. In the first round an additional AddRoundKey operation (using a whitening key) is applied, and in the last round the MixColumns operation is omitted.
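To make the linear layer concrete, the following is a minimal Python sketch of ShiftRows and MixColumns on a 4 × 4 byte state, using the field polynomial given above. The function names (`gmul`, `shift_rows`, `mix_column`, `mix_columns`) and the `state[r][c]` layout are our own illustrative choices, not taken from the paper's implementation.

```python
# Sketch of the AES linear layer on a 4x4 byte state (state[r][c] = row r, column c).
# Names and layout are illustrative assumptions, not the paper's code.

MIX = [[2, 3, 1, 1],
       [1, 2, 3, 1],
       [1, 1, 2, 3],
       [3, 1, 1, 2]]  # MixColumns matrix of AES


def gmul(a, b):
    """Multiply a and b in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:      # reduce as soon as the degree reaches 8
            a ^= 0x11B
        b >>= 1
    return result


def shift_rows(state):
    """Rotate row r by r bytes to the left."""
    return [[state[r][(c + r) % 4] for c in range(4)] for r in range(4)]


def mix_column(column):
    """Multiply one 4-byte column by the MixColumns matrix."""
    out = []
    for r in range(4):
        acc = 0
        for k in range(4):
            acc ^= gmul(MIX[r][k], column[k])
        out.append(acc)
    return out


def mix_columns(state):
    """Apply mix_column to each of the four columns of the state."""
    cols = [mix_column([state[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]
```

The S-Box and the key addition are omitted on purpose: the properties discussed in this paper do not depend on their details.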
Notation Used in the Paper. Let $x$ denote a plaintext, a ciphertext, an intermediate state or a key. Then $x_{i,j}$ with $i, j \in \{0, ..., 3\}$ denotes the byte in row $i$ and column $j$. We denote by $k^r$ the subkey of the r-th round (where $k^0$ is the secret key for AES-128). If only one subkey is used (e.g. the first subkey $k^0$), then we denote it by $k$ to simplify the notation. Finally, we denote by $R$ one round of AES, while we denote r rounds of AES by $R^r$. Lastly, in the paper we often use the term "partial collision" (or "collision") when two texts belong to the same coset of a given subspace $\mathcal{X}$.

Subspace Trails
Let $F$ denote a round function in an iterative block cipher and let $V \oplus a$ denote a coset of a vector space $V$. If $F(V \oplus a) = V \oplus a$, we say that $V \oplus a$ is an invariant coset of the subspace $V$ for the function $F$. This concept can be generalized to trails of subspaces [GRR17b], recently introduced as a generalization of invariant subspace cryptanalysis.
Definition 1. Let $(V_1, V_2, ..., V_{r+1})$ denote a set of $r + 1$ subspaces with $\dim(V_i) \le \dim(V_{i+1})$. If for each $i = 1, ..., r$ and for each $a_i$ there exists $a_{i+1}$ such that $F(V_i \oplus a_i) \subseteq V_{i+1} \oplus a_{i+1}$, then $(V_1, V_2, ..., V_{r+1})$ is a subspace trail of length $r$ for the function $F$. If all the previous relations hold with equality, the trail is called a constant-dimensional subspace trail.
This means that if $F^t$ denotes the application of $t$ rounds with fixed keys, then $F^t(V_1 \oplus a_1) = V_{t+1} \oplus a_{t+1}$. We refer to [GRR17b] for more details about the concept of subspace trails. Our treatment here is however meant to be self-contained.

Subspace Trails of AES
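Here we recall the subspace trails of AES presented in [GRR17b], working with vectors and vector spaces over $\mathbb{F}_{2^8}^{4\times 4}$. In the following, we denote by $\{e_{0,0}, ..., e_{3,3}\}$ the unit vectors of $\mathbb{F}_{2^8}^{4\times 4}$ (e.g. $e_{i,j}$ has a single 1 in row $i$ and column $j$). We recall that given a subspace $\mathcal{X}$, the cosets $\mathcal{X} \oplus a$ and $\mathcal{X} \oplus b$ are equal if and only if $a \oplus b \in \mathcal{X}$.
Definition 2. The column spaces $\mathcal{C}_i$ are defined as $\mathcal{C}_i = \langle e_{0,i}, e_{1,i}, e_{2,i}, e_{3,i} \rangle$.
For instance, $\mathcal{C}_0$ corresponds to the symbolic matrix
$$\mathcal{C}_0 = \left\{ \begin{bmatrix} x_1 & 0 & 0 & 0 \\ x_2 & 0 & 0 & 0 \\ x_3 & 0 & 0 & 0 \\ x_4 & 0 & 0 & 0 \end{bmatrix} \;\middle|\; x_1, x_2, x_3, x_4 \in \mathbb{F}_{2^8} \right\}.$$
Definition 3. The diagonal spaces $\mathcal{D}_i$ and the inverse-diagonal spaces $\mathcal{ID}_i$ are defined as $\mathcal{D}_i = SR^{-1}(\mathcal{C}_i)$ and $\mathcal{ID}_i = SR(\mathcal{C}_i)$.
For instance, $\mathcal{D}_0$ and $\mathcal{ID}_0$ correspond to the symbolic matrices
$$\mathcal{D}_0 \equiv \begin{bmatrix} x_1 & 0 & 0 & 0 \\ 0 & x_2 & 0 & 0 \\ 0 & 0 & x_3 & 0 \\ 0 & 0 & 0 & x_4 \end{bmatrix}, \qquad \mathcal{ID}_0 \equiv \begin{bmatrix} x_1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_2 \\ 0 & 0 & x_3 & 0 \\ 0 & x_4 & 0 & 0 \end{bmatrix}.$$
Definition 4. The mixed spaces $\mathcal{M}_i$ are defined as $\mathcal{M}_i = MC(\mathcal{ID}_i)$.
For instance, $\mathcal{M}_0$ corresponds to the symbolic matrix
$$\mathcal{M}_0 \equiv \begin{bmatrix} 2 \cdot x_1 & x_4 & x_3 & 3 \cdot x_2 \\ x_1 & x_4 & 3 \cdot x_3 & 2 \cdot x_2 \\ x_1 & 3 \cdot x_4 & 2 \cdot x_3 & x_2 \\ 3 \cdot x_1 & 2 \cdot x_4 & x_3 & x_2 \end{bmatrix}.$$
Definition 5. For $I \subseteq \{0, 1, 2, 3\}$, let $\mathcal{C}_I$, $\mathcal{D}_I$, $\mathcal{ID}_I$ and $\mathcal{M}_I$ be defined as
$$\mathcal{C}_I = \bigoplus_{i \in I} \mathcal{C}_i, \qquad \mathcal{D}_I = \bigoplus_{i \in I} \mathcal{D}_i, \qquad \mathcal{ID}_I = \bigoplus_{i \in I} \mathcal{ID}_i, \qquad \mathcal{M}_I = \bigoplus_{i \in I} \mathcal{M}_i.$$
As shown in detail in [GRR17b]$^4$:
• for any coset $\mathcal{D}_I \oplus a$ there exists a unique $b \in \mathcal{C}_I^\perp$ such that $R(\mathcal{D}_I \oplus a) = \mathcal{C}_I \oplus b$;
• for any coset $\mathcal{C}_I \oplus a$ there exists a unique $b \in \mathcal{M}_I^\perp$ such that $R(\mathcal{C}_I \oplus a) = \mathcal{M}_I \oplus b$.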
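The definitions above translate directly into membership tests for a difference $t^1 \oplus t^2$. The Python sketch below is only illustrative: it follows $\mathcal{D}_i = SR^{-1}(\mathcal{C}_i)$ and $\mathcal{ID}_i = SR(\mathcal{C}_i)$ literally, and it assumes the mixed space is $\mathcal{M}_i = MC(\mathcal{ID}_i)$ as in [GRR17b]. All function names are hypothetical.

```python
# Membership tests for the subspaces C_I, D_I, ID_I and M_I.
# diff[r][c] is the byte of the XOR difference in row r, column c.
# Assumption: M_i = MC(ID_i); function names are illustrative only.

MC = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
MC_INV = [[14, 11, 13, 9], [9, 14, 11, 13], [13, 9, 14, 11], [11, 13, 9, 14]]


def gmul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def apply_to_columns(matrix, state):
    """Left-multiply every column of the state by a 4x4 matrix over GF(2^8)."""
    out = [[0] * 4 for _ in range(4)]
    for r in range(4):
        for c in range(4):
            for k in range(4):
                out[r][c] ^= gmul(matrix[r][k], state[k][c])
    return out


def shift_rows(state, inverse=False):
    """ShiftRows (row r rotated left by r), or its inverse."""
    sign = -1 if inverse else 1
    return [[state[r][(c + sign * r) % 4] for c in range(4)] for r in range(4)]


def in_C(diff, I):
    """True if every column outside I is zero."""
    return all(diff[r][c] == 0 for r in range(4) for c in range(4) if c not in I)


def in_D(diff, I):
    """diff is in D_I = SR^{-1}(C_I) exactly when SR(diff) is in C_I."""
    return in_C(shift_rows(diff), I)


def in_ID(diff, I):
    """diff is in ID_I = SR(C_I) exactly when SR^{-1}(diff) is in C_I."""
    return in_C(shift_rows(diff, inverse=True), I)


def in_M(diff, I):
    """diff is in M_I = MC(ID_I) exactly when MC^{-1}(diff) is in ID_I."""
    return in_ID(apply_to_columns(MC_INV, diff), I)
```

For example, a difference supported only on the main diagonal passes `in_D(diff, {0})`, and applying `apply_to_columns(MC, ...)` to an element of $\mathcal{ID}_0$ yields a state accepted by `in_M(..., {0})`.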
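Theorem 1 ([GRR17b]). For each $I$ and for each $a \in \mathcal{D}_I^\perp$, there exist unique $b \in \mathcal{C}_I^\perp$ and $c \in \mathcal{M}_I^\perp$ (which depend on $a$ and on the secret key $k$) such that
$$R^2(\mathcal{D}_I \oplus a) = R(\mathcal{C}_I \oplus b) = \mathcal{M}_I \oplus c.$$
We refer to [GRR17b] for a complete proof of the Theorem. Moreover, note that if $\mathcal{X}$ is a generic subspace, $\mathcal{X} \oplus a$ is a coset of $\mathcal{X}$ and $x$ and $y$ are two elements of the (same) coset $\mathcal{X} \oplus a$, then $x \oplus y \in \mathcal{X}$. It follows that, for all $x, y$ and for all $I \subseteq \{0, 1, 2, 3\}$:
$$Prob(R^2(x) \oplus R^2(y) \in \mathcal{M}_I \mid x \oplus y \in \mathcal{D}_I) = 1. \qquad (2)$$
We finally recall that for each $I, J \subseteq \{0, 1, 2, 3\}$:
$$\mathcal{M}_I \cap \mathcal{D}_J = \{0\} \quad \text{if and only if} \quad |I| + |J| \le 4,$$
as demonstrated in [GRR17b]. It follows that
$$Prob(R^4(x) \oplus R^4(y) \in \mathcal{M}_I \mid x \oplus y \in \mathcal{D}_J,\ x \ne y) = 0$$
for all $I, J \subseteq \{0, 1, 2, 3\}$ with $|I| + |J| \le 4$.
$^4$ The complements of the subspaces $\mathcal{C}_I$, $\mathcal{D}_I$, $\mathcal{ID}_I$, $\mathcal{M}_I$ are simply the (respective) orthogonal complements.
We remark that all these results can be re-described using a more "classical" truncated differential notation$^5$, as formally pointed out in [BLN17]. To be more concrete, if two texts $t^1$ and $t^2$ are equal except for the bytes in the i-th diagonal$^6$ for each $i \in I$, then they belong to the same coset of $\mathcal{D}_I$. A coset of $\mathcal{D}_I$ corresponds to a set of $2^{32 \cdot |I|}$ texts with $|I|$ active diagonals. Again, two texts $t^1$ and $t^2$ belong to the same coset of $\mathcal{M}_I$ if the bytes of $MC^{-1}(t^1 \oplus t^2)$ in the i-th anti-diagonal for each $i \notin I$ are equal to zero. Similar considerations hold for the column space $\mathcal{C}_I$ and the inverse-diagonal space $\mathcal{ID}_I$. We finally introduce some notation that we largely use in the following.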
Moreover, we say that $t^1 < t^2$ if $t^1 \le t^2$ (with respect to the definition just given) and $t^1 \ne t^2$.
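Definition 7. Let $\mathcal{X}$ be one of the previous subspaces, that is $\mathcal{X} \in \{\mathcal{C}_I, \mathcal{D}_I, \mathcal{ID}_I, \mathcal{M}_I\}$, and let $x_0, ..., x_{n-1}$ be a basis of $\mathcal{X}$, where $n = 4 \cdot |I|$. Let $t$ be an element of an arbitrary coset of $\mathcal{X}$, that is $t \in \mathcal{X} \oplus a$ for arbitrary $a$. We say that $t$ is "generated" by the generating variables $(t_0, ..., t_{n-1})$, written $t \equiv (t_0, ..., t_{n-1})$, if and only if $t = a \oplus \bigoplus_{i=0}^{n-1} t_i \cdot x_i$. For instance, let $\mathcal{X} = \mathcal{C}_0 \equiv \langle e_{0,0}, e_{1,0}, e_{2,0}, e_{3,0} \rangle$, and let $p \in \mathcal{C}_0 \oplus a$. Then $p \equiv (p_0, p_1, p_2, p_3)$ if and only if $p = a \oplus p_0 \cdot e_{0,0} \oplus p_1 \cdot e_{1,0} \oplus p_2 \cdot e_{2,0} \oplus p_3 \cdot e_{3,0}$.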

"Multiple-of-8" Secret-Key Distinguisher for 5-round AES
The starting point of our secret-key distinguisher is the property proposed and exploited in [GRR17a] to set up the first 5-round secret-key distinguisher of AES (independent of the secret key). For this reason, in this section we recall the main idea of that paper, and we refer to [GRR17a] for a complete discussion. Consider a set of plaintexts in the same coset of the diagonal space $\mathcal{D}_I$, that is $2^{32 \cdot |I|}$ plaintexts with $|I|$ active diagonals, and the corresponding ciphertexts after 5 rounds. The 5-round AES distinguisher proposed in [GRR17a] exploits the fact that the number of different pairs of ciphertexts that belong to the same coset of $\mathcal{M}_J$ for a fixed $J$ (that is, the number of different pairs of ciphertexts that are equal on $|J|$ fixed anti-diagonals, omitting the final MixColumns operation) is always a multiple of 8 with probability 1, independently of the secret key, of the details of the S-Box and of the MixColumns matrix. In more detail, given a set of plaintexts/ciphertexts $(p^i, c^i)$ for $i = 0, ..., 2^{32 \cdot |I|} - 1$ (where all the plaintexts belong to the same coset of $\mathcal{D}_I$), the number of different pairs$^7$ of ciphertexts $(c^i, c^j)$ that satisfy $c^i \oplus c^j \in \mathcal{M}_J$ for a certain fixed $J \subset \{0, 1, 2, 3\}$ is a multiple of 8 with prob. 1. Since for a random permutation the same number doesn't have any special property (e.g. it has the same probability to be even or odd), this allows one to distinguish 5-round AES from a random permutation.
$^5$ Our choice to use the subspace trail notation to present our new distinguisher and attack is motivated by the fact that it allows us to describe them in a more formal way than using the "classical" notation.
$^6$ The i-th diagonal of a 4 × 4 matrix $A$ is defined as the elements that lie on row $r$ and column $c$ such that $r - c = i \bmod 4$. The i-th anti-diagonal of a 4 × 4 matrix $A$ is defined as the elements that lie on row $r$ and column $c$ such that $r + c = i \bmod 4$.
$^7$ Two pairs $(c^i, c^j)$ and $(c^j, c^i)$ are considered equivalent.
Since each coset of $\mathcal{D}_I$ is mapped into a coset of $\mathcal{M}_I$ after 2 rounds with prob. 1 (see Theorem 1) and vice-versa, in order to prove the result given in [GRR17a] it is sufficient to show that, given plaintexts in the same coset of $\mathcal{M}_I$, the number of collisions after one round in the same coset of $\mathcal{D}_J$ is a multiple of 8 (see [GRR17a] for details).
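Theorem 2 ([GRR17a]). Let $\mathcal{M}_I$ and $\mathcal{D}_J$ be the subspaces defined as before for certain fixed $I$ and $J$ with $1 \le |I| \le 3$. Given an arbitrary coset of $\mathcal{M}_I$, that is $\mathcal{M}_I \oplus a$ for a fixed $a$, consider all the $2^{32 \cdot |I|}$ plaintexts $p^i$ in it and the corresponding ciphertexts $c^i = R(p^i)$ after one round. The number of different pairs of ciphertexts $(c^i, c^j)$ such that $c^i \oplus c^j \in \mathcal{D}_J$ is a multiple of 8.
We refer to [GRR17a] for a detailed proof, and we limit ourselves here to recalling and highlighting the main concepts that are useful in the following.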
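Without loss of generality (w.l.o.g.), we focus on the case $|I| = 1$ and we assume $I = \{0\}$. Given two texts $p$ and $q$ in $\mathcal{M}_0 \oplus a$, by definition there exist $p_0, p_1, p_2, p_3 \in \mathbb{F}_{2^8}$ and $q_0, q_1, q_2, q_3 \in \mathbb{F}_{2^8}$ such that
$$p = a \oplus \begin{bmatrix} 2 \cdot p_0 & p_1 & p_2 & 3 \cdot p_3 \\ p_0 & p_1 & 3 \cdot p_2 & 2 \cdot p_3 \\ p_0 & 3 \cdot p_1 & 2 \cdot p_2 & p_3 \\ 3 \cdot p_0 & 2 \cdot p_1 & p_2 & p_3 \end{bmatrix}, \qquad q = a \oplus \begin{bmatrix} 2 \cdot q_0 & q_1 & q_2 & 3 \cdot q_3 \\ q_0 & q_1 & 3 \cdot q_2 & 2 \cdot q_3 \\ q_0 & 3 \cdot q_1 & 2 \cdot q_2 & q_3 \\ 3 \cdot q_0 & 2 \cdot q_1 & q_2 & q_3 \end{bmatrix},$$
where 2 ≡ 0x02 and 3 ≡ 0x03, or equivalently $p \equiv (p_0, p_1, p_2, p_3)$ and $q \equiv (q_0, q_1, q_2, q_3)$, see (5). First, we recall that if $1 \le r \le 3$ generating variables are equal, then the two texts cannot belong to the same coset of $\mathcal{D}_J$ for $|J| \le r$ after one round; this is due to the branch number of the MixColumns matrix (which is 5).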
Case: Different Generating Variables. If the two texts $p$ and $q$ defined as before are generated by different variables (i.e. $p_i \ne q_i$ for each $i = 0, ..., 3$), then they can belong to the same coset of $\mathcal{D}_J$ for a certain $J$ with $|J| \ge 1$ after one round. It is possible to prove that $p \equiv (p_0, p_1, p_2, p_3)$ and $q \equiv (q_0, q_1, q_2, q_3)$ satisfy $R(p) \oplus R(q) \in \mathcal{D}_J$ for $|J| \ge 1$ if and only if other pairs of texts generated by different combinations of the previous variables have the same property. A formal statement is provided in Lemma 2.
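Lemma 2. Let $p$ and $q$ be two different elements in $\mathcal{M}_I \oplus a$ (i.e. a coset of $\mathcal{M}_I$) for $I \subseteq \{0, 1, 2, 3\}$ and $|I| = 1$, with $p \equiv (p_0, p_1, p_2, p_3)$ and $q \equiv (q_0, q_1, q_2, q_3)$, such that $p_i \ne q_i$ for each $i = 0, ..., 3$. Independently of the secret key, of the details of the S-Box and of the MixColumns matrix, $R(p)$ and $R(q)$ belong to the same coset of a particular subspace $\mathcal{D}_J$ for $J \subseteq \{0, 1, 2, 3\}$ if and only if the pairs of texts in $\mathcal{M}_I \oplus a$ generated by the following combinations of variables
1. $(p_0, p_1, p_2, p_3)$ and $(q_0, q_1, q_2, q_3)$;
2. $(q_0, p_1, p_2, p_3)$ and $(p_0, q_1, q_2, q_3)$;
3. $(p_0, q_1, p_2, p_3)$ and $(q_0, p_1, q_2, q_3)$;
4. $(p_0, p_1, q_2, p_3)$ and $(q_0, q_1, p_2, q_3)$;
5. $(p_0, p_1, p_2, q_3)$ and $(q_0, q_1, q_2, p_3)$;
6. $(q_0, q_1, p_2, p_3)$ and $(p_0, p_1, q_2, q_3)$;
7. $(q_0, p_1, q_2, p_3)$ and $(p_0, q_1, p_2, q_3)$;
8. $(q_0, p_1, p_2, q_3)$ and $(p_0, q_1, q_2, p_3)$
have the same property.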
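The eight pairs considered in Lemma 2 can be enumerated mechanically: each arises by swapping a subset of the generating variables between $p$ and $q$, and a subset and its complement give the same unordered pair. The following Python sketch illustrates this; the function name `mixtures` is our own.

```python
from itertools import product


def mixtures(p, q):
    """Return the unordered pairs obtained by swapping generating variables
    between p and q. Position 0 is never swapped, so that a swap set and its
    complement (which give the same unordered pair) are counted only once."""
    pairs = []
    for mask in product((0, 1), repeat=len(p) - 1):
        swap = (0,) + mask
        a = tuple(q[i] if s else p[i] for i, s in enumerate(swap))
        b = tuple(p[i] if s else q[i] for i, s in enumerate(swap))
        pairs.append((a, b))
    return pairs
```

With four generating variables that all differ, `mixtures(p, q)` returns $2^4 / 2 = 8$ distinct pairs, which is where the factor 8 of the "multiple-of-8" property comes from.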
Case: Equal Generating Variables. Similar results can be obtained if one or two variables are equal. For the following, we focus on the case in which two variables are equal (the case of one equal variable is analogous).
Lemma 3. Let $p$ and $q$ be two different elements in $\mathcal{M}_I \oplus a$ for $I \subseteq \{0, 1, 2, 3\}$ and $|I| = 1$, with $p \equiv (p_0, p_1, p_2, p_3)$ and $q \equiv (q_0, q_1, q_2, q_3)$, such that $p_i \ne q_i$ for $i = 0, 1$ and $p_i = q_i$ for $i = 2, 3$ (similar for the other cases). Independently of the secret key, of the details of the S-Box and of the MixColumns matrix, $R(p)$ and $R(q)$ belong to the same coset of a particular subspace $\mathcal{D}_J$ for $J \subseteq \{0, 1, 2, 3\}$ if and only if the pairs of texts in $\mathcal{M}_I \oplus a$ generated by the following combinations of variables
1. $(p_0, p_1, z, w)$ and $(q_0, q_1, z, w)$;
2. $(p_0, q_1, z, w)$ and $(q_0, p_1, z, w)$,
where $z$ and $w$ can take any possible value in $\mathbb{F}_{2^8}$, have the same property.
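Case |I| = 2 and |I| = 3. Finally, we mention that similar considerations hold for the cases $|I| \ge 2$. W.l.o.g. consider $|I| = 2$ and assume $I = \{0, 1\}$ (the other cases are analogous). Given two texts $p$ and $q$ in the same coset of $\mathcal{M}_I$, that is $\mathcal{M}_I \oplus a$ for a given $a \in \mathcal{M}_I^\perp$, there exist $p_0, p'_0, p_1, p'_1, p_2, p'_2, p_3, p'_3 \in \mathbb{F}_{2^8}$ and $q_0, q'_0, q_1, q'_1, q_2, q'_2, q_3, q'_3 \in \mathbb{F}_{2^8}$ such that:
$$p = a \oplus MC \cdot \begin{bmatrix} p_0 & p'_1 & 0 & 0 \\ p'_0 & 0 & 0 & p_3 \\ 0 & 0 & p_2 & p'_3 \\ 0 & p_1 & p'_2 & 0 \end{bmatrix}, \qquad q = a \oplus MC \cdot \begin{bmatrix} q_0 & q'_1 & 0 & 0 \\ q'_0 & 0 & 0 & q_3 \\ 0 & 0 & q_2 & q'_3 \\ 0 & q_1 & q'_2 & 0 \end{bmatrix}.$$
As for the case $|I| = 1$, the idea is to consider all the possible combinations of the variables $(p_i, p'_i)$ and $(q_i, q'_i)$ for $i = 0, ..., 3$. In other words, the idea is to consider variables in $(\mathbb{F}_{2^8})^2 \equiv \mathbb{F}_{2^8} \times \mathbb{F}_{2^8}$ rather than in $\mathbb{F}_{2^8}$.
In the following, given texts in the same coset of $\mathcal{C}_I$ or $\mathcal{M}_I$ for $I \subseteq \{0, 1, 2, 3\}$, we recall that the number of couples of texts with $n$ "equal generating variable(s) in $(\mathbb{F}_{2^8})^{|I|}$" (as just defined) for $0 \le n \le 3$ is given by
$$\binom{4}{n} \cdot 2^{32 \cdot |I| - 1} \cdot \left(2^{8 \cdot |I|} - 1\right)^{4 - n},$$
as proved in App. A.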
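As a sanity check on this count, a direct counting argument gives $\binom{4}{n} \cdot 2^{32 \cdot |I| - 1} \cdot (2^{8 \cdot |I|} - 1)^{4-n}$ unordered couples with exactly $n$ equal generating variables: choose which $n$ variables coincide, pick their common values, pick distinct values for the remaining $4 - n$ variables, and halve to ignore the order of the two texts. The Python sketch below (function name ours) evaluates this expression and checks that summing over $n = 0, ..., 3$ recovers all couples of distinct texts in a coset.

```python
from math import comb


def couples_with_equal_vars(n, size_I):
    """Unordered couples of texts in one coset of C_I or M_I that share exactly
    n of their 4 generating variables, each variable living in (F_{2^8})^{|I|}."""
    values = 2 ** (8 * size_I)               # possible values of one variable
    return comb(4, n) * 2 ** (32 * size_I - 1) * (values - 1) ** (4 - n)


def all_couples(size_I):
    """All unordered couples of distinct texts in a coset of size 2^(32*|I|)."""
    return comb(2 ** (32 * size_I), 2)
```

For $|I| = 1$, for instance, `sum(couples_with_equal_vars(n, 1) for n in range(4))` equals `all_couples(1)`.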

Why is it (rather) hard to set up key-recovery attacks that exploit such distinguisher?
Given this 5-round distinguisher, a natural question is whether it can be exploited to set up a key-recovery attack on 6-round AES-128 which is better than a brute force one. A possible way is the following. Consider $2^{32}$ chosen plaintexts in the same coset of a diagonal space $\mathcal{D}_i$, and the corresponding ciphertexts after 6 rounds. A possibility is to guess the final key, decrypt the ciphertexts and check if the number of collisions in the same coset of $\mathcal{M}_J$ is a multiple of 8. If not, the guessed key is wrong. However, since a coset of $\mathcal{M}_J$ is mapped into the full space after one round, it seems hard to check this property one round earlier without guessing the entire key. It follows that it is rather hard to set up an attack other than brute force that directly exploits the 5-round distinguisher proposed in [GRR17a]. For comparison, note that this problem doesn't arise for the other distinguishers for up to 4-round AES present in the literature (e.g. the impossible differential or the integral ones), for which it is sufficient to guess only part of the secret key in order to verify whether the required property is satisfied.

New 4-round Secret-Key Distinguisher for AES
In this section, we re-exploit the property proposed in [GRR17a] to set up a new 4-round secret-key distinguisher for AES. Before we go into the details, we present the general idea.
As we have just seen, given $2^{32}$ plaintexts in the same coset of $\mathcal{M}_I$ for $|I| = 1$ and the corresponding ciphertexts after 1 round, that is $(p^i, c^i)$ for $i = 0, ..., 2^{32} - 1$ where $p^i \in \mathcal{M}_I \oplus a$ and $c^i = R(p^i)$, the number $n$ of different pairs of ciphertexts $(c^i, c^j)$ for $i \ne j$ that satisfy $c^i \oplus c^j \in \mathcal{D}_J$ is always a multiple of 8. This is due to the fact that if one pair of texts belongs to the same coset of $\mathcal{D}_J$ after one round, then other pairs of texts have the same property.
Thus, consider a pair of plaintexts $p^1$ and $p^2$ such that the corresponding texts after one round belong (or not) to the same coset of $\mathcal{D}_J$. As we have seen, there exist other pairs of plaintexts $\hat{p}^1$ and $\hat{p}^2$ whose ciphertexts after one round have the same property. The crucial point is that the pairs $(p^1, p^2)$ and $(\hat{p}^1, \hat{p}^2)$ are not independent, in the sense that the variables that generate the first pair of texts are the same that generate the other pairs, but in a different combination. The idea is to exploit this property in order to set up a new distinguisher for round-reduced AES. In other words, instead of only counting the number of collisions and checking that it is a multiple of 8 as in [GRR17a], the idea is to check whether these relationships between the variables that generate the plaintexts (whose ciphertexts belong or not to the same coset of a given subspace $\mathcal{M}_J$) hold or not.

Mixture Differential Distinguisher for 4-round AES
A formal description of the proposed Mixture Differential Distinguisher for 4-round AES is given in the following Lemma 8 .

Proof using the "super-Sbox" Notation
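First, we prove the previous result using the "super-Sbox" notation, introduced in [DR06] by the designers of AES, where
$$\text{super-Sbox}(\cdot) = \text{S-Box} \circ ARK \circ MC \circ \text{S-Box}(\cdot).$$
Consider two pairs of texts $(p^1, p^2)$ and $(\hat{p}^1, \hat{p}^2)$ in a coset of $\mathcal{C}_0 \cap \mathcal{D}_{0,3}$, that is $(\mathcal{C}_0 \cap \mathcal{D}_{0,3}) \oplus a$ for a fixed $a$, such that
$$p^1 \equiv (z^1, w^1), \qquad p^2 \equiv (z^2, w^2), \qquad \hat{p}^1 \equiv (z^1, w^2), \qquad \hat{p}^2 \equiv (z^2, w^1).$$
The goal is to prove that
$$R^4(p^1) \oplus R^4(p^2) \in \mathcal{M}_J \quad \text{if and only if} \quad R^4(\hat{p}^1) \oplus R^4(\hat{p}^2) \in \mathcal{M}_J.$$
Since $Prob(R^2(x) \oplus R^2(y) \in \mathcal{M}_I \mid x \oplus y \in \mathcal{D}_I) = 1$ (see (2)), this is equivalent to proving that
$$R^2(p^1) \oplus R^2(p^2) \in \mathcal{D}_J \quad \text{if and only if} \quad R^2(\hat{p}^1) \oplus R^2(\hat{p}^2) \in \mathcal{D}_J.$$
First of all, observe that $p^1 \oplus p^2 \in \mathcal{C}_0 \cap \mathcal{D}_{0,3} \subseteq \mathcal{D}_{0,3}$, and hence that $R^2(p^1) \oplus R^2(p^2) \in \mathcal{M}_{0,3}$. Since $\mathcal{M}_{0,3} \cap \mathcal{D}_J = \{0\}$ for $|J| \le 2$, the event $R^2(p^1) \oplus R^2(p^2) \in \mathcal{D}_J$ can occur if and only if $|J| = 3$.
As is well known, 2-round encryption can be rewritten using the super-Sbox notation as
$$R^2(\cdot) = ARK \circ MC \circ SR \circ \text{super-Sbox} \circ SR(\cdot).$$
Since the ShiftRows and MixColumns operations are linear, it is sufficient to prove that
$$\text{super-Sbox}(q^1) \oplus \text{super-Sbox}(q^2) = \text{super-Sbox}(\hat{q}^1) \oplus \text{super-Sbox}(\hat{q}^2),$$
where $q^i := SR(p^i)$ and $\hat{q}^i := SR(\hat{p}^i)$ for $i = 1, 2$. Since each column of $q^1$ and $q^2$ depends on different and independent variables, since the super-Sbox works independently on each column and since the XOR-sum is commutative, it follows that the two sides coincide column by column, which implies the thesis.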
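The column-wise argument can be illustrated with a small Python experiment. The sketch below is a toy model, not AES: each "column" is a single byte and the super-Sbox is replaced by four independent random byte permutations (the names `COLUMN_MAPS`, `toy_super_sbox` and `mixture_preserves_sum` are hypothetical). It only demonstrates that, when the two generating variables sit in different columns, mixing them leaves the XOR of the super-Sbox outputs unchanged, whatever the column maps are.

```python
import random

rng = random.Random(2018)  # fixed seed, only for reproducibility


def random_column_map():
    """A random permutation of byte values, standing in for one super-Sbox column."""
    table = list(range(256))
    rng.shuffle(table)
    return table


COLUMN_MAPS = [random_column_map() for _ in range(4)]


def toy_super_sbox(state):
    """Apply an independent map to each of the four (toy, 8-bit) columns."""
    return tuple(COLUMN_MAPS[c][v] for c, v in enumerate(state))


def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))


def mixture_preserves_sum(z1, w1, z2, w2, rest=(0x07, 0x09)):
    """Compare the output XOR of the original pair with that of the mixed pair.
    z lives in column 0, w in column 1; columns 2 and 3 hold constants."""
    q1, q2 = (z1, w1) + rest, (z2, w2) + rest
    q1_hat, q2_hat = (z1, w2) + rest, (z2, w1) + rest
    lhs = xor(toy_super_sbox(q1), toy_super_sbox(q2))
    rhs = xor(toy_super_sbox(q1_hat), toy_super_sbox(q2_hat))
    return lhs == rhs
```

Here `mixture_preserves_sum` returns `True` for every choice of inputs, because column 0 contributes the same XOR on both sides and column 1 contributes the same two terms in swapped order.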
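In order to distinguish 4-round AES from a random permutation, one has to check that $c^1 \oplus c^2 \in \mathcal{M}_J$ if and only if $\hat{c}^1 \oplus \hat{c}^2 \in \mathcal{M}_J$, where $c^i$ and $\hat{c}^i$ denote the ciphertexts of $p^i$ and $\hat{p}^i$ respectively. If this property is not satisfied for at least one couple, then it is possible to conclude that the analyzed permutation is a random one.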
As a result, in order to distinguish a random permutation from 4-round AES with probability higher than $pr$, it is sufficient that the previous event occurs for at least one couple of two pairs of texts with probability higher than $pr$ (in order to recognize the random permutation). It follows that one needs approximately $n$ different independent pairs of texts such that $pr \ge 1 - (1 - 2^{-29})^n$, that is
$$n \ge \frac{\log(1 - pr)}{\log(1 - 2^{-29})} \approx -2^{29} \cdot \log(1 - pr).$$
For $pr = 95\%$, one needs approximately $n \ge 2^{30.583}$ different independent pairs of texts, that is approximately 2 different cosets of $\mathcal{C}_0 \cap \mathcal{D}_{0,3}$, for a total data cost of $2^{16} \cdot 2 = 2^{17}$ chosen plaintexts.
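The data complexity can be re-derived numerically. The Python sketch below is a back-of-the-envelope check under the assumptions used in the text: for a random permutation the distinguishing event has probability $2 \cdot 4 \cdot 2^{-4w} \cdot (1 - 2^{-4w})$ for words of $w$ bits (about $2^{-29}$ for $w = 8$), and each coset of $\mathcal{C}_0 \cap \mathcal{D}_{0,3}$ with $2^{2w}$ texts provides $\frac{1}{2} \binom{2^{2w}}{2}$ independent pairs. The function names are ours.

```python
from math import ceil, comb, log, log1p


def pairs_needed(word_bits, pr):
    """Independent pairs n such that 1 - (1 - p_event)^n >= pr."""
    p_in = 2.0 ** (-4 * word_bits)           # ciphertext pair in M_J for one fixed J, |J| = 3
    p_event = 2 * 4 * p_in * (1 - p_in)      # one pair collides, its mixture does not (4 choices of J)
    return log(1 - pr) / log1p(-p_event)


def cosets_needed(word_bits, pr):
    """Number of cosets of C_0 ∩ D_{0,3} (2 active words) needed to collect enough pairs."""
    coset_size = 2 ** (2 * word_bits)
    independent_pairs = comb(coset_size, 2) / 2
    return ceil(pairs_needed(word_bits, pr) / independent_pairs)
```

With `word_bits=8` and `pr=0.95` this gives roughly $2^{30.58}$ pairs and 2 cosets, i.e. $2 \cdot 2^{16} = 2^{17}$ chosen plaintexts; with `word_bits=4` (small scale AES) it gives 2 cosets of $2^8$ texts each.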
Computational Cost. We only report here the computational cost of the distinguisher, and we refer to App. B for all the details. In order to implement the distinguisher, the idea is to re-order the ciphertexts using a particular partial order as defined in Def. 8, and to work in the way described in Algorithm 1. Instead of checking the previous property for all possible couples of texts, the idea is to check it only for the couples of texts for which the two ciphertexts belong to the same coset of $\mathcal{M}_J$. In other words, if $c^1 \oplus c^2 \in \mathcal{M}_J$, then one checks that $\hat{c}^1 \oplus \hat{c}^2 \in \mathcal{M}_J$ (prob. 1 for 4-round AES vs prob. $2^{-32}$ for a random permutation). Instead, if $c^1 \oplus c^2 \notin \mathcal{M}_J$, then one doesn't check that $\hat{c}^1 \oplus \hat{c}^2 \notin \mathcal{M}_J$. Note that the probability of this last event is very close for the AES and for the random permutation (prob. 1 for 4-round AES vs prob. $1 - 2^{-32}$ for a random permutation). In other words, checking that "if $c^1 \oplus c^2 \in \mathcal{M}_J$ then $\hat{c}^1 \oplus \hat{c}^2 \in \mathcal{M}_J$" is sufficient to distinguish 4-round AES from a random permutation.
The reason for this strategy is that it minimizes the computational cost, which is well approximated by $2^{23.09}$ table look-ups, or approximately $2^{16.75}$ four-round encryptions (assuming$^9$ 20 table look-ups ≈ 1 round of encryption), where we only recall that the cost to re-order a set of $n$ texts w.r.t. a given partial order is $O(n \cdot \log n)$ table look-ups.
Definition 8. Let $t^1$ and $t^2$ be two texts with $t^1 \ne t^2$. Text $t^1$ is less or equal than text $t^2$ w.r.t. the partial order $\preceq$ (i.e. $t^1 \preceq t^2$) if and only if one of the two following conditions is satisfied (indexes are taken modulo 4): where $<$ is defined in Def. 6.
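The collision-driven check described in this subsection can be sketched generically in Python. This is not Algorithm 1 of the paper: instead of sorting w.r.t. the partial order, it groups ciphertexts with a hash map, and both the cipher and the coset test are passed in as callables (`encrypt` and `coset_key` are hypothetical names; `coset_key(c)` is assumed to be a linear projection that is equal for two ciphertexts exactly when they lie in the same coset). The toy "ciphers" at the bottom are stand-ins used only to exercise the logic: `separable` satisfies the mixture property by construction, `random_like` behaves like a random function.

```python
import random
from collections import defaultdict
from itertools import combinations


def mixture_property_holds(encrypt, coset_key, values):
    """Return False as soon as a colliding pair is found whose mixture does not collide."""
    buckets = defaultdict(list)
    for z in values:
        for w in values:
            buckets[coset_key(encrypt(z, w))].append((z, w))
    for texts in buckets.values():
        # only pairs whose ciphertexts already collide are examined
        for (z1, w1), (z2, w2) in combinations(texts, 2):
            if z1 == z2 or w1 == w2:
                continue  # mixing would reproduce the same pair
            if coset_key(encrypt(z1, w2)) != coset_key(encrypt(z2, w1)):
                return False
    return True


# Toy stand-ins (assumptions for illustration, not AES).
rng = random.Random(4)
VALUES = range(32)
F = [rng.getrandbits(12) for _ in VALUES]
G = [rng.getrandbits(12) for _ in VALUES]
TABLE = {(z, w): rng.getrandbits(12) for z in VALUES for w in VALUES}


def separable(z, w):
    """XOR-separable toy map: has the mixture property for any linear coset_key."""
    return F[z] ^ G[w]


def random_like(z, w):
    """Lookup into a random table: the mixture property fails almost surely."""
    return TABLE[(z, w)]


def coset_key(c):
    """Linear projection onto the 6 low bits; equal keys mean 'same coset'."""
    return c & 0x3F
```

Calling `mixture_property_holds(separable, coset_key, VALUES)` plays the role of the AES case, while the same call with `random_like` plays the role of the random permutation, for which a violating couple is found after inspecting only a few collisions.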

Practical Verification
Using a C/C++ implementation (see footnote 10), we have practically verified the distinguishers just described both for full size AES and for a small scale variant of AES, as presented in [CMR05]. While for full size AES each word is composed of 8 bits, in the small scale variant each word is composed of 4 bits (we refer to [CMR05] for a complete description of this small scale AES). We highlight that the previous results hold in exactly the same way for this small scale variant of AES, since the previous argument is independent of whether each word of AES is composed of 4 or 8 bits.
The distinguisher just presented works in the same way for full and small scale AES, and it is able to distinguish AES from a random permutation using $2 \cdot (2^8)^2 = 2^{17}$ chosen plaintexts in the first case and $2 \cdot (2^4)^2 = 2^9$ in the second one (i.e. 2 cosets of $\mathcal{C}_0 \cap \mathcal{D}_{0,3}$, each one of size $2^{16}$ and $2^8$ respectively for full and small scale AES; see footnote 11), as expected. For full size AES, while the theoretical computational cost is $2^{23}$ table look-ups, the practical one is on average $2^{22}$ in the case of a random permutation and $2^{24}$ in the case of an AES permutation. We emphasize that for a random permutation, it is sufficient to find one couple of two pairs of texts that does not satisfy the required property (to recognize the random permutation). In the case of the AES permutation, the difference between the theoretical and the practical cases (i.e. a factor 2) is due to the fact that the cost of the merge sort algorithm is $O(n \cdot \log n)$ and to the definition of the big $O(\cdot)$ notation (see footnote 12). For the small scale AES, using 2 different initial cosets of $\mathcal{C}_0 \cap \mathcal{D}_{0,3}$, the theoretical computational cost is well approximated by $2 \cdot 4 \cdot 2^8 \cdot (\log 2^8 + 1) \simeq 2^{14.2}$ table look-ups. The practical cost is approximately $2^{13.5}$ for the case of a random permutation and $2^{15}$ for the AES case.

Footnote 9. We highlight that even if this approximation is not formally correct (the size of the table of an S-Box look-up is smaller than the size of the table used for our distinguisher), it allows a comparison between our distinguishers and the others currently present in the literature. This approximation is widely used in the literature.

Footnote 10. The source code of the distinguishers/attacks is available at https://github.com/Krypto-iaik/Attacks_AES

Footnote 11. Following the same analysis proposed in Sect. 4.1, here we show that 2 initial cosets are necessary to set up the attack also for the small scale case. Using the same notation of Sect. 4.1.1, the probability that $R^4(p^1) \oplus R^4(p^2) \in \mathcal{M}_J$ and $R^4(\hat{p}^1) \oplus R^4(\hat{p}^2) \notin \mathcal{M}_J$ (or vice-versa) for a (small scale) random permutation is $2 \cdot 4 \cdot 2^{-16} \cdot (1 - 2^{-16}) \simeq 2^{-13}$. It follows that one needs $n \geq 2^{14.583}$ different independent pairs of texts to set up the attack with probability higher than 95%, that is approximately 2 different cosets $\mathcal{C}_0 \cap \mathcal{D}_{0,3}$ (note that for each coset it is possible to construct $\frac{1}{2} \cdot \binom{2^8}{2} \approx 2^{14}$ independent pairs of texts).

Generic Mixture Differential Distinguishers for 4-round AES
Using results presented in [GRR17a] and recalled in detail in Sect. 3, it is possible to set up alternative 4-round "mixture differential" distinguishers also for any pair of plaintexts $p^1$ and $p^2$ that have different generating variables or that belong to the same coset of a subspace $\mathcal{C}_I$ for each $I \subseteq \{0, 1, 2, 3\}$. For the sake of simplicity, we do not list all possible cases, but limit ourselves to (formally) presenting two cases that are used to set up new secret-key distinguishers (see [Gra17] for details) and new key-recovery attacks for AES (see Sect. 5). The proof of the following distinguishers is based on the one just proposed, adapted to the analyzed case. As before, the following distinguishers also work in both the decryption and encryption direction (see footnote 13).
The proof of this result is equivalent to the one proposed in Sect. 4.1.1. In particular, let $q^1 = SR(p^1)$ and $q^2 = SR(p^2)$ as before. If a column of $q^1$ is equal to the corresponding column of $q^2$, it follows that the difference super-Sbox($q^1$) ⊕ super-Sbox($q^2$) is independent of the value of that column. As a result, the difference $R^2(p^1) \oplus R^2(p^2)$ is independent of the generating variables which are equal for $p^1$ and $p^2$. It follows that (z 2 , w 1 , x, y)),
This result is the starting point for a new 5-round secret-key distinguisher of AES, as proposed in [Gra17]. We finally emphasize that the previous result is based on Lemma 3 (proposed in [GRR17a]).
Starting Point for Key-Recovery Attack of Sect. 5. As second case, we consider two plaintexts in a coset of C 0 (or more generally C I for |I| = 1) generated by different generating variables.
The following event

holds with prob. 1 for 4-round AES, independently of the secret key, of the details of the S-Box and of the MixColumns matrix (except for the branch number equal to 5).
The proof of this result is equivalent to the one proposed in Sect. 4.1.1. In particular, let $q^1 = SR(p^1)$ and $q^2 = SR(p^2)$ as before. Since (1st) the super-Sbox(·) works independently on each column of $q^1$ and $q^2$, (2nd) the columns of $q^1$ and $q^2$ depend on different and independent variables and (3rd) the XOR sum is commutative, it follows that
This result is used in Sect. 5 to set up a new key-recovery attack on 5-round AES. We finally emphasize that the previous result is based on Lemma 2 (proposed in [GRR17a]).
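The three facts used in the proof can be illustrated on a toy model. The sketch below is not AES: each column is processed by an independent random look-up table, a hypothetical stand-in for the super-Sbox restricted to one generating variable. It enumerates the mixtures of a pair and checks that the output XOR difference does not depend on which text receives which variable.

```python
import random
from itertools import product

def mixtures(p1, p2):
    """All unordered pairs obtained by exchanging generating variables between p1 and p2.

    The first variable is never exchanged, so (q1, q2) and (q2, q1) are not counted twice.
    The first returned pair is the original one.
    """
    pairs = []
    for mask in product((0, 1), repeat=len(p1)):
        if mask[0] == 1:
            continue
        q1 = tuple(b if m else a for a, b, m in zip(p1, p2, mask))
        q2 = tuple(a if m else b for a, b, m in zip(p1, p2, mask))
        pairs.append((q1, q2))
    return pairs

# Toy column-wise map: one independent random table per column (assumption, not AES).
rng = random.Random(1)
tables = [[rng.randrange(2 ** 32) for _ in range(256)] for _ in range(4)]

def columnwise(text):
    return tuple(tables[j][text[j]] for j in range(4))

def xor_diff(a, b):
    return tuple(x ^ y for x, y in zip(a, b))

p1, p2 = (0x01, 0x23, 0x45, 0x67), (0x89, 0xAB, 0xCD, 0xEF)
reference = xor_diff(columnwise(p1), columnwise(p2))
pairs = mixtures(p1, p2)
invariant = all(xor_diff(columnwise(q1), columnwise(q2)) == reference
                for q1, q2 in pairs)
print(len(pairs), invariant)
```

With four distinct generating variables there are eight such pairs, the original one plus the seven others used later in the key-recovery attack.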

Comparison with Other 4-round Secret-Key Distinguishers
Here we highlight the major differences with respect to the other 4-round AES secret-key distinguishers present in the literature. Omitting the integral one (which exploits a completely different property), we focus on the impossible and the truncated differential distinguishers, on polytopic cryptanalysis, on the "multiple-of-8" distinguisher (adapted, in a natural way, to the 4-round case) and on the yoyo distinguisher.
Impossible Differential. The impossible differential distinguisher is based on Prop. 1, that is, it exploits the property that $\mathcal{M}_I \cap \mathcal{D}_J = \{0\}$ for $|I| + |J| \leq 4$. In our case, we consider plaintexts in the same coset of $\mathcal{C}_0 \cap \mathcal{D}_I \subseteq \mathcal{D}_I$ where $|I| \geq 2$ (e.g. $I = \{0, 3\}$) and look for collisions in $\mathcal{M}_J$ with $|J| = 3$. Since $|I| + |J| \geq 5$, the property exploited by the impossible differential distinguisher cannot be applied here.
Truncated Differential. The truncated differential distinguisher has instead some aspects in common with our distinguisher. In this case, given pairs of plaintexts with a certain difference on certain bytes (i.e. that belong to the same coset of a subspace $\mathcal{X}$), one considers the probability that the corresponding ciphertexts belong to the same coset of a subspace $\mathcal{Y}$. For 2-round AES it is possible to exploit truncated differential trails with probability 1, while for the 3-round case there exist truncated differential trails with probability lower than 1 but higher than for the random case (see footnote 14); in both cases, $\mathcal{X} \equiv \mathcal{D}_I$ and $\mathcal{Y} \equiv \mathcal{M}_J$.
Our distinguisher works in a similar way and exploits a similar property. However, instead of working with a single couple of texts independently of the others, in our distinguisher one basically considers sets of 2 "non-independent" couples of texts and exploits the relationships that hold among the couples of texts that belong to the same set.
Polytopic Cryptanalysis. Polytopic cryptanalysis [Tie16] has been introduced by Tiessen at Eurocrypt 2016, and it can be viewed as a generalization of standard differential cryptanalysis. Consider a set of $d \geq 2$ couples of plaintexts $(p^0, p^0 \oplus \alpha_1), (p^0, p^0 \oplus \alpha_2), \ldots, (p^0, p^0 \oplus \alpha_d)$ with one plaintext in common (namely $p^0$), called a d-poly. The idea of polytopic cryptanalysis is to exploit the probability that the input set of differences $\alpha \equiv (\alpha_1, \alpha_2, \ldots, \alpha_d)$ is mapped into an output set of differences $\beta \equiv (\beta_1, \beta_2, \ldots, \beta_d)$ after $r$ rounds. If this probability, which depends on the S-Box details (see footnote 15), is different from the corresponding probability in the case of a random permutation, it is possible to set up distinguishers or key-recovery attacks. Impossible polytopic cryptanalysis focuses on the case in which the probability of the previous event is zero. In [Tie16], an impossible 8-polytopic is proposed for 2-round AES, which allows to set up key-recovery attacks on 4- and 5-round AES.
Our proposed distinguisher works in a similar way, since also in our case we consider sets of "non-independent" couples of texts and we focus on the input/output differences. However, instead of working with a set of couples of plaintexts with one plaintext in common, we consider sets of couples of texts for which particular relationships between the generating variables of the texts hold. Moreover, instead of considering the probability that "generic" input differences $\alpha$ are mapped into output differences $\beta$, the way in which the texts are divided into sets guarantees that the two ciphertexts of all couples either satisfy or do not satisfy an output (truncated) difference independently of the S-Box details (that is, it is not possible that some of them satisfy this output difference and some others do not).
"Multiple-of-8" Distinguisher. The "multiple-of-8" distinguisher [GRR17a] can be adapted to the 4-round case, e.g. by considering plaintexts in the same coset of $\mathcal{C}_J$, counting the number of collisions of the ciphertexts in the same coset of $\mathcal{M}_I$ and checking whether or not it is a multiple of 8. Since our distinguisher exploits more information (that is, the relationships that hold among the generating variables of the couples of plaintexts in the same set, besides the fact that the previous number is a multiple of 8), its data and computational costs are lower than those of [GRR17a], in particular $2^{17}$ chosen plaintexts/ciphertexts instead of $2^{33}$ and approximately $2^{23}$ table look-ups instead of $2^{40}$.
Yoyo Distinguisher. The basic idea exploited by the yoyo distinguisher [RBH17] proposed at Asiacrypt 2017 is similar to the one exploited by our distinguisher. Consider 4-round AES, where the initial and the final ShiftRows and the final MixColumns operations are omitted (see footnote 16). Given a pair of plaintexts in the same coset of a column space $\mathcal{C}_I$, that is $p^1, p^2 \in \mathcal{C}_I \oplus a$, consider the corresponding ciphertexts $c^1$ and $c^2$ after 4 rounds. In the yoyo game, the idea is to construct a new pair of ciphertexts $\hat{c}^1$ and $\hat{c}^2$ by swapping the columns of $c^1$ and $c^2$. E.g., if $c^i \equiv (c^i_0, c^i_1, c^i_2, c^i_3)$ for $i = 1, 2$, where $c^i_j$ denotes the $j$-th column of $c^i$, one can define the new pair of ciphertexts as $\hat{c}^1 \equiv (c^2_0, c^1_1, c^1_2, c^1_3)$ and $\hat{c}^2 \equiv (c^1_0, c^2_1, c^2_2, c^2_3)$. As proved in [RBH17], the corresponding plaintexts $\hat{p}^1 = R^{-4}(\hat{c}^1)$ and $\hat{p}^2 = R^{-4}(\hat{c}^2)$ belong to the same coset of $\mathcal{C}_I$ with prob. 1 for 4-round AES (that is, $\hat{p}^1 \oplus \hat{p}^2 \in \mathcal{C}_I$ with prob. 1), while this happens with prob. $2^{-32 \cdot (4 - |I|)}$ for a random permutation.

Footnote 15. We mention that the probability of polytopic trails is usually much lower than the probability of trails in differential cryptanalysis, that is, simple polytopic cryptanalysis cannot in general outperform standard differential cryptanalysis (see Sect. 2 of [Tie16] for details).

Footnote 16. The distinguisher works as well in the case in which these linear operations are not omitted. We refer to [RBH17] for all the details.
Our distinguisher and the yoyo one are very similar. Both exploit particular relationships that hold among the generating variables of a pair of texts, and particular properties which depend on such relations, to distinguish 4-round AES from a random permutation. However, we emphasize that while the yoyo distinguisher requires adaptive chosen ciphertexts in order to construct new pairs of texts related to the original one, in our case such new pairs of texts are constructed directly from the chosen plaintexts. In other words, our distinguisher does not require adaptive chosen plaintexts/ciphertexts.

New Key-Recovery Attack on 5-round AES
The modified version of the previous 4-round secret-key distinguisher proposed in Sect. 4.2 can be used as starting point to set up a new (practically verified) key-recovery attack on 5-round AES.
W.l.o.g. consider two plaintexts $p^1$ and $p^2$ in the same coset of $\mathcal{D}_0$, e.g. $\mathcal{D}_0 \oplus a$ for $a \in \mathcal{D}_0^{\perp}$, such that $p^i = x^i \cdot e_{0,0} \oplus y^i \cdot e_{1,1} \oplus z^i \cdot e_{2,2} \oplus w^i \cdot e_{3,3} \oplus a$, or equivalently $p^i \equiv (x^i, y^i, z^i, w^i)$. The idea is to filter wrongly guessed keys of the first round by exploiting the previous distinguisher.
In particular, given plaintexts in the same coset of $\mathcal{D}_0$, the idea of the attack is simply to guess the 4 bytes of the first diagonal of the secret key $k$, that is $k_{i,i}$ for each $i \in \{0, 1, 2, 3\}$, to (partially) compute $R_k(p^1)$ and $R_k(p^2)$, and to exploit the following consideration: if the guessed key is the right one, then $R^5(p^1) \oplus R^5(p^2) \in \mathcal{M}_J$ if and only if there exist other pairs of texts $R_k(q^1)$ and $R_k(q^2)$ with the same property, that is $R^5(q^1) \oplus R^5(q^2) \in \mathcal{M}_J$, where $R_k(q^1)$ and $R_k(q^2)$ are defined by a different combination of the generating variables of $R_k(p^1)$ and $R_k(p^2)$. If this property is not satisfied, then by the distinguisher just proposed it is possible to claim that the guessed key is a wrong candidate. As we are going to show, this attack works because the variables that define the (other) pairs of texts $R_k(q^1)$ and $R_k(q^2)$ depend on the guessed key (besides on the texts $p^1$ and $p^2$).

Details of the Attack
In the following we give all the details of the attack. As for the distinguisher just presented, consider a pair of texts $p^1$ and $p^2$ in the same coset of $\mathcal{D}_0$ such that
• $c^1 \oplus c^2 \equiv R^5(p^1) \oplus R^5(p^2) \in \mathcal{M}_J$ (observe that this condition is independent of the (partially) guessed key);
• $R(p^i) \equiv (\hat{x}^i, \hat{y}^i, \hat{z}^i, \hat{w}^i)$ for $i = 1, 2$ as before, s.t. $\hat{x}^1 \neq \hat{x}^2$, $\hat{y}^1 \neq \hat{y}^2$, $\hat{z}^1 \neq \hat{z}^2$ and $\hat{w}^1 \neq \hat{w}^2$.
For completeness, we emphasize that the attack works even if one or two generating variables of $R(p^1)$ and $R(p^2)$ are equal (e.g. if two generating variables are equal, in the following it is sufficient to exploit Lemma 3 instead of Lemma 2). We limit ourselves to discussing the case in which the generating variables are all different only for the sake of simplicity, and since this is the event that happens with highest probability (the probability that all the generating variables are different is $[(256 \cdot 255)/256^2]^4 = 255^4/256^4 \simeq 98.45\%$). Due to the definition of $\hat{x}$, note that the second condition depends on the (partially) guessed key.
Why does the attack work? Wrong-Key Randomization Hypothesis! One of the assumptions required by the proposed attack is the "wrong-key randomization hypothesis". This hypothesis states that decrypting one or several rounds with a wrong key guess creates a function that behaves like a random function. For our setting, we formulate it as follows:

Wrong-key randomization hypothesis. When the pairs of (intermediate) texts $R_k(q^1)$ and $R_k(q^2)$ are generated using a wrongly guessed key, the probability that the resulting pairs of ciphertexts satisfy the required property is equal to the probability given for the case of a random permutation.
In the following we show that such assumption holds. The crucial point is that the new pairs of texts R k (q 1 ) and R k (q 2 ) (and the way in which they are constructed) depend on the guessed key.
In the proposed attack, the wrong-key randomization hypothesis follows immediately from the definition of the generating variables and from the fact that the S-Box is a non-linear operation. To have more evidence of this fact, let $k$ be the secret key and $\tilde{k}$ be a guessed key. Given $R_k(p^1) \equiv (x^1, y^1, z^1, w^1)$ and $R_k(p^2) \equiv (x^2, y^2, z^2, w^2)$ in $\mathcal{C}_0 \oplus b$ as before, the generating variables of $R_{\tilde{k}}(q^1) \equiv (\tilde{x}^1, \tilde{y}^1, \tilde{z}^1, \tilde{w}^1)$ and $R_{\tilde{k}}(q^2) \equiv (\tilde{x}^2, \tilde{y}^2, \tilde{z}^2, \tilde{w}^2)$ are related to them by $[\tilde{x}^i, \tilde{y}^i, \tilde{z}^i, \tilde{w}^i] = [x^h, y^j, z^k, w^l]$ for certain $h, j, k, l \in \{1, 2\}$. For a wrongly guessed key $\tilde{k} \neq k$, the relations among the generating variables $[\tilde{x}^i, \tilde{y}^i, \tilde{z}^i, \tilde{w}^i] = [x^h, y^j, z^k, w^l]$ do not hold (see footnote 17). It follows that if $k \neq \tilde{k}$, then the attacker is considering random pairs of texts, which implies that the required property is, in general, not satisfied (as for the case of a random permutation).

    Data: 1 coset of $\mathcal{D}_0$ (e.g. $\mathcal{D}_0 \oplus a$ for $a \in \mathcal{D}_0^{\perp}$) and corresponding ciphertexts after 5 rounds (more generally, a coset of $\mathcal{D}_i$ for $i \in \{0, 1, 2, 3\}$)
    Result: 4 bytes of the secret key, $(k_{0,0}, k_{1,1}, k_{2,2}, k_{3,3})$
    let $(p^i, c^i)$ for $i = 0, \ldots, 2^{32} - 1$ be the $2^{32}$ (plaintexts, ciphertexts) of $\mathcal{D}_0 \oplus a$;
    do
        find indexes $j$ and $h$ s.t. $c^j \oplus c^h \in \mathcal{M}_I$;
        for each one of the $2^{32}$ combinations of $\tilde{k} = (k_{0,0}, k_{1,1}, k_{2,2}, k_{3,3})$ do
            (partially) compute $R_{\tilde{k}}(p^j)$ and $R_{\tilde{k}}(p^h)$;
            flag ← 0;
            for each couple $(q^1, R^5(q^1))$ and $(q^2, R^5(q^2))$ where $R_{\tilde{k}}(q^1)$ and $R_{\tilde{k}}(q^2)$ are constructed by a different combination of the generating variables of $R_{\tilde{k}}(p^j)$ and $R_{\tilde{k}}(p^h)$ do
                if $R^5(q^1) \oplus R^5(q^2) \notin \mathcal{M}_I$ then
                    flag ← 1;
                    next combination of $(k_{0,0}, k_{1,1}, k_{2,2}, k_{3,3})$;
                end
            end
            if flag = 0 then
                identify $(k_{0,0}, k_{1,1}, k_{2,2}, k_{3,3})$ as candidate of the key;
            end
        end
    while more than a single candidate of the key is found (repeat the procedure for different indexes $j$, $h$ (and $I$)) // usually not necessary: only one candidate is found
    return $(k_{0,0}, k_{1,1}, k_{2,2}, k_{3,3})$
Algorithm 2: 5-round AES Key-Recovery Attack. The attack exploits the 4-round distinguisher presented in Sect. 4.2. For the sake of simplicity, in this pseudo-code we limit ourselves to describing the attack on 4 bytes, i.e. 1 diagonal of the secret key (the same attack can be used to recover the entire key).
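The dependence of the new pairs on the guessed key can be made concrete with a small sketch of the first round restricted to one diagonal. It assumes that $R_k(\cdot)$ starts with the addition of the guessed diagonal bytes, followed by S-Box and MixColumns, and it ignores the constant contributed by the next round key (which only shifts the coset). All function names are hypothetical; the S-Box is generated from its algebraic definition to keep the block self-contained.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """a^254, i.e. the inverse of a in GF(2^8) (0 is mapped to 0)."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF

SBOX = []
for a in range(256):
    b = gf_inv(a)
    SBOX.append(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63)
INV_SBOX = [0] * 256
for a, s in enumerate(SBOX):
    INV_SBOX[s] = a

MC = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
INV_MC = [[14, 11, 13, 9], [9, 14, 11, 13], [13, 9, 14, 11], [11, 13, 9, 14]]

def mul_col(matrix, col):
    return tuple(gf_mul(matrix[r][0], col[0]) ^ gf_mul(matrix[r][1], col[1])
                 ^ gf_mul(matrix[r][2], col[2]) ^ gf_mul(matrix[r][3], col[3])
                 for r in range(4))

def round_vars(diag, key):
    """Generating variables after one round: MC applied to S-Box(diag XOR key)."""
    return mul_col(MC, [SBOX[d ^ k] for d, k in zip(diag, key)])

def invert_round(gen_vars, key):
    """Diagonal bytes of the plaintext whose first-round variables are gen_vars."""
    return tuple(INV_SBOX[s] ^ k for s, k in zip(mul_col(INV_MC, gen_vars), key))

def mix(g1, g2, mask):
    """Exchange the variables selected by mask between g1 and g2."""
    h1 = tuple(b if m else a for a, b, m in zip(g1, g2, mask))
    h2 = tuple(a if m else b for a, b, m in zip(g1, g2, mask))
    return h1, h2

def mixed_plaintexts(p1, p2, guess, mask):
    h1, h2 = mix(round_vars(p1, guess), round_vars(p2, guess), mask)
    return invert_round(h1, guess), invert_round(h2, guess)

def is_mixture(g1, g2, h1, h2):
    return all({a, b} == {c, d} for a, b, c, d in zip(g1, g2, h1, h2))

secret = (0x2B, 0x7E, 0x15, 0x16)      # arbitrary example key bytes
wrong = (0x2A, 0x7E, 0x15, 0x16)       # differs from `secret` in one byte
p1, p2 = (0x00, 0x11, 0x22, 0x33), (0x44, 0x55, 0x66, 0x77)
mask = (0, 1, 0, 1)
for guess in (secret, wrong):
    q1, q2 = mixed_plaintexts(p1, p2, guess, mask)
    ok = is_mixture(round_vars(p1, secret), round_vars(p2, secret),
                    round_vars(q1, secret), round_vars(q2, secret))
    print("right key" if guess == secret else "wrong key", ok)
```

For the right guess, the true first-round variables of $(q^1, q^2)$ are by construction a mixture of those of $(p^1, p^2)$. For a wrong guess, the S-Box and MixColumns spread the key error over the whole column, so the relation is lost except by chance.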
Before going on, we emphasize that this result also implies the impossibility of setting up a 5-round distinguisher similar to the one just presented in this section by choosing plaintexts in the same coset of a diagonal space $\mathcal{D}_I$ instead of a column space $\mathcal{C}_I$. Indeed, given $p^1$ and $p^2$ as before in the same coset of $\mathcal{D}_I$ (instead of $\mathcal{C}_I$), since the key $k$ is secret and the S-Box is non-linear, there is no way to find $\hat{p}^1$ and $\hat{p}^2$ in the coset of $\mathcal{D}_I$ s.t. $R^5(p^1) \oplus R^5(p^2) \in \mathcal{M}_J$ if and only if $R^5(\hat{p}^1) \oplus R^5(\hat{p}^2) \in \mathcal{M}_J$ without guessing the secret key.

Data and Computational Costs
Data Cost. First of all, since the cardinality of a coset of $\mathcal{D}_I$ for $|I| = 1$ is $2^{32}$ and since $Prob(t \in \mathcal{M}_J) = 4 \cdot 2^{-32} = 2^{-30}$ for $|J| = 3$, the average number of collisions for each coset of $\mathcal{D}_I$ is approximately $2^{-30} \cdot \binom{2^{32}}{2} \simeq 2^{-30} \cdot 2^{63} = 2^{33}$, so it is very likely that two (plaintexts, ciphertexts) pairs $(p^1, c^1)$ and $(p^2, c^2)$ exist such that $c^1 \oplus c^2 \in \mathcal{M}_J$ and for which the two plaintexts have different generating variables.
Given a couple of plaintexts $p^1$ and $p^2$ for which the corresponding ciphertexts $c^1$ and $c^2$ belong to the same coset of $\mathcal{M}_J$, consider the other 7 couples of plaintexts $q^1$ and $q^2$ defined as before (that is, such that $R(q^1)$ and $R(q^2)$ are defined by a different combination of the generating variables of $R(p^1)$ and $R(p^2)$). For a wrong key, the probability that the two ciphertexts of each one of the other 7 couples belong to the same coset of $\mathcal{M}_J$ for a fixed $J$ (that is, the probability that a wrong key passes the test) is $(2^{-32})^7 = 2^{-224}$.
Since there are $2^{32} - 1$ wrong candidates for the diagonal of the key, the probability that at least one of them passes the test is approximately $1 - (1 - 2^{-224})^{2^{32} - 1} \simeq 2^{-192}$. Thus, one couple of plaintexts $p^1$ and $p^2$ (for which the corresponding ciphertexts belong to the same coset of $\mathcal{M}_J$) together with the corresponding other 7 couples of texts $q^1$ and $q^2$ are (largely) sufficient to discard all the wrong candidates for a diagonal of the key. Actually, in general only two different couples $q^1$ and $q^2$ (that is, two couples of texts given by two different combinations of the generating variables) are sufficient to discard all the wrong candidates, so it is not necessary to consider all the 7 pairs of texts $q^1$ and $q^2$. Indeed, given two couples, the probability that at least one wrong key passes the test is approximately $1 - (1 - 2^{-32 \cdot 2})^{2^{32} - 1} \simeq 2^{-32} \ll 1$, which means that all the wrong candidates are discarded with high probability.
As a result, the attack requires $2^{33.6}$ chosen plaintexts.
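The probabilities used in the data-cost analysis can be reproduced numerically. The sketch below is only a sanity check of the figures quoted above; the function name is illustrative, and `log1p`/`expm1` are used because the quantities involved are far below double-precision rounding of $1 - x$.

```python
import math

def prob_some_wrong_key_survives(num_couples, wrong_keys=2**32 - 1, p_single=2.0**-32):
    """1 - (1 - p_single**num_couples)**wrong_keys, evaluated without cancellation."""
    p_pass = p_single ** num_couples
    return -math.expm1(wrong_keys * math.log1p(-p_pass))

# Expected number of colliding pairs in one coset of D_I (|I| = 1, any J with |J| = 3).
expected_collisions = math.comb(2**32, 2) * 2.0**-30

p_seven = prob_some_wrong_key_survives(7)   # all 7 mixed couples are checked
p_two = prob_some_wrong_key_survives(2)     # only 2 mixed couples are checked

print(f"collisions per coset ~ 2^{math.log2(expected_collisions):.2f}")
print(f"7 couples: 2^{math.log2(p_seven):.2f}, 2 couples: 2^{math.log2(p_two):.2f}")
```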
Computational Cost. Each coset of $\mathcal{D}_I$ with $|I| = 1$ is composed of $2^{32}$ texts, thus on average $2^{63} \cdot 2^{-32} = 2^{31}$ different pairs of ciphertexts belong to the same coset of $\mathcal{M}_J$ for a fixed $J$ with $|J| = 3$. However, it is sufficient to find one collision in order to implement the attack and to find the key.
In order to find it, the best strategy is to re-order the ciphertexts with respect to the partial order $\preceq$ and then to work on consecutive elements, as done in Sect. 4.1.2. For each initial coset of $\mathcal{D}_I$ and for a fixed $J$, the cost to re-order the ciphertexts with respect to the partial order $\preceq$ (for $\mathcal{M}_J$ with $J$ fixed, $|J| = 3$) and to find a collision is approximately $2^{32} \cdot (\log 2^{32} + 1) \simeq 2^{37}$ table look-ups.
When such a collision is found, one has to guess 4 bytes of the key and to construct (at least) two other different couples given by a different combination of the generating variables of $R(p^1)$ and $R(p^2)$ (observe that the condition $\hat{x}^1 \neq \hat{x}^2$, $\hat{y}^1 \neq \hat{y}^2$, $\hat{z}^1 \neq \hat{z}^2$ and $\hat{w}^1 \neq \hat{w}^2$ is satisfied with probability $(255/256)^4 \approx 1$). In order to perform this step efficiently, the idea is to re-order, and to store separately a second copy of, the (plaintexts, ciphertexts) pairs w.r.t. the partial order $\leq$ as defined in Def. 6, s.t. $p^i \leq p^{i+1}$ for each $i$. Using the same strategy proposed for the 4-round distinguisher (see App. B for all details), this allows to construct these two new different couples (and to check whether or not the corresponding ciphertexts satisfy the required property) with only 4 table look-ups. As a result, the cost of this step is $2^{32} \cdot 2 \cdot 4 = 2^{35}$ S-Box look-ups and $2^{32} \cdot 4 = 2^{34}$ table look-ups.
It follows that the cost to find one diagonal of the key is well approximated by $2^{35}$ S-Box look-ups and $2^{37.17}$ table look-ups, that is approximately $2^{30.95}$ five-round encryptions. The idea is to use this approach for three different diagonals, and to find the last one by brute force. As a result, the total computational cost is $2^{32} + 3 \cdot 2^{30.95} = 2^{33.28}$ five-round encryptions, while the data cost is $3 \cdot 2^{32} = 2^{33.6}$ chosen plaintexts.
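The totals above follow from simple arithmetic on the figures already given. The sketch below only recombines those numbers; it takes the per-diagonal cost of $2^{30.95}$ five-round encryptions as stated in the text rather than re-deriving it, and the variable names are illustrative.

```python
import math

# Table look-ups for one diagonal: re-ordering (about 2^37) plus the mixed-couple checks (2^34).
table_lookups = 2**37 + 2**34

# Per-diagonal cost in five-round encryptions, taken from the text.
per_diagonal = 2.0 ** 30.95

# Three diagonals via the attack, the last one by brute force over 2^32 candidates.
total_time = 2**32 + 3 * per_diagonal
# One coset of 2^32 chosen plaintexts per attacked diagonal.
total_data = 3 * 2**32

print(f"table look-ups per diagonal ~ 2^{math.log2(table_lookups):.2f}")
print(f"total time ~ 2^{math.log2(total_time):.2f} five-round encryptions")
print(f"total data ~ 2^{math.log2(total_data):.2f} chosen plaintexts")
```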
Summary. As a result, the attack (practically verified on a small scale AES) requires $2^{33.6}$ chosen plaintexts and has a computational cost of $2^{33.28}$ five-round encryptions. The pseudo-code of the attack is given in Algorithm 2. We remark for completeness that the same attack also works in the decryption/reverse direction, using chosen ciphertexts instead of plaintexts.

Practical Verification
Using a C/C++ implementation, we have practically verified the attack just described (see footnote 18) on the small scale AES [CMR05]. We emphasize that since the proposed attack is independent of whether each word of AES is composed of 4 or 8 bits, our verification on the small scale variant of AES is strong evidence that it also holds for the real AES.
Practical Results. For simplicity, we limit to report the result for a single diagonal of the key. First of all, a single coset of a diagonal space D i is largely sufficient to find one diagonal of the key. In more detail, given two (plaintexts, ciphertexts) pairs (p 1 , c 1 ) and (p 2 , c 2 ), then other two different couples q 1 and q 2 (out of the seven possible ones) are sufficient to discard all the wrong candidates of the diagonal of the key, as predicted.
About the computational cost, the theoretical cost for the small scale AES case is well approximated by 4 · 2 16 · (log 2 16 + 1) + 2 16 · 4 = 2 21 table look-ups and 2 16 · 4 · 3 = 2 19.6 S-Box look-ups, for a total of 2 19.6 + 2 21 = 2 21.5 table look-ups (assuming that the cost of 1 S-Box look-up is approximately equal to the cost of 1 table look-up). The average practical computational cost is of 2 21.5 table look-ups, approximately the same as the theoretical one.
Note that the total number of all the possible couples is $2^{31} \cdot (2^{32} - 1)$.
The formula for the other cases is obtained in an analogous way.
Remark. The proposed formula is used in the context of the 5-round distinguisher presented in [GRR17a], and for the distinguishers proposed in this paper. As explained in Sect. 3, in these distinguishers we work with "generating variables" in $(\mathbb{F}_{2^8})^{|I|}$. If $|I| = 1$, then this corresponds to working independently on each variable. In the other cases, this means working with sets of variables in $(\mathbb{F}_{2^8})^{|I|}$ for $|I| \geq 2$.
As a result, this formula does not apply if one works independently on each generating variable also in the cases $|I| \geq 2$, that is with generating variables in $\mathbb{F}_{2^8}$ also for $|I| \geq 2$. In this last case, the required formula becomes
$$\binom{4}{n} \cdot 2^{32 \cdot |I| - 1} \cdot (2^8 - 1)^{(4 - n) \cdot |I|}$$
(note that the two formulae are identical only for $|I| = 1$).

B 4-round Secret-Key Distinguisher for AES -Details
In this section, we give all the details about the computational cost of the 4-round secret-key distinguisher for AES presented in Sect. 4. We refer to Sect. 4 for all the details about the distinguisher. Given $2^{16}$ chosen plaintexts in the same coset of $(\mathcal{C}_0 \cap \mathcal{D}_{0,3}) \oplus a$ and the corresponding ciphertexts, a first possibility is to construct all the possible pairs, to divide them in sets $S$ of non-independent pairs defined as
$$S = \Big\{ (p^1, p^2), (\hat{p}^1, \hat{p}^2) \in \big((\mathcal{C}_0 \cap \mathcal{D}_{0,3}) \oplus a\big)^2 \;\Big|\; p^1 \equiv (x^1, x^2),\ c^1 = R^4(p^1),\ p^2 \equiv (y^1, y^2),\ c^2 = R^4(p^2);\ \hat{p}^1 \equiv (y^1, x^2),\ \hat{c}^1 = R^4(\hat{p}^1),\ \hat{p}^2 \equiv (x^1, y^2),\ \hat{c}^2 = R^4(\hat{p}^2) \Big\},$$
and to check the property on each set. In order to reduce the computational cost, a possibility is to re-order the ciphertexts with respect to a partial order $\preceq$ as defined in Def. 8 (see also [GRR17a]). Note that $\preceq$ depends on an index $J$. Using a merge-sort algorithm, the cost to re-order $n$ texts is $O(n \cdot \log n)$ table look-ups. When the ciphertexts have been re-ordered, it is no longer necessary to construct all the possible pairs, and it is sufficient to work only on consecutive texts with respect to $\preceq$.
In more detail, first one stores all the plaintext/ciphertext pairs twice, (1) once in which the plaintexts are ordered with respect to the partial order $\leq$ defined in Def. 6 and (2) once in which the ciphertexts are ordered with respect to the partial order $\preceq$ defined in Def. 8. Then, working on this second set, one focuses only on consecutive ciphertexts $c^i$ and $c^{i+1}$ for each $i$, and checks whether $c^i \oplus c^{i+1} \in \mathcal{M}_J$ or not. Assume that $c^i \oplus c^{i+1} \in \mathcal{M}_J$ for a certain $J$ fixed previously. The idea is to take the corresponding plaintexts $p^i \equiv (x^1, y^1)$ and $p^{i+1} \equiv (x^2, y^2)$, to construct the corresponding set $S$ and to check whether the ciphertexts $\hat{c}^1$ and $\hat{c}^2$ of the corresponding plaintexts $\hat{p}^1 \equiv (x^1, y^2)$ and $\hat{p}^2 \equiv (x^2, y^1)$ satisfy the condition $\hat{c}^1 \oplus \hat{c}^2 \in \mathcal{M}_J$ for the same $J$. If not, by the previous observations one can simply deduce that this is a random permutation. Note that if there are $r$ consecutive ciphertexts $c^i, c^{i+1}, \ldots, c^{i+r-1}$ such that $c^j \oplus c^l \in \mathcal{M}_J$ for $i \leq j, l < i + r$, then one has to repeat the above procedure for all these $\binom{r}{2} = r \cdot (r - 1)/2$ possible pairs (see footnote 20). To optimize the computational cost, note that the plaintexts $\hat{p}^1$ and $\hat{p}^2$ are respectively in positions $x^1 + 2^8 \cdot y^2$ and $x^2 + 2^8 \cdot y^1$ in the first set of plaintext/ciphertext pairs (i.e. in the set where the plaintexts are ordered with respect to the partial order $\leq$). Thus, the cost to get these two elements is only 2 table look-ups. Moreover, we emphasize that it is sufficient to work only on (consecutive) ciphertexts $c^i$ and $c^j$ such that $c^i \oplus c^j \in \mathcal{M}_J$. Indeed, consider the case in which the two ciphertexts $c^i$ and $c^j$ do not belong to the same coset of $\mathcal{M}_J$, i.e. $c^i \oplus c^j \notin \mathcal{M}_J$. If the corresponding ciphertexts $\hat{c}^1$ and $\hat{c}^2$ (defined as before) do not belong to the same coset of $\mathcal{M}_J$, then the property is (obviously) verified. Instead, if $\hat{c}^1 \oplus \hat{c}^2 \in \mathcal{M}_J$, then this case is surely analyzed. The pseudo-code of such a strategy can be found in Algorithm 1.
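The procedure just described can be sketched on a toy model. The block below is not the authors' Algorithm 1: it groups ciphertexts by a coset label with a dictionary instead of merge-sorting them, and it replaces 4-round AES by a hypothetical structured map `f[x] ^ g[y]` whose output is read directly as the label of the coset of $\mathcal{M}_J$, so that it satisfies the mixture property by construction. All names are illustrative.

```python
import random
from collections import defaultdict
from itertools import combinations

def mixture_test(table, coset_key):
    """Return False as soon as a colliding pair has a non-colliding mixed pair.

    table maps the generating variables (x, y) to a ciphertext;
    coset_key(c) returns a label identifying the coset of M_J that contains c.
    """
    buckets = defaultdict(list)
    for xy, c in table.items():
        buckets[coset_key(c)].append(xy)
    for group in buckets.values():
        for (x1, y1), (x2, y2) in combinations(group, 2):
            if x1 == x2 or y1 == y2:
                continue  # the mixed pair coincides with the original one
            # Direct look-up of the mixed plaintexts (x1, y2) and (x2, y1).
            if coset_key(table[(x1, y2)]) != coset_key(table[(x2, y1)]):
                return False  # property violated: behaves like a random permutation
    return True

rng = random.Random(0)
N, LABELS = 16, 64                       # toy sizes, chosen only to force collisions
f = [rng.randrange(LABELS) for _ in range(N)]
g = [rng.randrange(LABELS) for _ in range(N)]

# Structured toy map: labels collide for (x1,y1),(x2,y2) iff they collide for the mixed pair.
structured = {(x, y): f[x] ^ g[y] for x in range(N) for y in range(N)}
# Unstructured toy map: independent random labels.
unstructured = {(x, y): rng.randrange(LABELS) for x in range(N) for y in range(N)}

identity = lambda c: c
print(mixture_test(structured, identity), mixture_test(unstructured, identity))
```

As in the text, only pairs that already collide are examined, and the mixed plaintexts are fetched by direct indexing rather than by search.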
Using this procedure, the memory cost is well approximated by $4 \cdot 2^{17} \cdot 16 = 2^{23}$ bytes (the same plaintext/ciphertext pairs are stored in two different ways). The cost to order the ciphertexts for each possible $J$ with $|J| = 3$ and for each one of the two cosets is approximately $2 \cdot 4 \cdot 2^{16} \cdot \log 2^{16} \simeq 2^{23}$ table look-ups, while the cost to construct all the possible pairs of consecutive ciphertexts is $2 \cdot 4 \cdot 2^{16} = 2^{19}$ table look-ups. Since the probability that a pair of ciphertexts belongs to the same coset of $\mathcal{M}_J$ for $|J| = 3$ is $2^{-30}$ and since each coset contains approximately $2^{31}$ different pairs, one has to do on average $2 \cdot 4 \cdot 2^{-30} \cdot 2^{31} = 2^4$ table look-ups in the plaintext/ciphertext pairs ordered with respect to the plaintexts. Thus, the total cost of this distinguisher is well approximated by $2^{23} + 2^{19} + 16 \simeq 2^{23.09}$ table look-ups, or approximately $2^{16.75}$ four-round encryptions (using the approximation 20 table look-ups ≈ 1 round of encryption).
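The cost breakdown above can be recomputed as follows. This is only a sketch of the arithmetic, using the same approximation of 20 table look-ups per round; the variable names are illustrative. The computed exponents land within a few hundredths of the figures quoted above.

```python
import math

LOOKUPS_PER_ROUND = 20                  # approximation stated in the text
cosets, choices_of_J, texts = 2, 4, 2**16

sort_cost = cosets * choices_of_J * texts * 16           # n * log2(n) with log2(n) = 16
scan_cost = cosets * choices_of_J * texts                # consecutive ciphertext pairs
check_cost = cosets * choices_of_J * 2**31 * 2.0**-30    # expected look-ups for collisions
total = sort_cost + scan_cost + check_cost

encryptions = total / (4 * LOOKUPS_PER_ROUND)            # four-round encryptions
memory_bytes = 4 * 2**17 * 16                            # two copies of (plaintext, ciphertext)

print(f"total ~ 2^{math.log2(total):.2f} table look-ups")
print(f"      ~ 2^{math.log2(encryptions):.2f} four-round encryptions")
print(f"memory = 2^{math.log2(memory_bytes):.0f} bytes")
```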

C Integral Attack on 5-round AES
Here we recall a strategy that allows to improve the computational cost of an integral attack on 5-round AES. Such a strategy has been proposed by an anonymous reviewer (see footnote 21).

Integral Attack [DR02] on 5-round AES.
It is well known that, given $2^{32}$ chosen plaintexts with one active diagonal (i.e. a coset of a diagonal space $\mathcal{D}_I \oplus a$), the XOR-sum of the corresponding texts after four rounds of encryption is equal to 0 independently of the value of the secret key, that is $\bigoplus_{p^i \in \mathcal{D}_I \oplus a} R^4(p^i) = 0$ for each $I \in \{0, 1, 2, 3\}$.

Footnote 20. Since $\mathcal{M}_J$ is a subspace, given $a, b, c$ such that $a \oplus b \in \mathcal{M}_J$ and $b \oplus c \in \mathcal{M}_J$, then $a \oplus c \in \mathcal{M}_J$.

Footnote 21. A similar strategy has also been exploited e.g. in [GR17] in order to set up a "known-key" distinguisher for 12-round AES.
Such a property can be used to set up an integral attack on 5-round AES, by guessing (byte per byte) the final key and checking that the XOR-sum is equal to zero one round before. In particular, assume the final MixColumns is omitted, and let $k_{j,l}$ be the guessed byte in row $j$ and column $l$ of the final subkey. Due to the previous property, such a guessed byte of the key is certainly wrong if
$$\bigoplus_{i=0}^{2^{32}-1} \text{S-Box}^{-1}(c^i_{j,l} \oplus k_{j,l}) \neq 0 \qquad (9)$$
where $c^i \equiv R^5(p^i)$ and $I \in \{0, 1, 2, 3\}$. By simple computation, the attack requires $2^{32}$ chosen plaintexts and $16 \cdot 2^8 \cdot 2^{32} = 2^{44}$ S-Box operations, that is approximately $2^{37.36}$ 5-round encryptions (assuming 20 S-Box ≡ 1-round encryption). Note that the probability that a wrong key byte satisfies the zero-sum property is $2^{-8}$. As a result, if a wrong key survives the test, then it can simply be filtered using a brute force attack. Finally, if the final MixColumns is not omitted, one can simply repeat the previous attack by swapping the final MixColumns operation and the final subkey (we refer to [DR02] for more details); remember that the MixColumns operation is linear.

Improved Integral Attack on 5-round AES.
Here a way to improve the previous attack is proposed. The crucial point is the following. Working at byte level, note that Eq. (9) is the sum of $2^{32}$ bytes. Since each byte can take "only" $2^8$ different values, it turns out that many bytes of the form $\text{S-Box}^{-1}(c^i_{j,l} \oplus k_{j,l})$ in (9) are equal. Thus, the total computational cost can be reduced using a precomputation process, as shown in the following.

        end
        if x = 0 then
            identify k as candidate for $k_{0,0}$;
        end
    end
    return candidates for $k_{0,0}$.
Algorithm 3: Integral attack on 5-round AES: working on each byte of the key independently of the others, filter wrong key candidates using the zero-sum property. Other bytes of the key can be found in a similar way.
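The precomputation idea can be sketched as follows: since only the parity of the number of occurrences of each byte value matters for the XOR-sum (9), one pass over the ciphertexts builds a 256-entry parity table, and each key guess then costs at most 256 inverse S-Box look-ups. The sketch is not the authors' Algorithm 3. The function names are hypothetical, and a random byte permutation stands in for the AES S-Box, which does not affect the equivalence being illustrated.

```python
import random

def zero_sum_candidates(cipher_bytes, inv_sbox):
    """Key bytes k for which the XOR of inv_sbox[c ^ k] over all c is zero (parity-table version)."""
    parity = [0] * 256
    for v in cipher_bytes:
        parity[v] ^= 1                      # values seen an even number of times cancel out
    odd_values = [v for v in range(256) if parity[v]]
    candidates = []
    for k in range(256):
        x = 0
        for v in odd_values:
            x ^= inv_sbox[v ^ k]
        if x == 0:
            candidates.append(k)
    return candidates

def zero_sum_candidates_naive(cipher_bytes, inv_sbox):
    """Same test, summing over every ciphertext byte for every key guess."""
    candidates = []
    for k in range(256):
        x = 0
        for v in cipher_bytes:
            x ^= inv_sbox[v ^ k]
        if x == 0:
            candidates.append(k)
    return candidates

# Toy data: a stand-in S-Box and bytes whose pre-images XOR to zero under `true_key`.
rng = random.Random(7)
sbox = list(range(256))
rng.shuffle(sbox)
inv_sbox = [0] * 256
for a, s in enumerate(sbox):
    inv_sbox[s] = a

true_key = 0x5A
states = [rng.randrange(256) for _ in range(4095)]
last = 0
for u in states:
    last ^= u
states.append(last)                         # the XOR of all states is now zero
cipher_bytes = [sbox[u] ^ true_key for u in states]

fast = zero_sum_candidates(cipher_bytes, inv_sbox)
print(true_key in fast, fast == zero_sum_candidates_naive(cipher_bytes, inv_sbox))
```

The parity table is computed once per byte position and reused for all 256 guesses, which is where the saving over the direct evaluation of (9) comes from.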
The pseudo-code of the attack is given in Algorithm 3. Given 16 tables (one for each byte of the state) of size $2^8$ bits, for each ciphertext the idea is to record in each hash table the value of the corresponding byte. The crucial point is that in order to compute the XOR-sum (9), we are interested only in the parity of the number of times each value appears. As a result, given a guessed byte of the key, the cost to check if the XOR-sum (9)