Commutative Cryptanalysis Made Practical

About 20 years ago, Wagner showed that most of the (then) known techniques used in the cryptanalysis of block ciphers were particular cases of what he called commutative diagram cryptanalysis. However, to the best of our knowledge, this general framework has not yet been leveraged to find concrete attacks. In this paper, we focus on a particular case of this framework and develop commutative cryptanalysis, whereby an attacker targeting a primitive $E$ constructs affine permutations $A$ and $B$ such that $E \circ A = B \circ E$ with a high probability, possibly for some weak keys. We develop the tools needed for the practical use of this technique: first, we generalize differential uniformity into "A-uniformity" and differential trails into "commutative trails", and second we investigate the commutative behaviour of S-box layers, matrix multiplications, and key additions. Equipped with these new techniques, we find probability-one distinguishers using only two chosen plaintexts for large classes of weak keys in both a modified Midori and in Scream. For the same weak keys, we deduce high probability truncated differentials that can cover an arbitrary number of rounds, but which do not correspond to any high probability differential trails. Similarly, we show the existence of a trade-off in our variant of Midori whereby the probability of the commutative trail can be decreased in order to increase the weak key density. We also show some statistical patterns in the AES super S-box that have a much higher probability than the best differentials, and which hold for a class of weak keys of density about $2^{-4.5}$.


Introduction
Symmetric cryptographic primitives protect a large fraction of our sensitive and personal data. We have at hand a strong set of algorithms that have resisted an impressive amount of cryptanalysis. However, as our understanding increases, the field of cryptanalysis becomes more and more scattered, leading to a situation where it becomes harder to keep track of all possible attacks to consider when analysing, or designing, a new primitive. Thus, as more and more advanced, sophisticated, and numerous attacks are being developed, it is important to unify attacks and security arguments. By doing so, we can hope to keep the security analysis to a manageable level but also, and more importantly in the long run, to improve our fundamental understanding of modern ciphers.
However, while generalizing and unifying attacks is certainly important, it has to be done to such an extent that it still leads to meaningful results, i.e., the results have to be applicable. This in particular means that it should be possible to check if the attack vectors apply to a given cipher. Otherwise, conceptually nice ideas cannot be populated with non-trivial examples and remain of limited interest despite the potentially large variety of properties covered. To give a concrete example, the partitioning attack [HM97], while being a very nice and elegant framework, seems, at least for now, too general to ever be entirely falsifiable. That is, we are far from giving security arguments against partitioning attacks in their general form, and moreover, the concrete examples of ciphers broken by partitioning cryptanalysis are covered by very special cases of the general framework.
In our work, we show how to mount attacks exploiting that, for a given cipher $E$, there exist affine bijective mappings $A$ and $B$ such that there exist (a lot of) keys $k$ for which $E_k \circ A(x) = B \circ E_k(x)$ is verified with high probability (taken over $x$), compared to the case where $E$ is replaced with a random permutation. We focus on the case where $E$ is a substitution-permutation construction. For those constructions, we feel that this property, and attacks based on it, provide the right level of generality: enabling $A$ and $B$ to be any function, as in Wagner's work on commutative diagram cryptanalysis [Wag04], casts too wide a net for practical use. On the other hand, as we outline next, this set-up unifies several important attack vectors, allows new attacks to be constructed and, in the case of a very high probability, can be detected algorithmically in its general form. The latter point is particularly important for security arguments against those attack vectors.
Generality. As pointed out in [Wag04], the framework of what we will refer to as commutative cryptanalysis captures important classes of attacks as special cases. Most prominently, differential attacks correspond to the case where both $A$ and $B$ are simple translations. As extensions to this, it also includes rotational cryptanalysis, as introduced in [KN10] and generalized to capture differences on top of rotations in [AL16]. In the former, $A$ and $B$ are rotations, while in the latter they are allowed to be rotations composed with translations.
Focusing on a round-based structure, i.e. when $E$ is the composition of several simpler permutations $R_i$, allows us to discuss commutative trails, where we have a chain of affine mappings $A_i$ such that each of the round functions $R_i$ fulfills $R_i \circ A_i = A_{i+1} \circ R_i$ with high probability. An interesting special case of this is when translations for $A_i$ are interleaved with non-translations. This in particular captures and puts into a better perspective a recent example at CRYPTO 2023 [BFL+23] of a toy cipher, where a probability-one differential over two rounds has been constructed such that it does not stem from a probability-one characteristic.
Applicability. On the other hand, in the case of an iterated cipher, and more particularly in the case of an SPN cipher, tracing a commutative trail through its parts becomes feasible, at least in the case of a high probability. As we will detail, each of the usual components (the S-box layer, the linear layer, and the key or constant addition) allows for a rather rich theory and contributes its own insights to the full picture of commutative cryptanalysis.
For an S-box, the probability-one case of the equation $S(A(x)) = B(S(x))$, which corresponds to a self-affine-equivalent S-box, can be algorithmically solved for all instances of interest by known algorithms, in particular [BDBP03] and [Din18]. In practice, several S-boxes from the literature have such a property, in particular Midori [BBI+15] (both the 64- and the 128-bit versions), and Scream [GLS+15]. For the linear layer, the behaviour of commutation can be captured with the rich structure of linear and affine mappings. Here, in particular, the case of an S-box-aligned […] experimental results on the behaviour of probabilistic commutative trails. Section 7 concludes the paper.

Notations
Let $\kappa, d, s$ be positive integers. Let $n, m, \ell$ be positive integers such that $n = m \times \ell$. The cardinality of a set $S$ is denoted $|S|$.
Finite fields and vector spaces. Let $\mathbb{F}_2$ denote the finite field with two elements and $\mathbb{F}_2^d$ the vector space of dimension $d$ over $\mathbb{F}_2$. Given two bits (elements of $\mathbb{F}_2$) or binary vectors (elements of $\mathbb{F}_2^n$), we denote by $+$ the bit-wise addition (XOR). We use the usual vector space isomorphism $\mathbb{F}_2^n \simeq (\mathbb{F}_2^m)^\ell$ and refer to the former as the state point of view, and to the latter as the cell point of view. When it is necessary to distinguish between both, we reserve calligraphic letters for mappings applied on the whole state, $\mathcal{F} : \mathbb{F}_2^n \to \mathbb{F}_2^n$, and capital letters for functions applied on a cell, $F : \mathbb{F}_2^m \to \mathbb{F}_2^m$. For $a \in \mathbb{F}_2^m$, $\mathbf{a}$ denotes the vector whose $\ell$ cells are all equal to $a$: $\mathbf{a} := (a, \dots, a)$. The notation $a^{\times s} := (a, \dots, a) \in (\mathbb{F}_2^m)^s$ might be used to emphasize the sizing.
We denote the subspace spanned by an $s$-tuple of vectors $(v_1, \dots, v_s) \in (\mathbb{F}_2^n)^s$ by $\langle v_1, \dots, v_s \rangle := \{a_1 v_1 + \dots + a_s v_s, \ (a_1, \dots, a_s) \in \mathbb{F}_2^s\}$. Generic subspaces are denoted $V$. If not explicitly stated otherwise, when linearity (resp. affinity) is mentioned, we always refer to $\mathbb{F}_2$-linearity (resp. $\mathbb{F}_2$-affinity). Given an affine mapping $A : \mathbb{F}_2^s \to \mathbb{F}_2^s$, we denote by $c_A := A(0)$ its constant term and by $L_A := A + c_A$ its linear part.
We often represent binary vectors as integers, written in decimal, hexadecimal or binary notation. The explicit relationship between integers and vectors we use throughout the paper is as follows: […]
Matrix spaces. We denote by $M(n, \mathbb{F}_{2^s})$ (respectively $GL(n, \mathbb{F}_{2^s})$) the space of square (resp. invertible square) matrices of size $n$ with coordinates in the field with $2^s$ elements. $I$ denotes the identity matrix. When dealing with block matrices, we always consider a matrix of size $n \times n$, where all the sub-matrices are square matrices of size $m \times m$. Given $\ell$ […]. While defining a matrix (or vector), the value 0 can be replaced by a dot to make the reading easier.
Parallel applications. Abusing the previous notation, in case of generic mappings […] and $\mathrm{Diag}(G)$ will refer to the parallel application of $G$ on each of the $\ell$ cells. Finally, if mappings $F, G, A, S$ are defined from $\mathbb{F}_2^m$ to itself, $\mathcal{F}, \mathcal{G}, \mathcal{A}, \mathcal{S}$ always refer to $\mathrm{Diag}(F), \mathrm{Diag}(G), \mathrm{Diag}(A), \mathrm{Diag}(S)$.

Commutation and conjugation
By $F_1$ commutes with $F_2$ we mean that $F_1$ and $F_2$ verify the relation $F_1 \circ F_2 = F_2 \circ F_1$. Abusing standard labeling, by $A$ and $B$ commute through $F$ we mean that $A, B, F$ verify $F \circ A = B \circ F$. When $A$ and $B$ are actually affine, it corresponds to a self-affine-equivalence relation of $F$. Such a behavior will be denoted using arrows, as in standard linear or differential trail studies: either as […]

Toy Cipher Families: Vert and Grün
Later in this paper, we will use variants of the block cipher Midori [BBI+15] to illustrate our approach. Like Midori, they are named after the word "green" in different languages.

Block cipher.
Let us first introduce general notations common to all block ciphers we investigate. Let $E_k$ be an $n$-bit key-alternating block cipher parameterized by a $\kappa$-bit key $k$, a bijective non-linear S-box $S : \mathbb{F}_2^m \to \mathbb{F}_2^m$ and a linear layer $L : \mathbb{F}_2^n \to \mathbb{F}_2^n$. We denote the translation by $c$ as $T_c : x \mapsto x + c$; we use the calligraphic $\mathcal{T}_c$ instead to emphasize that it is applied on the full state. We denote by $sk_i$ the $n$-bit string (derived from $k$) that is added to the state at the start of round $i$, so that the $r$ rounds of $E_k$ can be written […] Round functions are denoted using […]
Midori64. First, the S-box layer of Midori64 uses a single involutive 4-bit S-box which is applied on all nibbles. Then, the MixColumn operation uses a binary quasi-MDS matrix $M$ that is applied on each column: the $i$-th output cell is the XOR of the three input cells of index different from $i$. We denote by $\mathcal{M}$ the parallel application of $M$ on the 4 columns of the state.
The ShuffleCell operation consists of a reorganization (permutation) of the cells. Finally, a round key is XORed at the end of each round. It is derived from the 128-bit master key $K = K_0 \| K_1$: $K_0$ is used for even rounds, and $K_1$ for odd rounds. Sparse round constants that belong to $\{\mathtt{0x0}, \mathtt{0x1}\}^{16}$ are also added at each round. From now on, if not explicitly stated otherwise, Midori will always refer to Midori64.

Vert.
Vert is a family of 64-bit-state and 128-bit-key ciphers which are heavily based on Midori64: all of its subroutines are almost identical to the ones used in the latter.
Vert uses the same S-box and MixColumn layers as Midori64. The permutation of cells can be chosen to be either the genuine Midori ShuffleCell or the AES ShiftRows. Finally, the key schedule is identical, except that the round constants added in each nibble can take any value in $\{0, c\}$ throughout the encryption (Midori uses $c = 1$). The cell permutation is denoted using subscripts, and the choice of $c$ using superscripts: $\mathrm{Vert}^c_{\mathrm{SR}}$ and $\mathrm{Vert}^c_{\mathrm{SC}}$. Midori thus corresponds to $\mathrm{Vert}^1_{\mathrm{SC}}$. Such modified versions of Midori64 have already been studied, namely $\mathrm{Vert}^1_{\mathrm{SR}}$ in the original Midori paper [BBI+15] (where bounds on differential active S-boxes are given), or in previous cryptanalysis papers, $\mathrm{Vert}^c_{\mathrm{SC}}$, $c \in \langle 2, 8 \rangle$ in [Bey18, Bey20] and $\mathrm{Vert}^5_{\mathrm{SC}}$ in [TLS19].

Grün.
Grün is a modified-constants version of Midori128.It is identical to Midori128 except for the constants that lie in {0x00, 0x11} rather than in the genuine {0x00, 0x01}.

Commutative Cryptanalysis as a Unifying Framework
While the concept of a commutative distinguisher may seem abstract, we illustrate in this section a claim of Wagner, namely that it is in fact a convenient tool which captures a wide range of attacks, from differential cryptanalysis to rotational attacks. To this end, we first set up some concepts that will be used throughout the paper, and then argue that multiple attacks fit them. As a side-note, in Appendix A we explain that the framework can also be used to understand what has previously been discussed as conjugate ciphers, whereby a round function $R$ is not studied directly, but instead a conjugate version $G \circ R \circ G^{-1}$ is, for an auxiliary permutation $G$. Though $G$ itself may be non-linear, the differential behaviour of $G \circ R \circ G^{-1}$ is best explained by commutative trails.

Basic Tools and Definitions
Definition 1 (A-Uniformity). Given an S-box $S : \mathbb{F}_2^m \to \mathbb{F}_2^m$ and two bijective affine mappings $A, B : \mathbb{F}_2^m \to \mathbb{F}_2^m$, we define $\Gamma_S(A, B) := \#\{x \in \mathbb{F}_2^m : S \circ A(x) = B \circ S(x)\}$. The maximum $\Gamma_S$ is referred to as the a(ffine)-uniformity of $S$.
This naturally generalizes the well-known notion of differential uniformity as introduced by Nyberg in [Nyb94]. Unfortunately, at this stage, there is no efficient algorithm to compute this quantity. Brute-forcing $A$ and $B$ is doable for $m$ up to 4, but not beyond this small value. In fact, we consider that an algorithm able to solve this problem would be a very interesting scientific contribution.
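To make the counting behind A-uniformity concrete, here is a minimal Python sketch. It assumes that $\Gamma_S(A, B)$ is the number of inputs $x$ with $S(A(x)) = B(S(x))$, and it represents affine maps as lookup tables. The S-box table and the helper names (`translation`, `gamma`, `ddt_entry`) are illustrative choices, not taken from the authors' implementation. When $A$ and $B$ are translations, the count coincides with an entry of the difference distribution table.

```python
# Illustrative 4-bit involution (assumption: any 4-bit permutation would do here).
SBOX = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
        0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]


def translation(c, m=4):
    """Lookup table of T_c : x -> x + c on F_2^m (XOR with c)."""
    return [x ^ c for x in range(1 << m)]


def gamma(sbox, A, B):
    """Number of x with S(A(x)) == B(S(x)); A and B are lookup tables."""
    return sum(1 for x in range(len(sbox)) if sbox[A[x]] == B[sbox[x]])


def ddt_entry(sbox, a, b):
    """Classical DDT entry: number of x with S(x + a) + S(x) == b."""
    return sum(1 for x in range(len(sbox)) if sbox[x ^ a] ^ sbox[x] == b)


identity = translation(0)
print(gamma(SBOX, identity, identity))  # the trivial pair commutes everywhere

# Differential special case: A = T_a and B = T_b recover the DDT,
# so the maximum over a != 0 is the differential uniformity.
diff_uniformity = max(gamma(SBOX, translation(a), translation(b))
                      for a in range(1, 16) for b in range(16))
print(diff_uniformity)
```

Maximizing `gamma` over all pairs of affine permutations instead of translations is exactly the brute-force search mentioned above, which is why it stops being practical beyond $m = 4$.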
Lemma 1. If $S : \mathbb{F}_2^m \to \mathbb{F}_2^m$ is a permutation, and $A$ and $B$ are affine permutations with $\Gamma_S(A, B) = 2^m$, i.e. $S$ is self-equivalent, then: […]
Proof. First, by multiplying on the left both sides of […] The second point is obtained using a straightforward induction: if […] The third point is proved by simply writing down the definition of $\Gamma_{C \circ S \circ D}(A, B)$ and simplifying the resulting equations.
Finally, for the fourth point, let x ∈ Fix(A) be a fixed point of A. Then it holds that S(x) = S • A(x) = B • S(x).In other words, S(x) is a fixed point of B and S(Fix(A)) ⊆ Fix(B).Equality follows from switching the roles of A and B and replacing S by S −1 .
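The fixed-point argument above can be checked on a toy instance. The sketch below is an illustration under assumptions that are not taken from the paper: the S-box is inversion in $\mathbb{F}_{2^4}$ (built with the polynomial $x^4 + x + 1$), and the commuting pair is $A(x) = a x^2$, $B(x) = a^{-1} x^2$ with the arbitrary choice $a = 2$. Both maps are $\mathbb{F}_2$-linear permutations, and $S \circ A = B \circ S$ holds for every input.

```python
def gf16_mul(a, b):
    """Multiply in GF(2^4) = F_2[x] / (x^4 + x + 1)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13  # reduce modulo x^4 + x + 1
    return r


# Toy S-box: field inversion, with 0 mapped to 0.
S = [0] + [next(y for y in range(1, 16) if gf16_mul(x, y) == 1)
           for x in range(1, 16)]

a = 0x2          # arbitrary non-zero field element (illustrative choice)
a_inv = S[a]
A = [gf16_mul(a, gf16_mul(x, x)) for x in range(16)]      # A(x) = a * x^2
B = [gf16_mul(a_inv, gf16_mul(x, x)) for x in range(16)]  # B(x) = a^-1 * x^2

# Probability-one commutation S o A = B o S.
commutes = all(S[A[x]] == B[S[x]] for x in range(16))

fix_A = {x for x in range(16) if A[x] == x}
fix_B = {x for x in range(16) if B[x] == x}
image_of_fix_A = {S[x] for x in fix_A}
print(commutes, sorted(fix_A), sorted(image_of_fix_A), sorted(fix_B))
```

In this instance the fixed points are non-empty, so the relation $S(\mathrm{Fix}(A)) = \mathrm{Fix}(B)$ is visible directly; for an affine map without fixed points both sets would simply be empty.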
It is possible to leverage such commutative patterns existing at the level of a subfunction to develop what we call commutative cryptanalysis. It can be seen as a generalization of differential cryptanalysis: for "regular" differential cryptanalysis, we study patterns of the type $S \circ A(x) = B \circ S(x)$, where $A(x) = x + \alpha$ and $B(x) = x + \beta$. We can then define an active S-box as one for which $A$ or $B$ is not the identity. Furthermore, much like differential cryptanalysis, this attack can be investigated first at the round level and then adapted to multiple rounds using a commutative trail. This principle is summarized in the diagram of Figure 1. As we show below, this framework captures several types of attacks.
Figure 1: Overview of a commutative trail built layer by layer.

Differential Attack (and Some Variants)
As we explained above, the classical differential attack corresponds to the case where $A$ and $B$ are translations. […] In the binary case ($p = 2$), the definition of ${}_cD_aF$ can thus be reformulated as ${}_cD_aF(x) = F \circ T_a(x) + M_c \circ F(x)$. Stated otherwise, the $c$-derivative with respect to $a$ estimates how much $T_a$ and $M_c$ commute through $F$. While $c$-differential uniformity has been extensively studied on its own, e.g. in [EFR+20, ERST21, HPS22, MRS+21, SGG+22], we are not aware of any cryptanalysis leveraging it at this stage. It seems to be hard to find an exploitable invariant. Indeed, as we will see later, as long as $c \neq 1$ any non-zero constant (or key) addition destroys a commutative trail involving $M_c$. At the same time, $c$-differential uniformity is a lower bound on A-uniformity.

Rotational(-XOR).
Let $\rho$ be the rotation of a word by one bit to the left. In a rotational distinguisher [KN10], an attacker tries to find pairs of rotations $\rho^i$ and $\rho^j$ such that $\rho^i \circ F = F \circ \rho^j$. This is a simple example of a commutative pattern where the affine permutations correspond to the rotations. Furthermore, these patterns are built iteratively (round by round) in a way which corresponds exactly to a commutative trail.
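As a small illustration of such rotational patterns, the following Python sketch counts the inputs on which a rotation commutes through a map on 8-bit words. Both example maps (`chi_like`, `add_const`), the word size and the constant `0x35` are hypothetical choices made for this sketch only. A map built from XOR, AND, NOT and fixed rotations commutes with $\rho$ on every input, whereas addition of a constant modulo $2^8$ commutes on only part of the inputs.

```python
W = 8                    # toy word size in bits
MASK = (1 << W) - 1


def rol(x, r):
    """Rotate the W-bit word x to the left by r positions."""
    r %= W
    return ((x << r) | (x >> (W - r))) & MASK


def chi_like(x):
    """Non-linear map using only XOR, AND, NOT and fixed rotations."""
    return x ^ (rol(~x & MASK, 1) & rol(x, 2))


def add_const(x, c=0x35):
    """Addition of a constant modulo 2^W."""
    return (x + c) & MASK


def commuting_inputs(f, i, j):
    """Number of words x with rol(f(x), i) == f(rol(x, j))."""
    return sum(1 for x in range(1 << W) if rol(f(x), i) == f(rol(x, j)))


print(commuting_inputs(chi_like, 1, 1))   # commutes on all 2^W inputs
print(commuting_inputs(add_const, 1, 1))  # commutes on a strict subset only
```

The first count corresponds to a probability-one commutation, the second to a probabilistic one, which is the situation exploited by rotational attacks on ARX constructions.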
This attack was more recently generalized into Rotational-XOR (RX) cryptanalysis [AL16] where the rotations can be composed with constant additions. The goal is to track pairs of inputs of the form $(x + a, \rho^i(x) + b)$, and see if they retain a relationship of this shape after each subfunction $R$. Concretely, the aim is to find $(a, a', b, b')$ and $i$ such that $R(x + a) + a' = R(\rho^i(x) + b) + b'$ with a high probability. This equivalently means $R(y) + a' + b' = R(\rho^i(y + a) + b)$, where $y = x + a$, which is a particular case of a commutation $B \circ R(y) = R \circ A(y)$ with $A := T_b \circ \rho^i \circ T_a$ and $B := T_{a' + b'}$.

Self-Similarity, Linear Commutants & Invariant Subspaces
In a paper of Leander, Minaud & Rønjom [LMR15], and in more depth in Minaud's thesis [Min16], the case of linear maps commuting with the round function is addressed. It corresponds to the case where $A_i$ is linear rather than affine. As explained in [LMR15], the existence of such linear commutants can be restated as a particular case of self-similarity.
Definition 3 (Self-similarity in a block-cipher [BB02, BDLF10]). For a fixed block cipher $E$, a self-similarity relation is given by invertible and efficiently computable mappings $\phi, \psi, \theta$ such that: […]
In Section 6, we will present several such relations. Moreover, as indicated in [LMR15], linear commutants always imply (possibly trivial) invariant subspaces. A similar implication holds when considering affine commutants, as the fixed points of an affine mapping, if any exist, form an affine subspace.
Finally, the linear maps investigated in these previous works lie under the so-called "S-box independent setting": they act as a permutation of the cells. Stated otherwise, they can be viewed as block matrices in which a single block of each row and each column is the identity matrix, while all the others are the zero matrix. This choice avoids taking the S-box into account and rather focuses on the linear layer. The approach of Section 4 is complementary: the affine commutants are built using a small affine mapping $A$ which is "S-box dependent". On the contrary, it almost avoids taking the linear layer into account.
Cryptanalysis of NORX v2.0 [CFG+17]. NORX v2.0 is an ARX permutation-based AEAD cipher which was a third-round candidate of the CAESAR competition [CAE14]. In a paper from Chaigneau, Fuhr, Gilbert, Jean & Reinhard [CFG+17], a ciphertext-only forgery attack on full NORX v2.0 is described. The cornerstone of this attack is the observation that the permutation $P$, which is applied to a square state, commutes with column rotations. Denoting by $\rho$ the rotation of a square state by one column to the left, we obtain $\rho \circ P = P \circ \rho$. This is proved by following a trail through two half-rounds, as $\rho$ commutes with the subcomponents $G_{col}$, $G_{diag}$ of $P$. Finally, using the second point of Lemma 1, $P$ commutes with the left column-rotation by any number of columns.

Probability-One Differential Over Two Rounds
In their recent work, Beierle et al. [BFL+23] present a cipher that, for some weak keys, exhibits a probability-one differential over two rounds. This corresponds to a differential version of the backdoored cipher Boomslang which was proposed in [BBFL22]. The S-box layer $\mathcal{S}$ consists of the threefold parallel application of an $m = 5$-bit S-box $S$. For the exact definitions of the S-box $S$ and the linear layer $L$, we refer to [BFL+23]. Here, we just state that the probability-one differential, which does not consist of a single differential trail, can be understood as a single probability-one commutative trail.
More precisely, there is an affine map $A$, and a difference $\delta$ (both given in [BFL+23]), for which it holds that translating the input of the S-box $S$ by $\delta$ is the same as applying the affine map $A$ to its output, and vice versa. That is, $S \circ T_\delta = A \circ S$ and $S \circ A = T_\delta \circ S$. Of course, the same holds if we consider the full S-box layer $\mathcal{S}$. Translating the inputs of all three S-boxes by $\Delta = \delta^{\times 3}$ is the same as applying $\mathcal{A} = A^{\times 3}$ to the outputs, and again vice versa. Furthermore, $\mathcal{A}$ commutes with the linear layer, i.e., $\mathcal{A} \circ L = L \circ \mathcal{A}$. Now, for an arbitrary key $k$ and a weak key $k'$ (we will discuss the properties of weak keys in Section 4.1), combining the aforementioned properties gives an iterative probability-one commutative trail over the two rounds […]

Commuting with Basic Building Blocks
The easiest way to find commutants for any iterative construction is to find compatible ones for each building block, and then chain those to form trails. Thus, we investigate each of the layers of a traditional SPN block cipher (as specified in Section 2.2) separately.

Commuting with the Key Addition
The subkey addition has a non-trivial interaction with commutation. We indeed need to distinguish the probability of a commutation for a fixed key, and the probability that a given key enables a commutation. To better discuss these probabilities, we introduce the following concept.

Definition 4 (Strong and p-weak keys). Consider a commutation
The idea is that a key is weak if it enables the commutation, and is weaker if it allows it with a higher probability. For example, in the differential case where $A = B = T_a$, we have that all keys are 1-weak (i.e. the worst possible situation from a security standpoint). On the other hand, if $A = T_a$ and $B = T_b$ with $a \neq b$, then all keys are strong. The following proposition allows us to predict these behaviours in the general case.
Proposition 1. Let $T_k, A, B : \mathbb{F}_2^n \to \mathbb{F}_2^n$ be the translation by $k$ and two affine permutations. Then, $k$ is either strong or $2^{-\mathrm{rank}(L_A + L_B)}$-weak, and the number of $2^{-\mathrm{rank}(L_A + L_B)}$-weak keys is given by […]
Proof. […] and thus as $(A + B)(x) = (I + L_B)(k)$ (1). As a consequence, in order for the equation to have solutions $(x, k)$, it is necessary and sufficient that $\mathrm{Im}(A + B)$ and $\mathrm{Im}(I + L_B)$ have a non-empty intersection. For a fixed $k$, the number of solutions $x$ of the equation is then either 0 if $(I + L_B)(k) \notin \mathrm{Im}(A + B)$, or given by the size of the kernel of $L_A + L_B$. Regarding weak keys, they can be enumerated first by fixing a value $v \in \mathrm{Im}(A + B) \cap \mathrm{Im}(I + L_B)$, and then finding the […] this can be greatly simplified as Eq. (1) becomes […]

and strong otherwise. The first case occurs for weak keys living in a space of dimension n − rank(I + L B ), and is possible if and only if c
Again, this corollary is coherent with the usual differential attack: in that case L A = L B = I, and the transition has probability 1 if c A = c B (0 otherwise).
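The dichotomy between strong and weak keys can be observed by brute force on a toy state. The sketch below assumes that the commutation under study is $T_k \circ A = B \circ T_k$, i.e. $A(x) + k = B(x + k)$, which is the reading consistent with the condition $(I + L_B)(k) \in \mathrm{Im}(A + B)$ above. The state size `N = 4`, the random affine maps and all helper names are illustrative assumptions. For each key, the number of solutions $x$ is either 0 or $2^{N - \mathrm{rank}(L_A + L_B)}$.

```python
import random

N = 4  # toy state size in bits, small enough for exhaustive search


def linear_map(cols):
    """Table of the F_2-linear map sending the i-th unit vector to cols[i]."""
    table = [0]
    for col in cols:
        table += [t ^ col for t in table]
    return table


def random_affine_permutation(rng):
    """Return (L, c): table of an invertible linear part and a constant."""
    while True:
        L = linear_map([rng.randrange(1 << N) for _ in range(N)])
        if len(set(L)) == 1 << N:  # a linear map is invertible iff bijective
            return L, rng.randrange(1 << N)


def rank(table):
    """Rank of a linear map given by its table, using |image| = 2^rank."""
    return len(set(table)).bit_length() - 1


def solutions_per_key(LA, cA, LB, cB):
    """For each key k, count the x with A(x) + k == B(x + k)."""
    return [sum(1 for x in range(1 << N)
                if (LA[x] ^ cA ^ k) == (LB[x ^ k] ^ cB))
            for k in range(1 << N)]


rng = random.Random(1)
LA, cA = random_affine_permutation(rng)
LB, cB = random_affine_permutation(rng)
r = rank([LA[x] ^ LB[x] for x in range(1 << N)])
counts = solutions_per_key(LA, cA, LB, cB)
# Each key is either strong (0 solutions) or has exactly 2^(N - r) solutions.
print(r, sorted(set(counts)))
```

Taking both linear parts equal to the identity recovers the differential case: every key has $2^N$ solutions when $c_A = c_B$ and none otherwise.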

Commuting with S-box Layers
Finding (all) affine permutations $A$ and $B$ such that $S \circ A = B \circ S$ amounts to the well-known problem of affine equivalence. Hence, we can use the algorithm of Dinur [Din18] if the degree of $S$ is maximum (i.e., $m - 1$), or otherwise the one of Biryukov et al. [BDBP03]. While less time efficient, the latter works for any permutation, regardless of its degree. We tweaked this algorithm to exhaustively list all pairs $(A, B^{-1})$ of affine permutations such that $B^{-1} \circ S \circ A = S$ and we added our implementation of this algorithm to sboxU as the self_affine_equivalent_mappings function.
While a random permutation (of sufficient size) is not expected to be (non-trivially) affine equivalent to itself [Hou06], the S-boxes used in practice are usually highly structured, either because they correspond to a simple Boolean circuit for an efficient implementation, or because they have a strong mathematical structure, e.g., because they are affine-equivalent to a finite field monomial. Related to that, all known APN permutations admit a non-trivial self-equivalence and it has actually been conjectured to be true for all APN permutations in [BBL21]. For a discussion on this, we also refer to [BDBP03]. For 4-bit S-boxes, we checked all equivalence classes and found that 137 out of all 302 classes are (non-trivially) affine self-equivalent. The case of $\Gamma_S(A, B) < 2^m$, i.e., the case of $(A, B)$ not implying maximal A-uniformity, is covered by [BDBP03, Section 4.3].
Now consider the whole S-box layer $\mathcal{S}$. If $\mathcal{S}$ consists of S-boxes $S$ with non-trivial linearity or differential uniformity then [RP20, Theorem 1] implies that there exist affine permutations $\mathcal{A}, \mathcal{B}$ such that $\mathcal{S} \circ \mathcal{A} = \mathcal{B} \circ \mathcal{S}$ if and only if there exist families of affine permutations $(A_i)_{1 \le i \le \ell}$ and $(B_i)_{1 \le i \le \ell}$ such that $S \circ A_i = B_i \circ S$ for all $i$, and $\mathcal{A}$ and $\mathcal{B}$ are (up to a permutation of the S-boxes) the same as $\mathrm{Diag}(A_1, \dots, A_\ell)$ and $\mathrm{Diag}(B_1, \dots, B_\ell)$, respectively. In other words, finding $\mathcal{A}$ and $\mathcal{B}$ can be reduced to finding $A$ and $B$ such that $S \circ A = B \circ S$ and combining them accordingly. Notice that this also includes linear mappings that merely permute the full inputs of the S-boxes, such as the ShiftRows operation of the AES.
Much like in differential cryptanalysis, such trails imply a notion of active S-boxes. We call active an S-box that is expected to commute with an affine mapping that is not the identity. When restricting commutative cryptanalysis to the differential case, the definitions match. Perhaps counter-intuitively however, an active S-box in a commutative trail does not necessarily decrease its probability.
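For very small S-boxes, self-equivalences can also be found by naive enumeration, which gives a feel for the objects involved. The Python sketch below is not the algorithm of [BDBP03] or [Din18], nor the sboxU routine: it is a brute-force stand-in that, as a simplifying assumption, only enumerates linear maps $A$ and checks whether $B = S \circ A \circ S^{-1}$ is affine. The toy S-box (inversion in $\mathbb{F}_{2^4}$ modulo $x^4 + x + 1$) and all function names are illustrative choices.

```python
from itertools import product


def gf16_mul(a, b):
    """Multiply in GF(2^4) = F_2[x] / (x^4 + x + 1)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def linear_table(cols):
    """Table of the F_2-linear map sending the i-th unit vector to cols[i]."""
    table = [0]
    for col in cols:
        table += [t ^ col for t in table]
    return table


def is_affine(table):
    """True iff x -> table[x] + table[0] is F_2-linear."""
    c = table[0]
    return all(table[x ^ e] ^ table[x] ^ table[e] ^ c == 0
               for x in range(16) for e in (1, 2, 4, 8))


def linear_self_equivalences(sbox):
    """All (A, B) with A linear bijective, B affine and S o A == B o S."""
    inverse = [0] * 16
    for x, y in enumerate(sbox):
        inverse[y] = x
    found = []
    for cols in product(range(16), repeat=4):
        A = linear_table(cols)
        if len(set(A)) != 16:      # skip non-invertible linear maps
            continue
        B = [sbox[A[inverse[y]]] for y in range(16)]  # B = S o A o S^-1
        if is_affine(B):
            found.append((A, B))
    return found


# Toy S-box: field inversion with 0 mapped to 0.
INV = [0] + [next(y for y in range(1, 16) if gf16_mul(x, y) == 1)
             for x in range(1, 16)]
pairs = linear_self_equivalences(INV)
print(len(pairs))
```

The enumeration visits $2^{16}$ candidate matrices for $m = 4$, which already shows why dedicated algorithms are needed for larger S-boxes. For this monomial S-box, the pairs $x \mapsto a x^{2^i}$, $x \mapsto a^{-1} x^{2^i}$ are guaranteed to appear in the output.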

Commuting with Linear Layers
First, we recall that for two affine permutations $A, B : \mathbb{F}_2^n \to \mathbb{F}_2^n$ to commute with the linear layer $L$ with probability $2^{-n} \Gamma_L(A, B)$ it has to hold that $L \circ A(x) = B \circ L(x) \iff (L L_A + L_B L)(x) = L(c_A) + c_B$ (2) for $\Gamma_L(A, B)$ values of $x$. In other words, $\Gamma_L(A, B)$ is the number of solutions of the right hand side of Eq. (2), which is either zero if […] In case $A$ and $B$ are $\mathrm{Diag}((A_i)_i)$ and $\mathrm{Diag}((B_i)_i)$ with $S \circ A_i = B_i \circ S$ respectively, meaning that $L_A$ and $L_B$ are block diagonal matrices, and if we denote by $L_{i,j}$ the blocks of size $m \times m$ of $L$, then $(x_1, \dots, x_\ell) \in \ker(L$ […]
If we now require the commutation to happen with probability one (i.e. for all $x$), then the right hand side of Eq. (2) directly implies $L(c_A) = c_B$ (by using $x = 0$) and […] lies in the center of $L$, denoted by $Z(L)$. Interestingly, the center of any linear map is an algebra and has therefore a lot of structure. Even more can be said if $\mathcal{A}$ is the parallel application of the same cell-size mapping $A'$. Firstly, any cell permutation layer commutes with any such (affine) permutation applied in parallel on all cells. But we can also fully classify all $\mathcal{A}$ with $L \circ \mathcal{A} = \mathcal{A} \circ L$ for arbitrary $L$.
Theorem 1. Let $L = (L_{ij})$ be a linear permutation expressed as an $\ell \times \ell$ block matrix whose blocks are of size $m \times m$. Let $\mathcal{A} = \mathrm{Diag}(A)$ for an affine permutation $A$. Then […]
Proof. From the discussion above we already know that […] is a block diagonal matrix whose blocks are aligned with those of $L$.
As we can see, Condition 1 of Theorem 1 can be reformulated as: $L_A \in \bigcap_{i,j} Z(L_{ij})$. This reformulation is very convenient when investigating a fixed linear layer: in that case, the main objective is the description of the intersection of the centers of all sub-blocks. In particular, the case of matrices whose sub-blocks are either the null matrix or the identity matrix is very simple to handle. Indeed, any matrix commutes with both of them. In that case, Condition 1 does not constrain the choice of the linear part of $A$. As simple as this example might seem, it is actually very enlightening, as a lot of linear layers from the literature are built in this way. This is the case of binary MixColumn layers (as in Midori and many other ciphers), but also the case of the linear layers of LS designs [GLSV15]. One should thus be careful when defining a cipher using a self-affine-equivalent S-box together with a binary linear layer, as the cases of Vert and Scream in Section 6 will highlight.
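The remark on binary linear layers can be verified exhaustively on a single column. The sketch below models a Midori-style MixColumn in which each output cell is the XOR of the three other input cells, and checks that the parallel application of a cell map commutes with it. The particular affine map (columns `0x2, 0x1, 0x4, 0x8` and constant `0x5`) and the non-affine table used as a counter-example are arbitrary illustrative choices, not the mappings studied in the paper.

```python
from itertools import product


def mix_column(cells):
    """Binary MixColumn: output i is the XOR of the three other input cells."""
    total = cells[0] ^ cells[1] ^ cells[2] ^ cells[3]
    return tuple(total ^ c for c in cells)


def affine_table(cols, const):
    """Table of x -> L(x) + const, where L sends the i-th unit vector to cols[i]."""
    table = [0]
    for col in cols:
        table += [t ^ col for t in table]
    return [t ^ const for t in table]


def commutes_with_mix_column(cell_map):
    """Check M(A(x1), ..., A(x4)) == (A(y1), ..., A(y4)) for all 16^4 columns."""
    for cells in product(range(16), repeat=4):
        left = mix_column(tuple(cell_map[c] for c in cells))
        right = tuple(cell_map[c] for c in mix_column(cells))
        if left != right:
            return False
    return True


# Arbitrary affine permutation of a nibble (illustrative choice).
A = affine_table((0x2, 0x1, 0x4, 0x8), 0x5)
# Arbitrary non-affine 4-bit permutation, for contrast.
NON_AFFINE = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
              0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]

print(commutes_with_mix_column(A))           # affine map applied in parallel
print(commutes_with_mix_column(NON_AFFINE))  # non-affine map applied in parallel
```

The affine case succeeds for two reasons that mirror the two conditions discussed above: the blocks of the matrix are only 0 or $I$, so they commute with any linear part, and the constant vector $(c, c, c, c)$ is a fixed point of the column map because each output sums three copies of $c$.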

Searching for Probability-One Commutative Trails
By the discussion in Section 4.2, if the S-box $S$ has non-trivial linearity or differential uniformity, then all $\mathcal{A}$ and $\mathcal{B}$ that commute through the S-box layer $\mathcal{S}$ are (up to a permutation of the S-boxes) exactly the direct products of $A_1, \dots, A_\ell$ and $B_1, \dots, B_\ell$ such that $(A_i, B_i)$ commute through the S-box $S$, and finding all $(A_i, B_i)$ that commute through $S$ is an already-solved problem. Hence, we can reduce the problem of finding probability-one commutative trails to (efficiently) checking if there exists an arrangement of the $(A_i, B_i)$ such that $L \circ \mathrm{Diag}(A_{i_1}, \dots, A_{i_\ell}) = \mathrm{Diag}(B_{i_1}, \dots, B_{i_\ell}) \circ L$. As we have already seen in Section 4.3, this is equivalent to […] To prevent iterating all choices of $\mathrm{Diag}(A_{i_1}, \dots, A_{i_\ell})$, we note that for any […], which allows us to filter $A_i$ and $B_i$ using a divide-and-conquer approach, see Algorithm 1. Furthermore, if two rounds of the cipher operate on independent parts, which are often referred to as superboxes, then we can analyze those parts independently, as long as they themselves do have non-trivial linearity and differential uniformity.

Algorithm 1 Searching Two Round Commutative Trails
Require: S-box $S$ of size $m$, linear layer $L$, number of S-boxes $\ell = 2^d$
Ensure: All trails $T_1$ over $L$ that can be extended to trails over $S$
[…]
for $(B, A') \in T_i$ do ▷ Filter trails block wise
[…]

Note that we see any constant addition as part of the key schedule here, and that the trails only hold if the round keys are within the class of weak keys. Additionally, knowing all possible two-round trails enables us to combine them and exhaustively list all possible trails for any given number of rounds. We would like to mention that for most of the ciphers a sagemath [The23] implementation of Algorithm 1 takes less than a second to finish on a personal laptop (ignoring some pre-computation for setting up the linear layer). For the others, it still takes less than two minutes to calculate all possible commutative trails after the affine self-equivalences are found (which we do using the algorithm of Biryukov et al. [BDBP03]). The only outlier here is Rectangle, where the three non-trivial self-equivalences are found immediately, but the algorithm still takes around 30 minutes to finish. Nevertheless, the highest total execution time we have seen is for AES, which is still below 40 minutes.
We tested our algorithm on AES, Ascon, Boomslang, Craft, Gift-{64, 128}, iScream, Kuznyechik, LED, Mantis, Midori64, Pride, Prince, Present, Rectangle, Scream, Skinny-{64, 128} and Streebog. Out of the tested ciphers, we only find (non-trivial) trails over at least two rounds for Scream, as well as Mantis and Midori, both of which use the same 4-bit S-box and the same linear layer. These trails (over the superboxes) for the latter two are trails of the form […].
We will discuss it further, first in Section 6.1.1, and in Supplementary Material A.3.4. The second kind of trail is such that $(A, B) \in \{(A_2, A_3), (A_3, A_2)\}$ where […]. Note that all trails have the same class of weak keys. However, because the key schedule of Mantis is heavy (especially its dense round constants), the weak key space seems incompatible with the genuine scheduling. Thus, we focus our efforts on Midori/Vert whose constants are sparser. In the case of Scream, we show that the one (non-trivial) trail found (see Section 6.1.2) is compatible with its key schedule, and can propagate over an arbitrary number of rounds for $2^{80}$ out of the $2^{128}$ possible keys.

Applications
As we just established, Midori and Scream seem like promising targets for attacks leveraging commutative behaviours. For Midori, the commutative patterns briefly mentioned above and presented in Section 6.1.1 are in line with the observations made on Vert using a different framework, which are depicted in Supplementary Material A.1. Naturally, in this section, we study attacks based on commutative trails for Midori/Vert and Scream in more detail. Our effort is here focused on distinguishers. More specifically, in the case of Vert, we can compare our results to the complexity estimates obtained using classical wide-trail strategy arguments, as given in the specification paper of Midori. Those arguments do not take the constants into account, and would therefore hold for any Vert variants. Our results show one of the limits of such an argument. Indeed, despite the correctness of this argument, we will establish the following properties:
• for $2^{96}$ out of $2^{128}$ keys, there exists a probability-1 commutative trail covering an arbitrary number of rounds in $\mathrm{Vert}^2_{\mathrm{SC}}$ (and $\mathrm{Vert}^2_{\mathrm{SR}}$),
• for the same keys, this pattern implies a truncated differential covering an arbitrary number of rounds with probability $2^{-16}$, and
• for $2^{120}$ out of $2^{128}$ keys, there exists for $\mathrm{Vert}^2_{\mathrm{SR}}$ a commutative trail with probability $2^{-4r}$ over $r$ rounds, which is essentially the square root of the wide trail bound.
These properties are summarized in Figure 2. We first present in Section 6.1 the probability-one results for Vert, as well as for Scream. A complementary probability-one distinguisher for Grün is presented in Supplementary Material B.3. Then the probabilistic behaviors are presented in Sections 6.2 and 6.3.

Probability-one Trail in Vert
We consider here A 1 (and A 1 ), where A 1 is the mapping introduced in Section 5 and detail the distinguisher we obtain from it.
First, one can easily verify that $A_1 \circ S = S \circ A_1$. Then, according to Section 4.3, Condition 1 of Theorem 1 is already satisfied by the binary Midori MixColumn. So is Condition 2, because any vector $(c, c, c, c)$ with $c \in \mathbb{F}_2^4$ is a fixed point of $M$. This immediately implies that $c_{A_1} \in \mathrm{Fix}(M)$, and thus $L \circ A_1 = A_1 \circ L$. Finally, according to Corollary 1, because $A = B = A_1$ here, there exist 1-weak keys and they correspond to the fixed points of $L_{A_1}$. Overall, with a one-bit condition for 16 nibbles of both halves of the key (i.e. for $2^{96}$ keys out of $2^{128}$), we get a distinguishing self-similarity property for Vert 2 SC (and Vert 2 SR ) because 2 ∈ V : This weak-key space is, to the best of our knowledge, new. Yet it is striking to observe the similarities with weak-key spaces already present in the literature. Indeed, the non-linear invariant attack from Todo, Leander & Sasaki [TLS19] on Vert 5 SC also works on Vert 2 SC , and has $2^{64}$ weak keys: $\langle \mathtt{0x2}, \mathtt{0x5} \rangle^{32}$. The same holds for the nonlinear invariant presented by Beyne [Bey18, Bey20], which works for Vert 2 SC , given that 16 or K 1 ∈ ⟨0x2, 0x8⟩ 16 }. This naturally raises the question of how to establish a unified framework to look at all these sets as one.
However, it is also important to note that A 1 , and thus A 1 , have no fixed point.In that case, the invariant subspace obtained from the fourth item of Lemma 1 is empty.With this in mind, and compared to invariant subspaces distinguishers, such an affine-self-similarity relation appears to be fundamentally different, and in this particular case, stronger.
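The two facts used above, that $A_1$ commutes with the S-box and that it has no fixed point, can be checked exhaustively at the nibble level. The following is only a sketch under stated assumptions: it uses the nibble-level description of $A_1$ given later in this section ($x \mapsto x + \mathtt{0xf}$ on $V$, $x \mapsto x + \mathtt{0xa}$ elsewhere), takes $V = \langle \mathtt{0x2}, \mathtt{0x5}, \mathtt{0x8} \rangle$ as in the supplementary material, and hard-codes the 4-bit S-box as we recall it from the Midori specification. The table and all helper names are assumptions and should be checked against the specification.

```python
# Illustrative check only; helper names and the S-box table are assumptions.

# 4-bit S-box Sb0 of Midori64, as recalled from the Midori specification.
SBOX = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
        0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]


def span(generators):
    """Return the F_2-linear span of the given nibbles."""
    s = {0}
    for g in generators:
        s |= {x ^ g for x in s}
    return s


V = span([0x2, 0x5, 0x8])


def a1(x):
    """A_1 on one nibble: add 0xf on V, add 0xa outside V."""
    return x ^ (0xF if x in V else 0xA)


# A_1 must be a permutation for the commutation to make sense.
is_permutation = sorted(a1(x) for x in range(16)) == list(range(16))
# Exhaustive check of A_1 o S = S o A_1 on all 16 nibbles.
commutes = all(a1(SBOX[x]) == SBOX[a1(x)] for x in range(16))
# A_1 has no fixed point, so the associated invariant subspace is empty.
fixed_points = [x for x in range(16) if a1(x) == x]

print(is_permutation, commutes, fixed_points)
```

With the table above, all three checks are consistent with the text: the map is a permutation, it commutes with the S-box, and the list of fixed points is empty.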
Remark 1. Regarding A 2 and A 3 defined in Section 5, they verify ), so the same distinguisher applies in that case, except that the trail will be an alternating one:

Weak Keys in Scream
Scream [GLS+15] is a 128-bit-state and 128-bit-key tweakable block cipher of the LS-design category. Its 128-bit state can be viewed as an 8 × 16 matrix. The S-box layer consists in applying a unique 8-bit S-box in parallel on each column, while the linear layer consists in applying a unique 16-bit linear permutation (called L-box) on each row. At each round, round constants are added to the first row of the state, and the key is added to the state (the tweak, which we consider to be equal to 0 here, is added on the first 4 rows). For further details, we refer to the CAESAR competition [CAE14] submission document.
Using our tweaked version of the algorithm of Biryukov et al., we found out that the 8-bit affine permutation A 4 , that is defined below, commutes with the Scream S-box.
We further observe that $\mathrm{Fix}(L_{A_4}) = \langle \mathtt{0x01}, \mathtt{0x10}, \mathtt{0x20}, \mathtt{0x40}, \mathtt{0x80} \rangle$. As the round constants are added on the least significant row and as 0x01 (and 0x00) belongs to $\mathrm{Fix}(L_{A_4})$, we immediately observe that the round constants belong to $\mathrm{Fix}(L_{A_4})$, and thus are 1-weak constants, according to Corollary 1. Then, as observed in Section 4.3, Condition 1 of Theorem 1 is verified for $A_4$, because $L$ is only made of null and identity matrix blocks. Finally, $c_{A_4} = \mathtt{0b00100001}$, so over the whole state $c_{A_4}$ is composed of two all-1 rows and six all-0 rows. But one can easily verify that the all-1 vector (and obviously the all-0 one) is a fixed point of the L-box of Scream, as the columns of its matrix given in [GLS+15] all have an odd Hamming weight. This means that $c_{A_4}$ is a fixed point of $L$. Applying the same reasoning as for Vert 2 SC , we obtain a probability-one distinguisher for $2^{128 - 3 \times 16} = 2^{80}$ keys because of the sixteen 3-bit conditions.
Unlike our attacks against Vert, this one can be applied to the "real" primitive without modifying its key schedule.However, the weak keys we obtain are a strict subset of those obtained in [TLS19], where the non-linear invariant attack they mount works provided that 2 rows of the key are constrained (while we need to constrain one more).

From Commutative Patterns to Very High Probability Differentials
Let us take a step back to Vert and look more carefully at $A_1$. We can see that for any $x \in V$, we get $A_1(x) = x + \mathtt{0xf}$, and for any $x \in \mathbb{F}_2^4 \setminus V$, $A_1(x) = x + \mathtt{0xa}$. This means that $x + A_1(x) \in U$, where $U := \{\mathtt{0xa}, \mathtt{0xf}\}$, each equality holding with a probability of 1/2. As a consequence, by picking a random state $x \in \mathbb{F}_2^{64}$ and looking at the pair $(x, x + \mathtt{0xf}^{\times 16})$, we actually have a pair of the form $(x, A_1(x))$ with a probability of $2^{-16}$. However, as we established, $A_1$ commutes with any number of rounds of Vert (provided that the constant and key nibbles are all in $V$). This means that the final difference is necessarily of the form $y + A_1(y)$, and thus has to lie in the small set $U^{16}$. As a consequence, for the $2^{96}$ weak keys in $V^{32}$ there exists a truncated differential of the form $\mathtt{0xf}^{\times 16} \to U^{16}$ that has probability $2^{-16}$ over an arbitrary number of rounds! We have experimentally verified this surprising property.
This observation raises multiple striking points. First of all, the cost function of such a truncated differential is remarkable: it is independent of the number of internal rounds and only depends on the cost of the first round. Moreover, as we also know that any internal difference on any nibble has the form $x + A(x) \in U$, we are also assured that all the S-boxes of each round will be differentially active because $0 \notin U$. The discrepancy between the bound on the differential probability obtained via a wide-trail argument and our result is illustrated in Figure 2. It is not only yet another example of how much the fixed-key behavior can deviate from the expected average computed with standard wide-trail strategy arguments, but also a high-probability differential whose probability is independent of the number of rounds.
The same actually happens when the genuine Scream is used with a 1-weak key. Indeed, $I + A_4$ can take only eight values; we denote by $U'$ this set of values, and let $\alpha \in U'^{16}$. In that case, for a random plaintext $x$, the pair $(x, x + \alpha)$ has a probability of $8^{-16} = 2^{-48}$ to coincide with $(x, A_4(x))$; the corresponding truncated differential $\alpha \to U'^{16}$ thus has a probability of $2^{-48}$, independently of the number of rounds. Because $0 \notin U'$, we are again assured that all the S-boxes will be differentially active.
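The counting argument behind the $2^{-16}$ figure can be reproduced exactly at the nibble level. The sketch below is illustrative and relies on assumptions: it takes $V = \langle \mathtt{0x2}, \mathtt{0x5}, \mathtt{0x8} \rangle$ from the supplementary material, models $A_1$ exactly as described in the paragraph above, and uses helper names of our own choosing.

```python
# Illustrative sketch; V and the helper names are assumptions.
from fractions import Fraction


def span(generators):
    """Return the F_2-linear span of the given nibbles."""
    s = {0}
    for g in generators:
        s |= {x ^ g for x in s}
    return s


V = span([0x2, 0x5, 0x8])


def a1(x):
    """A_1 on one nibble: add 0xf on V, add 0xa outside V."""
    return x ^ (0xF if x in V else 0xA)


# Set of nibble differences x + A_1(x); it never contains 0.
U = {x ^ a1(x) for x in range(16)}

# Probability that a uniform nibble x satisfies x + A_1(x) = 0xf.
p_nibble = Fraction(sum(1 for x in range(16) if x ^ a1(x) == 0xF), 16)

# Probability that (x, x + 0xf^{x16}) is of the form (x, A_1(x)) on 16 nibbles.
p_state = p_nibble ** 16

print(sorted(U), p_nibble, p_state)
```

Since every nibble difference lies in $U$ and $0 \notin U$, every S-box is differentially active, as stated in the text.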

Probabilistic Trails through the Linear Layer
As we have seen, the size of the weak-key space is driven by the number of commutatively active S-boxes. It is tempting to try to use a partial A instead of the full one, as we have done so far. Indeed, decreasing the number of active S-boxes will limit the number of constraints on the key and thus increase the number of weak keys. However, this change imposes a counter-intuitive concept: that of active linear layers. Indeed, such a partial affine permutation does not commute with matrix multiplications with probability one.
Let $i, j \in \{0, \dots, 15\}$ with binary decompositions $i = \sum_{t=0}^{3} i_{t+1} 2^t$ and $j = \sum_{t=0}^{3} j_{t+1} 2^t$. We denote $A^0 := I$ and $A^1 = A$ and define $\mathcal{A}_i := \mathrm{Diag}(A^{i_1}, A^{i_2}, A^{i_3}, A^{i_4})$. We study $M \circ \mathcal{A}_i(x) = \mathcal{A}_j \circ M(x)$ by developing it, and observe that it is equivalent to the following equality: , where L A = L A + I and $\delta_{u,v}$ is the complement of the Kronecker delta: $\delta_{u,v} = 1$ if $u \neq v$ and 0 otherwise.
As we can see, should the right-hand side not be in the image of B(i, j), then the transition A i M − → A j would be impossible.Otherwise, the number of x that satisfy this relation is given by the size of the kernel of B(i, j), and can be deduced from the size of its image.The dimension of this image is given in Table 1.
Table 1: Dimension of Im(B(i, j)).All entries must be multiplied by dim Im( L A ).When written in bold, the commutation holds for all c A , otherwise, we need c A ∈ Im(I + L A ).

Application to Vert 2 SR
The counterpart of choosing a partial affine layer is the necessary handling of the cell permutation, which could be ignored so far. Because the ShuffleCell permutation is stronger than the ShiftRows permutation, what follows only applies to Vert 2 SR . The square activity pattern (see Equation 3), which has already been used for instance against PRINCE [CFG+15], is preserved by the classical ShiftRows.
As a consequence, if we consider commutants that are inactive everywhere (the identity mapping) except on these nibbles, which are all activated with the same mapping $A$, then we can build iterated commutative trails. As just established, commutation with the ShiftRows operation holds with probability one. In order to go through the S-box layer with probability one, we use the already-introduced affine permutation $A_1$ on the active nibbles, and denote by $\mathcal{A}_1$ the corresponding partially-active mapping. Then, in order to study the transition $\mathcal{A}_1 \xrightarrow{M} \mathcal{A}_1$ for the whole state, we first examine, thanks to Table 1, the transition for a single partially-active column. This activity pattern corresponds to the case $i = j = 5$ and $\dim(\mathrm{Im}(L_A)) = 1$: in that case $\mathrm{Im}(B(i, j))$ has dimension 2, so the transition holds with probability $2^{-2}$ on one column. Then, the transition over the whole state has probability $(2^{-2})^2 = 2^{-4}$ because of the two active columns. Thus, assuming independence of the rounds, Vert 2 SR has commutative trails with probability $2^{-4r}$, where $r$ is the number of full rounds (rounds with $M$ involved). This probabilistic behaviour is counter-balanced by the size of the weak-key space. Indeed, because the square activity pattern only involves four active nibbles, only the corresponding nibbles of the key need to be constrained (they must belong to Fix(A 1 )). A significantly bigger weak-key space is thus obtained: with four 1-bit constraints on each half of the key ($K_0$ and $K_1$), the number of $2^{-4r}$-weak keys becomes $2^{128 - 4 \times 2} = 2^{120}$.
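The weak-key counts quoted in this section and in the supplementary material all follow the same arithmetic: each constrained cell removes a fixed number of key bits. As a minimal sketch, assuming the constraints are independent linear conditions (the function and parameter names are ours, not the paper's), the exponents can be recomputed as follows.

```python
# Illustrative arithmetic only; function and parameter names are assumptions.

def log2_weak_keys(key_bits, active_cells, bits_per_cell, key_parts):
    """Return log2 of the number of weak keys.

    Each of the `active_cells` cells imposes `bits_per_cell` independent
    one-bit conditions on each of the `key_parts` round-key parts.
    """
    return key_bits - active_cells * bits_per_cell * key_parts


# Full A_1 layer on Vert: 16 nibbles, 1 bit each, on K_0 and K_1.
vert_full = log2_weak_keys(128, 16, 1, 2)
# Square pattern on Vert (ShiftRows variant): 4 nibbles, 1 bit each, on K_0 and K_1.
vert_square = log2_weak_keys(128, 4, 1, 2)
# Scream: sixteen 3-bit conditions on the single 128-bit key.
scream = log2_weak_keys(128, 16, 3, 1)
# Gruen: sixteen 4-bit conditions on the single 128-bit key.
gruen = log2_weak_keys(128, 16, 4, 1)

print(vert_full, vert_square, scream, gruen)
```

The four values match the exponents 96, 120, 80 and 64 stated in the text.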

Experimental Results
These high probabilities make it possible to test our distinguishers thoroughly by experiment.

Experiment 1
We picked uniformly at random (weak key, plaintext) pairs $(K, x)$ and verified whether , where $E$ stands for a round-reduced Vert 2 SC or Vert 0 SC . This estimates the probability of the hull but not that of the trail, which is the one mentioned in the previous section. As experimenters with full access, we also focus on the trail by studying Ri We repeated the draw (1 key and 1 plaintext) $2^{36}$ times, expecting an average number of solutions of $2^{36 - 4(r-1)}$ for the $r$-round version. As we can see in Figure 3, the behaviour of the experimental average is more intricate than one might first think. In the weak-constant setting and as the rounds go, the experimental average of the trail seems to slowly decrease compared to the theoretical average. However, the behavior of the hull stays really close to the theoretical average for the trail. This seems to indicate the not-so-surprising fact that the round independence hypothesis is probably too strong in some cases at the trail level. However, even if the dominance of the trail among the hull slowly vanishes, the hull effect becomes stronger and compensates for this drop. The difference between the null constants (a particular class of weak constants) and the weak ones is also pointed out by Figure 3. With no round constant, no asymmetry is introduced at each round. This could be the reason why, in that scenario, if a pair $(x, A(x))$ goes through a few rounds, it has a higher probability of going further.

Experiment 2
We also studied the fixed-weak-key setting, for multiple round-reduced versions. We picked uniformly at random a weak key and a set of plaintexts $(K, X)$, and observed whether A • Vert 2 SC (K, x) For a fixed weak key, we drew $2^{4(r-1)+6}$ plaintexts, hoping for an average of $2^6$ solutions. We repeated the experiment for 10000 weak keys, except for $r = 7$ for which we used 6000 weak keys. Naturally, the average from Experiment 2 goes in the same direction as Experiment 1: as the rounds increase, the trail average moves away from the theoretical one while the hull average stays much closer. What really appears in Figure 4 is the fact that the average case taken over "all" weak keys and "all" plaintexts for the trail is not as representative as we could expect: the probability $p = 2^{-4r}$ seems appropriate for $r = 3$; however, as $r$ grows, it seems that $p$-weak keys are rather $p'$-weak keys where $p'$ can take a palette of values.
For the hull, the distribution of $p'$ seems to flatten as $r$ grows and tends to a uniform distribution over $[0, 1]$. In particular, it is unclear why about half of the tested weak keys appear to be actually strong, while some others are weaker than expected. An experiment studying the strongest keys is provided in Supplementary Material C. As just shown, the basic model seems to work well enough to estimate the average probability and effectiveness of our distinguisher. However, it fails at explaining precisely the all-A trail. Explaining the observed clustering, and understanding the sub-classes among the weak keys, are two of the many open questions raised by our experimentation.
Regarding the fixed points, $L_{A_6}$ has 8 and $L_{A_7}$ has 4. It also holds that $\Gamma_S(A_7, A_6) = 12$. We can reuse our previous framework to mount an alternating commutative trail based on the square pattern, as well as the corresponding distinguisher. In that case, the probability of going through the S-box layer is estimated as $2^{4 \log_2(12/16)}$, as 4 S-boxes are activated. The probability of going through a full round should thus be $2^{4(\log_2(12/16) - 1)}$, and the theoretical average mentioned in Figure 5 is computed as $2^{4(\log_2(12/16) - 1) r + 4 \log_2(12/16)}$, because of the final round where the linear layer is omitted.
However, there is a significant divergence between our initial estimate and our experimental results: the probability of commuting is higher than expected under the assumption that S-box and linear layer transitions are independent. To get a better picture, we computed the probability of having where $\mathcal{A}_r$ is a partial layer involving $A_r$, where $\{i, j\} = \{6, 7\}$ and $r \in \{i, j\}$. Checking all $2^{16}$ possible inputs, we have that the two transitions happen with probabilities $2^{-5.6}$ (which is coherent with our estimate), and $2^{-8.8}$ (instead of $2^{-9.6}$). Understanding the rest of the difference is an interesting open problem, but we expect dependencies between the round probabilities.
We also found $A_8 = \{4, 5, d\}$, which is such that $A_8 \circ S(x) = S \circ A_8(x)$ holds with probability 10/16. $L_{A_8}$ has 8 fixed points, but in that case 1 is among them, so the genuine constants of Midori can be used.
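The estimate above is plain arithmetic on base-2 logarithms, and it can be recomputed with a few lines of code. The sketch below is illustrative: the function names are ours, and the per-round cost of $2^{-4}$ for the linear layer is taken from the square-pattern analysis of the previous subsection.

```python
# Illustrative arithmetic; function names are assumptions.
from math import log2

ACTIVE_SBOXES = 4          # square activity pattern
SBOX_PROBABILITY = 12 / 16  # Gamma_S(A_7, A_6) / 16
LOG2_LINEAR_LAYER = -4.0   # cost of M on the square pattern (two active columns)


def log2_sbox_layer():
    """log2 of the probability of crossing one S-box layer."""
    return ACTIVE_SBOXES * log2(SBOX_PROBABILITY)


def log2_full_round():
    """log2 of the probability of crossing one full round (S-box layer and M)."""
    return log2_sbox_layer() + LOG2_LINEAR_LAYER


def log2_trail(r):
    """log2 of the trail probability: r full rounds plus a final S-box layer."""
    return r * log2_full_round() + log2_sbox_layer()


print(round(log2_sbox_layer(), 2), round(log2_full_round(), 2), round(log2_trail(3), 2))
```

The full-round value is about $-5.66$, in line with the exhaustively measured $2^{-5.6}$ reported for one of the two transitions.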
In both of these cases, the probability of going through a round-reduced version of Vert SR is smaller than in Section 6.2.2, because of the new cost imposed by the probabilistic crossing of the S-box layer.The weak-key space, in the first example, is also smaller.While a priori less impressive than the results based on probability-one S-box transitions, this probabilistic case opens some very interesting open problems.Indeed, while we can quickly rule out the applicability of the probability-one case by ensuring the absence of non-trivial commuters for the S-box, there is no way at this stage to efficiently compute the A-uniformity of an S-box operating on more than 4 bits.Thus, we cannot be sure that primitives using 8-bit S-boxes are safe from non-probability-one commutative cryptanalysis.

High Probability Commutants in the AES Super-S-Box
The AES [AES01] is a 128-bit block cipher, arguably the most important primitive in symmetric cryptography due to its wide use and well-trusted security. As shown by Gilbert and Peyrin in [GP10], two rounds of this primitive can be seen as the application of a layer of Super S-boxes followed by an affine layer. Here, we focus on said Super S-boxes: they are permutations of $(\mathbb{F}_2^8)^4$ obtained by composing:
1. a layer of four parallel S-boxes $S : x \mapsto C(\mathrm{Inv}(x))$, where $C$ is an affine permutation (we shorten $c := c_C$) and $\mathrm{Inv}$ is the multiplicative inversion in $\mathbb{F}_{256}$,
2. a key addition,
3. a multiplication of the internal state by the MC matrix operating on $\mathbb{F}_{256}^4$, and
4. another application of the S-box layer.
We denote by $\mathrm{Mult}_\lambda$ the multiplication by $\lambda$ in $\mathbb{F}_{256}$, and recall that Since $S$ is essentially a monomial, it is tempting to investigate its commutative behaviour with a multiplication as input.
Lemma 2. Let $G = B \circ F \circ A$ be a permutation of $\mathbb{F}_{2^m}$, where $A$ and $B$ are affine permutations, and where $F : x \mapsto x^d$ is a power permutation of $\mathbb{F}_{2^m}$. Then, for any and deduce the lemma.
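The core fact behind the lemma is that a power map turns a multiplication by $\lambda$ at the input into a multiplication by $\lambda^d$ at the output. For the AES, $d = 254$, i.e. the inversion, so that $\mathrm{Inv}(\lambda x) = \lambda^{-1}\,\mathrm{Inv}(x)$; the affine layer $C$ then only conjugates the output multiplication. The sketch below checks the inversion identity exhaustively. It is illustrative only: the helper names are ours, and the field is represented with the AES polynomial $x^8 + x^4 + x^3 + x + 1$.

```python
# Illustrative sketch; helper names are assumptions, not taken from the paper.

AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gf_mul(a, b):
    """Multiply two elements of F_256 (shift-and-add with modular reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return r


def gf_pow(a, e):
    """Raise a to the power e by repeated multiplication."""
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r


# Inversion as the power map x -> x^254, with Inv(0) = 0.
INV = [gf_pow(a, 254) for a in range(256)]


def inversion_commutes():
    """Check Inv(lam * x) == Inv(lam) * Inv(x) for all lam != 0 and all x."""
    return all(INV[gf_mul(lam, x)] == gf_mul(INV[lam], INV[x])
               for lam in range(1, 256) for x in range(256))


print(inversion_commutes())
```

The check covers $x = 0$ as well, since both sides are then equal to 0.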
For the AES S-box, we obtain that S • Mult λ = Mult C 1/λ • S, for any λ ∈ F 256 \ {0}.Since the linear layer is a simple matrix multiplication, it commutes with four Mult λ applied in parallel.As a consequence, we investigate commutative trails of the form where all transitions have probability 1 except for Mult C λ T k −→ Mult µ .We consider that λ ̸ = 1, otherwise we only get a trivial result.Applying Proposition 1 we deduce that the set of p-weak keys is of size . The set ker(I + Mult µ ) is trivial because µx = x is equivalent to x = 0 as µ ̸ = 1.This implies that Im(I + Mult µ ) is the full field.We deduce that the number of p-weak keys is equal to −→ Mult µ , and if so what the value of p is.For all λ ̸ ∈ {0, 1}, there exists at least one µ such that keys that are either 2 −5 -, 2 −4 -or 2 −2 -weak for the corresponding commutation.More precisely, there are 68 values of λ for which the weakest keys are 2 −5 -weak, 180 for which they are 2 −4 -weak, and 6 for which they are 2 −2 -weak.
We can look at this property from another angle: what is the probability that a random 32-bit key is at least $p$-weak for at least one commutation, thus yielding a commutation for the full super S-box? This requires that all bytes of the key be weak for the same pair $(\lambda, \mu)$. To estimate this probability, we sampled $2^{22}$ uniformly random 32-bit keys $K$, and then checked if there exists a pair $(\lambda, \mu)$ such that all its 8-bit cells are $p$-weak for it. Out of the $2^{22}$ keys $K$ we looked at, there existed a pair $(\lambda, \mu)$ such that $K$ is at least $2^{-16}$-weak in 6430 cases, meaning a density of $2^{-9.35}$. We observed $2^{-20}$-weakness in 196 089 cases, hence a density of about $2^{-4.42}$ for this weak-key space. We also computed directly the number of $2^{-8}$-weak keys: there are 6 transitions for which such keys exist, and 4 key-byte values allow each of them; hence, there are $6 \times 4^4 \approx 2^{10.58}$ very weak keys, i.e. a $2^{-21.42}$ density.
These high probabilities show that commutative cryptanalysis is a powerful technique. Indeed, as established by Keliher & Sui [KS07], the maximum expected differential probability for the AES super S-box is $53/2^{34} \approx 2^{-28.27}$, which is much lower than the worst probabilities we considered. However, commutative cryptanalysis only works for weak keys, but, as we established, this set can be of much higher density than we might expect.
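The densities quoted above are simple ratios, and converting them to base-2 logarithms is a useful sanity check. The following sketch only redoes this arithmetic from the counts given in the text; the variable names are ours.

```python
# Illustrative arithmetic on the counts reported in the text.
from math import log2

SAMPLES = 2 ** 22  # number of random 32-bit keys tested

# Keys that are at least 2^-16-weak for some pair (lambda, mu).
density_16 = log2(6430 / SAMPLES)
# Keys that are at least 2^-20-weak for some pair (lambda, mu).
density_20 = log2(196089 / SAMPLES)

# 2^-8-weak keys: 6 transitions, 4 admissible values for each of the 4 key bytes.
very_weak_keys = 6 * 4 ** 4
log2_very_weak = log2(very_weak_keys)
density_8 = log2_very_weak - 32  # relative to all 2^32 keys

print(round(density_16, 2), round(density_20, 2),
      very_weak_keys, round(log2_very_weak, 2), round(density_8, 2))
```

The rounded values agree with the exponents given in the paragraph above.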

Conclusion and Future Work
Our revisiting of commutative cryptanalysis, an idea that dates back almost 20 years, provides a rich structure and solid foundations for understanding and applying this approach concretely. While it allows us to explain interesting phenomena in a compact and unifying way, it also leaves many open questions, possibilities for improvements and paths to follow.
Beyond the high probability patterns we found, the existence of weak-key truncated differentials in a Midori variant and in Scream whose probability is independent from the number of rounds, and does not correspond to any trail, challenges how usual security arguments are built.Indeed, a differential pattern over r rounds is not necessarily obtained by concatenating several high probability patterns over fewer rounds.This implicit but very common assumption should be used with care.
From an application point of view, as mentioned above, it is most important (and challenging) to develop algorithms that make it possible to compute the A-uniformity in cases where it is close to maximal. This would then potentially allow us to find good, but not probability-one, commutative trails for large classes of ciphers, likely discovering new attacks, at least in weak-key settings. Moreover, the ciphers we considered so far all use a very simple key schedule. Understanding the influence of more complex key-scheduling algorithms is left open. A closely related but twisted question is whether it is possible to hide the existence of a highly probable commutative trail, potentially in larger than usual S-boxes. This might be a way of putting backdoors in otherwise secure ciphers that are very hard to find, even if the general principle were known. Finally, similarly to the case of c-differentials, and even better motivated by attacks, studying A-uniformity from a Boolean function point of view seems rewarding. Concrete questions include the discussion of bounds on the A-uniformity and the construction of families of permutations with either maximal or minimal A-uniformity.
may also have other highly-undesirable properties, such as almost-one linear approximations for some weak keys.
If previous works have thoroughly studied the linear case, the differential one has been (to the best of our knowledge) left out until now.As we show in the next section, the differential properties of some particular conjugates of a cipher can be expressed in the commutative framework.The experimental results that follow, while being a little bit more general, assess the soundness of such a relationship between commutation and conjugation.

A.2 Relationship with Commutative Cryptanalysis.
For a permutation F n 2 → F n 2 , we have that We rewrite it as: Under the assumption that G is a permutation such that T (G −1 ) α and T (G −1 ) β are affine, we thus obtain: where . In particular, α → β holds with probability one through F G if and only if A and B commute through F .A particular case of the situation we just described actually happens for Vert, as we will show below.

A.3.1 The conjugation framework
Let us consider a composition F = R r • ... • R 1 , where all functions R i map F n 2 to itself.A conjugate of F can be obtained by composing conjugates of its subfunctions for the same permutation.Indeed, by interleaving G −1 • G within the computation of F G , we can rewrite We investigate the 4 main operations in Midori (and Vert).First, a permutation G applied in parallel at the S-box level commutes with the cells permutation, be it ShuffleCell or ShiftRows.Thus, if we denote P a cells permutation, we have that G • P • G −1 = P.
Recall that the MixColumn layer M consists of the parallel application of M over the four columns of the state. Thus, it does not provide any intra-nibble mixing, and simplifications can be hoped to occur in G • M • G −1 provided that G (and thus G) is sparse enough, which is indeed the case, as we will see.
Usually ignored in statistical attacks, the key addition plays an important role here. Indeed, studying the differential behavior through $T_c^G$ is not as simple as through a standard key addition because the key dependency within it can be non-linear! In order to handle transitions through $T_c^G$, it is important to keep it as simple as possible. Finally, regarding the S-box layer, we can restrict our investigation to the nibble level, which allows us to brute-force a rather large space of candidates.

A.3.2 Our space of conjugates.
Given the analysis sketched above and the necessity for G to be simple and sparse enough, we investigate the conjugates of the S-box S of Midori through changes of variables containing a single quadratic coordinate, as they are the simplest non-linear changes of variables one can think of. Because a balanced quadratic Boolean function in m variables is linearly equivalent to a function of the m−1 , • • • , x 2 ) + x 1 (see Propositions 55 and 28 of [Car20]) we chose at first to study Feistel-like permutations (in fact, involutions) of the form: where $g$ is a quadratic Boolean function. This however induces a restriction in our search space: only the 1-component can be non-linear. To solve this, we compose our change of variables with a linear permutation which makes it possible to move the 1-component into the $a$-component, for any value of $a \in \mathbb{F}_2^n$. The only constraint for such a linear permutation $L_a$ is that $L_a(1) = a$. We thus look at conjugates of the form , where, in our case, we deterministically built $L_a$ starting from a chosen $a$: once the image of 1 is fixed, only the images of 2, 4, 8 need to be chosen (as (1, 2, 4, 8) is the standard basis of $\mathbb{F}_2^4$ and $L_a$ is linear). So starting from 2 to 8, we selected as image the smallest integer such that the rank of the partial list of images is increased by one, until obtaining the images for the full basis.
All in all, we focused on the class of permutations $G_{a,g} := G_g \circ L_a^{-1}$. Regarding the practical search, this space is sufficiently constrained to be efficiently explored in practice, as it consists of $2^m - 1$ choices for $a$ multiplied by $2^{2^{m-1}}$ Boolean functions mapping $m - 1$ bits to 1 bit, meaning about $2^{12}$ possibilities (in our case $m = 4$).
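The deterministic construction of $L_a$ and the size of the search space can be reproduced in a few lines. The sketch below follows the greedy rule described above (for each remaining basis vector, pick the smallest integer that increases the rank of the list of images); the function names and the dictionary representation of $L_a$ are our own choices and are only illustrative.

```python
# Illustrative sketch of the greedy construction; names are assumptions.

def build_la(a, m=4):
    """Return the images of the standard basis 1, 2, ..., 2^(m-1) under L_a."""
    images = {1: a}
    span = {0, a}  # linear span of the images chosen so far
    for t in range(1, m):
        # Smallest integer outside the current span, i.e. increasing the rank.
        v = next(c for c in range(1, 1 << m) if c not in span)
        images[1 << t] = v
        span |= {s ^ v for s in span}
    return images


def apply_linear(images, x):
    """Evaluate the linear map defined by the images of the basis vectors."""
    y = 0
    for basis_vector, image in images.items():
        if x & basis_vector:
            y ^= image
    return y


la5 = build_la(5)
is_permutation = sorted(apply_linear(la5, x) for x in range(16)) == list(range(16))

# Search space: 2^m - 1 choices for a, times 2^(2^(m-1)) Boolean functions g.
m = 4
search_space = (2 ** m - 1) * 2 ** (2 ** (m - 1))

print(la5, is_permutation, search_space)
```

For $a = 5$, the rule yields the images 5, 1, 2 and 8 for the basis vectors 1, 2, 4 and 8, which matches the values used in Supplementary Material A.3.4; the search space has 3840 elements, slightly below $2^{12}$.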
Yet being quite small, this class contains interesting conjugates.Indeed, when used in parallel, such sparse permutations yield simple conjugates for M .Furthermore, the conjugates of constant additions have the intended very simple shape: We further see that where c ′ = (c m , • • • , c 2 ) and ∆ c ′ g denotes the (first-order) derivative of g toward c ′ .Finally, we obtain T and as g is quadratic, we observe that G ∆ c ′ g and thus T G −1 a,g c are affine.This corresponds to the assumption we gave for Equation 5 to hold; it will be needed for the commutative interpretation in Supplementary Material A.3.4.
In the same way we observe that, T •G g is also affine.This corresponds to our desire to keep the conjugate of key addition as simple as possible to enable an easier study of the key dependency.
Let us now present a differential study of a conjugate cipher, built from such permutations G a,g .
For the remainder of this section, we denote $G = G_{5,\, x_2 + x_2 x_4}$ for lighter notation. We also denote $S' := S^G$.
First, we can easily observe that 0xd → 0xd holds with probability one through $S'$ (for instance $S'(\mathtt{0x0}) + S'(\mathtt{0xd}) = \mathtt{0xb} + \mathtt{0x6} = \mathtt{0xd}$ and so on). It immediately yields a probability-one γ → γ transition through the conjugate of the S-box layer S ′ := S ′×16 , where γ := 0xdddddddddddddddd. Furthermore, we observe that 0xdddd → 0xdddd holds with probability one through M G ×4 , which ultimately leads to the probability-one transition γ → γ through M G , the conjugate of the full MixColumn layer. Perhaps counter-intuitively, while this differential goes through both the S-box and the linear layer with probability one, its interaction with the key addition is more sophisticated. As expected with such a choice for $G = G_{a,g}$ (the Boolean function $g$ is a quadratic function), $T_k^G$ is affine, so once a key is fixed, the derivative in any direction is constant. This means that, depending on the key $k$, any transition α → β either holds with probability 0 or 1 through $T_k^G$. We can thus easily establish the set $V$ of key nibbles for which α → β holds with probability one, by looking at the equation for $x = 0$, namely: m}, is equal to $c_i + (c_i + \alpha_i) = \alpha_i$, while the first one is equal to $\alpha_1 + g(0) + g(c') + g(\alpha') + g(c' + \alpha')$, where $c'$, $\alpha'$ correspond to $c$ and $\alpha$ where the first coordinate is omitted. Replacing $c$ by $L_a^{-1}(k_4, k_3, k_2, k_1) = (k_4, k_2, k_1 + k_3, k_3)$, and using that $g(x) = x_2 + x_2 x_4$, we thus obtain that Finally, we are able to determine the set of keys $V$ for which 0xd → 0xd holds with probability one through $T_k^G$, by solving 1 = 1 We easily obtain $V = \{k \in \mathbb{F}_2^4 : k_1 + k_3 = 0\} = \langle \mathtt{0x2}, \mathtt{0x5}, \mathtt{0x8} \rangle$. To conclude, provided that the round key lies in $V^{16}$, there exists a probability-one differential γ → γ for the conjugate of one round of Vert 2 SC or Vert 2 SR . This is true for any number of rounds if both $K_0$ and $K_1$ are in $V^{16}$. It is also true for any variant of Vert where each constant nibble is taken in $\langle \mathtt{0x2}, \mathtt{0x5}, \mathtt{0x8} \rangle$, i.e. each nibble can take 8 of the 16 values.
Remark 2. This does not apply to the genuine Midori because $1 \notin V$.
Any such change of variables $G$ (resp. conjugate cipher) can be used as a distinguisher: given oracle access to Vert 2 SC or Vert 2 SR , it is sufficient to choose $p_1 = G^{-1}(p)$ and $p_2 = G^{-1}(p + \gamma)$, to ask for the corresponding ciphertexts $c_1$ and $c_2$, and to verify whether $G(c_1)$ is equal to $G(c_2) + \gamma$.
More importantly, as we see in the next section, the differential behaviour of this conjugate of Vert is best explained by commutative trails.

A.3.4 Commutative Interpretation
The differential behaviour of the non-linear conjugate of Vert we exhibited here can be explained by the iterative commutative trail described in Section 6.1.1. Indeed, as mentioned in Supplementary Material A.2, Equation 5 links the existence of a probability-one differential α → β for $F^G$ to the existence of an affine self-similarity behavior for $F$. However, this holds under the assumption that $T_\alpha^{G^{-1}}$ and $T_\beta^{G^{-1}}$ are affine. In the case of $G_{a,g}$, where $g$ is quadratic, we already proved in the last section that $T_c^{G_{a,g}^{-1}}$ was indeed affine. Let us look at the case where $a = 5$ and $g = x_2 + x_2 x_4$ more carefully. In that case, from the design of $L_a$, we know that $L_a$ verifies $L_a(1) = 5$, $L_a(2) = 1$, $L_a(4) = 2$, $L_a(8) = 8$. We thus deduce the ANF of $L_a$: From the ANF of $L_a$, $L_a^{-1}$ and $g$, we can deduce the ANF of T (x) = (x 4 + 1, x 1 + 1, x 2 + 1, x 3 + 1). As a matter of fact, we can observe that $T_{\mathtt{0xd}}^{G_{a,g}^{-1}}$ coincides with the definition of $A_1$ given in Section 5. The differential behavior of this non-linear conjugate of Vert can thus be equivalently explained by the commutative framework.

B The Toy Cipher Grün and its Probability-one Trail
In this section, we present Midori128, the 128-bit state version of Midori, the toy cipher Grün which is based on it, as well as a probability-one self-similarity distinguisher for Grün based on the same commutative cryptanalysis techniques that we presented in Section 6.1.

B.1 Midori128
The 128-bit-state Midori128 operates on a 4 × 4 square state of bytes.The S-box layer uses four distinct involutive 8-bit S-boxes: one of them is applied on each byte.These four 8-bit S-boxes are built by linearly conjugating a parallel call to a single 4-bit S-box.We denote them SSb i := SS Li .They are depicted in Figure 6.

B.2 Grün
Let us recall that Grün is a modified-constants version of Midori128. It is identical to Midori128 except for the constants, which lie in {0x00, 0x11} rather than in the genuine set {0x00, 0x01}.

B.3 A Third Example of Probability-one Commutative Trail
Let us apply the same methodology as in Section 6.1 to the case of Grün. As we can see in Figure 6, each of the four S-boxes used in Grün has an internal symmetry: swapping the two nibbles in the input amounts to swapping the two nibbles in the output. The four of them thus commute with the linear map $A_5 : \mathbb{F}_2^4 \times \mathbb{F}_2^4 \to \mathbb{F}_2^4 \times \mathbb{F}_2^4$, $(x, y) \mapsto (y, x)$, and the full S-box layer thus commutes with $A_5$. Then, as already explained, Condition 1 of Theorem 1 is immediately verified for the Midori MixColumn. So is Condition 2, because here $A_5$ is linear (i.e. $c_{A_5} = 0$). Thus, $A_5 \circ M = M \circ A_5$ holds. The ShuffleCell layer naturally commutes with $A_5$. Finally, we observe that $\mathrm{Fix}(L_{A_5}) = \{(x, x)\} \subset (\mathbb{F}_2^4)^2 \simeq \mathbb{F}_2^8$. Because the constants of Grün lie in $\mathrm{Fix}(L_{A_5})$, we obtain, from the 16 4-bit conditions, a set of $2^{128 - 4 \times 16} = 2^{64}$ weak keys.

C A Complementary Experiment
Experiment 3. We studied more precisely the peak of the strongest keys among the weak ones.To do so, we repeated Experiment 2, but this time, we fixed the round number to r = 4 and let the size of X grow.

Interpretation.
The idea behind such an experiment was to observe whether a Gaussian bell could be hidden behind the peak for 0 solutions: a key for which the actual number of solutions is very low could appear as a key with no solution because of a lack of data. According to Figure 7, the peak stays the same as the amount of data increases. Either the peak does not collapse, or more data is still needed. However, for the other weak keys, a multitude of sub-classes appear among the Gaussian bells as the amount of data increases.
Figure 7: Fixed-key study for the 4-round version: evolution of the distribution of the number of (x, A(x)) pairs following the trail/hull, as X grows.
provided that C and D are affine permutation, i.e.A-uniformity is invariant under affine equivalence, and 4. S(Fix(A)) = Fix(B).
represents a superset of all trails over L that can be extended to ones over S • L • S 3: for b = 0, . . ., d do ▷ Consider blocks that contain 2 b S-boxes 4: Split L into the m • 2 b × m • 2 b blocks L i,j 5: for i = 1, . . ., 2 d−b do ▷ For each (diagonal) block 6:

Figure 2: Comparison of the complexities of our attacks with wide-trail argument bounds.
Figure 3: Evolution of the deviation (EA(r) − TA(r)) / TA(r) between experimental (EA) and theoretical (TA) averages throughout the rounds. Curves: trail and hull, each with weak constants and with null constants.

Figure 4: Fixed-key study: estimation of the p-weakness through the number of solutions (x, A(x)) following the trail/hull. The expected average is $2^6$ for every number of rounds.
for any i.In that case, the commutative trail exactly corresponds to a classical differential trail.F p n → F p m , and c ∈ F p m , the (multiplicative) c-derivative of F with respect to a ∈ F p n is the function c D a F defined as c D a F (x) = F (x + a) − cF (x), for all x ∈ F p n .For a fixed c ∈ F p m let k be the maximal number of solutions of c D a F (x) = b, where the maximum is taken over b and that p = 1/| Im(Mult L C λ + Mult µ )|.We then have a trade-off: if | Im(Mult Lc λ + Mult µ )| is large, then the number of pweak keys is large but they are not very weak as p is small.On the other hand, if Im(Mult Lc λ + Mult µ ) is small, then there are very few p-weak keys, but they are very weak.