Maximums of the Additive Differential Probability of Exclusive-Or

At FSE 2004, Lipmaa et al. studied the additive differential probability adp⊕(α, β → γ) of exclusive-or where differences α, β, γ ∈ F_2^n are expressed using addition modulo 2^n. This probability is used in the analysis of symmetric-key primitives that combine XOR and modular addition, such as the increasingly popular Addition-Rotation-XOR (ARX) constructions. The focus of this paper is on maximal differentials, which are helpful when constructing differential trails. We provide the missing proof for Theorem 3 of the FSE 2004 paper, which states that max_{α,β} adp⊕(α, β → γ) = adp⊕(0, γ → γ) for all γ. Furthermore, we prove that there always exist either two or eight distinct pairs α, β such that adp⊕(α, β → γ) = adp⊕(0, γ → γ), and we obtain recurrence formulas for calculating adp⊕. To gain insight into the range of possible differential probabilities, we also study other properties such as the minimum value of adp⊕(0, γ → γ), and we find all γ that attain this minimum value.


Introduction
Differential cryptanalysis [BS91] is a well-known statistical method for the analysis of symmetric-key primitives. The main idea is to see how a difference ∆X between two inputs (e. g., plaintexts) propagates to a difference ∆Y between the corresponding outputs (e. g., ciphertexts). The ordered pair (∆X, ∆Y ) is referred to as a differential. A differential trail is defined as a sequence (∆X, ∆X 2 , . . . , ∆X p−1 , ∆Y ) where ∆X 2 , . . . , ∆X p−1 are some intermediate values that appear in the primitive.
A common technique to construct a differential trail is to use a "greedy" strategy to pick the intermediate differences that have the highest differential probability. Under some assumptions, the probabilities of a differential trail can be multiplied together to obtain a good estimate of the probability of a differential.
However, this presupposes that the maximal differential probabilities of elementary operations can be efficiently calculated. For ciphers based on S-boxes, this is rather straightforward: their size is usually small enough so that all input and output differences can be enumerated in a Difference Distribution Table (DDT). However, this is often not the case for Addition-Rotation-XOR (ARX) constructions, where the addition modulo 2^n can have n = 32 or n = 64, thereby making it infeasible to construct a DDT. Two of the five finalists of the NIST SHA-3 hash function competition are ARX constructions: BLAKE [AMPH14] which uses either 32-bit or 64-bit additions (depending on the length of the hash value), and Skein [FLS+09] which uses 64-bit additions.
Figure 1: The differential probability adp⊕(α, β → γ) of exclusive-or when differences α, β, γ are expressed using addition modulo 2^n. The probability is obtained by averaging over all values of x and y.
The differential probability adp⊕ of exclusive-or (XOR) when differences are expressed using addition modulo 2^n was studied at FSE 2004 by Lipmaa et al. [LWD04]. It is defined as
adp⊕(α, β → γ) = Pr_{x,y ∈ F_2^n} [(x + α) ⊕ (y + β) = γ + (x ⊕ y)],
and illustrated in Fig. 1. Lipmaa et al. showed that adp⊕ can be expressed as a rational series. That is, if we define ω_i = 4α_i + 2β_i + γ_i, then (as we will recall in Sect. 3) there are eight 8-dimensional square matrices A_j, a column vector C, and a row vector L, such that
adp⊕(α, β → γ) = L · A_{ω_{n−1}} · ... · A_{ω_0} · C,
where ω_i, i.e., which matrix is used as the i-th term of the product, depends on α_i, β_i, γ_i. This formula allows us to easily calculate the probability given a differential (α, β → γ).
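For small n, the definition can be evaluated directly by enumerating all x and y. The following is a minimal sketch of that brute-force evaluation; the function name adp_xor_bruteforce is a hypothetical helper introduced here for illustration, and the approach is only practical for small word sizes.

```python
from fractions import Fraction
from itertools import product


def adp_xor_bruteforce(alpha, beta, gamma, n):
    """Evaluate adp^xor(alpha, beta -> gamma) by enumerating all x, y in Z_{2^n}."""
    mask = (1 << n) - 1
    hits = 0
    for x, y in product(range(1 << n), repeat=2):
        lhs = ((x + alpha) & mask) ^ ((y + beta) & mask)
        rhs = (gamma + (x ^ y)) & mask
        hits += lhs == rhs
    # Average over all 2^(2n) choices of (x, y).
    return Fraction(hits, 1 << (2 * n))


if __name__ == "__main__":
    print(adp_xor_bruteforce(0, 1, 1, 2))  # 1/2
```

The cost grows as 2^(2n) per differential, which is why a closed-form expression such as the rational series is needed for n = 32 or n = 64.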

Related Work
At the Dagstuhl "Symmetric Cryptography" seminar in January 2009, Weinmann introduced the term AXR for symmetric-key primitives based on additions modulo 2^n, XORs and rotations. Later at the FSE 2009 rump session, he renamed the term to ARX. The design strategy, however, is much older: perhaps the earliest example of an ARX primitive is the block cipher FEAL [SM88]. To apply differential cryptanalysis to an ARX primitive, one approach is to use XOR differences: these differences pass through rotation and XOR operations with probability one, and formulas for the differential probability xdp+ of the modular addition were provided at FSE 2001 by Lipmaa et al. [LM01].
In this paper, however, we are interested in differences that are expressed using addition modulo 2 n . These differences go through the modular addition with probability one. The additive differential probability of rotation was studied by Berson [Ber92], and Lipmaa et al. [LWD04] provided a formula for adp ⊕ , the additive differential probability of XOR.
Using Lipmaa et al.'s expression for adp ⊕ , Velichkov et al. [VMDCP12, App. C] provided a search algorithm to list the output differences γ that maximize adp ⊕ for a given (α, β). Although this search algorithm can be very helpful, it cannot be used to provide general statements that hold for any value of n. At FSE 2011, Velichkov et al. [VMDCP11] explained how to calculate the additive differential probability of one ARX operation. Sun et al. [SHW + 16] showed how to model adp ⊕ using the Mixed-Integer Linear Programming (MILP) approach for differential cryptanalysis [MWGP11].
Compared to additive differences, XOR differences propagate with probability one through two operations (XOR and rotation) instead of only one (addition). Another advantage of using XOR differences over additive differences is that the differential probabilities have simpler expressions (see Lipmaa et al. [LWD04, Table 3]). Lipmaa et al. [LWD04] pointed out that the number of possible differentials is larger for adp⊕ than for xdp+, but the average possible differential has a smaller probability.
Despite the advantages of using XOR differences, there are ciphers for which additive differences may be more appropriate. For example, when Biryukov and Velichkov [BV14] provided a differential cryptanalysis of TEA [WN94] and Raiden [PHCER08] using additive differences, they argued that additive differences are more appropriate given that round keys and round constants are added (instead of XORed), and that one round contains more addition operations than XOR operations. In a similar spirit, when Sparx and LAX were proposed by Dinu  20a], they provided some rationale for why their designs resist differential attacks using additive differences.
Lastly, we would like to point out that care should be taken when multiplying probabilities of differentials. For example, in the differential cryptanalysis of XTEA [NW97] by Hong et al. [HHK+03] using XOR differences, the authors constructed a three-round iterative trail (α, 0) → (α, 0), where α = 0x80402010. The trail contains two consecutive addition operations, which separately have probabilities xdp+(α, 0 → α) = 2^−3 and xdp+(α, α → 0) = 2^−3. Hong et al. found that the joint probability xdp+(α, 0, α → 0) is higher than the product of the two probabilities 2^−3 · 2^−3 = 2^−6, and estimated the probability to be 2^−4.755. Mouha et al. [MVDCP11, Sect. 3.6] revisited this problem by correctly calculating the XOR-differential probability of the three-input addition as 2^−3, which can be trivially confirmed using the commutative property of addition: xdp+(α, 0, α → 0) = xdp+(α, α, 0 → 0) = xdp+(α, α → 0) = 2^−3.
Mutatis mutandis, a similar observation also holds when analyzing, for example, the two consecutive XOR operations in one round of TEA using additive differences: calculating the differential probabilities of each XOR operation separately using the formulas in this paper and multiplying them may not lead to a correct estimate. Therefore, some caution is needed when applying the results in this paper to differential trails of an ARX primitive. We consider these issues to be outside the scope of this paper, but we mention the analysis of larger components as a suggestion for future work in Sect. 9.
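The gap between multiplying per-operation probabilities and the true joint probability can be reproduced on a toy word size. The sketch below is an illustration under assumptions: it uses n = 6 and the difference 0b100100 as a hypothetical small analogue of 0x80402010 (it is not the XTEA setting itself), and evaluates all probabilities by exhaustive search.

```python
from fractions import Fraction
from itertools import product

N = 6                      # toy word size (assumption; XTEA uses 32-bit words)
MASK = (1 << N) - 1
ALPHA = 0b100100           # hypothetical small analogue of 0x80402010


def xdp_add(a, b, c):
    """XOR-differential probability of x + y mod 2^N, by exhaustive search."""
    hits = sum(
        ((((x ^ a) + (y ^ b)) ^ (x + y)) & MASK) == c
        for x, y in product(range(1 << N), repeat=2)
    )
    return Fraction(hits, 1 << (2 * N))


def xdp_add3(a, b, c, d):
    """XOR-differential probability of x + y + z mod 2^N, by exhaustive search."""
    hits = sum(
        ((((x ^ a) + (y ^ b) + (z ^ c)) ^ (x + y + z)) & MASK) == d
        for x, y, z in product(range(1 << N), repeat=3)
    )
    return Fraction(hits, 1 << (3 * N))


if __name__ == "__main__":
    separate = xdp_add(ALPHA, 0, ALPHA) * xdp_add(ALPHA, ALPHA, 0)
    joint = xdp_add3(ALPHA, 0, ALPHA, 0)
    print(separate, joint)  # 1/4 1/2
```

In this toy case the joint probability equals xdp_add(ALPHA, ALPHA, 0), in line with the commutativity argument, and is larger than the product of the two separate probabilities.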

Definitions
Let G, H be abelian groups and f : G → H be a function.
The differential probability of f with respect to an input difference α ∈ G and an output difference β ∈ H is defined as Pr_{x ∈ G} [f(x + α) − f(x) = β].
In this work, we consider the additive differential probability adp⊕ of exclusive-or, i.e., G = H = Z_{2^n} and the function f(x, y) = x ⊕ y in two arguments. In other words, adp⊕(α, β → γ) = Pr_{x,y} [(x + α) ⊕ (y + β) = γ + (x ⊕ y)].
For convenience, we consider x, y, α, β, γ ∈ F_2^n, i.e., they are elements of the n-dimensional vector space over the two-element field. In this context, x + y, x − y and −x mean x + y mod 2^n, x − y mod 2^n and −x mod 2^n respectively, where x = x_0 + x_1 2^1 + ... + x_{n−1} 2^{n−1} (and likewise for y), i.e., x is the binary representation of the integer x ∈ {0, ..., 2^n − 1}. Note that the coordinates of x ∈ F_2^n start with 0: x = (x_0, ..., x_{n−1}). Working with F_2^n, we denote the XOR operation by x ⊕ y. By 0^n and 1^n we denote (0, ..., 0) and (1, ..., 1) ∈ F_2^n respectively. We will often use integers, e.g., 0 and 2^{n−1}, instead of elements of F_2^n if n is clear from the context.
There is a matrix (or rational series) approach for calculating adp⊕(α, β → γ), α, β, γ ∈ F_2^n. Let e_0, ..., e_7 be the standard basis vectors of Q^8 (they are column vectors).
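One way to make the matrix approach concrete is to track the three carries of x + α, y + β and (x ⊕ y) + γ bit by bit. The sketch below builds eight 8 × 8 transition matrices from that carry-tracking argument and evaluates L · A_{ω_{n−1}} · ... · A_{ω_0} · C with L = (1, ..., 1) and C = e_0. The state encoding (4·c1 + 2·c2 + c3) and the names transition_matrix, MATRICES and adp_xor are assumptions made for illustration; the matrices are derived here from first principles rather than copied from [LWD04].

```python
from fractions import Fraction
from itertools import product


def maj(a, b, c):
    """Carry-out of a one-bit full adder."""
    return (a & b) | (a & c) | (b & c)


def transition_matrix(w):
    """8x8 matrix for the bit triple w = 4*alpha_i + 2*beta_i + gamma_i.

    Column s is the current carry state (c1, c2, c3); row t is the next one.
    """
    a, b, g = (w >> 2) & 1, (w >> 1) & 1, w & 1
    A = [[Fraction(0)] * 8 for _ in range(8)]
    for s in range(8):
        c1, c2, c3 = (s >> 2) & 1, (s >> 1) & 1, s & 1
        if a ^ b ^ g ^ c1 ^ c2 ^ c3:
            continue  # the two sides of the definition differ at this bit
        for x, y in product((0, 1), repeat=2):
            t = 4 * maj(x, a, c1) + 2 * maj(y, b, c2) + maj(x ^ y, g, c3)
            A[t][s] += Fraction(1, 4)
    return A


MATRICES = [transition_matrix(w) for w in range(8)]


def adp_xor(alpha, beta, gamma, n):
    """Evaluate L * A_{w_{n-1}} * ... * A_{w_0} * C."""
    v = [Fraction(1)] + [Fraction(0)] * 7  # C = e_0: all carries start at 0
    for i in range(n):
        w = 4 * ((alpha >> i) & 1) + 2 * ((beta >> i) & 1) + ((gamma >> i) & 1)
        A = MATRICES[w]
        v = [sum(A[r][c] * v[c] for c in range(8)) for r in range(8)]
    return sum(v)  # L = (1, ..., 1)
```

The running time is linear in n, so this evaluation remains feasible for 32-bit and 64-bit words, unlike exhaustive enumeration.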

Argument Symmetries of adp ⊕
First, we list several argument symmetries of adp ⊕ .
Proposition 1. The function adp⊕ is symmetric, i.e., for any α, β, γ ∈ F_2^n, it holds that adp⊕(α, β → γ) = adp⊕(β, α → γ) = adp⊕(γ, β → α). Note that all other argument permutations are combinations of these two.
Proposition 2. For any α, β, γ ∈ F_2^n it holds that adp⊕(α + 2^{n−1}, β + 2^{n−1} → γ) = adp⊕(α, β → γ). In light of Proposition 1, we can add 2^{n−1} to any two arguments.
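Both symmetries can be checked exhaustively for small n. The following sketch tabulates adp⊕ (as counts of satisfying pairs (x, y)) for every triple of differences and then tests invariance under swapping arguments and under adding 2^{n−1} to two arguments. The function names adp_table and check_symmetries are hypothetical, and an exhaustive check for one small n is an illustration, not a proof.

```python
from itertools import product


def adp_table(n):
    """Number of (x, y) satisfying the adp^xor condition, for every (alpha, beta, gamma)."""
    size, mask = 1 << n, (1 << n) - 1
    table = {}
    for a, b, g in product(range(size), repeat=3):
        table[a, b, g] = sum(
            (((x + a) & mask) ^ ((y + b) & mask)) == ((g + (x ^ y)) & mask)
            for x, y in product(range(size), repeat=2)
        )
    return table


def check_symmetries(n):
    size, msb = 1 << n, 1 << (n - 1)
    t = adp_table(n)
    for a, b, g in product(range(size), repeat=3):
        # Argument symmetry: swap the inputs, or swap an input with the output.
        if not (t[a, b, g] == t[b, a, g] == t[g, b, a]):
            return False
        # Adding 2^(n-1) to two arguments leaves the probability unchanged.
        if t[a, b, g] != t[(a + msb) % size, (b + msb) % size, g]:
            return False
    return True


if __name__ == "__main__":
    print(check_symmetries(3))
```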

Maximum of adp ⊕ (x, y → γ) for Fixed γ
In this section we give the missing proof of Theorem 3 from [LWD04]: we will prove that max_{α,β} adp⊕(α, β → γ) = adp⊕(0, γ → γ) for all γ ∈ F_2^n.
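The statement about the maximum, together with the count of maximizing pairs given in the abstract (two or eight), can be checked numerically for small n. The sketch below does so with a carry-tracking evaluation of adp⊕; the function names adp_xor and maximizers are hypothetical helpers, and the check is an illustration for small word sizes rather than a substitute for the proof.

```python
from fractions import Fraction
from itertools import product


def maj(a, b, c):
    return (a & b) | (a & c) | (b & c)


def adp_xor(alpha, beta, gamma, n):
    """adp^xor via the carries of x+alpha, y+beta and (x^y)+gamma, bit by bit."""
    states = {(0, 0, 0): Fraction(1)}
    for i in range(n):
        a, b, g = (alpha >> i) & 1, (beta >> i) & 1, (gamma >> i) & 1
        nxt = {}
        for (c1, c2, c3), p in states.items():
            if a ^ b ^ g ^ c1 ^ c2 ^ c3:
                continue  # the two sides of the definition differ at bit i
            for x, y in product((0, 1), repeat=2):
                t = (maj(x, a, c1), maj(y, b, c2), maj(x ^ y, g, c3))
                nxt[t] = nxt.get(t, 0) + p / 4
        states = nxt
    return sum(states.values(), Fraction(0))


def maximizers(gamma, n):
    """Return the maximum over (alpha, beta) and the sorted list of pairs attaining it."""
    size = 1 << n
    probs = {(a, b): adp_xor(a, b, gamma, n)
             for a, b in product(range(size), repeat=2)}
    best = max(probs.values())
    return best, sorted(pair for pair, v in probs.items() if v == best)


if __name__ == "__main__":
    n = 4
    for gamma in range(1 << n):
        best, pairs = maximizers(gamma, n)
        print(gamma, best, best == adp_xor(0, gamma, gamma, n), len(pairs))
```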

Let us define
Lemma 4. For any octal word ω n . . . ω 0 , where n ≥ 0, and 0 ≤ k ≤ 7 the following holds: Proof. Let us denote by T k the 8×8 involution matrix that swaps the i and i⊕k coordinates, Note that A ωi⊕k = A ωi for even k (as an integer number, i. e., for k = 0, 2, 4, 6) and A ωi⊕k = A ωi for odd k.
Proof. Theorem 2 gives us that (0, γ) ∈ adpmax(γ). The other pairs are provided by Propositions 1, 2 and 3, since adp ⊕ has the same value for these pairs with fixed γ.
At the same time, a pair from P cannot be equal to a pair from P′, since at least one coordinate of any pair from P′ is equal to 2^{n−1}, but 0, γ, −γ ≠ 2^{n−1}.
Let us list some of their straightforward properties.
Proof. The first point directly follows from the definition. The second follows from Proposition 2; the equality β + 2^{n−1} = β ⊕ 2^{n−1} completes the proof.

Recurrence Formulas for adp ⊕
The matrix approach to calculating adp⊕, together with Lemma 4, allows us to obtain recurrence formulas for adp⊕(α, β → γ). It is possible to rewrite the proof of Theorem 2 in terms of these formulas. First, let us denote the vector (0, x_0, x_1, ..., x_{n−1}) ∈ F_2^{n+1} by x0, i.e., in terms of integers, x0 = 2x. We define x1 in exactly the same way, so that x1 = 2x + 1.
Let us prove an auxiliary lemma.
Proof. In light of Proposition 1, it is sufficient to prove the statement for adp ⊕ (α0, β1 → γ1).

Simplified Matrix Form for adp ⊕ (0, γ → γ)
When calculating adp⊕(0, γ → γ) using Theorem 1, we only need A_0 (for bit positions where γ_i = 0) and A_3 (for bit positions where γ_i = 1). These matrices can be minimized to size 3 × 3 using the S-function toolkit of Mouha et al. [MVDCP11]: applying the software toolkit to remove non-accessible states and to merge indistinguishable states leads to matrices A′_0 and A′_3, which can be obtained from A_0 and A_3 by removing the last four rows and columns (the non-accessible states), and by merging the middle two remaining rows and columns (which correspond to indistinguishable states). Note that (1, 1, 1)A′_0 = (1, 1, 1)A′_3 = (1, 0, 1), which will help us to minimize the size of the matrices to 2 × 2 if we "cheat" by excluding the most significant bit from the matrix product. More formally, we can obtain matrices B_0 and B_1 by removing all rows and columns from A_0 and A_3 except 0 and 3, and calculate adp⊕(0, γ → γ) from the product of these matrices over the bits γ_0, ..., γ_{n−2}.
Proof. According to Theorem 1, we can calculate adp⊕(0, γ → γ) using the matrices A_0 and A_3. First, A_0 x^T and A_3 x^T depend only on x_0, x_3, x_5, x_6, where x ∈ Q^8. Secondly, they have a block structure where P_i and Q_i are matrices of size 4 × 4. In addition, coordinates {4, 5, 6, 7} of e_0 are zero. This means that coordinates 5 and 6 of the vector A_{ω_i} A_{ω_{i−1}} ... A_{ω_0} e_0, where ω_i = 3γ_i,
Proof. Let u, v ∈ F_2^n. Since 2^{n−1} | 2^n, we can correctly consider modulo 2^{n−1} operations. Without loss of generality, we can assume that both u, v < 2^{n−1}. Otherwise, we can consider u′ = u ⊕ 2^{n−1} instead of u, where u′ < 2^{n−1} and u′ ≡ u (mod 2^{n−1}), since Proposition 2 guarantees that adp⊕(a, u → u) = adp⊕(a, u′ → u′) (and the same for v).
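The 2 × 2 reduction can be sketched in code. The entries of B_0 and B_1 below are an assumption: they were derived here by tracking the carries of y + γ and (x ⊕ y) + γ (both zero, or both one) rather than taken from the text, and the function names are hypothetical. The sketch multiplies the matrices over the bits γ_0, ..., γ_{n−2}, sums the two coordinates, and includes a brute-force routine for cross-checking small cases.

```python
from fractions import Fraction
from itertools import product

Q = Fraction(1, 4)
# Assumed entries: column = current state, row = next state,
# state 0 = "both carries 0", state 1 = "both carries 1".
B = {
    0: ((Fraction(1), Q), (Fraction(0), Q)),   # bit gamma_i = 0
    1: ((Q, Fraction(0)), (Q, Fraction(1))),   # bit gamma_i = 1
}


def adp_xor_diag(gamma, n):
    """adp^xor(0, gamma -> gamma), using only the n-1 low bits of gamma."""
    v = (Fraction(1), Fraction(0))
    for i in range(n - 1):  # the most significant bit is excluded
        m = B[(gamma >> i) & 1]
        v = (m[0][0] * v[0] + m[0][1] * v[1],
             m[1][0] * v[0] + m[1][1] * v[1])
    return v[0] + v[1]


def adp_xor_diag_bruteforce(gamma, n):
    """Reference value of adp^xor(0, gamma -> gamma) by enumeration."""
    mask = (1 << n) - 1
    hits = sum(
        (x ^ ((y + gamma) & mask)) == ((gamma + (x ^ y)) & mask)
        for x, y in product(range(1 << n), repeat=2)
    )
    return Fraction(hits, 1 << (2 * n))


if __name__ == "__main__":
    n = 4
    values = {g: adp_xor_diag(g, n) for g in range(1 << n)}
    smallest = min(values.values())
    print(smallest, [g for g, v in values.items() if v == smallest])
```

Because the product skips the most significant bit, γ and γ ⊕ 2^{n−1} always give the same value, which matches the reduction to u, v < 2^{n−1} used in the proof above.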

Conclusion and Future Work
In this work we investigated some properties of adp⊕ that are interesting for the differential cryptanalysis of ARX ciphers. We provided the missing proof of the theorem about max_{α,β} adp⊕(α, β → γ) from [LWD04], and established that there are either two (for adp⊕(0, γ → γ) = 1) or eight (in all other cases) distinct pairs α, β for which adp⊕ attains this maximum value. We obtained recurrence formulas for an arbitrary adp⊕(α, β → γ), which help to find the minimum nonzero value of adp⊕(α, β → γ), to find all γ ∈ F_2^n for which adp⊕(0, γ → γ) = min_{c ∈ F_2^n} adp⊕(0, c → c), and to calculate this minimum value. As with any paper that analyzes the components of a primitive (e.g., additions, rotations, and XORs, but also S-boxes or matrix multiplications), some caution is necessary when extending the results to the analysis of a full primitive. We mention the analysis of larger components and the application to a full primitive as suggestions for future work.