SAT-aided Automatic Search of Boomerang Distinguishers for ARX Ciphers (Long Paper)

. In Addition-Rotation-Xor (ARX) ciphers, the large domain size obstructs the application of the boomerang connectivity table. In this paper, we explore the problem of computing this table for a modular addition and the automatic search of boomerang characteristics for ARX ciphers. We provide dynamic programming algorithms to efficiently compute this table and its variants. These algorithms are the most efficient up to now. For the boomerang connectivity table, the execution time is 4 2 ( n − 1) simple operations while the previous algorithm costs 8 2 ( n − 1) simple operations, which generates a smaller model in the searching phase. After rewriting these algorithms with boolean expressions, we construct the corresponding Boolean Satisfiability Problem models. Two automatic search frameworks are also proposed based on these models. This is the first time bringing the SAT-aided automatic search techniques into finding boomerang attacks on ARX ciphers. Finally, under these frameworks, we find out the first verifiable 10-round boomerang trail for SPECK32/64 with probability 2 − 29 . 15 and a 12-round trail for SPECK48/72 with probability 2 − 44 . 15 . These are the best distinguishers for them so far. We also perceive that the previous boomerang attacks on LEA are constructed with an incorrect computation of the boomerang connection probability. The result is then fixed by our frameworks.


Introduction
Differential cryptanalysis [BS91] is one of the most fundamental cryptanalytic technique in symmetric cryptography. It studies the propagation of difference of the plaintexts, and establishes the relation between the difference of plaintexts and that of ciphertexts. For iterated ciphers based on S-boxes, the propagation through the S-box mainly determines the security against differential cryptanalysis. The propagation for an n-bit S-box is typically depicted by a 2 n × 2 n table T ddt , named the Differential Distribution Table (DDT). For any pair (∆ in , ∆ out ), the entry T ddt (∆ in , ∆ out ) stores the number of x ∈ F n 2 such that S(x) ⊕ S(x ⊕ ∆ in ) = ∆ out holds. The entry T ddt (∆ in , ∆ out ) in T ddt means that the input difference ∆ in of S propagates to the output difference ∆ out with probability T ddt (∆ in , ∆ out ) · 2 −n . The whole table T ddt is constructed directly by enumerating all possible values of ∆ in and x.
In many cases, there is no differential characteristic with a high probability for the entire cipher. The boomerang attack framework [Wag99] could be more suitable to the cipher. This technique aims to concatenate two short differential characteristics to form a distinguisher covering more rounds. In a boomerang attack, the target cipher E is E 1 decomposed into E 0 and E 1 , i.e., E = E 1 • E 0 , showed in Figure 1a. Suppose that there is a differential characteristic (∆ in , ∆ out ) with probability p for E 0 , and another one (∇ in , ∇ out ) with probability q for E 1 . Meanwhile, the characteristics cover r 0 and r 1 rounds respectively. Then, under the assumption that these characteristics are independent, the expected probability of the boomerang distinguisher is The number of rounds covered is r 0 + r 1 . However, pointed out by Murphy, this simple concatenation may result in an invalid distinguisher [Mur11]. Several works have studied the compatibility of two characteristics [BDD03,BK09,DKS14,Leu12]. Most of the observations are captured by the sandwich attack framework proposed in [DKS14]. In this framework, the cipher is regarded as a composition of three parts, i.e., E = E 1 • E m • E 0 , where E m typically contains short transformations, depicted in Figure 1b. Two short differential characteristics are still needed for E 0 and E 1 . For E m , a short boomerang characteristic is applied whose probability is For ciphers based on S-boxes, the computation of this probability is reduced to that of a single S-box when E m is a single round. This idea was proposed in [CHP + 18], where the switching effect was described by a new table of size 2 n ×2 n , called Boomerang Connectivity Table (BCT). It further enables the automatic search of boomerang characteristics. Similar to DDT, BCT can also be constructed by enumerating. Later, Orr Dunkelman provided an improved algorithm to construct the table in O 2 2n time for an n × n S-box [Dun18]. When BCT meets the Addition-Rotation-Xor (ARX) ciphers, everything become difficult. ARX ciphers are composed of only three operations: additions modulo 2 n , rotations and XOR operations. The rotation and XOR operation are linear operations, while the modular addition is a nonlinear operation. Following the sandwich attack framework, the simplest case is that E m consists of a single modular addition. Under this circumstance, the modular addition could be regarded as a 2n × 2n S-box, and again described by the corresponding BCT. In most cases, n is greater than or equal to 16, leading to a large S-box. Even if the value of n is 16 (a 32 × 32 S-box), the resulting execution time of computing the whole BCT would be 2 64 simple operations which is infeasible. The BCT of a modular addition was first mentioned in [CHP + 18], but no efficient algorithm was given. To the best of our knowledge, only one related work [KKS20] was published. This work proposed the first practical algorithm, following the differential analysis of S-functions [MVDP11]. However, the authors of [KKS20] did not merge the BCT of the modular addition into the Boolean Satisfiability Problem (SAT) models or Mixed Integer Linear Programming (MILP) models. They selected several differential characteristics, trying all combinations of them, and checked the validity by their algorithm. Beyond the idea of the BCT, another tool called ARXtools [Leu12] was proposed based on the theory of S-functions to study ARX constructions. This tool can be used to detect incompatibility or compute the probability of a differential characteristic as well as a boomerang characteristic, but it cannot automate the search of the characteristics.
Our contributions. In this paper, we focus on the computation of BCT, as well as its variants, and the automatic search of boomerang distinguishers on ARX ciphers. First, we notice that the entries of BCT can be computed by a dynamic programming algorithm without using the S-functions. This provides us a new perspective on the computation. We then extend our new algorithm to efficiently compute the variants of BCT, i.e., Upper BCT (UBCT), Lower BCT (LBCT), and Extended BCT (EBCT), defined by Delaune et al. [DDV20]. The time complexities are all linear to the size of the modular addition (bit-length of an addend). Second, we transform our algorithms into new computations so that they can be described by the language of SAT. Nonetheless, the existing modelling methods either target specific ciphers [CHP + 17] or require computing the whole BCT and encode it in the models [LS19], which are not workable for the huge BCT of an addition. We borrow the idea from [BV14] to overcome this difficulty. Based on the new computations and their SAT models, we propose two automatic search frameworks to find out boomerang distinguishers with high probabilities. Finally, to verify the power of the proposed technique, we apply it to the ARX-based block ciphers SPECK and LEA. Consequently, we find out several new boomerang distinguishers on SPECK. Table 1 compares our distinguishers with previous ones. For SPECK32/64, ours is the first 10-round distinguisher verified with experiments. The experimental evaluation showed that the probability is about 2 −27.31 . We also discuss our results on LEA and those from [KKS20]. Although no improved characteristic was found, a new result is obtained on the previous one. For the same boomerang switch, the estimation of the probability is 0.710802 from our framework, while the experimental evaluation gives about 0.71. It is more precise, compared with 0.661755 in [KKS20]. Furthermore, the comparison of our technique with ARXtools is carried out, showing that ARXtools performs better in computing probabilities while ours is more suitable for searching characteristics.

Addition and Subtraction Modulo 2 n
In this paper, most of the integer values during the analysis are represented as binary numbers with a fixed bit length which can contain some leading zeros. For such a binary number x, the i-th bit is denoted as x[i] and x[0] is the least significant bit. The one's complement of it is denoted as x whose length is the same as x's. For example, the binary number x = 00101 has a least significant bit x[0] = 1, and x is 11010. We use ⊞ n to denote an addition modulo 2 n and ⊟ n to refer to a subtraction modulo 2 n . When an addition or subtraction is applied to two binary strings, it means respectively the addition or subtraction of these two binary numbers. For instance, the addition of 001 and 101 modulo 2 3 is 001 ⊞ 3 101 = 1 ⊞ 3 5 = 6 = 110. For the convenience of differential analysis, the following property is widely used.
For the subtraction modulo 2 n , a trick is applied to derive a similar property: The functions carry0 n and carry1 n are defined in Definition 1 and Definition 2. The only difference lies where i is 0.
Definition 1. The carry c = carry0 n (x, y) is an n-bit number computed from two n-bit number x and y. It is defined recursively as follows.
Definition 2. The special carry b = carry1 n (x, y) is an n-bit number computed from two n-bit number x and y. It is defined recursively as follows.
We note that the (i + 1)-th bit of carry0 n (x, y), resp. carry1 n (x, y), only depends on three bits-the i-th bits of x, y and carry0 n (x, y), resp. carry1 n (x, y). A function is then defined for carry0 n and carry1 n which concentrates on the bits. Definition 3. Define a function carry : This function directly comes from Definition 1 and Definition 2. It takes two bits, x and y, as well as a carry bit, and outputs the next carry bit.

Boomerang Connectivity Table and Its Variants
For an n-bit to n-bit S-box, the diagram of how the difference propagates through the boomerang switch is shown in Figure 2. Let S −1 be the inverse of S. The BCT is defined as Given a pair of (∆, ∇), the probability that a right quartet is generated in S is given by BCT(∆, ∇) · 2 −n .
The BCT tool considers the scenario in which the boomerang switch contains only one S-box layer. However, many works [WP19, DDV20, BHL + 20] have shown that a boomerang switch with multiple rounds is possible which has more than one S-box layers. Such a switch often results in a better boomerang characteristic. In order to analyse the switching effect, variants of BCT are used in these works. These variants consider more differences besides those required by the BCT in the switch. We use the same definitions from [DDV20] which defines the variants as follows.
Definition 4 ( [DDV20]). Three variants of BCT are defined respectively as The BCT table was generalized to the case where S is a modular addition by Cid et al. in [CHP + 18]. They considered the modular addition as a 2n-bit to n-bit mapping, and redefined a BCT table for it. In this paper, we instead regard it as an S-box which maps 2n-bit input to 2n-bit output. Illustrated in Figure 3, the input consists of two n-bit values corresponding to two addends, where ∥ is the concatenation. The left half of the output is the modular addition of the addends, and the right half is still the second addend. Formally, this S-box is defined as S(L∥R) = (L + R)∥R. Differences are represented in the same form. For example, the difference between X 1 and X 2 is ∆ l ∥∆ r , which means that Thus, the definition of BCT for an S-box still works for modular addition. In the definition, we can substitute S with its actual computation, i.e., the modular addition. The result is (1) Figure 2: An S-box in the boomerang switch. Figure 3: A boomerang switch on a modular addition.
The subscript n denotes that the modulus is 2 n . This equation is the same as the one provided in [CHP + 18], but the idea can be easily extended to all variants of BCT.

Computing Boomerang Tables for A Modular Addition
In this section, we propose dynamic programming algorithms for computing the entries of BCT and its variants. Dynamic programming is a classical but significant method in algorithms. The word 'programming' refers to a tabular method [CLRS09]. A dynamic programming algorithm starts by partitioning the problem into subproblems. The subproblems can share some subsubproblems. Each subsubproblem is solved just once, and the solution is saved in a table for the next occurrence of the same subsubproblem. The algorithm then combines the solutions in the table to form a solution to each subproblem and finally to the original problem.
In most of the time, to avoid repeating similar contents, we only illustrate our detailed idea on BCT. However, we would like to stress that it is not limited to BCT but applicable to all the variants. The details for the variants are provided in the appendix. Our explanations use the symbols in Figure 3.

Construct the Dynamic Programming Algorithm
The definition of BCT for modular addition is given in Equation (1). To simplify the equation, we use c 1 , c 2 , b 1 and b 2 to refer to carry0 and carry1 in the equation which are defined as follows. (2) That is, given the values of ∆ l , ∆ r , ∇ l , and ∇ r , computing the BCT entry is equivalent to counting the number of (L, R) that satisfies  Remark 1. This property was first used by the boomerang attacks in [BDD03]. It was observed again in [CHP + 18] where it was facilitated in the MSB switch. However, the MSB switch only considered the MSB of L. Our lemma claims that the MSB of R can also be taken into account.
Then, the values of (L, R) are classified by c 1 [n − 1]∥b 1 [n − 1]∥c 2 [n − 1]∥b 2 [n − 1]. Formally and generally, for any bit length i, define a function Here, the i-bit carry0 and carry1 are extended by 1 bit following the definitions of them respectively. The superscript i of (L i , R i ) is just to denote that L i and R i are i-bit strings. Thus, we derive a new computation of Equation (2): (3) 7 The constant 4 comes from Lemma 1. Because, for every (L n−1 , R n−1 ), there are 4 difference choices of (L[n − 1], R[n − 1]) to form the values of (L n , R n ) and every one of them is valid.
In the remaining part of this section, we present a dynamic programming algorithm to compute Equation (3). Based on the idea of dynamic programming [CLRS09], it is necessary to find out a recursive expression for function f . As a base case, it is already known that For The number of the subsets is 8. Moreover, these subsets are mutually exclusive. This is because the bits c 1 We define a function g to denote the relation among consecutive bits of carries and the corresponding bits of L, R, ∆ l , ∆ r , ∇ l , and ∇ r . This function also shows the dependences among subproblems in dynamic programming.
Then, the recursive expression of function f comes out: It is feasible to enumerate all possible inputs of the function g and compute its outputs, which can be stored in a table of size 2 14 × 1. The cost to construct such a table is cheap.
To keep simplicity, we define a new function It is also computed in advance.
The final expression of f becomes In the language of dynamic programming, the problem There are two ways to construct a dynamic programming algorithm. The direct way is computing f (i + 1, v) recursively while storing each value of f in a table for later use. We adopt the other way, starting from the base case and constructing the solution to a 'bigger' problem from its subproblems until reaching f (i + 1, v), listed in Algorithm 1.
Algorithm 1 A dynamic programming algorithm to compute f .
Initialize a hash table T dp such that, for all u ∈ S BCT , T dp [u] = 0 except that T dp [0101] = 1; 3: T dp ← T ′ dp ; 10: Theorem 1. In Algorithm 1, assume Lines 7 to 8 are a simple operation. Then the time complexity is O (n). In particular, the algorithm requires 8 2 n simple operations.
This algorithm contains three loops which loop n, 8, and 8 times respectively. The table look-up in line 7 costs just constant time in modern computers. Thus, the time complexity is shown as Theorem 1. In fact, this algorithm computes all f (n − 1, v) for v ∈ S BCT . While the fourfold sum of them is BCT n (∆ l , ∆ r , ∇ l , ∇ r ), we can simply modify line 10 of the algorithm to obtain the algorithm for computing entries of BCT. The execution time of the resulting algorithm is 8 2 (n − 1) simple operations.

Optimize the Algorithm
So far, BCT n (∆ l , ∆ r , ∇ l , ∇ r ) is computed bit by bit. Given a bit-length n, the top level for loop in Algorithm 1 is fixed. We focus on the inner loops and improve them with another property of the function g.
In the beginning, we provide Lemma 2 revealing an important property of the carry. The correctness is verified directly from Definition 3. This lemma leads to the result that f (i, v) and f (i, v) can be 'merged', formally stated in Theorem 2.
Lemma 2. For x, y, c ∈ F 2 , the carry has the following equation.

Then, Equation (4) implies
Proof. A direct result from Definition 5 and Lemma 2 is Then, for function T , Note that S ′ BCT is a special subset selected from S BCT . On one hand, for every x ∈ S ′ BCT , x and x are both in S BCT . On the other hand, for every x ∈ S BCT , either x or x is in S ′ BCT . Therefore, S ′ BCT is a half of S BCT and its elements are not the one's complement of each other. This gives us the following derivation.
That is, The forms of Equation (4) and (5) are identical. The Algorithm 1 still works for Equation (5) by replacing the set and the functions with the new ones. In addition, Since the set S ′ BCT is smaller, the execution time of computing one entry of BCT is reduced to 4 2 (n − 1) simple operations.

Computation of Other Tables
At first, we provide the definitions of UBCT, LBCT, and EBCT for a modular addition, following the counterparts for an S-box.
It is clear to see that these tables are based on BCT and have one or two more restrictions on (L, R). Actually, with a little modification on the dynamic programming algorithm for BCT, we hence have similar algorithms for computing the entries of these tables. Due to the limited space, we only take LBCT as an example, showing how to modify the previous algorithm. The algorithms for UBCT and EBCT can be constructed in the same way.
After comparing Equation (8) and (1), the new restriction is The (i + 1)-th bit of c 3 still only depends on the previous bits of L, R, ∇ ′ l , and ∇ r . This implies the existence of a recursive expression for LBCT like the one for BCT. Then, the restriction is rewritten to This set is constructed by extending every element in S BCT by one bit represented for c 3 and taking any value of this bit. During the same derivation of a recursive expression like Equation (4), we define a function g LBCT , similar to function g but specific to LBCT.
As the counterpart of the function T in the previous subsection, a new function is also defined. Note that we are still able to go through all possible inputs and compute T LBCT in advance. Therefore, following the same idea in Section 3.1.1, the dynamic programming algorithm for LBCT comes out, listed in Algorithm 2.
Since the rest of the idea and the optimization are almost the same as those in Section 3.1, the detailed procedure is not explained again here. We only point out that the size of S LBCT decreases to 8. The execution time of the optimized algorithm is 8 2 (n − 1) simple operations. Moreover, for UBCT and EBCT, the corresponding sets are of size 4 and 8, respectively. The numbers of simple operations that the optimized algorithms spend are 4 2 (n − 1) for UBCT and 8 2 (n − 1) for EBCT. For more details, please see Appendix A.

Observations on the Computations
This subsection presents three observations on the computations of BCT and its variants. The observations illustrate the relation between previous results and ours as well as basic properties for constructing models in the next chapter. We only explain the observations on BCT. For LBCT, UBCT, and EBCT, they have identical results as well.
The first observation is that Equation (4) and (5) can be written as matrix multiplications. Take Equation (5) as an example here. When all the equations for f ′ (i + 1, v) Algorithm 2 A dynamic programming algorithm to compute entries of LBCT.
Initialize a hash table T dp such that, for all u ∈ S LBCT , T dp [u] = 0 except that T dp [01010] = 1; 3: Initialize a hash table T ′ dp such that T ′ dp [u] = 0 for all u ∈ S LBCT ; 5: T dp ← T ′ dp ; 10: for all u ∈ S LBCT do 12: Let u be u 0 ∥u 1 ∥u 2 ∥u 3 ∥u 4 ; for v ∈ S ′ BCT combine, they can be replaced with a single matrix multiplication as the following equation.
, and ∇ r [i] are ignored here because of the limited space. We should emphasize that each value of function T ′ depends on these four bits. The 4 × 4 matrix is determined by a 4-bit string The total number of different matrices is 16. By rearranging the precomputed table for function T ′ , a table storing these matrices is obtained. Recall that, for i = 0, we have Therefore, the computation of BCT n (∆ l , ∆ r , ∇ l , ∇ r ) can use a chain of matrix multiplications, showed as Equation (10).
In Algorithm 1, Lines 2 to 9 can then use a multiplication of a matrix and a vector instead, while the time complexity remains unchanged. This formula is almost identical to Kim et al.'s result [KKS20], except that their matrices are of size 8 × 8, larger than ours. Consequently, the execution time of their algorithm is 8 2 (n − 1) simple operations. The smaller matrices not only result in a lower time complexity, but also reduce the size of the SAT models, which is detailed in Section 4.1.
The second observation is stated in Theorem 3. Before proving this theorem, we provide 2 lemmas on the properties of g as well as prove them.
Theorem 3. Given the boolean values δ l , δ r , γ l , and γ r , for any u ∈ S ′ BCT , the sum v∈S ′
Lemma 3. Given a u ∈ S BCT and the values of δ l , δ r , γ l , and γ r , if there exists values w ∈ S BCT and a, b ∈ F 2 such that g(u, w, a, b, δ l , δ r , γ l , γ r ) = 1, then, for any boolean values x and y, there exists a v ∈ S BCT such that g(u, v, x, y, δ l , δ r , γ l , γ r ) = 1.
Proof. Let u = u 0 ∥u 1 ∥u 2 ∥u 3 and v = v 0 ∥v 1 ∥v 2 ∥v 3 . Because of the premise that u, v ∈ S BCT , it means that u 0 ⊕ u 1 ⊕ u 2 ⊕ u 3 = 0 and v 0 ⊕ v 1 ⊕ v 2 ⊕ v 3 = 0. According to Definition 5, the equation for v can be rewritten as It can be further expanded with Definition 3. The resulting equation is very long, so we are not able to put it here. It is only composed of boolean variables and logical operations, which can be analysed by tools of boolean functions.
With the help of the BooleanFunction module in the SageMath computer algebra system [Sag20], we simplify the left side of Equation (11) to x(u 0 + u 1 + u 2 + u 3 ) plus a boolean polynomial without variables x and y. Since u 0 + u 1 + u 2 + u 3 is 0, all the x and y are cancelled in the equation. It means that x and y have no impact on the satisfiability of Equation (11). Thus, if there exists x = a and y = b such that Equation (11) satisfies, then, for any boolean values x and y, it remains satisfied. Note that the satisfiability of Equation (11) implies a value of v ∈ S BCT . Then the lemma is proved.
Based on Lemma 3 and Lemma 4, Theorem 3 is proved.
Proof. (Theorem 3) The proof starts by expanding the sum to an expression about g.
If there exists x = a and y = b such that this sum equals 1, then it means that there must be one and only one v ∈ S BCT such that g(u, v, a, b, δ l , δ r , γ l , γ r ) = 1. Then, with Lemma 3, we would know that, for any values of x and y, v∈S BCT g(u, v, x, y, δ l , δ r , γ l , γ r ) = 1.
The last observation is that, given a u ∈ S BCT and the values of δ l , δ r , γ l , and γ r , if v∈S ′ BCT T ′ (u, v, δ l , δ r , γ l , γ r ) is 4, then T ′ (u, u, δ l , δ r , γ l , γ r ) must be larger than 0. This is directly observed from the table of T ′ .
Furthermore, when considering the sum Otherwise, s i is less than 4s i−1 . Whether this happens or not depends on the values of

Automatic Search of Boomerang for ARX Ciphers
In this section, we propose the automatic search technique for boomerang distinguishers on ARX ciphers. We start from modelling the switch effect on a single modular addition.
The key problem is how to construct SAT/SMT models for BCT and its variants. A natural approach would be using an SMT model describing Algorithm 1. The power of SMT makes this possible. However, the value of an entry can be very large, exceeding the limitation of most of the SMT solvers. If one tries to take the binary logarithm of the value, he/she would encounter that it seems impossible, since the corresponding formula (Equation (10)) is a chain of matrix multiplications. Moreover, the possible values form an extremely large subset of {0, 1, 2, . . . , 4 n }. Thus, it is impractical to list all the possible values and encode them like any other models of ciphers based on S-boxes.
As a result, it requires us to develop new models. In the beginning, a model is proposed to describe all non-zero entries in the tables. With a heuristic strategy, this model is Algorithm 3 A dynamic programming algorithm to check an entry in BCT.

2:
Initialize a hash table T dp such that, for all u ∈ S BCT , T dp [u] = F alse except that T dp [0101] = T rue; Initialize a hash table T ′ dp such that T ′ dp [u] = F alse for all u ∈ S BCT ;

5:
for all u ∈ S ′ BCT do 6: T dp ← T ′ dp ; improved. We should emphasize that our new models give us neither the value of the entry nor the probability. They just encode the information: 'non-zero'. That is, they merely restrict a boomerang switch to a valid switch. We then merge these models into our automatic search frameworks to find out boomerang distinguishers.

SAT Models for Any Non-zero Entries
We still consider the case for BCT, but the idea can also be applied to LBCT, UBCT, and EBCT. According to Equation (6), BCT n (∆ l , ∆ r , ∇ l , ∇ r ) ̸ = 0 implies that there exists a v ∈ S ′ BCT such that f ′ (n − 1, v) ̸ = 0. On account of the recursive expression of function f ′ , it further requires that, for any i ∈ [0, n − 1], there exists a v ∈ S ′ BCT such that f ′ (i, v) ̸ = 0. Formally, the propagation of this property is computed by is not zero as well. The algorithm to check whether an entry in BCT is zero or not is listed in Algorithm 3.
Since Algorithm 3 only contains boolean variables and bit operations, it is convenient to construct the corresponding SAT model. Lines 4 to 9 are transformed into four expressions as follows.
Note that, the bigger the set S ′ BCT is, the more expressions are created in the model. Meanwhile, the size of S ′ BCT matches the size of the square matrix in Equation (9), and thereby the smaller matrices are better. Finally, the model is supposed to include the non-zero entries, so Lines 10 to 13 are modelled by

A Heuristic Strategy
One of the disadvantages of the model in Section 4.1 is that it contains too many entries with small values. In cryptanalysis, an attack with high probability is expected. We then decide to make the model to describe only the entries with sufficiently large values. This idea comes from the concept of a partial differential distribution table in [BV14]. Here, our model in this subsection can be regarded as a SAT model for a partial boomerang connectivity table.
As we have already explained in Section 3.3, the largest entries in BCT require that s i = 4s i−1 for all i ∈ [1, n − 1]. Conversely, if we restrict k out of n − 1 these equations to be satisfied, the resulting entry would be 4 k+1 multiplying some value q. In other words, the value of the entry would have the form of 4 k+1 q where q is unknown. Currently, we do not have enough knowledge about q. However, our experiments show that this restriction always forced the model to return an entry with a value around 4 k+1 . The research on q is left to future work.
Note that, based on the second and third observations in Section 3.3, BCT . Therefore, based on the model in Section 4.1, we add 4(n − 1) more boolean variables to indicate During the automatic search, we do not directly set the value of k but restrict its lower bound. That is, an integer variable is created for k and a new constraint k ≥ k lb is added. The larger k lb is, the smaller part of BCT is contained in the model. However, if k lb is too small, the model often returns a small entry in BCT, which is observed in our experiments. While k lb is too large, the model may not contain an appropriate entry. It then implies that there exists an optimal k lb . k lb now replaces k, becoming the tweakable parameter. It is set to be a small value and increases steadily until a desired BCT entry is found. For LBCT, UBCT, and EBCT, the above idea still works. Our suggestion on initial values of the lower bounds are listed in Table 2.

Our Automatic Search Frameworks
Recall that, in boomerang attacks, a cipher E is decomposed as E 1 • E m • E 0 . The models for E 0 and E 1 are identical to the differential models. In ARX ciphers, E m is made up of only modular additions, rotations and XOR operations. Each modular addition is regarded as a special S-box. Constructing the model of E m is similar to constructing that for ciphers based on S-boxes. It is feasible to handle dependences between modular additions with our models of BCT and its variants. We do not elaborate the model of E m here, since it relies on the specification of the cipher. The details are left to the next chapter. In this subsection, our main concentration is how to utilize the models in Section 4.2, although they lack the exact probabilities. We propose two frameworks.
The first one is to search for a boomerang characteristic with a high probability. For ciphers based on S-boxes, the objective function can be the product of the probabilities of E 0 , E 1 , and E m (or the sum of their binary logarithms). By optimizing it, the model returns a boomerang characteristic with the optimal probability. However, our model for E m only tells us that the probability is probably high according to Section 4.2. To make use of this information, we propose a two-step framework: 1. Set the product of the probabilities of E 0 and E 1 as the objective function and maximize it; 2. Compute the probability of E m according to the information provided by the boomerang characteristic and then obtain the total probability.
The parameter k lb is tweaked to find a better boomerang characteristic. For every different k lb , the above two steps are repeated. Specifically, k lb = n − 1 gives us an E m with probability 1. If k lb = 0, the model returns any valid boomerang characteristic. Note that, when k lb is approaching to n − 1, the limited choices of E m could cause E 0 and E 1 away from the optimal ones. This is another reason why we recommend to start from a small k lb in the previous subsection. This framework is suitable for searching long boomerang characteristics or for ciphers with a complicated E m . In this paper, we show an example on LEA [HLK + 14].
The second framework aims to take advantage of the clustering effect. The complete model is the same as the one in the first framework. It has three stages, listed below.
1. The objective function is still the product of the probabilities of E 0 and E 1 . Set k lb to 0 and maximize the objective function to obtain a boomerang characteristic. Let the optimized value of the objective function be p max . It is an upper bound on the probabilities of boomerang characteristics in this framework.
2. Set k lb to a proper value and the value of the objective function to be p max − t.
Let the SAT solver to enumerate all solutions in this model. With the help of a hash table, it is easy to count the solutions by the input difference of E 0 , ∆ in , and the output difference of E 1 , ∇ out . In other words, the index of the hash table is (∆ in , ∇ out ), and the value is the number of boomerang characteristics in accordance with these differential values.
3. Take the record with the largest value in the hash table. This record means that, given k lb and t, the largest cluster has input difference ∆ in for E 0 and output difference ∇ out for E 1 . Set k lb to 0 or a smaller value than the one at Stage 2. Set the value of the objective function to be in [p max − t p , p max ], where t p is a value less than or equal to the t at Stage 2. Then, let the SAT solver to enumerate all solutions again under the given ∆ in and ∇ out . This will give us the probability of the cluster.
The parameters k lb and t are supposed to be tweaked to obtain a better cluster, while the parameter t p only affects the precision of the probability. At Stage 2, they affect the total number of solutions to be enumerated. More solutions lead to a more accurate result. Nonetheless, more solutions also cause a higher cost of time. Thus, they should be configured with proper values which are suitable to the hardware and the target cipher. At Stage 3, the model is given the exact values, ∆ in and ∇ out . This offers more information about the model to the SAT solver and allows the solver to enumerate much more solutions. Therefore, we are able to use a smaller k lb and t p , which leads to a more precise estimation of the probability. The precision of the estimation is the advantage over the first framework.
Because of the enumerations, this framework requires more time. Thus, it is applicable to short boomerang trails or ciphers with a simple E m . In this paper, we apply it to SPECK [BSS + 13].

Comparing the Technique with S-functions
This subsection compares our technique with those based on S-functions since they are very similar. The concept of S-functions was proposed by Mouha et al. [MVDP11] to describe several operations in ARX constructions, which is the core part of their general framework for analysing the ARX-based ciphers.
The boomerang characteristics of ARX-based ciphers can indeed be analysed by the original technique of S-functions, as Kim et al. have already shown [KKS20]. This technique requires graph theory, representing the S-functions as a graph, and counting the paths by matrix multiplications. Appendix C shows an example of this process. In contrast, the idea in Section 3.1 directly constructs the algorithm from the perspective of dynamic programming, although it starts from the same property used by S-functions-the (i + 1)-th state only relies on the i-th state. Our idea is a new view of the property and can be easily extended to the variants of BCT. Furthermore, this approach is natural, because the property perfectly matches the key ingredients in dynamic programming, i.e., optimal substructure and overlapping subproblems [CLRS09]. This dynamic programming algorithm also has a graph representation [CLRS09] similar to the S-functions' where the function g (Definition 5) is the same as the transition function in S-functions. But our function g presents how subproblems depend on one another in the algorithm. Thus, these two approaches are from two different perspectives while reaching the same result. Moreover, to reduce the sizes of the matrices, Mouha et al. proposed a new algorithm to find out equivalent states, while ours reveals mathematical properties behind the matrices and then generates smaller matrices. These properties were not published before.
In addition, it should be pointed out that the property stated in Lemma 2 can explain more observations about the modular addition. We take the result of the Finite State Machine (FSM) reduction algorithm from Mouha et al. [MVDP11] as an example. The key equations in the differential analysis are where all the variables are represented in n-bit strings. Then, it can be derived that  Rooted in S-functions, ARXtools is an improved and automatic tool to study the differential properties in ARX constructions. The previous work [Leu12] demonstrates the power of this tool in detecting incompatibility and computing the probability of a characteristic but also points out that it cannot automate the search of the characteristics. Specifically, several incompatible boomerang attacks are detected automatically, while the search of boomerang characteristics with high probabilities is absent. Later, another work by Leurent [Leu13] presents a rebound-like approach to construct differential characteristics with the help of the ARXtools. However, to the best of our knowledge, this approach cannot be adapted for boomerang characteristics. On the other hand, when computing the probability of a complicated boomerang switch, ARXtools has some advantages over ours. For example, according to [Leu12], the time of solving an S-function is proportional to the word length. Nonetheless, solving with SAT solvers could be much slower, since SAT is NP-complete. ARXtools is able to count all the solutions while ours have to discard a lot of possible solutions following the partial tables due to the speed of SAT solvers. Moreover, S-functions and ARXtools can be generalized to more cases, but currently our modelling technique is specific to BCT and its variants.
The automatic search technique in this paper creates SAT models in the benefits of the hidden properties within the matrices. Although the S-functions analysis gives the same result as Equation (10) and ARXtools is able to generate the corresponding finite automaton, they both ignore the properties in Section 3.3. In Appendix D, we provide the results generated by ARXtools to demonstrate this claim. Therefore, at this moment, we have not found a way to combine these techniques, which is left to future work.

Applications
In this section, we apply our technique in Section 4 to SPECK [BSS + 13] and LEA [HLK + 14]. SPECK is the simplest ARX cipher but still secure up until now. LEA is an ARX cipher approved by the Korean Cryptographic Module Validation Program, and has a more complicated structure. Boomerang attacks have been considered by the designers of LEA. Thereby LEA is a proper target for showing the advantage of our technique.
We briefly describe the specifications of the ciphers. Since we do not facilitate properties of the key schedules, their details are omitted. Then, we explain the SAT models of them and provide the results of the automatic search. In all our experiments, the CPU is Intel ® Xeon ® Silver 4110 CPU @ 2.10GHz, the SAT solver is OR-Tools [Goo21], and the programming language is C++. OR-Tools has ranked first in most of the categories in the MiniZinc Challenge 1 since 2018. Its C++ library is easy to use. We can conveniently control the number of threads, parameters, and various conditions in the code. Besides, it is not necessary to transform logical expressions into their algebraic normal forms. All the source codes are available at https://github.com/0NG/boomerang_search.

Applications on SPECK
SPECK is a family of lightweight block ciphers which are all ARX ciphers. For word size n ∈ {16, 24, 32, 48, 64}, each variant is identified by SPECK2n/mn, where 2n is its block size and mn is the key size. The value m depends on n. The round function of SPECK is shown in Figure 4. The rotation constants are α = 7 and β = 2 for SPECK32/64, while α = 8 and β = 3 for the others.
The application on SPECK is straightforward. SPECK is decomposed into three parts, such that the middle part, E m , only contains a modular addition. The other parts are described by the differential models. The model of E m corresponds to the model for the BCT in Section 4.2. Nevertheless, we notice that it is feasible to add one more round to E m , that is, a 2-round switch, see Figure 5. It gives us a much better result than the 1-round switch. For ciphers based on S-boxes, the probability for switches with multiple Figure 4: SPECK round function. rounds is fully studied in [WP19, BHL + 20]. Since the modular addition is viewed as an S-box and each round of SPECK has only one addition, the results on S-boxes are still correct here. Namely, the probability of the 2-round switch in Figure 5 is where γ, δ, γ ′ , and δ ′ are determined by ∆x i+1 and ∇x i+1 . In other words, the property of the first modular addition is described by the UBCT, and the second one is described by LBCT, since they perfectly match Equation (7) and Equation (8). When the SAT model is constructed, variables are created for ∆x i+1 and ∇x i+1 without fixing their values. This allows the SAT solver to enumerate valid values of them, i.e., to do the summation during searching. Later, the complete model is simply the concatenation of two differential models and the models for the UBCT and LBCT with dealing the linear operations among them. We then apply the second framework in Section 4.3 to search for boomerang distinguishers using the parameters in Table 3. Specifically, in the first step, SPECK32/64 has p max = 2 −20 , and SPECK48/72 has p max = 2 −40 . This step is very fast with 10 threads, costing 4 seconds and 34 seconds   Table 4. Two examples of the boomerang characteristics are also listed in Appendix B. Finally, we choose the best ones of them as our main results. However, these improved distinguishers could not result in better key-recovery attacks than previous results. It is because that the 2-round attack from Dinur [Din14] can turn an n-round differential distinguisher into a (n + 2)-round key-recovery attack while, to the best of our knowledge, a powerful attack like this is absent for boomerang distinguishers. Thus, the differential distinguishers in Table 1 still lead to better key-recovery attacks than ours. The distinguisher on SPECK32/64 is practical, so we implement it to verify the correctness. The program randomly selects a pair of plaintexts, following the boomerang trail to obtain a new pair of plaintexts. Then, it checks the difference of the new pair.
The key of SPECK is also randomly selected. We ran the experiments 2000 times for each characteristic. As a result, it reported that all the probabilities are slightly larger than the estimations from our automatic search.

Applications on LEA
LEA is a block cipher with a 128-bit block, whose key size is 128, 192 or 256. It is an ARX cipher with a Feistel structure. The states of LEA are separated into four 32-bit words. Meanwhile, the operations are 32-bit operations. Figure 6 is the diagram of the round function of LEA.
Let E m be 1-round LEA, see Figure 7. Since the XOR of the round keys has no impact Figure 6: LEA round function.  . Later, The SAT solver will select valid values for them. When we set the solver to the enumeration mode, it will go through all possible values of them and give us the probability. In [KKS20], the authors claimed that the probability can be computed from three BCTs. However, as our diagram shows, it is impossible to compute such a switch with three BCTs, due to the missing information about ∇X i [1] and ∇X i [2]. To demonstrate our computation, we set up a SAT model to compute the probability following our idea above. Note that it is infeasible to directly enumerate all possible values of ∇X i [1] and ∇X i [2]. Thanks to our heuristic strategy in Section 4.2, the parameter k lb can be set to some value larger than 0, which can eliminate a lot of negligible probabilities to speed up the computation.
The target switch in the comparison is a boomerang switch reported in their work. The input difference of the switch is ∆ = 28000200∥0002a000∥00080000∥00000001, and the output difference is ∇ = 80000004∥80400004∥80400014∥80400010. By setting k lb = 16, the SAT solver returned the probability 0.708521 in about 17 hours. We then tried k lb = 8, and received the result 0.710802, which is only a slight improvement but costs around 33 hours, so k lb = 16 is enough. To compare the results, we also implemented a program to get the approximating probability of the same switch by randomly selecting 2 30 plaintexts. The program showed that the probability is about 0.71. Thereby, we believe our model and computation are correct. In contrast, the estimation in [KKS20] is 0.661755. Thus, the boomerang distinguishers reported in this work need to be checked carefully. The security of LEA against boomerang attacks is supposed to be further studied.
In fact, the 1-round switch of LEA is an S-function, which can also be analysed by ARXtools. The detailed analysis is explained in Appendix C. The results of the experiments are listed in Table 5. As expected, ARXtools returned the result in a much shorter time than the SAT solver, which is consistent with the discussion in Section 4.4. This is the main advantage of ARXtools over our techniques in practical analysis. However, when it comes to the search of a characteristic, our techniques still work, but ARXtools cannot. The last but important observation is that our technique can reach a good estimation approaching the result of ARXtools in a practical time, which indicates the reliability of the technique.
To search for a boomerang distinguisher for LEA, the first framework in Section 4.3 is chosen. We have also tried the second framework, but the complexity of LEA made it infeasible to enumerate all the characteristics. The best 15-round boomerang distinguisher found is exactly the one reported in [KKS20]. However, for 16-round LEA, no boomerang trail with probability higher than 2 −128 was found. When the model takes the differences given by [KKS20], the upper bound p max is only 2 −132 .

Conclusion
In this paper, we concentrate on the automatic search of boomerang distinguishers on ARX ciphers. We study the computation of the boomerang connectivity table and its variants from the perspective of dynamic programming. Based on the algorithms, we use boolean variables and logical expressions to describe the tables. Then several models for the propagations in boomerang switches of ARX ciphers are provided. We further propose two frameworks powered by SAT solvers, facilitating our models of the tables. These frameworks for the first time provide the methods to search for boomerang distinguishers on ARX ciphers. We take SPECK and LEA as the targets in our applications. Several boomerang distinguishers for SPECK are found, which have not been reported before. Although no new boomerang distinguisher on LEA is found, we compare our result with the previous result, showing that our method is more precise.
However, there are some disadvantages of our techniques. First, the computations of BCT and its variants are not convenient to be modelled. Second, the model for a switch with more than 2 rounds are too large, even for SPECK. It is infeasible to handle this huge model in SAT solvers. Third, we can only find out distinguishers with large probability by our heuristic strategy. The optimal distinguishers are still unknown. These are left to future work. Finally, computing the probability of a boomerang switch is much slower than ARXtools. How to combine both of the these tools/techniques is a challenging work.

Acknowledgement
We would like to thank the reviewers for their constructive comments and suggestions, which helped us improve the quality of the paper. In particular, one of the anonymous reviewers reminded us that ARXtools can compute the probability of the 1-round switch in LEA very efficiently. This not only drove us to improve the experiments but also helped us better understand the differences between these techniques. Meanwhile, another researcher Zhongfeng Niu noticed the wider applications of Lemma 2 beyond BCT. We want to thank him for helping us improve the comparison in Section 4.4 and the editorial quality of the paper. This work is supported by the National Key Research and Development Program of China (2022YFB2701900), the Natural Science Foundation of China (62272362, U19B2021, 62032014), and the Fundamental Research Funds for the Central Universities.

A Dynamic Programming Algorithms for The Variants of BCT
In this section, we provide the details about the dynamic programming algorithms for the variants of BCT and their optimization.
Then, since Lemma 1 still holds here, we have Similar to the recursive expression of the function f , the expression of f LBCT is This result has the same form as the one for BCT, but the functions are replaced with the new ones. In fact, during this derivation, the only impact of the new restrictions is changing the symbols of the functions and the set. This allows us quickly writing down the expressions for UBCT and EBCT. Next, with Lemma 2, we also have g LBCT (u, v, . It leads to where S ′ LBCT = {00000, 00110, 01010, 01100, 00001, 00111, 01011, 01101}, This is because that Theorem 2 only relies on the invariant of the function g under the one's complement of u, v, L[i], and R[i], and this invariant still holds for g LBCT . Again, it enables us to quickly obtain the optimization for UBCT and EBCT. Finally, the dynamic programming algorithm for LBCT is listed below. The difference between this algorithm and the one for BCT is the last code about summing up T dp [u]. It is due to the special subset Q LBCT . For UBCT and EBCT, this code conforms to the corresponding special subset and the others remain almost unchanged.

B Examples of Boomerang Characteristics in SPECK
Two characteristics output by the SAT solver are listed below for SPECK32/64 and SPECK48/72, respectively.

C Analysis by S-function
In this section, we present the S-functions analysis of the 1-round switch of LEA in Section 5.2. To simplify the explanation, the linear operations are ignored as Figure 8 shows.

C.1 Definition of the Probability
Let E m be the short cipher in Figure 8. The probability of this switch is To use the S-functions technique, this formula should be rewritten in detail. Given X 1 , ∆, and ∇, we calculate ∆Z using T Then, the probability of the switch is redefined as p m = 2 −128 · # X 1 ∈ F 128 2 | ∆Z = ∆ .

C.3 Computing the Probability
In order to derive the formula for computing the probability, a graph is generated from the S-function (Equation (62) Figure 9 shows an example of a bipartite graph. Note that, for any bit position i, there are 2 12 values for S[i], which generates 2 12 different vertices. According to [MVDP11], the probability is equal to 2 −128 times the number of the paths from a state whose values are all zero to any of the 2 12 vertices S Define a 1 × 2 12 matrix L = [1 1 . . . 1] and a 2 12 × 1 matrix C = [1 0 . . . 0] T . Then, the formula of the probability is Finally, the FSM reduction algorithm in [MVDP11] helps to reduce the number of states. ARXtools can automate the above process following [Leu12] and the instructions on its webpage (https://who.rocq.inria.fr/Gaetan.Leurent/arxtools.html). Our code is available at https://github.com/0NG/boomerang_search. The program successfully obtained the probability of the switch in Section 5.2 in 34s, and reported that the exact probability is about 0.710850.