Throwing Boomerangs into Feistel Structures: Application to CLEFIA, WARP, LBlock, LBlock-s and TWINE

Abstract. Automatic tools to search for boomerang distinguishers have seen significant advances over the past few years. However, most previous work has focused on ciphers based on a Substitution Permutation Network (SPN), while analyzing the Feistel structure is of great significance. Boukerrou et al. recently provided a theoretical framework to formulate the boomerang switch over multiple Feistel rounds, but they did not provide an automatic tool to find distinguishers. In this paper, by enhancing the recently proposed method by Hadipour et al., we provide an automatic tool to search for boomerang distinguishers and apply it to block ciphers following the Generalized Feistel Structure (GFS). Applying our tool to a wide range of GFS ciphers, we show that it significantly improves the best previous results on boomerang analysis. In particular, we improve the best previous boomerang distinguishers for 20 and 21 rounds of WARP by a factor of $2^{38.28}$ and $2^{36.56}$, respectively. Thanks to the effectiveness of our method, we can extend the boomerang distinguishers of WARP by two rounds and distinguish 23 rounds of this cipher from a random permutation. Applying our method to the internationally standardized cipher CLEFIA, we achieve a 9-round boomerang distinguisher, which improves the best previous boomerang distinguisher by one round. Based on this distinguisher, we build a key-recovery attack on 11 rounds of CLEFIA, which improves the best previous sandwich attack on this cipher by one round. We also apply our method to LBlock, LBlock-s, and TWINE and improve the best previous boomerang distinguishers of these ciphers.


Introduction
Boomerang analysis, initially invented by Wagner [Wag99], has been under significant development over the last years. Recent progress includes a theoretical framework to evaluate boomerang switches in SPN ciphers as well as automatic tools to search for sandwich distinguishers. For instance, Hadipour et al. [HBS21] introduced a tool to search for sandwich distinguishers taking the switching effect into account for multiple rounds. They applied their tool to significantly improve the rectangle distinguishers for SKINNY and CRAFT. Almost at the same time, Delaune et al. [DDV20] introduced another tool to discover sandwich distinguishers that handles the probability computation of the middle part automatically and applied their tool to SKINNY. Other works [QDW+21, DQSW21] improved these methods further to identify sandwich distinguishers which take the key-recovery phase into account for linear key schedules. However, to the best of our knowledge, all previous works focus on SPN ciphers, particularly those with linear key schedules. In contrast, Feistel structures, an important category of block ciphers, have not been analyzed well by these new methods. Although Boukerrou et al. [BHL+20] very recently proposed a theoretical framework to compute the probability of boomerang switches in Feistel structures, they do not provide a tool to search for distinguishers.
For CLEFIA, we not only remarkably improve the probability of the best previous sandwich distinguisher of this cipher but also extend it by one round by introducing a 9-round sandwich distinguisher with a probability much higher than $2^{-n}$. Moreover, we provide the first practical distinguisher for 7 rounds of CLEFIA which can be experimentally verified. Due to the high importance of CLEFIA, building upon our 9-round sandwich distinguisher, we also provide a key-recovery attack on 11 rounds of CLEFIA, which improves the previous best sandwich attack by one round. We also apply our tool to TWINE, LBlock, and LBlock-s. In all cases, we improve the best previous sandwich distinguishers. Table 1 summarizes our results. For all applications, we have identified several practical reduced-round distinguishers that we have verified experimentally. The source code of our tool for finding distinguishers and the experimental verification are publicly available in the following GitHub repository: https://github.com/hadipourh/comeback

Outline. We review the background on boomerang analysis as well as the previous works and recall the theoretical framework to formulate the probability of boomerang distinguishers in Section 2. Next, in Section 3, we introduce our search method for boomerang distinguishers, where we give an overall view of our method and clarify its main difference from the previous methods. Then, in Section 4, Section 5, Section 6, and Section 7, we demonstrate the utility of our method to improve the boomerang analysis of the block ciphers WARP, CLEFIA, TWINE, LBlock, and LBlock-s. We conclude in Section 8.

Boomerang and Rectangle Distinguishers
Wagner introduced the boomerang attack at FSE 1999 to exploit two short differentials with high probability [Wag99]. In this attack, we split the targeted cipher $E$ into two parts $E = E_1 \circ E_0$ such that there exist differentials $\Delta_1 \xrightarrow{E_0} \Delta_2$ for $E_0$ and $\nabla_2 \xrightarrow{E_1} \nabla_3$ for $E_1$ with high probabilities $p$ and $q$, respectively. Then, the two differentials, referred to as upper and lower differentials, are combined as shown in Figure 1 in an adaptively-chosen-plaintext-and-ciphertext (ACPC) setting to distinguish $E$ from a random permutation using Algorithm 1. Assuming that the two differentials are independent, the entire probability of the boomerang distinguisher is estimated by $p^2 q^2$. The plaintexts $((P_1, P_2), (P_3, P_4))$ satisfying the boomerang condition $P_3 \oplus P_4 = \Delta_1$ are called right quartets. As an $n$-bit random permutation satisfies the condition with probability $2^{-n}$, we require $p^2 q^2 \gg 2^{-n}$. In that case, the number of required adaptively chosen plaintexts and ciphertexts to obtain at least one right quartet is approximately $4 \cdot (pq)^{-2}$, and the data/time complexity of constructing a boomerang distinguisher is in $\mathcal{O}((pq)^{-2})$.
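The boomerang query described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the 8-bit toy oracle (`toy_enc`, `toy_dec`, `KEY`) and the function names are hypothetical. The toy cipher is linear, so every differential holds with probability one and every boomerang returns.

```python
import random

def boomerang_returns(enc, dec, p1, delta_in, nabla_out):
    """Run one boomerang query and report whether it yields a right quartet."""
    p2 = p1 ^ delta_in
    c1, c2 = enc(p1), enc(p2)
    # Shift both ciphertexts by the output difference of the lower differential.
    c3, c4 = c1 ^ nabla_out, c2 ^ nabla_out
    p3, p4 = dec(c3), dec(c4)
    return (p3 ^ p4) == delta_in

def boomerang_distinguisher(enc, dec, delta_in, nabla_out, n_bits, n_pairs, rng):
    """Return 'cipher' as soon as one boomerang returns, else 'random permutation'."""
    for _ in range(n_pairs):
        p1 = rng.randrange(1 << n_bits)
        if boomerang_returns(enc, dec, p1, delta_in, nabla_out):
            return "cipher"
    return "random permutation"

# Hypothetical 8-bit toy oracle: key XOR followed by a 3-bit left rotation.
KEY = 0x5A

def rotl8(x, s):
    return ((x << s) | (x >> (8 - s))) & 0xFF

def toy_enc(x):
    return rotl8(x ^ KEY, 3)

def toy_dec(y):
    return rotl8(y, 5) ^ KEY  # rotating left by 5 undoes rotating left by 3

verdict = boomerang_distinguisher(toy_enc, toy_dec, 0x01, 0x80, 8, 16, random.Random(1))
```

For a real cipher, `n_pairs` would be chosen on the order of $(pq)^{-2}$, in line with the complexity estimate above.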
Algorithm 1: Boomerang Distinguisher
Input: Encryption and decryption algorithms, denoted by $E_k$ and $D_k$, respectively
Output: Distinguishing the targeted cipher from a random permutation
  Generate $(pq)^{-2}$ different pairs of plaintexts $(P_1, P_2)$ such that $P_1 \oplus P_2 = \Delta_1$
  for each pair $(P_1, P_2)$ do
    $(C_1, C_2) \leftarrow (E_k(P_1), E_k(P_2))$
    $(C_3, C_4) \leftarrow (C_1 \oplus \nabla_3, C_2 \oplus \nabla_3)$
    $(P_3, P_4) \leftarrow (D_k(C_3), D_k(C_4))$
    if $P_3 \oplus P_4 = \Delta_1$ then
      return The underlying oracle is the cipher $E$.
  return The underlying oracle is a random permutation.

To remove the requirement of a decryption oracle, Kelsey et al. [KKS00] proposed the amplified boomerang attack. This attack was further refined by Biham et al. [BDK01] and called the rectangle attack. In this attack, the targeted cipher can be distinguished from a random permutation by querying enough quartets $((P_1, P_2), (P_3, P_4))$ with $P_1 \oplus P_2 = P_3 \oplus P_4 = \Delta_1$ and verifying whether the corresponding ciphertexts $((C_1, C_2), (C_3, C_4))$ satisfy $C_1 \oplus C_3 = C_2 \oplus C_4 = \nabla_3$ (see Figure 1). Let $x_i = E_0(P_i)$ for $1 \le i \le 4$. If $\Delta_1 \xrightarrow{E_0} \Delta_2$ holds with probability $p$, then $x_1 \oplus x_2 = x_3 \oplus x_4$ and thus $x_1 \oplus x_3 = x_2 \oplus x_4$ is satisfied with probability $p^2$. Assuming that these differences are equal to $\nabla_2$, which happens with probability $2^{-n}$, then $C_1 \oplus C_3 = C_2 \oplus C_4 = \nabla_3$ holds with probability $q^2$. As a result, the probability of getting a right quartet $((C_1, C_3), (C_2, C_4))$ is $2^{-n} p^2 q^2$. In contrast, a random permutation generates a right quartet with probability $2^{-2n}$. Hence, we can distinguish $E$ from a random permutation if $p^2 q^2 \gg 2^{-n}$. To produce $2^n (pq)^{-2}$ quartets of ciphertexts, we need $2^{n/2} (pq)^{-1}$ plaintext pairs, so we have to encrypt $4 \cdot 2^{n/2} (pq)^{-1}$ different chosen plaintexts. Although we have to check $\mathcal{O}(2^n (pq)^{-2})$ ciphertext quartets, by using a hash table the time complexity of a rectangle distinguisher can be reduced to $\mathcal{O}(2^{n/2} (pq)^{-1})$. Thus, the data and time complexity of a rectangle distinguisher is in $\mathcal{O}(2^{n/2} (pq)^{-1})$.

In practice, the dependency between the upper differential trail of $E_0$ and the lower differential trail of $E_1$ has a significant (positive or negative) impact on the actual probability of the resulting boomerang distinguisher. The importance of this effect was shown in follow-up studies [BK09, Mur11]. To formalize this dependency between the upper and lower differentials, Dunkelman et al.
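[DKS10, DKS14] introduced the sandwich attack. In this attack, the cipher $E$ is divided into three parts $E = E_1 \circ E_m \circ E_0$, as depicted in Figure 1, where $E_m$ is the middle (inner) part that includes the dependency between the upper and lower differential trails. $E_0$ and $E_1$ are also referred to as the outer parts of sandwich distinguishers. The entire probability of a sandwich distinguisher is estimated by $p^2 q^2 r$, where the probability $r = r(\Delta_2, \nabla_3)$ of the middle part can be calculated as
$$r(\Delta_2, \nabla_3) = \Pr\left(E_m^{-1}(E_m(x) \oplus \nabla_3) \oplus E_m^{-1}(E_m(x \oplus \Delta_2) \oplus \nabla_3) = \Delta_2\right). \quad (1)$$
Because the intermediate differences $\Delta_2$ and $\nabla_3$ can take arbitrary values in the sandwich distinguisher, we can consider the clustering effect. Therefore, a more accurate formula to compute the probability of a sandwich distinguisher is
$$\sum_{\Delta_2, \nabla_3} \hat{p}^2(\Delta_1, \Delta_2) \cdot r(\Delta_2, \nabla_3) \cdot \hat{q}^2(\nabla_3, \nabla_4),$$
where $\hat{p}(\Delta_1, \Delta_2) = \Pr(\Delta_1 \xrightarrow{E_0} \Delta_2)$ and $\hat{q}(\nabla_3, \nabla_4) = \Pr(\nabla_3 \xrightarrow{E_1} \nabla_4)$.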

Boomerang Switch
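Since the introduction of the sandwich attack, there have been attempts to formulate the probability of the middle part $E_m$, which is also called the boomerang switch.

Definition 1 (DDT). Let $S$ be a function from $\mathbb{F}_2^n$ to $\mathbb{F}_2^m$. The difference distribution table (DDT) of $S$ is defined as
$$\mathrm{DDT}(\Delta_1, \Delta_2) = \#\{x \in \mathbb{F}_2^n : S(x) \oplus S(x \oplus \Delta_1) = \Delta_2\}.$$

Definition 2 (FBCT [BHL+20]). Let $S$ be a function from $\mathbb{F}_2^n$ to $\mathbb{F}_2^m$ and $\Delta, \nabla \in \mathbb{F}_2^n$. The Feistel boomerang connectivity table (FBCT) of $S$ is defined as
$$\mathrm{FBCT}(\Delta, \nabla) = \#\{x \in \mathbb{F}_2^n : S(x) \oplus S(x \oplus \Delta) \oplus S(x \oplus \nabla) \oplus S(x \oplus \Delta \oplus \nabla) = 0\}.$$

The FBCT can be used to compute the probability over one round of the boomerang switch in a Feistel structure. For fixed $\Delta$ and $\nabla$ as depicted in Figure 2, the probability of a returning boomerang over 1 round of a Feistel structure is equal to $2^{-n} \cdot \mathrm{FBCT}(\Delta, \nabla)$.
The entry located in row $\Delta$ and column $\nabla$ of the FBCT is the number of times the second-order derivative of $S$ becomes zero at point $(\Delta, \nabla)$. Moreover, $\mathrm{FBCT}(\Delta, 0) = \mathrm{FBCT}(0, \nabla) = 2^n$ for all $\Delta, \nabla \in \mathbb{F}_2^n$, corresponding to the ladder switch [BK09], and $\mathrm{FBCT}(\Delta, \Delta) = 2^n$ for all $\Delta \in \mathbb{F}_2^n$, corresponding to the Feistel switch [Wag99]. Let
$$\mathcal{X}_{\mathrm{DDT}}(\Delta_1, \Delta_2) = \{x \in \mathbb{F}_2^n : S(x) \oplus S(x \oplus \Delta_1) = \Delta_2\}$$
denote the set of valid inputs satisfying the differential transition $\Delta_1 \xrightarrow{S} \Delta_2$. Then, the FBCT can be reformulated [BHL+20]:
$$\mathrm{FBCT}(\Delta, \nabla) = \sum_{\gamma} \#\left(\mathcal{X}_{\mathrm{DDT}}(\Delta, \gamma) \cap \left(\mathcal{X}_{\mathrm{DDT}}(\Delta, \gamma) \oplus \nabla\right)\right).$$
Assuming that $\Delta$ in Figure 2 is fixed and $\nabla$ is distributed uniformly and has not been affected by the upper differential trails, the probability of a returning boomerang is
$$2^{-n} \sum_{\nabla \in \mathbb{F}_2^n} 2^{-n} \cdot \mathrm{FBCT}(\Delta, \nabla) = 2^{-2n} \sum_{\gamma} \mathrm{DDT}^2(\Delta, \gamma),$$
which is the same as the probability calculation according to the traditional boomerang framework, $p^2 q^2$. It can be shown in a similar way that when $\nabla$ is fixed and $\Delta$ is independent and uniformly distributed, the probability of a returning boomerang can be calculated based on the $p^2 q^2$ formula. The differences propagated from the upper and lower trails through the middle part are referred to as the upper and lower crossing differences.
Accordingly, the boundaries of $E_m$ are where the lower and upper crossing differences become uniformly distributed, which mainly depends on the diffusion layer of the targeted cipher as well as the number of active positions in the input/output differences of the middle part. Analogous to the differential uniformity, the boomerang uniformity of an S-box in Feistel structures is defined as follows [BHL+20]:
$$\beta^F(S) = \max_{\Delta \neq 0,\ \nabla \neq 0,\ \Delta \neq \nabla} \mathrm{FBCT}(\Delta, \nabla),$$
which should be small to harden a design against boomerang-like attacks.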
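The FBCT properties recalled above (ladder switch, Feistel switch, the link to the DDT, and the Feistel boomerang uniformity) can be checked numerically for a small S-box. The following is a minimal sketch, assuming the FBCT counts the zeros of the second-order derivative as described in the text; the 4-bit table `SBOX` is an example permutation chosen for illustration and is an assumption, not something taken from the text above.

```python
# Example 4-bit permutation (an assumption made for this illustration).
SBOX = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
        0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]
N_BITS = 4
SIZE = 1 << N_BITS

def ddt(sbox):
    """DDT[d][g] = number of x with S(x) ^ S(x ^ d) == g."""
    table = [[0] * SIZE for _ in range(SIZE)]
    for d in range(SIZE):
        for x in range(SIZE):
            table[d][sbox[x] ^ sbox[x ^ d]] += 1
    return table

def fbct(sbox):
    """FBCT[d][n] = number of x where the second-order derivative at (d, n) is zero."""
    table = [[0] * SIZE for _ in range(SIZE)]
    for d in range(SIZE):
        for n in range(SIZE):
            table[d][n] = sum(
                1 for x in range(SIZE)
                if sbox[x] ^ sbox[x ^ d] ^ sbox[x ^ n] ^ sbox[x ^ d ^ n] == 0
            )
    return table

def feistel_boomerang_uniformity(table):
    """Largest FBCT entry over nonzero, distinct differences."""
    return max(table[d][n]
               for d in range(1, SIZE) for n in range(1, SIZE) if d != n)

DDT = ddt(SBOX)
FBCT = fbct(SBOX)
BETA_F = feistel_boomerang_uniformity(FBCT)

# Ladder switch and Feistel switch: these entries always equal 2^n.
trivial_ok = all(FBCT[d][0] == FBCT[0][d] == FBCT[d][d] == SIZE for d in range(SIZE))
# Averaging over a uniform lower difference reproduces the sum of squared DDT entries.
row_sum_ok = all(sum(FBCT[d]) == sum(v * v for v in DDT[d]) for d in range(SIZE))
```

Dividing an entry `FBCT[d][n]` by `SIZE` gives the one-round switching probability $2^{-n} \cdot \mathrm{FBCT}(\Delta, \nabla)$ for that pair of differences.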
To calculate the boomerang switch over multiple rounds, the Feistel boomerang difference table (FBDT) is needed.This table is analogous to the UBCT (Upper BCT or boomerang difference table (BDT)) [WP19] and the LBCT (Lower BCT) [SQH19] in the BCT framework.
Definition 3 (FBDT [BHL+20]). Let $S$ be a function from $\mathbb{F}_2^n$ to itself and $(\Delta, \delta, \nabla) \in (\mathbb{F}_2^n)^3$. The three-dimensional Feistel boomerang difference table (FBDT) is defined as follows:
$$\mathrm{FBDT}(\Delta, \delta, \nabla) = \#\{x \in \mathbb{F}_2^n : S(x) \oplus S(x \oplus \Delta) \oplus S(x \oplus \nabla) \oplus S(x \oplus \Delta \oplus \nabla) = 0,\ S(x) \oplus S(x \oplus \Delta) = \delta\}.$$

Now, we recall the ladder switch, one of the most important switching effects, which plays a vital role in our automatic search for sandwich distinguishers. According to Equation 1, if $\Delta_2$, which is propagated from the upper differential transition through the middle part, is zero, then $r$, the probability of the boomerang switch, is one. This also happens if $\nabla_3$, which is propagated from the lower differential transition through the middle part, is zero. Now, assume that the upper and lower crossing differences are propagated with probability one over the middle part. By generalizing the previous argument, it can be seen that if a certain S-box in the middle is activated by at most one of the upper and lower crossing differences, it is "free", i.e., it does not affect the probability of the middle part. In other words, the probability of the boomerang switch only depends on the common active S-boxes between the upper and lower crossing differences over the middle part. Thus, the probability $p^2 q^2 r$ of the sandwich distinguisher is determined as follows: $p$ and $q$ depend on the number of active S-boxes in $E_0$ and $E_1$, while $r$ depends on the number of common active S-boxes between the upper and lower crossing differences in $E_m$. As the cost of active S-boxes in the outer parts, $E_0$ and $E_1$, is higher than the cost of common active S-boxes in the middle, we can minimize an adequately weighted sum over these active S-boxes to find a sandwich distinguisher with a high probability.
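To make the three-dimensional table concrete, the sketch below builds an FBDT for a small S-box and checks how it relates to the FBCT and the DDT. It assumes that the middle coordinate $\delta$ is the output difference $S(x) \oplus S(x \oplus \Delta)$ of the upper pair; the example S-box and all function names are assumptions made for illustration.

```python
# Example 4-bit permutation (an assumption made for this illustration).
SBOX = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
        0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]
SIZE = 16

def fbdt(sbox):
    """Map (delta, out_diff, nabla) to the number of x that satisfy both conditions."""
    table = {}
    for delta in range(SIZE):
        for nabla in range(SIZE):
            for x in range(SIZE):
                out_diff = sbox[x] ^ sbox[x ^ delta]
                # The boomerang returns iff the shifted pair has the same output difference.
                if out_diff == sbox[x ^ nabla] ^ sbox[x ^ delta ^ nabla]:
                    key = (delta, out_diff, nabla)
                    table[key] = table.get(key, 0) + 1
    return table

def fbct_entry(sbox, delta, nabla):
    return sum(1 for x in range(SIZE)
               if sbox[x] ^ sbox[x ^ delta] ^ sbox[x ^ nabla] ^ sbox[x ^ delta ^ nabla] == 0)

def ddt_entry(sbox, delta, out_diff):
    return sum(1 for x in range(SIZE) if sbox[x] ^ sbox[x ^ delta] == out_diff)

FBDT = fbdt(SBOX)

# Summing over the middle coordinate recovers the FBCT.
projection_ok = all(
    sum(FBDT.get((d, g, n), 0) for g in range(SIZE)) == fbct_entry(SBOX, d, n)
    for d in range(SIZE) for n in range(SIZE)
)
# With a zero lower difference (ladder switch) the FBDT collapses to the DDT.
ladder_ok = all(
    FBDT.get((d, g, 0), 0) == ddt_entry(SBOX, d, g)
    for d in range(SIZE) for g in range(SIZE)
)
```

The second check mirrors the ladder switch: an S-box that is not touched by the lower crossing difference costs nothing beyond its ordinary differential transition.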

Our Method to Find Distinguishers
Our strategy to find sandwich distinguishers is based on Mixed-Integer Linear Programming (MILP) modeling and divided into three phases. First, we identify appropriate upper and lower truncated differential trails. To do this, we optimize the number of active S-boxes in $E_0$ and $E_1$ as well as the number of common active S-boxes in $E_m$. Next, these truncated characteristics are instantiated by concrete differential trails. Finally, by fixing the differences in four positions, namely the input of $E_0$, the input and output of $E_m$, and the output of $E_1$, we compute $p$, $q$, and $r$ separately to derive the probability $p^2 q^2 r$ of our distinguisher.
The main difference between our search method and the previous models [HBS21, DDV20] lies in the first step. These methods utilize a standard truncated model to encode the propagation of truncated differential characteristics in both the outer and inner parts of the sandwich distinguisher. In contrast, following the nature of the BCT or FBCT frameworks, we differentiate between the encoding of truncated trails over the inner and the outer parts of the sandwich distinguisher. Concretely, for the inner part, we model the propagation of truncated differential trails with probability one.
In a standard model of truncated trails, differential cancellation is very likely to happen in the diffusion layer, especially when minimizing the number of active S-boxes. For example, assuming that $z = x \oplus y$ and $x, y, z \in \mathbb{F}_2^n$ for some $n \in \mathbb{N}$, the propagation of truncated differential trails over the XOR operation is normally encoded as follows:
$$X + Y \geq Z, \quad X + Z \geq Y, \quad Y + Z \geq X,$$
where $X$, $Y$, $Z$ are binary variables indicating the activity of $x$, $y$, and $z$, respectively. In this model, $(X, Y, Z) = (1, 1, 0)$ is a valid transition. However, according to the BCT or FBCT frameworks, any common active S-boxes in the middle part of the sandwich distinguisher affect the entire probability of the boomerang switch and should not be neglected. As a result, the standard encoding, where differential cancellation through the diffusion layer is allowed, might indicate too few common active S-boxes.
To build such a model for differential trails of probability one, we need to consider the direction of propagation. This is in contrast to the standard model, where no directionality is encoded. For example, the inequalities describing the truncated model of the XOR operation only describe its differential branch number and are thus symmetric with respect to the input/output variables. However, in the BCT or FBCT framework, the upper and lower crossing differences must be propagated in forward and backward directions to explore the interaction between active S-boxes of upper and lower trails in the middle part. Therefore, to improve the encoding of truncated boomerang trails and to avoid spurious solutions, we differentiate between the encoding of truncated trails over the outer and the inner parts of the sandwich distinguisher. More precisely, instead of using the same truncated model for the entire upper truncated trail, we encode the propagation of the upper truncated trail through the outer part based on a standard approach, while it is propagated forward with probability one through the middle part in our encoding. Similarly, the lower truncated trail is encoded using a standard model for the outer part, while it is propagated backward with probability one over the inner part in our tool. For example, to encode the XOR operation in the middle part of our word-based models, we use the following inequalities:
$$Z \geq X, \quad Z \geq Y, \quad X + Y \geq Z.$$
This excludes the point $(X, Y, Z) = (1, 1, 0)$ from the solution space with the aim of preventing difference cancellation over the diffusion layer.
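The effect of the two XOR encodings can be seen by enumerating their feasible points. This is a small sketch under assumptions: it takes the standard truncated model to be the symmetric constraints $X + Y \geq Z$, $X + Z \geq Y$, $Y + Z \geq X$ and the probability-one model to be $Z \geq X$, $Z \geq Y$, $X + Y \geq Z$ (that is, $Z = X \lor Y$); the function names are hypothetical.

```python
from itertools import product

def standard_xor(x, y, z):
    """Symmetric truncated model: exactly one active word is impossible."""
    return x + y >= z and x + z >= y and y + z >= x

def prob_one_xor(x, y, z):
    """Directional model: the output is active iff at least one input is active."""
    return z >= x and z >= y and x + y >= z

points = list(product((0, 1), repeat=3))
standard = {p for p in points if standard_xor(*p)}
prob_one = {p for p in points if prob_one_xor(*p)}

# Transitions allowed by the standard model but forbidden with probability one.
cancellations = standard - prob_one
print(sorted(cancellations))  # [(1, 1, 0)]
```

The only point removed is the cancellation $(1, 1, 0)$, which is exactly the transition that could hide a common active S-box in the middle part.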
Modifying the previous approach [HBS21] accordingly, our method works as follows:

1. We partition the targeted cipher $E$ into $r_0 + r_m + r_1$ rounds for a sandwich distinguisher, as Figure 3 illustrates. Our tool first generates two MILP models with independent variables to encode the propagation of truncated upper and lower differential trails through $r_0 + r_m$ and $r_m + r_1$ rounds, respectively. We encode the propagation of the upper truncated trail in a standard way over the first $r_0$ rounds, but switch to encoding the propagation forward with probability one for the last $r_m$ rounds of the upper trail. Similarly, the truncated lower differential trail is propagated backward with probability one over the first $r_m$ rounds, whereas its propagation in the last $r_1$ rounds is encoded in a standard way. Next, we encode the common active S-boxes between the upper and lower differential trails in the middle part. We define additional variables to indicate whether a certain S-box is active in both upper and lower truncated trails and use them to link the two MILP models. Let $u_0, \ldots, u_{t-1}$ denote the activity of S-boxes in the last $r_m$ rounds of $E_m \circ E_0$, and $l_0, \ldots, l_{t-1}$ those in the first $r_m$ rounds of $E_1 \circ E_m$, as depicted in Figure 4. Consequently, $u_i$ and $l_i$ correspond to the same S-box positions in the middle part for all $0 \leq i \leq t - 1$. We denote the corresponding $t$ new binary variables by $s_0, \ldots, s_{t-1}$. We use them to link $u_i$ with $l_i$ for all $0 \leq i \leq t - 1$ as follows:
$$s_i \leq u_i, \quad s_i \leq l_i, \quad s_i \geq u_i + l_i - 1.$$
As a result, $s_i = 1$ if and only if $u_i = l_i = 1$. As Figure 4 illustrates, the binary variables $\tilde{u}_0, \ldots, \tilde{u}_{m-1}$ and $\tilde{l}_0, \ldots, \tilde{l}_{n-1}$ denote the activity of S-boxes in the first $r_0$ and last $r_1$ rounds, respectively. We use the constants $w_0$, $w_m$, and $w_1$ corresponding to the cost of active S-boxes in $E_0$, $E_m$, and $E_1$ to define our objective function:
$$\min \; \sum_{i=0}^{m-1} w_0 \cdot \tilde{u}_i + \sum_{i=0}^{t-1} w_m \cdot s_i + \sum_{i=0}^{n-1} w_1 \cdot \tilde{l}_i.$$

2. Next, based on the discovered truncated differential characteristics for $E_0$ and $E_1$, our tool looks for the best concrete differential trails instantiating the derived truncated trails over $E_0$ and $E_1$. To do so, it generates a bit-wise MILP model encoding the propagation of differential trails. If there is no differential satisfying the derived truncated trails, we go back to step 1 and try again. After deriving the concrete differential trails satisfying the desired activity pattern, we consider the clustering effect of differential characteristics. Therefore, our tool fixes the input/output differences of $E_0$ and $E_1$ and computes the differential effect of the upper and lower differentials, i.e., $p = \Pr(\Delta_1 \xrightarrow{E_0} \Delta_2)$ and $q = \Pr(\nabla_3 \xrightarrow{E_1} \nabla_4)$. To achieve this, we create a new MILP model where we only fix the input/output difference and search for all compatible differential characteristics. This is computationally feasible in our case as we are working with rather few rounds for $E_0$ and $E_1$.

3. By using the fixed input/output differences of the middle part, we experimentally evaluate the probability of the boomerang switch according to the following formula:
$$r(\Delta_2, \nabla_3) = \Pr\left(E_m^{-1}(E_m(x) \oplus \nabla_3) \oplus E_m^{-1}(E_m(x \oplus \Delta_2) \oplus \nabla_3) = \Delta_2\right).$$
The number of common active S-boxes provides an initial estimate for $r$, which allows us to deduce the number of trials needed to experimentally estimate $r$. We always make sure that the number of trials is significantly larger than $r^{-1}$. If $r = 0$, the discovered upper and lower differential trails are either incompatible, or combining them yields a weak distinguisher. If so, we go back to step 1 and repeat the process.

4. In the final step, we compute the entire probability $p^2 q^2 r$ of the discovered sandwich distinguisher. To make sure that the computed probability is accurate enough, we perform an additional check. The accuracy of the estimated probability is highly related to correctly allocating the boundaries of the middle part. If the middle part is too short, the upper and lower crossing differences are not uniformly distributed at the boundaries of $E_m$ and the $p^2 q^2 r$ formula underestimates the actual probability. On the other hand, if we choose $E_m$ too large, the probability of the middle part decreases, and more computational power is required to evaluate the probability based on theoretical and experimental frameworks. A good indicator to see whether the middle part was appropriately allocated is extending $E_m$ by a few rounds and comparing the value $p^2 q^2 r$ with the experimental probability of the extended $E_m$. If the boundaries of $E_m$ are chosen appropriately, $p^2 q^2 r$ gives a good estimate of the actual probability. Otherwise, we need to extend $E_m$.
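The linking variables and the weighted objective of step 1 can be illustrated without a MILP solver. The sketch below assumes the common AND-linearization $s_i \leq u_i$, $s_i \leq l_i$, $s_i \geq u_i + l_i - 1$ for the linking constraints; the activity patterns and function names are made up for illustration and do not come from any of the distinguishers in this paper.

```python
from itertools import product

def link_ok(u, l, s):
    """Linear constraints that force s = u AND l for binary u, l, s."""
    return s <= u and s <= l and s >= u + l - 1

def objective(upper_outer, common, lower_outer, w0, wm, w1):
    """Weighted number of active S-boxes in E0, Em (common only), and E1."""
    return w0 * sum(upper_outer) + wm * sum(common) + w1 * sum(lower_outer)

# For every (u, l) exactly one value of s is feasible, and it equals u AND l.
feasible = {(u, l): [s for s in (0, 1) if link_ok(u, l, s)]
            for u, l in product((0, 1), repeat=2)}

# Hypothetical activity patterns for a small example.
upper_middle = [1, 0, 1, 1, 0]          # u_i: upper trail, middle rounds
lower_middle = [1, 1, 1, 0, 0]          # l_i: lower trail, middle rounds
common = [u & l for u, l in zip(upper_middle, lower_middle)]
upper_outer = [1, 1, 0, 0]              # activity in the first r0 rounds
lower_outer = [0, 1, 1, 0]              # activity in the last r1 rounds

cost = objective(upper_outer, common, lower_outer, w0=2, wm=1, w1=2)
```

Choosing weights such as $(w_0, w_m, w_1) = (2, 1, 2)$ reflects that $p$ and $q$ enter the probability squared while $r$ enters only once.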
Our tool is very easy to use and extend for other ciphers. It receives the lengths $r_0$, $r_m$, $r_1$ of the partitions in sandwich distinguishers as well as the costs $w_0$, $w_m$, $w_1$ of active S-boxes in each partition. It outputs the discovered truncated trails, the total number of active S-boxes, the number of common active S-boxes in the middle part, and the concrete differential trails covering $E_0$ and $E_1$. Note that our sandwich distinguishers do not rely on individual differential characteristics and instead use more accurate estimates for the probability of the differential and the boomerang switch. More precisely, when computing $p$, $q$, and $r$, our tool fixes the differences at four positions only: the input of $E_0$, the input and output of $E_m$ (on two different sides of the boomerang switch), and the output of $E_1$. All other differences are unrestricted. Besides the main outputs, our tool generates a figure representing the propagation of upper and lower differential trails through different parts of the sandwich distinguisher (e.g., Figure 6 and Figure 12), which not only gives us intuition about correctly allocating $E_m$ but also makes the manual verification of our discovered distinguishers much easier.
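The experimental evaluation of $r$, with differences fixed only at the input and output of $E_m$, can be sketched on a toy cipher. The following is an illustration under assumptions, not the authors' code: the middle part is a hypothetical 8-bit, two-branch Feistel network with an example 4-bit S-box and independent random round keys, and `estimate_r` simply counts returning boomerangs.

```python
import random

# Example 4-bit permutation (an assumption made for this illustration).
SBOX = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
        0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]

def enc(x, keys):
    """Toy Feistel: (L, R) -> (R ^ S(L ^ k), L) for each round key k."""
    left, right = x >> 4, x & 0xF
    for k in keys:
        left, right = right ^ SBOX[left ^ k], left
    return (left << 4) | right

def dec(y, keys):
    """Inverse of enc, undoing the rounds in reverse order."""
    left, right = y >> 4, y & 0xF
    for k in reversed(keys):
        left, right = right, left ^ SBOX[right ^ k]
    return (left << 4) | right

def estimate_r(delta, nabla, rounds, n_keys, n_msgs, rng):
    """Fraction of returning boomerangs over random keys and random inputs."""
    hits = 0
    for _ in range(n_keys):
        keys = [rng.randrange(16) for _ in range(rounds)]
        for _ in range(n_msgs):
            x1 = rng.randrange(256)
            x2 = x1 ^ delta
            x3 = dec(enc(x1, keys) ^ nabla, keys)
            x4 = dec(enc(x2, keys) ^ nabla, keys)
            hits += (x3 ^ x4) == delta
    return hits / (n_keys * n_msgs)

rng = random.Random(2022)
r_ladder_lower = estimate_r(0x30, 0x00, 3, 8, 64, rng)   # zero lower difference
r_ladder_upper = estimate_r(0x00, 0x07, 3, 8, 64, rng)   # zero upper difference
r_feistel = estimate_r(0x30, 0x03, 1, 8, 64, rng)        # equal differences at the S-box
```

In this toy setting the three estimates are deterministic: a zero crossing difference on either side gives the ladder switch, and equal upper and lower differences at the single S-box of a one-round middle part give the Feistel switch. For nontrivial differences, the number of trials should be well above $r^{-1}$, as noted in step 3.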
To perform MILP optimization, we use the Gurobi solver [Gur22]. We also use this solver to compute the differential effect by setting the PoolSearchMode parameter to 2, which instructs Gurobi to find the $n$ best solutions, where $n$ is set to a very large value. To avoid deriving multiple solutions corresponding to the same differential characteristic and thus counting it twice, we only define dummy variables in our bit-oriented MILP models when strictly necessary. Even if we have to define dummy variables (e.g., to encode the large binary matrices of CLEFIA), we ensure no extra solutions are created. In other words, we make sure that there is a one-to-one correspondence between the solution space of our models and the possible differential trails of the targeted cipher.
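Once all characteristics compatible with a fixed input/output difference have been enumerated, the differential effect is the sum of their individual probabilities. The sketch below assumes each characteristic is summarized by its weight $w$ (probability $2^{-w}$) and that every characteristic appears exactly once in the list; the weights and the function name are hypothetical.

```python
import math

def differential_effect_log2(trail_weights):
    """log2 of the summed probabilities 2^(-w) of all enumerated characteristics."""
    return math.log2(sum(2.0 ** -w for w in trail_weights))

# Hypothetical example: two characteristics of weight 5 share the same
# input/output difference, so the differential has probability 2^-4.
clustered = differential_effect_log2([5, 5])
single = differential_effect_log2([4])
```

This is why a one-to-one correspondence between solutions and characteristics matters: a duplicated solution would be added twice and inflate the estimate.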
By adjusting the weights $w_0$, $w_m$, and $w_1$ in step 1, we can find sandwich distinguishers with different probabilities. In addition, if the targeted cipher employs different S-boxes with different differential uniformity or different (Feistel) boomerang uniformity, we can use a different weight for each S-box in our objective function to appropriately adjust the cost of its activity in the resulting truncated boomerang trail. To show the utility of our method, we demonstrate its application to several Feistel ciphers in the next sections.
To encode the differential behavior of nonlinear operations, particularly S-boxes, we have implemented the methods introduced in previous works [AST+17, SWW18, AK18]. We use the off-the-shelf logic minimization tool ESPRESSO, from the University of California, Berkeley, to simplify the extracted MILP constraints. ESPRESSO includes an efficient implementation of the Espresso algorithm [BHMSV84] and supports both fast and exact logic minimization. Unlike Logic Friday, a closed-source Windows program supporting Boolean functions with at most 16 input variables, ESPRESSO is an open-source tool that supports Boolean functions with any number of input variables. It also outperforms the results derived by Logic Friday for large S-boxes [YK21]. Given that extracting and simplifying the MILP (or SMT/SAT) constraints encoding the differential and linear behaviors of S-boxes is important in automatic differential and linear analysis, we have implemented this part of our tool as a subclass of the Sbox class in SageMath [Sag22]. Thus, other researchers can use our S-box encoding tool with ease. Appendix H briefly describes the usage of our SageMath module to derive and simplify the constraints encoding the DDT of S-boxes.

Application to WARP
In this section we briefly describe the specification of WARP and then illustrate the efficiency of our tool to significantly improve the sandwich distinguishers of this cipher.

WARP
WARP is a lightweight block cipher that was proposed by Banik et al. at SAC 2020 [BBI+20]. It receives a 128-bit plaintext and a 128-bit master key and then performs 40 full rounds, as represented in Figure 5, plus one partial round (without nibble permutation) to produce a 128-bit ciphertext. Employing a 32-branch generalized Feistel structure (GFS), WARP aims at providing 128-bit security in the single-key setting while achieving a small footprint.
The internal state of WARP can be represented as $X = X_0 \| \cdots \| X_{31}$, where $X_i \in \{0, 1\}^4$. WARP splits the 128-bit master key $K$ into two 64-bit halves, $K = K_0 \| K_1$. $K_{(r-1) \bmod 2}$ is used as the round key in the $r$th round. As shown in Figure 5, the round function of WARP applies the same 4-bit S-box and round-key addition to one of each two consecutive nibbles of the internal state. Afterwards, a permutation $\pi$ is applied to the nibbles of the state. We refer to the design paper [BBI+20] for a full specification. We use $X^{(r)}$ to denote the input state of round $r + 1$. To denote the input difference of round $r + 1$ in the upper and lower trails of sandwich distinguishers, we use $\Delta X^{(r)}$ and $\nabla X^{(r)}$. In addition, we use $\Delta X^{(r)}_i$ (or $\nabla X^{(r)}_i$) to denote the difference of the $i$th nibble in the input of round $r + 1$.
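The overall shape of such a round (a Feistel step on each pair of nibbles followed by a nibble shuffle, with alternating key halves) can be sketched as follows. This is a structural illustration only: the S-box and the permutation `PLACEHOLDER_PI` are placeholders, round constants are omitted, and the choice of which nibble feeds the S-box is an assumption; the actual values are given in the design paper [BBI+20].

```python
# Placeholders chosen for this sketch; they are NOT claimed to match the
# WARP specification.
SBOX = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
        0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]
PLACEHOLDER_PI = [(j + 1) % 32 for j in range(32)]

def round_key(key_halves, r):
    """Round r (1-based) uses key half number (r - 1) mod 2."""
    return key_halves[(r - 1) % 2]

def gfs_round(state, key_nibbles, perm=PLACEHOLDER_PI):
    """One full round on 32 nibbles: Feistel step per nibble pair, then shuffle."""
    mixed = list(state)
    for i in range(16):
        # Assumption: the even nibble feeds the S-box, the odd nibble is updated.
        mixed[2 * i + 1] ^= SBOX[mixed[2 * i]] ^ key_nibbles[i]
    shuffled = [0] * 32
    for j, target in enumerate(perm):
        shuffled[target] = mixed[j]
    return shuffled

def gfs_round_inverse(state, key_nibbles, perm=PLACEHOLDER_PI):
    """Undo the shuffle, then repeat the (self-inverse) Feistel step."""
    mixed = [state[perm[j]] for j in range(32)]
    for i in range(16):
        mixed[2 * i + 1] ^= SBOX[mixed[2 * i]] ^ key_nibbles[i]
    return mixed

# Hypothetical key halves and state, each nibble in the range 0..15.
k0 = list(range(16))
k1 = [15 - i for i in range(16)]
state = list(range(16)) * 2
after_two_rounds = gfs_round(gfs_round(state, round_key([k0, k1], 1)),
                             round_key([k0, k1], 2))
```

Because only every second nibble passes through an S-box, a single active nibble needs several rounds to spread over the whole state, which is what makes long middle parts with few common active S-boxes possible.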

Sandwich Distinguishers for WARP
WARP's designers investigated the security of their cipher against well-known attacks on block ciphers. For instance, by applying automatic methods for differential and linear cryptanalysis, they found a 21-round impossible differential distinguisher for WARP and computed the minimum number of differentially and linearly active S-boxes for up to 19 rounds of this cipher. They also applied the division property to find a 20-round integral distinguisher. WARP has also received third-party analysis, which mostly focused on differential cryptanalysis [TB21, KY21]. For example, by employing the FBCT framework, Teh and Biryukov [TB21] investigated the security of WARP against boomerang attacks and introduced a 21-round sandwich distinguisher with probability $2^{-121.11}$, building upon which they mounted a key-recovery attack on 24 rounds. This 21-round distinguisher was discovered with an automatic tool that takes the FBCT into account. However, as the FBCT only handles the boomerang switch over one round, this tool also only considers the boomerang switch for one round in the middle.
Here, using our method, we take advantage of boomerang switches over up to 10 rounds of WARP. This not only enables us to dramatically improve the probability of the sandwich distinguishers but also allows us to extend the sandwich distinguishers of WARP by two rounds and distinguish up to 23 rounds of WARP from a random permutation. WARP achieves nibble-wise full diffusion after 10 rounds for both encryption and decryption. Hence, if only one nibble is active at the input (output) of the middle part, the upper (lower) crossing differences become almost uniformly distributed after 10 rounds. Consequently, 10 rounds are a good choice for the length of the middle part. The FBCT of WARP's S-box is shown in Table 6, and its differential and (Feistel) boomerang properties are listed in Table 9. As the FBCT shows, the two difference values $\{2, a\}$ result in a better boomerang switch compared to the other difference values. Thus, we limit the input/output differences of the middle part to 2 and a. This guides our tool to find better bit-wise differences for the boundary between the middle and outer parts when instantiating the truncated trails.
As the lower and upper crossing differences are propagated with probability one in our tool, we are able to find the longest deterministic nibble-level sandwich distinguisher. To do so, we set the lengths $r_0$, $r_1$ of the outer parts to zero and increase $r_m$ as long as there is a deterministic sandwich distinguisher. Accordingly, we discover that there are 9-round deterministic sandwich distinguishers for WARP. One of these is listed in Table 2 and illustrated in Figure 6. This automatically generated figure shows that there is no interaction between the propagation of upper crossing differences (red) and lower crossing differences (blue). As a result, the probability of our 9-round distinguisher is one due to the ladder switch. Compared to our 9-round deterministic distinguisher, the best differential covering 9 rounds has a probability of $2^{-28}$. Now, we set $(r_0, r_m, r_1) = (2, 10, 2)$ and apply our tool to find a 14-round sandwich distinguisher. If $r_0 + r_1 > 0$, choosing appropriate weights for the active S-boxes in the middle and outer parts affects the identified distinguisher. Given that WARP employs the same S-box in each round and taking the $p^2 q^2 r$ formula into account, we set the weights of active S-boxes as $(w_0, w_m, w_1) = (2, 1, 2)$. This guides our tool to find a near-optimal sandwich distinguisher for 14 rounds. The resulting distinguisher is listed in Table 3. Figure 12 (Appendix) shows how the differences are propagated through each part of this distinguisher, with common active S-boxes marked in yellow.
To motivate our approach of evaluating the probability of the boomerang switch experimentally, we compare it to the theoretical estimates of the FBCT framework [BHL+20]. For the theoretical estimate, we use lower-case Greek letters such that $\alpha^{(r)}_j$ and $\beta^{(r)}_j$ denote the upper and lower crossing differences in the $j$th nibble of $X^{(r)}$. As evident from Figure 12, there are 3 common active S-boxes between the upper and lower trails. Additionally, for the common active S-box in round 4, the input upper and lower crossing differences are α in the lower differential trail. Therefore, its switching effect can be formulated by FBDT(α (3) 6 ). However, given that the upper crossing difference $\alpha^{(4)}_8$ does not affect the other two common active S-boxes, we can simply use FBCT(α 3 ) to formulate the switching effect of the common active S-box in round 4. Concerning the common active S-box in the 8th round, the upper and lower crossing differences at the input of this S-box are α 18 . As a result, the total probability of the boomerang switch over the 10 middle rounds of our distinguisher is r(α •DDT(α 18 . To speed up the computation, it is possible to split Equation 2 into some precomputed tables according to Equation 3. We computed the above formula for several possible values of the input/output differences of the middle part. For example, if both differences are set to a, we obtain $r = 45801799680 / 2^{40} = 2^{-4.58}$, which matches the experimental probability. Table 4 compares the value of this formula with the experimental probability for some further input/output differences. Evidently, choosing the pair of input/output differences from $\{(2, 2), (a, a)\}$ results in a greater probability for the middle part, which is expected according to the FBCT (Table 6). To evaluate the experimental values in Table 4, we set up several random tests, each including $2^{26}$ boomerang queries corresponding to $2^{13}$ random keys with $2^{13}$ random messages each, and compute the average number of returned boomerangs.
Now, we examine the overall probability of our 14-round sandwich distinguisher.Our tool calculates the probability of the differentials over E 0 and E 1 as p = q = 2 −4 which takes the clustering effect into account.As r(a, a) = 2 −4.58 (see Table 4), the total probability of our 14-round distinguisher is 2 −20.58 .This is large enough to be experimentally verified on an ordinary laptop.
To perform this experimental verification, we evaluate $2^{28}$ boomerang queries corresponding to $2^{10}$ random keys with $2^{18}$ random messages each, and compute the average number of returned boomerangs. Accordingly, the experimental value for the whole 14-round distinguisher is $2^{-20.39}$, which is very close to our estimate based on the $p^2 q^2 r$ formula. The best differential effect reported for 14 rounds of WARP so far is $2^{-72.14}$ [TB21]. This shows the advantage of sandwich distinguishers over differential distinguishers for reduced-round WARP.
To discover a 15-round sandwich distinguisher, we partition the cipher using $(r_0, r_m, r_1) = (2, 10, 3)$. Interestingly, the sandwich distinguishers discovered in this setting extend our 14-round sandwich distinguisher by one round in the forward direction. As a result, we achieve a 15-round distinguisher with $p = 2^{-4}$, $q = 2^{-8}$, and $r = 2^{-4.58}$, as described in Table 10, with a total probability of $2^{-28.58}$. To experimentally verify our 15-round distinguisher for WARP, we carried out several random tests, including $2^{34}$ random boomerang queries corresponding to $2^{10}$ random keys with $2^{24}$ messages each. The experimental probability for the whole 15-round distinguisher is $2^{-28.33}$, which is very close to $p^2 q^2 r$ and thus confirms the validity of our estimate.
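The estimates for the 14- and 15-round distinguishers can be re-derived with a few lines of arithmetic. The sketch below works in $\log_2$ and uses only the values quoted in the text; the function name `sandwich_log2` is introduced here for illustration. The exact count for $r(a, a)$ evaluates to about $-4.585$, which the text quotes as $-4.58$.

```python
import math


def sandwich_log2(p_log2, q_log2, r_log2):
    """log2 of the sandwich probability p^2 * q^2 * r."""
    return 2 * p_log2 + 2 * q_log2 + r_log2


# Middle-part probability for (a, a), from the count quoted in the text.
r_exact_log2 = math.log2(45801799680 / 2**40)

# Estimates using the rounded value r = 2^-4.58 from Table 4.
est14 = sandwich_log2(-4, -4, -4.58)   # 14 rounds: p = q = 2^-4
est15 = sandwich_log2(-4, -8, -4.58)   # 15 rounds: p = 2^-4, q = 2^-8

# Distance (in bits) between the estimates and the reported experiments.
gap14 = -20.39 - est14
gap15 = -28.33 - est15

if __name__ == "__main__":
    print(f"r(a,a) = 2^{r_exact_log2:.3f}")
    print(f"14 rounds: 2^{est14:.2f}, experiment differs by {gap14:.2f} bits")
    print(f"15 rounds: 2^{est15:.2f}, experiment differs by {gap15:.2f} bits")
```

In both cases the experimental value is slightly higher than the estimate, by roughly a quarter of a bit or less.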
As before, we apply our tool to up to 23 rounds of WARP. The full specification of our sandwich distinguishers for 15, 16, and 20 to 23 rounds of WARP is listed in Table 10. These results improve the best sandwich distinguishers for WARP on 20 and 21 rounds by a factor of $2^{38.28}$ and $2^{36.56}$, respectively. We even distinguish 22 and 23 rounds of WARP from a random permutation with great advantage for the first time.
When comparing the number of active nibbles at input/output differences in our 20- and 21-round sandwich distinguishers with the best previous ones [TB21], we find that our distinguishers not only have much higher probabilities but also have fewer active nibbles in the input/output differences. This is advantageous when building key-recovery attacks. For instance, the number of active nibbles at the input/output of our 21-round distinguisher is 14, compared to 17 previously [TB21].

Application to CLEFIA
CLEFIA [SSA+07] is a 128-bit block cipher supporting key lengths of 128, 192, and 256 bits, which are compatible with AES. Designed by Sony Corporation, CLEFIA was introduced at FSE 2007 and is internationally standardized in ISO/IEC 29192-2. Depending on the key size, the number of rounds in CLEFIA is 18 (128-bit key), 22 (192-bit key), or 26 (256-bit key). As shown in Figure 7, the round function of CLEFIA uses the generalized Feistel structure with four 32-bit branches in which two 32-bit functions $F_0$ and $F_1$ are applied in parallel. $F_0$ and $F_1$ follow the SP structure and perform three basic operations: sub-key addition, application of four 8-bit S-boxes in parallel, and diffusion of the output bytes of the S-box layer by a $4 \times 4$ MDS matrix over $\mathbb{F}_{2^8}$. As Figure 7 shows, CLEFIA employs two different S-boxes, which are used in a different order in $F_0$ and $F_1$. Moreover, the diffusion mechanism of CLEFIA was designed based on a novel design technique called Diffusion Switching Mechanism (DSM) [SS04, SSA+07], according to which two different MDS matrices with a certain property are used in the two branches. This guarantees a larger minimum number of active S-boxes in comparison to an ordinary GFS cipher without DSM. For a full specification, we refer the reader to [SSA+07]. Consistent with the previous sections, we use $X^{(r)}$ to denote the input state of the $(r+1)$th round and denote the differences in forward and backward directions by $\Delta X^{(r)}$ and $\nabla X^{(r)}$, respectively. CLEFIA's security has been investigated by its designers as well as many other researchers [TTS+08, MDS11, Tez10, LWZ11, BGW+13, BN19]. The longest distinguishers for CLEFIA so far are 9-round impossible differential distinguishers [SSA+07, TTS+08], a 9-round integral distinguisher [LWZ11], and a 9-round zero-correlation distinguisher [BGW+13]. However, regarding differential and boomerang analysis, the designers only estimated some upper bounds for the probability of differential and sandwich distinguishers based on the minimum number of differentially active S-boxes, and the best sandwich distinguisher for CLEFIA so far covers 8 rounds with probability $2^{-92}$ [MQ14]. Here, we not only improve the probability of the best previous 8-round sandwich distinguisher of CLEFIA by a factor of $2^{15.97}$, but also introduce a 9-round sandwich distinguisher for the first time. Thus, we contradict the claim by Biryukov and Nikolic that 9-round boomerang distinguishers for CLEFIA do not exist [BN19]. Still, their conclusion that 12 rounds of CLEFIA resist boomerang attacks holds up.

Sandwich Distinguishers for CLEFIA
The differential and (Feistel) boomerang properties of the S-boxes employed in CLEFIA are briefly described in Table 9. As can be seen, the S-box $S_0$ is weaker against differential and boomerang attacks. More precisely, the maximum differential probabilities of $S_0$ and $S_1$ are $2^{-4.68}$ and $2^{-6}$, respectively. In addition, the Feistel boomerang uniformity of $S_0$ is 20, whereas that of $S_1$ is 4. Therefore, in contrast to our truncated MILP model for WARP, where the cost of active S-boxes is only determined by the parameters $w_0, w_m, w_1$, we treat $S_0$ and $S_1$ differently in our truncated MILP model for CLEFIA.
Concretely, depending on which part of the sandwich distinguisher the active S-box is located in and which S-box it is, we use $4.68 \cdot w$ and $6 \cdot w$ for $w \in \{w_0, w_m, w_1\}$ as the actual weight of $S_0$ and $S_1$, respectively. Additionally, we have to take the DSM property of CLEFIA into account. According to DSM, the difference cancellation, which may occur over multiple rounds of a normal GFS cipher, is prevented in CLEFIA thanks to a clever choice of two different MDS matrices for the diffusion layer. To model the DSM, we adopt the method introduced by Sajadieh and Vaziri [SV18] in our truncated MILP model to avoid activity patterns excluded by the DSM property of CLEFIA.
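The weighting of active S-boxes described above can be illustrated with a small cost function. This is a sketch under assumptions, not the tool's actual MILP objective: the function name, the dictionary layout, the example activity pattern, and the concrete values chosen for $w_0, w_m, w_1$ are all hypothetical, and only the per-S-box factors 4.68 and 6 come from the text.

```python
# Per-S-box factors from the text: -log2 of the maximum differential probability.
S0_FACTOR = 4.68
S1_FACTOR = 6.0


def truncated_trail_cost(active, weights):
    """Weighted cost of a truncated trail.

    active:  part name -> (number of active S0, number of active S1)
    weights: part name -> weight w of that part (w_0, w_m or w_1)
    """
    return sum(
        weights[part] * (S0_FACTOR * n_s0 + S1_FACTOR * n_s1)
        for part, (n_s0, n_s1) in active.items()
    )


if __name__ == "__main__":
    # Hypothetical activity pattern and hypothetical weights, for illustration only.
    active = {"upper": (1, 0), "middle": (2, 1), "lower": (0, 1)}
    weights = {"upper": 2, "middle": 1, "lower": 2}
    print(truncated_trail_cost(active, weights))
```

With such a cost, an active $S_1$ is always more expensive than an active $S_0$ in the same part, which steers the search towards trails that activate the weaker S-box.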
Our bit-wise MILP model for CLEFIA is much heavier than WARP's bit-wise MILP model, which is mainly due to the large (8-bit) S-boxes and large MDS matrices employed by CLEFIA. Table 5 briefly describes the number of constraints derived by our tool to encode the differential behavior of $S_0$ and $S_1$. As before, we apply our tool to find the longest deterministic sandwich distinguisher for CLEFIA, which results in a 3-round distinguisher. To search for longer distinguishers, we set the length of the middle part in our sandwich distinguishers to 4 or 5 rounds. As CLEFIA reaches full diffusion on byte level after 5 rounds, this is sufficient to model the dependency between the upper and lower trails. The specification of our sandwich distinguishers for 4 to 8 rounds is listed in Table 11. As evident from the table, by partitioning 7 rounds into 1 + 5 + 1 rounds, we discover a practical sandwich distinguisher with probability $2^{-32.67}$. To the best of our knowledge, this is the first 7-round distinguisher for CLEFIA that can be experimentally verified with very limited computational power. The minimum number of differentially active S-boxes over the 7 rounds is 14, and hence the probability of a 7-round differential characteristic is at most $2^{-14 \times 4.68} = 2^{-65.52}$. In reality, the probability will be much lower as the stronger S-box $S_1$ will also be active. Consequently, even if we take the clustering effect into account, there is still a huge gap between the probability of our 7-round sandwich distinguisher and the best possible 7-round differential for CLEFIA. For 8 rounds, we split the cipher into 2 + 5 + 1 rounds and discover a distinguisher with a probability of $2^{-76.03}$. Similarly, for 9 rounds, we split the cipher into 2 + 4 + 3 rounds and find a distinguisher with a probability of $2^{-99.12}$, which is illustrated in Figure 8.
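As a quick consistency check on the figures quoted for CLEFIA, the following worked arithmetic uses only numbers stated in this section and in the caption of Figure 8 ($p = 2^{-4.68}$, $r = 2^{-13.26}$, $q = 2^{-38.25}$), with the sandwich estimate taken as $p^2 q^2 r$:

$$
\begin{aligned}
\text{7 rounds, differential bound:} &\quad 2^{-14 \times 4.68} = 2^{-65.52}, \quad \text{against } 2^{-32.67} \text{ for the sandwich, a gap of } 2^{32.85} \\
\text{8 rounds, improvement over [MQ14]:} &\quad 2^{-92} \cdot 2^{15.97} = 2^{-76.03} \\
\text{9 rounds, sandwich estimate:} &\quad p^2 q^2 r = 2^{2(-4.68) + 2(-38.25) - 13.26} = 2^{-99.12}
\end{aligned}
$$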
The huge gap between the probabilities of our sandwich distinguishers for 7 and 8 rounds is due to the strong diffusion property of CLEFIA as well as a limitation of our method. The diffusion switching mechanism (DSM) [SS04, SSA+07] of CLEFIA comes into effect for more than 7 rounds and increases the minimum number of active S-boxes by up to 40% in comparison to a normal GFS without DSM. This also increases the number of active S-boxes in the middle part of our sandwich distinguisher. Unfortunately, when the number of common active S-boxes in the middle increases, computing the probability by either the theoretical frameworks or the experimental approach becomes infeasible. Therefore, when applying our tool to more than 7 rounds of CLEFIA, we constrain it to find sandwich distinguishers with a limited number of common active S-boxes so that we can compute the probability of the middle part in a reasonable time. Due to the importance of CLEFIA as a standardized cipher, we propose a key-recovery attack on 11 rounds of CLEFIA based on our 9-round sandwich distinguisher.

Upper trail and middle part, differences given as (P0, P1, P2, P3):
(97000000, 0a14283c, 00000000, 00000000) → Round 3, $2^{-4.68}$ → (00000000, 00000000, 00000000, 97000000) → Round 4, $2^{0}$ → (00000000, 00000000, 97000000, 00000000) → 4 middle rounds, $2^{-13.26}$ → (00770000, 00000000, 00000000, 00000000)
Lower trail, differences given as (P0, P1, P2, P3):
(00770000, 00000000, 00000000, 00000000) → Round 9, $2^{-6.00}$ → (e0703ddd, 00000000, 00000000, 00770000) → Round 10, $2^{-24.42}$ → (00006000, 00000000, 00770000, e0703ddd) → Round 11, $2^{-10.00}$ → (00006000, c12f77ee, 00770000, 31c79fae)
Figure 8: CLEFIA: 9-round boomerang distinguisher with probability $2^{-99.12}$ based on 2-round upper trail (left, probability $2^{-4.68}$), 4 middle rounds (left, $2^{-13.26}$), and 3-round lower trail (right, probability $2^{-38.25}$ including differential effect).
We propose a key-recovery attack on 11 rounds of CLEFIA by prepending two rounds before our 9-round sandwich distinguisher. Our attack has a time complexity of $2^{116.1}$, a data complexity of $2^{103.13}$, and a memory complexity of $2^{113.6}$. To acquire the pairs, we use the initial structure depicted in Figure 9. This structure is built such that for each of the $2^{97}$ elements, there is a second element that leads to the required difference at the beginning of round 3. Therefore, we get $2^{96}$ pairs for $2^{97}$ encryption and decryption queries ($2^{98}$ data). To arrive at $4 \cdot 2^{99.12}$ pairs for the distinguisher, we need $2^{5.13} \approx 35$ of these structures. Therefore, we expect at least one right pair for the distinguisher with a probability of 98%.
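The data requirements above follow from short arithmetic, sketched below in $\log_2$. Two points are assumptions of this sketch rather than statements of the text: the success probability is modeled with a Poisson approximation for 4 expected right pairs, and the number of structures is computed without intermediate rounding, which gives about $2^{5.12}$, within rounding of the quoted $2^{5.13}$.

```python
import math

DISTINGUISHER_LOG2 = -99.12      # probability of the 9-round distinguisher
PAIRS_PER_STRUCTURE_LOG2 = 96    # pairs obtained from one initial structure
EXPECTED_RIGHT_PAIRS = 4         # factor 4 in the required number of pairs

# Required pairs: 4 * 2^99.12, and the number of structures needed for them.
needed_pairs_log2 = math.log2(EXPECTED_RIGHT_PAIRS) - DISTINGUISHER_LOG2
structures_log2 = needed_pairs_log2 - PAIRS_PER_STRUCTURE_LOG2

# Poisson model (an assumption): P[at least one right pair] = 1 - exp(-4).
success = 1 - math.exp(-EXPECTED_RIGHT_PAIRS)

# Data: 2^5.13 structures, each with 2^97 encryptions and 2^97 decryptions.
data_log2 = 5.13 + 1 + 97

if __name__ == "__main__":
    print(f"structures: 2^{structures_log2:.2f} = {2**structures_log2:.1f}")
    print(f"success probability: {success:.3f}")
    print(f"data complexity: 2^{data_log2:.2f}")
```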
We carry out our attack in the following steps, which we repeat for each of the $2^{5.13}$ initial structures.
1. Generate $2^{97}$ plaintexts according to the initial structure.
2. For each plaintext $P$, obtain $C = E(P)$ and $P' = E^{-1}(C \oplus \nabla)$. Store $(P, P', C)$ in the list P. This leads to our data complexity of $2^{5.13} \times 2 \times 2^{97} = 2^{103.13}$. Note that while we iterate over all $2^{32}$ values in the rightmost 32 bits of $P$, we only care about pairs with a difference that can generate a difference of 0a14283c after being XORed with the output of MixColumns. Concretely, we want pairs with a difference in the set $H = \{M_1(S_1(x) \oplus S_1(x \oplus \texttt{97}), 0, 0, 0) \oplus \texttt{0a14283c}\}$ with $|H| = 127$.
4. For each of the $2^{111}$ quartets $(P, P', C, Q, Q', D)$, find the possible values for $K_0$ based on $P$ and $P'$ as well as based on $Q$ and $Q'$. This can be done efficiently by using the $X_{\mathrm{DDT}}$ of the S-boxes. For each transition, we expect one valid key candidate on average. Note that the key candidates for both pairs need to match. As this happens with a probability of $2^{-32}$, we are left with about $2^{79}$ quartets.
5. Next, we target 8 bits of $K_1$ based on the transition of the relevant S-box in the first round. We expect two candidates on average, as each quartet is already filtered to only contain the 127 valid differences after the S-box. Due to the structure of the $X_{\mathrm{DDT}}$, either both candidates or no candidates match, and we expect a match with probability $2^{-7}$. Thus, we are left with $2^{72}$ quartets. Note that when the key candidates match, we always get two candidates: $k$ and $k \oplus \texttt{97}$.
6. Now, we recover $K_3 \oplus WK_1$. The transition in the relevant S-boxes of $F_1$ in the second round also depends on $K_1$. For each of the $2^{25}$ candidates of $K_1$, we find the relevant candidates for $K_3 \oplus WK_1$. These candidates match with a probability of $2^{-32}$. Thus, we expect to be left with $2^{65}$ quartets.
7. Now, we consider the final round by targeting 8 bits of $K_{20}$ and $K_{21}$ each. This further reduces the number of quartets by $2^{-14}$ to $2^{51}$.
8. Next, we target $K_{18} \oplus WK_3$. We consider all $2^{25}$ values for $K_{21}$ and expect a match with a probability of $2^{-32}$, and thus reduce the number of quartets to $2^{44}$.
9. With the number of quartets reduced to $2^{44}$ and each quartet only compatible with a few candidates for $K_0$ and $K_1$, we can brute-force the remaining 64 bits of $K_2$ and $K_3$. As $K_0, \ldots, K_3$ are calculated by applying 12 Feistel rounds to $WK_0, \ldots, WK_3$, we unfortunately cannot use the additional key material to speed up this process.
Accounting for the fact that we have $2^{5.13}$ initial structures leading to $2^{44}$ quartets each, this step needs about $2^{5.13+44+64} \approx 2^{113}$ time complexity.
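The quartet counts quoted in steps 4 to 9 can be re-derived by tracking the exponent through each filter. The sketch below encodes one reading of the steps that reproduces the quoted numbers: each step multiplies the number of (quartet, key candidate) combinations by the guessed candidates and by the matching probability. The tuple layout and labels are introduced here for illustration.

```python
# (label, log2 of key candidates considered per quartet, log2 of match probability)
FILTERS = [
    ("step 4: candidates for K0 from both pairs must agree", 0, -32),
    ("step 5: 8 bits of K1", 0, -7),
    ("step 6: K3 xor WK1, for 2^25 candidates of K1", 25, -32),
    ("step 7: 8 bits each of K20 and K21", 0, -14),
    ("step 8: K18 xor WK3, for 2^25 values of K21", 25, -32),
]


def remaining_quartets(start_log2, filters):
    """Return the log2 count of surviving quartets after each filter."""
    counts = []
    current = start_log2
    for _label, guess_log2, match_log2 in filters:
        current += guess_log2 + match_log2
        counts.append(current)
    return counts


counts = remaining_quartets(111, FILTERS)

# Step 9: brute-force 64 remaining key bits for each quartet in each structure.
step9_log2 = 5.13 + counts[-1] + 64

if __name__ == "__main__":
    for (label, _, _), c in zip(FILTERS, counts):
        print(f"{label}: 2^{c} quartets remain")
    print(f"step 9 cost: about 2^{step9_log2:.2f}")
```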
The overall time complexity is dominated by step 3, where we identified the set of quartets with valid input and output differences. Thus, we get a total time complexity of $2^{116.1}$. The memory complexity of $2^{113.6} = 6 \times 2^{111}$ is dominated by the need to store $2^{111}$ quartets.
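The size $|H| = 127$ of the filtering set used in step 2 can be made plausible with a stand-in computation. The sketch below assumes that $S_1$ is affine-equivalent to inversion in $\mathbb{F}_{2^8}$ (as we understand its specification [SSA+07]) and uses bare field inversion with the polynomial $z^8 + z^4 + z^3 + z^2 + 1$ in place of the real $S_1$; the affine layers are omitted, so the output differences themselves differ from CLEFIA's, but their number does not. Since multiplication by the MDS matrix and the XOR with a constant are injective, the count carries over to $H$.

```python
POLY = 0x11D  # z^8 + z^4 + z^3 + z^2 + 1 (assumed field polynomial)


def gf_mul(a, b):
    """Multiply two elements of GF(2^8) modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result


def gf_inv(a):
    """Field inversion via a^254, with 0 mapped to 0."""
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result if a else 0


def reachable_output_differences(sbox, in_diff):
    """Set of output differences reachable from a fixed input difference."""
    return {sbox[x] ^ sbox[x ^ in_diff] for x in range(len(sbox))}


# Stand-in for S1: bare inversion, without CLEFIA's affine transformations.
INVERSION = [gf_inv(x) for x in range(256)]

if __name__ == "__main__":
    print(len(reachable_output_differences(INVERSION, 0x97)))
```

Of the 255 nonzero output differences, 126 are reached by two inputs each and one by four inputs, which accounts for all 256 inputs and gives 127 reachable differences.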

Application to TWINE
Now, we apply our tool to TWINE, a 64-bit block cipher which supports key sizes of 80 and 128 bits [SMMK12]. This cipher uses a Type-2 generalized Feistel structure with 16 4-bit branches. Both variants perform 36 applications of the round function illustrated in Figure 10. The round function includes a nonlinear layer consisting of 8 parallel applications of the same 4-bit S-box and a diffusion layer permuting the 16 nibbles. Table 8 shows the FBCT of TWINE's S-box. Additionally, the differential and boomerang properties of this S-box are briefly described in Table 9. The table shows that the differential and F-boomerang uniformity of TWINE's S-box are 4. As the minimum F-boomerang uniformity of a non-APN function is 4 [BHL+20], TWINE's S-box achieves optimal values for differential and F-boomerang uniformity for 4-bit S-boxes. Hence, the probability of the boomerang switch mainly depends on the position of active nibbles at the input/output differences rather than the concrete difference in these nibbles. Consequently, we expect that our method finds nearly optimal sandwich distinguishers for TWINE.
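The two uniformities mentioned above can be computed directly from an S-box. The sketch below is illustrative and rests on two assumptions: the S-box values are transcribed from memory of the TWINE specification and should be checked against [SMMK12], and the FBCT entry for $(\Delta, \nabla)$ is taken as the number of $x$ with $S(x) \oplus S(x \oplus \Delta) \oplus S(x \oplus \nabla) \oplus S(x \oplus \Delta \oplus \nabla) = 0$, which is our understanding of the definition in [BHL+20]. If the transcription is right, both printed values should agree with Table 9.

```python
# TWINE S-box as recalled from its specification (verify against [SMMK12]).
TWINE_SBOX = [0xC, 0x0, 0xF, 0xA, 0x2, 0xB, 0x9, 0x5,
              0x8, 0x3, 0xD, 0x7, 0x1, 0xE, 0x6, 0x4]


def ddt(sbox):
    """Difference distribution table: table[a][b] = #{x : S(x) ^ S(x ^ a) = b}."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for a in range(n):
        for x in range(n):
            table[a][sbox[x] ^ sbox[x ^ a]] += 1
    return table


def fbct(sbox):
    """Feistel boomerang connectivity table under the definition stated above."""
    n = len(sbox)
    return [
        [
            sum(1 for x in range(n)
                if sbox[x] ^ sbox[x ^ a] ^ sbox[x ^ b] ^ sbox[x ^ a ^ b] == 0)
            for b in range(n)
        ]
        for a in range(n)
    ]


def differential_uniformity(sbox):
    """Largest DDT entry over nonzero input differences."""
    return max(max(row) for row in ddt(sbox)[1:])


def f_boomerang_uniformity(sbox):
    """Largest FBCT entry over the non-trivial positions a != 0, b != 0, a != b."""
    table = fbct(sbox)
    n = len(sbox)
    return max(table[a][b] for a in range(1, n) for b in range(1, n) if a != b)


if __name__ == "__main__":
    print("differential uniformity:", differential_uniformity(TWINE_SBOX))
    print("F-boomerang uniformity:", f_boomerang_uniformity(TWINE_SBOX))
```

The trivial positions (a zero difference or equal differences) always hold the value 16 for a 4-bit S-box, which is why they are excluded from the uniformity.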
We apply our tool to find good sandwich distinguishers for TWINE. When searching for deterministic sandwich distinguishers, we find that there is a 5-round sandwich distinguisher with probability 1. To find longer distinguishers, we set the length of the middle part to 8 rounds, as TWINE achieves full nibble-wise diffusion after 8 rounds. Our distinguishers for 13 to 16 rounds of TWINE are listed in Table 12. Figure 13 shows our 16-round sandwich distinguisher. Our distinguishers have a higher probability than the best previous sandwich distinguishers [TB21], while the number of active nibbles of input/output differences in our distinguishers remains the same as before [TB21].

Application to LBlock and LBlock-s
In this section, we apply our tool to LBlock, a 64-bit block cipher with 80-bit keys proposed at ACNS 2011 [WZ11]. As shown in Figure 11, the round function of LBlock follows a 2-branch balanced Feistel structure, where the right branch is modified by an 8-bit left rotation. The keyed $F$-function applies eight 4-bit S-boxes in parallel, after which a permutation is applied to the nibbles. Similar to TWINE, for both encryption and decryption, LBlock provides full nibble-wise diffusion after 8 rounds. Hence, we set the length of the middle part of our sandwich distinguishers to 8. LBlock employs 8 different S-boxes for the $F$-function. We do not need to differentiate between these S-boxes in our truncated MILP model, as their differential and boomerang uniformity is identical. However, as evident from the FBCT of LBlock's S-boxes (Table 7), two difference values {3, b} yield a better probability for the boomerang switch. Thus, these differences are a good choice for the active nibbles at the input/output of the boomerang switch.
We obtain the following results for LBlock. First, we discover a 5-round deterministic sandwich distinguisher. To find longer distinguishers, we set the length of the middle part to 8 rounds, similar to the application to TWINE. The resulting distinguishers for 13 to 16 rounds are listed in Table 14. In comparison to the best previous boomerang distinguisher for LBlock [CM13], our distinguisher has a higher probability.
We also apply our tool to LBlock-s, a simplified version of LBlock that uses the S-box $S_0$ for all nibbles in the $F$-function. Our distinguishers for 13 to 16 rounds of LBlock-s are listed in Table 13. The best previous sandwich distinguisher for LBlock-s covers 16 rounds of this cipher with probability $2^{-56.14}$ [BHL+20]. As Table 13 illustrates, the probability of the 16-round distinguisher discovered by our tool is $2^{-53.59}$ (an improvement by a factor of $2^{2.55}$), whereas the number of active nibbles at its input/output differences is the same as before [BHL+20]. Although Boukerrou et al. [BHL+20] considered the boomerang switch over the 8 rounds, they generated the upper and lower differential trails independently. In contrast, our tool takes the switching effect into account while searching for a sandwich distinguisher and thus yields a better distinguisher.
The diffusion strengths of TWINE and LBlock are almost the same. Furthermore, as Table 9 shows, TWINE's and LBlock's S-boxes have the same differential uniformity. However, according to Table 9, the smaller F-boomerang uniformity of TWINE's S-box results in weaker sandwich distinguishers in comparison to LBlock.

Conclusion
In this paper, we introduced an improved automatic method to search for boomerang distinguishers by enhancing the method proposed by Hadipour et al. and applied it to several ciphers following the generalized Feistel structure. Thanks to the effectiveness of our method, we managed to improve the best previous results concerning the boomerang analysis of a wide range of GFS ciphers. Notably, we improved the probability of the best previous boomerang distinguishers for 20 and 21 rounds of WARP by a factor of $2^{38.28}$ and $2^{36.56}$, respectively. In terms of the number of rounds, we also improved the boomerang distinguishers of WARP by 2 rounds and managed to distinguish up to 23 rounds of WARP from a random permutation. Applying our method to the internationally standardized cipher CLEFIA, we proposed a 9-round boomerang distinguisher for this cipher, which improves the best previous boomerang distinguisher by one round. We also built an 11-round key-recovery attack based on this distinguisher. Moreover, we introduced a practical boomerang distinguisher with probability $2^{-32.67}$ for 7 rounds of CLEFIA, which is, to the best of our knowledge, the first practical distinguisher for 7 rounds of this cipher. We also applied our method to TWINE, LBlock, and LBlock-s. In all cases, we succeeded in improving the best previous boomerang distinguishers of these GFS ciphers.

A Comparison with Lallemand et al.'s Approach [LMR22]
In parallel to this work, Lallemand et al. [LMR22] introduced another method to search for rectangle attacks on Feistel ciphers. Here, we provide a brief comparison between our method and theirs, which were developed independently. Lallemand et al. adapted the method proposed by Delaune et al. [DDV20] for finding boomerang distinguishers to the case of Feistel ciphers. In contrast, we enhanced the method proposed by Hadipour et al. [HBS21]. Thus, most of the differences stem from the differences between these original methods [HBS21, DDV20]. Hadipour et al. [HBS21, Section 8] provide a precise comparison between these methods [DDV20, HBS21]. We first recall the main similarities.
Both approaches have three main steps. They first find suitable upper and lower truncated differential trails. Then, they instantiate the discovered truncated trails with concrete differential trails, and lastly, compute the probability of the three main parts of the discovered sandwich distinguisher. Both approaches can also be extended to construct a unified model for the key recovery of rectangle attacks. For instance, Qin et al. [QDW+21] and Dong et al. [DQSW21] extended the methods proposed in [HBS21] and [DDV20], respectively, to make a unified model for key recovery of rectangle attacks.
However, the two methods follow different approaches to computing the probability of the boomerang switch. Our method computes this probability experimentally, whereas the methods employed in [DDV20] and [LMR22] handle the probability computation of the boomerang switch automatically. To achieve a fully automatic tool, one has to encode different types of (F)BCT tables in the boomerang switch at the bit level. However, this makes the resulting models harder to solve, and thus the execution time is longer when the boomerang switch includes many rounds or the S-boxes are large (≥ 8 bits). Therefore, there is no choice but to sacrifice the accuracy of the probability computation. For example, the difference values over the boomerang switch should be equal at both sides of boomerang distinguishers in [DDV20] and [LMR22]. Lallemand et al. proposed more techniques for speeding up the tool to keep the execution time reasonable. Although these techniques can decrease the execution time, they can also reduce the accuracy of the probability computation. In contrast, we do not impose additional constraints on the difference values over the boomerang switch. After finding suitable truncated upper and lower trails and instantiating them with concrete differential paths, we only fix the difference values at four positions, as discussed in Section 3. That is why we achieved better distinguishers for WARP compared to [LMR22]. For instance, we obtained a 23-round sandwich distinguisher for WARP, which has a higher success probability (by a factor of $2^{8.41}$) than the one proposed in [LMR22]. Notably, our tool finds this distinguisher in 36 seconds running on a regular laptop (Core(TM) i7-1165G7 @ 2.80GHz).

G Distinguishers for LBlock and LBlock-s
Table 13: Specification of Sandwich Distinguishers for LBlock-s.

Figure 2: Differences of S-box at four sides of boomerang switch in Feistel structure.

Figure 3: High-level view of common active S-boxes in the boomerang switch.

Figure 4: The variables of our MILP model to find truncated upper/lower trails.

Figure 5: The round function of WARP.
the output differences of the common active S-box in round 8 do not affect the other two common active S-boxes, its boomerang switch can be formulated by FBCT(α Figure 12 shows that this upper crossing difference originates from α the common active S-box in round 11, the lower crossing difference at the input of this S-box is $\beta^{(12)}_4$, which is fixed by the lower trail, and the input upper crossing difference at the input of this S-box is $\alpha^{(10)}_{18}$. Consequently, the boomerang switch of the common active S-box in round 11 can be formulated by FBDT(β
Starting from the simplest case where the boomerang switch $E_m$ includes only one S-box layer, Cid et al. [CHP+18] proposed the boomerang connectivity table (BCT). This idea was further developed in follow-up works [SQH19, WP19, HBS21] to provide a theoretical framework for evaluating the probability of the middle part when it is composed of multiple rounds. However, the BCT framework only works for block ciphers following the SPN design strategy. To formulate the probability of the boomerang switch over multiple rounds of Feistel ciphers, Boukerrou et al. [BHL+20] proposed the Feistel boomerang connectivity table (FBCT) as the Feistel counterpart of the BCT framework. The setup is depicted in Figure 2.

Table 4: Comparison between the theoretical and experimental probabilities of the boomerang switch over 10 rounds of WARP.

Table 5: Number of MILP constraints to encode the pb-DDTs of CLEFIA.

Table 12: Specification of Sandwich Distinguishers for TWINE.