Automated Search Oriented to Key Recovery on Ciphers with Linear Key Schedule

Automatic modelling to search for high-probability distinguishers covering as many rounds as possible, using tools such as MILP, SAT/SMT and CP models, has become a very popular topic in cryptanalysis. In those models, the optimization objective is usually the probability or the number of rounds of the distinguisher. If we want to recover the secret key for a round-reduced block cipher, there are usually two phases, i


Introduction
Differential cryptanalysis [BS91], proposed by Biham and Shamir, is one of the most successful cryptanalysis techniques. To launch a differential key-recovery attack on a block cipher, the first step is to find a differential whose probability is higher than in the random case. Based on this differential, the second step appends several extra rounds before and after the differential distinguisher to recover the key.
From the perspective of a distinguishing attack, good differential distinguishers are those with relatively high probabilities or those covering a larger number of rounds. Depending on the target cipher, one can use a dedicated search strategy to obtain differential-like distinguishers; such strategies include Matsui's branch-and-bound method [Mat93], MILP-based automatic search [MWGP11, SHW+14] and SAT/SMT-based tools [KLT15]. In a key-recovery attack, by contrast, the goal is to attack as many rounds as possible with relatively low data, time and memory complexity. Therefore, a good distinguisher for key recovery is expected to balance the factors that determine the data/time/memory complexities and the number of attacked rounds.
As a consequence, the strategy of searching good distinguishers for a key-recovery attack may differ from that of a distinguishing attack. In other words, an optimal differential in distinguishing attack is not necessarily the optimal one for recovering the key. For instance, in order to extend more rounds before and after a differential distinguisher, one aims at minimizing the number of active bits from the input/output differences of the differential distinguisher, such that a data collection to filter the wrong pairs can be efficiently performed.
To search for differential distinguishers that improve the number of rounds covered by a key-recovery attack, one has to take into account multiple factors and their interactions, including the probability and the length of the differential distinguisher, the number of inactive bits in the differences after the forward and backward extension, and the number of guessed key bits for a partial decryption. It is interesting to study an automatic search model that analyzes the trade-offs among these factors, so that the constructed distinguisher is optimized for an efficient key-recovery attack. So far, only a few works cover related domains. Derbez et al. [DF16], Shi et al. [SSD+18] and Chen et al. [CSSH19] introduced automatic tools for the Demirci-Selçuk meet-in-the-middle attack that treat the distinguisher and the key-recovery phase as a uniform search model. Zong et al. [ZDC+21] studied key-recovery-friendly differential and linear distinguishers on GIFT-128 [BPP+17].
With the emergence of a large number of highly constrained devices in burgeoning fields such as the Internet of Things (IoT) and sensor networks, lightweight cryptography has become an active research domain in the symmetric-key community. For lightweight block ciphers, where the adversary may have more power under some advanced attack models, more interacting factors should be taken into consideration in the search for distinguishers. For instance, an attacker may have access to or control over the keys or tweaks. A related-key attack scenario, where the encryption oracle is queried under a pair of keys with a certain known relation, is practical for some lightweight ciphers.
At CRYPTO 2016, Beierle et al. proposed a new lightweight block cipher family, SKINNY [BJK+16], which has hardware/software performance comparable to SIMON [BSS+13] and much stronger security guarantees. Also in 2016, NIST started the Lightweight Cryptography (LWC) standardization project [oSN20] to solicit lightweight cryptographic algorithms that are suitable for constrained devices. Among the Round 2 candidates of the LWC project, three algorithms are based on SKINNY, namely SKINNY-AEAD and SKINNY-Hash [BJK+20], ForkAE [ALP+19a] and Romulus [IKMP19]. Romulus is one of the finalists of the LWC project. The security analysis of SKINNY is therefore of great importance, and it also affects the security evaluation of these candidates.

Our contributions
In this paper, we focus on an automatic model to search for distinguishers that directly improve the cryptanalytic results. First, we analyze the detailed factors that restrict the generic differential key-recovery attack, and specifically a generalized rectangle attack model given by Zhao et al. [ZDM+20, ZDJ19]. With data complexity and time complexity lower than those of the exhaustive search, we try to maximize the number of attacked rounds for a block cipher. The constraints that we take into consideration include the probability of the distinguisher, the number of differentially inactive bits in the input and output of the attacked cipher, and the number of key bits that need to be guessed in the extended rounds. We then propose a new automatic MILP model for related-key rectangle attacks on SKINNY, where the probability of a distinguisher and the dominating factors of the key-recovery phase are systematically handled by the constraints. As a result, the key-recovery attacks may be improved in the number of covered rounds and/or the attack complexity. The uniform MILP model is built mostly on the works of [BJK+16, HBS20] and takes all the above constraints into account when searching for a good distinguisher. We are able to find new good properties in the distinguishers, which can be used to perform key-recovery attacks on SKINNY and ForkSkinny covering more rounds than previous results. The cryptanalytic results are summarized in Table 1.

The tradeoff in differential cryptanalysis
with probability of $p$. The $N_b$-round $E_b$ and the $N_f$-round $E_f$ are the rounds added before and after the distinguisher, respectively. Denote the block size by $n$, the master key size by $k$, the number of active bits of the input difference of $E_b$ by $r_b$, and the number of key bits that need to be guessed in $E_b$ by $m_b$. Similarly, we define $r_f$ and $m_f$ for $E_f$. After appending $E_b$ to the distinguisher, there must still be inactive bits in the input of $E_b$, i.e., $r_b < n$.

Figure 1: Differential key-recovery attack on block cipher $E$.
In differential cryptanalysis, an attacker's goal is either to distinguish E from a random function, or to recover the master key based on the differential and partial decryption technique.
When searching for a differential distinguisher, the attacker aims at differentials that cover the highest number of rounds or have the maximal probability, in order to distinguish the primitive from a random function. For a key-recovery attack, in contrast, there are two metrics to be optimized: on top of maximizing the number of attacked rounds, the attack complexity should be minimized. Intuitively, a differential distinguisher with the highest probability or the largest number of rounds is more likely to lead to a better key-recovery attack. In practice, however, a good key-recovery attack often requires a comprehensive trade-off between the key-recovery phase and the differential distinguisher.

Differential key-recovery attack. The general procedure of the key-recovery attack based on a differential $\alpha \to \beta$ (see Fig. 1) is as follows:
1. Data collection: collect $y = 2 \times 2^{-r_b} \cdot s/p$ structures of $2^{r_b}$ plaintexts each, where $s$ is the expected number of right pairs.
2. Filter the wrong pairs using the inactive bits of the ciphertext; $y \cdot 2^{2r_b-1}/2^{n-r_f}$ pairs are left.
3. Initialize a list of $2^{m_b+m_f}$ empty counters.
4. For all $y \cdot 2^{2r_b-1}/2^{n-r_f}$ pairs, perform a guess-and-filter procedure to determine candidate keys (denote the time complexity of the guess-and-filter procedure by $\varepsilon$) and increase the corresponding counters.

The data complexity is about $y \cdot 2^{r_b} = 2 \cdot s/p$. The time complexity to generate the key rank counters is about $y \cdot 2^{2r_b-1}/2^{n-r_f} \cdot \varepsilon \approx 2 \cdot s/p \cdot 2^{r_b+r_f-n} \cdot \varepsilon$. In order to lower the time complexity of the key-recovery attack, we have to not only increase the probability $p$ of $\alpha \to \beta$, but also decrease $2^{r_b+r_f-n}$. Hence, there may exist various trade-offs between the differentials, the attacked rounds, the data complexity and the time complexity. A differential with lower probability may lead to a better time-data trade-off, or even to more attacked rounds, due to the potentially marginal term $2^{r_b+r_f}$ in such a differential.
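The trade-off described above can be made concrete with a short calculation. The following Python sketch evaluates the quantities of steps 1 to 4 in the log2 domain; the function name and the two parameter sets are hypothetical and chosen only for illustration, and the time is taken as the number of surviving pairs times $\varepsilon$, which agrees with the approximate expression in the text up to a small constant factor.

```python
import math


def differential_attack_costs(n, r_b, r_f, log2_p, s=1.0, log2_eps=0.0):
    """Return log2 of (structures y, data, surviving pairs, counter time).

    n        : block size in bits
    r_b, r_f : active bits at the input of E_b / output of E_f
    log2_p   : log2 of the differential probability p
    s        : expected number of right pairs
    log2_eps : log2 of the guess-and-filter cost per pair (epsilon)
    """
    log2_s = math.log2(s)
    # Step 1: y = 2 * 2^{-r_b} * s / p structures of 2^{r_b} plaintexts each.
    log2_y = 1 - r_b + log2_s - log2_p
    # Data complexity: y * 2^{r_b} = 2 * s / p.
    log2_data = log2_y + r_b
    # Step 2: y * 2^{2 r_b - 1} / 2^{n - r_f} pairs survive the filter.
    log2_pairs = log2_y + 2 * r_b - 1 - (n - r_f)
    # Step 4: every surviving pair costs epsilon.
    log2_time = log2_pairs + log2_eps
    return log2_y, log2_data, log2_pairs, log2_time


# Hypothetical comparison on a 64-bit block: a differential with higher
# probability but many active bits versus one with lower probability
# and fewer active bits after the extension.
high_prob = differential_attack_costs(n=64, r_b=32, r_f=40, log2_p=-50)
low_prob = differential_attack_costs(n=64, r_b=24, r_f=32, log2_p=-52)
```

With these illustrative numbers, the second differential needs four times more data (2^53 instead of 2^51) but leaves far fewer pairs to process (2^44 instead of 2^58), which is the kind of trade-off the search model is meant to capture.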
In this paper, we maximize the number of attacked rounds. Hence, the following constraints are necessary in our optimization strategy: (1) Maximize:

The tradeoff in rectangle attack on ciphers with linear key-schedule
The boomerang attack is a statistical cryptanalysis technique proposed by Wagner in 1999 [Wag99]. The original boomerang distinguisher is constructed by splitting the encryption function into two parts $E = E_1 \circ E_0$, where two differentials $\alpha \xrightarrow{E_0} \beta$ and $\gamma \xrightarrow{E_1} \delta$ are combined into a boomerang. The probability of a boomerang is estimated as $p^2q^2$, where $p$ and $q$ are the probabilities of the two differentials. A number of studies have shown advanced techniques for a better evaluation of the boomerang's probability, including the sandwich attack, the boomerang switch, etc., in both single-key and related-key models [Mur11, BK09, BDK05]. In 2018, previous observations on the boomerang switch were unified in the framework of the boomerang connectivity table (BCT) by Cid et al. [CHP+18]. As a result, the probability of the middle round in a boomerang can be precisely evaluated. Based on the proposal of the BCT […]

A boomerang distinguisher requires a chosen-plaintext chosen-ciphertext model, and it can be converted into a chosen-plaintext attack known as the rectangle attack [BDK01] or amplified boomerang attack [KKS00]. The probability of the rectangle distinguisher is $2^{-n}p^2q^2$. In the attack, only $\alpha$ and $\delta$ are fixed, and the internal differences $\beta$ and $\gamma$ can be arbitrary values as long as $\beta \neq \gamma$. Hence, the probability is increased to $2^{-n}\hat{p}^2\hat{q}^2$, where $\hat{p} = \sqrt{\sum_{\beta} \Pr^2(\alpha \xrightarrow{E_0} \beta)}$ and $\hat{q} = \sqrt{\sum_{\gamma} \Pr^2(\gamma \xrightarrow{E_1} \delta)}$.

1. […] where $s$ is the expected number of right quartets.
2. For each structure, query the $2^{r_b}$ plaintexts to the encryption oracle under $K_1$, $K_2$, $K_3$ and $K_4$ and obtain four plaintext-ciphertext sets denoted by $L_1$, $L_2$, $L_3$ and $L_4$, where $K_1$ is the secret key and $K_2 = K_1 \oplus \Delta K$, $K_3 = K_1 \oplus \nabla K$ and $K_4 = K_1 \oplus \Delta K \oplus \nabla K$. Insert $L_2$ and $L_4$ into hash tables $H_1$ and $H_2$ indexed by the $r_b$ bits of the plaintexts.
3. Guess the $m_b$ subkey bits involved in $E_b$:
(a) Initialize a list of $2^{m_f}$ counters, each of which corresponds to an $m_f$-bit subkey guess.
(b) For each structure, partially encrypt plaintext $P_1 \in L_1$ to the position of $\alpha$ using the guessed subkey bits, and partially decrypt it to the plaintext $P_2$ after xoring the known difference $\alpha$. Then look up $H_1$ to find the plaintext-ciphertext indexed by the $r_b$ bits. Do the same operation with $P_3$ and $P_4$. We get two sets $S_1$ and $S_2$, whose elements are of the form $(P_1, C_1, P_2, C_2)$ and $(P_3, C_3, P_4, C_4)$, respectively.
(c) The size of $S_1$, as well as of $S_2$, is $y \cdot 2^{r_b}$ with $y$ structures. Insert $S_1$ into a hash table $H_3$ indexed by the $(n - r_f)$ bits of $C_1$ and the $(n - r_f)$ bits of $C_2$ with fixed output differences through $E_f$ from $\delta$. Then, for each element of $S_2$, find the corresponding $(P_1, C_1, P_2, C_2)$ satisfying $C_1 \oplus C_3 = 0$ and $C_2 \oplus C_4 = 0$ in the $(n - r_f)$ bits. In total we obtain $y^2 \cdot 2^{2r_b - 2(n - r_f)}$ quartets.
(d) Use all the quartets obtained in step (c) to determine the key candidates involved in $E_f$ and increase the corresponding counters. This phase is just a guess-and-filter procedure. We denote the time complexity of this step by $\varepsilon$.
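The quartet count of step (c) and the dominant time term that the paper quotes later for this model, $2^{m_b + 2r_f - n} \cdot s \cdot (\hat{p}^2\hat{q}^2 t)^{-1} \cdot \varepsilon$, are easy to evaluate in the log2 domain. The sketch below is only an illustration; the function names are hypothetical, and the example parameters in the comment are inferred from figures that appear later in the text rather than stated in this section.

```python
import math


def rectangle_quartets_log2(n, r_b, r_f, log2_y):
    """log2 of the quartets left after the ciphertext filter in step (c):
    y^2 * 2^{2 r_b - 2 (n - r_f)}."""
    return 2 * log2_y + 2 * r_b - 2 * (n - r_f)


def rectangle_time_log2(n, m_b, r_f, log2_prob, s=1.0, log2_eps=0.0):
    """log2 of 2^{m_b + 2 r_f - n} * s / prob * eps, where prob is the
    boomerang probability (p_hat^2 * q_hat^2 * t) and eps the per-quartet
    guess-and-filter cost."""
    return m_b + 2 * r_f - n + math.log2(s) - log2_prob + log2_eps


# Example: n = 64, r_f = 48 (12 active 4-bit cells), r_b = 60 (inferred),
# and a single structure (y = 1) leave 2^88 quartets.
example = rectangle_quartets_log2(n=64, r_b=60, r_f=48, log2_y=0)
```

The exponent $m_b + 2r_f - n$ shows why the search has to look beyond the distinguisher probability: each extra active ciphertext bit costs twice as much as an extra guessed key bit in $E_b$.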

Specification of SKINNY
The lightweight block cipher SKINNY was proposed by Beierle et al. [BJK+16]. Let $n$ denote the block size, $t$ the tweakey size and $c$ the cell size. The SKINNY family has six main versions SKINNY-$n$-$t$: for each $n \in \{64, 128\}$, there are three tweakey sizes $t = n$, $t = 2n$ and $t = 3n$. The internal state is viewed as a $4 \times 4$ square array of cells, and the tweakey is viewed as a set of $z$ $4 \times 4$ square arrays of cells, where $z = t/n \in \{1, 2, 3\}$. The tweakey arrays are denoted by $(TK1)$ when $z = 1$, $(TK1, TK2)$ when $z = 2$, and $(TK1, TK2, TK3)$ when $z = 3$. SKINNY follows an SPN structure and the TWEAKEY framework [JNP14]. In each round of SKINNY, the state is updated with 5 operations: SubCells (SC), AddConstants (AC), AddRoundTweakey (ART), ShiftRows (SR) and MixColumns (MC), as illustrated in Fig. 3. The subtweakey $STK_i$ is only xored to the first two rows. Refer to Appendix A.1 for more details of the tweakey schedule. $\Delta X_i$ is the difference of state $X$ in round $i$. $X_i[j, \ldots, k]$ denotes the cells of the state with indices $\{j, j + 1, \cdots, k\}$, where $0 \le j, k \le 15$. We denote the equivalent subtweakey in round $i$ by $ETK_i$, where $ETK_i = MC \circ SR(STK_i)$, as in Fig. 4.

For the MDS matrix of AES, when 4 out of 8 input-output bytes are fixed, the other bytes are determined. However, the situation is different for SKINNY's non-MDS matrix. For instance, when the input bytes are $(1, 1, 1, ?)$, the output bytes are $(?, 1, 1, 1)$, where "1" labels a known value and "?" an unknown one. Let $M \cdot (a, b, c, d)^T = (\alpha, \beta, \gamma, \delta)^T$ and […]

Lemma 1. [BDL20] For any given SKINNY S-box $S$ and any two non-zero differences $\delta_{in}$ and $\delta_{out}$, the equation […]

[…] [CHP+18], and proposed a generalized framework of BCT to systematically calculate the probability of a boomerang distinguisher considering the dependence. They re-evaluated the probabilities of the boomerang distinguishers given in [LGS17], where the probabilities are much higher than before.
As in Dunkelman et al.'s (related-key) sandwich attack framework [DKS10], the $N_d$-round cipher $E$ is considered as $\tilde{E}_1 \circ E_m \circ \tilde{E}_0$, where $\tilde{E}_0$, $E_m$ and $\tilde{E}_1$ contain $r_0$, $r_m$ and $r_1$ rounds, respectively. Let $\hat{p}$ and $\hat{q}$ be the probabilities of the upper differential used for $\tilde{E}_0$ and the lower differential used for $\tilde{E}_1$. The middle part $E_m$ specifically handles the dependence and contains a small number of rounds. If the probability of generating a right quartet for $E_m$ is $t$, the probability of the whole $N_d$-round boomerang distinguisher is $\hat{p}^2\hat{q}^2 t$.
Recently, Hadipour et al. [HBS20] introduced a heuristic approach to search for boomerang distinguishers using MILP/SAT models. They introduced some new tables for S-boxes to model the dependence between the upper and lower differentials in boomerang distinguishers. We briefly describe their search approach in the following steps:
1. First, search for truncated differentials with the minimum number of active S-boxes using a word-oriented MILP model that considers the switching effect in multiple rounds, based on the idea of the model in [CHP+17]. In their MILP model, they searched for an $(r_0 + r_m)$-round upper truncated differential and an $(r_1 + r_m)$-round lower truncated differential. The objective function is the number of active S-boxes in the distinguisher. Considering the dependence, for the $r_m$-round middle part, only the S-boxes that are active in both the upper and lower truncated differentials are counted in the objective function.
2. Then, use the MILP/SAT models to get the actual differential characteristics for the upper and lower truncated differentials separately. If there is no actual differential characteristic, go back to step 1.
3. Evaluate the probability of the middle part experimentally. If the probability is equal to 0, go back to step 1.
4. Based on the algorithm proposed in [SQH19], evaluate the probability of the middle part mathematically.
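The control flow of the four steps above can be sketched as a simple retry loop. The Python below is not the tool of [HBS20]: the four callables are hypothetical placeholders standing in for the MILP truncated search, the MILP/SAT instantiation, the experimental check and the formula-based evaluation, and only the "go back to step 1" logic is modelled.

```python
from typing import Callable, Iterable, Optional, Tuple


def search_boomerang(
    truncated_candidates: Iterable[object],
    instantiate: Callable[[object], Optional[Tuple[object, object]]],
    middle_prob_experimental: Callable[[object, object], float],
    middle_prob_formula: Callable[[object, object], float],
):
    """Iterate over truncated trails until one yields a usable distinguisher.

    truncated_candidates     : step 1, truncated trails ordered by objective
    instantiate              : step 2, returns (upper, lower) or None
    middle_prob_experimental : step 3, sampled probability of the middle part
    middle_prob_formula      : step 4, computed probability of the middle part
    """
    for truncated in truncated_candidates:
        trails = instantiate(truncated)
        if trails is None:
            continue  # no actual characteristic: back to step 1
        upper, lower = trails
        if middle_prob_experimental(upper, lower) == 0:
            continue  # incompatible middle part: back to step 1
        t = middle_prob_formula(upper, lower)
        return truncated, upper, lower, t
    return None  # candidates exhausted without success
```

Passing the steps in as functions keeps the loop independent of the solver that implements each step.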
Almost at the same time, [DDV20] proposed a new automatic tool to search for boomerang distinguishers and provided the source code to facilitate follow-up work. Similarly to [HBS20], they introduced a set of tables that help to calculate the probability of the boomerang distinguisher. Using these tables to roughly evaluate the probability, they use an MILP model to search for the upper and lower trails throughout all rounds while automatically handling the middle rounds. Then a CP model is applied to search for the best possible instantiations.

Our model to determine a distinguisher
According to Sect. 2, to launch a boomerang attack covering more rounds with a boomerang distinguisher, the interaction between the $N_d$-round distinguisher and the $N_b$ and $N_f$ extended rounds needs to be considered simultaneously. Given the time complexity $2^{m_b + 2r_f - n} \cdot s/(\hat{p}^2\hat{q}^2) \cdot \varepsilon$ in Sect. 2, it is necessary to set additional constraints on the $r_f$ active bits in the output state of the lower trail and on the $m_b$-bit subtweakey involved in the extended $N_b$ rounds. We therefore present an extended model for searching the entire $(N_b + N_d + N_f)$ rounds of a boomerang attack. The aim is to find new boomerang distinguishers in a related-tweakey setting that result in key-recovery attacks on more rounds. In Sect. 2.2, the target is to maximize $N_b + N_d + N_f$. In practical programming, however, we take $N_b$, $N_d = r_0 + r_m + r_1$ and $N_f$ as input parameters of the model, and the target is the time complexity $2^{m_b + 2r_f - n} \cdot s \cdot (\hat{p}^2\hat{q}^2 t)^{-1} \cdot \varepsilon$ as in Sect. 2. By feeding different values of $N_b$, $N_d$ and $N_f$ into the model, we try to find the maximal value of $N_b + N_d + N_f$.

Our new model tweaks the model of Hadipour et al. [HBS20] to search for an $(N_b + r_0 + r_m)$-round upper truncated differential and an $(r_m + r_1 + N_f)$-round lower truncated differential. Let $X^u_r$ and $X^l_r$ denote the internal states before SubCells in round $r$ of the upper and lower truncated differentials. For the $i$-th cell of the internal state $X^u_r$ of the upper differential, we define a binary variable DXU […] ($0 \le r \le r_m + r_1 + N_f$, $0 \le i \le 7$) defined for the lower differential. The modelling strategies for the $(r_0 + r_m)$-round upper differential and the $(r_m + r_1)$-round lower differential are the same as in the model of Hadipour et al. [HBS20]. The constraints for the tweakey schedule of SKINNY are also the same as the previous ones [BJK+16, HBS20, LGS17]. Here, we only list the differences in our model.

Modelling the active cells propagation in the N f rounds after the lower differential.
Starting from the $(r_m + r_1)$-round internal state $X^l_{r_m+r_1}$ of the lower differential, the truncated differences are propagated forwards with probability 1. Hence, the constraints on DXL and DSTKL are different from those in the $(r_m + r_1)$ rounds. As introduced in [BJK+16], the linear diffusion layer of SKINNY (SR and MC) can be seen as a binary $16 \times 16$ matrix $L$.
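The view of SR and MC as one binary $16 \times 16$ matrix can be illustrated in a few lines of Python. The sketch below assumes the standard SKINNY ShiftRows permutation and MixColumns matrix with row-major cell numbering 0 to 15, and the convention that entry $(i, j)$ is 1 when output cell $i$ depends on input cell $j$; the function names are hypothetical, and the matrix given in the paper's appendix may be laid out differently.

```python
# Standard SKINNY ShiftRows as a cell permutation: new[i] = old[SR[i]].
SR = [0, 1, 2, 3, 7, 4, 5, 6, 10, 11, 8, 9, 13, 14, 15, 12]
# Standard SKINNY MixColumns: output row r is the XOR of these input rows.
MC_ROWS = [(0, 2, 3), (0,), (1, 2), (0, 2)]


def build_L():
    """Binary 16x16 matrix of MC o SR: L[i][j] = 1 iff output cell i
    depends on input cell j."""
    L = [[0] * 16 for _ in range(16)]
    for i in range(16):
        row, col = divmod(i, 4)
        for row_in in MC_ROWS[row]:
            L[i][SR[4 * row_in + col]] = 1
    return L


def propagate_active(active, L):
    """Truncated forward propagation with probability 1: an output cell is
    treated as active if any input cell it depends on is active
    (cancellations are deliberately ignored)."""
    return {i for i in range(16) if any(L[i][j] for j in active)}


L = build_L()
# A single active cell in the top-right corner spreads to three cells.
spread = propagate_active({3}, L)
```

Under these assumptions every row of the matrix has exactly one 1 among columns 0 to 7, which is the property that lets each cell of the equivalent subtweakey depend on a single subtweakey cell.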
Let $L_x(i)$ (resp. $L_x^{-1}(i)$) be the set of the indices $j$ such that the coefficient […]

In the related-tweakey attacks, the subtweakey differences affect the differences of the internal state. Take the key-recovery attack on SKINNY-64-192 as an example (see Fig. 7). In round 27, the non-zero difference in $\Delta STK_{27}[3]$ leads to differences in $\Delta X_{28}[7, 15]$, and further affects the following internal states. Note that the subtweakeys are only xored to the first two rows of the state. $ETK_r$ is produced from $STK_r$ by applying the matrix $L$ (Eq. (16)). As shown in Fig. 4, each cell of $ETK_r$ depends on only one cell of $STK_r$, since for a given $i$ there is only one entry $L_{i,j} = 1$ with $0 \le j \le 7$ in the linear matrix $L$ (Eq. (16)). For each $i$, we denote the $j$ ($0 \le j \le 7$) with $L_{i,j} = 1$ by $L_k(i)$. Therefore, the constraints on the impact of DSTKL on the internal state and the propagation of the active cells in the $N_f$ rounds are given below: […]

In addition, we require that there are some inactive cells in the output state of the lower differential, which help us filter quartets and thus reduce the time complexity of the key-recovery attack (see footnote 1): […]

Modelling the active cells propagation in the $N_b$ rounds before the upper differential.
For the $(N_b + r_0 + r_m)$-round upper differential, starting from the internal state $X^u_{N_b}$ of the upper differential, the truncated differences are propagated backwards with probability 1. Note that, unlike in the key-recovery attack, the search model uses the original representation of SKINNY, i.e., we do not replace $STK_0$ by the equivalent key $ETK_0$.
The constraints describing the impact of DSTKU on the internal state and the backward propagation of the active cells from round $N_b$ are as follows (note that the subtweakeys are only xored to the first two rows of the state): […] (11)

[…] $[i]$, i.e., the activeness of state $X_3$ is determined by $Y_3$, where the constraints from $Y_3$ to $W_3$ are included in the program of the $(r_0 + r_m)$-round truncated differential.
To avoid all cells being active in the plaintext, we require that at least one cell of the internal state $X_1$ of the upper differential is inactive (see footnote 2): […]

Modelling the subtweakey cells involved in the $N_b$ rounds before the upper differential.
Many papers consider the dependence of keys in key recovery, such as [DFJ13, FN20, DF16]. As shown in Eq. (1), the time complexity of the rectangle attack is also highly related to the number of key bits involved in the $N_b$ rounds, i.e., $m_b$. In fact, we need to guess the $m_b$-bit subtweakey first in order to deduce $P_2$ from $P_1$ by partial encryption and decryption in the first $N_b$ rounds. Recall the details in Step 3 (b) of the rectangle attack model in Sect. 2.2.
The smaller $m_b$ is, the better. Since the matrix $M$ in the MC operation is not an MDS matrix, the subtweakeys involved in the partial encryption and in the partial decryption are different. So we model the subtweakey cells from two aspects.
Taking the key-recovery attack on SKINNY-64-192 as an example (see Fig. 7), we decompose the whole process of deducing $P_2$ from $P_1$ into two phases:
(1) […]

Footnote 1: Note that, in the key-recovery attack, we use the active cells of $X^l_{r_m+r_0+N_f-1}$ to filter quartets instead of $X^l_{r_m+r_0+N_f}$.
Footnote 2: In the key-recovery attack, we use the active cells of $X^u_1$ to collect data instead of $X^u_0$.
(2) Partially decrypt $\bar{Y}_3 = Y_3 \oplus \Delta Y_3$ to get $P_2$, as shown in Fig. 6.

In Phase (1), shown in Fig. 5, in order to compute $Y_3[3, 6, 7, 9, 13]$ from $P_1$, we need the values of the marked cells in $X_3$. With the details of MC and SR, the values of the marked cells in $Z_2$ and the corresponding $STK_2$ are needed. The cell positions in $X_2$ that need to be known are the same as those in $Z_2$ and are marked accordingly. Deducing similarly for the other rounds, the cells that need to be known are all determined for rounds 0 to 3 and marked in the figure. We define a binary variable KnownEnc[r][i] to identify whether the $i$-th ($0 \le i \le 15$) cell of the internal state $Z_r$ ($0 \le r \le N_b - 1$) should be known in Phase (1). A binary $16 \times 16$ matrix $L^E$ (see Appendix A.2) is introduced to describe the linear diffusion (the combination of SR and MC) that determines the cells that need to be known in $Z_r$ (same for $X_r$) from $X_{r+1}$. Let $L^E_z(i)$ be the set of the indices $j$ such that the coefficient […] In addition, for 0 […]

In Phase (2), shown in Fig. 6, with $Y_3[3, 6, 7, 9, 13]$ computed in Phase (1), we compute $\bar{Y}_3[3, 6, 7, 9, 13] = Y_3 \oplus \alpha[3, 6, 7, 9, 13]$. So the differences of the active cells in $\Delta X_3$ are determined. Therefore, we compute the differences of the active cells in $\Delta Y_2$ from $\Delta X_3$ through the linear operations SR and MC. In order to compute further backwards, we have to know the values of the active cells of $Y_2$ from $P_1$. The calculation from $P_1$ to $Y_2$ is similar to the calculation from $P_1$ to $Y_3$ in the encryption of Phase (1). To compute the values of the active cells of $Y_2$, all the cells in $Z_1$ marked by north west lines have to be known. Moreover, as for $Y_2$, all active cells of $Y_1$ also need to be known. Hence, combining the cells needed to compute $Y_2$ with the active cells in $Y_1$, all the cells marked by north west lines in $Y_1$ need to be known.
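The backward step of Phase (1), which derives the cells of $Z_r$ that must be known from the cells needed in $X_{r+1}$, can be sketched as a dependency lookup through MC and SR. The Python below assumes the standard SKINNY ShiftRows and MixColumns with row-major cell numbering; it is not the matrix $L^E$ of Appendix A.2, its output has not been checked against Fig. 5, and both function names are hypothetical.

```python
# Standard SKINNY ShiftRows as a cell permutation: new[i] = old[SR[i]].
SR = [0, 1, 2, 3, 7, 4, 5, 6, 10, 11, 8, 9, 13, 14, 15, 12]
# Standard SKINNY MixColumns: output row r is the XOR of these input rows.
MC_ROWS = [(0, 2, 3), (0,), (1, 2), (0, 2)]


def needed_in_Z(needed_next_X):
    """Cells of Z_r required to compute the given cells of
    X_{r+1} = MC(SR(Z_r))."""
    needed = set()
    for i in needed_next_X:
        row, col = divmod(i, 4)
        for row_in in MC_ROWS[row]:
            needed.add(SR[4 * row_in + col])
    return needed


def subtweakey_cells(needed_Z):
    """Needed cells in the first two rows (indices 0..7), the only rows
    to which a subtweakey cell is xored."""
    return {j for j in needed_Z if j < 8}


# Cells of Z_2 needed to compute X_3[3, 6, 7, 9, 13].
z2 = needed_in_Z({3, 6, 7, 9, 13})
stk2 = subtweakey_cells(z2)
```

Because SubCells and AddConstants act cell-wise, the same positions are needed in $X_r$ as in $Z_r$, so the function can be applied round by round; the needed cells in the first two rows are the ones that involve a subtweakey cell.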
Suppose we know the marked cells in $Y_2$; then we are able to compute the marked active cells of $\bar{Y}_2 = Y_2 \oplus \Delta Y_2$ and the differences of the active cells of $\Delta X_2$. Similarly, with $\Delta Y_1$ deduced from $\Delta X_2$ and the active cells known in $Y_1$, we compute $\Delta X_1$. Then $P_2$ is determined. In total, we have to compute the cells marked by north west lines in $Y_2$ and $Y_1$.

In our programming of the model, we integrate the above two phases. For example, in round 2, the cells of $X_2$ marked by dots in Phase (1) and by north west lines in Phase (2) all need to be known. Hence, we can take the union of these marked cells and compute further backwards. We define a binary variable Known[r][i] ($1 \le r \le N_b - 1$, $0 \le i \le 15$) for each cell of each internal state $X_r$ (same for $Y_r$) to indicate whether its value is needed in either Phase (1) or Phase (2). Known = 1 is marked by north east lines in Fig. 7. The binary variable KnownEnc = 1 is also marked by north east lines in Fig. 7. For round $N_b - 1$, only the active cells of the internal state (i.e., state $X_3$ in Fig. 7) need to be known, and the subtweakey of this round does not need to be guessed: […]

From round $N_b - 2$ to round 0, we give the constraints over the linear diffusion (SR and MC) that determine which cells need to be known in $Z_r$ from $X_{r+1}$, as in Phase (1): […]

From round $N_b - 2$ to round 1, the cells of $X_r$ that need to be known are of two types: the active cells that need to be known in Phase (2), and the cells that need to be known in $Z_r$, which are computed from the needed cells of $X_{r+1}$: […]

The objective function. As given in Sect. 2, the time complexity of the rectangle attack is $2^{m_b + 2r_f - n} \cdot s \cdot (\hat{p}^2\hat{q}^2 t)^{-1} \cdot \varepsilon$. For the $(r_0 + r_m + r_1)$-round distinguisher, the target to be optimized is the same as in Hadipour et al. [HBS20]. First, we add the variables DXU of the $r_0$-round upper differential (standing for $\hat{p}$) with weight $w_0$ to the objective function.
Similarly, we add the variables DXL of the $r_1$-round lower differential (standing for $\hat{q}$) with weight $w_1$. Then, considering the switching effects, we add the variables DXU and DXL of the $r_m$-round middle part (standing for $t$) with weight $w_m$ to the objective function. In order to find distinguishers whose probabilities are likely to be larger than in the random case, we set an upper bound on the number of active S-boxes in the $(r_0 + r_m + r_1)$-round part: […] where BOUND is selected experimentally. Namely, if the minimum of Eq. (13) is MIN, we set BOUND = MIN + 10.
In addition, we add the variables KnownEnc (standing for $m_b$) and DXL (standing for $r_f$) with different weights $w_u$ and $w_l$ to get a uniform objective: […] Because different parameters have different coefficients in the formula of the time complexity, we give them different weights to model the objective more accurately. For example, in Eq. (5), with $m_b + 2 \cdot r_f$, the complexity is more sensitive to $r_f$ than to $m_b$. So we set the coefficients as $w_l = 2 \cdot w_u$.
Then, considering the probabilities in the DDT tables of the S-boxes and the switching effects similarly to [HBS20], we adjust the weights to $w_l = 2w_u = 2w_0 = 2w_1 = 4w_m = 4$. We run our model with different $N_b$, $N_d$ and $N_f$. $N_b$ and $N_f$ are chosen from 1 to 4. $N_d$ is chosen based on experience: for example, when the best previous distinguishers have $y$ rounds, we choose $N_d$ with $y - 3 \le N_d \le y$. As pointed out in [SQH19], the dependence of the upper and lower trails can affect up to 6 rounds. So we choose $r_m = 6$, and then $r_0$ and $r_1$ vary with $N_d$.

With our new model, we search for truncated upper and lower differentials that are better suited to the key-recovery attack. Then, for the $r_0$-round $\tilde{E}_0$ of the upper differential and the $r_1$-round $\tilde{E}_1$ of the lower differential, we use the CP model to get the instantiations of the truncated differentials, as in [SGL+17]. We also calculate the probabilities $\hat{p}$ and $\hat{q}$ considering the clustering effect. We experimentally calculate the probability of the $r_m = 6$-round middle part of the distinguisher. Note that the probability of the middle part should be high enough to be verified with a small computer in reasonable time. We use one computer equipped with one RTX 2080 Ti to experimentally compute the probability of the middle part, and the results of our experiments are listed in Table 2. Note that $t$ may be zero. If so, we need to find new instantiations of the truncated differentials, or even search for new truncated differentials. When $t > 0$ and $\hat{p}^2\hat{q}^2 t > 2^{-n}$, we successfully get a boomerang distinguisher with probability $\hat{p}^2\hat{q}^2 t$.
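As a rough illustration of how the weights and the final acceptance test fit together, the Python sketch below evaluates a weighted objective from cell counts and checks the condition $t > 0$ and $\hat{p}^2\hat{q}^2 t > 2^{-n}$. The exact objective equation is not reproduced in this section, so the linear form over five counts is an assumption based on the description above, and all names are hypothetical.

```python
import math

# Weights solving w_l = 2 w_u = 2 w_0 = 2 w_1 = 4 w_m = 4.
WEIGHTS = {"w_l": 4, "w_u": 2, "w_0": 2, "w_1": 2, "w_m": 1}


def objective(active_upper, active_lower, active_middle,
              known_enc, active_out, w=WEIGHTS):
    """Assumed linear objective over counts of cells.

    active_upper  : active cells in the r_0-round upper part (DXU)
    active_lower  : active cells in the r_1-round lower part (DXL)
    active_middle : cells counted in the r_m-round middle part
    known_enc     : cells flagged by KnownEnc (stands for m_b)
    active_out    : active cells of the output state (stands for r_f)
    """
    return (w["w_0"] * active_upper + w["w_1"] * active_lower
            + w["w_m"] * active_middle
            + w["w_u"] * known_enc + w["w_l"] * active_out)


def is_valid_boomerang(log2_p_hat, log2_q_hat, log2_t, n):
    """Accept only if t > 0 and p_hat^2 * q_hat^2 * t > 2^{-n}.
    A zero middle probability is passed as log2_t = -math.inf."""
    if log2_t == -math.inf:
        return False
    return 2 * log2_p_hat + 2 * log2_q_hat + log2_t > -n
```

With these weights one additional active output cell costs as much as two additional KnownEnc cells, mirroring the exponent $m_b + 2r_f$ in the time complexity.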
The details of the boomerang distinguishers we obtained are listed in Tables 4, 5, 6 and 7. For more details, we refer to Tables 16, 17, 18 and 19 in Appendix B. In addition, we summarize the previous boomerang distinguishers for SKINNY in Table 3.
Remarks. We compare our distinguishers with other recent distinguishers, in particular with those provided by Hadipour et al. in [HBS20]. Our distinguishers have the same $r_m = 6$ as [HBS20], since it was pointed out in [SQH19] that the upper and lower differentials can be dependent for up to 6 rounds. For the $r_0$-round upper differentials and the $r_1$-round lower differentials, there are some differences, which are listed as follows:
• For the upper differentials, there is more than one active cell in the tweakey. Thanks to the linear key schedule, the propagation of the state differences can be controlled by the subkey differences. Although our $r_0$ is larger than or equal to that in [HBS20], there are fewer active cells in the input state. So when we extend the same $N_b$ rounds before the distinguisher, the number of subkey bits involved in $E_b$ is smaller than in the attack of [HBS20]; e.g., for the attack on SKINNY-64-128, when adding 2 rounds before the distinguisher, we have $m_b = 3c$ in our attack (see Sect. 5.2) and $m_b = 8c$ in [HBS20]. In some cases, we can extend more rounds; e.g., for the attack on SKINNY-64-192, we have $N_b = 4$ and $m_b = 19c$ in our attack (see Sect. 5.1) and $N_b = 3$ and $m_b = 16c$ in [HBS20].
• For the lower differentials, our distinguishers for Skinny have smaller r 1 than [HBS20] and fewer active cells in the output state. So we can extend more rounds (N f ) with fewer active cells (r f ) in the ciphertext, and filter more quartets with inactive cells of the ciphertexts before the key-recovery process to reduce the time complexity.
Taking the attack on SKINNY-64-192 as an example, when we add 4 rounds after our distinguisher, there are 12 active cells in the ciphertexts (N f = 4, r f = 12, see Sect. 5.1); in [HBS20], all the cells in the ciphertexts are active when 3 rounds are added (N f = 3, r f = 16).
So the numbers of attacked rounds in our paper and [HBS20] are different, considering all the parameters affecting the key recovery attack, i.e. the number m b of guessed subkeys in E b , the number r f of active cells in the ciphertext, the number (N b + r 0 + r m + r 1 + N f ) of rounds of the whole attack and the probability of the (r 0 + r m + r 1 )-round distinguisher.
By extending 4 rounds before and 4 rounds after the 22-round distinguisher, we attack the 30-round SKINNY-64-192 as illustrated in Fig. 7.
In the first round, we first apply SR and MC operations, and then apply the ART operation with the equivalent subtweakey ET K 0 instead of the subtweakey ST K 0 . So there is no subtweakey involved in the first round, and we can build our structures at W 0 . Treating W 0 as the plaintext and Z 29 as the ciphertext, we can get following parameters: 2. Let K 1 be the master secret key, K 2 = K 1 ⊕ ∆K, K 3 = K 1 ⊕ ∇K and K 4 = K 1 ⊕ ∆K ⊕ ∇K. For each structure, query the corresponding ciphertexts for the 2 r b plaintexts under four related keys K 1 , K 2 , K 3 and K 4 , which are named as four plaintext-ciphertext sets L 1 , L 2 , L 3 and L 4 . Then insert L 2 and L 4 into hash tables  to find the element in S 1 where (C 1 , C 3 ) and (C 2 , C 4 ) collide in the 8 inactive cells. So there are y 2 · 2 2r b · 2 −2(n−r f ) = y 2 · 2 88 quartets as (C 1 , C 2 , C 3 , C 4 ), which can be used to conduct the tweakey recovery for the 68-bit subtweakey involved in E f . Call the tweakey recovery process to check whether the guessed tweakey is correct. [10] = 0xa is a 4-bit filter for both (C 1 , C 3 ) and (C 2 , C 4 ). Therefore, y 2 · 2 32 · 2 −8 = y 2 · 2 24 quartets remain.
Set the expected number of right quartets $s = 1$; then $y = \sqrt{s} \cdot 2^{n/2 - r_b} \cdot (pq\sqrt{t})^{-1} = 2^{0.87}$. The data complexity is $2^{62.87}$ and the memory complexity is $2^{68.05}$. Set the advantage $h = 36$; the time complexity is then about $2^{163.11}$. Let the signal-to-noise ratio be $S_N = 2^{-n} \cdot p^2 q^2 t / 2^{-2n}$. Based on the theoretical analysis by Selcuk [SB02], the success probability is about 59.5%.
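The figures in this paragraph can be cross-checked numerically. The following is a minimal sketch under stated assumptions: the success-probability expression is the commonly used form of Selcuk's estimate, $\Phi\big((\sqrt{s S_N} - \Phi^{-1}(1 - 2^{-h})) / \sqrt{S_N + 1}\big)$, which we assume is the one intended here; the data complexity is assumed to be $4 \cdot y \cdot 2^{r_b}$; and $r_b = 60$ is inferred from the stated numbers. All function names are hypothetical.

```python
from math import log2, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sigma 1


def log2_pq_sqrt_t(n, r_b, log2_y, s=1):
    """Invert y = sqrt(s) * 2^(n/2 - r_b) / (p*q*sqrt(t)) for log2(p*q*sqrt(t))."""
    return 0.5 * log2(s) + n / 2 - r_b - log2_y


def signal_to_noise(n, r_b, log2_y, s=1):
    """S_N = 2^-n * p^2 q^2 t / 2^-2n, i.e. 2^n * (p*q*sqrt(t))^2."""
    return 2.0 ** (n + 2 * log2_pq_sqrt_t(n, r_b, log2_y, s))


def success_probability(s_n, s, h):
    """Assumed Selcuk-style estimate of the success probability."""
    # Phi^-1(1 - x) = -Phi^-1(x); using the lower tail avoids rounding
    # 1 - 2^-h to exactly 1.0 when h is large.
    z = -STD_NORMAL.inv_cdf(2.0 ** -h)
    return STD_NORMAL.cdf((sqrt(s * s_n) - z) / sqrt(s_n + 1))


def log2_data_complexity(r_b, log2_y):
    """Assumed data complexity D = 4 * y * 2^r_b, returned as log2(D)."""
    return 2 + log2_y + r_b


# SKINNY-64-192 figures: n = 64, y = 2^0.87, s = 1, h = 36; r_b = 60 is inferred.
s_n = signal_to_noise(n=64, r_b=60, log2_y=0.87)
print("log2(D) =", log2_data_complexity(r_b=60, log2_y=0.87))
print("P_s     =", success_probability(s_n, s=1, h=36))
```

Under these assumptions the data-complexity exponent reproduces the stated $2^{62.87}$ and the success probability comes out close to the stated 59.5%.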

Complexity.
Setting the expected number of right quartets $s = 1$ and the advantage $h = 30$, we get $y = \sqrt{s} \cdot 2^{n/2 - r_b} \cdot (pq\sqrt{t})^{-1} = 2^{43.67}$. The data complexity is $2^{61.67}$, the memory complexity is $2^{84}$ and the time complexity is about $2^{96.83}$. The success probability is about 75.8%.
According to Zhao et al.'s attack procedure, for each guess of the $2^{m_b} = 2^{152}$ possible values in $E_b$, about $y^2 \cdot 2^{2r_b} \cdot 2^{-2(n - r_f)} = y^2 \cdot 2^{192}$ quartets remain. We give the detailed process to recover the $m_f$ key bits for $E_f$. For each remaining quartet, do:

1. In round 29: for the 2nd column of $X_{29}$ of $(C_1, C_3)$, we obtain $\Delta X_{29}$  for $(C_2, C_4)$, which acts as an 8-bit filter. $y^2 \cdot 2^{160} \cdot 2^{-8} = y^2 \cdot 2^{152}$ quartets remain.

Complexity.
Setting the expected number of right quartets $s = 1$ and the advantage $h = 56$, we choose $y = 1$. The data complexity is $2^{122}$, the memory complexity is $2^{128.02}$ and the time complexity is about $2^{341.1}$. The success probability is about 84.1%.
Complexity. Set the expected number of right quartets $s = 1$ and the advantage $h = 32$; then $y = \sqrt{s} \cdot 2^{n/2 - r_b} \cdot (pq\sqrt{t})^{-1} = 2^{58.48}$. The data complexity is $2^{124.48}$, the memory complexity is $2^{168}$ and the time complexity is about $2^{226.38}$. The success probability is about 80.6%.

The primitive is based on the lightweight tweakable block cipher SKINNY, and there are four different instances with various block sizes and tweakey sizes (see Table 12). The construction of ForkSkinny is shown in Fig. 11. The encryption of ForkSkinny is split into two steps. The first $R_{init}$ rounds process the input message with the round function of SKINNY under modified constants. Then, the encryption procedure is forked into ForkSkinny$_0$ and ForkSkinny$_1$, where two copies of the output from the first stage are processed separately by the two forks with $R_I$ and $R_{II}$ rounds, respectively. The tweakeys are generated by the tweakey schedule for $R_{init} + R_I + R_{II}$ rounds in total, and are used sequentially in the initial step, ForkSkinny$_0$ and ForkSkinny$_1$. For instance, the last $R_{II}$ round tweakeys are applied in ForkSkinny$_1$.

Overall Structure. We illustrate our design in Fig. 3 for ForkSkinny-128-192. This version takes a 128-bit plaintext $M$, a 64-bit tweak $T$ and a 128-bit secret key $K$ as input, and outputs two 128-bit ciphertext blocks $C_0$ and $C_1$ (i.e., ForkSkinny$(K, T, M, b) = C_0, C_1$). The first $r_{init} = 21$ rounds of ForkSkinny are almost identical to those of SKINNY and differ only in the value of the constant added to the internal state. After that, the encryption is forked, which means that two copies of the internal state are further modified with different sets of tweakeys. For reasons that we detail below, a constant denoted by BC (Branch Constant) is added to the internal state used to compute $C_1$, right after forking. Then, ForkSkinny$_0$ iterates $r_0 = 27$ rounds and ForkSkinny$_1$ iterates $r_1 = 27$ rounds.
As illustrated in Figure 3, after forking, the tweakeys for the round functions of ForkSkinny$_0$ are computed from the tweakey state obtained after $r_{init}$ rounds, while the tweakeys for the round functions of ForkSkinny$_1$ are derived from the tweakey state at the end of $r_{init} + r_0$ rounds (denoted by $T_w$). Figure 4 details the ForkSkinny construction, where Enc-Skinny$_r(\cdot, \cdot)$ denotes the SKINNY encryption using $r$ round functions, taking as input a plaintext or state together with a tweakey. Similarly, Dec-Skinny$_r(\cdot, \cdot)$ denotes the corresponding decryption algorithm using $r$ rounds.

We use a similar tweaked model to search for boomerang distinguishers for the encryption from plaintext $M$ to $C_1$ in Fig. 11, where only the tweakey schedule is slightly different. For ForkSkinny-128-256, the subtweakeys used are $STK_0, STK_1, \ldots, STK_{20}, STK_{48}, STK_{49}, \ldots, STK_{74}$. As pointed out in [BDL20], there are some master key differences $\delta$ that satisfy $\delta = \mathrm{LFSR2}^{15}(\delta)$. So we can get differential characteristics with 6 consecutive inactive subtweakeys $STK_{18}$, $STK_{19}$, $STK_{20}$, $STK_{48}$, $STK_{49}$ and $STK_{50}$. We take advantage of these properties and add constraints to the model in Sect. 4.2.
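The subtweakey indexing on the path from $M$ to $C_1$ can be made concrete with a few lines of code. This is a sketch under the assumption that the round numbers $r_{init} = 21$, $r_0 = 27$ and $r_1 = 27$ quoted above for ForkSkinny-128-192 also apply to ForkSkinny-128-256, which is consistent with the listed indices; the function name is hypothetical.

```python
def fork_subtweakey_indices(r_init, r_0, r_1):
    """Split the sequentially generated subtweakey indices among the three parts."""
    init = list(range(r_init))                              # shared initial rounds
    fork0 = list(range(r_init, r_init + r_0))               # ForkSkinny_0 branch
    fork1 = list(range(r_init + r_0, r_init + r_0 + r_1))   # ForkSkinny_1 branch
    return init, fork0, fork1


init, fork0, fork1 = fork_subtweakey_indices(r_init=21, r_0=27, r_1=27)

# The encryption from M to C_1 skips the subtweakeys consumed by ForkSkinny_0.
path_to_c1 = init + fork1
print(path_to_c1[18:24])  # the six subtweakeys around the forking point
```

On this path $STK_{20}$ is immediately followed by $STK_{48}$, which is why the six inactive subtweakeys $STK_{18}, \ldots, STK_{20}, STK_{48}, \ldots, STK_{50}$ are consecutive from the distinguisher's point of view.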

Discussion and Conclusion
We give a uniform automatic MILP model of related-tweakey rectangle attacks on SKINNY and ForkSkinny, which covers both the related-tweakey rectangle distinguisher and the key-recovery process. In the model, we balance the probability of the distinguisher against the dominating factors of the key-recovery process, such as the number of guessed key bits. As a result, we obtain improved related-tweakey rectangle attacks on several versions of round-reduced SKINNY and ForkSkinny, which cover 1-2 more rounds than the best previous ones. We would like to state that, according to the discussion on the LWC forum, our rectangle attacks on the round-reduced SKINNY block cipher do not impact the security of SKINNY-AEAD.

A.1 Tweakey schedule of SKINNY
The round tweakey $STK_i$ is defined as:

The tweakey arrays $TK1_i$, $TK2_i$ and $TK3_i$ in round $i$ are generated as follows. First, apply the permutation $P = [9, 15, 8, 13, 10, 14, 12, 11, 0, 1, 2, 3, 4, 5, 6, 7]$ to each tweakey array $TKm_{i-1}$:

Then, apply an LFSR to update each cell of the first and second rows of $TK2_i$ and $TK3_i$. The details of the LFSRs used in different versions are given in Table 15.
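One round of this update can be sketched as follows for a 16-cell array of 4-bit cells. Two points are assumptions on our part: cell $i$ of the new array is taken from cell $P[i]$ of the old one, and the 4-bit LFSR for $TK2$ is $(x_3, x_2, x_1, x_0) \to (x_2, x_1, x_0, x_3 \oplus x_2)$ as we recall it from the SKINNY specification. Table 15 remains the authoritative source, and the function names are hypothetical.

```python
P = [9, 15, 8, 13, 10, 14, 12, 11, 0, 1, 2, 3, 4, 5, 6, 7]


def permute(tk):
    """Apply P to a 16-cell tweakey array (assumed convention: new[i] = old[P[i]])."""
    return [tk[P[i]] for i in range(16)]


def lfsr2_4bit(x):
    """Assumed 4-bit LFSR for TK2: (x3, x2, x1, x0) -> (x2, x1, x0, x3 ^ x2)."""
    return ((x << 1) & 0xF) | (((x >> 3) ^ (x >> 2)) & 1)


def update_tk2(tk):
    """One round of the TK2 update: permute, then apply the LFSR to rows 1 and 2."""
    tk = permute(tk)
    # Cells 0..7 form the first two rows of the 4x4 array; only they are updated.
    return [lfsr2_4bit(c) for c in tk[:8]] + tk[8:]


# The old bottom two rows (cells 8..15) move to the top and pass through the LFSR.
state = list(range(16))
print(update_tk2(state))
```

Because the permutation swaps the two halves of the array, each cell passes through the LFSR only once every two rounds, which is the property the tweakey-difference constraints rely on.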

B Boomerang Distinguishers of SKINNY and ForkSkinny
In this section, we list the differentials of the boomerang distinguishers found in Sect. 4.2. For each round of the $(r_0 + r_m + r_1)$-round distinguisher, we list the input/output differences of the S-box and the subtweakey differences for the $r_0$-round upper differential, as well as for the $r_1$-round lower differential. For the $r_m$-round middle part of the distinguisher, we only present the input difference of the upper differential and the output difference of the lower differential. In the following table, the differences are given in hexadecimal, "*" denotes an arbitrary nonzero difference in the computation of the middle part, and "-" denotes an arbitrary difference in the differential.