Catching the Fastest Boomerangs

. In this paper we describe a new tool to search for boomerang distinguishers. One limitation of the MILP model of Liu et al. is that it handles only one round for the middle part while Song et al. have shown that dependencies could aﬀect much more rounds, for instance up to 6 rounds for SKINNY. Thus we describe a new approach to turn an MILP model to search for truncated characteristics into an MILP model to search for truncated boomerang characteristics automatically handling the middle rounds. We then show a new CP model to search for the best possible instantiations to identify good boomerang distinguishers. Finally we systematized the method initiated by Song et al. to precisely compute the probability of a boomerang. As a result, we found many new boomerang distinguishers up to 24 rounds in the TK3 model. In particular, we improved by a factor 2 30 the probability of the best known distinguisher against 18-round SKINNY -128/256.


Introduction
Differential cryptanalysis is one of the most powerful cryptanalysis techniques.It was proposed by Biham and Shamir in [BS91] and has generated much attention since then.The aim of differential cryptanalysis is to study the propagation of differences through a cipher to highlight unexpected behaviors compared to a random permutation.Typically it concerns the existence of a differential characteristic with a high probability but we may also search for impossible transitions [BBS99].
Nowadays we know how to design ciphers resistant to differential cryptanalysis, ciphers for which we can give upper bounds on the probability of the best differential characteristics.To go further, Wagner proposed the boomerang attack in [Wag99].The main idea introduced by Wagner is that combining two short differentials may lead to a better probability than one long differential.In boomerang attacks, a cipher E is regarded as the composition of two sub-ciphers E 0 and E 1 so that E = E 1 • E 0 .Suppose there exist both a differential α → β for E 0 and a differential γ → δ for E 1 with probabilities p and q respectively.If we assume the two differentials are independent then we obtain a boomerang distinguisher of probability: However, in practice the independence assumption does not hold, especially at the junction of both the lower and upper differentials.At SAC'07, Wang et al. [WKD07] first gave some evidences for non-returning boomerangs (i.e.P = 0 instead of p 2 q 2 ).In 2011, Murphy [Mur11] provided several examples for both AES and DES of boomerangs never coming back.Similar results were obtained by Kircanski in [Kir15]: a SAT solver is used to show that previous rectangle/boomerang attacks on XTEA [Lu09], SM3 [WKD07] and SHACAL-1 [DKK06] primitives were based on incompatible characteristics.
Recently, in [CHP + 18], Cid et al. proposed a new tool named boomerang connectivity table (BCT) to overcome the dependency issues.The BCT is actually a precomputation of all boomerangs through one single Sbox.Its main advantage is to provide a unified view of the switches previously introduced to refine the computation of the probability [BK09,DKS14].In [SQH19], Song et al. give a generalized framework for the BCT and propose a method to precisely evaluate the probability of a boomerang.They reevaluated the probability of several boomerang distinguishers from [LGS17] against both SKINNY and AES, showing their real probability was much higher than expected.
Searching boomerangs.One natural question when facing a new cryptanalysis technique is how to find the best distinguishers.For boomerang distinguishers, the classical approach is to first search for two short characteristics with high probability and to combine them.But we believe this approach should now be deprecated since the dependency in the middle rounds may hugely affect the probability of the distinguisher and thus it seems sub-optimal to search for both the lower and upper differentials independently.
In [CHP + 17], Cid et al. used an MILP model to study the ladder switch for a boomerang attack on Deoxys.A more generic approach was proposed in [LS19], where Liu et al. describe an MILP model to directly search for the best boomerang distinguisher against the block cipher GIFT.The cipher is decomposed in three parts E 0 , E m and E 1 where E m is restricted to one single round, the junction of both differentials which handles the BCTs.With this model they found a new boomerang distinguisher on 19-round GIFT, achieving a better probability than merging two optimal short trails.

Contribution.
In this paper, we propose to go further than both [SQH19] and [LS19] by providing a new tool to search for boomerang distinguishers.One limitation of the MILP model of Liu et al. is that it handles only one round for the middle part while Song et al. have shown that dependencies could affect much more rounds, for instance up to 6 rounds for SKINNY.First, we propose a new approach to turn an MILP model to search for truncated characteristics into an MILP model to search for truncated boomerang characteristics.The main novelty is that this model handles the dependencies in the middle rounds automatically.Furthermore, there is no need to specify which rounds are the middle ones, this is also directly handled by the model.Second, we propose a new Constraint Programming (CP) model to search for the best instantiation of a truncated boomerang characteristic.This model even goes further by clustering instantiations to improve the probabilities.Finally, we systematized the method from [SQH19] to precisely compute the probability of a boomerang.
We applied our new set of tools to the block cipher SKINNY [BJK + 16] and found many new distinguishers on all versions of the ciphers.Our results are given in Table 1.All previous results from [SQH19] are improved, in particular we found a new boomerang distinguisher on 18-round SKINNY-128/256 (i.e. on the TK2 model) with probability 2 −47.37 while the previous best distinguisher had probability 2 −77.83 .We experimentally verified some of the distinguishers to confirm the probabilities.
All our source codes and results (including characteristics) are publicly available at: https://gitlab.inria.fr/pderbez/boomerangskinnyOrganization of the paper.In Section 2 we recall previous definitions regarding DDT and BCT and introduced new interesting tables as the Extended Boomerang Connectivity Table .In Section 3 we detail how to precisely compute the probability of a boomerang on

Definitions
In this section we introduce the tables we will need to compute the probability of a boomerang characteristic.Figure 1 shows a representation of the differences used for each table.We also exhibit several properties on those tables.

Tables
We first recall the definitions of the DDT and the BCT [CHP + 18] for an n-bit (invertible) Sbox S.
Definition 1.Given γ, θ, δ ∈ F n 2 three differences, the DDT and the BCT are defined as Recently in [WP19] two extra tables have been introduced under the name BDT and BDT'.We recall here their definitions but change their names to UBCT (for Upper BCT) and LBCT (for Lower BCT).We believe those names emphasize better the fact that the Upper (resp.Lower) BCT focuses on the upper (resp.lower) characteristic.

Definition 2.
[WP19] Given γ, θ, δ ∈ F n 2 three differences, the UBCT and the LBCT are defined as Figure 1: Differences used in the tables for a single layer of Sbox These two tables are symmetric in the sense that one focuses on the upper characteristic, and the other focuses on the lower characteristic.All the properties found on one have their counterparts for the other table.
Finally, we define a last table called Extended BCT (EBCT) and originally proposed by Boukerrou et al. in [BHL + 20].This table counts the number of values such that the boomerang will return on a single Sbox, while fixing all the differences.Definition 3. Given γ, θ, λ, δ ∈ F n 2 four differences, the EBCT is defined as Note that the EBCT contains 2 4n entries but only approximately 2 5n−3n = 2 2n should be non zero.Those entries can be computed in 2 3n simple operations.The idea is to exhaust all the possible values for x, γ, and λ and each time to compute the corresponding value of both θ and δ before checking whether the last equation holds.Actually all the tables described in this Section have approximately 2 2n non-zero entries and can be computed in 2 3n operations.In [Dun18], Dunkelman shows an algorithm to construct the BCT in O(2 2n ) operations and it is an open problem to know whether the UBCT/LBCT and the EBCT could be constructed with the same complexity.Definition 4. Let γ, θ, λ, δ ∈ F n 2 .We define for each table the transition probability as

Table Properties
There are many properties linking each table.The BCT and the DDT have already been studied and properties between them can be found in [CHP + 18, SQH19].Here we present properties on the EBCT, as well as the UBCT and LBCT.

Properties on the EBCT
Proof.All the properties directly follow from the definitions, and the property that ∀α ∈ F n 2 , DDT(0, α) = DDT(α, 0) = 2 n • 1 α=0 (injectivity of the Sbox).A detailed proof of those properties is given in Appendix C.
The following property was already stated in [SQH19] in different terms.It explains the sandwich construction, when stating that the middle part of the cipher bears all the dependencies between the two characteristics.
Property 2. Let α, β ∈ F n 2 , and let δ ∼ U(F n 2 ) be a random variable following the uniform distribution over F n 2 .Then Proof.We prove below the property for the UBCT, the one for the LBCT can be done in a similar way.
This property helps us to understand the sandwich construction.The middle part of the cipher is where the dependencies between the upper and the lower characteristics happen.When the middle is big enough, the differences after the middle rounds in the upper characteristic will follow a uniform distribution, and thus the property can be applied to compute the probability for both E 0 and E 1 as the square of probability of the differential characteristics.

Probability of Boomerang Characteristics
In this section we detail the computation of the probability of a boomerang characteristic.We illustrate this on the block cipher SKINNY.The matrix has a low weight (only half of the coefficient are non-zero) and the tweakey is xored to the two first rows only.It enciphers blocks of length n = 64 or n = 128 bits seen as a 4 × 4 matrix of cells.We use positions to refer to them: the position 0 is the one at the top-left corner, and we then number them from left to right, and from top to bottom.SKINNY has three main tweakey size versions: the tweakey size can be equal to t=64 or 128 bits, t=128 or 256 bits, and t=192 or 384 bits, and we denote z = n/t the tweak size to the block size ratio.The round function of SKINNY is depicted on Figure 2, and the number of rounds is directly derived from the z value: between 32 rounds for the 64/64 version up to 56 for the 128/384 version.We refer the interested reader to [BJK + 16] for more details.

Computing Probability
Given a boomerang characteristic, i.e. two differential characteristics for the upper and the lower characteristic, we now want to compute the probability for the boomerang to return.To compute the probability, one simply has to multiply the probability of transition for each Sbox separately.To compute the probability of transition for one particular Sbox, the tables defined in Section 2 are used, depending on the configuration of the differences on the characteristic.We show on an example how to choose the tables depending on the situation.
Example 1. Figure 3 shows a toy example of boomerang characteristic with two differential characteristics on 3 rounds of SKINNY-64 (the upper characteristic is above the lower characteristic).We suppose that we are not adding differences in the key (SK model), so we omit the key from the figure as it will not modify the differences.Lime (for the upper characteristic) and pink (for the lower characteristic) colored cells are non zero differences, and grey cells are unspecified differences.All the differences are given in hexadecimal.We denote the input and output differences by ∆ ex = [0, d, d, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0] and ∇ ex = [0, 0, 2, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] respectively.The probability can be computed by multiplying the probability of transition for each Sbox independently.1 • Round 0: There are only three Sboxes to consider, the transitions on the other ones being valid with probability 1: -The Sboxes in Positions 1 and 2 (reading from left to right, and from top to bottom), having the transitions d → 2 and d → 9, are passed using the probability P DDT , because of Property 1 (1a) (the lower characteristic has a zero difference).
-The Sbox in Position 9 has a difference in the upper characteristic (input and output), and a difference in the lower characteristic, but the input difference in the lower characteristic is not specified.This is the use case for the UBCT table.
The transition probability for Round 0 is then • Round 1: There are only four Sboxes to consider: -Same as in Round 0, the Sbox in Position 2 is passed with probability P DDT (9, 4) -The Sboxes in Positions 1 and 11 do specify the differences in the input of the upper characteristic and the output of the lower characteristic, but not the other.This is the use case for the BCT table.
-The Sbox in Position 15 specifies all the differences, this is the use case for the EBCT table.
The transition probability for Round 1 is • Round 2: There are only two Sboxes to consider: -The Sbox in Position 2 specifies the differences in the lower characteristic, and in the input of the upper characteristic, it is the use case for the LBCT table.
-The Sbox in Position 5 specifies the differences in the lower characteristic, and the difference in the upper characteristic is unspecified.By Property 2, the probability is P DDT (5, 2) 2 The transition probability for Round 2 is The probability of the boomerang characteristic is then To sum up when to use each table, we can give the following rules covering every case: • P DDT : difference specified in one characteristic (input and output), and zero in the other characteristic • P DDT (•, •) 2 : difference specified in one characteristic (input and output), and unspecified difference in the other characteristic • P BCT : input difference specified in the upper characteristic, output difference specified in the lower characteristic, and other differences unspecified.
• P UBCT : input and output difference specified in the upper characteristic, output difference specified in the lower characteristic (and input unspecified) • P LBCT : input and output difference specified in the lower characteristic, input difference specified in the upper characteristic (and output unspecified) • P EBCT : all differences specified

Clusters
Recall that for a boomerang distinguisher, the only important differences are the input of the upper characteristic (and upper key), and the output of the lower characteristic (and lower key).The values of the characteristics in all the rounds do not need to be specified at all, and it is simply a lower approximation of the distinguisher probability.
In this section we propose a high-level procedure as a simple method to generate a formula that will give a more precise probability of the distinguisher, without focusing on a single characteristic.It follows the work of Song et al. from [SQH19] in a more systematic manner.The main novelties are related to the accuracy of the formula and its computation.More precisely, thanks to the tables introduced Section 2, the only approximation in our formula comes from the assumption that Sboxes are independent.Of course this leads to a more complex formula which may be too costly to compute directly.To overcome this issue, we describe in Section 3.3.2an algorithm to reorder the terms in the formula to simplify its computation.

Generation of the formula
The procedure takes as input a fixed number of rounds, the input difference of the upper characteristic (and the upper key), and the output difference of the lower characteristic.The procedure is symmetric in the upper and lower characteristic, but the lower characteristic applies the inverse operations (as in the decryption process).
1. Propagates the zeros from the initial differences:2 (a) For the upper characteristic, start from the input difference, and propagates the zeros to the next rounds.
(b) For the lower characteristic, start from the output difference, and propagates the zeros backward to the previous rounds.
2. Mark the differences: for each Sbox in the upper (resp.lower) characteristic with non-zero difference and such that the lower (resp.upper) difference is non-zero as well, mark the input (resp.output) difference of the upper (resp.lower) characteristic.We do not mark the input difference of the upper characteristic nor the output difference of the lower characteristic because the values are fixed.
3. Mark all the non-zero differences that propagate to a marked difference (recursively).
The propagation goes from right to left for the upper characteristic, and from left to right for the lower characteristic.
4. Create a variable for every marked difference in the output (resp.input) of a Sbox for the upper (resp.lower) characteristic.Let us take the same example as before, but this time we will not consider a fixed characteristic but rather the distinguisher.
Example 2. Figure 4 shows a representation of the distinguisher after applying the procedure to generate the formula.Let us follow step by step the procedure to show how to generate the figure .1.The first step is to propagates the zeros, from the input of the upper characteristic and the output of the lower characteristic.This gives us the blank cells and the gray cells (currently no cells are colored, they are all either white or gray).
2. The second step will mark the cells colored in magenta and green.We do not mark R0 for the upper characteristic nor R2 for the lower characteristic because the values are fixed.
3. The third step will mark all the remaining colored cells in lime and pink.We start with the upper characteristic.To know the values of the two cells at Positions 2 and 5 in R2, we need to know the values at Positions 1, 2, and 15 before SB.Note that the other cells that are involved are already known to be zero.Then, recursively, to know the byte at Position 2, we have to know the one at Position 2 in the previous state.Note that the other ones (those at Positions 1 and 15) are already marked.A similar reasoning allows one to do the propagation regarding the cells at Positions 1, 11, and 15 in R1.Regarding the lower characteristic, we start in R0 with the cell at Position 9. To know its value, we need to know the values of the cells at Positions 7 and 15 in R1 (before the SB transformation).The one at Position 7 is zero, thus we only have to mark in pink the one at Position 15.Similarly, we do the propagation from R1 to R2.
4. The fourth step will create the variables, and apply the linear function to get the input (resp.output) of the Sboxes for the upper (resp.lower) characteristic.
5. The fifth step is simply writing a sum over all the variables, and multiplying each transition probability.The final formula is .

Automated tool
We implemented in Java the algorithm to automatically compute the probability of a boomerang.This tool generates the formula and computes it (if feasible).It is designed specifically for all versions of SKINNY, but is modular and could easily be modified to generate and compute formulas for other ciphers.
The main issue with the accurate formula is the complexity of computing it.For instance, in the formula above, computing P ∆ ex E − ← − − → − ∇ ex requires to sum over 9 variables.While this is reasonable if those variables are nibbles it is not for bytes.To reduce the complexity of the computation we propose two optimizations.
Do not enumerate zeros.Do not enumerate inconsistent values for the variables.For example, there are only few values α such that P DDT (1, α) = 0, and only these values need to be considered.
Reordering the formula.It is possible to split it into two sub-formulas that can be computed recursively.This is very important because the number of variables involved in the sum is really the limiting factor.Let n be the number of variables involved in the sum.The goal is to rewrite the formula as a product of sums so that each of them can be computed independently.Of course the sums should involve as few variables as possible to be easy to compute.Hence the goal is to find the product of sums minimizing the maximum number of variables in each sum.This can be solved using dynamic programming.Let X be the set of all variables involved in the formula and T be a table indexed by all subsets of X.The idea is to store in T [A] the depth of formula assuming variables from Assuming T is fulfilled for subsets of size k, let us see how fulfill T for subsets of size k + 1.Let A be a subset of size k + 1 and rewrite the formula as Now there are two cases.If there exists two non-empty subsets A 1 and A 2 such that ), then the formula can be rewritten as ).Note that here we do not have to try each possible decomposition.The idea is to start from a variable x of A and to construct the smallest set then we need to find the variable x of A minimizing the complexity of computing the formula as Thus in that case ).The time complexity of this procedure is upper bounded by If n is too large to perform this algorithm one can use some heuristics to reorder the formula in the most efficient way possible.But note that for a byte-oriented block ciphers we can hardly compute a formula with a depth greater than 6 and thus n should most often belongs to the practical range of the algorithm.For instance the formula for P ∆ ex E − ← − − → − ∇ ex can be rewritten with depth 5 as follows: Remarks.It may happen that the formula cannot be computed in a reasonable time and in that case we have no other choice than to make approximations as for instance assuming some variables as uniform.Note also the formula is valid only if the characteristics are the same in both sides of the boomerang.Actually when considering the distinguisher (hence only the input and output of the boomerang), the two sides of the boomerang could take completely different differential characteristics, as long as the output difference is the same, the boomerang returned.Supposing that the values on each side of the boomerang could be different is possible, but would suppose to generate tables with up to eight entries (duplicate each entry, and there are already 4 entries in the EBCT), and would also double the number of variables in the formula.These two reasons explains why we did not consider this case in this formula, but we still considered different comebacks in some parts of the distinguisher on particular cases in Section 5.3.

MILP Model to Search for Truncated Boomerangs
As for the search of differential characteristics, we chose to split the search of boomerang characteristics into two steps.In the first one we search for good truncated boomerang characteristics, and in the second one we search for the best possible instantiations of them.This section is dedicated to the MILP model we created to find truncated boomerang characteristics.
Let E be a classical SPN cipher of R round with an n-cell internal state and such that the round function is composed of a SubCell operation, a key addition and a linear layer which multiplies the internal state by a matrix M (at the cell level).Assuming an MILP model to search for truncated (related-key) differential characteristics on this cipher, we show how to turn it into an MILP model to search for truncated boomerang characteristics.
The first part of the model consists of writing twice the MILP model for truncated differential, once for the upper characteristic and once for the lower one.Such models are somehow easy to write and are already available for several block ciphers [ZDY19, BJK + 16].We assume for each cell of each internal state of the upper (resp.lower) characteristic a binary variable isActiveUp (resp.isActiveLo) indicating whether the cell is active or not.In the following we explain which constraints to add to search for good truncated boomerang characteristics.Note that we do not define E 0 , E m and E 1 , because the upper and lower characteristics will be searched on all the rounds.To represent the fact that some differences will take any value uniformly, we introduce free variables (non free variables will be called controlled variables).Controlled variables are the differences that will be set to a fixed value in the characteristic.

Controlled/Free Variables
We introduce two sets of binary variables for each characteristic: isFreeXup and isFreeSBup (resp.isFreeXlo and isFreeSBlo) to indicate whether a difference will be free before and after the Sbox respectively.For the upper characteristic if a difference is free before an Sbox (i.e.isFreeXup = 1), then it is free after the Sbox (isFreeSBup = 1).For the lower characteristic, if a difference is free after an Sbox, then it is free before the Sbox (because the propagation is done in the opposite direction).This leads to the following constraints: Those variables are also related to both isActiveUp and isActiveLo because a difference can be set to 0 only if the difference is controlled.Thus we have the following constraints: Another important constraint is the one stating that a free difference propagates with probability 1 (i.e.no cancellations occur).For the upper characteristic we define depsU(i) as the set of all the indexes j such that the coefficient m i,j of the matrix M (of the linear layer) is non-zero.For the lower one, we define depsL in a similar way but from the matrix M −1 .Then the constraint of propagation of free variables are simply: In order to apply the tables defined Section 2, we need an extra constraint to ensure that the probability of each Sbox can be computed.More precisely, we require that at most 2 variables can be free for each Sbox (considering upper and lower characteristic, before and after the Sbox).This leads to the following constraints: With all these constraints, the solutions generated will lead to truncated boomerang characteristics.We emphasize with our new set of constraints there is no middle rounds defined for our truncated boomerang characteristics.In particular, the BCTs are not necessarily all on the same round but may be spread over several rounds.Thus our modelization is more generic than the previous ones, in particular than the modelization proposed by Liu et al. in [LS19].

Objective Function
The MILP model describes all truncated boomerang characteristics but there are too many of them and it is impossible to exhaust all of them.But we still need to find good truncated characteristics.When searching for truncated differential characteristics, active Sboxes are counted to give a lower bound on the probability of the best possible instantiation of the actual characteristic.The same approach is done here, but because of the multiple different tables, it is not as easy.The important data is the maximum probability of a (non-trivial) transition for each table.As the probabilities are multiplied in the final formula, we consider the exponents in base two of the maximum probabilities so it will be possible to sum all the values.We create one variable per Sbox per table, equal to one if the table is used.These variables can be easily fixed from the previous variables using the rules for deciding when to use each table.More precisely we introduce the binary variables isDDT, isDDT2, isBCT, isUBCT, isLBCT and isEBCT.Then we have the constraints Then, the objective function will be a weighted sum over all the Sboxes and over all the tables, weighted by the maximum probability exponent.For the probability of transition P DDT (•, •) 2 , simply multiply by two the weight of the DDT.Let n T be the − log 2 of the highest probability of a transition in table T .The objective to minimize is then:

Application to SKINNY
We applied this model on all versions of SKINNY.We used the original MILP model described in [BJK + 16] for truncated differential characteristics.Because for SKINNY the maximum probability of a non-trivial transition for the DDT, UBCT, LBCT and EBCT is 2 −2 , and the maximum probability for the BCT is 1 = 2 0 , we simplified the model to create only two binary variables isTable and isDDT2 for each Sbox.The final objective is the sum of all these variables.The variables isDDT2 is equal to 1 if, and only if the transition probability is P DDT (•, •) 2 , and the variable isTable is equal to 1 if, and only if there is a non-zero difference in the upper or lower characteristic through this Sbox, and the table used is not the BCT.Indeed, due to the possibility of a probability 1 transition through the BCT, it is not counted in the objective.Truncated boomerang characteristics were generated for all the versions of SKINNY using the MILP solver Gurobi. 4For truncated boomerangs, there is no difference between the 64 and 128 bits versions as for each table the optimal probability is the same for both versions.Gurobi allows to either get one optimal solution, or get the N best solutions.Both solvings are important: finding the optimal objective allows to know the number of rounds when the boomerang characteristics have probability lower than 2 −64 or 2 −128 ; finding the best solutions allows to search for different characteristics in the next step, hence increasing the chances to find a good distinguisher.
Table 2 shows all the optimal objectives for different number of rounds on different attack models.Recall that this objective has to be multiplied by two to get the best boomerang characteristic probability exponent, because of optimizations done in the objective function dedicated to SKINNY.

Remark.
It is interesting to note that our model automatically switches to classical truncated differential characteristics after reaching a particular number of rounds.More precisely, when the number of rounds studied is too high, the best boomerang characteristic is the best differential characteristic (the orthogonal difference is 0 for all rounds).For instance we observe on Table 2 that after 13 rounds in the single key model we obtain the exact same bounds than for truncated differential characteristics.This shows how much our model is generic.

Optimizations
For SPNs, a table showing the minimum number of active Sboxes for a given number of rounds is often available.For example on SKINNY, the table is given by the authors [BJK + 16] for up to 30 rounds on the different attacker models (SK, TK1, TK2, and TK3).We can use this table to add non-trivial redundant constraints to the model.Let tab[r] be the minimum number of active Sboxes on a differential characteristic with r rounds.Then we add the constraints: We experimentally noticed that adding those constraints really speeds up the solving process as it is now easier for the solver to prove lower bounds.
There are many minor optimizations that are possible but really dependent on the solver used.For Gurobi, we found out that solving a sub-problem first and then solving the full model really helps in the resolution (helping to know in advance what variables to branch on in the solving).What has been done is to add one constraint forcing to use a BCT table in one cell on Round R/2.Solving this problem is really faster because BCTs force to have free variables in the upper and lower characteristics that will propagate further into the rounds.Then we simply remove the constraint and solve the full model.

Limitations
Even though the model is really simple and can be generalized easily on different ciphers, some issues has been found when testing on SKINNY.The main issue is the fact that it is possible to have truncated boomerangs that differ only on some f ree variables.These truncated boomerangs are duplicates in the point of view of distinguishers, and thus instantiations will be almost the same.Moreover, when applying the procedure to compute the probability of the boomerang, they will have exactly the same probability because the input and output will be the same.
The second issue that is related to these almost duplicates is the number of solutions which grows exponentially.The tool was configured to find the N best solutions and thus would find non optimal solutions.But as there were too many of them, it was not able to go much further than optimal objective (in a reasonable time limit).
Overcoming the limitations.Some solutions have been found to speed up the computations.The goal here is to add constraints that are always verified in the best solutions.These constraints will be specific to each cipher.On SKINNY, we found out that in the best solutions for both the TK2 and TK3 models, there were always a given number of rounds with only zero differences (in the upper and lower characteristics).
We added those constraints to force some rounds to be zero.To leave some freedom, we did not force the maximum number of rounds to zero, therefore, the solver could choose to add other rounds to zero before or after the ones we fixed.Adding these constraints really speed up the solving, and reduced the number of solutions generated, allowing the model to search for non optimal solutions.Actually, on SKINNY, most of the best boomerang distinguishers given in Table 1 were obtained this way, however the results in Table 2 were obtained without adding constraints to ensure to have a proved optimal objective.Finally, we found that the most difficult part of the model to solve is when there is no BCT and adding a constraint to ensure that at least one BCT appears in the characteristic improves the solving time by a huge factor (≥ 100).But in that case we would miss some characteristics, in particular the model would not handle truncated differential characteristics.Still we believe this is a good trade-off between the solving time and the quality of solutions.

Instantiation of Truncated Boomerangs
After having generated truncated boomerangs, the next step is to instantiate them to actual values and find the best possible instantiation.

CP Model
We made a Constraint Programming (CP) model which uses the truncated boomerangs to know which nibbles/bytes have to be fixed to zero, greatly reducing the search space.The CP solver used is Choco [PFL16] version 4.10.2.It allows to simply define the constraints, and change the search strategy to one that is more adapted to the problem.Using a truncated solution, it is straightforward to know which table (DDT/BCT/...) is used for each Sbox.Our model is very simple as each transition (DDT/BCT/... but also the xor operation) is written with a constraint table, i.e. using a table containing all the possible (or impossible) transitions.For instance, describing that x ⊕ y = z for binary variables could be done with constraint: (x, y, z) ∈ Tab ⊕ , where Tab ⊕ = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)} To get an efficient model, we chose to create a new model for each truncated boomerang characteristic.For each Sbox as well as each xor, we create the corresponding constraint table.An extra variable is added for each probabilistic transition such that every entry of the table is associated to the exponent in base two of the probability.The objective is then the sum of the exponents for each active Sbox.

Finding Characteristics
To find good characteristics, the solver first finds all the instantiations that have optimal probability and that have different input of upper characteristic, output of lower characteristic, and key differences (meaning that the characteristics comes from different distinguishers).Actually, due to the methods for solving CP models, we first have to find the optimal probability and then fix it and enumerate all the instantiations with this probability.Of course it is not always possible to find this optimal probability in a reasonable time and we added a time limit to the solver.
Then, for each distinguisher found (associated to one truncated boomerang), we can enumerate all the instantiations having the same input and output and key differences, and sum the probabilities of these instantiations.In practice it is not possible to enumerate all the instantiations, so we have to give a bound on the number of solutions generated, or the probability up to which we consider the instantiation.Experimentally, we found out that searching for solutions with probability better than 2 −10 • opt was a good compromise between the running time and the final probability raised by the model.
This solving process allowed us to get good boomerang distinguishers for each truncated characteristic.A time limit of ten minutes has been put to limit the computation time, but it was only limiting for some instances of TK3 in 128 bits.Then, to get a better approximation of the probability, we took the best distinguisher found and analysed it with a more precise cluster analysis.
After finding a good characteristic, a more precise cluster analysis is done on the best characteristic found to have a better approximation of the probability.

Precise Cluster Analysis
To get a better approximation of the boomerang probability, we then had to apply our tool from Section 3.3.2 to compute the probability of the best distinguisher found by the CP model.In practice, the tool can not compute the probability for more than 6 rounds because of the number of variables created.Hopefully, we were able to use the particular form of the solutions of boomerangs on SKINNY.
Table 3 and Figure 5 both depict the same boomerang characteristic on 18-round SKINNY-128-256 (in the attack model TK2 for which differences are introduced in the two parts of the tweakey).We show here how we were able to compute precisely the probability of this boomerang distinguisher.
We first note that some of the differences are fixed because of the cancellations.This happens for the differences 03 in round R0, 02 in round R2, and 04 in round R6 for the upper characteristic, and differences 90 in round R11 and 21 in round R15 for the lower characteristic.
Next, we observe that middle rounds are defined automatically from the structure of the distinguisher.We do not explain in details here how to compute the probability of the middle rounds R7 to R11 as it was already done in Section 3.3.The only detail is the approximation in round R11 where we supposed that the difference in Position 2 is uniform in the upper characteristic.This approximation has been made because without it the formula was too expensive to compute.
Due to the zero Rounds R3 to R6 for the upper characteristic and R12 to R15 for the lower one, Rounds R0, R1, R2 and Rounds R16 and R17 can be be analyzed as differential characteristics but the probability will be squared (the difference in the other characteristic is uniform, hence we can apply Property 2).Applying the standard formula to compute the probability of the differential characteristics, we obtain: • Upper: P DDT (80, 03) 2 • α,β∈F n 2 P DDT (01, α) 6 • P DDT (α, β) 2 • P DDT (β, 02) 2 • Lower: As explained previously, it is no required to have the same differences in the comeback of the boomerang.This means that for each difference where we define a variable, we now define two variables, one for each side of the boomerang.The positions at which we were able to define two variables for the back and forth of the boomerang are marked with a tilde.The probabilities then becomes: For Rounds R7 to R11, we apply the precise boomerang probability computation presented in Section 3.3.1.The formula was generated by the automated tool, but as already mentioned we had to approximate the upper characteristic in Round R11 as uniform, otherwise 8 extra variables would appear in the formula making it too expensive Table 3: Characteristic on 18 rounds of skinny-128-256.Values are represented in hexadecimal, greek letters are variables used in the sum, and dashes are unspecified differences.For each round, the two first lines describe the state before and after the SubCells operation and the third line describes the round key.
Remark.Beside values, both the upper and lower characteristics are the same than the ones studied in details in [SQH19].The only difference is that the upper characteristic is shifted such that the active byte on R7 is on the second column instead of the third.This small change increases the probability of the boomerang by a huge factor of 2 30 , showing that it is important to search a boomerang as a whole and not as two independent characteristics.Other characteristics.The same approach was used for all the characteristics given in Table 1 but the higher the number of rounds is, the higher the number of approximations that have to be made in the formula.As a result, all the probabilities for the different versions and number of rounds in SKINNY were computed in less than 10 minutes each, except for one that took 30 minutes.

Experimental Results
To validate our results we experimentally verified the probability of four boomerang distinguishers.For each of them we first randomly pick a key and then, as long as a rightful quartet is not found, randomly pick a plaintext, construct the corresponding quartet and then check whether the boomerang came back.The number of trials required to generate a rightful quartet is then compared to the expected probability of the distinguisher.

B Boomerang characteristic on TK3 24 rounds 128 bits
Table 4 shows the distinguisher found on 24 rounds of skinny 128 bits, under TK3.To be able to compute a probability, some approximation have been made.Ideally we would not do any approximation and consider a middle round composed of the rounds R9 to R15, but too many variables would be introduced and the final formula could not be computed.Three types of approximations have been made: • Fixing a value of a cell: the value of cell 14 of the lower characteristic in round R14 have been fixed to 0x01, and cell 15 of round R10 in the lower characteristic have been fixed to 0x10.
• Supposing the uniformity of some cells: for the upper characteristic cells 8 of R13, 14 of round R14 and 1 of round R15 have been supposed to take uniform values.For the lower characteristic cells 9 of round R9 and cells 3 and 10 of round R10 have been supposed to take uniform values.

C Proof of Property 1
In this section we detail the proof for some of the properties stated in Section 2.2.

Figure 3 :
Figure 3: Toy example of boomerang characteristic on three rounds of skinny-64 in the SK model 3

Figure 4 :
Figure 4: Toy example of boomerang cluster on 3 rounds of SKINNY-64 in the SK model

Table 1 :
[SQH19] for different versions and number of rounds on skinny.Probabilities marked with asterisks have been validated experimentally.The four previous results from[SQH19]are also given in parenthesis.
the middle rounds.In Section 4 we describe our new MILP model to search for truncated boomerang characteristics and give results for SKINNY.Finally in Section 5 we describe our CP model to instantiate truncated boomerang characteristics and detail the new boomerang distinguisher on 18-round SKINNY-128/256 we found.
SKINNY is a family of very lightweight tweakable block ciphers designed by Beierle et al. and presented at CRYPTO'16 [BJK + 16].The round function of SKINNY follows a classical SPN structure and a non-MDS binary matrix is used for the MixColumns operation.

Table 2 :
Step 1 objectives in skinny.The objectives in parenthesis are upper bounds, the computations were stopped after 1 day on 32 cores.one table can be used for an Sbox.Writing the constraints relating those variables to variables isActive and isFree can be a bit tricky.We propose to use the approach used by Abdelkhalek et al. in [AST + 17].The idea is to write all impossible cases and to reduce the corresponding set of inequalities using the Quine-McCluskey algorithm.
The model is inspired from [DDH + 20] in which Delaune et al. describe a CP model to instantiate truncated differential characteristics on SKINNY.Their CP model seems much faster than the MILP approach used by Abdelkhalek et al. in [AST + 17].To adapt it to boomerang distinguishers we simply had to modify the table constraints representing the transition probability for each Sbox.
[SQH19] paper we proposed a new set of tools to search for boomerang characteristics on SPN.In particular, we have shown how to turn an MILP model searching for truncated differential characteristics into an MILP model searching for truncated boomerang characteristics.The main advantage of our method is that the middle rounds of the boomerang do not have to be specified and are automatically handled by the model.We have also shown that a CP model can be used to find the best possible instantiations of a given truncated boomerang characteristic.Finally we systematized the method developed by Song et al. in[SQH19]to precisely compute the probability of a boomerang.As a result, we applied our new tools to the block cipher SKINNY and found many new boomerang distinguishers up to 24 rounds in the TK3 model.In particular, we improved by a factor 2 30 the probability of the best known distinguisher against 18-round SKINNY-128/256.SAC 2007, Ottawa, Canada, August 16-17, 2007, Revised Selected Papers, volume 4876 of Lecture Notes in Computer Science, pages 212-231.Springer, 2007.

Table 4 :
Characteristic on 24 rounds of skinny-128-384.Values are represented in hexadecimal, greek letters are variables used in the sum, and dashes are unspecified differences.