Increasing Precision of Division Property

Abstract. In this paper we propose new techniques related to division property. We describe for the first time a practical algorithm for computing the propagation tables of 16-bit Super-Sboxes, increasing the precision of the division property by removing many false division trails. We also improve the complexity of the procedure introduced by Lambin et al. (Designs, Codes and Cryptography, 2020) to extend a cipher with linear mappings and show how to decrease the number of transitions to look for. While search procedures for integral distinguishers most often rely on MILP or SAT solvers for their ease of programming the propagation constraints, such generic solvers can only handle small 4/8-bit Sboxes. Thus we developed an ad-hoc tool handling larger Sboxes and all the improvements described in the paper. As a result, we found new integral distinguishers on SKINNY-64, HIGHT and Midori-64.


Introduction
Integral cryptanalysis exploits distinguishers computing the sum of ciphertexts corresponding to a set of plaintexts spanning a linear subspace. This technique was originally introduced by Knudsen in [DKR97] as a specific attack against the byte-oriented structure of the block cipher SQUARE. In 2000, Ferguson et al. [FKL+00] presented at FSE powerful attacks based on integral distinguishers against round-reduced versions of AES, named Partial Sum attacks. In particular they described a practical attack against 6 rounds which is still one of the best known attacks against AES. Integral distinguishers were found by propagating through the round functions simple properties on the words composing the internal states: ALL (the word takes all the possible values once), BALANCED (the word sums to zero), CONSTANT (the value of the word is constant).
The so-called division property, introduced by Todo at Eurocrypt'15 [Tod15], is a method to find more sophisticated integral distinguishers. The idea behind the division property technique is actually quite simple. Let $f$ and $g$ be two $n$-bit functions and assume the goal is to find an integral distinguisher on $g \circ f$ without computing it explicitly. Let $y_i = f_i(x_0, \ldots, x_{n-1})$ and $z_i = g_i(y_0, \ldots, y_{n-1})$ be the intermediate and final expressions of the coordinate functions of $f$ and of $g$, and let $m_z$ be a monomial in the $z_i$'s, so that $m_z$ is a polynomial in some monomials $m_y$. Division property captures that if, for a subset $X$ of $\mathbb{F}_2^n$, each monomial $m_y$ appearing in $m_z$ satisfies $\bigoplus_{x \in X} m_y(x) = 0$, then $\bigoplus_{x \in X} m_z(x) = 0$. Several variants of this property were used to find integral distinguishers. For instance, in [TM16], Todo and Morii used that if all monomials $m_y$ but one sum to zero then $\bigoplus_{x \in X} m_z(x) = 1$. More recently, in both [HLM+20] and [HLLT20], the exact relation was used: $\bigoplus_{x \in X} m_z(x) = 0$ if and only if the number of monomials $m_y$ for which $\bigoplus_{x \in X} m_y(x) = 1$ is even.
In practice we cannot try all possible sets $X$ nor compute the corresponding sums for all monomials involved in the description of a cryptographic primitive. Furthermore we typically want integral distinguishers independent from the key, adding an extra complexity to the problem. However it is easy to show that if $P$ is a polynomial in variables $(x_1, \ldots, x_n)$ then $\bigoplus_{(x_1, \ldots, x_i) \in \mathbb{F}_2^i} P(x_1, \ldots, x_n) = 0$ for each value of $(x_{i+1}, \ldots, x_n)$ if and only if $P$ does not involve a monomial containing all the variables $x_1, \ldots, x_i$. This property can be understood more easily using higher-order differentials: if we derive $i$ times, with respect to the first $i$ variables, a multivariate polynomial $P$ in which no monomial is a multiple of $x_1 x_2 \cdots x_i$, then we get the zero polynomial. Thus integral distinguishers are highly related to the maximal monomials involved in a polynomial and division property can be seen as a method to track them through an iterated function.
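The equivalence above can be checked exhaustively on small polynomials. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, variables are indexed from 0 instead of 1, and a polynomial is represented by its truth table, with the input $(x_0, \ldots, x_{n-1})$ encoded as the integer $\sum_i x_i 2^i$.

```python
def anf(truth_table, n):
    """Moebius transform: coeffs[u] is the coefficient of the monomial x^u."""
    coeffs = list(truth_table)
    for i in range(n):
        for x in range(1 << n):
            if (x >> i) & 1:
                coeffs[x] ^= coeffs[x ^ (1 << i)]
    return coeffs


def cube_sums_vanish(truth_table, n, i):
    """True iff XOR-summing P over the first i variables gives 0
    for every value of the remaining n - i variables."""
    for rest in range(1 << (n - i)):
        total = 0
        for low in range(1 << i):
            total ^= truth_table[(rest << i) | low]
        if total:
            return False
    return True


def has_monomial_containing_cube(truth_table, n, i):
    """True iff some monomial of the ANF contains all of the first i variables."""
    cube = (1 << i) - 1
    coeffs = anf(truth_table, n)
    return any(c and (u & cube) == cube for u, c in enumerate(coeffs))


# Example (an assumption chosen for illustration): P = x0*x2 XOR x1 on 3 bits.
n = 3
table = [((x & 1) & (x >> 2)) ^ ((x >> 1) & 1) for x in range(1 << n)]
print(cube_sums_vanish(table, n, 2))              # True
print(has_monomial_containing_cube(table, n, 2))  # False
```

Here no monomial of $P$ contains both $x_0$ and $x_1$, so summing over these two variables always gives zero, whereas summing over $x_0$ alone leaves $x_2$.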
Searching for integral distinguishers. The main difficulty is to efficiently model the propagation of division property through the round functions of a cipher. Except in [TM16] where Todo and Morii used an ad-hoc tool to exhaust division trails on SIMON-32, searching for integral distinguishers usually relies on generic solvers for MILP, SAT or SMT models. In [XZBL16] Xiang et al. show that it is possible to describe transitions through small Sboxes with inequalities by computing the convex hull of points. This work has been extended by Zhang and Rijmen [ZR19] to binary linear mappings. Eskandari et al. in [EKKT18] have built a tool called Solvatore to find such division property trails using a SAT solver and found many new integral distinguishers. The difficulty of the search procedure depends on the cipher and on the variant of division property implemented. The original variant is the simplest to search for but is also the least accurate as it may miss some cancellations of monomials and thus miss distinguishers. In [HLLT20], Hebborn et al. worked with the exact variant and described a new method dedicated to (small) block ciphers aiming at proving that for each linear combination of the ciphertext bits and for each monomial of degree $n - 1$ in the plaintext bits, there is at least one key (considering independent round keys) for which the monomial appears in the ANF of the linear combination. They used a heuristic approach to find round keys for which evaluating the parity of division trails is the cheapest. As a result they found that 13-round SKINNY-64, 11-round Gift and 11-round PRESENT are all immune to integral distinguishers if considering independent round keys.

Our Contributions.
In this paper, our contributions are three-fold.
i) Our main idea is to increase the precision of the original division property. To this end, we want to handle larger Sboxes than the typical 4/8-bit Sboxes block ciphers are usually composed of. More precisely, we want to handle Super-Sboxes to cover two layers of Sboxes in one operation. Hence, in this paper we propose a new algorithm to compute the so-called propagation table associated to a Super-Sbox. Our algorithm computes the propagation table corresponding to a collection of $k$ $n$-bit permutations in $O(nk2^{2n} + 2^{3n})$ simple operations while applying $k$ times the classical algorithm would lead to a complexity in $O(k2^{3n})$.
ii) MILP and SAT solvers seem to be unable to efficiently handle such large propagation tables, so we decided to implement an ad-hoc tool to this end, based on a classical branch-and-bound. To the best of our knowledge, this is the first time an ad-hoc approach can practically search for division trails on 64-bit block ciphers without relying on generic solvers. It is well known that MILP and SAT solvers make the development of tools easier, but they also have caveats: it is hard to predict their running time and it is also hard to reverse engineer the algorithmic techniques they use to speed up the search. For cryptanalysts, such generic tools give a first
iii) We also provide several new algorithms which may improve all previous models related to division property. First we show how to remove some unnecessary elements from a chain of propagation tables describing a cipher. This restricts the search space and decreased the running time of our tool by up to a factor of 3. We also provide new algorithms to add linear mappings around the cipher to extend the search space of division trails. Indeed, contrary to differential or linear cryptanalysis, integral attacks based on division property are not invariant under linear mappings, and Lambin et al. in [LDF20] have shown that considering linear mappings at the beginning and end of the cipher may allow finding integral distinguishers covering more rounds. In particular our new algorithms have a much better time complexity than the ones introduced in [LDF20] since for an $m$-bit Sbox we only have to consider $2^m$ mappings while Lambin et al. had to consider $O(2^{m^2})$ of them.
As a result, we found new integral distinguishers against the three block ciphers SKINNY-64 [BJK+16], Midori-64 [BBI+15] and HIGHT [HSH+06], increasing the number of rounds covered compared to the previously best known integral distinguishers. We also experimentally verified some distinguishers found on smaller instances in order to validate our tool. For instance, we searched for low data distinguishers by fixing some input bits of the Super-Sboxes to constants and we found integral distinguishers requiring only $2^{15}$ chosen plaintexts against both 8-round SKINNY-64 and 6-round Midori-64. All the results found by our tool are given in Table 1.
The C++ code of our ad-hoc tool is available at https://gitlab.inria.fr/pderbez/divlin. For now it handles any 64-bit function which can be written as where $\sigma$ is a permutation operating at the nibble level and where the $f_i$'s are parallel applications of four 16-bit to 16-bit functions, possibly depending on a round key. This includes a large set of ciphers such as TWINE, Gift, HIGHT, etc.

Organization of the paper.
Section 2 contains the notations and definitions and Section 3 some related works. In Section 4 we present new techniques, including an algorithm to compute the propagation table of a Super-Sbox. In Section 5 we describe our ad-hoc tool to search for integral distinguishers. Finally, Section 6 contains our new results against the block ciphers SKINNY-64, Midori-64 and HIGHT.

Preliminaries
In this section, we give the notations and definitions we will use in this paper. We also introduce division property based distinguishers on block ciphers.

Notations and Definitions
We denote by $x = (x_0, \ldots, x_{n-1}) \in \mathbb{F}_2^n$ an $n$-bit vector, where $x_0$ is the least significant bit, and will often write $x_0 x_1 \ldots x_{n-1}$ instead of $(x_0, \ldots, x_{n-1})$. There is a trivial mapping from $n$-bit vectors to monomials in variables $(X_0, \ldots, X_{n-1})$ and we will often refer to $x$ as a monomial.
Definition 2. Given a set $s$ of monomials, we denote by $\max(s)$ (resp. $\min(s)$) the set of the maximal (resp. minimal) monomials of $s$.
Note that given a set $s$ of $n$ monomials, building $\max(s)$ (resp. $\min(s)$) requires at most $n \times |\max(s)|$ (resp. $n \times |\min(s)|$) comparisons and is upper bounded by $n^2$. Furthermore, both operators min and max can be easily extended to polynomials over $\mathbb{F}_2$ since they can be seen as sets of monomials.
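The comparison bound above can be illustrated with a short sketch. It assumes that the partial order on monomials is the usual componentwise one (a monomial is smaller than another if it divides it), with monomials encoded as bit masks; the function names are hypothetical.

```python
def weight(v):
    """Hamming weight, i.e. the degree of the monomial encoded by v."""
    return bin(v).count("1")


def divides(a, b):
    """a <= b in the assumed partial order: every variable of a is in b."""
    return (a & b) == a


def minimal(monomials):
    """min(s): elements of s with no strictly smaller element in s."""
    kept = []
    for m in sorted(set(monomials), key=weight):
        # Anything strictly below m has lower weight, so it was already examined;
        # m only needs to be compared against the minimal elements kept so far.
        if not any(divides(k, m) for k in kept):
            kept.append(m)
    return kept


def maximal(monomials):
    """max(s): elements of s with no strictly larger element in s."""
    kept = []
    for m in sorted(set(monomials), key=weight, reverse=True):
        if not any(divides(m, k) for k in kept):
            kept.append(m)
    return kept


s = [0b011, 0b001, 0b110, 0b111]
print(sorted(minimal(s)))  # [1, 6]
print(sorted(maximal(s)))  # [7]
```

Processing the monomials by increasing (resp. decreasing) weight is a design choice of this sketch: each element is compared only against the current output, which gives the $n \times |\min(s)|$ (resp. $n \times |\max(s)|$) bound.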
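Definition 3 (Bit-product). For $x, u \in \mathbb{F}_2^n$, we denote by $x^u$ the bit product
$$x^u = \prod_{i=0}^{n-1} x_i^{u_i}.$$
Definition 4 (Bit-based Division Property [TM16]). A set $X \subset \mathbb{F}_2^n$ has the division property $\mathcal{D}^n_K$, where $K \subset \mathbb{F}_2^n$ is a set, if for all $u \in \mathbb{F}_2^n$, we have
$$\bigoplus_{x \in X} x^u = \begin{cases} \text{unknown} & \text{if there is } k \in K \text{ such that } u \succeq k, \\ 0 & \text{otherwise.} \end{cases}$$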

Integral Distinguishers
Basically, the division property is a tool to track the monomials through the successive applications of a round function. Given a block cipher, let $P_b(X_0, \ldots, X_{n-1}, K_0, \ldots, K_{m-1})$ be the polynomial describing the $b$-th bit of the ciphertext as a function of the plaintext ($X$) and the master key ($K$). If no monomial greater than or equal to $X_0 X_1 \cdots X_{i-1}$ appears in $P_b$ then for any value $y$ of $(X_i, \ldots, X_{n-1}, K_0, \ldots, K_{m-1})$ we have that
$$\bigoplus_{(x_0, \ldots, x_{i-1}) \in \mathbb{F}_2^i} P_b(x_0, \ldots, x_{i-1}, y) = 0,$$
which is a property a random function should not have. However, in practice we cannot computationally obtain the polynomial expression of all the bits of the ciphertext because the number of terms is too large. Fortunately, the division property gets around this problem. Let $f$ and $g$ be two $n$-bit functions and let $y_i = f_i(x_0, \ldots, x_{n-1})$ and $z_i = g_i(y_0, \ldots, y_{n-1}) = g_i \circ f(x_0, \ldots, x_{n-1})$ be the intermediate and final expressions of the coordinate functions of $f$ and $g$ respectively. Division property captures that if, for all monomials $y^v$ appearing in $z^u$, $y^v$ does not involve a monomial greater than $x^w$, then $z^u$ (now seen as a function of the $x_i$'s) does not involve a monomial greater than $x^w$ either. Hence, a common way to study division property for a block cipher is to study the division trails of this cipher, which show the propagation of the division property through the basic operations composing the block cipher.
Definition 5 (Division Trails [XZBL16]). Let $f$ denote the round function of an iterated block cipher. Assume the input set to the block cipher has initial division property $\mathcal{D}^n_{\{k\}}$, and denote the division property after propagating through $i$ rounds of the block cipher (i.e. $i$ applications of $f$) by $\mathcal{D}^n_{K_i}$. Thus, we have the following chain of division property propagations:
$$\{k\} = K_0 \xrightarrow{f} K_1 \xrightarrow{f} K_2 \xrightarrow{f} \cdots$$
Moreover, for any vector $k_i$ in $K_i$ ($i \geq 1$), there must exist a vector $k_{i-1}$ in $K_{i-1}$ such that $k_{i-1}$ can propagate to $k_i$ by the division property propagation rules, i.e. $f^{k_i}$ contains a monomial $m$ such that $m \succeq k_{i-1}$. Furthermore, for $(k_0, k_1, \ldots, k_r) \in K_0 \times K_1 \times \cdots \times K_r$, if $k_{i-1}$ can propagate to $k_i$ for all $i \in \{1, 2, \ldots, r\}$, we call $(k_0, k_1, \ldots, k_r)$ an $r$-round division trail.
In the rest of the paper, we will denote $k \xrightarrow{f} k'$ if the vector $k \in \mathbb{F}_2^n$ can propagate to a vector $k' \in \mathbb{F}_2^m$ through the $n$-bit to $m$-bit function $f$. This proposition is the core of the division property technique. Given an $n$-bit to $n$-bit function $f$ and $x \in \mathbb{F}_2^n$, if there is no division trail through $f$ from $x$ to the $i$-th unit vector then it means that monomial $x$ does not divide any of the monomials involved in the ANF of the $i$-th coordinate function $f_i$ and thus there is an integral distinguisher on $f_i$.

Related Works
In this section we recall some previous works our paper is based on.

Propagation Table
At ASIACRYPT'16, Xiang et al. [XZBL16] proposed an algorithm to compute the propagation table of an $n$-bit to $n$-bit function $f$. The propagation table of $f$ is a table $T$ such that for any $m \in \mathbb{F}_2^n$, the line $T[m]$ lists the vectors $u$ such that $m \xrightarrow{f} u$. The algorithm building the propagation table is quite simple. It computes all the products $f^u$, finds all the monomials included in at least one monomial of $f^u$ and adds $u$ to the corresponding lines. At the end, each line of the table is reduced based on the fact that minimal monomials are sufficient to characterize the division property.

Algorithm 1: Building propagation table
Data: an $n$-bit to $n$-bit function $f$
Result: $T$, the propagation table associated to $f$
init $T$ as empty

The algorithm is depicted in Algorithm 1. Its exact complexity is hard to compute, as it depends on the function $f$ and more precisely on the number of terms each coordinate function is composed of. In the worst case it is $O(2^{3n})$ simple operations.
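The description of the table construction given above can be sketched as follows for a small Sbox. This is only an illustration under stated assumptions, not the authors' implementation: the Sbox is given as a lookup table, vectors are encoded as integers, the partial order is the componentwise one, and all function names are hypothetical.

```python
def weight(v):
    return bin(v).count("1")


def minimal(vectors):
    """Keep only the minimal vectors for the componentwise order."""
    kept = []
    for v in sorted(set(vectors), key=weight):
        if not any((k & v) == k for k in kept):
            kept.append(v)
    return sorted(kept)


def anf(truth_table, n):
    """Moebius transform: coeffs[v] is the coefficient of the monomial x^v."""
    coeffs = list(truth_table)
    for i in range(n):
        for x in range(1 << n):
            if (x >> i) & 1:
                coeffs[x] ^= coeffs[x ^ (1 << i)]
    return coeffs


def propagation_table(sbox, n):
    size = 1 << n
    table = [set() for _ in range(size)]
    for u in range(size):
        # Truth table of the product f^u of the output bits selected by u.
        product = [1 if (sbox[x] & u) == u else 0 for x in range(size)]
        for v, coeff in enumerate(anf(product, n)):
            if not coeff:
                continue
            # Add u to the line of every monomial m included in x^v.
            m = v
            while True:
                table[m].add(u)
                if m == 0:
                    break
                m = (m - 1) & v
    # Minimal vectors are sufficient to characterize each line.
    return [minimal(line) for line in table]


# PRESENT Sbox, used here only as a convenient 4-bit permutation.
PRESENT = [0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2]
T = propagation_table(PRESENT, 4)
print(T[0], T[15])  # [0] [15]
```

The three nested loops (over $u$, over the monomials of $f^u$, and over the sub-monomials of each of them) account for the $O(2^{3n})$ worst case.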

Linear Mappings at Input and Output
In [LDF20], Lambin et al. show that for a given block cipher $E$, we should consider $L_{out} \circ E \circ L_{in}$, where both $L_{out}$ and $L_{in}$ are linear mappings, since division property is not linearly invariant. This may lead to new distinguishers but the drawback is that the search space is greatly increased. For instance, let $f_k$ be the encryption function where $p_0, \ldots, p_3$ are non-zero polynomials. In that case the classical application of division property would conclude that no output bit is balanced. But if either $p_0 = p_2$ or $p_1 = p_3$ then the xor of both output bits is balanced.
The idea proposed by Lambin et al. is, given an $n$-bit function $f$, to generate all the possible invertible $n \times n$ matrices and to compose them with $f$. Then all the corresponding propagation tables are built. From there, several matrices may lead to the same propagation table and so one may consider classes of equivalence to reduce the search space. However, the number of invertible matrices is around $O(2^{n^2})$, and it seems very complicated to go much further than $n = 6$. Hence they restrict themselves to cases where the linear mapping $L_{in}$ (resp. $L_{out}$) is applied in front (resp. back) of a 4-bit Sbox.
As a result, they show that 10-round RECTANGLE [ZBL+15] can be distinguished while previous best known distinguishers could not target more than 9 rounds.
Remark. Actually they do not run the search procedure $2^{n^2} \times 2^{n^2} = 2^{2n^2}$ times. For each invertible matrix they combine it with the Sbox and compute its propagation table. Then if two matrices lead to the same propagation table they only have to try one of them since both would lead to the same result.
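To give an idea of the growth mentioned above, the exact number of invertible $n \times n$ binary matrices is $\prod_{i=0}^{n-1} (2^n - 2^i)$, a standard counting formula that is not stated in the paper. The short sketch below (hypothetical function name) evaluates it for small $n$.

```python
def count_invertible(n):
    """Number of invertible n x n matrices over F_2."""
    total = 1
    for i in range(n):
        # The (i+1)-th row must avoid the span of the i previous rows.
        total *= (1 << n) - (1 << i)
    return total


for n in (3, 4, 6):
    print(n, count_invertible(n))
# 3 168
# 4 20160
# 6 20158709760
```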

Advanced Division Property Search
In this section we present our new ideas to improve division property based search procedures for integral distinguishers.

Reducing Propagation Tables
Since searching for integral distinguishers boils down to exhausting all the possible trails which reach a unit vector, its complexity directly depends on the number of possible trails. We saw in previous sections that if both trails $m_0 \xrightarrow{f} m_1$ and $m_0 \xrightarrow{f} m_1'$ are valid and if $m_1 \succeq m_1'$, then we can consider only the second one. This property allows reducing the number of elements stored in the propagation tables and decreasing the number of possible trails to try in order to find integral distinguishers. Note that this property is local to a function. However, in practice, we search for trails through the composition of many functions and we propose to go further. Let we have: Our objective is to study division trails through $f$: Our idea is to remove unnecessary elements from both propagation tables $T_{f_0^0}$ and $T_{f_1^0}$ of $f_0^0$ and $f_1^0$ respectively. Let $Y_2$ and $Y_3$ be the sets containing all the possible values for $y_2$ and $y_3$ respectively as output of $T_{f_1^0}$ and let $(y_0, y_1)$ and $(y_0', y_1')$ be two outputs of $T_{f_0^0}$. We say that $(y_0, y_1)$ is smaller than $(y_0', y_1')$ if and only if both the following conditions are satisfied: such that u u This means that for all trails going through $(y_0', y_1')$, there is a trail going through $(y_0, y_1)$ reaching a smaller output after $f^1$. Hence the propagation table of $f_0^0$ can be reduced using this (partial) order by keeping only the minimal elements on each line. This order can be easily extended recursively to more rounds added at the beginning. Constructing the sets $Y_i$ is free as it can be done while constructing the propagation tables. Then at each step we only need to remember whether $y_i$ is smaller than $y_i'$ or not.
In practice we found this technique to be very efficient to remove elements in the propagation tables. For instance on SKINNY-64 using this technique decreases the running time of our tool by up to a factor of 3.

Larger Tables and Super-Sboxes
For many SPN-based block ciphers, the internal state is a $4 \times 4$ matrix of cells (typically 4 or 8 bits) and the round function is the composition of:
• a SubCells (SC) operation, applying the same Sbox to each cell independently;
• a MixColumns (MC) operation, applying a linear transformation on each column independently;
• an AddRoundKey (ARK) operation, xoring the round key to the internal state;
• a CellsPerm (CP) operation, permuting the cells of the internal state.
Typically, the search for integral distinguishers is done by exhausting trails going through each layer successively. But given an $n$-bit function, Algorithm 1 produces the propagation table in roughly $O(2^{3n})$ operations, which is practical up to $n \approx 16$. Hence it seems possible to improve the precision of the propagation by considering 16-bit Sboxes, and more precisely Super-Sboxes. Since all operations except the cell permutation act on columns and since $SC \circ CP = CP \circ SC$, two rounds can be rewritten as $CP \circ ARK \circ MC \circ CP \circ SSC$ where SSC acts on each column independently.
Our idea is to build the propagation table for each of the 4 parts of the SSC operation. Because of the key addition between the two layers of Sboxes, a naive approach would require running Algorithm 1 for all possible values of the (part of the) round key used in the Super-Sbox and then merging the propagation tables. This would quickly make the computation intractable. Instead we propose a new version of Algorithm 1, taking as input a collection of $k$ $n$-bit functions and outputting the propagation table containing all the transitions valid for at least one of the functions. This is described in Algorithm 2 and the time complexity of this algorithm is in $O(kn2^{2n} + 2^{3n})$. Note that the typical value for $k$ is $2^n$ and so our algorithm has complexity $O(n2^{3n})$, to be compared to $O(2^{4n})$, the cost of calling Algorithm 1 $2^n$ times.
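As a worked instance of this comparison, the following derivation simply substitutes $k = 2^n$ and then $n = 16$ (the Super-Sbox size considered in this paper) into the two complexities; constants hidden by the $O$ notation are ignored.

```latex
\begin{align*}
  k n 2^{2n} + 2^{3n} \Big|_{k = 2^n} &= n 2^{3n} + 2^{3n} = (n+1)\, 2^{3n}, \\
  k \cdot 2^{3n} \Big|_{k = 2^n} &= 2^{4n}, \\
  n = 16: \quad (n+1)\, 2^{3n} &= 17 \cdot 2^{48} \approx 2^{52.1},
  \qquad 2^{4n} = 2^{64}, \\
  \frac{2^{4n}}{(n+1)\, 2^{3n}} &= \frac{2^{16}}{17} \approx 3855.
\end{align*}
```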

Algorithm 2: Building propagation table of a collection of functions
Data: a collection $F$ of $k$ $n$-bit to $n$-bit functions
Result: $T$, the propagation table associated to $F$
init $T$ as empty

The core idea of our algorithm is to first compute all the products for all the $n$-bit functions and to store $u$ in $T[m]$ only if the monomial $m$ appears in one of the polynomials $f^u$ while, in Algorithm 1, $u$ is stored whenever one monomial contains $m$. Only then the
Organizing the computation according to the degree of monomials we can remove one step, and perform operations on smaller sets:
We successfully ran this algorithm to generate the propagation tables associated to Super-Sboxes of many block ciphers, including for instance SKINNY-64 [BJK+16], Midori-64 [BBI+15] and PRESENT [BKL+07].
Remark 1. Note that the complexity $n2^{3n}$ is an upper bound which is never reached in practice. Furthermore all operations are very simple, sequential (cache-friendly) and easy to vectorize and parallelize. Hence the algorithm is practical for 16-bit Super-Sboxes and for instance it requires less than an hour on a 128-core server to build the
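One possible reading of the two-phase idea described above is sketched below. It is an assumption-laden illustration rather than the authors' Algorithm 2: the second phase (a single closure pass shared by all the functions) is our interpretation of the text, the degree-based optimization is not reproduced, and the toy collection $x \mapsto S(S(x) \oplus k)$ built from the 4-bit PRESENT Sbox, as well as all function names, are hypothetical.

```python
def weight(v):
    return bin(v).count("1")


def minimal(vectors):
    kept = []
    for v in sorted(set(vectors), key=weight):
        if not any((k & v) == k for k in kept):
            kept.append(v)
    return sorted(kept)


def anf(truth_table, n):
    coeffs = list(truth_table)
    for i in range(n):
        for x in range(1 << n):
            if (x >> i) & 1:
                coeffs[x] ^= coeffs[x ^ (1 << i)]
    return coeffs


def collection_table(functions, n):
    size = 1 << n
    # Phase 1: appears[m] = all u such that the monomial m occurs in the
    # ANF of f^u for at least one f (k * 2^n Moebius transforms).
    appears = [set() for _ in range(size)]
    for f in functions:
        for u in range(size):
            product = [1 if (f[x] & u) == u else 0 for x in range(size)]
            for m, coeff in enumerate(anf(product, n)):
                if coeff:
                    appears[m].add(u)
    # Phase 2: one closure pass, independent of k: m -> u is valid as soon
    # as u was recorded for some monomial containing m.
    table = []
    for m in range(size):
        reachable = set()
        for bigger in range(size):
            if (bigger & m) == m:
                reachable |= appears[bigger]
        table.append(minimal(reachable))
    return table


S = [0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2]
keyed = [[S[S[x] ^ key] for x in range(16)] for key in range(16)]
merged = collection_table(keyed, 4)
print(merged[0], merged[15])  # [0] [15]
```

In this sketch only the first phase depends on $k$, which is where the $nk2^{2n}$ term of the complexity comes from, while the closure pass costs about $2^{3n}$ operations regardless of the number of functions.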

Linear Mapping at the Output
As explained in Section 3.2, in [LDF20] Lambin et al. suggested composing the last round with an invertible matrix to extend the set of distinguishers we can search for. But since we are looking for integral distinguishers, we are only interested in knowing whether the $i$-th bit of the output is balanced or not. Hence there is no reason to consider invertible matrices: linear combinations are enough, reducing the number of mappings to try from $O(2^{n^2})$ to $O(2^n)$.
To illustrate this, let us consider the following example: Assuming this is the ANF of the last round, the bits 0, 1 and 2 are balanced if and only if there is no trail reaching at least one monomial of $\{x, y, z, xy, xz\}$, $\{x, y, z, xz, yz\}$ and $\{x, y, z, xy, yz\}$ respectively.
Let us now look at all the linear combinations of $b_0$, $b_1$ and $b_2$:
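The enumeration of the $2^n - 1$ non-zero linear combinations can be sketched as follows. The ANFs used here are hypothetical: they were chosen only so that the monomials of $b_0$, $b_1$ and $b_2$ match the three sets listed above, and they are not claimed to be the example of the paper. A polynomial is represented by the set of its monomials, each encoded as a bit mask.

```python
X, Y, Z = 0b001, 0b010, 0b100

# Hypothetical ANFs whose monomial sets match the three sets given above.
anfs = [
    {X, Y, Z, X | Y, X | Z},  # b0
    {X, Y, Z, X | Z, Y | Z},  # b1
    {X, Y, Z, X | Y, Y | Z},  # b2
]


def combination_anf(anfs, mask):
    """ANF of the XOR of the output bits selected by mask."""
    monomials = set()
    for i, poly in enumerate(anfs):
        if (mask >> i) & 1:
            monomials ^= poly  # XOR of polynomials = symmetric difference
    return monomials


combos = {mask: combination_anf(anfs, mask) for mask in range(1, 1 << len(anfs))}
for mask in sorted(combos):
    print(format(mask, "03b"), sorted(combos[mask]))
```

With these assumed ANFs, the combination $b_0 \oplus b_1 \oplus b_2$ only contains the monomials $x$, $y$ and $z$: all degree-2 terms cancel, so this combination may be balanced even when none of the three individual bits is.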

Linear Mapping at the Input
As for linear combinations at the output, it is not required to try all invertible matrices to cover the whole search space. Actually, what matters for integral distinguishers is the vector space spanned by constant (linear combinations of) bits. Indeed, let $P(x_1, \ldots, x_n)$ be a polynomial and let $H(i, j)$ be the property that a polynomial does not contain any monomial greater than or equal to $x_i \cdots x_j$. We know there exist two polynomials $P_1$ and $Q_1$ such that $P(x_1, \ldots, x_n) = x_1 P_1(x_2, \ldots, x_n) \oplus Q_1(x_2, \ldots, x_n)$. In particular, for any $k \in \{1, \ldots, n\}$, $P$ satisfies $H(1, k)$ if and only if $P_1$ satisfies $H(2, k)$. Now let $j \in \{2, \ldots, n\}$ and consider the polynomial $P'(x_1, \ldots, x_n) = P(x_1 \oplus x_j, x_2, \ldots, x_n)$. We have the following equalities:

Search Algorithm
In [TM16], Todo and Morii proposed a way to look for integral distinguishers based on the division property, with a complexity upper bounded by $2^n$, where $n$ is the block size of the block cipher. In practice, they said that their algorithm is not suitable for block ciphers with block size beyond 32 bits, and thus the number of possible targets is very limited. However, a lot of work has been done towards efficiently searching for such distinguishers, based on either MILP or SAT/SMT solvers. Regarding MILP-based search algorithms, the main point is to generate sets of inequalities describing all the propagation tables involved in the decomposition of the cipher. But the number of inequalities required to describe a 16-bit propagation table seems too large to be handled efficiently by any MILP solver. For instance, the propagation table of the Super-Sbox of Midori-64 contains approximately $2^{23}$ elements. Hence we developed a dedicated algorithm to search for integral distinguishers. To the best of our knowledge, this is the first time one shows a practical algorithm to search for division trails on 64-bit block ciphers not relying on generic solvers for MILP, SAT or SMT models.

Dedicated Tool to Search for Integral Distinguishers
We aimed at developing a tool to search for integral distinguishers that is able to handle large propagation tables, in order to increase the precision compared to previous approaches. To simplify the implementation process, and to be as fast as possible, we restrict ourselves to answering the following question: is there an integral distinguisher? Indeed, we believe this is the most important question and most often the only one of interest to designers. Hence we did not try to improve on the data complexity nor the number of balanced bits. Actually, finding the integral distinguisher with the smallest possible data is a very hard task and seems completely out of reach if we consider linear mappings at the input.
$\{0, 1, \ldots, 15\}$. We denote by $x_i^j$ the $i$-th nibble at the input of round $j$ and by $y_i^j$ the $i$-th nibble at the output of round $j$. The relation between those variables is as follows: . . The functions may depend on some key bits so Super-Sboxes are handled. However we do not take care of the key-schedule, and key bits involved in different functions are considered as independent. This representation is generic enough to handle most SPN-based block ciphers with a 64-bit internal state. Note that here means that the transition from $(x^j_{4i}, x^j_{4i+1}, x^j_{4i+2}, x^j_{4i+3})$ to $(y^j_{4i}, y^j_{4i+1}, y^j_{4i+2}, y^j_{4i+3})$ should be valid regarding the propagation table of $f_i^j$.
First layer. This corresponds to function minimalInputs in Algorithm 3 and aims at finding which linear mapping to add in front of the cipher. Here first layer means the first application of the four Super-Sboxes (one for each column). We only want to check for the existence of an integral distinguisher so our input division property has a Hamming weight of 63 (i.e. the input of one Super-Sbox has Hamming weight 15 while the 3 other ones have Hamming weight 16). For each column, we try the $2^{16} - 1$ possible linear combinations for the constant bit and keep only the minimal ones.
Here, minimal means the smallest set of inputs such that if there is no integral distinguisher from the inputs of the set then there is no integral property on the cipher.
Note that this step of the algorithm is equivalent to generating $2^{16} - 1$ invertible matrices $L_{in}$ such that the first line takes all the possible non-zero values, then computing the propagation table of $f_i^0 \circ L_{in}$ and extracting the line $01\ldots1$ of the table. However, once we compute all the products of the components of $f_i^0$, getting all the sets we want is straightforward. Let us consider the following example: The only information we need to remember is the monomials of degree at least $n - 1$ (in this example $n = 3$) and then we only work with the simplified following version: Now, combining $f$ with a linear mapping $L_{in} = (\alpha_{i,j})$ modifies monomials as follows:
$xy \longrightarrow \alpha_{0,0}\, xy \oplus \alpha_{0,1}\, xz \oplus \alpha_{0,2}\, yz \oplus p_0(x, y, z)$
$xz \longrightarrow \alpha_{1,0}\, xy \oplus \alpha_{1,1}\, xz \oplus \alpha_{1,2}\, yz \oplus p_1(x, y, z)$
$yz \longrightarrow \alpha_{2,0}\, xy \oplus \alpha_{2,1}\, xz \oplus \alpha_{2,2}\, yz \oplus p_2(x, y, z)$
$xyz \longrightarrow xyz \oplus q(x, y, z)$
where the $p_i$'s have degree at most 1 and $q$ at most 2. But we are only interested in the $u$'s such that $f^u$ contains a monomial containing $yz$. Hence we can restrict ourselves to: Now, we can try the $2^3 - 1$ possible linear combinations and we find that the only sets we have to try are $\{001, 110\}$, $\{010, 101\}$ and $\{100, 011\}$ corresponding respectively to $x \oplus z$, $y$ and $z$ being constant. Indeed, all other sets have smaller elements and so will not lead to a distinguisher if those ones do not.
Last layer. This corresponds to function minimalOutputs in Algorithm 3. As for the first layer, the last one is handled separately. For each column, we try the $2^{16} - 1$ possible linear combinations for the balanced bit and keep only the minimal ones, i.e. the smallest set of outputs required to check whether there is an integral distinguisher against the cipher or not. This was illustrated in Section 4.3.

Middle layers.
We begin by constructing the propagation tables for each of the $f_i^j$'s. Then we reduce the tables using the improvement described in Section 4.1. Then for each input/output pair constructed in the previous steps, we exhaust all trails with a classical branch-and-bound approach. At each step we look at all the unset variables, guess the one with the fewest possible values and propagate this information, which may reduce the number of possible values for other variables. If we find a trail then there is no integral distinguisher for the pair. Otherwise the pair is saved and will be returned by the algorithm. An example illustrating the behavior of Algorithm 4 is given in Section A.
Complexity. It is quite hard to evaluate the time complexity as it depends on too many parameters. First it is important to notice that we have fewer transitions than in the classical approach. For instance, if we would like to know whether a transition $x \to y$ is valid through a Super-Sbox, with our approach we only have to look in the propagation table of the Super-Sbox. But in the classical approach we would have to find $u$ and $v$ such that both $x \to u$ and $v \to y$ are valid transitions through the SubCells layer and such that $u \to v$ is a valid transition through the linear layer. Furthermore there may be many couples $(u, v)$ satisfying those conditions. It also seems that sometimes searching for integral distinguishers is much easier than we would expect. Our tool was designed to focus on trails with inputs of Hamming weight $n - 1$ and outputs of Hamming weight 1, which highly restricts the possible values of intermediate variables.
In practice, the running time of our tool was reasonable for all the ciphers we tried except for one: LED [GPPR11]. We believe this is due to the complex linear layer of LED, which leads to a very high number of possible transitions.
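The guess-and-propagate strategy described for the middle layers can be illustrated on a deliberately simplified model. The sketch below is not the authors' tool: it assumes a plain chain of variables (one per round boundary) instead of the nibble-level state, represents each propagation table as a hypothetical set of allowed (input, output) pairs, and uses made-up function names.

```python
def propagate(domains, relations):
    """Drop values without support in a neighbouring variable until a fixed
    point is reached. Returns False if some domain becomes empty."""
    changed = True
    while changed:
        changed = False
        for j, rel in enumerate(relations):
            left = {a for a in domains[j]
                    if any((a, b) in rel for b in domains[j + 1])}
            right = {b for b in domains[j + 1]
                     if any((a, b) in rel for a in domains[j])}
            if left != domains[j] or right != domains[j + 1]:
                domains[j], domains[j + 1] = left, right
                changed = True
    return all(domains)


def find_trail(domains, relations):
    """Return one trail (a value per variable) or None if none exists."""
    domains = [set(d) for d in domains]
    if not propagate(domains, relations):
        return None
    unset = [j for j, d in enumerate(domains) if len(d) > 1]
    if not unset:
        return [next(iter(d)) for d in domains]
    # Guess the unset variable with the fewest possible values.
    j = min(unset, key=lambda i: len(domains[i]))
    for value in sorted(domains[j]):
        trial = [set(d) for d in domains]
        trial[j] = {value}
        trail = find_trail(trial, relations)
        if trail is not None:
            return trail
    return None


# Toy example with two rounds on 3-bit values (tables are assumptions).
round0 = {(7, 3), (7, 5)}
round1 = {(3, 1), (5, 6)}
anything = set(range(8))
print(find_trail([{7}, anything, {1}], [round0, round1]))  # [7, 3, 1]
print(find_trail([{7}, anything, {2}], [round0, round1]))  # None
```

In this toy model, a returned trail means the input/output pair does not yield a distinguisher, while `None` corresponds to a pair that would be saved and reported.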

Remarks.
We tried several heuristics for the selection of the next variable to guess, and selecting the one with the fewest possible values gave the best results. Note that in our algorithm a variable is a nibble. We tried to guess the internal state variables bit by bit instead of nibble by nibble but that was much worse for most of our targets. We believe this is because for SPNs with a linear layer operating at the word level, bit-based division property is not that far from word-based division property.
The most expensive operation in our algorithm is the propagation, which aims at restricting the number of possible values for all variables. But overall, the main difficulty is the high number of trails to try. Hence, we believe the improvement presented in Section 4.1 is of great importance to improve the efficiency of any approach searching for integral distinguishers.
Note also that Algorithm 3 could be easily extended to bigger internal states but limitation comes from Algorithm 2 which cannot be used to build propagation table for functions on more than 16 bits.

Results
In this section we give the results of our new algorithm combined with the improvements presented in Section 4. We also used our tool to search for low data distinguishers by setting, at the input, each Super-Sbox to a Hamming weight of 0, 15 or 16. While this does not cover all the possible inputs, we obtained some very interesting results.

Midori-64
Midori is a lightweight block cipher designed by Banik et al. and presented at ASIACRYPT'15. It has a classical SPN structure but the MixColumns matrix is a binary non-MDS matrix. An overview of the cipher is depicted in Figure 1, and we refer interested readers to [BBI+15] for the specification of Midori. We decomposed the cipher by alternating Super-Sboxes and linear layers. For the last part of the cipher, if the number of rounds targeted is odd then we consider the 16-bit Sboxes obtained by combining both the linear layer and the SubCell operations.
We were able to study up to 10 rounds of Midori. As a result, we found new integral distinguishers against 9-round Midori-64 and showed there is no such distinguisher against 10 rounds. Note that the previously known best integral distinguisher against Midori-64 reached only 7 rounds and the technique used was described in [ZR19] by Zhang and Rijmen. Furthermore, they claimed that, after applying their method, there is no potential for improving the distinguisher-based attack, a claim which our results invalidate. The fact is that Zhang and Rijmen compute exactly the Sbox propagation tables and the linear layer, but the composition over two rounds is not exact and this is precisely what we exploit.
Integral distinguishers against 6-round Midori-64. We found several distinguishers on 6 rounds requiring only $2^{15}$ data, and the search procedure took 569 CPU-minutes (32 minutes on our 2× AMD EPYC 7742 64-Core server). For instance, if the bits with indices in $\mathrm{ShuffleCell}^{-1}(\{9, 16, \ldots, 63\})$ are constant while we sum on the other ones, then the xor of bits 1, 5, 9 and 13 of the state right after the 6-th application of the SubCell operation is balanced.
Integral distinguishers against 7-round Midori-64. We used our tool to search for integral distinguishers on 7 rounds. We were able to find distinguishers requiring only $2^{45}$ chosen plaintexts. For instance, if at the input the linear combination of bits $b_1 \oplus b_5 \oplus b_9 \oplus b_{13}$ is constant for each of the first three Super-Sboxes and if the whole fourth one is also constant while summing on a complementary space of dimension 45, then the xor of bits 1, 5, 9 and 13 of the state right after the 7-th application of the SubCell operation is balanced.
However, our tool showed its limits on this example. Exhausting distinguishers (inside our restricted search space) with data $2^{48}$ and $2^{47}$ took only a few minutes on our server. But we manually stopped the search for distinguishers requiring $2^{46}$ data after two weeks. For distinguishers with data $2^{45}$ we had to manually force an input/output pair we believed would lead to a distinguisher, and it took our tool a full day to confirm it. This shows how hard it is to search for low data integral distinguishers, at least with our method.
Integral distinguishers against 9-round Midori-64. Our search algorithm took 5,183 CPU-minutes to exhaust the minimal distinguishers (89 minutes on our server). As a result, we found several distinguishers requiring $2^{63}$ data. At the input they require that, for one Super-Sbox, one of the following linear combinations of bits is constant:
In that case, and if summing on a complementary space of dimension 63, for all columns of the state right after the 9-th application of the SubCell operation the following linear combinations of bits are balanced:
Integral distinguishers against 10-round Midori-64. Our search algorithm exhausted the search space in 509 CPU-minutes (11 minutes on our server) and did not find any distinguisher.¹
Propagation tables. We compared the propagation tables of both the Super-Sbox and the classical approach. We found that as soon as the Hamming weight of the input is at least 9 then all lines of the
Remark. Since we considered neither the key schedule nor the round constants (seen as belonging to the key schedule), we were able to reduce the search space by a factor of 4, using the inner symmetries of the cipher.

SKINNY-64
SKINNY [BJK+16] is a family of very lightweight tweakable block ciphers designed by Beierle et al. and presented at CRYPTO'16. The round function of SKINNY is very similar to the one of Midori. It also has a classical SPN structure and a non-MDS binary matrix is used for the MixColumns operation. But the matrix has a low weight (only half of the coefficients are non-zero) and the tweakey is xored to the first two rows only. The round function of SKINNY is depicted in Figure 2 and we refer the interested reader to [BJK+16] for more details. We focused on SKINNY-64, the internal state of the 128-bit version being too big to be handled by our tool. As for Midori, we decomposed the cipher by alternating Super-Sboxes and linear layers. However, we had to generate two different propagation tables for the Super-Sboxes, one for the first column and one for the other columns. Indeed, the third nibble of the first column is xored with the round constant but not with the key. Fortunately, the same constant (0x02) is used for all rounds, so we need neither to build a different propagation table for each round nor to keep track of which rounds we are analysing.
Many integral distinguishers against 10-round SKINNY-64 have been published in recent years (e.g. [BJK+16], [ZR19] or [EKKT18]). While the designers of SKINNY wrote in the original paper that the division property could perhaps be used to slightly extend those results, Zhang and Rijmen claimed in [ZR19] that there is no room for improvement of the results on this type of distinguisher. Indeed, all the previous approaches led to the conclusion that there is no integral distinguisher against 11 or more rounds of SKINNY-64. But with our new technique, we were able to find distinguishers against 11 rounds, showing the benefit in precision obtained by considering Super-Sboxes.
Integral distinguishers against 8-round SKINNY-64. We found several distinguishers on 8 rounds requiring only $2^{15}$ data. For instance, if bit 1 as well as all the bits with indices in $\mathrm{ShiftRows}^{-1}(\{16, \ldots, 63\})$ are constant while we sum on the other ones, then both bit 7 and bit 11 of the state right after the 8-th application of the SubCells operation are balanced.
The search procedure took 125 CPU-minutes to finish (5 minutes on our server).
Integral distinguishers against 10-round SKINNY-64. We found several distinguishers on 10 rounds requiring only $2^{47}$ data. For instance, if we take as constant the bits with indices in $\mathrm{ShiftRows}^{-1}(\{13, 32, \ldots, 47\})$ while we sum on the other ones, then bit 39 of the state right after the 10-th application of the SubCells operation is balanced. The search procedure took 160 CPU-minutes to finish (7 minutes on our server).
Integral distinguishers against 11-round SKINNY-64. The search algorithm took 683 CPU-minutes (22 minutes on our server) to exhaust all the minimal distinguishers. As a result, we found that if bit 1 is constant while summing on all the 63 other bits, then bits 54 and 55 of the state right after the 11-th application of the SubCells operation are balanced. Note that from those results we can deduce further balanced bits, for instance bit 15. Our algorithm does not output it because it only tries a minimal set of (linear combinations of) bits, as explained in Section 5.
Integral distinguishers against 12-round SKINNY-64. Our search algorithm exhausted the search space in 152 CPU-minutes (4 minutes on our server) and did not find any distinguisher.²
Propagation tables. As for Midori, we compared the propagation tables of both the Super-Sbox and the classical approach. Results are also surprising since even for inputs of Hamming weight 2 there are differences:

HIGHT
Figure 3 and we refer interested readers to [HSH+06] for the complete specification.
One round of HIGHT is then composed of a parallel application of those functions followed by a permutation. This shows that Algorithm 2 as well as our tool are very versatile and not restricted to classical SPNs and Super-Sboxes. We ran our search algorithm up to 21 rounds of HIGHT. We found integral distinguishers up to 20 rounds, while the best previously known integral distinguisher against HIGHT could cover only 19 rounds with a data complexity of $2^{63}$ [FTIM19].
Integral distinguishers against 20-round HIGHT. The search algorithm took more than 13 days on our server to exhaust all the minimal distinguishers. As a result, it found 4 distinguishers:
Remark. Our distinguishers against 20-round HIGHT require linear mappings both before and after the cipher, and we did not find other distinguishers. This shows that the "all maximal degree monomials" property used in [HLLT20] does not ensure security against integral distinguishers, contrary to what Hebborn et al. claimed. Indeed, while they do consider the case of a linear mapping added at the end of the cipher, they do not consider the possibility of adding one at the beginning.

Low Data Distinguishers
We experimentally verified the low data distinguishers and the experiments never failed.³ For each experiment all the round keys as well as the constant bits were drawn uniformly at random, showing that our distinguishers are independent of the key schedule, as we expected.
We do not give results for either 8-round Midori-64 or 9-round SKINNY-64. This is because the best integral distinguishers we found have approximately the same data complexity as our distinguishers on 9 and 10 rounds respectively. This comes from the restriction of the search space we added to our tool: at the input of the cipher, the input of each Super-Sbox has Hamming weight 0, 15 or 16.
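Such an experiment can be sketched as follows. This is only an illustration on a hypothetical 8-bit, two-round toy cipher with independent random round keys, not on Midori-64 or SKINNY-64; the Sbox, the mixing layer and all function names are assumptions made for the example.

```python
import random

# Arbitrary 4-bit permutation used by the toy cipher (illustration only).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def make_toy_cipher(rng):
    """Return a 2-round, 8-bit toy cipher with fresh random round keys."""
    keys = [rng.getrandbits(8) for _ in range(3)]

    def sub(x):  # two 4-bit Sboxes in parallel
        return SBOX[x & 0xF] | (SBOX[x >> 4] << 4)

    def mix(x):  # invertible binary mixing of the two nibbles
        lo, hi = x & 0xF, x >> 4
        return (lo ^ hi) | (lo << 4)

    def cipher(x):
        x = sub(x ^ keys[0])
        x = sub(mix(x) ^ keys[1])
        return x ^ keys[2]

    return cipher


def xor_sum(cipher, active_bits, constant):
    """XOR of cipher(x) over all x that agree with `constant` off active_bits."""
    base = constant
    for b in active_bits:
        base &= ~(1 << b)
    acc = 0
    for t in range(1 << len(active_bits)):
        x = base
        for j, b in enumerate(active_bits):
            if (t >> j) & 1:
                x |= 1 << b
        acc ^= cipher(x)
    return acc


def check_distinguisher(make_cipher, active_bits, output_mask, n_bits,
                        trials=20, seed=0):
    """True if the masked output bits xor to zero in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        cipher = make_cipher(rng)            # fresh random round keys
        constant = rng.getrandbits(n_bits)   # fresh random constant bits
        s = xor_sum(cipher, active_bits, constant)
        if bin(s & output_mask).count("1") % 2:
            return False
    return True


if __name__ == "__main__":
    print(check_distinguisher(make_toy_cipher, [0, 1, 2, 3], 0x22, 8))
```

In this toy setting, activating the whole first nibble keeps both nibbles saturated through the two rounds, so every linear combination of output bits is balanced whatever the keys and the constant are. A failed trial would immediately refute a candidate distinguisher, whereas passing trials only provide experimental evidence.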

Conclusion
In this paper we showed how considering larger Sboxes and, especially, Super-Sboxes makes the propagation more accurate. We discovered new integral distinguishers that previous approaches could not find, covering more rounds against three well-studied block ciphers. We also proposed several generic improvements regarding the division property, including an algorithm to reduce the number of trails to try as well as much faster algorithms to add linear mappings around the cipher, reducing the number of mappings to try from $O(2^{n^2})$ to $O(2^n)$.
We also believe this work will challenge the community to handle such large propagation tables with generic solvers for MILP, SAT or SMT models. Furthermore, our search algorithm is quite basic and we are sure there is room for improvement.

Future work.
In this work we built the propagation tables for Super-Sboxes by adding the transitions valid for at least one key. However, it could be interesting to investigate the weak-key setting, for instance by adding a transition only if it is valid for 50% of the keys.
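A minimal sketch of this idea on a toy keyed function is given below. It assumes that a transition u → w is valid for a key k when the monomial x^u appears in the ANF of the output monomial of weight pattern w of S(S(x) ⊕ k); this monomial-based notion of validity, the 4-bit permutation and the function names are assumptions made for illustration and may differ from the tables built by Algorithm 2.

```python
# Arbitrary 4-bit permutation standing in for both Sbox layers (toy example).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def anf_support(truth_table, n):
    """Monomials present in the ANF of a Boolean function (Moebius transform)."""
    coeffs = list(truth_table)
    for i in range(n):
        step = 1 << i
        for x in range(1 << n):
            if x & step:
                coeffs[x] ^= coeffs[x ^ step]
    return {u for u in range(1 << n) if coeffs[u]}


def keyed_transition_counts(sbox, n):
    """counts[(u, w)] = number of keys k for which u -> w is valid
    through the toy keyed function x -> sbox[sbox[x] ^ k]."""
    counts = {}
    for key in range(1 << n):
        for w in range(1 << n):
            tt = [int((sbox[sbox[x] ^ key] & w) == w) for x in range(1 << n)]
            for u in anf_support(tt, n):
                counts[(u, w)] = counts.get((u, w), 0) + 1
    return counts


def propagation_table(counts, min_keys):
    """Keep the transitions valid for at least `min_keys` keys."""
    return {t for t, c in counts.items() if c >= min_keys}


if __name__ == "__main__":
    n = 4
    counts = keyed_transition_counts(SBOX, n)
    any_key = propagation_table(counts, 1)              # current setting
    half = propagation_table(counts, (1 << n) // 2)     # weak-key variant
    print(len(any_key), "transitions valid for at least one key")
    print(len(half), "transitions valid for at least 50% of the keys")
```

Raising the threshold can only remove transitions, so a distinguisher found with the thresholded table would hold for a fraction of the keys rather than for all of them. Quantifying that fraction for a whole trail is left open by this sketch.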

A About the Search Procedure
In this section we detail our search procedure on an example extracted from our tool on SKINNY-64. The procedure is a classical branch-and-bound, guessing variables and propagating constraints to the other ones. Let us assume that after s steps of the algorithm one branch arrives at the internal state described in Table 2. This table contains