SuperBall: A New Approach for MILP Modelings of Boolean Functions

. Mixed Integer Linear Programming (MILP) solver has become one of the most powerful tools of searching for cryptographic characteristics. It has great significance to study the influencing factors of the efficiency of MILP models. For this goal, different types of MILP models should be constructed and carefully studied. As Boolean functions are the fundamental cryptographic components, in this paper, we study the descriptive models of Boolean functions. Here, a descriptive model of a Boolean function refers to a set of integer linear inequalities, where the set of the binary solutions to these inequalities is exactly the support of this Boolean function. Previously, it is hard to construct various types of descriptive models for study, one important reason is that only a few kinds of inequalities can be generated. On seeing this, a new approach, called SuperBall, is proposed to generate inequalities. The SuperBall approach is based on the method of undetermined coefficients, and it could generate almost all kinds of inequalities by appending appropriate constraints. Besides, the Sasaki-Todo Algorithm is also improved to construct the descriptive models from a set of candidate inequalities by considering both their sizes and strengths, while the strengths of descriptive models have not been considered in the previous works. As applications, we constructed several types of descriptive models for the Sboxes of Liliput , SKINNY -128, and AES. The experimental results first prove that the diversity of the inequalities generated by the SuperBall approach is good. More importantly, the results show that the strengths of descriptive model do affect the efficiencies, and although there is not a type of descriptive model having the best efficiency in all experiments, we did find a specific type of descriptive model which has the minimal size and relatively large strength, and the descriptive models of this type have better efficiencies in most of our experiments.


Introduction
In 2009, Borghoff et al. firstly applied the Mixed Integer Linear Programming (MILP) method to attack Bivium [BKS09]. Although they did not get outstanding results at that time, they believed that this kind of attacks was also applicable to other cryptographic algorithms. In 2011, the MILP method was used for cryptanalyses by Mouha et al. [MWGP11] and Wu et al. [WW11] again. The authors transformed the problem of counting the minimal number of active S-boxes into an MILP problem. Mouha et al. proposed the MILP modelings for the XOR operation and the linear transformations, but they could not model the differential properties of Sboxes. Thus, only word-oriented models were constructed, and these models cannot be applied to all kinds of ciphers at that time, e.g. Present [BKL + 07] and Gift [BPP + 17]. In 2014, Sun et al. proposed two methods to model the possible and impossible differential propagations of the Sboxes in [SHW + 14a, SHW + 14b], so they could find the differential characteristics automatically for bit-oriented block ciphers. Since then, MILP methods have been widely applied in cryptanalysis, including differential attacks [SHW + 14a, SHW + 14b, FWG + 16, ZZDX20], impossible differential attacks [ST17b], and cube attacks [XZBL16, SWW17, TIHM18, HW18, WHG + 19, HLM + 20].
Usually, the propagation property of a cryptographic component (e.g. Sbox) can be represented as a Boolean function. For a Boolean function F , to model F by an MILP method is to compute a set of integer linear inequalities such that the set of binary solutions to these inequalities is exactly the support of F . For convenience, we call the set of inequalities that models F as a descriptive model of F . According to [BC20], two problems should be solved if we want to compute a descriptive model of F . Problem 1: How to generate inequalities such that the bit vectors in the support of F are all solutions to these inequalities?
Problem 2: If the first problem is solved, then among the candidate inequalities, how to choose a set of inequalities that models F ? For Problem 1, Sun et al. proposed two methods to generate inequalities in [SHW + 14a, SHW + 14b], the convex hull method and the logical condition method. In the convex hull method, the inequalities were generated by computing the H-representation of the convex hull of all points corresponding to possible differential propagations. The convex hull can be computed by the Sage software [Dev16]. The logical condition method could generate an inequality for every point corresponding to the impossible differential propagation. However, both of the above methods can only work on small Sboxes, e.g. 4-bit Sboxes. In 2017, Abdelkhalek et al. gave the first MILP model for 8-bit Sboxes in [AST + 17], they studied the product-of-sum representation and generated the inequalities via the Espresso algorithm. In 2020, Boura and Coggia proposed several techniques to generate inequalities for Sboxes and linear layers in [BC20]. They extended the logical condition method to exclude more points, and considered the inequalities with zero coefficients. Here, the points excluded by an inequality refer to the points that are not solutions of this inequality. They also constructed a kind of new inequalities, called distorted balls. More importantly, the authors proposed a method of generating new inequalities from existing inequalities. And their experimental results also showed that Boura and Coggia's methods generated more kinds of inequalities than the previous works. Although Boura and Coggia's methods could still only generate some special kinds of inequalities, their work greatly motivated the following studies. In 2021, Udovenko proposed a new approach to generate inequalities in [Udo21]. His idea is similar to ours, but the details are distinct. Both Udovenko's work and ours were done independently almost at the same time. We will provide more details about Udovenko's work in the end of this section.
For Problem 2, Sun et al. gave a greedy method to remove redundant inequalities in [SHW + 14b]. Their idea is to iteratively select the inequality, which excludes the most points, from the set of unselected inequalities. Sun et al.'s method could reduce the size of a descriptive model, but cannot reach the optimal size. In 2017, Sasaki and Todo proposed a new algorithm to reduce the size of a descriptive model automatically in [ST17a]. Specifically, in order to choose a set of inequalities that models a Boolean function F , a binary variable is assigned to each candidate inequality to show if this inequality is selected, and a constraint is added to each bit vector that is outside the support of F . These constraints ensure that each bit vector outside the support of F is rejected by at least one inequality. Then, an MILP model is constructed to find out a descriptive model with the specified size. Generally, the objective of the MILP model is to minimize the number of inequalities, so the inequalities which reject as many bit vectors outside the support of F as possible are preferred. The Sasaki-Todo algorithm has been widely used in the follow-up research.
Motivations: Since MILP models have been widely used in cryptanalyses, it is important to improve the efficiency of solving MILP models. However, it still remains unclear which type of descriptive model could achieve the best efficiency. Previously, the descriptive models that have minimal sizes were assumed to have better efficiencies, but some counter-examples in [ST17a] showed that this type of descriptive model may not be so efficient sometimes. To resolve this problem, various types of descriptive models should be constructed and carefully studied. And for this goal, the first step is to generate various inequalities, because all descriptive models are constructed from a set of candidate inequalities. If the candidate inequalities have less diversity, the differences in the efficiencies between the descriptive models will not be so obvious, which may prevent us from finding out the most efficient descriptive models.
Contributions: Firstly, we propose a novel approach, called the SuperBall 1 approach, to generate inequalities for Boolean functions and the sets of inequalities generated by the SuperBall approach have greater diversities than those in previous. Existing methods could only generate some special kinds of inequalities. Unlike the previous works, the SuperBall approach generates inequalities in a new way. By using the method of undetermined coefficients, we regard the coefficients and constant terms of the inequalities as unknowns, and formulate the constraints that the unknowns should satisfy. Then new inequalities can be obtained by solving the constructed systems. To make the constructed systems easier to study as well as to speed up the solving procedure, we use the divide-and-conquer strategy. Specifically, we divide the set of general inequalities into several patterns, and in each pattern, the signs of coefficients in the inequalities are fixed, which makes the related studies much easier. The detailed studies can be found in Sec. 3. In this paper, we use the concept of diversity to evaluate a set of inequalities. If the number of inequalities is large and the inequalities generated are "different" from each other, we say the set of inequalities has a great diversity. To measure the diversity of the sets of inequalities, we define the diversity index in Definition 5. We compared the diversity indexes of the sets of inequalities generated by our approach with those of [SHW + 14b] and [AST + 17] in Table 1. Because the candidate inequalities of [BC20] and [Udo21] were not available, we generated some other descriptive models with minimal sizes instead. The comparisons are shown in Table 2. The results show that the set of inequalities generated by the SuperBall approach has a greater diversity than previous works.
Secondly, we improve the Sasaki-Todo algorithm by considering both the sizes and strengths of descriptive models, while the strengths of models have not been considered before. "Strength", also called "tightness" in some papers (e.g. [MELR13]), is defined as the search space (relaxed feasible region) that the solver needs to explore in order to find the (optimal integer) solution. The descriptive model of a Boolean function that has the largest strength is just the convex hull of the support of the Boolean function. However, the size, i.e. the number of inequalities, of the convex hull is large. In [Vie15], Vielma pointed out that an efficient descriptive model should balance the size and the strength. But since the strength is hard to be controlled when we are selecting inequalities from a candidate set, we define the concepts of approximate strength and cover rate for a descriptive model. Then we improve the Sasaki-Todo algorithm by bounding the approximate strength and maximizing the cover rate at the same time. Specifically, given an approximate strength s ≥ 0, the improved Sasaki-Todo algorithm computes a descriptive model that has the minimal size and a relatively large cover rate among the ones whose approximate strengths are at least s. The definitions of approximate strength and cover rate, as well as the details about the improved Sasaki-Todo algorithm come in Sec. 4.
Thirdly, we find and suggest a type of descriptive model which has a better efficiency in most of our experiments for the first time, although there is not a type of descriptive models having the best efficiency in all experiments. The great diversity of the set of inequalities enables us to construct several different types of descriptive models for the Sboxes of Liliput, SKINNY-128, and AES, where the types of descriptive models are mainly defined by their strengths. We designed two search tasks, verifying the differential pairs and finding the minimal numbers of active Sboxes, to test the efficiencies of descriptive models. As a result, we found the strengths of descriptive models do affect the efficiencies, and the descriptive models that have minimal sizes and relatively high cover rates often have better efficiencies in our experiments.
Our conclusion seems to contradict with the previous counter-examples in which showed that the descriptive models minimal sizes may not be so efficient sometimes, e.g. experiments in [ST17a]. We think we can explain this phenomenon in two aspects. Firstly, the sets of inequalities generated before the work [BC20] lack good diversity, and the strength factor related to inequalities is also not considered when the inequalities are generated, so we think the diversity of the candidate inequalities used previously is not great enough. Secondly, there exist many descriptive models that have the same minimal size, but the strengths of these descriptive models are distinct. Among these minimal descriptive models, the ones with the cover rate 0 may lead to much worse efficiencies than the ones with high cover rates in our experiments. So we guess the descriptive models used in the counter-examples perhaps have low cover rates.

Relation to Udovenko's work in [Udo21]:
Udovenko proposed a similar idea for generating inequalities concurrently and independently of our work. The original manuscript of our paper [Sun21] was submitted to eprint slightly earlier than Udovenko's paper. Besides the high-level ideas, Udovenko also used the same divide-and-conquer strategy to study the inequalities in specific patterns, but his method of calculating the inequalities is different from our work. Specifically, in order to compute all maximal sets of points that can be removed by a single inequality (referred as maximal 1-removable sets in [Udo21]), Udovenko utilized the techniques for learning monotone Boolean functions. In the enumeration procedure, he first constructed an oracle to check whether a set is 1removable, and then the maximal 1-removable sets can be generated by calling the oracle and updating some 1-removable sets with special properties for a few times, the constrains for these sets can be generated in the same time. But in this paper, we use the method of undetermined coefficients such that the above inequalities can be computed by only one call of the MILP solver. Besides the advantage in the efficiency, the method of undetermined coefficients also gives us a great flexibility to generate various kinds inequalities, e.g. the inequalities that balance the size and strength factors, as well as the inequalities containing many zero coefficients.
All source codes of the algorithms in this paper, the inequalities in the descriptive models, and the MILP models used in the experiments, can be found at https://github. com/ysun0102/superball. Organization: Some necessary notations and preliminaries are presented in Sec. 2. We introduce the SupberBall approach for generating inequalities in Sec. 3, and the improved Sasaki-Todo algorithm is presented in Sec. 4. Experiments are shown in Sec. 5. We conclude this paper in Sec. 6.

Notations
Let F be a Boolean function from F n 2 to F 2 , where F 2 is the field consisting of {0, 1}. In this paper, the n-bit vector v = (v 0 , v 1 , . . . , v n−1 ) ∈ F n 2 is also called as a point and written as v 0 v 1 · · · v n−1 for short. The support of F consists of the points on which F takes 1, i.e. Supp(F ) := {v ∈ F n 2 | F (v) = 1}. Let wt(v) denote the Hamming weight of the point v.
The inequality discussed in this paper always has the following linear form where a i 's and b are integers in Z and x i ∈ {0, 1} ⊂ Z for 0 ≤ i < n. Please note that, in the Boolean function F , operations are bit AND and XOR (⊕), while in the inequality (1), integer multiplication and addition (+) are used. For sake of simplification, we also say x i ∈ F 2 if no confusion arises. Denote the linear polynomial a 0 x 0 + a 1 x 1 + · · · + a n−1 x n−1 + b by f ∈ Z[x 0 , . . . , x n−1 ]. Studying the inequality (1) actually equals to studying the polynomial f . The value of f taking on v = (v 0 , v 1 , . . . , v n−1 ) ∈ F n 2 is the integer a 0 v 0 + · · · + a n−1 v n−1 + a n ∈ Z. The set of solutions to an inequality f ≥ 0 is defined as Sol(f ≥ 0) := {x ∈ F n 2 | f (x) ≥ 0}, and the set of common solutions to several inequalities, say The strength (also called tightness) of a descriptive model reflects the scale of the search space (relaxed feasible region) that the solver needs to explore in order to find the optimal integer solution [MELR13]. The smaller the search space is, the greater the strength of a descriptive model has. To measure the strength factor related to an inequality, we define the strength index of an inequality f ≥ 0 as the cardinality of the set Sol(f = 0). Figure 1 shows that if the sets Sol(f = 0) of inequalities are larger, the descriptive model constructed by these inequalities may have a larger strength. Given two sets of inequalities and Sol(f i = 0) ⊂ Sol(f ′ i = 0). The descriptive models constructed by P and P ′ have the same integer solutions. However, the search space related to P is larger than that of P ′ as shown in Figure 1, so solving the model constructed by P ofter costs more time. To construct good descriptive models which balance the size and the strength, for an inequality f ≥ 0, we also consider the set Sol(f < 0) := {x ∈ F n 2 | f (x) < 0} and the set Sol(f = 0) := {x ∈ F n 2 | f (x) = 0}. We will see in Sec. 4 that an inequality f ≥ 0 with a large Sol(f < 0) is usually selected into the minimal descriptive model.

Some Boolean functions to be modeled
In automatic search models, propagation patterns of components need to be represented by MILP models. We consider three kinds of Boolean functions that are deduced from the Differential Distribution Tables (DDTs) of Sboxes, linear layers, and division property.

Boolean functions from DDTs of Sboxes
Generally, DDTs are used to describe the differential propagation of Sboxes. For an m-bit Sbox S, let α, β ∈ F m 2 be input and output differential pair. Then DDT(α, β) is the cardinality of the set {x ∈ F m 2 | S(x) ⊕ S(x ⊕ α) = β}. The Boolean function deduced from the DDT of S is:

Boolean functions from linear layers
For a linear layer, the output y ∈ F m 2 can usually be represented as a linear transform . Thus, the Boolean function deduced from the linear layer is:

Boolean functions from division property
As the bit-based division property can propagate like the differential characteristics [XZBL16], the Boolean functions can be deduced similarly by enumerating all feasible input and output pairs. However, since the algebraic structures of the propagations are clear, the division property is often modeled by inequalities that use additional auxiliary variables, e.g. [TIHM18, HW18, WHG + 19, HLM + 20, HSWW20].

The SuperBall approach for generating inequalities
Let F be a Boolean function from F n 2 to F 2 , and S be the set Supp(F ) ⊂ F n 2 . To model the boolean function F , we need to find a set of inequalities such that the set of their common solutions is exactly S. For this aim, we first propose a new approach, called SuperBall, to generate various inequalities in this section. Next, we improve Sasaki-Todo algorithm for selecting inequalities in the next section.
To illustrate our approach, we use the following toy example throughout this paper.

Main ideas
Given a set S ∈ F n 2 , to compute an inequality such that the points in S are all solutions to this inequality, our first idea is to use the method of undetermined coefficients. Specifically, in Example 1, to solve for a polynomial f = a 0 x 0 + · · · + a 3 x 3 + b such that S ⊂ Sol(f ≥ 0), we regard a 0 , a 1 , a 2 , a 3 , b as unknowns in Z and solve the following system: (2) The system (2) consists of |S| constraints. Clearly, this system has a huge number of solutions in Z 5 , and each solution (ã 0 ,ã 1 ,ã 2 ,ã 3 ,b) ∈ Z 5 deduces a polynomialf = a 0 x 0 + · · · +ã 3 x 3 +b, as well as an inequalityf ≥ 0. According to the constraints in (2), we must have S ⊂ Sol(f ≥ 0). However, most solutions of the system (2) are useless. For example, if there are two deduced polynomials, say f and f ′ , such that Sol(f ≥ 0) = Sol(f ′ ≥ 0) and Sol(f = 0) = Sol(f ′ = 0), then one of them is useless.
Moreover, in practical applications, we usually have extra requests on the inequalities. For example, we often expect a polynomial f such that the set Sol(f < 0) is as large as possible, since this kind of inequalities is useful to reduce the sizes of descriptive models. To satisfy this request, we hope the following constraints to hold as many as possible: (3) The number of constraints in (3) is 2 n − |S|.
Please note that, unlike the constraints in (2) that should hold at the same time, only some of the constraints in (3) can hold generally. And for the latter case, the difficulty is how to determine which of these constraints could hold simultaneously. In [Udo21], the author used a testing method to resolve this problem. That is, some constraints in (3) are chosen first, and then combined with the constraints in (2), a test is performed to check whether there is a solution to the constructed system. As the number of constraints in (3) is often large, this method is not very efficient. We use a simpler method to resolve this problem in Sec. 3.3.
The method of undetermined coefficients is able to compute a polynomial f such that S ⊂ Sol(f ≥ 0), but it may not be efficient because the numbers of constraints in (2) and (3) are often large, which makes the constructed system hard to solve. So our second idea is to use the divide-and-conquer strategy. Note that the difference between the inequalities a 0 + b ≥ 0 and a 0 + a 2 + b ≥ 0 in the system (2) is only a 2 . If we assume a 2 ≥ 0, then the former inequality implies the latter one. This means the numbers of constraints in (2) and (3) can be reduced significantly if we know the signs of the coefficients a i 's.
Thus, instead of computing a polynomial f = a 0 x 0 + · · · + a 3 x 3 + b when the coefficients a j 's are in Z, we only consider a specific pattern of f each time by fixing the signs of a j 's.

Definition 1.
For a polynomial f = a 0 x 0 + · · · + a n−1 x n−1 + b ∈ Z[x 0 , . . . , x n−1 ], we say f is in a pattern if there exists a set P ⊂ {0, 1, . . . , n − 1} such that a j ≥ 0 for j ∈ P and a j ≤ 0 for j / ∈ P . Let c = (c 0 , c 1 , . . . , c n−1 ) be a point in F n 2 such that c j = 0 for j ∈ P and c j = 1 for j / ∈ P . We call c is the center of this pattern. For simplicity, we also say f is in the pattern c.
If a polynomial f is in the pattern c, then f takes the minimal value on the center c, 0, a 3 ≤ 0} and the center of this pattern is 0011. Then f takes the minimal value on the center point 0011, i.e. x 0 = 0, x 1 = 0, x 2 = 1, x 3 = 1. Particularly, a polynomial may be in several patterns if some of its coefficients are zeros.
By definition, there are 2 n patterns in all. To solve for all possible polynomials (as well as inequalities), we can perform the computations in each pattern first, and then gather all results. In fact, this divide-and-conquer strategy not only speeds up the computations, but also makes the study of inequalities easier in each pattern. Because by the following proposition and corollaries, to compute a polynomial in some pattern c, it suffices to compute a polynomial in the pattern 0. Proposition 1. Letf =ā 0 x 0 + · · · +ā n−1 x n−1 +b be a polynomial in the pattern 0. Let c = (c 0 , . . . , c n−1 ) be a point in F n 2 , and P be the subset of {0, 1, . . . , n − 1} such that c j = 0 for j ∈ P and c j = 1 for j / ∈ P . Denote f := a 0 x 0 + · · · + a n−1 x n−1 + b, where a j :=ā j for j ∈ P , a j := −ā j for j / ∈ P , and b :=b + j / ∈Pā j . Then f is in the pattern c, and for any Proof. Sincef is in the pattern 0, we haveā j ≥ 0 for 0 ≤ j < n. So a j =ā j ≥ 0 for j ∈ P and a j = −ā j ≤ 0 for j / ∈ P , which means f is in the pattern c.
where the second equality follows from that fact that Similarly, we have the following two corollaries. Corollary 3. Given a set E ⊂ F n 2 , to compute a polynomial f in the pattern c such that E = Sol(f < 0), it suffices to compute a polynomialf in the pattern 0 such that {v ⊕ c | v ∈ E} = Sol(f < 0).
By the above corollaries, we can see that computing a polynomial in the pattern c can always be converted to computing a polynomial in the pattern 0. All we need to do is to transform related point v to v ⊕ c. Thus, it suffices to study the polynomials in the pattern 0 in the rest of this section. For convenience, if a polynomial f is in the pattern 0, we say it is in the normal form.

Properties of points in F n 2 with respect to a setS
By Cor. 1, given a set S ⊂ F n 2 , to compute a polynomial f in a pattern c such that S ⊂ Sol(f ≥ 0), it suffices to calculate a polynomialf in the normal form such that before discussing how to computef using the method of undetermined coefficients, we study the properties of the points inS and F n 2 \S first. The properties of these points will significantly reduce the number of constraints that are used in the computation.
We still use Example 1 to illustrate the related conceptions. To compute a polynomial in the pattern c = 0011 with respect to the set S = {1000, 1010, 0110, 1110, 1001, 0101, 1101} ⊂ F 4 2 , it suffices to compute a polynomialf =ā 0 x 0 + · · · +ā 3 x 3 +b in the normal form such thatS ⊂ Sol(f ≥ 0) whereS = {1011, 1001, 0101, 1101, 1010, 0110, 1110}. To ensure that the conditionS ⊂ Sol(f ≥ 0) holds, the undetermined coefficientsā 0 ,ā 1 ,ā 2 ,ā 3 and constant termb must satisfy the following constraints: (4) Note that, sinceā j ≥ 0 for 0 ≤ j < n, some constraints in (4) can be implied by others, Similarly, if we require the set Sol(f < 0) is as large as possible, we need the following constraints to hold as many as possible: However, we can now determine the constraintf (1111) < 0 cannot hold, because we havef (1011) ≥ 0 andā 1 ≥ 0. That is, although the point 1111 ∈ F n 2 \S, we must have The above example shows that the properties of points in S and those of points in F n 2 \ S are different. To study the points, we need the following partial ordering and theorem.

Definition 2. For two points
Theorem 1. Letf =ā 0 x 0 + · · · +ā n−1 x n−1 +b be a polynomial in the normal form, and v, u ∈ F n 2 be two points such that v ⪯ u. Then we havef The proof of theorem is straightforward sinceā j ≥ 0 for 0 ≤ j < n. Next, we give the definitions of some special points in F n 2 w.r.t.S. Given a setS, Alg. 1 computes the region, border, and minimal border points.

Computing a polynomialf in the normal form
Our final goal is to compute a set of inequalities to model a Boolean function F , such that the set of common solutions to these inequalities are exactly the set Supp(F ). As we will consider the sizes and strengths of the descriptive models, the size and strength factors of the inequalities should also be considered when they are generated. According to the Sasaki-Todo algorithm, an inequality f ≥ 0 is preferred if the set Sol(f < 0) is large, because this kind of inequalities helps to reduce the sizes of descriptive models. So the size factor of the inequality is related to the set Sol(f < 0). By the definition of strength index, the strength factor of the inequality f ≥ 0 is related to the set Sol(f = 0). To make our approaches more understandable, we first discuss how to generate an inequality by only considering its size factor. By Cor. 1, we only need to focus on the problem: given a setS, how to compute a polynomialf in the normal form, such that S ⊂ Sol(f ≥ 0) and the set Sol(f < 0) is as large as possible.
Again holds. Next, we use the following MILP model to solve for the coefficients off , wheref (v) has the form ofā 0 x 0 +ā 1 x 1 +ā 2 x 2 +ā 3 x 3 +b: where γ is a positive integer and its value is discussed below.
Please note that in the above MILP model,ā j 's are all unknown positive integers and b is an unknown integer, and they have lower and upper bounds in an MILP solver. For a 0 ,ā 1 ,ā 2 ,ā 3 , we assume they are not bigger than a positive integer R, i.e.ā j ∈ [0, R], which means 0 ≤ā j ≤ R. Here, the constant R is usually chosen as a multiple of the dimension n = 4, e.g. R = 25n. With this setting, the value ofā 0 x 0 +ā 1 x 1 +ā 2 x 2 +ā 3 x 3 ranges from 0 to 4R, so the range ofb is set as [−4R, 0]. Because ifb < −4R orb > 0, we always havef (v) < 0 andf (v) > 0 for all v ∈ F 4 2 . Therefore, the range of the values off is [−4R, 4R]. Then the constant γ is often set as 4R + 1, such that if y v = 1, the constraint always holds and hence does not constrain the values ofā j 's andb.
To increase the strength of an inequalityf ≥ 0, we expect the set Sol(f = 0) also contains as many border points as possible, because we believe in this case the inequalitȳ f ≥ 0 can eliminate many useless numerical solutions and hence leads to a better strength.
To compute a polynomialf by considering both Sol(f < 0) and Sol(f = 0) ∩ {border points ofS}, the idea is the same as that used to maximize the size of Sol(f < 0). Specifically, we introduce a new auxiliary binary variable, say z v , to determine whether a border point v is in Sol(f = 0). Assume the sets Region, Border, and miniBorder contain all region, border, and minimal border points w.r.t.S. We use the following MILP model to compute the desired polynomials: where γ is a positive integer. And as discussed in the above, we usually choose γ = nR + 1 and R is the upper bound of the coefficients inf .
In the model (6), we have two parameters, α and β, to adjust the weights of points in Sol(f < 0) and Sol(f = 0) ∩ Border. For example, if we set α = 2 n and β = 1, then the above model will first try to find the polynomialsf 's such that Sol(f < 0) is maximal, and then among thesef 's, the model finds the one such that the set Sol(f = 0) ∩ Border is maximal. Note that the MILP model (5)  Computing irredundant polynomials: By adjusting the values of α and β in the model (6), we can generate many polynomials, but we need to prevent the model from computing the redundant polynomials. We define a polynomialf is redundant if there exists another polynomialf ′ such that Sol(f < 0) ⊂ Sol(f ′ < 0) and Sol(f = 0) ⊂ Sol(f ′ = 0).
We briefly talk about the complexity of Alg. 2. The complexity of Alg. 2 contains two parts. The first part is the complexity of computing one polynomial. This complexity is usually determined by the number of constraints added in Line 8 ∼ 10. According to our experiments, the size of the set Region dominates this complexity. That is, the larger the set Region is, the longer time it takes to compute a polynomial. The second part is the number of possible irredundant polynomials, which is usually determined by the size of Region ∪ Border. Generally, if the dimension n is large, e.g. n ≥ 12, it takes too long to obtain all irredundant polynomials. So in this case, we often stop the algorithm after a number of polynomials have been obtained. Computing other special polynomials: More special polynomials can also be generated by adding corresponding constraints to the model (6). For example, we can even generate the sparsest polynomials by using auxiliary binary variables to determine whether coefficients are 0's. Specifically, assumeā j is a coefficient ranging from 0 to R. Let w j be a binary variable, then the following constraints could determine whetherā j is 0: Note that if w j = 1, we must haveā j = 0; otherwise, we have 1 ≤ā j ≤ R. So by adding w j to the objective function, we can control the number of nonzero coefficients inf .

Generating a descriptive model of a Boolean function
Generally, a descriptive model is used as a component of automatic search programs. Different descriptive models of the same Boolean function often lead to distinct efficiencies of automatic search programs. Given a set of candidate inequalities, Sasaki and Todo proposed an algorithm to select inequalities such that the number of selected inequalities is minimal in [ST17a], i.e. the Sasaki-Todo algorithm generates a descriptive model with the minimal size. This algorithm was also used in the follow-up research, including [BC20,Udo21]. However, it is found that the descriptive model with the minimal size does not always lead to the best efficiency of the automatic search programs. For this phenomenon, we think there are two possible reasons. Firstly, the inequalities in consideration are not good enough, because before this paper, the kinds of inequalities that can be generated were quite limited. Secondly, the Sasaki-Todo algorithm only considers the size of the descriptive models but ignores the influence from the strength of the descriptive models.
In [Vie15], the author pointed out that the descriptive models which balance the size and strength usually lead to better efficiencies. We agree with this viewpoint, but think it is not easy to apply this viewpoint to practical computations, because the strengths of the descriptive models are hard to be calculated when we are selecting the inequalities. So we need an approximate concept to evaluate the strengths of descriptive models.
Given a Boolean function F , the convex hull of Supp(F ) has the largest strength [Hoj21]. Intuitively, one important characteristic of the convex hull is that, every point on the "surface" of Supp(F ) often lies on many hyperplanes of the convex hull. We plan to use this feature to define the approximate strength of a descriptive model. Note that the point on the "surface" is in fact a border point with respect to some set, and a point v lying on a hyperplane defined by the inequality f ≥ 0 equals to v ∈ Sol(f = 0). Then we have the following definition.
. . , f m ≥ 0} be a set of inequalities that models F , we define the approximate strength of D as where | · | means the cardinality of a set. Particularly, to differentiate the descriptive models whose approximate strengths are 0's, we define the cover rate of D as Please note that, for a point v, the number |{f i | v ∈ Sol(f i = 0), 0 ≤ i ≤ m}| is actually the number of hyperplanes that v lies on. Besides, if the approximate strength of D is not 0, then its cover rate is surely 1. So the cover rate is mainly used to compare the strengths of descriptive models whose approximate strengths are 0's. With conception of the approximate strength, then we can classify the descriptive models into different types according to their approximate strengths.
Since B is the set of all border points, we have B ⊆ Supp(F ) according to Definition 3. Actually, we always have B = Supp(F ) in our experiments. Besides, we generated many sets of points by random, and computed their convex hulls. The approximate strengths of these convex hulls are all large. So we think the definition of approximate strength is reasonable.
Next, we present an algorithm (Alg. 3) to compute a descriptive model by balancing the size and the approximate strength. This algorithm improves the Sasaki-Todo algorithm, and computes a descriptive model with a minimal size among the descriptive models with an approximate strength being at least s. When s = 0, this algorithm computes a descriptive model with the minimal size, and the cover rate of this descriptive model is maximal in this size.
The constraint in Line 9 ensures the approximate strength of the output descriptive model is at least s. In case s = 0, the objective function 2 n · ( 0≤i≤m d i ) − v∈B c v guarantees the output descriptive models must have the minimal size first, and then guides the solver to find a descriptive model with a relatively high cover rate ( v∈B c v )/|B|.
Next, we compute two descriptive models for the Boolean function in Example 1. The support of the Boolean function is S = {1000, 1010, 0110, 1110, 1001, 0101, 1101} ⊂ F 4 2 . To generate the candidate inequalities, parameters are set as α = 2 4 , β = 1, and R = 100 in Alg. 2, and polynomials in all 2 4 patterns are computed. The following inequalities are the results.

Experiments
We designed two types of experiments. The first type of experiment aims to test the diversity of the inequalities generated by the SuperBall approach. In the second type of experiment, we generated several types of descriptive models of different Boolean functions, trying to find out which type of descriptive model leads to a better efficiency.

Testing the diversity of inequalities
By saying a set of inequalities, say P = {f 0 ≥ 0, f 1 ≥ 0, . . . , f m ≥ 0} has a good diversity, we tried to mean that the number of irredundant inequalities in P is large, and the solutions of the functions f i 's are different from each other. To reflect the diversity of a set of inequalities, we have the following definition.
Definition 5. Let F be a Boolean function from F n 2 to F 2 and P F = {f 0 ≥ 0, f 1 ≥ 0, . . . , f m ≥ 0} be a set of inequalities which describes F . We define the diversity index of P F as where Supp(F ) is the support of F and | · | is the cardinality of a set.
We say P F has a good "diversity", if its diversity index is large. We compared the diversity indexes of the sets of inequalities generated by the SuperBall approach with those generated by the Convex hull method [SHW + 14b] or Espresso method [AST + 17]. In practice, there are off-the-shelf softwares to implement the Convex hull method [SHW + 14b] or Espresso method, such as Sage and Logic Friday. So we generated the inequalities by Sage and Logic Friday, then computed the diversity indexes and the minimal sizes of the descriptive models of Sboxes. Since the method in [SHW + 14b] could not be applied to large Sboxes, we only consider the Boolean functions deduced from the 4-bit Sboxes.
The results in Table 1 show that the diversities of the sets of inequalities generated by our approach are greater than those of [SHW + 14b] and [AST + 17]. Table 1 also lists the maximum values of the diversity index (max D.I.) for each Sbox. For a Boolean function F , we compute its maximum value of the diversity index as follows. For any pair (v 1 , v 2 ) ∈ F n 2 \ Supp(F ), we construct the MILP model via the undetermined coefficients method to verify whether there exists an inequality f ≥ 0 such that f (v 1 ) < 0 and f (v 2 ) < 0. The coefficients of the inequality have the same ranges as Alg. 2. The number of pairs (v 1 , v 2 ) that v 1 and v 2 can be rejected by one inequality simultaneously is the maximum value of the diversity index. For the 4-bit Sbox in Table 1, the diversity index of the set of inequalities generated by the SuperBall approach has reached its maximum value. Besides, we also computed the minimal descriptive models and found that the size of the minimal descriptive models could reflect the diversity of the set of candidate inequalities. That is, for a Boolean function, if its minimal descriptive model has a smaller size, the corresponding set of candidate inequalities always has a greater diversity. Since the sets of the candidate inequalities in [BC20] and [Udo21] are not available, the diversities can not be computed directly. Our way of testing the diversity of inequalities generated by the methods of [BC20] and [Udo21] is as follows. Firstly, we generate three kinds of inequalities via Alg. 2 2 by setting (α = 2 n , β = 1), (α = 1, β = 2 n ), and (α = 1, β = 1), and gather all inequalities. Then, we compute a descriptive model with the minimal size via the Sasaki-Todo algorithm. Thirdly, we compare the sizes of descriptive models with existing best results, e.g. results in [BC20,Udo21]. As the authors in [BC20,Udo21] also used the Sasaki-Todo algorithm to obtain the minimal sizes, the differences between the minimal sizes would reflect the diversities of the set of candidate inequalities. Specifically, the smaller the size is, the greater diversity of inequalities is, which is demonstrated by the results in Table 1.
In the experiments, we considered the Boolean functions deduced from Sboxes, linear layers, and the division property. The comparisons of minimal descriptive models from different Sboxes are shown in Table 2. Generally, if an Sbox has more invalid differential propagations, the number of region points of the Sbox is larger. That is because the region consists of invalid differential propagations. For example, the block cipher SKINNY-128 has 54067 invalid differential propagations through an Sbox, while other 8-bit Sboxes have only 33150, so the maximal number of region points of SKINNY-128 is abnormally higher compared to others. As the number of region points in a pattern reflects the difficulty of computing inequalities, we also list the maximal number of region points among the 2 n patterns in the table. Comparisons of minimal descriptive models from different linear layers and the division properties are shown in Table 3.
From Table 2 and 3, we can see that the sizes of our descriptive models are always the smallest. Udovenko's method found the minimal sizes of descriptive models for many Sboxes, but his method failed for the Sboxes of Keccak , Fides-6, SC2000-6, and AES. We think the main reason is that, Udovenko's method can only compute one kind of inequalities, i.e. his algorithm is equivalent to Alg. 2 by setting α = 2 n and β = 0. But we generated three kinds of inequalities. So the diversity of our inequalities is better than Udovenko's. Besides, we think the efficiency of Udovenko's method also limits its performance.
In all, the experimental results show that the inequalities generated by the SuperBall approach have a great diversity.

Finding out the most efficient descriptive models
To find out the most efficient descriptive models, we considered the Boolean functions deduced from the Sboxes of three block ciphers, Liliput, SKINNY-128, AES. We prepared 4 types of descriptive models generated by Alg. 3 as well as 4 counterparts from existing works. To check the efficiency, we designed two traditional search tasks for each cipher.
Specifically, the descriptive models in consideration were generated as follows. Balance-s: These types of descriptive models were generated in the following two steps. Firstly, three kinds of inequalities were generated using Alg. 2 3 by setting (α = 2 n , β = 1), (α = 1, β = 2 n ), and (α = 1, β = 1). Secondly, Alg. 3 was called to compute the descriptive models by setting the approximate strength bound as s. Particularly, in case s = 0, the descriptive model output by Alg. 3 also has the maximal cover rate, so we denote the descriptive model of this type as Balance-0-crmax.
Balance-0-cr0: The descriptive model of this type was modified 4 from the corresponding descriptive model Balance-0-crmax. For each inequality f ≥ 0 in Balance-0-crmax, we computed a new polynomial f ′ , such that Sol(f ′ < 0) = Sol(f < 0) and Sol(f ′ = 0) = ∅. Thus, the size of Balance-0-cr0 is the same as that of Balance-0-crmax, but there is no border points lying on the hyperplanes of the inequalities of Balance-0-cr0, which means the cover rate of Balance-0-cr0 is 0. So Balance-0-crmax always has a slightly larger strength than Balance-0-cr0. By considering this type of descriptive model, we hope to see whether the strength really affects the efficiency.
Sage+ST: Descriptive models of this kind were constructed by the algorithm in [ST17a]. The candidate inequalities were generated by Sage [Dev16], and then the Sasaki-Todo algorithm was applied to find the descriptive models with the minimal sizes.
Sage+Greedy: Descriptive models of this kind were constructed by the algorithm in [SHW + 14a]. The candidate inequalities were generated by Sage, and then a greedy method was used to reduce the sizes of descriptive models. Sage: The descriptive model of this kind consists of all inequalities generated by Sage. So the descriptive model is in fact the convex hull, and has the largest strengths, as well as large sizes. Espresso: Descriptive models of this kind were constructed by the algorithm in [AST + 17]. The authors used the Espresso algorithm implemented by Logic Friday (http://sontrak.com/) to generate linear inequalities for the DDT of an Sbox. They translated the problem of searching for inequalities into the classical problem of minimization of the product-of-sum representation of Boolean functions, which is a well-studied problem, and Espresso algorithm is a heuristic method to solve this problem.
The sizes and strengths of the above descriptive models are shown in Table 4. Please note that the numbers in the table are all related to one single Sbox. If k Sboxes are used in a cipher, the numbers of constraints per round in the search model should be k times of the corresponding numbers in the table.  -1  22  1  1  189  1  1  3423 1  1  Balance-3  34  3  1  367  3  1  8806 3  1  Sage+ST  23  0 101/106 ------Sage+Greedy  26  0 102/106 ------Sage  324 12  1  ------Espresso  ---377  4  1  8310 1  1 Two traditional search tasks were prepared as follows. Verification of differential pairs: For an r-round cipher, this task is to check whether a given input and output difference pair (δ in , δ out ) is possible, i.e. whether there exists a differential trail connecting δ in and δ out . In each search, the program is stopped if one differential trail is found or the model is proved infeasible. For Liliput and SKINNY, we generated 1000 random input and output pairs, and took the average time. For roundreduced AES algorithm, we only considered 250 random input and output pairs, since each search of AES took more time than the other two ciphers.
Finding the minimal number of active Sboxes: For an r-round cipher, this task is to find out the minimal number of active Sboxes in feasible differential trails. In this task, there are no constraints on the input and output differential pairs. For each round-reduced cipher, we conducted 5 searches and took the average time. Since this task costs too much computing time for the cipher AES, we only report the times for the ciphers Liliput and SKINNY-128.
As we only want to compare the performances of different types of descriptive models for the Sboxes, the constraints of the other components, e.g. the linear layers, are all the same in the experiments. The comparisons are shown in Table 5, 6, and 7. Our platform is AMD Threaddripper 3990x 2.9GHz, 256 GB RAM, running Ubuntu 20.04. The tables also show the numbers of rounds, the numbers of Sboxes in total, and the total numbers of inequalities related to these Sboxes. From the experimental results in Table 5, 6, and 7, we have three observations. Firstly, the strengths of descriptive models affect the efficiency. Note that although many descriptive models have the same minimal size, the strengths of them are still different, while the differences are reflected on the cover rates. This is why we design the comparisons between "Balance-0-crmax" and "Balance-0-cr0". By definition, the cover rate of Balance-0-cr0 is 0, but Balance-0-crmax usually has a high cover rate. The experimental results show that the descriptive model Balance-0-crmax always has a better efficiency than Balance-0-cr0. We think this fact is reasonable, because there are fewer useless numerical solutions to the descriptive model Balance-0-crmax than those of Balance-0-cr0, the MILP solver should have fewer obstacles to find out the useful integer solutions.
Secondly, although there is no type of descriptive model that always has the best efficiency, but the descriptive model Balance-0-crmax, which has the minimal size and a relatively high cover rate, has a better efficiency in most of experiments, particularly when the total number of Sboxes is large. In our opinion, although the descriptive models with larger strengths contain fewer useless numerical solutions, the sizes of the descriptive  models play a more important role in the solving procedure. Specifically, to solve an MILP model, the solvers always finds a numerical solution to the model first, and then searches for integer solutions around the numerical one. We think solving of an MILP model that has a larger strength may save some time in the second phase. But the matrix operations are the fundamental operations in both phases, and the sizes of the matrices are directly determined by the sizes of the descriptive models. So the sizes of descriptive models often affect the efficiency more. The descriptive model Balance-0-crmax has the smallest size and a not bad strength, so we think this is why it performs better in most experiments. Thirdly, if the difference between the sizes is not large, the descriptive models that have larger approximate strengths may have better efficiencies. This phenomenon appears in the task of finding least active Sboxes of SKINNY-128.
Thus, in all, we suggest the type of descriptive model that has the minimal size and relatively high cover rate. Although this type of descriptive model does not achieve the best efficiency all the time, it has a relatively stable performance.

Conclusions
In this paper, we propose the SuperBall approach to generate various inequalities. Since the method of undetermined coefficients is used, we are able to generate any kinds of inequalities by adding appropriate constraints in theory. However, not all inequalities are useful for constructing efficient descriptive models. We focus on the inequalities that could affect the sizes and strengths of descriptive models. Specifically, we only consider the inequality f ≥ 0 such that Sol(f < 0) and Sol(f = 0) are large. But we cannot guarantee the other kinds of inequalities are useless.
Our initial goal is to find a type of descriptive model such that the models of this type always have the best efficiencies. We think this goal is partially reached. We found one type of descriptive model, and the models of this type have better performances in most of our experiments. But we should admit that descriptive models of this type are not always the most efficient. Thus, for other ciphers that are not considered in this paper, we suggest using the descriptive model that has the minimal size and a relatively high cover rate first. If the efficiency of this descriptive model is not good enough, one could try to improve this descriptive model by generating other kinds of inequalities by the SuperBall approach.