Reconstructing an S-box from its Difference Distribution Table

In this paper we study the problem of recovering a secret S-box from its difference distribution table (DDT). While this is an interesting theoretical problem on its own, the ability to recover a secret S-box from its DDT can be used in cryptanalytic attacks where the attacker can obtain the DDT (e.g., in Bar-On et al.'s attack on GOST), in supporting theoretical analysis of the properties of difference distribution tables (e.g., in Boura et al.'s work), or in the analysis of S-boxes with unknown design criteria (e.g., in Biryukov and Perrin's analysis). We show that, using the well-established relation between the DDT and the linear approximation table (LAT), one can devise an algorithm different from the straightforward guess-and-determine (GD) algorithm proposed by Boura et al. Moreover, we show how to exploit this relation and embed the knowledge obtained from it in the GD algorithm. We tested our new algorithm on random S-boxes of different sizes, and for random 14-bit bijective S-boxes, our results outperform the GD attack by several orders of magnitude.


Introduction
Differential cryptanalysis, introduced by Biham and Shamir [BS91], has transformed the field of cryptanalysis and offered attacks against multiple symmetric-key primitives (and a few public-key ones). An essential component in estimating the probability of a differential characteristic is the difference distribution table of an S-box. This table is easy to compute when the S-box is given (in time $O(2^{2n})$ for an n-bit S-box). However, the inverse problem of deducing the S-box from a given DDT was mostly left unstudied.
At first, this problem looks like a theoretical problem of very limited practical interest. However, efficient reconstruction of the S-box from the DDT is a useful tool in several cases. First, several cryptanalytic attacks on constructions with secret S-boxes (such as GOST [GOS98] and Blowfish [Sch94]) may have access to the difference distribution table rather than to the S-box itself. For example, in Bar-On et al.'s slide attack on GOST [BOBDK18], the attacker can learn the DDT and needs to deduce the secret S-box from it.
Another line of research that would benefit from such efficient reconstruction algorithms is the study of the theoretical properties of DDTs. A recent work by Boura et al. [BCJS19] studied a theoretical question: can two different S-boxes that do not satisfy some trivial relation share the same DDT? As part of this work, a guess-and-determine (GD) algorithm for the reconstruction of the S-box was introduced and used. While being practical for small S-boxes, this algorithm's running time was not analyzed for the general case, and it seems that for large S-boxes it may be impractical.
In this paper we tackle the reconstruction problem using a different approach. We rely on the well-established relation between the DDT of an S-box and the S-box's LAT [BLN17,BN13,CV95]. We show that using this relation, it is possible to transform the DDT into multiple candidate linear approximation tables, each of which yields an S-box (which can be easily computed using the Walsh-Hadamard transform).
More precisely, we first use this relation to reconstruct as many of the Boolean functions $c_i \cdot S(x)$ as we can (and need to). For S-boxes with m-bit outputs, reconstructing m such Boolean functions $c_i \cdot S(x)$, where $0 \le i < m$, $0 \le c_i < 2^m$ and $c_0, \dots, c_{m-1}$ are linearly independent, is sufficient to trivially and efficiently reconstruct the S-box S(x).
After analyzing the reconstruction of a single $c_i \cdot S(x)$, we show how to use the knowledge obtained to improve the GD algorithm. We offer a heuristic analysis of the running time of both the GD algorithm (which may be of independent interest) and our approach, suggesting that the combination offers superior results to the previous ones. More precisely, for many types of S-boxes, our algorithm is expected to outperform the GD algorithm.
Finally, we test different types of S-boxes, measuring the time complexity of the actual reconstruction for different S-box sizes. We compare our method with the simple GD algorithm and discuss in which cases our new method provides better performance than the simple guess-and-determine attack. For example, for 8-bit to 8-bit S-boxes, our algorithm seems fairly comparable to the standard GD one. However, as the S-box size increases, our approach becomes significantly better: for 10-bit S-boxes, our approach is on average 10 times faster (and the median is also about 10 times better), for 12-bit S-boxes it is about 4,500 times faster on average, and for 14-bit S-boxes the speed-up is by a factor of $6.8 \cdot 10^6$.

This paper is organized as follows. In Section 2, we discuss the preliminaries of the reconstruction problem, including the DDT, the LAT and the previous works on the relation between an S-box, its DDT and its LAT; the notations used in this paper are also introduced there. In Section 3, the problem of recovering the Boolean function $c_i \cdot S(x)$ is solved by introducing a new problem which we call the sign determination problem. With the knowledge obtained by solving this new problem, the GD algorithm of [BCJS19] is improved in Section 4. Then, our approach is tested on different S-boxes and some special Boolean functions. In Section 5 we compare the performance of our method with the GD algorithm of [BCJS19]. In Section 6, we conclude this paper.

Background and Notations
Throughout the paper we discuss S-boxes with n-bit inputs and m-bit outputs, i.e., n × m S-boxes. When m = n, we refer to the S-box simply as an n-bit S-box. We treat the S-box as a vectorial Boolean function, i.e., $S(x) = (S_{m-1}(x), \dots, S_0(x))$, with m Boolean functions $S_i : \mathbb{F}_2^n \to \mathbb{F}_2$ for $0 \le i < m$. After recalling the definitions of the difference distribution table and the linear approximation table, we revisit the previous work on the relation between them. We then introduce additional notations used in this paper and quickly recall some properties of Hadamard matrices.

Difference Distribution Table and Linear Approximation Table
The difference distribution table (DDT) of an S-box counts the number of cases in which the input difference of a pair is a and the output difference is b (see [BS91]). For an input difference $a \in \mathbb{F}_2^n$ and an output difference $b \in \mathbb{F}_2^m$, the entry δ(a, b) of the S-box's DDT is
$\delta(a, b) = \#\{x \in \mathbb{F}_2^n : S(x) \oplus S(x \oplus a) = b\}$.
In [DR07], Daemen and Rijmen study the distribution of the DDT entries of a random n × m S-box; we denote the resulting probability that a random entry of the DDT is non-zero by $P^{DDT}_{n,m}$.

We follow the work of Boura et al. [BCJS19] and call two S-boxes $S_0(x)$ and $S_1(x)$ DDT-equivalent if they have the same DDT. We also call an element $a \in \mathbb{F}_2^n$ a linear structure of the S-box S(x) if $S(x) \oplus S(x \oplus a)$ is constant. In [BCJS19], a DDT-equivalence class is called trivial when its size matches the lower bound given in Property 1:

Property 1 ([BCJS19]). Let S be a function from $\mathbb{F}_2^n$ into $\mathbb{F}_2^m$ and let ℓ denote the dimension of its linear space, i.e., of the space formed by all linear structures of S. Then, the DDT-equivalence class of S necessarily contains the $2^{m+n-\ell}$ distinct functions of the form $x \mapsto S(x \oplus c) \oplus d$, with $c \in \mathbb{F}_2^n$ and $d \in \mathbb{F}_2^m$.

The differential uniformity is an important characteristic for analysing the resistance to differential cryptanalysis (see [Nyb94]). The differential uniformity of an S-box S(x) is defined as $\max_{a \neq 0,\, b} \delta(a, b)$. The lowest possible value for the differential uniformity of a function from $\mathbb{F}_2^n$ into itself is two, and functions with differential uniformity two are called almost perfect nonlinear (APN). As we discuss in Section 5, with our technique it is harder to reconstruct APN functions with input dimension between 7 and 11 from their DDT than random S-boxes.
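The DDT definition and the notions of differential uniformity and linear structure can be made concrete with a short Python sketch. It assumes the S-box is given as a lookup table of integers; the helper names (`ddt`, `differential_uniformity`, `linear_structures`) are illustrative, and PRESENT's 4-bit S-box is used only as a small test vector. A linear structure a shows up as a DDT row whose whole mass of $2^n$ pairs sits in a single entry.

```python
# Minimal sketch: DDT, differential uniformity and linear structures.
# Function names are illustrative, not taken from any library.

def ddt(S, m):
    """delta[a][b] = #{x : S(x) ^ S(x ^ a) == b} for an n x m S-box S (lookup table)."""
    size = len(S)                            # 2^n inputs
    delta = [[0] * (1 << m) for _ in range(size)]
    for a in range(size):
        for x in range(size):
            delta[a][S[x] ^ S[x ^ a]] += 1
    return delta

def differential_uniformity(delta):
    """max over a != 0 and all b of delta(a, b)."""
    return max(max(row) for row in delta[1:])

def linear_structures(delta):
    """a is a linear structure iff S(x) ^ S(x ^ a) is constant, i.e. iff row a
    of the DDT puts all of its 2^n pairs on a single output difference."""
    size = len(delta)
    return [a for a in range(size) if max(delta[a]) == size]

# PRESENT's 4-bit S-box, used here only as a small test vector.
PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
D = ddt(PRESENT, 4)
print(differential_uniformity(D))            # 4
print(linear_structures(D))                  # [0]
```

For an affine map such as $x \mapsto x \oplus 5$, every a is a linear structure, so by Property 1 its DDT-equivalence class is as small as it can be relative to the $2^{n+m}$ translations.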
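The linear approximation table (LAT) of an S-box is used to derive approximate linear relations between the input bits and the output bits of the S-box [Mat94]. For any input mask $a \in \mathbb{F}_2^n$ and any output mask $b \in \mathbb{F}_2^m$, the LAT entry is defined as
$\lambda(a, b) = \#\{x \in \mathbb{F}_2^n : a \cdot x = b \cdot S(x)\} - 2^{n-1}$,
where $a \cdot x$ and $b \cdot S(x)$ are inner products over $\mathbb{F}_2$, e.g., $a \cdot x = \bigoplus_{i=0}^{n-1} a_i x_i$. In Corollary 6 of [DR07], Daemen and Rijmen also discuss the LAT of a random n × m S-box, which gives the probability that a random entry of the LAT is non-zero; we denote this probability by $P^{LAT}_{n,m}$.

The nonlinearity of a Boolean function f from $\mathbb{F}_2^n$ to $\mathbb{F}_2$ is the minimal number of truth-table entries that must be changed in order to turn f into an affine function. In our case, for each $b \neq 0$, the nonlinearity of the component $b \cdot S(x)$ equals $2^{n-1} - \max_{a} |\lambda(a, b)|$. Let f be a Boolean function on n variables, where n is even, and let $\lambda_f(a)$ denote its LAT entry for output mask 1. Then f is called bent if its nonlinearity reaches the maximal value $2^{n-1} - 2^{n/2-1}$, or equivalently, if $|\lambda_f(a)| = 2^{n/2-1}$ for all $a \in \mathbb{F}_2^n$.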
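The LAT definition, the nonlinearity of a component and the bent criterion can be checked with the following Python sketch. It is written for illustration only: `lat`, `nonlinearity` and `is_bent` are hypothetical helper names, and the bent function $x_0x_1 \oplus x_2x_3$ and PRESENT's S-box serve merely as small test inputs.

```python
# Minimal sketch: LAT, component nonlinearity and a bent-function check.
# Helper names are illustrative.

def parity(v):
    """Inner-product bit: parity of the set bits of v."""
    return bin(v).count("1") & 1

def lat(S, n, m):
    """lambda[a][b] = #{x : a.x = b.S(x)} - 2^(n-1)."""
    return [[sum(1 for x in range(1 << n) if parity(a & x) == parity(b & S[x]))
             - (1 << (n - 1))
             for b in range(1 << m)]
            for a in range(1 << n)]

def nonlinearity(L, b, n):
    """Nonlinearity of the component b.S(x): 2^(n-1) - max_a |lambda(a, b)|."""
    return (1 << (n - 1)) - max(abs(L[a][b]) for a in range(1 << n))

def is_bent(f, n):
    """For even n, f is bent iff |lambda_f(a)| = 2^(n/2 - 1) for every a."""
    L = lat(f, n, 1)                         # view f as an n x 1 S-box
    return all(abs(L[a][1]) == 1 << (n // 2 - 1) for a in range(1 << n))

# f(x) = x0*x1 + x2*x3 over F_2^4, a classic bent function
f = [((x & 1) & (x >> 1 & 1)) ^ ((x >> 2 & 1) & (x >> 3 & 1)) for x in range(16)]
print(is_bent(f, 4), nonlinearity(lat(f, 4, 1), 1, 4))   # True 6

PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
L = lat(PRESENT, 4, 4)
```

With this normalization every column satisfies $\sum_a \lambda(a,b)^2 = 2^{2n-2}$, which is a quick sanity check for any LAT implementation.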

Links between an S-box, its Difference Distribution Table and its Linear Approximation Table
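We now revisit the relation between the DDT and the LAT of an S-box observed in [BLN17, BN13, CV95]. To do so, we start with the Walsh-Hadamard transform. Let $f : \mathbb{F}_2^n \times \mathbb{F}_2^m \to \mathbb{R}$; its Walsh-Hadamard transform is
$\hat{f}(a, b) = \sum_{x \in \mathbb{F}_2^n} \sum_{y \in \mathbb{F}_2^m} (-1)^{a \cdot x \oplus b \cdot y} f(x, y)$,   (2)
where $a \in \mathbb{F}_2^n$, $b \in \mathbb{F}_2^m$, and $a \cdot x$ and $b \cdot y$ are the inner products over $\mathbb{F}_2^n$ and $\mathbb{F}_2^m$, respectively. Note that the sum in Equation 2 is evaluated over the reals. Lemma 1 shows that given the S-box's LAT, the attacker can reconstruct the underlying S-box by solving a system of linear equations in $2^{m+n}$ variables. Theorem 1, obtained in [BN13, CV95, DGV95], shows that the entries of the DDT and the LAT are linked to each other through the Walsh-Hadamard transform:

Theorem 1. For an n × m S-box S and any $a \in \mathbb{F}_2^n$, $b \in \mathbb{F}_2^m$,
$\delta(a, b) = 2^{2-n-m}\, \widehat{\lambda^2}(a, b)$,
where $\widehat{\lambda^2}(a, b)$ is the Walsh transform of $\lambda^2(a, b)$, the squared LAT.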
The first conclusion from the above theorem is that, given the DDT, one can deduce the squared LAT as follows:
$\lambda^2(a, b) = \frac{1}{4}\, \hat{\delta}(a, b) = \frac{1}{4} \sum_{u \in \mathbb{F}_2^n} \sum_{v \in \mathbb{F}_2^m} (-1)^{a \cdot u \oplus b \cdot v}\, \delta(u, v)$.
Hence, in order to recover the S-box from the DDT, we need to determine the signs of the LAT entries, whose absolute values $|\lambda(a, b)| = \sqrt{\lambda^2(a, b)}$ are now known, and then reconstruct the S-box from the LAT. A trivial algorithm reconstructs the S-box by testing all $2^N$ possible sign assignments, where N is the number of non-zero coefficients λ(a, b). As discussed in Section 3, we introduce a new, more efficient method to reconstruct the S-box.
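The step from the DDT to the squared LAT is a single Walsh-Hadamard transform. The sketch below evaluates $\lambda^2(a,b) = \hat\delta(a,b)/4$ directly (quadratic in the table size, so only suitable for small S-boxes; a two-dimensional fast transform would be the practical choice) and cross-checks the result against the LAT computed from the S-box itself. Function names are illustrative and PRESENT's S-box is only a test vector.

```python
def parity(v):
    return bin(v).count("1") & 1

def ddt(S, m):
    delta = [[0] * (1 << m) for _ in range(len(S))]
    for a in range(len(S)):
        for x in range(len(S)):
            delta[a][S[x] ^ S[x ^ a]] += 1
    return delta

def squared_lat_from_ddt(delta):
    """lambda^2(a, b) = (1/4) * sum_{u,v} (-1)^(a.u ^ b.v) * delta(u, v).
    Direct evaluation; a 2-D fast Walsh-Hadamard transform would be faster."""
    rows, cols = len(delta), len(delta[0])
    out = [[0] * cols for _ in range(rows)]
    for a in range(rows):
        for b in range(cols):
            acc = 0
            for u in range(rows):
                for v in range(cols):
                    if delta[u][v]:
                        sign = -1 if parity(a & u) ^ parity(b & v) else 1
                        acc += sign * delta[u][v]
            out[a][b] = acc // 4             # exact: the sum equals 4 * lambda(a, b)^2
    return out

def lat_entry(S, a, b, n):
    """Direct lambda(a, b), used only to cross-check the result."""
    return sum(1 for x in range(1 << n) if parity(a & x) == parity(b & S[x])) - (1 << (n - 1))

PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
sq = squared_lat_from_ddt(ddt(PRESENT, 4))
print(all(sq[a][b] == lat_entry(PRESENT, a, b, 4) ** 2
          for a in range(16) for b in range(16)))   # True
```

Only the magnitudes survive this step; the signs are exactly what the sign determination problem below has to recover.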

Hadamard Matrices
Let $H_n$ be the $2^n \times 2^n$ Hadamard matrix whose element in the i-th row and j-th column is $(-1)^{i \cdot j}$, where $i \cdot j$ is the inner product of the binary representations of i and j, for any $0 \le i, j < 2^n$. These Hadamard matrices can be defined recursively as follows:
Definition 1. Let $H_0 = (1)$; then the Hadamard matrix $H_i$ can be represented as
$H_i = \begin{pmatrix} H_{i-1} & H_{i-1} \\ H_{i-1} & -H_{i-1} \end{pmatrix}$.
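Definition 1 translates directly into code, and the same recursion underlies the fast Walsh-Hadamard transform used throughout. The Python sketch below (hypothetical helper names) builds $H_n$ recursively and computes $H_n v$ in $O(n 2^n)$ additions without forming the matrix.

```python
def hadamard(n):
    """H_n from Definition 1: H_0 = (1), H_i = [[H_{i-1}, H_{i-1}], [H_{i-1}, -H_{i-1}]]."""
    H = [[1]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def fwht(v):
    """Fast Walsh-Hadamard transform: returns H_n v with O(n 2^n) additions."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                # butterfly on entries j and j + h (right-hand side uses old values)
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v

H3 = hadamard(3)
x = [3, -1, 4, 1, -5, 9, 2, -6]
print(fwht(x) == [sum(H3[i][j] * x[j] for j in range(8)) for i in range(8)])  # True
```

Because $H_n H_n = 2^n I_{2^n}$, applying `fwht` twice and dividing by $2^n$ returns the original vector.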

The Sign Determination Problem
As suggested in Section 2.2, given the DDT, we can easily compute $\lambda^\dagger(a, b)$. To recover the S-box, we just need to determine the signs of the entries. We define the sign determination problem as follows:
Definition 2. Given $\lambda^\dagger_b$, where $1 \le b < 2^m$, the sign determination problem of the b-th column in an LAT is the problem of recovering $\lambda_b$ from $\lambda^\dagger_b$, i.e., of determining the signs of λ(a, b) for $0 \le a < 2^n$.
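As a baseline for Definition 2, the trivial exhaustive search can be written in a few lines for a single column. This is a sketch for tiny n only (it enumerates all sign patterns of the non-zero entries); the name `brute_force_signs` is hypothetical, and the acceptance test uses the relation $H_n s_b = 2\lambda_b$ between an LAT column and the ±1 sign vector $s_b$ of the component $b \cdot S(x)$.

```python
from itertools import product

def fwht(v):
    """Returns H_n v (fast Walsh-Hadamard transform)."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v

def brute_force_signs(abs_col):
    """All +-1 vectors s_b compatible with |lambda_b| = abs_col.
    A sign pattern is accepted iff H_n lambda_b = 2^(n-1) s_b is a +-1 multiple."""
    size = len(abs_col)
    half = size // 2                         # 2^(n-1)
    nz = [a for a, v in enumerate(abs_col) if v]
    sols = set()
    for signs in product((1, -1), repeat=len(nz)):
        lam = list(abs_col)
        for a, sg in zip(nz, signs):
            lam[a] = sg * abs_col[a]
        t = fwht(lam)
        if all(x in (half, -half) for x in t):
            sols.add(tuple(x // half for x in t))
    return sorted(sols)

print(len(brute_force_signs([1, 1, 1, 1])))   # 8
```

The cost is $2^N$ transforms for N non-zero entries, which is what the layered algorithms below avoid.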
To solve the sign determination problem, we study the linear relation between λ b and s b in Section 3.1.Based on this relation, a basic algorithm for solving the sign determination problem is presented in Section 3.2.In Section 3.3, we observe some interesting properties of the solution space.We use these observations in developing a new and improved algorithm in Section 3.4.We give a tight upper bound of the complexity of the new algorithm in Section 3.5.

The Linear Relation between λ b and s b
Property 2. For the b-th column of the linear approximation table ($0 \le b < 2^m$), let $s_b = ((-1)^{b \cdot S(0)}, \dots, (-1)^{b \cdot S(2^n-1)})^T$ be the sign vector of the component function $b \cdot S(x)$. Then
$(-1)^{b \cdot S(p)} = 2^{1-n} \sum_{a=0}^{2^n-1} (-1)^{a \cdot p}\, \lambda(a, b)$ for all $0 \le p < 2^n$, i.e., $s_b = 2^{1-n} H_n \lambda_b$.   (5)
As $H_n \cdot H_n = 2^n I_{2^n}$, this formula can also be written as $H_n s_b = 2\lambda_b$. Note that when p = 0 in Equation 5, it follows that $\sum_{a} \lambda(a, b) = 2^{n-1} (-1)^{b \cdot S(0)}$.

Thus, the entries of the b-th column determine the linear combination $b \cdot S(x)$ of the components of S(x).

Definition 3. The $c_0$-th, ..., $c_j$-th columns in the LAT, where $0 \le c_0 < \dots < c_j < 2^m$, are independent columns if the binary representations of $c_0, \dots, c_j$ are linearly independent vectors of $\mathbb{F}_2^m$.

If the attacker solves the sign determination problem for m independent columns, then the attacker can easily recover the S-box. The attacker takes $c_0, \dots, c_{m-1}$ as the rows of an m × m matrix over $\mathbb{F}_2$, denoted C. For each $0 \le i < 2^n$, let $s_i$ be the vector $(c_0 \cdot S(i), \dots, c_{m-1} \cdot S(i))^T$ over $\mathbb{F}_2^m$, which is known from the solutions of the sign determination problems. The binary representation of S(i), i.e., $(S_0(i), \dots, S_{m-1}(i))^T$, is obtained immediately by computing $C^{-1} s_i$. Hence, the S-box can be reconstructed by computing $C^{-1}$ in time $O(m^3)$ and then $C^{-1} s_i$ for $0 \le i < 2^n$ in time $O(2^n \cdot m^2)$, so the total running time of recovering S from m independent columns is $O(m^3 + 2^n m^2)$.

If the attacker applies an exhaustive search to solve the sign determination problem of m independent columns, the complexity of reconstructing the S-box is still very high, namely $O(2^{m \cdot 2^n P^{LAT}_{n,m}} + m^2 2^n)$, as there are m columns with about $P^{LAT}_{n,m} 2^n$ non-zero elements each. We propose a basic method for solving this problem for one column in Section 3.2 and improve it significantly in Sections 3.3 and 3.4.
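The final linear-algebra step (recovering S from m independent components) can be sketched as follows. The code inverts C over $\mathbb{F}_2$ by Gauss-Jordan elimination on bitmask rows and then applies $C^{-1}$ to each $s_i$. The names `gf2_inverse` and `reconstruct_sbox` are hypothetical; the components are given as bits $c_j \cdot S(x) \in \{0, 1\}$, which for a ±1 sign vector entry s corresponds to $(1 - s)/2$.

```python
def parity(v):
    return bin(v).count("1") & 1

def gf2_inverse(rows, m):
    """Invert the m x m matrix C over F_2 whose j-th row is the bitmask rows[j].
    Returns the rows of C^{-1} as bitmasks (bit j of row k = C^{-1}[k][j])."""
    aug = [[rows[j], 1 << j] for j in range(m)]          # [C | I]
    for col in range(m):
        piv = next((r for r in range(col, m) if aug[r][0] >> col & 1), None)
        if piv is None:
            raise ValueError("c_0, ..., c_{m-1} are not linearly independent")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(m):
            if r != col and aug[r][0] >> col & 1:
                aug[r][0] ^= aug[col][0]
                aug[r][1] ^= aug[col][1]
    return [aug[k][1] for k in range(m)]                 # right half is C^{-1}

def reconstruct_sbox(cs, comps, m):
    """comps[j][x] = c_j . S(x) in {0, 1}; returns S as a lookup table."""
    inv = gf2_inverse(cs, m)
    S = []
    for x in range(len(comps[0])):
        s_x = sum(comps[j][x] << j for j in range(m))    # bit j = c_j . S(x)
        S.append(sum(parity(inv[k] & s_x) << k for k in range(m)))
    return S

PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
cs = [10, 6, 1, 8]                                       # independent output masks
comps = [[parity(c & PRESENT[x]) for x in range(16)] for c in cs]
print(reconstruct_sbox(cs, comps, 4) == PRESENT)         # True
```

The inversion costs $O(m^3)$ bit operations and the per-input step $O(m^2)$, matching the $O(m^3 + 2^n m^2)$ bound above.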

Solving the System of Linear Equations H n x = y
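With Definition 1 and the fast Walsh-Hadamard transform [MS77], we can solve the system of linear equations $H_n x = y$ recursively. Split x and y into halves, $x = (x_0, x_1)$ and $y = (y_0, y_1)$. By Definition 1, $H_n x = y$ reads $H_{n-1} x_0 + H_{n-1} x_1 = y_0$ and $H_{n-1} x_0 - H_{n-1} x_1 = y_1$; adding and subtracting these two block rows (an elementary transformation), it is easy to see that the original problem is divided into two independent subproblems:
$H_{n-1} x_0 = \frac{y_0 + y_1}{2}, \qquad H_{n-1} x_1 = \frac{y_0 - y_1}{2}$.   (6)
We can recursively apply the above process to the problems in the ℓ-th step. At the beginning of the ℓ-th step, there are $2^{\ell-1}$ problems with $2^{n-\ell+1}$ constraints each, denoted as
$H_{n-\ell+1}\, x_i = \beta_i, \qquad 0 \le i < 2^{\ell-1}$,   (7)
where $\beta_0, \dots, \beta_{2^{\ell-1}-1}$ are the vectors obtained from the previous step and $1 \le \ell \le n$. Each problem in Equation 7 is divided into two subproblems as follows:
$H_{n-\ell}\, x_{i,0} = \frac{\beta_{i,0} + \beta_{i,1}}{2}, \qquad H_{n-\ell}\, x_{i,1} = \frac{\beta_{i,0} - \beta_{i,1}}{2}$,   (8)
where $x_i = (x_{i,0}, x_{i,1})$ and $\beta_i = (\beta_{i,0}, \beta_{i,1})$ are split into halves of length $2^{n-\ell}$. The total number of subproblems after the ℓ-th step is $2^\ell$, and the number of constraints in each subproblem is $2^{n-\ell}$. At the n-th step, the coefficient matrix in the subproblems is $H_0 = (1)$; thus, the entries of x are directly obtained.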
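The recursive halving can be written as a short Python function. This is an illustrative sketch (the name `solve_hadamard` is hypothetical); it uses exact rational arithmetic so that the halving steps never introduce rounding error.

```python
from fractions import Fraction

def solve_hadamard(y):
    """Solve H_n x = y by recursive halving:
    H_{n-1} x_0 = (y_0 + y_1) / 2 and H_{n-1} x_1 = (y_0 - y_1) / 2."""
    y = [Fraction(v) for v in y]
    if len(y) == 1:                          # H_0 = (1): the entry is read off directly
        return y
    h = len(y) // 2
    y0, y1 = y[:h], y[h:]
    x0 = solve_hadamard([(a + b) / 2 for a, b in zip(y0, y1)])
    x1 = solve_hadamard([(a - b) / 2 for a, b in zip(y0, y1)])
    return x0 + x1

print(solve_hadamard([2, 2, 2, -2]))         # [Fraction(1, 1), Fraction(1, 1), Fraction(1, 1), Fraction(-1, 1)]
```

In the sign determination setting the right-hand side is $2\lambda_b$ with unknown signs, so each halving step can be used to prune sign guesses whose intermediate vectors are impossible.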

The Main Idea
We propose to solve the sign determination problem using a recursive procedure. In each layer, the algorithm combines the linear equations in the problems of the current layer, applies the idea of solving the system of linear equations $H_n x = y$ to reduce each problem into two independent subproblems, and checks the consistency of the subproblems. That is, the algorithm works on systems of linear equations whose size is reduced by half compared to those in the previous layer. Finally, when it reaches the n-th layer, the algorithm returns the solutions to the sign determination problem. The algorithm can be represented by a tree structure. For ease of explanation, we denote the ℓ-th layer of the recursive tree by $T_\ell$ and its nodes by $T_\ell[i]$. The algorithm is initialized by guessing the signs of the entries λ(a, b), i.e., by considering both candidates $\pm 2\lambda^\dagger(a, b)$ for the right-hand side $2\lambda(a, b)$ of $H_n s_b = 2\lambda_b$. At the beginning of the ℓ-th layer, the subproblems in Equation 7 are recorded in the nodes of $T_\ell$: we call the set $F_\ell[i]$, which contains all the possible i-th constraints in Equation 7, a full set, and store it in $T_\ell[i]$. (This strategy will be replaced by a more effective one described in Sections 3.3 and 3.4.)
In the ℓ-th layer, the possible i-th constraints of the new subproblems in Equation 8 are deduced from Equation 7: for each vector p in $F_{\ell-1}[i]$ and each vector q in $F_{\ell-1}[i + 2^{n-\ell}]$, we construct the new vector
$E_{\ell-1}(p, q) = \left(\frac{p_0 + q_0}{2}, \frac{p_0 - q_0}{2}, \dots, \frac{p_{2^{\ell-1}-1} + q_{2^{\ell-1}-1}}{2}, \frac{p_{2^{\ell-1}-1} - q_{2^{\ell-1}-1}}{2}\right)$,
where $p_j$ and $q_j$ are the j-th entries of p and q, and $0 \le j < 2^{\ell-1}$. It can be seen from Equation 8 that each entry of a vector in $F_\ell[i]$ is an even number between $-2^{n-\ell}$ and $2^{n-\ell}$ when $1 \le \ell < n$. As the components of $s_b$ are 1 or −1, in the n-th layer the entries of the vectors in $F_n[0]$ take their values from the set {1, −1}. If $E_{\ell-1}(p, q)$ satisfies these constraints, the new vector is a possible i-th constraint of the new subproblems in Equation 8; otherwise, it is discarded. When the algorithm reaches the n-th layer, the solutions of the sign determination problem are the vectors in the root node $T_n[0]$.

To illustrate our idea more intuitively, we refer to the recursive tree for n = 2 in Figure 1 and show an example with $\lambda^\dagger_b = (1, 1, 1, 1)$. In the end, there are eight vectors in $T_2[0]$, namely (1, 1, 1, −1), (−1, −1, −1, 1), (1, 1, −1, 1), (−1, −1, 1, −1), (1, −1, 1, 1), (−1, 1, −1, −1), (−1, 1, 1, 1) and (1, −1, −1, −1). It can be seen that the sign vector $s_b$ of the underlying LAT column belongs to $T_2[0]$. We give the pseudocode of the basic algorithm in Algorithm 1.
Similarly to the GD algorithm, we fix S(0) to 0 (or any other constant) to find one representative of the DDT-equivalence class; the other DDT-equivalent S-boxes can be obtained by applying the simple translations $S(x \oplus c) \oplus d$ of Property 1. Therefore, Algorithm 1 only returns the vectors whose first element is $(-1)^{b \cdot S(0)} = 1$.
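The layered search can be sketched in Python as follows. This is a sketch of the basic (full-set) variant only, not the authors' Algorithm 1: it assumes the interleaved combination $E(p, q) = ((p_j + q_j)/2, (p_j - q_j)/2)_j$, pairs node i with node $i + 2^{n-\ell}$, starts from the leaf candidates $\pm 2\lambda^\dagger(a, b)$, and prunes with the parity and range constraints stated above. The names `combine`, `consistent` and `sign_determination` are hypothetical.

```python
def combine(p, q):
    """E(p, q): interleave (p_j + q_j)/2 and (p_j - q_j)/2; None if not integral."""
    out = []
    for a, b in zip(p, q):
        if (a + b) % 2:
            return None
        out += [(a + b) // 2, (a - b) // 2]
    return tuple(out)

def consistent(w, n, ell):
    """Layer-ell constraints: +-1 entries at ell = n, else even and |x| <= 2^(n-ell)."""
    if ell == n:
        return all(x in (1, -1) for x in w)
    bound = 1 << (n - ell)
    return all(x % 2 == 0 and abs(x) <= bound for x in w)

def sign_determination(abs_col, fix_first=True):
    n = len(abs_col).bit_length() - 1
    # layer 0: node a holds +-2|lambda(a, b)|, the right-hand side of H_n s_b = 2 lambda_b
    layer = [{(2 * v,), (-2 * v,)} for v in abs_col]
    for ell in range(1, n + 1):
        half = 1 << (n - ell)                # number of nodes in layer ell
        nxt = []
        for i in range(half):
            node = set()
            for p in layer[i]:
                for q in layer[i + half]:
                    w = combine(p, q)
                    if w is not None and consistent(w, n, ell):
                        node.add(w)
            if not node:
                return []                    # no S-box corresponds to the given DDT
            nxt.append(node)
        layer = nxt
    sols = sorted(layer[0])
    if fix_first:                            # keep (-1)^{b.S(0)} = 1, i.e. S(0) = 0
        sols = [s for s in sols if s[0] == 1]
    return sols

# Usage: recover the sign vector of column b = 3 of PRESENT's LAT from |lambda_b|.
PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
s3 = tuple((-1) ** bin(3 & y).count("1") for y in PRESENT)
abs_col = [abs(sum((-1) ** bin(a & x).count("1") * s3[x] for x in range(16))) // 2
           for a in range(16)]
print(s3 in sign_determination(abs_col))     # True
```

With `fix_first=False` and $\lambda^\dagger_b = (1, 1, 1, 1)$, the root node holds the eight vectors listed in the example above.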

Observing the Structure in the Full Set and Introducing the Compact Set
When we examine the full set $F_\ell[i]$, we notice that its vectors are related. We can use this relation to offer a more compact representation of $F_\ell[i]$ without losing any solutions. This new compact representation reduces both the time and memory complexities of the search algorithm. In the following, we discuss the structure of the full set first. Then, we improve the basic algorithm of Section 3.2 using the compact representation of the full set.

The Structure of the Full Set
[Algorithm 1: Basic Algorithm for Solving the Sign Determination Problem. In each layer, a newly computed vector w is kept only if every entry of w is even and lies between $-2^{n-\ell-1}$ and $2^{n-\ell-1}$; if a node becomes empty, the algorithm returns "There exist no S-boxes corresponding to the given DDT!".]

Before presenting the structure of $F_\ell[i]$, we define a set of symmetric permutations.
Definition 4. Let $v^T = (v_0, \dots, v_{2^\ell-1})$. For $0 \le j < \ell$, we define $\pi_j$ by
$\pi_j(v) = (v_{0 \oplus 2^j}, v_{1 \oplus 2^j}, \dots, v_{(2^\ell-1) \oplus 2^j})$.
Note that the permutation $\pi_j$ swaps every two consecutive blocks of $2^j$ elements in v pairwise. Let Π be the set of symmetric permutations $\pi_0, \dots, \pi_{\ell-1}$. It can be easily verified that each permutation in Π is of order two and that $\pi_0, \dots, \pi_{\ell-1}$ are pairwise commutative. Suppose that $v = E_{\ell-1}(p, q)$; the vectors in the (ℓ − 1)-th layer which generate $\pi_j(v)$ for $0 \le j < \ell$ are given in Lemma 2.

Lemma 2. Let $v = E_{\ell-1}(p, q)$. Then $\pi_0(v) = E_{\ell-1}(p, -q)$, and $\pi_j(v) = E_{\ell-1}(\pi_{j-1}(p), \pi_{j-1}(q))$ for $1 \le j < \ell$.
Proof.By the definition of operation E −1 , It follows from the definition of π j that p = π −1 j−1 ( p) and q = π −1 j−1 ( q).Thus, For the case when j = 0, it can be easily verified that π 0 ( v) = E −1 ( p, − q).Now, we define a j-symmetric relation between two vectors u and v with respect to the permutations in Π that helps in capturing the structure of the full set F [i].
Definition 5.For each 0 ≤ j < , the vector u is j-symmetric to the vector v if there exist p ≥ 1 permutations π j0 , . . ., For the special case when j = , the -symmetric vectors to u are defined as u and − u.
We say that u is symmetric-equivalent to v if, for some j, u is j-symmetric to v. It can be easily verified that the symmetric-equivalence relation is an equivalence relation. Moreover, the set of vectors that are positive j-symmetric to u for some j is denoted by $[u]^+$. For any $0 \le j \le \ell$, we denote the set of vectors that are j-symmetric to u by $[u]_j$. Thus, for each vector $u \in F_\ell[i]$, the vectors that are symmetric-equivalent to u form the set $\bigcup_{j=0}^{\ell} [u]_j$.

Theorem 2. For any vector $u \in F_\ell[i]$ and any vector v that is symmetric-equivalent to u, $v \in F_\ell[i]$.

Proof. Let us consider the case when a vector v is j-symmetric to u, $0 \le j < \ell$. For the positive case in Definition 5, we first prove inductively that $\pi_{j'}(v) \in F_\ell[i]$ for each $j' \le j$. The negative case in Definition 5 can be proved in a similar way.
The statement holds in the base case of the induction. For the case j = ℓ, the positive case is trivial as u = v, and the negative case is proved inductively; when ℓ = 0, the statement is true. Now we define the self-j-symmetric vector and the self-j-symmetric set, respectively.
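Definition 4 and Lemma 2 are easy to check numerically. The Python sketch below implements $\pi_j$ as the index map $i \mapsto i \oplus 2^j$ and assumes the interleaved combination $E(p, q) = ((p_t + q_t)/2, (p_t - q_t)/2)_t$; it then tests both identities of Lemma 2 on random even-valued vectors. All names (`pi`, `E`, `neg`, `check_lemma2`) are hypothetical.

```python
import random

def pi(v, j):
    """pi_j: entry i of the result is v[i ^ 2^j] (swaps consecutive 2^j-blocks)."""
    return tuple(v[i ^ (1 << j)] for i in range(len(v)))

def E(p, q):
    """Interleaved combination ((p_t + q_t)/2, (p_t - q_t)/2)_t; inputs assumed even."""
    out = []
    for a, b in zip(p, q):
        out += [(a + b) // 2, (a - b) // 2]
    return tuple(out)

def neg(v):
    return tuple(-x for x in v)

def check_lemma2(ell, trials=200, seed=0):
    """Checks pi_0(E(p, q)) = E(p, -q) and pi_j(E(p, q)) = E(pi_{j-1} p, pi_{j-1} q)."""
    rng = random.Random(seed)
    half = 1 << (ell - 1)
    for _ in range(trials):
        p = tuple(2 * rng.randint(-4, 4) for _ in range(half))
        q = tuple(2 * rng.randint(-4, 4) for _ in range(half))
        v = E(p, q)
        if pi(v, 0) != E(p, neg(q)):
            return False
        for j in range(1, ell):
            if pi(v, j) != E(pi(p, j - 1), pi(q, j - 1)):
                return False
    return True

print(check_lemma2(3))                       # True
```

Lemma 2 is what lets a whole orbit of layer-ℓ vectors be generated by permuting a single representative instead of recomputing each combination.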

Compact Set
Based on Theorem 2, we define a compact set $C_\ell[i]$ as a compact representation of the full set $F_\ell[i]$ that contains one representative of each symmetric-equivalence class in $F_\ell[i]$. The relation between the full set $F_\ell[i]$ and its compact set $C_\ell[i]$ is thus
$F_\ell[i] = \bigcup_{u \in C_\ell[i]} \bigcup_{j=0}^{\ell} [u]_j$.   (9)
By doing this, repeated computation is avoided and the memory consumption is greatly reduced compared to the basic algorithm, as we will show in Section 3.4.
We now propose a technique to construct the compact set $C_{\ell+1}[i]$ directly from compact sets. The compact set $C_{\ell+1}[i]$ is constructed by computing $E_\ell(u, v)$ for each $u \in C_\ell[i]$ and each v in the middle sets $M_{u,w}$, such that no two elements of $C_{\ell+1}[i]$ are j-symmetric to each other, $0 \le j \le \ell$.
The process of building the middle set $M_{u,w}$ is shown in Algorithm 2. The structure of the middle set $M_{u,w}$ is related to the symmetric property of the compact set.

[Algorithm 2: construction of the middle set $M_{u,w}$ from $[w]^+$, looping over all integers j ∈ J and over all pairs of distinct vectors e, f in $M_{u,w}$, and returning $M_{u,w}$.]

It can be seen from Algorithm 2 that the middle set $M_{u,w}$ is constructed by discarding irrelevant elements from the set $[w]^+$, i.e., the vectors of $[w]^+$ that would generate j-symmetric vectors for some j are carefully selected and removed. Next, we discuss which vectors need to be removed from $[w]^+$ to build $M_{u,w}$. We first show in Lemma 3 that $M_{u,w} \subseteq [w]^+$.

Lemma 3. For each non-zero vector
, which contradicts the fact that no two vectors in $C_{\ell+1}[i]$ are 0-symmetric. Thus, $-v \notin M_{u,w}$. Let $[w]^+$ be the set of vectors that are positive j-symmetric to w for some $0 \le j \le \ell$. It can be concluded that $M_{u,w} \subseteq [w]^+$.
The structure of M u, w is also related to the symmetric property of the vector u.Suppose that u is self-j-symmetric, then there exist p permutations π j0 , . . ., π jp−1 such that u = π jp−1 • . . .• π j0 ( u). 5 For a vector v ∈ M u, w , let e denote the vector E ( u, v), , which is self-(j + 1)-symmetric to e.This contradicts the fact that each two vectors in

Improved Sign Determination Algorithm
We can now run a variant of the basic algorithm using the compact sets. In the initial phase of the improved algorithm, the leaf nodes are assigned the compact sets $C_0[i]$. After n iterations, the solutions to the sign determination problem are obtained by expanding the compact set $C_n[0]$ into the full set $F_n[0]$ via Equation 9. The search process of the sign determination problem using compact sets is stated in Algorithm 3.
We show an example with $\lambda^\dagger_b = (0, 0, 0, 0, 2, 2, 2, 2)^T$ to contrast the basic algorithm with the improved strategy. We apply Algorithm 3 to solve the sign determination problem and show the tree structure involved in Figure 2. Note that the compact sets of each layer are stored in the corresponding nodes. The compact set technique is illustrated by the construction of $C_3[0]$. When the basic algorithm is applied, the full set $F_3[0]$ is constructed from the full sets $F_2[0]$ and $F_2[1]$, where $F_2[0] = F_2[1] = \{(2, 0, -2, 0), (-2, 0, 2, 0), (0, 2, 0, -2), (0, -2, 0, 2)\}$. For each $u \in F_2[0]$ and $v \in F_2[1]$, we compute $E_2(u, v)$, i.e., 16 combinations are computed to build the full set $F_3[0]$, whereas the compact set $C_3[0]$ contains only one element. To obtain the full set $F_3[0]$, we only apply simple permutations to the elements of $C_3[0]$, which avoids repeated computations. Thus, using compact sets in the reconstruction procedure saves both time and memory compared with the basic algorithm. We note that the advantage of compact sets is more significant when the full set is larger.
The number of solutions of the sign determination problem of the b-th column is equal to the number of Boolean functions that are DDT-equivalent to $b \cdot S(x)$, $1 \le b < 2^m$. The Boolean functions corresponding to these solutions share the same squared LAT with $b \cdot S(x)$, i.e., $(\lambda^\dagger_0, \lambda^\dagger_b)$, and are therefore DDT-equivalent to $b \cdot S(x)$. When $b \cdot S(x)$ has a nontrivial DDT-equivalence class, $T_n[0]$ contains multiple vectors.
Given enough memory, Algorithm 3 can solve all the sign determination problems. However, for some instances, the number of vectors in the internal layers grows sharply, which demands too much memory. In this situation, a threshold H on the number of internal vectors can be preset heuristically according to the memory available to the attacker. In the ℓ-th layer, if the size of $C_\ell[i]$ exceeds the threshold H, the search process is interrupted, where $0 \le \ell < n$ and $0 \le i < 2^{n-\ell}$.
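The example can be reproduced by brute force. The Python sketch below computes the full set $F_3[0]$ for $\lambda^\dagger_b = (0, 0, 0, 0, 2, 2, 2, 2)$ and groups it into classes. It assumes that the symmetric-equivalence classes of Definition 5 coincide with the orbits under the group generated by $\pi_0, \dots, \pi_{\ell-1}$ and negation; the helper names are hypothetical, and brute force over all $2^{2^n}$ sign vectors is only feasible for tiny n.

```python
from itertools import product

def pi(v, j):
    return tuple(v[i ^ (1 << j)] for i in range(len(v)))

def full_set(abs_col):
    """All +-1 vectors s with |H_n s| = 2 * abs_col (brute force, tiny n only)."""
    size = len(abs_col)
    sols = []
    for s in product((1, -1), repeat=size):
        t = [sum(s[x] * (-1) ** bin(a & x).count("1") for x in range(size))
             for a in range(size)]
        if all(abs(t[a]) == 2 * abs_col[a] for a in range(size)):
            sols.append(s)
    return sols

def orbit(u, ell):
    """Closure of u under pi_0, ..., pi_{ell-1} and negation."""
    seen, todo = {u}, [u]
    while todo:
        v = todo.pop()
        for w in [pi(v, j) for j in range(ell)] + [tuple(-x for x in v)]:
            if w not in seen:
                seen.add(w)
                todo.append(w)
    return seen

def compact_set(full, ell):
    """Keeps one representative per orbit."""
    reps, covered = [], set()
    for u in sorted(full):
        if u not in covered:
            reps.append(u)
            covered |= orbit(u, ell)
    return reps

F3 = full_set([0, 0, 0, 0, 2, 2, 2, 2])
C3 = compact_set(F3, 3)
print(len(F3), len(C3))                      # 8 1
```

All eight solutions lie in a single orbit, which is why one representative suffices for $C_3[0]$ in this example.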
We call a column in the absolute LAT good if it can be recovered by Algorithm 3 under the threshold H, and bad otherwise. In some cases, the absolute LAT contains both good columns and bad columns. For example, the S-boxes of CAST-256 [Ada99], such as S0, are 8 × 32 S-boxes constructed by choosing 32 distinct bent functions as the components (see [Ada97] for details). Consequently, each entry of the columns $\{\lambda^\dagger_{2^i} \mid 0 \le i < 32\}$ is $2^{8/2-1} = 8$, i.e., all the entries of these LAT columns are ±8. It has been shown by Langevin and Leander in [LL11] that the number of bent functions in dimension eight is approximately $2^{106}$. Thus, the sign determination problem for the columns $\{\lambda^\dagger_{2^i} \mid 0 \le i < 32\}$ is too computationally expensive to be solved, i.e., these columns are bad columns. However, there are still some good columns in the absolute LAT of CAST-256's S0 if the attacker sets the threshold to 2000, for example $\lambda^\dagger_6$ and $\lambda^\dagger_7$, corresponding to the 6th and 7th columns of its LAT.

[Algorithm 3: Improved Algorithm for Solving the Sign Determination Problem. The algorithm randomly picks a vector from $C_\ell[i]$ and computes a new vector r, which is kept only if every entry of r is even and lies in $[-2^{n-\ell-1}, 2^{n-\ell-1}]$; if a node becomes empty, the algorithm returns "There exist no S-boxes corresponding to the given DDT!".]
According to our experiments with input size n between 8 and 14, the number of solutions for the good columns is no more than $2^{n+2}$, i.e., $T_n[0]$ contains at most two vectors. We note that determining the size of the DDT-equivalence classes of Boolean functions from $\mathbb{F}_2^n$ to $\mathbb{F}_2$ is still an open problem; determining a suitable H, or even telling in advance whether a column is good, is also an open problem.

Heuristic Analysis of Time and Memory Complexities
We now analyze the memory complexity of Algorithm 3. In the ℓ-th layer, there are $2^{n-\ell}$ nodes in the tree structure, $0 \le \ell \le n$. Each node contains at most H vectors of length $2^\ell$, whose entries range from $-2^{n-\ell}$ to $2^{n-\ell}$. Hence, storing the nodes of the ℓ-th layer requires at most $2^{n-\ell} \cdot H \cdot 2^\ell = H \cdot 2^n$ entries of $n - \ell + 2$ bits each (including the sign). When constructing $C_{\ell+1}$, the attacker first builds the middle sets; the complexity of this step is negligible as the attacker only applies permutations on vectors. Then, the attacker computes the new vectors $E_\ell(u, v)$. The time complexity of constructing $C_{\ell+1}$ is no more than $O(H^2 2^{3(n-\ell)})$. Thus, the upper bound of the time complexity is $O(H^2 2^{3n})$.

Applying Algorithm 3 for Reconstructing the S-box
The procedure of reconstructing an n × m S-box depends on the number of good columns defined in Section 3.4. We suppose that the attacker has solved the sign determination problem for k independent good columns, $1 \le k \le m$. In the sign determination problem for the $c_i$-th column, the possible candidates for the Boolean function $c_i \cdot S(x)$ are recovered by Algorithm 3. For $1 < k \le m$, we call the search for a combination of these candidates that is consistent with the squared LAT the matching phase for the k good columns.
After the matching phase for the k good columns, the Boolean functions c 0 S(x), • • • , c k−1 S(x) are obtained.As mentioned before, when k = m, the attacker can reconstruct the S-box using linear algebra.When k < m, applying the knowledge of c 0 S(x), • • • , c k−1 S(x), we propose a new technique that improves the guess-and-determine algorithm of [BCJS19].

The Matching Phase for the k Good Columns
Let $V_i$ be the set of output vectors of Algorithm 3 for the $c_i$-th squared LAT column, where $0 \le i < k$. In the matching phase, the Boolean functions $c_0 \cdot S(x), \dots, c_{k-1} \cdot S(x)$ are obtained by searching for vectors in the sets $V_i$ that match the squared LAT, using a basic property of the Hadamard product.
Property 3 is obvious from the definition of $s_b$ and the Hadamard (entrywise) product ∘; in particular, its first formula states that $s_{b \oplus c} = s_b \circ s_c$ for any $b, c \in \mathbb{F}_2^m$, since $(-1)^{(b \oplus c) \cdot S(x)} = (-1)^{b \cdot S(x)} (-1)^{c \cdot S(x)}$. Combining this formula with Property 2, we obtain that the $(b \oplus c)$-th column of the LAT can be deduced as $\lambda_{b \oplus c} = \frac{1}{2} H_n s_{b \oplus c} = \frac{1}{2} H_n (s_b \circ s_c)$. For any two vectors $u \in V_i$ and $v \in V_j$ (with $b = c_i$ and $c = c_j$), the attacker computes the new vector $w = \frac{1}{2} H_n (u \circ v)$. Then, the attacker can easily detect whether u and v are consistent with the squared LAT by verifying whether $w^\dagger = \lambda^\dagger_{b \oplus c}$. We call u and v a matching vector pair if they are consistent with the absolute LAT column $\lambda^\dagger_{b \oplus c}$.

Now we discuss the matching phase of the $c_i$-th column and the $c_j$-th column, $0 \le i < j < k$. Note that it is not necessary to verify the match for every pair of vectors from $V_i$ and $V_j$. In the reconstruction problem, our purpose is to find a representative S(x) of the equivalence class $\{S(x \oplus c) \oplus d : c \in \mathbb{F}_2^n, d \in \mathbb{F}_2^m\}$. For example, when the matching phase begins with the $c_0$-th and $c_1$-th columns, assume that there are q distinct symmetric-equivalence classes in the solution of the sign determination problem of the $c_0$-th column, i.e., $V_0 = \{v \mid v \in [u_p], 0 \le p < q\}$. The set of vector pairs which needs to be tested is $\{(u_p, w) \mid 0 \le p < q, w \in V_1\}$. When the attacker finds the representatives of the matching vector pairs in $V_0 \times V_1$ that are consistent with the squared LAT, i.e., $\{(u_{p_0}, w) \mid 0 \le p_0 < q, w \in V_1\}$, the other matching vector pairs in $V_0 \times V_1$ can be constructed by the second formula in Property 3. Similarly, once the attacker obtains $c_0 \cdot S(x)$ and $c_1 \cdot S(x)$ corresponding to the matching vector pairs, all other Boolean functions can be recovered by the translations $c_0 \cdot S(x \oplus c) \oplus d$ and $c_1 \cdot S(x \oplus c) \oplus d$ following Property 1.
The number of matching vector pairs between $V_i$ and $V_j$ is related to the number of Boolean functions which are DDT-equivalent to $(c_i \cdot S(x), c_j \cdot S(x))$. More precisely, the matching phase over $V_i$ and $V_j$ finds vectorial Boolean functions $G : \mathbb{F}_2^n \to \mathbb{F}_2^2$ whose squared LAT coincides with that of $(c_i \cdot S(x), c_j \cdot S(x))$. Thus, by Theorem 1, G(x) shares the same DDT with $(c_i \cdot S(x), c_j \cdot S(x))$. Note that the problem of determining the size of the DDT-equivalence class of a function from $\mathbb{F}_2^n$ to $\mathbb{F}_2^2$ is also an open issue. As the size of the DDT-equivalence class is unknown, we restrict ourselves to S-boxes whose DDT-equivalence class is trivial according to the following conjecture proposed in [BCJS19].
Conjecture 1. Suppose that S is a permutation over $\mathbb{F}_2^n$ and the rows of the DDT of S are pairwise distinct. Then, the DDT-equivalence class of S is trivial, i.e., it only contains the permutations of the form $S(x \oplus c) \oplus d$, where $c, d \in \mathbb{F}_2^n$.
The matching phase for k good columns is shown in Algorithm 4; it repeats the matching phase of the i-th and the (i + 1)-th good columns for $0 \le i \le k - 2$. For S-boxes with a trivial DDT-equivalence class, Algorithm 4 is expected to return one combination. If the DDT-equivalence class of S is nontrivial (i.e., Conjecture 1 does not apply), lines 9 and 17 in Algorithm 4 should be removed, and the search continues with the set of matching vector pairs.
In our case, the number of solutions for each good column is at most $2^{n+2}$ (see Section 3.4). The memory complexity is negligible.
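The matching test for a candidate pair reduces to one Walsh-Hadamard transform of an entrywise product. The Python sketch below (hypothetical helper names; PRESENT's S-box only as a test vector) checks whether $|\frac{1}{2} H_n (u \circ v)|$ equals the absolute LAT column $\lambda^\dagger_{b \oplus c}$.

```python
def fwht(v):
    """Returns H_n v (fast Walsh-Hadamard transform)."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v

def sign_vector(S, b):
    """s_b = ((-1)^{b.S(x)})_x."""
    return tuple((-1) ** bin(b & y).count("1") for y in S)

def abs_column(S, b):
    """|lambda_b| via Property 2: H_n s_b = 2 lambda_b."""
    return [abs(w) // 2 for w in fwht(sign_vector(S, b))]

def is_matching_pair(u, v, target_abs):
    """True iff |(1/2) H_n (u o v)| equals the absolute LAT column target_abs."""
    w = fwht([x * y for x, y in zip(u, v)])
    return all(abs(x) == 2 * t for x, t in zip(w, target_abs))

PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
u, v = sign_vector(PRESENT, 3), sign_vector(PRESENT, 5)
print(is_matching_pair(u, v, abs_column(PRESENT, 3 ^ 5)))   # True
```

Negating either vector leaves the test unchanged, which reflects the fact that complemented components are indistinguishable from the absolute LAT alone.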

The Improved Guess-and-Determine Algorithm
Now we suppose that the attacker has obtained k ($1 \le k < m$) Boolean functions, i.e., $c_0 \cdot S(x), \dots, c_{k-1} \cdot S(x)$, using Algorithm 4. We present an improved GD algorithm that takes the DDT and the k Boolean functions as its inputs and returns a representative of the DDT-equivalence class.
The improved GD algorithm implements the tree-traversal structure of [BCJS19]. It begins by fixing S(0) to zero in the initial layer. In the i-th layer, the algorithm determines the possible assignments for S(i), i = 1, ..., $2^n - 1$, by checking the constraints imposed by the DDT. We follow the notations of [BCJS19] and denote the set of possible values for S(i) imposed by the given DDT by $R_i = \{y \mid \delta(i, y) \neq 0\}$. Since S(0) = 0, the pair (0, i) has output difference S(i), which implies that $S(i) \in R_i$. In our approach, the knowledge of $c_0 \cdot S(x), \dots, c_{k-1} \cdot S(x)$ reduces the size of the set L of candidate values for S(i): for every element $x \in L$, if any of the equalities $c_0 \cdot x = c_0 \cdot S(i), \dots, c_{k-1} \cdot x = c_{k-1} \cdot S(i)$ does not hold, x is removed from L. Then, the guess-and-determine algorithm of [BCJS19] is applied with the reduced lists. The reconstruction process is illustrated recursively in Algorithm 5.

[Algorithm 4: The Matching Phase Given k Good Columns. Input: the index set of the good columns C = {c_0, ..., c_{k−1}}, the corresponding solution sets V_0, ..., V_{k−1} and the squared LAT. Output: $c_0 \cdot S(x), \dots, c_{k-1} \cdot S(x)$. Lines 9 and 17 are break statements, which are to be removed if the DDT-equivalence class is nontrivial.]
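A compact Python sketch of a guess-and-determine search with the component filter is given below. It is an illustration in the spirit of the description above, not the code of [BCJS19] or Algorithm 5: it fixes S(0) = 0, keeps for every DDT entry a budget of pairs not yet explained (each unordered pair {i, j} consumes 2 from δ(i ⊕ j, S(i) ⊕ S(j))), and rejects candidates y that violate a known component $c_j \cdot y = c_j \cdot S(i)$. The function name `improved_gd` and the choice of masks are hypothetical.

```python
def parity(v):
    return bin(v).count("1") & 1

def ddt(S, m):
    delta = [[0] * (1 << m) for _ in range(len(S))]
    for a in range(len(S)):
        for x in range(len(S)):
            delta[a][S[x] ^ S[x ^ a]] += 1
    return delta

def improved_gd(D, m, cs=(), comps=()):
    """Returns one S with S(0) = 0 and DDT D that agrees with comps[j][x] = c_j . S(x), or None."""
    size = len(D)
    rem = [row[:] for row in D]              # DDT budget not yet used by assigned pairs
    S = [0] * size                           # S(0) is fixed to 0

    def agrees(i, y):
        return all(parity(c & y) == comps[j][i] for j, c in enumerate(cs))

    def rec(i):
        if i == size:
            return True                      # budgets are all exactly consumed here
        for y in range(1 << m):
            if D[i][y] == 0 or not agrees(i, y):
                continue                     # y must lie in R_i and match the components
            touched, ok = [], True
            for j in range(i):
                a, b = i ^ j, y ^ S[j]
                rem[a][b] -= 2               # pair {i, j} contributes 2 to delta(a, b)
                touched.append((a, b))
                if rem[a][b] < 0:
                    ok = False
                    break
            if ok:
                S[i] = y
                if rec(i + 1):
                    return True
            for a, b in touched:             # undo this guess
                rem[a][b] += 2
        return False

    return S if rec(1) else None

PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
S0 = [y ^ PRESENT[0] for y in PRESENT]       # DDT-equivalent representative with S(0) = 0
D = ddt(PRESENT, 4)
cs = [1, 2]                                  # hypothetical good-column masks
comps = [[parity(c & S0[x]) for x in range(16)] for c in cs]
R = improved_gd(D, 4, cs, comps)
print(R is not None and R[0] == 0 and ddt(R, 4) == D)   # True
```

Each known component halves the candidate list for S(i) on average, which is the $2^{m-k}$ factor in the analysis that follows.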
Algorithm 5 The Improved Guess-and-Determine Algorithm
1: Input: the indices of good columns $c_0, \ldots, c_{k-1}$, the Boolean functions $c_0 \cdot S(x), \ldots, c_{k-1} \cdot S(x)$, and the given DDT
2: Output: one representative in the DDT-equivalence class
3: $s$ is initialized as a vector of $2^n$ zeros
4: ImprovedGD($s$, 1)
5: return $s$
6:
7: procedure ImprovedGD($s$, $i$)

Next, we analyze the time complexity of Algorithm 5 on a random S-box for $1 \le k < m$. The analysis of the original GD algorithm ($k = 0$) is presented in Appendix A. In the first layer, after discarding the values of $S(1)$ that are inconsistent with the DDT, there are $2^m P^{DDT}_{n,m}$ possible values on average. After additionally checking the constraints imposed by $c_0 \cdot S(x), \ldots, c_{k-1} \cdot S(x)$, there are $2^{m-k} P^{DDT}_{n,m}$ possible values left for $S(1)$. Repeating this process up to the $i$-th layer, the average number of possible assignments of $S(1), \ldots, S(i)$ is

$$W_i(k) = \begin{cases} 2^{(m-k)i} \left(P^{DDT}_{n,m}\right)^{\frac{i^2+i}{2}}, & i < K, \\ 1, & i \ge K, \end{cases}$$

where $K$ is the smallest positive integer $i$ such that $2^{(m-k)i} (P^{DDT}_{n,m})^{\frac{i^2+i}{2}} < 1$. For $i < K$, in the $(i+1)$-th layer there are $2^m P^{DDT}_{n,m}$ possible assignments for $S(i+1)$ for each surviving assignment of $S(1), \ldots, S(i)$. For each of them, the attacker checks whether $S(i+1) \oplus S(1), \ldots, S(i+1) \oplus S(i)$ are consistent with the DDT; the expected cost of this process is $1 + P^{DDT}_{n,m} + \cdots + (P^{DDT}_{n,m})^i < 2$ tests. There are $(P^{DDT}_{n,m})^{i+1} \cdot 2^m W_i(k)$ possible assignments for $S(1), \ldots, S(i+1)$ at this stage. Each assignment is tested against the constraints $c_0 \cdot S(i+1), \ldots, c_{k-1} \cdot S(i+1)$, and the number of checks per assignment is also no more than 2.
Thus, the time complexity of this layer is less than $2^{m+1} P^{DDT}_{n,m} W_i(k) + 2^{m+1} (P^{DDT}_{n,m})^{i+1} W_i(k)$. From the $K$-th layer on, $W_i(k) = 1$, and the time complexity of each layer is no more than $2^{m+1} P^{DDT}_{n,m}$. The expected time complexity $T_{n,m}(k)$ of Algorithm 5 is obtained by summing these per-layer bounds over all $2^n - 1$ layers. Table 1 shows the time complexity of the original guess-and-determine algorithm for $n = 8$ and different values of $m$. Note that increasing the output size of the S-box (i.e., $m$) makes the reconstruction process easier. Thus, an $n \times m$ S-box with $m \gg n$ is not a particularly secure option when designing a secret nonlinear layer for a cryptographic primitive.
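These bounds can be evaluated numerically. The Python sketch below uses $W_i(k) = 2^{(m-k)i}(P^{DDT}_{n,m})^{(i^2+i)/2}$ for $i < K$ and $W_i(k) = 1$ afterwards, and, as a simplification of ours, sums only the dominant per-layer term $2^{m+1} P^{DDT}_{n,m} W_i(k)$ over the layers $i = 0, \ldots, 2^n - 2$, dropping the lower-order cost of the Boolean-function checks. It also assumes $P^{DDT}_{n,m} = 1 - e^{-1/2}$ for $n = m$, following O'Connor's Poisson model quoted in Appendix A. Under these assumptions it gives about $2^{27.68}$ for $n = m = 14$ and $k = 9$, in line with the value quoted for Table 2. The helper names are hypothetical.

```python
import math


def log2_W(i, m, k, log2_p):
    """log2 of 2^((m-k)i) * P^((i^2+i)/2), the value of W_i(k) before layer K."""
    return (m - k) * i + log2_p * (i * i + i) / 2


def smallest_K(m, k, log2_p):
    """Smallest positive i with 2^((m-k)i) * P^((i^2+i)/2) < 1 (requires P < 1)."""
    i = 1
    while log2_W(i, m, k, log2_p) >= 0:
        i += 1
    return i


def log2_T_improved(n, m, k, p):
    """log2 of sum_{i=0}^{2^n-2} 2^(m+1) * P * W_i(k); layer i+1 assigns S(i+1).
    Simplification (ours): the lower-order Boolean-function checks are dropped."""
    lp = math.log2(p)
    K = smallest_K(m, k, lp)
    total = 0.0
    for i in range((1 << n) - 1):
        w = 2.0 ** log2_W(i, m, k, lp) if i < K else 1.0
        total += 2.0 ** (m + 1) * p * w
    return math.log2(total)


if __name__ == "__main__":
    p = 1 - math.exp(-0.5)  # assumed P^DDT for a random n-bit permutation (n = m)
    for k in (2, 6, 9):
        print(k, smallest_K(14, k, math.log2(p)), round(log2_T_improved(14, 14, k, p), 2))
```

For large $k$ the sum is dominated by the roughly $2^n$ layers with $W_i(k) = 1$, which is why additional good columns eventually stop paying off.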
Recall from Equation 3 that the time complexity of deducing a column of the absolute LAT from the DDT of an 8-bit S-box is about $2^{24} P^{DDT}_{n,m} \approx 2^{23.28}$, which is greater than the complexity $2^{22.34}$ of the original GD algorithm. Thus, for a random 8-bit S-box, it is better to apply the original GD algorithm when reconstructing the S-box from its DDT. We also evaluate the time complexity of the GD phase for $n$-bit S-boxes with different $k$, where $9 \le n \le 14$; the results are shown in Table 2. They indicate that, to improve on the original GD algorithm, the attacker should find at least two independent good columns. Table 2 also shows that the original GD algorithm ($k = 0$) quickly becomes impractical as the size of the S-box grows. For example, reconstructing a 14-bit S-box with the original GD algorithm is infeasible, with an expected time complexity of about $2^{68.37}$, whereas for $k \ge 9$ the complexity is no more than $2^{27.68}$. Hence, given enough good columns, our technique improves the original GD algorithm and makes the reconstruction procedure practical.
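The DDT-LAT relation used here is the identity $\sum_{a,b} \delta(a,b)(-1)^{\alpha \cdot a \oplus \beta \cdot b} = W(\alpha,\beta)^2$ with $W(\alpha,\beta) = \sum_x (-1)^{\alpha \cdot x \oplus \beta \cdot S(x)}$; it follows by substituting $y = x \oplus a$ in the definition of $\delta$. The Python sketch below computes one column of $W^2$ from the DDT alone in two passes (fold each row over $b$, then transform over $a$), and cross-checks it against a direct computation. It is an illustration of the identity under the Walsh-coefficient convention, not the procedure behind Equation 3; the names are ours.

```python
import math


def parity(v):
    return bin(v).count("1") & 1


def ddt(S, n, m):
    T = [[0] * (1 << m) for _ in range(1 << n)]
    for a in range(1 << n):
        for x in range(1 << n):
            T[a][S[x] ^ S[x ^ a]] += 1
    return T


def squared_walsh_column_from_ddt(T, n, beta):
    """Column beta of W^2 using only the DDT:
    W(alpha, beta)^2 = sum_{a,b} delta(a, b) (-1)^(alpha·a xor beta·b)."""
    N = 1 << n
    # Step 1: for each input difference a, fold the row over b (zero entries are skipped).
    g = [sum(v if parity(beta & b) == 0 else -v for b, v in enumerate(T[a]) if v)
         for a in range(N)]
    # Step 2: Walsh-Hadamard transform over a.
    return [sum(g[a] if parity(alpha & a) == 0 else -g[a] for a in range(N))
            for alpha in range(N)]


def absolute_column(T, n, beta):
    """|W(alpha, beta)| for every alpha; the squares lose the signs."""
    return [math.isqrt(v) for v in squared_walsh_column_from_ddt(T, n, beta)]


def walsh(S, n, alpha, beta):
    """Direct Walsh coefficient, used only to cross-check the DDT-based computation."""
    return sum(-1 if parity(alpha & x) ^ parity(beta & S[x]) else 1 for x in range(1 << n))
```

Only the magnitudes come out of this computation; recovering the signs of a column is exactly the sign determination problem.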

Experiments
We verify our results by applying our reconstruction technique to random S-boxes, the S-boxes of some existing block ciphers, 4-differentially uniform permutations, and APN functions. Our experiments are implemented in C++, compiled with g++ with -O2 optimization, and run on a single core of an Intel(R) Xeon(R) E5-2620 v3 CPU @ 2.40GHz with 64GB of memory. The related code is available at https://github.com/xiaohuangthu/sbox.

Random S-boxes
For each $8 \le n \le 14$, we compare the performance of the GD algorithm and of our approach on 100 random $n$-bit S-boxes. For $8 \le n \le 12$, we set the threshold $H$ to 2000; according to our analysis in Section 3.5, the required memory for $n = 8, \ldots, 12$ is 4.2MB, 10.4MB, 26.9MB, 67.7MB, and 172.6MB, respectively. For the 13-bit instances, $H = 2000$ does not yield enough good columns, so the threshold is increased to 6000 for random 13-bit S-boxes, which requires at most 1.2GB of memory. For the same reason, we set $H = 12000$ for the 14-bit instances, which requires at most 5.3GB of memory. As shown in Table 2, the additional reduction of the GD phase becomes less significant as the number of good columns grows. In the experiments, we choose $k$ so that the complexity of the GD phase is practical, e.g., $k = 6$ for $n = 14$. The running times $T$ of the GD algorithm and of our approach are shown in Figure 3. The time measurements of our approach include both finding a sufficient number of good columns and using them to recover the S-box. Statistics of the running times over the instances are given in Table 3.

The running time of the original GD algorithm for larger S-boxes (i.e., 12-, 13-, and 14-bit) is estimated as follows. As mentioned, the time complexity of the GD algorithm on random 14-bit S-boxes is about $2^{68.37}$, so we obviously have not run the experiment for that long. Instead, we estimate the running time of the GD algorithm using the following methodology. First, we fix $S(1), \ldots, S(5)$ to the correct values, apply the GD algorithm to the remaining values, and denote the running time by $t_0$. Then, we repeat the procedure 100 times with wrong assignments for $S(1), \ldots, S(5)$ and denote the average running time for a wrong guess by $t_1$. Thus, if there are $C$ assignments of $S(1), \ldots, S(5)$ to check before the correct one, the estimated running time is $t_0 + C \cdot t_1$.
It can be seen from Figure 3 that the advantage of our approach over the GD algorithm increases sharply as the size of the S-box grows. Among 100 random 8-bit S-boxes, our approach is faster than the GD algorithm in 2 cases. For random 9-bit S-boxes, our approach is faster in 44 cases, and for random 10-bit S-boxes in 87 cases. When the input size of the S-boxes is larger than 11, our approach is faster in all cases. For example, as shown in Table 3, the average running time of the GD algorithm on random 14-bit S-boxes is approximately 15178.9 years, whereas the average running time of our approach is $7.52 \times 10^4$ s, i.e., less than one day. The standard deviations in Table 3 show that the running time of our approach is also more stable than that of the GD algorithm.

Specific S-boxes of Existing Ciphers
We also run experiments on the 8-bit S-boxes of several block ciphers, including AES [DR02], Camellia [AIK+01], SEED [IA], ARIA [KKP+04], SKIPJACK [Age], CLEFIA [SSA+07], and Streebog [oTRM]. In these experiments, Algorithm 3 is applied to solve the sign determination problems with the threshold preset to 2000, and the number of good columns is set to 2. While we found good columns for many of the tested S-boxes, the absolute LAT of the S-box S0 of CLEFIA contains no good column. The running times are listed in Table 4. It can be seen from Table 4 that it is more effective to reconstruct the S-boxes of AES, ARIA, SEED, Camellia, and S1 of CLEFIA from their DDTs by solving the sign determination problem for two independent columns and then applying Algorithm 5 with the two resulting Boolean functions of the S-box. For example, with the GD algorithm, reconstructing S1 of SEED from its DDT takes 9.19s, whereas solving the sign determination problem for two independent columns first reduces the reconstruction time to only 0.23s. Note that the S-boxes of AES, ARIA, SEED, Camellia, and S1 of CLEFIA are 4-differentially uniform.
For the other 8-bit S-boxes in our experiments, i.e., those of Streebog, Skipjack, and S0 of CLEFIA, it is more effective to reconstruct the S-box from its DDT with the original GD algorithm of [BCJS19]. Note that the S-boxes of Streebog, Skipjack, and S0 of CLEFIA are 8-, 12-, and 10-differentially uniform, respectively.

4-Differentially Uniform S-boxes and APN Functions
We applied our algorithms to some 4-differentially uniform permutations listed in Table 1 of [BCC10], with input sizes between 9 and 14, setting the threshold to 12000. Although, according to our experiments, S-boxes with low differential uniformity are difficult to reconstruct, we still found good columns in the LATs of the 10-bit and 14-bit inverse functions. For example, we found 3 good columns in the absolute LAT of the 14-bit inverse function, which sharply reduce the search space of the guess-and-determine algorithm.
It is hard to find good columns in the absolute LATs of APN functions. We applied our technique to the 7-bit S-box S7 and the 9-bit S-box S9 of the block ciphers KASUMI [KAS] and MISTY1 [Mat97], which are designed to be APN permutations. We found no good columns in the absolute LATs of KASUMI's S7 and S9 or of MISTY1's S7 and S9, even with the threshold set to $H = 12000$. With the same threshold, we then applied our technique to the APN functions with input sizes between 6 and 11 listed in Table 3 of [Sun17]. Interestingly, we found good columns only for the 6-bit APN functions, the 8-bit Kasami function, and the inverse functions with 7-, 9-, and 11-bit inputs. Hence, for such functions, the standard GD algorithm of [BCJS19] seems to be the better choice.
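The differential uniformity values discussed in this section can be checked with a short sketch. It computes $\max_{a \neq 0,\, b} \delta(a,b)$ and the field inverse $x \mapsto x^{2^n-2}$ over $GF(2^n)$. For the inverse function this value is 2 (APN) for odd $n$ and 4 for even $n$, a classical result of Nyberg; the AES S-box is affine-equivalent to the inverse in $GF(2^8)$ modulo $x^8+x^4+x^3+x+1$ (0x11B), so it is also 4-differentially uniform. The other reduction polynomials below ($x^3+x+1$, $x^4+x+1$, $x^5+x^2+1$) are our own choice; any irreducible polynomial of the right degree works.

```python
def gf_mul(a, b, n, poly):
    """Multiply in GF(2^n) = GF(2)[x]/(poly); poly includes the x^n term."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> n:  # degree reached n: reduce modulo poly
            a ^= poly
    return r


def inverse_sbox(n, poly):
    """x -> x^(2^n - 2): the field inverse, with 0 mapped to 0 (brute force, small n)."""
    N = 1 << n
    inv = [0] * N
    for x in range(1, N):
        for y in range(1, N):
            if gf_mul(x, y, n, poly) == 1:
                inv[x] = y
                break
    return inv


def differential_uniformity(S, n):
    """max over a != 0 and all b of delta(a, b) for an n-bit to n-bit S-box."""
    N = 1 << n
    best = 0
    for a in range(1, N):
        counts = [0] * N
        for x in range(N):
            counts[S[x] ^ S[x ^ a]] += 1
        best = max(best, max(counts))
    return best


if __name__ == "__main__":
    for n, poly in ((3, 0b1011), (4, 0b10011), (5, 0b100101), (8, 0x11B)):
        print(n, differential_uniformity(inverse_sbox(n, poly), n))
```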

Conclusions
In this paper, we presented a new algorithm for reconstructing an S-box from its DDT. The new algorithm is more efficient than the guess-and-determine algorithm proposed by Boura et al. in [BCJS19]: for random S-boxes of 10 bits or more, it outperforms the previous GD algorithm by several orders of magnitude.
Most notably, the new algorithm can be useful for exploring problems related to DDTs. This includes theoretical explorations (e.g., whether there exist two DDT-equivalent bijective S-boxes that are not linearly equivalent) and even the ability to construct an S-box from a "made up" DDT (i.e., picking the DDT first and then constructing an S-box from it), thus extending the analysis of Biryukov and Perrin in [BP15] for partial DDT constraints. We note that this capability may be used for designing stronger S-boxes with a 0 in selected places of the DDT. This would allow, for example, making sure that differences which are optimal with respect to the linear layers (e.g., activate fewer S-boxes in non-full diffusion layers) cannot "co-exist" through the S-box itself.
Other related open problems are reconstructing an S-box from its Boomerang Connectivity Table (BCT), introduced in [CHP+18], and from its Differential-Linear Connectivity Table (DLCT), introduced in [BODKW19]. These two tables are useful for evaluating the boomerang attack [Wag99] and the differential-linear attack [LH94], respectively, and both are related to the DDT. Hence, while at the moment there are no attacks that recover the BCT or DLCT of an unknown S-box and then require reconstructing the S-box from it, the ability to do so may help in exploring the properties of BCTs and DLCTs.

A The Time Complexity of the Original Guess-and-Determine Algorithm
The original guess-and-determine algorithm in [BCJS19] also returns a representative in the set $\{S(x \oplus c) \oplus d \mid c \in \mathbb{F}_2^n, d \in \mathbb{F}_2^m\}$. To achieve this, the attacker can fix $S(0) = 0$ and fix $S(1)$ to any value in $R_1$. Thus, there is one possible case after the first layer. Similarly to the analysis in Section 4.2, the number of possible cases at the end of the $i$-th layer is

$$W_i = \begin{cases} 2^{m(i-1)} \left(P^{DDT}_{n,m}\right)^{\frac{i^2+i-2}{2}}, & i < K, \\ 1, & i \ge K, \end{cases}$$

where $K$ is the smallest positive integer $i$ such that $2^{m(i-1)} (P^{DDT}_{n,m})^{\frac{i^2+i-2}{2}} < 1$. In the $(i+1)$-th layer, the attacker needs to check the consistency of $2^m P^{DDT}_{n,m} W_i$ cases with respect to the DDT, so the complexity of the $(i+1)$-th layer is no more than $2^{m+1} P^{DDT}_{n,m} W_i$. As the algorithm starts from searching the assignment of $S(2)$, the time complexity of the original guess-and-determine algorithm is

$$T_{n,m}(0) \le \sum_{i=1}^{2^n-2} 2^{m+1} P^{DDT}_{n,m} W_i.$$

In [O'C94, O'C95], O'Connor discussed the DDT of a random bijective $n$-bit S-box, showing that for $a, b \neq 0$, $\delta(a, b) = 2t$, where $t \sim \mathrm{Poi}(1/2)$. Later, Daemen and Rijmen investigated $n \times m$ S-boxes, reaching related conclusions in Corollary 2 of …

Let $\theta(a, b)$ be the characteristic function of $S$, i.e., $\theta(a, b) = 1$ if and only if $S(a) = b$, and $\theta(a, b) = 0$ otherwise. Then, λ(a, b) = 2 m+n−1 θ(a, b).
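The appendix bound can be evaluated numerically as a sanity check. The Python sketch below sums $2^{m+1} P^{DDT}_{n,m} W_i$ for $i = 1, \ldots, 2^n - 2$, with $W_i = 2^{m(i-1)}(P^{DDT}_{n,m})^{(i^2+i-2)/2}$ before layer $K$ and $W_i = 1$ afterwards. It assumes $P^{DDT}_{n,m} = 1 - e^{-1/2}$, i.e., the probability that a DDT entry with $a, b \neq 0$ is nonzero under O'Connor's model $\delta(a,b) = 2t$, $t \sim \mathrm{Poi}(1/2)$, for $n = m$. With this assumption it gives about $2^{22.34}$ for $n = 8$ and about $2^{68.37}$ for $n = 14$, the values quoted in the main text for the original GD algorithm. The function name is ours.

```python
import math


def log2_T_original(n, m, p):
    """log2 of sum_{i=1}^{2^n-2} 2^(m+1) * P * W_i, where
    W_i = 2^(m(i-1)) * P^((i^2+i-2)/2) for i < K and W_i = 1 from i = K on."""
    lp = math.log2(p)
    total, after_K = 0.0, False
    for i in range(1, (1 << n) - 1):
        e = m * (i - 1) + lp * (i * i + i - 2) / 2
        if not after_K and e < 0:
            after_K = True  # i reached K: W_i = 1 from here on
        w = 1.0 if after_K else 2.0 ** e
        total += 2.0 ** (m + 1) * p * w
    return math.log2(total)


if __name__ == "__main__":
    p = 1 - math.exp(-0.5)  # assumed P^DDT for random n-bit permutations (n = m)
    for n in (8, 10, 12, 14):
        print(n, round(log2_T_original(n, n, p), 2))
```

The peak of $W_i$ (around $i = 10$ for $n = 14$) dominates the sum, so the many layers with $W_i = 1$ contribute almost nothing in the original algorithm.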

For each vector u ∈ C [i], any vector v ∈ [ u] can be constructed using Definition 5.The full set F [i] can thus be obtained by applying Equation 9 to all u ∈ C [i], i.e., by computing [ u] from u.The compact set C [i] thus allows rebuilding F [i] efficiently.The basic algorithm in Section 3.2 can be optimized by storing the compact set C [i] instead of the full set F [i] in the intermediate node T [i]

1 CFigure 2 :
Figure 2: The Tree Structure for a Sign Determination Problem

w contains at most 2 vectors with length of 2 , where u ∈ C [i] and w ∈ C [i + 2 n− −1 ]. Thus, O((n − + 1)2 2 ) bits of memory are needed to store M u, w. To conclude, the memory complexity of Algorithm 3 is O(H • n 2 2 n + n2 2n ) bits. Similarly, we analyze the time complexity of Algorithm 3. To construct C +1 [i], the attacker needs to apply Algorithm 2 for constructing M u, w first for each u ∈ C [i] and w

Deduce c 0 S(x), . . ., c k−1 S(x) from p 0 , . . ., p k−1
23: return c 0 S(x), . . ., c k−1 S(x).
if the DDT of s matches the given DDT then are consistent with the DDT.The expected complexity of this process is 1 + P DDT n,m + • • • + (P DDT n,m ) i < 2 tests.There are (P DDT n,m ) i+1

Figure 3: The Running Time on Random S-boxes

Table 1: $\log_2 T_{n,m}(0)$ for random S-boxes with $n = 8$ and different $m$

Table 2: $\log_2 T_{n,n}(k)$ for random S-boxes with $9 \le n \le 13$ and different $k$

Table 3: The Statistical Data for the Instances

Table 4: The Running Time for Existing S-boxes