Efficient Search for Optimal Diffusion Layers of Generalized Feistel Networks

The Feistel construction is one of the most studied ways of building block ciphers. Several generalizations were later proposed in the literature, leading to the Generalized Feistel Network, where the round function first applies a classical Feistel operation in parallel on an even number of blocks, and then applies a permutation to this set of blocks. In 2010 at FSE, Suzaki and Minematsu studied the diffusion of such a construction, raising the question of how many rounds are required so that each block of the ciphertext depends on all blocks of the plaintext. They gave optimal permutations, with respect to this diffusion criterion, for Generalized Feistel Networks of 2 to 16 blocks, as well as a good candidate for 32 blocks. Later at FSE'19, Cauchois et al. went further and proposed optimal even-odd permutations for up to 26 blocks. In this paper, we complete the literature by building optimal even-odd permutations for 28, 30, 32 and 36 blocks which, to the best of our knowledge, were unknown until now. The main idea behind our constructions and impossibility proofs is a new characterization of the total diffusion of a permutation after a given number of rounds. We propose an efficient algorithm based on this characterization which constructs all optimal even-odd permutations for the 28, 30, 32 and 36 blocks cases and proves a better lower bound for the 34, 38, 40 and 42 blocks cases. In particular, we improve the 32 blocks case by exhibiting optimal even-odd permutations with a diffusion round of 9. The existence of such a permutation had been an open problem for almost 10 years, and the best known permutation in the literature had a diffusion round of 10. Moreover, our characterization can be implemented very efficiently and allows us to easily re-find all optimal even-odd permutations for up to 26 blocks with a basic exhaustive search.


Introduction
The Feistel network is one of the main generic designs for building modern block ciphers. It was initially proposed in the Data Encryption Standard DES [?], and is still used in more recent ciphers such as Twofish [?], Camellia [?] or SIMON [?]. The idea behind this construction is to split the plaintext into two halves $x_0, x_1$ and build the round function which sends $(x_0, x_1)$ to $(x_1, x_0 \oplus F_i(x_1))$, where $F_i$ is a non-linear function for the $i$-th round. One of the main advantages of this construction is that $F_i$ does not need to be invertible, so it transforms a pseudorandom function (PRF) into a pseudorandom permutation (PRP). Moreover, there are theoretical arguments suggesting that it is a good method to construct block ciphers: Luby and Rackoff proved in 1988 [?] that if each $F_i$ is a pseudorandom function and all three are independent, then 3 rounds of the Feistel construction are enough to get a block cipher which is indistinguishable from a random permutation in the Chosen Plaintext Attack (CPA) model, and 4 rounds with 4 independent functions are enough in the Chosen Ciphertext Attack (CCA) model. This was later improved by Pieprzyk in 1990 [?]: if $f$ is a pseudorandom function, 4 rounds of Feistel with $F_i = f$ for $i = 1, 2, 3$ and $F_4 = f^2$ are sufficient to obtain a block cipher that is indistinguishable from a random permutation in the CPA model.
In 1989 at CRYPTO, Zheng et al. [?] proposed some generalizations of the Feistel construction. In particular, they defined the Type-2 Feistel construction, which splits the message into $2k$ blocks and uses a round function built from $k$ functions $F_{i,j}$, each of them being a pseudorandom function for the $i$-th round. This is essentially a parallel application of $k$ Feistels followed by a cyclic shift of the blocks.
They also showed that when all $F_{i,j}$ are pseudorandom functions, $2k + 1$ rounds of such a construction provide a block cipher that is indistinguishable from a random permutation. Moreover, the Type-2 construction is inherently easy to compute in parallel, and the corresponding decryption function is basically the same except that the functions $F_{i,j}$ are applied in reverse order, i.e. for $r$ rounds, the first round of decryption uses the functions $F_{r,j}$. Both of these properties make this construction very efficient in practice, in hardware as well as in software, e.g. TWINE [?] and Simpira [?]. All of these arguments led to block ciphers based on this Type-2 Feistel construction, such as HIGHT [?] and CLEFIA [?].
At ASIACRYPT'96, Nyberg [?] studied a variant of the Type-2 Feistel construction using a different permutation than the cyclic shift, called Generalized Feistel Network. Such a construction was used to design block ciphers such as TWINE [?] and Piccolo [?]. However, Nyberg only focused on one specific permutation. Suzaki and Minematsu thus studied at FSE'10 [?] a more general case where the cyclic shift is replaced by any other permutation of the blocks. Their work focused on finding permutations with the lowest diffusion round. The diffusion round is close to the concept of diffusion introduced by Shannon in 1949 [?]. Essentially, a block cipher has full diffusion if every bit of the ciphertext depends on every bit of the plaintext. In the context of Generalized Feistel Networks (GFN), [?] defined the diffusion round as the minimal number of rounds such that every block of the ciphertext depends on every block of the plaintext. Focusing on blocks instead of bits allows them to abstract away the precise specification of the functions $F_{i,j}$ as well as the exact size of the blocks, thus giving structural results. In particular, they tied the diffusion round of a given GFN to its resistance against Impossible Differential distinguishers [?], proving that if a GFN has a diffusion round of $DR$, then it needs strictly more than $2DR + 1$ rounds to avoid any Impossible Differential distinguisher. Along with a lower bound on the diffusion round of a GFN of $2k$ blocks, they gave optimal permutations (w.r.t. the diffusion round) for $2 \leq 2k \leq 16$. It is worth noting that such an optimal permutation was then used to design block ciphers such as TWINE [?]. At FSE'19, Cauchois et al. went further and gave optimal permutations for $18 \leq 2k \leq 26$, as well as good candidates for $2k = 32$ (which was already found in [?]) and for $2k = 64$ and $128$, using a sophisticated technique that they called Collision-free exhaustive search. Note that these permutations are even-odd, i.e. the image of an even number is an odd number and vice versa. On a side note, relaxing the condition that the permutation is the same in each round makes the problem easier, and in [?], Kales et al. give such a construction for any number of blocks.
Our contribution. In this paper, we focus on even-odd permutations and we complete the work on the 10-year-old problem (introduced by [?]) of finding optimal even-odd permutations for 32 blocks, as well as finding optimal even-odd permutations for 28, 30 and 36 blocks, which were not given in the previous literature. To do so, we propose a new characterization of a permutation reaching full diffusion after a given number of rounds. Using this characterization, we design a very efficient algorithm which, on the previously mentioned cases, yields all the permutations that achieve full diffusion in 9 rounds. Our algorithm essentially uses branch-and-bound techniques, so its exact complexity is hard to evaluate. In practice, although the size of the search space goes from $2^{43}$ for $2k = 28$ up to $2^{75}$ for $2k = 42$, we were able to treat each value of $k$ in less than one hour when using 72 threads. Moreover, this characterization has a very efficient implementation which allowed us to re-find all optimal even-odd permutations for up to 26 blocks with a basic exhaustive search in a few hours, showing that for these cases, there is no need for sophisticated techniques as in [?]. Furthermore, for 34, 38, 40 and 42 blocks, we prove with this method that there is no even-odd permutation with a diffusion round of 9, which is the lower bound on the diffusion round for these sizes given in [?]. We were also able to find even-odd permutations with a diffusion round of 10 for $2k = 34$ (which is thus optimal), as well as even-odd permutations with a diffusion round of 11 for $2k = 38, 40, 42$. Finally, we evaluate the security of our constructed permutations against impossible differentials and differentials (by computing the minimum number of active S-boxes).
In particular, for the 32 blocks case, the longest impossible differential distinguisher for all our permutations is one round shorter than for the permutation proposed by [?], which brings it down to 17 rounds.

Generalized Feistel Networks (GFN)
Zheng et al. [?] introduced Type-2 Feistels as a generalization of the original Feistel construction. Given an even number $2k$ of blocks $(X_0, \dots, X_{2k-1})$, it first applies the Feistel construction on the pairs of blocks, which yields $(X_0 \oplus S_0(X_1), X_1, \dots, X_{2k-2} \oplus S_{k-1}(X_{2k-1}), X_{2k-1})$. The blocks are then cyclically right-shifted to obtain the result. Later, it was proposed in [?] to use a permutation other than the cyclic shift, leading to Generalized Feistel Networks.
Definition 1. Let $2k$ be an even number, $n, r$ be positive integers, and $\{F_{i,j}\}_{i \in \{1,\dots,r\},\, j \in \{0,\dots,k-1\}}$ be a set of cryptographic keyed functions from $\mathbb{F}_2^n$ to $\mathbb{F}_2^n$. Let $\pi$ be a permutation over $2k$ elements. A Generalized Feistel Network (GFN) is a block cipher built as $R_r \circ \cdots \circ R_1$, where $R_i$ is the round function
$$R_i(X_0, \dots, X_{2k-1}) = \pi\big(X_0 \oplus F_{i,0}(X_1),\, X_1,\, \dots,\, X_{2k-2} \oplus F_{i,k-1}(X_{2k-1}),\, X_{2k-1}\big),$$
where applying $\pi$ to a tuple of blocks moves the block at position $j$ to position $\pi(j)$.
Note that for this paper, neither the exact definition of the keyed functions $F_{i,j}$ nor their sizes are relevant. We can thus consider all of them as an arbitrary S-box $S$, leading to the framework depicted in Figure ??. As the only variable parameters are thus $k$ and $\pi$, we denote by $\mathrm{GFN}^k_\pi$ a GFN with $2k$ blocks that uses the permutation $\pi$.
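To make Definition 1 concrete, the following C++ sketch applies one GFN round to toy 8-bit blocks. It is only an illustration under assumptions: the names `gfn_round`, `Block` and `RoundFunction` are hypothetical, the keyed functions are passed as an arbitrary callable, and the block at position $i$ is assumed to move to position $\pi(i)$.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Block = std::uint8_t;  // toy 8-bit blocks (assumption, the paper leaves n free)
// Hypothetical interface: F(j, v) plays the role of F_{i,j}(v) for the current round.
using RoundFunction = std::function<Block(int, Block)>;

// One GFN round: Feistel step on every pair, then the block permutation pi.
std::vector<Block> gfn_round(const std::vector<Block>& x,
                             const std::vector<int>& pi,
                             const RoundFunction& F) {
    const std::size_t n = x.size();
    assert(n % 2 == 0 && pi.size() == n);
    std::vector<Block> y(x);
    // Even block of pair j becomes X_{2j} xor F(X_{2j+1}); odd block is unchanged.
    for (std::size_t j = 0; j < n; j += 2)
        y[j] = static_cast<Block>(x[j] ^ F(static_cast<int>(j / 2), x[j + 1]));
    // The block at position i is moved to position pi[i].
    std::vector<Block> out(n);
    for (std::size_t i = 0; i < n; ++i) out[pi[i]] = y[i];
    return out;
}
```

Since only the wiring matters for the diffusion analysis, any function can be plugged in as `F` when experimenting with a given permutation.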

Diffusion Round
We use the notations depicted in Figure ??. The input variables of the $i$-th round of a GFN are denoted by $(X^i_0, X^i_1, \dots, X^i_{2k-1})$. We also give a name to the variables which are at the input of the permutation $\pi$, i.e. the blocks obtained once the Feistel step of the round has been applied.
Figure 1: Generalized Feistel Network
It is easy to see from Definition ?? that $X^1_{\pi(0)}$ depends on $X^0_0$ and $X^0_1$. More generally, any block $X^r_j$ depends on a certain number of blocks from round 0, i.e. computing $X^r_j$ requires some blocks $\{X^0_{j_0}, \dots, X^0_{j_l}\}$. Note that this does not depend on the size of the functions $F_{i,j}$ in the GFN. As in [?], we say in that case that any of these $X^0_{j_i}$ diffuses to $X^r_j$, and we focus our study on the number of rounds needed to reach full diffusion.
Definition 2. Let $\pi$ be a permutation over $2k$ elements. We say that a block $X^0_j$ fully diffuses after $r$ rounds if for all $i \in \{0, \dots, 2k-1\}$, $X^0_j$ diffuses to $X^r_i$. We say that $\pi$ reaches full diffusion after $r$ rounds if for all $j \in \{0, \dots, 2k-1\}$, $X^0_j$ fully diffuses after $r$ rounds. The smallest $r$ that satisfies this property for the block $X^0_i$ is called the diffusion round of the block $X^0_i$.
Note that we need to study both the diffusion over the encryption and the decryption process. Indeed, there is no guarantee that an encryption function with good diffusion also keeps this property for its inverse. Since we have $(\mathrm{GFN}^k_\pi)^{-1} = \mathrm{GFN}^k_{\pi^{-1}}$, we need to study both the diffusion of $\pi$ and $\pi^{-1}$. Naturally, we would like both $\pi$ and $\pi^{-1}$ to fully diffuse as quickly as possible, which leads to the following definition.
Definition 3. Let $\pi$ be a permutation over $2k$ elements. Denote by $DR_i(\pi)$ the minimum number of rounds $r$ such that $X^0_i$ fully diffuses after $r$ rounds in $\mathrm{GFN}^k_\pi$. The diffusion round of a permutation $\pi$ is:
$$DR_{\max}(\pi) = \max\Big(\max_{0 \leq i \leq 2k-1} DR_i(\pi),\ \max_{0 \leq i \leq 2k-1} DR_i(\pi^{-1})\Big).$$
This definition gives the same importance to the total diffusion of both $\pi$ and $\pi^{-1}$. Definition ?? defines a natural partial order on the permutations: a permutation $\pi_1$ is better (at diffusing) than a permutation $\pi_2$ if $DR_{\max}(\pi_1) \leq DR_{\max}(\pi_2)$. Searching directly for the best permutations (for the diffusion) can be difficult. As a result, the methodology we adopt in this work is to search for permutations that diffuse totally in the forward direction and then check whether their respective inverses also diffuse totally.
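The diffusion round of a small permutation can be computed directly by propagating dependency sets, which is a convenient sanity check for the definitions above. The sketch below is not the search method of this paper: the function names are hypothetical, blocks are tracked as 64-bit masks (so $2k \leq 64$), the block at position $i$ is assumed to move to position $\pi(i)$, and a cap `max_rounds` is added as an assumption so that permutations which never diffuse return $-1$.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Smallest r such that every block after r rounds depends on every input
// block, or -1 if this does not happen within max_rounds.
int diffusion_round(const std::vector<int>& pi, int max_rounds = 256) {
    const int n = static_cast<int>(pi.size());  // n = 2k blocks
    assert(n > 0 && n % 2 == 0 && n <= 64);
    const std::uint64_t full = (n == 64) ? ~0ULL : ((1ULL << n) - 1);
    std::vector<std::uint64_t> dep(n), next(n);
    for (int i = 0; i < n; ++i) dep[i] = 1ULL << i;  // X^0_i depends on itself only
    for (int r = 1; r <= max_rounds; ++r) {
        for (int j = 0; j < n; j += 2) {
            next[pi[j]] = dep[j] | dep[j + 1];  // X_{2j} xor F(X_{2j+1})
            next[pi[j + 1]] = dep[j + 1];       // X_{2j+1} is unchanged
        }
        dep.swap(next);
        bool all_full = true;
        for (int i = 0; i < n; ++i) all_full = all_full && (dep[i] == full);
        if (all_full) return r;
    }
    return -1;
}

// Inverse permutation, used for the decryption direction.
std::vector<int> inverse(const std::vector<int>& pi) {
    std::vector<int> inv(pi.size());
    for (std::size_t i = 0; i < pi.size(); ++i) inv[pi[i]] = static_cast<int>(i);
    return inv;
}

// DR_max: the worse of the encryption and decryption directions.
int diffusion_round_max(const std::vector<int>& pi, int max_rounds = 256) {
    const int forward = diffusion_round(pi, max_rounds);
    const int backward = diffusion_round(inverse(pi), max_rounds);
    if (forward < 0 || backward < 0) return -1;
    return std::max(forward, backward);
}
```

For instance, the classical 2-block Feistel corresponds to `{1, 0}`, and the 4-block cyclic shift to `{1, 2, 3, 0}`.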

Even-odd Permutations
A naive way to search for optimal permutations would be to simply go through all of them and check the diffusion one permutation at a time. However, there are $(2k)!$ permutations, which quickly grows beyond practical means. For example with $2k = 32$, approximately $2^{117}$ permutations would have to be checked. To reduce the number of permutations to be tested, we restrict ourselves to a specific class of permutations and give an equivalence relation which further reduces the number of permutations to be considered.
In [?], Suzaki and Minematsu did an exhaustive search for $1 \leq k \leq 8$, and made the observation that every optimal permutation (for such $k$) mapped even-numbered input blocks to odd-numbered output blocks and vice versa. We call such permutations even-odd. In the rest of this paper, we will use the following notation for even-odd permutations. An even-odd permutation $\pi$ of size $2k$ will be denoted by the pair of permutations $(p, q)$ of size $k$ verifying, for all $i \in [0, k-1]$, $\pi(2i) = 2 \cdot p(i) + 1$ and $\pi(2i + 1) = 2 \cdot q(i)$. The search space is now reduced to $(k!)^2$ permutations. According to this, [?] gives the following lower bound on the diffusion round of even-odd permutations $(p, q)$.
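Proposition 1. Let $\pi = (p, q)$ be an even-odd permutation over $2k$ elements, and $i$ be the smallest integer such that $F_i \geq k$, where $F_i$ denotes the $i$-th Fibonacci number (with $F_1 = F_2 = 1$). Then $DR_{\max}(\pi) \geq i + 1$.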
For a given permutation $\pi$, if the inequality is tight, we say that $\pi$ is tight. A proof of this proposition already exists in both [?] and [?]. Building on our characterization, we will give another proof of this proposition in Section ??. We will also show in Section ?? that this bound is tight for the cases $2k = 28, 30, 32, 36$ and that the inequality is strict for $2k = 34, 38, 40, 42$.
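The $(p, q)$ notation and the Fibonacci bound are easy to turn into code. The following is a minimal sketch with hypothetical function names; it assumes the indexing $F_1 = F_2 = 1$, which is the one consistent with the case $14 \leq k \leq 21$ giving a bound of 9 rounds.

```cpp
#include <cassert>
#include <vector>

// Build the even-odd permutation pi over 2k elements from the pair (p, q),
// following pi(2i) = 2*p(i) + 1 and pi(2i + 1) = 2*q(i).
std::vector<int> make_even_odd(const std::vector<int>& p, const std::vector<int>& q) {
    assert(p.size() == q.size());
    const int k = static_cast<int>(p.size());
    std::vector<int> pi(2 * k);
    for (int i = 0; i < k; ++i) {
        pi[2 * i] = 2 * p[i] + 1;
        pi[2 * i + 1] = 2 * q[i];
    }
    return pi;
}

// Lower bound on the diffusion round for 2k blocks: i + 1, where i is the
// smallest index such that F_i >= k (assuming F_1 = F_2 = 1).
int fibonacci_bound(int k) {
    assert(k >= 1);
    long long f_i = 1, f_next = 1;  // F_1 and F_2
    int i = 1;
    while (f_i < k) {
        const long long f_after = f_i + f_next;
        f_i = f_next;
        f_next = f_after;
        ++i;
    }
    return i + 1;
}
```

With $p$ the identity and $q$ the transposition on two elements, `make_even_odd` returns the cyclic shift over 4 blocks, and `fibonacci_bound(16)` gives the 9-round bound discussed for 32 blocks.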

Equivalence Classes of Even-odd Permutations
To further reduce the size of the search space, as in [?], we use some equivalence classes, given by the following definition.
Definition 4. Let $\pi$ and $\pi'$ be two even-odd permutations over $2k$ elements. We say that $\pi$ and $\pi'$ are equivalent if there exists a permutation $\varphi$ over $2k$ elements such that
$$\pi' = \varphi \circ \pi \circ \varphi^{-1}.$$
From [?], we can then give a set of permutations $P_k$ such that for any equivalence class, there exists at least one $\pi \in P_k$ which belongs to this class. This effectively gives us a set of class representatives (a few of which are redundant), and this set can be built from the following proposition, proven in [?]. Recall that any permutation can be decomposed into a composition of cycles. We call cycle structure the unordered collection of the lengths of these cycles; for example, the permutation (0 1 2 3)(4 5)(6 7)(8) has a cycle structure of {4, 2, 2, 1}.
Proposition 2. Let $P_k$ be a set of even-odd permutations $\pi = (p, q)$ over $2k$ elements constructed as follows. For each possible cycle structure $c$ of a permutation over $k$ elements, pick one permutation $p$ which has a cycle structure equal to $c$. Then, for every permutation $q$ over $k$ elements, add $(p, q)$ to the set $P_k$. By doing so, $P_k$ contains at least one representative of each equivalence class induced by Definition ??. Moreover, $P_k$ contains exactly $N_k \cdot k!$ elements, where $N_k$ is the number of partitions of the integer $k$.
This allows us to only consider $N_k \cdot k!$ permutations instead of $(k!)^2$. This is a significant improvement: for example with $k = 16$, there are only $231 \times 16! \approx 2^{52}$ permutations to go through, instead of $(16!)^2 \approx 2^{88}$. However, when $k$ grows, this is still too large for an exhaustive search. As such, we propose in Section ?? an efficient search algorithm to find all optimal even-odd permutations for a given $k$, without needing to do an exhaustive search.
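The sizes quoted above can be reproduced with a few lines of C++. This is only a sketch with hypothetical function names: it counts integer partitions with the classical dynamic program and evaluates the two search-space sizes on a $\log_2$ scale through `std::lgamma`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Number of partitions of k (N_k in the text), i.e. the number of possible
// cycle structures of a permutation over k elements.
std::uint64_t partition_count(int k) {
    std::vector<std::uint64_t> ways(k + 1, 0);
    ways[0] = 1;
    for (int part = 1; part <= k; ++part)
        for (int s = part; s <= k; ++s) ways[s] += ways[s - part];
    return ways[k];
}

// log2(k!) computed through the log-gamma function.
double log2_factorial(int k) { return std::lgamma(k + 1.0) / std::log(2.0); }

// log2 of N_k * k!, the size of the set of class representatives.
double log2_reduced_space(int k) {
    return std::log2(static_cast<double>(partition_count(k))) + log2_factorial(k);
}

// log2 of (k!)^2, the number of even-odd permutations.
double log2_even_odd_space(int k) { return 2.0 * log2_factorial(k); }
```

For $k = 16$ this gives $N_{16} = 231$ and exponents of roughly 52 and 88, in line with the figures in the text.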

Characterization of Full Diffusion
In this section, we explain our strategy to search for a tight even-odd permutation, that is, a permutation with a diffusion round reaching the Fibonacci bound given in Proposition ??. We first give an algebraic characterization for a permutation to have full diffusion, then give an algorithm to exploit this characterization and quickly search all such permutations. Note that here we only focus on the diffusion round of the permutation when considering encryption. That is, for a given permutation $\pi$, we focus only on $DR(\pi) = \max_{0 \leq i \leq 2k-1} \{DR_i(\pi)\}$. Then, once we have found a permutation reaching the Fibonacci bound, we can easily check whether $\pi^{-1}$ also reaches this bound, and if that is the case, we have found a tight permutation.
We describe here the main tools we used to design our search algorithm. Note that for two permutations $p, q$, we denote the composition $p \circ q$ by $pq$ for readability. We begin with the following proposition.
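Proposition 3. Let $\pi = (p, q)$ be an even-odd permutation over $2k$ elements. Then $\pi$ achieves full diffusion after $r$ rounds if and only if each block $X^0_j$ is diffused to at least one block of each pair at the input of the $(r-1)$-th round, i.e. diffused to either $X^{r-1}_{2i}$ or $X^{r-1}_{2i+1}$ for every $i \in \{0, \dots, k-1\}$.
Proof. Suppose that a given block $X^0_i$ has been fully diffused, i.e. to every block $X^r_{2j}$ and $X^r_{2j+1}$. Each odd block $X^r_{2j'+1}$ with $j' = p(j)$ only depends on the $j$-th pair at the input of the previous round, so $X^0_i$ must have diffused to at least one block of each such pair.
On the other hand, suppose that a given block $X^0_i$ has diffused to an even block $X^{r-1}_{2j}$; then, after the Feistel step, $X^0_i$ is diffused only to the block at position $2j$. If $X^0_i$ has diffused to an odd block $X^{r-1}_{2j+1}$, it is diffused to both positions $2j$ and $2j+1$ after the Feistel step. In both cases, it is diffused to position $2j$ at the input of $\pi$, then to $X^r_{2j'+1}$ with $j' = p(j)$, and finally to both positions $2j'$ and $2j'+1$. Thus, if for all $j \in \{0, \dots, k-1\}$, $X^0_i$ is diffused to any block of the $j$-th pair at the input of the $(r-1)$-th round, it will be diffused to every block of positions $2p(j)$ and $2p(j)+1$, and since $p$ is a permutation, this means that we have full diffusion for $X^0_i$.
Proof. From the proof of the previous proposition, we can easily see that a block $X^0_j$ diffusing to either block of a pair at the input of round $r-1$ amounts to it diffusing to the even block of that pair at the input of $\pi$. Moreover, it is enough to consider even blocks: $X^0_{2j+1}$ always diffuses to the same even block at the input of $\pi$ as $X^0_{2j}$, so $X^0_{2j+1}$ fully diffuses whenever $X^0_{2j}$ does.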
Thus we only need to focus on the diffusion of each block $X^0_{2j}$ to each block $X^{r-1}_{2j'}$. Now we can take a look at what would happen in an ideal scenario. Assume that we are studying the diffusion of a block $X^0_{2j}$. This block first diffuses to a single pair, then again to a single pair one round later, and then to two pairs, which we index by $j^3_0$ and $j^3_1$. Assuming an ideal scenario, we would have $j^3_0 \neq j^3_1$, i.e. $X^0_{2j}$ has diffused to two different blocks after 4 rounds (minus the application of $\pi$ on the fourth round). We can then keep going and get a series of $j^i_l$ which gives us the blocks to which $X^0_{2j}$ has diffused after $i + 1$ rounds minus the last application of $\pi$, always assuming that we never have $j^i_l = j^i_{l'}$ for $l \neq l'$. The propagation for up to 7 rounds is given in Figure ??.
However, we cannot have $j^i_l \neq j^i_{l'}$ with $l \neq l'$ forever. Indeed, since we only have $k$ blocks, we are bound at some point to have $j^i_l = j^i_{l'}$ and $l \neq l'$. However, we can easily compute the actual value of each $j^i_l$: following the corresponding path in Figure ??, each of them is the image of $j$ through a product of $p$ and $q$. Denote by $J^i_j$ the set of equations obtained by expressing every $j^i_l$ that way. For example, we would have $J^3_j = \{p^2(j), (qp)(j)\}$. According to this, we can give a generic way to compute $J^i_j$. We start with $J^1_j = \{j\}$ and $J^2_j = \{p(j)\}$. Then, for $i \geq 3$, for every $x \in J^{i-1}_j$ we add $p(x)$ to $J^i_j$, and for every $x \in J^{i-1}_j$ such that $x$ can be written as $x = p(y)$ for some $y \in J^{i-2}_j$, we also add $q(x)$ to $J^i_j$.
We can justify this construction as follows. Suppose that a given $j'$ belongs to $J^{i-2}_j$ because $X^0_{2j}$ diffuses to $X^{i-2}_{2j'}$. Then $X^0_{2j}$ diffuses to both $X^{i-1}_{2j''}$ and $X^{i-1}_{2j''+1}$ with $j'' = p(j')$. Thus for the next round, $X^0_{2j}$ will diffuse to both $X^i_{2\tilde{j}+1}$ and $X^i_{2\hat{j}}$, with $\tilde{j} = p(j'')$ and $\hat{j} = q(j'')$. On the other hand, suppose that $j'$ belongs to $J^{i-2}_j$ because $X^0_{2j}$ diffuses to $X^{i-2}_{2j'+1}$. In that case, $X^0_{2j}$ will only diffuse to $X^{i-1}_{2j''}$ with $j'' = q(j')$. For the next round, $X^0_{2j}$ only diffuses to $X^i_{2\tilde{j}+1}$ with $\tilde{j} = p(j'')$.
Thus in both cases, we need to have $\tilde{j} = p(j'')$, but we only require $\hat{j} = q(j'')$ in the first case, which corresponds exactly to the case where the previous term was obtained through a composition by $p$.
Note that from this construction, we can deduce the following proposition: for any $j$ and any $i \geq 1$, the set $J^i_j$ contains $F_i$ terms, where $F_i$ is the $i$-th Fibonacci number.
Proof. We can prove this by induction. Both $J^1_j$ and $J^2_j$ are of size 1, which corresponds to $F_1$ and $F_2$. We first add an element $p(x)$ for each $x \in J^{i-1}_j$, which gives $F_{i-1}$ elements. We then add an element $q(x)$ for each $x \in J^{i-1}_j$ that can be written as $x = p(y)$ with $y \in J^{i-2}_j$. However, according to our construction, $J^{i-1}_j$ contains such an element $x = p(y)$ for every term $y \in J^{i-2}_j$. Thus, there are $F_{i-2}$ such terms. In the end, $J^i_j$ contains $F_{i-1} + F_{i-2} = F_i$ elements, which concludes the induction.
We can now use those sets $J^i_j$ to fully characterize the fact that a block fully diffuses when using a given permutation: the block $X^0_{2j}$ fully diffuses after $i + 1$ rounds if and only if $J^i_j$ contains all numbers from $0$ to $k - 1$.
According to this characterization, a permutation can only reach full diffusion after $i + 1$ rounds if $F_i \geq k$, since $J^i_j$ contains at most $F_i$ distinct values. Hence, if $i$ is the smallest integer such that $F_i \geq k$, this exactly means that $DR_{\max}(\pi) \geq i + 1$.
Note that from the construction of any $J^i_j$ with $i \geq 2$, each term starts with a composition by $p$, i.e. $p$ is the first permutation applied to $j$. Since $p$ is a permutation, and we want full diffusion for every block, we can remove this first $p$ from every term to get a smaller representation. Essentially, this means that we are considering the diffusion of the block $p^{-1}(j)$, but we will still write $J^i_j$. As such, $J^6_j$ for example is rewritten as
$$J^6_j = \{p^4(j),\ (p^3q)(j),\ (p^2qp)(j),\ (pqp^2)(j),\ (pqpq)(j),\ (qp^3)(j),\ (qp^2q)(j),\ (qpqp)(j)\}.$$
To illustrate the previous characterization, we introduce what we call the diffusion table (of rank $i$) of an even-odd permutation $(p, q)$ of size $2k$. The columns are indexed by the numbers from $0$ to $k - 1$ and the rows are indexed by the products of $p$ and $q$ used to generate all sets $J^i_j$. Each cell of the table is the value obtained by applying the permutation indexing the row to the value indexing the column of the cell. For example, the cell indexed by $p^i$ and $0$ contains $p^i(0)$. This provides a clear visualization of our characterization, as the $j$-th column is exactly $J^i_j$. Thus, we can easily illustrate Corollary ?? by verifying that every column of this table contains every possible value. We thus add one more row at the end of the diffusion table, called diff, which contains the number of different values in a column. By construction, this is exactly the number of elements of $J^i_j$ where $j$ is the index of the column. In tables constructed as described, the full diffusion of a permutation corresponds to a diff row containing only the value $k$.
Finally, we can reformulate the problem of finding optimal even-odd permutations with these tables. Indeed, it corresponds to finding the minimal $i$ and the even-odd permutations of size $2k$ whose diffusion table of rank $i$ has a diff row containing only the value $k$.
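The construction of the sets $J^i_j$ and of the diff row can be sketched in C++ as follows. This is an illustration under assumptions rather than the implementation used for the results: the function names are hypothetical, the reduced representation (leading $p$ removed) is used, and each term carries a flag recording whether its last applied permutation is $p$, which is what decides whether a composition by $q$ is also added at the next rank.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Perm = std::vector<int>;

// Values of the column J^rank_j in the reduced representation.
// The result has F_rank entries, possibly with repetitions.
std::vector<int> column_terms(const Perm& p, const Perm& q, int j, int rank) {
    assert(rank >= 2);
    // Each term is (value, last_applied_is_p). At rank 2 the only term is "p",
    // which becomes the identity once the leading p is removed.
    std::vector<std::pair<int, bool> > terms;
    terms.push_back(std::make_pair(j, true));
    for (int i = 3; i <= rank; ++i) {
        std::vector<std::pair<int, bool> > next;
        for (std::size_t t = 0; t < terms.size(); ++t) {
            const int v = terms[t].first;
            next.push_back(std::make_pair(p[v], true));  // always compose with p
            if (terms[t].second)                         // q only after a p-term
                next.push_back(std::make_pair(q[v], false));
        }
        terms.swap(next);
    }
    std::vector<int> values;
    for (std::size_t t = 0; t < terms.size(); ++t) values.push_back(terms[t].first);
    return values;
}

// The diff row: number of distinct values in each column of the table.
std::vector<int> diff_row(const Perm& p, const Perm& q, int rank) {
    std::vector<int> diff;
    for (int j = 0; j < static_cast<int>(p.size()); ++j) {
        const std::vector<int> col = column_terms(p, q, j, rank);
        diff.push_back(static_cast<int>(std::set<int>(col.begin(), col.end()).size()));
    }
    return diff;
}

// Full diffusion at this rank: every entry of the diff row equals k.
bool table_is_full(const Perm& p, const Perm& q, int rank) {
    const std::vector<int> diff = diff_row(p, q, rank);
    for (std::size_t j = 0; j < diff.size(); ++j)
        if (diff[j] != static_cast<int>(p.size())) return false;
    return true;
}
```

For $k = 2$ with $p$ the identity and $q$ the transposition, the table of rank 3 is already full, which matches a diffusion round of 4 for the 4-block cyclic shift.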

Efficient Search Algorithm
First, we can see that our characterization can be very efficiently implemented, as testing whether $\pi = (p, q)$ has full diffusion mostly requires only a few table lookups. An example of an implementation of this test for 9 rounds is given in Appendix ??, and its efficiency allowed us to recover all optimal even-odd permutations for $k \leq 13$ with a basic exhaustive search. In particular, for $k = 13$, we were able to go through all $N_{13} \cdot 13! \approx 2^{39}$ permutations and check them in about 410 minutes on a single core. While these optimal permutations were already known, it shows that the sophisticated techniques introduced in [?] were not necessary for these cases.
However, for $k \geq 14$, this exhaustive search becomes too expensive. We thus focus on finding optimal even-odd permutations for $14 \leq k \leq 21$, for which a permutation reaching the Fibonacci bound would have a diffusion round of 9. Given a cycle structure for $p$, we can easily find a permutation $p$ with such a structure, and thus we need to search for $q$ such that $\pi = (p, q)$ needs 9 rounds to reach full diffusion, i.e. such that each $J^8_j$ contains all numbers from $0$ to $k - 1$. Note that we cannot exploit $J^8_j$ directly. Indeed, one might want to guess parts of $q$ and check that $J^8_j$ does not contain too many duplicates. However, to fully compute $J^8_j$, we need to guess $q$ in its entirety, which makes this strategy too expensive. We thus describe an efficient way to exploit this characterization to find optimal even-odd permutations.
First, for a given $j$, if we take a look at $J^6_j$, we can see that we need to make only 7 guesses over the images of $q$ to fully compute $J^6_j$. Indeed, we need to know $q(j)$, $(qp)(j)$, $(qp^2)(j)$, $(qp^3)(j)$, $(qpq)(j)$, $(qp^2q)(j)$ and $(qpqp)(j)$.
Let $X^6_j$ and $Y^6_j$ be two subsets of $J^6_j$, such that $X^6_j \cup Y^6_j = J^6_j$, with
$$X^6_j = \{p^4(j),\ (p^3q)(j),\ (p^2qp)(j),\ (pqp^2)(j),\ (pqpq)(j)\}, \qquad Y^6_j = \{(qp^3)(j),\ (qp^2q)(j),\ (qpqp)(j)\},$$
i.e. $X^6_j$ contains the terms whose last applied permutation is $p$ and $Y^6_j$ those whose last applied permutation is $q$. According to the construction of $J^8_j$, we can actually write
$$J^8_j = p^2(X^6_j \cup Y^6_j) \cup (pq)(X^6_j) \cup (qp)(X^6_j \cup Y^6_j).$$
Assume that we made the 7 guesses mentioned above. In that case, we know the exact values in both $X^6_j$ and $Y^6_j$. Moreover, since $p$ is known, we know exactly the values in $p^2(X^6_j \cup Y^6_j)$. Finally, since we guessed 7 images of $q$, there might be some values in $(pq)(X^6_j)$ and $(qp)(X^6_j \cup Y^6_j)$ that are known. Hence, we create three sets $K_j$, $\bar{X}^6_j$ and $\bar{Y}^6_j$:
• $K_j$ is the set of all known values of $J^8_j$. Thus $p^2(X^6_j \cup Y^6_j) \subset K_j$, and there might be a few elements from $(pq)(X^6_j)$ and $(qp)(X^6_j \cup Y^6_j)$ in $K_j$ too.
• $\bar{X}^6_j$ is the subset of $X^6_j$ such that for any $x \in \bar{X}^6_j$, the value of $q(x)$ remains to be determined.
• In the same way, $\bar{Y}^6_j$ is the subset of $p(X^6_j \cup Y^6_j)$ such that for any $x \in \bar{Y}^6_j$, the value of $q(x)$ is not determined.
For $j$ to be fully diffused, we thus have the constraint
$$C_j :\ K_j \cup (pq)(\bar{X}^6_j) \cup q(\bar{Y}^6_j) = \{0, \dots, k-1\}.$$
We then check if this constraint is valid, i.e. if there exist some guesses for the remaining images of $q$ such that $C_j$ holds; this is described in the next section. Now if we take a look at $J^6_{j'}$ where $j' = p(j)$, we can see that we only need 3 more guesses to compute it, instead of 7 as before. Indeed, we already guessed
$$q(j') = (qp)(j), \quad (qp)(j') = (qp^2)(j), \quad (qp^2)(j') = (qp^3)(j), \quad (qpq)(j') = (qpqp)(j),$$
and thus it only remains to guess
$$(qp^4)(j) = (qp^3)(j'), \quad (qp^2qp)(j) = (qp^2q)(j'), \quad (qpqp^2)(j) = (qpqp)(j').$$
By doing these guesses, we can build the sets $K_{j'}$, $\bar{X}^6_{j'}$ and $\bar{Y}^6_{j'}$ as before, and thus get another constraint $C_{j'}$ that needs to be checked. However, by making those three new guesses, we might also determine $q(x)$ for some elements $x$ of $\bar{X}^6_j$ and $\bar{Y}^6_j$. We thus need to update the constraint $C_j$ according to these guesses, and then check again whether $C_j$ is valid.
This can be repeated until we have fully guessed $q$, in which case we have a solution, or until we have shown that no choice of guesses satisfies all constraints. This is the core of our algorithm, which is described from a high-level point of view in Algorithm ??.

Algorithm 1 Searching for optimal even-odd permutations over 9 rounds
    ...
    Guess $q(j)$, $(qp)(j)$, $(qp^2)(j)$, $(qp^3)(j)$, $(qpq)(j)$, $(qp^2q)(j)$ and $(qpqp)(j)$
    Deduce the constraint $C_j$
    if $C_j$ is a valid constraint then
        ...

Note however that the actual algorithm is a bit more sophisticated. Indeed, it might occur at some point that $p(j)$ was already processed, i.e. $C_{p(j)}$ is already a constraint we have. When this happens, we need to choose another starting block $j$ and re-apply the algorithm, while still keeping all previously computed constraints. In practice, we found that the most efficient strategy is to use an element from the shortest cycle of $p$ as the first starting block. Then, if we need to choose another starting block, we pick an element in the next shortest cycle of $p$, and so on. Moreover, when making some guesses for the images of $q$, it might happen that we already made this guess. This is not a problem, as this guess basically becomes free and does not add any more cost. Finally, except for the first seven guesses, we update and check all constraints after each guess.

Checking the Constraints
We first give a naive way to check if a constraint is valid. We are given three sets $K$, $X$ and $Y$, resulting in the constraint
$$C :\ K \cup (pq)(X) \cup q(Y) = \{0, \dots, k-1\}.$$
We know the full permutation $p$, and for any $x \in X \cup Y$, $q(x)$ is still unknown. Let $A$ denote the set of values $a$ for which we still do not know the preimage of $a$ through $q$, i.e. for any $a \in A$, we do not know which $x$ results in $q(x) = a$. Considering the guesses we already made on $q$, we always know this set $A$, and thus have the following two relations: $(pq)(X) \subset p(A)$ and $q(Y) \subset A$. According to this, we can write
$$|K \cup (pq)(X) \cup q(Y)| \leq |K \cup A \cup p(A)|.$$
Hence if $|K \cup A \cup p(A)| < k$, we know that the constraint $C$ cannot be valid. However, we can actually go further and get more precise information by doing the following.
We can formulate our problem in the following generic way. We are given three sets $K$, $A$ and $B$ $(= p(A))$, and we search for two sets $\tilde{A} \subset A$ and $\tilde{B} \subset B$ such that $|K \cup \tilde{A} \cup \tilde{B}|$ is maximal, with $\tilde{A} = q(Y)$ and $\tilde{B} = (pq)(X)$. Note that, since $p$ and $q$ are permutations, we have $|\tilde{A}| = |Y|$ and $|\tilde{B}| = |X|$. Hence our idea is to determine whether there is at least one such pair $(\tilde{A}, \tilde{B})$ satisfying $|K \cup \tilde{A} \cup \tilde{B}| \geq k$. Indeed, if no such pair exists, then the constraint $C$ does not hold. Note that if $X \cap Y \neq \emptyset$, then it is possible for such a pair to exist while $C$ does not hold. However, we found this filter powerful enough for our needs.
We can partition $K \cup A \cup B$ into eight disjoint sets $S_0, \dots, S_7$ according to membership in $K$, $A$ and $B$. The ones that matter in the following are
$$S_0 = K \cap A \cap B,\quad S_1 = (A \cap B) \setminus K,\quad S_2 = (K \cap B) \setminus A,\quad S_3 = (K \cap A) \setminus B,\quad S_4 = B \setminus (K \cup A),\quad S_5 = A \setminus (K \cup B).$$
Let $k_A$ (resp. $k_B$) denote the cardinality of $\tilde{A}$ (resp. $\tilde{B}$), and $k^i_A$, $k^i_B$ be such that $k^i_A = |\tilde{A} \cap S_i|$ and $k^i_B = |\tilde{B} \cap S_i|$. Since all $S_i$ are disjoint, $\tilde{A} \subset A$ and $\tilde{B} \subset B$, notice that we have
$$k_A = k^0_A + k^1_A + k^3_A + k^5_A, \qquad k_B = k^0_B + k^1_B + k^2_B + k^4_B.$$
By selecting the two sets $\tilde{A} \cap S_1$ and $\tilde{B} \cap S_1$ as disjoint as possible we have:
$$|K \cup \tilde{A} \cup \tilde{B}| = |K| + k_A + k_B - (k^0_A + k^3_A) - (k^0_B + k^2_B) - \max(k^1_A + k^1_B - |S_1|, 0).$$
Indeed, first we have at most $|K| + k_A + k_B$ elements in $K \cup \tilde{A} \cup \tilde{B}$. However, among all those elements, some might be the same, which explains the remaining terms:
• Elements of $\tilde{A}$ and $\tilde{B}$ included in $S_0$, $S_2$ or $S_3$ are duplicates since they all belong to $K$.
• We need to take $k^1_A$ (resp. $k^1_B$) elements from $A$ (resp. from $B$), where all those elements belong to $S_1$. We thus have two cases. If $k^1_A + k^1_B \leq |S_1|$, we can freely choose all those elements without having duplicates between $\tilde{A}$ and $\tilde{B}$. For example, if we have $k^1_A = k^1_B = 1$ and $S_1 = \{0, 1, 2\}$, then we can put 0 in $\tilde{A}$ and 1 in $\tilde{B}$, thus resulting in no duplicates between $\tilde{A}$ and $\tilde{B}$. However, if we have $k^1_A + k^1_B > |S_1|$, then no matter what, we will have duplicates. Thus in the best case, we have $\max(k^1_A + k^1_B - |S_1|, 0)$ duplicates that we need to count out.
Hence, maximizing $|K \cup \tilde{A} \cup \tilde{B}|$ is straightforward, as there is one specific order in which to find the values of $k^i_A$ and $k^i_B$ that always maximizes the size of the union. We only give the way to optimally build $\tilde{A}$ since it is fully similar for $\tilde{B}$:
• First, using elements from $S_5$ to build $\tilde{A}$ does not add any duplicate, thus we first pull elements from $S_5$ and $k^5_A = \min(k_A, |S_5|)$.
• As mentioned above, using one element from $S_1$ adds either zero or one duplicate, thus we then pull elements from $S_1$ and $k^1_A = \min(k_A - k^5_A, |S_1|)$.
• Finally, elements from either $S_0$ or $S_3$ necessarily add duplicates, so we freely choose $k^0_A$ and $k^3_A$ such that $k^0_A + k^3_A = k_A - k^5_A - k^1_A$.
Finally, computing the maximal value for $|K \cup \tilde{A} \cup \tilde{B}|$ only requires computing $|S_1|$, $|S_4|$ and $|S_5|$, and we then check how it compares to $k$.
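The greedy evaluation described above fits in a few lines. The following C++ sketch is an illustration only: the function names are hypothetical, sets are encoded as 64-bit masks (so $k \leq 64$), and `nA`, `nB` stand for the number of elements to pick inside $A$ and $B$ respectively. After the greedy choice, the maximal size simplifies to $|K| + k^5_A + k^4_B + \min(k^1_A + k^1_B, |S_1|)$.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of set bits, i.e. the cardinality of a set stored as a bitmask.
int popcount64(std::uint64_t x) {
    int count = 0;
    for (; x != 0; x &= x - 1) ++count;
    return count;
}

// Maximal size of K ∪ A' ∪ B' over all subsets A' of A with nA elements
// and B' of B with nB elements.
int max_union_size(std::uint64_t K, std::uint64_t A, std::uint64_t B, int nA, int nB) {
    assert(nA >= 0 && nA <= popcount64(A) && nB >= 0 && nB <= popcount64(B));
    const int s1 = popcount64(A & B & ~K);   // in A and B but not in K
    const int s4 = popcount64(B & ~A & ~K);  // only in B
    const int s5 = popcount64(A & ~B & ~K);  // only in A
    // Elements found only in A (resp. only in B) never create duplicates.
    const int k5A = std::min(nA, s5);
    const int k4B = std::min(nB, s4);
    // Then use the shared region S_1, kept as disjoint as possible.
    const int k1A = std::min(nA - k5A, s1);
    const int k1B = std::min(nB - k4B, s1);
    // Whatever remains has to come from K and adds nothing to the union.
    return popcount64(K) + k5A + k4B + std::min(k1A + k1B, s1);
}

// Filter: the constraint can only hold if the maximal union reaches k.
bool constraint_may_hold(std::uint64_t K, std::uint64_t A, std::uint64_t B,
                         int nA, int nB, int k) {
    return max_union_size(K, A, B, nA, nB) >= k;
}
```

A `false` answer lets the search discard the current partial guess of $q$ immediately, while a `true` answer only means that the branch cannot be pruned yet.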

Results
We ran our algorithm for every k such that we need at least 9 rounds to have full diffusion, according to Proposition ??. This corresponds to 14 ≤ k ≤ 21, and we were able to find all optimal even-odd permutations for k ∈ {14, 15, 16, 18}. For k ∈ {17, 19, 20, 21}, our algorithm allowed us to prove that there is no even-odd permutation leading to a full diffusion after 9 rounds. Since 9 rounds correspond to the Fibonacci bound, we know that for these cases, we need at least 10 rounds to have full diffusion, and we give later in this section an optimal solution for k = 17 reaching full diffusion in 10 rounds, as well as good permutations for k = 19, 20, 21 with a diffusion round of 11. We can thus give the following theorem to summarize our results.

Theorem 2. To build a Generalized Feistel Network $\mathrm{GFN}^k_\pi$ with full diffusion where $\pi$ is an even-odd permutation, we have:
• For $k = 14, 15, 16$ and $18$, the optimal number of rounds for full diffusion is 9.
• For $k = 17$, the optimal number of rounds for full diffusion is 10.
• For $k = 19, 20$ and $21$, the optimal number of rounds for full diffusion is at least 10 and at most 11.
We give in Table ?? an overview of our results. For each value of $k$, the first column gives the total time needed for our algorithm to either exhaust all optimal even-odd permutations, or prove that no such permutation exists. Note that this is the total CPU time, i.e. the time when using a single CPU; however, our algorithm is highly parallelizable and thus the real time can be drastically reduced. This shows that our algorithm is extremely efficient, as it can quickly solve the case $k = 16$ for which [?] were not able to give an optimal solution. The second (resp. third) column gives the possible cycle structures of $p$ (resp. $q$) in an optimal permutation, and the last column gives the number of solutions which have this structure. We can notice that not only is the number of solutions quite low, but the number of possible cycle structures is also quite limited. Moreover, we always have a fixed point in either $p$ or $q$.

k    Time        Cycle structure of p    Cycle structure of q    Solutions
14   ...         (6, 6, 1, 1)            (6, 6, 2)               144
                 (6, 6, 2)               (6, 6, 1, 1)            144
                 (6, 3, 2, 2, 1)         (6, 3, 2, 2, 1)         144
                 (12, 2)                 (12, 1, 1)              24
                 (12, 1, 1)              (12, 2)                 24
15   480 min     (10, 2, 2, 1)           (10, 2, 2, 1)           160
16   1023 min    (6, 6, 3, 1)            (6, 6, 3, 1)            432
                 (6, 6, 2, 2)            (6, 3, 3, 2, 1, 1)      288
                 (6, 3, 3, 2, 1, 1 ...

The most important result in this table is that there are actually even-odd permutations which have full diffusion after 9 rounds for $k = 16$, while both [?] and [?] could only find a permutation with full diffusion after 10 rounds, leaving open the question of whether the theoretical bound of 9 rounds (from Proposition ??) could be reached. Our results show that it is indeed possible, which proves that our permutations are optimal when considering even-odd permutations. We will see in the next section that we can further regroup these permutations into more precise equivalence classes, leading for the case $k = 16$ to four equivalence classes, given in Table ??.
Then we have $\pi' = \varphi \circ \pi \circ \varphi^{-1}$. Indeed, this can be checked on the image of an even number $2i$ and, in the same way, on the image of an odd number $2i + 1$. Hence, $\pi$ and $\pi'$ are conjugate and thus equivalent, according to Definition ??.

B Efficient Implementation to Test 9 Round Full Diffusion
We give an example of a C++ implementation of the characterization for a permutation to have full diffusion over 9 rounds. This function takes powerp and q as parameters, which are, respectively, the precomputed values of each power of $p$, i.e. powerp[i][j] = $p^i(j)$, and the permutation $q$.
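Such a test could look like the sketch below. It is not the authors' code: only the parameter names `powerp` and `q` come from the description above, while `full_diffusion_9_rounds` and the helper `make_powers` are hypothetical names. It assumes $k \leq 64$ and relies on the decomposition of $J^8_j$ into $p^2(X^6_j \cup Y^6_j)$, $(pq)(X^6_j)$ and $(qp)(X^6_j \cup Y^6_j)$, with the eight terms of $J^6_j$ taken in the reduced representation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Perm = std::vector<int>;

// Hypothetical helper: powers[i][j] = p^i(j) for 0 <= i <= max_power.
std::vector<Perm> make_powers(const Perm& p, int max_power) {
    std::vector<Perm> powers(max_power + 1, Perm(p.size()));
    for (std::size_t j = 0; j < p.size(); ++j) powers[0][j] = static_cast<int>(j);
    for (int i = 1; i <= max_power; ++i)
        for (std::size_t j = 0; j < p.size(); ++j) powers[i][j] = p[powers[i - 1][j]];
    return powers;
}

// True if every set J^8_j contains all values 0..k-1, i.e. if (p, q) reaches
// full diffusion after 9 rounds in the encryption direction.
bool full_diffusion_9_rounds(const std::vector<Perm>& powerp, const Perm& q) {
    const int k = static_cast<int>(q.size());
    assert(k >= 1 && k <= 64 && powerp.size() >= 5);
    const std::uint64_t full = (k == 64) ? ~0ULL : ((1ULL << k) - 1);
    const Perm& p1 = powerp[1];
    const Perm& p2 = powerp[2];
    const Perm& p3 = powerp[3];
    const Perm& p4 = powerp[4];
    for (int j = 0; j < k; ++j) {
        // Terms of J^6_j whose last applied permutation is p.
        const int x[5] = {p4[j], p3[q[j]], p2[q[p1[j]]], p1[q[p2[j]]], p1[q[p1[q[j]]]]};
        // Terms of J^6_j whose last applied permutation is q.
        const int y[3] = {q[p3[j]], q[p2[q[j]]], q[p1[q[p1[j]]]]};
        std::uint64_t seen = 0;
        for (int t : x)  // p^2, qp and pq apply to the p-terms
            seen |= (1ULL << p2[t]) | (1ULL << q[p1[t]]) | (1ULL << p1[q[t]]);
        for (int t : y)  // only p^2 and qp apply to the q-terms
            seen |= (1ULL << p2[t]) | (1ULL << q[p1[t]]);
        if (seen != full) return false;
    }
    return true;
}
```

Each column costs 21 table lookups combined into a single mask, which is the kind of cost that makes an exhaustive search over $q$ feasible for small $k$; the same check applied to the inverse permutation covers the decryption direction.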