Improved Attacks on sLiSCP Permutation and Tight Bound of Limited Birthday Distinguishers

Limited birthday distinguishers (LBDs) are widely used tools for the cryptanalysis of cryptographic permutations. In this paper we propose LBDs on several variants of the sLiSCP permutation family, which are building blocks of two round 2 candidates of the NIST lightweight standardization process: Spix and SpoC. We improve the number of steps with respect to the previously known best results, which used the rebound attack. We improve the techniques used for solving the middle part, called the inbound phase, and we relax the external conditions in order to extend the previous attacks. The lower bound of the complexity of LBDs has so far been proved only against functions. In this paper, we prove for the first time the bound against permutations, which shows that the known upper bounds are tight.


Introduction
Lightweight cryptography aims at providing efficient cryptographic primitives for highly constrained devices such as sensor networks, distributed control systems, the Internet of Things, and so on. Recently, the National Institute of Standards and Technology (NIST) initiated a lightweight cryptography standardization process [Nat19] to select and standardize several lightweight cryptographic algorithms. In April 2019, 56 algorithms were announced as round 1 candidates and in August 2019, 32 algorithms were selected as round 2 candidates. NIST had originally planned to announce about 8 round 3 candidates in September 2020, but this announcement has been delayed a few months. Given the situation, improved security analysis of round 2 candidates is very important.
Design of a cryptographic algorithm that simultaneously achieves high security and lightweight implementation properties is a challenging task. A recent trend is to design a cryptographic permutation as an underlying primitive, and to build an authenticated encryption with associated data (AEAD) with the duplex construction [BDPA11]. This approach also makes it possible to implement a cryptographic hash function with only a small overhead. In fact, NIST reported that 49% of the round 1 candidates and 50% of the round 2 candidates are based on a permutation [TMÇ + 19].
A cryptographic permutation is expected to behave as a uniformly random permutation. From an attacker's position, the goal is to find a specific behavior that differs between the target permutation and a random permutation. The attacker first specifies a certain relationship for a set of inputs and the corresponding outputs, and then compares the complexity, i.e. the computational cost and memory amount, of finding such a set for the target algorithm and for a randomly chosen permutation. A limited birthday distinguisher (LBD) [GP10] is a natural application of differential cryptanalysis to permutations. The attacker specifies a set of input differences and a set of output differences. The attacker's goal is to find a pair of texts that conform to both the input and output differences.
In this paper, we provide the cryptanalysis for sLiSCP [ARH + 17] and sLiSCP-light permutations [ARH + 18]. sLiSCP is a cryptographic permutation based on Simeck [YZS + 15]. sLiSCP was designed to be used in their sponge hash function and duplex AEAD mode. sLiSCP consists of 18 iterations of the step function that adopts a 4-branch type-2 generalized Feistel network (GFN) in which the size of each branch w is w ∈ {48, 64}. The whole permutation size is 192 bits or 256 bits, which is called sLiSCP-192 and sLiSCP-256. The step function is illustrated in the left-hand side of Fig. 1. The step function of sLiSCP-192 (resp. sLiSCP-256) computes two unkeyed 6-round Simeck48 (resp. 8-round Simeck64).
The same designers later presented a tweaked version called sLiSCP-light. The major difference from sLiSCP is that the GFN is replaced with the partial substitution permutation network (PSPN) [ARH + 18] illustrated in the right-hand side of Fig. 1. The recommended number of steps of sLiSCP-light was also reduced from 18 to 12.
sLiSCP-light is used as an underlying primitive of two round 2 candidates in NIST's standardization process. SpoC [AGH + 19a] builds an AEAD scheme with the duplex-like framework using 18-step sLiSCP-light-192 and 18-step sLiSCP-light-256 as underlying permutations. Spix [AGH + 19b] also builds an AEAD scheme with the duplex framework and 18-step sLiSCP-light-256 is used to process the key material while 9-step sLiSCP-light-256 is used to process associated data and message/ciphertext. The active usage of sLiSCP-light shows the importance of third-party security analysis.
To the best of our knowledge, there exists only a single third-party security analysis against sLiSCP [LSSW18] and no third-party analysis exists against sLiSCP-light. Liu et al. [LSSW18] provided a forgery attack and a collision attack against 6-step sLiSCP in the AEAD mode and the hash mode. In addition, an LBD was presented against the 15-step sLiSCP permutation. The designers of sLiSCP, sLiSCP-light, SpoC, and Spix also provided some cryptanalysis for the permutation, which includes impossible differential, zero-correlation and integral distinguishers against 9-step sLiSCP. After the submission of this work, Kraleva, Posteuca and Rijmen uploaded an analysis of SpoC to the Cryptology ePrint Archive [KPR20], which was later presented at the Fourth Lightweight Cryptography Workshop organized by NIST. Soon after, the SpoC team posted a message to the NIST LWC forum [Tea20] to report that the attacks and observations in [KPR20] do not pose any threats to the security of SpoC and its underlying permutation sLiSCP-light.
In the design document of SpoC [AGH + 19a] and Spix [AGH + 19b], the designers claim that they aim to provide the evidence that 18-step sLiSCP-light is secure against various distinguishing attacks to prove that its behavior is as close as possible to that of an ideal permutation. Hence we believe that improving the previous permutation distinguishers for sLiSCP and providing a new analysis on sLiSCP-light is of great interest.
Remarks on the Permutation Distinguisher Framework. As explained in [IPS13], one may argue that the LBD can trivially be solved for any permutation Π by choosing any input pair (X, Y ) and computing X ⊕ Y as the set of input difference and Π(X) ⊕ Π(Y ) as the set of the output difference. In general, those meaningless attacks are avoided by considering that a hash function is part of a family indexed by a key input (e.g. IV is replaced with a key). In our case, the attack works even if a constant in Simeck boxes is given right before the attack starts. Hence, our attacks are not meaningless.
Our Contributions. The contribution of this paper is twofold. First, we improve the best known attacks against sLiSCP and present the first third-party cryptanalysis against sLiSCP-light. Second, we prove the lower bound of the complexity to solve LBDs for a random permutation, showing that the current best known generic attack is actually tight.
Limited-Birthday Distinguishers against sLiSCP and sLiSCP-light. We first reduce the complexity of the 15-step LBDs for sLiSCP, which is computed in the rebound-like procedure [MRST09]. By carefully analyzing the computation order, we extend the number of steps that can be covered by the inbound phase, which reduces the complexity from $2^{122.7}$ to $2^{111.4}$ for sLiSCP-192 and from $2^{168.3}$ to $2^{149.6}$ for sLiSCP-256.
Even with this complexity improvement, the attack cannot be extended to 16 steps easily because the remaining degrees of freedom are insufficient to satisfy the differential propagation for another step. Here, we look into the differential characteristics for Simeck48 and Simeck64 and try to make many bits inactive by spending a small amount of degrees of freedom. This allows us to attack 16-step sLiSCP-256 with a cost of $2^{154.6}$.
For sLiSCP-light, the designers of Spix and SpoC argued that the best known distinguisher is a zero-sum distinguisher with a start-from-the-middle approach, which works up to 14 steps but requires data and time complexities equal to that of the exhaustive search. Although the designers of Spix and SpoC cited the work by Liu et al. [LSSW18], no word is given on the possibility of applying the LBDs on sLiSCP to sLiSCP-light. In this paper we formally claim, for the first time, that 15-step sLiSCP-light can be attacked, using a similar procedure to the one on sLiSCP. Besides, the replacement of the Feistel network of sLiSCP with the partial SPN allows us to attack 16 steps of sLiSCP-light-192 with a complexity of $2^{113.0}$. The comparison of the attack complexities is given in Table 1.
Tight Lower Bound for the Limited-Birthday Problem. We show that the upper bound given by the known best algorithm on the limited birthday problem for a random permutation is asymptotically tight.
As we mentioned before, the goal of an LBD is to find a pair of texts that conform to both the input and output differences specified by the attacker. More precisely, the limited birthday problem on an n-bit permutation $P$ and closed subsets $\Delta_{in}$, $\Delta_{out}$ is the problem of finding a tuple $(X, X', Y, Y')$ such that $P(X) = Y$, $P(X') = Y'$, $X \oplus X' \in \Delta_{in}$, and $Y \oplus Y' \in \Delta_{out}$. Here, a non-empty subset $\Delta \subset \{0, 1\}^n$ is closed if and only if $X \oplus Y \in \Delta$ for all $X, Y \in \Delta$. An algorithm to solve the problem is called a limited-birthday distinguisher (LBD), or simply a distinguisher.
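The definition above can be made concrete on a toy example. The following is a minimal sketch, not the authors' code: the 4-bit permutation `TOY_PERM`, the helper names, and the extra requirement that the two inputs differ are all assumptions made for illustration.

```python
from itertools import combinations

def is_closed(delta):
    """Return True if the non-empty set delta is closed under XOR."""
    return bool(delta) and all((a ^ b) in delta for a in delta for b in delta)

def is_solution(perm, x1, x2, delta_in, delta_out):
    """Check the limited-birthday conditions for the input pair (x1, x2).

    Assumption: the pair must be non-trivial (x1 != x2)."""
    return (x1 != x2
            and (x1 ^ x2) in delta_in
            and (perm[x1] ^ perm[x2]) in delta_out)

def brute_force(perm, delta_in, delta_out):
    """Exhaustively search for a tuple (X, X', Y, Y'); only viable for toy sizes."""
    for x1, x2 in combinations(range(len(perm)), 2):
        if is_solution(perm, x1, x2, delta_in, delta_out):
            return (x1, x2, perm[x1], perm[x2])
    return None

# Hypothetical 4-bit permutation used only for illustration.
TOY_PERM = [(5 * x + 3) % 16 for x in range(16)]

if __name__ == "__main__":
    delta_in = {0, 1, 2, 3}        # closed: differences confined to the 2 low bits
    delta_out = set(range(16))     # closed: no restriction on the output side
    print(is_closed(delta_in), brute_force(TOY_PERM, delta_in, delta_out))
```

Exhaustive search is of course only meaningful at this toy size; the point of the sketch is the shape of the conditions, not the search strategy.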
The known best distinguisher to solve the problem when $P$ is a random permutation has complexity

$$\max\left\{ \min\left\{ \sqrt{2^n/|\Delta_{in}|},\ \sqrt{2^n/|\Delta_{out}|} \right\},\ \frac{2^{n+1}}{|\Delta_{in}| \cdot |\Delta_{out}|} \right\}, \qquad (1)$$

which was shown by Gilbert and Peyrin [GP10]. LBDs on permutations are claimed to be valid attacks when their complexity is less than (1). Although the lower bound for a random function was proven by Iwamoto et al. [IPS13] (see footnote 3), no such formal proof is known for a random permutation. In this paper, we give for the first time a formal proof that (1) is actually the (asymptotically) tight bound to solve the limited-birthday problem on a random permutation. More precisely, we prove the following theorem.
Theorem 1 (Lower bound for the limited-birthday problem, informal). When P is a random permutation, to solve the limited-birthday problem with a probability greater than queries to P or P −1 are required 4 .
This theorem strengthens the rationale of validity of various LBDs including those in previous works such as [GP10], and our attacks on sLiSCP and sLiSCP-light (The complexities of all of our new attacks are smaller than the lower bound for a random permutation in (2)).
The proof of Theorem 1 is more complex than the proof for the lower bound on a random function by Iwamoto et al. since we have to deal with queries to both $P$ and $P^{-1}$. To achieve the lower bound that is the complex combination of "max" and "min", and is quite close to the upper bound (1), we introduce a technical parameter in our proof.
Paper Outline. The sLiSCP permutation family and LBDs are introduced in Section 2. New LBDs for the sLiSCP permutation family are shown in Section 3. A proof of the lower bound for LBDs is given in Section 4. We conclude the paper in Section 5.

3 The lower bound for a random function is O max […]
4 The statement is valid as long as $2^{-n+4} \le p_s \le 1/2$.

Specification of sLiSCP
An input to the permutation is first divided into four w-bit words, where w = 48 for sLiSCP-192 and w = 64 for sLiSCP-256. Let $(X_0^0, X_1^0, X_2^0, X_3^0)$ be the input to the permutation. This value is updated by iteratively computing the following step function, shown in the left-hand side of Fig. 1, for i = 0, 1, . . . , 17.
where $\mathrm{Simeck}_w^r$ denotes r rounds of the w-bit block Simeck, called a Simeck box, described in the following paragraph, and $sc_i$ and $sc'_i$ are step-dependent constants. Let m ∈ {24, 32} be the word size such that 2m is the block size. The 2m-bit input value is divided into two m-bit values $L_0 \| R_0$. Then the following is computed for i = 1, 2, . . . , r, where r is 6 and 8 for Simeck48 and Simeck64, respectively.

Specification of
where $rc_i$ is the round constant generated by an LFSR that is initialized with $sc_i$ or $sc'_i$. Since the constant does not affect our analysis, we omit the details. The round function of Simeck is illustrated in Fig. 2.

Specification of sLiSCP-light
sLiSCP-light is a tweaked design of sLiSCP. The major differences from sLiSCP are the step function and the number of steps to be computed. Let $(X_0^0, X_1^0, X_2^0, X_3^0)$ be the input to the permutation. This value is updated by iteratively computing the following step function, shown in the right-hand side of Fig. 1, for i = 0, 1, . . . , 11.
sLiSCP-light is used as an underlying primitive of two NIST second-round candidates SpoC [AGH + 19a] and Spix [AGH + 19b]. Though the original number of steps is 12, the number of steps for the instantiations in those two designs is either 9 or 18.

Limited-Birthday Problem
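The limited-birthday problem on a permutation $P$ is defined as follows.
Definition 1 (The limited-birthday problem on permutation). Let $P$ be an n-bit permutation, and $\Delta_{in}$, $\Delta_{out}$ be (non-empty) closed subsets of $\{0, 1\}^n$. For the limited-birthday problem on the permutation $P$, the goal of the adversary is to generate a quadruple $(X, X', Y, Y')$ such that $P(X) = Y$, $P(X') = Y'$, $X \oplus X' \in \Delta_{in}$, and $Y \oplus Y' \in \Delta_{out}$.
The complexity of the best known attack to solve the limited-birthday problem on a random permutation [GP10] is

$$\max\left\{ \min\left\{ \sqrt{2^n/|\Delta_{in}|},\ \sqrt{2^n/|\Delta_{out}|} \right\},\ \frac{2^{n+1}}{|\Delta_{in}| \cdot |\Delta_{out}|} \right\}. \qquad (3)$$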

Improved LBD against sLiSCP
We revisit the previous 15-step attack on sLiSCP in Sect. 3.1. We show how to improve its complexity for sLiSCP in Sect. 3.2. Applications to sLiSCP-light are then discussed in Sect. 3.3. We finally present the attacks on 16 steps in Sect. 3.4 and Sect. 3.5.

Previous Analysis on sLiSCP
Liu et al. [LSSW18] analyzed the differential properties of the sLiSCP permutation and presented a 15-step distinguisher for sLiSCP-192 and sLiSCP-256. They first focused on a 6-step iterative differential characteristic that maps a difference (0, 0, 0, α) to a difference (0, 0, 0, α). This includes four differential propagations of α → β and two differential propagations of β → α through 6-round (resp. 8-round) Simeck48 (resp. Simeck64). The 6-step iterative characteristic is shown in the left-hand side of Fig. 3. Liu et al. then searched for the choice of α and β that has the maximum characteristic probability for α → β and β → α, taking into account that α → β occurs twice as frequently as β → α. Such differential properties are summarized in Table 2. Note that the probabilities in Table 2 were originally for the situation where each subkey of the keyed Simeck is chosen uniformly at random, while what we need here is the probability that a randomly chosen plaintext (and the counterpart of the differential pair) follows the specified differential transition for a fixed constant. We assume that the probability of the differential characteristic of Simeck is almost the same for all keys, and thus that it also holds for the fixed constant in sLiSCP.
Finally, Liu et al. built a 15-step differential characteristic and proposed to find a pair satisfying the characteristic using the rebound attack framework [MRST09, LMS + 15], in which the attacker first efficiently finds paired values to satisfy the propagation for the middle steps (inbound phase), and then propagates the pairs backwards and forwards to probabilistically satisfy the characteristic (outbound phase).

Previous Inbound Phase for Three Steps. To explain our improvement, the previous procedure of the inbound phase needs to be explained more precisely. Liu et al. focused on four active Simeck boxes in three middle steps, which is shown in the left-hand side of Fig. 4. For each active Simeck box, paired values satisfying the differential propagation are exhaustively searched, which is performed with $4 \times 2^{48}$ or $4 \times 2^{64}$ computations. Any combination of the solutions from four Simeck boxes will fix the entire 192-bit or 256-bit state. In Fig. 4, paired values for red lines are fully determined by fixing paired values of four active Simeck boxes. The black lines can be directly computed from the red ones.
The combined number of solutions for four active Simeck boxes is $2^{125.8}$ and $2^{182.8}$ for sLiSCP-192 and sLiSCP-256, respectively. Then the characteristic was extended as long as this number of solutions exceeds the inverse of the probability of the outbound phase. As shown in the left-hand side of Fig. 9 in the supplementary material A, 12 steps were added for the outbound phase, which can be satisfied with probability $2^{-122.7}$ and $2^{-168.3}$.
In the previous procedure, four active Simeck boxes are fixed independently, while in the new procedure, fixing three active Simeck boxes (red) will fix another one (blue), and an additional one (black) can be fixed independently.

Improving Complexity of 15-Step Attacks on sLiSCP
We present an improved procedure for the inbound phase, which covers four middle steps. Intuitively, this improvement on its own does not increase the number of paired values that satisfy the full characteristic, hence the number of attacked steps does not increase in a straightforward analysis. Instead, the procedure to find the paired values becomes more sophisticated (another stage is introduced for the divide-and-conquer approach), which improves the attack complexity because it allows the outbound part to be shortened. The new inbound procedure is illustrated in the right-hand side of Fig. 4 and is explained as follows.
1. We exhaustively search for the paired values satisfying the differential propagation for three Simeck boxes; the right-hand side of the first step, the left-hand side of the third step, and the left-hand side of the fourth step (in red). As preparation for Step 3, we also search for the paired values for the left-hand side of the second step (in black).
2. We take any combination of the solutions from those three Simeck boxes, which fixes the paired values for another active Simeck box in the right-hand side of the third step (in blue). Then we check whether the differential propagation of this Simeck box is satisfied or not, which filters out wrong pairs at this stage.
3. Any combination of the solutions for the first four Simeck boxes (red and blue) and the precomputed Simeck box (black) fixes the entire state. Then, the state is propagated to the outbound part to satisfy the 15-step trail shown in the left-hand side of Fig. 9 in the supplementary material A.
Complexity Analysis for sLiSCP-256. Step 1 requires testing $2^{64}$ inputs for each Simeck box. Because the probability is $2^{-18.7}$ for both α → β and β → α, the number of obtained solutions will be $2^{64-18.7} = 2^{45.3}$ for each Simeck box. The time complexity of this step is $4 \times 2^{64} = 2^{66}$, and memory to store $4 \times 2^{45.3} = 2^{47.3}$ values is required.
Step 2 requires testing $2^{45.3 \times 3} = 2^{135.9}$ values. The power of the filter is $2^{-18.7}$, hence we will obtain $2^{117.2}$ solutions. The time complexity of this step is $2^{135.9}$, and memory to store $2^{117.2}$ values would be required a priori.
For Step 3, we have $2^{45.3}$ solutions for the precomputed Simeck box (black) and $2^{117.2}$ solutions from the previous step (red and blue). Hence, we can generate up to $2^{117.2+45.3} = 2^{162.5}$ solutions from four inbound steps.
In the outbound phase in Fig. 9, we need to control 1 active Simeck box for the first 2 steps and 6 active Simeck boxes from steps 7 -14, which is satisfied with probability $2^{-18.7 \times 8} = 2^{-149.6}$. The last step contains 1 active Simeck box, but we do not control it. In the end, after trying $2^{149.6}$ solutions of the inbound phase, we will have a pair whose input difference is of the form (β, α, 0, 0) and whose output difference is of the form (0, * , α, 0), where α and β are shown in Table 2 and * denotes any difference.
The complexity to find such a pair for a random permutation is given by the limited birthday problem. The sizes of the sets of input and output differences are 1 and $2^{64}$, respectively, so the generic attack complexity in Eq. (1) is $2^{256+1}/(1 \cdot 2^{64}) = 2^{193}$ and the lower bound in Eq.
The first remark is that the previous attack complexity is $2^{168.3}$ and our new attack complexity is $2^{149.6}$. The improvement factor is $2^{18.7}$. The improvement clearly comes from the inclusion of one more active Simeck box in the inbound phase.
The second remark is that the required memory of $2^{117.2}$ for Step 2 can be omitted by performing Step 3 (the exhaustive search of the active Simeck box should be finished in advance) as soon as a solution is generated in Step 2 from the tables of the red values. Hence the required memory is $2^{47.3}$.
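The generic complexity quoted above for the random-permutation case can be reproduced with a few lines of arithmetic. The sketch below assumes the commonly cited Gilbert-Peyrin form, the maximum of a birthday-type term and the term $2^{n+1}/(|\Delta_{in}| \cdot |\Delta_{out}|)$, and works with base-2 logarithms; the function name and interface are illustrative only.

```python
def generic_lbd_log2(n, log2_in, log2_out):
    """log2 of the generic limited-birthday complexity on an n-bit permutation.

    Assumed form: max{ min{ sqrt(2^n/|D_in|), sqrt(2^n/|D_out|) },
                       2^(n+1) / (|D_in| * |D_out|) },
    where log2_in = log2|D_in| and log2_out = log2|D_out|."""
    birthday_term = min((n - log2_in) / 2, (n - log2_out) / 2)
    limited_term = n + 1 - log2_in - log2_out
    return max(birthday_term, limited_term)

if __name__ == "__main__":
    # sLiSCP-256, 15 steps: one input difference, 2^64 output differences.
    print(generic_lbd_log2(256, 0, 64))  # 193, i.e. 2^193 as quoted in the text
```

With a single input difference the second term dominates, which is why the dedicated attack at $2^{149.6}$ stays well below the generic cost.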
Complexity Analysis for sLiSCP-192. The analysis is almost the same as for sLiSCP-256 but it is a bit more complicated because the probabilities for α → β and β → α are different; the former is $2^{-11.3}$ and the latter is $2^{-21.8}$ as shown in Table 2.
Step 1 requires testing $2^{48}$ inputs for each Simeck box. For two of them with α → β, we will obtain $2^{48-11.3} = 2^{36.7}$ solutions. For one of them with β → α, we will obtain $2^{48-21.8} = 2^{26.2}$ solutions. The time complexity of this step is $4 \times 2^{48} = 2^{50}$, and memory to store $2 \times 2^{36.7}$ values is required.
Step 2 requires testing $2^{36.7+36.7+26.2} = 2^{99.6}$ values. The power of the filter is $2^{-11.3}$, hence we will obtain $2^{88.3}$ solutions. The time complexity of this step is $2^{99.6}$. Similarly to the case of sLiSCP-256, memory to store $2^{88.3}$ values can be omitted by testing the filtered solutions on the fly thanks to the smaller previous lists.
We have $2^{48-21.8} = 2^{26.2}$ solutions for the precomputed Simeck box (black) and $2^{88.3}$ solutions from the previous step (red and blue). Hence, we can generate up to $2^{88.3+26.2} = 2^{114.5}$ solutions from the inbound phase.
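The bookkeeping of the three inbound stages can be checked mechanically. The following sketch recomputes the exponents quoted above for both variants; the function and argument names are hypothetical, and all probabilities are passed as positive weights $-\log_2 p$ taken from the text.

```python
def inbound_counts(w, red_weights, blue_weight, black_weight):
    """Return (log2 tests in Step 2, log2 solutions after Step 2,
    log2 solutions of the whole inbound phase).

    w            -- Simeck box size in bits (48 or 64)
    red_weights  -- weights of the three boxes searched in Step 1 (red)
    blue_weight  -- weight of the box used as a filter in Step 2 (blue)
    black_weight -- weight of the independently precomputed box (black)"""
    red_solutions = [w - weight for weight in red_weights]
    step2_tests = sum(red_solutions)
    step2_solutions = step2_tests - blue_weight
    total = step2_solutions + (w - black_weight)
    return step2_tests, step2_solutions, total

if __name__ == "__main__":
    # sLiSCP-256: every transition has weight 18.7.
    print([round(v, 1) for v in inbound_counts(64, [18.7] * 3, 18.7, 18.7)])
    # sLiSCP-192: alpha->beta has weight 11.3, beta->alpha has weight 21.8.
    print([round(v, 1) for v in inbound_counts(48, [11.3, 11.3, 21.8], 11.3, 21.8)])
```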

Application to 15-step LBD for sLiSCP-light
The distinguishers in the previous subsection apply to sLiSCP, while two NIST second round candidates Spix [AGH + 19b] and SpoC [AGH + 19a] are based on sLiSCP-light. According to the designers, the best distinguisher for sLiSCP-light is a zero-sum distinguisher for 14 steps. The designers of Spix and SpoC cited the work by Liu et al. [LSSW18] but did not mention the possibility of applying the rebound attack on sLiSCP to sLiSCP-light.
Here we formally claim, for the first time, that 15-step sLiSCP-light can be attacked by using a similar procedure to the one on sLiSCP. Our analysis starts from the 6-step iterative differential characteristic for the partial SPN in sLiSCP-light. The diagram of the differential propagation is illustrated in the right-hand side of Fig. 3. The difference (0, 0, 0, β) will be mapped to itself after 6 steps by going through four active Simeck boxes with the differential propagation α → β and two active Simeck boxes with β → α. Hence, the efficiency of the 6-step trail as well as the best choice of α and β are the same as the ones for sLiSCP.
Our inbound phase that covers four steps of sLiSCP can also be applied to sLiSCP-light, which is illustrated in Fig. 5. We will omit the details that were already explained in the previous distinguishers. Intuitively, we first precompute all the solutions for three active Simeck boxes highlighted in red. Any combination uniquely determines the paired values for another active Simeck box highlighted in blue. Finally, we can freely choose the solution for another Simeck box highlighted in black.
The extension for the outbound phase is straightforward. The 15-step trail is given in Fig. 9 in supplementary material A. It includes six active Simeck boxes with differential propagation α → β, two active Simeck boxes with β → α, and one uncontrolled active Simeck box in the last step. The only difference is the form of the output difference, that is (0, * , * , 0). Note that * is unknown but must be identical for two branches, hence the size of the possible output differences remains the same as the size for sLiSCP.
As a result, 15-step sLiSCP-light can be attacked in a similar way as sLiSCP. The complexity is $2^{111.4}$ computational cost and $2^{38.7}$ memory amount for sLiSCP-light-192, and $2^{149.6}$ computational cost and $2^{47.3}$ memory amount for sLiSCP-light-256.

16 Steps Attacks against sLiSCP-256
The attack in the previous section cannot be extended to 16 steps easily for the following reason. The inbound phase can generate up to $2^{114.5}$ (resp. $2^{162.5}$) solutions for sLiSCP-192 (resp. for sLiSCP-256), while the probability of the outbound phase of the 15-step attack is $2^{-111.4}$ (resp. $2^{-149.6}$). The remaining degrees of freedom are only $2^{3.1}$ (resp. $2^{12.9}$), which is not sufficient to satisfy one more active Simeck box.
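The remaining degrees of freedom follow directly from the inbound and outbound exponents. The sketch below assumes that the outbound phase controls six α → β and two β → α transitions, which matches the stated totals of $2^{-111.4}$ and $2^{-149.6}$; the names are illustrative.

```python
def outbound_weight(w_ab, w_ba, n_ab=6, n_ba=2):
    """-log2 of the outbound probability.

    Assumption: n_ab controlled boxes with alpha->beta (weight w_ab) and
    n_ba controlled boxes with beta->alpha (weight w_ba)."""
    return n_ab * w_ab + n_ba * w_ba

def remaining_freedom(inbound_log2, w_ab, w_ba):
    """log2 of the degrees of freedom left after the 15-step attack."""
    return inbound_log2 - outbound_weight(w_ab, w_ba)

if __name__ == "__main__":
    print(round(remaining_freedom(114.5, 11.3, 21.8), 1))  # sLiSCP-192
    print(round(remaining_freedom(162.5, 18.7, 18.7), 1))  # sLiSCP-256
```

Both values are far below the weight of one further fully controlled Simeck box (at least 11.3 and 18.7, respectively), which is why the 16-step extension only controls the extra box partially.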
Overall Idea. To extend the trail to 16 steps, we will have one more active Simeck box. We exploit the remaining degrees of freedom to control the differential propagation only partially. The input difference for the 16-step sLiSCP distinguisher is unchanged, (β, α, 0, 0), while the output difference becomes (γ, α, 0, * ), where γ is partially controlled, i.e. a few bits of γ are inactive. Depending on the number of inactive bits in γ, the dedicated attack can be faster than the generic attack. Fig. 6 illustrates how to extend the 15-step distinguisher by one step. For the sake of completeness, the entire 16-step trail is shown in Fig. 10 in supplementary material B.

Analysis for sLiSCP-256
The differential probability for α → β is $2^{-18.7}$, but we only have $2^{12.9}$ degrees of freedom left. To evaluate the impact of partially controlling the propagation, we look into the details of the differential characteristic with the highest probability.
For sLiSCP-256, as shown in Table 2, there are two kinds of differential masks: (wt(α), wt(β)) = (2, 1) or (1, 2), where wt denotes the Hamming weight. If the input differential mask has the heavier weight of 2, the first few rounds have relatively low probability. Hence most of the remaining degrees of freedom are consumed for those rounds, and the number of uncontrolled rounds (following the full diffusion) becomes long, which is disadvantageous. Thus we set α ← 00800000 || 00000000 and β ← 08800000 || 00000000. This characteristic can be satisfied with probability $2^{-22}$. The breakdown for each round is given in the left-hand side of Table 3.
Partially Controlled Differential Characteristic. The analysis of the propagation where we partially control the differential propagation of Simeck64 is shown in the right-hand side of Table 3. We have $2^{12.9}$ degrees of freedom remaining. The analysis here assumes that the differential propagation follows the characteristic up to $2^{-13}$ (this is a temporary assumption; we will later discuss its validity), and the analysis for the full diffusion is applied to the remaining rounds. Such words start to appear from the computation of round 5, and are denoted by $L_5$, $L_6$, $L_7$, and $L_8$ in the right-hand side of Table 3. The bitwise representation of those words is also given in Table 3, where '0' denotes the inactive bits, '1' denotes the active bits, and '2' denotes that the bit may or may not have a difference. As can be seen in the table, we have 33 inactive bits after 8 rounds; 14 bits in the left-hand side $L_8$ and 19 bits in the right-hand side $L_7$.
Table 3: Differential characteristic of 8-round Simeck64 (left) and the partially controlled characteristic (right).

Step | Differential Mask | probability | Step | Differential Mask | probability
     | 00800000 00000000 |             |      | 00800000 00000000 |
1    | 01000000 00800000 | 2^{-2}      | 1    | 01000000 00800000 | 2^{-2}
2    | 02800000 01000000 | 2^{-2}      | 2    | 02800000 01000000 | 2^{-2}
3    | 04000000 02800000 | 2^{-4}      | 3    | 04000000 02800000 | 2^{-4}
4    | 0a800000 04000000 | 2^{-2}      | 4    | 0a800000 04000000 | 2^{-2}
5    | 01000000 0a800000 | 2^{-6}      | 5    | L_5 0a800000      | 2^{-3}
6    | 08800000 01000000 | 2^{-2}      | 6    | L_6 L_5           | 1
7    | 00000000 08800000 | 2^{-4}      | 7    | L_7 L_6           | 1
8    | 08800000 00000000 |             |      |                   |

L_5 : 0001 2021 0000 0000 0000 0000 0000 0000

Differential Probability / Validity of the Assumption. The analysis in the previous paragraph is based on two assumptions. The first assumption, that the propagation follows the characteristic up to probability $2^{-13}$ while we only have $2^{12.9}$ degrees of freedom, is too optimistic for the attacker. The second assumption, that the analysis from round 5 follows the full diffusion, is too pessimistic for the attacker. Note that the full diffusion analysis is the worst-case scenario for the attacker because it usually corresponds to the situation where active AND gates always produce the difference. Indeed, this probability is the same as the situation where active AND gates always stop the difference. We also need to consider the differential probability: even if the differential propagation does not follow the characteristic up to $2^{-13}$, the target 33 bits can be inactive. Namely, the bit-wise differential form of $\gamma = \gamma_L \| \gamma_R$ can be as follows.

γ_L = **** **** *000 0000 0000 000* **** ****
γ_R = **** **** *000 0000 0000 0000 0000 ****

The simplest way to evaluate the precise probability to satisfy γ is to perform an experiment, i.e. we choose many 64-bit values x and process x and x ⊕ α with 8-round Simeck64 to calculate the probability that the target 33 bits are inactive. In our experiment, we took 1 million choices of x uniformly at random and the 33 target bits became inactive for 32,501 choices. Hence, the probability is $2^{-4.94}$. This implies that we do not need to use $2^{13}$ degrees of freedom to make those 33 bits inactive, but only $2^{5}$ degrees are sufficient.
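An experiment of this kind could be set up along the following lines. This is only a sketch under stated assumptions: the round constants are set to zero (the actual sLiSCP constants are not reproduced here), the round function is the standard Simeck update with left rotations by 5 and 1, and the bit patterns are read most significant bit first. The estimate it produces is therefore not expected to match the reported count exactly.

```python
import math
import random

M = 32                      # word size of Simeck64
MASK = (1 << M) - 1

def rotl(x, r):
    """Rotate an M-bit word left by r positions."""
    return ((x << r) | (x >> (M - r))) & MASK

def simeck_rounds(left, right, round_constants):
    """Apply one Simeck round per constant: (L, R) -> (R ^ f(L) ^ rc, L)."""
    for rc in round_constants:
        left, right = right ^ (left & rotl(left, 5)) ^ rotl(left, 1) ^ rc, left
    return left, right

def inactive_mask(pattern):
    """Turn a pattern such as '**** *000 ...' into a mask of the '0' positions."""
    bits = pattern.replace(" ", "")
    assert len(bits) == M
    return int("".join("1" if b == "0" else "0" for b in bits), 2)

GAMMA_L = inactive_mask("**** **** *000 0000 0000 000* **** ****")
GAMMA_R = inactive_mask("**** **** *000 0000 0000 0000 0000 ****")

def estimate(trials, seed=0, round_constants=(0,) * 8):
    """Estimate the probability that all target bits are inactive after 8 rounds.

    round_constants are hypothetical placeholders, not the sLiSCP constants."""
    rng = random.Random(seed)
    alpha_l, alpha_r = 0x00800000, 0x00000000
    hits = 0
    for _ in range(trials):
        l, r = rng.getrandbits(M), rng.getrandbits(M)
        l1, r1 = simeck_rounds(l, r, round_constants)
        l2, r2 = simeck_rounds(l ^ alpha_l, r ^ alpha_r, round_constants)
        if (l1 ^ l2) & GAMMA_L == 0 and (r1 ^ r2) & GAMMA_R == 0:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    p = estimate(200_000)
    print(p, math.log2(p) if p > 0 else None)
    print(math.log2(32501 / 1_000_000))  # about -4.94, the figure reported in the text
```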
Complexity Evaluation. The attack will spend $2^{5}$ more degrees of freedom than the 15-step attack. Hence, the computational cost of the 16-step attack is $2^{149.6+5.0} = 2^{154.6}$. The required memory amount is $2^{47.3}$.

Remarks for sLiSCP-192
For sLiSCP-192, we only have $2^{3.1}$ degrees of freedom remaining, and controlling the differential characteristic for 6-round Simeck48 is difficult. Currently we have not found any valid distinguishers on 16-step sLiSCP-192. We experimentally confirmed that the partial control of Simeck48 could work up to 5 rounds, but could not work for 6 rounds.
Akinori Hosoyamada, María Naya-Plasencia and Yu Sasaki

16 Steps Attacks against sLiSCP-light-192 and sLiSCP-light-256
The extension to 16 steps can be similarly applied to sLiSCP-light; however, there is a certain difference due to the usage of the different network. As shown in the right-hand side of Fig. 9, the last step of the 15-step attack includes a differential propagation with β → α. If the same strategy as for sLiSCP is applied, we need to partially control this propagation. For Simeck48, the probability for β → α is much smaller than that for α → β, hence it is not a good strategy to partially control the last step of the 15-step attack. To avoid this problem, we extend the 15-step trail backwards, and partially control the difference in the first step of the 15-step attack. The diagram of the extended steps is given in Fig. 7.

Peeling off the Last Round
We point out that the Simeck round function has the property that the difference after the first round or before the last round can be computed from the input or the output without knowing the key value. This is because the key is directly added to the Feistel network, which is different from traditional designs in which the key is added inside a so-called F-function.
To be more precise, let $I_l \| I_r$ be the input to the Simeck permutation. Then an attacker can compute the difference after the first round by $i_r \oplus (i_l \lll 1) \oplus (i_l \wedge (i_l \lll 5)) \,\|\, i_l$, which is independent of the secret-key/fixed-constant value. Moreover, the network of sLiSCP-light helps an attacker to exploit this property. As shown in Fig. 7, the partially controlled Simeck box is located in the first step, and the input to the entire permutation (with difference γ) is directly used to compute this Simeck box. Hence, the attacker who only has access to the input and the output of the entire permutation can actually compute the difference after 1 Simeck round. Note that this is not the case with sLiSCP. As shown in Fig. 6, the output of the partially controlled Simeck box is masked by an internal state due to the Feistel network. Hence, an attacker who only has access to the output of the entire construction cannot compute 1 Simeck round.
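The key-independence of the first-round difference can be illustrated with a small sketch. It assumes the standard Simeck round update with the round key (or constant) XORed outside the nonlinear function, as described above; the word size of 24 bits corresponds to Simeck48, and all function names are illustrative.

```python
import random

M = 24                      # word size of Simeck48 (use 32 for Simeck64)
MASK = (1 << M) - 1

def rotl(x, r):
    """Rotate an M-bit word left by r positions."""
    return ((x << r) | (x >> (M - r))) & MASK

def f(x):
    """Key-independent part of the Simeck round function."""
    return (x & rotl(x, 5)) ^ rotl(x, 1)

def simeck_round(left, right, key):
    """One Simeck round: the key is XORed outside f."""
    return right ^ f(left) ^ key, left

def diff_after_first_round(l1, r1, l2, r2):
    """Difference after one round, computed from the two inputs only."""
    return (r1 ^ f(l1)) ^ (r2 ^ f(l2)), l1 ^ l2

if __name__ == "__main__":
    rng = random.Random(0)
    l1, r1, l2, r2 = (rng.getrandbits(M) for _ in range(4))
    predicted = diff_after_first_round(l1, r1, l2, r2)
    for _ in range(5):
        key = rng.getrandbits(M)    # any key or constant gives the same difference
        a, b = simeck_round(l1, r1, key), simeck_round(l2, r2, key)
        print((a[0] ^ b[0], a[1] ^ b[1]) == predicted)
```

The key cancels because it enters both computations through the same XOR, so the attacker needs only the two input values, not the constant.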

Analysis for sLiSCP-light-192
We only have $2^{3.1}$ degrees of freedom remaining. The differential characteristic probability for 6-round Simeck48 is $2^{-12}$, and the propagations in which the characteristic is satisfied up to $2^{-4}$ are given in Table 4.
Towards More than Experimental Verification. The above attacks used differential probabilities obtained by experiments. Here, we discuss whether other methods can validate the obtained probability. Note that what we want to evaluate differs from the standard differential probability, i.e. the sum of the probabilities of all trails that map a fixed input difference to a fixed output difference. In our setting, the input difference to the inverse of Simeck is fixed, but for the output difference only several inactive bit positions are fixed. In the case of sLiSCP-192, 20 inactive bit positions are fixed; in other words, the output difference can be any of $2^{48-20} = 2^{28}$ choices. Evaluating the differential probability for such a case is not an easy task.
We approach this probability with a MILP-based evaluation. For the input, we gave a condition for each bit specifying whether it is active or not. For the output, we gave a condition only for the target bits, setting them inactive. The MILP solver found that the maximum characteristic probability that makes the target 20 bits inactive is $2^{-8}$. Hence, we added the condition that the probability of the trail is $X$, where $X = 2^{-8}, 2^{-9}, 2^{-10}, \ldots$, and counted the number of characteristics for each $X$. The results are given in Table 5, which shows that $Y$ distinct characteristics with probability $X$ were found by MILP. As $X$ becomes smaller, $Y$ becomes bigger. We stopped this evaluation when the number of characteristics reached 10 million. The sum of the probabilities of all the characteristics up to $2^{-29}$ is $2^{-1.916} \approx 2^{-2.0}$. A gap to $2^{-1.6}$ remains; however, we believe that this analysis provides a better understanding of the differential probability obtained by our experiments.
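The aggregation described above (each class of Y characteristics with probability X contributes XY) can be sketched as follows. The counts in `toy_counts` are hypothetical placeholders chosen for easy arithmetic; they are not the values of Table 5.

```python
import math

def log2_total_probability(counts):
    """Return log2 of the summed contribution sum(Y * 2^-w).

    counts maps a weight w (characteristic probability 2^-w) to the number Y
    of distinct characteristics found with that probability.
    """
    total = sum(y * 2.0 ** (-w) for w, y in counts.items())
    return math.log2(total)

# Hypothetical counts, NOT taken from the paper: each class contributes 2^-7,
# so the total is 3 * 2^-7.
toy_counts = {8: 2, 9: 4, 10: 8}
toy_result = log2_total_probability(toy_counts)
```

Because only characteristics down to a cut-off weight are enumerated, a sum computed this way is a lower estimate of the true probability, which is consistent with the gap noted in the text.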

Analysis for sLiSCP-light-256
16 steps of sLiSCP-light-256 can be attacked by the same procedure, but the analysis is much simpler because the probabilities for α → β and β → α are identical in

Table 5: Evaluation of Differential Probability with MILP. MILP found $Y$ distinct characteristics with probability $X$, and their contribution is calculated as $XY$.

Tight Lower Bound for the Limited-Birthday Problem
In previous works, LBDs on permutations are claimed to be valid distinguishers when their complexity is less than the one given by (3). For a random permutation, the complexity (3) has been considered to be the best because
1. we do not know of any algorithm that solves the limited-birthday problem on a random permutation with a complexity smaller than (3), and
2. for the limited-birthday problem on a random function $F$, the complexity $\max\left\{\sqrt{2^{n+1}/|\Delta_{\mathrm{out}}|},\ 2^{n+1}/(|\Delta_{\mathrm{out}}| \cdot |\Delta_{\mathrm{in}}|)\right\}$ is proven to be tight [IPS13].
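As a small worked aid, the random-function bound quoted from [IPS13] can be evaluated in the log domain. This sketch assumes the expression reads max{sqrt(2^(n+1)/|∆out|), 2^(n+1)/(|∆out|·|∆in|)} with |∆in| = 2^I and |∆out| = 2^O; the function name and the sample parameters are illustrative only.

```python
def log2_function_bound(n, I, O):
    """log2 of max{ sqrt(2^(n+1) / 2^O), 2^(n+1) / (2^O * 2^I) }.

    n: block size in bits; I, O: log2 of the sizes of the input and output
    difference sets. The form of the bound is an assumption stated in the lead-in.
    """
    birthday_term = (n + 1 - O) / 2    # square-root term
    generic_term = n + 1 - I - O       # term that dominates for small difference sets
    return max(birthday_term, generic_term)

# Illustrative parameters (not from the paper):
small_sets = log2_function_bound(128, 32, 32)   # generic term dominates
large_sets = log2_function_bound(128, 64, 64)   # birthday term dominates
```

The two sample calls show the two regimes of the maximum: small difference sets make the second term dominate, large ones the first.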
These two pieces of evidence strongly indicate that (3) is the tight bound for solving the limited-birthday problem on a random permutation. However, to strengthen the rationale for the validity of LBDs on concrete permutations, it is highly desirable to give a formal proof that (3) is the tight bound for a random permutation. Although many works have studied LBDs on permutations, no previous work gives such a formal proof.
We provide, for the first time, a formal proof showing that (3) is actually the (asymptotically) tight bound for solving the limited-birthday problem on a random permutation.
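When the attack target is a random permutation, the limited-birthday problem can be reformulated as the game $G^{\mathcal{A}}$ in which an adversary $\mathcal{A}$ has access to the oracles $P$ and $P^{-1}$, where $P$ is a random permutation, and $\mathcal{A}$ wins if it outputs a quadruple of $n$-bit strings $(X_{\mathrm{fin}}, X'_{\mathrm{fin}}, Y_{\mathrm{fin}}, Y'_{\mathrm{fin}})$ such that $X_{\mathrm{fin}} \oplus X'_{\mathrm{fin}} \in \Delta_{\mathrm{in}}$, $Y_{\mathrm{fin}} \oplus Y'_{\mathrm{fin}} \in \Delta_{\mathrm{out}}$, $P(X_{\mathrm{fin}}) = Y_{\mathrm{fin}}$, and $P(X'_{\mathrm{fin}}) = Y'_{\mathrm{fin}}$. Let $G^{\mathcal{A}} \Rightarrow 1$ denote the event that $\mathcal{A}$ wins the game $G^{\mathcal{A}}$. The formal description of $G^{\mathcal{A}}$ is given in Fig. 8.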
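The winning condition of the game can be written as a short predicate. This is a sketch on a toy 4-bit permutation; the requirement that the two inputs be distinct is an assumption added here (it is not stated in the sentence above, but without it a repeated input would win trivially whenever the difference sets contain 0), and the toy permutation and difference sets are illustrative only.

```python
def wins_game(P, delta_in, delta_out, quad):
    """Check the winning condition for an output quadruple (X, X', Y, Y').

    P is a permutation given as a list (P[x] is the image of x);
    delta_in and delta_out are sets of allowed XOR differences.
    """
    x, x2, y, y2 = quad
    return (
        x != x2                       # assumption: the two inputs must differ
        and (x ^ x2) in delta_in      # input difference is in the allowed set
        and (y ^ y2) in delta_out     # output difference is in the allowed set
        and P[x] == y                 # both pairs are consistent with P
        and P[x2] == y2
    )

# Toy 4-bit permutation x -> 5x + 3 mod 16 (illustrative, not a cipher).
toy_P = [(5 * x + 3) % 16 for x in range(16)]
toy_delta_in = {0b0000, 0b0100, 0b1000, 0b1100}   # differences of the form a||00
toy_delta_out = {0b0000, 0b1000}                  # differences of the form b||000
```

With these toy sets, the quadruple (0, 8, 3, 11) satisfies every condition, while (0, 4, 3, 7) fails because its output difference lies outside the allowed set.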
The following theorem shows that (3) is asymptotically the tight bound on the number of queries needed to solve the limited-birthday problem on a random permutation.

Figure 8: The game $G^{\mathcal{A}}$ that defines the limited birthday problem on a random permutation $P$. The lists $L_X$, $L_Y$, and $L_P$ are set to be empty at the beginning of the game.

$p_s \le 1/2$), $\mathcal{A}$ has to make at least queries to $P$ and $P^{-1}$.
We first provide the intuition behind our proof of the theorem, and then describe the formal proof. Let $I$ and $O$ denote the integers such that $|\Delta_{\mathrm{in}}| = 2^I$ and $|\Delta_{\mathrm{out}}| = 2^O$.

Proof intuition.
We provide proof intuition on the latter statement of the theorem (the number of queries should be larger than (5)) when $p_s = 1/2$.
For simplicity, we assume that $\mathcal{A}$ stores a pair $(X, Y)$ into a list $L$ at each query to $P$ or $P^{-1}$. When $\mathcal{A}$ queries $X$ to $P$, $\mathcal{A}$ stores $(X, P(X))$ into $L$. When $\mathcal{A}$ queries $Y$ to $P^{-1}$, $\mathcal{A}$ stores $(P^{-1}(Y), Y)$ into $L$. Intuitively, at each query, $\mathcal{A}$ tries its best to obtain a new pair $(X, Y)$ such that $X \oplus X' \in \Delta_{\mathrm{in}}$ and $Y \oplus Y' \in \Delta_{\mathrm{out}}$ for some existing pair $(X', Y')$ in $L$.
Without loss of generality we can assume that $I \le O$. Then $\min\left\{\sqrt{2^n/|\Delta_{\mathrm{out}}|},\ \sqrt{2^n/|\Delta_{\mathrm{in}}|}\right\} = \sqrt{2^n/|\Delta_{\mathrm{out}}|} = 2^{(n-O)/2}$ holds. Roughly speaking, the best strategy for $\mathcal{A}$ at the $i$-th query is to perform the following procedure:
1. Choose the largest possible subset $S \subset L$ such that $X' \oplus X'' \in \Delta_{\mathrm{in}}$ for all $(X', Y'), (X'', Y'') \in S$.

3. $\mathcal{A}$ does not query $X$ to $P$ if (a) $\mathcal{A}$ has already queried $X$ to $P$ before, or (b) $\mathcal{A}$ queried $Y$ to $P^{-1}$ for some $Y$ before and got $P^{-1}(Y) = X$ as the response.

4. $\mathcal{A}$ does not query $Y$ to $P^{-1}$ if (a) $\mathcal{A}$ has already queried $Y$ to $P^{-1}$ before, or (b) $\mathcal{A}$ queried $X$ to $P$ for some $X$ before and got $P(X) = Y$ as the response.
Let $(X_{\mathrm{fin}}, X'_{\mathrm{fin}}, Y_{\mathrm{fin}}, Y'_{\mathrm{fin}})$ denote the final output of $\mathcal{A}$, and let $(X_i, Y_i)$ denote the $i$-th element in $L_P$ (i.e. $(X_i, Y_i)$ is added to $L_P$ when $\mathcal{A}$ makes the $i$-th query). Let $X^L_i$ and $X^R_i$ be the most significant $I$ bits and the least significant $(n - I)$ bits of $X_i$, respectively. Let $Y^L_i$ and $Y^R_i$ be the most significant $O$ bits and the least significant $(n - O)$ bits of $Y_i$, respectively. Let $\mathrm{coll}_i$ be the event that $X_j \oplus X_k \in \Delta_{\mathrm{in}}$ and $Y_j \oplus Y_k \in \Delta_{\mathrm{out}}$ for some $j < k \le i$.
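The event coll can be tested directly on a list of query records. The sketch below assumes the special form of the difference sets used in the proof (elements of ∆in are α||0^(n−I) and elements of ∆out are β||0^(n−O)), so that two records collide exactly when their X^R parts agree and their Y^R parts agree; it also assumes the list contains no repeated record. Function names are illustrative.

```python
def split(v, n, top_bits):
    """Split an n-bit value into (most significant top_bits, remaining low bits)."""
    low_bits = n - top_bits
    return v >> low_bits, v & ((1 << low_bits) - 1)

def coll(pairs, n, I, O):
    """Return True if some two distinct records (X_j, Y_j), (X_k, Y_k) satisfy
    X_j ^ X_k in Delta_in and Y_j ^ Y_k in Delta_out, under the assumed
    special form of the difference sets."""
    seen = set()
    for x, y in pairs:
        key = (split(x, n, I)[1], split(y, n, O)[1])   # (X^R, Y^R)
        if key in seen:
            return True
        seen.add(key)
    return False

# Toy records with n = 4, I = 2, O = 1 (illustrative values only).
colliding = [(0b0000, 0b0011), (0b1000, 0b1011)]
non_colliding = [(0b0000, 0b0011), (0b0100, 0b0111)]
```

Hashing on the pair (X^R, Y^R) makes the check linear in the number of records, instead of comparing every pair of records.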
Then we have that

Suppose that $\mathcal{A}$'s $i$-th query is made to $P$ (but not to $P^{-1}$). If $X^R_i \neq X^R_j$ for all $j < i$, $\Pr[\mathrm{coll}_i \mid \neg\mathrm{coll}_{i-1}] = 0$. If there exist indices $1 \le j_1 < \cdots < j_s < i$ such that $X^R_i = X^R_{j_1} = X^R_{j_2} = \cdots = X^R_{j_s}$ and $X^R_i \neq X^R_j$ holds for all $j \in \{1, \ldots, i-1\} \setminus \{j_1, \ldots, j_s\}$, The holds. Therefore we have (Recall that now we are assuming that $2^I \le 2^O$ holds.) Hence follows.

8 We can assume this for the following reason. Suppose that an adversary $\mathcal{B}$ solves the problem by making $q$ queries if the elements of $\Delta_{\mathrm{in}}$ and $\Delta_{\mathrm{out}}$ have the form $\alpha \| 0^{n-I}$ and $\beta \| 0^{n-O}$, respectively. Assume for simplicity that $\mathcal{B}$ does not query $P^{-1}$. Then we can construct an adversary $\mathcal{A}$ that solves the problem for arbitrary closed subsets $\Delta'_{\mathrm{in}}, \Delta'_{\mathrm{out}} \subset \{0,1\}^n$ such that $|\Delta'_{\mathrm{in}}| = 2^I$ and $|\Delta'_{\mathrm{out}}| = 2^O$ by making $q$ queries, as follows: (1) $\mathcal{A}$ chooses linear isomorphisms $L, \hat{L} : \{0,1\}^n \to \{0,1\}^n$ such that $L(\Delta_{\mathrm{in}}) = \Delta'_{\mathrm{in}}$ and $\hat{L}(\Delta_{\mathrm{out}}) = \Delta'_{\mathrm{out}}$. (2) $\mathcal{A}$ runs $\mathcal{B}$. When $\mathcal{B}$ queries $X$, $\mathcal{A}$ queries $L(X)$ to $P$, and returns $(\hat{L}^{-1} \circ P \circ L)(X)$ to $\mathcal{B}$. (3) When $\mathcal{B}$ returns an output $(X, X', Y, Y')$, $\mathcal{A}$ returns $(L(X), L(X'), \hat{L}(Y), \hat{L}(Y'))$ as its own output. Then $\mathcal{A}$ simulates a random permutation perfectly, and $\Pr[G^{\mathcal{A}} \Rightarrow 1] = \Pr[G^{\mathcal{B}} \Rightarrow 1]$ holds. (Here we considered the special case in which $\mathcal{B}$ does not make queries to $P^{-1}$, but the general case can be shown in the same way.)
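Upper bounding the term $\Pr[G^{\mathcal{A}} \Rightarrow 1 \mid \neg\mathrm{coll}_q]$ in (6). Let $\mathrm{bad}_1$, $\mathrm{bad}_2$, and $\mathrm{bad}_3$ be the events that $G^{\mathcal{A}} \Rightarrow 1$ holds ($\mathcal{A}$ wins the game $G^{\mathcal{A}}$) and
1. $(X_{\mathrm{fin}}, Y_{\mathrm{fin}}) \in L_P$, $(X'_{\mathrm{fin}}, Y'_{\mathrm{fin}}) \notin L_P$,
2. $(X_{\mathrm{fin}}, Y_{\mathrm{fin}}) \notin L_P$, $(X'_{\mathrm{fin}}, Y'_{\mathrm{fin}}) \in L_P$, and
3. $(X_{\mathrm{fin}}, Y_{\mathrm{fin}}) \notin L_P$, $(X'_{\mathrm{fin}}, Y'_{\mathrm{fin}}) \notin L_P$
hold just after $\mathcal{A}$ outputs the final output $(X_{\mathrm{fin}}, X'_{\mathrm{fin}}, Y_{\mathrm{fin}}, Y'_{\mathrm{fin}})$, respectively. Then follows. The claim of the proposition (in the case $I \le O$) follows from (6), (7), and (11).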
Next, we show the following proposition. To achieve a lower bound that is a complex combination of "max" and "min" and is quite close to the best known upper bound, we introduce a technical parameter $c$.