New Low-Memory Algebraic Attacks on LowMC in the Picnic Setting

The security of the post-quantum signature scheme Picnic is highly related to the difficulty of recovering the secret key of LowMC from a single plaintext-ciphertext pair. Since Picnic is one of the alternate third-round candidates in the NIST post-quantum cryptography standardization process, it has become urgent and important to evaluate the security of LowMC in the Picnic setting. The best attacks on LowMC with full S-box layers used in Picnic3 were achieved with Dinur’s algorithm. For LowMC with partial nonlinear layers, e.g. 10 S-boxes per round as adopted in Picnic2, the best attacks were published by Banik et al. with the meet-in-the-middle (MITM) method. In this paper, we improve the attacks on LowMC in a model where memory consumption is costly. First, we find a new attack on 3-round LowMC with full S-box layers with negligible memory complexity, which outperforms Bouillaguet et al.’s fast exhaustive search attack and achieves better time-memory tradeoffs than Dinur’s algorithm. Second, we extend the 3-round attack to 4 rounds to significantly reduce the memory complexity of Dinur’s algorithm at the cost of a small factor in time complexity. For LowMC instances with 1 S-box per round, our attacks are shown to be much faster than the MITM attacks. For LowMC instances with 10 S-boxes per round, we reduce the memory complexity from 32GB ($2^{38}$ bits) to only 256KB ($2^{21}$ bits) using our new algebraic attacks rather than the MITM attacks, while the time complexity of our attacks is about $2^{3.2} \sim 2^{5}$ times higher than that of the MITM attacks. A notable feature of our new attacks (apart from the 4-round attack) is their simplicity. Specifically, only some basic linear algebra is required to understand them and they can be easily implemented.


Introduction
The LowMC block cipher [ARS+15] is the first dedicated symmetric-key primitive designed for MPC/FHE/ZK protocols. An important application of LowMC is the Picnic signature scheme [CDG+17,KZ20], which is one of the alternate third-round candidates in the NIST post-quantum cryptography (PQC) standardization process. In particular, the security of Picnic is directly related to the difficulty of recovering the secret key of LowMC from a single known plaintext-ciphertext pair.
Due to its novel design strategy and its importance, the LowMC block cipher has attracted a lot of attention since its publication at EUROCRYPT 2015. However, the security of this novel design is not well understood and the secure number of rounds of LowMC has been updated several times to resist state-of-the-art attacks [DEM15,DLMW15,RST18]. The latest version is called LowMC v3 and we simply use LowMC to refer to LowMC v3 in the following. Although many powerful attack vectors have been taken into account to determine the secure parameters of LowMC, the recent progress [LIM21,LWM+22,Din21] in the cryptanalysis of LowMC indicates that some of the latest parameters are still insecure. However, the attacks described in [DEM15,DLMW15,RST18,LIM21,LWM+22] cannot directly threaten the security of Picnic because the required data complexity is larger than 1.
To understand the security of LowMC in the Picnic setting, the LowMC team launched a public competition$^1$ in 2019 and some major progress has been made since then. The first important step was made by Banik et al., who found an efficient method [BBDV20] to linearize the 3-bit S-box used in LowMC. Up until now, the best attacks on LowMC with full S-box layers were achieved by Dinur's algorithm from [Din21], which applies the polynomial method [LPT+17], an advanced method to solve multivariate equation systems over GF(2). However, this technique always requires huge memory and it currently cannot exploit the fact that the equation system is well-overdefined. For LowMC with partial nonlinear layers, the best attacks were obtained by Banik et al. with the MITM technique [BBVY21]. We notice that the memory complexity of the MITM attacks on LowMC with 10 S-boxes per round is still somewhat high, i.e. $2^{38}$ bits. Moreover, we observe that the guessed information about the key bits is not exploited in the phase that computes the full key in the MITM attacks. Hence, we are motivated to devise new low-memory attacks on LowMC with a single known plaintext-ciphertext pair.
Our contributions. The contributions of this paper are summarized below:
1. We observe that Banik et al.'s guess strategy [BBDV20] to linearize the LowMC S-box is not always optimal for LowMC with full S-box layers. Specifically, to linearize one LowMC S-box, instead of guessing 1 quadratic polynomial in its input, naively guessing 2 linear polynomials in its input can significantly improve the attacks. This is counter-intuitive because guessing 2 bits per S-box to linearize 1 round of LowMC is much more costly than guessing only 1 bit.
2. By utilizing a simple version of the crossbred algorithm [JV17] proposed in [BDT22] to solve an overdefined quadratic equation system, our attacks on 3-round LowMC require negligible memory and can achieve better time-memory tradeoffs than Dinur's algorithm [Din21].
3. To show the effectiveness of our new guess strategy for LowMC with full S-box layers, we also describe attacks on 4-round (full-round) LowMC building upon Dinur's algorithm. Specifically, we can utilize Dinur's critical observation [Din21] on the attacks on an odd number of rounds of LowMC to improve the generic complexity of Dinur's algorithm. In this way, better time-memory tradeoffs are again achieved, as detailed at the end of Section 3.
4. For Banik et al.'s guess strategy to linearize the S-box [BBDV20], we describe how to construct a well-overdefined system of quadratic equations in key bits to efficiently recover the full key. The new method is especially efficient for LowMC with partial nonlinear layers. The major difference in our new attacks is that we make full use of the guessed quadratic polynomials in key bits while these are not used in Banik et al.'s attacks [BBDV20,BBVY21] to recover the key. It can be found that our attacks are much faster than the MITM attacks [BBVY21] for some LowMC parameters.
Our new results for LowMC are shown in Table 1 and Table 2. Note that for attacks on LowMC with partial nonlinear layers, the fast exhaustive search attack [BCC+10] cannot work because constructing the polynomial equations is too costly due to their high degree. Moreover, the MITM attacks [BBVY21] on 3-round LowMC are slower than the fast exhaustive search attack [BCC+10].
Organization. In Section 2, we describe the LowMC cipher, introduce how to efficiently evaluate a polynomial for all possible assignments to the variables, and explain a simple version of the crossbred algorithm and Dinur's algorithm. In Section 3 and Section 4, we demonstrate the new low-memory attacks on LowMC with full S-box layers and with partial nonlinear layers, respectively. Then, in Section 5, we provide the details of our experiments to verify our attacks. Finally, the paper is concluded in Section 6.
The round function in the $i$-th round ($1 \le i \le r$) can be written as
$$A_{i+1} = L_i \cdot S(A_i) \oplus RC_i \oplus M_i \cdot K,$$
where $A_i$ is the input state of the $i$-th round, $K$ is the $k$-bit key, $RC_i$ is the $n$-bit round constant, $M_i$ is a full-rank matrix of size $n \times k$, $L_i$ is a full-rank matrix of size $n \times n$ and $S$ is the nonlinear operation that applies $s$ 3-bit S-boxes to the first $3s$ bits of $A_i$ in parallel.
Denote the plaintext and ciphertext by $p$ and $c$, respectively. Then, we have
$$A_1 = p \oplus M_0 \cdot K, \qquad c = A_{r+1},$$
where $M_0$ is also a full-rank matrix of size $n \times k$. Note that $M_0 \cdot K$ is the whitening key. For LowMC, $RC_i$, $L_i$ and $M_j$ ($1 \le i \le r$, $0 \le j \le r$) are all randomly generated, i.e. they are not fixed as in many other block ciphers. Specifically, before encrypting $p$, we need to compute the parameters $(RC_i, L_i, M_j)$ with a pseudorandom number generator.
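The 3-bit S-box $S(a_1, a_2, a_3) = (a_4, a_5, a_6)$ of LowMC is defined as below:
$$a_4 = a_1 \oplus a_2a_3, \quad a_5 = a_1 \oplus a_2 \oplus a_1a_3, \quad a_6 = a_1 \oplus a_2 \oplus a_3 \oplus a_1a_2,$$
where $(a_1, a_2, a_3) \in \mathbb{F}_2^3$ and $(a_4, a_5, a_6) \in \mathbb{F}_2^3$ denote the input and output, respectively. Note that the inverse of this S-box can be easily deduced, as shown below:
$$a_1 = a_4 \oplus a_5 \oplus a_5a_6, \quad a_2 = a_5 \oplus a_4a_6, \quad a_3 = a_4 \oplus a_5 \oplus a_6 \oplus a_4a_5.$$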
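The two sets of formulas above can be checked exhaustively. The following is a minimal sketch, assuming bits are represented as Python integers 0/1; the function names `sbox`, `sbox_inv` and `check_inverse` are illustrative and not taken from any LowMC reference implementation.

```python
from itertools import product

def sbox(a1, a2, a3):
    """Forward 3-bit LowMC S-box: (a1, a2, a3) -> (a4, a5, a6)."""
    a4 = a1 ^ (a2 & a3)
    a5 = a1 ^ a2 ^ (a1 & a3)
    a6 = a1 ^ a2 ^ a3 ^ (a1 & a2)
    return a4, a5, a6

def sbox_inv(a4, a5, a6):
    """Inverse S-box: (a4, a5, a6) -> (a1, a2, a3)."""
    a1 = a4 ^ a5 ^ (a5 & a6)
    a2 = a5 ^ (a4 & a6)
    a3 = a4 ^ a5 ^ a6 ^ (a4 & a5)
    return a1, a2, a3

def check_inverse():
    """True if sbox_inv undoes sbox on all 8 inputs."""
    return all(sbox_inv(*sbox(*a)) == a for a in product((0, 1), repeat=3))
```

Calling `check_inverse()` confirms that the inverse formulas undo the forward map on all 8 inputs, which also shows that the S-box is a permutation.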

Evaluating a Polynomial of Degree d in u Variables
Given a polynomial $f(x) \in \mathbb{F}_2[x_1, x_2, \ldots, x_u]$ of degree $d$, we aim to evaluate $f(x)$ over all $x = (x_1, x_2, \ldots, x_u) \in \mathbb{F}_2^u$. A naive algorithm is to evaluate each term of $f(x)$ for each assignment to $x$, which results in a time complexity upper bounded by about $2^u \cdot \binom{u}{\le d}$ bit operations, where $\binom{u}{\le d} = \sum_{i=0}^{d} \binom{u}{i}$. By utilizing the Möbius transform, it is possible to achieve this with $u \cdot 2^u$ bit operations. For simplicity, we also write a function $f(x)$ as $f$ in the following.
To see how it works, it is necessary to realize that the Möbius transform is an involution on the set of Boolean functions. For any such $f$, we can write its algebraic normal form (ANF) as follows:
$$f(x_1, \ldots, x_u) = \bigoplus_{b = (b_1, \ldots, b_u) \in \mathbb{F}_2^u} g(b) \prod_{i=1}^{u} x_i^{b_i},$$
where $g(b) \in \mathbb{F}_2$ is the coefficient of the monomial $\prod_{i=1}^{u} x_i^{b_i}$. Recall the common usage of the Möbius transform. Given blackbox access to the polynomial $f$, we can construct the truth table of $(x, f)$, denoted by TAB_F, by exhausting the $2^u$ possible values of $x$. Based on this truth table, we can then determine the exact ANF of $f$ with $u \cdot 2^u$ bit operations, i.e. we obtain a truth table of $(b, g)$, denoted by TAB_B. Indeed, the Möbius transform just applies a butterfly-style algorithm to the table TAB_F to obtain a new table TAB_B. If we again apply the Möbius transform to the table TAB_B, we will obtain the table TAB_F, which is why we say it is an involution.
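To see the reasons, let us first focus on the relation between $g(b)$ and $f(x)$. For any $b$, we introduce a set of indices $J = \{i_1, i_2, \ldots, i_j\} \subseteq \{1, 2, \ldots, u\}$ such that $b_i = 1$ ($i \in J$) and $b_{i'} = 0$ ($i' \notin J$). Then, we have
$$g(b) = \bigoplus_{x \in \mathbb{F}_2^u,\ x_{i'} = 0\ (i' \notin J)} f(x).$$
The Möbius transform computes $g(b)$ for all $b \in \mathbb{F}_2^u$ with the above formula based on TAB_F in only $u \cdot 2^u$ bit operations.
Indeed, based on the ANF of $f$, for any $x$ with $x_i = 1$ ($i \in J$) and $x_{i'} = 0$ ($i' \notin J$), we have
$$f(x) = \bigoplus_{b \in \mathbb{F}_2^u,\ b_{i'} = 0\ (i' \notin J)} g(b),$$
which has the same form as the formula for $g(b)$. Therefore, if we apply the Möbius transform to the table TAB_B, we will obtain the table TAB_F.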
In short, given any concrete $f$, we can evaluate $f$ over all $x \in \mathbb{F}_2^u$ in $u \cdot 2^u$ bit operations and with $2^u$ bits of memory because the truth table of $(b, g)$ can be directly extracted from the expression of $f$. With some extra effort in the procedure of the Möbius transform, the time complexity can be reduced to $d \cdot 2^u$, as shown in [DS11].
In Dinur's algorithm [Din21], the memory complexity of the above standard Möbius transform is further reduced to about $u \cdot \binom{u}{\le d}$ bits while the time complexity stays almost the same, i.e. $d \cdot 2^u$ bit operations.
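The standard butterfly-style Möbius transform described above can be sketched in a few lines. This is a minimal illustration of the $u \cdot 2^u$ version only (not of the memory-efficient variant from [Din21]); the indexing convention, where bit $i$ of the table index holds $x_{i+1}$, and the names `mobius_transform` and `example_truth_table` are assumptions made for this example.

```python
def mobius_transform(table):
    """Butterfly-style Moebius transform over F_2 of a table of length 2**u.

    Maps a truth table (TAB_F) to the ANF coefficient table (TAB_B) and,
    being an involution, also maps TAB_B back to TAB_F.
    """
    t = list(table)
    n = len(t)
    step = 1
    while step < n:                       # one pass per variable: u passes
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                t[j + step] ^= t[j]       # fold the lower half into the upper half
        step *= 2
    return t

def example_truth_table():
    """Truth table of f(x1, x2, x3) = x1 + x2*x3; bit i of the index is x_{i+1}."""
    return [(x & 1) ^ (((x >> 1) & 1) & ((x >> 2) & 1)) for x in range(8)]
```

For the example function, the transform of the truth table has ones exactly at index 1 (the monomial $x_1$) and index 6 (the monomial $x_2x_3$), and applying the transform a second time returns the truth table.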
Using Gray code for the case $d = 2$. For most of our new attacks on LowMC, we need to handle the case $d = 2$. Therefore, we use a very simple yet efficient method [BCC+10,BDT22] to evaluate a quadratic polynomial based on Gray code. The idea is based on the fact that when only one variable changes, the values of at most $u$ terms in $f$ will change.
When $d = 2$, $f$ can be represented as
$$f(x) = \bigoplus_{1 \le i \le j \le u} a_{i,j} x_i x_j \oplus c,$$
where $c, a_{i,j} \in \mathbb{F}_2$ (since $x_i x_i = x_i$ over $\mathbb{F}_2$, the linear terms correspond to $a_{i,i}$). Let $e_i \in \mathbb{F}_2^u$ denote the unit vector which is zero everywhere except on the $i$-th coordinate. Then, we have
$$f(x \oplus e_i) = f(x) \oplus a_{i,i} \oplus \bigoplus_{j < i} a_{j,i} x_j \oplus \bigoplus_{j > i} a_{i,j} x_j.$$
Hence, we can enumerate all the $2^u$ values of $x$ using Gray code so that only a single bit of $x$ changes at each iteration. In this way, evaluating $f$ over $x \in \mathbb{F}_2^u$ only takes $u \cdot 2^u$ bit operations and the memory complexity is negligible. It is now easy to observe that when $d = 1$, evaluating $f$ over $x \in \mathbb{F}_2^u$ with Gray code only takes $2^u$ bit operations.
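The Gray-code enumeration can be illustrated as follows. This is a minimal sketch, assuming the quadratic polynomial is given by an upper-triangular coefficient matrix `a` (0-based, with `a[i][i]` the coefficient of the linear term $x_{i+1}$) and a constant `c`; the names `gray_code_values` and `eval_naive` are hypothetical. At every step a single bit of $x$ flips and the value of $f$ is updated with the derivative of $f$ in that direction, so no table is stored.

```python
def gray_code_values(a, c, u):
    """Yield (x, f(x)) for all x in F_2^u in Gray-code order.

    f(x) = XOR_{i <= j} a[i][j] * x_i * x_j  XOR  c, with bit i of the integer x
    holding the variable x_i (0-based).
    """
    x, val = 0, c                          # f(0) = c
    yield x, val
    for k in range(1, 1 << u):
        i = (k & -k).bit_length() - 1      # the bit that flips at step k
        # f(x ^ e_i) ^ f(x) = a[i][i] ^ XOR_{j != i} a[min(i,j)][max(i,j)] * x_j
        delta = a[i][i]
        for j in range(u):
            if j != i and (x >> j) & 1:
                delta ^= a[min(i, j)][max(i, j)]
        val ^= delta
        x ^= 1 << i
        yield x, val

def eval_naive(a, c, u, x):
    """Term-by-term evaluation of the same polynomial, used only as a cross-check."""
    bits = [(x >> i) & 1 for i in range(u)]
    val = c
    for i in range(u):
        for j in range(i, u):
            val ^= a[i][j] & bits[i] & bits[j]
    return val
```

Each update touches at most $u$ coefficients, which matches the $u \cdot 2^u$ estimate above; a bit-sliced implementation would process these coefficients in machine words rather than one bit at a time.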

A Simple Version of Crossbred Algorithm for Quadratic Equations
In our attacks, we will adopt a simple version of the crossbred algorithm [JV17] to solve an overdefined system of quadratic equations, which is described in [BDT22]. This algorithm fits very well with our attacks on LowMC for its simplicity to bound the time complexity and to implement in practice.
In the following, the $u$ variables of the system are split into $y = (y_1, \ldots, y_{u-u_1})$ and $z = (z_1, \ldots, z_{u_1})$, and each of the $m$ quadratic polynomials $f_t$ ($1 \le t \le m$) is viewed as a quadratic polynomial in $z$ whose coefficients depend on $y$. In other words, the system of $m$ quadratic equations can be expressed as follows:
$$A \cdot (z_1z_2, z_1z_3, \ldots, z_{u_1-1}z_{u_1}, z_1, z_2, \ldots, z_{u_1})^T = B, \qquad (3)$$
where $A$ is the coefficient matrix of size $m \times (u_1(u_1-1)/2 + u_1)$ and $B$ is a vector of size $m$. Moreover, all elements in the first $u_1(u_1-1)/2$ columns of $A$ take constant values from $\{0, 1\}$ according to $f_t$ ($1 \le t \le m$). Each element in the last $u_1$ columns of $A$ is written as a linear polynomial in $y$, i.e. the coefficient of $z_i$ collects the constant coefficient of $z_i$ and the coefficients of the cross terms $y_jz_i$ in $f_t$. Each element of the vector $B$ is written as a quadratic polynomial in $y$.
For example, consider the following system of quadratic equations in $(z_1, z_2, z_3, y_1, y_2, y_3)$: Then, we can rewrite it in the form shown in Equation 3, as specified below:
With the representation described in Equation 3 in mind, it is easy to understand the simple algorithm to solve $m$ quadratic equations in $u$ variables. The overall procedure can be described as follows:
1. Apply Gaussian elimination to the augmented matrix $A|B$ such that the first $u_1(u_1-1)/2$ columns of the matrix $A|B$ are in reduced row echelon form. Denote the matrix after this Gaussian elimination by $A'|B'$.
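2. From the last $m - u_1(u_1-1)/2$ rows of the matrix $A'|B'$, we can deduce $m - u_1(u_1-1)/2$ equations of the form
$$A'' \cdot (z_1, z_2, \ldots, z_{u_1})^T = B'', \qquad (5)$$
where each element in $A''$ is a linear polynomial in $y$ and each element in $B''$ is a quadratic polynomial in $y$.
3. Exhaust all values of $y$. For each value of $y$, update $(A'', B'')$, solve Equation 5 for $z$ and check the obtained solutions against the original $m$ equations$^2$.
Complexity analysis. To efficiently solve the quadratic equations with the above method, it is required that the equation system in Equation 5 is slightly overdefined, i.e. for each guess of $y$, there is at most one solution of $z$. Another reason to make it slightly overdefined is to amortize the cost to check the solutions. Let $\epsilon = m - u_1(u_1-1)/2 - u_1$, i.e. Equation 5 consists of $u_1 + \epsilon$ equations in $u_1$ variables. Using Gray code to enumerate $y$, the entries of $(A'', B'')$ can be updated incrementally for each new value of $y$.
The time complexity of Steps 1-2 can be estimated as
$$m^2 \cdot \binom{u}{\le 2}$$
bit operations as we need to perform the addition of quadratic polynomials in $u$ variables for each row operation. The time complexity of Step 3 can be estimated as the sum of the time complexity to update $(A'', B'')$, the time complexity to solve Equation 5 and the time complexity to check the original $m$ equations. We assume that Equation 5 is slightly overdefined and the cost to check the correctness of the solutions is negligible$^3$. Then, the time complexity of Step 3 can be estimated as
$$2^{u-u_1} \cdot (u_1 + \epsilon) \cdot (u_1^2 + u_1 \cdot \epsilon + u)$$
bit operations. Therefore, the total time complexity is estimated as
$$m^2 \cdot \binom{u}{\le 2} + 2^{u-u_1} \cdot (u_1 + \epsilon) \cdot (u_1^2 + u_1 \cdot \epsilon + u)$$
bit operations.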
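The three steps can be sketched on a toy scale as follows. This is an illustrative sketch under several assumptions, not the implementation from [BDT22]: polynomials are stored as Python sets of monomials (sorted tuples of variable indices, `()` for the constant 1), variables `0..nz-1` play the role of $z$ and the remaining ones the role of $y$, and $y$ is enumerated naively instead of with Gray code. All function names are hypothetical.

```python
import random
from itertools import combinations, product

def eval_poly(poly, x):
    """Evaluate a polynomial (set of monomials) at the bit tuple x."""
    return sum(all(x[i] for i in mono) for mono in poly) & 1

def solve_linear_gf2(rows, nvars):
    """All solutions of a linear system over F_2. Each row is an int whose bit j
    (j < nvars) is the coefficient of z_j and whose bit nvars is the right-hand side."""
    pivots = {}                                   # pivot column -> reduced row
    for r in rows:
        for col, pr in pivots.items():
            if (r >> col) & 1:
                r ^= pr
        low = r & ((1 << nvars) - 1)
        if low == 0:
            if r:                                 # the row reads 0 = 1
                return []
            continue
        col = (low & -low).bit_length() - 1
        for c2 in pivots:                         # keep earlier pivot rows reduced
            if (pivots[c2] >> col) & 1:
                pivots[c2] ^= r
        pivots[col] = r
    free = [j for j in range(nvars) if j not in pivots]
    sols = []
    for vals in product((0, 1), repeat=len(free)):
        z = [0] * nvars
        for j, v in zip(free, vals):
            z[j] = v
        for col, pr in pivots.items():
            z[col] = (pr >> nvars) & 1
            for j in free:
                z[col] ^= (pr >> j) & 1 & z[j]
        sols.append(tuple(z))
    return sols

def crossbred_solve(eqs, nz, ny):
    """Return all x = z + y (tuples of bits) satisfying every equation in eqs."""
    rows = [set(e) for e in eqs]
    used = 0
    # Step 1: Gaussian elimination on the monomials z_i z_j.
    for mono in combinations(range(nz), 2):
        piv = next((r for r in range(used, len(rows)) if mono in rows[r]), None)
        if piv is None:
            continue
        rows[used], rows[piv] = rows[piv], rows[used]
        for r in range(len(rows)):
            if r != used and mono in rows[r]:
                rows[r] = rows[r] ^ rows[used]
        used += 1
    linear_rows = rows[used:]                     # Step 2: rows that are linear in z
    solutions = []
    for y in product((0, 1), repeat=ny):          # Step 3: exhaust y
        x0 = (0,) * nz + y
        system = []
        for poly in linear_rows:
            row = 0
            for mono in poly:
                zs = [i for i in mono if i < nz]
                if all(x0[i] for i in mono if i >= nz):
                    row ^= (1 << zs[0]) if zs else (1 << nz)
            system.append(row)
        for z in solve_linear_gf2(system, nz):
            if all(eval_poly(e, z + y) == 0 for e in eqs):
                solutions.append(z + y)
    return solutions

def random_planted_system(seed, nz, ny, m):
    """Random quadratic system with a planted root (test helper, hypothetical)."""
    rng = random.Random(seed)
    u = nz + ny
    monos = [()] + [(i,) for i in range(u)] + list(combinations(range(u), 2))
    secret = tuple(rng.getrandbits(1) for _ in range(u))
    eqs = []
    for _ in range(m):
        poly = {mono for mono in monos if rng.getrandbits(1)}
        if eval_poly(poly, secret):
            poly ^= {()}                          # toggle the constant term
        eqs.append(poly)
    return eqs, secret
```

With, say, `nz = 3`, `ny = 4` and `m = 10`, three rows are consumed by the elimination of $z_1z_2$, $z_1z_3$, $z_2z_3$ and the remaining rows form the overdefined linear system in $z$ that is solved for each value of $y$.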
$^2$ A better way is to choose the quadratic equations from $A'|B'$ that are not included in $A''|B''$ for verification.
$^3$ To check the correctness of a solution, we can randomly pick a quadratic equation not included in $A''|B''$ and check whether it holds. For a wrong solution, we expect each such equation to hold with probability $2^{-1}$. Thus, we estimate the time complexity to check $2^{\omega}$ solutions as $2^{\omega+1} \cdot \binom{u}{\le 2}$ bit operations. The value of $\omega$ can be estimated as $\omega = u - u_1 - \epsilon$, thus resulting in a complexity to check the solutions of $2^{u-u_1-\epsilon+1} \cdot \binom{u}{\le 2}$ bit operations. This complexity in most cases will be smaller than the time complexity of Step 3. We have checked that this holds for all our attacks.

On Dinur's Algorithm
In our attacks on full-round (4-round) LowMC with full S-box layers, we will use Dinur's algorithm [Din21] to solve a system of nonlinear equations of degree 4, which is based on the polynomial method originally proposed in [LPT+17]. Similar to Dinur's attack on LowMC with an odd number of rounds, we can optimize the generic time complexity given in [Din21] for such an equation system by taking the structure of the equations into account. Therefore, it is necessary to understand how Dinur's algorithm works and how the complexity is computed. In the following, we will describe a simplified version of Dinur's algorithm.
Similarly, we consider the system of $m$ equations denoted by $E(x)$ as shown below:
$$E(x):\ \{P_i(x) = 0 \mid 1 \le i \le m\}, \qquad (6)$$
where $x = (y, z)$ with $y = (y_1, \ldots, y_{u-u_1})$ and $z = (z_1, \ldots, z_{u_1})$, so that $P_i(x)$ can also be written as $P_i(y, z)$.
Next, we randomly take $\ell = u_1 + 1$ different equations $P_{i_1}, \ldots, P_{i_\ell}$ from the $m$ equations $P_i(y, z) = 0$ and we denote the new system of equations by $\tilde{E}(y, z)$ as shown below:
$$\tilde{E}(y, z):\ \{P_{i_j}(y, z) = 0 \mid 1 \le j \le \ell\}. \qquad (7)$$
It is easy to observe that a solution to Equation 6 must be a solution to Equation 7. However, a solution to Equation 7 is not necessarily a solution to Equation 6. The general idea of Dinur's algorithm is to efficiently enumerate all the solutions to Equation 7 and then test their correctness against Equation 6.
Assumption. We assume that when the value of $y$ is specified, there is at most 1 solution of $z$ satisfying Equation 7, and the corresponding $(y, z)$ is called the isolated solution to $\tilde{E}(y, z)$.
Let us elaborate more on the isolated solution. According to its definition, for each specified value of $y$ denoted by $\hat{y}$, there is at most 1 solution of $z$ denoted by $\hat{z}$ satisfying $\tilde{E}(\hat{y}, z)$, and if such a $\hat{z}$ exists, $(\hat{y}, \hat{z})$ is an isolated solution to $\tilde{E}(y, z)$. Hence, it is indeed equivalent to the assumption made above.
The above assumption is reasonable because once $y$ is specified, we get $\ell = u_1 + 1$ equations in $u_1$ variables. Hence, we can expect that there is at most 1 solution of $z$. We emphasize here that we have checked the probability of this assumption in our 4-round attacks on LowMC, as detailed in Section 5. Now suppose we have a blackbox function denoted by $U(y): \mathbb{F}_2^{u-u_1} \to \mathbb{F}_2^{u_1}$ which can efficiently output the value of $\hat{z}$ for each given $\hat{y}$ such that $(\hat{y}, \hat{z})$ is an isolated solution to $\tilde{E}(y, z)$. In this way, the time complexity to enumerate all the solutions to $\tilde{E}(y, z)$ is identical to the time complexity to evaluate the function $U(y)$ over $\mathbb{F}_2^{u-u_1}$, which can be done efficiently via the Möbius transform once the explicit algebraic normal forms (ANFs) of $U(y)$ are known.
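Finding the blackbox function $U(y)$. The core idea of Dinur's algorithm is to find the above blackbox function $U(y)$ with time complexity less than $2^u$. To achieve this, let us consider an equivalent representation of the equation system $\tilde{E}(y, z)$ as below:
$$\tilde{F}(y, z) = \prod_{j=1}^{\ell} \bigl(1 \oplus P_{i_j}(y, z)\bigr),$$
where $P_{i_1}, \ldots, P_{i_\ell}$ are the $\ell$ polynomials of $\tilde{E}(y, z)$. In this way, the set of solutions to $\tilde{E}(y, z)$ is identical to the set of solutions to the equation
$$\tilde{F}(y, z) = 1.$$
Then, we rewrite $\tilde{F}(y, z)$ in different forms as shown below:
$$\tilde{F}(y, z) = z_1 z_2 \cdots z_{u_1} \cdot U_0(y) \oplus Q_0(y, z),$$
$$\tilde{F}(y, z_1, \ldots, z_{i-1}, 0, z_{i+1}, \ldots, z_{u_1}) = z_1 \cdots z_{i-1} z_{i+1} \cdots z_{u_1} \cdot U_i(y) \oplus Q_i(y, z_1, \ldots, z_{i-1}, z_{i+1}, \ldots, z_{u_1}) \quad (1 \le i \le u_1).$$
Similar to the cube attack [DS09], there is no term in $Q_0(y, z)$ which is divisible by $z_1 z_2 \cdots z_{u_1}$, and no term in $Q_i$ which is divisible by $z_1 \cdots z_{i-1} z_{i+1} \cdots z_{u_1}$. Consequently, $U_0(y) = \bigoplus_{z \in \mathbb{F}_2^{u_1}} \tilde{F}(y, z)$ and $U_i(y) = \bigoplus_{z \in \mathbb{F}_2^{u_1},\ z_i = 0} \tilde{F}(y, z)$.
Relations between $U_i(y)$ and the isolated solutions. Supposing $(\hat{y}, \hat{z})$ is an isolated solution to $\tilde{E}(y, z)$, we trivially have $U_0(\hat{y}) = 1$ because there is only 1 value $z = \hat{z}$ which makes $\tilde{F}(\hat{y}, z) = 1$. If there is no solution to $\tilde{E}(y, z)$ for the given $y = \hat{y}$, we have $U_0(\hat{y}) = 0$ because $\tilde{F}(\hat{y}, z) = 0$ for each $z \in \mathbb{F}_2^{u_1}$. Note that we assume that there is either no solution or 1 solution to $\tilde{E}(y, z)$ for a given $y$. Hence, $U_0(\hat{y})$ helps determine whether there is a solution to $\tilde{E}(y, z)$ for a given $y = \hat{y}$. Once it is determined that there is 1 solution of $z$, the remaining work is to deduce this solution $\hat{z}$ under the given $y = \hat{y}$.
To deduce $\hat{z}$ once $U_0(\hat{y}) = 1$, it is necessary to observe that
$$\hat{z}_i = U_i(\hat{y}) \oplus 1 \quad (1 \le i \le u_1).$$
This can be verified and proved under the above mentioned assumption [Din21]. Hence, for a given $y = \hat{y}$, computing the solution to $\tilde{E}(\hat{y}, z)$ can be described as follows:
1. Compute $U_0(\hat{y})$.
2. If $U_0(\hat{y}) = 0$, there is no solution and exit.
3. Otherwise, compute $U_i(\hat{y})$ and output $\hat{z}_i = U_i(\hat{y}) \oplus 1$ for $1 \le i \le u_1$.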
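The relation between the sums of $\tilde{F}$ and the isolated solution can be illustrated on a toy system. The sketch below computes $U_0(\hat{y})$ and $U_i(\hat{y})$ by direct summation over $z$, which is only meant to show why $\hat{z}_i = U_i(\hat{y}) \oplus 1$ holds; it deliberately does not implement the interpolation step that gives Dinur's algorithm its speed-up. The toy polynomials in `TOY` and all function names are assumptions made for this example.

```python
from itertools import product

def indicator(polys):
    """F(y, z) = prod_i (1 ^ P_i(y, z)); equals 1 exactly on the common zeros."""
    def F(y, z):
        out = 1
        for P in polys:
            out &= 1 ^ P(y, z)
        return out
    return F

def isolated_solution(F, y, u1):
    """Return z_hat if F(y, .) has exactly one root (as assumed), else None."""
    cube = list(product((0, 1), repeat=u1))
    U0 = 0
    for z in cube:                      # U_0(y): sum of F over all z
        U0 ^= F(y, z)
    if U0 == 0:
        return None
    z_hat = []
    for i in range(u1):
        Ui = 0
        for z in cube:                  # U_i(y): sum of F over all z with z_i = 0
            if z[i] == 0:
                Ui ^= F(y, z)
        z_hat.append(Ui ^ 1)            # the single root has z_i = 0 iff U_i(y) = 1
    return tuple(z_hat)

# Toy system with u1 = 2 and l = u1 + 1 = 3 equations; its unique root for every y
# is z = (y0, y0*y1).
TOY = [
    lambda y, z: z[0] ^ y[0],
    lambda y, z: z[1] ^ (y[0] & y[1]),
    lambda y, z: (z[0] & z[1]) ^ (y[0] & y[1]),
]
```

Note that if the assumption is violated and $\tilde{F}(\hat{y}, \cdot)$ has an even number of roots, `isolated_solution` returns `None`, and with an odd number larger than 1 it may return a wrong candidate, which is why candidates are still tested against the full system.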
Recovering $(U_0(y), U_1(y), \ldots, U_{u_1}(y))$. $U_0(y)$ can be interpolated from its value set $\{U_0(y) \mid y \in W_w^{u-u_1}\}$, where $W_w^{u-u_1} = \{a \in \mathbb{F}_2^{u-u_1} \mid HW(a) \le w\}$, $HW(a)$ is the Hamming weight of $a$ and $w$ is an upper bound on the degree of $U_0(y)$. To compute each value of such a set, we need to do $2^{u_1}$ evaluations of $\tilde{F}(y, z)$. Similarly, $U_i(y)$ ($1 \le i \le u_1$) can be interpolated from its value set over $W_{w+1}^{u-u_1}$, where the computation of each such value requires $2^{u_1-1}$ evaluations of $\tilde{F}(y, z)$. In other words, to interpolate $U_0(y)$, we need a list of $\tilde{F}(y, z)$ of size $\binom{u-u_1}{\le w} \times 2^{u_1}$, i.e. we need to exhaust the solutions of $\tilde{E}(y, z)$ from the constrained space $(y, z) \in W_w^{u-u_1} \times \{0, 1\}^{u_1}$. Similarly, to interpolate $U_i(y)$ ($1 \le i \le u_1$), we need to exhaust the solutions of $\tilde{E}(y, z)$ from the constrained space $(y, z) \in W_{w+1}^{u-u_1} \times \{0, 1\}^{u_1}$ with $z_i = 0$. Hence, we only need to exhaust all the solutions to $\tilde{E}(y, z)$ in the constrained space $(y, z) \in W_{w+1}^{u-u_1} \times \{0, 1\}^{u_1}$, which is sufficient to compute all the required lists. Exhausting the solutions to $\tilde{E}(y, z)$ in this constrained space with the fast exhaustive search algorithm [BCC+10] takes the number of bit operations given in Equation 8.
Note that we only record those $(y, z)$ such that $\tilde{F}(y, z) = 1$, i.e. we only care about those $(y, z)$ which are solutions to $\tilde{E}(y, z)$. In this way, we will obtain a list of $(y, z)$ of size about $\frac{1}{2} \cdot \binom{u-u_1}{\le w+1}$. After obtaining the list, we need to construct an array of size $\binom{u-u_1}{\le w}$ for $U_0(y)$ according to its definition. We also need to construct $u_1$ other arrays of size $\binom{u-u_1}{\le w+1}$ for $U_i(y)$ ($1 \le i \le u_1$) according to their definitions. The time complexity of this phase is slightly larger than $(u_1 + 1) \cdot \frac{1}{2} \cdot \binom{u-u_1}{\le w+1}$ but is still negligible compared with Equation 8. Then, with these arrays, we can interpolate $(U_0(y), U_1(y), \ldots, U_{u_1}(y))$ with the Möbius transform, which requires less than $(u_1 + 1) \cdot \binom{u-u_1}{\le w+1}$ bit operations and is negligible compared with Equation 8. Hence, the time complexity to recover $(U_0(y), U_1(y), \ldots, U_{u_1}(y))$ is dominated by Equation 8. We refer the interested readers to [Din21] for more details of this procedure as it is the core of Dinur's algorithm and explaining the full details in such a short paragraph is infeasible.
The total time complexity. To reduce the overhead of checking the correctness of solutions and to efficiently identify the isolated solution, we may prepare four different choices for the equation system $\tilde{E}(y, z)$ and perform the above procedure for each of the 4 equation systems. Then, the time complexity can be estimated as the sum of two terms in bit operations, where the first term is the cost of recovering the polynomials $U_i(y)$ for the four systems and the second term accounts for evaluating $u_1 + 1$ polynomials on the space $\{0, 1\}^{u-u_1}$ four times.
The total memory complexity. In the phase to recover all $U_i(y)$, it is required to construct a table of size about $4 \cdot \frac{1}{2} \cdot \binom{u-u_1}{\le d_{\tilde{F}}-u_1+1}$ bits so that the Möbius transform can work, where $d_{\tilde{F}}$ denotes the degree of $\tilde{F}$. In the phase to evaluate $(u_1 + 1)$ polynomials over the space $\{0, 1\}^{u-u_1}$, with the standard Möbius transform, the memory complexity is about $4 \cdot (u_1 + 1) \cdot 2^{u-u_1}$ bits. If using Dinur's memory-efficient Möbius transform, the memory complexity is about $4 \cdot (u_1 + 1) \cdot \binom{u-u_1}{\le d_{\tilde{F}}-u_1+1}$ bits.

New Algebraic Attacks on LowMC
In this section, we will first demonstrate 3 different attacks on LowMC with full S-box layers. The first attack is a new and simple guess-and-determine (GnD) attack on 3-round LowMC by using Banik et al.'s strategy [BBDV20] to linearize the 3-bit S-box, where we solve a system of quadratic equations with the standard linearization technique. The second attack is a much simpler yet more efficient GnD attack on 3-round LowMC by using a naive guess strategy to linearize the 3-bit S-box, where we solve quadratic equations with the simplified version of the crossbred algorithm [BDT22]. The third attack is for full-round (4-round) LowMC, where we still adopt the naive guess strategy but use Dinur's algorithm [Din21] to solve equations of degree 4.

The First Attack on LowMC
The first attack is essentially based on the strategy to linearize the 3-bit S-box by guessing a quadratic relation, as first proposed in [BBDV20]. Specifically, consider the definition of the S-box as shown below:
$$a_4 = a_1 \oplus a_2a_3, \quad a_5 = a_1 \oplus a_2 \oplus a_1a_3, \quad a_6 = a_1 \oplus a_2 \oplus a_3 \oplus a_1a_2.$$
If we guess $a_4 = a_4^\star$, i.e. the value of the quadratic polynomial $a_1 \oplus a_2a_3$ is $a_4^\star$, then $(a_4, a_5, a_6)$ can be expressed as linear expressions in terms of $(a_1, a_2, a_3)$:
$$a_4 = a_4^\star, \quad a_5 = a_4^\star \oplus a_2 \oplus a_4^\star a_3, \quad a_6 = a_4^\star \oplus (a_4^\star \oplus 1) a_2 \oplus a_3.$$
Indeed, guessing any 1 non-zero linear polynomial in $(a_4, a_5, a_6)$ can help linearize the 3-bit S-box, i.e. $(a_4, a_5, a_6)$ can then be expressed as linear expressions in terms of $(a_1, a_2, a_3)$. This property also applies to its inverse due to the similar formulas for its inverse:
$$a_1 = a_4 \oplus a_5 \oplus a_5a_6, \quad a_2 = a_5 \oplus a_4a_6, \quad a_3 = a_4 \oplus a_5 \oplus a_6 \oplus a_4a_5.$$
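The effect of the guess can be checked exhaustively. In the sketch below, the affine expressions for $a_5$ and $a_6$ are derived here from the S-box definition by substituting $a_2a_3 = g \oplus a_1$, $a_1a_2 = g a_2 \oplus a_2a_3$ and $a_1a_3 = g a_3 \oplus a_2a_3$ for a guessed constant $g$ (the value of $a_1 \oplus a_2a_3$); this particular way of writing them, and the names `sbox_linearized` and `check_linearization`, are assumptions of this example rather than a quotation from [BBDV20].

```python
from itertools import product

def sbox(a1, a2, a3):
    """Forward 3-bit LowMC S-box."""
    return (a1 ^ (a2 & a3),
            a1 ^ a2 ^ (a1 & a3),
            a1 ^ a2 ^ a3 ^ (a1 & a2))

def sbox_linearized(g, a1, a2, a3):
    """Affine map in (a1, a2, a3) for a fixed guess g of a4 = a1 ^ a2*a3.

    a1 is kept as a parameter for symmetry although the resulting expressions
    happen not to depend on it.
    """
    a4 = g
    a5 = g ^ a2 ^ (g & a3)
    a6 = g ^ ((g ^ 1) & a2) ^ a3
    return a4, a5, a6

def check_linearization():
    """True if the affine map agrees with the S-box whenever the guess is right."""
    for a in product((0, 1), repeat=3):
        g = a[0] ^ (a[1] & a[2])        # the correct guess for this input
        if sbox_linearized(g, *a) != sbox(*a):
            return False
    return True
```

For a wrong guess the two maps disagree at least in the first output bit, which is how wrong guesses are eventually filtered out by the remaining equations.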

An overdefined system of equations for the 3-bit S-box.
Since the publication of the algebraic attack on AES [CP02], and although it is commonly believed that the attack does not work as expected, it has been well known that one can find an overdefined system of linearly independent quadratic equations for the S-box in use. For the 3-bit S-box of LowMC, we can find at most 14 linearly independent quadratic equations, as shown below:
$$\begin{aligned}
&a_4 = a_1 \oplus a_2a_3, \quad a_5 = a_1 \oplus a_2 \oplus a_1a_3, \quad a_6 = a_1 \oplus a_2 \oplus a_3 \oplus a_1a_2, \\
&a_1 = a_4 \oplus a_5 \oplus a_5a_6, \quad a_2 = a_5 \oplus a_4a_6, \quad a_3 = a_4 \oplus a_5 \oplus a_6 \oplus a_4a_5, \\
&a_4a_2 = a_1a_2 \oplus a_2a_3, \quad a_4a_3 = a_1a_3 \oplus a_2a_3, \quad a_5a_1 = a_1 \oplus a_1a_2 \oplus a_1a_3, \\
&a_5a_3 = a_2a_3, \quad a_6a_1 = a_1 \oplus a_1a_3, \quad a_6a_2 = a_2 \oplus a_2a_3, \\
&a_4a_1 \oplus a_1 = a_5a_2 \oplus a_1a_2 \oplus a_2, \\
&a_5a_2 \oplus a_1a_2 \oplus a_2 = a_6a_3 \oplus a_1a_3 \oplus a_2a_3 \oplus a_3.
\end{aligned}$$
The GnD attack by solving quadratic equations. We noticed that in Banik et al.'s attacks [BBDV20,BBVY21] on LowMC, the guessed quadratic polynomials in key bits are not used to recover the key and they are only used to linearize the S-box. In this attack, we will make full use of all the guessed polynomials and the overdefined system of quadratic equations for the 3-bit S-box. First, we show that each time we guess a quadratic polynomial for the 3-bit S-box, we can indeed obtain 3 quadratic equations rather than only 1 quadratic equation. Let us take the guess $a_4 = a_4^\star$ as an example. In this case, we have the following 3 linearly independent quadratic equations in $(a_1, a_2, a_3)$:
$$a_4^\star = a_1 \oplus a_2a_3, \quad a_4^\star a_2 = a_1a_2 \oplus a_2a_3, \quad a_4^\star a_3 = a_1a_3 \oplus a_2a_3.$$
Indeed, if we guess any 1 linear equation in $(a_4, a_5, a_6)$, we can always obtain 3 quadratic equations in terms of $(a_1, a_2, a_3)$ from such a guess, which can be easily derived based on a similar strategy.
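The count of 14 can be reproduced by a small computation. The sketch below checks that the 14 listed equations hold on all 8 input-output pairs of the S-box, and computes the dimension of the space of polynomials of degree at most 2 in $(a_1, \ldots, a_6)$ that vanish on these pairs as $22 - \mathrm{rank}$, where 22 is the number of monomials of degree at most 2 in 6 variables. It does not separately prove that the 14 listed equations are linearly independent; all names are illustrative.

```python
from itertools import combinations, product

def sbox(a1, a2, a3):
    """Forward 3-bit LowMC S-box."""
    return (a1 ^ (a2 & a3), a1 ^ a2 ^ (a1 & a3), a1 ^ a2 ^ a3 ^ (a1 & a2))

# Each entry returns lhs ^ rhs of one listed equation (0 means the equation holds).
EQUATIONS = [
    lambda a1, a2, a3, a4, a5, a6: a4 ^ a1 ^ (a2 & a3),
    lambda a1, a2, a3, a4, a5, a6: a5 ^ a1 ^ a2 ^ (a1 & a3),
    lambda a1, a2, a3, a4, a5, a6: a6 ^ a1 ^ a2 ^ a3 ^ (a1 & a2),
    lambda a1, a2, a3, a4, a5, a6: a1 ^ a4 ^ a5 ^ (a5 & a6),
    lambda a1, a2, a3, a4, a5, a6: a2 ^ a5 ^ (a4 & a6),
    lambda a1, a2, a3, a4, a5, a6: a3 ^ a4 ^ a5 ^ a6 ^ (a4 & a5),
    lambda a1, a2, a3, a4, a5, a6: (a4 & a2) ^ (a1 & a2) ^ (a2 & a3),
    lambda a1, a2, a3, a4, a5, a6: (a4 & a3) ^ (a1 & a3) ^ (a2 & a3),
    lambda a1, a2, a3, a4, a5, a6: (a5 & a1) ^ a1 ^ (a1 & a2) ^ (a1 & a3),
    lambda a1, a2, a3, a4, a5, a6: (a5 & a3) ^ (a2 & a3),
    lambda a1, a2, a3, a4, a5, a6: (a6 & a1) ^ a1 ^ (a1 & a3),
    lambda a1, a2, a3, a4, a5, a6: (a6 & a2) ^ a2 ^ (a2 & a3),
    lambda a1, a2, a3, a4, a5, a6: (a4 & a1) ^ a1 ^ (a5 & a2) ^ (a1 & a2) ^ a2,
    lambda a1, a2, a3, a4, a5, a6:
        (a5 & a2) ^ (a1 & a2) ^ a2 ^ (a6 & a3) ^ (a1 & a3) ^ (a2 & a3) ^ a3,
]

def all_equations_hold():
    """True if every listed equation holds for all 8 S-box input-output pairs."""
    return all(eq(*(a + sbox(*a))) == 0
               for a in product((0, 1), repeat=3) for eq in EQUATIONS)

def gf2_rank(rows):
    """Rank over F_2 of a list of rows given as integer bitmasks."""
    basis = {}                              # leading bit -> row
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead not in basis:
                basis[lead] = r
                break
            r ^= basis[lead]
    return len(basis)

def count_quadratic_relations():
    """Dimension of the space of degree-<=2 polynomials vanishing on the S-box graph."""
    rows = []
    num_monomials = 0
    for a in product((0, 1), repeat=3):
        p = a + sbox(*a)
        vals = [1] + list(p) + [p[i] & p[j] for i, j in combinations(range(6), 2)]
        num_monomials = len(vals)           # 1 + 6 + 15 = 22
        rows.append(int("".join(map(str, vals)), 2))
    return num_monomials - gf2_rank(rows)
```

The evaluation matrix has 8 rows (one per S-box input) and 22 columns, so the dimension of the solution space is 22 minus its rank.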
Figure 1: Illustration of the first attack on 3-round LowMC

With the above observations in mind, it is now easy to understand our first GnD attack. As shown in Fig. 1, for each S-box in the first 2 rounds, we linearize it by guessing 1 output bit. In this way, each input bit of the S-boxes in the first 3 rounds can be written as a linear expression in the $n$-bit key $K = (K_1, K_2, \ldots, K_n)$. In addition, for such a guess strategy, we can obtain $3 \cdot 2s = 6s$ quadratic equations in $K$ because we guess $2s$ quadratic equations for the $2s$ S-boxes in the first 2 rounds to linearize them. Finally, we deal with the third round to attack 3-round LowMC.
For the $s$ S-boxes in the third round, we only linearize $s - u_1$ ($0 < u_1 < s$) S-boxes by guessing $s - u_1$ output bits. Note that the output of the S-box in the third round is linear in $K$. In this way, we obtain $3(s - u_1)$ linear equations in $K$ because $s - u_1$ S-boxes are linearized. Moreover, as mentioned above, such a guess for these $s - u_1$ S-boxes also allows us to deduce $3(s - u_1)$ quadratic equations in $K$. For the remaining $u_1$ S-boxes, we can derive $14u_1$ quadratic equations in $K$. In summary, we now obtain $3(s - u_1)$ linear equations in $K$ and $6s + 3(s - u_1) + 14u_1 = 9s + 11u_1$ quadratic equations in $K$.
It can be found that the first attack is much slower than the generic fast exhaustive search attack using Gray code proposed in [BCC+10]. In the following, we will describe how to use a different guess strategy to make the attacks greatly outperform the fast exhaustive search attack. Moreover, as shown later, the idea of the first attack will be used for LowMC with partial nonlinear layers, which allows us to devise attacks that are more efficient than the MITM attacks [BBVY21].

The Second Attack on LowMC
In the second attack, we use a rather naive guess strategy to linearize the 3-bit S-box. Specifically, we guess two input bits of the S-box rather than 1 output bit of the S-box to linearize it in the forward direction. However, different from Banik et al.'s attacks [BBDV20,BBVY21], we only apply this guess strategy to the first round. For such a guess strategy, we directly obtain $2s$ linear equations in $K = (K_1, K_2, \ldots, K_n)$. After applying Gaussian elimination to these linear equations, we obtain $s$ free variables $v = (v_1, v_2, \ldots, v_s)$.

Figure 2: Illustration of the second attack on 3-round LowMC

At last, we consider the inverse of the S-box and obtain $3s$ quadratic equations in these $s$ variables $v = (v_1, v_2, \ldots, v_s)$. We emphasize here that we should not consider the overdefined system of equations for the S-box here because 11 out of such 14 equations in $v$ will be of degree higher than 2. The main reason is that the input bits are quadratic in $v$ while the output bits are linear in $v$.
In Table 4, we give the optimal values of $(s, u_1, T_2)$ to attack 3-round LowMC. The memory complexity of this attack can be simply estimated as $3s \cdot (s + s(s-1)/2)$ bits, which is the cost to store the $3s$ quadratic equations in $s$ variables.
Improving the time complexity with the crossbred-like algorithm [BDT22]. In the above attack, the problem is reduced to solving $3s$ quadratic equations in $s$ variables. We now show that we can use the crossbred-like algorithm for this system of quadratic equations to slightly improve the time complexity. First, we split the $s$ variables $v = (v_1, v_2, \ldots, v_s)$ into $y = (y_1, y_2, \ldots, y_{s-u_1}) = (v_{u_1+1}, v_{u_1+2}, \ldots, v_s)$ and $z = (z_1, z_2, \ldots, z_{u_1}) = (v_1, v_2, \ldots, v_{u_1})$. Then, as discussed in Subsection 2.3, we can rewrite the $3s$ quadratic equations in the following form:
$$A \cdot (z_1z_2, z_1z_3, \ldots, z_{u_1-1}z_{u_1}, z_1, z_2, \ldots, z_{u_1})^T = B.$$
We choose $u_1$ such that $3s - u_1(u_1-1)/2 = u_1 + \epsilon$, i.e. after eliminating the quadratic terms in $z$, $u_1 + \epsilon$ linear equations in $z$ remain. To optimize the time complexity, we choose $u_1$ such that $\epsilon$ is minimized. According to the complexity analysis described in Subsection 2.3, the time complexity to solve the $3s$ quadratic equations in $s$ variables with this method is estimated as about
$$(3s)^2 \cdot \binom{s}{\le 2} + 2^{s-u_1} \cdot (u_1 + \epsilon) \cdot (u_1^2 + u_1 \cdot \epsilon + s)$$
bit operations. The time complexity of our attack on 3-round LowMC is thus estimated as
$$T_3 = 2^{2s} \cdot (3s)^2 \cdot \binom{s}{\le 2} + 2^{3s-u_1} \cdot (u_1 + \epsilon) \cdot (u_1^2 + u_1 \cdot \epsilon + s)$$
bit operations. The optimal values of $(s, u_1, \epsilon, T_3)$ are given in Table 5. The memory complexity of this attack can be simply estimated as $3s \cdot \binom{s}{\le 2}$ bits, which is the cost to store the $3s$ quadratic equations in $s$ variables.
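The formula for $T_3$ is easy to evaluate numerically. The sketch below is a small helper under one stated assumption: "$\epsilon$ is minimized" is interpreted as taking the largest $u_1$ with $\epsilon = 3s - u_1(u_1-1)/2 - u_1 \ge 0$, which may differ from the exact selection rule behind Table 5. The function names are hypothetical and no value from Table 5 is reproduced here.

```python
import math

def binom_le2(u):
    """Number of monomials of degree at most 2 in u variables."""
    return 1 + u + u * (u - 1) // 2

def choose_u1(s):
    """Largest u1 with eps = 3s - u1(u1-1)/2 - u1 >= 0 (assumed selection rule)."""
    u1 = 1
    while 3 * s - (u1 + 1) * (u1 + 2) // 2 >= 0:
        u1 += 1
    eps = 3 * s - u1 * (u1 + 1) // 2
    return u1, eps

def log2_t3(s, u1, eps):
    """log2 of T_3 = 2^(2s) (3s)^2 C(s,<=2) + 2^(3s-u1) (u1+eps) (u1^2 + u1 eps + s)."""
    t3 = (2 ** (2 * s) * (3 * s) ** 2 * binom_le2(s)
          + 2 ** (3 * s - u1) * (u1 + eps) * (u1 ** 2 + u1 * eps + s))
    return math.log2(t3)
```

For example, `log2_t3(43, *choose_u1(43))` evaluates the estimate for $s = 43$; the second term of $T_3$ dominates, so the cost is governed by the exhaustive search over $y$ for each of the $2^{2s}$ guesses.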

The Third Attack on LowMC
In this part, we discuss the attacks on 4-round LowMC by combining the GnD technique and Dinur's algorithm [Din21] to solve nonlinear Boolean equations. As already shown above, if we linearize the round function of LowMC with Banik et al.'s guess strategy, we can only obtain quadratic equations in the key variables from the guess and these equations cannot be used to reduce the number of variables in the equation system. Moreover, linearizing 1 round of LowMC in this way requires guessing $s$ bits and the problem is still to solve nonlinear equations in $3s$ variables. However, we can easily check that the time complexity to solve nonlinear equations in $3s$ ($s \in \{43, 64, 85\}$) variables with Dinur's algorithm is much larger than $2^{2s}$ and thus the total time complexity will be much larger than $2^{3s}$. In other words, Banik et al.'s guess strategy is not suitable when we want to combine the GnD technique with Dinur's algorithm to improve the attacks. Therefore, we use the naive guess strategy to linearize 1-round LowMC by guessing two input bits of each S-box. Specifically, for the first round, we guess $2s$ input bits to linearize it. Then, we apply Gaussian elimination to the $2s$ linear equations in $K = (K_1, K_2, \ldots, K_n)$ to obtain $n - 2s = 3s - 2s = s$ free variables $v = (v_1, v_2, \ldots, v_s)$. In this way, we can write the input state of the S-box in the third round as quadratic expressions in $v$ and write the output state of the S-box in the third round as quadratic expressions in $v$ as well.
In total, we can construct $3s$ such degree-4 polynomials in $v$. The problem is reduced to solving these $3s$ degree-4 equations in $s$ variables $v = (v_1, v_2, \ldots, v_s)$. Due to the high degree, our strategies in the first 2 attacks are not useful because the cost is very high. Hence, we consider Dinur's algorithm for this system of equations.
The assumption needed to correctly recover $v$ becomes that, for the correct guess of $y$, there is at most 1 solution of $z$ satisfying $\{P_i(v) = P_i(y, z) = 0,\ 1 \le i \le \ell\}$. Note that for each specified $y$, we are considering $\ell = u_1 + 1$ equations in $u_1$ variables. Hence, this assumption holds with a high probability. Later, we will use experiments to verify this assumption.
Moreover, similar to Dinur's attacks [Din21], we prepare 4 different sets of $\ell$ polynomials, where each set is constructed in the above way to exploit the structure of the equations caused by the S-box. This is used to amortize the cost of checking the solution $(y, z)$ against the $3s$ equations. Specifically, for each guessed $y = \hat{y}$, we compute the corresponding $z = \hat{z}$, if one exists, for each of the 4 sets. Then, when the same $(\hat{y}, \hat{z})$ appears more than twice, we treat $(\hat{y}, \hat{z})$ as a potential solution and check its correctness against the $3s$ equations. Otherwise, we simply abandon all the suggested $(\hat{y}, \hat{z})$. Later, we will use experiments to simulate its success probability.
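The voting rule over the 4 sets can be sketched as follows. This is an illustrative fragment only: the function name `candidates_to_check`, the representation of a suggestion as a tuple `(y_hat, z_hat)` and the use of `None` for a set without an isolated solution are assumptions, and the per-set solver is not shown.

```python
from collections import Counter

def candidates_to_check(suggestions, threshold=2):
    """Return the solutions suggested by more than `threshold` equation sets.

    `suggestions` holds one entry per equation set: the isolated solution
    (y_hat, z_hat) suggested by that set, or None if the set suggests none.
    """
    votes = Counter(s for s in suggestions if s is not None)
    return [solution for solution, count in votes.items() if count > threshold]

# Three of the four sets agree, so the candidate is kept for the full check.
print(candidates_to_check([(5, 9), (5, 9), (5, 9), None]))
# Only two sets agree, so everything is discarded for this guess.
print(candidates_to_check([(5, 9), (5, 9), (3, 1), None]))
```

Only the candidates returned here need to be verified against all $3s$ equations, which is what keeps the verification cost amortized.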
According to the complexity analysis described in Subsection 2.4, the time complexity of our attack is estimated as $T_4$ bit operations. The memory complexity is estimated as $M_0$ bits if we use the standard Möbius transform. If using Dinur's memory-efficient Möbius transform, the memory complexity is estimated as $M'_0$ bits. The values of $(s, u_1, \ell, d_F, T_4, M_0, M'_0)$ that optimize the attacks on 4-round LowMC are given in Table 6.
Remark. We note that there is a trivial time-memory tradeoff for Dinur's algorithm obtained by guessing variables. Specifically, to solve equations of degree $d$ in $u$ variables, Dinur's algorithm generally requires $u^2 \cdot 2^{(1 - 1/(2.7d))u}$ bit operations and $u^2 \cdot 2^{(1 - 1/(1.35d))u}$ bits of memory. For attacks on 4-round LowMC, $d = 4$ and $u = k$. In this trivial tradeoff, we guess $u - u_1$ variables and use Dinur's algorithm to solve degree-4 equations in $u_1$ variables, so the time and memory complexity become $2^{u - u_1} \cdot u_1^2 \cdot 2^{(1 - 1/(2.7d))u_1}$ and $u_1^2 \cdot 2^{(1 - 1/(1.35d))u_1}$, respectively. To obtain a time complexity not higher than that of our attacks, we find that the required memory complexity is larger than $2^{84.6}$, $2^{108.2}$ and $2^{134.2}$ for the parameters $k = 129$, $k = 192$ and $k = 255$, respectively. Hence, our attacks achieve much better tradeoffs. The main reason is that, by guessing $2s$ key variables chosen to linearize the first round, the generic complexity of Dinur's algorithm can be optimized by taking useful properties of the equations into account. In contrast, when $2s$ key variables are chosen at random for the guess, no properties of the degree-4 equations can be exploited.
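The trivial tradeoff in the remark is easy to tabulate. The sketch below simply evaluates the generic cost formulas quoted above on a logarithmic scale; the function names are hypothetical, and no claim is made about the exact optimization the authors performed to obtain their memory bounds.

```python
from math import log2

def dinur_log2_time(d, u):
    """log2 of u^2 * 2^((1 - 1/(2.7 d)) u) bit operations."""
    return 2 * log2(u) + (1 - 1 / (2.7 * d)) * u

def dinur_log2_memory(d, u):
    """log2 of u^2 * 2^((1 - 1/(1.35 d)) u) bits of memory."""
    return 2 * log2(u) + (1 - 1 / (1.35 * d)) * u

def guess_then_dinur(d, u, u1):
    """Guess u - u1 variables, then solve in the remaining u1 variables.

    Returns (log2 time, log2 memory) of the trivial tradeoff.
    """
    return (u - u1) + dinur_log2_time(d, u1), dinur_log2_memory(d, u1)

# Degree-4 equations in u = k = 129 variables, for a few values of u1.
for u1 in (129, 110, 90):
    log_time, log_memory = guess_then_dinur(4, 129, u1)
    print(f"u1={u1}: time 2^{log_time:.1f}, memory 2^{log_memory:.1f}")
```

Decreasing $u_1$ lowers the memory but raises the time, which is the tradeoff curve against which the dedicated attack is compared.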

Attacks on LowMC with Partial Nonlinear Layers
In this part, we consider attacks on LowMC with parameters $s \in \{1, 10\}$, $r = \lfloor n/s \rfloor$ and $n = k \in \{128, 192, 256\}$, which are the targets listed in the LowMC competition. The best attacks on these parameters are achieved with the MITM technique [BBVY21]. The main idea of our attacks is very simple: we exploit the overdefined system of equations for the 3-bit S-box, as in our first attack on LowMC.
For $r$ rounds of LowMC, there are in total $s(r - 1)$ S-boxes in the first $r - 1$ rounds. First, we linearize the first $\lambda$ S-boxes by guessing 1 output bit of each of these S-boxes in the forward direction. For each of the remaining $s(r - 1) - \lambda$ S-boxes in the first $r - 1$ rounds, we introduce 3 intermediate variables to represent its three output bits. Hence, there are in total $3s(r - 1) - 3\lambda$ intermediate variables, denoted by $\mu = (\mu_1, \mu_2, \ldots, \mu_{3s(r-1)-3\lambda})$. In this way, the input state of each round is linear in $(\mu, K)$, as shown in Fig. 3.
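The overdefined system for the 3-bit S-box can be checked by a short computation. The sketch below assumes the standard LowMC S-box $(a, b, c) \mapsto (a \oplus bc,\ a \oplus b \oplus ac,\ a \oplus b \oplus c \oplus ab)$, which is not restated in this section, and uses hypothetical helper names. It counts the linearly independent equations of degree at most 2 that relate the 3 input bits and 3 output bits.

```python
from itertools import combinations, product

def lowmc_sbox(a, b, c):
    """3-bit LowMC S-box (assumed standard definition)."""
    return (a ^ (b & c), a ^ b ^ (a & c), a ^ b ^ c ^ (a & b))

def gf2_rank(rows):
    """Rank over GF(2) of a list of int bitmask rows."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot row
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def count_quadratic_relations():
    # The 22 monomials of degree <= 2 in the bits (x0, x1, x2, y0, y1, y2).
    monomials = [()] + [(i,) for i in range(6)] + list(combinations(range(6), 2))
    rows = []
    for x in product((0, 1), repeat=3):
        bits = x + lowmc_sbox(*x)
        row = 0
        for j, monomial in enumerate(monomials):
            if all(bits[i] for i in monomial):  # the empty monomial is the constant 1
                row |= 1 << j
        rows.append(row)
    # A relation is a coefficient vector vanishing on all 8 input/output pairs.
    return len(monomials) - gf2_rank(rows)

print(count_quadratic_relations())  # prints 14
```

The count of 14 independent quadratic equations per S-box is the per-S-box contribution that appears in the equation counts of the attack.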
Using the crossbred-like algorithm. We use the crossbred-like algorithm for this problem. Specifically, we first split $v$ into $z = (z_1, z_2, \ldots, z_{u_1}) = (v_1, v_2, \ldots, v_{u_1})$ and $y = (y_1, y_2, \ldots, y_{3(sr-\lambda)-u_1}) = (v_{u_1+1}, v_{u_1+2}, \ldots, v_{3(sr-\lambda)})$. Then, we find the minimal positive integer $\epsilon$ such that Then, we iterate over all values of $y$ with Gray code and compute the corresponding $z$ with Gaussian elimination, which will take bit operations. The total time complexity of our attacks on $r$ rounds of LowMC is thus estimated as $T_7$ bit operations. The memory complexity is negligible and is dominated by storing the quadratic equations. Hence, the memory complexity is estimated as $M_1$ bits. The values of $(\lambda, u_1, \epsilon, T_7, M_1)$ that optimize the attacks on LowMC with different parameters are given in Table 7.
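The benefit of enumerating $y$ in Gray-code order is that consecutive values differ in a single bit, so quantities that depend affinely on $y$ can be updated with one XOR per step. The sketch below illustrates only this enumeration idea on a hypothetical vector `base` and hypothetical per-bit update vectors `columns`, all stored as integer bitmasks; it is not the full crossbred-like solver.

```python
def gray_enumerate(base, columns):
    """Yield (y, value) for all y in {0,1}^m in Gray-code order.

    value = base XOR (XOR of columns[j] over the set bits j of y),
    maintained incrementally with a single XOR per step.
    """
    m = len(columns)
    y, value = 0, base
    yield y, value
    for i in range(1, 1 << m):
        j = (i & -i).bit_length() - 1  # index of the single bit that flips
        y ^= 1 << j
        value ^= columns[j]
        yield y, value

def direct_value(base, columns, y):
    """Recompute the same value from scratch, for comparison."""
    value = base
    for j, column in enumerate(columns):
        if (y >> j) & 1:
            value ^= column
    return value

columns = [0b1011, 0b0110, 0b1100]
for y, value in gray_enumerate(0b0001, columns):
    print(f"y={y:03b} value={value:04b}")
```

In the attack, the updated quantity would be the part of the linear system in $z$ that depends on the current $y$, after which Gaussian elimination is applied for each $y$.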

Experiments
We conduct 3 different experiments to verify the correctness of our attacks. The first experiment verifies our best attacks on 3-round LowMC. We choose the LowMC instance with the parameters $(n, k, r) = (129, 129, 3)$ as the target. The main concern in our best 3-round attack is whether the crossbred-like algorithm works correctly. Experiments show that it works as expected: after guessing $2 \times 43 + (43 - 15) = 114$ key bits, we can always reduce the problem to solving $15 + 9 = 24$ linear equations in the 15 variables $z$. Among 10000 random guesses, only 21 solutions of $z$ are suggested, resulting in a filtering probability of $0.0021 \approx 2^{-8.9}$, which is almost the same as the expected filtering probability $2^{-9}$. As already mentioned, checking a suggested solution is cheap, since it suffices to evaluate randomly picked quadratic equations not included in the 24 equations. For the correct guess, we find that the key can be correctly recovered.
The second experiment verifies the assumptions used in our attacks on 4-round LowMC. Specifically, for the chosen 4 different sets of equation systems, when the key bits are correctly guessed, how many sets will suggest the correct isolated solution (the correct full key)? Moreover, when the key bits are wrongly guessed, how many sets will suggest the same isolated solution? We choose the LowMC instance with the parameters $(n, s, r) = (129, 43, 4)$ as the target. For this target, we need to consider 4 different equation systems derived from 8 different S-boxes, where each equation system has 6 degree-4 equations in 5 variables and corresponds to the equations deduced from 2 S-boxes. We performed 20000 random guesses, considering the 4 equation systems simultaneously, i.e. 80000 different equation systems in total. We find that there is at most 1 solution for 72000 equation systems, indicating that the assumption for the isolated solution holds with probability about 0.9. In addition, among these 20000 random guesses, there are only 339 guesses such that more than 2 equation systems suggest the same unique solution, resulting in a filtering probability of about $339/20000 \approx 2^{-5.9}$. We also performed 20000 experiments in which the key bits are always correctly guessed. In about 16700 of them, more than 2 equation systems suggest the same unique solution (the correct key). All in all, the attack succeeds with probability about $16700/20000 \approx 0.83$, and the filtering probability is low enough to amortize the time complexity of checking the suggested solutions. Indeed, according to the experiments, if we check the solution only when more than 3 equation systems suggest the same isolated solution, the filtering probability becomes about $1/20000$ and the success probability becomes about 0.5.
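The empirical rates in the first two experiments follow directly from the reported counts. The short script below recomputes them on a $\log_2$ scale; only the counts come from the text, while the helper name and the dictionary labels are illustrative.

```python
from math import log2

def log2_rate(hits, trials):
    """log2 of the empirical rate hits / trials."""
    return log2(hits / trials)

# Counts reported for the 3-round and 4-round experiments.
reported = {
    "3-round filter (21 of 10000)": (21, 10000),
    "4-round filter, wrong guesses (339 of 20000)": (339, 20000),
}
for label, (hits, trials) in reported.items():
    print(f"{label}: 2^{log2_rate(hits, trials):.1f}")

isolated_rate = 72000 / 80000  # systems with at most one solution
success_rate = 16700 / 20000   # correct guesses confirmed by more than 2 systems
print(f"isolated-solution rate {isolated_rate:.2f}, success rate {success_rate:.3f}")
```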
The last experiment verifies the correctness of our attacks on LowMC with partial nonlinear layers. As the correctness of the crossbred-like algorithm has been verified in our 3-round attack, the main concern in this attack is whether we can obtain the expected number of linearly independent quadratic equations. We choose the LowMC instance with the parameters $(n, s, r) = (128, 1, 128)$ as the target. In this attack, we expect that after guessing $\lambda = 114$ bits, we can obtain $14sr - 11\lambda = 14 \times 128 - 11 \times 114 = 538$ linearly independent quadratic equations in $3(sr - \lambda) = 3 \times (128 - 114) = 42$ variables. Then, we solve these equations with the crossbred-like algorithm using the splitting parameter $u_1 = 32$, i.e. solving $u_1 + \epsilon = 32 + 10 = 42$ linear equations in 32 variables for each guess of the $42 - 32 = 10$ remaining variables. Experiments show that the 538 quadratic equations are always linearly independent and that we can always construct 42 linear equations in 32 variables for each guess of the 10 variables. For the correct guess, the key can be correctly recovered. Hence, the correctness of our attacks is verified.
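The parameter bookkeeping of this experiment can be reproduced with a few lines. The function name is hypothetical; the two counting formulas $14sr - 11\lambda$ and $3(sr - \lambda)$ are taken from the text above.

```python
def partial_layer_counts(s, r, lam, u1):
    """Return (quadratic equations, variables, variables left to guess)."""
    equations = 14 * s * r - 11 * lam
    variables = 3 * (s * r - lam)
    guessed = variables - u1
    return equations, variables, guessed

# Target instance (n, s, r) = (128, 1, 128) with lambda = 114 and u_1 = 32.
equations, variables, guessed = partial_layer_counts(s=1, r=128, lam=114, u1=32)
print(equations, variables, guessed)  # prints 538 42 10
```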

Conclusion
While linearizing one LowMC S-box by guessing only one quadratic polynomial intuitively seems efficient, we show that naively guessing two linear polynomials to achieve the linearization makes the attacks on LowMC with full S-box layers much better. The main advantage of this naive guess strategy comes from the large reduction in the number of unknowns, because guessing 1 linear polynomial directly reduces the number of unknowns by 1. Based on this new guess strategy, we improve the key-recovery attacks on 3 and 4 rounds of LowMC, achieving better time-memory tradeoffs than Dinur's algorithm.
Another contribution is to take advantage of the guessed quadratic polynomials and the overdefined system of quadratic equations for the LowMC S-box to devise more efficient attacks on LowMC with partial nonlinear layers. In this way, recovering the full key is reduced to solving a heavily overdefined system of quadratic equations with the crossbred-like algorithm.
In conclusion, we have significantly improved the attacks on LowMC by using better time-memory tradeoffs, and we expect this work to further advance the understanding of the security of LowMC in the Picnic setting.