Attacks on the Firekite Cipher

Firekite is a synchronous stream cipher using a pseudo-random number generator (PRNG) whose security is conjectured to rely on the hardness of the Learning Parity with Noise (LPN) problem. It is one of only a few LPN-based symmetric encryption schemes, and it can be implemented very efficiently on a low-end SoC FPGA. The designers, Bogos, Korolija, Locher and Vaudenay, demonstrated appealing properties of Firekite, such as requiring only one source of cryptographically strong bits, a small key size, high attainable throughput, and an estimate of the bit-level security depending on the selected practical parameters. We propose distinguishing and key-recovery attacks on Firekite by exploiting the structural properties of its PRNG. We adopt several birthday-paradox techniques to show that a particular sum of Firekite's outputs has a low Hamming weight with higher probability than in the random case. We achieve the best distinguishing attacks with complexities 2^66.75 and 2^106.75 for Firekite's parameters corresponding to 80-bit and 128-bit security, respectively. By applying the distinguishing attacks and an additional algorithm we describe, one can also recover the secret matrix used in the Firekite PRNG, which is built from the secret key bits. This key-recovery attack works on most large instances of Firekite parameters and has a slightly larger complexity, for instance, 2^69.87 for the 80-bit security parameters n = 16,384, m = 216, k = 216.


Introduction
Since Shor [Sho99] in his seminal work introduced quantum algorithms that efficiently break the discrete-log and factoring problems, researchers have set their sights on cryptographic alternatives that promise to be quantum-resistant, such as lattice-based or code-based cryptography. In particular, cryptographic primitives whose security relies on learning problems, such as Learning Parity with Noise (LPN), Learning with Errors (LWE), and the closely related Ring-LPN, are receiving great attention as they are built on supposedly hard problems. Moreover, Impagliazzo and Levin showed that cryptography is only possible if efficient learning is not [IL90]. Besides the absence of an efficient LPN-solving quantum algorithm, LPN-based constructions are desirable because they can be efficiently implemented using mainly XOR ('exclusive or') operations, thus achieving popularity in lightweight cryptography on constrained, low-powered devices. However, most LPN constructions lean towards asymmetric cryptography and come with their own disadvantages. These include the requirement to produce and extract randomness (cryptographically secure bits) from an entropy-limited source, causing a significant overhead cost [HDWH12,Sho99], and they also often require large public keys.
Bogos, Korolija, Locher and Vaudenay [BKLV21] proposed Firekite, a synchronous symmetric cipher, using an LPN-based PRNG which requires only one cryptographically strong bit vector to construct the secret matrix key. A small key size is attained by moving from an LPN problem to a Ring-LPN problem [HKL + 07]. Their study conjectures that the corresponding Ring-LPN instance remains hard to solve when using said matrix instead of a fully random matrix. They demonstrated that using the Firekite noise distribution for an LPN instance is still secure and there is a 'partial' transformation to an LPN instance. Using the best BKW-style algorithm proposed by Levieil and Fouque [LF06], Firekite's designers estimated the complexity to break the transformed LPN instances, thus derived concrete complexity results for attacking their cipher. The cipher's efficiency was tested in terms of the throughput, which is the number of bytes encrypted or decrypted per second using both desktop computers and FPGAs. They also showcased that, given dedicated hardware, the Firekite PRNG can be parallelized, hence throughput improved substantially for larger parameters.
One can draw many parallels between Firekite and the closely related LPN-C [GRS08b]; in particular, both involve computing a noisy product using a secret random matrix M and a random error vector e. However, LPN-C further requires an error correcting code C, and the error vectors are drawn from a Bernoulli distribution, as opposed to being of bounded weight as in the Firekite PRNG. This can make the decryption process fail once the error weight exceeds the code's error correcting capacity. This drawback could be amended by truncating the binomial distribution to make sure not too many bits are set in the error vectors. However, it is speculated that doing so may have a negative impact on the security of LPN-C [BKLV21]. Furthermore, LPN-C inherently requires a large random secret matrix and samples two uniformly random vectors for every invocation of the encryption algorithm. Hence, it becomes infeasible to implement efficiently in a constrained environment. Firekite, besides avoiding such undesirable features, surpasses LPN-C by not requiring fresh random bits for each output block.
Beyond constructing schemes that are potentially quantum secure, it is crucial to attack them with the most suitable approaches in order to better understand their security.

Contributions
In this work, we propose both distinguishing and key-recovery attacks for Firekite. We observe that the secret matrix is fixed throughout every round of encryption. Hence, if the vector components in the internal states collide to the zero codeword, the outputs of Firekite, when combined together appropriately, result in unusually low weight sums and can be detected. In other words, finding such occurrences amounts to solving a birthday paradox problem with a specific target weight.
We then consider the secret matrix as the generator for a code and by carefully determining which positions in the above combinations are free of errors, we describe a key-recovery attack with a slightly higher complexity than that of the distinguishing attack.
As an example, we apply the distinguishing attacks on the Firekite cipher with specific parameters that target 80-bit and 128-bit security, to better understand Firekite's security. In particular, we launch both a distinguishing attack and a key-recovery attack on the parameters n = 16,384, m = 216, k = 216 with complexity 2^68.87 and 2^69.97, respectively. As there are many choices of parameter sets for each security level, the complexity numbers vary somewhat depending on the selected parameter set.

Organization
The paper is organized as follows. Section 2 presents preliminary and background knowledge regarding the LPN problem and its variants such as Ring-LPN. A brief review of the LPN-based Firekite PRNG, and how it gives rise to the Firekite synchronous stream cipher, follows. We then describe our idea and formally analyze our attack on Firekite in Section 3. In Section 4, we attack different parameters proposed for Firekite and verify our approach by a simulation with smaller parameters. We describe our key-recovery attack in Section 5, and a discussion of how to improve Firekite finally concludes our work.

Background
Whereas the LPN problem usually finds its cryptographic applications in the public-key domain, we will be interested in its application in symmetric cryptography. In particular, we have seen constructions of a few synchronous stream ciphers [GRS08b, BKLV21] based on LPN.
A synchronous stream cipher is a symmetric cipher in which a stream of pseudorandom bits is generated independently of the plaintext and ciphertext messages, and then bitwise XOR-ed with the plaintext, to encrypt, or with the ciphertext, to decrypt. Cryptanalytic attacks aim either to distinguish the output of the pseudorandom bit generator from a random source, to recover the state of the pseudorandom generator, or to recover the key. As known plaintext for a segment of ciphertext implies knowledge of the keystream for the same segment, a known-plaintext attack on a synchronous stream cipher assumes that a large part of the keystream is available to an attacker, limited only by the maximum number of keystream bits allowed to be output under the same key. Distinguishing attacks [HJB09] on the (known) keystream are relevant to the security of stream ciphers as well: depending on the nonrandomness detected, some information on the plaintext may be leaked. For some stream ciphers, a distinguishing feature can even be elaborated into a key-recovery attack, as is the case for the distinguishing property we shall derive for Firekite.

The LPN problem
LPN is an important problem in cryptography. It appears as one of the main problems on which post-quantum cryptography is based. Due to the existence of fast algorithms for quantum computers that solve the factorization and discrete logarithm problems [Sho99], the LPN problem (and the related LWE problem), including its different versions, is of great interest. No fast quantum algorithm that solves the LPN problem is known. Although current omnipresent symmetric encryption schemes such as AES will likely not be rendered obsolete in the near future, studies in post-quantum cryptography, namely the aforementioned works, are of absolute necessity. We need post-quantum cryptographic primitives to have efficiency, confidence, and usability [Ber09].
Cryptographic constructions based on LPN are also appealing, since only simple operations such as bit-wise addition (XOR) and scalar products are used. This can give rise to efficient algorithms or protocols.
The LPN problem can informally be described as the problem of solving a noisy binary system of equations. We formally define it below.
Let Ber_η be the Bernoulli distribution with parameter η ∈ (0, 1/2), i.e., a bit e ← Ber_η satisfies Pr[e = 1] = η and Pr[e = 0] = 1 − η. An LPN oracle Π_LPN for a secret vector x of length m and noise level η returns pairs of the form (g, ⟨x, g⟩ + e), where g is drawn uniformly at random, e ← Ber_η, and ⟨x, g⟩ denotes the scalar product of the vectors x and g.

Definition 2 (LPN problem). Given an LPN oracle Π_LPN with parameters m and η, the (m, η)-LPN problem is to find the secret vector x. The problem is said to be (T, N, δ)-solvable if there exists an algorithm A that asks at most N oracle queries, runs in time at most T, and outputs x with probability at least δ.

The definition above is known as the search version of the LPN problem. In the decisional version of the LPN problem, the objective is to distinguish pairs from Π_LPN from uniformly random samples. The search and decisional versions are proved to be computationally equivalent [KS06].
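To make the oracle concrete, here is a minimal Python sketch of an (m, η)-LPN oracle. The function name lpn_oracle and the explicit rng argument are our own choices for illustration, not part of any reference implementation.

```python
import random

def lpn_oracle(x, eta, rng):
    """Return one LPN sample (g, <x, g> + e) with g uniform and e ~ Ber_eta."""
    g = [rng.randrange(2) for _ in range(len(x))]
    e = 1 if rng.random() < eta else 0
    return g, (sum(xi & gi for xi, gi in zip(x, g)) + e) % 2

# With eta = 0 the oracle returns exact parities of the secret.
rng = random.Random(1)
secret = [1, 0, 1, 1]
g, b = lpn_oracle(secret, 0.0, rng)
assert b == sum(s & gi for s, gi in zip(secret, g)) % 2
```

With η > 0, each returned bit is flipped with probability η, which is exactly what makes recovering x hard.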
We briefly look at a subclass of LPN problems called Ring-LPN, which proves to be useful in general and is specifically used in the Firekite PRNG. Let f be a polynomial over Z_2 and R = Z_2[x]/(f) denote the quotient ring. Hence R consists of all polynomials over Z_2 of degree less than that of f. We say r ← Ber_η^R if the coefficients of the ring element r ∈ R are assigned independently following the distribution Ber_η. If r is drawn uniformly from R, we write r ← U(R). The Ring-LPN problem can be defined similarly to the standard LPN problem.
A Ring-LPN oracle Π_Ring-LPN for a secret s ∈ R returns pairs of the form (r, r · s + e), where r ← U(R) and e ← Ber_η^R.

Definition 4 (Ring-LPN problem). Given a Ring-LPN oracle Π_Ring-LPN with parameter η and a polynomial ring R, the Ring-LPN problem is to find the secret polynomial s ∈ R. The problem is said to be (T, N, δ)-solvable if there exists an algorithm A that asks at most N oracle queries, runs in time at most T, and outputs s with probability at least δ.

It is worth pointing out the essential difference between LPN and Ring-LPN. If we query the LPN oracle N times, we can collect an m × N matrix G = [g_1^T . . . g_N^T] in which each column is generated independently. In the case of Ring-LPN, only one polynomial r is generated uniformly at random in R. If we identify a polynomial with its coefficient vector, only the first column r is drawn uniformly at random; the other columns are obtained by shifting r [HKL+12]. While the LPN problem has been shown to be NP-hard in the worst case [BMvT78], the hardness of Ring-LPN is not known. However, there is a reduction from Ring-LPN to LPN, and the assumption is that Ring-LPN is also hard.
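The shifting structure of Ring-LPN samples can be illustrated directly: in the ring R = Z_2[x]/(X^b − 1), multiplication by X^i cyclically shifts the coefficient vector, so all columns of a Ring-LPN sample matrix are shifts of a single random vector. A small sketch (the helper names are ours):

```python
def cyclic_shift(vec, i):
    """Coefficient vector of r * X^i in Z2[x]/(X^b - 1): a cyclic shift by i."""
    b = len(vec)
    return [vec[(j - i) % b] for j in range(b)]

def ring_mult(r, s):
    """Schoolbook product of two ring elements given as coefficient vectors."""
    out = [0] * len(r)
    for i, ri in enumerate(r):
        if ri:
            out = [(o + sj) % 2 for o, sj in zip(out, cyclic_shift(s, i))]
    return out

# X * X^3 = X^4 = 1 in Z2[x]/(X^4 - 1).
assert ring_mult([0, 1, 0, 0], [0, 0, 0, 1]) == [1, 0, 0, 0]
```

This is why one random b-bit vector suffices to describe the whole sample matrix, which is the key to Firekite's small key size.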

Firekite's PRNG and Firekite construction
We recall that the decisional version of the LPN assumption can be interpreted as follows: one cannot efficiently distinguish an LPN oracle from a source providing random bit vectors of length m + 1. Naturally, it can be extended into stating that it is hard to distinguish a noisy product of an m × n matrix M and a secret vector v, i.e., vM + e, from a random n-bit vector over Z_2, where e is an n-bit noise vector. As an example, LPN-C further uses a [k, n] error correcting code C with a generator matrix G to encode a plaintext x into a ciphertext c through c = xG + vM + e.
However, this construction inherently asks the source to produce random v and e for encrypting a single plaintext. The Firekite PRNG circumvents this problem by extracting both v and e from the noisy product and feeding them iteratively into the next encryption invocations. Out of n bits, one can spare m + k · log n bits to initialize the next round of Firekite. Let || denote the usual concatenation of vectors. We write

vM + e = (g || v′ || c_e).   (1)

Assuming n ≫ m, one can split the noisy product into three components as in (1); then m bits are used for producing the next vector v′. Since e is only required to be a sparse n-bit vector, we can have a compact representation of the next noise vector, called c_e. The remaining bits, forming g, are the PRNG's output.

We are now in the position to describe the Firekite PRNG formally. Let m, n, and k be integer parameters, where n ≫ m and n is a power of 2. A secret key M is a binary matrix of size m × n, and w is a vector of length m + k log n < n. Together they form a pair (M, w), the state of the PRNG. We define w = v || c_e, where v and c_e are of length m and k log n, respectively. As stated above, M is fixed and w is updated in every iteration. It is straightforward to assign v = v′. To get the next error vector e, we further parse c_e = c_1 || c_2 || . . . || c_k, where each c_i is of length log n. Hence, each c_i can be seen as the binary representation of a non-negative integer less than n, and c_e therefore encodes an n-bit error vector of weight at most k. In particular, let b_{c_j} be the unit vector of length n whose bit at the position represented by c_j is 1. Then the error vector e is defined as e = Σ_{j=1}^{k} b_{c_j}. Note that this construction implies that e is not a Bernoulli-distributed error. The execution of the Firekite PRNG is described by Algorithm 1.
At each iteration, the PRNG's input is its state (M, w), where the first m bits and the remaining k log n bits of w are set to be v and c e respectively. Then the error vector e is derived from its concise representation c e and the noisy n-bit product is computed as vM + e. This vector is again parsed into g and w ′ of length d = n − m − k log n and m + k log n, respectively. The internal state is then updated to (M, w ′ ) and g is the output of the PRNG. The number r of randomization rounds is needed to guarantee that v is free from significant biases when Firekite begins to output its keystream [BKLV21].
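A toy sketch of one PRNG iteration, following the state update described above, may help fix the parsing of the state. The function name and the tiny parameters are hypothetical, chosen only for illustration; a real implementation would operate on packed words.

```python
import math
import random

def firekite_step(M, w, n, m, k):
    """One iteration of the Firekite PRNG state update (toy sketch).

    M: m x n binary matrix (list of rows); w = v || c_e, of length m + k*log2(n).
    Returns (keystream block g, next state vector w')."""
    logn = int(math.log2(n))
    v, c_e = w[:m], w[m:]
    # Rebuild the sparse error vector e from its compact encoding c_e.
    e = [0] * n
    for j in range(k):
        pos_bits = c_e[j * logn:(j + 1) * logn]
        pos = int("".join(map(str, pos_bits)), 2)
        e[pos] ^= 1  # e is the XOR of unit vectors b_{c_j}
    # Noisy product vM + e over Z2, then split as (g || v' || c_e').
    prod = [(sum(v[i] & M[i][j] for i in range(m)) + e[j]) % 2 for j in range(n)]
    d = n - m - k * logn
    return prod[:d], prod[d:]

rng = random.Random(7)
n, m, k = 16, 4, 2
M = [[rng.randrange(2) for _ in range(n)] for _ in range(m)]
w = [rng.randrange(2) for _ in range(m + k * int(math.log2(n)))]
g, w_next = firekite_step(M, w, n, m, k)
assert len(g) == n - m - k * 4 and len(w_next) == m + k * 4
```

Note that the update is deterministic given (M, w), which is exactly why the fixed matrix M lets non-randomness accumulate across outputs.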
Firekite is a synchronous stream cipher that uses this PRNG to produce the d-bit keystream g directly. Therefore, in each invocation d bits of plaintext are encrypted, and the next output of Firekite depends on the updated internal state. The designers pointed out that, for practicality, the parameters m, n, and k need to be large, which in turn makes the secret key M big. To solve this problem, they proposed the following technique, which turns the LPN instance into a Ring-LPN instance. Consider R = Z_2[X]/(X^b − 1), i.e., the polynomial ring with binary coefficients reduced modulo X^b − 1. Multiplication by X^{i−1} in R corresponds to a cyclic shift, meaning that row i is obtained by shifting the entries of the coefficient vector q_1 by i − 1 positions. Hence, we can construct a b × b matrix Q by shifting the first row to the left consecutively b − 1 times. The secret matrix M is obtained by generating the first m rows and then dropping the last b − n columns of Q. Therefore, the secret key of the Firekite PRNG is, in fact, the random b-bit vector q_1 rather than an m × n matrix M. The designers conjectured that using such an M does not substantially reduce the security compared to a fully random matrix M. Table 1 shows a few sets of suggested parameters for Firekite corresponding to the 80- and 128-bit security levels. Other proposed parameter sets can be found in [BKLV21].
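The circulant key expansion can be sketched in a few lines. We assume left cyclic shifts as described above; the function name is ours, and the shift direction should be checked against the reference specification.

```python
def firekite_key_matrix(q1, m, n):
    """Build the m x n secret matrix M from the b-bit key vector q1 (sketch).

    Row i of the b x b circulant Q is q1 shifted left by i positions;
    M keeps the first m rows and drops the last b - n columns."""
    b = len(q1)
    assert m <= b and n <= b
    return [[q1[(j + i) % b] for j in range(n)] for i in range(m)]

# Each row is a left shift of the previous one.
M = firekite_key_matrix([1, 0, 0, 0], 3, 4)
assert M[0] == [1, 0, 0, 0]
assert M[1] == [0, 0, 0, 1]
```

The whole m × n matrix is thus determined by the single b-bit vector q1, so only b cryptographically strong bits are ever needed.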
To derive an estimate of the concrete security of Firekite, one faces two problems: first, the noise vectors from Firekite have weight at most k and the noise distribution is not binomial, as opposed to a standard LPN instance. Second, an adversary only sees part of the noisy product. Therefore, it is necessary to argue that using the Firekite noise distribution for an LPN variant is still hard, and that the underlying problem of solving Firekite is as hard as LPN.
The first problem is solved as follows. Let ∆(e) denote the Hamming weight of an n-bit vector e and e_j the j-th bit of e. If e comes from Firekite and each c_e is assumed to be uniformly distributed, then Pr[e_j = 0] = ((n − 1)/n)^k. Therefore, the expected Hamming weight of the Firekite noise (denoted by ∆_Firekite(e)) is E[∆_Firekite(e)] = n(1 − ((n − 1)/n)^k), and one can show that E[∆_Firekite(e)] ≈ k when k ≪ n. In a standard LPN problem with parameters η and m, E[∆_LPN(e)] = ηm and Pr[∆_LPN(e) = ⌊E[∆_LPN(e)]⌋] ∈ Ω(1/n). Therefore, given such an LPN instance, we set k such that ηm ≤ k, e.g., k := (3/2)ηm. Then the noise of this LPN instance could come from the Firekite noise distribution with probability at least Ω(1/n). In other words, if the LPN instance with the Firekite noise distribution can be broken efficiently, any standard LPN instance can also be broken with O(n) more work.
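The expectation is easy to evaluate numerically. The sketch below uses the formula E[∆_Firekite(e)] = n(1 − ((n − 1)/n)^k) implied by Pr[e_j = 0] = ((n − 1)/n)^k above, an approximation that ignores cancellation of repeated positions; the helper name is ours.

```python
def expected_firekite_weight(n, k):
    """Expected weight of the Firekite noise vector: each of the n bits is set
    with probability 1 - ((n-1)/n)^k when k positions are drawn uniformly."""
    return n * (1 - ((n - 1) / n) ** k)

# For k << n the expectation is close to k, since few drawn positions repeat.
w = expected_firekite_weight(1024, 16)
assert 15.8 < w < 16.0
```

This confirms the intuition that the Firekite noise behaves like an LPN noise of weight roughly k as long as k ≪ n.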
As for the underlying problem of solving Firekite, Firekite's designers were able to show that it is at most as hard as the LPN problem, and they also conjectured that the reverse is also true [BKLV21]. Using this transformation to attack Firekite with the most efficient LPN-solving algorithm, namely the one by Levieil and Fouque [LF06], they were able to derive the concrete proposed parameters for the different security levels.

The problem of observing noisy codewords from an unknown code
The task of partially recovering the secret matrix M (by observing vectors g_i) can be seen as identifying an unknown code by observing noisy codewords. The problem arises in different contexts [MGB12], especially in analyzing cryptosystems where encryption involves error-correcting codes and the transmission is carried over a noisy channel (e.g., a binary symmetric channel). General approaches consist of three steps: first, arranging the noisy codewords as rows of a matrix, then running Gaussian elimination, and finally, from the non-echelon part, finding sums of vectors that are candidates for constructing dual codewords (i.e., parity-check equations). Instead of looking only at columns that sum to 0, Sicot, Houcke and Barbier argued that sparse sums of columns can also be candidates for dual codewords [BSH06,SHB09]. Therefore, the last step can be reduced to an instance of the well-known close-neighbors search problem. Besides the projection method proposed by Cluzeau and Finiasz [CF09], which aimed to find sparse sums of p columns (with complexity of order Ω(n^{p/2}) when p is even) that are equal in some positions using the birthday paradox and hash tables, there have been many improvements and extensive studies of the close-neighbors search problem recently. One is the Dubiner method, which was later applied by Carrier and Tillich in their generalized approach [CT19]. Their algorithm performs only a partial Gaussian elimination in the second step. The argument is that Gaussian elimination increases the noise by combining noisy codewords; hence it is more likely to obtain sparse sums in the early stage of the Gaussian elimination, minimizing the number of dual codewords that might go undetected by the Sicot-Houcke-Barbier algorithm [CT19]. Moreover, it also allowed them to find dual codewords of much larger weight (compared to the full Gaussian elimination) with reasonable complexities.
In practice, the recovery of an unknown code from noisy codewords concerns useful families of codes, such as cyclic codes, convolutional codes, turbo codes, or the ubiquitous LDPC codes. This is important as finding low-weight dual codewords is essential in determining communication components such as an unknown interleaver [BSH06,Tix15] or in reconstructing other families of codes.
In the next section we introduce a new method, namely finding a small number of noisy codewords summing to the zero codeword through a generalized birthday-type algorithm.

The proposed distinguishing algorithm
In this section, we give a brief description of the idea used in our distinguishing attacks on Firekite. We first observe that the secret key matrix M is fixed throughout the rounds of Firekite; hence, the keystream output by the Firekite PRNG is subject to accumulating non-randomness. Let us look at the Firekite PRNG, fulfilling

vM + e = (g || v′ || c_e),

where v′ and c_e are used in the next iteration by assigning v = v′ and e = Σ_{j=1}^{k} b_{c_j}, and g is the PRNG's output. In the initial part of the attack, we concentrate on the knowledge of

g = vM′ + e′,

where g is a known d-bit vector, M′ is now considered as an m × d secret binary matrix (obtained from the first d columns of the original matrix M), and e′ is a secret d-bit noise vector consisting of the first d positions of e. It is known that ∆(e) ≤ k (which is small); hence, the weight of e′ is also small. The expected weight of e′ is denoted by k̄, where k̄ = k · d/n, since the ones in e are assumed to be uniformly distributed, so a fraction d/n of them fall among the first d positions.
In a synchronous stream cipher attack, we assume that an adversary has access to a long output stream, which means access to a large number of d-bit vectors g. The set of these vectors is written as {g_i}, i = 1, . . . , S, for some S to be addressed in the following subsections. We first sketch the ideas behind our distinguishing attack, i.e., given such a set of vectors, decide whether they originate from Firekite or are random vectors.
Our goal is to find a subset of vectors g_{i_j}, j = 1, . . . , ℓ, such that the corresponding v_{i_j} satisfy Σ_{j=1}^{ℓ} v_{i_j} = 0, i.e., we find a set of noisy codewords such that the underlying information vectors sum to zero. If ℓ vectors v_{i_j}, j = 1, . . . , ℓ, sum to zero, then the sum of the corresponding g_{i_j} is expected to be of weight c_ω = ⌈ℓ · k̄⌉, with nonzero contributions coming only from the errors e′_{i_j}. Indeed, we then have

Σ_{j=1}^{ℓ} g_{i_j} = (Σ_{j=1}^{ℓ} v_{i_j}) M′ + Σ_{j=1}^{ℓ} e′_{i_j} = Σ_{j=1}^{ℓ} e′_{i_j}.

Therefore, when ℓ is not too large, e.g., ℓ = 4 or ℓ = 8, the expected weight of Σ_{j=1}^{ℓ} g_{i_j} will be low if Σ_{j=1}^{ℓ} v_{i_j} = 0. Since d is much larger than c_ω (for the proposed Firekite parameters), such a weight is very unlikely if the vectors g_{i_j} are random. In the Firekite PRNG, such a collision of vectors of length m (occurring with probability proportional to 2^{−m}) guarantees a low-weight vector of length d, so we can detect such occurrences more frequently than expected in the random case.
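The cancellation argument can be checked on a toy instance: when the v-parts sum to zero, the sum of the outputs is only the sum of the sparse errors and hence has low weight. All names and sizes below are illustrative, not the proposed Firekite parameters.

```python
import random

rng = random.Random(3)
m, d, k_bar = 8, 64, 2  # toy sizes; k_bar plays the role of the expected error weight

def noisy_output(v, M):
    """g = v M' + e' over Z2, with a sparse error of weight at most k_bar."""
    g = [sum(vi & row[j] for vi, row in zip(v, M)) % 2 for j in range(d)]
    for _ in range(k_bar):
        g[rng.randrange(d)] ^= 1
    return g

M = [[rng.randrange(2) for _ in range(d)] for _ in range(m)]
v1, v2, v3 = ([rng.randrange(2) for _ in range(m)] for _ in range(3))
v4 = [(a + b + c) % 2 for a, b, c in zip(v1, v2, v3)]  # v1+v2+v3+v4 = 0
gs = [noisy_output(v, M) for v in (v1, v2, v3, v4)]
s = [sum(col) % 2 for col in zip(*gs)]
# The v's cancel, so the sum is the sum of four sparse errors: weight <= 4*k_bar.
assert sum(s) <= 4 * k_bar
```

For four random 64-bit vectors the expected weight of the sum would instead be about d/2 = 32, which is what makes the low-weight event detectable.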

A basic algorithm for finding noisy codewords summing to the zero codeword
Recall that we want to find ℓ different vectors g_{i_j}, j = 1, . . . , ℓ, such that the associated unknown vectors v_{i_j} sum to zero. Our approach builds on ideas from the generalized birthday attack [Wag02] and the BKW algorithm [BKW03]. A different but related approach is the Match-and-Filter algorithm of Both and May [BM17].
In a simplified description following [Wag02], we set up ℓ (where ℓ = 2^t is a power of 2) lists of size 2^c filled with g_i vectors. We then combine the lists pairwise, resulting in new lists containing vectors created as sums of two vectors, one from each initial list, such that some c predetermined positions are all zero. The expected number of vectors in each new list is 2^c. After the first step we have ℓ/2 lists. We then perform the same procedure again, reducing another c positions to zero, until one single list remains, i.e., after t steps.
In the remaining list, we finally examine whether there are vectors whose sum Σ_{j=1}^{ℓ} g_{i_j} is a candidate to satisfy Σ_{j=1}^{ℓ} v_{i_j} = 0. In fact, they are quite easily detected, since if this is the case then Σ_{j=1}^{ℓ} g_{i_j} = Σ_{j=1}^{ℓ} e′_{i_j}, which has very low weight. As in the BKW algorithm framework, one may use the same list for all g_i vectors and increase the list size to roughly 3 · 2^c. Starting with a list L^(0), we can write a sequence of updated lists L^(0) → L^(1) → L^(2) → · · · → L^(t), where in each step we reduce another c positions. This means that L^(i) contains vectors whose first i · c positions are all zero. On average, three vectors collide in the given c positions; therefore, we can form three combinations of such vectors, and the size of L^(i) can be kept constant (hence the motivation for the factor 3). We formally describe this approach in Algorithm 2.
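A compact sketch of the Combine step with hash buckets follows. It omits the cap of 3 · 2^c on the list size that Algorithm 2 maintains; the function name and toy sizes are ours.

```python
from collections import defaultdict
import random

def combine(lst, lo, c):
    """One BKW/Wagner-style merge: bucket vectors on bits [lo, lo+c) and
    XOR pairs within a bucket, so each result is zero on the first lo+c bits."""
    buckets = defaultdict(list)
    for vec in lst:
        buckets[tuple(vec[lo:lo + c])].append(vec)
    out = []
    for group in buckets.values():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                out.append([(a + b) % 2 for a, b in zip(group[i], group[j])])
    return out

rng = random.Random(5)
d, c = 24, 4
L0 = [[rng.randrange(2) for _ in range(d)] for _ in range(60)]
L1 = combine(L0, 0, c)          # zero in positions [0, c)
L2 = combine(L1, c, c)          # zero in positions [0, 2c)
assert all(v[:2 * c] == [0] * (2 * c) for v in L2)
```

After t such merges, every surviving vector is a sum of ℓ = 2^t original vectors that is zero in the first t · c positions, exactly as required before the final low-weight filter.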
We need to consider the complexity and memory of the algorithm. Let the computational complexity measured in simple operations be denoted C and the used memory in bits be denoted Mem. The main parts are the L^(i) = Combine(L^(i−1)) steps in the loop. We assume that the vectors in the list L^(i−1) are organized in a hash table. The first (i − 1) · c positions are all zero in all vectors in L^(i−1), and the vectors are sorted into different buckets in the hash table according to the value of the next c positions, i.e., positions (i − 1) · c to i · c − 1, for i = 1, . . . , t. The Combine step creates new vectors for the new list L^(i) by adding together all possible pairs stored in the same bucket. This cancels another c positions, so that vectors in L^(i) start with i · c zeros. New vectors are created until the list L^(i) has cardinality 3 · 2^c, and the sorting procedure is repeated for the next iteration. The complexity of one Combine step is then 3 · 2^c bit-wise additions of vectors of length at most d, plus storing the results in memory. We adopt the Firekite designers' notation by letting p be the word length of a bit-wise addition operation, i.e., the number of bits for which an XOR operation can be computed at once, and we write the cost of one d-bit XOR operation as (1 + ⌊d/p⌋). This procedure is repeated t times in Algorithm 2. The final check for low-weight vectors does not need to go through all buckets, only those with low weight (for instance, one can sort the vectors in L^(t) by their next c positions); this cost is much smaller than that of the previous steps and can be disregarded. The complexity can thus be estimated as

C = t · 3 · 2^c · (1 + ⌊d/p⌋).

The required memory is the storage of two lists, altogether at most Mem = 2 · 3 · 2^c · d bits. In the next subsection, we investigate the success probability of the distinguisher.
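The cost estimate is straightforward to tabulate. The helper below simply evaluates t Combine steps of 3 · 2^c XORs on d-bit vectors split into p-bit words, and the two-list memory footprint; it is an illustrative helper of our own, not code from the paper.

```python
import math

def algorithm2_cost(t, c, d, p=64):
    """Estimated operation count C and memory Mem (in bits) of Algorithm 2:
    C = t * 3 * 2^c * (1 + floor(d/p)),  Mem = 2 * 3 * 2^c * d."""
    C = t * 3 * 2 ** c * (1 + d // p)
    mem = 2 * 3 * 2 ** c * d
    return C, mem

C, mem = algorithm2_cost(t=2, c=20, d=1024)
assert math.log2(C) < 30
```

In practice one would report log2(C) and log2(Mem) and compare them against the claimed security level of the targeted parameter set.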

Parameter choices and the success probability of the proposed algorithm

Algorithmic steps
Since the added noise in the expression Σ_{j=1}^{ℓ} e′_{i_j} becomes significant as ℓ grows large, a low-weight sum from Firekite becomes hard to distinguish from the random case as ℓ grows. We hence fix the number of algorithmic steps t to 2 or 3, corresponding to ℓ = 4 and ℓ = 8 in the proposed algorithm, respectively.

The required Firekite output observations
A vector formed as Σ_{j=1}^{ℓ} e′_{i_j} will be called a zero sum vector. Furthermore, considering a sum of error vectors, e.g., Σ_{j=1}^{ℓ} e′_{i_j}, we say that a position is error free mod 2 if Σ_{j=1}^{ℓ} e′_{i_j} is zero in that position; we say that a position is simply error free if all e′_{i_j} are zero in that position. It can happen that a double error event occurs, i.e., two ones in the same position and 1 + 1 = 0. The number of required Firekite output observations (i.e., |L^(0)| = 3 · 2^c) has to be chosen such that zero sum vectors can be found after Algorithm 2. Moreover, they must be error free mod 2 in the first t · c positions. This probability is denoted P_nf (noise-free), and we investigate it for both cases ℓ = 4 and ℓ = 8.
The case ℓ = 4. Starting with ℓ = 4, we are interested in knowing whether a zero sum vector can be found in the final list. The expected number of zero sum vectors in the final list is denoted by N. We have

N = (3 · 2^c choose 4) · 3 · 2^{−m−c} · P_nf

such zero sum vectors, which can be roughly explained as follows: there are (3 · 2^c choose 4) possible 4-combinations from the initial list. Among all such 4-sums, only a fraction 2^{−m} will correspond to a zero sum in the v_{i_j}, j = 1, . . . , 4. Then, there are 3 ways to choose 2 pairs as in Algorithm 2. Considering two particular pairs {g_{i_1}, g_{i_2}} and {g_{i_3}, g_{i_4}} summing to a zero sum, we further condition on g_{i_1} and g_{i_2} canceling in the first c bits, which happens with probability 2^{−c} (the other pair then follows automatically). Finally, we assume that e′_{i_1} + e′_{i_2} + e′_{i_3} + e′_{i_4} is zero in the first 2c positions, i.e., error free mod 2. P_nf can be lower-bounded by the probability that the first 2c positions are error free. Each e′_{i_j} has at most k bits set, uniformly distributed among n positions, so the probability that one error vector is error free in the first 2c positions is roughly ((n − 2c)/n)^k. Therefore, we have P_nf ≥ ((n − 2c)/n)^{4k}.

Lemma 1. When ℓ = 4, we expect to have

N = (3 · 2^c choose 4) · 3 · 2^{−m−c} · P_nf

zero sum vectors in the final list in Algorithm 2.
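The ℓ = 4 estimate, with P_nf replaced by its error-free lower bound ((n − 2c)/n)^{4k}, can be evaluated numerically. The helper name is ours; for very large m + c the floating-point factor 2^{−m−c} may underflow, so this is meant only for quick estimates.

```python
import math

def expected_zero_sums_l4(c, m, n, k):
    """N = C(3*2^c, 4) * 3 * 2^(-m-c) * P_nf, with P_nf lower-bounded by the
    probability that all four error vectors are error free in the first 2c bits."""
    p_nf = ((n - 2 * c) / n) ** (4 * k)
    return math.comb(3 * 2 ** c, 4) * 3 * 2 ** (-m - c) * p_nf
```

Raising c increases the number of available 4-combinations much faster than the 2^{−c} penalty, so N grows quickly with c until the error-free bound starts to bite.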
The case ℓ = 8. Next, we investigate ℓ = 8. Similarly to the case ℓ = 4, we have

N = (3 · 2^c choose 8) · 105 · 2^{−m−4c} · P_nf.

The explanation is again as follows: the number of different sums of 8 vectors that can be constructed is (3 · 2^c choose 8). Among them, we expect a fraction of 2^{−m} to sum to zero. There are 7 · 5 · 3 = 105 ways to form 4 pairs out of 8 vectors. Consider the particular pairing {g_{i_1}, g_{i_2}}, {g_{i_3}, g_{i_4}}, {g_{i_5}, g_{i_6}}, and {g_{i_7}, g_{i_8}}. A sum constructed from this pairing will be in the final list of Algorithm 2 if g_{i_1} + g_{i_2}, g_{i_3} + g_{i_4} and g_{i_5} + g_{i_6} are all zero in the first c positions; then g_{i_7} + g_{i_8} is automatically zero in the first c positions as well. The probability of this event for each choice of fixed indices is 2^{−3c}. Similarly to the 4-sum, (g_{i_1} + g_{i_2}) + (g_{i_3} + g_{i_4}) must then sum to zero in the next c positions, with probability 2^{−c}. Finally, we also need the sum of error vectors to be error free mod 2 in the first 3c positions.

As before, P_nf can be lower-bounded by the probability that no errors occur in the first 3c positions. The probability of such a distribution for a single error vector is roughly ((n − 3c)/n)^k, and for all eight of them we have P_nf ≥ ((n − 3c)/n)^{8k}.

Lemma 2. When ℓ = 8, we expect to have

N = (3 · 2^c choose 8) · 105 · 2^{−m−4c} · P_nf

zero sum vectors in the final list in Algorithm 2.
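Analogously, the ℓ = 8 estimate with the error-free bound on P_nf can be evaluated numerically (again an illustrative helper of our own, subject to the same floating-point caveats for very large m).

```python
import math

def expected_zero_sums_l8(c, m, n, k):
    """N = C(3*2^c, 8) * 105 * 2^(-m-4c) * P_nf, with P_nf lower-bounded by the
    probability that all eight error vectors are error free in the first 3c bits."""
    p_nf = ((n - 3 * c) / n) ** (8 * k)
    return math.comb(3 * 2 ** c, 8) * 105 * 2 ** (-m - 4 * c) * p_nf
```

Comparing the two estimates for the same c shows why the error-free bound is much more pessimistic for ℓ = 8: the exponent of the single-vector factor doubles from 4k to 8k while the span grows from 2c to 3c.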
For ℓ = 8 there are more errors in general, meaning that P_nf is much smaller compared to ℓ = 4. This gives a stronger motivation for examining other error patterns, such as double errors canceling out. In particular, the sums from our algorithm can have 1 + 1 = 0 in the first 3c bits. More specifically, if two error vectors have a one in the same position, their combination still survives the Combine step in Algorithm 2. For some parameters proposed by Firekite's designers, certain double error events are even more likely than having no error at all in the first 3c positions, and thus should not be neglected. Since the error vectors are sparse (e.g., k = 16 ≪ n = 1024), if a double error occurs at a position, it most likely happens only once, i.e., coming from one pair of g_{i_j} (or, equivalently, e′_{i_j}). Having four ones in the same position is exceedingly rare for the parameters of interest (see Appendix, Example 1). Therefore, we can obtain a lower bound on P_nf by considering only non-repeating double errors. Let us look at the simple case where errors from Firekite have exactly k bits set. (The expected weight of errors from Firekite can be smaller than k; hence using binomial expressions, while not entirely correct, gives a good approximation.) Assume we have ϵ ≤ k double errors, with probability denoted by P_ϵ. Then P_ϵ equals the sum over all possible error patterns/combinations of the g_i vectors that result in ϵ collisions. One writes ϵ = Σ_{i,j>i} ϵ_{ij},
where $\epsilon_{ij}$ denotes the number of double errors between $g_i$ and $g_j$. The total number of errors in the first 3·c positions of $g_i$ is $\epsilon_i = \sum_j \epsilon_{ij}$. Hence

$$P_\epsilon = \sum_{\{(\epsilon_{ij})\}} P_{\epsilon,\{(\epsilon_{ij})\}},$$

where $\{(\epsilon_{ij})\}$ runs over the eligible error colliding patterns of the $g_i$ vectors, with corresponding probability $P_{\epsilon,\{(\epsilon_{ij})\}}$.

Proof. Assume ϵ double errors and a fixed error colliding pattern $\{(\epsilon_{ij})\}$. Without loss of generality, we further assume that $\epsilon_i \ge \epsilon_j$ for i < j, i.e., $g_1$ has the most errors in the first 3c bits. The probability of $g_i$ having $\epsilon_i$ errors in the first 3c bits is $\binom{k}{\epsilon_i}\left(\frac{3c}{n}\right)^{\epsilon_i}\left(\frac{n-3c}{n}\right)^{k-\epsilon_i}$. We also require $g_2$ to have $\epsilon_{12}$ colliding positions out of its $\epsilon_2$ errors, which contributes a factor $\binom{\epsilon_1}{\epsilon_{12}}$. Similarly, for vector $g_3$, the colliding probability involves the factor $\binom{\epsilon_1}{\epsilon_{13}}$. Generalizing to $g_i$, the lemma follows.
However, it is not practical to take all possible double error events into account. For instance, if the expected number of 1's in the first t·c positions of each error vector is small, e.g., fewer than 2, multiple double error events occur with rapidly decreasing probability (from a certain point on), and they do not contribute substantially to our estimate (Appendix, Example 2). Moreover, an improvement in estimating $P_{nf}$ only means that we need less input for Algorithm 2: the larger $P_{nf}$ is, the smaller c needs to be to satisfy (5). A reasonable approximation suffices for us to deduce the necessary initial list size $|L^{(0)}|$ so that the expected number of zero sums is N > 1.
To illustrate this argument, we consider the case where the expected Hamming weight of the error vectors $e_i$ in the first 3·c positions is slightly larger than 2 (ℓ = 8). Hence, we focus only on the scenarios of up to 2 double errors in Figure 3.
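The size of these double-error effects can be checked empirically. The Monte Carlo sketch below is an illustration of ours, using the toy parameters n = 256, k = 4 from the later simulation section with ℓ = 8 and a window of 3c = 42 positions: it estimates $P_{nf}$ directly as the probability that the XOR of eight k-sparse error vectors vanishes in the window. Double errors cancel mod 2, so the estimate exceeds the no-error probability $P_0 \approx 2^{-8.3}$ and should land near $P_0 + P_1 + P_2 \approx 2^{-7.67}$.

```python
import random

def estimate_pnf(n=256, k=4, window=42, ell=8, trials=100_000, seed=7):
    """Monte Carlo estimate of P_nf: the XOR of ell error vectors,
    each with exactly k ones, is zero in the first `window` positions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        parity = 0  # bitmask of window positions with an odd error count
        for _ in range(ell):
            for pos in rng.sample(range(n), k):  # k distinct error positions
                if pos < window:
                    parity ^= 1 << pos
        hits += (parity == 0)
    return hits / trials

p_nf = estimate_pnf()  # roughly 2^-7.6
```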
Algorithm 2 draws its inspiration from Wagner's ℓ-tree algorithm [Wag02], namely, by consecutively canceling out c bits. Wagner argued that one needs lists of size $O(2^{m/(1+\log \ell)})$ to find a solution to the exact ℓ-list birthday problem. In our algorithm, we need slightly more, i.e., $O(2^{m/(1+\log \ell)+a})$, where a depends on $P_{nf}$. Note that $P_{nf}$ remains roughly the same as long as $c \approx m/(1+\log \ell)$. Therefore, we initially set $c = \lfloor m/(1+\log \ell) \rfloor$, then raise c until we get N > 1. Finally, we verify N > 1 again with $P_{nf}$ estimated using this c.
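The Combine step can be sketched as follows (our own minimal Python illustration, with vectors represented as integers whose low-order bits play the role of the 'first' positions; the real Algorithm 2 additionally trims each list back to $3\cdot 2^c$).

```python
import random
from collections import defaultdict

def combine(L, lo, hi):
    """One Combine step: bucket vectors by bits [lo, hi) and XOR every
    pair within a bucket, so all outputs are zero on those bits."""
    mask = (1 << (hi - lo)) - 1
    buckets = defaultdict(list)
    for v in L:
        buckets[(v >> lo) & mask].append(v)
    out = []
    for vs in buckets.values():
        for i in range(len(vs)):
            for j in range(i + 1, len(vs)):
                out.append(vs[i] ^ vs[j])
    return out

rng = random.Random(3)
c = 4
L0 = [rng.getrandbits(20) for _ in range(3 * 2**c)]  # toy m = 20 bits
L1 = combine(L0, 0, c)        # 2-sums, zero in positions [0, c)
L2 = combine(L1, c, 2 * c)    # 4-sums, zero in positions [0, 2c)
```

After two levels, every vector in `L2` is zero in its first 2c positions, which is exactly the cancellation pattern exploited in the analysis above.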

The success probability
Previously, we have seen that if we choose the parameters suitably, we can detect zero sum vectors in the final list. We now need to check whether low weight sums can also stem from random vectors; in other words, the zero sums must be easily distinguishable from those arising in the random case.
Assume Algorithm 2 outputs 'Firekite' in the random case. This means that it has found a vector in the final list of weight at most $c_\omega$. It is thus of interest to derive the likelihood of such a vector in the random case. Recall that ∆(g) denotes the Hamming weight of a binary vector g, and let $g_{[i]}$ be the i-bit truncation of g (its first i positions). A vector in the final list has its first t·c positions all zero, but the remaining d − t·c positions are formed by XOR-ing ℓ random bit values; thus, they are independent and uniformly distributed on {0, 1}. The probability of such a vector having Hamming weight at most $c_\omega$ is

$$2^{-(d-t\cdot c)} \sum_{i=0}^{c_\omega} \binom{d-t\cdot c}{i},$$

and the expected number of vectors of weight at most $c_\omega$ in the final list, denoted $N_{random}$, is

$$N_{random} \approx 3\cdot 2^{c} \cdot 2^{-(d-t\cdot c)} \sum_{i=0}^{c_\omega} \binom{d-t\cdot c}{i}.$$

Information theoretically, we have the approximation^9

$$N_{random} \approx 3\cdot 2^{-(1-H(c_\omega/(d-t\cdot c)))(d-t\cdot c)+c},$$

where H is the binary entropy function, $H(p) = -p\log(p) - (1-p)\log(1-p)$ for p ∈ (0, 1). Therefore, if there exists a low weight sum in the final list $L^{(t)}$ and $N_{random}$ is vanishingly small (i.e., $N_{random} \ll 1$), we have shown that Firekite's output vectors are indeed not random. Different values of $N_{random}$ for various parameters can be found in Table 2.
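For concreteness, $N_{random}$ can also be computed exactly from the binomial sum. The sketch below is ours; for the 8-sum parameters n = 1024, m = 216, k = 16, c = 62 treated in the next section, the exact sum gives a value around $2^{-93}$, of the same vanishing order as the entropy-based approximation quoted there.

```python
from math import comb, log2

def log2_n_random(n, m, k, c, ell=8, t=3):
    """log2 of N_random: expected number of weight-<=c_w vectors in a
    final list of ~3*2^c uniformly random candidates."""
    d = n - m - k * round(log2(n))   # d = n - m - k log n
    c_w = -(-ell * k * d // n)       # ceil(ell * k * d / n)
    rem = d - t * c                  # free (not cancelled) positions
    tail = sum(comb(rem, i) for i in range(c_w + 1))
    return log2(3) + c + log2(tail) - rem

lg_nr = log2_n_random(1024, 216, 16, 62)  # about -93
```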

Results for the distinguisher
In this section we give the results of our distinguishing attack described in Section 3 when applied to the suggested parameter sets for Firekite.

^9 We use $2^{-n}\binom{n}{i} \approx 2^{-(1-H(i/n))n}$. The final approximation is due to the overwhelming contribution of the term $2^{-(1-H(c_\omega/(d-t\cdot c)))(d-t\cdot c)+c}$.

Theoretical complexity estimation for the proposed parameters of Firekite
We investigate the results for Firekite's proposed parameters. As an example, we explain our distinguishing attack for the case n = 1024, m = 216, d = 648, k = 16, where the claimed security level is 82.
1. For ℓ = 4 we derive the following: we pick c = 76 and $c_\omega = \lceil 4\bar{k} \rceil = 41$, where $\bar{k} = k\cdot d/n$. Let $P_0$ denote the probability of the first 2·c positions being error free; a simple computation gives $P_0 = ((n-2c)/n)^{4k} \approx 2^{-14.8}$.
2. For ℓ = 8 we derive the following: picking c = 62 and, similarly, $c_\omega = \lceil \ell\cdot k\cdot d/n \rceil = 81$, we have $N_{random}$ very close to zero.
As an example, we approximate $P_{nf}$ by the sum of the probabilities of no double errors $P_0$, one double error $P_1$, and two double errors $P_2$ (ϵ = 0, 1, 2). As discussed, many double errors are improbable and we focus on the most likely cases. If there are two double errors, there are two sub-cases: the double errors happen within one pair or within two pairs (note that a vector in the first pair can also appear in the second pair). Let $P_{21}$ and $P_{22}$ denote these events, respectively, so that $P_2 = P_{21} + P_{22}$. Therefore, $P_2 \approx 2^{-35.86}$, and $P_{nf} > P_0 + P_1 + P_2 \approx 2^{-34.65} \approx 4P_0$, which gives N > 2.7. The failure probability of this attack when $c_\omega = 81$ is indicated by $N_{random} \approx 2^{-90.23}$.

Table 2 shows our attack's complexity and the corresponding $N_{random}$ for a few sets of parameters suggested by Firekite's designers. The number of required Firekite output observations is indicated by the parameter c. Recall that the theoretical complexity is proportional to the initial list size $3\cdot 2^c$ and to the cost of the XOR operations on m-bit vectors, which are performed p bits per instruction. In their implementation, besides several optimization flags, the designers also use the compilation flag -mavx2, which allows XOR operations to be applied to 256 bits per cycle. Therefore, in our complexity estimates, we set p = 256.

With 4-sum attacks on the small parameters of 80-bit secure Firekite, we can only refine the designers' estimates marginally. However, our 8-sum distinguisher breaks Firekite for all parameters except the smallest 128-bit secure instance, n = 1024, m = 352, k = 16. In that case, we can find a zero sum at cost $2^{107.75}$, but $\log(N_{random}) \approx 83$; therefore, we are unable to claim that Firekite's output is not randomly distributed, as the low weight sums found could easily come from random vectors. The explanation is that d = n − m − k log n is not large compared to $8\bar{k}$ in this case; hence, it is impossible to distinguish from the case of random vectors $g_i$.
In general, the 8-sum attack performs slightly better as the parameters n and k grow (by the same factor, as suggested by Firekite's designers). This is owing to the fact that d grows while m remains relatively unchanged; hence we get an even smaller failure probability and a bigger error free probability $P_{nf}$. In fact, we need a smaller initial list ($3\cdot 2^{60}$ compared to $3\cdot 2^{62}$) when attacking the Firekite instance with n = 16384, m = 216, k = 216.
The theoretical results above can be improved further, since larger Firekite parameters make double error events more probable. For instance, when attacking the parameters n = 16384, m = 352, k = 228 with the 8-sum distinguisher, we find that two double errors ($P_2$) are twice as likely as no error ($P_0$). Therefore, $P_{nf}$ would be better approximated by also taking, e.g., $P_3$ and $P_4$ into consideration.

Simulation results for smaller parameters
We verify our approach and formulas by performing simulations. As a toy example, we set up a mini version of Firekite with small parameters (where the ratio n/k is kept constant as in Table 1 and m is reduced by a similar factor) and run Algorithm 2. Our parameters are m = 52, n = 256, k = 4, b = 269 and r = 15. Recall that b is the length of the secret key used to generate the first row of Firekite's secret matrix M, and r is the number of randomization rounds before Firekite generates its actual output.
One has $P_0 = \left(\frac{n-2c}{n}\right)^{4k} \approx 2^{-3.5}$. If we choose c = 18, i.e., $|L^{(0)}| = 3\cdot 2^{18}$, there is, on average, less than one error bit per error vector in the first 2c bits. One can thus safely assume $P_{nf} \approx P_0$, which gives N ≈ 3.6. The simulation returns, on average, 1.65 low weight vectors over $10^2$ tests. The discrepancy can be explained as follows: the assumption that we can keep the list size $|L^{(i)}| = 3\cdot 2^c$ is often violated, as there are more vectors after every Combine step owing to the vectors not being evenly distributed among the buckets. Therefore, 'good' combinations that are present in zero sums might be discarded by chance. Keeping all combinations from Combine, we obtain more low weight sums after filtering with $c_\omega$ (at the cost of higher complexity), and the simulation agrees more closely with the theoretical estimate. An analogous computation bounds the probability that 4 random vectors sum to such a low weight vector.

For the 8-sum distinguisher, the filter weight is chosen to be $c_\omega = 21$. Again, c must fulfill (5), where $P_{nf} \approx P_0 + P_1 + P_2 \approx 2^{-7.67}$. Setting c = 14, meaning $|L^{(0)}| = 3\cdot 2^{14}$, suffices and gives N > 1.35. It should be clarified that in the 8-sum attack's implementation, the effect of keeping $|L^{(i)}| = 3\cdot 2^{14}$ is more visible. In particular, we might discard all 'good' combinations when N is very close to 1. We adapt by allowing $|L^{(1)}|$ and $|L^{(2)}|$ to be at most $2\cdot|L^{(0)}|$, and then directly filter the combinations from $|L^{(2)}|$ with $c_\omega$. Therefore, the complexity is slightly higher than the theoretical estimate provided in the previous section. We expect this negative impact to be mitigated when c is large, as the vectors in $|L^{(i)}|$ are then more evenly distributed among the buckets. The simulation returns 1.52 low weight vectors on average over $10^2$ tests.
In the random case, the probability of a vector having Hamming weight up to $c_\omega$ is

$$2^{-(d-t\cdot c)} \sum_{i=0}^{c_\omega} \binom{d-t\cdot c}{i},$$

which yields $N_{random} \approx 2^{-34.58}$.
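The predicted N ≈ 3.6 for the toy 4-sum attack follows from the ℓ = 4 analog of the counting argument in Section 3. The numeric sketch below is ours: 3 pairings, $\binom{3\cdot 2^c}{4}$ candidate 4-sums, a $2^{-m-c}$ survival-and-zero factor, and $P_{nf} \approx P_0$.

```python
from math import log2

def log2_expected_zero_4sums(n, m, k, c):
    """log2 of the expected number of zero 4-sums for the toy parameters,
    with P_nf approximated by P_0 = ((n - 2c)/n)^(4k)."""
    M = 3 * 2**c
    log2_binom_M4 = 4 * log2(M) - log2(24)   # log2 C(M, 4), valid for M >> 4
    log2_p0 = 4 * k * log2((n - 2 * c) / n)  # no errors in first 2c positions
    return log2(3) + log2_binom_M4 - m - c + log2_p0

n_zero = 2 ** log2_expected_zero_4sums(256, 52, 4, 18)  # about 3.6
```

The simulated average of 1.65 is below this estimate because of the list-trimming effect discussed above.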

A key-recovery attack on Firekite
In this section we show how to turn the distinguishing attack into a key-recovery attack with slightly higher complexity. We focus on recovering the secret m × d matrix M′. First, recall that the secret matrix M in Firekite is constructed by selecting a part of the bigger b × b matrix Q, as described in Section 2. We pick $q_1 \in \mathbb{Z}_2^b$ uniformly at random and define the rows of Q as $q_i = X^{i-1} q_1$, i = 1, ..., b, i.e., by shifting the first row to the left consecutively b − 1 times. The secret matrix M is obtained by dropping the last b − n columns of Q and keeping only the first m rows. The secret key is thus only the random b-bit vector $q_1$ rather than an m × n matrix M. We write the unknown bits of $q_1$ as $q_1 = (k_1, k_2, \ldots, k_b)$.
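A toy sketch of this construction (ours, not the reference implementation): we use a plain cyclic left shift as a stand-in for the fixed shift matrix X, and build M from the b-bit key $q_1$ with the simulation parameters b = 269, n = 256, m = 52.

```python
import random

def toy_secret_matrix(b, n, m, seed=1):
    """Build the m x n secret matrix M from a random b-bit key q1:
    row i is q1 shifted left i times (cyclic shift assumed here),
    truncated to the first n columns."""
    rng = random.Random(seed)
    q1 = [rng.randrange(2) for _ in range(b)]   # the whole secret key
    rows = [q1[i:] + q1[:i] for i in range(m)]  # q_i = X^(i-1) q_1
    return [row[:n] for row in rows]            # drop last b - n columns

M = toy_secret_matrix(b=269, n=256, m=52)
```

By construction, every entry of M is one of the b key bits, with the shift invariant M[i+1][j] = M[i][j+1] wherever both entries exist.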
More specifically, we can now see that if $M = [m_{ij}]_{m\times n}$, then every entry in the matrix M corresponds to an unknown key bit. As M′ is the first part of M, the same holds for M′.
We can view M′ as a generator matrix spanning a code C. There are many generator matrices spanning the same code; one particular choice is the systematic form M′′ = [I J], where I is the m × m identity matrix and M′′ = SM′ for some unknown m × m matrix S. We can thus write the code as $C = \{vM'' : v \in \mathbb{Z}_2^m\}$. In this case, the entries of J are linear combinations of the secret key bits; therefore, we consider all entries of J as unknown. There is an assumption here that the first m columns of M′ are linearly independent, which we adopt.
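Bringing a generator matrix to systematic form is plain Gaussian elimination over GF(2). A minimal sketch of ours (rows stored as integers, bit j holding column j), assuming, as in the text, that the first m columns are linearly independent:

```python
import random

def systematic_form(rows):
    """Row-reduce an m x n GF(2) matrix to [I | J]. Each row is an int
    with bit j holding column j. Assumes the first m columns are
    linearly independent, so a pivot always exists."""
    rows = rows[:]
    m = len(rows)
    for col in range(m):
        piv = next(r for r in range(col, m) if (rows[r] >> col) & 1)
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(m):
            if r != col and (rows[r] >> col) & 1:
                rows[r] ^= rows[col]
    return rows

# demo: scramble a known systematic matrix with invertible row
# operations, then recover it
rng = random.Random(5)
m = 8
clean = [(1 << i) | (rng.getrandbits(8) << m) for i in range(m)]  # [I | J]
mixed = clean[:]
for _ in range(40):
    i, j = rng.randrange(m), rng.randrange(m)
    if i != j:
        mixed[i] ^= mixed[j]   # elementary row operation, keeps the code
recovered = systematic_form(mixed)
```

Since row operations preserve the spanned code and the systematic generator of a code (with independent first m columns) is unique, the demo recovers the original matrix exactly.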
The key-recovery attack consists of running the aforementioned distinguisher, finding several zero sum vectors, and then deducing M′. We first show how to derive the secret key from such zero sum vectors under the assumption that the first m positions are all error free (no double error). Again, a zero sum vector fulfills

$$\sum_{j=1}^{\ell} g_{i_j} = \sum_{j=1}^{\ell} e_{i_j},$$

since the codeword parts cancel. Therefore, finding a zero sum amounts to knowing the corresponding $\sum_{j=1}^{\ell} e_{i_j}$. We now consider a single vector $g_{i_j}$. Its positions can be split into two parts: those we know to be (most likely) error free mod 2 (since the summed error vector is zero there), and those we have no knowledge of, since one of the ℓ involved vectors has an error there. Note that the first t·c positions are error free, but there are, on average, $c_\omega$ positions where at least one of the eight vectors has an error.
If the first m positions are all of the error free type, one can write $v_{i_j} = g_{i_j,[m]}$, i.e., the first m positions of $g_{i_j}$ directly give the information vector. We further have roughly d − t·c − $c_\omega$ additional positions that are error free, and for each such position we can form a linear equation. Denote by $J_q$ the q-th column of J and assume a position q > m is error free. Then

$$g_{i_j}(q) = \langle v_{i_j}, J_q \rangle.$$

Here $g_{i_j}(q)$ denotes position q of the vector $g_{i_j}$, etc. Since $g_{i_j}(q)$ and $v_{i_j}$ are known, this gives a linear equation in the unknowns of the vector $J_q$. Collecting many such equations enables us to derive $J_q$ and, eventually, the full matrix J.

However, such an approach is not adequate: as we have seen in Section 4, for the parameters of interest, the double error probability cannot be disregarded. Hence, we need a more involved approach where we try to detect columns with double errors. Assume we have found N zero sum vectors with the 8-sum distinguisher. Let the first seven vectors in the first zero sum $\sum_{j=1}^{\ell} g_{i_j}$ be denoted $g_1, g_2, \ldots, g_7$, the first seven vectors in the next zero sum be denoted $g_8, g_9, \ldots, g_{14}$, and so forth. We construct a matrix G with the $g_i$ as its row vectors. Now we examine the columns and the related known error vector for the corresponding zero sum vector. Recall that the first t·c columns are error free mod 2, by virtue of Algorithm 2. For the remaining columns, if $\sum_{j=1}^{\ell} g_{i_j}$ is zero in the q-th position for all detected zero sum vectors, the corresponding column $G_q$ is free of 'direct errors' and is kept; otherwise, we discard the column. After this process, we have a new matrix G′ of length t·c + U, where U is the number of columns kept in the previous step. If t·c + U > 7N > m, there will be low-weight codewords in the code spanned by G′: namely, if 7N > m, then there are linear combinations of rows that correspond to a zero codeword plus error terms. As we have removed all 'direct errors', the only error contribution in the code must come from double errors.
Each double error contributes either a 0 or a 1 in its position. Assume there are D double errors in the columns of G′; then we expect to find codewords in the code spanned by G′ of weight around D/2. Moreover, since we can form $2^{P-m}$ different combinations of rows that sum to zero in the underlying code, where P is the number of rows of G′, we have $2^{P-m}$ low weight codewords of weight around D/2. Finally, every column of G′ can be classified as containing a double error or not as follows: if the position is zero in all (or almost all) low-weight codewords, there is no double error; otherwise, we have detected a double error in that position. From this information, it is possible to do a full recovery through additional steps. A description of this key recovery attack is given in Appendix, Algorithm 3.
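The column-selection step feeding G′ can be sketched directly (our illustration, with vectors as integers, bit q holding position q): keep the first t·c columns plus every later position at which all detected zero-sum vectors vanish.

```python
def select_columns(zero_sums, d, tc):
    """Columns kept for G': the first tc positions (cancelled by
    Algorithm 2) plus every q >= tc where all zero sums are zero,
    i.e. columns free of 'direct errors'."""
    keep = list(range(tc))
    for q in range(tc, d):
        if all((s >> q) & 1 == 0 for s in zero_sums):
            keep.append(q)
    return keep

# two toy zero sums over d = 12 positions, tc = 4:
# the first has ones at positions 5 and 9, the second at position 5
cols = select_columns([(1 << 5) | (1 << 9), 1 << 5], d=12, tc=4)
```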
A remaining problem is finding enough columns free of direct errors. When the length n is small, there are very few such columns and the length of G′ is not sufficient for low-weight codewords to exist. Therefore, in the following example, we choose a large-n instance of Firekite to illustrate our attack.

Example of the key recovery attack on Firekite with n = 16384
Consider the Firekite parameter choice m = 216, n = 16384 and k = 216. The attack works as follows. First we run the 8-sum distinguisher to obtain zero sum vectors. In this case, we need slightly more than m/7 of them, e.g., 32 zero sum vectors. Applying (5), we choose c = 60, generating N ≈ 3.5 zero sum vectors with the 8-sum distinguisher. Instead of repeating the distinguishing attack 32 times, or, equivalently, spending $2^5$ times more work, we can increase the initial list size to $3\cdot 2^{61}$ to obtain sufficiently many low-weight sums (N ≈ 41), at an affordable complexity of $2^{69.87}$. We denote this cost by $C_{distinguishing}$.
We now consider the matrix G with P = 32 · 7 = 224 rows consisting of the $g_i$ vectors, and remove the columns with directly associated errors. In a zero sum vector, there are 216 · 8 errors inserted, so a position is error free with probability $(1 - 1/(16384 - 61\cdot 3))^{216\cdot 8} \approx 0.899$. In our case, we want a position to be error free in all 32 zero sum vectors, which brings the probability down to about 0.033. Since d = 13144, we can expect about 432 error-free columns. We then form the matrix G′, which has P = 224 rows and length 61 · 3 + 432 = 615. There will be $2^{224-216} = 2^8$ codewords in the code spanned by G′ with support corresponding to the double errors.
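The arithmetic in this paragraph can be reproduced directly (a numeric check of ours):

```python
n, m, k = 16384, 216, 216
c, t, n_sums = 61, 3, 32
d = n - m - k * 14                         # log2(16384) = 14, so d = 13144

p_free = (1 - 1 / (n - c * t)) ** (k * 8)  # one position, one zero sum
p_all = p_free ** n_sums                   # error free in all 32 zero sums
exp_cols = d * p_all                       # expected error-free columns
length = c * t + 432                       # G' length used in the text
codewords = 2 ** (n_sums * 7 - m)          # 2^(P - m) low-weight codewords
```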
By computing the likelihood of double errors, we find that a column of G′ is error free with probability at most $0.995^{32} \approx 0.85$. For instance, in the simple case where there is no non-repeating double error at the j-th position of a zero sum, the probability is approximately 0.995. One can thus expect 615 · 0.15 ≈ 93 columns to have double errors. In conclusion, the code spanned by G′ will contain $2^8$ codewords whose weight is distributed around 47. Finding low-weight codewords in a random binary linear code is a well-known problem that has been studied extensively, and one can use ISD algorithms for this task. For our example, an improved Stern ISD algorithm^11 yields a bit-complexity estimate, denoted $C_{ISD}$, of $2^{44.6}$, which is small compared to the distinguishing step.
A random linear code with dimension 224 and length 615 has an expected minimum distance of about 100 by the Gilbert–Varshamov bound, so the low weight codewords must come from the observation above. Finally, generating, say, 16 such low weight codewords, we look for the positions where all 16 codewords are zero. This is the case for more than 500 positions, and in this way we identify more than 500 columns that are completely error free. Using a selection of them as an information set of the code, we can recover the remaining parts of M′. The total complexity is therefore $C = C_{distinguishing} + C_{ISD} \approx 2^{69.87} + 16\cdot 2^{44.6} \approx 2^{69.87}$.
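The quoted minimum distance can be checked against the Gilbert–Varshamov bound (a sketch of ours: the smallest weight at which the cumulative sphere count reaches $2^{n-k}$):

```python
from math import comb

def gv_distance(n, k):
    """Gilbert-Varshamov estimate for the minimum distance of a random
    [n, k] binary linear code: smallest d such that
    sum_{i < d} C(n - 1, i) >= 2^(n - k)."""
    bound = 2 ** (n - k)
    total, d = 0, 0
    while total < bound:
        total += comb(n - 1, d)
        d += 1
    return d

dmin = gv_distance(615, 224)  # about 100
```

Since the double-error codewords have weight around 47, far below this bound, they are clearly separated from the bulk of the code.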

Discussion and Conclusions
Having seen how Firekite is vulnerable to our distinguisher, especially the 8-sum distinguishing attack, it is natural to ask how one can make Firekite and similar ciphers resilient to a generic birthday problem solving algorithm. From the results and performance of our attacks, there are certain approaches one can consider. First, we observed that $N_{random}$, in other words the failure probability, inflates when the filtering weight $c_\omega$ grows. That is to say, unless $c_\omega$ is very small compared to d, it is difficult to distinguish Firekite's zero sum vectors from those that could stem from random vectors $g_i$; therefore, instantiating Firekite with larger k can be beneficial. Second, we have discussed that the attack complexity depends on the parameter c, which is determined solely by m (for fixed ℓ), the number of rows of M. Therefore, if the security level is close to $m/(1+\log \ell)$, our attack becomes infeasible. As a contribution to Firekite's design criteria, we propose a few modifications as follows.
For small Firekite parameters, one can increase k slightly, which yields an LPN instance with a higher noise rate and is therefore more difficult to solve in general. In our estimates, a larger k causes a drastic decrease in $P_{nf}$ and an increase in $N_{random}$: it becomes exceedingly unlikely to have no error in the first t·c bits, and d becomes smaller, which makes it more difficult to distinguish the zero sums found by Algorithm 2 from those stemming from the random case. As an example, setting k = 24 for the instance n = 1024, m = 216 renders our attack useless, as $N_{random}$ is always larger than 1. This comes at the cost of decreasing the number of bits encrypted per invocation; hence more instructions need to be executed per bit. However, larger parameter instances of Firekite are less affected by this 'fix', as d becomes large relative to k. For instance, our 8-sum attack still succeeds on n = 16384, m = 216 despite raising k from its original value k = 216 to k = 400; we only need to slightly increase the initial list size to $|L^{(0)}| = 3\cdot 2^{67}$, and we still obtain a good failure probability, $N_{random} \approx 2^{-2835}$. An extreme adjustment such as k = 600 makes Firekite resistant to our attack. We apply this idea to our simulation with toy parameters to verify the countermeasure (see Appendix, Example 3).

Finally, we discuss possible future improvements of the proposed attack. We believe there is a possibility to gain a small amount in terms of decreased complexity through minor changes in the distinguishing algorithm. One idea could be to allow not only sums of vectors that sum to zero in the c positions, but also those that have weight 1 there. Still, this would not change the complexity significantly, and with parameters modified as suggested above, Firekite should meet the intended security level. Moreover, since our approach relies heavily on the generalized birthday algorithm (GBA), it would in principle be amenable to quantum search approaches, e.g., using the results of [NPS20].
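The k = 24 countermeasure can be checked numerically with the estimates from Sections 3 and 4. The sketch below is our own, using the no-double-error bound for $P_{nf}$; the hypothetical helper first searches for the smallest c giving N > 1, then evaluates $N_{random}$ exactly.

```python
from math import comb, log2

def attack_parameters(n, m, k, t=3):
    """For the 8-sum attack: find the smallest c with expected zero-sum
    count N > 1 (Lemma-2-style estimate, P_nf bounded by the
    no-double-error probability), then return (c, log2 N_random)."""
    c = m // 4  # start near m / (1 + log2 8)
    while True:
        lg_n = (log2(105) + 8 * log2(3 * 2**c) - log2(40320)
                - m - 4 * c + 8 * k * log2((n - t * c) / n))
        if lg_n > 0:
            break
        c += 1
    d = n - m - k * round(log2(n))
    c_w = -(-8 * k * d // n)          # ceil(8 * k * d / n)
    rem = d - t * c
    tail = sum(comb(rem, i) for i in range(c_w + 1))
    return c, log2(3) + c + log2(tail) - rem

c, lg_nr = attack_parameters(1024, 216, 24)  # lg_nr > 0: attack fails
```

For the original k = 16 the same search gives c = 63 (the refined $P_{nf}$ estimate in the text allows c = 62) with a vanishing $N_{random}$; for k = 24, $N_{random}$ exceeds 1, so the distinguisher can no longer conclude anything.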
^11 …unify ISD-algorithm variants (Prange, Stern, MMT, BJMM) in a Nearest-Neighbor framework. They also provided a complexity estimator for independent parameters.

Example 2. (Better approximation of P_nf for ℓ = 4) For c = 76, the probability of no double error in the first 2·76 bits is $P_0 = ((n-2c)/n)^{4k} \approx 2^{-14.8}$. Adding the probability $P_1$ of one double error in the first 2·76 bits gives $P_0 + P_1 \approx 2^{-14.57}$; more importantly, this 'better approximation' of $P_{nf}$ does not affect our algorithm significantly.

Example 3. (Improve Firekite by increasing noise level)
Recall that in our simulation the parameters are n = 256, m = 52, k = 4. We increase k to k = 7 and apply the 8-sum distinguisher again with c = 14. One can verify that, with the new parameters, $N_{random} \approx 3.15$.
As discussed in Section 4, due to the non-uniformity of the input, we exhaust all combinations in the last Combine step. On average, there are $2^{18}$ vectors in $L^{(3)}$, which raises $N_{random}$ (in our simulation) to $2^{18}$