Vectorial Decoding Algorithm for Fast Correlation Attack and Its Applications to Stream Cipher Grain-128a

Abstract. Fast correlation attack (FCA), pioneered by Meier and Staffelbach, is an important cryptanalysis tool for LFSR-based stream ciphers. It exploits the correlation between the LFSR state and the key stream, and aims to recover the initial state of the LFSR via a decoding algorithm. In this paper, we develop a vectorial decoding algorithm for fast correlation attack, which is a natural generalization of the original binary approach. Our approach benefits from the contributions of all correlations in a subspace. We propose two novel criteria to improve the iterative decoding algorithm, and we give some cryptographic properties of the new FCA that allow us to estimate its efficiency and complexity bounds. Furthermore, we apply this technique to the well-analyzed stream cipher Grain-128a. Based on a hypothesis, an interesting result for its security bound is deduced from the perspective of iterative decoding. Our analysis reveals a potential vulnerability of LFSRs over matrix rings and of nonlinear functions with biased multidimensional linear approximations such as Grain-128a.


Introduction
Stream ciphers are a widely used class of symmetric-key cryptosystems. A key stream sequence is generated from the initial state derived from the key. The plaintext is encrypted by XORing it with a key stream of the same length.
Linear feedback shift register (LFSR) based stream ciphers form an important class of stream cipher systems, in which one or more LFSRs are often used. LFSRs can be defined over different algebraic structures, such as finite fields and matrix rings. Besides the LFSR, these ciphers usually adopt a nonlinear filter function or a finite state machine (FSM) with a nonlinear function. The history of these ciphers can be traced back decades, e.g., LILI-128 [CDF+02], the SNOW family [EJ00, EJ03, UEA06, EJMY19] and the Grain family.
The Grain family includes three well-known stream ciphers: Grain-128a [ÅHJM11], Grain-128 [HJMM06] and Grain-v1 [HJM07]. Grain-v1 is in the eSTREAM portfolio and Grain-128a is standardized by ISO/IEC [29115]. All members of the Grain family share a similar structure, and several lightweight ciphers proposed recently also adopt similar structures [AM15, AHMN13, MAM16]. However, the Grain family was reported to be vulnerable to fast correlation attacks (FCA) at CRYPTO 2018 [TIM+18]. Subsequently, the same FCA approach was applied to Grain-like small-state stream ciphers such as Plantlet, Fruit-v2 and Fruit-80 [WLLM19].
FCA was pioneered by Meier and Staffelbach in 1989 [MS89]. Generally speaking, FCA exploits the correlation between the key stream and the state or the outputs of the LFSR. The problem of recovering the initial state of the LFSR is transformed into a decoding problem: the linear part of the stream cipher is treated as a linear code, and the nonlinear part is treated as noise. According to the differences in decoding strategies, FCA approaches can be roughly divided into two classes.
The first class adopts a one-pass decoding algorithm. For example, one FCA adopts convolutional codes and the Viterbi decoding algorithm [JJ99b], which was later improved by turbo codes [JJ99a]. Another FCA adopts maximum-likelihood decoding on a reduced set of information bits [CJS00]; the parity-checks are usually folded to eliminate partial bits. List decoding and polynomial reconstruction can also be applied in FCA [MFI02, JJ00]. An important improvement accelerates the parity-check evaluations by the fast Walsh-Hadamard transform (FWHT) [CJM02]. This technique was applied in the cryptanalysis of the stream cipher E0 [LV04], and was later generalized to extension fields and applied to the stream cipher SNOW 2.0 [ZXM15]. A recent improvement of FCA is based on a commutative property and was applied to the Grain family [TIM+18].
The second class adopts a probabilistic iterative decoding algorithm. After Meier and Staffelbach's original FCA, low-density parity-check (LDPC) codes were introduced into FCA to improve the iterative decoding algorithm [CT00]. There are many related works in this area, such as [ÅLHJ12, CT00, Gol01, CGD96, GH05, MG91, MG93]. Intuitively, iterative decoding algorithms seem to be more powerful, as their decoding abilities are closer to Shannon's bound [Sha48]. However, compared with FCA decoding by information set, they have some inconveniences. Firstly, it is usually very hard to describe their properties in mathematical language, e.g., the relationship among the number of parity-checks, the decoding ability and the noise distribution; thereby, it is hard to derive a clear time/space/memory complexity, sometimes even for toy ciphers. Secondly, although multidimensional linear approximations may have advantages in the cryptanalysis of stream ciphers [ZXM15], there is still no convenient iterative decoding algorithm that works with a multidimensional linear approximation. For these reasons, the application of iterative-decoding-based FCA to modern stream ciphers is very limited. In addition, even though some principles have been discussed for preventing the iterative process from being trapped in a "tie" state too early [MG91, CGD96], how to get out of the tie state also needs to be considered.

Our Contributions
Firstly, we propose a vectorial iterative decoding algorithm for fast correlation attack, which generalizes Meier and Staffelbach's original FCA very naturally. The vectorial approach benefits from a multidimensional linear approximation, while the binary version exploits only a binary linear approximation. Moreover, the quantity of parity-checks suitable for the vectorial algorithm is much larger than that for binary algorithms. We propose two novel criteria to improve the iterative decoding process; one is proposed for breaking the tie state and may also apply to some binary iterative algorithms. We also perform a scaled experiment to verify the validity of the vectorial algorithm, and another experiment to verify the generality of the idea for breaking a tie state.
Secondly, we give some cryptographic properties of the first iteration via distribution approximations, which allow us to describe the relationship between the decoding efficiency and the noise distribution. We also give two propositions concerning the relationship between the number of parity-checks, the noise distribution and the data complexity. The first one gives the number of parity-checks needed when the noise distribution is fixed; using it, we show that the vectorial algorithm may have theoretical advantages in data complexity in some cases. Given the number of parity-checks and the noise distribution, the second one gives the length of data needed to correct the errors. These results reveal some theoretical constraints on the complexity, which give a clearer profile of the complexities of FCA based on an iterative algorithm.
Finally, we apply these results to the well-analyzed stream cipher Grain-128a. In the first step, we construct a multidimensional linear approximation by bundling up the linear approximations proposed in [TIM+18]. In the second step, based on the hypothesis that there are parity-checks with two taps or with a special form, we give a data complexity estimate using the proposed theoretical bounds for the vectorial algorithm. The result suggests that its security margin may be lower than previously thought from the perspective of vectorial iterative decoding. Our analysis reveals the potential vulnerability of LFSRs over matrix rings and of nonlinear functions with biased multidimensional linear approximations.

Outline
The rest of the paper is organized as follows. Section 2 gives the preliminaries. Section 3 describes the details of the vectorial decoding algorithm and the scaled experiments. In Section 4, we propose some cryptographic properties. How to apply the new FCA to Grain-128a is explained in Section 5. Section 6 discusses the limitations and open problems. Finally, we conclude the paper.

Notations and Definitions
Some notations are introduced for convenience.
• Given two binary row vectors $x = (x_1, \dots, x_n) \in \mathbb{F}_2^n$ and $y = (y_1, \dots, y_n) \in \mathbb{F}_2^n$, their inner product is denoted by $x \cdot y = \bigoplus_{i=1}^{n} x_i y_i$. The Hamming weight of $x$ is denoted by $\mathrm{wt}(x)$.
• Let $F: \mathbb{F}_2^m \to \mathbb{F}_2^n$ denote a vectorial Boolean function. A binary linear approximation of $F$ with $m$-bit input mask $u = (u_1, \dots, u_m)$ and $n$-bit output mask $v = (v_1, \dots, v_n)$ can be represented by $u \cdot x \oplus v \cdot F(x)$. When we have $1 < a \le m+n$ linearly independent mask pairs $(u_1, v_1), \dots, (u_a, v_a)$, a vectorial (or multidimensional) linear approximation is denoted by $Ux \oplus VF(x)$, where the $i$-th row of $(U, V)$ is $(u_i, v_i)$; here $x$ is treated as a column vector unless otherwise stated.
• Linear correlation is used to measure the bias of a binary linear approximation. Let $w$ be an $r$-bit binary linear mask, treated as a row vector; the correlation of the linear approximation with mask pair $(wU, wV)$ is $c(w) = 2\Pr[wUx \oplus wVF(x) = 0] - 1$.
• Let $a \in \mathbb{F}_2^m$ denote a binary vector. There is an integer $a = \sum_{i=0}^{m-1} a_{i+1} 2^i$ corresponding to it. For convenience, we use the two interchangeably when there is no ambiguity in the context, especially as a subscript. For example, for a probability density function $p_{(0,\dots,0)}, \dots, p_{(1,\dots,1)}$, we mean the same thing when denoting it by $(p_0, \dots, p_{2^m-1})$.
• Let $M_m(\mathbb{F}_2)$ denote the $m \times m$ matrix ring over $\mathbb{F}_2$. Given an LFSR of rank $d$ with $m$-bit cells, its generator is denoted by $L(x) = C_d x^d \oplus C_{d-1} x^{d-1} \oplus \cdots \oplus C_1 x \oplus E$, where $C_d$ is nonsingular and $E$ is the identity matrix. The number of information bits of $L(x)$ is $md$.
• Given two positive integers $a$ and $b$ with $\gcd(a, b) = 1$, the $b$-cyclotomic coset modulo $a$ containing $i$ is denoted by $C_i = \{i, ib, \dots, ib^{r-1}\} \bmod a$, where $r$ is the smallest positive integer such that $ib^r \equiv i \pmod{a}$. The minimal integer in $C_i$ is called the coset header and denoted by $\bar{i}$. All coset headers form a set $R_{b,a}$.
• The notation $a \succ b$ implies that there is at least one $1 \le j \le n$ satisfying $a_j > b_j$, while $a \prec b$ has the reverse meaning.
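For concreteness, the cyclotomic cosets and coset headers defined above can be computed directly; a minimal sketch (the function names are our own):

```python
def cyclotomic_coset(i, a, b):
    """The b-cyclotomic coset modulo a containing i (requires gcd(a, b) = 1)."""
    coset = [i % a]
    j = (i * b) % a
    while j != coset[0]:
        coset.append(j)
        j = (j * b) % a
    return coset

def coset_headers(a, b):
    """The set R_{b,a} of coset headers, i.e., the minimal element of each coset."""
    seen, headers = set(), set()
    for i in range(a):
        if i not in seen:
            c = cyclotomic_coset(i, a, b)
            seen.update(c)
            headers.add(min(c))
    return headers
```

For instance, with $a = 15$ and $b = 2$, the cosets partition $\{0, \dots, 14\}$ and the headers form $R_{2,15}$.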

Walsh-Hadamard Transform
The Walsh-Hadamard transform is a spectral tool widely used in cryptanalysis. Let $X \sim P$ denote a discrete random variable taking values in $\mathbb{F}_2^m$. The Walsh-Hadamard transform of $X$ is defined by $\hat{p}(w) = \sum_{x \in \mathbb{F}_2^m} (-1)^{w \cdot x} p(x)$. Since the Walsh-Hadamard transform is a linear operator with respect to XOR, if $X = X_1 \oplus X_2 \oplus \cdots \oplus X_k$ for independent $X_i$, we can efficiently compute the probability distribution of $X$ with the help of the convolution property $\hat{p}_X(w) = \prod_{i=1}^{k} \hat{p}_{X_i}(w)$.
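As an illustration, the transform and the convolution property can be sketched in a few lines (the unnormalized FWHT is its own inverse up to the factor $2^m$; the function names are ours):

```python
import numpy as np

def fwht(v):
    """Unnormalized Walsh-Hadamard transform: v_hat[w] = sum_x (-1)^{w.x} v[x]."""
    v = np.asarray(v, dtype=float).copy()
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            a = v[i:i + h].copy()
            b = v[i + h:i + 2 * h].copy()
            v[i:i + h] = a + b
            v[i + h:i + 2 * h] = a - b
        h *= 2
    return v

def xor_convolve(p, q):
    """Distribution of X1 XOR X2 for independent X1 ~ p and X2 ~ q over F_2^m."""
    return fwht(fwht(p) * fwht(q)) / len(p)  # inverse FWHT = FWHT / 2^m
```

For a sum of $k$ independent variables, one simply multiplies the $k$ spectra pointwise before the inverse transform, at cost $O(k \cdot m 2^m)$ instead of $O(k \cdot 2^{2m})$ for direct convolution.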

Squared Euclidean Imbalance
Relative entropy (or Kullback–Leibler divergence) is used to measure the difference between two probability distributions $P$ and $Q$, i.e., $D(P \| Q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$. If $p(x)$ is close to $q(x)$, i.e., $p_x = q_x + \epsilon(x)$, the relative entropy can be approximated by $D(P \| Q) \approx \frac{1}{2 \ln 2} \sum_x \frac{\epsilon(x)^2}{q_x}$. The summation term is usually called the capacity, denoted by $C(p \| q)$. The Squared Euclidean Imbalance (SEI) is defined to be the capacity between a probability distribution and the uniform distribution, i.e., $\Delta(P) = 2^m \sum_x (p_x - 2^{-m})^2$. The following theorem reveals the relationship between SEI and linear correlation.

Theorem 1 ([BJV04]). Let $X \in \mathbb{F}_2^m$ be a random variable with density function $p_x$. Then its SEI satisfies $\Delta(p) = \sum_{w \neq 0} \hat{\epsilon}(w)^2$, where $\epsilon(x) = p_x - 2^{-m}$ and $\hat{\epsilon}(w)$ denotes the FWHT of $\epsilon(x)$. For convenience, we use $\Delta(p)$ if $x$ is clear in the context, or $\Delta(X)$ if the random variable $X$ with density function $p(x)$ is clear. In particular, we have $c^2(e) = \Delta(p)$ when $m = 1$.
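Theorem 1 can be checked numerically: the SEI computed from the definition agrees with the sum of the squared spectral biases over nonzero masks. A small sketch (function names are ours):

```python
import numpy as np

def fwht(v):
    """Unnormalized Walsh-Hadamard transform."""
    v = np.asarray(v, dtype=float).copy()
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            a = v[i:i + h].copy()
            b = v[i + h:i + 2 * h].copy()
            v[i:i + h] = a + b
            v[i + h:i + 2 * h] = a - b
        h *= 2
    return v

def sei_direct(p):
    """SEI from the definition: Delta(p) = 2^m * sum_x (p_x - 2^-m)^2."""
    p = np.asarray(p, dtype=float)
    return len(p) * np.sum((p - 1.0 / len(p)) ** 2)

def sei_spectral(p):
    """SEI via Theorem 1: Delta(p) = sum over nonzero w of eps_hat(w)^2."""
    eps = np.asarray(p, dtype=float) - 1.0 / len(p)
    return np.sum(fwht(eps)[1:] ** 2)
```

For $m = 1$ and $p = (0.75, 0.25)$, both functions return $0.25 = c^2(e)$ with $c = 0.5$, matching the last statement of the theorem.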

Parity-Check and Characteristic Polynomial
A parity-check corresponds to an equation satisfied by the LFSR output sequence $x_t$. For example, it is well known that any multiple of $L(x) \in \mathbb{F}_{2^m}[x]$ is a parity-check. Usually, only very sparse parity-checks with a low degree are exploited in FCA.
Let the set $H(\tau+1, d)$ denote all parity-checks with $\tau+1$ taps and degree at most $d$, abbreviated by $H$ when there is no ambiguity. The parity-checks available at position $n$ are denoted by $H^{(n)} \subseteq H$. Suppose a parity-check for the sequence $x_t$ is denoted by $G_n x_n \oplus G_{l_1} x_{n-l_1} \oplus \cdots \oplus G_{l_\tau} x_{n-l_\tau} = 0$, where $G_n$ is nonsingular. Its characteristic polynomial is denoted by $\det(xE \oplus A)$, where $A$ denotes the companion matrix of the corresponding linear recursion.

A Brief Description of Original FCA
Meier and Staffelbach's original FCA includes a precomputation phase and a decoding phase.

Precomputation Phase
Let the LFSR's generator polynomial be $L(x) \in \mathbb{F}_2[x]$. The purpose of the precomputation phase is to find sufficiently many very sparse parity-checks with a low degree, which is a hard open problem. One way, recommended by Zeng [ZYR91], is evaluating logarithms in finite fields of characteristic 2; it is rather efficient for finding low-weight multiples, but the degree is not guaranteed to be low. Another way is the extended k-tree algorithm based on the generalized birthday problem [NS15], which can find low-weight multiples of a polynomial of not-so-large degree with flexible parameters.

Decoding Phase
The decoding phase aims to recover the initial state of the LFSR from the key stream. Suppose we have found sufficiently many suitable parity-checks $x_n \oplus a_l$, where $a_l$ is the sum of $\tau$ key stream bits corresponding to the $x_{n-l_k}$. The nonlinear part of a stream cipher is modeled as a binary symmetric channel (BSC) with crossover probability $p = \Pr[x_n \oplus z_n = 1]$. The critical part of the decoding phase is calculating the a posteriori probability (APP) from the prior distribution, symbol by symbol. Suppose that the check values are all 0 for a subset $H_0 \subseteq H$; then the APP follows from Bayes' formula, where each $s_l = s(p_{l_1}, \dots, p_{l_\tau}) = \Pr[a_l]$ depends on the probabilities of the $\tau$ symbols involved in the parity-check. Moreover, $s_l$ can be calculated recursively in the BSC model:
$s(p_{l_1}, \dots, p_{l_\tau}) = p_{l_\tau}\, s(p_{l_1}, \dots, p_{l_{\tau-1}}) + (1 - p_{l_\tau})\,(1 - s(p_{l_1}, \dots, p_{l_{\tau-1}}))$.
The specific process is depicted in Algorithm 1. For more details, we refer to the original paper [MS89].
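The recursive combination of symbol probabilities can be sketched as follows. We follow the standard BSC convention in which $s$ denotes the probability that the XOR of the noise bits equals 0 (the sign convention in the recursion above may differ), so this is an illustration rather than a verbatim transcription:

```python
def prob_parity_zero(error_probs):
    """Pr[e_1 ^ ... ^ e_t = 0] for independent e_i ~ Bernoulli(p_i).
    Recursion: s_t = (1 - p_t) * s_{t-1} + p_t * (1 - s_{t-1}), with s_0 = 1."""
    s = 1.0
    for p in error_probs:
        s = (1 - p) * s + p * (1 - s)
    return s
```

The result agrees with the piling-up lemma: for two bits with $p = 0.25$ each, the combined bias is $0.5 \cdot 0.5 = 0.25$, giving $\Pr[\text{sum} = 0] = (1 + 0.25)/2 = 0.625$.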

Algorithm 1 Meier and Staffelbach's binary iterative decoding Algorithm B
Input: A key stream sequence z of length N and the parity-check set H.
1. Initialize the prior probability p_n ← p for every position n.
2. For each round do
3.   For iteration i from 1 to a small integer do
4.     Calculate the APP p* from the prior probability p; assign p_n ← p*_n for all positions n.
5.   EndFor
6.   Determine the threshold p_thr.
7.   Complement the bits of z with p_n > p_thr.
8.   Reset all positions to the initial probability p.
9.   If z satisfies all parity-checks then break; EndIf
10. EndFor
11. Terminate with x = z.
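To make the flow of Algorithm B concrete, the following is a simplified sketch on a toy LFSR. Parity-checks are passed as lists of relative tap offsets, and one round of APP iterations followed by complementation is performed; this is an illustration under our own simplifications, not the exact algorithm of [MS89]:

```python
def decode_round(z, checks, p0, iters=3):
    """One round of Algorithm-B-style iterative decoding (simplified sketch).

    z:      noisy 0/1 observations of an LFSR output sequence
    checks: parity-checks as lists of relative tap offsets (containing 0);
            the true sequence XORs to 0 over every in-range offset set
    p0:     prior (crossover) probability that a position is in error
    """
    N = len(z)
    p = [p0] * N
    for _ in range(iters):
        p_new = []
        for n in range(N):
            num, den = p[n], 1.0 - p[n]  # hypotheses e_n = 1 and e_n = 0
            for offs in checks:
                idx = [n + o for o in offs]
                if min(idx) < 0 or max(idx) >= N:
                    continue
                c = 0
                for i in idx:
                    c ^= z[i]
                s = 1.0  # s = Pr[the other tap errors XOR to 0]
                for i in idx:
                    if i != n:
                        s = (1 - p[i]) * s + p[i] * (1 - s)
                num *= (1 - s) if c == 0 else s
                den *= s if c == 0 else (1 - s)
            p_new.append(num / (num + den))
        p = p_new
    # complement the positions whose posterior error probability exceeds 1/2
    return [z[n] ^ (1 if p[n] > 0.5 else 0) for n in range(N)]
```

On a toy 4-stage LFSR with recurrence $x_n = x_{n-1} \oplus x_{n-4}$, each position is covered by the three shifted checks $\{0,-1,-4\}$, $\{1,0,-3\}$ and $\{4,3,0\}$, and an isolated flipped bit fails all of its checks, so its posterior rises above 1/2 while its neighbours' posteriors fall.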

Channel Model
The FCA based on binary linear approximations usually assumes the binary symmetric channel (BSC). Similarly, when the transmitted $w$-bit word is $x$ and the received word is $z = x \oplus e$, we can model it as a symmetric channel (SC). Its transition matrix has the following properties: each row is a permutation of any other row, and likewise for the columns; moreover, the sum of each row equals 1 by the definition of the SC. The SC can be treated as an extended BSC. Its channel capacity is $C = w - H(r)$, where $r$ denotes a row of the transition matrix. Suppose we have a linear approximation of dimension $m$, i.e.,
$\bigoplus_{i \in \{1,\dots,\#T_x\},\, j(i) \in T_x} U_i x_{j(i)} \;\oplus\; \bigoplus_{i \in \{1,\dots,\#T_z\},\, j(i) \in T_z} V_i z_{j(i)} = e, \qquad (3)$
where $T_x$ and $T_z$ are sets of indices related to the linear approximation, all $U_i$ and $V_i$ are $m \times w$ matrices over $\mathbb{F}_2$, both $x_{j(i)}$ and $z_{j(i)}$ are $w$-bit vectors, and $e$ is an $m$-bit noise vector. As in the BSC, the channel noise vector $e$ is XORed to $\bigoplus_{i,\, j(i) \in T_x} U_i x_{j(i)}$, and the output is $\bigoplus_{i,\, j(i) \in T_z} V_i z_{j(i)}$; see Fig. 3.1.
Remark 1. When discussing a generic multidimensional linear approximation, we can always obtain a linear approximation of the form $Ux' \oplus Vz'$, i.e., including only one input vector $x'$ and one output vector $z'$, for example by rewriting $x$ as a larger input vector of dimension $w \times \#T_x$; the rank of $U$ then becomes larger than that of the individual $U_i$. However, although we are interested in linear approximations with large dimension and large SEI, the SEI does not always increase sufficiently as the dimension increases. Thus we take a multidimensional linear approximation of the form (3) as the generic form.

Checking Parity with Vectorial Noise
Let $l \in H^{(n)}$ denote a specific parity-check $E x_n \oplus G_{l_1} x_{n-l_1} \oplus \cdots \oplus G_{l_\tau} x_{n-l_\tau} = 0$, where $E$ is the $w \times w$ identity matrix and all $G_{l_i}$ are $w \times w$ square matrices, which is the common case for an LFSR over the finite field $\mathbb{F}_{2^w}$ or the matrix ring $M_w(\mathbb{F}_2)$.
[Fig. 3.1: the LFSR output passes through the symmetric channel SC, where the noise $e$ is XORed in.]
In order to check parity over the matrix ring, we require that each mask commute through each coefficient, i.e., for each $U_i$ and $G_{l_j}$ there is a matrix $\bar{G}_{l_j}$ with $U_i G_{l_j} = \bar{G}_{l_j} U_i$, where $G_{l_j}$ and $\bar{G}_{l_j}$ are $w \times w$ and $m \times m$ matrices respectively. According to (3), we can sum the approximations over the taps of the parity-check, and check parity by substituting the terms $U_i x_{n+j(i)}, \dots, U_i x_{n-d+j(i)}$ with the observed values. Thus we obtain a check equation over the derived sequence, where $k(j) \in T_x$, $k'(j) \in T_z$, $G_0 = E$, and each $e_{n-i}$ is an $m$-bit noise vector. Consequently, the purpose is to determine $e_{n-i}$ at each position from the observed check values. This process can be carried out for all parity-checks in $H^{(n)}$. Notice that the approach here is generic; when the parity-checks and linear approximations have special forms, a more efficient checking approach is feasible, see Section 5.2. To describe the effect of these parity-checks, we divide them into two sets: $H_{\mathrm{I}}$ includes those parity-checks whose coefficients are all $E$, while $H_{\mathrm{II}}$ includes the rest. They are called type I and type II parity-checks respectively, and they play different roles in the iterative decoding phase.
Notice that, unlike the linear distinguishing attack in large alphabets [YJM20], there is no need for all $G_i = E$; such parity-checks are expected to have a very high degree. For example, the degree of the special weight-4 parity-checks of SNOW 3G is expected to be about $2^{172}$.

Iterative Process
In this subsection, we consider how to extract information from a noisy sequence by a vectorial iterative decoding algorithm. Firstly, we generalize the original Algorithm B; then we improve the iterative criteria.
Let $\#H^{(n)} = h$ denote the number of parity-checks with $\tau+1$ taps at position (or clock) $n$. Let $e_1 \dots e_N$ denote the sequence of noises, and $z'_1 \dots z'_N$ the sequence derived from the key stream $z_1 \dots z_N$ by $z'_n = \bigoplus_{i \in \{1,\dots,\#T_z\},\, j(i) \in T_z} V_i z_{j(i)}$. The initial prior distribution $P$ is the same for each $e_n$ and is derived from the linear approximation. Let $p = (p_0, \dots, p_{2^m-1})$ denote its density function; then the APP of each symbol is computed by Bayes' formula.

Algorithm 2 Calculate the numerator terms, i.e., the probabilities of the sums $\bigoplus_i G_{l_i} e_{n-l_i}$, by FWHT and the convolution property.

Algorithm 3 Vectorial iterative decoding
Input: The sequence $z'$ of length N derived from the key stream; the sequence of noises e with initial probability distribution p; the parity-check set H with τ+1 taps.
Parameters: Maximal rounds R, maximal iterations T, and minimal gap G to infuse new noises.
1. Initialize pri ← p for every position; E_glb ← 0.
2. For round r from 1 to R do
3.   E_rnd ← 0; ζ ← 0.
4.   For iteration t from 1 to T do
5.     E_itr ← 0.
6.     For each position n do
7.       Compute app_n by Bayes' formula (using Algorithm 2).
8.       Increase the empirical vector E_itr accordingly.
9.     EndFor
10.    If E_itr is still increasing then E_rnd ← E_itr, pri ← app; continue the iteration.
11.    Else break the iteration loop. EndIf
12.  EndFor
13.  If E_rnd is close to E_glb (gap smaller than G) then infuse an appropriate noise sequence and break the current loop.
14.  Else choose a coin ζ likely to correct more errors, complement the corresponding positions of $z'$, update E_glb, and break the current loop.
15.  If $z'$ satisfies all parity-checks then break; EndIf
16. EndFor
17. Terminate; deduce all $e_i = 0$ and recover the initial state of the LFSR from $z'$.
We always assume that the $e_{n-l_i}$ are independent and that all parity-checks are orthogonal. As the distribution of $\bigoplus_i G_{l_i} e_{n-l_i}$ can be calculated by the convolution property and FWHT, the numerator and denominator can be computed by Algorithm 2.
The vectorial iterative decoding algorithm is listed in Algorithm 3. In line 1, the prior distribution is initialized with the noise distribution $p$, and a global empirical vector $E_{\mathrm{glb}}$ is initialized with 0. In line 3 of the main loop, the round empirical vector $E_{\mathrm{rnd}}$ and the coin $\zeta$ are set to zero. In line 5 of the iteration loop, an iteration empirical vector $E_{\mathrm{itr}}$ is set to zero. For each symbol, we compute the APP and increase the empirical vector $E_{\mathrm{itr}}$; see lines 7 and 8. If $E_{\mathrm{itr}}$ is still increasing, then we assign $E_{\mathrm{rnd}} \leftarrow E_{\mathrm{itr}}$ and $pri \leftarrow app$, and continue the iteration; see line 10. Otherwise, if $E_{\mathrm{rnd}}$ is close to $E_{\mathrm{glb}}$, i.e., the algorithm may correct very few errors, we inject an appropriate noise sequence and break the current loop, which is a new criterion; see line 13. Otherwise, we choose a coin $\zeta$ that is likely to correct more errors and break the current loop in line 14. The complements in Algorithm 3 are applied to the derived sequence $z'$: the $n$-th position $z'_n$ is changed to $z'_n \oplus \zeta$ when the noise $e_n$ is determined to be $\zeta$ and the complement is performed. If $z'$ satisfies all parity-checks at the end, we deduce that all $e_i = 0$; then, with the help of the LFSR's feedback polynomial, the initial state of the LFSR can be recovered. The criteria used to break the iterative loop and trigger a resetting process are the main factors affecting the speed of convergence [CGD96, MG91].
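The core computation of Algorithm 2, the distribution of a sum $\bigoplus_i G_{l_i} e_{n-l_i}$ under the convolution property, can be sketched as follows for small $m$ (an illustrative implementation; rows of each $G$ are encoded as integers, and the function names are ours):

```python
import numpy as np

def fwht(v):
    """Unnormalized Walsh-Hadamard transform."""
    v = np.asarray(v, dtype=float).copy()
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            a = v[i:i + h].copy()
            b = v[i + h:i + 2 * h].copy()
            v[i:i + h] = a + b
            v[i + h:i + 2 * h] = a - b
        h *= 2
    return v

def apply_matrix(G, p, m):
    """Distribution of G*X over F_2^m when X ~ p; row r of G is the integer G[r]."""
    q = np.zeros(1 << m)
    for x in range(1 << m):
        y = 0
        for r in range(m):
            y |= (bin(G[r] & x).count('1') & 1) << r  # bit r of G*x
        q[y] += p[x]
    return q

def sum_distribution(Gs, p, m):
    """Distribution of XOR_i G_i * X_i for i.i.d. X_i ~ p (FWHT convolution)."""
    spec = np.ones(1 << m)
    for G in Gs:
        spec *= fwht(apply_matrix(G, p, m))
    return fwht(spec) / (1 << m)
```

The distributions produced this way give the numerator and denominator terms of the Bayes update for each candidate noise value.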

Iterative Criteria
The criteria used in the vectorial algorithm could be summarized in two points.
Criterion 1. Pass through sufficient iterations before breaking and resetting, which corresponds to lines 7-10 and 14. More specifically, if the new APP strengthens the empirical complement effect and the number of iterations is below the maximum, then continue iterating by Bayes' rule; otherwise, select the complement coin with the potentially largest empirical complement effect.
Criterion 2. When the empirical complement effect is weak from the previous round to the current round, a sequence of very biased noises is infused in order to break the tie caused by the self-combination property of the LFSR. The SEI of the infused noise is required to be appropriate: neither so small that it counteracts the previous decoding work, nor so large that the pull is too weak to break the tie.
These criteria are motivated by scaled experiments where the parity-checks are not so many, and they are proposed for different purposes. On the one hand, notice that if the loop is broken whenever a preset threshold is reached, as in Algorithm B, this is triggered more easily in the earlier rounds than in the later rounds. However, when a complement is performed very early, without passing through enough iterations, it pulls the algorithm into a tie state very early and weakens the decoding efficiency. A tie state means that the decoding algorithm has reached the point where further iterations fail to improve the correction of the key stream sequence. To improve this, Criterion 1 is proposed to avoid converging to a tie state too early; we hope it helps to correct as many errors as possible in each of the early rounds.
On the other hand, notice that once the algorithm has entered a tie state, i.e., the numbers of right complements and wrong complements are almost equal, the correcting effect is very weak. However, if we only reset the iterative process, the experiments show that the new process enters the tie state again without profit. The main reason is that the noise errors are no longer independent of the parity-checks in higher rounds. Therefore, a new sequence of biased noises is XORed to the sequence $z'$ to get out of the trap; we hope this improves the convergence to some extent.
Intuitively, a tie state is likely to appear when the algorithm is close to the bound of its decoding ability. Let $e = e_0, e_1, \dots, e_{N-1}$ denote the current noise sequence after many rounds. In higher rounds, the check results may indicate that $e_i$ is a swing error for many $i \in \{0, 1, \dots, N-1\}$, i.e., equally plausible to be 0 or $\zeta$; complementing those $e_i$ with $e_i \oplus \zeta$ brings no profit. However, injecting a noise sequence $n$ may weaken the dependency between $e$ and the parity-checks: the new sequence $e \oplus n$ may be far from the original $e$ in the sense of the check results. Thus it may pull the algorithm out of the tie state when neither the number of parity-checks nor the SEI of the initial noise is too small. We therefore require that the SEI of the infused noise is not too small, or it will counteract all the previous iterative correcting work. Meanwhile, the SEI of the infused noise should not be too large, or the pull will be too small to get out of the trap, as the changes to $e$ are not sufficient in this case. As for the binary case, i.e., when $e$ is a binary sequence, the difference is that a tie state may be one in which almost half of the parity-checks hold. Thereby, the idea may also be applied to some binary iterative algorithms; the experimental results in the next subsection provide further evidence.
Remark 2. Regardless of the differences in criteria, the original FCA proposed by Meier and Staffelbach can be treated as a special case of the new FCA with dimension $m = 1$. The coefficient matrices of the LFSR degenerate to scalar elements of $\mathbb{F}_2$; therefore, the commutativity condition for the coefficient matrices of parity-checks need not be considered. The multidimensional linear approximation degenerates to a binary linear approximation, and $\zeta$ must be 1.

Small Scale Experiments
In this section, we perform scaled experiments to verify the vectorial iterative algorithm and the idea of Criterion 2. The first scaled experiment shows the validity of the vectorial iterative algorithm; the second experiment shows that the idea of Criterion 2 may be applied to other binary iterative algorithms.
The settings of the first experiment are as follows. The generator polynomial of the LFSR is defined over $\mathbb{F}_{2^2}[x]$, where $\alpha$ is a primitive element of $\mathbb{F}_{2^2}$. The output of the LFSR at time $t$ is $x_t$. The noise stems from an SC channel instead of the nonlinear part of a stream cipher. The target is to recover the LFSR output sequence $x_1 x_2 \dots$
As stated in Section 3.3.2, the new Criterion 2 may help to improve some other binary algorithms. We take Algorithm B of [MS89] and the MIPD Algorithm of [CGD96] as examples and perform another scaled experiment. Thereby, we have 4 binary algorithms: Algorithm B, the MIPD Algorithm, and their modified versions with Criterion 2. The parameters of the experiments are as follows. The length of the LFSR over $\mathbb{F}_2$ is 32 bits. The noisy bit $z_t$ is derived by XORing the LFSR output $x_t$ with the noise bit $e_t$. The error probability is $\Pr(e_t = 1) = 0.378$ for Algorithm B and its modified version, and 0.371 for MIPD and its modified version. For both cases, we use 9 parity-checks with 3 taps and $2^{20}$ bits of data (key stream). For Algorithm B, we inject a noise sequence whenever $N_w$ is still very small after the iterations. For the MIPD Algorithm, we inject a noise sequence whenever the change in the number of satisfied parity-checks is very small. Figure 3 illustrates the bit-error ratio (BER) over the first 175 rounds for the 4 algorithms. Note that the way we embed Criterion 2 into the original algorithms may not be optimal; nevertheless, Criterion 2 still improves the convergence in both cases. The results imply that Criterion 2 may also work for some other binary algorithms when they approach the decoding boundary. It seems that Criterion 2 performs better in Algorithm B than in the MIPD Algorithm. The main reason is that the MIPD Algorithm performs the complements in each iteration, which means that the infused noises may be eliminated slowly; thus we injected a slight noise sequence. Algorithm B, in contrast, complements $z_t$ only after some iterations in each round, which means the infused noises may be cleared fast. Thereby, in order to make the improvement easier to illustrate in a figure, we chose a slightly smaller initial probability when comparing the MIPD Algorithm with its modified version.

A Discussion about the Potential Advantages
Although the new criteria may be effective, we cannot directly compare the vectorial algorithm with a binary algorithm, because it is hard to compare them on an equal footing. Firstly, the same number of vectorial parity-checks and binary parity-checks does not mean an equivalent comparison condition, and, as far as we know, it is hard to choose the two numbers so that the two algorithms start from the same line. Secondly, the corresponding measure in the binary case is the bit error ratio (BER) instead of the word error ratio (WER). The vectorial algorithm aims at a small WER, while the binary one aims at a small BER; a small BER does not strictly imply a small WER, and vice versa. These differences prevent us from giving a precise and fair comparison.
However, since the purpose of proposing the vectorial decoding algorithm is to deploy a multidimensional linear approximation, whose SEI may be significantly larger than that of a binary linear approximation, we can observe some cases where the vectorial algorithm may have advantages. For example, let the $l$-bit LFSR be defined over $\mathbb{F}_{2^m}$, where $l$ is a multiple of $m$, and let the SEI of the multidimensional linear approximation be $2^{-\gamma}$. Since the number of parity-checks should not be smaller than $2^m - 1$, i.e., equation (18) in Section 4.2.1, we need at least $2^{\gamma/2}(2^m - 1)$ parity-checks with 3 taps. Thus the length $N$ of data needed satisfies $(2^m - 1)^2\, 2^{-l} N^2 \approx 2^{\gamma/2}(2^m - 1)$ by a birthday argument. For the vectorial case, $N$ seems to be smaller than in the binary case, because $m > 1$ and $\gamma$ is expected to be smaller than in the binary case. For more details about these relationships, we refer to Section 4.2.1.
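To make the birthday-bound arithmetic concrete, the following sketch solves the relation above for $N$ under purely hypothetical parameters $l$, $m$, $\gamma$ (chosen only to illustrate the trend, not taken from any concrete cipher):

```python
from math import log2, sqrt

def data_length(l, m, gamma):
    """Solve (2^m - 1)^2 * 2^(-l) * N^2 = 2^(gamma/2) * (2^m - 1) for N,
    i.e., the keystream length needed to collect enough 3-tap parity-checks."""
    h = 2 ** (gamma / 2) * (2 ** m - 1)  # required number of parity-checks
    return sqrt(h * 2 ** l) / (2 ** m - 1)

# hypothetical parameters: same LFSR size, smaller gamma in the vectorial case
binary_N = data_length(l=128, m=1, gamma=40)
vectorial_N = data_length(l=128, m=4, gamma=30)
print(log2(binary_N), log2(vectorial_N))
```

With these toy numbers the vectorial setting needs less data, reflecting the claim that $m > 1$ together with a smaller $\gamma$ reduces $N$.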

Convergence Property
It is necessary to figure out the convergence property when iteratively computing the APP. Intuitively, we hope that the APP $p^{*(n)}_\zeta$ increases when the noise variable $e_n = \zeta$ and decreases when $e_n \neq \zeta$. Its expected value is computed as follows.
Example 1. Let the generator polynomial of the LFSR be $L(x) \in \mathbb{F}_{2^2}[x]$ with degree 16. We get the increasing and decreasing ratios in Table 1 when exploiting 3 type I parity-checks with 3 taps.

Decoding Efficiency
In Algorithm B, a threshold $N_{thr}$ is computed to promote the efficiency of the complements. It is determined by the intersection point of two shrunk normal distributions. In the multidimensional case, the intersection point becomes an intersection curve (or surface). The threshold reflects the correcting ability of the first iteration in the binary case. Although we do not need such a threshold to promote efficiency in the vectorial case, it still reflects the decoding efficiency of the first iteration. Thus, in this subsection, we discuss how to estimate the correcting ability by measuring the volume of the intersection area.
Let $N^{thr}_\zeta$ denote this threshold corresponding to $\zeta$. Without loss of generality, we assume that the prior probability distribution $P$ of the noise sequence $e_1 \dots e_N$ satisfies $p_0 \ge p_1 \ge \dots \ge p_{2^m-1} > 0$. For a random variable $X \sim P$, we require that the distribution of the new random variable $G_{l_i} X$ still has 0 as its maximal value point.¹ This surely holds when $G_{l_i}$ is nonsingular. The requirement may reduce the number of available parity-checks, but it simplifies the analysis of their effect.
Let $X_1, \dots, X_\tau$ denote $\tau$ independent random variables, all following $P$. Let $Q$ denote the distribution of their linear combination $\bigoplus_{i=1}^{\tau} G_{l_i} X_i$. Then $Q$ still has 0 as its maximal value point, which can be deduced from the convolution property and the Walsh-Hadamard transform. In particular, if all $G_{l_i} = E$, $Q$ preserves the order of $P$, i.e., $q_0 \ge q_1 \ge \dots \ge q_{2^m-1} > 0$.
The approach to calculating $N^{thr}_\zeta$ is inspired by the fact that $p^*_\zeta$ is large when more check values appear to be $\zeta$. Let $q_c = \Pr[\bigoplus_{i=1}^{\tau} G_{l_i} e_{n-l_i} = c]$ denote the probability that the $\tau$ taps sum to $c$ for the parity-check $l$. Obviously, $q_c$ depends on the individual parity-check, which makes calculating the threshold $N^{thr}_\zeta$ very complicated. To simplify the calculation, we divide all parity-checks into the two sets $H_{\mathrm{I}}$ and $H_{\mathrm{II}}$ according to their coefficients and deal with them separately.
The set $H_{\mathrm{I}}$ includes all parity-checks whose coefficients are all the identity. For this class, $q_c$ is obviously independent of the individual parity-check. Let $\#H_{\mathrm{I}} = h_{\mathrm{I}}$; the probability that the current noise is $e = \zeta$ and that $x_i$ check values equal $i$, $i \in \{0, \dots, 2^m-1\}$, is as follows.²

¹ The minimal value point case is similar; we then assume that $p_0$ is minimal instead.
² Actually, the check values are vectors in $\mathbb{F}_2^m$; here we use the integers $i$ to denote the same thing.

Obviously, $x = (x_0, \dots, x_{2^m-1})$ follows the multinomial distribution $\mathrm{Multi}(h_{\mathrm{I}}, q_\zeta)$ with parameter $q_\zeta = (q_\zeta, \dots, q_{2^m-1 \oplus \zeta})$; its density function is denoted by $q(x, \zeta)$. For convenience, we introduce some further notation. Let $A(\zeta)$ be a subset of all possible $x$. Once we complement those noises with $\zeta$ when the vectors in $A(\zeta)$ are observed, the expected numbers of correctly complemented noises and erroneously complemented noises follow, where $N$ denotes the length of the data; all the other cases of complements are neutral. Thereby, the number of actually corrected positions is the difference $I(P, A(\zeta), \zeta, 0)$. Given $P$ and $H_{\mathrm{I}}$, if we can find a set $A(\zeta)$ maximizing $I(P, A(\zeta), \zeta, 0)$, then the expected number of actually corrected positions per complement is maximized. Firstly, we observe that the means of the two multinomial distributions are $h_{\mathrm{I}} q_\zeta$ and $h_{\mathrm{I}} q_0$ respectively. Therefore, as in the binomial case, there is a set $A(\zeta)$ of $x$ on which $I(P, A(\zeta), \zeta, 0)$ takes a non-negative value.
Given x, these quantities can be computed exactly, though the time complexity of doing so grows quickly with h_I and 2^m. When h_I is large and q is not near the boundary of the parameter space, a multivariate normal approximation is suitable: Multi(h_I, q) can be approximated by N(µ, Σ) with density proportional to exp(−(x − µ)^T Σ^{−1} (x − µ)/2), where the superscript T denotes transposition, and the mean vector µ and covariance matrix Σ are determined by Multi(h_I, q). Therefore, the area A(ζ) maximizing the multiple integral should be part of a hypercube of dimension 2^m − 2, restricted by the 2^m − 1 coordinate planes and the two surfaces Ω_1 and Ω_2. Notice that Ω_2 is a quadratic form over the real field, so the multiple integral (12) can be computed as a repeated integral. Once A(ζ) is determined, the threshold can be calculated by a volume integral.

Example 3. Let the probability distribution P and the LFSR be the same as in Example 1.
To illustrate this multivariate normal approximation, I(P, A(1), 1, 0) is computed by both methods and depicted in Table 2. To simplify the integral, we can even slightly adjust the boundary of A without affecting the result much.
When the parity-checks stem from H_II = H \ H_I, q_c depends on the individual parity-check. Thus, when the probability value peak is q_0, we introduce a symmetric multinomial distribution Q′ to simulate the influence of type II parity-checks. Then the calculation is similar to that for H_I. According to the sizes of H_I and H_II, we can estimate N^{thr}_ζ by combining H_I and H_II together; the multinomial distribution is replaced by Multi(h_I, q_ζ)·Multi(h_II, q′_ζ) in this case. We also give some direct properties from the point of view of information theory in Appendix B, which may imply some relationships among the decoding efficiency, the initial noise distribution, and the number of parity-checks.

Two Bounds Related to Cryptanalysis Complexity
As the number of parity-checks h influences the decoding complexity, we focus on the property of the first iteration of the first round, which appears to be the critical part according to the previous section, and discuss how to deduce theoretical bounds for h as well as for the key stream length N.

A Bound Derived from Decoding Codes
Similarly to Proposition 1 in [CS91], in order to perform error-correcting iterative decoding, the lower bound on h should ensure that there exists at least one ζ such that p*_ζ > p*_0. This is summarized as follows.
Particularly, when the probability values of P and Q (or Q′) are ordered as stated before, and all values of the parity-checks are ζ, we obviously have

Table 4: Two probability distributions P and P′

Remark 3. Though the ratio η(ζ, 0) takes a large value when all check values are ζ, the lower bound for h given in Proposition 1 may be loose, as the probability that all check values are ζ is small.
A lower bound for N can be derived through Proposition 1. For example, when the generator polynomial L(x) ∈ F_{2^m}[x], the number of parity-checks h and the key stream length N shall satisfy N^τ(2^m − 1)^τ ≈ h·2^k. As an application of Proposition 1, we consider an example of two special probability distributions. Since, when ∆(e) = 2^{−γ}, it is expected that there are probability values around 2^{−m} ± 2^{−(2m+γ)/2} in practice [YJM20], the distributions P and P′ in Table 4 may be useful, where ε denotes 2^{−(2m+γ)/2}/(2^m − 1)[3]. When 2^{−γ/2} is small relative to 1, by Taylor's formula, we obtain the approximation. Furthermore, by the convolution property, when each parity-check has τ + 1 taps, τ ≥ 2, we obtain the convolved distribution; hence, by Proposition 1, the required number of type I parity-checks for P follows. For the case of P′, the general term formula of the distribution convolutions can be deduced from its recursion formula, i.e.,

Thus we have
Notice that type I and type II parity-checks are not distinguished in the case of P′. The FCA mainly benefits from the increased SEI. More specifically, according to Theorem 1, there are 2^m − 1 binary linear approximations contributing to the SEI of a linear approximation of dimension m. Notice that there are other distributions, e.g., P with

A Bound Derived from the Practical Corrected Errors
In this part, we discuss how to deduce a bound from the number of expected positions with p*_ζ > p*_0, ζ ≠ 0. Let us consider the sets A(i), i ∈ {1, 2, …, 2^m − 1}, for multinomial distributions. Since the A(i) may intersect with each other, the way of computing the threshold in Section 4.1.2 cannot be directly applied. Thereby, we introduce new sets A′(i), that is, A(i) excluding all elements that are included in the previous sets A(j), 1 ≤ j < i. Let M_i denote the summation of the probability values over the set A′(i). It is reasonable to require the condition below; then the succeeding iterations may trigger more positions with p*_ζ > p*_0. This phenomenon may be the main advantage of soft-decision decoding algorithms.
Summing up the probability values of multinomial distributions is inconvenient. Though the multivariate normal approximation could also be used as before when h is large, the integral may not be easy to evaluate in practice, since the integration area A′(ζ) is very complicated. Since the symmetric distribution Q′ simulates the iterative process very well, we can deduce boundaries for A′(ζ) using Multi(h, q′). The following results show how to estimate M_ζ in this case.
Proposition 2. For the multinomial probability distribution Multi(h, q′), we have

Particularly, when ∑_{i=0}^{ζ} q_{i⊕ζ} is small and h·q_i ≤ h_b, the expected number of positions with p*_ζ > p*_0 in the first iteration is dominated by those with small l.
By Proposition 1, we deduce that there is a minimal positive integer h_b with this property. When h is not small and ∑_{i=0}^{ζ} q_{i⊕ζ} is not high, the multidimensional distribution Multi(h, q_ζ) can be approximated by ζ + 1 independent Poisson distributions with means λ_{i⊕ζ} = h·q_{i⊕ζ}. As λ_i ≤ h_b, the maximal value of Pr(X = x) is attained when ∑_{i=0}^{ζ} x_i is small, i.e., when l is small.
Proposition 2 gives us a hint that the values corresponding to small l dominate M_i. When ζ is not very large, M_ζ can be approximated by a partial summation over the small l close to the boundary. Obviously, the M_i, i ≠ 0, form a monotone non-increasing sequence.
When ζ = 1, there is another elegant way to estimate M_1 via the Skellam distribution. Let Y_0 ∼ Pois(λ_1) and Y_1 ∼ Pois(λ_0); then their difference K = Y_1 − Y_0 follows the Skellam distribution with probability density function Pr[K = k] = e^{−(λ_0+λ_1)} (λ_0/λ_1)^{k/2} I_{|k|}(2√(λ_0·λ_1)),
where I_{|k|} is the modified Bessel function of the first kind. Obviously,
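The two estimation methods for M_1, truncated summation over small counts and the Skellam form, can be cross-checked numerically. A sketch with hypothetical means λ_0 = h·q_0 and λ_1 = h·q_1, taking, as a simplifying reading, M_1 ≈ Pr[K > 0]:

```python
from math import exp, factorial, sqrt

def pois_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

def bessel_i(n, x, terms=60):
    """Modified Bessel function of the first kind I_n(x), by series expansion."""
    return sum((x / 2) ** (2 * j + n) / (factorial(j) * factorial(j + n))
               for j in range(terms))

lam0, lam1 = 3.0, 1.2  # hypothetical means lam_i = h * q_i

# Method 1: truncated double summation over small counts (small l dominates)
m1_sum = sum(pois_pmf(y1, lam0) * pois_pmf(y0, lam1)
             for y1 in range(60) for y0 in range(y1))  # y1 > y0, i.e. K > 0

# Method 2: Skellam pmf Pr[K=k] = e^{-(lam0+lam1)} (lam0/lam1)^{k/2} I_|k|(2 sqrt(lam0*lam1))
def skellam_pmf(k):
    return (exp(-(lam0 + lam1)) * (lam0 / lam1) ** (k / 2)
            * bessel_i(abs(k), 2 * sqrt(lam0 * lam1)))

m1_skellam = sum(skellam_pmf(k) for k in range(1, 60))
assert abs(m1_sum - m1_skellam) < 1e-9
```

Both truncations agree to high precision, mirroring the closeness of the two estimates reported for Table 5.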

Application to Grain-128a
In this section, we apply our new techniques to the stream cipher Grain-128a. We assume the cryptanalysis is performed in the known-plaintext scenario. Since the output is directly used as the key stream and the plaintext never participates in updating the internal state, this assumption is reasonable for Grain-128a.

A Brief Description of Grain-128a
Grain-128a consists of a 128-bit LFSR cascaded with a 128-bit NFSR. Let s(t) = (s_t, s_{t+1}, …, s_{t+127}) and b(t) = (b_t, b_{t+1}, …, b_{t+127}) denote their internal states at time t. The output y_t of the pre-output function at time t is a sum of state bits and the nonlinear filter h(s(t), b(t)), and the feedback bits of the LFSR and NFSR are computed by their respective update functions. The key stream bit z_t = y_t in the stream cipher mode, while z_t = y_{2w+2t} in the authenticated mode, where w is the tag size. The overall structure of Grain-128a is depicted in Fig. 5.
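For reference, pre-output generation can be sketched as follows. The tap positions are our reading of the specification [ÅHJM11] and should be verified against it; this is an illustrative model, not a validated implementation:

```python
def grain128a_preoutput(s, b, n):
    """Sketch of Grain-128a pre-output generation (stream-cipher mode, z_t = y_t).
    Tap positions follow our reading of [AHJM11]; verify before relying on them."""
    s, b = list(s), list(b)  # 128-bit LFSR and NFSR states
    out = []
    for _ in range(n):
        # nonlinear filter h on 9 state bits
        h = (b[12] & s[8]) ^ (s[13] & s[20]) ^ (b[95] & s[42]) \
            ^ (s[60] & s[79]) ^ (b[12] & b[95] & s[94])
        y = h ^ s[93]
        for j in (2, 15, 36, 45, 64, 73, 89):  # linear NFSR taps
            y ^= b[j]
        out.append(y)
        # LFSR feedback (linear) and NFSR feedback (nonlinear, fed by s[0])
        fs = s[0] ^ s[7] ^ s[38] ^ s[70] ^ s[81] ^ s[96]
        fb = (s[0] ^ b[0] ^ b[26] ^ b[56] ^ b[91] ^ b[96]
              ^ (b[3] & b[67]) ^ (b[11] & b[13]) ^ (b[17] & b[18])
              ^ (b[27] & b[59]) ^ (b[40] & b[48]) ^ (b[61] & b[65])
              ^ (b[68] & b[84]) ^ (b[22] & b[24] & b[25])
              ^ (b[70] & b[78] & b[82]) ^ (b[88] & b[92] & b[93] & b[95]))
        s = s[1:] + [fs]
        b = b[1:] + [fb]
    return out

z = grain128a_preoutput([1] + [0] * 127, [0] * 128, 64)
assert len(z) == 64 and set(z) <= {0, 1}
```

In the authenticated mode one would instead keep only y_{2w+2t}; the state update is identical.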

Constructing Multidimensional Linear Approximations and Checking Parity
In [TIM+18], the authors proposed a family of linear approximations of Grain-128a by piling up different clocks to eliminate the linear terms of the NFSR, whose forms are given below. According to [TIM+18], an assignment of Λ_i[1-3] and Λ_i[5-8] completely determines the correlation of the h function once Λ_i[0, 4] is fixed. For a specific i ∈ T_z, there are only 64 possible Λ_i[0, 4], i ∈ A, such that the correlation of Eq. (21) is nonzero. Hence, the linear correlation of (21) can be deduced by summing over all these 64 values of Λ_i[0, 4], i ∈ T_z. Meanwhile, there are 2^6 values of Λ_i[1-3, 5-8] for a specific i ∈ T_z for which the correlation of the h function is nonzero. For example, when Λ_i[1-3, 5-8] = 0000000 for all i ∈ T_z, the correlation of (21) is about ±2^{−57.0454}. For more details of these linear approximations, we refer to [TIM+18].
In this paper, we reuse these linear approximations, but in a new way, by bundling them up. Firstly, we choose 42 linear approximations whose Λ_i[1-3, 5-8], i ∈ T_z, have the form (Λ_0[1-3, 5-8], Λ_26[1-3, 5-8], …, Λ_128[1-3, 5-8]) = (0, …, 0, 1, 0, …, 0), i.e., the Λ_i[1-3, 5-8], i ∈ T_z, form a group of standard basis vectors. Then a linear approximation of dimension 9 ≤ m ≤ 42 can be established as follows, where E is the m × m identity matrix over F_2 and e_t is the noise vector. Any linear combination of Eq. (22) with even Hamming weight generates a linear approximation without ∑_{i∈A∪T_z} s_{t+i} and ∑_{i∈T_z} y_{t+i}, whose correlation would be treated as 0. As for linear combinations of odd weight, it is still required that none of the Λ_i[1-3, 5-8], i ∈ T_z, induces a zero correlation for the h function. Therefore, we can construct a multidimensional linear approximation of dimension 9 ≤ m ≤ 42, which consists of 2^{m−1−6} = 2^{m−7} linear approximations with correlation ±2^{−57.0454}. By Theorem 1, its SEI is ∆(e_t) = 2^{m−121.0908}. As s_t is an m-sequence, the shift-and-sum sequence s′_{t+c_j} = s_{t+c_j} + ∑_{i∈A∪T_z} s_{t+i} is also an m-sequence with the same generator polynomial as s_t. Let the vectorial sequence be x_t = (s′_{t+c_1}, …, s′_{t+c_m}); since the shift offsets c_j, 1 ≤ j ≤ m, have large differences, the parity-checks with τ = 1 are not all ruled out. Since x_t runs over F_2^m \ {0}, there is at most one parity-check with τ = 1 for each 0 < n ≤ N/m. In order to increase the occurrence probability of parity-checks with τ = 1, several redundant binary linear approximations with nonzero correlation could be added into the subspace. The dimension increases but the SEI is almost unchanged; therefore, the maximal probability value should decrease.
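The SEI value follows from Theorem 1 by summing the squared correlations of the 2^{m−7} component approximations, each of correlation ±2^{−57.0454}. A one-line arithmetic check:

```python
from math import log2

m = 42
c = -57.0454             # log2 of the absolute correlation of each component
n_approx = 2 ** (m - 7)  # number of component approximations with nonzero correlation
sei_log2 = log2(n_approx) + 2 * c  # SEI = sum of squared correlations
assert abs(sei_log2 - (m - 121.0908)) < 1e-9  # i.e. Delta(e_t) = 2^{m-121.0908}
```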
Another way is to exploit a kind of special parity-check with τ > 1. In order to avoid a great loss of SEI when computing convolutions, we play a trade-off trick when special parity-checks are feasible. For example, suppose we have h special parity-checks as follows.
Notice that all of them involve the vector variables x_t, x_{t−d_1}, …, x_{t−d_a} except for the last variable x_{t−d_{n,j}}. Let D_{n−i,j} = G_{n−i,j} + G_{n−i,1}, 1 ≤ i ≤ a, denote the coefficient difference between the j-th and the 1st equation, and let ∑_{i=1}^{a} D_{n−i,j} x_{t−d_i} = δ_j denote the difference value. Moreover, we require that the δ_j satisfy some restrictions.
Since we have h − 1 groups of linear equations with coefficients (D_{n−1,j}, …, D_{n−a,j}), we require that those groups have the same solution subspace S of large dimension, for example am − 1 or am − 2, which implies that the rank of (D_{n−1,j}, …, D_{n−a,j}) may be 1 or 2. Thus, when (x_t, x_{t−d_1}, …, x_{t−d_a}) ∈ S, all δ_j = 0; otherwise, the δ_j ≠ 0 are likely different. Thus we have the relation below. Then the APP for ∑_{i=1}^{a} G_{n−i,1} e_{t−d_i} + E·e_t can be evaluated by the total probability theorem according to whether all of the δ_j are 0. The initial state is recovered from the observed values z_t of the error-corrected positions. The dimension of the linear approximation is not changed, but the APP converges more slowly. Thus the decoding ability decreases as the dimension of S decreases; however, the constraints on the parity-checks are relaxed.
With these techniques, a fast correlation attack can be performed using these special parity-checks and the multidimensional linear approximations in (22).

Complexity Estimation
In this section, we estimate some theoretical bounds for Grain-128a, which bring us a new perspective on its security margin.
Let the SEI ∆(e_t) = 2^{−γ} and the dimension m = 42. According to the results in [YJM20], we can assume that p_0 = 2^{−m} + 2^{−(2m+γ)/2} is the maximal probability value. This implies that all other points are very close to 2^{−m}; thereby, the probability distribution P stemming from the SEI is close to the symmetric distribution. To simplify the process of estimating the expected number of positions with p*_ζ > p*_0, we need the following hypothesis.

Hypothesis 1. There are at least 2 parity-checks with two taps, or there are more special parity-checks as stated in the previous section.
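As an aside, the [YJM20]-based assumption above can be sanity-checked numerically: if all probability values sit at 2^{−m} ± 2^{−(2m+γ)/2}, half of each sign so that they sum to 1, then the SEI ∆ = 2^m ∑_x (p_x − 2^{−m})^2 is exactly 2^{−γ}. A toy check with small hypothetical parameters (the actual attack uses m = 42):

```python
# Illustrate the [YJM20] estimate: deviations of +/- 2^{-(2m+gamma)/2} at every
# point give SEI = 2^m * sum_x (p_x - 2^-m)^2 = 2^-gamma exactly.
m, gamma = 4, 6                    # small hypothetical parameters
d = 2.0 ** (-(2 * m + gamma) / 2)  # per-point deviation
half = 2 ** (m - 1)
P = [2.0 ** -m + d] * half + [2.0 ** -m - d] * half  # balanced so P sums to 1
assert abs(sum(P) - 1.0) < 1e-12
sei = 2 ** m * sum((p - 2.0 ** -m) ** 2 for p in P)
assert abs(sei - 2.0 ** -gamma) < 1e-12
```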
Suppose we have h special parity-checks corresponding to a solution subspace of dimension am − 1 as stated above. Let v_1, …, v_h denote the check values, and let γ = (γ_0, …, γ_{2^m−1}) and γ′ = (γ′_0, …, γ′_{2^m−1}) denote the frequencies of the values in v_1, …, v_h and in v_1, v_2 ⊕ δ_2, …, v_h ⊕ δ_h, respectively. There are two events that may induce p*_ζ > p*_0: event A denotes that γ ∈ A′(ζ), while event B denotes that γ′ ∈ A′(ζ). For simplicity, we only consider event A; when A occurs, we have the expression below. The first term denotes the probability that the current noise symbol is ζ or 0 when the frequency vector γ ∈ A′(ζ) and all δ_j = 0. The second term corresponds to the case when the frequency vector γ ∈ A′(ζ) but many δ_j ≠ 0, 2 ≤ j ≤ h, so that the observed vector is γ′. Since the γ′_i are likely different, it is reasonable to assume that the second terms of M_ζ and M_0 are close; to simplify the evaluation, we only consider the first term. Table 5 in Appendix A depicts the approximation of M_ζ (the factor 1/2 is neglected). M_1 is estimated by two methods, the Skellam distribution and the summation over small l, and the two estimates are very close to each other. Let D_i = M_i − M_0 denote the difference. We also compute the summation ∑_{i=1}^{2^{36}} M_i and the difference summation ∑_{i=1}^{2^{36}} D_i. For example, when h = h_b = 2, the expected key stream length is N > 2^{48+42+1} = 2^{91}. As P is symmetric, there seems to be no need to evaluate every probability value of the APP distribution; therefore, we use the key stream length N multiplied by the number of parity-checks h as the time complexity. For the other case, when there are at least 2 parity-checks with two taps, there is no probability loss caused by the trade-off, and the complexity estimation is similar.

Discussion and Open Problems
The analysis of the vectorial iterative decoding algorithm is very complicated, and several open problems still need further study.
Firstly, it seems we cannot directly compare the vectorial decoding algorithm with a binary algorithm. The main reason is that we do not know how to put them on the same starting line, e.g., how many parity-checks give them equal status. Although we point out potential theoretical advantages for some cases in Section 3.4, the generic case remains an open problem.
Secondly, other theoretical properties of the vectorial algorithm are still unclear. For example, the time complexity is estimated as the key stream length multiplied by the number of parity-checks, which involves many redundant computations; however, we do not know whether the FWHT acceleration technique can be applied in this case.
Thirdly, the main difficulties are determining the existence of the special parity-checks and proposing an efficient algorithm to generate suitable parity-checks over matrix rings.

A Estimation of M_i
Table 5 depicts some estimated values related to the data complexity.

• Assume that the sequence x is periodic. If there are two special parity-checks as stated in Section 5.2, let Gx_t + … + G′x_{t+d} = 0 denote their sum; then both the head G and the tail G′ must be invertible.
Proof. 1. Let Gx_t + Ex_{t+d} = 0 be a parity-check. Since the i-th rows of A and E form a check polynomial f_i with nonzero constant term for x_t, we have f | f_i. As G is nonsingular, there must be two different check polynomials f_i(x) and f_j(x). That means f_i + f_j also forms a check polynomial; but c_m − c_1 + m − 1 < k would then mean a polynomial of degree less than k could be deduced, which is impossible.
2. When d_1 = d_2, it follows that (G + G′)x_t = 0 for all x_t; as x_t runs over all values in F_2^m \ {0}, we get G = G′. When d_1 < d_2, we can deduce another linearly dependent parity-check. Therefore, according to the Euclidean division algorithm, there is a G* which satisfies G*x_t + Ex_{t+gcd(d_1,d_2)} = 0.
Since there are k information bits in the LFSR, we have gcd(d_1, d_2) ≥ k − m.
3. As the sum of two different parity-checks still satisfies the sequence x, and x is periodic, both G and G′ must be invertible.
These observations imply that parity-checks with 2 taps may be rare, but that does not mean there are none, even though the key stream length needed may be large. For example, this happens when all (x_{t+c_1}, x_{t+c_2}, …, x_{t+c_m}) lie only in a subspace of F_2^m and c_m − c_1 + m − 1 is large. Once one parity-check is found, more can be constructed by shifting and adding parity-checks together.
Alternatively, parity-checks with 2 taps may be indirectly constructed in some very special cases. For example, assume that we can find a parity-check with 4 taps of the form given below. Moreover, if it happens that the multidimensional linear approximation has the form U·x_t ⊕ U·G·x_{t−a} ⊕ ∑_{i∈{1,…,#T_z}, j(i)∈T_z} …, then we have a parity-check with 2 taps, E·y_t + G′·y_{t−b−a} = 0, for the new sequence y_t = E·x_t + G·x_{t−a}. When x is a binary m-sequence with period 2^n − 1, the above 4-tap parity-check with small b + 2a does not exist when gcd(a, 2^n − 1) = 1. The reason is that b must be a multiple of the period of the sequence y, which is also 2^n − 1. Thus, if there is a low-degree one, then at least gcd(a, 2^n − 1) is large.
Besides that, we have the following observations. If a parity-check satisfies the sequence x, then its characteristic polynomial F_n(x) ∈ F_2[x] has f(x) as a factor. Since G_n = G and G_1 = … = G_{n−1} = 0, we have F_n(x) = det(E·x^n + G), which determines the number of choices for the matrix G and n. Since F(x) is the characteristic polynomial of an invertible matrix G, the number of different F(x) is 2^{m−1}. Suppose that F(x) = f_1^{n_1} ⋯ f_v^{n_v}, where the f_i are distinct irreducible polynomials of degree d_i; it has been proved that the number of G with given F(x) is θ(F(x)) [Ger61], i.e.,
We also know that F_{n_1}(x) = F_{n_2}(x)^{2^i} for some i > 0 when n_1 and n_2 are in the same 2-cyclotomic coset C_n modulo ord(f) = 2^k − 1. The size of the set F = {F_n(x) : 1 < nm < N} is bounded by N/(km) < #F ≤ ∑_{d|k} μ(d) ∑_{i=1}^{k/d} 2^i, where μ(·) is the Möbius function. For the case τ ≥ 2, there are about (N/m − 1)·|GL_m(F_2)|·(2^{m^2} − 1) choices for the two coefficients and n. An upper bound on #S is the number of conjugacy classes of T in GL_{nm}(F_2), which is roughly 2^{nm} − ∑_{i=⌈nm/3⌉}^{⌊(nm−1)/2⌋} 2^i. We believe it is much larger than (2^m − 1)^2 when L(x) ∈ F_{2^m}[x].

Figure 2: Several vectorial iterative decoding curves of scaled experiments

Figure 3: An example for comparing Criterion 2

collision, which means N ≈ 2^{(γ+2l+2)/4}/√(2^m − 1). When m = 1, N ≈ 2^{(γ+2l+2)/4}. For the vectorial case, N seems to be smaller than in the binary case, because m > 1 and γ is expected to be smaller than in the binary case. For more details about these relationships, we refer to Section 4.2.1.

Figure 4: An example for the difference distribution

Example 2. Let the initial distribution P and the LFSR be the same as in Example 1, and h_I = 15. The difference δ(ζ, 0) is illustrated in Fig. 4. The green circles denote negative δ(ζ, 0), while the red circles denote positive δ(ζ, 0); the size of a circle represents the relative value of |δ(ζ, 0)|. The non-negative and the negative areas are separated. The region A(ζ) corresponds to those x which give red circles.

Table 1: An example of the increasing and decreasing ratios. The second row is the a priori probability distribution P. E_0[p*]/p* and E_1[p*]/p* denote the increasing and decreasing ratios; particularly, E_0/p* and E_1/p* denote the case of only considering the number of holding parity-checks. Both cases meet our expectations.

Table 3: Theoretical and empirical values of N^{thr}_ζ/N

To verify the validity of these approximations, with the same P and LFSR as in Example 1, we compute the theoretical ratio N^{thr}_ζ/N and the empirical ratio given by the proportion of positions where p*_ζ > p*_0. Table 3 shows that our estimations are very precise.

Table 5: Estimation of some M_i with m = 42