Statistical Model of Correlation Difference and Related-Key Linear Cryptanalysis

The goal of this work is to propose a related-key model for linear cryptanalysis. We start by giving the mean and variance of the difference of sampled correlations of two Boolean functions when the same sample of inputs is used to compute both correlations. This result is further extended to determine the mean and variance of the difference of correlations of a pair of Boolean functions taken over a random data sample of fixed size and over a random pair of Boolean functions. We use the properties of the multinomial distribution to achieve these results without independence assumptions. Using the multivariate normal approximation of the multinomial distribution, we obtain that the distribution of the difference of related-key correlations is approximately normal. This result is then applied to existing related-key cryptanalyses. We obtain more accurate right-key and wrong-key distributions and remove artificial assumptions about the independence of sampled correlations. We extend this study to multiple linear approximations and propose a $\chi^2$-type statistic, which is proven to be $\chi^2$ distributed if the linear approximations are independent. We further examine this statistic for multidimensional linear approximations and discuss why removing the assumption about the independence of linear approximations does not work in the related-key setting the same way as in the single-key setting.


Introduction
Linear cryptanalysis is one of the main standard statistical methods for analysing the strength of a symmetric-key block cipher. It is mostly used in the single-key setting. Applications in the related-key setting are much rarer than, for example, in differential cryptanalysis. The only works so far seem to be [RN13] and [BBR+13]. They consider the difference of correlations under a fixed difference between related keys. Linear cryptanalysis exploits biased linear expressions computed from cipher data, called linear approximations. Given a sample of plaintext-ciphertext pairs, the cryptanalyst computes the sampled correlation of the linear approximation. Statistical models of sampled correlations are needed to determine the data requirements of the attack.
Statistical distributions of sampled correlations of linear approximations of block ciphers are well established in the single-key setting, see e.g. [BN17]. The goal of this paper is to derive statistical distributions of the difference of the sampled correlations of Boolean functions. Such differences of correlations emerge in related-key linear cryptanalysis when the correlation of a linear approximation of a block cipher is analysed for two different keys. In previous works mentioned above, the distributions are modelled under the assumption that the sampled correlations computed for two different keys are statistically independent. Considering the fact that the related-key cryptanalysis exploits some nonrandom behaviour of a block cipher that becomes observable when analysing data obtained from the cipher with two different keys, the assumption about statistical independence is somewhat contradictory.
Another approach to arguing for independence has been to use two independent data samples to compute the correlations [RN13]. In this paper, we establish the distribution of the correlation difference when the two correlations are computed for the same set of known plaintexts.
Our contributions. The main technical result of this paper gives the mean and the variance of the difference of sampled correlations of a given pair of Boolean functions when the same sample of inputs is used to compute both correlations. This result is further extended to determine the mean and variance of the difference of correlations of a pair of Boolean functions taken over a random data sample of fixed size and over a random pair of Boolean functions. We use properties of the multinomial distribution to achieve these results without independence assumptions on the pair of functions. Using the multivariate normal approximation of the multinomial distribution, we obtain that the distribution of the correlation difference is approximately normal.
We then discuss the impact of this result on the existing works on related-key cryptanalysis [RN13] and [BBR+13]. We propose a statistic for analysing the difference of sampled correlations of a set of linear approximations applied to the cipher with two different keys. In particular, we establish the distributions of this statistic for KDIB cryptanalysis for both right and wrong keys without assuming independence of the sampled correlations computed for related keys.
While the new wrong-key model is essentially the same as the one derived under the independence assumption, the right-key model is more detailed and may potentially lead to improvements in practical applications.

Sampling of the Difference of Correlations
Let $f$ and $f'$ be two Boolean functions from $\mathbb{F}_2^n$ to $\mathbb{F}_2$ with correlations $\mathrm{cor}(f) = 2p - 1$ and $\mathrm{cor}(f') = 2p' - 1$, respectively, where we denote by $p$ and $p'$ the probabilities that $f$ and $f'$ take the value 0.
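A random sample $S \subset \mathbb{F}_2^n$ of size $N$ is composed of triplets $(x, f(x), f'(x))$ where each $x \in S$ is picked equiprobably from $\mathbb{F}_2^n$. The sampling is done either with or without replacement. In our applications, the difference between these sampling methods is well captured by the finite population correction factor $\frac{2^n - N}{2^n - 1}$, see e.g. [RT89]: the variances of the binomial and hypergeometric distributions differ by this factor. When considering these two alternative sampling methods in parallel, we will use the following constant:
$$B = \begin{cases} 1, & \text{sampling with replacement},\\ \dfrac{2^n - N}{2^n - 1}, & \text{sampling without replacement}. \end{cases} \qquad (1)$$
Let $S \subset \mathbb{F}_2^n$ be a sample of inputs to $f$ and $f'$ of size $N$. We denote by
$$X_{\gamma,\delta} = \#\{x \in S \mid f(x) = \gamma,\ f'(x) = \delta\}, \quad \gamma, \delta \in \mathbb{F}_2,$$
the value distribution of the pairs $(f(x), f'(x))$, $x \in S$, and by
$$\widehat{\mathrm{cor}}(f) = \tfrac{1}{N}\left(X_{0,0} + X_{0,1} - X_{1,0} - X_{1,1}\right) \quad\text{and}\quad \widehat{\mathrm{cor}}(f') = \tfrac{1}{N}\left(X_{0,0} - X_{0,1} + X_{1,0} - X_{1,1}\right)$$
the sampled correlations of $f$ and $f'$. The next theorem gives the parameters of the probability distribution of the difference of the sampled correlations over a random sample. We denote by $\hat c$ the difference $\widehat{\mathrm{cor}}(f) - \widehat{\mathrm{cor}}(f')$, and by $c = \mathrm{cor}(f) - \mathrm{cor}(f')$ the corresponding difference computed over all of $\mathbb{F}_2^n$.
Theorem 1. Let $q = \Pr_x\left(f(x) \neq f'(x)\right)$. Over a random sample $S$ of size $N$, the mean of $\hat c$ is equal to $c = 2(p - p')$ and its variance is equal to
$$\mathrm{Var}(\hat c) = \frac{4B}{N}\left(q - (p - p')^2\right). \qquad (2)$$
Proof. Let
$$p_{\gamma,\delta} = \Pr_x\left(f(x) = \gamma,\ f'(x) = \delta\right), \quad \gamma, \delta \in \mathbb{F}_2. \qquad (3)$$
Then $p = p_{0,0} + p_{0,1}$, $p' = p_{0,0} + p_{1,0}$ and $q = p_{0,1} + p_{1,0}$. Moreover, $\hat c = \frac{2}{N}\left(X_{0,1} - X_{1,0}\right)$. If sampling is done with replacement, then $(X_{0,0}, X_{0,1}, X_{1,0}, X_{1,1})$ follows the multinomial distribution with parameters $N$ and $(p_{0,0}, p_{0,1}, p_{1,0}, p_{1,1})$. Hence the mean of $\hat c$ is $2(p_{0,1} - p_{1,0}) = 2(p - p') = c$, and
$$\mathrm{Var}\left(X_{0,1} - X_{1,0}\right) = N p_{0,1}(1 - p_{0,1}) + N p_{1,0}(1 - p_{1,0}) + 2N p_{0,1} p_{1,0} = N\left(q - (p - p')^2\right).$$
From this we get that the variance of $\hat c$ is equal to $\frac{4}{N}\left(q - (p - p')^2\right)$, which was the claim in case the sampling is done with replacement. If sampling is done without replacement, then $(X_{0,0}, X_{0,1}, X_{1,0}, X_{1,1})$ follows the multivariate hypergeometric distribution, which has the same mean and whose variance must be multiplied by the finite population correction factor.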
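Theorem 1 can be illustrated with the following Python sketch, which uses only the standard library. It compares the predicted mean and variance of $\hat c$ with a Monte Carlo estimate, once with replacement and once without. The truth tables, the seed, the 25% disagreement rate and the names `theorem1_moments` and `sampled_difference` are hypothetical choices made for this example only. The list `g` plays the role of $f'$.

```python
import random
import statistics

def theorem1_moments(f, g, N, replacement=True):
    """Mean and variance of c_hat = cor_hat(f) - cor_hat(g) as predicted by Theorem 1.
    f, g: truth tables (lists of 0/1) of two Boolean functions on the same domain."""
    size = len(f)
    p = f.count(0) / size                                # Pr[f(x) = 0]
    p_prime = g.count(0) / size                          # Pr[g(x) = 0]
    q = sum(a != b for a, b in zip(f, g)) / size         # Pr[f(x) != g(x)]
    B = 1.0 if replacement else (size - N) / (size - 1)  # constant (1)
    return 2 * (p - p_prime), 4 * B / N * (q - (p - p_prime) ** 2)  # mean c, variance (2)

def sampled_difference(f, g, N, rng, replacement=True):
    """One draw of c_hat = (2/N)(X_{0,1} - X_{1,0}) from a random sample of N inputs."""
    xs = (rng.choices(range(len(f)), k=N) if replacement
          else rng.sample(range(len(f)), N))
    x01 = sum(1 for x in xs if (f[x], g[x]) == (0, 1))
    x10 = sum(1 for x in xs if (f[x], g[x]) == (1, 0))
    return 2 * (x01 - x10) / N

rng = random.Random(2024)
n, N, trials = 6, 16, 20000
f = [rng.randrange(2) for _ in range(2 ** n)]
g = [bit ^ (rng.random() < 0.25) for bit in f]   # g differs from f on about 1/4 of inputs

results = {}
for replacement in (True, False):
    draws = [sampled_difference(f, g, N, rng, replacement) for _ in range(trials)]
    predicted = theorem1_moments(f, g, N, replacement)
    observed = (statistics.mean(draws), statistics.pvariance(draws))
    results[replacement] = (predicted, observed)
    print("with" if replacement else "without", "replacement:", predicted, observed)
```

Calling `theorem1_moments(f, f, N)` returns a variance of exactly zero. This agrees with the observation that the variance (2) vanishes when $f = f'$.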
In models where $\widehat{\mathrm{cor}}(f)$ and $\widehat{\mathrm{cor}}(f')$ are assumed to be independent, the variance of $\hat c$ is two times the variance of one sampled correlation. Using the binomial or hypergeometric distribution, this is approximately equal to $2B/N$, see e.g. [BN17]. This is only a rough estimate of the actual value given by Theorem 1, because it hides the cases where $q \neq \frac{1}{2}$ or $p \neq p'$. In particular, when independence is assumed, it is not possible to distinguish the case where merely $\mathrm{cor}(f) = \mathrm{cor}(f')$ from the case where actually $f = f'$. If $f = f'$, the variance given by (2) is equal to zero for all $N$.
Independence of $\widehat{\mathrm{cor}}(f)$ and $\widehat{\mathrm{cor}}(f')$ can be formally established if two independently drawn sets $S$ and $S'$ of inputs are used. In that case, $\widehat{\mathrm{cor}}(f)$ is computed from the pairs $(x, f(x))$, $x \in S$, and $\widehat{\mathrm{cor}}(f')$ from the pairs $(x, f'(x))$, $x \in S'$. Assuming $q = \frac{1}{2}$ and $(p - p')^2 \approx 0$, the variance of $\hat c$ is equal to $2B/N$ regardless of whether one sample of $N$ inputs or two samples of $N$ inputs each are used. Given $N$ inputs $x$, it takes $2N$ oracle calls to get $f(x)$ and $f'(x)$. The corresponding numbers in the two-sample case are $2N$ inputs and $2N$ oracle calls.
In the next section, we will consider the distribution of $\hat c$ by randomizing not only over the data sample but also over a pair $(f, f')$ of Boolean functions, where the functions $f$ and $f'$ cannot, in general, be considered independent. In our applications, the pair $(f, f')$ originates from a linear approximation formed for a cipher and a pair of related keys, see Section 4. Theorem 5 gives an example of the distribution of $\hat c$ in such a situation without assuming independence of $f$ and $f'$.

Difference of Correlations of Random Boolean Functions
Given two Boolean functions $f$ and $f'$, let us denote by $c$ the difference of their correlations, that is, $c = \mathrm{cor}(f) - \mathrm{cor}(f')$. Let us now examine the probability distribution of $c$ for a random pair of Boolean functions drawn equiprobably among all pairs of Boolean functions. Using the notation $p_{\gamma,\delta}$ given by (3), let us denote
$$N_{\gamma,\delta} = 2^n p_{\gamma,\delta} = \#\{x \in \mathbb{F}_2^n \mid f(x) = \gamma,\ f'(x) = \delta\}, \quad \gamma, \delta \in \mathbb{F}_2.$$
Then the 4-tuple $(N_{0,0}, N_{0,1}, N_{1,0}, N_{1,1})$ follows the multinomial distribution with $N_{0,0} + N_{0,1} + N_{1,0} + N_{1,1} = 2^n$ and probabilities $(\frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4})$, and $c = 2^{1-n}\left(N_{0,1} - N_{1,0}\right)$. By using the same arguments as in the proof of Theorem 1, we can prove the following result.
Theorem 2. The mean of $c$ taken over a random pair $(f, f')$ of Boolean functions from $\mathbb{F}_2^n$ to $\mathbb{F}_2$ is equal to zero. The variance of $c$ is equal to $2^{1-n}$.
Proof. We substitute $N = 2^n$, $q = p = p' = \frac{1}{2}$ and $B = 1$ into the expression (2).
Now we can derive the parameters of the probability distribution of the difference $\hat c$ of the sampled correlations taken over a random sample $S \subset \mathbb{F}_2^n$ of size $N$ and a random pair $(f, f')$ of Boolean functions from $\mathbb{F}_2^n$ to $\mathbb{F}_2$. This is achieved by combining the results of Theorems 1 and 2.
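Theorem 3. The mean of the difference $\hat c$ of the sampled correlations of two Boolean functions taken over a random sample of size $N$ and a random pair of Boolean functions from $\mathbb{F}_2^n$ to $\mathbb{F}_2$ is equal to zero. Its variance is equal to
$$2\left(\frac{B}{N}\left(1 - 2^{-n}\right) + 2^{-n}\right).$$
Proof. We get the variance of $\hat c$ taken over the samples and pairs of Boolean functions by taking the expected value of the variance of $\hat c$ from Theorem 1 and adding to it the variance of the mean $c$ of $\hat c$ as given by Theorem 2. Over a random pair $(f, f')$, the expected value of $q$ is $\frac{1}{2}$ and, by Theorem 2, the expected value of $(p - p')^2 = c^2/4$ is $2^{-1-n}$. We get
$$\frac{4B}{N}\left(\frac{1}{2} - 2^{-1-n}\right) + 2^{1-n} = 2\left(\frac{B}{N}\left(1 - 2^{-n}\right) + 2^{-n}\right).$$
For sampling with replacement, we take $B = 1$ to get
$$\frac{2}{N}\left(1 + (N - 1)2^{-n}\right) \approx \frac{2}{N}\left(1 + N 2^{-n}\right).$$
For sampling without replacement, we substitute $B = \frac{2^n - N}{2^n - 1}$ to obtain $\frac{2}{N}$.
Let us recall the corresponding result for the sampled correlation $\widehat{\mathrm{cor}}(f)$ of a random Boolean function $f$ from [BN17]: the variance of $\widehat{\mathrm{cor}}(f)$ is equal to $\frac{1}{N}\left(1 + N 2^{-n}\right)$ for sampling with replacement and equal to $\frac{1}{N}$ for sampling without replacement. Hence, for the difference of correlations of two equiprobably and independently drawn Boolean functions, these values are twice as large. That is, they are equal (up to the approximation $N - 1 \approx N$) to the variances given by Theorem 3. These results are straightforward to derive by assuming independence of $f$ and $f'$. However, we have obtained them without any independence assumptions on the randomly and equiprobably chosen pair $(f, f')$, by exploiting the properties of the multinomial distribution.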
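Theorems 2 and 3 can be checked exactly in a tiny case by exhaustive enumeration. The Python sketch below assumes $n = 2$ and $N = 3$, hypothetical parameters chosen so that enumeration is cheap. It averages over all pairs of truth tables and over all equiprobable samples, using exact rational arithmetic. The helper names `cor` and `mean_var` are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

n = 2
size = 2 ** n
functions = list(product((0, 1), repeat=size))   # all 2^(2^n) truth tables

def cor(f, xs):
    """Correlation of truth table f computed over the inputs xs (full domain or a sample)."""
    return Fraction(sum((-1) ** f[x] for x in xs), len(xs))

def mean_var(values):
    """Exact mean and population variance of a list of Fractions."""
    m = sum(values, Fraction(0)) / len(values)
    return m, sum(((v - m) ** 2 for v in values), Fraction(0)) / len(values)

# Theorem 2: correlations over the full domain, all pairs (f, g) equiprobable.
domain = list(range(size))
thm2 = mean_var([cor(f, domain) - cor(g, domain) for f in functions for g in functions])

# Theorem 3: additionally average over all equiprobable samples of size N.
N = 3
samples_with = list(product(range(size), repeat=N))    # ordered i.i.d. draws
samples_without = list(combinations(range(size), N))   # N distinct inputs
thm3_with = mean_var([cor(f, xs) - cor(g, xs)
                      for f in functions for g in functions for xs in samples_with])
thm3_without = mean_var([cor(f, xs) - cor(g, xs)
                         for f in functions for g in functions for xs in samples_without])
print(thm2, thm3_with, thm3_without)
```

For these parameters, the computed (mean, variance) pairs are $(0, \frac{1}{2})$ for Theorem 2. For Theorem 3 they are $(0, 1)$ with replacement and $(0, \frac{2}{3})$ without. These match $2^{1-n}$, $\frac{2}{N}\left(1 + (N-1)2^{-n}\right)$ and $\frac{2}{N}$.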

Linear trails
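Let $E_\kappa$ be the encryption function of an $n$-bit block cipher with $n$-bit plaintext $x$ and key $\kappa$, and consider a linear approximation
$$\langle a, x \rangle \oplus \langle b, E_\kappa(x) \rangle.$$
The $n$-bit vectors $a = (a_1 \ldots a_n)$ and $b = (b_1 \ldots b_n)$ are called the input and output masks, respectively. The correlation of the linear approximation is defined as
$$\mathrm{cor}_{E_\kappa}(a, b) = 2^{-n} \sum_{x \in \mathbb{F}_2^n} (-1)^{\langle a, x \rangle \oplus \langle b, E_\kappa(x) \rangle}.$$
It takes values between $-1$ and $1$ inclusive, and is a function of the key variable $\kappa \in \mathbb{F}_2^k$, where $k$ is the key length in bits.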
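An iterative key-alternating block cipher with block size $n$ processes plaintexts $x \in \mathbb{F}_2^n$ and round keys $\kappa_0, \kappa_1, \ldots, \kappa_r$. It iterates key-independent round functions $g_i$, $i = 1, \ldots, r$, round by round to obtain a ciphertext. Let us denote the sequence of round keys $\kappa_0, \ldots, \kappa_r$ by $\kappa \in \mathbb{F}_2^{(r+1)n}$ and the encryption function of the block cipher by $E_\kappa$. The round keys are added to the data between rounds using XOR addition:
$$y_0 = x \oplus \kappa_0, \qquad y_i = g_i(y_{i-1}) \oplus \kappa_i, \quad i = 1, \ldots, r, \qquad E_\kappa(x) = y_r.$$
Then the correlation of a linear approximation over $E_\kappa$ can be expressed as
$$\mathrm{cor}_{E_\kappa}(a, b) = \sum_{\tau} (-1)^{\langle \tau, \kappa \rangle} C_\tau, \qquad C_\tau = \prod_{i=1}^{r} \mathrm{cor}_{g_i}(\tau_{i-1}, \tau_i),$$
where the sum is taken over all $(r+1)$-tuples $\tau = (\tau_0, \tau_1, \ldots, \tau_r)$ such that $\tau_0 = a$ and $\tau_r = b$ [DGV94]. The sequence $\tau$ is called a linear trail of the linear approximation. $C_\tau$ is called the trail correlation of the trail $\tau$ and is independent of the key. The term $\langle \tau, \kappa \rangle = \langle \tau_0, \kappa_0 \rangle \oplus \langle \tau_1, \kappa_1 \rangle \oplus \cdots \oplus \langle \tau_r, \kappa_r \rangle$ depends solely on the key schedule.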
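The existing works on related-key linear cryptanalysis [RN13] and [BBR+13] consider the difference of the correlations $\mathrm{cor}_{E_K}(a, b)$ and $\mathrm{cor}_{E_{K \oplus \Delta}}(a, b)$ for two keys (sequences of round keys) with a fixed difference $\Delta$. For an iterative block cipher, this difference can be written as follows:
$$\mathrm{cor}_{E_K}(a, b) - \mathrm{cor}_{E_{K \oplus \Delta}}(a, b) = \sum_{\tau} (-1)^{\langle \tau, K \rangle}\left(1 - (-1)^{\langle \tau, \Delta \rangle}\right) C_\tau = 2 \sum_{\tau:\ \langle \tau, \Delta \rangle = 1} (-1)^{\langle \tau, K \rangle} C_\tau.$$
All terms with $\langle \tau, \Delta \rangle = 0$ cancel out. As a result, the number of possible values of the correlation difference is smaller than the number of values of the correlation. If, moreover, $C_\tau = 0$ for all $\tau$ with $\langle \tau, \Delta \rangle = 1$, then the difference of the related-key correlations is equal to zero. This property, named key difference invariant bias (KDIB), was identified in the block ciphers LBlock and TWINE in [BBR+13].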
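The trail expansion and the cancellation of the terms with $\langle\tau, \Delta\rangle = 0$ can be verified on a toy key-alternating cipher. The Python sketch below assumes a hypothetical 4-bit, two-round cipher whose two round functions are both the 4-bit PRESENT S-box. This is an arbitrary choice made only for illustration and is unrelated to LBlock or TWINE. The key, the round-key difference `DELTA` and all function names are likewise hypothetical. For every mask pair, the sketch checks two identities:

- the directly computed correlation equals the trail sum;
- the related-key difference equals twice the sum over trails with $\langle\tau, \Delta\rangle = 1$.

```python
import random
from fractions import Fraction
from itertools import product

# Hypothetical toy cipher: n = 4, r = 2 rounds, g_1 = g_2 = the PRESENT S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
n = 4
MASKS = range(2 ** n)

def dot(u, v):
    """Inner product <u, v> over F_2 of two n-bit vectors stored as integers."""
    return bin(u & v).count("1") & 1

def encrypt(x, kappa):
    """Key-alternating encryption: y_0 = x ^ k_0, y_i = g(y_{i-1}) ^ k_i."""
    y = x ^ kappa[0]
    for k in kappa[1:]:
        y = SBOX[y] ^ k
    return y

def correlation(func, a, b):
    """cor(a, b) = 2^-n * sum_x (-1)^(<a, x> xor <b, func(x)>), as an exact fraction."""
    return Fraction(sum((-1) ** (dot(a, x) ^ dot(b, func(x))) for x in MASKS), 2 ** n)

# Correlations cor_g(u, v) of the key-independent round function.
COR_G = {(u, v): correlation(lambda x: SBOX[x], u, v) for u in MASKS for v in MASKS}

def trail_sum(a, b, kappa, delta=None):
    """Sum of (-1)^<tau, kappa> C_tau over trails tau = (a, t1, b).
    If delta is given, only trails with <tau, delta> = 1 are kept."""
    total = Fraction(0)
    for t1 in MASKS:
        tau = (a, t1, b)
        if delta is not None and sum(dot(t, d) for t, d in zip(tau, delta)) % 2 == 0:
            continue
        sign = (-1) ** (sum(dot(t, k) for t, k in zip(tau, kappa)) % 2)
        total += sign * COR_G[(a, t1)] * COR_G[(t1, b)]
    return total

rng = random.Random(7)
K = tuple(rng.randrange(2 ** n) for _ in range(3))   # round keys (k_0, k_1, k_2)
DELTA = (0x0, 0x3, 0x0)                              # hypothetical round-key difference
K2 = tuple(k ^ d for k, d in zip(K, DELTA))          # K xor Delta

trail_ok = diff_ok = True
for a, b in product(MASKS, MASKS):
    cor_K = correlation(lambda x: encrypt(x, K), a, b)
    cor_K2 = correlation(lambda x: encrypt(x, K2), a, b)
    trail_ok &= cor_K == trail_sum(a, b, K)
    diff_ok &= cor_K - cor_K2 == 2 * trail_sum(a, b, K, DELTA)
print(trail_ok, diff_ok)   # both identities hold for every mask pair
```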

Approximate continuous distributions
In the context of linear cryptanalysis, the Boolean functions $f$ considered in Section 2 are linear approximations of a keyed block cipher $E_\kappa$, defined as follows:
$$f_\kappa(x) = \langle a, x \rangle \oplus \langle b, E_\kappa(x) \rangle,$$
where $a, b \in \mathbb{F}_2^n$ are some fixed linear masks and $n$ is the block size of the cipher. It is practical to use continuous approximations of the discrete binomial and hypergeometric distributions for the statistical analysis of the attacks. Similarly, the multinomial and multivariate hypergeometric distributions are approximated using the multivariate normal distribution. Further, it is well known that any linear (or affine) transformation of a multivariate normal deviate is again a multivariate normal deviate. This means that the distribution of the difference $X_{0,1} - X_{1,0}$ considered in Theorem 1 can be approximated using a normal distribution. From this we get the following corollary.
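Corollary 1. Consider a linear approximation $\langle a, x \rangle \oplus \langle b, E_\kappa(x) \rangle$ of an $n$-bit block cipher $E_\kappa$. The difference $\hat c$ of its sampled correlations computed for two different keys $\kappa = K$ and $\kappa = K'$ is approximately normally distributed with mean $c = \mathrm{cor}(f_K) - \mathrm{cor}(f_{K'})$ and variance equal to
$$\frac{B}{N}\left(4q - c^2\right), \qquad (6)$$
where $N$ is the size of the sample of data triplets $(x, E_K(x), E_{K'}(x))$ and
$$q = \Pr_x\left(f_K(x) \neq f_{K'}(x)\right). \qquad (7)$$
The constant $B$ is defined by (1).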
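The normal approximation in Corollary 1 can be illustrated by simulation. The Python sketch below draws many samples with replacement and counts how often $\hat c$ falls within $c \pm 1.96\sigma$, where $\sigma^2$ is the variance (6). For a normal distribution this fraction is about 0.95. The two truth tables are hypothetical random stand-ins for $f_K$ and $f_{K'}$ rather than an actual cipher. The helper name `corollary1_parameters` is illustrative.

```python
import math
import random

def corollary1_parameters(f_K, f_K2, N, replacement=True):
    """Mean c and variance (B/N)(4q - c^2) from Corollary 1 for two truth tables."""
    size = len(f_K)
    c = (sum((-1) ** v for v in f_K) - sum((-1) ** v for v in f_K2)) / size
    q = sum(u != v for u, v in zip(f_K, f_K2)) / size       # equation (7)
    B = 1.0 if replacement else (size - N) / (size - 1)     # constant (1)
    return c, B / N * (4 * q - c * c)

rng = random.Random(11)
n, N, trials = 10, 128, 5000
size = 2 ** n
# Hypothetical stand-ins for f_K and f_K': truth tables differing on about 30% of inputs.
f_K = [rng.randrange(2) for _ in range(size)]
f_K2 = [v ^ (rng.random() < 0.3) for v in f_K]
c, var = corollary1_parameters(f_K, f_K2, N)
sigma = math.sqrt(var)

inside = 0
for _ in range(trials):
    xs = rng.choices(range(size), k=N)                      # sampling with replacement
    c_hat = sum((-1) ** f_K[x] - (-1) ** f_K2[x] for x in xs) / N
    inside += abs(c_hat - c) < 1.96 * sigma
coverage = inside / trials
print(f"c = {c:.4f}, sigma = {sigma:.4f}, coverage = {coverage:.3f}")
```

Because $\hat c$ takes values on a lattice with step $2/N$, the observed coverage fluctuates slightly around 0.95.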

Cryptanalysis of Röck and Nyberg
In Matsui's original attack on DES, the linear approximation is composed of only one trail. This means that the correlation has only two possible values $\pm\rho$, where $\rho$ is the trail correlation [Mat93]. To thwart this attack, modern ciphers are designed so that a single trail cannot determine the correlation of a linear approximation. In an attempt to reduce the number of trails, Röck and Nyberg presented a generalisation of Matsui's Algorithm 1 to the related-key setting [RN13]. They divided the keys $K$ into key classes $\mathcal{K}(c)$ as follows: a key $K$ belongs to $\mathcal{K}(c)$ if $\mathrm{cor}(f_K) - \mathrm{cor}(f_{K \oplus \Delta}) = c$, where $c$ is a possible value of the correlation difference.
To recover the key, the cryptanalyst computes the sampled correlation difference. To determine the success and error probabilities of the key recovery, the cryptanalyst needs to know the probability distribution of the sampled correlation difference. In [RN13], it was assumed that the sampled correlations for the two related keys are independent. The argument was that this is at least the case if, for each key, the sample is drawn separately and independently. The sampled correlation for a fixed key is approximately normally distributed with variance $1/N$, by the normal approximation of the binomial distribution (assuming sampling with replacement). The difference of two such sampled correlations is then normally distributed with mean $\mathrm{cor}(f_K) - \mathrm{cor}(f_{K \oplus \Delta})$ and variance $2/N$.
Using Corollary 1, we can remove the assumption about the independence of the sampled correlations and get a more detailed understanding of the statistical behaviour of $\hat c$, as given by the following result.

Theorem 4. In the setting of [RN13], let us denote by $Q(c)$ the average of $q$, defined by Equation (7), taken over all keys in $\mathcal{K}(c)$. Then the distribution of $\hat c$ over a random sample of size $N$ and a random key in $\mathcal{K}(c)$ is approximately normal with mean $c$ and variance equal to
$$\frac{B}{N}\left(4Q(c) - c^2\right).$$
Proof. Since the expected value $c$ of $\hat c$ is the same for all key pairs in $\mathcal{K}(c)$, the variance of $\hat c$ is the mean of the variance (6) taken over the keys in $\mathcal{K}(c)$.
If $Q(c) = \frac{1}{2}$, $c^2 \ll \frac{1}{2}$ and $B = 1$, we obtain the same distribution parameters as in [RN13], but now without the assumption about the independence of the sampled correlations. For ciphers with $Q(c) < \frac{1}{2}$, if any, the variance could be smaller than estimated, which may lead to improvements of the attack.

Key difference invariant bias
The KDIB cryptanalysis proposed by Bogdanov et al. [BBR+13] is a key-recovery attack similar to Matsui's Algorithm 2 [Mat93]. The distinguisher is based on the difference between the statistical distributions of some test statistic computed for the wrong key and for the right key. In Matsui's linear cryptanalysis this test statistic is the sampled correlation. Bogdanov et al. instead used the (squared) difference of correlations computed for two keys with a fixed difference in their corresponding sequences of round keys. In statistical cryptanalysis, the wrong-key behaviour of the statistic is modelled according to the behaviour of the corresponding statistic computed for a random permutation. The right-key behaviour, which should be different, is based on some non-randomness property, which in this case is the KDIB property.
By the KDIB property, there is a linear approximation with input mask $a$, output mask $b$ and key difference $\Delta$ of the sequences of round keys such that the difference of correlations $\mathrm{cor}_{E_K}(a, b) - \mathrm{cor}_{E_{K \oplus \Delta}}(a, b)$ is equal to zero for all keys $K$ [BBR+13]. Hence the difference of sampled correlations computed for a data sample obtained from a cipher with the KDIB property can be expected to have a smaller variance than in the random case.
Using Corollary 1 we get the following result.
Theorem 5. Suppose a key-alternating block cipher has the KDIB property with input mask $a$, output mask $b$ and key difference $\Delta$. Then the probability distribution of the difference of the sampled correlations taken over a random sample of $N$ plaintexts and over a random key $K$ is approximately normal with mean equal to zero and variance equal to
$$\frac{4BQ}{N},$$
where $Q$ is the average of the probability $\Pr_x\left(f_K(x) \neq f_{K \oplus \Delta}(x)\right)$ taken over a random key $K$.
Proof. By the KDIB property, we set $c = 0$ for all keys in Corollary 1. It follows that the variance of $c$ taken over a random key is equal to zero. Hence the variance of $\hat c$ is equal to the mean of the variance given in (6) over a random key.
By setting $Q = \frac{1}{2}$ and $B = 1$ for sampling with replacement, we obtain that $\frac{N}{2}\hat c^2$ is a $\chi^2$ deviate with one degree of freedom. Suppose we take $\lambda$ such variables, with $\lambda$ high enough, assume their independence, and use their sum as a statistic as done in [BBR+13]. After approximating the $\chi^2$ distribution with a normal distribution, we get the result of their Proposition 2. Proposition 2 of [BBR+13] makes the additional assumption that, for each linear approximation, the two sampled correlations (or the counters) computed for the related key pair are statistically independent.
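The effect of $Q < \frac{1}{2}$ in Theorem 5 can be seen in a small Python simulation. The sketch builds a hypothetical pair of truth tables with equal correlations ($c = 0$, as under the KDIB property) but with $q = \frac{1}{4}$, obtained by swapping 32 zeros and 32 ones. It treats a single fixed key pair, so $Q = q$. With $B = 1$, it compares the empirical variance of $\hat c$ with $4BQ/N$. It also shows that the mean of $\frac{N}{2}\hat c^2$ is then close to $2Q$, rather than the value 1 expected for a $\chi^2$ deviate with one degree of freedom.

```python
import random
import statistics

rng = random.Random(5)
n, N, trials = 8, 64, 10000
size = 2 ** n

# Hypothetical KDIB-like pair: f_K2 is f_K with m zeros and m ones flipped,
# so cor(f_K) = cor(f_K2) (that is, c = 0) while q = 2m / 2^n = 1/4.
f_K = [rng.randrange(2) for _ in range(size)]
m = 32
zeros = [x for x in range(size) if f_K[x] == 0]
ones = [x for x in range(size) if f_K[x] == 1]
f_K2 = list(f_K)
for x in rng.sample(zeros, m) + rng.sample(ones, m):
    f_K2[x] ^= 1
Q = sum(u != v for u, v in zip(f_K, f_K2)) / size

draws = []
for _ in range(trials):
    xs = rng.choices(range(size), k=N)               # sampling with replacement, B = 1
    draws.append(sum((-1) ** f_K[x] - (-1) ** f_K2[x] for x in xs) / N)

predicted_var = 4 * Q / N                            # Theorem 5 with B = 1
observed_var = statistics.pvariance(draws)
chi2_mean = statistics.mean(N / 2 * d * d for d in draws)  # about 2Q, not 1
print(Q, predicted_var, observed_var, chi2_mean)
```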
In general, the related-key linear cryptanalysis of a block cipher is based on some non-randomness property of the difference of the expected values of the correlations, or equivalently, of the data distributions, which holds for all related key pairs. The success of the attack depends on how well the related-key behaviour of the cipher can be distinguished from the random behaviour of the difference of correlations and data distributions.
For an example of how to set up a linear related-key distinguisher for key recovery and establish a connection between the error probabilities and the data requirement, we refer to [BBR+13]. In the next subsection, we will determine the wrong-key distribution of the correlation difference.

Wrong-key distribution for related-key linear distinguisher
The distribution of the correlation of a linear approximation taken over a random permutation can be approximated by the distribution of the correlation of a random Boolean function [DR07]. This property was later established also for the sampled correlation [AKN21]. Key-recovery attacks on block ciphers (key-dependent permutations), which exploit a statistical distinguisher between the right-key and wrong-key behaviours of the cipher, typically model the wrong-key behaviour according to the random case.
Analogously, a practical model of the wrong-key behaviour of the (sampled) difference of related-key correlations of a linear approximation is obtained by imitating the behaviour of the difference of the (sampled) correlations of two random Boolean functions. By Theorem 3, we get the following result.

Corollary 2. The difference of sampled related-key correlations
$$\hat c = \widehat{\mathrm{cor}}(f_K) - \widehat{\mathrm{cor}}(f_{K'})$$
taken over a random wrong related-key pair $(K, K')$ and over a random sample of size $N$ is approximately normally distributed with mean equal to zero. The variance is equal to
$$\frac{2}{N}\left(1 + (N - 1)2^{-n}\right)$$
if sampling is with replacement. If sampling is without replacement, the variance is equal to $\frac{2}{N}$.
Proposition 3 of [BBR+13] gives an approximate probability distribution of the sum of squares of correlation differences for $\lambda$ linear approximations under the following assumptions:
1. $\lambda$ is high enough,
2. all $2\lambda$ sampled correlations are statistically independent, and
3. the sample of $N$ known plaintexts may contain repetitions.
Using Corollary 2, we can remove assumption 1 and give the result without the normal approximation of the $\chi^2$ distribution (which required $\lambda$ to be high enough). In addition to the case defined by assumption 3, we also consider sampling without replacement. We can relax assumption 2 and allow the two counters for each linear approximation to be dependent. Yet we still need statistical independence of the linear approximations. For the formulation of this result, see Corollary 3 in the next section.

Definition of the statistic
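Let
$$\left\{\langle a_\alpha, x \rangle \oplus \langle b_\alpha, E_\kappa(x) \rangle \;\middle|\; \alpha = 1, \ldots, M\right\} \qquad (8)$$
be a set of $M$ nonzero linear approximations of an $n$-bit block cipher $E_\kappa$. Let $(K, K')$ be a pair of related keys, and let $f_\alpha$ and $f'_\alpha$ denote these linear approximations applied to $E_K$ and $E_{K'}$, respectively. Further, we denote $c_\alpha = \mathrm{cor}(f_\alpha) - \mathrm{cor}(f'_\alpha)$ and
$$\hat c_\alpha = \widehat{\mathrm{cor}}(f_\alpha) - \widehat{\mathrm{cor}}(f'_\alpha),$$
where the sampled correlations $\widehat{\mathrm{cor}}(f_\alpha)$ and $\widehat{\mathrm{cor}}(f'_\alpha)$, $\alpha = 1, \ldots, M$, are computed for a random set of $N$ plaintexts drawn either with or without replacement. We also denote by $q_\alpha$ the probability that $f_\alpha(x) \neq f'_\alpha(x)$, taken over a random plaintext $x \in \mathbb{F}_2^n$. By Corollary 2, for a random wrong key pair, the quantity
$$\frac{N \hat c_\alpha^2}{2\left(1 + (N - 1)2^{-n}\right)} \quad\text{or}\quad \frac{N \hat c_\alpha^2}{2},$$
for sampling with or without replacement, respectively, approximately follows the $\chi^2$ distribution with one degree of freedom. In statistical cryptanalysis, we can assume that $N$ and $2^n$ are large and make the approximations $N - 1 \approx N$ and $2^n - 1 \approx 2^n$.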
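In the analysis of the sampled related-key correlation differences $\hat c_\alpha$, we propose to use the following statistic:
$$T = \frac{N}{2B\left(1 + \frac{N}{B}2^{-n}\right)} \sum_{\alpha=1}^{M} \hat c_\alpha^2, \qquad (9)$$
where $B$ is the constant defined in (1).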

Wrong-key distribution of the statistic
Corollary 2 gives the distribution of a sampled related-key correlation difference over wrong key pairs. Using it we obtain the following information about the distribution of T for a family of M linear approximations.
Corollary 3. The mean of the statistic $T$ taken over a random sample of size $N$ and a random wrong related-key pair is equal to $M$. Suppose, moreover, that the linear approximations are independent, in the sense that $\hat c_\alpha$, $\alpha = 1, \ldots, M$, are statistically independent over a random sample of size $N$ and over a random pair of permutations. Then the statistic $T$ follows the $\chi^2$ distribution with $M$ degrees of freedom.
The shape of the distribution of $T$ must be considered separately for different kinds of families of linear approximations. Suppose the linear approximations $\langle a_\alpha, x \rangle \oplus \langle b_\alpha, E_\kappa(x) \rangle$, $\alpha = 1, \ldots, M$, are linearly independent, that is, the mask pairs $(a_\alpha, b_\alpha)$ are linearly independent. It can then be argued that they are also essentially statistically independent. In practice, the $\chi^2$ distribution may also work well for other kinds of sets of linear approximations, even if the prerequisites of Pearson's $\chi^2$ test are not fully satisfied [BTV18, FN20].
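Corollary 3 can be illustrated by imitating the wrong-key model with independent, uniformly random pairs of Boolean functions, one pair per linear approximation, all evaluated on one shared sample. The Python sketch below uses sampling with replacement and the hypothetical parameters $n = 8$, $N = 64$, $M = 4$. It computes the statistic (9) and reports its empirical mean and variance. For a $\chi^2$ distribution with $M$ degrees of freedom these would be $M$ and $2M$, up to the approximation $N - 1 \approx N$. Truth tables are stored as Python integers, and `t_statistic` and `bit` are illustrative names.

```python
import random
import statistics

def t_statistic(c_hats, N, n, replacement=True):
    """Statistic (9): T = N / (2B(1 + (N/B)2^-n)) * sum of the squared c_hat_alpha."""
    size = 2 ** n
    B = 1.0 if replacement else (size - N) / (size - 1)
    return N / (2 * B * (1 + N / (B * size))) * sum(c * c for c in c_hats)

def bit(table, x):
    """Value at input x of a Boolean function stored as a 2^n-bit integer truth table."""
    return (table >> x) & 1

rng = random.Random(3)
n, N, M, trials = 8, 64, 4, 4000
size = 2 ** n

T_values = []
for _ in range(trials):
    # Wrong-key model: an independent, uniformly random pair of functions per alpha.
    pairs = [(rng.getrandbits(size), rng.getrandbits(size)) for _ in range(M)]
    xs = rng.choices(range(size), k=N)               # one shared sample, with replacement
    c_hats = [sum((-1) ** bit(f, x) - (-1) ** bit(g, x) for x in xs) / N for f, g in pairs]
    T_values.append(t_statistic(c_hats, N, n))
T_mean, T_var = statistics.mean(T_values), statistics.pvariance(T_values)
print(T_mean, T_var)   # compare with M and 2M
```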
It might be possible, although elaborate, to use the properties of the multinomial distribution in a similar way as in [AKN21], where they were used in the single-key setting to compute the variance of the capacity of a multidimensional linear approximation. In this way, one could obtain the variance of $T$, while the form of its distribution would still remain an open problem.

The statistic without independence of linear approximations
We start by examining the statistic $T$ for a fixed pair of permutations, either random or cipher, with related keys identified by a key pair $(K, K')$. We determine the expected value of $T$ over a random sample of $N$ plaintexts.
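Theorem 6. For any given pair of keys $(K, K')$ with $c_\alpha = \mathrm{cor}(f_\alpha) - \mathrm{cor}(f'_\alpha)$, the statistic $T$ defined by (9) has the following mean over a random sample of size $N$:
$$\mathbb{E}(T) \approx \frac{1}{1 + \frac{N}{B}2^{-n}} \sum_{\alpha=1}^{M}\left(2q_\alpha + \frac{N}{2B}c_\alpha^2\right),$$
where $B$ is the constant defined in (1). If the set of linear approximations satisfies
$$\frac{1}{M}\sum_{\alpha=1}^{M} q_\alpha = Q,$$
then the mean of $T$ taken over a random data sample of size $N$ is equal to
$$\frac{1}{1 + \frac{N}{B}2^{-n}}\left(2MQ + \frac{N}{2B}\sum_{\alpha=1}^{M} c_\alpha^2\right).$$
Proof. For each $\alpha = 1, \ldots, M$, the expected value of $\hat c_\alpha$ is equal to $c_\alpha$. By applying the expression (6) of the variance, we get
$$\mathbb{E}\left(\hat c_\alpha^2\right) = c_\alpha^2 + \frac{B}{N}\left(4q_\alpha - c_\alpha^2\right) = \frac{(N - B)c_\alpha^2 + 4Bq_\alpha}{N}.$$
By summing over $\alpha = 1, \ldots, M$ and using $N - B \approx N$, we get the claim.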
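The exact mean from the proof of Theorem 6, before the approximation $N - B \approx N$, can be checked by simulation. The Python sketch below fixes a hypothetical family of $M = 3$ dependent pairs of truth tables standing in for $(f_\alpha, f'_\alpha)$. It uses sampling without replacement, so that $B < 1$. It then compares the predicted $\mathbb{E}(T)$ with the average of $T$ over many samples. The construction of the pairs and all names are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(9)
n, N, M, trials = 8, 96, 3, 3000
size = 2 ** n
B = (size - N) / (size - 1)                        # constant (1), sampling without replacement
scale = N / (2 * B * (1 + N / (B * size)))         # normalising factor in statistic (9)

def cor(table):
    """Correlation of a Boolean function given as a list of 0/1 values."""
    return sum((-1) ** v for v in table) / len(table)

# Hypothetical fixed family: f'_alpha is f_alpha with each 1 turned into 0 with
# probability 0.3, so each pair is dependent and c_alpha != 0.
family = []
for _ in range(M):
    f = [rng.randrange(2) for _ in range(size)]
    family.append((f, [0 if v == 1 and rng.random() < 0.3 else v for v in f]))

c = [cor(f) - cor(g) for f, g in family]
q = [sum(u != v for u, v in zip(f, g)) / size for f, g in family]
# Exact mean: sum of ((N - B) c_alpha^2 + 4 B q_alpha) / (2B(1 + (N/B)2^-n)).
predicted = (sum((N - B) * ca * ca + 4 * B * qa for ca, qa in zip(c, q))
             / (2 * B * (1 + N / (B * size))))

T_values = []
for _ in range(trials):
    xs = rng.sample(range(size), N)                # N distinct plaintexts
    T_values.append(scale * sum((sum((-1) ** f[x] - (-1) ** g[x] for x in xs) / N) ** 2
                                for f, g in family))
observed = statistics.mean(T_values)
print(predicted, observed)
```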
We leave it as an open task to investigate the distribution of $T$. In practical applications, the $\chi^2$ distribution may often give a sufficiently accurate approximation. For example, in the case where $Q = \frac{1}{2}$, the distribution of $\left(1 + \frac{N}{B}2^{-n}\right)T$ could be close to the noncentral $\chi^2$ distribution with $M$ degrees of freedom and noncentrality parameter approximately equal to
$$\frac{N}{2B}\sum_{\alpha=1}^{M} c_\alpha^2.$$

The distribution view
In single-key multidimensional linear cryptanalysis, the $\chi^2$ distribution of the statistic computed from the squared correlations of the linear approximations arises naturally from the related multinomial distribution using Pearson's $\chi^2$ test [HCN19, AKN21]. In related-key multidimensional linear cryptanalysis, this is not clear. Let us have a closer look.
A multidimensional linear approximation is a linear space whose nonzero elements are given by the mask pairs $(a_\alpha, b_\alpha)$, $\alpha = 1, \ldots, M$. If the dimension is $t$, then $M = 2^t - 1$. The multidimensional linear approximation can then also be given by a vectorial Boolean function. For example, when applied to the cipher $E_\kappa$ with key $\kappa$, the nonzero components of this vectorial Boolean function are the Boolean functions defined by the expression (8). Moreover, we can assume that the indexing is such that $\alpha \in \mathbb{F}_2^t$ and the mapping $\alpha \mapsto (a_\alpha, b_\alpha)$ is a linear isomorphism.
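If we denote by $F$ this vectorial Boolean function for $\kappa = K$, then we can assume that
$$\langle \alpha, F(x) \rangle = \langle a_\alpha, x \rangle \oplus \langle b_\alpha, E_K(x) \rangle = f_\alpha(x), \quad \alpha \in \mathbb{F}_2^t.$$
Similarly, we define the vectorial Boolean function $F'$ to correspond to this multidimensional linear approximation applied to $E_{K'}$. For each $\eta \in \mathbb{F}_2^t$ we define the probabilities
$$p_\eta = \Pr_x\left(F(x) = \eta\right) = 2^{-n}\left|\{x \in \mathbb{F}_2^n \mid F(x) = \eta\}\right| \quad\text{and}\quad p'_\eta = \Pr_x\left(F'(x) = \eta\right) = 2^{-n}\left|\{x \in \mathbb{F}_2^n \mid F'(x) = \eta\}\right|.$$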
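To observe the difference of the distributions $p_\eta$ and $p'_\eta$, we draw a random sample $S$ of $N$ plaintexts $x$ from $\mathbb{F}_2^n$. We denote
$$X_\eta = \#\{x \in S \mid F(x) = \eta\} \quad\text{and}\quad X'_\eta = \#\{x \in S \mid F'(x) = \eta\}, \quad \eta \in \mathbb{F}_2^t.$$
For a single $x$ and a single key pair $(K, K')$, the values $F(x)$ and $F'(x)$ typically increment counters for two different values of $\eta$. This means that the categories (labelled by $\eta$) are not sampled independently, which violates the prerequisites of Pearson's $\chi^2$ test. Based on the connections
$$\widehat{\mathrm{cor}}(f_\alpha) = \frac{1}{N}\sum_{\eta \in \mathbb{F}_2^t} (-1)^{\langle \alpha, \eta \rangle} X_\eta, \qquad \widehat{\mathrm{cor}}(f'_\alpha) = \frac{1}{N}\sum_{\eta \in \mathbb{F}_2^t} (-1)^{\langle \alpha, \eta \rangle} X'_\eta,$$
and on Parseval's relation, we get that the statistic $T$ can also be expressed in the following form:
$$T = \frac{2^{t-1}}{N B\left(1 + \frac{N}{B}2^{-n}\right)} \sum_{\eta \in \mathbb{F}_2^t} \left(X_\eta - X'_\eta\right)^2.$$
The form of $T$ given above is applicable to the analysis of different types of distributions of cipher data obtained using linear projections. Such distributions occur, for example, in the statistical saturation attack. The relation between the statistical saturation attack and multidimensional linear cryptanalysis (in the single-key setting) has been studied by Blondeau and Nyberg [BN14].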
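The counter form of $T$ rests on Parseval's relation applied to the differences $X_\eta - X'_\eta$. Note that $\hat c_0 = 0$ because both counter vectors sum to $N$. The Python sketch below checks the identity with exact fractions, using hypothetical random maps as stand-ins for $F$ and $F'$, with $t = 3$ and sampling with replacement (so $B = 1$). It computes $T$ once from the sampled correlation differences $\hat c_\alpha$, $\alpha \neq 0$, and once from the counters. The variable names are illustrative.

```python
import random
from fractions import Fraction

def dot(u, v):
    """Inner product <u, v> over F_2 of bit vectors stored as integers."""
    return bin(u & v).count("1") & 1

rng = random.Random(4)
n, t, N = 8, 3, 50
size, dim = 2 ** n, 2 ** t
# Hypothetical stand-ins for F and F' (for E_K and E_K'): arbitrary maps F_2^n -> F_2^t.
F = [rng.randrange(dim) for _ in range(size)]
F_prime = [rng.randrange(dim) for _ in range(size)]
xs = rng.choices(range(size), k=N)                  # sampling with replacement, so B = 1

# Sum over nonzero alpha of c_hat_alpha^2, where f_alpha(x) = <alpha, F(x)>.
sum_sq = sum(Fraction(sum((-1) ** dot(alpha, F[x]) - (-1) ** dot(alpha, F_prime[x])
                          for x in xs), N) ** 2
             for alpha in range(1, dim))

# Counters X_eta and X'_eta.
X, X_prime = [0] * dim, [0] * dim
for x in xs:
    X[F[x]] += 1
    X_prime[F_prime[x]] += 1
sum_counts = sum((X[eta] - X_prime[eta]) ** 2 for eta in range(dim))

B = 1
T_from_correlations = Fraction(N) / (2 * B * (1 + Fraction(N, B * size))) * sum_sq
T_from_counters = Fraction(2 ** (t - 1)) / (N * B * (1 + Fraction(N, B * size))) * sum_counts
print(T_from_correlations == T_from_counters)       # True: the two forms of T agree
```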

Conclusions
In this work, we studied the probability distribution of the difference of sampled correlations of two Boolean functions over a random sample of their inputs, showed that it is approximately normal and gave its parameters. Further, we established this distribution also over a random pair of Boolean functions. These results were then applied to related-key linear cryptanalysis. By modelling the wrong-key behaviour of the correlation of a linear approximation according to random behaviour, we obtain the wrong-key distribution. For the right key, the cryptanalyst exploits some non-random property of the cipher. We revisited the KDIB cryptanalysis and established the right-key distribution for a single linear approximation without any independence assumption about the sampled correlations computed for related keys. The variance of this distribution depends on the probability $q$ that the linear approximation takes different values when computed for the cipher with two related keys. This probability is not necessarily equal to $\frac{1}{2}$. It would be interesting to determine the probability $q$ for LBlock and TWINE and study its impact on the variance of the correlation difference. Another line of work, for related-key linear cryptanalysis more generally, would be its applications to tweakable block ciphers.
We also discussed the previously proposed solution of obtaining independence of the sampled related-key correlations by using two independent samples to compute the correlations. While this would work for distributions taken over the data, it does not help when the distributions are taken over a random (right or wrong) key. In related-key cryptanalysis in particular, it is not realistic to assume that the sampled related-key correlations are independent.
For related-key applications involving multiple linear approximations, we proposed a $\chi^2$-type statistic, which indeed has a $\chi^2$ distribution under the additional assumption that the linear approximations are independent. When trying to remove this assumption by considering linear or affine spaces of linear approximations, as is done in the single-key setting, we encountered problems, which were left for future work.