Cryptanalysis of Reduced Round ChaCha – New Attack & Deeper Analysis

In this paper we present several analyses of ChaCha, a software stream cipher. First, we consider a divide-and-conquer approach on the secret key bits by partitioning them. The partitions are based on multiple input-output differentials and yield a significantly improved attack on 6-round ChaCha256 with a complexity of $2^{99.48}$, which is $2^{40}$ times faster than the best previously known attack. This is the first successful attack on a round-reduced ChaCha with a complexity smaller than $2^{k/2}$, where the secret key is of $k$ bits. Further, all the attack complexities related to ChaCha are theoretically estimated in general, and there are several open questions in this regard, as pointed out by Dey, Garai, Sarkar and Sharma in Eurocrypt 2022. We therefore propose a toy version of ChaCha, with a 32-bit secret key, on which the attacks can be implemented completely to verify whether the theoretical estimates are justified. This idea is implemented for our proposed attack on 6 rounds. Finally, we show that it is possible to estimate the success probabilities of these kinds of PNB-based differential attacks more accurately. Our methodology explains how different cryptanalytic results can be evaluated with better accuracy, rather than merely claiming that the success probability is significantly better than 50%.


Introduction
ChaCha was designed in 2008 by Bernstein [Ber08a] as a variant of Salsa, which was one of the finalists of the eSTREAM project (see [Ber08b] for details). ChaCha is an ARX cipher, i.e., the operations in the cipher are Addition, Rotation and XOR, all of which execute very fast on a Central Processing Unit (CPU). ChaCha is presently used in many standards and is considered quite safe with the proposed number of rounds. However, it is important to study the cipher with reduced rounds for cryptanalytic purposes. Several important results in this direction have appeared over roughly the past decade.

Summary of the existing attacks
Most of the attacks on the reduced round versions of this cipher are based on differential cryptanalysis.
• The fundamental attack was proposed in 2008 in [AFK + 08]. The authors identified weaknesses of the 6-round and 7-round versions of ChaCha256. Moreover, they also provided cryptanalytic results on ciphers with similar structure such as ChaCha128, Salsa256,

$x \boxminus y$            Subtraction of $x$ and $y$ modulo $2^{32}$
$x \oplus y$               Bitwise XOR of $x$ and $y$
$x \lll n$                 Rotation of $x$ by $n$ bits to the left
$x \ggg n$                 Rotation of $x$ by $n$ bits to the right
$\Delta^{(r)}_i[j]$        XOR difference after the $r$-th round of the $j$-th bit of the $i$-th word of $X$ and $X'$
(ID, OD)                   Input Difference-Output Difference
$k$                        Key length in bits

• In section 3, we begin with the analysis of the previous attacks on 6-round ChaCha256.
In subsection 3.2 we present a novel cryptanalytic idea using multiple (ID, OD) pairs, and apply this technique to 6-round ChaCha256 to obtain a significantly improved attack complexity over the previously existing ones. This is the first time an attack complexity significantly less than $2^{k/2}$ is reported, where the secret key is of $k$ bits. Note that the previous attacks whose complexities were claimed to be less than $2^{k/2}$ were incorrect, and we will explain that as and when required.
While such an approach was not exploited against ChaCha so far, accumulating several biases to mount improved attacks is quite well known in literature. For example, the work of [ABP + 13] exploited the idea of accumulating several biases (see [GMPS14] and the references therein) to mount an attack on RC4. In fact, the cryptanalysis of [ABP + 13] is one of the prime reasons RC4 is being replaced by ChaCha in various standardized encryption systems. Also, similar ideas have been used in [SG18], [SGSL18].
• To concretely understand the complexity of the complete attack, in section 4, we present a toy version of ChaCha with 32-bit secret key. We implement the usual cryptanalytic attack approaches to achieve a better estimation of the complexities, false alarm error probabilities and the success probabilities. We also implement our new cryptanalytic technique and compare the complexity with the usual attack approaches.
• Lastly, in section 5, we present a theoretical approach to identify more accurate ranges for the success probabilities of the PNB-based techniques. We compare this theory with the practical results obtained from the toy cipher, and the two match closely. We use this calculation to refine the success probabilities of the attacks presented in [AFK + 08] and of its modified version with the chosen IV approach by Maitra [Mai16].

Structure of ChaCha256 and Differential Attack Idea
Let us first explain the design of the cipher. The key stream generation machinery of ChaCha takes as input a 256-bit secret key $k$, a 128-bit constant $c$ and a 128-bit initialization vector (IV) $v$ (three nonces and one counter), which together are divided into 16 blocks of 32 bits each. They are organised as a $4 \times 4$ matrix $X$. Each 32-bit block is conventionally called a word. The first row is filled with the constants $c = (c_0, c_1, c_2, c_3)$, the second and third rows contain the key $k = (k_0, k_1, \ldots, k_7)$, and the last row holds the initial vectors (IVs) $v = (t_0, v_0, v_1, v_2)$. The four constants are fixed as $c_0$ = 0x61707865, $c_1$ = 0x3320646e, $c_2$ = 0x79622d32, $c_3$ = 0x6b206574. That is, the initial state matrix has the following form:
$$X = \begin{pmatrix} X_0 & X_1 & X_2 & X_3 \\ X_4 & X_5 & X_6 & X_7 \\ X_8 & X_9 & X_{10} & X_{11} \\ X_{12} & X_{13} & X_{14} & X_{15} \end{pmatrix} = \begin{pmatrix} c_0 & c_1 & c_2 & c_3 \\ k_0 & k_1 & k_2 & k_3 \\ k_4 & k_5 & k_6 & k_7 \\ t_0 & v_0 & v_1 & v_2 \end{pmatrix}$$
This is denoted by $X^{(0)}$ (or sometimes simply by $X$), which goes through $R$ ChaCha Round functions [Ber08a]. The updated version of $X^{(0)}$ after $r$ rounds is denoted by $X^{(r)}$. After the full $R$ rounds of execution, the final state $X^{(R)}$ is added word-wise (modulo $2^{32}$) to the initial state $X^{(0)}$, forming the key stream $Z$, i.e., $Z = X^{(0)} \boxplus X^{(R)}$.
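The state layout described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `initial_state`, `feed_forward` and `strip_state` are hypothetical and not taken from the paper or from [Ber08a], and the state is stored row by row as a flat list of 16 words.

```python
MASK32 = 0xFFFFFFFF
# The four fixed constants c0..c3 given in the text.
CONSTANTS = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)


def initial_state(key_words, counter, nonce_words):
    """Build X^(0) row by row: constants, 8 key words, then (t0, v0, v1, v2)."""
    if len(key_words) != 8 or len(nonce_words) != 3:
        raise ValueError("expected 8 key words and 3 nonce words")
    words = list(CONSTANTS) + list(key_words) + [counter] + list(nonce_words)
    return [w & MASK32 for w in words]


def feed_forward(x0, xr):
    """Key stream Z = X^(0) + X^(R), word-wise modulo 2^32."""
    return [(a + b) & MASK32 for a, b in zip(x0, xr)]


def strip_state(z, x0):
    """Undo the feed-forward: Z - X^(0), word-wise modulo 2^32."""
    return [(a - b) & MASK32 for a, b in zip(z, x0)]
```

The subtraction helper is the operation an attacker applies to the key stream before running the rounds backwards, with a guessed state in place of the true $X^{(0)}$.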
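Each ChaCha Round is built from the quarterround function, which itself consists of four ARX functions. The four ARX operations of each quarterround function, which transforms a vector $(a, b, c, d)$ into $(a'', b'', c'', d'')$ via an intermediate vector $(a', b', c', d')$, are given by the following equations:
$$\begin{aligned}
a' &= a \boxplus b, & d' &= (d \oplus a') \lll 16, & c' &= c \boxplus d', & b' &= (b \oplus c') \lll 12,\\
a'' &= a' \boxplus b', & d'' &= (d' \oplus a'') \lll 8, & c'' &= c' \boxplus d'', & b'' &= (b' \oplus c'') \lll 7.
\end{aligned} \tag{1}$$
An odd numbered ChaCha Round is called a column round because it updates the four column vectors $(X_0, X_4, X_8, X_{12})$, $(X_1, X_5, X_9, X_{13})$, $(X_2, X_6, X_{10}, X_{14})$ and $(X_3, X_7, X_{11}, X_{15})$ of the state matrix $X$. An even numbered ChaCha Round, on the other hand, is known as a diagonal round, as the diagonal vectors $(X_0, X_5, X_{10}, X_{15})$, $(X_1, X_6, X_{11}, X_{12})$, $(X_2, X_7, X_8, X_{13})$ and $(X_3, X_4, X_9, X_{14})$ of the matrix $X$ are updated.
One can go back from the $(r+1)$-round state $X^{(r+1)}$ to the $r$-round state $X^{(r)}$ with the help of the reverse quarterround function, which transforms the vector $(a'', b'', c'', d'')$ back into $(a, b, c, d)$ by undoing the operations of Equation 1 in reverse order:
$$\begin{aligned}
b' &= (b'' \ggg 7) \oplus c'', & c' &= c'' \boxminus d'', & d' &= (d'' \ggg 8) \oplus a'', & a' &= a'' \boxminus b',\\
b &= (b' \ggg 12) \oplus c', & c &= c' \boxminus d', & d &= (d' \ggg 16) \oplus a', & a &= a' \boxminus b.
\end{aligned} \tag{2}$$
Further details regarding these operations are available in [Ber08a].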
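To make the forward and reverse round functions concrete, the following minimal Python sketch implements the standard ChaCha quarterround of [Ber08a], its inverse, and the alternation of column and diagonal rounds. The helper names (`apply_round`, `forward`, `backward`) are illustrative choices, not identifiers from the paper.

```python
MASK32 = 0xFFFFFFFF
COLUMNS = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
DIAGONALS = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]


def rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32


def quarterround(a, b, c, d):
    a = (a + b) & MASK32; d = rotl(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl(b ^ c, 7)
    return a, b, c, d


def inverse_quarterround(a, b, c, d):
    # Undo the four ARX steps of the second half, then those of the first half.
    b = rotr(b, 7) ^ c; c = (c - d) & MASK32
    d = rotr(d, 8) ^ a; a = (a - b) & MASK32
    b = rotr(b, 12) ^ c; c = (c - d) & MASK32
    d = rotr(d, 16) ^ a; a = (a - b) & MASK32
    return a, b, c, d


def apply_round(state, round_number, inverse=False):
    """Odd rounds act on columns, even rounds on diagonals; returns a new state."""
    s = list(state)
    qr = inverse_quarterround if inverse else quarterround
    for i, j, k, l in (COLUMNS if round_number % 2 == 1 else DIAGONALS):
        s[i], s[j], s[k], s[l] = qr(s[i], s[j], s[k], s[l])
    return s


def forward(state, rounds):
    """Apply rounds 1..rounds to the initial state."""
    for r in range(1, rounds + 1):
        state = apply_round(state, r)
    return state


def backward(state, from_round, to_round):
    """Go from the from_round-round state back to the to_round-round state."""
    for r in range(from_round, to_round, -1):
        state = apply_round(state, r, inverse=True)
    return state
```

For example, `backward(forward(x, 6), 6, 3)` returns the same state as `forward(x, 3)`, which is the property the backward part of the attack relies on (half rounds such as 3.5 are not covered by this sketch).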
Existing idea of Differential Attack. We denote the $j$-th bit of the $i$-th word of the state matrix $X$ after $r$ rounds by $X^{(r)}_i[j]$. In the differential attack against ChaCha, we apply an input difference $\Delta^{(0)}_i[j]$ to the $j$-th bit of the $i$-th word of the initial state matrix $X$ (which is, by notation, $X^{(0)}$), producing $X'$. To be precise, injecting the input difference amounts to complementing that bit of the state matrix. That is, $X'_i[j] = X_i[j] \oplus 1$. Now, the round function is applied to both $X$ and $X'$ for $r$ rounds. In $X^{(r)}$ and $X'^{(r)}$, the difference is observed at the $q$-th bit of the $p$-th word, i.e., $\Delta^{(r)}_p[q] = X^{(r)}_p[q] \oplus X'^{(r)}_p[q]$. We compute the probability $\Pr(\Delta^{(r)}_p[q] = 0 \mid \Delta^{(0)}_i[j] = 1)$, which we write in the form $\frac{1}{2}(1 + \epsilon_d)$, where $\epsilon_d$ is called the forward bias. We aim to find an (ID, OD) pair $(\Delta^{(0)}_i[j], \Delta^{(r)}_p[q])$ for which $\epsilon_d$ is high, and consequently use it as a distinguisher.
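The forward bias can be estimated empirically by sampling random states, as in the following sketch. It assumes the standard ChaCha round structure and, for simplicity, randomises all twelve non-constant words (key and IV); the function name `estimate_forward_bias` and its parameters are hypothetical, and the sample sizes needed for small biases are far larger than what plain Python handles comfortably.

```python
import random

MASK32 = 0xFFFFFFFF
CONSTANTS = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]
COLUMNS = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
DIAGONALS = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]


def rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarterround(s, a, b, c, d):
    s[a] = (s[a] + s[b]) & MASK32; s[d] = rotl(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = rotl(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32; s[d] = rotl(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = rotl(s[b] ^ s[c], 7)


def run_rounds(state, rounds):
    s = list(state)  # work on a copy so the caller's state is untouched
    for r in range(1, rounds + 1):
        for indices in (COLUMNS if r % 2 == 1 else DIAGONALS):
            quarterround(s, *indices)
    return s


def estimate_forward_bias(id_pos, od_pos, rounds, samples, seed=0):
    """Estimate eps_d, where Pr(output difference bit = 0) = (1 + eps_d) / 2."""
    (i, j), (p, q) = id_pos, od_pos
    rng = random.Random(seed)
    zeros = 0
    for _ in range(samples):
        x = CONSTANTS + [rng.getrandbits(32) for _ in range(12)]
        x_prime = list(x)
        x_prime[i] ^= 1 << j  # inject the input difference
        y, y_prime = run_rounds(x, rounds), run_rounds(x_prime, rounds)
        if (((y[p] ^ y_prime[p]) >> q) & 1) == 0:
            zeros += 1
    return 2 * zeros / samples - 1
```

A call such as `estimate_forward_bias((13, 0), (1, 6), rounds=2, samples=10000)` returns a value in $[-1, 1]$; the positions used here are placeholders rather than a distinguisher claimed by the paper.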
In this approach of differential attack, the idea of probabilistic neutral bits (PNBs) [AFK + 08] plays a vital role in the backward direction. From the output key streams $Z = X \boxplus X^{(R)}$ and $Z' = X' \boxplus X'^{(R)}$, to find a PNB, we proceed as follows. A key bit of $X$, $X'$ is complemented, producing $\overline{X}$, $\overline{X}'$ respectively. Then we compute $Z - \overline{X}$ and $Z' - \overline{X}'$ and execute the reverse round function for $R - r$ rounds on both of them, obtaining the matrices $Y$ and $Y'$ respectively. Let $\Gamma_p[q] = Y_p[q] \oplus Y'_p[q]$. Now if the probability of the event $\Gamma_p[q] = \Delta^{(r)}_p[q]$ demonstrates a bias which is higher than a predetermined threshold $\gamma$, we call the complemented key bit a probabilistic neutral bit (PNB). Otherwise we call it a significant key bit or non-PNB. In the pre-processing stage of the attack, the probabilistic neutral bits are identified by estimating the above mentioned probability through experiments.
Next, in the actual attack, the attacker collects $N$ samples of output key streams $Z$, $Z'$, assigns random values to the PNBs of $X$, $X'$, and first aims to guess the significant key bits correctly. Let us compare the scenarios in which the significant key bits are guessed correctly and in which they are not. We denote by $\hat{X}$, $\hat{X}'$ the states where the significant key bits have correct values and the PNBs are random. We run the reverse round operations for $R - r$ rounds on $Z - \hat{X}$, $Z' - \hat{X}'$ and obtain $\hat{Y}$, $\hat{Y}'$. We observe the difference $\hat{\Gamma}_p[q] = \hat{Y}_p[q] \oplus \hat{Y}'_p[q]$ and compute $\Pr(\hat{\Gamma}_p[q] = 0) = \frac{1}{2}(1 + \hat{\varepsilon})$. On the other hand, $\tilde{X}$, $\tilde{X}'$ are two states where both the significant bits and the PNBs are random. We compute $Z - \tilde{X}$, $Z' - \tilde{X}'$, run the reverse round operation for $R - r$ rounds to obtain $\tilde{Y}$, $\tilde{Y}'$, and compute $\Pr(\tilde{\Gamma}_p[q] = 0) = \frac{1}{2}(1 + \tilde{\varepsilon})$. Between $\hat{\varepsilon}$ and $\tilde{\varepsilon}$, $\hat{\varepsilon}$ would have a noticeable value, since the significant key bits are correct, but $\tilde{\varepsilon}$ would be approximately 0. After successfully identifying the significant bits, we may recover the PNBs by exhaustive search. The bias of the event $(\Gamma_p[q] = \Delta^{(r)}_p[q])$ is usually called the backward bias and denoted by $\epsilon_a$. Under some assumptions of independence (which are approximately valid and logical), the bias $\hat{\varepsilon}$ can be approximated by $\epsilon_a \epsilon_d$.
Complexity of the attack. We use hypothesis testing to distinguish the correct guess of the significant key bits from a wrong guess and to find the complexity. Consider the following two hypotheses:
• $H_0$: The guessed significant key bits are incorrect, i.e., the bias related to the key bits is 0.
• $H_1$: The guessed significant key bits are correct, i.e., their combined bias is $\hat{\varepsilon}$.
From our $N$ samples, we keep track of how many times the observed difference satisfies $\hat{\Gamma}_p[q] = 0$. Suppose the count is $x$. Then, for a threshold $T$, a reasonable decision rule is of the form: reject $H_0$ if $x > T$, and retain $H_0$ if $x \leq T$. In this regard we have the following two types of errors.
1. Type I error: The null hypothesis ($H_0$) is rejected despite being true, i.e., for an incorrect guess of the significant key bits, we obtain $x > T$ (False Alarm).
2. Type II error: The null hypothesis is retained despite being false (Non-detection). Here, for the correct guess of the significant key bits, we obtain $x \leq T$.
The authors of [AFK + 08] restricted the probability of non-detection error to be less than or equal to $1.3 \times 10^{-3}$. Based on this, using the Neyman-Pearson lemma, the value of $N$ is derived as:
$$N \approx \left( \frac{\sqrt{\alpha \log 4} + 3\sqrt{1 - \epsilon_a^2 \epsilon_d^2}}{\epsilon_a \epsilon_d} \right)^2$$
Here $\alpha$ is such that the probability of false alarm is $2^{-\alpha}$. Let us denote the total key size of the cipher by $k$ bits and the number of significant key bits in the key by $m$. Then the complexity formula given by [AFK + 08] was $2^m \cdot N + 2^{k-\alpha}$. However, in the work [DGSS22] it has been shown that the accurate complexity formula should be $2^m \cdot N + 2^{k-\alpha} + 2^{k-m}$.
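As a quick numerical aid, the sketch below evaluates the sample estimate of [AFK + 08], $N \approx \big((\sqrt{\alpha \log 4} + 3\sqrt{1 - \epsilon_a^2\epsilon_d^2})/(\epsilon_a\epsilon_d)\big)^2$, together with both complexity formulas. The function names and the `include_pnb_search` flag are illustrative, and $\log$ is taken to be the natural logarithm.

```python
import math


def samples_needed(eps_d, eps_a, alpha):
    """Sample estimate of [AFK + 08] for combined bias eps_d * eps_a."""
    eps = eps_d * eps_a
    numerator = math.sqrt(alpha * math.log(4)) + 3 * math.sqrt(1 - eps ** 2)
    return (numerator / eps) ** 2


def log2_complexity(k, m, n_samples, alpha, include_pnb_search=True):
    """log2 of 2^m * N + 2^(k - alpha), plus 2^(k - m) as in [DGSS22] if requested."""
    total = (2.0 ** m) * n_samples + 2.0 ** (k - alpha)
    if include_pnb_search:
        total += 2.0 ** (k - m)
    return math.log2(total)
```

With the biases reported later for the toy cipher ($\epsilon_d = 0.9167$, $\epsilon_a = 0.377$, $\alpha = 11$), `samples_needed` evaluates to about 378, and `log2_complexity(32, 16, 378, 11)` is close to 24.7.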

Critical analysis of the previous works and a novel attack on 6-round ChaCha256
Based on the correction provided by [DGSS22] to the complexity formula, we identify a limitation in the fundamental cryptanalytic approach of [AFK + 08]: in this approach, the overall complexity of the attack can never go below $2^{k/2}$. Let $k - m$ be the number of PNBs in the key. As mentioned above, the complexity is then given by $2^m \cdot N + 2^{k-m} + 2^{k-\alpha}$. If $m < k/2$, then $k - m > k/2$, so $2^{k-m} > 2^{k/2}$ and hence $2^m \cdot N + 2^{k-m} + 2^{k-\alpha} > 2^{k/2}$. On the other hand, $m \geq k/2$ implies $2^m \cdot N \geq 2^{k/2}$, so $2^m \cdot N + 2^{k-m} + 2^{k-\alpha} \geq 2^{k/2}$. Therefore, in both the cases $m < k/2$ and $m \geq k/2$, the complexity is greater than or equal to $2^{k/2}$. This is where we come up with a novel cryptanalytic idea of exploiting multiple (ID, OD) pairs to bring the complexity below $2^{k/2}$ for a $k$-bit key, which we present in this section, particularly in subsection 3.2.
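The lower bound can also be checked numerically. The sketch below drops the false alarm term and sets $N = 1$, both of which can only lower the total, and then scans all values of $m$; the function names are hypothetical.

```python
import math


def log2_lower_bound(k, m):
    """log2 of 2^m + 2^(k - m): the single-pair complexity with N = 1 and no false-alarm term."""
    return math.log2(2 ** m + 2 ** (k - m))


def best_split(k):
    """Return (m, log2 complexity) minimising the bound over m = 0..k."""
    return min(((m, log2_lower_bound(k, m)) for m in range(k + 1)), key=lambda t: t[1])
```

For $k = 256$ the minimum is attained at $m = 128$ with value $2^{129}$, so even under these optimistic simplifications a single (ID, OD) pair cannot go below $2^{k/2}$.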

Correcting the complexity calculations of some previous works
In the literature there are several works in which the achieved complexity appears to be less than $2^{k/2}$. The reason is that the authors used the complexity formula given in [AFK + 08]. In fact, there are several previous works on differential attacks on ChaCha whose complexities differ significantly from their claims if we use the complexity formula given in [DGSS22].

Explanations for the miscalculation in the complexity calculation.
The key recovery is done in two stages. In the first stage, we recover the significant key bits, for which the required complexity is $2^m \cdot N + 2^{k-\alpha}$. In the second stage we recover the PNBs by exhaustive search, for which the complexity is $2^{k-m}$. In the formula given by [AFK + 08], this $2^{k-m}$ term was missing, i.e., the complexity of recovering the PNBs was not included in the formula; it was later incorporated in [DGSS22]. In the complexity analysis of the following works, $2^{k-m}$ is significantly higher than $2^m \cdot N + 2^{k-\alpha}$, so the actual complexity is significantly higher than claimed. We explain this in a little more detail before proceeding further.
• [AFK + 08]: In the work of [AFK + 08] itself, the authors attacked the 7-round and 6-round versions of ChaCha256. For the 6-round attack, the authors used 147 PNBs, i.e., the complexity of the second stage is $2^{k-m} = 2^{147}$. Unfortunately, because they used the formula of [AFK + 08], the authors claimed the attack complexity to be $2^m \cdot N + 2^{k-\alpha} = 2^{139}$, whereas the actual complexity is higher.

• [SZFW12]: In the last step of their attack on 6-round ChaCha256, they have to recover 139 PNBs, which requires a complexity of $2^{139}$. However, the complexity claimed was $2^{136}$.
• [CN20]: Next, in 2020, the authors of [CN20] claimed to further improve the attacks on 6-round ChaCha256 by using a better distinguisher, and proposed a cryptanalytic idea achieving complexities of $2^{102}$ and $2^{104}$. Again, they used 210 and 212 PNBs respectively in these attacks, because of which the actual complexities should be $2^{210}$ and $2^{212}$.
We mention the claimed complexities and the actual complexities of all these attacks in Table 2, along with the complexity of our newly proposed technique in the next subsection.

Our cryptanalytic technique involving multiple (ID, OD) pairs
Now we propose a novel cryptanalytic technique using multiple (ID, OD) pairs, with the help of which one can achieve a complexity less than $2^{k/2}$. Assume that we have $q$ different (ID, OD) pairs, each of which gives a high bias (how the (ID, OD) pairs are obtained and how the number $q$ is selected is explained in subsection 3.4). Here we exploit all these pairs in the attack. Let us denote these pairs as $(ID_1, OD_1), (ID_2, OD_2), \ldots, (ID_q, OD_q)$.

Pre-processing Stage: Partitioning the key bits into q + 1 subsets
In this approach, we partition the set of all key bits into $(q + 1)$ subsets $S_1, S_2, \ldots, S_{q+1}$, such that for $i = 1, 2, \ldots, q$, $S_i$ is the set of significant key bits corresponding to $(ID_i, OD_i)$. Further, $S_{q+1}$ is the set of all remaining key bits.
Stage 1: We put the input difference at $ID_1$, run $r$ ChaCha Round functions and observe the difference at $OD_1$. Let us call it $\Delta_{OD_1}$. Then, we run the algorithm on both matrices for $R - r$ more rounds (i.e., $R$ rounds in total) and generate $Z$, $Z'$. Now, by changing a single key bit in $X$ and $X'$, we generate $\overline{X}$ and $\overline{X}'$. We compute $Z - \overline{X}$ and $Z' - \overline{X}'$, run the reverse round for $R - r$ rounds and check the difference at $OD_1$. Let us call it $T_{OD_1}$. If the bias of the event $(\Delta_{OD_1} = T_{OD_1})$ is less than a predetermined threshold $\gamma_1$, we consider the key bit to be in $S_1$, i.e., among the significant bits corresponding to $(ID_1, OD_1)$. We repeat this process for each key bit and thus construct $S_1$.
For $i = 2$ to $q$, we do the following:
Stage $i$: As above, by putting the input difference at $ID_i$, we run both $X$ and $X'$ for $r$ rounds, observe the difference at $OD_i$, and call it $\Delta_{OD_i}$. Then we generate $Z$, $Z'$. Next, for each of the key bits which are not in any of $S_1, S_2, \ldots, S_{i-1}$, we proceed as follows. Changing the key bit in the initial matrices, we obtain $\overline{X}$ and $\overline{X}'$, then compute $Z - \overline{X}$ and $Z' - \overline{X}'$ and run the reverse algorithm for $R - r$ rounds to check the difference $T_{OD_i}$. The bits for which the bias of the event $(\Delta_{OD_i} = T_{OD_i})$ is less than a predetermined threshold $\gamma_i$ are included in $S_i$.
Stage $q + 1$: After the construction of $S_1, S_2, \ldots, S_q$, the remaining key bits, which are not in any of the sets $S_1, S_2, \ldots, S_q$, are assembled in the set $S_{q+1}$.
By construction, the key bits of $S_i$ are chosen in stage $i$ from those which are not in $S_1, S_2, \ldots, S_{i-1}$, so the intersection of $S_i$ with $S_1 \cup S_2 \cup \cdots \cup S_{i-1}$ is empty. Since this is true for every $i \in \{1, 2, \ldots, q + 1\}$, the sets $S_1, S_2, \ldots, S_{q+1}$ are pairwise disjoint.
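The partitioning logic of the pre-processing stage can be sketched as follows. The callback `neutrality_bias(i, bit)` is a hypothetical stand-in for the experimental estimate of the bias of the event $(\Delta_{OD_i} = T_{OD_i})$ when key bit `bit` is flipped; how that estimate is obtained is outside the sketch.

```python
def partition_key_bits(k, q, neutrality_bias, thresholds):
    """Split key bits 0..k-1 into [S_1, ..., S_q, S_{q+1}].

    neutrality_bias(i, bit): estimated bias for pair i (1-based) when `bit` is flipped.
    thresholds[i - 1]: the threshold gamma_i; a bias below it marks the bit as significant.
    """
    remaining = list(range(k))
    subsets = []
    for i in range(1, q + 1):
        s_i = [bit for bit in remaining if neutrality_bias(i, bit) < thresholds[i - 1]]
        chosen = set(s_i)
        # Bits already placed in S_i are never considered in later stages.
        remaining = [bit for bit in remaining if bit not in chosen]
        subsets.append(s_i)
    subsets.append(remaining)  # S_{q+1}: whatever is left for exhaustive search
    return subsets
```

Because every stage only looks at the bits left over from the previous stages, the returned subsets are disjoint and cover all $k$ key bits.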

Online Phase
In the differential attack model we assume that the attacker has control over the IVs. The first stage of the cryptanalytic method consists of collecting the data.
Data Collection: For all $i = 1, 2, \ldots, q$, the following steps are executed. By choosing $N_i$ different pairs of IVs $(v, v')$ such that the difference is at the position $ID_i$, the attacker runs the algorithm through $R$ rounds and collects the outputs $Z$, $Z'$. Thus, the data collected consists of $N_i$ key stream pairs $(Z, Z')$ for each $i = 1, 2, \ldots, q$.

Recovering the key bits:
The key bits are recovered in $(q + 1)$ stages. For each $i = 1$ to $q$, in the $i$-th stage, we recover the key bits of $S_i$.
Stage 1: For each of the collected $N_1$ pairs of $Z$, $Z'$, which are generated from $X$, $X'$ with the input difference corresponding to $(ID_1, OD_1)$, the steps are as follows. The attacker guesses the key bits of $S_1$ and assigns random values to the remaining $k - |S_1|$ key bits. Thus, two states $\hat{X}$ and $\hat{X}'$ are produced. Now the attacker runs $Z - \hat{X}$ and $Z' - \hat{X}'$ backwards for $R - r$ rounds and checks the difference at the $OD_1$ position. Let us denote it by $\hat{T}_{ID_1}$. If, out of the $N_1$ pairs, the number of times $\hat{T}_{ID_1} = 0$ occurs is more than a predetermined threshold $T_1$, the guess is considered to be correct for the $S_1$ key bits, and the attack proceeds to stage 2.
If the count does not cross the threshold, the attacker takes a new guess of the $S_1$ key bits and repeats the procedure.
For $i = 2$ to $q$, the following procedure is performed:
Stage $i$: By the beginning of the $i$-th stage, the attacker has already recovered the key bits of $S_1, S_2, \ldots, S_{i-1}$. Now, for each of the $N_i$ pairs of $Z$, $Z'$ corresponding to $(ID_i, OD_i)$, the attacker puts the already recovered values in the key bits of $S_1 \cup S_2 \cup \cdots \cup S_{i-1}$, guesses $S_i$, and puts random values in the remaining $(k - |S_1| - |S_2| - \cdots - |S_i|)$ key bits. Then, $Z - \hat{X}$ and $Z' - \hat{X}'$ are run backwards for $R - r$ rounds and their difference at the $OD_i$ position is observed, which we denote by $\hat{T}_{ID_i}$.
If, out of these $N_i$ pairs, the number of times $\hat{T}_{ID_i} = 0$ occurs crosses a predetermined threshold $T_i$, the guess for the $S_i$ key bits is considered to be correct, and the algorithm proceeds to stage $i + 1$. Otherwise, we proceed with a new guess and the process is repeated.
Stage $q + 1$: In this stage the remaining key bits $S_{q+1}$ are obtained by exhaustive search.
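The control flow of the online phase can be summarised by the skeleton below. It is a simplified sketch: `zero_count(stage, partial_key)` and `verify_key(key)` are hypothetical callbacks standing in for the backward computation over the $N_i$ collected pairs and for the final key check, and the first guess that crosses the threshold is accepted without any backtracking on false alarms.

```python
from itertools import product


def recover_key(k, subsets, thresholds, zero_count, verify_key):
    """Staged key recovery over subsets = [S_1, ..., S_q, S_{q+1}].

    zero_count(stage, partial_key): number of samples with zero observed difference,
        where partial_key maps the guessed/recovered bit positions to their values.
    verify_key(key): True if the full k-bit key (list of bits) is correct.
    Returns the key as a list of bits, or None if some stage detects nothing.
    """
    known = {}
    for stage, bits in enumerate(subsets[:-1], start=1):
        for guess in product((0, 1), repeat=len(bits)):
            candidate = {**known, **dict(zip(bits, guess))}
            if zero_count(stage, candidate) > thresholds[stage - 1]:
                known = candidate  # accept this guess and move to the next stage
                break
        else:
            return None  # non-detection: no guess crossed the threshold
    # Stage q + 1: exhaustive search over the remaining bits.
    for guess in product((0, 1), repeat=len(subsets[-1])):
        candidate = {**known, **dict(zip(subsets[-1], guess))}
        key = [candidate[bit] for bit in range(k)]
        if verify_key(key):
            return key
    return None
```

Stage $i$ tries at most $2^{|S_i|}$ guesses, each costing one pass over $N_i$ samples, which is where a term of the form $2^{|S_i|} \cdot N_i$ comes from.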

Complexity and error probability in our new approach
Here we present the complexity analysis in line of [AFK + 08], with certain modifications. We aim to choose the data complexity in such a way that the probabilities of the two types of errors are within our desired limit.

Non-detection error in each stage:
In each of the first $q$ stages, a non-detection error can occur, i.e., the count may not cross the threshold even if the guess is correct. For simplicity, we aim to keep the non-detection error probability the same in each stage. Let us denote this by $Pr^*_{nd}$ and the overall non-detection probability by $Pr_{nd}$. First, we find the relation between $Pr^*_{nd}$ and $Pr_{nd}$. In each individual stage, the probability that the correct key bits are detected is $(1 - Pr^*_{nd})$. Therefore, the probability that the correct key bits are detected in all the $q$ stages is $(1 - Pr^*_{nd})^q$. Thus, $Pr_{nd} = 1 - (1 - Pr^*_{nd})^q \approx q \cdot Pr^*_{nd}$.

False alarm error in each stage:
Similarly, a false alarm can occur in each of the $q$ stages. Let us denote the error probability of the $i$-th stage by $Pr_{fa_i}$. If there is a false alarm in the $i$-th stage, we proceed to the $(i + 1)$-th stage. However, in our attack approach, we choose the probability of false alarm error to be so small that it makes a negligible contribution to the overall complexity.

Complexity in the i-th stage:
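In the $i$-th stage, the complexity of finding the correct significant key bits corresponding to the $i$-th (ID, OD) pair can be expressed as $2^{m_i} \cdot N_i$, where $m_i = |S_i|$. In particular, in our attack we assign values of $\alpha_i$ such that the false alarm error does not have a significant influence on the complexity. Therefore the total complexity can be written as
$$\sum_{i=1}^{q} 2^{m_i} \cdot N_i + 2^{|S_{q+1}|}. \tag{4}$$

Derivation of $N_i$:
Here we derive the data complexity of each stage with the aim that the overall non-detection error probability has the same upper bound as in the previous works ($1.3 \times 10^{-3}$). Let a random variable $X$ follow a binomial distribution with $N_i$ trials and success probability $p$.
If the null hypothesis is true, i.e., the guessed significant key bits are incorrect, then $p = \frac{1}{2}$. If the alternative hypothesis is true, we have $p = \frac{1}{2}(1 + \epsilon)$. We have to distinguish between these two distributions, and we approximate both of them by normal distributions. We then have to decide a threshold $T_i$ for which the two errors mentioned above are upper bounded by the desired values, i.e., $Pr^*_{nd}$ and $Pr_{fa_i} = 2^{-\alpha_i}$. Let us denote the two random variables corresponding to $H_0$ and $H_1$ by $X_0$ and $X_1$, so that $X_0$ has mean $N_i/2$ and standard deviation $\sqrt{N_i}/2$, while $X_1$ has mean $N_i(1 + \epsilon)/2$ and standard deviation $\sqrt{N_i(1 - \epsilon^2)}/2$.
The test statistic used is $Z_j = (X_j - \text{mean})/\text{standard deviation}$ ($j = 0, 1$), which follows a standard normal distribution with distribution function $\Phi$. We consider the probability of false alarm error to be upper bounded by $2^{-\alpha_i}$, i.e.,
$$\Pr(X_0 > T_i) = 1 - \Phi\left( \frac{T_i - N_i/2}{\sqrt{N_i}/2} \right) \leq 2^{-\alpha_i}. \tag{5}$$
On the other hand, the probability of non-detection is $Pr^*_{nd}$. Hence
$$\Pr(X_1 \leq T_i) = \Phi\left( \frac{T_i - N_i(1 + \epsilon)/2}{\sqrt{N_i(1 - \epsilon^2)}/2} \right) = Pr^*_{nd}.$$
Thus, $T_i$ can be expressed as follows.
$$T_i = \frac{N_i(1 + \epsilon)}{2} + \frac{\sqrt{N_i(1 - \epsilon^2)}}{2}\, \Phi^{-1}(Pr^*_{nd}). \tag{6}$$
Substituting this expression from Equation 6 into Equation 5, taken with equality, we get
$$1 - \Phi\left( \epsilon\sqrt{N_i} + \sqrt{1 - \epsilon^2}\, \Phi^{-1}(Pr^*_{nd}) \right) = 2^{-\alpha_i}.$$
Taking the natural logarithm on both sides with the tail approximation $1 - \Phi(x) \approx e^{-x^2/2}$, and using the equality $\frac{1}{q} Pr_{nd} = Pr^*_{nd}$, we have
$$N_i \approx \left( \frac{\sqrt{\alpha_i \log 4} - \Phi^{-1}(Pr_{nd}/q)\, \sqrt{1 - \epsilon^2}}{\epsilon} \right)^2.$$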
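The per-stage quantities can be evaluated numerically as in the sketch below. It assumes that the stage-wise sample count has the same form as the single-pair estimate of [AFK + 08], with the constant 3 replaced by $-\Phi^{-1}(Pr_{nd}/q)$, and that the threshold is placed $\Phi^{-1}(Pr_{nd}/q)$ standard deviations from the mean under $H_1$; both are assumptions of this sketch, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def nondetection_quantile(pr_nd=1.3e-3, q=1):
    """Phi^{-1}(Pr_nd / q): a negative number for small error probabilities."""
    return NormalDist().inv_cdf(pr_nd / q)


def stage_samples(eps, alpha, pr_nd=1.3e-3, q=1):
    """Assumed form of N_i for bias eps and false-alarm probability 2^-alpha."""
    z = nondetection_quantile(pr_nd, q)
    return ((math.sqrt(alpha * math.log(4)) - z * math.sqrt(1 - eps ** 2)) / eps) ** 2


def stage_threshold(n, eps, pr_nd=1.3e-3, q=1):
    """Assumed form of T_i under the normal approximation of the H_1 distribution."""
    z = nondetection_quantile(pr_nd, q)
    return n * (1 + eps) / 2 + z * math.sqrt(n * (1 - eps ** 2)) / 2
```

For $q = 1$ the quantile is close to $-3$, which recovers the constant used in [AFK + 08]; for $q = 3$ it lies between $-3.4$ and $-3.3$, so each stage needs somewhat more data than a single-pair attack with the same bias.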

Key recovery of 6-round ChaCha256
We use this divide-and-conquer kind of approach to produce an attack against the 6-round version of ChaCha256. In this process we use three (ID, OD) pairs.

Choosing the (ID, OD) pairs
In section 6.1 of [BLT20], the authors reported four distinguishers for the 3.5-th round, which they found experimentally. Among these four, we use three pairs ($q = 3$) in our attack. However, the authors used suitable IVs such that the number of differences after the first round is minimum, and thus obtained a bias of 0.00317 after 3.5 rounds. To obtain one suitable IV we need $2^5$ random trials on average. To avoid these $2^5$ extra trials we relax the minimum difference criterion after one round. As a result we obtain a bias ($\epsilon_d$) of 0.0005 by experimenting over $2^{40}$ random key-IV pairs. Note that we want to keep the non-detection error probability the same as in the previous works, i.e., $Pr_{nd} = 1.3 \times 10^{-3}$. Thus, $\Phi^{-1}[Pr_{nd}/q] = \Phi^{-1}[(1.3 \times 10^{-3})/3] \approx -3.4$. Now we discuss how many (ID, OD) pairs we need to consider, and in which order, so that we can produce the best attack. To explain this, we study the complexity formula given in Equation 4.
Explanation for using three (ID, OD) pairs: We consider a value of $q$ to be suitable if the complexity of the last stage (recovering the remaining key bits of $S_{q+1}$ via exhaustive search) is almost negligible compared to the complexity of the first $q$ stages (recovering $S_1, S_2, \ldots, S_q$). For example, in our case, if we had taken $q = 2$, then the complexity to recover $S_1$ and $S_2$ would have been approximately $2^{99}$, whereas the complexity of the last stage might be as high as $2^{142}$, since we would have to recover 142 remaining bits in the last stage. So $q = 2$ is not a suitable choice. Thus we go for $q = 3$, where the complexity of the last stage is $2^{92}$, which is much less than $2^{99}$. We could proceed to $q = 4$, but that does not improve the overall complexity further, since the complexity of the first stage still remains $2^{99}$.
Choosing the order of the (ID, OD) pairs: In Equation 4, among all the terms of the form $2^{m_i} \cdot N_i$, the first term $2^{m_1} \cdot N_1$ plays the vital part, and the other terms are significantly smaller, i.e., their contribution to the overall complexity is much less. The reason is that in the later stages the number of key bits we recover is smaller and, since the already recovered bits are correctly guessed, the bias increases and therefore $N_i$ decreases. So we aim to make the term $2^{m_1} \cdot N_1$ as small as possible.
Thus, we have to focus on which (ID, OD) pair should be considered as the first pair $(ID_1, OD_1)$, since it has the primary role in deciding the complexity. Among all the pairs, we choose the (ID, OD) pair for which we can achieve the minimum value of $2^{m_1} \cdot N_1$. In our case, since each of the three pairs produces the same backward and forward biases when considered as the first pair, the order does not matter much.

Particulars of the cryptanalysis
First Stage: Corresponding to $(ID_1, OD_1)$, we set the threshold 0.565 and obtain 58 significant bits with $2^{20}$ samples.

Implementing the attacks on a toy version of ChaCha
It is evident that the complexities we discussed so far are at a level that cannot immediately be implemented to demonstrate the complete attack. On the other hand, as in many other cryptanalytic efforts, there are several statistical assumptions while we estimate the complexity and success probability of the attack. In this direction let us explain the importance of developing ToyChaCha, a toy version of ChaCha. The following points discuss several aspects of proposing ToyChaCha and implementing the attacks.
• The differential attacks that have been proposed so far are based on probabilistic neutral bits, and generally follow the complexity formula proposed in [AFK + 08]. Unfortunately, this formula has been used almost as a black box in several subsequent works, and the accuracy of this complexity estimation has not been substantially verified. The formula is based on several assumptions and approximations, and so far there is no experimental validation of the entire approach. Implementing these cryptanalytic approaches would provide convincing evidence for the validity of the entire attack procedure and of the complexity calculation.
• The complexity formula is based on an approximation of binomial distribution to normal distribution. Thus how closely the approximation helps us to get the actual complexity can be experimentally validated by an application on a toy cipher.
• Note that the authors of [AFK + 08] claimed that the attack succeeds for at least half of all possible keys. However, there has been no detailed investigation of the exact proportion of keys for which the attack is applicable. Implementing the attack on the toy version helps us obtain a more accurate measure of the success probability, at least for the toy version.
• In the formula, the probability of false alarm plays a vital role. In the work of [AFK + 08], the authors did not accurately estimate this probability of false alarm, but instead considered an upper bound for it. This causes the derived complexity to deviate from the actual value. In ToyChaCha we can experimentally measure the actual probability of false alarm error and investigate quantitatively how it may revise the complexity estimation.

Structure of ToyChaCha
The design of ToyChaCha has a structure similar to that of the original cipher, except for the constant vectors and the quarterround function. Here each entry (word) of the matrix is of 8 bits. We consider a 32-bit key and replicate it on the next row. The four constants chosen are $c_0$ = 0x65, $c_1$ = 0x6e, $c_2$ = 0x32, $c_3$ = 0x74. The equation of the quarterround function which transforms a vector (a, b, c, d) to

Implementation of key recovery attack on 3.5 round using ideas from [AFK + 08] and [Mai16]
On this ToyChaCha, we implement the fundamental key recovery attack given in [AFK + 08] and the further improvement given by Maitra [Mai16] using the chosen IV approach. After that, we implement our cryptanalytic technique using multiple (ID, OD) pairs as well. The experiments were run on the following machine: Intel(R) Xeon(R) W-2265 CPU @ 3.50GHz with the Ubuntu 20.04.4 LTS operating system.

Approach of [AFK + 08]
Using a single-bit distinguisher on 2 rounds, we produce an attack on 3.5-round ToyChaCha.
In this process, we use the (ID, OD) as (∆ The forward bias observed here is $\epsilon_d = 0.9167$ while the backward bias is $\epsilon_a = 0.377$. Therefore, for approximate calculations, $\hat{\varepsilon} = \epsilon_d \cdot \epsilon_a = 0.343$, and the number of significant key bits is $m = 16$. We achieve the best complexity for $\alpha = 11$. For this, $N = 378$, $T = 227$ and the complexity is $2^{24.67}$.

Implementation:
In the implementation experiment, we execute the code with $2^{15}$ different keys. The average time required to recover the key is 0.9658 seconds, and the complexity is $2^{23.60}$, which is close to (slightly less than) the theoretically derived value $2^{24.67}$. Out of these keys, 32705 keys were successfully recovered. Thus, the success probability of the attack is 99.81%. To estimate the false alarm probability, for each of the $2^{15}$ keys we count the number of times a false alarm occurred, divide it by the number of guesses, and then take the average. The resulting false alarm probability is as low as 0.000341, which is less than the theoretically claimed upper bound $2^{-11}$. The source code of the attack program is available on GitHub [Gar22] and a summary of the experimental evidence is provided in Table 3.

Approach of Maitra [Mai16]
In the chosen IV approach of Maitra [Mai16], during the pre-processing stage, for all possible keys in the input difference column, the attacker lists the IVs that produce the minimum number of differences between $X$ and $X'$ after the first round. We call them "suitable IVs", following [Mai16]. This choice of IV improves the forward bias $\epsilon_d$. In this approach the author did not include any key bit from the input difference column in the PNB set. In the actual attack, while guessing the key, the attacker uses the corresponding IVs from the prepared list. We implement this chosen IV approach on ToyChaCha. We choose the input difference position $\Delta^{(0)}_{13}[0]$ and observe the output difference at $\Delta^{(2)}_1[6]$ after 2 rounds. The minimum number of differences between $X$ and $X'$ after the first round is 10. By setting the threshold to 0.45, we obtain the following PNB set: {7, 6, 5, 4, 3, 2, 1, 0, 19, 18, 31, 30, 26, 25, 24}.
Here, in the pre-processing stage, we prepare the list of key-IV pairs. Since 8 key bits are involved in the input difference column, there are 256 possible values. For 8 of these keys we do not get any IV which gives 10 differences after the first round. We call them strong keys, following the terminology used in [BLT20]. For each of the remaining 248 weak key values of $k_1$, we find one IV value $v_1$ and add the pair to a list, so the list contains 248 key-IV pairs. In the program, after guessing a value of the significant key bits, we look up the corresponding IV in the prepared list and then carry out the attack. In this approach, we observe $\epsilon_d = 0.98$ and $\epsilon_a = 0.49$. The PNB set has size 15. So, based on the complexity formula, we get $N = 185$, $T = 119$ and a complexity of $2^{24.65}$ for $\alpha = 11$.

Implementation:
In the implementation, we run the program over $2^{15}$ different keys. The average time required to recover the key is 0.866 seconds, and the complexity is $2^{23.50}$, which is slightly less than the theoretical estimate of $2^{24.65}$. Out of the $2^{15}$ keys, 32673 were successfully recovered, so the success probability of the attack is 99.71%. To estimate the false alarm probability, for each of the $2^{15}$ keys we count the number of false alarms, divide it by the number of guesses, and then take the average. We estimate a false alarm probability of 0.00015, which is below the theoretical upper bound $2^{-11} \approx 0.000488$. The source code of the attack program is available on GitHub [Gar22], and the results are summarised in Table 3.
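The success and false alarm figures reported for the two ToyChaCha experiments follow from simple ratios. The sketch below shows the arithmetic; the helper names `success_rate` and `mean_false_alarm_rate` are hypothetical and are not taken from the code in [Gar22].

```python
def success_rate(recovered, trials):
    """Fraction of keys that the attack recovered."""
    return recovered / trials

def mean_false_alarm_rate(false_alarms, guesses):
    """Average, over keys, of (false alarms / guesses) for each key."""
    ratios = [f / g for f, g in zip(false_alarms, guesses)]
    return sum(ratios) / len(ratios)

TRIALS = 2 ** 15
BOUND = 2 ** -11                      # theoretical false alarm bound

# Recovered-key counts reported for the two experiments.
print(f"{100 * success_rate(32705, TRIALS):.2f}%")
print(f"{100 * success_rate(32673, TRIALS):.2f}%")

# Both measured false alarm probabilities lie below the bound.
print(0.000341 < BOUND, 0.00015 < BOUND)
```

The per-key averaging in `mean_false_alarm_rate` mirrors the procedure described in the text: a ratio is computed for each key first, and the ratios are then averaged.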

Implementing Multiple (ID, OD) attack: Comparison with single (ID, OD)
Finally, we apply our technique to the 3-round ToyChaCha and confirm that our approach produces a more efficient cryptanalysis. Here, distinguishers in the second round are considered. We use the input difference at $\Delta^{(0)}_{13}[0]$ and the output difference at $\Delta^{(2)}_{1}[6]$, which produces a bias of 0.91. In the backward part, we have to come back by one round only. There are 24 PNBs, each of which provides a backward bias $\epsilon_a = 1$.
As discussed earlier, a large number of PNBs can actually increase the complexity. In this approach one has to exploit 8 significant bits and 24 PNBs. According to the complexity formula given in [AFK+08], the complexity is $2^{14.56}$ for $\alpha = 40$, with a data requirement of $94.7 \approx 95$; this estimate is not correct. According to the corrected complexity formula in [DGSS22], the complexity is of the order of $2^{24}$, as the 24 PNBs need to be exhaustively searched at the end. We implemented this attack and achieved a complexity of $2^{23.01}$. The details can be found in Table 4.
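The gap between the two estimates can be seen by writing out the terms. The following is a rough sketch, assuming the [AFK+08]-style estimate $2^m N + 2^{k-\alpha}$ and modelling the correction simply as an extra $2^{n}$ term for the exhaustive search over the $n$ PNBs; the exact form of the formula in [DGSS22] may differ, so this added term is an assumption for illustration only.

```python
import math

K = 32          # ToyChaCha key length in bits
N_PNB = 24      # number of PNBs
ALPHA = 40
N_DATA = 94.7   # data requirement quoted in the text

m = K - N_PNB   # 8 significant key bits

# Estimate without the final search over the PNBs.
afk_estimate = 2 ** m * N_DATA + 2 ** (K - ALPHA)

# Assumed correction: the 24 PNBs are searched exhaustively at the end.
with_pnb_search = afk_estimate + 2 ** N_PNB

print(f"without PNB search: 2^{math.log2(afk_estimate):.2f}")
print(f"with PNB search:    2^{math.log2(with_pnb_search):.2f}")
```

With 24 of the 32 key bits treated as PNBs, the final search term dominates, which is why the first estimate falls so far below the measured complexity.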

Implementation:
We execute the program for $2^{15}$ different keys. The average time required to recover the key is 10.066 ms and the complexity is $2^{13.67}$. The complexity of each stage is given in Table 4. We also compare this attack with the single (ID, OD) based attack in the same table.
Discussion: Table 4 validates that, for a single (ID, OD) pair, the complexity formula proposed in [AFK+08] does not give the correct complexity when the number of PNBs is large. It also validates the corrected complexity formula of [DGSS22]. Secondly, the multiple (ID, OD) attack proposed in our work and its complexity formula are verified by the experimental results. Our method reduces the attack complexity significantly compared with the existing single (ID, OD) strategy.

Success probability estimation for the attacks
Here we propose a theoretical approach to obtain a better estimate of the success probability of PNB-based differential cryptanalysis than what has been claimed in [AFK+08]. It was claimed in [AFK+08] that the success probability is at least 50%. So far, there has been no systematic investigation to estimate the success probability more accurately. Later, in Maitra's [Mai16] approach using chosen IVs, it was claimed that the right IVs are available for only 70% of the keys. So, he computed the median bias over those 70% of the keys and then used the same approach as in [AFK+08] to compute the complexity. Thus, the success probability of the chosen IV approach can only be claimed to be at least 35% (50% of 70%). There has been no analysis of how effective the attack is for the remaining 65% of the keys. Therefore, obtaining a more accurate range for the success probability is an important area of further investigation. Interestingly, our implementation on the toy cipher shows that both Aumasson's attack and Maitra's attack have a success probability of more than 99% on the toy version. This indicates that the attacks are far more effective than initially assumed. Thus, we aim to obtain a better measure of the success probability.
Theorem 1. For each $i \in \{0, 1, \ldots, n\}$, let $X_i$ denote the normal random variable with mean $\frac{N}{2}(1+\epsilon_i)$ and standard deviation $\sqrt{\frac{N}{4}(1-\epsilon_i^2)}$. Let $Y$ be a random variable such that $Y = X_i$ for each $i$ with probability $\frac{1}{n}$. Let $\rho_0, \rho_1, \ldots, \rho_k$ be such that $0 = \rho_0 < \rho_1 < \cdots < \rho_{k-1} < \rho_k = \epsilon_{\max}$ and, for each $j = 0$ to $k-1$, let $\mathcal{X}_j$ be the set of all $X_i$ such that $\rho_j < \epsilon_i < \rho_{j+1}$ for $1 \le j \le k-1$. Let $E_j$ be the event that $Y$ chooses an $X_i$ from $\mathcal{X}_j$. Then, […]

First we find a lower and an upper bound for each $\Pr((Y < T) \mid E_j)$. Let $\Phi$ be the cumulative distribution function of the standard normal distribution. Then we know that […]. For any $X_i \in \mathcal{X}_j$, […]. On the other hand, if $T - \frac{N}{2}(1+\epsilon_i) \ge 0$, […]. Therefore, for any $\epsilon_i$ such that $\epsilon_i < \rho_{j+1}$, […]. Since $\Phi$ is an increasing function, for any $X_i \in \mathcal{X}_j$ such that $j \in \{0, 1, \ldots, k-2\}$, […]. In a similar manner, for any $\epsilon_i$ such that $\epsilon_i > \rho_j$, […] $\cdot \Pr(E_j)$.
Exploiting Theorem 1 to obtain the range for success probability: We use the above theorem to measure the success probability of the attacks. For any key, the observed bias at the OD bit $\Delta^{(r)}$ […]. We collect the output keystream for $N$ different IVs. So in this theorem, each $X_i$ can be considered to be the count of $\Gamma_p[q] = 0$ out of the $N$ samples corresponding to a key, say $k_i$. Therefore, its distribution is approximated by a normal distribution with mean $\frac{N}{2}(1+\epsilon_i)$ and standard deviation $\sqrt{\frac{N}{4}(1-\epsilon_i^2)}$. Now, $\Pr(X_i < T)$ is the probability that the key $k_i$ is not detected even when its significant key bits are guessed correctly. Further, $\Pr(Y < T)$ is the probability that a randomly chosen key is not detected in the attack even though the guess for the significant bits is correct. Therefore, $1 - \Pr(Y < T)$ represents the success probability of the attack. So, $1 -$ […] $\cdot \Pr(E_j)$ is an upper bound for the success probability.

In the attack, the bias $\epsilon$ is approximated by $\epsilon_d \cdot \epsilon_a$. Since $\epsilon_d \le 1$, the maximum value of $\epsilon$, i.e., $\epsilon_{\max}$, is $\epsilon_a$. Instead of dealing with $\epsilon$, we deal with $\epsilon_d$, since its value is high. We choose $0 = \rho'_0 < \rho'_1 < \cdots < \rho'_k = 1$ and define $\rho_i = \rho'_i \cdot \epsilon_a$. Then the $\rho_i$'s satisfy the property required in the theorem. Moreover, for some […]. Therefore, the lower and upper bounds are respectively […] $\cdot \Pr(E_j)$.

We apply Theorem 1 to estimate a range for the success probability of the attack based on the approach of [AFK+08] as given in subsection 4.2, and we verify it against the experimental result. For convenience, let us call each $\rho_j$ a marker.

Proof. We consider $k + 1 = 7$ markers. We choose $\rho'_0 = 0.0$, $\rho'_1 = 0.79$, $\rho'_2 = 0.83$, $\rho'_3 = 0.87$, $\rho'_4 = 0.91$, $\rho'_5 = 0.95$, $\rho'_6 = 1.0$. We find the probabilities of the events $E_j$ experimentally: $\Pr(E_0) = 0.0$, $\Pr(E_1) = 0.00157$, $\Pr(E_2) = 0.174$, $\Pr(E_3) = 0.292$, $\Pr(E_4) = 0.281$, $\Pr(E_5) = 0.252$. Thus, we obtain the following.

Success probability of attack [AFK + 08] against ChaCha256
In the attack of [AFK+08] against 7-round ChaCha256, the input difference was given at the position $\Delta$ […]. Now, we compute the forward bias $\epsilon_d$ for $2^{15}$ randomly chosen keys and observe that the bias values are distributed over a wide range. For some keys, the bias is as low as 0.001. Figure 2 provides a spectral representation of the forward bias observed in ChaCha256 for different keys, whose values are in the range [0, 0.026]. We divide the entire range into 46 parts, each of length 0.005. For each sub-range, the bar represents the percentage of keys that produce a forward bias in that range.
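The bucketing behind such a spectral plot can be sketched as follows. This is a minimal illustration, assuming the per-key forward biases have already been measured and are supplied as a list of non-negative values; the function name `bucket_percentages` and the sample values are hypothetical.

```python
def bucket_percentages(biases, width, n_buckets):
    """Percentage of keys whose bias falls in each sub-range.

    Bucket i covers [i * width, (i + 1) * width). Values beyond the
    last bucket are counted in the last bucket.
    """
    counts = [0] * n_buckets
    for b in biases:
        idx = min(int(b / width), n_buckets - 1)
        counts[idx] += 1
    return [100.0 * c / len(biases) for c in counts]

# Hypothetical measured biases for four keys, three buckets of width 0.005.
sample = [0.001, 0.002, 0.007, 0.012]
print(bucket_percentages(sample, width=0.005, n_buckets=3))
```

Dividing each percentage by 100 gives an empirical estimate of the probability that a random key lands in that sub-range, which is the kind of quantity used as $\Pr(E_j)$ when the sub-range boundaries are taken as markers.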
We apply Theorem 1 to find a lower and an upper bound on the success probability of the differential attack proposed in [AFK+08] against ChaCha256. Here, the 7-round ChaCha was cryptanalysed using a distinguisher in the third round. The input difference was given in the position $\Delta$ […]

Proof. We use a total of 12 markers $\rho'_0, \rho'_1, \ldots, \rho'_{11}$. The values and the corresponding probabilities of $E_j$ are given in Table 5.

Chosen IV
In the chosen IV approach, Maitra [Mai16] used the same (ID, OD) pair. Because of the chosen IVs, the median of the forward biases increased to 0.14, and the backward bias is 0.015862. The data complexity is $N = 15430828$ and the threshold is $T = 7726261$. We compute the forward bias $\epsilon_d$ for $2^{15}$ randomly chosen keys. Figure 3 gives a spectral representation of the forward bias observed in ChaCha256 for different keys. The bias values are primarily distributed in the range [0.125, 0.150]. We divide the entire range into sub-ranges of length 0.005 each.

Proof. To obtain the success probability, we use $k + 1 = 7$ markers: 0, 0.125, 0.13, 0.135, 0.14, 0.145, 1. The probabilities $\Pr(E_j)$ for each range are given in Table 6.
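The marker-based bracketing used in this section can be sketched numerically. This is a simplified illustration and not necessarily the exact form of the bounds in Theorem 1: it assumes $T \le N$, in which case $\Pr(X_i < T)$ under the normal approximation decreases as $\epsilon_i$ grows, so evaluating it at the two markers of a sub-range brackets every key in that sub-range. The function names are hypothetical. The values of $N$, $T$, $\epsilon_a$ and the markers are those of the chosen IV attack above; the weights $\Pr(E_j)$ would come from Table 6 and are left as an input.

```python
import math

def phi(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def miss_probability(n, t, eps):
    """Pr(X < t) for X ~ Normal(n/2 * (1 + eps), n/4 * (1 - eps^2))."""
    mean = n / 2 * (1 + eps)
    var = n / 4 * (1 - eps ** 2)
    if var == 0:                      # eps = 1: the count is deterministic
        return 1.0 if t > mean else 0.0
    return phi((t - mean) / math.sqrt(var))

def success_range(n, t, eps_a, markers, weights):
    """Lower and upper estimates of the success probability.

    markers : forward-bias markers rho'_0 < ... < rho'_k
    weights : Pr(E_j) for j = 0 .. k-1 (one per sub-range)
    Assumes t <= n so that miss_probability decreases in eps.
    """
    fail_hi = sum(w * miss_probability(n, t, markers[j] * eps_a)
                  for j, w in enumerate(weights))
    fail_lo = sum(w * miss_probability(n, t, markers[j + 1] * eps_a)
                  for j, w in enumerate(weights))
    return 1 - fail_hi, 1 - fail_lo

N, T, EPS_A = 15430828, 7726261, 0.015862
MARKERS = [0, 0.125, 0.13, 0.135, 0.14, 0.145, 1]

# Per-sub-range bounds on the probability of missing the correct key.
for lo_m, hi_m in zip(MARKERS, MARKERS[1:]):
    print(lo_m, hi_m,
          miss_probability(N, T, hi_m * EPS_A),
          miss_probability(N, T, lo_m * EPS_A))
```

Passing the measured $\Pr(E_j)$ values as `weights` to `success_range` then combines these per-sub-range bounds into a single range for the success probability.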

Conclusion
This work first shows the limitation of the existing attack approaches that use a single (ID, OD) pair against ChaCha when the number of PNBs is high. Apart from significantly improving the attack with multiple pairs, our idea opens a new direction of further work exploiting a divide-and-conquer approach with several sets. If distinguishers for higher rounds are discovered in the future, this strategy can significantly reduce the attack complexity for 7 or more rounds. We also propose a toy model of ChaCha for more detailed investigation and for comparing the efficacy of the existing and the new attacks. This helps to build a clearer understanding of the cryptanalytic techniques, as the complete attack can be implemented with a reasonable complexity. Finally, we exploit statistical techniques to estimate the success probabilities of different cryptanalytic approaches against ChaCha and validate our idea on the toy version.