Cryptanalysis of QARMAv2

QARMAv2 is a general-purpose and hardware-oriented family of lightweight tweakable block ciphers (TBCs) introduced in ToSC 2023. QARMAv2, a redesign of QARMAv1 with a longer tweak and tighter security margins, is also designed to be suitable for cryptographic memory protection and control flow integrity. The designers of QARMAv2 provided a relatively comprehensive security analysis in the design specification, e.g., some bounds for the number of attacked rounds in differential and boomerang analysis, together with some concrete impossible differential, zero-correlation, and integral distinguishers. In one of the first third-party cryptanalyses of QARMAv2, Hadipour et al. [HGSE24] significantly improved the integral distinguishers of QARMAv2 and provided the longest concrete distinguishers of QARMAv2 up to now. However, they provided no key recovery attack based on their distinguishers. This paper delves into the cryptanalysis of QARMAv2 to enhance our understanding of its security. Given that the integral distinguishers of QARMAv2 are the longest concrete distinguishers for this cipher so far, we focus on integral attacks. To this end, we first further improve the automatic tool introduced by Hadipour et al. [HSE23, HGSE24] for finding integral distinguishers of TBCs following the TWEAKEY framework. This new tool exploits the MixColumns property of QARMAv2 to find integral distinguishers more suitable for key recovery attacks. Then, we combine several techniques for integral key recovery attacks, e.g., the meet-in-the-middle and partial-sum techniques, to build a fine-grained integral key recovery attack on QARMAv2. Notably, we demonstrate how to leverage the low data complexity of the integral distinguishers of QARMAv2 to reduce the memory complexity of the meet-in-the-middle technique. As a result, we present the first concrete key recovery attacks on reduced-round versions of QARMAv2.
This includes attacking 13 rounds of QARMAv2-64-128 with a single tweak block (T = 1), 14 rounds of QARMAv2-64-128 with two independent tweak blocks (T = 2), and 16 rounds of QARMAv2-128-256 with two independent tweak blocks (T = 2), all in an unbalanced setting. Our attacks do not compromise the claimed security of QARMAv2, but they shed more light on the cryptanalysis of this cipher.


Introduction
Our computing devices perform a wide range of computations, some of which are very sensitive, e.g., cryptographic operations, and others might even be malicious, e.g., malware. In addition, with the growth of cloud computing, we increasingly rely on running mutually untrusted processes on a shared platform. Overall, the adversary may have access to the system in use and can run a task on the same platform as the victim. In this security model, we trust the host hardware/platform but not the software running on it. Therefore, it is crucial to safeguard the sensitive parts of code and data from unauthorized access by other processes on the same platform, ensuring that these operations and data remain private and secure. One solution is cryptographic memory protection to guarantee confidentiality and control flow integrity [Com16, Sec17].
One example is the Pointer Authentication Code (PAC) [Com16, Sec17], used in Arm architectures, which provides a control flow integrity mechanism and makes it much harder for an attacker to modify protected pointers in memory without being detected. The idea behind PAC is to insert a PAC into each pointer we want to protect before writing it to memory, and then verify the PAC before using the pointer. Therefore, an adversary who aims to modify a protected pointer has to find the correct PAC for the new value of the pointer to control the program flow. Another example of cryptographic memory protection can be found in Intel's SGX technology (Software Guard Extensions) [Gue16] incorporated into their CPUs. SGX creates a secure enclave within the processor, providing a protected area where sensitive operations, including encryption tasks, can occur without being accessible to external interference. The enclave ensures the confidentiality and integrity of the data and code within it, offering a secure execution environment even when the broader system may not be fully trusted.
The cryptographic primitives required for memory encryption should be very fast to minimize the performance overhead. At the same time, they should be secure enough. Therefore, latency is the primary engineering constraint in the design of lightweight block ciphers for memory encryption, whereas area and, thus, power are the secondary constraints. Previously well-analyzed ciphers, such as AES, are not a good choice because their latency is too high for memory encryption. In addition, for efficient memory encryption, we need a cryptographic primitive where the permutation depends not only on the key and plaintext but also on a public parameter, the tweak, which can be the physical address of the encrypted block. One approach to achieve this is to use modes of operation based on classical block ciphers. But these modes typically require constructions that lead to increased latency or extra memory to store, for example, the nonce. Another approach is to use a tweakable block cipher (TBC), where the permutation is determined by the secret key together with the public tweak and the plaintext. In a TBC, the cipher should remain secure even if the tweak can be controlled by the adversary.
QARMAv2 [ABD+23] is a general-purpose and hardware-oriented family of lightweight TBCs that is also designed to be suitable for cryptographic memory protection and control flow integrity. This paper explores QARMAv2 from the cryptanalysis aspect, shedding light on its security against cryptanalytic attacks. The designers of QARMAv2 provided a relatively comprehensive security analysis in the design specification, covering, e.g., differential, boomerang, integral, impossible differential, and zero-correlation attacks. For instance, the designers used the method introduced initially in [HBS21, HNE22] to provide some bounds for the number of attacked rounds in boomerang analysis. They also used the methods introduced very recently at EUROCRYPT 2023 [HSE23] to provide some concrete impossible differential and zero-correlation distinguishers. As another example, they used the division property [Tod15, XZBL16] to provide concrete integral distinguishers for up to 5 rounds of QARMAv2-64.
As the first third-party cryptanalysis of QARMAv2, Tim Beyne [Bey23] found a nonlinear invariant for the unkeyed round function of QARMAv2-64, a property that can be extended to multiple rounds only for a set of weak keys, but does not affect the full-round QARMAv2-64. The designers addressed this weakness by incorporating a new S-box in the final version of the QARMAv2 specification [ABD+23]. As another third-party analysis, very recently, Hadipour et al. [HGSE24] significantly improved the integral distinguishers of QARMAv2. The longest concrete distinguishers for QARMAv2 up to now are the integral distinguishers proposed in [HGSE24]. The authors of [HGSE24] exploited the control of the adversary over the tweak and improved the automatic tool introduced in [HSE23] to find integral distinguishers for up to 10 (resp. 12) rounds of QARMAv2-64 (resp. QARMAv2-128). However, they did not provide any key recovery attack based on their distinguishers, and the efficiency of integral key recovery attacks on QARMAv2 is still an open question. Therefore, this paper focuses on the integral cryptanalysis of QARMAv2.
Outline. We first recall the specification of QARMAv2 in Subsection 2.1. It is followed by a brief overview of the integral distinguishers and their relation to zero-correlation distinguishers in Subsection 2.2. Subsection 2.3 briefly reviews the partial-sum and meet-in-the-middle techniques in the key recovery of integral attacks. Section 3 discusses the MixColumns property of QARMAv2 in terms of integral cryptanalysis. After that, we present our improved automatic tool for finding integral distinguishers of TBCs following the TWEAKEY framework in Section 4. Lastly, we present our integral key recovery attacks on QARMAv2 in Section 5 and conclude in Section 6.

Background
In this section, we review the QARMAv2 specification. We then provide a brief overview of zero-correlation (ZC) distinguishers and their conversion to integral distinguishers for block ciphers. Lastly, we cover the partial-sum and meet-in-the-middle techniques in the key recovery of integral attacks.

Specification of QARMAv2
QARMAv2 is a redesign of QARMAv1 with a longer tweak and tighter security margins that was introduced in ToSC 2023. QARMAv2 also follows the reflector construction, as illustrated in Figure 1. Following the specification of QARMAv2 [ABD+23], we represent the inverse of a function $f$ by $\bar{f}$ and $f^{-1}$ interchangeably in this paper.

Figure 1: Reflector structure of QARMAv2
As Figure 1 shows, the reflector construction consists of three parts: the forward function $F$, the backward function $\bar{F} = F^{-1}$, and the central construction $G$. This construction allows the implementation of both encryption and decryption using the same circuit with a minor set-up cost. According to Figure 1, the first and the last rounds include only the S-box layer and key addition, without mixing in a tweak. The reflector is also independent of the tweak. The round keys $K^{(i)}$ (resp. round tweaks $T^{(i)}$) are derived by applying a linear function to the master key $K$ (resp. the master tweak $T$).
Algorithm 1 describes the encryption/decryption of QARMAv2 in detail. $X$ in Algorithm 1 represents the internal state of the cipher and can be considered as $\ell$ layers of $4 \times 4$ arrays of nibbles, where $\ell \in \{1, 2\}$. The data is arranged row-wise in each layer, so that, within a layer, the nibble with index $4i + j$ occupies row $i$ and column $j$. In what follows, we briefly describe the operations in QARMAv2 encryption in Algorithm 1. The round constants $c_i$ in Algorithm 1 have no impact on our analysis, and we omit them.
The state shuffle $\tau$ is applied to each layer separately, rearranging the nibbles' positions as shown in Figure 2. The S-box layer, denoted as $S$, applies a 4-bit S-box to each nibble of the state. The MixColumns layer $M$ multiplies each column of each layer by the following matrix:

$$M = \mathrm{circ}(0, \rho, \rho^2, \rho^3) = \begin{pmatrix} 0 & \rho & \rho^2 & \rho^3 \\ \rho^3 & 0 & \rho & \rho^2 \\ \rho^2 & \rho^3 & 0 & \rho \\ \rho & \rho^2 & \rho^3 & 0 \end{pmatrix}, \qquad (1)$$

where $\rho$ is a linear map on $\mathbb{F}_2^4$ with $\rho^4 = 1$. In other words, $\rho$ is the rotation to the left by one bit, i.e., $\rho((x_3, x_2, x_1, x_0)) = (x_2, x_1, x_0, x_3)$ for $x = (x_3, x_2, x_1, x_0) \in \mathbb{F}_2^4$. The exChangeRows operation is exclusively employed in the 2-layer versions ($\ell = 2$). It swaps the first two rows between the two layers. This operation is applied every second round in the forward and backward rounds, and should always appear in rounds $r$ and $r + 1$, i.e., before and after the central construction.
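To make the MixColumns description concrete, the following minimal Python sketch implements the nibble rotation $\rho$ and the column multiplication. It assumes that the matrix is the circulant $\mathrm{circ}(0, \rho, \rho^2, \rho^3)$ and that row $i$ of the matrix produces output cell $i$; the function names `rho` and `mix_column` are ours and purely illustrative.

```python
def rho(x, k=1):
    """Rotate the 4-bit nibble x to the left by k bits (k taken mod 4)."""
    k %= 4
    return ((x << k) | (x >> (4 - k))) & 0xF


def mix_column(col):
    """Multiply a column of four nibbles by circ(0, rho, rho^2, rho^3).

    Assumption: entry (i, k) of the matrix is rho^((k - i) mod 4) for k != i
    and 0 on the diagonal.
    """
    out = []
    for i in range(4):
        acc = 0
        for k in range(4):
            if k != i:
                acc ^= rho(col[k], (k - i) % 4)
        out.append(acc)
    return out


if __name__ == "__main__":
    col = [0x1, 0x0, 0x0, 0x0]
    mixed = mix_column(col)
    print([hex(v) for v in mixed])
    # Under the assumed matrix, MixColumns is an involution.
    print(mix_column(mixed) == col)
```

Under this assumption, applying `mix_column` twice returns the original column, which matches the use of the same diffusion layer in the forward and backward rounds.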
The tweakey schedule of QARMAv2 closely follows the TWEAKEY framework [JNP14], but it distinguishes the key and tweak, maintaining separate schedules for each. For $\ell = 1$ (i.e., QARMAv2-64), the only acceptable key size is 128. For $\ell = 2$ (i.e., QARMAv2-128), encryption is always defined with two full 128-bit inputs for the key, i.e., for a 256-bit string $K_0 \| K_1$. In the case of QARMAv2-128-256, $K_0$ and $K_1$ are the two halves of the master key. For other versions of QARMAv2-128 with a master key shorter than 256 bits, the master key is first extended to a 256-bit extended key $K_0 \| K_1$. For more details about the extension function, refer to [ABD+23]. Then, the encryption algorithm alternates between using $K_0$ and $K_1$ as the round keys for the forward rounds. For the backward rounds, it uses $L_0$ and $L_1$ as the round keys, which are derived from $K_0$ and $K_1$ by linear transformations involving two constants $\alpha$ and $\beta$ and a linear function $o$ over $\mathbb{F}_2^b$ (see [ABD+23] for the exact expressions), where $o(w) := (w \ggg 1) \oplus (w \gg (b - 1))$, with $\ggg$ denoting a rotation and $\gg$ a shift. For the reflector part, it uses $W_0 = o^2(K_0)$ and $W_1 = o^{-2}(K_1)$ as the round keys (see Figure 4b).
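The map $o$ and its powers can be sketched as follows. This is a minimal illustration that assumes the first operator in the definition of $o$ is a cyclic rotation to the right and the second is a plain shift; with two plain shifts the map would not be invertible, whereas $o^{-2}$ is needed for $W_1$. The helper names `o`, `o_inv`, and `o_pow` are ours.

```python
def o(w, b):
    """o(w) = (w rotated right by 1) XOR (w shifted right by b - 1), on b bits."""
    mask = (1 << b) - 1
    rot = ((w >> 1) | (w << (b - 1))) & mask
    return rot ^ (w >> (b - 1))


def o_inv(r, b):
    """Inverse of o on b-bit words (b >= 3)."""
    mask = (1 << b) - 1
    rotl = ((r << 1) | (r >> (b - 1))) & mask
    # Bit 1 of the input was XORed with the input's top bit, which is bit b-2 of r.
    return rotl ^ (((r >> (b - 2)) & 1) << 1)


def o_pow(w, b, e):
    """Apply o e times; a negative e applies the inverse |e| times."""
    step = o if e >= 0 else o_inv
    for _ in range(abs(e)):
        w = step(w, b)
    return w


if __name__ == "__main__":
    b = 8  # toy width; QARMAv2 uses b = 64 or b = 128
    k0, k1 = 0xA7, 0x3C  # hypothetical toy key halves
    w0 = o_pow(k0, b, 2)   # reflector key W_0 = o^2(K_0)
    w1 = o_pow(k1, b, -2)  # reflector key W_1 = o^{-2}(K_1)
    print(hex(w0), hex(w1))
```

The toy width `b = 8` only keeps exhaustive checks cheap; the same functions work for any word size of at least 3 bits.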
Let $T$ denote the master tweak. The number of independent tweak blocks is also written T (as in T = 1 or T = 2); the context makes clear which is meant. For T = 1, we define $T_1 = \varphi(T_0)$, where $T_0 = T$. Besides, the two tweak blocks just before the center for encryption are equal in the case of T = 1. For T = 2, $T_0$ and $T_1$ are two independent blocks (each of $b$ bits) of the master tweak $T = T_0 \| T_1$. Let $t_i$ be the round tweak in round $i$. Besides, assume that $t_1 = T_1$ and $t_2 = \varphi^{-r}(T_0)$, where $r$ is the number of forward/backward rounds. The tweak schedule of QARMAv2 derives the round tweaks as follows: $t_{2i+1} = \varphi(t_{2i-1})$ and $t_{2i+2} = \varphi^{-1}(t_{2i})$ for $i \geq 1$, where $\varphi$ is a permutation on the positions of the tweak nibbles, as illustrated in Figure 3. One round of QARMAv2 is represented in Figure 4a, and Figure 4b shows the QARMAv2 encryption for odd values of $r$, where the numbers of forward and backward rounds are the same. Table 2 briefly describes the main parameters of the different versions of QARMAv2. As per [ABD+23], $\varepsilon$ in Table 2 is typically a small number like 2. However, for standardization, the QARMAv2 designers recommend setting $\varepsilon$ to 16.
In the context of cryptanalysis, we need to study the security of reduced-round versions of QARMAv2. A reduced-round version of QARMAv2 can be obtained by reducing the number of rounds before or after the reflector construction, or both. According to the designers in [ABD+23], rounds are counted as S-box layers. If the numbers of rounds before and after the reflector construction are the same, we call it a balanced reduced-round version of QARMAv2; otherwise, it is called an unbalanced reduced-round version. We recall that the designers of QARMAv2 also provided cryptanalysis results in the unbalanced setting in [ABD+23]. For example, the impossible-differential distinguisher for 9 rounds of QARMAv2-64 (T = 2) in [ABD+23] is composed of 5 forward rounds and 4 backward rounds. As another example, most of the bounds for boomerang distinguishers in [ABD+23] apply to unbalanced reduced rounds.

From Zero-Correlation to Integral Distinguishers
The concept of integral distinguishers was initially introduced as a theoretical extension of differential distinguishers by Lai [Lai94] and subsequently as a practical attack by Daemen et al. [DKR97]. This concept was further formalized by Knudsen and Wagner [KW02].
The fundamental idea behind integral distinguishers is to identify a set of inputs whose corresponding outputs sum up to zero (or a key-independent value) in specific bit/cell positions. The idea of zero-correlation (ZC) distinguishers was initially proposed by Bogdanov and Rijmen [BR14], after integral distinguishers had been introduced. The core idea of ZC distinguishers is to exploit linear approximations of the block cipher with zero correlation to distinguish it from a random permutation. ZC attacks were later improved further by Bogdanov et al., and Sun et al. established Theorem 1, which states that a ZC linear hull for block ciphers defined over $\mathbb{F}_2^n$ always results in an integral distinguisher.

Theorem 1 (Sun et al. [SLR
According to Theorem 1, the data complexity of the integral distinguisher obtained from a ZC linear hull is $2^{n-m}$, where $n$ is the block size, and $m$ is the dimension of the linear space created by the input linear masks in the corresponding ZC linear hull. At ToSC 2019, Ankele et al. investigated the impact of the tweakey on ZC distinguishers for TBCs, which leads to Theorem 2.
Theorem 2. Let $E_K$ […] $\to \mathbb{F}_2^n$ be a TBC following the STK construction as illustrated in Figure 5. Suppose that the tweakey schedule of $E_K$ has $z$ parallel paths and applies a permutation $h$ on the tweakey cells in each path. Let $(\Gamma_0, \Gamma_r)$ be a pair of linear masks for $r$ rounds of $E_K$, and let $\Gamma_1, \ldots, \Gamma_{r-1}$ represent a possible sequence of intermediate linear masks. If there is a cell position $i$ such that any possible sequence $\Gamma_0$ […]
Hadipour et al. [HSE23] introduced a CP model to find ZC linear hulls for TBCs based on Theorem 2, and significantly enhanced the ZC and integral attacks on all variants of SKINNY and some other tweakable block ciphers. This CP model was further improved in [HGSE24].
The QARMAv2 design is related to the TWEAKEY framework. Moreover, the methods introduced in [HSE23, HGSE24] have proven highly efficient in uncovering integral distinguishers for TBCs using the TWEAKEY construction. Consequently, we employ the same approach to discover integral distinguishers for QARMAv2. Nevertheless, as we will elaborate in Section 4, we refine the technique introduced in [HSE23, HGSE24] by considering the distinctive structure of the QARMAv2 diffusion layer. This enhancement allows us to identify integral distinguishers that are more effective for integral key recovery attacks.

Key Recovery in Integral Attacks using the Partial-Sum Technique
For integral distinguishers derived from ZC linear hulls, the sum of the outputs is zero in specific bit positions (balanced bit positions). To build a key recovery on top of an integral distinguisher, we typically append some rounds to the distinguisher. Then, we guess the involved key bits to partially decrypt the ciphertexts and compute the sum of the distinguisher's outputs in the balanced bit positions. If the sum is zero, we keep the guessed key bits as potential candidates. Otherwise, we discard the guessed key bits.
The partial-sum technique was initially introduced by Ferguson et al. in [FKL+00] to reduce the time complexity of integral attacks. Unlike the naive integral key recovery, where we guess the involved key bits all at once, the partial-sum technique divides the partial decryption into several steps. We guess a subset of the involved key bits at each step and store the intermediate results. We repeat the process until we reach the distinguisher's output. At each step, only a portion of the internal state is involved, whose values are needed to calculate the final sum. One advantage is that the size of the involved positions decreases as we approach the distinguisher's output. In addition, to compute the sum of the distinguisher's outputs in the balanced bit positions, we only need to know whether each involved value appears an even or odd number of times.
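The parity observation at the end of the previous paragraph can be illustrated with a short Python sketch. It is a toy example, not part of the attack: the function `toy_f` is a hypothetical stand-in for a partial decryption, and the point is only that an XOR-sum over a multiset equals the XOR-sum over the values that occur an odd number of times.

```python
from collections import Counter
from functools import reduce


def xor_sum(values, f):
    """XOR of f(v) over all values (with multiplicity)."""
    return reduce(lambda acc, v: acc ^ f(v), values, 0)


def parity_compress(values):
    """Keep only the values that occur an odd number of times."""
    return {v for v, n in Counter(values).items() if n % 2 == 1}


def toy_f(v):
    """Hypothetical stand-in for a partial decryption of one value."""
    return (7 * v + 3) & 0xF


if __name__ == "__main__":
    values = [3, 3, 5, 9, 9, 9, 12]
    compressed = parity_compress(values)
    print(sorted(compressed))
    print(xor_sum(values, toy_f) == xor_sum(compressed, toy_f))
```

In a partial-sum attack, the same idea lets each step store one parity bit per possible intermediate value instead of the full list of ciphertexts.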
Figure 6 represents the integral key recovery for 6 rounds of AES using the partial-sum technique. This attack relies on a 4-round integral distinguisher, derived by encrypting $2^{32}$ plaintexts that take all possible values in the main diagonal and a fixed value in the other positions. After 4 rounds of AES, the sum of the outputs is zero in all bytes. The last round does not include MixColumns, and instead of $K_4$, we retrieve $\tilde{K}_4 = MC^{-1}(K_4)$, i.e., the so-called equivalent key for the 5th round. The colored numbers in Figure 6 denote the corresponding step of the partial-sum technique for each byte in the internal state or round keys. According to Figure 6, a balanced byte can be expressed through four ciphertext bytes, four bytes of the last round key, and one byte of $\tilde{K}_4$ by inverting the last two rounds, where $S$ denotes the AES S-box. We first implement $0E \cdot S^{-1}(\cdot)$ by $S_0(\cdot)$, $09 \cdot S^{-1}(\cdot)$ by $S_1(\cdot)$, $0D \cdot S^{-1}(\cdot)$ by $S_2(\cdot)$, and $0B \cdot S^{-1}(\cdot)$ by $S_3(\cdot)$ as lookup tables. Next, we perform the partial-sum key recovery as outlined in Algorithm 2.
Algorithm 2 (partial-sum key recovery; surviving steps): initialize a list $L_2$ of size $2^{16}$ with zeros; initialize a list $L_3$ of size $2^8$ with zeros.
In total, 5 bytes of round keys are involved, and each balanced byte provides an 8-bit filter. Hence, it is necessary to execute Algorithm 2 with 6 distinct sets of $2^{32}$ chosen plaintexts to uniquely retrieve the relevant key bits. Each run of Algorithm 2 proposes a set of key candidates, and the correct key lies in the intersection of all the suggested sets. The time complexity of the naive approach is $6 \cdot 2^{32} \cdot 2^{40} \approx 2^{74.58}$ partial decryptions. However, the time complexity of Algorithm 2 is $2^{50}$ S-box computations. Repeating it for 6 sets of $2^{32}$ chosen plaintexts yields a total complexity of at most $6 \cdot 2^{50} \approx 2^{52.58}$ S-box lookups. The memory required to store $2^{32}$ ciphertexts dominates the memory complexity, and the data complexity is $6 \cdot 2^{32} \approx 2^{34.58}$ chosen plaintexts. As can be seen, the partial-sum technique significantly reduces the time complexity of integral attacks. So, we use this technique to build integral key recovery attacks on QARMAv2.
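The complexity figures of the 6-round AES example can be re-derived with a few lines of Python. The sketch below only reproduces the arithmetic stated in the text (6 plaintext sets, $2^{32}$ texts per set, 40 guessed key bits, $2^{50}$ S-box computations per run); the variable names are ours.

```python
from math import log2

SETS = 6             # number of plaintext sets
LOG_TEXTS = 32       # log2 of the texts per set
LOG_KEY_GUESS = 40   # 5 key bytes guessed at once in the naive attack
LOG_PARTIAL_SUM = 50 # log2 of the S-box computations per partial-sum run

naive_time = log2(SETS) + LOG_TEXTS + LOG_KEY_GUESS
partial_sum_time = log2(SETS) + LOG_PARTIAL_SUM
data = log2(SETS) + LOG_TEXTS

if __name__ == "__main__":
    print(f"naive time       : 2^{naive_time:.2f} partial decryptions")
    print(f"partial-sum time : 2^{partial_sum_time:.2f} S-box lookups")
    print(f"data             : 2^{data:.2f} chosen plaintexts")
```

The gap of 22 bits between the two time estimates is exactly the difference between $2^{32} \cdot 2^{40}$ and $2^{50}$ per plaintext set.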

Meet-in-the-Middle Technique
From the 6-round integral key recovery attack on AES, one can see that the number of involved key bits needed to compute the final sum is a decisive factor in the time complexity. The meet-in-the-middle technique in integral key recovery, first introduced in [SW12], splits the involved key bits into two sets and enables us to retrieve each set of key bits independently. We explain this technique with a simple example. As Figure 7 shows, assume that we aim to compute $Z$ from the ciphertexts and check if $Z = 0$. In a naive approach, we must guess all the involved key bits $K_1 \cup K_2$. However, by looking at Figure 7, we observe that $Z = X \oplus Y$. Verifying $Z = 0$ is the same as confirming that $X = Y$. This enables us to independently calculate $X$ and $Y$ and then compare them for equality. The advantage is that we only need to guess $K_1$ (resp. $K_2$) to compute $X$ (resp. $Y$). Each guess of $K_1$ and $K_2$ that satisfies $X = Y$ is considered a potential candidate. Consequently, the time complexity of guess-and-filter for the involved key bits decreases from 2
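The matching step of the meet-in-the-middle technique can be sketched on a toy example. The code below is only an illustration under stated assumptions: the two halves $X$ and $Y$ are modeled by a 4-bit S-box applied to one ciphertext nibble each (we use the PRESENT S-box merely as a convenient nonlinear map), the keys $K_1$ and $K_2$ are single nibbles, and the ciphertext pool is an arbitrary fixed list.

```python
from collections import defaultdict

# PRESENT S-box, used here only as a convenient toy nonlinear map.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def half_sum(nibbles, key):
    """XOR-sum of SBOX[c ^ key] over one nibble position of the pool."""
    acc = 0
    for c in nibbles:
        acc ^= SBOX[c ^ key]
    return acc


def naive_candidates(pool):
    """Guess (k1, k2) jointly and keep the pairs with Z = X xor Y = 0."""
    left = [a for a, _ in pool]
    right = [b for _, b in pool]
    return {(k1, k2) for k1 in range(16) for k2 in range(16)
            if half_sum(left, k1) ^ half_sum(right, k2) == 0}


def mitm_candidates(pool):
    """Compute X for each k1 and Y for each k2 separately, then match X = Y."""
    left = [a for a, _ in pool]
    right = [b for _, b in pool]
    table = defaultdict(list)
    for k1 in range(16):
        table[half_sum(left, k1)].append(k1)
    return {(k1, k2) for k2 in range(16)
            for k1 in table.get(half_sum(right, k2), [])}


if __name__ == "__main__":
    pool = [((3 * i + 1) & 0xF, (5 * i + 7) & 0xF) for i in range(10)]
    print(naive_candidates(pool) == mitm_candidates(pool))
```

Both functions return the same candidate set, but the second one evaluates each half only once per partial key guess instead of once per joint guess.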

Integral Properties of QARMAv2 Diffusion Matrix
In this section, we show how to exploit the properties of QARMAv2 MixColumns to bypass the diffusion layer right after the distinguisher.This approach involves fewer key bits in the key recovery process, allowing us to decrease the time complexity and add more rounds for key recovery.We demonstrate that if two balanced cells are present in one column before the MixColumns operation, a linear combination of cells in the same position after MixColumns remains balanced.
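Lemma 1. Let $X = (X_0, X_1, X_2, X_3)^T$ and $Y = (Y_0, Y_1, Y_2, Y_3)^T$ represent columns before and after the MixColumns, respectively. Assume that $C$ is a pool of ciphertexts derived from the input set of integral distinguishers. In addition, since the key is fixed in the integral attack, we use $X_i(c)$ and $Y_i(c)$ to indicate the dependency on the ciphertext $c \in C$. For $i, j \in \{0, 1, 2, 3\}$ with $i \neq j$, if $X_i$ and $X_j$ have the zero-sum property, then we have:

$$\sum_{c \in C} \rho^{(i-j) \bmod 4}\, Y_i(c) = \sum_{c \in C} Y_j(c).$$

Proof. According to Equation 1, we have:

$$\begin{aligned} Y_0 &= \rho X_1 \oplus \rho^2 X_2 \oplus \rho^3 X_3, \\ Y_1 &= \rho^3 X_0 \oplus \rho X_2 \oplus \rho^2 X_3, \\ Y_2 &= \rho^2 X_0 \oplus \rho^3 X_1 \oplus \rho X_3, \\ Y_3 &= \rho X_0 \oplus \rho^2 X_1 \oplus \rho^3 X_2. \end{aligned} \qquad (4)$$

Therefore, by multiplying the $i$th row by $\rho^{(i-j) \bmod 4}$ and then adding it to the $j$th row on both sides of Equation 4, we obtain:

$$\rho^{(i-j) \bmod 4}\, Y_i \oplus Y_j = \rho^{(i-j) \bmod 4}\, X_i \oplus X_j.$$

As a result, if $X_i$ and $X_j$ have the zero-sum property, then, since $\rho$ is linear, $\sum_{c \in C} \rho^{(i-j) \bmod 4}\, Y_i(c) \oplus \sum_{c \in C} Y_j(c) = \rho^{(i-j) \bmod 4} \sum_{c \in C} X_i(c) \oplus \sum_{c \in C} X_j(c) = 0$, which proves the claim.
We note that Ankele et al. already discovered a similar property for QARMAv1 [ADG+19]. According to Lemma 1, if two inputs of the MixColumns, for instance $X_i$ and $X_j$, have the zero-sum property, we can transfer the distinguishing property to the output of MixColumns by verifying whether $\sum_{c \in C} \rho^{(i-j) \bmod 4}\, Y_i(c) = \sum_{c \in C} Y_j(c)$ holds or not. This way, we can use the meet-in-the-middle technique to derive $\sum_{c \in C} \rho^{(i-j) \bmod 4}\, Y_i(c)$ and $\sum_{c \in C} Y_j(c)$ independently. With our enhanced model for distinguishers, as detailed in Section 4, we identify new integral distinguishers for QARMAv2 that leverage the MixColumns property to increase the number of rounds for the key recovery attack.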
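Lemma 1 can be checked numerically with a short Python sketch. It assumes, as we understand the specification, that MixColumns multiplies each column by $\mathrm{circ}(0, \rho, \rho^2, \rho^3)$ with $\rho$ the left rotation of a nibble by one bit. The helper names and the way the pool is built (random columns plus one correcting column) are ours and purely illustrative.

```python
import random


def rho(x, k=1):
    """Rotate the 4-bit nibble x to the left by k bits (k taken mod 4)."""
    k %= 4
    return ((x << k) | (x >> (4 - k))) & 0xF


def mix_column(col):
    """Column times circ(0, rho, rho^2, rho^3) (assumed MixColumns matrix)."""
    return [
        rho(col[(i + 1) % 4], 1) ^ rho(col[(i + 2) % 4], 2) ^ rho(col[(i + 3) % 4], 3)
        for i in range(4)
    ]


def build_pool(i, j, size, rng):
    """Random columns whose cells X_i and X_j XOR to zero over the pool."""
    pool = [[rng.randrange(16) for _ in range(4)] for _ in range(size)]
    last = [rng.randrange(16) for _ in range(4)]
    for pos in (i, j):
        acc = 0
        for col in pool:
            acc ^= col[pos]
        last[pos] = acc  # forces the zero-sum property at this position
    pool.append(last)
    return pool


def lemma_holds(pool, i, j):
    """Check sum rho^((i-j) mod 4) Y_i == sum Y_j over the pool."""
    lhs = rhs = 0
    for col in pool:
        y = mix_column(col)
        lhs ^= rho(y[i], (i - j) % 4)
        rhs ^= y[j]
    return lhs == rhs


if __name__ == "__main__":
    rng = random.Random(2024)
    print(all(lemma_holds(build_pool(i, j, 15, rng), i, j)
              for i in range(4) for j in range(4) if i != j))
```

The two sums in `lemma_holds` depend on different output cells, which is what allows them to be computed independently in a meet-in-the-middle fashion.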

Search for Distinguishers
This section introduces our new CP model for detecting integral distinguishers of the QARMAv2 cipher. Our approach follows the method introduced in [HSE23, HGSE24]. However, we enhance the model by considering the unique structure of the QARMAv2 diffusion layer. This refinement allows us to leverage Lemma 1 to bypass the diffusion effect of the MixColumns at the end of the distinguisher, thereby reducing the number of involved key bits in the key recovery attack. We elaborate on our model for a TBC following the TWEAKEY framework, as depicted in Figure 5. Therefore, it is adaptable not only to QARMAv2 but also to other TBCs, such as SKINNY, MANTIS, and CRAFT, which share similar diffusion layers.
We aim to utilize Theorem 2 to discover a ZC distinguisher suitable for integral key recovery attacks. Subsequently, leveraging Theorem 1, we convert the ZC distinguisher into an integral distinguisher, forming the basis for a key recovery attack. Therefore, we detail the creation of a CP model for searching for ZC distinguishers based on Theorem 2. In this process, we encode deterministic linear trails both forward and backward through the cipher. Our modeling of deterministic linear trails follows the method introduced in [HSE23, HGSE24], to which we refer for more details on encoding deterministic linear trails at the cell level.

CP Model for Integral Distinguishers in [HSE23, HGSE24]
We first briefly review the model in [HGSE24] for finding integral distinguishers. As illustrated in Figure 5, let $E$ be a tweakable block cipher following the STK framework, with a block size of $n = m \cdot c$, where $m$ and $c$ denote the number of cells and the cell size, respectively. Suppose that $E$ has $z$ parallel independent tweakey paths, and $h$ denotes the permutation on the positions of the tweakey cells. Besides, assume that $STK_r[i]$ represents the $i$th cell of the subtweakey after $r$ rounds.
As Figure 8b illustrates, we define integer variables $AXU_r[i]$ (resp. $AXL_r[i]$) to represent the activeness pattern of the $i$th cell of the internal state after $r$ rounds in the forward direction (resp. backward direction). The domain of these variables is $\{0, 1, 2, 3\}$, where 0 indicates the zero linear mask, 1 indicates a fixed nonzero linear mask, 2 indicates a free nonzero linear mask, and 3 indicates a free linear mask. Then, as visualized in Figure 8b, we define some constraints to encode the propagation of the deterministic linear trails in the forward and backward directions over $r_d$ rounds independently. For more details on the constraints, please refer to [HSE23, HGSE24]. We also add the constraints […].
Modeling the distinguisher in [HSE23]. We define the integer variables $ASTK_r[i] \in \{0, 1, 2, 3\}$ to encode the activeness pattern of $STK_r[i]$. We know that the activeness pattern of tweakey cells in the propagation of linear trails should follow the linear mask of the internal states. Assume that $AYU$ and $AYL$ are integer variables like $AXU$ and $AXL$, indicating the activeness pattern of the internal state right before the round tweakey addition. Therefore, we add the new constraint $ASTK_r[i] = \min\{AYU_r[i], AYL_r[i]\}$ for all $0 \leq r \leq r_d - 1$ and $0 \leq i \leq m - 1$, to link the activeness pattern of the subtweakey to the activeness pattern of the internal state. This way, the subtweakey follows the activeness pattern of whichever of the forward and backward propagations has fewer active cells. Then, to ensure that the conditions of Theorem 2 are met, we add the constraint $CSP_{TK}$ on the variables $ASTK_r[i]$ (see [HSE23, HGSE24] for its exact form). The conjunction of the CSP models above, i.e., $CSP_d = CSP_u \wedge CSP_{TK} \wedge CSP_l$, creates a unified CP/MILP model based on satisfiability, all of whose feasible solutions are ZC/integral distinguishers for $r_d$ rounds of the block cipher $E$.
By including an objective function that maximizes the number of linearly active cells at the input, we minimize, according to Theorem 1, the number of active cells at the input of the corresponding integral distinguisher, and can thus find integral distinguishers with minimum data complexity. Additionally, the linear combination […]. The authors of [HGSE24] applied the above model to identify integral distinguishers for several TBCs, including QARMAv2, resulting in a significant improvement of the integral distinguishers of QARMAv2. However, all integral distinguishers reported in [HGSE24] have only one balanced cell at the output, right before the MixColumns. Consequently, we cannot exploit the MixColumns property of QARMAv2, since we require at least two balanced cells in one column to bypass the diffusion effect of the MixColumns layer. In the following section, we detail refinements to the model in [HSE23, HGSE24] for discovering integral distinguishers with multiple balanced cells in a single column at the output.

Our CP Model for Integral Distinguishers
We need at least two balanced cells in the output of distinguishers to exploit the MixColumns property. We keep the constraints for the forward propagation, i.e., $CSP_u$, unchanged. However, we modify the model for the backward propagation. As illustrated in Figure 8c, we create two independent CSP models $CSP1_l$ and $CSP2_l$ with independent variables $(AXL1_r[i], AYL1_r[i])$ and $(AXL2_r[i], AYL2_r[i])$, respectively, to model the deterministic linear trails in the backward direction. The idea is that the combination of each of the CSP models $CSP1_l$ and $CSP2_l$ with $CSP_u$ should create a CSP model whose solutions are ZC/integral distinguishers. We define two independent sets of integer variables $ASTK1_r[i]$ and $ASTK2_r[i]$ for subtweakey cells and add the following constraints to link the activeness pattern of the subtweakey to the activeness pattern of the internal state for each backward propagation:

$$ASTK1_r[i] = \min\{AYU_r[i], AYL1_r[i]\}, \qquad ASTK2_r[i] = \min\{AYU_r[i], AYL2_r[i]\}.$$

Then, to ensure that the conditions of Theorem 2 hold for both CSP models $CSP_u \wedge CSP1_l$ and $CSP_u \wedge CSP2_l$, we include the constraints $CSP_{DTK}$, which are expressed with binary variables contradict1[i] and contradict2[i] for all $0 \leq i \leq m - 1$. We aim to ensure that $CSP1_l$ and $CSP2_l$ produce two distinct activeness patterns for the output cells. Additionally, the active cells in $AXL1_{r_d}$ and $AXL2_{r_d}$ should reside in the same columns. To achieve this, we first constrain the values of $AXL1_{r_d}[i]$ and $AXL2_{r_d}[i]$ to $\{0, 1\}$. We then introduce constraints that make the two output patterns distinct. Finally, we incorporate additional constraints to ensure that the active cells of $AXL1_{r_d}$ and $AXL2_{r_d}$ appear in the same column. The conjunction of the CSP models above forms our CP model for finding such distinguishers.
For QARMAv2-64 (T = 1/2) and QARMAv2-128 (T = 2), we found 9/10-round and 11-round ZC-based integral distinguishers with data complexity $2^{44}$, respectively. Figure 9, Figure 10, and Figure 14 illustrate some of these distinguishers featuring two balanced cells within a single column of the output state. The colors employed in the figures signify the activity pattern of the corresponding cells in the ZC distinguishers: they distinguish cells with an arbitrary linear mask, cells with a nonzero linear mask, and cells with a fixed nonzero linear mask. Inactive cells remain white (blank). To transform the ZC distinguishers into integral distinguishers, we must invert the activity pattern in the input. In other words, active cells with an arbitrary linear mask should assume a fixed value, while inactive cells should take all possible values exactly once. Then, the output cells marked with a fixed nonzero linear mask are balanced in the corresponding integral distinguisher.
As mentioned earlier, we model the propagation of linear masks in the backward direction using two independent CSP models. This is why we depict the activity pattern in the backward direction with upper and lower triangles in each cell. For instance, a cell whose two triangles carry different colors signifies that the linear mask of the corresponding cell can assume any value in one backward path, but must be nonzero in the other backward path. In addition, a dedicated marker denotes the tweak cell that should take all possible values in the corresponding integral distinguishers. We explain the interpretation of Figure 9 as an example; the other figures are interpreted similarly. Figure 9 represents a 9-round ZC linear hull for QARMAv2, taking the tweak schedule into account. As seen in Figure 9a, 6 input cells can take any linear mask, while the linear masks of the other 10 input cells are zero. The output cells can take a fixed nonzero linear mask in the 8th cell (first backward propagation) and the 12th cell (second backward propagation). The tweak cell 0 takes a nonzero linear mask exactly once (after the reflector construction), whereas its linear mask is zero everywhere else. As a result, according to Theorem 2, we have two independent ZC linear hulls with the same activeness pattern at the input, but different active cells in the same column of the output.
To convert the ZC linear hulls in Figure 9 into integral distinguishers, the input cells with active linear masks should take a fixed value, and the linearly inactive cells should take all possible values exactly once. In addition, tweak cell number 0 should take all possible values exactly once. Then, the output cells carrying a fixed nonzero linear mask in either backward propagation are balanced in the corresponding integral distinguishers. Since 10 + 1 cells are active at the input (10 active cells in the internal state and 1 active cell in the tweak), and each cell is a 4-bit nibble, the data complexity of the resulting integral distinguisher is $2^{11 \cdot 4} = 2^{44}$.
According to Table 2, the data complexity of any valid attack on QARMAv2-64 (resp. QARMAv2-128) should be less than $2^{56}$ (resp. $2^{80}$). Therefore, our integral distinguishers satisfy the data complexity limits. We also found a 12-round integral distinguisher for QARMAv2-128 ($T = 2$) with two balanced output cells, as illustrated in Figure 15. However, we do not use it in our key recovery attack since its data complexity is $2^{96}$ (above the threshold). In Section 5, we elaborate on applying meet-in-the-middle and partial-sum techniques to construct an efficient key recovery attack based on our new integral distinguishers.
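The conversion from a ZC input pattern to an integral input pattern, and the resulting data-complexity count, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the string labels are hypothetical, and the cell positions are placeholders (only the counts of 6 masked and 10 unmasked input cells and 1 active tweak cell are taken from the text).

```python
CELL_BITS = 4  # QARMAv2-64: 64-bit state split into 16 cells

# Data-complexity limits (log2) quoted from Table 2 in the text.
LIMIT_LOG2 = {"QARMAv2-64": 56, "QARMAv2-128": 80}


def zc_to_integral(zc_input):
    """Invert the input activity pattern of a ZC distinguisher.

    'any'  (arbitrary linear mask) -> 'fixed' (cell takes a constant value)
    'zero' (zero linear mask)      -> 'all'   (cell takes every value once)
    """
    flip = {"any": "fixed", "zero": "all"}
    return [flip[cell] for cell in zc_input]


def data_complexity_log2(integral_input, active_tweak_cells, cell_bits=CELL_BITS):
    """log2 of the number of chosen plaintext/tweak pairs in one structure."""
    active_cells = integral_input.count("all") + active_tweak_cells
    return cell_bits * active_cells


def within_limit(log2_data, version):
    """True if the data complexity stays below the limit for this version."""
    return log2_data < LIMIT_LOG2[version]


# Placeholder layout: which 6 cells are masked is fixed by Figure 9a;
# here only the counts matter.
zc_input = ["any"] * 6 + ["zero"] * 10
integral_input = zc_to_integral(zc_input)
log2_data = data_complexity_log2(integral_input, active_tweak_cells=1)

print(log2_data, within_limit(log2_data, "QARMAv2-64"))
print(96, within_limit(96, "QARMAv2-128"))  # the 12-round distinguisher
```

The same check explains why the 12-round distinguisher is set aside: its exponent of 96 is not below the limit of 80 for QARMAv2-128.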

Integral Key Recovery
Here, we use the partial-sum technique [FKL+00] and the meet-in-the-middle approach [SW12] to mount key recovery attacks based on our distinguishers for QARMAv2. Moreover, to exploit the low data complexity of the integral distinguisher, we construct each distillation table after guessing the whole of $L_0$. This also helps reduce the memory complexity of the meet-in-the-middle approach. Recall that the authors of QARMAv2 claimed $(\kappa - \varepsilon)$-bit security for a $\kappa$-bit secret key, where $\varepsilon$ is adjusted to values such as 16 for standardization purposes. We emphasize that our key recovery attacks are valid for the parameters suggested for general or standardization purposes.

Integral Attack for 13-Round QARMAv2-64 (T = 1)
Here, we propose an integral attack against 13 rounds of QARMAv2-64 ($T = 1$) by appending 4 rounds for key recovery to the ciphertext side of our 9-round distinguisher in Figure 9a. Figure 11 gives an overview of the key recovery. As seen in Figure 11, thanks to having two balanced cells in the same column at the output of the distinguisher, we can bypass the diffusion effect of the MixColumns layer. Otherwise, many more key bits would be involved in the key recovery, yielding a higher time complexity. Besides, we sometimes guess $L0 = M \cdot \tau(L_0)$ and $L1 = M \cdot \tau(L_1)$ instead of $L_0$ and $L_1$. Let $X_i$, $Y_i$, and $Z_i$ be the internal states defined in Figure 11.
Attack Procedure. The 9-round integral distinguisher is built by using $2^{44}$ chosen plaintexts. We have the 8-bit balanced value after the S-box layer. We use the following relation for the efficiency of the key recovery.
Only two nibbles of $X_0$ are enough to observe the 4-bit balanced property. Therefore, we use the meet-in-the-middle approach, where we independently compute the sums of $X_0[5]$ and $X_0[15]$. We finally retrieve the secret key for which the two sums satisfy the relation. One structure, using $2^{44}$ chosen plaintexts, acts as a 4-bit filter, i.e., the secret-key space is reduced by a factor of $2^{-4}$. However, we need a more substantial filtering effect to build an attack whose complexity is less than $2^{128-16}$, and hence valid for standardization purposes. Therefore, we use $s$ structures to enhance it to a $4s$-bit filter.
The straightforward meet-in-the-middle approach yields an enormous memory complexity to store all the guessed key bits. To reduce the memory complexity, we share some guesses, namely the whole of $L_0$, in both procedures. Specifically, we use the following procedure.
1. Guess the whole of $L_0$ (64 bits) and construct two distillation tables to compute the sums of $X_0[5]$ and $X_0[15]$.
(a) Compute the sum of $X_0[5]$ by using the partial-sum technique (see Table 3).
(b) Compute the sum of $X_0[15]$ by using the partial-sum technique (see Table 4).
(c) Apply the meet-in-the-middle approach and retrieve about $2^{64-4s}$ key candidates for $L_1$.
(d) Guess the $2^{64-4s}$ candidates for $L_1$ and check their correctness by a few trial encryptions.
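The matching step (c) can be illustrated with a toy sketch. Everything here is a simplifying assumption rather than the attack itself: each side depends on a single key nibble, the S-box is a generic 4-bit permutation used as a stand-in, and the matching condition is taken to be plain equality of the two XOR sums in every structure. The function names are hypothetical. With $s$ structures, each candidate pair has to match on $s$ nibbles, which is the $4s$-bit filter described above.

```python
import random
from collections import defaultdict

# Stand-in 4-bit permutation (not the QARMAv2 S-box).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV_SBOX = [SBOX.index(v) for v in range(16)]


def xor_sum(values):
    acc = 0
    for v in values:
        acc ^= v
    return acc


def make_structures(key_a, key_b, s, size, rng):
    """Build s toy structures whose two partial sums agree under the true keys."""
    structures = []
    for _ in range(s):
        left = [rng.randrange(16) for _ in range(size)]
        right = [rng.randrange(16) for _ in range(size - 1)]
        right.append(xor_sum(left) ^ xor_sum(right))  # force equal XOR sums
        structures.append([(INV_SBOX[l] ^ key_a, INV_SBOX[r] ^ key_b)
                           for l, r in zip(left, right)])
    return structures


def signature(structures, side, key):
    """One nibble sum per structure for a key guess on one side (4s bits)."""
    return tuple(xor_sum(SBOX[ct[side] ^ key] for ct in st) for st in structures)


def meet_in_the_middle(structures):
    """Index one side by its signature, then look up the other side."""
    table = defaultdict(list)
    for ka in range(16):
        table[signature(structures, 0, ka)].append(ka)
    return [(ka, kb)
            for kb in range(16)
            for ka in table.get(signature(structures, 1, kb), [])]


rng = random.Random(1)
structures = make_structures(key_a=0x3, key_b=0xA, s=3, size=16, rng=rng)
survivors = meet_in_the_middle(structures)
print(survivors)
```

The two sides are evaluated independently (16 + 16 signature computations instead of 256 joint ones), and the correct key pair always survives; any remaining false positives would be discarded by trial encryptions as in step (d).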
Table 3 and Table 4 summarize the partial-sum procedures to compute the sum of $X_0[5]$ and the sum of $X_0[15]$, respectively. Here, mix($X$, $X'$) denotes a linear function represented by $\rho^i(X) \oplus \rho^j(X')$ with a proper $i$ and $j$. A unit of each partial-sum procedure involves a memory load/write and an S-box evaluation. However, in practice, each procedure utilizes at most two nibbles of the state while guessing one nibble of the key. Therefore, precomputing the S-box evaluation becomes viable. As a result, the primary cost is mostly attributed to memory access, and these accesses, in addition, tend to occur sequentially. Since sequential memory access (MA) is exceptionally fast, we consider its cost as equivalent to an S-box evaluation (1/16 of the round function). We finally estimate the attack complexity.
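The idea behind the partial-sum technique can be shown on a toy function. The sketch below is not the procedure of Table 3 or Table 4: it assumes a hypothetical two-layer target $S(S(c_0 \oplus k_0) \oplus S(c_1 \oplus k_1) \oplus k_2)$ over nibbles, with a stand-in S-box and hypothetical function names. It guesses one key nibble at a time and, after each guess, compresses the data into a smaller parity table, so the ciphertexts are never rescanned for every full key guess.

```python
import random

# Stand-in 4-bit permutation (not the QARMAv2 S-box).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def parity_table(ciphertexts):
    """Distillation: only the parity of each (c0, c1) pair affects an XOR sum."""
    table = [[0] * 16 for _ in range(16)]
    for c0, c1 in ciphertexts:
        table[c0][c1] ^= 1
    return table


def partial_sum(ciphertexts):
    """Map each (k0, k1, k2) to the XOR over all ciphertexts of the toy target."""
    t0 = parity_table(ciphertexts)
    result = {}
    for k0 in range(16):
        # Absorb k0: re-index the table by (x, c1) with x = S(c0 ^ k0).
        t1 = [[0] * 16 for _ in range(16)]
        for c0 in range(16):
            row = t1[SBOX[c0 ^ k0]]
            for c1 in range(16):
                row[c1] ^= t0[c0][c1]
        for k1 in range(16):
            # Absorb k1: compress to a 16-entry table over y = x ^ S(c1 ^ k1).
            t2 = [0] * 16
            for x in range(16):
                for c1 in range(16):
                    t2[x ^ SBOX[c1 ^ k1]] ^= t1[x][c1]
            for k2 in range(16):
                acc = 0
                for y in range(16):
                    if t2[y]:
                        acc ^= SBOX[y ^ k2]
                result[(k0, k1, k2)] = acc
    return result


def naive_sum(ciphertexts, k0, k1, k2):
    """Reference: scan every ciphertext for one full key guess."""
    acc = 0
    for c0, c1 in ciphertexts:
        acc ^= SBOX[SBOX[c0 ^ k0] ^ SBOX[c1 ^ k1] ^ k2]
    return acc


rng = random.Random(7)
ciphertexts = [(rng.randrange(16), rng.randrange(16)) for _ in range(200)]
sums = partial_sum(ciphertexts)
print(sums[(1, 2, 3)] == naive_sum(ciphertexts, 1, 2, 3))
```

In this toy setting each step touches a table of at most 256 entries regardless of how many ciphertexts were collected, which mirrors why the per-step cost in the text is counted in memory accesses rather than in full decryptions.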
Total: $2^{50.15}$
Note that each list for the meet-in-the-middle contains $2^{36}$ key candidates. Even if we repeat the procedure $2^{64}$ times for guessing $L_0$, the cost of sorting and matching two lists is negligible. When we regard MA and RF as $\frac{1}{16}$ RF and $\frac{1}{13}$ ENC, respectively, the total time complexity is $2^{110.47}$ with $s = 5$. Thus, the data complexity is $5 \times 2^{44}$. Each partial-sum procedure has to store at most $2^{40}$ values, but storing the $s \times 2^{44}$ ciphertexts is more significant. Therefore, the memory complexity is about $s \times 2^{44}$.

Integral Attack for 14-Round QARMAv2-64 (T = 2)
We have discovered a 10-round integral distinguisher with two balanced output words for QARMAv2-64 ($T = 2$), as illustrated in Figure 10. By adding 4 rounds to the integral distinguisher, we obtain a 14-round integral attack. Almost the same attack procedure as the key recovery for $T = 1$ is applicable. We guess the whole of $L_1$ and construct two distillation tables for computing the sums of $X_0[11]$ and $X_0[4]$. Computing the sum of $X_0[4]$ is slightly more efficient because it involves no active tweak. In the complexity estimate, RF denotes the cost of the round function, MA denotes the cost of each partial-sum procedure, and ENC represents the cost of the encryption algorithm. With the same conversion of the unit of complexity, the attack complexity is $2^{110.17}$ with $s = 5$. The required data complexity is $5 \times 2^{44}$. Figure 12 summarizes the 14-round key recovery.

Integral Attack for 16-Round QARMAv2-128 (T = 2)
We append 5 rounds to our 11-round integral distinguisher for QARMAv2-128 ($T = 2$) in Figure 14 to obtain a 16-round integral attack. The attack procedure is similar to the QARMAv2-64 attack. We initially guess further bits to reduce the time and memory complexity, i.e., the whole of $L_1$ and part of $L0$. Specifically, we use the following procedure.
(a) Compute the sum of $X_0[21]$ by using the partial-sum technique (see Table 5).
(b) Compute the sum of $X_0[31]$ by using the partial-sum technique (see Table 6).
(c) Apply the meet-in-the-middle approach and retrieve about $2^{80-4s}$ key candidates for $L_0$.
(d) Guess the $2^{80-4s}$ candidates for $L_0$ and check their correctness by a few trial encryptions.
Each list for the meet-in-the-middle contains $2^{44}$ key candidates. Therefore, we regard the cost of sorting and matching two lists as negligible. When we regard MA and PD as $\frac{1}{16}$ RF and $4\,\mathrm{RF} = \frac{4}{16}$ ENC, respectively, the total time complexity is $2^{234.11}$ with $s = 6$. Thus, the data complexity is $6 \times 2^{44}$. Each partial-sum procedure needs to store $2^{40}$ values. Storing the $s \times 2^{44}$ ciphertexts is more dominant. Thus, the memory complexity is $6 \times 2^{44} = 2^{46.58}$.
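As a quick sanity check on the figures quoted for the three attacks, the following sketch converts the data complexities $s \times 2^{44}$ to exponents and compares the stated time complexities with the $(\kappa - \varepsilon)$-bit bound for $\varepsilon = 16$. It only re-uses numbers stated in the text and does not re-derive the time complexities; the variable and function names are hypothetical.

```python
from math import log2

EPSILON = 16          # value recommended for standardization in the text
STRUCTURE_LOG2 = 44   # each structure uses 2^44 chosen plaintexts

# (attack, key size kappa, number of structures s, log2 time as stated)
ATTACKS = [
    ("13-round QARMAv2-64 (T = 1)", 128, 5, 110.47),
    ("14-round QARMAv2-64 (T = 2)", 128, 5, 110.17),
    ("16-round QARMAv2-128 (T = 2)", 256, 6, 234.11),
]


def data_log2(s, structure_log2=STRUCTURE_LOG2):
    """log2 of s * 2^structure_log2."""
    return log2(s) + structure_log2


def below_claim(kappa, time_log2, epsilon=EPSILON):
    """True if the stated time is below the (kappa - epsilon)-bit claim."""
    return time_log2 < kappa - epsilon


for name, kappa, s, time_log2 in ATTACKS:
    print(f"{name}: data 2^{data_log2(s):.2f}, "
          f"time 2^{time_log2} vs bound 2^{kappa - EPSILON}, "
          f"valid: {below_claim(kappa, time_log2)}")
```

For $s = 6$ the data exponent evaluates to about 46.58, in agreement with the memory complexity given above.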

Conclusion and Future Works
In this paper, we further improved the tool for finding integral distinguishers proposed in [HSE23, HGSE24]. Using this new tool, we could exploit the MixColumns property of QARMAv2 to find new integral distinguishers for QARMAv2 that are more efficient in terms of integral key recovery. Then, we leveraged the combination of meet-in-the-middle and partial-sum techniques to propose the first concrete key recovery attacks on QARMAv2.
Our CP model to search for integral distinguishers is not limited to QARMAv2 and can be applied to similar designs such as MANTIS and CRAFT. In summary, we provided a 13-round attack on QARMAv2-64 ($T = 1$), a 14-round attack on QARMAv2-64 ($T = 2$), and a 16-round attack on QARMAv2-128 ($T = 2$). While our attacks do not pose a practical threat to the security of QARMAv2, they contribute valuable insights into its security, prompting further research in this domain. For instance, we presented a 12-round integral distinguisher for QARMAv2-128 ($T = 2$) with a data complexity of $2^{96}$. As the data complexity of this distinguisher exceeds the $2^{80}$ limit for QARMAv2-128, we did not use it in our key recovery attacks.
In addition, assuming that $b$ is the block size of QARMAv2-$b$, the first cell of the first layer includes bit indices $[b - 1, \ldots, b - 4]$, and the last cell of the last layer includes bit indices $[3, \ldots, 0]$. Consequently, the number of layers is $\ell = b/64$. Besides, a $b$-bit value in the design of QARMAv2 is called a block.

(a) A full round of QARMAv2. (b) QARMAv2 encryption for odd $r$.

Figure 5: The STK construction of the tweakey framework.

Figure 6: Involved cells in integral key recovery on 6 rounds of AES.

Figure 7: Meet-in-the-middle technique in integral key recovery.
Our modeling for integral distinguishers.

Figure 8: Modeling the ZC and integral distinguishers as a CSP problem.
forms a unified CP/MILP model based on satisfiability. All feasible solutions of this model represent integral distinguishers for $r_d$ rounds of the block cipher $E$ with at least two balanced cells in the same column at the output of the distinguishers. We implemented this model for all versions of QARMAv2 using MiniZinc [NSB+07] and successfully solved it with the open-source CP solver Or-Tools [PF] on a regular laptop within seconds.
Integral distinguisher II for (4 + 5) 9 rounds of QARMAv2-64. Data complexity: $2^{48}$.

Table 1: Summary of our attacks on QARMAv2. $T$: No. of independent tweak blocks.
Our contributions. In this paper, we shed more light on the security of QARMAv2. Considering that the integral distinguishers of QARMAv2 are the longest concrete distinguishers for this cipher so far, we focus on integral attacks. We first improve the automatic tool introduced in [HSE23, HGSE24] for finding integral distinguishers of TBCs following the TWEAKEY framework. This new tool exploits the MixColumns property of QARMAv2 to find integral distinguishers more suitable for key recovery attacks. The application of our tool for finding integral distinguishers is not limited to QARMAv2, and it is applicable to other TBCs such as MANTIS and CRAFT [BLMR19]. Then, we combine several techniques for integral key recovery attacks, e.g., meet-in-the-middle [SW12] and partial-sum [FKL+00] techniques, to build a fine-grained integral key recovery attack on QARMAv2. Notably, we demonstrate how to leverage the low data complexity of the integral distinguishers of QARMAv2 to reduce the memory complexity of the meet-in-the-middle technique. Table 1 summarizes our key recovery attacks. While the designers of QARMAv2 assert $(\kappa - \varepsilon)$-bit security for a $\kappa$-bit secret key, with $\varepsilon$ as a small number, such as 2, they recommend larger values for $\varepsilon$, such as 16 for standardization. Our analyses remain valid even for standardization purposes, i.e., $\varepsilon \leq 16$. Additionally, our attacks operate in an unbalanced setting, meaning the number of forward and backward rounds is unequal. The source code of our tool is available at https://github.com/hadipourh/QARMAnalysis.
It offers two block sizes, $b = 64, 128$ bits, denoted by QARMAv2-$b$-$s$, where $s$ is the bit size of the key (or the security level in bits). For $b = 128$, the key size can be $s = 128$, 192, or 256 bits, and for $b = 64$, the key length is always $s = 128$ bits and can be omitted from the notation. Similar to MANTIS [BJK+16], and PRINCE [BCG+], [ABD+23]. The goal behind the design of QARMAv2 is to provide a general TBC that is also suitable for memory encryption and fast computation of short-message MACs.

Table 2: Main parameters of QARMAv2. (b) Parameters of QARMAv2 with a single tweak block ($T = 1$).
and Wang at FSE 2012 in [BW12]. At ASIACRYPT 2012, Bogdanov et al. demonstrated in [BLNW12] that an integral distinguisher, as defined by a balanced vectorial Boolean function, unconditionally implies a ZC distinguisher. At CRYPTO 2015, Sun et al. introduced Theorem 1 in [SLR+15]

Table 5 and Table 6 summarize the partial-sum procedures to compute the sum of $X_0[21]$ and the sum of $X_0[31]$, respectively. Here, mix($X$, $X'$) denotes a linear function represented by $\rho^i(X) \oplus \rho^j(X')$ with a proper $i$ and $j$.