1, 2, 3, Fork: Counter Mode Variants based on a Generalized Forkcipher

Abstract. A multi-forkcipher (MFC) is a generalization of the forkcipher (FC) primitive introduced by Andreeva et al. at ASIACRYPT’19. An MFC is a tweakable cipher that computes $s$ output blocks for a single input block, with $s$ arbitrary but fixed. We define the MFC security in the ind-prtmfp notion as indistinguishability from $s$ tweaked permutations. Generalizing tweakable block ciphers (TBCs, $s = 1$) as well as forkciphers ($s = 2$), MFC lends itself well to building simple-to-analyze modes of operation that support any number of cipher output blocks. Our main contribution is the generic CTR encryption mode GCTR that makes parallel calls to an MFC to encrypt a message $M$. We analyze the set of all 36 “simple and natural” GCTR variants under the nivE security notion by Peyrin and Seurin from CRYPTO’16. Our proof method makes use of an intermediate abstraction called tweakable CTR (TCTR) that captures the core security properties of GCTR common to all variants, making their analyses easier. Our results show that many of the schemes achieve from well beyond birthday bound (BBB) to full $n$-bit security under nonce-respecting adversaries, and some even BBB and close to full $n$-bit security in the face of realistic nonce misuse conditions. We finally present an efficiency comparison of GCTR using ForkSkinny (an MFC with $s = 2$) with the traditional CTR and the more recent CTRT modes, both instantiated with the SKINNY TBC. Our estimations show that any GCTR variant with ForkSkinny can achieve an efficiency advantage of over 20% for moderately long messages, illustrating that the use of an efficient MFC with $s \ge 2$ brings a clear speed-up.


Introduction
Forkcipher (FC) [ALP+19b] is a novel symmetric primitive, originally conceived for efficient authenticated encryption (AE) of short messages. It transforms a fixed-length ($n$-bit) plaintext input $X$ into a larger ($2n$-bit) fixed-length output $Y$ via a secret key $K$ and an (optional) public tweak $T$. The security notion of an FC is given as indistinguishability from two pseudorandom tweakable permutations (ind-prtfp) [ALP+19b]. An FC is then used to build secure nonce-based AE schemes that require strictly one primitive (FC) call per message block. The FC modes for nonce-based AE (r)PAEF and SAEF are the first examples of such one-primitive-call-per-block constructions. Other nonce-based AE schemes, such as TAE [LRW02], ΘCB [RK11], GCM [MV04], CCM [WHF03], and OCB [RBB03], incur at the very least one additional primitive call. Combined with an efficient FC, the FC-based AE modes evidently minimize computational cost for short messages.
Forkcipher applications beyond efficient short-message AE are still to be explored, especially their possible efficiency and security advantages over regular and tweakable ciphers. For example, a stronger security guarantee was recently proved for the forkcipher-based AE mode SAEF, supporting its defense in depth. In addition to the classical nAE [Rog02] security, SAEF has been shown secure [ABV20, ABV21, ABD+20] under both the OAE [FFL12] and INT-RUP [ABL+14] security notions.
In this work we focus on the applications of forkciphers to counter(CTR)-mode-like encryption. CTR mode is one of the most deployed symmetric encryption schemes. Its applications include random number generator AES-CTR DRNG [BK07], asynchronous transfer modes, network security: TLS/SSL and low power protocols, and IP security, among others. Furthermore, many standardized AE modes, such as GCM [MV04], CCM [WHF03], and SIV [RS06] make use of the CTR mode internally.
CTR mode was introduced in 1979 [LRW00] and is part of the US National Institute of Standards and Technology NIST SP 800-38A Recommendation for block cipher modes of operation. In his survey, P. Rogaway says "I regard CTR as the "best" choice among the classical confidentiality-only techniques. Its parallelizability and obvious correctness, when based on a good blockcipher, mean that the mode should be included in any modern portfolio of modes.", and "Overall, usually the best and most modern way to achieve privacy-only encryption" [Rog11].
For an underlying block cipher $E_K$ with a key $K$, a message $M$, an initialization vector $IV$ (either a nonce, i.e., a non-repeating value, or a random value), and a counter $j$, the classical CTR mode is defined as $c_j = E_K(f(IV, j)) \oplus M_j$. The output of $f$ is the counter block and is a unique input to each call of the block cipher. A secure and hence typical choice of $f$ is simple concatenation, $f(IV, j) = IV \| j$, e.g., the IV takes the upper 64 bits and the counter takes the lower 64 bits of a 128-bit counter block. For a non-repeating nonce $IV$, CTR mode is indistinguishable from random bits under chosen plaintext attack (CPA). If the nonce is reused, the CTR security is completely compromised. When $E_K$ is a 128-bit block cipher, such as AES-128, the CTR mode achieves confidentiality under CPA up to the birthday bound (BB), that is, up to $2^{64}$ encrypted data blocks, assuming that $E_K$ is a secure pseudorandom permutation (PRP). The CTR mode's most desirable features are the forward-only primitive operation in both encryption and decryption (known as the inverse-free property) and full parallelizability. These make CTR particularly efficient and well-suited for modern architectures with multiple cores and SIMD extensions, where blocks can be encrypted (and decrypted) in parallel.
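The definition above can be sketched in a few lines of Python. This is a toy illustration only: `toy_block_cipher` is a hypothetical stand-in built from truncated SHA-256 (a keyed function, not a real block cipher such as AES), which is enough here because CTR only evaluates $E_K$ in the forward direction. The 64/64-bit split of the counter block follows the example in the text; all function names are assumptions made for this sketch.

```python
import hashlib

BLOCK_BYTES = 16  # n = 128 bits


def toy_block_cipher(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for E_K: truncated SHA-256 of key || block."""
    return hashlib.sha256(key + block).digest()[:BLOCK_BYTES]


def counter_block(iv: bytes, j: int) -> bytes:
    """f(IV, j) = IV || j with a 64-bit IV and a 64-bit big-endian counter."""
    assert len(iv) == 8
    return iv + j.to_bytes(8, "big")


def ctr_xcrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: c_j = E_K(f(IV, j)) XOR M_j (same operation both ways)."""
    out = bytearray()
    for j, start in enumerate(range(0, len(data), BLOCK_BYTES), start=1):
        chunk = data[start:start + BLOCK_BYTES]
        keystream = toy_block_cipher(key, counter_block(iv, j))
        # zip truncates the key stream block for a final partial message block
        out += bytes(m ^ k for m, k in zip(chunk, keystream))
    return bytes(out)
```

Since encryption and decryption are the same XOR with the key stream, encrypting two messages under the same key and IV yields ciphertexts whose XOR equals the XOR of the plaintexts, which is the complete loss of security under nonce reuse mentioned above.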
Although classical blockciphers, such as the AES, are still broadly used, a new class of tweakable blockciphers (TBCs) [LRW02], such as CRAFT [BLMR19], Deoxys [JNPS16], SKINNY [BJK+16], etc., has proliferated in the last decade. A tweakable cipher takes an additional public input called the tweak. The tweak is used to ensure both the cipher "variability" and increased resistance against precomputation attacks. The variability factor is particularly useful when analyzing the security of TBC-based modes: if two AES calls are made with the same plaintext block $X$, the result will be identical, yet under distinct tweaks $T_1$ and $T_2$ a "good" (indistinguishable from a tweak-indexed collection of random permutations) TBC returns computationally independent ciphertexts $E_K^{T_1}(X)$ and $E_K^{T_2}(X)$. The CounTeR in Tweak (CTRT) encryption mode was proposed by Peyrin and Seurin [PS16]. It is a TBC-based CTR-style encryption mode where the tweak value $T$ is set to the counter block computed as the XOR of the random IV value and a counter ($IV \oplus j$), and the cipher input value $X$ is set to a unique nonce value $N$ (fixed per message). For a block size of $n$ bits and a tweak of $t$ bits, the CTRT mode achieves beyond birthday bound (BBB) $(n+t)/2$-bit security under the nivE notion (defined as indistinguishability from random bits with fresh nonces for each encryption), and a graceful security degradation when the nonce is repeated. The CTRT mode retains the features of the classical CTR mode and, in addition, brings in beyond birthday bound security and improved resistance against nonce misuse.
Nonce-misuse resistance (NMR) has been considered a theoretical abstraction, but recent attacks have illustrated the severity of nonce misuse in practice. At USENIX'16, Böck et al. [BZD+16] investigated NMR of the CTR-based AES-GCM deployed in TLS 1.3 and managed to completely break the authenticity of those connections where servers repeated nonces. The next year, at CCS'17 [VP17], Vanhoef and Piessens introduced the key reinstallation attack, which forces nonce repetitions and breaks the WPA2 wireless protocol.

Contributions
In this work, we explore the security, efficiency and resulting advantages of tweakable primitives in CTR-style encryption through the prism of multiforkciphers. Our main reference points are the classical CTR and the recent CTRT encryption modes.
Multiforkciphers. First, we extend the forkcipher definition to that of a generic multiforkcipher (MFC). MFC is a tweakable cipher with an arbitrary but fixed length output, which generalizes over and covers both TBCs and tweakable FCs. An MFC transforms a single input block into $s \ge 1$ output blocks. When $s = 1$, MFC becomes a TBC and when $s = 2$, it becomes an FC. We present the MFC security definition, ind-prtmfp (indistinguishable pseudorandom tweakable multi-fork permutation), which, similarly to the ind-prtfp notion [ALP+19b], captures indistinguishability from multiple pseudorandom tweakable permutations. We use MFCs to tackle the security of CTR-style encryption schemes. When $s$ is left as a fixed but arbitrary parameter in a security analysis, one obtains a result valid simultaneously for TBCs, FCs and (hypothetical) MFCs with $s > 2$. This means, for example, that our results are directly applicable to both TBC-based and FC-based instances of CTR-like modes, among others.
Generic CTR mode. Our main contribution is the novel generic counter GCTR structure that uses an MFC as its underlying primitive. GCTR makes parallel MFC calls where the MFC inputs $X$ (the plaintext) and $T$ (the tweak) are determined via two input generator functions $f_X$ and $f_T$, taking as input a nonce $N$, a random IV denoted by $R$, and a counter $j$. We focus on the simplest and most natural generator functions, defined as either the concatenation, XOR, or copy operations of two (respectively one) out of these three inputs, or simply a constant function independent of its inputs. We identify 36 instances, (most of) which implement secure nonce-based, IV-based, or nivE schemes. In the special case of MFC with $s = 1$ (TBC), our results include, and offer alternatives to, CTRT, which coincides with GCTR-3. To the best of our knowledge, this is the first systematic treatment of the popular CTR-style encryption.
TCTR abstraction. We analyze the security of all our GCTR variants. To do so, we define the tweakable CTR (TCTR) as an intermediate abstraction. TCTR directly takes two sequences of tweaks and plaintexts and feeds the input (T, X) pairs into parallel calls to MFC to generate a key stream. The security of TCTR is defined via the concept of a sequence-builder, capturing the common properties of all GCTR variants. We then bound the generic distinguishing advantage between the TCTR output and a truly random key stream, and apply this result in the analysis of our GCTR variants.

Security.
Our results show interesting security advantages overall, and in particular improve over the classical CTR mode. We prove that some of our variants achieve security beyond the birthday bound of $n/2$ bits (BBB) and some even full $n$-bit security, with $n$ being the size of the input blocks. We provide a detailed interpretation of our security results in Sect. 5 and pick a selection of variants, GCTR-3 (= CTRT when $s = 1$) and GCTR-7, that excel in security. For a total of $\sigma$ MFC calls, GCTR-7 provides perfect information-theoretic security against nonce-respecting adversaries and BBB security against nonce-misusing adversaries (for $\sigma$ nonce repetitions). GCTR-3 comes with BBB security against nonce-misusing adversaries (for $\sigma$ nonce repetitions). Our security bound for GCTR-3 additionally improves over the original bound of CTRT. CTRT [PS16] was proven BBB secure with a degradation in the bound (nonce misuse and nonce respecting) given by a term that depends on $q$ and $x$, the number of total CTRT queries and the maximum number of nonce repetitions over CTRT queries, respectively. In this work, we reduce the gap to $\frac{(2\sigma - q)(x - 1)}{2^{t+1}}$. Our results also show that GCTR-3 with larger tweaks of $2n$ bits provides $\approx n$-bit security for all adversary types (nonce-misusing and nonce-respecting; see Sect. 5). To achieve these security improvements, however, we may need to pay with increased communication bandwidth due to the nonce/IV size, as compared to the regular CTR mode.
Revisiting Tweakable HCTR. We reanalyze Tweakable HCTR (or THCTR; a VIL enciphering scheme [DN18]), which uses as an internal building block a CTR-like encryption mode that is in fact equal to our GCTR-4. We invalidate its existing security bound (claiming beyond-birthday security with respect to the TSPRP [DN18] security notion) by identifying a flaw in its existing proof. Further, we provide a birthday attack confirming that THCTR does not achieve TSPRP security beyond the birthday bound in its present form, and recommend replacing its internal GCTR-4 component by either of our preferred variants (namely, GCTR-3 or GCTR-7) to achieve the intended BBB security.

Efficiency.
A large part of our motivation for the study of GCTR variants is the idea that an MFC with a large $s$ that is more efficient than $s$ TBC calls results in more efficient encryption, with the advantage accumulating as the message grows. Our findings in Fig. 4 confirm that the only existing MFC with $s > 1$, ForkSkinny [ALP+19b, ALP+19a], yields a more efficient encryption scheme than its TBC counterpart (with identical round and tweakey functions), SKINNY [BJK+16] ($s = 1$), when plugged into GCTR. For example, ForkSkinny in any of our GCTR modes achieves an efficiency improvement of over 20% over SKINNY in GCTR modes for the same tweak and nonce sizes.

Preliminaries
All strings used in this paper are binary strings. Strings of length $n > 0$ are referred to as $n$-bit strings. The set of all $n$-bit strings is denoted as $\{0,1\}^n$. The set of all non-empty sequences of $n$-bit strings is denoted by $(\{0,1\}^n)^+$. We denote the set of all permutations of $\{0,1\}^n$ by $\mathrm{Perm}(n)$. For any string $A$, $|A|$ represents the bit-length of $A$ and $\mathrm{trunc}_c(A)$ represents the string defined by the first $c$ bits of $A$. For a set $S$, the notation $2^S$ denotes the power set of $S$ and $|S|$ denotes the size of $S$. For any real number $r$, $\lceil r \rceil$ denotes the least integer which is greater than or equal to $r$ (the ceiling function). We denote any vector $B$ with components $B_1, B_2, \ldots, B_i$ as $\langle B_1, B_2, \ldots, B_i \rangle$. For any two numbers $a$ and $b$, $a \cdot b$ or $ab$ represents their product.
Given a string $A$ and an integer $n$ with $|A| = cn + d$ for some $0 < d \le n$, we use the notation $A_1, A_2, \ldots, A_{c+1} \xleftarrow{n} A$ to represent the partitioning of $A$ into a maximum number of $n$-bit blocks, such that $|A_i| = n$ for $1 \le i \le c$ and $|A_{c+1}| = d$. The symbol $\bot$ represents an undefined value or error. We let $r \leftarrow_{\$} R$ denote the random sampling of an element $r$ from a finite set $R$ under the uniform distribution. We let $\mathbb{N}$ denote the set of natural numbers.
The notation $(p)_q$ denotes the falling factorial $p \cdot (p - 1) \cdot (p - 2) \cdots (p - q + 1)$, where $(p)_0 = 1$. A predicate $P(x)$ is defined as $P(x) = 1$ if it is true and $P(x) = 0$ if it is false. All comparisons of integer tuples in this work are lexicographic (for example, $(i', j') < (i, j)$ iff $i' < i$, or $i' = i$ and $j' < j$).
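The notation above maps directly onto a few small helpers. The following is a minimal sketch in which bit strings are modeled as Python strings of '0' and '1' characters; the function names are chosen for this illustration only.

```python
def partition(a: str, n: int) -> list[str]:
    """Split a non-empty bit string A into n-bit blocks; the last block has 1..n bits."""
    if not a:
        raise ValueError("A must be non-empty (|A| = cn + d with 0 < d <= n)")
    return [a[i:i + n] for i in range(0, len(a), n)]


def trunc(a: str, c: int) -> str:
    """trunc_c(A): the first c bits of A."""
    return a[:c]


def falling_factorial(p: int, q: int) -> int:
    """(p)_q = p (p-1) ... (p-q+1), with (p)_0 = 1 (empty product)."""
    result = 1
    for i in range(q):
        result *= p - i
    return result
```

Python's built-in tuple comparison is already lexicographic, so `(i2, j2) < (i, j)` on integer tuples matches the ordering convention used here.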

Nonce- and IV-based Encryption
We target the syntax and security of nonce- and IV-based encryption schemes (nivE) [PS16]. An nivE scheme is a tuple $\Pi = (\mathcal{K}, E, D)$ where $\mathcal{K}$ is a key distribution (typically a finite key space with uniform distribution), $E : \mathcal{K} \times \mathcal{N} \times \mathcal{R} \times \mathcal{M} \to \{0,1\}^*$ is the deterministic encryption algorithm, and $D : \mathcal{K} \times \mathcal{N} \times \mathcal{R} \times \{0,1\}^* \to \mathcal{M}$ is the deterministic decryption algorithm, with $\mathcal{N}$ and $\mathcal{R}$ representing the sets of nonces and IVs, respectively. The encryption algorithm maps a key $K$, a nonce $N$, an IV $R$ and a message $M$ with $(K, N, R, M) \in \mathcal{K} \times \mathcal{N} \times \mathcal{R} \times \mathcal{M}$ to a ciphertext $C = E(K, N, R, M)$, and the decryption algorithm maps a key, a nonce, an IV and a ciphertext to a message $M = D(K, N, R, C)$.
We require that $D(K, N, R, E(K, N, R, M)) = M$ for all $(K, N, R, M) \in \mathcal{K} \times \mathcal{N} \times \mathcal{R} \times \mathcal{M}$ and we assume that both $E$ and $D$ return $\bot$ if any of the inputs is not in its intended domain. In this paper, we further require that $\mathcal{R}$ is a finite set, that $\mathcal{M} \subset \{0,1\}^*$ is such that $M \in \mathcal{M}$ and $|M| = m$ imply $\{0,1\}^m \subseteq \mathcal{M}$, and that $|E(K, N, R, M)| = |M|$. We let $E^{\$} : \mathcal{K} \times \mathcal{N} \times \mathcal{M} \to \mathcal{R} \times \{0,1\}^*$ denote the randomized encryption algorithm, which internally samples an $R \leftarrow_{\$} \mathcal{R}$ with uniform distribution, computes $C \leftarrow E(K, N, R, M)$ and returns $R, C$. We further let .
nivE Security and ivE Security. We define the security of nivE through indistinguishability of ciphertexts from random strings in a chosen plaintext attack. More precisely, given an nivE scheme $\Pi$ and a nonce-respecting (i.e., using a fresh nonce for each encryption query) adversary $A$, we define $A$'s advantage at breaking the security of $\Pi$, denoted $\mathbf{Adv}^{\mathrm{nivE}}_{\Pi}(A)$, as the difference between the probability that $A$ outputs 1 when interacting with $E^{\$}$ under a key drawn from $\mathcal{K}$ and the probability that $A$ outputs 1 when interacting with an oracle $\$$ that internally samples an $R \leftarrow_{\$} \mathcal{R}$ and returns $R$ with an independent random string of $|M|$ bits upon every query. If $A$ is not nonce-respecting (may reuse a nonce), we define its advantage with the same experiment, but denote it as $\mathbf{Adv}^{\mathrm{ivE}}_{\Pi}$.
Relation to Nonce-based and IV-based Encryption. The syntax and notion of nivE schemes capture both nonce-based encryption (with $\mathcal{R} = \{\varepsilon\}$, which makes $E^{\$}$ deterministic) and random initialization vector-based encryption (with $\mathcal{N} = \{\varepsilon\}$). Beyond these two basic types of encryption, nivE also captures a generalized type of symmetric encryption that uses a nonce and an IV simultaneously, previously shown useful to achieve high security levels [PS16].
On the use of the nivE notion and schemes. At first glance, the reader may question the usefulness of "true" nivE schemes. In practice, an encryption scheme requiring both a nonce and an IV to be transmitted would indeed not be the first choice for mainstream applications. There are nevertheless scenarios where nivE schemes are useful as-is. For example, in an encryption-only scenario where two-pass processing is unacceptable and nonce-misuse resistance is desirable (e.g., a microcontroller with an embedded TRNG and constrained RAM, streaming sensor data and having a reset-related possibility of nonce repetition), a nonce-IV scheme seems the only viable option.
In addition, nivE schemes are useful building blocks for higher-level constructions, where the nonce, the IV, or both can be implicit, as exemplified by SCT [PS16] (AEAD) and HCTR [DN18] (enciphering). Generally speaking, the benefits of nivE can be leveraged wherever an implicit nonce exists or some form of a synthetic IV can be computed. We conjecture that this is the case in many constructions and communication protocols (for instance TLS, where each frame has a sequence number). Our results can then be used as a black box in the analyses of such constructions, as seen in the example of HCTR.
Finally, the nivE definition is an umbrella notion, that captures nonce-based, IV-based and nonce-IV-based symmetric encryption schemes. This lets us characterize and study interesting constructions of all three types simultaneously, dispensing with the need for a dedicated treatment for each type.

Coefficient-H Technique
The coefficient-H technique is a simple but powerful proof technique by Patarin [Pat09] which is often used to prove indistinguishability of a given construction from an idealized object by an information-theoretic adversary. The coefficient-H technique characterizes an indistinguishability experiment, in which an information-theoretic adversary $A$ tries to distinguish two sets of oracles $O_{\mathrm{real}}$ (the "real world") and $O_{\mathrm{ideal}}$ (the "ideal world"), in the form of transcripts. A transcript is defined as a complete record of the interaction of an adversary $A$ with its oracles. For example, if $A$ has a single oracle, $(M_i, C_i)$ represents the input and output of the $i$-th query to this oracle, and $q$ is the total number of queries made by $A$, then the corresponding transcript (denoted by $\tau$) is defined as $\tau = \langle (M_1, C_1), \ldots, (M_q, C_q) \rangle$. The goal of $A$ here is to distinguish interactions in the real world $O_{\mathrm{real}}$ from the ones in the ideal world $O_{\mathrm{ideal}}$.
Let us denote the distribution of transcripts in the real and in the ideal world by $\Theta_{\mathrm{real}}$ and $\Theta_{\mathrm{ideal}}$, respectively. We call a transcript $\tau$ attainable if the probability of achieving $\tau$ in the ideal world is non-zero. Further, we also assume w.l.o.g. that $A$ does not make any duplicate or prohibited queries. We can now state the fundamental lemma of the coefficient-H technique. We refer the reader to an excellent tutorial on the coefficient-H technique by Chen and Steinberger [CS14].
Lemma 1 (Fundamental Lemma of the Coefficient-H Technique [Pat09]). Consider that the set of attainable transcripts is partitioned into two disjoint sets $\mathcal{T}_{\mathrm{good}}$ and $\mathcal{T}_{\mathrm{bad}}$. Also, assume there exist $\epsilon_1, \epsilon_2 \ge 0$ such that for any transcript $\tau \in \mathcal{T}_{\mathrm{good}}$, we have Pr
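For reference, the fundamental lemma is usually stated in the following form (see, e.g., the tutorial [CS14]). This is the standard textbook formulation written with the symbols of this section; the exact typesetting and the shape of the advantage expression are assumptions of this sketch.

```latex
\[
\frac{\Pr[\Theta_{\mathrm{real}} = \tau]}{\Pr[\Theta_{\mathrm{ideal}} = \tau]} \;\ge\; 1 - \epsilon_1
\quad \text{for all } \tau \in \mathcal{T}_{\mathrm{good}},
\qquad
\Pr[\Theta_{\mathrm{ideal}} \in \mathcal{T}_{\mathrm{bad}}] \;\le\; \epsilon_2 .
\]
\[
\text{Then, for every adversary } A: \quad
\Bigl| \Pr\bigl[A^{O_{\mathrm{real}}} \Rightarrow 1\bigr] - \Pr\bigl[A^{O_{\mathrm{ideal}}} \Rightarrow 1\bigr] \Bigr|
\;\le\; \epsilon_1 + \epsilon_2 .
\]
```

Intuitively, $\epsilon_2$ accounts for the bad transcripts that the ideal world may produce, while $\epsilon_1$ bounds how much less likely any good transcript is in the real world than in the ideal one.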

Multi-Fork Cipher (MFC)
In this section, we define the syntax and security of a symmetric-key primitive we name multi-forkcipher (MFC). MFC generalizes the primitive called forkcipher [ALP+19b]. Informally, a forkcipher takes as input a secret key, a public tweak and an input block, and evaluates two independent permutations of the input block at the same time. An MFC generalizes this concept to an arbitrary (but fixed) number of encryption branches (i.e., an arbitrary but fixed number of output blocks). More precisely, a multi-forkcipher takes a secret key, a public tweak, and an $n$-bit plaintext block as input and produces $s$ $n$-bit output blocks. Additionally, the input $X$ should be computable backwards from any of the output blocks, and any of the output blocks should be reconstructible from any other output block.
Ideal MFC. With a random key as input, an ideal $s$-MFC implements an $s$-tuple of independent random permutations $\pi_{T,1}, \pi_{T,2}, \ldots, \pi_{T,s}$ for every tweak $T$, which for an input $X$ and a set $\alpha \subseteq \{1, 2, \ldots, s\}$ provides $\{v_i \mid v_i = \pi_{T,i}(X) \text{ for } i \in \alpha\}$, i.e., $|\alpha|$ many indexed but independent outputs. We define a secure multi-forkcipher to be computationally indistinguishable from such an ideal MFC. The ideal MFC is equivalent to a tuple of $s$ ideal tweakable block ciphers used in parallel.
Note that this formalism captures conventional TBCs (MFC with a single output block), the original forkcipher, and any future generalized constructions with three or more branches. Results using the notion of MFC then have the advantage of being automatically applicable to instantiations based on any of the aforementioned primitives. Furthermore, forking primitives that can output a practically unlimited number of branches (akin to Farfalle [BDH+17]) can be viewed as an MFC with the (maximum) number of branches set to their operational (or security) limits.

Syntax
A multi-forkcipher $F_s$ is a pair of deterministic algorithms, the forward algorithm $F_s$ and the backward (or inverse) algorithm $F_s^{-1}$. The forward algorithm $F_s$ takes a key $K$, a tweak $T$, an input block $X$ and an output selector set $\alpha$. It then outputs the output blocks $Y_{a_1}, \ldots, Y_{a_z}$ indicated by the output selector set $\alpha = \{a_1, a_2, \ldots, a_z\}$ with $a_1 < a_2 < \ldots < a_z$. Similarly, the backward algorithm $F_s^{-1}$ takes a key $K$, a tweak $T$, a block $Y$, an input indicator $\beta$ and an output selector set $\alpha$. It then outputs the blocks $Y_{a_1}, \ldots, Y_{a_z}$ indicated by $\alpha = \{a_1, a_2, \ldots, a_z\}$. If $a_1 = i$ then the first block is $X$ (the corresponding input block of $F_s$) and $a_2 < \ldots < a_z$, otherwise $a_1 < a_2 < \ldots < a_z$. We call $k$, $n$ and $t$ the keysize, inputsize and tweaksize of $F_s$, respectively.
A multi-forkcipher is said to be correct if for every $K \in \{0,1\}^k$, $T \in \{0,1\}^t$, $X, Y \in \{0,1\}^n$ and $\beta \in \{1, 2, \ldots, s\}$ it satisfies the following conditions: (1) decrypting a ciphertext block with the same key, the same tweak and the same output index gives the correct plaintext; (2) for every output index $\alpha \in \{1, 2, \ldots, s\} \setminus \{\beta\}$, fixing the key and the tweak and given a ciphertext block produced with output index $\beta$, reconstructing the ciphertext block for output index $\alpha$ always gives the same value as encrypting the same plaintext directly with the output index $\alpha$; (3) for each set of output indexes in $2^{\{1, 2, \ldots, s\}}$, fixing the key and the tweak, encrypting a plaintext with this set of output indexes always produces the same output blocks as encrypting the same plaintext with each of the output indexes individually; (4) for each set $\{a_1, a_2, \ldots, a_z\} \in 2^{\{i, 1, 2, \ldots, s\} \setminus \{\beta\}}$, fixing the key and the tweak and given a ciphertext block, reconstructing/decrypting with this set of output indexes always produces the same output blocks as reconstructing/decrypting the same ciphertext block with each of the output indexes individually.
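The syntax and the correctness conditions can be exercised on a toy model. The sketch below assumes an ideal-MFC-style object with a tiny block size ($n = 8$) in which each tweak lazily receives $s$ independent random permutations; the key is implicit in the random seed. The class name, the use of `0` as a label for the input indicator $i$, and the parameter values are all assumptions of this illustration, not part of any real forkcipher.

```python
import random

N_BITS = 8   # toy block size n
S = 3        # number of branches s
INPUT = 0    # hypothetical label standing in for the input indicator "i"


class ToyIdealMFC:
    """Lazily samples s independent random permutations per tweak."""

    def __init__(self, seed: int = 0):
        self._rng = random.Random(seed)
        self._perms = {}  # tweak -> list of (forward table, inverse table)

    def _tables(self, tweak):
        if tweak not in self._perms:
            tables = []
            for _ in range(S):
                fwd = list(range(2 ** N_BITS))
                self._rng.shuffle(fwd)
                inv = [0] * len(fwd)
                for x, y in enumerate(fwd):
                    inv[y] = x
                tables.append((fwd, inv))
            self._perms[tweak] = tables
        return self._perms[tweak]

    def forward(self, tweak, x, alpha):
        """Return the blocks Y_a for a in alpha, in increasing index order."""
        tables = self._tables(tweak)
        return [tables[a - 1][0][x] for a in sorted(alpha)]

    def backward(self, tweak, y, beta, alpha):
        """Given Y on branch beta, return X (if requested) followed by the other branches."""
        tables = self._tables(tweak)
        x = tables[beta - 1][1][y]
        out = [x] if INPUT in alpha else []
        out += [tables[a - 1][0][x] for a in sorted(alpha) if a != INPUT]
        return out
```

With this model, inverting branch $\beta$ recovers $X$, reconstructing branch $\alpha$ from branch $\beta$ agrees with computing branch $\alpha$ directly, and multi-index calls agree with the corresponding single-index calls, mirroring conditions (1) to (4).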

Security of MFC
We define the security of a multi-forkcipher with the help of the security games prtmfp-real and prtmfp-ideal in Fig. 1. An adversary $A$ who wants to break the multi-forkcipher $F_s$ plays the game prtmfp-real or prtmfp-ideal. In either game, $A$ makes $q$ queries in total. The oracle either processes the inputs with the real $F_s$ used with a random key, or with a random "multi-forked permutation" $P$. A multi-forked permutation is an $s$-tuple of tweakable permutations such that these $s$ permutations are always used with the same plaintext block (even when queried in the backward direction). The selection of the tweakable permutations to be applied is based on the selector $\alpha$ in the natural way. We define the advantage of $A$ at distinguishing $F_s$ from a random multi-forked permutation $P$ of $|\alpha| \cdot n$ bits in a chosen ciphertext attack as $\mathbf{Adv}^{\mathrm{prtmfp}}_{F_s}(A) = \Pr[A^{\text{prtmfp-real}_{F_s}} \Rightarrow 1] - \Pr[A^{\text{prtmfp-ideal}_{F_s}} \Rightarrow 1]$.
We will use the shorthand $[s]$ to denote the set $\{1, 2, \ldots, s\}$. In the rest of this paper, we only use the forward direction of an MFC, with $\alpha = [s]$. Thus, we fix $\alpha = [s]$, drop the output selector from the input list, and use the notation $F_s(K, \cdot, \cdot) = F_{s,K}(\cdot, \cdot) = F_{s,K}(\cdot, \cdot, [s])$. One can see this $F$ as a multi-forkcipher with $\alpha$ hardwired to "all".

MFC vs TPRI.
An $n$-bit MFC with $s$ branches syntactically resembles an $n$-bit-input tweakable pseudorandom injection (TPRI) with $sn$-bit output, yet the two differ in their probability distributions. While a TPRI simply samples $sn$-bit images without replacement, the MFC concatenates the outputs of $s$ random permutations, resulting in a birthday gap between the two objects.
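One way to see the birthday gap is to look only at the first $n$-bit block of each output under a fixed tweak. For an MFC this block is a permutation of the input and therefore never repeats across distinct inputs, whereas for a random injection into $sn$-bit strings the first blocks behave like nearly independent $n$-bit values and start colliding after roughly $2^{n/2}$ queries. The following toy experiment with $n = 8$ and $s = 2$ illustrates this; the function names and parameters are assumptions of this sketch.

```python
import random


def first_block_collides_tpri(n_bits: int, rng: random.Random) -> bool:
    """Sample an injection {0,1}^n -> {0,1}^{2n} on all inputs and
    report whether two images share the same first n-bit block."""
    images = rng.sample(range(2 ** (2 * n_bits)), 2 ** n_bits)  # distinct 2n-bit images
    first_blocks = [y >> n_bits for y in images]
    return len(set(first_blocks)) < len(first_blocks)


def first_block_collides_mfc(n_bits: int, rng: random.Random) -> bool:
    """Ideal two-branch MFC under one tweak: the first block is pi_1(X),
    a random permutation of the input, so it cannot collide."""
    pi_1 = list(range(2 ** n_bits))
    rng.shuffle(pi_1)
    return len(set(pi_1)) < len(pi_1)
```

A distinguisher that outputs 1 whenever it observes such a first-block collision never fires against the MFC, while against the TPRI it fires with probability growing on the order of $q^2 / 2^n$ in the number of queries $q$.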

MFC-based CTR Mode and its Variants
The Counter (CTR) [LRW00] mode of operation has been considered one of the best choices among the set of block cipher modes for message confidentiality. The inverse-freeness and parallelism of the original CTR mode are simple but very powerful in confidentiality-only protocols. Yet, the classical CTR mode provides only $n/2$-bit security when used with an $n$-bit block cipher and fails completely in the face of nonce reuse (in the cases where the IV is implemented as a nonce). In this section, we define a generic CTR (GCTR) with the same design properties as the original CTR construction but aiming to achieve higher security levels in the spirit of the recent tweakable block-cipher-based CTRT mode [PS16] and aiming for the nivE security notion.
GCTR implements an nivE encryption scheme that uses an MFC as a lower-level primitive and similarly to CTRT takes as input both a nonce and a random value, as opposed to the classical CTR mode's single IV value. We then show that at the cost of an additional input we obtain encryption schemes with significantly improved security. More precisely, we present and investigate the security of 36 concrete instances which are distinguished by the way a nonce N , an IV (a random value R) and a counter j are combined as inputs to the MFC primitive. Our research is exhaustive regarding the simplest of operations (XOR, copy or concatenation), covering the classical CTR as well. Furthermore, via the abstraction of MFC we incorporate and enable the comparison of security and efficiency features of tweakable primitives with variable output sizes.

Generic CTR
There are many possible ways to define a CTR-like mode with an MFC, depending on how the MFC tweak and plaintext inputs are derived from a nonce $N$, a block counter $j$, and a random IV value denoted by $R$. We formally capture the space of the MFC-based CTR-like modes through the generic CTR mode (GCTR), which uses placeholder functions $f_T(N, R, j) \to T$ and $f_X(N, R, j) \to X$ to compute the tweak $T$ and the MFC plaintext $X$. An instance, or GCTR variant, is obtained by fixing those functions.
For a fixed multi-forkcipher $F_s$ and fixed functions $f_T$ and $f_X$, the GCTR scheme is defined by the encryption and decryption algorithms in Fig. 2. The exact values of $\nu$, $r$ and $\ell_{\max}$ depend on the concrete instantiation (see Table 2). We also use the shorthand GCTR, leaving $F_s$, $f_T$ and $f_X$ implicit.
Similar to the conventional CTR mode, GCTR is inverse-free, i.e., the inverse direction of the underlying multi-forkcipher $F_s$ is never used. We note that the security of the GCTR mode depends not only on the security of the underlying multi-forkcipher but also on the functions $f_X$ and $f_T$ that compute the MFC inputs and tweaks. In the next section, we exhaustively investigate a well-defined subset of the GCTR variants' space.
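The structure of GCTR can be sketched as follows. This is an illustration under several assumptions: `toy_mfc` is a hypothetical forward-only stand-in (truncated SHA-256, not a real forkcipher such as ForkSkinny), the block and tweak sizes are fixed to 128 bits, and the order in which the $s$ output blocks of each call are appended to the key stream is chosen here for simplicity rather than taken from Fig. 2. The example generator functions use the CTRT-style choice $T = R \oplus j$, $X = N$, which the text identifies with GCTR-3.

```python
import hashlib

N_BYTES = 16  # block size n = 128 bits
T_BYTES = 16  # tweak size t = 128 bits


def toy_mfc(key: bytes, tweak: bytes, x: bytes, s: int) -> list[bytes]:
    """Hypothetical forward-only stand-in for F_{s,K}(T, X): s pseudorandom n-bit blocks."""
    return [
        hashlib.sha256(bytes([i]) + key + tweak + x).digest()[:N_BYTES]
        for i in range(1, s + 1)
    ]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(p ^ q for p, q in zip(a, b))


def gctr_xcrypt(key, nonce, r, data, f_t, f_x, s=2):
    """Generic CTR: call j contributes s key-stream blocks F_s(f_T(N,R,j), f_X(N,R,j))."""
    keystream = bytearray()
    j = 1
    while len(keystream) < len(data):
        blocks = toy_mfc(key, f_t(nonce, r, j), f_x(nonce, r, j), s)
        keystream += b"".join(blocks)
        j += 1
    return xor_bytes(data, bytes(keystream))


# CTRT-style generator functions: T = R xor <j>, X = N.
def f_t_ctrt(nonce, r, j):
    return xor_bytes(r, j.to_bytes(T_BYTES, "big"))


def f_x_ctrt(nonce, r, j):
    return nonce
```

As with classical CTR, the same routine serves for encryption and decryption, and only the forward direction of the primitive is used. With $s = 2$, a message of $m$ blocks needs about $m/2$ primitive calls, which is where the efficiency argument for MFCs with $s \ge 2$ comes from.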

GCTR Variants: CTR Mode of Encryption using MFC
The space of all possible GCTR instances is huge (there are $2^{(t+n) \cdot (2^{\nu} \cdot 2^{r} \cdot j_{\max})}$ of them, with $\nu = |N|$, $r = |IV|$ and $j_{\max}$ being the maximal allowed counter value) but only a significantly smaller subset of those is of practical interest. The main criterion is the computational complexity of the functions $f_X$ and $f_T$; they must be efficiently computable for the instance to make any sense. In this section, we exhaustively investigate the set of arguably most efficient GCTR variants, with $f_X$ and $f_T$ defined using the simplest operations.
Simple variants. The class of GCTR variants we investigate is what we call "simple and natural". The class is induced by imposing the following restrictions on the functions $f_T$, $f_X$:
• Simple operations: $f_T(N, R, j)$ (resp. $f_X(N, R, j)$) can only be (1) a concatenation of two out of the three input arguments, or (2) an XOR of two out of the three input arguments, or (3) a simple copy of one of the three input arguments, or (4) a constant function independent of the input arguments.
• No argument reuse: No input argument can be used by $f_T$ and $f_X$ at the same time (e.g., if $f_T = N \oplus R$ then $f_X = N \| j$ is invalid due to the use of $N$).
We put no restrictions on the integer parameters $\nu$, $r$, $j_{\max}$ defining the domains $\{0,1\}^{\nu}$, $\{0,1\}^{r}$ and $\{1, 2, \ldots, j_{\max}\}$ of, respectively, the nonce $N$, the random IV $R$ and the counter $j$, as long as the functions $f_X$ and $f_T$ are well-defined. For example, for $f_X = N \oplus R$ we must have $\nu = r = n$, while for $f_T = N \| j$ we must have $\nu < t$ and $j_{\max} = 2^{t-\nu} - 1$. Note that we assume that for an evaluation of the functions $f_T$ and $f_X$ the counter is suitably encoded as a fixed-size binary string $\langle j \rangle$. The restriction to simple operations leaves 10 choices for each of $f_T$ and $f_X$ (three possibilities for the XOR, three for the concatenation, three for the copy, plus the constant function), yielding a set of 100 GCTR variants. Further filtering this set by prohibiting the reuse of arguments leaves 36 variants:
• for a constant $f_T$, we are free to use any of the nine non-trivial possibilities for $f_X$ (9 variants in total);
• for an $f_T$ that is a copy of one of the three input arguments, $f_X$ can be a binary operation of the remaining arguments, a copy of one of the remaining arguments, or a constant function (15 variants in total);
• for an $f_T$ that is a binary operation, $f_X$ can be a copy of the remaining argument or a constant function (12 variants in total).
The individual 36 variants are listed in the remaining paragraphs of this section.
Trivially insecure variants. As the first step of our investigation, we immediately identify three sets of trivially insecure simple variants:
• Counter only: If one of the functions $f_T$ and $f_X$ is a copy of the counter $j$, and the other is a constant $\gamma$, the GCTR variant is trivially insecure, as the key stream blocks it generates repeat in each query. This set consists of variants 33 and 34 in Table 1.
• No counter: If none of the functions $f_T$ and $f_X$ uses the counter $j$, the GCTR variant is trivially insecure, as all key stream blocks in a query have the same value. This set consists of variants 23 to 32 in Table 1.
• Nonce XORed with counter only: If one of the functions $f_T$ and $f_X$ is $N \oplus \langle j \rangle$, and the other is a constant $\gamma$, the GCTR variant is trivially insecure, as the key stream blocks it generates will always have repetitions among queries whenever the adversary chooses nonces ensuring that some of the $N \oplus \langle j \rangle$ inputs coincide among these queries. Note that such repetitions of block outputs are unavoidable here even when we restrict the adversary to be nonce-respecting. This set consists of variants 35 and 36 in Table 1.
We refer the reader to the full version of the paper (see Appendix C) for a more formal treatment of nivE attacks on the trivially insecure variants.
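The counting argument of this section can be checked by brute-force enumeration. The following Python sketch is only an illustration: the tuple encoding of the functions and all helper names are assumptions made here, not notation from the paper. It enumerates the 10 simple choices for each of $f_T$ and $f_X$, applies the no-reuse filter, and then removes the three trivially insecure sets described above.

```python
from itertools import combinations, product

ARGS = ("N", "R", "j")
CONST = ("const", frozenset())  # the constant function gamma (uses no argument)

def simple_functions():
    """The 10 simple choices: constant, 3 copies, 3 xors, 3 concatenations."""
    funcs = [CONST]
    funcs += [("copy", frozenset({a})) for a in ARGS]
    for op in ("xor", "concat"):
        funcs += [(op, frozenset(pair)) for pair in combinations(ARGS, 2)]
    return funcs

def variants():
    """All (f_T, f_X) pairs that reuse no argument and are not both constant."""
    fs = simple_functions()
    return [(ft, fx) for ft, fx in product(fs, fs)
            if not (ft[1] & fx[1]) and (ft[1] | fx[1])]

def is_trivially_insecure(ft, fx):
    """Membership in one of the three trivially insecure sets."""
    if "j" not in (ft[1] | fx[1]):
        return True                      # "no counter"
    if CONST in (ft, fx):
        other = fx if ft == CONST else ft
        if other == ("copy", frozenset({"j"})):
            return True                  # "counter only"
        if other == ("xor", frozenset({"N", "j"})):
            return True                  # "nonce XORed with counter only"
    return False

all_variants = variants()
insecure = [v for v in all_variants if is_trivially_insecure(*v)]
remaining = len(all_variants) - len(insecure)
print(len(simple_functions()) ** 2, len(all_variants), len(insecure), remaining)
```

The sketch treats $N \| \langle j \rangle$ and $\langle j \rangle \| N$ as the same concatenation, matching the count of three concatenations used in the text.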
Interesting variants. We investigate the 22 variants that remain after the previous filtering. All of them are named and listed in Table 2 as GCTR-1 to GCTR-22. We give a formal statement of security for these 22 secure GCTR variants and support it with security proofs. The formal claim is stated in Theorem 1 (see Table 2 for the definition of adversarial resources). Its bound, Eqn. (1), holds for some adversary B who makes at most $\sigma$ queries and runs in time given by the running time of A plus $\gamma_0 \cdot \sigma$ for some constant $\gamma_0$. Here deg represents the corresponding degradation in the (n)ivE security of the variant GCTR-z as given in Table 2 for all values of z.

Table 2: GCTR variants for constructing a nivE scheme using a multi-forkcipher. The columns "$f_T$" and "$f_X$" respectively show the computation of the $j$-th $t$-bit tweak and $n$-bit plaintext input to the MFC $F_s$. Here, $q$ and $\sigma$ are the total number of plaintext queries and MFC calls, respectively, $R$ is an $r$-bit random value, $N$ is a $\nu$-bit nonce, $j$ is a counter, $\gamma$ is a constant, $\langle j \rangle$ is a constant-size encoding of an integer $j$, $\ell$ is the maximum of the query lengths $\lceil \ell_i / s \rceil$ with $1 \le i \le q$, and $1 \le x \le q$ is the upper bound on the number of reuses (repetitions) of any nonce $N_i$ ($x = 1$ means no nonce repeats). The column "$\ell_{\max}$" contains the maximum number of possible $n$-bit blocks in a query.
We defer the proof of Theorem 1 to Sect. 6.2.

Discussion
In this section, we give an interpretation of the bounds in Theorem 1. We then discuss the performance benefits that can be gained from MFC-based GCTR.

Security
(Mis)use of Nonce and IV. The GCTR variants 1-18, which use the random IV R as an input, remain secure under nonce reuse. Some of these do not use the nonce N at all, making the nivE and ivE bounds equal (variants 11-16). Most of the variants using both nonce and random IV have a better nivE bound than ivE bound. The exceptions are the GCTR variants 9, 10, 17 and 18: despite having both N and R as inputs, these variants use the nonce N XORed with either R or the counter, which negates the benefits of the nonce.
(Beyond) Birthday-Secure Variants. The classical CTR mode, a.k.a. GCTR-20, is among our 22 secure variants. Interestingly, the bounds of all other 21 variants are superior to that of the CTR mode (variant 20), which becomes void at $\approx 2^{n/2}$ processed blocks. More specifically, all of these (n)ivE bounds are dominated by a quadratic term but, unlike for the CTR mode, this term is not quadratic in the number of blocks $\sigma$ alone and involves $q$ or $\ell$ as well (see Table 2). Recall that we (informally) consider a GCTR variant beyond birthday bound (BBB) secure when its security bound does not become void around $2^{n/2}$ queried blocks. With this definition, 6 of the 22 variants are BBB-secure, namely variants 1, 3, 7, 11, 13 and 19.
Our pick. The variants GCTR-3 and GCTR-7 are the best two modes in terms of security. Table 2 shows that, out of all 22 modes, GCTR-7 achieves the best quantitative security for $x \le 1 + sq/D$, where $D = \frac{s\sigma + q}{2\sigma - q}\, 2^{t} - 2^{n}$, whereas for $x > 1 + sq/D$, GCTR-3 provides the best security. In other words, in the nonce-respecting case GCTR-7 is the best choice, whereas in the general nonce-misuse case GCTR-3 with $t = 2n$ is the best choice (in practical cases $\sigma$ and $q$ are upper bounded by $2^{n}$). The same can also be verified in Fig. 3 (plot F), which shows the security degradation of these variants with an increasing number of nonce repetitions. We further illustrate the security gap between GCTR-3, GCTR-7 and the classical CTR mode for a fixed input size of $n = 128$ in Fig. 3 (plots A to C). For simplicity, $q$ is replaced by its worst-case value, i.e. $q = \sigma$. Further, in Fig. 3 (B, D and E), we give the advantage for more realistic values of $q$ to illustrate that the degradation slope is preserved also for those values of $q$. We note that the advantage slopes for other choices of $n$ have the same shape.

From Fig. 3, we infer for $s = 2$:
1. GCTR-3 with $t = 2n$ provides $\approx n$-bit security against all types of adversaries.
2. GCTR-3 with $t = n$ provides $\approx n$-bit security against nonce-respecting adversaries and BBB-security against nonce-misusing adversaries (with $x \ll \sigma$ nonce repetitions).
3. Even though GCTR-3 with $t = n$ and GCTR-7 both provide $\approx n$-bit security against nonce-respecting adversaries, GCTR-7 has a comparatively slower (i.e. better) security degradation with increasing $\sigma$. For example, a nonce-respecting adversary who wants to achieve an advantage of $2^{-120}$ (or more) requires at least $2^{128}$ 128-bit encrypted blocks against GCTR-7, whereas achieving the same advantage against GCTR-3 with $t = n$ only requires $2^{68}$ 128-bit encrypted blocks.
GCTR Modes and CTRT. In our GCTR framework, CTRT coincides with the variant GCTR-3, while GCTR-4, GCTR-7 and GCTR-17 are merely mentioned in [PS16] as other possible secure variants. The existing instantiation Deoxys-II [JNPS16] of CTRT is the same as the GCTR$_1$-3 mode (with $t = n$) with the TBC Deoxys-BC [JNPS16]. In [PS16], CTRT is shown to be BBB-secure with the (n)ivE degradation bound $\frac{2\sigma(x-1)}{2^{t}} + \frac{\sigma^{2}}{2^{n+t+1}}$. In this work, we improve this CTRT security bound with the updated degradation $\frac{(2\sigma - q)(x-1)}{2^{t+1}}$. This improved bound is of practical relevance and strengthens the security of CTRT in cases where the average message length is longer.
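To get a feeling for the two nonce-misuse terms quoted above, the following Python sketch evaluates them with exact rational arithmetic. The function names and the sample parameters are assumptions chosen for illustration only, and only the $x$-dependent terms as they appear in the text are compared; any further terms of the improved bound are not modelled here.

```python
from fractions import Fraction

def ps16_degradation(sigma, x, n, t):
    """Degradation quoted from [PS16]: 2*sigma*(x-1)/2^t + sigma^2/2^(n+t+1)."""
    return (Fraction(2 * sigma * (x - 1), 2 ** t)
            + Fraction(sigma ** 2, 2 ** (n + t + 1)))

def ps16_misuse_term(sigma, x, t):
    """The x-dependent term of the [PS16] bound."""
    return Fraction(2 * sigma * (x - 1), 2 ** t)

def improved_misuse_term(sigma, q, x, t):
    """The x-dependent term quoted for the improved bound."""
    return Fraction((2 * sigma - q) * (x - 1), 2 ** (t + 1))

# Hypothetical parameters: 2^40 MFC calls spread over 2^20 queries,
# each nonce used at most 4 times, n = t = 128.
n = t = 128
sigma, q, x = 2 ** 40, 2 ** 20, 4
ratio = ps16_misuse_term(sigma, x, t) / improved_misuse_term(sigma, q, x, t)
print(float(ratio))
```

For any $1 \le q \le \sigma$ and $x > 1$, the ratio of the two terms equals $4\sigma / (2\sigma - q)$ and therefore lies in the interval $(2, 4]$.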
Revisiting THCTR. Tweakable HCTR (THCTR) was proposed as a tweakable VIL enciphering scheme that turns an n-bit tweakable block cipher to a variable input length tweakable block cipher [DN18]. It uses a CTR-like encryption mode as an internal building block. In the original publication, THCTR is claimed to be BBB-secure under the TSPRP notion [DN18] and a security proof is provided to support this claim.
However, upon inspection, the internal CTR-like component of THCTR can be seen to be equal to GCTR-4, for which our own analysis yields only birthday-bound (BB) security (see Table 2). An investigation of this discrepancy revealed that the claim of THCTR's BBB-security under the TSPRP notion is not correct. We give a BB-attack as Prop. 1 (included in App. D), disproving the claimed TSPRP-security beyond the birthday bound. We also point to the exact flaw in the security proof.
With GCTR-3 and GCTR-7 being better alternatives to GCTR-4, as they have gracefully degrading BBB-security under the ivE notion, we recommend replacing the GCTR-4-like component of THCTR by GCTR-3 or GCTR-7 to achieve the desired BBB security.
GCTR in AE schemes. There are known ways to construct AE schemes from encryption and authentication schemes, such as the Encrypt-then-MAC generic composition [BN00] and SCT-style AE [PS16]. We recommend the latter for GCTR, as it allows message-generated pseudo-random IVs and thus reduces the bandwidth as well as avoids the dedicated random sampling of IVs.
We note that the syntax of GCTR is a natural generalization of CTRT only in terms of number of outputs which means that any secure variant of GCTR would yield an AE if combined with an SCT-style overarching scheme.

Efficiency
A thorough performance evaluation of a GCTR instance would of course require a fixed HW setup and concrete implementations, which is out of scope of this paper. Nevertheless, we do provide an estimation of efficiency gain between GCTR, CTRT and basic CTR by comparing the total number of primitive rounds for instances based on ForkSkinny [PARV19]. Since all GCTR variants follow the same MFC-based GCTR framework, it is sufficient to analyze the efficiency of the generic GCTR mode.

Security
This section is dedicated to the security analysis backing up Theorem 1. We first define the Tweakable CTR (TCTR) construction and its security notion. Rather than a full-fledged security notion, TCTR is to be seen as an intermediate abstraction layer, albeit a powerful one. It captures the core security aspects common to all GCTR variants in Lemma 2, thus simplifying the security analysis. We then proceed to prove all the bounds of Theorem 1, relying heavily on the aforementioned lemma.

Tweakable CTR framework
We define the tweakable CTR construction (TCTR), an algorithm that takes a sequence of tweak-input pairs and generates a key stream by applying an MFC to each pair. When paired with a security notion based on the concept of a sequence-builder defined below, TCTR is easily seen to capture what is common to all GCTR variants. We upper-bound the distinguishing advantage for TCTR as a function of the properties of the sequence-builder used, and then apply this result in the analysis of the GCTR variants.
GCTR mode is obtained from the TCTR$_s$ algorithm in a natural way, as shown in Fig. 6. Taking a key $K \in \mathcal{K}$, a nonce $N \in \mathcal{N}$ and a plaintext $M \in \{0,1\}^{*}$, GCTR mode determines the number of components of the input sequence $X$ and the tweak sequence $T$ as $\lceil |M| / (sn) \rceil$, and computes them using the functions $f_X$ and $f_T$. It then uses TCTR$_s$ and outputs a ciphertext $C$.
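The data flow of GCTR on top of TCTR$_s$ can be sketched in a few lines of Python. This is a minimal illustration under several assumptions and not the construction of Fig. 6 itself: `toy_mfc` is a hypothetical hash-based stand-in that merely returns $s$ blocks per call (it is not a forkcipher and not a tweakable permutation), the block size is fixed to 16 bytes, the key stream is truncated to the message length, and the example functions `f_x3` and `f_t3` mimic GCTR-3 (input $N$, tweak $R \oplus \langle j \rangle$).

```python
import hashlib
import os

N_BYTES = 16  # assumed block size n = 128 bits

def toy_mfc(key, tweak, x, s):
    """Hypothetical stand-in for an MFC F_s: returns s blocks of n bits.
    NOT a real forkcipher; for illustrating the data flow only."""
    prefix = key + len(tweak).to_bytes(2, "big") + tweak + x
    return [hashlib.sha256(bytes([p]) + prefix).digest()[:N_BYTES]
            for p in range(s)]

def tctr(key, xs, ts, s):
    """TCTR_s: apply the MFC to each (tweak, input) pair, concatenate outputs."""
    return b"".join(b"".join(toy_mfc(key, t, x, s)) for x, t in zip(xs, ts))

def xor_bytes(a, b):
    return bytes(u ^ v for u, v in zip(a, b))

def gctr_encrypt(key, nonce, msg, s, f_x, f_t, r_bytes, rand=None):
    """Generic GCTR: build the sequences X and T, then mask msg with TCTR_s."""
    r = os.urandom(r_bytes) if rand is None else rand
    calls = -(-len(msg) // (s * N_BYTES))        # ceil(|M| / (s*n))
    xs = [f_x(nonce, r, j) for j in range(1, calls + 1)]
    ts = [f_t(nonce, r, j) for j in range(1, calls + 1)]
    stream = tctr(key, xs, ts, s)
    return r, xor_bytes(msg, stream)             # zip truncates the key stream

def gctr_decrypt(key, nonce, r, ct, s, f_x, f_t):
    return gctr_encrypt(key, nonce, ct, s, f_x, f_t, len(r), rand=r)[1]

# GCTR-3-like choice: input X_j = N, tweak T_j = R xor <j>
def f_x3(nonce, r, j):
    return nonce

def f_t3(nonce, r, j):
    return xor_bytes(r, j.to_bytes(len(r), "big"))
```

With $s = 2$, a 100-byte message needs $\lceil 100 / 32 \rceil = 4$ MFC calls in this sketch, compared to 7 calls of a single-output primitive with the same block size.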

Security of Tweakable CTR.
Defining the security of TCTR as indistinguishability from a random key stream generator, while giving the adversary the ability to directly query the input-and-tweak sequences would not be meaningful, as there are adversaries that would achieve an advantage close to one with constant resources. This also fails to capture how TCTR is used in GCTR.
To address the latter, we define the security of TCTR by slotting a possibly randomized query-builder algorithm s-build $: \{0,1\}^{\nu} \times \mathbb{N}^{+} \to \{0,1\}^{r} \times (\{0,1\}^{n})^{+} \times (\{0,1\}^{t})^{+}$ between an adversary and TCTR. The query builder takes as input an adversarially chosen nonce $N$ and sequence length $\ell$, and outputs random coins $R$ (if used), the sequence of MFC inputs $X \in (\{0,1\}^{n})^{\ell}$ and the sequence of MFC tweaks $T \in (\{0,1\}^{t})^{\ell}$, which are then fed to TCTR to produce the key stream $V$. The adversary gets $R$ and $V$. The algorithm s-build is a parameter of the security games tctr-real and tctr-ideal in Fig. 7; it is fixed throughout the experiment and known to the adversary. The adversary can thus compute all MFC inputs and tweaks.
An adversary A who wants to break the TCTR$_s$ algorithm used in conjunction with a sequence builder s-build plays the games tctr-real and tctr-ideal. A makes oracle queries of the form $(N, \ell)$ as explained above. The oracle returns random coins $R$ (if any) and a string that is either the real TCTR$_s$ output for the inputs queried by A, or a random string of the same length. We define
$$\mathrm{Adv}^{\text{ind-tctr}}_{\mathrm{TCTR}_s,\text{s-build}}(A) = \Pr\left[A^{\text{tctr-real}_{\mathrm{TCTR}_s,\text{s-build}}} \Rightarrow 1\right] - \Pr\left[A^{\text{tctr-ideal}_{\mathrm{TCTR}_s,\text{s-build}}} \Rightarrow 1\right].$$
We say a TCTR$_s$ construction is secure with s-build if the adversarial advantage Adv as described above is "small" for all adversaries with "reasonable resources".
The security notion for TCTR is intuitive. For the GCTR variants from Sect. 4, the algorithm s-build consists of simply sampling the random IV $R$ and applying the functions $f_X$ and $f_T$ in a loop. In Lemma 2 and the corresponding analysis, we express the security of TCTR$_s$, i.e. the adversarial advantage $\mathrm{Adv}^{\text{ind-tctr}}_{\mathrm{TCTR}_s[F_s],\text{s-build}}(A)$, as a function of the properties of s-build and of the security of the MFC $F_s$.
For simplicity, the lemma uses a shorthand $\widetilde{\Pr}$, which is defined as follows. Let $E(a)$ be an event that depends on an integer index $a \ge a_0$, where $a_0$ is a constant. Then
$$\widetilde{\Pr}(E(a)) = \Pr\left(E(a) \wedge \bar{E}(a-1) \wedge \cdots \wedge \bar{E}(a_0)\right) \le \Pr(E(a)).$$
Further, with this notation, it also holds that
$$\Pr\left(E(a) \vee E(a-1) \vee \cdots \vee E(a_0)\right) = \sum_{i=a_0}^{a} \widetilde{\Pr}(E(i)).$$
This equality holds for any ordering of the indices in $[a_0, a]$; however, we stick to the lexicographical ordering. Note that the equality also holds for events that depend on multiple indices, such as $E(i, j)$. Further, with a slight abuse of notation, we will leave the number $q$ of queries and the length $\ell_i$ of the $i$-th query (in blocks) implicit when summing over all MFC calls.
In Lemma 2, the bound holds for some adversary B who makes at most $\sigma = \sum_{i=1}^{q} \lceil \ell_i / s \rceil$ queries, and runs in time given by the running time of A plus $\gamma_0 \cdot \sigma$ for some constant $\gamma_0$.
Note that the distribution of the tweak-input pairs, and consequently the bound, is determined by fixing the sequence builder s-build.
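The identity behind the shorthand $\widetilde{\Pr}$ introduced before Lemma 2 can be verified exactly on a toy sample space. The Python sketch below is purely illustrative: the choice of three fair coin flips, with $E(a)$ being "flip $a$ shows heads" and $a_0 = 0$, is an assumption made here and has nothing to do with the events of the actual proof.

```python
from fractions import Fraction
from itertools import product

# Toy sample space: three fair coin flips, indexed a = 0, 1, 2.
OUTCOMES = list(product((0, 1), repeat=3))

def prob(pred):
    """Exact probability of a predicate over the uniform sample space."""
    return Fraction(sum(1 for w in OUTCOMES if pred(w)), len(OUTCOMES))

def event(a):
    """E(a): flip number a shows heads."""
    return lambda w: w[a] == 1

def pr_tilde(a, a0=0):
    """Pr(E(a) and not E(a-1) and ... and not E(a0))."""
    return prob(lambda w: event(a)(w)
                and not any(event(i)(w) for i in range(a0, a)))

union = prob(lambda w: any(event(i)(w) for i in range(3)))
total = sum(pr_tilde(a) for a in range(3))
print(union, total)
```

Here the three values of $\widetilde{\Pr}$ are $1/2$, $1/4$ and $1/8$, each at most $\Pr(E(a)) = 1/2$, and they sum to the probability $7/8$ of the union.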

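$$\mathrm{Adv}^{\text{ind-tctr}}_{\mathrm{TCTR}_s[F_s],\text{s-build}}(A) \le \mathrm{Adv}^{\text{ind-prtmfp}}_{F_s}(B) + \mathrm{Adv}^{\text{ind-tctr}}_{\mathrm{TCTR}_s[\pi]}(A).$$
Now, the adversary A is left with the goal of distinguishing between the games tctr-real$_{\mathrm{TCTR}_s[\pi]}$ and tctr-ideal$_{\mathrm{TCTR}_s[\pi]}$. For simplicity, we denote these games by "real world" and "ideal world", respectively. Hence, we want to bound $\mathrm{Adv}^{\text{ind-tctr}}_{\mathrm{TCTR}_s[\pi]}(A)$. Following [Pat09], we describe the interactions of A with its oracle in a transcript.
Coefficient-H. Let us now represent the distribution of the transcripts in the real world and the ideal world by $\Theta_{re}$ and $\Theta_{id}$, respectively. The proof relies on the fundamental lemma of the coefficient-H technique as defined above in Lemma 1. We say an attainable transcript $\tau$ is bad if one of the following conditions occurs:
• BadT$_1$: there exists a pair of block indices $(i', j') < (i, j)$ such that $(X^{i'}_{j'}, T^{i'}_{j'}) = (X^{i}_{j}, T^{i}_{j})$;
• BadT$_2$: there exists a pair of block indices $(i', j') < (i, j)$ such that $T^{i'}_{j'} = T^{i}_{j}$ and $v^{i'}_{(j'-1)s+p} = v^{i}_{(j-1)s+p}$ for at least one of the values of $1 \le p \le s$.
We use $\mathcal{T}_{bad}$ to denote the set of "bad" transcripts, which is defined as the set of attainable transcripts for which the transcript predicate BadT$(\tau)$ = BadT$_1(\tau)$ $\vee$ BadT$_2(\tau)$ = 1. Further, we use $\mathcal{T}_{good}$ to denote the set of attainable transcripts that are not in the set $\mathcal{T}_{bad}$. Transcripts of the set $\mathcal{T}_{good}$ are therefore called good transcripts.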

Proof. [Lemma 3]
For any transcript in $\mathcal{T}_{bad}$ with BadT$_1$ set to 1, we know that there exists at least one pair of block indices $(i', j') < (i, j)$ for which $(X^{i'}_{j'}, T^{i'}_{j'}) = (X^{i}_{j}, T^{i}_{j})$. One can notice that there are in total $q$ values of $i$ and $i'$, and for each such $i$ and $i'$, there are $\lceil \ell_i / s \rceil$ and $\lceil \ell_{i'} / s \rceil$ values for $j$ and $j'$, respectively. Now, since the values of the $X$s and $T$s are independent of the corresponding world being real or ideal, the probability of BadT$_1$ is bounded by summing the probabilities of these collisions over all such index pairs.
Similarly, for any transcript in $\mathcal{T}_{bad}$ with BadT$_2$ set to 1, we know that there exists at least one pair of block indices $(i', j') < (i, j)$ for which $T^{i'}_{j'} = T^{i}_{j}$ and $v^{i'}_{(j'-1)s+p} = v^{i}_{(j-1)s+p}$ for at least one $p$. Clearly, as the values of the $v$s are uniformly and independently distributed in $\Theta_{id}$ and as $p$ can take at most $s$ values, the probability of $v^{i'}_{(j'-1)s+p} = v^{i}_{(j-1)s+p}$ is upper bounded by $s/2^{n}$. Now, since the values of the $X$s and $T$s are independent of the corresponding world being real or ideal, with the same bounds on $i, i', j, j'$ as above, we get the corresponding bound for BadT$_2$, and using the union bound we obtain the claim of the lemma.

Proof. [Lemma 4]
Note that any good transcript $\tau$ does not contain input or output collisions as described in the bad events above, which means that all input and output blocks (of $n$ bits) that correspond to the same tweak are distinct in $\tau$. Also, one should keep in mind that the values of the inputs $X$ and $T$ do not depend on the corresponding world. We can now compute the probability to obtain a good transcript in the real and ideal worlds as follows. Let $S_M$ denote the multiset of all tweaks used during a session of $q$ queries with query lengths $\lceil \ell_1 / s \rceil, \ldots, \lceil \ell_q / s \rceil$ in terms of $(n+t)$-bit blocks (here an $(n+t)$-bit block denotes the corresponding input-tweak pair $(T, X)$). Let $S_S$ be the largest subset of $S_M$ (i.e. a set with all distinct elements of $S_M$) and let $\eta_a$ denote the multiplicity of $T_a \in S_S$ in $S_M$. In the real world, the outputs $v$ are defined using a random permutation, so the blocks that share a tweak are sampled without replacement, and $\Pr(\Theta_{re} = \tau)$ is a product over $a = 1, \ldots, |S_S|$ of terms that are each at least $(1/2^{n})^{\eta_a}$. On the other hand, in the ideal world, the outputs $v$ are chosen uniformly and independently at random, which gives us $\Pr(\Theta_{id} = \tau) = \prod_{a=1}^{|S_S|} (1/2^{n})^{\eta_a}$. From these two expressions, we get $\Pr(\Theta_{re} = \tau) / \Pr(\Theta_{id} = \tau) \ge 1$ and hence the claim.
Combining the results of Lemma 3 and 4 (taking = 0) into Lemma 1, we get the upper bound and hence the result of Lemma 2.
In the following, we upper bound the probability terms in Lemma 2 for certain choices of s-build.
Lemma 5. Let $F_s$ be a tweakable multi-forkcipher with tweak space $\{0,1\}^{t}$ and let s-build be a sequence builder where $T^{i}_{j}$ is computed as $N_i \| \langle j \rangle$. Then for any adversary A who makes at most $q$ TCTR$_s$ queries, such that each nonce value repeats no more than $x$ times and $\sigma = \sum_{i=1}^{q} \lceil \ell_i / s \rceil$, the sum of the tweak-collision probabilities over all pairs of block indices is at most $\sigma(x-1)/2$.
The proof of Lemma 5 is straightforward from the fact that there are at most $\sigma$ choices of $T^{i}_{j} = N_i \| \langle j \rangle$ and for each choice (as $j$ gets fixed) there are at most $x - 1$ choices of $T^{i'}_{j'} = N_{i'} \| \langle j' \rangle$ such that $T^{i}_{j} = T^{i'}_{j'}$ with non-zero probability. Further, we multiply by an extra $1/2$ as we are only interested in exactly half of these pairs due to the ordering of indices as defined in the sum expression.

Lemma 6. Let $F_s$ be a tweakable multi-forkcipher with tweak space $\{0,1\}^{t}$. Then for any adversary A who makes at most $q$ TCTR$_s$ queries, such that each nonce value repeats no more than $x$ times, the sum of the tweak-collision probabilities is bounded by the three terms discussed in the proof below.
Proof. [Lemma 6] Since there are $\sigma$ possible ways to choose the block index pair $(i, j)$ and for each choice of $(i, j)$ there are at most $(\ell - 1)$ choices for another block index pair $(i', j') \neq (i, j)$ such that $i' = i$, $N_{i'} = N_i$ and $R_{i'} = R_i$ with probability 1, the sum expression which corresponds to these collisions is bounded by the first term of Lemma 6. Here, we multiply by an extra $1/2$ as we are only interested in exactly half of these pairs due to the ordering of indices as defined in the sum expression.
For all the remaining tweak pairs that are not counted in the above explanation (i.e. tweak pairs with tweaks from different queries), we can have a collision only if the tweak pair corresponds to the same $N$ and $R$. There are $\sigma$ possible ways to choose the block index pair $(i, j)$ and for each choice of $(i, j)$ there are at most $\min\{(x-1)\ell, \sigma\}$ choices for another block index pair $(i', j') \neq (i, j)$ such that $i' \neq i$, $N_{i'} = N_i$ and $R_{i'} = R_i$ with probability $1/2^{r}$. Hence, the sum expression which corresponds to these collisions is bounded by the second term of Lemma 6. Here again, we multiply by an extra $1/2$ as we are only interested in exactly half of these pairs due to the ordering of indices as defined in the sum expression.
The third term can be understood from the fact that $\Pr(T^{i}_{j} = T^{i'}_{j'}) \le 1$ for all $\binom{\sigma}{2}$ pairs of block indices $(i, j), (i', j')$.

Lemma 7 considers tweaks computed as $T^{i}_{j} = R_i \| \langle j \rangle$ and any adversary A who makes at most $q$ TCTR$_s$ queries.
The proof of Lemma 7 is straightforward from the fact that there are at most $\sigma$ choices of $T^{i}_{j} = R_i \| \langle j \rangle$ and for each choice (as $j$ gets fixed) there are at most $q - 1$ choices of $T^{i'}_{j'} = R_{i'} \| \langle j' \rangle$ such that $T^{i}_{j} = T^{i'}_{j'}$ with non-zero probability and, since all $R_i$ are chosen uniformly at random, this probability is equal to $1/2^{r}$. Further, we multiply by an extra $1/2$ as we are only interested in exactly half of these pairs due to the ordering of indices as defined in the sum expression.
Lemma 8. Let $F_s$ be a tweakable multi-forkcipher with tweak space $\{0,1\}^{t}$ and let s-build be a sequence builder where $T^{i}_{j}$ is computed as $R_i \oplus \langle j \rangle$ and the distributions of the $X^{i}_{j}$s and $T^{i}_{j}$s are statistically independent over the coins of the adversary and s-build. Then for any adversary A who makes at most $q$ TCTR$_s$ queries, the tweak-collision probability is at most $(q-1)(\sigma - q/2)/2^{r}$.
Proof. [Lemma 8] There are at most $\sigma$ choices for a block index $(i, j)$ and for each such choice there are at most $q - 1$ choices for another block index $(i', 1)$ with $i' \neq i$ such that $R_i \oplus R_{i'} = \langle j \rangle \oplus \langle 1 \rangle$ with probability $1/2^{r}$. Clearly, this counts all possible tweak collisions. However, one can notice that we have counted each pair of indices $((i, j), (i', 1))$ twice whenever $j = 1$ due to their ordering. Since we are only interested in unordered pairs of indices, we subtract these extra cases from the counted ones. Let us now count these extra cases. There are at most $q$ choices for a block index $(i, j)$ with $j = 1$ and for each such choice there are at most $q - 1$ choices for another block index $(i', 1) \neq (i, 1)$. Since we are only interested in exactly half of these pairs (i.e. the unordered pairs), we multiply by $1/2$. The final bound on the tweak-collision probability after subtracting these pairs becomes $(q-1)(\sigma - q/2)/2^{r}$.
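The pair counting in the proof of Lemma 8 amounts to the identity $\sigma(q-1) - q(q-1)/2 = (q-1)(\sigma - q/2)$. The Python sketch below checks it by brute force for small examples; the function names and the list-based encoding of queries (one entry per query, giving its number of MFC calls, each at least 1) are assumptions made for this illustration.

```python
def count_pairs(calls_per_query):
    """Brute-force count of unordered pairs {(i, j), (i', 1)} with i' != i."""
    blocks = [(i, j) for i, c in enumerate(calls_per_query)
              for j in range(1, c + 1)]
    firsts = [(i, 1) for i in range(len(calls_per_query))]
    pairs = {frozenset((b, f)) for b in blocks for f in firsts if b[0] != f[0]}
    return len(pairs)

def closed_form(calls_per_query):
    """(q - 1) * (sigma - q/2), written without fractions."""
    q, sigma = len(calls_per_query), sum(calls_per_query)
    return (q - 1) * (2 * sigma - q) // 2

example = [2, 3, 1]  # q = 3 queries, sigma = 6 MFC calls
print(count_pairs(example), closed_form(example))
```

For the example, $\sigma(q-1) = 12$ ordered choices are reduced by the $q(q-1)/2 = 3$ pairs of first blocks that were counted twice, leaving 9 pairs; dividing this count by $2^{r}$ gives the bound of the lemma.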
Lemma 9. Let $F_s$ be a tweakable multi-forkcipher with tweak space $\{0,1\}^{t}$ and let s-build be a sequence builder where $X^{i}_{j}$ is $\gamma$, a constant fixed for all block tuples $(i, j)$. Then for any adversary A who makes at most $q$ TCTR$_s$ queries, every pair of inputs collides with probability 1.
The proof of Lemma 9 is straightforward from the fact that for $X^{i}_{j} = \gamma$, we have $X^{i}_{j} = X^{i'}_{j'}$ for all block index pairs $(i, j) > (i', j')$.

Security of GCTR
Let us now define $Q$ as $\{(N_i, R_i) \mid 1 \le i \le q\}$, the set of $q$ queries of A against GCTR with the $i$-th query labeled by its corresponding pair $(N_i, R_i)$. We define $Q_N$ as $\{(N_i, R_i) \mid N_i = N \text{ and } 1 \le i \le q\}$, i.e. a subset of $Q$ with queries containing the same nonce $N$. By definition of $x$, any such subset of $Q$ can have at most $x$ elements. We now define the two possible generic events that are applicable against the defined 22 GCTR constructions, namely events U and V.
Event U - Let U be the event that, in any $Q_N \subseteq Q$, we get $(N, R_{i_1}) = (N, R_{i_2})$ with $i_2 < i_1$ (i.e. one of the randomly chosen $R_{i_1}$s matches one of the previously chosen $R_{i_2}$s having the same nonce). Since for any $Q_N$ we can have at most $x$ such $R_i$s and each one is of size $r$ bits, following the details explained in Appendix A.1, we obtain $\Pr(\mathrm{U}) \le q(x-1)/2^{r+1}$. Note that $r$ here is also a variable and its value depends on the underlying GCTR variant.
Event V - Let V be the event that, in any $Q_N \subseteq Q$, for one of the randomly chosen $R_{i_1}$, an $R_{i_1} \oplus \langle j_1 \rangle$ matches one of the previously used/defined $R_{i_2} \oplus \langle j_2 \rangle$s. For any $Q_N$ we can have at most $x$ such $R_i$s and each one is of size $r$ bits; therefore, following the details explained in Appendix A.2, we obtain $\Pr(\mathrm{V}) \le (2\sigma - q)(x-1)/2^{r+1}$. Note that $r$ here is also a variable and its value depends on the underlying GCTR variant.
Note that the events U and V as defined do not depend on the type of inputs of the GCTR mode. However, whether the occurrence of one of these events results in one or more input-tweak pair collisions (hereafter called trivial collisions) in the GCTR mode depends on the type of inputs of GCTR (i.e. $X$ and $T$). To make this clearer, we define event applicability for the GCTR mode.
Event Applicability - If the GCTR variant has $R \oplus \langle j \rangle$ as one of its inputs (i.e. either $X$ or $T$), then we say that the applicable event for that variant is V; if the variant has any other combination with $R$ as one of its inputs, then we say that the applicable event is U; and if both inputs of the variant are independent of $R$, then we say that neither of the two events is applicable. In Table 3, we classify the 22 GCTR modes according to their event applicability.
Case Analysis - We now perform an exhaustive case analysis to proceed with the security proof of the GCTR mode. The motivation for this case analysis is to derive some simplified advantage expressions from the inequality of Lemma 2 (over the events U and V) which are valid for disjoint sets/categories of GCTR variants. We can then further simplify one of these advantage expressions for the individual variants that belong to the corresponding category.
Note that every (sub)case in the upcoming case analysis is defined with some conditions of event applicability and event occurrence of the events U and V, and the types of the inputs that are fed to the GCTR mode. This shows that every (sub)case corresponds to the particular set of GCTR variants where its imposed conditions apply. Further, note that the length variables of N and R (ν and r) used in this analysis depend upon the variant itself. N is considered fixed to an empty string for GCTR variants that don't use N as one of the inputs i.e. ν = 0. Similarly, R is considered fixed to an empty string for GCTR variants that don't use R as one of the inputs i.e. r = 0. We now define the cases as follows.
Case 1: When the event U is applicable to the given GCTR variant and
- Case 1.1: If the GCTR variant has neither $N \oplus \langle j \rangle$ nor $N \oplus R$ as one of its inputs ($X$ or $T$), then there will be a repetition of some input-tweak pairs (trivial collisions) only when U occurs. This is true as, for any GCTR variant that belongs to this case, we can map any input-tweak collision of $(X, T)$ to a collision of $(N, R)$. In expression, this gives Eqn.(3), where $((T^{i}_{1}, X^{i}_{1}), \ldots, (T^{i}_{\lceil \ell_i/s \rceil}, X^{i}_{\lceil \ell_i/s \rceil}))$ denote the input-tweak pairs of the corresponding $i$-th query to the underlying TCTR$_s$ construction of that GCTR variant.
- Case 1.2: If the GCTR variant has $N \oplus \langle j \rangle$ (respectively $N \oplus R$) as one of its inputs ($X$ or $T$), then there will be a repetition of some input-tweak pairs (trivial collisions) only when the pairs are not from the same query and at least their corresponding values of the $R$s (respectively $N \oplus R$s) are the same. Since there are in total $q$ queries of A against GCTR, this gives Eqn.(4), where $((T^{i}_{1}, X^{i}_{1}), \ldots, (T^{i}_{\lceil \ell_i/s \rceil}, X^{i}_{\lceil \ell_i/s \rceil}))$ denote the input-tweak pairs of the corresponding $i$-th query to the underlying TCTR$_s$ construction of that GCTR variant.
Case 2: When the event V is applicable to the given GCTR variant, there will be a repetition of some input-tweak pairs (trivial collisions) only when V occurs. This is true as, for any GCTR variant that belongs to this case, we can map any input-tweak collision of $(X, T)$ to a collision of $(N, R \oplus \langle j \rangle)$ and vice versa. In expression, this gives Eqn.(5), where $((T^{i}_{1}, X^{i}_{1}), \ldots, (T^{i}_{\lceil \ell_i/s \rceil}, X^{i}_{\lceil \ell_i/s \rceil}))$ denote the input-tweak pairs of the corresponding $i$-th query to the underlying TCTR$_s$ construction of that GCTR variant.
Case 3: When neither of the events U and V is applicable to the given GCTR variant and all MFC calls made during the $q$ queries contain distinct input-tweak pairs (only 4 variants, namely GCTR-19, 20, 21 and 22, fall into this category when conditioned on the nonce-respecting setting), we know that there cannot be a trivial collision. However, there can be repetitions of the tweaks used in these calls, which leads us to Lemma 2 and one of the Lemmas 5-9.
Clearly, these generic cases are not only mutually exclusive but also exhaustive for all GCTR variants of Table 2. For simplicity, let us summarize here the classification of the 22 GCTR variants according to their corresponding case and applicable lemma (if multiple lemmas are applicable, the one with the tightest bound is used). The remaining proof of Theorem 1 relies on combining the results of Lemma 2 and Lemmas 5-9 with the bounds defined above in the case analysis for each variant of Table 2.
In the remaining proof, we use the following explanation for simplicity. If the underlying GCTR variant uses tweaks defined as the nonce with an XOR, i.e. in the format $N_i \oplus \langle j \rangle$ or $N_i \oplus R_i$ (this includes the GCTR variants 9 and 17 from Table 2), then we know that, despite the fact that the $N_i$s are distinct, the corresponding values of the tweaks $N_i \oplus \langle j \rangle$ (or $N_i \oplus R_i$) can be the same for different values of $j$ (or $R_i$). Note that such collisions are unavoidable even in the case of a nonce-respecting adversary. However, we can at least argue the following:
• For GCTR variants with tweaks $T^{i}_{j} = N_i \oplus R_i$ (GCTR-9), the probability of a tweak collision is independent of the nonce repetition; therefore, the tweak collision probability can be computed using Lemma 6 in a similar manner to GCTR variants with tweaks $T^{i}_{j} = R_i$.
• For GCTR variants with tweaks $T^{i}_{j} = N_i \oplus \langle j \rangle$ (GCTR-17), the probability of a tweak collision is independent of the nonce repetition. However, what we still know is that for any of the query pairs of GCTR-17, there can be at most one tweak collision due to some $N_i \oplus \langle j \rangle$ repetition. More specifically, for any index pair $(i, i')$ with $1 \le i < i' \le q$, $N_i \oplus N_{i'} = \langle j \rangle \oplus \langle 1 \rangle$ can occur with probability at most 1. This implies that the tweak collision probability for GCTR-17 can be easily computed using Lemma 8 in a similar manner to GCTR variants with tweaks $T^{i}_{j} = R_i \oplus \langle j \rangle$ but with $|R_i| = 0$ (note that here $|R_i| = 0$ is equivalent to setting $1/2^{r} = 1$).
We can now bound the (n)ivE advantage of A against each one of the 22 GCTR variants as follows:

GCTR-1:
In this variant, $R_i \| \langle j \rangle$ is used as the tweak and $N_i$ as the input; hence, applying the results from Lemma 2, Lemma 7 and Eqn.(3), we get the bound listed for GCTR-1 in Table 2.

GCTR-2:
In this variant, $N_i$ is used as the tweak and $R_i \| \langle j \rangle$ as the input; hence, applying the results from Lemma 2, Lemma 6 with $|R_i| = 0$ and Eqn.(3) with $|R_i| = r$, we get the bound listed for GCTR-2 in Table 2.

GCTR-3 (CTRT mode):
In this variant, $R_i \oplus \langle j \rangle$ is used as the tweak and $N_i$ as the input; hence, applying the results from Lemma 2, Lemma 8 with $|R_i| = t$ and Eqn.(5), we get the bound listed for GCTR-3 in Table 2.

GCTR-4:
In this variant, $N_i$ is used as the tweak and $R_i \oplus \langle j \rangle$ as the input; hence, applying the results from Lemma 2, Lemma 6 with $|R_i| = 0$ and Eqn.(5) with $|R_i| = n$, we get the bound listed for GCTR-4 in Table 2.

GCTR-5:
Here $N_i \| R_i$ is used as the tweak and $\langle j \rangle$ as the input, hence applying the results from Lemma 2, Lemma 6 and Eqn.

GCTR-15:
Here $R_i$ is used as the tweak and $\langle j \rangle$ as the input, which is the same as the inputs of GCTR-5 but with $N_i$ fixed to the empty string for all $i$, i.e. $\nu = 0$ and $x = q$. Hence, applying the results from Lemma 2, Lemma 6 with $|R_i| = t$ and Eqn.(3) with

GCTR-16:
Here j is used as the tweak and R i as the input which is same as the inputs

Conclusion and Open Problems
We presented MFC, a generalization of the forkcipher, and used it to define and analyze a generic counter (GCTR) mode construction for tweakable primitives. Our results show that most variants of GCTR outperform the traditional CTR in terms of security and that the use of a random IV can help to mitigate the impact of nonce reuse. Further, we also show that an efficient MFC instance can make any GCTR variant more efficient than the comparable CTRT instance. This work seems to be the first systematic investigation of CTR-style modes, which is surprising given their popularity.
In the security proof, we were able to rigorously analyse 22 GCTR variants. An appropriate choice of the abstraction layer, combined with the unusual choice to express a bound as a function of elementary probabilities, maximizes the common parts of the analyses. We obtained bounds that even improve on the state of the art in the case of CTRT (a.k.a. GCTR-3). We show that two variants stand out in terms of security. Our improvement of CTRT's bound illustrates that an investigation of the tightness of these bounds is an interesting open problem. The result on GCTR-4 from Appendix D indicates that our security bounds could be tight and that one can define simple attacks matching these bounds. However, a full study of the tightness of these bounds is beyond the scope of this paper and we leave it to future work.
An MFC is a resourceful primitive that boosts the security and, when supported by an efficient instance, can also improve the performance of applications that span beyond GCTR and AEAD for short messages. We leave to future research the questions of designing more efficient MFC instances, especially with $s > 2$, as well as the identification of novel applications benefiting from MFCs.

One can easily see that $\nabla_{\theta}(g) \neq 0$ and therefore, for $\lambda$ as a Lagrange multiplier, we can define a Lagrange function $L$ as $L(\theta, \lambda) = f(\theta) - \lambda g(\theta)$.
Further, using the first result of Eqn.(7), we get $\theta_1 = \theta_2 = \cdots = \theta_d = q/d$, and Eqn.(8) then gives the extremum of $B$. Calculating $B$ for a few inputs (other than this extremum) results in comparatively small values, which means that this only extremum of $B$ is indeed a "maximum". Additionally, we know from the pigeonhole principle that the number of $Q_N$s in $Q$ cannot be smaller than $q/x$ (i.e. $d \ge q/x$) and therefore, $B \le (x-1)q/2$.
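The final inequality can also be checked exhaustively for small parameters. The Python sketch below assumes that $B$ counts the unordered pairs of queries that share a nonce, i.e. $B = \sum_{c} \theta_c(\theta_c - 1)/2$ with $\theta_c = |Q_{N_c}|$; this reading is inferred from the definition of event U and is an assumption of the sketch, as are all function names.

```python
from collections import Counter
from itertools import product

def same_nonce_pairs(nonces):
    """Assumed definition of B: unordered query pairs that share a nonce."""
    return sum(c * (c - 1) // 2 for c in Counter(nonces).values())

def check(q, num_nonces):
    """Verify B <= (x - 1) * q / 2 for every assignment of nonces to q queries."""
    for nonces in product(range(num_nonces), repeat=q):
        x = max(Counter(nonces).values())    # maximum number of nonce repetitions
        assert 2 * same_nonce_pairs(nonces) <= (x - 1) * q
    return True

print(check(6, 3))
```

Under this reading the bound also follows directly, without Lagrange multipliers, from $\theta_c \le x$: $\sum_c \theta_c(\theta_c - 1)/2 \le \sum_c \theta_c (x-1)/2 = q(x-1)/2$, with equality when every nonce is used exactly $x$ times.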

A.2 Event V
Let us reconsider $Q = \{(N_i, R_i) \mid 1 \le i \le q\}$ as the set of $q$ queries of A against the GCTR mode, with the $i$-th query labeled by its corresponding pair $(N_i, R_i)$. Let again $Q_N$ be defined as $\{(N_i, R_i) \mid N_i = N \text{ and } 1 \le i \le q\}$, i.e. a subset of $Q$ with queries containing the same nonce $N$. By definition of $x$, any such subset of $Q$ can have at most $x$ elements. Now, let us recall that V is the event that, in any $Q_N \subseteq Q$, for one of the randomly chosen $R_{i_1}$, an $R_{i_1} \oplus \langle j_1 \rangle$ matches one of the previously used/defined $R_{i_2} \oplus \langle j_2 \rangle$s, where $j_1$ and $j_2$ can take any value from the counters used in the $i_1$-th and $i_2$-th query, respectively.
Since $R \oplus \langle j \rangle$ is a cyclic permutation in $j$, we can define the set $\mathcal{A}$ of $R \oplus \langle j \rangle$ pairs where such collisions can occur as $\mathcal{A} = \{(R_{i_1} \oplus \langle j_1 \rangle, R_{i_2} \oplus \langle j_2 \rangle) \mid 1 \le i_2 < i_1 \le q,\ 1 \le j_1 \le \ell^{*}_{i_1},\ 1 \le j_2 \le \ell^{*}_{i_2},\ N_{i_1} = N_{i_2} \text{ and } \min\{j_1, j_2\} = 1\}$. Here $\ell^{*}_{i} = \lceil \ell_i / s \rceil$ represents the number of MFC calls used in the $i$-th query and thus can be treated as the maximum value of the corresponding counter of the $i$-th query. Note that any other pair for a possible collision of $R \oplus \langle j \rangle$ with $j_2 \ge j_1 > 1$ (w.l.o.g.) can be mapped back to the pair $(R_{i_1} \oplus \langle 1 \rangle, R_{i_2} \oplus \langle j_2 - j_1 + 1 \rangle)$ in the set $\mathcal{A}$. Now, for any $Q_N$ we know that we can have at most $x$ of these $R_i$s and each one is of size $r$ bits. Since each one of these $R_i$s is chosen uniformly and independently at random, we have $\Pr(\mathrm{V}) = A/2^{r}$, where $A$ is the size of the set $\mathcal{A}$.
Proof. Let $Q_{N_1}, \ldots, Q_{N_d}$ be the mutually exclusive and exhaustive sets of queries with nonce inputs $N_1, \ldots, N_d$, respectively (i.e. $Q = \bigcup_{c=1}^{d} Q_{N_c}$). Let $\mathrm{V}_c$ denote the event V conditioned on the query set $Q_{N_c}$, and let $A_c$ be the number of elements of $\mathcal{A}$ with $N_{i_1} = N_{i_2} = N_c$. Let $|Q_{N_c}| = \theta_c$ for all $c$, with $x$ the maximum number of nonce repetitions. Clearly, for any extremum of $f$ we have $\nabla_{\theta, L} \mathcal{L} = 0$, which gives us $(L_1, \ldots, L_d, \theta_1, \ldots, \theta_d) = (\lambda_1, \ldots, \lambda_1, \lambda_2, \ldots, \lambda_2)$.
Further, using the results of Eqn. (11) and (15), we get $L_1 = L_2 = \cdots = L_d = (\sigma - q/2)/d$ and $\theta_1 = \theta_2 = \cdots = \theta_d = q/d$, and substituting these into Eqn. (14) gives the extremum of $A$. Evaluating $A$ at a few other inputs yields comparatively small values, so this sole extremum of $A$ is indeed a maximum. Additionally, the pigeonhole principle implies that the number of sets $Q_N$ in $Q$ cannot be smaller than $q/x$ (i.e. $d \geq q/x$), and therefore $A \leq (x-1)(\sigma - q/2)$.
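The final inequality can also be checked empirically on random workloads. The sketch below is a numerical sanity check rather than a substitute for the proof. It assumes that $\sigma$ is the total number of MFC calls over all queries and that $x$ is the largest number of times any nonce is repeated, and it counts the collision set through the per-pair contribution $a + b - 1$ for two same-nonce queries with $a$ and $b$ MFC calls. The function names and workload parameters are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

def count_A(queries):
    """Size of the collision set for queries given as (nonce, calls).

    Each pair of same-nonce queries with a and b MFC calls contributes
    a + b - 1 counter pairs satisfying min(j1, j2) = 1.
    """
    return sum(a + b - 1
               for (n1, a), (n2, b) in combinations(queries, 2)
               if n1 == n2)

def bound_holds(queries):
    """Check A <= (x - 1)(sigma - q/2), scaled by 2 to stay in integers."""
    q = len(queries)
    sigma = sum(calls for _, calls in queries)          # assumed: total MFC calls
    x = max(Counter(n for n, _ in queries).values())    # max nonce repetitions
    return 2 * count_A(queries) <= (x - 1) * (2 * sigma - q)

rng = random.Random(0)  # fixed seed so the check is reproducible
workloads = [
    [(rng.randrange(4), rng.randint(1, 6)) for _ in range(rng.randint(1, 12))]
    for _ in range(2000)
]
all_ok = all(bound_holds(w) for w in workloads)
print(all_ok)
```

The bound is met with equality when every nonce is repeated exactly $x$ times, for example for two single-call queries under the same nonce, where $A = 1$ and $(x-1)(\sigma - q/2) = 1$.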

B.2 Event V
For $Q$, $Q_N$, $x$ and $\mathcal{A}$ (with size $A$) as defined above in Appendix A.2, we know that there are at most $\sigma$ choices for a block index $(i, j)$, and for each such choice there are at most $x - 1$ choices for another block index $(i', 1)$ with $i' \neq i$ such that $N_{i'} = N_i$, and $R_i \oplus R_{i'} = j \oplus 1$ holds with probability $1/2^r$. Clearly, this counts all possible elements of $\mathcal{A}$. Further note that we have counted each pair of indices $((i, j), (i', 1))$ twice whenever $j = 1$, due to their ordering.

Since we are only interested in unordered pairs of indices, we subtract these extra cases from the counted ones. Let us now count these extra cases. There are at most $q$ choices for a block index $(i, j)$ with $j = 1$, and for each such choice there are at most $x - 1$ choices for another block index $(i', 1) \neq (i, 1)$ such that $N_{i'} = N_i$. Since we are only interested in exactly half of these pairs (i.e. the unordered pairs), we multiply by $1/2$. The final bound after subtracting these pairs becomes $A \leq (x-1)\sigma - (x-1)q/2 = (x-1)(\sigma - q/2)$. Now, since the output of a random permutation is information-theoretically indistinguishable from the output of a random function when only one forward query is allowed to these oracles, we have for $q = 1$,