DoveMAC: A TBC-based PRF with Smaller State, Full Security, and High Rate

Recent parallelizable message authentication codes (MACs) have demonstrated the benefit of tweakable block ciphers (TBCs) for authentication with high security guarantees. With ZMAC, Iwata et al. extended this line of research by showing that TBCs can simultaneously increase the number of message bits that are processed per primitive call. However, ZMAC and previous TBC-based MACs need more memory than sequential constructions. While this aspect is less of an issue on desktop processors, it can be unfavorable on resource-constrained platforms. In contrast, existing sequential MACs limit the number of message bits per call to the block length n of the primitive or below. This work proposes DoveMAC, a TBC-based PRF that reduces the memory of ZMAC-based MACs to 2n + 2t + 2k bits, where n is the state size, t the tweak length, and k the key length of the underlying primitive. DoveMAC provides (n + min(n, t))/2 bits of security and processes n + t bits per primitive call. Our construction is the first sequential MAC that combines beyond-birthday-bound security with a rate above n bits per call. By reserving a single tweak bit for domain separation, we derive a single-key variant, DoveMAC1k.


Introduction
Message Authentication Codes (MACs) secure the integrity and authenticity of communications. Many standardized MACs, such as CMAC [Dwo16], OMAC [IK03], or PMAC [BR02], are block-cipher modes of operation with birthday-bound security. This is hardly a problem if the state size of the underlying primitive is at least 128 bits; however, resource-limited platforms often use primitives with smaller state and key sizes, e.g., HIGHT [HSH+06] or PRESENT [BKL+07]. Smaller state sizes result in lower security guarantees, which may be impractical when used in a mode with birthday-bound security, as emphasized in [IMG+14, MV04]. Therefore, MACs with higher security margins are desirable for lightweight platforms, in particular stateless deterministic MACs that avoid the transmission of additional nonces.

Existing Parallelizable MACs.
A considerable amount of research has been devoted to the construction of MACs with high security bounds. A generic approach is to sum the outputs of several independent hash functions or PRFs, e.g., Yasuda's Sum-ECBC [Yas10] or Iwata and Minematsu's Fr [IM16], which accumulates the outputs of multiple GMAC instances. However, summing r instances implies r times the computational effort, key material, and state size of the single-hash approach. Many block-cipher-based MACs with higher security were inspired by the classical PMAC [BR02] design. These process the message blocks in parallel, accumulate the results, and give the sum as input to a finalization that produces the tag. One approach for (slightly) higher security is the use of counter sums, where some bits of each message input are reserved for a counter. Thus, the influence of the message length on the security bound is eliminated [BGR95, Ber99]. This approach has seen a recent revival, e.g., in LightMAC [LPTY16] or the variable-size counter modes by Dutta et al. [DJN17]. A higher security gain can be achieved by a larger accumulator, as proposed in PMAC+ [Yas11]. This approach was adopted by various proposals, e.g., by the extensions of LightMAC [Nai17, Nai18a], but also by [IMPS17, LN17, Nai15]. In particular, the latter profited from the use of tweakable block ciphers as the underlying primitive.
Tweakable Block Ciphers. Tweakable block ciphers (TBCs) extend classical block ciphers (BCs) by an additional public input, called tweak [LRW02]. Given a non-empty set of keys K, a tweak space T, and a state space B, a TBC π : K × T × B → B is a family of keyed permutations, s.t. for all combinations of key and tweak (K, T) ∈ K × T, π(K, T, ·) is a permutation over B. With the recent advent of dedicated performant proposals such as Deoxys-BC, Joltik-BC [JNP14], or Skinny [BJK+16c], TBCs have been established for various cryptographic applications, including MACs, encryption, and authenticated encryption schemes [IMPS17, JNP16b, Nai15, PS16]. In those contexts, the tweak can efficiently separate domains, which can not only increase security guarantees but also lead to simpler designs. For instance, PMAC_TBC1k [Nai15], Encrypted Parallel Wegman-Carter (EPWC) [PS16], and its variant in SCT-2 [JNP16b] encode domains in the tweak space to avoid additional input masks and multiple keys. In general, tweakable block ciphers are slightly slower than classical block ciphers with equal security guarantees, since the additional tweak also needs to be processed in a secure manner. With ZMAC, Iwata et al. [IMPS17] proposed a message authentication code that brought a considerable speed-up. In contrast to previous designs, ZMAC uses both the state and the tweak input to process n + t bits of message material per primitive call. Thus, it benefits from a TBC both in terms of high security and a high rate.
State Size of a Scheme. We briefly define what we mean by the required state size of a scheme. Given a TBC with a t-bit tweak, a k-bit key, and a block length of n bits, processing a message block with a BC (or a TBC) requires holding n + k bits (or n + t + k bits, respectively). A higher-level scheme may further require storing masking keys or accumulators. We disregard further memory used for performance optimizations; e.g., on the majority of platforms, it is common to store an extended state with expanded key material. Moreover, TBC-based constructions can usually be adapted easily to reserve one or a few bits of the tweak for domain separation to avoid multiple keys. In certain settings, an outside environment can precompute and prepare checksums, e.g., by appending them to the input; however, this poses a security risk, as we will discuss later in Section 3, and such a higher-level environment may be absent in certain settings.
Parallel Block-cipher-based MACs. In general, the PMAC-like block-cipher-based constructions above are not optimized for a small state. The block-cipher-based variants [Yas11, Yas12, Zha15] require at least 2n bits for the current block-cipher state and an input mask, plus k bits for at least one key, plus 2n bits for an accumulator. LightMAC_Plus [Nai17] is similar to PMAC+, but spares the n bits for the mask, yielding 3n + k bits of memory. LightMAC_Plus_1k [Nai17] shares the state requirements of … 3n + k + t bits each due to their 2n-bit accumulator.

Table 1: Comparison of existing deterministic MACs, grouped into constructions based on compression functions, on classical block ciphers, and on tweakable block ciphers. r ≥ 1 is a flexible parameter for the rate; n/k/t = state/key/tweak size in bits; c ≤ n = number of bits for a counter; q = number of queries; m = maximal number of blocks per message; σ = total number of message blocks. The rate excludes the calls in the finalization.
Sequential MACs with High Security Guarantees. Inspired by CBC-based MACs and iterated hash functions, there exists an alternative portfolio of MACs that are inherently sequential but employ significantly smaller amounts of memory. Clearly, the security of single-chain designs is limited by the birthday bound because of potential state collisions. A simple approach for increasing the security is to employ a primitive with a larger state. This can be realized, e.g., by a compression function [DNP16, Yas08, Yas09] or with a wide-state permutation, as in Chaskey [MMH+14]. For n-bit security, Chaskey needs 2n bits for the primitive plus k bits for a final key. LightMAC and [DJN17] are on par with 2n + k bits of memory. However, permutation-based modes such as Chaskey have to process a 2n-bit state all the time. In contrast, tweakable primitives can keep the tweak constant or transform it in a more lightweight manner than the state, as done in most recent lightweight TBCs such as Skinny [BJK+16b] or QARMA [Ava17]. One approach towards higher security is the usage of two chains for (1) primitive calls and (2) accumulating intermediate results. For instance, 3kf9 [ZWSW12] enhances CBC by such an accumulating chain; a similar example is NI+-MAC [DNP16]. The former combines beyond-birthday-bound security with low memory demands of only n + k bits for a CBC-like mode plus n bits for an accumulator. The latter uses a 2n-bit compression function instead of a block cipher.

Rate.
Besides minimizing the memory requirements, increasing the number of message bits per primitive call is a second important factor for increasing the efficiency of MACs. Nearly all constructions considered above process at most n bits of message material per primitive call, where n is the state size. Only ZMAC and its derivatives authenticate n + t message bits per call, but at the cost of a much larger state. The question that arises is whether and how the high rate of ZMAC can be retained while maintaining or reducing its state size.
Contribution. This work gives an affirmative answer to this question. We propose DoveMAC, a highly secure PRF that needs 2n + 2t + 2k bits of memory, based on a tweakable block cipher with n-bit state and t-bit tweak size. We show a security level of approximately (n + min(n, t))/2 bits, which means full n-bit security if t ≥ n and (n + t)/2 bits otherwise. Figure 1 provides a schematic overview. DoveMAC maintains a chain of t bits at the top and n bits at the bottom. Each (t + n)-bit message block is processed by a single call of a tweakable block cipher, such that t bits are used as tweak and n bits as state input. The output of the primitive is XORed to the bottom lane before the next block. An accumulated checksum of all tweak inputs is added to the top lane. Finally, both lanes are used as tweak and state input, respectively, to a final call of the primitive that uses an independent key for generating the tag. From a high-level point of view, our construction is inspired by 3kf9, NI+-MAC, and ZMAC, but it has twice the rate of the former two and requires less memory than the latter. We show its utility for highly secure authenticated encryption by combining it with the nonce-IV-based variant of Counter-in-Tweak [PS16]. DoveMAC requires 2n + 2t + k bits for hashing the message; an additional independent k-bit key is used for the finalization phase. We also briefly outline a variant called DoveMAC1k that reserves a single tweak bit for domain separation. Then, the same key can be used in both hashing and finalization, which can save key-management costs, but may require splitting the inputs into block lengths that are unconventional for usual TBCs.
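To make the formulas above concrete, the following minimal sketch evaluates memory (2n + 2t + 2k), rate (n + t message bits per call), and approximate security ((n + min(n, t))/2 bits) for given n, t, and k. The helper name dovemac_parameters and the example parameter sets are hypothetical and chosen only for illustration; in particular, how a real TBC's tweakey is split into t tweak bits and k key bits is an assumption.

```python
def dovemac_parameters(n: int, t: int, k: int) -> dict:
    """Evaluate the DoveMAC cost/security formulas stated in the text.

    n: state size, t: tweak length, k: key length of the TBC (all in bits).
    """
    return {
        # memory including both keys and the t-bit checksum
        "memory_bits": 2 * n + 2 * t + 2 * k,
        # message bits absorbed per primitive call (finalization excluded)
        "rate_bits_per_call": n + t,
        # approximate security level (n + min(n, t)) / 2
        "security_bits": (n + min(n, t)) / 2,
    }


if __name__ == "__main__":
    # Hypothetical parameter sets, for illustration only.
    for n, t, k in [(64, 64, 64), (64, 32, 128)]:
        print((n, t, k), dovemac_parameters(n, t, k))
```

For t ≥ n, the security term saturates at n bits, while the rate keeps growing with t; this mirrors the statement that a larger tweak only increases the rate.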
Outline. The remainder is structured as follows: Section 2 briefly states the necessary preliminaries, and Section 3 describes the details of DoveMAC. Section 4 provides an analysis of its PRF security, which is detailed in the subsequent Sections 5 and 6. Section 7 describes an instantiation of DoveMAC with Skinny-64-128 for common microcontrollers and compares it to the most efficient variant of ZMAC, ZMAC1. Appendix A describes two birthday-bound attacks on versions from earlier design phases and a forgery attack on a single-key version, which provide further insight into the rationale of our final proposal. Furthermore, Appendix C proposes an authenticated encryption scheme that combines DoveMAC for authentication and Counter-in-Tweak [PS16] for highly secure encryption.

Preliminaries
General Notation. We write lowercase letters for indices and integers, uppercase letters for functions and variables, and calligraphic uppercase letters for sets. For $a, b \in \mathbb{N}_0$, we write $[a..b]$ for the set of integers $\{a, a+1, \ldots, b\}$. Given a set $\mathcal{X}$, we define $\mathcal{X}^+ = \bigcup_{i=1}^{\infty} \mathcal{X}^i$ and $\mathcal{X}^* = \bigcup_{i=0}^{\infty} \mathcal{X}^i$. We denote by $\{0,1\}^x$ the set of bit strings of length $x$, the concatenation of binary strings $X$ and $Y$ by $X \,\|\, Y$, and their XOR by $X \oplus Y$. We let $|X|$ denote the length of a variable $X$ in bits; for a bit string $X$ that is processed in units of blocks, we write $X_i$ for the $i$-th block of $X$. For $X \in \{0,1\}^n$ and $i \le n$, we denote by $\mathrm{msb}_i(X)$ the $i$ leftmost and by $\mathrm{lsb}_i(X)$ the $i$ rightmost bits of $X$. To split a string into blocks of fixed maximal length, $(X_1, \ldots, X_x) \xleftarrow{n} X$ indicates that $X$ is split into $n$-bit blocks, i.e., $X_1 \,\|\, \cdots \,\|\, X_x = X$, $|X_i| = n$ for $1 \le i \le x - 1$, and $|X_x| \le n$. For any $X \in \{0,1\}^{n+t}$, we denote by $(X_1, X_2) \xleftarrow{n,t} X$ the splitting of $X$ into $X_1 = \mathrm{msb}_n(X)$ and $X_2 = \mathrm{lsb}_t(X)$. We denote by $\langle x \rangle_n$ the encoding of a non-negative integer $x$ into its $n$-bit representation. Moreover, we write $X \twoheadleftarrow \mathcal{X}$ to indicate that an element $X$ is chosen uniformly at random from a given set $\mathcal{X}$. A tweaked permutation $\widetilde{\pi} : \mathcal{T} \times \mathcal{X} \to \mathcal{X}$ with tweak set $\mathcal{T}$ defines a family of permutations over $\mathcal{X}$, i.e., for every $T \in \mathcal{T}$, $\widetilde{\pi}(T, \cdot)$ is a permutation over $\mathcal{X}$. Given three sets $\mathcal{T}$, $\mathcal{X}$, and $\mathcal{Y}$, we define $\mathrm{Func}(\mathcal{X}, \mathcal{Y}) := \{F \mid F : \mathcal{X} \to \mathcal{Y}\}$ for the set of all functions with domain $\mathcal{X}$ and range $\mathcal{Y}$. Moreover, we write $\mathrm{Perm}(\mathcal{T}, \mathcal{X}) := \{\widetilde{\pi} \mid \widetilde{\pi} : \mathcal{T} \times \mathcal{X} \to \mathcal{X}\}$ for the set of all tweaked permutations over $\mathcal{X}$ with associated tweak space $\mathcal{T}$. Given sets $\mathcal{X}$ and $\mathcal{Y}$, a uniform random function $\rho : \mathcal{X} \to \mathcal{Y}$ maps each input $X \in \mathcal{X}$, independently of all other inputs, to a uniformly random output $Y \in \mathcal{Y}$.
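The splitting and truncation operators above can be illustrated with a few small helpers. This is a minimal sketch that represents bit strings as Python strings of '0'/'1' characters (an illustrative choice, not something the notation prescribes); the function names are hypothetical.

```python
def split_blocks(x: str, n: int) -> list[str]:
    """(X_1, ..., X_x) <-n- X: n-bit blocks; the last block may be shorter."""
    return [x[i:i + n] for i in range(0, len(x), n)]


def msb(i: int, x: str) -> str:
    """msb_i(X): the i leftmost bits of X."""
    return x[:i]


def lsb(i: int, x: str) -> str:
    """lsb_i(X): the i rightmost bits of X (empty string for i = 0)."""
    return x[len(x) - i:]


def split_n_t(x: str, n: int, t: int) -> tuple[str, str]:
    """(X_1, X_2) <-n,t- X for |X| = n + t: X_1 = msb_n(X), X_2 = lsb_t(X)."""
    assert len(x) == n + t
    return msb(n, x), lsb(t, x)


def encode_int(x: int, n: int) -> str:
    """<x>_n: the n-bit big-endian representation of a non-negative integer."""
    assert 0 <= x < 2 ** n
    return format(x, f"0{n}b")
```

For example, split_blocks("10110", 2) yields ["10", "11", "0"], where only the final block is shorter than n, exactly as in the definition.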
Adversaries. An adversary A is an efficient Turing machine that interacts with a given set of oracles that appear as black boxes to A. We denote by $A^{O}$ the output of A after interacting with some oracle O. We write $\Delta_A(O_1; O_2)$ for the advantage of A to distinguish between oracles $O_1$ and $O_2$. All probabilities are defined over the random coins of the oracles, if any. Following [MRV15], we consider (mostly) information-theoretic adversaries that are restricted only by their maximal number of queries and blocks. Moreover, we assume that adversaries never ask queries to which they already know the answer. As proposed, e.g., in [CS14], information-theoretic adversaries can be safely assumed to be deterministic. All our results can be transferred to the complexity-theoretic setting by restricting the adversaries by the time needed for the evaluation of all internal primitive calls. Note that for the transferred adversaries, the setting would assume deterministic adversaries.

Adversary Characteristics. To quantify the power of adversaries, we say that an adversary A for a notion x against a scheme Π is a (q, m, σ)-x adversary if A asks at most q queries to its oracles, each of at most m blocks, and of at most σ blocks in total. The block size will be given in the context. We write $\mathbf{Adv}^{x}_{\Pi}(q, m, \sigma)$ for the maximum advantage over all (q, m, σ)-x adversaries on Π. For authenticated encryption, m denotes the maximum number of blocks of associated data and message combined; furthermore, σ is the maximum number of blocks over all associated data and messages combined.
Block Ciphers and Tweakable Block Ciphers. Let B = {0, 1}^n be a block space for a fixed integer n. A TBC E with associated key space K, tweak space T, and message space B is a mapping E : K × T × B → B s.t. for every key K ∈ K and tweak T ∈ T, it holds that E(K, T, ·) is a permutation over B. We often write $E_K^T(\cdot)$ as short form of E(K, T, ·).
Definition 1 (TPRP Advantage). Let K be a non-empty finite set, and let B and T be message and tweak space, respectively. Let E : K × T × B → B denote a tweakable block cipher. Let $\widetilde{\pi} \twoheadleftarrow \mathrm{Perm}(\mathcal{T}, \mathcal{B})$ and $K \twoheadleftarrow \mathcal{K}$. Then, the TPRP advantage of an adversary A w.r.t. E is defined as $\mathbf{Adv}^{\mathrm{TPRP}}_{E}(A) := \Delta_A(E_K; \widetilde{\pi})$.

PRFs and Universal Hashing. For the remaining definitions in this section, let K, X, and Y be non-empty finite sets. We restrict our considerations to Y ⊆ {0, 1}*, and let H : K × X → Y and F : K × X → Y be keyed functions.

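Definition 2 (PRF Advantage). Let $\rho \twoheadleftarrow \mathrm{Func}(\mathcal{X}, \mathcal{Y})$ and $K \twoheadleftarrow \mathcal{K}$. Then, the PRF advantage of an adversary A w.r.t. F is defined as $\mathbf{Adv}^{\mathrm{PRF}}_{F}(A) := \Delta_A(F_K; \rho)$.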
Definition 3 (Almost-Universal Hash Function). Let H be defined as above. We call H ε-almost-universal (ε-AU) if for all distinct X, X' ∈ X, it holds that $\Pr[K \twoheadleftarrow \mathcal{K} : H_K(X) = H_K(X')] \le \epsilon$.

The collision probability is strongly related to almost-universality. Here, we define it in the context of messages that are composed of blocks of bit strings.
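As an illustration of Definition 3 (not a construction from this work), the following sketch determines the smallest ε for which a toy keyed hash is ε-AU by enumerating all keys. It uses the textbook hash $H_K(X) = K \cdot X \bmod p$ over a small prime p; for distinct X, X', a collision occurs only for K = 0, so ε = 1/p. The prime P = 31 and all function names are hypothetical choices for the example.

```python
from fractions import Fraction

P = 31  # small prime; keys and messages live in Z_P (toy parameters)


def mult_hash(key: int, x: int) -> int:
    """Textbook universal hash H_K(X) = K * X mod P."""
    return (key * x) % P


def collision_probability(h, keys, x1, x2) -> Fraction:
    """Pr over a uniformly random key that h(K, x1) == h(K, x2), by enumeration."""
    keys = list(keys)
    hits = sum(1 for k in keys if h(k, x1) == h(k, x2))
    return Fraction(hits, len(keys))


def au_epsilon(h, keys, domain) -> Fraction:
    """Smallest epsilon such that h is epsilon-AU on the given domain."""
    keys, domain = list(keys), list(domain)
    return max(
        collision_probability(h, keys, a, b)
        for a in domain
        for b in domain
        if a != b
    )


if __name__ == "__main__":
    print(au_epsilon(mult_hash, range(P), range(P)))  # prints 1/31
```

Exhaustive enumeration only works for toy sizes; for real parameters, ε is bounded analytically, as done for DoveHash in the later sections.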
Definition 4 (Collision Probability between Message Pairs). Let H be defined as above, with the restriction that X ⊆ B* for some block space B = {0, 1}^x, i.e., inputs to H are sequences of blocks. For arbitrary distinct messages X ∈ B^m and X' ∈ B^{m'} for given integers m and m', we define the collision probability as the maximum, over all such pairs (X, X'), of $\Pr[K \twoheadleftarrow \mathcal{K} : H_K(X) = H_K(X')]$.

We overload the notation of the collision probability from message pairs to message sets.
Definition 5 (Collision Probability among Message Sets). Let H be defined as above, with the restriction that X ⊆ B* for some block space B = {0, 1}^x, i.e., inputs to H are sequences of blocks. Let M denote a set of q pairwise distinct messages X ∈ B^{≤m} of at most m blocks each and σ blocks in total over all messages. Then, we define the collision probability among the messages of M as the maximum, over all such sets M, of the probability over $K \twoheadleftarrow \mathcal{K}$ that there exist distinct X, X' ∈ M with $H_K(X) = H_K(X')$.

Given a function that outputs tuples, we will consider the collision probability between certain parts of the output. This is captured by the notion of truncated almost-universality.
Definition 6 (Truncated-AU Hash Function). Let $\mathcal{Y} = \{0,1\}^{n_1} \times \{0,1\}^{n_2}$ for positive integers $n_1, n_2$. We say that H is $(n_1, n_2, \epsilon)$-truncated-AU (tAU) if, for all distinct X, X' ∈ X, with $(Y_1, Y_2) = H_K(X)$ and $(Y_1', Y_2') = H_K(X')$, it holds that $\Pr[K \twoheadleftarrow \mathcal{K} : Y_1 = Y_1'] \le \epsilon$, i.e., the first $n_1$-bit components of the outputs collide with probability at most ε.

The H-Coefficient Technique. The H-coefficient technique is a proof approach due to Patarin [Pat08]. It assumes that the results of the interaction of an adversary A with its oracles are collected in a transcript τ. The task of A is to distinguish the real world $O_{\mathrm{real}}$ from the ideal world $O_{\mathrm{ideal}}$ given its transcript τ. The transcript is called attainable if the probability to obtain it in the ideal world is greater than zero. One assumes that A does not ask duplicate queries or queries prohibited by the game. Let $\Theta_{\mathrm{real}}$ and $\Theta_{\mathrm{ideal}}$ denote the distribution of transcripts in the real and the ideal world, respectively. Then, the fundamental lemma of the H-coefficient technique, whose proof is given in [CS14, Pat08], states for information-theoretic adversaries:

Lemma 1 (Fundamental Lemma of the H-coefficient Technique [Pat08]). Assume that the set of attainable transcripts can be partitioned into two disjoint sets GoodT and BadT. Further assume that there exist $\epsilon_1, \epsilon_2 \ge 0$ s.t. for any transcript τ ∈ GoodT, it holds that $\frac{\Pr[\Theta_{\mathrm{real}} = \tau]}{\Pr[\Theta_{\mathrm{ideal}} = \tau]} \ge 1 - \epsilon_1$, and that $\Pr[\Theta_{\mathrm{ideal}} \in \mathrm{BadT}] \le \epsilon_2$. Then, for all adversaries A, it holds that $\Delta_A(O_{\mathrm{real}}; O_{\mathrm{ideal}}) \le \epsilon_1 + \epsilon_2$.

The DoveMAC Construction
Basic Definitions. Throughout this section, n and t denote two positive integers, the block length and the tweak length, respectively. We define a non-empty key set K, a tweak space T = {0, 1}^t, a block space B = {0, 1}^n, and a tweak-block space S = T × B.
Hereafter, we use a tweakable block cipher E : K × T × B → B and a tweakable permutation π ∈ Perm(T, B), i.e., π(T, ·) is a permutation over B for each T ∈ T. To support primitives with t ≠ n, we define a padding function pad : N × B → {0, 1}^t, where the first parameter defines the output length t; its t-bit output is XORed to $T_i$. If t = n, the input X to pad is returned unchanged; if t > n, X is padded by appending t − n zero bits to obtain a t-bit value. In the case that t < n, the n − t least significant bits of X are truncated instead.
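A minimal sketch of pad on integers, under the reading that pad must always return t bits: for t > n, the n-bit input is extended by t − n zero bits on the right, and for t < n, only its t most significant bits are kept. Passing n explicitly as a third argument is an assumption of this sketch, since the definition above fixes n implicitly through B.

```python
def pad(t: int, x: int, n: int) -> int:
    """Map an n-bit value x to a t-bit value.

    t == n: x unchanged
    t >  n: append t - n zero bits (shift left)
    t <  n: drop the n - t least significant bits (keep msb_t(x))
    """
    assert 0 <= x < 1 << n
    if t >= n:
        return x << (t - n)
    return x >> (n - t)
```

For instance, with n = 8 and x = 0xAB, pad yields 0xAB0 for t = 12 and 0xA for t = 4.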
Block. First, we define a mapping Block that will be used in our hash function DoveHash to process each (t + n)-bit block of the input.

Definition 7 (Block). For all inputs (X
Hash-then-TBC. Let H : K × {0, 1}* → S be a keyed hash function, and let π : K × T × B → B be a tweakable block cipher. The Hash-then-TBC construction HtTBC[π, H] hashes a message input M ∈ {0, 1}* with $H_K(M)$. Thereupon, it maps the hash output (X, Y) ∈ T × B to a flexible number of output blocks Z ∈ B^d by using one part as tweak and the other as state input to π. This finalization was defined as ZFin+[π] in [LN17] as a more efficient finalization for ZMAC. The general version XORs a counter $\langle i - 1 \rangle_t$ to the tweak to derive the i-th tweak input, which yields n-bit security. We briefly recall the general definitions for the Hash-then-TBC PRF HtTBC[π, H] and ZFin+ from [LN17].

DoveHash.
Given the general definition of Block, we define the basic hash function DoveHash[π] : S* → S as on the right part of Algorithm 1. It assumes a constant initial value $(X_0, Y_0) \in \mathcal{B}^2$. Internally, the hash function splits the message into m blocks $M_i$ of t + n bits each and processes each block by Block[π].

DoveMAC. Now, we can define the stateless deterministic PRF $\mathrm{DoveMAC}[E_{K_1,K_2}] : \{0,1\}^* \to \mathcal{B}^d$, as in Algorithm 1. It is an instance of the HtTBC paradigm using $\mathrm{DoveHash}[E_{K_1}]$ as hash function and $\mathrm{ZFin}^+[E_{K_2}]$ for finalization, where $K_1, K_2 \in \mathcal{K}$ are independent keys. DoveMAC fixes the number of output blocks to d = 1 and the initial value to $(X_0, Y_0) = (0^n, 0^n)$. A given input message M ∈ {0, 1}* is first padded to $M \gets M \,\|\, 10^*$, using a one-zero padding such that the output length is the smallest multiple of (t + n) bits; this is realized in Encode. The padded message M is then hashed to $(X, Y) \gets \mathrm{DoveHash}[E_{K_1}](M)$. In DoveHash, the message is processed in (t + n)-bit blocks; in each block, t bits are used as tweak and n bits as state input, until the padded message is fully processed. In the end, a checksum of all t-bit tweak inputs is XORed to the top lane. Finally, the hash output (X, Y) is used as tweak and state input, respectively, to $\mathrm{ZFin}^+[E_{K_2}]$, which produces the tag.

Internally, DoveHash requires a state of n + t + k bits for calls to the TBC plus n bits for the lower lane, which yields 2n + t + k bits. Since the second key can be loaded at the end and the tweak checksum can be supplied appended to the message, DoveMAC requires at least 2n + t + k bits of memory.
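To visualize the two-lane data flow described above, the following toy model wires Encode, DoveHash, the checksum, and a d = 1 finalization together with n = t = 8. It is a sketch under explicit assumptions, not the specification: the exact Block mapping (Definition 7) is not reproduced in this section, so the per-block update below, $U_i = T_i \oplus \mathrm{pad}(X_{i-1})$, $S_i = I_i \oplus Y_{i-1}$, $X_i = E_{K_1}^{U_i}(S_i)$, $Y_i = Y_{i-1} \oplus X_i$, is our assumption, as is taking $T_i$ from the first t bits of each block. The toy TBC is an insecure seeded permutation used only to make the sketch runnable; all names are hypothetical.

```python
import random
from functools import lru_cache

N = 8  # toy block length n (bits); real instances use e.g. n = 64
T = 8  # toy tweak length t (bits)


@lru_cache(maxsize=None)
def _perm_table(key: int, tweak: int) -> tuple:
    rng = random.Random(f"toy-tbc:{key}:{tweak}")
    table = list(range(1 << N))
    rng.shuffle(table)
    return tuple(table)


def toy_tbc(key: int, tweak: int, x: int) -> int:
    """Insecure stand-in for E_K^tweak(x): one seeded permutation per (key, tweak)."""
    return _perm_table(key, tweak)[x]


def pad(x: int) -> int:
    """n-bit value to t bits: zero-extend if t > n, keep msb_t if t < n."""
    return x << (T - N) if T >= N else x >> (N - T)


def encode(bits: str) -> list:
    """One-zero padding M || 10* to a multiple of t + n bits, then split
    each block into (T_i, I_i), with T_i = first t bits (ASSUMED order)."""
    padded = bits + "1"
    padded += "0" * (-len(padded) % (T + N))
    return [
        (int(padded[j:j + T], 2), int(padded[j + T:j + T + N], 2))
        for j in range(0, len(padded), T + N)
    ]


def dove_hash(k1: int, blocks: list) -> tuple:
    x, y = 0, 0      # (X_0, Y_0) = (0^n, 0^n)
    checksum = 0     # Theta: XOR of all tweak blocks T_i
    for t_i, i_i in blocks:
        u = t_i ^ pad(x)          # ASSUMED tweak input U_i
        s = i_i ^ y               # ASSUMED state input S_i
        x = toy_tbc(k1, u, s)     # one TBC call absorbs t + n message bits
        y ^= x                    # ASSUMED: output XORed into the bottom lane
        checksum ^= t_i
    return pad(x) ^ checksum, y   # (X, Y): checksum added to the top lane


def dove_mac(k1: int, k2: int, bits: str) -> int:
    """Toy DoveMAC tag with d = 1; the single ZFin+ block uses counter <0>_t."""
    x, y = dove_hash(k1, encode(bits))
    return toy_tbc(k2, x, y)
```

Note that the model keeps exactly two lanes (x and y), the running checksum, and one TBC input at a time, which matches the memory accounting above; the keys k1 and k2 are only ever used separately.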

Checksum.
Setting the checksum $\Theta = \bigoplus_{i=1}^{m} T_i$ as part of the message from the outside is an implementation optimization to reduce the internal state. For maximal security, this approach should be used only if the integrity of the input can be guaranteed, e.g., by a secured processor if one is available on the platform. Otherwise, if the externally supplied checksum cannot be trusted, the security of DoveMAC would decrease, although it would still be lower bounded by the birthday bound. If the absence of the checksum was the only modification, the security of DoveMAC would reduce to the complexity of finding a collision which occurs only in the ideal world, which yields about n/2 bits of security. Such a birthday-bound attack on a variant that omitted the checksum can be found in Appendix A. However, it is relatively easy to see that a significantly better attack is not possible.

Algorithm 1: Authentication of a message M with construction DoveMAC, using a tweakable block cipher E and d = 1.
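To put "about n/2 bits of security" into numbers, the following sketch evaluates the standard union-bound estimate $\binom{q}{2} / 2^n$ for a collision among q uniformly random n-bit values. This is a generic birthday estimate, not the bound of the attack in Appendix A; the helper name is hypothetical.

```python
def birthday_collision_bound(q: int, n: int) -> float:
    """Union bound on Pr[some collision] among q uniform n-bit values: C(q, 2) / 2^n."""
    return q * (q - 1) / 2 ** (n + 1)


if __name__ == "__main__":
    # For n = 64, about 2^32 queries already make a collision likely.
    print(birthday_collision_bound(2 ** 32, 64))   # close to 0.5
    print(birthday_collision_bound(2 ** 32, 128))  # negligible
```

For a 64-bit primitive, roughly $2^{32}$ queries suffice, which illustrates why the checksum matters for lightweight parameters.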
A Single-Key Variant DoveMAC1k. A single-key variant can be defined in a straightforward manner at the price of a less conventional splitting of the message blocks. Using a domain space D, one can split the tweak space T into a usable space T' and a domain space D: T = T' × D. Define D = {0, 1}^δ. A single bit that differs between the intermediate calls and the calls in the finalization suffices. A domain separation with δ ≥ 1 bits for the domain could then use $\widetilde{\pi}^{D, U_i}(S_i)$, where the domain D takes one value for all calls in DoveHash and a different value for the calls in the finalization. If δ ≥ 2 bits are reserved for domain purposes, further domains could be used for an encryption scheme used in combination with DoveMAC.

On the downside, this approach must split the message into unconventional pieces that can conflict, e.g., with byte or register alignments. The advantages and disadvantages of reserving a few tweak bits for separating domains have to be weighed depending on the considered use case. Our security results also apply to DoveMAC1k when t is replaced by (t − 1) in the bounds.
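A minimal sketch of the domain-separated tweak encoding for DoveMAC1k. Placing the δ domain bits in the most significant tweak positions, and the concrete domain values HASH_DOMAIN = 0 and FINAL_DOMAIN = 1, are hypothetical choices for illustration; the text only requires that the domain differs between hashing and finalization.

```python
HASH_DOMAIN = 0   # hypothetical domain value for the calls inside DoveHash
FINAL_DOMAIN = 1  # hypothetical domain value for the finalization call


def make_tweak(domain: int, usable: int, t: int, delta: int = 1) -> int:
    """Build a t-bit tweak D || U with a delta-bit domain D in the top bits
    and a (t - delta)-bit usable part U (bit placement is an assumption)."""
    assert 0 <= domain < 1 << delta
    assert 0 <= usable < 1 << (t - delta)
    return (domain << (t - delta)) | usable


def block_bits_1k(n: int, t: int, delta: int = 1) -> int:
    """Message bits absorbed per primitive call when delta tweak bits are reserved."""
    return n + t - delta
```

With n = t = 64, each message block then has 127 bits, which illustrates the alignment issue mentioned above.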

PRF Security
If DoveHash satisfies two criteria, the PRF security of DoveMAC can be derived similarly to that of HtTBC for single-block outputs. First, we replace $E_{K_1}$ and $E_{K_2}$ in DoveMAC by two independent random tweakable permutations $\widetilde{\pi}, \widetilde{\pi}' \twoheadleftarrow \mathrm{Perm}(\mathcal{T}, \mathcal{B})$, respectively. We denote the resulting construction by $\mathrm{DoveMAC}[\widetilde{\pi}, \widetilde{\pi}']$. Using a hybrid argument, the advantage to distinguish between both settings is at most $\mathbf{Adv}^{\mathrm{TPRP}}_{E_K}(A')$, where A' is a TPRP adversary on $E_K$ that asks at most σ + 2q queries and runs in time at most O(σ + 2q).
Theorem 1 (PRF Security of DoveMAC). Let $K_1, K_2 \twoheadleftarrow \mathcal{K}$ be independent keys. Let $\widetilde{\pi}, \widetilde{\pi}' \twoheadleftarrow \mathrm{Perm}(\mathcal{T}, \mathcal{B})$. Let A be a PRF adversary on $\mathrm{DoveMAC}[\widetilde{\pi}, \widetilde{\pi}']$ s.t. A asks at most q queries that consist of at most $m < 2^{n-2}$ (t + n)-bit blocks each after padding, and that sum to at most $\sigma < 2^{n-2}$ (t + n)-bit blocks in total. Then

Remark 1. While we generalized DoveMAC to arbitrary tweak lengths, our analysis for the settings t = n and t > n often follows the same arguments, since the outputs are simply zero-padded before they are XORed to the next tweak input. However, a larger tweak does not yield higher security, but only increases the rate. When t < n, there are up to $2^{n-t}$ possible output values that would produce internal collision events, which weakens the corresponding term of the bound to $2^{-(n+t)}$.
Proof. The queries by A are collected in a transcript τ that contains the messages from A, the values $\{(M^i, X^i, Y^i, Z^i)\}_{1 \le i \le q}$, as well as the ideal primitives $\widetilde{\pi}$ and $\widetilde{\pi}'$. Here, $M^i$ denotes the i-th message, $X^i$ and $Y^i$ the inputs to the final call to $\widetilde{\pi}'$, and $Z^i$ the tags. Moreover, we denote by $m_i$ the length of $M^i$ after padding. Both the real and the ideal world have an on-line and an off-line sampling phase. In the on-line phase, the real world computes the tags $Z^i$. The ideal world maps inputs $M^i \in \mathcal{M}$ to uniformly random outputs $Z^i \twoheadleftarrow \mathcal{B}$. In the off-line phase, the real world releases all internal values $X^i$ and $Y^i$ as well as $\widetilde{\pi}$. The ideal world samples $\widetilde{\pi}$ to derive the internal values $X^i$ and $Y^i$ in this phase and releases $\widetilde{\pi}$, $X^i$, and $Y^i$. Those parts of the transcript are revealed to the adversary after it has made all its queries, but before it outputs its decision bit that represents its guess of which world it interacted with. The task of A is then to distinguish the real world $O_{\mathrm{real}}$ from the ideal world $O_{\mathrm{ideal}}$. A transcript τ is called attainable if the probability to obtain τ in the ideal world is non-zero. The set of all attainable transcripts can be partitioned into two disjoint sets GoodT and BadT. We call a transcript τ bad iff τ ∈ BadT, and good otherwise. A transcript is called bad if at least one of the following statements holds:

• bad1: There exist distinct i, j ∈ {1, ..., q} s.t. $(X^i, Y^i) = (X^j, Y^j)$.
• bad2: There exist distinct i, j ∈ {1, ..., q} s.t. $(X^i, Z^i) = (X^j, Z^j)$. We define bad2 conditioned on bad1 not holding, so that the two events are disjoint.
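Since the adversary obtains all values $(M^i, X^i, Y^i, Z^i)$, checking whether a transcript is bad is a simple pairwise comparison. The following sketch encodes the two bad events as we read them: bad1 is a collision of the final (tweak, state) inputs $(X^i, Y^i)$, and bad2 is a collision of $(X^i, Z^i)$ that is only counted if bad1 does not hold. Representing the transcript as a list of 4-tuples is an assumption of this sketch.

```python
from itertools import combinations


def bad_events(transcript):
    """transcript: list of (M, X, Y, Z) tuples, one per query.

    bad1: two distinct queries share the final (tweak, state) input (X, Y).
    bad2: two distinct queries share (X, Z); only checked if bad1 does not hold.
    """
    pairs = list(combinations(transcript, 2))
    bad1 = any((a[1], a[2]) == (b[1], b[2]) for a, b in pairs)
    bad2 = (not bad1) and any((a[1], a[3]) == (b[1], b[3]) for a, b in pairs)
    return bad1, bad2
```

The quadratic number of pairs mirrors the $q^2$-type terms that appear when the probabilities of these events are summed over all query pairs.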
The proof of Theorem 1 then follows from Lemmas 2 and 3 below.
The bad events represent possible input or output collisions in the finalization: bad1 models the event that two pairs of tweak and state inputs collide; bad2 indicates a collision in both the tweak input and the output. In either bad event, the adversary could easily distinguish the worlds. However, their probability to occur is sufficiently small, as studied in Lemma 2.
Lemma 2 (Bad Transcripts). Consider the setting of Theorem 1 and the bad events as defined above. Then

We upper bound the probability of those bad events in the following.
Bad1. In this case, there exists at least one pair of distinct i, j ∈ [1..q] s.t. $M^i$ and $M^j$ yield $(X^i, Y^i) = (X^j, Y^j)$. Then, the outputs would have to be equal, and with high probability, A could distinguish the worlds. This probability is at most

Bad2. In this case, $M^i$ and $M^j$ produce $(X^i, Z^i) = (X^j, Z^j)$. In the ideal world, the outputs $Z^i$ and $Z^j$ are sampled independently and uniformly at random. Given that $H[\widetilde{\pi}]$ is $(t, n, \epsilon_2)$-tAU, the probability that $X^i = X^j$, for a fixed pair of i and j, is bounded by $\epsilon_2$. Over q queries, it follows that

Our claim in Lemma 2 follows from the sum of both terms. Note that we can simplify the bound for bad2 that will be treated in Lemma 5, which also captures the term for a collision between two queries of at most m (t + n)-bit blocks each. We could generalize it over all queries to another term of the collision bound for q queries of at most σ blocks in total. However, this would count collisions twice. Considering avoids the duplicate term for the collision bound.
The finalization of DoveMAC is identical to HtTBC[ π , H] for single-block outputs.Hence, we can apply the following lemma whose proof is given in Lemma 3 in [LN17]; for the sake of completeness, we sketch it in Appendix B.
Lemma 3 (Interpolation Probability of Good Transcripts). Consider the setting of Theorem 1 and the definition of good and bad transcripts from its proof. Let

The remaining analysis reduces to finding upper bounds for $\epsilon_1$ and $\epsilon_2$. We study these properties in the upcoming Sections 5 and 6, respectively.

Collision Analysis
We will show the following lemma in this section.

Lemma 4 (Collision Probability of $\mathrm{DoveHash}[\widetilde{\pi}]$). Let $\sigma < 2^{n-2}$. Then, it holds that

For our analysis of the collision probability, we need the definition of the longest common prefix between two messages.

Proof of Lemma 4. In the following, we consider q queries $M^i$ for i ∈ [1..q] that are collected in a transcript τ. Most of the time, however, we study the probability of events for a single message $M^i$, which we denote as M for simplicity, or the probability of a collision for two distinct messages $M^i$ and $M^j$. To reduce the number of indices, we simply call them M and M' where possible. Given two distinct messages M, M', we denote the blocks of $M = (M_1, \ldots, M_m)$ and the corresponding intermediate values $X_i$, $Y_i$, $S_i$, $T_i$, and $I_i$, for 1 ≤ i ≤ m, as well as the blocks of $M' = (M'_1, \ldots, M'_{m'})$ and its intermediate values $X'_i$, $Y'_i$, etc., for 1 ≤ i ≤ m', in the intuitive way. Note that the values $U_i$ can be derived, given $T_i$ and $X_{i-1}$. We consider m ≥ m'; the analysis of the case m ≤ m' is analogous. We further define $p := \mathrm{LCP}_{t+n}(M, M')$ as the length of the longest common prefix between M and M'. If m = m', it holds that p < m. […] after padding, which is XORed with the previous state to yield tweak and input of the current block […], respectively, for all 1 ≤ i ≤ m and 1 ≤ i ≤ m'. We will often denote by ∆ the XOR difference between corresponding blocks of the two messages. For instance, $\Delta X_i = X_i \oplus X'_i$, $\Delta\Theta = \Theta \oplus \Theta'$, and so on.
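The definition of $\mathrm{LCP}_{t+n}$ is not reproduced in this section; the sketch below assumes the standard meaning, namely the number of leading (t + n)-bit blocks on which two padded messages agree. Messages are represented as '0'/'1' strings whose lengths are multiples of the block size, and the helper name is hypothetical.

```python
def lcp_blocks(m1: str, m2: str, block_bits: int) -> int:
    """LCP_{block_bits}(m1, m2): number of leading blocks on which m1 and m2 agree."""
    b1 = [m1[i:i + block_bits] for i in range(0, len(m1), block_bits)]
    b2 = [m2[i:i + block_bits] for i in range(0, len(m2), block_bits)]
    p = 0
    for a, b in zip(b1, b2):
        if a != b:
            break
        p += 1
    return p
```

For two distinct messages of equal block length m, this value is always strictly smaller than m, matching the observation p < m in the proof.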
We call such an event a non-trivial tweak-input collision, whereas a trivial tweak-input collision is a collision $(U_i, S_i) = (U'_i, S'_i)$ for i ∈ {0, ..., p}. We call a tweak-input tuple $(U_i, S_i)$ fresh iff it is not old. We extend the definition of freshness to tweak-input tuples $(U'_i, S'_i)$ of M' in the natural manner.
Directed Graphs. We define directed unlabeled graphs G = (V, E) of a set of vertices V and a set of edges E ⊆ V × V between them. Moreover, we define directed edge-labeled graphs $G_L = (V, E, L)$ with E ⊆ V × V × L, where L denotes a set of labels corresponding to edges. Here, we consider (u, v, ℓ) or $u \xrightarrow{\ell} v$ as a directed edge between u, v ∈ V with label ℓ ∈ L. For an edge (u, v), u is called the predecessor of v. Analogously, v is called the successor of u.
If all vertices of a walk v are pairwise distinct, v is called a path. If all vertices $v_1, \ldots, v_m$ are pairwise distinct and $v_0 = v_m$, it is called a cycle. If there exist i < j s.t. $v_i = v_j$, then $v_{i..j}$ is said to contain a loop. We denote a partial sequence of $v_{i..j} \subseteq v$ as

Note that if α is injective, the range of α can be restricted to its image, which yields a bijective α.

Function Graphs. A directed edge-labeled graph $G_L = (V, E, L)$ is called a function graph if for all vertices u ∈ V and all labels ℓ ∈ L, there exists at most one successor $v \in V_S(u)$ s.t. (u, v) has label ℓ. This definition extends to walks: if there is a walk v from a given start vertex with labels $\ell_1, \ldots, \ell_m$ (in that order), then this walk is unique.
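The function-graph property and the uniqueness of walks can be checked mechanically. The following minimal sketch represents a labeled graph as a list of (u, v, ℓ) triples, as in the notation above; the helper names are hypothetical.

```python
def is_function_graph(edges) -> bool:
    """edges: iterable of (u, v, label) triples. True iff every
    (vertex, label) pair has at most one successor."""
    successor = {}
    for u, v, label in edges:
        if successor.setdefault((u, label), v) != v:
            return False
    return True


def follow_walk(edges, start, labels) -> list:
    """Follow the unique walk from `start` along `labels` in a function graph."""
    successor = {(u, label): v for u, v, label in edges}
    walk = [start]
    for label in labels:
        walk.append(successor[(walk[-1], label)])
    return walk


def contains_loop(walk) -> bool:
    """True iff some vertex repeats, i.e., v_i = v_j for some i < j."""
    return len(set(walk)) < len(walk)
```

In a function graph, a start vertex and a label sequence determine the walk, which is why follow_walk can use a plain dictionary lookup at every step.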
Intermediate Inputs and States. Let $\mathbf{M} = (M^1, \ldots, M^q)$ be a tuple of q pairwise distinct messages $M^i \in \mathcal{B}^{m_i}$, for $1 \le m_i \le m$ and 1 ≤ i ≤ q, with $\sum_{i=1}^{q} m_i \le \sigma$. Moreover, we consider $M^i = (M^i_1, \ldots, M^i_{m_i})$ as sequences of message blocks $M^i_j$, for $1 \le j \le m_i$. We denote by in the domain of intermediate inputs and by state the domain of intermediate states; concretely, in = T × B and state = B × B. We will use $\mathrm{in}(i, j) = (U^i_j, S^i_j)$ to refer to intermediate inputs and $\mathrm{state}(i, j) = (X^i_j, Y^i_j)$ for intermediate states. Sequences of inputs and states are denoted by $\mathbf{in}$ and $\mathbf{state}$, respectively. A sequence I is attainable if there exists a function F s.t. $I = \mathbf{in}(F, \mathbf{M})$. We consider functions F ∈ Func(T × B, B).

Block-Vertex Input-Structure Graphs.
A block-vertex input-structure graph $G_L$ for a function $F$ and a message tuple $\mathbf{M}$ is defined by its set of labeled edges $E = \{((U^i_{j-1}, S^i_{j-1}), (U^i_j, S^i_j), M^i_j) : 1 \le i \le q,\ 1 \le j \le m_i\}$. Thus, it is a graph-theoretic representation of the intermediate values. The vertices are inputs $(U^i_j, S^i_j)$ to the tweakable block cipher, i.e., $V = \mathcal{T} \times \mathcal{B}$; edges are transitions from one permutation input $(U^i_{j-1}, S^i_{j-1})$ to the next one $(U^i_j, S^i_j)$. The labels are the message blocks $M^i_j = (T^i_j, I^i_j)$, which implies $L = \mathcal{T} \times \mathcal{B}$. For our purpose, the transition will be defined as

Note that we have to implicitly keep track of the values $Y^i_{j-1}$.
Given the sequence of predecessors, $Y^i_{j-1}$ is uniquely determined; however, the notation would become unwieldy. To address this issue, one could consider an isomorphic structure-graph representation instead.
Block-Vertex State-Structure Graphs. A block-vertex state-structure graph $G^S_L$ for a function $F$ and a message tuple $\mathbf{M}$ is defined by its set of labeled edges $E = \{((X^i_{j-1}, Y^i_{j-1}), (X^i_j, Y^i_j), M^i_j) : 1 \le i \le q,\ 1 \le j \le m_i\}$. The vertices are the intermediate states $(X^i_j, Y^i_j)$ of the tweakable block cipher, i.e., $V = \mathcal{B} \times \mathcal{B}$; the edges are transitions from one permutation state $(X^i_{j-1}, Y^i_{j-1})$ to the subsequent one $(X^i_j, Y^i_j)$. The labels are again the message blocks $M^i_j = (T^i_j, I^i_j)$, which implies that $L = \mathcal{T} \times \mathcal{B}$, where the transition is defined as
$G^S_L$ is the union of all $M^i$-walks, for $M^i \in \mathbf{M}$ and $1 \le i \le q$. Both $G_L$ and $G^S_L$ are function graphs since, for every vertex $u \in V$, all outgoing and ingoing edges have distinct labels. Moreover, each walk is unique. Note that $A$ can construct the structure graph from the information in the transcript $\tau$. For a given message $M$, the structure graph $G_L(M)$ is isomorphic to $v(M)$, the walk of $M$. For multiple messages $M, M', \ldots$, the graph $G_L(M, M', \ldots)$ represents the union of the structure graphs isomorphic to the walks of the considered messages. Over all messages of a tuple $\mathbf{M}$, $G_L(\mathbf{M})$ is isomorphic to the union of all walks $v, v', \ldots$.
Partial Walks and Structure Graphs. For our purpose, it suffices to consider partial walks and partial structure graphs. In the remainder, we use $G_L(M^i, M^j)$ for the partial structure graph from the union of $G_L(M^i)$ and $G_L(M^j)$ that stops when the first loop or non-trivial state collision in $M^i$, in $M^j$, or between them occurs.

Core Idea. Let Coll be short-hand for the event $\mathrm{Coll}_{\mathrm{DoveHash}[\widetilde{\pi}]}(t + n, q, m, \sigma)$, and let $\mathrm{Coll}(M^i, M^j)$ denote the event that the walks of $M^i$ and $M^j$ collide. So, we can bound $\Pr[\mathrm{Coll}] \le \sum_{1 \le i < j \le q} \Pr[\mathrm{Coll}(M^i, M^j)]$. In the following, we can therefore concentrate on upper bounding the collision probability in the graphs of two messages. To reduce the number of indices, we name them $M$ and $M'$. We consider the labeled structure graph $G_L(M, M')$ of $M$ and $M'$ until (and including) their first non-trivial collision.
Let $(v_{j-1}, v_j)$ denote an edge from $G_L(M)$ and $(v'_{j'-1}, v'_{j'})$ an edge originally from $G_L(M')$, s.t. $v_j = v'_{j'}$ denotes the first non-trivial collision of their union graph $G_L(M, M')$. We denote by $G'_L(M, M') = (V', E')$ the subgraph of $G_L(M, M')$ induced by removing this first non-trivial collision, and by $\mathrm{Coll}_{j,j'}(M, M')$ the event that the subgraph $G'_L(M, M')$ collides at block $j$ of $M$ and block $j'$ of $M'$. Note that $G_L(M, M')$ is determined uniquely from $G'_L(M, M')$; hence, summing over all graphs $G_L(M, M')$ is equivalent to summing over all graphs $G'_L(M, M')$. Later, we will have to determine the probability of the conditional event $\mathrm{Coll}_{j,j'}(M, M')$. Thus, Equation (1) can be reformulated into Equation (2).
We distinguish between two types of graphs:
• Bad graphs: We call a graph $G_L(M, M')$ bad iff it contains a loop. Since a loop occurs within the walk of a single message, this is equivalent to the event that at least one of the structure graphs of the individual messages, $G_L(M)$ or $G_L(M')$, contains a loop.
• Good graphs: The graph $G_L(M, M')$ contains no loop.
We define GoodG as the set of good partial structure graphs and BadG as the set of bad partial structure graphs, and we bound the contributions of both sets separately. To upper bound $\Pr[\mathrm{Coll}_{j,j'}(M, M')]$ for good graphs, we will later be able to use the entropy of independent values $X_i, X_k$ from $M$ and $X'_{i'}, X'_{k'}$ from $M'$, for some indices $i, k \in [1..m]$ and $i', k' \in [1..m']$, whose inputs have not been queried to the permutation $\pi$ before. Thus, their corresponding outputs that lead to a collision in the subsequent permutation inputs are drawn from the set of all not yet fixed values of $\pi$. Since at most $2m$ elements have been fixed before, the probability for them to lead to a collision can then be upper bounded by $1/(2^n - 2m)^2$. It follows that, for good graphs, Equation (2) can be computed as
The bad partial graphs cover all those that do not yield two independent variables. For those bad graphs, we will instead show that their number can be upper bounded by a reasonably "small" amount. Over all $q$ queries, it follows then
Bad Graphs. We consider four cases of bad partial structure graphs with loops:
• Bad$_1$: The length of the loop, $r$, is a single block.
• Bad$_2$: The length of the loop is a single block, i.e., $r = 1$, and the loop returns to the beginning, i.e., $(U_2, S_2) = (U_1, S_1)$.
• Bad$_3$: The loop is longer than a single block, i.e., there exist distinct $i < j \in [1..m]$ s.t. $(U_j, S_j) = (U_i, S_i)$ and $j \ge i + 2$.
• Bad$_4$: The loop is longer than a single block and collides with the initial value.
In the following, we investigate the probability of a loop in the individual cases of graphs. W.l.o.g., we consider the case that the loop is contained in the graph of $M$ and that $m > m'$. For each case, we distinguish between the settings $t = n$, $t > n$, and $t < n$.
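As an illustration, the following Python sketch sorts the first loop of a walk of tweak-input tuples into the four bad cases listed above. It reflects our reading of the case definitions, which is an assumption since the defining condition of Bad$_1$ is not spelled out: a loop of length $r = 1$ that does not start at the first block is Bad$_1$, $r = 1$ starting at the first block is Bad$_2$, $r \ge 2$ is Bad$_3$, and $r \ge 2$ returning to $(U_1, S_1)$ is Bad$_4$ (we assign that overlap to Bad$_4$ to keep the cases exclusive). Vertices are opaque hashable values, `classify_bad` is a hypothetical helper, and indices are 1-based as in the text.

```python
from typing import Dict, Hashable, Optional, Sequence


def classify_bad(walk: Sequence[Hashable]) -> Optional[str]:
    """Classify the first loop in (U_1,S_1), ..., (U_m,S_m); None if loop-free."""
    seen: Dict[Hashable, int] = {}
    for j, vertex in enumerate(walk, start=1):      # 1-based block index
        if vertex in seen:
            i = seen[vertex]                        # earlier index of the same input
            if j - i == 1:                          # loop length r = 1
                return "Bad2" if i == 1 else "Bad1"
            return "Bad4" if i == 1 else "Bad3"     # loop length r >= 2
        seen[vertex] = j
    return None
```

For instance, a walk `["A", "B", "A"]` revisits the first input at block 3 and is therefore classified as Bad$_4$.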
Bad$_1$. In this case, the graph of $M$ contains a loop of length one. First, assume the setting $t = n$. For a fixed index $i$, we investigate the probability of this event.
Since there exist at most $m$ blocks, the number of cases is upper bounded by
The probability of fulfilling the lower equation does not increase with shorter or longer tweaks. Therefore, the same bound holds in the settings $t > n$ and $t < n$.
Bad$_2$. In this case, the second input tuple collides with the first one. First, assume the setting $t = n$. For a fixed index $i$, we investigate the probability of this event.
Since there exist at most $m$ blocks, the number of cases is upper bounded by
Again, the probability of fulfilling the lower equation does not increase with shorter or longer tweaks. Therefore, the same bound holds in the settings $t > n$ and $t < n$.
Bad$_3$. In this case, the graph of $M$ contains a loop of length at least 2. First, consider the setting $t = n$. For a fixed index $i$, we investigate the probability that Equation (3) holds. Since we consider the first loop, all previous blocks do not form a loop. Consequently, the blocks $(U_{j-1}, S_{j-1})$ and $(U_{j-2}, S_{j-2})$ are fresh, and their corresponding outputs $X_{j-1}$ and $X_{j-2}$ are chosen randomly from a set of size at least $2^n - (j - 1)$ each. Since there exist at most $m$ blocks, the probability in this case is upper bounded by $\frac{m^2}{(2^n - m)^2}$.
In the setting $t \neq n$, the top equality of Equation (3) becomes Equation (4). Isolating the outputs $X_{j-1}$ and $X_{j-2}$ as in the setting $t = n$ shows that the probability that $X_{j-1}$ fulfills it is $1/(2^n - m)$ as before. If $t > n$, the longer tweak cannot increase the probability. If $t < n$, there exist up to $2^{n-t}$ values for $X_{j-2}$ that can fulfill the top equality of Equation (4). The probability is therefore upper bounded by
Bad$_4$. In this case, it holds that $(U_i, S_i) = (U_1, S_1)$, where $i > 2$.
Again, let us start with the setting $t = n$. For a fixed index $i$, we investigate the probability that Equation (5) holds. Since we consider the first loop, all previous blocks do not form a loop and are fresh. So, $X_{i-1}$ and $X_{i-2}$ are fresh and chosen randomly from a set of size at least $2^n - (i - 1)$ each.
Since there exist at most $m$ blocks, the probability in this case is upper bounded by $\frac{m^2}{(2^n - m)^2}$. In the setting $t \neq n$, we can rewrite Equation (5) as Equation (6). We isolate the outputs $X_{i-1}$ and $X_{i-2}$ as in the setting $t = n$. The probability that $X_{i-2}$ fulfills the bottom equality is $1/(2^n - m)$ as before. If $t > n$, the longer tweak cannot increase the probability. If $t < n$, there exist up to $2^{n-t}$ values for $X_{i-1}$ that can fulfill the top equality of Equation (6). The probability is therefore upper bounded by
There exist at most $q$ bad graphs. So, the probability of a bad graph is upper bounded by
using the fact that $\sigma < 2^{n-2}$.
Good Graphs. It remains to bound the number of collisions in good graphs. We denote by $r$ the minimum distance of blocks from the block $p$ where $M$ diverges from $M'$ after their longest common prefix until the first collision. This means $r = \min_{i > 1} |\{i : (U_{i+p}, S_{i+p}) \in v'\}|$. Analogously, we define the distance $r'$ of blocks from the block $p$ where $M'$ diverges from $M$ after their longest common prefix until the first collision: $r' = \min_{i > 1} |\{i : (U'_{i+p}, S'_{i+p}) \in v\}|$. The values $r$ and $r'$ need not be equal. Note that, for the block directly following the longest common prefix, the inputs to $\pi$ must differ by definition, i.e., $(U_{p+1}, S_{p+1}) \neq (U'_{p+1}, S'_{p+1})$. So, at least one of $r$ and $r'$ must be at least 2: $(U_i, S_i) = (U'_{i'}, S'_{i'})$, with $i = p + r$. We consider the following mutually exclusive cases:
• Good$_1$: $r \ge 3$, i.e., there exist $i$ and $j$ s.t. $(U_i, S_i) = (U'_j, S'_j)$ with $i \ge p + 3$.
Recall that we consider the first collision.Since the graphs are good, they exclude loops.
Good$_1$. In this case, the graphs of $M$ and $M'$ diverge and converge again, where $r \ge 3$ blocks lie between divergence and convergence in the graph of $M$. We start again in the setting $t = n$. For a fixed index $i$, we investigate the probability that Equation (7) holds. We can isolate the blocks $X_{i-1}$ and $X_{i-2}$. Since we consider the first collision at $X_i$ and no loop, the inputs $(U_{i-1}, S_{i-1})$ and $(U_{i-2}, S_{i-2})$ that produced $X_{i-1}$ and $X_{i-2}$ are fresh. So, $X_{i-1}$ and $X_{i-2}$ are sampled randomly from sets of size at least $2^n - (i - 1)$. Thus, the probability that they fulfill Equation (7) is upper bounded by $\frac{m^2}{(2^n - 2m)^2}$.
In the setting $t \neq n$, Equation (7) becomes Equation (8). The probability that $X_{i-2}$ fulfills the bottom equality is still $1/(2^n - 2m)$. If $t > n$, the longer tweak cannot increase the probability. If $t < n$, there exist up to $2^{n-t}$ values for $X_{i-1}$ that can fulfill the top equality of Equation (8). The probability is therefore upper bounded by
Good$_2$. In this case, the graphs of $M$ and $M'$ diverge and converge again after two blocks in the $i$-th block, and it holds that $T_i \neq T'_i$.
In the setting $t = n$, we investigate the collision probability for a fixed $i$. We can assume that $i = i'$ holds for both messages, which means that the graphs of both messages diverge at the $(i-1)$-th block and join again at the $i$-th block. Otherwise, if $r > 2$, we could swap $M$ and $M'$ and would be in case Good$_1$. So, we assume $r = r' = 2$.
In this case, we also consider that $T_i \oplus T'_i = \Delta T_i \neq 0$, i.e., $T_i \neq T'_i$. Consequently, the bottom lanes of both messages differ. If the graphs of $M$ and $M'$ coincided in all subsequent blocks, $M_j = M'_j$ for $i \le j \le m$, the bottom lanes would differ until the end. Since the bottom lane is used as state input, this would imply no collision at the end.
To obtain a collision at the end, there must exist a second divergence phase between the graphs, i.e., there must exist some fresh index $j > i$ s.t. $(U_{j-1}, S_{j-1}) \neq (U'_{j-1}, S'_{j-1})$, or even $j > m'$. This means that we have to bound the probability that Equation (9) holds. Since $i - 1$ and $j - 1$ are distinct and fresh indices, $X_{i-1}$ and $X_{j-1}$ are random values from sets of at least $2^n - 2m$ elements each. Thus, the probability that they fulfill Equation (9) is upper bounded by $\frac{m^2}{(2^n - 2m)^2}$.
In the setting $t \neq n$, we can rewrite Equation (9) accordingly. The probability that $X_{j-1}$ fulfills the bottom equality is still at most $1/(2^n - 2m)$. If $t > n$, the longer tweak cannot increase the probability. If $t < n$, there exist up to $2^{n-t}$ values for $X_{i-1}$ that can fulfill the top equality of Equation (11). The probability is therefore upper bounded by
Good$_3$. In this case, the graphs of $M$ and $M'$ diverge and converge again after two blocks in the $i$-th block, and it holds that $T_i = T'_i$. Here, in the setting $t = n$, we investigate the collision probability. We can assume that $i = i'$ holds for both messages, which means that the graphs of both messages diverge at the $(i-1)$-th block and join again at the $i$-th block. Otherwise, if $r > 2$, we could swap $M$ and $M'$ and would be in case Good$_1$. So, we assume $r = r' = 2$. Since $T_i = T'_i$, in order to allow $X_i = X'_i$, the tweaks must have differed in the $(i-1)$-th block, i.e., $T_{i-1} \neq T'_{i-1}$. Since the outputs of the penultimate blocks, $X_{i-2} = X'_{i-2}$, were equal, this implies that
If the graphs of $M$ and $M'$ were identical until the last block, the tweak checksums would differ: $\Theta \neq \Theta'$. So, as for Good$_2$, there must exist a second divergence phase between the graphs. This means that there must exist some index $j > i$ s.t. either $(U_{j-1}, S_{j-1}) \neq (U'_{j-1}, S'_{j-1})$ or $j > m'$, which leads to Equation (12). Since $i - 1$ and $j - 1$ are distinct and fresh indices, $X_{i-1}$ and $X_{j-1}$ are chosen randomly from sets of at least $2^n - 2m$ elements each. Thus, the probability that they fulfill Equation (12) is upper bounded by $\frac{m^2}{(2^n - 2m)^2}$.
In the setting $t \neq n$, Equation (12) splits into Equations (13) and (14). The probability that $X_{j-1}$ fulfills Equation (13) is still $1/(2^n - 2m)$. If $t > n$, the longer tweak cannot increase the probability. If $t < n$, there exist up to $2^{n-t}$ values for $X_{i-1}$ that can fulfill Equation (14). The probability is therefore upper bounded by
Good$_4$. In this case, the graphs of $M$ and $M'$ diverge at the $(p+1)$-th block and, for one graph, converge again after two blocks, hitting the $(p+1)$-th block of the respective other. We study the sub-case $(U_{p+2}, S_{p+2}) = (U'_{p+1}, S'_{p+1})$; the sub-case $(U'_{p+2}, S'_{p+2}) = (U_{p+1}, S_{p+1})$ follows analogously. Since we assume that the graphs are good, there exist no previous non-trivial collisions. Since $\pi$ is a random permutation, the values $X_p$ and $X_{p+1}$ are randomly chosen from at least $2^n - 2m$ elements each. Thus, the probability that they fulfill Equation (15) is upper bounded by $\frac{m^2}{(2^n - 2m)^2}$.
In the setting $t \neq n$, Equation (15) becomes Equation (16). The probability that $X_{p+1}$ fulfills the bottom equality is at most $1/(2^n - 2m)$. If $t > n$, the longer tweak cannot increase the probability. If $t < n$, there exist up to $2^{n-t}$ values for $X_{i-1}$ that can fulfill the top equality of Equation (16). The probability is therefore upper bounded by
Over all cases of good graphs and $\binom{q}{2}$ query pairs, the probability of a collision is upper bounded by
using again our assumption $\sigma < 2^{n-2}$. Taking the sum over all cases, we derive our bound in Lemma 4.
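The assumption $\sigma < 2^{n-2}$ is what turns the individual case bounds into terms of the form $m^2/2^{n+\min(n,t)}$. The following derivation sketches this simplification for a single case; it assumes $m \le \sigma$, and for $t < n$ it assumes that the case bound gains exactly the factor $2^{n-t}$ from the counting argument above (the exact final expressions are not reproduced here).

```latex
\begin{align*}
\sigma < 2^{n-2},\ m \le \sigma
  &\;\Longrightarrow\; 2^n - 2m > 2^n - 2^{n-1} = 2^{n-1}, \\
t \ge n:\quad \frac{m^2}{(2^n - 2m)^2}
  &< \frac{m^2}{2^{2n-2}} = \frac{4m^2}{2^{2n}}, \\
t < n:\quad \frac{2^{n-t} \cdot m^2}{(2^n - 2m)^2}
  &< \frac{4 \cdot 2^{n-t} \cdot m^2}{2^{2n}} = \frac{4m^2}{2^{n+t}}, \\
\text{both settings:}\quad \text{case bound}
  &< \frac{4m^2}{2^{n+\min(n,t)}}.
\end{align*}
```

Summing such terms over the query pairs yields the $\sigma^2/2^{n+\min(n,t)}$ behavior that caps the security at $n$ bits.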

tAU Analysis
In addition to the collision analysis, we need an upper bound on the probability that two distinct messages yield a collision $X = X'$. This is captured by the following lemma.

Lemma 5 (tAU Upper Bound of DoveHash
Proof. The proof follows from a similar but simpler argument than our collision analysis. We study the probability of events for a single message $M^i$, which we denote by $M$ for simplicity, or the probability of a collision for two distinct messages $M^i$ and $M^j$, which we call $M$ and $M'$ for simplicity. As in the collision analysis, we consider a directed, edge-labeled function graph $G = (V, E, L)$ of our construction. Here, however, we differ from the previous graph by considering permutation outputs. So, the values $(X_i, Y_i)$ represent the vertices of the graph. The set of edges $E$ consists of the transitions between vertices; the set of labels $L$ consists of exactly those input tuples $(T_j, I_j)$. So, we interpret the blocks of a message $M = (M_1, \ldots, M_m)$ as the labels of the graph. Again, we consider walks $v$ and $v'$ associated with $M$ and $M'$, respectively. In the following, we differentiate walks according to non-trivial output collisions, i.e., collisions between two permutation outputs
• $X_i = X'_j$, where $i > \mathrm{LCP}_{t+n}(M, M')$ for some $i \ge 1$ and $j \ge 0$, or
We call the event of an output collision within the same message, $X_i = X_j$ or $X'_i = X'_j$, for some $i \neq j$, an output loop. Moreover, we exclude non-trivial tweak-input collisions from consideration here since their probability has already been studied in the collision analysis. Clearly, their probability is upper bounded by
Instead, we focus on three mutually exclusive cases:
• Bad walks:
− bad$_1$: The partial walk $v$ contains an output loop.
− bad$_2$: The partial walks $v$ and $v'$ contain no output loop but a non-trivial output collision.
• Good walks: The walks $v$ and $v'$ contain no output loops and no non-trivial output collision. We call such walks good walks.
Bad$_1$. In the following, we investigate the probability of an output loop. We stop at the first loop and assume no further non-trivial tweak-input collision. W.l.o.g., we consider the case that the loop is contained in the walk of $M$ and that $m > m'$. For each case, we distinguish between the settings $t = n$, $t > n$, and $t < n$. Assume the setting $t = n$. In this case, we consider the probability $\Pr[X_i = X_j]$.
Since the input has been fresh, the probability that both values are equal is at most $1/(2^n - 2m)$. Over at most $m^2$ possible combinations of blocks, the probability is upper bounded by $\frac{m^2}{2^n - 2m}$.
Note that we compare the non-truncated and non-padded values $X_i$. Moreover, $X_0 \in \mathcal{B}$; so, the probability upper bound also holds in the setting $t \neq n$.
Bad$_2$. Here, we consider non-trivial output collisions between two messages $M$ and $M'$. Again, we study the first non-trivial output collision $X_i = X'_{i'}$, and we have excluded non-trivial state-input collisions and bad walks. So, the input tuple $(U_i, S_i)$ has been fresh. Thus, the probability that two permutation outputs are equal is at most $1/(2^n - 2m)$. Over at most $m^2$ possible combinations of blocks, the probability is upper bounded by $\frac{m^2}{2^n - 2m}$.
Again, we compare the non-truncated and non-padded values $X_i$. Thus, the upper bound also holds in the setting $t \neq n$. So, the probability that a walk is bad is at most
using the assumption that $m < 2^{n-2}$.
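For the bad walks, the same kind of assumption simplifies the two single-denominator terms derived above. A short sketch, assuming that the bound on bad walks is the sum of the bad$_1$ and bad$_2$ terms:

```latex
\begin{align*}
m < 2^{n-2} &\;\Longrightarrow\; 2^n - 2m > 2^n - 2^{n-1} = 2^{n-1}, \\
\frac{m^2}{2^n - 2m} + \frac{m^2}{2^n - 2m}
  &< \frac{2m^2}{2^{n-1}} = \frac{4m^2}{2^n}.
\end{align*}
```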
Good Walks. It remains to study the probability of a collision $X = X'$ given that no output loops or non-trivial output collisions occurred. For a collision at the end, it has to hold that
We study two cases depending on $\Delta\Theta = \Theta \oplus \Theta'$: (1) $\Delta\Theta \neq 0^t$ and (2) $\Delta\Theta = 0^t$.
Case (1): $\Delta\Theta \neq 0^t$. Here, it must hold that
Since we assume no further non-trivial output collisions nor non-trivial tweak-input collisions, the tweak-input tuple $(U_m, S_m)$ was fresh. In the setting $t = n$, the probability is at most
The probability does not increase in the setting $t > n$. In the setting $t < n$, there can exist at most $2^{n-t}$ outputs $X_m$ that lead to the collision. So, the probability is upper bounded by
Case (2): $\Delta\Theta = 0^t$. Here, Equation (17) must hold. We differentiate between our three settings.
In the setting $t = n$, the fact that $X_m \oplus X'_m = 0^n$ must hold implies an output collision. This would contradict our assumption that no non-trivial output collisions have occurred. Thus, in this setting, the case has probability zero. A similar argument holds in the setting $t > n$.
In the setting $t < n$, Equation (17) can hold either due to a non-trivial output collision or because the most significant $t$ bits of two different outputs collided. Only the latter event is relevant and would mean that $X_m \neq X'_m$. Again, together with our assumptions of no further non-trivial output collisions nor non-trivial tweak-input collisions, this implies that the tweak-input tuple $(U_m, S_m)$ was fresh.
In the setting $t < n$, there can exist at most $2^{n-t}$ outputs $X_m$ that lead to the collision on the most significant $t$ bits. Then, the probability is upper bounded by
So, the probability of an output collision when the transcript is good is at most
using the assumption that $m < 2^{n-2}$. From all cases, we obtain our bound in Lemma 5.
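The counting step for $t < n$ can be checked exhaustively for toy parameters. The Python sketch below, with hypothetical helper names, counts the $n$-bit values whose $t$ most significant bits equal a fixed target (always $2^{n-t}$) and evaluates the resulting bound $2^{n-t}/(2^n - 2m)$ for a fresh output that is sampled from at least $2^n - 2m$ remaining values. The toy sizes are for illustration only and are far below realistic parameters.

```python
from fractions import Fraction


def count_msb_matches(n: int, t: int, target: int) -> int:
    """Number of n-bit values X whose t most significant bits equal `target`."""
    assert 0 < t < n and 0 <= target < 2 ** t
    return sum(1 for x in range(2 ** n) if x >> (n - t) == target)


def fresh_output_bound(n: int, t: int, m: int) -> Fraction:
    """Bound 2^(n-t) / (2^n - 2m) for a fresh output hitting a fixed t-bit target."""
    return Fraction(2 ** (n - t), 2 ** n - 2 * m)
```

For $n = 8$, $t = 4$, and $m = 16$, the bound evaluates to $16/224 = 1/14$.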

Instantiation
This section reports on an instantiation of DoveMAC with Skinny-64-128 [BJK+16c]. We provide a rationale, results of an implementation on two common Atmel microcontrollers, and a brief comparison with an implementation of ZMAC1 on the same platforms and with the same primitive [IMPS17].
Choice of a Primitive. DoveMAC benefits from a tweakable block cipher whose tweak length is at least the block size. In contrast to ZHash, DoveHash does not require an additional counter. Moreover, the term $O(\sigma^2 / 2^{n + \min(n, t)})$ in the bound limits the security to at most $n$ bits. So, while larger tweaks would increase the rate, the security does not increase for $t > n$. Hence, we considered performant lightweight tweakable block ciphers for an instantiation with $t = n$. Among the available lightweight primitives, the search focused on Skinny-64-128 [BJK+16c], Joltik-BC-64-192 [JNP14], MANTIS [BJK+16c], and QARMA [Ava17]. We opted for Skinny-64-128 due to its lightweight design and, among other reasons, the availability of implementations for microcontrollers. We give a brief overview for the sake of self-containment.
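The trade-off between rate and security can be illustrated with a few lines of Python, using the rate of $n + t$ bits per call and the security level of $(n + \min(n, t))/2$ bits stated for DoveMAC; `dove_parameters` is a hypothetical helper, not part of any implementation.

```python
def dove_parameters(n: int, t: int) -> dict:
    """Rate and security level of DoveMAC for block length n and tweak length t."""
    return {
        "rate_bits_per_call": n + t,            # message bits absorbed per TBC call
        "security_bits": (n + min(n, t)) / 2,   # saturates at n bits once t >= n
    }


for t in (32, 64, 128):
    print(t, dove_parameters(64, t))
```

For $n = 64$, this yields 48, 64, and 64 bits of security for $t = 32$, 64, and 128, at rates of 96, 128, and 192 bits per call. A tweak length beyond $n$ thus still raises the rate but no longer the security level, which is consistent with choosing $t = n$.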
Definition. Skinny [BJK+16c] is a lightweight tweakable block cipher that employs the TWEAKEY framework [JNP14]. State words and tweakey words are represented by a $4 \times 4$ matrix of cells, where each cell is a nibble for Skinny-64. Key and tweak words are considered together and are processed by a linear update function to produce the round tweakeys. The cipher is a 36-round substitution-permutation network whose round function consists of SubCells (SC), AddConstants (AC), AddRoundTweakey (ART), ShiftRows (SR), and MixColumns (MC). The primitive is optimized for low code size and area. Compared to most earlier lightweight SPNs, Skinny omits an initial key whitening and uses a non-MDS mixing layer that can be implemented by a few simple XORs. Moreover, the round tweakey is XORed only to half of the state, i.e., to the two topmost state rows in each round. The round function and an iteration of the tweakey schedule are illustrated in Figure 3. More details can be found in [BJK+16c].
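To illustrate the remark on the non-MDS mixing layer, the following Python sketch applies Skinny's binary MixColumns matrix to the four rows of the state and checks it against an equivalent sequence of three XORs followed by a row rotation. Each row is packed into one integer (four nibbles for Skinny-64), so a single XOR processes all four columns at once; this packing and the function names are our own choices and are not taken from the referenced implementation [rwe18].

```python
from typing import List

# Skinny's binary MixColumns matrix, applied column-wise to rows r0..r3.
MC = [
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
]


def mix_columns_matrix(rows: List[int]) -> List[int]:
    """Multiply MC with the vector of rows over GF(2), i.e., with XOR as addition."""
    out = []
    for coeffs in MC:
        acc = 0
        for c, row in zip(coeffs, rows):
            if c:
                acc ^= row
        out.append(acc)
    return out


def mix_columns_xor(rows: List[int]) -> List[int]:
    """Same linear map with three XORs and a rotation of the rows."""
    r0, r1, r2, r3 = rows
    r1 ^= r2
    r2 ^= r0
    r3 ^= r2
    return [r3, r0, r1, r2]
```

For example, `mix_columns_xor([0x1, 0x2, 0x4, 0x8])` returns `[13, 1, 6, 5]`, the same result as the matrix form.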
Implementation. We implemented DoveMAC in C on the ATmega 2560 [Atm14] and the ATmega 328p [Atm18], which are common 8-bit RISC microcontrollers operating at 16 MHz. The former has 256 KiB flash memory, 8 KiB SRAM, and 4 KiB EEPROM; the latter has 32 KiB flash memory, 2 KiB SRAM, and 1 KiB EEPROM. As primitive, we employed the public Skinny-64-128 implementation for microcontrollers from [rwe18] referenced by the SKINNY designers' overview [BJK+16a]. Internally, this implementation uses two parallel four-bit S-boxes and precomputes the subkeys.
For comparison, we also implemented ZMAC1 [Nai18b], a successor of ZMAC, with the same primitive on the same platforms. ZMAC is the most intuitive and illustrative choice of a MAC whose rate is comparable to that of our proposal. For fairness, we used its recent, more efficient successor ZMAC1, which spares the separate domains and employs the same finalization as DoveMAC. Hence, the differences between the constructions stem mainly from the hash function; in addition, DoveHash avoids extracting odd tweak portions and can use full 64-bit tweaks per primitive call.
Our instantiation uses two 64-bit keys for both DoveMAC and ZMAC1: one for the hash function and one for the finalization. The results of our comparison are given in Table 2 for message lengths of up to 4 KiB on the ATmega 2560 and of up to 1 KiB on the ATmega 328p. We used avr-gcc as compiler with the -Os option to minimize the code size. Each measurement represents the mean of 1 000 tag computations of hash function and finalization. The storage values in RAM exclude the size of the message and keys. Since the microcontrollers are similar, the storage results were identical and are therefore given only once in Table 2. Note that the storage and performance values depend strongly on the primitive implementation.
For the chosen setup, DoveMAC is about 7 to 12 percent faster than ZMAC1, which is likely caused by the doublings in the latter. After subtracting the message and keys, our implementation of DoveMAC[Skinny-64-128] used 176 bytes of RAM. An implementation of ZMAC1 on the same platform and with the same block-cipher implementation employed 236 bytes of RAM. Our implementation results leave considerable room for reducing the state further toward the theoretical minimal requirements. Still, the differences reflect implementation-specific overheads in ZMAC1, e.g., the masks, as well as temporary variables for the counters. We plan to release the source code into the public domain.
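For orientation, the gap between the measured RAM usage and the theoretical state can be computed from the memory bound of $2n + 2t + 2k$ bits stated for DoveMAC, where the $2k$ term accounts for the two keys. The sketch assumes $n = t = k = 64$, i.e., that the 128-bit tweakey of Skinny-64-128 is split into a 64-bit tweak and a 64-bit key, matching the two 64-bit keys used above; this split and the helper name are our assumptions. Since the measured values exclude the keys, the key-free part $2n + 2t$ is the fairer reference.

```python
def dove_state_bits(n: int, t: int, k: int, include_keys: bool = True) -> int:
    """DoveMAC memory as stated: 2n + 2t + 2k bits, where 2k covers the two keys."""
    return 2 * n + 2 * t + (2 * k if include_keys else 0)


MEASURED_RAM_BYTES = {"DoveMAC": 176, "ZMAC1": 236}  # reported above, without message and keys

total = dove_state_bits(64, 64, 64) // 8                              # with keys
without_keys = dove_state_bits(64, 64, 64, include_keys=False) // 8  # state only
extra = MEASURED_RAM_BYTES["ZMAC1"] - MEASURED_RAM_BYTES["DoveMAC"]
print(total, without_keys, extra)
```

This prints `48 32 60`: the theoretical state is 48 bytes including keys (32 bytes without), whereas ZMAC1 needed 60 bytes, or about 34 percent, more RAM than DoveMAC in the measured setup.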

Conclusion and Future Work
This work proposed a sequential TBC-based PRF that reduces the memory requirements of ZMAC, in the spirit of the 3kf9 and NI+-MAC designs, at a rate of $t + n$ bits per primitive call. Our construction is the first sequential block-cipher-based proposal that processes more message bits per call than the state size of the primitive while providing $O((n + \min(n, t))/2)$ bits of security. Moreover, we derived a single-key variant, DoveMAC1k, that spares the second key of DoveMAC by reserving a bit of the tweak space to separate the primitive used for hashing from that used for the finalization.
Future Work. DoveMAC can easily be combined with a small-state encryption mode to form an AE scheme. There exist several options: (1) an on-line scheme that uses the PRF only for authenticating the associated data, (2) an off-line nonce-based AE scheme, or (3) a deterministic off-line AE scheme. For all variants, the nonce-IV-based or the purely IV-based versions of CTRT [PS16] would allow reusing the already available TBC efficiently. We outline the second option in Appendix C. The third option would require a longer IV of at least $t + n$ bits to benefit from the high security guarantees of DoveMAC, and thus a longer output from the PRF. We briefly show in Section A why it is not straightforward to derive longer outputs from DoveHash and ZFin+ with high security. So, deriving a highly secure deterministic AE scheme remains future work. However, the focus of the current work was a highly secure fixed-output-length PRF that also uses the tweak for message absorption and reduces the state compared to previous ZMAC-like variants.
3. If there exists any collision $Z^i = Z^j$ in $L$, for $i \neq j$, generate $M^{q+1} = M^i \,\|\, P$, where $P$ is an arbitrary postfix. Ask for its tag $Z^{q+1}$.
The forgery is valid with high probability, namely if

A.3 An Insecure Single-key Variant of DoveMAC
This section illustrates an $O(1)$ forgery attack if DoveMAC used the same keyed primitive $\widetilde{E}_K$ for both the hash function and the finalization. All other aspects of the original definition in Section 3 remain unchanged. The bottom part of Figure 4 visualizes this scheme.
The forgery attack exploits a classical length extension. For simplicity, we consider the case $t = n$ and specify the messages after the $10^*$ padding has been applied. We provide the messages as tuples of blocks $M^i_j = (T^i_j, I^i_j)$, where $i$ denotes the message index and $j$ the block index. Then, the steps are as follows:
1. Choose $M^1 = ((0^n, 0^n), (10^{n-1}, 0^n))$. This implies that the checksum is $\Theta^1 = 10^{n-1}$. Ask for the corresponding authentication tag
Here, the padding of $M^2$ is located in the third block. This implies that the checksum is $\Theta^2 = 0^n$. Ask for the corresponding authentication tag
Definition 11 (nivE Advantage). Let $\Pi = (\mathcal{E}, \mathcal{D})$ be a nonce-IV-based encryption scheme with signatures and assumptions as above. Let $K \leftarrow_{\$} \mathcal{K}$ and let $A$ be an adversary against $\Pi$ with access to an oracle, s.
CTRT is a variant of the well-known Counter mode, where the nonce is used as input to all calls of $\widetilde{E}_K$ and the tweak $V_i$ is derived from $V$. CTRT defines $\mathcal{IV} = \mathcal{T}$ and uses an increment function $\mathrm{inc}: \mathcal{T} \to \mathcal{T}$ that increments the value of a given $V$. We define $\mathrm{inc}^i(V) = V \oplus \langle i - 1 \rangle_t$ for all $i \ge 1$. Note that decryption is similar to encryption; it differs only in that the decryption oracle does not choose IVs. The following security statement is part of Theorem 1 in the full version of [PS16] for nonce-respecting adversaries.
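The structure of CTRT described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tweakable block cipher is passed in as a hypothetical callable `tbc(key, tweak, block)` on integers (any real TBC such as Skinny would be plugged in there), the message consists of full $n$-bit blocks (the truncation of a partial last block is omitted), and the helper names are ours. It shows the two defining features: the nonce is the block input of every call, and the tweaks follow $V_i = V \oplus \langle i - 1 \rangle_t$.

```python
from typing import Callable, List

# Hypothetical TBC interface: tbc(key, tweak, block) -> block, all as integers.
TBC = Callable[[bytes, int, int], int]


def ctrt_tweaks(v: int, m: int, t: int) -> List[int]:
    """Tweaks V_i = inc^i(V) = V xor <i-1>_t for i = 1, ..., m."""
    assert 0 <= v < 2 ** t and m <= 2 ** t, "V and the counter must fit into t bits"
    return [v ^ (i - 1) for i in range(1, m + 1)]


def ctrt_encrypt(tbc: TBC, key: bytes, nonce: int, v: int,
                 blocks: List[int], t: int) -> List[int]:
    """C_i = M_i xor E_K^{V_i}(N): the nonce is the block input of every call."""
    tweaks = ctrt_tweaks(v, len(blocks), t)
    return [mi ^ tbc(key, vi, nonce) for mi, vi in zip(blocks, tweaks)]


# Decryption XORs the same keystream again, so it is the same function.
ctrt_decrypt = ctrt_encrypt
```

For example, `ctrt_tweaks(0b1010, 4, 4)` yields the tweaks `[10, 11, 8, 9]`. Since only the forward direction of the TBC is used, decryption needs no inverse.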

Theorem 3 (nivE Security of CTRT [PS16]). Let $A$ be a nonce-respecting nivE adversary against CTRT[$\widetilde{E}$] that asks at most $q$ queries of $\sigma$ blocks. Let $K \leftarrow_{\$} \mathcal{K}$. Then
where $A'$ asks at most $\sigma$ queries.
Definition 12 (Priv Advantage). Let $\rho \leftarrow_{\$} \mathrm{Func}(\mathcal{N} \times \mathcal{A} \times \mathcal{M}, \mathcal{C} \times \mathcal{U})$ be length-preserving and $K \leftarrow_{\$} \mathcal{K}$. Let $A$ be an adversary on $\Pi$ that does not repeat inputs $N \in \mathcal{N}$. Then, the Priv advantage of $A$ w.r.t. $\Pi$ is defined as $\mathrm{Adv}^{\mathrm{Priv}}_{\Pi}(A) \overset{\mathrm{def}}{=} \Delta_A(\mathcal{E}_K; \rho)$.
Definition 13 (Auth Advantage). Let $A$ be an adversary on $\Pi$ that is given access to an encryption and a decryption oracle, where $K \leftarrow_{\$} \mathcal{K}$ is a random secret, $A$ does not repeat inputs $N \in \mathcal{N}$ to the encryption oracle, and $A$ does not ask responses from the encryption oracle to the decryption oracle. Then, the Auth advantage of $A$, denoted by $\mathrm{Adv}^{\mathrm{Auth}}_{\Pi}(A)$, is defined as the probability that $A$ successfully forges a valid ciphertext that is accepted by the decryption.
Let $A$ be a nonce-respecting adversary on $\Pi$ that is given access to an encryption and a decryption oracle, where $K \leftarrow_{\$} \mathcal{K}$ is a random secret, and $A$ does not ask queries that it received as responses from the encryption oracle to the decryption oracle. Then, the nAE advantage of $A$ is defined as $\mathrm{Adv}^{\mathrm{nAE}}_{\Pi}(A) \overset{\mathrm{def}}{=} \Delta_A(\mathcal{E}_K, \mathcal{D}_K; \rho, \bot)$, where $\bot: \mathcal{N} \times \mathcal{A} \times \mathcal{M} \times \mathcal{U} \to \{\bot\}$ returns the invalid symbol for every input.

Nonce
We call $A$ a $(q_e, q_d, \sigma)$-nAE adversary if $A$ asks at most $q_e$ encryption queries and at most $q_d$ decryption queries that consist of at most $\sigma$ blocks in total.
Fact 1. Given a nonce-respecting $(q_e, q_d, \sigma)$-nAE adversary $A$ on an nAE scheme $\Pi$, there exists a nonce-repeating $(q_e, q_d, \sigma)$-AE adversary $A'$ on $\Pi$ s.t. $\mathrm{Adv}^{\mathrm{nAE}}_{\Pi}(A) \le \mathrm{Adv}^{\mathrm{AE}}_{\Pi}(A')$. Fact 1 follows from the trivial observation that any nonce-respecting adversary $A$ is also a nonce-repeating adversary $A'$, and the latter can simply copy the exact behavior of $A$ to inherit $A$'s advantage. It is well known that, if $A$ is a $(q, q_d, \sigma)$-nAE adversary on $\Pi$, then it holds that
for any $(q, \sigma)$-Priv adversary $A'$ and $(q, q_d, \sigma)$-Auth adversary $A''$.
NSIV. NSIV[$F$, $\Pi$] is a SIV-like nonce-based off-line AEAD scheme by [PS16] based on a nonce-based function $F: \mathcal{K}_1 \times \mathcal{N} \times \mathcal{A} \times \mathcal{M} \to \mathcal{U}$ and a nonce-IV-based encryption scheme $\Pi = (\mathcal{E}, \mathcal{D})$ with key space $\mathcal{K}_2$ and $\mathcal{E}: \mathcal{K}_2 \times \mathcal{N} \times \mathcal{IV} \times \{0,1\}^* \to \{0,1\}^*$. The scheme defines a regular function $\mathrm{Conv}: \mathcal{U} \to \mathcal{IV}$ that converts the output of $F$ into an IV for $\Pi$. First, nonce, associated data, and message are processed by $F$ to produce a tag $T \leftarrow F_{K_1}(N, A, M)$. Then, the scheme computes $V \leftarrow \mathrm{Conv}(T)$ and encrypts the message $M$ to $C \leftarrow \mathcal{E}_{K_2}(N, V, M)$. The output is $(C, T)$. The following theorem is slightly adapted from Theorem 4a) in [PS16].
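The NSIV composition can be sketched in Python to show the data flow $T \leftarrow F_{K_1}(N, A, M)$, $V \leftarrow \mathrm{Conv}(T)$, $C \leftarrow \mathcal{E}_{K_2}(N, V, M)$ and the tag check on decryption. To keep the sketch self-contained and runnable, we swap in stand-ins: HMAC-SHA256 from Python's standard library plays the role of $F$ (it is not DoveMAC), truncation to 16 bytes plays the role of the regular map Conv, and an HMAC-based counter keystream plays the role of the nonce-IV-based encryption scheme (it is not CTRT). All function names are hypothetical; the sketch illustrates only the generic composition, not its security bound.

```python
import hashlib
import hmac
from typing import Optional, Tuple


def prf(key: bytes, nonce: bytes, ad: bytes, msg: bytes) -> bytes:
    """Stand-in for F_{K1}(N, A, M): HMAC-SHA256 over length-prefixed inputs."""
    data = b"".join(len(x).to_bytes(8, "big") + x for x in (nonce, ad, msg))
    return hmac.new(key, data, hashlib.sha256).digest()


def conv(tag: bytes) -> bytes:
    """Stand-in for the regular map Conv: U -> IV (truncation to 16 bytes)."""
    return tag[:16]


def enc(key: bytes, nonce: bytes, iv: bytes, data: bytes) -> bytes:
    """Stand-in for E_{K2}(N, V, .): XOR with an HMAC-based counter keystream."""
    out = bytearray()
    for ctr, off in enumerate(range(0, len(data), 32)):
        pad = hmac.new(key, nonce + iv + ctr.to_bytes(8, "big"), hashlib.sha256).digest()
        out += bytes(a ^ b for a, b in zip(data[off:off + 32], pad))
    return bytes(out)


def nsiv_encrypt(k1: bytes, k2: bytes, nonce: bytes, ad: bytes,
                 msg: bytes) -> Tuple[bytes, bytes]:
    tag = prf(k1, nonce, ad, msg)          # T <- F_K1(N, A, M)
    ct = enc(k2, nonce, conv(tag), msg)    # C <- E_K2(N, Conv(T), M)
    return ct, tag


def nsiv_decrypt(k1: bytes, k2: bytes, nonce: bytes, ad: bytes,
                 ct: bytes, tag: bytes) -> Optional[bytes]:
    msg = enc(k2, nonce, conv(tag), ct)    # keystream mode: decryption = encryption
    ok = hmac.compare_digest(prf(k1, nonce, ad, msg), tag)
    return msg if ok else None             # None plays the role of the symbol ⊥
```

Decryption recomputes the tag from the recovered message and rejects on mismatch, which is where the PRF's authenticity carries over to the AE scheme.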
Theorem 4 (nAE Advantage of NSIV). Let $F$ and $\Pi$ have signatures as above, and let Conv be a regular function. Let $(K_1, K_2) \leftarrow_{\$} \mathcal{K}_1 \times \mathcal{K}_2$. Let $A$ be a nonce-respecting nAE adversary against NSIV[$F$, $\Pi$] with access to two oracles such that $A$ never queries outputs from its first (encryption) oracle to its second (verification) oracle. $A$ asks at most $q$ queries to its available oracles, consisting of at most $m$ blocks each and $\sigma$ blocks in total. Then
where $A'$ and $A''$ ask at most $q$ queries of at most $\sigma$ blocks in total and run in time at most $O(\sigma)$.
Remark 2. Theorem 4 in [PS16] contains separate inequalities for nonce-repeating and nonce-respecting adversaries. The bound above is equivalent to the AE security statement for NSIV[$F$, $\Pi$]; in particular, it also holds for nonce-repeating adversaries. The separation of statements in [PS16] was necessary since Theorem 4b) provided a significantly better bound in nonce-respecting settings for the choice of Counter-in-Tweak and the Encrypted Parallel Wegman-Carter MAC. However, since the security of DoveMAC does not depend on nonces, we can work with the nonce-repeating bound hereafter.
Theorem 5 (nAE Advantage of DoveSIV). Let $F$ and $\Pi$ have signatures as above, and let Conv be given as in Algorithm 2. Let $(K_1, K_2, K_3) \leftarrow_{\$} \mathcal{K}^3$ be independent. Let $A$ be a nonce-respecting nAE adversary against DoveSIV[$F$, $\Pi$] with access to two oracles such that $A$ never queries outputs from its first (encryption) oracle to its second (verification) oracle. Moreover, $A$ asks at most $q$ queries whose concatenated lengths of associated data and messages consist of at most $m$ $(t + n)$-bit blocks each and at most $\sigma < 2^{n-4}$ $(t + n)$-bit

Figure 1: Hashing with DoveHash (left) and finalization with ZFin+ (right) of a message $M = (M_1, M_2, M_3)$ with three (t + n)-bit blocks with DoveMAC[$\widetilde{E}_{K_1,K_2}$]. Each block $M_i$ is split into a t-bit part $T_i$ and an n-bit part $I_i$. $\Theta = \bigoplus_{i=1}^{m} T_i$ denotes a checksum over all tweak input parts.
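The split of each block and the checksum Θ described in the Figure 1 caption can be illustrated as follows. The sketch assumes t = n = 64, represents blocks as Python integers, and places $T_i$ in the most significant t bits; these choices and the function names are assumptions, and the DoveHash chaining itself is not modeled.

```python
from functools import reduce
from operator import xor

T_BITS = 64  # tweak length t (assumption)
N_BITS = 64  # state size n (assumption)


def split_block(block: int) -> tuple[int, int]:
    """Split a (t+n)-bit block M_i into its t-bit part T_i and n-bit part I_i.

    Assumption: T_i occupies the most significant t bits.
    """
    assert 0 <= block < 1 << (T_BITS + N_BITS)
    return block >> N_BITS, block & ((1 << N_BITS) - 1)


def tweak_checksum(blocks: list[int]) -> int:
    """Theta = T_1 xor ... xor T_m over the tweak parts of all blocks."""
    return reduce(xor, (split_block(b)[0] for b in blocks), 0)
```

For three blocks whose tweak parts are 0x1, 0x3, and 0x4, the checksum is 0x1 ⊕ 0x3 ⊕ 0x4 = 0x6.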

Figure 2: Structure graphs that visualize the bad graphs of message M (top) and the good structure graphs of messages M and M′ (bottom).

Figure 4: Earlier, less secure variants of DoveHash and DoveMAC. Top: Variant without the final checksum. Middle: Variant where the lower input is XORed directly into the bottom lane. Bottom: Single-key variant of DoveMAC.

Figure 5: Encryption of a message $M = (M_1, \ldots, M_m)$ with the nonce-IV-based variant of Counter-in-Tweak [PS16] under nonce N and IV T. $\mathrm{Conv} : \mathcal{IV} \to \mathcal{T}$ is a regular function, and $\mathrm{inc} : \mathcal{T} \to \mathcal{T}$ increments the current value of the tweak inputs such that $V_i = V \oplus \langle i - 1 \rangle_t$ for $1 \le i \le m$.
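The XOR-based tweak increment of Figure 5 and the resulting keystream can be sketched as below. The sketch assumes t = n = 64 and that each keystream block is the TBC output on the nonce under tweak $V_i$ (as in SCT-style Counter-in-Tweak); `toy_tbc` is a hypothetical HMAC-based stand-in that is not a permutation, which suffices here because counter-style encryption only evaluates the primitive in the forward direction.

```python
import hashlib
import hmac

T_BITS = 64   # tweak length t (assumption)
N_BYTES = 8   # block size n = 64 bits (assumption)


def tweak_sequence(v: int, m: int) -> list[int]:
    """V_i = V xor <i-1>_t for 1 <= i <= m (XOR-based increment)."""
    mask = (1 << T_BITS) - 1
    return [(v ^ (i - 1)) & mask for i in range(1, m + 1)]


def toy_tbc(key: bytes, tweak: int, block: bytes) -> bytes:
    """Hypothetical stand-in for E~_K^tweak(block); HMAC is NOT a permutation."""
    data = tweak.to_bytes(T_BITS // 8, "big") + block
    return hmac.new(key, data, hashlib.sha256).digest()[:N_BYTES]


def ctrt_encrypt(key: bytes, nonce: bytes, v: int, msg: bytes) -> bytes:
    """Counter-in-Tweak sketch: C_i = M_i xor E~_K^{V_i}(N); the last block may be partial."""
    m = -(-len(msg) // N_BYTES)  # number of n-bit blocks, rounded up
    out = bytearray()
    for i, vi in enumerate(tweak_sequence(v, m)):
        ks = toy_tbc(key, vi, nonce)
        chunk = msg[i * N_BYTES:(i + 1) * N_BYTES]
        out += bytes(a ^ b for a, b in zip(chunk, ks))
    return bytes(out)
```

Because $V_i = V \oplus \langle i-1 \rangle_t$, the m tweaks are pairwise distinct whenever $m \le 2^t$, and no carry propagation is needed. Decryption is the same operation, since the keystream is XORed.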
t. A never repeats a nonce N. Let $\rho \xleftarrow{\$} \mathrm{Func}(\mathcal{N} \times \mathcal{IV} \times \mathcal{M}, \mathcal{C})$ be a random length-preserving function. For each input $(N, M) \in \mathcal{N} \times \mathcal{M}$, the oracle samples $IV \xleftarrow{\$} \mathcal{IV}$ and outputs (IV, C), where $C = E_K(N, IV, M)$ in the real world and $C = \rho(N, IV, M)$ in the ideal world. In neither world is the IV chosen by A. Then $\mathrm{Adv}^{\mathrm{nivE}}_{\Pi}(A) \stackrel{\mathrm{def}}{=} \Delta_A(E_K; \rho)$.

Counter-in-Tweak. The nonce-IV-based version of Counter-in-Tweak takes as input tuples $(N, V, M) \in \mathcal{N} \times \mathcal{IV} \times \mathcal{M}$ and produces a ciphertext C of the same length as the message: |C| = |M| for all keys $K \in \mathcal{K}$ and inputs (N, V, M). Note that $V \xleftarrow{\$} \mathcal{IV}$ is chosen by the oracle in the nivE model.
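The ideal world of the nivE experiment can be simulated by lazily sampling ρ. This sketch is illustrative only: the class name, the 8-byte IV length, and the choice to raise an error on nonce reuse are assumptions.

```python
import secrets


class IdealNivEOracle:
    """Ideal world of the nivE game: (N, M) -> (IV, rho(N, IV, M)).

    rho is a random length-preserving function, sampled lazily: each fresh
    input (N, IV, M) gets |M| uniformly random bytes, and repeated inputs are
    answered consistently from a table.
    """

    def __init__(self, iv_bytes: int = 8):
        self.iv_bytes = iv_bytes
        self.table: dict[tuple[bytes, bytes, bytes], bytes] = {}
        self.used_nonces: set[bytes] = set()

    def rho(self, nonce: bytes, iv: bytes, msg: bytes) -> bytes:
        key = (nonce, iv, msg)
        if key not in self.table:
            self.table[key] = secrets.token_bytes(len(msg))  # length-preserving
        return self.table[key]

    def query(self, nonce: bytes, msg: bytes) -> tuple[bytes, bytes]:
        if nonce in self.used_nonces:
            raise ValueError("nonce repeated: adversary must be nonce-respecting")
        self.used_nonces.add(nonce)
        iv = secrets.token_bytes(self.iv_bytes)  # IV sampled by the oracle, not by A
        return iv, self.rho(nonce, iv, msg)
```

The real world would replace `rho` by $E_K(N, IV, M)$ while keeping the same oracle-side IV sampling.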
Nonce-based Authenticated Encryption. Let $\mathcal{A}$ and $\mathcal{U}$ denote non-empty sets of associated data and authentication tags, respectively. A nonce-based authenticated encryption scheme (with associated data) Π = (E, D) is a tuple of a deterministic encryption algorithm $E : \mathcal{K} \times \mathcal{N} \times \mathcal{A} \times \mathcal{M} \to \mathcal{C} \times \mathcal{U}$ and a deterministic decryption algorithm $D : \mathcal{K} \times \mathcal{N} \times \mathcal{A} \times \mathcal{C} \times \mathcal{U} \to \mathcal{M} \cup \{\bot\}$ with associated key space $\mathcal{K}$. The associated data is authenticated but not encrypted. The decryption function D takes a tuple (N, A, C, T) and outputs either M, or ⊥ if the input is invalid. Again, we assume $\mathcal{C} = \mathcal{M} \subseteq \{0,1\}^*$, as well as correctness and length preservation. Note that [NRS14, Rog04] combine ciphertext and tag in a single entity C. However, it is more natural to consider two separate entities: the ciphertext C and the authentication tag T. Hereafter, we briefly recap the privacy and authenticity notions for nonce-based authenticated encryption.
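To make the (E, D) signatures concrete, the sketch below implements an SIV-style scheme that returns ciphertext and tag as two separate values and uses `None` in place of ⊥. HMAC-SHA256 and an XOR keystream are stand-ins chosen for illustration; this is not DoveSIV, and all names and sizes are hypothetical.

```python
import hashlib
import hmac
from typing import Optional

TAG_BYTES, IV_BYTES = 16, 8  # toy sizes (assumptions)


def _prf(key: bytes, *parts: bytes) -> bytes:
    """Stand-in PRF (HMAC-SHA256) over an injective, length-prefixed encoding."""
    data = b"".join(len(p).to_bytes(8, "big") + p for p in parts)
    return hmac.new(key, data, hashlib.sha256).digest()


def _xor_stream(key: bytes, nonce: bytes, iv: bytes, data: bytes) -> bytes:
    """Stand-in nonce-IV-based encryption; XOR keystream, so it is its own inverse."""
    out = bytearray()
    for off in range(0, len(data), 32):
        ks = _prf(key, nonce, iv, off.to_bytes(8, "big"))
        out += bytes(a ^ b for a, b in zip(data[off:off + 32], ks))
    return bytes(out)


def encrypt(k1: bytes, k2: bytes, nonce: bytes, ad: bytes, msg: bytes) -> tuple[bytes, bytes]:
    """E: K x N x A x M -> C x U with |C| = |M|."""
    tag = _prf(k1, nonce, ad, msg)[:TAG_BYTES]
    return _xor_stream(k2, nonce, tag[:IV_BYTES], msg), tag


def decrypt(k1: bytes, k2: bytes, nonce: bytes, ad: bytes, ct: bytes, tag: bytes) -> Optional[bytes]:
    """D: K x N x A x C x U -> M or None (None plays the role of ⊥)."""
    msg = _xor_stream(k2, nonce, tag[:IV_BYTES], ct)
    expected = _prf(k1, nonce, ad, msg)[:TAG_BYTES]
    return msg if hmac.compare_digest(expected, tag) else None
```

Decryption first recovers a candidate message and then recomputes the tag, so a modified A, C, or T is rejected; `hmac.compare_digest` avoids an early-exit comparison of the tags.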

Let $K_1, K_2, K_3 \in \mathcal{K}$ be independent keys, $\mathcal{N} = \{0,1\}^n$, and $\mathcal{T} = \{0,1\}^t$. We define DoveSIV as an instantiation of NSIV[F, Π], where the PRF F is instantiated with DoveMAC[$\widetilde{E}_{K_1}, \widetilde{E}_{K_2}$] and the encryption scheme Π with CTRT[$\widetilde{E}_{K_3}$]. Hereafter, we write DoveMAC[$\widetilde{E}_{K_1,K_2}$] as a short form of DoveMAC[$\widetilde{E}_{K_1}, \widetilde{E}_{K_2}$]. In more detail, we define an injective function $\mathrm{Encode}_x : \mathcal{N} \times \mathcal{A} \times \mathcal{M} \to (\{0,1\}^x)^*$ that maps nonce, associated data, and message into the block-wise format of DoveMAC: $\mathbf{M} = \mathrm{Encode}_{t+n}(N, A, M)$. The encryption scheme is CTRT[$\widetilde{E}_{K_3}$] with XOR-based increment of the tweaks. The conversion function $\mathrm{Conv} : \{0,1\}^n \to \mathcal{T}$ maps the n-bit IV to a t-bit tweak for $\widetilde{E}$ by discarding its n − t least significant bits. Now, we can combine Theorems 3 and 4 on the security of CTRT and NSIV, respectively, with Theorem 1, which quantifies the PRF security of DoveMAC.
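The two DoveSIV-specific ingredients, an injective $\mathrm{Encode}_{t+n}$ and the truncating Conv, might look as follows. The text only requires Encode to be injective; the concrete layout below (fixed-length nonce, 64-bit length of A, then A, M, and 10* padding) is a hypothetical choice, as are the parameters n = 128 and t = 64 (truncation needs t ≤ n).

```python
N_BITS, T_BITS = 128, 64  # state size n and tweak length t (assumptions, t <= n)
BLOCK_BYTES = (T_BITS + N_BITS) // 8


def encode(nonce: bytes, ad: bytes, msg: bytes) -> list[bytes]:
    """Hypothetical injective Encode_{t+n}: (N, A, M) -> list of (t+n)-bit blocks.

    Layout: N || <|A|>_64 || A || M || 10*, with the 10* padding filling up to a
    multiple of t+n bits. N has fixed length n, |A| is explicit, and the 10*
    padding is uniquely removable, so the map is injective.
    """
    assert len(nonce) == N_BITS // 8
    data = nonce + len(ad).to_bytes(8, "big") + ad + msg + b"\x80"
    data += b"\x00" * (-len(data) % BLOCK_BYTES)
    return [data[i:i + BLOCK_BYTES] for i in range(0, len(data), BLOCK_BYTES)]


def conv(tag: int) -> int:
    """Conv: {0,1}^n -> T; keep the t most significant bits of the n-bit tag."""
    assert 0 <= tag < 1 << N_BITS
    return tag >> (N_BITS - T_BITS)
```

Truncation is regular: every t-bit tweak has exactly $2^{n-t}$ preimages.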
… [JNP16b], which differ from each other in whether they use only the plaintext input, only the tweak input, or both inputs during the finalization. The same work developed ZMAC1, which avoids the tweak-based domain separation in the hash function at a limited loss of security. However, his variants have the same memory requirements as ZMAC.²

² EPWC and PMAC_TBC1k avoid the need for input masks. The former [PS16] requires a nonce for beyond-birthday security and 2n + t + k bits; its variant as used in SCT-2 [JNP16b] can append the nonce that is used in the finalization to the message and thus uses the same amount of memory. However, since we consider deterministic authentication, Table 1 contains only the nonce-ignoring variant of EPWC, which guarantees only birthday-bound security. Finally, PMAC_TBC1k and PMAC_TBC3k require …

… [Nai18b] | 1 | 4n + 2t + k | $O(\sigma^2 / 2^{n+\min(n,t)} + q/2^n)$ | t + n | 1
DoveMAC [This work] | 2 | 2n + 2t + 2k | $O(q^2 m^2 / 2^{n+\min(n,t)} + q/2^n)$ | t + n | 1
DoveMAC1k [This work] | 1 | 2n + 2t + k | $O(q^2 m^2 / 2^{n+\min(n,t-1)} + q/2^n)$ | t + n − 1 | 1

… LightMAC_Plus; it spares four bits due to its (n − 2)-bit accumulators. The generalized variant of LightMAC_Plus, LightMAC_Plus2 [Nai18a], requires 4n + k bits in its finalization; PMAC w/ Parity [Yas12] needs r masks plus n bits each for the current block and the accumulator.

Parallel TBC-based MACs. The situation is similar for existing TBC-based MACs. ZMAC needs 3n + t + k bits for input, masks, and key, plus n + t bits for the accumulators. Recently, Naito [Nai18b] investigated variants of ZMAC called ZMACb, ZMACt, and …

The proof of Theorem 1 follows from Theorem 2. The latter uses Lemmas 4 and 5 to derive concrete bounds. Theorem 2 is similar to [LN17, Corollary 1].
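The memory formulas quoted above (ZMAC: 3n + t + k plus n + t bits; DoveMAC and DoveMAC1k from their Table 1 rows), together with the security level (n + min(n, t))/2 and the rate of t + n bits per call, can be evaluated for concrete parameters. The sketch simply restates those formulas, using t − 1 in place of t for DoveMAC1k as in its Table 1 row; the function names and example parameters are illustrative.

```python
def memory_bits(n: int, t: int, k: int) -> dict[str, int]:
    """State memory in bits, restating the formulas from the text and Table 1."""
    return {
        "ZMAC": (3 * n + t + k) + (n + t),  # input, masks, key + accumulators
        "DoveMAC": 2 * n + 2 * t + 2 * k,
        "DoveMAC1k": 2 * n + 2 * t + k,
    }


def security_bits(n: int, t: int, single_key: bool = False) -> float:
    """Bits of security (n + min(n, t')) / 2, with t' = t - 1 for DoveMAC1k."""
    t_eff = t - 1 if single_key else t
    return (n + min(n, t_eff)) / 2


def rate_bits(n: int, t: int, single_key: bool = False) -> int:
    """Message bits per primitive call: t + n (t + n - 1 for DoveMAC1k)."""
    return t + n - (1 if single_key else 0)


print(memory_bits(128, 128, 128))
print(security_bits(64, 64), security_bits(64, 64, single_key=True))
```

For n = t = k = 128, DoveMAC needs 768 bits instead of ZMAC's 896, and DoveMAC1k needs 640 bits; for n = t = 64, the security level is 64 bits for DoveMAC and 63.5 bits for DoveMAC1k.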
Theorem 2 (PRF Security of HtTBC). Let H be short for DoveHash[$\widetilde{\pi}$]. Assume that the collision probability over all messages $\mathrm{Coll}_{H[\widetilde{\pi}]}(t + n, q, m, \sigma)$ is upper bounded by $\epsilon_1$, and that H is $(t, n, \epsilon_2)$-tAU. Let A be a PRF adversary against HtTBC[$\widetilde{\pi}$, H] that makes at most q queries, each consisting of at most m (t + n)-bit blocks after padding, that sum to at most σ (t + n)-bit blocks in total. Then

Table 2: Rounded throughputs in cycles/byte and RAM storage in bytes of our implementations on Atmel microcontrollers.