Dumbo, Jumbo, and Delirium: Parallel Authenticated Encryption for the Lightweight Circus

Abstract. With the trend to connect more and more devices to the Internet, authenticated encryption has become a major backbone in securing the communication, not only between these devices and servers, but also the direct communication among these devices. Most authenticated encryption algorithms used in practice are developed to perform well on modern high-end devices, but are not necessarily suited for usage on resource-constrained devices. We present a lightweight authenticated encryption scheme, called Elephant. Elephant retains the advantages of GCM such as parallelism, but is tailored to the needs of resource-constrained devices. The two smallest instances of Elephant, Dumbo and Jumbo, are based on the 160-bit and 176-bit Spongent permutation, respectively, and are particularly suited for hardware; the largest instance of Elephant, Delirium, is based on 200-bit Keccak and is developed towards software use. All three instances are parallelizable, have a small state size while achieving a high level of security, and are constant time by design.


Introduction
Authenticated encryption has become an integral part of our modern communication infrastructure. Considering the rise of the Internet of Things, its usage will not only expand, but authenticated encryption algorithms will also be required to run on resource-constrained devices. Many modern cryptographic protocols like TLS [Res18] or the Signal protocol [PM16, CCD+17] rely at their core on authenticated encryption. For instance, TLS 1.3 [Res18] relies on AES-GCM or ChaCha20 with Poly1305, whereas in the Signal protocol [PM16, CCD+17], the task of authenticated encryption can be performed using AES in CBC mode for encryption paired with HMAC-SHA-2 for authentication. While the performance of these constructions may be sufficient on modern high-end systems, they have some inherent drawbacks when used in lightweight systems.
A first drawback is the use of components such as the AES [DR02], ChaCha [Ber08], and SHA-2 [FIP12], which were not designed with lightweight applications in mind. Moreover, ChaCha and SHA-2 make extensive use of modular additions, which is not the best choice for lightweight hardware implementations. A second problem is the need to implement two different primitives (one for encryption and one for authentication) to perform the single task of authenticated encryption, which is a potential waste of resources in lightweight applications. This is still true if the primitives within these constructions are replaced with more lightweight counterparts. Furthermore, the usage of lightweight 64-bit block ciphers in the aforementioned modes implies stringent restrictions on the amount of data that can be safely encrypted [BL16, LS18]. The need for authenticated encryption schemes that perform well on resource-constrained devices has recently been addressed by NIST's call for lightweight authenticated encryption schemes [Nat18]. The call specifies a request for authenticated encryption schemes having at least 112-bit security provided that the online complexity is at most around $2^{50}$ bytes.
To provide an alternative for lightweight applications, we introduce the authenticated encryption scheme Elephant. The mode of Elephant is a nonce-based encrypt-then-MAC construction, where encryption is performed using counter mode and message authentication using a variant of the Wegman-Carter-Shoup MAC [WC81, Sho96, Ber05]. Both modes use a cryptographic permutation masked using LFSRs, akin to the masked Even-Mansour construction of Granger et al. [GJMN16].
The mode is permutation-based and only evaluates this permutation in the forward direction. As such, there is no need to implement multiple primitives or the inverse of the primitive, unlike in OCB-based [RBBK01, Rog04, KR11] authenticated encryption schemes. Furthermore, this allows us to rely and build on the extensive literature of permutations used for sponge-based lightweight hashing [AHMN10, GPP11, BKL+11]. That said, Elephant itself is not sponge-based: on the contrary, it departs from the conventional approach of serial permutation-based authenticated encryption. Elephant is parallelizable by design, easy to implement due to the use of LFSRs for masking (no need for finite field multiplication), and finally, it is efficient due to elegant decisions on how exactly the masking should be performed. A security analysis in the ideal permutation model demonstrates that the mode of Elephant is structurally sound.
Due to the parallelizability of Elephant, there is no need for instances with a large permutation: we can go as small as 160-bit permutations while still matching the security goals recommended by the NIST lightweight call [Nat18]. In detail, the Elephant scheme consists of three instances:
1. Dumbo: Elephant-Spongent-π[160]. This instance meets the minimum permutation size as dictated by the security analysis: it achieves 112-bit security provided that the online complexity is at most around $2^{46}$ blocks. This instance is particularly well-suited for hardware, as Spongent [BKL+11] itself is;
2. Jumbo: Elephant-Spongent-π[176]. This is a slightly more conservative instance of Elephant: it is based on the same permutation family, yet achieves 127-bit security under the same conditions on the online complexity. We note, in particular, that Spongent-π[176] is ISO/IEC standardized [BKL+11, ISO16];
3. Delirium: Elephant-Keccak-f[200]. This variant is developed more towards software use, although it still performs reasonably well in hardware. Elephant instantiated with Keccak-f[200] also achieves 127-bit security, with a higher bound of around $2^{70}$ blocks on the online complexity. The permutation is the smallest instance specified in the NIST SHA-3 standard [BDPV11b, FIP15] that fits our needs.
Dumbo and Jumbo are named after two famous elephants; Delirium is named after a Belgian beer, whose logo is a pink elephant. As each of the permutations is relatively small, all versions of Elephant have a small state size, despite their support for parallelism. The LFSRs used for masking are tailored to the specific instance, one for each, and are developed to operate well with the specific cryptographic permutation. For example, the LFSRs paired with the Spongent instances have been chosen to minimize the number of XOR operations that have to be performed for a state update, while the LFSR of the Keccak-based instance has been selected to perform well on software platforms. We note that the three cryptographic permutations in Elephant can also be used for cryptographic hashing (in fact, Spongent [BKL+11] and Keccak [BDPV11b] themselves are sponges), but due to our quest for small permutations, these cryptographic hash functions cannot meet the 112- or 127-bit security level guaranteed by our authenticated encryption schemes. In contrast, in order to perform sponge-based hashing with at least 112-bit security, a cryptographic permutation of size at least 225 bits must be used.
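The 225-bit figure can be recovered with a short calculation. The sketch below assumes the standard generic bound for sponges (at most $c/2$ bits of security for capacity $c$) and that at least one bit of rate $r$ is needed to absorb data; $b$ denotes the permutation size.

```latex
\begin{align*}
\frac{c}{2} \ge 112 \;&\Longrightarrow\; c \ge 224, \\
b = r + c,\quad r \ge 1 \;&\Longrightarrow\; b \ge 225 .
\end{align*}
```

All three permutation sizes used in Elephant ($b \in \{160, 176, 200\}$) lie below this threshold.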

Related Work
Basing authenticated encryption on public permutations has become more and more popular with the standardization of the sponge-based [BDPV07] hash function Keccak [BDPV11b] as SHA-3 [FIP15], and the associated duplex construction [BDPV11a, MRV15, DMV17]. Besides these sequential approaches, several permutation-based authenticated encryption schemes have been proposed that allow for parallel processing of the input. Examples of such constructions are Minalpher [STA+15] and OPP [GJMN16] that require the inverse for decryption, MRO [GJMN16] that requires the processing of the whole associated data and message before encryption can start, and Farfalle [BDH+17] that shows how to instantiate a PRF by using permutations for parallel compressing and expanding, and that builds authenticated encryption on top of it. In some sense, Elephant can be seen as a complement to those existing parallel modes that puts its focus on lightweight authenticated encryption while still allowing for parallel computations.

Outline
Cryptographic preliminaries and the security model are discussed in Section 2. We describe the Simplified Masked Even-Mansour (SiM) tweakable block cipher in Section 3. This tweakable block cipher will be used in the security analysis of the Elephant authenticated encryption scheme. Elephant itself is discussed in Section 4, with its specification in Section 4.1 and its security analysis in Section 4.2. The three instances, and in particular the choice of the LFSRs, are described in Section 5.1 (for Dumbo), Section 5.2 (for Jumbo), and Section 5.3 (for Delirium), respectively. We give a detailed discussion of the design rationale, including implementation aspects, of Elephant in Section 6. Security proofs of SiM and Elephant are given in Sections 7 and 8, respectively. The work is concluded in Section 9.

Security Model
For $n \in \mathbb{N}$, we let $\{0,1\}^n$ denote the set of $n$-bit strings and $\{0,1\}^*$ the set of arbitrary-length strings. For $X \in \{0,1\}^*$, we partition $X$ into $\ell = \lceil |X|/n \rceil$ blocks $X_1 \ldots X_\ell$ of size $n$ bits, where the last block is appended with 0s. The expression "$A\ ?\ B : C$" equals $B$ if $A$ is true, and equals $C$ if $A$ is false. For $x \in \{0,1\}^n$ and $i \le n$, we denote by $x \lll i$ (resp., $x \ggg i$) a shift of $x$ to the left (resp., right) over $i$ positions. We likewise denote by $x \ll i$ (resp., $x \gg i$) a rotation of $x$ to the left (resp., right) over $i$ positions. We denote by $\lfloor x \rfloor_i$ the $i$ left-most bits of $x$.
For a finite set $\mathcal{T}$, we denote by $\mathrm{perm}(n)$ the set of all $n$-bit permutations and by $\mathrm{perm}(\mathcal{T}, n)$ the set of all families of permutations indexed by $T \in \mathcal{T}$. For a finite set $S$, we denote by $s \xleftarrow{\$} S$ the uniform random sampling of an element $s$ from $S$. An adversary $\mathcal{A}$ is an algorithm that is given access to one or more oracles $\mathcal{O}$, and after interaction with $\mathcal{O}$ outputs a bit $b \in \{0,1\}$. This event is denoted as $\mathcal{A}^{\mathcal{O}} \to b$. In our work, we will be concerned with computationally unbounded adversaries $\mathcal{A}$; their complexities are only measured by the number of oracle queries. For two randomized oracles $\mathcal{O}, \mathcal{P}$, we denote the advantage of an adversary $\mathcal{A}$ in distinguishing both by
$$\Delta_{\mathcal{A}}(\mathcal{O} ; \mathcal{P}) = \left| \Pr\left(\mathcal{A}^{\mathcal{O}} \to 1\right) - \Pr\left(\mathcal{A}^{\mathcal{P}} \to 1\right) \right| . \tag{2}$$
Finally, let $k, m, n, t \in \mathbb{N}$ with $k, m, t \le n$ throughout.
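The string operations introduced above can be made concrete with a few helper functions. This is only an illustrative sketch: bit strings are modeled as Python strings of `'0'`/`'1'` characters, the function names are hypothetical, and the block count is assumed to be rounded up.

```python
def partition(x: str, n: int) -> list[str]:
    """Split bit string x into n-bit blocks; the last block is filled with 0s."""
    blocks = [x[i:i + n] for i in range(0, len(x), n)]
    if blocks and len(blocks[-1]) < n:
        blocks[-1] = blocks[-1].ljust(n, "0")
    return blocks


def shift_left(x: str, i: int) -> str:
    """Shift x to the left over i positions, filling with 0s on the right."""
    return x[i:] + "0" * i


def shift_right(x: str, i: int) -> str:
    """Shift x to the right over i positions, filling with 0s on the left."""
    return "0" * i + x[:len(x) - i]


def rotate_left(x: str, i: int) -> str:
    """Rotate x to the left over i positions."""
    i %= len(x)
    return x[i:] + x[:i]


def rotate_right(x: str, i: int) -> str:
    """Rotate x to the right over i positions."""
    i %= len(x)
    return x[len(x) - i:] + x[:len(x) - i]


def leftmost(x: str, i: int) -> str:
    """Return the i left-most bits of x."""
    return x[:i]
```

For example, `partition("10110", 4)` yields two blocks, the second of which is the single remaining bit followed by three 0s.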

Authenticated Encryption
An authenticated encryption scheme $\mathsf{AE}$ consists of two algorithms enc and dec.
In our work, the authenticated encryption scheme $\mathsf{AE}$ is based on an $n$-bit permutation $P$, which is modeled as a random permutation: $P \xleftarrow{\$} \mathrm{perm}(n)$. The security of $\mathsf{AE}$ against an adversary $\mathcal{A}$ is defined as the distinguishing advantage
$$\left| \Pr\left(\mathcal{A}^{\mathsf{enc}_K, \mathsf{dec}_K, P^{\pm}} \to 1\right) - \Pr\left(\mathcal{A}^{\mathsf{rand}, \bot, P^{\pm}} \to 1\right) \right| ,$$
where the randomness of the oracles is taken over $K \xleftarrow{\$} \{0,1\}^k$, $P \xleftarrow{\$} \mathrm{perm}(n)$, and the function rand that for each input $(N, A, M)$ returns a random string of size $|M| + t$ bits. The superscript $\pm$ indicates two-sided access by $\mathcal{A}$. The function $\bot$ returns the $\bot$-sign for each query.
We only consider nonce-respecting adversaries: $\mathcal{A}$ is not allowed to make two encryption queries for the same nonce. It is also not allowed to relay the output of the encryption oracle ($\mathsf{enc}_K$ in the real world and rand in the ideal world) to the decryption oracle ($\mathsf{dec}_K$ in the real world and $\bot$ in the ideal world).

Tweakable Block Ciphers
A tweakable block cipher $E$ is a function that gets as input a key $K \in \{0,1\}^k$, tweak $T \in \mathcal{T}$, and message $M \in \{0,1\}^n$, and it outputs a ciphertext $C \in \{0,1\}^n$. The tweakable block cipher is required to be bijective for any fixed $(K, T)$.
In our application, we will not make use of the inverse $E^{-1}$. More importantly, for our authenticated encryption scheme it suffices to use a tweakable block cipher that is secure against adversaries that only have access to $E$, and not to $E^{-1}$. The tweakable block cipher considered in this work is based on an $n$-bit permutation $P$, which is modeled as a random permutation: $P \xleftarrow{\$} \mathrm{perm}(n)$. The security of $E$ against an adversary $\mathcal{A}$ is defined as the distinguishing advantage
$$\left| \Pr\left(\mathcal{A}^{E_K, P^{\pm}} \to 1\right) - \Pr\left(\mathcal{A}^{\pi, P^{\pm}} \to 1\right) \right| ,$$
where the randomness of the oracles is taken over $K \xleftarrow{\$} \{0,1\}^k$, $P \xleftarrow{\$} \mathrm{perm}(n)$, and $\pi \xleftarrow{\$} \mathrm{perm}(\mathcal{T}, n)$.

Simplified Masked Even-Mansour
The Elephant authenticated encryption family uses its underlying permutation in a "Masked Even-Mansour" (MEM) construction [GJMN16]: the input to and output of the permutation P are masked using an LFSR evaluated on the secret key. However, the tweakable block cipher used in our proposal is simpler than the original construction in two ways: (i) the tweak only consists of the exponents of the LFSRs and not the nonce and (ii) in our application, the tweakable block cipher is only evaluated in the forward direction. The changes are not huge, but they do allow for a simpler description, security analysis, and bound. We will refer to this scheme as SiM (Simplified MEM). For generality, we will keep the formalization for an arbitrary number of LFSRs, even though we will only use it for two LFSRs.
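The structure described above (mask, permute, mask again) can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not the scheme's definition: it uses a 16-bit state, a fixed shuffled table as a stand-in permutation, a hypothetical LFSR `phi1` (multiplication by x modulo x^16 + x^5 + x^3 + x^2 + 1), and assumes the mask is obtained by applying `phi1` a times and `phi2 = phi1 XOR id` b times directly to the key; any key-expansion step is omitted.

```python
import random

N_BITS = 16                      # toy state size, far smaller than 160/176/200
FULL = (1 << N_BITS) - 1

# Stand-in for the public permutation P: a fixed pseudo-random table.
_TABLE = list(range(1 << N_BITS))
random.Random(0).shuffle(_TABLE)


def perm(x: int) -> int:
    """Toy public permutation (forward direction only)."""
    return _TABLE[x]


def phi1(x: int) -> int:
    """Hypothetical LFSR step: multiply by x modulo a fixed degree-16 polynomial."""
    carry = x >> (N_BITS - 1)
    return ((x << 1) & FULL) ^ (0x002D if carry else 0)


def phi2(x: int) -> int:
    """Second LFSR, defined as phi1 XOR identity."""
    return phi1(x) ^ x


def mask(key: int, a: int, b: int) -> int:
    """Assumed mask: apply phi1 a times, then phi2 b times, to the key."""
    m = key
    for _ in range(a):
        m = phi1(m)
    for _ in range(b):
        m = phi2(m)
    return m


def sim(key: int, tweak: tuple[int, int], message: int) -> int:
    """Masked Even-Mansour evaluation: the same mask is XORed in before and after P."""
    a, b = tweak
    m = mask(key, a, b)
    return perm(message ^ m) ^ m
```

Because the same mask is applied on both sides of a permutation, `sim` is a bijection on the message for every fixed key and tweak, and it never needs the inverse of `perm`.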

Security of SiM
We need a restriction on the tweak space T in order for SiM to be a secure tweakable block cipher.As Granger et al. [GJMN16], we say that T is 2 −α -proper with respect to The tweak space T is called 2 −α -proper with respect to (ϕ 1 , . . ., ϕ z ) if the following two properties hold: 1.For any Y ∈ {0, 1} n and (a 1 , . . ., a z ) ∈ T ∪ {(0, . . ., 0)}, 2. For any Y ∈ {0, 1} n and distinct (a 1 , . . ., a z ), (a 1 , . . ., a z ) ∈ T ∪ {(0, . . ., 0)}, In Section 7, we will prove Theorem 1, which says that if the tweak space is 2 −α -proper for sufficiently small 2 −α (note that 2 −α cannot be smaller than 2 −n ), then SiM is a secure tweakable block cipher.The proof is a direct simplification of Granger et al.'s analysis of MEM [GJMN16], due to the changes described in the introductory text of Section 3.These simplifications allow us to derive a slightly improved bound on the advantage, noting for comparison that Granger et al. [GJMN16] proved security up to (4. 1} n be z LFSRs, and let T be a 2 −α -proper tweak space with respect to (ϕ 1 , . . ., ϕ z ).Consider SiM of (6) based on random permutation P $ ← − perm(n).For any adversary A making at most q ≤ 2 n−1 construction queries and p primitive queries, The proof is given in Section 7.

Elephant Authenticated Encryption
The Elephant authenticated encryption mode is specified in Section 4.1, and it is proven to be secure relative to the tweakable block cipher security of SiM in Section 4.2.

Decryption
Decryption dec gets as input a key

Security of Elephant
We will prove security of Elephant of Section 4.1 for any $2^{-\alpha}$-proper tweak space. The specific choice of tweak space will be discussed in Section 5.
LFSRs, and let $\mathcal{T}$ be a $2^{-\alpha}$-proper tweak space with respect to $(\phi_1, \phi_2)$. Consider Elephant = (enc, dec) of Section 4.1 based on random permutation $P \xleftarrow{\$} \mathrm{perm}(n)$. For any adversary $\mathcal{A}$ making at most $q_e \le 2^{n-1}$ construction encryption queries, $q_d$ construction decryption queries, each query at most $\ell$ padded nonce and associated data and message blocks, and in total at most $\sigma$ padded nonce and associated data and message blocks, and $p$ primitive queries, for some A that makes $2\sigma$ construction queries and $p$ primitive queries.
The proof is given in Section 8.

Instantiation
While it is possible to instantiate our scheme with any permutation, we aimed for permutations that have a lightweight footprint in either hardware or software. Hence, for our three instances we rely on well-established permutations operating on state sizes that are as small as possible while still fulfilling the security goals recommended by the NIST lightweight call [Nat18] of having at least 112-bit security provided that the online complexity is at most around $2^{50}$ bytes. We propose the three instances given in Table 1. The instances are further elaborated on in Sections 5.1, 5.2, and 5.3, respectively.

Dumbo: 160-Bit Elephant
For generating the masks of our scheme, we use the approach of Granger et al. [GJMN16]. We define $\phi_1$ as the $\mathbb{F}_2$-linear map given in (8), where the $x_i$'s correspond to 8-bit words. This LFSR aims to minimize the area required when implemented in hardware. In particular, in addition to the shift register, only two 2-bit XOR gates are needed. Hence, this choice of the LFSRs is in line with the strength of the Spongent permutations, making a perfect match for small area hardware implementations. Despite the particular suitability of both LFSRs for small area hardware implementations, it is still possible to implement them rather efficiently on 8-bit platforms. We will prove that the 160-bit LFSR defined by (8) has maximal length, and that the tweak space used in Elephant with this LFSR is $2^{-n}$-proper with respect to $(\phi_1, \phi_2)$.
Proposition 1. Let $n = 160$. Let $\phi_1 : \{0,1\}^{160} \to \{0,1\}^{160}$ be the LFSR given in (8), and
Proof. The proof is almost identical to [GJMN16, Lemma 4], with the main difference that a different discrete logarithm must be computed. Let $V$ be the $160 \times 160$ matrix over $\mathbb{F}_2$ that represents $\phi_1$ of (8). As shown in [GJMN16, Lemma 3], the minimal polynomial of $V$ is primitive and of degree $n$. A quick computation using Sage [The17] shows that this polynomial $p(x) = x^{160} + x^{136} + x^{83} + x^{53} + 1$ is irreducible and primitive.
Next, let $\ell = \log_x(x + 1)$ in the field $\mathbb{F}_2[x]/p(x)$. We have to show that If we consider that $b \in \{0, 1, 2\}$ divides the tweak space into three sets, the smallest difference is between the set with $b = 0$ and the set corresponding to $b = 2$, which is bigger than $2^{154}$. Hence, by ensuring that $0 \le a \le 2^{154}$, we have that for any two distinct Finally, using both of the above observations, one can easily observe that $\mathcal{T}$ is $2^{-n}$-proper in light of Definition 1.
We directly obtain that Dumbo is secure in the random permutation model.
Consider Dumbo: Elephant = (enc, dec) of Section 4.1 based on the permutation Spongent-π[160], modeled as a random 160-bit permutation, and on $\phi_1 : \{0,1\}^{160} \to \{0,1\}^{160}$ of (8). For any adversary $\mathcal{A}$ making at most $q_e$ construction encryption queries, $q_d$ construction decryption queries, each query at most $\ell$ padded nonce and associated data and message blocks, and in total at most $\sigma \le 2^{158}$ padded nonce and associated data and message blocks, and $p$ primitive queries,

Jumbo: 176-Bit Elephant
The 176-bit instance of Elephant is also based on a Spongent permutation, namely Spongent-π[176] [BKL+11]. It has the same features as Spongent-π[160] (see Section 5.1), but offers a slightly more comfortable 127-bit security margin. In addition, this particular Spongent permutation is part of the ISO/IEC standard on lightweight hash functions [ISO16].
For generating the masks of our scheme, we use the approach of Granger et al. [GJMN16]. The LFSR $\phi_1$ is defined as the $\mathbb{F}_2$-linear map given in (9), where the $x_i$'s correspond to 8-bit words. This LFSR has the same advantages and implementation features as the 160-bit LFSR of (8) in Section 5.1. We will prove that the 176-bit LFSR defined by (9) has maximal length, and that the tweak space used in Elephant with this LFSR is $2^{-n}$-proper with respect to $(\phi_1, \phi_2)$.
Proposition 2. Let $n = 176$. Let $\phi_1 : \{0,1\}^{176} \to \{0,1\}^{176}$ be the LFSR given in (9), and
Proof. The proof is identical to that of Proposition 1, with the difference that for the $176 \times 176$ matrix $V$ that represents $\phi_1$ of (9), the corresponding polynomial $p(x) = x^{176} + x^{154} + x^{135} + x^{19} + 1$ is irreducible and primitive. The discrete logarithm $\ell = \log_x(x + 1)$ in the field $\mathbb{F}_2[x]/p(x)$ and its related $2\ell$ are computed as
$\ell = 18881376151403786777481463432029450294100461562220699 \approx 2^{173.66}$,
$2\ell = 37762752302807573554962926864058900588200923124441398 \approx 2^{174.66}$.
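The two constants quoted for Jumbo can be sanity-checked with plain integer arithmetic. The sketch below only confirms that the second value is twice the first and that the stated base-2 magnitudes are consistent; it does not recompute the discrete logarithm itself, and the variable names are chosen for illustration.

```python
import math

# Values as quoted in the proof of Proposition 2.
ELL = 18881376151403786777481463432029450294100461562220699
TWO_ELL = 37762752302807573554962926864058900588200923124441398

# Order of the multiplicative group of the field with 2^176 elements.
GROUP_ORDER = 2**176 - 1

doubling_ok = (TWO_ELL == 2 * ELL)          # the second constant is twice the first
no_wraparound = (TWO_ELL < GROUP_ORDER)     # doubling does not wrap modulo the group order
log2_ell = round(math.log2(ELL), 2)         # magnitude of the first constant
log2_two_ell = round(math.log2(TWO_ELL), 2)  # magnitude of the second constant

print(doubling_ok, no_wraparound, log2_ell, log2_two_ell)
```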

Delirium: 200-Bit Elephant
The 200-bit instance of Elephant is based on the Keccak-f[200] permutation [BDPV11b]. The 200-bit instance is the smallest of the instances specified in the NIST standard [FIP15] that fits our needs; it is still reasonable in hardware, and particularly good in software on 8-bit platforms, considering that it is naturally defined using 8-bit lanes [BDP+12, KY10]. As such, it is complementary to the Spongent-based instantiation of Elephant.
For generating the masks of our scheme, we use the approach of Granger et al. [GJMN16]. The LFSR $\phi_1$ is now defined as the $\mathbb{F}_2$-linear map given in (10), where the $x_i$'s correspond to 8-bit words. This LFSR shows its full potential when implemented on 8-bit platforms. A state update within the LFSR just updates one byte, while the content of the other 24 bytes is not changed and basically just relabeled. The single updated byte is computed as the XOR sum of 3 other state bytes that are just rotated or shifted by one bit position. Hence, the essential operations that have to be performed on 8-bit platforms are 3 XOR operations, two rotations by one bit to the left plus one shift by one bit to the left. We will prove that the 200-bit LFSR defined by (10) has maximal length, and that the tweak space used in Elephant with this LFSR is $2^{-n}$-proper with respect to $(\phi_1, \phi_2)$.
Proposition 3. Let $n = 200$. Let $\phi_1 : \{0,1\}^{200} \to \{0,1\}^{200}$ be the LFSR given in (10), and
Proof. The proof is identical to that of Proposition 1, with the difference that for the $200 \times 200$ matrix $V$ that represents $\phi_1$ of (10), the corresponding polynomial
$p(x) = x^{200} + x^{93} + x^{91} + x^{82} + x^{78} + x^{71} + x^{69} + x^{67} + x^{65} + x^{60} + x^{52} + x^{49} + x^{47} + x^{41} + x^{39} + x^{38} + x^{34} + x^{30} + x^{27} + x^{26} + x^{25} + x^{23} + x^{21} + x^{19} + x^{17} + x^{16} + x^{15} + x^{13} + 1$
is irreducible and primitive. The discrete logarithm $\ell = \log_x(x + 1)$ in the field $\mathbb{F}_2[x]/p(x)$ and its related $2\ell$ are computed as
$\ell = 692180606625676931900534627786122994390018641930530681719698 \approx 2^{198.78}$,
$2\ell = 1384361213251353863801069255572245988780037283861061363439396 \approx 2^{199.78}$.
Again, dividing the tweak space into 3 sets according to the value $b \in \{0, 1, 2\}$, the smallest difference is between set $b = 2$ and set $b = 0$, which is bigger than $2^{197}$. Hence, by ensuring that $0 \le a \le 2^{197}$, we have that for any two distinct (a, b), (a′, b′) ∈ {0, 1, . . .,
We directly obtain that Delirium is secure in the random permutation model.
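The byte-oriented update described for the 200-bit LFSR can be sketched as follows. The structure (24 bytes relabeled, one new byte formed from two left-rotations by one bit and one left-shift by one bit) follows the description above; the specific tap positions 0, 2, and 13 are an assumption made for illustration and should be checked against equation (10) of the specification before any use.

```python
STATE_BYTES = 25  # 200 bits as 8-bit words


def rotl8(byte: int, r: int) -> int:
    """Rotate an 8-bit value to the left over r positions."""
    return ((byte << r) | (byte >> (8 - r))) & 0xFF


def phi1(state: bytes) -> bytes:
    """One LFSR step: drop x_0, move every byte down one position, append the new byte."""
    if len(state) != STATE_BYTES:
        raise ValueError("state must be 25 bytes")
    # Assumed taps: two rotated bytes and one shifted byte, XORed together.
    new_byte = rotl8(state[0], 1) ^ rotl8(state[2], 1) ^ ((state[13] << 1) & 0xFF)
    return state[1:] + bytes([new_byte])


def phi2(state: bytes) -> bytes:
    """Second LFSR, phi1 XOR identity, computed bytewise."""
    return bytes(x ^ y for x, y in zip(phi1(state), state))
```

Only rotations, shifts, and XORs on single bytes are involved, so the running time does not depend on the state contents.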

Design Rationale
The Elephant mode is an encrypt-then-MAC mode, where encryption is performed by counter mode and message authentication by a variant of Wegman-Carter-Shoup [WC81, Sho96], both implicitly instantiated using a simplification of the masked Even-Mansour (MEM) tweakable block cipher of Granger et al. [GJMN16]. We explain the design rationale of Elephant at the following two levels of granularity: the generic mode in Section 6.1, and how the mode uses the permutation, i.e., the masking scheme, in Section 6.2. Finally, Section 6.3 briefly discusses implementation aspects.

Mode
Generically, encrypt-then-MAC is the most secure approach [BN00, NRS14]: unlike its alternatives encrypt-and-MAC and MAC-then-encrypt, this approach yields integrity of ciphertexts. Stated differently, malformed ciphertexts yield failure upon MAC verification, and for these no decryption is needed. This prevents unintended leakage from verification failures. The approach also makes it possible to easily prevent leakage due to release of unverified plaintext: simply do not start decrypting before the tag is verified. Note that for the generic alternatives encrypt-and-MAC and MAC-then-encrypt, such a simple countermeasure is impossible. This makes the encrypt-then-MAC mode of Elephant preferable over its alternatives, not only in the lightweight setting but also for general purpose.
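The "verify first, decrypt only afterwards" discipline can be illustrated with a generic composition. The sketch below is not Elephant: it assumes a toy SHA-256-based keystream for encryption and HMAC-SHA-256 from the Python standard library for authentication, with two independent keys, purely to show the control flow.

```python
import hashlib
import hmac


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream built from SHA-256 (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt_then_mac(enc_key: bytes, mac_key: bytes, nonce: bytes, message: bytes):
    """Encrypt the message, then authenticate the nonce and the ciphertext."""
    stream = _keystream(enc_key, nonce, len(message))
    ciphertext = bytes(m ^ s for m, s in zip(message, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return ciphertext, tag


def verify_then_decrypt(enc_key: bytes, mac_key: bytes, nonce: bytes,
                        ciphertext: bytes, tag: bytes):
    """Check the tag first; no keystream is generated for a rejected ciphertext."""
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))
```

A forged or corrupted ciphertext is rejected before any plaintext is computed, which is the property the paragraph above relies on.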
The counter encryption mode and Wegman-Carter-Shoup MAC mode within Elephant, in turn, are both fully parallelizable and only evaluate the underlying permutation P in forward direction. The fact that Elephant evaluates its primitive in forward direction is important in the lightweight setting: it allows for smaller implementations, since there is no need to implement the inverse of P. Note, in particular, that due to the rise of the sponge, various cryptographic permutations, including Ascon [DEMS16], Gimli [BKL+17], Keccak [BDPV11b], and XOODOO [DHVV18], are developed to be particularly efficient in forward direction.
By being parallelizable, Elephant distinguishes itself from a wide range of authenticated encryption schemes that employ a serial permutation-based mode of operation, such as APE [ABB+14], Beetle [CDNY18], or the Duplex construction [BDPV11a, MRV15, DMV17]. To support parallelism, we need to store the internal state value, but on the upside, it turns out to give various elegant implementation advantages (see Section 6.2 and Section 6.3) and it means that there is no strict need to employ larger permutations.
We briefly elaborate on existing generic authenticated encryption schemes that are both parallel and permutation-based (but not necessarily inverse-free). Granger et al. [GJMN16] introduced OPP, a parallel and permutation-based scheme derived from ΘCB [KR11], but it is not inverse-free. Minalpher [STA+15], likewise, is parallel and permutation-based but not inverse-free. Finally, a permutation-based version of OTR [Min16] exists in the embodiment of Prøst-OTR [KLL+14]. This construction is parallel, permutation-based, and inverse-free, just like Elephant. However, because it processes pairs of message blocks using a two-round Feistel structure, the encryption process differs depending on the parity of the number of message blocks. This stands in contrast to the conceptual simplicity of Elephant. In addition, for short messages, less parallelism is available in Prøst than for Elephant. If the implementation maximally exploits parallelism, Elephant would compare favorably for short messages in terms of latency.
The mode is nonce-based: each of the members of Elephant uses a 96-bit nonce. The nonce is prepended to the associated data, which is then padded and split into $n$-bit blocks $A_1 \ldots A_{\ell_A}$ (see line 6 of Algorithm 1). This way, the scheme is optimized for the parameters specified in the NIST call [Nat18]: the nonce is 96 bits, and in order to avoid a waste of $n - 96$ bits due to padding (where $n \in \{160, 176, 200\}$), the nonce is appended with the first $n - 96$ bits of the associated data. Care must be taken here that the nonce always has a fixed length of 96 bits. If variable-length nonces were allowed, the scheme would be vulnerable to trivial padding attacks. We remark that it is theoretically possible to adjust the Elephant mode to allow longer nonces or flexible-length nonces, but we discourage this as it might lead to error-prone designs. Furthermore, we clarify that the nonce is used both for encryption and for authentication: the former is needed for confidentiality and the latter is needed in case of authenticated encryption of an empty message. Also, as the mode is nonce-based, security is guaranteed only if the adversary does not repeat nonces for encryption queries.
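The way the nonce shares its block with the start of the associated data can be made concrete with a little arithmetic. The sketch below uses hypothetical helper names and deliberately leaves the final block unpadded, since the padding rule is not spelled out in this passage.

```python
NONCE_BYTES = 12  # 96-bit nonce


def ad_bytes_in_first_block(n_bits: int) -> int:
    """Number of associated-data bytes that fit next to the nonce in block A_1."""
    return n_bits // 8 - NONCE_BYTES


def split_nonce_and_ad(nonce: bytes, ad: bytes, n_bits: int) -> list[bytes]:
    """Prepend the nonce to the associated data and cut into n-bit blocks (no padding)."""
    if len(nonce) != NONCE_BYTES:
        raise ValueError("the nonce must be exactly 96 bits")
    block_bytes = n_bits // 8
    data = nonce + ad
    return [data[i:i + block_bytes] for i in range(0, len(data), block_bytes)]


for name, n in (("Dumbo", 160), ("Jumbo", 176), ("Delirium", 200)):
    print(name, ad_bytes_in_first_block(n), "bytes of associated data in the first block")
```

Rejecting any nonce whose length differs from 96 bits mirrors the fixed-length requirement stated above.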

Masking
As specified in Section 4.1, the inputs to and outputs of the permutation P are masked using $\mathsf{mask}_K^{a,b}$ of (7). The masking function is defined using two LFSRs $\phi_1, \phi_2 : \{0,1\}^n \to \{0,1\}^n$ that satisfy $\phi_2 = \phi_1 \oplus \mathrm{id}$, and it is parameterized by $(a, b)$, which are used in a manner so as to assure that every occurrence of the masking in the Elephant mode gets different parameters. We have heuristically chosen our LFSRs to give a good match when used in combination with the particular permutations. For the LFSRs matching Spongent, we selected versions that have a small gate count in hardware. In the case of the 200-bit Keccak permutation, we chose an LFSR that can be implemented with a small number of instructions. Hence, we selected an LFSR that allows for implementations with shift/rotation by one. The number of gates needed for a hardware implementation was a secondary consideration in this case.
The LFSR-based masking technique is taken from Granger et al. [GJMN16], and so is the security analysis (although different state sizes, discrete logarithm computations, LFSRs, and tweak domains are considered). Granger et al. have argued in favor of this technique over its alternatives for various reasons: (i) the approach is simpler to implement, as the masking is purely linear and does not use finite field multiplication, (ii) it is more efficient (depending on the primitive used), and (iii) the masking is constant time.
The latter point is important in the lightweight setting where resistance against timing attacks comes at a cost. In this respect, the LFSR-based masking approach compares favorably with another, and very popular, masking technique, namely powering-up-based masking (simplified to allow for fair comparison with (7)), in which the key-dependent value is multiplied by $2^a 3^b$, where 2 and 3 are coordinates in the monomial basis in the finite field $\mathbb{F}_{2^n}$. The technique was introduced by Rogaway [Rog04] in the context of OCB2, and it has seen many applications, including CAESAR submissions AES-OTR [Min16], AEZ [HKR17], COLM [ABD+16], Minalpher [STA+15], POET [AFF+15], and SHELL [Wan15]. These multiplications can be implemented as an LFSR on one-bit words, but the masking functions $\phi_1$ and $\phi_2$ are constant time by design and allow for more flexibility in the word size.
A related masking approach is that of OCB3 [KR11] and OMD [CMN+15], which use masking based on Gray coding. In detail, Gray coding-based masks can be updated as $G(i) = G(i - 1) \oplus 2^{\mathrm{ntz}(i)}$, where $\mathrm{ntz}(i)$ is the number of trailing zeros in the binary representation of $i$. The masking, unlike powering-up, does not need a conditional XOR, but it requires $\log_2(i)$ field doublings (which may be precomputed). As the LFSR-based masking used in Elephant does not incur such a cost, it also compares favorably with this technique.
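The Gray-code recurrence can be traced on plain integers. This sketch only illustrates the index recurrence $G(i) = G(i-1) \oplus 2^{\mathrm{ntz}(i)}$; in the schemes cited above the powers of two are field elements multiplied with a key-dependent value, which is not modeled here.

```python
def ntz(i: int) -> int:
    """Number of trailing zero bits in the binary representation of i (i > 0)."""
    return (i & -i).bit_length() - 1


def gray_sequence(count: int) -> list[int]:
    """Return G(0), ..., G(count - 1) using the incremental update rule."""
    g = 0
    sequence = [g]
    for i in range(1, count):
        g ^= 1 << ntz(i)  # flip exactly one bit per step
        sequence.append(g)
    return sequence
```

Each step flips a single bit, and the result agrees with the closed form `i ^ (i >> 1)` for the reflected binary Gray code.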
The particular choice of masking, namely $(a, b) = (i, 0)$ in the encryption layer, $(a, b) = (i, 1)$ for ciphertext authentication, and $(a, b) = (i, 2)$ for associated data authentication, allows maskings to cancel out nicely in the implementation. To see this, consider the authentication of ciphertext $C_i$ (for $i < \ell_M \le \ell_C$), and in more detail the contribution $T_i$ it makes to tag $T$. This value is computed as By definition of $\mathsf{mask}_K^{a,b}$, and as $\phi_2 = \phi_1 \oplus \mathrm{id}$, we have This, not surprisingly, is the mask used for the encryption of the next message block $M_{i+1}$. We note that exploiting this requires extra state. Another optimization in mask management is in the masks that contribute to the tag, i.e., the sum of all masks that appear in the final tag $T$. The contribution coming from the ciphertext authentication equals and that coming from the associated data likewise equals This feature of the masking may be useful if Elephant is used for fixed-length data, in which case (11) and (12) could be precomputed.
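The cancellation rests on the relation $\phi_2 = \phi_1 \oplus \mathrm{id}$, which can be checked on a toy example. The sketch below assumes a hypothetical 16-bit LFSR and assumes the mask for parameters $(a, b)$ is obtained by applying `phi1` a times and `phi2` b times to a starting value; it then confirms that masks with $b = 1$ and $b = 2$ are XORs of masks with $b = 0$.

```python
N_BITS = 16
FULL = (1 << N_BITS) - 1


def phi1(x: int) -> int:
    """Hypothetical toy LFSR: multiply by x modulo x^16 + x^5 + x^3 + x^2 + 1."""
    carry = x >> (N_BITS - 1)
    return ((x << 1) & FULL) ^ (0x002D if carry else 0)


def phi2(x: int) -> int:
    """phi2 = phi1 XOR identity."""
    return phi1(x) ^ x


def mask(start: int, a: int, b: int) -> int:
    """Assumed mask: phi2 applied b times after phi1 applied a times."""
    m = start
    for _ in range(a):
        m = phi1(m)
    for _ in range(b):
        m = phi2(m)
    return m


def check_identities(start: int, max_a: int) -> bool:
    """Verify mask(a,1) = mask(a+1,0) ^ mask(a,0) and mask(a,2) = mask(a+2,0) ^ mask(a,0)."""
    for a in range(max_a):
        if mask(start, a, 1) != mask(start, a + 1, 0) ^ mask(start, a, 0):
            return False
        if mask(start, a, 2) != mask(start, a + 2, 0) ^ mask(start, a, 0):
            return False
    return True
```

An implementation that already holds two consecutive $b = 0$ masks can therefore derive the $b = 1$ mask with a single XOR instead of a separate LFSR evaluation.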

Implementation
As discussed in Section 6.1, the Elephant mode allows for a high degree of parallelism.
For the hardware-oriented variants of Elephant (Dumbo and Jumbo), this makes it easy to trade-off area for additional throughput.Hardware implementations of the 176-bit Spongent permutation are given by Bogdanov et al. [BKL + 11], e.g., just needing 1329 GE to implement the Spongent-160 hash function, which is based on the 176-bit Spongent permutation.The 200-bit variant of Elephant primarily targets (embedded) software, but the same remarks concerning hardware implementations apply as, e.g., demonstrated by an implementation of a hash function based on the 200-bit Keccak permutation needing just 2520 GE by Kavun and Yalçin [KY10].Software implementations of 200-bit Elephant (Delirium) can also exploit parallelism.If multiple cores are available, several blocks can be processed concurrently -but this is only useful for long messages.More importantly, on processors with a word size above 16 bits, the available parallelism makes it possible to increase the efficiency of the implementation by combining two or more calls to the Keccak permutation.For mid-and high-end processors with SIMD instructions, the same technique can be used to obtain even greater speed-ups.
An increasingly common requirement is the ability to protect implementations against side-channel attacks.As discussed in Section 6.2, the masking scheme is constant time by design.The same applies to the Spongent and Keccak permutations.In addition, all variants of Elephant are well-suited for Boolean masking techniques such as threshold implementations [NRR06].
Finally, it is worth mentioning that a few specific use-cases of Elephant allow for additional optimizations. As discussed in Section 6.2, the contribution of the mask values to the tag can be precomputed for fixed-length messages. In addition, if one or more blocks of associated data are static, it is possible to precompute their contribution to the tag, with the exception of the first block, which involves the nonce.
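A minimal sketch of the second optimization, assuming that associated data block $i$ contributes $P(A_i \oplus \mathsf{mask}^{i-1,2}) \oplus \mathsf{mask}^{i-1,2}$ to the tag and that all contributions are XORed together. `P`, `phi1`, and the helper names are hypothetical stand-ins. The first block (index 0 here) carries the nonce and is therefore recomputed per message, while the XOR of the static blocks is cached once per key.

```python
import hashlib

N_BYTES = 20  # toy 160-bit state

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def P(x: bytes) -> bytes:
    # Stand-in for the permutation (assumption; NOT Spongent).
    return hashlib.shake_128(x).digest(N_BYTES)

def phi1(x: bytes) -> bytes:
    # Toy byte-wise LFSR step with hypothetical feedback taps.
    rot = ((x[0] << 3) | (x[0] >> 5)) & 0xFF
    return x[1:] + bytes([rot ^ ((x[3] << 7) & 0xFF) ^ (x[13] >> 7)])

def phi2(x: bytes) -> bytes:
    return xor(phi1(x), x)

def ad_mask(L: bytes, i: int) -> bytes:
    # mask^{i,2}: step phi1 i times, then apply phi2 twice.
    m = L
    for _ in range(i):
        m = phi1(m)
    return phi2(phi2(m))

def contribution(L: bytes, i: int, block: bytes) -> bytes:
    m = ad_mask(L, i)
    return xor(P(xor(block, m)), m)

def ad_sum(L: bytes, blocks: list, start: int = 0) -> bytes:
    acc = bytes(N_BYTES)
    for i, block in enumerate(blocks, start=start):
        acc = xor(acc, contribution(L, i, block))
    return acc

L = P(b"\x01" * N_BYTES)  # plays the role of P(K || 0^{n-k})
static_blocks = [bytes([0xA0 + i]) * N_BYTES for i in range(3)]
cached = ad_sum(L, static_blocks, start=1)  # computed once per key

def ad_part_of_tag(first_block: bytes) -> bytes:
    # Only the nonce-dependent first block costs a permutation call.
    return xor(contribution(L, 0, first_block), cached)
```

With this caching, authenticating the static part of the associated data costs one XOR per message instead of one permutation call per block.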
A reference implementation of Dumbo, Jumbo, and Delirium written in C99 can be found at https://github.com/TimBeyne/Elephant.

Proof of Theorem 1 (on SiM)
The proof closely follows Granger et al. [GJMN16] and is performed using the H-coefficient technique [Pat08, CS14]. Let $K \xleftarrow{\$} \{0,1\}^k$, $P \xleftarrow{\$} \mathrm{perm}(n)$, and $\widetilde{\pi} \xleftarrow{\$} \mathrm{perm}(\mathcal{T}, n)$, where $\mathcal{T}$ is $2^{-\alpha}$-proper with respect to LFSRs $(\varphi_1, \ldots, \varphi_z)$. Consider a computationally unbounded adversary $\mathcal{A}$ that tries to distinguish $\mathcal{O} := (\widetilde{E}^P_K, P^{\pm})$ from $\mathcal{P} := (\widetilde{\pi}, P^{\pm})$. Without loss of generality, we can consider it to be deterministic: for any probabilistic adversary there exists a deterministic one that has at least the same success probability. The interaction of $\mathcal{A}$ with its oracle ($\mathcal{O}$ or $\mathcal{P}$) is gathered in a view $\nu$. Denote by $D_{\mathcal{O}}$ (resp., $D_{\mathcal{P}}$) the probability distribution of views in interaction with $\mathcal{O}$ (resp., $\mathcal{P}$). Denote by $\mathcal{V}$ the set of "attainable views", i.e., views $\nu$ such that $\Pr(D_{\mathcal{P}} = \nu) > 0$.
The remainder of the proof is structured as follows. We specify the views of an adversary in Section 7.1 and define the bad views in Section 7.2. The probability of bad views is analyzed in Section 7.3 and the probability ratio for good views is considered in Section 7.4. Section 7.5 concludes the proof.

Views
The adversary can make $q$ construction queries to $\widetilde{E}^P_K$ or $\widetilde{\pi}$, all in forward direction only. Each such query is made for some tweak $\bar{a}_i = (a_1, \ldots, a_z)_i$ and message input $M_i$, and results in an output $C_i$. The $q$ queries are summarized in a view $\nu_c = \{(\bar{a}_1, M_1, C_1), \ldots, (\bar{a}_q, M_q, C_q)\}$. The adversary can make $p$ primitive queries to $P^{\pm}$, and these are likewise summarized in a view $\nu_p = \{(X_1, Y_1), \ldots, (X_p, Y_p)\}$.
After the conversation of $\mathcal{A}$ with its oracle, but before it makes its final decision, we reveal the key material used in the interaction. This can be done without loss of generality; it can only improve the adversarial success probability. The first value that is revealed is a value $K$. In the real world, this is the key $K \xleftarrow{\$} \{0,1\}^k$ that is actually used by the construction oracle; in the ideal world, it is a dummy key $K \xleftarrow{\$} \{0,1\}^k$. The second value that is revealed is a value $L \in \{0,1\}^n$. In the real world, it is the value $L = P(K \| 0^{n-k})$; in the ideal world, it is a dummy key $L \xleftarrow{\$} \{0,1\}^n$.³ The revealed data is summarized in a view $\nu_k = \{(K, L)\}$. The complete view is defined as $\nu = (\nu_c, \nu_p, \nu_k)$. We assume that the adversary never makes any duplicate query, hence $\nu_c$ and $\nu_p$ contain no duplicate elements.

Definition of Good and Bad Views
In the real world, each tuple in $\nu_p$ defines exactly one input-output pair for $P$. Likewise, the sole tuple in $\nu_k$ is an input-output pair for $P$. Using this tuple, one can observe that any tuple $(\bar{a}_i, M_i, C_i) \in \nu_c$ also defines an input-output pair for $P$, namely If among all these $q + p + 1$ input-output pairs defined by $\nu$, there are two that have colliding input or output values, we consider $\nu$ to be a bad view. Formally, $\nu$ is called "bad" if one of the following conditions is satisfied, where we recall that $\nu_k = \{(K, L)\}$ is a singleton:

$\mathsf{bad}_{c,c}$: for some distinct $(\bar{a}, M, C), (\bar{a}', M', C') \in \nu_c$:

We write $\mathsf{bad} = \mathsf{bad}_{c,c} \vee \mathsf{bad}_{c,p} \vee \mathsf{bad}_{c,k} \vee \mathsf{bad}_{p,k}$.
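One formalization of the four conditions that is consistent with the analysis in the next subsection (two equations for $\mathsf{bad}_{c,k}$, one governed by the randomness of $L$ and one by that of $C$, and a per-query probability of $1/2^k + 1/2^n$ for $\mathsf{bad}_{p,k}$) is sketched below. The shorthand $\mu_{\bar{a}}$ for the mask is introduced here purely for illustration, and the exact form of each equation is an assumption rather than a quotation.

```latex
% Assumed shorthand: the mask induced by tweak \bar{a} = (a_1, \ldots, a_z)
\mu_{\bar{a}} := \varphi_z^{a_z} \circ \cdots \circ \varphi_1^{a_1}(L)

% Input-output pair for P induced by (\bar{a}, M, C) \in \nu_c
(M \oplus \mu_{\bar{a}},\; C \oplus \mu_{\bar{a}})

\begin{align*}
\mathsf{bad}_{c,c} &: \; M \oplus \mu_{\bar{a}} = M' \oplus \mu_{\bar{a}'}
    \ \text{ or } \ C \oplus \mu_{\bar{a}} = C' \oplus \mu_{\bar{a}'}
    && \text{for distinct } (\bar{a}, M, C), (\bar{a}', M', C') \in \nu_c \\
\mathsf{bad}_{c,p} &: \; M \oplus \mu_{\bar{a}} = X
    \ \text{ or } \ C \oplus \mu_{\bar{a}} = Y
    && \text{for } (\bar{a}, M, C) \in \nu_c,\ (X, Y) \in \nu_p \\
\mathsf{bad}_{c,k} &: \; M \oplus \mu_{\bar{a}} = K \| 0^{n-k}
    \ \text{ or } \ C \oplus \mu_{\bar{a}} = L
    && \text{for } (\bar{a}, M, C) \in \nu_c \\
\mathsf{bad}_{p,k} &: \; X = K \| 0^{n-k}
    \ \text{ or } \ Y = L
    && \text{for } (X, Y) \in \nu_p
\end{align*}
```

Each condition states that two of the $q + p + 1$ induced input-output pairs for $P$ collide on the input or on the output, which is exactly the situation in which the real world could not be consistent with a permutation.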

Probability of Bad View in Ideal World
Our goal is to bound $\Pr(D_{\mathcal{P}} \in \mathcal{V}_{\mathrm{bad}})$, the probability of a bad view in the ideal world $\mathcal{P} = (\widetilde{\pi}, P^{\pm})$. (14) We will analyze the four probabilities separately, thereby noticing that (i) $K$

Otherwise, if $\bar{a} \neq \bar{a}'$, we can deduce from $2^{-\alpha}$-properness of $\mathcal{T}$, namely property 2 of Definition 1, that event $\mathsf{bad}_{c,c}$ holds with probability at most $2/2^{\alpha}$. Thus, summing over all $\binom{q}{2}$ possible choices of queries,
$$\Pr(D_{\mathcal{P}} \propto \mathsf{bad}_{c,c}) \le \binom{q}{2} \frac{2}{2^{\alpha}}.$$

Event $\mathsf{bad}_{c,p}$. For $\mathsf{bad}_{c,p}$, let $(\bar{a}, M, C) \in \nu_c$ and $(X, Y) \in \nu_p$ be any two tuples. We can deduce from $2^{-\alpha}$-properness of $\mathcal{T}$, namely property 1 of Definition 1, that event $\mathsf{bad}_{c,p}$ holds with probability at most $2/2^{\alpha}$. Thus, summing over all $qp$ possible choices of queries,
$$\Pr(D_{\mathcal{P}} \propto \mathsf{bad}_{c,p}) \le \frac{2qp}{2^{\alpha}}.$$

Event $\mathsf{bad}_{c,k}$. For $\mathsf{bad}_{c,k}$, let $(\bar{a}, M, C) \in \nu_c$ be any tuple. We consider the two equations of $\mathsf{bad}_{c,k}$ separately. For the first equation, we will use that $L \xleftarrow{\$} \{0,1\}^n$ is a randomly generated value independent of $K$. We can deduce from $2^{-\alpha}$-properness of $\mathcal{T}$, namely property 1 of Definition 1, that this equation holds with probability at most $1/2^{\alpha}$.
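For the second equation, we will use that all construction queries are made in forward direction, and that $C$ is randomly drawn from a set of size at least $2^n - q$ elements. This equation thus holds with probability at most $1/(2^n - q)$. Thus, summing over all $q$ possible choices of queries,
$$\Pr(D_{\mathcal{P}} \propto \mathsf{bad}_{c,k}) \le \frac{q}{2^{\alpha}} + \frac{q}{2^n - q}.$$

Event $\mathsf{bad}_{p,k}$. For $\mathsf{bad}_{p,k}$, let $(X, Y) \in \nu_p$ be any tuple. As $K \xleftarrow{\$} \{0,1\}^k$ and $L \xleftarrow{\$} \{0,1\}^n$, the tuple sets $\mathsf{bad}_{p,k}$ with probability at most $1/2^k + 1/2^n$. Thus, summing over all $p$ possible choices of queries,
$$\Pr(D_{\mathcal{P}} \propto \mathsf{bad}_{p,k}) \le \frac{p}{2^k} + \frac{p}{2^n}.$$

Conclusion. Concluding, we obtain for (14):
$$\Pr(D_{\mathcal{P}} \in \mathcal{V}_{\mathrm{bad}}) \le \binom{q}{2} \frac{2}{2^{\alpha}} + \frac{2qp}{2^{\alpha}} + \frac{q}{2^{\alpha}} + \frac{2q}{2^n} + \frac{p}{2^k} + \frac{p}{2^n}, \qquad (15)$$
using that $2^n - q \ge 2^{n-1}$.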
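To make the bookkeeping concrete, the sketch below adds up the four per-event bounds stated in this subsection with exact rational arithmetic: $2/2^{\alpha}$ per pair of construction queries, $2/2^{\alpha}$ per construction-primitive pair, $1/2^{\alpha} + 1/(2^n - q)$ per construction query, and $1/2^k + 1/2^n$ per primitive query. The function names and the parameter values in the usage note are illustrative assumptions, not values taken from the paper.

```python
from fractions import Fraction
from math import comb

def bad_view_bound(q: int, p: int, n: int, k: int, alpha: int) -> Fraction:
    """Union bound over bad_cc, bad_cp, bad_ck and bad_pk (exact)."""
    assert q <= 2 ** (n - 1), "the simplification 2^n - q >= 2^(n-1) needs this"
    bad_cc = Fraction(2 * comb(q, 2), 2 ** alpha)
    bad_cp = Fraction(2 * q * p, 2 ** alpha)
    bad_ck = Fraction(q, 2 ** alpha) + Fraction(q, 2 ** n - q)
    bad_pk = Fraction(p, 2 ** k) + Fraction(p, 2 ** n)
    return bad_cc + bad_cp + bad_ck + bad_pk

def simplified_bound(q: int, p: int, n: int, k: int, alpha: int) -> Fraction:
    """Same bound after using q/(2^n - q) <= 2q/2^n and 2*C(q,2) + q = q^2."""
    return (Fraction(q * q + 2 * q * p, 2 ** alpha)
            + Fraction(2 * q + p, 2 ** n)
            + Fraction(p, 2 ** k))
```

For example, `float(bad_view_bound(2**40, 2**60, 160, 128, 150))` evaluates the bound for one hypothetical parameter set; the dominant term in that regime is $2qp/2^{\alpha}$.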

Probability Ratio for Good Views
Consider any good view $\nu \in \mathcal{V}_{\mathrm{good}}$. We will prove the inequality $\Pr(D_{\mathcal{O}} = \nu) \ge \Pr(D_{\mathcal{P}} = \nu)$. The proof is a direct simplification of that of Granger et al. [GJMN16], noting that in our case, $\nu_k$ consists of just one element. The proof is included for completeness.
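Real World. In the real world $\mathcal{O} = (\widetilde{E}^P_K, P^{\pm})$, goodness of the view means that $\nu = (\nu_c, \nu_p, \nu_k)$ defines exactly $q + p + 1$ input-output pairs for $P$, no two of which collide on the input or output, and that $\nu_k$ contains a random value $K \xleftarrow{\$} \{0,1\}^k$. Therefore, we obtain:
$$\Pr(D_{\mathcal{O}} = \nu) = \frac{1}{2^k} \cdot \frac{1}{(2^n)_{q+p+1}}, \qquad (16)$$
where $(x)_y = x(x-1)\cdots(x-y+1)$ denotes the falling factorial.

Ideal World. In the ideal world $\mathcal{P} = (\widetilde{\pi}, P^{\pm})$, the view $\nu = (\nu_c, \nu_p, \nu_k)$ consists of three lists of independent tuples: $\nu_c$ defines exactly $q$ input-output pairs for $\widetilde{\pi}$, $\nu_p$ defines exactly $p$ input-output pairs for $P$, and $\nu_k$ consists of two random values $(K, L)$. For counting, it is convenient to group the tuples in $\nu_c$ depending on the tweak value $\bar{a}$. For $T \in \mathcal{T}$, define $q_T = |\{(\bar{a}, M, C) \in \nu_c : \bar{a} = T\}|$, where $\sum_{T \in \mathcal{T}} q_T = q$. We obtain:
$$\Pr(D_{\mathcal{P}} = \nu) = \frac{1}{2^k} \cdot \frac{1}{2^n} \cdot \frac{1}{(2^n)_p} \cdot \prod_{T \in \mathcal{T}} \frac{1}{(2^n)_{q_T}} \le \frac{1}{2^k} \cdot \frac{1}{(2^n)_{q+p+1}}, \qquad (17)$$
using that for any $\sigma + \tau \le 2^n$ we have $(2^n)_{\sigma} (2^n)_{\tau} \ge (2^n)_{\sigma+\tau}$.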
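The counting step in the ideal world relies on an inequality between falling factorials. Assuming it has the form $(2^n)_{\sigma} (2^n)_{\tau} \ge (2^n)_{\sigma+\tau}$ for $\sigma + \tau \le 2^n$, with $(x)_y = x(x-1)\cdots(x-y+1)$, it can be verified exhaustively for a small toy value of $n$. The function names below are hypothetical.

```python
def falling(x: int, y: int) -> int:
    """Falling factorial (x)_y = x (x-1) ... (x-y+1), with (x)_0 = 1."""
    result = 1
    for i in range(y):
        result *= x - i
    return result

def check_split_inequality(n_bits: int) -> bool:
    """Exhaustively check (2^n)_s * (2^n)_t >= (2^n)_{s+t} for s + t <= 2^n."""
    size = 2 ** n_bits
    for s in range(size + 1):
        for t in range(size + 1 - s):
            if falling(size, s) * falling(size, t) < falling(size, s + t):
                return False
    return True
```

The inequality holds because $(2^n)_{\sigma+\tau} = (2^n)_{\sigma} \cdot (2^n - \sigma)_{\tau}$ and every factor of $(2^n - \sigma)_{\tau}$ is at most the corresponding factor of $(2^n)_{\tau}$. Applying it repeatedly merges the per-tweak counts $q_T$, the $p$ primitive queries, and the single key-dependent pair into one falling factorial of length $q + p + 1$.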

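Conclusion. Combining (16) and (17), we obtain that for any good view $\nu \in \mathcal{V}_{\mathrm{good}}$:
$$\frac{\Pr(D_{\mathcal{O}} = \nu)}{\Pr(D_{\mathcal{P}} = \nu)} \ge 1. \qquad (18)$$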

Conclusion
By the H-coefficient technique (Lemma 1), we directly obtain from (15) and (18) that the distinguishing advantage of $\mathcal{A}$ is at most the bound of (15).

Proof of Theorem 2 (on Elephant)
Let $\mathsf{rand}$ be a function that for each input $(N, A, M)$ returns a random string of size $|M| + t$ bits. Consider a deterministic computationally unbounded adversary $\mathcal{A}$ that tries to distinguish $\mathcal{O} := (\mathsf{enc}^P_K, \mathsf{dec}^P_K, P^{\pm})$ from $\mathcal{P} := (\mathsf{rand}, \bot, P^{\pm})$. As a first step, we will describe an alternative authenticated encryption scheme based on a tweakable permutation $\widetilde{\pi} \xleftarrow{\$} \mathrm{perm}(\mathcal{T}, n)$, where $\mathcal{T}$ is $2^{-\alpha}$-proper with respect to LFSRs $(\varphi_1, \varphi_2)$. Its encryption function $\overline{\mathsf{enc}}$ and decryption function $\overline{\mathsf{dec}}$ are given in Algorithms 3 and 4, respectively. Unlike the original functions $\mathsf{enc}$ and $\mathsf{dec}$ of Algorithms 1 and 2, the functions $\overline{\mathsf{enc}}$ and $\overline{\mathsf{dec}}$ are not explicitly keyed, but are instead implicitly keyed by the use of the random secret tweakable permutation $\widetilde{\pi}$.
For the second distance of (21), we remark that every query is made for a unique nonce, and in more detail:
• The $i$-th block of ciphertext equals $\widetilde{\pi}((i - 1, 0), N) \oplus M_i$, where $M_i$ is the $i$-th block of plaintext;
• The tag equals $\lfloor \widetilde{\pi}((0, 2), N \| A') \rfloor_t \oplus h(A'', C)$, where $A'$ equals the first $n - m$ bits of padded associated data and $A''$ equals the rest, and where $h$ never evaluates $\widetilde{\pi}$ for tweak $(\cdot, 0)$ or $(0, 2)$.
The maximum $k$-interpolation probability of $f_t$, for $k \le q_e + 1 \le 2^{n-1} + 1$, satisfies: where we used that $k - 1 \le 2^{n-1}$. As $2^{-\alpha} \ge 2^{-t}$, the bound satisfies the constraints put forward by Bernstein for $\delta = e^{(q_e+1)q_e/2^n}$. We remark that for $t = n$, i.e., for $f_n$ an injective function, Bernstein computed the same maximum $k$-interpolation probability in [Ber05, Theorem 4.2] and derived a similar bound on the security of $\mathsf{mac}_{\pi,h}$ in [Ber05, Theorem 5.3].

Conclusion
In this paper, we presented the Elephant family of lightweight authenticated encryption schemes. Our construction combines a provably secure mode of operation with standardized lightweight permutations. As a result, we end up with a parallel authenticated encryption scheme that is suitable for dedicated hardware implementations on resource-constrained devices, but also for software implementations on small 8-bit microcontrollers. Hence, Elephant fulfills the increasing demand for secure lightweight authenticated encryption schemes.

Figure 1: Depiction of Elephant. For the encryption part (top): message is padded as $M_1 \ldots M_{\ell_M} \xleftarrow{n} M$, and ciphertext equals $C = \lfloor C_1 \ldots C_{\ell_M} \rfloor_{|M|}$. For the authentication part (bottom): nonce and associated data are padded as $A_1 \ldots A_{\ell_A} \xleftarrow{n} N \| A \| 1$, and ciphertext is padded as $C_1 \ldots C_{\ell_C} \xleftarrow{n} C \| 1$.
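For brevity, denote by $D_{\mathcal{P}} \propto \mathsf{bad}$ the event that $D_{\mathcal{P}}$ satisfies $\mathsf{bad}$. By the union bound,
$$\Pr(D_{\mathcal{P}} \propto \mathsf{bad}) = \Pr(D_{\mathcal{P}} \propto \mathsf{bad}_{c,c} \vee \mathsf{bad}_{c,p} \vee \mathsf{bad}_{c,k} \vee \mathsf{bad}_{p,k}) \le \Pr(D_{\mathcal{P}} \propto \mathsf{bad}_{c,c}) + \Pr(D_{\mathcal{P}} \propto \mathsf{bad}_{c,p}) + \Pr(D_{\mathcal{P}} \propto \mathsf{bad}_{c,k}) + \Pr(D_{\mathcal{P}} \propto \mathsf{bad}_{p,k}).$$
… $\{0,1\}^n$ are random variables, and (ii) as the adversary only makes forward construction queries, each tuple $(\bar{a}, M, C) \in \nu_c$ satisfies that $C$ is randomly drawn from a set of size at least $2^n - q$.

Event $\mathsf{bad}_{c,c}$. For $\mathsf{bad}_{c,c}$, let $(\bar{a}, M, C), (\bar{a}', M', C') \in \nu_c$ be any two distinct tuples. If $\bar{a} = \bar{a}'$, then necessarily $M \neq M'$ and $C \neq C'$, and $\mathsf{bad}_{c,c}$ holds with probability 0.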

Table 1: Instances of Elephant.

The 160-bit instance of Elephant is based on the Spongent-π[160] permutation [BKL+11]. The choice for Spongent is natural: it is particularly well-suited for hardware, and the existing third-party analysis [Abd12, ZL17, HKS18, ZBRL15] does not indicate any weakness of the Spongent family relevant for our use-case. We have used the 160-bit version of Spongent as this is the smallest possible permutation that can be used efficiently² to meet the NIST call for proposals. Bogdanov et al. [BKL+11] do not explicitly specify the number of rounds of the 160-bit version of the Spongent permutation; we opt for 80 rounds since this ensures that at least 160 S-boxes are differentially active. This is in accordance with the Spongent design strategy. Note further that this implies that the 7-bit LFSR specified in [BKL+11] should be used (with initial value 0x75) to generate the round constants for the permutation.

As before, limiting the total online complexity $\sigma$ to $2^{74}/(n/8)$ blocks, the bound of Corollary 3 is at most 1 for $p \le 2^{127}$.