Xoodyak, a lightweight cryptographic scheme

In this paper, we present Xoodyak, a cryptographic primitive that can be used for hashing, encryption, MAC computation and authenticated encryption. Essentially, it is a duplex object extended with an interface that allows absorbing strings of arbitrary length, encrypting them, and squeezing output of arbitrary length. It inherently hashes the history of all operations in its state, which allows its resistance against generic attacks to be derived from that of the full-state keyed duplex. Internally, it uses the Xoodoo[12] permutation that, with its width of 48 bytes, allows for very compact implementations. The choice of 12 rounds justifies a security claim in the hermetic philosophy: it implies that there are no shortcut attacks with higher success probability than generic attacks. The claimed security strength is 128 bits. We illustrate the versatility of Xoodyak by describing a number of use cases, including the ones requested by NIST in the lightweight competition. For those use cases, we translate the relatively detailed security claim that we make for Xoodyak into simple ones.


Introduction
Xoodyak is a versatile cryptographic object that is suitable for most symmetric-key functions, including hashing, pseudo-random bit generation, authentication, encryption and authenticated encryption. It is based on the duplex construction, and in particular on its full-state keyed duplex (FSKD) variant when it is fed with a secret key [3,12]. It is stateful and shares features with Markku Saarinen's Blinker [19], Mike Hamburg's Strobe protocol framework [13] and Trevor Perrin's Stateful Hash Objects (SHO) [17]. In practice, Xoodyak is straightforward to use and its implementation can be shared for many different use cases.
Internally, Xoodyak makes use of the Xoodoo permutation [11,10]. The design approach of this 384-bit permutation is inspired by Keccak-p [5,15], while it is dimensioned like Gimli for efficiency on low-end processors [1]. The structure consists of three planes of 128 bits each, which interact per 3-bit column through mixing and nonlinear operations, and which otherwise move as three independent rigid objects. Its round function lends itself nicely to low-end 32-bit processors as well as to compact dedicated hardware.
The mode of operation on top of Xoodoo is called Cyclist, as a lightweight counterpart to Keyak's Motorist mode [6]. It is simpler than Motorist, mainly thanks to the absence of parallel variants. Another important difference is that Cyclist is not limited to authenticated encryption, but rather offers fine-grained services, à la Strobe, and supports hashing.
Xoodyak contains several built-in mechanisms that help protect against side-channel attacks.
• Following an idea by Taha and Schaumont [23], Cyclist can absorb the session counter that serves as nonce in chunks of a few bits. This counters differential power analysis (DPA) by limiting the degrees of freedom of an attacker when exploiting a selection function, see Section 3.2.2.
• Another mechanism consists in replacing the incrementation of a counter with a key derivation mechanism: after using a secret key, a derived key is produced and saved for the next invocation of Xoodyak. The key then becomes a moving target for the attacker, see Section 3.2.6.
• Then, to mitigate the impact of recovering the internal state, e.g., after a side-channel attack, the Cyclist mode offers a ratchet mechanism, similar to the "forget" call in [3]. This mechanism offers forward secrecy and prevents the attacker from recovering the secret key prior to the application of the ratchet, see Section 3.2.5.
• Finally, the Xoodoo round function lends itself to efficient masking countermeasures against differential power analysis and similar attacks.

Notation
The set of all bit strings is denoted $\mathbb{Z}_2^*$ and ϵ is the empty string. Xoodyak works with bytes and in the sequel we assume that all strings have a length that is a multiple of 8 bits. The length in bytes of a string X is denoted |X|, which is equal to its bit length divided by 8.
We denote a sequence of m strings $X^{(0)}$ to $X^{(m-1)}$ as $X^{(m-1)} \circ \cdots \circ X^{(1)} \circ X^{(0)}$. The set of all sequences of strings is denoted $(\mathbb{Z}_2^*)^*$ and ∅ is the sequence containing no strings at all.
We denote with $\mathrm{enc}_8(x)$ a byte whose value is the integer $x \in \mathbb{Z}_{256}$.

Usage overview
Xoodyak is a stateful object. It offers two modes: the hash and the keyed modes, one of which is selected upon initialization.

Hash mode
In hash mode, it can absorb input strings and squeeze digests at will. Absorb(X) absorbs an input string X, while Squeeze(ℓ) produces an ℓ-byte output depending on the data absorbed so far. The simplest case goes as follows:

    Cyclist(ϵ, ϵ, ϵ)     {initialization in hash mode}
    Absorb(x)            {absorb string x}
    h ← Squeeze(n)       {get n bytes of output}

Here, h gets an n-byte digest of x, where n can be chosen by the user. Xoodyak offers 128-bit security against any attack, unless the attack is easier on a random oracle. To get 128-bit collision resistance, we need to set n ≥ 32 bytes (256 bits), while for a matching level of (second) preimage resistance, it is required to have n ≥ 16 bytes (128 bits). This is similar to the SHAKE128 extendable output function (XOF) [15].
More complicated cases are possible, for instance:

    Cyclist(ϵ, ϵ, ϵ)
    Absorb(x)
    Absorb(y)
    h_1 ← Squeeze(n_1)
    Absorb(z)
    h_2 ← Squeeze(n_2)

Here, $h_1$ is a digest over the two-string sequence $y \circ x$ and $h_2$ is a digest over $z \circ y \circ x$. The digest is over the sequence of strings and not just their concatenation. In this respect, we have a function that is similar to, and has the same security level as, TupleHash128 [16].
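The digest-length guidance above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration: the function name and the returned dictionary are assumptions of this example, and it simply applies the usual random-oracle bounds (half the digest length for collisions, the full digest length for preimages) capped at the claimed 128 bits.

```python
def hash_security_bits(n_bytes: int, claimed_bits: int = 128) -> dict:
    """Security strength (in bits) of an n-byte digest, capped at the claim.

    Collisions cost about 2^(8n/2) on a random oracle, (second) preimages
    about 2^(8n); the claimed strength of the function caps both.
    """
    digest_bits = 8 * n_bytes
    return {
        "collision": min(digest_bits // 2, claimed_bits),
        "preimage": min(digest_bits, claimed_bits),
        "second_preimage": min(digest_bits, claimed_bits),
    }


if __name__ == "__main__":
    for n in (16, 32):
        print(n, hash_security_bits(n))
```

With n = 16 bytes the preimage strength already reaches 128 bits while collision resistance is only 64 bits, which is why n ≥ 32 bytes is needed for 128-bit collision resistance.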

Keyed mode
In keyed mode, Xoodyak can do stream encryption, message authentication code (MAC) computation and authenticated encryption. As a first example, the following sequence produces a tag (a.k.a. MAC) on a message M:

    Cyclist(K, ϵ, ϵ)
    Absorb(M)
    T ← Squeeze(t)

The last line produces a t-byte tag, where t can be specified by the application. A typical tag length would be t = 16 bytes (128 bits).
Then, encryption is done in a stream cipher-like way, hence it requires a nonce. The obvious way to do encryption would be to call Squeeze() and use the output as a keystream. Encrypt(P) works similarly, but it also absorbs P block per block as it is being encrypted. This offers an advantage in case of nonce misuse, as the leakage is limited to one block when the two plaintexts start to differ. Hence, to encrypt plaintext P under a given nonce, we can run the following sequence:

    Cyclist(K, ϵ, ϵ)
    Absorb(nonce)
    C ← Encrypt(P)

And to decrypt, we simply replace the last line with:

    P ← Decrypt(C)

Finally, authenticated encryption can be achieved by combining the previous sequences. For instance, to encrypt plaintext P under a given nonce and associated data A, we proceed as follows:

    Cyclist(K, ϵ, ϵ)
    Absorb(nonce)
    Absorb(A)
    C ← Encrypt(P)
    T ← Squeeze(t)

We attach a fairly precise, yet involved, security claim to Xoodyak in keyed mode. In addition, we provide clear corollaries with the resulting security strength for specific use cases. Here are two examples, which both assume a single secret key made of κ = 128 uniformly and independently distributed bits.
• First, let us take the MAC computation at the beginning of this section. It does not enforce the use of a nonce, hence an adversary gets more power in exploiting adaptive queries. Yet, this authentication scheme can resist an adversary with up to $2^{128}$ computational complexity and up to $2^{64}$ data complexity (measured in blocks).
• Then, we discuss the last example of this section, namely the authenticated encryption scheme. We assume an application that correctly implements nonces and that does not release unverified decrypted ciphertexts. The use of nonces makes Xoodyak resist even stronger adversaries. Our claim implies that this nonce-based authenticated encryption scheme can resist an adversary with up to $2^{128}$ computational complexity and up to $2^{160}$ data complexity. Furthermore, the key size κ can be increased up to about 180 bits and the computational complexity limit follows $2^{\kappa}$, still with a data complexity of $2^{160}$.

Advantages and limitations
The advantages of Xoodyak are the following.
• It is compact: it only requires a 48-byte state and some input and output pointers. The underlying duplex construction allows for bytes that arrive to be immediately integrated into the state without the need for a message queue.
• It foresees protections against side-channel attacks.
  - It offers leakage resilience. During a session, the secret key is a moving target, as there is no fixed key. In between sessions, it foresees a mechanism to roll keys.
  - If the same key must be used many times, one can easily add protection against implementation attacks. The degree-2 round function of Xoodoo makes masking and threshold schemes relatively cheap.
• Its specifications are short and simple, while supporting all symmetric crypto operations with a security strength of 128 bits.
• Its mode offers great flexibility and can be adapted to the specific needs of an application. For instance, it supports sessions and intermediate tags in authenticated encryption in a transparent way. Intermediate tags reduce the buffer needed at the receiving end to store the plaintext before the tag is checked.
• It considers the security against multi-target attacks in the design.
• It relies on a strong and efficient permutation.
  - Xoodoo is based on the same principles as Keccak-p and hence its propagation properties are well understood.
  - Xoodoo has an exceptionally good security strength build-up per operation count. This is visible in the diffusion properties and trail bounds.
• In case of misuse (i.e., nonce misuse or release of unverified decrypted ciphertexts), the key cannot be retrieved by cryptanalysis. Authentication does not rely on a nonce.

It has the following limitations:
• It is inherently serial at construction level.
• It does stream encryption so accidental nonce re-use may result in a leakage of up to 24 bytes of plaintext.

Specifications
Xoodyak is an instance of the Cyclist mode of operation on top of the Xoodoo permutation. We start with the definition of the permutation in Section 2.1. Then, in Section 2.2, we present the mode of operation. Finally, in Section 2.3, we define Xoodyak and its associated security claim.

The Xoodoo permutation
Xoodoo is a family of permutations parameterized by its number of rounds $n_r$ and denoted Xoodoo[$n_r$]. Xoodoo has a classical iterated structure: it iteratively applies a round function to a state. The state consists of 3 equally sized horizontal planes, each one consisting of 4 parallel 32-bit lanes. Similarly, the state can be seen as a set of 128 columns of 3 bits, arranged in a 4 × 32 array. The planes are indexed by y, with plane y = 0 at the bottom and plane y = 2 at the top. Within a lane, we index bits with z. The lanes within a plane are indexed by x, so the position of a lane in the state is determined by the two coordinates (x, y). The bits of the state are indexed by (x, y, z) and the columns by (x, z). Sheets are the arrays of three lanes on top of each other and they are indexed by x. The Xoodoo state is illustrated in Figure 1.
The permutation consists of the iteration of a round function $R_i$ that has 5 steps: a mixing layer θ, a plane shifting $\rho_{\mathrm{west}}$, the addition of round constants ι, a non-linear layer χ and another plane shifting $\rho_{\mathrm{east}}$.
We specify Xoodoo in Algorithm 1, completely in terms of operations on planes, using the notational conventions we specify in Table 1. We illustrate the step mappings in a series of figures: the χ operation in Figure 2, the θ operation in Figure 3, and the $\rho_{\mathrm{east}}$ and $\rho_{\mathrm{west}}$ operations in Figure 4.
The round constants $C_i$ are planes with a single non-zero lane at x = 0, denoted $c_i$. We specify the value of this lane for indices −11 to 0 in Table 2 and refer to Appendix A for the specification of the round constants for any index.
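The five steps of the round function can be sketched in Python as follows. This is an illustrative sketch only, not a reference implementation: the shift offsets and the twelve round constants are those of the published Xoodoo specification as recalled here, and should be checked against Algorithm 1 and Table 2. The function names and the list-of-lanes state layout (three planes of four 32-bit integers) are assumptions made for this example.

```python
MASK = 0xFFFFFFFF

# Lane values c_i for i = -11 .. 0 (assumption: recalled from the published
# specification; verify against Table 2 before relying on them).
ROUND_CONSTANTS = [
    0x00000058, 0x00000038, 0x000003C0, 0x000000D0,
    0x00000120, 0x00000014, 0x00000060, 0x0000002C,
    0x00000380, 0x000000F0, 0x000001A0, 0x00000012,
]


def rotl32(value, amount):
    """Rotate a 32-bit lane to the left by `amount` bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK


def shift_plane(plane, t, v):
    """Cyclic shift moving the bit at (x, z) to (x + t, z + v)."""
    return [rotl32(plane[(x - t) % 4], v) for x in range(4)]


def xor_planes(a, b):
    return [p ^ q for p, q in zip(a, b)]


def and_not(a, b):
    """Lane-wise (NOT a) AND b, as used by the chi step."""
    return [(~p & q) & MASK for p, q in zip(a, b)]


def xoodoo_round(state, rc):
    """One round: theta, rho_west, iota, chi, rho_east (in this order)."""
    a0, a1, a2 = state
    # theta: column parity mixing
    p = xor_planes(xor_planes(a0, a1), a2)
    e = xor_planes(shift_plane(p, 1, 5), shift_plane(p, 1, 14))
    a0, a1, a2 = xor_planes(a0, e), xor_planes(a1, e), xor_planes(a2, e)
    # rho_west: plane shifts
    a1 = shift_plane(a1, 1, 0)
    a2 = shift_plane(a2, 0, 11)
    # iota: round constant added to lane x = 0 of plane y = 0
    a0 = [a0[0] ^ rc] + a0[1:]
    # chi: degree-2 non-linear layer on 3-bit columns
    b0, b1, b2 = and_not(a1, a2), and_not(a2, a0), and_not(a0, a1)
    a0, a1, a2 = xor_planes(a0, b0), xor_planes(a1, b1), xor_planes(a2, b2)
    # rho_east: plane shifts
    a1 = shift_plane(a1, 0, 1)
    a2 = shift_plane(a2, 2, 8)
    return [a0, a1, a2]


def xoodoo(state, rounds=12):
    """Apply the last `rounds` round constants, ending with index 0."""
    for rc in ROUND_CONSTANTS[len(ROUND_CONSTANTS) - rounds:]:
        state = xoodoo_round(state, rc)
    return state
```

Representing each plane as four 32-bit integers keeps every step a handful of word-wise XOR, AND, NOT and rotate operations, which mirrors why the round function suits 32-bit processors.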
Finally, in many applications the state must be specified as a 384-bit string s with the bits indexed by i. The mapping between the three-dimensional indexing (x, y, z) and i is given by i = z + 32(x + 4y).
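As a small illustration of this mapping, the following Python sketch converts between (x, y, z) and i and back. The function names are assumptions of this example; only the formula i = z + 32(x + 4y) and the ranges 0 ≤ x < 4, 0 ≤ y < 3, 0 ≤ z < 32 come from the text.

```python
def bit_index(x: int, y: int, z: int) -> int:
    """Map state coordinates (x, y, z) to the bit index i = z + 32(x + 4y)."""
    if not (0 <= x < 4 and 0 <= y < 3 and 0 <= z < 32):
        raise ValueError("coordinates out of range")
    return z + 32 * (x + 4 * y)


def coordinates(i: int) -> tuple:
    """Inverse mapping: recover (x, y, z) from the bit index i."""
    if not 0 <= i < 384:
        raise ValueError("bit index out of range")
    lane, z = divmod(i, 32)   # lane number = x + 4y
    y, x = divmod(lane, 4)
    return x, y, z


if __name__ == "__main__":
    print(bit_index(0, 0, 0), bit_index(3, 2, 31))
    print(coordinates(383))
```

The mapping places the four lanes of plane y = 0 first, followed by those of planes y = 1 and y = 2.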
Here $R_i$ is specified by the following sequence of steps: θ, $\rho_{\mathrm{west}}$, ι, χ, $\rho_{\mathrm{east}}$.

The Cyclist mode of operation
The Cyclist mode of operation relies on a cryptographic permutation and yields a stateful object to which the user can make calls. It is parameterized by the permutation f, by the block sizes $R_{\mathrm{hash}}$, $R_{\mathrm{kin}}$ and $R_{\mathrm{kout}}$, and by the ratchet size $\ell_{\mathrm{ratchet}}$, all in bytes. $R_{\mathrm{hash}}$, $R_{\mathrm{kin}}$ and $R_{\mathrm{kout}}$ specify the block sizes of the hash mode and of the input and output in keyed mode, respectively. As Cyclist uses up to 2 bytes for frame bits, we require that $\max(R_{\mathrm{hash}}, R_{\mathrm{kin}}, R_{\mathrm{kout}}) + 2 \le b'$, where $b'$ is the permutation width in bytes.
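The constraint on the block sizes is easy to check mechanically. The sketch below is a hypothetical helper, not part of the specification; the example values are the 48-byte width and the block sizes of 16 and 24 bytes given elsewhere in this paper, plus a keyed input block size of 44 bytes, which is an assumption recalled from the published Xoodyak parameters.

```python
def block_sizes_fit(width_bytes: int, r_hash: int, r_kin: int, r_kout: int,
                    frame_bytes: int = 2) -> bool:
    """Check that max(R_hash, R_kin, R_kout) + 2 <= b' (all sizes in bytes)."""
    return max(r_hash, r_kin, r_kout) + frame_bytes <= width_bytes


if __name__ == "__main__":
    # Assumed Xoodyak-like parameters: b' = 48, R_hash = 16, R_kin = 44, R_kout = 24.
    print(block_sizes_fit(48, 16, 44, 24))
    # A keyed input block of 47 bytes would leave no room for the frame bits.
    print(block_sizes_fit(48, 16, 47, 24))
```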
Upon initialization with Cyclist(K, id, counter), the Cyclist object starts either in hash mode if K = ϵ or in keyed mode otherwise. In the latter case, the object takes the secret key K together with its (optional) identifier id, then absorbs a counter in a trickled way if counter ≠ ϵ. In the former case, it ignores the initialization parameters. Note that, unlike Strobe, there is no way to switch from hash to keyed mode, although we might extend Cyclist this way in the future.
The available functions depend on the mode the object is started in: the functions Absorb() and Squeeze() can be called in both hash and keyed modes, whereas the functions Encrypt(), Decrypt(), SqueezeKey() and Ratchet() are restricted to the keyed mode. The purpose of each function is as follows:
• Absorb(X) absorbs an input string X;
• C ← Encrypt(P) enciphers P into C and absorbs P;
• P ← Decrypt(C) deciphers C into P and absorbs P;
• Y ← Squeeze(ℓ) produces an ℓ-byte output that depends on the data absorbed so far;
• Y ← SqueezeKey(ℓ) works like Y ← Squeeze(ℓ) but in a different domain, for the purpose of generating a derived key;
• Ratchet() transforms the state in an irreversible way to ensure forward secrecy.
The state of a Cyclist object will depend on the sequence of calls to it and on its inputs. More precisely, the intention is that any output depends on the sequence of all input strings and of all input calls (i.e., Absorb(), Encrypt() and Decrypt()) so far, and that any two subsequent output calls (i.e., Squeeze() and SqueezeKey()) generate strings from different domains. It does not only depend on the concatenation of input strings, but also on their boundaries without ambiguity. For instance, a call to Absorb(X) means the output will depend on $X \circ \mathrm{Absorb}$, while a call to Encrypt(P) will make the output depend also on $P \circ \mathrm{Crypt}$. However, some dependency comes as a side-effect of other design criteria, like minimizing the memory footprint. As a result, the state also depends on the number of blocks in the previous calls to Squeeze() and the previously processed plaintext blocks in Encrypt() or Decrypt().
Together, everything that influences the output of a Cyclist object, as returned by Squeeze(), SqueezeKey() or as keystream produced by Encrypt(), is captured by the process history, see Definition 1 below. When in keyed mode, the output also depends on the secret key absorbed upon initialization, although the key is not part of the process history itself. This ensures the security claim can be properly expressed in an indistinguishability setting where the adversary has full control over the process history but not over the secret key, see Claim 2.
Table 3: Symbols and strings appended to the process history (separate columns for hash mode and keyed mode).
Definition 1. At initialization of the Cyclist object, the history is initialized to ∅. Then, each call to the Cyclist object appends symbols and strings according to Table 3.
In addition, the process history is updated with the R kout -byte blocks of plaintext as they are processed by Encrypt() or Decrypt().
The Cyclist mode of operation is defined in Algorithms 2 and 3.

Xoodyak and its claimed security
We instantiate Xoodyak in Definition 2 and attach to it security Claims 1 and 2.
Definition 2. Xoodyak is Cyclist[f, $R_{\mathrm{hash}}$, $R_{\mathrm{kin}}$, $R_{\mathrm{kout}}$, $\ell_{\mathrm{ratchet}}$] with
• f = Xoodoo[12] of width 48 bytes (or b = 384 bits)
• $R_{\mathrm{hash}}$ = 16 bytes
• $R_{\mathrm{kin}}$ = 44 bytes
• $R_{\mathrm{kout}}$ = 24 bytes
• $\ell_{\mathrm{ratchet}}$ = 16 bytes
Claim 1. The success probability of any attack on Xoodyak in hash mode shall not be higher than the sum of that for a random oracle and $N^2/2^{255}$, with N the attack complexity in calls to Xoodoo[12] or its inverse. We exclude from the claim weaknesses due to the mere fact that the function can be described compactly and can be efficiently executed, e.g., the so-called random oracle implementation impossibility [14], as well as properties that cannot be modeled as a single-stage game [18].
This means that Xoodyak hashing has essentially the same claimed security as, e.g., SHAKE128 [15].
Claim 2. Let $K = (K_0, \ldots, K_{u-1})$ be an array of u secret keys, each uniformly and independently chosen from $\mathbb{Z}_2^{\kappa}$ with κ ≤ 256 and κ a multiple of 8. Then, the advantage of distinguishing the array of Xoodyak objects after initialization with Cyclist($K_i$, ·, ·) with $i \in \mathbb{Z}_u$ from an array of random oracles RO(i, h), where $h \in (\mathbb{Z}_2^* \cup S)^*$ is a process history, is at most
$$\frac{q_{iv} N}{2^{\kappa}} + \frac{N}{2^{184}} + \frac{(L+\Omega)N + \binom{L+\Omega+1}{2}}{2^{192}} + \frac{M^2}{2^{382}} + \frac{Mq}{2^{\min(192+\kappa,\,384)}}. \qquad (1)$$
Here we follow the notation of the generic security bound of the FSKD [12], namely:
• N is the computational complexity expressed in the (computationally equivalent) number of executions of Xoodoo[12].
• M is the online or data complexity expressed in the total number of input and output blocks processed by Xoodyak.
• q ≤ M is the total number of initializations in keyed mode.
• Ω ≤ M is the number of blocks, in keyed mode, that overwrite the outer state and for which the adversary gets a subsequent output block. In particular, this counts the number of blocks processed by Decrypt(·) for which the adversary can also get the corresponding keystream value or other subsequent output (e.g., in the case of the release of unverified decrypted ciphertext in authenticated encryption). It also counts the number of calls to Ratchet() followed by Squeeze(ℓ) or SqueezeKey(ℓ) with ℓ ≠ 0.
• L ≤ M is the number of blocks, in keyed mode, for which the adversary knows the value of the outer state from a previous query and can choose the input block value (e.g., in the case of authentication without a nonce, or of authenticated encryption with nonce repetition). This includes the number of times a call to Absorb() follows a call to Squeeze(ℓ) or to SqueezeKey(ℓ) with ℓ ≠ 0.
• $q_{iv} \le u$ is the maximum number of keys that are used with the same id.
Claims 1 and 2 ensure Xoodyak has 128 bits of security both in hash and keyed modes (assuming κ ≥ 128). The data complexity depends on the values of q, Ω and L, for which we will see concrete examples in Section 3. Given that they are bounded by M, Xoodyak resists a data complexity of up to $2^{64}$ blocks, as the probability in Eq. (1) is negligible as long as $N \ll 2^{128}$ and $M \ll 2^{64}$. In the particular case of L + Ω = 0, it resists even higher data complexities, as the probability remains negligible also when $M \ll 2^{160}$.
The parameter $q_{iv}$ relates to the possible security degradations in the case of multi-target attacks, as an exhaustive key search would erode security by $\log_2 q_{iv} \le \log_2 u$ bits in this case. However, when the protocol ensures $q_{iv} = 1$, there is no degradation and the security remains at min(128, κ) bits even in the case of multi-target attacks.
A rationale for the security claim is given in Section 4.

Using Xoodyak
Xoodyak, as a Cyclist object, can be started in hash mode and therefore used as a hash function. Alternatively, one can start Xoodyak in keyed mode and, e.g., use it as a deck function or for duplex-like session authenticated encryption. In this section, we cover use cases in this order: first in hash mode, then in keyed mode, then some combination of both.

Hash mode
As already mentioned, Xoodyak can be used as a hash function. More generally, it can serve as an extendable output function (XOF), the generalization of a hash function with arbitrary output length. To get an n-byte digest of some input x, one can use Xoodyak as follows:

    Cyclist(ϵ, ϵ, ϵ)
    Absorb(x)
    Squeeze(n)

This sequence is the nominal sequence for using Xoodyak as an XOF. Its security is summarized in the following Corollary.
Xoodyak can also naturally implement a dec function and process a sequence of strings. Here the output depends on the sequence as such and not just on the concatenation of the different strings and, in this sense, it is similar to TupleHash [16]. To compute an n-byte digest over the sequence $x_3 \circ x_2 \circ x_1$, one does:

    Cyclist(ϵ, ϵ, ϵ)
    Absorb(x_1)
    Absorb(x_2)
    Absorb(x_3)
    Squeeze(n)

An XOF can be implemented in a stateful manner and can come with an interface that allows for requesting more output bits. This is the so-called extendable output feature, and for Cyclist this is provided quite naturally by the Squeeze() function.
Here, some care must be taken for interoperability: to support use cases such as the one in Section 3.2.4, Cyclist considers squeezing calls as being in distinct domains. This means that, for a Cyclist object with some given history, the n + m bytes returned by Squeeze(n) || Squeeze(m) and by Squeeze(n + m) will be the same in the first n bytes and differ in the last m bytes. If an extendable output is required without this feature, an interface can be built to allow incremental squeeze calls. For instance, an interface SqueezeMore() would behave such that calling Squeeze(n) followed by SqueezeMore(m) is equivalent to calling Squeeze(n + m) in the first place.

Keyed mode
In keyed mode, Xoodyak can naturally implement a deck function, although we focus instead on duplex-based ways to perform authentication and (authenticated) encryption.
To use Xoodyak as a keyed object, one starts it with Cyclist(K, id, counter) where K is a secret key with a fixed length of κ bits. We first show how to use the id and counter parameters, to counteract multi-target attacks and to handle the nonce, then discuss various kinds of authenticated encryption use cases.

Two ways to counteract multi-target attacks
The id is an optional key identifier. It offers one of two ways to counteract multi-target attacks.
In a multi-target attack, the adversary is not interested in breaking a specific device or key, but in breaking any device or key from a (possibly large) set. If there are u keys in a system, the security can degrade by up to $\log_2 u$ bits in such a case [8]. Claim 2 reflects this in the term $\frac{q_{iv}N}{2^{\kappa}} \le \frac{N}{2^{\kappa - \log_2 u}}$, as $q_{iv} \le u$. Let us assume that we wish to target a security strength level of 128 bits including multi-target attacks. Xoodyak can achieve this in two ways.
• We extend the length of the secret key. By setting $\kappa = 128 + \log_2 u$, the term $\frac{q_{iv}N}{2^{\kappa}}$ is at most $\frac{N}{2^{128}}$.
• We make the key identifier id globally unique among the u keys and therefore ensure that $q_{iv} = 1$. Then, there is no degradation for exhaustive key search in a multi-target setting, and the key size can be equal to the target security strength level, so κ = 128 in this example.
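The two options can be compared numerically. The following Python sketch is a hypothetical helper (its name and its rounding of κ up to a whole number of bytes, following the requirement in Claim 2 that κ be a multiple of 8, are choices made for this example); it returns the key length needed to keep the exhaustive-search term at the target strength.

```python
import math


def key_bits_for_target(target_bits: int, num_keys: int, unique_ids: bool) -> int:
    """Key length (bits, multiple of 8) keeping q_iv * N / 2^kappa <= N / 2^target.

    With globally unique identifiers q_iv = 1 and no extension is needed;
    otherwise q_iv can be as large as u = num_keys, costing log2(u) bits.
    """
    if num_keys < 1:
        raise ValueError("num_keys must be at least 1")
    extra = 0 if unique_ids else math.ceil(math.log2(num_keys))
    bits = target_bits + extra
    return -(-bits // 8) * 8  # round up to a whole number of bytes


if __name__ == "__main__":
    print(key_bits_for_target(128, 2**32, unique_ids=False))
    print(key_bits_for_target(128, 2**32, unique_ids=True))
```

For u = 2^32 keys, the first option calls for a 160-bit key, whereas a globally unique id keeps the key at 128 bits.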

Three ways to handle the nonce
The counter parameter of Cyclist() is a data element in the form of a byte string that can be incremented. It is absorbed in a trickled way, one digit at a time, so as to limit the number of power traces an attacker can take with distinct inputs [23]. At the upper level, the user or protocol designer fixes a base 2 ≤ B ≤ 256 and assumes that the counter is a string in $\mathbb{Z}_B^*$. A possible way to go through all the possible strings in $\mathbb{Z}_B^*$ is as follows. First, the counter is initialized to the empty string. Then, as the counter is incremented, it takes all the possible strings in $\mathbb{Z}_B^1$, then all the possible strings in $\mathbb{Z}_B^2$, and so on. The counter shall be absorbed starting with the most significant digits. This allows caching the state after absorbing part of the counter, as the first digits absorbed will change the least often. The smaller the value of B, the smaller the number of possible inputs at each iteration of the permutation, and hence the better the protection against power analysis attacks and variants.
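One possible enumeration of the counter strings described above can be sketched in Python as follows. The generator name is an assumption of this example, and the encoding of each digit as one byte, most significant digit first, is one reading of the text rather than a normative encoding.

```python
from itertools import count, islice, product


def counter_strings(base: int):
    """Yield all strings over Z_B: the empty string, then length 1, 2, ...

    Each digit is stored in one byte and the first byte is the most
    significant digit, so the digits absorbed first change least often.
    """
    if not 2 <= base <= 256:
        raise ValueError("base must satisfy 2 <= B <= 256")
    for length in count(0):
        for digits in product(range(base), repeat=length):
            yield bytes(digits)


if __name__ == "__main__":
    for value in islice(counter_strings(2), 7):
        print(value.hex() or "(empty)")
```

With B = 2 each absorbed digit takes only two values, so an attacker observing one permutation call sees at most two distinct inputs for a given preceding state.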
This method of absorbing a nonce, as a counter absorbed in a trickled way, is desirable in situations where protection against power analysis attacks matters. Otherwise, the nonce can be absorbed at once with Absorb(nonce) just after Cyclist(K, id, ϵ).
Finally, a third method consists in integrating the nonce with the id parameter. If id is a global nonce, i.e., it is unique among all the keys used in the system, this also ensures $q_{iv} = 1$ as explained above.

Authenticated encryption
We propose using Xoodyak for authenticated encryption as follows. To encrypt a plaintext P under a given nonce and associated data A under key K with identifier id, and to get a tag of t = 16 bytes, we make the following sequence of calls:

    Cyclist(K, id, ϵ)
    Absorb(nonce)
    Absorb(A)
    C ← Encrypt(P)
    T ← Squeeze(t)
    return (C, T)

To decrypt (C, T), we proceed similarly:

    Cyclist(K, id, ϵ)
    Absorb(nonce)
    Absorb(A)
    P ← Decrypt(C)
    T′ ← Squeeze(t)
    if T = T′ then return P else return error

If the nonce is not repeated and if the decryption does not leak unverified decrypted ciphertexts, then we have L = Ω = 0 here, see Claim 2. The resulting simplified security claim is given in the following corollary.

Session authenticated encryption
Session authenticated encryption works on a sequence of messages and the tag authenticates the complete sequence of messages received so far. Starting from the sequence in Section 3.2.3, we add the support for messages $(A_i, P_i)$, where $A_i$, $P_i$ or both can be empty.

    Cyclist(K, id, ϵ)
    Absorb(nonce)
    Absorb(A_1)
    C_1 ← Encrypt(P_1)
    T_1 ← Squeeze(t)
    Absorb(A_2)
    C_2 ← Encrypt(P_2)
    T_2 ← Squeeze(t)
    Absorb(A_3)
    T_3 ← Squeeze(t)
    C_4 ← Encrypt(P_4)
    T_4 ← Squeeze(t)
    T_5 ← Squeeze(t)

In this example, $T_2$ authenticates $(A_2, P_2) \circ (A_1, P_1)$. The third message has no plaintext, the fourth message has no associated data, and the fifth message is empty. In such a sequence, the convention is that the call to Squeeze() ends a message. Since it appears in the processing history, there is no ambiguity on the boundaries of the messages even if some of the elements (or both) are empty.
The use of empty messages may be clearer in the case of a session shared by two (or more) communicating devices, where each device takes a turn. A device may have nothing to say and so skips its turn by just producing a tag.
To relate to Claim 2, we have to determine L by counting the number of invocations of Absorb() that follow Squeeze(). If the nonce is not repeated and if the decryption does not leak unverified decrypted ciphertexts, we have L = T − q, with T the number of messages processed (or tags produced), and Ω = 0.

Ratchet
At any time in keyed mode, the user can call Ratchet(). This causes part of the state to be overwritten with zeroes, thereby making it computationally infeasible to compute the state value before the call to Ratchet().
In an authenticated encryption scheme, the call to Ratchet() can typically be inserted either just before producing the tag or just after. The advantage of calling it just before the tag is that it is most efficient: it requires only one extra call to the permutation f. An advantage of calling it just after the tag is that its processing can be done asynchronously, while the ciphertext is being transmitted and the device waits for the next message. Unless Ratchet() is the last call, the number of calls to it must be counted in Ω.

Rolling subkeys
As an alternative to using a long-term secret key together with its associated nonce that is incremented at each use, Cyclist offers a mechanism to derive a subkey via the SqueezeKey() call. On an encrypting device, one can therefore replace the process of incrementing and storing the updated nonce at each use of the long-term secret key with the process of updating a rolling subkey: Here $\ell_{\mathrm{sub}}$ should be chosen large enough to avoid collisions, say $\ell_{\mathrm{sub}}$ = 32 bytes (256 bits). Assuming that there are no collisions in the subkeys, L = 0 and Ω is the number of calls to Ratchet().
Using Cyclist this way offers resilience against side-channel attacks, as the long-term key is not exposed any more and can even be discarded as soon as the first subkey is derived. The key to attack becomes a moving target, just like the state in session authenticated encryption.

Nonce reuse and release of unverified decrypted ciphertext
The authenticated encryption schemes presented in this section assume that the nonce is unique per session, namely that the value is used only once per secret key. They also assume that an implementation returns only an error when receiving an invalid cryptogram and in particular does not release the decrypted ciphertext if the tag is invalid. If these two assumptions are satisfied, we refer to this as the nominal case; otherwise, we call it the misuse case.
In the misuse case security degrades and hence we strongly advise implementers and users to respect the nonce requirement at all times and never release unverified decrypted ciphertext. We detail security degradation in the following paragraphs.
A nonce violation in general breaks confidentiality of part of the plaintext. In particular, two sessions that have the same key and the same process history (i.e., the same K, id, counter and the same sequence of associated data and plaintexts) will result in the same output (ciphertext, tag). We call such a pair of sessions in-sync. Clearly, in-sync sessions leak equality of inputs and hence also of plaintexts. As soon as in-sync sessions get different input blocks, they lose synchronicity. If these input blocks are plaintext blocks, the corresponding ciphertext blocks leak the bitwise difference of the corresponding plaintext blocks (of $R_{\mathrm{kout}}$ = 24 bytes). We call this the nonce-misuse leakage.
Release of unverified decrypted ciphertext also has an impact on confidentiality as it allows an adversary to harvest keystream that may be used in the future by legitimate parties. An adversary can harvest one keystream block at each attempt.
Nonce violation and release of unverified decrypted ciphertext have no consequences for integrity and do not put the key in danger for Xoodyak. This is formalized in Corollary 3.

Corollary 3. Assume that (1) Xoodyak satisfies Claim 2;
(2) this authenticated encryption scheme is fed with a single κ-bit key with κ ≤ 192. Then, except for nonce-misuse leakage and keystream harvesting, it can be distinguished from an ideal scheme with an advantage whose dominating terms are: This translates into the following security strength levels assuming a t-byte tag (the complexities are in bits):

                                              computation        data
    plaintext confidentiality (nominal case)  min(128, κ, 8t)    64
    plaintext confidentiality (misuse case)   -                  -
    plaintext integrity                       min(128, κ, 8t)    64
    associated data integrity                 min(128, κ, 8t)    64
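The strength levels in this table follow directly from the key length κ and the tag length t. As an illustration, the Python sketch below tabulates them; the function name, the dictionary layout and the use of None for the entries marked "-" are assumptions of this example.

```python
def ae_strength_bits(kappa: int, tag_bytes: int, nominal: bool = True) -> dict:
    """Security strength as (computation, data) in bits, per the table above.

    In the misuse case no plaintext confidentiality level is claimed,
    which is represented here by None.
    """
    if kappa > 192:
        raise ValueError("the corollary assumes kappa <= 192")
    level = (min(128, kappa, 8 * tag_bytes), 64)
    return {
        "plaintext confidentiality": level if nominal else None,
        "plaintext integrity": level,
        "associated data integrity": level,
    }


if __name__ == "__main__":
    print(ae_strength_bits(128, 16))
    print(ae_strength_bits(128, 8, nominal=False))
```

A short tag is the limiting factor in the second call: with t = 8 bytes the computational strength drops to 64 bits, whatever the key length.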

Authenticated encryption with a common secret
A key exchange protocol, such as Diffie-Hellman or a variant, results in a common secret that usually requires further derivation before being used as a symmetric secret key. To do this with a Cyclist object, we can use an object in hash mode, process the common secret, and use the derived key in a new object that we start in keyed mode. For example: Note that if $\ell \le R_{\mathrm{hash}}$, an implementation can efficiently chain $K_D$ ← Squeeze(ℓ) and the subsequent reinitialization Cyclist($K_D$, ϵ, ϵ). Since $K_D$ is located in the outer part of the state, it needs only to set the rest of the state to the appropriate value before calling f. Note also that if at least one of the public key pairs is ephemeral, the common secret $K_{AB}$ is used only once and no nonce is needed.

Design rationale
In this section, we give the design rationale of Xoodyak. First, we give the general strategy. Then, we report on the generic security of the Cyclist mode and relate it to Xoodyak's security claim. Finally, we highlight the properties of the Xoodoo[12] permutation.

Design strategy
Xoodyak connects a mode of operation, namely Cyclist, to a permutation, namely Xoodoo[12]. The design strategy is hermetic in the following sense: we chose the number of rounds in Xoodoo such that the best attacks on Xoodyak are (claimed to be) the generic attacks on the Cyclist mode. This is visible in Claims 1 and 2, as they replicate the best known security bounds of the sponge and keyed duplex constructions. In contrast, a non-hermetic strategy would keep some buffer between the claimed security level and the generic attacks.
Note that the strategy behind Xoodyak differs from the so-called "hermetic sponge strategy" [5]. Putting aside definitional issues, the hermetic sponge strategy described in [5] targets the absence of distinguishers in an absolute sense, whereas we only consider the security of the resulting function Xoodyak. Hence we do not claim that Xoodoo[12] is free of distinguishers, only that it is strong enough when plugged into Cyclist.

Generic security and the security claim
We now give more details to relate the security of the sponge and keyed duplex constructions to Xoodyak's security claim.

Xoodyak in hash mode
In hash mode, Cyclist can be reduced to the sponge construction with a capacity of $c = b - 8R_{\mathrm{hash}} - 2$, so c = 254 bits in the case of Xoodyak. In addition to the contents of the input blocks of $R_{\mathrm{hash}}$ bytes prepared by Absorb(), variable frame bits can be added at only two additional positions (see Down() in Algorithm 3), hence accounting for a reduction of 2 bits in the capacity. We then make a flat sponge claim [2] with claimed capacity equal to c, hence accounting for a success probability of $N^2/2^{c+1} = N^2/2^{255}$.
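The capacity arithmetic above can be checked with a few lines of Python. This is an illustrative sketch only: the names are ours, $R_{\mathrm{hash}}$ = 16 bytes is inferred from c = 254 with b = 384, and the term $N^2/2^{c+1}$ is the usual flat sponge claim term, which we assume here.

```python
B_BITS = 384        # width b of the Xoodoo permutation, in bits
R_HASH_BYTES = 16   # inferred from c = 254 (assumption, see lead-in)


def hash_capacity_bits(b_bits: int = B_BITS,
                       r_hash_bytes: int = R_HASH_BYTES) -> int:
    """c = b - 8*R_hash - 2: two bits are lost to the frame bits."""
    return b_bits - 8 * r_hash_bytes - 2


def log2_flat_sponge_term(log2_n: float, capacity_bits: int) -> float:
    """log2 of the assumed generic term N^2 / 2^(c+1), with N = 2^log2_n."""
    return 2 * log2_n - (capacity_bits + 1)


c = hash_capacity_bits()
term_at_2_64 = log2_flat_sponge_term(64, c)   # attack complexity N = 2^64
```

With N = 2^64 calls to the permutation, the generic term evaluates to 2^-127 under these assumptions.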

Xoodyak in keyed mode
When in keyed mode, Cyclist can be rephrased in terms of calls to the FSKD with $c = b - 8R_{\mathrm{kout}}$, so c = 192 bits in the case of Xoodyak. A crucial property of the FSKD is that each duplexing call starts with applying the permutation f, then generates a block of output and finally adds an input block to the outer state. In the language of Cyclist, a duplexing call translates into a sequence of Up() followed by Down(X). This cycle is exactly an iteration of Encrypt(), where the plaintext block is given before the corresponding keystream block is output, so an iteration of Encrypt() directly translates to a call to FSKD. A similar comment applies to AbsorbAny(), possibly except the first iteration. However, a call to SqueezeAny() always ends with Up(), without knowing what the next input will be. To simulate this and properly remap it to the FSKD setting [12], we can say that in that case the Cyclist object gets its output block by making a duplexing call with an arbitrary input block. When the actual input block becomes known, it restarts the whole FSKD object with the same queries, but this time with the correct input block. This is like re-doing a query with the same path and is accounted for in L each time it happens.
The different terms of Claim 2's Eq. (1) stem from the security bound in the FSKD paper [12], which we now detail.
• [...] are present as is in Eq. (1).
• $H_{\min}(D_K) = H_{\mathrm{coll}}(D_K) = \kappa$ in our setting, so [...] $H_{\min}(D_K)$ [...] as $uN/2^{\kappa}$ (since $q_{\mathrm{iv}} \le u$), and [...]

Decodability
In this section, we show that, given the sequence of b-bit blocks that are added to the state between each call to f, one can recover the process history of the Xoodyak object, together with the secret key if in keyed mode. First let us observe that any sequence of calls to the Cyclist object is translated internally into an alternating sequence of Down($X_i$, $c_D$) and Up($|Y_i|$, $c_U$) steps. The first step is the internal input step that takes a message block $X_i$, applies a simple reversible padding to it and injects the result into the state, complemented optionally by a color byte $c_D$, i.e., a byte that performs domain separation between the different operations. The second step is the internal output step, which first optionally injects a color byte $c_U$ into the state, applies the permutation f and then produces the requested number of bytes as output. Since the parts of the state that these two steps deal with do not overlap, and since each input block $X_i$ is padded in a reversible way, it is straightforward to extract from the b-bit block sequence the corresponding calls to Down() and Up() along with their parameters $X_i$, $c_D$ and $c_U$. We ignore the output length parameter $|Y_i|$, which is not necessary for decodability.
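To make the notion of reversible padding concrete, here is a minimal Python sketch. The exact layout is an assumption for illustration (a 0x01 delimiter after the block, zero fill, and the color byte in the last byte of a 48-byte block), and the names `pad_block` and `unpad_block` are hypothetical.

```python
B_BYTES = 48  # b = 384 bits


def pad_block(x: bytes, color: int = 0) -> bytes:
    """Build the b-bit block added to the state: X || 0x01 || 0x00* || color."""
    if len(x) > B_BYTES - 2:
        raise ValueError("block too long for this layout")
    return x + b"\x01" + bytes(B_BYTES - 2 - len(x)) + bytes([color])


def unpad_block(block: bytes) -> tuple[bytes, int]:
    """Recover (X, color) from a padded block."""
    if len(block) != B_BYTES:
        raise ValueError("expected a full b-bit block")
    color = block[-1]
    body = block[:-1]
    # Only zero bytes follow the delimiter, so it is the last 0x01 in body.
    end = body.rindex(b"\x01")
    return body[:end], color


padded = pad_block(b"abc", color=0x03)
recovered = unpad_block(padded)
```

Because the delimiter is always followed only by zero bytes, blocks that themselves end in 0x01 or in zero bytes are still recovered unambiguously.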
In general, each Cyclist call starts with a first colored step and continues with zero, one or several uncolored steps. One can use this property to easily detect where each call starts in the alternating sequence of Down() and Up() steps. There are a few exceptions to this color property that we detail now.
The most notable exception is in hash mode, where none of the Squeeze() steps are colored. If there are Down() steps, these will have empty input strings. For the sake of decodability, we can then simply consider that these steps are part of the previous call.
There are also exceptions in keyed mode. If the phase is down, Absorb() will start with an uncolored Up() step. This case may occur for instance if Absorb() is called twice in a row. A similar yet more subtle situation occurs if Squeeze() is called after any call that terminates with an Up() step. In that case, Squeeze() starts with an implicit uncolored void step, i.e., a Down()-like step that has no effect on the state. The same situation occurs for Encrypt(), Decrypt() and Ratchet(). For all these exceptions, we can in fact either ignore the first uncolored step or consider that this step is part of the sequence attached to the previous call. Since each call to Cyclist is associated with a unique color, we can then use this color property to decode the alternating step sequence and extract the corresponding call parameters.
To summarize, the decodability of Cyclist works as follows. First, we convert the sequence of b-bit blocks that are added to the state into the corresponding sequence of step calls Down() and Up() along with their parameters. Working backward, we cut this sequence into sub-sequences, each starting with a colored step (or a void step) and followed by zero, one or more uncolored steps. We then associate each sub-sequence with the corresponding call, reconstructing when necessary the message parameter from the concatenation of all block parameters extracted in the sub-sequence. This is illustrated in Table 4. In hash mode, we observe that although calls to Squeeze() are not meant to be decodable, some of them can still be decoded as a side-effect of the insertion of a void step (denoted d()) between two consecutive calls to Squeeze(), or due to empty down steps that appear in long Squeeze() calls (ℓ > $R_{\mathrm{hash}}$).
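The cutting of the step sequence into calls can be sketched as follows. This is a simplified model under stated assumptions: steps are already decoded into hypothetical `(kind, color, block)` tuples, colors are illustrative string labels rather than the actual color bytes, uncolored steps carry `None`, and the void-step and hash-mode exceptions are handled only by attaching uncolored steps to the previous call.

```python
from typing import List, Optional, Tuple

# (kind, color, block): kind is "down" or "up"; color is None when uncolored.
Step = Tuple[str, Optional[str], bytes]


def split_into_calls(steps: List[Step]) -> List[List[Step]]:
    """Group steps so that each group starts with a colored step."""
    calls: List[List[Step]] = []
    for step in steps:
        _, color, _ = step
        if color is not None or not calls:
            calls.append([step])      # a colored step opens a new call
        else:
            calls[-1].append(step)    # uncolored steps extend the current call
    return calls


def message_of(call: List[Step]) -> bytes:
    """Rebuild the message parameter from the blocks of the down steps."""
    return b"".join(block for kind, _, block in call if kind == "down")


steps: List[Step] = [
    ("down", "absorb", b"abc"),   # first, colored block of an absorbing call
    ("up", None, b""),
    ("down", None, b"def"),       # second, uncolored block of the same call
    ("up", "crypt", b""),         # colored step opening an encryption call
    ("down", None, b"xyz"),
]
calls = split_into_calls(steps)
messages = [message_of(call) for call in calls]
```

The sketch scans forward rather than backward; under this simplified model both directions give the same cut points, since a cut happens exactly at each colored step.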

Choice of the permutation
The choice of the permutation was driven by the idea of sharing resources between hash and keyed modes. The size of the permutation is therefore determined mainly by the hash mode, as for a given security level, it requires more capacity than the keyed mode. Since 128-bit security is desired, we need to have a capacity of at least 256 bits to prevent collisions. The permutation should therefore be wider than 256 bits, but not too much. A possible candidate was Keccak-p[400, $n_r$], as the permutation size leaves enough room for the input block. However, it uses operations on 16-bit lanes, and 16-bit processors are not so common nowadays. Instead, the choice of Xoodoo was quite natural as it shares a lot of similarity with the Keccak-p family and works on 32-bit lanes. The entire state of 384 bits can be held in 12 registers of 32 bits, making it a nice fit for low-end 32-bit devices.
Table 4: Matching up and down sub-sequences with process history.

For the design rationale of Xoodoo, we give here some highlights and refer to [10] for more details. Xoodoo operates on three planes of 128 bits each, which interact per 3-bit columns through mixing and nonlinear operations, and which otherwise move as three independent rigid objects. Its round function uses the five step mappings θ, $\rho_{\mathrm{west}}$, ι, χ and $\rho_{\mathrm{east}}$. The nonlinear layer χ is an instance of the transformation χ that was already described and analyzed in [9], and that operates on 3 bits in Xoodoo. It has algebraic degree 2 and is involutive, hence r rounds of Xoodoo or its inverse cannot have an algebraic degree higher than $2^r$. The mixing layer θ is a column parity mixer [22]. Since the state bits interact only within columns, both in the parity plane computation in θ and in χ, the dispersion steps aim at dislocating the bits of the columns between every application of θ and of χ. For that reason, $\rho_{\mathrm{east}}$ and $\rho_{\mathrm{west}}$ shift the planes, treating them as rigid objects, between each χ and each θ step. Finally, the translation-invariance symmetry is destroyed by adding a round constant in the step ι.
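The two stated properties of χ on 3-bit columns (involutive, algebraic degree 2) can be checked exhaustively. The sketch below assumes the usual definition of χ, in which each bit is XORed with the AND of the complement of the next bit and the bit after that; the bit ordering within the column and the helper names are our own choices.

```python
def chi3(v: int) -> int:
    """Apply chi to a 3-bit column: b_i = a_i ^ (~a_{i+1} & a_{i+2})."""
    a = [(v >> i) & 1 for i in range(3)]
    b = [a[i] ^ ((a[(i + 1) % 3] ^ 1) & a[(i + 2) % 3]) for i in range(3)]
    return b[0] | (b[1] << 1) | (b[2] << 2)


def algebraic_degree(f, n: int) -> int:
    """Maximum degree over the algebraic normal forms of the n output bits."""
    degree = 0
    for bit in range(n):
        table = [(f(v) >> bit) & 1 for v in range(1 << n)]
        # In-place Moebius transform: truth table -> ANF coefficients.
        for i in range(n):
            for v in range(1 << n):
                if v & (1 << i):
                    table[v] ^= table[v ^ (1 << i)]
        for monomial in range(1 << n):
            if table[monomial]:
                degree = max(degree, bin(monomial).count("1"))
    return degree


is_involution = all(chi3(chi3(v)) == v for v in range(8))
is_permutation = sorted(chi3(v) for v in range(8)) == list(range(8))
degree = algebraic_degree(chi3, 3)
```

Since the inverse of χ equals χ here, the degree bound $2^r$ applies in both directions, which matches the statement about r rounds of Xoodoo or its inverse.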
The Xoodoo round function exhibits fast avalanche properties: It needs 3.5 rounds or 2 inverse rounds to satisfy the strict avalanche criterion [24]. Like Keccak-p, it has so-called weak alignment [4], where alignment characterizes the way differences or linear correlations propagate. The weak alignment has the advantage of making Xoodoo less susceptible to truncated differential attacks or to trail clustering effects.
Finally, in terms of differential and linear cryptanalysis, Xoodoo has strong bounds on the weight of its trails. Note that the weight of a trail relates to its differential probability or its correlation, see [10, Section 5.2] for more details. Since the publication of Xoodoo in [10], we have extended the trail search and improved the bounds. Table 5 shows the currently known lower bounds.
The choice for the number of rounds, namely 12, comes for one part from our experience in designing sponge-based hash functions and authenticated encryption schemes, and for another part from the similarity to Keccak-p, on which extensive cryptanalysis has been performed in the last ten years [7]. With 12 rounds, Xoodoo[12] has strong avalanche, differential and linear propagation properties, even stronger than those of Keccak-p[400, $n_r$] in terms of differential and linear trails. Even if an attack can somehow skip 4 rounds, it is guaranteed that any 8-round trail, either differential or linear, has weight at least 148. For hashing, the best collision or (second) preimage attack on Keccak reaches only 5 or 6 rounds, depending on how many degrees of freedom are available [21]. Note that in hashing mode, Xoodyak has a much smaller rate, hence far fewer degrees of freedom, than the aforementioned Keccak instances.
For keyed operations, we believe that Xoodoo[12] is suitable to be plugged into the full-state keyed duplex construction, on which Cyclist relies. As a comparison, this is the same number of rounds that was used for Keccak-p in Keyak [6], also relying on the full-state keyed duplex construction.

Known attacks
At the time of writing, there are no known attacks on Xoodyak and therefore Claims 1 and 2 can plausibly be believed to hold.
Xoodyak is built on strong foundations and based on conservative design choices. There is a large number of research papers on the generic security of sponge and duplex-based modes, on Keccak, Ketje, Keyak and other permutation-based designs for hashing or authenticated encryption. These show that the cryptographic community has a fairly wide understanding of the field around Xoodyak.
It is interesting to note that cube attacks were attempted on a Xoodoo-based authenticated encryption scheme following the same mode as Ketje [20]. The authors succeed on the initialization phase reduced to 6 rounds of Xoodoo. Although Xoodyak does not use the same mode as Ketje, there is nevertheless significant similarity between their initializations, and we can deduce from this research that 12 rounds provide enough safety margin against this type of attack. Furthermore, the authors discuss the effects of switching from 5-bit to 3-bit χ between Keccak-p and Xoodoo, and argue that the narrower χ contributes to an increased resistance against cube-attack-like analysis.

Tunable parameters
Xoodyak does not have user-chosen parameters, as the security claims apply to the only defined instance of Xoodyak. In contrast, Keccak has user-chosen parameters, namely the rate and capacity, for which the full range is covered by a security claim.
This said, should the need arise, we can already identify the parameters that could be modified to adapt Xoodyak's performance or security.
• The number of rounds of the permutation. Clearly, this is an essential parameter to protect against shortcut attacks. Reducing it can improve the performance but lowers the safety margin. Should shortcut attacks be found, it can be increased to add safety margin.
• The different rates $R_{\mathrm{hash}}$, $R_{\mathrm{kin}}$ and $R_{\mathrm{kout}}$. In a hermetic approach, tuning the rates (hence the associated capacities) has an impact mainly on the generic security. Increasing such a rate would have a positive impact on performance at the expense of the generic, and therefore claimed, security levels. For instance, we have a lot of margin in terms of data complexity in the case L = Ω = 0 (see Corollary 2), and in that case we could increase $R_{\mathrm{kout}}$ to, say, 28 bytes. In the other direction, we could also wish to increase the generic and claimed security levels by reducing $R_{\mathrm{hash}}$ or $R_{\mathrm{kin}}$. Decreasing these rates may also be a way to counteract some shortcut attacks, but this idea is not in the spirit of a hermetic approach.
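The trade-off between rate and capacity in keyed mode can be illustrated with the relation c = b − 8·R_kout used in the keyed-mode discussion. The sketch below is only indicative: the function name is hypothetical, and the value R_kout = 24 bytes is inferred from c = 192 with b = 384.

```python
B_BITS = 384  # width b of the permutation, in bits


def keyed_capacity_bits(r_kout_bytes: int, b_bits: int = B_BITS) -> int:
    """Capacity c = b - 8*R_kout of the keyed mode, in bits."""
    if not 0 < 8 * r_kout_bytes < b_bits:
        raise ValueError("rate must leave a non-empty capacity")
    return b_bits - 8 * r_kout_bytes


current = keyed_capacity_bits(24)   # rate inferred from c = 192
tuned = keyed_capacity_bits(28)     # the 28-byte example mentioned above
```

Raising the output rate from 24 to 28 bytes would thus lower the capacity from 192 to 160 bits, which is the performance versus generic security trade-off described in the second bullet.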

Implementation aspects
The implementation aspects of Xoodyak essentially rely on those of the underlying permutation Xoodoo. We therefore refer to [10, Section 4] for more details.
In Table 6, we report on the performance of Xoodyak on ARM Cortex-M0 and -M3.

Submission to NIST Lightweight Cryptography Standardization Process
This document is part of the submission of Xoodyak to the NIST Lightweight Cryptography Standardization Process.
• The algorithm submitted for authenticated encryption with associated data (AEAD) is the sequence in Section 3.2.3 executed with Xoodyak. By default, the key length is κ = 128 bits, there is no global key identifier (id = ϵ), the nonce length is 128 bits and the tag length is 128 bits. The amount of data that can be processed by a key is only implied by the security claim.
• The algorithm submitted for hashing is the first sequence in Section 3.1 executed with Xoodyak. The default output length is n = 32 bytes, or otherwise freely chosen as in a XOF. The limit on the message size is only implied by the security claim.
These two algorithms share the same underlying Xoodyak algorithm, and an implementation would naturally share the Xoodoo permutation and several input-output operations for absorbing and squeezing data. These two algorithms are therefore paired to be evaluated jointly.

Figure 1: Toy version of the Xoodoo state, with lanes reduced to 8 bits, and different parts of the state highlighted.

Corollary 1. Assume that Xoodyak satisfies Claim 1. Then, this hash function has the following security strength levels, with n the output size in bytes:
• collision resistance: min(8n/2, 128) bits
• preimage and second preimage resistance: min(8n, 128) bits
• m-target preimage resistance: min(8n − log m, 128) bits
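The strength levels of Corollary 1 are easy to tabulate for a given output length. The following Python sketch does so; the function name and dictionary keys are ours, and we assume that the logarithm is taken in base 2, as is customary when strengths are expressed in bits.

```python
import math


def hash_strengths(n_bytes: int, m_targets: int = 1) -> dict:
    """Security strengths (in bits) per Corollary 1 for an n-byte output."""
    bits = 8 * n_bytes
    return {
        "collision": min(bits / 2, 128),
        "preimage": min(bits, 128),
        "second_preimage": min(bits, 128),
        # Assumption: log is base 2, with m the number of targets.
        "m_target_preimage": min(bits - math.log2(m_targets), 128),
    }


default_output = hash_strengths(32)                    # n = 32 bytes
short_output = hash_strengths(16, m_targets=2 ** 32)   # truncated output
```

With the default 32-byte output all strengths reach the 128-bit cap, while truncating to 16 bytes halves the collision resistance to 64 bits.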

Table 1: Definition of Xoodoo[$n_r$] with $n_r$ the number of rounds.

Table 2: The round constants $c_i$ with −11 ≤ i ≤ 0, in hexadecimal notation (the least significant bit is at z = 0).

Table 5: The weight of the best differential and linear trails (or lower bounds) as a function of the number of rounds.

Table 6: Performance figures of Xoodyak in cycles per byte.