SKINNY-AEAD and SKINNY-Hash

We present the family of authenticated encryption schemes SKINNY-AEAD and the family of hashing schemes SKINNY-Hash. All of the schemes employ a member of the SKINNY family of tweakable block ciphers, which was presented at CRYPTO 2016, as the underlying primitive. In particular, for authenticated encryption, we show how to instantiate members of SKINNY in the Deoxys-I-like ΘCB3 framework to fulfill the submission requirements of the NIST lightweight cryptography standardization process. For hashing, we use SKINNY to build a function with larger internal state and employ it in a sponge construction. To highlight the extensive amount of third-party analysis that SKINNY obtained since its publication, we briefly survey the existing cryptanalysis results for SKINNY-128-256 and SKINNY-128-384 as of February 2020. In the last part of the paper, we provide a variety of ASIC implementations of our schemes and propose new simple SKINNY-AEAD and SKINNY-Hash variants with a reduced number of rounds while maintaining a very comfortable security margin.


Introduction
SKINNY is a family of lightweight tweakable block ciphers proposed at CRYPTO 2016 [BJK+16a]. We specify how to provide authenticated encryption and hashing functionality, with the parameters required by the NIST lightweight cryptography standardization process, using SKINNY as the base primitive.
Parts of this work are based on already published results. The new contributions can be summarized as follows:
• We show how members of the SKINNY family of tweakable block ciphers can be instantiated in the ΘCB3 framework [KR11] in order to fulfill the requirements of the NIST lightweight cryptography standardization process (see also [Nat18]), and provide six members of a new family of AEAD schemes, called SKINNY-AEAD.
• We further use members of the SKINNY family to construct functions with state sizes of 256 and 384 bits, which can be used in a sponge-based hashing mode, and define two members of a new family of hash functions, called SKINNY-Hash.
• We provide several ASIC implementations for all our SKINNY-AEAD and SKINNY-Hash members.
• To stress the extensive amount of existing cryptanalysis of the SKINNY family of tweakable block ciphers, we provide a survey of the external cryptanalysis of SKINNY-128-256 and SKINNY-128-384 as of February 2020.

SKINNY-AEAD
In short, SKINNY-AEAD uses a mode following the general ΘCB3 framework, instantiated with SKINNY-128-384. Because SKINNY is a tweakable block cipher with beyond-birthday-bound security, the mode achieves provable full security in the nonce-respecting setting. A similar mode was also employed in the third-round CAESAR candidate Deoxys-I [JNPS16]. Our primary design takes a 128-bit key, a 128-bit nonce, and associated data and a message of up to $2^{64} \times 16$ bytes. It then outputs a ciphertext of the same length as the plaintext and a 128-bit tag. We also specify other members of this family to support any combination of $n_\ell$-bit nonces and $t_\ell$-bit tags, where $n_\ell \in \{96, 128\}$ and $t_\ell \in \{64, 128\}$.
Moreover, we also specify a lightweight version instantiated with SKINNY-128-256. This design is motivated by the observation that the submission requirement to support $2^{50}$ input bytes might be unnecessary for several use cases of lightweight cryptography. This family consists of two members that take a 128-bit key, a 96-bit nonce, and associated data and a message of up to $2^{28}$ bytes as input and produce the ciphertext and a $t_\ell$-bit tag, where $t_\ell \in \{64, 128\}$. Because of the restriction on the maximum number of input message bytes, this family does not satisfy the submission requirement to support input messages of up to $2^{50}$ bytes, yet it provides even smaller and faster AEAD schemes.

SKINNY-Hash
SKINNY-Hash consists of two hash functions that adopt the well-known sponge construction. Our primary member uses a 384-bit to 384-bit function built with SKINNY-128-384 to provide a 128-bit secure hash function, and the secondary member uses a 256-bit to 256-bit function built with SKINNY-128-256 to provide a 112-bit secure hash function.

Features
Before going into the specifications, we briefly summarize the main features of our design.
• Well-understood design and high security margin. The SKINNY family of tweakable block ciphers was designed as a solid Substitution-Permutation network (SPN) having a well-analyzed security bound against the most fundamental cryptanalytic approaches: differential cryptanalysis [BS90] and linear cryptanalysis [Mat93]. In addition, SKINNY has received a lot of security analysis by third-party researchers, which demonstrates its strong resistance against cryptanalysis. The cipher can basically be understood as a tweakable version of a tailored AES which omits components not strictly necessary for security or substitutes them by more lightweight choices. Therefore, cryptanalytic approaches similar to those for AES can be applied. However, as opposed to AES, the TWEAKEY framework allows one to derive strong security arguments in the related-key, related-tweak setting for SKINNY. Moreover, SKINNY offers a high security margin. As of February 2020, based on our own cryptanalysis and the extensive external cryptanalysis since its publication, SKINNY-128-384 offers 28 (out of 56) rounds, and SKINNY-128-256 offers 25 (out of 48) rounds of security margin in the related-tweakey setting.
• Security proofs by a modular approach. The security of the authenticated encryption schemes and hash functions is directly inherited from the well-known and widely-applied modes of operation used in our design. Indeed, SKINNY-AEAD relies on the proofs of the ΘCB3 mode, while for SKINNY-Hash we rely on the provable security of the sponge framework. The security of our schemes can thus be reduced to the ideal behavior of the underlying primitives SKINNY-128-384 and SKINNY-128-256.
• Beyond-birthday-bound security. By using a tweakable block cipher directly constructed by the TWEAKEY framework, we obtain beyond-birthday-bound security, which allows the whole state to be exploited efficiently. This is different from modes based on OCB, which only offer security up to the birthday bound. Such modes would require larger internal states to achieve the same security level.
Note, however, that OCB is birthday-bound secure based on the SPRP assumption while ΘCB3 is beyond-birthday-bound secure based on the STPRP assumption. Clearly, an oracle to a tweakable block cipher gives more freedom to the attacker than an oracle to a classical block cipher.
• Efficient protection against side-channel attacks. Thanks to the structured Sbox of SKINNY, which is an iteration of a quadratic permutation, its Threshold Implementation [NRS11] (a provably-secure countermeasure against side-channel analysis attacks) can be realized efficiently. This helps us to efficiently integrate side-channel countermeasures into various implementations of SKINNY with a minimal number of shares and a limited amount of fresh randomness, both of which affect the area overhead of the resulting design.
• General-purpose lightweight schemes. When designing a lightweight encryption or hashing scheme, several use cases must be taken into account. While area-optimized implementations are important for some very constrained applications, throughput or throughput-over-area optimized implementations are also very relevant. Actually, looking at recently introduced efficiency measurements (the FOAM value, Figure Of Adversarial Merit [KPPY14]), one can see that our design choices are good for many types of implementations, which is exactly what makes a good general-purpose lightweight encryption scheme.
• Efficiency for short messages. Our algorithms are efficient for short messages. For authenticated encryption, the main reason is that the design is based on a tweakable block cipher, which avoids any precomputation (as needed in OCB, AES-GCM, etc.). In particular, the first 128-bit message block is handled directly and, taking into account the tag generation, one needs only $m + 1$ internal calls to the tweakable block cipher to process messages of $m$ blocks of 128 bits each (if there is no associated data).
Our primary member for hashing requires at most 3(m + 2) calls to the tweakable block cipher for producing a 256-bit digest for a message of m blocks of 128 bits each.
• Parallelizable mode. Our AEAD schemes are fully parallelizable as they are based on the ΘCB3 mode, which employs independent calls to the tweakable block cipher.
• Flexibility. Our AEAD design allows for smooth parameter handling. We define specific parameter sets to achieve the NIST requirements, but any user can in principle choose their own separation into nonce, key and block counter by adapting the key and tweak sizes at their convenience. This flexibility comes from the unified vision of the key and tweak material brought by the TWEAKEY framework. In a nutshell, one implementation of the underlying cipher is sufficient to support all versions with different key and tweak sizes (which sum up to the same size).

Specification
By $\|$ we denote the concatenation of bitstrings. Given a bitstring $B$, we denote by $B^j$ the $j$-fold concatenation of $B$, where $B^0$ is defined to be the empty string $\varepsilon$. For instance, $0 \| 10^3 = 010^3 = 01000$ and $(10)^3 = 101010$. We denote the length of a string $B$ in bits by $|B|$, where $|\varepsilon| = 0$.

AEAD
In a nutshell, our AEAD scheme adopts a mode that can be described in the ΘCB3 framework [KR11] by using either SKINNY-128-384 or SKINNY-128-256 as the underlying tweakable block cipher. Our primary member instantiates the ΘCB3 framework with the tweakable block cipher SKINNY-128-384 used with 128-bit keys, 128-bit nonces and which produces 128-bit authentication tags. Along with this primary AEAD scheme, we propose three additional ones that extend the possible parameters and allow users to choose between two nonce sizes (96 bits or 128 bits), and two tag sizes (128 bits or 64 bits). All of them are consistent with NIST's requirements.
We also specify two secondary options that are designed for processing short inputs. Those are based on a second tweakable block cipher, namely SKINNY-128-256. The nonce size is fixed to 96 bits, while users can choose between two tag sizes: 128 or 64 bits. The maximum input length that can be processed with SKINNY-128-256-based members is limited to $2^{28}$ bytes. Users need to be careful about their usage because these two algorithms do not meet NIST's requirement to support input messages of up to $2^{50}$ bytes.

Hashing
Overall, the SKINNY-Hash family contains two function-based sponge constructions (see [BDPA11]), in which the underlying functions are built from the SKINNY-128-384 and SKINNY-128-256 tweakable block ciphers. Both members, denoted SKINNY-tk3-Hash and SKINNY-tk2-Hash, process input messages of arbitrary length and output a 256-bit digest.
A list of our proposed AEAD schemes (members M1 to M6), together with the two hashing algorithms is provided in Table 1. For comparisons, we pair the AEAD members M1, M2, M3 and M4 with the hashing algorithm SKINNY-tk3-Hash and the AEAD members M5 and M6 with SKINNY-tk2-Hash as the constructions in the respective pairs are based on the same variant of the SKINNY tweakable block cipher.

SKINNY-128-256 and SKINNY-128-384
We already published the SKINNY family of tweakable block ciphers in 2016 in [BJK+16a]. For the sake of completeness, we provide the specifications of the two members of the SKINNY family that are relevant for our constructions, namely SKINNY-128-256 and SKINNY-128-384. The tweakable block ciphers SKINNY-128-256 and SKINNY-128-384 both have a block size of $n = 128$ bits, and the internal state is viewed as a $4 \times 4$ square array of cells, where each cell contains a byte. We denote by $IS_{i,j}$ the cell of the internal state located at Row $i$ and Column $j$ (counting starts from 0). One can also view this $4 \times 4$ square array of cells as a vector of cells by concatenating the rows. Thus, we denote with a single subscript $IS_i$ the cell of the internal state located at Position $i$ in this vector (counting starts from 0), and we have $IS_{i,j} = IS_{4 \cdot i + j}$.
The ciphers follow the TWEAKEY framework from [JNP14] and therefore take a tweakey input (instead of a key only) without any distinction between key and tweak input.
The two tweakable block ciphers SKINNY-128-256 and SKINNY-128-384 mainly differ in the size of the tweakey input: they respectively process $2n = 256$ or $3n = 384$ tweakey bits. The tweakey state is also viewed as a collection of two (resp., three) $4 \times 4$ square arrays of cells of 8 bits each. We denote these arrays $TK1$ and $TK2$ for SKINNY-128-256 and $TK1$, $TK2$ and $TK3$ for SKINNY-128-384. Moreover, we denote by $TKz_{i,j}$ the cell of the tweakey state located at Row $i$ and Column $j$ of the $z$-th cell array. As for the internal state, we extend this notation to a vector view with a single subscript: $TK1_i$, $TK2_i$ and $TK3_i$.
We now give the structural specifications of the ciphers.

Initialization
The ciphers receive a plaintext $m = m_0 \| m_1 \| \cdots \| m_{14} \| m_{15}$, where the $m_i$ are 8-bit words. The initialization of the ciphers' internal state is performed by simply setting $IS_i = m_i$ for $0 \le i \le 15$, i.e.,
$$IS = \begin{pmatrix} m_0 & m_1 & m_2 & m_3 \\ m_4 & m_5 & m_6 & m_7 \\ m_8 & m_9 & m_{10} & m_{11} \\ m_{12} & m_{13} & m_{14} & m_{15} \end{pmatrix}.$$
Note that the state is loaded row-wise rather than in the column-wise fashion used, for example, in the AES. This is a more hardware-friendly choice, as pointed out in [MPL+11].
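As a small illustration of the row-wise loading and of the identity $IS_{i,j} = IS_{4 \cdot i + j}$, the following Python sketch packs a 16-byte block into a 4 × 4 array and unpacks it again. The function names `load_state` and `unload_state` are hypothetical helpers chosen for this example and are not part of the specification.

```python
def load_state(block: bytes) -> list[list[int]]:
    """Load a 16-byte block row-wise into a 4x4 array of byte cells."""
    if len(block) != 16:
        raise ValueError("SKINNY-128 operates on 16-byte blocks")
    # Row i holds the cells m_{4i}, ..., m_{4i+3}.
    return [list(block[4 * i: 4 * i + 4]) for i in range(4)]


def unload_state(state: list[list[int]]) -> bytes:
    """Unpack the 4x4 array in the same row-wise order."""
    return bytes(cell for row in state for cell in row)


m = bytes(range(16))
IS = load_state(m)
```

Here `IS[i][j]` corresponds to $IS_{i,j}$ and `m[4 * i + j]` to $IS_{4 \cdot i + j}$.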

Round Function
One encryption round of SKINNY is composed of five operations in the following order: SubCells, AddConstants, AddRoundTweakey, ShiftRows and MixColumns (see illustration in Figure 1). The number r of rounds to perform during encryption depends on the tweakey size. In particular, SKINNY-128-256 applies r = 48 and SKINNY-128-384 applies r = 56 rounds. Note that no whitening key is used.
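SubCells. An 8-bit Sbox $S_8$ is applied to every cell of the cipher's internal state. Its design is simple and inspired by the PICCOLO Sbox [SIH+11].
If $x_0, \ldots, x_7$ represent the eight input bits of the Sbox ($x_0$ being the least significant bit), it applies the following transformation to the 8-bit state:
$$(x_7, x_6, x_5, x_4, x_3, x_2, x_1, x_0) \to (x_7, x_6, x_5, x_4 \oplus \overline{(x_7 \lor x_6)}, x_3, x_2, x_1, x_0 \oplus \overline{(x_3 \lor x_2)}),$$
followed by the bit permutation
$$(x_7, x_6, x_5, x_4, x_3, x_2, x_1, x_0) \to (x_2, x_1, x_7, x_6, x_4, x_0, x_3, x_5),$$
repeating this process four times, except for the last iteration, where there is just a bit swap between $x_1$ and $x_2$ instead of the bit permutation. In addition, Appendix A provides the tables of $S_8$ and its inverse in hexadecimal notation.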
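The iterative description of $S_8$ can be sketched bit by bit in Python. The NOR-and-XOR step and the bit permutation used below are taken from the published SKINNY specification [BJK+16a] as an assumption of this sketch, so the output should be checked against the table in Appendix A before relying on it. The function name `skinny_sbox8` is hypothetical.

```python
def skinny_sbox8(x: int) -> int:
    """Bit-level sketch of the 8-bit SKINNY Sbox S8 (x0 is the LSB)."""
    b = [(x >> i) & 1 for i in range(8)]  # b[i] holds bit x_i
    for iteration in range(4):
        # x4 ^= NOR(x7, x6) and x0 ^= NOR(x3, x2)
        b[4] ^= 1 ^ (b[7] | b[6])
        b[0] ^= 1 ^ (b[3] | b[2])
        if iteration < 3:
            # (x7,...,x0) -> (x2, x1, x7, x6, x4, x0, x3, x5),
            # written here as the new list [x0', x1', ..., x7'].
            b = [b[5], b[3], b[0], b[4], b[6], b[7], b[1], b[2]]
        else:
            # Last iteration: only swap x1 and x2.
            b[1], b[2] = b[2], b[1]
    return sum(bit << i for i, bit in enumerate(b))
```

A table-based implementation would normally be preferred in software; the bit-level form mainly shows why the Sbox is cheap in hardware (eight NOR and eight XOR gates in total).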
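AddConstants. A 6-bit affine LFSR, whose state is denoted $(rc_5, rc_4, rc_3, rc_2, rc_1, rc_0)$ with $rc_0$ being the least significant bit, is used to generate the round constants. Its update function is
$$(rc_5 \| rc_4 \| rc_3 \| rc_2 \| rc_1 \| rc_0) \to (rc_4 \| rc_3 \| rc_2 \| rc_1 \| rc_0 \| rc_5 \oplus rc_4 \oplus 1).$$
The six bits are initialized to zero and updated before being used in a given round. The bits from the LFSR are arranged into a $4 \times 4$ array, and only the first column of the state is affected by the LFSR bits, i.e.,
$$\begin{pmatrix} c_0 & 0 & 0 & 0 \\ c_1 & 0 & 0 & 0 \\ c_2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
with $c_0 = 0 \| 0 \| 0 \| 0 \| rc_3 \| rc_2 \| rc_1 \| rc_0$, $c_1 = 0 \| 0 \| 0 \| 0 \| 0 \| 0 \| rc_5 \| rc_4$ and $c_2 = \texttt{0x2}$. The round constants are combined with the state, respecting array positioning, using bitwise exclusive-or. The values of the $(rc_5, rc_4, rc_3, rc_2, rc_1, rc_0)$ constants for each round are given in Table 2 below, encoded to byte values for each round, with $rc_0$ being the least significant bit.
AddRoundTweakey. The first and second rows of all tweakey arrays are extracted and bitwise exclusive-ored to the cipher internal state, respecting the array positioning. Then, the tweakey arrays are updated as follows. First, a permutation $P_T$ is applied to the cell positions of all tweakey arrays: for all $0 \le i \le 15$, we set $TKz_i \leftarrow TKz_{P_T[i]}$ with $P_T = [9, 15, 8, 13, 10, 14, 12, 11, 0, 1, 2, 3, 4, 5, 6, 7]$. Finally, every cell of the first and second rows of $TK2$ (resp., $TK2$ and $TK3$) is individually updated with an LFSR. The LFSRs used are given in Table 3 ($x_0$ stands for the LSB of the cell).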
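The round-constant generation can be sketched as follows. The update rule (shift left by one position and feed back $rc_5 \oplus rc_4 \oplus 1$ into $rc_0$) and the placement of the bits into the cells $c_0$, $c_1$, $c_2$ follow the published SKINNY specification [BJK+16a] and are an assumption of this sketch; the values should be compared with Table 2. The names `round_constants` and `constant_cells` are hypothetical.

```python
def round_constants(rounds: int) -> list[int]:
    """Return the 6-bit constants rc used in rounds 1..rounds."""
    rc = 0  # the six bits are initialized to zero
    out = []
    for _ in range(rounds):
        feedback = ((rc >> 5) ^ (rc >> 4) ^ 1) & 1  # rc5 XOR rc4 XOR 1
        rc = ((rc << 1) & 0x3F) | feedback          # updated before use
        out.append(rc)
    return out


def constant_cells(rc: int) -> tuple[int, int, int]:
    """Cells (c0, c1, c2) XORed into the first column for SKINNY-128."""
    c0 = rc & 0x0F         # 0000 rc3 rc2 rc1 rc0
    c1 = (rc >> 4) & 0x03  # 000000 rc5 rc4
    c2 = 0x02
    return c0, c1, c2
```

For example, `round_constants(56)` would yield the constants for all rounds of SKINNY-128-384 under these assumptions.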
ShiftRows. As in the AES, the rows of the cipher state cell array are rotated in this layer. More precisely, the second, third, and fourth cell rows are rotated by 1, 2 and 3 positions to the right, respectively. In other words, a permutation $P$ is applied to the cell positions of the cipher internal state cell array: for all $0 \le i \le 15$, we set $IS_i \leftarrow IS_{P[i]}$ with $P = [0, 1, 2, 3, 7, 4, 5, 6, 10, 11, 8, 9, 13, 14, 15, 12]$.
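A minimal Python sketch of ShiftRows on the vector view of the state, using the permutation $P$ exactly as given above; the function name `shift_rows` is a hypothetical helper.

```python
P = [0, 1, 2, 3, 7, 4, 5, 6, 10, 11, 8, 9, 13, 14, 15, 12]


def shift_rows(cells: list[int]) -> list[int]:
    """Apply IS_i <- IS_{P[i]} to a 16-cell state in row-major order."""
    if len(cells) != 16:
        raise ValueError("expected 16 cells")
    return [cells[P[i]] for i in range(16)]


# Applying the layer to the identity state shows the right rotations.
shifted = shift_rows(list(range(16)))
```

With the identity state as input, row 1 becomes `[7, 4, 5, 6]`, i.e., the row `[4, 5, 6, 7]` rotated by one position to the right.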
MixColumns. Each column of the cipher internal state array is multiplied by the following binary matrix $M$:
$$M = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \end{pmatrix}.$$
The final value of the internal state array provides the ciphertext, with cells being unpacked in the same way as the packing during initialization. Test vectors for SKINNY-128-256 and SKINNY-128-384 are provided in Appendix B. Note that decryption is very similar to encryption, as all cipher components have very simple inverses (SubCells and MixColumns are based on a generalized Feistel structure, so their respective inverses are straightforward to deduce and can be implemented with exactly the same number of operations).
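The MixColumns layer and its inverse can be sketched with XORs only. The binary matrix used here, with rows (1,0,1,1), (1,0,0,0), (0,1,1,0), (1,0,1,0), is the one from the published SKINNY specification [BJK+16a] and is an assumption of this sketch; the function names are hypothetical.

```python
def mix_columns(s: list[int]) -> list[int]:
    """Multiply each column of the row-major 16-cell state by M."""
    out = [0] * 16
    for j in range(4):
        a0, a1, a2, a3 = s[j], s[4 + j], s[8 + j], s[12 + j]
        out[j] = a0 ^ a2 ^ a3   # row (1, 0, 1, 1)
        out[4 + j] = a0         # row (1, 0, 0, 0)
        out[8 + j] = a1 ^ a2    # row (0, 1, 1, 0)
        out[12 + j] = a0 ^ a2   # row (1, 0, 1, 0)
    return out


def inv_mix_columns(s: list[int]) -> list[int]:
    """Inverse layer, derived by solving the four equations above."""
    out = [0] * 16
    for j in range(4):
        y0, y1, y2, y3 = s[j], s[4 + j], s[8 + j], s[12 + j]
        out[j] = y1
        out[4 + j] = y1 ^ y2 ^ y3
        out[8 + j] = y1 ^ y3
        out[12 + j] = y0 ^ y3
    return out
```

Both directions need only three XORs per column when intermediate values are shared, which is consistent with the remark that the inverse costs the same number of operations.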

The AEAD Scheme SKINNY-AEAD
The authenticated encryption scheme adopts the ΘCB3 mode using either SKINNY-128-384 or SKINNY-128-256 as the underlying tweakable block cipher, depending on the member as shown in Table 1. In the following, we provide the detailed specification of the scheme. Let $\text{SKINNY-128-384}_{tk}(P)$ denote the encryption of a plaintext $P$ under the tweakey $tk$ with the SKINNY-128-384 algorithm and let $\text{SKINNY-128-256}_{tk}(P)$ be the encryption of a plaintext $P$ under the tweakey $tk$ with the SKINNY-128-256 algorithm. Let further $\text{SKINNY-128-384}^{-1}_{tk}(C)$ (resp. $\text{SKINNY-128-256}^{-1}_{tk}(C)$) denote the decryption of a ciphertext $C$ under the tweakey $tk$ with the SKINNY-128-384 (resp. SKINNY-128-256) algorithm.
By $(N, A, M)$, we denote the tuple of a nonce $N$, associated data $A$ and a message $M$, where $A$ and $M$ can be of arbitrary length (including empty).

Domain Separation.
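We first define a 1-byte string that ensures independence of tweakable block cipher calls for different kinds of computations (i.e., domain separation) and also for different SKINNY-AEAD members. Let $b_7 b_6 b_5 b_4 b_3 b_2 b_1 b_0$ be the bitwise representation of this byte, where $b_7$ is the MSB and $b_0$ is the LSB (see also Figure 2). Then, we use the following convention:
- $b_7 b_6 b_5$ are fixed to 000.
- $b_4 = n$ encodes the nonce size: it is 0 for a 128-bit nonce and 1 for a 96-bit nonce.
- $b_3 = t$ encodes the tag size: it is 0 for a 128-bit tag and 1 for a 64-bit tag.
- $b_2 b_1 b_0$ encode the type of computation: 000 for the encryption of a full message block, 001 for the encryption of a padded message block, 010 for the processing of a full associated data block, 011 for the processing of a padded associated data block, 100 for the tag generation when the message is not padded, and 101 for the tag generation when the message is padded.
In the following paragraphs, we specify the computations of a ciphertext $C$ and a tag $tag$ for a given $(N, A, M)$, key $K$, $n$, and $t$. For simplicity, we denote this single byte by $d_0$, $d_1$, $d_2$, $d_3$, $d_4$, or $d_5$ depending on the 3-bit value for the domain separation, i.e., $d_i = 000 \| n \| t \| \langle i \rangle_3$, where $\langle i \rangle_3$ is the 3-bit binary representation of $i$.
Associated Data Processing. The computation for the associated data is depicted in Figure 3. If the byte-length of $A$ is not a multiple of the block size (i.e., 16 bytes), it has to be padded. In particular, if $|A|$ denotes the length of $A$ in bits, let $A = A_0 \| A_1 \| \cdots \| A_{\ell_a - 1} \| A_{\ell_a}$ with $|A_i| = 128$ for $i \in \{0, \ldots, \ell_a - 1\}$ and $|A_{\ell_a}| < 128$. Note that if $|A|$ is a multiple of 128, $A_{\ell_a}$ is the empty string and no padding is applied. Otherwise, we apply the padding pad10* to $A_{\ell_a}$, which is defined as
$$\mathrm{pad10^*} : X \mapsto X \| 1 \| 0^{127 - (|X| \bmod 128)}.$$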
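The parsing and padding of the associated data can be sketched in Python as follows. The sketch assumes byte-aligned inputs and an MSB-first bit order within bytes, so that the single 1 bit followed by zeros becomes the byte `0x80` followed by zero bytes; both are assumptions of this example, and the names `split_blocks` and `pad10` are hypothetical.

```python
BLOCK = 16  # block size in bytes (128 bits)


def split_blocks(data: bytes) -> tuple[list[bytes], bytes]:
    """Split data into full 16-byte blocks and a (possibly empty) rest."""
    full_len = len(data) - len(data) % BLOCK
    full = [data[i:i + BLOCK] for i in range(0, full_len, BLOCK)]
    return full, data[full_len:]


def pad10(partial: bytes) -> bytes:
    """pad10*: append a 1 bit and then zeros up to one full block."""
    if len(partial) >= BLOCK:
        raise ValueError("pad10* is applied to a partial block only")
    return partial + b"\x80" + bytes(BLOCK - 1 - len(partial))


blocks, rest = split_blocks(bytes(range(20)))
last = pad10(rest) if rest else None  # no padding if the rest is empty
```

If `rest` is empty, the data is a multiple of the block size and no padded block is processed, matching the case distinction in the text.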
Each associated data block A i is processed in parallel by SKINNY-128-384 as a plaintext input under a 384-bit tweakey, where the structure of the 384-bit tweakey is as follows.
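- The tweakey bytes $tk_0, \ldots, tk_{15}$ store 8 bytes from a 64-bit LFSR, followed by 7 bytes of zeros, and then the single byte for the domain separation ($d_2$, or $d_3$ if it is a padded block). The 64-bit LFSR plays the role of a block counter. It is defined as follows: let $x_{63} x_{62} x_{61} \cdots x_2 x_1 x_0$ denote the 64-bit state of the LFSR. It is initialized to $LFSR_0 = 0^{63} \| 1$ and updated as $LFSR_{t+1} = upd_{64}(LFSR_t)$, where the update function $upd_{64}$ is defined by the polynomial $x^{64} + x^4 + x^3 + x + 1$ as
$$upd_{64} : x_{63} \| x_{62} \| \cdots \| x_0 \mapsto y_{63} \| y_{62} \| \cdots \| y_0, \quad y_i = \begin{cases} x_{63} & \text{if } i = 0, \\ x_{i-1} \oplus x_{63} & \text{if } i \in \{1, 3, 4\}, \\ x_{i-1} & \text{otherwise.} \end{cases}$$
Before being loaded into the tweakey state, the order of the bytes of the LFSR state is reversed, i.e., $tk_0 \| tk_1 \| \cdots \| tk_{15} = rev_{64}(LFSR_t) \| 0^{56} \| d_2$ (resp. $tk_{15} = d_3$ for the last padded block), where $rev_{64}$ is defined by
$$rev_{64} : x_{63} \| \cdots \| x_0 \mapsto x_7 \| \cdots \| x_0 \| x_{15} \| \cdots \| x_8 \| \cdots \| x_{63} \| \cdots \| x_{56}.$$
- The tweakey bytes $tk_{16} \| tk_{17} \| \cdots \| tk_{31}$ store the nonce $N$. If $n = 1$, i.e., if the nonce size is 96 bits, 32 zero bits are appended to $N$; thus, $tk_{16} \| tk_{17} \| \cdots \| tk_{31} = N \| 0^{32}$.
- The tweakey bytes $tk_{32} \| tk_{33} \| \cdots \| tk_{47}$ store the 128-bit key $K$.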
The XOR sum of each block's output is stored as Auth, which is later used in the final authentication tag computation.
Recall that if the size of $A$ is not a multiple of 128 bits, we use the domain separation byte $d_3$ to process the last padded block.
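The tweakey layout for associated data blocks can be sketched as follows. The sketch assumes that the LFSR defined by $x^{64} + x^4 + x^3 + x + 1$ is realized as a left shift with feedback of the top bit into positions 4, 3, 1 and 0, that $rev_{64}$ is a plain byte reversal (little-endian encoding of the state), and that the key occupies $tk_{32}, \ldots, tk_{47}$ as in the tweakey used for tag generation. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def upd64(state: int) -> int:
    """One step of the 64-bit block-counter LFSR."""
    msb = state >> 63
    state = (state << 1) & MASK64
    return state ^ (0x1B if msb else 0)  # feedback into bits 4, 3, 1, 0


def domain_byte(kind: int, nonce96: bool, tag64: bool) -> int:
    """Byte 000 | n | t | kind, with kind in 0..5 selecting d_0..d_5."""
    return (int(nonce96) << 4) | (int(tag64) << 3) | kind


def ad_tweakey(lfsr: int, key: bytes, nonce: bytes,
               padded: bool, tag64: bool = False) -> bytes:
    """48-byte tweakey for one associated data block (d_2 or d_3)."""
    if len(key) != 16 or len(nonce) not in (12, 16):
        raise ValueError("expected a 16-byte key and a 12- or 16-byte nonce")
    d = domain_byte(3 if padded else 2, len(nonce) == 12, tag64)
    counter_part = lfsr.to_bytes(8, "little") + bytes(7) + bytes([d])
    nonce_part = nonce + bytes(16 - len(nonce))  # N || 0^32 for 96 bits
    return counter_part + nonce_part + key
```

Starting from `lfsr = 1` (that is, $LFSR_0 = 0^{63} \| 1$), calling `upd64` once per block yields the counter value for each subsequent block.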

Figure 3: Handling of the associated data: in the case where the associated data is a multiple of the block size, no padding is needed. In the figures, $E$ refers to SKINNY-128-384. For simplicity, we denote the block counter by $0, \ldots, \ell_a - 1$ (resp., $0, \ldots, \ell_a$) but actually refer to the state of the LFSRs serving as a block counter.

Encryption. The encryption of $M$ is depicted in Figure 4 and Figure 5. First, suppose that the size of $M$ in bits is a multiple of 128 (Figure 4). In that case, $M$ is parsed into 128-bit blocks $M_0, M_1, M_2, \ldots, M_{\ell_m - 1}$ and no padding is applied. Each message block $M_i$ is processed by SKINNY-128-384 as a plaintext input under a particular 384-bit tweakey, and the output is taken as the corresponding ciphertext block. The structure of the 384-bit tweakey differs from the associated data processing only by the domain separation byte. Here, the byte $tk_{15}$ is fixed to $d_0$ instead of $d_2$.

To produce the tag, the XOR sum of the plaintext blocks, denoted $\Sigma$, is computed and then encrypted by SKINNY-128-384, where the 384-bit tweakey is analogously defined as $tk_0 \| tk_1 \| \cdots \| tk_{47} = rev_{64}(LFSR_{\ell_m}) \| 0^{56} \| d_4 \| N \| K$. Finally, the output of this encryption is XORed with Auth. If $t = 0$, i.e., the tag size is 128 bits, the result of this XOR is the tag. If $t = 1$, i.e., the tag size is 64 bits, the result of this XOR is truncated by $trunc_{64}$ to 64 bits, where the truncation functions $trunc_i$ are defined for inputs of length at least $i$ by
$$trunc_i : x_0 \| x_1 \| \cdots \| x_{i-1} \| x_i \| \cdots \| x_j \mapsto x_0 \| x_1 \| \cdots \| x_{i-1},$$
i.e., $trunc_i$ keeps the first $i$ bits of its input.

Algorithm 1 The authenticated encryption algorithm SKINNY-AEAD-M1-Enc(K, N, A, M)
In: Key K, nonce N (both 128 bits), associated data A, message M (both arbitrarily long)
Out: (C, tag), where C is the ciphertext with |C| = |M| and tag is a 128-bit tag

As in the unpadded case, the result of the encryption is XORed with Auth and truncated in the same way as described above if $t = 1$.

Figure 5: Encryption of SKINNY-AEAD with SKINNY-128-384 with padded message when the tag size is 128 bits. The last ciphertext block $C_{\ell_m}$ is further truncated to have the same size as $M_{\ell_m}$. Again, we denote the block counter by $0, \ldots, \ell_m + 1$ but actually refer to the state of the LFSRs serving as a block counter.
Decryption. The decryption and tag verification procedure for given (K, N, A, C, tag) is straightforward.
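To make the data flow of the mode concrete, the following Python sketch covers only the case in which both the associated data and the message consist of full 128-bit blocks. The tweakable block cipher is passed in as a callable `tbc(tweakey, block)`, a hypothetical interface standing in for SKINNY-128-384 encryption. The byte-reversed LFSR encoding, the Galois-style LFSR update, and the truncation to the first 8 bytes for 64-bit tags are assumptions of this sketch; padded blocks (domain bytes $d_1$, $d_3$, $d_5$) are deliberately left out.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: (48-byte tweakey, 16-byte block) -> 16-byte block.
TBC = Callable[[bytes, bytes], bytes]

MASK64 = (1 << 64) - 1


def upd64(state: int) -> int:
    """Block-counter LFSR step for x^64 + x^4 + x^3 + x + 1."""
    msb = state >> 63
    return ((state << 1) & MASK64) ^ (0x1B if msb else 0)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def tweakey(lfsr: int, d: int, nonce: bytes, key: bytes) -> bytes:
    """rev64(LFSR) || 0^56 || d || N (zero-padded to 16 bytes) || K."""
    return (lfsr.to_bytes(8, "little") + bytes(7) + bytes([d])
            + nonce + bytes(16 - len(nonce)) + key)


def encrypt_full_blocks(tbc: TBC, key: bytes, nonce: bytes,
                        ad_blocks: List[bytes], msg_blocks: List[bytes],
                        tag64: bool = False) -> Tuple[bytes, bytes]:
    """Full-block path only: returns (ciphertext, tag)."""
    flags = (int(len(nonce) == 12) << 4) | (int(tag64) << 3)
    # Associated data: XOR of the encryptions under domain byte d_2.
    auth = bytes(16)
    lfsr = 1
    for block in ad_blocks:
        auth = xor(auth, tbc(tweakey(lfsr, flags | 0b010, nonce, key), block))
        lfsr = upd64(lfsr)
    # Message: each block encrypted independently under domain byte d_0.
    sigma = bytes(16)
    ciphertext = []
    lfsr = 1
    for block in msg_blocks:
        ciphertext.append(tbc(tweakey(lfsr, flags | 0b000, nonce, key), block))
        sigma = xor(sigma, block)
        lfsr = upd64(lfsr)
    # Tag: encrypt the checksum under d_4 with the next counter value.
    tag = xor(tbc(tweakey(lfsr, flags | 0b100, nonce, key), sigma), auth)
    return b"".join(ciphertext), (tag[:8] if tag64 else tag)
```

Because every cipher call depends only on its own counter value, the loops above could be evaluated in parallel, and a message of $m$ full blocks without associated data costs $m + 1$ calls.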

SKINNY-AEAD with SKINNY-128-256
This case applies to the members M5 and M6 (refer to Table 1). It is very similar to the previous case, the main difference being the definition of the tweakey states due to their smaller sizes.
Domain Separation. The domain separation is exactly the same as in the previous case. Note that $b_4$ is always fixed to 1, as only 96-bit nonces can be used in the members M5 and M6.
Associated Data Processing. The computation for associated data A is very similar to the previous case. The difference is that each associated data block A i is processed by SKINNY-128-256 as a plaintext input under a 256-bit tweakey, where the structure of the 256-bit tweakey is as follows.
- The tweakey bytes $tk_0, \ldots, tk_{15}$ store 3 bytes from a 24-bit LFSR, the single byte for the domain separation, followed by the 12-byte nonce $N$. The byte for the domain separation is fixed to $d_2$, i.e., $0001 \| t \| 010$, for a non-padded block and to $d_3 = 0001 \| t \| 011$ for a padded block. The 24-bit LFSR is defined as follows: let $x_{23} x_{22} x_{21} \cdots x_2 x_1 x_0$ denote the 24 bits of the LFSR. It is initialized to $LFSR_0 = 0^{23} \| 1$ and updated as $LFSR_{t+1} = upd_{24}(LFSR_t)$, where the update function $upd_{24}$ is defined by the polynomial $x^{24} + x^4 + x^3 + x + 1$ as
$$upd_{24} : x_{23} \| x_{22} \| \cdots \| x_0 \mapsto y_{23} \| y_{22} \| \cdots \| y_0, \quad y_i = \begin{cases} x_{23} & \text{if } i = 0, \\ x_{i-1} \oplus x_{23} & \text{if } i \in \{1, 3, 4\}, \\ x_{i-1} & \text{otherwise.} \end{cases}$$
Before being loaded into the tweakey state, the order of the bytes of the LFSR state is reversed, i.e., $tk_0 \| tk_1 \| \cdots \| tk_{15} = rev_{24}(LFSR_t) \| d_2 \| N$ (resp. $tk_3 = d_3$ for the last padded block), where $rev_{24}$ is defined by
$$rev_{24} : x_{23} \| \cdots \| x_0 \mapsto x_7 \| \cdots \| x_0 \| x_{15} \| \cdots \| x_8 \| x_{23} \| \cdots \| x_{16}.$$
- The tweakey bytes $tk_{16} \| tk_{17} \| \cdots \| tk_{31}$ store the 128-bit key $K$.

Algorithm 2 The decryption algorithm SKINNY-AEAD-M1-Dec(K, N, A, C, tag)
In: Key K, nonce N (both 128 bits), associated data A, ciphertext C (both arbitrarily long), 128-bit tag tag
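The 256-bit tweakey layout of the SKINNY-128-256-based members can be sketched in the same style. As before, the Galois-style realization of the LFSR for $x^{24} + x^4 + x^3 + x + 1$ and the reading of $rev_{24}$ as a byte reversal are assumptions of this sketch, and the names `upd24` and `tweakey_256` are hypothetical.

```python
MASK24 = (1 << 24) - 1


def upd24(state: int) -> int:
    """One step of the 24-bit block-counter LFSR."""
    msb = state >> 23
    state = (state << 1) & MASK24
    return state ^ (0x1B if msb else 0)  # feedback into bits 4, 3, 1, 0


def tweakey_256(lfsr: int, kind: int, tag64: bool,
                nonce: bytes, key: bytes) -> bytes:
    """32-byte tweakey: rev24(LFSR) || d || N (12 bytes) || K (16 bytes)."""
    if len(nonce) != 12 or len(key) != 16:
        raise ValueError("M5/M6 use a 96-bit nonce and a 128-bit key")
    # b4 is always 1 here because only 96-bit nonces are supported.
    d = (1 << 4) | (int(tag64) << 3) | kind
    return lfsr.to_bytes(3, "little") + bytes([d]) + nonce + key
```

In this layout the domain separation byte sits at position $tk_3$, directly after the three counter bytes.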
Encryption and Decryption. The encryption of M is also very similar to the previous case. Also, decryption and tag verification is straightforward. Formally, in Appendix C, we provide the algorithms of the authenticated encryption members M5 and M6, together with their decryption and tag verification procedure, in Algorithms 11, 12 and 13, 14, respectively.

Remarks for Further Extension
Here, we explain two additional features of our AEAD schemes, which are not officially included in the NIST submission but can be implemented efficiently depending on the user's demand.
Supporting More than $2^{64}$ blocks with SKINNY-128-384. Recall that in the members based on SKINNY-128-384, the tweakey bytes $tk_0, \ldots, tk_{15}$ store 8 bytes for a 64-bit LFSR, followed by 7 bytes of zeros, and then a single byte for the domain separation. If the user wants to support input data of more than $2^{64}$ blocks, it is possible to replace the 7 bytes of zeros by a 56-bit LFSR. Note that this LFSR would be updated only every $2^{64}$ blocks, hence very rarely in comparison to the 64-bit LFSR. Let $x_{55} x_{54} \cdots x_1 x_0$ denote the 56 bits of the 56-bit LFSR. It is initialized to $LFSR_0 = 0^{55} \| 1$ and updated as $LFSR_{t+1} = upd_{56}(LFSR_t)$, where the update function $upd_{56}$ is defined by a polynomial of degree 56 in the same way as $upd_{64}$.
We stress that this additional functionality is only available in SKINNY-128-384-based members and cannot be adopted in SKINNY-128-256-based members.

Acceleration of Associated Data Processing.
When associated data $A$ is processed, we fix 128 bits or 96 bits of the tweakey state to the nonce value $N$ for SKINNY-128-384- and SKINNY-128-256-based members, respectively. We note that it is not strictly necessary to include $N$ during the associated data processing; hence, a potential acceleration of the associated data processing could replace $N$ with bits from $A$. This would reduce the number of tweakable block cipher calls for processing $A$. In particular, the number of calls could be halved in SKINNY-128-384-based members.

The Hash Functionality SKINNY-Hash
Overall, the SKINNY-Hash family consists of the function-based sponge constructions SKINNY-tk3-Hash and SKINNY-tk2-Hash, in which underlying functions are built with SKINNY-128-384 and SKINNY-128-256, respectively. We recall here that the sponge construction [BDPA11] can be based on a cryptographic function as well as a cryptographic permutation.

F 384 : 384-bit to 384-bit function
We build a function $F_{384} : \{0,1\}^{384} \to \{0,1\}^{384}$ from the tweakable block cipher SKINNY-128-384.

F 256 : 256-bit to 256-bit function
We build a function $F_{256} : \{0,1\}^{256} \to \{0,1\}^{256}$ from the tweakable block cipher SKINNY-128-256.

SKINNY-tk3-Hash
The computation of SKINNY-tk3-Hash simply follows the well-known sponge construction. Unlike many existing instantiations, we use the function $F_{384}$ (rather than a permutation) as the underlying primitive. The construction is illustrated in Figure 7. The padding pad10* is applied to an input message $M$ (note that the padding is always applied, even if $|M|$ is already a multiple of 128). The message blocks $M_i$ are XORed to the outer part of the state during the absorbing phase.
After the absorbing phase, the 128 bits of the rate are extracted as the first 128 bits of the 256-bit digest. Then, S 384 ← F 384 (S 384 ) is applied once again and the 128 bits of the rate are extracted as the last 128 bits of the 256-bit digest. The formal algorithm is specified in Algorithm 3.

Algorithm 4 The hashing algorithm SKINNY-tk2-Hash
In: Message M of arbitrary length Out:

SKINNY-tk2-Hash
The 256-bit state $S_{256}$ is divided into a 32-bit outer part and a 224-bit inner part, which are initialized to fixed values. A difference from the previous case is that the message $M$ now has to be padded such that its length is a multiple of 32 bits. Therefore, we apply the padding function $\mathrm{pad10^*_{32}}$, which is defined as $\mathrm{pad10^*_{32}} : X \mapsto X \| 1 \| 0^{31 - (|X| \bmod 32)}$.
The message blocks M i are XORed to the outer part of the state during the absorbing phase. After the absorbing phase, the first 128 bits of the state are extracted as the first 128 bits of the 256-bit digest. Then, S 256 ← F 256 (S 256 ) is applied once again and the first 128 bits of the state are extracted as the last 128 bits of the 256-bit digest. This means that in the squeezing phase, the rate is extended to 128 bits and the capacity is reduced to 128 bits. The formal algorithm is specified in Algorithm 4.
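Both hash functions follow the same absorb-then-squeeze pattern, which the following Python sketch captures in a generic form. The underlying function `f` (standing in for $F_{384}$ or $F_{256}$) and the initial state `iv` are parameters of this sketch, because their concrete definitions are not reproduced here; byte-aligned messages and the encoding of the padding bit as the byte `0x80` are further assumptions. The name `sponge_hash` is hypothetical.

```python
from typing import Callable


def sponge_hash(f: Callable[[bytes], bytes], iv: bytes,
                absorb_rate: int, message: bytes,
                squeeze_rate: int = 16) -> bytes:
    """Function-based sponge producing a 256-bit digest in two squeezes.

    absorb_rate is the rate in bytes during absorption
    (16 for SKINNY-tk3-Hash, 4 for SKINNY-tk2-Hash).
    """
    # pad10* is always applied, even to block-aligned messages.
    padded = message + b"\x80"
    padded += bytes(-len(padded) % absorb_rate)
    state = iv
    for i in range(0, len(padded), absorb_rate):
        block = padded[i:i + absorb_rate]
        # XOR the block into the outer part, then apply the function.
        outer = bytes(s ^ b for s, b in zip(state[:absorb_rate], block))
        state = f(outer + state[absorb_rate:])
    # Squeeze 128 bits, apply the function once more, squeeze 128 bits.
    digest = state[:squeeze_rate]
    state = f(state)
    return digest + state[:squeeze_rate]
```

For a message of $m$ full 128-bit blocks and `absorb_rate = 16`, this makes $m + 2$ calls to `f`, which matches the bound of $3(m + 2)$ tweakable block cipher calls if each call to $F_{384}$ costs three cipher calls.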

Table of Parameters and Security of SKINNY-Hash
For a summary, parameters of SKINNY-Hash are listed in Table 4.

Security Claims
We provide our security claims for the different variants of SKINNY-AEAD and SKINNY-Hash in Table 5. Basically, for all versions of SKINNY-AEAD, we claim full 128-bit security for key recovery, confidentiality and integrity (unless the tag size is smaller than 128 bits, in which case the integrity security claims drop to the tag size) in the nonce-respecting model. For all versions of SKINNY-Hash, we claim that it is hard to find a collision, preimage or second-preimage with substantially less than $2^{c/2}$ hash evaluations, where $c$ represents the capacity bitsize ($c = 256$ for SKINNY-tk3-Hash and $c = 224$ for SKINNY-tk2-Hash).
One can see that we do claim full 128-bit security for all variants of SKINNY-AEAD with a tag size of 128 bits for a nonce-respecting user. More precisely, confidentiality is perfectly guaranteed and the forgery probability is $2^{-\tau}$, where $\tau$ denotes the tag size, independently of the number of blocks of data in encryption queries made by the adversary. This is very different from other modes like AES-GCM [MV04] or OCB3 [KR11], which only ensure birthday-bound security. For instance, OCB3 only provides security up to roughly $2^{n/2}$ blocks of data, since it relies on XE/XEX (a construction of a tweakable block cipher from a standard block cipher with security only up to the birthday bound). To give a numerical example, with $2^{40}$ blocks ciphered (about 16 TeraBytes), one gets an advantage of about $2^{-48}$ to generate a valid tag for most operating modes in the nonce-respecting scenario. For the same amount of data, the advantage remains $2^{-128}$ for members M1/M2/M5 of SKINNY-AEAD. We assume that the total size of the associated data and the total size of the message in SKINNY-AEAD do not exceed $2^{68}$ bytes for M1/M2/M3/M4 and $2^{28}$ bytes for M5/M6. Moreover, the maximum number of messages that can be handled under the same key is $2^{n_\ell}$ for all variants of SKINNY-AEAD ($n_\ell = 128$ for M1/M3, $n_\ell = 96$ for M2/M4/M5/M6). This ensures that, as long as different fixed-length nonces are used, the tweak inputs of all the tweakable block cipher calls are unique.
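The numbers in this example can be checked with a short calculation. It assumes the generic order-of-magnitude estimate $q^2 / 2^n$ for a birthday-bound mode processing $q$ blocks (an approximation introduced here for illustration, not a bound taken from a specific proof) and uses the ΘCB3 forgery bound $2^{n-\tau} / (2^n - 1)$ stated in the design rationale.

```latex
\begin{align*}
\text{data volume} &= 2^{40}\ \text{blocks} \times 2^{4}\ \text{bytes per block}
  = 2^{44}\ \text{bytes} = 16\ \text{TiB}, \\
\text{birthday-bound mode } (n = 128):\quad
  \mathbf{Adv} &\approx \frac{q^2}{2^{n}} = \frac{(2^{40})^2}{2^{128}} = 2^{-48}, \\
\Theta\text{CB3 } (n = \tau = 128):\quad
  \mathbf{Adv}^{\mathrm{auth}} &\le \frac{2^{n-\tau}}{2^{n} - 1}
  = \frac{1}{2^{128} - 1} \approx 2^{-128}.
\end{align*}
```

The last line does not depend on $q$, which is the sense in which the forgery probability is independent of the amount of data processed.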
Related-Cipher Attacks. By encoding the length of the tag and nonce into the domain separation, we obtain a proper separation between the SKINNY-AEAD members that employ the same instance of the SKINNY tweakable block cipher. We do not claim security against related-cipher attacks between members that employ the two different instances SKINNY-128-384 and SKINNY-128-256, e.g., M2 and M5.
Nonce-Misuse Setting. The above security claims are void under reuse of nonces. As pointed out in [VV17] for the case of Deoxys-I, the scheme is vulnerable to a universal forgery attack and a CCA decryption attack with complexity of only three queries. Because we are basically using the same mode, the attacks would apply to SKINNY-AEAD as well.

Design Rationale
For a detailed design rationale of the tweakable block ciphers SKINNY-128-256 and SKINNY-128-384, we refer to the original design paper [BJK+16a, BJK+16b]. We decided not to modify the primitives from their original specification. The rationale for this is that none of the extensive third-party cryptanalysis, which we discuss in detail in Section 5, pointed to any weakness of the ciphers or to any bad design choices. Indeed, all the third-party cryptanalysis confirmed the validity of the original design and its rationale. Furthermore, we do not see any change in the specification that would improve the ciphers to an extent that would justify such a modification. All design choices of SKINNY are optimized for its goal: obtaining a cipher well suited for many lightweight applications.

Rationale for the AEAD scheme
The reason for choosing the ΘCB3 mode for the tweakable block cipher SKINNY-128-384 or SKINNY-128-256 is its provable security providing full security in the nonce-respecting setting. More precisely, for ΘCB3 using an ideal tweakable block cipher, confidentiality is perfectly guaranteed and the forgery probability is independent of the number of blocks of data in encryption/decryption queries made by the adversary. Those strong security guarantees along with its performance features are the design rationale for our choice.
We state the security bound of ΘCB3 in the nonce-respecting setting, where Ẽ is an ideal tweakable block cipher. Let A be an adversary. Then Adv^{priv}(A) = 0 and Adv^{auth}(A) ≤ 2^{n−τ}/(2^n − 1).
We denote by Adv^{±prp}_{SKINNY-TK2}(A) and Adv^{±prp}_{SKINNY-TK3}(A) the SPRP-advantage against SKINNY-128-256 and SKINNY-128-384, respectively. Replacing the ideal tweakable block cipher with SKINNY, we have the security bounds for our members as shown in Table 6.
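To give a feeling for the size of the forgery bound in the ideal setting, the following minimal sketch evaluates 2^{n−τ}/(2^n − 1) exactly. It assumes that n denotes the block size and τ the tag length in bits; the function name and the two tag lengths are chosen for illustration only.

```python
from fractions import Fraction
import math


def forgery_bound(n: int, tau: int) -> Fraction:
    """Exact value of the bound 2^(n - tau) / (2^n - 1)."""
    return Fraction(2 ** (n - tau), 2 ** n - 1)


# Illustrative parameters (assumed here): 128-bit blocks, 128- and 64-bit tags.
for tau in (128, 64):
    bound = forgery_bound(128, tau)
    print(f"tau = {tau}: log2(bound) = {math.log2(bound):.6f}")
```

The bound depends only on n and τ, not on the number or length of the queries, which reflects the statement that the forgery probability is independent of the amount of processed data.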
On OCB and Tweakable Block Ciphers. The OCB mode was first published in [RBBK01] (i.e., OCB1). It was later refined to OCB2 in [Rog04] and finally to OCB3 in [KR11]. That last paper describes the actual ΘCB3 framework we employ in SKINNY-AEAD.
Recently, OCB3 employed with the AES was selected as one of the winners of the CAESAR competition in the category for high-performance applications [KR16]. However, this scheme only offers birthday-bound security. More generally, a tweakable block cipher can be described as a family of block ciphers parameterized by a public parameter, the tweak. The idea of a block cipher that gets a public parameter for achieving variability goes back to the design of the Hasty Pudding Cipher [Sch98], a submission to the AES competition. This was later formalized in the notion of a tweakable block cipher by Liskov, Rivest and Wagner at CRYPTO 2002 [LRW02]. The motivation is that independent block cipher calls are needed at the mode-of-operation level, as in OCB. Liskov, Rivest and Wagner suggested that the source of variability should be directly incorporated in the primitive itself instead of at the mode-of-operation level. This is a big difference to the classical OCB mode. There, a block cipher E is employed in a construction that can be understood as a tweakable block cipher (i.e., the tweakable block cipher Ẽ^T_K is just defined in terms of E). In that sense, OCB can be seen as an instance of the more general TAE mode, the tweakable authenticated encryption mode defined in [LRW02]. Indeed, Liskov, Rivest and Wagner have already proven a statement similar to Lemma 2 in [KR11]:

Theorem 3 of [LRW02]. If E is a secure tweakable block cipher, then E used in TAE mode will be unforgeable and pseudorandom.
In other words: The advantage of the adversary only comes from the distinguishing advantage of the tweakable block cipher and not from the mode.
However, the XEX construction used in OCB and also in ΘCB3 does not lead to an ideal tweakable block cipher. In fact, it only offers security up to the birthday bound. The TWEAKEY framework [JNP14] was introduced at ASIACRYPT 2014 as a method to build tweakable block ciphers from scratch (i.e., without employing an already existing underlying block cipher in a specific construction) with strong security arguments against differential and linear attacks. The intention of the TWEAKEY framework was to obtain beyond-birthday-bound secure tweakable block ciphers and to consider key and tweak as the same type of input (called the tweakey) such that the separation into key and tweak can be done by the user in a flexible way.
It is natural to employ a beyond-birthday-bound secure tweakable block cipher in a mode following the TAE (resp., ΘCB3) framework in order to exploit its full strength. The third-round CAESAR candidate Deoxys-I [JNPS16] is an existing example following this design principle.
Our Modifications. In comparison to other modes of operation, we have decided to replace the usual block counter by an LFSR, which can be implemented with just a few operations. There is indeed no reason to use the increment function x → x + 1 over the integers, as the security simply relies on the function having a maximal cycle. The same argument has been made for instance in the original OCB mode where Gray codes have been suggested to derive inner tweak values. Here in our AEAD mode, we adopted LFSRs with maximal periods and which can be easily implemented in both hardware and software as block counters.
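The maximal-cycle argument can be illustrated with a minimal sketch. It assumes a toy 8-bit Galois LFSR with feedback polynomial x^8 + x^4 + x^3 + x^2 + 1; the register width, the polynomial, and the function names are illustrative choices and are not the LFSR parameters specified for SKINNY-AEAD.

```python
def lfsr_step(state: int, width: int = 8, taps: int = 0x1D) -> int:
    """One Galois LFSR step: multiply the state by x modulo the feedback polynomial.

    `taps` holds the polynomial without its leading term
    (0x1D encodes x^4 + x^3 + x^2 + 1 for the degree-8 toy polynomial).
    """
    msb = (state >> (width - 1)) & 1
    state = (state << 1) & ((1 << width) - 1)
    if msb:
        state ^= taps
    return state


def cycle_length(start: int = 1, width: int = 8, taps: int = 0x1D) -> int:
    """Number of steps until the LFSR returns to its starting state."""
    state = lfsr_step(start, width, taps)
    steps = 1
    while state != start:
        state = lfsr_step(state, width, taps)
        steps += 1
    return steps


# With a primitive polynomial, every non-zero state lies on one cycle
# of length 2^width - 1, so successive block-counter values never repeat early.
print(cycle_length())
```

Each update costs one shift and at most one XOR, whereas an integer increment needs a carry chain over the full counter width.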

Rationale for the Hash Function Scheme
We use the well-known sponge construction, originally presented in [BDPVA07], that is also adopted in the NIST standard SHA-3 [Dwo15] so that SKINNY-Hash can inherit its elegant features. Here, we give some arguments for our design choices with respect to the following points: 1. the sponge construction using a cryptographic function as a building block, 2. the sizes of rate and capacity, and 3. our constructions of the 256- and 384-bit functions.
Function-Based Sponge. Although many existing designs following the sponge framework use a cryptographic permutation as the underlying primitive, the designers of the sponge do not restrict the underlying primitive to be a permutation and provide extensive analysis for the case that it is a function (see [BDPA11] for a detailed documentation on cryptographic sponges and several of its variants). There is no significant disadvantage in basing an entire construction on a function instead of a permutation. For example, the bounds for the indifferentiability and the collision resistance are almost identical between those two constructions.
In some cases, function-based sponge constructions are more difficult to attack than permutation-based sponge constructions, because the adversary does not have access to an inverse oracle. This makes a significant difference for the security against second-preimage attacks. For permutation-based constructions, second preimages can be found by generating collisions on the inner part between queries to f and f^{−1}, which allows a generic attack with a cost of 2^{c/2}. For function-based designs, on the other hand, the best strategy is to perform a second-preimage attack similar to the one against Merkle-Damgård constructions [KS05], which requires a cost of 2^c/L, where L is the number of blocks included in the first preimage.
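The gap between the two generic second-preimage costs can be made concrete with a small calculation. The sketch below works with base-2 logarithms of the costs 2^{c/2} and 2^c/L; the function names and the example values c = 256 and L = 2^32 are illustrative assumptions.

```python
import math


def log2_cost_permutation_based(c: int) -> float:
    """log2 of the generic cost 2^(c/2) (inner collision using f and its inverse)."""
    return c / 2


def log2_cost_function_based(c: int, num_blocks: int) -> float:
    """log2 of the generic cost 2^c / L for a first preimage of L blocks."""
    return c - math.log2(num_blocks)


c = 256          # capacity in bits (example value)
L = 2 ** 32      # length of the first preimage in blocks (example value)
print(log2_cost_permutation_based(c))   # 128.0
print(log2_cost_function_based(c, L))   # 224.0
```

Even for a very long first preimage, the function-based cost stays far above 2^{c/2} in this example.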

Choices of Rate and Capacity.
We adopt the most natural choice for SKINNY-tk3-Hash. The 256-bit capacity ensures 128-bit indifferentiability. Hence, no particular attack can be performed under 2^{128} computational cost.
The choice for SKINNY-tk2-Hash is highly optimized for lightweight use-cases. The 224-bit capacity in the absorption phase ensures the minimum requirement of 112-bit security. We change the rate and capacity for the squeezing phase to reduce the number of function calls in the squeezing phase. The security in this situation is analyzed in [NO14]. Let c and c′ be the capacity in the absorption and the squeezing phases, respectively. It was shown that the rate in the squeezing phase can be enlarged (i.e., c′ can be chosen smaller than c) while preserving O(2^{c/2}) security for indifferentiability as long as c′ ≥ c/2 + log_2 c. We are aiming at 112-bit security, hence the suitable size for c′ is c′ ≥ 224/2 + log_2 224 ≈ 119.8. Because we cannot produce a 256-bit hash digest in a single block, we set c′ = 128 so that the 256-bit hash digest can be produced with two blocks.
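The arithmetic behind the choice of the squeezing capacity can be checked with a short sketch. It assumes a 256-bit state for SKINNY-tk2-Hash, so that the squeezing rate is the state size minus the squeezing capacity; the variable and function names are illustrative.

```python
import math


def min_squeeze_capacity(c: int) -> float:
    """Lower bound c/2 + log2(c) on the squeezing capacity from the condition above."""
    return c / 2 + math.log2(c)


STATE_BITS = 256       # assumed state size of the underlying function
DIGEST_BITS = 256      # hash output length
c_absorb = 224         # capacity in the absorption phase
c_squeeze = 128        # chosen capacity in the squeezing phase

bound = min_squeeze_capacity(c_absorb)
rate_squeeze = STATE_BITS - c_squeeze
squeeze_blocks = math.ceil(DIGEST_BITS / rate_squeeze)

print(f"required: c' >= {bound:.1f}, chosen: c' = {c_squeeze}")
print(f"squeezing rate = {rate_squeeze} bits, output blocks = {squeeze_blocks}")
```

With these values, the chosen capacity of 128 bits satisfies the bound of about 119.8 bits, and the digest is produced in two output blocks.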
The results in [NO14] are for permutation-based schemes; however, we got confirmation from the authors that almost the same bound can be obtained for function-based schemes. Strictly speaking, the bound is slightly better for function-based schemes because the adversary cannot access the inverse oracle.

Rationale of F_256 and F_384.
The security argument of the sponge construction assumes the usage of a random permutation, resp., function. To provide a secure instance of the sponge, we are going to use a function indifferentiable from random. The function F_256 is indifferentiable from a 256-bit random function up to O(2^{128}) queries. Very intuitively, the only way to differentiate F_256 from an ideal object is to find a case in which the two simulators of Ẽ_tk2 in the ideal world, one for the plaintext 0 and the other for the plaintext 1, return the same output value under the same tweakey input. This occurs with probability 2^{−128}.
The same intuitive argument applies to F_384. However, the bound is worse than the one for F_256 by a factor of 3 because the adversary now has three ways to differentiate the real and ideal worlds: a collision of the simulators' outputs between the first and the second simulators, between the first and the third simulators, or between the second and the third simulators.
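The structure behind this argument can be sketched as follows. The sketch assumes that the function input is used as the tweakey and that fixed plaintext blocks are encrypted and concatenated; the plaintext constants 0, 1, 2 are placeholders, and `toy_tbc` is a hash-based stand-in for a tweakable block cipher call, not SKINNY.

```python
import hashlib
from math import comb

BLOCK_BYTES = 16  # 128-bit block size


def toy_tbc(tweakey: bytes, plaintext_index: int) -> bytes:
    """Stand-in for one tweakable block cipher call (NOT SKINNY, illustration only)."""
    block = plaintext_index.to_bytes(BLOCK_BYTES, "big")
    return hashlib.sha256(tweakey + block).digest()[:BLOCK_BYTES]


def build_function(x: bytes, num_calls: int) -> bytes:
    """Concatenate the outputs for the fixed plaintexts 0, 1, ... under tweakey x."""
    return b"".join(toy_tbc(x, i) for i in range(num_calls))


def f256(x: bytes) -> bytes:
    assert len(x) == 32  # 256-bit input and output
    return build_function(x, 2)


def f384(x: bytes) -> bytes:
    assert len(x) == 48  # 384-bit input and output
    return build_function(x, 3)


def colliding_pairs(num_calls: int) -> int:
    """Number of simulator pairs whose outputs could collide."""
    return comb(num_calls, 2)


print(colliding_pairs(2), colliding_pairs(3))  # 1 3
```

A real block cipher never maps two distinct plaintexts to the same ciphertext under one tweakey, so a collision between independent simulators is the distinguishing event. Two calls give one such pair and three calls give three pairs, which matches the factor of 3 stated above.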

Security Analysis of the SKINNY TBC
We claim security of the SKINNY family in the related-tweakey model. We now provide an analysis of its security and then mention the best cryptanalytic results published to date.

Differential/Linear Cryptanalysis
In order to argue for the resistance of SKINNY against differential and linear attacks, in [BJK + 16a] we computed lower bounds on the minimum number of active Sboxes, both in the single-key and related-tweakey models. We recall that, in a differential (resp. linear) characteristic, an Sbox is called active if it contains a non-zero input difference (resp. input mask). In contrast to the single-key model, an attacker is allowed to introduce differences (resp. masks) within the tweakey state in the related-tweakey model. We considered the three cases of choosing input differences in TK1 only, in both TK1 and TK2, and in all of the tweakey states TK1, TK2 and TK3, respectively. For lower bounding the number of linear active Sboxes, we used the same approach by considering the inverse of the transposed linear transformation, i.e., (M^⊤)^{−1}. We only considered the single-key model as there is no cancellation of active Sboxes in linear characteristics, see [KLW17]. Note that those bounds are for single characteristics only and do not quantify any potential clustering into differentials (resp. linear hulls).

Other Attacks
In the original design document [BJK + 16a, BJK + 16b], we also analyzed the security of SKINNY with regard to meet-in-the-middle attacks, impossible differential attacks, integral attacks, slide attacks, invariant subspace cryptanalysis, and algebraic attacks. We provide a brief summary of the results and refer the reader to the original documents for details.

Meet-in-the-Middle Attacks
We used the property that full diffusion is achieved after six rounds (in both directions) to estimate that meet-in-the-middle attacks might work up to at most 22 rounds.

Impossible Differential Attacks
We constructed an 11-round truncated impossible differential which can be used for a 16-round key-recovery attack on SKINNY members with a block size of 128 bits, with data, time, and memory complexities of 2^{88.5} in the single-key model.

Integral Attacks
We constructed a 10-round integral distinguisher and used it for a 14-round key-recovery attack.

Slide Attacks
The distinction between the rounds is ensured by the round constants and thus the straightforward slide attacks cannot be applied. However, due to the small state of the LFSR, round constants can collide in different rounds. We took into account all possible sliding numbers of rounds and deduced what is the difference in the constants that is obtained every time. As these constant differences might impact the best differential characteristic, we experimentally checked the lower bounds on the number of active Sboxes for all these constant differences by using MILP.
In the single-key setting, by allowing any starting round for each value of the slid pair, the lower bounds on the number of active Sboxes reach 36 after 11 rounds, and 41 after 12 rounds. We thus expect that slide attacks do not threaten the security of SKINNY.

Invariant Subspace Attacks
The non-trivial key schedule already provides a good protection against such attacks for a larger number of rounds. The main concern that remains are large-dimensional subspaces that propagate invariant through the Sbox. We checked that no such invariant subspaces exist. Moreover, we computed all affine subspaces of dimension larger than two that get mapped to (different) affine subspaces and checked if those can be chained to what could be coined a subspace characteristic. It turns out that those subspaces can be chained only for a very small number of rounds. To conclude, the non-trivial key schedule and the use of round-constants seem to sufficiently protect SKINNY against those attacks.

Algebraic Attacks
The Sbox S_8 of SKINNY members with a block size of 128 bits has an algebraic degree of 6 and thus, algebraic attacks do not seem to be a threat.

Third-Party Cryptanalysis
Since the publication of the cipher in 2016, there has been a lot of cryptanalysis by external researchers. To the best of our knowledge, we provide a complete list of formally published papers (in the English language) related to mathematical cryptanalysis of SKINNY, as of February 2020. We found 30 such papers in total.
In [SMB18], the authors analyze different SKINNY variants with regard to zero-correlation and related-tweakey impossible differential attacks. For SKINNY-128-256, they obtain a related-tweakey impossible differential attack on 23 rounds with time complexity of 2^{243.41}, data of 2^{124.41} chosen plaintexts and 2^{155.41} memory. They utilize a 15-round related-tweakey impossible differential.
In [TAY17], the authors apply impossible differential cryptanalysis on SKINNY in the single-key model. They utilize the 11-round impossible differential described in the design document. They obtain a key-recovery attack on 20 rounds of SKINNY-128-256 with time complexity 2^{245.72}, data of 2^{92.1} chosen plaintexts and memory 2^{147.1}. They further attack 22 rounds of SKINNY-128-384 with time complexity 2^{373.48}, data 2^{92.22} and memory 2^{147.22}.
In [SSD + 18], the authors used constraint programming for applying the Demirci-Selcuk meet-in-the-middle attack. The authors find a 10.5-round distinguisher and a 22-round key-recovery attack on SKINNY-128-384 with time complexity 2^{382.46}, data complexity of 2^{96} chosen plaintexts and memory complexity of 2^{330.99}. In [CSSH19], the authors derived a 22-round Demirci-Selcuk meet-in-the-middle key-recovery attack with a reduced time complexity of 2^{366.28} by using a key-bridging technique.
In [CHP + 18], the authors introduce the Boomerang Connectivity Table (BCT), which quantifies the boomerang switching effect in Sboxes with regard to the boomerang attack. They apply their method to SKINNY and show that the probabilities of the attacks presented in [LGS17] are not precise. Later in [SQH19], the authors re-evaluated the probabilities of the boomerang distinguishers from [LGS17] using a generalized framework for the BCT. In [WP19], a 4-round boomerang distinguisher on SKINNY with probability 1 is presented.
In [AST + 17], the authors proposed a method to model the actual DDT of large Sboxes in order to compute exact probabilities of differential trails. Applied to SKINNY members with a block size of 128 bits, the authors showed that the probability of any 14-round (single-key) differential trail is upper bounded by 2^{−128}, while the designers proved a lower bound of 61 active Sboxes (ensuring only a probability upper bounded by 2^{−122}).
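The step from 61 active Sboxes to the bound 2^{−122} can be reproduced with a short calculation. The sketch assumes that each active 8-bit Sbox contributes a differential probability of at most 2^{−2}, which is the per-Sbox value implied by the two numbers above; the function names are illustrative.

```python
import math


def log2_trail_bound(active_sboxes: int, log2_max_dp: int = -2) -> int:
    """log2 of the trail probability bound: one factor of the max. DP per active Sbox."""
    return active_sboxes * log2_max_dp


def min_active_sboxes(target_log2: int, log2_max_dp: int = -2) -> int:
    """Smallest number of active Sboxes whose bound reaches 2^target_log2 or below."""
    return math.ceil(target_log2 / log2_max_dp)


print(log2_trail_bound(61))      # -122
print(min_active_sboxes(-128))   # 64
```

Counting active Sboxes alone would thus need 64 of them to reach 2^{−128}, whereas modeling the actual DDT entries already gives this bound for 14 rounds.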
In [LTW18], the authors present algorithms for finding subspace trails. They find 5-round subspace trails for all SKINNY members.
In [ABI + 18], the authors conduct an exhaustive search over all possible word permutations to be used as a replacement for the ShiftRows permutation and derived the minimum number of active Sboxes with regard to differential cryptanalysis using Matsui's branch-and-bound algorithm. Their results show that the ShiftRows permutation used in SKINNY is actually among the best permutations. By using Matsui's algorithm, they computed the bounds for up to 40 rounds in the single-key setting, while the designers only gave bounds for up to 22 rounds. Table 7 is updated with their improved bounds starting from 23 rounds.
In [BCC19], the authors present a new framework for studying mixture-differential distinguishers and the multiple-of-8 property, two kinds of distinguishers that were recently introduced to distinguish round-reduced versions of the AES. The authors analyze AES-like SPN ciphers with regard to those properties and show that results similar to those for the AES can be obtained even for AES-like ciphers for which the MixColumns operation has a non-optimal branch number. Applied to SKINNY, the authors show that 5-round SKINNY has the multiple-of-2^h property, where h ∈ {1, 2, . . . , 11, 13}.
In [BGLS19], the authors present a tool, called Peigen, for evaluating cryptographic properties of Sboxes. They analyzed the Sbox of SKINNY-128 and found that it has 601 linear structures and that it is (7, 2)-linear and (3, 7)-linear, meaning that 2^2 components are affine on all cosets of a certain 7-dimensional linear subspace, resp., 2^7 components are affine on all cosets of a certain 3-dimensional linear subspace.
In [KB19], the authors experimentally evaluated the diffusion properties and the indices of strong nonlinearity of the round functions of some lightweight block ciphers, including SKINNY.
In [ZCGP20], the authors introduce a new method for checking the resistance of an SPN cipher against integral distinguishers, truncated differentials and impossible differential distinguishers in the single-key setting by considering an algebraic representation of the round function and analyzing its structure. They could find several 10-round integral distinguishers, the 11-round impossible differential distinguishers, and a 7-round truncated differential distinguisher. They further analyze SKINNY-128-128 explicitly and provide a practical 11-round attack in the single-key model.
In [ZDM + 20], the authors provide a new key-recovery model of related-key rectangle attacks on block ciphers with linear key schedules. Applying their technique to SKINNY, the authors obtain a 28-round related-tweakey rectangle attack on SKINNY-128-384 with time complexity 2^{315.25}, data of 2^{122} chosen plaintexts and memory 2^{122.32}.
As a summary, Table 8 shows the maximum number of rounds that can be attacked so far. Both of the underlying primitives SKINNY-128-256 and SKINNY-128-384 still offer a security margin of at least 50%. In the context of SKINNY-AEAD and SKINNY-Hash, most of the attack models do not hold (for example due to data limits or limited control over the input to the TBC), and the attacks are not applicable to SKINNY-AEAD or SKINNY-Hash instances that use the same reduced-round SKINNY. For example, having a theoretical attack on 28-round SKINNY-128-384 does not imply an attack on SKINNY-AEAD or SKINNY-Hash that uses 28-round SKINNY-128-384.

Hardware Implementations
We also provide performance results and area footprints of SKINNY-AEAD and SKINNY-Hash when implemented on a hardware platform. To this end, in addition to the tk3 and tk2 constructions of SKINNY-Hash, we realized two sets of implementations for each SKINNY-AEAD variant: one as encryption-only and the other one supporting both encryption and decryption functionalities. We further considered two different instances of the underlying SKINNY module: a round-based implementation performing every cipher round in a clock cycle, and a byte-serial implementation mainly processing a single byte per clock cycle. The corresponding results are depicted in Table 9 and Table 11, respectively, where the area footprint (in GE), maximum clock frequency, and maximum throughput for two standard cell libraries IBM 130 and UMC 90 are reported. In order to achieve the maximum throughput, we simulated the SKINNY-AEAD implementations with 100 blocks of 16-byte associated data A and 100 blocks of 16-byte message M , thereby obtaining the required number of clock cycles. Note that the number of clock cycles is independent of the value of the given associated data and the message as our implementations are constant-time preventing any leakage through the timing side channel. For SKINNY-Hash we obtained the number of required clock cycles by simulating the implementation with a message of 100 blocks of 16-byte (for SKINNY-tk3-Hash) and a message of 100 blocks of 4-byte (for SKINNY-tk2-Hash).
We further realized the side-channel protected variant of all aforementioned implementations. We applied a masking countermeasure with 2 shares and made use of the concept explained in [RBN + 15], i.e., how to achieve dth-order security using d + 1 shares. It is noteworthy that, due to the special construction of the SKINNY Sbox, its SCA-protected version with 2 shares does not require any fresh randomness. Similar observations have been reported in [RBN + 15] and [DDE + 19], where particular functions (mainly made by a combination of AND-XOR) can be uniformly shared without any fresh masks. Therefore, in all variants of SKINNY-AEAD, it is sufficient to represent all inputs and outputs (including the associated data, message, key, and tag) by two uniformly masked additive shares. This also holds for SKINNY-Hash, while SKINNY-tk3-Hash requires an additional 384-bit initial mask used to initialize the state S_384 in a shared way with 2 shares. Trivially, SKINNY-tk2-Hash also needs such an initial mask (of 256 bits). The performance results and the area footprint of all implementations are given in Table 10 and Table 12. The particular implementations can be found in the supplementary material and at https://sites.google.com/site/skinnycipher/downloads.
The best attacks surveyed above reach 28 rounds of SKINNY-128-384 (out of 56 rounds). All these attacks have very high complexity, much more than 2^{200} in computational complexity and sometimes up to almost 2^{384}, and only work in the related-tweakey model, where differences also need to be inserted in the tweak and/or key input.
This means that for both SKINNY-128-256 and SKINNY-128-384, the security margin is at least 50%, and actually much more if one considers only single-key attacks and/or attacks with a complexity lower than 2^{128} (in the single-key model, the best known attacks against SKINNY-128-256 cover only 20 rounds [TAY17], while the best attacks against SKINNY-128-384 cover 22 rounds [TAY17, SSD + 18, CSSH19], again all with a very high computational complexity). Most block ciphers have a security margin of at best around 20 to 30%. This indicates that a 50% security margin may be overly large. For this reason, we propose new variants of the SKINNY-AEAD and SKINNY-Hash members based on SKINNY-128-384: SKINNY-AEAD-M1+, SKINNY-AEAD-M2+, SKINNY-AEAD-M3+, SKINNY-AEAD-M4+ as well as SKINNY-tk3-Hash+. These members share exactly the same specifications as SKINNY-AEAD-M1, SKINNY-AEAD-M2, SKINNY-AEAD-M3, SKINNY-AEAD-M4 and SKINNY-tk3-Hash, except that the number of SKINNY-128-384 rounds is reduced from 56 to 40, which still gives a very comfortable security margin of around 30% (in the worst-case related-tweakey scenario, without even excluding attacks with complexity significantly higher than 2^{128}). The security claims of these new members are exactly the same as the security claims of the original members they are based on, but they are expected to be around 1.4x faster than their counterparts for the same area cost.
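The margin and speed-up figures follow directly from the round numbers. The sketch below assumes that the security margin is measured as the fraction of rounds not covered by the best known attack and that the running time scales linearly with the number of rounds; the helper name is illustrative.

```python
def security_margin(attacked_rounds: int, total_rounds: int) -> float:
    """Fraction of rounds not covered by the best known attack."""
    return 1 - attacked_rounds / total_rounds


FULL_ROUNDS = 56      # SKINNY-128-384 as originally specified
REDUCED_ROUNDS = 40   # proposed "+" variants
ATTACKED_ROUNDS = 28  # best related-tweakey attack on SKINNY-128-384

print(f"margin at 56 rounds: {security_margin(ATTACKED_ROUNDS, FULL_ROUNDS):.0%}")
print(f"margin at 40 rounds: {security_margin(ATTACKED_ROUNDS, REDUCED_ROUNDS):.0%}")
print(f"expected speed-up:   {FULL_ROUNDS / REDUCED_ROUNDS:.1f}x")
```

Under these assumptions, the values reproduce the 50% and 30% margins and the factor of about 1.4 stated above.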