The DRACO Stream Cipher: A Power-efficient Small-state Stream Cipher with Full Provable Security against TMDTO Attacks

Abstract. Stream ciphers are vulnerable to generic time-memory-data tradeoff attacks. These attacks reduce the security level to half of the cipher's internal state size. The conventional way to handle this vulnerability is to design the cipher with an internal state twice as large as the desired security level. This design principle can be found in the eSTREAM portfolio members Grain and Trivium. In lightweight cryptography and on heavily resource-constrained devices, however, a large internal state is a big drawback for the cipher. Recently, proposals have been made that reduce the internal state size. These ciphers distinguish between a volatile internal state and a non-volatile internal state. The volatile part is updated during a state update while the non-volatile part remains constant. Cipher proposals like Sprout, Plantlet, Fruit and Atom reuse the secret key as the non-volatile part of the cipher. However, when considering indistinguishability, none of the ciphers mentioned above provides security beyond the birthday bound with regard to the volatile internal state. Partially this is due to the lack of a proper proof of security. We present a new stream cipher proposal called Draco which implements a construction scheme called CIVK. In contrast to the ciphers mentioned above, CIVK uses the initial value and a key prefix as its non-volatile state. Draco builds upon CIVK, uses a 128-bit key and a 96-bit initial value, and requires 23 % less area and 31 % less power than Grain-128a at 10 MHz. Further, we present a proof that CIVK provides full security with regard to the volatile internal state length against distinguishing attacks. This makes Draco a suitable cipher choice for ultra-lightweight devices like RFID tags.


Introduction
Stream ciphers. In symmetric key cryptography, we typically distinguish two types of encryption schemes: block ciphers and stream ciphers. Block ciphers divide a plaintext into blocks of a fixed size and encrypt one such block of data as a whole. Stream ciphers on the other hand consider the plaintext as a continuous stream of data. The stream cipher maintains an internal state and in each step it outputs one bit or several bits and updates its internal state. Throughout this work, we consider individual bit outputs. The output bit stream is then combined with the plaintext, usually using the XOR operation.
One advantage of stream ciphers is that their resource requirements are lower than those of block ciphers in many application scenarios. This makes them particularly useful in lightweight cryptography. Instances of stream ciphers are used in the GSM cellular phone standard (A5/1), Bluetooth (E0) and wireless networking (RC4).
Vulnerabilities. Stream ciphers are vulnerable to time-memory-data tradeoff attacks [Bab95,Gol96,BS00]. These types of attacks exploit the birthday paradox to recover an internal state. This internal state can then be used to decrypt the remaining ciphertext. Due to the birthday paradox, the security of such ciphers is typically capped at half the size of the internal state. Accordingly, this has influenced the design of stream ciphers in such a way that the internal state size is at least twice the size of the desired security level. This is in stark contrast to the lightweight principle of stream ciphers, since a larger state necessarily increases resource requirements. Stream ciphers that employ a large internal state are the eSTREAM portfolio members Grain [HJM06] and Trivium [CP05]. We refer to these ciphers as the large-state-small-key construction, in short LSSK.
Recent work. Recently, efforts have been made to reduce the internal state size while still retaining a reasonable security level. Lizard [HKM17b] raises the security against key recovery attacks beyond the birthday bound, reaching a security level of $2n/3$, where $n$ denotes the internal state's size. It does this by adding the secret key to its internal state in the last step of the state initialization. Its security against distinguishing attacks, however, remains at the birthday barrier [HK15].
In addition to the volatile internal state, the stream ciphers Fruit [AGH18], Plantlet [MAM16] and Sprout [AM15] continuously use the secret key stored in non-volatile memory during their state update. The hope was that the additional key bits would enhance the security beyond the birthday bound with regard to the volatile internal state bits. However, these constructions were not equipped with a proof of security and they were eventually successfully attacked and broken [HKMZ18]. Atom [BCI+21] also uses the secret key continuously. However, it does not provide security beyond the birthday bound against distinguishing attacks, as the attack presented in [HKMZ18] also applies here. We refer to these ciphers as the continuous-key construction, in short CKEY.
A third proposal was recently made in [HKM17a]. Instead of continuously using the non-volatile secret key, the non-volatile initial value is employed during the state update. A proof of security was later published in [HKM19]. We refer to these ciphers as the continuous-IV construction, in short CIV.
Contribution. In this work we present our new stream cipher proposal called Draco. Draco uses a 128-bit volatile internal state and a 128-bit non-volatile internal state. The non-volatile state consists of the initial value with a length of 96 bits and a key prefix with a length of 32 bits.
This new generic scheme, which uses a non-volatile state consisting of the initial value and a key prefix, is called CIVK. In Section 5 we provide a security analysis in the random oracle model and we prove that CIVK provides full security against generic time-memory-data tradeoff attacks with regard to the volatile state length. In particular, this implies that any generic distinguishing attack against CIVK has a time complexity of $O(2^{\ell_v})$, where $\ell_v$ denotes the volatile internal state size. In the case of Draco, a time complexity of $2^{128}$ steps is needed for a successful distinguishing attack. The corresponding attack on CIVK can be found in Subsection 4.2 and therefore the bound shown in Section 5 is tight.
To the best of our knowledge, Draco is the first small-state stream cipher that achieves a full 128-bit security level against key-recovery and distinguishing attacks. Our main variant of Draco stores the key prefix and the IV externally. In an ultra-lightweight scenario, like RFIDs where the secret key is burned into the device or stored in an EEPROM and the frame counter is used as the IV, Draco needs 23 % less area and 31 % less power than Grain-128a at 10 MHz. The saving in power stems from reduced area requirements, but particularly also from the fact that, unlike in previous ciphers, only half of the state bits are constantly updated, thus significantly reducing costly dynamic power consumption.
For high-performance environments, we also present the variant Draco [KI] where the secret key and the initial value are stored inside the Draco hardware module while still only 128 bits are used during the state update. At a clock speed of 1 GHz Draco needs about 34 % less energy than Grain-128a. This demonstrates that not all internal state bits need to be constantly updated to achieve a security level of 128 bits. For more details please refer to Section 9.
Draco is a stream cipher that operates in packet mode, i.e. there may be up to $2^{32}$ bits, i.e. 512 MiB, of output keystream per key-IV pair. After this limit is reached, a new IV has to be used. No IV may be used twice. In Subsection 7.6 we argue that most transmission protocols use a packet size much lower than 512 MiB and therefore we see the packet length as a valid constraint on keystream generation.
Outline. In Section 2 we provide the basics of stream ciphers. In Section 3 we present the current stream cipher constructions with an enhanced state and introduce the CIVK construction. In Section 4 we review time-memory-data tradeoff attacks on classic stream ciphers and on CIVK. In Section 5 we present the proof of security for CIVK. In Section 6 we present the specification of the Draco cipher. In Section 7 we argue about the choice of Draco's parameters. In Section 8 we analyze Draco's security against various attacks. In Section 9 we present our hardware results. Section 10 concludes this work.

Stream Cipher Basics
Stream ciphers are symmetric encryption algorithms intended for the online encryption of plaintext bitstreams X which have to pass an insecure channel. The encryption is performed by bitwise addition of a keystream S to X, which is generated in dependence of a secret symmetric session key k and public initial values. The legal recipient, who also knows k, decrypts the encrypted bitstream Y = X ⊕ S by generating S and computing X = Y ⊕ S. In this work, we consider key stream generator-based stream ciphers, i.e. stream ciphers which generate the keystream using a so-called keystream generator (KSG).
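The encryption and decryption rule described above can be illustrated with a minimal sketch. The helper name `xor_bits` is hypothetical, and the keystream here is simply drawn at random as a stand-in; a real KSG would derive it from the session key and the initial value.

```python
import secrets

def xor_bits(a, b):
    """Bitwise XOR of two equal-length bit lists."""
    if len(a) != len(b):
        raise ValueError("bitstreams must have equal length")
    return [x ^ y for x, y in zip(a, b)]

# Stand-in keystream S (assumption): random bits instead of KSG output.
plaintext = [1, 0, 1, 1, 0, 0, 1, 0]                   # X
keystream = [secrets.randbits(1) for _ in plaintext]   # S
ciphertext = xor_bits(plaintext, keystream)            # Y = X xor S
recovered = xor_bits(ciphertext, keystream)            # X = Y xor S
```

Because XOR is its own inverse, sender and recipient run the same operation; all secrecy rests on the keystream.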

Keystream Generation
Keystream generators are stepwise working devices that typically consist of interconnected feedback shift registers (FSRs). KSGs can be formally specified by finite automata. The KSG has an internal state length $\ell_s$ corresponding to the total number of FSR register cells. The corresponding set of internal states is denoted by $Q := \{0, 1\}^{\ell_s}$. In the following we describe each step of the keystream generation process.
Initially, the secret key $k$, the initial value $x$ and possibly a constant $C$ are loaded into the KSG's register cells. We refer to this state as the loading state $q_{\mathrm{load}}(k, x, C) \in Q$.
The state initialization algorithm computes the initial state $q_{\mathrm{init}}(k, x, C) \in Q$ from the loading state $q_{\mathrm{load}}(k, x, C)$. This is done to ensure a sufficient level of diffusion and confusion of the initial state bits. Normally, the main component of the state initialization algorithm is an operation called mixing, performed by the KSG. In many stream ciphers, mixing is done by clocking the KSG multiple times without producing keystream bits.
In every clock cycle $i \ge 0$, the KSG produces an output bit $z_i = f(q_i)$ according to an output function $f : Q \to \{0, 1\}$. Using the state update function $\pi : Q \to Q$, the internal state $q_i$ is then updated to $q_{i+1} = \pi(q_i)$. Typically, $\pi$ is bijective and efficiently invertible. We denote multiple evaluations of the state update function $\pi$ by $\pi^r$, e.g. $q_2 = \pi^2(q_0) = \pi(\pi(q_0))$. The output keystream $S(q_0)$ is defined by concatenating all the outputs $z_0 \| z_1 \| z_2 \| z_3 \| \cdots$.
Some stream ciphers define an additional parameter. The packet length $\ell_p$ is defined to be the maximum number of output bits per IV $x$. After reaching $\ell_p$ output bits, the IV $x$ is changed and state initialization is performed again.
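The output and update steps of a KSG can be sketched as a small finite automaton. The following is a toy with an 8-bit state; the concrete functions `pi` and `f` are illustrative assumptions and have no cryptographic strength.

```python
STATE_BITS = 8
MASK = (1 << STATE_BITS) - 1

def pi(q):
    """Toy state update: rotate left by one, then add an odd constant.
    Both steps are bijections on 8-bit values, so pi is bijective."""
    rot = ((q << 1) | (q >> (STATE_BITS - 1))) & MASK
    return (rot + 0x1D) & MASK

def f(q):
    """Toy output function mapping a state to one bit."""
    return (q ^ (q >> 3) ^ ((q >> 5) & (q >> 6))) & 1

def keystream(q0, n):
    """Return z_0 .. z_{n-1} with z_i = f(q_i) and q_{i+1} = pi(q_i)."""
    bits, q = [], q0
    for _ in range(n):
        bits.append(f(q))
        q = pi(q)
    return bits

stream = keystream(0xA5, 16)
```

In a packet-mode cipher, `keystream` would be called with at most the packet length as `n` before a fresh initial state is derived from a new IV.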

Security Requirements
The main security requirement for stream ciphers is that it must be hard to distinguish the keystream generated on a randomly chosen secret key from a truly random bitstream. This implies the hardness of the state recovery problem: Let $S_{\le \ell_p}(q)$ denote the first $\ell_p$ bits of keystream generated from the internal state $q$. For a given piece of keystream $z$, compute an internal state $q$ with $z = S_{\le \ell_p}(q)$.
This also implies that it is hard to recover the secret session key k.
The main drawback of KSG-based stream ciphers is their vulnerability to generic time-memory-data tradeoff (TMDTO) attacks [Bab95,Gol96,BS00]. These attacks allow an internal state to be recovered in time $2^{\ell_s/2}$ and they have a memory and data requirement of $2^{\ell_s/2}$. This reduces their effective security level to one half of the internal state size.
Moreover, several stream ciphers like Trivium have an efficiently invertible state initialization algorithm, which allows the secret session key to be computed efficiently from one recovered internal state. Commercial stream ciphers like A5/1 and Bluetooth E0 ignored the existence of TMDTO attacks. This brought their security level below the widely accepted bound of 80 bits. The eSTREAM portfolio members Trivium [CP05] and Grain v1 [HJM06] demonstrate awareness of TMDTO attacks by being designed in accordance with the so-called LSSK construction. While for Trivium and Grain the session key length is 80 bits, the internal state lengths of Grain v1 and Trivium are 160 bits and 288 bits, respectively. This is in stark contrast to the lightweight principle of stream ciphers, since a larger state necessarily increases resource requirements.

Enhanced State Stream Ciphers
In recent years an intensive search for new stream cipher constructions has been conducted. In particular, the goal was to decrease the internal state size while still retaining a high resistance against TMDTO attacks. This research is accompanied by the development of new information-theoretic methods that make it possible to prove the security of generic stream cipher constructions against TMDTO attacks. This is similar to the formal framework of Even-Mansour ciphers that was used to analyze the security of generic block cipher constructions.
Three generic stream cipher constructions have been proposed so far: (1) the Lizard construction [HKM17b], (2) the CKEY construction (underlying Fruit [AGH18], Plantlet [MAM16], Sprout [AM15] and Atom [BCI+21]), and (3) the CIV construction [HKM17a,HKM19]. The CKEY and the CIV construction rely on an enhanced state, which is explained in Subsection 3.1. In the following, we give an overview of these constructions. Further, we introduce our new construction CIVK, which combines the ideas of CKEY and CIV to enhance the security level.

Enhancing the Internal State
The CKEY and CIV constructions divide the internal state into a volatile part of length $\ell_v$ and a non-volatile part of length $\ell_{nv} := \ell_s - \ell_v$. The non-volatile memory remains unchanged during state update and state initialization. In practice, this makes it possible to reduce the number of costly volatile register cells.
Continuous Key. The first example was the CKEY construction. It uses the non-volatile secret key not only for loading the initial state, as is common, but also continuously during state initialization and keystream generation. This principle underlies the stream cipher proposals Sprout, Plantlet, Fruit and Atom. However, it was shown in [HKM19] that the resistance of the CKEY construction against generic TMDTO distinguishing attacks is at most $\ell_v/2$.
Continuous IV. The CIV construction was first proposed in [HKM17a]. Contrary to CKEY, it does not use the non-volatile key, but it uses the initial values as the non-volatile part of the internal state. This construction provides a provable security level of $\ell_v - \log_2(\ell_p)$ [HKM19]. Note that there are no stream cipher instantiations based on the CIV construction so far.
Continuous IV & Key. Our new construction CIVK uses the initial values as part of the non-volatile state as well as a prefix of length $\log_2(\ell_p)$ of the secret key. In particular, using the initial values and a key prefix from non-volatile memory, the volatile memory is initialized with the key only, with $\ell_v = \ell_k$, where $\ell_k$ denotes the key length. We also define the non-volatile internal state length $\ell_{nv}$ to be equal to the volatile state length, i.e. we have $\ell_{nv} = \ell_v = \ell_k$. The IV length $\ell_{IV}$ is determined by the key and packet length: $\ell_{IV} = \ell_k - \log_2(\ell_p)$. As we will prove in this work, this allows us to reach a security level of the entire volatile internal state length $\ell_v$, resp. of the entire key length $\ell_k$. In the next subsection we specify the CIVK construction in detail. The proof can be found in Section 5, the resulting bound can be found in Subsubsection 5.2.4.

The CIVK Construction
We denote by $Q_{nv}$ the non-volatile internal state space and by $Q_v$ the volatile internal state space. We denote an internal state by $\langle a \mid b \rangle$, where the left side $a \in Q_{nv}$ denotes the non-volatile internal state and the right side $b \in Q_v$ denotes the volatile internal state. We chose this notation to make it easier to distinguish internal states from arbitrary tuples. Also, if for example the non-volatile state $a$ consists of two parts $a_1$ and $a_2$, we will denote this state by $\langle a_1, a_2 \mid b \rangle$. The key space is denoted by $\mathcal{K}$ and the IV space is denoted by $\mathcal{IV}$.
The following description will define the keystream generation using the CIVK construction.
Packet length. CIVK is a construction that works in packet mode. The parameter $\ell_p$ defines the packet length. For each key-IV pair $(k, x) \in \mathcal{K} \times \mathcal{IV}$, CIVK may output up to $\ell_p$ keystream bits.
Enhanced State. The internal state is divided into a volatile part that is updated during state updates and a non-volatile part that is not updated during state updates. The volatile part of the loading state consists of the secret key $k$ only. The non-volatile part consists of the IV $x$ and a key prefix $k^{\mathrm{pre}}$ of length at least $\log_2(\ell_p)$. For a given key-IV pair $(k, x) \in \mathcal{K} \times \mathcal{IV}$ the loading state of CIVK is denoted by $\langle x, k^{\mathrm{pre}} \mid k \rangle$. Typically the IV $x$ and the key prefix $k^{\mathrm{pre}}$ will be concatenated.
State Size. The parameters $\ell_k$ and $\ell_p$ are to be specified by the respective cipher built upon CIVK. The length of the volatile internal state is equal to the key length, i.e. $\ell_v = \ell_k$. The key prefix has a length of at least $\log_2(\ell_p)$, as this makes it possible to improve upon the CIV construction and reach a security level of $2^{\ell_k}$ instead of $2^{\ell_k}/\ell_p$. Further, we want to be able to generate $2^{\ell_k}$ bits of keystream per key $k$. As there is a limit of $\ell_p$ bits per key-IV pair $(k, x)$, at least $2^{\ell_k}/\ell_p$ IVs are required, so the IV length has to be at least $\log_2(2^{\ell_k}/\ell_p) = \ell_k - \log_2(\ell_p)$. This ensures that the non-volatile state length $\ell_{nv}$ has at least the size of the volatile state, i.e. $\ell_{nv} \ge \ell_v = \ell_k$.
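The size constraints can be checked with a few lines of arithmetic. This is a minimal sketch that takes the minimal prefix length and assumes the IV fills the remainder of a non-volatile state of the same length as the key; the function name `civk_parameters` is hypothetical.

```python
def civk_parameters(key_bits, packet_bits):
    """Derive minimal CIVK state sizes from key length and packet length."""
    # Integer ceil(log2(packet_bits)), avoiding floating-point rounding.
    prefix_bits = (packet_bits - 1).bit_length()
    iv_bits = key_bits - prefix_bits          # IV fills the rest (assumption)
    if iv_bits <= 0:
        raise ValueError("packet length too large for this key length")
    return {
        "volatile_bits": key_bits,
        "prefix_bits": prefix_bits,
        "iv_bits": iv_bits,
        "non_volatile_bits": prefix_bits + iv_bits,
    }

# Draco's stated parameters: 128-bit key, packets of up to 2^32 bits.
draco = civk_parameters(128, 2**32)
```

For Draco this gives a 32-bit key prefix and a 96-bit IV, matching the 128-bit non-volatile state described in the introduction.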
Mixing Function. The loading state $\langle x, k^{\mathrm{pre}} \mid k \rangle$ is used as input to the mixing function $p$. Its task is to provide the initial state with enough confusion and diffusion for further operation. It corresponds to clocking the cipher without producing output bits.
State Update. The state update function $\pi$ updates an internal state $\langle x, k^{\mathrm{pre}} \mid y \rangle$ to the next internal state $\langle x, k^{\mathrm{pre}} \mid y' \rangle$.
We denote $r$ successive invocations of the state update function $\pi$ on an internal state $\langle x, k^{\mathrm{pre}} \mid y \rangle$ by $\pi^r \langle x, k^{\mathrm{pre}} \mid y \rangle$, e.g. for three successive invocations we write $\pi^3 \langle x, k^{\mathrm{pre}} \mid y \rangle = \pi(\pi(\pi \langle x, k^{\mathrm{pre}} \mid y \rangle))$. The period of the state update function $\pi$ is required to be larger than $\ell_p$ for the entire internal state space. This means that for any internal state $\langle x, k^{\mathrm{pre}} \mid y \rangle$ the set $\{\langle x, k^{\mathrm{pre}} \mid y \rangle, \pi^1 \langle x, k^{\mathrm{pre}} \mid y \rangle, \ldots, \pi^{\ell_p - 1} \langle x, k^{\mathrm{pre}} \mid y \rangle\}$ contains $\ell_p$ distinct elements.

Output Function.
The output function $f$ maps an internal state $\langle x, k^{\mathrm{pre}} \mid y \rangle$ to an output bit $z \in \{0, 1\}$.
Keystream Generation. Let $(k, x)$ be an arbitrary key-IV pair and let $k^{\mathrm{pre}}$ be the corresponding key prefix. Using the functions defined above we can define the construction function $e$ that corresponds to the keystream generation using the CIVK construction: $e(k, x, r) := f(\pi^r(p \langle x, k^{\mathrm{pre}} \mid k \rangle))$. We consider individual output bits and $e$ outputs the $r$-th keystream bit. The entire keystream packet of length $\ell_p$ for a key-IV pair $(k, x)$ looks as follows: $e(k, x, 0) \| e(k, x, 1) \| \cdots \| e(k, x, \ell_p - 2) \| e(k, x, \ell_p - 1)$.
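The interplay of loading, mixing, state update and output can be sketched with toy components. Everything concrete here is an assumption made for illustration: the 16-bit key, the packet length of 16 bits, the choice of the most significant key bits as the prefix, and the functions `pi`, `f` and `p`, which are neither Draco's nor cryptographically meaningful. The toy `pi` is bijective on the volatile part but is not shown to meet the period requirement.

```python
KEY_BITS = 16                        # toy l_k = l_v
PACKET_LEN = 16                      # toy l_p
PREFIX_BITS = 4                      # log2(l_p)
IV_BITS = KEY_BITS - PREFIX_BITS     # 12-bit toy IV
MASK = (1 << KEY_BITS) - 1

def load(k, x):
    """Loading state <x, k_pre | k>; k_pre = top PREFIX_BITS bits of k."""
    k_pre = k >> (KEY_BITS - PREFIX_BITS)
    return (x << PREFIX_BITS) | k_pre, k      # (non-volatile, volatile)

def pi(nv, y):
    """State update: only the volatile part y changes, nv is read-only."""
    rot = ((y << 1) | (y >> (KEY_BITS - 1))) & MASK
    return nv, (rot + (nv | 1)) & MASK

def f(nv, y):
    """Output bit computed from both state parts."""
    return (y ^ (y >> 7) ^ (nv >> 3) ^ ((y >> 2) & (y >> 11))) & 1

def p(nv, y, rounds=64):
    """Mixing: clock the cipher without producing output bits."""
    for _ in range(rounds):
        nv, y = pi(nv, y)
    return nv, y

def e(k, x, r):
    """r-th keystream bit: f(pi^r(p<x, k_pre | k>))."""
    nv, y = p(*load(k, x))
    for _ in range(r):
        nv, y = pi(nv, y)
    return f(nv, y)

packet = [e(0xBEEF, 0x123, r) for r in range(PACKET_LEN)]
```

A practical implementation would keep the state between output bits instead of recomputing from the loading state for every `r`; the recomputation here only mirrors the definition of the construction function.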

Comments on the State Length
The volatile loading state consists of the key $k$ only and therefore we have $\ell_v = \ell_k$, where $\ell_k$ is regarded as fixed. The lengths of the key prefix $k^{\mathrm{pre}}$ and the IV $x$ are not fixed but rather lower bounded as described above. In the following we elaborate on the consequences of smaller or larger lengths. If the length of the non-volatile state were lower than that of the volatile state, the attack by Babbage and Golić presented in Subsection 4.2 would yield a smaller complexity than $2^{\ell_k}$.
The key prefix makes it possible to improve upon the bound of CIV and is hence chosen to have a length of at least $\log_2(\ell_p)$. A smaller value decreases security, as it makes the second attack in Subsection 4.2 more likely to succeed. A longer key prefix, without compromising the IV length, increases the complexities of both attacks in Subsection 4.2. Yet, exhaustive key search remains possible with a complexity of $2^{\ell_k}$.
The IV length mainly influences how many keystream bits per key can be generated. With a longer IV the second attack presented in Subsection 4.2 would still yield a complexity of $2^{\ell_k}$. Theoretically this would allow for more than $2^{\ell_k}$ bits of keystream to be generated. A smaller IV would reduce the total number of keystream bits that can be generated per key. Depending on the size of the reduction this may not be an issue.

Hardware Implications of Continuous IV Access
Now one may argue from a hardware perspective that while the secret key has to be stored anyhow (e.g., also for LSSK stream ciphers such as Trivium, Grain etc.) in order to be reused with other IVs, this would not be the case for the IV. Hence, at first sight, assuming that the IV is still accessible after state initialization might be considered cheating. However, we do not think that this is the case for many application scenarios. For example, in A5/1 of the GSM standard, the IV used for encrypting a data packet is the respective (sequentially incremented) 22-bit frame number. Hence, any A5/1 device needs some memory containing this frame number anyhow. Similarly, the DECT standard for cordless telephone systems relies on frame numbers as IVs for encryption.
In general, especially for ciphers with small IV spaces, there always has to be a mechanism like a stepwise incremented IV storage to make sure that the same IV is not accidentally used twice under the same secret key. Similarly, in all communication scenarios like A5/1 or DECT, where the packet number serves at the same time as an IV source, one will always have this information. Note that such packet counters are not only prevalent for standard network transmission protocols such as TCP/IP, but are also a common component of lightweight wireless devices such as RFID tags, e.g., for synchronization purposes and in order to protect against replay attacks.
The vast majority of RFID tags are based on ASICs (application-specific integrated circuits), whose primary types of writable storage are non-volatile EEPROMs and volatile flip-flops. In [MAM16], the designers of Plantlet extensively studied the effects of continuously reading the secret key from an EEPROM during keystream generation. Despite certain drawbacks of this approach, such as an increased design complexity and a potential reduction of the maximal achievable throughput, they concluded that this is in fact feasible. This naturally holds for any other kind of data stored in an EEPROM, too, such as a packet counter serving as the IV source and being accessed in the same way. In particular, the key-IV-schedule of Draco (cf. Section 6) accesses the 96 IV bits and the bits of the 32-bit key prefix in sequential order, just like the key schedule of Plantlet accesses the cipher's 80-bit key in sequential order.
The designers of Plantlet found this to be beneficial in terms of limiting the negative performance impact of continuous EEPROM access on the maximal achievable throughput. If the storage location of the IV source (such as the aforementioned network packet counter) is an array of flip-flops instead, the feasibility of continuous IV access is straightforward, because the ASIC's cryptographic logic can be connected to those flip-flops through wires at practically no cost.
In the previous paragraphs, we have explained that there are various scenarios where the cipher's key and/or IV are already present in some storage location on the device, so that this storage can be reused when realizing continuous key-IV access. In fact, the resulting savings on flip-flops (and, thus, chip area) inside the cipher module have long been the major motivation for such designs. In Section 9, we show that due to this focus on the number of flip-flops, ignoring their actual usage, a great potential for reducing power consumption has been missed so far.
More precisely, with our implementation variant Draco [KI] we demonstrate that even if a 2n-bit storage is required inside the cipher hardware module to achieve n-bit security against TMDTO attacks, algorithmically keeping half of this state constant is much more efficient (cf. Tab. 2 in Section 9) than and equally secure (see Section 8 and Section 5) as constantly updating the whole of it. That is, we show that even if the key and the IV are stored locally inside of the cipher hardware module (thus eliminating all the scenario assumptions / usage restrictions described above) in order to implement continuous key-IV access, our new small-state stream cipher Draco still allows to save up to 34 % of energy as compared to Grain-128a when producing 10 kbit of keystream (including state initialization) at a clock speed of 1 GHz.

Time-Memory-Data Tradeoff Attacks
TMDTO attacks are generic attacks that only have black-box access to the mixing algorithm and the output function of the KSG. The attacks assume that these components are ideally designed. This means in particular that no internals of the underlying components can be exploited by the adversary. The goal of TMDTO attacks is to discover weaknesses in how these components interact to produce the keystream.
TMDTO attacks are often divided into a precomputation phase and an online phase. In the precomputation phase some helping data structure is computed. In the online phase the attack is executed based on the available keystream and the helping data structure. Four cost dimensions are relevant to these attacks:
• The amount of keystream available D in the online phase.
• The time consumption T of the online phase.
• The time consumption P of the precomputation phase.
• The total memory consumption M of precomputation and online phase.
The costs are expressed in a so-called tradeoff curve. It consists of all 4-tuples $(T, M, D, P)$ of cost values that allow for a successful attack with high probability. For attacks without a precomputation phase, the cost dimension $P$ is not considered.
The first TMDTO attacks against KSG-based stream ciphers go back to Babbage [Bab95] and Golić [Gol96]. They yield the tradeoff curve $T \cdot D = 2^{\ell_s}$. In particular, for $T = D = 2^{\ell_s/2}$, this caps the security of stream ciphers at the birthday bound. We describe the idea of these attacks in Subsection 4.1.
Biryukov and Shamir [BS00] combined the attacks of Babbage and Golić with Hellman's attack on block ciphers [Hel80]. This yields the tradeoff curve $T \cdot M^2 \cdot D^2 = 2^{2\ell_s}$ with $P = 2^{\ell_s}/D$. In [BS00], Biryukov and Shamir also discuss a technique called BSW-sampling, which was originally used by Biryukov, Shamir, and Wagner in [BSW01] to attack the GSM cipher A5/1. While BSW-sampling makes it possible to relax the restriction $T \ge D^2$ in the above attack, the tradeoff curve $T \cdot M^2 \cdot D^2 = 2^{2\ell_s}$ and the relation $P = 2^{\ell_s}/D$ remain unchanged. Hence, if one considers precomputation to be part of the overall attack complexity (as we do), even the use of BSW-sampling does not allow for attacks with overall complexity lower than $2^{\ell_s/2}$. Moreover, the applicability of BSW-sampling is highly cipher specific (see [BS00] for further details) and, thus, corresponding TMDTO attacks are not fully generic.
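Points on the two tradeoff curves are easy to evaluate when all costs are expressed as base-2 logarithms. The sketch below uses hypothetical helper names and, as an example, the 160-bit state of Grain v1 mentioned earlier. The last helper illustrates why counting precomputation keeps the overall cost at or above $2^{\ell_s/2}$: the larger of $D$ and $P = 2^{\ell_s}/D$ can never drop below that value.

```python
def babbage_golic_log2_time(state_bits, log2_data):
    """log2(T) on the curve T * D = 2^l_s."""
    return state_bits - log2_data

def biryukov_shamir_log2(state_bits, log2_memory, log2_data):
    """Return (log2 T, log2 P) on T * M^2 * D^2 = 2^(2 l_s), P = 2^l_s / D."""
    log2_time = 2 * state_bits - 2 * log2_memory - 2 * log2_data
    if log2_time < 2 * log2_data:
        raise ValueError("point violates the restriction T >= D^2")
    return log2_time, state_bits - log2_data

def best_overall_log2(state_bits):
    """Smallest achievable max(log2 D, log2 P) over integer log2 D."""
    return min(max(d, state_bits - d) for d in range(state_bits + 1))

bg_time = babbage_golic_log2_time(160, 80)        # D = 2^80
bs_time, bs_precomp = biryukov_shamir_log2(160, 60, 40)
overall = best_overall_log2(160)
```

For the example values the Biryukov-Shamir point has $T = P = 2^{120}$ with $M = 2^{60}$ and $D = 2^{40}$, so the low online data requirement is paid for in precomputation.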
In [HS05], Hong and Sarkar consider the TMDTO case of sampling pairs of keys and IVs instead of sampling from the space of internal states. In a single-key scenario, as treated in our analysis, this approach has an overall complexity (including precomputation) at least as large as the complexity of exhaustive key search. Only in scenarios with multiple keys, where the attacker's goal is to discover one of these keys, can a lower overall complexity be achieved. Consequently, neither of the results in [BS00] and [HS05] conflicts with our security bounds for CKEY and CIVK.
In the remainder of this section we first sketch the original attacks by Babbage and Golić. Then we show how to modify and apply these attacks to the CIVK construction, in order to derive a corresponding upper bound for security against TMDTO attacks. Together with the results presented in Subsubsection 5.2.4 this will yield a tight bound for CIVK.

The TMDTO Attack of Babbage and Golić
Suppose that the attacker knows a set $S$ of $D$ keystream blocks of length $\ell_s$. These blocks originate from one session with the secret session key $k$, but are allowed to stem from different keystream packets. Let $R = \{q_1, \ldots, q_D\}$ denote the set of corresponding internal states. The attacker generates a set of $T$ internal state/keystream block pairs $(y, S_{\le \ell_s}(y))$ for randomly chosen internal states $y \in Q$. If $D \cdot T \approx 2^{\ell_s}$, there will be a collision between the observed keystream and the generated keystream with high probability according to the birthday paradox. In particular, for $D = T = 2^{\ell_s/2}$ the security is capped at the birthday bound. As a result, the attacker knows the internal state $q_j$ corresponding to one keystream block of a packet generated with respect to a known initial value $x$. This allows the adversary to compute the entire keystream packet corresponding to $k$ and $x$ as well as the initial state $q_{\mathrm{init}}$ for this packet. Knowing an internal state, the adversary can simply compute the following internal states using the known state update function and use these states as inputs to the output function to generate the remaining keystream. Moreover, for Trivium, Grain v1, and many other ciphers, it is possible to efficiently recover the secret key $k$ from $q_{\mathrm{init}}$.
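The collision search can be tried out on a toy generator with a 16-bit state. The sketch below makes several assumptions for illustration: the state update is a full-period linear congruential step, the output function is an arbitrary bit mix, all observed blocks come from one continuous keystream, and the sample sizes are chosen so that $D \cdot T = 64 \cdot 2^{\ell_s}$, well above the birthday threshold, to make a hit practically certain. Candidates are confirmed against 48 keystream bits to rule out chance matches on the 16-bit block.

```python
import random

L_S = 16                              # toy internal state length
MASK = (1 << L_S) - 1

def pi(q):
    """Full-period LCG step modulo 2^16, hence a bijection on states."""
    return (25173 * q + 13849) & MASK

def f(q):
    """Toy output bit."""
    return ((q >> 15) ^ (q >> 12) ^ ((q >> 10) & (q >> 7)) ^ (q >> 4)) & 1

def keystream(q, n):
    out = []
    for _ in range(n):
        out.append(f(q))
        q = pi(q)
    return tuple(out)

def babbage_golic(observed, num_blocks, num_guesses, rng, check_len=48):
    """Return (position j, state at position j) or None if no collision."""
    # Table from keystream block to the guessed states that produce it.
    table = {}
    for _ in range(num_guesses):
        y = rng.randrange(1 << L_S)
        table.setdefault(keystream(y, L_S), []).append(y)
    # Slide over the observed keystream and look up each block.
    for j in range(num_blocks):
        for y in table.get(observed[j:j + L_S], []):
            if keystream(y, check_len) == observed[j:j + check_len]:
                return j, y
    return None

rng = random.Random(2022)
secret_state = rng.randrange(1 << L_S)
D, T = 1024, 4096
observed = keystream(secret_state, D + 48)
hit = babbage_golic(observed, D, T, rng)
```

Once a state is recovered, the attacker can continue the keystream from position `j` on its own, which is exactly the state recovery described above.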

TMDTO Attacks against CIVK
The CIVK construction refers to a cipher working in packet mode. The packet length is $\ell_p$. In CIVK the IV $x$ and a prefix of length $\log_2(\ell_p)$ of the secret session key $k$ are continuously employed during mixing and keystream generation. Thus the IV and the key prefix become the non-volatile part of the cipher's internal state. The secret key is loaded into the volatile part of the internal state.
Note that there are two ways of applying the Babbage-Golić TMDTO attack to this cipher. The first approach is to mount the attack in its original form, which does not take the special structure of the internal states into account. This attack has the tradeoff curve $T \cdot D = 2^{\ell_s} = 2^{\ell_{nv} + \ell_v}$. Let $\ell_{IV}$ be the length of the IV $x$. The maximum amount of data $D$ that can be obtained is $\ell_p \cdot 2^{\ell_{IV}} = 2^{\log_2(\ell_p) + \ell_{IV}} = 2^{\ell_{nv}}$, as at most $\ell_p$ bits per IV $x$ are produced. For this maximal $D$ the tradeoff curve yields a time complexity of $T = 2^{\ell_v} = 2^{\ell_k}$.
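As a worked instance of the first approach, the following plugs in the Draco parameters stated in the introduction (128-bit key, 96-bit IV, packet length $2^{32}$, hence $\ell_{nv} = \ell_v = 128$):

```latex
\begin{align*}
D_{\max} &= \ell_p \cdot 2^{\ell_{IV}} = 2^{32} \cdot 2^{96} = 2^{128} = 2^{\ell_{nv}},\\
T &= \frac{2^{\ell_{nv} + \ell_v}}{D_{\max}} = \frac{2^{256}}{2^{128}} = 2^{128} = 2^{\ell_k}.
\end{align*}
```

Even with all keystream that a single key can ever produce, the original attack therefore costs as much as exhaustive key search.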
The second approach is to make use of the fact that the IVs for the keystream packets are known by the adversary. We assume that the adversary obtains $p$ keystream packets (of length $\ell_p$) corresponding to the initial values $x_1$ to $x_p$. This corresponds to a data complexity of $D = p \cdot \ell_p$. For each $x_i$ the attacker generates $s$ times a random key prefix $k^{\mathrm{pre}}_i$ and a random volatile internal state $z_i \in \{0, 1\}^{\ell_v}$. From $x_i$, $k^{\mathrm{pre}}_i$ and $z_i$ the attacker computes an output keystream block $S_{\le \ell_s}(x_i, k^{\mathrm{pre}}_i, z_i)$ of length $\ell_s$.
A collision in the volatile internal state occurs with high probability if $D \cdot s = 2^{\ell_v}$. Additionally, the adversary needs a correct non-volatile internal state. The IV is known to the attacker and the key prefix is guessed correctly with a probability of $\ell_p^{-1}$. Hence, the probability that one out of the $s$ generated key prefixes per IV $x_i$ is correct is $s/\ell_p$. We obtain that an attack is successful if $p \cdot s = 2^{\ell_v}$. Note that regardless of how many keystream packets are observed, the attacker always has a time complexity of $T = p \cdot s = 2^{\ell_v}$. In this respect, it would be ideal to attack only one keystream packet. Also note that, assuming a chosen-IV attacker, all $p \cdot s = 2^{\ell_k}$ keystream packets can be precomputed and then need to be stored in an efficiently searchable data structure, e.g. a binary search tree. This requires a time complexity of $2^{\ell_k}$ in the offline phase and a space (memory) complexity of $2^{\ell_k}$. The online phase, when observing a keystream packet, then has a time complexity of $\log_2(2^{\ell_k}) = \ell_k$ to search for the collision among the previously computed keystream packets. While the online time complexity is significantly reduced, an excessive amount of precomputation time is necessary and the space complexity in the online phase is $2^{\ell_k}$. Even if a time complexity of $2^{\ell_k}$ were not excessive, a space complexity of $2^{\ell_k}$ may very well be.

Proof of Security
In the following, we will present the preliminaries, notation and random oracle model necessary for the proof of security.

Random Oracle Model
An adversary will be interacting with a set of three oracles in one of two worlds: the real world or the ideal world. There will be an oracle P for the mixing function, an oracle F for the output function and an oracle E for the construction function. The adversary can query the oracles with the inputs of the respective functions and it will receive answers from the oracle. These query-answer pairs are collected by the adversary in a transcript.
The P -and F -oracles will answer their queries using ideal randomized primitives in either world: The P -oracle will use a random permutation P (more specifically: multiple random permutations as described in Subsubsection 5.1.3) and the F -oracle will use a random function F. In the real world, the E-oracle uses P and F as underlying building blocks. In the ideal world, the E-oracle will have access to another independent random function E. By assuming the underlying building blocks to be ideal one can abstract from possible weaknesses in the mixing function and the output function that an implementation may have and show that the scheme, the interaction of those building blocks, is secure. This will not prove an instantiation to be secure but provide a plausible justification for the structure of a cipher built upon CIVK, in this case Draco.
In the ideal world, E, corresponding to the encryption function, will sample the output bits uniformly at random from {0, 1}. We will show that the adversary cannot distinguish the ideal world from the real world in this scenario. In particular, this will show that the keystream generated by the "real" encryption function is indistinguishable from a truly random bitstream.
We will first describe the proof technique, then the distinguishing game, and then explain in detail what the oracle queries and the adversary's transcript look like.

H-coefficient Technique
To prove the security of CIVK we will use the H-coefficients technique. The H-coefficients technique [Pat08] is a proof method due to Patarin, where we consider the variant by Chen and Steinberger [CS14]. The results of the interaction of an adversary A with its oracles are collected in a transcript $\tau$. The oracles can sample randomness prior to the interaction (often a key or an ideal primitive that is sampled beforehand), and are then deterministic throughout the experiment [CS14]. The task of A is to distinguish the real world $O_{real}$ from the ideal world $O_{ideal}$. Let $\Theta_{real}$ and $\Theta_{ideal}$ denote the distribution of transcripts in the real and the ideal world, respectively. A transcript $\tau$ is called attainable if the probability to obtain $\tau$ in the ideal world, i.e. over $\Theta_{ideal}$, is non-zero. Then, the fundamental Lemma of the H-coefficients technique, the proof of which is given in [CS14,Pat08], states:
Lemma 1 (Fundamental Lemma of the H-coefficient Technique [Pat08]). Assume that the set of attainable transcripts can be partitioned into two disjoint sets GoodT and BadT. Further assume that there exist $\epsilon_1, \epsilon_2 \ge 0$ such that for any transcript $\tau \in$ GoodT, it holds that
$$\frac{\Pr[\Theta_{real} = \tau]}{\Pr[\Theta_{ideal} = \tau]} \ge 1 - \epsilon_1 \quad \text{and} \quad \Pr[\Theta_{ideal} \in \mathrm{BadT}] \le \epsilon_2.$$
Then, for all adversaries A, it holds that A's distinguishing advantage satisfies $\Delta_A(O_{real}; O_{ideal}) \le \epsilon_1 + \epsilon_2$.

The Distinguishing Game
At the beginning of the adversary's interaction with the oracles, a key $k \xleftarrow{\$} \mathcal{K}$ will be sampled uniformly at random from the key space $\mathcal{K}$. Next, the adversary poses its questions to the P-, F- and E-oracles with the additional limit of at most $\ell_p$ E-queries per IV $x$. All of the query-answer pairs will be collected in the corresponding transcripts $\tau_P$, $\tau_F$ and $\tau_E$. When the adversary is finished with its interaction with the oracles, it is given the secret key $k$ as well as the transcript $\tau_\alpha$, which contains the intermediate values generated during an E-query and will be defined more explicitly in Subsubsection 5.1.4. Based on the transcript $\tau = (\tau_P, \tau_E, \tau_F, \tau_\alpha, k)$, the adversary has to decide whether it was interacting with the real world or the ideal world and output a decision bit. If the adversary's guess is correct, it wins the game. Our task is to upper bound the adversary's success probability.

Oracle Queries
The adversary will be given access to the P-, F- and E-oracles that correspond to the mixing function, output function and construction function, respectively. For clarity, we provide a table showing what the inputs, as chosen by the adversary, and the outputs of the respective queries look like, and then we will explain how the oracles are implemented. Note that we index the variables with the query type. In particular, $k_P$ and $k_F$ denote key guesses and need not be equal to the actual key $k$. We chose this notation to make it clear that when using $k_P$, we refer to the variable that is used in place of the actual key $k$ in a P-query.

P-oracle.
The P-oracle that corresponds to the mixing function will be implemented using random permutations. Note that the mixing function p keeps the non-volatile internal state unchanged and only the volatile internal state gets permuted. Hence, for every $(x, k^{pre}) \in Q_{nv}$ the volatile internal state is permuted using an independent random permutation $P_{x,k^{pre}} : Q_v \to Q_v$. For each $P_{x,k^{pre}}$ we will use lazy sampling: As the first P-query $\langle x_P, k^{pre}_P \mid k_P \rangle$ arrives, the oracle will sample the answer uniformly at random from all $2^{\ell_v}$ possible answers. As the second P-query arrives, the oracle will sample the answer uniformly at random from all $2^{\ell_v} - 1$ remaining possible answers, and so on. The permutation P is defined as follows:
$$P \langle x_P, k^{pre}_P \mid k_P \rangle := \langle x_P, k^{pre}_P \mid P_{x_P, k^{pre}_P}(k_P) \rangle.$$
The adversary may query the P-oracle in either the forward or the backward direction.
Remark 1. In either world it is possible to choose k pre P and k P such that k pre P is not a prefix of k P . As the mixing function is bijective, the corresponding internal state ⟨x P , k pre P | y P ⟩ will be invalid, i.e. it will not occur during an actual encryption using CIVK. These states cannot be used to obtain a distinguishing event. We will ignore queries of this type as they yield no advantage to the adversary.

F-oracle.
The F-oracle that corresponds to the output function will be implemented using a random function $F : Q_{nv} \times Q_v \to \{0, 1\}$. Only queries in the forward direction are allowed. We again use lazy sampling; as an F-query arrives, the output is sampled uniformly at random from {0, 1}.
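The lazy sampling used by the P- and F-oracles can be sketched as follows. This is a minimal illustration under assumptions, not part of the proof: the class and method names are hypothetical, states are modelled as plain integers, and Python's `secrets` module stands in for the ideal source of randomness.

```python
import secrets


class LazyPermutationFamily:
    """Independent random permutations of {0,1}^l_v, one per pair (x, k_pre)."""

    def __init__(self, l_v: int):
        self.l_v = l_v
        self.fwd = {}  # (x, k_pre) -> {input: output}
        self.bwd = {}  # (x, k_pre) -> {output: input}

    def _fresh(self, used: dict) -> int:
        # Rejection sampling: uniform over all values not assigned so far.
        while True:
            v = secrets.randbits(self.l_v)
            if v not in used:
                return v

    def forward(self, x: int, k_pre: int, y: int) -> int:
        f = self.fwd.setdefault((x, k_pre), {})
        b = self.bwd.setdefault((x, k_pre), {})
        if y not in f:
            out = self._fresh(b)  # pick an output that is still unused
            f[y], b[out] = out, y
        return f[y]

    def backward(self, x: int, k_pre: int, out: int) -> int:
        f = self.fwd.setdefault((x, k_pre), {})
        b = self.bwd.setdefault((x, k_pre), {})
        if out not in b:
            y = self._fresh(f)    # pick an input that is still unused
            f[y], b[out] = out, y
        return b[out]


class LazyRandomFunction:
    """Random function from full internal states to a single output bit."""

    def __init__(self):
        self.table = {}

    def query(self, x: int, k_pre: int, y: int) -> int:
        key = (x, k_pre, y)
        if key not in self.table:
            self.table[key] = secrets.randbits(1)
        return self.table[key]
```

Keeping a forward and a backward table per pair `(x, k_pre)` is what allows queries in both directions while the sampled values remain consistent with a single permutation. The random function needs only one table because it is queried in the forward direction only.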

E-oracle.
The E-oracle is defined differently in the real world and in the ideal world. In the real world it is defined similarly to the construction function e and implicitly uses the secret key $k$, the random permutation P, the state update function $\pi$ and the random function F:
$$E(x_E, r_E) := F(\pi^{r_E}(P \langle x_E, k^{pre} \mid k \rangle)).$$
In the ideal world, E is a random function with outputs sampled uniformly at random from {0, 1}.

Transcripts
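All of the adversary's queries to the oracles and the corresponding answers will be collected in a transcript $\tau$. In particular, we will keep separate transcripts $\tau_P$, $\tau_F$ and $\tau_E$ for each of the corresponding oracles P, F and E, defined as follows:
$\tau_P := \{(\langle x_P, k^{pre}_P \mid k_P \rangle, \langle x_P, k^{pre}_P \mid y_P \rangle) : \langle x_P, k^{pre}_P \mid k_P \rangle$ is a P-query and $\langle x_P, k^{pre}_P \mid y_P \rangle$ its answer$\}$,
$\tau_F := \{(\langle x_F, k^{pre}_F \mid y_F \rangle, z_F) : \langle x_F, k^{pre}_F \mid y_F \rangle$ is an F-query and $z_F$ its answer$\}$,
$\tau_E := \{((x_E, r_E), z_E) : (x_E, r_E)$ is an E-query and $z_E$ its answer$\}$.
The transcripts mentioned above are visible to the adversary as it makes its queries to the oracles. Once the adversary's interaction with the oracles is finished, it will additionally be given the secret key $k$ and the transcript $\tau_\alpha$ defined as follows:
$\tau_\alpha := \{((x_E, r_E), (\alpha_E, \alpha^{r_E}_E)) : (x_E, r_E)$ is an E-query and $(\alpha_E, \alpha^{r_E}_E)$ its α-values$\}$.
The α-values for each E-query $(x_E, r_E)$ are defined as follows:
$$\alpha_E := P \langle x_E, k^{pre} \mid k \rangle, \qquad \alpha^{r_E}_E := \pi^{r_E}(\alpha_E).$$
The α-values correspond to the internal states that are generated during an E-query. $\alpha_E$ represents the initial state and $\alpha^{r_E}_E$ represents the input to the output function F. We will also sample these values in the ideal world, even though the construction function E does not depend on them.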

Security Analysis
We will now perform the security analysis of CIVK. In particular, for the CIVK construction we will show the following:
From an attacker's point of view, the goal is to recover an internal state by obtaining a collision between an observed keystream and a generated keystream. If the keystreams collide, there is a high probability that they were generated from the same internal state. In particular, in the real world, the same internal state always generates the same keystream.
In our model, the adversary does not have to rely on observing collisions in the keystream as it is provided the α-values at the end of the interaction. The α-values correspond to the internal states of the respective output bit. Hence, the adversary's goal will be to obtain a collision between the α-values and the inputs of the F -queries.
If there occurs a collision between an α-value (corresponding to an E-query) and an F -query the corresponding outputs of the E-query and the F -query will always be identical in the real world, as E internally uses P, π and F. In the ideal world however, E is another random function independent of P, π and F and hence colliding α-values and F -query inputs only lead to the same output bit with a probability of 1 2 . This will be used as a distinguishing, i.e. bad, event.

Bad Events
Note that the IV $x$ is chosen by the adversary. There are two strategies for the adversary to obtain a collision between the α-values and an F-query input:
1. Guess the key prefix $k^{pre}$ as well as a volatile internal state $y$ correctly and ask the corresponding F-query $F \langle x, k^{pre} \mid y \rangle$.
2. Guess the key $k$ correctly, ask the corresponding P-query $P \langle x, k^{pre} \mid k \rangle$ to obtain $\langle x, k^{pre} \mid y \rangle$ and ask the corresponding F-query $F \langle x, k^{pre} \mid y \rangle$.
If the outputs of the F -query and the output of the E-query corresponding to the colliding α-value differ, the adversary surely is in the ideal world. We will introduce two bad events that correspond to the two strategies mentioned above.
bad 1 (Guessing the key prefix and an internal state) There exists an E-query (x E , r E ) and an F -query ⟨x F , k pre F | y F ⟩ such that bad 2 (Guessing the key) There exists a P -query ⟨x P , k pre P | k P ⟩, an E-query (x P , r E ) and an F -query π r E (P ⟨x P , k pre P | k P ⟩) such that Remark 2. Note that bad 2 represents a special case of bad 1 . In particular we have that Considering these as separate bad events will simplify our analysis.

Bounding the Bad Events
Lemma 3.
Proof. Since we are in the ideal world, the answers to the E-, F -and P -queries are independent of the secret key (k pre , k). For simplicity, we will sample the secret key (k pre , k) after the adversary's interaction with the oracles. Also, we will sample the values α E for each query after the adversary's interaction with the oracles. bad 2 : We will first consider the event bad 2 . There are at most q P -queries with at most q distinct (k pre P , k P ). The secret key (k pre , k) is sampled independently at random with regard to the uniform distribution from the set {0, 1} ℓ k . The probability of a collision with the secret key (k pre , k) can therefore be upper bounded by q/2 ℓ k .
The outputs of the E-and F -queries collide with a probability of 1/2. We obtain: bad 1 : We will now consider the event bad 1 . Since we already bounded the probability for bad 2 , we will now consider Pr[bad 1 |¬bad 2 ]. We decided to sample α E after the adversary's interaction with the oracles.
By conditioning on ¬bad 2 , we know that for all x ∈ IV no value P ⟨x, k pre | k⟩ has yet been sampled. This applies since there was no adversarial P -query with the correct key. This implies, as we sample the α-values after the adversary's interaction with the oracle, no α-value has yet been sampled.
To obtain a collision between an F -query and an α-value, three conditions need to apply: 1. The key prefix k pre needs to be guessed correctly.
2. The F -query and the E-query, resp. α-value, need to utilize the same IV x.
3. A collision in the volatile state needs to occur.
We will individually bound the conditions above. Fix any F-query $F \langle x_F, k^{pre}_F \mid y_F \rangle$.
1. As we sample the key $k$ after the adversary's interaction with the oracles, a key prefix collision occurs with a probability of $2^{-\log_2(\ell_p)} = \ell_p^{-1}$.
2. There are at most $\ell_p$ E-queries, resp. α-values, to collide with, as the E-queries are limited to $\ell_p$ queries per IV $x \in$ IV.
As there are at most q queries we obtain Note that there could be P -queries utilizing x E and k pre with k P ̸ = k and therefore we need to upper bound the amount of queries above by q. We obtain that the probability of a single fixed F -query to collide with an α-value is upper bounded by: The outputs of the E-and F -queries collide with a probability of 1/2. Summing up over at most q F -queries we obtain: Lemma 3 follows from the union bound of the above individual bad events. Note that ℓ k = ℓ v .

Good Transcripts
We upper bounded the occurrence of a distinguishing event in the ideal world in the previous section. In this section it remains to show that in the absence of a distinguishing event, the ideal construction is indistinguishable from the real construction: Lemma 4. For any good transcript τ ∈ GoodT, it holds that Proof. In either world, the permutation P and the secret key k are sampled uniformly at random and are therefore trivially indistinguishable. We still need to argue about the answers to the E-and the F -queries. We define A as the set of all α r -values and F as the set of all F -query inputs that are contained in the transcript τ . Formally A is defined as: As collisions between E-and F -queries do not occur in the good transcripts, we know that A ∩ F = ∅. Therefore we can conclude that for all α r ∈ A, for all y ∈ F and for all z, z ′ ∈ {0, 1}: The above holds as all z, z ′ are independent random variables sampled uniformly from {0, 1}. In particular, in either world, we evaluate F and E on different inputs only. The outputs are then sampled by a random function.

Security Bounds
We obtain Lemma 2 from Lemma 1, Lemma 3 and Lemma 4. Ultimately we could verify that the CIVK construction reaches a security level equal to the key length ℓ k .

Design Specification of DRACO
The design of Draco is similar to that of Lizard [HKM17b], which was in turn inspired by the Grain family [HJMM08] of stream ciphers. In particular, the inner state of Draco is distributed over two interconnected feedback shift registers (FSRs) as depicted in Fig. 1. Note, however, that while Grain uses one linear feedback shift register (LFSR) and one nonlinear feedback shift register (NFSR), which, moreover, are of the same length, Draco (like Lizard) uses two NFSRs of different lengths instead. The reasons for this design choice will be explained in Section 7. Like in Grain, the third important building block besides the two FSRs is a nonlinear output function, which takes inputs from both shift registers and is also used as part of the state initialization algorithm. A major difference to Grain (and also Lizard) is that Draco continuously uses the IV and (parts of) the key not only during state initialization but also during keystream generation.
In the following, we describe the components of the cipher in detail (Sec. 6.1) and specify how it is operated during state initialization (Sec. 6.2) and keystream generation (Sec. 6.3). For the sake of clarity, subsections 6.1-6.3 contain only the technical aspects of Draco. Explanations of important design choices for our construction are given separately in Section 7, along with a discussion of the security properties of the particular components (e.g., the algebraic properties of the feedback functions).

NFSR1
Draco's NFSR1 replaces the LFSR of the Grain family of stream ciphers. NFSR1 is 33 bits wide and corresponds (with a slight adaption, see below) to the NFSR $A_{12}$ of the eSTREAM Phase 2 (hardware portfolio) candidate ACHTERBAHN-128/80 [GGK06]. For all starting states, it has a guaranteed period of $2^{33}$ (i.e., truly maximal) and can be specified by the following update relation (during keystream generation), defining $f_1$ depicted in Fig. 1:
Note that NFSR1 of Draco differs from NFSR $A_{12}$ of ACHTERBAHN-128/80 only in the additional final term $\oplus\, \neg S^t_1 \neg S^t_2 \cdots \neg S^t_{30} \neg S^t_{31} \neg S^t_{32}$, which turns the period $2^{33} - 1$ of $A_{12}$ into the truly maximal period $2^{33}$ for NFSR1. That is, NFSR1 of Draco cannot get stuck in the all-zero state.

NFSR2
NFSR2 is 95 bits wide and uses a modified version of $g$ from Grain-128a [ÅHJM11] as its feedback polynomial. More precisely, $f_2$ of NFSR2 (cf. Fig. 1) shifts six taps of $g$ by two positions to the left in order to fit a 95-bit register (i.e., tap shifts 86 ← 88, 89 ← 91, 90 ← 92, 91 ← 93, 93 ← 95, 94 ← 96). Moreover, four quadratic monomials and one degree-three monomial are added to further strengthen Draco, inter alia, against algebraic attacks. This results in the following update relation (during keystream generation):
Note that this update relation for $B^{t+1}_{94}$ additionally contains the masking bit $S^t_0$ from NFSR1 (analogously to the Grain family) as well as the KIS bit $d_t$ (unlike the Grain family).

State Initialization
The state initialization process can be divided into 2 phases and is similar to the classical Grain-type initialization.

Phase 1: Key Loading
The registers of the keystream generator are initialized as follows:

Phase 2: Grain-like Mixing
Clock the cipher 512 times without producing actual keystream. Instead, at time t = 0, . . . , 511, the output bit z t is fed back into both FSRs as depicted in Fig. 2. To avoid ambiguity, we now give the full update relations that will be used for NFSR2 and NFSR1 in phase 2. For t = 0, . . . , 511, compute where z t and d t are computed as described in Subsection 6.1.4.

Keystream Generation
At the end of the state initialization, the 33-bit (initial) state of NFSR1 is $S^{512}_0, \ldots, S^{512}_{32}$ and the 95-bit (initial) state of NFSR2 is $B^{512}_0, \ldots, B^{512}_{94}$. The first keystream bit that is used for plaintext encryption is $z_{512}$. For $t \ge 512$, the states $S^{t+1}_0, \ldots, S^{t+1}_{32}$ and $B^{t+1}_0, \ldots, B^{t+1}_{94}$ and the output bit $z_t$ are computed using the relations given in Subsection 6.1. Fig. 1 depicts the structure of Draco during keystream generation.
As Draco is designed to be operated in packet mode, the maximum size of a plaintext packet encrypted under the same key/IV pair is $2^{32}$ bits and no key/IV pair may be used more than once, i.e., for more than one packet. Let $X = x_0, \ldots, x_{|X|-1}$ denote such a plaintext packet and let $z_{512}, z_{513}, \ldots$ be the keystream generated for it as described before. Then the corresponding ciphertext packet $Y = y_0, \ldots, y_{|X|-1}$ can be produced via $y_i := x_i \oplus z_{i+512}$, $i = 0, \ldots, |X| - 1$. Decryption (given that the secret session key and the public IV are known) works analogously.
Note that, though we use the terms plaintext/ciphertext packet here, Draco is really a (synchronous) stream cipher. I.e., the keystream bits z 512 , z 513 , . . . are generated in a bitwise fashion (and independently of the plaintext/ciphertext) and, consequently, the individual plaintext bits x i can be encrypted and then (in the form of y i ) transmitted as they arrive. The same obviously holds for the decryption of the ciphertext bits y i .
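The bitwise packet encryption described above can be sketched as follows. The keystream source `toy_keystream` is a hypothetical, insecure stand-in that merely plays the role of $z_{512}, z_{513}, \ldots$; it is not the Draco keystream generator. The function name `xor_packet` is likewise an assumption made for illustration.

```python
from itertools import islice
from typing import Iterator, List

MAX_PACKET_BITS = 2**32  # packet-mode limit per key/IV pair


def toy_keystream(seed: int) -> Iterator[int]:
    """Hypothetical stand-in yielding z_512, z_513, ... (NOT the Draco generator)."""
    state = seed & 0xFFFFFFFF
    while True:
        # Simple linear congruential generator; insecure, illustration only.
        state = (1103515245 * state + 12345) & 0xFFFFFFFF
        yield (state >> 16) & 1


def xor_packet(bits: List[int], keystream: Iterator[int]) -> List[int]:
    """Compute y_i = x_i XOR z_{i+512}; the same routine also decrypts."""
    if len(bits) > MAX_PACKET_BITS:
        raise ValueError("packet exceeds 2^32 bits for one key/IV pair")
    return [x ^ z for x, z in zip(bits, islice(keystream, len(bits)))]


if __name__ == "__main__":
    plaintext = [1, 0, 1, 1, 0, 0, 1, 0]
    ciphertext = xor_packet(plaintext, toy_keystream(seed=2024))
    recovered = xor_packet(ciphertext, toy_keystream(seed=2024))
    print(ciphertext, recovered == plaintext)
```

Because the keystream does not depend on the plaintext or the ciphertext, both sides can restart the same generator and process the bits one at a time as they arrive.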

Design Considerations
In this section, we provide additional explanations w.r.t. our design, which were omitted in Section 6 for the sake of clarity. As Draco has a Grain-like structure, we particularly focus on respective similarities and differences. Based on several of the following properties, we will then argue in Section 8 why we believe that Draco resists the currently known types of attacks against stream ciphers.

The Key-IV Schedule
The Key-IV Schedule defined in Subsection 6.1 is designed in such a way that for all t has only one solution: u 0 = · · · = u 31 = v 0 = · · · = v 96 = 0.
This system corresponds to the situation that $t$ is chosen in such a way that the constant 0 is added at position $t + r$, which is the case if $r \equiv 97 - t \bmod 97$. Moreover, $(u_0, \ldots, u_{31})$ is obtained from $(K_0, \ldots, K_{31})$ by some cyclic shift, and $(v_0, \ldots, v_{96})$ is obtained from $(x_0, \ldots, x_{96})$ by a cyclic right shift by $r$ positions.
Note that the bijectivity of D t makes Draco immune against the following type of chosen-IV TMDTO-attacks, which we call Zero d-stream attacks.

Zero d-stream Attacks
Let K(0) ⊆ {0, 1} 32 be the set of all key prefixes k pre for which there is some initial value 4. In the case that a collision D = Z ⃗ 0 (S) was found, compute k * from S.
Note that if $k^* \in K(0)$ then, in step 3, the attacker knows the packet $P(k^*, IV^*)$ for some $IV^* \in IV(0)$. This packet contains $2^{32}$ keystream blocks $Z_{\vec{0}}(S')$ for the internal state $S' \in \{0, 1\}^{128}$. By the birthday paradox, there will be a collision with $D$ which yields the secret key. If $k^* \notin K(0)$ then the attack fails. Consequently, the success probability of the attack is around $|K(0)| \cdot 2^{-32}$, while the cost is at least $2^{96}$.
Due to the fact that $D_t$ is bijective for all $t \ge 0$ (see Subsection 7.1), we have that $|K(0)| = |IV(0)| = 1$, which implies that the Zero d-stream attack against Draco does not beat exhaustive key search.

NFSR1
NFSR1 is 33 bits wide and replaces the maximum-length LFSR of the Grain family. Especially in view of the halved size of the cipher's volatile inner state (128 bits for Draco vs. 256 bits for Grain-128a), employing an NFSR is favorable here in order to strengthen the design against algebraic and fast correlation attacks such as [TIM+18]. Unfortunately, not much is known yet about how to generically construct large, cryptographically strong NFSRs with maximal period. For FSR sizes up to 30-40 bits, however, corresponding non-linear feedback functions can be found experimentally. In [GGK06], the designers of the eSTREAM Phase 2 (hardware portfolio) candidate ACHTERBAHN-128/80 provide a list of 13 such NFSRs, ranging in size from 21 bits (NFSR $A_0$) to 33 bits (NFSR $A_{12}$). The latter is perfectly sufficient for our design as, due to the restriction to packet mode with a maximum of $2^{32}$ keystream bits per key/IV pair, Draco actually does not need as large guaranteed periods as the Grain family.
While in the original Grain family, an all-zero LFSR initial state is tolerable due to the FSRs' sizes (both of key length), this must strictly be avoided for small-state Grain-like stream ciphers. More precisely, for one in $2^{33}$ key-IV combinations, the Grain-like mixing employed by Draco during state initialization will result in an all-zero state of NFSR1. In this situation, NFSR $A_{12}$ of ACHTERBAHN would be stuck in the all-zero state, because (like a maximum-length LFSR of this size) it 'only' has period $2^{33} - 1$ for non-zero starting states. In contrast, Draco's NFSR1 yields truly maximal period $2^{33}$ for any starting state. This is achieved by adding the term $\oplus\, \neg S^t_1 \neg S^t_2 \cdots \neg S^t_{31} \neg S^t_{32}$ to ACHTERBAHN's NFSR $A_{12}$ as described in Section 6.1, thus simply 'gluing' the all-zero state into the original state cycle of NFSR $A_{12}$ between the states $(1, 0, \ldots, 0)$ and $(0, \ldots, 0, 1)$. Accordingly, we can assess the security of our NFSR1 on the basis of the following properties of $A_{12}$ given in [GGK06]: a nonlinearity of 114688, an order of correlation immunity of 6, and a diffusion parameter of 54. Moreover, it is easy to see that $A_{12}$'s feedback function is balanced and, thus (as it is 6th order correlation-immune), 6-resilient. For comparison, in Grain-128a the feedback function of the LFSR (which is replaced by NFSR1 in Draco) has resiliency 5. Finally, the algebraic degree of $A_{12}$'s feedback function is 4.
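The 'gluing' of the all-zero state can be illustrated on a toy scale. The following sketch does not use the feedback function of $A_{12}$; it assumes a 4-bit maximum-length LFSR (recurrence $s_{t+4} = s_t \oplus s_{t+1}$) as a small stand-in and adds the analogous term $\neg s_1 \neg s_2 \neg s_3$. All function names are hypothetical.

```python
from typing import Callable, Tuple

State = Tuple[int, int, int, int]


def lfsr_feedback(s: State) -> int:
    """Feedback of a 4-bit maximum-length LFSR (s_{t+4} = s_t XOR s_{t+1})."""
    return s[0] ^ s[1]


def glued_feedback(s: State) -> int:
    """Same feedback plus the term (NOT s1) AND (NOT s2) AND (NOT s3)."""
    extra = (1 ^ s[1]) & (1 ^ s[2]) & (1 ^ s[3])
    return s[0] ^ s[1] ^ extra


def step(s: State, feedback: Callable[[State], int]) -> State:
    """Shift by one position and append the feedback bit."""
    return (s[1], s[2], s[3], feedback(s))


def cycle_length(start: State, feedback: Callable[[State], int]) -> int:
    """Number of steps until the start state recurs."""
    s, n = step(start, feedback), 1
    while s != start:
        s, n = step(s, feedback), n + 1
    return n


if __name__ == "__main__":
    print(cycle_length((0, 0, 0, 1), lfsr_feedback))   # non-zero cycle
    print(cycle_length((0, 0, 0, 0), lfsr_feedback))   # stuck in all-zero state
    print(cycle_length((0, 0, 0, 0), glued_feedback))  # all 16 states on one cycle
    print(step((1, 0, 0, 0), glued_feedback), step((0, 0, 0, 0), glued_feedback))
```

The additional term is active only in the two states whose last three bits are zero, so it changes exactly two transitions: $(1, 0, 0, 0)$ now leads to $(0, 0, 0, 0)$, which in turn leads to $(0, 0, 0, 1)$. All other transitions of the original cycle are unchanged.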
A major reason for choosing $A_{12}$ from ACHTERBAHN as a basis for Draco's NFSR1 is also its hardware efficiency. Despite a comparatively large algebraic normal form, the designers of ACHTERBAHN are able to provide a compact hardware realization of $A_{12}$'s feedback function consuming only 31.75 gate equivalents (GE) and having logical depth three (see Sec. 9 for further details w.r.t. hardware complexity and an explanation of corresponding units of measure like GE).

NFSR2
NFSR2 is 95 bits wide and its feedback polynomial is a modified version of g from Grain-128a [ÅHJM11]. In contrast to NFSR1, the period of NFSR2 during keystream generation is unknown because even after state initialization, it is not operated in a self-contained manner. More precisely, due to the masking bit from NFSR1 and the KIS bit d t , NFSR2 is actually a filter instead of a real NFSR (cf. corresponding remark for the Grain family in [HJMM08]).
As described in Section 6.1, for $f_2$ of Draco we shifted six taps of $g$ from Grain-128a by two positions to the left in such a way that the property of $g$ that no tap appears more than once is preserved in $f_2$. Moreover, we added the following monomials to further strengthen Draco in the face of its smaller volatile inner state:
Note that the set of tap indices of these new nonlinear monomials is disjoint from the set of tap indices of all other monomials of $f_2$. In consequence, several important properties of $g$ from Grain-128a like its balancedness and its resiliency of 4 carry over to $f_2$. Similarly, the security of $f_2$ of Draco w.r.t. linear approximations can hence be lower bounded by that of Grain-128a ($2^{14}$ best linear approximations with bias $63 \cdot 2^{-15}$). As proven in [MJSC16], the aforementioned disjointness property w.r.t. the newly added taps implies that the nonlinearity of $f_2$ of Draco can be computed on the basis of the nonlinearity of $g$ from Grain-128a (267403264) and its number of variables (29) together with the nonlinearity of the sum of the new monomials (976) and its number of variables (11) as follows: $2^{11} \cdot 267403264 + 2^{29} \cdot 976 - 2 \cdot 267403264 \cdot 976 = 549656723456$ ($\approx 2^{39}$). As for $g$ from Grain-128a, the algebraic degree of $f_2$ of Draco is 4.
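The nonlinearity computation above can be checked with a few lines of Python. The helper name `direct_sum_nonlinearity` is hypothetical; the formula and all input values are the ones quoted in the text.

```python
from math import log2


def direct_sum_nonlinearity(nl_f: int, n_f: int, nl_g: int, n_g: int) -> int:
    """Nonlinearity of f(x) XOR g(y) for f, g on disjoint sets of variables.

    nl_f, nl_g: nonlinearities of f and g; n_f, n_g: their numbers of variables.
    """
    return (2**n_g) * nl_f + (2**n_f) * nl_g - 2 * nl_f * nl_g


if __name__ == "__main__":
    # g from Grain-128a: nonlinearity 267403264 on 29 variables.
    # Sum of the newly added monomials: nonlinearity 976 on 11 variables.
    nl_f2 = direct_sum_nonlinearity(267403264, 29, 976, 11)
    print(nl_f2, log2(nl_f2))
```

The result is a function on $29 + 11 = 40$ variables, which is why its nonlinearity is close to $2^{39}$.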
Beyond these properties, g has so far successfully thwarted all cube-like attacks against the initialization of Grain-128a. In Section 8.3, we derive corresponding security arguments for Draco.

Output Function a
An important question in FSR-based stream cipher design is how to share the load of ensuring security between the driving register(s) and the output function. To compensate for the fact that the volatile inner state of Draco is smaller than that of Grain-128a, we decided that the output function should have more inputs and larger algebraic degree instead. It builds on the construction scheme introduced in [MJSC16] as part of the FLIP family of stream ciphers. More precisely, Draco's output function a can be written as the direct sum of a linear function with seven monomials, a quadratic function with four monomials, a triangular function with seven monomials, a triangular function with three monomials, and another triangular function with four monomials, where each tap of NFSR1 and NFSR2 appears at most once in a.
As a consequence, the output function of Draco is defined over 59 variables, balanced, and has, according to lemmata 3-6 in [MJSC16], the following security properties: a nonlinearity of 287580136809693184 (≈ 2 57 ), a resiliency of 9, an algebraic immunity of at least 7, and a fast algebraic immunity of at least 8. The algebraic degree of a is 7.
If the content of NFSR1 at time t should be known to the attacker (e.g., as part of a guess-and-determine attack), the output function still depends on at least 44 variables and 'gracefully degrades' into the direct sum of a linear function with eight or nine (depending on the value S t 3 ) monomials, a quadratic function with four to six (depending on the values S t 20 and S t 6 S t 29 ) monomials, and a triangular function with seven monomials, which again conforms to the construction principle introduced in [MJSC16] and leads to the following worst-case security properties for that situation: a nonlinearity of 8634823344128 (≈ 2 42 ), a resiliency of 8, an algebraic immunity of at least 7, and a fast algebraic immunity of at least 8.
While the choice of tap positions for state update functions is often already restricted by the need to guarantee a certain period (e.g., as in the case of NFSR1), the choice of tap positions for an output function is commonly less substantiated. For example, the design documents introducing the members of the Grain family (cf. [HJM06,ÅHJM11,HJMM08]) mainly focus on the conceptual question whether certain taps used in the output function should be from the NFSR or the LFSR (and how many of each). The more concrete question of which tap positions within each FSR are actually chosen for the output function is almost exclusively discussed in the context of hardware acceleration or when it comes to mitigating issues of previous versions arising from actual attacks (e.g., the attack of Dinur and Shamir [DS11] on Grain-128, which led to a change of tap positions in the output function of Grain-128a [ÅHJM11]).
In the absence of canonical criteria for the selection of tap positions for Grain-like constructions, we mainly resort to the concept of (full) positive difference sets that was used by Golić in [Gol96] to assess the security of nonlinear filter generators consisting of a single LFSR and a nonlinear output function. Note that, e.g., the fast correlation attacks against Grain v1 and Grain-128a presented in [TIM+18] actually match this cipher model as they treat Grain's NFSR as (part of) a filter for its LFSR. For further details about Golić's findings and how they influenced Draco's output function, we refer the reader to Appendix B.
Our selection of tap positions for the output function of Draco is further motivated by attacks based on binary decision diagrams (BDDs). A direct consequence of this type of attack against stream ciphers, which was introduced by Krause in [Kra02] and applied to Grain-128 by Stegemann in [Ste07], is that (roughly speaking) the distance between the lowest and the highest tap index of a monomial should be large for as many monomials as possible (see Section 8.6 for further details).

Continuous Key and IV Usage
In this subsection, we provide further details about how the generic CIVK construction introduced in Subsection 3.2 is instantiated concretely through Draco. In particular, we argue about the choice of the different parameter lengths. With Draco we want to achieve a security level of 128 bits. Accordingly, the key length is chosen to be 128 bits.
As Draco is designed to operate in packet mode and a key/IV pair may be used to generate at most $2^{32}$ keystream bits, the key prefix length is $\log_2 2^{32} = 32$ bits. This also determines the length of the initial value: As the desired security level is $2^{128}$ and the packet length is $2^{32}$ bits, we need to set the length of the initial value to be $128 - 32 = 96$ bits. This corresponds to a total volatile internal state size of 128 bits and a total non-volatile internal state size of 128 bits.
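The parameter derivation above amounts to a few lines of arithmetic, sketched here in Python. The function name `civk_parameters` and the dictionary keys are hypothetical. The sketch assumes, as stated in the text, that the key prefix length is the base-2 logarithm of the maximum packet length and that the volatile state has the same length as the key.

```python
def civk_parameters(security_bits: int, max_packet_bits: int) -> dict:
    """Derive the CIVK lengths from the security level and the packet limit."""
    prefix_bits = max_packet_bits.bit_length() - 1   # log2 of the packet length
    if 2**prefix_bits != max_packet_bits:
        raise ValueError("packet length is assumed to be a power of two")
    iv_bits = security_bits - prefix_bits
    return {
        "key_bits": security_bits,
        "key_prefix_bits": prefix_bits,
        "iv_bits": iv_bits,
        "volatile_state_bits": security_bits,            # l_v = l_k
        "non_volatile_state_bits": prefix_bits + iv_bits,
    }


if __name__ == "__main__":
    print(civk_parameters(security_bits=128, max_packet_bits=2**32))
```

For the values used by Draco this yields a 32-bit key prefix and a 96-bit IV, i.e. 128 non-volatile bits next to 128 volatile bits.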
It is easy to see that Draco's mixing phase in fact realizes a bijection (between the 256-bit internal states at t = 0 and those at t = 512) as assumed by our security proof for the CIVK construction given in Section 5. Draco deviates from the generic construction scheme only in a tiny detail. During key loading at t = 0, the first key bit is inverted (i.e., B 0 0 := K 0 ⊕ 1). This is to avoid the "sliding property" of Grain v1 and Grain-128 that was pointed out in [DCKP08] (see Section 8.4 for further details). In terms of our TMDTO security proof, however, this modification is completely irrelevant.

Packet Mode
Stream ciphers have a long history when it comes to protecting digital communication.
In 1987, Rivest designed RC4 [Sch95], which was later used in SSL/TLS [DR08] and the wireless network security protocols WEP [Ins97] and TKIP (often called WPA) [Ins04]. Other well-known stream cipher examples are E 0 of the Bluetooth standard [SIG14] and A5/1 of GSM [BGW99]. Unfortunately, E 0 and A5/1 have been shown to be highly insecure (see, e.g., [LMV05] and [BB06]) and RC4 also shows severe vulnerabilities, which led to its removal from the TLS protocol [Pop15] and rendered other protocols like WEP insecure [FMS01]. In 2004, the eSTREAM project [ECR08] was started in order to identify new stream ciphers for different application profiles. In the hardware category, aiming at devices with restricted resources, three ciphers are still part of the eSTREAM portfolio after the latest revision in 2012: Grain v1 [HJM06], MICKEY 2.0 [BD06] and Trivium [CP05].
Grain v1 uses 80-bit keys and 64-bit IVs, and the authors do not give an explicit limit on the number of keystream bits that should be generated for each key/IV pair. MICKEY 2.0 uses 80-bit keys and IVs of variable length up to 80 bits, and the maximum amount of keystream bits for each key/IV pair is $2^{40}$. Trivium uses 80-bit keys and 80-bit IVs, and at most $2^{64}$ keystream bits should be generated for each key/IV pair.
Interestingly, all three ciphers of the eSTREAM hardware portfolio are obviously designed for potentially very large keystream sequences per key/IV pair. In contrast, the aforementioned transmission standards all use much smaller packet sizes. For example, A5/1 produces only 228 keystream bits per key/IV pair, where the session key is 64 bits long and the IV corresponds to 22 bits of the publicly known frame (i.e., packet) number. Similarly, Bluetooth packets contain at most 2790 bits for the so-called basic rate. The Bluetooth cipher $E_0$ takes a 128-bit session key and uses 26 bits of the master's clock, which is assumed to be publicly known, as the packet-specific IV. For wireless local area networks (WLANs), the currently active IEEE 802.11-2020 standard [Ins21] implies that at most 11454 bytes (i.e., $< 2^{17}$ bits) are encrypted under the same key/IV pair using CCMP. Another widespread example for encryption on a per-packet basis is SSL/TLS, which underlies HTTPS and thus plays a vital role in securing the World Wide Web. In the most recent version, TLS 1.3 [Res18], the maximum amount of data encrypted under the same key/IV pair is $2^{14} + 2^{8}$ bytes (i.e., $2^{17} + 2^{11} < 2^{18}$ bits), as long as RC4, which is now forbidden for all TLS versions by RFC 7465 [Pop15], is not used.
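The packet sizes quoted above can be put side by side with Draco's limit using a few lines of arithmetic. The following Python sketch is only a sanity check of the figures in the text; the dictionary name and its labels are illustrative and not taken from any of the cited standards.

```python
# Maximum number of bits encrypted under one key/IV pair, as quoted in the text.
packet_bits = {
    "A5/1 (GSM)": 228,
    "E0 (Bluetooth, basic rate)": 2790,
    "CCMP (IEEE 802.11-2020)": 11454 * 8,        # 11454 bytes
    "TLS 1.3": (2**14 + 2**8) * 8,               # 2^14 + 2^8 bytes
}

# Draco's maximum packet length in bits.
DRACO_LIMIT_BITS = 2**32

for name, bits in packet_bits.items():
    headroom = DRACO_LIMIT_BITS // bits
    print(f"{name}: {bits} bits, Draco's limit is about {headroom}x larger")
```

Every listed packet size stays below $2^{18}$ bits, which leaves a factor of more than $2^{14}$ to Draco's limit.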
Considering the above examples, Draco's maximum packet length of $2^{32}$ bits should be more than sufficient for most communication scenarios in the foreseeable future.

Cryptanalysis
In the following subsections, we consider several types of attacks that weakened or even broke other stream ciphers in the past and argue why we believe that Draco will resist them. Time-memory-data tradeoff attacks against the CIVK construction that underlies Draco were already discussed in Section 4.
The discussion in this section will build on a variety of results that illuminate the security of Grain v1 against these attacks, and that address a number of established security parameters. Since Grain and Draco are very similar in the structural elements relevant to algebraic and correlation attacks, it seems sufficient to us in the given context to describe the extent to which Draco meets or exceeds the design criteria relevant to Grain's security. Of course, this does not mean provable security against every conceivable variant of such attacks. But establishing a more extended and detailed formal framework for proving the security of Grain-like ciphers would have to go beyond the existing state of the art and would thus be a challenging scientific project in its own right, clearly beyond the scope of this paper. We think that further development of this formal framework should come from new attack ideas being published from within the scientific community.

Correlation Attacks, Linear Approximations
Correlation and fast correlation attacks like those in [MJSC16, TIM+18, ZGM17, TMA20, WLLM19] are a major threat to Grain-like stream ciphers. They target the cipher's LFSR and are based on finding sufficiently biased linear approximations of the NFSR's feedback as well as of the output function. In Draco, Grain's LFSR is replaced by NFSR1 with high nonlinearity and correlation immunity. Moreover, the output function has many more inputs, a higher resiliency and a much higher nonlinearity than that of Grain-128a. Furthermore, the feedback function of NFSR2 was additionally hardened as explained in Section 7.3.
In [MJSC16], Méaux et al. point out the importance of "good balancedness, non-linearity and resiliency properties" of the filtering function in order to withstand correlation attacks [Sie85] and fast correlation attacks [MS89]. As explained in Section 7.4, Draco features a rather heavy output function to compensate for the smaller volatile part of the inner state compared to the original Grain family. It is defined over 59 variables and has nonlinearity of about $2^{57}$, whereas the output function of Grain-128a is defined over 17 variables and has nonlinearity 61440. Moreover, the resiliency of Draco's output function is 9 compared to 7 for that of Grain-128a.
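To make the notion of nonlinearity concrete, the following Python sketch computes it for a small Boolean function via the Walsh-Hadamard transform. It is an illustration only: the function `h` below is our recollection of the 9-variable nonlinear part of Grain-128a's output function from [ÅHJM11] and should be treated as an assumption, as should the decomposition of the 17-variable output function into `h` plus eight linearly added state bits. The helper names are hypothetical. Draco's 59-variable output function is far too large for this exhaustive approach.

```python
def walsh_spectrum(truth_table):
    """Fast Walsh-Hadamard transform of the sign function (-1)^f(x)."""
    w = [1 - 2 * bit for bit in truth_table]
    step = 1
    while step < len(w):
        for start in range(0, len(w), 2 * step):
            for j in range(start, start + step):
                a, b = w[j], w[j + step]
                w[j], w[j + step] = a + b, a - b
        step *= 2
    return w


def nonlinearity(truth_table):
    """Hamming distance to the closest affine function: 2^(n-1) - max|W|/2."""
    max_abs = max(abs(v) for v in walsh_spectrum(truth_table))
    return len(truth_table) // 2 - max_abs // 2


def h(x):
    # Assumed form of the nonlinear part of Grain-128a's output function.
    return ((x[0] & x[1]) ^ (x[2] & x[3]) ^ (x[4] & x[5])
            ^ (x[6] & x[7]) ^ (x[0] & x[4] & x[8]))


N_VARS = 9
table = [h([(v >> i) & 1 for i in range(N_VARS)]) for v in range(2**N_VARS)]
nl_h = nonlinearity(table)

# XORing k fresh linear inputs onto a function multiplies its nonlinearity by 2^k.
LINEAR_TERMS = 8  # assumed number of linearly added state bits
nl_output = nl_h * 2**LINEAR_TERMS
print(nl_h, nl_output)
```

Under these assumptions, the result for the 17-variable function agrees with the value 61440 quoted above.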
In [BGM06], Berbain, Gilbert and Maximov present an attack on Grain v0 that combines linear approximations of the NFSR's feedback function and of the output function in order to recover the initial state of the LFSR given a sufficient amount of keystream bits. As possible countermeasures, Berbain, Gilbert and Maximov proposed the following modifications [BGM06]: "Introduce several additional masking variables from the NFSR in the keystream bit computation", "replace g by a 2-resilient function", "modify the filtering function h in order to make it more difficult to approximate" and "modify the function g and h to increase the number of inputs". For Grain-128a, the feedback function g of the NFSR was constructed with the above attack in mind. The designers state: "The best linear approximation of g is of considerable interest, and for it to contain many terms, we need the resiliency of the function g to be high. We also need a high nonlinearity in order to obtain a small bias." As a consequence, g was chosen such that it has nonlinearity 267403264 ($\approx 2^{28}$) and resiliency 4.
As explained in Section 7.3, the feedback function $f_2$ of NFSR2 in Draco builds on that of Grain-128a in a way that preserves its balancedness and resiliency, but features an even higher nonlinearity ($\approx 2^{39}$). Moreover, in accordance with the above suggestions from [BGM06] and the construction principle underlying g of Grain-128a (see previous paragraph), the output function of Draco has more than three times as many inputs, a much higher nonlinearity and a higher resiliency than that of Grain-128a (cf. values at the beginning of this subsection) in order to strengthen it against linear approximations. In particular, it is defined over the state variables of both FSRs, featuring monomials of all degrees between one and seven defined over NFSR2 (cf. triangular function $T^{(1)}_t$ in Section 6.1.4), monomials of degrees one, two, and three over NFSR1 (cf. triangular function $T^{(2)}_t$), and monomials of degrees two, three, and four with variables from both FSRs mixed (cf. triangular function $T^{(3)}_t$). Also note that (fast) correlation attacks against Grain-like structures as published in [MJSC16, TIM+18, ZGM17, TMA20, WLLM19] target the ciphers' LFSR. However, as described in Section 7.2, the Grain family's LFSR is replaced by an NFSR with high nonlinearity and correlation immunity for Draco. This approach was already employed for Lizard, which is the only Grain-like stream cipher completely unaffected by (fast) correlation attacks, so far.
In [TMA20], Todo, Meier and Aoki study the data limitation of small-state stream ciphers in the context of correlation attacks. For Plantlet, which targets 80-bit security, they can recover the secret key if about $2^{53}$ keystream bits per key/IV combination are available. Fortunately, this condition cannot be met in practice as the cipher's designers set a corresponding limit of $2^{30}$ bits, which is considered "conservative" by Todo, Meier and Aoki. In line with these findings, Draco (whose key and state size are larger than those of Plantlet) has a maximum packet length of $2^{32}$ bits.
Finally, as explained in Section 7.4, the choice of tap positions for Draco's output function follows the concept of (full) positive difference sets, which was introduced by Golić in [Gol96] as a design criterion to strengthen nonlinear filter generators against correlation attacks.

Algebraic Attacks
For algebraic attacks against Draco, one has to differentiate between two basic approaches. First, an attacker could express observed keystream bits as functions of the unknown 128 key bits and then try to solve the corresponding system of equations. This, however, would require including all state transitions down to t = 0. Given that both FSRs are nonlinear and considering the high algebraic degree of the output function (which is used as part of the state update in phase 2 of the state initialization), this is clearly more complex than the following second attack approach: expressing observed keystream bits as functions of the unknown 128 bits of the volatile initial state at t = 512 and the unknown 32-bit key prefix (used continuously during keystream generation) and then trying to solve the corresponding system of equations. Consequently, for the remainder of this subsection, we will focus on the second approach.
First of all, note that, to the best of our knowledge, no successful (i.e., having complexity lower than $2^{128}$) algebraic attacks that can recover arbitrary initial states for Grain-128a have been reported so far.⁶ Due to the smaller volatile inner state of Draco, the number of variables of the corresponding system of equations in such an attack would now in fact be lower. This, however, is compensated for by the larger degree of the output function, which is now 7 as compared to 3 for Grain-128a. As pointed out in Section 7.4, Draco's output function builds on the construction scheme introduced in [MJSC16], depends on 59 variables, has nonlinearity of about $2^{57}$, algebraic immunity of at least 7 and fast algebraic immunity of at least 8. In addition, now both FSRs are nonlinear and NFSR1, which corresponds to the LFSR of the original Grain family, has algebraic degree 4. Furthermore, we hardened NFSR2 against algebraic attacks by adding five more nonlinear monomials as compared to g in Grain-128a (cf. Sec. 7.3). Based on these properties, we expect that algebraic attacks against full Draco will not have complexity lower than that of exhaustive key search.
Also note that even when guessing the shorter NFSR1, Draco's output function still depends on at least 44 variables and has, inter alia, the following worst-case security properties: nonlinearity of about $2^{42}$, algebraic immunity at least 7, and fast algebraic immunity at least 8. For comparison, the full output function of Grain-128a depends on 17 variables, has nonlinearity 61440 and algebraic degree 3. Thus, in the context of a corresponding guess-and-determine attack, an algebraic attack on NFSR2 similar to the one in [BGJ09] will have large enough complexity.
Guessing the larger NFSR2 is even less promising from an attacker's point of view. Due to its size of 95 bits and the 128-bit security level targeted by Draco, a successful state-recovery attack against the full cipher would have to subsequently recover the 33-bit inner state of NFSR1 and the 32-bit key prefix underlying the key-IV-schedule with time/memory/data complexity below $2^{128-95} = 2^{33}$. In particular, these 65 remaining unknowns also influence the further state updates of NFSR2, which, in contrast to NFSR1, is not operating autonomously during keystream generation. So while in this phase guessing NFSR1 allows the attacker to eliminate it (including its feedback function and its contribution to the output function) from subsequent steps of the cryptanalysis, this is not the case when guessing NFSR2. That is, an attacker guessing NFSR2 would have to recover 65 unknowns with time/memory/data complexity below $2^{33}$, while still being faced with both feedback functions and the full output function of Draco.

Conditional Differentials, Cube Attacks
In [LM12], Lehmann and Meier study the security of Grain-128a against dynamic cube attacks and differential attacks. They come to the following conclusion: "To analyse the security of the cipher, we study the monomial structure and use high order differential attacks on both the new and old versions. The comparison of symbolic expressions suggests that Grain-128a is immune against dynamic cube attacks. Additionally, we find that it is also immune against differential attacks as the best attack we could find results in a bias at round 189 out of 256." The currently best key-recovery cube attack against round-reduced Grain-128a is presented in [TIHM17]. It is based on the division property and works for 183 initialization rounds.
Draco has 512 rounds in phase 2 of the state initialization, where the Grain-like mixing is performed as described in Section 6.2. On top of that, from the second half of phase 2 onwards (i.e., for all t ≥ 256), the 32-bit prefix of the secret key is continuously involved in the state update of NFSR2.
Note that the volatile inner state of Draco (128 bits) is smaller than that of Grain-128a (256 bits), whereas the output function is much more dense. It depends on 59 variables as compared to 17 in Grain-128a. The output function of Draco also has more nonlinear monomials (15) than that of Grain-128a (5). Moreover, now both FSRs are nonlinear and the feedback function of NFSR1 is defined over more inputs (40 vs. 19) and has more nonlinear monomials (15 vs. 10) than that of Grain-128a's NFSR.
The combination of a smaller volatile state and more dense feedback and output functions causes a faster diffusion of differentials and of the monomial structure for Draco. Together with the doubled number of initialization rounds, this should make Draco at least as resistant against differential attacks and cube attacks as Grain-128a, which seems to be already sufficiently secure in that respect.
In 2021 Horn [Hor21] studied the resistance of an earlier version of Draco against cube attacks. Since then, the key-IV-schedule was slightly changed to prevent the zero d-stream attacks mentioned in Subsubsection 7.1.1. In particular the work by Horn considers a key prefix of length 33 instead of 32 and an IV of length 95 instead of 96 with an additional 0-prefix. This will not significantly change the results obtained in the analysis against cube attacks.
Horn [Hor21] considered only 99 and 100 initialization rounds instead of the full 512 as "the superpoly recovery for Draco frequently turned out to become computationally infeasible even for a very small number of initialization rounds." Horn observed that in each clock cycle only one IV bit enters the internal state of NFSR2. He found practical distinguishers for 99 and 100 initialization rounds. Yet, he was not successful in attacking 101 initialization rounds. The author states that "it was not possible to recover the superpoly of a cube just a few rounds after the cube variable with the highest index is introduced." Further, since Draco uses 512 initialization rounds, Horn considers the margin more than sufficient to provide very high security against his cube attacks. Horn concludes that the "extremely fast growing complexity of these superpolys of an even simplified version of Draco, again supports our assumption that Draco is extremely resistant against the considered attack."

Slide Attacks, Related Key Attacks
In [Küç06], Küçük first pointed out a sliding property of the state initialization of Grain v1, which was later formally published by De Cannière, Küçük and Preneel in [DCKP08] as: "For a fraction of $2^{-2 \cdot n}$ of pairs $(K, IV)$, there exists a related pair $(K^*, IV^*)$ which produces an identical but n-bit shifted key stream." In the same paper, the authors describe how this property can be exploited to speed up exhaustive key search for Grain v1 (and also for Grain-128) by a factor of two.⁷ In addition, they also suggest a related-key slide attack, for which they note: "As is the case for all related key attacks, the simple attack just described is admittedly based on a rather strong supposition." [DCKP08] As a reaction, the designers of Grain-128a changed the 22-bit constant (1, . . . , 1) that was used in the state initialization of Grain-128 to (1, . . . , 1, 0).
In Draco, no constants are used during state initialization. Instead, to avoid the above sliding property, we set $B^0_0 := K_0 \oplus 1$ in phase 1 of the state initialization (cf. Sec. 6.2). As a result, for a key/IV pair $(K, IV)$, a related key/IV pair $(K^*, IV^*)$ in the sense of [DCKP08] would have to satisfy Let $d_t$ and $d^*_t$ denote the key-IV-schedule bits computed on the basis of $(K, IV)$ and $(K^*, IV^*)$, respectively, as described previously in Section 6.1. For the sliding property from [DCKP08] to occur, $d_{t+1} = d^*_t$ would need to hold for t ≥ 0. In particular, we get and It is easy to see that equations (6), (7), and (8) cannot be satisfied simultaneously. Note that, without inverting $K_0$ in phase 1 of the state initialization together with having different definitions of $d_t$ for t ≤ 255 and t ≥ 256, Draco would in fact suffer from a variant of the sliding property, despite continuously employing the IV and the 32-bit key prefix for state update during keystream generation.
Let us also point out that, as stated by De Cannière, Küçük and Preneel in [DCKP08] and cited above, we too consider the supposition underlying related-key attacks to be rather strong. In particular, we do not claim security for situations where a potential victim generates keystream under secret related keys and an attacker tries to recover one or more of these. Typical motivations of this scenario would be fault attacks (e.g., an attacker manipulating the secret inner state and/or the unknown key bits via clock glitches, voltage spikes, optical or electromagnetic fault injection etc.) or the usage of a weak (session) key derivation method. The latter is obviously a blatant security flaw on its own that needs to be avoided irrespective of the employed cipher. And for protecting against fault attacks, various established countermeasures are available on the hardware design level (see, e.g., [KSV13] for an overview). Correspondingly, we do not make security claims regarding other types of side-channel attacks either.

Weak Key/IV Pairs
In [ZW09], Zhang and Wang introduce the notion of weak key/IV pairs for the Grain family of stream ciphers. They show that Grain-128 has $2^{96}$ such pairs, which lead to an all-zero initial state of the LFSR, and use them to mount distinguishing attacks and initial state recovery attacks. In [ÅHJM11], the designers of Grain-128a point out: "We note that the IV is normally assumed to be public, and that the probability of using a weak key-IV pair is $2^{-128}$. Any attacker guessing this to happen and then launching a rather expensive attack, is much better off just guessing a key." In analogy to the definition of Zhang and Wang, weak key/IV pairs for Draco would lead to an all-zero initial state of NFSR1. Such pairs, however, are now completely unproblematic (and, hence, not weak anymore) as the 33-bit-wide NFSR1 has truly maximal period $2^{33}$. In particular, unlike the LFSR of the Grain family, it cannot get stuck in the all-zero state.
Note that without adding the term $\oplus \neg S^t_1 \neg S^t_2 \cdots \neg S^t_{30} \neg S^t_{31} \neg S^t_{32}$ to ACHTERBAHN's NFSR $A_{12}$ for obtaining NFSR1 of Draco (cf. Sec. 6.1), there would have actually been about $2^{191}$ weak key/IV pairs out of $2^{224}$ total key/IV pairs, leading to a probability of $2^{-33}$ for using a weak pair. Thus, corresponding attacks might have posed a real threat to Draco, which is now avoided.
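The effect of such an additional term can be illustrated on a toy register. The following Python sketch is not Draco's NFSR1: it uses a 4-bit LFSR with the primitive feedback polynomial $x^4 + x + 1$, chosen here purely for illustration, and adds the product of the negated bits at positions 1 to 3, in analogy to the term discussed above. All function names are hypothetical.

```python
def step(state, with_extra_term):
    """One clock of a toy 4-bit register (s0, s1, s2, s3); s0 is shifted out."""
    s0, s1, s2, s3 = state
    feedback = s0 ^ s1  # linear feedback for x^4 + x + 1
    if with_extra_term:
        # Extra term: equals 1 exactly when s1 = s2 = s3 = 0.
        feedback ^= (1 - s1) * (1 - s2) * (1 - s3)
    return (s1, s2, s3, feedback)


def cycle_length(start, with_extra_term):
    """Number of clocks until the register returns to its start state."""
    state, length = step(start, with_extra_term), 1
    while state != start:
        state = step(state, with_extra_term)
        length += 1
    return length


# Plain LFSR: the all-zero state is a fixed point, all other states share one cycle.
print(cycle_length((0, 0, 0, 0), False), cycle_length((0, 0, 0, 1), False))
# With the extra term, the all-zero state is joined into a single cycle of all states.
print(cycle_length((0, 0, 0, 0), True))
```

In the toy case, the plain LFSR gets stuck in the all-zero state, whereas the modified register runs through all $2^4$ states before repeating.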

BDD-based Attacks
In [Kra02], Krause introduced the idea of using binary decision diagrams (BDDs) to attack LFSR-based stream ciphers like A5/1 of the GSM standard or $E_0$ of Bluetooth. Stegemann later showed in [Ste07] how this approach can be transferred to NFSR-based stream ciphers like Trivium and Grain.
In contrast to TMD tradeoff attacks or correlation attacks, which potentially require a lot of known keystream, BDD attacks are short-keystream attacks in the sense that only the information-theoretic minimum of keystream bits (i.e., often only few more than n bits of keystream for a keystream generator of inner state length n) is required to recover the corresponding initial state.
While we are currently not aware of any BDD attack faster than exhaustive key search against any member of the Grain family, the major design consequence of the BDD-related cryptanalytic results that Stegemann obtained for Grain-like stream ciphers is that the maximum number of what he calls active monomials of the feedback functions and the output function should be as large as possible (see [Ste07] for further details). In the setting of Stegemann, for Grain-128a, the maximum number of active monomials would be 0 for the LFSR, 3 for the NFSR and 3 for the output function. In comparison, for Draco, the maximum number of active monomials would be 19 for NFSR1, 6 for NFSR2 and at least 10 for the output function a. Consequently, we expect that, despite the smaller volatile inner state, Draco will also resist BDD attacks.

Preventing Banik et al. and Esgin-Kara Attacks
Banik's attack [BCI+21] against Sprout is based on the LFSR property that the constant zero internal state occurs in Sprout with a certain probability. This state generates a stream of zeros. As Draco does not use any LFSR, this attack cannot be applied to Draco.
The attack by Banik, Barooti and Isobe against Plantlet [BBI19] exploits Plantlet's property that pairs of internal states which differ only in position 43 generate identical keystream blocks of length 41. This property is due to the comparatively large distance between certain taps of Plantlet's LFSR. To prevent this type of attack, the Atom stream cipher uses an additional second key filter which is driven by a 7-bit LFSR. As all pairs of neighboring taps of both of Draco's NFSRs have a sufficiently small distance, the BBI attack cannot be applied to Draco. This is the reason why we decided to use only a simple cyclic filter as key schedule for Draco, and not an LFSR-driven one.
The Esgin-Kara attack against Sprout [EK15] uses the fact that the key bits coming from the key filter are multiplied by a term depending on the volatile state. These key bits can be shown to be zero in certain clock cycles. This implies that certain keystream blocks depend only on the volatile internal states, which allows for nontrivial TMDTO attacks. This type of attack cannot be applied to Draco as the d-bits coming from the Draco key-IV schedule are linearly added to the state update function of NFSR2. Moreover, the Draco key-IV schedule has the property that each 128-bit keystream block depends on all key prefix bits and IV bits. This prevents attacks like the one by Esgin and Kara (see Subsection 3.4 of [BCI+21]).

Hardware Results
In this section, we present the hardware results for our new stream cipher Draco and compare them to those of Atom [BCI+21] and Grain-128a [ÅHJM11]⁸, which, like Draco, accepts 128-bit keys and 96-bit IVs. The reasons for focusing on Grain-128a are twofold. First, it is a natural choice for comparison due to the close structural relation between Draco and the Grain family of stream ciphers as explained in Sections 6 and 7. Second, and more importantly, Grain v1 (the 80-bit version of Grain-128a) turned out to be the most hardware-efficient member of the eSTREAM [ECR08] portfolio (see tables 1-4 and figures 1-3 in [GB08]) and, hence, the Grain family of stream ciphers can be considered as a benchmark for new designs. Also note that Draco is the first small-state stream cipher offering full 128-bit security against key recovery and distinguishing attacks, which is why a comparison to, e.g., Plantlet or Lizard would not be appropriate here.
In addition, we chose to compare Draco to Atom, a lightweight stream cipher that was recently published in ToSC [BCI+21]. Atom is a reasonable comparison as it also uses a 128-bit key and stores the secret key externally, i.e., it builds upon CKEY, which we introduced earlier. We further implemented a version of Atom that stores its secret key internally in the hardware module in an additional register; it is denoted as Atom [K] in Table 2. This is done to allow the comparison to variants of Draco that store the key prefix or the IV locally in the hardware module as described below. In particular, for Draco [KI] and Atom [K] there are no dependencies on external resources, as is the case for Grain-128a.
In line with papers like [Fel07, GB08], which evaluate candidates in the eSTREAM hardware category, we focus on application-specific integrated circuits (ASICs) with standard CMOS libraries. ASICs are the prevalent hardware component in lightweight application scenarios, such as radio frequency identification (RFID) technology, and likewise important for high-speed cryptographic processing, such as bitcoin mining. The two main restrictions imposed on the design of cryptographic protocols for RFID tags are the circuit size and the power budget. The circuit size strongly influences the manufacturing costs of an RFID tag (see [AHM14] for details) and is commonly specified in gate equivalents (GE), where one GE corresponds to the area of a two-input drive-strength-one NAND gate. The power consumption is crucial as low-cost RFID tags are usually passively powered (i.e., via an electromagnetic field radiated by the reader). In ASIC-based high-speed processing, on the other hand, energy consumption is becoming the main cost factor (see, e.g., [DV18]). It is important to note that while the area requirement of cipher designs can be compared over different standard cell libraries by using the measure gate equivalents, "[p]ower cannot be scaled reliably between different processes and libraries" [GB08]. Consequently, it is crucial to use the same design flow for all implementations that are to be compared. In Appendix C.1, we provide a detailed specification of the tools and methodology employed for deriving the hardware evaluation results summarized in Table 2. After state initialization, all implementations produce one keystream bit per clock cycle, leading to identical throughput rates at identical clock speeds.
Remember that in contrast to Grain-128a, half of Draco's 256-bit inner state is actually held constant (consisting of the 32-bit key prefix and the 96-bit IV). This allows for maximizing Draco's resource efficiency by easily adapting the hardware implementation to each device's specific capabilities. For example, if the secret key is burned into the device or stored in an EEPROM (a common RFID scenario [AHM14], assumed, e.g., by Plantlet) and the IV is constituted by the device's frame counter (as, e.g., in A5/1), then no storage cells for this data need to be allocated inside of the Draco hardware module, leading to the most lightweight variant labeled Draco in Table 2. If, on the other hand, the 32-bit key prefix and the 96-bit IV should both be available only at the beginning of state initialization (as generally assumed by Grain-128a), additional storage cells are required, leading to Draco [KI] . The variants Draco [K] resp. Draco [I] represent the two intermediate scenarios that only the 32-bit key prefix resp. the 96-bit IV need to be held locally in the Draco hardware module.
The numbers presented in Table 2 show that the Draco stream cipher is likewise attractive for lightweight RFID and high-speed computation scenarios. For example, when making optimal use of an RFID tag's resources (i.e., burned/EEPROM key, transmission counter as IV), Draco requires 23 % less area (2142 vs. 2795 GE) and 31 % less power (79.2 vs. 115.3 µW) than Grain-128a at a clock frequency of 10 MHz. In the case of high-speed computing, on the other hand, everything comes down to energy consumption. At a clock frequency of 1 GHz, all four implementation variants of Draco consume about 34 % less energy than Grain-128a for producing 10 kbit of keystream (including state initialization). In particular, this substantial advantage is achieved even if the 32-bit key prefix and the 96-bit IV have to be stored locally inside of the Draco hardware module (i.e., 32.7 nJ for Draco [KI] vs. 50.4 nJ for Grain-128a; cf. Appendix C.1).
In direct comparison to Atom, Draco needs 28 % less area (2142 vs. 2976 GE) and 24 % less power (79.2 vs. 104.9 µW) at a clock frequency of 10 MHz. Comparing Draco [KI] to Atom [K], we see improvements of 21 % in area (3025 vs. 3858 GE) and of 20 % in power consumption (100.6 vs. 126.1 µW) at a clock frequency of 10 MHz.
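The relative savings quoted above follow directly from the area and power figures. The Python sketch below recomputes them as a plausibility check; it assumes that in each quoted pair the smaller value belongs to the Draco variant, and the helper name `saving_percent` as well as the dictionary labels are illustrative.

```python
def saving_percent(draco_value, other_value):
    """Relative saving of Draco compared to the other cipher, in percent."""
    return 100.0 * (1.0 - draco_value / other_value)


# (Draco value, comparison value) at 10 MHz, as quoted in the text.
comparisons = {
    "area, Draco vs. Grain-128a (GE)": (2142, 2795),
    "power, Draco vs. Grain-128a (uW)": (79.2, 115.3),
    "area, Draco vs. Atom (GE)": (2142, 2976),
    "power, Draco vs. Atom (uW)": (79.2, 104.9),
    "area, Draco [KI] vs. Atom [K] (GE)": (3025, 3858),
    "power, Draco [KI] vs. Atom [K] (uW)": (100.6, 126.1),
}

for label, (draco_value, other_value) in comparisons.items():
    print(f"{label}: {saving_percent(draco_value, other_value):.1f} % lower")
```

The percentages stated in the text correspond to these values rounded down to whole percent.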
The reason behind this is that already for moderate clock frequencies (here: between 10 MHz and 20 MHz) the dynamic power consumption (due to switching of values) dominates the static power consumption (due to leakage) of flip-flop storage cells. To the best of our knowledge, this effect has never been considered in stream cipher design before. Instead, the classical design paradigm (e.g., followed by Grain-128a, but also by Plantlet and Lizard) exclusively focused on the number of flip-flops, ignoring their actual usage. With Draco [KI] we demonstrate that even if a 2n-bit storage is required inside the cipher hardware module to achieve n-bit security against TMDTO attacks, algorithmically keeping half of this state constant is much more efficient (cf. Tab. 2) than and equally secure (see Section 8 and Section 5) as constantly updating the whole of it.

Conclusion
In this work, we presented the new generic stream cipher construction CIVK and a new stream cipher proposal called Draco that instantiates CIVK. CIVK provably provides full volatile state length security against distinguishing attacks and thus offers a solid theoretical foundation for designing stream ciphers.
Draco uses a 128-bit key, which is loaded to the volatile state cells of its feedback shift registers during initialization. A 32-bit prefix of this key, together with a 96-bit initial value, is continuously employed as part of the state update during keystream generation. If the key prefix and the initial value are stored 'externally' (e.g., inside an EEPROM), this design requires 23 % less area and 31 % less power than Grain-128a at 10 MHz.
For high-performance environments, we also considered an implementation variant called Draco [KI] with the key prefix and the initial value stored inside the cipher hardware module, while still only half of the total internal state is updated during state updates. When clocked at 1 GHz, this variant consumes about 34 % less energy than Grain-128a, still providing 128 bits of security and thus challenging the current paradigm of stream ciphers to always incorporate all internal state bits during state updates.
As future work, we suggest evaluating the performance of Draco on other hardware platforms like FPGAs or microcontrollers. Moreover, it might be interesting to investigate whether, under the current security guarantees, even more lightweight variants of Draco are possible, for example by choosing a lighter output function.
• No taps from NFSR1, except some of those in the additionally required term of $f_1$, $\oplus \neg S^t_1 \neg S^t_2 \cdots \neg S^t_{31} \neg S^t_{32}$ (cf. Sec. 6.1), are used at the same time for its feedback function $f_1$ and the output function. (In Grain-128a, the feedback function of the LFSR, which corresponds to NFSR1 in our construction, and the output function do not share any taps, either.)
• The set {5, 11, 19, 22, 26, 31} of the tap indices (all from NFSR1) of $T^{(2)}_t$ is a full positive difference set. This means that each two bits of the internal bitstream of NFSR1 never appear more than once together as part of this triangular function.
• No taps from NFSR2 are used at the same time for its feedback function and the output function. (In Grain-128a, the feedback function of the NFSR, which corresponds to NFSR2 in our construction, and the output function share only a single tap called "$b_{i+95}$" in [ÅHJM11].)
• The direct sum $L_t + Q_t + T^{(1)}_t$ uses only taps from NFSR2. (To maintain a sufficient security level even when the content of the smaller NFSR1 is known to the attacker, e.g., due to guessing; cf. Section 8.
(1) t is a full positive difference set. One consequence of this is that each two bits of the internal bitstream of NFSR2 can form at most once a quadratic monomial together.
(1) t appears as a difference between two taps of a higher degree monomial of T is a full positive difference set. Consequently, each two bits of the internal bitstream of NFSR2 never appear more than once together as part of each (i.e., the same) of those monomials.
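The full positive difference set property used in the list above is easy to verify mechanically. The following Python sketch checks it for the tap set {5, 11, 19, 22, 26, 31} stated for $T^{(2)}_t$; the function name is illustrative, and the definition used (all pairwise positive differences are distinct) follows the usual reading of Golić's criterion.

```python
from itertools import combinations


def is_full_positive_difference_set(taps):
    """True if all pairwise positive differences of the tap indices are distinct."""
    diffs = [b - a for a, b in combinations(sorted(taps), 2)]
    return len(diffs) == len(set(diffs))


T2_TAPS = {5, 11, 19, 22, 26, 31}  # tap indices of T^(2)_t, all from NFSR1
print(is_full_positive_difference_set(T2_TAPS))

# Counterexample for contrast: the difference 1 occurs twice in {1, 2, 3}.
print(is_full_positive_difference_set({1, 2, 3}))
```

Because no difference repeats, no pair of bits of the register's bitstream can meet at two different tap pairs as the sequence shifts through the register.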

C.1 Hardware Evaluation Setup
As done by Hell et al. for their hardware evaluation of Grain-128AEAD [HJM+19] (a current round-2 candidate in the NIST Lightweight Cryptography Standardization process), we target 65 nm CMOS process technology and use Synopsys tools for synthesis and power estimation. More precisely, our results (see Table 2 in Section 9) are obtained via Synopsys Design Compiler 2018.06-SP4 and are based on the netlist generated for the Draco reference implementation (see Appendix C.2) employing TSMC's tcbn65gplus 200a standard cell library. Like Feldhofer in [Fel07] for his low-power implementations of Trivium and Grain, we employ clock gating, which is a standard technique for reducing dynamic power consumption in synchronous circuits. In a nutshell, this means that while an edge-triggered flip-flop is not supposed to switch values (such as the registers holding the 32-bit key prefix and the 96-bit IV in Draco [KI] for t ≥ 1), its enable port is disconnected from the circuit's clock signal.
The switching activity for power estimation (recorded with Synopsys VCS 2018.09-SP1-1 and fed back to Design Compiler) covers the generation of 10 kbit of keystream (as done by Good and Benaissa in [GCB06] in their hardware comparison of eSTREAM candidates) and includes the state initialization of the compared cipher modules. To improve the accuracy of the results, switching activity for 100 different random key/IV combinations is considered and the arithmetic mean of the respective power estimates is computed.
As, after state initialization, all cipher implementations compared in Section 9 produce one keystream bit per clock cycle, energy consumption can be straightforwardly computed and compared on the basis of power estimates. The only thing that has to be taken into account here is that Grain-128a performs 256 initialization rounds as compared to 512 rounds for Draco, so that producing 10 kbit of keystream takes 256 + 10000 = 10256 clock cycles for Grain-128a and 512 + 10000 = 10512 clock cycles for Draco. Consequently, the amount of energy required for producing 10 kbit of keystream (including state initialization) at a clock speed of 1 GHz is computed as (10256/(1 GHz)) · (4912.9 µW) = 50.4 nJ for Grain-128a and as (10512/(1 GHz)) · (3119.3 µW) = 32.7 nJ for Draco [KI].
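The energy computation can be retraced with a few lines of code. This is a minimal sketch, assuming that 10 kbit means 10000 keystream bits and that the quoted power estimates are average values over the whole run; the function name is hypothetical. It reproduces the stated figures up to rounding in the last digit.

```python
def energy_nj(init_rounds, keystream_bits, clock_hz, power_uw):
    """Energy in nJ for initialization plus keystream generation at one bit per cycle."""
    cycles = init_rounds + keystream_bits      # one clock cycle per round or bit
    seconds = cycles / clock_hz                # total run time
    joules = seconds * power_uw * 1e-6         # average power (W) times time (s)
    return joules * 1e9                        # convert J to nJ

# Power estimates at 1 GHz as quoted in the text.
grain_128a = energy_nj(256, 10_000, 1e9, 4912.9)
draco_ki = energy_nj(512, 10_000, 1e9, 3119.3)

print(f"Grain-128a: {grain_128a:.2f} nJ")
print(f"Draco [KI]: {draco_ki:.2f} nJ")
```

Although Draco spends twice as many cycles on initialization, the extra 256 cycles are small relative to the 10000 keystream cycles, so the lower average power dominates the comparison.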
The critical path delay of a circuit determines the maximum possible clock frequency and, hence, the maximum achievable throughput. The worst delay for any of our four implementation variants of Draco is 560 ps, which corresponds to an achievable clock frequency of about 1.8 GHz. Also note that where encryption throughputs even larger than 1.8 Gbit/s are required, the delay can be further reduced by using techniques like pipelining (as done for the stream cipher Espresso in [DH15]). Moreover, it is possible to instruct the synthesis tool to optimize for higher clock speeds, which will lead to a circuit with smaller delay but, inter alia, higher area requirements.
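The relation between critical path delay and clock frequency is a simple reciprocal. The sketch below, with a hypothetical helper name, shows how the 560 ps worst-case delay translates into the quoted frequency; since one keystream bit is produced per clock cycle, the same number also gives the throughput in Gbit/s.

```python
def max_clock_ghz(critical_path_ps):
    """Maximum clock frequency in GHz for a given critical path delay in ps."""
    # 1/ps corresponds to 1 THz = 1000 GHz.
    return 1e3 / critical_path_ps

f_max = max_clock_ghz(560)
print(f"{f_max:.2f} GHz")  # 1.79 GHz, i.e. about 1.8 GHz
```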