Towards Low Energy Stream Ciphers

Energy optimization is an important design aspect of lightweight cryptography. Since low energy ciphers drain less battery, they are invaluable components of devices that operate on a tight energy budget, such as handheld devices or RFID tags. At Asiacrypt 2015, Banik et al. presented the block cipher family Midori. It was designed to optimize the energy consumed per encryption and reduces the energy consumption by more than 30% compared to previous block ciphers. However, when longer streams of data have to be encrypted or decrypted, i.e., for bulk data encryption/decryption, one would expect a stream cipher to perform even better than a block cipher in terms of the energy required to encrypt. In this paper, we address the question of designing low energy stream ciphers. To this end, we analyze how common stream cipher design components affect energy consumption. Based on this analysis, we argue that stream ciphers can indeed encrypt long data streams with less energy than block ciphers, and we validate our findings with implementations. Afterwards, we use the analysis results to identify energy-minimizing design principles for stream ciphers.


Motivation
The field of lightweight cryptography has seen a number of cipher proposals in the past few years. Examples include block ciphers like CLEFIA [SSA+07], KATAN [CDK09], Klein [GNL11], LED [GPPR11] and Midori [BBI+15], and stream ciphers like Lizard [HKM17] and Plantlet [MAM16]. However, the Advanced Encryption Standard (AES) [DR02] remains the de facto standard for practical lightweight encryption. One reason is the large number of low-power/low-area AES architectures reported in the literature [MPL+11,SMTM01,FWR05].
However, we argue that for battery-driven devices with tight energy budgets, like handheld devices, medical implants or RFID tags, the energy consumption is a more relevant parameter. In a nutshell, it measures the total electrical work done by the battery source during the execution of an operation. Some previous works have investigated the energy efficiency of block ciphers. In [BDE+13,KDH+12], several lightweight block ciphers were evaluated with respect to various hardware performance metrics, with a particular focus on the energy cost. In [BBR15], the authors looked at design strategies like serialization and round unrolling and at their effect on the energy required to encrypt a single block of data. Serialization stretches the execution of each round function over a number of clock cycles, and was therefore found to be unsuitable for energy efficiency. The authors then proposed a formal model for the energy consumption of an r-round unrolled block cipher architecture. They concluded that the energy consumed for encrypting one block of plaintext has a quasi-quadratic form (a, b, c are constants and R is the number of iterations of the round function prescribed for the design):

$$E_r = (a r^2 + b r + c) \cdot \left(1 + \frac{R}{r}\right) \qquad (1)$$

Here, $a r^2 + b r + c$ denotes the energy consumed per cycle and $1 + R/r$ is the total number of clock cycles required to encrypt. As r increases, an r-round unrolled cipher consumes more energy per cycle, but it needs fewer cycles to complete the encryption. Finding the value of r at which the design consumes the least energy is therefore an interesting and important optimization problem. The authors concluded that r = 2 is the optimal configuration for block ciphers with lightweight round functions like PRESENT and SIMON. For "heavier" round functions like those of AES and Noekeon, r = 1 is optimal.
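The following minimal Python sketch makes the trade-off in $E_r = (a r^2 + b r + c)(1 + R/r)$ concrete. It evaluates the model for every degree of unrolling r that divides R and returns the minimum. The constants a, b, c and R are made up purely for illustration. They are not fitted to PRESENT, AES or any other cipher from [BBR15]; real values would come from synthesis and power simulation.

```python
def energy_model(r, a, b, c, R):
    """Energy per cycle (a*r^2 + b*r + c) times number of cycles (1 + R/r)."""
    if R % r != 0:
        raise ValueError("this sketch assumes r divides R, so that R/r is an integer")
    energy_per_cycle = a * r * r + b * r + c
    cycles = 1 + R // r
    return energy_per_cycle * cycles


def optimal_unrolling(a, b, c, R):
    """Return (best_r, {r: E_r}) over all divisors r of R."""
    candidates = [r for r in range(1, R + 1) if R % r == 0]
    energies = {r: energy_model(r, a, b, c, R) for r in candidates}
    best_r = min(energies, key=energies.get)
    return best_r, energies


# Hypothetical constants (arbitrary units), chosen only to show the U-shaped curve
best_r, energies = optimal_unrolling(a=1, b=2, c=20, R=16)
for r, e in energies.items():
    print(f"r = {r:2d}: E_r = {e}")
print("lowest energy at r =", best_r)
```

With these made-up constants the minimum lies at r = 4. Increasing the quadratic coefficient a, i.e. modeling a "heavier" round function, moves the optimum to r = 1 (for example, a = 20 with the other constants unchanged). This mirrors the light/heavy contrast described above.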
Building on these ideas, [BBI+15] proposed the block cipher family Midori, which is optimized for the energy consumed per encryption.
However, previous work in this field [BBR15,BBI+15,BDE+13,KDH+12] has focused on the energy consumed for encrypting one block of data. This is reasonable for scenarios that require the encryption of short data bursts. We show, however, that for significantly larger amounts of data, a stream cipher may be the better solution energy-wise. Stream ciphers like Grain [HJM07] and Trivium [CP08] use an extremely simple state update operation, which typically consists of computing a few Boolean functions and shifting the state. As a result, unrolling a small number of stream cipher rounds usually only requires additional copies of the Boolean function circuits. The power consumption of stream cipher circuits therefore increases very slowly with the number of unrolled rounds. On the other hand, the number of clock cycles needed to encrypt a given amount of plaintext is roughly inversely proportional to the degree of unrolling, and so is the energy required for the encryption. This makes stream ciphers promising candidates for low energy encryption.
As an instructive example, we compare the energy consumption of the single-round and two-round unrolled Grain v1 circuits.
• We synthesized a single-round implementation of Grain v1 with the standard cell library of the STM 90nm logic process. It occupies around 1164 GE and has an average power consumption of 40.567 µW at a clock frequency of 10 MHz. Encrypting 64 bits of data takes 1 (loading the key and IV) + 160 (key-IV mixing) + 64 = 225 clock cycles. Since one clock cycle at 10 MHz lasts 100 ns, the energy required is approximately 40.567 µW × 225 × 100 ns ≈ 912.8 pJ.
• A two-round unrolled Grain v1 circuit performs two round operations per clock cycle. It has an area of around 1200 GE and an average power consumption of around 41 µW. However, it needs only 1 + 80 + 32 = 113 clock cycles to encrypt 64 bits of data, so its energy requirement is only around 41 µW × 113 × 100 ns ≈ 463 pJ. A 2x unrolling thus reduces the energy by a factor of approximately 2.
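The arithmetic in the two bullets can be reproduced with a few lines of Python. As in the text, the sketch assumes a 100 ns clock period (10 MHz), one cycle for loading key and IV, 160 initialization rounds and 64 keystream bits, with `unroll` rounds processed per cycle. The power figures are the ones quoted above. Rounding up via `math.ceil` for degrees that do not divide 160 or 64 is our own assumption.

```python
import math

CLOCK_PERIOD_NS = 100.0  # 10 MHz clock, as in the experiments


def grain_v1_cycles(unroll, data_bits=64, init_rounds=160):
    """1 cycle to load key/IV, then initialization and keystream rounds,
    with `unroll` rounds executed per clock cycle."""
    return 1 + math.ceil(init_rounds / unroll) + math.ceil(data_bits / unroll)


def energy_pj(power_uw, cycles, period_ns=CLOCK_PERIOD_NS):
    """Energy = average power x total time; 1 uW x 1 ns = 1 fJ = 0.001 pJ."""
    return power_uw * cycles * period_ns / 1000.0


e1 = energy_pj(40.567, grain_v1_cycles(1))  # single-round circuit
e2 = energy_pj(41.0, grain_v1_cycles(2))    # two-round unrolled circuit
print(f"1x: {grain_v1_cycles(1)} cycles, {e1:.1f} pJ")
print(f"2x: {grain_v1_cycles(2)} cycles, {e2:.1f} pJ")
print(f"reduction factor: {e1 / e2:.2f}")
```

The reduction factor comes out just below 2, because the unrolled circuit draws slightly more power per cycle.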
Grain v1 was specifically designed to allow efficient unrolling of up to 16 rounds. For such a cipher, we expect this trend to persist at least up to a 16x degree of unrolling, and perhaps beyond.

Contribution
In this paper we investigate the energy consumption characteristics of stream ciphers. For our analysis, we select the stream ciphers Trivium [CP08], Grain v1 [HJM07], Grain-128 [HJMM06], Lizard [HKM17], Plantlet [MAM16], and Kreyvium [CCF+16]. We examine the implementation-level aspects that are likely to affect the energy consumption of stream ciphers and draw conclusions from our studies. Our principal finding concerns long data streams (e.g., 1000 blocks of 64 bits). Here, the 160x unrolled implementation of Trivium is about 9 times more energy efficient than the best block cipher based solution, and unrolled stream ciphers in general outperform block ciphers in this domain.

Organization
The paper is organized as follows. In Section 2, we look at the factors that may affect the energy consumption of stream ciphers. We identify parameters that increase or decrease the energy consumption and draw the corresponding conclusions. Section 3 concludes the paper.

Energy-Impact of Design Components
[BBI+15] identified three main factors that determine the quantity of energy dissipated in a block cipher circuit. Stream ciphers share the same basic architecture as block ciphers, since both are round-based. The same factors are therefore likely to apply, at least to some extent, to stream ciphers as well. In this section, we investigate the factors that may affect the energy consumption of stream ciphers. The aim is to identify design principles and parameters that a designer can choose to increase or decrease the energy consumption. To this end, we perform several experiments with respect to these factors and derive the characteristics that an energy efficient stream cipher should possess.
All simulations reported in this paper followed the same design flow. First, the design was implemented at RTL level. The VHDL code was then functionally verified using Mentor Graphics ModelSim. Next, Synopsys Design Compiler synthesized the RTL design using the standard cell library of the STM 90nm CMOS logic process. A post-synthesis simulation collected the switching activity of each gate of the circuit. Synopsys Power Compiler then computed the average power from the back-annotated switching activity. Finally, the energy was computed as the product of the average power and the total time taken for the encryption process.

Frequency of Operation
Note that the total energy dissipation of a CMOS gate can be written as
$$E_{total} = E_{dynamic} + E_{static},$$
where
• $E_{dynamic}$ refers to the dynamic dissipation, which is due to the charging and discharging of load capacitances and to the short-circuit current, and
• $E_{static}$ denotes the static dissipation, which is due to leakage current and other current drawn continuously from the power supply.
As pointed out in [KDH+12,BBR15], the energy consumption is determined by the total switching activity of a circuit during the encryption process, so it should be independent of the frequency of operation. This holds at high frequencies, where the dynamic energy $E_{dynamic}$ is significantly larger than the static energy $E_{static}$. At lower frequencies the situation changes: the static power is drawn for the whole duration of the operation, and this duration grows as the clock slows down. [BBR15] showed that for circuits designed with the standard cell library of the STM 90nm CMOS process, the static energy has a significant impact at frequencies below 1 MHz.
To remedy this effect, we fixed the frequency of operation in all our experiments at 10 MHz (a clock period of 100 ns), so that the leakage power plays a minimal role in the energy consumption.
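The frequency effect follows from the decomposition $E_{total} = E_{dynamic} + E_{static}$. The dynamic part is paid per switching event. The static part is leakage power integrated over the run time, which grows as the clock slows down. The toy Python model below shows how the static share rises at low frequency. Its dynamic energy per cycle and leakage power are made up and are not values from the STM 90nm library, so the frequency at which static energy starts to dominate is illustrative only.

```python
def energy_breakdown_pj(cycles, e_dyn_per_cycle_pj, p_static_uw, f_mhz):
    """Return (E_dynamic, E_static) in pJ for a run of `cycles` clock cycles."""
    period_us = 1.0 / f_mhz                      # clock period in microseconds
    e_dynamic = cycles * e_dyn_per_cycle_pj      # independent of the clock frequency
    e_static = p_static_uw * cycles * period_us  # 1 uW x 1 us = 1 pJ; grows as f drops
    return e_dynamic, e_static


def static_share(cycles, e_dyn_per_cycle_pj, p_static_uw, f_mhz):
    """Fraction of the total energy that is due to static dissipation."""
    e_dyn, e_stat = energy_breakdown_pj(cycles, e_dyn_per_cycle_pj, p_static_uw, f_mhz)
    return e_stat / (e_dyn + e_stat)


# Hypothetical circuit: 4 pJ of switching energy per cycle, 1 uW of leakage
for f in (10.0, 1.0, 0.1):
    share = static_share(225, 4.0, 1.0, f)
    print(f"{f:5.1f} MHz: static share = {share:.1%}")
```

Whatever the actual constants are, $E_{dynamic}$ stays fixed while $E_{static}$ scales with the clock period. This is why a sufficiently high, fixed clock frequency keeps leakage out of the comparison.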

Architecture
There are often different options for implementing a stream cipher, and we take a detailed look at a few of them below. Figure 1 depicts a stream cipher that has been unrolled n times. Unrolling in stream ciphers refers to implementations that include logic gates for several instances of the update function, so that multiple rounds can be executed within a single clock cycle. The storage elements of a stream cipher are commonly realized by flip-flops and are usually preceded by a multiplexer. In the initial clock cycle, the multiplexer passes a combination of the key and IV onto the register; afterwards, it passes the output of the round function. A scan flip-flop can replace the combination of flip-flop and multiplexer. It provides the same logical functionality while occupying less area and consuming less power. Our intuition is therefore that designs using scan flip-flops are more energy efficient than those combining regular flip-flops with multiplexers.

A. Scan Flip-Flops vs Regular Flip-Flops
To investigate this, we ran simulations for several hardware-oriented stream ciphers and list the results in Table 1. The table covers the six stream ciphers Grain v1, Grain 128, Trivium, Plantlet, Lizard and Kreyvium, all synthesized with the standard cell library of the STM 90nm logic process. It gives the energy consumption for encrypting 1 and 1000 blocks of plaintext, where one block is taken to be 64 bits.
The results confirm the intuition formulated above. For example, Grain v1 takes 1 (loading key and IV) + 160 (initialization) + 64 = 225 cycles to encrypt 1 block of plaintext. As can be seen in Table 1, with regular flip-flops this results in an energy of 225 × 100 ns × 40.567 µW ≈ 912.8 pJ. Similarly, encrypting 1000 blocks requires 161 + 64000 = 64161 cycles, so the energy can be estimated as 64161 × 100 ns × 40.567 µW ≈ 260.28 nJ. In contrast, with scan flip-flops the energy requirements are ≈ 874.8 pJ and ≈ 249.47 nJ, respectively.
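The numbers above follow from cycles × clock period × average power. The Python sketch below recomputes them for regular flip-flops. It also shows how the one-off 161 cycles of key loading and initialization are amortized over more blocks. Finally, it back-calculates the average power implied by the scan flip-flop energies. That last figure (about 38.9 µW in both cases) is our own derivation from the reported energies and not a separately reported measurement.

```python
PERIOD_NS = 100.0  # 10 MHz


def cycles_for_blocks(blocks, block_bits=64, setup_cycles=161):
    """1x Grain v1: 1 key/IV load cycle + 160 initialization cycles,
    then one keystream bit per cycle."""
    return setup_cycles + blocks * block_bits


def energy_pj(power_uw, cycles):
    """Average power x time; uW x ns = fJ, divided by 1000 to get pJ."""
    return power_uw * cycles * PERIOD_NS / 1000.0


def implied_power_uw(energy_pj_value, cycles):
    """Invert energy_pj: the average power that yields a given energy."""
    return energy_pj_value * 1000.0 / (cycles * PERIOD_NS)


c1, c1000 = cycles_for_blocks(1), cycles_for_blocks(1000)
regular_1 = energy_pj(40.567, c1)        # about 912.8 pJ
regular_1000 = energy_pj(40.567, c1000)  # about 260.28 nJ, expressed in pJ
print("pJ per bit, 1 block:    ", regular_1 / 64)
print("pJ per bit, 1000 blocks:", regular_1000 / 64000)

# Average power implied by the scan flip-flop energies quoted in the text
print(implied_power_uw(874.8, c1), implied_power_uw(249470.0, c1000))
```

The energy per encrypted bit drops from roughly 14.3 pJ for one block to roughly 4.1 pJ for 1000 blocks, because the setup cost is paid only once.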

Main Conclusions:
We can draw the following conclusions from the results reported in Table 1. Designs implemented with scan flip-flops are better with respect to both energy consumption and circuit area. With all other factors equal, the smaller area obtained with scan flip-flops results in lower power consumption. Since energy is the product of average power and time, the energy consumption drops as well.

B. Fibonacci vs Galois Configuration
Practically all stream ciphers deploy feedback shift registers (FSRs), either with a linear update function (LFSR) or with a non-linear update function (NLFSR). For these, a designer can choose between the Fibonacci and the Galois configuration. For instance, the stream ciphers Grain v1, Grain 128 and Trivium use the Fibonacci configuration for their shift registers. As shown in Figure 2A, a shift register in Fibonacci configuration updates its state by shifting all bits by one position. It inserts at the final position a bit that the round function has computed from the current state (before shifting).
In comparison, in a shift register in Galois configuration each state bit i is updated using a separate function $f_i$ applied to the entire current state (cf. Figure 2B). The authors of [MD10,Dub09] showed that Galois configurations usually have lower circuit latency and can thus allow for higher throughput. Table 2 compares the Galois and Fibonacci configurations for several ciphers.
Note that we have to omit the Plantlet stream cipher, since it cannot be realized in Galois configuration. Its non-linear register update function takes an input from bit 39, the last bit of the register, which prevents applying the transformation given in [MD10].
Note that both configurations provide the same logical functionality, so neither offers a significant advantage over the other with respect to classical cryptanalysis. However, [CMM14] showed that Galois registers are more vulnerable to power attacks than Fibonacci registers. Its authors recovered the initial state of a Galois register with approximately half the number of power traces needed for a Fibonacci register.
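To illustrate that the two configurations are functionally equivalent, the following Python sketch implements a toy 4-bit LFSR in both forms. The Fibonacci recurrence $s_{t+4} = s_t \oplus s_{t+1}$ and the Galois feedback mask `0b1100` are our own choices for this example and are unrelated to any of the ciphers studied. With these choices, both registers emit the same maximal-length sequence of period 15, differing only in phase.

```python
def fibonacci_lfsr(state, steps):
    """Fibonacci form: one feedback bit computed from the current state and
    inserted at the end after shifting. state = [s_t, s_t+1, s_t+2, s_t+3]."""
    s = list(state)
    out = []
    for _ in range(steps):
        out.append(s[0])   # output the oldest bit
        fb = s[0] ^ s[1]   # s_{t+4} = s_t XOR s_{t+1}
        s = s[1:] + [fb]   # shift by one position, insert the feedback bit
    return out


def galois_lfsr(state, steps, mask=0b1100):
    """Galois form: the output bit is XORed into several positions in parallel,
    so each state bit gets its own (very small) update function."""
    out = []
    for _ in range(steps):
        lsb = state & 1
        out.append(lsb)
        state >>= 1
        if lsb:
            state ^= mask  # feedback distributed over the tapped positions
    return out


def is_rotation(a, b):
    """True if bit sequence b is a cyclic shift of bit sequence a."""
    if len(a) != len(b):
        return False
    doubled = "".join(map(str, a * 2))
    return "".join(map(str, b)) in doubled


fib = fibonacci_lfsr([1, 0, 0, 0], 15)
gal = galois_lfsr(0b0001, 15)
print(fib)
print(gal)
print("same sequence up to phase:", is_rotation(fib, gal))
```

The Fibonacci version collects several state bits into one feedback bit, whereas the Galois version spreads the feedback over several positions. The sketch only shows this structural difference; it is not an energy model.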

Main Conclusions:
A Galois configuration does not seem to offer any significant advantage over its Fibonacci counterpart with respect to energy consumption or area. Moreover, most cipher designs primarily consider FSRs in Fibonacci configuration. As we will see later, an energy efficient realization of a stream cipher needs to be unrolled multiple times. However, Galois implementations of ciphers that were primarily designed for the Fibonacci configuration cannot be unrolled beyond a certain limit (see [MD10]). All of this makes Galois configurations unattractive for low energy designs.

C. Architecture of Round Function
The round functions $F_i$ in hardware-oriented stream ciphers are generally very simple. They involve a one-bit shift, which shift registers implement efficiently, and one or more small Boolean functions that update the terminal bit of the register. We look at three possible ways of realizing these functions.
1. The first approach is to use a look-up table (LUT). For an n-variable Boolean function, this is a table of $2^n \times 1$ entries. Because the table size grows exponentially with n, this circuit style is effective for small n but inadvisable for larger values of n.
2. The second approach is to give the synthesizer the functional description of the Boolean function (its algebraic normal form) and instruct it to optimize for area and power. With this approach, we depend on the capabilities of the circuit synthesizer.
3. The third approach is to use a Decoder-Switch-Encoder (DSE) style configuration. This approach was previously considered in [BBR15,BBI+15] for designing the 8-bit Rijndael S-box and was shown to be energy efficient. For implementing Boolean functions, the first step is the same as for realizing an S-box circuit: we implement the decoder first. For an n-bit input, we construct a set of $2^n$ wires, where each wire logically represents one of the $2^n$ possible minterms of n variables. For any given input value, exactly one of the wires holds a logical HIGH signal. Since there is one wire per minterm, we simply OR all the wires whose minterms evaluate to logical HIGH in the truth table of the function. In fact, we do not even need to spend hardware on all $2^n$ wires. It suffices to construct the wires whose minterms appear in the canonical disjunctive normal form of the function being implemented. However, the circuit size is still exponential in the input length, so we adopted a simple tweak. Whenever a function had more than 10 input variables, we split it into the sum of two component functions of roughly equal size, each with fewer than 10 inputs, and built a separate circuit for each component. For some ciphers, the function can be broken up into components directly. For example, in Plantlet the NFSR update function g is given as
$$g = n_0 + n_{13} + n_{19} + n_{35} + n_{39} + n_2 n_{25} + n_3 n_5 + n_7 n_8 + n_{14} n_{21} + n_{16} n_{18} + n_{22} n_{24} + n_{26} n_{32} + n_{33} n_{36} n_{37} n_{38} + n_{10} n_{11} n_{12} + n_{27} n_{30} n_{31}$$
Although this is a function of 29 variables, each variable occurs only once, so no two monomials share a variable.
Hence it is easy to break up g as a sum of five functions $g_1$ to $g_5$ of 5 or 6 variables each, such that no two component functions depend on the same input variable. However, this is not always the case. The NFSR update function of Grain v1, for instance, has 13 variables, and breaking it up into functions of disjoint variables is not straightforward. However the DSE construction does
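The DSE idea and the disjoint decomposition of Plantlet's g can be checked with a small software model. The Python sketch below uses the monomials of g as given above. The grouping into five components is our own choice; any grouping works because the monomials share no variables. Each component is evaluated by a decoder (one-hot minterm wires), a switch (keeping the minterms of the canonical DNF) and an encoder (OR). The model checks logical correctness only and says nothing about area or energy.

```python
import random
from itertools import product

# Monomials of Plantlet's NFSR update function g (indices into n_0 .. n_39)
G_MONOMIALS = [
    (0,), (13,), (19,), (35,), (39,),
    (2, 25), (3, 5), (7, 8), (14, 21), (16, 18), (22, 24), (26, 32),
    (33, 36, 37, 38), (10, 11, 12), (27, 30, 31),
]

# One possible split into five components with disjoint variables (our own grouping)
COMPONENTS = [
    [(33, 36, 37, 38), (2, 25)],
    [(10, 11, 12), (27, 30, 31)],
    [(3, 5), (7, 8), (14, 21)],
    [(16, 18), (22, 24), (26, 32)],
    [(0,), (13,), (19,), (35,), (39,)],
]


def variables(monomials):
    return sorted({i for m in monomials for i in m})


def anf(monomials, n):
    """XOR of AND-monomials evaluated on the state bits n."""
    acc = 0
    for m in monomials:
        term = 1
        for i in m:
            term &= n[i]
        acc ^= term
    return acc


def truth_table(monomials):
    """Truth table over the component's own variables (first variable = MSB)."""
    vs = variables(monomials)
    table = []
    for bits in product((0, 1), repeat=len(vs)):
        n = [0] * 40
        for v, b in zip(vs, bits):
            n[v] = b
        table.append(anf(monomials, n))
    return vs, table


def dse_eval(table, x_bits):
    """Decoder-Switch-Encoder model of one Boolean function."""
    index = int("".join(map(str, x_bits)), 2)
    wires = [int(w == index) for w in range(len(table))]      # decoder: one-hot minterm wires
    kept = [wires[w] for w in range(len(table)) if table[w]]  # switch: minterms of the DNF
    return int(any(kept))                                     # encoder: OR of the kept wires


DSE_PARTS = [truth_table(c) for c in COMPONENTS]


def g_dse(n):
    """g as the XOR of the five component circuits, each realized DSE-style."""
    out = 0
    for vs, table in DSE_PARTS:
        out ^= dse_eval(table, [n[v] for v in vs])
    return out


rng = random.Random(1)
for _ in range(1000):
    state = [rng.randint(0, 1) for _ in range(40)]
    assert g_dse(state) == anf(G_MONOMIALS, state)
print("component sizes:", [len(vs) for vs, _ in DSE_PARTS])
```

Each component needs at most $2^6 = 64$ decoder wires, instead of the $2^{29}$ wires that a single DSE circuit for g would need.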

Main Conclusions:
In Table 3, we list the simulation results for the three realizations of the round function discussed above. The table clearly shows that LUT or DSE style constructions of the Boolean function offer no significant advantage over the circuit optimized by the synthesizer.

Unrolling Rounds
Unrolling rounds is a design technique that increases circuit throughput at the cost of area. The core idea is to replace the circuit for one round by an augmented circuit that implements several rounds. For example, a two-round unrolled AES circuit consists of two sequentially placed round function circuits and computes the ciphertext in only 5 clock cycles, i.e., half the time of a single-round circuit. In Section 1.1 we discussed that unrolling is an effective method for realizing low energy block ciphers, and that we expect similar benefits for stream ciphers. In fact, most hardware-oriented stream ciphers have very simple round functions, consisting of a logical shift and a Boolean function computation. Consequently, we do not expect a significant increase in algebraic complexity when unrolling the design, at least for the first few rounds. The hardware complexity should thus grow only moderately, by a limited number of additional logic gates. This naturally limits the transient switching activity (signal glitches) propagating from one round to the next (cf. [BBR15]). Fewer glitches mean lower power consumption. As a result, unrolling a stream cipher by one more round often does not increase the power consumption significantly, whereas it always decreases the number of clock cycles needed to encrypt a given amount of data. The overall energy consumption therefore decreases with unrolling. Grain v1, discussed in Section 1.1, is a good example of this effect. Its single-round and two-round unrolled circuits have average power consumptions of 40.567 µW and 41 µW, respectively, at a clock frequency of 10 MHz.
Since the number of clock cycles required to encrypt data in the 2 round circuit is approximately half as compared to the single round circuit, a 2x unrolling results in approximately a 2x reduction in energy as well.

Unrolling in RTL
Stream ciphers like Grain v1, Grain 128 and Trivium were specifically designed to be easy to unroll. In Grain v1, for example, the last 16 bit positions of both the linear and the non-linear register are used neither in the round update function nor in the keystream output function. Consequently, a 16x unrolling of Grain v1 is straightforward [HJM07] and only requires 16 parallel copies of the round and update functions in the circuit, as shown in Figure 3. The same holds for Grain 128 (up to 32x unrolling) and Trivium (up to 64x unrolling). Beyond the degree of unrolling foreseen in the design, simply adding more copies of the round functions no longer yields correct functionality. The algebraic structure of the resulting round update function then becomes increasingly complicated. In RTL, however, unrolling beyond this limit is not very difficult to realize; Appendix A shows an example.
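A small software model shows why a design-specified unrolling limit exists. The Python sketch below uses a hypothetical 16-bit NLFSR whose feedback reads only positions 0, 3, 5 and 9; these are not Grain v1's actual taps. The j-th parallel copy of the feedback logic reads position 9 + j, which still holds an original state bit as long as 9 + j < 16. Up to 16 − 9 = 7 rounds can therefore be computed in parallel from the current state, analogous to the unused trailing positions in Grain v1. Beyond that, a copy would need a bit that has not been computed yet.

```python
import random

L = 16                      # register length of the toy NLFSR (hypothetical)
TAPS = (0, 3, 5, 9)         # the only positions read by the feedback (hypothetical)
MAX_UNROLL = L - max(TAPS)  # = 7: copy j reads position 9 + j, which must be < L


def feedback(s, j=0):
    """Feedback as it will be evaluated j clocks from now: the bit that will
    then sit at position p is currently at position p + j."""
    return s[0 + j] ^ s[3 + j] ^ (s[5 + j] & s[9 + j])


def clock(s):
    """One round per clock: compute one feedback bit, shift by one, append it."""
    return s[1:] + [feedback(s)]


def unrolled_clock(s, u):
    """u rounds per clock: u parallel copies of the feedback logic, all reading
    the current state, followed by a shift by u positions."""
    if not 1 <= u <= MAX_UNROLL:
        raise ValueError(f"naive unrolling only valid for 1 <= u <= {MAX_UNROLL}")
    new_bits = [feedback(s, j) for j in range(u)]
    return s[u:] + new_bits


rng = random.Random(0)
state = [rng.randint(0, 1) for _ in range(L)]
for u in range(1, MAX_UNROLL + 1):
    reference = state
    for _ in range(u):
        reference = clock(reference)
    assert unrolled_clock(state, u) == reference
print("parallel copies match sequential clocking for u = 1 ..", MAX_UNROLL)
```

Going beyond `MAX_UNROLL` would require substituting freshly computed feedback bits into later copies. This substitution is what makes the resulting update function algebraically more complicated.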
Table 4 lists the simulated energy consumption for different degrees of unrolling. We use scan flip-flops for the memory elements and let the synthesizer optimize the functional description of the round function circuit, as motivated before. Next, we discuss several aspects of the simulations and our observations in detail.

Comparison with block ciphers:
We compare our results for stream ciphers with the block ciphers PRESENT and Midori64. PRESENT is standardized in ISO/IEC 29192-2 and was shown in [BBR15] to be extremely energy efficient, while the Midori block cipher family was designed specifically for low energy consumption. A subspace attack [GJN+16,TLS16] that exploits a class of weak keys of Midori64 has been reported. We nevertheless keep the cipher in our comparisons, as it sheds some light on the lower energy limits achievable with block ciphers.
Unlike for block ciphers, it is difficult to express the energy consumption of an r-round unrolled stream cipher by a simple equation like Equation (1). The reason is that unrolling a stream cipher by an additional round does not increase the circuit complexity uniformly. As a result the transient
A discussion on modes of operation: Block ciphers usually need an additional mode-of-operation layer before they can be used, while stream ciphers in general do not. For example, CBC mode requires additional XOR gates, one per bit of the cipher's block size, and CTR mode requires an additional counter. Additional hardware means additional power consumption and hence additional energy. In Table 4, we tabulate the energy required for encrypting data in ECB mode with PRESENT and Midori64. That is, the tabulated values reflect only the energy consumed in the block cipher circuit, and combining these ciphers with a mode of operation will consume somewhat more energy. Table 4 and Figure 4 show that, as the total amount of encrypted data grows, a suitably unrolled version of Trivium or Grain v1 consumes much less energy than even the most energy efficient stand-alone block cipher. We therefore conclude that Trivium or Grain v1 would also outperform a block cipher combined with a mode of operation.
Shorter vs. longer data lengths: Block ciphers outperform stream ciphers when encrypting a single block of data, but the opposite is true for larger data. For short data, the key initialization phase dominates the energy consumed by a stream cipher. For example, the 1x implementation of Trivium takes 1217 clock cycles to encrypt 64 bits, of which 1152 are spent on key initialization. A single-round implementation of Midori64 takes only 17 cycles to encrypt 64 bits. For longer data, key initialization matters much less for the energy consumption, since it is performed only once. To encrypt 1000 blocks (64000 bits) of data, Trivium 1x requires only 64000 + 1152 + 1 = 65153 cycles. Clearly, 1152 is a much smaller fraction of 65153 than of 1217. Unrolling reduces the encryption time even further. For example, the 160x implementation of Trivium encrypts 160 bits per clock cycle, so encrypting 1000 blocks takes around $1 + \lceil (64000 + 1152)/160 \rceil = 409$ cycles. The most energy-efficient version of Midori64 (2x), on the other hand, takes 9 × 1000 = 9000 cycles for 1000 blocks. The cycle counts thus differ by a factor of roughly 22. The 160x Trivium circuit draws more power per cycle, however, so its energy advantage is smaller: the 160x configuration of Trivium is around 9 times more energy efficient than the best version of Midori64. Figure 4 plots the energy consumed for encrypting up to 10 blocks of data with the most energy-efficient unrolled configuration of each cipher. Midori64 performs best for a single block of data, while Trivium performs best for 6 or more blocks.
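The cycle counts in this paragraph can be reproduced with the Python sketch below. As in the text, it assumes one cycle for loading key and IV, 1152 initialization rounds for Trivium and `unroll` rounds per clock cycle. For Midori64 it assumes 17 cycles per 64-bit block for the 1x circuit and 9 for the 2x circuit, with blocks processed back to back. The sketch only counts cycles; the reported energy ratio of about 9 additionally depends on the measured average power of each circuit.

```python
import math

TRIVIUM_INIT_ROUNDS = 1152


def trivium_cycles(data_bits, unroll):
    """1 key/IV load cycle, then initialization and keystream rounds,
    `unroll` rounds per clock cycle."""
    return 1 + math.ceil((TRIVIUM_INIT_ROUNDS + data_bits) / unroll)


def midori64_cycles(blocks, cycles_per_block):
    """Cycles for ECB encryption of `blocks` 64-bit blocks, processed back to back."""
    return blocks * cycles_per_block


one_block_1x = trivium_cycles(64, 1)           # 1217
many_blocks_1x = trivium_cycles(64000, 1)      # 65153
many_blocks_160x = trivium_cycles(64000, 160)  # 409
midori_2x = midori64_cycles(1000, 9)           # 9000

print("init share, 1 block:    ", TRIVIUM_INIT_ROUNDS / one_block_1x)
print("init share, 1000 blocks:", TRIVIUM_INIT_ROUNDS / many_blocks_1x)
print("cycle ratio Midori64 2x / Trivium 160x:", midori_2x / many_blocks_160x)
```

Both circuits run at the same clock frequency. A cycle ratio of about 22 combined with an energy ratio of about 9 therefore implies that the 160x Trivium circuit draws roughly 22/9 ≈ 2.4 times the average power of the 2x Midori64 circuit.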
Parabolic behavior with unrolling: With respect to unrolling, the energy consumption of stream ciphers follows the same parabolic behavior as that of block ciphers [BBR15], particularly for longer data. The energy consumption is very high for small degrees of unrolling, reaches a minimum at some degree of unrolling, and increases again beyond this point. Two conflicting effects cause this behavior. For low degrees of unrolling, the energy consumption is high because 1) comparatively many clock cycles are spent on initialization and 2) few bits are encrypted per clock cycle. For example, a single-round version of Grain v1 encrypts one bit of plaintext per clock cycle. To encrypt 32 bits, the design therefore has to pay for the energy consumption of the 160-bit register and the associated logic functions for 1 (key loading) + 160 (initialization) + 32 (keystream) = 193 clock cycles.
Consequently, the energy consumption decreases as the degree of unrolling increases: a larger degree of unrolling means less time spent in initialization and more bits encrypted per cycle. For example, a 32x unrolled version of Grain v1 needs only 160/32 = 5 clock cycles for initialization. To encrypt 32 bits of data, the system has to pay for the energy consumption of the 160-bit register and logic functions for only 1 + 5 + 1 = 7 clock cycles. The logic functions of a 32x unrolled version are, however, more than 32 times as complex as in a 1x design and therefore consume more power. Even so, a 32x implementation of Grain v1 is around 6.5 to 7 times more energy efficient than the 1x version for short data, and around 7.5 times more efficient for longer data.
However, beyond a certain degree of unrolling, further unrolling increases the energy consumption, because the power consumed in the logic functions rises sharply at that point. The reasons are similar to those for block ciphers [BBI+15]. [BBI+15, Figure 2] showed that sequentially placed logic functions consume more power because the increased circuit latency leads to increased glitch propagation. Consequently, each additional unrolled round causes a quadratic increase in power consumption (the term $a r^2 + b r + c$ in Equation (1)). The number of cycles (the term $1 + R/r$), in contrast, decreases only inversely with r, so each additional round saves fewer cycles than the previous one. Unrolling the round functions beyond a certain number therefore usually proves counter-productive. Figure 5 shows the growing share of power consumed by the logic functions in Grain v1 at 1x, 32x and 64x unrolling. At 64x, the round function is the most power-hungry element of the design.

Light/heavy round function: A further conclusion one can draw from Table 4 is that the "lightness" of the round functions in stream ciphers has a similar effect as in block ciphers. [BBR15] showed that block ciphers with light round functions like PRESENT, Twine and SIMON produce fewer glitches when the circuits for more than one round function are connected serially. Block ciphers with light round functions therefore reach energy optimality when unrolled twice, whereas for heavy round functions the single-round versions are the most energy efficient. Table 4 shows that for ciphers like Grain v1, Lizard and Plantlet, whose update functions are algebraically more complex, energy optimality is achieved at small degrees of unrolling. Trivium, in contrast, has an extremely simple round update function consisting of only 3 AND gates and 6 XOR gates, and reaches energy optimality at 160x unrolling.
Unrolled stream ciphers with simple update functions also have a distinct advantage, for reasons similar to those for block ciphers. Simpler, lighter round functions produce fewer glitches themselves. Even when their circuits are unrolled several times, the glitch propagation across circuits is therefore too small to drive up the power consumption. Heavier round functions produce more glitches, and their propagation becomes significant even at smaller degrees of unrolling. Figures 5a and 5b compare the round functions of Grain v1 (heavy) and Trivium (light). At 160x unrolling, the round function of Trivium consumes only 134 µW, around 54% of the total power. In the 64x Grain v1 implementation, by contrast, the round function consumes around 460 µW, i.e., 82% of the total power.

Comparison with Kreyvium
Kreyvium builds upon the Trivium structure by adding two registers for key and IV rotation and two XOR gates. As a result, a 1x unrolled version of Kreyvium consumes 1.5 to 2 times more energy than Trivium, even for longer data. This trend persists for higher degrees of unrolling, except where the number of unrolled rounds is a multiple of 128. Such versions need no additional registers for key and IV rotation, since the rotated key and IV bits can be assumed to be available on the wires, and they therefore consume less energy. Nonetheless, the 2 extra XOR gates in the round function mean that even for multiples of 128, the most energy-efficient configuration of Kreyvium consumes around 10% more energy than Trivium.

Lessons learnt
This section makes clear that stream ciphers with simple update functions have a distinct advantage for encrypting longer data streams. Such ciphers are easier and more energy efficient to unroll to high degrees. Higher degrees of unrolling encrypt more bits per clock cycle. This is crucial for reducing the number of clock cycles needed to encrypt a given length of data, and hence the energy consumption. On the other hand, a higher degree of unrolling makes the update function logic more complex, so it needs more power to operate. A sufficiently simple update function keeps this additional power small enough not to outweigh the natural advantages of unrolling. Lastly, the number of initialization rounds does affect the energy figures for short data packets, but its effect becomes minimal as the length of the plaintext grows.

Conclusion
In this paper, we investigated the design of low energy ciphers. We conducted experiments on various design parameters that affect the energy consumption of the encryption process and drew several conclusions from them. Block ciphers are the more energy efficient choice for encrypting short data streams, but multiple-round unrolled stream ciphers perform better for longer data streams. Stream ciphers with simple update functions proved to be more energy efficient, since they can be unrolled without increasing circuit complexity and power consumption too much. We found the Trivium structure to be best suited for this purpose. For encrypting 1000 data blocks, the 160x unrolled implementation of Trivium consumed around 9 times less energy than the best block cipher based solution.