SCB Mode: Semantically Secure Length-Preserving Encryption

Abstract. To achieve semantic security, symmetric encryption schemes classically require ciphertext expansion. In this paper we provide a means to achieve semantic security while preserving the length of messages, at the cost of mildly sacrificing correctness. Concretely, we propose a new scheme that can be interpreted as a secure alternative to (or wrapper around) plain Electronic Codebook (ECB) mode of encryption, and for this reason we name it Secure Codebook (SCB). Our scheme is the first length-preserving encryption scheme to effectively achieve semantic security.


Introduction
In this paper we revisit the classical, insecure Electronic Codebook (ECB) mode of encryption and transform it into one that achieves semantic security, which we call Secure Codebook (SCB). This comes at the cost of imperfect correctness and the need to keep state, but we provide optimal security-correctness trade-offs depending on the setting.

Background and Motivation
Given a block cipher over $\{0,1\}^n$, typical modes of operation for symmetric encryption map plaintext messages of length $\ell n$ to ciphertexts of length at least $(\ell + 1)n$. An exception is the Electronic Codebook (ECB) mode of operation, whose ciphertexts have length exactly $\ell n$, and which is therefore length-preserving. Still, ECB is the archetypal example of an insecure mode of encryption, in that it fails to achieve semantic security. The reason is that semantic security, conventionally phrased in terms of indistinguishability under a chosen-plaintext attack (IND-CPA), is easily broken for ECB simply by comparing the encryption of a message with repeating blocks against that of one without (or against a uniformly chosen bit string of the same length).
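The ECB weakness described above can be observed directly. The following is a minimal sketch, not part of the original construction: `toy_block_cipher` is a hypothetical stand-in for $E_K$ (a 4-round Feistel network with an HMAC-SHA256 round function, chosen only because it is a permutation that needs nothing beyond the Python standard library), and all names are illustrative.

```python
import hashlib
import hmac

BLOCK = 16  # block length n in bytes (n = 128 bits); an assumption of this sketch


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function of the toy Feistel network (illustrative only).
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[: BLOCK // 2]


def toy_block_cipher(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for E_K: a 4-round Feistel permutation on 16 bytes."""
    left, right = block[: BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def ecb_encrypt(key: bytes, message: bytes) -> bytes:
    """Plain ECB: every block is enciphered independently with the same key."""
    assert len(message) % BLOCK == 0
    return b"".join(
        toy_block_cipher(key, message[i: i + BLOCK])
        for i in range(0, len(message), BLOCK)
    )


key = b"demo key"
c_repeating = ecb_encrypt(key, b"A" * 16 + b"A" * 16)  # two equal blocks
c_distinct = ecb_encrypt(key, b"A" * 16 + b"B" * 16)   # two different blocks

# Length is preserved, but equal plaintext blocks yield equal ciphertext blocks,
# which is exactly what an IND-CPA distinguisher looks for.
print(len(c_repeating) == 32)
print(c_repeating[:16] == c_repeating[16:])
print(c_distinct[:16] == c_distinct[16:])
```

The second comparison holds by determinism and the third fails because a permutation maps distinct inputs to distinct outputs, so the distinguisher needs a single query.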
Semantic security was first introduced and achieved by Goldwasser and Micali [GM82], who defined it for probabilistic (public-key) encryption. Bellare et al. [BDJR97] later adapted the notion to symmetric encryption, and subsequently Rogaway [Rog04] initiated the rigorous study of semantically secure deterministic (symmetric) encryption, made possible by the use of nonces. A typical example of a semantically secure mode of encryption is counter (CTR) mode, where a nonce of $n$ bits (or simply a random string, if one wants to obtain a probabilistic encryption scheme) is repeatedly incremented and fed to the block cipher to obtain pseudorandom bit strings to be used as (one-time) pads for each message block. Because the nonce is used as a counter and thus never repeats, the PRP/PRF switching lemma guarantees that each block, even a repeated one, is padded with a bit string that is computationally indistinguishable from a uniformly and independently distributed one. But clearly, the nonce needs to be transmitted (in or out of band) as part of the ciphertext, effectively contributing to the expansion of the ciphertext by one block.
Still, with CTR mode one can in principle achieve length preservation by pre-agreeing on an initial counter and then keeping state. Crucially, however, ciphertexts must not get reordered or dropped (deleted) in transit. To see why, imagine that Alice and Bob agree to set the initial counter to 1, and suppose for simplicity that Alice wants to send single-block messages $M_1, \ldots, M_\ell \in \{0,1\}^n$ to Bob. She will encrypt each message as $C_i \doteq E_K(i) \oplus M_i$ using the secret key $K$ and a pre-agreed block cipher $E$, but an adversary will reorder the ciphertexts in transit, effectively delivering the permuted sequence $C'_1, \ldots, C'_\ell$ to Bob, where $C'_i \doteq C_{\pi(i)}$ for some permutation $\pi$. This means that Bob will then decrypt each message as $M'_i \doteq E_K(i) \oplus C'_i$, but clearly $M'_i = M_i$ with probability 1 if and only if $\pi(i) = i$. Therefore, only messages that have been encrypted using a counter that is a fixed point of $\pi$ will be correctly decrypted with probability 1. Moreover, if even just one of the ciphertexts is dropped by the adversary (and even if the others are not permuted), correctness of the scheme is completely lost for any subsequent message.
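The reordering failure of stateful CTR mode can be reproduced in a few lines. This is a minimal sketch under stated assumptions: `pad` is a hypothetical stand-in for $E_K(i)$ built from HMAC-SHA256 (CTR mode only needs a pseudorandom function), and the key, messages, and permutation are made up for illustration.

```python
import hashlib
import hmac


def pad(key: bytes, counter: int, n_bytes: int = 16) -> bytes:
    """Stand-in for E_K(i): one block of pseudorandom pad for counter value i."""
    return hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()[:n_bytes]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = b"shared secret key"
messages = [bytes([i]) * 16 for i in range(1, 5)]  # M_1, ..., M_4

# Alice: C_i = E_K(i) xor M_i, with the pre-agreed initial counter 1.
ciphertexts = [xor(pad(key, i), m) for i, m in enumerate(messages, start=1)]

# Adversary: deliver C'_i = C_{pi(i)}; this pi swaps the first two ciphertexts.
pi = [2, 1, 3, 4]
delivered = [ciphertexts[p - 1] for p in pi]

# Bob: M'_i = E_K(i) xor C'_i, using his own running counter.
decrypted = [xor(pad(key, i), c) for i, c in enumerate(delivered, start=1)]

# Position i is correct exactly when the pad Bob uses matches the pad Alice used.
correct = [d == messages[p - 1] for d, p in zip(decrypted, pi)]
print(correct)
```

Only the positions that are fixed points of `pi` (here the third and fourth) come out right; the swapped positions are XORed with the wrong pad.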
It is nevertheless well known that length-preserving encryption (or rather, enciphering) is indeed possible if one is willing to give up semantic security and settle for the weaker notion of pseudo-random permutation (PRP) security, adapted to variable-length bit strings. The study of variable-input-length (VIL) ciphers was initiated by Bellare and Rogaway [BR99], who asked how to transform a block cipher with fixed input length into a VIL cipher in such a way that the length is preserved. This is useful, for example, in networking applications where a packet format needs to be upgraded by adding privacy features under the constraint that the packet preserves its exact structure.
Bellare and Rogaway stated that "semantically secure encryption cannot possibly be length preserving", and here we want to challenge exactly that statement. We argue that this is not only a theoretically interesting question, but that it also captures scenarios that are not unlikely in practice. For example, consider a small low-power IoT device that needs to communicate data confidentially in a strong (that is, semantically secure) sense via UDP packets to some server, knowing that plaintext messages might be repeated, or might simply contain repeating blocks within or across messages. Note that since UDP, unlike TCP, does not define a session construct, reordering and dropping (deletion) of packets cannot be excluded. Now, it is reasonable to assume that the device might need to regularly send and receive a small number of encrypted messages, say $d$ sessions per day consisting of $m$ messages each, and that each message is made up of a small number of blocks (whose length is defined by the block cipher), say at most $b$. Then, with conventional IND-CPA secure modes of encryption, the overall number of blocks that need to be transmitted per day is upper-bounded by $dmb + dm$, whereas with a length-preserving scheme this bound would only amount to $dmb$, which for large $d$ and small $m$ and $b$ represents a significant gain in communication efficiency. The length-preserving approach with CTR mode described above would not be suitable in this setting, because using UDP might cause ciphertexts to be reordered or deleted. This naturally leads us to the question that we aim to answer in this paper: Can we achieve semantically secure length-preserving symmetric encryption such that correctness is not completely lost in case of reordering or deletion?
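To make the block counts above concrete, here is a small worked computation. The parameter values are hypothetical and chosen only for illustration; the function names are not from the paper.

```python
from fractions import Fraction


def blocks_per_day(d: int, m: int, b: int, length_preserving: bool) -> int:
    """Upper bound on transmitted blocks: d sessions, m messages each, at most b blocks per message."""
    payload = d * m * b
    overhead = 0 if length_preserving else d * m  # one extra block per message
    return payload + overhead


def relative_saving(d: int, m: int, b: int) -> Fraction:
    conventional = blocks_per_day(d, m, b, length_preserving=False)
    preserving = blocks_per_day(d, m, b, length_preserving=True)
    return Fraction(conventional - preserving, conventional)


d, m, b = 1000, 2, 3  # hypothetical workload
print(blocks_per_day(d, m, b, length_preserving=False))  # 8000
print(blocks_per_day(d, m, b, length_preserving=True))   # 6000
print(relative_saving(d, m, b))                          # 1/4
```

Since $dm / (dmb + dm) = 1/(b+1)$, the relative saving depends only on $b$, while the absolute saving of $dm$ blocks grows with the number of sessions and messages.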
We will answer this question in the positive for the case of reordering, and show that it is indeed possible to achieve the desired goal at the cost of (tunable) imperfect correctness and keeping state.

Contribution
We answer the above question constructively, by providing a concrete stateful symmetric encryption scheme that achieves semantic (IND-CPA) security, but that has imperfect correctness. This does not mean that the scheme is impractical, but rather that it should be used in a way that (provably) provides the desired level of correctness. The core of our scheme is a mode of encryption, that is, a way to use a block cipher $E : \{0,1\}^\kappa \times \{0,1\}^n \to \{0,1\}^n$ in order to encrypt bit strings whose length is a multiple of the block length $n$.
Our mode of encryption can be seen as a (black-box) adaptation of the classical (but clearly not semantically secure) Electronic Codebook (ECB) mode, and for this reason we named our new mode Secure Codebook (SCB). More precisely, given oracle access to ECB encryption and decryption functions (that is, for an unknown secret key $K_1$), SCB encryption uses an additional key $K_2$, keeps state, and uses the ECB encryption oracle, and SCB decryption also uses $K_2$, keeps state, and uses the ECB decryption oracle. Therefore, if for any (probably unsound) reason a protocol employs ECB as a sub-module, it is possible to wrap such a module in a black-box way and achieve what any semantically secure scheme would achieve in its place, without having to change the syntax of the exchanged messages, and only at the cost of keeping some state and sacrificing some correctness (which nevertheless remains very close to perfect if the parameters are reasonably chosen).
Our approach is quite simple: Any time a block is repeated (within or across messages), rather than enciphering it again with the block cipher, we encipher a specially formatted block that, when decrypted, will with high enough probability be detected as signaling the repetition of a previous block. Clearly, forcing every block (even such repetition signals) to be of the same size must come at the cost of correctness. More precisely, we split the block cipher domain into valid messages and repetition signals, so it is imperative to design this split carefully, so that with good enough probability deciphered message blocks will not be misinterpreted as repetitions. Finally, applying the so-called ciphertext stealing (CTS) paradigm turns our mode of encryption into an encryption scheme for strings of any length. Moreover, our scheme can also be easily adapted to be a (PRP) secure length-preserving enciphering scheme (or VIL cipher), albeit one with imperfect correctness. This can be achieved simply by not keeping state.
We provide two variants of our scheme. The simpler one can be used in a setting where no reordering of transmitted ciphertexts takes place. The second one is slightly more complex, because it assumes that reordering of transmitted ciphertexts does take place. The idea is simply to tag blocks of decrypted messages that have the structure of a repetition signal, but for which no reference block has been deciphered so far. This way, if a message containing the reference block is decrypted later, the receiver can (at least partially) reconstruct the previous messages with tagged blocks. Note that this feature is impossible to achieve with CTR mode if reordering or deletion of transmitted ciphertexts takes place.

Related Work
The idea of designing length-preserving (symmetric) encryption schemes is not new. What is new is the axis along which we approach the problem: Rather than asking how to appropriately weaken the security notion of such a scheme, we ask how we can weaken its correctness, while maintaining a high level of security, in such a way that the scheme is still usable in practice. In fact, previous work on length-preserving encryption (LPE) addresses the problem we are considering here, but from a different perspective. Introduced by Bellare and Rogaway [BR99], LPE can essentially be understood as the problem of turning a regular (fixed-size) block cipher into a VIL cipher, such that for each possible length $\ell$ (and key), the new scheme implements a permutation over bit strings of length $\ell$. Clearly, this implies that semantic security is unattainable by an LPE scheme, and therefore the best hope is to achieve a modification of the classical pseudo-random permutation (PRP) security notion for block ciphers, which simply asks that for each possible length $\ell$ (and key), the scheme is indistinguishable from a uniformly random permutation over bit strings of length $\ell$. For this reason, it would in fact be more correct to understand the E in LPE as enciphering, rather than encryption.
The original work by Bellare and Rogaway [BR99] provided a concrete scheme that can be abstractly described as a two-pass CBC-MAC over the input message of arbitrary size. Subsequently, Bleichenbacher and Desai [BD99] refined this scheme to achieve Strong PRP (SPRP) security. More efficient constructions were later found by Patel et al. [PRS04]. In [CYK04] and [CKY07], Cook et al. introduce the notion of elastic block ciphers (EBC), which, unlike the previous works, achieve LPE by treating only the round function of the underlying block cipher as a black box, not the entire block cipher.
In the literature, there has been a shift in attention towards achieving tweakable LPE from a tweakable block cipher, a primitive originally introduced by Liskov et al. [LRW02]. The first two such schemes are the modes of operation CMC [HR03] and EME [HR04], both introduced by Halevi and Rogaway, but both limited to input lengths which are a multiple of the block length (hence LPE schemes that are not VIL schemes). The first truly VIL tweakable LPE scheme, the Extended Codebook (XCB) mode of operation, was described by McGrew and Fluhrer [MF04], who combined a block cipher with a universal hash function to realize an unbalanced three-round Feistel network. The same authors only later formally proved its security in [MF07]. Halevi extended EME into EME* [Hal04], achieving a fully VIL tweakable LPE scheme; Wang et al. [WFW05] proposed HCTR, based on the counter (CTR) mode of encryption, and later Chakraborty and Nandi [CN08] improved its security bound. A series of schemes based on ECB followed: PEP by Chakraborty and Sarkar [CS06], TET by Halevi [Hal07], and HEH by Sarkar [Sar07]. Further schemes improving on HCTR are HSE by Minematsu and Matsushima [MM07], HCH by Chakraborty and Sarkar [CS08], and HMC by Nandi [Nan08]. More recent schemes include $\mathrm{TCT}_1$ and $\mathrm{TCT}_2$ by Shrimpton and Terashima [ST13] and Adiantum by Crowley and Biggers [CB18].
We stress again that of all the above mentioned works, none provides an LPE scheme achieving semantic security, which is precisely what we do here for the first time.

Notation
Let $\mathbb{N} = \{1, 2, \ldots\}$. For any $n \in \mathbb{N}$, we use the convention $[n] \doteq \{1, \ldots, n\}$. For a set $S$ we denote the set of all (non-empty) sequences of length at least $n$ over $S$ as $S^{\geq n} \doteq \bigcup_{i \geq n} S^i$, and we also define $S^+ \doteq S^{\geq 1}$. For some $x = (x_\ell, \ldots, x_1) \in S^+$, with $\ell \in \mathbb{N}$, we define $|x| \doteq \ell$ as well as $[x]_t \doteq (x_t, \ldots, x_1)$, for any $1 \leq t \leq \ell$. For another $y = (y_{\ell'}, \ldots, y_1) \in S^+$, with $\ell' \in \mathbb{N}$, we define $x \| y \doteq (x_\ell, \ldots, x_1, y_{\ell'}, \ldots, y_1)$. When $S = \{0,1\}$, we call such sequences bit strings. For any $n \in \mathbb{N}$, by $F_n$ we denote the set of all functions $\{0,1\}^n \to \{0,1\}^n$, and by $P_n$ the set of all bijections (permutations) $\{0,1\}^n \to \{0,1\}^n$.

For any $k, v \in \mathbb{N}$, we model a look-up table $T$ mapping key bit strings of length $k$ to value bit strings of length $v$ as a function $\{0,1\}^k \to \{0,1\}^v \cup \{\bot\}$ (with $\bot \notin \{0,1\}^+$), and we define the following operations: Initializing a look-up table $T$ to an empty one is denoted $T \leftarrow [\,]$; assigning value $V$ to key $K$ in $T$ is denoted $T[K] \leftarrow V$, and we assume that any value previously assigned to $K$ will be overwritten by $V$; reading the value assigned to key $K$ in $T$ and assigning it to $V$ is denoted $V \leftarrow T[K]$, and if $T$ does not hold any value for $K$ (that is, no value has been assigned to $K$ in $T$ before), then $V$ will be assigned the special symbol $\bot$.

Finally, if $X$ is a finite set, we let $x \xleftarrow{\$} X$ denote picking an element of $X$ uniformly at random and assigning it to $x$, and for an algorithm $A$ we let $y \leftarrow A^{O_1, O_2, \ldots}$ denote running $A$ with oracle access to $O_1, O_2, \ldots$, modeled as functions, and assigning the output to $y$.
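The look-up table operations above map naturally onto a dictionary. The following is a minimal sketch; the class name `LookupTable` and the use of `None` to play the role of $\bot$ are illustrative choices, not notation from the paper.

```python
from typing import Dict, Optional


class LookupTable:
    """Models T : {0,1}^k -> {0,1}^v ∪ {⊥}, with bit strings written as Python str."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}       # T <- [ ]

    def assign(self, key: str, value: str) -> None:
        self._entries[key] = value                # T[K] <- V (overwrites any old value)

    def read(self, key: str) -> Optional[str]:
        return self._entries.get(key)             # V <- T[K]; None stands for ⊥


T = LookupTable()
print(T.read("0101"))      # nothing assigned yet, so the result is None (⊥)
T.assign("0101", "11")
T.assign("0101", "00")     # overwrites the previous value
print(T.read("0101"))
```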

Games, Adversaries, and Reductions
We work in the concrete security setting pioneered by Bellare et al. [BKR94, BDJR97], and use the code-based game-playing framework of Bellare and Rogaway [BR06]. A game $G$ specifies a number of procedures $O_1, O_2, \ldots$ that model oracles for an adversary $A$. $G$ also optionally defines a procedure Init, and (if not specified otherwise) $A$ will output a bit $b$. Execution of adversary $A$ with game $G$ then consists of running $A$ with oracle access to Init (if present) and $O_1, O_2, \ldots$, with the restriction that $A$'s single call to Init (if present) must be its first overall call. The output of the execution is the bit output by $A$, and we use the notation Pr For some security notions, when defining $A$'s advantage, we will directly use the right-hand side expression with oracles modeled by regular functions parameterized by random variables.

Symmetric Encryption
In this work we consider a special type of symmetric encryption: length-preserving and stateful.We also restrict our attention to schemes with message and ciphertext spaces consisting of bit strings with lengths that are integer multiples of a fixed block length, not bit strings of arbitrary length.This makes our exposition easier, and our result more modular, because we will achieve schemes for any length by simply using ciphertext stealing as the very last step (see Section 3.4).
$\mathcal{K}$ is the space of keys, $\mathcal{S}$ is the space of encryption states, and $\mathcal{T}$ is the space of decryption states. The encryption algorithm takes as input a key $K \in \mathcal{K}$, a message $M \in \{0,1\}^{\ell n}$, for some $\ell \in \mathbb{N}$, and an encryption state $S \in \mathcal{S}$, and returns a ciphertext-state pair $(C; S') \leftarrow \Pi.E(K, M; S)$, such that $C \in \{0,1\}^{\ell n}$ (length preservation). The decryption algorithm takes as input a key $K \in \mathcal{K}$, a ciphertext $C \in \{0,1\}^{\ell n}$, for some $\ell \in \mathbb{N}$, and a decryption state $T \in \mathcal{T}$, and returns a message-state pair $(M; T') \leftarrow \Pi.D(K, C; T)$, such that $M \in \{0,1\}^{\ell n}$.
In this paper we will assume for simplicity that the key distribution is the uniform distribution over $\mathcal{K}$. Note that we did not include any correctness requirement in Definition 1. For better readability, we will use the following short-hand notation:
• For any key $K$, encryption state $S$, message $M$, decryption state $T$, and ciphertext $C$, we define
• We assume that (syntactically) the state is "passed by reference" to the encryption and decryption algorithms, that is, we write , meaning that encryption and decryption algorithms implicitly modify the state as a side effect.
Finally, we introduce a new operation that enhances the correctness of an LPSE scheme in case transmitted ciphertexts are reordered (the above remarks apply to it analogously).
Definition 2. For some $n \in \mathbb{N}$, a Recoverable LPSE (R-LPSE) scheme Π = (E, D, D, R) is an LPSE scheme (E, D) which additionally defines a tagged decryption algorithm and a recovery algorithm R. The tagged decryption algorithm takes as input a key $K \in \mathcal{K}$, a decryption state $T \in \mathcal{T}$, and a ciphertext $C \in \{0,1\}^{\ell n}$, for some $\ell \in \mathbb{N}$, and returns a tagged message $(M, t)$. Tagged decryption tags each deciphered block $M_i$, for $i \in [\ell]$, that is deemed ambiguous by setting $t_i$ to 1 (and to 0 otherwise). A block is deemed ambiguous if it signals a repetition, but no previous plaintext block can be found in the decryption state. After a batch of ciphertexts has been transmitted (that is, after a session has terminated), if the communication channel did not guarantee that the order was preserved, running the recovery algorithm on the batch can then resolve any such ambiguity.

Semantic Security
For some $n \in \mathbb{N}$, let $\Pi$ be an LPSE scheme with key space $\mathcal{K}$ and message and ciphertext spaces $(\{0,1\}^n)^+$. We define semantic security of $\Pi$ as indistinguishability from random ciphertexts (IND-CPA) (as introduced in [AR00, RBBK01]). Considering games $G^{\text{ind-cpa-0}}_\Pi$ and $G^{\text{ind-cpa-1}}_\Pi$ in Figure 1, we define the advantage of an IND-CPA adversary $A$ as We let $\beta(A)$ denote the total number of $n$-bit blocks queried to Enc by $A$.

Correctness
For some $n \in \mathbb{N}$, let $\Pi$ be an (R-)LPSE scheme with key space $\mathcal{K}$ and message and ciphertext spaces $(\{0,1\}^n)^+$. We define two separate notions of correctness. The first models the setting where ciphertexts are not reordered in transit, and therefore if $\Pi$ is just an LPSE (and not R-LPSE) scheme, it is the only correctness notion that can be achieved (if $\Pi$ is an R-LPSE scheme, it can also satisfy this notion). The second models the setting where ciphertexts are reordered in transit, and therefore it only applies in case $\Pi$ is an R-LPSE scheme.

Without Reordering of Ciphertexts
We define correctness (COR) (without reordering) of $\Pi$ as the problem of distinguishing between an oracle that, given a message, returns the decryption of its encryption, and an oracle that simply returns the queried message.

[Figure: pseudocode of the games $G^{\text{cor-0}}_\Pi$ and $G^{\text{cor-wr-0}}_\Pi$, each beginning with a procedure Init; the figure notes that an input must be a permutation.]

We let $\beta(A)$ denote the total number of $n$-bit blocks queried to EncDec by $A$.

With Reordering of Ciphertexts
Assume that $\Pi$ is an R-LPSE scheme. We define correctness with reordering (COR-WR) of $\Pi$ as the problem of distinguishing between an oracle that, given a sequence of $s \in \mathbb{N}$ messages and a permutation on $[s]$, (1) encrypts the messages in the given order, (2) tag-decrypts them in the permuted order, (3) applies the recovery algorithm to the list of decrypted messages, and (4) returns the sequence of (permuted) recovered messages, and an oracle that simply returns the permuted sequence of queried messages. Considering games $G^{\text{cor-wr-0}}_\Pi$ and $G^{\text{cor-wr-1}}_\Pi$ in Figure 3, we define the advantage of a COR-WR adversary $A$ as We let $\beta(A)$ denote the total number of $n$-bit blocks queried to EncDecRec by $A$.

PRP Security
Let $\kappa, n \in \mathbb{N}$, $E : \{0,1\}^\kappa \times \{0,1\}^n \to \{0,1\}^n$, and for any $K \in \{0,1\}^\kappa$ and $M \in \{0,1\}^n$, define $E_K(M) \doteq E(K, M)$. Then $E$ is a block cipher if, for any $K \in \{0,1\}^\kappa$, $E_K$ is bijective, that is, $E_K$ is a permutation on $\{0,1\}^n$. We say that $E$ is a pseudorandom permutation (PRP) if it is indistinguishable from a uniformly selected permutation. We define the advantage of a PRP adversary $A$ as We let $q(A)$ denote the total number of queries to $E_K$ made by $A$.

Collision Resistance
Let $m, n \in \mathbb{N}$ with $m > n$, and $H : \{0,1\}^m \to \{0,1\}^n$. We say that $H$ is a collision resistant (CR) compression (hash) function if it is hard to find two pre-images of $H$ with the same image. We define the advantage of a CR adversary $A$ as Note that we consider unkeyed compression functions, which means that there always exists an efficient CR adversary with advantage 1. But as pointed out in [Rog06], this does not imply that one can actually find such an adversary. Rather, in our proofs we give explicit constructions of CR adversaries from other adversaries (by means of a reduction). Still, it is possible to make our results more rigorous by letting compression functions be keyed. In this case, the target security would be weak collision resistance (WCR) from [BCK96].

Secure Codebook (SCB) Mode of Encryption

The Scheme
We design a stateful symmetric encryption scheme starting from a block cipher $E$ and a compression function $H$, that is, we specify a mode of encryption. The main idea is to encipher each newly seen block normally with $E_{K_1}$, for some key $K_1$, and keep track of how many times each block has been seen so far via a look-up table $S$ that maps blocks to integer counters; then, for each block that has been seen previously, rather than enciphering the block again (which is what ECB would do), we generate a hash value with $H$, append it to the bit string representing the number of times the block has previously been seen (retrieved from $S$), pad with enough zeros, XOR with a second key $K_2$ of the same length as a block, and encipher the resulting bit string.
For decryption, we will also keep state by using a look-up table $T$ mapping hash values to blocks. For each block of a ciphertext, we initially decipher it with $E^{-1}_{K_1}$, and then decide whether we believe the result to be the bit string intended upon encryption, in which case we store it in $T$ under its hash value defined by $H$, or whether it was a signal of a repetition. We always guess the latter case if the deciphered block has the right structure, that is, if it is appropriately zero-padded and its last bits correspond to a hash value that is contained in $T$; in this case, we simply retrieve the block from $T$.
Clearly, things can go wrong, but we will show that under appropriate conditions our scheme is still practical and achieves semantic security. The first case in which correctness is violated is if two blocks are mapped to the same hash value by $H$. Such a collision would force encryption to signal a wrong repetition, and the probability of such an event is upper-bounded by the collision resistance of $H$. The second case is if a block of a message to be encrypted is such that, when XORed with $K_2$, it has the structure of a repetition signal, and we will bound this event with a concrete probability. We now formally describe the scheme, and will prove its security and correctness in the next sections. We also provide a schematic description in Figure 5.

Definition 3. Let $\kappa, n, \sigma, \tau \in \mathbb{N}$ with $\sigma + \tau < n$, $E : \{0,1\}^\kappa \times \{0,1\}^n \to \{0,1\}^n$ a block cipher, and $H : \{0,1\}^n \to \{0,1\}^\tau$ a compression function. Also let $\mathcal{S}$ be the set of $\{0,1\}^\tau \to \{0,1\}^\sigma \cup \{\bot\}$ look-up tables, and $\mathcal{T}$ the set of $\{0,1\}^\tau \to \{0,1\}^n \cup \{\bot\}$ look-up tables. The Secure Codebook (SCB) mode of encryption is the LPSE scheme $\mathrm{SCB}[E, H] \doteq (E, D)$ with key space $\mathcal{K} = \{0,1\}^\kappa \times \{0,1\}^n$, encryption states space $\mathcal{S}$, decryption states space $\mathcal{T}$, and encryption and decryption algorithms $E$ and $D$ as defined in Figure 4.
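The prose description of SCB can be turned into a short executable sketch. It is not the algorithm of Figure 4, which is not reproduced here, and rests on several assumptions: blocks are Python integers below $2^n$, `E`/`E_inv` are a hypothetical toy Feistel permutation standing in for the block cipher, `H` is SHA-256 truncated to $\tau$ bits, the encryption table is keyed by hash value (as in the look-up table types of Definition 3), the counter starts at 1 on the first repetition, and a signal is laid out as zeros, then the counter, then the hash.

```python
import hashlib
import hmac

N, SIGMA, TAU = 128, 16, 24            # n, sigma, tau with sigma + tau < n (illustrative values)
HALF, MASK, TAU_MASK = N // 2, (1 << (N // 2)) - 1, (1 << TAU) - 1


def _f(key: bytes, i: int, half: int) -> int:
    d = hmac.new(key, bytes([i]) + half.to_bytes(HALF // 8, "big"), hashlib.sha256).digest()
    return int.from_bytes(d[: HALF // 8], "big")


def E(key: bytes, x: int) -> int:
    """Toy 4-round Feistel permutation on n-bit integers (stand-in for E_K1)."""
    l, r = x >> HALF, x & MASK
    for i in range(4):
        l, r = r, l ^ _f(key, i, r)
    return (l << HALF) | r


def E_inv(key: bytes, y: int) -> int:
    l, r = y >> HALF, y & MASK
    for i in reversed(range(4)):
        l, r = r ^ _f(key, i, l), l
    return (l << HALF) | r


def H(block: int) -> int:
    """Stand-in compression function: SHA-256 truncated to tau bits."""
    return int.from_bytes(hashlib.sha256(block.to_bytes(N // 8, "big")).digest(), "big") & TAU_MASK


def scb_encrypt(k1: bytes, k2: int, blocks: list, S: dict) -> list:
    out = []
    for m in blocks:
        h = H(m)
        if h not in S:                         # hash not seen before: plain ECB step
            S[h] = 0
            out.append(E(k1, m))
        else:                                  # repetition: encipher a signal instead
            S[h] += 1
            assert S[h] < (1 << SIGMA)         # the counter must fit into sigma bits
            signal = (S[h] << TAU) | h         # 0^(n-sigma-tau) || counter || hash
            out.append(E(k1, signal ^ k2))
    return out


def scb_decrypt(k1: bytes, k2: int, blocks: list, T: dict) -> list:
    out = []
    for c in blocks:
        x = E_inv(k1, c)
        y = x ^ k2
        h = y & TAU_MASK
        if y >> (SIGMA + TAU) == 0 and h in T:  # zero-padded and hash known: repetition
            out.append(T[h])
        else:                                   # otherwise treat as a fresh block
            T[H(x)] = x
            out.append(x)
    return out


k1 = b"key one"
k2 = int.from_bytes(hashlib.sha256(b"key two").digest()[:16], "big")
A, B, C = (int.from_bytes(bytes([ch]) * 16, "big") for ch in b"ABC")
S, T = {}, {}
c1 = scb_encrypt(k1, k2, [A, B, A, A], S)   # repetitions within a message
c2 = scb_encrypt(k1, k2, [B, C], S)         # repetition across messages
m1 = scb_decrypt(k1, k2, c1, T)
m2 = scb_decrypt(k1, k2, c2, T)
```

In this run each message yields as many ciphertext blocks as plaintext blocks, the six ciphertext blocks are pairwise distinct even though `A` and `B` repeat, and in-order decryption returns the original messages.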

Security
Theorem 1. For any

Proof. Define games $G_0$ to $G_3$ as in Figure 6. Note that, slightly abusing notation, we have . Moreover, $G_2$ and $G_3$ are identical until bad is set to true. To see this, observe that the two games differ in behavior only once $\rho$ in $G_2$ is queried on the same value twice, since in that case the output of Enc is not an independent and uniformly random bit string. This can happen both at lines 16 and 24, but crucially, any repeated query will be input to $\rho$ once at line 16 and once at line 24. More precisely, since $\beta \leq 2^\sigma$, it is impossible to query $\rho$ twice on the same value at line 16 (this is guaranteed by the check at line 12), and it is also impossible to query $\rho$ twice on the same value at line 24 (this is guaranteed by the constantly increasing counter). Such collisions will happen with probability $\Pr[M_i = K_2 \oplus R] = 2^{-n}$, and multiplying by $\beta$, the number of times that $\rho$ is queried in total, gives an upper bound on the probability that bad will be set to true. Furthermore, we have Let adversary $B$ be as in Figure 6. Then: where (1) follows by the PRP/PRF Switching Lemma [BR06, Lemma 1], and (2) follows by the Fundamental Lemma of Game Playing (FLGP) [BR06, Lemma 2]. Finally, since $\beta \geq 1$, we have

In Figure 7 we provide a visual interpretation of Theorem 1. Note that the condition $\beta \leq 2^\sigma$ is a rough worst-case estimate. A more fine-grained condition could depend on the transmitted messages, or better, on their distribution. For example, if it is known that among all transmitted messages each block will not be repeated more than $N \leq 2^\sigma$ times, the condition would be unnecessary, and other more interesting properties of the distributions could significantly relax the original condition. We leave open the problem of improving the condition of Theorem 1 by taking into account the message distribution.

Correctness
Theorem 2. For any COR adversary $A$ with $\beta \doteq \beta(A)$ we can construct a CR adversary $B$ with $q(B) = \beta$ such that

Proof. Define games $G_0$ to $G_4$ as in Figures 8 and 9. Note that, . Moreover, $G_0$ and $G_1$ are equivalent, $G_1$ and $G_2$ are identical until $\mathsf{bad}_0$ is set to true, $G_2$ and $G_3$ are identical until $\mathsf{bad}_1$ is set to true, and $G_3$ and $G_4$ are equivalent. To see that $G_1$ and $G_2$ are identical until $\mathsf{bad}_0$ is set to true, observe that the two games differ in behavior only once a collision in $H$ is provoked, that is, a previous block ($M[j]$) different from the current one ($M_i \neq M[j]$) has the same hash value as the current one ($h = H(M[j])$). As long as this event does not happen, no adversary can distinguish whether the game encodes and subsequently decodes repeated blocks (lines 19-20 and 24-28 in game $G_1$) or simply (internally) marks them by prepending a bit which it later ignores (lines 19 and 29 in game $G_2$). We will reduce provoking such a collision to the collision resistance of $H$. To see that $G_2$ and $G_3$ are identical until $\mathsf{bad}_1$ is set to true, observe that the two games differ in behavior only once a new block (that is, one that has not been queried before) has the structure of a repeated block, that is, when XORed with $K_2$ it has $n - \sigma - \tau$ leading zeros and the last $\tau$ bits correspond to the hash of a previous block. By the union bound, we obtain that this happens with probability at most

Let adversary $B$ be as in Figure 8. Then: where (3) and (4) follow by the FLGP.

[Figure: Games $G_1$ and $G_2$ for the proof of Theorem 2. Changes and additions from game $G_0$ to game $G_1$ are highlighted in the description of $G_1$, and changes and additions from game $G_1$ to game $G_2$ are highlighted in the description of $G_2$.]

[Figure 10, caption fragment: ... 10c and 10d). The value chosen for $\sigma$ guarantees that the condition of Theorem 1 is satisfied. For $\tau = 8$, too many collisions happened, so only certain patterns are visible; for $\tau = 16$, just a few collisions happened (six), but the decrypted image still has some errors; for $\tau = 24$ the original image was successfully recovered without any errors.]
In Figure 10 we provide a visual interpretation of Theorem 2. Note that, even though the factor $2^\sigma$ in the second term of the correctness bound is undesirable, it still gives a meaningful result if $\sigma$ is significantly smaller than $n$, and moreover it can in principle be easily replaced by a better term. To see this, notice that we obtain the term because upon decryption, SCB simply ignores the counter in case of a repetition. It should not be too hard to extend decryption in a way that counters are also accounted for, and in case reordering of ciphertexts might happen, some clever counter prediction technique should be implemented. Still, if ciphertexts do not get reordered, the term should in principle disappear. We leave open the problem of optimizing SCB even further and improving the factor $2^\sigma$.
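The two failure events discussed for correctness can be sized with elementary counting. The sketch below is a back-of-the-envelope estimate and not the bound of Theorem 2: it assumes that $H$ behaves like a random function and that each fresh block XORed with $K_2$ is uniformly distributed, and the function names and the block count of 4096 are hypothetical.

```python
from fractions import Fraction


def expected_hash_collisions(num_distinct_blocks: int, tau: int) -> Fraction:
    """Expected number of colliding pairs among distinct blocks for a random tau-bit hash."""
    pairs = num_distinct_blocks * (num_distinct_blocks - 1) // 2
    return Fraction(pairs, 2 ** tau)


def false_signal_estimate(beta: int, n: int, sigma: int) -> Fraction:
    """Union-bound style estimate that some fresh block looks like a repetition signal.

    An n-bit string has signal structure for a table with at most beta stored hashes
    in at most beta * 2^sigma ways (any sigma-bit counter, any stored hash).
    """
    per_block = Fraction(beta * 2 ** sigma, 2 ** n)
    return min(Fraction(1), beta * per_block)


# How the hash length tau influences collisions for a hypothetical 4096 distinct blocks.
for tau in (8, 16, 24):
    print(tau, float(expected_hash_collisions(4096, tau)))

# The false-signal estimate carries a factor 2^sigma, but stays tiny when sigma << n.
print(float(false_signal_estimate(beta=2 ** 16, n=128, sigma=16)))
```

The first estimate shrinks by a factor of 256 for every additional 8 bits of $\tau$, which matches the qualitative trend reported for $\tau = 8, 16, 24$.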

Extending SCB into a Variable-Input-Length LPSE Scheme
We now show how SCB can be easily extended into the first variable-input-length (VIL) length-preserving encryption (rather than enciphering) scheme achieving semantic security, denoted VIL-SCB. The idea is straightforward: we just need to apply the ciphertext stealing (CTS) paradigm (see e.g. [RWZ12]) to SCB. Informally, given a message $M \in \{0,1\}^{\geq n}$ of arbitrary length (but at least $n$), divide it into $\ell - 1$ blocks $M_1, \ldots, M_{\ell-1}$ of size $n$ (defined by the block cipher) plus a last block $M_\ell$ of size $m \leq n$. If $m = n$, use regular SCB for all blocks; otherwise, first encipher all but the last two blocks to obtain $C_1, \ldots, C_{\ell-2}$; then encipher the penultimate block $M_{\ell-1}$, and split the output into a bit string of length $m$ and one of length $n - m$; the first $m$ bits will form $C_\ell$, whereas the last $n - m$ bits will be

$\{0,1\}^\tau \to \{0,1\}^\sigma \cup \{\bot\}$ look-up tables, and $\mathcal{T}$ the set of $\{0,1\}^\tau \to \{0,1\}^n \cup \{\bot\}$ look-up tables. The Variable-Input-Length Secure Codebook (VIL-SCB) mode of encryption is the VIL-LPSE scheme $\text{VIL-SCB}[E, H] \doteq (E, D)$ with key space $\mathcal{K} = \{0,1\}^\kappa \times \{0,1\}^n$, encryption states space $\mathcal{S}$, decryption states space $\mathcal{T}$, and encryption and decryption algorithms $E$ and $D$ as defined in Figure 11.
Finally, note that if VIL-SCB is made stateless, that is, S and T are reset to the empty look-up table [ ] upon each encryption and decryption, respectively, then we obtain a length-preserving enciphering scheme (still with imperfect correctness), because now the scheme will only be computationally indistinguishable from uniform permutations (for each length at least n).
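The ciphertext stealing step can be illustrated independently of SCB. The following sketch applies the standard ECB-style CTS construction to a plain block permutation, so it is stateless and is not the VIL-SCB algorithm of Figure 11; it works on bytes rather than bits, and `E`/`E_inv` are again a hypothetical toy Feistel permutation.

```python
import hashlib
import hmac

BLOCK = 16  # block length n in bytes (assumption of this sketch)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, i: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[: BLOCK // 2]


def E(key: bytes, block: bytes) -> bytes:
    l, r = block[: BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        l, r = r, _xor(l, _f(key, i, r))
    return l + r


def E_inv(key: bytes, block: bytes) -> bytes:
    l, r = block[: BLOCK // 2], block[BLOCK // 2:]
    for i in reversed(range(4)):
        l, r = _xor(r, _f(key, i, l)), l
    return l + r


def _blockwise(fn, key: bytes, data: bytes) -> bytes:
    return b"".join(fn(key, data[i: i + BLOCK]) for i in range(0, len(data), BLOCK))


def cts_encrypt(key: bytes, msg: bytes) -> bytes:
    assert len(msg) >= BLOCK
    m = len(msg) % BLOCK                   # size of the partial last block (0 means none)
    if m == 0:
        return _blockwise(E, key, msg)
    full = len(msg) - m
    head, penult, tail = msg[: full - BLOCK], msg[full - BLOCK: full], msg[full:]
    x = E(key, penult)                     # encipher the penultimate block
    c_last = x[:m]                         # its first m bytes form the last ciphertext block
    c_penult = E(key, tail + x[m:])        # the "stolen" n - m bytes pad the short block
    return _blockwise(E, key, head) + c_penult + c_last


def cts_decrypt(key: bytes, ct: bytes) -> bytes:
    assert len(ct) >= BLOCK
    m = len(ct) % BLOCK
    if m == 0:
        return _blockwise(E_inv, key, ct)
    full = len(ct) - m
    head, c_penult, c_last = ct[: full - BLOCK], ct[full - BLOCK: full], ct[full:]
    y = E_inv(key, c_penult)               # recovers tail || stolen bytes
    tail, stolen = y[:m], y[m:]
    penult = E_inv(key, c_last + stolen)
    return _blockwise(E_inv, key, head) + penult + tail


key = b"cts demo key"
message = b"length preserving for any size >= n"
ciphertext = cts_encrypt(key, message)
print(len(ciphertext) == len(message))
print(cts_decrypt(key, ciphertext) == message)
```

The ciphertext always has exactly the length of the message, including lengths that are not a multiple of the block size.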

The Scheme
We now describe how to extend SCB mode into an R-LPSE scheme, RSCB, by equipping it with two additional algorithms, tagged decryption and recovery. For the former, the idea is to enhance regular decryption by including a bit string whose length equals the number of blocks in the message/ciphertext, where each bit signals whether the corresponding block was ambiguous, that is, whether when XORed with $K_2$ it is zero-padded like a repetition signal. To recover a set of tagged messages, we then essentially run through all blocks twice: in the first pass, we reconstruct the look-up table $T$, and in the second pass we use $T$ to check whether any ambiguous block had the structure of a repetition, but was received before the original block.
We now formally describe the scheme, and will prove its correctness with reordering in the next section. Note that we do not need to prove security, because it is inherited from Theorem 1 for SCB mode.

= (E, D, D, R) with key space $\mathcal{K} = \{0,1\}^\kappa \times \{0,1\}^n$, encryption states space $\mathcal{S}$, decryption states space $\mathcal{T}$, encryption and decryption algorithms $E$ and $D$ as defined in Figure 4, and tagged decryption and recovery algorithms as defined in Figure 12.
We next provide a simple example to better understand how RSCB works.
Example 1. Let $M_1, \ldots, M_7 \in \{0,1\}^{n}$ be single-block messages, and define [...] collides with a true repetition signal), $M_6$ is such that $h_6 = h_3$ ($M_6$ causes a collision in $H$), and $M_7 = K_2 \oplus h$ with $h \neq h_i$ for all $i \in [7]$ ($M_7$ is a false repetition signal). Except for $M_4 = M_1$, assume all messages are different, and except for $h_4 = h_1$ and $h_6 = h_3$, also assume that no other collisions in $H$ happen. Also assume that $M_4 \neq K_2 \oplus h_1$ and $M_6 \neq K_2 \oplus h_3$. Now, encrypting the messages with $E$ in the specified order, we obtain the following sequence of ciphertexts: Assuming that the seven ciphertexts are sent in the same order, we would obtain the following sequence of messages after decrypting them with $D$: Therefore, only the fifth and sixth messages would be compromised. We will next see that even if we permute the order in which we (tag-)decrypt the seven ciphertexts, after applying the recovery algorithm $R$ we would end up in the same state.

Correctness (with Reordering)
[...], we obtain a game that behaves identically to $G^{\text{cor-wr-1}}_{\text{SCB}[E,H]}$. To see that we can indeed apply the first modification to $G^{\text{cor-wr-0}}_{\text{SCB}[E,H]}$ without changing the behavior of the game, observe that for each block we have five cases: (1) the block is new, (2) the block is a repetition, (3) the block has the structure of a repetition signal for another transmitted block, (4) the block is new but its hash value collides with the hash of a previous block, and (5) the block has the structure of a repetition signal but for a block that was not transmitted. Then, as we saw in Example 1, given a sequence of messages and a permutation, if we apply $E$ to the sequence and then permute the resulting ciphertexts, for each of the five cases the deciphered block is the same whether we apply $D$ and then $R$ to the permuted ciphertext sequence, or just $D$.

Security-Correctness Trade-Off
In the following we assume that SCB is instantiated with a keyed compression function $H : \{0,1\}^{\lambda} \times \{0,1\}^{n} \to \{0,1\}^{\tau}$, for some $\lambda \in \mathbb{N}$. Hence we set the key space to be $K = \{0,1\}^{\kappa} \times \{0,1\}^{n} \times \{0,1\}^{\lambda}$, and adapt the encryption and decryption algorithms to use the key $(K_1, K_2, K_3) \in K$ in the obvious way, that is, simply by replacing calls to $H$ by calls to $H_{K_3}$. Referring to our discussion in Section 2.7, this means that $H$ must satisfy weak collision resistance (WCR) [BCK96, Definition 3.1], so we simply replace $\text{Adv}^{\text{cr}}_{H}(B)$ by $\text{Adv}^{\text{wcr}}_{H}(B')$ in Theorem 2, for an appropriate WCR adversary $B'$. The reason for assuming a keyed compression function is that, for our analysis below to be sound, adversaries must not be able to perform offline queries to $H$. Still, as we will discuss later, our notion of correctness is quite strong, and we do not consider it very problematic if in practice $H$ is instantiated without keys.
Towards finding a trade-off between security and correctness, first note that both bounds from Theorems 1 and 2 do not (explicitly) depend on the correctness parameter $\tau$, but exclusively on the block length $n$, the security parameter $\sigma$, and the total number of transmitted blocks $\beta$. Clearly, the dominating factor is $2^{\sigma}\beta^{2}/2^{n}$ from Theorem 2, so we should minimize $\sigma$. Since from Theorem 1 we have the condition $\beta \leq 2^{\sigma}$, we can derive lower and upper bounds on $\sigma$, given $n$ and $\beta$: Because of the birthday bound (BB), an implicit condition on Theorem 2 is that $\beta \leq 2^{\tau/2}$. But the BB also allows us to roughly lower bound the term $\text{Adv}^{\text{wcr}}_{H}(B')$ by $\beta^{2}/2^{\tau}$. Therefore, since this implies that we should maximize $\tau$, and since $\sigma + \tau \leq n$, we can derive lower and upper bounds on $\tau$ as well, given $n$, $\sigma$, and $\beta$: Note that setting $\tau = n - \sigma$ would imply that the condition $R < 2^{\sigma+\tau}$ at line 5 of $\text{SCB}[E,H].D^{T}_{K_1,K_2}$ from Figure 4 would always be true, as $n = \sigma + \tau$. For efficiency reasons, it might still be better to choose $\tau < n - \sigma$, since a smaller $\tau$ implies fewer look-ups in $T$ on average.
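The constraints named in the paragraph above can be collected in one place. The following is a plausible reconstruction under stated assumptions, not the displays of the original: the upper bound on $\sigma$ assumes that the term $2^{\sigma}\beta^{2}/2^{n}$ is required to stay below 1, and the constraint on $\sigma + \tau$ is read as $\sigma + \tau \leq n$, since the text allows $\tau = n - \sigma$.

```latex
\begin{align*}
\beta \le 2^{\sigma}
  &\;\Longrightarrow\; \sigma \ge \log_2 \beta, \\
\frac{2^{\sigma}\beta^{2}}{2^{n}} < 1
  &\;\Longrightarrow\; \sigma < n - 2\log_2 \beta, \\
\beta \le 2^{\tau/2}
  &\;\Longrightarrow\; \tau \ge 2\log_2 \beta, \\
\sigma + \tau \le n
  &\;\Longrightarrow\; \tau \le n - \sigma.
\end{align*}
```

Under this reading, $\log_2 \beta \leq \sigma < n - 2\log_2 \beta$ and $2\log_2 \beta \leq \tau \leq n - \sigma$. Note also that $2^{\sigma}\beta^{2}/2^{n} = \beta^{2}/2^{n-\sigma}$, so the two terms $2^{\sigma}\beta^{2}/2^{n}$ and $\beta^{2}/2^{\tau}$ coincide exactly when $\tau = n - \sigma$.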
For concreteness, suppose that we have $n = 128$ and that we estimate that the total number of transmitted blocks will not exceed $\beta = 2^{10}$. Then, a reasonable choice of parameters would be $\sigma = 10$ and $\tau = 108$, since in this case $2^{\sigma}\beta^{2}/2^{n} = 2^{-98}$ and $\beta^{2}/2^{\tau} = 2^{-88}$. The latter term dominates, so this essentially gives us 88 bits of security, and since we set $\tau$ to be strictly less than $n - \sigma$, we also avoid performing a look-up in $T$ for every block upon decryption. If we instead estimate $\beta = 2^{20}$, then the best we can do is to set $\sigma = 20$ and $\tau = n - \sigma = 108$, since in this case $2^{\sigma}\beta^{2}/2^{n} = \beta^{2}/2^{\tau} = 2^{-68}$. This essentially gives us 67 bits of security, but we have to perform a look-up in $T$ for every block upon decryption.
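The arithmetic behind such parameter choices is easy to check mechanically. The helper below is a small sketch (the name `scb_bound_exponents` is hypothetical) that works with base-2 exponents only: it takes $n$, $\log_2 \beta$, $\sigma$, and $\tau$, checks the conditions $\beta \leq 2^{\sigma}$, $\beta \leq 2^{\tau/2}$, and $\sigma + \tau \leq n$ as read from the text, and returns the exponents of the two terms $2^{\sigma}\beta^{2}/2^{n}$ and $\beta^{2}/2^{\tau}$.

```python
def scb_bound_exponents(n, log_beta, sigma, tau):
    """Return (e1, e2) such that
    2**sigma * beta**2 / 2**n == 2**e1 and beta**2 / 2**tau == 2**e2,
    where beta = 2**log_beta.
    """
    if log_beta > sigma:
        raise ValueError("need beta <= 2**sigma")
    if 2 * log_beta > tau:
        raise ValueError("need beta <= 2**(tau/2)")
    if sigma + tau > n:
        raise ValueError("need sigma + tau <= n")
    return sigma + 2 * log_beta - n, 2 * log_beta - tau


print(scb_bound_exponents(128, 10, 10, 108))  # (-98, -88)
print(scb_bound_exponents(128, 10, 10, 118))  # (-98, -98)
print(scb_bound_exponents(128, 20, 20, 108))  # (-68, -68)
```

With $\beta = 2^{10}$ and $\sigma = 10$, the two terms only balance at $\tau = n - \sigma = 118$; choosing $\tau = 108$ trades a larger second term for fewer look-ups upon decryption.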
Finally, we remark that in principle it is possible to arbitrarily increase the level of security by simply first applying any conventional length-preserving enciphering technique to E in order to increase its block length, and then instantiating SCB with the resulting block cipher.Clearly, this would be at the cost of overall increased time complexity.

Possible Modifications, Extensions, and Optimizations
An obvious modification to SCB could be to specify the key space as $K \doteq \{0,1\}^{\kappa}$, and then use a pseudorandom generator $\text{PRG} : \{0,1\}^{\kappa} \to \{0,1\}^{\kappa+n}$ to obtain $K_1 \| K_2 \leftarrow \text{PRG}(K)$ from $K \in K$ upon initialization. This would be more in line with SCB being a wrapper around ECB, since the key space of the latter is just that of the block cipher $E$. This naturally extends to the case where the compression function $H$ is keyed by some key $K_3 \in \{0,1\}^{\lambda}$, simply by using a pseudorandom generator $\text{PRG}' : \{0,1\}^{\kappa} \to \{0,1\}^{\kappa+n+\lambda}$ instead.
A possible extension to SCB could be to compute [...] 4, following the approach of the Even-Mansour scheme [EM93, DKS12]. This could be a first step towards extending SCB into a CCA-secure scheme. Moreover, as discussed in Section 3.3, it should not be too hard to extend SCB decryption so that counters are also accounted for, and in case reordering of ciphertexts might happen, some clever counter prediction technique could allow the receiver to correctly guess that some blocks signal repetitions, even if they are transmitted later than the original block.
In order to gain some efficiency, line 8 of $\text{SCB}[E,H].E^{S}_{K_1,K_2}$ could be replaced by $E_{K_2}(R)$ instead (and $\text{SCB}[E,H].D^{T}_{K_1,K_2}$ correspondingly adapted). The downside of this approach is that the resulting scheme would no longer be a wrapper around ECB. Finally, to construct the compression function $H$ efficiently, one could for example follow the approach of [SS08], and then use $H$ itself to implement the look-up tables $S$ and $T$.
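The key-expansion idea can be sketched as follows. The paper does not fix a concrete PRG; here SHAKE-256 from Python's standard `hashlib` is used as a stand-in, which is an assumption made for illustration only, and the function name `derive_keys` is hypothetical. Lengths are given in bytes rather than bits.

```python
import hashlib


def derive_keys(master_key, kappa_bytes, n_bytes, lambda_bytes=0):
    """Expand a single master key into (K1, K2, K3).

    K1 keys the block cipher, K2 is the n-bit mask, and K3 (possibly
    empty) keys the compression function. SHAKE-256 plays the role of
    the PRG here; this choice is an assumption, not part of the scheme.
    """
    total = kappa_bytes + n_bytes + lambda_bytes
    stream = hashlib.shake_256(master_key).digest(total)
    k1 = stream[:kappa_bytes]
    k2 = stream[kappa_bytes:kappa_bytes + n_bytes]
    k3 = stream[kappa_bytes + n_bytes:]
    return k1, k2, k3


# Example with kappa = n = lambda = 128 bits.
k1, k2, k3 = derive_keys(b"\x00" * 16, 16, 16, 16)
```

Both parties derive the same sub-keys from the shared master key, so only $\kappa$ bits need to be agreed upon in advance.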

Further Remarks
In our notion of correctness, we only considered the case where encryption and decryption use the same security and correctness parameters $\sigma$ and $\tau$. Interestingly, there should be enough room for $\sigma$ and $\tau$ such that encrypting and decrypting with (slightly) mismatching parameters still yields a satisfactory level of correctness. Moreover, if the parties cannot pre-agree on $\sigma$ and $\tau$, the sender could simply set a good enough $\sigma$, then repeatedly increase the value of $\tau$ and decrypt the ciphertext itself until it sees no errors; the receiver can then do the same, until increasing $\tau$ no longer changes the decrypted message.
Note that our notion of correctness is very strong: we assume that a sender could send a potentially bad message, that is, one containing a block $M$ such that $K_2 \oplus M < 2^{\sigma+\tau}$. Clearly, a better bound than the one from Theorem 2 could be derived if we simply forbade all such blocks from being part of sent messages. But we feel that this is too strong a requirement in practice, considering that we also still perform the check $T[h] \neq \bot$ at line 5 of $\text{SCB}[E,H].D^{T}_{K_1,K_2}$ from Figure 4, which on average would fail most of the time.
We remark that SCB is in principle parallelizable, if thread-safe look-up tables are available.The only caveat is that each thread should process all its blocks twice, to make sure that repetitions of blocks processed by other threads are correctly accounted for.Moreover, a further advantage of SCB is that error propagation is limited to single blocks, unlike other modes such as CBC where two blocks are affected.
Finally, we believe that our scheme has efficiency comparable to that of most VIL (tweakable) LPE schemes from the literature, while achieving better security. The code of $\text{SCB}[\text{AES-128}, [\cdot]_{\tau} \circ \text{SHA-256}]$ used to generate the images of Figures 7 and 10 is available at https://github.com/fbanfi90/scb. A concrete estimation of the efficiency of this particular instantiation of SCB could be carried out using the results from [KS06].
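As an illustration of the compression function in this instantiation, the sketch below computes SHA-256 of a block and truncates the digest to $\tau$ bits. It assumes that the truncation keeps the $\tau$ most significant bits of the digest; the code in the repository may truncate differently, and the function name `truncated_sha256` is hypothetical.

```python
import hashlib


def truncated_sha256(block, tau):
    """Return SHA-256(block) truncated to its tau leading bits, as an integer."""
    if not 0 < tau <= 256:
        raise ValueError("tau must be between 1 and 256")
    digest = int.from_bytes(hashlib.sha256(block).digest(), "big")
    return digest >> (256 - tau)  # keep the tau most significant bits


h = truncated_sha256(b"\x00" * 16, 108)
assert h < 2 ** 108
```

The resulting $\tau$-bit value is what would index the look-up tables in such an instantiation.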

Conclusions and Future Work
In this paper we initiated the study of the subtle trade-off between security and correctness of symmetric encryption, if it is required that ciphertexts preserve the length of plaintexts.To achieve this goal, we focused on the Electronic Codebook (ECB) mode of encryption, and enhanced it into a mode, Secure Codebook (SCB), that does not have perfect correctness, but achieves semantic security.Even if it only has imperfect correctness, SCB is still practical because its parameters can be set such that the probability of errors upon decryption is negligible.Nevertheless, as previously remarked, both security and correctness bounds we provided for SCB could in principle be improved by slight modifications of the scheme and analysis, and we leave this task open for future work.
Clearly, there might be other ways to achieve semantically secure length-preserving encryption, and we also leave open the question of finding other schemes with potentially better security-correctness trade-offs. One particular aspect of SCB that might limit its practicality is that the state grows linearly with each subsequent encryption and decryption. Therefore, a natural question is whether it is possible to design a semantically secure length-preserving encryption scheme with constant-size state. We suspect that this is not possible (for reasonable sizes), or that it would require further trade-offs. Nevertheless, it might be the case that schemes with a better state growth rate than SCB exist.
We analyzed two different settings for correctness: without and with reordering of ciphertexts. A further notion of correctness that could be introduced and studied for SCB, and in general for future LPSE schemes, is one that considers deletion of ciphertexts as well. For SCB, we would clearly still have a significant advantage over the CTR approach outlined in the introduction with respect to such a notion. Other interesting future directions might be, for example, enhancing SCB so that it achieves CCA security, or even extending it into a (nonce-based) authenticated encryption scheme. For the latter, there might be a connection to the VIL tweakable LPE schemes from the literature.

Figure 3: Games defining correctness with reordering of an R-LPSE scheme Π.

Figure 6: Games $G_0$-$G_3$ and adversary $B$ for the proof of Theorem 1. Changes and additions from game $G_0$ to game $G_1$ are highlighted in the description of $G_1$, and the boxed code therein is exclusive to game $G_2$.

Figure 8: Games $G_0$, $G_3$, $G_4$ and adversary $B$ for the proof of Theorem 2.