Hybrid Code Lifting on Space-Hard Block Ciphers: Application to Yoroi and SPNbox

There is a high demand for whitebox cryptography arising from the practical use of encryption in untrusted environments. It has been actively discussed for two decades, since Chow et al. presented whitebox implementations of DES and AES. The goal is to resist key extraction from the encryption program and to mitigate code lifting of the program. At CCS 2015, Bogdanov and Isobe proposed space-hard block ciphers as a dedicated design of whitebox block ciphers. This design ensures that key extraction is as difficult as key recovery in the standard blackbox model. Moreover, to mitigate code lifting, they introduced space hardness, a kind of leakage-resilient security based on the incompressibility of a huge program. For space-hard ciphers, code lifting (a partial leakage of the entire program) is useless for copying the functionality. In this paper, we consider a new attack model for space-hard block ciphers called hybrid code lifting. Space-hard block ciphers are intended to ensure security under size-bounded leakage. However, they do not consider attackers (in the standard blackbox model) who receive the leakage obtained by code lifting. If such attackers can recover the encryption program of a space-hard block cipher, the cipher does not always satisfy this intention. We analyze Yoroi, proposed at TCHES 2021. We introduce the canonical representation of Yoroi. Using this representation enables the recovery of the programs of Yoroi-16 and Yoroi-32 with complexities $2^{33}$ and $2^{65.6}$, respectively, in spite of slight leakage. The canonical representation also leads to another attack against Yoroi, which breaks the authors' security claim about "longevity". We additionally analyze SPNbox, proposed at Asiacrypt 2016. As a result, considering security against hybrid code lifting, the original number of rounds is insufficient to achieve 128-bit security under quarter-size leakage.


Introduction
The use of block ciphers has become common in various environments. If block ciphers work in unreliable environments, attackers can access or modify their implementations, and can thereby exploit information that is unavailable to attackers in the blackbox model. In particular, attackers who are allowed unlimited access to and modification of the implementation are the strongest that can be assumed. We call such an attack model the whitebox model. Whitebox cryptography aims to ensure security against attackers in the whitebox model. The high demand for whitebox cryptography has been discussed, particularly in the software environment [All14,int18].
Chow et al. introduced whitebox cryptography two decades ago [CEJvO02a,CEJvO02b]. They provided whitebox implementations of the block ciphers DES and AES. The primary goal is to make it difficult for an attacker in the whitebox model to extract the secret key from the implementation. The basic idea is to implement DES or AES by only continual lookups in several tables embedded with round keys. Random linear and nonlinear transformations are applied before and after each table to hide the round keys in the tables. Since the seminal paper by Chow et al., many other whitebox implementations have been proposed [BCD06,Kar10,LN05]. Unfortunately, almost all have been broken [BGE04,WMGP07,MWP10,MRP12,LRM+13,con17]. Recently, Bock et al. pointed out that even attackers in a graybox model (a limited whitebox model) are sufficient to extract the secret key from some whitebox implementations [BHMT16,BBB+19]. Therefore, state-of-the-art whitebox implementations aim to ensure security against attackers in such a limited whitebox model [BBIJ17,BU18,CC19,BU21].
Another direction is designing dedicated whitebox block ciphers, whose whitebox implementations are easy [BBK14,BI15,BIT16,FKKM16]. A space-hard block cipher proposed by Bogdanov and Isobe [BI15] is one of the successful ciphers in this direction. Like whitebox implementations, they use a table, but the table is generated by a secure block cipher (such as AES). A whitebox attacker can observe the table, but extracting the secret key is equivalently difficult to the key-recovery attack in the blackbox model. Thus, we expect that space-hard block ciphers are secure against the key extraction.
Therefore, the main interest in space-hard block ciphers shifts to mitigating code lifting, another whitebox attack model. The attacker's goal is to isolate the program and copy it instead of the secret key. To mitigate code lifting, Bogdanov and Isobe introduced space hardness [BI15]. When the size of the program (table) is $T$, it guarantees that the probability that random plaintexts are successfully encrypted using $M$ ($\ll T$) partial table entries is at most $2^{-Z}$. The intuitive understanding is leakage-resilient security. Even if a whitebox attacker looks at the table and extracts $M$-bit information from the table, the extracted data does not help to copy the encryption program. Considering that the table size ranges from KB to GB orders in common space-hard block ciphers, even the partial data (usually quarter-size, i.e., $M/T = 2^{-2}$) is large, and leaking it is not easy to hide from users. To date, many space-hard block ciphers have been proposed [BIT16,FKKM16,CCD+17,KSHI20,KLLM20].
At TCHES 2021 [KI21], a new dedicated space-hard block cipher, Yoroi, was proposed. Yoroi has a new functionality called longevity, which goes beyond conventional space-hard block ciphers. It enables us to update the table, and the functionality as a block cipher is compatible before and after updating the table. Specifically, ciphertexts generated by the old table can be decrypted using the updated table. The goal is to ensure security against the following attack. An attacker leaks small amounts of data about the table over a long time to avoid being noticed by users. For example, assuming the attacker leaks 16MB every day, 1600MB of data can be collected in 100 days. Eventually, the attacker can collect all table entries. An updatable table (where the secret key is not updated) is promising to address this attack. Once the table of Yoroi is updated, a whitebox attacker needs to restart leaking table entries from the beginning. To our knowledge, Yoroi is the only cipher with this functionality.

Our Contribution
Hybrid Code Lifting. Considering the intention of the leakage-resilient security of space-hard block ciphers, we introduce a new attack model called hybrid code lifting. Our attack model is regarded as the hybrid of blackbox and whitebox models.
In the first phase, an attacker is in the whitebox model, looks at and analyzes the implementation, and leaks size-bounded arbitrary data. In the second phase, a collaborative blackbox attacker receives the leakage and analyzes the block cipher in the standard blackbox model. We say that the block cipher is insecure against the hybrid code lifting if the collaborative attacker can recover the encryption program faster than an exhaustive search of the secret key by exploiting the leakage.
Table notes: Complexity ‡ represents the time and data complexities to recover the encryption program from the leaked information. Arbitrary † represents a whitebox attacker w/o nonvolatile memory. Complexity ‡ represents the time complexity to recover the encryption program from collected leakages, and a query is not required.
At first glance, space-hard block ciphers look secure against hybrid code lifting. However, this intuition is not valid. First, the space hardness assumes that table entries are leaked directly. Second, it does not assume a blackbox attacker receiving the leakage. We believe that our attack model is natural and not extraordinarily strong for the security of space-hard ciphers. In fact, the strong incompressibility [FKKM16], which is a security model for whitebox encryption schemes, assumes similar attackers. Note that, as the authors of [FKKM16] already say, the strong incompressibility was not introduced for space-hard block ciphers. Indeed, no space-hard block cipher satisfies the strong incompressibility because of a trivial attack. Our attack model can be regarded as a revision of the strong incompressibility that is compatible with space-hard block ciphers and can still reasonably be demanded.

Applying Hybrid Code Lifting to Yoroi and SPNbox.
To discuss the impact of our attack model, we apply this attack to Yoroi [KI21], which was recently proposed in TCHES 2021, in Sect. 5. Table 1 summarizes our attacks. If we expect Yoroi to be secure until a quarter-size leakage of the encryption program, 96 KB and 12 GB in Yoroi-16 and Yoroi-32, respectively, we fall short of the expectation by our attacks. Our leakage size is significantly smaller than the quarter size. Specifically, only 800-bit leakage, whose ratio is $800/(3 \times 2^{16} \times 16) \approx 2^{-11.94}$, is sufficient to recover the encryption program of Yoroi-16. Besides, the time complexity to attack Yoroi-16 is even practical. We need to say that this attack is outside the authors' security claims. However, we believe that their claimed security is too optimistic to claim that the practical use case of Yoroi can be secure.
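The size figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming (as the ratio in the text suggests) that the program consists of three tables with $2^{n_{in}}$ entries of $n_{in}$ bits each; the function names are hypothetical and chosen only for this illustration.

```python
import math

def table_bits(num_tables: int, n_in: int) -> int:
    """Total size in bits of num_tables tables with 2^n_in entries of n_in bits."""
    return num_tables * (1 << n_in) * n_in

def quarter_leak_bytes(total_bits: int) -> int:
    """Quarter-size leakage, converted from bits to bytes."""
    return total_bits // 4 // 8

yoroi16_bits = table_bits(3, 16)   # 3 x 2^16 x 16 bits
yoroi32_bits = table_bits(3, 32)   # 3 x 2^32 x 32 bits

# Quarter-size leakage in KB (Yoroi-16) and GB (Yoroi-32).
print(quarter_leak_bytes(yoroi16_bits) // 2**10, "KB")
print(quarter_leak_bytes(yoroi32_bits) // 2**30, "GB")

# log2 of the ratio between an 800-bit leakage and the Yoroi-16 program size.
ratio_exponent = math.log2(800 / yoroi16_bits)
print(round(ratio_exponent, 2))
```

Under these assumptions the script reproduces the 96 KB and 12 GB quarter sizes and the exponent of about -11.94 stated in the text.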
We applied the hybrid code lifting to another space-hard block cipher SPNbox [BIT16] in Sect. 7. Unlike Yoroi, we do not find an extreme vulnerability. However, the original number of rounds would not achieve 128-bit security under quarter-size leakage.
Breaking Claimed Security of Yoroi. We also show another attack against Yoroi in Sect. 6. This attack breaks one of the authors' security claims. The attack target is the longevity of Yoroi. An attacker collects many leakages every table update. The attacker analyzes these leakages and tries to recover the encryption program without querying plaintexts/ciphertexts to encryption/decryption oracles. Table 2 summarizes our attacks. We can recover the encryption program by leakage based on the known-space attack model [BIT16,KI21], which the authors of Yoroi claimed as infeasible. Thus, this attack breaks one of the authors' security claims.
Organization. The rest of this paper is organized as follows. Section 2 introduces whitebox cryptography and space-hard block ciphers. We introduce a hybrid code lifting, a new attack model on space-hard block ciphers, in Sect. 3. The preliminaries for our attacks against Yoroi are summarized in Sect. 4. Sections 5 and 6 show the hybrid code lifting and attacks against the longevity, respectively. The hybrid code lifting on SPNbox is introduced in Sect. 7. Finally, Section 8 concludes the paper.

Block Cipher and Its Whitebox Security
Definition 1 (Block cipher). A block cipher is a function $E : \mathbb{F}_2^\kappa \times \mathbb{F}_2^n \to \mathbb{F}_2^n$, where $\kappa$ and $n$ denote the key length and block length, respectively. The function $E$ is invertible. Thus, there is a decryption function $D : \mathbb{F}_2^\kappa \times \mathbb{F}_2^n \to \mathbb{F}_2^n$ such that $D(K, E(K, P)) = P$ for all $P \in \mathbb{F}_2^n$. We denote by $E_K$ and $D_K$ the encryption and decryption of the block cipher with the fixed secret key $K \in \mathbb{F}_2^\kappa$.
We introduce the term program inherited from [DLPR13]. We use a program as the word in the language-theoretic sense. A program is interpreted in the explicit context of programming and execution models. Successive executions are stateless, i.e., it returns a deterministic output given a fixed input.
There is an efficient program to implement the encryption/decryption of the block cipher, where an efficient program denotes a program that is implementable by reasonable resources and returns output with reasonable time on a modern computer. An attacker can access an encryption (and/or decryption) oracle in the blackbox model. In contrast, in the whitebox model, an attacker can additionally access the program of the block cipher unlimitedly. Specifically, supposing that lookup tables are continuously used in the encryption, such as Chow et al.'s implementation or space-hard block ciphers, the attacker can look at the entire table. We introduce two well-discussed security goals for the whitebox security of the block cipher.

Key Extraction
The first-priority goal is to resist key extraction. The goal of key extraction is to extract $K$ from the program, where a whitebox attacker can access and modify the program without limitation. A straightforward implementation does not resist key extraction because the whitebox attacker can observe and extract the input of the key schedule.
One of the successful directions for resisting key extraction is the dedicated design of whitebox block ciphers [BBK14,BI15]. For example, a space-hard block cipher [BI15], which is our focus, is such a cipher and guarantees that key extraction is as hard as a key recovery attack against a common block cipher in the blackbox model. Thus, the main interest of dedicated designs moves to security against code lifting.

Code Lifting and Related Works for Its Mitigation
The code lifting is to extract the program directly instead of extracting K. Attackers can easily encrypt any message once they successfully extract it. A space hardness, described later in detail, is introduced as a mitigation of code lifting in a space-hard block cipher. There are other mitigations, such as external encoding [CEJvO02a], binding [BBF + 20], and incompressibility [DLPR13]. Although understanding other mitigations is not always necessary to understand our paper, we briefly introduce them.
The external encoding was suggested by Chow et al. [CEJvO02a]. It provides a program $E'$ in which the block cipher $E$ is masked by secret functions $Q_f$ and $Q_l$. We cannot evaluate $E$ using $E'$ without $Q_f$ and $Q_l$. Note that valid users need to use them when they want to use the block cipher. Therefore, the external encoding is helpful in an environment using trusted hardware, where $Q_f$ and $Q_l$ are not exposed to attackers even in the whitebox model.
When the execution of block ciphers is bound to trusted hardware or a trusted application, the encryption program does not work outside of the bound environment. Recently, binding using techniques from public-key cryptography has been discussed, e.g., a scheme using indistinguishability obfuscation [BBF+20] or LWE [ABCW21]. Note that binding requires trusted hardware that attackers never touch, even in the whitebox model.
To the best of our knowledge, Delerablée et al. first introduced incompressibility as a security notion for whitebox cryptography in [DLPR13]. Given an encryption program, incompressibility says that an attacker cannot compress the encryption program to a program whose size is significantly smaller in the whitebox model. They supposed the security risk in digital rights management (DRM), which is one of the most typical applications of whitebox cryptography. In DRM, attackers own the decryption program to decrypt protected contents, and the risk (of a content provider) is the re-distribution of the decryption program. Assuming the encryption program is large, the re-distribution may be somewhat discouraged due to the huge size. Note that the naive incompressibility is not always useful for the use case of whitebox cryptography except for the DRM [BABM20].
The mitigations above are not the only ones. For example, there are traceability [DLPR13], one-wayness [DLPR13], strong whitebox security [BBK14], and so on. Although we concentrate on space hardness in this paper, we stress that there are cases in which other mitigations are more helpful. Space hardness is superior to other mitigations in some respects and inferior in others. For example, a large program size is necessary for a space-hard block cipher, which is a disadvantage. On the other hand, it does not require trusted hardware, which is an advantage.

Space Hardness and Space-Hard Block Cipher
Bogdanov and Isobe introduced the space hardness [BI15], which is similar to the incompressibility but intends more leakage-resilient security than the incompressibility.
Definition 2 ($(M, Z)$-space hardness [BI15]). The implementation of a block cipher $E_K$ is $(M, Z)$-space hard if it is infeasible to encrypt (decrypt) any randomly drawn plaintext (ciphertext) with probability higher than $2^{-Z}$ given any code (table) of size less than $M$.
As shown in existing works (e.g., §5.3.2 in [BI15] or §2.3 in [KI21]), the space hardness is the expected security notion for a leakage-resilient system. Even if an attacker successfully steals part of the entire table, the attacker cannot correctly encrypt (decrypt) a randomly drawn plaintext (ciphertext) with a high probability by using the stolen table entries when the space hardness is guaranteed.
A space-hard block cipher is the block cipher satisfying space hardness. The whitebox implementation (program) of a space-hard block cipher is generated by the following compiler.
Definition 3 (Compiler of space-hard block ciphers). A compiler of space-hard block ciphers is a function $C_E : \mathbb{F}_2^\kappa \times R \to \mathcal{T}$, taking a key $K \in \mathbb{F}_2^\kappa$ and possibly a randomness $r \in R$ drawn from some randomness space $R$. It outputs a table $T \in \mathcal{T}$. Then, there is a program $\tilde{E}_T$, which has the same functionality as the original block cipher $E_K$, namely, $\tilde{E}_T(P) = E_K(P)$ for all $P \in \mathbb{F}_2^n$. Note that the size of the table, denoted by size($T$), is much larger than the size of the original key.
A space-hard block cipher is specified by continual lookups in several tables. The idea is to make tables from well-analyzed block ciphers such as AES for the whitebox implementation.
A block cipher family, SPACE, is the first instantiation [BI15]. SPACE is based on a target-heavy Feistel construction in which the F function is generated by a well-analyzed block cipher (AES in their example) with the secret key by constraining the plaintext and truncating the ciphertext. Figure 1 shows the $r$th and $(r+1)$th round functions of SPACE. The compiler loads the secret key, calculates $2^{n_a}$ table entries, and outputs the program using the table. In contrast, an alternative implementation stores only the short key and runs AES in every round instead of looking up the table. Such an implementation does not satisfy the space hardness.
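To make the role of the compiler concrete, the following is a minimal sketch of a table compiler in the spirit of Definition 3 and of the SPACE construction described above. It is not the SPACE specification: HMAC-SHA256 from the Python standard library is used as a stand-in for the underlying block cipher, and the names `keyed_f`, `compile_table`, and the toy sizes are assumptions of this example.

```python
import hashlib
import hmac

def keyed_f(key: bytes, x: int, n_out: int) -> int:
    """Short-key implementation of F: recompute the keyed function on each call.

    The input is zero-padded to a fixed width ("constraining the plaintext")
    and the output is cut to n_out bits ("truncating the ciphertext").
    HMAC-SHA256 only stands in for the block cipher in this sketch.
    """
    block = x.to_bytes(16, "big")
    digest = hmac.new(key, block, hashlib.sha256).digest()
    return int.from_bytes(digest[:16], "big") >> (128 - n_out)

def compile_table(key: bytes, n_a: int, n_out: int) -> list:
    """Toy compiler: tabulate F on all 2^n_a inputs so the key is no longer needed."""
    return [keyed_f(key, x, n_out) for x in range(1 << n_a)]

key = bytes(range(16))        # hypothetical 128-bit key
table = compile_table(key, n_a=8, n_out=24)

# The whitebox program only performs table lookups; it agrees with the
# short-key implementation on every input.
print(len(table), all(table[x] == keyed_f(key, x, 24) for x in range(256)))
```

The table-based program and the short-key program compute the same function, but only the former forces an attacker to copy a large object, which is the property that space hardness builds on.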
To be precise, space hardness depends on the attack model in the whitebox setting. Bogdanov et al. introduced three attack models in [BIT16].
Definition 4 (Attack Models [BIT16]). Let $F$ be a table used in the whitebox implementation of a space-hard block cipher.
Cho et al. introduced a more straightforward extraction, where an attacker leaks ciphertexts of chosen plaintexts [CCD+17]. Then, one cannot hope for a space hardness better than $Z = n - \log(T)$ when the attacker can leak $(T \times n)$ bits. Recently published space-hard block ciphers such as Galaxy [KSHI20] or Yoroi [KI21] mainly focus on the space hardness against the KSA/CSA.
Space-hard block ciphers are promising as ciphers for a leakage-resilient system. However, the space hardness and existing attack models are not sufficient to claim such security. First, we should suppose attackers extracting arbitrary leakage rather than extracting table entries or plaintexts because there is no reason for attackers to restrict their actions in the whitebox model. Moreover, we need to assume a collaborative blackbox attacker receiving the leakage. If the blackbox attacker can recover the encryption program, we cannot say that such a cipher is leakage-resilient secure. These two insufficiencies motivate us to consider a new extended attack model for space-hard block ciphers in Sect. 3.

Strong Incompressibility [FKKM16]
Fouque et al. introduced strong incompressibility (IND-COM and ENC-COM) in [FKKM16]. The strong incompressibility treats a symmetric-key encryption scheme rather than a block cipher. Unlike the space hardness, it supposes a blackbox attacker receiving the output of an arbitrary leakage function f . The only limitation imposed to f is that the min-entropy of the key remains sufficiently large after the leakage. Then, IND-COM ensures indistinguishability, and ENC-COM ensures that plaintexts are not successfully encrypted without the encryption oracle. We refer to [FKKM16] for the detailed definitions.
The strong incompressibility provides strong security under the leakage by a whitebox attacker. Before the definition of the strong incompressibility, the authors of [FKKM16] mentioned: "Note that in the following definitions, f is not computationally bounded, so generating the tables via a pseudorandom function is not possible." Space-hard block ciphers generate the table via a pseudorandom function, e.g., AES. Since the function $f$ is not computationally bounded, the attacker can choose the function that exhaustively searches the secret key by checking the consistency with the stored table of space-hard block ciphers. It implies that space-hard block ciphers never satisfy the strong incompressibility.
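The trivial attack can be illustrated on a toy scale. The sketch below assumes a hypothetical 12-bit key and a tiny table generated with HMAC-SHA256 as a stand-in pseudorandom function; none of these choices come from [FKKM16] or from any concrete space-hard cipher. The leakage function, which has no time bound, simply searches for the key consistent with the table and leaks that key instead of table entries.

```python
import hashlib
import hmac

KEY_BITS = 12   # toy key length so the exhaustive search finishes quickly
N_IN = 4        # toy table with 2^4 one-byte entries

def compile_table(key_int: int) -> list:
    """Generate the table from the key via a pseudorandom function (stand-in: HMAC)."""
    key = key_int.to_bytes(2, "big")
    return [hmac.new(key, bytes([x]), hashlib.sha256).digest()[0]
            for x in range(1 << N_IN)]

def unbounded_leak(table: list) -> int:
    """Leakage function f without a time bound: exhaustively search the key."""
    for candidate in range(1 << KEY_BITS):
        if compile_table(candidate) == table:
            return candidate
    raise ValueError("no consistent key found")

secret_key = 0xABC
table = compile_table(secret_key)

leak = unbounded_leak(table)       # at most KEY_BITS bits are leaked
rebuilt = compile_table(leak)      # the receiver regenerates the whole table
print(leak.bit_length() <= KEY_BITS, rebuilt == table)
```

With a realistic key length the search is infeasible, which is why bounding the running time of the whitebox phase, rather than the min-entropy of the key, suits space-hard block ciphers.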
Remember that strong incompressibility is introduced to obtain provable security. On the other hand, a space-hard block cipher ensures its security based on the analysis of the best attack algorithm like the security of block ciphers.

Hybrid Code Lifting on Space-Hard Block Ciphers
As discussed in Sect. 2, the intention of the space hardness is to ensure security under the leakage by a whitebox attacker. However, the space hardness does not suppose a blackbox attacker receiving the leakage. We believe that supposing the blackbox attacker is necessary to satisfy the intent of a secure leakage-resilient system. Therefore, we evaluate space-hard block ciphers from a similar aspect to the strong incompressibility. Unfortunately, no space-hard cipher satisfies the strong incompressibility, as discussed in Sect. 2. Therefore, we need to introduce a similar but different attack model that is sound for space-hard ciphers. Figure 2 shows the high-level overview of our attack model. The hybrid code lifting consists of two phases: the 1st and 2nd phases suppose whitebox and blackbox attackers, respectively. Whether the attacker wins or not is finally determined in the 2nd phase. The attack is parameterized by $(\lambda, \tau_w, q, \tau_b)$, where $\lambda$ and $\tau_w$ are parameters for the 1st phase, and $q$ and $\tau_b$ are parameters for the 2nd phase.

Attack Model
Definition 5 (Hybrid code lifting with parameter $(\lambda, \tau_w, q, \tau_b)$). Let $\tilde{E}_T : \mathbb{F}_2^n \to \mathbb{F}_2^n$ denote a program of a space-hard block cipher. We assume the attack consists of two phases.
In the 1st (code-lifting) phase, we assume an attacker who hacks into the encryption program and can do everything against $\tilde{E}_T$. Specifically, the attacker can read all table entries of the space-hard block cipher and perform arbitrary computations on them. The goal is to generate and leak at most $\lambda$ bits. Note that the time complexity of this phase is bounded by $2^{\tau_w}$. In practice, the attacker in this phase may be malware or an attacker who temporarily steals the device.
In the 2nd (blackbox) phase, we assume an attacker receiving the leakage generated in the 1st phase. The attacker no longer analyzes $\tilde{E}_T$ in the whitebox model but analyzes $\tilde{E}_T$ in the blackbox model. Specifically, the attacker queries plaintexts/ciphertexts to the encryption/decryption oracles up to $q$ times. The goal is to recover the encryption program. In other words, the goal is to successfully construct an efficient program $Q : \mathbb{F}_2^n \to \mathbb{F}_2^n$ such that $Q(P) = \tilde{E}_T(P)$ for all $P \in \mathbb{F}_2^n$. Note that the time complexity of this phase is bounded by $2^{\tau_b}$.

Remark on Parameters.
There are four parameters $(\lambda, \tau_w, q, \tau_b)$. The parameter $\lambda$ is the size bound of the leakage. When $\lambda$ = size($T$), the attacker always wins by leaking the table $T$ itself. Thus, we consider $\lambda$ < size($T$). Some space-hard block ciphers are expected to be secure even if a quarter of the table entries is leaked [BI15,BIT16,KI21]. When we inherit this heuristic bound, $\lambda \le$ size($T$)/4 would be required. The parameters $\tau_w$ and $\tau_b$ are the bounds of the time complexity in the 1st and 2nd phases, respectively. To avoid a trivial exhaustive search, $2^{\tau_w} + 2^{\tau_b} < 2^\kappa$ is necessary. The parameter $q$ must be lower than $2^{\tau_b}$. Note that constructing an efficient program is non-trivial, even using the full codebook. Namely, when $\kappa > \tau_b > n$, $q = 2^n$ is a possible parameter. It is similar to the non-triviality of recovering the secret key of the block cipher with the full codebook.
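The constraints on the four parameters can be collected into a small check. This is only a sketch of the conditions listed above; the class and function names are hypothetical, $q$ is passed as its base-2 logarithm for convenience, and the example values of $\tau_w$ and $q$ are illustrative assumptions, not figures from this paper.

```python
from dataclasses import dataclass

@dataclass
class HybridParams:
    leak_bits: float   # lambda: size bound of the leakage in bits
    tau_w: float       # log2 of the time bound of the 1st (whitebox) phase
    q_log2: float      # log2 of the number of oracle queries q
    tau_b: float       # log2 of the time bound of the 2nd (blackbox) phase

def is_admissible(p: HybridParams, table_bits: int, kappa: int,
                  leak_fraction: float = 0.25) -> bool:
    """Check the non-triviality conditions on (lambda, tau_w, q, tau_b)."""
    within_leak = p.leak_bits <= leak_fraction * table_bits   # lambda <= size(T)/4
    beats_search = 2.0 ** p.tau_w + 2.0 ** p.tau_b < 2.0 ** kappa
    queries_ok = p.q_log2 < p.tau_b                            # q < 2^tau_b
    return within_leak and beats_search and queries_ok

size_t = 3 * (1 << 16) * 16   # three 16-bit tables, as in Yoroi-16

# Illustrative parameters only: 800-bit leakage, assumed tau_w = 20, q = 2^32, tau_b = 33.
print(is_admissible(HybridParams(800, 20, 32, 33), size_t, kappa=128))
# Leaking the whole table is excluded by the size bound.
print(is_admissible(HybridParams(size_t, 20, 32, 33), size_t, kappa=128))
```

The heuristic quarter-size bound is exposed as `leak_fraction` so that stricter bounds, such as the size($T$)/64 threshold discussed for longevity, can be tried without changing the check.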

Motivation of Hybrid Code Lifting
Hybrid code lifting is a hybrid of blackbox and whitebox models. We discuss why our attack model reflects the intention of space-hard block ciphers. We also show the difference from previous models.
Blackbox model after receiving leakage. We mainly focus on an attacker in the blackbox model, where the attacker receives the leakage generated by a whitebox attacker. Considering such an attacker is not new in whitebox cryptography. For example, Fouque et al. already supposed such an attacker in the strong incompressibility [FKKM16]. We discuss the necessity to consider such an attacker from the practical motivation of space-hard block ciphers.
The space hardness intends that ignoring slight leakage does not cause any security risk. Reflecting such an intention, the authors of [KI21] suggested the combined use with an anomaly detection system that detects huge leakage by monitoring process or outgoing packets. In other words, they suppose that detecting slight leakage is difficult, and it is not convincing that the attack model changes before and after slight leakage. If we do not need to consider a blackbox attacker after the leakage, the use case of the space-hard block cipher is limited, e.g., there are no blackbox attackers in the first place, or leakage is detectable despite the leakage size. The former is unlikely as a model for whitebox cryptography, and the latter loses the advantage of using space-hard block ciphers.
Hybrid Code Lifting and Longevity. Yoroi can update the table entries to enhance the security against the code lifting. Therefore, the authors suggested the following use case: users monitor the total amount of data traffic sent from the encryption device or the number of executions of the encryption program. If these numbers reach the threshold, which is equivalent to (size(T )/64)-bit leakage, users update their own table. If Yoroi is secure up to (size(T )/64)-bit leakage every table update, this use case is promising.
To provide protection in the use case above, the security claimed by the space hardness is too weak. Assuming that the cipher is vulnerable against the hybrid code lifting with parameters (λ, τ w , q, τ b ), where λ < size(T )/64, the use case above would be insecure because the implementation method does not matter for the blackbox attacker. Even if each table entry is updated, it does not contribute to security once we go to the blackbox phase.
Difference from Strong Incompressibility. As discussed in Sect. 2, space-hard block ciphers do not satisfy the strong incompressibility [FKKM16] because of a trivial attack. However, the attack model behind the strong incompressibility is crucial for the leakage-resilient system. Therefore, we revisit a similar security in space-hard block ciphers. Due to the similarity of the motivation, hybrid code lifting is similar to the strong incompressibility.
Considering the table entries of space-hard block ciphers are generated by block ciphers, only limiting leakage size is not sufficient. To avoid an exhaustive search by the leakage function f , we introduce a bound for the time complexity of the leakage function (whitebox attacker) instead of restriction by a min-entropy. We believe this revision is natural because code lifting is regarded as noise-free leakage by a whitebox attacker with limited running time in practice.
The other main difference is the attack goal. Unlike the strong incompressibility, the goal of the hybrid code lifting is the program recovery, which is the most powerful attack. It is interesting to discuss similar, stronger security (like indistinguishability) for space-hard block ciphers. However, since our primary focus is the attack, we do not discuss such security in our paper.

General Attack Idea for Program Recovery
In our hybrid scenario, the collaborative blackbox attacker exploits size-bounded leakage generated by the whitebox attacker. However, since the target cipher is a space-hard block cipher, it still contains large secret information after leakage. For example, the code (table) size of SPNbox-16 is $2^{16} \times 16$ bits. Assuming a quarter-size leakage, the remainder is $3/4 \times 2^{16} \times 16$ bits. It is significantly larger than a usual block cipher implementation loading a 128- or 256-bit secret key. Thus, it is not easy for the collaborative blackbox attacker to recover the full program even after the leakage.
To recover the vast secret information (table entries), we present attack procedures that recover a part of the (unleaked) table entries. This procedure highly depends on the target cipher: we use a truncated differential in Yoroi but a more straightforward guess-and-determine approach in SPNbox. In many cases, we recover table entries picked randomly. Namely, the procedure often reveals table entries that are already known. This is very similar to the so-called coupon collector's problem. The goal is to collect all table entries (coupons).
Theorem 1 (Coupon Collector's Problem). There are $n$ coupons. We already have $k_0$ coupons. When one trial randomly opens one coupon, the expected number of trials to collect all $n$ coupons is $n H_{n-k_0} \approx n(\ln(n - k_0) + \gamma)$, where $H_n$ is the $n$th harmonic number, and $\gamma$ is Euler's constant, i.e., $\gamma \approx 0.577$.
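As a quick numerical illustration of Theorem 1, the sketch below compares the exact value $n H_{n-k_0}$, the logarithmic approximation, and a small simulation. The function names and the toy sizes ($n = 200$, $k_0 = 50$) are assumptions of this example and are unrelated to the table sizes of the ciphers analyzed later.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def harmonic(m: int) -> float:
    """m-th harmonic number H_m = 1 + 1/2 + ... + 1/m."""
    return sum(1.0 / i for i in range(1, m + 1))

def expected_trials(n: int, k0: int) -> float:
    """Exact expectation n * H_{n-k0} from Theorem 1."""
    return n * harmonic(n - k0)

def approx_trials(n: int, k0: int) -> float:
    """Approximation n * (ln(n - k0) + gamma)."""
    return n * (math.log(n - k0) + EULER_GAMMA)

def simulate_trials(n: int, k0: int, rng: random.Random) -> int:
    """Draw uniformly random coupons until all n are owned, starting with k0."""
    owned = set(range(k0))
    trials = 0
    while len(owned) < n:
        owned.add(rng.randrange(n))
        trials += 1
    return trials

rng = random.Random(2022)
n, k0, runs = 200, 50, 300
average = sum(simulate_trials(n, k0, rng) for _ in range(runs)) / runs
print(expected_trials(n, k0), approx_trials(n, k0), average)
```

The three printed values should be close to each other; the simulated average fluctuates from run to run, so it is best read as a sanity check rather than a precise estimate.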
Theorem 1 can be used to estimate the number of procedures required to collect all table entries. Often, however, we cannot collect all coupons (table entries) by simply repeating the same attack procedure. In that case, we want to estimate how many coupons (table entries) can be recovered within a limited number of trials. For that, we present the following corollary, a variant of the coupon collector's problem.
Corollary 1. There are $n$ coupons. We already have $k_0$ coupons. When one trial randomly opens one coupon, the expected total number of coupons collected by $t$ trials is
$$k + \left(t - n \times (H_{n-k_0} - H_{n-k})\right) \times \frac{n-k}{n}, \quad \text{where } k = \lfloor n - e^{\ln(n-k_0) - \frac{t}{n}} \rfloor.$$
Proof. The first trial opens a new coupon with a probability of $\frac{n-k_0}{n}$. Therefore, $\frac{n}{n-k_0}$ trials are required to collect an additional coupon ($k_0 + 1$ coupons in total). The expected number of the remaining trials is $t - \frac{n}{n-k_0}$ after $k_0 + 1$ coupons are collected. After repeating the procedure up to collecting $k$ coupons, the expected number of the remaining trials is
$$t - \sum_{i=k_0}^{k-1} \frac{n}{n-i} = t - n \times (H_{n-k_0} - H_{n-k})$$
after $k$ coupons are collected. When $t - n \times (H_{n-k_0} - H_{n-k}) = 0$, the expected number of the remaining trials is 0. Therefore,
$$\frac{t}{n} = H_{n-k_0} - H_{n-k} \approx \ln(n-k_0) - \ln(n-k),$$
and we expect $k = \lfloor n - e^{\ln(n-k_0) - \frac{t}{n}} \rfloor$. The probability that we can collect the $(k+1)$th new coupon is estimated by
$$\left(t - n \times (H_{n-k_0} - H_{n-k})\right) \times \frac{n-k}{n}.$$
Thus, the expected number is
$$k + \left(t - n \times (H_{n-k_0} - H_{n-k})\right) \times \frac{n-k}{n}.$$
Before demonstrating the concrete attacks on Yoroi, we present the specification of Yoroi and some technical preliminaries used in our attacks shown in Sects. 5 and 6. We provide some important theorems and corollaries. The coupon collector's problem and its variants are often used in our analysis. Moreover, we introduce a perfect decomposition, which is based on random graph theory, to estimate the attack complexity. We finally show a canonical representation of Yoroi. Our attack recovers the table of the canonical representation. We emphasize that the designers of Yoroi overlooked the existence of this canonical representation, and it causes the critical flaw of Yoroi.
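The estimate in Corollary 1 can be cross-checked numerically. The sketch below is not taken from the paper: it compares the continuous estimate $n - e^{\ln(n-k_0) - t/n}$ that appears in the proof with the standard closed form $n - (n-k_0)(1 - 1/n)^t$, which follows from linearity of expectation, and with a simulation. All function names and toy sizes are assumptions of this example.

```python
import math
import random

def collected_exact(n: int, k0: int, t: int) -> float:
    """Expected coupons after t uniform trials, by linearity of expectation.

    Each of the n - k0 missing coupons stays missing with probability (1 - 1/n)^t.
    """
    return n - (n - k0) * (1.0 - 1.0 / n) ** t

def collected_estimate(n: int, k0: int, t: int) -> float:
    """Continuous estimate n - exp(ln(n - k0) - t/n) used in the proof (before flooring)."""
    return n - math.exp(math.log(n - k0) - t / n)

def simulate_collected(n: int, k0: int, t: int, rng: random.Random) -> int:
    """Run t uniform trials starting from k0 owned coupons and count owned coupons."""
    owned = set(range(k0))
    for _ in range(t):
        owned.add(rng.randrange(n))
    return len(owned)

rng = random.Random(7)
n, k0, t, runs = 1000, 250, 500, 200
average = sum(simulate_collected(n, k0, t, rng) for _ in range(runs)) / runs
print(collected_exact(n, k0, t), collected_estimate(n, k0, t), average)
```

For table sizes of interest, $n$ is large and $(1 - 1/n)^t$ is very close to $e^{-t/n}$, so the two closed forms should agree up to a small error.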
Partial table-entry leakage is not critical for space-hard block ciphers. This makes it possible to mitigate the risk of code lifting by monitoring processes and/or outgoing packets. However, what about attackers who leak small amounts of information from the table over a long time so as not to be noticed by users? Such leakage is unlikely to be detectable. The designers of Yoroi tackled this problem and introduced a new unique functionality called longevity, where the table can be updated while maintaining the functionality of the block cipher.

Specification of Yoroi
Yoroi is a space-hard block cipher. There are two variants: Yoroi-16 and Yoroi-32 adopt key-dependent 16-and 32-bit tables (bijective functions), respectively, to build 128-bit block ciphers. Let P ∈ (F nin 2 ) ℓ and C ∈ (F nin 2 ) ℓ be a plaintext and ciphertext, respectively, and C is computed from P as . Each component is defined as follows: S-layer γ r . The S-layer γ r : (F nin 2 ) ℓ → (F nin 2 ) ℓ consists of ℓ key-dependent n in -bit bijective functions. S 1 and S 3 are applied for the first and last rounds, respectively, and S 2 is applied for the rest of the rounds.
where j = 1 for r = 1, j = 3 for r = R, and j = 2 for the rest of r.
Linear layer θ. The linear layer θ :

Affine layer $\sigma_r$. In the add-constant layer $\sigma_r : (\mathbb{F}_2^{n_{in}})^{\ell} \to (\mathbb{F}_2^{n_{in}})^{\ell}$, t-bit constants are added to the least significant t bits of each element of the state.
where C r = r.

AES layer A.
Finally, the AES with a fixed key K A is applied.

Updating Tables and Longevity
Yoroi has a unique feature called longevity, where updating the table is possible while maintaining the functionality. It intends to ensure security even if partial table entries are leaked at every table update, and in total, it accepts massive leakage beyond the program size.
To achieve the longevity, Yoroi prepares a secure m-bit block cipher E and updates the three tables as $T_1 = (E \| I) \circ S_1$, $T_2 = (E \| I) \circ S_2 \circ (E^{-1} \| I)$, and $T_3 = S_3 \circ (E^{-1} \| I)$. Since θ and $\sigma_i$ do not change the top m-bit values of each branch, the application of $(E \| I)$ after the rth S-layer is canceled out by applying $(E^{-1} \| I)$ before the (r + 1)th S-layer. As shown in Definition 3, the whitebox compiler can output a fresh table by using different randomness. To the best of our knowledge, Yoroi is the only space-hard block cipher accepting randomness. Note that updating the table of Yoroi is possible with an old table, and the secret key is unnecessary.
• Secure against any attack in the blackbox model.
• Secure against key extraction in the whitebox model.
For example, in Yoroi-16, encrypting a random plaintext is not possible even if $3 \times 2^{14}$ table entries are leaked by the KSA. Moreover, for longevity, encrypting a random plaintext is not possible even if $3 \times 2^{10}$ table entries are leaked by the KSA at every table update. In other words, it accepts massive leakage, e.g., far beyond $3 \times 2^{n_{in}}$ entries, as long as the table is updated at a proper interval.

Canonical Representation of Yoroi
We introduce the canonical representation of Yoroi, which plays a crucial role in our attack. Yoroi is specified by three original tables $S_1$, $S_2$, and $S_3$, but there are implementations using other tables $T_1$, $T_2$, and $T_3$ that maintain its functionality. The canonical representation is uniquely determined by the original three tables and is easily computed from the three tables $T_1$, $T_2$, and $T_3$. Figure 3 shows the representation. The three tables $T_1$, $T_2$, and $T_3$ are the tables used in the implementation $\tilde{E}_T$. The representation additionally involves m-bit permutations $E_r$, but their applications are always canceled out by applying $D_r$ in the next round. One important remark is that we can assign any $(E_r, D_r = E_r^{-1})$ such that the whole encryption is perfectly preserved. Then, the tables used in the 1st and last rounds are $\tilde{T}_1 = (E_1 \| I) \circ T_1$ and $\tilde{T}_R = T_3 \circ (D_{R-1} \| I)$, respectively, and the table used in the rth round for $1 < r < R$ is $\tilde{T}_r = (E_r \| I) \circ T_2 \circ (D_{r-1} \| I)$.

Figure 3: Canonical representation of the encryption algorithm of Yoroi.
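We now exploit the freedom of $E_r$ and construct $\tilde{T}_r$ for $r \in \{1, 2, \ldots, R-1\}$ satisfying the following property.

Property 1. For any $x < x'$ satisfying $\mathrm{lsb}_t(\tilde{T}_r(x)) = \mathrm{lsb}_t(\tilde{T}_r(x')) = 0$, $\mathrm{msb}_m(\tilde{T}_r(x)) < \mathrm{msb}_m(\tilde{T}_r(x'))$ holds.

Here, $\mathrm{lsb}_t$ denotes a t-bit string from the LSB, and $\mathrm{msb}_m$ denotes an m-bit string from the MSB. Given $E_{r-1}$ and $T$, Algorithm 1 finds $E_r$ satisfying Property 1 in $\tilde{T}_r$. Appendix B provides a small example to understand Property 1.

Proof. We first look at the 1st round, where $T_1 = (E \| I) \circ S_1$. Then, $\tilde{T}_1 = (E_1 \| I) \circ T_1 = ((E_1 \circ E) \| I) \circ S_1$. We focus on the set $\mathcal{S}_1 := \{x \mid \mathrm{lsb}_t(T_1(x)) = 0\}$. Since $E_1 \circ E$ is not applied to the least significant t bits, $\mathcal{S}_1$ is determined by $S_1$ only. According to Property 1, $\mathcal{S}_1$ is sorted in ascending order. For any $x, x' \in \mathcal{S}_1$, we assign $(E_1 \circ E)$ such that $\tilde{T}_1(x) < \tilde{T}_1(x')$ for all $x < x'$. Such $(E_1 \circ E)$ is uniquely determined, and $\tilde{T}_1$ is uniquely determined.
We next look at the 2nd round, where $T_2 = (E \| I) \circ S_2 \circ (D \| I)$. Then, $\tilde{T}_2 = (E_2 \| I) \circ T_2 \circ (D_1 \| I) = ((E_2 \circ E) \| I) \circ S_2 \circ ((D \circ D_1) \| I)$. Recall that $(E_1 \circ E)$ is uniquely determined by $S_1$. Since the inverse $(D \circ D_1)$ is also uniquely determined, $S_2 \circ ((D \circ D_1) \| I)$ is uniquely determined by $S_1$ and $S_2$. Similar to the 1st round, $(E_2 \circ E)$ is uniquely determined, and $\tilde{T}_2$ is uniquely determined.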
Algorithm 1 Algorithm to determine E r satisfying Property 1 inT r .
The iterative application shows that $(E_r \circ E)$ is uniquely determined by $S_1$ and $S_2$ until $r = R - 1$. We finally look at the last round, where $T_3 = S_3 \circ (D \| I)$: $\tilde{T}_R = T_3 \circ (D_{R-1} \| I) = S_3 \circ ((D \circ D_{R-1}) \| I)$. Since $(E_{R-1} \circ E)$ is uniquely determined by $S_1$ and $S_2$, the inverse $(D \circ D_{R-1})$ is also uniquely determined. Thus, $\tilde{T}_R$ is uniquely determined by $S_1$, $S_2$, and $S_3$.
Proposition 1 shows that Yoroi has a unique canonical representation determined by $S_1$, $S_2$, and $S_3$ only. Our attack recovers the canonical representation $\tilde{T}_r$ for all $r \in \{1, 2, \ldots, R\}$. In other words, we do not recover $S_1$, $S_2$, and $S_3$. Note that $\tilde{T}_r$ for all $r \in \{1, 2, \ldots, R\}$ (and $K_A$) are sufficient to encrypt (resp. decrypt) any plaintext (resp. ciphertext).
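The uniqueness argument can be illustrated with a small experiment. The following Python sketch is an assumption-laden toy and not the authors' Algorithm 1: tables are plain Python lists, `msb_m(x)` is taken as `x >> t`, and all function names are hypothetical. It masks the same $(S_1, S_2, S_3)$ with two different m-bit permutations $E$ and checks that the sorting rule of Property 1 yields identical canonical tables.

```python
import random


def apply_msb(perm, x, t):
    """Apply (perm || I): permute the top m bits of x and keep the low t bits."""
    return (perm[x >> t] << t) | (x & ((1 << t) - 1))


def invert(perm):
    inv = [0] * len(perm)
    for i, v in enumerate(perm):
        inv[v] = i
    return inv


def canonicalize_round(table, d_prev, t):
    """Return ((E_r || I) o table o (d_prev || I), E_r) with E_r fixed by Property 1."""
    size, low = len(table), (1 << t) - 1
    masked = [table[apply_msb(d_prev, x, t)] for x in range(size)]
    e_r, rank = [0] * (size >> t), 0
    for x in range(size):                 # ascending x
        if masked[x] & low == 0:          # x is in the set with lsb_t = 0
            e_r[masked[x] >> t] = rank    # the rank becomes the new msb_m value
            rank += 1
    return [apply_msb(e_r, y, t) for y in masked], e_r


def canonical_tables(t1, t2, t3, rounds, t):
    """Canonical tables for rounds 1..R from implementation tables T1, T2, T3."""
    d_prev = list(range(len(t1) >> t))    # identity before the first round
    tables = []
    for r in range(1, rounds):
        canon, e_r = canonicalize_round(t1 if r == 1 else t2, d_prev, t)
        tables.append(canon)
        d_prev = invert(e_r)
    tables.append([t3[apply_msb(d_prev, x, t)] for x in range(len(t3))])
    return tables


def mask_tables(s1, s2, s3, e, t):
    """Table update: T1 = (E||I) o S1, T2 = (E||I) o S2 o (D||I), T3 = S3 o (D||I)."""
    d, size = invert(e), len(s1)
    t1 = [apply_msb(e, s1[x], t) for x in range(size)]
    t2 = [apply_msb(e, s2[apply_msb(d, x, t)], t) for x in range(size)]
    t3 = [s3[apply_msb(d, x, t)] for x in range(size)]
    return t1, t2, t3


def random_perm(size, rng):
    perm = list(range(size))
    rng.shuffle(perm)
    return perm
```

Calling `canonical_tables(*mask_tables(s1, s2, s3, e, t), rounds, t)` with two different random masks `e` on a toy instance (for example, 8-bit tables with `t = 4`) returns the same list of tables, which matches the statement that the canonical representation depends on $S_1$, $S_2$, and $S_3$ only.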

Perfect Decomposition
In our attack procedure, we sometimes divide the set $\mathbb{F}_2^n$ into $2^{n-m}$ subsets, where each subset contains $2^m$ elements, to recover the canonical representation $\tilde{T}_r$. Specifically, each subset contains all $x$ whose $\mathrm{lsb}_t(\tilde{T}_r(x))$ is the same. The attack procedure can detect probabilistically whether two elements belong to the same subset. For example, when the attack procedure detects that $x_1$ and $x_2$ belong to the same subset and $x_1$ and $x_3$ belong to the same subset, it derives that $x_2$ and $x_3$ belong to the same subset without having to detect it via the attack procedure. We regard this behavior as the connectivity of a random graph.
Theorem 2 (Connectivity of random graph [ER59]). Let $G(n, p)$ be a random graph, where there are $n$ vertices, and each edge is included in the graph with a probability of $p$. Then, the probability that the graph $G(n, p)$ is connected is estimated as $e^{-e^{-c}}$, where $c = p \times n - \ln n$.
We consider $2^{n-m}$ random graphs. Each random graph contains $2^m$ vertices. It is assumed that one procedure detects that each edge is included in the graph with a probability of $p$. When we repeat the procedure $s$ times, we simply assume that the probability that each edge is included in the graph is enhanced to $p \times s$. Then, the goal of the perfect decomposition is to construct $2^{n-m}$ disjoint connected random graphs. We call this problem $(n, m, p)$-perfect-decomposition.
Proposition 2 ($(n, m, p)$-perfect-decomposition). The set $\mathbb{F}_2^n$ can be divided into $2^{n-m}$ subsets, where each subset contains $2^m$ elements. Let $p$ denote the probability that one procedure detects that $(x, x') \in (\mathbb{F}_2^n \times \mathbb{F}_2^n)$ belong to the same subset, and we assume that the probability increases to $s \times p$ by $s$ repetitions, where $s \ll 1/p$. Let $p_{succ}$ be the probability that $s$ procedures can divide the set $\mathbb{F}_2^n$ into $2^{n-m}$ subsets (with size $2^m$). Then, the number of required repetitions to reach the probability of $p_{succ}$ is $s = \frac{n \ln 2 - \ln(-\ln p_{succ})}{p \times 2^m}$.

Proof. We first consider the probability that $s$ procedures can find a subset whose number of elements is $2^m$. Assuming each pair is independent, we can regard this problem as the connectivity problem of a random graph, where there are $2^m$ vertices and every possible edge occurs independently with probability $s \times p$. Due to Theorem 2, this probability is estimated as $e^{-e^{-c}}$, where $c = p \times s \times 2^m - \ln 2^m$. We need $2^{n-m}$ connected graphs. Assuming they are independent, the probability is estimated as $p_{succ} = (e^{-e^{-c}})^{2^{n-m}} = e^{-2^{n-m} \times e^{-c}}$. The parameter $c$ required to achieve the success probability $p_{succ}$ is $c = \ln 2^{n-m} - \ln(-\ln p_{succ})$, and substituting $c = p \times s \times 2^m - \ln 2^m$ gives the claimed $s$. We provide some examples.
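As a worked instance of Proposition 2, the following Python sketch solves $p_{succ} = e^{-2^{n-m} e^{-c}}$ with $c = p \times s \times 2^m - \ln 2^m$ for $s$. The function name is hypothetical, and the Yoroi-16 parameters in the usage note ($n = 16$, $m = 12$, $p = 2^{-24}$, i.e., $t = 4$ and $R = 8$) are assumptions inferred from the figures quoted elsewhere in the paper.

```python
import math


def required_repetitions(n, m, p, p_succ=0.5):
    """Number of procedures s so that all 2^(n-m) random graphs on 2^m
    vertices are connected with probability p_succ (Erdos-Renyi estimate)."""
    # c needed per graph: exp(-2^(n-m) * exp(-c)) = p_succ
    c = (n - m) * math.log(2) - math.log(-math.log(p_succ))
    # c = p * s * 2^m - ln(2^m), solved for s
    return (c + m * math.log(2)) / (p * 2 ** m)
```

Under these assumptions, `math.log2(required_repetitions(16, 12, 2 ** -24))` evaluates to about 15.52, and the condition $s \ll 1/p$ holds since $s \times p$ is below $2^{-8}$.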

Hybrid Code Lifting on Yoroi
We consider the security of the hybrid code lifting against Yoroi. The following is the notation used in our attack.
• t: bit size, where MDS and constant XORing are applied in θ and σ i , i.e., t = 4. Note that n in = m + t.
• $\mathrm{lsb}_t(X)$: t-bit string from the LSB of $X \in \mathbb{F}_2^*$.
• $\mathrm{msb}_m(X)$: m-bit string from the MSB of $X \in \mathbb{F}_2^*$.

We use the following notations to recover the table $\tilde{T}_r$ of the canonical representation.
• $\rho \in (\mathbb{F}_2^m)^{2^t}$: $2^t$-dimensional vector whose elements take a value over $\mathbb{F}_2^m$.
• $A_i := \{x \in \mathbb{F}_2^{n_{in}} \mid \mathrm{lsb}_t(\tilde{T}_r(x)) = i\}$. Since $\tilde{T}_r$ is a permutation, the number of elements of $A_i$ is $2^m$ for any $i \in \mathbb{F}_2^t$. We sometimes use $A_{\rho_i}$ when ρ has not been recovered yet. Then, when $x, x' \in A_{\rho_i}$, $\mathrm{lsb}_t(\tilde{T}_r(x)) = \mathrm{lsb}_t(\tilde{T}_r(x'))$.
• $\eta \in (\mathbb{F}_2^t)^{2^m}$: $2^m$-dimensional vector whose elements take a value over $\mathbb{F}_2^t$.
• $B_j := \{x \in \mathbb{F}_2^{n_{in}} \mid \mathrm{msb}_m(\tilde{T}_r(x)) = j\}$. Since $\tilde{T}_r$ is a permutation, the number of elements of $B_j$ is $2^t$ for any $j \in \mathbb{F}_2^m$. We sometimes use $B_{\eta_j}$ when η has not been recovered yet. Then, when $x, x' \in B_{\eta_j}$, $\mathrm{msb}_m(\tilde{T}_r(x)) = \mathrm{msb}_m(\tilde{T}_r(x'))$.
• $x_{j,i} \in \mathbb{F}_2^{n_{in}}$: input of $\tilde{T}_r$ such that $\tilde{T}_r(x_{j,i}) = j \| i$.

Figure 4 summarizes these notations. We show the following lemma.

Lemma 1. The table $\tilde{T}_r$ of the canonical representation can be recovered from $A_i$ and $B_{\eta_j}$ for all i and j.
Proof. In $A_0$, inputs satisfying $\mathrm{lsb}_t(\tilde{T}_r(x)) = 0$ are stored. For any $x, x' \in A_0$ with $x < x'$, $\mathrm{msb}_m(\tilde{T}_r(x)) < \mathrm{msb}_m(\tilde{T}_r(x'))$ holds due to Property 1 in the canonical representation. Let $x_{j',0}$ be the $j'$th element of $A_0$ sorted in ascending order. Then, when $x_{j',0} \in B_{\eta_j}$, we obtain $\eta_j = j'$. We now obtain $A_i$ and $B_j$ for all i and j. Thus, when $x \in A_i$ and $x \in B_j$, $\tilde{T}_r(x) = j \| i$.
On the canonical representation, $A_i$ and $B_{\eta_j}$ for all i and j are enough to recover $\tilde{T}_r$. We do not need to know η in advance because it can be derived from $A_0$.
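The argument of Lemma 1 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' code: tables are Python lists, $\mathrm{msb}_m(y)$ is `y >> t`, the classes $B_{\eta_j}$ are given as unlabelled sets, and all names (`make_canonical`, `split_classes`, `recover_table`) are hypothetical.

```python
import random


def make_canonical(size, t, rng):
    """Build a random permutation table that satisfies Property 1."""
    perm = list(range(size))
    rng.shuffle(perm)
    low, relabel = (1 << t) - 1, {}
    for x in range(size):                       # ascending x
        if perm[x] & low == 0:
            relabel[perm[x] >> t] = len(relabel)
    return [(relabel[y >> t] << t) | (y & low) for y in perm]


def split_classes(table, t):
    """Return the labelled sets A_i and the unlabelled classes B."""
    low, a, b = (1 << t) - 1, {}, {}
    for x, y in enumerate(table):
        a.setdefault(y & low, []).append(x)
        b.setdefault(y >> t, set()).add(x)
    return a, list(b.values())                  # the labels j are dropped here


def recover_table(a, b_classes, t):
    """Lemma 1: label each B class by the rank of its member in sorted A_0."""
    class_of = {x: idx for idx, cls in enumerate(b_classes) for x in cls}
    label = {}
    for rank, x in enumerate(sorted(a[0])):
        label[class_of[x]] = rank               # Property 1: rank equals msb_m
    table = [0] * len(class_of)
    for i, xs in a.items():
        for x in xs:
            table[x] = (label[class_of[x]] << t) | i
    return table
```

On a toy 8-bit table with `t = 4`, shuffling the list of B classes before calling `recover_table` still returns the original table, since each class contains exactly one element of $A_0$ and its rank fixes the label.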

Attack Procedure
We show a detailed attack procedure. For the sake of simplicity, we first show an attack whose leakage size is $128 + (R - 1) \times 2^t \times n_{in}$ bits. We later reduce the leakage size to $128 + (R - 1) \times 6 \times n_{in}$ bits with a negligible impact on the complexity.

The 1st (Code Lifting) Phase
The 1st phase is the code lifting by a whitebox attacker. The attacker generates a leakage, which is useful in the 2nd phase.
The attacker first extracts the AES key $K_A$ from the encryption program. This might not be easy if a secure whitebox implementation of AES existed. However, in practice, realizing such a secure implementation is difficult [BHMT16,BBIJ17,BU18]. Therefore, we assume that the AES key can be extracted. Note that the designers of Yoroi introduced the AES layer for security in the blackbox model and do not expect the layer to mitigate code lifting [KI21].
The knowledge of $K_A$ reveals the Yoroi-core part. We notice that the Yoroi-core part alone is an insecure block cipher because it has a non-trivial truncated differential distinguisher (see Fig. 5 and the 2nd phase for details). Despite such a weakness, recovering all table entries is non-trivial because we cannot guess the correct secret table due to the huge search space, e.g., $2^{16}! \approx 2^{954036}$. For a practical attack, we leak small fragments of the table.
We focus on the canonical representation of Yoroi. By applying Algorithm 1 from $r = 1$ to $r = R - 1$ iteratively, the whitebox attacker generates $(\tilde{T}_1, \ldots, \tilde{T}_R)$ of the canonical representation. We finally leak all elements in $B_0$, i.e., $x_{0,i}$ satisfying $\tilde{T}_r(x_{0,i}) = (0 \| i)$ for all $i \in \mathbb{F}_2^t$, of $\tilde{T}_r$ from $r = 1$ to $R - 1$. Note that we do not leak $B_0$ of $\tilde{T}_R$. Please refer to Sect. 5.1.2 on how to exploit the leakage.
We finally summarize the attack complexity in the code lifting phase. Generating the table of the canonical representation (and retrieving $x_{0,i}$ satisfying $\tilde{T}_r(x_{0,i}) = (0 \| i)$) is possible with the complexity of $2^{n_{in}}$ for each r. Therefore, the complexity is $(R - 1) \times 2^{n_{in}}$, which is about $2^{18.8}$ and $2^{35.9}$ for Yoroi-16 and Yoroi-32, respectively. The leakage size is $128 + (R - 1) \times 2^t \times n_{in}$ bits, which is 1920 and 7808 bits in Yoroi-16 and Yoroi-32, respectively.
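The figures above follow directly from the two formulas. The following Python sketch reproduces them; the function names are hypothetical, and the round numbers $R = 8$ (Yoroi-16) and $R = 16$ (Yoroi-32) are assumptions chosen because they are consistent with the quoted complexities and leakage sizes.

```python
import math


def leakage_bits(rounds, t, n_in, entries_per_round=None, aes_key_bits=128):
    """Leakage: the AES key plus, for rounds 1..R-1, some n_in-bit inputs.
    By default all 2^t elements of B_0 are leaked for each round."""
    if entries_per_round is None:
        entries_per_round = 2 ** t
    return aes_key_bits + (rounds - 1) * entries_per_round * n_in


def code_lifting_log2_cost(rounds, n_in):
    """log2 of (R - 1) * 2^n_in, the cost of canonicalizing rounds 1..R-1."""
    return math.log2((rounds - 1) * 2 ** n_in)
```

With these assumptions, `leakage_bits(8, 4, 16)` and `leakage_bits(16, 4, 32)` give 1920 and 7808 bits, and passing `entries_per_round=6` gives the reduced sizes of 800 and 3008 bits.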

The 2nd (Blackbox) Phase
The 2nd phase is the differential cryptanalysis by a blackbox attacker receiving the leakage generated by the 1st phase. We ignore the last AES layer and regard the output of the Yoroi-core part as ciphertexts because the leakage includes K A .
We first show that the Yoroi-core part is easily distinguished from ideal block ciphers. Recall the linear layer θ, where the MDS matrix is only applied to the last t bits of each output of $\tilde{T}_r$. Therefore, θ does not diffuse active branches when there is no difference in the last t bits. This causes the truncated differential shown in Fig. 5, where ∆ denotes any non-zero difference. The probability of satisfying this truncated differential is $2^{-t(R-1)}$. Thus, a distinguishing attack is easy. On the other hand, the attack goal is the program recovery (footnote 5). We use the following procedure.

Footnote 5. In practice, distinguishing attacks are less interesting for the goal of the hybrid code lifting. This is because a whitebox attacker can leak any plaintext-ciphertext pairs. Therefore, a collaborative blackbox attacker can distinguish by querying the leaked plaintexts. Of course, as we demonstrated, a non-trivial distinguisher would allow the blackbox attacker to recover all table entries. Therefore, the designer should eliminate such a distinguisher.

Step 2a. Following the truncated differential shown in Fig. 5, the blackbox attacker runs the following procedure.
1. Prepare a set of $2^{n_{in}}$ plaintexts as follows: For the 1st element, we take all values over $\mathbb{F}_2^{n_{in}}$. For the others, we take randomly chosen fixed values. Then, we have $2^{n_{in}}$ plaintexts and obtain the corresponding ciphertexts by using the encryption oracle.
Every procedure requires $2^{n_{in}}$ time, data, and memory complexities, and all $\binom{2^{n_{in}}}{2} \approx 2^{2n_{in}-1}$ pairs are checked at the same time. Assuming that $(P[1], P'[1])$ belong to the same subset $A_{\rho_i}$, the probability that such a pair is detected by one procedure is $2^{-t \times (R-2)}$. We estimate the required number of procedures to divide $\mathbb{F}_2^{n_{in}}$ into the subsets $A_{\rho_i}$. As shown in Proposition 2, it is regarded as $(n_{in}, m, 2^{-t \times (R-2)})$-perfect-decomposition. Let $s$ be the required number of procedures to divide $\mathbb{F}_2^{n_{in}}$ into the subsets $A_{\rho_i}$ with the success probability of 50%. Then, the total complexity is estimated as follows.
Step 2a does not recover $\rho_i$. We refer to the leakage and retrieve the data about

Step 2c. We next recover $B_{\eta_j}$. The truncated differential shown in Fig. 5 is not helpful. We use the modified truncated differentials shown in Fig. 6, where α is an arbitrary m-bit difference, and β is a t-bit non-zero difference chosen by the blackbox attacker. Then, the difference $\zeta_l \in \mathbb{F}_2^t$ is a non-zero difference determined as $(\zeta_1, \zeta_2, \ldots, \zeta_{\ell}) = (\beta, 0, 0, \ldots, 0) \cdot M^{-1}$.
The subset $A_i$ is already recovered in Step 2a and Step 2b. Moreover, the attacker knows $x_{0,i}$ that satisfies $\tilde{T}_1(x_{0,i}) = (0 \| i)$ by using the leakage. By using them, the attacker runs the following procedure.

Figure 6: Another truncated differential of the Yoroi-core part.

1. Compute $(\zeta_1, \zeta_2, \ldots, \zeta_{\ell})$ from a non-zero $\beta \in \mathbb{F}_2^t$.
2. Prepare a set of $2^{2m}$ plaintexts as follows: For the 1st element, we use a subset $A_{i_1}$, where $i_1$ denotes a randomly chosen index. For the 2nd element, we use $A_0$. For the other lth elements, we randomly choose one text $x_{0,i_l}$ stored in the leakage, i.e., $\tilde{T}_1(x_{0,i_l}) = (0 \| i_l)$. Then, we have $2^{2m}$ plaintexts and obtain the corresponding ciphertexts by using the encryption oracle.
Step 2d. The attacker has already recovered $A_i$ for all $i \in \mathbb{F}_2^t$ and $B_{\eta_j}$ for all $j \in \mathbb{F}_2^m$. As shown in Lemma 1, they recover $\eta_j$ and $\tilde{T}_1$ thanks to the canonical representation.
Step 2e. The attacker can remove the first round because $\tilde{T}_1$ is already recovered. The attacker can use the same procedure as above to recover $\tilde{T}_2, \tilde{T}_3, \ldots, \tilde{T}_{R-1}$. The probability of the truncated differential increases compared with the one used to recover $\tilde{T}_1$. Thus, the attack complexity decreases. We regard these complexities as negligible.
Step 2f. The attacker already has $(\tilde{T}_1, \tilde{T}_2, \ldots, \tilde{T}_{R-1})$ and finally recovers $\tilde{T}_R$. It is easy to recover the table entries of $\tilde{T}_R$ by checking the consistency with the ciphertexts given by the encryption oracle.

Reducing Leakage Size
We show an additional technique that reduces the leakage size by more than half with a negligible complexity increase.
The procedure above is more efficient than Step 2c because we do not need to recover $B_{\eta_j}$ completely. Observing only one truncated differential is enough to detect $\rho_i = \zeta_2$ because the probability that the truncated differential holds is very low, i.e., $2^{-n_{in} \times (\ell - 1)}$, when a wrong $\rho_i$ is used. Specifically, $2^{m + m/2 + 1}$ time, data, and memory complexities are required to recover one unknown $\rho_i$, and $10 \times 2^{m + m/2 + 1}$ in total. The complexity is negligible compared to Step 2a and Step 2c.

Summary of Results
Our results are summarized in Table 1. Our attack uses 800-bit leakage and 3008-bit leakage for Yoroi-16 and Yoroi-32, respectively. These leakage sizes are significantly less than size(T)/64.
The time complexity of the code-lifting phase is about $2^{n_{in}}$. Thus, the complexity of the 1st phase is practical for both Yoroi-16 and Yoroi-32.
The time complexity of the blackbox phase is about $2^{33}$ to attack Yoroi-16. Therefore, the time complexities of both the 1st and 2nd phases are practical. On the other hand, the time complexity is about $2^{65.5}$ to attack Yoroi-32. While this is not practical on a single PC, a security level of about $2^{64}$ is usually not recommended.
We finally remark that this attack is outside of the security claim of Yoroi. On the other hand, we need to stress that the claimed security of Yoroi is too weak to claim that the use case of Yoroi can be secure. Note that we break Yoroi's claimed security for longevity in Sect. 6.

Experimental Verification
We implemented our attack against Yoroi-16 for verification. We tried to recover $\tilde{T}_1$ ten times, and Table 3 shows the experimental results for the attack on Yoroi-16. The theoretical estimation of $s$ in Step 2a is $2^{15.52}$, and the corresponding average value in our experiments is $48232.8 \approx 2^{15.56}$. Therefore, our theoretical estimation works well. The theoretical estimation of $d$ in Step 2c is 12, and the corresponding average value in our experiments is 12.6. Again, our theoretical estimation works well.

Hybrid Code Lifting using More Leakage
Drawing the trade-off between the leakage size and attack complexity is interesting. We now present another simple attack using more leakage (but the size is still smaller than $\frac{3 \times 2^{n_{in}}}{4}$ table entries). The simple attack no longer exploits specific properties of Yoroi, such as the canonical representation. Assuming that $s_1$ and $s_2$ table entries are leaked from $T_1$ and $T_2$, respectively, the total leakage is $s_1 + s_2 < \frac{3 \times 2^{n_{in}}}{4}$. Note that we assume the AES key $K_A$ is also leaked. Then, we can prepare $s_1^{\ell}$ chosen plaintexts, where the first round can be computed. The probability that these chosen plaintexts are successfully encrypted up to one round before the last round is $\left(\frac{s_2}{2^{n_{in}}}\right)^{\ell \times (R-2)}$. Ciphertexts corresponding to the chosen plaintexts are obtained by asking the encryption oracle, and the outputs of the Yoroi-core part can be computed. As a result, we have $s_1^{\ell} \times \left(\frac{s_2}{2^{n_{in}}}\right)^{\ell \times (R-2)} \times \ell$ overlapping table entries about $S_3$, and if the following inequality holds, we can recover the full table entries of $S_3$ because of Theorem 1. Equation (1) holds when $s_1 = 22192$ on Yoroi-32. Then, we query $22192^4 \approx 2^{57.75}$ chosen plaintexts. The data and time complexities to recover $S_3$ are $2^{57.75}$, which is faster than the attack shown in Sect. 5.1. Of course, the required leakage size is much larger, i.e., $32 \times \frac{3 \times 2^{32}}{4}$ bits ≈ 12 GB. Unlike Yoroi-32, this simple attack does not draw an interesting trade-off on Yoroi-16. The same analysis holds when $s_1 = 24$, but the required data and time complexities are $2^{36.68}$, which is much higher than the attack shown in Sect. 5.1.
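The quoted data complexities can be checked with a short calculation. The following Python sketch is an illustration only; the function names are hypothetical, and the branch numbers $\ell = 4$ (Yoroi-32) and $\ell = 8$ (Yoroi-16) are assumed from the 128-bit block size.

```python
import math


def log2_chosen_plaintexts(s1, branches):
    """log2 of s1^l, the number of plaintexts whose first round is computable."""
    return branches * math.log2(s1)


def leakage_gib(n_in):
    """Size of 3 * 2^n_in / 4 table entries of n_in bits each, in GiB."""
    bits = n_in * 3 * 2 ** n_in // 4
    return bits / 8 / 2 ** 30
```

Here `log2_chosen_plaintexts(22192, 4)` is about 57.75, `log2_chosen_plaintexts(24, 8)` is about 36.68, and `leakage_gib(32)` is 12.0.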

Attacks against Longevity of Yoroi
In this section, we provide another attack, where the target is the longevity of Yoroi. An attacker collects leakage at every table update and tries to recover the program without querying the oracle. Figure 7 shows the overview of the attack. If the table leakage is limited to the KSA, the designers of Yoroi claimed the $((3 \times 2^{n_{in}})/64, 128)$-space hardness.
The canonical representation is useful in this attack too. We show three different attack models. The first one is the strongest, and it is unlikely to resist such attacks in general. The last one is the weakest, which corresponds to the known-space attack. Note that the last attack breaks one of the designers' security claims about longevity.

Attack in Whitebox Model with Nonvolatile Memory
Let us assume a whitebox model, where a nonvolatile memory is available. An attacker can memorize the old program on the encryption device and leak a little bit of the old program with each table update. Considering such a model, bounding the leakage size of every table update no longer restricts the ability of the whitebox attacker. Therefore, such attacks are unavoidable.

Attack in Whitebox Model without Nonvolatile Memory
For more constructive discussion, we next assume a weakened whitebox model, where nonvolatile memory is not available, and all knowledge about the old program is lost after the table update. We believe that such a weakened model is still reasonable enough for the discussion of the whitebox model.
Unfortunately, Yoroi is insecure even in such a weakened model. Recall that Yoroi has the canonical representation. The attacker can reconstruct the canonical representation from every updated table by using Algorithm 1. Therefore, the attacker leaks partial data of $(\tilde{T}_1, \tilde{T}_2, \ldots, \tilde{T}_R)$ at every table update. Specifically, it leaks $3 \times 2^{n_{in}}/64$ table entries at every table update. We can collect $(\tilde{T}_1, \tilde{T}_2, \ldots, \tilde{T}_R)$ by receiving leakage from $\left\lceil \frac{R \times 2^{n_{in}}}{3 \times 2^{n_{in}}/64} \right\rceil = \lceil 64R/3 \rceil$ table updates.
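As a rough illustration of the count $\lceil 64R/3 \rceil$, the following Python sketch divides the $R \times 2^{n_{in}}$ canonical table entries by the per-update leakage budget. It assumes that the whitebox attacker chooses which canonical entries to leak, so that no entry is leaked twice; the function name and the round numbers $R = 8$ and $R = 16$ used in the usage note are assumptions.

```python
def updates_needed(rounds, n_in, tables=3, fraction_denominator=64):
    """Number of table updates needed to leak all R canonical tables when
    tables * 2^n_in / fraction_denominator distinct entries leak per update."""
    total_entries = rounds * 2 ** n_in
    per_update = tables * 2 ** n_in // fraction_denominator
    return -(-total_entries // per_update)   # ceiling division
```

Under these assumptions, `updates_needed(8, 16)` returns 171 and `updates_needed(16, 32)` returns 342.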

Known-Space Attack: Breaking Security Claim
We finally present an attack using the known-space attack (KSA) model only. The authors claim the space hardness when at most $2^{n_{in}}/64 = 2^{n_{in}-6}$ table entries are extracted by the KSA at every table update. We break this security claim. Specifically, we reconstruct the encryption program by using the leakage generated in the KSA model only. Since the attacker can encrypt any plaintext with the reconstructed program, space hardness is lost.
In the KSA, the attacker cannot modify table entries. The attacker extracts only partial data about T 1 , T 2 , and T 3 but cannot choose extracted entries. To avoid the confusion, we use T . The attack goal is to recover the canonical representation, i.e., we recoverT 1 ,T 2 , . . .T R . Note that we assume the AES key K A is known or extractable. Otherwise, the discussion is meaningless because it is secure regardless of the Yoroi-core part. , andT 2 .

Recovery of $\tilde{T}_1$
The first goal is to recoverT 1 , whereT 1 = (E . The attacker extracts α ≈ 2 nin−6 table entries and checks lsb t (T is not applied to the last t bits, Therefore, we can easily recover We next recover B ηj . Note that the mask E . Therefore, we search for (x, x ′ ) satisfying About α 2 × 2 −m ≈ 2 2nin−13−m pairs can be observed every T . By collecting leakages with different T , we recover B ηj for all j ∈ F m 2 . In the canonical representation,T 1 can be recovered by A i and B ηj for all i and j (see Lemma 1). We omit to estimate the complexity because this step is more negligible than the following steps.

Recovery of $\tilde{T}_2$
The second goal is to recoverT 2 , whereT 2 = (E We now have α ≈ 2 nin−6 table entries of T (T ) 1 andT 1 is already recovered. In other words, we collect some table entries of E (T ) 1 by α trials, and Corollary 1 is available to estimate the number of recovered table entries.

∥I).
The attack procedure to recover $\tilde{T}_2$ is the same as that for $\tilde{T}_1$, but the number of available table entries decreases from $2^{10}$ to 226.536. Therefore, the cost to recover $\tilde{T}_2$ is larger than the cost to recover $\tilde{T}_1$. The more rounds we analyze, the higher the cost becomes.

Recovery of $\tilde{T}_{R-1}$
Similarly to the recovery ofT 2 , we recover partial data of E

∥I)
for each T and obtain partial data ofT r+1 . By collecting many leakage every T , we recover all table entries ofT r+1 . The following is the summary of this analysis.

Recovery of $\tilde{T}_R$
We finally recoverT R . ∥I) are available.
The probability 0.05336 is lower than 0.2134, which is the case to recoverT R−1 . However, we do not need to search for collision in this step because of no post-mask, i.e.,T R−1 = T ∥I). Due to the coupon collector's problem, we need to check roughly 2 20 values. Therefore, 1/0.05336 × 2 20 ≈ 2 24.23 table updates are required to recoverT R , but it is negligible compared to 2 35.97 .

Experimental Reports
Considering the attack complexity of $2^{48.78}$ for Yoroi-16, our attack is difficult to verify experimentally. Therefore, to verify the correctness of our complexity analysis, we implemented our attack on a small-scale Yoroi. We use Yoroi-10, where $n_{in} = 10$, $t = 4$, $m = 6$, and $R = 5$. We first show the theoretical analysis.
With $p_{succ} = 0.5$, $s \approx 2^{23.26}$. Table 4 shows the experimental results for the number of table updates required to recover $\tilde{T}_{R-1}$. We conducted four experiments, and the average number of required table updates is about $2^{23.11}$. Therefore, our theoretical estimation works very well.

Remark on ACSA and Countermeasure
In the KSA, we need to rely on probabilistic events to recover table entries about E r−1 ∥I) as possible are known. Therefore, the security enhancement is slow even if the number of rounds increases.
We believe that space-hard block ciphers should be secure against not only the KSA but also ACSA. Thus, the problem of how to design space-hard block ciphers with longevity goes back to being an open problem.

Hybrid Code Lifting on SPNbox
The hybrid code lifting is an attack model to evaluate the blackbox security after the code lifting by a whitebox attacker. It is interesting if other space-hard block ciphers are still secure in this attack model. In this section, we discuss the security against the hybrid code lifting on SPNbox [BIT16].
SPNbox has four variants: SPNbox-32, SPNbox-24, SPNbox-16, and SPNbox-8. For each, the whitebox implementation uses a single table $T_b : \mathbb{F}_2^{n_{in}} \to \mathbb{F}_2^{n_{in}}$, where $n_{in}$ is 32, 24, 16, and 8, respectively, and the number of rounds of all variants is $R = 10$. We denote the number of table entries by $N$, and the claimed security of all variants in the whitebox model is (N/4, 64)-space hardness.
We evaluate the security against the hybrid code lifting with parameter $(\lambda, \tau_w, \tau_b) = (T/4 \times n_{in}, 2^{64}, 2^{128})$. In other words, we evaluate whether SPNbox still maintains 128-bit security against key-recovery attacks in the blackbox model even if T/4 table entries are leaked.
The state of SPNbox-n in is organized as a vector of ℓ = n/n in elements of n in bits each: A plaintext X 0 is encrypted to a ciphertext X R by applying R rounds of the following round transformation to the plaintext: For all concrete proposals, SPNbox-8, SPNbox-16, SPNbox-24, and SPNbox-32, we set the number of rounds to R = 10. We now define each of the components γ, θ, and σ r .
The Nonlinear Layer γ. γ is a nonlinear substitution layer, in which t key-dependent identical bijective n in -bit S-boxes are applied to the state: In SPNbox-n in , the substitution S nin is realized by a dedicated small block cipher of block length n in . In this paper, we omit the specifications of small block ciphers and refer to [BIT16] for the details.
The Linear Layer θ. θ is a linear diffusion layer that applies a t × t MDS matrix to the state: We denote by cir(a 0 , . . . , a ℓ−1 ) the ℓ×ℓ circulant matrix A with the coefficients a 0 , . . . , a ℓ−1 in the first row; and by had(a 0 , . . . , a ℓ−1 ) the ℓ × ℓ Hadamard matrix A with coefficients A i,j = a i⊕j , with ℓ a power of two.
For the concrete proposals SPNbox-n in with n in = 32, 24, 16, 8, the matrix M nin is defined as follows: for n in = 32, for n in = 24, The Affine Layer σ r . σ r is an affine layer that adds round-dependent constants to the state:

Code-Lifting Phase
The hybrid code lifting allows not only table entries but also arbitrary leakage. For example, extracting plaintext-intermediatetext pairs is possible, but it is non-trivial to plaintexts are always encrypted up to the 1st round. The attacker additionally extracts table entries that enable us to encrypt these N plaintexts up to one round before the last round. Then, N × (R − 2) × ℓ (overlapping) table entries are extracted. This is equivalent to the coupon collector's problem, where there are 2 nin coupons and 2 nin−2 coupons are collected with N × (R − 2) × ℓ trials 6 . Due to Corollary 1, We estimate N and M 0 on each parameter, and Table 5 summarizes these results. We notice N = 0 in SPNbox-8, but this is convincing because we need (R − 2) × 16 = 128 table lookups but the size of leaked table entries is only 64 (= 2 8 4 ).

Blackbox Phase
The blackbox phase consists of two steps.

The 1st Step. In the 1st step of the blackbox phase, $N$ chosen plaintexts are encrypted up to one round before the last round locally by the blackbox attacker. The attacker also queries these chosen plaintexts to the encryption oracle. Then, $N \times \ell$ (overlapping) table entries are extracted from the last round. This is equivalent to the coupon collector's problem, where there are $2^{n_{in}}$ coupons, $2^{n_{in}-2}$ coupons are already collected, and additional coupons are collected by $N \times \ell$ trials. Due to Corollary 1, the number of finally collected table entries is estimated as follows.
We estimate M on each parameter, and Table 5 summarizes these results.

The 2nd Step (Simple Procedure). The goal of this step is to recover the whole where $s = \sum_{r=2}^{R-1} s_r$. Therefore, $M^{\ell} \times p$ plaintexts can be successfully encrypted (if each guess is correct) up to one round before the last round. However, as we already mentioned in the 2nd question, we need to filter wrong guesses. Since there are at most $(2^{n_{in}} - M)^s$ expansions of each plaintext by guessing table entries, we observe at most $(2^{n_{in}} - M)^s \times M^{\ell} \times p$ wrong encryptions. To filter them, we use data, where $s + 1$ out of $\ell$ accesses to table entries are known in the last round. The probability that such data appears is f = ℓ s+1 × 1 − M

For 11-round SPNbox-16, $2^{4.97}$ (overlapped) table entries are recovered. However, since our goal is to recover the full table entries, it has not been clear whether 11-round SPNbox-16 can be attacked or not. As far as we analyze, we cannot recover any table entry for 12-, 16-, and 19-round SPNbox-16, SPNbox-24, and SPNbox-32, respectively.

Conclusion
In this paper, we propose a new attack model called hybrid code lifting for space-hard block ciphers. Space hardness has been implicitly expected to provide leakage-resilient security, but the two notions are not always equivalent. Our attack model reflects leakage-resilient security, where λ-bit leakage by a whitebox attacker running in time $2^{\tau_w}$ does not decrease the blackbox security. As an application, we showed practical program recovery attacks on Yoroi. Moreover, we also broke the security claim about the longevity.
Hybrid code lifting and our attack against longevity present many future topics for both attacks and designs for space-hard block ciphers.
On attacks, the security against hybrid code lifting on existing space-hard block ciphers such as SPACE [BI15] or WhiteBlock [FKKM16] can be discussed. We evaluated SPNbox, but the technique is straightforward. More advanced and non-trivial attacks are open questions. In particular, we leave it as an open question whether we can attack 12-, 16-, and 19-round SPNbox-16, SPNbox-24, and SPNbox-32, respectively.
On designs, our attacks against the longevity of Yoroi are critical, and Yoroi is not easy to tweak to resist our attack. Although it would be possible to resist known-space attacks alone by increasing the number of rounds, such a tweak is unlikely to resist arbitrary leakage or adaptive chosen-space attacks. The problem of designing such ciphers goes back to being an open problem.
Our attack goal was program recovery, to demonstrate an apparent vulnerability in the use case of space-hard ciphers. On the other hand, for future designs, defending against program recovery alone is unlikely to be sufficient. For example, we should not allow blackbox attackers to recover a program that can encrypt/decrypt many (but not all) plaintexts/ciphertexts. It is unavoidable that λ-bit leakage can leak λ-bit plaintext-ciphertext pairs. Therefore, one of the possible security goals is that blackbox attackers cannot output plaintext-ciphertext pairs beyond λ-bit data without an encryption oracle. In practice, as far as we analyze, 12-round SPNbox-16 can be a good candidate for ensuring such strong security.

A Trivial Chosen-Space Attack
The authors of [KI21] claimed the (N/4)-space hardness against the known/chosen-space attack. However, the analysis described in [KI21] is only for the known-space attack, and the authors overlooked a gap between the known-space attack and chosen-space attack when the space-hard block cipher uses multiple tables. Yoroi has S 1 , S 2 , and S 3 , and S 2 is used more than S 1 and S 3 . Therefore, extracting more table entries from S 2 is advantageous for attackers.
Let us consider the case of Yoroi-16. We extract $2^{n_{in}}/8$, $2^{n_{in}}/2$, and $2^{n_{in}}/8$ table entries from $S_1$, $S_2$, and $S_3$, respectively. The total leakage size is $2^{n_{in}}/8 + 2^{n_{in}}/2 + 2^{n_{in}}/8 = 3 \times 2^{n_{in}}/4$. Then, the probability that any randomly drawn plaintext is encrypted is which is clearly higher than $2^{-128}$.
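The gap between uniform and biased extraction can be made concrete with a small calculation. The following Python sketch is an estimate under stated assumptions and not a formula taken from the paper: it assumes that each of the $\ell \times R$ table lookups independently hits a leaked entry with probability equal to the leaked fraction of the corresponding table, and it takes $\ell = 8$ and $R = 8$ for Yoroi-16. The function name is hypothetical.

```python
from fractions import Fraction


def encrypt_probability(leaked_fractions, branches, rounds):
    """Probability that a random plaintext only touches leaked entries, where
    leaked_fractions = (fraction of S1, fraction of S2, fraction of S3).
    S1 and S3 are used in one round each, S2 in the remaining R - 2 rounds."""
    f1, f2, f3 = leaked_fractions
    return f1 ** branches * f2 ** (branches * (rounds - 2)) * f3 ** branches
```

With the biased extraction `(Fraction(1, 8), Fraction(1, 2), Fraction(1, 8))`, this estimate gives $2^{-96}$, whereas the uniform extraction of one quarter of every table gives $2^{-128}$ for the same total leakage.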

B Understanding Canonical Representation
We exploit the canonical representation in our attacks against Yoroi. In this appendix, we provide a small example to understand the representation. Let us consider a small-scale Yoroi, where a 4-bit bijective function is used instead of the $n_{in}$-bit bijective function. We use $t = 2$ and $m = 2$. Namely, θ and $\sigma_r$ are applied to the last 2 bits, and the top 2 bits become a direct input of the next round. We suppose that θ and $\sigma_r$ are revised adequately according to the change of bit length.
As an example, let us consider the following 4-bit bijective function S. Note that we can compute S(x) for any x from A i and B j . For example, S(1010) = η 01 ∥ρ 11 = 1011 because 1010 belongs to A ρ11 and B η01 .
We now change this S-box to the one in the canonical representation, where Property 1 holds in (E∥I) • S. Specifically, we focus on A 00 , and Then, E is defined as follows.
After we change the S-box to the one in the canonical representation by applying E∥I after the S-box, we next go to the next round. Note that the S-box in the next round is S • (E −1 ∥I) to maintain the functionality. Therefore, we next change S • (E −1 ∥I) to the one in the canonical representation.