Finding Collisions for Round-Reduced Romulus-H

. The hash function Romulus-H is a finalist in the NIST Lightweight Cryptography competition. It is based on the Hirose double block-length (DBL) construction which is provably secure when used with an ideal block cipher. However, in practice, ideal block ciphers can only be approximated. Therefore, the security of concrete instantiations must be cryptanalyzed carefully; the security margin may be higher or lower than in the secret-key setting. So far, the Hirose DBL construction has been studied with only a few other block ciphers, like IDEA and AES . However, Romulus-H uses Hirose DBL with the SKINNY block cipher where only very little analysis has been published so far. In this work, we present the first practical analysis of Romulus-H . We propose a new framework for finding collisions in hash functions based on the Hirose DBL construction. This is in contrast to previous work that only focused on free-start collisions. Our framework is based on the idea of joint differential characteristics which capture the relationship between the two block cipher calls in the Hirose DBL construction. To identify good joint differential characteristics, we propose a combination of MILP and CP models. Then, we use these characteristics in another CP model to find collisions. Finally, we apply this framework to Romulus-H and find practical collisions of the hash function for 10 out of 40 rounds and practical semi-free-start collisions for up to 14 rounds.


Introduction
Hash functions are among the most versatile and widely-used cryptographic schemes. However, compared to other cryptographic primitives, such as keyed block ciphers, estimating the security margin of hash functions appears even more challenging. Most applications rely on standardized hash functions, such as the NIST standards SHA-2 and SHA-3, which have undergone substantial scrutiny. Both are however not well-suited for more constrained platforms, which may struggle with the large state size of the Keccak permutation or the software-oriented design of SHA-2. For this reason, the National Institute of Standards and Technology (NIST) ran the Lightweight Cryptography (LWC) standardization project, a cryptographic competition for lightweight authenticated encryption schemes (AEAD) and hash functions.
One of the 10 finalists in the NIST LWC competition is the Romulus cipher suite, published by Iwata et al. at FSE 2020 [IKMP20]. Romulus comprises multiple authenticated encryption modes for different use cases and the hash function Romulus-H. All members of the Romulus cipher suite internally use the tweakable block cipher SKINNY [BJK + 16].
The hash function Romulus-H follows the MDPH construction [Nai19]. This construction combines the Hirose double block-length (DBL) mode [Hir06], a construction to build a 2n-bit compression function from an n-bit block cipher, with MDP [HPY07], an extension of the Merkle-Damgård construction that prevents length extension attacks. MDPH is proved to be indifferentiable from a random oracle up to n − log(n) bits when instantiated The SKINNY tweakable block cipher used in Romulus-H has been analyzed in an unkeyed setting in a different context, in the Matyas-Meyer-Oseas (MMO) construction. This hash function is a single-block-length construction and thus has an output size of 128 bits for SKINNY; so, it only offers 64 bits of security against collisions. Guo [GLLP22].
While the MDPH construction is provably secure when instantiated with an ideal block cipher, SKINNY only claims security in the weaker related-key model. Therefore, we need dedicated analysis of the security of using SKINNY in the MDPH construction; but so far, few results have been published. At Crypto 2021, Dong et al. [DHS + 21] published meetin-the-middle attacks on SKINNY-128-384 which they use to find preimages and free-start collisions for 23 rounds of Romulus-H with time complexity 2 248 and 2 124 , respectively. To find preimages, they use a meet-in-the-middle attack on SKINNY which they identify based on an automated approach. Their free-start collision attack is based on generating preimages for a partial target, i.e., preimages that map to a subset of size 2 x of the output space. Then, by generating about 2 x 2 preimages that map to that subset, a free-start collision can be found. So far, no dedicated results on collisions for the hash function with a constant chaining value, i.e., semi-free-start collisions, have been published to the best of our knowledge.
Our contributions. We fill this gap in the analysis of Romulus-H and present the first collision attacks on the hash function. Our contribution can be summarized as follows.
• We create a framework for analyzing hash functions based on Hirose DBL and the Merkle-Damgård construction. We propose so-called joint differential characteristics. These capture the relationship between the two block cipher calls in the Hirose DBL construction by using an additional connecting characteristic. Thus, unlike the preimage attacks by Dong et al. [DHS + 21], we do not treat the second primitive call as a black box.
• To identify these joint differential characteristics, we propose a two-step search process based on MILP and CP models. Furthermore, we show efficient CP models for finding collisions based on joint differential characteristics.
• We apply this framework to the Romulus-H hash function and provide optimizations based on the STK framework as well as on the slow diffusion of SKINNY.
• With these models, we obtain practical collisions for round-reduced variants of Romulus-H. Concretely, we show practical collisions for 10 out of 40 rounds of Romulus-H and practical semi-free-start collisions for 14 rounds.
Our results are summarized in Table 1. The first two table rows show our bounds on the minimum number of active S-boxes (#S) in joint characteristics in general and in a simplified, equality-based model. Next, we show the estimated probability rounded to the closest integer of the best characteristics we found. These estimations are under the Markov assumption, which is clearly not satisfied in the unkeyed setting. The bold values are optimal when restricted to a fixed cellwise characteristic with the minimum number of active S-boxes; we observe a large gap between an activity-based bound and actual characteristics, i.e., many S-boxes use suboptimal transitions. Yet, our results show that our tool-based approach can make efficient use of the available degrees of freedom and find solutions for characteristics with very low probability.
In Section 2, we recall the specification of Romulus-H and SKINNY, as well as background on automated differential cryptanalysis. In Section 3, we propose our framework for collision-finding in the Hirose DBL construction and propose the concept and modeling of joint characteristics. In Section 4, we apply the framework to Romulus-H and present cipher-specific optimizations as well as our results. We conclude in Section 5.

Background
In this chapter, we outline necessary preliminary knowledge, including the definition of Romulus-H and the SKINNY block cipher. Furthermore, we recall differential cryptanalysis, how it can be automated, and how it can be applied to hash functions.

The Romulus-H hash function
Romulus-H is a hash function proposed by Iwata where E(M i , TK) denotes encryption of the plaintext M i under tweakey TK.  Figure 2: The compression function of Romulus-H based on Hirose's DBL construction.

The SKINNY family of block ciphers
SKINNY is a family of tweakable block ciphers proposed by Beierle et al. [BJK + 16]. It features a 64-bit or 128-bit state and a tweakey of size 128, 256, or 384 bits. Romulus-H uses a state size of 128 bits and a tweakey of 384 bits. The state is organized as a 4 × 4 matrix of 8-bit elements. The tweakey is stored in three 4 × 4 matrices of 8-bit elements called TK 1, TK 2, and TK 3. However, instead of SKINNY-128-384, Romulus-H uses the round-reduced variant SKINNY-128-384+ with 40 instead of 56 rounds.
SKINNY's round function is depicted in Figure 3 and consists of the following steps. SubCells: An 8-bit S-box is applied to all 16 cells of the state in parallel. This S-box was chosen to meet certain minimum security requirements while minimizing the complexity of hardware implementations. Consequently, the DDT (differential distribution table) contains entries with probabilities ranging from 2 −2 to 2 −7 . Also, it is highly structured as evident from Figure 4. In this figure, each pixel corresponds to an entry in the DDT with darker pixels corresponding to higher probability. AddConstants: A round constant is xored onto the leftmost column of the state.

AddRoundTweakey:
The top two rows of the three tweakey states TK 1, TK 2, and TK 3 are xored onto the state.

ShiftRows:
The cells of the state are rotated row-wise, similar to AES but to the right.
MixColumns: Each column of the state is multiplied with the following binary matrix: This mixing layer is chosen to be lightweight and can be implemented with only three xors and some rewiring.

Differential Cryptanalysis
Differential cryptanalysis is one of the main techniques for evaluating the security of cryptographic primitives. It was proposed by Biham and Shamir in 1990 to attack the DES block cipher [BS90]. The core idea is to find differential characteristics which trace the difference between two executions of a cryptographic primitive and hold with a particular probability. In SPN designs, this probability is determined by the active S-boxes, i.e., S-boxes with nonzero input differences.
In the context of hash function cryptanalysis, differential characteristics are particularly useful for finding collisions; i.e., we are interested in vanishing characteristics with output difference 0. Besides the classic xor-differences, other difference notions such as modular differences (subtraction modulo 2 n ) and signed differences [WY05, WLF + 05] are useful. Additionally, different levels of determination of differences can be used to describe different phases of the search: Similar to truncated differential analysis [Knu94], we may consider parts of the characteristic undetermined or even refine the notion and impose additional constraints on the values [CR06].
Automating Differential Cryptanalysis. The process of finding differential characteristics has increasingly been automated. One common tool is Mixed-Integer Linear Programming (MILP), a modeling approach based on linear constraints and linear objectives, which is often used to find patterns of active S-boxes [MWGP11, ENP19, ZHWW20, MR22]. Additionally, SMT and SAT solvers are often used to find differential characteristics based on these S-box patterns [AK18,MAS15]. In the context of hash function cryptanalysis, automation is mostly driven by custom dedicated tools, particularly when searching for actual collisions [MNS11, SBK + ]. For preimage attacks, a few results using SAT solvers have been shown [HMRS12].

Framework for Finding Collisions in the Hirose DBL
This section outlines our general framework for finding collisions in hash functions based on MDPH. Our framework is based on joint differential characteristics and performs the following 4 steps.
1. Find a joint cellwise characteristic with few active S-boxes.
2. Find many joint bitwise characteristics based on the cellwise characteristic.

Joint Differential Characteristics
To apply differential analysis to the Hirose DBL construction, we define the concept of joint differential characteristics. We define this concept based on a few observations. We can use the same differential characteristic for both invocations of SKINNY. There is a connection between the two SKINNY calls within a single compression function which we describe using the connecting difference. In the first few rounds, this connecting difference is mostly zero because the initial difference affects only a single bit. We can keep track of this connecting difference by using a connecting differential characteristic. After some time, it does not pay to keep track of the connecting difference. Then, we can leave the difference unknown, as in truncated differential cryptanalysis. Therefore, we define a joint characteristic as a (main) differential characteristic δ for the tweakable SKINNY block cipher and a connecting characteristic τ , i.e., a truncated differential characteristic. Note that the tweakey difference in the connecting characteristic must be zero as both SKINNY calls in a compression function use the same tweakey. We depict the notation based on the example of a single S-box in Figure 5. For each S-box transition (δ i , τ i ) → (δ o , τ o ) in a block cipher with s-bit S-boxes, we have the following: Joint probability. We calculate the overall probability of a joint characteristic as the product of the probabilities of each S-box transition. To accurately evaluate the joint transition (δ i , τ i ) → (δ o , τ o ) through an S-box, we cannot rely only on the DDT but need additional information. For this purpose, we define two additional tables for the S-box S, the DDT3 and DDT4, depending on whether we want to know τ o or not: Additionally, we use XDDT, XDDT3, and XDDT4 to denote the underlying solution sets, i.e., the sets of input values to the S-box (x in the equation above) that lead to the desired transition. Then, the transition probability p of (δ (1) Maximum Probability of the DDT3 and DDT4. The largest entries in the DDT3 and DDT4 (for nonzero τ i ) equal the differential uniformity of the S-box. This is a special case of the following fact: While the DDT4 of an 8-bit S-box is a table with 2 32 entries, it is very sparse. For example, the DDT4 of the 8-bit SKINNY S-box only contains 2 16.8 nonzero entries. Therefore, we can store it efficiently as a sparse matrix. The DDT3 is also relatively sparse; for SKINNY there are 2 16.2 nonzero entries (out of 2 24 ).

MILP Model for Joint Cellwise Characteristics
To find joint characteristics and conforming inputs, we follow a multi-step approach. One of the main challenges is the complexity of the DDT4, which is challenging for automated solvers. Hence, we want to minimize the number of S-boxes we need to model with the full DDT4 and first search for cellwise characteristics; that is, we first only determine the activity pattern of the joint characteristics, not the precise bit-level differences. This way, we obtain upper bounds on the probability of the best joint characteristics and starting points to search for good (but not necessarily optimal) bit-level differential characteristics.
We define a MILP model to search for optimal or near-optimal joint cellwise characteristics and associated bounds. We useδ andτ to denote the cellwise activity in the main and connecting differential characteristics, respectively. To calculate cellwise bounds, we consider the maximum differential probability given by these tables. We consider three different settings to evaluate the advantage of joint characteristics: • Plain setting: We ignore any connection between the upper and lower characteristic (i.e.,τ = ? is completely undetermined) and consider the same characteristicδ independently for both SKINNY calls. For each active S-boxδ i ̸ = 0, we pay (at least) the maximum differential probability of the S-box DDT twice. In other words, we consider only the first case in Equation 1. For the cellwise model, we need only one variable per S-box, as the same optimal characteristicδ can be used for the upper and lower parts, andτ is not needed; for the bitwise model, we only need the DDT.
• Equality setting: We propagate the fixed input difference ofτ with probability 1, i.e., we deduce in which cellsτ is definitely zero. Then, for each active S-boxδ i ̸ = 0, we pay the S-box DDT transition either once (ifτ i = 0, second case in Equation 1) or else twice as in the independent setting. For the cellwise model, we use two variables per S-box forδ andτ , but only theδ variables are actual decision variables whileτ can be deduced deterministically. For the bitwise model, we still only need the DDT.
• Joint setting: We use the full definition of joint characteristics from Subsection 3.1, with the transition probabilities of Equation 1. The connection characteristicτ is modeled as a partial characteristic: Differences can be either known (τ ∈ F s 2 for bitwise characteristics orτ ∈ {0, x} for cellwise characteristics, where x denotes an active, nonzero difference) or unknown (τ = ?). In an S-box with a known, nonzero input differenceτ i ̸ = 0, we can choose to either know or forget the output difference. For the cellwise model, we need corresponding decision variables to these states; for the bitwise model, we need the DDT, DDT3, and DDT4.
MILP model for the plain setting. The objective of this model is to minimize the number of active S-boxes in the main characteristicδ in order to maximize the differential probability in the two "independent" SKINNY calls. We introduce the following binary decision variables for R-round SKINNY-128-384: X[r, i, j] is the S-box activity ofδ in round r ∈ R ′ = {0, 1, . . . , R} (where X[R, ·, ·] is the output), row i ∈ I = {0, 1, 2, 3}, and column j ∈ J = {0, 1, 2, 3}; Y [r, i, j] is the activity pattern ofδ after AddRoundTweakey for r ∈ R = {0, 1, . . . , R − 1}; and K[r, i, j] is the activity pattern of the round subtweakey. This round subtweakey is the xor of the 3 round tweakeys TK Based on these variables, a cellwise differential model of the round update function is quite straightforward using only 2-input cellwise xors. We model all xors without helper variables to minimize the solving time [ENP19].
Since we want to find collisions, we additionally constrain the input and output differences accordingly. In the DBL construction, both SKINNY instances are used with a feed-forward. Thus, we obtain a free-start collision of the compression function if the input and output difference match, i.e., X[0, i, j] = X[R, i, j] for all i ∈ I, j ∈ J . In this paper, we focus on semi-free-start collisions and full hash collisions, so we require a zero input difference in the chaining value, X[0, i, j] = 0 for all i, j. Additionally, we are limited to the TK2 setting, as only the 2 tweakey input states corresponding to the message block M can induce differences (see Figure 2).
MILP model for the equality setting and joint setting. For both settings, the objective is to minimize the joint S-box transition cost and thus maximize the joint differential probability. We introduce additional binary decision variables for the connecting characteristic τ : T [r, i, j] is the S-box activity ofτ in round r; U [r, i, j] denotes whether this activity is known (U = 0) or unknown (U = 1) at the S-box output; and F [r, i, j] marks whether a known, active input differenceτ i = x to an S-box is forgotten, i.e.,τ o = ?. Thus, U [r, i, j] denotes whether the S-box output is unknown, while U [r, i, j] − F [r, i, j] denotes whether the input is unknown.
The cost C[r, i, j] ∈ {0, 1, 2} of an S-box transition depends on all these decision variables and is measured in multiples of the differential weight of the best nonzero DDT transition. It is nonzero whenever the main characteristic is active,δ i ̸ = 0, or the connecting characteristic is active with known output difference,τ i =τ o = x (see Equation 1). It reaches the maximum cost of 2 when the main characteristic is active,δ i ̸ = 0, and the connecting characteristic has an unknown input difference,τ i = ?. This can be expressed in linear constraints as For the joint setting, we initialize For simplicity, we only describe the joint setting; the equality setting is functionally equivalent to the joint setting with U [0, 0, 0] = 1, i.e., considering the initial fixed difference of 1 as unknown, which will propagate to all following (potentially) active cells inτ . Similarly, the model is reduced to the plain setting if we set U [0, i, j] = 1 for all i, j.
We can model the effect of the round function forτ in a similar way asδ based on 2-input xors, using a simple setting with no tweakey difference as both calls use the same tweakey. Regarding the known/unknown status, we can choose to forgetτ o in any S-box with active, known inputτ i : For the xor operations, if either of the inputs is unknown, then the output is unknown; else, it is known. For example, if the S-box input in cell (i, j) of round r +1 is derived as the xor of S-box output cells (i ′ , j ′ ) and (i ′′ , j ′′ ) in round r (via ShiftRows and MixColumns), then we add We discuss the bounds obtained with this model in Subsection 4.1 and use the best cellwise characteristics as starting points for collision-finding.

SMT Model for Finding Joint Bitwise Characteristics
To find a joint bitwise characteristic, we use the Z3 SMT solver [dMB08]. Concretely, we create variables for all active S-box input and output differences in the main characteristic and variables for all active and known differences in the connecting characteristic.
To model the linear layer, we use xor constraints of the SMT solver. We only need to model those differences that are active in the truncated joint characteristic. Note that we only model the connecting differential characteristic as long as there is a known difference.
To model the S-boxes, we consider the relevant DDTs. For a given S-box, we distinguish based on the values of the activity indicators of the joint cellwise characteristic, i.e.,δ, τ i ,τ o . If both characteristics are inactive, i.e.,δ =τ i =τ o = 0, we do not need to model anything. If only the main characteristic is active and the connecting difference is inactive or unknown, i.e.,δ = x andτ i ∈ {0, ?}, then we need to model the DDT. To do this, we list all possible transitions {δ i , δ o : DDT(δ i , δ o ) > 0} in a DNF. We then convert this DNF to a small CNF using the espresso logic minimizer and pass it to the Z3 solver. If only the connecting difference is active and known, i.e.,τ i =τ o = x andδ = 0, we use an analogous model. The remaining cases are handled similarly. For the case of the connecting difference turning unknown, i.e.,τ i = x,τ o = ?, we model the DDT3. Finally, if all differences are nonzero and known, i.e.,δ =τ i =τ o = x, we model the DDT4.
While the model is functional, we perform a small modification to improve efficiency and to find better characteristics. Concretely, we define a cutoff value w c , and only consider the transitions {δ i , δ o : DDT(δ i , δ o ) ≥ w c } in a DNF. We do this analogously for the DDT3 and DDT4 with cutoff values w c,DDT3 , and w c,DDT4 , respectively. This serves two purposes. First, the resulting model will be simpler as fewer transitions need to be considered. And second, the resulting differences likely have higher probability as all transitions worse than the cutoff are left out.
We want to use the characteristics this model generates to find semi-free-start collisions and, ultimately, full collisions. To do this, we search for assignments of the characteristic with arbitrary or fixed chaining value h for semi-free-start collisions or full collisions, respectively. One factor that influences whether such an assignment exists is the probability and the degrees of freedom in the given setting. In the semi-free-start setting, we can freely choose the message block M 1 and the chaining value h. In contrast, when searching for full collisions, we only freely control the message M 1 ; the chaining value h is determined pseudorandomly based on the preceding message blocks. As an attacker, this chaining value h can thus only be influenced by generating many different preceding message blocks. We can only expect to find an assignment, i.e., a (semi-free-start) collision, if the degrees of freedom exceed the probability of the differential characteristic.
We would expect a probability of ≥ 2 −512 to be sufficient to find semi-free-start collisions and a probability of ≥ c · 2 −256 to be sufficient to find full collisions within c trials. However, when applying this model to Romulus-H in Section 4, we identify many characteristics with sufficient probability where no valid assignment exists. This is because we estimate the probability under the assumption of a Markov cipher, i.e., under the assumption that each round is totally independent of the previous round. This is not the case for most block ciphers and definitely not the case for SKINNY. First, each round of SKINNY only mixes 64 bits of tweakey material into the 128-bit state. Second, the key schedule is relatively simple as it only consists of cellwise permutations and LFSRs. Therefore, in our application to Romulus-H this only serves as a necessary condition.

SMT Model for Finding Semi-Free-Start Collisions
So far, we have found a cellwise joint characteristic and a compatible bitwise joint characteristic. Now we want to use the bitwise joint characteristic to find a semi-free-start collision, i.e., an assignment for h and M such that CF(h, M ) = CF(h, M ⊕ ∆M ).
A naive way to find such an assignment would be to model the whole compression function twice and restrict the difference between them based on the main differential characteristic and the difference within each compression function based on the connecting differential characteristic. Such a model is easy to construct; however, it creates a lot of unnecessary variables and constraints. For example, many variables must be equal due to a zero difference in the joint characteristic. Furthermore, we can even use the fixed nonzero differences in the characteristic to eliminate variables.
Instead of this naive model, we only define variables for the execution of CF(h, M ) and add conditions such that the differential characteristic is followed as done in previous work [MZ06]. We optimize our model based on the main differential characteristic and the connecting characteristic. We outline these optimizations in the paragraphs below.
Optimizations based on the main differential characteristic. We model each S-box based on the XDDT. Concretely, we model all possible values (x, y) such that y = S(x) and x ∈ XDDT[δ i , δ o ]. Similar to the model for finding characteristics, we use the espresso logic minimizer to generate a small CNF. Note that we also apply this model to the inactive S-boxes, where XDDT[0, 0] corresponds to the whole input space. By using this model, all the S-box transitions are guaranteed to happen.
For the linear layer, we only need to ensure that the values we choose for the S-boxes are also consistent with the linear layer. As the linear layer transitions happen with probability 1, we do not need any extra constraints. To build the model, we use xor constraints of the SMT solver. Note that we also need to model the round constants.
This model is already functional and fast. However, we can further improve it by considering the connecting difference.
Further optimizations based on the connecting characteristic. So far, we have seen that we only need to model one compression function with some extra constraints on the S-boxes. Now, we explain how we can use the connecting difference to reduce the number of modeled S-boxes and linear layers for this one compression function.
Consider an S-box transition with differences δ i → δ o and connecting differences τ i → τ o . If both connecting differences are known, i.e., τ i ̸ = ? and τ o ̸ = ?, then we only need to model a single S-box assignment instead of two. Concretely, we model (x, y) such that y = S(x) ∧ x ∈ XDDT4[δ i , τ i δ o , τ o ]. As before, we use the espresso logic minimizer to create an optimized CNF for this assignment. Note that if τ i = τ o = 0, then the XDDT4 equals XDDT(δ i , δ o ) and we get the same model as before, but we only need to create it once instead of twice.
If only one of the connecting differences is known, i.e., τ i ̸ = ? and τ o = ?, then we create two models for the two S-box assignments (x, y 0 ) such that y 0 = S(x) and x ∈ XDDT(δ i , δ o ) and (x, y 1 ) such that y 1 = S(x ⊕ τ i ) and (x ⊕ τ i ) ∈ XDDT(δ i , δ o ). These are modeled as two separate minified CNFs.
If both connecting differences are unknown, i.e., τ i = ? and τ o = ?, then we need to use the model outlined above. That is, we model the two separate S-box assignments (x 0 , y 0 ) and (x 1 , y 1 ) according to the XDDT.
To model the linear layer, we can use similar observations. In particular, we only need to include the linear layer for those values that we actually use in the S-box model above. Concretely, if the connecting difference τ at an output of the linear layer is known (τ ̸ = ?), we only need to model this calculation once. Otherwise, we need to model it twice.

Extension to Full Collisions
While (semi-)free-start collisions show potential weaknesses in a hash function, we ultimately want to find full collisions. That is, we want to find two messages M 1 and M 2 such that H(M 1 ) = H(M 2 ) and M 1 ̸ = M 2 .
For our search, this means that the input chaining value is constrained, either to 0 (if we target the first message block) or to the chaining value produced by a fixed prefix. We still want to take advantage of the (limited) degrees of freedom in the chaining value, so we repeatedly pick a random prefix M pre . This prefix leads to a fixed chaining value h = CF(0 256 , M pre ) as defined by the MDPH construction. Then, we use this chaining value as a starting point for a collision in the next compression function call. Consequently, we search for messages of the following form: To model this setting, we can reuse the SMT model from the previous subsection with only a minor modification. Instead of leaving the chaining value h free, we use the fixed chaining value h = CF(0 256 , M pre ). This way, the assignment for M will not only be a semi-free-start collision but a full collision for the hash function. The setup for the colliding messages in the analyzed construction is depicted in Figure 6.  We do not expect that this process will be successful on the first try. Therefore, we generate many random prefix values M pre until we find an assignment for M . Note that this process can be performed in parallel on many cores.

Application to Romulus-H
Now, we apply the framework to find hash collisions from the previous section to Romulus-H. We introduce dedicated optimizations based on the STK-framework and SKINNY and discuss the results.

Results for Joint Cellwise Characteristics
We use the unmodified MILP model to derive upper bounds on the number of active S-boxes in joint cellwise characteristics.
The results of these models are listed in Table 2. Based on these bounds for the number of S-boxes, we can also find a bound on the best joint differential characteristics. However, we do not expect these bounds to be very tight because the concrete transitions for each S-box have a big range of probability from 2 −2 to 2 −7 . For a valid bound, we need to assume all S-box transitions use the best case of 2 −2 which will not be the case in practice. In addition to the bound, we also obtain optimal joint cellwise characteristics. The optimal characteristic in the joint setting for 14 rounds is depicted in Figure 7.
Based on these results, we see that the idea of joint differential characteristics gains importance as the number of S-boxes grows. In particular, for 16 rounds, only 74 active S-boxes are needed in the joint settings while 96 are needed in the equality setting.

Finding Joint Bitwise Characteristics for Romulus-H
Next, we use the cellwise characteristics from the previous step to find bitwise characteristics for Romulus-H. For this purpose, we introduce an optimized model for the tweakey schedule which applies to all STK constructions. Furthermore, we discuss different trade-offs and approaches for obtaining minimized CNFs of the large DDT4 tables.

Optimized model for the tweakey schedule
We want to create a model for all possible round tweakey differences that are compatible with the cellwise joint characteristic. In our setup, we only need to model the TK 2 and TK 3 lanes of each active tweakey lane. The TK 1 lanes always have zero difference, as these are given by the chaining value in the MDPH mode of operation. We want to model the LFSRs of the tweakey schedule with as few variables and as few xor constraints as possible. Therefore, we take inspiration from the technique outlined by Soos et al. [SNC09]. They propose to choose the reference bits, i.e., the actual variables the solver solves for, in the middle of the output stream of the LFSR. This way, fewer xor constraints are needed as the output bits are closer to the reference bits of the model.
Additionally, we use the fact that the LFSRs for TK 2 and TK 3 are exact inverses of each other and combine that with the known cancellations in the cellwise characteristic in the tweakey schedule. If there is a cancellation between TK 2 and TK 3 in the tweakey schedule, we know that the two differences in this cell must be equal. Therefore, we can create a single set of 8 variables as a reference state to model the difference for the two active tweakey lanes TK 2 and TK 3. Then, based on these 8 variables, we create all the linear expressions that we need to model the TK 2 and TK 3 lane. By using these ideas, we only need 8 variables instead of 16 to model one TK 2 lane and the associated TK 3 lane.

Optimized CNF encoding of the DDT4
Creating a small representation of the DDT4 in CNF formulation is challenging because it is so large. In particular, just using the espresso logic minimizer does not lead to results, even after multiple days. Instead, we can use the SAT solver Lingeling, which supports CNF minimization with time limits, to minimize the CNF [Bie17]. Concretely, our approach is performed in 4 steps. First, we create a DNF of all entries above or equal to the cutoff value w c,DDT4 . Second, we convert this DNF to a CNF by using espresso without optimization. We use the .phase 0 option to swap the set of ones and zeros after reading the input. This way, we can specify the much smaller DNF, and it is internally converted to a CNF of a reasonable size. Instead of optimizing this CNF, we use the -Decho command line flag to disable all optimization steps and receive a reasonably sized CNF. Third, we note that a valid DDT transition is a necessary condition for a valid DDT4 transition. Hence, we add a CNF of {δ i , δ o , : DDT(δ i , δ o ) ≥ w c,DDT4 } and {τ i , τ o , : DDT(τ i , τ o ) ≥ w c,DDT4 }. Finally, we use Lingeling on this combined CNF in minimization mode to eliminate many of the now redundant clauses.
When we apply this model to the formulation of DDT4 } with cutoff w c,DDT4 = 8 we see a drastic decrease in the number of required clauses. Instead of listing the nearly 2 32 entries below the cutoff, we only list the 2 15.7 entries above or equal to the cutoff thanks to the .phase 0 option. When running espresso with all optimization steps disabled, we receive a CNF with 146 317 = 2 17.2 clauses within seconds. Then, we add 217 clauses for each of the two DDTs. Finally, we run Lingeling in optimization mode and receive a CNF with only 31 578 = 2 14.9 clauses after a few hours.

Results for Bitwise Characteristics and Semi-Free-Start Collision
We apply this model to the best cellwise joint characteristic for 10 rounds and find that choosing the right cutoff values is crucial for identifying good characteristics quickly. When using w c = 4, w c,DDT3 = w c,DDT4 = 8, the SMT solver reports that no such characteristics exist after a few seconds. When using w c = 4, w c,DDT3 = w c,DDT4 = 4, we find a new joint characteristic about every 20 seconds on a laptop. The identified joint characteristics have probabilities ranging from 2 −219 to 2 −229 .
To speed up the search, we also apply the model to the best cellwise joint characteristic for 10 rounds in the equality model. Then, we do not need to model the heavy DDT4. When using a DDT cutoff of w c = 4, the Z3 solver needs about 250 milliseconds for each joint characteristic. While these have lower probabilities on average, we can generate many more of them within the same time. After generating 355 joint characteristics, we find a satisfiable one with probability p = 2 −233.8 which is depicted in Figure 8. The green border indicates which S-boxes have a connecting difference that is known to be zero.
We also apply the model to the best cellwise joint characteristic for 14 rounds in the equality model. Because a cutoff of w c = 4 leads to an unsatisfiable model, we use w c = 2. In this configuration, Z3 needs about 600 milliseconds to generate a new joint characteristic. After generating 405 joint characteristics, we find a satisfiable one with probability p = 2 −419.66 which is depicted in Figure 9. Interestingly, we did not find any additional satisfiable characteristics, even after generating many thousands more.
For this joint characteristic, we can find valid assignments to obtain a semi-free-start collision for 14 rounds of Romulus-H. Concretely, we find a chaining value h and two messages M 1 , M 2 such that CF(h, M 1 ) = CF(h, M 2 ), where CF denotes the 14-round compression function: Finding this semi-free-start based on the characteristic takes about 60 seconds on a laptop. The main bottleneck which prevents extending these results to more rounds is finding characteristics that are actually satisfiable. We generated thousands of characteristics for 15 rounds; none of them were satisfiable. Based on the consideration of degrees of freedom and probabilities, there might be solutions for 15 rounds as well. However, there seem to be many hidden contradictions because SKINNY is not a Markov cipher.

Bounding Joint Differential Characteristics
We have established that there is a link between satisfiability and the probability of a differential characteristic. This raises the question of whether we can extend our attacks by explicitly searching for characteristics with high probability.  To find the best joint differential characteristics compatible with a given cellwise trail, we use a technique similar to the one in CryptoSMT [Kö]. The main idea is to define an upper limit on the overall log-probability w max and then find a value x such that the model with w max = x is satisfiable while the model with w max = x − 1 is not. To build this model, we use the cellwise characteristics obtained from the equality model as a basis. This makes our model easier, as we do not have to consider the large DDT3 and DDT4. Concretely, we split the DDT into sub-tables w-DDT where for each weight w ∈ {0, 2, 3, 4, 5, 6, 7}. Each table w − DDT corresponds to those entries in the DDT with associated probability 2 w−1 < p ≤ 2 w . Then we create a separate CNF for each w-DDT. For each S-box, we then create indicator variables v w for each value of w and model v w ⇒ (δ i , δ o ∈ w-DDT). Additionally, we need to make sure that at least one of the indicator variables is true, so we add the following constraint: w v w . With this model, we might slightly overestimate the probability of our differential characteristic (a characteristic with probability slightly worse 2 x might be satisfiable with w max = x) because we are grouping transitions with non-power-of-two probability with the next higher power of two. However, this effect is limited as of the 11 469 transitions, only 170 have a probability that is not a power of two.
Results. The probabilities of the best characteristics identified with this model are listed in Table 1 on page 3. The probabilities typeset in bold font correspond to tight bounds where either all transitions used a power-of-two entry in the DDT or it has been verified with a more exact model that creates a separate class for each possible transition probability.  The other probabilities are the best characteristics identified but not necessarily optimal for the given cellwise characteristic.

Finding Collisions for Romulus-H
Compared to semi-free-start collisions, when searching for full collisions, we are much more limited by the fixed chaining value h, which is determined by the prefix M pre . To find semi-free-start collisions, we use the approach from Subsection 3.4. However, as the number of rounds increases, many of the bitwise joint characteristics are not satisfiable. We thus generate many characteristics in a loop and immediately check satisfiability in a semi-free start setting; this way, we reduce the number of characteristics for the full search. When searching for full collisions, we can perform some optimizations based on the unkeyed first round of SKINNY. Remember that each prefix M pre leads to a fixed chaining value h L at the input of SKINNY block cipher. Therefore, we have limited degrees of freedom in the first few rounds. The first SubCells layer is completely tweakey-independent (and thus message-independent). Any active S-box in the connecting differential characteristic will only be valid for a subset of chaining values. Similarly, we can optimize based on the second round, where the tweakey dependency is still low. The second SubCells layer only depends on 64 bits of tweakey material which is only slowly mixed with the state. Remember that tweakey addition is performed on the first and second row only. According to the definition of MixColumns, the second row only influences the third row after mixing, while the third row influences row 1, 2, and 4. Now, consider the characteristic in Figure 10. The three highlighted S-box transitions in the second round only depend on a single tweakey byte. Consequently, only a small fraction of all h L values permit an assignment of the free tweakey values such that the characteristic is followed in the first two rounds. In the example from Figure 10, this fraction is 2 −11 . Based on these observations, we can efficiently filter the chaining values. Then, we only need to call the SAT solver for a small fraction of chaining values. This filtering step for chaining values is crucial for finding collisions in reasonable time.  We did not find any full collisions for more than 10 rounds. The main problem seems to be that with more rounds, the probability of our characteristics decreases drastically, and for 12 rounds, the best characteristic has probability 2 −302.9 , which is lower than the 256 degrees of freedom we have due to the message input. For 12 rounds, further optimizations are necessary to have a chance of obtaining practical attacks with this approach.

Conclusion
In this work, we developed a new framework for analyzing the hash functions based on the Hirose DBL construction and the MDP mode of operation. This framework uses the idea of joint differential characteristics which capture the dependencies within the Hirose DBL constructions. We applied this framework to Romulus-H and found practical collisions up to 10 out of 40 rounds and practical semi-free-start collisions up to 14 rounds. These are the first practical attacks on round-reduced Romulus-H, and indeed the first attacks on the reduced hash function with complexity below the 128-bit security claim. The results do not threaten full-round Romulus-H. Our framework shows that the dependencies between the two block cipher calls in the Hirose DBL construction should be taken into account. These dependencies increase the number of rounds that can be attacked as fewer S-boxes need to be considered. Our results only apply to round-reduced versions of Romulus-H and do not threaten the security of the full version. Based on these results, we gain valuable insights into the dependencies within the Hirose DBL construction.
These insights warrant future work on Romulus-H and other hash functions based on Hirose DBL. As our main bottleneck is finding satisfiable characteristics, we believe further research into why so many characteristics are not satisfiable might improve upon our results. A thorough understanding of these conflicts that lead to not satisfiable characteristics would allow avoiding them much earlier. One source of such conflicts is the limited key addition of SKINNY, as we have seen when searching for full collisions. Another potential direction for future work is to explore different cellwise characteristics as starting points.
In this work, we have focused on practical attacks, which leads to the question of whether we can extend the number of attacked rounds based on theoretical estimates. One idea is to apply the approach of Wei et al. to Romulus-H to find free-start collisions. Furthermore, a version of the Rebound attack that considers joint differential characteristics might give semi-free-start collisions for even more rounds. We conclude that our research on joint differential characteristics may serve as a starting point for several lines of research.