PLCrypto: A Symmetric Cryptographic Library for Programmable Logic Controllers

Programmable Logic Controllers (PLCs) are control devices widely used in industrial automation. They can be found in critical infrastructures like power grids, water systems, nuclear plants, manufacturing systems, etc. This paper introduces PLCrypto, a software cryptographic library that implements lightweight symmetric cryptographic algorithms for PLCs using a standard PLC programming language called structured text (ST). To the best of our knowledge, PLCrypto is the first ST-based cryptographic library that is executable on commercial off-the-shelf PLCs. PLCrypto includes a wide range of commonly used algorithms, totaling ten algorithms, including one-way functions, message authentication codes, hash functions, block ciphers, and pseudo-random functions/generators. PLCrypto can be used to protect the confidentiality and integrity of data on PLCs without additional hardware or firmware modification. This paper also presents general optimization methodologies and techniques used in PLCrypto for implementing primitive operations like bit-shifting/rotation, substitution, and permutation. The optimization tricks we distilled from our practice can also guide future implementation of other computation-heavy programs on PLCs. To demonstrate a use case of PLCrypto in practice, we further realize a cryptographic protocol called proof of aliveness as a case study. We benchmarked the algorithms and protocols in PLCrypto on a commercial PLC, Allen Bradley ControlLogix 5571, which is widely used in the real world. Also, we make our source code publicly available, so plant operators can freely deploy our library in practice.


Introduction
It is indisputable that the Industrial Internet of Things (IIoT) adoption in industrial control systems or critical infrastructures has excellent potential in the future. The concept of IIoT allows all components in such systems to be connected and coordinated intelligently and efficiently. Research conducted by Morgan Stanley and Automation World Magazine in 2015 predicted that the global market of IIoT would grow to 123 billion USD in 2021 [Sta16]. However, according to the same report, the manufacturers expressed their concerns about cybersecurity and their legacy-installed base [Sta16]. This reality imposes a vital question for us: how to secure legacy industrial systems? In this paper, we will show our solution as a significant step towards solving this problem. In particular, we will show how to retrofit legacy programmable logic controllers (PLCs) to secure their communications against network attackers without additional hardware or new firmware.
The cores of industrial control systems are PLCs that control the physical processes directly. Thus, PLCs are usually the primary targets for attackers to compromise. However, widely used commercial PLCs lack proper security protections, like encryption and authentication. For example, suppose attackers are somehow connected to the operational technology network. In that case, they can easily intercept and manipulate the communication (e.g., over Common Industrial Protocol (CIP)) between PLCs and supervisory control and data acquisition (SCADA) servers.
To formalize the security needs in the automation industry, ODVA drafted in 2015 the first version of a security specification, CIP Security [ODV19], for the communication of design automation devices, including PLCs. In the specification, ODVA highlighted the need for device authentication, data integrity, and data confidentiality. Following this specification, PLC vendors have started designing and producing new CIP Security-capable devices. However, the reality is that there are still a huge number of legacy devices running in the field, and the average lifetime of devices in factories is around 20 years [Dec18]. This also means that it will take at least another 20 years for the manufacturers to fully adapt to today's technology. Even today, CIP Security-capable PLCs are not mainstream products from leading PLC vendors. For example, according to a document published in September 2019 by Rockwell Automation, one leading PLC vendor, only one of its PLC models, ControlLogix 5580, supports CIP Security [Bra19].
Our Solution. To support legacy PLCs in the real world with no extra cost, we propose to secure PLC communications by developing a comprehensive symmetric cryptographic library, PLCrypto, on the control logic layer. Note that the control logic program is running on top of the firmware of a PLC. It is the only layer that one PLC user (e.g., an operator of an industrial control system) can program to realize various control and computation functionalities.
We develop our cryptographic library on Allen Bradley PLCs from Rockwell Automation [Roc20] because it is one of the two leading PLC vendors (SIEMENS and Rockwell Automation) in the world, each of which has more than 20% market share in the global market of PLCs as of 2017 [Deu17]. Although the implementations we developed are specific for Allen Bradley PLCs, our library can be easily migrated to the PLCs from other vendors. It is because PLCrypto is developed in structured text (ST), which is one of the standard programming languages for PLCs defined in IEC-61131-3 [JT13].
Realizing cryptographic algorithms on PLCs is challenging. The main difficulties we encountered are:
1. To program PLCs, we have to use a particular set of programming languages defined in IEC-61131-3 [JT13], i.e., ladder diagram, function block diagram, structured text, sequential function chart, and instruction list (deprecated). We selected structured text for developing our library because other PLC programming languages are graphical programming languages and are not suitable for implementing complex arithmetic operations.
the PLCs. The attacker may exploit this feature of PLC to implement a family of attacks, which we jointly call tag manipulation attacks.
4. PLCs are primarily used to control physical processes, so they are not optimized for complex logical or arithmetic operations in cryptographic algorithms. During the process of implementing PLCrypto, we discovered a variety of optimization tricks. For example, we notice that we can easily access individual bits in variables on PLCs. Based on this observation, we introduced one trick called hard-coding to significantly improve the performance of certain cryptographic algorithms. For example, our hard-coding implementation of a one-way function (OWF) is 2× faster than the straightforward implementation.
PLCrypto includes cryptographic algorithms providing the security properties that are highly demanded in real-world applications, such as confidentiality, integrity, collision-resistance, and pseudo-randomness. Most of the included algorithms are either standardized by ISO/IEC or the state-of-the-art symmetric cryptographic algorithms that fit the PLC environment.  [GPP11,JG16] and SPONGENT [BKL+13]. PLCrypto also contains some subset-sum based cryptographic primitives as efficient alternatives, like one-way functions (OWF) and universal one-way hash functions (UOWHF). In the meantime, we realize some basic operations (such as big integer addition and subtraction) to support our implementation of block ciphers. Note that the resilience to potential sophisticated side-channel attacks is not considered in this paper and is left as future work.
To demonstrate how one can extend the application of PLCrypto beyond communication security, we use PLCrypto to implement a proof of aliveness (PoA) protocol [JYvDZ19] on an Allen Bradley PLC. The protocol was designed to prove the aliveness of devices in critical infrastructures to a remote server. To the best of our knowledge, this is the first implementation of proof of aliveness protocol on a commercial off-the-shelf PLC, and the authors of [JYvDZ19] only implemented their protocol on a much more powerful device, Raspberry Pi, in C as a prototype. Our new evaluation results of PoA demonstrate its practicality on commercial PLCs.
By making this library publicly available, we believe the research community can benefit from it for future research. Open-source codes also greatly facilitate the deployment of cryptographic algorithms on legacy PLCs by the plant operators, who typically have no background in cryptography or cryptographic engineering.

Contributions.
In this paper, we made the following significant contributions:
1. To the best of our knowledge, PLCrypto is the first cryptographic library implemented for PLCs using the languages defined in IEC-61131-3. This allows cryptography to be easily integrated into industrial systems to protect communications without the need for additional hardware or firmware modification.
attaching another device called SCADA Cryptographic Module (SCM) to both parties of communication. Hence, the SCM becomes a proxy that can encrypt the message sent from the source and decrypt the message at the destination. A similar idea is also realized in [CATO17], where the authors proposed to reduce the computational overhead of the cryptographic methods by selectively encrypting security-critical messages, e.g., reading/writing requests. The proposed framework is implemented using a Raspberry Pi to tap the communication. Although this solution can be realized in a system with commercial PLCs, it is still not scalable as it requires new devices as a proxy for every PLC. Besides encryption and signature, authenticated key exchange protocols were also proposed for the applications on PLCs in critical infrastructures [JYAZ19]. It exploits the historical data stored on the SCADA server as additional authentication factors. To make this authenticated key exchange protocol compatible with legacy PLCs, one additional proxy has to be added.
Firmware Modification. Some researchers took a different approach to introduce cryptography into PLC systems. They tried to propose methods that are integrated into the firmware of PLCs. However, typically only the PLC vendors have the source codes of PLC firmware, and only the vendors can modify it. Alves et al. proposed to add a cryptographic layer in the network layer of PLCs, so they used AES-256 in cipher block chaining mode to protect both the confidentiality and integrity of messages [AMY17]. As a follow-up research, Alves et al. augmented the cryptographic layer in [AMY17] with machine learning-based intrusion detection in the framework of OpenPLC [ADM18]. Cryptographic encryption and integrity check are also embedded in Snapshotter design in [JVvD18]. Snapshotter system is a secure logging system of PLCs, which logs all security-related events on PLCs and then encrypts the messages using AES. All the above proposals require modifying the firmware of PLCs for tighter integration, so the performance evaluations in [AMY17,ADM18,JVvD18]
In this JavaScript crypto library, they introduced several optimization techniques (including hard-coding strategies) tailored to the specific characteristics of JavaScript interpreters. We stress that the traditional hard-coding and bitslicing optimization techniques cannot be straightforwardly applied to implement cryptographic algorithms on PLCs either. That is, we face many unique challenges and difficulties in realizing PLCrypto as mentioned in Section 1, e.g., the restriction imposed by the programming language and the physical constraints of the resource-constrained devices.
Beyond those difficulties, we also need to prevent tag manipulation attacks (detailed in Section 4), which are unique on PLCs.

Preliminaries
In this section, we introduce various notations and cryptographic primitives used in this paper.
General Notations. We denote the security parameter by $\kappa$, an empty string by $\emptyset$, and the set of integers between $0$ and $n-1$ by $[n] = \{0, \ldots, n-1\}$. If $X$ is a set, then $x \stackrel{\$}{\leftarrow} X$ denotes the action of sampling a uniformly random element from $X$. If $X$ is a probabilistic algorithm, then $x \stackrel{\$}{\leftarrow} X$ denotes that $X$ runs with fresh random coins and returns $x$. We denote the binary representation of a value $X$ with bit size $l_n$ as a vector $x$, so that $x$ can be represented as a bit array. We let $|\cdot|$ be an operation to calculate the bit-length of a value. We let $||$ be an operation of concatenating two strings. We let $0^n$ denote a bit string consisting of $n$ zeros.
In Table 1, we summarize some important notations used in this paper.

Background of PLC Programming
PLC Basics. Programmable logic controllers (PLCs) are a class of embedded devices designed specifically for controlling industrial devices (such as sensors and actuators) in industrial control systems. On a PLC, the control program runs periodically. In each period, the PLC takes inputs, executes its control program based on the inputs (from sensors), and generates outputs (to steer actuators). The period is typically called a scan cycle. All variables are called tags in PLC programming, so we use "tags" and "variables" interchangeably hereinafter. In addition, PLCs are usually connected with the supervisory control and data acquisition (SCADA) system in an industrial control system. The SCADA system is responsible for collecting the operational data provided by all PLCs in the system and coordinating their control behavior to achieve the best performance.
Benefits. An advantage of ST compared to other high-level programming languages is that it allows direct access to every bit of a SINT or DINT type variable. This feature can greatly boost the performance of cryptographic algorithms on PLCs: one can append ".[·]" to a SINT or DINT variable to read a bit of the variable. For example, given a DINT variable t, we can get the 6-th bit of t by using t.[5] or t.5.
Drawbacks. ST does not have pointer like data type (as in C language), and it does not support dynamic memory management. Therefore, sending parameters between two routines (functions) might be costly, particularly concerning large-sized data (e.g., hash messages). Also, ST does not have bit-wise shift/rotate instructions, which are primitive functions used in many cryptographic algorithms (e.g., block ciphers). These drawbacks of ST impede the performance of the cryptographic algorithms running on PLC.
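To make the contrast concrete, the following Python sketch models the bit access that ST offers through `t.[i]`, and shows how a rotation can be assembled from single-bit moves when no rotate instruction is available. It is only an illustrative model, not PLC code: the helper names `get_bit`, `set_bit`, and `rotl32` are hypothetical, and a DINT is modeled as an unsigned 32-bit word (its sign is ignored here).

```python
MASK32 = 0xFFFFFFFF  # model a DINT as a 32-bit word (sign ignored in this sketch)


def get_bit(t: int, i: int) -> int:
    """Model of the ST expression t.[i]: read bit i (0 = least significant)."""
    return (t >> i) & 1


def set_bit(t: int, i: int, b: int) -> int:
    """Model of the ST assignment t.[i] := b; returns the updated word."""
    return (t & ~(1 << i) & MASK32) | ((b & 1) << i)


def rotl32(t: int, r: int) -> int:
    """Left rotation built only from single-bit moves."""
    s = 0
    for i in range(32):
        s = set_bit(s, (i + r) % 32, get_bit(t, i))
    return s


t = 0b100000                       # only bit 5 is set
print(get_bit(t, 5))               # the "6-th bit" of t, i.e. t.[5]
print(hex(rotl32(0x80000001, 1)))  # bit 31 wraps around to bit 0
```

In Python each bit read costs a shift and a mask, which is exactly the overhead that direct bit addressing avoids on the PLC.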

Subset-sum Problem
Let $A = \{a_0, a_1, \ldots, a_{l_n-1}\}$ be a set of $l_n$ numbers, where $l_n \in \mathbb{N}$ and each $a_i$ (for $i \in [l_n]$) is an $l_n$-bit integer. The subset-sum problem is one of Karp's NP-complete problems [IN96, DRX17, CG20], which can be viewed as inverting the following function:
$$F_{sss}(x, A) = \sum_{i=0}^{l_n-1} x_i \cdot a_i,$$
where $x = (x_0, \ldots, x_{l_n-1}) \in \{0, 1\}^{l_n}$, and $A$ is a fixed parameter of $F_{sss}$. In PLCrypto, we choose the modular addition under the field $\mathbb{F}_{2^{l_n}}$ for efficiency (i.e., the sum is reduced modulo $2^{l_n}$). Given a target value $t \in \{0, 1\}^{l_n}$, inverting $F_{sss}$ is to find an appropriate $x$ such that $F_{sss}(x, A) = t$. As shown in [IN96], many cryptographic primitives, such as one-way function (OWF) and universal one-way hash function (UOWHF), can be built from the subset-sum problem. Namely, the function $F_{sss}(x, A)$ directly implies the construction of OWF. A UOWHF [NY89] is also known as a target collision-resistant hash function, for which it is hard to find a collision where one pre-image is chosen independently of the hash function parameters. UOWHF can be realized with $F_{sss}(x, A)$ by appropriately truncating some bits from its output for compression purposes. We denote such a UOWHF by $H_{sss}$, which takes as input a message $m \in \{0, 1\}^{l_m}$ and outputs a hash value $t \in \{0, 1\}^{l_h}$. Moreover, Steinfeld et al. [SPW06] pointed out that higher-order UOWHF can also be constructed from the subset-sum assumption, so it is feasible to build a UOWHF function with variable input-length $l_m \geq l_n$ using $F_{sss}(x, A)$ as a compression function. UOWHF has many cryptographic applications, e.g., it is widely used for hashing long messages before signing with a digital signature scheme.
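As a small illustration of the subset-sum function, the Python sketch below adds up every $a_i$ whose selector bit $x_i$ is 1. It assumes that sums are reduced modulo $2^{l_n}$, and that the truncated variant keeps the $l_h$ low-order bits; which bits are dropped is an assumption of this sketch, as are the names `f_sss` and `h_sss`. The toy parameters are far too small to offer any security.

```python
import secrets


def f_sss(x_bits, A, l_n):
    """Subset-sum function: add every a_i whose selector bit x_i is 1, mod 2^l_n."""
    assert len(x_bits) == len(A) == l_n
    mask = (1 << l_n) - 1
    total = 0
    for x_i, a_i in zip(x_bits, A):
        if x_i:
            total = (total + a_i) & mask  # keep only the low l_n bits
    return total


def h_sss(x_bits, A, l_n, l_h):
    """Truncated variant (assumption: keep the l_h low-order bits of F_sss)."""
    return f_sss(x_bits, A, l_n) & ((1 << l_h) - 1)


# Toy parameters for demonstration only.
L_N = 8
A = [secrets.randbits(L_N) for _ in range(L_N)]
x = [1, 0, 1, 1, 0, 0, 1, 0]
print(f_sss(x, A, L_N), h_sss(x, A, L_N, 4))
```

Evaluating the function is a sequence of conditional additions, which is why it maps well onto a controller with cheap per-bit access to $x$.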

Algorithms in PLCrypto
In PLCrypto, we include cryptographic algorithms providing the security properties that are mostly desired in real-world applications, such as confidentiality, integrity, collision-resistance, and pseudo-randomness. Most of the included algorithms are either standardized by ISO/IEC or the state-of-the-art symmetric cryptographic algorithms that fit the PLC environment. Besides subset-sum problem based OWF and  [GPP11,JG16] and SPONGENT [BKL+13]. We also include a cryptographic protocol called Proof of Aliveness (PoA) [JYvDZ19], which serves as an application example to show how to use the cryptographic algorithms in PLCrypto. As our implementation may require big-integer operations, we also realize some basic operations (such as addition, subtraction, multiplication, and division) to show their performance on PLCs. More details of these algorithms are reviewed in Appendix A.

Threat Model
In this section, we describe the threats against PLCs. Generally speaking, most commodity PLC platforms offer little security protection for remote access, which enables attackers to exploit systematic vulnerabilities. In the following, we discuss the attacker capability and the settings of PLCs.
Attackers. To better illustrate the threats against PLCs, we present the high-level system model and threat model in Figure 2. PLCs are connected via a wired/wireless industrial local area network to computers (e.g., SCADA in the monitor center, or even another PLC in the network). The PLCs can exchange system operation data and control messages with the SCADA and other connected PLCs. However, since the network modules of many commercial PLCs do not support any security features, the network communication to PLCs is usually unprotected and open to attackers once they get access to one of the devices or access points in the network.
In this work, we mainly consider network attackers against PLCs, who can take control of the communication of a plant where the target PLCs are installed. The goal of the network attackers is to manipulate the executable code or data stored in the PLC and try to steal the secrets stored in the PLC (e.g., the system status and parameters). We assume such attackers could be powerful enough to connect to PLCs and to receive/inject new messages via communication ports (e.g., launched by remote PLC management software such as Studio 5000 [Bra09b] or third-party customized program like Pycomm [AM18]). In addition, a network attacker may leverage its communication power to download/upload the control program from/to the PLC, if it is allowed, and manipulate the tags of the PLC (i.e., read and modify tags). However, we do not consider insider attackers who can physically access the PLC and locally manipulate it. We also assume that no additional security appliance is attached to the PLCs to provide secure communication, such as Stratix 5950 [Bra20], which may cost thousands of dollars each.

Tag Manipulation Attack.
To facilitate the control and supervision of PLCs, modern PLCs usually support online tag reading/writing capabilities without interrupting the operations of PLCs. Any tags (variables) in the control program can be written or read anytime, including when the program is being executed (within scan cycles). This feature is very useful when operators need to update certain parameters of the system without shutting down the whole service. However, this online tag manipulation capability would trivially lead to a family of attacks on cryptographic algorithms, which we collectively call tag manipulation attacks (TMA). A simple read or write operation to a sensitive tag can leak sensitive information (e.g., secret keys) and compromise operational parameters (e.g., the rotational speed of connected centrifuges, in the case of Stuxnet [FMC11]). For the security of PLCrypto, we introduce tag manipulation attacks, as a kind of adversaries that are unique on PLC platforms. Tag manipulation attacks would jeopardize not only the confidentiality but also the integrity of critical tag values. In the implementation section of PLCrypto, we will show the effects of tag manipulation attacks on various algorithms and how to prevent them. We assume that no extra trusted hardware is used to store the cryptographic secrets for PLC. Note that we are the first to introduce this unique threat model and propose countermeasures to the attacks on PLC platforms.

Sketch of the Abstract Security Model for PLCrypto.
Including the aforementioned tag manipulation attacks, we can model the attack capabilities by a setup with interactive Turing machines and a traditional black-box security notion N of the corresponding algorithm (e.g., N = Authenticated encryption with associated data), where
• The "game" machine maintains all variables in two mutually-exclusive collections HL.x and TL.x, for hard-coded tags and the other tags, respectively.
• All variables HL.x in HL are assigned a value in game initialization.
• Any variable TL.x can be created, written, read, or deleted by (the code of) the oracles of N.
• Apart from the oracles that the traditional security notion N would expose, the attacker gets two additional oracles readVar(varName) and writeVar(varName, varValue).
  - readVar(varName) returns the latest value of the variable TL.varName for any currently existing variable in TL.
  - writeVar(varName, varValue) sets the value of TL.varName to varValue for any currently existing variable in TL.
• The oracle calls are atomic, so any variables created or deleted in a single oracle call are not visible to the attacker, as in the real world.
Note that there is still a discrepancy with real life, which is ensuring that a function can execute within a single scan cycle on a device. As this is platform-dependent, this assumption must be checked per-device model.
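The model above can be pictured with a toy game machine. The Python sketch below is an illustration under stated assumptions rather than a formal definition: the class `Game`, the oracle `oracle_tag`, and the tag names are hypothetical, and the SHA-256 call is only a placeholder for an oracle of N, not a real MAC construction.

```python
import hashlib


class Game:
    """Toy game machine with hard-coded (HL) and ordinary (TL) tags."""

    def __init__(self, hard_coded):
        self._HL = dict(hard_coded)  # fixed at initialization, never exposed
        self._TL = {}                # tags created and used by the oracles of N

    def read_var(self, name):
        """readVar: latest value of a currently existing TL tag."""
        if name not in self._TL:
            raise KeyError(name)
        return self._TL[name]

    def write_var(self, name, value):
        """writeVar: overwrite a currently existing TL tag."""
        if name not in self._TL:
            raise KeyError(name)
        self._TL[name] = value

    def oracle_tag(self, message: bytes) -> str:
        """Stand-in oracle of N; the whole call is atomic, like one scan cycle."""
        self._TL["tmp_key"] = self._HL["key"]       # temporary working copy
        out = hashlib.sha256(self._TL["tmp_key"] + message).hexdigest()
        del self._TL["tmp_key"]                     # gone before the call returns
        self._TL["last_message"] = message          # state that outlives the call
        return out


g = Game({"key": b"demo-key"})
g.oracle_tag(b"hello")
print(g.read_var("last_message"))  # persistent TL tag: visible to the attacker
```

Because `tmp_key` is created and deleted inside one atomic call, neither `read_var` nor `write_var` can ever reach it, whereas `last_message` stays exposed between calls.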

System Level Settings for Security
PLC Settings. To guarantee the secure implementation of our cryptographic library in a PLC against network attackers (i.e., preventing the attackers from trivially stealing the secrets in a PLC), we first assume that the PLC is switched to "run" mode using a physical switch on the PLC, so that network attackers are not allowed to download/upload a project to/from it. If the PLC needs to be re-programmed, it should be done by an administrator under strict supervision, e.g., by a staff member locally switching the mode to "program". In practice, it is very rare for critical infrastructures to update the control program of PLCs because nobody wants to interrupt the operations. In this way, the network attackers are unable to get the secrets hard-coded within a control program. In a PLC, the control program is executed repeatedly/periodically, and the time between the repeated executions is called a scan cycle. To prevent the network attackers from remotely reading and writing tags within a scan cycle, we assume that all tasks implementing cryptographic algorithms are assigned to be event or periodic tasks with priority higher than the communication task, so that they cannot be interrupted by the communication task. In this setting, within scan cycles, the network attackers cannot read the intermediate tag values, which may contain secrets, but all variables are still readable/writable by the attacker after the PLCrypto thread completes execution or between PLCrypto executions.
Security Validation. We implemented the tag manipulation attacks (TMA) against our Allen-Bradley PLC via Studio 5000 (the official tool for managing every detail of an Allen-Bradley PLC, including read-only tags). In our experimental attack, we develop a toy task which does the following steps:
• Initialize the DINT tag TARGET := 0;
• Repeat the code TARGET := TARGET + 1 for 100 times;
• After the above loop execution, if TARGET ≤ 100, TARGET := 0, else set the attack result ATTACK_RESULT := TARGET.
Note that we are testing the modification capability by assigning TARGET with a value that is larger than 100 via Studio 5000 during the execution of the above toy task. If such a modification of TARGET is successful within the execution of the toy task, then it will not be cleared in the last step, and we will observe the modified value in the other tag ATTACK_RESULT.
We first set the type of the toy task to be continuous. Then, through the run-time tag management interface of Studio 5000, we can see the change of the TARGET during the execution, and we can manually modify its value using Studio 5000. However, when we change the PLC setting as mentioned above (i.e., we change the type of the toy task to be periodical and set its priority to be higher than the communication task), then we can only observe zero through Studio 5000. This means that the TMA within one scan cycle are prevented if we properly configure task types and PLC modes. More details on PLC modes and tasks are in Section 3.1 (also in the manual [Bra09b]). However, a PLC task is usually repeatedly executed and two executions of the same task share tags, so we still need a solution to protect the critical tags (e.g., cryptographic keys) between two consecutive executions from TMA. In other words, we still need to prevent TMA that manipulates tag values between scan cycles, which we will discuss in Section 5.3.
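The logic of this validation experiment can be replayed in a few lines of Python. The sketch below is a simulation, not PLC code: the function `toy_task` and its `write_during_scan` parameter are hypothetical, and the parameter simply models an online write that lands in the middle of the task, which is only possible when the task can be interrupted by the communication task.

```python
def toy_task(write_during_scan=None):
    """Run the toy task once.

    write_during_scan = (step, value) models an attacker write that lands
    before loop iteration `step`; None models an uninterruptible task.
    """
    tags = {"TARGET": 0, "ATTACK_RESULT": 0}
    for step in range(100):
        if write_during_scan is not None and write_during_scan[0] == step:
            tags["TARGET"] = write_during_scan[1]  # attacker's write lands here
        tags["TARGET"] += 1
    if tags["TARGET"] <= 100:
        tags["TARGET"] = 0
    else:
        tags["ATTACK_RESULT"] = tags["TARGET"]
    return tags


print(toy_task())                              # high-priority periodic task
print(toy_task(write_during_scan=(50, 1000)))  # interruptible continuous task
```

In the uninterrupted run both tags end at zero, matching the observation made through Studio 5000, while a write that lands mid-loop survives the final check and shows up in ATTACK_RESULT.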

Overview of Implementation Tricks
Our primary security concern on PLCrypto implementation is to resist the tag manipulation attacks, in particular for protecting the cryptographic keys. Besides all system settings above, the most important implementation trick for this problem is to hard-code the concrete values of critical tags (either keys or parameters) into the program code to protect their confidentiality, since it is possible to prevent the attackers from accessing the program code after the system is set to the RUN mode (as discussed in Section 4). In our implementation, we will leverage two hard-coding strategies:
• Hard-coding with Runtime Loading (HC-RL): Unlike the program on PC (which is immune to TMA), it is insecure to load the tag values once and use them across multiple scan cycles on PLC, since adversaries can trivially read the tag values via communication tasks in the interval between two executions of the algorithm. Note that the attacker does not have to issue read/write requests exactly in the interval to manipulate the tag values successfully. The received read/write requests will be scheduled when the communication task is running. In this HC-RL strategy, an algorithm must load the concrete values of critical tags on the fly at the beginning of every execution (in a scan cycle), e.g., key_tag := 12345. At the end of the execution, the algorithm should erase the tag, e.g., key_tag := 0, if it is a secret, such as a cryptographic key. This strategy is suitable for the situation when only a small number of tags are sensitive and need to be hard-coded. This approach is simple and easy to implement.
• Hard-coding at Where-used (HC-WU): When there are lots of hard-coding tags, the HC-RL approach may become a performance bottleneck of the implementation. For example, as each number $a_i$ in the parameter $A = \{a_0, a_1, \ldots, a_{l_n-1}\}$ of subset-sum based OWF would be represented by $\lceil l_n/31 \rceil$ DINT numbers in the implementation, it requires $l_n \cdot \lceil l_n/31 \rceil = 256 \cdot 9 = 2304$ assignments (when $l_n = 256$) to load the entire A matrix, which would take a lot of time on a PLC. Hence, in this solution, we hard-code a tag value at where it is used. E.g., for an expression
Another goal of this paper is to seek efficient implementations of the selected algorithms on PLC. The core objective of our optimization technique is to reduce the number of computation steps. To do so, we shall take full advantage of the bit-accessibility of ST to improve the efficiency of implemented algorithms and realize the necessary functionalities like shifting and rotation. Besides, we extensively rely on pre-computation in conjunction with hard-coding strategies to improve the performance. In other words, we can pre-process many computation steps of an algorithm and hard-code them in exchange for efficiency. We summarize the optimization ideas in the following:
• Bit-wise Read and Write (B-RW): With the bit-wise accessibility of ST, we can read and write a bit of an integer just like accessing the integer (e.g., obtain the carry bit in big-integer addition), unlike the implementation (e.g., [BKL+12]) based on C language that needs shifting and AND operations. With the bit-wise write capability, we can erase bits with few cheap assignments. This could be useful in realizing the modular operations modulo $2^n$, i.e., we only need to clear the bits beyond the $(n-1)$-th bit.
• Bit-wise Move (B-MV): Thanks to B-RW, we could also directly move a bit to the target position with only one assignment statement, e.g., s.[j/32] := t.[i] moves the i-th bit of t to the j/32-th bit of s. Relying on this approach, we could efficiently implement the basic functionalities, including shift/rotate and permutation box. In addition, when the indices can be pre-determined (or pre-computed), e.g., in bit-wise rotations, we can pre-compute all target positions (e.g., j/32 in the above example) and hard-code all movement steps following the HC-WU strategy.
• Merge Bit-wise Operations (B-MO): The objective of this approach is to merge bit-wise operations of a procedure (e.g., permutation box) into other procedures (e.g., substitution box) instead of executing these procedures independently so that it could reduce some intermediate computation steps. For example, one can apply the permutation box to each intermediate substitution result on the fly rather than at the end of the substitution procedure (to avoid the steps for assembling the small intermediate substitution results to a large value). An example of this optimization approach could be found in our implementation of PRESENT.
Nevertheless, to realize the above general hard-coding and optimization ideas, we still need to study and test the concrete optimization steps for specific algorithms.
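One way to picture the combination of B-MV and pre-computed positions is a small generator that works out every target position offline and prints one hard-coded bit-move statement per bit. The Python sketch below is a hypothetical helper, not part of PLCrypto: the function names are invented, the emitted text merely imitates the `s.[j] := t.[i]` style quoted above, and `apply_moves` is a reference model for checking the generated moves.

```python
def rotation_moves(width, r):
    """Pre-compute (source_bit, target_bit) pairs for a left rotation by r."""
    return [(i, (i + r) % width) for i in range(width)]


def emit_st(moves, src="t", dst="s"):
    """Emit one hard-coded bit-move statement per pair (dst must differ from src)."""
    return [f"{dst}.[{j}] := {src}.[{i}];" for i, j in moves]


def apply_moves(t, moves):
    """Reference model: perform the same bit moves on a Python integer."""
    s = 0
    for i, j in moves:
        s |= ((t >> i) & 1) << j
    return s


moves = rotation_moves(8, 3)
print("\n".join(emit_st(moves)[:2]))        # first two generated statements
print(bin(apply_moves(0b10000001, moves)))  # 8-bit rotate-left by 3
```

The same three functions cover a permutation box if `rotation_moves` is replaced by the table of the permutation, since both are fixed bit-to-bit mappings known before the program is downloaded.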

Security Principles against Tag Manipulation Attacks
Overview of Security Principles. We summarize our comprehensive principles for preventing tag manipulation attacks concerning different attack targets as below:
• Confidentiality and Integrity of Any Tags Within One Scan Cycle: As we mentioned in Section 5.1, once the mode of the PLC is set to the RUN mode and the cryptographic task has a higher priority than that of the communication thread, remote attackers can no longer interrupt the cryptographic task within one scan cycle and steal/compromise any intermediate values. This principle can be seen in all the algorithms implemented.
• Confidentiality of Any Secret Constants between Scan Cycles: Even if we have the necessary PLC system settings mentioned above, the communication task will still be scheduled between every two consecutive scan cycles. Thus, an attacker can request to read/write to any tags when the cryptographic task is not running. As a rule of thumb, at the end of a cryptographic task, all intermediate values that can potentially leak the secret must be cleared. Also, all secret values (e.g., secret keys) are hardcoded in the control program using HC-RL or HC-WU. Since the PLC is at the RUN mode, the attacker cannot access the program itself; he/she has no way to directly read the secret values from the program or the tags. This design principle is used in all algorithms involving a secret.
• Integrity of Constants between Scan Cycles: In cryptographic algorithms, pre-defined constants play a critical role. Sometimes, if the constants are tampered with by an attacker, a fault injection attack can be launched. To prevent such an attack, we apply our hard-coding implementation tricks HC-RL and HC-WU.
Essentially, all constants need to be loaded again from the program at the beginning of a cryptographic task to prevent any malicious modification of the constants between scan cycles. We present an example of this threat scenario in the implementation of subset-sum based OWF in Section 6.1, where we need to protect the integrity of the public constant matrix.
• Integrity of Public Variables between Scan Cycles: The primitive algorithms (like OWF, BC, MAC, and HASH) we implemented are all stateless, so we can safely clean all variables used inside the stateless algorithms. However, when we use the algorithms in a larger context (e.g., in a protocol or in a certain mode), we may need to keep a public state variable over multiple scan cycles. This public variable is subject to TMA as well. We have to compute a MAC to protect the integrity of the states at the end of a cryptographic task, and check its integrity before it is used again in the next scan cycle. Fortunately, the MAC algorithm implemented is very efficient. An example of this practice can be found in Section 7 where we integrate multiple algorithms and implement a protocol called PoA, in which a public monotonic counter needs to be maintained.

• Confidentiality and Integrity of Secret Variables between Scan Cycles:
Though we have not encountered any secret variables that need to be kept over multiple scan cycles in our implementations, for the sake of completeness, we recommend using encryption and MAC algorithms (or an Authenticated Encryption scheme) to protect any secret variables in such a case.
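The principle of protecting a public state variable between scan cycles can be sketched off-PLC as follows. This is only an illustration under stated assumptions, not PLCrypto code: HMAC-SHA-256 from the Python standard library stands in for the MAC algorithm of the library, and the names KEY, seal, and open_state are hypothetical.

```python
import hashlib
import hmac

# Stand-in for a key that would be hard-coded in the control program (assumption).
KEY = bytes(range(16))


def seal(counter: int) -> dict:
    """End of a cryptographic task: store the public state together with its MAC tag."""
    msg = counter.to_bytes(8, "big")
    tag = hmac.new(KEY, msg, hashlib.sha256).digest()
    return {"counter": counter, "tag": tag}


def open_state(tags: dict) -> int:
    """Start of the next task: accept the stored state only if its MAC tag verifies."""
    msg = tags["counter"].to_bytes(8, "big")
    expected = hmac.new(KEY, msg, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tags["tag"]):
        raise ValueError("state was modified between scan cycles")
    return tags["counter"]


tags = seal(41)                      # scan cycle k ends
tags = seal(open_state(tags) + 1)    # scan cycle k+1: verify, update, re-seal
```

A write to the counter tag between the two calls makes open_state raise instead of silently continuing with the tampered value.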
Minimal Soft/Hardware Requirements. Given the above security analysis against tag manipulation attacks, we summarize the minimal software and hardware features required for securely running PLCrypto as follows:
1. The PLC supports standard ST defined by IEC 61131-3;
2. The PLC supports priority-based task scheduling and supports at least one kind of task whose priority level is higher than that of the communication thread;
3. The PLC has a hardware switch to set the PLC to RUN mode, preventing remote users from accessing the control program;
4. The PLC has enough memory space to run PLCrypto code as specified in Table 6 for each algorithm.

Selection Criteria of Algorithms in PLCrypto
Generally speaking, we mainly select cryptographic algorithms which are standardized and efficient. We also consider algorithms (e.g., subset-sum based OWF) that are provably secure and efficient and can serve as an easy-to-understand example for demonstrating the TMA threats. In consideration of efficiency, it is possible to apply the above hardcoding and optimization strategies as metrics for selecting the cryptographic algorithms in PLCrypto. In the following, we list our selection criteria in detail: 1. Standardized (STD). Standardized algorithms are usually widely recognized and accepted in practice, so our top priority of choosing algorithms is to pick algorithms standardized by either ISO (International Organization for Standardization) or NIST (National Institute of Standards and Technology).
2. Promising Performance on PLCs (PPP). We also try to seek algorithms that are more suitable for PLC, such as subset-sum based OWF and SPECK. Our benchmark results for such algorithms may provide a baseline for future research.
3. Pre-computation Friendly (PCF). Algorithms whose expensive operations are easy to pre-compute can reduce the computation cost, such as PHOTON with a precomputed table-based implementation (applying both the SBOX and the MixColumns coefficients at the same time).
4. Bit-wise Operable (BWO). The above bit-wise optimization strategies can optimize algorithms that consist of many bit-wise operations (such as a permutation box).
5. Short Keys and Parameters (SKP). Since keys and parameters should be hardcoded due to TMA, this criterion affects the performance significantly.

Table 3 summarizes the techniques and selection criteria applied to the algorithms in PLCrypto. The PoA is used as a "use case" to show: i) usage of algorithms in PLCrypto; ii) an example of protecting the integrity of tags across scan cycles; iii) new performance results of PoA on commercial PLCs.

PLCrypto Implementation
In this section, we elaborate on the implementations of selected cryptographic algorithms in PLCrypto. Our implementations are done on PC and PLC, respectively. On a PC, we mainly use Python to automate the initialization and code generation (e.g., key sampling, parameter generation, pre-computation, and key-dependent hard-coding) of algorithms. To load the parameters (e.g., A of $F_{sss}$), the PLC can run an independent task for initializing those parameters (represented as tags) used by the cryptographic algorithms. For readability, we skip the details of the initialization task, which just consists of a few assignment steps. As the secret keys of algorithms should be hard-coded, the keyed functions will no longer explicitly take the keys as input in the following description.
Here we focus on describing the detailed optimizations of algorithms implemented on PLC using ST. Also, we present some pseudo-codes in Appendix C.
Notations. Let $\ll$ and $\gg$ be hard-coded left rotation and right rotation operations implemented on a PLC, and let $\hat{\ll}$ and $\hat{\gg}$ represent hard-coded left/right shifting operations. DN denotes the number of DINT variables that are required to represent a big number.

Implementation of OWF and UOWHF
In this subsection, we show the implementations of subset-sum (sss) based OWF and UOWHF. We include the subset-sum based OWF because it is much more efficient than using other lightweight cryptographic hash functions as OWF. And we will use it as an example to show a special form of TMA when the attacked tags are parameters (which would be usually treated less carefully) and the feasibility of our optimization approaches. In the following, we first introduce a concrete TMA on subset-sum based OWF. Then we present two approaches to implement OWF: the first one follows the original steps of the algorithm, and the second one exploits a time-space trade-off to improve the efficiency of OWF by leveraging hard-coding strategy HC-WU and optimization approach B-RW.
A Tag Manipulation Attack on OWF. We first study the importance of hardcoding the parameters. Consider the situation that one initializes the parameter $A = \{a_0, a_1, \ldots, a_{l_n-1}\}$ once with a separate initialization task but uses it repeatedly across executions/scan cycles. Once such an initialization task is done, the network attackers are able to modify A to launch a tag manipulation attack to recover the pre-image x. For example, to obtain the j-th bit of x, the attacker only needs to set $a_i = 0$ for all $i \neq j$, $i \in [l_n]$. The evaluation result of OWF is then either $a_j$ or 0, from which $x[j]$ can be inferred trivially.

Baseline Implementation of OWF. To avoid the TMA, we can load A on the fly in each OWF evaluation in this implementation scenario. As a baseline, we first realize the OWF following the HC-RL hard-coding strategy. To realize an addition modulo $2^{l_n}$, we appeal to the standard multiple-precision addition [MvOV96, Algorithm 14.7]. We let + denote the big-integer addition implemented on PLC modulo $2^{l_b}$ where $l_b \geq 32$. To efficiently get the carry bit, we choose to use a digit base $2^{31}$ to realize big-integer addition so that we can obtain the carry bit (i.e., the sign bit of a DINT variable) with only one assignment statement. In this way, we can avoid dealing with the overflow caused by the sign bit, which may need many additional judgments or logical operations.

Algorithm 1: Evaluation of Subset-sum based OWF
… $[DN_t]$, which stores the values of the parameter $a_0, a_1, \ldots, a_{l_n-1}$.
4: for i := 0 to $l_n - 1$ by 1 do
5:   if v = 32 then
6:     v := 0; u := u + 1; //switch to the next 32-bit block
…
The input $x \in \{0, 1\}^{l_n}$ of $F_{sss}(x, A)$ is represented by $DN_x = \lceil l_n/32 \rceil$ DINT variables when we realize the evaluation sub-algorithm on PLC. We initialize the parameter A hard-coded in the PLC program by sampling a random $l_n \times DN_t$ two-dimensional 31-bit integer array $A[l_n, DN_t]$ on PC, where the set of integers $\{A[i, j]\}_{j \in [DN_t]}$ represents the i-th $l_n$-bit number $a_i$ of A, and $DN_t = \lceil l_n/31 \rceil$. The evaluation sub-algorithm of $F_{sss}(x, A)$ is shown in Algorithm 1. We can leverage the B-RW optimization idea to easily get a bit of x by x[u].[v], where the variable $u \leq DN_x$ is an ARRAY index and the variable $v \leq 30$ is the bit index of x. So the evaluation of $F_{sss}$ can be realized with two-layer nested loops, in which the outer loop decides whether the bit x[u].[v] is set, and the inner loop calculates the $l_n$-bit big-integer addition.
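To make the evaluation and the tag manipulation attack concrete, the following Python model mirrors the description above: each $a_i$ is split into 31-bit limbs, the carry is taken from bit 31 of a limb sum, and an attacker who zeroes every row except row j learns x[j] from the output. It is a minimal sketch on a PC rather than the ST code of the library; the small length L_N = 64 and all function names are assumptions chosen for illustration.

```python
import random

LIMB_BITS = 31                         # digit base 2^31, as described in the text
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(value, n_limbs):
    """Split an integer into little-endian 31-bit limbs."""
    return [(value >> (LIMB_BITS * j)) & LIMB_MASK for j in range(n_limbs)]


def from_limbs(limbs, l_n):
    """Recombine the limbs and reduce modulo 2^l_n."""
    total = sum(limb << (LIMB_BITS * j) for j, limb in enumerate(limbs))
    return total % (1 << l_n)


def f_sss(x_bits, A):
    """Add up the rows of A selected by the one-bits of x, limb by limb."""
    n_limbs = len(A[0])
    acc = [0] * n_limbs
    for i, bit in enumerate(x_bits):
        if bit == 0:
            continue                   # zero bits cost no additions
        carry = 0
        for j in range(n_limbs):
            s = acc[j] + A[i][j] + carry   # always fits in 32 bits
            carry = s >> LIMB_BITS         # bit 31 plays the role of the DINT sign bit
            acc[j] = s & LIMB_MASK
    return acc


L_N = 64                               # toy parameter (assumption)
N_LIMBS = -(-L_N // LIMB_BITS)         # ceil(L_N / 31)
rng = random.Random(2021)
a = [rng.getrandbits(L_N) | 1 for _ in range(L_N)]   # "| 1" keeps every a_i nonzero
A = [to_limbs(v, N_LIMBS) for v in a]
x = [rng.getrandbits(1) for _ in range(L_N)]
y = from_limbs(f_sss(x, A), L_N)


def leak_bit(j):
    """TMA: zero every row except row j, then compare the output with a_j."""
    tampered = [row if i == j else [0] * N_LIMBS for i, row in enumerate(A)]
    return 1 if from_limbs(f_sss(x, tampered), L_N) == a[j] else 0


recovered = [leak_bit(j) for j in range(L_N)]
```

After the loop, recovered equals x, which is why A must be reloaded from the program at the start of every evaluation.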
Faster Implementation of OWF. Note that the initialization of the parameter A in Algorithm 1 requires $l_n \times DN_t$ assignments, which take almost half of the computational cost of the entire algorithm. To improve the performance, we could apply the HC-WU hard-coding and B-RW optimization strategies. Specifically, we can leverage the "If" statement in ST to hard-code A and all computation steps involved in Equation 1. To do so, we make use of Python to automatically generate all of the $l_n$ "If" statements between Line 8 and Line 10, with concrete values of u, v, i, j, $DN_t$ and $DN_x$. In the meantime, the big-integer addition involved in each "If" statement is hard-coded following strategy HC-WU.
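A generator of this kind could look roughly like the sketch below. It is not the generator shipped with PLCrypto: the tag names x, t, and c, and the exact shape of the emitted ST statements, are assumptions made for illustration, and the emitted text has not been run on a PLC.

```python
def gen_if_statements(A, word_bits=32):
    """Emit one hard-coded IF block per input bit, with the limbs of a_i as literals.

    A[i][j] is the j-th 31-bit limb of a_i. The tags x (input words),
    t (accumulator limbs), and c (carry) are hypothetical names.
    """
    lines = []
    for i, row in enumerate(A):
        u, v = divmod(i, word_bits)            # word index and bit index of x
        lines.append(f"IF x[{u}].{v} THEN")
        for j, const in enumerate(row):
            carry_in = " + c" if j > 0 else ""  # the lowest limb has no carry-in
            lines.append(
                f"    t[{j}] := t[{j}] + {const}{carry_in}; "
                f"c.0 := t[{j}].31; t[{j}].31 := 0;"
            )
        lines.append("END_IF;")
    return "\n".join(lines)


# Toy parameter with three 2-limb rows (assumption).
print(gen_if_statements([[5, 1], [7, 2], [11, 3]]))
```

Because all indices and constants are resolved at generation time, the emitted program contains no loop counters, index arithmetic, or separate loading of A.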

Implementation of UOWHF.
The UOWHF $H_{sss}$ with fixed message input length can be straightforwardly obtained by an OWF $F_{sss}$ with customized input and output spaces, so we have $H_{sss} = F_{sss}$. Instead of using a length-preserving OWF, we specifically set the output length of $F_{sss}$ to be half of the input, i.e., $l_m = l_n$ and $l_h = l_n/2$. Then each element $a_i$ in the public parameter $A = \{a_0, a_1, \ldots, a_{l_n-1}\}$ has $l_h$ bits. The implementation of $F_{sss}$ is similar; thus, we omit the repetition here. To extend the message space, we can divide an arbitrarily long message into multiple $l_n/2$-bit message blocks and hash them one by one iteratively.
Security Analysis. Here, we focus on analyzing the resistance of our OWF implementation against TMA. The security of the UOWHF implementation is implied by that of OWF. Note that the subset-sum based OWF only leverages constant tags between scan cycles, i.e., the parameter A. The network attacker cannot manipulate the HC-RL hard-coded tags of A between two scan cycles since the OWF task loads and refreshes the tag values from the code in every scan cycle. By our PLC settings, the network attacker cannot read/write the tags (including parameter A and all other intermediate results) during the execution of an OWF task. In a nutshell, our OWF implementation blocks all attack surfaces of TMA attackers.
Furthermore, it is also important to understand the side-channel leakage of the implementation via some obvious side channels like timing. Note that the performance of OWF depends on the number of one bits of the input value, i.e., no operations are done for zero bits. Hence the network attackers may exploit timing-based side-channel information to infer the input of the OWF. We roughly analyze this kind of threat based on the different usages of OWF. If the OWF is used as a hash function for message compression, then the runtime has no impact on its collision resistance. When the input of OWF is a secret, the runtime of OWF will be close to the average case since the input should be chosen at random with sufficiently large entropy (so the Hamming distance between two secrets should be close). Nevertheless, to resist timing-based side-channel attacks, the runtime of OWF should ideally be constant. To achieve this, one could add dummy operations to handle the zero bits in the input so that the runtime of OWF always equals the worst-case performance. We leave a concrete solution for preventing timing-based side-channel attacks as future work.
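One possible shape of the dummy-operation idea is sketched below: every input bit triggers exactly one addition, and zero bits add 0. This is only an illustration of the operation count being independent of x; Python integers are not constant-time, the function name is hypothetical, and a concrete PLC solution is left open by the text.

```python
def f_sss_uniform(x_bits, a, l_n):
    """Subset sum modulo 2^l_n with one addition per input bit (dummy add for zeros)."""
    mask = (1 << l_n) - 1
    acc = 0
    additions = 0
    for bit, a_i in zip(x_bits, a):
        addend = a_i * bit            # equals 0 when the bit is 0
        acc = (acc + addend) & mask
        additions += 1                # performed regardless of the bit value
    return acc, additions


a = [3, 5, 9, 17]                     # toy parameter (assumption)
print(f_sss_uniform([1, 0, 1, 0], a, 8))
```

The number of additions is len(x_bits) for any input, at the price of always paying the worst-case cost.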

Shifting and Rotation Operations
Shifting and rotation operations are extensively used by symmetric cryptographic algorithms. Unfortunately, some PLCs [Bra18a] do not provide any bit-wise shifting/rotation instruction in structured text (ST). Therefore, we have to develop bit-wise shifting/rotation operations first using ST as a building block for implementing other algorithms. Of course, a shifting operation can be realized by multiplication or division operations; however, such an approach is inconvenient and inefficient for signed DINT variables and big integers. Note that some PLCs do not support 32-bit unsigned integers. So a few more operations (including arithmetic and logic operations) should be carried out to deal with the sign bit and overflow, in particular when a big integer is involved. Another benefit of our hardcoded shifting/rotation operations is its constant runtime which is independent of the positions being shifted/rotated. Hence, they leak nothing through timing.
Our solution is to leverage the bit-wise accessibility of a DINT variable to directly move a bit into the corresponding target position, i.e., by utilizing the optimization approach B-MV. It can be easily applied to big integers, which are represented by a few DINT variables. We call this approach bit-assign shift/rotate. Specifically, we develop a function Shift(Dir, isRot, m, pos) to realize all the shift/rotate operations that are needed.
Remark 1. We stress that all arithmetic operations in Shift can be pre-computed and hard-coded if pos is known in a specific algorithm (e.g., Chaskey and SPECK). That is, we could apply the hard-coding strategy HC-WU and use Python to generate all assignments involved in Line 5 or Line 12 with concrete array index values. So the hard-coded version of Shift is very efficient since it requires only a constant number of assignments determined by the bit-length of m.
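The pre-computation in Remark 1 can be sketched as follows for a left rotation of a big integer stored in 32-bit words. The sketch computes, for a fixed pos, the table of single-bit moves and renders them as ST-like assignments; the tag names m and r and the textual form of the assignments are assumptions, and the helper names are hypothetical.

```python
WORD = 32


def rotl_assignments(n_words, pos):
    """Pre-compute (dst_word, dst_bit, src_word, src_bit) for a left rotation by pos."""
    total = n_words * WORD
    moves = []
    for src in range(total):
        dst = (src + pos) % total
        moves.append((dst // WORD, dst % WORD, src // WORD, src % WORD))
    return moves


def apply_moves(words, moves):
    """Reference model: perform the bit moves on little-endian 32-bit words."""
    out = [0] * len(words)
    for dw, db, sw, sb in moves:
        out[dw] |= ((words[sw] >> sb) & 1) << db
    return out


def emit_st(moves):
    """Render each move as one hard-coded assignment (hypothetical tags r and m)."""
    return [f"r[{dw}].{db} := m[{sw}].{sb};" for dw, db, sw, sb in moves]


value = 0x0123456789ABCDEF
words = [value & 0xFFFFFFFF, value >> 32]
rotated = apply_moves(words, rotl_assignments(2, 3))
print(emit_st(rotl_assignments(2, 3))[:2])
```

The number of emitted assignments equals the bit-length of m and does not depend on pos, which matches the constant-runtime argument made for the hard-coded Shift.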
In Figure 3, we show a code snippet of a concrete hard-coded rotate function with a 64-bit operand, i.e., Speck_SR_m $\ll$ 3, used in the implementation of SPECK. Here the rotated message Speck_SR_m is represented by two 32-bit DINT variables. We will use the set of operators $\{\hat{\ll}, \hat{\gg}, \ll, \gg\}$ to denote all kinds of operations realized by Shift(Dir, isRot, m, pos), e.g., $m \,\hat{\ll}\, pos$ is short for Shift(1, 0, m, pos).
To initialize the program, we first generate a MAC key on PC so that it can be hard-coded following strategy HC-RL.

Implementation of MAC Algorithm Chaskey
The MAC evaluation algorithm of Chaskey is a permutation-based scheme. We implement the most efficient permutation $\pi_c$ [MMH + 14, §3.2], which is realized by our hard-coded rotation operations (Section 6.2).
Security Analysis. Since we only need to protect the keys of Chaskey, which are protected by our HC-RL hard-coding strategy, no network attackers can read/write those fixed tags in multiple scan cycles. Similarly, the network attacker cannot manipulate the tags within one scan cycle because of our PLC settings. Hence, our implementation of Chaskey can prevent TMA.
Moreover, the major operations in the permutation sub-algorithm $\pi_c$, i.e., the rotations, are hard-coded as presented in Section 6.2 and run in constant time. All other code of Chaskey mainly involves standard arithmetic operations whose performance is independent of the input message or internal states. Hence, our Chaskey implementation has constant runtime. Namely, it does not leak any timing-based side-channel information to network attackers.

Implementations of Block Ciphers: PRESENT, SPECK, and SIMON
In this subsection, we introduce the implementations of block ciphers included in PLCrypto. SPECK and SIMON both support various parameter sets and key sizes, so they offer more flexibility than PRESENT for users in choosing proper parameters for their applications.
The key generation of all these block ciphers can be pre-computed, so we implemented the key generation procedure on PC in Python and hard-coded the round keys in the generated ST code of the encryption and decryption schemes following HC-RL. Since the decryption is an inversion of the encryption for all implemented block ciphers, we will just describe the implementations of the encryption algorithms. Since we pre-compute the key scheduling of the block ciphers in our implementation, it will limit the usable modes of operation.
For PRESENT, we observe that some parts of the steps of SBOXLayer and PBOXLayer can be merged following the optimization idea B-MO. To execute the complete SBOXLayer, each SBOX look-up result stmp should be assembled back to St (nibble-by-nibble), so that the final resulting state St would be taken as input to the PBOXLayer. Such assembling steps may require 64 assignments, which account for about 1/3 of the steps of the entire SBOXLayer and PBOXLayer (where an assignment statement roughly costs 1.17 µs). However, we found that it is possible to directly input the SBOX result stmp into PBOXLayer for permutation since the corresponding position of each bit of stmp in St is pre-determined.
We implement the encryption of PRESENT as Algorithm 3. For efficiency, we can also pre-compute the arithmetic operations in SBOXLayer(St) and PBOXLayer(St), i.e., 4 * v + z, j/8, j MOD 8, blk := PBOX(j * 4 + δ)/32, and pos := PBOX(j * 4 + δ) MOD 32 in Step 4. Namely, we can implement SBOXLayer and PBOXLayer with only a few assignments after applying a HC-WU-like hard-coding strategy. We stress that the arithmetic operations involved in these steps shown in Algorithm 3 are just used here for demonstrating our intuition.
Taking the third times substitution result PT_State_tmp in Figure 4 as an example, the least significant bit PT_State_tmp.
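The merging idea can be checked off-PLC with the Python model below. It uses the standard PRESENT S-box and bit permutation, pre-computes for every S-box output bit its target word (blk) and bit position (pos), and writes the S-box result straight to the permuted position. It is a sketch of the idea only, not the hard-coded ST of the library, and the function and table names are assumptions.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def pbox(i):
    """PRESENT bit permutation: bit i moves to 16*i mod 63, and bit 63 stays."""
    return 63 if i == 63 else (16 * i) % 63


def sbox_layer(state):
    out = 0
    for j in range(16):
        out |= SBOX[(state >> (4 * j)) & 0xF] << (4 * j)
    return out


def pbox_layer(state):
    out = 0
    for i in range(64):
        out |= ((state >> i) & 1) << pbox(i)
    return out


# Pre-computed targets: TARGET[j][d] = (blk, pos) for bit d of the j-th S-box output.
TARGET = [[divmod(pbox(4 * j + d), 32) for d in range(4)] for j in range(16)]


def merged_layer(state):
    """Send each S-box output bit directly to its permuted position in two 32-bit words."""
    words = [0, 0]
    for j in range(16):
        stmp = SBOX[(state >> (4 * j)) & 0xF]
        for d in range(4):
            blk, pos = TARGET[j][d]
            words[blk] |= ((stmp >> d) & 1) << pos
    return words[0] | (words[1] << 32)


print(hex(merged_layer(0x0123456789ABCDEF)))
```

The intermediate state between the two layers is never assembled, which corresponds to the assignments saved in the merged implementation.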

Implementation of SPECK and SIMON.
The key techniques in our implementations are our tailored shifting/rotation operation and big-integer Add/Sub functions. The encryption algorithms of SPECK implemented on PLC only take a message as input since we hard-code the concrete values of pre-computed round encryption keys following the HC-RL strategy. To implement SPECK with a 128-bit message (each block having 64-bit), we leverage the big-integer addition (in encryption) and subtraction (in decryption) [MvOV96, Algorithm 14.9], respectively. The implementation of SIMON is similar to that of SPECK, which mainly relies on our hard-coded shifting/rotation function.
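For reference, the structure of SPECK with 64-bit words (rotation amounts 8 and 3, and 32 rounds for a 128-bit key) can be modelled in Python as below, with the round keys expanded once on the PC side as described above. Python integers hide the big-integer addition/subtraction and the hard-coded rotations that the PLC implementation needs, so this is a reference sketch rather than the library code; the function names are assumptions.

```python
MASK64 = (1 << 64) - 1
ALPHA, BETA = 8, 3        # rotation amounts for 64-bit words
ROUNDS = 32               # SPECK with 128-bit block and 128-bit key


def ror(v, r):
    return ((v >> r) | (v << (64 - r))) & MASK64


def rol(v, r):
    return ((v << r) | (v >> (64 - r))) & MASK64


def round_fwd(x, y, k):
    x = ((ror(x, ALPHA) + y) & MASK64) ^ k    # modular addition (encryption)
    y = rol(y, BETA) ^ x
    return x, y


def round_inv(x, y, k):
    y = ror(y ^ x, BETA)
    x = rol(((x ^ k) - y) & MASK64, ALPHA)    # modular subtraction (decryption)
    return x, y


def expand_key(l0, k0):
    """Pre-compute all round keys; the key schedule reuses the round function."""
    keys, l = [k0], l0
    for i in range(ROUNDS - 1):
        l, k = round_fwd(l, keys[-1], i)
        keys.append(k)
    return keys


def encrypt(x, y, keys):
    for k in keys:
        x, y = round_fwd(x, y, k)
    return x, y


def decrypt(x, y, keys):
    for k in reversed(keys):
        x, y = round_inv(x, y, k)
    return x, y


keys = expand_key(0x0F0E0D0C0B0A0908, 0x0706050403020100)
ct = encrypt(0x0123456789ABCDEF, 0xFEDCBA9876543210, keys)
```

Only the list keys would be hard-coded on the PLC; the master key itself never needs to be present there.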

Implementation of PRF and PRG.
Here we leverage block ciphers to realize both pseudo-random function (PRF) and pseudo-random generator (PRG), following the approach standardized in [SPLI06, EB07]. Note that Chaskey can be viewed as a PRF, and running in counter mode turns it into a PRG [MMH + 14]. Hence, counter mode Chaskey is included in the comparison as well. We consider a PRG to have the same input as PRF, i.e., the PRG evaluation function has an additional input message x ∈ {0, 1} lx that could be the counter indexing the generated randomness.
Specifically, we utilize PRESENT, SPECK, and Chaskey to implement both PRF and PRG. To generate longer random bits for PRG, we leverage counter (CTR) mode in their operations.
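The counter-mode construction can be sketched generically as below. HMAC-SHA-256 truncated to 64 bits is used as a stand-in PRF (an assumption; PLCrypto instantiates it with PRESENT, SPECK, or Chaskey), and the names prf and prg_ctr are hypothetical.

```python
import hashlib
import hmac

BLOCK_BYTES = 8    # 64-bit PRF output block (assumption)


def prf(key, block):
    """Stand-in PRF: HMAC-SHA-256 truncated to one block."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK_BYTES]


def prg_ctr(key, x, n_bits):
    """Generate n_bits of output by evaluating the PRF on x, x+1, x+2, ..."""
    out = b""
    ctr = 0
    while len(out) * 8 < n_bits:
        block = ((x + ctr) % (1 << (8 * BLOCK_BYTES))).to_bytes(BLOCK_BYTES, "big")
        out += prf(key, block)
        ctr += 1
    return out[: (n_bits + 7) // 8]


key = bytes(16)
stream = prg_ctr(key, 0, 256)   # four PRF calls for 256 bits
```

The cost grows linearly with the number of requested blocks, since each block costs one PRF evaluation.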

Security Analysis.
Due to the HC-RL hard coding of round keys, our block cipher implementations can resist TMA as well. From the algorithmic point of view, PRESENT, SPECK, and SIMON have no branches in the program. In addition, the hard-coded SBOXLayer/PBOXLayer, and the shifting/rotation operations in SPECK and SIMON run in constant time, so our implementations of block ciphers have constant runtime. Note that the security of our PRF/PRG is implied by that of the underlying block ciphers.
Security Analysis. Similar to the implementations of block ciphers, we protect the parameters (i.e., SBOX and PBOX) of both schemes using the hard coding strategies. Moreover, our implementations have constant runtime by their algorithm designs and our hard coding tricks.

Case Study: Proof of Aliveness
Background. Proof of aliveness (PoA) is a cryptographic notion that was recently proposed by Jin et al. [JYvDZ19]. Although PoA was proposed to attest the aliveness (working status) of CPS devices like PLC, to the best of our knowledge, it has never been implemented on a commercial off-the-shelf PLC. Here we briefly introduce the advanced PoA protocol $\Pi^{PRG}_{OWF}$ in [JYvDZ19]. $\Pi^{PRG}_{OWF}$ is composed of two procedures: i) proof generation; and ii) proof replenishment, where the proof generation algorithm is used to generate a publicly verifiable proof every $\Delta_s$ seconds to attest its aliveness, and the proof replenishment algorithm is used to re-initialize a new protocol instance when the proofs of the current instance are used up.
The protocol $\Pi^{PRG}_{OWF}$ has a multiple-chain structure to generate proofs, where the number of sub-chains is denoted by a parameter η. The i-th sub-chain of $\Pi^{PRG}_{OWF}$ is an OWF-chain that starts from a head $p^i_0$ and ends at the tail $p^i_N$, where N is the number of nodes in a sub-chain. The tail of the sub-chain is known by the verifier, and the nodes in sub-chains are periodically sent from the prover to the verifier in reverse order (from the tail to the head) as aliveness proofs. The heads of all OWF sub-chains are linked like a chain of random numbers generated by a PRG. The replenishment of a protocol instance means that the prover will select a new seed for PRG, create a new multiple-chain structure, and commit all the tails (for verifying the new protocol instance) of sub-chains to the verifier using a one-time signature (OTS) scheme, whose signing keys are the sub-chain heads of the current instance.
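The mechanics of a single OWF sub-chain can be sketched as follows. SHA-256 is used as a stand-in for the OWF (an assumption; the implementation discussed here instantiates it with the subset-sum based OWF), and build_chain and Verifier are hypothetical names. The multiple-chain structure and the replenishment step are not modelled.

```python
import hashlib


def owf(node):
    """Stand-in one-way function."""
    return hashlib.sha256(node).digest()


def build_chain(head, n):
    """Return [p_0, p_1, ..., p_n] with p_{k+1} = OWF(p_k); p_n is the tail."""
    chain = [head]
    for _ in range(n):
        chain.append(owf(chain[-1]))
    return chain


class Verifier:
    """Holds the most recently accepted node, initially the committed tail."""

    def __init__(self, tail):
        self.last = tail

    def accept(self, proof):
        if owf(proof) != self.last:
            return False
        self.last = proof          # the next proof must be the pre-image of this one
        return True


chain = build_chain(b"head of sub-chain 0", 8)
verifier = Verifier(chain[-1])
results = [verifier.accept(p) for p in reversed(chain[:-1])]   # tail-side nodes first
```

Each proof is checked with one OWF evaluation, and a node that is replayed or released out of order fails the check.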
A Tag Manipulation Attack on PoA. We stress that it is not secure to implement the PoA on PLCs directly following the original specification. According to the original design [JYvDZ19], we have to store some critical tags during the full lifetime of the protocol, e.g., protocol instance counter P and sub-chain counter S. However, since these tags need to be updated frequently, they cannot be hard-coded in the program. Attackers can tamper with the tags, i.e., attackers can get any unused proofs by manipulating either P or S to trick PLCs to generate the future proofs that should not be generated at the current time.

Modifications of the Original PoA Protocol.
To protect critical tags, we can generate a MAC value of tags and check their integrity at the beginning of a new scan cycle. Because of the reading/writing capability of network attackers, we cannot cache the proofs (in the memory) as in [JYvDZ19]; otherwise, all future proofs cached in the memory will be read by the attackers.

Implementation. To implement $\Pi^{PRG}_{OWF}$ in [JYvDZ19], we used $F_{sss}$, PRG realized by Chaskey-CTR, Chaskey, and $H_{sss}$ in PLCrypto. The head nodes of sub-chains are generated by PRG, as $p^i_0 := PRG(P \| i \| l_r)$. The OWF is instantiated with $F_{sss}$ with $l_r = 256$ (which is much more efficient than any other cryptographic hash functions like PHOTON on PLC). To replenish proofs, we run the initialization algorithm on PLC first and then sign the new verification state (tails) using Lamport's one-time signature (OTS) [Lam79] with the sub-chain heads as signing keys. So the minimum number of sub-chains is 256, i.e., the number of signing keys of OTS. Before signing the tail nodes, we compress them using UOWHF $H_{sss}$ as a message domain extender [SPW06]. Also, we need to run the whole initialization in one scan cycle to avoid calculating MAC for too many tags. Fortunately, we can choose to use a relatively short sub-chain due to the replenishment feature, so the computation cost of the initialization is adjustable as demanded.
Security Analysis. The security of our PoA is guaranteed by the secure implementation of the concrete instances of its building blocks. The integrity of the counter between scan cycles relies on the security of the MAC scheme.
Deployment Cost. Being a software-only security solution, PLCrypto does not require additional hardware components or firmware updates. Also, we do not need to modify the communication protocols. We just need to add a line in the original control program to call a function in PLCrypto with inputs.

Benchmark
Benchmark Settings. To show the performance of the cryptographic algorithms in PLCrypto, we implemented them on a mainstream commercial PLC [Pro18] from Allen Bradley, which has a ControlLogix 5571 processor and 2 megabytes (MB) of memory. We benchmarked our algorithms with various parameters for comparison. We measured the time by averaging the results of 1000 repeated experiments unless specified otherwise. The execution time is measured using the built-in GSV (Get System Values) instruction to get the execution time of a scan cycle, which only contains the function under test. The times in the average case and the worst case are reported separately as average/worst.

Correctness Verification.
Our implementations are parameterized, so users can select the parameter settings which fit their applications most. To demonstrate the performance of our implementation, we chose a few widely-used security parameters (e.g., 64, 128, 256) to instantiate the implementations for benchmarking. We used the test vectors in the original papers or the reference implementations provided by algorithm inventors to test the correctness of our implementations (except for the subset-sum based primitives, for which we wrote our own reference program to verify their correctness).
Performance of Atomic Operations. In Table 4, we show the benchmark results of some important atomic operations on 32-bit DINT variables on the experimented PLC. We measure the performance of each operation by averaging over 10,000 executions.
Since the performance of $F_{sss}$ is determined by the number of "1" bits in the input, we experimented with inputs having $l_n/2$ ones (as the average case) and $l_n$ ones (as the worst case). We benchmarked our two implementation solutions: the "baseline" OWF implementation as Algorithm 1, and the improved solution ("Imp-IF") based on hardcoded If statements. The performance of subset-sum based OWF is shown in Table 5a in milliseconds (ms).
The subset-sum based OWF is efficient on PLC, which only needs a few milliseconds for different input lengths. The baseline implementation of the OWF with l n = 256 costs 14.7 ms in the average case and 24.0 ms in the worst case. Clearly, our performance improvement is around two times. Nevertheless, the results show that F sss has much better performance on PLC than on other platforms, e.g., ARM [JYvDZ19].
Regarding $H_{sss}$, we first benchmarked the fixed message length version that is implemented based on "Imp-IF"-style $F_{sss}$, whose performance is also shown in Table 5a. In Figure 5, we show the performance of $H_{sss}$ with message lengths varying from 0.5 kilobits (Kb) to 8 Kb, representing the commonly used parameters in most applications of PLC.
Runtimes of Chaskey. The performance of Chaskey with different permutation rounds and message lengths is shown in Figure 6 (a). For $l_m = 128$ and T = 8, the evaluation time is about 2.7 ms, which is efficient enough to authenticate messages collected from sensors in almost real-time. Since the computation cost is dominated by the permutation algorithm $\pi_c$, it is not surprising that Chaskey with the 8-round permutation is 2x faster than that with 16 rounds (but the latter provides stronger security). Besides, we show that the performance of Chaskey with T = 12 rounds (recommended by ISO/IEC 29192-6:2019) is between that of T = 8 and T = 16.

Runtimes of PRESENT.
PRESENT supports either 80-bit or 128-bit key, but the difference between the two ciphers is only the key schedule procedure that can be done on PC. Hence, the encryption/decryption time on PLC for two different keys is identical. Through our hard-coding optimization towards PRESENT, it is no longer the slowest one among the three implemented block ciphers, as opposed to the results in [BSS + 13], regarding software implementations. For example, PRESENT is 8x slower than SPECK and 5.6x slower than SIMON in [BSS + 13], but we reduce the performance gap between PRESENT and SPECK to be 1.5x on PLC, and it is even faster than SIMON in our benchmark. Hence, PRESENT seems to be more suitable in the PLC environment. The encryption/decryption performance of PRESENT (around 7.1 ms) is fast enough for most CPS applications.

Runtimes of SPECK and SIMON.
We benchmarked SPECK and SIMON with some typical parameters that are multiples of 32 (i.e., the bit-length of DINT) so that no bits in a DINT variable are wasted. The benchmark results of SPECK and SIMON on PLC are listed in Table 5b. SPECK and SIMON are also efficient on PLC since their encryption and decryption algorithms only consist of a few hard-coded rotations and XOR operations, which are very fast. This is also why we include them in PLCrypto. However, SPECK is more efficient than SIMON since it requires fewer shift/rotate and logical operations in each round, and it has fewer rounds than SIMON.

Runtimes of PRF and PRG.
We benchmarked PRF/PRG with three concrete instantiations, PRESENT-CTR, SPECK-CTR, and Chaskey-CTR. For efficiency and security consideration, we chose the message length to be l m = 64 for PRF and PRG. The message length of Chaskey is 128 bits. We set the key size of PRF/PRG/Chaskey to be l k = 128 to meet practical security requirements. The performance of PRESENT-CTR, SPECK-CTR, and Chaskey-CTR is shown in Figure 6 (b). The performance is efficient and linear in the size of the generated randomness. To generate 256 bits randomness (e.g., as required in PoA), the most efficient instantiation Chaskey-CTR roughly needs 8.0 ms.

Runtimes of PHOTON and SPONGENT.
The performance of the five most efficient cipher suites of PHOTON and SPONGENT is shown in Figure 7.
Runtimes of PoA. First, we need to instantiate the parameters (i.e., the number η of sub-chains and the number N of proofs in a sub-chain) to make all algorithms executable on PLC. It is not hard to see that the proof replenishment algorithm is more costly than the proof generation algorithm since it needs to run the whole initialization procedure at once. Therefore, we need to choose the parameters based on the cost of the replenishment. We set η = 256, which is the lower bound of the number of sub-chains for running the OTS. We varied the number of proofs per sub-chain, i.e., N, from 64 to 128 when we evaluated the proof replenishment algorithm. For proof generation, we experimented with a longer sub-chain to test its performance bound. The performance of PoA on a PLC is shown in Figure 8. The proof generation time is below 6 seconds, even if N = 1000, while, in practice, the interval between two consecutive proof generations can be greater than 30 seconds [JYvDZ19]. When N = 128, the replenishment time is about 3 minutes. For these concrete parameters N = 128 and η = 256, a protocol instance has 32768 proofs that can be used for roughly 11 days. A PLC only needs to run the proof replenishment algorithm in the idle time every 11 days for 3 minutes before the proofs run out.
Storage Costs of Algorithms. Table 6 summarizes the storage costs of the implemented algorithms. For simplicity, we present the version of algorithms or the storage costs in …
Practicality Discussion. Our library is the first of its kind, and we open-source our codes to encourage others to improve our implementation with respect to its performance and security. The practically acceptable range of runtime and memory consumption certainly depends on applications and devices.
On the commercial PLC we tested, our memory consumption (shown in Table 6) is well below the total memory size (2 MB) of the PLC. The runtime of most of the algorithms in PLCrypto is on the order of milliseconds. Usually, PLCs and their monitoring servers do not communicate very often, so PLCrypto has enough time to compute before sending data to the servers. For example, in a water treatment system, PLC data is collected once per second [GAJM16]. As another example, the PLCs controlling train systems report their status every 300 ms [JAYZ21]. Using Speck 64/128 for encryption and Chaskey with T = 16, one only needs 30 ms to encrypt and generate a MAC tag for 128-bit data.
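The lifetime of a PoA protocol instance quoted in the runtime discussion can be reproduced with a short calculation, assuming one proof is released every 30 seconds (the interval mentioned as a practical lower bound; a longer interval extends the lifetime proportionally):

```latex
\eta \cdot N = 256 \times 128 = 32768 \ \text{proofs}, \qquad
32768 \times 30\,\mathrm{s} = 983040\,\mathrm{s}
  = \frac{983040}{86400}\ \text{days} \approx 11.4\ \text{days}.
```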

Conclusion and Future Work
We implemented an efficient and secure cryptographic library PLCrypto for PLC based on ST. PLCrypto includes a wide range of symmetric cryptographic algorithms for realizing essential cryptographic functions (such as OWF, BC, PRF/PRG, MAC, Hash, and PoA).
To use PLCrypto in practice, users can either build the application directly on the maintask of an algorithm in PLCrypto or copy the routines of cryptographic algorithms from PLCrypto to a target application.
To further extend this line of research, one can investigate the possibility of side-channel attacks on PLCrypto and enhance the library with side-channel resistance. Also, one can extend PLCrypto to include more algorithms, e.g., asymmetric cryptographic algorithms. We also encourage researchers to propose novel implementation tricks to further improve the performance of our open-source PLCrypto. Cryptographers are encouraged to develop new lightweight algorithms that better fit the programming constraints of the PLC platform. Due to both security and performance considerations, we used hard-coding techniques in our implementations, but hard-coding also prevents us from frequently updating the secret keys of the algorithms in PLCrypto. One possible way to address this would be to use an authenticated encryption with associated data (AEAD) scheme with a hard-coded key to wrap pre-computed round keys; the encrypted round keys can then be stored in tags or memory between cryptographic function calls.
The evaluation procedure relies on a permutation $\pi_c$, which is defined later. Based on $\pi_c$, the evaluation algorithm first splits the input message $m$ into blocks $m_0, m_1, \ldots$

The permutation $\pi_c$ used above consists of $T \in \{8, 16\}$ rounds, where 16 rounds are recommended for long-term security. Each round runs the following steps based on four 32-bit input variables $v_0$, $v_1$, $v_2$, and $v_3$:
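As a hedged illustration of one round of $\pi_c$, the following Python sketch follows the round of the published Chaskey specification as we recall it (rotation amounts 5, 16, 8, 13, 7, 16). The function names are hypothetical, the rotation constants are an assumption taken from that specification rather than from the text here, and the sketch is not the ST code of PLCrypto.

```python
MASK32 = 0xFFFFFFFF  # all arithmetic is on 32-bit words


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotr32(x, n):
    """Rotate a 32-bit word right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def chaskey_round(v):
    """One add-rotate-xor round on the four words v0..v3 (assumed constants)."""
    v0, v1, v2, v3 = v
    v0 = (v0 + v1) & MASK32; v1 = rotl32(v1, 5) ^ v0; v0 = rotl32(v0, 16)
    v2 = (v2 + v3) & MASK32; v3 = rotl32(v3, 8) ^ v2
    v0 = (v0 + v3) & MASK32; v3 = rotl32(v3, 13) ^ v0
    v2 = (v2 + v1) & MASK32; v1 = rotl32(v1, 7) ^ v2; v2 = rotl32(v2, 16)
    return [v0, v1, v2, v3]


def chaskey_round_inv(v):
    """Undo chaskey_round by reversing its steps in the opposite order."""
    v0, v1, v2, v3 = v
    v2 = rotr32(v2, 16); v1 = rotr32(v1 ^ v2, 7); v2 = (v2 - v1) & MASK32
    v3 = rotr32(v3 ^ v0, 13); v0 = (v0 - v3) & MASK32
    v3 = rotr32(v3 ^ v2, 8); v2 = (v2 - v3) & MASK32
    v0 = rotr32(v0, 16); v1 = rotr32(v1 ^ v0, 5); v0 = (v0 - v1) & MASK32
    return [v0, v1, v2, v3]


def pi_c(v, rounds=16):
    """Apply T rounds (T = 8 or 16) to a list of four 32-bit words."""
    for _ in range(rounds):
        v = chaskey_round(v)
    return v
```

Only additions, rotations, and exclusive-ors on 32-bit words are needed, which is why this kind of permutation maps naturally onto DINT variables.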

A.2 Block Ciphers
In this section, we review three lightweight block cipher families that are implemented in PLCrypto.

A.2.1 PRESENT
We review the algorithms of PRESENT as follows:
• Key Generation: This algorithm first randomly samples a key $k \xleftarrow{\$} \{0, 1\}^{l_k}$ that is loaded into the key register for generating the encryption keys used in each round. Each round encryption key is the left-most 64 bits of the current key register, which is then updated by a key scheduling procedure to generate the next round encryption key. The key schedule depends on the key length $l_k$. The main idea of the key scheduling is to rotate the key register left by 61 bits, pass some bits of the key register through the SBOX (e.g., the left-most four bits for an 80-bit key), and exclusive-or the round counter value $i$ with the corresponding bits of the key register. The key scheduling procedure is detailed in [BKL+07]. This algorithm generates 32 round encryption keys $k_0, \ldots, k_{T-1}$, where $T = 32$.
• Encryption: We denote this algorithm by $\mathsf{PRESENT}_{enc}(K, m)$, which takes as input an encryption key $K = (k_0 \| \ldots \| k_{T-1})$ and a 64-bit message $m$, and outputs a 64-bit ciphertext block $C$. It runs 31 SP rounds, each consisting of AddRoundKey, SBOXLayer, and PBOXLayer. After the SP rounds, the encryption algorithm runs an additional AddRoundKey procedure with the last round key $k_{T-1}$ to obtain the final ciphertext $C$.
• Decryption: We denote this algorithm by $\mathsf{PRESENT}_{dec}(K', C)$, which takes as input the reversed encryption key $K' = (k_{T-1} \| \ldots \| k_0)$ and a 64-bit ciphertext $C$, and outputs the message $m$. The decryption algorithm is a reverse execution of the encryption algorithm. That is, it runs 31 times AddRoundKey, PBOXLayer$^{-1}$, and SBOXLayer$^{-1}$, and one additional AddRoundKey procedure, where PBOXLayer$^{-1}$ and SBOXLayer$^{-1}$ are the inverses of the permutation box and the substitution box, respectively.
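The three algorithms above can be sketched in Python for the 80-bit key case. This is a minimal reference-style sketch, not the optimized ST implementation of PLCrypto: the S-box table, the bit permutation $P(i) = 16i \bmod 63$, and the position of the round counter are taken from the PRESENT specification [BKL+07] as we understand it, and all function names are hypothetical.

```python
# 4-bit S-box and 64-bit bit permutation of PRESENT (assumed from the spec).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SBOX_INV = [SBOX.index(i) for i in range(16)]
PBOX = [(16 * i) % 63 for i in range(63)] + [63]


def round_keys_80(key):
    """Derive the 32 round keys k_0..k_31 from an 80-bit integer key."""
    keys = []
    for i in range(1, 33):
        keys.append(key >> 16)                                # left-most 64 bits
        key = ((key & (2**19 - 1)) << 61) | (key >> 19)       # rotate left by 61
        key = (SBOX[key >> 76] << 76) | (key & (2**76 - 1))   # S-box on top nibble
        key ^= i << 15                                        # xor round counter
    return keys


def sbox_layer(state, box):
    """Apply a 4-bit S-box to each of the 16 nibbles of the 64-bit state."""
    out = 0
    for n in range(16):
        out |= box[(state >> (4 * n)) & 0xF] << (4 * n)
    return out


def pbox_layer(state):
    """Move bit i of the state to position PBOX[i]."""
    out = 0
    for i in range(64):
        out |= ((state >> i) & 1) << PBOX[i]
    return out


def pbox_layer_inv(state):
    """Move bit PBOX[i] of the state back to position i."""
    out = 0
    for i in range(64):
        out |= ((state >> PBOX[i]) & 1) << i
    return out


def present_enc(rk, m):
    """31 SP rounds followed by a final AddRoundKey."""
    s = m
    for i in range(31):
        s = pbox_layer(sbox_layer(s ^ rk[i], SBOX))
    return s ^ rk[31]


def present_dec(rk, c):
    """Reverse execution of present_enc with the same round keys."""
    s = c ^ rk[31]
    for i in range(30, -1, -1):
        s = sbox_layer(pbox_layer_inv(s), SBOX_INV) ^ rk[i]
    return s
```

A typical call is `present_enc(round_keys_80(key), m)` with `key` an 80-bit integer and `m` a 64-bit integer. The bit-by-bit loops are kept for readability only; table-driven variants are the natural target for optimization.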

A.2.2 SPECK
SPECK and SIMON share the following notation. We let $l_e$ be the bit-length of a word used by them, where $l_e \in \{16, 24, 32, 48, 64\}$. Each message block thus has bit-length $l_m = 2 l_e$. Both SPECK and SIMON have an encryption key of $l_w$ words, i.e., $l_k = l_e \cdot l_w$ bits with $l_w \in \{2, 3, 4\}$. We denote a concrete ciphersuite $\Xi \in \{\mathsf{SIMON}, \mathsf{SPECK}\}$ as $\Xi$-$l_m/l_k$. In the following, we review them in turn. The parameters of SPECK include two right operands of rotation operations, denoted by $\alpha$ and $\beta$, which are constants determined by the parameter $l_e$. Specifically, if $l_e = 16$ (32-bit block) then we set $\alpha := 7$ and $\beta := 2$; otherwise we have $\alpha := 8$ and $\beta := 3$. In the following, we review the algorithms of SPECK.
• Encryption: We denote this algorithm by $\mathsf{SPECK}_{enc}(K, m)$, which takes as input an encryption key $K = (k_0 \| \ldots \| k_{T-1})$ and a $2l_e$-bit message $m = (m_0 \| m_1)$ (divided into two sub-messages of equal length), and outputs a $2l_e$-bit ciphertext block $C = (c_0 \| c_1)$. Each round of encryption carries out the round function $RF_k(x, y) = \big(((x \ggg \alpha) + y) \oplus k,\ (y \lll \beta) \oplus ((x \ggg \alpha) + y) \oplus k\big)$, where $\ggg$ and $\lll$ denote right and left rotation of an $l_e$-bit word and $+$ is addition modulo $2^{l_e}$. The entire encryption procedure is the composition of the round functions under the round keys $k_0, \ldots, k_{T-1}$; the round keys are used in reverse order for decryption.
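The round function and its inverse can be sketched as follows. This is a minimal Python illustration under the parameter choice stated above ($\alpha, \beta$ determined by $l_e$); the function names are hypothetical, the round keys are supplied by the caller, and the key schedule is deliberately left out.

```python
def _params(word_bits):
    """Rotation amounts (alpha, beta) and word mask for a given word length."""
    alpha, beta = (7, 2) if word_bits == 16 else (8, 3)
    return alpha, beta, (1 << word_bits) - 1


def _ror(x, r, word_bits, mask):
    return ((x >> r) | (x << (word_bits - r))) & mask


def _rol(x, r, word_bits, mask):
    return ((x << r) | (x >> (word_bits - r))) & mask


def speck_round(k, x, y, word_bits):
    """RF_k(x, y): rotate, add modulo 2^word_bits, xor the round key."""
    alpha, beta, mask = _params(word_bits)
    x = ((_ror(x, alpha, word_bits, mask) + y) & mask) ^ k
    y = _rol(y, beta, word_bits, mask) ^ x
    return x, y


def speck_round_inv(k, x, y, word_bits):
    """Inverse of speck_round for the same round key."""
    alpha, beta, mask = _params(word_bits)
    y = _ror(x ^ y, beta, word_bits, mask)
    x = _rol(((x ^ k) - y) & mask, alpha, word_bits, mask)
    return x, y


def speck_encrypt(round_keys, x, y, word_bits):
    """Compose the round function over k_0, ..., k_{T-1}."""
    for k in round_keys:
        x, y = speck_round(k, x, y, word_bits)
    return x, y


def speck_decrypt(round_keys, x, y, word_bits):
    """Apply the inverse rounds with the round keys in reverse order."""
    for k in reversed(round_keys):
        x, y = speck_round_inv(k, x, y, word_bits)
    return x, y
```

For example, `speck_round(0, 0, 1, 16)` yields `(1, 5)`: the rotated `x` is 0, the sum is 1, and `y` becomes `(1 <<< 2) xor 1 = 5`.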

A.2.3 SIMON
SIMON has the same message, key, and ciphertext spaces as SPECK. It consists of the following algorithms.
• Encryption: We denote this algorithm by $\mathsf{SIMON}_{enc}(K, m)$, which takes as input a key $K = (k_0 \| \ldots \| k_{T-1})$ and a $2l_e$-bit message $m = (m_0 \| m_1)$ (divided into two sub-messages of equal length), and outputs a $2l_e$-bit ciphertext block $C = (c_0 \| c_1)$. Each round of encryption encompasses a two-stage Feistel map $RF_k(x, y) = (y \oplus f(x) \oplus k,\ x)$.
• Decryption: We denote this algorithm by $\mathsf{SIMON}_{dec}(K', C)$, which takes as input the reversed encryption key $K' = (k_{T-1} \| \ldots \| k_0)$ and a $2l_e$-bit ciphertext $C = (c_0 \| c_1)$, and outputs the message $m = (m_0 \| m_1)$. This algorithm leverages the round function $RF^{-1}_k(x, y) = (y,\ x \oplus f(y) \oplus k)$ for decryption.
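The Feistel map and its inverse can be sketched in Python as follows. The definition of $f$ used here, $f(x) = ((x \lll 1) \wedge (x \lll 8)) \oplus (x \lll 2)$, is an assumption taken from the SIMON specification rather than from the text above, and the function names are hypothetical.

```python
def rol(x, r, word_bits):
    """Rotate a word_bits-bit word left by r bits."""
    mask = (1 << word_bits) - 1
    return ((x << r) | (x >> (word_bits - r))) & mask


def f(x, word_bits):
    """Non-linear function of SIMON (assumed from the SIMON specification)."""
    return (rol(x, 1, word_bits) & rol(x, 8, word_bits)) ^ rol(x, 2, word_bits)


def simon_round(k, x, y, word_bits):
    """RF_k(x, y) = (y xor f(x) xor k, x)."""
    return y ^ f(x, word_bits) ^ k, x


def simon_round_inv(k, x, y, word_bits):
    """RF_k^{-1}(x, y) = (y, x xor f(y) xor k)."""
    return y, x ^ f(y, word_bits) ^ k


def simon_encrypt(round_keys, x, y, word_bits):
    """Compose the round function over k_0, ..., k_{T-1}."""
    for k in round_keys:
        x, y = simon_round(k, x, y, word_bits)
    return x, y


def simon_decrypt(round_keys, x, y, word_bits):
    """Apply the inverse rounds with the round keys in reverse order."""
    for k in reversed(round_keys):
        x, y = simon_round_inv(k, x, y, word_bits)
    return x, y
```

Because the map is a Feistel structure, the inverse reuses the same $f$; no inverse of $f$ is ever needed, which keeps the decryption code as small as the encryption code.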

A.3 Collision-resistant Hash Functions PHOTON and SPONGENT
In this section, we briefly describe two collision-resistant hash function families, PHOTON and SPONGENT, that are standardized by ISO/IEC [JG16].

A.3.1 PHOTON
Here we review the two algorithms of PHOTON:
• Initialization: This algorithm initializes the constants and parameters that are used during the hash evaluation (including the ones used by $\pi_p$). It first sets the initial vector IV, which serves as the first internal state $S_0$.
• Evaluation: The input message is divided into $\ell$ blocks $m_0, \ldots, m_{\ell-1}$ of $l_r$ bits each. The last block may be padded with a "1" bit followed by as many zeros as necessary. The evaluation executes absorbing and squeezing procedures. In the absorbing phase, for $i \in [\ell]$, it computes the $(i+1)$-th internal state as $S_{i+1} := \pi_p(S_i \oplus (m_i \| 0^{l_c}))$. Once all message blocks have been absorbed, it computes $S_{\ell+i+1} := \pi_p(S_{\ell+i})$ for $i \in [\ell']$ in the squeezing phase, where $\ell'$ denotes the number of output blocks. Eventually, the hash output is built by concatenating the successive $l_r$-bit output blocks $z_0, \ldots, z_{\ell'-1}$ until the appropriate output size $l_h$ is reached, where $z_i$ is the $l_r$ left-most bits of the internal state $S_{\ell+i}$.

Now we briefly review the permutation $\pi_p$. It first divides the $l_t$-bit input into an $l_d \times l_d$ matrix $C$, where each cell of the matrix has $l_s$ bits. That is, we have $l_t = l_d \cdot l_d \cdot l_s$. In the following, we use $C[i, j]$ (for $i, j \in [l_d]$) to access the cell at the $i$-th row and $j$-th column.

A.3.2 SPONGENT
We review the SPONGENT construction via the following algorithms:
• Initialization: This algorithm randomly generates an 8-bit SBOX as in the state-of-the-art (C++) reference implementation of SPONGENT [BKL+12]. It also initializes the parameters, including the rate $l_r$ (the number of input or output bits handled per permutation call), the capacity $l_c$ of the internal state in bits, and the hash output length $l_h$ in bits. These parameters uniquely determine a specific version of SPONGENT.
• Evaluation: We denote this algorithm by $\mathsf{SPONGENT}(m)$, which takes as input a message $m \in \{0, 1\}^{l_m}$ and outputs a hash value $z \in \{0, 1\}^{l_h}$. This algorithm first pads the input message $m$ with the same approach as PHOTON, and divides $m$ into $l_r$-bit blocks $(m_1, m_2, \ldots, m_\ell)$, where $\ell = |m| / l_r$. $\mathsf{SPONGENT}(m)$ encompasses two phases: Absorbing and Squeezing. In the Absorbing phase, each $l_r$-bit message block is xored into the first $l_r$ bits of the state, and each resulting state is updated by the permutation function $\pi_s$. The Squeezing phase generates the hash value: it iteratively outputs the first $l_r$ bits of the state and applies the permutation function $\pi_s$ to update the state, until $l_h$ bits are obtained.

The core permutation function $\pi_s$ operates over the state $St$ of size $l_g = l_r + l_c$ in $T$ rounds, where $T \in \{45, 70, 90, 120, 140\}$ corresponds to the parameters $l_h / l_c / l_r$. For $i \in [T]$, $\pi_s$ runs three procedures to update the state: $St := \mathsf{VI}(i) \oplus St \oplus \mathsf{IV}(i)$, SLayer, and PLayer. $\mathsf{IV}(i)$ is the state of a linear feedback shift register (LFSR) in round $i$, which outputs the round-dependent constant and is xored to the right-most bits of the state. $\mathsf{VI}(i)$ is the bits of $\mathsf{IV}(i)$ in reversed order, which is xored to the left-most bits of the state.
The initial values of $\mathsf{IV}(i)$ can be found in [BKL+13, Table 2]. The SLayer procedure is identical to that of PRESENT. PLayer moves the $j$-th bit of $St$ to the new position $\mathsf{PBOX}(j)$, where $\mathsf{PBOX}(j)$ returns $j \cdot l_g / 4 \bmod (l_g - 1)$ if $j \in [l_g - 1]$ and $l_g - 1$ otherwise.
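The bit permutation of PLayer can be checked with a short Python sketch. It assumes the modulus $l_g - 1$ of the SPONGENT specification (with modulus $l_g$ the map would not be a bijection) and a state width divisible by 4; the function names and the integer representation of the state are hypothetical.

```python
def spongent_pbox(j, width):
    """Target position of bit j in a width-bit state (width divisible by 4)."""
    if j == width - 1:
        return width - 1
    return (j * (width // 4)) % (width - 1)


def player(state, width):
    """Apply PLayer to a state held as a non-negative Python integer."""
    out = 0
    for j in range(width):
        out |= ((state >> j) & 1) << spongent_pbox(j, width)
    return out


def is_bit_permutation(width):
    """True if spongent_pbox maps {0..width-1} one-to-one onto itself."""
    images = sorted(spongent_pbox(j, width) for j in range(width))
    return images == list(range(width))
```

On a PLC the positions would be precomputed rather than evaluated with a multiplication and a modular reduction per bit; the sketch only documents the mapping itself.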

A.4 Proof of Aliveness
A proof of aliveness (PoA) scheme is a two-party protocol in which a client $\mathsf{id}_C$ proves its aliveness at a certain time to a server $\mathsf{id}_S$. We represent the elapse of time via discrete time slots $\{T_i\}$, where any two consecutive slots are separated by a time interval $\Delta_s$, i.e., $T_{i+1} - T_i = \Delta_s$. Let $\Delta_{rc}$ denote the life-span of a PoA protocol instance, and $T_{att}$ be the aliveness tolerance time. Basically, if the server $\mathsf{id}_S$ fails to receive any valid proof from the client $\mathsf{id}_C$ within $T_{att}$, then $\mathsf{id}_C$ is considered to be dead.
Here we review the second protocol $\Pi^{\mathsf{PRG}}_{\mathsf{OWF}}$ proposed in [JYvDZ19] and implemented on PLC. We briefly review the protocol execution phases of $\Pi^{\mathsf{PRG}}_{\mathsf{OWF}}$ as below:
• Initialization: The life-span $\Delta_{rc}$ of the protocol is first divided into $\eta$ time periods, each of length $\Delta_{rc} / \eta$. In the $i$-th ($i \in [\eta]$) time period, $\mathsf{id}_C$ runs $ss_i \| p^i_0 := \mathsf{PRG}(ss_{i-1})$ to get the initial secret $p^i_0$ of the $i$-th sub-chain and the PRG seed $ss_i$. It generates the $i$-th verify-point $p^i_N$ (for verifying the aliveness proofs in the corresponding chain) by applying the OWF $\mathsf{F}$ $N$ times. Namely, each node $p^i_j$ in this chain is computed from its predecessor $p^i_{j-1}$ and the OWF $\mathsf{F}$, i.e., $p^i_j := \mathsf{F}(p^i_{j-1})$. At the end, the client keeps the state $(ss_u, p^1_0, \{T^i_{end}\}_{0 \le i \le \eta}, \eta, u)$, where $u$ is a variable initialized to 1 that tracks the index of the stored PRG seed and the OWF-chain head. The server has the state $(\{p^i_N, T^i_{end}\}_{i \in [\eta]}, T_{ack}, \eta)$, where $T_{ack}$ is the latest time at which the verifier received a valid aliveness proof.

- While $x \ge y b^{l_a - l_y}$, it computes $q_{l_a - l_y} := q_{l_a - l_y} + 1$ and $x := x - y b^{l_a - l_y}$;
- For $i \in [l_a - 1, \ldots, l_y]$, it runs the following steps: If $x_i = y_{l_y - 1}$, then it sets $q_{i - l_y} := b - 1$; else it sets $q_{i - l_y}$ to the estimated quotient digit. If $x < 0$, then it sets $x := x + y b^{i - l_y}$ and $q_{i - l_y} := q_{i - l_y} - 1$;
- Eventually, the quotient $q$ has been calculated, and the remainder is $r := x$.
Note that the division operation can also be used as a modular operation, in which case only the remainder $r$ is returned.

B Remarks on Extending the Life-span of PoA Instances
To improve password replenishment, it is possible to leverage external storage, such as an SD card that is supported by the PLC [Bra16], so that we can compute each new tail node during the idle time of the PLC and store it on the SD card. The time to compute a tail node is identical to the worst-case time to generate a password in the proof generation procedure. As we assume that the PLC is running in RUN mode, these tail nodes cannot be tampered with by adversaries. The compression of tail nodes based on UOWHF can also be done during idle time, which costs about 1 s. The online OTS signing procedure then only needs to read the tails and the corresponding hash value from the SD card and run the PRG to generate the OWF-chain heads accordingly, which costs roughly 6 s. In this way, the PoA instance can use a much longer sub-chain, so that the interval between two replenishment procedures becomes longer as well. For example, a PoA instance can be used for 91 days when N = 1024. To facilitate the proof generation algorithm, one could also store checkpoints of the sub-chains on the SD card. Nevertheless, it remains an open question to determine the optimal implementation strategy for PoA on PLC with external storage devices.
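The following Python sketch illustrates, under stated assumptions, how the sub-chains of Appendix A.4 are built and where the life-span figures come from. SHA-256 stands in for both the OWF $\mathsf{F}$ and the PRG (the PLC implementation uses the lightweight primitives of PLCrypto instead), proofs are assumed to be released from the tail towards the head, and one proof per 30 seconds with η = 256 is assumed for the life-span; all function names are hypothetical.

```python
import hashlib


def owf(x: bytes) -> bytes:
    """Stand-in for the one-way function F (assumption: SHA-256)."""
    return hashlib.sha256(b"F" + x).digest()


def prg(seed: bytes):
    """Stand-in PRG: maps ss_{i-1} to the pair (ss_i, p^i_0)."""
    next_seed = hashlib.sha256(b"seed" + seed).digest()
    chain_head = hashlib.sha256(b"head" + seed).digest()
    return next_seed, chain_head


def chain_node(start: bytes, steps: int) -> bytes:
    """Apply F `steps` times, i.e. walk `steps` nodes down the chain."""
    node = start
    for _ in range(steps):
        node = owf(node)
    return node


def init_verify_points(ss0: bytes, eta: int, n: int):
    """Initialization: derive eta chain heads and their verify-points p^i_N."""
    seed, points = ss0, []
    for _ in range(eta):
        seed, head = prg(seed)
        points.append(chain_node(head, n))
    return points


def verify(proof: bytes, steps_to_tail: int, verify_point: bytes) -> bool:
    """Server check: the proof must reach the verify-point after the given steps."""
    return chain_node(proof, steps_to_tail) == verify_point


def lifespan_days(eta: int, n: int, interval_s: float = 30.0) -> float:
    """Life-span of an instance with eta * n proofs, one proof per interval."""
    return eta * n * interval_s / 86400.0
```

With these assumptions, `lifespan_days(256, 128)` is about 11.4 days and `lifespan_days(256, 1024)` is about 91.0 days, matching the figures quoted for N = 128 and N = 1024. The cost of `init_verify_points` grows with η·N evaluations of F, which is why replenishment dominates the runtime.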

C Other Pseudo-codes

C.1 Pseudo-codes of Chaskey
The MAC evaluation function Chaskey(m) is shown by Algorithm 5. The input message $m \in \{0, 1\}^{l_m}$ is represented as a DINT ARRAY m[*DN_k], where *DN_k = 128/32 for 128-bit security. We define a DINT ARRAY K[3DN_k] to store the key $K = k \| K_1 \| K_2$. To hard-code the key, the first step of Chaskey is to assign $\{K[i]\}_{i \in [3DN_k]}$ the corresponding concrete values of $k \| K_1 \| K_2$ generated by Python.
We implement the permutation π c (with T ∈ {8, 16} rounds) on PLC via the Algorithm 5.

C.2 Pseudo-code of SPECK
The encryption function SPECK enc (m) of SPECK would first divide the input message into two words each of which has length l e . In the implementation, we consider a word length l e ∈ {16, 32, 64}. SPECK enc (m) is realized by Algorithm 6.

C.5 Pseudo-codes of PHOTON
PHOTON mainly leverages a sponge-like construction with an AES-like internal unkeyed permutation, denoted by $\pi_p$, which deals with an $l_t = l_c + l_r$-bit input and has $T = 12$ rounds, where $l_c$ is the capacity (security parameter) and $l_r$ is the bit-length (bitrate) of a message block. The input message $m \in \{0, 1\}^{l_m}$ of PHOTON is divided into $\ell$ message blocks for hash evaluation, each of which has $l_r$ bits, i.e., $l_m = \ell \cdot l_r$. The output hash value $z \in \{0, 1\}^{l_h}$ of PHOTON is represented by chunks, each of which has bit-length (bitrate) $l_r$, where $l_h \in \{80, 128, 160, 224, 256\}$. $\pi_p$ applies an $l_s$-bit SBOX, where $l_s = 8$ when $l_h = 256$ and $l_s = 4$ otherwise. We may append the concrete value of $l_h$ to PHOTON to differentiate each version.
The evaluation algorithm of PHOTON on PLC is shown by Algorithm 9, which mainly runs the permutation algorithm $\pi_p$ to generate the hash value. We present $\pi_p$ in Algorithm 10. To get the least significant $l_s$ bits from the SCShRMCS table look-up result $Tv$ ($\ge 32$ bits) in order to update the internal $l_s$-bit state, we apply the optimization idea B-RW to develop a function GetByte($Tv$, start, $l_s$) that returns each $l_s$-bit block value of $Tv$, where start indicates the starting bit to operate on. That is, the returned value res of GetByte($Tv$, start, $l_s$) is computed by a few assignments and additions.

C.6 Pseudo-codes of SPONGENT
The main skeleton of evaluation algorithm is briefly presented in Algorithm 11.
To implement the permutation function $\pi_s$, we first pre-compute the round-dependent LFSR constants $\mathsf{VI}(i)$ and $\mathsf{IV}(i)$. That is, we avoid such an expensive computation (without hardware support) during evaluation by loading them as constants in the initialization phase. The SBOX is loaded as constants as well, so we can realize the SLayer by lookup operations. In addition, the PLayer is implemented with an idea similar to that used in the implementation of PRESENT.

D Implementation and Performance of Big-integer Operations

D.1 Pseudo-codes of Big-integer Operations
Here we focus on describing the implementation of big-integer operations with a particular base $b = 2^{l_b}$ that can be optimized via ST's bit-wise operability. For other bases, e.g., $b = 10$, they can be implemented analogously following the specification. We represent each operand of a big-integer operation with $l_b$-bit digits. The implementation of multiple-precision addition Add(x, y) is shown by Algorithm 15. Since the carry is only a one-bit value, we can obtain it from the sum of two digits for free, e.g., c := w[i].[l_b], and the mod-base operation can be straightforwardly realized by setting the $l_b$-th bit to zero.

During the division, we need to shift a big-integer left by a few digits, e.g., to obtain $y b^{i - l_y - 1}$ (as illustrated in Section A.5). To do so, we define a function LShiftB that takes as input a big-integer $\{x[i]\}_{i \in [l_a]}$ and the number of digits pos ($< l_a$) by which it is shifted, and returns the left-shifted result $\{y[i]\}_{i \in [l_a]}$.

By Algorithm 19, we implement multiplication in an additive group $\mathbb{Z}_m$, i.e., AMul. Since scanning the bits of the input $e$ is free, AMul can be realized by running our multiple-precision addition following the specification. In the meantime, if some intermediate result is larger than $2m$, we do the modular reduction of the value by subtracting $m$ from it.
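The carry trick of Add and the bit-scanning structure of AMul can be sketched in Python as follows. This is a minimal illustration, not Algorithm 15 or Algorithm 19 themselves: digits are stored little-endian with $l_b = 30$, AMul is written over Python integers for brevity and reduces as soon as an intermediate value reaches $m$ (a simplification of the reduction rule described above), and all names are hypothetical.

```python
LB = 30                  # digit width l_b, so the base is b = 2^30
MASK = (1 << LB) - 1     # clears bit l_b and above (the "mod base" step)


def to_digits(value, n):
    """Split a non-negative integer into n little-endian l_b-bit digits."""
    return [(value >> (LB * i)) & MASK for i in range(n)]


def from_digits(digits):
    """Recombine little-endian digits into an integer."""
    return sum(d << (LB * i) for i, d in enumerate(digits))


def add(x, y):
    """Multiple-precision addition of two equal-length digit lists."""
    w, carry = [], 0
    for xi, yi in zip(x, y):
        s = xi + yi + carry       # at most 2^31 - 1, so it fits a signed 32-bit word
        carry = (s >> LB) & 1     # the carry is simply bit l_b of the sum
        w.append(s & MASK)        # reduce modulo the base by masking
    w.append(carry)
    return w


def amul(e, g, m):
    """Compute e * g mod m by double-and-add, assuming 0 <= g < m."""
    acc = 0
    for bit in bin(e)[2:]:        # scan the bits of e from the most significant
        acc += acc                # doubling step
        if acc >= m:
            acc -= m              # reduction by a single subtraction
        if bit == "1":
            acc += g              # addition step
            if acc >= m:
                acc -= m
    return acc
```

Since both the doubling and the addition step keep the accumulator below $2m$ before reduction, one conditional subtraction per step suffices and no division is needed.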

D.2 Performance of Big-integer Operations
Big-integer operations are implicitly used in PLCrypto whenever mathematical operations are carried out on operands of more than 30 bits. To show the performance of big-integer operations on PLC, we implement five typical big-integer operations as presented in Appendix A.5 and Appendix D.1, i.e., multiple-precision addition (Add), subtraction (Sub), multiplication (Mul), division (Div), and additive multiplication (AMul).
We benchmark the big-integer operations with three types of bases, i.e., $b = 2^{15}$, $b = 2^{30}$, and $b = 10$. However, base $b = 2^{30}$ is used for benchmarking the addition and the subtraction only, since 30-bit multiplication would exceed the range of DINT. Note that the big-integer addition/subtraction with base $b = 2^{30}$ is implicitly used in the implementation of SPECK. Note also that modulo $2^{31}$ addition (used by the subset-sum based OWF) is similar to modulo $2^{30}$ addition; we thus omit it here for simplicity. Also, the 32nd bit of a DINT variable is the sign bit, which cannot be used to store the multiplication result, so the base $b = 2^{15}$ is the largest and most efficient base for multiplication that we could test.
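The choice of bases can be checked with a few lines of Python. The sketch assumes that a DINT is a signed 32-bit integer and that the inner step of a schoolbook multiplication computes one digit product plus an accumulated digit plus a carry digit; the function names are hypothetical.

```python
DINT_MAX = 2**31 - 1  # largest value of a signed 32-bit DINT


def max_add_step(base):
    """Largest intermediate value in Add: two digits plus a one-bit carry."""
    return 2 * (base - 1) + 1


def max_mul_step(base):
    """Largest intermediate value in Mul: digit product + digit + carry digit."""
    return (base - 1) ** 2 + 2 * (base - 1)


def fits_dint(value):
    """True if the value can be held in a DINT without touching the sign bit."""
    return value <= DINT_MAX
```

Under these assumptions, `max_mul_step(2**15)` equals $2^{30} - 1$ and fits, whereas `max_mul_step(2**16)` equals $2^{32} - 1$ and does not, so $2^{15}$ is the largest power-of-two base for multiplication. For addition, `max_add_step(2**30)` equals $2^{31} - 1$, which is exactly the DINT maximum.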
In Figure 9, we show the performance of the five types of big-integer functions. Add and Sub are not efficient compared to the corresponding results on other platforms, so Mul and Div, which are realized based on Add and Sub, are costly. Since the group formed by the points of an elliptic curve over a finite field utilizes additive notation, our result regarding Algorithm 19 implies that elliptic curve cryptography (ECC) might be feasible on PLC. For example, AMul with 512-bit multipliers costs around 400 ms. Also, we have experimentally verified that the DINT multiplication is constant-time on our platform, regardless of the given operands. In this work, we just made some preliminary attempts