Practical seed-recovery for the PCG Pseudo-Random Number Generator

The Permuted Congruential Generators (PCG) are a popular family of conventional (non-cryptographic) pseudo-random generators designed in 2014. They are used by default in the NumPy scientific computing package. Even though they are not of cryptographic strength, their designer stated that predicting their output should nevertheless be "challenging". In this article, we present a practical algorithm that recovers all the hidden parameters and reconstructs the successive internal states of the generator. This enables us to predict the next "random" numbers and to recover the seeds of the generator. We have successfully executed the reconstruction algorithm using 512 bytes of challenge input; in the worst case, the process takes 20 000 CPU hours. The reconstruction algorithm makes use of cryptanalytic techniques, both symmetric and lattice-based. In particular, the most computationally expensive part is a guess-and-determine procedure that solves about 2^52 instances of the Closest Vector Problem on a very small lattice.


Introduction
Pseudo-random generators (PRG) are well-studied primitives in symmetric cryptography. A PRG is an efficient deterministic algorithm that stretches a small random seed into a longer pseudo-random stream. To achieve cryptographic-grade pseudo-randomness, a PRG must ensure that the pseudo-random stream is computationally indistinguishable from a "truly" random sequence of bits by efficient adversaries. Alternatively, it is possible to define pseudo-randomness by asking that no efficient algorithm be capable of predicting the next pseudo-random bit with non-negligible accuracy. The two definitions are in fact equivalent. It is well known that pseudo-random generators can be turned into symmetric encryption algorithms, by generating "random" masks to be used in the one-time pad. This is precisely what stream ciphers do.
Not all pseudo-random generators are of cryptographic strength. In some applications, it is simply not necessary: for Monte-Carlo numerical simulations or random choices in games, a relaxed, non-cryptographic notion of pseudo-randomness may be sufficient. This allows for faster algorithms. For instance, the Python standard library's random module uses the Mersenne Twister [MN98]. The C library that comes along with gcc (the glibc) uses a (poor) truncated linear congruential generator by default to implement the rand function.
In the realm of non-cryptographic random generators, a PRG is deemed "good enough" if it passes some efficient statistical tests, whereas the cryptographic notion of pseudo-randomness asks that it pass any efficient test. There are de facto statistical test suites; an initial battery of randomness tests for RNGs was suggested by Knuth in the 1969 first edition of The Art of Computer Programming. In 1996, Knuth's tests were supplanted by Marsaglia's Diehard tests. In 2007, L'Ecuyer proposed the TestU01 [LS07] library, whose "BigCrush" test is considered state-of-the-art by the relevant community, to the best of our knowledge. In 2010, the NIST proposed its own statistical test suite (Special Publication 800-22), to which improvements were later suggested [ZML + 16]. We understand that the PractRand test suite also has a good reputation.
In any case, designers of conventional pseudo-random generators try to obtain the simplest and fastest algorithm that passes the day's favourite test suite. A few selected ones are shown in Fig. 1. lehmer64 is a truncated linear congruential generator, touted as "the fastest PRNG that passes BigCrush" [Lem19]. xorshift128 is a clever implementation of a 128-bit LFSR with period 2^128 − 1 due to Marsaglia [Mar03], using only a few simple 32-bit operations. xorshift128+ is an improved version due to Vigna [Vig17] that returns the sum of two consecutive outputs of a Xorshift LFSR; it passes the "BigCrush" test suite and is the default PRNG in many JavaScript implementations, including those in Google's V8 engine (Chrome), Firefox and Safari.
Failures in cryptographic pseudo-random generators have catastrophic security implications. Let us mention for instance the well-known problem in Debian Linux from 2008, where a bug in the OpenSSL package led to insufficient entropy gathering and to practical attacks on the SSH and SSL protocols (the only remaining source of entropy comes from the PID of the process, i.e. 16 bits or less of effective entropy) [YRS + 09].
However, problems in non-cryptographic random number generators can also have dire consequences (barring the obvious case where they are used in lieu of their cryptographic counterparts). When they are used in scientific computing for Monte-Carlo methods, their defects have the potential to alter the results of numerical simulations. Ferrenberg et al. [FLW92] ran a classical ferromagnetism Ising model Monte-Carlo simulation, in a special case where the exact results could be computed analytically, and compared the results of the simulation with the "true" answer. They used several conventional pseudo-random generators: a 32-bit linear congruential generator, two LFSRs, various combinations thereof, etc. They observed that changing the source of random numbers significantly altered the outcome of the numerical simulation. Different generators produced different biases: in particular, a given LFSR yielded energy levels that were systematically too low and critical temperatures that were always too high, while another kind of generator yielded the opposite (in many, repeated, trials).
The scientific computing community also realized that the need for fast parallel random number generation could be satisfied by the use of block ciphers in counter mode [SMDS11]. The need for speed then leads to the use of weakened cryptographic primitives (round-reduced AES or custom and presumably weak block ciphers). In most cases, it is fairly easy to see that a given conventional PRG does not meet the cryptographic notion of pseudo-randomness, and there are few exceptions. Most are fairly easy to predict, meaning that after having observed a prefix of the output, it is easy to produce the next "pseudo-random" bits. This makes them a good source of exercises for cryptology students.
In this paper, we study the PCG family of non-cryptographic pseudo-random generators proposed by O'Neill [O'N14b, O'N14a]. She did not claim that the algorithm has cryptographic strength, but stated that predicting its output ought to be "challenging". We therefore took up the challenge.
PCG stands for "Permuted Congruential Generator": it essentially consists in applying a non-linear filtering function on top of a linear congruential generator (in a way reminiscent of the venerable filtered LFSRs). The resulting combination is fast and passes current statistical test suites. The PCG family contains many members, but we focus on the strongest one, named either PCG64 or PCG-XSL-RR. It has a 128-bit internal state and produces 64 bits when clocked. It is the default pseudo-random number generator in the popular NumPy [vCV11] scientific computing package for Python.
The internal state of the PCG64 generator is made of a 128-bit "state" and a 128-bit "increment", whose intended use is to provide several pseudo-random streams with the same seed (just as initialisation vectors do in stream ciphers). A default increment is provided in case the end-user just wants one pseudo-random stream with a single 128-bit seed.

Contribution.
We describe an algorithm that reconstructs the full internal state of the strongest member of the PCG family. This makes it possible to predict the pseudo-random stream deterministically and to clock the generator backwards. The original seeds can also easily be reconstructed. The state reconstruction algorithm is practical and we have executed it in practice. It follows that predicting the output of the PCG should be considered practically feasible.
While the PCG pseudo-random generator is not meant as a cryptographic primitive, obtaining an actual prediction algorithm requires the use of cryptanalytic techniques. Making it practical requires, in addition, a non-trivial implementation effort.
Our algorithm reconstructs the internal state using a "guess-and-determine" approach: some bits of the internal state are guessed; assuming the guesses are correct, some other information is computed; a consistency check discards bad guesses early on; then candidate internal states are computed and fully tested. The problem actually comes in two distinct flavors.
When the increment is known (for instance when it is the default value), a simplified prediction algorithm recovers the internal state from 192 bits of pseudo-random stream. The process runs in 20 CPU minutes. It guesses 37 bits of the internal state, then solves an instance of the Closest Vector Problem (CVP) in a 3-dimensional Euclidean lattice. This requires about 50 arithmetic operations in total and reveals the entire internal state if the guesses are correct.
When the increment is unknown, things are a bit more complicated. This is the default situation in NumPy, where both the state and the increment are initialised using an external source of entropy. In this case, our prediction algorithm requires 4096 bits of pseudo-random stream; it guesses between 51 and 55 bits, then for each guess it solves an instance of CVP in dimension 4 (using about 75 arithmetic operations). This recovers 64 more bits of information about the difference between two successive states, and this is enough to filter out the bad guesses. This information can then be used in a subsequent and comparably inexpensive phase to recover the entire internal state. On average, the whole process requires a bit less than 20 000 CPU hours to complete.
We implemented our algorithms, then asked the designer of the PCG family to send us "challenge" pseudo-random streams; we ran our code and emailed back the (correct) seeds used to generate the challenge streams the next day.
Related Work. Deterministic pseudo-random generators can be traced back to the work of Von Neumann and Metropolis on the ENIAC computer [vN51]; they suggested around 1946 to use the "middle-square" method: if u_n is a k-digit number, form u_{n+1} by taking the k middle digits of the square of u_n. This is a venerable precursor of the Blum-Blum-Shub "provably secure" PRNG. The main problem of this method is that it produces sequences that quickly enter short cycles.
Lehmer later proposed linear congruential generators in 1949, also for use on the ENIAC computer [Leh49]. He gave the sequence defined by u_0 = 47594118 and u_{n+1} = 23 · u_n mod (10^8 + 1), and proved that it has period 5882353, a clear improvement over the middle-square approach. More details on early pseudo-random generators can be found in [Knu98].
Knuth discussed whether truncated linear congruential generators could be good stream ciphers; he therefore studied the problem of recovering the internal state of a truncated linear congruential generator x_{i+1} = a · x_i + c mod 2^k when a and c are unknown [Knu85]; he gave an algorithm exponential in the number of truncated bits.
Boyar studied the problem further [Boy89] and presented an algorithm which can predict a linear congruential generator when all the parameters (multiplier, increment, modulus and initial state) are unknown; she extended her idea to the case of truncated linear congruential generators, under the condition that the number of unrevealed bits is very small compared to the size of the modulus.
Frieze et al. [FHK + 88] improved the efficiency of reconstruction algorithms in simpler cases. They supposed that the multiplier a and the modulus 2^k were known, and used lattice-based techniques to recover the state of a truncated linear congruential generator with more truncated bits.
Later on, Joux and Stern extended this result to the case where the multiplier a and the modulus 2^k are unknown, also using lattice techniques [JS98].

The PCG Pseudo-Random Number Generator Family
This section introduces some notation and describes the PCG64 non-cryptographic pseudo-random number generator (a.k.a. PCG-XSL-RR in the designer's terminology).
If x ∈ {0, 1}^n is an n-bit string with x = x_0 . . . x_{n−1}, then x[i:j] denotes the bit string x_i x_{i+1} . . . x_{j−2} x_{j−1} (this is the "slice notation" used in Python). The set Z_{2^k} of integers modulo 2^k is identified with the set of k-bit strings. If x is a floating-point number, then ⌊x⌉ denotes the nearest integer (using the "rounding half to even" tie-breaking rule, the default in IEEE 754 arithmetic). If U is a vector or a sequence, then U_i is its i-th element (we use capital letters for these). If U is such a sequence, we denote by U mod M the sequence (U_0 mod M, U_1 mod M, . . .). The XOR operation is denoted ⊕; left and right rotations are denoted ≪ and ≫ respectively. Modular addition is denoted + (or ⊞ to make it even more explicit), and modular subtraction is denoted ⊖.
PCG64 has a 128-bit internal state, which evolves as a linear congruential generator modulo 2^128. More precisely:

S_{i+1} = a · S_i + c mod 2^128,

where the "multiplier" a is a fixed 126-bit constant. The first initial state S_0 is the seed of the generator. The increment c may be specified by the user of the PRNG to produce different output streams with the same seed (just as the IV does in a stream cipher). If no value of c is specified, then a default increment is used; note that c must be odd. Each time the PRNG is clocked, 64 output bits are extracted from the internal state using a non-linear function that makes use of data-dependent rotations, in a way reminiscent of the RC5 block cipher [Riv94]. The six most significant bits of the internal state encode a number 0 ≤ r < 64. The two 64-bit halves of the internal state are XORed together, and this 64-bit result is rotated right by r positions.
The successive 64-bit outputs of the generator are X_0, X_1, . . ., where:

X_i = (S_i[0:64] ⊕ S_i[64:128]) ≫ S_i[122:128].    (1)

For the sake of convenience, we denote by Y_i the XOR of the two halves of the state (before the rotation) and by r_i the number of shifts of the i-th rotation, so that r_i = S_i[122:128] and X_i = Y_i ≫ r_i. Fig. 2 summarizes the process. The overall design strategy is similar to that of a filtered LFSR: the successive states of a weak internal generator with a strong algebraic structure are "filtered" by a non-linear function.
Updating the internal state requires a 128 × 128 → 128-bit multiplication. In fact, this can be done with three 64 × 64 → 128-bit multiplications and two 64-bit additions. High-end desktop CPUs all implement these operations in hardware, so the generator is quite fast on these platforms.
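As a concrete illustration, the following minimal Python sketch models the generator described above. The 128-bit constants are placeholders chosen for the example (the actual PCG64 defaults are fixed values specified by the reference implementation); only the structure, an LCG update followed by the XSL-RR output permutation, matters here.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

# Placeholder constants, for illustration only; the real PCG64
# defaults are fixed by the reference implementation.
A = 0x2360ED051FC65DA44385DF649FCCF645   # multiplier (illustrative)
C = 0x0123456789ABCDEF0123456789ABCDEF   # increment (illustrative, odd)

def step(s):
    """LCG update: S_{i+1} = a*S_i + c mod 2^128."""
    return (A * s + C) & MASK128

def output(s):
    """XSL-RR: XOR the two 64-bit halves, rotate right by the top 6 bits."""
    r = s >> 122                         # six most-significant bits of the state
    y = (s >> 64) ^ (s & MASK64)         # Y_i, the XOR of the halves
    return ((y >> r) | (y << (64 - r))) & MASK64
```

For instance, a state whose only set bit is bit 122 encodes the rotation r = 1 and the half-XOR Y = 2^58, so the output is 2^57.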

Tools
In the rest of this paper, we often perform arithmetic operations on integers of which only some bits are known. This generates unknown carries. If a, b are integers modulo 2^128 and 0 ≤ i < j < 128, then there is a carry 0 ≤ γ ≤ 1 (resp. a borrow 0 ≤ β ≤ 1) such that:

(a ⊞ b)[i:j] = a[i:j] + b[i:j] + γ mod 2^{j−i},    (a ⊖ b)[i:j] = a[i:j] − b[i:j] − β mod 2^{j−i}.
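These identities can be checked mechanically. In the sketch below (the helper names are ours), slice_bits mirrors the x[i:j] notation, and the carry γ (resp. borrow β) entering position i is computed from the low i bits of the operands:

```python
def slice_bits(x, i, j):
    """x[i:j] in the slice notation: bits i to j-1, least-significant first."""
    return (x >> i) & ((1 << (j - i)) - 1)

def carry(a, b, i):
    """gamma: the carry entering position i when computing a + b."""
    return ((a % (1 << i)) + (b % (1 << i))) >> i

def borrow(a, b, i):
    """beta: the borrow entering position i when computing a - b."""
    return 1 if (a % (1 << i)) < (b % (1 << i)) else 0
```

With these helpers, (a ⊞ b)[i:j] equals a[i:j] + b[i:j] + γ modulo 2^{j−i}, and similarly for subtraction with the borrow β.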

Linear Congruential Generators and Lattices
Given an integer k, a fixed multiplier a, an increment c and a "seed" x, define the sequence:

u_0 = x,    u_{i+1} = a · u_i + c.

When reduced modulo 2^k, the sequence U forms the successive states of a linear congruential generator (LCG). Let LCG_k(x, c) denote the vector (u_0, u_1, u_2, . . .) of integers modulo 2^k. It is easy to check that:

LCG_k(x, c) = LCG_k(x, 0) + LCG_k(0, c) mod 2^k.

Let L denote the Euclidean lattice spanned by the rows of the following n × n matrix:

            ( 1   a    a^2   · · ·  a^{n−1} )
            ( 0   2^k  0     · · ·  0       )
G_{n,k}  =  ( 0   0    2^k   · · ·  0       )
            ( ⋮                ⋱            )
            ( 0   0    0     · · ·  2^k     )

This lattice contains all n-term geometric progressions of common ratio a modulo 2^k; therefore the first n terms of the sequence LCG_k(x, 0) give the coordinates of a vector of this lattice.
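Both claims are easy to test with toy parameters (the multiplier below is an arbitrary odd value chosen for illustration):

```python
K = 16          # toy modulus 2^16
A = 4093        # arbitrary odd multiplier, for illustration

def lcg(x, c, n, k=K, a=A):
    """First n states of u_0 = x, u_{i+1} = a*u_i + c mod 2^k."""
    out, u = [], x % (1 << k)
    for _ in range(n):
        out.append(u)
        u = (a * u + c) % (1 << k)
    return out
```

The first n terms of lcg(x, 0, n) are, by construction, congruent to x · (1, a, . . ., a^{n−1}) modulo 2^k, hence the coordinates of a point of the lattice L.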
Reconstructing the state of a truncated linear congruential generator can therefore be seen as the problem of finding a point of this lattice given only an approximation thereof, for instance when the least-significant bits of each component have been dropped. The "lattice approach" to truncated linear congruential generators is due to Frieze et al.
Let U be a vector of n integers modulo 2^k forming a geometric progression of common ratio a, let T_i = U_i[k−t:k] denote the top t bits of U_i, and let N denote an arbitrary "noise vector" such that N_i ∈ {−1, 0, 1}. Finally, set T̃_i = T_i + N_i mod 2^t. We will be facing the following problem ("reconstructing noisy truncated geometric series") several times:

INPUT: T̃ ∈ (Z_{2^t})^n, a (noisy) version of U truncated to the top t bits.
OUTPUT: U_0 mod 2^k, the first term of the (non-truncated) geometric sequence.

This is a particular case of the hidden number problem. We will be facing a "high-dimension" instance in section 5.3 and many "low-dimension" instances in sections 4 and 5. The rest of this section discusses algorithmic tools to solve these problems. We first claim that 2^{k−t} T̃ is "close" to a point of the lattice L.

Lemma 1. There exists a vector U′ of the lattice L such that U′ ≡ U mod 2^k and ‖2^{k−t} T̃ − U′‖ ≤ √n · 2^{k−t+1}.
Proof. We first observe that U belongs to the lattice L. We start by setting U′ ← U, and we examine all coordinates of U′ in turn:
• If T̃_i = T_i + N_i (no modular "wraparound"), then 2^{k−t} T̃_i − U′_i = 2^{k−t} N_i − (U_i mod 2^{k−t}), so |2^{k−t} T̃_i − U′_i| < 2^{k−t+1}.
• Otherwise, there are two possible "wraparound" cases:
  – Either T_i = 0 and N_i = −1, which leads to T̃_i = 2^t − 1. In this case, we set U′_i = U_i + 2^k (note that this amounts to adding a lattice vector to U′, so U′ stays in the lattice). We have 2^{k−t} T̃_i − U′_i = −2^{k−t} − (U_i mod 2^{k−t}), hence |2^{k−t} T̃_i − U′_i| < 2^{k−t+1}.
  – Or T_i = 2^t − 1 and N_i = +1, which leads to T̃_i = 0. In this case, we set U′_i = U_i − 2^k (again, with this modification U′ stays in the lattice), and we find 2^{k−t} T̃_i − U′_i = 2^{k−t} − (U_i mod 2^{k−t}), hence |2^{k−t} T̃_i − U′_i| ≤ 2^{k−t}.
In the end, every coordinate of 2^{k−t} T̃ − U′ is smaller than 2^{k−t+1} in absolute value, therefore ‖2^{k−t} T̃ − U′‖ ≤ √n · 2^{k−t+1}.
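The proof's per-coordinate argument can be sanity-checked numerically with toy parameters k = 16, t = 4 (the helper names are ours): for each coordinate we pick the shift of U_i by 0 or ±2^k closest to 2^{k−t} · T̃_i, exactly as in the proof, and verify the distance bound.

```python
import math

K, T = 16, 4

def check_lemma(seq, noise, k=K, t=T):
    """Verify ||2^(k-t)*Ttilde - U'|| <= sqrt(n) * 2^(k-t+1)."""
    n = len(seq)
    err2 = 0
    for u, e in zip(seq, noise):
        ttilde = ((u >> (k - t)) + e) % (1 << t)   # noisy top-t bits
        target = ttilde << (k - t)
        # wraparound fix from the proof: shift u by a multiple of 2^k
        uprime = min((u - (1 << k), u, u + (1 << k)),
                     key=lambda v: abs(target - v))
        assert (uprime - u) % (1 << k) == 0        # still congruent to u
        err2 += (target - uprime) ** 2
    assert math.sqrt(err2) <= math.sqrt(n) * 2 ** (k - t + 1)
    return True
```

The second test below exercises both wraparound cases (top bits all-zero with noise −1, and all-one with noise +1).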

Reconstruction in "High" Dimension Using an Exact CVP Solver
Lemma 1 tells us that the approximation of a geometric sequence obtained by dropping least-significant bits cannot be arbitrarily far from a lattice point which reveals U_0 mod 2^k. Therefore, we may possibly reconstruct truncated geometric series by finding the lattice vector closest to the approximation we have. This means solving instances of the well-known Closest Vector Problem (CVP), a fundamental hard problem on lattices. It is NP-hard, and all known algorithms are exponential in the dimension of the lattice, yet they can be fairly practical up to dimension ≈ 70. Let CVP(L, x) denote the vector of L closest to the input vector x. Using the same notation as above, we want to know whether CVP(L, 2^{k−t} T̃) is indeed the vector U′ ≡ U mod 2^k given by Lemma 1. By definition of the closest vector, by the triangle inequality and by Lemma 1:

‖U′ − CVP(L, 2^{k−t} T̃)‖ ≤ ‖U′ − 2^{k−t} T̃‖ + ‖2^{k−t} T̃ − CVP(L, 2^{k−t} T̃)‖ ≤ 2 ‖U′ − 2^{k−t} T̃‖ ≤ √n · 2^{k−t+2}.

If we can prove that the right-hand side of this inequality is smaller than the first minimum of the lattice, the length λ_1(L) of its shortest non-zero vector, then both lattice vectors must be equal, and CVP(L, 2^{k−t} T̃) indeed reveals U_0 mod 2^k.
In section 5.3, we will be facing the problem of reconstructing a geometric sequence modulo 2^128 given arbitrarily many (noisy versions of the) most-significant 6 bits of successive elements of the sequence. Therefore we have k = 128 and t = 6, and we wish to determine the required number of samples, i.e. the value of n. This means finding the values of n such that √n · 2^124 ≤ λ_1(L). Starting from n = ⌈122/6⌉, we computed the length of the shortest vector of the lattice spanned by G_{n,128} for each successive n until the condition held. The Shortest Vector Problem (SVP) is another well-known NP-hard lattice problem; we used the (almost) off-the-shelf G6K library [ADH + 19], which gave results very quickly by sieving. fplll [dt16] was too slow above dimension 50 in the default settings.
After this computation, we found that the minimal possible n is 63: with n = 63, the shortest vector of L has length greater than 2^127.02, which is high enough. This vector can be obtained by bootstrapping the geometric sequence with U_0 = 12144252875850345479015002205241987363, then reducing the terms modulo 2^128 in zero-centered representation (subtracting 2^128 from U_i if U_i > 2^127). It follows that when n ≥ 63, k = 128 and t = 6, any CVP oracle will return a vector congruent to the original U when given 2^122 · T̃.

Reconstruction in Low Dimension Using Babai's Rounding
In sections 4 and 5.1, we will need to reconstruct billions of noisy truncated geometric series modulo 2^64 with very few terms, of which a large fraction of the most-significant bits are known. In this setting, the CVP becomes much easier. This enables us to use faster and more ad hoc methods, such as Babai's rounding algorithm [Bab86].
If M is a square matrix, we denote by ‖M‖ the induced matrix norm:

‖M‖ = max_{x ≠ 0} ‖xM‖ / ‖x‖.

In the case of the ℓ_2 norm used throughout this paper, ‖M‖ is the largest singular value of M; equivalently, it is the square root of the largest eigenvalue of M^t M. Denote again by L the n-dimensional lattice spanned by the rows of G_{n,64}, and let H denote an LLL-reduced basis of it: the same lattice is also spanned by the rows of H. Set S = 2^{k−t} T̃ · H^{−1}. Because 2^{k−t} T̃ does not belong a priori to the lattice, S does not have to be an integer vector. Let then R denote the rounding of S, i.e. R_i = ⌊S_i⌉. Then RH is an element of the lattice. Under the right conditions, it will be the vector of the lattice closest to 2^{k−t} T̃. Indeed, by definition, R is the closest integer vector to 2^{k−t} T̃ H^{−1}. But as U′ is an element of the lattice, U′ H^{−1} is an integer vector, thus:

‖R − 2^{k−t} T̃ H^{−1}‖ ≤ ‖U′ H^{−1} − 2^{k−t} T̃ H^{−1}‖ ≤ ‖H^{−1}‖ · ‖U′ − 2^{k−t} T̃‖,

and therefore:

‖RH − 2^{k−t} T̃‖ ≤ ‖H‖ · ‖H^{−1}‖ · ‖U′ − 2^{k−t} T̃‖.

Note that ‖H^{−1}‖ × ‖H‖ is the condition number of the matrix H. Lattice reduction has the side effect of reducing the condition number, therefore it makes sense to use an LLL-reduced basis. If we can prove that the right-hand side of the inequality is smaller than the first minimum of the lattice, then we will have proved that RH is indeed the closest vector we were searching for. Because we have fixed k = 64, by Lemma 1 we have ‖U′ − 2^{64−t} T̃‖ ≤ √n · 2^{64−t+1}. So, if we fix n, we can search for the minimum number t of known most-significant bits such that:

‖H‖ · ‖H^{−1}‖ · √n · 2^{64−t+1} < λ_1(L).

When t is greater than the values given in Table 1, Babai's rounding technique always returns the closest vector, and allows us to reconstruct a truncated geometric series.
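Here is a complete toy instance of the technique in Python, with k = 16, t = 12, n = 2 and the arbitrary multiplier a = 4093. The matrix H below is a reduced basis of the lattice spanned by (1, a) and (0, 2^16), precomputed by hand with Gauss-Lagrange reduction; exact rational arithmetic avoids floating-point issues. With these parameters the analogue of the condition above holds, so the rounding provably recovers both states from their top 12 bits.

```python
from fractions import Fraction

K, T, A = 16, 12, 4093
# Reduced basis of the lattice spanned by rows (1, A) and (0, 2^K);
# obtained by Gauss-Lagrange reduction (its determinant is -2^16).
H = [[-16, 48], [1233, 397]]

def babai(target):
    """Babai's rounding: R = round(target * H^-1), return R * H."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    # target * H^-1, computed exactly via the adjugate of H
    s0 = Fraction(target[0] * H[1][1] - target[1] * H[1][0], det)
    s1 = Fraction(-target[0] * H[0][1] + target[1] * H[0][0], det)
    r0, r1 = round(s0), round(s1)
    return (r0 * H[0][0] + r1 * H[1][0], r0 * H[0][1] + r1 * H[1][1])

def recover(top0, top1):
    """Recover two consecutive LCG states from their top T bits."""
    return babai((top0 << (K - T), top1 << (K - T)))
```

In the attack itself, the same computation is carried out with n = 3 or 4, k = 64 and an LLL-reduced basis of G_{n,64}.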

Application to the lehmer64 generator
Adapting the previous reasoning yields an efficient state reconstruction algorithm for the lehmer64() generator shown in Fig. 1. When clocked, it outputs the top 64 bits of a geometric sequence (k = 128 and t = 64). Three successive outputs are sufficient to reconstruct the internal state using Babai's rounding technique. This yields the following reconstruction algorithm:

def reconstruct(X):
    """ Produce the internal state of the generator given three
        consecutive outputs of lehmer64().
        16 multiplications, 1 division, 11 additions and 3 roundings only. """
    ...  # Babai's rounding with a precomputed reduced basis of G_{3,128}


State Reconstruction for PCG64 With Known Increment

We first consider the easier case where the "increment" (the c term in the definition of the underlying linear congruential generator) is known; recall that a default value is specified in case the user of the pseudo-random generator does not want to provide one. In this case, reconstructing the 128-bit internal state S_i of the generator is sufficient to produce the pseudo-random stream with 100% accuracy (the generator can also be clocked backwards if necessary, so that the seed can be easily reconstructed). We therefore focus on reconstructing S_0 (the seed) from X_0, X_1, X_2, . . .. A very simple strategy could be the following:

1. Guess the 64 upper bits of S_0 (this includes the rotation).
2. Compute the missing 64 lower bits using (1), with:

S_0[0:64] = (X_0 ≪ r_0) ⊕ S_0[64:128].

This "baseline" procedure requires 2^64 iterations of a loop that does a dozen arithmetic operations; it always outputs the correct value of S_0, and may output a few other ones (they can easily be discarded by checking X_2). An improved "guess-and-determine" state reconstruction algorithm is possible, which essentially amounts to exposing a truncated version of the underlying linear congruential generator and attacking it using the tools exposed in section 3. This is possible by combining the following ingredients:
• The underlying linear congruential generator uses a power-of-two modulus, therefore the low-order bits of S_{i+1} are entirely determined by the low-order bits of S_i. More precisely, for any 0 < ℓ ≤ 128, we have:

S_{i+1}[0:ℓ] = a · S_i[0:ℓ] + c mod 2^ℓ.

Therefore, guessing the ℓ least-significant bits of S_0 yields a "long-term advantage" that holds for all subsequent states.
• Guessing a 6-bit rotation r_i gives access to Y_i (the XOR of the two halves of the internal state). Thus, if a part of one half of the state is known, this transfers the existing knowledge to the other half.
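The first of these ingredients is just the reduction homomorphism from Z_{2^128} to Z_{2^ℓ}. A small sketch (with illustrative constants, not the actual PCG64 defaults) shows that a "shadow" generator working entirely modulo 2^ℓ tracks the low bits of the real one indefinitely:

```python
L = 20                                    # number of guessed low-order bits
A = 0x2360ED051FC65DA44385DF649FCCF645    # illustrative multiplier
C = 0x9E3779B97F4A7C15F39CC0605CEDC835    # illustrative odd increment

def next_state(s):
    """Full 128-bit LCG update."""
    return (A * s + C) % (1 << 128)

def next_low(w, l=L):
    """Shadow update: works entirely modulo 2^l."""
    return (A * w + C) % (1 << l)
```

Guessing the low ℓ bits of S_0 (and, later, of c) therefore pays off once and for all: the guess never has to be revised as the generator is clocked.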
In figure 3, we see that guessing S_0[0:ℓ] and a few 6-bit rotations r_i gives access to S_i[58:64+ℓ] for the corresponding states. Therefore, looking at S_i[ℓ:64+ℓ], we are facing a truncated linear congruential generator on 64 bits, where we have access to the 6+ℓ most-significant bits of each state (denoted by T), for a few consecutive states. This is sufficient to reconstruct entirely the successive states of this truncated linear congruential generator. This reveals S_0[ℓ:64+ℓ], and using (1) the entire S_0 can be reconstructed. The precise details follow.
Write w_0 = S_0[0:ℓ] for the guessed low-order bits, and let K = LCG_128(w_0, c) and S̄ = S − K, so that S̄ is a geometric progression of common ratio a whose ℓ low-order bits are zero. We know K_i[58:64+ℓ] for all i, and each guessed rotation r_i gives access to T_i = S_i[58:64+ℓ]. Subtracting the known part introduces an unknown vector B of borrows: we can only compute T̃ = T ⊖ K[58:64+ℓ], with T̃ = S̄[58:64+ℓ] ⊖ B. We are thus in the context of the problem discussed in section 3, namely reconstructing a geometric sequence given t = 6+ℓ (noisy) most-significant bits. The "noise" is the unknown vector B of borrows.
We will guess n rotations and ℓ least-significant bits of the state, for a total of 6n + ℓ guessed bits. Table 1 gives a lower bound on t = 6 + ℓ given n, and we see that the total number of guessed bits reaches a minimum of 38 when n = 3 and ℓ = 20. Therefore, success is guaranteed if we guess ℓ = 20 low-order bits of the state and three consecutive rotations.
The algorithm that reconstructs the internal state of the PCG64 generator with known increment proceeds as shown in algorithm 1. The point is that when the guesses are correct, then from the truncated geometric series T̃, the solution of the CVP instance reveals U_j = S̄_j[ℓ:64+ℓ]. From there, the correctness of the algorithm is easily established.
The procedure is completely practical. More details are given in section 6. Let us just mention that the procedure often works twice as fast with ℓ = 19, or even four times as fast with ℓ = 18 (with a reduced success probability).
Algorithm 1 State reconstruction algorithm (case where c is known)
// Statements involving j must be repeated for j = 0, 1, 2.
for 0 ≤ w < 2^ℓ do                        Guess ℓ least-significant bits of S_0
    for 0 ≤ r_0, r_1, r_2 < 64 do         Guess rotations
        Y_j ← X_j ≪ r_j                   Undo rotations
        T̃_j ← T_j ⊖ K_j[58:64+ℓ]          Truncated geometric series on 6+ℓ bits
        reconstruct U_j = S̄_j[ℓ:64+ℓ] from T̃ by Babai's rounding
        output S_0 as a candidate internal state

State Reconstruction for PCG64 With Secret Increment
The algorithm of section 4 does not apply directly to the general case where the value of c is unknown. A "baseline" procedure would consist in guessing S_0[64:128] and S_1[64:128]; using eq. (1), this would reveal S_0 and S_1; from there, the increment c is easy to obtain, and all the secret information has been reconstructed. This would take 2^128 iterations of a very simple procedure, which is completely infeasible.
Set ∆S_i = S_{i+1} ⊖ S_i; it is easily checked that ∆S_i is a geometric progression of common ratio a. Therefore, reconstructing both S_0 and ∆S_0 is sufficient to compute all subsequent states (and recover the unknown increment c). The global "guess-and-determine" strategy is essentially the same as before: gaining access to a truncated version of ∆S_i, solving a small CVP instance, reconstructing ∆S_0, then checking consistency. More precisely:

1. Recover ∆S_0[0:64+ℓ] from a truncated version of the differences (partial difference reconstruction).
2. Reconstruct all rotations r_i from this partial knowledge.
3. Reconstruct the full difference ∆S_0 from the rotations.
4. Reconstruct S_0 from ∆S_0 and the rotations.

Only the first phase is computationally intensive. The four steps are discussed in the next four subsections.
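That ∆S_i is geometric is immediate from linearity: ∆S_{i+1} = S_{i+2} ⊖ S_{i+1} = (a·S_{i+1} ⊞ c) ⊖ (a·S_i ⊞ c) = a·∆S_i mod 2^128, the increment cancelling out. A quick check, with illustrative constants standing in for the real ones:

```python
M = 1 << 128
A = 0x2360ED051FC65DA44385DF649FCCF645   # illustrative multiplier
C = 0x0123456789ABCDEF0123456789ABCDEF   # illustrative odd increment

def states(s0, n):
    """First n states of the 128-bit LCG."""
    out = [s0 % M]
    for _ in range(n - 1):
        out.append((A * out[-1] + C) % M)
    return out

S = states(31337, 6)
D = [(S[i + 1] - S[i]) % M for i in range(5)]   # differences Delta S_i
```

Whatever the increment, consecutive differences are related by a multiplication by a, which is what makes the lattice tools of section 3 applicable to ∆S.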

Partial Difference Reconstruction
In order to gain access to a part of ∆S_i, we use the same "guess-and-determine" strategy as in section 4: we guess the least-significant bits of S_0 and some rotations, then check consistency. The difference is that, since c is unknown, we must in addition guess the least-significant bits of c to obtain the same "long-term advantage" (c is always odd, which makes one less bit to guess). We must also guess k + 1 successive rotations to get information on k successive differences ∆S_i.
Confirming that the guesses are correct is less immediate. When c was known, we could reconstruct the internal state; from there, filtering out the bad guesses was easy. When c is unknown, the same strategy does not work, but a very strong consistency check can still be implemented.
We consider again the sequence of internal states S = (S_0, S_1, . . .) = LCG_128(S_0, c). We will guess the ℓ least-significant bits of S_0 and of c, therefore let us assume that their values are known, and denote them by w_0 and c_0. We define S̄ = LCG_128(S_0 − w_0, c − c_0) and K = LCG_128(w_0, c_0); again, K is known and S̄ = S − K. This time, the components of S̄ do not follow a geometric progression, but we still have that the ℓ least-significant bits of each S̄_i are zero. Set ∆S̄_i := S̄_{i+1} − S̄_i; then ∆S̄[ℓ:64+ℓ] follows a geometric progression of common ratio a modulo 2^64 (again). This time, we have to find ∆S̄_0[ℓ:64+ℓ].
As in section 4, we have access to T_i := S_i[58:64+ℓ]. We want to subtract the known part to obtain (S_i − K_i)[58:64+ℓ], which is the truncation of S̄_i. This again introduces an unknown vector B of borrows, and in fact we can only compute T̃ = S[58:64+ℓ] ⊖ K[58:64+ℓ], with T̃ = S̄[58:64+ℓ] ⊖ B. As explained above, to access a geometric sequence, we would like to obtain ∆S̄[58:64+ℓ], but we can only compute:

∆T̃_i := T̃_{i+1} ⊖ T̃_i.

We are thus still in the context of the problem discussed in section 3, but this time the "noise" caused by the borrows is given by B_{i+1} − B_i. When the guesses are correct, Babai's rounding reconstructs ∆S̄[ℓ:64+ℓ] from ∆T̃. This in turn yields ∆S_0[0:64+ℓ].
Once we have found ∆S_0[0:64+ℓ], we can compute ∇S_i[0:64+ℓ] for any i, because eq. (7) holds modulo 2^{64+ℓ}; because we have guessed the first rotation and the least-significant bits of the state, using (1) we gain access to S_0[58:64+ℓ]; combined with the "differences" ∇S_i, this reveals S_i[58:64+ℓ] for any i (and we already had S_i[0:ℓ]). This allows us to compute Y_i[0:ℓ] = S_i[0:ℓ] ⊕ S_i[64:64+ℓ] for any i. Given a "fresh" output X_i, and assuming that the guesses are correct, we should have X_i = Y_i ≫ r_i. In particular, if the guesses were correct, then for any i there should exist 0 ≤ r < 64 such that:

(X_i ≪ r)[0:ℓ] = Y_i[0:ℓ].    (9)

If none of the 64 possible rotations yields a match, then the guesses made beforehand have to be wrong. As a consequence, bad guesses can be filtered with an arbitrarily low probability of false positives, by trying several indices i.
In algorithm 2, ConsistencyCheck uses eq. (9) combined with this observation to discard bad guesses.
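The core of this consistency check can be sketched as follows (the function names are ours, not the paper's): given an output X_i and the ℓ known low-order bits of Y_i, keep only the rotations that map one onto the other.

```python
MASK64 = (1 << 64) - 1

def rotl64(x, r):
    """Rotate a 64-bit word left by r positions."""
    return ((x << r) | (x >> (64 - r))) & MASK64 if r else x

def consistent_rotations(x, y_low, l):
    """All r with (x <<< r)[0:l] == y_low, as in eq. (9)."""
    mask = (1 << l) - 1
    return [r for r in range(64) if rotl64(x, r) & mask == y_low]
```

If the returned list is empty for some index i, the current guess is discarded; with ℓ = 14, a wrong guess survives one such test with probability roughly 64 · 2^{−14}.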
The heart of the algorithm is again the reconstruction of a truncated geometric progression knowing the t = 6 + ℓ upper bits of four consecutive terms. Looking at Table 1, we see that the best choice consists in guessing 5 consecutive rotations and ℓ = 14 least-significant bits. Therefore, ReconstructPartialDifference does 2^57 iterations of the inner loop, and succeeds deterministically.

Predicting all the Rotations
Knowing the values of ∆S_0[0:64+ℓ] as well as the least-significant bits of S_0 and c is sufficient to get rid of the nastiest feature of PCG64: armed with this knowledge, we can determine all the subsequent rotations deterministically, at negligible cost, using eq. (8). For each index i, it suffices to try the 64 possible values of r_i; only one should satisfy eq. (8). The complete pseudo-code is shown in algorithm 3.
It is unlikely that several possible values of r_i match: each value is "checked" on ℓ bits, so an accidental match happens with probability 2^{−ℓ}. The total number of lists returned by ReconstructRotations then follows a binomial distribution of parameters (k, 2^{6−ℓ}). With ℓ = 14 and k = 64, only one rotation vector should pass the test for 0 ≤ i < 64 on average.

Algorithm 3 Rotations and full difference reconstruction algorithm (excerpt)
    H ← []                                             List of possible r_i's
    T ← ReconstructRotations(∆S_0, v_0, i + 1, k)      Find all the (r_{i+1}, . . ., r_k)

Full Difference Reconstruction
Using X_0, X_1, . . ., X_63, we recover all rotations and thus the 6 most-significant bits of S_0, S_1, . . ., S_63. This allows us to compute the 6 most-significant bits of the differences ∆S_i between consecutive states (up to missing borrows), and we are faced with the problem of reconstructing a 128-bit geometric progression using 63 consecutive outputs truncated to their 6 most-significant bits. There is again an unknown vector of borrows B such that ∆S_i[122:128] ⊞ B_i = r_{i+1} ⊖ r_i.
Reconstructing ∆S_0 from the r_i is exactly the problem discussed in section 3.2. This can be done by solving an instance of CVP in dimension 63. We use the off-the-shelf CVP solver embedded in fplll: it runs in negligible time.

Complete State Reconstruction
Once all the rotations have been recovered and ∆S_0 has been found entirely, the only thing that remains is to find the entire S_0. For this, we again use eq. (1), coupled with the "differences": the Y_i and ∇S_i are known, ∇S_0 = 0, and the problem consists in recovering S_0. We could probably encode it as an instance of SAT, feed it to a SAT solver and be done with it.
Nevertheless, here is a detailed recovery procedure which obtains all bits of S_0, from right to left, by exploiting the non-linearity of modular addition. It takes negligible time. Let C_i be the vector of (incoming) carries generated during the addition of S_0 and ∇S_i. Combining all the above yields eq. (10), a useful equation that enables an induction process.
• When j = 0, the 0-th carries are zero, and eq. (10) therefore reveals the 64-th carries.
• Next, suppose that C_i[0:j], S_0[0:j − 1], C_i[64:64 + j] and S_0[64:64 + j − 1] are known, for all i. We can compute C_i[j] ⊕ C_i[64 + j] for any i using eq. (10). We then look at a specific index i > 0 whose known bits, thanks to the majority function, let us disentangle the two carry bits; from there, the j-th carry bits can be computed normally.
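The role of the majority function can be checked on a toy example: in the addition of two words, the carry into position j + 1 is maj(a_j, b_j, c_j), and bit j of the sum is a_j ⊕ b_j ⊕ c_j. A minimal Python sketch of this carry recurrence (an illustration, not the paper's code):

```python
def maj(a, b, c):
    """Majority of three bits."""
    return (a & b) | (a & c) | (b & c)

def add_with_carries(a, b, width=64):
    """Schoolbook addition, returning the sum and the vector of incoming carries."""
    carries = [0]                        # the 0-th incoming carry is zero
    s = 0
    for j in range(width):
        aj, bj, cj = (a >> j) & 1, (b >> j) & 1, carries[j]
        s |= (aj ^ bj ^ cj) << j         # sum bit j
        carries.append(maj(aj, bj, cj))  # carry into position j + 1
    return s, carries

a, b = 0xDEADBEEFCAFEBABE, 0x0123456789ABCDEF
s, carries = add_with_carries(a, b)
assert s == (a + b) % (1 << 64)          # bitwise recurrence matches modular addition
```

Because maj(a_j, b_j, c_j) is non-linear in its inputs, knowing a sum bit and one operand bit can pin down a carry bit, which is exactly what drives the right-to-left induction above.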
The whole procedure is shown in algorithm 4. Note that once S_0 has been found, all subsequent states can be computed without error using S_i = S_0 + ∇S_i. In particular, computing S_1 yields c via c ← S_1 − a·S_0. This completes the reconstruction procedure for PCG64.
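This final step is elementary modular arithmetic. A minimal Python sketch (illustration only; the multiplier and increment below are arbitrary toy constants standing in for the real 128-bit PCG parameters): once two consecutive states are known, the increment follows, and every later state is predictable.

```python
M128 = (1 << 128) - 1
a = 0xDEFBA91144F2B375_55CD5F8C15C589E5   # toy odd multiplier (stand-in, not the PCG constant)

def step(s, c):
    """One LCG step: S_{i+1} = a*S_i + c mod 2^128."""
    return (a * s + c) & M128

# Suppose the reconstruction produced S_0 and S_1; the increment follows:
S0 = 0x0123456789ABCDEF_FEDCBA9876543210  # toy recovered state
c_secret = 0x14057B7EF767814F | 1          # toy odd increment, unknown to the attacker
S1 = step(S0, c_secret)
c = (S1 - a * S0) & M128                   # c <- S_1 - a*S_0 (mod 2^128)
assert c == c_secret

# With S_0 and c in hand, every subsequent state is predictable:
S2 = step(S1, c)
assert S2 == (a * S1 + c) & M128
```

The subtraction is exact because the LCG relation S_1 = a·S_0 + c holds modulo 2^128, so c is determined uniquely by two consecutive states.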

Implementation and Practical Results
We have implemented the state reconstruction algorithms described above using a mixture of C (for the computationally expensive parts) and Python (for the rest). We used the fplll library [dt16] to solve CVP instances exactly in dimension 63.
In this section, we briefly outline important aspects of our implementation and present practical results. Our code is available in the supplementary material as well as online at:

Algorithm 4 Full state reconstruction algorithm: function ReconstructState(∆S_0, r_0, . . ., r_k, X_0, . . ., X_k). After a setup loop over i = 0, 1, . . ., k, it performs an induction over j = 1, 2, . . ., 64: find a good index i (aborting with failure if no suitable index exists), compute the next state bit, then compute the next carries for i = 0, 1, . . ., k.

The designer of PCG was kind enough to send us two sets of challenge inputs: one with the default (known) increment and one with a random secret increment. She generated random seeds and provided us with the first outputs of the pseudo-random generator. We were able to reconstruct the seeds with an extremely high confidence level, because they re-generate the same outputs. We emailed back the seeds and received confirmation that they were indeed correct.
We have therefore successfully met the challenge of predicting the output of the PCG64 generator.
The analysis of section 3 yields parameters that guarantee that the reconstruction procedure always succeeds. In most cases, these parameters are pessimistic. We ran a series of experiments to determine more practical choices: using smaller-than-guaranteed values of ℓ (the number of guessed least-significant bits), we measured the success probability of the state reconstruction procedure. The results are shown in table 2.

Known Increment
When the increment c is known, algorithm 1 is all it takes to reconstruct the internal state of the generator and predict it (or output the seed). We implemented it in C; a noteworthy detail is a double-to-integer conversion hack whose last line is return magic.l;. It exploits the IEEE 754 representation of double-precision floats: the mantissa lies in bits [0:52] while the sign bit and the exponent take the 12 most significant bits. Adding 2^52 + 2^51 forces the mantissa to shift to the correct position and inserts an extra 1 bit at position 51. The two shifts then clear the extra bit and the exponent, while correctly expanding the sign bit.
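The hack can be mimicked in pure Python with the struct module to see it work. This is an illustrative re-creation under our reading of the trick, not the paper's C code; in particular, we assume a shift amount of 13, so that the left shift discards the sign, the exponent, and the extra 1 at position 51, and the arithmetic right shift sign-extends the result.

```python
import struct

MAGIC = float(2**52 + 2**51)   # 1.5 * 2^52, exactly representable

def double_to_int(x):
    """Convert a double holding a small integer to that integer via its bit pattern."""
    # Reinterpret the bits of x + MAGIC as an unsigned 64-bit integer.
    bits = struct.unpack("<Q", struct.pack("<d", x + MAGIC))[0]
    # Left shift drops the sign bit, the exponent and the extra 1 at position 51.
    shifted = (bits << 13) & ((1 << 64) - 1)
    if shifted >= 1 << 63:     # emulate a signed 64-bit value in Python
        shifted -= 1 << 64
    return shifted >> 13       # arithmetic right shift restores magnitude and sign

assert double_to_int(5.0) == 5
assert double_to_int(-3.0) == -3
```

Adding MAGIC forces the exponent of the result to 52, so the integer value of x lands verbatim in the low mantissa bits; the shift pair then extracts it without any float-to-int conversion instruction.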

Unknown Increment
When the increment c is unknown, the internal state of PCG64 can still be practically reconstructed from X_0, . . ., X_63 using the algorithms shown in section 5. Only algorithm 2 is computationally expensive; we implemented it in C, while algorithms 3 and 4 are implemented in Python.
We have shown that algorithm 2 is correct when ℓ = 14. The procedure does 2^{29+2ℓ} iterations of the inner loop, so decreasing ℓ would really pay off. Looking at table 2, we settle for ℓ = 13 in the worst case; let T denote the running time when ℓ = 13.
It seems that the most promising strategy consists in choosing ℓ = 11; if the reconstruction procedure fails, then we try again with different inputs. Lowering ℓ from 13 to 11 divides the number of iterations by 16, while the success probability drops to 0.64, so the expected running time of this approach is T/(16 × 0.64) ≈ T/10.24. In our implementation, T = 200,000 CPU hours, so the expected running time of the reconstruction procedure is about 20,000 CPU hours. In fact we were lucky: on the challenge input, the first attempt with ℓ = 11 succeeded, so the whole process took only 12,500 CPU hours.
It actually ran in 35 wall-clock minutes using 512 cluster nodes, each equipped with two 20-core Intel Xeon Gold 6248 CPUs @ 2.5 GHz ("Cascade Lake"). The machine is the Jean Zay computer located at the IDRIS national computing center. Note that on this particular parallel computer, running the algorithm with ℓ = 13 would take 10 hours using the same amount of resources, so the whole procedure is practical, even in the absolute worst case.
The outer loop of algorithm 2 makes 2^{2ℓ−1} iterations while the inner loop makes 2^30 iterations. Using a single hardware execution context, we measured that one iteration of the outer loop takes between 41.5 s and 44 s (apparently not all nodes of the cluster run at exactly the same speed, potentially because of "turbo boost" adjustments and thermal constraints). Because of this variability, we implemented a master-slave work-distribution pattern, in which a master process dispatches iterations of the outer loop to slave processes. This also made checkpointing very easy. We used MPI for inter-process communication.
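The dispatch pattern can be sketched in a few lines of Python using threads and a queue (an illustration of the work-distribution idea only; the actual implementation uses MPI across cluster nodes, and the worker body is a stand-in). Workers pull outer-loop indices at their own pace, which automatically balances the load between faster and slower execution contexts.

```python
import queue
import threading

def run_master_worker(n_iterations, n_workers, outer_iteration):
    """Dispatch outer-loop iterations dynamically to a pool of workers."""
    tasks = queue.Queue()
    for w0 in range(n_iterations):
        tasks.put(w0)                     # master fills the work queue
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                w0 = tasks.get_nowait()   # pull the next available iteration
            except queue.Empty:
                return                    # no work left: worker terminates
            r = outer_iteration(w0)       # one (expensive) outer-loop iteration
            with lock:
                results.append((w0, r))   # recording completed iterations here
                                          # also makes checkpointing easy

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Toy stand-in for one iteration of algorithm 2's outer loop:
res = run_master_worker(32, 4, lambda w0: w0 * w0)
assert sorted(w0 for w0, _ in res) == list(range(32))
```

Dynamic pulling (rather than a static split of the index range) is what absorbs the 41.5-44 s per-iteration variability: a slow node simply takes fewer iterations.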
With ℓ = 11, the whole process took 2^56.74 CPU cycles, which makes less than 6 cycles per iteration of the inner loop. We used essentially the same implementation tricks discussed above. However, this time we additionally had to implement the ConsistencyCheck procedure, which is called in the inner loop. We observed that the sets of possible candidate values C_i only depend on w_0 (the variable of the outer loop). Therefore, before entering the inner loop, we precompute a bit field describing each C_i. To simplify the implementation, we flatten them by computing C = ∪_i C_i. This slightly increases the probability of false positives, but makes our code slightly simpler.
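Since the candidate values are small integers, a set C_i can be stored as a bit field and membership tested with one shift and one mask; flattening C = ∪_i C_i turns several lookups into a single one. A minimal Python sketch of the idea (the actual code is C; the sets below are illustrative):

```python
def make_bitfield(values):
    """Pack a set of small integers into one integer used as a bit field."""
    mask = 0
    for v in values:
        mask |= 1 << v
    return mask

def contains(mask, v):
    """Membership test: one shift and one AND."""
    return (mask >> v) & 1 == 1

# Per-index candidate sets, then the flattened union C = U_i C_i:
C_sets = [{3, 17, 42}, {3, 8}, {42, 63}]
masks = [make_bitfield(C) for C in C_sets]
flat = 0
for m in masks:
    flat |= m                      # C = union of the C_i

assert contains(flat, 8) and contains(flat, 63)
# Flattening admits false positives: 8 is accepted even against C_0.
assert not contains(masks[0], 8) and contains(flat, 8)
```

The union trades a slightly higher false-positive rate for one precomputed mask instead of one per index, which is exactly the simplification described above.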

Conclusion
We have presented a practical state reconstruction algorithm for the PCG64 conventional pseudo-random number generator, the default in the NumPy library. In the worst case, we recover all the secret information from 512 consecutive output bytes, using 2.3 CPU years of computation. We have executed the algorithm in practice using a large parallel computer. The PCG64 generator is fast and not intended for cryptographic purposes; we have shown that, in practice, this speed comes at the price of strong pseudo-randomness. It should absolutely not be used when unpredictability of the random numbers is required, for fear of practical attacks.
On the other hand, our results do not mean that PCG64 should be deprecated for scientific computing.But they do mean that its output has detectable properties.Whether these properties may affect the results of Monte-Carlo numerical simulations is another matter entirely.

Figure 1 :
Figure 1: Some conventional pseudo-random generators, designed for speed and simplicity.

Table 1 :
minimal t needed for a given n

Table 2 :
Empirical success probabilities with smaller parameters.