Xorshift


Xorshift random number generators, also called shift-register generators are a class of pseudorandom number generators that were discovered by George Marsaglia. They are a subset of linear-feedback shift registers which allow a particularly efficient implementation without using excessively sparse polynomials. They generate the next number in their sequence by repeatedly taking the exclusive or of a number with a bit-shifted version of itself. This makes them extremely fast on modern computer architectures. Like all LFSRs, the parameters have to be chosen very carefully in order to achieve a long period.
Xorshift generators are among the fastest non-cryptographically-secure random number generators, requiring very small code and state. Although they do not pass every statistical test without further refinement, this weakness is well-known and easily amended by combining them with a non-linear function, resulting e.g. in a xorshift+ or xorshift* generator. A native C implementation of a xorshift+ generator that passes all tests from the BigCrush suite typically takes fewer than 10 clock cycles on x86 to generate a random number, thanks to instruction pipelining.
The scramblers known as + and * still leave weakness in the low bits, so they're intended for floating point use, as conversion of a random number to floating point discards the low bits. For general purpose, the scrambler ** makes the LFSR generators pass in all bits.
Because plain xorshift generators fail a few statistical tests, they have been accused of being unreliable.

Example implementation

A C version of three xorshift algorithms is given here. The first has one 32-bit word of state, and period 232−1. The second has one 64-bit word of state and period 264−1. The last has four 32-bit words of state, and period 2128−1. All use three shifts and three or four exclusive-or operations:

  1. include
struct xorshift32_state ;
/* The state word must be initialized to non-zero */
uint32_t xorshift32
struct xorshift64_state ;
uint64_t xorshift64
struct xorshift128_state ;
/* The state array must be initialized to not be all zero */
uint32_t xorshift128

The 128-bit algorithm passes the diehard tests. However, it fails the MatrixRank and LinearComp tests of the BigCrush test suite from the TestU01 framework.

Variations

All xorshift generators fail some tests out of TestU01's BigCrush test suite. This is true for all generators based on linear recurrences, such as the Mersenne Twister or WELL. However, it is easy to scramble the output of such generators to improve their quality.

xorwow

Marsaglia suggested scrambling the output by combining it with a simple additive counter modulo 232. This also increases the period by a factor of 232, to 2160−232:

  1. include
struct xorwow_state ;
/* The state array must be initialized to not be all zero in the first four words */
uint32_t xorwow

This performs well, but fails a few tests in BigCrush. This generator is the default in Nvidia's CUDA toolkit.

xorshift*

A generator takes a generator and applies an invertible multiplication to its output as a non-linear transformation, as suggested by Marsaglia. The following 64-bit generator with 64 bits of state has a maximal period of 264−1 and fails only the MatrixRank test of BigCrush:

  1. include
struct xorshift64s_state ;
uint64_t xorshift64s

A similar generator is suggested in Numerical Recipes as RanQ1, but it also fails the BirthdaySpacings test.
If the generator is modified to return only the high 32 bits, then it passes BigCrush with zero failures. In fact, a reduced version with only 40 bits of internal state passes the suite, suggesting a large safety margin.
Vigna suggests the following generator with 1024 bits of state and a maximal period of 21024−1; it however does not always pass BigCrush. xoshiro256** is therefore a much better option.

  1. include
/* The state must be seeded so that there is at least one non-zero element in array */
struct xorshift1024s_state ;
uint64_t xorshift1024s

Both generators, as it happens with all generators, emit a sequence of 64-bit values that is equidistributed in the maximum possible dimension.

xorshift+

Rather than using multiplication, it is possible to use addition as a faster non-linear transformation. The idea was first proposed by Saito and Matsumoto in the XSadd generator, which adds two consecutive outputs of an underlying generator based on 32-bit shifts.
XSadd, however, has some weakness in the low-order bits of its output; it fails several BigCrush tests when the output words are bit-reversed. To correct this problem, Vigna introduced the family, based on 64-bit shifts: the following generator uses 128 bits of state and has a maximal period of 2128−1. It passes BigCrush, but not when reversed.

  1. include
struct xorshift128p_state ;
/* The state must be seeded so that it is not all zero */
uint64_t xorshift128p

This generator is one of the fastest generators passing BigCrush. One disadvantage of adding consecutive outputs is while the underlying generator is 2-dimensionally equidistributed, the associated generator is only 1-dimensionally equidistributed.
Xorshift+ generators, even as large as, exhibit some detectable linearity in the low-order bits of their output.

xoshiro and xoroshiro

xoshiro and xoroshiro are other variations of the shift-register generators, using rotations in addition to shifts. According to Vigna, they are faster and produce better quality output than xorshift.
This class of generator has variants for 32-bit and 64-bit integer and floating point output; for floating point, one takes the upper 53 bits or the upper 23 bits, since the upper bits are of better quality than the lower bits in the floating point generators. The algorithms also include a jump function, which sets the state forward by some number of steps – usually a power of two that allows many threads of execution to start at distinct initial states.

xoshiro256**

xoshiro256** is the family's general-purpose random 64-bit number generator.

/* Adapted from the code included on Sebastian Vigna's website */
  1. include
uint64_t rol64
struct xoshiro256ss_state ;
uint64_t xoshiro256ss

xoshiro256+

xoshiro256+ is approximately 15% faster than xoshiro256**, but the lowest three bits have low linear complexity; therefore, it should be used only for floating point results by extracting the upper 53 bits.

  1. include
uint64_t rol64
struct xoshiro256p_state ;
uint64_t xoshiro256p

Other variants

If space is at a premium, xoroshiro128** is the equivalent of xoshiro256**, and xoroshiro128+ is the equivalent of xoshiro256+. These have smaller state spaces, and thus are less useful for massively parallel programs. xoroshiro128+ also exhibits a mild dependency in Hamming weights, generating a failure after 5TB of output. The authors do not believe that this can be detected in real world programs.
For 32-bit output, xoshiro128** and xoshiro128+ are exactly equivalent to xoshiro256** and xoshiro256+, with uint32_t in place of uint64_t, and with different shift/rotate constants; similarly, xoroshiro64** and xoroshiro64* are the equivalent of xoroshiro128** and xoroshiro128+ respectively. Unlike the functions with larger state, xoroshiro64** and xoroshiro64* are not straightforward ports of their higher-precision counterparts.
More recently, the ++ generators have been made as an alternative to the ** generators.

Initialization

It is the recommendation of the authors of the xoshiro paper to initialize the state of the generators using a generator which is radically different from the initialized generators, as well as one which will never give the "all-zero" state; for shift-register generators, this state is impossible to escape from. The authors specifically recommend using the SplitMix64 generator, from a 64-bit seed, as follows:

  1. include
struct splitmix64_state ;
uint64_t splitmix64
// as an example; one could do this same thing for any of the other generators
struct xorshift128_state xorshift128_init