[long winded explanation]

The state space of the generator is about 2^190. No matter how you set up the state, you will end up at one position on a long loop of about 2^190 states. If you have just one thread, there’s no problem: you pick a seed, use that seed to generate a starting state, and you’re good to go.

When you have multiple threads, you have to start thinking about how to get the threads working independently. The simplest way is for every thread to have a different seed. Each seed maps to some position in the 2^190 loop, so for 1 million threads you will end up with 1 million “cuts” of the loop. The distances between the cuts will be erratic, because they depend on weird interactions between the seeding algorithm and the mathematical properties of the pseudorandom generator. There aren’t any mathematical results about when this is safe and when it isn’t. For any particular pattern of seeds you can run statistical tests and verify that it’s OK. But when you choose a different pattern of seeds for the next experiment, you have no assurance that the new pattern will be OK.
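To make the “cuts” picture concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: a 16-bit linear congruential generator stands in for the real generator (so the loop has 65,536 states instead of about 2^190), and `seed_to_state` is a made-up seeding hash, not the one any real library uses. It walks the loop once, then reports where seeds 1 through 8 land and how far apart the cuts are.

```python
import hashlib

M = 1 << 16            # toy loop of 65,536 states (the real one is about 2^190)
A, C = 25173, 13849    # full-period LCG constants for modulus 2^16


def step(state):
    """Advance the toy generator by one state."""
    return (A * state + C) % M


def seed_to_state(seed):
    # Hypothetical seeding: hash the seed and keep 16 bits as the state.
    digest = hashlib.sha256(str(seed).encode()).digest()
    return int.from_bytes(digest[:2], "big")


# Walk the loop once to learn each state's position on it.
position = {}
state = 0
for i in range(M):
    position[state] = i
    state = step(state)

# Each seed "cuts" the loop at one position; measure the gaps between cuts.
cuts = sorted(position[seed_to_state(seed)] for seed in range(1, 9))
gaps = [(cuts[(i + 1) % len(cuts)] - cuts[i]) % M for i in range(len(cuts))]

print("cut positions:", cuts)
print("gaps between neighbouring cuts:", gaps)
```

In a toy like this you can afford to enumerate the whole loop and inspect the gaps. With about 2^190 states you cannot, which is why the spacing produced by plain seeding has to be checked empirically, one seed pattern at a time.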

The idea of the sequence number and the 2^67 spacing is that the “cuts” are evenly spaced. No matter which seed you pick, threads with consecutive sequence numbers start exactly 2^67 states apart. That way you can analyze the statistical properties of the generator once, and the results hold for ANY seed.
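The even spacing can be sketched with the same kind of toy model. The assumptions here are mine, not the real generator’s: a 16-bit LCG stands in for it, a spacing of 2^6 stands in for 2^67, and `seed_to_state`, `jump`, and `init` are hypothetical helper names. The point is only that the seed picks where sequence 0 starts, and each further sequence number is reached by jumping ahead a fixed distance.

```python
import hashlib

M = 1 << 16            # toy loop of 65,536 states
A, C = 25173, 13849    # full-period LCG constants for modulus 2^16
SPACING = 1 << 6       # toy stand-in for the 2^67 spacing


def step(state):
    """Advance the toy generator by one state."""
    return (A * state + C) % M


def seed_to_state(seed):
    # Hypothetical seeding: hash the seed and keep 16 bits as the state.
    digest = hashlib.sha256(str(seed).encode()).digest()
    return int.from_bytes(digest[:2], "big")


def jump(state, n):
    """Advance `state` by n steps in O(log n) time by composing affine maps."""
    a_acc, c_acc = 1, 0    # accumulated map: x -> a_acc * x + c_acc
    a_cur, c_cur = A, C    # map for 2^k steps, squared on each iteration
    while n > 0:
        if n & 1:
            a_acc = (a_cur * a_acc) % M
            c_acc = (a_cur * c_acc + c_cur) % M
        c_cur = ((a_cur + 1) * c_cur) % M
        a_cur = (a_cur * a_cur) % M
        n >>= 1
    return (a_acc * state + c_acc) % M


def init(seed, sequence):
    """Start state for a thread: the seed's position plus sequence * SPACING."""
    return jump(seed_to_state(seed), sequence * SPACING)


def spacing_holds(seed, sequence):
    """Check that stepping SPACING times reaches the next sequence's start."""
    state = init(seed, sequence)
    for _ in range(SPACING):
        state = step(state)
    return state == init(seed, sequence + 1)


print(all(spacing_holds(seed, k) for seed in (1, 2, 12345) for k in range(4)))
```

Changing the seed moves the whole set of cuts around the loop, but the distance between neighbouring sequences stays the same, which is what lets one statistical analysis cover every seed.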

As an example, suppose you use seeds of 1, 2, 3, …, n, and no sequence number. Suppose somehow the hash function that sets up the state from the seed value has a collision on 3 and 7. That means threads 3 and 7 are generating exactly the same numbers, forever! That’s obviously bad. Guaranteeing this won’t happen is hard; hash functions are designed to be hard to analyze. There are lots of subtle ways that the hash function could introduce some sort of bias, and no mathematical guarantees that it won’t happen.
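The collision scenario is easy to reproduce in a toy setting. In the sketch below, `flawed_seed_to_state` is a deliberately weak, made-up seeding function chosen so that seeds 3 and 7 collide (3·3 = 9 and 7·7 = 49, which is 9 modulo 40), and the 16-bit LCG is again only a stand-in for the real generator.

```python
M = 1 << 16            # toy loop of 65,536 states
A, C = 25173, 13849    # full-period LCG constants for modulus 2^16


def step(state):
    """Advance the toy generator by one state."""
    return (A * state + C) % M


def flawed_seed_to_state(seed):
    # Deliberately weak, made-up seeding: seeds 3 and 7 both map to state 9.
    return (seed * seed) % 40


def stream(seed, count):
    """Return the first `count` outputs for a thread seeded with `seed`."""
    state = flawed_seed_to_state(seed)
    out = []
    for _ in range(count):
        state = step(state)
        out.append(state)
    return out


print("seeds 3 and 7 share a stream:", stream(3, 5) == stream(7, 5))
print("seeds 3 and 4 share a stream:", stream(3, 5) == stream(4, 5))
```

A real seeding hash would not collide this obviously, but the failure mode is the same: once two seeds land on the same state, the two threads stay in lockstep forever.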

[conclusion]

In the end, the sequence number is an optional feature; it’s perfectly reasonable to use it or not. On the plus side, it makes it possible to analyze the quality of the RNG results and extrapolate to multiple seed values. On the negative side, it is computationally expensive.