I’m trying to reuse the Mersenne Twister sample from the SDK in my application.
My application requires at least 1536000000 random numbers.
I’m not sure I’ve got it right so far.
In MT we have mt_struct_stripped ds_MT, which is the set of configs (bit-vector Mersenne Twister parameters) loaded from the file MersenneTwister.dat. The MT_RNG_COUNT parameter specifies how many configs there are, that is, the number of independent random number series (?). For each config from ds_MT we generate N_PER_RNG (NPerRng) random numbers.
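To show how I understand it, here is a minimal host-side sketch (not the SDK code itself; the mt_struct_stripped layout is how I read the SDK sample, so treat it as an assumption if your version differs). It just loads the configs from MersenneTwister.dat and works out how many numbers each config would have to produce to cover the 1536000000 total (1536000000 / 4096 = 375000).

```c
/*
 * Sketch only: read the per-RNG configs from MersenneTwister.dat and
 * compute how many numbers each config must produce to reach the total.
 * Struct layout assumed to match the SDK sample (4 x 32-bit words).
 */
#include <stdio.h>
#include <stdlib.h>

#define MT_RNG_COUNT 4096   /* configs stored in MersenneTwister.dat */

typedef struct {
    unsigned int matrix_a;  /* recurrence parameter of this twister  */
    unsigned int mask_b;    /* tempering mask b                      */
    unsigned int mask_c;    /* tempering mask c                      */
    unsigned int seed;      /* per-RNG seed, overwritten at run time */
} mt_struct_stripped;

int main(void)
{
    static mt_struct_stripped h_MT[MT_RNG_COUNT];
    const unsigned long long totalNeeded = 1536000000ULL;

    FILE *f = fopen("MersenneTwister.dat", "rb");
    if (!f) { perror("MersenneTwister.dat"); return EXIT_FAILURE; }

    size_t nRead = fread(h_MT, sizeof(mt_struct_stripped), MT_RNG_COUNT, f);
    fclose(f);
    printf("configs loaded: %zu\n", nRead);

    /* Each config (one thread/RNG on the GPU) produces N_PER_RNG numbers:
     * 1536000000 / 4096 = 375000 exactly. */
    unsigned long long nPerRng =
        (totalNeeded + MT_RNG_COUNT - 1) / MT_RNG_COUNT;
    printf("numbers per RNG needed: %llu\n", nPerRng);

    return EXIT_SUCCESS;
}
```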
Questions:
1. How independent are the numbers generated from two different configurations?
2. How big can N_PER_RNG be?
3. If it can’t be really big, we want to increase MT_RNG_COUNT in order to get more random numbers. Since MersenneTwister.dat contains data for only 4096 configs, we would have to update the file (regenerate it, whatever). How can that be done? Is it just entropy, or are those configs specially generated by something (and if so, how can one generate more configs)?
Well, at least I found an answer to question 3 above: these configurations are generated “offline”, and it is computationally intensive to generate them. From the site you posted we can get a file with data for 32768 configs.
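In case it helps anyone later, here is a hedged sketch of how such a config can be produced with the dcmt library (“Dynamic Creation of Mersenne Twisters” by Matsumoto and Nishimura), which is the kind of tool these offline parameter sets are typically generated with. The function names follow dcmt 0.6.x (dc.h); the exponent 607, the id, and the seeds below are illustrative assumptions, not values taken from the SDK.

```c
/*
 * Sketch: search for one additional, independent MT configuration with dcmt.
 * This parameter search is the expensive "offline" step mentioned above.
 */
#include <stdio.h>
#include <stdint.h>
#include "dc.h"          /* from the dcmt distribution */

int main(void)
{
    /* Look for a 32-bit MT with period 2^607 - 1 whose characteristic
     * polynomial encodes id = 4096, i.e. a generator distinct from ids
     * 0..4095 that an existing 4096-config file would already cover.
     * The last argument seeds dcmt's internal search PRNG (arbitrary). */
    mt_struct *mts = get_mt_parameter_id_st(32, 607, 4096, 4172);
    if (mts == NULL) {
        fprintf(stderr, "parameter search failed\n");
        return 1;
    }

    /* The generator is usable directly on the CPU ... */
    sgenrand_mt(1234u, mts);
    for (int i = 0; i < 4; ++i)
        printf("%08x\n", genrand_mt(mts));

    /* ... and the parameters needed for a GPU-side config record (the
     * recurrence parameter and the two tempering masks) live in the
     * returned mt_struct; check dc.h for the exact field names in your
     * dcmt version before writing them out in the .dat layout. */

    free_mt_struct(mts);
    return 0;
}
```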
Question 2: it seems N_PER_RNG can be very large; at least testing with 10 million numbers and more per config didn’t hit the period.
Question 1: are they correlated in any way? What are their distribution properties?
One more question: what is the difference (in terms of the properties and quality of the number series) between generating m * n random numbers from m configurations, n numbers per configuration, and generating m * n numbers from a single configuration (all in a row)?
My program calculates some well-known graph, so I know what I’m aiming for. When using the regular rand() in the sequential version, I get a pretty smooth graph with 10000 random numbers. When using Mersenne Twister, my graph is not so smooth, almost the same as what I get with 1000 random numbers in the sequential version. I’ve tested the numbers generated by Mersenne Twister for distribution properties and found them satisfactory. Now I wonder what other properties of the MT numbers could affect the solution in such a way…
You can’t use words like “smooth” and “random” when testing PRNG behavior… they’re too vague.
Is there some test which the MT values fail but the rand() values pass?
I suspect that “smooth” here does not mean random; it means a short period or a sample correlation such that the histogram of values is TOO evenly sampled. This happens a lot if you use an inadequate LCG with a power-of-two modulus as your PRNG and use only the low bits of its output.
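For concreteness, here is a tiny self-contained C illustration of that failure mode (the multiplier and increment are the classic Numerical Recipes LCG constants, used purely as an example of a power-of-two-modulus generator):

```c
/*
 * With a power-of-two-modulus LCG the low-order bits have a very short
 * period, so anything built from "x % small_number" looks far too regular.
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t x = 12345u;                    /* arbitrary seed */
    for (int i = 0; i < 16; ++i) {
        x = 1664525u * x + 1013904223u;     /* LCG modulo 2^32 */
        /* lowest bit alternates 0,1,0,1,... (period 2);
         * the two lowest bits have period 4, and so on. */
        printf("%u", x & 1u);
    }
    printf("\n");
    return 0;
}
```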
I suspect that you are absolutely right about that last one!
Sorry for confusing you with the word “smooth” =) What I meant is that I’m building a simulator for some physics process whose result is well known (that is, the graph of the process has been studied well). So what I meant was that my sequential version produces results very close to the model with only 10000 random numbers involved in the Monte Carlo algorithm, while the parallel version under CUDA with 10000 random numbers gives me a polyline (as in the sequential version with 1000 random numbers). Hope it’s clear now.
So, assuming that you are right, is the problem first of all in using rand()?
P.S. No, the only test I performed was drawing a square (with side 1) and filling it with points generated by the PRNG.
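If it helps, here is roughly how that square test could be turned into something quantitative, following the histogram remark above (a sketch only; the grid size and point count are arbitrary, and rand() is just a placeholder for whichever generator is being checked):

```c
/*
 * Bin points thrown into the unit square onto a KxK grid and look at the
 * spread of per-cell counts. Counts that are TOO even are just as
 * suspicious as counts that are too ragged.
 */
#include <stdio.h>
#include <stdlib.h>

#define K       32          /* grid is K x K cells         */
#define NPOINTS 1000000     /* points thrown at the square */

int main(void)
{
    static long counts[K][K] = {{0}};
    srand(42u);

    for (long i = 0; i < NPOINTS; ++i) {
        double x = (double)rand() / ((double)RAND_MAX + 1.0);
        double y = (double)rand() / ((double)RAND_MAX + 1.0);
        counts[(int)(x * K)][(int)(y * K)]++;
    }

    long minc = NPOINTS, maxc = 0;
    for (int i = 0; i < K; ++i)
        for (int j = 0; j < K; ++j) {
            if (counts[i][j] < minc) minc = counts[i][j];
            if (counts[i][j] > maxc) maxc = counts[i][j];
        }

    /* Expected count per cell is NPOINTS / (K*K) ~ 976; for a healthy
     * generator min/max should scatter around that by a few standard
     * deviations (sqrt(976) ~ 31), not sit almost exactly on it. */
    printf("expected per cell: %ld, min: %ld, max: %ld\n",
           (long)(NPOINTS / (K * K)), minc, maxc);
    return 0;
}
```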