uncorrelated random numbers


I am trying to solve numerically a huge group of stochastic differential equations. Each equation has its own stochastic term, a brownian, which is totally independent of the others.

For doing this I need to run several monte carlo simulations and afterwards join the results. Each thread each calculating the value for one equation.

For generating the random numbers I am using curand, with just one seed and different sequence numbers for each thread. The curand guide says: “Sequences generated with the same seed and different sequence numbers will not have statistically correlated values.”

This is not really happening as I am running some test using the device API and the correlations can reach values up to 0.3. I attach an image with a graph where the vertical axis shows correlation values and the horizontal axis has the number of the trial.

In each trial instead of solving the equation I just returned a random number generated the way I wrote before. I run 100 simulations, and computed the correlations. As at each time steps random numbers where returned the correlation of this values should show the correlations in the generator.

The value that the correlation is giving me is quite high, at least higher than what I need.

Does anyone knows a good way of generating uncorrelated random numbers in CUDA?

For example, using python and numpy, I can make something like np.random.multivariate_normal(noise_mean,noise_covariance,n_steps) to generate all the random number needed. If in the covariance I put 0 everywhere except in the diagonal I can get uncorrelated random numbers.

Thanks in advance for your help and comments

A random sequence will sometimes show high correlations due to random fluctuations and finite sample size for each sequence. (It wouldn’t be random otherwise.) To decide if this is problematic, you have to figure out what the probability distribution of correlations from an uncorrelated source would be, and then decide if the distribution you see is inconsistent with that.

There might be a bug in curand, but I don’t think you can conclude it based on that plot.

To check the curand side of things, can you show the code region where you are initializing the curand generators?

Some thoughts/questions:

Do you get graphs that look significantly different when using numpy with the same sizes and numbers of sequences?

If you generate numbers from difference sequences and calculate the correlation, you won’t get precisely zero. The sequences should be independent draws from the selected distribution.

Another way to think about it is with measuring the mean. If you generate 100 random normals with mean 10 and standard deviation 5, the average of the results will probably be close to 10. But it won’t be precisely 10. If it were exactly 10 I would start to be suspicious about the “randomness” of the data.

Here’s an example in numpy calculating covariance of multivariate normals:

>>> numpy.cov(numpy.random.multivariate_normal((0, 0), [[1, 0], [0, 1]], 1000).T)

array([[ 1.05084079, -0.02067414],

       [-0.02067414,  0.98787443]])

The covariance matrix of the random sources is the 2d identity matrix. The measured covariance is close to the identity matrix but with some noise.

Another thing to try is switch to using a different seed for each thread all with sequence number 0. This will be a totally different ordering of results. If you make that change and get graphs that look the same, the problem is probably not with curand.

Thanks for the answers!!

About the last comment, you can see that the correlation in the numpy generated random numbers is -0.02067414 which is quite smaller than the values you can see on the graph, which are even reaching 0.3. In the model I am trying to simulate there are several Brownians, and the assumption is that they are independent from each other. In fact, a correlation between them of just 0.1 changes dramatically the final results. With values of around 0.02 the model gives the expected results (at least in the code of a colleague writen in Python, which I am trying to make faster).

I think this also apply to the first comment from seibert, I know it is impossible to reach exactly 0 as correlation, but you can reach values much lower than what the graph shows, thats what I need. I need the Brownian processes to be as uncorrelated as possible, and the correlation values from curand are much bigger than what you can expect with numpy (or I think even matlab).

A little info about my problem and how I am trying to solve it:

I am trying to simulate a neural network, where each neuron is described by a 5 stochastic differential equation. Each of the equations have a noise element inside (one of them has 2), and as I wrote before they are assumed to be uncorrelated (independent). I want to run the simulation with 50 000 neurons, so I need a huge amount of random numbers, hopefully with the lowest correlation possible.

Nathan said:
“Another thing to try is switch to using a different seed for each thread all with sequence number 0. This will be a totally different ordering of results. If you make that change and get graphs that look the same, the problem is probably not with curand.”

I can not use this as I have too many random numbers. Each thread in my case is computing the values for 1 neuron (5 equations), so I will need at least 50 000 seeds. If I defined 50 000 seed by myself, the most probable thing is that I will end up with undesired correlations. It will be quite difficult to define them.

AS asked before here is the code for the initialization, but some comments first:

I am using a multi-gpu machine, each card computes the values of a subgroup of the neurons. As for each neuron I have 6 Brownians (one equation has 2). I initialize 6 states per thread with different sequence number, so that each sequence is used in a different Brownian. The parameter startingPoint is the number of the first neuron to be calculated in the current card. The kernel is executed in each “slave” thread which controls each of the cards.

global void setup_kernel(curandState *state,int startingPoint)

int id = blockDim.xblockIdx.x + threadIdx.x;
Each thread gets same seed, a different sequence number,
no offset */
curand_init(1234, (id+startingPoint)6+0, 0, &state[id6+0]);
curand_init(1234, (id+startingPoint)6+1, 0, &state[id6+1]);
curand_init(1234, (id+startingPoint)6+2, 0, &state[id6+2]);
curand_init(1234, (id+startingPoint)6+3, 0, &state[id6+3]);
curand_init(1234, (id+startingPoint)6+4, 0, &state[id6+4]);
curand_init(1234, (id+startingPoint)6+5, 0, &state[id6+5]);


If this is not good or you have another idea, please tell me!! It is always possible I am not using correctly curand.

Thanks a lot for your help!!

OK, this looks fine. The only other possible problems I could imagine:

  • Accidentally grabbing the wrong RNG state due to an indexing mistake.

  • Copying the RNG state without realizing it and not copying it back to the device array.

Just to make sure I understand the test: Use curand to produce pairs of 1000 element sequences, where each element is normally distributed. Compute the covariance between the pairs and plot vs trial number.

Do I have that right?

Thanks again for answering!

The kernel for the test is this:

global void e_m(double *u_v, double t, double sum_y, curandState *state){

const int real_i = (blockDim.x*blockIdx.x + threadIdx.x);

//get the random number generator for the 6 brownians
curandState localState0 = state[real_i*6+0];
curandState localState1 = state[real_i*6+1];
curandState localState2 = state[real_i*6+2];
curandState localState3 = state[real_i*6+3];
curandState localState4 = state[real_i*6+4];
curandState localState5 = state[real_i*6+5];

u_v[0+i] = curand_normal_double(&localState0);
u_v[1+i] = curand_normal_double(&localState1);
u_v[2+i] = curand_normal_double(&localState2);
u_v[3+i] = curand_normal_double(&localState3); 
u_v[4+i] = curand_normal_double(&localState4);

state[real_i*6+0] = localState0;
state[real_i*6+1] = localState1;
state[real_i*6+2] = localState2;
state[real_i*6+3] = localState3;
state[real_i*6+4] = localState4;
state[real_i*6+5] = localState5;


The test is a little more complex, I described it next:

I know it only uses 5 randoms, but it is because this is an adapted version of the original kernel that calculates the values for the equations (5 of them), so it returns only the randoms.

The test goes like this:

I call one run to 2000 calls to this kernel (2000 integration time steps in the original code), with 10000 threads in total (10 000 neurons), originally this 10000 are divided in the different cards.

I make 100 runs, without changing seed, just in a loop.

I take 2000x10000 random variables, each of them is the random at one of the 2000 calls to the kernel for one of the neurons (10 000 neurons in total) . I have 100 possible values to each of the random variables.

I take two random neurons (any pair will be good), and I calculate the correlation coefficient between the 2 for each of the 2000 steps.

The plot on the first post shows this values for neurons 0 and 1000. That is why the horizontal axis goes from 0 to 2000.

I do not think I have made some of the errors you mention, as you can see in the kernel. If you see something weird please tell me!

Thanks for your help!

The test I was suggesting was just to rewrite that block of code to this:

curand_init((id+startingPoint)*6+0, 0, 0, &state[id*6+0]);

curand_init((id+startingPoint)*6+1, 0, 0, &state[id*6+1]);

curand_init((id+startingPoint)*6+2, 0, 0, &state[id*6+2]);

curand_init((id+startingPoint)*6+3, 0, 0, &state[id*6+3]);

curand_init((id+startingPoint)*6+4, 0, 0, &state[id*6+4]);

curand_init((id+startingPoint)*6+5, 0, 0, &state[id*6+5]);

CURAND does a hash step on the seed you pass in, so giving sequential seeds isn’t terrible. If this update significantly changes the correlations you’re seeing then the problem is related to how CURAND organizes subsequences, otherwise it’s something else.

An idea is to separate out the generation of the random values and their use, storing the intermediate values in global memory. This way you can replace the random generation with CPU computation and a cudaMemcpy() to copy the values into global memory. If other generators running on the CPU give the same test results it rules out problems with CURAND.

Thanks Nathan,

I just did the test you told me, and the final graph is similar to the first.

I have thought about the second idea you give on your message and make some tests. The problem is that the network I need to simulate is so big that I will need a huge amount of random numbers.
If I do this on the CPU I will wait for ever (I made some tests yesterday). I am currently looking around the literature on parallel random number generator but it seems this is a quite difficult topic.

Thanks for your advices.


It seems that 100 trials was not enough, I make 10000 and the result changed reaching a maximum of 0.03

I’m also working on a parallel random number generator, based on a very simple idea ;) I probably invented it a while ago and I am now “researching” if it can work on cuda ;) :)

If it does work I will provide a file with some random numbers in it generated by this cuda program.

Then people like you can stuff it into their statistic analyses programs to see if the random numbers generated are high quality or not.

I myself will also use winrar to try and compress it and also perhaps another program.

However I am not going to give it away for free because it’s too cool ! ;) =D

But I will sell the algorithm/kernel for 50 to 100 bucks or so ;)

It’s quite simple, but the devil is in the details ;)

Just a few lines of code.

I first still have to test if it works and if it’s any good…

Then I will put it on my website here:


Writing kernels is kinda fun… it allows me to use my programming skills to do cool stuff… and maybe people will buy the software for cheap prices ;) :)

Would be interesting bussiness model don’t you think ?! ;) =D

OK, seriously Skybuck, you are rapidly descending from “confused and chatty” into “semi-nonsensical forum spammer.” Maybe you don’t realize this, or just don’t care, but I figured I would point it out once, just in case.

And as a general service announcement: Random number generation is like cryptography. I highly advise against inventing your own techniques unless you really have a deep understanding of the field.

A thought that’s indirectly related to your question: can’t you use the Gillespie algorithm for your problem? We’ve obtained huge speedups compared to MC using this algorithm in stochastic reaction-diffusion problems. You might have to slightly adapt the algorithm, but from what I understand from your problem, I’m pretty sure it will work.

I think you just annoyed I don’t wanna share my invention and code with you ! ;) =D

I dont know anything about the Gillespie algorithm, I never heard of it before. Can you give me some more details or maybe some reference.


D.T. Gillespie (2007). Stochastic simulation of chemical kinetics. Annual Review of Physical Chemistry, 58:35–55.

You can use this method to greatly increase your time step, because, unlike direct simulation of the Markov processes, it reproduces the statistics of the process exactly for any dt, not only for dt->0.