how to get same output by CURAND in CPU and GPU

educnq · July 10, 2011, 4:12am

I want to get the same random number sequence from CURAND on the CPU and the GPU (for a single thread in both cases). But with the "regular" approach (as in the CURAND guide), the two sequences differ. The code I used for the CPU is:

```c
// compile: nvcc thisfile.c -lcurand
#include <stdio.h>
#include <curand.h>

int main()
{
  int i;
  double data;
  curandGenerator_t gen;

  curandCreateGeneratorHost(&gen, CURAND_RNG_PSEUDO_DEFAULT);
  curandSetPseudoRandomGeneratorSeed(gen, 1234);
  for (i = 0; i < 10; i++) {
    //curandSetGeneratorOffset(gen, i*8192); // 2*4096
    curandGenerateUniformDouble(gen, &data, 1);
    printf("%lf ", data);
  }
  printf("\n");
  curandDestroyGenerator(gen);
  return 0;
}
```

The GPU code is:

```cuda
// compile: nvcc thisfile.cu   (in-kernel printf needs compute capability 2.0 or higher)
#include <stdio.h>
#include <curand_kernel.h>

__global__ void kernel()
{
  int i;
  curandState rngState;

  curand_init(1234, 0, 0, &rngState);
  for (i = 0; i < 10; i++) {
    //curand_init(1234, i, 0, &rngState); // i: sequence number
    printf("%lf ", curand_uniform_double(&rngState));
  }
  printf("\n");
}

int main()
{
  kernel<<<1,1>>>();
  // Wait for the kernel to finish; this also flushes the in-kernel printf buffer
  cudaDeviceSynchronize();
  return 0;
}
```

However, if I (1) remove the curand_init before the for loop in the GPU code and use the curand_init inside the loop (now commented out) instead, the CPU and GPU results are the same. Or, if I (2) use curandSetGeneratorOffset in the CPU code (now commented out), the CPU output becomes the same as the GPU output. Does that mean the GPU and CPU versions use the same sequence, but the CPU version picks a number from it only every 8192 numbers (the 2nd way) or, alternatively (the 1st way), every 2^67 numbers (as stated here)? And why does it do this? It looks strange…

educnq · July 10, 2011, 5:41am

The code above is for double. For float, only the 1st way works. To make the 2nd way work too, the offset factor needs to be changed from 8192 to 4096.
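A float version of the CPU program using the 2nd way might look like the minimal sketch below. It assumes, as reported in this post, that stepping the host generator offset by 4096 per float lands on the next value of subsequence 0 (and by 8192 per double). The helper `offset_for_step` is hypothetical, not part of CURAND; on the GPU side the matching call would be `curand_uniform()` instead of `curand_uniform_double()`.

```cuda
// compile: nvcc float_offset.cu -lcurand   (host-only code)
#include <stdio.h>
#include <curand.h>

// Hypothetical helper: host-generator offset of the i-th value of
// subsequence 0, assuming 4096 interleaved subsequences and
// words_per_value 32-bit draws per result (1 for float, 2 for double).
static unsigned long long offset_for_step(int i, int words_per_value)
{
  return (unsigned long long)i * 4096ULL * (unsigned long long)words_per_value;
}

int main(void)
{
  curandGenerator_t gen;
  float data;
  int i;

  curandCreateGeneratorHost(&gen, CURAND_RNG_PSEUDO_DEFAULT);
  curandSetPseudoRandomGeneratorSeed(gen, 1234);
  for (i = 0; i < 10; i++) {
    // Jump to the i-th float of subsequence 0, then draw a single value
    curandSetGeneratorOffset(gen, offset_for_step(i, 1));
    curandGenerateUniform(gen, &data, 1);
    printf("%f ", data);
  }
  printf("\n");
  curandDestroyGenerator(gen);
  return 0;
}
```

Passing `words_per_value = 2` reproduces the 8192 step used in the double version above.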

educnq · July 14, 2011, 7:31am

Does anyone know the answer to the title question, or is this a reason few people use CURAND?

NathanW · July 19, 2011, 6:57pm

The quick answer: the simplest way to get the same results on the CPU and GPU is to use the host API. It lets you generate random values into memory on either the CPU or the GPU; the only difference is whether you call curandCreateGeneratorHost() or curandCreateGenerator().
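A minimal sketch of that approach: two host-API generators with the same type and seed, one created with `curandCreateGeneratorHost()` (results land in CPU memory) and one with `curandCreateGenerator()` (results land in GPU memory), followed by an element-wise comparison. The sample count `N` and the helper `count_mismatches` are arbitrary, hypothetical choices; the program only reports the mismatch count so you can check the claim on your own setup.

```cuda
// compile: nvcc host_vs_device_gen.cu -lcurand
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <curand.h>

#define N 10000  // number of doubles to compare (arbitrary)

// Count positions where two result buffers differ
static size_t count_mismatches(const double *a, const double *b, size_t n)
{
  size_t bad = 0;
  for (size_t k = 0; k < n; k++)
    if (a[k] != b[k]) bad++;
  return bad;
}

int main(void)
{
  curandGenerator_t cpuGen, gpuGen;
  double *hostOut = (double *)malloc(N * sizeof(double));
  double *gpuCopy = (double *)malloc(N * sizeof(double));
  double *devOut;
  cudaMalloc(&devOut, N * sizeof(double));

  // Same generator type and seed; only the creation call differs
  curandCreateGeneratorHost(&cpuGen, CURAND_RNG_PSEUDO_DEFAULT);
  curandCreateGenerator(&gpuGen, CURAND_RNG_PSEUDO_DEFAULT);
  curandSetPseudoRandomGeneratorSeed(cpuGen, 1234);
  curandSetPseudoRandomGeneratorSeed(gpuGen, 1234);

  curandGenerateUniformDouble(cpuGen, hostOut, N);  // generated on the CPU
  curandGenerateUniformDouble(gpuGen, devOut, N);   // generated on the GPU
  cudaMemcpy(gpuCopy, devOut, N * sizeof(double), cudaMemcpyDeviceToHost);

  printf("mismatches: %zu of %d\n", count_mismatches(hostOut, gpuCopy, N), N);

  curandDestroyGenerator(cpuGen);
  curandDestroyGenerator(gpuGen);
  cudaFree(devOut);
  free(hostOut);
  free(gpuCopy);
  return 0;
}
```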

Getting the same results from the host API and the device API takes a bit more work; you have to set things up carefully. The basic idea is that mathematically there is one long sequence of pseudorandom numbers. This long sequence is cut into chunks, which are then shuffled together to get a final sequence that can be generated in parallel.

For CURAND, we use the seed to pick a random starting spot, then cut the long sequence into 4096 chunks, each spaced 2^67 positions apart. The host API lets you grab blocks of results from this shuffled sequence. If you request 8192 results, you get the first result from each of the 4096 chunks, followed by the second result from each of the 4096 chunks.
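The interleaving described above can be written as plain index arithmetic. This sketch assumes exactly the layout stated in this reply (4096 chunks, results taken round-robin, counted from offset 0 for a single result type); the helper names `subsequence_of` and `position_of` are hypothetical, not CURAND API.

```cuda
// compile: nvcc layout.cu   (host-only code; no GPU needed)
#include <stdio.h>

#define NUM_SUBSEQ 4096ULL  // number of interleaved chunks described above

// Chunk (subsequence) that host-API result n comes from
static unsigned long long subsequence_of(unsigned long long n)
{
  return n % NUM_SUBSEQ;
}

// Position of result n inside its chunk (0 = first value of that chunk)
static unsigned long long position_of(unsigned long long n)
{
  return n / NUM_SUBSEQ;
}

int main(void)
{
  unsigned long long samples[] = {0, 1, 4095, 4096, 4097, 8191};
  for (int k = 0; k < 6; k++) {
    unsigned long long n = samples[k];
    printf("result %llu -> chunk %llu, position %llu\n",
           n, subsequence_of(n), position_of(n));
  }
  return 0;
}
```

For example, result 4096 is the second value (position 1) of chunk 0, which is why stepping the offset by 4096 walks down a single subsequence for floats.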

For the device API using curand_init(), you explicitly give the subsequence number and manage the threads yourself. If you want to match the host API results exactly, you need to launch 4096 threads in total and have each one call curand_init() with the same seed and its own subsequence number, from 0 to 4095. You then need to store the results in a coalesced, strided layout: thread 0's first value comes first in memory, followed by thread 1's first value, then thread 2's, and so on; after thread 4095's first value come the threads' second values in the same order.
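A minimal sketch of that setup, assuming the default XORWOW generator and the 4096-subsequence ordering described in this reply (other generator types or orderings would need a different layout). Each of 4096 threads owns subsequence `tid` and writes its j-th double to `out[j * 4096 + tid]`; the result is then compared with a host-API buffer generated from the same seed. `PER_THREAD`, the 256-thread block size, and the helper names are arbitrary, hypothetical choices.

```cuda
// compile: nvcc match_host_api.cu -lcurand
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <curand.h>
#include <curand_kernel.h>

#define NUM_THREADS 4096   // one thread per subsequence
#define PER_THREAD  4      // values generated by each thread (arbitrary)
#define TOTAL (NUM_THREADS * PER_THREAD)

// Thread tid uses subsequence tid and writes its j-th value to
// out[j * NUM_THREADS + tid], i.e. the interleaved host-API order.
__global__ void generate(unsigned long long seed, double *out)
{
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid >= NUM_THREADS) return;
  curandState state;
  curand_init(seed, tid, 0, &state);  // same seed, subsequence = tid, offset 0
  for (int j = 0; j < PER_THREAD; j++)
    out[j * NUM_THREADS + tid] = curand_uniform_double(&state);
}

// Count positions where two result buffers differ
static int count_mismatches(const double *a, const double *b, int n)
{
  int bad = 0;
  for (int k = 0; k < n; k++)
    if (a[k] != b[k]) bad++;
  return bad;
}

int main(void)
{
  const unsigned long long seed = 1234;
  double *fromKernel = (double *)malloc(TOTAL * sizeof(double));
  double *fromHostApi = (double *)malloc(TOTAL * sizeof(double));
  double *devOut;
  cudaMalloc(&devOut, TOTAL * sizeof(double));

  // Device API: 16 blocks x 256 threads = 4096 threads
  generate<<<NUM_THREADS / 256, 256>>>(seed, devOut);
  cudaMemcpy(fromKernel, devOut, TOTAL * sizeof(double), cudaMemcpyDeviceToHost);

  // Host API reference, generated on the CPU with the same seed
  curandGenerator_t gen;
  curandCreateGeneratorHost(&gen, CURAND_RNG_PSEUDO_DEFAULT);
  curandSetPseudoRandomGeneratorSeed(gen, seed);
  curandGenerateUniformDouble(gen, fromHostApi, TOTAL);

  printf("mismatches: %d of %d\n",
         count_mismatches(fromKernel, fromHostApi, TOTAL), TOTAL);

  curandDestroyGenerator(gen);
  cudaFree(devOut);
  free(fromKernel);
  free(fromHostApi);
  return 0;
}
```

The strided store is also what makes the writes coalesced: in each iteration of the loop, neighbouring threads write neighbouring addresses.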

You are seeing the number 8192 because you are generating double-precision values: each double result uses two 32-bit results, so moving one double further along every one of the 4096 chunks takes 2 × 4096 = 8192 positions.
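To check the two-draws-per-double claim on the device side, a quick sketch like the following compares two identically initialized states: one advanced by a single `curand_uniform_double()`, the other by two `curand()` calls. If the next 32-bit draws agree, the double consumed two draws. It assumes the default `curandState` (XORWOW); the kernel name `check_double_consumption` is hypothetical.

```cuda
// compile: nvcc double_draws.cu
#include <stdio.h>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Sets *same to 1 if the draw following one curand_uniform_double()
// equals the draw following two curand() calls from an identical state.
__global__ void check_double_consumption(unsigned long long seed, int *same)
{
  curandState a, b;
  curand_init(seed, 0, 0, &a);
  curand_init(seed, 0, 0, &b);

  (void)curand_uniform_double(&a);  // one double from state a
  (void)curand(&b);                 // two 32-bit draws from state b
  (void)curand(&b);

  *same = (curand(&a) == curand(&b));
}

int main(void)
{
  int *devSame, same = 0;
  cudaMalloc(&devSame, sizeof(int));
  check_double_consumption<<<1, 1>>>(1234ULL, devSame);
  cudaMemcpy(&same, devSame, sizeof(int), cudaMemcpyDeviceToHost);
  printf("one double uses two 32-bit draws: %s\n", same ? "yes" : "no");
  cudaFree(devSame);
  return 0;
}
```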

Let me know if that doesn’t explain things.
