skipping ahead in Sobol RNG

Hi all,

I am trying to implement a montecarlo option pricer with Sobol numbers. While the excellent thread https://devtalk.nvidia.com/default/topic/539775 was enough to get me started, i got stuck on parallelizing my sequence generation. My simulation is split in simulation batches (say 5000 paths each), so every following batch needs to start generating numbers where the previous batch as stopped. I thought it was as trivial as using the skipahead function and providing the scenario index as an offset, but this didn’t really work. Even getting skipahead to skip to the first or second sobol number vs. consecutive curand_normal calls didn’t give the same results.

e.g. in my understanding, both kernels in the code below should produce the same values for the second element of the 1D sequence:

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <curand_kernel.h>

#define DIRECTION_VECTOR_SIZE 32

#define CUDA_CALL(x) do { if((x) != cudaSuccess) { \
printf("Error at %s:%d\n",__FILE__,__LINE__); \
return EXIT_FAILURE;}} while(0)

#define CURAND_CALL(x) do { if((x) != CURAND_STATUS_SUCCESS) { \
printf("Error at %s:%d\n",__FILE__,__LINE__); \
return EXIT_FAILURE;}} while(0)

__global__ void kernel_no_skipahead(unsigned int * sobolDirectionVectors, curandStateSobol32 *state)
{
	int tid = threadIdx.x + blockIdx.x * blockDim.x;
	if (tid>0)
		return;
	
	__shared__ double s_randomData[3];
	/* Each thread uses 3 different dimensions */
	curand_init(sobolDirectionVectors + DIRECTION_VECTOR_SIZE*tid, 1234, &state[0]);
    s_randomData[0] = curand_normal(&state[0]);
	s_randomData[1] = curand_normal(&state[0]);
	s_randomData[2] = curand_normal(&state[0]);
	printf("no skip ahead for element 0 = %f\n", s_randomData[0]);
	printf("no skip ahead for element 1 = %f\n", s_randomData[1]);
	printf("no skip ahead for element 2 = %f\n", s_randomData[2]);
}

__global__ void kernel_skipahead(unsigned int * sobolDirectionVectors, curandStateSobol32 *state)
{
	int tid = threadIdx.x + blockIdx.x * blockDim.x;
	if (tid>0)
		return;
	
	__shared__ double s_randomData[3];
	/* Each thread uses 3 different dimensions */
	curand_init(sobolDirectionVectors + DIRECTION_VECTOR_SIZE*tid, 1234, &state[0]);
	
	skipahead((unsigned int)2,&state[0]);
	s_randomData[2] = curand_normal(&state[0]);
	printf("skip ahead for element 2 = %f\n",s_randomData[2]);
}

int main(int argc, char *argv[])
{
	unsigned int * d_directionVectors32;
	
	curandDirectionVectors32_t *h_directionVectors32;
		
	curandStateSobol32 *devSobol32States1;
	curandStateSobol32 *devSobol32States2;
	
	CUDA_CALL(cudaMalloc((void **)&devSobol32States1, 32 * sizeof(curandStateSobol32)));
	CUDA_CALL(cudaMalloc((void **)&devSobol32States2, 32 * sizeof(curandStateSobol32)));
    
	CURAND_CALL(curandGetDirectionVectors32(&h_directionVectors32,CURAND_DIRECTION_VECTORS_32_JOEKUO6));
	
	CUDA_CALL(cudaMalloc((void **)&(d_directionVectors32), DIRECTION_VECTOR_SIZE * sizeof(int)));
	CUDA_CALL(cudaMemcpy(d_directionVectors32,h_directionVectors32, DIRECTION_VECTOR_SIZE * sizeof(int),cudaMemcpyHostToDevice));

	kernel_skipahead<<<1,1>>>(d_directionVectors32, devSobol32States1);
	cudaDeviceSynchronize();
	kernel_no_skipahead<<<1,1>>>(d_directionVectors32, devSobol32States2);
	cudaDeviceSynchronize();

	return EXIT_SUCCESS;
}

The same behaviour is observed when i use curand_uniform, instead of curand_normal. With all said above, i would be grateful for tips/links/examples of skipahead usage/parallel sobol rng generation!

Thanks

Vlad

I modified the Sobol in the SDK to do skip ahead. The code is in the kooderive project.

Thank you, Mark! I will dig into your code.