Sobol32 generator in cuRAND (newbie)

I am writing a program that needs to generate 2-D coordinates following a normal distribution.
I do not know exactly how the Sobol generator works, so I do not know whether it will produce points from a multivariate normal distribution or will produce x coordinates and y coordinates that are each normally distributed individually.
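
One fact that may help frame the question: if x and y are drawn independently, each from a standard normal distribution, then the pair (x, y) is itself bivariate normal with zero mean and identity covariance. The worked equation below shows why. It assumes the two coordinates are independent, which is an assumption about how the two Sobol dimensions are used and not something established above; the symbols f_X, f_Y and f_{X,Y} are introduced here only for illustration.

```latex
f_{X,Y}(x, y) = f_X(x)\, f_Y(y)
  = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}
  = \frac{1}{2\pi} \exp\!\left(-\frac{x^2 + y^2}{2}\right)
```

The right-hand side is the density of the bivariate normal distribution N(0, I_2). Without independence, normal x and normal y alone do not guarantee that (x, y) is jointly normal.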

For starters, I have created one block with only 32 threads in one dimension, and I need 32 normally distributed 2-D points.
Here is the code snippet I have written:

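#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <curand.h>
#include <curand_kernel.h>

#define pts 32
#define CUDA_CALL(x) do { if ((x) != cudaSuccess) { printf("Error at %s:%d\n", __FILE__, __LINE__); return EXIT_FAILURE; } } while (0)
#define CURAND_CALL(x) do { if ((x) != CURAND_STATUS_SUCCESS) { printf("Error at %s:%d\n", __FILE__, __LINE__); return EXIT_FAILURE; } } while (0)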

__global__ void setup_kernel(unsigned int *sobolDirectionVectors, curandStateSobol32 *state)
{
    int id = threadIdx.x;
    int dim = 2 * id;
    /* Each thread uses 2 different dimensions; each dimension owns 32 direction vectors */
    curand_init(sobolDirectionVectors + 32 * dim, 1234, &state[dim]);
    curand_init(sobolDirectionVectors + 32 * (dim + 1), 1234, &state[dim + 1]);
}

__global__ void SPSA(curandStateSobol32 *state, float *result)
{
    int id = threadIdx.x;
    int baseDim = 2 * id;
    /* x and y of point id come from two different Sobol dimensions */
    result[baseDim] = curand_normal(&state[baseDim]);
    result[baseDim + 1] = curand_normal(&state[baseDim + 1]);
}

int main()
{
    curandStateSobol32 *devSobol32States;
    unsigned int *devDirectionVectors32;
    curandDirectionVectors32_t *vectors;
    float *h_result, *d_result;

    /* Allocate space for the result in host memory */
    h_result = (float *)calloc(2 * pts, sizeof(float));

    /* Allocate space for the result in device memory */
    CUDA_CALL(cudaMalloc((void **)&d_result, 2 * pts * sizeof(float)));

    /* Allocate memory for 2 states per thread, each state to get a unique dimension */
    CUDA_CALL(cudaMalloc((void **)&devSobol32States, 2 * pts * sizeof(curandStateSobol32)));

    /* Get the host direction vectors and copy 2 sets of 32 vectors per thread to the device */
    CURAND_CALL(curandGetDirectionVectors32(&vectors, CURAND_DIRECTION_VECTORS_32_JOEKUO6));
    CUDA_CALL(cudaMalloc((void **)&devDirectionVectors32, 2 * pts * 32 * sizeof(unsigned int)));
    CUDA_CALL(cudaMemcpy(devDirectionVectors32, vectors, 2 * pts * 32 * sizeof(unsigned int), cudaMemcpyHostToDevice));

    /* Initialize the states */
    setup_kernel<<<1, pts>>>(devDirectionVectors32, devSobol32States);

    /* Generate numbers */
    SPSA<<<1, pts>>>(devSobol32States, d_result);

    /* Copy result from device memory to host */
    CUDA_CALL(cudaMemcpy(h_result, d_result, 2 * pts * sizeof(float), cudaMemcpyDeviceToHost));

    /* Release device and host memory */
    CUDA_CALL(cudaFree(devSobol32States));
    CUDA_CALL(cudaFree(devDirectionVectors32));
    CUDA_CALL(cudaFree(d_result));
    free(h_result);

    return 0;
}

Any help would be appreciated.
Thank you.

Have you read this topic: “Confused about CURAND Sobol generator”?

This chapter of GPU Gems may also be useful to you, I think. It does not use cuRAND, but it illustrates the idea of generating pseudo-random numbers in parallel.

MK

I have read that topic many times, and it was only with its help that I was able to use SOBOL32. However, there are some changes in my code that I am not entirely sure are correct.
Also, I am not that good at statistics, so I do not know whether SOBOL32 generates the x and y coordinates individually from a normal distribution or generates the pair (x, y) from a joint normal distribution. As far as I know, these two are not necessarily the same.
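
If what is needed is a bivariate normal with a particular covariance, rather than two uncorrelated coordinates, one common approach is to take two independent standard normal values (such as the two values returned by curand_normal in the snippet) and combine them with the Cholesky factor of the covariance matrix. The following is only a minimal sketch under that assumption. The names Point2, correlate, sigma_x, sigma_y and rho are hypothetical and are not part of cuRAND or of the code above.

```cpp
#include <math.h>

// Let the same helper compile under nvcc (host and device) or a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper type: one 2-D point.
struct Point2 {
    float x;
    float y;
};

// Map two independent standard normal values (z0, z1) to a correlated pair
// with standard deviations sigma_x and sigma_y and correlation rho
// (|rho| <= 1), using the Cholesky factor of the 2x2 covariance matrix.
HD inline Point2 correlate(float z0, float z1,
                           float sigma_x, float sigma_y, float rho)
{
    Point2 p;
    p.x = sigma_x * z0;
    p.y = sigma_y * (rho * z0 + sqrtf(1.0f - rho * rho) * z1);
    return p;
}
```

Inside a kernel, z0 and z1 would be the two curand_normal results for one thread. With rho set to 0 and both standard deviations set to 1, the helper returns the inputs unchanged.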

Thank you for the reference; I will definitely go through it.