curand_init fails with non-zero subsequence or offset

I’m trying to use the cuRAND library to generate random numbers inside an OptiX program. A test program seems to show that curand_init fails when the subsequence or the offset is non-zero.

The program does this (it’s basically just a kernel, despite the slightly different syntax):

RT_PROGRAM void set_random_state()
{
    curand_init(0, 0, 0, &random_state_buffer[launch_index]);
    rtPrintf("%d %d %d : %f\n", launch_index.x, launch_index.y, launch_index.z, curand_uniform(&random_state_buffer[launch_index]));
}

This code runs fine, and the rtPrintf call prints the random number. If I change the call to curand_init to

curand_init(0, 1, 0, &random_state_buffer[launch_index]);

There is no output, which I believe means that the kernel died.

Any ideas as to what I’m doing wrong or if it’s a bug?

The system is a GTX 690 running under Debian Linux with CUDA 5 and OptiX 3, driver 313.30.

Thanks

I have a couple of questions to make sure I’m looking at the right code…

Are you using the default generator (XORWOW)?

I’m wondering about your use of [launch_index] as an index into your state array, when it looks from the rtPrintf to be a structure?

Are you able to build and run the kernel example from the documentation pages? (It does something very similar to your example with curand_init)

The printf in your sample code doesn’t print a random number, so I’m wondering if you could include more of your program so I can see how the kernel is invoked.

Let me simplify this. The following pure CUDA code produces this output:

0
kernel1: 0.740219
0
0
0
0
0

There should be several more "kernelN: <value>" lines in there, one for each launch.


#include <curand_kernel.h>
#include <stdio.h>
#include <iostream>

// Single generator state in device memory, shared by all three kernels.
__device__ curandState_t state;

// seed 0, subsequence 0, offset 0
__global__ void kernel()
{
    curand_init(0, 0, 0, &state);
    printf("kernel1: %f\n", curand_uniform(&state));
}

// seed 0, subsequence 1, offset 0
__global__ void kernel2()
{
    curand_init(0, 1, 0, &state);
    printf("kernel2: %f\n", curand_uniform(&state));
}

// seed 0, subsequence 0, offset 1
__global__ void kernel3()
{
    curand_init(0, 0, 1, &state);
    printf("kernel3: %f\n", curand_uniform(&state));
}

int main()
{
    kernel<<<1,1>>>();
    std::cout << cudaGetLastError() << std::endl;
    kernel2<<<1,1>>>();
    std::cout << cudaGetLastError() << std::endl;
    kernel3<<<1,1>>>();
    std::cout << cudaGetLastError() << std::endl;
    kernel<<<1,1>>>();
    std::cout << cudaGetLastError() << std::endl;
    kernel2<<<1,1>>>();
    std::cout << cudaGetLastError() << std::endl;
    kernel3<<<1,1>>>();
    std::cout << cudaGetLastError() << std::endl;

    return 0;
}

Thanks, that helps. The issue is the asynchronous nature of kernel printf. To see all of the kernel printf output in sequence with the output from the CPU, you can add a cudaThreadSynchronize() after each kernel call (cudaDeviceSynchronize() is the non-deprecated equivalent). When I did this with your sample code, the output looked like this:

kernel1: 0.740219
0
kernel2: 0.920994
0
kernel3: 0.438451
0
kernel1: 0.740219
0
kernel2: 0.920994
0
kernel3: 0.438451
0
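
For reference, here is a minimal sketch of the synchronized version, assuming the same three cases as the sample code but collapsed into one kernel that takes the subsequence and offset as arguments. The names `sample` and `launch_and_report` are hypothetical, and cudaDeviceSynchronize() is used instead of the deprecated cudaThreadSynchronize().

```cuda
#include <curand_kernel.h>
#include <cstdio>

// Single generator state in device memory, as in the sample code.
__device__ curandState_t state;

// Initialize the state with the given subsequence and offset,
// then print one uniform sample.
__global__ void sample(int id, unsigned long long subsequence,
                       unsigned long long offset)
{
    curand_init(0, subsequence, offset, &state);
    printf("kernel%d: %f\n", id, curand_uniform(&state));
}

// Hypothetical helper: launch, wait for the kernel to finish (which also
// flushes the device-side printf buffer), then report both error codes.
static void launch_and_report(int id, unsigned long long subsequence,
                              unsigned long long offset)
{
    sample<<<1, 1>>>(id, subsequence, offset);
    cudaError_t launch_err = cudaGetLastError();     // launch-time errors
    cudaError_t sync_err = cudaDeviceSynchronize();  // execution-time errors
    printf("launch: %s, sync: %s\n",
           cudaGetErrorString(launch_err), cudaGetErrorString(sync_err));
}

int main()
{
    launch_and_report(1, 0, 0);  // subsequence 0, offset 0
    launch_and_report(2, 1, 0);  // non-zero subsequence
    launch_and_report(3, 0, 1);  // non-zero offset
    return 0;
}
```

Checking the return value of the synchronize call matters too: an error raised while a kernel is running is reported at a later synchronization point, not by the cudaGetLastError() call made immediately after the launch.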

I tried to use cuRAND in OptiX as well. OptiX gives no control over the grid and block dimensions, and the generator algorithms are closely tied to the number of threads and blocks, so I wonder how you manage to initialize the states.
Thanks in advance.
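
One common pattern in plain CUDA, shown here as a sketch and not as an OptiX-specific answer, is to give every thread its own state and pass a flattened thread index as the subsequence, so initialization only needs one unique integer per thread. In an OptiX program the analogous integer would presumably be computed from the launch index and launch dimensions; that mapping is an assumption and is not shown. The names `init_states`, `draw`, and `N` are hypothetical.

```cuda
#include <curand_kernel.h>
#include <cstdio>

// One state per thread: same seed, distinct subsequence per thread.
__global__ void init_states(curandState_t *states, int n,
                            unsigned long long seed)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n)
        curand_init(seed, id, 0, &states[id]);
}

// Draw one uniform sample per thread from that thread's own state.
__global__ void draw(curandState_t *states, float *out, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n) {
        curandState_t local = states[id];  // work on a local copy
        out[id] = curand_uniform(&local);
        states[id] = local;                // store the advanced state
    }
}

int main()
{
    const int N = 8;
    curandState_t *states = NULL;
    float *d_out = NULL;
    float h_out[N];

    cudaMalloc(&states, N * sizeof(curandState_t));
    cudaMalloc(&d_out, N * sizeof(float));

    init_states<<<1, N>>>(states, N, 0);
    draw<<<1, N>>>(states, d_out, N);

    // cudaMemcpy blocks until the preceding kernels have finished.
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; ++i)
        printf("thread %d: %f\n", i, h_out[i]);

    cudaFree(d_out);
    cudaFree(states);
    return 0;
}
```

Nothing in this scheme depends on the block or grid shape beyond producing a unique index, which is why it may carry over to a launch model where those dimensions are not under the program's control.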