Kernel crash: random number generation

I have a device function that generates random numbers, and somehow this code causes the kernel to crash if I use a large value for the maximum random number. Here is the code:

#define CUDA_RAND_MAX 32767

__shared__ float rand_val[BLOCK_SIZE];

__device__ float CUDA_rand()
{
	float rand = 69069*rand_val[threadIdx.x] + 362437;

	rand = rand-(int(rand/CUDA_RAND_MAX)*CUDA_RAND_MAX);

	rand_val[threadIdx.x] = rand + 1;

	return rand + 1;
}

Kernel Code:

// *** Random number generation declarations ***

	rand_val[threadIdx.x] = g_SeedArray[index]; // Initialize rand_val to seed array value generated on the CPU

Code that calls the random number generator function:

while(s>=1.0||s<= 0.0)
{
	r1=2*((float)(1.0*CUDA_rand()/(CUDA_RAND_MAX+1.0)))-1;

	r2=2*((float)(1.0*CUDA_rand()/(CUDA_RAND_MAX+1.0)))-1;

	s=r1*r1+r2*r2;
}

Originally I thought the problem might be that I was using unsigned short/unsigned int for rand_val and performing integer division and/or modulus, which was extremely slow. But even after changing the type to float I am seeing the same issue.

Then I thought maybe the code was getting stuck indefinitely in the while loop where the random number function is called, but running in EmuDebug mode shows that in most cases it only takes 1 or 2 passes through the loop to get a valid value for ‘s’. It could be that EmuDebug isn’t showing how the values are actually computed on the device; maybe CUDA_rand() returns 0 every time, I thought. But I dismissed this because if I change CUDA_RAND_MAX to a smaller value such as 50, the code works. Why would a large value in this implementation cause my screen to flicker and the kernel to ultimately crash?

Thanks for any help.
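The worry about the while loop never exiting can be guarded against by capping the number of attempts. The following is a minimal host-side C++ sketch of that idea, not the code from the post: the Lcg struct, next_unit(), sample_in_unit_circle() and the max_tries parameter are all hypothetical names introduced here, and the generator uses unsigned integer state rather than the float state shown above.

```cpp
#include <cstdint>

// Hypothetical integer generator, used only for this sketch.
struct Lcg {
    std::uint32_t state;

    // Returns a float in [0, 1) built from the top 24 bits of the state.
    float next_unit() {
        state = 69069u * state + 362437u;            // unsigned arithmetic wraps modulo 2^32
        return (state >> 8) * (1.0f / 16777216.0f);  // 24-bit value scaled by 2^-24
    }
};

// Tries at most max_tries times to draw (r1, r2) in [-1, 1) with 0 < s < 1.
// Returns false instead of spinning forever when no valid point is found.
bool sample_in_unit_circle(Lcg& rng, int max_tries, float& r1, float& r2, float& s) {
    for (int i = 0; i < max_tries; ++i) {
        r1 = 2.0f * rng.next_unit() - 1.0f;
        r2 = 2.0f * rng.next_unit() - 1.0f;
        s = r1 * r1 + r2 * r2;
        if (s > 0.0f && s < 1.0f) {
            return true;
        }
    }
    return false;
}
```

With a bounded loop, a misbehaving generator shows up as a false return value that the caller can report, rather than as a kernel that never finishes.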

I was able to solve the problem simply by using “floorf()” rather than “int()”. So I changed this line:

rand = rand-(int(rand/CUDA_RAND_MAX)*CUDA_RAND_MAX);

to:

rand = rand-(floorf(rand/CUDA_RAND_MAX)*CUDA_RAND_MAX);

and the crash no longer occurs.
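The thread does not say why floorf() helps. One plausible explanation, offered here as an assumption rather than a confirmed diagnosis: int(rand/CUDA_RAND_MAX) is an int and CUDA_RAND_MAX expands to an int literal, so their product is computed in 32-bit integer arithmetic and can exceed INT_MAX before it is converted back to float. With floorf() the product is a float multiplication, which has no such limit. The host-side C++ sketch below checks the size of that product in 64-bit arithmetic; product_overflows_int is a hypothetical helper, and it assumes the stored state is about as large as CUDA_RAND_MAX.

```cpp
#include <climits>

// Hypothetical helper, not from the original post.
// Returns true when int(value / max_val) * max_val would not fit in an int,
// for a stored state equal to max_val (roughly the largest value CUDA_rand() writes back).
bool product_overflows_int(int max_val) {
    float state = static_cast<float>(max_val);
    float value = 69069.0f * state + 362437.0f;                    // same update as CUDA_rand()
    long long quotient = static_cast<long long>(value / max_val);  // what int(...) would yield
    long long product = quotient * max_val;                        // exact in 64 bits
    return product > INT_MAX;
}
```

Under these assumptions, product_overflows_int(32767) is true and product_overflows_int(50) is false, which is consistent with the crash appearing only for the large CUDA_RAND_MAX. An overflowed product would leave rand far outside the expected range, so s would stay at or above 1.0 and the while loop might never exit.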

Why are you using modulus at all for that value of CUDA_RAND_MAX?

As the programming guide states, rand & 32767 works just fine, and unless your code was very obfuscated the compiler should have done that kind of optimization.
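As a rough illustration of the masking suggestion, here is a host-side C++ sketch. It assumes the generator state is kept in an unsigned 32-bit integer rather than the float used in the question, and mask15 and lcg_step are hypothetical names. Note that x & 32767 keeps the low 15 bits, which matches x % 32768 (a power of two), not x % 32767.

```cpp
#include <cstdint>

// Keeps the low 15 bits of x; for unsigned x this equals x % 32768u.
inline std::uint32_t mask15(std::uint32_t x) {
    return x & 32767u;
}

// Hypothetical integer version of the generator step.
// The multiply-add wraps modulo 2^32, which is well-defined for unsigned types.
inline std::uint32_t lcg_step(std::uint32_t& state) {
    state = 69069u * state + 362437u;
    return mask15(state);  // result lies in [0, 32767]
}
```

Starting from a state of 1, lcg_step updates the state to 431506 and returns 5522. The mask can return 32767 itself, whereas a modulus by 32767 never would, so the two are not interchangeable when the divisor is not a power of two.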