I’m trying to generate random numbers inside a @cuda.jit kernel. I need each thread to generate ~5000 random numbers. Because there will be ~10**5 blocks of 1024 threads each, generating a single array of random numbers in global memory is not feasible (about 524 billion random numbers in total). So I can’t simply use the cuRAND bindings to fill a device array ahead of time.
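A quick check of that estimate, assuming 1024 × 5 = 5120 draws per thread (the loop count in the example below) and one float32 (4 bytes) per pre-generated number:

```python
blocks = 10**5
threads_per_block = 1024
draws_per_thread = 1024 * 5  # 5120, the loop count used in the example kernel

total_draws = blocks * threads_per_block * draws_per_thread
bytes_float32 = total_draws * 4  # size of one pre-generated float32 array
print(f"{total_draws:,} draws, {bytes_float32 / 1e12:.1f} TB as float32")
```

This prints `524,288,000,000 draws, 2.1 TB as float32`, far beyond the memory of any single GPU.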
Is there a way to generate a random number within a kernel written using @cuda.jit?
As an example, I’m trying to do something like:
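```python
import numpy as np
from numba import cuda

@cuda.jit
def kernel(d):
    i = cuda.grid(1)
    out = 0.
    for k in range(1024*5):  # k, not i, so the thread index isn't overwritten
        t = np.random.uniform()  # Generate single random number (np.random isn't supported inside @cuda.jit)
        # Do something with t
        out += t
    d[i] = out
```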