I am pretty new to CUDA programming, but I need to build expertise in CUDA and GPU architecture.
These days I am working on random number generation with the Mersenne Twister. I have some doubts about this implementation.
How many random numbers does it generate by default? Initially I thought it was 4096, so I commented out the Box-Muller step and printed the array returned from the GPU. There were over 100000 PRNs in it. Please comment on this; I would really appreciate it.
Why is RandomGPU<<<32, 128>>>(d_Rand, N_PER_RNG); called inside a for loop? It gets executed 10 times.
How can I generate a desired number of random numbers with this? I need to measure the running time for various configurations and numbers of PRNs.
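One way to reason about the counts in the first and last questions is as plain arithmetic. The sketch below is host-side C++ (it also compiles as CUDA host code) and rests on two assumptions that are not confirmed above: that each thread of the <<<32, 128>>> launch runs its own generator and writes N_PER_RNG values, and that the per-generator count should be even because Box-Muller consumes values in pairs. The helper names totalNumbers and perRngFor are hypothetical and do not come from the sample.

```cpp
#include <cstddef>

// Hypothetical helper: number of values written by one kernel launch,
// ASSUMING one generator per thread and nPerRng values per generator.
constexpr std::size_t totalNumbers(std::size_t blocks,
                                   std::size_t threadsPerBlock,
                                   std::size_t nPerRng) {
    return blocks * threadsPerBlock * nPerRng;
}

// Hypothetical helper: smallest per-generator count that yields at least
// `desired` values in total. The result is rounded up to an even number,
// ASSUMING a later Box-Muller step consumes the values in pairs.
constexpr std::size_t perRngFor(std::size_t desired,
                                std::size_t blocks,
                                std::size_t threadsPerBlock) {
    const std::size_t threads = blocks * threadsPerBlock;
    const std::size_t n = (desired + threads - 1) / threads;  // ceiling division
    return n + (n % 2);                                       // round up to even
}
```

Under these assumptions, 32 blocks of 128 threads give 4096 generators, so a request for 100000 values works out to 26 per generator and 106496 values in total, of which only the first 100000 would be used.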