I am pretty new to CUDA programming, but I need to become proficient in CUDA and GPU architecture.
These days I am working on random number generation with the Mersenne Twister. I have some questions about this implementation.
How many random numbers does it generate by default? Initially I thought it was 4096, so I commented out the Box-Muller step and printed the array returned from the GPU. There were over 100000 pseudorandom numbers in it. Please comment on this, I would really appreciate it.
Why is RandomGPU<<<32, 128>>>(d_Rand, N_PER_RNG); called inside a for loop? It gets executed 10 times.
How can I generate a desired number of random numbers with this? I need to measure the running time for various configurations.
For example,
Yes, I am doing that. I just wanted to get in touch about a few things I find interesting. Actually, this is urgent for me, which is why I wanted help from experts.
I suggest stating clearly in the topic title that this is about the SDK sample. The loop probably just adds extra iterations for measuring time. You may also try to reach the sample's author.
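If the loop really is there for timing, the pattern would look roughly like the sketch below. This is only an illustration under that assumption, not the sample's actual code: `average_ms`, `work` and `sync` are hypothetical names, and the code is plain host-side C++ (it also compiles inside a .cu file). Repeating the launch and dividing by the repetition count smooths out launch overhead and timer resolution.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: run `work` `reps` times and return the average
// wall-clock time per run in milliseconds.
// ASSUMPTION: for a CUDA kernel, `work` would perform the launch and `sync`
// would wait for the device (e.g. by calling cudaDeviceSynchronize()),
// because kernel launches return before the GPU has finished.
inline double average_ms(int reps,
                         const std::function<void()>& work,
                         const std::function<void()>& sync = [] {}) {
    assert(reps > 0);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        work();   // e.g. one kernel launch per iteration
    }
    sync();       // wait once, after all launches, before reading the clock
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / reps;
}
```

With a loop like this, 10 iterations would not produce 10 times as many distinct outputs if every launch writes to the same buffer; whether that holds for the sample is something to check in its source.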
I suggest checking the CPU “gold” implementation to see how many numbers are generated. Things are a bit complicated there, because each thread produces a bunch of numbers. Stepping through the program in a debugger may also show what is going on.
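To make the arithmetic concrete, here is a minimal host-side sketch. It assumes that each of the 32 x 128 launched threads writes N_PER_RNG values, which should be verified against the sample source. The helper names are made up for illustration, and rounding the per-thread count up to an even number is my assumption, based on Box-Muller consuming values in pairs.

```cpp
#include <cassert>

// Ceiling division: smallest q such that q * b >= a.
inline int div_up(int a, int b) {
    assert(b > 0);
    return (a + b - 1) / b;
}

// Round a up to the next multiple of b.
inline int align_up(int a, int b) {
    return div_up(a, b) * b;
}

// Number of threads started by a <<<blocks, threads_per_block>>> launch.
inline int thread_count(int blocks, int threads_per_block) {
    return blocks * threads_per_block;
}

// Per-thread output count needed to get at least `wanted` numbers in total.
// ASSUMPTION: rounded up to an even count so values can be used in pairs.
inline int per_thread_for(int wanted, int threads) {
    return align_up(div_up(wanted, threads), 2);
}

// Total numbers written if every thread produces `per_thread` values.
inline long long total_numbers(int threads, int per_thread) {
    return static_cast<long long>(threads) * per_thread;
}
```

For instance, `thread_count(32, 128)` is 4096, so 4096 would be the number of generators rather than the number of outputs. Asking for at least 100000 values gives `per_thread_for(100000, 4096) == 26`, and therefore 4096 * 26 = 106496 values in total.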