Hi,
I checked the random number generation algorithm, and it seems to be compute intensive. My neural net GA makes a lot of random number calls. Memory cost is not a problem, but speed is critical. So I thought: maybe I could reuse the random numbers?
Consider the following:
- I generate a lot of random numbers on the GPU.
- I transfer these numbers from the GPU to the host (at a cost of 5 GB/s) and store them in an array in main memory (around 2 GB of random numbers), with a further retrieval cost of around 25 GB/s with DDR3.
- I fetch these numbers as I need them, in sequential order, and they will only be used by code running on the CPU.
- When I reach the end of the 2 GB array, I will loop over it another 10 times to reuse the data.
- To make the reuse more random, I will not start reusing from the beginning again, but shift the starting point by a random number (generated on the CPU, or from the current time) and then loop over the whole 2 GB from there. If the end of the 2 GB array is reached, I just wrap around to index 0 and finish at the position where I started. After reusing the random numbers for some time (I don't know, 10 times, or maybe 20? experimental results will show) I will load another 2 GB of fresh random numbers from the GPU. (Rough sketch after this list.)
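Something like this sketch, using the cuRAND host API (refill_pool, next_rand, POOL_N and max_passes are just names I made up for illustration; shrink POOL_N if you want to actually run it):

```cuda
#include <cuda_runtime.h>
#include <curand.h>
#include <stdio.h>
#include <stdlib.h>

#define POOL_N (512ull * 1024 * 1024)  // 512M floats = 2 GB (use less for testing)

static float *pool;      // host-side array of random numbers
static size_t pos;       // current read index
static size_t consumed;  // numbers consumed in the current pass
static int    passes;    // how many times the pool has been reused

// Generate a fresh pool on the GPU and copy it to the host (2 GB at ~5 GB/s).
static void refill_pool(curandGenerator_t gen, float *d_buf) {
    curandGenerateUniform(gen, d_buf, POOL_N);
    cudaMemcpy(pool, d_buf, POOL_N * sizeof(float), cudaMemcpyDeviceToHost);
    pos = consumed = 0;
    passes = 0;
}

// Fetch the next number; after each full pass, restart at a random offset.
static float next_rand(curandGenerator_t gen, float *d_buf, int max_passes) {
    if (consumed == POOL_N) {               // a full 2 GB pass is done
        if (++passes >= max_passes) {
            refill_pool(gen, d_buf);        // reused enough: get fresh numbers
        } else {
            // Crude CPU-generated shift (rand() may not cover the full range).
            pos = (size_t)rand() % POOL_N;
            consumed = 0;
        }
    }
    float r = pool[pos];
    pos = (pos + 1) % POOL_N;               // wrap to index 0 at the end
    consumed++;
    return r;
}

int main(void) {
    curandGenerator_t gen;
    float *d_buf;
    pool = (float *)malloc(POOL_N * sizeof(float));
    cudaMalloc((void **)&d_buf, POOL_N * sizeof(float));
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
    curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);
    refill_pool(gen, d_buf);
    for (int i = 0; i < 10; i++)
        printf("%f\n", next_rand(gen, d_buf, 10));  // reuse the pool 10 times
    return 0;
}
```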
I would like your comments on this idea. Will reusing random numbers affect my problem-solving algorithm by introducing some degree of non-randomness? How big should the array of reusable random numbers be? And how many times would you suggest I reuse it?
Thanks in advance for any comment.
Nulik
Just an idea, but it sounds like most of your code is being run on the CPU and the GPU is only being used to generate random numbers.
If this is the case, it seems like you might be able to use the fact that GPU execution is async and continue generating random numbers on the GPU while the CPU is executing.
Something like…
- Call kernel with output array1
- Retrieve array1 of rands
- Call kernel with a second output array, array2
- Launch CPU code
- Retrieve array2 of rands
- When you run out of random numbers in array1, start using the numbers from array2
- Call kernel with output array1
repeat…
This way the CPU can keep consuming random numbers while the GPU executes constantly in the background, so you don't have to wait as long for results each time… as long as the CPU work takes longer than the GPU work.
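In code, the double buffering might look roughly like this (just a sketch; consume_on_cpu, N, and the iteration count are placeholders for your GA loop):

```cuda
#include <cuda_runtime.h>
#include <curand.h>
#include <stdio.h>

#define N (1 << 20)  // rands per batch; made-up size, tune for your GA

static double acc;
// Stand-in for the CPU side of the GA that consumes the random numbers.
static void consume_on_cpu(const float *r, size_t n) {
    for (size_t i = 0; i < n; i++) acc += r[i];
}

int main(void) {
    float *h_buf[2], *d_buf[2];
    cudaStream_t stream;
    curandGenerator_t gen;

    cudaStreamCreate(&stream);
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
    curandSetStream(gen, stream);  // generation runs async on this stream
    for (int i = 0; i < 2; i++) {
        cudaMallocHost((void **)&h_buf[i], N * sizeof(float));  // pinned, for async copies
        cudaMalloc((void **)&d_buf[i], N * sizeof(float));
    }

    // Prime the pipeline: generate and retrieve batch 0.
    curandGenerateUniform(gen, d_buf[0], N);
    cudaMemcpyAsync(h_buf[0], d_buf[0], N * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    int cur = 0;
    for (int iter = 0; iter < 100; iter++) {
        int nxt = cur ^ 1;
        // Kick off the next batch; these calls return immediately.
        curandGenerateUniform(gen, d_buf[nxt], N);
        cudaMemcpyAsync(h_buf[nxt], d_buf[nxt], N * sizeof(float),
                        cudaMemcpyDeviceToHost, stream);

        consume_on_cpu(h_buf[cur], N);  // CPU work overlaps the GPU work

        cudaStreamSynchronize(stream);  // wait until the next batch has arrived
        cur = nxt;
    }
    printf("%f\n", acc);
    return 0;
}
```

The key point is that curandGenerateUniform and cudaMemcpyAsync return immediately, so the consume_on_cpu call overlaps with the generation and transfer of the next batch.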
Sorry, I forgot to mention that the GPU must run the fitness() function, which is compute intensive, so I won't be able to use it for random number generation all the time.
This is a hard question with no obvious right answer. The quality of random number generation required for a particular algorithm is difficult to determine. A game AI that uses random numbers to decide how to make a character behave can tolerate very poor random numbers, whereas something like Markov Chain Monte Carlo algorithms probably need very good random numbers to guarantee convergence.
However, random number generation has a similar rule of thumb to cryptography: Don’t try to invent your own ad-hoc variations on existing methods. Immediately reusing a previous random number sequence with a “random” offset might sound “random enough,” but the effect depends a lot on what you are doing.
If the bottleneck in your random number generation is the device-to-host transfer, perhaps it would be better to move random number generation to the CPU using a simpler algorithm, like a linear congruential generator. CUDA doesn't sound like a good fit for this problem unless you can also use the random numbers directly on the device without copying them to the host.
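For instance, a minimal 64-bit LCG (multiplier and increment from Knuth's MMIX; the seed is arbitrary) costs one multiply and one add per number:

```cuda
#include <stdint.h>
#include <stdio.h>

// Minimal 64-bit LCG; multiplier/increment are Knuth's MMIX constants.
static uint64_t lcg_state = 0x853c49e6748fea9bULL;  // arbitrary seed

static inline float lcg_uniform(void) {
    lcg_state = lcg_state * 6364136223846793005ULL + 1442695040888963407ULL;
    // Keep the top 24 bits: the low bits of an LCG are the weakest.
    return (float)(lcg_state >> 40) * (1.0f / 16777216.0f);
}

int main(void) {
    for (int i = 0; i < 5; i++)
        printf("%f\n", lcg_uniform());  // uniform floats in [0, 1)
    return 0;
}
```

Whether an LCG's quality is good enough for your GA's selection and mutation steps is exactly the hard-to-answer question above, but it does remove the transfer from the picture entirely.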