I have to show my team how GPU performance compares to CPU performance.
It is certainly correct to say that CPU execution time increases linearly with the number of iterations.
I have some problems with the GPU case, though.
I am referring to the variable allocation time PLUS the execution time.
One fact is that there is a fixed amount of time needed to allocate variables on the device and copy the results back. If programmed properly, the total computation time should therefore grow much more slowly than the CPU time (ideally almost negligibly).
My problem is understanding what happens when the number of operations is very low. I had a case where a kernel performed only 50 operations and the total computing time was several seconds, whereas when several thousand operations were executed the time was only a few milliseconds.
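For reference, here is a minimal sketch of how I am timing the three phases (device allocation plus host-to-device copy, kernel execution, device-to-host copy). The addKernel vector add, the size n, and the block size of 256 are just placeholders for my real workload:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Placeholder kernel: one trivial operation per thread.
__global__ void addKernel(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 50;                       // vary this: 50, 50000, millions, ...
    const size_t bytes = n * sizeof(float);

    float *hA = (float*)malloc(bytes), *hB = (float*)malloc(bytes), *hC = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    cudaEvent_t t0, t1, t2, t3;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventCreate(&t2); cudaEventCreate(&t3);

    // The events bracket work enqueued on the default stream; the
    // cudaMalloc/cudaMemcpy calls below are synchronous host-side calls.
    cudaEventRecord(t0);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(t1);

    const int block = 256;
    addKernel<<<(n + block - 1) / block, block>>>(dA, dB, dC, n);
    cudaEventRecord(t2);

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(t3);
    cudaEventSynchronize(t3);

    float msAlloc, msKernel, msBack;
    cudaEventElapsedTime(&msAlloc, t0, t1);
    cudaEventElapsedTime(&msKernel, t1, t2);
    cudaEventElapsedTime(&msBack, t2, t3);
    printf("alloc+H2D: %.3f ms, kernel: %.3f ms, D2H: %.3f ms\n",
           msAlloc, msKernel, msBack);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```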
Is that consistent with your experience?
If not, what could the trick be? And if it is, how can we justify the fact that the computation time decreases as the workload grows?
I hope I have been clear enough. Could someone tell me what differences to expect if we execute the same kernel launching very few threads (10 or 50) versus a very large number (10M or 500M)?
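In other words, the experiment I have in mind looks roughly like this: allocate once, then launch the same kernel at increasing thread counts and time only the kernel. The touch kernel, the warm-up launch, and the size list are assumptions for illustration:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: each thread does one trivial operation.
__global__ void touch(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

int main() {
    const int maxN = 1 << 24;               // allocate once, for the largest case
    float *d;
    cudaMalloc(&d, maxN * sizeof(float));
    cudaMemset(d, 0, maxN * sizeof(float));

    // Warm-up launch so one-time startup cost is not charged to the first timed size.
    touch<<<1, 32>>>(d, 32);
    cudaDeviceSynchronize();

    cudaEvent_t a, b;
    cudaEventCreate(&a); cudaEventCreate(&b);

    const int sizes[] = {10, 50, 1 << 14, 1 << 20, 1 << 24};
    const int numSizes = sizeof(sizes) / sizeof(sizes[0]);
    const int block = 256;

    for (int s = 0; s < numSizes; ++s) {
        const int n = sizes[s];
        cudaEventRecord(a);
        touch<<<(n + block - 1) / block, block>>>(d, n);
        cudaEventRecord(b);
        cudaEventSynchronize(b);
        float ms;
        cudaEventElapsedTime(&ms, a, b);
        printf("n = %9d threads -> kernel time %.4f ms\n", n, ms);
    }

    cudaEventDestroy(a); cudaEventDestroy(b);
    cudaFree(d);
    return 0;
}
```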