I’m working on a conjugate gradient solver in CUDA. When I analyze my code with the Visual Profiler, the values in the CPU time column are significantly higher than the values in the GPU time column, e.g.:
GPU time: 20 (usec)
CPU time: 70 (usec)
Is the difference in GPU and CPU time the overhead needed for the execution of the kernel? Is there anything I can try to reduce this overhead?
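Not an answer to the overhead itself, but you can measure the two times yourself outside the profiler: GPU time with CUDA events, CPU-side time with a host clock around the launch plus a synchronize. This is a minimal sketch with a trivial made-up `axpy` kernel (not your CG solver); the gap between the two numbers is roughly the launch overhead you are seeing.

```cuda
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

// Trivial kernel: so little GPU work that launch overhead dominates.
__global__ void axpy(float a, const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1024;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));

    // Warm-up launch: the very first launch also pays one-time setup cost.
    axpy<<<(n + 255) / 256, 256>>>(2.0f, x, y, n);
    cudaDeviceSynchronize();

    // GPU time: elapsed time between two events recorded on the device.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    axpy<<<(n + 255) / 256, 256>>>(2.0f, x, y, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float gpu_ms = 0.0f;
    cudaEventElapsedTime(&gpu_ms, start, stop);

    // CPU time: wall clock around launch + synchronize on the host.
    auto t0 = std::chrono::high_resolution_clock::now();
    axpy<<<(n + 255) / 256, 256>>>(2.0f, x, y, n);
    cudaDeviceSynchronize();
    auto t1 = std::chrono::high_resolution_clock::now();
    double cpu_us =
        std::chrono::duration<double, std::micro>(t1 - t0).count();

    printf("GPU time: %.1f usec, CPU time: %.1f usec\n",
           gpu_ms * 1000.0, cpu_us);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```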
Stefan, I think not. I see the same thing, and the trouble is that the overhead is more or less constant, so short-running kernels suffer much more relative overhead.
Hmm, in my case the overhead is about 50% of the overall computation time. It would be really nice if this overhead could be reduced.
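Since the overhead is roughly constant per launch, one way to reduce its share is to launch fewer, bigger kernels where the algorithm allows it. This sketch (with a hypothetical `scale` kernel, not anyone's actual code) times many small launches against one launch covering the same elements; on a typical machine the fused launch should pay the ~40 usec cost only once.

```cuda
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical kernel: scales a range of elements in place.
__global__ void scale(float *y, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] *= a;
}

int main() {
    const int chunks = 100, chunk = 1024, n = chunks * chunk;
    float *y;
    cudaMalloc(&y, n * sizeof(float));

    // Warm-up so one-time context setup is not counted.
    scale<<<(n + 255) / 256, 256>>>(y, n, 1.0f);
    cudaDeviceSynchronize();

    // Many small launches: pays the launch overhead `chunks` times.
    auto t0 = std::chrono::high_resolution_clock::now();
    for (int c = 0; c < chunks; ++c)
        scale<<<(chunk + 255) / 256, 256>>>(y + c * chunk, chunk, 2.0f);
    cudaDeviceSynchronize();
    auto t1 = std::chrono::high_resolution_clock::now();

    // One fused launch: same total work, one launch overhead.
    scale<<<(n + 255) / 256, 256>>>(y, n, 2.0f);
    cudaDeviceSynchronize();
    auto t2 = std::chrono::high_resolution_clock::now();

    printf("%d small launches: %.1f usec, 1 fused launch: %.1f usec\n",
           chunks,
           std::chrono::duration<double, std::micro>(t1 - t0).count(),
           std::chrono::duration<double, std::micro>(t2 - t1).count());

    cudaFree(y);
    return 0;
}
```

For a CG solver this usually means fusing the small vector-update kernels (axpy, dot-product partial sums) where dependencies permit, rather than launching each as its own kernel.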
Are you on Linux or Windows? I think I had about 40 usec overhead on Linux. As far as I know, it also depends on the number of input variables to your kernel.
Hi Denis, I use Windows, and the overhead in my case is about 40 usec, too. It seems to be quite independent of the number of input variables; however, the overhead increases significantly if I use a texture in my kernel (instead of global memory).
Hey, interesting. When my machine is working again I will check, since I have a kernel that uses textures too.