I’m doing some profiling of my CUDA application on a GTX 280, and I have a little question.
I have some CPU-side work whose cost I want to hide by running it immediately after an asynchronous kernel launch.
So instead of spending 20 ms on the GPU followed by 20 ms on the CPU, I want both to run concurrently, for a total running time of about 20 ms.
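For context, this is roughly the pattern I mean; the kernel and the CPU function here are just placeholders for my real workloads:

```cuda
#include <cuda_runtime.h>

// Illustrative kernel standing in for the real ~20 ms GPU workload.
__global__ void gpuWork(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Placeholder for the ~20 ms of CPU-side work I want to hide.
void cpuWork(float *host, int n) {
    for (int i = 0; i < n; ++i) host[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    float *h_data = new float[n];
    cudaMalloc(&d_data, n * sizeof(float));

    // Kernel launches are asynchronous: control returns to the host
    // immediately, so cpuWork() should overlap with the kernel.
    gpuWork<<<(n + 255) / 256, 256>>>(d_data, n);
    cpuWork(h_data, n);

    // Only now block until the GPU has finished.
    cudaDeviceSynchronize();

    cudaFree(d_data);
    delete[] h_data;
    return 0;
}
```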
I think I have managed to do this, because I observe the expected speedup. However, the Visual Profiler does not show the expected result:
there is still a gap in the GPU-time-width plot that I expected to be closed.
So how should I interpret the GPU timestamps? In the results I have here, it looks like the reported time is the sum of the time spent on the GPU and on the CPU.