I used to time my CPU code inside the host code, using the CUDA timer. Let's say I got a 15X speedup.
Now I run my code through the CUDA profiler, and both CPU and GPU show the same time! No speedup?
How is that possible? Has anyone seen this before?
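In the profiler, CPU time is the time from the start of a CUDA call until it returns to the CPU, and GPU time is the time from kernel start to kernel end on the GPU. So in general, CPU time = GPU time + a small overhead for transferring parameters to the GPU, etc. Neither column measures the CPU version of your code, so the two being nearly equal says nothing about your speedup.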
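To see the two numbers side by side outside the profiler, here is a minimal sketch. It assumes a made-up kernel called `scale` and a helper `time_kernel` (neither is from your code), and it only approximates what the profiler reports: CUDA events bracket the kernel for the "GPU time", and a host clock wraps the launch plus the wait for completion for the "CPU time". Error checking is left out to keep it short.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, only here so there is something to time.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

struct Timing {
    float gpu_ms;   // elapsed time between two CUDA events around the kernel
    double cpu_ms;  // host wall-clock time from launch until the kernel is done
};

// Hypothetical helper: launches the kernel once and measures it both ways.
Timing time_kernel(int n) {
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Warm-up launch so one-time initialization is not part of the measurement.
    scale<<<blocks, threads>>>(d_data, n, 2.0f);
    cudaDeviceSynchronize();

    auto host_start = std::chrono::steady_clock::now();
    cudaEventRecord(start);
    scale<<<blocks, threads>>>(d_data, n, 2.0f);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // block the host until the kernel has finished
    auto host_stop = std::chrono::steady_clock::now();

    Timing t;
    cudaEventElapsedTime(&t.gpu_ms, start, stop);
    t.cpu_ms = std::chrono::duration<double, std::milli>(host_stop - host_start).count();

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return t;
}

int main() {
    Timing t = time_kernel(1 << 20);
    std::printf("GPU time (events):     %.3f ms\n", t.gpu_ms);
    std::printf("CPU time (host clock): %.3f ms\n", t.cpu_ms);
    return 0;
}
```

Both numbers describe the same GPU launch, so they should come out close to each other. To get the speedup you would still have to time your CPU implementation separately and compare it against the GPU figure.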