I have recently started using the profiler in the 3.2 toolkit. I have a 9800 GT on a ~3 GHz machine. GPU times for my kernels are ~50 us. When I look at the width plot there is a bit of idle time (white), and the whole application takes ~600 us. I ran the same code on a GTX 580 in a quad-core ~2.5 GHz machine. The GPU kernel execution times dropped to about half, which is good. However, the H->D memcpy times doubled and the idle time was huge!! The application now takes ~4000 us, and pretty much all of that is idle!? Can anyone explain this to me? Is idle time a function of the processor? The CPU times, incidentally, were fairly low on each machine, so where does this idle time come from? Is it an operating system thing?
Yes, the profiler adds significant idle time. On Linux, I found it to be anywhere from 20 to 100+ microseconds. It's been ages since I've even compiled CUDA on Windows, so I can't comment there.
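One way to see how much of that idle time comes from the profiler is to time the copy and the kernel yourself with CUDA events and run the program without the profiler attached. Below is a minimal sketch, assuming a hypothetical `scaleKernel` and a hypothetical helper `timeCopyAndKernel` (neither is from the original post). It does one warm-up launch so one-time initialization is not counted, then records events on the default stream around an H->D `cudaMemcpy` and a single kernel launch. Note that the "kernel" interval also includes any launch gap between the two events, so treat it as approximate.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical test kernel: scales each element in place.
__global__ void scaleKernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: times one H->D copy and one kernel launch with CUDA
// events. Writes elapsed times in milliseconds; returns false on any error.
bool timeCopyAndKernel(int n, float *copyMs, float *kernelMs) {
    size_t bytes = n * sizeof(float);
    float *host = nullptr;
    float *dev = nullptr;
    bool ok = false;

    // Pinned host memory usually gives faster, more consistent H->D copies.
    if (cudaMallocHost(&host, bytes) != cudaSuccess) return false;
    for (int i = 0; i < n; ++i) host[i] = 1.0f;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) {
        cudaFreeHost(host);
        return false;
    }

    cudaEvent_t start, afterCopy, afterKernel;
    cudaEventCreate(&start);
    cudaEventCreate(&afterCopy);
    cudaEventCreate(&afterKernel);

    int blocks = (n + 255) / 256;

    // Warm-up launch so context setup and first-launch costs are excluded.
    scaleKernel<<<blocks, 256>>>(dev, n, 1.0f);
    cudaDeviceSynchronize();

    // Events on the default stream bracket the copy and the kernel.
    cudaEventRecord(start, 0);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(afterCopy, 0);
    scaleKernel<<<blocks, 256>>>(dev, n, 2.0f);
    cudaEventRecord(afterKernel, 0);
    cudaEventSynchronize(afterKernel);

    // Catch launch or execution errors before trusting the timings.
    if (cudaGetLastError() == cudaSuccess &&
        cudaEventElapsedTime(copyMs, start, afterCopy) == cudaSuccess &&
        cudaEventElapsedTime(kernelMs, afterCopy, afterKernel) == cudaSuccess) {
        ok = true;
    }

    cudaEventDestroy(start);
    cudaEventDestroy(afterCopy);
    cudaEventDestroy(afterKernel);
    cudaFree(dev);
    cudaFreeHost(host);
    return ok;
}
```

Running this on both machines (e.g. `printf("copy %.3f ms, kernel %.3f ms\n", copyMs, kernelMs);` after a successful call) and comparing against the profiler's numbers can help separate time the GPU actually spends working from gaps the profiler itself introduces.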