Massive idle time


I have recently started using the profiler in the 3.2 toolkit. I have a 9800GT on a ~3 GHz machine. GPU times for my kernels are ~50 us. When I look at the width plot there is a bit of idle time (white), and the whole application takes ~600 us. I ran the same code on a GTX580 in a quad-core ~2.5 GHz machine. The GPU kernel execution times dropped to about half, which is good. However, the H->D memcpy times doubled and the idle time was huge! The application now takes ~4000 us, and pretty much all of that is idle. Can anyone explain this to me? Is idle time a function of the processor? The CPU times, incidentally, were fairly low on each machine, so where does this idle time come from? Is it an operating system thing?

Any thoughts would be greatly appreciated.

The hardware setup for each machine is as follows:

Machine 1:

Intel Xeon X5450,


GeForce 9800GT

Windows XP (64 bit)

Machine 2:

Intel Core2 Quad Q9300


GeForce GTX580

Windows XP (32 bit)

Both machines run 32-bit builds, compiled with Visual Studio 2005 and CUDA Toolkit 3.2.

Does the profiler add a lot of idle time? Can someone please help me out with this one?

Yes, the profiler adds significant idle time. On Linux, I found it to be anywhere from 20 to 100+ microseconds. It's been ages since I've even compiled CUDA on Windows, so I cannot comment there.
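
If you want to check how much of that idle time is really yours and how much comes from the profiler, one option is to time the same copy + kernel sequence yourself with CUDA events and a host clock, run it without the profiler attached, and compare. A rough sketch only, assuming a recent toolkit and a C++11 compiler (on 3.2 you would call cudaThreadSynchronize instead of cudaDeviceSynchronize). scaleKernel, measureOnce, Timings and the buffer size are placeholders I made up, not your code:

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real one (assumption).
__global__ void scaleKernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

struct Timings {
    bool   ok;          // false if any CUDA call failed
    float  copyMs;      // H->D memcpy, measured with events
    float  kernelMs;    // kernel, measured with events
    float  gpuTotalMs;  // first event to last event
    double wallMs;      // host wall-clock time around the same calls
};

// On failure, report and bail out (resources are not freed; sketch only).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return t;                                                 \
        }                                                             \
    } while (0)

Timings measureOnce(int n)
{
    Timings t = { false, 0.0f, 0.0f, 0.0f, 0.0 };
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float* hostData = 0;
    float* devData = 0;
    CHECK(cudaMallocHost((void**)&hostData, bytes));  // pinned host buffer
    CHECK(cudaMalloc((void**)&devData, bytes));
    for (int i = 0; i < n; ++i) hostData[i] = 1.0f;

    // Warm-up launch so context creation is not counted in the timings.
    scaleKernel<<<blocks, threads>>>(devData, n, 1.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    cudaEvent_t start, afterCopy, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&afterCopy));
    CHECK(cudaEventCreate(&stop));

    auto wallStart = std::chrono::high_resolution_clock::now();
    CHECK(cudaEventRecord(start, 0));
    CHECK(cudaMemcpy(devData, hostData, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(afterCopy, 0));
    scaleKernel<<<blocks, threads>>>(devData, n, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));  // wait until the GPU has finished
    auto wallStop = std::chrono::high_resolution_clock::now();

    CHECK(cudaEventElapsedTime(&t.copyMs, start, afterCopy));
    CHECK(cudaEventElapsedTime(&t.kernelMs, afterCopy, stop));
    CHECK(cudaEventElapsedTime(&t.gpuTotalMs, start, stop));
    t.wallMs = std::chrono::duration<double, std::milli>(wallStop - wallStart).count();

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(afterCopy));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(devData));
    CHECK(cudaFreeHost(hostData));
    t.ok = true;
    return t;
}

int main()
{
    Timings t = measureOnce(1 << 20);
    if (!t.ok) return 1;
    printf("memcpy H->D: %.3f ms\n", t.copyMs);
    printf("kernel:      %.3f ms\n", t.kernelMs);
    printf("GPU total:   %.3f ms\n", t.gpuTotalMs);
    printf("host wall:   %.3f ms\n", t.wallMs);
    return 0;
}
```

If the gap between the host wall time and the GPU total is small here but the profiler timeline shows large white gaps for the same sequence, that would point at the profiler (or the driver while being profiled) rather than your code.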