Profiler: CPU time >> GPU time, why? (only with a certain config)

Hi All,

I’ve been profiling my app and the CPU time is much higher than my GPU time for my second kernel call, shadeR:

timestamp  method  gputime (µs)  cputime (µs)
357553     render  13804.7       14512.1
379571     shadeR  19475         30734.2
858099     render  13797.3       14392.2
872650     shadeR  19453.6       30245.5
906936     render  13778         14296.4

Does anyone know why this might be? If I comment out some of my shading code (to exclude reflective and transparent materials) and increase the block size to 16x16, the CPU time roughly equals the GPU time. Why? I am still doing the same number of memcopies. I would have expected the CPU time to roughly equal the GPU time in the profiler output above as well.

Anyone got any ideas? Tim Murray?

BUMP! Anyone? I can post some more code if it will help.

Check out http://forums.nvidia.com/index.php?showtopic=94669 and see if it’s a similar problem. Sorry, I don’t know of any solutions at the moment.

I’ve tried various scheduling methods (polling and yielding), various thread priorities, etc., and my only conclusion is that the driver or hardware scheduler must be lazy, in the sense that it won’t free an MP the nanosecond it completes a kernel. I’m guessing the scheduler has some kind of update frequency or event for freeing MPs, and that it simply happens to be quite slow. So with a high frequency of kernel invocations, especially lots of small/fast kernels, you probably can’t end up occupying the entire card, because the scheduler can’t free finished MPs quickly enough.

(Purely a shot in the dark, but nothing else makes sense, at least to me.)