Profiler: CPU time >> GPU time, why? Only with certain config

st5486 · April 28, 2009, 4:22pm

Hi All,

I’ve been profiling my app and the CPU time is much higher than my GPU time for my second kernel call, shadeR:

timestamp  method  gputime  cputime
357553     render  13804.7  14512.1
379571     shadeR  19475    30734.2
858099     render  13797.3  14392.2
872650     shadeR  19453.6  30245.5
906936     render  13778    14296.4

Does anyone know why this might be? If I comment out some of my shading code (to exclude reflective and transparent materials) and increase the block size to 16x16, the CPU time roughly equals the GPU time. Why? I am still doing the same number of memcopies, so I would have expected the CPU time to roughly equal the GPU time in the profiler output above as well.
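To put a number on the gap, here is a minimal Python sketch that parses the rows above and computes CPU time minus GPU time per kernel. It assumes the columns are timestamp, method, GPU time and CPU time, in that order, and that both times use the same unit; the helper names `cpu_gpu_gaps` and `mean_gaps` are made up for illustration.

```python
# Profiler rows copied from the post.
# Assumed column order: timestamp, method, GPU time, CPU time.
# The column meaning and the unit are assumptions, not stated in the log itself.
LOG = """\
357553 render 13804.7 14512.1
379571 shadeR 19475 30734.2
858099 render 13797.3 14392.2
872650 shadeR 19453.6 30245.5
906936 render 13778 14296.4
"""


def cpu_gpu_gaps(log):
    """Return {method: [cpu_time - gpu_time, ...]} with one entry per row."""
    gaps = {}
    for line in log.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _timestamp, method, gpu, cpu = line.split()
        gaps.setdefault(method, []).append(float(cpu) - float(gpu))
    return gaps


def mean_gaps(log):
    """Return {method: mean of (cpu_time - gpu_time)} across all rows."""
    return {m: sum(v) / len(v) for m, v in cpu_gpu_gaps(log).items()}


if __name__ == "__main__":
    for method, gap in mean_gaps(LOG).items():
        print(f"{method}: mean CPU-GPU gap {gap:.1f}")
```

On these five rows the mean gap is roughly 600 for render and roughly 11,000 for shadeR in the profiler's units, so the extra CPU-side time is specific to shadeR rather than a fixed per-launch cost.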

Anyone got any ideas? Tim Murray?

st5486 · April 29, 2009, 6:00pm

BUMP! Anyone? I can post some more code if it will help.

gatoatigrado · April 29, 2009, 8:52pm

Check out http://forums.nvidia.com/index.php?showtopic=94669 and see if it’s a similar problem. Sorry, I don’t know of any solutions at the moment.

Smokey · April 29, 2009, 10:52pm

I’ve tried various scheduling methods (polling and yielding), various thread priorities, etc., and my only conclusion is that the driver or hardware scheduler must be lazy, in the sense that it won’t free an MP the nanosecond it completes a kernel. I’m guessing the scheduler has some kind of update frequency or event for freeing MPs, and it simply happens to be quite slow. With a high frequency of kernel invocations, especially lots of small, fast kernels, you probably can’t keep the entire card occupied, because the scheduler is slow to free MPs that have finished executing.

(Purely a random shot in the dark, but nothing else makes sense, at least to me.)