I’ve been profiling my app and the CPU time is much higher than my GPU time for my second kernel call, shadeR:
357553 render 13804.7 14512.1
379571 shadeR 19475 30734.2
858099 render 13797.3 14392.2
872650 shadeR 19453.6 30245.5
906936 render 13778 14296.4
Does anyone know why this might be? If I comment out some of my shading code (to exclude reflective and transparent materials) and increase the block size to 16x16, the CPU time roughly equals the GPU time. Why? I am still doing the same number of memcopies. I would have expected the CPU time to roughly equal the GPU time in the above profiler output too.
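To put a number on the gap, here is a minimal sketch that computes CPU time minus GPU time for each row of the profiler output above. It assumes the columns are timestamp, method, GPU time, and CPU time, which the post does not state explicitly. The helper names `cpu_gpu_gaps` and `mean_gap` are made up for this illustration.

```python
# Profiler rows copied verbatim from the post above.
# ASSUMPTION: columns are timestamp, method, gputime, cputime.
LOG = """\
357553 render 13804.7 14512.1
379571 shadeR 19475 30734.2
858099 render 13797.3 14392.2
872650 shadeR 19453.6 30245.5
906936 render 13778 14296.4
"""


def cpu_gpu_gaps(log):
    """Return a list of (method, gputime, cputime, cputime - gputime)."""
    rows = []
    for line in log.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _timestamp, method, gpu, cpu = line.split()
        gpu, cpu = float(gpu), float(cpu)
        rows.append((method, gpu, cpu, cpu - gpu))
    return rows


def mean_gap(rows, method):
    """Average CPU-minus-GPU gap over all rows for one kernel name."""
    gaps = [gap for name, _gpu, _cpu, gap in rows if name == method]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    rows = cpu_gpu_gaps(LOG)
    for method, gpu, cpu, gap in rows:
        print(f"{method:8s} gpu={gpu:9.1f} cpu={cpu:9.1f} gap={gap:8.1f}")
    for method in ("render", "shadeR"):
        print(f"mean gap for {method}: {mean_gap(rows, method):.2f}")
```

On these five rows the average gap comes out at roughly 607 for render and roughly 11,026 for shadeR, in whatever time unit the profiler reports. In other words, the extra CPU-side time for shadeR is about 18 times that of render.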
Anyone got any ideas? Tim Murray?
BUMP! Anyone? I can post some more code if it will help.
Check out http://forums.nvidia.com/index.php?showtopic=94669
and see if it’s a similar problem. Sorry, I don’t know of any solutions at the moment.
I’ve tried various scheduling methods (polling and yielding), various thread priorities, etc., and my only conclusion is that the driver or hardware scheduler must be lazy, in the sense that it won’t free an MP the nanosecond it completes a kernel. I’m guessing the scheduler has some kind of update frequency or event for freeing MPs, and it simply happens to be quite slow. So with a high frequency of kernel invocations, especially lots of small, fast kernels, you probably can’t occupy the entire card, because the scheduler can’t free the MPs that have finished executing quickly enough.
(Purely a random shot in the dark, but nothing else makes sense, at least to me.)