scheduling blocks - automatic or by code? one sample app (Mandelbrot) does its own block scheduling

I see that the Mandelbrot sample app only creates as many blocks as the GPU can run simultaneously; it then has kernel code that hands out work units in turn and processes them.

The obvious way of doing things is to create as many blocks as are required to compute the complete result, and let the runtime system schedule these onto the GPU resources.
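For contrast, the "obvious" approach sketched in code: one block per tile of the image, all launched up front and left to the hardware scheduler. Kernel and parameter names here are illustrative, not from the SDK sample, and the per-pixel iteration loop is elided:

```cuda
// Naive launch: one block per tile, hardware schedules everything.
__global__ void mandelbrotNaive(int *out, int tileSize)
{
    // Each block owns exactly one tile; blockIdx.x is the tile index.
    int pixel = blockIdx.x * tileSize + threadIdx.x;
    out[pixel] = pixel;   // placeholder for the actual Mandelbrot iteration
}

// Launch enough blocks to cover the whole image, e.g.:
//   mandelbrotNaive<<<numTiles, tileSize>>>(d_out, tileSize);
```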

This Mandelbrot sample app approach suggests that the runtime system isn't too smart at scheduling (or there is something else going on that I don't understand).

Is the sample Mandelbrot approach worthwhile, or is this just Mark showing other techniques that could be useful elsewhere? (Or even that the early runtime scheduling wasn't very good?)

(I’m running on Linux btw in case it’s relevant)

Yes, the block scheduling hardware is pretty simple, and optimized for homogeneous units of work. For some workloads (like ray tracing and the Mandelbrot sample) where blocks have wildly varying run-times, it can sometimes be more efficient to do software scheduling. This paper has some details:

http://www.nvidia.com/object/nvidia_research_pub_011.html
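The software-scheduling idea described above (sometimes called "persistent threads/blocks") can be sketched roughly as follows. This is a minimal illustration, not the sample's actual code: the kernel name, the global counter, and the tile layout are all assumptions, and the Mandelbrot iteration itself is replaced by a placeholder. Only as many blocks as the GPU can keep resident are launched; each one then claims tiles from a global atomic counter until the work runs out:

```cuda
// Global work counter; must be zeroed from the host before each launch.
__device__ unsigned int nextTile;

__global__ void mandelbrotPersistent(int *out, int numTiles, int tileSize)
{
    __shared__ unsigned int tile;   // tile currently claimed by this block

    while (true) {
        // One thread per block grabs the next unit of work.
        if (threadIdx.x == 0)
            tile = atomicAdd(&nextTile, 1);
        __syncthreads();            // everyone sees the claimed tile

        if (tile >= numTiles)
            break;                  // queue drained; this block retires

        // Process one pixel of the claimed tile (iteration loop elided).
        int pixel = tile * tileSize + threadIdx.x;
        out[pixel] = pixel;         // placeholder for the Mandelbrot result

        __syncthreads();            // finish the tile before re-claiming
    }
}
```

The point of the pattern is that a block which lands on a cheap tile loops straight back for more work, instead of leaving its SM idle while blocks with expensive tiles finish, which is why it can win when per-block run-times vary wildly.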

Thanks Simon. I knocked up a quick trial and it came out just very slightly slower than letting block scheduling handle it, but I'll go and dig a bit further now.

very interesting paper too :)