I see that the Mandelbrot sample app only creates as many blocks as the GPU can run simultaneously; the kernel code then allocates work units in turn and processes them.
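A rough sketch of the pattern as I understand it (not the actual sample source, and all names here are mine): launch just enough blocks to fill the GPU, keep a global tile counter, and have each block keep claiming tiles until the counter runs past the end.

```cuda
__device__ unsigned int g_nextTile;   // hypothetical global work counter, zeroed from the host before launch

__global__ void mandelbrotPersistent(uchar4 *image, int width, int height,
                                     int tileW, int tileH, int maxIter)
{
    int tilesX = (width  + tileW - 1) / tileW;
    int tilesY = (height + tileH - 1) / tileH;
    int numTiles = tilesX * tilesY;

    __shared__ unsigned int tile;
    for (;;) {
        if (threadIdx.x == 0 && threadIdx.y == 0)
            tile = atomicAdd(&g_nextTile, 1);     // block claims the next work unit
        __syncthreads();
        if (tile >= (unsigned int)numTiles)       // no work left: whole block exits together
            break;

        int x = (tile % tilesX) * tileW + threadIdx.x;
        int y = (tile / tilesX) * tileH + threadIdx.y;
        if (x < width && y < height) {
            // map the pixel to the complex plane and iterate z = z*z + c
            float cr = -2.0f + 3.0f * x / width;
            float ci = -1.5f + 3.0f * y / height;
            float zr = 0.0f, zi = 0.0f;
            int it = 0;
            while (zr * zr + zi * zi < 4.0f && it < maxIter) {
                float t = zr * zr - zi * zi + cr;
                zi = 2.0f * zr * zi + ci;
                zr = t;
                ++it;
            }
            unsigned char v = (unsigned char)(255 * it / maxIter);
            image[y * width + x] = make_uchar4(v, v, v, 255);
        }
        __syncthreads();   // keep the block together before claiming the next tile
    }
}
```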
The obvious way of doing things would be to create as many blocks as are required to compute the complete result, and let the runtime system schedule them onto the GPU's resources.
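That is, something like this (again my own sketch, not the sample's code), where the grid covers the whole image and each thread handles one pixel:

```cuda
__global__ void mandelbrotFullGrid(uchar4 *image, int width, int height, int maxIter)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float cr = -2.0f + 3.0f * x / width;
    float ci = -1.5f + 3.0f * y / height;
    float zr = 0.0f, zi = 0.0f;
    int it = 0;
    while (zr * zr + zi * zi < 4.0f && it < maxIter) {
        float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
        ++it;
    }
    unsigned char v = (unsigned char)(255 * it / maxIter);
    image[y * width + x] = make_uchar4(v, v, v, 255);
}

// host side: launch one block per 16x16 tile of the image and let the
// runtime schedule however many blocks that turns out to be
// dim3 block(16, 16);
// dim3 grid((width + 15) / 16, (height + 15) / 16);
// mandelbrotFullGrid<<<grid, block>>>(d_image, width, height, maxIter);
```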
The Mandelbrot sample app's approach suggests that the runtime system isn't very smart at scheduling (or there is something else going on that I don't understand).
Is the Mandelbrot sample's approach worthwhile, or is this just Mark showing other techniques that could be useful elsewhere? (Or even a sign that the early runtime scheduling wasn't very good?)
(I’m running on Linux btw in case it’s relevant)