Scheduling block execution: Do multiprocessors block each other?

SPWorley’s second idea does not really require atomics…

You can always code like:

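__global__ void kernel(int n)  /* n = number of logical blocks to process */
{
	/* Each physical block handles logical blocks blockIdx.x, blockIdx.x + gridDim.x, ... */
	for (int blk = blockIdx.x; blk < n; blk += gridDim.x)
	{
		int effective_blkid = blk;  /* use this wherever the code used blockIdx.x */

		/* code goes in here... */
	}
}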

All the developer needs to take care of is spawning just enough blocks to keep all the MPs busy. That number varies from device to device, so the launch logic must account for it.

Ashtey…

If your blocks’ heterogeneous runtimes are periodic (for example, every k-th block is slow), scheduling statically that way risks very bad scheduling bubbles.
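To make the bubble concrete, here is a small host-only simulation (ordinary C++ that nvcc compiles as host code) comparing the static blk += gridDim.x mapping with greedy dynamic assignment. The cost model, in which every period-th logical block is slow, and all function names are hypothetical; it models timing only, not real GPU details such as several resident blocks sharing one SM.

```cuda
// Host-only sketch. The cost model and all names are hypothetical;
// it compares finishing times, nothing more.
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical cost model: every `period`-th logical block is slow.
static int blockCost(int blk, int period, int slowCost, int fastCost)
{
    return (blk % period == 0) ? slowCost : fastCost;
}

// Static schedule: resident block w runs logical blocks w, w+P, w+2P, ...,
// i.e. the blk += gridDim.x loop. Makespan = finishing time of the busiest one.
int staticMakespan(int n, int P, int period, int slowCost, int fastCost)
{
    assert(P > 0 && period > 0);
    std::vector<int> busy(P, 0);
    for (int blk = 0; blk < n; ++blk)
        busy[blk % P] += blockCost(blk, period, slowCost, fastCost);
    return *std::max_element(busy.begin(), busy.end());
}

// Dynamic schedule: logical blocks are handed out in order, each to whichever
// resident block becomes idle first (greedy list scheduling).
int dynamicMakespan(int n, int P, int period, int slowCost, int fastCost)
{
    assert(P > 0 && period > 0);
    std::vector<int> freeAt(P, 0);
    for (int blk = 0; blk < n; ++blk) {
        auto idle = std::min_element(freeAt.begin(), freeAt.end());
        *idle += blockCost(blk, period, slowCost, fastCost);
    }
    return *std::max_element(freeAt.begin(), freeAt.end());
}
```

With n = 64 logical blocks, P = 8 resident blocks, and every 8th block costing 10 units instead of 1, the static mapping hands all eight slow blocks to resident block 0 (makespan 80), while the standard list-scheduling bound caps the dynamic version at 136/8 + 10 = 27 units.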

That static scheduling technique might help too, but since it doesn’t dynamically assign work to idle SMs, you’d still have a lot of inefficiency if one set of blocks happened to be much slower than the others.
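For contrast, a dynamic variant can be sketched with a global atomic counter: each resident block repeatedly grabs the next logical block id with atomicAdd, so a block that finishes early simply takes more work. This is a common persistent-blocks pattern assumed here for illustration, not necessarily the scheme discussed earlier in the thread; the kernel, the runDynamicSchedule helper, and the placeholder work are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

__global__ void dynamicScheduleKernel(int n, unsigned int *nextBlock, int *visits)
{
    __shared__ unsigned int blk;
    while (true) {
        if (threadIdx.x == 0)
            blk = atomicAdd(nextBlock, 1u);  // claim the next unprocessed logical block
        __syncthreads();                     // broadcast blk to every thread in the block
        if (blk >= (unsigned int)n) break;   // uniform exit: all threads see the same blk
        if (threadIdx.x == 0)
            visits[blk] += 1;                // placeholder "work" for logical block blk
        __syncthreads();                     // finish before thread 0 overwrites blk
    }
}

// Hypothetical helper: returns true if every logical block ran exactly once.
bool runDynamicSchedule(int n, int gridBlocks, int threadsPerBlock)
{
    assert(n > 0 && gridBlocks > 0 && threadsPerBlock > 0);
    unsigned int *d_next = nullptr;
    int *d_visits = nullptr;
    if (cudaMalloc(&d_next, sizeof(unsigned int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_visits, n * sizeof(int)) != cudaSuccess) { cudaFree(d_next); return false; }
    cudaMemset(d_next, 0, sizeof(unsigned int));  // counter must be reset before every launch
    cudaMemset(d_visits, 0, n * sizeof(int));
    dynamicScheduleKernel<<<gridBlocks, threadsPerBlock>>>(n, d_next, d_visits);
    std::vector<int> h(n);
    cudaError_t err = cudaMemcpy(h.data(), d_visits, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_next);
    cudaFree(d_visits);
    if (err != cudaSuccess) return false;
    for (int v : h)
        if (v != 1) return false;
    return true;
}
```

The price is one global atomic per logical block plus a counter that must be reset before each launch, which is the kind of overhead the hardware block scheduler avoids.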

It also requires you to figure out at runtime exactly how many blocks can run simultaneously on your device, which is nontrivial (though certainly possible).
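One way to get that number at runtime is to multiply the SM count by the occupancy calculator’s blocks-per-SM figure. This sketch assumes a CUDA 6.5 or newer toolkit (cudaOccupancyMaxActiveBlocksPerMultiprocessor postdates the Fermi-era toolkits discussed here); persistentKernel, its placeholder work, and the helper names are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

__global__ void persistentKernel(int n, float *data)
{
    for (int blk = blockIdx.x; blk < n; blk += gridDim.x) {
        if (threadIdx.x == 0) data[blk] += 1.0f;  // placeholder per-logical-block work
    }
}

// Pure arithmetic: total simultaneously resident blocks, clamped so we never
// launch more blocks than there are logical blocks (and always at least one).
int persistentGridSize(int numSMs, int blocksPerSM, int n)
{
    assert(numSMs >= 0 && blocksPerSM >= 0);
    int g = numSMs * blocksPerSM;
    if (g > n) g = n;
    return g > 0 ? g : 1;
}

// Queries the current device; returns -1 if any CUDA call fails.
int queryPersistentGridSize(int threadsPerBlock, size_t dynSmemBytes, int n)
{
    int dev = 0, numSMs = 0, blocksPerSM = 0;
    if (cudaGetDevice(&dev) != cudaSuccess) return -1;
    if (cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, dev) != cudaSuccess)
        return -1;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, persistentKernel,
                                                      threadsPerBlock, dynSmemBytes) != cudaSuccess)
        return -1;
    return persistentGridSize(numSMs, blocksPerSM, n);
}
```

The blocks-per-SM figure depends on the block size and dynamic shared memory passed in, so the grid size has to be recomputed whenever either changes.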

I’m surprised you say it does not work. I could understand (and might even expect) that, with a little help from the hardware, Fermi’s new scheduling would outperform the software implementation. But it’s hard to see why it should not work.

I meant that it’s going to always be a performance loss versus using the hardware scheduler.

Thanks for the clarification!