I have a few questions regarding block and thread scheduling.
I know the CUDA programming guide says that the issue order of blocks within a grid, and of warps within a block, is undefined. I understand this statement in general, but I would like to know whether the issue order on the Quadro FX 5600 or GeForce 8800GTX is predictable in practice and, if so, how reliable that behavior is.
In general, once a block begins execution on an SM, can it be preempted and rescheduled later? I’m not talking about multiple blocks running on an SM simultaneously, but about one block’s state being saved to global memory, the block sleeping for a while, and then its state being restored so that it resumes execution on the same SM. Can I assume that blocks, once started, run to completion?
Even if the warp issue order is undefined, is there an efficient way to find out which warp is issued first? I need one thread per block to do a global memory write, and for performance reasons I’d like to put that write in the first warp scheduled. Similarly, I’d like a thread in the last warp scheduled to do a global memory read.
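To make the last question concrete, here is a minimal sketch of the pattern I have in mind. All names (sketchKernel, started, in, out) are hypothetical and only for illustration. It simply assumes that warp 0 is issued first and the highest-numbered warp last, which is exactly the assumption I am not sure I can make.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel; every name here is illustrative, not from real code.
__global__ void sketchKernel(int *started, const int *in, int *out)
{
    // Thread 0 belongs to warp 0. Nothing guarantees that warp 0 is the
    // first warp issued; this placement is only a guess.
    if (threadIdx.x == 0)
        started[blockIdx.x] = 1;           // the early global memory write

    __syncthreads();                       // stand-in for the block's real work

    // The last thread belongs to the highest-numbered warp, which is
    // likewise not guaranteed to be the last warp issued.
    if (threadIdx.x == blockDim.x - 1)
        out[blockIdx.x] = in[blockIdx.x];  // the late global memory read
}

int main()
{
    const int blocks = 4, threads = 128;
    int hIn[blocks] = {10, 20, 30, 40};
    int hOut[blocks] = {0};
    int hStarted[blocks] = {0};

    int *dStarted = 0, *dIn = 0, *dOut = 0;
    cudaMalloc((void **)&dStarted, sizeof(hStarted));
    cudaMalloc((void **)&dIn, sizeof(hIn));
    cudaMalloc((void **)&dOut, sizeof(hOut));

    cudaMemset(dStarted, 0, sizeof(hStarted));
    cudaMemcpy(dIn, hIn, sizeof(hIn), cudaMemcpyHostToDevice);

    sketchKernel<<<blocks, threads>>>(dStarted, dIn, dOut);

    // These blocking copies also wait for the kernel to finish.
    cudaMemcpy(hStarted, dStarted, sizeof(hStarted), cudaMemcpyDeviceToHost);
    cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);

    for (int b = 0; b < blocks; ++b)
        printf("block %d: started=%d out=%d\n", b, hStarted[b], hOut[b]);

    cudaFree(dStarted);
    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

The code is correct regardless of the order in which warps are issued. What I want to know is whether tying the write to thread 0 and the read to the last thread actually places them in the first and last warps scheduled.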