Will neighboring blocks be batched onto the same SM?

As we know, several blocks can be resident on the same SM at once.

My question is: at the beginning, when all the SMs are idle, will neighboring blocks 1, 2, 3, … be batched onto the same SM?

That is:
1, 2, 3 → SM1
4, 5, 6 → SM2

(assuming that one SM can hold three blocks)

Is that true?

Thank you for your answers!

Not necessarily. How would it help you?

Not in general, no. An obvious counterexample is a GTX 280 with 30 SMs and a kernel with a grid of 50 blocks. Every SM will end up getting an initial assignment of only 1 or 2 blocks, even though each could hold 3.
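If you want to check whether your own grid is even large enough to fill every SM, something like the following rough sketch should do it. It assumes a toolkit recent enough to have the occupancy API, and `my_kernel`, the block size of 128, and the grid size of 50 are only placeholders for illustration. It tells you how many blocks can be resident at once, not where any particular block will go.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; substitute your own.
__global__ void my_kernel(float *data) {
    data[blockIdx.x * blockDim.x + threadIdx.x] = 0.0f;
}

int main() {
    const int grid_blocks = 50;   // grid size from the counterexample above
    const int block_size  = 128;  // assumed threads per block

    // Query the number of SMs on device 0.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed\n");
        return 1;
    }

    // Ask the runtime how many blocks of this kernel fit on one SM
    // at this block size with no dynamic shared memory.
    int blocks_per_sm = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocks_per_sm, my_kernel, block_size, 0) != cudaSuccess) {
        fprintf(stderr, "occupancy query failed\n");
        return 1;
    }

    // Total number of blocks that can be resident on the device at once.
    const int capacity = prop.multiProcessorCount * blocks_per_sm;
    printf("%d SMs x %d blocks/SM = %d resident blocks; grid has %d\n",
           prop.multiProcessorCount, blocks_per_sm, capacity, grid_blocks);
    if (grid_blocks < capacity) {
        printf("The grid cannot fill every SM to capacity.\n");
    }
    return 0;
}
```

When the grid is smaller than the capacity, as with 50 blocks on 30 SMs that could hold 3 each, at least some SMs necessarily start with fewer blocks than they could hold.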

Thank you for your answer, but will neighboring blocks be batched onto neighboring SMs?

That is to say:

1 → SM1

2 → SM2

Will they?

Some blocks may terminate early (depending on the algorithm in the kernel), so you typically cannot expect blocks to be scheduled onto the SMs in sequence.

Also, any CUDA update may change the scheduling mechanism, so relying on undocumented behavior will get you into trouble eventually.
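If you are merely curious what happens on your own card, you can observe the placement without depending on it. Here is a minimal sketch that has each block read the PTX special register `%smid` and report the result. The names `current_sm_id` and `record_sm` are made up for this example, and the grid of 50 blocks with 64 threads each is arbitrary. Treat the output as a snapshot of one run on one driver version, not as a rule.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read the ID of the SM this thread is currently running on from the
// PTX special register %smid.
__device__ unsigned int current_sm_id() {
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// Thread 0 of each block records the SM ID it observed.
__global__ void record_sm(unsigned int *sm_of_block) {
    if (threadIdx.x == 0) {
        sm_of_block[blockIdx.x] = current_sm_id();
    }
}

int main() {
    const int num_blocks = 50;  // arbitrary grid size for illustration
    unsigned int h_sm[num_blocks];
    unsigned int *d_sm = nullptr;

    if (cudaMalloc(&d_sm, sizeof(h_sm)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    record_sm<<<num_blocks, 64>>>(d_sm);

    // cudaMemcpy waits for the kernel to finish and also reports any
    // error from the launch or execution.
    cudaError_t err = cudaMemcpy(h_sm, d_sm, sizeof(h_sm),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_sm);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int b = 0; b < num_blocks; ++b) {
        printf("block %2d -> SM %u\n", b, h_sm[b]);
    }
    return 0;
}
```

Run it a few times, and with different grid sizes, before drawing any conclusions. The mapping you see is an implementation detail, and the SM IDs themselves are not guaranteed to be numbered contiguously.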

Never try to guess where your blocks will go. You may be wrong!

(And if you ever write any code that depends on block scheduling, I will sit in the corner and be sad. Or maybe yell at you a lot. Actually, yeah, probably the second.)