Distribution of Blocks to MPs

Is there any information on how the GPU maps blocks to MPs?

For example, if a GPU has 30 MPs and 60 blocks are ready to execute, is block 0 assigned to MP0, block 1 to MP1, …, block 30 to MP0, block 31 to MP1, …?

I'm asking because I want to distribute the load evenly across MPs in so-called fat kernels. Imagine a fat kernel that executes two sections of code, one more time-consuming than the other. I want to make sure that each MP executes one block from each section, so that the load is evenly balanced.
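One way to sidestep the mapping question entirely is a dynamic work queue. This is a swapped-in technique, not something proposed in this thread. You launch a fixed number of blocks, and each block repeatedly claims the next work item from a global counter with `atomicAdd`. Blocks that draw cheap items simply come back for more, so the load evens out no matter which MP each block lands on. This is a minimal sketch under several assumptions. The "heavy" items (even indices) and "light" items (odd indices) and their iteration counts are hypothetical stand-ins for the two sections of a fat kernel. The launch size of 2 blocks per SM is an arbitrary choice. The names `workQueueKernel` and `runWorkQueue` are also hypothetical.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Each block repeatedly claims the next work item from a global counter until
// the queue is empty. Even items are "heavy" and odd items "light" (hypothetical
// stand-ins for the two sections of a fat kernel).
__global__ void workQueueKernel(int* nextItem, int numItems, int* timesProcessed,
                                float* out, int heavyIters, int lightIters) {
    __shared__ int item;  // the item this block is currently working on
    while (true) {
        if (threadIdx.x == 0) {
            item = atomicAdd(nextItem, 1);  // claim the next unprocessed item
        }
        __syncthreads();                    // make the claimed index visible to the whole block
        if (item >= numItems) {
            break;                          // queue drained; every thread sees the same value
        }
        int iters = (item % 2 == 0) ? heavyIters : lightIters;
        float acc = 0.0f;
        for (int i = threadIdx.x; i < iters; i += blockDim.x) {
            acc += sinf(i * 0.001f);        // dummy work proportional to iters
        }
        atomicAdd(&out[item], acc);         // combine per-thread partial sums
        if (threadIdx.x == 0) {
            atomicAdd(&timesProcessed[item], 1);
        }
        __syncthreads();                    // all threads are done with `item` before it is overwritten
    }
}

// Hypothetical host helper: run the queue over numItems items and return how many
// times each item was processed (each entry should be exactly 1).
std::vector<int> runWorkQueue(int numItems, int heavyIters, int lightIters) {
    int device = 0, numSMs = 0;
    cudaGetDevice(&device);
    cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, device);

    int* dNext = nullptr;
    int* dTimes = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dNext, sizeof(int));
    cudaMalloc(&dTimes, numItems * sizeof(int));
    cudaMalloc(&dOut, numItems * sizeof(float));
    cudaMemset(dNext, 0, sizeof(int));
    cudaMemset(dTimes, 0, numItems * sizeof(int));
    cudaMemset(dOut, 0, numItems * sizeof(float));  // all-zero bytes == 0.0f

    // A fixed grid (2 blocks per SM, an arbitrary choice) that keeps pulling work.
    workQueueKernel<<<2 * numSMs, 256>>>(dNext, numItems, dTimes, dOut,
                                         heavyIters, lightIters);
    cudaError_t err = cudaGetLastError();
    assert(err == cudaSuccess);

    std::vector<int> times(numItems);
    err = cudaMemcpy(times.data(), dTimes, numItems * sizeof(int),
                     cudaMemcpyDeviceToHost);  // waits for the kernel to finish
    assert(err == cudaSuccess);

    cudaFree(dNext);
    cudaFree(dTimes);
    cudaFree(dOut);
    return times;
}
```

For example, `runWorkQueue(1000, 20000, 2000)` should return 1000 entries equal to 1. Each item is claimed exactly once, whichever MP ran the block that claimed it. The design relies on the atomic counter rather than on block placement. The cost is one atomic operation per item, which is small when each item does substantial work.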


I think the assignment of blocks to MPs is done by the hardware scheduler, and it doesn't follow any fixed rule you can rely on for where a given block executes.
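To see what the hardware actually does on a particular GPU, you can record the SM (called MP in this thread) that each block ran on. You do this by reading the PTX special register `%smid` through inline assembly. This is a minimal diagnostic sketch. The helper names `currentSmId`, `recordBlockPlacement`, `observeBlockPlacement` and `printBlockPlacement` are hypothetical. Any mapping it prints is an observation of one launch on one GPU, not a guarantee. Also note that several blocks can be resident on one SM at the same time. As a result, 60 blocks need not end up spread one per SM.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Read the PTX special register %smid: the ID of the SM (MP) running this thread.
// The PTX documentation notes the value can change during execution (e.g. after
// preemption) and SM IDs need not be contiguous, so use it for diagnostics only.
__device__ unsigned int currentSmId() {
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// Thread 0 of each block records which SM the block landed on.
__global__ void recordBlockPlacement(unsigned int* smOfBlock) {
    if (threadIdx.x == 0) {
        smOfBlock[blockIdx.x] = currentSmId();
    }
}

// Hypothetical helper: launch numBlocks blocks and return the SM ID seen by each block.
std::vector<unsigned int> observeBlockPlacement(int numBlocks, int threadsPerBlock) {
    unsigned int* dSm = nullptr;
    cudaError_t err = cudaMalloc(&dSm, numBlocks * sizeof(unsigned int));
    assert(err == cudaSuccess);
    recordBlockPlacement<<<numBlocks, threadsPerBlock>>>(dSm);
    err = cudaGetLastError();
    assert(err == cudaSuccess);
    std::vector<unsigned int> hSm(numBlocks);
    // cudaMemcpy waits for the kernel to finish before copying.
    err = cudaMemcpy(hSm.data(), dSm, numBlocks * sizeof(unsigned int),
                     cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(dSm);
    return hSm;
}

// Print one line per block, e.g. for the 60-block case from the question.
void printBlockPlacement(const std::vector<unsigned int>& smOfBlock) {
    for (size_t b = 0; b < smOfBlock.size(); ++b) {
        printf("block %2zu -> SM %u\n", b, smOfBlock[b]);
    }
}
```

Calling `printBlockPlacement(observeBlockPlacement(60, 128))` reproduces the question's 60-block scenario. You can observe the placement this way, but you cannot control it. That is why load balancing is usually done in software rather than by relying on a particular block-to-MP order.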