Is there any information on how the GPU maps thread blocks to multiprocessors (MPs)?
For example, if a GPU has 30 MPs and 60 blocks are available for execution, would block 0 be assigned to MP0, block 1 to MP1, …, block 30 to MP0, block 31 to MP1, and so on?
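The round-robin pattern guessed above can be written down as a small model. This is only a sketch of the hypothesis in the question. The CUDA programming model does not document or guarantee how the hardware scheduler assigns blocks to MPs, or in what order. The function name `round_robin_assignment` is hypothetical.

```python
# Hypothetical model only: CUDA does not document or guarantee how the
# hardware block scheduler assigns thread blocks to multiprocessors (MPs).
def round_robin_assignment(num_blocks, num_mps):
    """Map block b to MP (b % num_mps), the pattern guessed in the question."""
    assignment = {mp: [] for mp in range(num_mps)}
    for block in range(num_blocks):
        assignment[block % num_mps].append(block)
    return assignment


if __name__ == "__main__":
    mapping = round_robin_assignment(num_blocks=60, num_mps=30)
    print(mapping[0])  # [0, 30]
    print(mapping[1])  # [1, 31]
```

To see what a particular GPU actually does, one option is to have each block record the PTX special register `%smid` (the ID of the MP it runs on) and inspect the results on the host. The result should be treated as an observation about that device and driver, not a guarantee.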
The reason I am interested is to distribute the load evenly across MPs in so-called fat kernels. Imagine a fat kernel that executes two sections of code, one of which is more time-consuming than the other. We want to make sure that each MP executes one block from each section so that the load is evenly balanced.
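Why the mapping matters for load balance can be illustrated with a toy cost model. This sketch assumes blocks 0 to 29 run the expensive section and blocks 30 to 59 the cheap one. It also assumes illustrative costs of 3 and 1 units and that each MP runs its assigned blocks back to back. It compares the round-robin mapping from the question with a hypothetical contiguous mapping. It ignores multiple resident blocks per MP and dynamic dispatch, so it says nothing about real hardware timing.

```python
# Hypothetical cost model for a "fat kernel": blocks 0..29 run the expensive
# section, blocks 30..59 run the cheap one. Costs are illustrative units.
NUM_MPS = 30
NUM_BLOCKS = 60
HEAVY_COST, LIGHT_COST = 3, 1


def block_cost(block):
    return HEAVY_COST if block < NUM_BLOCKS // 2 else LIGHT_COST


def round_robin(num_blocks, num_mps):
    # Block b -> MP b % num_mps (the mapping guessed in the question).
    return {mp: list(range(mp, num_blocks, num_mps)) for mp in range(num_mps)}


def contiguous(num_blocks, num_mps):
    # Block b -> MP b // per_mp: each MP gets a consecutive run of blocks.
    # Assumes num_blocks is a multiple of num_mps.
    per_mp = num_blocks // num_mps
    return {mp: list(range(mp * per_mp, (mp + 1) * per_mp)) for mp in range(num_mps)}


def makespan(assignment):
    # In this model the kernel finishes when the most loaded MP finishes.
    return max(sum(block_cost(b) for b in blocks) for blocks in assignment.values())


if __name__ == "__main__":
    print(makespan(round_robin(NUM_BLOCKS, NUM_MPS)))  # 4: each MP gets one heavy and one light block
    print(makespan(contiguous(NUM_BLOCKS, NUM_MPS)))   # 6: MPs 0..14 get two heavy blocks each
```

Under round-robin, every MP ends up with one block from each section. Under the contiguous mapping, half the MPs do all the heavy work. Because the real mapping is not guaranteed, a balance scheme that depends on it is fragile. Giving each block a share of both sections avoids relying on the scheduler at all.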