How is block assignment done? Let’s say I have a Tesla C1060 (30 SMs) and 103 thread blocks, and because of resource limitations only 5 thread blocks can be assigned to each SM. How will the blocks be assigned? 30-30-30-13 or 5-5-5-5-5-5-5-5…-3?
“But be aware that all these are undocumented, unsupported stuff. If you write your code based on that, you won’t get support” – This is what NVIDIA moderators (tmurray) keep saying.
Absolutely. While I don’t believe it will, it might change with every driver update.
Or just on Christmas Eve, if the NVIDIA programmers feel like it.
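If you want to see what your own card and driver actually do, one option is to have every block record the SM it ran on and then count blocks per SM on the host. Below is a minimal sketch of that idea, not a statement of how the scheduler works. It reads the PTX special register `%smid` through inline assembly; the PTX documentation describes that value as volatile, so treat it as a snapshot. The names `read_smid` and `record_sm`, the `CHECK` macro, and the block size of 64 are made up for this example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <map>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                    cudaGetErrorString(err_), __LINE__);              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Read the PTX special register %smid (ID of the SM running this thread).
__device__ unsigned int read_smid() {
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// One thread per block stores the SM ID that the block was observed on.
__global__ void record_sm(unsigned int *sm_of_block) {
    if (threadIdx.x == 0) {
        sm_of_block[blockIdx.x] = read_smid();
    }
}

int main() {
    const int num_blocks = 103;        // block count from the question
    const int threads_per_block = 64;  // arbitrary choice for this sketch

    unsigned int *d_sm = NULL;
    CHECK(cudaMalloc(&d_sm, num_blocks * sizeof(unsigned int)));

    record_sm<<<num_blocks, threads_per_block>>>(d_sm);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    std::vector<unsigned int> h_sm(num_blocks);
    CHECK(cudaMemcpy(h_sm.data(), d_sm, num_blocks * sizeof(unsigned int),
                     cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_sm));

    // Count how many blocks were observed on each SM ID.
    std::map<unsigned int, int> blocks_per_sm;
    for (int b = 0; b < num_blocks; ++b) {
        printf("block %3d -> SM %u\n", b, h_sm[b]);
        ++blocks_per_sm[h_sm[b]];
    }
    for (std::map<unsigned int, int>::const_iterator it = blocks_per_sm.begin();
         it != blocks_per_sm.end(); ++it) {
        printf("SM %2u ran %d block(s)\n", it->first, it->second);
    }
    return 0;
}
```

Keep in mind that this kernel uses almost no resources and finishes immediately, so it will not reproduce the 5-blocks-per-SM limit from the question, and an SM may pick up a new block as soon as an earlier one retires. Whatever pattern it prints is only an observation for one card and one driver version, which is exactly the kind of thing that can change.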