What is the minimum number of blocks that will be assigned per SM as of CUDA 10.1?
Zero.
I think I phrased my question badly. If I launch a grid with X blocks, where X is less than the number of SMs on my device, at most how many SMs will the grid use?
I remember reading somewhere that the SM-block assignment policy assigns at least 2 blocks per SM, which would make the answer X/2, but I might be wrong.
X. A block runs on exactly one SM, so X blocks can occupy at most X SMs.
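If you want to see where the blocks of a small grid actually land, a minimal sketch along these lines may help. It reads the PTX special register `%smid` from each block and counts the distinct values on the host. The names `record_sm`, `current_sm_id`, `count_distinct` and the value `X = 4` are made up for illustration, and the result is only an observation on one device and one run; the block scheduling policy itself is not documented, so the count can be anything from 1 to X.

```cuda
#include <cstdio>
#include <set>
#include <vector>
#include <cuda_runtime.h>

// Read the PTX special register %smid: the ID of the SM executing this thread.
__device__ unsigned int current_sm_id() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// One thread per block records which SM the block was placed on.
__global__ void record_sm(unsigned int *sm_of_block) {
    if (threadIdx.x == 0) {
        sm_of_block[blockIdx.x] = current_sm_id();
    }
}

// Host helper (hypothetical name): number of distinct SM IDs in the list.
static int count_distinct(const std::vector<unsigned int> &ids) {
    return static_cast<int>(std::set<unsigned int>(ids.begin(), ids.end()).size());
}

int main() {
    const int X = 4;  // assumed grid size; pick something below your SM count

    unsigned int *d_ids = nullptr;
    if (cudaMalloc(&d_ids, X * sizeof(unsigned int)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    record_sm<<<X, 32>>>(d_ids);

    // cudaMemcpy waits for the kernel to finish before copying.
    std::vector<unsigned int> h_ids(X);
    cudaError_t err = cudaMemcpy(h_ids.data(), d_ids, X * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_ids);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int b = 0; b < X; ++b) {
        std::printf("block %d ran on SM %u\n", b, h_ids[b]);
    }
    std::printf("%d block(s) used %d distinct SM(s)\n", X, count_distinct(h_ids));
    return 0;
}
```

Seeing X distinct SMs here would be consistent with the answer above, but it does not prove the scheduler always spreads blocks that way.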
Certainly there can be no requirement to assign two blocks per SM. Occupancy restrictions (e.g. a block that uses the maximum shared memory or the maximum number of registers available on an SM) may make it impossible for a second block to fit.
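To check the occupancy point for a particular kernel, the runtime can report how many blocks of it fit on one SM at once. The sketch below calls `cudaOccupancyMaxActiveBlocksPerMultiprocessor` for a toy kernel at a few dynamic shared memory sizes. The kernel `uses_shared`, the helper `min_smem_bytes`, the block size of 128 and the chosen sizes are all assumptions for illustration, and the numbers printed depend on the device. Nothing is launched; this only queries the limit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel that uses dynamic shared memory (one float per thread).
__global__ void uses_shared(float *out) {
    extern __shared__ float buf[];
    buf[threadIdx.x] = static_cast<float>(threadIdx.x);
    __syncthreads();
    out[blockIdx.x * blockDim.x + threadIdx.x] = buf[blockDim.x - 1 - threadIdx.x];
}

// Host helper (hypothetical name): smallest dynamic shared memory size,
// in bytes, that the toy kernel needs for a given block size.
static size_t min_smem_bytes(int block_size) {
    return static_cast<size_t>(block_size) * sizeof(float);
}

int main() {
    const int block_size = 128;  // assumed threads per block

    // Assumed request sizes: the minimum the kernel needs, then two larger ones.
    const size_t smem_sizes[] = {min_smem_bytes(block_size), 16 * 1024, 40 * 1024};

    for (size_t smem : smem_sizes) {
        int max_blocks = 0;
        cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &max_blocks, uses_shared, block_size, smem);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "occupancy query failed: %s\n",
                         cudaGetErrorString(err));
            return 1;
        }
        std::printf("dynamic smem %zu bytes -> up to %d resident block(s) per SM\n",
                    smem, max_blocks);
    }
    return 0;
}
```

If the largest request reports a single resident block per SM on your device, that is the situation described above: a second block cannot be placed on the same SM, so any "at least 2 per SM" rule could not hold.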