Relation between SM and block

My understanding is that if I launch N blocks on a GPU with n < N SMs, the N blocks wait in a queue for available SMs.
So even if the workload of each block is quite different, the SMs stay busy as long as there are enough blocks (assuming threads don't stall).
Is this correct?

E.g., 4 blocks * 32 threads/block, where
block 0 does 32 additions,
block 1 does 64 additions,
block 2 does 64 additions,
block 3 does 32 additions,
and there are 2 SMs on the GPU. The workload of the SMs is still balanced even though the workload of the blocks is not. Right?

No. For example, the Tesla C1060 has 30 SMs.
One SM can have
(1) at most 8 active thread blocks AND
(2) at most 1024 active threads.
If you use 32 threads/block, the block limit is reached first (8 * 32 = 256 threads, well under 1024), so one SM can hold 8 thread blocks at once.
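
The arithmetic behind those two limits can be sketched in plain host-side C++. This is only an illustration: the function name maxResidentBlocksPerSM is made up for this post, the defaults are the C1060 numbers quoted above, and register and shared-memory limits (which can lower the count further) are deliberately ignored.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: upper bound on thread blocks resident on one SM,
// taking only the two limits quoted above into account.
// Assumption: register and shared-memory usage are NOT modeled here.
int maxResidentBlocksPerSM(int threadsPerBlock,
                           int maxBlocksPerSM = 8,      // C1060 figure from the post
                           int maxThreadsPerSM = 1024)  // C1060 figure from the post
{
    assert(threadsPerBlock > 0);
    // Limit (2): how many whole blocks fit into the thread budget.
    int byThreads = maxThreadsPerSM / threadsPerBlock;
    // Limit (1) AND limit (2) must both hold, so take the smaller one.
    return std::min(maxBlocksPerSM, byThreads);
}
```

With 32 threads/block the block limit wins (8 blocks); with 256 threads/block the thread limit wins (1024 / 256 = 4 blocks).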

Suppose you launch 4 thread blocks; on the C1060 they would be scheduled onto 4 different SMs, not 2.

The hardware scheduler has no information about the workload of each block.
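
To make the discussion concrete, here is a toy host-side model in C++ of a dispatcher that knows nothing about block workloads. It is a sketch under loud assumptions, not a description of the real hardware: the names simulateDispatch and DispatchResult are invented for this post, the policy "next block goes to the SM with the fewest resident blocks" is an assumption (the actual policy is not documented), and co-resident blocks are assumed not to slow each other down.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct DispatchResult {
    std::vector<int>  smOfBlock;     // SM index each block was placed on
    std::vector<long> smFinishTime;  // time at which each SM retires its last block
};

// Index of the SM with the fewest resident blocks (ties go to the lowest index).
static int leastLoadedSM(const std::vector<int>& resident) {
    int best = 0;
    for (std::size_t s = 1; s < resident.size(); ++s)
        if (resident[s] < resident[best]) best = static_cast<int>(s);
    return best;
}

// Toy model: blocks are dispatched in launch order, without looking at their
// workload. ASSUMPTIONS: least-loaded placement, and a block's run time equals
// its work regardless of what else is resident on the same SM.
DispatchResult simulateDispatch(const std::vector<long>& blockWork,
                                int numSMs, int slotsPerSM) {
    assert(numSMs > 0 && slotsPerSM > 0);
    DispatchResult r;
    r.smOfBlock.assign(blockWork.size(), -1);
    r.smFinishTime.assign(numSMs, 0);

    std::vector<int> resident(numSMs, 0);
    using Event = std::pair<long, int>;  // (finish time, SM index)
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> running;
    long now = 0;

    for (std::size_t b = 0; b < blockWork.size(); ++b) {
        int target = leastLoadedSM(resident);
        if (resident[target] == slotsPerSM) {
            // Every slot on every SM is taken: the block waits in the queue
            // until the earliest running block retires.
            now = running.top().first;
            while (!running.empty() && running.top().first <= now) {
                --resident[running.top().second];
                running.pop();
            }
            target = leastLoadedSM(resident);
        }
        ++resident[target];
        long finish = now + blockWork[b];
        running.push({finish, target});
        r.smOfBlock[b] = target;
        r.smFinishTime[target] = std::max(r.smFinishTime[target], finish);
    }
    return r;
}
```

Calling simulateDispatch({32, 64, 64, 32}, 2, 1) models the question's picture (2 SMs, one block at a time): blocks 2 and 3 wait for a free SM and both SMs finish at time 96. Calling it with 30 SMs and 8 slots per SM places the 4 blocks on SMs 0 to 3 immediately, with no queueing at all, which is the situation described in the answer.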