A basic question about block scheduling

No. Once a threadblock has been launched on an SM, it stays on that SM and occupies a slot there until it is retired (i.e. the threadblock completes and terminates execution). If an SM has a full complement of threadblocks scheduled on it, one or more of those threadblocks must complete before any new threadblocks can be scheduled on that SM.
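
If you want to see this for yourself, here is a minimal sketch (not a definitive test). Each block records which SM it is on when it starts and again just before it exits, and the host compares the two. The names `current_smid`, `record_sm` and `blocks_stayed_on_their_sm` are made up for this illustration, and the spin count of 100000 cycles is an arbitrary assumption. It reads the PTX special register `%smid`, which reports where the thread is at the moment of the read, so in the special cases where a block is rescheduled the two values could differ.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Read the ID of the SM the calling thread is running on right now
// (PTX special register %smid).
__device__ unsigned int current_smid() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Illustrative kernel: thread 0 of each block records its SM at entry
// and again just before the block finishes.
__global__ void record_sm(unsigned int *sm_at_start, unsigned int *sm_at_end,
                          long long spin_cycles) {
    if (threadIdx.x == 0) sm_at_start[blockIdx.x] = current_smid();
    long long t0 = clock64();
    while (clock64() - t0 < spin_cycles) { }  // busy-wait so blocks stay resident a while
    __syncthreads();
    if (threadIdx.x == 0) sm_at_end[blockIdx.x] = current_smid();
}

// Returns true if every block reported the same SM at start and at end.
// Returns false on any CUDA error as well.
bool blocks_stayed_on_their_sm(int num_blocks, int threads_per_block) {
    unsigned int *d_start = nullptr, *d_end = nullptr;
    size_t bytes = num_blocks * sizeof(unsigned int);
    if (cudaMalloc(&d_start, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_end, bytes) != cudaSuccess) { cudaFree(d_start); return false; }

    record_sm<<<num_blocks, threads_per_block>>>(d_start, d_end, 100000);

    // cudaMemcpy waits for the kernel and also surfaces any launch/execution error.
    std::vector<unsigned int> h_start(num_blocks), h_end(num_blocks);
    cudaError_t err = cudaMemcpy(h_start.data(), d_start, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_end.data(), d_end, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_start);
    cudaFree(d_end);
    if (err != cudaSuccess) return false;

    for (int b = 0; b < num_blocks; ++b)
        if (h_start[b] != h_end[b]) return false;
    return true;
}

int main() {
    const int threads = 128;
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;

    // How many blocks of this kernel can be resident on one SM at once.
    int max_blocks_per_sm = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(&max_blocks_per_sm, record_sm,
                                                      threads, 0) != cudaSuccess) return 1;

    // Launch about 4x more blocks than fit at once, so later blocks have to
    // wait for earlier ones to retire before they get a slot.
    int num_blocks = 4 * max_blocks_per_sm * prop.multiProcessorCount;
    printf("SMs: %d, resident blocks per SM: %d, blocks launched: %d\n",
           prop.multiProcessorCount, max_blocks_per_sm, num_blocks);

    bool ok = blocks_stayed_on_their_sm(num_blocks, threads);
    printf("every block stayed on its SM: %s\n", ok ? "yes" : "no");
    return ok ? 0 : 1;
}
```

Something like `nvcc -o smid_check smid_check.cu` should build it (the file name is arbitrary). The grid is deliberately oversubscribed so that the "full complement" case described above actually occurs: the blocks that do not fit initially only start once a resident block has finished.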

There are special situations in which a threadblock does get rescheduled (preemption is one example), but for a basic understanding of CUDA these cases can usually be ignored.