Can multiple executions of the same kernel run on the same SM? That is, if a kernel is configured with a 256-thread block and only one thread block (a 1x1 grid), are there circumstances in which two launches of it would end up on the same SM and execute concurrently?
It's theoretically possible. For example, there exist GPUs with only a single SM. If you issue two separate kernel launches into two separate streams, then it's possible that threadblocks from each launch could be co-resident on that SM, subject to occupancy, timing, and perhaps other factors.
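If you want to check what actually happens on a particular GPU, one approach is to launch the same single-block kernel twice into two non-blocking streams and have each block record which SM it ran on and when. The sketch below assumes that reading the PTX special registers `%smid` and `%globaltimer` via inline assembly is acceptable for a diagnostic. `BlockRecord`, `spin_kernel`, `run_two_kernels`, and the spin length are hypothetical names and values chosen for illustration. On a multi-SM GPU the scheduler will usually place the two blocks on different SMs, so "same SM: no" is the expected common outcome. Co-residency and overlap are not guaranteed either way.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Per-launch record written by thread 0 of the single block (hypothetical layout).
struct BlockRecord {
    unsigned int smid;        // SM the block ran on, from the %smid register
    unsigned long long start; // %globaltimer (ns) when thread 0 started
    unsigned long long end;   // %globaltimer (ns) when thread 0 finished
};

// %smid identifies the SM; the PTX docs note it is diagnostic only and may not be contiguous.
__device__ unsigned int read_smid() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// %globaltimer is a device-wide nanosecond timer, so values are comparable across SMs
// (unlike clock64(), which is a per-SM cycle counter).
__device__ unsigned long long read_globaltimer() {
    unsigned long long t;
    asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t));
    return t;
}

// Busy-wait long enough that a launch in another stream has a chance to overlap.
__global__ void spin_kernel(BlockRecord* rec, long long spin_cycles) {
    unsigned long long t_start = read_globaltimer();
    long long c0 = clock64();
    while (clock64() - c0 < spin_cycles) { }
    if (threadIdx.x == 0) {
        rec->smid = read_smid();
        rec->start = t_start;
        rec->end = read_globaltimer();
    }
}

// Launch the same kernel twice (1 block x 256 threads each) into two streams.
bool run_two_kernels(BlockRecord* a, BlockRecord* b) {
    BlockRecord* d_rec = nullptr;
    if (cudaMalloc(&d_rec, 2 * sizeof(BlockRecord)) != cudaSuccess) return false;

    cudaStream_t s0, s1;
    cudaStreamCreateWithFlags(&s0, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&s1, cudaStreamNonBlocking);

    const long long spin = 10000000LL; // roughly milliseconds at typical GPU clocks (assumption)
    spin_kernel<<<1, 256, 0, s0>>>(d_rec + 0, spin);
    spin_kernel<<<1, 256, 0, s1>>>(d_rec + 1, spin);

    cudaError_t err = cudaGetLastError();            // launch-configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize(); // execution errors

    BlockRecord h[2];
    if (err == cudaSuccess) err = cudaMemcpy(h, d_rec, sizeof(h), cudaMemcpyDeviceToHost);

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(d_rec);
    if (err != cudaSuccess) return false;

    *a = h[0];
    *b = h[1];
    return true;
}

int main() {
    BlockRecord a, b;
    if (!run_two_kernels(&a, &b)) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    bool same_sm = (a.smid == b.smid);
    bool overlapped = (a.start < b.end) && (b.start < a.end); // time intervals intersect
    printf("launch A: SM %u, launch B: SM %u\n", a.smid, b.smid);
    printf("same SM: %s, overlapping in time: %s\n",
           same_sm ? "yes" : "no", overlapped ? "yes" : "no");
    return 0;
}
```

Both "yes" answers at once would indicate the two launches were co-resident on one SM. Whether that happens depends on the SM count, the kernel's resource usage (occupancy), and launch timing, so treat a single run as an observation rather than a guarantee.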