hardware scheduling logic on the GPU

I read a Stack Overflow post, "cuda - Limitations of threads and blocks and execution of threads and blocks", which consists of a question and the corresponding answer. I quote them here:
“Question: Basic question about the number of blocks per grid in the execution configuration: will the grid consume all of the SMs, or a single SM?

Answer: Yes, the grid associated with any kernel launch can use any or all of the SMs in a GPU. This is handled by hardware scheduling logic on the GPU and you should not concern yourself with the details of it. The GPU will attempt to best schedule your blocks on available SMs to maximize throughput.”

My questions are: where can I find a reference for the GPU's hardware scheduling logic, and can I assign only one SM to a kernel programmatically?
Many thanks in advance!

I don't think NVIDIA has publicly published the logic for scheduling tasks on the GPU.
That said, it is probably a simple one, such as blocks with lower IDs being scheduled
before blocks with higher IDs.
I think people have shown in the past that this is fairly deterministic, by launching blocks,
collecting each block's SM ID into a structure ordered by block ID, and then printing the order
(you can even use printf to test this).
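
For example, a minimal sketch of that experiment might look like this (the kernel name, the grid size, and the smid_of_block buffer are illustrative choices of mine; the SM ID is read via the PTX special register %smid through inline assembly):

```
#include <cstdio>
#include <cuda_runtime.h>

// Read the SM ID of the current block via the PTX special register %smid.
__device__ unsigned int get_smid() {
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// Each block records the SM it was scheduled on, indexed by block ID,
// so the host can print the block-ID -> SM-ID mapping in order afterwards.
__global__ void record_smid(unsigned int *smid_of_block) {
    if (threadIdx.x == 0) {
        smid_of_block[blockIdx.x] = get_smid();
    }
}

int main() {
    const int num_blocks = 64;  // illustrative grid size
    unsigned int *d_smids, h_smids[num_blocks];
    cudaMalloc(&d_smids, num_blocks * sizeof(unsigned int));

    record_smid<<<num_blocks, 32>>>(d_smids);
    cudaMemcpy(h_smids, d_smids, num_blocks * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);

    // Print the mapping in block-ID order to see how blocks were placed.
    for (int b = 0; b < num_blocks; ++b) {
        printf("block %2d ran on SM %u\n", b, h_smids[b]);
    }
    cudaFree(d_smids);
    return 0;
}
```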

As for whether you can assign only one SM: either launch a single block, or check
the PTX special register %smid inside the kernel and have blocks that land on other SMs
exit immediately (this also relates to the explanation above regarding the scheduling logic).
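
Here is a hedged sketch of that %smid trick (TARGET_SM and the launch dimensions are my own illustrative assumptions; note that this only filters blocks after the hardware has placed them rather than truly assigning an SM, so you must oversubscribe with blocks and keep the work idempotent):

```
#include <cuda_runtime.h>

// Hypothetical sketch: launch more blocks than there are SMs, then let only
// blocks that happened to land on one chosen SM do any work. This does not
// truly *assign* an SM -- the hardware still schedules the blocks; we merely
// make every block on the other SMs exit immediately.
__global__ void one_sm_kernel(float *out, int n) {
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));  // read PTX %smid

    const unsigned int TARGET_SM = 0;  // illustrative choice of SM
    if (smid != TARGET_SM) return;     // discard blocks on all other SMs

    // Idempotent body: several blocks may be resident on the target SM,
    // so the work must tolerate being done more than once.
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        out[i] = 2.0f * i;
    }
}

int main() {
    const int n = 1024;
    float *d_out;
    cudaMalloc(&d_out, n * sizeof(float));

    // Oversubscribe with many blocks so at least one lands on the target SM.
    one_sm_kernel<<<256, 128>>>(d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_out);
    return 0;
}
```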

eyal

eyalhir74, many thanks; I will try it.