I have a question: if I launch exactly 2 blocks, will they be placed on two different SMs, or will both be put on one SM? I think that if they are placed on two SMs, they can execute concurrently. Of course, it is also possible that placing them on one SM would hide each block's latency better. So I simply wonder how the GPU handles 2 blocks.
As far as I know, they are placed on different SMs.
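If you want to check this on your own card, here is a minimal sketch that has each block report the SM it ran on. It assumes the PTX special register `%smid` is readable through inline assembly; the names `current_sm_id`, `record_sm` and `query_block_sms` are made up for this example. The placement you observe is whatever the scheduler chose on that particular run, so treat it as an observation rather than a guarantee.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read the ID of the SM the calling thread is running on
// from the PTX special register %smid.
__device__ unsigned int current_sm_id() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Hypothetical kernel: thread 0 of each block records its SM ID.
__global__ void record_sm(unsigned int *sm_of_block) {
    if (threadIdx.x == 0) {
        sm_of_block[blockIdx.x] = current_sm_id();
    }
}

// Launch num_blocks blocks of one thread each and copy the
// recorded SM IDs into host_out (must hold num_blocks entries).
cudaError_t query_block_sms(int num_blocks, unsigned int *host_out) {
    unsigned int *dev = nullptr;
    size_t bytes = num_blocks * sizeof(unsigned int);
    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err != cudaSuccess) return err;

    record_sm<<<num_blocks, 1>>>(dev);
    err = cudaGetLastError();
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(host_out, dev, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    return err;
}

int main() {
    unsigned int sm[2];
    cudaError_t err = query_block_sms(2, sm);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("block 0 -> SM %u, block 1 -> SM %u\n", sm[0], sm[1]);
    return 0;
}
```

If the two printed IDs differ, the two blocks were placed on different SMs for that launch.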
So the processing time for 1 block with 1 thread would be almost the same as for 2 blocks with 1 thread each, given that the 2 threads are identical and totally independent. Is that true?
Yes, and moreover, 32 threads in a block will have the same running time as 1 thread, due to the way the hardware is pipelined: instructions are issued for a whole warp of 32 threads at a time.
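You can measure this yourself. Below is a rough sketch that times the same kernel for three launch shapes (1 block of 1 thread, 2 blocks of 1 thread, 1 block of 32 threads) using CUDA events. The kernel `busy`, the helper `time_launch` and the iteration count are placeholders I picked for illustration; the absolute numbers depend on your GPU and on launch overhead, so only compare them against each other.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical workload: each thread runs a dependent chain of
// multiply-adds, so its running time scales with iters.
__global__ void busy(float *out, int iters) {
    float x = threadIdx.x + 1.0f;
    for (int i = 0; i < iters; ++i) {
        x = x * 1.000001f + 0.5f;
    }
    // Store the result so the compiler cannot remove the loop.
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;
}

// Time one launch of busy<<<blocks, threads>>> in milliseconds.
// Returns a negative value if a CUDA call fails.
float time_launch(int blocks, int threads, int iters) {
    float *out = nullptr;
    if (cudaMalloc(&out, blocks * threads * sizeof(float)) != cudaSuccess) {
        return -1.0f;
    }
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    busy<<<blocks, threads>>>(out, iters);  // warm-up launch, not timed
    cudaEventRecord(start);
    busy<<<blocks, threads>>>(out, iters);  // timed launch
    cudaEventRecord(stop);

    float ms = -1.0f;
    if (cudaEventSynchronize(stop) == cudaSuccess) {
        cudaEventElapsedTime(&ms, start, stop);
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(out);
    return ms;
}

int main() {
    const int iters = 1 << 20;  // arbitrary amount of work per thread
    printf("1 block  x  1 thread : %.3f ms\n", time_launch(1, 1, iters));
    printf("2 blocks x  1 thread : %.3f ms\n", time_launch(2, 1, iters));
    printf("1 block  x 32 threads: %.3f ms\n", time_launch(1, 32, iters));
    return 0;
}
```

If the three times come out close to each other, that matches the explanation above: the extra block runs concurrently, and the extra 31 threads fill lanes of a warp that would be issued anyway.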