When a warp is issued to an SM, are all of its threads executed on one core (in which case 8 warps could execute simultaneously on one SM), or are they divided among all the cores of the SM?
I guess they are divided among the cores. If so, can someone please explain the sequence in which the threads of a warp are issued to the cores?
(Assuming 8 cores/SM) Is it like:
Approach 1:
thread 0, 1, 2 … 7 on core 1 (one after the other), thread 8, 9, 10 … 15 on core 2, thread 16, 17 … 23 on core 3, thread 24 … 31 on core 4
thread 0, 1, 2 … 7 on core 5, thread 8, 9, 10 … 15 on core 6, thread 16, 17 … 23 on core 7, thread 24 … 31 on core 8
This approach would ensure that two warps are executed simultaneously on one SM.
Approach 2:
All the threads of a single warp are distributed across all the cores of one SM. Each warp is divided into 4 parts.
thread 0, 8, 16, 24 on core 1 (one after the other), thread 1, 9, 17, 25 on core 2, and so on …
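To make the two candidate layouts above concrete (consecutive blocks of 8 threads per core, versus threads strided by 8 across all cores), here is a small Python sketch that just enumerates them. It only models the two hypotheses in this question under the assumed 8 cores/SM and 32 threads/warp; the function names are made up for illustration, and it says nothing about what the hardware actually does.

```python
WARP_SIZE = 32      # threads per warp
CORES_PER_SM = 8    # assumption taken from the question


def blocked_core(thread_id, warp_slot):
    """First layout: a warp uses 4 cores, 8 consecutive threads per core.

    warp_slot is 0 for the warp on cores 1-4 and 1 for the warp on cores 5-8.
    Returns a 1-based core number.
    """
    return warp_slot * 4 + thread_id // 8 + 1


def strided_core(thread_id):
    """Second layout: one warp uses all 8 cores; core c runs threads
    c-1, c-1+8, c-1+16, c-1+24. Returns a 1-based core number.
    """
    return thread_id % CORES_PER_SM + 1


def threads_per_core(mapping):
    """Group the 32 thread ids of one warp by the core they map to."""
    cores = {}
    for t in range(WARP_SIZE):
        cores.setdefault(mapping(t), []).append(t)
    return cores


layout1_warp_a = threads_per_core(lambda t: blocked_core(t, 0))
layout1_warp_b = threads_per_core(lambda t: blocked_core(t, 1))
layout2 = threads_per_core(strided_core)

if __name__ == "__main__":
    for name, layout in [("layout 1, warp A", layout1_warp_a),
                         ("layout 1, warp B", layout1_warp_b),
                         ("layout 2", layout2)]:
        print(name)
        for core in sorted(layout):
            print(f"  core {core}: threads {layout[core]}")
```

In both layouts each core handles 4 or 8 threads of a warp one after the other; the difference is only whether one warp occupies half of the cores or all of them.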
I guess it should be approach 2, but I am not sure. Can someone please help me with this?
Thanks and Regards