I learnt that a warp is a group of threads executing the same instruction. So a thread block of 512 threads is broken down into
512/32 = 16 warps (assuming a warp size of 32).
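To make that arithmetic concrete, here is a minimal sketch of the warp count. The function name `warps_per_block` is my own, and the default warp size of 32 is the assumption stated above. It rounds up, since a block whose size is not a multiple of the warp size still gets a partially filled last warp.

```python
def warps_per_block(threads_per_block, warp_size=32):
    """Number of warps a thread block is split into (hypothetical helper)."""
    # Ceiling division: a partially filled last warp still counts as a warp.
    return (threads_per_block + warp_size - 1) // warp_size


if __name__ == "__main__":
    print(warps_per_block(512))  # 512 / 32 = 16
```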
Now suppose I have a GPU with an SM (a hypothetical one I made up) containing 8 SPs. How are the warps allocated?
Is it 1 warp allocated to the SM for execution (case 1), or
1 warp per SP, making 8 warps for the SM (case 2)?
Assuming case 1 is the right understanding, I can execute 8 threads from the “same” warp concurrently, since 8 threads of one warp run on the 8 SPs.
Assuming case 2 is the right understanding, I can execute 8 threads concurrently, each belonging to one of “8 different warps”.
For both cases I am also assuming that an SP, at its most basic level, can handle only one thread at a time, so 8 SPs handling 8 threads gives a concurrency of 8.
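To show what I mean by the two cases, here is a rough sketch of which thread indices would be on the 8 SPs at a given moment under each guess. It only pictures my two hypotheses and says nothing about which one real hardware follows. The names `case1_threads_in_flight`, `case2_threads_in_flight`, and the idea of a discrete `step` are all mine, as is the assumption that threads are numbered consecutively within a warp.

```python
WARP_SIZE = 32  # assumed warp size
NUM_SPS = 8     # SPs in my hypothetical SM


def case1_threads_in_flight(warp_id, step):
    """Case 1: the SM works on one warp, 8 consecutive threads per step."""
    first = warp_id * WARP_SIZE + step * NUM_SPS
    return [first + sp for sp in range(NUM_SPS)]


def case2_threads_in_flight(step):
    """Case 2: each SP holds a different warp and runs one of its threads per step."""
    return [sp * WARP_SIZE + step for sp in range(NUM_SPS)]


if __name__ == "__main__":
    # Case 1: warp 0 would need 32 / 8 = 4 steps to issue one instruction.
    for step in range(WARP_SIZE // NUM_SPS):
        print("case 1, step", step, case1_threads_in_flight(0, step))
    # Case 2: thread `step` of warps 0..7 would run side by side.
    print("case 2, step 0", case2_threads_in_flight(0))
```

In both pictures only 8 threads are running at any instant. The difference is whether they come from one warp or from eight.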
I don't plan to go deeper than this level of understanding, at least for now. Enlightening me on this would be a great help :)