I am having some trouble understanding threads in the NVIDIA GPU architecture with CUDA.
Could anybody please clarify the following? An 8800 GPU has 16 SMs with 8 SPs each, so we have 128 SPs in total.
I was watching Stanford's video presentation, and it said that every SP is capable of running 96 threads concurrently. Does this mean that an SP can run 96/32 = 3 warps concurrently?
Moreover, since every SP can run 96 threads and there are 8 SPs in every SM, does this mean that every SM can run 96*8 = 768 threads concurrently? But if every SM can run only a single block at a time, and the maximum number of threads in a block is 512, what is the purpose of being able to run 768 threads concurrently when a block holds at most 512?
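To make the arithmetic behind my question concrete, here is a minimal C++ sketch of how I am computing these numbers. The constants are only the figures quoted in this post (the 96 threads per SP comes from the video as I understood it), and the function names are my own, so please treat all of it as my assumptions rather than confirmed hardware facts.

```cpp
#include <cstdio>

// Figures quoted in this post. They are assumptions taken from the video
// and my reading, not values verified against NVIDIA documentation.
constexpr int kSMs = 16;                  // SMs on the 8800 GPU
constexpr int kSPsPerSM = 8;              // SPs in each SM
constexpr int kThreadsPerSP = 96;         // threads per SP, as I understood the video
constexpr int kWarpSize = 32;             // threads in one warp
constexpr int kMaxThreadsPerBlock = 512;  // maximum threads in a block

// Total number of SPs on the chip.
constexpr int totalSPs() { return kSMs * kSPsPerSM; }

// Warps that would fit in the 96 threads of one SP.
constexpr int warpsPerSP() { return kThreadsPerSP / kWarpSize; }

// Threads one SM would hold if every SP held 96 threads.
constexpr int threadsPerSM() { return kThreadsPerSP * kSPsPerSM; }

// Threads left over in an SM if it held only one full-size block.
constexpr int leftoverThreadsPerSM() { return threadsPerSM() - kMaxThreadsPerBlock; }

// Prints the four derived numbers.
void printSummary() {
    std::printf("total SPs: %d\n", totalSPs());
    std::printf("warps per SP: %d\n", warpsPerSP());
    std::printf("threads per SM: %d\n", threadsPerSM());
    std::printf("leftover threads per SM: %d\n", leftoverThreadsPerSM());
}
```

Calling `printSummary()` from `main` prints the numbers I am asking about; the last one is the gap between 768 and 512 that confuses me.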
A more general question: how are blocks, threads, and warps distributed to SMs and SPs? I read that every SM gets a single block to execute at a time, that the threads in a block are divided into warps (32 threads each), and that SPs execute warps.
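This is how I currently picture the split of a block into warps, written as a small C++ sketch. It assumes a one-dimensional block in which consecutive thread indices are grouped into warps of 32; the helper names are hypothetical and mine. It says nothing about how warps are then assigned to SPs, which is the part I do not understand.

```cpp
#include <cstdio>

// Threads in one warp, as stated in this post.
constexpr int kThreadsPerWarp = 32;

// Number of warps needed to cover a block, rounding up so that a
// partially filled last warp is still counted.
constexpr int warpsInBlock(int threadsPerBlock) {
    return (threadsPerBlock + kThreadsPerWarp - 1) / kThreadsPerWarp;
}

// Index of the warp that a given thread of a 1D block falls into,
// assuming consecutive threads are grouped together.
constexpr int warpOfThread(int threadIndex) {
    return threadIndex / kThreadsPerWarp;
}

// Position of the thread inside its warp (0..31).
constexpr int laneOfThread(int threadIndex) {
    return threadIndex % kThreadsPerWarp;
}

// Prints the warp and lane of one thread, for inspection.
void printThreadPlacement(int threadIndex) {
    std::printf("thread %d -> warp %d, lane %d\n",
                threadIndex, warpOfThread(threadIndex), laneOfThread(threadIndex));
}
```

Under these assumptions a full 512-thread block would be 16 warps, and thread 100 would sit in warp 3 at lane 4.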