Warp size depending on the number of SP/SM


I’m trying to understand how NVIDIA GPUs schedule threads, and I read that an SP can load 4 threads at once. That would explain why warps on a GPU whose SMs have 8 CUDA cores have a size of 32 (4*8). Is the warp size a function of the SP/SM ratio? If an SM has 16 SPs, do warps have a size of 64?

Thanks for your help,

Best regards.

The warp size is 32 on every generation of CUDA-compatible hardware there is, regardless of how many cores an SM has. Threads are always scheduled in warps of 32. On GPUs with 8 cores per SM, an instruction is effectively executed 4 times on the 8 cores to service the 32 threads in a warp. On GPUs with 32 cores per SM, an instruction is executed twice on 16 cores for a warp of 32 threads, with two warps active at a time per SM.
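
To make the arithmetic concrete, here is a minimal sketch of the relationship described above. The helper names `passes_per_warp` and `concurrent_warps` are made up for illustration and are not part of any CUDA API; the model simply assumes a fixed group of cores works on one warp at a time.

```cpp
#include <cassert>

// The warp size is fixed at 32 threads; it does not change with the core count.
constexpr int kWarpSize = 32;

// Hypothetical helper: how many times one instruction must be issued
// to cover a full warp when `lanes` cores work on that warp together.
constexpr int passes_per_warp(int lanes) {
    return kWarpSize / lanes;
}

// Hypothetical helper: how many warps can be serviced at the same time
// when an SM's cores are split into groups of `lanes` cores each.
constexpr int concurrent_warps(int cores_per_sm, int lanes) {
    return cores_per_sm / lanes;
}
```

With these assumptions, `passes_per_warp(8)` gives 4 for the 8-core SM case, while `passes_per_warp(16)` gives 2 and `concurrent_warps(32, 16)` gives 2 for the 32-core SM case. Only the number of passes changes with the core count, not the warp size. On a real device the warp size can be read from the `warpSize` field of `cudaDeviceProp` rather than hard-coded.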