How do the 16 INT cores in a processing block of an SM execute when 32 integers in a warp are calculated?

Robert_Crovella · July 14, 2023, 1:45pm

Yes. When a threadblock is deposited on an SM by the CWD/block scheduler, the warps in that threadblock are statically assigned to SMSPs (SM sub-partitions). Each sub-partition has a single warp scheduler, so this is equivalent to saying the warps are statically assigned to the warp schedulers. If there is only one warp scheduler, all warps are assigned to it. If there are two warp schedulers, about half of the warps are assigned to each (assuming the SM is initially empty). If there are 4 warp schedulers in the SM, again assuming an initially empty SM, the warps are distributed approximately 1/4 to each warp scheduler. Certain functional unit resources in an SM are also partitioned between the SMSPs. So an SM with 64 “CUDA cores” and 4 warp schedulers means that each SMSP/warp scheduler actually has only 16 “CUDA cores” to use or issue instructions to.
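To make the partitioning arithmetic concrete, here is a minimal host-side C++ sketch. The function names are hypothetical, and the round-robin rule (warp index modulo the number of schedulers) is only an assumption chosen to reproduce the "approximately 1/4 each" distribution described above; the actual assignment policy is up to the hardware.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: counts how many warps of one threadblock land on each
// warp scheduler (SMSP) of an initially empty SM.
// ASSUMPTION: round-robin by warp index. This is an illustration only, not a
// documented hardware policy.
std::vector<int> warps_per_scheduler(int warps_in_block, int num_schedulers) {
    assert(warps_in_block >= 0 && num_schedulers > 0);
    std::vector<int> counts(num_schedulers, 0);
    for (int warp = 0; warp < warps_in_block; ++warp) {
        counts[warp % num_schedulers] += 1;  // static assignment, never migrates
    }
    return counts;
}

// Hypothetical helper: functional units of one type available to a single
// SMSP, assuming the SM's units are split evenly between sub-partitions.
int units_per_smsp(int units_per_sm, int num_schedulers) {
    assert(num_schedulers > 0 && units_per_sm % num_schedulers == 0);
    return units_per_sm / num_schedulers;
}
```

Under these assumptions, `units_per_smsp(64, 4)` gives the 16 "CUDA cores" per scheduler mentioned above, and a 10-warp threadblock on a 4-scheduler SM would be split 3, 3, 2, 2.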

A warp scheduler always schedules (i.e. issues) instructions warp-wide. Any time a warp scheduler needs to schedule an instruction for which fewer than 32 of the corresponding functional units are available, it will schedule that instruction over multiple clock cycles. If there are 16 units available, it will take 2 cycles. If there are 8 units available, it will take 4 cycles. If there are 4 units available, it will take 8 cycles, and if there are 2 units available (as can be the case for an FP64 instruction) it will take 16 cycles to schedule that instruction.
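The cycle counts above follow from dividing the warp width by the number of available units. A minimal C++ sketch of that arithmetic, with a hypothetical function name and covering only issue cycles (not instruction latency or any other pipeline effect):

```cpp
#include <cassert>

const int kWarpSize = 32;  // threads per warp

// Hypothetical helper: clock cycles a warp scheduler needs to issue one
// warp-wide instruction when `units_available` functional units of the
// required type exist in its sub-partition.
int issue_cycles(int units_available) {
    assert(units_available > 0);
    if (units_available >= kWarpSize) {
        return 1;  // enough units to cover the whole warp in one cycle
    }
    // Round up so that a partial final pass still costs a full cycle.
    return (kWarpSize + units_available - 1) / units_available;
}
```

For the case in the question, `issue_cycles(16)` is 2: the 32 integer operations of the warp are issued to the 16 INT units over two cycles.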
