Are Turing's CUDA cores divided into 4 partitions managed by 4 warp schedulers?

439290087 · February 21, 2022, 9:22am

Assuming a job needs only 32 threads to execute, what is the best schedule for performance:
executing all the work in one warp, or spreading it across 4 warps, each with 8 active threads?

I’m not sure whether a warp scheduler is able to access all the CUDA cores within one SM.

Robert_Crovella · February 23, 2022, 8:16pm

It’s not clear what you are proposing.

If you are suggesting launching a thread block of 32 threads that is composed of 4 warps of 8 threads each, that simply cannot be done. Warps are formed from 32 consecutive threads, so a block of 32 threads is always a single warp.
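A minimal sketch of how one could check this, assuming a 1D thread block: each thread records the warp and lane it falls into. The kernel name `record_warp_layout` and the buffer names are hypothetical, chosen only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread records which warp and which lane it occupies.
// For a 1D block, warps are built from consecutive thread indices.
__global__ void record_warp_layout(int *warp_of_thread, int *lane_of_thread) {
    int tid = threadIdx.x;
    warp_of_thread[tid] = tid / warpSize;
    lane_of_thread[tid] = tid % warpSize;
}

int main() {
    const int threads = 32;  // one block of 32 threads
    int *d_warp = nullptr, *d_lane = nullptr;
    cudaMalloc(&d_warp, threads * sizeof(int));
    cudaMalloc(&d_lane, threads * sizeof(int));

    record_warp_layout<<<1, threads>>>(d_warp, d_lane);

    int h_warp[threads], h_lane[threads];
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_warp, d_warp, sizeof(h_warp), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_lane, d_lane, sizeof(h_lane), cudaMemcpyDeviceToHost);

    for (int i = 0; i < threads; ++i)
        printf("thread %2d -> warp %d, lane %2d\n", i, h_warp[i], h_lane[i]);

    cudaFree(d_warp);
    cudaFree(d_lane);
    return 0;
}
```

Launching with 128 threads instead (and enlarging the host arrays accordingly) would show four warps, which is the only way to get 4 warps in one block.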

If you are proposing to launch a thread block of 128 threads (4 warps), where each warp has only 8 active threads, I don’t know of any reason that would be faster than a single warp of 32 threads in the general case.

A warp scheduler in Turing is not able to access all the CUDA cores in the SM, but I see no reason why breaking a schedulable instruction for 32 CUDA cores into 4 instructions each of which requires 32 CUDA cores would provide any benefit.

Anyway, you could always benchmark it.
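One way such a benchmark might look, as a rough sketch rather than a definitive measurement: the same 32 results are computed once by a single full warp and once by 4 warps with 8 active lanes each, and both variants are timed with CUDA events. The names `work`, `masked_kernel`, and `time_launch`, the arithmetic workload, and the iteration counts are all assumptions made for illustration; a real job would substitute its own kernel body.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in workload: a dependent chain of multiply-adds.
__device__ float work(float x, int iters) {
    for (int i = 0; i < iters; ++i) x = x * 1.0001f + 0.5f;
    return x;
}

// Only the first `active_lanes` lanes of each warp do the work.
// Both configurations below produce the same 32 outputs.
__global__ void masked_kernel(float *out, int active_lanes, int iters) {
    int lane = threadIdx.x % warpSize;
    int warp = threadIdx.x / warpSize;
    if (lane < active_lanes) {
        int idx = warp * active_lanes + lane;
        out[idx] = work((float)idx, iters);
    }
}

// Average time per launch in milliseconds, measured with CUDA events.
static float time_launch(int block_threads, int active_lanes,
                         float *d_out, int iters, int reps) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    masked_kernel<<<1, block_threads>>>(d_out, active_lanes, iters);  // warm-up

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        masked_kernel<<<1, block_threads>>>(d_out, active_lanes, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / reps;
}

int main() {
    const int iters = 1 << 16;  // assumed workload size
    const int reps = 100;       // assumed repetition count
    float *d_out = nullptr;
    cudaMalloc(&d_out, 32 * sizeof(float));

    float one_warp   = time_launch(32, 32, d_out, iters, reps);   // 1 warp, 32 active lanes
    float four_warps = time_launch(128, 8, d_out, iters, reps);   // 4 warps, 8 active lanes each

    printf("1 warp  x 32 lanes: %.4f ms per launch\n", one_warp);
    printf("4 warps x  8 lanes: %.4f ms per launch\n", four_warps);

    cudaFree(d_out);
    return 0;
}
```

A single tiny block like this is dominated by launch overhead and instruction latency, so the numbers only indicate whether the split makes any measurable difference for this particular workload. Error checking of the CUDA calls is omitted for brevity.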

439290087 · February 24, 2022, 2:47am

My initial assumption was that a warp with 8 active threads doesn’t occupy 32 CUDA cores, but perhaps only 8.

“breaking a schedulable instruction for 32 CUDA cores into 4 instructions each of which requires 32 CUDA cores”

If that is the case, it indeed doesn’t provide any benefit. Thanks for your explanation!
