How is a warp executed on an SM

673362907 · September 7, 2020, 10:47am

Dear all,

In the Turing GPU, an SM consists of 64 CUDA cores. The hardware resources are split into four portions, and each portion owns 16 CUDA cores plus 1 warp scheduler/dispatch unit.
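To make the layout concrete, here is a minimal sketch that models the split described above: 64 CUDA cores divided evenly into 4 portions, each with one warp scheduler. The `Partition` class and `split_sm` function are hypothetical names for illustration only, and the model covers just the core and scheduler counts stated in this post, nothing else in the SM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """One SM portion (processing block), using only the counts from the post."""
    cuda_cores: int
    warp_schedulers: int


def split_sm(total_cores: int, num_partitions: int) -> list[Partition]:
    """Split an SM's CUDA cores evenly across its portions, one scheduler each.

    Hypothetical helper: assumes an even split, as the post describes for Turing.
    """
    if total_cores % num_partitions != 0:
        raise ValueError("cores must divide evenly across partitions")
    per_partition = total_cores // num_partitions
    return [Partition(cuda_cores=per_partition, warp_schedulers=1)
            for _ in range(num_partitions)]


turing_sm = split_sm(total_cores=64, num_partitions=4)
print(turing_sm[0])  # Partition(cuda_cores=16, warp_schedulers=1)
```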

A warp has 32 threads, so how is a ready warp executed on one portion? If the next instruction for this warp is a floating-point instruction and all the resources it needs are ready, the warp completes that instruction for all 32 threads over two clock cycles, right?
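The two-clock reasoning in the question can be written out as simple arithmetic. This is a sketch of the asker's assumed model, not NVIDIA's documented pipeline: it assumes each of the 16 cores processes one thread per cycle, ignores pipeline latency, and assumes (hypothetically) that threads 0-15 go in the first cycle and threads 16-31 in the second. `issue_cycles` and `lane_schedule` are illustrative names.

```python
import math

WARP_SIZE = 32


def issue_cycles(warp_size: int, lanes: int) -> int:
    """Cycles to push one warp instruction through `lanes` cores,
    assuming each core handles one thread per cycle (simplified model)."""
    return math.ceil(warp_size / lanes)


def lane_schedule(warp_size: int, lanes: int) -> list[list[int]]:
    """Thread IDs handled in each cycle under the same simplified model.

    The thread-to-cycle order is a hypothetical assumption for illustration.
    """
    return [list(range(start, min(start + lanes, warp_size)))
            for start in range(0, warp_size, lanes)]


cycles = issue_cycles(WARP_SIZE, lanes=16)
schedule = lane_schedule(WARP_SIZE, lanes=16)
print(cycles)                            # 2
print(schedule[0][0], schedule[0][-1])   # 0 15
print(schedule[1][0], schedule[1][-1])   # 16 31
```

Under this model, a portion with 32 cores would finish the same instruction in one cycle (`issue_cycles(32, 32) == 1`), which is the situation the textbook rule in the next paragraph seems to assume.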

By the way, a CUDA textbook gives a rule for an active warp to run: there must be 32 free CUDA cores. In the Turing architecture, a warp scheduler only controls 16 CUDA cores. Does this violate the rule?

Many thanks for any replies.
