I’m trying to understand how the Fermi warp scheduler works. So far I have figured out the following:
Each thread block (TB) is assigned to a specific streaming multiprocessor (SM). A TB is never migrated between SMs, because the TB's shared memory and the thread contexts of its threads are allocated on the assigned SM.
Each SM has two instruction dispatch units, each of which issues instructions to 16 CUDA cores (and a few other units) in parallel (32 threads in total).
Each SM picks two different warps and schedules them half-by-half (the first 16 cores run a half-warp of the first warp while the other 16 cores run a half-warp of the second warp), using the two dispatch units to drive the 32 cores. I guess the half-warp memory-coalescing restrictions come from this half-warp scheduling/dispatch policy (?).
My question is: do those two warps come from the same TB, or could they come from two different TBs assigned to the same SM? I'm also a bit puzzled about how this dual-issue mechanism increases efficiency (as opposed to having one dispatch unit per SM and scheduling one whole warp at once); any explanations are very welcome!
I know I don't need this information to program in CUDA; this is for a little survey (research).