Why is the number of issued warps sometimes smaller than the number of eligible warps?

dyanab · April 2, 2019, 11:11am

As far as I understand, once there are one or more eligible warps, the scheduler can issue at least one warp. What prevents the scheduler from issuing warps when there are eligible warps?

Thanks

Robert_Crovella · April 2, 2019, 2:38pm

To some degree, this may depend on the GPU you are running on. Recent GPUs partition warps among warp schedulers. If you have 16 available warps and 4 warp schedulers, then each warp scheduler may be “responsible” for 4 warps. A warp scheduler can usually issue at most 2 instructions per clock cycle, both from a single warp. Therefore, if the 4 warps assigned to warp scheduler 0 are all eligible, and none of the 12 warps assigned to warp schedulers 1, 2, and 3 are eligible, you will have 4 out of 16 warps eligible, but only 1 or 2 instructions issued (i.e. one issued warp) in that clock cycle, in that SM.

I imagine there may be other possible reasons/examples, as well. For example, suppose all warps are eligible in the above example. You have 16 eligible warps, but only 4 issued warps, in a particular cycle.
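To make the two cases above concrete, here is a minimal sketch in Python. It is only a counting model, not a description of real hardware: the function name is made up for illustration, and it assumes each warp scheduler issues from at most one warp per cycle.

```python
def count_eligible_and_issued(eligible_per_scheduler):
    """Return (eligible_warps, issued_warps) for one SM in one cycle.

    eligible_per_scheduler[i] is the number of eligible warps assigned
    to warp scheduler i. Assumption: a scheduler issues from at most
    one warp per cycle, and only if it has an eligible warp.
    """
    eligible = sum(eligible_per_scheduler)
    issued = sum(1 for n in eligible_per_scheduler if n > 0)
    return eligible, issued


# Case 1: all 4 eligible warps belong to scheduler 0.
print(count_eligible_and_issued([4, 0, 0, 0]))  # (4, 1)

# Case 2: all 16 warps are eligible, 4 per scheduler.
print(count_eligible_and_issued([4, 4, 4, 4]))  # (16, 4)
```

In both cases the issued count is capped by the number of schedulers that have something to issue, not by the total number of eligible warps.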

dyanab · April 2, 2019, 3:51pm

That is reasonable. Thank you, Robert.

Greg · April 2, 2019, 10:54pm

The number of issued warps is always less than or equal to the number of eligible warps, as issued warps are a subset of eligible warps. From a data collection standpoint, these two counters may be collected on separate passes, so there is a small chance that this condition does not hold if the tool has to replay the kernel to collect the counters.

When a warp is launched, it is assigned to an SM sub-partition (warp scheduler). The warp will remain on that SM sub-partition until it completes. In the case of instruction-level preemption, the warp will be saved and restored to the same SM sub-partition. The only exception is CDP (CUDA Dynamic Parallelism) preemption, which is a software implementation.

On each cycle the warp scheduler will scan the active warps for eligible warps (warps that are not stalled) and select one warp to issue. The micro-scheduler may issue 1 or 2 instructions from the warp. The number is dependent on the architecture.
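As a rough illustration of the per-cycle scan, here is a toy Python model of a single warp scheduler. Everything in it is an assumption made for the sketch: the function name, the fixed stall time per warp, and the "lowest warp id first" selection policy. The real selection policy and stall behavior are not described here.

```python
def simulate_scheduler(stall_cycles, num_cycles):
    """Toy model of one warp scheduler.

    stall_cycles[w] is a hypothetical number of cycles warp w stays
    stalled after it issues. Returns one (eligible_count, issued_warp)
    tuple per cycle; issued_warp is None when nothing could issue.
    """
    ready_at = [0] * len(stall_cycles)  # first cycle each warp is eligible
    trace = []
    for cycle in range(num_cycles):
        # Scan the active warps for eligible (not stalled) ones.
        eligible = [w for w, t in enumerate(ready_at) if t <= cycle]
        # Select at most one warp. Assumed policy: lowest warp id.
        issued = eligible[0] if eligible else None
        if issued is not None:
            ready_at[issued] = cycle + 1 + stall_cycles[issued]
        trace.append((len(eligible), issued))
    return trace


# 4 warps, each stalling 2 cycles after it issues.
print(simulate_scheduler([2, 2, 2, 2], 4))
# [(4, 0), (3, 1), (2, 2), (2, 0)]
```

Over these 4 cycles the eligible counts add up to 11 while only 4 warps are issued, because the scheduler picks a single warp per cycle no matter how many are eligible.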

Nsight Compute metrics are collected at the SM sub-partition (smsp) level. Other tools collect at the SM sub-partition level and display at the SM level.

At the SM sub-partition level there can be 1 to MAX_WARPS_PER_SUBPARTITION active warps. This maximum varies from 8 to 32 on recent hardware. An active warp is either eligible or stalled. Only 1 eligible warp can be selected each cycle.

If the raw counters are rolled up to the SM level, then the worst case is that only 1 sub-partition has warps, so the number of eligible warps per SM could be 16 while only 1 is selected. The best case is that there is at least 1 eligible warp per sub-partition each cycle, so all schedulers can issue an instruction each cycle.
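The roll-up can be sketched as follows. This is only an illustration: the counter names `eligible_warp_cycles` and `issued_warp_cycles` are hypothetical and are not real Nsight metric names, and the sketch assumes 4 sub-partitions per SM and accumulated per-sub-partition counters.

```python
def roll_up_to_sm(smsp_counters, elapsed_cycles):
    """Sum per-sub-partition counters and average them per cycle.

    Each entry of smsp_counters is a dict with two hypothetical
    accumulated counters:
      eligible_warp_cycles: eligible warps summed over all cycles
      issued_warp_cycles:   cycles in which a warp was issued
    Returns (eligible warps per cycle, issued warps per cycle) per SM.
    """
    eligible = sum(c["eligible_warp_cycles"] for c in smsp_counters)
    issued = sum(c["issued_warp_cycles"] for c in smsp_counters)
    return eligible / elapsed_cycles, issued / elapsed_cycles


CYCLES = 100
idle = {"eligible_warp_cycles": 0, "issued_warp_cycles": 0}

# Worst case: one sub-partition holds 16 eligible warps every cycle.
busy = {"eligible_warp_cycles": 16 * CYCLES, "issued_warp_cycles": CYCLES}
print(roll_up_to_sm([busy, idle, idle, idle], CYCLES))  # (16.0, 1.0)

# Best case: every sub-partition has exactly 1 eligible warp every cycle.
one = {"eligible_warp_cycles": CYCLES, "issued_warp_cycles": CYCLES}
print(roll_up_to_sm([one, one, one, one], CYCLES))  # (4.0, 4.0)
```

The same SM-level eligible count can therefore correspond to very different issue rates, depending on how the eligible warps are spread across sub-partitions.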

https://docs.nvidia.com/nsight-visual-studio-edition/Nsight_Visual_Studio_Edition_User_Guide.htm#Analysis/Report/CudaExperiments/KernelLevel/IssueEfficiency.htm

The GTC 2019 talk S9345 - CUDA Kernel Profiling Using NVIDIA Nsight Compute by Magnus Strengert has good coverage of this topic. The slides and recording will be available later this year.

dyanab · April 3, 2019, 3:34am

Very helpful. Thank you for the information, Magnus. :)
