Eligible/Stalled warps

Abdopensky · April 10, 2020, 10:44am

Hello,

I am working through a simple example to understand how to improve performance by increasing the number of eligible warps among the active warps. Here is the code:

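// At most 1024 threads per block
__global__ void __launch_bounds__(1024) kernel()
{
	float test = 1.0f;   // unused local with no side effect
}

int main()
{
	// 102400 threads in total: 100 blocks of 1024 threads
	kernel<<<102400 / 1024, 1024>>>();
	cudaDeviceSynchronize();   // wait for the kernel to finish before exiting
}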
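The launch above creates 102400 / 1024 = 100 blocks of 1024 / 32 = 32 warps each, 3200 warps in total. Only warps that are resident (active) on an SM can become eligible, so a first sanity check is how many of them fit on one SM. The sketch below uses the CUDA runtime occupancy API; the names `nullKernel` and `maxActiveWarpsPerSM` are hypothetical, while `cudaOccupancyMaxActiveBlocksPerMultiprocessor` and `cudaGetDeviceProperties` are standard runtime calls.

```cuda
#include <cuda_runtime.h>

// Same shape as the kernel in the post: no side effects, 1024-thread blocks.
__global__ void __launch_bounds__(1024) nullKernel() {}

// Hypothetical helper: maximum number of warps of nullKernel that can be
// resident on one SM when launched with blockSize threads per block.
// Returns 0 if any query fails (for example, when no GPU is present).
int maxActiveWarpsPerSM(int blockSize)
{
    int device = 0;
    cudaDeviceProp prop;
    if (cudaGetDevice(&device) != cudaSuccess ||
        cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return 0;

    int blocksPerSM = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, nullKernel,
                                                      blockSize, 0) != cudaSuccess)
        return 0;

    // Resident blocks per SM times warps per block
    return blocksPerSM * (blockSize / prop.warpSize);
}
```

Calling `maxActiveWarpsPerSM(1024)` from `main` gives a per-SM ceiling on active warps for this configuration; the actual value depends on the GPU's maximum resident threads per SM, so no particular number is implied here.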

When profiling with Nsight Compute, there appear to be too many stalled warps at launch, as the following figure shows:

Do you know if there is something wrong with my settings?

Thanks

felix_dt · June 5, 2020, 8:35am

You would want to inspect the Warp State Statistics section next, to identify why those warps are stalled and are not eligible. This section will show you the individual warp stall reasons that were found, described in detail here: Nsight Compute :: Nsight Compute Documentation

You can also check the “Sampling Data” metrics on the Source page to see where those stalls occur in the code, even though for your trivial code that might not provide too much additional insight. For larger codes, it can be very valuable.

Greg · June 8, 2020, 10:16pm

The kernel has no side effects, so the compiler reduces it to a null kernel with only 3 instructions:

MOV R1, c[0x0][0x28] ; set up stack pointer
...
EXIT
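Because the compiler removes code with no observable effect, a kernel meant for studying warp behavior needs one, such as a store to global memory. A minimal sketch under that assumption follows; `writeKernel` and `runWriteKernel` are hypothetical names, and the global store (plus a bounds check) is the only real change from the kernel in the original post.

```cuda
#include <cuda_runtime.h>

// Hypothetical variant of the kernel from the post: the global store is an
// observable side effect, so the compiler has to keep the work.
__global__ void __launch_bounds__(1024) writeKernel(float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float test = 1.0f;
    if (i < n)
        out[i] = test;   // store to global memory
}

// Launches writeKernel over n elements with 1024-thread blocks and returns
// true if every element was written as 1.0f. Error handling is minimal.
bool runWriteKernel(int n)
{
    float* d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess)
        return false;

    writeKernel<<<(n + 1023) / 1024, 1024>>>(d_out, n);

    float* h_out = new float[n];
    // cudaMemcpy on the default stream waits for the kernel to finish
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    bool ok = (err == cudaSuccess);
    for (int i = 0; ok && i < n; ++i)
        ok = (h_out[i] == 1.0f);
    delete[] h_out;
    return ok;
}
```

With this variant the compiled SASS should contain a global store (STG) in addition to the stack-pointer setup and EXIT, so the profiler has real work to attribute stalls to.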

If you look at the Warp State Statistics section (by metric name), you will find the following:

smsp__average_warps_issue_stalled_barrier_per_issue_active.ratio [inst]             0.00
smsp__average_warps_issue_stalled_dispatch_stall_per_issue_active.ratio [inst]      0.00
smsp__average_warps_issue_stalled_drain_per_issue_active.ratio [inst]               3.40
smsp__average_warps_issue_stalled_imc_miss_per_issue_active.ratio [inst]           36.67
smsp__average_warps_issue_stalled_lg_throttle_per_issue_active.ratio [inst]         0.00
smsp__average_warps_issue_stalled_long_scoreboard_per_issue_active.ratio [inst]     0.00
smsp__average_warps_issue_stalled_math_pipe_throttle_per_issue_active.ratio [inst]  0.13
smsp__average_warps_issue_stalled_membar_per_issue_active.ratio [inst]              0.00
smsp__average_warps_issue_stalled_mio_throttle_per_issue_active.ratio [inst]        0.12
smsp__average_warps_issue_stalled_misc_per_issue_active.ratio [inst]                0.00
smsp__average_warps_issue_stalled_no_instruction_per_issue_active.ratio [inst]      6.22
smsp__average_warps_issue_stalled_not_selected_per_issue_active.ratio [inst]        0.55
smsp__average_warps_issue_stalled_selected_per_issue_active.ratio [inst]            1.00
smsp__average_warps_issue_stalled_short_scoreboard_per_issue_active.ratio [inst]    0.00
smsp__average_warps_issue_stalled_sleeping_per_issue_active.ratio [inst]            0.00
smsp__average_warps_issue_stalled_tex_throttle_per_issue_active.ratio [inst]        0.00
smsp__average_warps_issue_stalled_wait_per_issue_active.ratio [inst]                3.51

The average warp spent the majority of its time waiting on the initial imc_miss (a miss in the immediate constant cache, 36.67) and waiting to fetch an instruction (no_instruction, 6.22).

Basically, this test is measuring the overhead to launch a warp, miss in the constant cache, miss in the instruction cache, and exit.
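A rough complementary check is to time the empty kernel itself with CUDA events. This is a sketch, not the profiler's measurement: `averageNullLaunchMs` and `nullKernel` are hypothetical names, and back-to-back launches may keep some caches warm, so the result is not the same quantity as the single profiled launch.

```cuda
#include <cuda_runtime.h>

// Same shape as the kernel in the post: no side effects, 1024-thread blocks.
__global__ void __launch_bounds__(1024) nullKernel() {}

// Hypothetical helper: average GPU time in milliseconds per launch of the
// empty kernel with the post's configuration (100 blocks x 1024 threads),
// measured with CUDA events over `iterations` back-to-back launches.
// Returns a negative value if any CUDA call fails.
float averageNullLaunchMs(int iterations)
{
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess)
        return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }

    // Warm-up launch so one-time initialization is not timed
    nullKernel<<<102400 / 1024, 1024>>>();

    cudaEventRecord(start);
    for (int i = 0; i < iterations; ++i)
        nullKernel<<<102400 / 1024, 1024>>>();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = -1.0f;
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaEventElapsedTime(&ms, start, stop) == cudaSuccess;
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ok ? ms / iterations : -1.0f;
}
```

The per-launch figure bundles block scheduling, warp launch, the cache misses and EXIT together, so it complements the per-warp stall breakdown rather than replacing it.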
