Why is stall reason metric calculated based on issue active cycles in the Warp State Statistics?

FlyK · July 7, 2023, 3:26am

Why is the stall reason metric calculated based on issue-active cycles? Doesn’t that make the unit “warps/cycle”, which does not match the “cycles/inst” unit of “Warp Cycles Per Issued Instruction”?

Is this calculation related to the equality between inst_issued and issue_cycle metrics?

So I’m confused: why is the X-axis of the histogram labeled in cycles per instruction, while the corresponding stall reason metric is actually in warps per cycle?
Thanks.

jmarusarz · July 13, 2023, 8:53pm

The metric you have circled in red is a ratio. It is calculated as the number of warps stalled in this state per cycle, multiplied by the total number of cycles. Then, since we don’t issue on every cycle, we divide by the total number of cycles in which an instruction was issued. This gives the average number of cycles a warp spends in this state per issued instruction. You could also label the x-axis “Cycles per Issued Instruction”, but we use “Cycles per Instruction” for brevity.
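To make the units concrete, here is a minimal sketch of that arithmetic in Python. The function and parameter names are hypothetical and the numbers are made up for illustration; they are not actual Nsight Compute metric names or output.

```python
def stall_cycles_per_issued_instruction(warps_in_state_per_cycle,
                                        total_cycles,
                                        issue_active_cycles):
    """Average cycles a warp spends in one stall state per issued instruction.

    All three inputs are hypothetical stand-ins for profiler counters.
    """
    if issue_active_cycles <= 0:
        raise ValueError("issue_active_cycles must be positive")
    # (warps / cycle) * cycles = warp-cycles spent in this state
    warp_cycles_in_state = warps_in_state_per_cycle * total_cycles
    # warp-cycles / issue cycles; on a single-issue scheduler each issue
    # cycle corresponds to one issued instruction
    return warp_cycles_in_state / issue_active_cycles


# Illustrative values: 2 warps stalled per cycle over 1000 cycles,
# with an instruction issued on 250 of those cycles.
result = stall_cycles_per_issued_instruction(2.0, 1000, 250)
print(result)  # 8.0
```

The "per cycle" in the numerator cancels against the total cycle count, which is why the final unit is cycles per issued instruction rather than warps per cycle.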

FlyK · July 26, 2023, 2:11am

I have another question that I don’t quite understand. Why are the inst_issued and issue_cycle metrics the same? Isn’t this a multiple-issue situation?

Greg · July 26, 2023, 7:04pm

CC 2.x (Fermi) - CC 6.x (Pascal) supported dual-issue of instructions per SM sub-partition warp scheduler (SMSP).
CC 7.0 (Volta) - CC 9.0 (Hopper) support single-issue of instructions per SM sub-partition warp scheduler (SMSP).

For dual-issue architectures, smsp__inst_issued.avg can be up to 2 x smsp__issue_active.avg. For single-issue architectures, the two counters will output the same value.
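The relationship between the two counters can be sketched as a small consistency check. This is only an illustration: the helper names are hypothetical, the compute capability ranges are the ones listed in this post, and the lower bound used for dual-issue (at least one instruction per issue-active cycle) is an assumption, not something read from the tool.

```python
def max_issue_width(cc_major):
    """Instructions a warp scheduler can issue per cycle, per the ranges above."""
    if 2 <= cc_major <= 6:   # CC 2.x (Fermi) to CC 6.x (Pascal): dual-issue
        return 2
    if 7 <= cc_major <= 9:   # CC 7.0 (Volta) to CC 9.0 (Hopper): single-issue
        return 1
    raise ValueError("compute capability not covered by this post")


def counters_consistent(inst_issued, issue_active, cc_major):
    """Check hypothetical counter values against the expected relationship."""
    width = max_issue_width(cc_major)
    # Assumption: every issue-active cycle issues at least one instruction.
    return issue_active <= inst_issued <= width * issue_active


print(counters_consistent(1000, 1000, 8))  # True: single-issue, counters equal
print(counters_consistent(1500, 1000, 6))  # True: dual-issue, within 2x
print(counters_consistent(1500, 1000, 8))  # False: single-issue cannot exceed 1x
```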

Warps report stall reasons per cycle. For normalization, the tools use cycles in which a warp issued an instruction rather than instructions issued or executed. Both can be interesting. For CC 7.0 - 9.0 the ratio of issue_active to inst_executed is 1.0 in most cases; however, there are a few cases where more instructions are issued than executed (retired).
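As a rough sketch of how the two normalizations can differ, the following Python snippet divides the same stall total by each denominator. The function name and all values are hypothetical, chosen only to show the effect of a few more instructions being issued than executed.

```python
def normalize_stall(warp_cycles_in_state, issue_active, inst_executed):
    """Normalize one stall total two ways (all inputs are made-up values)."""
    return {
        "per_issue_active_cycle": warp_cycles_in_state / issue_active,
        "per_inst_executed": warp_cycles_in_state / inst_executed,
    }


# Typical single-issue case: the ratio of issue_active to inst_executed is 1.0
same = normalize_stall(2000, 250, 250)

# Less common case: slightly more instructions issued than executed
differ = normalize_stall(2000, 256, 250)

print(same)
print(differ)
```

When the two counters match, both normalizations give the same number; when they diverge slightly, the per-issue-cycle value comes out a little lower.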

FlyK · July 27, 2023, 3:31am

Thank you very much. I don’t have any other questions.
