Achieved occupancy reported in Nsight Compute

beiwang · June 25, 2021, 4:01pm

I have recently been profiling an application and am puzzled by the achieved occupancy number reported in Nsight Compute.

Nsight Compute reports Active Warps Per Scheduler in the Scheduler Statistics section and Achieved Occupancy in the Occupancy section. My understanding is that if we divide the active warps per scheduler by the maximum warps per scheduler, we should get an achieved occupancy roughly equal to the one reported in the Occupancy section.
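A minimal sketch of the arithmetic described above, assuming an architecture with 64 resident warps per SM and 4 warp schedulers (e.g. compute capability 7.0 or 8.0; other GPUs have different limits). The reading of 8 active warps per scheduler is hypothetical:

```python
# Assumed limits: 64 resident warps per SM, 4 warp schedulers per SM
# (e.g. compute capability 7.0 or 8.0); other architectures differ.
MAX_WARPS_PER_SM = 64
SCHEDULERS_PER_SM = 4
MAX_WARPS_PER_SCHEDULER = MAX_WARPS_PER_SM // SCHEDULERS_PER_SM  # 16


def occupancy_from_scheduler_stats(active_warps_per_scheduler: float) -> float:
    """Achieved occupancy (%) implied by the Active Warps Per Scheduler value."""
    return 100.0 * active_warps_per_scheduler / MAX_WARPS_PER_SCHEDULER


# Hypothetical reading of 8 active warps per scheduler.
print(occupancy_from_scheduler_stats(8.0))  # 50.0
```

This estimate only matches the Occupancy section when every scheduler is busy on the same cycles as its SM, because the two reported values are normalized by different active-cycle counts.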

However, this is not the case for the application I am looking at, and I am wondering what causes the difference. Checking the metrics behind the two values, I found that Active Warps Per Scheduler uses smsp__warps_active.avg.per_cycle_active, while Achieved Occupancy uses sm__warps_active.avg.pct_of_peak_sustained_active. Is the first one the average over all warp schedulers of a particular SM, and the latter the average over all warp schedulers of all SMs? If so, would very different occupancy numbers suggest load imbalance across SMs?

Greg · July 23, 2021, 5:02pm

The table shows a timeline, in elapsed cycles from 0 to 24, of a single SM with 4 SM sub-partitions (SMSPs). A thread block of 256 threads (8 warps) is launched, and each SMSP is allocated 2 warps. The higher warps exit early, resulting in an imbalance between the SMSPs.
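A minimal Python model of a timeline like the one described: 1 SM, 4 SMSPs, 24 elapsed cycles, 8 warps with 2 per SMSP. The warp-to-SMSP mapping (`warp % 4`) and the exit cycles are hypothetical, chosen only so that higher warps exit earlier; they are not the values from the table:

```python
# Hypothetical model: 1 SM, 4 SMSPs, 24 elapsed cycles, one 256-thread
# block (8 warps). Mapping warp -> SMSP as warp % 4 and the exit cycles
# below are illustrative assumptions (higher warps exit earlier).
ELAPSED_CYCLES = 24
NUM_SMSP = 4
WARP_EXIT_CYCLE = {0: 20, 1: 18, 2: 12, 3: 5, 4: 16, 5: 10, 6: 8, 7: 4}


def warps_per_smsp_per_cycle():
    """Return timeline[smsp][cycle] = number of active warps on that SMSP."""
    timeline = [[0] * ELAPSED_CYCLES for _ in range(NUM_SMSP)]
    for warp, exit_cycle in WARP_EXIT_CYCLE.items():
        smsp = warp % NUM_SMSP  # 2 warps land on each SMSP
        for cycle in range(exit_cycle):  # active from cycle 0 until it exits
            timeline[smsp][cycle] += 1
    return timeline


# One row per SMSP, one digit per cycle (active warp count).
for smsp, row in enumerate(warps_per_smsp_per_cycle()):
    print(f"SMSP{smsp}: " + "".join(str(n) for n in row))
```

In this model SMSP3 runs out of work after 5 cycles while SMSP0 stays busy until cycle 20, which is the kind of imbalance the table illustrates.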

Nsight Compute focuses on single-kernel profiling. It assumes the GPU is active for 100% of the elapsed cycles, because the PM (performance monitor) system measures from the launch of the grid to its completion. As a result, Nsight Compute tends to convert counters to percentages using cycles_active rather than cycles_elapsed. Timeline-based tools such as Nsight Systems and Nsight Graphics GPU Trace use cycles_elapsed, since there is no expectation that the GPU is active the whole time.
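The two conventions differ only in the denominator. A minimal sketch with hypothetical single-SM numbers (93 accumulated warp-cycles, SM active for 20 of 24 elapsed cycles, an assumed peak of 64 warps per cycle):

```python
def pct_of_peak(counter_sum: float, cycles: float, peak_per_cycle: float) -> float:
    """Express an accumulated counter as a % of its peak over `cycles` cycles."""
    return 100.0 * counter_sum / (cycles * peak_per_cycle)


# Hypothetical single-SM numbers (illustrative only).
warps_active_sum = 93
cycles_active, cycles_elapsed, peak = 20, 24, 64

active_based = pct_of_peak(warps_active_sum, cycles_active, peak)    # cycles_active denominator
elapsed_based = pct_of_peak(warps_active_sum, cycles_elapsed, peak)  # cycles_elapsed denominator
print(f"{active_based:.2f}% vs {elapsed_based:.2f}%")  # 7.27% vs 6.05%
```

Because active cycles never exceed elapsed cycles, the active-based percentage is always greater than or equal to the elapsed-based one.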

sm__cycles_active increments if the SM has at least 1 active warp.
smsp__cycles_active increments if the SMSP has at least 1 active warp.

From the right side of the timeline table, it can be observed that SMSP3 is idle for >75% of the active cycles and 75% of the sm__cycles_active.
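The counter definitions above can be reproduced in a small simplified model (an assumption, not how the hardware counters are implemented): a warp is active on every cycle before its exit cycle, and each `*__cycles_active` counter increments on cycles with at least one active warp. The exit cycles are hypothetical. The sketch also shows why dividing Active Warps Per Scheduler (normalized by smsp__cycles_active) by the per-scheduler limit can exceed the Occupancy section's value (normalized by sm__cycles_active):

```python
# Simplified model (assumption): a warp is active on every cycle before its
# exit cycle; *__cycles_active increments on cycles with >= 1 active warp.
ELAPSED_CYCLES = 24
NUM_SMSP = 4
MAX_WARPS_PER_SMSP = 16  # assumed: 64 warps per SM split over 4 SMSPs
# Hypothetical exit cycles for the 8 warps; warp w is placed on SMSP w % 4.
WARP_EXIT_CYCLE = {0: 20, 1: 18, 2: 12, 3: 5, 4: 16, 5: 10, 6: 8, 7: 4}


def model_counters():
    """Return (per-SMSP warps_active sums, per-SMSP cycles_active, SM cycles_active)."""
    warps = [[0] * ELAPSED_CYCLES for _ in range(NUM_SMSP)]
    for warp, exit_cycle in WARP_EXIT_CYCLE.items():
        for cycle in range(exit_cycle):
            warps[warp % NUM_SMSP][cycle] += 1
    smsp_warps_active = [sum(row) for row in warps]
    smsp_cycles_active = [sum(1 for n in row if n > 0) for row in warps]
    sm_cycles_active = sum(
        1 for c in range(ELAPSED_CYCLES) if any(row[c] > 0 for row in warps)
    )
    return smsp_warps_active, smsp_cycles_active, sm_cycles_active


smsp_warps, smsp_active, sm_active = model_counters()

# Fraction of sm__cycles_active in which SMSP3 has no active warp.
smsp3_idle = 1 - smsp_active[3] / sm_active
print(f"SMSP3 idle for {smsp3_idle:.0%} of sm__cycles_active")  # 75%

# Scheduler Statistics view: warps per SMSP-active cycle, per scheduler.
active_warps_per_scheduler = sum(smsp_warps) / sum(smsp_active)
occ_from_schedulers = 100 * active_warps_per_scheduler / MAX_WARPS_PER_SMSP
# Occupancy section view: warps per SM-active cycle against the SM limit.
achieved_occupancy = 100 * sum(smsp_warps) / (
    sm_active * NUM_SMSP * MAX_WARPS_PER_SMSP
)
print(f"{occ_from_schedulers:.1f}% vs {achieved_occupancy:.1f}%")  # 10.6% vs 7.3%
```

With perfectly balanced SMSPs, every smsp__cycles_active would equal sm__cycles_active and the two percentages would agree.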

It is useful to compare SM level statistics to SMSP level statistics to determine load balancing issues.

This can be done with several comparisons.

First - Try to make sure all SMs are active during the measurement period. This may not always be possible if the grid is small; in that case, try to overlap additional independent work.
sm__cycles_active.avg.pct_of_peak_sustained_elapsed (how often were SMs active)

Second - Try to make sure all SMSPs are active.
smsp__cycles_active.avg.pct_of_peak_sustained_elapsed (how often were SMSPs active)
smsp__cycles_active.avg vs. sm__cycles_active.avg
smsp__cycles_active.min vs. sm__cycles_active.max

Third - Compare the occupancy between SMSP and SM to see how imbalanced the work may be.
smsp__warps_active.avg.pct_of_peak_sustained_elapsed vs. sm__warps_active.avg.pct_of_peak_sustained_elapsed
You can also compare .min vs. .max
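The three comparisons above can be sketched in Python with hypothetical counter values for a tiny 2-SM GPU. All numbers are made up, the 64-warps-per-SM peak is an assumption, and the values are computed by hand here rather than collected by Nsight Compute; the names in the comments only indicate which metric each quantity stands in for:

```python
# Hypothetical counters (illustrative only) for a GPU with 2 SMs x 4 SMSPs,
# measured over 24 elapsed cycles. Peaks assume 64 warps per SM (16 per SMSP).
ELAPSED = 24
SMSP_PEAK_WARPS, SMSPS_PER_SM = 16, 4

# smsp__cycles_active and smsp__warps_active per SMSP, grouped by SM.
smsp_cycles_active = [[20, 18, 12, 5], [22, 22, 21, 22]]
smsp_warps_active = [[36, 28, 20, 9], [160, 158, 150, 161]]
# sm__cycles_active per SM (cycles with >= 1 active warp on any SMSP).
sm_cycles_active = [20, 22]


def avg(xs):
    return sum(xs) / len(xs)


flat_smsp_cycles = [c for sm in smsp_cycles_active for c in sm]
flat_smsp_warps = [w for sm in smsp_warps_active for w in sm]
sm_warps_active = [sum(sm) for sm in smsp_warps_active]

# First: how often were SMs active (% of elapsed)?
sm_active_pct = [100 * c / ELAPSED for c in sm_cycles_active]
print(f"SM active: avg {avg(sm_active_pct):.1f}% of elapsed")

# Second: how often were SMSPs active, and how do they compare to their SMs?
smsp_active_pct = [100 * c / ELAPSED for c in flat_smsp_cycles]
print(f"SMSP active: avg {avg(smsp_active_pct):.1f}% of elapsed")
print(f"cycles_active .avg: smsp {avg(flat_smsp_cycles):.2f} vs sm {avg(sm_cycles_active):.2f}")
print(f"cycles_active: smsp .min {min(flat_smsp_cycles)} vs sm .max {max(sm_cycles_active)}")

# Third: occupancy as % of peak sustained over elapsed cycles.
smsp_occ = [100 * w / (ELAPSED * SMSP_PEAK_WARPS) for w in flat_smsp_warps]
sm_occ = [100 * w / (ELAPSED * SMSP_PEAK_WARPS * SMSPS_PER_SM) for w in sm_warps_active]
print(f"warps_active .avg: smsp {avg(smsp_occ):.2f}% vs sm {avg(sm_occ):.2f}%")
print(f"smsp .min/.max {min(smsp_occ):.2f}%/{max(smsp_occ):.2f}%, "
      f"sm .min/.max {min(sm_occ):.2f}%/{max(sm_occ):.2f}%")
```

In this model the two .avg occupancy values always coincide, since both reduce to total warp-cycles divided by (elapsed cycles times the total peak warps). The imbalance on the first SM shows up instead in the .min/.max spread and in the gap between the SMSP and SM cycles_active values.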

beiwang · July 23, 2021, 5:34pm

Many thanks for the update. This is very helpful for understanding the discrepancy in my earlier observations.
