How to profile overall SM utilization of the program by Nsight Compute?

tianyu9748 · February 7, 2023, 4:01pm

First, could I get the time for each kernel by sm__cycles_elapsed.avg / sm__cycles_elapsed.avg.per_second?
Second, since I can profile sm__throughput.avg.pct_of_peak_sustained_elapsed, how could I compute the program-level utilization? Should I use kernel_time (from the first question) * sm__throughput.avg.pct_of_peak_sustained_elapsed, then divide by the total program duration?

Best
Max

jmarusarz · February 9, 2023, 10:04pm

Are you looking for some type of value representing how much the SMs were used compared to the entire application (which may include CPU time etc…)? For example, if the program took 10 seconds, the kernel took 5 of those seconds, and during the kernel there was an average of 50% SM utilization, then the value you’re looking for is (0.5 x 5)/10 = 25%

If that’s the case, then you’re on the right track. The only thing I would mention is that instead of calculating kernel time with the formula above, you could use gpu__time_duration.sum, which reports it directly.
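The calculation above can be sketched in Python. This is only an illustration: the function names and the `(duration, percent)` tuple layout are hypothetical, and it assumes you have already exported the per-kernel duration (for example from gpu__time_duration.sum, converted to seconds) and sm__throughput.avg.pct_of_peak_sustained_elapsed for every kernel launch.

```python
def kernel_time_s(cycles_elapsed, cycles_per_second):
    """Kernel time from sm__cycles_elapsed.avg and its .per_second rate."""
    return cycles_elapsed / cycles_per_second


def program_sm_utilization(kernels, program_duration_s):
    """Time-weighted SM utilization over the whole program, in percent.

    kernels: iterable of (kernel_duration_s, sm_throughput_pct) pairs,
             one per profiled kernel launch (hypothetical layout).
    program_duration_s: wall-clock duration of the whole program, including
             CPU-only time, measured separately.
    """
    if program_duration_s <= 0:
        raise ValueError("program_duration_s must be positive")
    # Each kernel contributes its duration scaled by its SM throughput.
    weighted_busy_s = sum(d * pct / 100.0 for d, pct in kernels)
    return 100.0 * weighted_busy_s / program_duration_s


# The example from this thread: a 10 s program with one 5 s kernel
# that averaged 50% SM throughput.
print(program_sm_utilization([(5.0, 50.0)], 10.0))  # 25.0
```

With several kernels, pass one pair per launch; time spent outside kernels counts as 0% because it only appears in the denominator.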

If you’re looking for something else, please clarify. Thanks.

tianyu9748 · February 10, 2023, 5:38pm

Hi jmarusarz.
Thanks, that is what I am looking for.
Also, if I plan to profile the memory bandwidth usage of the program, should I use gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed or gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed?

Best
Max

jmarusarz · February 16, 2023, 9:17pm

gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed is what you’re looking for if you want to know about the GPU DRAM usage. The other metric includes other levels of the memory hierarchy, like L1 and L2.

tianyu9748 · May 4, 2023, 2:16am

Hi jmarusarz,
I have a new question.
How could I get the SM occupancy?

I see there is a metric, Achieved Occupancy, which is the ratio of the average active warps to the maximum active warps allocated for the kernel.

Does this mean that if the kernel is allocated 10 warps, and 6 warps are used for computation on average, then Achieved Occupancy is 60%?

Or should I use Speed of Light SM [%] x Achieved Occupancy to get the actually occupied warps (threads)?

Best
Max

jmarusarz · May 4, 2023, 8:10pm

Achieved Occupancy is active warps/active cycles. The value represents how many warps were active on average for a given cycle. For example, on GA100 this would be between 0 and 16. This occupancy can be impacted by the way your application divides the work and also hardware resource limitations like the register file, shared memory etc…

Speed of Light SM [%] x Achieved Occupancy isn’t really a calculation we would use.

tianyu9748 · May 5, 2023, 2:56pm

Do you mean there is no way to get the kernel SM occupancy?

jmarusarz · May 11, 2023, 9:44pm

I’m not sure what you mean by “kernel SM occupancy”. Could you expand on that term to explain exactly what you are looking for?

tianyu9748 · July 25, 2023, 2:56am

Hi Jmarusarz,
Do you mean the max_warps_per_sm for A100 is 16?

Also, I found a formula for Achieved Occupancy, which is equal to (Active_warps / Active_cycles) / max_warps_per_sm.

As you said before, the current metric, achieved_occupancy, is equal to (active_warps / active_cycles), right?

jmarusarz · July 27, 2023, 8:20pm

No, max_warps_per_sm for A100 is 64. The 16 is warps per warp scheduler, and there are 4 schedulers per SM.
On A100, (Active_warps / Active_cycles) / max_warps_per_sm is going to be a percentage, while (active_warps / active_cycles) will be a fraction between 1 and 16.
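As a rough sketch of the relationship described above, the conversion from active warps per cycle to an occupancy percentage could look like this in Python. The constants are the A100 figures quoted in this thread (4 warp schedulers per SM, 16 warps each, 64 per SM); the function and constant names are hypothetical, and the limits differ on other GPUs.

```python
# A100 figures quoted in this thread; other architectures differ.
WARP_SCHEDULERS_PER_SM = 4
MAX_WARPS_PER_SCHEDULER = 16
MAX_WARPS_PER_SM = WARP_SCHEDULERS_PER_SM * MAX_WARPS_PER_SCHEDULER  # 64


def achieved_occupancy_pct(active_warps_per_cycle, max_warps):
    """(active_warps / active_cycles) / max_warps, expressed in percent.

    active_warps_per_cycle and max_warps must use the same scope:
    both per SM, or both per warp scheduler.
    """
    if max_warps <= 0:
        raise ValueError("max_warps must be positive")
    return 100.0 * active_warps_per_cycle / max_warps


# An average of 8 active warps per scheduler out of 16 ...
print(achieved_occupancy_pct(8, MAX_WARPS_PER_SCHEDULER))  # 50.0
# ... is the same occupancy as 32 active warps per SM out of 64.
print(achieved_occupancy_pct(32, MAX_WARPS_PER_SM))  # 50.0
```

The important detail is to divide by the limit that matches the scope of the numerator; mixing a per-scheduler value with the per-SM limit would understate occupancy by a factor of 4.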
