How to profile the overall SM utilization of a program with Nsight Compute?

First, can I get the time for each kernel as sm__cycles_elapsed.avg / sm__cycles_elapsed.avg.per_second?
Second, since I can profile sm__throughput.avg.pct_of_peak_sustained_elapsed, how can I compute the program-level utilization? Should I multiply kernel_time (from the first question) by sm__throughput.avg.pct_of_peak_sustained_elapsed and then divide by the total program duration?

Best
Max

Are you looking for a value representing how much the SMs were used over the entire application run (which may include CPU time etc…)? For example, if the program took 10 seconds, the kernel took 5 of those seconds, and during the kernel there was an average of 50% SM utilization, then the value you’re looking for is (0.5 x 5)/10 = 25%.

If that’s the case, then you’re on the right track. The only thing I would mention is that instead of calculating kernel time with the formula above, you could use gpu__time_duration.sum, which reports it directly.
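As a rough illustration of the calculation above, here is a minimal Python sketch that extends it to several kernels by weighting each kernel's SM throughput by its duration. The function name and the input layout are hypothetical and not part of Nsight Compute. It assumes you have already collected gpu__time_duration.sum and sm__throughput.avg.pct_of_peak_sustained_elapsed for each kernel, converted the durations to seconds (check the unit the tool reports), and measured the total program duration yourself.

```python
def program_sm_utilization(kernels, program_duration_s):
    """Time-weighted SM utilization over the whole program, in percent.

    kernels: iterable of (duration_s, sm_throughput_pct) pairs, one per
             kernel launch (hypothetical layout; values taken from
             gpu__time_duration.sum and
             sm__throughput.avg.pct_of_peak_sustained_elapsed).
    program_duration_s: wall-clock duration of the whole program in seconds,
             measured separately (assumption: not provided by the metrics).
    """
    if program_duration_s <= 0:
        raise ValueError("program_duration_s must be positive")
    # Each kernel contributes its duration scaled by its SM throughput fraction.
    busy_seconds = sum(d * pct / 100.0 for d, pct in kernels)
    return 100.0 * busy_seconds / program_duration_s


# The example from this thread: one 5 s kernel at 50% in a 10 s program.
print(program_sm_utilization([(5.0, 50.0)], 10.0))  # 25.0
```

Time outside of kernels (CPU work, transfers, idle gaps) counts as 0% here, which is what makes the result a program-level figure rather than a per-kernel one.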

If you’re looking for something else, please clarify. Thanks.

Hi jmarusarz.
Thanks, that is what I am looking for.
Also, if I plan to profile the memory bandwidth usage of the program, should I use gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed or gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed?

Best
Max

gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed is what you’re looking for if you want to know about GPU DRAM usage. The other metric also includes other levels of the memory hierarchy, such as L1 and L2.