Nsight Compute has an option, “--print-summary per-kernel”, that shows the average statistics for a kernel that is invoked multiple times. However, for one kernel, my_kernel, I see several separate summaries because the grid sizes differ. For example:
my_kernel, block size = 128, grid size = 1000
ipc_avg = 0.4
my_kernel, block size = 128, grid size = 1001
ipc_avg = 0.5
With many kernels, each launched with several grid sizes, the number of reported summaries becomes large. I also tried “--print-summary per-gpu”, but got the same result.
Moreover, is it possible to sum a given metric over all profiled kernels? Otherwise I have to write a bash/awk/python script that extracts the metric's value from each section and adds the values up.
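The manual approach could look roughly like the Python sketch below. It assumes the report has been exported to CSV with one row per (launch, metric) and columns named "Kernel Name", "Metric Name" and "Metric Value"; those column names are an assumption and may differ between Nsight Compute versions. The function name sum_metric is hypothetical. Grouping by kernel name alone also merges launches that differ only in grid size.

```python
import csv
import io
from collections import defaultdict


def sum_metric(csv_text, metric_name):
    """Sum one metric over all profiled launches in a CSV export.

    ASSUMPTION: the CSV has "Kernel Name", "Metric Name" and "Metric Value"
    columns (one row per launch and metric); adjust to your ncu version.

    Returns (per_kernel, total), where per_kernel maps each kernel name to
    [sum, row_count] across all grid sizes, so averages can be derived.
    """
    per_kernel = defaultdict(lambda: [0.0, 0])
    total = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("Metric Name") != metric_name:
            continue
        # Values may carry thousands separators such as "1,234".
        try:
            value = float(row["Metric Value"].replace(",", ""))
        except ValueError:
            continue  # skip non-numeric entries such as "n/a"
        entry = per_kernel[row["Kernel Name"]]
        entry[0] += value
        entry[1] += 1
        total += value
    return dict(per_kernel), total
```

The CSV might be produced with something like ncu --import report.ncu-rep --csv --page details > report.csv (check ncu --help for the options your version supports) and then passed in as open("report.csv").read(). Dividing a kernel's sum by its count gives a plain, unweighted average across all of its grid sizes.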
Any idea about that?