Getting cumulative stats from profiler

Nsight Compute has an option “--print-summary per-kernel” which shows the average statistics for a given kernel that is invoked multiple times. I have noticed that for one kernel, my_kernel, I see multiple sets of statistics because the grid sizes are different. For example:

my_kernel, block size = 128, grid size=1000
ipc_avg = 0.4
my_kernel, block size = 128, grid size=1001
ipc_avg = 0.5

For multiple kernels with multiple grid sizes, the number of reported stats is large. I tried “per-gpu” too, but got the same result.

Moreover, I would like to know whether it is possible to sum a given metric over all the kernels being profiled. Otherwise I have to write a bash/awk/python script myself to extract the value of the metric in each section and then sum them up.

Any idea about that?

There is no mode to summarize kernels across different grid launch configurations. The easiest way to post-process this data right now is to export the raw page as CSV (--page raw together with --csv) and add --print-units base, so that values are reported in base units rather than scaled units and metric values of different orders of magnitude can be combined more easily.
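A minimal sketch of that post-processing in Python, in case it helps. It assumes the exported CSV has one row per kernel launch, a "Kernel Name" column, and one column per metric, and that any extra row holding only units can be recognized because its metric cell is not numeric. The function name sum_metric, the metric smsp__inst_executed.sum, and the sample rows are only illustrations, so check the column names against your own export. Stripping commas as thousands separators is also an assumption.

```python
import csv
import io
from collections import defaultdict


def sum_metric(csv_file, metric, kernel_col="Kernel Name"):
    """Sum one metric column per kernel name over all launches in the CSV.

    Rows whose metric cell is not a number (for example a units row) are skipped.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        # Assumption: commas, if present, are thousands separators.
        raw = (row.get(metric) or "").replace(",", "")
        try:
            value = float(raw)
        except ValueError:
            continue  # not a numeric cell, e.g. the units row
        totals[row.get(kernel_col, "")] += value
    return dict(totals)


# Made-up sample in the assumed layout: header row, units row, two launches.
SAMPLE = (
    '"ID","Kernel Name","Block Size","Grid Size","smsp__inst_executed.sum"\n'
    '"","","","","inst"\n'
    '"0","my_kernel","(128, 1, 1)","(1000, 1, 1)","1000"\n'
    '"1","my_kernel","(128, 1, 1)","(1001, 1, 1)","1001"\n'
)

if __name__ == "__main__":
    print(sum_metric(io.StringIO(SAMPLE), "smsp__inst_executed.sum"))
```

For a real run, open the exported file with open("raw.csv", newline="") and pass it in place of the StringIO object. Summing only makes sense for counter-like metrics. A ratio such as ipc would need to be recomputed from its summed numerator and denominator instead.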

