Question about grouping kernels for statistical profiling in nsys

Hello,

I am currently using nsys to profile my CUDA application, and I would like to generate a statistical report that groups kernels by family rather than listing each kernel separately. For example, if I have multiple instances of the same kind of kernel (such as several GEMM kernels), I want them aggregated in the report under a single category (e.g., “GEMM”).

Is there a way to achieve this using nsys? I couldn’t find a direct option for kernel grouping by families. Any advice or guidance on how to approach this would be greatly appreciated!

Thank you.

Nsys doesn’t currently have any way to define “kernel families” that would allow you to do what you are asking for.

However, if you wrap the launches of each kernel family in a consistently named NVTX range (for example, a range named “GEMM” around every GEMM launch), you can then extract the data per NVTX range instead of per kernel.
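
If adding NVTX ranges to the application is not practical, another option is to post-process a per-kernel summary yourself. The following is only a minimal sketch, not an nsys feature: it assumes you have already exported per-kernel totals (kernel name, total time in ns, instance count) from a kernel summary report and parsed them into tuples. The names `FAMILY_PATTERNS`, `family_of`, and `group_by_family` are hypothetical, and the regular expressions are placeholders that you would adapt to the kernel names in your own report.

```python
import re
from collections import defaultdict

# Hypothetical family patterns (assumption): adjust to your own kernel names.
FAMILY_PATTERNS = {
    "GEMM": re.compile(r"gemm", re.IGNORECASE),
    "Reduce": re.compile(r"reduce", re.IGNORECASE),
}


def family_of(kernel_name, patterns=FAMILY_PATTERNS):
    """Return the first family whose pattern matches the name, else 'Other'."""
    for family, pattern in patterns.items():
        if pattern.search(kernel_name):
            return family
    return "Other"


def group_by_family(rows, patterns=FAMILY_PATTERNS):
    """Sum total time and instance counts per kernel family.

    rows: iterable of (kernel_name, total_time_ns, instances) tuples,
    e.g. parsed from an exported per-kernel summary.
    """
    totals = defaultdict(lambda: {"total_time_ns": 0, "instances": 0})
    for name, total_time_ns, instances in rows:
        entry = totals[family_of(name, patterns)]
        entry["total_time_ns"] += total_time_ns
        entry["instances"] += instances
    return dict(totals)
```

You would call `group_by_family(rows)` on the parsed rows and print or sort the resulting dictionary. Matching on kernel names is fragile compared with NVTX ranges, since library kernel names can change between versions, so treat the patterns as something to maintain alongside your application.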