Hello,
I am currently using nsys to profile my CUDA application, and I would like to generate a statistical report that groups kernels by family rather than by individual instance. For example, if I have multiple instances of the same kind of kernel function (such as several GEMM kernels), I want them aggregated in the report under a single category (e.g., “GEMM”).
Is there a way to achieve this using nsys? I couldn’t find a direct option for grouping kernels by family. Any advice or guidance on how to approach this would be greatly appreciated!
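To make the request concrete, here is a rough sketch of the kind of aggregation I have in mind, written as plain Python post-processing rather than anything nsys provides. The kernel names, timings, family labels, and regex patterns below are all made up for illustration, and the input is assumed to be rows of (kernel name, total time in ns, instance count) taken from some per-kernel summary.

```python
import re
from collections import defaultdict

# Hypothetical family definitions: (label, pattern) pairs checked in order.
# These patterns are illustrative guesses, not something nsys defines.
FAMILY_PATTERNS = [
    ("GEMM", re.compile(r"gemm", re.IGNORECASE)),
    ("Reduction", re.compile(r"reduce", re.IGNORECASE)),
]


def family_of(kernel_name):
    """Return the first family whose pattern matches, else "Other"."""
    for family, pattern in FAMILY_PATTERNS:
        if pattern.search(kernel_name):
            return family
    return "Other"


def group_by_family(rows):
    """Sum total time and instance counts per kernel family.

    rows: iterable of (kernel_name, total_time_ns, instances) tuples.
    """
    totals = defaultdict(lambda: {"total_time_ns": 0, "instances": 0})
    for name, total_ns, instances in rows:
        entry = totals[family_of(name)]
        entry["total_time_ns"] += total_ns
        entry["instances"] += instances
    return dict(totals)


# Made-up sample data standing in for a per-kernel summary.
sample_rows = [
    ("ampere_sgemm_128x64_nn", 1200, 3),
    ("volta_h884gemm_64x64_tn", 800, 2),
    ("reduce_kernel<float>", 500, 10),
    ("my_custom_kernel", 100, 1),
]

report = group_by_family(sample_rows)
for family, stats in sorted(report.items()):
    print(family, stats["total_time_ns"], stats["instances"])
```

Ideally the report itself would do this grouping, so that a separate script like this is not needed.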
Thank you.