In the output of “nsys profile --trace=cuda”, kernels with long names are truncated. How can I get more information about the kernel name? Because the template takes multiple arguments, the actual kernel name is buried somewhere inside the long template name. For example:
4,0 331.322.797 450 736.272,0 735.556 736.933 void cutlass::Kernel<cutlass_tensorop_s1688fprop_optimized_tf32_64x256_16x4>(cutlass_tensorop_s1688…
I moved your post into the Nsight Systems forum.
I assume you are using the most recent release of Nsight Systems (2021.1 or later). If so, see the help for the nsys CLI’s stats command by running ‘nsys stats --help’.
The stats command can write the full names of the kernels used during the collection to a CSV file. For example, if the QDREP file is named report1.qdrep, the following command writes them to a file named report1_gpukernsum.csv:
‘nsys stats --report gpukernsum --format csv --output . report1.qdrep’
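Once the CSV exists, the full names can be post-processed to pull out the part you care about. The Python sketch below is only an illustration and is not part of nsys. It assumes the CSV starts with a header row that contains a column named "Name" (check the header your nsys version actually writes), and the script name short_kernel_names.py and its helper functions are hypothetical. For a name such as void cutlass::Kernel<cutlass_tensorop_s1688fprop_optimized_tf32_64x256_16x4>(…), it extracts the first top-level template argument, which for the CUTLASS kernel in the question is the descriptive part.

```python
import csv
import sys
from typing import Iterable, List, Tuple


def first_template_arg(name: str) -> str:
    """Return the first top-level template argument of a demangled C++ name.

    e.g. 'void ns::Kernel<Foo<int>, Bar>(float*)' -> 'Foo<int>'.
    Returns the whole name unchanged if it contains no '<'.
    """
    start = name.find("<")
    if start == -1:
        return name
    depth = 0
    for i in range(start, len(name)):
        ch = name[i]
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
            if depth == 0:  # closing '>' of the outermost template
                return name[start + 1:i].strip()
        elif ch == "," and depth == 1:  # end of the first top-level argument
            return name[start + 1:i].strip()
    # No matching '>' (e.g. a truncated name): return everything after '<'.
    return name[start + 1:].strip()


def summarize(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Read CSV lines and return (short name, full name) pairs.

    Assumes a header row with a 'Name' column (an assumption, not verified
    against every nsys version).
    """
    return [(first_template_arg(row["Name"]), row["Name"])
            for row in csv.DictReader(lines)]


if __name__ == "__main__":
    # Hypothetical usage: python short_kernel_names.py report1_gpukernsum.csv
    if len(sys.argv) == 2:
        with open(sys.argv[1], newline="") as f:
            for short, full in summarize(f):
                print(f"{short}\t{full}")
```

Run it as python short_kernel_names.py report1_gpukernsum.csv. The bracket counting is deliberately simple: it does not handle names that contain operator< or operator>, and if the closing '>' is missing it returns everything after the first '<'.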