Nsys profile "[6/8] Executing 'cuda_gpu_kern_sum' stats report" is not showing individual kernels

Question:
How can I debug this and get the profiler to correctly show the kernel durations at step [6/8], so I don’t have to open the report in Nsight Systems?

Error:

[6/8] Executing 'cuda_gpu_kern_sum' stats report
[libprotobuf ERROR C:\dvs\p4\build\sw\devtools\Agora\Rel\CUDA12.3\Imports\Source\ProtoBuf\protobuf-3_21_1\src\google\protobuf\wire_format_lite.cc:618] String field 'Agent.StatsReportExecutionInfo.output' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes.

Context:
The following command works for simpler programs:

nsys profile --stats=true -o vector-add-prefetch-report <program_name.exe>

However, I have a fairly complex code base that launches its CUDA kernels from within a class member function. I’m not sure why, but instead of showing the performance of each kernel, step [6/8] prints the libprotobuf error shown above.

Below is an example of what I would expect to see:

[6/8] Executing 'cuda_gpu_kern_sum' stats report

 Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)                                         Name
 --------  ---------------  ---------  --------  --------  --------  --------  -----------  ----------------------------------------------------------------------------------
    100.0            56835          1   56835.0   56835.0     56835     56835          0.0  addImages(const unsigned char *, const unsigned char *, unsigned char *, int, int)

My workaround is to open the report in Nsight Systems, where I can see that the CUDA kernels are being tracked and can look at the duration of each kernel. So I believe the information is being profiled correctly, and only the stats report output is failing.
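
For anyone hitting the same thing, one possible way to get the per-kernel numbers without the GUI might be to read them straight from the SQLite file that nsys writes next to the report (or that `nsys export --type sqlite` produces). The following is only a minimal sketch under some assumptions: the file name `vector-add-prefetch-report.sqlite` is a guess based on the `-o` value above, and the table and column names (`CUPTI_ACTIVITY_KIND_KERNEL`, `StringIds`, `demangledName`) are what I understand the exported schema to contain, which may differ between nsys versions. The `text_factory` line is there in case a kernel name really does contain invalid UTF-8, which is a guess at the cause and not a confirmed diagnosis.

```python
import sqlite3
import sys

# Assumed schema: kernel records in CUPTI_ACTIVITY_KIND_KERNEL with start/end
# timestamps in nanoseconds, and kernel names stored in StringIds.
QUERY = """
SELECT s.value                      AS name,
       COUNT(*)                     AS instances,
       SUM(k."end" - k."start")     AS total_ns,
       AVG(k."end" - k."start")     AS avg_ns,
       MIN(k."end" - k."start")     AS min_ns,
       MAX(k."end" - k."start")     AS max_ns
FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
JOIN StringIds AS s ON s.id = k.demangledName
GROUP BY s.value
ORDER BY total_ns DESC
"""


def summarize_kernels(conn):
    """Return one (name, instances, total, avg, min, max) row per kernel."""
    # Replace undecodable bytes instead of raising, in case a name is not valid UTF-8.
    conn.text_factory = lambda raw: raw.decode("utf-8", errors="replace")
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    # Hypothetical default path; pass the real .sqlite file as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "vector-add-prefetch-report.sqlite"
    with sqlite3.connect(path) as conn:
        for name, count, total, avg, lo, hi in summarize_kernels(conn):
            print(f"{total:>15} {count:>9} {avg:>12.1f} {lo:>9} {hi:>9}  {name}")
```

Run it as `python kernel_summary.py <report>.sqlite` (the script name is arbitrary). It does not compute the Time (%), Med, or StdDev columns of the built-in report, only the total, count, average, minimum, and maximum per kernel.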

I suggest asking this question on the Nsight Systems forum (profiling Windows targets).

Okay, I’ll do this!

Sorry about any overhead and thank you for letting me know!

I can move the question if you like.

That’s okay, I’ll do it!