Question:
How can I debug this and get the profiler to correctly show the kernel durations at step [6/8], so that I don't have to open the report in Nsight Systems?
Error:
[6/8] Executing 'cuda_gpu_kern_sum' stats report
[libprotobuf ERROR C:\dvs\p4\build\sw\devtools\Agora\Rel\CUDA12.3\Imports\Source\ProtoBuf\protobuf-3_21_1\src\google\protobuf\wire_format_lite.cc:618] String field 'Agent.StatsReportExecutionInfo.output' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes.
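The message says a string in the stats output is not valid UTF-8, so one way to narrow the problem down is to locate the offending bytes. Below is a minimal Python sketch. The function name find_invalid_utf8 is hypothetical. It reports the offset, the bad bytes, and a little surrounding context for every invalid UTF-8 sequence in a byte buffer, such as the contents of a file written by 'nsys stats'. It makes no assumption about where the bad bytes come from.

```python
def find_invalid_utf8(data, context=16):
    """Return (offset, bad_bytes, surrounding_bytes) for each invalid UTF-8
    sequence in the bytes object `data`."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # everything from pos onwards decodes cleanly
        except UnicodeDecodeError as exc:
            # exc.start/exc.end are relative to the slice, so shift by pos
            start = pos + exc.start
            end = pos + exc.end
            problems.append(
                (start, data[start:end], data[max(0, start - context):end + context])
            )
            pos = end  # resume scanning just past the bad sequence
    return problems
```

For example, with open(path, "rb") as f: print(find_invalid_utf8(f.read())), where path is whichever output file you want to check.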
Context:
The following command works for simpler programs:
nsys profile --stats=true -o vector-add-prefetch-report <program_name.exe>
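Because --stats=true runs the summary reports after collection finishes, it may be worth running only the 'cuda_gpu_kern_sum' report afterwards on the saved report with 'nsys stats'. That command can write CSV to a file instead of printing to the console. It is not confirmed that this avoids the protobuf error. The sketch below only builds and runs that command from Python. The function names are hypothetical, and it assumes nsys is on PATH.

```python
import subprocess


def kern_sum_cmd(report_path, output_prefix, nsys="nsys"):
    """Build an 'nsys stats' command that runs only the cuda_gpu_kern_sum
    report and writes it as CSV using output_prefix, instead of printing
    it to the console."""
    return [
        nsys, "stats",
        "--report", "cuda_gpu_kern_sum",
        "--format", "csv",
        "--output", output_prefix,
        report_path,
    ]


def run_kern_sum(report_path, output_prefix, nsys="nsys"):
    """Run the command; raises subprocess.CalledProcessError on a non-zero exit."""
    return subprocess.run(kern_sum_cmd(report_path, output_prefix, nsys), check=True)
```

For example, run_kern_sum("vector-add-prefetch-report.nsys-rep", "kern"), assuming the report was saved with the default .nsys-rep extension.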
However, I have a fairly complex code base in which the CUDA kernels are launched from within a class member function. I'm not sure why, but instead of the performance summary for each kernel, I get the error shown above at step [6/8].
For example, this is what I would expect to see:
[6/8] Executing 'cuda_gpu_kern_sum' stats report
Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)  Name
--------  ---------------  ---------  --------  --------  --------  --------  -----------  ----------------------------------------------------------------------------------
   100.0            56835          1   56835.0   56835.0     56835     56835          0.0  addImages(const unsigned char *, const unsigned char *, unsigned char *, int, int)
My workaround is to open the report in Nsight Systems, where I can see that the CUDA kernels are being tracked and can look at the duration of each kernel. So I don't believe the problem is that the information isn't being tracked/profiled.
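Since the kernel durations are clearly present in the report, another option is to compute the summary directly from a SQLite export of it, for example with nsys export --type sqlite <report>. Running 'nsys stats' may also leave a .sqlite file next to the report. The sketch below makes some assumptions about that export. It expects a CUPTI_ACTIVITY_KIND_KERNEL table with start/end timestamps in nanoseconds, and a demangledName column holding an id into a StringIds(id, value) table. Check this against the schema of your nsys version. Names are fetched as raw bytes, so a name that is not valid UTF-8 is shown with replacement characters instead of aborting the query. This may also help pinpoint the string the stats report trips over. The function name kernel_summary is hypothetical.

```python
import sqlite3
import statistics
from collections import defaultdict


def kernel_summary(conn):
    """Per-kernel duration summary from an (assumed) Nsight Systems SQLite export.

    Assumed schema: CUPTI_ACTIVITY_KIND_KERNEL("start", "end", demangledName)
    with demangledName referencing StringIds(id, value).
    """
    # Return TEXT/BLOB values as raw bytes so invalid UTF-8 can't abort the query.
    conn.text_factory = bytes
    rows = conn.execute(
        """
        SELECT s.value, k."end" - k."start"
        FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
        JOIN StringIds AS s ON s.id = k.demangledName
        """
    ).fetchall()

    durations = defaultdict(list)
    for raw_name, dur in rows:
        # Undecodable bytes become U+FFFD instead of raising.
        durations[raw_name.decode("utf-8", errors="replace")].append(dur)

    grand_total = sum(sum(d) for d in durations.values())
    summary = []
    for name, d in durations.items():
        total = sum(d)
        summary.append({
            "name": name,
            "time_pct": 100.0 * total / grand_total if grand_total else 0.0,
            "total_ns": total,
            "instances": len(d),
            "avg_ns": total / len(d),
            "med_ns": statistics.median(d),
            "min_ns": min(d),
            "max_ns": max(d),
            # Sample standard deviation; 0.0 when there is a single instance.
            "stddev_ns": statistics.stdev(d) if len(d) > 1 else 0.0,
        })
    # Largest total time first.
    summary.sort(key=lambda r: r["total_ns"], reverse=True)
    return summary
```

For example, kernel_summary(sqlite3.connect("vector-add-prefetch-report.sqlite")) returns one dict per kernel. The standard deviation here may not match nsys's exact definition.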