nvprof fails with TensorRT inference task


I am using nvprof to analyse the compute and data-transfer performance of a Tesla P4 running the pre-trained GoogleNet network.

If I run TensorRT without nvprof, the inference engine is generated and I can see the nvidia-smi and giexec logs.

For some reason, when I run the same giexec command with the nvprof options, it gets stuck and eventually crashes.

Here are the nvprof options I used:

nvprof --events all --print-gpu-trace --system-profiling on -u ms --log-file my_file_%p.log (followed by giexec command)

Is this an nvprof bug, or is there a configuration flag or plugin for nvprof that I am missing?

Thanks for your help.

Hi, g.amardeep

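This looks like the well-known slowdown caused by kernel serialization. Collecting --events all makes nvprof serialize kernel launches and replay each kernel several times to gather every requested counter, so a long inference run can appear to hang.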

I suggest running the application under nvprof for a couple of minutes and then killing it. You should still get usable results.
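If you want to automate the "run for a couple of minutes, then stop it" step, here is a minimal sketch in Python. It reuses the nvprof options from the question, waits a fixed time, and then sends SIGINT (the same signal as Ctrl-C). Assumptions: a POSIX system, and that nvprof writes its log when interrupted this way, as it does for Ctrl-C in a terminal; please verify that on your setup. The helper name run_then_interrupt and the 120-second default are hypothetical, and the giexec arguments are whatever you normally pass.

```python
import signal
import subprocess
import sys

# nvprof options copied from the question.
NVPROF_OPTS = [
    "nvprof", "--events", "all", "--print-gpu-trace",
    "--system-profiling", "on", "-u", "ms",
    "--log-file", "my_file_%p.log",
]


def run_then_interrupt(cmd, seconds, grace=30):
    """Run cmd; if it is still running after `seconds`, send SIGINT.

    Returns (returncode, interrupted). Hypothetical helper for illustration.
    """
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=seconds)
        return proc.returncode, False  # finished on its own
    except subprocess.TimeoutExpired:
        pass
    # Assumption: SIGINT (like Ctrl-C) lets nvprof flush its log file.
    proc.send_signal(signal.SIGINT)
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort; the log may be incomplete
        proc.wait()
    return proc.returncode, True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   python profile_for_a_while.py giexec <your usual giexec arguments>
    code, interrupted = run_then_interrupt(NVPROF_OPTS + sys.argv[1:], seconds=120)
    print(f"exit code {code}, interrupted={interrupted}")
```

SIGINT is tried first, rather than killing the process outright, because a hard kill gives the profiler no chance to write anything. The kill is only a fallback if the grace period expires.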

Please also refer to the answer at https://devtalk.nvidia.com/default/topic/1015752/b/t/post/5184891/?offset=7#5186940