Record profiled data when ending program with ctrl-c

Hi. I’m trying to profile the GPU usage of a TensorRT server-client model.
Here’s what I’m doing:

  1. Run nvprof --profile-all-processes -o results%p.nvvp in one terminal of a Docker container.
  2. Start the TensorRT server in a different terminal of the same Docker container as in step 1.
    -> Up to this point, step 1's nvprof recognizes that a process is running, since its terminal shows: NVPROF is profiling process 920, command: /opt/tensorrtserver/bin/trtserver --model-store=/modelstore --allow-profiling=true --allow-metrics=true --allow-gpu-metrics=true
  3. Start the TensorRT client in a different terminal.
    -> Step 3 works fine as well; the correct results appear in its terminal.
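For reference, the three steps above can be sketched as the following terminal commands (the client invocation in terminal 3 is a placeholder, since the exact client command is not shown here):

```shell
# Terminal 1: profile every CUDA process on the machine;
# %p in the output name expands to each process's PID
nvprof --profile-all-processes -o results%p.nvvp

# Terminal 2 (same container): start the TensorRT server
/opt/tensorrtserver/bin/trtserver --model-store=/modelstore \
    --allow-profiling=true --allow-metrics=true --allow-gpu-metrics=true

# Terminal 3: run the client (placeholder; substitute your actual client)
# ./my_trt_client ...
```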

Now, when the client request from step 3 finishes, the client exits normally.
However, the TensorRT server (from step 2) keeps running, and as far as I know the only way to shut it down is to kill it with Ctrl-C. After I do that, terminal 1 reads: ==920== Error: Internal profiling error 4087:35.
I believe this happens because I ended the TensorRT server with Ctrl-C. As a result, when I then end nvprof (with Ctrl-C, as it tells you to), the output is only a 380 KB file, and when opened in nvvp it contains no timeline information whatsoever.

Is there a way to save profiling results when the profiled program is terminated abnormally (via Ctrl-C)? Or is there a workaround for using nvprof with the TensorRT server?

Thanks in advance!

Hi Jinha,

I suspect you ran into the security issue that forces nvprof to disable profiling support for non-root users. For instructions on enabling permissions, please refer to https://developer.nvidia.com/nvidia-development-tools-solutions-ERR_NVGPUCTRPERM-permission-issue-performance-counters. A quick solution is to run as root.
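In concrete terms, the two routes on that page look roughly like the following (the file name nvprof.conf is arbitrary, and the module option is taken from the linked NVIDIA page; double-check there for your driver version):

```shell
# Option 1: quick check - run the profiler as root
sudo nvprof <application>

# Option 2: permanently allow non-root access to GPU performance counters
echo 'options nvidia NVreg_RestrictProfilingToAdminUsers=0' | \
    sudo tee /etc/modprobe.d/nvprof.conf
# then reboot (or unload and reload the nvidia kernel module)
```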

Are you able to profile a simple application otherwise? Please try without the --profile-all-processes option:
$ nvprof <application>
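As a sanity check, one of the CUDA sample programs works well for this (the path below assumes the default CUDA samples location inside the container; adjust it to wherever the samples live on your system):

```shell
# Build and profile the vectorAdd sample as a minimal test case
cd /usr/local/cuda/samples/0_Simple/vectorAdd
make
nvprof ./vectorAdd
```

If this produces a normal profiling summary, the permission setup is fine and the problem is specific to how the server process is being terminated.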

If none of these solutions work, please provide details of your CUDA toolkit version.