Hi. I’m trying to profile the GPU usage of a TensorRT server–client setup.
Here’s what I’m doing:
1. Run
   nvprof --profile-all-processes -o results%p.nvvp
   in one terminal of a Docker container.
2. Start the TensorRT server in a different terminal of the same container.
   → Up to this point, step 1’s nvprof recognizes that a process is running, since its terminal shows:
   NVPROF is profiling process 920, command: /opt/tensorrtserver/bin/trtserver --model-store=/modelstore --allow-profiling=true --allow-metrics=true --allow-gpu-metrics=true
3. Run the TensorRT client in yet another terminal (the full setup is sketched below).
   → Step 3 works fine as well: the correct inference results are shown on its terminal.
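For concreteness, the whole setup looks like this; the client command is only a placeholder, since the problem doesn’t depend on which client I run:

    # Terminal 1: profile every CUDA process on the machine, one output file per PID
    nvprof --profile-all-processes -o results%p.nvvp

    # Terminal 2: the server, exactly as nvprof reports it above
    /opt/tensorrtserver/bin/trtserver --model-store=/modelstore \
        --allow-profiling=true --allow-metrics=true --allow-gpu-metrics=true

    # Terminal 3: any client sending inference requests to the server (placeholder)
    <client binary and arguments>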
Now, when the client request from step 3 finishes, the client exits normally.
However, the TensorRT server (from step 2) is still running, and as far as I know the only way to shut it down is to kill it with Ctrl-C. After I do that, step 1’s terminal reads:
==920== Error: Internal profiling error 4087:35.
I believe this happens because I ended the TensorRT server with Ctrl-C. And because of this, when I then stop nvprof itself (with Ctrl-C, as it tells you to), the result is only a 380 KB file, and when I open it in nvvp it contains no timeline information whatsoever.
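In case it matters for an answer: Ctrl-C just delivers SIGINT to the foreground process, so killing the server from another shell, e.g.

    # same signal as pressing Ctrl-C in the server's terminal
    kill -INT "$(pgrep -f trtserver)"

should go through the same shutdown path; I don’t know of a gentler way to stop trtserver.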
Is there a way to save profiling results when the profiled program exits abnormally (via Ctrl-C)? Or is there a workaround for using nvprof with the TensorRT server?
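One direction I’ve been wondering about, in case it helps frame an answer: since I already start trtserver with --allow-profiling=true, maybe the server can be told to flush the profiling data before I kill it. The following is only a sketch of the idea; I’m assuming the server exposes an HTTP profiling endpoint (/api/profile on the default HTTP port 8000, mapping to cudaProfilerStart/Stop) and that nvprof’s --profile-from-start off can be combined with --profile-all-processes; I haven’t verified either assumption:

    # Terminal 1: only record between explicit profiler start/stop calls
    nvprof --profile-all-processes --profile-from-start off -o results%p.nvvp

    # Terminal 3: bracket the client run with profile start/stop requests
    # (assumed endpoint; the stop request would flush data before I Ctrl-C the server)
    curl "localhost:8000/api/profile?cmd=start"
    # ... run the client as usual ...
    curl "localhost:8000/api/profile?cmd=stop"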
Thanks in advance!