Description
GPU memory used by the tritonserver container grows continuously while deepstream sends inference requests, and the memory is not released when the requests stop. Repeating the start/stop cycle grows usage further each time, until inference eventually becomes impossible. The behavior is the same whether the served model is PyTorch, TensorRT, or ONNX.
Environment
TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
- Run two containers: tritonserver and deepstream.
- Feed real-time CCTV footage into deepstream, which continuously sends inference requests to tritonserver.
- Stop the deepstream inference requests.
- While this runs, the GPU memory consumed by tritonserver increases continuously from the start and is not released when the requests stop.
- Repeating the cycle several times, GPU memory usage keeps growing until inference itself eventually becomes impossible.
- For reference, the same problem occurs whether tritonserver is serving a PyTorch model, a TensorRT model, or an ONNX model.
- The nvidia-smi snapshots below show one such cycle; a minimal client sketch for driving the loop follows the tables.
- Immediately after starting tritonserver:
+--------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1684 G /usr/lib/xorg/Xorg 26MiB |
| 0 N/A N/A 1845 G /usr/bin/gnome-shell 90MiB |
| 0 N/A N/A 2412 G …,WinRetrieveSuggestionsOnlyOnDemand 102MiB |
| 0 N/A N/A 3664 G /usr/lib/xorg/Xorg 317MiB |
| 0 N/A N/A 3781 G /usr/bin/gnome-shell 76MiB |
| 0 N/A N/A 4704 G …6617046,15039575277166925408,262144 144MiB |
| 0 N/A N/A 5407 G …sion,SpareRendererForSitePerProcess 97MiB |
| 0 N/A N/A 14116 C /usr/bin/python 5224MiB |
| 0 N/A N/A 27119 C tritonserver 2708MiB |
+--------------------------------------------------------------------------------------+
- While deepstream is sending inference requests:
+--------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1684 G /usr/lib/xorg/Xorg 26MiB |
| 0 N/A N/A 1845 G /usr/bin/gnome-shell 90MiB |
| 0 N/A N/A 2412 G …,WinRetrieveSuggestionsOnlyOnDemand 102MiB |
| 0 N/A N/A 3664 G /usr/lib/xorg/Xorg 317MiB |
| 0 N/A N/A 3781 G /usr/bin/gnome-shell 76MiB |
| 0 N/A N/A 4704 G …6617046,15039575277166925408,262144 144MiB |
| 0 N/A N/A 5407 G …sion,SpareRendererForSitePerProcess 97MiB |
| 0 N/A N/A 14116 C /usr/bin/python 5224MiB |
| 0 N/A N/A 27119 C tritonserver 2714MiB |
+--------------------------------------------------------------------------------------+
- After the inference requests from deepstream have been stopped:
+--------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1684 G /usr/lib/xorg/Xorg 26MiB |
| 0 N/A N/A 1845 G /usr/bin/gnome-shell 90MiB |
| 0 N/A N/A 2412 G …,WinRetrieveSuggestionsOnlyOnDemand 102MiB |
| 0 N/A N/A 3664 G /usr/lib/xorg/Xorg 317MiB |
| 0 N/A N/A 3781 G /usr/bin/gnome-shell 76MiB |
| 0 N/A N/A 4704 G …6617046,15039575277166925408,262144 144MiB |
| 0 N/A N/A 5407 G …sion,SpareRendererForSitePerProcess 97MiB |
| 0 N/A N/A 14116 C /usr/bin/python 5224MiB |
| 0 N/A N/A 27119 C tritonserver 2758MiB |
+--------------------------------------------------------------------------------------+
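Since the full deepstream pipeline is hard to share, the request loop can be approximated with a plain Triton client. This is a minimal sketch, not the actual pipeline: the model name, input name, shape, and FP32 dtype below are placeholders and must be replaced with the values from the served model's config.pbtxt (assumes `tritonclient[http]` is installed).

```python
# Minimal stand-in for the deepstream client side of the repro.
# NOTE: MODEL_NAME, INPUT_NAME, INPUT_SHAPE, and the FP32 dtype are
# placeholders -- substitute the values from your model's config.pbtxt.
# Requires: pip install "tritonclient[http]" numpy
import numpy as np
import tritonclient.http as httpclient

MODEL_NAME = "my_model"            # placeholder
INPUT_NAME = "input__0"            # placeholder
INPUT_SHAPE = (1, 3, 608, 608)     # placeholder

client = httpclient.InferenceServerClient(url="localhost:8000")

# A random tensor standing in for a decoded CCTV frame.
frame = np.random.rand(*INPUT_SHAPE).astype(np.float32)
inp = httpclient.InferInput(INPUT_NAME, list(INPUT_SHAPE), "FP32")
inp.set_data_from_numpy(frame)

# One "cycle": stream requests for a while, then exit. This is the point
# where tritonserver's GPU memory should drop back but does not.
for _ in range(10_000):
    client.infer(MODEL_NAME, inputs=[inp])
```

Running this loop, stopping it, and re-running it while watching nvidia-smi reproduces the growth shown in the tables above.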
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered
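To capture the growth over time instead of taking manual nvidia-smi snapshots, a small polling script can log tritonserver's per-process memory between cycles. A sketch using nvidia-smi's CSV query mode (the 5-second interval is arbitrary):

```python
# Poll nvidia-smi and log tritonserver's GPU memory so the growth across
# start/stop cycles is captured over time. The flags are standard
# nvidia-smi query options.
import subprocess
import time

while True:
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader"],
        text=True,
    )
    for line in out.splitlines():
        if "tritonserver" in line:
            print(time.strftime("%H:%M:%S"), line.strip())
    time.sleep(5)
```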