Problem with accumulating GPU memory usage in tritonserver

Description

The GPU memory used by the tritonserver process keeps growing while it serves inference requests from a deepstream pipeline, and the memory is not released when the requests stop. Repeating the start/stop cycle keeps increasing the usage until inference is no longer possible. The behavior is the same for PyTorch, TensorRT, and ONNX models.

Environment

TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

- Use two containers, one running tritonserver and one running deepstream
- deepstream receives real-time CCTV footage and continuously sends inference requests to tritonserver
- Stop the inference requests from deepstream
- While the requests are running, the GPU memory used by the tritonserver process keeps growing beyond its initial footprint, and it is not released once the requests stop
- Repeating this start/stop cycle several times keeps increasing GPU memory usage, until inference eventually becomes impossible
- For reference, the same problem occurs regardless of whether tritonserver is serving a PyTorch, TensorRT, or ONNX model
- A minimal sketch of the request loop is shown right after this list, followed by nvidia-smi snapshots of the GPU memory at each stage
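
The following is a minimal standalone sketch of the kind of request loop used for reproduction, not the actual deepstream pipeline. The server URL, model name, tensor names, shape, and data type are placeholders and have to be adapted to the deployed model:

```python
# Hypothetical standalone client that approximates what the deepstream container
# does: it keeps sending inference requests to tritonserver over gRPC for a
# while and then stops. Model name, tensor names, shape, and dtype below are
# placeholders and must match the actual model configuration.
import time

import numpy as np
import tritonclient.grpc as grpcclient

TRITON_URL = "localhost:8001"                      # gRPC endpoint of the tritonserver container
MODEL_NAME = "detector"                            # placeholder model name
INPUT_NAME, OUTPUT_NAME = "INPUT__0", "OUTPUT__0"  # placeholder tensor names
SHAPE = (1, 3, 608, 608)                           # placeholder input shape

client = grpcclient.InferenceServerClient(url=TRITON_URL)

# Continuously request inference for a fixed duration, then stop, mimicking
# one start/stop cycle of the deepstream pipeline.
deadline = time.time() + 60
while time.time() < deadline:
    frame = np.random.rand(*SHAPE).astype(np.float32)   # stand-in for a decoded CCTV frame
    infer_input = grpcclient.InferInput(INPUT_NAME, list(SHAPE), "FP32")
    infer_input.set_data_from_numpy(frame)
    result = client.infer(model_name=MODEL_NAME, inputs=[infer_input])
    _ = result.as_numpy(OUTPUT_NAME)

client.close()
```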

- Immediately after running triton
+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=========================================================================================|
|    0   N/A  N/A      1684      G   /usr/lib/xorg/Xorg                           26MiB |
|    0   N/A  N/A      1845      G   /usr/bin/gnome-shell                         90MiB |
|    0   N/A  N/A      2412      G   …,WinRetrieveSuggestionsOnlyOnDemand        102MiB |
|    0   N/A  N/A      3664      G   /usr/lib/xorg/Xorg                          317MiB |
|    0   N/A  N/A      3781      G   /usr/bin/gnome-shell                         76MiB |
|    0   N/A  N/A      4704      G   …6617046,15039575277166925408,262144        144MiB |
|    0   N/A  N/A      5407      G   …sion,SpareRendererForSitePerProcess         97MiB |
|    0   N/A  N/A     14116      C   /usr/bin/python                             5224MiB |
|    0   N/A  N/A     27119      C   tritonserver                                2708MiB |
+---------------------------------------------------------------------------------------+

- When requesting inference from deepstream
+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=========================================================================================|
|    0   N/A  N/A      1684      G   /usr/lib/xorg/Xorg                           26MiB |
|    0   N/A  N/A      1845      G   /usr/bin/gnome-shell                         90MiB |
|    0   N/A  N/A      2412      G   …,WinRetrieveSuggestionsOnlyOnDemand        102MiB |
|    0   N/A  N/A      3664      G   /usr/lib/xorg/Xorg                          317MiB |
|    0   N/A  N/A      3781      G   /usr/bin/gnome-shell                         76MiB |
|    0   N/A  N/A      4704      G   …6617046,15039575277166925408,262144        144MiB |
|    0   N/A  N/A      5407      G   …sion,SpareRendererForSitePerProcess         97MiB |
|    0   N/A  N/A     14116      C   /usr/bin/python                             5224MiB |
|    0   N/A  N/A     27119      C   tritonserver                                2714MiB |
+---------------------------------------------------------------------------------------+

- When the inference request is stopped in deepstream
+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=========================================================================================|
|    0   N/A  N/A      1684      G   /usr/lib/xorg/Xorg                           26MiB |
|    0   N/A  N/A      1845      G   /usr/bin/gnome-shell                         90MiB |
|    0   N/A  N/A      2412      G   …,WinRetrieveSuggestionsOnlyOnDemand        102MiB |
|    0   N/A  N/A      3664      G   /usr/lib/xorg/Xorg                          317MiB |
|    0   N/A  N/A      3781      G   /usr/bin/gnome-shell                         76MiB |
|    0   N/A  N/A      4704      G   …6617046,15039575277166925408,262144        144MiB |
|    0   N/A  N/A      5407      G   …sion,SpareRendererForSitePerProcess         97MiB |
|    0   N/A  N/A     14116      C   /usr/bin/python                             5224MiB |
|    0   N/A  N/A     27119      C   tritonserver                                2758MiB |
+---------------------------------------------------------------------------------------+
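
For logging the growth across repeated start/stop cycles, a simple polling script along these lines can record the GPU memory used by the tritonserver process (a sketch, assuming nvidia-smi is on the PATH; the 10-second interval is an arbitrary choice):

```python
# Poll nvidia-smi's per-process query and print the GPU memory used by
# tritonserver, so the growth across start/stop cycles can be logged over time.
import subprocess
import time

QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader",
]

while True:
    for line in subprocess.check_output(QUERY, text=True).splitlines():
        if "tritonserver" in line:
            # e.g. "27119, tritonserver, 2758 MiB"
            print(time.strftime("%H:%M:%S"), line.strip())
    time.sleep(10)
```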

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered