Problem with accumulating GPU memory usage in tritonserver

Description

The GPU memory used by the tritonserver process keeps growing while it serves inference requests from a deepstream pipeline, and the memory is not released when the requests stop. Repeating the start/stop cycle keeps increasing the usage until inference is no longer possible. The behavior is the same for PyTorch, TensorRT, and ONNX models.

Environment

TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

- Use two containers, one running tritonserver and one running deepstream
- deepstream receives real-time CCTV footage and continuously sends inference requests to tritonserver
- Stop the inference requests from deepstream
- While the requests are running, the GPU memory used by the tritonserver process keeps growing beyond its initial footprint, and it is not released once the requests stop
- Repeating this start/stop cycle several times keeps increasing GPU memory usage, until inference eventually becomes impossible
- For reference, the same problem occurs regardless of whether tritonserver is serving a PyTorch, TensorRT, or ONNX model
- A minimal sketch of the request loop is shown right after this list, followed by nvidia-smi snapshots of the GPU memory at each stage
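
The following is a minimal standalone sketch of the kind of request loop used for reproduction, not the actual deepstream pipeline. The server URL, model name, tensor names, shape, and data type are placeholders and have to be adapted to the deployed model:

```python
# Hypothetical standalone client that approximates what the deepstream container
# does: it keeps sending inference requests to tritonserver over gRPC for a
# while and then stops. Model name, tensor names, shape, and dtype below are
# placeholders and must match the actual model configuration.
import time

import numpy as np
import tritonclient.grpc as grpcclient

TRITON_URL = "localhost:8001"                      # gRPC endpoint of the tritonserver container
MODEL_NAME = "detector"                            # placeholder model name
INPUT_NAME, OUTPUT_NAME = "INPUT__0", "OUTPUT__0"  # placeholder tensor names
SHAPE = (1, 3, 608, 608)                           # placeholder input shape

client = grpcclient.InferenceServerClient(url=TRITON_URL)

# Continuously request inference for a fixed duration, then stop, mimicking
# one start/stop cycle of the deepstream pipeline.
deadline = time.time() + 60
while time.time() < deadline:
    frame = np.random.rand(*SHAPE).astype(np.float32)   # stand-in for a decoded CCTV frame
    infer_input = grpcclient.InferInput(INPUT_NAME, list(SHAPE), "FP32")
    infer_input.set_data_from_numpy(frame)
    result = client.infer(model_name=MODEL_NAME, inputs=[infer_input])
    _ = result.as_numpy(OUTPUT_NAME)

client.close()
```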

- Immediately after running triton
+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=========================================================================================|
|    0   N/A  N/A      1684      G   /usr/lib/xorg/Xorg                           26MiB |
|    0   N/A  N/A      1845      G   /usr/bin/gnome-shell                         90MiB |
|    0   N/A  N/A      2412      G   …,WinRetrieveSuggestionsOnlyOnDemand        102MiB |
|    0   N/A  N/A      3664      G   /usr/lib/xorg/Xorg                          317MiB |
|    0   N/A  N/A      3781      G   /usr/bin/gnome-shell                         76MiB |
|    0   N/A  N/A      4704      G   …6617046,15039575277166925408,262144        144MiB |
|    0   N/A  N/A      5407      G   …sion,SpareRendererForSitePerProcess         97MiB |
|    0   N/A  N/A     14116      C   /usr/bin/python                             5224MiB |
|    0   N/A  N/A     27119      C   tritonserver                                2708MiB |
+---------------------------------------------------------------------------------------+

- When requesting inference from deepstream
+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=========================================================================================|
|    0   N/A  N/A      1684      G   /usr/lib/xorg/Xorg                           26MiB |
|    0   N/A  N/A      1845      G   /usr/bin/gnome-shell                         90MiB |
|    0   N/A  N/A      2412      G   …,WinRetrieveSuggestionsOnlyOnDemand        102MiB |
|    0   N/A  N/A      3664      G   /usr/lib/xorg/Xorg                          317MiB |
|    0   N/A  N/A      3781      G   /usr/bin/gnome-shell                         76MiB |
|    0   N/A  N/A      4704      G   …6617046,15039575277166925408,262144        144MiB |
|    0   N/A  N/A      5407      G   …sion,SpareRendererForSitePerProcess         97MiB |
|    0   N/A  N/A     14116      C   /usr/bin/python                             5224MiB |
|    0   N/A  N/A     27119      C   tritonserver                                2714MiB |
+---------------------------------------------------------------------------------------+

- When the inference request is stopped in deepstream
+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=========================================================================================|
|    0   N/A  N/A      1684      G   /usr/lib/xorg/Xorg                           26MiB |
|    0   N/A  N/A      1845      G   /usr/bin/gnome-shell                         90MiB |
|    0   N/A  N/A      2412      G   …,WinRetrieveSuggestionsOnlyOnDemand        102MiB |
|    0   N/A  N/A      3664      G   /usr/lib/xorg/Xorg                          317MiB |
|    0   N/A  N/A      3781      G   /usr/bin/gnome-shell                         76MiB |
|    0   N/A  N/A      4704      G   …6617046,15039575277166925408,262144        144MiB |
|    0   N/A  N/A      5407      G   …sion,SpareRendererForSitePerProcess         97MiB |
|    0   N/A  N/A     14116      C   /usr/bin/python                             5224MiB |
|    0   N/A  N/A     27119      C   tritonserver                                2758MiB |
+---------------------------------------------------------------------------------------+
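
For logging the growth across repeated start/stop cycles, a simple polling script along these lines can record the GPU memory used by the tritonserver process (a sketch, assuming nvidia-smi is on the PATH; the 10-second interval is an arbitrary choice):

```python
# Poll nvidia-smi's per-process query and print the GPU memory used by
# tritonserver, so the growth across start/stop cycles can be logged over time.
import subprocess
import time

QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader",
]

while True:
    for line in subprocess.check_output(QUERY, text=True).splitlines():
        if "tritonserver" in line:
            # e.g. "27119, tritonserver, 2758 MiB"
            print(time.strftime("%H:%M:%S"), line.strip())
    time.sleep(10)
```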

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered