Triton server memory accumulation problem

Description

GPU memory used by Triton Inference Server does not return to its pre-request level after inference requests from DeepStream 6.1 complete. Each round of requests leaves a little more memory allocated (for example, 5520 MiB at startup vs. 5680 MiB after the first round), so usage gradually accumulates.

Environment

TensorRT Version: 8.6.1.6
GPU Type: NVIDIA GeForce RTX 2080
Nvidia Driver Version: 525.105.17
CUDA Version: 12.0
CUDNN Version:
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): Container (nvcr.io/nvidia/tritonserver:23.10-py3)

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Immediately after starting Triton:
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
| 35%   30C    P2    55W / 260W |   5520MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

While the Triton server is receiving and processing a request:
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
| 35%   31C    P2    65W / 260W |   8176MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

After the Triton request has completed:
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
| 35%   31C    P2    55W / 260W |   5680MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

I am using Triton 23.10.
Right after starting tritonserver, nvidia-smi reports about 5520 MiB of GPU memory in use.
While DeepStream 6.1 is sending inference requests to tritonserver, usage rises to about 8176 MiB.
However, once the requests have completely finished, nvidia-smi shows 5680 MiB rather than the original 5520 MiB, and repeating the requests and checking nvidia-smi each time shows the leftover growing further.
In other words, wasted memory gradually accumulates with every round of requests.
I would appreciate any advice on how to resolve this.
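
For reference, here is a minimal watcher sketch that logs used GPU memory over time so the per-request leftover can be quantified. It assumes the nvidia-ml-py (pynvml) package and GPU index 1 as in the nvidia-smi output above (adjust the index to the actual device):

import time
import pynvml

pynvml.nvmlInit()
# GPU index 1 matches the nvidia-smi output above; change if needed.
handle = pynvml.nvmlDeviceGetHandleByIndex(1)

baseline = None
try:
    while True:
        used_mib = pynvml.nvmlDeviceGetMemoryInfo(handle).used / (1024 ** 2)
        if baseline is None:
            baseline = used_mib
        # Report current usage and the drift from the first sample.
        print(f"used: {used_mib:8.0f} MiB   drift: {used_mib - baseline:+8.0f} MiB")
        time.sleep(5)
finally:
    pynvml.nvmlShutdown()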

Triton was started with:

docker run --gpus device=3 -d --name st_model_convert_always --restart=always \
  --net=host \
  -v /home/users/asd/model:/models \
  nvcr.io/nvidia/tritonserver:23.10-py3 tritonserver --model-repository=/models
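
As a diagnostic experiment only (not a confirmed fix), the same command can be run with Triton's CUDA and pinned memory pools capped explicitly. The byte sizes below are illustrative assumptions, and whether this changes the accumulation depends on where the growth actually comes from; inside the container the single exposed GPU is device 0, hence the 0: prefix:

docker run --gpus device=3 -d --name st_model_convert_always --restart=always \
  --net=host \
  -v /home/users/asd/model:/models \
  nvcr.io/nvidia/tritonserver:23.10-py3 tritonserver --model-repository=/models \
    --cuda-memory-pool-byte-size=0:268435456 \
    --pinned-memory-pool-byte-size=268435456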

The inference requests were sent from DeepStream 6.1.
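
For completeness, here is a standalone client sketch that reproduces the request loop outside DeepStream. The model name my_model, the input name INPUT__0, and the input shape are placeholders for the actual model configuration; it assumes the tritonclient[http] package and the default HTTP port 8000 (reachable via --net=host):

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

for i in range(100):
    # Placeholder input: adjust name, shape, and dtype to the deployed model.
    data = np.random.rand(1, 3, 224, 224).astype(np.float32)
    inp = httpclient.InferInput("INPUT__0", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    client.infer(model_name="my_model", inputs=[inp])
    print(f"request {i + 1} done")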

Hi @dbdnjswns2,
We request you to raise this on the Triton Inference Server issue tracker: Issues · triton-inference-server/server · GitHub

Thanks