Random spikes in RAM while using Triton Inference

Description

We are observing spikes in RAM usage (~40 GB) while using Triton Inference Server. Our pipeline includes Python backend models (CPU and GPU) and TensorRT models, and we also use BLS (Business Logic Scripting).

Environment

Triton version: 22.12 (nvcr.io/nvidia/tritonserver:22.12-py3)
Python Backend version: r21.08
TensorRT Version: 8.5.1
GPU Type: GeForce RTX 2080 SUPER
Nvidia Driver Version: 510.108.03
CUDA Version: 11.8
CUDNN Version: 8.7.0 GA
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.7

Relevant Files

Steps To Reproduce

We could not identify the exact cause of the spikes, so we cannot provide steps to reproduce. The issue occurs mostly during long runs (8-12 hours) of Triton Server.
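Since the spikes only show up on long runs, it may help to log the server's resident memory over time so the spike can be correlated with request activity. Below is a minimal sketch of such a poller, assuming Linux (it reads `/proc/<pid>/status`); the function names and log format are our own, not part of Triton, and you would substitute the real tritonserver PID.

```python
# Minimal RSS poller (sketch): samples the resident set size of a process
# periodically so memory spikes can be timestamped. Assumes Linux /proc;
# parse_vmrss_kb and poll_rss are hypothetical helpers, not Triton APIs.
import re
import time


def parse_vmrss_kb(status_text: str) -> int:
    """Extract resident set size in kB from /proc/<pid>/status contents."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS line not found")
    return int(match.group(1))


def poll_rss(pid: int, interval_s: float = 5.0) -> None:
    """Print a timestamped RSS sample every interval_s seconds until interrupted."""
    path = f"/proc/{pid}/status"
    while True:
        with open(path) as f:
            rss_kb = parse_vmrss_kb(f.read())
        print(f"{time.time():.0f} rss_mb={rss_kb / 1024:.1f}", flush=True)
        time.sleep(interval_s)
```

Running this alongside the server for the full 8-12 hour window (e.g. `poll_rss(<tritonserver_pid>)`) would at least narrow down whether the ~40 GB spike is gradual growth or a sudden jump tied to a particular request pattern.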

Hi,

We recommend reaching out to Issues · triton-inference-server/server · GitHub to get better help on Triton-related issues.

Thank you.
