Please provide the following information when requesting support.
Hardware - GPU (T4)
Operating System - Container-Optimized OS with containerd (cos_containerd) (on a k8s cluster)
Riva Version - 2.8.1
I’m experimenting with a deployment on a K8s cluster, using Helm charts. I’m only using a subset of the models, and once they’re loaded and Triton starts, RAM (not vRAM) usage sits at around 5.8 GB.
After I send requests for offline ASR, RAM usage spikes, drops a bit, then plateaus at about 8.8 GB and never goes back down to 5.8 GB.
The following image shows the Triton RAM metric nv_cpu_memory_used_bytes: you can see the spike, then the plateau at 8.8 GB.
- Is this Triton caching the results locally in RAM?
- If so, why are all the cache-related metrics zero when I inspect them? For example, nv_cache_num_hits_per_model and nv_cache_num_misses_per_model are both zero, even when I send the same file for transcription repeatedly.
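For reference, this is roughly how I’m checking the counters: I pull the Prometheus-format text from Triton’s metrics endpoint (port 8002 by default in my deployment) and filter the nv_cache_* lines. The sample text below is illustrative output I typed in, not a real capture, and the model label is made up:

```python
# Sketch of how I inspect the cache metrics. In the real setup I fetch
# http://localhost:8002/metrics (Triton's default metrics port); here I
# use a hand-written sample in the same Prometheus text format.
sample = """\
nv_cpu_memory_used_bytes 9448000000
nv_cache_num_hits_per_model{model="riva-asr",version="1"} 0
nv_cache_num_misses_per_model{model="riva-asr",version="1"} 0
"""

def parse_metrics(text):
    """Map each metric line (name plus labels) to its float value."""
    out = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE lines
            continue
        name, _, value = line.rpartition(" ")
        out[name] = float(value)
    return out

metrics = parse_metrics(sample)
cache_counts = {k: v for k, v in metrics.items() if k.startswith("nv_cache_")}
print(cache_counts)  # in my case, every nv_cache_* value stays at 0
```

Even after repeated identical requests, every nv_cache_* value I see this way is still zero, which is what makes me doubt the plateau is the response cache.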