RAM usage doesn't go down for ASR. Could it be because of Triton's requests cache? If so, why don't the metrics reflect that?


Hardware - GPU (T4)
Operating System - Container-Optimized OS with containerd (cos_containerd) (on a k8s cluster)
Riva Version 2.8.1

I’m experimenting with a deployment on a K8s cluster, using Helm charts. I’m only using a subset of the models, and once they’re loaded and Triton starts, RAM (not VRAM) usage sits at around 5.8 GB.

After I send requests for offline ASR, RAM usage spikes, drops, and then plateaus at about 8.8 GB. It never goes back down to 5.8 GB.

The following image shows the Triton metric for RAM consumption, nv_cpu_memory_used_bytes: the RAM spike, then the plateau at 8.8 GB.

  1. Is Triton caching the results locally in RAM?
  2. If so, why are all the cache-related metrics zero? Both nv_cache_num_hits_per_model and nv_cache_num_misses_per_model stay at zero, even when I send the same file for transcription again.
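
For anyone who wants to reproduce the check in question 2, here is a minimal sketch of how the three metrics named above could be read from Triton's Prometheus-format metrics endpoint. The URL (`http://localhost:8002/metrics`) is an assumption based on Triton's default metrics port and may differ in a Riva Helm deployment; the helper names `parse_metrics` and `fetch_metrics` are made up for this example.

```python
import urllib.request

# Metric names taken from the post above.
WATCHED = (
    "nv_cpu_memory_used_bytes",
    "nv_cache_num_hits_per_model",
    "nv_cache_num_misses_per_model",
)


def parse_metrics(text, names=WATCHED):
    """Sum the samples of each watched metric found in Prometheus text output.

    Metrics that never appear in the text are left out of the result, so a
    missing metric can be told apart from one that is reported as zero.
    """
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line and "}" in line:
            # Labeled sample: name{label="x"} value
            name = line.split("{", 1)[0]
            rest = line[line.rindex("}") + 1:].split()
        else:
            # Unlabeled sample: name value
            parts = line.split()
            name, rest = parts[0], parts[1:]
        if name in names and rest:
            totals[name] = totals.get(name, 0.0) + float(rest[0])
    return totals


def fetch_metrics(url="http://localhost:8002/metrics"):
    """Download the raw metrics text. The default URL is an assumption."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

Calling `parse_metrics(fetch_metrics())` before and after sending the same audio file twice would show whether the hit and miss counters move at all. If the cache metrics are absent from the result or stay at zero while `nv_cpu_memory_used_bytes` stays high, that would suggest the plateau is not coming from the cache those counters track.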

Hi @hkhairy

Thanks for your interest in Riva.

I will check with the Riva team regarding this query and get back to you.



I have noticed the same thing: riva_server never gets killed. I am calling end(), and the process is never killed.