Please provide the following information when requesting support.
Hardware - GPU (T4)
Operating System - Container-Optimized OS with containerd (cos_containerd) (on a k8s cluster)
Riva Version - 2.8.1
I’m experimenting with a deployment on a K8s cluster, using Helm charts. I’m only using a subset of the models, and once they’re loaded and Triton starts, RAM (not vRAM) usage sits at around 5.8 GB.
After I send requests for offline ASR, RAM usage spikes, drops a bit, then plateaus at about 8.8 GB and never goes back down to 5.8 GB.
The following image shows the Triton RAM metric nv_cpu_memory_used_bytes: you can see the spike, then the plateau at 8.8 GB.
- Is this Triton caching the results locally in RAM?
- If so, why are all the cache-related metrics zero when I inspect them? For example, nv_cache_num_hits_per_model and nv_cache_num_misses_per_model are both zero, even when I send the same file for transcription repeatedly.
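For reference, this is roughly how I’m checking the counters: I pull the Prometheus-format text from Triton’s metrics endpoint (port 8002 by default in my deployment) and filter the nv_cache_* lines. The sample text below is illustrative output I typed in, not a real capture, and the model label is made up:

```python
# Sketch of how I inspect the cache metrics. In the real setup I fetch
# http://localhost:8002/metrics (Triton's default metrics port); here I
# use a hand-written sample in the same Prometheus text format.
sample = """\
nv_cpu_memory_used_bytes 9448000000
nv_cache_num_hits_per_model{model="riva-asr",version="1"} 0
nv_cache_num_misses_per_model{model="riva-asr",version="1"} 0
"""

def parse_metrics(text):
    """Map each metric line (name plus labels) to its float value."""
    out = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE lines
            continue
        name, _, value = line.rpartition(" ")
        out[name] = float(value)
    return out

metrics = parse_metrics(sample)
cache_counts = {k: v for k, v in metrics.items() if k.startswith("nv_cache_")}
print(cache_counts)  # in my case, every nv_cache_* value stays at 0
```

Even after repeated identical requests, every nv_cache_* value I see this way is still zero, which is what makes me doubt the plateau is the response cache.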