Hardware - GPU (A10G 24 GB)
Hardware - CPU (32 vCPUs)
Operating System - Linux/Ubuntu
Riva Version - 2.19
Hi. I’m currently testing streaming diarization with the Riva Quick Start and its default configuration. I’m running performance tests that load Riva with different numbers of parallel requests. I noticed that once the Riva server is up, it uses 11.6 GB of VRAM. When I increase the number of parallel requests to 85, VRAM usage rises to 13.35 GB. I hadn’t seen this kind of growth before: previously, Riva stayed within the memory it reserved at startup. Can anyone tell me whether this is expected behavior, or whether it can be avoided?
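For scale, the two readings above can be turned into a rough per-request figure. This is a minimal sketch that assumes the growth is linear in the number of concurrent streams and that "GB" means GiB (1024 MiB, the unit `nvidia-smi` reports in); neither assumption has been verified, and the function name is hypothetical.

```python
def growth_per_request_mib(baseline_gb: float, loaded_gb: float, n_requests: int) -> float:
    """Estimate extra VRAM per concurrent request in MiB.

    Assumes memory grows linearly with the number of parallel streams and
    that the GB figures are GiB (1 GiB = 1024 MiB). Both are assumptions.
    """
    extra_mib = (loaded_gb - baseline_gb) * 1024
    return extra_mib / n_requests


# Readings from this post: 11.6 GB idle, 13.35 GB with 85 parallel requests.
print(f"{growth_per_request_mib(11.6, 13.35, 85):.2f} MiB per request")
```

With these numbers the estimate is roughly 21 MiB per stream. Measuring at a few intermediate concurrency levels would show whether the linear assumption actually holds.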
To deploy the service, I use the original Riva Quick Start scripts: Riva Skills Quick Start | NVIDIA NGC
In config.sh, I changed the following lines:
asr_acoustic_model=("parakeet_1.1b")
asr_language_code=("en-US")
asr_accessory_model=("diarizer")
use_asr_greedy_decoder=false
I run the tests following this guide: Performance — NVIDIA Riva
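To capture the peak VRAM during a test run, rather than a single reading afterwards, one option is to poll `nvidia-smi` in a background thread while the benchmark runs. This is a minimal sketch, not part of the Riva tooling: `VramSampler`, `query_memory_used`, and `parse_memory_used` are hypothetical helpers, and it assumes `nvidia-smi` is on the PATH and that Riva runs on GPU index 0.

```python
import subprocess
import threading


def parse_memory_used(output: str) -> list[int]:
    """Parse `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    output (one integer MiB value per GPU line) into a list of ints."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def query_memory_used(gpu_index: int = 0) -> int:
    """Return the current memory.used value (MiB) for one GPU."""
    out = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_used(out)[0]


class VramSampler:
    """Poll GPU memory in a background thread and record every sample (MiB)."""

    def __init__(self, gpu_index: int = 0, interval_s: float = 0.5):
        self.gpu_index = gpu_index
        self.interval_s = interval_s
        self.samples: list[int] = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Sample until stopped; Event.wait doubles as an interruptible sleep.
        while not self._stop.is_set():
            self.samples.append(query_memory_used(self.gpu_index))
            self._stop.wait(self.interval_s)

    def __enter__(self) -> "VramSampler":
        self._thread.start()
        return self

    def __exit__(self, *exc) -> None:
        self._stop.set()
        self._thread.join()

    @property
    def peak_mib(self) -> int:
        return max(self.samples) if self.samples else 0
```

Usage would be to wrap the benchmark call in `with VramSampler() as s:` and print `s.peak_mib` afterwards. Running this once per concurrency level makes it easy to see whether memory returns to the baseline after the load ends, which helps distinguish a genuine leak from a pool that grows to a new steady size.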