Nvidia Riva handling Concurrent requests

sdm.amansehgal · May 6, 2022, 11:13am

Hardware - GPU: T4 (AWS G4dn, Linux AMI)
Hardware - CPU:
Operating System: Ubuntu
Riva Version: 1.10
TLT Version (if relevant):

I am trying to process concurrent audio streams with Nvidia Riva for transcription. Currently I use one 16 GB G4dn.large AMI instance for development. Can someone tell me how many concurrent requests it can handle, or what the limit is? Also, is there a way to optimize the Riva server for multiple threads?

rvinobha · May 12, 2022, 4:58pm

Hi @sdm.amansehgal

Thanks for your interest in Riva.

Please see the link below:

https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-performance.html#results

At that link you will find a results table. Click the T4 tab (which matches your GPU), then choose the model and language you would like to use to see the performance figures: number of streams, latency (ms), and throughput (RTFX). This will give you a rough guide to how many concurrent streams the GPU can handle.
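To compare the published figures against your own instance, something like the following minimal sketch could be used to run N requests at once and report average latency and throughput. It assumes RTFX means seconds of audio processed per second of wall-clock time. `fake_transcribe` is a hypothetical placeholder, not a Riva API; swap in your own client call that blocks until the transcript is returned.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_transcribe(audio_seconds: float) -> float:
    """Hypothetical stand-in for one ASR request; returns its latency in seconds."""
    start = time.perf_counter()
    time.sleep(0.01)  # placeholder for the real, blocking client call
    return time.perf_counter() - start


def measure(num_streams: int, audio_seconds: float, transcribe=fake_transcribe) -> dict:
    """Send num_streams requests concurrently and summarise latency and throughput."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_streams) as pool:
        latencies = list(pool.map(transcribe, [audio_seconds] * num_streams))
    wall = time.perf_counter() - wall_start
    return {
        "streams": num_streams,
        "avg_latency_ms": 1000 * sum(latencies) / len(latencies),
        # Assumed definition: total audio duration divided by wall-clock time.
        "rtfx": num_streams * audio_seconds / wall,
    }


if __name__ == "__main__":
    # Increase the stream count until latency is no longer acceptable.
    for n in (1, 4, 16):
        print(measure(n, audio_seconds=5.0))
```

Raising the stream count step by step and watching where latency starts to climb gives a practical limit for your own hardware and model, which may differ from the published table.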
