NVIDIA Riva: handling concurrent requests

Hardware - GPU T4 AWS G4dn linux AMI
Hardware - CPU
Operating System Ubuntu
Riva Version 1.10
TLT Version (if relevant)

I am trying to process concurrent audio streams with NVIDIA Riva for transcription. I currently use one 16 GB G4dn.large AMI instance for development. Can someone tell me how many concurrent requests it can handle, or what the limit is? Also, is there a way to optimize multithreading for the Riva server?

Hi @sdm.amansehgal

Thanks for your interest in Riva.

Please see the link below:

https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-performance.html#results

At the link you will find a table. Click the T4 tab (which matches your GPU), then choose the model and language you would like to use to see the performance figures: number of streams, latency (ms), and throughput (RTFX). This will give you a rough guide.
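Since the published figures are only a rough guide, it may also help to measure your own setup. Below is a minimal sketch of a load-test harness in Python, not an official Riva tool. It assumes RTFX means seconds of audio processed per second of wall-clock time, and it reports whole-request time as "latency", which may differ from how the performance page defines latency. The names `fake_transcribe` and `run_load_test` are hypothetical; `fake_transcribe` only sleeps to simulate work and would need to be replaced with a real call to your Riva client.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_transcribe(audio_seconds: float) -> float:
    """Hypothetical stand-in for one transcription request.

    Sleeps for a fraction of the audio duration and returns the elapsed
    time in seconds. Replace with a real client call to test a server.
    """
    start = time.perf_counter()
    time.sleep(audio_seconds * 0.01)  # assumption: pretend work is ~100x real time
    return time.perf_counter() - start


def run_load_test(num_streams: int, audio_seconds: float, transcribe=fake_transcribe):
    """Run num_streams requests in parallel threads and summarize the results."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_streams) as pool:
        # Each worker handles one simulated stream of the same duration.
        latencies = list(pool.map(transcribe, [audio_seconds] * num_streams))
    wall = time.perf_counter() - wall_start
    return {
        "streams": num_streams,
        "avg_latency_ms": 1000 * sum(latencies) / len(latencies),
        "max_latency_ms": 1000 * max(latencies),
        # Assumed RTFX definition: audio seconds processed per wall-clock second.
        "rtfx": num_streams * audio_seconds / wall,
    }


if __name__ == "__main__":
    # Increase the stream count step by step and watch where latency degrades.
    for n in (1, 4, 16):
        print(run_load_test(n, audio_seconds=5.0))
```

Raising the stream count until latency stops being acceptable gives an empirical concurrency limit for your instance, which you can then compare against the T4 numbers in the table.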