I am trying to process concurrent audio streams using NVIDIA Riva for transcription. Currently I use one 16 GB G4Dn.large AMI instance for development. Can someone tell me how many concurrent requests it can handle, or what the limit is? Also, is there a way to optimize the Riva server for multiple threads?

In the link you will find the table. Click on the T4 tab (which will be your GPU), then choose the model and language that you would like to use to get the performance figures (number of streams, latency (ms), throughput (RTFX)). This will give you a rough guide.
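
If you want to sanity-check the table against your own instance, here is a rough Python sketch of the arithmetic. It assumes RTFX means seconds of audio transcribed per second of wall-clock time, so a real-time stream consumes about 1.0 RTFX. The function names, the `headroom` factor, and the sample numbers are all made up for illustration and are not part of Riva or its benchmark results.

```python
import math


def rtfx(audio_seconds: float, wall_seconds: float) -> float:
    """Throughput as seconds of audio processed per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def estimate_realtime_streams(throughput_rtfx: float, headroom: float = 0.5) -> int:
    """Rough upper bound on concurrent real-time streams.

    Assumption: each real-time stream needs about 1.0 RTFX. `headroom` is a
    hypothetical safety factor that leaves spare capacity for latency spikes.
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return math.floor(throughput_rtfx * headroom)


if __name__ == "__main__":
    # Hypothetical measurement: 600 s of audio transcribed in 10 s of wall time.
    measured = rtfx(audio_seconds=600.0, wall_seconds=10.0)
    print(f"measured RTFX: {measured}")
    print(f"estimated streams: {estimate_realtime_streams(measured)}")
```

Treat the result as a ceiling only. Latency usually degrades before throughput is exhausted, so the stream count listed in the table at your target latency is the more useful figure.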