NVIDIA Riva fails to transcribe full audio chunks

Hardware - GPU (RTX 3090)
Hardware - CPU
Operating System Ubuntu 20.04
Riva Version 2.9.0

I am using silero-vad to split a long audio file into small chunks (15 s max) and transcribing those chunks with Riva. It fails to transcribe the complete audio: in a few chunks, it misses several words (4-6 words or more).
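For reference, the chunking step described above can be sketched as merging consecutive VAD speech segments until a length cap is reached. This is a minimal sketch, not the poster's code: it assumes the VAD output has already been converted to `(start, end)` pairs in seconds (silero-vad's own output format is not shown in the thread), and `group_segments` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds), assumed VAD output


def group_segments(segments: List[Segment], max_len: float = 15.0) -> List[Segment]:
    """Merge consecutive speech segments into chunks of at most max_len seconds.

    Hypothetical helper. A single segment longer than max_len is hard-split,
    which may cut a word in half at the split point.
    """
    chunks: List[Segment] = []
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None
    for start, end in segments:
        # Hard-split any segment that exceeds the limit on its own.
        while end - start > max_len:
            if cur_start is not None:
                chunks.append((cur_start, cur_end))
                cur_start = cur_end = None
            chunks.append((start, start + max_len))
            start += max_len
        if cur_start is None:
            cur_start, cur_end = start, end
        elif end - cur_start <= max_len:
            cur_end = end  # segment still fits: extend the current chunk
        else:
            chunks.append((cur_start, cur_end))  # close chunk, start a new one
            cur_start, cur_end = start, end
    if cur_start is not None:
        chunks.append((cur_start, cur_end))
    return chunks
```

Each `(start, end)` pair can then be used to slice the waveform, e.g. `audio[int(start * sr):int(end * sr)]`, before sending it to Riva. Passing `max_len=7.0` corresponds to the shorter chunks mentioned later in the thread.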

How can I solve this issue so that a long audio file is transcribed completely?

Hi @shihab2,

Thanks for your interest in Riva.

Can you share with us:

  1. Details about the ASR model used for inference
  2. The config.sh used, or the riva-build and riva-deploy commands used (whichever is applicable)
  3. Audio samples and the expected transcription for each sample

Thanks

I figured out the problem: it was caused by the LM. In a few chunks, transcription failed because of the probability threshold (beam_threshold=20). I reduced that value and kept all chunks at around 7 s, which solved my problem.
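To illustrate what a beam threshold does during LM-weighted decoding, here is a toy sketch of threshold-based beam pruning. It assumes the common Flashlight-style behaviour, where any hypothesis whose log-score falls more than `beam_threshold` below the current best is discarded and the rest are capped at `beam_size`; it is not Riva's decoder code, and `prune_beam` and the scores are hypothetical.

```python
from typing import Dict, List


def prune_beam(hyps: Dict[str, float], beam_threshold: float, beam_size: int) -> List[str]:
    """Toy beam pruning: keep hypotheses within beam_threshold of the best
    log-score, sorted best-first, and at most beam_size of them.

    Hypothetical helper illustrating the idea; not Riva's implementation.
    """
    if not hyps:
        return []
    best = max(hyps.values())
    # Drop any hypothesis scoring more than beam_threshold below the best.
    survivors = [h for h, s in hyps.items() if s >= best - beam_threshold]
    survivors.sort(key=lambda h: hyps[h], reverse=True)
    return survivors[:beam_size]
```

With scores `{"a": -1.0, "b": -5.0, "c": -30.0}`, a threshold of 20 keeps `a` and `b`, while a threshold of 3 keeps only `a`. The threshold therefore decides which partial transcriptions survive to later steps, and that choice changes the final output.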

Thanks @shihab2 for your kind feedback.
Really appreciate it.
