Hardware - GPU (RTX 3090)
Hardware - CPU
Operating System - Ubuntu 20.04
Riva Version - 2.9.0
I am using silero-vad to split a large audio file into small chunks (15 s max) and running inference on those chunks with Riva. It fails to transcribe the complete audio: in a few chunks, it misses several words (4-6 or more).
How can I solve this issue so that I can transcribe a long audio file?
Thanks for your interest in Riva.
Can you share the following with us:
- Details about the ASR Model used for inference
- config.sh used or riva-build and riva-deploy command used (whichever applicable)
- Audio samples and the expected transcription for each sample
I figured out the problem: it was caused by the language model (LM). In a few chunks, words were dropped because of the probability threshold (beam_threshold=20). I reduced the value and kept all chunks at around 7 s, which solved my problem.
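For anyone who wants to reproduce the chunking part, here is a minimal sketch of how VAD speech segments could be grouped into chunks of at most about 7 s. It is not my exact code. It assumes the segments are a list of dicts with `start` and `end` given in samples (the shape silero-vad's `get_speech_timestamps` returns) and a 16 kHz sampling rate; `group_segments` and `max_seconds` are hypothetical names used only for illustration.

```python
from typing import Dict, List

SAMPLE_RATE = 16000  # assumed sampling rate in Hz


def group_segments(
    segments: List[Dict[str, int]],
    max_seconds: float = 7.0,
    sample_rate: int = SAMPLE_RATE,
) -> List[Dict[str, int]]:
    """Merge consecutive speech segments into chunks of at most max_seconds."""
    max_samples = int(max_seconds * sample_rate)
    chunks: List[Dict[str, int]] = []
    current = None  # chunk currently being built

    for seg in segments:
        start, end = seg["start"], seg["end"]

        # A single segment longer than the limit is cut into fixed-size pieces.
        # Note: a hard cut like this can land in the middle of a word.
        while end - start > max_samples:
            if current is not None:
                chunks.append(current)
                current = None
            chunks.append({"start": start, "end": start + max_samples})
            start += max_samples

        if current is None:
            current = {"start": start, "end": end}
        elif end - current["start"] <= max_samples:
            # Segment still fits: extend the current chunk (silence included).
            current["end"] = end
        else:
            # Segment does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = {"start": start, "end": end}

    if current is not None:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    demo = [
        {"start": 0, "end": 16000},
        {"start": 20000, "end": 60000},
        {"start": 70000, "end": 120000},
    ]
    for chunk in group_segments(demo):
        print(chunk, (chunk["end"] - chunk["start"]) / SAMPLE_RATE, "s")
```

Each resulting chunk can then be sliced out of the waveform with `audio[chunk["start"]:chunk["end"]]` and sent to Riva as a separate request.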
Thanks @shihab2 for your kind feedback