How to reduce noise in ASR transcriptions?

Hardware - GPU: T4
Operating System: Ubuntu 20.04
Riva Version: 2.10

I'm using the CTC Conformer models in Spanish (es-US) for streaming recognition over a telephone line. However, when there is background noise, spurious words appear in the ASR transcriptions. The release notes mention an option to use a neural-based voice activity detector to avoid this problem. How can I use it? Is there any other way to suppress the noise without fine-tuning?


I will check with the internal team whether noise reduction is possible.


Hi @nharo

We do have a Noise Robust es-US 3.1 model (link below). Could you confirm whether you are already using this model?


Yes, that is the model I am using, but noise still affects the transcriptions. Sometimes it transcribes background noise when no one is speaking, and other times it transcribes both the noise and the user's speech. Is there a filter or some kind of score available for streaming recognition?
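
To illustrate the kind of score-based filter being asked about, here is a minimal client-side sketch. It is not a Riva feature and makes no Riva calls: `Hypothesis`, `filter_hypotheses`, `min_confidence`, and `min_chars` are hypothetical names, and it assumes each streaming result can be reduced to a transcript, a per-result confidence score, and a final/interim flag. The score scale and the 0.5 threshold are also assumptions and would need to be calibrated against real call audio.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hypothesis:
    """Hypothetical container for one streaming ASR result."""
    transcript: str
    confidence: float  # assumed: higher means more reliable
    is_final: bool


def filter_hypotheses(
    hyps: Iterable[Hypothesis],
    min_confidence: float = 0.5,  # assumed threshold, tune on real data
    min_chars: int = 2,           # drop empty or one-character outputs
) -> List[str]:
    """Keep only final transcripts that pass the score and length checks."""
    kept = []
    for h in hyps:
        if not h.is_final:
            continue  # interim results are unstable, skip them
        text = h.transcript.strip()
        if len(text) < min_chars:
            continue  # likely a noise artifact
        if h.confidence < min_confidence:
            continue  # low score, likely spurious
        kept.append(text)
    return kept


if __name__ == "__main__":
    results = [
        Hypothesis("hola buenos días", 0.92, True),
        Hypothesis("eh", 0.10, True),       # low score
        Hypothesis("quiero", 0.95, False),  # interim result
    ]
    print(filter_hypotheses(results))
```

A filter like this only removes whole low-scoring results. It cannot separate noise words from user speech inside a single utterance, so it addresses the "noise when no one is speaking" case better than the mixed case.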