How to reduce noise in ASR transcriptions?

Please provide the following information when requesting support.

Hardware - GPU: T4
Operating System: Ubuntu 20.04
Riva Version: 2.10

I'm using the Conformer-CTC models in Spanish (es-US) for streaming recognition over a telephone line. When there is background noise, spurious words appear in the ASR transcriptions. The release notes mention an option to use a neural-based voice activity detector to avoid this problem. How can I use it? Is there any other way to suppress the noise without fine-tuning?

Thanks.


Hi @nharo

Sincere apologies for the delay.

I will check with the internal team whether noise reduction is possible.

Thanks

Hi @nharo

We do have a noise-robust es-US 3.1 model (link below). Could you confirm whether you are already using this model?

Thanks

Yes, that is the model I am using, but noise still affects the transcriptions. Sometimes it transcribes background noise when no one is speaking, and other times it transcribes both the noise and the user's speech. Is there a filter or some kind of confidence score available for streaming recognition?

Any news on this? I'm in the same situation.

No, we are still using the same model. We have considered fine-tuning, but it's a lot of work.

Same problem here. Is there anything new? Could a model like vad_telephony_marblenet solve the problem? Or is vad_telephony_marblenet outdated and not trained for Spanish?
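
As a stopgap for the "is there a filter" question raised in this thread, one generic option is a client-side energy gate that replaces low-energy frames with digital silence before the audio is streamed to the recognizer. The sketch below is not Riva's neural VAD and does not use any Riva API. The function names, the 8 kHz sample rate, the frame length, and the threshold are all assumptions for illustration, and the threshold in particular would need tuning on real call recordings. It assumes 16-bit mono PCM in the machine's native byte order (little-endian on typical x86 servers).

```python
import math
from array import array

SAMPLE_RATE = 8000      # assumed telephony sample rate (Hz)
FRAME_MS = 20           # assumed frame length (ms)
RMS_THRESHOLD = 500.0   # assumed energy threshold; tune on real recordings
HANGOVER_FRAMES = 10    # frames passed through after energy drops, to avoid clipping word endings


def frame_rms(frame: bytes) -> float:
    """Return the RMS amplitude of a frame of 16-bit native-endian PCM samples."""
    samples = array("h")
    samples.frombytes(frame)  # raises ValueError if len(frame) is odd
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gate_frames(frames, threshold=RMS_THRESHOLD, hangover=HANGOVER_FRAMES):
    """Yield each frame unchanged while energy is at or above the threshold
    (or within the hangover window), otherwise yield zeros of the same length."""
    remaining = 0
    for frame in frames:
        if frame_rms(frame) >= threshold:
            remaining = hangover
            yield frame
        elif remaining > 0:
            remaining -= 1
            yield frame
        else:
            yield bytes(len(frame))  # digital silence keeps stream timing intact
```

The gated frames can then be passed to whatever streaming client is already in use. Emitting silence rather than dropping frames keeps word timestamps aligned with the original call. An energy gate only helps when the noise is quieter than the speech; loud background noise would still pass through, which is where a trained VAD model would be needed.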