Please provide the following information when requesting support.
Hardware - T4
Hardware - CPU x86_64
Operating System - Debian GNU/Linux 11 (bullseye)
Riva Version: 2.12.1
TLT Version (if relevant)
How to reproduce the issue ? (This is for errors. Please share the command and the detailed log here)
Hi! I’m building a speech-recognition pipeline using the en-US Conformer model with the high-throughput configuration.
I’m also using the MarbleNet VAD, and I have set the endpointing stop_history to 2000 ms.
I have a few questions/issues I would love to resolve:
- The ASR/VAD are extremely sensitive to speech, even when it is very far away from the microphone. Is there a configuration option to adjust the sensitivity so that the speaker needs to be louder (i.e., to filter out background noise/babble)?
- Is there a way to force a final result after X seconds? For example, after 20 seconds, finalize and return a final result for the current recognition stream?
- I saw that there is a NeMo MarbleNet telephony VAD, which I converted, but its feature dimensions do not seem to be compatible with those of the Conformer. Is there something specific I need to do to make them work together?
Many thanks.