Hi,
I am starting to evaluate NVIDIA NIM Riva ASR and have a basic question
about utterance boundary detection.
I found the Riva ASR documentation on
“Beginning / End of Utterance Detection” (stop_history, silence_duration_ms, etc.),
and I would like to know whether similar controls are available when using NIM,
especially with Parakeet RNNT Multilingual in streaming mode.
In a quick test, FINAL results were not always emitted,
so I am not sure whether these endpointing parameters are supported in NIM.
Could you clarify:
- Are utterance boundary / endpointing parameters configurable in NIM?
- Or are applications expected to handle end-of-utterance detection
outside of NIM when using RNNT models?
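To make the second option concrete, this is roughly what I mean by handling it on the application side. It is only a minimal sketch of an energy-based silence timeout over 16-bit mono PCM chunks. The class name, the 800 ms timeout, and the RMS threshold are my own placeholder assumptions and have nothing to do with the Riva or NIM API.

```python
import array
import math


class SilenceEndpointer:
    """Hypothetical client-side end-of-utterance detector (not a Riva/NIM API).

    Reports an utterance end once `silence_ms` of consecutive low-energy
    audio has been seen after at least one chunk of speech.
    """

    def __init__(self, sample_rate=16000, silence_ms=800, rms_threshold=500.0):
        self.sample_rate = sample_rate      # samples per second (assumed mono)
        self.silence_ms = silence_ms        # assumed timeout, tune per use case
        self.rms_threshold = rms_threshold  # assumed energy threshold
        self.in_speech = False
        self.silent_ms = 0.0

    def feed(self, pcm16: bytes) -> bool:
        """Feed one chunk of 16-bit PCM; return True at an utterance end."""
        samples = array.array("h")
        samples.frombytes(pcm16)
        if not samples:
            return False
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        chunk_ms = 1000.0 * len(samples) / self.sample_rate
        if rms >= self.rms_threshold:
            # Speech detected: reset the silence counter.
            self.in_speech = True
            self.silent_ms = 0.0
            return False
        if not self.in_speech:
            # Leading silence before any speech is ignored.
            return False
        self.silent_ms += chunk_ms
        if self.silent_ms >= self.silence_ms:
            # Enough trailing silence: close the utterance and reset.
            self.in_speech = False
            self.silent_ms = 0.0
            return True
        return False
```

With something like this, the application would close or restart the streaming request whenever feed() returns True, instead of waiting for a FINAL result from the server.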
Thanks in advance.