Enhancing Naturalness in Text-to-Speech with NVIDIA Riva

Can NVIDIA Riva’s text-to-speech feature be enhanced to include speech disfluencies such as "hmm" and "uh", so that the generated speech sounds more natural, particularly while the system is processing calculations?

Hi @ro.goab

Thanks for your interest in Riva.

Apologies, this is not supported currently.
For Speech Recognition, we have a word boosting capability:
https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials/asr-improve-recognition-for-specific-words.html
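
For reference, a minimal sketch of what word boosting on the ASR side can look like with the `nvidia-riva-client` Python package. Note that this only biases recognition toward given words; it does not add disfluencies to TTS output. The helper names `normalize_boost_words` and `build_boosted_config`, the example words, and the score of 20.0 are assumptions for illustration, not values from this thread.

```python
from typing import List


def normalize_boost_words(words: List[str]) -> List[str]:
    """Strip whitespace, drop empty entries, and de-duplicate while keeping order.

    Hypothetical helper for illustration; not part of the Riva client.
    """
    seen = set()
    result = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            result.append(word)
    return result


def build_boosted_config(words: List[str], score: float = 20.0,
                         language_code: str = "en-US"):
    """Build an offline recognition config with word boosting applied.

    Assumes `pip install nvidia-riva-client`; the score default is a guess
    and should be tuned against your own audio.
    """
    import riva.client  # imported lazily so the helper above works without it

    config = riva.client.RecognitionConfig(
        language_code=language_code,
        max_alternatives=1,
        enable_automatic_punctuation=True,
    )
    # Attaches the boosted words and their score to the config in place.
    riva.client.add_word_boosting_to_config(
        config, normalize_boost_words(words), score
    )
    return config
```

The resulting config would then be passed to an ASR request against a running Riva server; see the tutorial linked above for the full flow.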

Thanks