Hi, I’m deploying a Riva server for TTS, based on this documentation:
https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-custom.html
https://docs.nvidia.com/deeplearning/riva/user-guide/docs/model-overview.html#nemo-development
I’m also using this model for TTS in Spanish:
The server works fine. The only problem is that when I ask it to synthesize some texts, there are some words that it doesn’t generate (enunciate).
For example:
- It doesn’t enunciate “números” (numbers) but it does enunciate “numero” (number)
- It doesn’t enunciate “comiendo” (eating) but it does enunciate “comer” (eat)
In summary, it has problems with some plurals and conjugations.
I checked the list of words generated for the model (file: 14512b4cbbe64855a62dfa72b30c4527_ipa_es_Latin_America_nv22.11.txt; the model was generated by following the documentation above).
This file doesn’t include the words that are not being enunciated, so I guess it might be the source of this issue.
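A check like this can be scripted. The following is only a minimal sketch: it assumes each line of the word list starts with the word, followed by whitespace and its phonemes, which may not match the real file layout. The function names and the sample lines are made up for illustration and are not part of Riva or NeMo.

```python
import re
import unicodedata


def normalize(text):
    # Use one Unicode form (NFC) and lowercase, so "Números" and "números" compare equal.
    return unicodedata.normalize("NFC", text).lower()


def dictionary_words(lines):
    # Assumption: the first whitespace-separated field of each line is the word.
    words = set()
    for line in lines:
        parts = normalize(line).split(maxsplit=1)
        if parts:
            words.add(parts[0])
    return words


def missing_words(text, known):
    # Return the words of `text` that are not in `known`, in order of first appearance.
    missing = []
    for token in re.findall(r"[^\W\d_]+", normalize(text)):
        if token not in known and token not in missing:
            missing.append(token)
    return missing


# Hypothetical sample lines; the phoneme column is a placeholder, not real data.
sample_lines = ["numero  <phonemes>", "comer  <phonemes>"]
known = dictionary_words(sample_lines)
print(missing_words("números numero comiendo comer", known))
```

To run it against the real file, the sample lines could be replaced with `open(path, encoding="utf-8")`, where `path` points to the .txt file named above.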
I used the same NeMo model directly, without the Riva pipeline, and it works fine.
So, is there an argument I need to pass to fix this problem, or is there any other way to solve it?
I also tried the argument --preprocessor.g2p_ignore_ambiguous=false, but it doesn’t seem to solve this.
Thanks in advance!