Offline speech synthesis - TTS

Hi,

Has anyone tried offline TTS such as Tacotron or Mozilla TTS on any of the Jetson platforms, like the Jetson Nano or Jetson Xavier NX?

Are there any better options available for offline TTS?


I am currently working with a small group of enthusiasts on a “free” German dataset for TTS models. We are currently focused on Mozilla TTS as this seems to be the best Tacotron implementation around (sorry Nvidia).
I am also looking at other TTS implementations such as FastSpeech, AlignTTS and Nvidia Flowtron.

I am using my Xavier AGX for training experiments (trying different parameters - deep learning is a lot of trial and error). For Tacotron2 it takes about 3-5 seconds per step, which is not very fast (an RTX 2080 takes <1 second per step).
Tacotron2 inference on the AGX is about 1:1, i.e. one second of processing time for one second of audio. This is on the Xavier’s CPU; I haven’t got GPU inference working yet. The Jetson Nano is much slower: inference is about 5:1 or slower. For a feasible offline TTS solution I would use a Xavier AGX or NX.
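
In case it helps with comparing numbers across boards, here is a minimal sketch of how a ratio like the 1:1 figure can be measured: processing time divided by the duration of the generated audio. The names real_time_factor and dummy_synthesize are made up for this illustration and are not part of Mozilla TTS or any other library; the synthesize argument is assumed to be a function that takes text and returns a sequence of audio samples.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Return processing time / audio duration (1.0 corresponds to 1:1)."""
    start = time.perf_counter()
    samples = synthesize(text)  # assumption: returns a sequence of audio samples
    elapsed = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesize() returned no audio")
    return elapsed / audio_seconds


def dummy_synthesize(text):
    # Hypothetical stand-in for a real model: 0.1 s of silence per character.
    return [0.0] * int(0.1 * 22050 * len(text))


if __name__ == "__main__":
    rtf = real_time_factor(dummy_synthesize, "hello world", 22050)
    print(f"real-time factor: {rtf:.4f}")
```

With a real model you would pass its own synthesis function and output sample rate. A value of 5.0 would correspond to the 5:1 case, and anything below 1.0 is faster than real time.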


Thanks for the input, @dkreutz.

Yesterday I had a chance to try Mozilla TTS, NeMo TTS and Deepvoice 3 on the NX.
Here are my inference results with pre-trained models:
Mozilla TTS takes around 3-5 seconds,
NeMo TTS takes several minutes or more,
Deepvoice 3 is the best so far, at 1:1.
I was surprised to see that NeMo TTS is not yet optimized for the NX.

Did you successfully train a German model with Flowtron?
Could you give me short instructions on what I have to change in the code?
I set up a German dataset and have already started a training run,
but the output is “Kauderwelsch” (gibberish). Thank you so much!