Has anyone tried offline TTS such as Tacotron or Mozilla TTS on any of the Jetson platforms, like the Jetson Nano or Jetson Xavier NX?

I am currently working with a small group of enthusiasts on a “free” German dataset for TTS models. We are currently focused on Mozilla TTS as this seems to be the best Tacotron implementation around (sorry Nvidia).
I am also looking at other TTS implementations such as FastSpeech, AlignTTS and Nvidia Flowtron.

I am using my Xavier AGX for training experiments (trying different parameters; deep learning is a lot of trial and error). For Tacotron 2 it takes about 3-5 seconds per step, which is not very fast (an RTX 2080 takes < 1 second per step).
Tacotron 2 inference on the AGX is about 1:1, that is, one second of processing time for one second of audio. This is on the Xavier’s CPU; I haven’t gotten GPU inference working yet. The Jetson Nano is much slower: inference is about 5:1 or worse. For a feasible offline TTS solution I would use a Xavier AGX or NX.
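
In case anyone wants to compare boards the same way, here is a minimal sketch of how a ratio like 1:1 or 5:1 can be measured: processing time divided by the duration of the generated audio. The `synthesize` callable and the `sample_rate` argument are placeholders for whatever TTS model you are running; they are assumptions for illustration and not part of Mozilla TTS or any other library.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time divided by audio duration (1.0 means 1:1)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def time_synthesis(synthesize, text, sample_rate):
    """Time one call to a hypothetical synthesize(text) -> sequence of samples.

    `synthesize` stands in for your own model wrapper; it is assumed to
    return one sample per element at `sample_rate` samples per second.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return real_time_factor(elapsed, audio_seconds)
```

With this definition, a result of 1.0 corresponds to the 1:1 figure above and 5.0 to 5:1. It is worth running a few sentences and ignoring the first call, since model loading and warm-up can distort a single measurement.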


Thanks for the input, @dkreutz.

Yesterday I had a chance to try Mozilla TTS, NeMo TTS and Deepvoice 3 on the NX.
Here are my inference results with pre-trained models:
Mozilla TTS takes around 3-5 seconds.
NeMo TTS takes several minutes or more.
Deepvoice 3 is the best so far, at about 1:1.
I was surprised to see that NeMo TTS is not yet optimized for the NX.

Did you successfully train a German model with Flowtron?
Could you give me brief instructions on what I have to change in the code?
I set up a German dataset and have already started training,
but the output is gibberish (“Kauderwelsch”). Thank you so much!