Jetson-Voice TTS from Dusty_NV: input sentence length

Hi, I am playing around with the TTS in the Dusty NV GitHub repo and was hoping there was a way to enter one sentence at a time. I seem to hit a limit at around 8 words of text. What happens is that the file playback begins to sound very choppy. It seems to me that it would be optimal to have one sentence at a time cycled to the WAV file. I cannot find where the limiting factor behind the 8-word limit is. Is there a way to change this?

I need this to get through material, as I cannot sit and read due to physical limitations. What I have so far works, but the choppy breaks are annoying.

example audio output.wav

Hi @Nick_579, I believe the max length is defined here:

https://github.com/dusty-nv/jetson-voice/blob/1ec13b77f493d399f31c31f0af0650bcdbca8bc0/jetson_voice/models/tts/tts_engine.py#L43

Funny, does that count spaces?

I will have a look at it, thanks Dusty!

So those units are in MEL features and not in tokens/characters. That’s because there is pre-processing that happens. But yes, I believe spaces count because the spaces are used to let the TTS know when to end words.
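Since the limit depends on how long the synthesized output is rather than on a fixed word count, one workaround is to split the text before it reaches the TTS. Here is a minimal sketch of that idea, not code from the jetson-voice repo. The helper names and the `max_words` default are hypothetical (8 is just the figure reported earlier in this thread), and the regex only handles plain `.`, `!`, and `?` sentence endings.

```python
import re

def split_into_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

def chunk_by_words(sentence, max_words=8):
    """Break one sentence into pieces of at most max_words words."""
    words = sentence.split()
    return [' '.join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def prepare_tts_inputs(text, max_words=8):
    """Return short text chunks, never crossing a sentence boundary."""
    chunks = []
    for sentence in split_into_sentences(text):
        chunks.extend(chunk_by_words(sentence, max_words))
    return chunks

# Hypothetical usage, assuming `tts` is the callable from examples/tts.py:
# for chunk in prepare_tts_inputs(long_text, max_words=20):
#     audio = tts(chunk)
```

Splitting inside a sentence will probably hurt the intonation, so it is worth setting `max_words` as high as the engine tolerates and letting the sentence boundaries do most of the work.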

Wrote audio to data/audio/tts_test/0.wav
And yet, hardly anything of what they said is true.

Run 0 -- Time to first audio: 0.369s. Generated 3.49s of audio. RTFx=9.46.
Run 1 -- Time to first audio: 0.137s. Generated 3.49s of audio. RTFx=25.43.
Run 2 -- Time to first audio: 0.138s. Generated 3.49s of audio. RTFx=25.24.
Run 3 -- Time to first audio: 0.155s. Generated 3.49s of audio. RTFx=22.60.
Run 4 -- Time to first audio: 0.138s. Generated 3.49s of audio. RTFx=25.37.
Run 5 -- Time to first audio: 0.133s. Generated 3.49s of audio. RTFx=26.30.

Wrote audio to data/audio/tts_test/1.wav
Of the many lies they told, one in particular surprised me, namely that you should be careful not to be deceived by an accomplished speaker like me.

[TensorRT] ERROR: 1: [deconv.cu::deconv_half8_explicit_gemm::234] Error Code 1: Cuda Runtime (invalid configuration argument)
Run 0 -- Time to first audio: 0.501s. Generated 8.87s of audio. RTFx=17.69.
[TensorRT] ERROR: 1: [deconv.cu::deconv_half8_explicit_gemm::234] Error Code 1: Cuda Runtime (invalid configuration argument)
Run 1 -- Time to first audio: 0.233s. Generated 8.87s of audio. RTFx=38.07.
[TensorRT] ERROR: 1: [deconv.cu::deconv_half8_explicit_gemm::234] Error Code 1: Cuda Runtime (invalid configuration argument)
Run 2 -- Time to first audio: 0.237s. Generated 8.87s of audio. RTFx=37.41.
[TensorRT] ERROR: 1: [deconv.cu::deconv_half8_explicit_gemm::234] Error Code 1: Cuda Runtime (invalid configuration argument)
Run 3 -- Time to first audio: 0.239s. Generated 8.87s of audio. RTFx=37.17.
[TensorRT] ERROR: 1: [deconv.cu::deconv_half8_explicit_gemm::234] Error Code 1: Cuda Runtime (invalid configuration argument)
Run 4 -- Time to first audio: 0.244s. Generated 8.87s of audio. RTFx=36.42.
[TensorRT] ERROR: 1: [deconv.cu::deconv_half8_explicit_gemm::234] Error Code 1: Cuda Runtime (invalid configuration argument)
Run 5 -- Time to first audio: 0.247s. Generated 8.87s of audio. RTFx=35.84.

Wrote audio to data/audio/tts_test/2.wav
That they were not ashamed to be immediately proved wrong by the facts, when I show myself not to be an accomplished speaker at all, that I thought was most shameless on their part—unless indeed they call an accomplished speaker the man who speaks the truth.

[TensorRT] ERROR: 3: [executionContext.cpp::setBindingDimensions::969] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::969, condition: profileMaxDims.d[i] >= dimensions.d[i]. Supplied binding dimension [1,80,1151] for bindings[0] exceed min ~ max range at index 2, maximum dimension in profile is 1024, minimum dimension in profile is 1, but supplied dimension is 1151.
)
Traceback (most recent call last):
  File "examples/tts.py", line 50, in <module>
    audio = tts(i)
  File "/jetson-voice/jetson_voice/models/tts/tts_engine.py", line 81, in __call__
    audio = self.vocoder.execute(mels)
  File "/jetson-voice/jetson_voice/backends/tensorrt/trt_model.py", line 114, in execute
    setup_binding(self.bindings[idx], input)
  File "/jetson-voice/jetson_voice/backends/tensorrt/trt_model.py", line 109, in setup_binding
    binding.set_shape(input.shape)
  File "/jetson-voice/jetson_voice/backends/tensorrt/trt_binding.py", line 80, in set_shape
    raise ValueError(f"failed to set binding '{self.name}' with shape {shape}")
ValueError: failed to set binding 'mels' with shape (1, 80, 1151)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2966, GPU 7057 (MiB)
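
The ValueError in the traceback above comes from the vocoder receiving 1151 mel frames while the TensorRT profile allows at most 1024. A small sketch of that arithmetic follows. The function names are hypothetical, and the frames-to-seconds conversion assumes a 22050 Hz sample rate with a hop length of 256 samples, which is an assumption about this model and not something shown in the log.

```python
import math

PROFILE_MAX_FRAMES = 1024  # "maximum dimension in profile" from the error

# ASSUMPTION: common FastPitch/HiFi-GAN settings, not confirmed by the log.
ASSUMED_SAMPLE_RATE = 22050
ASSUMED_HOP_LENGTH = 256

def fits_profile(mel_shape, max_frames=PROFILE_MAX_FRAMES):
    """True if the last (time) dimension is within the profile maximum."""
    return mel_shape[-1] <= max_frames

def min_chunks_needed(num_frames, max_frames=PROFILE_MAX_FRAMES):
    """Fewest pieces the input must be split into to fit the profile."""
    return math.ceil(num_frames / max_frames)

def frames_to_seconds(num_frames, sample_rate=ASSUMED_SAMPLE_RATE,
                      hop_length=ASSUMED_HOP_LENGTH):
    """Rough audio duration for a given number of mel frames."""
    return num_frames * hop_length / sample_rate

print(fits_profile((1, 80, 1151)))   # shape taken from the error message
print(min_chunks_needed(1151))
print(round(frames_to_seconds(PROFILE_MAX_FRAMES), 2))
```

Under those assumed settings the 1024-frame ceiling works out to roughly 12 seconds of speech per call, so a long sentence like the third one would need to be split or the limit raised.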

Did you try increasing the limits in the code I linked to above, and also deleting the *.engine file under jetson-voice/data/networks/tts/fastpitch-hifigan?
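
For the second part of that suggestion, a minimal sketch of clearing the cached engine files could look like the following. The function name is hypothetical, and it assumes the engines are rebuilt from the model files the next time the TTS is loaded, so it is worth checking the directory contents before running it.

```python
from pathlib import Path

def remove_engine_files(model_dir):
    """Delete every *.engine file directly under model_dir.

    Returns the names of the files that were removed.
    """
    removed = []
    for engine in sorted(Path(model_dir).glob("*.engine")):
        engine.unlink()
        removed.append(engine.name)
    return removed

# Hypothetical usage, with the path mentioned above:
# remove_engine_files("jetson-voice/data/networks/tts/fastpitch-hifigan")
```

Only files ending in `.engine` are touched, so the other model files in that folder stay in place.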