**Subject: Issue with TTS in Riva on Orin Dev Kit - Repeated Audio Chunks**

Hello NVIDIA Developer Community,

I am encountering a persistent issue with the Riva TTS models on my Orin Dev Kit and I’m seeking assistance to resolve it. Below are the details of my setup and the problem:

  • Hardware: NVIDIA Orin Dev Kit 32GB
  • Operating System: JetPack 5.1 (also tested on versions up to JetPack 6 DP)
  • Riva Version: 2.13.00 to 2.14.00
  • TLT Version: [If relevant, please specify]
  • Issue Description: When using TTS, the output (whether saved to WAV, streamed, or played directly) consists of repeated segments of the first audio chunk, each lasting 0.936 seconds, instead of the full, correct audio sequence.

Steps Taken and Observations:

  • Initial suspicion was on ALSA, but it has been ruled out.
  • I have completely reflashed the Orin and tested across different JetPack versions.
  • The issue manifests identically regardless of the output method (WAV, streaming, direct play).
  • Analysis of the WAV file in a hex editor revealed the repeated 0.936-second audio chunks. This duration matches the internal chunk duration used by Riva in streaming mode.
  • This problem has been consistent across Riva versions 2.13.00, 2.13.01, and 2.14.00.
  • System resources don’t seem to be a bottleneck, as I currently have 5 GB free.
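
For anyone who wants to check their own output for the same symptom without a hex editor, here is a minimal sketch of the comparison. It is not part of Riva; `find_repeated_chunks` is a hypothetical helper name, and it assumes the repeats are byte-identical and start exactly on 0.936-second boundaries (if 0.936 s is not a whole number of frames at your sample rate, the boundaries will drift and this check may miss repeats).

```python
import wave


def find_repeated_chunks(path, chunk_seconds=0.936):
    """Return indices of chunks whose bytes are identical to the first chunk.

    Assumption: repeats are byte-identical and aligned to chunk_seconds
    boundaries; 0.936 s is the chunk duration reported in this thread.
    """
    with wave.open(path, "rb") as wav:
        frames_per_chunk = int(round(chunk_seconds * wav.getframerate()))
        bytes_per_frame = wav.getsampwidth() * wav.getnchannels()
        data = wav.readframes(wav.getnframes())

    chunk_bytes = frames_per_chunk * bytes_per_frame
    if chunk_bytes == 0 or not data:
        return []

    # Split the raw PCM payload into fixed-size chunks; a short trailing
    # chunk can never equal the first one because its length differs.
    chunks = [data[i:i + chunk_bytes] for i in range(0, len(data), chunk_bytes)]
    first = chunks[0]
    return [i for i, chunk in enumerate(chunks[1:], start=1) if chunk == first]
```

Calling `find_repeated_chunks("test.wav")` on a file showing this problem should list most or all chunk indices after 0; on healthy speech output it should normally return an empty list.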

I had initially refrained from posting, thinking it might be an isolated issue or resource-related. However, having ruled out these factors and seeing the problem persist through several Riva updates, I’m inclined to believe this might be a bug in Riva itself, specifically with the multispeaker English and Mandarin TTS models I have been attempting to use, though it may affect all TTS models. As far as I know, this issue began with 2.13.00 and did not exist in previous releases.

I am keen to resolve this as my work heavily involves English and Mandarin TTS. If anyone else has experienced this or can offer insights, your input would be greatly appreciated. Any confirmation of similar issues on your end, suggestions for troubleshooting, or solutions would be extremely helpful.

Attached is a WAV file saved from inference with the en-US FastPitch HiFi-GAN English-US IPA multi-speaker model, voice English-US.Female-1: test.wav (Google Drive link).

Thank you for your time and assistance.

Best regards,
Rich

Hi @richg

Thanks for your interest in Riva.

I will check with the developers and get back to you.

Thanks

Hi @richg

Query from Engineering:

Is this with the out-of-the-box prebuilt models or with a model you generated yourself?
This is a pretty basic TTS inference case and should work, so we suspect some kind of tuning may have been done. Is that the case?

Thanks

Right out of the box, trying to use the built-in models for TTS. What is really strange is that it wasn't happening prior to 2.13. Also, the length of the audio being generated aligns with the length of what is being spoken; it's as if that first chunk is sent again and again.