Hello NVIDIA Developer Community,
I am encountering a persistent issue with the Riva TTS models on my Orin Dev Kit and I’m seeking assistance to resolve it. Below are the details of my setup and the problem:
- Hardware: NVIDIA Orin Dev Kit 32GB
- Operating System: JetPack 5.1 (also tested on versions up to JetPack 6 DP)
- Riva Version: 2.13.00 to 2.14.00
- Issue Description: When using TTS, the output (whether saved to WAV, streamed, or played directly) consists of repeated segments of the first audio chunk, each lasting 0.936 seconds, instead of the full, correct audio sequence.
Steps Taken and Observations:
- Initial suspicion was on ALSA, but it has been ruled out.
- I have completely reflashed the Orin and tested across different JetPack versions.
- The issue manifests identically regardless of the output method (WAV, streaming, direct play).
- Analysis of the WAV file in a hex editor revealed the repeated 0.936-second audio chunks. This duration matches the internal chunk duration used by Riva in streaming mode.
- This problem has been consistent across Riva versions 2.13.00, 2.13.01, and 2.14.00.
- System resources don’t seem to be a bottleneck, as I currently have 5 GB free.
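For anyone who wants to check a saved file without a hex editor, here is a minimal sketch of the same comparison in Python. It uses only the standard-library `wave` module. The function name `count_repeats_of_first_chunk` is just illustrative, and the script assumes the repeated chunks are byte-identical, start at the beginning of the file, and are 0.936 seconds long (rounded to whole frames), which may not hold exactly for every sample rate.

```python
import wave


def count_repeats_of_first_chunk(wav_file, chunk_seconds=0.936):
    """Count how many full chunks are byte-identical to the first chunk.

    wav_file may be a path or a binary file object.
    Returns (repeats, total_chunks), where repeats excludes the first chunk.
    Assumption: chunks are aligned to the start of the audio data.
    """
    with wave.open(wav_file, "rb") as wav:
        bytes_per_frame = wav.getsampwidth() * wav.getnchannels()
        # Chunk length in frames, rounded because 0.936 s is not always
        # a whole number of frames at a given sample rate.
        chunk_frames = round(chunk_seconds * wav.getframerate())
        data = wav.readframes(wav.getnframes())

    chunk_bytes = chunk_frames * bytes_per_frame
    if chunk_bytes <= 0:
        raise ValueError("chunk_seconds is too small for this sample rate")

    # Split the audio into full-size chunks; a trailing partial chunk is ignored.
    chunks = [
        data[start:start + chunk_bytes]
        for start in range(0, len(data) - chunk_bytes + 1, chunk_bytes)
    ]
    if not chunks:
        return 0, 0

    first = chunks[0]
    repeats = sum(1 for chunk in chunks[1:] if chunk == first)
    return repeats, len(chunks)
```

Calling `count_repeats_of_first_chunk("test.wav")` on a healthy file should normally report zero repeats, while a file showing this problem should report that most or all of the remaining chunks match the first one.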
I initially refrained from posting, thinking it might be an isolated or resource-related issue. However, having ruled out those factors and seeing the problem persist through several Riva updates, I’m inclined to believe this is a bug in Riva itself, specifically in the multi-speaker English and Mandarin TTS models I have been using, although it may affect all TTS models. As far as I know, the issue began with 2.13.00 and did not exist in previous releases.
I am keen to resolve this, as my work relies heavily on English and Mandarin TTS. If anyone else has experienced this or can offer insights, your input would be greatly appreciated. Confirmation of similar issues on your end, troubleshooting suggestions, or solutions would all be extremely helpful.
Attached: a WAV file saved from inference with the en-US FastPitch HiFi-GAN English-US IPA multi-speaker model, voice English-US.Female-1 (Google Drive link: test.wav).
Thank you for your time and assistance.