I'm running Riva Quick Start v2.17.0 on a Jetson AGX Orin Dev Kit with JetPack 6.1.
I'm running the examples with a mic connected to the HD Audio input on the AGX Orin Dev Kit, set up for mono capture as per the developer guide.
- Translate Speech-to-Text (S2T)
riva_nmt_streaming_s2t_client --audio_device=hw:APE,0 --source_language_code="en-US" --target_language_code="de-DE"
The results are pretty bad, so I tried ASR next, as I've previously run it with a USB mic and got great results.
- ASR
riva_streaming_asr_client --language_code=en-GB --audio_device=hw:APE,0
The results are terrible, which I suspect explains the poor S2T translation results.
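One way to narrow this down is to look at what the HD Audio input actually captures, independent of Riva. The sketch below is only an illustration, not part of the Riva tooling: it assumes a short 16-bit PCM test recording has already been made from the same device (for example with arecord) and saved under the hypothetical name mic_test.wav. It reports the sample rate, channel count, and peak/RMS level in dBFS, so a very quiet, clipped, or unexpectedly multi-channel capture is easy to spot.

```python
import math
import struct
import wave


def capture_levels(path):
    """Return (sample_rate, channels, peak_dbfs, rms_dbfs) for a 16-bit PCM WAV.

    Sketch only: 'path' is a test recording made outside of Riva.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())

    count = len(raw) // 2
    if count == 0:
        raise ValueError("recording is empty")
    # WAV PCM data is little-endian signed 16-bit.
    samples = struct.unpack("<%dh" % count, raw[: count * 2])

    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / count)

    def to_dbfs(value):
        # 0 dBFS corresponds to full scale (32768 for 16-bit audio).
        return 20 * math.log10(value / 32768) if value > 0 else float("-inf")

    return rate, channels, to_dbfs(peak), to_dbfs(rms)


if __name__ == "__main__":
    import sys

    # Hypothetical file name; pass your own test recording as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "mic_test.wav"
    rate, channels, peak_db, rms_db = capture_levels(path)
    print(f"{rate} Hz, {channels} channel(s), peak {peak_db:.1f} dBFS, RMS {rms_db:.1f} dBFS")
```

A peak at or near 0 dBFS suggests clipping, while a very low RMS level suggests the input gain is too low; either would plausibly hurt recognition quality.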
- TTS
root@3d34e08ff9c4:/opt/riva# riva_tts_client --voice_name=English-US.Female-1 --text="Hello, this is a test powered by NVIDIA Riva Speech AI SDK." --audio_file=out.wav
I1025 11:25:14.062170 3035 grpc.h:94] Using Insecure Server Credentials
E1025 11:25:16.279562 3035 riva_tts_client.cc:220] Request time: 2.21471 s
E1025 11:25:16.279830 3035 riva_tts_client.cc:230] Got 408064 bytes back from server
Followed by:
root@3d34e08ff9c4:/opt/riva# aplay -D plughw:1,0 out.wav
Playing WAVE 'out.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Mono
This appears to play back at least twice as fast as it should.
So while I observed no errors, it looks as though the audio device/processing is not set up properly for the examples. Any suggestions on how to remedy this would be much appreciated!
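To check whether the file itself or the playback path is at fault, the WAV header can be compared against the byte count in the log. The sketch below is an assumption-laden illustration using only Python's standard wave module: wav_summary and pcm_duration_seconds are hypothetical helper names, and it assumes the 408064 bytes reported by the client are raw 16-bit mono PCM.

```python
import wave


def wav_summary(path):
    """Read the WAV header and return its basic parameters and duration."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        frames = wf.getnframes()
        return {
            "rate": rate,
            "channels": wf.getnchannels(),
            "sample_width": wf.getsampwidth(),
            "duration_s": frames / rate,
        }


def pcm_duration_seconds(num_bytes, rate, sample_width=2, channels=1):
    """Duration implied by a raw PCM byte count at the given format."""
    return num_bytes / (rate * sample_width * channels)


if __name__ == "__main__":
    # Byte count taken from the riva_tts_client log above; the format
    # (16-bit mono at 44100 Hz) is what aplay reported for out.wav.
    print("Expected from log: %.2f s" % pcm_duration_seconds(408064, 44100))
    print("Header of out.wav:", wav_summary("out.wav"))
```

At 44100 Hz the byte count works out to roughly 4.6 s of audio. If the header agrees but playback finishes in about half that time, the speed-up is more likely coming from the playback device configuration than from the synthesized file.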