Riva examples and sample rate/format mismatches?

Quick Start v2.17.0 running on a Jetson AGX Orin Dev Kit with JetPack 6.1.

I’m running the examples with a mic connected to the HD Audio input on the AGX Orin Dev Kit, set up for mono capture as per the developer guide.

  1. Translate Speech-to-Text (S2T)

riva_nmt_streaming_s2t_client --audio_device=hw:APE,0 --source_language_code="en-US" --target_language_code="de-DE"

The results are pretty bad, so I decided to try ASR next, as I’ve previously run this with a USB mic and got great results.

  2. ASR

riva_streaming_asr_client --language_code=en-GB --audio_device=hw:APE,0

Terrible results, which I suspect explains the S2T translation results.

  3. TTS

root@3d34e08ff9c4:/opt/riva# riva_tts_client --voice_name=English-US.Female-1 --text="Hello, this is a test powered by NVIDIA Riva Speech AI SDK." --audio_file=out.wav
I1025 11:25:14.062170 3035 grpc.h:94] Using Insecure Server Credentials
E1025 11:25:16.279562 3035 riva_tts_client.cc:220] Request time: 2.21471 s
E1025 11:25:16.279830 3035 riva_tts_client.cc:230] Got 408064 bytes back from server

Followed by:

root@3d34e08ff9c4:/opt/riva# aplay -D plughw:1,0 out.wav
Playing WAVE 'out.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Mono

This appears to play back at no less than double speed.
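One way to narrow down a playback-speed problem like this is to compare what the WAV header claims against the amount of audio data actually returned. The sketch below uses only Python's standard `wave` module; the helper names `pcm_duration_seconds` and `wav_summary` are made up for illustration, and it assumes the 408064 bytes reported in the log are 16-bit mono PCM (whether that count includes the WAV header is not known from the log).

```python
import os
import wave


def pcm_duration_seconds(num_bytes, sample_rate, channels=1, sample_width=2):
    """Duration of raw PCM audio given its size in bytes and its format."""
    return num_bytes / (sample_rate * channels * sample_width)


def wav_summary(path):
    """Read the format fields stored in a WAV header."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": wf.getnframes() / rate,
        }


# Byte count taken from the riva_tts_client log above.
# Assumption: 16-bit mono PCM; header bytes, if any, are ignored.
for candidate_rate in (16000, 22050, 44100, 48000):
    seconds = pcm_duration_seconds(408064, candidate_rate)
    print(f"{candidate_rate} Hz -> {seconds:.2f} s")

# Only inspect the file if it exists in the current directory.
if os.path.exists("out.wav"):
    print(wav_summary("out.wav"))
```

If the duration implied by the header rate is plausible for the spoken sentence, the file itself is probably fine and the playback device or its configuration is the more likely suspect; if it is implausible, the header rate and the actual sample data may disagree.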

So while I observed no errors, it looks as though the audio device/processing is not set up properly for the examples. Any suggestions on how to remedy this would be much appreciated!

Hi @andrewrfback,
Apologies for the delayed response.
We haven’t tested with separate channels. Since they are not listed when we run “transcribe_mic.py”, I will check on this and get back to you.

Thanks