Riva ASR Not Recognizing Speech (Empty Transcript)

Greetings,

Problem Description
I’m experiencing an issue with NVIDIA Riva ASR: the server receives audio data via gRPC and reports the audio duration, but it never recognizes any speech and always returns an empty transcript to the client.

Below is a log snippet from the Riva server:

grpc_riva_asr.cc:685] ASRService.Recognize called.
grpc_riva_asr.cc:854] Using model conformer-en-US-asr-offline-asr-bls-ensemble from Triton localhost:8001 for inference
status_builder.h:100] {"specversion":"1.0", "type":"riva.asr.recognize.v1", "source":"", "subject":"", "id":"df8df5d5-72bb-4ff11-b918-f19b0p477712", "datacontenttype":"application/json", "time":"2025-02-21T11:30:40.061173945+00:00", "data":{"release_version":"2.18.0", "customer_uuid":"", "ngc_org":"", "ngc_team":"", "ngc_org_team":"", "container_uuid":"", "language_code":"en-US", "request_count":1, "audio_duration":4.000805421311011, "speech_duration":"0.0", "status":0, "err_msg":""}}

The key issue here is that speech_duration remains 0.0, indicating that the server does not detect any actual speech in the provided audio.

Troubleshooting Steps Taken

To identify the cause, I performed the following diagnostics:

1. Verified Audio Data Integrity

  • The recorded voice was converted into signed 16-bit PCM (RAW buffer), as required by Riva’s Linear PCM audio encoding.
  • The byte array was converted back to audio and played successfully, confirming that the data was not corrupted.
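The round-trip check above can be reproduced with plain Python. This is a minimal sketch using only the standard library; the function names are my own, and it assumes a little-endian host (which matches the byte order Riva's LINEAR_PCM encoding expects):

```python
import array

def floats_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to signed 16-bit PCM bytes.
    array('h') uses native byte order, i.e. little-endian on x86."""
    ints = array.array("h", (max(-32768, min(32767, int(s * 32767))) for s in samples))
    return ints.tobytes()

def pcm16_to_floats(raw):
    """Convert a signed 16-bit PCM byte buffer back to float samples."""
    ints = array.array("h")
    ints.frombytes(raw)
    return [i / 32767 for i in ints]

# Round-trip check: decoding the raw buffer should recover the samples
# (to within 16-bit quantization error), confirming the data is intact.
original = [0.0, 0.25, -0.5, 1.0, -1.0]
raw = floats_to_pcm16(original)
decoded = pcm16_to_floats(raw)
assert all(abs(a - b) < 1e-3 for a, b in zip(original, decoded))
assert len(raw) == 2 * len(original)  # two bytes per sample
```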

2. Adjusted VAD (Voice Activity Detection) Settings

  • Increased stop_history and stop_history_eou to 2000 ms.
  • Lowered stop_threshold and stop_threshold_eou to 0.1, ensuring that Riva does not prematurely terminate recognition.
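To illustrate what these two parameters control, here is a toy endpointer. This is not Riva's actual VAD implementation (which runs server-side); it is only a sketch of the stop_history/stop_threshold semantics: end-of-utterance fires once the per-frame speech probability has stayed below stop_threshold for stop_history milliseconds.

```python
def end_of_utterance_frame(frame_probs, stop_threshold=0.1, stop_history_ms=2000,
                           frame_ms=40):
    """Toy endpointer (illustrative only, not Riva's implementation).

    Returns the index of the frame at which end-of-utterance would be
    declared, i.e. once the speech probability has stayed below
    stop_threshold for stop_history_ms worth of consecutive frames,
    or None if the utterance never ends.
    """
    frames_needed = stop_history_ms // frame_ms
    silent_run = 0
    for i, prob in enumerate(frame_probs):
        silent_run = silent_run + 1 if prob < stop_threshold else 0
        if silent_run >= frames_needed:
            return i
    return None

# With a long stop_history and a low stop_threshold, quiet-but-not-silent
# audio no longer ends the utterance prematurely.
probs = [0.9] * 10 + [0.3] * 100   # clear speech, then low-confidence frames
assert end_of_utterance_frame(probs, stop_threshold=0.5) == 59   # endpoints early
assert end_of_utterance_frame(probs, stop_threshold=0.1) is None # keeps listening
```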

3. Confirmed Riva’s General Functionality

  • Tested local Riva ASR by transcribing a sample WAV file—the speech was correctly recognized.
  • Tested the remote client audio data with both Recognize() and StreamingRecognize() APIs—neither produced a valid transcript.

4. Verified gRPC Data Transmission

  • Successfully tested gRPC communication by using Riva TTS.
  • The synthesized voice was generated, transmitted via gRPC, and played correctly on the client side, confirming that the gRPC pathway is functional.

System Information

  • GPU: Nvidia A10
  • Operating System: Ubuntu
  • Riva Version: riva_quickstart_v2.18.0

Request for Assistance

Given that:

  1. The audio data is properly formatted as 16-bit signed PCM.
  2. The VAD settings are adjusted to avoid premature endpointing.
  3. Riva ASR works fine with local samples, but fails with streamed data.
  4. gRPC communication is functioning as confirmed via Riva TTS.

What could be causing Riva ASR to fail in recognizing speech from the remote client? Are there additional debugging steps or configurations I should check?

Any insights would be greatly appreciated!


I’m having the same problem with an identical configuration.

Hi @Benutzer1925, could you please share the server logs?
Also the config, the model used, a sample audio file, and the repro steps?
This will help us with the debugging process.

Thanks

Greetings,

I am pleased to share that the issue has been resolved. After comparing my client against a simplified version based on the official GitHub examples, I discovered that the problem stemmed from my client failing to properly extract the transcript from the response.

It turns out that the server had been sending valid transcripts all along. I had either misinterpreted the speech_duration field (assuming it represented the duration of detected speech in my audio), or the logging of this value was not functioning as expected.

This conclusion is supported by comparing the server logs and the data exchanged between the server and both clients. When the defective client and the working client sent the exact same audio data, both triggered identical server log entries, such as:

"audio_duration":4.000805421311011, "speech_duration":"0.0", "status":0, "err_msg":""

While the defective client failed to display the transcript from the valid response, the working client successfully printed the full transcript without any issues.

After correcting the defective client, it now transmits audio data and receives transcripts as expected.
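For anyone hitting the same symptom: in the official client examples, the transcript of an offline Recognize() response is read from results[].alternatives[].transcript. The sketch below mimics that response shape with plain Python dataclasses (the real objects are protobuf messages from the Riva client; only the field layout used here is assumed):

```python
from dataclasses import dataclass, field
from typing import List

# Stand-ins for the protobuf messages returned by Riva's Recognize() call.
# Only the field layout used below is assumed from the official examples.
@dataclass
class SpeechRecognitionAlternative:
    transcript: str
    confidence: float = 0.0

@dataclass
class SpeechRecognitionResult:
    alternatives: List[SpeechRecognitionAlternative] = field(default_factory=list)

@dataclass
class RecognizeResponse:
    results: List[SpeechRecognitionResult] = field(default_factory=list)

def extract_transcript(response: RecognizeResponse) -> str:
    """Concatenate the top alternative of every result; an empty string here
    means the client-side extraction failed, not necessarily the server."""
    return "".join(
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    )

response = RecognizeResponse(results=[
    SpeechRecognitionResult(alternatives=[
        SpeechRecognitionAlternative(transcript="hello world", confidence=0.95),
    ]),
])
assert extract_transcript(response) == "hello world"
```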

Thank you for your time and support.