Riva on Whisper Large v3 returns only a partial transcription

pruthvidhar.nanda · January 20, 2025, 7:01am

Please provide the following information when requesting support.

Hardware - GPU (A10)
Hardware - CPU
Operating System
Riva Version: 2.18.0
TLT Version (if relevant) - No
How to reproduce the issue ?
Command and logs below:
root@xfusion-server:/opt/riva# ./clients/riva_asr_client --riva_uri=0.0.0.0:50051 --audio_file=/opt/riva/wav/'RC Audio Test 2(1).wav' --model_name=whisper-large-v3-multi-asr-offline-asr-bls-ensemble --language_code=ar-AR --output_filename 'riva-output.txt'
I0120 07:00:01.267732 252 grpc.h:94] Using Insecure Server Credentials
Loading eval dataset…
filename: /opt/riva/wav/RC Audio Test 2(1).wav
Done loading 1 files

File: /opt/riva/wav/RC Audio Test 2(1).wav

Final transcripts:
0 : السلام عليكم اشتركوا في القناة لا تتردد تواصل معنا لا لا جزاكم الله خير شكرا الله يشفيك ويحفظك حنا معك

Word Start (ms) End (ms) Confidence

Audio processed: 65.6428 sec.

Done processing 1 responses
Latencies (ms):
Median    90th      95th      99th      Avg
1201.6    1201.6    1201.6    1201.6    1201.6
Run time: 1.2154 sec.
Total audio processed: 65.643 sec.
Throughput: 54.009 RTFX
Final transcripts written to riva-output.txt
I am trying to transcribe a fairly long audio file, but only the final part of the audio is returned, and I need the whole transcript. Am I making a mistake here, or is there something I am missing? Any help is appreciated.

amargolin · January 21, 2025, 4:58pm

Hi,
The command line is correct; I tested it with no issues, and the entire transcript was produced.

pruthvidhar.nanda · January 22, 2025, 5:30am

I see. But the audio file contains many more words than were transcribed. The output misses a lot and seems to cover only the last few seconds. Is there any way to debug this?
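
One way to narrow this down, as a minimal sketch: split the recording into shorter pieces and transcribe each piece with the same riva_asr_client command, to see where the text starts going missing. The 30-second chunk length is an assumption (picked because Whisper models work on 30-second windows); whether that is related to this issue is not confirmed. split_wav and the output file names are hypothetical, and the sketch assumes an uncompressed PCM WAV file. It uses only Python's standard wave module.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, chunk_seconds=30.0):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds.

    Hypothetical helper for debugging; returns the list of chunk paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break
            path = out / f"chunk_{index:03d}.wav"
            with wave.open(str(path), "wb") as writer:
                # Same channels, sample width, and rate as the source;
                # the frame count in the header is corrected on close.
                writer.setparams(params)
                writer.writeframes(frames)
            paths.append(str(path))
            index += 1
    return paths
```

Calling split_wav("/opt/riva/wav/RC Audio Test 2(1).wav", "/opt/riva/wav/chunks") would write chunk_000.wav, chunk_001.wav, and so on, which can then be passed one at a time as --audio_file. If every chunk transcribes fully on its own, the problem is more likely in how the long file is handled than in the audio itself.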

pruthvidhar.nanda · January 23, 2025, 7:00am

Also, it would be great if you could share the command you used for testing.

jkh · January 23, 2025, 8:55am

Hi pruthvidhar.nanda, did you manage to get an Arabic transcript with diacritics?

RIVA Conformer ASR Arabic claims to provide “diacritics along with spaces”, but that was not the case when I ran the pre-built “ar-AR” model (offline transcription).

I made a relevant post here.

pruthvidhar.nanda · January 23, 2025, 9:00am

Hey @jkh, I am not an Arabic speaker, but my teammates found the accuracy of the Conformer model decent. I am unsure about the diacritics. I am just testing different models for R&D.

jkh · January 23, 2025, 9:04am

I would be grateful if you could double-check the diacritics with your teammates. The RIVA Conformer ASR Arabic description seems misleading. @pruthvidhar.nanda

ealbasiri · January 23, 2025, 4:46pm

@jkh @pruthvidhar.nanda As for diacritics, the Riva Arabic ASR model supports them, but not all speech is transcribed with full diacritics. If the audio is in Modern Standard Arabic or dialectal speech, the model provides partial diacritics where the context is ambiguous or where diacritics aid meaning, such as shaddah or tanween. If the speech is from the Quran, the model produces fully diacritized transcripts with harakat. So the answer is yes, diacritics are supported by the model, but it’s context dependent.
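
To check a given transcript for this, a small sketch like the following can count how many diacritic marks it contains. It is only an illustration: diacritic_stats is a hypothetical helper, not part of Riva, and it assumes that "diacritics" means the Unicode marks U+064B to U+0652 (tanween, fatha, damma, kasra, shaddah, sukun).

```python
import unicodedata

# Arabic harakat: tanween (U+064B-U+064D), fatha, damma, kasra,
# shaddah (U+0651), and sukun (U+0652).
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def diacritic_stats(text):
    """Count Arabic letters and harakat in a transcript (hypothetical helper)."""
    letters = sum(
        1
        for ch in text
        if "\u0600" <= ch <= "\u06FF" and unicodedata.category(ch) == "Lo"
    )
    marks = sum(1 for ch in text if ch in HARAKAT)
    ratio = marks / letters if letters else 0.0
    return {"letters": letters, "marks": marks, "ratio": ratio}


# Example: the opening words of the transcript posted in this thread.
print(diacritic_stats("السلام عليكم"))
```

A ratio near zero suggests an undiacritized transcript, while fully diacritized text carries a mark on most letters. Partial diacritics, as described above, would fall somewhere in between.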
