Riva with Whisper Large v3 returns only a partial transcription

Please provide the following information when requesting support.

Hardware - GPU (A10)
Hardware - CPU
Operating System
Riva Version: 2.18.0
TLT Version (if relevant) - No
How to reproduce the issue ?
Command and logs below:
root@xfusion-server:/opt/riva# ./clients/riva_asr_client --riva_uri=0.0.0.0:50051 --audio_file=/opt/riva/wav/'RC Audio Test 2(1).wav' --model_name=whisper-large-v3-multi-asr-offline-asr-bls-ensemble --language_code=ar-AR --output_filename 'riva-output.txt'
I0120 07:00:01.267732 252 grpc.h:94] Using Insecure Server Credentials
Loading eval dataset…
filename: /opt/riva/wav/RC Audio Test 2(1).wav
Done loading 1 files

File: /opt/riva/wav/RC Audio Test 2(1).wav

Final transcripts:
0 : السلام عليكم اشتركوا في القناة لا تتردد تواصل معنا لا لا جزاكم الله خير شكرا الله يشفيك ويحفظك حنا معك

Word Start (ms) End (ms) Confidence

Audio processed: 65.6428 sec.

Done processing 1 responses
Latencies (ms):
Median 90th 95th 99th Avg
1201.6 1201.6 1201.6 1201.6 1201.6
Run time: 1.2154 sec.
Total audio processed: 65.643 sec.
Throughput: 54.009 RTFX
Final transcripts written to riva-output.txt
I am trying to transcribe a rather long audio file, but Riva returns only the final part of the audio, and I need the whole transcript. Am I making a mistake here, or is there something I am missing? Any help would be appreciated.

Hi,
The command line is correct. I tested it with no issues, and the entire transcript was produced.

I see. But the audio file contains many more words than were transcribed. The model missed most of them and seems to have transcribed only the last few seconds. Is there any way to debug this?

It would also be great if you could share the command you used for testing.
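
One way to narrow this down might be to check whether the truncation depends on audio length. Whisper models operate on 30-second windows, so a 65-second file spans more than two windows. Whether that is the cause here is an assumption and has not been confirmed in this thread. The sketch below uses only the Python standard library to split a WAV file into chunks of at most 30 seconds. The function name `split_wav` and the `chunk_NNN.wav` output names are made up for illustration.

```python
import wave


def split_wav(path, chunk_seconds=30.0, prefix="chunk"):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds.

    Returns the list of chunk file paths, in playback order.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the source audio
            out_path = f"{prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Each chunk can then be passed to the same `riva_asr_client` command as above, one file at a time. If every chunk comes back fully transcribed while the original file does not, the problem is likely related to input length rather than to the audio content.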

Hi pruthvidhar.nanda, did you manage to get an Arabic transcript with diacritics?

RIVA Conformer ASR Arabic claims to provide “diacritics along with spaces”, but that was not the case when I ran the pre-built “ar-AR” model (offline transcription).

I made a relevant post here.

Hey @jkh, I am not an Arabic speaker, but my teammates found the Conformer model reasonably accurate. I am unsure about the diacritics. I am just testing different models for R&D.

I would be grateful if you could double-check the diacritics case with your teammates. The RIVA Conformer ASR Arabic description seems misleading. @pruthvidhar.nanda

@jkh @pruthvidhar.nanda As for diacritics, the Riva Arabic ASR model supports them, but not all speech is transcribed with full diacritics. If the audio is in Modern Standard Arabic or dialectal speech, the model provides partial diacritics where the context is ambiguous or where diacritics aid meaning, such as shaddah or tanween. If the speech is Quranic recitation, the model produces fully diacritized transcripts with harakat. So yes, diacritics are supported by the model, but the output is context dependent.
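
For anyone who wants to check how heavily a given transcript is diacritized, here is a minimal sketch in Python. It counts the Arabic marks in the Unicode range U+064B to U+0652 (tanween, fatha, damma, kasra, shaddah, sukun) against the number of Arabic letters. The function name `diacritic_stats` is made up for illustration, and the range deliberately ignores rarer Quranic annotation marks, so treat the ratio as a rough indicator only.

```python
import unicodedata

# Tanween, short vowels, shaddah and sukun: U+064B through U+0652.
ARABIC_DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def diacritic_stats(text):
    """Count Arabic letters and diacritic marks in a transcript."""
    letters = sum(
        1
        for ch in text
        # "Lo" is the Unicode category of Arabic base letters.
        if "\u0600" <= ch <= "\u06FF" and unicodedata.category(ch) == "Lo"
    )
    marks = sum(1 for ch in text if ch in ARABIC_DIACRITICS)
    ratio = marks / letters if letters else 0.0
    return {"letters": letters, "diacritics": marks, "ratio": ratio}
```

Running it over the contents of the output file (for example `riva-output.txt`) gives a quick number to compare between models or between MSA and Quranic audio: a ratio of zero means no diacritics at all, while fully vocalized text has a ratio close to one mark per letter.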
