RIVA Conformer ASR Arabic does not provide diacritics

jkh · January 22, 2025, 2:54pm

Description

I am using NVIDIA RIVA to run offline ASR on Arabic conversations, with the goal of getting transcriptions that include diacritics. Although RIVA Conformer ASR Arabic claims to provide “diacritics along with spaces”, this is not the case in practice: the resulting transcriptions contain Arabic text without diacritics.

RIVA Conformer ASR Arabic model version: nvidia/riva/speechtotext_ar_ar_conformer:deployable_v3.0_export_v2
RIVA image: nvcr.io/nvidia/riva/riva-speech:2.18.0

How to reproduce

I have tried two approaches to get transcriptions with Arabic diacritics, both without success.

First approach (Pretrained ASR Model)

The first approach follows the steps of the RIVA guide and uses the Pretrained ASR Models for Arabic (see the Arabic (ar-AR) entry). The only essential change needed was in config.sh:

service_enabled_asr=true
service_enabled_nlp=false
service_enabled_tts=false
service_enabled_nmt=true
<...>
asr_language_code=("ar-AR")

Although this approach ran smoothly, the resulting transcriptions contain Arabic text without diacritics. Some usage examples:

riva_asr_client  --list_models
'ar-AR': 'conformer-ar-AR-asr-offline-asr-bls-ensemble'

riva_asr_client --audio_file=/opt/riva/wav/ar-AR_sample.wav --language_code=ar-AR  --print_transcripts=true --automatic_punctuation=true --verbatim_transcripts=true

Loading eval dataset...
filename: /opt/riva/wav/ar-AR_sample.wav
Done loading 1 files
-----------------------------------------------------------
File: /opt/riva/wav/ar-AR_sample.wav

Final transcripts: 
0 : هل بإمكانك أن تعطيني المزيد من القهوة من فضلك؟ 

Word                                    Start (ms)      End (ms)        Confidence      
هل                                    640             800             4.9614e-01      
بإمكانك                          880             1480            4.0296e-01      
أن                                    1640            1680            8.0560e-01      
تعطيني                            1720            2160            9.4022e-01      
<...>
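A quick way to confirm the absence of diacritics programmatically, rather than by eye, is to count the Arabic tashkeel code points (U+064B through U+0652) in the returned transcript. The snippet below is only a minimal sketch using the Python standard library; `count_tashkeel` is a hypothetical helper name, not part of RIVA, and the range deliberately ignores hamza and maddah signs, which are not vowel marks.

```python
import re

# Arabic tashkeel marks: fathatan (U+064B) through sukun (U+0652).
# Precomposed hamza letters such as U+0623 and U+0625 are not in this range.
TASHKEEL = re.compile("[\u064B-\u0652]")


def count_tashkeel(text: str) -> int:
    """Return the number of tashkeel code points found in text."""
    return len(TASHKEEL.findall(text))


if __name__ == "__main__":
    # Transcript copied from the riva_asr_client output above.
    transcript = "هل بإمكانك أن تعطيني المزيد من القهوة من فضلك؟"
    print("tashkeel marks found:", count_tashkeel(transcript))
```

A count of zero over a whole file means the transcript is completely undiacritized, which is what I see for the sample audio.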

Second approach (Build and Deploy ASR Model)

Given that RIVA Conformer ASR Arabic provides only the .riva file, the second attempt builds and deploys the model in order to obtain the .rmir file that can later be used for inference. By following the instructions for assembling the command line in Pipeline Configuration and downloading the required additional data (wfst_tokenizer_model, wfst_verbalizer_model, decoding_language_model_binary, decoding_vocab), I could successfully generate the .rmir file (the additional data can be retrieved manually from the Pretrained ASR Models for Arabic).

Next, I repeated the procedure from the quick start guide (riva_clean.sh, riva_init.sh, riva_start.sh, riva_start_client.sh), again without success (I excluded the model download part from riva_init.sh since I had generated the .rmir manually). The output was identical to the one shown earlier.

riva-build command line:

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-ar-AR-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.75 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=ar-AR \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

Hardware specs

Tesla V100-PCIE-32GB
Driver Version: 560.35.03      CUDA Version: 12.6
RIVA version: 2.18.0

Intel(R) Xeon(R) Gold 6278C CPU @ 2.60GHz

Has anyone observed the same issue?

ealbasiri · January 23, 2025, 5:02pm

@jkh As for diacritics, the Riva Arabic ASR model supports them, but not every kind of speech is transcribed with full diacritics; it depends on context. If the audio is in Modern Standard Arabic or dialectal speech, the model provides only partial diacritics, where the context is ambiguous or where diacritics aid meaning, such as shaddah or tanween. If the speech is Quranic recitation, the model produces fully diacritized transcripts with harakat. So the answer is yes, diacritics are supported as a feature of the model, but the output is context dependent.
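To tell partial from full diacritization in a given transcript, one rough option is to count marks per category and measure how many base letters carry at least one mark. The sketch below is an illustration only and not part of Riva: `diacritic_profile` is a hypothetical helper, the category names follow the terms used above, and treating every code point in U+0621 to U+064A as a base letter is a simplifying assumption. The two sample strings are made-up test inputs, not model output.

```python
from collections import Counter

# Map each tashkeel code point to the category names used in this thread.
MARKS = {
    "\u064B": "tanween", "\u064C": "tanween", "\u064D": "tanween",
    "\u064E": "harakat", "\u064F": "harakat", "\u0650": "harakat",
    "\u0651": "shaddah",
    "\u0652": "sukun",
}


def diacritic_profile(text: str) -> dict:
    """Count marks per category and the share of letters carrying a mark."""
    counts = Counter()
    letters = 0
    marked_letters = 0
    awaiting_mark = False  # True right after a letter that has no mark yet
    for ch in text:
        if ch in MARKS:
            counts[MARKS[ch]] += 1
            if awaiting_mark:
                marked_letters += 1
                awaiting_mark = False
        elif "\u0621" <= ch <= "\u064A":  # assumption: basic Arabic letters
            letters += 1
            awaiting_mark = True
        else:
            awaiting_mark = False
    coverage = marked_letters / letters if letters else 0.0
    return {"counts": dict(counts), "letters": letters, "coverage": coverage}


if __name__ == "__main__":
    # Hypothetical fully marked word: beh+kasra, seen+sukun, meem+kasra.
    full = "\u0628\u0650\u0633\u0652\u0645\u0650"
    # Hypothetical partially marked word: only a shaddah on the third letter.
    partial = "\u0645\u062F\u0631\u0651\u0633"
    print(diacritic_profile(full))
    print(diacritic_profile(partial))
```

A coverage close to 1.0 suggests a fully diacritized transcript, while a low but non-zero coverage dominated by shaddah or tanween matches the partial behaviour described for MSA and dialectal speech.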

jkh · January 23, 2025, 8:04pm

Hi @ealbasiri, and thank you for your answer. I found some Quran archives that I can try soon. Is there another useful audio source that I can test with? What other Arabic audio categories do you have in mind?

ealbasiri · January 23, 2025, 8:16pm

@jkh You can try any of the available Arabic benchmark datasets. A recent publication describes the Open Universal Arabic ASR Leaderboard, a continuous benchmark project for open-source general Arabic ASR models across various multi-dialect datasets. Their results show that Riva Arabic ASR ranks first on the leaderboard.

system · February 6, 2025, 8:16pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
