Issue Deploying Fine-Tuned Arabic Conformer Model in NVIDIA Riva: No Transcriptions Returned

d.sowmiya86 · December 1, 2024, 1:56pm

Please provide the following information when requesting support.

+----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2164 G /usr/lib/xorg/Xorg 4MiB |
+----------------------------------------------------------------------------------------+
(base) user@sowmiya-masterworks:~$

Hardware - CPU
Operating System: Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy

Riva Version: riva_quickstart_v2.17.0
TLT Version (if relevant)
Hello NVIDIA Community,

I have fine-tuned the NVIDIA Riva Speech-to-Text Arabic Conformer model (pre-trained on 3600 hours of Arabic speech; link attached above) with my custom dataset. The fine-tuned model, in .nemo format, generates Arabic transcriptions when used directly in the NeMo framework.

Attachment: docker logs riva-speech.txt (18.6 KB)

However, after converting the .nemo model to the Riva format and deploying it via the Riva API services, I am not receiving any transcriptions. Below, I outline the steps I followed for conversion, building, and deployment. I kindly request guidance to resolve this issue.

Steps Followed

1. Model Conversion (NeMo to Riva):

I used the following command to convert the fine-tuned .nemo model to Riva format (.riva):

nemo2riva --out /home/user/NEMO-to-RIVA/data/Speech_To_Text_Finetuning.riva \
          --max-dim 5000 \
          --max-batch 4 \
          --device cuda \
          /home/user/NEMO-to-RIVA/2024-11-07_06-07-54/checkpoints/Speech_To_Text_Finetuning.nemo
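Because the build step mounts /home/user/NEMO-to-RIVA at /data inside the servicemaker container, the .riva file written here has to be addressed by its container path when it is passed to riva-build. A minimal Python sketch of that mapping follows. The helper name to_container_path is hypothetical and only illustrates the path arithmetic; it is not part of nemo2riva or Riva.

```python
from pathlib import PurePosixPath

# Both sides of the -v mount used in the riva-build commands of this post.
HOST_ROOT = PurePosixPath("/home/user/NEMO-to-RIVA")
CONTAINER_ROOT = PurePosixPath("/data")


def to_container_path(host_path):
    """Translate a host path under HOST_ROOT to the path seen in the container.

    Raises ValueError if host_path lies outside the mounted directory,
    which would mean the container cannot see the file at all.
    """
    relative = PurePosixPath(host_path).relative_to(HOST_ROOT)
    return str(CONTAINER_ROOT / relative)


print(to_container_path(
    "/home/user/NEMO-to-RIVA/data/Speech_To_Text_Finetuning.riva"
))
# prints /data/data/Speech_To_Text_Finetuning.riva
```

The printed path matches the .riva argument given to riva-build, so the conversion output and the build input refer to the same file.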

2. RMIR Generation and Building:

I built both offline and streaming ASR pipelines using the following commands:

Offline Pipeline:

docker run --rm --gpus 0 -v /home/user/NEMO-to-RIVA:/data nvcr.io/nvidia/riva/riva-speech:2.17.0-servicemaker -- \
    riva-build speech_recognition \
    /data/rmir/asr_offline_conformer_ctc.rmir \
    /data/data/Speech_To_Text_Finetuning.riva \
    --offline \
    --name=asr_offline_conformer_ctc_pipeline \
    --decoder_type=greedy \
    --ms_per_timestep=40 \
    --chunk_size=4.8 \
    --left_padding_size=1.6 \
    --right_padding_size=1.6 \
    --max_batch_size=4 \
    --nn.fp16_needs_obey_precision_pass \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --featurizer.max_batch_size=512 \
    --featurizer.max_execution_batch_size=512 \
    --language_code=ar-AR

Streaming Pipeline:

docker run --rm --gpus 0 -v /home/user/NEMO-to-RIVA:/data nvcr.io/nvidia/riva/riva-speech:2.17.0-servicemaker -- \
    riva-build speech_recognition \
    /data/rmir/asr_streaming_conformer_ctc.rmir \
    /data/data/Speech_To_Text_Finetuning.riva \
    --streaming=true \
    --name=asr_streaming_conformer_ctc_pipeline \
    --decoder_type=greedy \
    --ms_per_timestep=40 \
    --chunk_size=4.8 \
    --left_padding_size=1.6 \
    --right_padding_size=1.6 \
    --max_batch_size=4 \
    --nn.fp16_needs_obey_precision_pass \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --featurizer.max_batch_size=512 \
    --featurizer.max_execution_batch_size=512 \
    --language_code=ar-AR
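
The offline and streaming commands above differ only in the RMIR output path, the mode flag, and the pipeline name. The following Python sketch, offered as an illustration rather than a recommended build script, generates both argument lists from one shared flag list so that accidental drift between the two builds is easy to spot. The function name build_command is hypothetical; every flag value is copied from the commands above.

```python
import shlex

IMAGE = "nvcr.io/nvidia/riva/riva-speech:2.17.0-servicemaker"

# Flags common to both builds, copied verbatim from the commands in this post.
SHARED_FLAGS = [
    "--decoder_type=greedy",
    "--ms_per_timestep=40",
    "--chunk_size=4.8",
    "--left_padding_size=1.6",
    "--right_padding_size=1.6",
    "--max_batch_size=4",
    "--nn.fp16_needs_obey_precision_pass",
    "--featurizer.use_utterance_norm_params=False",
    "--featurizer.precalc_norm_time_steps=0",
    "--featurizer.precalc_norm_params=False",
    "--featurizer.max_batch_size=512",
    "--featurizer.max_execution_batch_size=512",
    "--language_code=ar-AR",
]


def build_command(mode):
    """Return the docker argument list for one pipeline.

    mode must be "offline" or "streaming".
    """
    if mode not in ("offline", "streaming"):
        raise ValueError(f"unknown mode: {mode!r}")
    mode_flag = "--offline" if mode == "offline" else "--streaming=true"
    return [
        "docker", "run", "--rm", "--gpus", "0",
        "-v", "/home/user/NEMO-to-RIVA:/data", IMAGE, "--",
        "riva-build", "speech_recognition",
        f"/data/rmir/asr_{mode}_conformer_ctc.rmir",
        "/data/data/Speech_To_Text_Finetuning.riva",
        mode_flag,
        f"--name=asr_{mode}_conformer_ctc_pipeline",
        *SHARED_FLAGS,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for mode in ("offline", "streaming"):
        print(shlex.join(build_command(mode)))
```

Printing rather than executing keeps the sketch side-effect free; the output can be pasted into a shell once it has been compared against the original commands.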

3. Deployment:

I deployed the Riva service using the following commands:

bash riva_init.sh
bash riva_start.sh

4. Issue:

Despite successfully deploying the service and configuring the pipelines, I am not receiving any transcriptions from the Riva API. The service is running, but it does not produce any output for the given input audio.
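
Since the service starts but returns nothing, the container log is a natural first place to look. Below is a rough Python sketch that filters a saved log for the lines worth reading first. It assumes the output of docker logs riva-speech has been saved to a text file, and the keyword list is a guess for illustration, not an official set of Riva or Triton log markers.

```python
import re

# Assumed keywords; adjust after reading the actual log.
KEYWORDS = re.compile(r"error|fail|unavailable", re.IGNORECASE)


def suspicious_lines(log_text):
    """Return (line_number, line) pairs for lines that match KEYWORDS.

    Line numbers are 1-based so they match what a text editor shows.
    """
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if KEYWORDS.search(line)
    ]
```

Calling suspicious_lines on the text of the saved log returns the matching lines with their positions, which makes it easier to quote the relevant part of a long log in a support thread.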

Request for Help:

Could there be any issues with the conversion parameters or configuration during the nemo2riva or riva-build steps?
Are there additional settings or debugging steps I should follow to identify the cause of the missing transcriptions?
Also, please guide me to the correct documentation or provide steps to deploy a custom fine-tuned model using the Riva pipeline. I am also attaching the logs for reference.
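
One debugging step that can be ruled out locally is the format of the input audio. The sketch below uses only the Python standard library to report the properties of a WAV file. It assumes the fine-tuned model expects 16 kHz, mono, 16-bit PCM input, which is common for NeMo Conformer checkpoints but is not confirmed anywhere in this post; the function names are hypothetical.

```python
import wave


def describe_wav(path):
    """Read basic format properties from a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": wav.getnframes() / rate,
        }


def format_problems(info, expected_rate=16000):
    """List mismatches against the assumed 16 kHz, mono, 16-bit format."""
    problems = []
    if info["sample_rate_hz"] != expected_rate:
        problems.append(f"sample rate is {info['sample_rate_hz']} Hz")
    if info["channels"] != 1:
        problems.append(f"{info['channels']} channels instead of mono")
    if info["sample_width_bytes"] != 2:
        problems.append(f"{info['sample_width_bytes']}-byte samples")
    return problems
```

An empty list from format_problems only means the file matches the assumed format; it says nothing about whether the deployed pipeline itself is configured correctly.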

Any advice or insights would be greatly appreciated.

Thank you!
