Issue Deploying Fine-Tuned Arabic Conformer Model in NVIDIA Riva: No Transcriptions Returned

Please provide the following information when requesting support.

Hardware - GPU (A100/A30/T4/V100)
nvidia-smi
Sun Dec 1 19:17:07 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06 Driver Version: 555.42.06 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3060 … Off | 00000000:01:00.0 Off | N/A |
| N/A 37C P3 22W / 40W | 15MiB / 6144MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2164 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------------------+

Hardware - CPU
Operating System: Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy

Riva Version: riva_quickstart_v2.17.0
TLT Version (if relevant)

Hello NVIDIA Community,

I have fine-tuned the NVIDIA Riva Speech-to-Text Arabic Conformer model (pre-trained on 3600 hours of Arabic speech; link attached above) on my custom dataset. The fine-tuned model, in .nemo format, generates Arabic transcriptions when used directly in the NeMo framework.

Attached log: docker logs riva-speech.txt (18.6 KB)

However, after converting the .nemo model to the Riva format and deploying it via the Riva API services, I am not receiving any transcriptions. Below, I outline the steps I followed for conversion, building, and deployment. I kindly request guidance to resolve this issue.

Steps Followed

1. Model Conversion (NeMo to Riva):

I used the following command to convert the fine-tuned .nemo model to Riva format (.riva):

nemo2riva --out /home/user/NEMO-to-RIVA/data/Speech_To_Text_Finetuning.riva \
          --max-dim 5000 \
          --max-batch 4 \
          --device cuda \
          /home/user/NEMO-to-RIVA/2024-11-07_06-07-54/checkpoints/Speech_To_Text_Finetuning.nemo

2. RMIR Generation and Building:

I built both offline and streaming ASR pipelines using the following commands:

  • Offline Pipeline:

docker run --rm --gpus 0 -v /home/user/NEMO-to-RIVA:/data nvcr.io/nvidia/riva/riva-speech:2.17.0-servicemaker -- \
    riva-build speech_recognition \
    /data/rmir/asr_offline_conformer_ctc.rmir \
    /data/data/Speech_To_Text_Finetuning.riva \
    --offline \
    --name=asr_offline_conformer_ctc_pipeline \
    --decoder_type=greedy \
    --ms_per_timestep=40 \
    --chunk_size=4.8 \
    --left_padding_size=1.6 \
    --right_padding_size=1.6 \
    --max_batch_size=4 \
    --nn.fp16_needs_obey_precision_pass \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --featurizer.max_batch_size=512 \
    --featurizer.max_execution_batch_size=512 \
    --language_code=ar-AR

  • Streaming Pipeline:

docker run --rm --gpus 0 -v /home/user/NEMO-to-RIVA:/data nvcr.io/nvidia/riva/riva-speech:2.17.0-servicemaker -- \
    riva-build speech_recognition \
    /data/rmir/asr_streaming_conformer_ctc.rmir \
    /data/data/Speech_To_Text_Finetuning.riva \
    --streaming=true \
    --name=asr_streaming_conformer_ctc_pipeline \
    --decoder_type=greedy \
    --ms_per_timestep=40 \
    --chunk_size=4.8 \
    --left_padding_size=1.6 \
    --right_padding_size=1.6 \
    --max_batch_size=4 \
    --nn.fp16_needs_obey_precision_pass \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --featurizer.max_batch_size=512 \
    --featurizer.max_execution_batch_size=512 \
    --language_code=ar-AR
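
Before deployment, a check along these lines could confirm that the conversion and build steps actually wrote non-empty files. This is only a minimal sketch: check_artifacts is a hypothetical helper (not part of Riva), and the host-side paths are inferred from the -v /home/user/NEMO-to-RIVA:/data mount in the commands above.

```shell
#!/usr/bin/env bash
# check_artifacts is a hypothetical helper, not part of Riva.
# It prints one status line per path and returns non-zero if any
# file is missing or empty.
check_artifacts() {
    local rc=0
    local path
    for path in "$@"; do
        if [ -s "$path" ]; then
            echo "OK $path"
        else
            echo "MISSING $path"
            rc=1
        fi
    done
    return "$rc"
}

# Host-side paths, assuming the /home/user/NEMO-to-RIVA:/data mount used above.
base=/home/user/NEMO-to-RIVA
check_artifacts \
    "$base/data/Speech_To_Text_Finetuning.riva" \
    "$base/rmir/asr_offline_conformer_ctc.rmir" \
    "$base/rmir/asr_streaming_conformer_ctc.rmir" || true
```

A MISSING line here would point at the nemo2riva or riva-build step rather than at the running service. A file that exists and is non-empty is not proof that the build was correct, only that it produced output.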

3. Deployment:

I deployed the Riva service using the following commands:

bash riva_init.sh
bash riva_start.sh
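
Because the container log is the main evidence when the service starts but returns nothing, a filter like the following could help surface the relevant lines. It is a sketch under assumptions: scan_riva_log is a hypothetical helper, the container name riva-speech is taken from the attached log's filename, and the keywords are generic guesses, not known Riva or Triton messages.

```shell
#!/usr/bin/env bash
# scan_riva_log is a hypothetical helper, not part of Riva.
# It prints, with line numbers, every line of a saved log that mentions an
# error, failure, or warning (case-insensitive). The keywords are generic
# guesses, not specific Riva or Triton messages.
scan_riva_log() {
    grep -inE 'error|fail|warn' "$1"
}

# Usage, assuming the container is named riva-speech:
#   docker logs riva-speech > riva-speech.log 2>&1
#   scan_riva_log riva-speech.log
```

The function returns a non-zero status when nothing matches, so an empty result means only that none of these keywords appear in the log, not that the deployment is healthy.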

4. Issue:

Despite successfully deploying the service and configuring the pipelines, I am not receiving any transcriptions from the Riva API. The service is running, but it does not produce any output for the given input audio.

Request for Help:

  • Could there be any issues with the conversion parameters or configuration during the nemo2riva or riva-build steps?
  • Are there additional settings or debugging steps I should follow to identify the cause of the missing transcriptions?
  • Could you also point me to the correct documentation, or provide the steps, for deploying a custom fine-tuned model through the Riva pipeline? I am attaching the logs for reference.

Any advice or insights would be greatly appreciated.

Thank you!