Please provide the following information when requesting support.
Hardware - GPU (A100/A30/T4/V100)
nvidia-smi
Sun Dec 1 19:17:07 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   37C    P3             22W /   40W |      15MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2164      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+
Hardware - CPU
Operating System: Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
Riva Version: riva_quickstart_v2.17.0
TLT Version (if relevant)
Hello NVIDIA Community,
I have fine-tuned the NVIDIA Riva Speech-to-Text Arabic Conformer model (pre-trained on 3600 hours of Arabic speech; link attached above) on my custom dataset. The fine-tuned model, in .nemo format, generates Arabic transcriptions when used directly in the NeMo framework.
Attached log: docker logs riva-speech.txt (18.6 KB)
However, after converting the .nemo model to the Riva format and deploying it via the Riva API services, I do not receive any transcriptions. Below, I outline the steps I followed for conversion, building, and deployment, and I would appreciate guidance on resolving this issue.
Steps Followed
1. Model Conversion (NeMo to Riva):
I used the following command to convert the fine-tuned .nemo model to Riva format (.riva):
nemo2riva --out /home/user/NEMO-to-RIVA/data/Speech_To_Text_Finetuning.riva \
--max-dim 5000 \
--max-batch 4 \
--device cuda \
/home/user/NEMO-to-RIVA/2024-11-07_06-07-54/checkpoints/Speech_To_Text_Finetuning.nemo
2. RMIR Generation and Building:
I built both offline and streaming ASR pipelines using the following commands:
- Offline Pipeline:
docker run --rm --gpus 0 -v /home/user/NEMO-to-RIVA:/data nvcr.io/nvidia/riva/riva-speech:2.17.0-servicemaker -- \
riva-build speech_recognition \
/data/rmir/asr_offline_conformer_ctc.rmir \
/data/data/Speech_To_Text_Finetuning.riva \
--offline \
--name=asr_offline_conformer_ctc_pipeline \
--decoder_type=greedy \
--ms_per_timestep=40 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=4 \
--nn.fp16_needs_obey_precision_pass \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--language_code=ar-AR
- Streaming Pipeline:
docker run --rm --gpus 0 -v /home/user/NEMO-to-RIVA:/data nvcr.io/nvidia/riva/riva-speech:2.17.0-servicemaker -- \
riva-build speech_recognition \
/data/rmir/asr_streaming_conformer_ctc.rmir \
/data/data/Speech_To_Text_Finetuning.riva \
--streaming=true \
--name=asr_streaming_conformer_ctc_pipeline \
--decoder_type=greedy \
--ms_per_timestep=40 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=4 \
--nn.fp16_needs_obey_precision_pass \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--language_code=ar-AR
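For anyone comparing the two builds: the offline and streaming commands above differ only in the mode flag, the pipeline name, and the output RMIR path. The sketch below makes that explicit by printing both riva-build commands from one shared flag list. The `riva_build_cmd` helper and the `COMMON_FLAGS` variable are hypothetical names used for illustration only; they are not part of Riva. The helper only prints the command and does not run docker or riva-build.

```bash
# Flags common to both riva-build invocations, copied from the commands above.
COMMON_FLAGS="--decoder_type=greedy --ms_per_timestep=40 --chunk_size=4.8"
COMMON_FLAGS="$COMMON_FLAGS --left_padding_size=1.6 --right_padding_size=1.6"
COMMON_FLAGS="$COMMON_FLAGS --max_batch_size=4 --nn.fp16_needs_obey_precision_pass"
COMMON_FLAGS="$COMMON_FLAGS --featurizer.use_utterance_norm_params=False"
COMMON_FLAGS="$COMMON_FLAGS --featurizer.precalc_norm_time_steps=0"
COMMON_FLAGS="$COMMON_FLAGS --featurizer.precalc_norm_params=False"
COMMON_FLAGS="$COMMON_FLAGS --featurizer.max_batch_size=512"
COMMON_FLAGS="$COMMON_FLAGS --featurizer.max_execution_batch_size=512"
COMMON_FLAGS="$COMMON_FLAGS --language_code=ar-AR"

# Hypothetical helper (not part of Riva): prints the riva-build command
# for the given mode ("offline" or "streaming") without executing it.
riva_build_cmd() {
  mode="$1"
  case "$mode" in
    offline)   mode_flag="--offline" ;;
    streaming) mode_flag="--streaming=true" ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  rmir="/data/rmir/asr_${mode}_conformer_ctc.rmir"
  riva="/data/data/Speech_To_Text_Finetuning.riva"
  name="--name=asr_${mode}_conformer_ctc_pipeline"
  printf '%s\n' "riva-build speech_recognition $rmir $riva $mode_flag $name $COMMON_FLAGS"
}

riva_build_cmd offline
riva_build_cmd streaming
```

Each printed line corresponds to the arguments passed after `--` in the matching docker run command above, so the two outputs can be diffed to confirm that nothing else differs between the pipelines.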
3. Deployment:
I deployed the Riva service using the following commands:
bash riva_init.sh
bash riva_start.sh
4. Issue:
Despite successfully deploying the service and configuring the pipelines, I am not receiving any transcriptions from the Riva API. The service is running, but it does not produce any output for the given input audio.
Request for Help:
- Could there be any issues with the conversion parameters or configuration during the nemo2riva or riva-build steps?
- Are there additional settings or debugging steps I should follow to identify the cause of the missing transcriptions?
- Also, please point me to the correct documentation or provide steps for deploying a custom fine-tuned model using the Riva pipeline. I am attaching the logs for reference.
Any advice or insights would be greatly appreciated.
Thank you!