Hardware - NVIDIA L4 GPU
Riva - 1.18.0
NeMo - 1.23.0
Nemo2Riva - 1.18.0
I fine-tuned the parakeet-tdt_ctc-0.6b-ja model on my custom dataset and got a test/validation character error rate (CER) of ~17% with the CTC decoder.
After getting this CER, I:
- Extracted the CTC head from the NeMo model
- Converted it to .riva
- Built an offline Riva model with a greedy decoder
- Served the Riva model
- Transcribed the same test/validation dataset with the offline Riva deployment
This time I'm getting a CER of ~27%, and I have no clue what causes the ~10-point CER jump with the Riva model. Is this expected behavior?
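For reference, I compute CER as character-level edit distance divided by reference length. A minimal stdlib sketch of the metric (the example strings are placeholders, not from my dataset):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein edit distance / reference length."""
    r, h = list(reference), list(hypothesis)
    # Standard dynamic-programming edit distance over characters.
    prev = list(range(len(h) + 1))
    for i, rc in enumerate(r, 1):
        curr = [i] + [0] * len(h)
        for j, hc in enumerate(h, 1):
            curr[j] = min(prev[j] + 1,                 # deletion
                          curr[j - 1] + 1,             # insertion
                          prev[j - 1] + (rc != hc))    # substitution
        prev = curr
    return prev[-1] / max(len(r), 1)

print(round(cer("こんにちは", "こんにちわ"), 2))  # 1 substitution over 5 chars -> 0.2
```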
For the streaming low-latency config, the CER jumped to ~33%. For the Riva build, I used the default configuration from the Riva pipeline configs.
I also tried fine-tuning a Conformer-CTC model and saw similar behavior: a 10-15-point CER jump in the Riva model.
If you need any other information regarding this, please let me know.
Thanks for your help
Can you share the build command you used?
@mayjain Thanks for your reply.
I used this riva-build command for offline STT:
riva-build speech_recognition -f \
"/servicemaker-dev/$RMIR_MODEL:tlt_encode"\
"/servicemaker-dev/$RIVA_MODEL:tlt_encode"\
--offline \
--name=parakeet-0.6b-unified-ml-cs-es-ja-JP-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ja-JP \
--force
@mayjain I also tried a streaming config with:
--name=parakeet-0.6b-unified-ml-cs-es-ja-JP-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.32 \
--left_padding_size=3.92 \
--right_padding_size=3.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--append_space_to_transcripts=False \
--language_code=ja-JP \
--force
For this config I get a CER of ~28%.
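For context on the two builds, here is a quick sketch of the audio window each inference step sees, assuming the window is simply left padding + chunk + right padding (my reading of the riva-build flags, not a documented guarantee), with --ms_per_timestep=80 from the commands above:

```python
MS_PER_TIMESTEP = 80  # from --ms_per_timestep in the riva-build commands

def window(chunk_s: float, left_pad_s: float, right_pad_s: float):
    """Total audio per inference step (ms) and its length in model timesteps.

    Assumes the window is left padding + chunk + right padding; this is an
    assumption about how riva-build composes it, not a documented guarantee.
    """
    total_ms = round((left_pad_s + chunk_s + right_pad_s) * 1000)
    return total_ms, total_ms / MS_PER_TIMESTEP

print(window(4.8, 1.6, 1.6))     # offline build: (8000, 100.0)
print(window(0.32, 3.92, 3.92))  # streaming low-latency build: (8160, 102.0)
```

If that reading is right, both builds see a roughly 8-second window per step and the streaming build just advances it 0.32 s at a time, which would suggest the accuracy gap comes from something other than raw context length (e.g. per-chunk feature normalization or chunk stitching), though that is speculation on my part.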
mayjain
Can you try our latest Parakeet CTC NIM containers? I am not seeing such a CER spike in the latest containers.
Could you please share the Parakeet NIM model for the Japanese language?
For the Riva build, I used this image:
nvcr.io/nvidia/riva/riva-speech:2.18.0
mayjain
You can check out the NIM docs on how to deploy a custom NIM.
CONTAINER_ID=parakeet-1-1b-ctc-en-us
Thanks. I will try the NIM deployment and let you know.
Hi @mayjain, I tried NIM and I am getting the same issue:
export NIM_EXPORT_PATH=~/nim_export
export CONTAINER_ID=parakeet-0-6b-ctc-en-us
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_TAGS_SELECTOR \
-e NIM_DISABLE_MODEL_DOWNLOAD=true \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-v $NIM_EXPORT_PATH:/opt/nim/export \
-e NIM_EXPORT_PATH=/opt/nim/export \
nvcr.io/nim/nvidia/$CONTAINER_ID:3.1.0
mayjain
Can you try the base Parakeet checkpoint, to understand whether this is a model issue or something wrong with the setup/params?
I fine-tuned this model. Are you asking me to deploy the base model using NIM and evaluate it on the same dataset?
mayjain
Yes. If the CER is bad even for the base model, then something is wrong with the setup or params.
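It can also help to diff per-utterance accuracy between the NeMo and Riva outputs on the same audio and inspect the worst regressions. A minimal stdlib sketch, where the utterance list is placeholder data and difflib's match ratio stands in for a true CER:

```python
import difflib

def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the strings are identical.
    return difflib.SequenceMatcher(None, a, b).ratio()

# Placeholder data: (reference, nemo_hypothesis, riva_hypothesis) per utterance.
utterances = [
    ("今日は良い天気です", "今日は良い天気です", "今日は良い天気す"),
    ("音声認識のテスト", "音声認識のテスト", "音声認識のテスト"),
]

# Rank utterances by how far Riva falls behind NeMo on the same audio.
gaps = sorted(
    (similarity(ref, nemo) - similarity(ref, riva), ref, riva)
    for ref, nemo, riva in utterances
)
for gap, ref, riva in reversed(gaps):
    print(f"gap={gap:.2f}  ref={ref!r}  riva={riva!r}")
```

If the gap is spread evenly across utterances it points at a pipeline-wide cause (e.g. feature normalization); if it is concentrated in a few long or noisy utterances it points at chunking or endpointing.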
@mayjain With the base model, I get this result