Hardware - GPU L4
Riva Version - 2.18.0
NeMo - 1.23.0
Nemo2Riva - 2.18.0
I fine-tuned the parakeet-tdt_ctc-0.6b-ja model on my custom dataset and got a test/validation character error rate (CER) of ~17% with the CTC decoder.
After getting this CER, I:
- Extracted the CTC head from the NeMo model
- Converted it to .riva (see the conversion sketch after this list)
- Built an offline Riva model with a greedy decoder
- Served the Riva model
- Transcribed the same test/validation dataset with the offline Riva-deployed model
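For reference, the .nemo-to-.riva conversion is a single nemo2riva call; a minimal sketch, assuming the extracted CTC head was saved as its own .nemo checkpoint (file names are placeholders, and tlt_encode is the encryption key reused in the build below):
# Sketch: convert the extracted CTC checkpoint to .riva with nemo2riva.
# File names are placeholders; the key matches the one passed to riva-build.
nemo2riva --out /servicemaker-dev/parakeet-ja-ctc.riva \
          --key=tlt_encode \
          parakeet-ja-ctc.nemo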
With the Riva deployment, this time I get a CER of ~27%, and I have no clue where the roughly 10-point jump comes from. Is this expected behavior?
For the streaming low-latency configuration, the CER jumped to ~33%. For the Riva build, I used the default configuration from the Riva pipeline configs.
I also fine-tuned a Conformer-CTC model and saw similar behavior: a 10 to 15 point CER jump in the Riva model.
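For anyone comparing the two sides, a greedy CTC baseline on the same manifest can be reproduced with NeMo's examples/asr/transcribe_speech.py; a minimal sketch (paths are placeholders, and decoder_type=ctc selects the CTC head of the hybrid checkpoint):
# Sketch: greedy CTC transcription in NeMo on the same test manifest,
# to compare against the Riva transcripts. Paths are placeholders.
python examples/asr/transcribe_speech.py \
    model_path=parakeet-ja-finetuned.nemo \
    dataset_manifest=test_manifest.json \
    decoder_type=ctc \
    output_filename=nemo_ctc_predictions.json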
If you need any other information regarding this, please let me know.
Thanks for your help
mayjain
Can you share the build command you used?
@mayjain Thanks for your reply.
I used this riva-build command for offline STT:
riva-build speech_recognition -f \
"/servicemaker-dev/$RMIR_MODEL:tlt_encode"\
"/servicemaker-dev/$RIVA_MODEL:tlt_encode"\
--offline \
--name=parakeet-0.6b-unified-ml-cs-es-ja-JP-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ja-JP \
--force
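The built RMIR is then written into the model repository before starting the server; a minimal sketch of the deploy step, assuming the default /data/models repository path:
# Sketch: deploy the built RMIR into the Triton model repository
# (the "Served the Riva model" step); /data/models is the assumed default.
riva-deploy -f /servicemaker-dev/$RMIR_MODEL:tlt_encode /data/models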
@mayjain I also tried a streaming configuration, with the same riva-build command but these flags:
--name=parakeet-0.6b-unified-ml-cs-es-ja-JP-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.32 \
--left_padding_size=3.92 \
--right_padding_size=3.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--append_space_to_transcripts=False \
--language_code=ja-JP \
--force
With this configuration, I get a CER of ~28%.
mayjain
Can you try our latest Parakeet CTC NIM containers? I am not seeing such a spike in CER with the latest containers.
Could you please share the Parakeet NIM model for the Japanese language?
For the Riva build, I used this image:
nvcr.io/nvidia/riva/riva-speech:2.18.0
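For completeness, the build command above is run inside that container; a minimal sketch, where the host mount path is illustrative and the -servicemaker tag is an assumption (riva-build ships in the servicemaker image):
# Sketch: open a shell in the Riva servicemaker container, then run the
# riva-build command above inside it. Host path and tag are assumptions.
docker run --rm -it --gpus all \
    -v /path/to/artifacts:/servicemaker-dev \
    nvcr.io/nvidia/riva/riva-speech:2.18.0-servicemaker bash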
mayjain
You can check out the NIM docs on how to deploy a custom NIM.
export CONTAINER_ID=parakeet-1-1b-ctc-en-us
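A minimal launch following the usual NIM pattern; the nvcr.io/nim/nvidia image path, tag, and ports are assumptions to verify against the NIM docs:
# Sketch of a NIM launch for this container ID; requires NGC_API_KEY to be
# exported. Image path, tag, and ports are assumptions; check the NIM docs.
docker run -it --rm --name=$CONTAINER_ID \
    --runtime=nvidia --gpus '"device=0"' \
    -e NGC_API_KEY \
    -p 9000:9000 -p 50051:50051 \
    nvcr.io/nim/nvidia/$CONTAINER_ID:latest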
Thanks. I will try the NIM deployment and let you know how it goes.