Why is CER very high when serving a NeMo model in Riva?

Hardware - GPU L4
Riva Version - 1.18.0
NeMo - 1.23.0
Nemo2Riva - 1.18.0

I fine-tuned the parakeet-tdt_ctc-0.6b-ja model on my custom dataset and got a test/validation character error rate (CER) of ~17% with the CTC decoder.

After getting this CER, I:

  1. Extracted the CTC head from this NeMo model
  2. Converted it to .riva
  3. Built an offline Riva model with a greedy decoder
  4. Served the Riva model
  5. Generated transcripts from this offline Riva-deployed model on the same test/validation dataset

This time I'm getting a CER of ~27%. I have no clue why there is a ~10-point CER jump for the Riva model. Is this expected behavior?
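One thing worth double-checking is that both evaluations compute CER the same way. For Japanese in particular, differences in Unicode normalization (full-width vs. half-width characters) or whitespace handling between the NeMo and Riva transcripts can inflate the measured gap. Below is a minimal normalization-aware CER sketch using only the Python standard library; the NFKC/whitespace choices here are my assumptions, not what NeMo or Riva do internally:

```python
import unicodedata

def normalize(text: str) -> str:
    # NFKC folds full-width/half-width variants to a canonical form;
    # stripping whitespace avoids penalizing tokenizer spacing differences.
    return "".join(unicodedata.normalize("NFKC", text).split())

def cer(ref: str, hyp: str) -> float:
    """Character error rate = Levenshtein distance / reference length."""
    r, h = normalize(ref), normalize(hyp)
    # Standard dynamic-programming edit distance over characters.
    prev = list(range(len(h) + 1))
    for i, rc in enumerate(r, 1):
        cur = [i]
        for j, hc in enumerate(h, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (rc != hc)))   # substitution
        prev = cur
    return prev[-1] / max(len(r), 1)

print(cer("こんにちは", "こんにちわ"))  # one substitution over five chars -> 0.2
```

Running both the NeMo and Riva hypothesis sets through the same function like this rules out metric/normalization mismatch as a cause of the jump.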

For the streaming low-latency configuration, the CER jumped to ~33%. For the Riva build, I used the default configuration from the Riva pipeline configs.

I also tried fine-tuning a Conformer-CTC model and saw similar behavior: a 10-15-point CER jump in the Riva model.

If you need any other information regarding this, please let me know.

Thanks for your help

Can you share the build command you used?

@mayjain Thanks for your reply.

I used this riva-build command for offline STT:

riva-build speech_recognition -f \
    "/servicemaker-dev/$RMIR_MODEL:tlt_encode"\
    "/servicemaker-dev/$RIVA_MODEL:tlt_encode"\
    --offline \
    --name=parakeet-0.6b-unified-ml-cs-es-ja-JP-asr-offline \
    --return_separate_utterances=True \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --ms_per_timestep=80 \
    --endpointing.residue_blanks_at_start=-16 \
    --nn.fp16_needs_obey_precision_pass \
    --unified_acoustic_model \
    --chunk_size=4.8 \
    --left_padding_size=1.6 \
    --right_padding_size=1.6 \
    --featurizer.max_batch_size=256 \
    --featurizer.max_execution_batch_size=256 \
    --decoder_type=greedy \
    --greedy_decoder.asr_model_delay=-1 \
    --language_code=ja-JP \
    --force
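As a sanity check on these flags (not an official Riva formula; the 10 ms feature hop and 8x FastConformer subsampling are my assumptions for Parakeet-family encoders), the timing parameters work out as follows. A `--ms_per_timestep` that does not match the model's actual subsampling is one thing that can silently degrade accuracy:

```python
# Sanity-check the offline build's timing parameters (values taken from the
# riva-build command above; the 8x subsampling factor is an assumption for
# the FastConformer encoder that Parakeet models are based on).
feature_hop_ms = 10              # typical mel-spectrogram hop (assumed)
subsampling = 8                  # assumed FastConformer encoder subsampling
ms_per_timestep = feature_hop_ms * subsampling
print(ms_per_timestep)           # 80, matching --ms_per_timestep=80

chunk_ms, left_pad_ms, right_pad_ms = 4800, 1600, 1600  # 4.8 s / 1.6 s / 1.6 s
window_ms = left_pad_ms + chunk_ms + right_pad_ms
print(window_ms / 1000)          # 8.0 s of audio per inference window
print(chunk_ms // ms_per_timestep)    # 60 encoder timesteps per chunk
print(window_ms // ms_per_timestep)   # 100 encoder timesteps per window
```

So even in offline mode the model only ever sees 8 s of context per window, which is one place a chunked Riva deployment can diverge from a full-utterance NeMo evaluation.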

@mayjain I also tried a streaming config with:

--name=parakeet-0.6b-unified-ml-cs-es-ja-JP-asr-streaming \
    --return_separate_utterances=False \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --ms_per_timestep=80 \
    --endpointing.residue_blanks_at_start=-16 \
    --nn.fp16_needs_obey_precision_pass \
    --unified_acoustic_model \
    --chunk_size=0.32 \
    --left_padding_size=3.92 \
    --right_padding_size=3.92 \
    --decoder_type=greedy \
    --greedy_decoder.asr_model_delay=-1 \
    --append_space_to_transcripts=False \
    --language_code=ja-JP \
    --force

With this, I get a CER of ~28%.
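Worth noting: with 3.92 s of padding on each side, the streaming window actually covers slightly more audio than the offline one, so limited acoustic context alone probably does not explain the jump. A quick arithmetic sketch, using the same assumed 80 ms/timestep as the build flag:

```python
# Streaming build timing (values from the riva-build flags above).
ms_per_timestep = 80                     # from --ms_per_timestep=80
chunk_ms, pad_ms = 320, 3920             # --chunk_size=0.32, padding 3.92 s
window_ms = pad_ms + chunk_ms + pad_ms
print(window_ms / 1000)                  # 8.16 s of audio per streaming window
print(chunk_ms // ms_per_timestep)       # 4 new encoder timesteps per chunk
```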

To extract the CTC head, I used this script: NeMo/examples/asr/asr_hybrid_transducer_ctc/helpers/convert_nemo_asr_hybrid_to_ctc.py at main · NVIDIA-NeMo/NeMo · GitHub

Can you try our latest Parakeet CTC NIM containers?
I am not seeing such a spike in CER in the latest containers.

Could you please share the Parakeet NIM model for the Japanese language?

For the Riva build, I used this image:

nvcr.io/nvidia/riva/riva-speech:2.18.0

You can check out the NIM docs on how to deploy a custom NIM.
CONTAINER_ID = parakeet-1-1b-ctc-en-us

Thanks. I will try the NIM deployment and let you know how it goes.

Hi @mayjain, I tried with NIM and I'm getting the same issue:

export NIM_EXPORT_PATH=~/nim_export
export CONTAINER_ID=parakeet-0-6b-ctc-en-us
docker run -it --rm --name=$CONTAINER_ID \
   --runtime=nvidia \
   --gpus '"device=0"' \
   --shm-size=8GB \
   -e NGC_API_KEY \
   -e NIM_TAGS_SELECTOR \
   -e NIM_DISABLE_MODEL_DOWNLOAD=true \
   -e NIM_HTTP_API_PORT=9000 \
   -e NIM_GRPC_API_PORT=50051 \
   -p 9000:9000 \
   -p 50051:50051 \
   -v $NIM_EXPORT_PATH:/opt/nim/export \
   -e NIM_EXPORT_PATH=/opt/nim/export \
   nvcr.io/nim/nvidia/$CONTAINER_ID:3.1.0

Can you try the base Parakeet checkpoint, to understand whether this is a model issue or something wrong with the setup/params?

I fine-tuned this model. Are you asking me to deploy the base model using NIM and evaluate it on the same dataset?

Yes. If the CER is bad even for the base model, then something is wrong with the setup or params.

@mayjain With the base model, I get this result