Riva Citrinet Language Model

Please provide the following information when requesting support.

Hardware - GPU (T4)
Operating System - Ubuntu
Riva Version - 1.5.0

Hi, I’m trying to run the Citrinet pre-trained model with custom configs, but when I add a language model (downloaded from NVIDIA NGC), riva_start always times out and fails.

Riva Build Command:

riva-build speech_recognition \
    /data/rmir/speechtotext_english_citrinet.rmir:tlt_encode \
    /data/generated/speechtotext_english_citrinet.riva:tlt_encode \
    --name=citrinet \
    --decoder_type=flashlight \
    --chunk_size=0.8 \
    --padding_size=1.6 \
    --ms_per_timestep=80 \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --vad.vad_start_history=300 \
    --vad.vad_start_th=0.2 \
    --vad.vad_stop_history=1200 \
    --vad.vad_stop_th=0.98 \
    --decoding_language_model_binary=../../data/generated/mixed-lower.binary \
    --decoding_vocab=../../data/generated/words.mixed_lm.txt

Log when running riva_start (I did initialise the models using the quickstart script):

E0911 13:32:58.027425 73 sequence_batch_scheduler.cc:941] Initialization failed for Direct sequence-batch scheduler thread 0: initialize error for 'citrinet-ctc-decoder-cpu-streaming': (13) Invalid parameters in model configuration

Can you please tell me what the invalid parameters are? The riva-build itself completed successfully, by the way.

Hi @HansieB ,
Could you please check the model configuration params? Please refer to the section below:
https://docs.nvidia.com/deeplearning/riva/user-guide/docs/service-asr.html?highlight=beam_size_token#language-models
When using the Citrinet acoustic model, the language model can be specified with the following riva-build command:

riva-build speech_recognition \
    /servicemaker-dev/<jmir_filename>:<encryption_key> \
    /servicemaker-dev/<riva_filename>:<encryption_key> \
    /servicemaker-dev/<n_gram_riva_filename>:<encryption_key> \
    --name=<pipeline_name> \
    --decoder_type=flashlight \
    --chunk_size=0.16 \
    --padding_size=1.92 \
    --ms_per_timestep=80 \
    --flashlight_decoder.asr_model_delay=-1 \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --decoding_vocab=<vocabulary_filename>
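For reference, here is a sketch of your original command adapted toward the documented template. The `--flashlight_decoder.asr_model_delay=-1` parameter and the `chunk_size`/`padding_size` values are taken from the template above; your LM binary and vocabulary paths are kept as-is. This is only an illustration of how the two commands differ, not a guaranteed fix for the invalid-parameter error:

```shell
riva-build speech_recognition \
    /data/rmir/speechtotext_english_citrinet.rmir:tlt_encode \
    /data/generated/speechtotext_english_citrinet.riva:tlt_encode \
    --name=citrinet \
    --decoder_type=flashlight \
    --chunk_size=0.16 \
    --padding_size=1.92 \
    --ms_per_timestep=80 \
    --flashlight_decoder.asr_model_delay=-1 \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --decoding_language_model_binary=../../data/generated/mixed-lower.binary \
    --decoding_vocab=../../data/generated/words.mixed_lm.txt
```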

Thanks!

Hi, thanks for the reply @AakankshaS, but is it possible to use the above command with my existing language model, or do I have to create a new language model with TAO?

Hi @HansieB ,
You can use the same TAO model and export it to .riva format before building/deploying it:
https://docs.nvidia.com/tao/tao-toolkit/text/asr/speech_recognition_with_citrinet.html#model-export
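As a rough sketch, the export step from the linked docs looks like the following. The task name, spec file, and paths here are illustrative placeholders and may differ depending on your TAO Toolkit version, so please verify them against the documentation above:

```shell
tao speech_to_text_citrinet export \
    -m /results/checkpoints/trained-model.tlt \
    -e /specs/export.yaml \
    -k $KEY \
    export_format=RIVA \
    export_to=asr-model.riva
```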

Thanks!