Riva Citrinet Language Model

Please provide the following information when requesting support.

Hardware: GPU (T4)
Operating System: Ubuntu
Riva Version: 1.5.0

Hi, I’m trying to run the Citrinet pre-trained model with custom configs, but when I add a language model (which I got from NVIDIA NGC), riva_start always times out and fails:

Riva Build Command:

riva-build speech_recognition \
    /data/rmir/speechtotext_english_citrinet.rmir:tlt_encode \
    /data/generated/speechtotext_english_citrinet.riva:tlt_encode \
    --name=citrinet \
    --decoder_type=flashlight \
    --chunk_size=0.8 \
    --padding_size=1.6 \
    --ms_per_timestep=80 \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --vad.vad_start_history=300 \
    --vad.vad_start_th=0.2 \
    --vad.vad_stop_history=1200 \
    --vad.vad_stop_th=0.98 \
    --decoding_language_model_binary=../../data/generated/mixed-lower.binary \
    --decoding_vocab=../../data/generated/words.mixed_lm.txt

Log when running riva_start (I initialised the model using the quickstart script):

E0911 13:32:58.027425 73 sequence_batch_scheduler.cc:941] Initialization failed for Direct sequence-batch scheduler thread 0: initialize error for 'citrinet-ctc-decoder-cpu-streaming': (13) Invalid parameters in model configuration

Can you please tell me what the invalid parameters are? The riva-build itself completed successfully, by the way.

Hi @HansieB ,
Could you please check the model configuration params? Please refer to the section below:
https://docs.nvidia.com/deeplearning/riva/user-guide/docs/service-asr.html?highlight=beam_size_token#language-models
When using the Citrinet acoustic model, the language model can be specified with the following riva-build command:

riva-build speech_recognition \
    /servicemaker-dev/<jmir_filename>:<encryption_key> \
    /servicemaker-dev/<riva_filename>:<encryption_key> \
    /servicemaker-dev/<n_gram_riva_filename>:<encryption_key> \
    --name=<pipeline_name> \
    --decoder_type=flashlight \
    --chunk_size=0.16 \
    --padding_size=1.92 \
    --ms_per_timestep=80 \
    --flashlight_decoder.asr_model_delay=-1 \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --decoding_vocab=<vocabulary_filename>
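For comparison, if the language model is passed as a KenLM binary rather than as an n-gram .riva file (as in the original command above), the same template could be adapted roughly as follows. This is only a sketch: `<lm_binary>` and `<vocabulary_filename>` are placeholders, and the exact flag set should be verified against the docs page linked above for your Riva version:

```shell
riva-build speech_recognition \
    /servicemaker-dev/<jmir_filename>:<encryption_key> \
    /servicemaker-dev/<riva_filename>:<encryption_key> \
    --name=<pipeline_name> \
    --decoder_type=flashlight \
    --chunk_size=0.16 \
    --padding_size=1.92 \
    --ms_per_timestep=80 \
    --flashlight_decoder.asr_model_delay=-1 \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --decoding_language_model_binary=<lm_binary> \
    --decoding_vocab=<vocabulary_filename>
```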

Thanks!

Hi, thanks for the reply @AakankshaS, but is it possible to use the above command with the pre-trained language model, or do I have to create a new language model with TAO?

Hi @HansieB ,
You can use the same TAO model and export it to the .riva format before building and deploying it:
https://docs.nvidia.com/tao/tao-toolkit/text/asr/speech_recognition_with_citrinet.html#model-export
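Per the linked model-export docs, the export step looks roughly like the following. Note this is a sketch only: the spec file, paths, and key are placeholders, and the subtask name and flags may differ between TAO Toolkit versions, so check the linked page for the exact invocation:

```shell
# Export a trained Citrinet .tlt checkpoint to a .riva file usable by riva-build
# (all paths and the key below are placeholders; see the linked docs for exact flags)
tao speech_to_text export \
    -e <export_spec>.yaml \
    -m <trained_model>.tlt \
    -k <encryption_key> \
    export_format=RIVA \
    export_to=<output_model>.riva
```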

Thanks!

@HansieB @AakankshaS

Hi guys,

I’m new here.
I’m trying to do the same thing: running the Citrinet pre-trained model with the pre-trained language model from NVIDIA NGC.

My question is:
How do I get the vocabulary file for the language model?
I found a file words.mixed_lm.txt on NGC, but its content is:

version https://git-lfs.github.com/spec/v1
oid sha256:c3122fbed89b0e3c1362c1924532bf2849721add730f99fae7a959adc47a3506
size 3366212

Should I use it as --decoding_vocab=words.mixed_lm.txt, or do I need to get the real vocabulary file? If so, how?
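For what it’s worth, those three lines are a Git LFS pointer file, not the vocabulary itself; the actual file has to be fetched separately (e.g. with `git lfs pull` in a cloned repo, or by downloading the file through the NGC CLI or web UI). A minimal sketch of how to spot a pointer file, simulated here with a temporary copy of those same three lines:

```shell
# Recreate the pointer-file content in a temporary file (the real
# words.mixed_lm.txt lives on NGC; this is just for illustration)
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:c3122fbed89b0e3c1362c1924532bf2849721add730f99fae7a959adc47a3506\nsize 3366212\n' > /tmp/words.mixed_lm.txt

# A file whose first line names the git-lfs spec is a pointer, not real data
if head -n 1 /tmp/words.mixed_lm.txt | grep -q 'git-lfs.github.com/spec/v1'; then
    echo "LFS pointer detected - fetch the real file (e.g. 'git lfs pull' or the NGC CLI)"
fi
```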

Thanks a lot.