Riva Citrinet Language Model

Please provide the following information when requesting support.

Hardware: GPU (T4)
Operating System: Ubuntu
Riva Version: 1.5.0

Hi, I’m trying to run the Citrinet pre-trained model with custom configs, but when I add a language model (which I got from NVIDIA NGC), riva_start always times out and fails:

Riva Build Command:

riva-build speech_recognition \
    /data/rmir/speechtotext_english_citrinet.rmir:tlt_encode \
    /data/generated/speechtotext_english_citrinet.riva:tlt_encode \
    --name=citrinet \
    --decoder_type=flashlight \
    --chunk_size=0.8 \
    --padding_size=1.6 \
    --ms_per_timestep=80 \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --vad.vad_start_history=300 \
    --vad.vad_start_th=0.2 \
    --vad.vad_stop_history=1200 \
    --vad.vad_stop_th=0.98 \
    --decoding_language_model_binary=../../data/generated/mixed-lower.binary \
    --decoding_vocab=../../data/generated/words.mixed_lm.txt

Log when running riva_start (I initialised the model using the quickstart script):

E0911 13:32:58.027425 73 sequence_batch_scheduler.cc:941] Initialization failed for Direct sequence-batch scheduler thread 0: initialize error for 'citrinet-ctc-decoder-cpu-streaming': (13) Invalid parameters in model configuration

Can you please tell me what the invalid parameters are? The riva-build itself completed successfully, by the way.

Hi @HansieB ,
Could you please check the model configuration params? Please refer to the section below:
https://docs.nvidia.com/deeplearning/riva/user-guide/docs/service-asr.html?highlight=beam_size_token#language-models
When using the Citrinet acoustic model, the language model can be specified with the following riva-build command:

riva-build speech_recognition \
    /servicemaker-dev/<jmir_filename>:<encryption_key> \
    /servicemaker-dev/<riva_filename>:<encryption_key> \
    /servicemaker-dev/<n_gram_riva_filename>:<encryption_key> \
    --name=<pipeline_name> \
    --decoder_type=flashlight \
    --chunk_size=0.16 \
    --padding_size=1.92 \
    --ms_per_timestep=80 \
    --flashlight_decoder.asr_model_delay=-1 \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --decoding_vocab=<vocabulary_filename>
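For comparison, if the language model is passed as a KenLM binary rather than as an n-gram .riva file (as in the original command above), the same template could be adapted roughly as follows. This is only a sketch: `<lm_binary>` and `<vocabulary_filename>` are placeholders, and the exact flag set should be verified against the docs page linked above for your Riva version:

```shell
riva-build speech_recognition \
    /servicemaker-dev/<jmir_filename>:<encryption_key> \
    /servicemaker-dev/<riva_filename>:<encryption_key> \
    --name=<pipeline_name> \
    --decoder_type=flashlight \
    --chunk_size=0.16 \
    --padding_size=1.92 \
    --ms_per_timestep=80 \
    --flashlight_decoder.asr_model_delay=-1 \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --decoding_language_model_binary=<lm_binary> \
    --decoding_vocab=<vocabulary_filename>
```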

Thanks!

Hi, thanks for the reply @AakankshaS, but is it possible to use the above command with the pre-trained language model, or do I have to create a new language model with TAO?

Hi @HansieB ,
You can use the same TAO model and export it to the .riva format before building and deploying it:
https://docs.nvidia.com/tao/tao-toolkit/text/asr/speech_recognition_with_citrinet.html#model-export
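Per the linked model-export docs, the export step looks roughly like the following. Note this is a sketch only: the spec file, paths, and key are placeholders, and the subtask name and flags may differ between TAO Toolkit versions, so check the linked page for the exact invocation:

```shell
# Export a trained Citrinet .tlt checkpoint to a .riva file usable by riva-build
# (all paths and the key below are placeholders; see the linked docs for exact flags)
tao speech_to_text export \
    -e <export_spec>.yaml \
    -m <trained_model>.tlt \
    -k <encryption_key> \
    export_format=RIVA \
    export_to=<output_model>.riva
```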

Thanks!

@HansieB @AakankshaS

Hi guys,

I’m new here.
I’m trying to do the same thing: running the Citrinet pre-trained model with the pre-trained language model from NVIDIA NGC.

My question is:
How do I get the vocabulary file for the language model?
I found a file words.mixed_lm.txt on NGC, but its content is:

version https://git-lfs.github.com/spec/v1
oid sha256:c3122fbed89b0e3c1362c1924532bf2849721add730f99fae7a959adc47a3506
size 3366212

Should I use it as --decoding_vocab=words.mixed_lm.txt, or do I need to get the real vocabulary file? If so, how?
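For what it’s worth, those three lines are a Git LFS pointer file, not the vocabulary itself; the actual file has to be fetched separately (e.g. with `git lfs pull` in a cloned repo, or by downloading the file through the NGC CLI or web UI). A minimal sketch of how to spot a pointer file, simulated here with a temporary copy of those same three lines:

```shell
# Recreate the pointer-file content in a temporary file (the real
# words.mixed_lm.txt lives on NGC; this is just for illustration)
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:c3122fbed89b0e3c1362c1924532bf2849721add730f99fae7a959adc47a3506\nsize 3366212\n' > /tmp/words.mixed_lm.txt

# A file whose first line names the git-lfs spec is a pointer, not real data
if head -n 1 /tmp/words.mixed_lm.txt | grep -q 'git-lfs.github.com/spec/v1'; then
    echo "LFS pointer detected - fetch the real file (e.g. 'git lfs pull' or the NGC CLI)"
fi
```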

Thanks a lot.