Error when starting Citrinet with language model

I used the standard NeMo model “stt_en_citrinet_512” and the standard LM model “speechtotext_english_lm”, both from NVIDIA NGC.
The model was converted from .nemo to .ejrvs format using the nemo2jarvis-1.3.0b0-py3-none-any.whl script.
I ran jarvis-build with the parameters for the Citrinet model, and the build finished with no errors:

jarvis-build speech_recognition \
    /servicemaker-dev/stt_en_citrinet_512_lm.jmir \
    /servicemaker-dev/stt_en_citrinet_512.ejrvs \
    --decoding_language_model_binary="/servicemaker-dev/mixed-lower.binary" \
    --lm_decoder_cpu.beam_search_width=128 \
    --lm_decoder_cpu.language_model_alpha=1.0 \
    --lm_decoder_cpu.language_model_beta=1.0

The jarvis_init.sh script also finished without errors, but when I run jarvis_start.sh I get this error:

slava@k8s-worker1:~/jarvis_quickstart_v1.3.0-beta$ sudo bash jarvis_start.sh
Starting Jarvis Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Jarvis server to load all models…retrying in 10 seconds
[last message repeated 9 more times]
Health ready check failed.
Check Jarvis logs with: docker logs jarvis-speech

The jarvis-speech logs contain the following:

slava@k8s-worker1:~/jarvis_quickstart_v1.3.0-beta$ sudo docker logs jarvis-speech

==========================
== Jarvis Speech Skills ==

NVIDIA Release 21.05 (build 24727069)

Copyright (c) 2018-2021, NVIDIA CORPORATION. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for the inference server. NVIDIA recommends the use of the following flags:
nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 …

Jarvis waiting for Triton server to load all models…retrying in 1 second
I0720 23:46:34.423369 69 metrics.cc:228] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3090
I0720 23:46:34.426804 69 onnxruntime.cc:1722] TRITONBACKEND_Initialize: onnxruntime
I0720 23:46:34.426825 69 onnxruntime.cc:1732] Triton TRITONBACKEND API version: 1.0
I0720 23:46:34.426831 69 onnxruntime.cc:1738] 'onnxruntime' TRITONBACKEND API version: 1.0
I0720 23:46:34.706931 69 pinned_memory_manager.cc:206] Pinned memory pool is created at '0x7f4290000000' with size 268435456
I0720 23:46:34.707217 69 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 1000000000
I0720 23:46:34.711375 69 model_repository_manager.cc:1066] loading: jarvis-asr-feature-extractor-streaming:1
I0720 23:46:34.812129 69 model_repository_manager.cc:1066] loading: jarvis-asr-ctc-decoder-cpu-streaming:1
I0720 23:46:34.812922 69 custom_backend.cc:201] Creating instance jarvis-asr-feature-extractor-streaming_0_0_gpu0 on GPU 0 (8.6) using libtriton_jarvis_asr_features.so
I0720 23:46:34.912681 69 model_repository_manager.cc:1066] loading: jarvis-asr-voice-activity-detector-ctc-streaming:1
I0720 23:46:34.913392 69 custom_backend.cc:198] Creating instance jarvis-asr-ctc-decoder-cpu-streaming_0_0_cpu on CPU using libtriton_jarvis_asr_decoder_cpu.so
E0720 23:46:34.928965 69 sequence_batch_scheduler.cc:941] Initialization failed for Direct sequence-batch scheduler thread 0: initialize error for 'jarvis-asr-ctc-decoder-cpu-streaming': (13) Invalid parameters in model configuration
E0720 23:46:34.929916 69 model_repository_manager.cc:1243] failed to load 'jarvis-asr-ctc-decoder-cpu-streaming' version 1: Internal: Initialization failed for all sequence-batch scheduler threads
I0720 23:46:35.013210 69 model_repository_manager.cc:1066] loading: jarvis-trt-jarvis-asr-am-streaming:1
I0720 23:46:35.013698 69 custom_backend.cc:198] Creating instance jarvis-asr-voice-activity-detector-ctc-streaming_0_0_cpu on CPU using libtriton_jarvis_asr_vad.so
I0720 23:46:35.097707 69 model_repository_manager.cc:1240] successfully loaded 'jarvis-asr-voice-activity-detector-ctc-streaming' version 1
E:decoder_context.cc:634: Invalid decoder_type. Must be greedy or flashlight
Jarvis waiting for Triton server to load all models…retrying in 1 second
Jarvis waiting for Triton server to load all models…retrying in 1 second
[last message repeated 13 more times]
I0720 23:46:50.656539 69 model_repository_manager.cc:1240] successfully loaded 'jarvis-asr-feature-extractor-streaming' version 1
Jarvis waiting for Triton server to load all models…retrying in 1 second
I0720 23:46:51.314936 69 plan_backend.cc:384] Creating instance jarvis-trt-jarvis-asr-am-streaming_0_0_gpu0 on GPU 0 (8.6) using model.plan
Jarvis waiting for Triton server to load all models…retrying in 1 second
I0720 23:46:51.819125 69 plan_backend.cc:768] Created instance jarvis-trt-jarvis-asr-am-streaming_0_0_gpu0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0720 23:46:51.823744 69 model_repository_manager.cc:1240] successfully loaded 'jarvis-trt-jarvis-asr-am-streaming' version 1
E0720 23:46:51.823933 69 model_repository_manager.cc:1431] Invalid argument: ensemble 'jarvis-asr' depends on 'jarvis-asr-ctc-decoder-cpu-streaming' which has no loaded version
I0720 23:46:51.824142 69 server.cc:504]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0720 23:46:51.824261 69 server.cc:543]
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt    |                                                                 | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
+-------------+-----------------------------------------------------------------+--------+

I0720 23:46:51.824451 69 server.cc:586]
+--------------------------------------------------+---------+---------------------------------------------------------------------------------------+
| Model                                            | Version | Status                                                                                |
+--------------------------------------------------+---------+---------------------------------------------------------------------------------------+
| jarvis-asr-ctc-decoder-cpu-streaming             | 1       | UNAVAILABLE: Internal: Initialization failed for all sequence-batch scheduler threads |
| jarvis-asr-feature-extractor-streaming           | 1       | READY                                                                                 |
| jarvis-asr-voice-activity-detector-ctc-streaming | 1       | READY                                                                                 |
| jarvis-trt-jarvis-asr-am-streaming               | 1       | READY                                                                                 |
+--------------------------------------------------+---------+---------------------------------------------------------------------------------------+

I0720 23:46:51.824781 69 tritonserver.cc:1658]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                 |
| server_version                   | 2.9.0                                                                                                                                                                                  |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | /data/models                                                                                                                                                                           |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
| cuda_memory_pool_byte_size{0}    | 1000000000                                                                                                                                                                             |
| min_supported_compute_capability | 6.0                                                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                                                     |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0720 23:46:51.824809 69 server.cc:234] Waiting for in-flight requests to complete.
I0720 23:46:51.824826 69 model_repository_manager.cc:1099] unloading: jarvis-asr-voice-activity-detector-ctc-streaming:1
I0720 23:46:51.824977 69 model_repository_manager.cc:1099] unloading: jarvis-asr-feature-extractor-streaming:1
I0720 23:46:51.825182 69 model_repository_manager.cc:1099] unloading: jarvis-trt-jarvis-asr-am-streaming:1
I0720 23:46:51.825482 69 server.cc:249] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0720 23:46:51.837571 69 model_repository_manager.cc:1223] successfully unloaded 'jarvis-trt-jarvis-asr-am-streaming' version 1
I0720 23:46:51.854743 69 model_repository_manager.cc:1223] successfully unloaded 'jarvis-asr-feature-extractor-streaming' version 1
I0720 23:46:51.871453 69 model_repository_manager.cc:1223] successfully unloaded 'jarvis-asr-voice-activity-detector-ctc-streaming' version 1
Jarvis waiting for Triton server to load all models…retrying in 1 second
I0720 23:46:52.825667 69 server.cc:249] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
Jarvis waiting for Triton server to load all models…retrying in 1 second
Triton server died before reaching ready state. Terminating Jarvis startup.
Check Triton logs with: docker logs
/opt/jarvis/bin/start-jarvis: line 1: kill: (69) - No such process

What could this error be related to, and how do I start the Citrinet model with a language model?

Hi @sltl ,
Can you please try running jarvis_clean.sh and then jarvis_init.sh, and let us know if this works?
Thanks!

Unfortunately it didn’t work, I get the same error :(

Could you please check the model configuration params? Please refer to below section:
https://docs.nvidia.com/deeplearning/jarvis/user-guide/docs/service-asr.html?highlight=beam_size_token#language-models

When using the Jasper or Quartznet acoustic models, the language model parameters alpha, beta, and beam_search_width can be specified with:

jarvis-build speech_recognition \
    /servicemaker-dev/<jmir_filename>:<encryption_key> \
    /servicemaker-dev/<ejrvs_filename>:<encryption_key> \
    --name=<pipeline_name> \
    --decoding_language_model_binary=<KenLM_binary_filename> \
    --lm_decoder_cpu.beam_search_width=<beam_search_width> \
    --lm_decoder_cpu.language_model_alpha=<language_model_alpha> \
    --lm_decoder_cpu.language_model_beta=<language_model_beta>

With the Citrinet acoustic model, one can specify the Flashlight decoder hyper-parameters beam_size, beam_size_token, beam_threshold, lm_weight and word_insertion_score as follows:

jarvis-build speech_recognition \
    /servicemaker-dev/<jmir_filename>:<encryption_key> \
    /servicemaker-dev/<ejrvs_filename>:<encryption_key> \
    --name=<pipeline_name> \
    --chunk_size=0.16 \
    --padding_size=1.92 \
    --ms_per_timestep=80 \
    --lm_decoder_cpu.asr_model_delay=-1 \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --lm_decoder_cpu.decoder_type=flashlight \
    --decoding_language_model_binary=<arpa_filename> \
    --decoding_vocab=<vocab_filename> \
    --lm_decoder_cpu.beam_size=<beam_size> \
    --lm_decoder_cpu.beam_size_token=<beam_size_token> \
    --lm_decoder_cpu.beam_threshold=<beam_threshold> \
    --lm_decoder_cpu.lm_weight=<lm_weight> \
    --lm_decoder_cpu.word_insertion_score=<word_insertion_score>

Thanks


Thank you very much for your help. The problem was the missing “--decoding_vocab=<vocab_filename>” parameter when building.
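For anyone hitting the same error, here is a sketch of what the corrected build command might look like. It combines the file names from the original post with the Citrinet/Flashlight flags from the documentation excerpt above; the remaining <…> placeholders (vocabulary file and decoder hyper-parameters) still need to be filled in for your own deployment, and the chunk/padding/featurizer values are the documented examples, not tuned settings:

```shell
# Sketch only: Citrinet + Flashlight decoder build, per the docs quoted above.
# File names are from the original post; <...> values must be supplied by you.
jarvis-build speech_recognition \
    /servicemaker-dev/stt_en_citrinet_512_lm.jmir \
    /servicemaker-dev/stt_en_citrinet_512.ejrvs \
    --chunk_size=0.16 \
    --padding_size=1.92 \
    --ms_per_timestep=80 \
    --lm_decoder_cpu.asr_model_delay=-1 \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --lm_decoder_cpu.decoder_type=flashlight \
    --decoding_language_model_binary=/servicemaker-dev/mixed-lower.binary \
    --decoding_vocab=<vocab_filename> \
    --lm_decoder_cpu.beam_size=<beam_size> \
    --lm_decoder_cpu.beam_size_token=<beam_size_token> \
    --lm_decoder_cpu.beam_threshold=<beam_threshold> \
    --lm_decoder_cpu.lm_weight=<lm_weight> \
    --lm_decoder_cpu.word_insertion_score=<word_insertion_score>
```

Note that the “Invalid decoder_type. Must be greedy or flashlight” log line matches the missing --lm_decoder_cpu.decoder_type=flashlight / --decoding_vocab combination: the Citrinet pipeline uses the Flashlight decoder, which requires a decoding vocabulary, unlike the Jasper/Quartznet beam-search flags used in the original command.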
