Missing Information in the Docs

Hey Nvidia Team,

I'm trying to build a QuartzNet model with a binary LM.
However, in the docs section “Language Model”, there's just an empty green box where the instructions for building a Jasper or QuartzNet model should be.

Could you fix this?

When I run riva-build as follows, the health-ready check after running ./riva_start.sh fails.

riva-build speech_recognition /servicemaker-dev/german_quartznet_binaryLM_2.rmir /servicemaker-dev/german_quartznet.riva --offline --decoder_type=os2s --decoding_language_model_binary=scorer6.binary --os2s_decoder.beam_search_width=128 --os2s_decoder.language_model_alpha=0.931289039105002 --os2s_decoder.language_model_beta=1.1834137581510284

The full error from `docker logs riva-speech` that occurs when running riva_start.sh is:

==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release 21.08 (build 26374001)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

  > Riva waiting for Triton server to load all models...retrying in 1 second
I0905 10:17:58.091885 71 metrics.cc:228] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3090
I0905 10:17:58.093712 71 onnxruntime.cc:1722] TRITONBACKEND_Initialize: onnxruntime
I0905 10:17:58.093724 71 onnxruntime.cc:1732] Triton TRITONBACKEND API version: 1.0
I0905 10:17:58.093726 71 onnxruntime.cc:1738] 'onnxruntime' TRITONBACKEND API version: 1.0
I0905 10:17:58.221914 71 pinned_memory_manager.cc:206] Pinned memory pool is created at '0x7f4e80000000' with size 268435456
I0905 10:17:58.222149 71 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 1000000000
I0905 10:17:58.224796 71 model_repository_manager.cc:1066] loading: riva-asr-feature-extractor-streaming-offline:1
I0905 10:17:58.324975 71 model_repository_manager.cc:1066] loading: riva-asr-ctc-decoder-cpu-streaming-offline:1
I0905 10:17:58.325383 71 custom_backend.cc:201] Creating instance riva-asr-feature-extractor-streaming-offline_0_0_gpu0 on GPU 0 (8.6) using libtriton_riva_asr_features.so
I0905 10:17:58.425147 71 model_repository_manager.cc:1066] loading: riva-asr-voice-activity-detector-ctc-streaming-offline:1
I0905 10:17:58.425334 71 custom_backend.cc:198] Creating instance riva-asr-ctc-decoder-cpu-streaming-offline_0_0_cpu on CPU using libtriton_riva_asr_decoder_cpu.so
W:parameter_parser.cc:118: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter voc_string could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
E:decoder_context.cc:594: Cannot initialize decoders. Error msg: external/kenlm/lm/model.cc:70 in lm::ngram::detail::GenericModel<Search, VocabularyT>::GenericModel(const char*, const lm::ngram::Config&) [with Search = lm::ngram::trie::TrieSearch<lm::ngram::SeparatelyQuantize, lm::ngram::trie::ArrayBhiksha>; VocabularyT = lm::ngram::SortedVocabulary] threw FormatLoadException because `new_config.enumerate_vocab && !parameters.fixed.has_vocabulary'.
E0905 10:17:58.428477 71 sequence_batch_scheduler.cc:941] Initialization failed for Direct sequence-batch scheduler thread 0: initialize error for 'riva-asr-ctc-decoder-cpu-streaming-offline': (13) Invalid parameters in model configuration
E0905 10:17:58.428773 71 model_repository_manager.cc:1243] failed to load 'riva-asr-ctc-decoder-cpu-streaming-offline' version 1: Internal: Initialization failed for all sequence-batch scheduler threads
I0905 10:17:58.525349 71 model_repository_manager.cc:1066] loading: riva-trt-riva-asr-am-streaming-offline:1
I0905 10:17:58.525542 71 custom_backend.cc:198] Creating instance riva-asr-voice-activity-detector-ctc-streaming-offline_0_0_cpu on CPU using libtriton_riva_asr_vad.so
I0905 10:17:58.528983 71 model_repository_manager.cc:1240] successfully loaded 'riva-asr-voice-activity-detector-ctc-streaming-offline' version 1
The decoder requested all the vocabulary strings, but this binary file does not have them.  You may need to rebuild the binary file with an updated version of build_binary.  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
I0905 10:18:04.555087 71 model_repository_manager.cc:1240] successfully loaded 'riva-asr-feature-extractor-streaming-offline' version 1
I0905 10:18:04.988146 71 plan_backend.cc:384] Creating instance riva-trt-riva-asr-am-streaming-offline_0_0_gpu0 on GPU 0 (8.6) using model.plan
  > Riva waiting for Triton server to load all models...retrying in 1 second
I0905 10:18:05.161450 71 plan_backend.cc:768] Created instance riva-trt-riva-asr-am-streaming-offline_0_0_gpu0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0905 10:18:05.162693 71 model_repository_manager.cc:1240] successfully loaded 'riva-trt-riva-asr-am-streaming-offline' version 1
E0905 10:18:05.162718 71 model_repository_manager.cc:1431] Invalid argument: ensemble 'riva-asr' depends on 'riva-asr-ctc-decoder-cpu-streaming-offline' which has no loaded version
I0905 10:18:05.162747 71 server.cc:504] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0905 10:18:05.162768 71 server.cc:543] 
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt    | <built-in>                                                      | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
+-------------+-----------------------------------------------------------------+--------+

I0905 10:18:05.162796 71 server.cc:586] 
+--------------------------------------------------------+---------+---------------------------------------------------------------------------------------+
| Model                                                  | Version | Status                                                                                |
+--------------------------------------------------------+---------+---------------------------------------------------------------------------------------+
| riva-asr-ctc-decoder-cpu-streaming-offline             | 1       | UNAVAILABLE: Internal: Initialization failed for all sequence-batch scheduler threads |
| riva-asr-feature-extractor-streaming-offline           | 1       | READY                                                                                 |
| riva-asr-voice-activity-detector-ctc-streaming-offline | 1       | READY                                                                                 |
| riva-trt-riva-asr-am-streaming-offline                 | 1       | READY                                                                                 |
+--------------------------------------------------------+---------+---------------------------------------------------------------------------------------+

I0905 10:18:05.162852 71 tritonserver.cc:1658] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                 |
| server_version                   | 2.9.0                                                                                                                                                                                  |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | /data/models                                                                                                                                                                           |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
| cuda_memory_pool_byte_size{0}    | 1000000000                                                                                                                                                                             |
| min_supported_compute_capability | 6.0                                                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                                                     |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0905 10:18:05.162856 71 server.cc:234] Waiting for in-flight requests to complete.
I0905 10:18:05.162858 71 model_repository_manager.cc:1099] unloading: riva-trt-riva-asr-am-streaming-offline:1
I0905 10:18:05.162875 71 model_repository_manager.cc:1099] unloading: riva-asr-voice-activity-detector-ctc-streaming-offline:1
I0905 10:18:05.162903 71 model_repository_manager.cc:1099] unloading: riva-asr-feature-extractor-streaming-offline:1
I0905 10:18:05.162937 71 server.cc:249] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0905 10:18:05.163895 71 model_repository_manager.cc:1223] successfully unloaded 'riva-asr-voice-activity-detector-ctc-streaming-offline' version 1
I0905 10:18:05.182325 71 model_repository_manager.cc:1223] successfully unloaded 'riva-asr-feature-extractor-streaming-offline' version 1
I0905 10:18:05.183213 71 model_repository_manager.cc:1223] successfully unloaded 'riva-trt-riva-asr-am-streaming-offline' version 1
  > Riva waiting for Triton server to load all models...retrying in 1 second
I0905 10:18:06.163027 71 server.cc:249] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Triton server died before reaching ready state. Terminating Riva startup.
Check Triton logs with: docker logs 
/opt/riva/bin/start-riva: line 1: kill: (71) - No such process
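The key line in the log is the KenLM `FormatLoadException`: the decoder requested the vocabulary strings, but the binary LM file does not contain them. Following the log's own suggestion, one possible fix — assuming you still have the original ARPA file (shown here under the hypothetical name `scorer6.arpa`) — is to rebuild the binary with a recent KenLM `build_binary`, which embeds the vocabulary strings:

```shell
# Rebuild the binary LM in trie format (the format named in the error)
# with a current KenLM build_binary; older versions could omit the
# vocabulary strings, which triggers the FormatLoadException above.
build_binary trie scorer6.arpa scorer6.binary
```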

Everything works just fine when I use no language model and a greedy decoder, as follows:

riva-build speech_recognition /servicemaker-dev/german_quartznet_binaryLM.rmir /servicemaker-dev/german_quartznet.riva --offline --decoder_type=greedy

Can you point out to me what's going wrong? Thanks a lot in advance!

Hi @martin.waldschmidt ,
Thank you for highlighting this. We will fix it.
Meanwhile, you can try the following command:

riva-build speech_recognition \
    /servicemaker-dev/<rmir_filename>:<encryption_key>  \
    /servicemaker-dev/<riva_filename>:<encryption_key> \
    /servicemaker-dev/<n_gram_riva_filename>:<encryption_key> \
    --name=<pipeline_name> \
    --decoder_type=os2s
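For illustration only, filled in with the filenames from the original post and no encryption keys — the pipeline name and the `.riva`-packaged n-gram LM filename are hypothetical choices, not values from the thread:

```shell
# Sketch of the suggested command with concrete (assumed) filenames:
# the third positional argument is the n-gram LM packaged as a .riva file.
riva-build speech_recognition \
    /servicemaker-dev/german_quartznet_binaryLM_2.rmir \
    /servicemaker-dev/german_quartznet.riva \
    /servicemaker-dev/scorer6.riva \
    --name=german-quartznet-binaryLM \
    --decoder_type=os2s
```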

Hi @AakankshaS,
Thanks for the fast reply.

What does the --name flag do?

Thank you very much!

Hi @martin.waldschmidt

You can use the --name field in jarvis-build to give the model a name, and then request that model explicitly by setting the model parameter in the request config when making the ASR request.

Thanks


Alright, thank you very much!