Initializing Jarvis with a German model

Hi,

I'm trying to initialize Jarvis' ASR with a pretrained German model, and I can't find any documentation on how to do it properly.

Following the Quickstart guide, I ran jarvis_init.sh and jarvis_start.sh.

Before that, I changed the config.sh file according to my needs and stumbled upon this:

# JMIR ($jarvis_model_loc/jmir)
# Jarvis uses an intermediate representation (JMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $jarvis_model_loc/jmir by `jarvis_init.sh`

with emphasis on "by specifying NGC models below".

The ENGLISH Jasper Offline ASR model can be obtained by referencing it in the config.sh file as follows:

###  Jasper Offline w/ CPU decoder
    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_jarvis_asr_jasper_english_offline:${jarvis_ngc_model_version}"

How do I have to modify this to deploy the German Jarvis model?

I know that you can get the corresponding .nemo file from here:
https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_de_quartznet15x5

but I'm clueless about how to proceed from there.
They mention to run

import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="stt_de_quartznet15x5")

but where do I call this?
Do I have to call it when running inference, in the transcribe_file_offline.py file?

To be clear, offline ASR does work on my setup when I'm in the

jarvis_quickstart_v1.2.1-beta/examples

directory and run

python3 transcribe_file_offline.py --audio /path/to/audio

I just want it to use the German model and not the English Jasper one.

Thanks a lot. Any advice is appreciated. Have a nice weekend!

Hi @martin.waldschmidt
Please refer to the links below in case they are helpful:
https://docs.nvidia.com/deeplearning/jarvis/user-guide/docs/custom-model-deployment.html
https://docs.nvidia.com/tlt/tlt-user-guide/text/jarvis_tlt_integration.html

Thanks

Thanks a lot!


Hi! Did you manage to deploy it? I am also interested in training a German ASR model and deploying it using NVIDIA tools.

@e.ricardo.chavez I'm still struggling quite a bit. Did you manage it?

Hi @martin.waldschmidt
Could you please share the error you are getting while deploying the custom model?
Also, if possible, please share the model and a repro script so we can help better.

Thanks

Hi @SunilJB ,
So I'm trying to deploy a German QuartzNet .nemo model. I converted it to .riva using the nemo2riva package.
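For reference, the conversion step can be sketched roughly like this, run inside the ServiceMaker container (file names and paths here are placeholders; check `nemo2riva --help` for the exact options available in your version):

```shell
# Hedged sketch of the .nemo -> .riva conversion inside the ServiceMaker
# container. The input/output file names are placeholders, not the actual
# files used in this thread.
nemo2riva --out /servicemaker-dev/german_quartznet.riva \
          /servicemaker-dev/stt_de_quartznet15x5.nemo
```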

After that, I'm running riva-build as follows:

docker run --gpus all -it --rm -v /home/ws/untow/riva3:/servicemaker-dev -v /home/ws/untow/riva3/models:/data --entrypoint="/bin/bash" nvcr.io/nvidia/riva/riva-speech:1.5.0-beta-servicemaker

riva-build speech_recognition \
    /servicemaker-dev/german_quartznet_binaryLM.rmir \
    /servicemaker-dev/german_quartznet.riva \
    --offline \
    --decoder_type=os2s \
    --decoding_language_model_binary="scorer6.binary"

Running ./riva_init.sh leads to:

Logging into NGC docker registry if necessary...
Pulling required docker images if necessary...
Note: This may take some time, depending on the speed of your Internet connection.
> Pulling Riva Speech Server images.
  > Image nvcr.io/nvidia/riva/riva-speech:1.5.0-beta-server exists. Skipping.
  > Image nvcr.io/nvidia/riva/riva-speech-client:1.5.0-beta exists. Skipping.
  > Image nvcr.io/nvidia/riva/riva-speech:1.5.0-beta-servicemaker exists. Skipping.

Converting RMIRs at /home/ws/untow/riva3/riva_model_loc/rmir to Riva Model repository.
+ docker run --init -it --rm --gpus '"device=0"' -v /home/ws/untow/riva3/riva_model_loc:/data -e MODEL_DEPLOY_KEY=tlt_encode --name riva-service-maker nvcr.io/nvidia/riva/riva-speech:1.5.0-beta-servicemaker deploy_all_models /data/rmir /data/models

==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release devel (build 22382700)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

2021-09-08 09:09:59,584 [INFO] Writing Riva model repository to '/data/models'...
2021-09-08 09:09:59,584 [INFO] The riva model repo target directory is /data/models
2021-09-08 09:09:59,955 [INFO] Extract_binaries for featurizer -> /data/models/riva-asr-feature-extractor-streaming-offline/1
2021-09-08 09:09:59,957 [INFO] Extract_binaries for nn -> /data/models/riva-trt-riva-asr-am-streaming-offline/1
2021-09-08 09:09:59,993 [INFO] Printing copied artifacts:
2021-09-08 09:09:59,993 [INFO] {'onnx': '/data/models/riva-trt-riva-asr-am-streaming-offline/1/model_graph.onnx'}
2021-09-08 09:09:59,993 [INFO] Building TRT engine from ONNX file
[TensorRT] WARNING: /workspace/TensorRT/t/oss-cicd/oss/parsers/onnx/onnx2trt_utils.cpp:227: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2021-09-08 09:12:39,673 [INFO] Extract_binaries for vad -> /data/models/riva-asr-voice-activity-detector-ctc-streaming-offline/1
2021-09-08 09:12:39,675 [INFO] Extract_binaries for lm_decoder -> /data/models/riva-asr-ctc-decoder-cpu-streaming-offline/1
2021-09-08 09:12:39,675 [INFO] {'vocab_file': '/data/models/riva-asr-ctc-decoder-cpu-streaming-offline/1/vocab.txt', 'decoding_language_model_binary': '/data/models/riva-asr-ctc-decoder-cpu-streaming-offline/1/scorer6.binary'}
2021-09-08 09:12:39,676 [INFO] Extract_binaries for self -> /data/models/riva-asr/1
+ echo

+ echo 'Riva initialization complete. Run ./riva_start.sh to launch services.'
Riva initialization complete. Run ./riva_start.sh to launch services.

After that, I'm running ./riva_start.sh with the following output:

 Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Riva server to load all models...retrying in 10 seconds
(previous line repeated ~30 times)
Health ready check failed.
Check Riva logs with: docker logs riva-speech

docker logs riva-speech shows:


==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release 21.08 (build 26374001)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

  > Riva waiting for Triton server to load all models...retrying in 1 second
I0908 09:13:19.451693 70 metrics.cc:228] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3090
I0908 09:13:19.453500 70 onnxruntime.cc:1722] TRITONBACKEND_Initialize: onnxruntime
I0908 09:13:19.453512 70 onnxruntime.cc:1732] Triton TRITONBACKEND API version: 1.0
I0908 09:13:19.453515 70 onnxruntime.cc:1738] 'onnxruntime' TRITONBACKEND API version: 1.0
I0908 09:13:19.598998 70 pinned_memory_manager.cc:206] Pinned memory pool is created at '0x7f2980000000' with size 268435456
I0908 09:13:19.599237 70 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 1000000000
I0908 09:13:19.601927 70 model_repository_manager.cc:1066] loading: riva-asr-feature-extractor-streaming-offline:1
I0908 09:13:19.702143 70 model_repository_manager.cc:1066] loading: riva-asr-ctc-decoder-cpu-streaming-offline:1
I0908 09:13:19.702558 70 custom_backend.cc:201] Creating instance riva-asr-feature-extractor-streaming-offline_0_0_gpu0 on GPU 0 (8.6) using libtriton_riva_asr_features.so
I0908 09:13:19.802321 70 model_repository_manager.cc:1066] loading: riva-asr-voice-activity-detector-ctc-streaming-offline:1
I0908 09:13:19.802525 70 custom_backend.cc:198] Creating instance riva-asr-ctc-decoder-cpu-streaming-offline_0_0_cpu on CPU using libtriton_riva_asr_decoder_cpu.so
W:parameter_parser.cc:118: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter voc_string could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
E:decoder_context.cc:594: Cannot initialize decoders. Error msg: external/kenlm/lm/model.cc:70 in lm::ngram::detail::GenericModel<Search, VocabularyT>::GenericModel(const char*, const lm::ngram::Config&) [with Search = lm::ngram::trie::TrieSearch<lm::ngram::SeparatelyQuantize, lm::ngram::trie::ArrayBhiksha>; VocabularyT = lm::ngram::SortedVocabulary] threw FormatLoadException because `new_config.enumerate_vocab && !parameters.fixed.has_vocabulary'.
E0908 09:13:19.805930 70 sequence_batch_scheduler.cc:941] Initialization failed for Direct sequence-batch scheduler thread 0: initialize error for 'riva-asr-ctc-decoder-cpu-streaming-offline': (13) Invalid parameters in model configuration
E0908 09:13:19.806230 70 model_repository_manager.cc:1243] failed to load 'riva-asr-ctc-decoder-cpu-streaming-offline' version 1: Internal: Initialization failed for all sequence-batch scheduler threads
I0908 09:13:19.902541 70 model_repository_manager.cc:1066] loading: riva-trt-riva-asr-am-streaming-offline:1
I0908 09:13:19.902783 70 custom_backend.cc:198] Creating instance riva-asr-voice-activity-detector-ctc-streaming-offline_0_0_cpu on CPU using libtriton_riva_asr_vad.so
I0908 09:13:19.906469 70 model_repository_manager.cc:1240] successfully loaded 'riva-asr-voice-activity-detector-ctc-streaming-offline' version 1
The decoder requested all the vocabulary strings, but this binary file does not have them.  You may need to rebuild the binary file with an updated version of build_binary.
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
I0908 09:13:25.958206 70 model_repository_manager.cc:1240] successfully loaded 'riva-asr-feature-extractor-streaming-offline' version 1
I0908 09:13:26.401839 70 plan_backend.cc:384] Creating instance riva-trt-riva-asr-am-streaming-offline_0_0_gpu0 on GPU 0 (8.6) using model.plan
  > Riva waiting for Triton server to load all models...retrying in 1 second
I0908 09:13:26.576973 70 plan_backend.cc:768] Created instance riva-trt-riva-asr-am-streaming-offline_0_0_gpu0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0908 09:13:26.579238 70 model_repository_manager.cc:1240] successfully loaded 'riva-trt-riva-asr-am-streaming-offline' version 1
E0908 09:13:26.579275 70 model_repository_manager.cc:1431] Invalid argument: ensemble 'riva-asr' depends on 'riva-asr-ctc-decoder-cpu-streaming-offline' which has no loaded version
I0908 09:13:26.579308 70 server.cc:504] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0908 09:13:26.579330 70 server.cc:543] 
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt    | <built-in>                                                      | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
+-------------+-----------------------------------------------------------------+--------+

I0908 09:13:26.579365 70 server.cc:586] 
+--------------------------------------------------------+---------+---------------------------------------------------------------------------------------+
| Model                                                  | Version | Status                                                                                |
+--------------------------------------------------------+---------+---------------------------------------------------------------------------------------+
| riva-asr-ctc-decoder-cpu-streaming-offline             | 1       | UNAVAILABLE: Internal: Initialization failed for all sequence-batch scheduler threads |
| riva-asr-feature-extractor-streaming-offline           | 1       | READY                                                                                 |
| riva-asr-voice-activity-detector-ctc-streaming-offline | 1       | READY                                                                                 |
| riva-trt-riva-asr-am-streaming-offline                 | 1       | READY                                                                                 |
+--------------------------------------------------------+---------+---------------------------------------------------------------------------------------+

I0908 09:13:26.579433 70 tritonserver.cc:1658] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                 |
| server_version                   | 2.9.0                                                                                                                                                                                  |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | /data/models                                                                                                                                                                           |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
| cuda_memory_pool_byte_size{0}    | 1000000000                                                                                                                                                                             |
| min_supported_compute_capability | 6.0                                                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                                                     |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0908 09:13:26.579437 70 server.cc:234] Waiting for in-flight requests to complete.
I0908 09:13:26.579439 70 model_repository_manager.cc:1099] unloading: riva-trt-riva-asr-am-streaming-offline:1
I0908 09:13:26.579462 70 model_repository_manager.cc:1099] unloading: riva-asr-voice-activity-detector-ctc-streaming-offline:1
I0908 09:13:26.579501 70 model_repository_manager.cc:1099] unloading: riva-asr-feature-extractor-streaming-offline:1
I0908 09:13:26.579544 70 server.cc:249] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0908 09:13:26.580506 70 model_repository_manager.cc:1223] successfully unloaded 'riva-asr-voice-activity-detector-ctc-streaming-offline' version 1
I0908 09:13:26.597855 70 model_repository_manager.cc:1223] successfully unloaded 'riva-asr-feature-extractor-streaming-offline' version 1
I0908 09:13:26.598752 70 model_repository_manager.cc:1223] successfully unloaded 'riva-trt-riva-asr-am-streaming-offline' version 1
  > Riva waiting for Triton server to load all models...retrying in 1 second
I0908 09:13:27.579639 70 server.cc:249] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Triton server died before reaching ready state. Terminating Riva startup.
Check Triton logs with: docker logs 
/opt/riva/bin/start-riva: line 1: kill: (70) - No such process

I had no problems when using a greedy decoder with no binary language model. What could be the problem?

Thanks for your help!

OK, never mind. Found my mistake:

I was using a binary language model file I had created with Mozilla's DeepSpeech. Turns out those are not compatible; I should have checked that earlier. Sorry, my mistake.
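For anyone hitting the same "decoder requested all the vocabulary strings" error: the log suggests the decoder expects a KenLM binary built with a recent build_binary, not a DeepSpeech scorer package. A compatible binary might be rebuilt roughly like this (a sketch assuming KenLM's tools are on PATH and corpus.txt is a placeholder plain-text German corpus):

```shell
# Hedged sketch: rebuild the language model as a plain KenLM binary.
# lmplz estimates an n-gram ARPA model; build_binary converts it to the
# binary trie format. Adjust the n-gram order (-o) as needed.
lmplz -o 4 < corpus.txt > lm.arpa
build_binary trie lm.arpa scorer.binary
```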

Thanks for the help! Great project!
