Jarvis-asr: jarvis_start.sh times out [TensorRT version error]

Hi, we have converted a NeMo model [b4 branch] to a Jarvis ASR speech skills model for audio streaming.
We used these steps for the NeMo-to-Jarvis conversion:

On running jarvis_start.sh [Jarvis API version: 1.0.0-b3], it times out and throws a TensorRT version error and a "Model not loaded" error inside the Docker container.

TensorRT error in the docker logs:

E0628 04:06:02.545757 70 logging.cc:43] INVALID_CONFIG: The engine plan file is not compatible with this version of TensorRT, expecting library version 7.2.2 got 7.2.1, please rebuild.
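The two versions in that line can be pulled out mechanically. A small sed sketch (the log line is pasted in verbatim, so nothing Jarvis-specific is assumed; in practice you would pipe `docker logs jarvis-speech` into the same sed expressions):

```shell
#!/bin/sh
# Extract the expected vs. found TensorRT versions from the Triton error
# line above. "expected" is the library the server ships; "found" is the
# version the engine plan file was built with.
line='E0628 04:06:02.545757 70 logging.cc:43] INVALID_CONFIG: The engine plan file is not compatible with this version of TensorRT, expecting library version 7.2.2 got 7.2.1, please rebuild.'
expected=$(printf '%s\n' "$line" | sed -n 's/.*expecting library version \([0-9.]*\) got.*/\1/p')
found=$(printf '%s\n' "$line" | sed -n 's/.*got \([0-9.]*\),.*/\1/p')
echo "engine built with TensorRT $found, server ships $expected"
```

Since serialized TensorRT engine plans only load under the exact library version that built them, any difference between the two values means the engine has to be rebuilt.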

Here is the entire output of docker logs jarvis-speech:

==========================
== Jarvis Speech Skills ==
==========================

NVIDIA Release 21.03 (build 21236204)

Copyright (c) 2018-2020, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0628 04:05:47.190740 70 metrics.cc:221] Collecting metrics for GPU 0: Tesla T4
I0628 04:05:47.240207 70 onnxruntime.cc:1728] TRITONBACKEND_Initialize: onnxruntime
I0628 04:05:47.240817 70 onnxruntime.cc:1738] Triton TRITONBACKEND API version: 1.0
I0628 04:05:47.240830 70 onnxruntime.cc:1744] 'onnxruntime' TRITONBACKEND API version: 1.0
I0628 04:05:47.453731 70 pinned_memory_manager.cc:205] Pinned memory pool is created at '0x7f2098000000' with size 268435456
I0628 04:05:47.454794 70 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 1000000000
I0628 04:05:47.472853 70 model_repository_manager.cc:787] loading: jarvis-asr-ctc-decoder-cpu-streaming:1
I0628 04:05:47.573070 70 model_repository_manager.cc:787] loading: jarvis-asr-feature-extractor-streaming:1
I0628 04:05:47.573322 70 custom_backend.cc:198] Creating instance jarvis-asr-ctc-decoder-cpu-streaming_0_0_cpu on CPU using libtriton_jarvis_asr_decoder_cpu.so
I0628 04:05:47.624053 70 model_repository_manager.cc:960] successfully loaded 'jarvis-asr-ctc-decoder-cpu-streaming' version 1
I0628 04:05:47.673287 70 model_repository_manager.cc:787] loading: jarvis-asr-voice-activity-detector-ctc-streaming:1
I0628 04:05:47.673631 70 custom_backend.cc:201] Creating instance jarvis-asr-feature-extractor-streaming_0_0_gpu0 on GPU 0 (7.5) using libtriton_jarvis_asr_features.so
I0628 04:05:47.773488 70 model_repository_manager.cc:787] loading: jarvis-trt-jarvis-asr-am-streaming:1
I0628 04:05:47.773715 70 custom_backend.cc:198] Creating instance jarvis-asr-voice-activity-detector-ctc-streaming_0_0_cpu on CPU using libtriton_jarvis_asr_vad.so
I0628 04:05:47.852341 70 model_repository_manager.cc:960] successfully loaded 'jarvis-asr-voice-activity-detector-ctc-streaming' version 1
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
E0628 04:06:02.545757 70 logging.cc:43] INVALID_CONFIG: The engine plan file is not compatible with this version of TensorRT, expecting library version 7.2.2 got 7.2.1, please rebuild.
E0628 04:06:02.545803 70 logging.cc:43] engine.cpp (1646) - Serialization Error in deserialize: 0 (Core engine deserialization failure)
E0628 04:06:02.553355 70 logging.cc:43] INVALID_STATE: std::exception
E0628 04:06:02.553397 70 logging.cc:43] INVALID_CONFIG: Deserialize the cuda engine failed.
E0628 04:06:02.554710 70 model_repository_manager.cc:963] failed to load 'jarvis-trt-jarvis-asr-am-streaming' version 1: Internal: unable to create TensorRT engine
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0628 04:06:03.232984 70 model_repository_manager.cc:960] successfully loaded 'jarvis-asr-feature-extractor-streaming' version 1
E0628 04:06:03.233056 70 model_repository_manager.cc:1160] Invalid argument: ensemble 'jarvis-asr' depends on 'jarvis-trt-jarvis-asr-am-streaming' which has no loaded version
I0628 04:06:03.233142 70 server.cc:495] 
+-------------+-----------------------------------------------------------------+------+
| Backend     | Config                                                          | Path |
+-------------+-----------------------------------------------------------------+------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}   |
+-------------+-----------------------------------------------------------------+------+

I0628 04:06:03.233204 70 server.cc:538] 
+--------------------------------------------------+---------+---------------------------------------------------------+
| Model                                            | Version | Status                                                  |
+--------------------------------------------------+---------+---------------------------------------------------------+
| jarvis-asr                                       | -       | Not loaded: No model version was found                  |
| jarvis-asr-ctc-decoder-cpu-streaming             | 1       | READY                                                   |
| jarvis-asr-feature-extractor-streaming           | 1       | READY                                                   |
| jarvis-asr-voice-activity-detector-ctc-streaming | 1       | READY                                                   |
| jarvis-trt-jarvis-asr-am-streaming               | 1       | UNAVAILABLE: Internal: unable to create TensorRT engine |
+--------------------------------------------------+---------+---------------------------------------------------------+

I0628 04:06:03.233297 70 tritonserver.cc:1642] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                              |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                             |
| server_version                   | 2.7.0                                                                                                                                              |
| server_extensions                | classification sequence model_repository schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | /data/models                                                                                                                                       |
| model_control_mode               | MODE_NONE                                                                                                                                          |
| strict_model_config              | 1                                                                                                                                                  |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                          |
| cuda_memory_pool_byte_size{0}    | 1000000000                                                                                                                                         |
| min_supported_compute_capability | 6.0                                                                                                                                                |
| strict_readiness                 | 1                                                                                                                                                  |
| exit_timeout                     | 30                                                                                                                                                 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+

I0628 04:06:03.233308 70 server.cc:220] Waiting for in-flight requests to complete.
I0628 04:06:03.233315 70 model_repository_manager.cc:820] unloading: jarvis-asr-voice-activity-detector-ctc-streaming:1
I0628 04:06:03.233368 70 model_repository_manager.cc:820] unloading: jarvis-asr-feature-extractor-streaming:1
I0628 04:06:03.233556 70 model_repository_manager.cc:820] unloading: jarvis-asr-ctc-decoder-cpu-streaming:1
I0628 04:06:03.233722 70 server.cc:235] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0628 04:06:03.439320 70 model_repository_manager.cc:943] successfully unloaded 'jarvis-asr-voice-activity-detector-ctc-streaming' version 1
I0628 04:06:03.439458 70 model_repository_manager.cc:943] successfully unloaded 'jarvis-asr-ctc-decoder-cpu-streaming' version 1
I0628 04:06:03.445044 70 model_repository_manager.cc:943] successfully unloaded 'jarvis-asr-feature-extractor-streaming' version 1
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0628 04:06:04.233848 70 server.cc:235] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Triton server died before reaching ready state. Terminating Jarvis startup.
Check Triton logs with: docker logs 
/opt/jarvis/bin/start-jarvis: line 1: kill: (70) - No such process

Hi,
We recommend raising this query in the Triton forum for better assistance.

Thanks!

Hi @namanveer2000

Could you please try running jarvis_clean.sh and then jarvis_init.sh?
It looks to me like you may have downloaded a new version and are trying to install it on top of an old one, because of this error:
[E] [TRT] INVALID_CONFIG: The engine plan file is not compatible with this version of TensorRT, expecting library version 7.2.2 got 7.2.1, please rebuild.
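The suggested reset can be sketched as a script. The three script names come from the Jarvis quickstart mentioned in this thread; QUICKSTART_DIR and the existence guard are our additions, so adjust them to your layout:

```shell
#!/bin/sh
# Re-deploy from scratch: jarvis_clean.sh removes the previously generated
# model repository (built against the old TensorRT), jarvis_init.sh
# rebuilds the engines with the TensorRT shipped in the current release,
# and jarvis_start.sh brings the server back up.
QUICKSTART_DIR=${QUICKSTART_DIR:-.}
for script in jarvis_clean.sh jarvis_init.sh jarvis_start.sh; do
  if [ -f "$QUICKSTART_DIR/$script" ]; then
    bash "$QUICKSTART_DIR/$script" || break   # stop and inspect output on failure
  else
    echo "$script not found in $QUICKSTART_DIR"
  fi
done
```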

Thanks

Hi @SunilJB. We tried this approach initially, but the models created by jarvis_init.sh gave us improper, gibberish output. Because of this, we used a different method to convert from NeMo to Jarvis.

We converted the model using the 1.0.0-b.3 servicemaker instead of the 1.0.0-b.2 servicemaker, which resolved the TensorRT version error.
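The underlying rule here is that a serialized engine plan only deserializes under the exact TensorRT version that built it, so the servicemaker that builds the model must ship the same TensorRT as the server that loads it. A trivial sketch of that check (the version values are inferred from this thread, not read from the images; verify them against your own containers):

```shell
#!/bin/sh
# Compare the TensorRT version used to build the engine against the one
# in the server. Values below are inferred from this thread: the server
# expects 7.2.2, and the 1.0.0-b.3 servicemaker builds with 7.2.2.
builder_trt="7.2.2"   # assumption: TensorRT in the 1.0.0-b.3 servicemaker
server_trt="7.2.2"    # from the log: "expecting library version 7.2.2"
if [ "$builder_trt" = "$server_trt" ]; then
  echo "versions match: engines built here will deserialize in the server"
else
  echo "mismatch: rebuild with the servicemaker that matches the server"
fi
```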