Triton server died before reaching ready state. Terminating Jarvis startup

Hi, I want to set up the Jarvis server with jarvis_init.sh, but I am facing the following problem:
Triton server died before reaching ready state. Terminating Jarvis startup.

I have tried ignoring this issue and running jarvis_start.sh, but it just loops on "Waiting for Jarvis server to load all models...retrying in 10 seconds" and ultimately prints:

Health ready check failed.
Check Jarvis logs with: docker logs jarvis-speech

I am not sure what I am doing wrong.

I did not make any changes to the config.sh file.
Below is the output of docker logs jarvis-speech:

==========================
== Jarvis Speech Skills ==
==========================

NVIDIA Release 21.05 (build 23684531)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use 'nvidia-docker run' to start this container; see
   https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0609 23:03:48.230563 54 pinned_memory_manager.cc:206] Pinned memory pool is created at '0x2035a0000' with size 268435456
I0609 23:03:48.230645 54 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 1000000000
E0609 23:03:48.236969 54 model_repository_manager.cc:1946] Poll failed for model directory 'jarvis-trt-jarvis_qa-nn-bert-base-uncased': failed to open text file for read /data/models/jarvis-trt-jarvis_qa-nn-bert-base-uncased/config.pbtxt: No such file or directory
I0609 23:03:48.237594 54 model_repository_manager.cc:1066] loading: jarvis_qa_preprocessor:1
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0609 23:03:48.338706 54 custom_backend.cc:198] Creating instance jarvis_qa_preprocessor_0_0_cpu on CPU using libtriton_jarvis_nlp_tokenizer.so
I0609 23:03:48.379554 54 model_repository_manager.cc:1240] successfully loaded 'jarvis_qa_preprocessor' version 1
I0609 23:03:48.379708 54 server.cc:504]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0609 23:03:48.379732 54 server.cc:543]
+----------+------------+--------+
| Backend  | Path       | Config |
+----------+------------+--------+
| tensorrt | <built-in> | {}     |
+----------+------------+--------+

I0609 23:03:48.379790 54 server.cc:586]
+------------------------+---------+--------+
| Model                  | Version | Status |
+------------------------+---------+--------+
| jarvis_qa_preprocessor | 1       | READY  |
+------------------------+---------+--------+

I0609 23:03:48.379897 54 tritonserver.cc:1658]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                 |
| server_version                   | 2.9.0                                                                                                                                                                                  |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | /data/models                                                                                                                                                                           |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
| cuda_memory_pool_byte_size{0}    | 1000000000                                                                                                                                                                             |
| min_supported_compute_capability | 6.0                                                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                                                     |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 23:03:48.379930 54 server.cc:234] Waiting for in-flight requests to complete.
I0609 23:03:48.379936 54 model_repository_manager.cc:1099] unloading: jarvis_qa_preprocessor:1
I0609 23:03:48.380021 54 server.cc:249] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0609 23:03:48.381494 54 model_repository_manager.cc:1223] successfully unloaded 'jarvis_qa_preprocessor' version 1
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0609 23:03:49.380324 54 server.cc:249] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
  > Triton server died before reaching ready state. Terminating Jarvis startup.
Check Triton logs with: docker logs
/opt/jarvis/bin/start-jarvis: line 1: kill: (54) - No such process

Not sure whether this is helpful or not, but below is the output of my nvidia-smi

Thu Jun 10 07:48:39 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.28       Driver Version: 470.76       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA Quadro R...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   49C    P8     8W /  N/A |    164MiB /  6144MiB |    ERR!      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I am using WSL2 with Ubuntu 18.04 on a Windows 10 machine with an NVIDIA Quadro RTX 3000 GPU.
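
The banner at the top of the container log above reports that the NVIDIA driver was not detected inside the container and that the shared-memory limit is at its 64MB default, even though nvidia-smi works on the host. A minimal sketch for flagging those two banner messages in a saved log follows. The function name `find_banner_issues` and the keys in `BANNER_PATTERNS` are hypothetical; the matched strings are copied from the log in this post.

```python
import re

# Banner messages as they appear in the container log in this post.
# The dictionary keys are illustrative labels, not NVIDIA terminology.
BANNER_PATTERNS = {
    "driver_not_detected": re.compile(
        r"WARNING: The NVIDIA Driver was not detected"
    ),
    "default_shmem_limit": re.compile(
        r"NOTE: The SHMEM allocation limit is set to the default"
    ),
}


def find_banner_issues(log_text):
    """Return the sorted labels of banner messages found in log_text."""
    return sorted(
        label
        for label, pattern in BANNER_PATTERNS.items()
        if pattern.search(log_text)
    )
```

One possible way to use it is to save the output of docker logs jarvis-speech to a file, read the file into a string, and pass that string to `find_banner_issues`. An empty list means neither message was found.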

Thank you!

Could you please run the jarvis_clean.sh script and then start afresh?
If the issue persists, please share the latest log, command output, and system details so we can help better.

Thanks

Hi there. Has this been fixed by any chance? I am trying to get up to speed with the latest version (Riva v1.4.0 beta) and I am getting exactly the same symptoms. I have cleaned and started from scratch several times, but no luck yet. Any hints would be greatly appreciated. Thanks!

Here is the log from docker logs riva-speech:


==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release 21.07 (build 25292380)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

  > Riva waiting for Triton server to load all models...retrying in 1 second
I0808 23:14:43.261451 75 metrics.cc:228] Collecting metrics for GPU 0: NVIDIA GeForce GTX 1050 Ti with Max-Q Design
I0808 23:14:43.284651 75 onnxruntime.cc:1722] TRITONBACKEND_Initialize: onnxruntime
I0808 23:14:43.284836 75 onnxruntime.cc:1732] Triton TRITONBACKEND API version: 1.0
I0808 23:14:43.284855 75 onnxruntime.cc:1738] 'onnxruntime' TRITONBACKEND API version: 1.0
I0808 23:14:43.409988 75 pinned_memory_manager.cc:206] Pinned memory pool is created at '0x7fc97e000000' with size 268435456
I0808 23:14:43.410781 75 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 1000000000
E0808 23:14:43.418776 75 model_repository_manager.cc:1946] Poll failed for model directory 'riva-trt-citrinet-1024': failed to open text file for read /data/models/riva-trt-citrinet-1024/config.pbtxt: No such file or directory
E0808 23:14:43.418826 75 model_repository_manager.cc:1946] Poll failed for model directory 'riva-trt-riva_punctuation-nn-bert-base-uncased': failed to open text file for read /data/models/riva-trt-riva_punctuation-nn-bert-base-uncased/config.pbtxt: No such file or directory
E0808 23:14:43.419733 75 model_repository_manager.cc:1946] Poll failed for model directory 'riva_punctuation_label_tokens_cap': failed to open text file for read /data/models/riva_punctuation_label_tokens_cap/config.pbtxt: No such file or directory
E0808 23:14:43.419778 75 model_repository_manager.cc:1946] Poll failed for model directory 'riva_punctuation_label_tokens_punct': failed to open text file for read /data/models/riva_punctuation_label_tokens_punct/config.pbtxt: No such file or directory
E0808 23:14:43.420574 75 model_repository_manager.cc:1431] Invalid argument: ensemble citrinet-1024-asr-trt-ensemble-vad-streaming contains models that are not available: riva-trt-citrinet-1024
E0808 23:14:43.420583 75 model_repository_manager.cc:1431] Invalid argument: ensemble riva_punctuation contains models that are not available: riva-trt-riva_punctuation-nn-bert-base-uncased, riva_punctuation_label_tokens_punct, riva_punctuation_label_tokens_cap
I0808 23:14:43.420654 75 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming:1
I0808 23:14:43.522432 75 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming:1
I0808 23:14:43.523450 75 custom_backend.cc:201] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming_0_0_gpu0 on GPU 0 (6.1) using libtriton_riva_asr_features.so
I0808 23:14:43.623112 75 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline:1
I0808 23:14:43.623836 75 custom_backend.cc:198] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming_0_0_cpu on CPU using libtriton_riva_asr_decoder_cpu.so
W:parameter_parser.cc:106: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:107: Default value will be used
W:parameter_parser.cc:106: Parameter voc_string could not be set from parameters
W:parameter_parser.cc:107: Default value will be used
I0808 23:14:43.723376 75 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline:1
I0808 23:14:43.723613 75 custom_backend.cc:198] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline_0_0_cpu on CPU using libtriton_riva_asr_decoder_cpu.so
W:parameter_parser.cc:106: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:107: Default value will be used
W:parameter_parser.cc:106: Parameter voc_string could not be set from parameters
W:parameter_parser.cc:107: Default value will be used
I0808 23:14:43.823623 75 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline:1
I0808 23:14:43.823951 75 custom_backend.cc:201] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline_0_0_gpu0 on GPU 0 (6.1) using libtriton_riva_asr_features.so
I0808 23:14:43.923867 75 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming:1
I0808 23:14:43.924071 75 custom_backend.cc:198] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline_0_0_cpu on CPU using libtriton_riva_asr_vad.so
I0808 23:14:44.000512 75 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline' version 1
I0808 23:14:44.024210 75 model_repository_manager.cc:1066] loading: riva_detokenize:1
I0808 23:14:44.024621 75 custom_backend.cc:198] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming_0_0_cpu on CPU using libtriton_riva_asr_vad.so
I0808 23:14:44.095156 75 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming' version 1
I0808 23:14:44.124494 75 model_repository_manager.cc:1066] loading: riva_punctuation_gen_output:1
I0808 23:14:44.124678 75 custom_backend.cc:198] Creating instance riva_detokenize_0_0_cpu on CPU using libtriton_riva_nlp_detokenizer.so
I0808 23:14:44.128839 75 model_repository_manager.cc:1240] successfully loaded 'riva_detokenize' version 1
cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileriva/cbe/common/cu-matrix.cc line 101'
cudaError_t 1 : "invalid argument" returned from 'cudaMemset2DAsync( data_, stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)' in fileriva/cbe/common/cu-matrix.cc line 125'
  > Riva waiting for Triton server to load all models...retrying in 1 second
I0808 23:14:44.224773 75 model_repository_manager.cc:1066] loading: riva_punctuation_merge_labels:1
I0808 23:14:44.224929 75 custom_backend.cc:198] Creating instance riva_punctuation_gen_output_0_0_cpu on CPU using libtriton_riva_nlp_punctuation.so
I0808 23:14:44.228038 75 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming' version 1
I0808 23:14:44.228130 75 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline' version 1
I0808 23:14:44.228343 75 model_repository_manager.cc:1240] successfully loaded 'riva_punctuation_gen_output' version 1
I0808 23:14:44.325025 75 model_repository_manager.cc:1066] loading: riva_tokenizer:1
I0808 23:14:44.325247 75 custom_backend.cc:198] Creating instance riva_punctuation_merge_labels_0_0_cpu on CPU using libtriton_riva_nlp_labels.so
I0808 23:14:44.329531 75 model_repository_manager.cc:1240] successfully loaded 'riva_punctuation_merge_labels' version 1
I0808 23:14:44.425508 75 custom_backend.cc:198] Creating instance riva_tokenizer_0_0_cpu on CPU using libtriton_riva_nlp_tokenizer.so
I0808 23:14:44.448542 75 model_repository_manager.cc:1240] successfully loaded 'riva_tokenizer' version 1
I0808 23:14:44.578084 75 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming' version 1
I0808 23:14:44.653141 75 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline' version 1
I0808 23:14:44.653243 75 server.cc:504] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0808 23:14:44.653272 75 server.cc:543] 
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt    | <built-in>                                                      | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
+-------------+-----------------------------------------------------------------+--------+

I0808 23:14:44.653352 75 server.cc:586] 
+----------------------------------------------------------------------------------------------------+---------+--------+
| Model                                                                                              | Version | Status |
+----------------------------------------------------------------------------------------------------+---------+--------+
| citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming                             | 1       | READY  |
| citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming                           | 1       | READY  |
| citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline             | 1       | READY  |
| citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline           | 1       | READY  |
| citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline | 1       | READY  |
| citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming                 | 1       | READY  |
| riva_detokenize                                                                                    | 1       | READY  |
| riva_punctuation_gen_output                                                                        | 1       | READY  |
| riva_punctuation_merge_labels                                                                      | 1       | READY  |
| riva_tokenizer                                                                                     | 1       | READY  |
+----------------------------------------------------------------------------------------------------+---------+--------+

I0808 23:14:44.653480 75 tritonserver.cc:1658] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                 |
| server_version                   | 2.9.0                                                                                                                                                                                  |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | /data/models                                                                                                                                                                           |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
| cuda_memory_pool_byte_size{0}    | 1000000000                                                                                                                                                                             |
| min_supported_compute_capability | 6.0                                                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                                                     |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0808 23:14:44.653486 75 server.cc:234] Waiting for in-flight requests to complete.
I0808 23:14:44.653490 75 model_repository_manager.cc:1099] unloading: riva_tokenizer:1
I0808 23:14:44.653541 75 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming:1
I0808 23:14:44.653625 75 model_repository_manager.cc:1099] unloading: riva_punctuation_gen_output:1
I0808 23:14:44.653812 75 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline:1
I0808 23:14:44.653988 75 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline:1
I0808 23:14:44.654041 75 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming:1
I0808 23:14:44.654155 75 model_repository_manager.cc:1099] unloading: riva_punctuation_merge_labels:1
I0808 23:14:44.654207 75 model_repository_manager.cc:1223] successfully unloaded 'riva_punctuation_gen_output' version 1
I0808 23:14:44.654333 75 model_repository_manager.cc:1099] unloading: riva_detokenize:1
I0808 23:14:44.654709 75 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline:1
I0808 23:14:44.654768 75 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming:1
I0808 23:14:44.654846 75 server.cc:249] Timeout 30: Found 9 live models and 0 in-flight non-inference requests
I0808 23:14:44.655028 75 model_repository_manager.cc:1223] successfully unloaded 'riva_punctuation_merge_labels' version 1
I0808 23:14:44.655277 75 model_repository_manager.cc:1223] successfully unloaded 'riva_detokenize' version 1
I0808 23:14:44.658378 75 model_repository_manager.cc:1223] successfully unloaded 'riva_tokenizer' version 1
I0808 23:14:44.666682 75 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline' version 1
I0808 23:14:44.668366 75 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming' version 1
I0808 23:14:44.683012 75 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming' version 1
I0808 23:14:44.722378 75 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline' version 1
I0808 23:14:44.892547 75 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline' version 1
I0808 23:14:44.902566 75 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming' version 1
  > Riva waiting for Triton server to load all models...retrying in 1 second
W0808 23:14:45.263508 75 metrics.cc:292] failed to get power limit for GPU 0: Not Supported
W0808 23:14:45.263589 75 metrics.cc:307] failed to get power usage for GPU 0: Not Supported
W0808 23:14:45.263600 75 metrics.cc:329] failed to get energy consumption for GPU 0: Not Supported
I0808 23:14:45.655723 75 server.cc:249] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
  > Riva waiting for Triton server to load all models...retrying in 1 second
W0808 23:14:47.264069 75 metrics.cc:292] failed to get power limit for GPU 0: Not Supported
W0808 23:14:47.264187 75 metrics.cc:307] failed to get power usage for GPU 0: Not Supported
W0808 23:14:47.264210 75 metrics.cc:329] failed to get energy consumption for GPU 0: Not Supported
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Triton server died before reaching ready state. Terminating Riva startup.
Check Triton logs with: docker logs 
/opt/riva/bin/start-riva: line 1: kill: (75) - No such process
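
In a log like the one above, the lines that explain "failed to load all models" are the "Poll failed for model directory" errors (a model directory without a config.pbtxt) and the "ensemble ... contains models that are not available" errors that follow from them. The following is a minimal sketch for pulling those out of a saved log. The function name `summarize_model_errors` is hypothetical, and the regular expressions assume the exact message wording seen in this thread (Triton 2.9.0).

```python
import re

# Patterns based on the error wording in the logs in this thread.
POLL_FAILED = re.compile(r"Poll failed for model directory '([^']+)'")
ENSEMBLE_MISSING = re.compile(
    r"ensemble (\S+) contains models that are not available: (.+)$"
)


def summarize_model_errors(log_text):
    """Return (missing_dirs, broken_ensembles) found in a Triton log.

    missing_dirs: model directories for which polling failed, in log order.
    broken_ensembles: ensemble name -> list of unavailable member models.
    """
    missing_dirs = []
    broken_ensembles = {}
    for line in log_text.splitlines():
        match = POLL_FAILED.search(line)
        if match:
            missing_dirs.append(match.group(1))
            continue
        match = ENSEMBLE_MISSING.search(line)
        if match:
            members = [name.strip() for name in match.group(2).split(",")]
            broken_ensembles[match.group(1)] = members
    return missing_dirs, broken_ensembles
```

Every directory reported in `missing_dirs` is one whose config.pbtxt Triton could not open under /data/models, so comparing that list between runs shows whether a clean re-initialization changed anything.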

I think the issue might be due to the GPU memory requirement not being met:
https://docs.nvidia.com/deeplearning/jarvis/user-guide/docs/support-matrix.html#hardware
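
To compare a machine against that requirement, the free GPU memory can be read from nvidia-smi before starting the server. The sketch below assumes nvidia-smi is on the PATH and that the memory fields come back as plain numbers. The helper names and the `required_mib` parameter are hypothetical, and the actual threshold should be taken from the support matrix linked above rather than from this example.

```python
import subprocess

# nvidia-smi query that prints one CSV row per GPU: name, total MiB, used MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse the CSV rows produced by QUERY into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma in the GPU name cannot shift fields.
        name, total, used = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append(
            {
                "name": name,
                "total_mib": int(total),
                "free_mib": int(total) - int(used),
            }
        )
    return gpus


def gpus_below(gpus, required_mib):
    """Return names of GPUs with less free memory than required_mib."""
    return [gpu["name"] for gpu in gpus if gpu["free_mib"] < required_mib]


def query_gpus():
    """Run nvidia-smi and return the parsed per-GPU memory figures."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling `gpus_below(query_gpus(), required_mib)` with the figure from the support matrix returns an empty list when every GPU has enough free memory.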

Thanks

Hi Sunil,

Thank you for the prompt response. I have tried the same on different hardware which meets the requirements (at least I think so - please correct me if I am wrong). Here’s my output from nvidia-smi:

Mon Aug  9 08:25:57 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.03   Driver Version: 450.119.03   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   35C    P0    25W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The cudaMalloc error has disappeared now, which is good. However, the server still fails to start. I get an otherwise very similar log:

==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release 21.07 (build 25292380)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: Legacy NVIDIA Driver detected.  Compatibility mode ENABLED.

  > Riva waiting for Triton server to load all models...retrying in 1 second
I0809 08:19:09.608106 73 metrics.cc:228] Collecting metrics for GPU 0: Tesla V100-SXM2-16GB
I0809 08:19:09.611788 73 onnxruntime.cc:1722] TRITONBACKEND_Initialize: onnxruntime
I0809 08:19:09.611823 73 onnxruntime.cc:1732] Triton TRITONBACKEND API version: 1.0
I0809 08:19:09.611835 73 onnxruntime.cc:1738] 'onnxruntime' TRITONBACKEND API version: 1.0
I0809 08:19:09.818898 73 pinned_memory_manager.cc:206] Pinned memory pool is created at '0x7f8318000000' with size 268435456
I0809 08:19:09.819359 73 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 1000000000
E0809 08:19:09.826319 73 model_repository_manager.cc:1946] Poll failed for model directory 'riva-trt-citrinet-1024': failed to open text file for read /data/models/riva-trt-citrinet-1024/config.pbtxt: No such file or directory
E0809 08:19:09.830150 73 model_repository_manager.cc:1431] Invalid argument: ensemble citrinet-1024-asr-trt-ensemble-vad-streaming contains models that are not available: riva-trt-citrinet-1024
I0809 08:19:09.830277 73 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming:1
I0809 08:19:09.930657 73 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming:1
I0809 08:19:09.931107 73 custom_backend.cc:201] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming_0_0_gpu0 on GPU 0 (7.0) using libtriton_riva_asr_features.so
I0809 08:19:10.031024 73 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline:1
I0809 08:19:10.031357 73 custom_backend.cc:198] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming_0_0_cpu on CPU using libtriton_riva_asr_decoder_cpu.so
W:parameter_parser.cc:106: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:107: Default value will be used
W:parameter_parser.cc:106: Parameter voc_string could not be set from parameters
W:parameter_parser.cc:107: Default value will be used
I0809 08:19:10.131479 73 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline:1
I0809 08:19:10.131765 73 custom_backend.cc:198] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline_0_0_cpu on CPU using libtriton_riva_asr_decoder_cpu.so
W:parameter_parser.cc:106: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:107: Default value will be used
W:parameter_parser.cc:106: Parameter voc_string could not be set from parameters
W:parameter_parser.cc:107: Default value will be used
I0809 08:19:10.232144 73 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline:1
I0809 08:19:10.232748 73 custom_backend.cc:201] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline_0_0_gpu0 on GPU 0 (7.0) using libtriton_riva_asr_features.so
I0809 08:19:10.332875 73 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming:1
I0809 08:19:10.333165 73 custom_backend.cc:198] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline_0_0_cpu on CPU using libtriton_riva_asr_vad.so
I0809 08:19:10.433631 73 model_repository_manager.cc:1066] loading: riva-trt-riva_punctuation-nn-bert-base-uncased:1
I0809 08:19:10.434435 73 custom_backend.cc:198] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming_0_0_cpu on CPU using libtriton_riva_asr_vad.so
I0809 08:19:10.535025 73 model_repository_manager.cc:1066] loading: riva_detokenize:1
  > Riva waiting for Triton server to load all models...retrying in 1 second
I0809 08:19:10.636381 73 model_repository_manager.cc:1066] loading: riva_punctuation_gen_output:1
I0809 08:19:10.637236 73 custom_backend.cc:198] Creating instance riva_detokenize_0_0_cpu on CPU using libtriton_riva_nlp_detokenizer.so
I0809 08:19:10.640930 73 model_repository_manager.cc:1240] successfully loaded 'riva_detokenize' version 1
I0809 08:19:10.736893 73 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline' version 1
I0809 08:19:10.737141 73 model_repository_manager.cc:1066] loading: riva_punctuation_label_tokens_cap:1
I0809 08:19:10.737592 73 custom_backend.cc:198] Creating instance riva_punctuation_gen_output_0_0_cpu on CPU using libtriton_riva_nlp_punctuation.so
I0809 08:19:10.740658 73 model_repository_manager.cc:1240] successfully loaded 'riva_punctuation_gen_output' version 1
I0809 08:19:10.837529 73 model_repository_manager.cc:1066] loading: riva_punctuation_label_tokens_punct:1
I0809 08:19:10.837810 73 custom_backend.cc:198] Creating instance riva_punctuation_label_tokens_cap_0_0_cpu on CPU using libtriton_riva_nlp_seqlabel.so
I0809 08:19:10.841017 73 model_repository_manager.cc:1240] successfully loaded 'riva_punctuation_label_tokens_cap' version 1
I0809 08:19:10.843310 73 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming' version 1
I0809 08:19:10.938104 73 model_repository_manager.cc:1066] loading: riva_punctuation_merge_labels:1
I0809 08:19:10.938516 73 custom_backend.cc:198] Creating instance riva_punctuation_label_tokens_punct_0_0_cpu on CPU using libtriton_riva_nlp_seqlabel.so
I0809 08:19:10.939538 73 model_repository_manager.cc:1240] successfully loaded 'riva_punctuation_label_tokens_punct' version 1
I0809 08:19:11.038768 73 model_repository_manager.cc:1066] loading: riva_tokenizer:1
I0809 08:19:11.039165 73 custom_backend.cc:198] Creating instance riva_punctuation_merge_labels_0_0_cpu on CPU using libtriton_riva_nlp_labels.so
I0809 08:19:11.041352 73 model_repository_manager.cc:1240] successfully loaded 'riva_punctuation_merge_labels' version 1
I0809 08:19:11.140001 73 custom_backend.cc:198] Creating instance riva_tokenizer_0_0_cpu on CPU using libtriton_riva_nlp_tokenizer.so
I0809 08:19:11.198410 73 model_repository_manager.cc:1240] successfully loaded 'riva_tokenizer' version 1
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
I0809 08:19:12.592620 73 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming' version 1
I0809 08:19:12.650523 73 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline' version 1
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
I0809 08:19:21.639735 73 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming' version 1
I0809 08:19:21.639768 73 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline' version 1
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
I0809 08:19:27.805589 73 plan_backend.cc:384] Creating instance riva-trt-riva_punctuation-nn-bert-base-uncased_0_0_gpu0 on GPU 0 (7.0) using model.plan
  > Riva waiting for Triton server to load all models...retrying in 1 second
I0809 08:19:29.404662 73 plan_backend.cc:768] Created instance riva-trt-riva_punctuation-nn-bert-base-uncased_0_0_gpu0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0809 08:19:29.412361 73 model_repository_manager.cc:1240] successfully loaded 'riva-trt-riva_punctuation-nn-bert-base-uncased' version 1
I0809 08:19:29.412983 73 model_repository_manager.cc:1066] loading: riva_punctuation:1
I0809 08:19:29.513455 73 model_repository_manager.cc:1240] successfully loaded 'riva_punctuation' version 1
I0809 08:19:29.513575 73 server.cc:504] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0809 08:19:29.513621 73 server.cc:543] 
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt    | <built-in>                                                      | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
+-------------+-----------------------------------------------------------------+--------+

I0809 08:19:29.513726 73 server.cc:586] 
+----------------------------------------------------------------------------------------------------+---------+--------+
| Model                                                                                              | Version | Status |
+----------------------------------------------------------------------------------------------------+---------+--------+
| citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming                             | 1       | READY  |
| citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming                           | 1       | READY  |
| citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline             | 1       | READY  |
| citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline           | 1       | READY  |
| citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline | 1       | READY  |
| citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming                 | 1       | READY  |
| riva-trt-riva_punctuation-nn-bert-base-uncased                                                     | 1       | READY  |
| riva_detokenize                                                                                    | 1       | READY  |
| riva_punctuation                                                                                   | 1       | READY  |
| riva_punctuation_gen_output                                                                        | 1       | READY  |
| riva_punctuation_label_tokens_cap                                                                  | 1       | READY  |
| riva_punctuation_label_tokens_punct                                                                | 1       | READY  |
| riva_punctuation_merge_labels                                                                      | 1       | READY  |
| riva_tokenizer                                                                                     | 1       | READY  |
+----------------------------------------------------------------------------------------------------+---------+--------+

I0809 08:19:29.513835 73 tritonserver.cc:1658] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                 |
| server_version                   | 2.9.0                                                                                                                                                                                  |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | /data/models                                                                                                                                                                           |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
| cuda_memory_pool_byte_size{0}    | 1000000000                                                                                                                                                                             |
| min_supported_compute_capability | 6.0                                                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                                                     |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0809 08:19:29.513853 73 server.cc:234] Waiting for in-flight requests to complete.
I0809 08:19:29.513860 73 model_repository_manager.cc:1099] unloading: riva_tokenizer:1
I0809 08:19:29.513899 73 model_repository_manager.cc:1099] unloading: riva_punctuation_merge_labels:1
I0809 08:19:29.514027 73 model_repository_manager.cc:1099] unloading: riva_punctuation_label_tokens_punct:1
I0809 08:19:29.514185 73 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming:1
I0809 08:19:29.514343 73 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming:1
I0809 08:19:29.514559 73 model_repository_manager.cc:1223] successfully unloaded 'riva_punctuation_label_tokens_punct' version 1
I0809 08:19:29.514560 73 model_repository_manager.cc:1223] successfully unloaded 'riva_punctuation_merge_labels' version 1
I0809 08:19:29.514635 73 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline:1
I0809 08:19:29.515101 73 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline:1
I0809 08:19:29.515238 73 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline:1
I0809 08:19:29.515417 73 model_repository_manager.cc:1099] unloading: riva_punctuation:1
I0809 08:19:29.515879 73 model_repository_manager.cc:1099] unloading: riva_punctuation_gen_output:1
I0809 08:19:29.515950 73 model_repository_manager.cc:1223] successfully unloaded 'riva_punctuation' version 1
I0809 08:19:29.516100 73 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming:1
I0809 08:19:29.516253 73 model_repository_manager.cc:1099] unloading: riva-trt-riva_punctuation-nn-bert-base-uncased:1
I0809 08:19:29.516363 73 model_repository_manager.cc:1099] unloading: riva_detokenize:1
I0809 08:19:29.516471 73 model_repository_manager.cc:1099] unloading: riva_punctuation_label_tokens_cap:1
I0809 08:19:29.516622 73 server.cc:249] Timeout 30: Found 11 live models and 0 in-flight non-inference requests
I0809 08:19:29.518555 73 model_repository_manager.cc:1223] successfully unloaded 'riva_punctuation_label_tokens_cap' version 1
I0809 08:19:29.519226 73 model_repository_manager.cc:1223] successfully unloaded 'riva_tokenizer' version 1
I0809 08:19:29.519473 73 model_repository_manager.cc:1223] successfully unloaded 'riva_punctuation_gen_output' version 1
I0809 08:19:29.522928 73 model_repository_manager.cc:1223] successfully unloaded 'riva_detokenize' version 1
I0809 08:19:29.526356 73 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline' version 1
I0809 08:19:29.531945 73 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming' version 1
I0809 08:19:29.540759 73 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming' version 1
I0809 08:19:29.571905 73 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline' version 1
I0809 08:19:29.591141 73 model_repository_manager.cc:1223] successfully unloaded 'riva-trt-riva_punctuation-nn-bert-base-uncased' version 1
  > Riva waiting for Triton server to load all models...retrying in 1 second
I0809 08:19:29.844704 73 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming' version 1
I0809 08:19:29.849695 73 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline' version 1
I0809 08:19:30.516863 73 server.cc:249] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Triton server died before reaching ready state. Terminating Riva startup.
Check Triton logs with: docker logs 
/opt/riva/bin/start-riva: line 1: kill: (73) - No such process

It appears that all the models load successfully at first, but then they start unloading and the process terminates.

I have tried this several times, and I can see pretty consistently that the unloading starts exactly 20 seconds after I start the execution of riva_start.sh. I wonder if there is a timeout setting somewhere that I could possibly increase or override? Or maybe you can spot something else in the fresh logs?

Thanks in advance,
Yannis

EDIT (again): I have also updated the NVIDIA drivers from version 450 to the latest (470), but still no luck.
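
For anyone who wants to check the timing observation above against their own output, here is a minimal sketch in Python. It assumes the log lines follow the format seen in the excerpt (a level letter, MMDD, a timestamp, a thread id, then `file:line]` and the message). The names `parse_line` and `summarize` are hypothetical helpers, not part of Riva or Triton. The script only measures the gap between the first `loading:` and the first `unloading:` line and collects error lines; it does not tell you why Triton gave up.

```python
import re
from datetime import datetime

# Line prefix as seen in the excerpt: level, MMDD, HH:MM:SS.micros, thread id, file:line]
LINE_RE = re.compile(
    r"^(?P<level>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) +\d+ "
    r"(?P<src>[\w.]+:\d+)\]\s?(?P<msg>.*)$"
)


def parse_line(line):
    """Return (level, timestamp, message), or None for lines in another format."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    # The log carries no year, so a fixed placeholder year is used for parsing.
    ts = datetime.strptime("2000" + m["mmdd"] + " " + m["time"], "%Y%m%d %H:%M:%S.%f")
    return m["level"], ts, m["msg"]


def summarize(lines):
    """Collect loaded models, error lines, and the load-to-unload gap in seconds."""
    first_load = first_unload = None
    loaded, errors = set(), []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            # Plain "error: ..." lines have no timestamp prefix.
            if line.strip().startswith("error:"):
                errors.append(line.strip())
            continue
        level, ts, msg = parsed
        if level == "E":
            errors.append(msg)
        elif msg.startswith("loading: "):
            first_load = first_load or ts
        elif msg.startswith("unloading: "):
            first_unload = first_unload or ts
        elif msg.startswith("successfully loaded "):
            loaded.add(msg.split("'")[1])
    gap = None
    if first_load and first_unload:
        gap = (first_unload - first_load).total_seconds()
    return {"loaded": loaded, "errors": errors, "seconds_until_unload": gap}


if __name__ == "__main__":
    import sys

    # Usage (assumed): docker logs riva-speech 2>&1 | python summarize_log.py
    print(summarize(sys.stdin))
```

Any `E` lines or `error:` lines it reports are usually the first thing worth reading, since they appear well before the unloading starts.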

Can you try jarvis clean and then restart the process?
Also, could you please try commenting out all the NLP models except one and see if that deploys successfully on your setup?

Thanks

Hi SunilJB,

This worked - many thanks!

To recap, the solution in my case was to:

  • Use a different machine, with a supported GPU
  • Update its NVIDIA drivers to the latest version (470 did the trick), then clean and restart.

All good.
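
In case it helps others, here is a rough Python sketch for checking the driver version before starting the server. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu=name,driver_version` CSV output. The function names are hypothetical, and the 470 threshold is only what worked in this thread, not an official requirement.

```python
import subprocess


def parse_gpu_rows(csv_text):
    """Parse 'name, driver_version' rows as printed by nvidia-smi in CSV mode."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so a comma in the GPU name would not break parsing.
        name, driver = [part.strip() for part in line.rsplit(",", 1)]
        rows.append(
            {"name": name, "driver": driver, "driver_major": int(driver.split(".")[0])}
        )
    return rows


def query_gpus():
    """Run nvidia-smi and return one dict per GPU (raises if nvidia-smi fails)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_gpu_rows(out)


def drivers_at_least(rows, minimum_major=470):
    """True if at least one GPU was found and every driver meets the assumed minimum."""
    return bool(rows) and all(r["driver_major"] >= minimum_major for r in rows)


if __name__ == "__main__":
    gpus = query_gpus()
    for gpu in gpus:
        print(gpu["name"], gpu["driver"])
    print("driver OK" if drivers_at_least(gpus) else "driver older than assumed minimum")
```

This only covers the driver side; whether the GPU itself is supported still has to be checked against the support matrix for your release.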
