Jarvis Installation Issue: "Waiting for Jarvis server to load all models...retrying in 10 seconds" when running sudo bash jarvis_start.sh

Hi all, I am having an issue installing Jarvis v1.2.0-beta on Ubuntu 20.04.2 with the Jarvis Quickstart.

After running sudo bash jarvis_init.sh, it says “Jarvis initialization complete”. However, when I run sudo bash jarvis_start.sh, the Jarvis server cannot load all the models. Looking at the docker logs, I think it has something to do with jarvis-trt-waveglow failing to load: Triton reports a CUDA out-of-memory error right before marking that model UNAVAILABLE.

I have made no changes to the config.sh file.
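
In case it helps with diagnosis: the log below reports a CUDA "out of memory" error while the jarvis-trt-waveglow instance is being created, so it may be worth checking how much GPU memory is in use before starting Jarvis. Here is a minimal sketch that wraps nvidia-smi for that. The helper names parse_gpu_memory and query_gpu_memory are my own, not part of Jarvis or Triton, and it assumes nvidia-smi is on the PATH.

```python
import subprocess

# nvidia-smi query: one CSV row per GPU, values in MiB, no header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV rows into (name, total_mib, used_mib, free_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma in the GPU name would not break parsing.
        name, total, used = [field.strip() for field in line.rsplit(",", 2)]
        rows.append((name, int(total), int(used), int(total) - int(used)))
    return rows


def query_gpu_memory():
    """Run nvidia-smi (hypothetical helper; requires an NVIDIA driver) and parse its output."""
    out = subprocess.run(QUERY, check=True, capture_output=True, text=True).stdout
    return parse_gpu_memory(out)
```

Calling query_gpu_memory() just before sudo bash jarvis_start.sh would show whether other processes (for example the desktop session) are already holding part of the GPU memory.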

Here is the output of docker logs jarvis-speech:

==========================
== Jarvis Speech Skills ==
==========================

NVIDIA Release 21.05 (build 23684531)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:10.744186 75 metrics.cc:228] Collecting metrics for GPU 0: NVIDIA Quadro RTX 3000 with Max-Q Design
I0611 02:29:10.916648 75 pinned_memory_manager.cc:206] Pinned memory pool is created at '0x7f1c4e000000' with size 268435456
I0611 02:29:10.917305 75 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 1000000000
E0611 02:29:10.926068 75 model_repository_manager.cc:1946] Poll failed for model directory 'jarvis-trt-jarvis_text_classification_domain-nn-bert-base-uncased': failed to open text file for read /data/models/jarvis-trt-jarvis_text_classification_domain-nn-bert-base-uncased/config.pbtxt: No such file or directory
I0611 02:29:10.930549 75 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming:1
I0611 02:29:11.031248 75 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming:1
I0611 02:29:11.032188 75 custom_backend.cc:201] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming_0_0_gpu0 on GPU 0 (7.5) using libtriton_jarvis_asr_features.so
I0611 02:29:11.131601 75 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming:1
I0611 02:29:11.131897 75 custom_backend.cc:198] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming_0_0_cpu on CPU using libtriton_jarvis_asr_decoder_cpu.so
I0611 02:29:11.231859 75 model_repository_manager.cc:1066] loading: jarvis-trt-citrinet-1024:1
I0611 02:29:11.232112 75 custom_backend.cc:198] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming_0_0_cpu on CPU using libtriton_jarvis_asr_vad.so
I0611 02:29:11.283928 75 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming' version 1
I0611 02:29:11.332063 75 model_repository_manager.cc:1066] loading: jarvis-trt-tacotron2_encoder:1
I0611 02:29:11.432315 75 model_repository_manager.cc:1066] loading: jarvis-trt-waveglow:1
I0611 02:29:11.532585 75 model_repository_manager.cc:1066] loading: jarvis_tokenizer:1
I0611 02:29:11.632892 75 model_repository_manager.cc:1066] loading: tacotron2_decoder_postnet:1
I0611 02:29:11.633249 75 custom_backend.cc:198] Creating instance jarvis_tokenizer_0_0_cpu on CPU using libtriton_jarvis_nlp_tokenizer.so
I0611 02:29:11.696349 75 model_repository_manager.cc:1240] successfully loaded 'jarvis_tokenizer' version 1
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:11.733491 75 model_repository_manager.cc:1066] loading: tts_preprocessor:1
I0611 02:29:11.736443 75 tacotron-decoder-postnet.cc:873] TRITONBACKEND_ModelInitialize: tacotron2_decoder_postnet (version 1)
I0611 02:29:11.739230 75 tacotron-decoder-postnet.cc:767] model configuration:
{
    "name": "tacotron2_decoder_postnet",
    "platform": "",
    "backend": "jarvis_tts_taco_postnet",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 8,
    "input": [
        {
            "name": "input_decoder",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                1,
                400,
                512
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "input_processed_decoder",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                400,
                128,
                1,
                1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "input_num_characters",
            "data_type": "TYPE_INT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        }
    ],
    "output": [
        {
            "name": "spectrogram_chunk",
            "data_type": "TYPE_FP32",
            "dims": [
                1,
                80,
                80
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "z",
            "data_type": "TYPE_FP32",
            "dims": [
                8,
                2656,
                1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "num_valid_samples",
            "data_type": "TYPE_INT32",
            "dims": [
                1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "end_flag",
            "data_type": "TYPE_INT32",
            "dims": [
                1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "sequence_batching": {
        "oldest": {
            "max_candidate_sequences": 8,
            "preferred_batch_size": [
                8
            ],
            "max_queue_delay_microseconds": 100
        },
        "max_sequence_idle_microseconds": 60000000,
        "control_input": [
            {
                "name": "START",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_START",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "READY",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_READY",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "END",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_END",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "CORRID",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_CORRID",
                        "int32_false_true": [],
                        "fp32_false_true": [],
                        "data_type": "TYPE_UINT64"
                    }
                ]
            }
        ]
    },
    "instance_group": [
        {
            "name": "tacotron2_decoder_postnet_0",
            "kind": "KIND_GPU",
            "count": 1,
            "gpus": [
                0
            ],
            "profile": []
        }
    ],
    "default_model_filename": "",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {
        "num_samples_per_frame": {
            "string_value": "256"
        },
        "z_dim0": {
            "string_value": "8"
        },
        "tacotron_decoder_engine": {
            "string_value": "/data/models/tacotron2_decoder_postnet/1/model.plan"
        },
        "num_mels": {
            "string_value": "80"
        },
        "encoding_dimension": {
            "string_value": "512"
        },
        "z_dim1": {
            "string_value": "2656"
        },
        "max_execution_batch_size": {
            "string_value": "8"
        },
        "chunk_length": {
            "string_value": "80"
        },
        "max_input_length": {
            "string_value": "400"
        },
        "attention_dimension": {
            "string_value": "128"
        }
    },
    "model_warmup": [],
    "model_transaction_policy": {
        "decoupled": true
    }
}
I0611 02:29:11.739447 75 tacotron-decoder-postnet.cc:927] TRITONBACKEND_ModelInstanceInitialize: tacotron2_decoder_postnet_0 (device 0)
I0611 02:29:11.833734 75 model_repository_manager.cc:1066] loading: waveglow_denoiser:1
I0611 02:29:11.834481 75 custom_backend.cc:201] Creating instance tts_preprocessor_0_0_gpu0 on GPU 0 (7.5) using libtriton_jarvis_tts_preprocessor.so
I0611 02:29:11.841373 75 model_repository_manager.cc:1240] successfully loaded 'tts_preprocessor' version 1
I0611 02:29:11.934457 75 custom_backend.cc:201] Creating instance waveglow_denoiser_0_0_gpu0 on GPU 0 (7.5) using libtriton_jarvis_tts_denoiser.so
I0611 02:29:12.457702 75 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming' version 1
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
W0611 02:29:12.745213 75 metrics.cc:292] failed to get power limit for GPU 0: Not Supported
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
W0611 02:29:14.747740 75 metrics.cc:292] failed to get power limit for GPU 0: Not Supported
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
W0611 02:29:16.750214 75 metrics.cc:292] failed to get power limit for GPU 0: Not Supported
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:26.085057 75 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming' version 1
I0611 02:29:26.264476 75 plan_backend.cc:384] Creating instance jarvis-trt-tacotron2_encoder_0_0_gpu0 on GPU 0 (7.5) using model.plan
I0611 02:29:26.371046 75 model_repository_manager.cc:1240] successfully loaded 'waveglow_denoiser' version 1
I0611 02:29:26.372714 75 plan_backend.cc:772] Created instance jarvis-trt-tacotron2_encoder_0_0_gpu0 on GPU 0 with stream priority 0
I0611 02:29:26.378955 75 model_repository_manager.cc:1240] successfully loaded 'jarvis-trt-tacotron2_encoder' version 1
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:27.796439 75 plan_backend.cc:384] Creating instance jarvis-trt-citrinet-1024_0_0_gpu0 on GPU 0 (7.5) using model.plan
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:28.539207 75 model_repository_manager.cc:1240] successfully loaded 'tacotron2_decoder_postnet' version 1
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:29.211907 75 plan_backend.cc:768] Created instance jarvis-trt-citrinet-1024_0_0_gpu0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0611 02:29:29.222198 75 model_repository_manager.cc:1240] successfully loaded 'jarvis-trt-citrinet-1024' version 1
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:30.183957 75 plan_backend.cc:384] Creating instance jarvis-trt-waveglow_0_0_gpu0 on GPU 0 (7.5) using model.plan
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
E0611 02:29:31.128529 75 logging.cc:43] ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
E0611 02:29:31.132046 75 logging.cc:43] FAILED_ALLOCATION: std::exception
E0611 02:29:31.174161 75 model_repository_manager.cc:1243] failed to load 'jarvis-trt-waveglow' version 1: Internal: unable to create TensorRT context
E0611 02:29:31.174547 75 model_repository_manager.cc:1431] Invalid argument: ensemble 'tacotron2_ensemble' depends on 'jarvis-trt-waveglow' which has no loaded version
I0611 02:29:31.174631 75 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming:1
I0611 02:29:31.275322 75 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming' version 1
I0611 02:29:31.275466 75 server.cc:504] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0611 02:29:31.275542 75 server.cc:543] 
+-------------------------+-----------------------------------------------------------------------------------------+--------+
| Backend                 | Path                                                                                    | Config |
+-------------------------+-----------------------------------------------------------------------------------------+--------+
| tensorrt                | <built-in>                                                                              | {}     |
| jarvis_tts_taco_postnet | /opt/tritonserver/backends/jarvis_tts_taco_postnet/libtriton_jarvis_tts_taco_postnet.so | {}     |
+-------------------------+-----------------------------------------------------------------------------------------+--------+

I0611 02:29:31.275689 75 server.cc:586] 
+------------------------------------------------------------------------------------+---------+----------------------------------------------------------+
| Model                                                                              | Version | Status                                                   |
+------------------------------------------------------------------------------------+---------+----------------------------------------------------------+
| citrinet-1024-asr-trt-ensemble-vad-streaming                                       | 1       | READY                                                    |
| citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming             | 1       | READY                                                    |
| citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming           | 1       | READY                                                    |
| citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming | 1       | READY                                                    |
| jarvis-trt-citrinet-1024                                                           | 1       | READY                                                    |
| jarvis-trt-tacotron2_encoder                                                       | 1       | READY                                                    |
| jarvis-trt-waveglow                                                                | 1       | UNAVAILABLE: Internal: unable to create TensorRT context |
| jarvis_tokenizer                                                                   | 1       | READY                                                    |
| tacotron2_decoder_postnet                                                          | 1       | READY                                                    |
| tts_preprocessor                                                                   | 1       | READY                                                    |
| waveglow_denoiser                                                                  | 1       | READY                                                    |
+------------------------------------------------------------------------------------+---------+----------------------------------------------------------+

I0611 02:29:31.275851 75 tritonserver.cc:1658] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                 |
| server_version                   | 2.9.0                                                                                                                                                                                  |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | /data/models                                                                                                                                                                           |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
| cuda_memory_pool_byte_size{0}    | 1000000000                                                                                                                                                                             |
| min_supported_compute_capability | 6.0                                                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                                                     |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0611 02:29:31.275864 75 server.cc:234] Waiting for in-flight requests to complete.
I0611 02:29:31.275872 75 model_repository_manager.cc:1099] unloading: tacotron2_decoder_postnet:1
I0611 02:29:31.275936 75 model_repository_manager.cc:1099] unloading: tts_preprocessor:1
I0611 02:29:31.276052 75 model_repository_manager.cc:1099] unloading: jarvis_tokenizer:1
I0611 02:29:31.276408 75 model_repository_manager.cc:1099] unloading: waveglow_denoiser:1
I0611 02:29:31.276463 75 tacotron-decoder-postnet.cc:1000] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0611 02:29:31.276557 75 model_repository_manager.cc:1099] unloading: jarvis-trt-tacotron2_encoder:1
I0611 02:29:31.276673 75 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming:1
I0611 02:29:31.276883 75 model_repository_manager.cc:1099] unloading: jarvis-trt-citrinet-1024:1
I0611 02:29:31.276999 75 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming:1
I0611 02:29:31.277074 75 model_repository_manager.cc:1223] successfully unloaded 'tts_preprocessor' version 1
I0611 02:29:31.277188 75 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming:1
I0611 02:29:31.277348 75 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming:1
I0611 02:29:31.277534 75 server.cc:249] Timeout 30: Found 9 live models and 0 in-flight non-inference requests
I0611 02:29:31.277850 75 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming' version 1
I0611 02:29:31.279871 75 model_repository_manager.cc:1223] successfully unloaded 'jarvis_tokenizer' version 1
I0611 02:29:31.282404 75 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming' version 1
I0611 02:29:31.283409 75 model_repository_manager.cc:1223] successfully unloaded 'jarvis-trt-tacotron2_encoder' version 1
I0611 02:29:31.284242 75 model_repository_manager.cc:1223] successfully unloaded 'waveglow_denoiser' version 1
I0611 02:29:31.288785 75 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming' version 1
I0611 02:29:31.291100 75 model_repository_manager.cc:1223] successfully unloaded 'jarvis-trt-citrinet-1024' version 1
I0611 02:29:31.478491 75 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming' version 1
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:32.277635 75 server.cc:249] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:33.277750 75 server.cc:249] Timeout 28: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:34.277892 75 server.cc:249] Timeout 27: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:35.278036 75 server.cc:249] Timeout 26: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:36.278188 75 server.cc:249] Timeout 25: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:37.278330 75 server.cc:249] Timeout 24: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:38.278703 75 server.cc:249] Timeout 23: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:39.279092 75 server.cc:249] Timeout 22: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:40.279229 75 server.cc:249] Timeout 21: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:41.279357 75 server.cc:249] Timeout 20: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:42.279600 75 server.cc:249] Timeout 19: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:43.279781 75 server.cc:249] Timeout 18: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:44.279968 75 server.cc:249] Timeout 17: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:45.280209 75 server.cc:249] Timeout 16: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:46.280416 75 server.cc:249] Timeout 15: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:47.280598 75 server.cc:249] Timeout 14: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:48.280764 75 server.cc:249] Timeout 13: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:49.280947 75 server.cc:249] Timeout 12: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:50.281162 75 server.cc:249] Timeout 11: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:51.281379 75 server.cc:249] Timeout 10: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:52.281784 75 server.cc:249] Timeout 9: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:53.281963 75 server.cc:249] Timeout 8: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:54.282127 75 server.cc:249] Timeout 7: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:55.282304 75 server.cc:249] Timeout 6: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:56.282544 75 server.cc:249] Timeout 5: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:57.282767 75 server.cc:249] Timeout 4: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:58.282976 75 server.cc:249] Timeout 3: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:29:59.283205 75 server.cc:249] Timeout 2: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 02:30:00.283356 75 server.cc:249] Timeout 1: Found 1 live models and 0 in-flight non-inference requests
I0611 02:30:01.283508 75 server.cc:249] Timeout 0: Found 1 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
  > Triton server died before reaching ready state. Terminating Jarvis startup.
Check Triton logs with: docker logs 
/opt/jarvis/bin/start-jarvis: line 1: kill: (75) - No such process
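
Since the log above is long, here is a rough sketch of how the relevant parts could be pulled out of it: the error-level lines (prefix E or F in the glog-style format Triton prints) and any rows of the model status table that are not READY. The function name summarize and both regular expressions are my own assumptions based on the lines shown above, not an official Triton or Jarvis tool.

```python
import re

# glog-style line as seen above: severity letter, MMDD, time, thread id, file:line] message
GLOG_LINE = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)\s+\d+ ([\w.\-]+:\d+)\] (.*)$"
)
# Row of the model status table: | model name | numeric version | status |
STATUS_ROW = re.compile(r"^\|\s*(\S+)\s*\|\s*(\d+)\s*\|\s*(.+?)\s*\|$")


def summarize(log_text):
    """Return (error_messages, not_ready) where not_ready maps model name to status."""
    errors, not_ready = [], {}
    for line in log_text.splitlines():
        m = GLOG_LINE.match(line)
        if m:
            # Keep only error (E) and fatal (F) messages.
            if m.group(1) in ("E", "F"):
                errors.append(m.group(5))
            continue
        m = STATUS_ROW.match(line)
        if m and m.group(3) != "READY":
            not_ready[m.group(1)] = m.group(3)
    return errors, not_ready
```

Passing the saved output of docker logs jarvis-speech to summarize() should, for this log, surface the out-of-memory error and jarvis-trt-waveglow as the only model that is not READY.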

Output of sudo bash jarvis_init.sh:

Please enter API key for ngc.nvidia.com: 
Logging into NGC docker registry if necessary...
Pulling required docker images if necessary...
Note: This may take some time, depending on the speed of your Internet connection.
> Pulling Jarvis Speech Server images.
  > Image nvcr.io/nvidia/jarvis/jarvis-speech:1.2.0-beta-server exists. Skipping.
  > Image nvcr.io/nvidia/jarvis/jarvis-speech-client:1.2.0-beta exists. Skipping.
  > Image nvcr.io/nvidia/jarvis/jarvis-speech:1.2.0-beta-servicemaker exists. Skipping.

Downloading models (JMIRs) from NGC...
Note: this may take some time, depending on the speed of your Internet connection.
To skip this process and use existing JMIRs set the location and corresponding flag in config.sh.

==========================
== Jarvis Speech Skills ==
==========================

NVIDIA Release devel (build 22382700)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

/data/artifacts /opt/jarvis
Directory jmir_punctuation_v1.2.0-beta already exists, skipping. Use '--force' option to override.
Directory jmir_jarvis_asr_citrinet_1024_asrset1p7_streaming_v1.2.0-beta already exists, skipping. Use '--force' option to override.
Directory jmir_jarvis_asr_citrinet_1024_asrset1p7_offline_v1.2.0-beta already exists, skipping. Use '--force' option to override.
Directory jmir_punctuation_v1.2.0-beta already exists, skipping. Use '--force' option to override.
Directory jmir_named_entity_recognition_v1.2.0-beta already exists, skipping. Use '--force' option to override.
Directory jmir_intent_slot_v1.2.0-beta already exists, skipping. Use '--force' option to override.
Directory jmir_question_answering_v1.2.0-beta already exists, skipping. Use '--force' option to override.
Directory jmir_text_classification_v1.2.0-beta already exists, skipping. Use '--force' option to override.
Directory jmir_jarvis_tts_ljspeech_v1.2.0-beta already exists, skipping. Use '--force' option to override.
/opt/jarvis

Converting JMIRs at jarvis-model-repo/jmir to Jarvis Model repository.
+ docker run --init -it --rm --gpus '"device=0"' -v jarvis-model-repo:/data -e MODEL_DEPLOY_KEY=tlt_encode --name jarvis-service-maker nvcr.io/nvidia/jarvis/jarvis-speech:1.2.0-beta-servicemaker deploy_all_models /data/jmir /data/models

==========================
== Jarvis Speech Skills ==
==========================

NVIDIA Release devel (build 22382700)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

Traceback (most recent call last):
  File "/opt/conda/bin/jarvis-deploy", line 8, in <module>
    sys.exit(deploy_from_jmir())
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/cli/deploy.py", line 73, in deploy_from_jmir
    raise FileExistsError(f"{args.target} exists. Use --force/-f to overwrite.")
FileExistsError: /data/models exists. Use --force/-f to overwrite.
+ echo

+ echo 'Jarvis initialization complete. Run ./jarvis_start.sh to launch services.'
Jarvis initialization complete. Run ./jarvis_start.sh to launch services.

Output of sudo bash jarvis_start.sh

Starting Jarvis Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Jarvis server to load all models...retrying in 10 seconds
[the line above is printed 30 times in total]
Health ready check failed.
Check Jarvis logs with: docker logs jarvis-speech

What I have done to try and resolve the issue:
Following the forum thread below, I commented out all the NLP models except one, commented out the TTS model as well, and re-ran sudo bash jarvis_init.sh and sudo bash jarvis_start.sh, but it still doesn’t work.

I also tried removing the docker volume jarvis-model-repo, as suggested in the forum thread below.

However, I could not remove the volume even with the force-remove command docker volume rm -f jarvis-model-repo; it outputs:

Error response from daemon: remove jarvis-model-repo: volume is in use - [8977df414a7b381d054433d1ea37861232d53812d7dfda2d6bfcfc7eb93fd436]
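
The error message itself lists what is still holding the volume. A minimal sketch, assuming the bracketed list in Docker's "volume is in use" message contains container IDs (the helper name volume_users is made up for illustration), extracts those IDs so the containers can be removed before the volume:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Docker's "volume is in use" error text on stdin
# and print the ID(s) listed inside the square brackets, one per line.
volume_users() {
  sed -n 's/.*volume is in use - \[\(.*\)\].*/\1/p' | tr ',' '\n' | tr -d ' '
}

# Usage (commented out because it deletes containers and the volume):
# docker volume rm jarvis-model-repo 2>&1 | volume_users | xargs -r docker rm -f
# docker volume rm jarvis-model-repo
```

Running docker ps -a --filter volume=jarvis-model-repo should list the same containers, including stopped ones, which may be a safer first check before removing anything.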

I am not sure whether the information below is useful, but here it is.
Here is the output of nvidia-smi:

Fri Jun 11 11:50:51 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA Quadro R...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   48C    P5     6W /  N/A |    948MiB /  5934MiB |     29%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1112      G   /usr/lib/xorg/Xorg                159MiB |
|    0   N/A  N/A      1841      G   /usr/lib/xorg/Xorg                358MiB |
|    0   N/A  N/A      2032      G   /usr/bin/gnome-shell               92MiB |
|    0   N/A  N/A      2557      G   ...AAAAAAAAA= --shared-files      109MiB |
|    0   N/A  N/A     13952      G   ...AAAAAAAAA= --shared-files       29MiB |
|    0   N/A  N/A     24910      G   ...AAAAAAAAA= --shared-files      182MiB |
+-----------------------------------------------------------------------------+

Output of nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

My docker version:

Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
 Built:             Wed Jun  2 11:56:38 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:54:50 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0-rc95
  GitCommit:        b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of sudo apt install nvidia-cuda-toolkit

Reading package lists... Done
Building dependency tree       
Reading state information... Done
nvidia-cuda-toolkit is already the newest version (10.1.243-3).
The following packages were automatically installed and are no longer required:
  chromium-codecs-ffmpeg-extra gstreamer1.0-vaapi libgstreamer-plugins-bad1.0-0 libva-wayland2
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 211 not upgraded.

Output of uname -a

Linux ato-Precision-5750 5.10.0-1029-oem #30-Ubuntu SMP Fri May 28 23:53:50 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Output of lscpu

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          16
On-line CPU(s) list:             0-15
Thread(s) per core:              2
Core(s) per socket:              8
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           165
Model name:                      Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz
Stepping:                        2
CPU MHz:                         858.548
CPU max MHz:                     5100.0000
CPU min MHz:                     800.0000
BogoMIPS:                        4599.93
Virtualization:                  VT-x
L1d cache:                       256 KiB
L1i cache:                       256 KiB
L2 cache:                        2 MiB
L3 cache:                        16 MiB
NUMA node0 CPU(s):               0-15
Vulnerability Itlb multihit:     KVM: Mitigation: VMX disabled
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pd
                                 pe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 moni
                                 tor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
                                  rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept 
                                 vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsav
                                 es dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp pku ospke md_clear flush_l1d arch_capabilities

Output of lspci | grep VGA:

00:02.0 VGA compatible controller: Intel Corporation UHD Graphics (rev 05)
01:00.0 VGA compatible controller: NVIDIA Corporation TU106GLM [Quadro RTX 3000 Mobile / Max-Q] (rev a1)

I am dual-booting Ubuntu 20.04.2 alongside Windows 10.

Thank you very much!

Hi @ato1

This looks like an out-of-memory (OOM) issue in your case (the Quadro RTX 3000 Mobile has only 6 GB).

Which NLP model did you keep enabled during this exercise (with all the other models commented out)? Could you run the script jarvis_clean.sh and then start afresh?
Can you also share the docker logs jarvis-speech output of the single-model run?

Thanks

I have tried again with only the Bert base Punctuation model left uncommented for NLP; the TTS and ASR models are the same as the default.
Then I ran the following commands:

$ sudo bash jarvis_clean.sh
$ sudo bash jarvis_init.sh
$ sudo bash jarvis_start.sh

And I am still facing the OOM issue.

Here is the output of docker logs jarvis-speech

==========================
== Jarvis Speech Skills ==
==========================

NVIDIA Release 21.05 (build 23684531)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 08:00:36.146282 74 metrics.cc:228] Collecting metrics for GPU 0: NVIDIA Quadro RTX 3000 with Max-Q Design
I0611 08:00:36.255335 74 pinned_memory_manager.cc:206] Pinned memory pool is created at '0x7f8700000000' with size 268435456
I0611 08:00:36.255596 74 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 1000000000
E0611 08:00:36.257952 74 model_repository_manager.cc:1946] Poll failed for model directory 'jarvis-trt-citrinet-1024': failed to open text file for read /data/models/jarvis-trt-citrinet-1024/config.pbtxt: No such file or directory
I0611 08:00:36.258026 74 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming:1
I0611 08:00:36.359687 74 custom_backend.cc:201] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming_0_0_gpu0 on GPU 0 (7.5) using libtriton_jarvis_asr_features.so
I0611 08:00:37.096279 74 model_repository_manager.cc:1240] successfully loaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming' version 1
I0611 08:00:37.096373 74 server.cc:504] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0611 08:00:37.096394 74 server.cc:543] 
+----------+------------+--------+
| Backend  | Path       | Config |
+----------+------------+--------+
| tensorrt | <built-in> | {}     |
+----------+------------+--------+

I0611 08:00:37.096413 74 server.cc:586] 
+--------------------------------------------------------------------------+---------+--------+
| Model                                                                    | Version | Status |
+--------------------------------------------------------------------------+---------+--------+
| citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming | 1       | READY  |
+--------------------------------------------------------------------------+---------+--------+

I0611 08:00:37.096483 74 tritonserver.cc:1658] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                 |
| server_version                   | 2.9.0                                                                                                                                                                                  |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | /data/models                                                                                                                                                                           |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
| cuda_memory_pool_byte_size{0}    | 1000000000                                                                                                                                                                             |
| min_supported_compute_capability | 6.0                                                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                                                     |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0611 08:00:37.096490 74 server.cc:234] Waiting for in-flight requests to complete.
I0611 08:00:37.096493 74 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming:1
I0611 08:00:37.096519 74 server.cc:249] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
I0611 08:00:37.131892 74 model_repository_manager.cc:1223] successfully unloaded 'citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming' version 1
I0611 08:00:38.096704 74 server.cc:249] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
  > Jarvis waiting for Triton server to load all models...retrying in 1 second
W0611 08:00:38.146964 74 metrics.cc:292] failed to get power limit for GPU 0: Not Supported
  > Triton server died before reaching ready state. Terminating Jarvis startup.
Check Triton logs with: docker logs 
/opt/jarvis/bin/start-jarvis: line 1: kill: (74) - No such process
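
Most of the log above is informational. A small sketch, assuming Triton's error records start with "E" followed by the month and day (as in the E0611 line above) and that fatal messages start with "error:", keeps only those lines; the helper name triton_errors is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only error lines from Triton-style logs on stdin.
# Matches glog error records ("E" + MMDD + space) and lines starting "error:".
triton_errors() {
  grep -E '^(E[0-9]{4} |error:)' || true   # "|| true": no match is not a failure
}

# Usage:
# docker logs jarvis-speech 2>&1 | triton_errors
```

On the log above this would leave the "Poll failed for model directory 'jarvis-trt-citrinet-1024'" line and the final "failed to load all models" line, which suggests the model directory was never fully written during initialization.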

Output of sudo bash jarvis_init.sh:

Please enter API key for ngc.nvidia.com: 
Logging into NGC docker registry if necessary...
Pulling required docker images if necessary...
Note: This may take some time, depending on the speed of your Internet connection.
> Pulling Jarvis Speech Server images.
  > Image nvcr.io/nvidia/jarvis/jarvis-speech:1.2.0-beta-server exists. Skipping.
  > Image nvcr.io/nvidia/jarvis/jarvis-speech-client:1.2.0-beta exists. Skipping.
  > Image nvcr.io/nvidia/jarvis/jarvis-speech:1.2.0-beta-servicemaker exists. Skipping.

Downloading models (JMIRs) from NGC...
Note: this may take some time, depending on the speed of your Internet connection.
To skip this process and use existing JMIRs set the location and corresponding flag in config.sh.

==========================
== Jarvis Speech Skills ==
==========================

NVIDIA Release devel (build 22382700)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

/data/artifacts /opt/jarvis
  > Downloading nvidia/jarvis/jmir_punctuation:1.2.0-beta...
Downloaded 418.11 MB in 1m 9s, Download speed: 6.05 MB/s               
----------------------------------------------------
Transfer id: jmir_punctuation_v1.2.0-beta Download status: Completed.
Downloaded local path: /data/artifacts/jmir_punctuation_v1.2.0-beta
Total files downloaded: 1 
Total downloaded size: 418.11 MB
Started at: 2021-06-11 07:47:00.145856
Completed at: 2021-06-11 07:48:09.245032
Duration taken: 1m 9s
----------------------------------------------------
  > Downloading nvidia/jarvis/jmir_jarvis_asr_citrinet_1024_asrset1p7_streaming:1.2.0-beta...
Downloaded 579.01 MB in 1m 26s, Download speed: 6.72 MB/s               
----------------------------------------------------
Transfer id: jmir_jarvis_asr_citrinet_1024_asrset1p7_streaming_v1.2.0-beta Download status: Completed.
Downloaded local path: /data/artifacts/jmir_jarvis_asr_citrinet_1024_asrset1p7_streaming_v1.2.0-beta
Total files downloaded: 1 
Total downloaded size: 579.01 MB
Started at: 2021-06-11 07:48:18.063926
Completed at: 2021-06-11 07:49:44.188379
Duration taken: 1m 26s
----------------------------------------------------
  > Downloading nvidia/jarvis/jmir_jarvis_asr_citrinet_1024_asrset1p7_offline:1.2.0-beta...
Downloaded 579.01 MB in 1m 41s, Download speed: 5.72 MB/s               
----------------------------------------------------
Transfer id: jmir_jarvis_asr_citrinet_1024_asrset1p7_offline_v1.2.0-beta Download status: Completed.
Downloaded local path: /data/artifacts/jmir_jarvis_asr_citrinet_1024_asrset1p7_offline_v1.2.0-beta
Total files downloaded: 1 
Total downloaded size: 579.01 MB
Started at: 2021-06-11 07:49:52.874157
Completed at: 2021-06-11 07:51:34.027360
Duration taken: 1m 41s
----------------------------------------------------
Directory jmir_punctuation_v1.2.0-beta already exists, skipping. Use '--force' option to override.
  > Downloading nvidia/jarvis/jmir_jarvis_tts_ljspeech:1.2.0-beta...
Downloaded 527.36 MB in 1m 10s, Download speed: 7.52 MB/s               
----------------------------------------------------
Transfer id: jmir_jarvis_tts_ljspeech_v1.2.0-beta Download status: Completed.
Downloaded local path: /data/artifacts/jmir_jarvis_tts_ljspeech_v1.2.0-beta
Total files downloaded: 1 
Total downloaded size: 527.36 MB
Started at: 2021-06-11 07:51:44.801552
Completed at: 2021-06-11 07:52:54.910119
Duration taken: 1m 10s
----------------------------------------------------
/opt/jarvis

Converting JMIRs at jarvis-model-repo/jmir to Jarvis Model repository.
+ docker run --init -it --rm --gpus '"device=0"' -v jarvis-model-repo:/data -e MODEL_DEPLOY_KEY=tlt_encode --name jarvis-service-maker nvcr.io/nvidia/jarvis/jarvis-speech:1.2.0-beta-servicemaker deploy_all_models /data/jmir /data/models

==========================
== Jarvis Speech Skills ==
==========================

NVIDIA Release devel (build 22382700)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

2021-06-11 07:53:01,429 [INFO] Writing Jarvis model repository to '/data/models'...
2021-06-11 07:53:01,429 [INFO] The jarvis model repo target directory is /data/models
2021-06-11 07:53:03,155 [INFO] Extract_binaries for featurizer -> /data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming/1
2021-06-11 07:53:03,157 [INFO] Extract_binaries for nn -> /data/models/jarvis-trt-citrinet-1024/1
2021-06-11 07:53:07,834 [INFO] Building TRT engine from ONNX file
[libprotobuf WARNING /workspace/TensorRT/t/oss-cicd/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING /workspace/TensorRT/t/oss-cicd/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 564429124
[TensorRT] WARNING: /workspace/TensorRT/t/oss-cicd/oss/parsers/onnx/onnx2trt_utils.cpp:227: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
2021-06-11 07:53:23,381 [ERROR] Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/cli/deploy.py", line 87, in deploy_from_jmir
    generator.serialize_to_disk(
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/triton.py", line 340, in serialize_to_disk
    module.serialize_to_disk(repo_dir, jmir, config_only, verbose, overwrite)
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/triton.py", line 231, in serialize_to_disk
    self.update_binary(version_dir, jmir, verbose)
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/triton.py", line 577, in update_binary
    with self.build_trt_engine_from_onnx(model_weights) as engine, open(
AttributeError: __enter__

+ echo

+ echo 'Jarvis initialization complete. Run ./jarvis_start.sh to launch services.'
Jarvis initialization complete. Run ./jarvis_start.sh to launch services.
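
Note that the script prints "Jarvis initialization complete" even though the TensorRT engine build above failed with a CUDA out-of-memory error. A minimal sketch, assuming the init output has been saved to a file (the helper name init_log_clean and the log path are made up for illustration), checks for the failure markers seen above before starting the server:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) only if the init log given as $1
# contains none of the failure markers seen in the output above.
init_log_clean() {
  ! grep -qE '\[ERROR\]|Traceback \(most recent call last\)|out of memory' "$1"
}

# Usage sketch (the log path is an assumption, and capturing with tee is untested here):
# sudo bash jarvis_init.sh 2>&1 | tee /tmp/jarvis_init.log
# init_log_clean /tmp/jarvis_init.log && sudo bash jarvis_start.sh
```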

The config.sh file:

# Copyright (c) 2021, NVIDIA CORPORATION.  All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto.  Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.

# Enable or Disable Jarvis Services
service_enabled_asr=true
service_enabled_nlp=true
service_enabled_tts=true

# Specify one or more GPUs to use
# specifying more than one GPU is currently an experimental feature, and may result in undefined behaviours.
gpus_to_use="device=0"

# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a docker volume will be used (default).
#
# jarvis_init.sh will create a `jmir` and `models` directory in the volume or
# path specified. 
#
# JMIR ($jarvis_model_loc/jmir)
# Jarvis uses an intermediate representation (JMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $jarvis_model_loc/jmir by `jarvis_init.sh`
# 
# Custom models produced by NeMo or TLT and prepared using jarvis-build
# may also be copied manually to this location $(jarvis_model_loc/jmir).
#
# Models ($jarvis_model_loc/models)
# During the jarvis_init process, the JMIR files in $jarvis_model_loc/jmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $jarvis_model_loc/models. The jarvis server exclusively uses these
# optimized versions.
jarvis_model_loc="jarvis-model-repo"

# The default JMIRs are downloaded from NGC by default in the above $jarvis_jmir_loc directory
# If you'd like to skip the download from NGC and use the existing JMIRs in the $jarvis_jmir_loc
# then set the below $use_existing_jmirs flag to true. You can also deploy your set of custom
# JMIRs by keeping them in the jarvis_jmir_loc dir and use this quickstart script with the
# below flag to deploy them all together.
use_existing_jmirs=false

# Ports to expose for Jarvis services
jarvis_speech_api_port="50051"
jarvis_vision_api_port="60051"

# NGC orgs
jarvis_ngc_org="nvidia"
jarvis_ngc_team="jarvis"
jarvis_ngc_image_version="1.2.0-beta"
jarvis_ngc_model_version="1.2.0-beta"

# Pre-built models listed below will be downloaded from NGC. If models already exist in $jarvis-jmir
# then models can be commented out to skip download from NGC

########## ASR MODELS ##########

models_asr=(
### Punctuation model
    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_punctuation:${jarvis_ngc_model_version}"

### Citrinet-1024 Streaming w/ CPU decoder, best latency configuration
    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_jarvis_asr_citrinet_1024_asrset1p7_streaming:${jarvis_ngc_model_version}"

### Citrinet-1024 Streaming w/ CPU decoder, best throughput configuration
#    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_jarvis_asr_citrinet_1024_asrset1p7_streaming_throughput:${jarvis_ngc_model_version}"

### Citrinet-1024 Offline w/ CPU decoder, 
    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_jarvis_asr_citrinet_1024_asrset1p7_offline:${jarvis_ngc_model_version}"

### Jasper Streaming w/ CPU decoder, best latency configuration
#    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_jarvis_asr_jasper_english_streaming:${jarvis_ngc_model_version}"

### Jasper Streaming w/ CPU decoder, best throughput configuration
#    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_jarvis_asr_jasper_english_streaming_throughput:${jarvis_ngc_model_version}"

###  Jasper Offline w/ CPU decoder
#    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_jarvis_asr_jasper_english_offline:${jarvis_ngc_model_version}"
 
### Quarztnet Streaming w/ CPU decoder, best latency configuration
#    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_jarvis_asr_quartznet_english_streaming:${jarvis_ngc_model_version}"

### Quarztnet Streaming w/ CPU decoder, best throughput configuration
#    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_jarvis_asr_quartznet_english_streaming_throughput:${jarvis_ngc_model_version}"

### Quarztnet Offline w/ CPU decoder
#    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_jarvis_asr_quartznet_english_offline:${jarvis_ngc_model_version}"

### Jasper Streaming w/ GPU decoder, best latency configuration
#    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_jarvis_asr_jasper_english_streaming_gpu_decoder:${jarvis_ngc_model_version}"

### Jasper Streaming w/ GPU decoder, best throughput configuration
#    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_jarvis_asr_jasper_english_streaming_throughput_gpu_decoder:${jarvis_ngc_model_version}"

### Jasper Offline w/ GPU decoder
#    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_jarvis_asr_jasper_english_offline_gpu_decoder:${jarvis_ngc_model_version}"
)

########## NLP MODELS ##########

models_nlp=(
### Bert base Punctuation model
    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_punctuation:${jarvis_ngc_model_version}"

### BERT base Named Entity Recognition model fine-tuned on GMB dataset with class labels LOC, PER, ORG etc.
    #"${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_named_entity_recognition:${jarvis_ngc_model_version}"

### BERT Base Intent Slot model fine-tuned on weather dataset.
    #"${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_intent_slot:${jarvis_ngc_model_version}"

### BERT Base Question Answering model fine-tuned on Squad v2.
    #"${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_question_answering:${jarvis_ngc_model_version}"

### Megatron345M Question Answering model fine-tuned on Squad v2.
#    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_jarvis_nlp_question_answering_megatron:${jarvis_ngc_model_version}"

### Bert base Text Classification model fine-tuned on 4class (weather, meteorology, personality, nomatch) domain model.
    #"${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_text_classification:${jarvis_ngc_model_version}"
)

########## TTS MODELS ##########

models_tts=(
   "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_jarvis_tts_ljspeech:${jarvis_ngc_model_version}"
)

NGC_TARGET=${jarvis_ngc_org}
if [[ ! -z ${jarvis_ngc_team} ]]; then
  NGC_TARGET="${NGC_TARGET}/${jarvis_ngc_team}"
else
  team="\"\""
fi

# define docker images required to run Jarvis
image_client="nvcr.io/${NGC_TARGET}/jarvis-speech-client:${jarvis_ngc_image_version}"
image_speech_api="nvcr.io/${NGC_TARGET}/jarvis-speech:${jarvis_ngc_image_version}-server"

# define docker images required to setup Jarvis
image_init_speech="nvcr.io/${NGC_TARGET}/jarvis-speech:${jarvis_ngc_image_version}-servicemaker"

# daemon names
jarvis_daemon_speech="jarvis-speech"
jarvis_daemon_client="jarvis-client"

I checked host memory usage with the free command while jarvis_init.sh and jarvis_start.sh were running, and in both cases there appears to be plenty of memory available.

Output of free while jarvis_init.sh is executing:

              total        used        free      shared  buff/cache   available
Mem:       32621864     6307068    17406852      472720     8907944    25405332
Swap:      15625212           0    15625212

Output of free while jarvis_start.sh is executing:

              total        used        free      shared  buff/cache   available
Mem:       32621864     4448196    15411692      355352    12761976    27380976
Swap:      15625212           0    15625212
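
The readings above are single snapshots, so a short spike could be missed. For anyone who wants to sample memory over time while one of the scripts runs, a loop like the one below might help. This is only a sketch: log_memory is a hypothetical helper name (it is not part of the Jarvis Quickstart), and it reads MemAvailable from /proc/meminfo, which is the value free reports in its "available" column. It only covers host RAM, not GPU memory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Jarvis Quickstart):
# print a timestamped MemAvailable reading a fixed number of times.
#   $1 = number of samples
#   $2 = seconds to wait between samples
log_memory() {
  local samples="$1" interval="$2"
  local i avail
  for ((i = 0; i < samples; i++)); do
    # MemAvailable is reported in KiB and matches the "available" column of free
    avail="$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)"
    printf '%s available_kib=%s\n' "$(date +%T)" "$avail"
    sleep "$interval"
  done
}

# Example: take 2 samples, 1 second apart
log_memory 2 1
```

Running it in a second terminal with a larger sample count (for example log_memory 60 5) while jarvis_start.sh is executing would give a five-minute trace instead of a single reading.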

Thank you!