Error during jarvis_init.sh for Jarvis 1.2.1 beta

I’m getting an error when trying to run jarvis_init.sh. Here are the logs:

Logging into NGC docker registry if necessary...
Pulling required docker images if necessary...
Note: This may take some time, depending on the speed of your Internet connection.
> Pulling Jarvis Speech Server images.
  > Image nvcr.io/nvidia/jarvis/jarvis-speech:1.2.1-beta-server exists. Skipping.
  > Image nvcr.io/nvidia/jarvis/jarvis-speech-client:1.2.1-beta exists. Skipping.
  > Image nvcr.io/nvidia/jarvis/jarvis-speech:1.2.1-beta-servicemaker exists. Skipping.

Converting JMIRs at jarvis-model-repo/jmir to Jarvis Model repository.
+ docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --init -it --rm --gpus '"device=0"' -v jarvis-model-repo:/data -e MODEL_DEPLOY_KEY=tlt_encode --name jarvis-service-maker nvcr.io/nvidia/jarvis/jarvis-speech:1.2.1-beta-servicemaker deploy_all_models /data/jmir /data/models

==========================
== Jarvis Speech Skills ==
==========================

NVIDIA Release devel (build 22382700)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

2021-06-18 01:35:55,409 [INFO] Writing Jarvis model repository to '/data/models'...
2021-06-18 01:35:55,409 [INFO] The jarvis model repo target directory is /data/models
2021-06-18 01:35:56,961 [INFO] Extract_binaries for featurizer -> /data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming/1
2021-06-18 01:35:56,963 [INFO] Extract_binaries for nn -> /data/models/jarvis-trt-citrinet-1024/1
2021-06-18 01:36:01,092 [INFO] Building TRT engine from ONNX file
[libprotobuf WARNING /workspace/TensorRT/t/oss-cicd/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING /workspace/TensorRT/t/oss-cicd/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 564429124
[TensorRT] WARNING: /workspace/TensorRT/t/oss-cicd/oss/parsers/onnx/onnx2trt_utils.cpp:227: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2021-06-18 01:43:24,479 [INFO] Extract_binaries for vad -> /data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming/1
2021-06-18 01:43:24,480 [INFO] Extract_binaries for lm_decoder -> /data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming/1
2021-06-18 01:43:24,510 [INFO] {'vocab_file': '/data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming/1/vocab.txt', 'decoding_language_model_binary': '/data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming/1/jarvis_asr_train_datasets_noSpgi_noLS_gt_3gram.binary', 'decoding_vocab': '/data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming/1/dict_vocab.txt', 'tokenizer_model': '/data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming/1/tokenizer.model'}
2021-06-18 01:43:24,510 [INFO] Model config has vocab file and tokenizer specified. Will create lexicon file from  vocab_file /data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming/1/dict_vocab.txt and tokenizer model /data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming/1/tokenizer.model
2021-06-18 01:43:24,611 [INFO] processed 10000 lines
2021-06-18 01:43:24,714 [INFO] processed 20000 lines
2021-06-18 01:43:24,818 [INFO] processed 30000 lines
2021-06-18 01:43:24,923 [INFO] processed 40000 lines
2021-06-18 01:43:25,028 [INFO] processed 50000 lines
2021-06-18 01:43:25,132 [INFO] processed 60000 lines
2021-06-18 01:43:25,238 [INFO] processed 70000 lines
2021-06-18 01:43:25,343 [INFO] processed 80000 lines
2021-06-18 01:43:25,449 [INFO] processed 90000 lines
2021-06-18 01:43:25,554 [INFO] processed 100000 lines
2021-06-18 01:43:25,660 [INFO] processed 110000 lines
2021-06-18 01:43:25,765 [INFO] processed 120000 lines
2021-06-18 01:43:25,873 [INFO] processed 130000 lines
2021-06-18 01:43:25,980 [INFO] processed 140000 lines
2021-06-18 01:43:26,088 [INFO] processed 150000 lines
2021-06-18 01:43:26,195 [INFO] processed 160000 lines
2021-06-18 01:43:26,302 [INFO] processed 170000 lines
2021-06-18 01:43:26,407 [INFO] processed 180000 lines
2021-06-18 01:43:26,514 [INFO] processed 190000 lines
2021-06-18 01:43:26,621 [INFO] processed 200000 lines
2021-06-18 01:43:26,729 [INFO] processed 210000 lines
2021-06-18 01:43:26,836 [INFO] processed 220000 lines
2021-06-18 01:43:26,943 [INFO] processed 230000 lines
2021-06-18 01:43:26,971 [INFO] skipped 0 empty lines
2021-06-18 01:43:26,971 [INFO] filtered 0 lines
2021-06-18 01:43:26,973 [INFO] Extract_binaries for self -> /data/models/citrinet-1024-asr-trt-ensemble-vad-streaming/1
2021-06-18 01:43:28,617 [INFO] Extract_binaries for featurizer -> /data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline/1
2021-06-18 01:43:28,618 [WARNING] /data/models/jarvis-trt-citrinet-1024 already exists, skipping deployment.  To force deployment rerun with -f or remove the /data/models/jarvis-trt-citrinet-1024
2021-06-18 01:43:28,619 [INFO] Extract_binaries for vad -> /data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline/1
2021-06-18 01:43:28,620 [INFO] Extract_binaries for lm_decoder -> /data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline/1
2021-06-18 01:43:28,649 [INFO] {'vocab_file': '/data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline/1/vocab.txt', 'decoding_language_model_binary': '/data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline/1/jarvis_asr_train_datasets_noSpgi_noLS_gt_3gram.binary', 'decoding_vocab': '/data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline/1/dict_vocab.txt', 'tokenizer_model': '/data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline/1/tokenizer.model'}
2021-06-18 01:43:28,649 [INFO] Model config has vocab file and tokenizer specified. Will create lexicon file from  vocab_file /data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline/1/dict_vocab.txt and tokenizer model /data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline/1/tokenizer.model
2021-06-18 01:43:28,750 [INFO] processed 10000 lines
2021-06-18 01:43:28,852 [INFO] processed 20000 lines
2021-06-18 01:43:28,956 [INFO] processed 30000 lines
2021-06-18 01:43:29,061 [INFO] processed 40000 lines
2021-06-18 01:43:29,166 [INFO] processed 50000 lines
2021-06-18 01:43:29,271 [INFO] processed 60000 lines
2021-06-18 01:43:29,376 [INFO] processed 70000 lines
2021-06-18 01:43:29,488 [INFO] processed 80000 lines
2021-06-18 01:43:29,597 [INFO] processed 90000 lines
2021-06-18 01:43:29,705 [INFO] processed 100000 lines
2021-06-18 01:43:29,808 [INFO] processed 110000 lines
2021-06-18 01:43:29,911 [INFO] processed 120000 lines
2021-06-18 01:43:30,021 [INFO] processed 130000 lines
2021-06-18 01:43:30,131 [INFO] processed 140000 lines
2021-06-18 01:43:30,237 [INFO] processed 150000 lines
2021-06-18 01:43:30,342 [INFO] processed 160000 lines
2021-06-18 01:43:30,447 [INFO] processed 170000 lines
2021-06-18 01:43:30,553 [INFO] processed 180000 lines
2021-06-18 01:43:30,667 [INFO] processed 190000 lines
2021-06-18 01:43:30,773 [INFO] processed 200000 lines
2021-06-18 01:43:30,878 [INFO] processed 210000 lines
2021-06-18 01:43:30,983 [INFO] processed 220000 lines
2021-06-18 01:43:31,088 [INFO] processed 230000 lines
2021-06-18 01:43:31,116 [INFO] skipped 0 empty lines
2021-06-18 01:43:31,116 [INFO] filtered 0 lines
2021-06-18 01:43:31,118 [INFO] Extract_binaries for self -> /data/models/citrinet-1024-asr-trt-ensemble-vad-streaming-offline/1
2021-06-18 01:43:32,013 [INFO] Extract_binaries for tokenizer -> /data/models/jarvis_tokenizer/1
2021-06-18 01:43:32,015 [INFO] Extract_binaries for language_model -> /data/models/jarvis-trt-jarvis_intent_weather-nn-bert-base-uncased/1
2021-06-18 01:43:35,045 [INFO] Building TRT engine from PyTorch Checkpoint
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/export_bert_pytorch_to_trt.py", line 976, in <module>
    pytorch_to_trt()
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/export_bert_pytorch_to_trt.py", line 935, in pytorch_to_trt
    return convert_pytorch_bert_to_trt(
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/export_bert_pytorch_to_trt.py", line 788, in convert_pytorch_bert_to_trt
    with build_engine(
AttributeError: __enter__
2021-06-18 01:43:48,460 [ERROR] Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/cli/deploy.py", line 87, in deploy_from_jmir
    generator.serialize_to_disk(
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/triton.py", line 340, in serialize_to_disk
    module.serialize_to_disk(repo_dir, jmir, config_only, verbose, overwrite)
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/triton.py", line 231, in serialize_to_disk
    self.update_binary(version_dir, jmir, verbose)
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/triton.py", line 569, in update_binary
    bindings = self.build_trt_engine_from_pytorch_bert(
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/triton.py", line 532, in build_trt_engine_from_pytorch_bert
    raise Exception("convert_pytorch_to_trt failed.")
Exception: convert_pytorch_to_trt failed.

+ echo

+ echo 'Jarvis initialization complete. Run ./jarvis_start.sh to launch services.'
Jarvis initialization complete. Run ./jarvis_start.sh to launch services.

I’m using a 3070 GPU, which I believe has 8 GB of memory. The computer has 32 GB of main memory. What should I do to run Jarvis?

Hi @ruze55

Could you please try commenting out all the NLP models except one and see whether that deploys successfully on your setup?
Run jarvis_clean.sh to empty the model repository before redeploying the service.

Thanks

Thanks for your response. I commented them all out except for

### Bert base Punctuation model
    "${jarvis_ngc_org}/${jarvis_ngc_team}/jmir_punctuation:${jarvis_ngc_model_version}"

But it’s the same output.

I also tried this config with TTS only (after a clean install again):

# Enable or Disable Jarvis Services
service_enabled_asr=false
service_enabled_nlp=false
service_enabled_tts=true

Now jarvis_init.sh completes but the Triton server dies on jarvis_start.sh. Here is the full log:


==========================
== Jarvis Speech Skills ==
==========================

NVIDIA Release 21.05 (build 23858942)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use 'nvidia-docker run' to start this container; see
   https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .

  > Jarvis waiting for Triton server to load all models...retrying in 1 second

I0620 18:11:24.077752 50 pinned_memory_manager.cc:206] Pinned memory pool is created at '0x2034e0000' with size 268435456
I0620 18:11:24.077839 50 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 1000000000
I0620 18:11:24.086896 50 model_repository_manager.cc:1066] loading: tacotron2_decoder_postnet:1
I0620 18:11:24.187268 50 model_repository_manager.cc:1066] loading: jarvis-trt-waveglow:1
I0620 18:11:24.188775 50 tacotron-decoder-postnet.cc:873] TRITONBACKEND_ModelInitialize: tacotron2_decoder_postnet (version 1)
I0620 18:11:24.189625 50 tacotron-decoder-postnet.cc:767] model configuration:
{
    "name": "tacotron2_decoder_postnet",
    "platform": "",
    "backend": "jarvis_tts_taco_postnet",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 8,
    "input": [
        {
            "name": "input_decoder",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                1,
                400,
                512
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "input_processed_decoder",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                400,
                128,
                1,
                1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "input_num_characters",
            "data_type": "TYPE_INT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        }
    ],
    "output": [
        {
            "name": "spectrogram_chunk",
            "data_type": "TYPE_FP32",
            "dims": [
                1,
                80,
                80
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "z",
            "data_type": "TYPE_FP32",
            "dims": [
                8,
                2656,
                1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "num_valid_samples",
            "data_type": "TYPE_INT32",
            "dims": [
                1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "end_flag",
            "data_type": "TYPE_INT32",
            "dims": [
                1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "sequence_batching": {
        "oldest": {
            "max_candidate_sequences": 8,
            "preferred_batch_size": [
                8
            ],
            "max_queue_delay_microseconds": 100
        },
        "max_sequence_idle_microseconds": 60000000,
        "control_input": [
            {
                "name": "START",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_START",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "READY",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_READY",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "END",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_END",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "CORRID",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_CORRID",
                        "int32_false_true": [],
                        "fp32_false_true": [],
                        "data_type": "TYPE_UINT64"
                    }
                ]
            }
        ]
    },
    "instance_group": [
        {
            "name": "tacotron2_decoder_postnet_0",
            "kind": "KIND_GPU",
            "count": 1,
            "gpus": [
                0
            ],
            "profile": []
        }
    ],
    "default_model_filename": "",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {
        "max_execution_batch_size": {
            "string_value": "8"
        },
        "max_input_length": {
            "string_value": "400"
        },
        "chunk_length": {
            "string_value": "80"
        },
        "attention_dimension": {
            "string_value": "128"
        },
        "num_samples_per_frame": {
            "string_value": "256"
        },
        "z_dim0": {
            "string_value": "8"
        },
        "num_mels": {
            "string_value": "80"
        },
        "encoding_dimension": {
            "string_value": "512"
        },
        "tacotron_decoder_engine": {
            "string_value": "/data/models/tacotron2_decoder_postnet/1/model.plan"
        },
        "z_dim1": {
            "string_value": "2656"
        }
    },
    "model_warmup": [],
    "model_transaction_policy": {
        "decoupled": true
    }
}
/opt/jarvis/bin/start-jarvis: line 4:    50 Segmentation fault      tritonserver --log-verbose=0 --strict-model-config=true $model_repos --cuda-memory-pool-byte-size=0:1000000000
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]

  > Triton server died before reaching ready state. Terminating Jarvis startup.
Check Triton logs with: docker logs 

Did you ever get any update on this, @ruze55?
I just started working with Jarvis and am facing the same problems with an RTX 3070.

Thanks

No, I never did, and I gave up. I tried running the operations both inside and outside the Docker containers (basically replicating the process natively instead of using the Dockerized version), and that yielded the same error. There may be some hardware differences between laptop and desktop GPUs that NVIDIA is not taking into account here.

Please try with Riva version 1.7 or 1.8 as the memory requirements may be somewhat reduced for optimizing models for inference. With that said, we recommend a minimum of 16GB of VRAM for Riva deployments.
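To compare a card against that recommendation, something like the sketch below may help. It relies on `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`, which reports memory in MiB. The helper name `has_enough_vram` and the default threshold of 15000 MiB are assumptions made for illustration (cards sold as 16 GB can report somewhat less than 16384 MiB); neither comes from the Jarvis/Riva documentation.

```shell
#!/bin/sh
# Sketch only: has_enough_vram is a hypothetical helper, and the 15000 MiB
# default is an assumed stand-in for the "16 GB" recommendation.
has_enough_vram() {
    total_mib="$1"              # total GPU memory in MiB
    min_mib="${2:-15000}"       # minimum to accept, in MiB
    [ "$total_mib" -ge "$min_mib" ]
}

# Query each GPU only if nvidia-smi is installed on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits |
    while IFS=, read -r name total; do
        total="$(echo "$total" | tr -d ' ')"   # strip the space after the comma
        if has_enough_vram "$total"; then
            echo "$name: ${total} MiB, meets the assumed minimum"
        else
            echo "$name: ${total} MiB, below the assumed minimum"
        fi
    done
fi
```

An 8 GB card such as the 3070 discussed here would fall into the second branch, which matches the out-of-memory errors in the logs above.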

I upgraded today to an RTX 3080 and still have the same issue. From further reading of the NVIDIA docs, it looks like there is no problem with the models. The Triton server will fail if there is not enough memory (GPU memory, not RAM).
What more or less solved my issue was to first run ./jarvis_clean.sh.
Then go to the config.sh file and enable services one at a time. For example, enable only TTS first and then run ./jarvis_init.sh. That downloads the files correctly, and once it completes you can start the server without any problem. Then disable TTS, enable NLP, run ./jarvis_init.sh again, and start the server with NLP only.

To conclude, if you have less than 16 GB of GPU memory (I think), you won't be able to start the Triton server with all the services enabled.
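For anyone who wants to script the one-service-at-a-time approach, here is a minimal sketch. `enable_only_service` is a hypothetical helper, not part of the Jarvis scripts, and it assumes config.sh contains the three `service_enabled_asr`, `service_enabled_nlp` and `service_enabled_tts` lines quoted earlier in this thread.

```shell
#!/bin/sh
# Sketch only: enable_only_service is a hypothetical helper. It assumes the
# config file has service_enabled_<name>=true|false lines for asr, nlp, tts.
enable_only_service() {
    wanted="$1"                 # service to enable: asr, nlp, or tts
    config="${2:-config.sh}"    # config file to edit (defaults to ./config.sh)

    case "$wanted" in
        asr|nlp|tts) ;;
        *) echo "usage: enable_only_service asr|nlp|tts [config]" >&2
           return 2 ;;
    esac

    tmp="$(mktemp)" || return 1
    # First force all three services to false, then switch the wanted one
    # back to true. All other lines pass through unchanged.
    if sed -e 's/^service_enabled_asr=.*/service_enabled_asr=false/' \
           -e 's/^service_enabled_nlp=.*/service_enabled_nlp=false/' \
           -e 's/^service_enabled_tts=.*/service_enabled_tts=false/' \
           -e "s/^service_enabled_${wanted}=.*/service_enabled_${wanted}=true/" \
           "$config" > "$tmp"
    then
        # Write back through the original file so its permissions are kept.
        cat "$tmp" > "$config"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Usage would follow the steps described above: run ./jarvis_clean.sh, call `enable_only_service tts config.sh`, then run ./jarvis_init.sh and ./jarvis_start.sh; afterwards repeat with `nlp` in place of `tts`.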