How can I start Riva without an error?

Hello. I want to set up Riva, but I get an error. Do you know what is causing it and how I can solve the problem?

Hardware - GPU (GEFORCE RTX 3060)
Hardware - CPU core i7
Operating System - Ubuntu 20.04
Riva Version - v1.4.0-beta

  1. $ bash riva_init.sh - done.
  2. $ bash riva_start.sh - an error occurred and it timed out.

$ bash riva_start.sh
Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Waiting for Riva server to load all models…retrying in 10 seconds
Health ready check failed.
Check Riva logs with: docker logs riva-speech

$ nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce …    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   37C    P0    14W /  N/A |     10MiB /  5946MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1172      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      1992      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

$ docker logs riva-speech

==========================
=== Riva Speech Skills ===

NVIDIA Release 21.07 (build 25292380)

Copyright (c) 2018-2021, NVIDIA CORPORATION. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for the inference server. NVIDIA recommends the use of the following flags:
nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 …

Riva waiting for Triton server to load all models…retrying in 1 second
I0827 04:44:08.833089 70 metrics.cc:228] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
I0827 04:44:08.881955 70 onnxruntime.cc:1722] TRITONBACKEND_Initialize: onnxruntime
I0827 04:44:08.882453 70 onnxruntime.cc:1732] Triton TRITONBACKEND API version: 1.0
I0827 04:44:08.882457 70 onnxruntime.cc:1738] ‘onnxruntime’ TRITONBACKEND API version: 1.0
I0827 04:44:09.060622 70 pinned_memory_manager.cc:206] Pinned memory pool is created at ‘0x7f6204000000’ with size 268435456
I0827 04:44:09.061990 70 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 1000000000
E0827 04:44:09.082852 70 model_repository_manager.cc:1946] Poll failed for model directory ‘riva-trt-riva_punctuation-nn-bert-base-uncased’: failed to open text file for read /data/models/riva-trt-riva_punctuation-nn-bert-base-uncased/config.pbtxt: No such file or directory
I0827 04:44:09.084114 70 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming:1
I0827 04:44:09.184510 70 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming:1
I0827 04:44:09.184942 70 custom_backend.cc:201] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming_0_0_gpu0 on GPU 0 (8.6) using libtriton_riva_asr_features.so
I0827 04:44:09.284805 70 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline:1
I0827 04:44:09.285018 70 custom_backend.cc:198] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming_0_0_cpu on CPU using libtriton_riva_asr_decoder_cpu.so
W:parameter_parser.cc:106: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:107: Default value will be used
W:parameter_parser.cc:106: Parameter voc_string could not be set from parameters
W:parameter_parser.cc:107: Default value will be used
Riva waiting for Triton server to load all models…retrying in 1 second
I0827 04:44:09.385032 70 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming:1
I0827 04:44:09.385332 70 custom_backend.cc:198] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline_0_0_cpu on CPU using libtriton_riva_asr_vad.so
I0827 04:44:09.452584 70 model_repository_manager.cc:1240] successfully loaded ‘citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline’ version 1
I0827 04:44:09.485357 70 model_repository_manager.cc:1066] loading: riva-trt-citrinet-1024:1
I0827 04:44:09.485657 70 custom_backend.cc:198] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming_0_0_cpu on CPU using libtriton_riva_asr_vad.so
I0827 04:44:09.540997 70 model_repository_manager.cc:1240] successfully loaded ‘citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming’ version 1
I0827 04:44:09.585618 70 model_repository_manager.cc:1066] loading: riva_tokenizer:1
I0827 04:44:09.685859 70 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline:1
I0827 04:44:09.686130 70 custom_backend.cc:198] Creating instance riva_tokenizer_0_0_cpu on CPU using libtriton_riva_nlp_tokenizer.so
I0827 04:44:09.786237 70 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline:1
I0827 04:44:09.786477 70 custom_backend.cc:198] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline_0_0_cpu on CPU using libtriton_riva_asr_decoder_cpu.so
W:parameter_parser.cc:106: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:107: Default value will be used
W:parameter_parser.cc:106: Parameter voc_string could not be set from parameters
W:parameter_parser.cc:107: Default value will be used
I0827 04:44:09.886860 70 custom_backend.cc:201] Creating instance citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline_0_0_gpu0 on GPU 0 (8.6) using libtriton_riva_asr_features.so
I0827 04:44:09.910864 70 model_repository_manager.cc:1240] successfully loaded ‘riva_tokenizer’ version 1
Riva waiting for Triton server to load all models…retrying in 1 second
W0827 04:44:10.836003 70 metrics.cc:292] failed to get power limit for GPU 0: Not Supported
I0827 04:44:11.098119 70 model_repository_manager.cc:1240] successfully loaded ‘citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming’ version 1
I0827 04:44:11.298671 70 model_repository_manager.cc:1240] successfully loaded ‘citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline’ version 1
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
W0827 04:44:12.837705 70 metrics.cc:292] failed to get power limit for GPU 0: Not Supported
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
W0827 04:44:14.841930 70 metrics.cc:292] failed to get power limit for GPU 0: Not Supported
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
I0827 04:44:18.869097 70 model_repository_manager.cc:1240] successfully loaded ‘citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline’ version 1
I0827 04:44:18.869124 70 model_repository_manager.cc:1240] successfully loaded ‘citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming’ version 1
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
I0827 04:44:20.590049 70 plan_backend.cc:384] Creating instance riva-trt-citrinet-1024_0_0_gpu0 on GPU 0 (8.6) using model.plan
Riva waiting for Triton server to load all models…retrying in 1 second
I0827 04:44:21.623610 70 plan_backend.cc:768] Created instance riva-trt-citrinet-1024_0_0_gpu0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0827 04:44:21.630624 70 model_repository_manager.cc:1240] successfully loaded ‘riva-trt-citrinet-1024’ version 1
I0827 04:44:21.631136 70 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming:1
I0827 04:44:21.731656 70 model_repository_manager.cc:1066] loading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline:1
I0827 04:44:21.832078 70 model_repository_manager.cc:1240] successfully loaded ‘citrinet-1024-asr-trt-ensemble-vad-streaming’ version 1
I0827 04:44:21.832416 70 model_repository_manager.cc:1240] successfully loaded ‘citrinet-1024-asr-trt-ensemble-vad-streaming-offline’ version 1
I0827 04:44:21.832599 70 server.cc:504]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0827 04:44:21.832682 70 server.cc:543]
+-------------+-----------------------------------------------------------------+--------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt | | {} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {} |
+-------------+-----------------------------------------------------------------+--------+

I0827 04:44:21.832843 70 server.cc:586]
+----------------------------------------------------------------------------------------------------+---------+--------+
| Model | Version | Status |
+----------------------------------------------------------------------------------------------------+---------+--------+
| citrinet-1024-asr-trt-ensemble-vad-streaming | 1 | READY |
| citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming | 1 | READY |
| citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming | 1 | READY |
| citrinet-1024-asr-trt-ensemble-vad-streaming-offline | 1 | READY |
| citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline | 1 | READY |
| citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline | 1 | READY |
| citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline | 1 | READY |
| citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming | 1 | READY |
| riva-trt-citrinet-1024 | 1 | READY |
| riva_tokenizer | 1 | READY |
+----------------------------------------------------------------------------------------------------+---------+--------+

I0827 04:44:21.833068 70 tritonserver.cc:1658]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.9.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0] | /data/models |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 1000000000 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0827 04:44:21.833095 70 server.cc:234] Waiting for in-flight requests to complete.
I0827 04:44:21.833129 70 model_repository_manager.cc:1099] unloading: riva_tokenizer:1
I0827 04:44:21.833216 70 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming:1
I0827 04:44:21.833405 70 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline:1
I0827 04:44:21.833558 70 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline:1
I0827 04:44:21.833727 70 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline:1
I0827 04:44:21.834200 70 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming:1
I0827 04:44:21.834347 70 model_repository_manager.cc:1223] successfully unloaded ‘citrinet-1024-asr-trt-ensemble-vad-streaming-offline’ version 1
I0827 04:44:21.834649 70 model_repository_manager.cc:1099] unloading: riva-trt-citrinet-1024:1
I0827 04:44:21.834740 70 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline:1
I0827 04:44:21.835464 70 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming:1
I0827 04:44:21.835920 70 model_repository_manager.cc:1099] unloading: citrinet-1024-asr-trt-ensemble-vad-streaming:1
I0827 04:44:21.836156 70 server.cc:249] Timeout 30: Found 9 live models and 0 in-flight non-inference requests
I0827 04:44:21.836378 70 model_repository_manager.cc:1223] successfully unloaded ‘citrinet-1024-asr-trt-ensemble-vad-streaming’ version 1
I0827 04:44:21.839485 70 model_repository_manager.cc:1223] successfully unloaded ‘riva_tokenizer’ version 1
I0827 04:44:21.848595 70 model_repository_manager.cc:1223] successfully unloaded ‘citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming’ version 1
I0827 04:44:21.848859 70 model_repository_manager.cc:1223] successfully unloaded ‘citrinet-1024-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline’ version 1
I0827 04:44:21.862569 70 model_repository_manager.cc:1223] successfully unloaded ‘citrinet-1024-asr-trt-ensemble-vad-streaming-feature-extractor-streaming’ version 1
I0827 04:44:21.879011 70 model_repository_manager.cc:1223] successfully unloaded ‘citrinet-1024-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline’ version 1
I0827 04:44:21.879243 70 model_repository_manager.cc:1223] successfully unloaded ‘riva-trt-citrinet-1024’ version 1
I0827 04:44:22.064240 70 model_repository_manager.cc:1223] successfully unloaded ‘citrinet-1024-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline’ version 1
I0827 04:44:22.071154 70 model_repository_manager.cc:1223] successfully unloaded ‘citrinet-1024-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming’ version 1

Riva waiting for Triton server to load all models…retrying in 1 second
I0827 04:44:22.836629 70 server.cc:249] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Triton server died before reaching ready state. Terminating Riva startup.
Check Triton logs with: docker logs
/opt/riva/bin/start-riva: line 1: kill: (70) - No such process

Thanks.
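
For anyone reading a similar log: the only `E` line in the `docker logs riva-speech` output above is the poll failure for `riva-trt-riva_punctuation-nn-bert-base-uncased`, whose `config.pbtxt` is missing, and Triton then exits with "failed to load all models". Below is a minimal sketch of two shell helpers, one that filters a log down to its error lines and one that lists model directories without a `config.pbtxt`. The function names `filter_errors` and `list_missing_configs` are hypothetical, and the log-line pattern is only inferred from the lines shown in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Riva or Triton.

# Keep only error lines from a log read on standard input:
# lines such as "E0827 04:44:09.082852 ..." and lines starting with "error:".
filter_errors() {
  grep -E '^(E[0-9]{4} |error:)' || true   # no match is not treated as a failure
}

# Print the name of every immediate subdirectory of $1 that has no config.pbtxt.
list_missing_configs() {
  local repo="$1" dir
  for dir in "$repo"/*/; do
    [ -d "$dir" ] || continue              # skip when the glob matched nothing
    [ -f "${dir}config.pbtxt" ] || basename "$dir"
  done
}
```

Possible usage: `docker logs riva-speech 2>&1 | filter_errors` to see just the failures, and `list_missing_configs /data/models` run wherever the model repository path from the log is visible (the log reports it as `/data/models` inside the container). A directory listed by the second helper was probably left behind by a conversion that did not finish.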

Hi @tohru.fukuhara,
Please refer to the link below to get started with Riva:
https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html
If you run into a specific error, please share more details with us.

Thanks!

Sorry, I posted before I had finished writing. I have now added the error messages. Thanks.

My GPU has only 6 GB of memory. Is this why it doesn't work?
What is the minimum memory required to run it?
I edited config.sh to enable only ASR, but it still seems to fail with an out-of-memory error.

$ bash riva_init.sh
Logging into NGC docker registry if necessary…
Pulling required docker images if necessary…
Note: This may take some time, depending on the speed of your Internet connection.

Pulling Riva Speech Server images.
Image nvcr.io/nvidia/riva/riva-speech:1.4.0-beta-server exists. Skipping.
Image nvcr.io/nvidia/riva/riva-speech-client:1.4.0-beta exists. Skipping.
Image nvcr.io/nvidia/riva/riva-speech:1.4.0-beta-servicemaker exists. Skipping.

Downloading models (RMIRs) from NGC…
Note: this may take some time, depending on the speed of your Internet connection.
To skip this process and use existing RMIRs set the location and corresponding flag in config.sh.

==========================
=== Riva Speech Skills ===

NVIDIA Release devel (build 22382700)

Copyright (c) 2018-2021, NVIDIA CORPORATION. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for the inference server. NVIDIA recommends the use of the following flags:
nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 …

/data/artifacts /opt/riva

Downloading nvidia/riva/rmir_nlp_punctuation_bert_base:1.4.0-beta…
Downloaded 418.11 MB in 1m 55s, Download speed: 3.63 MB/s


Transfer id: rmir_nlp_punctuation_bert_base_v1.4.0-beta Download status: Completed.
Downloaded local path: /data/artifacts/rmir_nlp_punctuation_bert_base_v1.4.0-beta
Total files downloaded: 1
Total downloaded size: 418.11 MB
Started at: 2021-09-08 06:59:59.725328
Completed at: 2021-09-08 07:01:54.932976
Duration taken: 1m 55s

/opt/riva

Converting RMIRs at riva-model-repo/rmir to Riva Model repository.

==========================
=== Riva Speech Skills ===

NVIDIA Release devel (build 22382700)

Copyright (c) 2018-2021, NVIDIA CORPORATION. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for the inference server. NVIDIA recommends the use of the following flags:
nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 …

2021-09-08 07:02:04,755 [INFO] Writing Riva model repository to '/data/models'...
2021-09-08 07:02:04,755 [INFO] The riva model repo target directory is /data/models
2021-09-08 07:02:05,542 [INFO] Extract_binaries for tokenizer -> /data/models/riva_tokenizer/1
2021-09-08 07:02:05,544 [INFO] Extract_binaries for language_model -> /data/models/riva-trt-riva_punctuation-nn-bert-base-uncased/1
2021-09-08 07:02:08,375 [INFO] Printing copied artifacts:
2021-09-08 07:02:08,375 [INFO] {'ckpt': '/data/models/riva-trt-riva_punctuation-nn-bert-base-uncased/1/model_weights.ckpt', 'bert_config_file': '/data/models/riva-trt-riva_punctuation-nn-bert-base-uncased/1/bert-base-uncased_encoder_config.json'}
2021-09-08 07:02:08,375 [INFO] Building TRT engine from PyTorch Checkpoint
[TensorRT] ERROR: …/rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: …/rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/export_bert_pytorch_to_trt.py", line 1050, in <module>
    pytorch_to_trt()
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/export_bert_pytorch_to_trt.py", line 1009, in pytorch_to_trt
    return convert_pytorch_bert_to_trt(
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/export_bert_pytorch_to_trt.py", line 862, in convert_pytorch_bert_to_trt
    with build_engine(
AttributeError: __enter__
2021-09-08 07:02:18,411 [ERROR] Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/cli/deploy.py", line 85, in deploy_from_rmir
    generator.serialize_to_disk(
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/triton.py", line 340, in serialize_to_disk
    module.serialize_to_disk(repo_dir, rmir, config_only, verbose, overwrite)
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/triton.py", line 231, in serialize_to_disk
    self.update_binary(version_dir, rmir, verbose)
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/triton.py", line 573, in update_binary
    bindings = self.build_trt_engine_from_pytorch_bert(
  File "/opt/conda/lib/python3.8/site-packages/servicemaker/triton/triton.py", line 534, in build_trt_engine_from_pytorch_bert
    raise Exception("convert_pytorch_to_trt failed.")
Exception: convert_pytorch_to_trt failed.

+ echo
+ echo 'Riva initialization complete. Run ./riva_start.sh to launch services.'
Riva initialization complete. Run ./riva_start.sh to launch services.

Hi @tohru.fukuhara,
Although we state that a minimum of 16 GB is required, even deploying an individual model needs at least 12 GB, because ServiceMaker can use up to 10 GB while riva_init.sh runs and frees it only after initialization completes.
From the log, it seems the failure occurred at the riva-deploy stage, which is itself part of riva_init.sh, so 12 GB would be recommended there as well.
Thanks!
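
To make the numbers above concrete, here is a minimal sketch of a pre-flight check that compares the GPU's total memory with a required minimum before `riva_init.sh` is run. The helper names `has_enough_gpu_memory` and `gpu_total_mib` are hypothetical, and the default threshold of 12288 MiB is just one reading of the 12 GB figure in this reply. The `nvidia-smi` output earlier in the thread shows a nominal 6 GB card reporting 5946MiB, so a nominal 12 GB card may also report slightly less than 12288 MiB and the threshold may need lowering.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the default threshold are assumptions.

# Succeed when total memory ($1, in MiB) is at least the required amount
# ($2, in MiB; defaults to 12288, i.e. 12 * 1024).
has_enough_gpu_memory() {
  local total_mib="$1" required_mib="${2:-12288}"
  [ "$total_mib" -ge "$required_mib" ]
}

# Print the total memory of the first GPU in MiB, as reported by nvidia-smi.
gpu_total_mib() {
  nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1
}
```

Possible usage: `has_enough_gpu_memory "$(gpu_total_mib)" && bash riva_init.sh`. With the 5946 MiB reported in this thread the check fails, which matches the out-of-memory error seen while the TensorRT engine was being built.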

I understand the recommended amount of memory. Thank you for the information. Do I still need 12 GB if I comment out everything other than ASR in config.sh?
Thanks.