Riva start can't download all models on DGX Spark

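Running riva_start.sh from the Riva 2.19.0 quickstart on a DGX Spark (GB10) fails because not all models are available. The health check times out, and docker logs riva-speech shows that the four TensorRT models (riva-trt-*) cannot be loaded because their model.plan files are missing from /data/models.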

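Hardware - GPU: NVIDIA GB10 (DGX Spark)
Hardware - CPU:
Operating System:
Riva Version: 2.19.0 (riva_quickstart_v2.19.0; the container reports NVIDIA Release 25.02)
TLT Version (if relevant):
How to reproduce the issue: run bash riva_start.sh in the quickstart directory. The command output and the output of docker logs riva-speech are below.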
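Every model that fails in the log below is a TensorRT model (riva-trt-*) whose model.plan file is missing under /data/models, the model repository path inside the container. A quick way to see which plan files are actually present is to scan the repository directly. This is a minimal sketch and a hypothetical helper (the file and function names are not part of Riva or Triton). It assumes Triton's repository layout (<repo>/<model>/config.pbtxt with numeric version subdirectories) and the default TensorRT file name model.plan, and it should be pointed at the host-side directory or volume that backs /data/models.

```python
"""Hypothetical helper: list TensorRT models in a Triton/Riva model repository that lack a plan file."""
import re
import sys
from pathlib import Path

# A Triton config.pbtxt selects TensorRT either via the backend name or the legacy platform name.
TRT_CONFIG = re.compile(r'^\s*(backend:\s*"tensorrt"|platform:\s*"tensorrt_plan")', re.MULTILINE)


def missing_plans(repo: Path) -> list[str]:
    """Return '<model>/<version>/model.plan' for each TensorRT model version whose plan file is absent."""
    missing = []
    for config in sorted(repo.glob("*/config.pbtxt")):
        if not TRT_CONFIG.search(config.read_text(errors="replace")):
            continue  # ONNX, Riva pipeline and other backends are skipped
        model_dir = config.parent
        # Triton treats numeric subdirectories as model versions.
        versions = sorted(
            (d for d in model_dir.iterdir() if d.is_dir() and d.name.isdigit()),
            key=lambda d: int(d.name),
        )
        if not versions:
            missing.append(f"{model_dir.name}/<no version directory>")
        for version in versions:
            # Assumes the default TensorRT file name; a config could override it via default_model_filename.
            if not (version / "model.plan").is_file():
                missing.append(f"{model_dir.name}/{version.name}/model.plan")
    return missing


if __name__ == "__main__":
    repo = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("/data/models")
    if not repo.is_dir():
        print(f"not a directory: {repo}", file=sys.stderr)
    else:
        for entry in missing_plans(repo):
            print("missing:", entry)
```

Run it as python3 find_missing_plans.py <repository path> (file name assumed); with no argument it defaults to /data/models, which only exists inside the container.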

Here is the log from running riva_start.sh:

(base) a@spark-c3a9:/models/riva/riva_quickstart_v2.19.0$ bash riva_start.sh
Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Riva server to load all models...retrying in 10 seconds
[the line above is repeated 30 times in total]
Health ready check failed.
Check Riva logs with: docker logs riva-speech
(base) a@spark-c3a9:/models/riva/riva_quickstart_v2.19.0$ docker logs riva-speech

==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release 25.02 (build 151443008)
Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:

WARNING: Detected NVIDIA GB10 GPU, which may not yet be supported in this version of the container

> Riva waiting for Triton server to load all models...retrying in 1 second
I1119 04:14:40.737388 141 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x32ee00000' with size 268435456"
I1119 04:14:40.742592 141 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 1000000000"
I1119 04:14:40.780633 141 model_lifecycle.cc:473] "loading: conformer-en-US-asr-offline-asr-bls-ensemble:1"
I1119 04:14:40.780666 141 model_lifecycle.cc:473] "loading: conformer-en-US-asr-streaming-asr-bls-ensemble:1"
I1119 04:14:40.780684 141 model_lifecycle.cc:473] "loading: riva-onnx-fastpitch_encoder-English-US:1"
I1119 04:14:40.780701 141 model_lifecycle.cc:473] "loading: riva-punctuation-en-US:1"
I1119 04:14:40.780714 141 model_lifecycle.cc:473] "loading: riva-trt-conformer-en-US-asr-offline-am-streaming-offline:1"
I1119 04:14:40.780725 141 model_lifecycle.cc:473] "loading: riva-trt-conformer-en-US-asr-streaming-am-streaming:1"
I1119 04:14:40.780733 141 model_lifecycle.cc:473] "loading: riva-trt-hifigan-English-US:1"
I1119 04:14:40.780742 141 model_lifecycle.cc:473] "loading: riva-trt-riva-punctuation-en-US-nn-bert-base-uncased:1"
I1119 04:14:40.780754 141 model_lifecycle.cc:473] "loading: spectrogram_chunker-English-US:1"
I1119 04:14:40.780769 141 model_lifecycle.cc:473] "loading: tts_postprocessor-English-US:1"
I1119 04:14:40.780789 141 model_lifecycle.cc:473] "loading: tts_preprocessor-English-US:1"
I1119 04:14:41.045152 141 pipeline_library.cc:24] "TRITONBACKEND_ModelInitialize: riva-punctuation-en-US (version 1)"
I1119 04:14:41.045638 141 backend_model.cc:303] "model configuration:\n{\n \"name\": \"riva-punctuation-en-US\",\n \"platform\": \"\",\n \"backend\": \"riva_nlp_pipeline\",\n \"runtime\": \"\",\n \"version_policy\": {\n \"latest\": {\n \"num_versions\": 1\n }\n },\n \"max_batch_size\": 8,\n \"input\": [\n {\n \"name\": \"PIPELINE_INPUT\",\n \"data_type\": \"TYPE_STRING\",\n \"format\": \"FORMAT_NONE\",\n \"dims\": [\n 1\n ],\n \"is_shape_tensor\": false,\n \"allow_ragged_batch\": false,\n \"optional\": false,\n \"is_non_linear_format_io\": false\n }\n ],\n \"output\": [\n {\n \"name\": \"PIPELINE_OUTPUT\",\n \"data_type\": \"TYPE_STRING\",\n \"dims\": [\n 1\n ],\n \"label_filename\": \"\",\n \"is_shape_tensor\": false,\n \"is_non_linear_format_io\": false\n }\n ],\n \"batch_input\": [],\n \"batch_output\": [],\n \"optimization\": {\n \"priority\": \"PRIORITY_DEFAULT\",\n \"input_pinned_memory\": {\n \"enable\": true\n },\n \"output_pinned_memory\": {\n \"enable\": true\n },\n \"gather_kernel_buffer_threshold\": 0,\n \"eager_batching\": false\n },\n \"instance_group\": [\n {\n \"name\": \"riva-punctuation-en-US_0\",\n \"kind\": \"KIND_CPU\",\n \"count\": 1,\n \"gpus\": [],\n \"secondary_devices\": [],\n \"profile\": [],\n \"passive\": false,\n \"host_policy\": \"\"\n }\n ],\n \"default_model_filename\": \"\",\n \"cc_model_filenames\": {},\n \"metric_tags\": {},\n \"parameters\": {\n \"attn_mask_tensor_name\": {\n \"string_value\": \"attention_mask\"\n },\n \"punct_logits_tensor_name\": {\n \"string_value\": \"punct_logits\"\n },\n \"model_api\": {\n \"string_value\": \"PunctuateText\"\n },\n \"language_code\": {\n \"string_value\": \"en-US\"\n },\n \"eos_token\": {\n \"string_value\": \"[SEP]\"\n },\n \"unk_token\": {\n \"string_value\": \"[UNK]\"\n },\n \"to_lower\": {\n \"string_value\": \"true\"\n },\n \"punctuation_mapping_path\": {\n \"string_value\": \"fe160f3a917d411b99852e509e3279a3_punct_label_ids.csv\"\n },\n \"load_model\": {\n \"string_value\": \"false\"\n },\n \"tokenizer\": {\n \"string_value\": \"wordpiece\"\n },\n \"delimiter\": {\n \"string_value\": \" \"\n },\n \"token_type_tensor_name\": {\n \"string_value\": \"token_type_ids\"\n },\n \"remove_spaces\": {\n \"string_value\": \"False\"\n },\n \"capitalization_mapping_path\": {\n \"string_value\": \"a4ed235fb32c44e58eab5854d3cd94f8_capit_label_ids.csv\"\n },\n \"model_family\": {\n \"string_value\": \"riva\"\n },\n \"tokenizer_to_lower\": {\n \"string_value\": \"true\"\n },\n \"pipeline_type\": {\n \"string_value\": \"punctuation\"\n },\n \"code_point_filename\": {\n \"string_value\": \"cp_data.json\"\n },\n \"unicode_normalize\": {\n \"string_value\": \"False\"\n },\n \"pad_chars_with_spaces\": {\n \"string_value\": \"False\"\n },\n \"use_int64_nn_inputs\": {\n \"string_value\": \"False\"\n },\n \"vocab\": {\n \"string_value\": \"f92889b136d2433693cb9127e1aea218_vocab.txt\"\n },\n \"bos_token\": {\n \"string_value\": \"[CLS]\"\n },\n \"capit_logits_tensor_name\": {\n \"string_value\": \"capit_logits\"\n },\n \"model_name\": {\n \"string_value\": \"riva-trt-riva-punctuation-en-US-nn-bert-base-uncased\"\n },\n \"preserve_accents\": {\n \"string_value\": \"false\"\n },\n \"input_ids_tensor_name\": {\n \"string_value\": \"input_ids\"\n }\n },\n \"model_warmup\": []\n}"
I1119 04:14:41.045717 141 pipeline_library.cc:28] "TRITONBACKEND_ModelInstanceInitialize: riva-punctuation-en-US_0_0 (device 0)"
I1119 04:14:41.062678 141 onnxruntime.cc:2718] "TRITONBACKEND_Initialize: onnxruntime"
I1119 04:14:41.062704 141 onnxruntime.cc:2728] "Triton TRITONBACKEND API version: 1.19"
I1119 04:14:41.062707 141 onnxruntime.cc:2734] "'onnxruntime' TRITONBACKEND API version: 1.16"
I1119 04:14:41.062709 141 onnxruntime.cc:2764] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"false\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I1119 04:14:41.074151 141 pipeline_library.cc:22] "TRITONBACKEND_ModelInitialize: conformer-en-US-asr-streaming-asr-bls-ensemble (version 1)"
Found yaml file: /data/models/conformer-en-US-asr-streaming-asr-bls-ensemble/1/riva_bls_config.yaml
I1119 04:14:41.077293 141 pipeline_library.cc:22] "TRITONBACKEND_ModelInitialize: conformer-en-US-asr-offline-asr-bls-ensemble (version 1)"
Found yaml file: /data/models/conformer-en-US-asr-offline-asr-bls-ensemble/1/riva_bls_config.yaml
I1119 04:14:41.080170 141 onnxruntime.cc:2829] "TRITONBACKEND_ModelInitialize: riva-onnx-fastpitch_encoder-English-US (version 1)"
I1119 04:14:41.080738 141 onnxruntime.cc:2894] "TRITONBACKEND_ModelInstanceInitialize: riva-onnx-fastpitch_encoder-English-US_0_0 (GPU device 0)"
I1119 04:14:41.101297 141 backend_model.cc:303] "model configuration:\n{\n \"name\": \"conformer-en-US-asr-streaming-asr-bls-ensemble\",\n \"platform\": \"\",\n \"backend\": \"riva_asr_ensemble_pipeline\",\n \"runtime\": \"\",\n \"version_policy\": {\n \"latest\": {\n \"num_versions\": 1\n }\n },\n \"max_batch_size\": 1024,\n \"input\": [\n {\n \"name\": \"PIPELINE_INPUT\",\n \"data_type\": \"TYPE_STRING\",\n \"format\": \"FORMAT_NONE\",\n \"dims\": [\n 1\n ],\n \"is_shape_tensor\": false,\n \"allow_ragged_batch\": false,\n \"optional\": false,\n \"is_non_linear_format_io\": false\n }\n ],\n \"output\": [\n {\n \"name\": \"PIPELINE_OUTPUT\",\n \"data_type\": \"TYPE_STRING\",\n \"dims\": [\n 1\n ],\n \"label_filename\": \"\",\n \"is_shape_tensor\": false,\n \"is_non_linear_format_io\": false\n }\n ],\n \"batch_input\": [],\n \"batch_output\": [],\n \"optimization\": {\n \"graph\": {\n \"level\": 0\n },\n \"priority\": \"PRIORITY_DEFAULT\",\n \"cuda\": {\n \"graphs\": false,\n \"busy_wait_events\": false,\n \"graph_spec\": [],\n \"output_copy_stream\": true\n },\n \"input_pinned_memory\": {\n \"enable\": true\n },\n \"output_pinned_memory\": {\n \"enable\": true\n },\n \"gather_kernel_buffer_threshold\": 0,\n \"eager_batching\": false\n },\n \"sequence_batching\": {\n \"oldest\": {\n \"max_candidate_sequences\": 1024,\n \"preferred_batch_size\": [\n 64,\n 128\n ],\n \"max_queue_delay_microseconds\": 1000,\n \"preserve_ordering\": false\n },\n \"max_sequence_idle_microseconds\": 60000000,\n \"control_input\": [\n {\n \"name\": \"START\",\n \"control\": [\n {\n \"kind\": \"CONTROL_SEQUENCE_START\",\n \"int32_false_true\": [\n 0,\n 1\n ],\n \"fp32_false_true\": [],\n \"bool_false_true\": [],\n \"data_type\": \"TYPE_INVALID\"\n }\n ]\n },\n {\n \"name\": \"READY\",\n \"control\": [\n {\n \"kind\": \"CONTROL_SEQUENCE_READY\",\n \"int32_false_true\": [\n 0,\n 1\n ],\n \"fp32_false_true\": [],\n \"bool_false_true\": [],\n \"data_type\": \"TYPE_INVALID\"\n }\n ]\n },\n {\n \"name\": \"END\",\n \"control\": [\n {\n \"kind\": \"CONTROL_SEQUENCE_END\",\n \"int32_false_true\": [\n 0,\n 1\n ],\n \"fp32_false_true\": [],\n \"bool_false_true\": [],\n \"data_type\": \"TYPE_INVALID\"\n }\n ]\n },\n {\n \"name\": \"CORRID\",\n \"control\": [\n {\n \"kind\": \"CONTROL_SEQUENCE_CORRID\",\n \"int32_false_true\": [],\n \"fp32_false_true\": [],\n \"bool_false_true\": [],\n \"data_type\": \"TYPE_UINT64\"\n }\n ]\n }\n ],\n \"state\": [],\n \"iterative_sequence\": false\n },\n \"instance_group\": [\n {\n \"name\": \"conformer-en-US-asr-streaming-asr-bls-ensemble_0\",\n \"kind\": \"KIND_CPU\",\n \"count\": 1,\n \"gpus\": [],\n \"secondary_devices\": [],\n \"profile\": [],\n \"passive\": false,\n \"host_policy\": \"\"\n }\n ],\n \"default_model_filename\": \"\",\n \"cc_model_filenames\": {},\n \"metric_tags\": {},\n \"parameters\": {\n \"offline\": {\n \"string_value\": \"False\"\n },\n \"language_code\": {\n \"string_value\": \"en-US\"\n },\n \"type\": {\n \"string_value\": \"online\"\n },\n \"yaml_parameters_file\": {\n \"string_value\": \"riva_bls_config.yaml\"\n },\n \"streaming\": {\n \"string_value\": \"True\"\n },\n \"sample_rate\": {\n \"string_value\": \"16000\"\n },\n \"model_family\": {\n \"string_value\": \"riva\"\n }\n },\n \"model_warmup\": [],\n \"model_transaction_policy\": {\n \"decoupled\": true\n }\n}"
I1119 04:14:41.101388 141 backend_model.cc:303] "model configuration:\n{\n \"name\": \"conformer-en-US-asr-offline-asr-bls-ensemble\",\n \"platform\": \"\",\n \"backend\": \"riva_asr_ensemble_pipeline\",\n \"runtime\": \"\",\n \"version_policy\": {\n \"latest\": {\n \"num_versions\": 1\n }\n },\n \"max_batch_size\": 1024,\n \"input\": [\n {\n \"name\": \"PIPELINE_INPUT\",\n \"data_type\": \"TYPE_STRING\",\n \"format\": \"FORMAT_NONE\",\n \"dims\": [\n 1\n ],\n \"is_shape_tensor\": false,\n \"allow_ragged_batch\": false,\n \"optional\": false,\n \"is_non_linear_format_io\": false\n }\n ],\n \"output\": [\n {\n \"name\": \"PIPELINE_OUTPUT\",\n \"data_type\": \"TYPE_STRING\",\n \"dims\": [\n 1\n ],\n \"label_filename\": \"\",\n \"is_shape_tensor\": false,\n \"is_non_linear_format_io\": false\n }\n ],\n \"batch_input\": [],\n \"batch_output\": [],\n \"optimization\": {\n \"graph\": {\n \"level\": 0\n },\n \"priority\": \"PRIORITY_DEFAULT\",\n \"cuda\": {\n \"graphs\": false,\n \"busy_wait_events\": false,\n \"graph_spec\": [],\n \"output_copy_stream\": true\n },\n \"input_pinned_memory\": {\n \"enable\": true\n },\n \"output_pinned_memory\": {\n \"enable\": true\n },\n \"gather_kernel_buffer_threshold\": 0,\n \"eager_batching\": false\n },\n \"sequence_batching\": {\n \"oldest\": {\n \"max_candidate_sequences\": 1024,\n \"preferred_batch_size\": [\n 64,\n 128\n ],\n \"max_queue_delay_microseconds\": 1000,\n \"preserve_ordering\": false\n },\n \"max_sequence_idle_microseconds\": 60000000,\n \"control_input\": [\n {\n \"name\": \"START\",\n \"control\": [\n {\n \"kind\": \"CONTROL_SEQUENCE_START\",\n \"int32_false_true\": [\n 0,\n 1\n ],\n \"fp32_false_true\": [],\n \"bool_false_true\": [],\n \"data_type\": \"TYPE_INVALID\"\n }\n ]\n },\n {\n \"name\": \"READY\",\n \"control\": [\n {\n \"kind\": \"CONTROL_SEQUENCE_READY\",\n \"int32_false_true\": [\n 0,\n 1\n ],\n \"fp32_false_true\": [],\n \"bool_false_true\": [],\n \"data_type\": \"TYPE_INVALID\"\n }\n ]\n },\n {\n \"name\": \"END\",\n \"control\": [\n {\n \"kind\": \"CONTROL_SEQUENCE_END\",\n \"int32_false_true\": [\n 0,\n 1\n ],\n \"fp32_false_true\": [],\n \"bool_false_true\": [],\n \"data_type\": \"TYPE_INVALID\"\n }\n ]\n },\n {\n \"name\": \"CORRID\",\n \"control\": [\n {\n \"kind\": \"CONTROL_SEQUENCE_CORRID\",\n \"int32_false_true\": [],\n \"fp32_false_true\": [],\n \"bool_false_true\": [],\n \"data_type\": \"TYPE_UINT64\"\n }\n ]\n }\n ],\n \"state\": [],\n \"iterative_sequence\": false\n },\n \"instance_group\": [\n {\n \"name\": \"conformer-en-US-asr-offline-asr-bls-ensemble_0\",\n \"kind\": \"KIND_CPU\",\n \"count\": 1,\n \"gpus\": [],\n \"secondary_devices\": [],\n \"profile\": [],\n \"passive\": false,\n \"host_policy\": \"\"\n }\n ],\n \"default_model_filename\": \"\",\n \"cc_model_filenames\": {},\n \"metric_tags\": {},\n \"parameters\": {\n \"streaming\": {\n \"string_value\": \"True\"\n },\n \"type\": {\n \"string_value\": \"offline\"\n },\n \"sample_rate\": {\n \"string_value\": \"16000\"\n },\n \"offline\": {\n \"string_value\": \"True\"\n },\n \"language_code\": {\n \"string_value\": \"en-US\"\n },\n \"model_family\": {\n \"string_value\": \"riva\"\n },\n \"yaml_parameters_file\": {\n \"string_value\": \"riva_bls_config.yaml\"\n }\n },\n \"model_warmup\": [],\n \"model_transaction_policy\": {\n \"decoupled\": true\n }\n}"
I1119 04:14:41.101419 141 pipeline_library.cc:26] "TRITONBACKEND_ModelInstanceInitialize: conformer-en-US-asr-streaming-asr-bls-ensemble_0_0 (device 0)"
I1119 04:14:41.101488 141 pipeline_library.cc:26] "TRITONBACKEND_ModelInstanceInitialize: conformer-en-US-asr-offline-asr-bls-ensemble_0_0 (device 0)"
I1119 04:14:41.211051 141 model_lifecycle.cc:849] "successfully loaded 'riva-punctuation-en-US'"
I1119 04:14:41.228707 141 tensorrt.cc:65] "TRITONBACKEND_Initialize: tensorrt"
I1119 04:14:41.228728 141 tensorrt.cc:75] "Triton TRITONBACKEND API version: 1.19"
I1119 04:14:41.228730 141 tensorrt.cc:81] "'tensorrt' TRITONBACKEND API version: 1.19"
I1119 04:14:41.228733 141 tensorrt.cc:105] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"false\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I1119 04:14:41.233863 141 tensorrt.cc:231] "TRITONBACKEND_ModelInitialize: riva-trt-conformer-en-US-asr-offline-am-streaming-offline (version 1)"
I1119 04:14:41.234178 141 tensorrt.cc:297] "TRITONBACKEND_ModelInstanceInitialize: riva-trt-conformer-en-US-asr-offline-am-streaming-offline_0_0 (GPU device 0)"
I1119 04:14:41.239240 141 tensorrt.cc:353] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
E1119 04:14:41.239248 141 backend_model.cc:692] "ERROR: Failed to create instance: unable to find '/data/models/riva-trt-conformer-en-US-asr-offline-am-streaming-offline/1/model.plan' for model instance 'riva-trt-conformer-en-US-asr-offline-am-streaming-offline_0_0'"
I1119 04:14:41.239253 141 tensorrt.cc:274] "TRITONBACKEND_ModelFinalize: delete model state"
E1119 04:14:41.239268 141 model_lifecycle.cc:654] "failed to load 'riva-trt-conformer-en-US-asr-offline-am-streaming-offline' version 1: Unavailable: unable to find '/data/models/riva-trt-conformer-en-US-asr-offline-am-streaming-offline/1/model.plan' for model instance 'riva-trt-conformer-en-US-asr-offline-am-streaming-offline_0_0'"
I1119 04:14:41.239275 141 model_lifecycle.cc:789] "failed to load 'riva-trt-conformer-en-US-asr-offline-am-streaming-offline'"
I1119 04:14:41.244287 141 tensorrt.cc:231] "TRITONBACKEND_ModelInitialize: riva-trt-conformer-en-US-asr-streaming-am-streaming (version 1)"
I1119 04:14:41.244483 141 tensorrt.cc:297] "TRITONBACKEND_ModelInstanceInitialize: riva-trt-conformer-en-US-asr-streaming-am-streaming_0_0 (GPU device 0)"
I1119 04:14:41.249308 141 tensorrt.cc:353] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
E1119 04:14:41.249314 141 backend_model.cc:692] "ERROR: Failed to create instance: unable to find '/data/models/riva-trt-conformer-en-US-asr-streaming-am-streaming/1/model.plan' for model instance 'riva-trt-conformer-en-US-asr-streaming-am-streaming_0_0'"
I1119 04:14:41.249317 141 tensorrt.cc:274] "TRITONBACKEND_ModelFinalize: delete model state"
E1119 04:14:41.249326 141 model_lifecycle.cc:654] "failed to load 'riva-trt-conformer-en-US-asr-streaming-am-streaming' version 1: Unavailable: unable to find '/data/models/riva-trt-conformer-en-US-asr-streaming-am-streaming/1/model.plan' for model instance 'riva-trt-conformer-en-US-asr-streaming-am-streaming_0_0'"
I1119 04:14:41.249330 141 model_lifecycle.cc:789] "failed to load 'riva-trt-conformer-en-US-asr-streaming-am-streaming'"
I1119 04:14:41.254355 141 tensorrt.cc:231] "TRITONBACKEND_ModelInitialize: riva-trt-hifigan-English-US (version 1)"
I1119 04:14:41.254514 141 backend_model.cc:281] "Overriding execution policy to \"TRITONBACKEND_EXECUTION_BLOCKING\" for sequence model \"riva-trt-hifigan-English-US\""
I1119 04:14:41.254520 141 tensorrt.cc:297] "TRITONBACKEND_ModelInstanceInitialize: riva-trt-hifigan-English-US_0_0 (GPU device 0)"
I1119 04:14:41.260219 141 tensorrt.cc:353] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
E1119 04:14:41.260235 141 backend_model.cc:692] "ERROR: Failed to create instance: unable to find '/data/models/riva-trt-hifigan-English-US/1/model.plan' for model instance 'riva-trt-hifigan-English-US_0_0'"
I1119 04:14:41.260240 141 tensorrt.cc:274] "TRITONBACKEND_ModelFinalize: delete model state"
E1119 04:14:41.260258 141 model_lifecycle.cc:654] "failed to load 'riva-trt-hifigan-English-US' version 1: Unavailable: unable to find '/data/models/riva-trt-hifigan-English-US/1/model.plan' for model instance 'riva-trt-hifigan-English-US_0_0'"
I1119 04:14:41.260263 141 model_lifecycle.cc:789] "failed to load 'riva-trt-hifigan-English-US'"
I1119 04:14:41.265299 141 tensorrt.cc:231] "TRITONBACKEND_ModelInitialize: riva-trt-riva-punctuation-en-US-nn-bert-base-uncased (version 1)"
I1119 04:14:41.265547 141 tensorrt.cc:297] "TRITONBACKEND_ModelInstanceInitialize: riva-trt-riva-punctuation-en-US-nn-bert-base-uncased_0_0 (GPU device 0)"
I1119 04:14:41.270502 141 tensorrt.cc:353] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
E1119 04:14:41.270507 141 backend_model.cc:692] "ERROR: Failed to create instance: unable to find '/data/models/riva-trt-riva-punctuation-en-US-nn-bert-base-uncased/1/model.plan' for model instance 'riva-trt-riva-punctuation-en-US-nn-bert-base-uncased_0_0'"
I1119 04:14:41.270511 141 tensorrt.cc:274] "TRITONBACKEND_ModelFinalize: delete model state"
E1119 04:14:41.270520 141 model_lifecycle.cc:654] "failed to load 'riva-trt-riva-punctuation-en-US-nn-bert-base-uncased' version 1: Unavailable: unable to find '/data/models/riva-trt-riva-punctuation-en-US-nn-bert-base-uncased/1/model.plan' for model instance 'riva-trt-riva-punctuation-en-US-nn-bert-base-uncased_0_0'"
I1119 04:14:41.270524 141 model_lifecycle.cc:789] "failed to load 'riva-trt-riva-punctuation-en-US-nn-bert-base-uncased'"
I1119 04:14:41.275984 141 spectrogram-chunker.cc:274] "TRITONBACKEND_ModelInitialize: spectrogram_chunker-English-US (version 1)"
I1119 04:14:41.276357 141 spectrogram-chunker.cc:276] "TRITONBACKEND_ModelInstanceInitialize: spectrogram_chunker-English-US_0_0 (device 0)"
I1119 04:14:41.276672 141 model_lifecycle.cc:849] "successfully loaded 'spectrogram_chunker-English-US'"
I1119 04:14:41.281986 141 tts-postprocessor.cc:308] "TRITONBACKEND_ModelInitialize: tts_postprocessor-English-US (version 1)"
I1119 04:14:41.282276 141 tts-postprocessor.cc:310] "TRITONBACKEND_ModelInstanceInitialize: tts_postprocessor-English-US_0_0 (device 0)"
I1119 04:14:41.293689 141 model_lifecycle.cc:849] "successfully loaded 'tts_postprocessor-English-US'"
I1119 04:14:41.305631 141 pipeline_library.cc:22] "TRITONBACKEND_ModelInitialize: tts_preprocessor-English-US (version 1)"
I1119 04:14:41.306160 141 pipeline_library.cc:26] "TRITONBACKEND_ModelInstanceInitialize: tts_preprocessor-English-US_0_0 (device 0)"
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1119 04:14:41.306182 146 normalizer.cc:66] Proto String: tokenizer_grammar: "tokenizer.ascii_proto"
verbalizer_grammar: "verbalizer.ascii_proto"
sentence_boundary_regexp: "[\\.:!\\?] "
sentence_boundary_exceptions_file: "sentence_boundary_exceptions.txt"
2025-11-19 04:14:41.346024669 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-11-19 04:14:41.346046093 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
I1119 04:14:41.368024 146 abstract-grm-manager.h:168] Updating FST 0xe7d6ed0eb560 with input label sorted version.
I1119 04:14:41.400485 146 abstract-grm-manager.h:168] Updating FST 0xe7d8b825d290 with input label sorted version.
I1119 04:14:41.402129 146 abstract-grm-manager.h:168] Updating FST 0xe7d8b84bfb30 with input label sorted version.
I1119 04:14:41.402230 146 preprocessor.cc:279] TTS character mapping loaded from /data/models/tts_preprocessor-English-US/1/mapping.txt
I1119 04:14:41.402258 146 preprocessor.cc:357] Abbreviation mapping loaded from /data/models/tts_preprocessor-English-US/1/abbr.txt
> Riva waiting for Triton server to load all models...retrying in 1 second
I1119 04:14:41.435352 141 model_lifecycle.cc:849] "successfully loaded 'riva-onnx-fastpitch_encoder-English-US'"
I1119 04:14:41.453712 146 preprocessor.cc:397] TTS phonetic mapping loaded from /data/models/tts_preprocessor-English-US/1/ipa_cmudict_single_pron-0.82_nv24.08.txt
I1119 04:14:41.453745 146 preprocessor.cc:529] TTS Preprocessor initialized with 1 languages
I1119 04:14:41.454046 141 model_lifecycle.cc:849] "successfully loaded 'tts_preprocessor-English-US'"
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1119 04:14:41.722193 147 asr_ensemble_factory.cc:278] Loading acoustic model
I1119 04:14:41.722230 147 asr_ensemble_factory.cc:284] Done loading acoustic model
I1119 04:14:41.798803 147 normalizer.cc:66] Proto String: tokenizer_grammar: "tokenizer.ascii_proto"
verbalizer_grammar: "verbalizer.ascii_proto"
sentence_boundary_regexp: "[\\.:!\\?] "
sentence_boundary_exceptions_file: "sentence_boundary_exceptions.txt"
I1119 04:14:41.840123 141 model_lifecycle.cc:849] "successfully loaded 'conformer-en-US-asr-streaming-asr-bls-ensemble'"
I1119 04:14:41.857189 148 asr_ensemble_factory.cc:278] Loading acoustic model
I1119 04:14:41.857213 148 asr_ensemble_factory.cc:284] Done loading acoustic model
I1119 04:14:41.872560 148 normalizer.cc:66] Proto String: tokenizer_grammar: "tokenizer.ascii_proto"
verbalizer_grammar: "verbalizer.ascii_proto"
sentence_boundary_regexp: "[\\.:!\\?] "
sentence_boundary_exceptions_file: "sentence_boundary_exceptions.txt"
I1119 04:14:41.903657 141 model_lifecycle.cc:849] "successfully loaded 'conformer-en-US-asr-offline-asr-bls-ensemble'"
E1119 04:14:41.904077 141 model_repository_manager.cc:703] "Invalid argument: ensemble 'fastpitch_hifigan_ensemble-English-US' depends on 'riva-trt-hifigan-English-US' which has no loaded version. Model 'riva-trt-hifigan-English-US' loading failed with error: version 1 is at UNAVAILABLE state: Unavailable: unable to find '/data/models/riva-trt-hifigan-English-US/1/model.plan' for model instance 'riva-trt-hifigan-English-US_0_0';"
I1119 04:14:41.904124 141 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I1119 04:14:41.904160 141 server.cc:631]
+----------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+----------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| riva_nlp_pipeline | /opt/tritonserver/backends/riva_nlp_pipeline/libtriton_riva_nlp_pipeline.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| riva_asr_ensemble_pipeline | /opt/tritonserver/backends/riva_asr_ensemble_pipeline/libtriton_riva_asr_ensemble_pipeline.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| riva_tts_chunker | /opt/tritonserver/backends/riva_tts_chunker/libtriton_riva_tts_chunker.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| riva_tts_postprocessor | /opt/tritonserver/backends/riva_tts_postprocessor/libtriton_riva_tts_postprocessor.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| riva_tts_pipeline | /opt/tritonserver/backends/riva_tts_pipeline/libtriton_riva_tts_pipeline.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+----------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
I1119 04:14:41.904203 141 server.cc:674]
+-----------------------------------------------------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+-----------------------------------------------------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| conformer-en-US-asr-offline-asr-bls-ensemble | 1 | READY |
| conformer-en-US-asr-streaming-asr-bls-ensemble | 1 | READY |
| riva-onnx-fastpitch_encoder-English-US | 1 | READY |
| riva-punctuation-en-US | 1 | READY |
| riva-trt-conformer-en-US-asr-offline-am-streaming-offline | 1 | UNAVAILABLE: Unavailable: unable to find '/data/models/riva-trt-conformer-en-US-asr-offline-am-streaming-offline/1/model.plan' for model instance 'riva-trt-conformer-en-US-asr-offline-am-streaming-offline_0_0' |
| riva-trt-conformer-en-US-asr-streaming-am-streaming | 1 | UNAVAILABLE: Unavailable: unable to find '/data/models/riva-trt-conformer-en-US-asr-streaming-am-streaming/1/model.plan' for model instance 'riva-trt-conformer-en-US-asr-streaming-am-streaming_0_0' |
| riva-trt-hifigan-English-US | 1 | UNAVAILABLE: Unavailable: unable to find '/data/models/riva-trt-hifigan-English-US/1/model.plan' for model instance 'riva-trt-hifigan-English-US_0_0' |
| riva-trt-riva-punctuation-en-US-nn-bert-base-uncased | 1 | UNAVAILABLE: Unavailable: unable to find '/data/models/riva-trt-riva-punctuation-en-US-nn-bert-base-uncased/1/model.plan' for model instance 'riva-trt-riva-punctuation-en-US-nn-bert-base-uncased_0_0' |
| spectrogram_chunker-English-US | 1 | READY |
| tts_postprocessor-English-US | 1 | READY |
| tts_preprocessor-English-US | 1 | READY |
+-----------------------------------------------------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I1119 04:14:41.942429 141 metrics.cc:890] "Collecting metrics for GPU 0: NVIDIA GB10"
I1119 04:14:41.949441 141 metrics.cc:783] "Collecting CPU metrics"
I1119 04:14:41.949499 141 tritonserver.cc:2598]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.54.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /data/models |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| model_config_name | |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 1000000000 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I1119 04:14:41.949521 141 server.cc:305] "Waiting for in-flight requests to complete."
I1119 04:14:41.949532 141 server.cc:321] "Timeout 30: Found 0 model versions that have in-flight inferences"
I1119 04:14:41.949975 141 server.cc:336] "All models are stopped, unloading models"
I1119 04:14:41.949983 141 server.cc:345] "Timeout 30: Found 7 live models and 0 in-flight non-inference requests"
I1119 04:14:41.950474 141 pipeline_library.cc:31] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
I1119 04:14:41.950695 141 pipeline_library.cc:30] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
I1119 04:14:41.950843 141 onnxruntime.cc:2946] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
I1119 04:14:41.950891 141 pipeline_library.cc:29] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
I1119 04:14:41.951242 141 spectrogram-chunker.cc:279] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
I1119 04:14:41.951280 141 spectrogram-chunker.cc:275] "TRITONBACKEND_ModelFinalize: delete model state"
I1119 04:14:41.952507 141 tts-postprocessor.cc:313] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
I1119 04:14:41.952533 141 pipeline_library.cc:29] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
I1119 04:14:41.952554 141 model_lifecycle.cc:636] "successfully unloaded 'spectrogram_chunker-English-US' version 1"
I1119 04:14:41.963702 141 pipeline_library.cc:27] "TRITONBACKEND_ModelFinalize: delete model state"
I1119 04:14:41.963772 141 model_lifecycle.cc:636] "successfully unloaded 'riva-punctuation-en-US' version 1"
I1119 04:14:41.964476 141 onnxruntime.cc:2870] "TRITONBACKEND_ModelFinalize: delete model state"
I1119 04:14:41.964553 141 model_lifecycle.cc:636] "successfully unloaded 'riva-onnx-fastpitch_encoder-English-US' version 1"
I1119 04:14:41.966142 141 tts-postprocessor.cc:309] "TRITONBACKEND_ModelFinalize: delete model state"
I1119 04:14:41.967078 141 model_lifecycle.cc:636] "successfully unloaded 'tts_postprocessor-English-US' version 1"
I1119 04:14:41.971463 141 pipeline_library.cc:25] "TRITONBACKEND_ModelFinalize: delete model state"
I1119 04:14:41.971526 141 model_lifecycle.cc:636] "successfully unloaded 'tts_preprocessor-English-US' version 1"
I1119 04:14:42.085942 141 pipeline_library.cc:25] "TRITONBACKEND_ModelFinalize: delete model state"
I1119 04:14:42.086430 141 model_lifecycle.cc:636] "successfully unloaded 'conformer-en-US-asr-offline-asr-bls-ensemble' version 1"
I1119 04:14:42.089331 141 pipeline_library.cc:25] "TRITONBACKEND_ModelFinalize: delete model state"
I1119 04:14:42.089647 141 model_lifecycle.cc:636] "successfully unloaded 'conformer-en-US-asr-streaming-asr-bls-ensemble' version 1"
> Riva waiting for Triton server to load all models...retrying in 1 second
I1119 04:14:42.950485 141 server.cc:345] "Timeout 29: Found 0 live models and 0 in-flight non-inference requests"
W1119 04:14:42.951227 141 metrics.cc:644] "Unable to get power limit for GPU 0. Status:Success, value:0.000000"
W1119 04:14:42.951257 141 metrics.cc:725] "Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0"
error: creating server: Internal - failed to load all models
> Riva waiting for Triton server to load all models...retrying in 1 second
W
W
> Riva waiting for Triton server to load all models...retrying in 1 second
> Triton server died before reaching ready state. Terminating Riva startup.

Check Triton logs with: docker logs

/opt/riva/bin/start-riva: line 1: kill: (141) - No such process
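To narrow this down without reading the whole log, a small helper script can list which engine files are missing. This is a hypothetical sketch, not part of the Riva quickstart: the function names `find_missing_plans` and `failed_models_from_log` are made up for illustration. It assumes the TensorRT models are the `riva-trt-*` directories and that each numbered version directory (e.g. `1/`) should contain a `model.plan`, which is the file the UNAVAILABLE errors above report as missing. `/data/models` is the repository path inside the container (`model_repository_path[0]` in the log); where that repository lives on the host depends on your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a Riva/Triton model repository.
# Assumption: TensorRT models live in "riva-trt-*" directories and each
# numbered version directory is expected to hold a model.plan engine.
set -euo pipefail

# Print the path of every expected-but-missing model.plan under a repository.
find_missing_plans() {
  local repo="$1"
  local model_dir version_dir
  for model_dir in "$repo"/riva-trt-*/; do
    [ -d "$model_dir" ] || continue        # glob matched nothing
    for version_dir in "$model_dir"[0-9]*/; do
      [ -d "$version_dir" ] || continue    # no numbered version directories
      if [ ! -f "${version_dir}model.plan" ]; then
        echo "${version_dir}model.plan"
      fi
    done
  done
}

# Read Triton log text on stdin and print each model named in a
# "failed to load '<name>'" message, once, in sorted order.
failed_models_from_log() {
  { grep -o "failed to load '[^']*'" || true; } \
    | sed "s/^failed to load '\(.*\)'\$/\1/" \
    | sort -u
}

# Run the repository scan only when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]:-}" == "$0" ]]; then
  find_missing_plans "${1:-/data/models}"
fi
```

After sourcing the script, `docker logs riva-speech 2>&1 | failed_models_from_log` should print the models Triton refused to load; for a log like the one above that would be the four `riva-trt-*` models. The pattern expects the plain single quotes Triton actually prints, so run it on the raw `docker logs` output rather than on text copied from this post.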

Here is my config.sh:


(base) a@spark-c3a9:/models/riva/riva_quickstart_v2.19.0$ cat config.sh
#!/bin/bash

Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.

NVIDIA CORPORATION and its licensors retain all intellectual property

and proprietary rights in and to this software, related documentation

and any modifications thereto. Any use, reproduction, disclosure or

distribution of this software and related documentation without an express

license agreement from NVIDIA CORPORATION is strictly prohibited.

GPU family of target platform. Supported values: tegra, non-tegra

riva_target_gpu_family=“non-tegra”

Name of tegra platform that is being used. Supported tegra platforms: orin, xavier

riva_tegra_platform=“orin”

####### Enable or Disable Riva Services #######

For any language other than en-US: service_enabled_nlp must be set to false

service_enabled_asr=true
service_enabled_nlp=false
service_enabled_tts=true
service_enabled_nmt=false

####### Configure ASR service #######

# List of supported ASR models and languages for each ASR model
# Language code "multi" means a multilingual model; supported languages for the various multilingual models are
# specified on the ASR Overview page of the NVIDIA Riva documentation
# "DO NOT EDIT" this field. Refer to this for valid values to be set in "asr_acoustic_model" and "asr_language_code" fields

declare -A asr_models_languages_map
asr_models_languages_map["conformer"]="ar-AR en-US en-GB de-DE es-ES es-US fr-FR hi-IN it-IT ja-JP ru-RU ko-KR pt-BR zh-CN nl-NL nl-BE"
asr_models_languages_map["conformer_xl"]="en-US"
asr_models_languages_map["conformer_unified"]="de-DE ja-JP zh-CN"
asr_models_languages_map["conformer_ml_cs"]="es-en-US"
asr_models_languages_map["conformer_unified_ml_cs"]="ja-en-JP"
asr_models_languages_map["parakeet_0.6b"]="en-US"
asr_models_languages_map["parakeet_0.6b_unified"]="en-US zh-CN"
asr_models_languages_map["parakeet_0.6b_unified_ml_cs"]="es-en-US"
asr_models_languages_map["parakeet_1.1b"]="en-US"
asr_models_languages_map["parakeet_1.1b_unified_ml_cs"]="em-ea"
asr_models_languages_map["parakeet_1.1b_unified_ml_cs_universal"]="multi"
asr_models_languages_map["parakeet_1.1b_unified_ml_cs_concat"]="multi"
asr_models_languages_map["parakeet-rnnt_1.1b"]="en-US"
asr_models_languages_map["parakeet-rnnt_1.1b_unified_ml_cs_universal"]="multi"
asr_models_languages_map["whisper_large"]="multi"
asr_models_languages_map["whisper_large_turbo"]="multi"
asr_models_languages_map["distil_whisper_large"]="en-US"
asr_models_languages_map["kotoba_whisper"]="ja-JP"
asr_models_languages_map["canary_1b"]="multi"
asr_models_languages_map["canary_0.6b_turbo"]="multi"

# Specify ASR acoustic model to deploy, as defined in "asr_models_languages_map" above
# Only one ASR acoustic model can be deployed at a time
asr_acoustic_model=("conformer")

# Specify ASR language to deploy, as defined in "asr_models_languages_map" above
# For multiple languages, enter space-separated language codes
asr_language_code=("en-US")

# Specify an ASR accessory model from the list below; a prebuilt model is available only when "asr_acoustic_model" is set to "parakeet_1.1b"
# "diarizer" : deploy ASR model with Speaker Diarization model
# "silero" : deploy ASR model with Silero Voice Activity Detector (VAD) model
# "tele" : deploy ASR model trained with channel-robust (telephony) data
# Only one ASR accessory model can be deployed at a time
asr_accessory_model=("")

# Set this field to true to deploy ASR with the greedy decoder instead of the flashlight decoder
use_asr_greedy_decoder=false

# Set this to true to deploy streaming ASR in high-throughput mode instead of low-latency mode
use_asr_streaming_throughput_mode=false

# Set this field to true to deploy an offline speaker diarization model
deploy_offline_diarizer=false

####### Configure TTS service #######

# List of supported TTS models and languages for each TTS model
# Language code "multi" means a multilingual model; supported languages for the multilingual models are
# specified on the TTS Overview page of the NVIDIA Riva documentation
# "DO NOT EDIT" this field. Refer to this for valid values to be set in "tts_model" and "tts_language_code" fields
declare -A tts_models_languages_map
tts_models_languages_map["fastpitch_hifigan"]="en-US es-ES es-US it-IT de-DE zh-CN"
tts_models_languages_map["magpie"]="multi"
tts_models_languages_map["radtts_hifigan"]="en-US"
tts_models_languages_map["radttspp_hifigan"]="en-US"
tts_models_languages_map["pflow_hifigan"]="en-US"

# Specify TTS model to deploy, as defined in "tts_models_languages_map" above
# Only one TTS model can be deployed at a time
tts_model=("fastpitch_hifigan")

# Specify TTS language to deploy, as defined in "tts_models_languages_map" above
# For multiple languages, enter space-separated language codes
tts_language_code=("en-US")

####### Configure translation services #######

# Text-to-Text translation (T2T):
# - service_enabled_nmt must be set to true
# Speech-to-Text translation (S2T):
# - service_enabled_asr and service_enabled_nmt must be set to true
# - Set the language code of the input speech in the asr_language_code field
# Speech-to-Speech translation (S2S):
# - service_enabled_asr, service_enabled_nmt and service_enabled_tts must be set to true
# - Set the language code of the input speech in the asr_language_code field
# - Set the language code of the output speech in the tts_language_code field
#
# Remote deployment of ASR and TTS for S2T and S2S use cases
# - NMT deployment supports using remote ASR and TTS services to allow better control over deployments.
# - You need to deploy a separate Riva ASR service and Riva TTS service to use this functionality.
# - Set nmt_remote_asr_service to point to your remote endpoint for the Riva ASR service
# - Set nmt_remote_tts_service to point to your remote endpoint for the Riva TTS service
# - By default, the ASR and TTS services from the same local deployment are used along with NMT.

nmt_remote_asr_service=0.0.0.0:50051
nmt_remote_tts_service=0.0.0.0:50051

# Enable Riva Enterprise
# If enrolled in Enterprise, enable Riva Enterprise by setting configuration
# here. You must explicitly acknowledge you have read and agree to the EULA.

RIVA_API_KEY=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
RIVA_API_NGC_ORG=superbold
RIVA_EULA=accept

# Specify one or more GPUs to use
# Specifying more than one GPU is currently an experimental feature and may result in undefined behaviour.
gpus_to_use="all"

# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"

# Locations to use for storing model artifacts
#
# If an absolute path is specified, the data will be written to that location.
# Otherwise, a Docker volume will be used (default).
#
# riva_init.sh will create rmir and models directories in the volume or
# path specified.
#
# RMIR ($riva_model_loc/rmir)
# Riva uses an intermediate representation (RMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $riva_model_loc/rmir by riva_init.sh.
#
# Custom models produced by NeMo or TLT and prepared using riva-build
# may also be copied manually to this location ($riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The Riva server exclusively uses these
# optimized versions.
riva_model_loc="/models/riva"

if [[ $riva_target_gpu_family == "tegra" ]]; then
    riva_model_loc="$(pwd)/model_repository"
fi

# The default RMIRs are downloaded from NGC into the $riva_model_loc/rmir directory described above.
# If you'd like to skip the download from NGC and use the existing RMIRs in $riva_model_loc/rmir,
# set the $use_existing_rmirs flag below to true. You can also deploy your own set of custom
# RMIRs by placing them in that directory and running this quickstart script with the
# flag below to deploy them all together.
use_existing_rmirs=false

# Ports to expose for Riva services
riva_speech_api_port="50051"
riva_speech_api_http_port="50000"

# NGC orgs
riva_ngc_org="nvidia"
riva_ngc_team="riva"
riva_ngc_image_version="2.19.0"
riva_ngc_model_version="2.19.0"

########## ASR MODELS ##########

models_asr=()

for lang_code in ${asr_language_code[@]}; do

  # filter unsupported models on tegra platform
  if [[ $riva_target_gpu_family == "tegra" ]]; then
    if [[ ${asr_acoustic_model} == "conformer_xl" ||
          ${asr_acoustic_model} == "parakeet-rnnt" ||
          ${asr_acoustic_model} == "canary" ||
          ${asr_acoustic_model} == "whisper" ]]; then
      echo "${asr_acoustic_model} model not available for ${riva_target_gpu_family} gpu family"
      exit 1
    fi
    if [[ ${asr_accessory_model} != "" || ${use_asr_greedy_decoder} == "true" || ${use_asr_streaming_throughput_mode} == "true" ]]; then
      echo "Prebuilt accessory model, greedy decoder and streaming-throughput mode with ASR are not available for ${riva_target_gpu_family} gpu family"
      exit 1
    fi
  fi

# filter unsupported models and languages
supported_languages_list=(${asr_models_languages_map[${asr_acoustic_model}]})
if [[ ${#supported_languages_list[@]} == 0 ]]; then
  echo "Acoustic model ${asr_acoustic_model} not found. Provide model name as defined in asr_models_languages_map"
  exit 1
else
  found=0
  for lang in "${supported_languages_list[@]}"; do
    if [[ ${lang} == ${lang_code} ]]; then
      found=1
      break
    fi
  done
  if [[ $found == 0 ]]; then
    echo "Acoustic model ${asr_acoustic_model} does not support ${lang_code} language. Provide language as defined in asr_models_languages_map"
    exit 1
  fi
fi

modified_asr_acoustic_model=${asr_acoustic_model//./-}
modified_lang_code="_${lang_code//-/_}"
modified_lang_code=${modified_lang_code,,}
if [[ ${modified_lang_code} == "_multi" ]]; then
  modified_lang_code=""
fi

# check if prebuilt RMIR with accessory model is to be used
accessory_model=""
if [[ ${asr_accessory_model} != "" ]]; then
  if [[ ${asr_accessory_model} != "diarizer" && ${asr_accessory_model} != "silero" && ${asr_accessory_model} != "tele" ]]; then
    echo "Invalid accessory model ${asr_accessory_model}. Only diarizer, silero and tele are supported"
    exit 1
  fi
  if [[ ${asr_acoustic_model} != "parakeet_1.1b" ]]; then
    echo "Only parakeet_1.1b + ${asr_accessory_model} is available as prebuilt model. Perform riva-build to create RMIR for other ASR models with ${asr_accessory_model}"
    exit 1
  fi
  if [[ ${use_asr_greedy_decoder} == "true" ]]; then
    echo "Greedy decoder is not supported with accessory models. Set use_asr_greedy_decoder to false"
    exit 1
  fi
  if [[ ${use_asr_streaming_throughput_mode} == "true" && ${asr_accessory_model} == "diarizer" ]]; then
    echo "Streaming throughput mode is not supported with accessory model ${asr_accessory_model}, Set use_asr_streaming_throughput_mode to false"
    exit 1
  fi
  accessory_model="_${asr_accessory_model}"
fi

# check if greedy decoder should be used
decoder=""
if [[ ${use_asr_greedy_decoder} == "true" || \
      ${asr_acoustic_model} == "parakeet_1.1b_unified_ml_cs_universal" || \
      ${asr_acoustic_model} == "parakeet_1.1b_unified_ml_cs_concat" || \
      ${asr_acoustic_model} == "parakeet-rnnt_1.1b" || \
      ${asr_acoustic_model} == "parakeet-rnnt_1.1b_unified_ml_cs_universal" ]]; then
  decoder="_gre"
fi

# check if streaming throughput mode is to be used
streaming_mode=""
if [[ ${use_asr_streaming_throughput_mode} == "true" ]]; then
  streaming_mode="_thr"
fi

# populate ngc paths
if [[ $riva_target_gpu_family == "tegra" ]]; then
    models_asr+=(
      ### Streaming w/ CPU decoder, best latency configuration
        "${riva_ngc_org}/${riva_ngc_team}/models_asr_${modified_asr_acoustic_model}${modified_lang_code}_str:${riva_ngc_model_version}-${riva_target_gpu_family}-${riva_tegra_platform}"
    )
    if [[ ${deploy_offline_diarizer} == "true" ]]; then
      models_asr+=(
        ### Offline w/ CPU decoder
          "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_${modified_asr_acoustic_model}${modified_lang_code}_ofl${decoder}:${riva_ngc_model_version}"
          "${riva_ngc_org}/${riva_ngc_team}/rmir_diarizer_offline:${riva_ngc_model_version}"
      )
    fi
else
  if [[ ${asr_acoustic_model} != *"whisper"* && ${asr_acoustic_model} != "parakeet-rnnt_1.1b" && ${asr_acoustic_model} != *"canary"* ]]; then
    models_asr+=(
      ### Streaming w/ CPU decoder, best latency or best throughput configuration
        "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_${modified_asr_acoustic_model}${modified_lang_code}_str${streaming_mode}${decoder}${accessory_model}:${riva_ngc_model_version}"
    )
  fi

  ### Offline w/ CPU decoder
  if [[ ${asr_acoustic_model} == *"whisper"* || ${asr_acoustic_model} == *"canary"* ]]; then
    models_asr+=(
      "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_${modified_asr_acoustic_model}_ofl:${riva_ngc_model_version}"
    )
  else
    if [[ ${asr_accessory_model} == "diarizer" ]]; then
      models_asr+=(
        "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_${modified_asr_acoustic_model}${modified_lang_code}_ofl${decoder}:${riva_ngc_model_version}"
      )
    else
      models_asr+=(
        "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_${modified_asr_acoustic_model}${modified_lang_code}_ofl${decoder}${accessory_model}:${riva_ngc_model_version}"
      )
    fi
    if [[ ${deploy_offline_diarizer} == "true" ]]; then
      models_asr+=(
        "${riva_ngc_org}/${riva_ngc_team}/rmir_diarizer_offline:${riva_ngc_model_version}"
      )
    fi
  fi
fi

### Punctuation model
if [[ ${asr_acoustic_model} != *"unified"* && ${asr_acoustic_model} != *"whisper"* && ${asr_acoustic_model} != *"canary"* ]]; then
  pnc_lang=$(echo $modified_lang_code | cut -d "_" -f 2)
  pnc_region=${modified_lang_code##*_}
  modified_lang_code="_${pnc_lang}_${pnc_region}"
  if [[ $riva_target_gpu_family == "tegra" ]]; then
    if [[ "$lang_code" == "en-US" ]]; then
      models_asr+=(
      #  "${riva_ngc_org}/${riva_ngc_team}/models_nlp_punctuation_bert_large${modified_lang_code}:${riva_ngc_model_version}-${riva_target_gpu_family}-${riva_tegra_platform}"
      )
    fi
    models_asr+=(
      "${riva_ngc_org}/${riva_ngc_team}/models_nlp_punctuation_bert_base${modified_lang_code}:${riva_ngc_model_version}-${riva_target_gpu_family}-${riva_tegra_platform}"
    )
  else
    if [[ "$lang_code" == "en-US" ]]; then
      models_asr+=(
      #  "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_punctuation_bert_large${modified_lang_code}:${riva_ngc_model_version}"
      )
    fi
    models_asr+=(
      "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_punctuation_bert_base${modified_lang_code}:${riva_ngc_model_version}"
    )
  fi
fi

done

########## NLP MODELS ##########

if [[ $riva_target_gpu_family == "tegra" ]]; then
  models_nlp=(
  ### Bert base Punctuation model
      "${riva_ngc_org}/${riva_ngc_team}/models_nlp_punctuation_bert_base_en_us:${riva_ngc_model_version}-${riva_target_gpu_family}-${riva_tegra_platform}"
  #   "${riva_ngc_org}/${riva_ngc_team}/models_nlp_punctuation_bert_large_en_us:${riva_ngc_model_version}-${riva_target_gpu_family}-${riva_tegra_platform}"
  )
else
  models_nlp=(
  ### Bert base Punctuation model
      "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_punctuation_bert_base_en_us:${riva_ngc_model_version}"
  #   "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_punctuation_bert_large_en_us:${riva_ngc_model_version}"
  )
fi

########## TTS MODELS ##########

models_tts=()

for lang_code in ${tts_language_code[@]}; do

  # filter unsupported models on tegra platform
  if [[ $riva_target_gpu_family == "tegra" ]]; then
    if [[ ${tts_model} == "magpie" ]]; then
      echo "${tts_model} model not available for ${riva_target_gpu_family} gpu family"
      exit 1
    fi
  fi

  # filter unsupported models and languages
  supported_languages_list=(${tts_models_languages_map[${tts_model}]})
  if [[ ${#supported_languages_list[@]} == 0 ]]; then
    echo "Model ${tts_model} not found. Provide model name as defined in tts_models_languages_map"
    exit 1
  else
    found=0
    for lang in "${supported_languages_list[@]}"; do
      if [[ ${lang} == ${lang_code} ]]; then
        found=1
        break
      fi
    done
    if [[ $found == 0 ]]; then
      echo "Model ${tts_model} does not support ${lang_code} language. Provide language as defined in tts_models_languages_map"
      exit 1
    fi
  fi

  modified_lang_code="_${lang_code//-/_}"
  modified_lang_code=${modified_lang_code,,}
  if [[ ${modified_lang_code} == "_multi" ]]; then
    modified_lang_code="_multilingual"
  fi

  # populate ngc paths
  if [[ $riva_target_gpu_family == "tegra" ]]; then
    if [[ ${lang_code} == "multi" || ${lang_code} == "en-US" || ${lang_code} == "zh-CN" || ${lang_code} == "es-US" ]]; then
      if [[ ${tts_model} == "pflow_hifigan" ]]; then
        # This is a zero-shot model for synthesizing speech from an audio prompt input; it requires access to the ea-riva-tts NGC org
        models_tts+=(
          "gjaugwraudqz/rmir_tts_${tts_model}${modified_lang_code}_ipa:${riva_ngc_model_version}"
        )
      else
        models_tts+=(
          "${riva_ngc_org}/${riva_ngc_team}/models_tts_${tts_model}${modified_lang_code}_ipa:${riva_ngc_model_version}-${riva_target_gpu_family}-${riva_tegra_platform}"
        )
      fi
    else
      if [[ ${lang_code} != "de-DE" ]]; then
        models_tts+=(
          "${riva_ngc_org}/${riva_ngc_team}/models_tts_${tts_model}${modified_lang_code}_f_ipa:${riva_ngc_model_version}-${riva_target_gpu_family}-${riva_tegra_platform}"
        )
      fi
      models_tts+=(
        "${riva_ngc_org}/${riva_ngc_team}/models_tts_${tts_model}${modified_lang_code}_m_ipa:${riva_ngc_model_version}-${riva_target_gpu_family}-${riva_tegra_platform}"
      )
    fi
  else
    if [[ ${lang_code} == "multi" || ${lang_code} == "en-US" || ${lang_code} == "zh-CN" || ${lang_code} == "es-US" ]]; then
      if [[ ${tts_model} == "pflow_hifigan" ]]; then
        # This is a zero-shot model for synthesizing speech from an audio prompt input; it requires access to the ea-riva-tts NGC org
        models_tts+=(
          "gjaugwraudqz/rmir_tts_${tts_model}${modified_lang_code}_ipa:${riva_ngc_model_version}"
        )
      else
        models_tts+=(
          "${riva_ngc_org}/${riva_ngc_team}/rmir_tts_${tts_model}${modified_lang_code}_ipa:${riva_ngc_model_version}"
        )
      fi
    else
      if [[ ${lang_code} != "de-DE" ]]; then
        models_tts+=(
          "${riva_ngc_org}/${riva_ngc_team}/rmir_tts_${tts_model}${modified_lang_code}_f_ipa:${riva_ngc_model_version}"
        )
      fi
      models_tts+=(
        "${riva_ngc_org}/${riva_ngc_team}/rmir_tts_${tts_model}${modified_lang_code}_m_ipa:${riva_ngc_model_version}"
      )
    fi
  fi
done

######### NMT models ###############

# Models follow Source language _ One or more target languages model architecture
# Source or target language "any" means the model supports 32 languages mentioned in docs.
# e.g., rmir_megatronnmt_en_any_500m is an English-to-32-languages Megatron model
models_nmt=(
  ###### Megatron models
  #"${riva_ngc_org}/${riva_ngc_team}/rmir_megatronnmt_any_en_500m:${riva_ngc_model_version}"
  #"${riva_ngc_org}/${riva_ngc_team}/rmir_megatronnmt_en_any_500m:${riva_ngc_model_version}"
  #"${riva_ngc_org}/${riva_ngc_team}/rmir_nmt_megatron_1b_any_en:${riva_ngc_model_version}"
  #"${riva_ngc_org}/${riva_ngc_team}/rmir_nmt_megatron_1b_en_any:${riva_ngc_model_version}"
  "${riva_ngc_org}/${riva_ngc_team}/rmir_nmt_megatron_1b_any_any:${riva_ngc_model_version}"
)

NGC_TARGET=${riva_ngc_org}
if [[ ! -z ${riva_ngc_team} ]]; then
  NGC_TARGET="${NGC_TARGET}/${riva_ngc_team}"
else
  team="\"\""
fi

# Specify paths to SSL Key and Certificate files to use TLS/SSL Credentials for a secured connection.
# If either is empty, an insecure connection will be used.
# Stored within container at /ssl/server.crt and /ssl/server.key
# Optionally, one can also specify a root certificate, stored within container at /ssl/root_server.crt
# Set ssl_use_mutual_auth to true to enable mutual TLS (mTLS) authentication
ssl_server_cert=""
ssl_server_key=""
ssl_root_cert=""
ssl_use_mutual_auth=false

# define Docker images required to run Riva
image_speech_api="nvcr.io/${NGC_TARGET}/riva-speech:${riva_ngc_image_version}"

# daemon names
riva_daemon_speech="riva-speech"
if [[ $riva_target_gpu_family != "tegra" ]]; then
    riva_daemon_client="riva-client"
fi
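To check which NGC models riva_init.sh should have fetched for this configuration, the naming logic in the ASR, punctuation and TTS loops above can be condensed into a small function. This is a simplified sketch, not the quickstart's own code: the function name `expected_rmirs` is hypothetical, and it only covers the non-tegra branch with no accessory model, `use_asr_greedy_decoder=false`, `use_asr_streaming_throughput_mode=false`, an ASR model other than whisper, canary or parakeet-rnnt, two-part language codes such as en-US, and a TTS model other than pflow_hifigan with a TTS language of en-US, zh-CN or es-US (other TTS languages use separate `_f_ipa`/`_m_ipa` models).

```shell
#!/usr/bin/env bash
# Hypothetical helper (simplified sketch of the config.sh naming logic):
# prints the NGC model paths config.sh would select on a non-tegra GPU.
# Assumes: no accessory model, greedy decoder and throughput mode disabled,
# an ASR model other than whisper/canary/parakeet-rnnt, two-part language
# codes such as en-US, and a non-pflow TTS model in en-US, zh-CN or es-US.
expected_rmirs() {
  local asr_model=$1 asr_lang=$2 tts_model=$3 tts_lang=$4
  local org_team="nvidia/riva"   # riva_ngc_org/riva_ngc_team
  local version="2.19.0"         # riva_ngc_model_version

  # Same transformations as config.sh: "." -> "-" in the model name,
  # "en-US" -> "_en_us" for the language suffix (${var,,} lowercases, bash 4+).
  local asr_name=${asr_model//./-}
  local asr_suffix="_${asr_lang//-/_}"
  asr_suffix=${asr_suffix,,}

  # Streaming (_str) and offline (_ofl) ASR RMIRs.
  echo "${org_team}/rmir_asr_${asr_name}${asr_suffix}_str:${version}"
  echo "${org_team}/rmir_asr_${asr_name}${asr_suffix}_ofl:${version}"

  # BERT-base punctuation RMIR is only added for non-"unified" ASR models.
  if [[ ${asr_model} != *"unified"* ]]; then
    echo "${org_team}/rmir_nlp_punctuation_bert_base${asr_suffix}:${version}"
  fi

  # Single TTS RMIR with the "_ipa" suffix.
  local tts_suffix="_${tts_lang//-/_}"
  tts_suffix=${tts_suffix,,}
  echo "${org_team}/rmir_tts_${tts_model}${tts_suffix}_ipa:${version}"
}

# Values taken from the config.sh above.
expected_rmirs conformer en-US fastpitch_hifigan en-US
```

For the settings above (conformer/en-US ASR, fastpitch_hifigan/en-US TTS) this yields the streaming and offline conformer RMIRs, the BERT-base punctuation RMIR and the FastPitch-HiFiGAN RMIR; the two ASR entries appear to correspond to the `conformer-en-US-asr-streaming` and `conformer-en-US-asr-offline` ensembles being unloaded in the log.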