Riva start can't download all models on DGX Spark

jack175 · November 20, 2025, 5:25am

Riva start can’t download all models on DGX Spark (GB10).

jack175 · November 20, 2025, 11:24pm

Here is the log from running riva_start.sh:

(base) a@spark-c3a9:/models/riva/riva_quickstart_v2.19.0$ bash riva_start.sh

Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.

Waiting for Riva server to load all models…retrying in 10 seconds

Health ready check failed.

Check Riva logs with: docker logs riva-speech
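
The docker logs output for riva-speech can be long, so it may help to reduce it to the error-level lines first. Below is a minimal sketch, assuming the Triton lines use the glog-style prefix seen in this thread (severity letter, MMDD date, time, thread id, file:line]). The function name filter_severity and the sample text are illustrative only and are not part of Riva or Triton.

```python
import re

# glog-style prefix: severity letter, MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) (?P<src>[^\s:]+:\d+)\] (?P<msg>.*)$"
)


def filter_severity(log_text, severities="EF"):
    """Return (severity, source, message) for each prefixed line whose severity matches."""
    hits = []
    for line in log_text.splitlines():
        m = GLOG_LINE.match(line.strip())
        if m and m.group("sev") in severities:
            hits.append((m.group("sev"), m.group("src"), m.group("msg")))
    return hits


# Illustrative sample only; real input would be the saved container log.
SAMPLE = "\n".join([
    'I1119 04:14:40.780633 141 model_lifecycle.cc:473] "loading: riva-punctuation-en-US:1"',
    'E1119 04:14:41.260235 141 backend_model.cc:692] "ERROR: Failed to create instance"',
    "error: creating server: Internal - failed to load all models",
])

for sev, src, msg in filter_severity(SAMPLE):
    print(sev, src, msg)
```

To use it on the real output, the log could be saved to a file first (for example with: docker logs riva-speech > riva.log 2>&1) and the file contents passed to filter_severity. Lines without the prefix, such as "error: creating server: ...", are skipped by this sketch.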

(base) a@spark-c3a9:/models/riva/riva_quickstart_v2.19.0$ docker logs riva-speech

==========================

=== Riva Speech Skills ===

==========================

NVIDIA Release 25.02 (build 151443008)

This container image and its contents are governed by the NVIDIA Deep Learning Container License.

By pulling and using the container, you accept the terms and conditions of this license:

WARNING: Detected NVIDIA GB10 GPU, which may not yet be supported in this version of the container

> Riva waiting for Triton server to load all models…retrying in 1 second

I1119 04:14:40.737388 141 pinned_memory_manager.cc:277] “Pinned memory pool is created at ‘0x32ee00000’ with size 268435456”

I1119 04:14:40.742592 141 cuda_memory_manager.cc:107] “CUDA memory pool is created on device 0 with size 1000000000”

I1119 04:14:40.780633 141 model_lifecycle.cc:473] “loading: conformer-en-US-asr-offline-asr-bls-ensemble:1”

I1119 04:14:40.780666 141 model_lifecycle.cc:473] “loading: conformer-en-US-asr-streaming-asr-bls-ensemble:1”

I1119 04:14:40.780684 141 model_lifecycle.cc:473] “loading: riva-onnx-fastpitch_encoder-English-US:1”

I1119 04:14:40.780701 141 model_lifecycle.cc:473] “loading: riva-punctuation-en-US:1”

I1119 04:14:40.780714 141 model_lifecycle.cc:473] “loading: riva-trt-conformer-en-US-asr-offline-am-streaming-offline:1”

I1119 04:14:40.780725 141 model_lifecycle.cc:473] “loading: riva-trt-conformer-en-US-asr-streaming-am-streaming:1”

I1119 04:14:40.780733 141 model_lifecycle.cc:473] “loading: riva-trt-hifigan-English-US:1”

I1119 04:14:40.780742 141 model_lifecycle.cc:473] “loading: riva-trt-riva-punctuation-en-US-nn-bert-base-uncased:1”

I1119 04:14:40.780754 141 model_lifecycle.cc:473] “loading: spectrogram_chunker-English-US:1”

I1119 04:14:40.780769 141 model_lifecycle.cc:473] “loading: tts_postprocessor-English-US:1”

I1119 04:14:40.780789 141 model_lifecycle.cc:473] “loading: tts_preprocessor-English-US:1”

I1119 04:14:41.045152 141 pipeline_library.cc:24] “TRITONBACKEND_ModelInitialize: riva-punctuation-en-US (version 1)”

I1119 04:14:41.045638 141 backend_model.cc:303] “model configuration:\n{\n \“name\”: \“riva-punctuation-en-US\”,\n \“platform\”: \”\“,\n \“backend\”: \“riva_nlp_pipeline\”,\n \“runtime\”: \”\“,\n \“version_policy\”: {\n \“latest\”: {\n \“num_versions\”: 1\n }\n },\n \“max_batch_size\”: 8,\n \“input\”: [\n {\n \“name\”: \“PIPELINE_INPUT\”,\n \“data_type\”: \“TYPE_STRING\”,\n \“format\”: \“FORMAT_NONE\”,\n \“dims\”: [\n 1\n ],\n \“is_shape_tensor\”: false,\n \“allow_ragged_batch\”: false,\n \“optional\”: false,\n \“is_non_linear_format_io\”: false\n }\n ],\n \“output\”: [\n {\n \“name\”: \“PIPELINE_OUTPUT\”,\n \“data_type\”: \“TYPE_STRING\”,\n \“dims\”: [\n 1\n ],\n \“label_filename\”: \”\“,\n \“is_shape_tensor\”: false,\n \“is_non_linear_format_io\”: false\n }\n ],\n \“batch_input\”: ,\n \“batch_output\”: ,\n \“optimization\”: {\n \“priority\”: \“PRIORITY_DEFAULT\”,\n \“input_pinned_memory\”: {\n \“enable\”: true\n },\n \“output_pinned_memory\”: {\n \“enable\”: true\n },\n \“gather_kernel_buffer_threshold\”: 0,\n \“eager_batching\”: false\n },\n \“instance_group\”: [\n {\n \“name\”: \“riva-punctuation-en-US_0\”,\n \“kind\”: \“KIND_CPU\”,\n \“count\”: 1,\n \“gpus\”: ,\n \“secondary_devices\”: ,\n \“profile\”: ,\n \“passive\”: false,\n \“host_policy\”: \”\“\n }\n ],\n \“default_model_filename\”: \”\“,\n \“cc_model_filenames\”: {},\n \“metric_tags\”: {},\n \“parameters\”: {\n \“attn_mask_tensor_name\”: {\n \“string_value\”: \“attention_mask\”\n },\n \“punct_logits_tensor_name\”: {\n \“string_value\”: \“punct_logits\”\n },\n \“model_api\”: {\n \“string_value\”: \“PunctuateText\”\n },\n \“language_code\”: {\n \“string_value\”: \“en-US\”\n },\n \“eos_token\”: {\n \“string_value\”: \”[SEP]\“\n },\n \“unk_token\”: {\n \“string_value\”: \”[UNK]\“\n },\n \“to_lower\”: {\n \“string_value\”: \“true\”\n },\n \“punctuation_mapping_path\”: {\n \“string_value\”: \“fe160f3a917d411b99852e509e3279a3_punct_label_ids.csv\”\n },\n \“load_model\”: {\n \“string_value\”: \“false\”\n },\n 
\“tokenizer\”: {\n \“string_value\”: \“wordpiece\”\n },\n \“delimiter\”: {\n \“string_value\”: \” \“\n },\n \“token_type_tensor_name\”: {\n \“string_value\”: \“token_type_ids\”\n },\n \“remove_spaces\”: {\n \“string_value\”: \“False\”\n },\n \“capitalization_mapping_path\”: {\n \“string_value\”: \“a4ed235fb32c44e58eab5854d3cd94f8_capit_label_ids.csv\”\n },\n \“model_family\”: {\n \“string_value\”: \“riva\”\n },\n \“tokenizer_to_lower\”: {\n \“string_value\”: \“true\”\n },\n \“pipeline_type\”: {\n \“string_value\”: \“punctuation\”\n },\n \“code_point_filename\”: {\n \“string_value\”: \“cp_data.json\”\n },\n \“unicode_normalize\”: {\n \“string_value\”: \“False\”\n },\n \“pad_chars_with_spaces\”: {\n \“string_value\”: \“False\”\n },\n \“use_int64_nn_inputs\”: {\n \“string_value\”: \“False\”\n },\n \“vocab\”: {\n \“string_value\”: \“f92889b136d2433693cb9127e1aea218_vocab.txt\”\n },\n \“bos_token\”: {\n \“string_value\”: \”[CLS]\“\n },\n \“capit_logits_tensor_name\”: {\n \“string_value\”: \“capit_logits\”\n },\n \“model_name\”: {\n \“string_value\”: \“riva-trt-riva-punctuation-en-US-nn-bert-base-uncased\”\n },\n \“preserve_accents\”: {\n \“string_value\”: \“false\”\n },\n \“input_ids_tensor_name\”: {\n \“string_value\”: \“input_ids\”\n }\n },\n \“model_warmup\”: \n}”

I1119 04:14:41.045717 141 pipeline_library.cc:28] “TRITONBACKEND_ModelInstanceInitialize: riva-punctuation-en-US_0_0 (device 0)”

I1119 04:14:41.062678 141 onnxruntime.cc:2718] “TRITONBACKEND_Initialize: onnxruntime”

I1119 04:14:41.062704 141 onnxruntime.cc:2728] “Triton TRITONBACKEND API version: 1.19”

I1119 04:14:41.062707 141 onnxruntime.cc:2734] “‘onnxruntime’ TRITONBACKEND API version: 1.16”

I1119 04:14:41.062709 141 onnxruntime.cc:2764] “backend configuration:\n{\“cmdline\”:{\“auto-complete-config\”:\“false\”,\“backend-directory\”:\”/opt/tritonserver/backends\“,\“min-compute-capability\”:\“6.000000\”,\“default-max-batch-size\”:\“4\”}}”

I1119 04:14:41.074151 141 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-en-US-asr-streaming-asr-bls-ensemble (version 1)”

Found yaml file: /data/models/conformer-en-US-asr-streaming-asr-bls-ensemble/1/riva_bls_config.yaml

I1119 04:14:41.077293 141 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-en-US-asr-offline-asr-bls-ensemble (version 1)”

Found yaml file: /data/models/conformer-en-US-asr-offline-asr-bls-ensemble/1/riva_bls_config.yaml

I1119 04:14:41.080170 141 onnxruntime.cc:2829] “TRITONBACKEND_ModelInitialize: riva-onnx-fastpitch_encoder-English-US (version 1)”

I1119 04:14:41.080738 141 onnxruntime.cc:2894] “TRITONBACKEND_ModelInstanceInitialize: riva-onnx-fastpitch_encoder-English-US_0_0 (GPU device 0)”

I1119 04:14:41.101297 141 backend_model.cc:303] “model configuration:\n{\n \“name\”: \“conformer-en-US-asr-streaming-asr-bls-ensemble\”,\n \“platform\”: \”\“,\n \“backend\”: \“riva_asr_ensemble_pipeline\”,\n \“runtime\”: \”\“,\n \“version_policy\”: {\n \“latest\”: {\n \“num_versions\”: 1\n }\n },\n \“max_batch_size\”: 1024,\n \“input\”: [\n {\n \“name\”: \“PIPELINE_INPUT\”,\n \“data_type\”: \“TYPE_STRING\”,\n \“format\”: \“FORMAT_NONE\”,\n \“dims\”: [\n 1\n ],\n \“is_shape_tensor\”: false,\n \“allow_ragged_batch\”: false,\n \“optional\”: false,\n \“is_non_linear_format_io\”: false\n }\n ],\n \“output\”: [\n {\n \“name\”: \“PIPELINE_OUTPUT\”,\n \“data_type\”: \“TYPE_STRING\”,\n \“dims\”: [\n 1\n ],\n \“label_filename\”: \”\“,\n \“is_shape_tensor\”: false,\n \“is_non_linear_format_io\”: false\n }\n ],\n \“batch_input\”: ,\n \“batch_output\”: ,\n \“optimization\”: {\n \“graph\”: {\n \“level\”: 0\n },\n \“priority\”: \“PRIORITY_DEFAULT\”,\n \“cuda\”: {\n \“graphs\”: false,\n \“busy_wait_events\”: false,\n \“graph_spec\”: ,\n \“output_copy_stream\”: true\n },\n \“input_pinned_memory\”: {\n \“enable\”: true\n },\n \“output_pinned_memory\”: {\n \“enable\”: true\n },\n \“gather_kernel_buffer_threshold\”: 0,\n \“eager_batching\”: false\n },\n \“sequence_batching\”: {\n \“oldest\”: {\n \“max_candidate_sequences\”: 1024,\n \“preferred_batch_size\”: [\n 64,\n 128\n ],\n \“max_queue_delay_microseconds\”: 1000,\n \“preserve_ordering\”: false\n },\n \“max_sequence_idle_microseconds\”: 60000000,\n \“control_input\”: [\n {\n \“name\”: \“START\”,\n \“control\”: [\n {\n \“kind\”: \“CONTROL_SEQUENCE_START\”,\n \“int32_false_true\”: [\n 0,\n 1\n ],\n \“fp32_false_true\”: ,\n \“bool_false_true\”: ,\n \“data_type\”: \“TYPE_INVALID\”\n }\n ]\n },\n {\n \“name\”: \“READY\”,\n \“control\”: [\n {\n \“kind\”: \“CONTROL_SEQUENCE_READY\”,\n \“int32_false_true\”: [\n 0,\n 1\n ],\n \“fp32_false_true\”: ,\n \“bool_false_true\”: ,\n \“data_type\”: \“TYPE_INVALID\”\n }\n ]\n },\n {\n \“name\”: 
\“END\”,\n \“control\”: [\n {\n \“kind\”: \“CONTROL_SEQUENCE_END\”,\n \“int32_false_true\”: [\n 0,\n 1\n ],\n \“fp32_false_true\”: ,\n \“bool_false_true\”: ,\n \“data_type\”: \“TYPE_INVALID\”\n }\n ]\n },\n {\n \“name\”: \“CORRID\”,\n \“control\”: [\n {\n \“kind\”: \“CONTROL_SEQUENCE_CORRID\”,\n \“int32_false_true\”: ,\n \“fp32_false_true\”: ,\n \“bool_false_true\”: ,\n \“data_type\”: \“TYPE_UINT64\”\n }\n ]\n }\n ],\n \“state\”: ,\n \“iterative_sequence\”: false\n },\n \“instance_group\”: [\n {\n \“name\”: \“conformer-en-US-asr-streaming-asr-bls-ensemble_0\”,\n \“kind\”: \“KIND_CPU\”,\n \“count\”: 1,\n \“gpus\”: ,\n \“secondary_devices\”: ,\n \“profile\”: ,\n \“passive\”: false,\n \“host_policy\”: \”\“\n }\n ],\n \“default_model_filename\”: \”\“,\n \“cc_model_filenames\”: {},\n \“metric_tags\”: {},\n \“parameters\”: {\n \“offline\”: {\n \“string_value\”: \“False\”\n },\n \“language_code\”: {\n \“string_value\”: \“en-US\”\n },\n \“type\”: {\n \“string_value\”: \“online\”\n },\n \“yaml_parameters_file\”: {\n \“string_value\”: \“riva_bls_config.yaml\”\n },\n \“streaming\”: {\n \“string_value\”: \“True\”\n },\n \“sample_rate\”: {\n \“string_value\”: \“16000\”\n },\n \“model_family\”: {\n \“string_value\”: \“riva\”\n }\n },\n \“model_warmup\”: ,\n \“model_transaction_policy\”: {\n \“decoupled\”: true\n }\n}”

I1119 04:14:41.101388 141 backend_model.cc:303] “model configuration:\n{\n \“name\”: \“conformer-en-US-asr-offline-asr-bls-ensemble\”,\n \“platform\”: \”\“,\n \“backend\”: \“riva_asr_ensemble_pipeline\”,\n \“runtime\”: \”\“,\n \“version_policy\”: {\n \“latest\”: {\n \“num_versions\”: 1\n }\n },\n \“max_batch_size\”: 1024,\n \“input\”: [\n {\n \“name\”: \“PIPELINE_INPUT\”,\n \“data_type\”: \“TYPE_STRING\”,\n \“format\”: \“FORMAT_NONE\”,\n \“dims\”: [\n 1\n ],\n \“is_shape_tensor\”: false,\n \“allow_ragged_batch\”: false,\n \“optional\”: false,\n \“is_non_linear_format_io\”: false\n }\n ],\n \“output\”: [\n {\n \“name\”: \“PIPELINE_OUTPUT\”,\n \“data_type\”: \“TYPE_STRING\”,\n \“dims\”: [\n 1\n ],\n \“label_filename\”: \”\“,\n \“is_shape_tensor\”: false,\n \“is_non_linear_format_io\”: false\n }\n ],\n \“batch_input\”: ,\n \“batch_output\”: ,\n \“optimization\”: {\n \“graph\”: {\n \“level\”: 0\n },\n \“priority\”: \“PRIORITY_DEFAULT\”,\n \“cuda\”: {\n \“graphs\”: false,\n \“busy_wait_events\”: false,\n \“graph_spec\”: ,\n \“output_copy_stream\”: true\n },\n \“input_pinned_memory\”: {\n \“enable\”: true\n },\n \“output_pinned_memory\”: {\n \“enable\”: true\n },\n \“gather_kernel_buffer_threshold\”: 0,\n \“eager_batching\”: false\n },\n \“sequence_batching\”: {\n \“oldest\”: {\n \“max_candidate_sequences\”: 1024,\n \“preferred_batch_size\”: [\n 64,\n 128\n ],\n \“max_queue_delay_microseconds\”: 1000,\n \“preserve_ordering\”: false\n },\n \“max_sequence_idle_microseconds\”: 60000000,\n \“control_input\”: [\n {\n \“name\”: \“START\”,\n \“control\”: [\n {\n \“kind\”: \“CONTROL_SEQUENCE_START\”,\n \“int32_false_true\”: [\n 0,\n 1\n ],\n \“fp32_false_true\”: ,\n \“bool_false_true\”: ,\n \“data_type\”: \“TYPE_INVALID\”\n }\n ]\n },\n {\n \“name\”: \“READY\”,\n \“control\”: [\n {\n \“kind\”: \“CONTROL_SEQUENCE_READY\”,\n \“int32_false_true\”: [\n 0,\n 1\n ],\n \“fp32_false_true\”: ,\n \“bool_false_true\”: ,\n \“data_type\”: \“TYPE_INVALID\”\n }\n ]\n },\n {\n \“name\”: 
\“END\”,\n \“control\”: [\n {\n \“kind\”: \“CONTROL_SEQUENCE_END\”,\n \“int32_false_true\”: [\n 0,\n 1\n ],\n \“fp32_false_true\”: ,\n \“bool_false_true\”: ,\n \“data_type\”: \“TYPE_INVALID\”\n }\n ]\n },\n {\n \“name\”: \“CORRID\”,\n \“control\”: [\n {\n \“kind\”: \“CONTROL_SEQUENCE_CORRID\”,\n \“int32_false_true\”: ,\n \“fp32_false_true\”: ,\n \“bool_false_true\”: ,\n \“data_type\”: \“TYPE_UINT64\”\n }\n ]\n }\n ],\n \“state\”: ,\n \“iterative_sequence\”: false\n },\n \“instance_group\”: [\n {\n \“name\”: \“conformer-en-US-asr-offline-asr-bls-ensemble_0\”,\n \“kind\”: \“KIND_CPU\”,\n \“count\”: 1,\n \“gpus\”: ,\n \“secondary_devices\”: ,\n \“profile\”: ,\n \“passive\”: false,\n \“host_policy\”: \”\“\n }\n ],\n \“default_model_filename\”: \”\“,\n \“cc_model_filenames\”: {},\n \“metric_tags\”: {},\n \“parameters\”: {\n \“streaming\”: {\n \“string_value\”: \“True\”\n },\n \“type\”: {\n \“string_value\”: \“offline\”\n },\n \“sample_rate\”: {\n \“string_value\”: \“16000\”\n },\n \“offline\”: {\n \“string_value\”: \“True\”\n },\n \“language_code\”: {\n \“string_value\”: \“en-US\”\n },\n \“model_family\”: {\n \“string_value\”: \“riva\”\n },\n \“yaml_parameters_file\”: {\n \“string_value\”: \“riva_bls_config.yaml\”\n }\n },\n \“model_warmup\”: ,\n \“model_transaction_policy\”: {\n \“decoupled\”: true\n }\n}”

I1119 04:14:41.101419 141 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-en-US-asr-streaming-asr-bls-ensemble_0_0 (device 0)”

I1119 04:14:41.101488 141 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-en-US-asr-offline-asr-bls-ensemble_0_0 (device 0)”

I1119 04:14:41.211051 141 model_lifecycle.cc:849] “successfully loaded ‘riva-punctuation-en-US’”

I1119 04:14:41.228707 141 tensorrt.cc:65] “TRITONBACKEND_Initialize: tensorrt”

I1119 04:14:41.228728 141 tensorrt.cc:75] “Triton TRITONBACKEND API version: 1.19”

I1119 04:14:41.228730 141 tensorrt.cc:81] “‘tensorrt’ TRITONBACKEND API version: 1.19”

I1119 04:14:41.228733 141 tensorrt.cc:105] “backend configuration:\n{\“cmdline\”:{\“auto-complete-config\”:\“false\”,\“backend-directory\”:\”/opt/tritonserver/backends\“,\“min-compute-capability\”:\“6.000000\”,\“default-max-batch-size\”:\“4\”}}”

I1119 04:14:41.233863 141 tensorrt.cc:231] “TRITONBACKEND_ModelInitialize: riva-trt-conformer-en-US-asr-offline-am-streaming-offline (version 1)”

I1119 04:14:41.234178 141 tensorrt.cc:297] “TRITONBACKEND_ModelInstanceInitialize: riva-trt-conformer-en-US-asr-offline-am-streaming-offline_0_0 (GPU device 0)”

I1119 04:14:41.239240 141 tensorrt.cc:353] “TRITONBACKEND_ModelInstanceFinalize: delete instance state”

E1119 04:14:41.239248 141 backend_model.cc:692] “ERROR: Failed to create instance: unable to find ‘/data/models/riva-trt-conformer-en-US-asr-offline-am-streaming-offline/1/model.plan’ for model instance ‘riva-trt-conformer-en-US-asr-offline-am-streaming-offline_0_0’”

I1119 04:14:41.239253 141 tensorrt.cc:274] “TRITONBACKEND_ModelFinalize: delete model state”

E1119 04:14:41.239268 141 model_lifecycle.cc:654] “failed to load ‘riva-trt-conformer-en-US-asr-offline-am-streaming-offline’ version 1: Unavailable: unable to find ‘/data/models/riva-trt-conformer-en-US-asr-offline-am-streaming-offline/1/model.plan’ for model instance ‘riva-trt-conformer-en-US-asr-offline-am-streaming-offline_0_0’”

I1119 04:14:41.239275 141 model_lifecycle.cc:789] “failed to load ‘riva-trt-conformer-en-US-asr-offline-am-streaming-offline’”

I1119 04:14:41.244287 141 tensorrt.cc:231] “TRITONBACKEND_ModelInitialize: riva-trt-conformer-en-US-asr-streaming-am-streaming (version 1)”

I1119 04:14:41.244483 141 tensorrt.cc:297] “TRITONBACKEND_ModelInstanceInitialize: riva-trt-conformer-en-US-asr-streaming-am-streaming_0_0 (GPU device 0)”

I1119 04:14:41.249308 141 tensorrt.cc:353] “TRITONBACKEND_ModelInstanceFinalize: delete instance state”

E1119 04:14:41.249314 141 backend_model.cc:692] “ERROR: Failed to create instance: unable to find ‘/data/models/riva-trt-conformer-en-US-asr-streaming-am-streaming/1/model.plan’ for model instance ‘riva-trt-conformer-en-US-asr-streaming-am-streaming_0_0’”

I1119 04:14:41.249317 141 tensorrt.cc:274] “TRITONBACKEND_ModelFinalize: delete model state”

E1119 04:14:41.249326 141 model_lifecycle.cc:654] “failed to load ‘riva-trt-conformer-en-US-asr-streaming-am-streaming’ version 1: Unavailable: unable to find ‘/data/models/riva-trt-conformer-en-US-asr-streaming-am-streaming/1/model.plan’ for model instance ‘riva-trt-conformer-en-US-asr-streaming-am-streaming_0_0’”

I1119 04:14:41.249330 141 model_lifecycle.cc:789] “failed to load ‘riva-trt-conformer-en-US-asr-streaming-am-streaming’”

I1119 04:14:41.254355 141 tensorrt.cc:231] “TRITONBACKEND_ModelInitialize: riva-trt-hifigan-English-US (version 1)”

I1119 04:14:41.254514 141 backend_model.cc:281] “Overriding execution policy to \“TRITONBACKEND_EXECUTION_BLOCKING\” for sequence model \“riva-trt-hifigan-English-US\””

I1119 04:14:41.254520 141 tensorrt.cc:297] “TRITONBACKEND_ModelInstanceInitialize: riva-trt-hifigan-English-US_0_0 (GPU device 0)”

I1119 04:14:41.260219 141 tensorrt.cc:353] “TRITONBACKEND_ModelInstanceFinalize: delete instance state”

E1119 04:14:41.260235 141 backend_model.cc:692] “ERROR: Failed to create instance: unable to find ‘/data/models/riva-trt-hifigan-English-US/1/model.plan’ for model instance ‘riva-trt-hifigan-English-US_0_0’”

I1119 04:14:41.260240 141 tensorrt.cc:274] “TRITONBACKEND_ModelFinalize: delete model state”

E1119 04:14:41.260258 141 model_lifecycle.cc:654] “failed to load ‘riva-trt-hifigan-English-US’ version 1: Unavailable: unable to find ‘/data/models/riva-trt-hifigan-English-US/1/model.plan’ for model instance ‘riva-trt-hifigan-English-US_0_0’”

I1119 04:14:41.260263 141 model_lifecycle.cc:789] “failed to load ‘riva-trt-hifigan-English-US’”

I1119 04:14:41.265299 141 tensorrt.cc:231] “TRITONBACKEND_ModelInitialize: riva-trt-riva-punctuation-en-US-nn-bert-base-uncased (version 1)”

I1119 04:14:41.265547 141 tensorrt.cc:297] “TRITONBACKEND_ModelInstanceInitialize: riva-trt-riva-punctuation-en-US-nn-bert-base-uncased_0_0 (GPU device 0)”

I1119 04:14:41.270502 141 tensorrt.cc:353] “TRITONBACKEND_ModelInstanceFinalize: delete instance state”

E1119 04:14:41.270507 141 backend_model.cc:692] “ERROR: Failed to create instance: unable to find ‘/data/models/riva-trt-riva-punctuation-en-US-nn-bert-base-uncased/1/model.plan’ for model instance ‘riva-trt-riva-punctuation-en-US-nn-bert-base-uncased_0_0’”

I1119 04:14:41.270511 141 tensorrt.cc:274] “TRITONBACKEND_ModelFinalize: delete model state”

E1119 04:14:41.270520 141 model_lifecycle.cc:654] “failed to load ‘riva-trt-riva-punctuation-en-US-nn-bert-base-uncased’ version 1: Unavailable: unable to find ‘/data/models/riva-trt-riva-punctuation-en-US-nn-bert-base-uncased/1/model.plan’ for model instance ‘riva-trt-riva-punctuation-en-US-nn-bert-base-uncased_0_0’”

I1119 04:14:41.270524 141 model_lifecycle.cc:789] “failed to load ‘riva-trt-riva-punctuation-en-US-nn-bert-base-uncased’”

I1119 04:14:41.275984 141 spectrogram-chunker.cc:274] “TRITONBACKEND_ModelInitialize: spectrogram_chunker-English-US (version 1)”

I1119 04:14:41.276357 141 spectrogram-chunker.cc:276] “TRITONBACKEND_ModelInstanceInitialize: spectrogram_chunker-English-US_0_0 (device 0)”

I1119 04:14:41.276672 141 model_lifecycle.cc:849] “successfully loaded ‘spectrogram_chunker-English-US’”

I1119 04:14:41.281986 141 tts-postprocessor.cc:308] “TRITONBACKEND_ModelInitialize: tts_postprocessor-English-US (version 1)”

I1119 04:14:41.282276 141 tts-postprocessor.cc:310] “TRITONBACKEND_ModelInstanceInitialize: tts_postprocessor-English-US_0_0 (device 0)”

I1119 04:14:41.293689 141 model_lifecycle.cc:849] “successfully loaded ‘tts_postprocessor-English-US’”

I1119 04:14:41.305631 141 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: tts_preprocessor-English-US (version 1)”

I1119 04:14:41.306160 141 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: tts_preprocessor-English-US_0_0 (device 0)”

WARNING: Logging before InitGoogleLogging() is written to STDERR

I1119 04:14:41.306182 146 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”

verbalizer_grammar: “verbalizer.ascii_proto”

sentence_boundary_regexp: "[\\.:!\\?] "

sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”

2025-11-19 04:14:41.346024669 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.

2025-11-19 04:14:41.346046093 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

I1119 04:14:41.368024 146 abstract-grm-manager.h:168] Updating FST 0xe7d6ed0eb560 with input label sorted version.

I1119 04:14:41.400485 146 abstract-grm-manager.h:168] Updating FST 0xe7d8b825d290 with input label sorted version.

I1119 04:14:41.402129 146 abstract-grm-manager.h:168] Updating FST 0xe7d8b84bfb30 with input label sorted version.

I1119 04:14:41.402230 146 preprocessor.cc:279] TTS character mapping loaded from /data/models/tts_preprocessor-English-US/1/mapping.txt

I1119 04:14:41.402258 146 preprocessor.cc:357] Abbreviation mapping loaded from /data/models/tts_preprocessor-English-US/1/abbr.txt

> Riva waiting for Triton server to load all models…retrying in 1 second

I1119 04:14:41.435352 141 model_lifecycle.cc:849] “successfully loaded ‘riva-onnx-fastpitch_encoder-English-US’”

I1119 04:14:41.453712 146 preprocessor.cc:397] TTS phonetic mapping loaded from /data/models/tts_preprocessor-English-US/1/ipa_cmudict_single_pron-0.82_nv24.08.txt

I1119 04:14:41.453745 146 preprocessor.cc:529] TTS Preprocessor initialized with 1 languages

I1119 04:14:41.454046 141 model_lifecycle.cc:849] “successfully loaded ‘tts_preprocessor-English-US’”

WARNING: Logging before InitGoogleLogging() is written to STDERR

I1119 04:14:41.722193 147 asr_ensemble_factory.cc:278] Loading acoustic model

I1119 04:14:41.722230 147 asr_ensemble_factory.cc:284] Done loading acoustic model

I1119 04:14:41.798803 147 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”

verbalizer_grammar: “verbalizer.ascii_proto”

sentence_boundary_regexp: "[\\.:!\\?] "

sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”

I1119 04:14:41.840123 141 model_lifecycle.cc:849] “successfully loaded ‘conformer-en-US-asr-streaming-asr-bls-ensemble’”

I1119 04:14:41.857189 148 asr_ensemble_factory.cc:278] Loading acoustic model

I1119 04:14:41.857213 148 asr_ensemble_factory.cc:284] Done loading acoustic model

I1119 04:14:41.872560 148 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”

verbalizer_grammar: “verbalizer.ascii_proto”

sentence_boundary_regexp: "[\\.:!\\?] "

sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”

I1119 04:14:41.903657 141 model_lifecycle.cc:849] “successfully loaded ‘conformer-en-US-asr-offline-asr-bls-ensemble’”

E1119 04:14:41.904077 141 model_repository_manager.cc:703] “Invalid argument: ensemble ‘fastpitch_hifigan_ensemble-English-US’ depends on ‘riva-trt-hifigan-English-US’ which has no loaded version. Model ‘riva-trt-hifigan-English-US’ loading failed with error: version 1 is at UNAVAILABLE state: Unavailable: unable to find ‘/data/models/riva-trt-hifigan-English-US/1/model.plan’ for model instance ‘riva-trt-hifigan-English-US_0_0’;”

I1119 04:14:41.904124 141 server.cc:604]

+------------------+------+
| Repository Agent | Path |
+------------------+------+

I1119 04:14:41.904160 141 server.cc:631]

+----------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend                    | Path                                                                                          | Config                                                                                                                                                         |
+----------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| riva_nlp_pipeline          | /opt/tritonserver/backends/riva_nlp_pipeline/libtriton_riva_nlp_pipeline.so                   | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| riva_asr_ensemble_pipeline | /opt/tritonserver/backends/riva_asr_ensemble_pipeline/libtriton_riva_asr_ensemble_pipeline.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| onnxruntime                | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so                               | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| tensorrt                   | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so                                     | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| riva_tts_chunker           | /opt/tritonserver/backends/riva_tts_chunker/libtriton_riva_tts_chunker.so                     | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| riva_tts_postprocessor     | /opt/tritonserver/backends/riva_tts_postprocessor/libtriton_riva_tts_postprocessor.so         | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| riva_tts_pipeline          | /opt/tritonserver/backends/riva_tts_pipeline/libtriton_riva_tts_pipeline.so                   | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+----------------------------+-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1119 04:14:41.904203 141 server.cc:674]

+-----------------------------------------------------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model                                                     | Version | Status |
+-----------------------------------------------------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| conformer-en-US-asr-offline-asr-bls-ensemble              | 1       | READY |
| conformer-en-US-asr-streaming-asr-bls-ensemble            | 1       | READY |
| riva-onnx-fastpitch_encoder-English-US                    | 1       | READY |
| riva-punctuation-en-US                                    | 1       | READY |
| riva-trt-conformer-en-US-asr-offline-am-streaming-offline | 1       | UNAVAILABLE: Unavailable: unable to find '/data/models/riva-trt-conformer-en-US-asr-offline-am-streaming-offline/1/model.plan' for model instance 'riva-trt-conformer-en-US-asr-offline-am-streaming-offline_0_0' |
| riva-trt-conformer-en-US-asr-streaming-am-streaming       | 1       | UNAVAILABLE: Unavailable: unable to find '/data/models/riva-trt-conformer-en-US-asr-streaming-am-streaming/1/model.plan' for model instance 'riva-trt-conformer-en-US-asr-streaming-am-streaming_0_0' |
| riva-trt-hifigan-English-US                               | 1       | UNAVAILABLE: Unavailable: unable to find '/data/models/riva-trt-hifigan-English-US/1/model.plan' for model instance 'riva-trt-hifigan-English-US_0_0' |
| riva-trt-riva-punctuation-en-US-nn-bert-base-uncased      | 1       | UNAVAILABLE: Unavailable: unable to find '/data/models/riva-trt-riva-punctuation-en-US-nn-bert-base-uncased/1/model.plan' for model instance 'riva-trt-riva-punctuation-en-US-nn-bert-base-uncased_0_0' |
| spectrogram_chunker-English-US                            | 1       | READY |
| tts_postprocessor-English-US                              | 1       | READY |
| tts_preprocessor-English-US                               | 1       | READY |
+-----------------------------------------------------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1119 04:14:41.942429 141 metrics.cc:890] “Collecting metrics for GPU 0: NVIDIA GB10”

I1119 04:14:41.949441 141 metrics.cc:783] “Collecting CPU metrics”

I1119 04:14:41.949499 141 tritonserver.cc:2598]

+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton |
| server_version                   | 2.54.0 |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /data/models |
| model_control_mode               | MODE_NONE |
| strict_model_config              | 1 |
| model_config_name                | |
| rate_limit                       | OFF |
| pinned_memory_pool_byte_size     | 268435456 |
| cuda_memory_pool_byte_size{0}    | 1000000000 |
| min_supported_compute_capability | 6.0 |
| strict_readiness                 | 1 |
| exit_timeout                     | 30 |
| cache_enabled                    | 0 |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1119 04:14:41.949521 141 server.cc:305] “Waiting for in-flight requests to complete.”

I1119 04:14:41.949532 141 server.cc:321] “Timeout 30: Found 0 model versions that have in-flight inferences”

I1119 04:14:41.949975 141 server.cc:336] “All models are stopped, unloading models”

I1119 04:14:41.949983 141 server.cc:345] “Timeout 30: Found 7 live models and 0 in-flight non-inference requests”

I1119 04:14:41.950474 141 pipeline_library.cc:31] “TRITONBACKEND_ModelInstanceFinalize: delete instance state”

I1119 04:14:41.950695 141 pipeline_library.cc:30] “TRITONBACKEND_ModelInstanceFinalize: delete instance state”

I1119 04:14:41.950843 141 onnxruntime.cc:2946] “TRITONBACKEND_ModelInstanceFinalize: delete instance state”

I1119 04:14:41.950891 141 pipeline_library.cc:29] “TRITONBACKEND_ModelInstanceFinalize: delete instance state”

I1119 04:14:41.951242 141 spectrogram-chunker.cc:279] “TRITONBACKEND_ModelInstanceFinalize: delete instance state”

I1119 04:14:41.951280 141 spectrogram-chunker.cc:275] “TRITONBACKEND_ModelFinalize: delete model state”

I1119 04:14:41.952507 141 tts-postprocessor.cc:313] “TRITONBACKEND_ModelInstanceFinalize: delete instance state”

I1119 04:14:41.952533 141 pipeline_library.cc:29] “TRITONBACKEND_ModelInstanceFinalize: delete instance state”

I1119 04:14:41.952554 141 model_lifecycle.cc:636] “successfully unloaded ‘spectrogram_chunker-English-US’ version 1”

I1119 04:14:41.963702 141 pipeline_library.cc:27] “TRITONBACKEND_ModelFinalize: delete model state”

I1119 04:14:41.963772 141 model_lifecycle.cc:636] “successfully unloaded ‘riva-punctuation-en-US’ version 1”

I1119 04:14:41.964476 141 onnxruntime.cc:2870] “TRITONBACKEND_ModelFinalize: delete model state”

I1119 04:14:41.964553 141 model_lifecycle.cc:636] “successfully unloaded ‘riva-onnx-fastpitch_encoder-English-US’ version 1”

I1119 04:14:41.966142 141 tts-postprocessor.cc:309] “TRITONBACKEND_ModelFinalize: delete model state”

I1119 04:14:41.967078 141 model_lifecycle.cc:636] “successfully unloaded ‘tts_postprocessor-English-US’ version 1”

I1119 04:14:41.971463 141 pipeline_library.cc:25] “TRITONBACKEND_ModelFinalize: delete model state”

I1119 04:14:41.971526 141 model_lifecycle.cc:636] “successfully unloaded ‘tts_preprocessor-English-US’ version 1”

I1119 04:14:42.085942 141 pipeline_library.cc:25] “TRITONBACKEND_ModelFinalize: delete model state”

I1119 04:14:42.086430 141 model_lifecycle.cc:636] “successfully unloaded ‘conformer-en-US-asr-offline-asr-bls-ensemble’ version 1”

I1119 04:14:42.089331 141 pipeline_library.cc:25] “TRITONBACKEND_ModelFinalize: delete model state”

I1119 04:14:42.089647 141 model_lifecycle.cc:636] “successfully unloaded ‘conformer-en-US-asr-streaming-asr-bls-ensemble’ version 1”

> Riva waiting for Triton server to load all models…retrying in 1 second

I1119 04:14:42.950485 141 server.cc:345] “Timeout 29: Found 0 live models and 0 in-flight non-inference requests”

W1119 04:14:42.951227 141 metrics.cc:644] “Unable to get power limit for GPU 0. Status:Success, value:0.000000”

W1119 04:14:42.951257 141 metrics.cc:725] “Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0”

error: creating server: Internal - failed to load all models

> Riva waiting for Triton server to load all models…retrying in 1 second

W

> Riva waiting for Triton server to load all models…retrying in 1 second

> Triton server died before reaching ready state. Terminating Riva startup.

Check Triton logs with: docker logs

/opt/riva/bin/start-riva: line 1: kill: (141) - No such process
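
The error lines in the log above all trace back to the TensorRT backend reporting that it is unable to find a model.plan file under /data/models. A small sketch for listing the affected models from saved log text follows. The function name missing_plans is hypothetical, and the pattern accepts both straight and curly quotes on the assumption that the forum rendering changed the quote characters.

```python
import re

# Matches: unable to find '<repo>/<model>/<version>/model.plan'
# \u2018 and \u2019 are the curly quotes that appear in the pasted log.
MISSING_PLAN = re.compile(
    r"unable to find [\u2018']"
    r"(?P<path>/\S*/(?P<model>[^/\s]+)/(?P<version>\d+)/model\.plan)"
    r"[\u2019']"
)


def missing_plans(log_text):
    """Map each model name to the model.plan path that could not be found."""
    found = {}
    for m in MISSING_PLAN.finditer(log_text):
        found[m.group("model")] = m.group("path")
    return found


# Illustrative sample taken from one of the error messages in this thread.
SAMPLE = (
    "failed to load \u2018riva-trt-hifigan-English-US\u2019 version 1: Unavailable: "
    "unable to find \u2018/data/models/riva-trt-hifigan-English-US/1/model.plan\u2019 "
    "for model instance \u2018riva-trt-hifigan-English-US_0_0\u2019"
)

for model, path in sorted(missing_plans(SAMPLE).items()):
    print(model, "->", path)
```

The reported paths can then be checked on the host, in whichever directory is mounted as /data/models inside the container. A missing model.plan would suggest that the TensorRT engines were never built for those models, although that is an inference from the log rather than something the log states outright.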

Here is the config.sh: