Running riva_start.sh failed

Please provide the following information when requesting support.

Hardware - NVIDIA GeForce GTX 1650 Ti
Hardware - Intel Core i5-9400F
Operating System - Ubuntu 20.04
Riva Version - Riva Speech Skills 1.10.0 Beta

I ran riva_start.sh and it failed.

$ bash riva_start.sh
Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Riva server to load all models…retrying in 10 seconds
(the line above repeats about 30 times before the health check gives up)
Health ready check failed.
Check Riva logs with: docker logs riva-speech

I checked the Docker logs and found the following messages:

==========================
=== Riva Speech Skills ===

NVIDIA Release 22.02 (build 32720915)
Riva Speech Server Version 1.10.0-beta

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:

WARNING: CUDA Minor Version Compatibility mode ENABLED.
Using driver version 470.103.01 which has support for CUDA 11.4. This container
was built with CUDA 11.6 and will be run in Minor Version Compatibility mode.
CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
with this container but was unavailable:
[[Forward compatibility was attempted on non supported HW (CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE) cuInit()=804]]
See CUDA Compatibility :: NVIDIA Data Center GPU Driver Documentation for details.

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for Riva Speech Server. NVIDIA recommends the use of the following flags:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 …

Riva waiting for Triton server to load all models…retrying in 1 second
I0408 12:45:01.876378 100 onnxruntime.cc:2319] TRITONBACKEND_Initialize: onnxruntime
I0408 12:45:01.876460 100 onnxruntime.cc:2329] Triton TRITONBACKEND API version: 1.8
I0408 12:45:01.876468 100 onnxruntime.cc:2335] 'onnxruntime' TRITONBACKEND API version: 1.8
I0408 12:45:01.876473 100 onnxruntime.cc:2365] backend configuration:
{}
I0408 12:45:02.018362 100 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f3a28000000' with size 268435456
I0408 12:45:02.018695 100 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 1000000000
E0408 12:45:02.020943 100 model_repository_manager.cc:1844] Poll failed for model directory 'riva-trt-citrinet-1024-en-US-asr-offline-am-streaming-offline': failed to open text file for read /data/models/riva-trt-citrinet-1024-en-US-asr-offline-am-streaming-offline/config.pbtxt: No such file or directory
I0408 12:45:02.022953 100 model_repository_manager.cc:994] loading: qa_qa_postprocessor:1
I0408 12:45:02.123480 100 model_repository_manager.cc:994] loading: qa_tokenizer:1
I0408 12:45:02.131935 100 qa_postprocessor_cbe.cc:124] TRITONBACKEND_ModelInitialize: qa_qa_postprocessor (version 1)
I0408 12:45:02.132803 100 backend_model.cc:255] model configuration:
{
  "name": "qa_qa_postprocessor",
  "platform": "",
  "backend": "riva_nlp_qa",
  "version_policy": {
    "latest": {
      "num_versions": 1
    }
  },
  "max_batch_size": 8,
  "input": [
    {
      "name": "QA_LOGITS__0",
      "data_type": "TYPE_FP32",
      "format": "FORMAT_NONE",
      "dims": [
        384,
        2
      ],
      "is_shape_tensor": false,
      "allow_ragged_batch": false,
      "optional": false
    },
    {
      "name": "SEQ_LEN__1",
      "data_type": "TYPE_INT64",
      "format": "FORMAT_NONE",
      "dims": [
        1
      ],
      "is_shape_tensor": false,
      "allow_ragged_batch": false,
      "optional": false
    },
    {
      "name": "TOK_STR__2",
      "data_type": "TYPE_STRING",
      "format": "FORMAT_NONE",
      "dims": [
        384
      ],
      "is_shape_tensor": false,
      "allow_ragged_batch": false,
      "optional": false
    },
    {
      "name": "TOK_TO_ORIG__3",
      "data_type": "TYPE_UINT16",
      "format": "FORMAT_NONE",
      "dims": [
        384
      ],
      "is_shape_tensor": false,
      "allow_ragged_batch": false,
      "optional": false
    },
    {
      "name": "IN_PASSAGE_STR__4",
      "data_type": "TYPE_STRING",
      "format": "FORMAT_NONE",
      "dims": [
        1
      ],
      "is_shape_tensor": false,
      "allow_ragged_batch": false,
      "optional": false
    }
  ],
  "output": [
    {
      "name": "ANSWER_SPANS__0",
      "data_type": "TYPE_STRING",
      "dims": [
        -1
      ],
      "label_filename": "",
      "is_shape_tensor": false
    },
    {
      "name": "ANSWER_SCORES__1",
      "data_type": "TYPE_FP32",
      "dims": [
        -1
      ],
      "label_filename": "",
      "is_shape_tensor": false
    }
  ],
  "batch_input": [],
  "batch_output": [],
  "optimization": {
    "priority": "PRIORITY_DEFAULT",
    "input_pinned_memory": {
      "enable": true
    },
    "output_pinned_memory": {
      "enable": true
    },
    "gather_kernel_buffer_threshold": 0,
    "eager_batching": false
  },
  "instance_group": [
    {
      "name": "qa_qa_postprocessor_0",
      "kind": "KIND_CPU",
      "count": 1,
      "gpus": [],
      "secondary_devices": [],
      "profile": [],
      "passive": false,
      "host_policy": ""
    }
  ],
  "default_model_filename": "",
  "cc_model_filenames": {},
  "metric_tags": {},
  "parameters": {
    "version_2_with_negative": {
      "string_value": "True"
    },
    "n_best_size": {
      "string_value": "20"
    },
    "max_answer_length": {
      "string_value": "30"
    },
    "bert_model_seq_length": {
      "string_value": "384"
    }
  },
  "model_warmup": []
}
I0408 12:45:02.133061 100 qa_postprocessor_cbe.cc:126] TRITONBACKEND_ModelInstanceInitialize: qa_qa_postprocessor_0 (device 0)
I0408 12:45:02.133321 100 model_repository_manager.cc:1149] successfully loaded 'qa_qa_postprocessor' version 1
I0408 12:45:02.224030 100 model_repository_manager.cc:994] loading: riva-trt-riva_ner-nn-bert-base-uncased:1
I0408 12:45:02.241504 100 tokenizer_library.cc:18] TRITONBACKEND_ModelInitialize: qa_tokenizer (version 1)
W:parameter_parser.cc:118: Parameter bos could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter dropout_prob could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter eos could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter reverse could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter tokenizer_to_lower could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
I0408 12:45:02.243767 100 backend_model.cc:255] model configuration:
{
  "name": "qa_tokenizer",
  "platform": "",
  "backend": "riva_nlp_tokenizer",
  "version_policy": {
    "latest": {
      "num_versions": 1
    }
  },
  "max_batch_size": 8,
  "input": [
    {
      "name": "IN_QUERY_STR__0",
      "data_type": "TYPE_STRING",
      "format": "FORMAT_NONE",
      "dims": [
        1
      ],
      "is_shape_tensor": false,
      "allow_ragged_batch": false,
      "optional": false
    },
    {
      "name": "IN_PASSAGE_STR__1",
      "data_type": "TYPE_STRING",
      "format": "FORMAT_NONE",
      "dims": [
        1
      ],
      "is_shape_tensor": false,
      "allow_ragged_batch": false,
      "optional": false
    }
  ],
  "output": [
    {
      "name": "SEQ__0",
      "data_type": "TYPE_INT32",
      "dims": [
        384
      ],
      "label_filename": "",
      "is_shape_tensor": false
    },
    {
      "name": "MASK__1",
      "data_type": "TYPE_INT32",
      "dims": [
        384
      ],
      "label_filename": "",
      "is_shape_tensor": false
    },
    {
      "name": "SEQ_LEN__2",
      "data_type": "TYPE_INT64",
      "dims": [
        1
      ],
      "label_filename": "",
      "is_shape_tensor": false
    },
    {
      "name": "TOK_STR__3",
      "data_type": "TYPE_STRING",
      "dims": [
        384
      ],
      "label_filename": "",
      "is_shape_tensor": false
    },
    {
      "name": "SEGMENT__4",
      "data_type": "TYPE_INT32",
      "dims": [
        384
      ],
      "label_filename": "",
      "is_shape_tensor": false
    },
    {
      "name": "TOK_TO_ORIG__5",
      "data_type": "TYPE_UINT16",
      "dims": [
        384
      ],
      "label_filename": "",
      "is_shape_tensor": false
    }
  ],
  "batch_input": [],
  "batch_output": [],
  "optimization": {
    "priority": "PRIORITY_DEFAULT",
    "input_pinned_memory": {
      "enable": true
    },
    "output_pinned_memory": {
      "enable": true
    },
    "gather_kernel_buffer_threshold": 0,
    "eager_batching": false
  },
  "instance_group": [
    {
      "name": "qa_tokenizer_0",
      "kind": "KIND_CPU",
      "count": 1,
      "gpus": [],
      "secondary_devices": [],
      "profile": [],
      "passive": false,
      "host_policy": ""
    }
  ],
  "default_model_filename": "",
  "cc_model_filenames": {},
  "metric_tags": {},
  "parameters": {
    "unk_token": {
      "string_value": "[UNK]"
    },
    "vocab": {
      "string_value": "/data/models/qa_tokenizer/1/tokenizer.vocab_file"
    },
    "tokenizer": {
      "string_value": "wordpiece"
    },
    "max_query_length": {
      "string_value": "64"
    },
    "bos_token": {
      "string_value": "[CLS]"
    },
    "to_lower": {
      "string_value": "true"
    },
    "eos_token": {
      "string_value": "[SEP]"
    },
    "task": {
      "string_value": "qa"
    },
    "doc_stride": {
      "string_value": "128"
    }
  },
  "model_warmup": []
}
I0408 12:45:02.244480 100 tokenizer_library.cc:21] TRITONBACKEND_ModelInstanceInitialize: qa_tokenizer_0 (device 0)
I0408 12:45:02.275000 100 model_repository_manager.cc:1149] successfully loaded 'qa_tokenizer' version 1
I0408 12:45:02.324704 100 model_repository_manager.cc:994] loading: riva-trt-riva_qa-nn-bert-base-uncased:1
I0408 12:45:02.389447 100 tensorrt.cc:5145] TRITONBACKEND_Initialize: tensorrt
I0408 12:45:02.389470 100 tensorrt.cc:5155] Triton TRITONBACKEND API version: 1.8
I0408 12:45:02.389477 100 tensorrt.cc:5161] 'tensorrt' TRITONBACKEND API version: 1.8
I0408 12:45:02.389560 100 tensorrt.cc:5204] backend configuration:
{}
I0408 12:45:02.389585 100 tensorrt.cc:5256] TRITONBACKEND_ModelInitialize: riva-trt-riva_ner-nn-bert-base-uncased (version 1)
I0408 12:45:02.390143 100 tensorrt.cc:5305] TRITONBACKEND_ModelInstanceInitialize: riva-trt-riva_ner-nn-bert-base-uncased_0 (GPU device 0)
I0408 12:45:02.424962 100 model_repository_manager.cc:994] loading: token_classification_detokenizer:1
I0408 12:45:02.525213 100 model_repository_manager.cc:994] loading: token_classification_label_tokens:1
I0408 12:45:02.625474 100 model_repository_manager.cc:994] loading: token_classification_tokenizer:1
I0408 12:45:02.693092 100 logging.cc:49] [MemUsageChange] Init CUDA: CPU +314, GPU +0, now: CPU 351, GPU 1723 (MiB)
Riva waiting for Triton server to load all models…retrying in 1 second
I0408 12:45:02.884058 100 logging.cc:49] Loaded engine size: 209 MiB
I0408 12:45:03.483453 100 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 1419, GPU 2289 (MiB)
I0408 12:45:03.598646 100 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +114, GPU +52, now: CPU 1533, GPU 2341 (MiB)
I0408 12:45:03.599335 100 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +95, now: CPU 0, GPU 95 (MiB)
I0408 12:45:03.614462 100 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1114, GPU 2333 (MiB)
I0408 12:45:03.615487 100 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1114, GPU 2341 (MiB)
I0408 12:45:03.677797 100 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +59, now: CPU 0, GPU 154 (MiB)
I0408 12:45:03.678006 100 tensorrt.cc:1409] Created instance riva-trt-riva_ner-nn-bert-base-uncased_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0408 12:45:03.678069 100 tensorrt.cc:5256] TRITONBACKEND_ModelInitialize: riva-trt-riva_qa-nn-bert-base-uncased (version 1)
I0408 12:45:03.678287 100 model_repository_manager.cc:1149] successfully loaded 'riva-trt-riva_ner-nn-bert-base-uncased' version 1
I0408 12:45:03.678862 100 detokenizer_cbe.cc:145] TRITONBACKEND_ModelInitialize: token_classification_detokenizer (version 1)
I0408 12:45:03.679285 100 backend_model.cc:255] model configuration:
{
  "name": "token_classification_detokenizer",
  "platform": "",
  "backend": "riva_nlp_detokenizer",
  "version_policy": {
    "latest": {
      "num_versions": 1
    }
  },
  "max_batch_size": 8,
  "input": [
    {
      "name": "IN_TOKEN_LABELS__0",
      "data_type": "TYPE_STRING",
      "format": "FORMAT_NONE",
      "dims": [
        -1
      ],
      "is_shape_tensor": false,
      "allow_ragged_batch": false,
      "optional": false
    },
    {
      "name": "IN_TOKEN_SCORES__1",
      "data_type": "TYPE_FP32",
      "format": "FORMAT_NONE",
      "dims": [
        -1
      ],
      "is_shape_tensor": false,
      "allow_ragged_batch": false,
      "optional": false
    },
    {
      "name": "IN_SEQ_LEN__2",
      "data_type": "TYPE_INT64",
      "format": "FORMAT_NONE",
      "dims": [
        1
      ],
      "is_shape_tensor": false,
      "allow_ragged_batch": false,
      "optional": false
    },
    {
      "name": "IN_TOK_STR__3",
      "data_type": "TYPE_STRING",
      "format": "FORMAT_NONE",
      "dims": [
        -1
      ],
      "is_shape_tensor": false,
      "allow_ragged_batch": false,
      "optional": false
    }
  ],
  "output": [
    {
      "name": "OUT_TOKEN_LABELS__0",
      "data_type": "TYPE_STRING",
      "dims": [
        -1
      ],
      "label_filename": "",
      "is_shape_tensor": false
    },
    {
      "name": "OUT_TOKEN_SCORES__1",
      "data_type": "TYPE_FP32",
      "dims": [
        -1
      ],
      "label_filename": "",
      "is_shape_tensor": false
    },
    {
      "name": "OUT_SEQ_LEN__2",
      "data_type": "TYPE_INT64",
      "dims": [
        1
      ],
      "label_filename": "",
      "is_shape_tensor": false
    },
    {
      "name": "OUT_TOK_STR__3",
      "data_type": "TYPE_STRING",
      "dims": [
        -1
      ],
      "label_filename": "",
      "is_shape_tensor": false
    }
  ],
  "batch_input": [],
  "batch_output": [],
  "optimization": {
    "priority": "PRIORITY_DEFAULT",
    "cuda": {
      "graphs": false,
      "busy_wait_events": false,
      "graph_spec": [],
      "output_copy_stream": true
    },
    "input_pinned_memory": {
      "enable": true
    },
    "output_pinned_memory": {
      "enable": true
    },
    "gather_kernel_buffer_threshold": 0,
    "eager_batching": false
  },
  "instance_group": [
    {
      "name": "token_classification_detokenizer_0",
      "kind": "KIND_CPU",
      "count": 1,
      "gpus": [],
      "secondary_devices": [],
      "profile": [],
      "passive": false,
      "host_policy": ""
    }
  ],
  "default_model_filename": "",
  "cc_model_filenames": {},
  "metric_tags": {},
  "parameters": {},
  "model_warmup": [],
  "model_transaction_policy": {
    "decoupled": false
  }
}
I0408 12:45:03.679453 100 detokenizer_cbe.cc:147] TRITONBACKEND_ModelInstanceInitialize: token_classification_detokenizer_0 (device 0)
I0408 12:45:03.679545 100 model_repository_manager.cc:1149] successfully loaded 'token_classification_detokenizer' version 1
I0408 12:45:03.679800 100 tensorrt.cc:5305] TRITONBACKEND_ModelInstanceInitialize: riva-trt-riva_qa-nn-bert-base-uncased_0 (GPU device 0)
I0408 12:45:03.680293 100 logging.cc:49] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1228, GPU 2545 (MiB)
I0408 12:45:03.870424 100 logging.cc:49] Loaded engine size: 208 MiB
Riva waiting for Triton server to load all models…retrying in 1 second
I0408 12:45:03.984435 100 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1760, GPU 2887 (MiB)
I0408 12:45:03.985024 100 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 1760, GPU 2897 (MiB)
I0408 12:45:03.985336 100 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +95, now: CPU 0, GPU 249 (MiB)
I0408 12:45:04.000701 100 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1344, GPU 2889 (MiB)
I0408 12:45:04.001185 100 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1344, GPU 2897 (MiB)
I0408 12:45:04.091156 100 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +62, now: CPU 0, GPU 311 (MiB)
I0408 12:45:04.091373 100 tensorrt.cc:1409] Created instance riva-trt-riva_qa-nn-bert-base-uncased_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0408 12:45:04.091447 100 sequence_label_cbe.cc:137] TRITONBACKEND_ModelInitialize: token_classification_label_tokens (version 1)
I0408 12:45:04.091637 100 model_repository_manager.cc:1149] successfully loaded 'riva-trt-riva_qa-nn-bert-base-uncased' version 1
I0408 12:45:04.091921 100 backend_model.cc:255] model configuration:
{
  "name": "token_classification_label_tokens",
  "platform": "",
  "backend": "riva_nlp_seqlabel",
  "version_policy": {
    "latest": {
      "num_versions": 1
    }
  },
  "max_batch_size": 8,
  "input": [
    {
      "name": "TOKEN_LOGIT__1",
      "data_type": "TYPE_FP32",
      "format": "FORMAT_NONE",
      "dims": [
        -1,
        13
      ],
      "is_shape_tensor": false,
      "allow_ragged_batch": false,
      "optional": false
    }
  ],
  "output": [
    {
      "name": "TOKEN_LABELS__0",
      "data_type": "TYPE_STRING",
      "dims": [
        -1
      ],
      "label_filename": "",
      "is_shape_tensor": false
    },
    {
      "name": "TOKEN_SCORES__1",
      "data_type": "TYPE_FP32",
      "dims": [
        -1
      ],
      "label_filename": "",
      "is_shape_tensor": false
    }
  ],
  "batch_input": [],
  "batch_output": [],
  "optimization": {
    "priority": "PRIORITY_DEFAULT",
    "input_pinned_memory": {
      "enable": true
    },
    "output_pinned_memory": {
      "enable": true
    },
    "gather_kernel_buffer_threshold": 0,
    "eager_batching": false
  },
  "instance_group": [
    {
      "name": "token_classification_label_tokens_0",
      "kind": "KIND_CPU",
      "count": 1,
      "gpus": [],
      "secondary_devices": [],
      "profile": [],
      "passive": false,
      "host_policy": ""
    }
  ],
  "default_model_filename": "",
  "cc_model_filenames": {},
  "metric_tags": {},
  "parameters": {
    "classes": {
      "string_value": "/data/models/token_classification_label_tokens/1/label_ids.csv"
    }
  },
  "model_warmup": []
}
I0408 12:45:04.091995 100 tokenizer_library.cc:18] TRITONBACKEND_ModelInitialize: token_classification_tokenizer (version 1)
W:parameter_parser.cc:118: Parameter doc_stride could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter max_query_length could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter bos could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter doc_stride could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter dropout_prob could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter eos could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter max_query_length could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter reverse could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter tokenizer_to_lower could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
I0408 12:45:04.092467 100 backend_model.cc:255] model configuration:
{
  "name": "token_classification_tokenizer",
  "platform": "",
  "backend": "riva_nlp_tokenizer",
  "version_policy": {
    "latest": {
      "num_versions": 1
    }
  },
  "max_batch_size": 8,
  "input": [
    {
      "name": "INPUT_STR__0",
      "data_type": "TYPE_STRING",
      "format": "FORMAT_NONE",
      "dims": [
        1
      ],
      "is_shape_tensor": false,
      "allow_ragged_batch": false,
      "optional": false
    }
  ],
  "output": [
    {
      "name": "SEQ__0",
      "data_type": "TYPE_INT32",
      "dims": [
        128
      ],
      "label_filename": "",
      "is_shape_tensor": false
    },
    {
      "name": "MASK__1",
      "data_type": "TYPE_INT32",
      "dims": [
        128
      ],
      "label_filename": "",
      "is_shape_tensor": false
    },
    {
      "name": "SEGMENT__4",
      "data_type": "TYPE_INT32",
      "dims": [
        128
      ],
      "label_filename": "",
      "is_shape_tensor": false
    },
    {
      "name": "SEQ_LEN__2",
      "data_type": "TYPE_INT64",
      "dims": [
        1
      ],
      "label_filename": "",
      "is_shape_tensor": false
    },
    {
      "name": "TOK_STR__3",
      "data_type": "TYPE_STRING",
      "dims": [
        128
      ],
      "label_filename": "",
      "is_shape_tensor": false
    }
  ],
  "batch_input": [],
  "batch_output": [],
  "optimization": {
    "priority": "PRIORITY_DEFAULT",
    "input_pinned_memory": {
      "enable": true
    },
    "output_pinned_memory": {
      "enable": true
    },
    "gather_kernel_buffer_threshold": 0,
    "eager_batching": false
  },
  "instance_group": [
    {
      "name": "token_classification_tokenizer_0",
      "kind": "KIND_CPU",
      "count": 1,
      "gpus": [],
      "secondary_devices": [],
      "profile": [],
      "passive": false,
      "host_policy": ""
    }
  ],
  "default_model_filename": "",
  "cc_model_filenames": {},
  "metric_tags": {},
  "parameters": {
    "vocab": {
      "string_value": "/data/models/token_classification_tokenizer/1/tokenizer.vocab_file"
    },
    "tokenizer": {
      "string_value": "wordpiece"
    },
    "bos_token": {
      "string_value": "[CLS]"
    },
    "eos_token": {
      "string_value": "[SEP]"
    },
    "to_lower": {
      "string_value": "true"
    },
    "task": {
      "string_value": "single_input"
    },
    "unk_token": {
      "string_value": "[UNK]"
    }
  },
  "model_warmup": []
}
I0408 12:45:04.092498 100 sequence_label_cbe.cc:139] TRITONBACKEND_ModelInstanceInitialize: token_classification_label_tokens_0 (device 0)
I0408 12:45:04.092589 100 model_repository_manager.cc:1149] successfully loaded 'token_classification_label_tokens' version 1
I0408 12:45:04.092623 100 tokenizer_library.cc:21] TRITONBACKEND_ModelInstanceInitialize: token_classification_tokenizer_0 (device 0)
I0408 12:45:04.103258 100 model_repository_manager.cc:1149] successfully loaded 'token_classification_tokenizer' version 1
I0408 12:45:04.103609 100 model_repository_manager.cc:994] loading: riva_ner:1
I0408 12:45:04.204070 100 model_repository_manager.cc:994] loading: riva_qa:1
I0408 12:45:04.304570 100 model_repository_manager.cc:1149] successfully loaded 'riva_ner' version 1
I0408 12:45:04.304916 100 model_repository_manager.cc:1149] successfully loaded 'riva_qa' version 1
I0408 12:45:04.305133 100 server.cc:522]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0408 12:45:04.305352 100 server.cc:549]
+----------------------+------------------------------------------------------------------------------------+--------+
| Backend              | Path                                                                               | Config |
+----------------------+------------------------------------------------------------------------------------+--------+
| onnxruntime          | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so                   | {}     |
| riva_nlp_qa          | /opt/tritonserver/backends/riva_nlp_qa/libtriton_riva_nlp_qa.so                   | {}     |
| riva_nlp_tokenizer   | /opt/tritonserver/backends/riva_nlp_tokenizer/libtriton_riva_nlp_tokenizer.so     | {}     |
| tensorrt             | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so                         | {}     |
| riva_nlp_detokenizer | /opt/tritonserver/backends/riva_nlp_detokenizer/libtriton_riva_nlp_detokenizer.so | {}     |
| riva_nlp_seqlabel    | /opt/tritonserver/backends/riva_nlp_seqlabel/libtriton_riva_nlp_seqlabel.so       | {}     |
+----------------------+------------------------------------------------------------------------------------+--------+

I0408 12:45:04.305551 100 server.cc:592]
+-----------------------------------------+---------+--------+
| Model                                   | Version | Status |
+-----------------------------------------+---------+--------+
| qa_qa_postprocessor                     | 1       | READY  |
| qa_tokenizer                            | 1       | READY  |
| riva-trt-riva_ner-nn-bert-base-uncased  | 1       | READY  |
| riva-trt-riva_qa-nn-bert-base-uncased   | 1       | READY  |
| riva_ner                                | 1       | READY  |
| riva_qa                                 | 1       | READY  |
| token_classification_detokenizer        | 1       | READY  |
| token_classification_label_tokens       | 1       | READY  |
| token_classification_tokenizer          | 1       | READY  |
+-----------------------------------------+---------+--------+

I0408 12:45:04.370080 100 metrics.cc:623] Collecting metrics for GPU 0: NVIDIA GeForce GTX 1650 SUPER
I0408 12:45:04.370385 100 tritonserver.cc:1932]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                         |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                        |
| server_version                   | 2.19.0                                                                                                                                                                                        |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0]         | /data/models                                                                                                                                                                                  |
| model_control_mode               | MODE_NONE                                                                                                                                                                                     |
| strict_model_config              | 1                                                                                                                                                                                             |
| rate_limit                       | OFF                                                                                                                                                                                           |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                     |
| cuda_memory_pool_byte_size{0}    | 1000000000                                                                                                                                                                                    |
| response_cache_byte_size         | 0                                                                                                                                                                                             |
| min_supported_compute_capability | 6.0                                                                                                                                                                                           |
| strict_readiness                 | 1                                                                                                                                                                                             |
| exit_timeout                     | 30                                                                                                                                                                                            |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0408 12:45:04.370395 100 server.cc:252] Waiting for in-flight requests to complete.
I0408 12:45:04.370401 100 model_repository_manager.cc:1026] unloading: token_classification_tokenizer:1
I0408 12:45:04.370445 100 model_repository_manager.cc:1026] unloading: token_classification_label_tokens:1
I0408 12:45:04.370481 100 model_repository_manager.cc:1026] unloading: riva_qa:1
I0408 12:45:04.370567 100 model_repository_manager.cc:1026] unloading: token_classification_detokenizer:1
I0408 12:45:04.370575 100 sequence_label_cbe.cc:141] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0408 12:45:04.370607 100 model_repository_manager.cc:1026] unloading: riva-trt-riva_qa-nn-bert-base-uncased:1
I0408 12:45:04.370617 100 sequence_label_cbe.cc:138] TRITONBACKEND_ModelFinalize: delete model state
I0408 12:45:04.370624 100 model_repository_manager.cc:1132] successfully unloaded 'riva_qa' version 1
I0408 12:45:04.370639 100 tokenizer_library.cc:25] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0408 12:45:04.370676 100 model_repository_manager.cc:1026] unloading: riva-trt-riva_ner-nn-bert-base-uncased:1
I0408 12:45:04.370721 100 model_repository_manager.cc:1026] unloading: riva_ner:1
I0408 12:45:04.370768 100 model_repository_manager.cc:1026] unloading: qa_tokenizer:1
I0408 12:45:04.370847 100 model_repository_manager.cc:1026] unloading: qa_qa_postprocessor:1
I0408 12:45:04.370865 100 tensorrt.cc:5343] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0408 12:45:04.370886 100 detokenizer_cbe.cc:149] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0408 12:45:04.370905 100 tensorrt.cc:5343] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0408 12:45:04.370911 100 detokenizer_cbe.cc:146] TRITONBACKEND_ModelFinalize: delete model state
I0408 12:45:04.370923 100 server.cc:267] Timeout 30: Found 8 live models and 0 in-flight non-inference requests

I0408 12:45:04.370955 100 model_repository_manager.cc:1132] successfully unloaded 'riva_ner' version 1
I0408 12:45:04.370984 100 tokenizer_library.cc:25] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0408 12:45:04.371234 100 model_repository_manager.cc:1132] successfully unloaded 'token_classification_label_tokens' version 1
I0408 12:45:04.371671 100 model_repository_manager.cc:1132] successfully unloaded 'token_classification_detokenizer' version 1
I0408 12:45:04.371737 100 qa_postprocessor_cbe.cc:128] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0408 12:45:04.371755 100 qa_postprocessor_cbe.cc:125] TRITONBACKEND_ModelFinalize: delete model state
I0408 12:45:04.372342 100 model_repository_manager.cc:1132] successfully unloaded 'qa_qa_postprocessor' version 1
I0408 12:45:04.373135 100 tokenizer_library.cc:20] TRITONBACKEND_ModelFinalize: delete model state
I0408 12:45:04.373422 100 tokenizer_library.cc:20] TRITONBACKEND_ModelFinalize: delete model state
I0408 12:45:04.374046 100 model_repository_manager.cc:1132] successfully unloaded 'token_classification_tokenizer' version 1
I0408 12:45:04.376284 100 model_repository_manager.cc:1132] successfully unloaded 'qa_tokenizer' version 1
I0408 12:45:04.378483 100 tensorrt.cc:5282] TRITONBACKEND_ModelFinalize: delete model state
I0408 12:45:04.381356 100 tensorrt.cc:5282] TRITONBACKEND_ModelFinalize: delete model state
I0408 12:45:04.391763 100 model_repository_manager.cc:1132] successfully unloaded 'riva-trt-riva_ner-nn-bert-base-uncased' version 1
I0408 12:45:04.394650 100 model_repository_manager.cc:1132] successfully unloaded 'riva-trt-riva_qa-nn-bert-base-uncased' version 1

Riva waiting for Triton server to load all models…retrying in 1 second
I0408 12:45:05.371090 100 server.cc:267] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Triton server died before reaching ready state. Terminating Riva startup.
Check Triton logs with: docker logs
/opt/riva/bin/start-riva: line 1: kill: (100) - No such process

I can’t find the problem and need your help. Thanks!

Hi @1405956117,

Thanks for your interest in Riva, and thanks for sharing the logs.

Since the GTX 1650 Ti has 4 GB of VRAM, we recommend running a single model at a time, for both functionality and performance.

You can change this in config.sh. At the top level, enable only the service category you are interested in; for example, to run only NLP, disable all the other Riva services:

service_enabled_asr=false
service_enabled_nlp=true
service_enabled_tts=false

Once you have decided which module to use (ASR, NLP, or TTS), go further down in config.sh, find the exact model you intend to use, and comment out the other models.

For example, in NLP, if you choose to run the Text Classification model for 4 classes (weather), keep that model uncommented and comment out all the rest, as in the sketch below.
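
Here is a minimal sketch of what that part of config.sh could look like afterwards. The model entries below are illustrative placeholders, not necessarily the exact lines in your config.sh; keep the names your file already contains:

# Keep only the model you need; comment out the rest.
# Illustrative placeholders -- use the entries already present in your config.sh.
models_nlp=(
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_named_entity_recognition:${riva_ngc_model_version}"
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_question_answering:${riva_ngc_model_version}"
    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_text_classification:${riva_ngc_model_version}"
)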

So, we recommend the following steps (a combined command sketch follows the list):

  1. Run bash riva_clean.sh
  2. Modify config.sh to run only a single model
  3. Run bash riva_init.sh and share the logs (if you face any error)
  4. Run bash riva_start.sh and share the logs (if you face any error)
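
Put together, the sequence looks like this (run from the quick start directory; the directory name below is illustrative):

cd riva_quickstart_v1.10.0-beta   # illustrative path; use your own quick start directory
bash riva_clean.sh                # remove previously deployed models and containers
# edit config.sh as described above: one service enabled, one model uncommented
bash riva_init.sh                 # download and deploy only the enabled model
bash riva_start.sh                # start the Riva server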

You can try the above steps and let us know if they work for you.
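
Once riva_start.sh reports success, a quick sanity check from the host is to tail the container log (this assumes the default container name riva-speech used by the quick start scripts):

docker logs riva-speech 2>&1 | tail -n 20   # the last lines should show the server listening, with no load errors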