==========================
=== Riva Speech Skills ===
==========================
NVIDIA Release (build 45250441)
Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
> Riva waiting for Triton server to load all models...retrying in 1 second
I1010 16:01:50.215963 103 onnxruntime.cc:2400] TRITONBACKEND_Initialize: onnxruntime
I1010 16:01:50.216175 103 onnxruntime.cc:2410] Triton TRITONBACKEND API version: 1.9
I1010 16:01:50.216343 103 onnxruntime.cc:2416] 'onnxruntime' TRITONBACKEND API version: 1.9
I1010 16:01:50.216364 103 onnxruntime.cc:2446] backend configuration: {}
I1010 16:01:50.400309 103 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fd47e000000' with size 268435456
I1010 16:01:50.400535 103 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 1000000000
E1010 16:01:50.482470 103 model_repository_manager.cc:2064] Poll failed for model directory 'riva-trt-conformer-en-US-asr-offline-am-streaming-offline': failed to open text file for read /data/models/riva-trt-conformer-en-US-asr-offline-am-streaming-offline/config.pbtxt: No such file or directory
E1010 16:01:50.496880 103 model_repository_manager.cc:1420] Invalid argument: ensemble conformer-en-US-asr-offline contains models that are not available: riva-trt-conformer-en-US-asr-offline-am-streaming-offline
I1010 16:01:50.497083 103 model_repository_manager.cc:1077] loading: conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline:1
I1010 16:01:50.597552 103 model_repository_manager.cc:1077] loading: conformer-en-US-asr-offline-endpointing-streaming-offline:1
I1010 16:01:50.697914 103 model_repository_manager.cc:1077] loading: conformer-en-US-asr-offline-feature-extractor-streaming-offline:1
I1010 16:01:50.798468 103 model_repository_manager.cc:1077] loading: conformer-en-US-asr-streaming-ctc-decoder-cpu-streaming:1
I1010 16:01:50.898713 103 model_repository_manager.cc:1077] loading: conformer-en-US-asr-streaming-endpointing-streaming:1
I1010 16:01:50.999216 103 model_repository_manager.cc:1077] loading: conformer-en-US-asr-streaming-feature-extractor-streaming:1
> Riva waiting for Triton server to load all models...retrying in 1 second
I1010 16:01:51.082555 103 endpointing_library.cc:18] TRITONBACKEND_ModelInitialize: conformer-en-US-asr-offline-endpointing-streaming-offline (version 1)
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1010 16:01:51.087898 111 parameter_parser.cc:144] Parameter 'chunk_size' set but unused.
W1010 16:01:51.087939 111 parameter_parser.cc:144] Parameter 'ms_per_timestep' set but unused.
W1010 16:01:51.087947 111 parameter_parser.cc:144] Parameter 'residue_blanks_at_end' set but unused.
W1010 16:01:51.087953 111 parameter_parser.cc:144] Parameter 'residue_blanks_at_start' set but unused.
W1010 16:01:51.087960 111 parameter_parser.cc:144] Parameter 'start_history' set but unused.
W1010 16:01:51.087966 111 parameter_parser.cc:144] Parameter 'start_th' set but unused.
W1010 16:01:51.087972 111 parameter_parser.cc:144] Parameter 'stop_history' set but unused.
W1010 16:01:51.087980 111 parameter_parser.cc:144] Parameter 'stop_th' set but unused.
W1010 16:01:51.087985 111 parameter_parser.cc:144] Parameter 'streaming' set but unused.
W1010 16:01:51.087991 111 parameter_parser.cc:144] Parameter 'use_subword' set but unused.
W1010 16:01:51.087998 111 parameter_parser.cc:144] Parameter 'vocab_file' set but unused.
I1010 16:01:51.088688 103 backend_model.cc:303] model configuration: { "name": "conformer-en-US-asr-offline-endpointing-streaming-offline", "platform": "", "backend": "riva_asr_endpointing", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 2048, "input": [ { "name": "CLASS_LOGITS", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ -1, 257 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "SEGMENTS_START_END", "data_type": "TYPE_FP32", "dims": [ -1, 2 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "graph": { "level": 0 }, "priority": "PRIORITY_DEFAULT", "cuda": { "graphs": false, "busy_wait_events": false, "graph_spec": [], "output_copy_stream": true }, "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "sequence_batching": { "max_sequence_idle_microseconds": 60000000, "control_input": [ { "name": "START", "control": [ { "kind": "CONTROL_SEQUENCE_START", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "READY", "control": [ { "kind": "CONTROL_SEQUENCE_READY", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] } ], "state": [] }, "instance_group": [ { "name": "conformer-en-US-asr-offline-endpointing-streaming-offline_0", "kind": "KIND_CPU", "count": 1, "gpus": [], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "", "cc_model_filenames": {}, "metric_tags": {}, "parameters": { "use_subword": { "string_value": "True" }, "streaming": { "string_value": "True" }, "stop_history": { "string_value": "800" }, "residue_blanks_at_end": { "string_value": "0" }, "start_th": { "string_value": "0.2" }, "chunk_size": { "string_value": "4.8" }, 
"endpointing_type": { "string_value": "greedy_ctc" }, "stop_th": { "string_value": "0.98" }, "vocab_file": { "string_value": "/data/models/conformer-en-US-asr-offline-endpointing-streaming-offline/1/riva_decoder_vocabulary.txt" }, "start_history": { "string_value": "200" }, "residue_blanks_at_start": { "string_value": "0" }, "ms_per_timestep": { "string_value": "40" } }, "model_warmup": [], "model_transaction_policy": { "decoupled": false } } I1010 16:01:51.088836 103 ctc-decoder-library.cc:20] TRITONBACKEND_ModelInitialize: conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline (version 1) WARNING: Logging before InitGoogleLogging() is written to STDERR W1010 16:01:51.091704 110 parameter_parser.cc:144] Parameter 'beam_size' set but unused. W1010 16:01:51.091738 110 parameter_parser.cc:144] Parameter 'beam_size_token' set but unused. W1010 16:01:51.091745 110 parameter_parser.cc:144] Parameter 'beam_threshold' set but unused. W1010 16:01:51.091751 110 parameter_parser.cc:144] Parameter 'blank_token' set but unused. W1010 16:01:51.091758 110 parameter_parser.cc:144] Parameter 'decoder_num_worker_threads' set but unused. W1010 16:01:51.091764 110 parameter_parser.cc:144] Parameter 'forerunner_beam_size' set but unused. W1010 16:01:51.091770 110 parameter_parser.cc:144] Parameter 'forerunner_beam_size_token' set but unused. W1010 16:01:51.091776 110 parameter_parser.cc:144] Parameter 'forerunner_beam_threshold' set but unused. W1010 16:01:51.091782 110 parameter_parser.cc:144] Parameter 'forerunner_use_lm' set but unused. W1010 16:01:51.091789 110 parameter_parser.cc:144] Parameter 'language_model_file' set but unused. W1010 16:01:51.091795 110 parameter_parser.cc:144] Parameter 'lexicon_file' set but unused. W1010 16:01:51.091802 110 parameter_parser.cc:144] Parameter 'lm_weight' set but unused. W1010 16:01:51.091809 110 parameter_parser.cc:144] Parameter 'log_add' set but unused. 
W1010 16:01:51.091814 110 parameter_parser.cc:144] Parameter 'max_execution_batch_size' set but unused.
W1010 16:01:51.091822 110 parameter_parser.cc:144] Parameter 'max_supported_transcripts' set but unused.
W1010 16:01:51.091830 110 parameter_parser.cc:144] Parameter 'num_tokenization' set but unused.
W1010 16:01:51.091835 110 parameter_parser.cc:144] Parameter 'profane_words_file' set but unused.
W1010 16:01:51.091843 110 parameter_parser.cc:144] Parameter 'set_default_index_to_unk_token' set but unused.
W1010 16:01:51.091850 110 parameter_parser.cc:144] Parameter 'sil_token' set but unused.
W1010 16:01:51.091857 110 parameter_parser.cc:144] Parameter 'smearing_mode' set but unused.
W1010 16:01:51.091866 110 parameter_parser.cc:144] Parameter 'tokenizer_model' set but unused.
W1010 16:01:51.091872 110 parameter_parser.cc:144] Parameter 'unk_score' set but unused.
W1010 16:01:51.091879 110 parameter_parser.cc:144] Parameter 'unk_token' set but unused.
W1010 16:01:51.091886 110 parameter_parser.cc:144] Parameter 'vocab_file' set but unused.
W1010 16:01:51.091893 110 parameter_parser.cc:144] Parameter 'word_insertion_score' set but unused.
I1010 16:01:51.099646 103 model_repository_manager.cc:1077] loading: intent_slot_detokenizer:1 I1010 16:01:51.103787 103 backend_model.cc:303] model configuration: { "name": "conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline", "platform": "", "backend": "riva_asr_decoder", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 1024, "input": [ { "name": "CLASS_LOGITS", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ -1, 257 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "END_FLAG", "data_type": "TYPE_UINT32", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "SEGMENTS_START_END", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ -1, 2 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "CUSTOM_CONFIGURATION", "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [ -1, 2 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "FINAL_TRANSCRIPTS", "data_type": "TYPE_STRING", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "FINAL_TRANSCRIPTS_SCORE", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "FINAL_WORDS_START_END", "data_type": "TYPE_INT32", "dims": [ -1, 2 ], "label_filename": "", "is_shape_tensor": false }, { "name": "PARTIAL_TRANSCRIPTS", "data_type": "TYPE_STRING", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "PARTIAL_TRANSCRIPTS_STABILITY", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "PARTIAL_WORDS_START_END", "data_type": "TYPE_INT32", "dims": [ -1, 2 ], "label_filename": "", "is_shape_tensor": false }, { "name": "FINAL_WORDS_SCORE", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { 
"name": "PARTIAL_WORDS_SCORE", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "graph": { "level": 0 }, "priority": "PRIORITY_DEFAULT", "cuda": { "graphs": false, "busy_wait_events": false, "graph_spec": [], "output_copy_stream": true }, "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "sequence_batching": { "oldest": { "max_candidate_sequences": 1024, "preferred_batch_size": [ 32, 64 ], "max_queue_delay_microseconds": 1000 }, "max_sequence_idle_microseconds": 60000000, "control_input": [ { "name": "START", "control": [ { "kind": "CONTROL_SEQUENCE_START", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "READY", "control": [ { "kind": "CONTROL_SEQUENCE_READY", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "END", "control": [ { "kind": "CONTROL_SEQUENCE_END", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "CORRID", "control": [ { "kind": "CONTROL_SEQUENCE_CORRID", "int32_false_true": [], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_UINT64" } ] } ], "state": [] }, "instance_group": [ { "name": "conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline_0", "kind": "KIND_CPU", "count": 1, "gpus": [], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "", "cc_model_filenames": {}, "metric_tags": {}, "parameters": { "vocab_file": { "string_value": "/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/riva_decoder_vocabulary.txt" }, "ms_per_timestep": { "string_value": "40" }, "use_subword": { "string_value": "True" }, "streaming": { "string_value": 
"True" }, "beam_size": { "string_value": "32" }, "right_padding_size": { "string_value": "1.6" }, "beam_size_token": { "string_value": "16" }, "sil_token": { "string_value": "▁" }, "num_tokenization": { "string_value": "1" }, "beam_threshold": { "string_value": "20.0" }, "language_model_file": { "string_value": "/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/4gram-pruned-0_2_7_9-en-lm-set-2.0.bin" }, "tokenizer_model": { "string_value": "/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/8b8f095152034e98b24ab33726708bd0_tokenizer.model" }, "max_execution_batch_size": { "string_value": "1024" }, "forerunner_use_lm": { "string_value": "true" }, "forerunner_beam_size_token": { "string_value": "8" }, "profane_words_file": { "string_value": "/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/profane_words_file.txt" }, "forerunner_beam_threshold": { "string_value": "10.0" }, "decoder_num_worker_threads": { "string_value": "-1" }, "asr_model_delay": { "string_value": "-1" }, "word_insertion_score": { "string_value": "1.0" }, "unk_token": { "string_value": "" }, "left_padding_size": { "string_value": "1.6" }, "set_default_index_to_unk_token": { "string_value": "False" }, "decoder_type": { "string_value": "flashlight" }, "forerunner_beam_size": { "string_value": "8" }, "unk_score": { "string_value": "-inf" }, "chunk_size": { "string_value": "4.8" }, "max_supported_transcripts": { "string_value": "1" }, "lexicon_file": { "string_value": "/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/lexicon.txt" }, "smearing_mode": { "string_value": "max" }, "log_add": { "string_value": "True" }, "lm_weight": { "string_value": "0.8" }, "blank_token": { "string_value": "#" } }, "model_warmup": [], "model_transaction_policy": { "decoupled": false } } I1010 16:01:51.106247 103 feature-extractor.cc:400] TRITONBACKEND_ModelInitialize: 
conformer-en-US-asr-offline-feature-extractor-streaming-offline (version 1) I1010 16:01:51.159492 103 backend_model.cc:303] model configuration: { "name": "conformer-en-US-asr-offline-feature-extractor-streaming-offline", "platform": "", "backend": "riva_asr_features", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 512, "input": [ { "name": "AUDIO_SIGNAL", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "SAMPLE_RATE", "data_type": "TYPE_UINT32", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "AUDIO_FEATURES", "data_type": "TYPE_FP32", "dims": [ 80, -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "AUDIO_PROCESSED", "data_type": "TYPE_FP32", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "graph": { "level": 0 }, "priority": "PRIORITY_DEFAULT", "cuda": { "graphs": false, "busy_wait_events": false, "graph_spec": [], "output_copy_stream": true }, "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "sequence_batching": { "oldest": { "max_candidate_sequences": 512, "preferred_batch_size": [ 256, 512 ], "max_queue_delay_microseconds": 1000 }, "max_sequence_idle_microseconds": 60000000, "control_input": [ { "name": "START", "control": [ { "kind": "CONTROL_SEQUENCE_START", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "READY", "control": [ { "kind": "CONTROL_SEQUENCE_READY", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "END", "control": [ { "kind": "CONTROL_SEQUENCE_END", "int32_false_true": [ 0, 1 ], 
"fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "CORRID", "control": [ { "kind": "CONTROL_SEQUENCE_CORRID", "int32_false_true": [], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_UINT64" } ] } ], "state": [] }, "instance_group": [ { "name": "conformer-en-US-asr-offline-feature-extractor-streaming-offline_0", "kind": "KIND_GPU", "count": 1, "gpus": [ 0 ], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "", "cc_model_filenames": {}, "metric_tags": {}, "parameters": { "stddev": { "string_value": "2.2668, 3.1642, 3.7079, 3.7642, 3.5349, 3.5901, 3.7640, 3.8424, 4.0145, 4.1475, 4.0457, 3.9048, 3.7709, 3.6117, 3.3188, 3.1489, 3.0615, 3.0362, 2.9929, 3.0500, 3.0341, 3.0484, 3.0103, 2.9474, 2.9128, 2.8669, 2.8332, 2.9411, 3.0378, 3.0712, 3.0190, 2.9992, 3.0124, 3.0024, 3.0275, 3.0870, 3.0656, 3.0142, 3.0493, 3.1373, 3.1135, 3.0675, 2.8828, 2.7018, 2.6296, 2.8826, 2.9325, 2.9288, 2.9271, 2.9890, 3.0137, 2.9855, 3.0839, 2.9319, 2.3512, 2.3795, 2.6191, 2.7555, 2.9326, 2.9931, 3.1543, 3.0855, 2.6820, 3.0566, 3.1272, 3.1663, 3.1836, 3.0018, 2.9089, 3.1727, 3.1626, 3.1086, 2.9804, 3.1107, 3.2998, 3.3697, 3.3716, 3.2487, 3.1597, 3.1181" }, "chunk_size": { "string_value": "4.8" }, "max_execution_batch_size": { "string_value": "512" }, "sample_rate": { "string_value": "16000" }, "window_size": { "string_value": "0.025" }, "num_features": { "string_value": "80" }, "window_stride": { "string_value": "0.01" }, "streaming": { "string_value": "True" }, "stddev_floor": { "string_value": "1e-05" }, "transpose": { "string_value": "False" }, "left_padding_size": { "string_value": "1.6" }, "right_padding_size": { "string_value": "1.6" }, "gain": { "string_value": "1.0" }, "use_utterance_norm_params": { "string_value": "False" }, "precalc_norm_time_steps": { "string_value": "0" }, "dither": { "string_value": "0.0" }, "apply_normalization": { "string_value": "True" }, 
"precalc_norm_params": { "string_value": "False" }, "norm_per_feature": { "string_value": "True" }, "mean": { "string_value": "-11.4412, -9.9334, -9.1292, -9.0365, -9.2804, -9.5643, -9.7342, -9.6925, -9.6333, -9.2808, -9.1887, -9.1422, -9.1397, -9.2028, -9.2749, -9.4776, -9.9185, -10.1557, -10.3800, -10.5067, -10.3190, -10.4728, -10.5529, -10.6402, -10.6440, -10.5113, -10.7395, -10.7870, -10.6074, -10.5033, -10.8278, -10.6384, -10.8481, -10.6875, -10.5454, -10.4747, -10.5165, -10.4930, -10.3413, -10.3472, -10.3735, -10.6830, -10.8813, -10.6338, -10.3856, -10.7727, -10.8957, -10.8068, -10.7373, -10.6108, -10.3405, -10.2889, -10.3922, -10.4946, -10.3367, -10.4164, -10.9949, -10.7196, -10.3971, -10.1734, -9.9257, -9.6557, -9.1761, -9.6653, -9.7876, -9.7230, -9.7792, -9.7056, -9.2702, -9.4650, -9.2755, -9.1369, -9.1174, -8.9197, -8.5394, -8.2614, -8.1353, -8.1422, -8.3430, -8.6655" } }, "model_warmup": [], "model_transaction_policy": { "decoupled": false } } I1010 16:01:51.159702 103 ctc-decoder-library.cc:23] TRITONBACKEND_ModelInstanceInitialize: conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline_0 (device 0) I1010 16:01:51.199949 103 model_repository_manager.cc:1077] loading: intent_slot_label_tokens_weather:1 I1010 16:01:51.300133 103 model_repository_manager.cc:1077] loading: intent_slot_tokenizer-en-US-weather:1 I1010 16:01:51.400337 103 model_repository_manager.cc:1077] loading: qa_qa_postprocessor:1 I1010 16:01:51.500555 103 model_repository_manager.cc:1077] loading: qa_tokenizer-en-US:1 I1010 16:01:51.600763 103 model_repository_manager.cc:1077] loading: riva-onnx-fastpitch_encoder-English-US:1 I1010 16:01:51.700999 103 model_repository_manager.cc:1077] loading: riva-punctuation-en-US:1 I1010 16:01:51.801230 103 model_repository_manager.cc:1077] loading: riva-trt-conformer-en-US-asr-streaming-am-streaming:1 I1010 16:01:51.901443 103 model_repository_manager.cc:1077] loading: riva-trt-hifigan-English-US:1 I1010 16:01:52.001660 103 
model_repository_manager.cc:1077] loading: riva-trt-riva-punctuation-en-US-nn-bert-base-uncased:1
> Riva waiting for Triton server to load all models...retrying in 1 second
I1010 16:01:52.101872 103 model_repository_manager.cc:1077] loading: riva-trt-riva_intent_weather-nn-bert-base-uncased:1
I1010 16:01:52.202092 103 model_repository_manager.cc:1077] loading: riva-trt-riva_ner-nn-bert-base-uncased:1
I1010 16:01:52.213779 110 ctc-decoder.cc:174] Beam Decoder initialized successfully!
I1010 16:01:52.213857 103 endpointing_library.cc:22] TRITONBACKEND_ModelInstanceInitialize: conformer-en-US-asr-offline-endpointing-streaming-offline_0 (device 0)
I1010 16:01:52.214185 103 model_repository_manager.cc:1231] successfully loaded 'conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline' version 1
I1010 16:01:52.233477 103 ctc-decoder-library.cc:20] TRITONBACKEND_ModelInitialize: conformer-en-US-asr-streaming-ctc-decoder-cpu-streaming (version 1)
W1010 16:01:52.234179 113 parameter_parser.cc:144] Parameter 'beam_size' set but unused.
W1010 16:01:52.234185 113 parameter_parser.cc:144] Parameter 'beam_size_token' set but unused.
W1010 16:01:52.234187 113 parameter_parser.cc:144] Parameter 'beam_threshold' set but unused.
W1010 16:01:52.234189 113 parameter_parser.cc:144] Parameter 'blank_token' set but unused.
W1010 16:01:52.234190 113 parameter_parser.cc:144] Parameter 'decoder_num_worker_threads' set but unused.
W1010 16:01:52.234192 113 parameter_parser.cc:144] Parameter 'forerunner_beam_size' set but unused.
W1010 16:01:52.234194 113 parameter_parser.cc:144] Parameter 'forerunner_beam_size_token' set but unused.
W1010 16:01:52.234195 113 parameter_parser.cc:144] Parameter 'forerunner_beam_threshold' set but unused.
W1010 16:01:52.234197 113 parameter_parser.cc:144] Parameter 'forerunner_use_lm' set but unused.
W1010 16:01:52.234215 113 parameter_parser.cc:144] Parameter 'language_model_file' set but unused.
W1010 16:01:52.234216 113 parameter_parser.cc:144] Parameter 'lexicon_file' set but unused.
W1010 16:01:52.234217 113 parameter_parser.cc:144] Parameter 'lm_weight' set but unused.
W1010 16:01:52.234220 113 parameter_parser.cc:144] Parameter 'log_add' set but unused.
W1010 16:01:52.234221 113 parameter_parser.cc:144] Parameter 'max_execution_batch_size' set but unused.
W1010 16:01:52.234241 113 parameter_parser.cc:144] Parameter 'max_supported_transcripts' set but unused.
W1010 16:01:52.234243 113 parameter_parser.cc:144] Parameter 'num_tokenization' set but unused.
W1010 16:01:52.234246 113 parameter_parser.cc:144] Parameter 'profane_words_file' set but unused.
W1010 16:01:52.234246 113 parameter_parser.cc:144] Parameter 'set_default_index_to_unk_token' set but unused.
W1010 16:01:52.234248 113 parameter_parser.cc:144] Parameter 'sil_token' set but unused.
W1010 16:01:52.234251 113 parameter_parser.cc:144] Parameter 'smearing_mode' set but unused.
W1010 16:01:52.234252 113 parameter_parser.cc:144] Parameter 'tokenizer_model' set but unused.
W1010 16:01:52.234254 113 parameter_parser.cc:144] Parameter 'unk_score' set but unused.
W1010 16:01:52.234256 113 parameter_parser.cc:144] Parameter 'unk_token' set but unused.
W1010 16:01:52.234257 113 parameter_parser.cc:144] Parameter 'vocab_file' set but unused.
W1010 16:01:52.234259 113 parameter_parser.cc:144] Parameter 'word_insertion_score' set but unused.
I1010 16:01:52.234286 103 model_repository_manager.cc:1231] successfully loaded 'conformer-en-US-asr-offline-endpointing-streaming-offline' version 1 I1010 16:01:52.234579 103 backend_model.cc:303] model configuration: { "name": "conformer-en-US-asr-streaming-ctc-decoder-cpu-streaming", "platform": "", "backend": "riva_asr_decoder", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 1024, "input": [ { "name": "CLASS_LOGITS", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ -1, 257 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "END_FLAG", "data_type": "TYPE_UINT32", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "SEGMENTS_START_END", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ -1, 2 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "CUSTOM_CONFIGURATION", "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [ -1, 2 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "FINAL_TRANSCRIPTS", "data_type": "TYPE_STRING", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "FINAL_TRANSCRIPTS_SCORE", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "FINAL_WORDS_START_END", "data_type": "TYPE_INT32", "dims": [ -1, 2 ], "label_filename": "", "is_shape_tensor": false }, { "name": "PARTIAL_TRANSCRIPTS", "data_type": "TYPE_STRING", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "PARTIAL_TRANSCRIPTS_STABILITY", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "PARTIAL_WORDS_START_END", "data_type": "TYPE_INT32", "dims": [ -1, 2 ], "label_filename": "", "is_shape_tensor": false }, { "name": "FINAL_WORDS_SCORE", "data_type": "TYPE_FP32", "dims": [ -1 ], 
"label_filename": "", "is_shape_tensor": false }, { "name": "PARTIAL_WORDS_SCORE", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "graph": { "level": 0 }, "priority": "PRIORITY_DEFAULT", "cuda": { "graphs": false, "busy_wait_events": false, "graph_spec": [], "output_copy_stream": true }, "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "sequence_batching": { "oldest": { "max_candidate_sequences": 1024, "preferred_batch_size": [ 32, 64 ], "max_queue_delay_microseconds": 1000 }, "max_sequence_idle_microseconds": 60000000, "control_input": [ { "name": "START", "control": [ { "kind": "CONTROL_SEQUENCE_START", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "READY", "control": [ { "kind": "CONTROL_SEQUENCE_READY", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "END", "control": [ { "kind": "CONTROL_SEQUENCE_END", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "CORRID", "control": [ { "kind": "CONTROL_SEQUENCE_CORRID", "int32_false_true": [], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_UINT64" } ] } ], "state": [] }, "instance_group": [ { "name": "conformer-en-US-asr-streaming-ctc-decoder-cpu-streaming_0", "kind": "KIND_CPU", "count": 1, "gpus": [], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "", "cc_model_filenames": {}, "metric_tags": {}, "parameters": { "sil_token": { "string_value": "▁" }, "num_tokenization": { "string_value": "1" }, "beam_threshold": { "string_value": "20.0" }, "language_model_file": { "string_value": 
"/data/models/conformer-en-US-asr-streaming-ctc-decoder-cpu-streaming/1/4gram-pruned-0_2_7_9-en-lm-set-2.0.bin" }, "tokenizer_model": { "string_value": "/data/models/conformer-en-US-asr-streaming-ctc-decoder-cpu-streaming/1/8b8f095152034e98b24ab33726708bd0_tokenizer.model" }, "max_execution_batch_size": { "string_value": "1024" }, "forerunner_use_lm": { "string_value": "true" }, "profane_words_file": { "string_value": "/data/models/conformer-en-US-asr-streaming-ctc-decoder-cpu-streaming/1/profane_words_file.txt" }, "forerunner_beam_size_token": { "string_value": "8" }, "forerunner_beam_threshold": { "string_value": "10.0" }, "asr_model_delay": { "string_value": "-1" }, "decoder_num_worker_threads": { "string_value": "-1" }, "word_insertion_score": { "string_value": "1.0" }, "unk_token": { "string_value": "" }, "left_padding_size": { "string_value": "1.92" }, "set_default_index_to_unk_token": { "string_value": "False" }, "decoder_type": { "string_value": "flashlight" }, "forerunner_beam_size": { "string_value": "8" }, "unk_score": { "string_value": "-inf" }, "chunk_size": { "string_value": "0.16" }, "max_supported_transcripts": { "string_value": "1" }, "lexicon_file": { "string_value": "/data/models/conformer-en-US-asr-streaming-ctc-decoder-cpu-streaming/1/lexicon.txt" }, "smearing_mode": { "string_value": "max" }, "log_add": { "string_value": "True" }, "lm_weight": { "string_value": "0.8" }, "blank_token": { "string_value": "#" }, "vocab_file": { "string_value": "/data/models/conformer-en-US-asr-streaming-ctc-decoder-cpu-streaming/1/riva_decoder_vocabulary.txt" }, "ms_per_timestep": { "string_value": "40" }, "streaming": { "string_value": "True" }, "use_subword": { "string_value": "True" }, "beam_size": { "string_value": "32" }, "right_padding_size": { "string_value": "1.92" }, "beam_size_token": { "string_value": "16" } }, "model_warmup": [], "model_transaction_policy": { "decoupled": false } } I1010 16:01:52.234656 103 endpointing_library.cc:18] 
TRITONBACKEND_ModelInitialize: conformer-en-US-asr-streaming-endpointing-streaming (version 1)
W1010 16:01:52.234984 115 parameter_parser.cc:144] Parameter 'chunk_size' set but unused.
W1010 16:01:52.234990 115 parameter_parser.cc:144] Parameter 'ms_per_timestep' set but unused.
W1010 16:01:52.234992 115 parameter_parser.cc:144] Parameter 'residue_blanks_at_end' set but unused.
W1010 16:01:52.234992 115 parameter_parser.cc:144] Parameter 'residue_blanks_at_start' set but unused.
W1010 16:01:52.234993 115 parameter_parser.cc:144] Parameter 'start_history' set but unused.
W1010 16:01:52.234994 115 parameter_parser.cc:144] Parameter 'start_th' set but unused.
W1010 16:01:52.234995 115 parameter_parser.cc:144] Parameter 'stop_history' set but unused.
W1010 16:01:52.234997 115 parameter_parser.cc:144] Parameter 'stop_th' set but unused.
W1010 16:01:52.234997 115 parameter_parser.cc:144] Parameter 'streaming' set but unused.
W1010 16:01:52.234998 115 parameter_parser.cc:144] Parameter 'use_subword' set but unused.
W1010 16:01:52.234999 115 parameter_parser.cc:144] Parameter 'vocab_file' set but unused.
I1010 16:01:52.235177 103 backend_model.cc:303] model configuration: { "name": "conformer-en-US-asr-streaming-endpointing-streaming", "platform": "", "backend": "riva_asr_endpointing", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 2048, "input": [ { "name": "CLASS_LOGITS", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ -1, 257 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "SEGMENTS_START_END", "data_type": "TYPE_FP32", "dims": [ -1, 2 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "graph": { "level": 0 }, "priority": "PRIORITY_DEFAULT", "cuda": { "graphs": false, "busy_wait_events": false, "graph_spec": [], "output_copy_stream": true }, "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "sequence_batching": { "max_sequence_idle_microseconds": 60000000, "control_input": [ { "name": "START", "control": [ { "kind": "CONTROL_SEQUENCE_START", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "READY", "control": [ { "kind": "CONTROL_SEQUENCE_READY", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] } ], "state": [] }, "instance_group": [ { "name": "conformer-en-US-asr-streaming-endpointing-streaming_0", "kind": "KIND_CPU", "count": 1, "gpus": [], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "", "cc_model_filenames": {}, "metric_tags": {}, "parameters": { "start_th": { "string_value": "0.2" }, "chunk_size": { "string_value": "0.16" }, "endpointing_type": { "string_value": "greedy_ctc" }, "stop_th": { "string_value": "0.98" }, "vocab_file": { "string_value": 
"/data/models/conformer-en-US-asr-streaming-endpointing-streaming/1/riva_decoder_vocabulary.txt" }, "start_history": { "string_value": "200" }, "residue_blanks_at_start": { "string_value": "-2" }, "ms_per_timestep": { "string_value": "40" }, "streaming": { "string_value": "True" }, "use_subword": { "string_value": "True" }, "stop_history": { "string_value": "800" }, "residue_blanks_at_end": { "string_value": "0" } }, "model_warmup": [], "model_transaction_policy": { "decoupled": false } } I1010 16:01:52.235250 103 feature-extractor.cc:400] TRITONBACKEND_ModelInitialize: conformer-en-US-asr-streaming-feature-extractor-streaming (version 1) I1010 16:01:52.235882 103 backend_model.cc:303] model configuration: { "name": "conformer-en-US-asr-streaming-feature-extractor-streaming", "platform": "", "backend": "riva_asr_features", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 1024, "input": [ { "name": "AUDIO_SIGNAL", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "SAMPLE_RATE", "data_type": "TYPE_UINT32", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "AUDIO_FEATURES", "data_type": "TYPE_FP32", "dims": [ 80, -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "AUDIO_PROCESSED", "data_type": "TYPE_FP32", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "AUDIO_FEATURES_LENGTH", "data_type": "TYPE_INT32", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "graph": { "level": 0 }, "priority": "PRIORITY_DEFAULT", "cuda": { "graphs": false, "busy_wait_events": false, "graph_spec": [], "output_copy_stream": true }, "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, 
"eager_batching": false }, "sequence_batching": { "oldest": { "max_candidate_sequences": 1024, "preferred_batch_size": [ 256, 512 ], "max_queue_delay_microseconds": 1000 }, "max_sequence_idle_microseconds": 60000000, "control_input": [ { "name": "START", "control": [ { "kind": "CONTROL_SEQUENCE_START", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "READY", "control": [ { "kind": "CONTROL_SEQUENCE_READY", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "END", "control": [ { "kind": "CONTROL_SEQUENCE_END", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "CORRID", "control": [ { "kind": "CONTROL_SEQUENCE_CORRID", "int32_false_true": [], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_UINT64" } ] } ], "state": [] }, "instance_group": [ { "name": "conformer-en-US-asr-streaming-feature-extractor-streaming_0", "kind": "KIND_GPU", "count": 1, "gpus": [ 0 ], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "", "cc_model_filenames": {}, "metric_tags": {}, "parameters": { "mean": { "string_value": "-11.4412, -9.9334, -9.1292, -9.0365, -9.2804, -9.5643, -9.7342, -9.6925, -9.6333, -9.2808, -9.1887, -9.1422, -9.1397, -9.2028, -9.2749, -9.4776, -9.9185, -10.1557, -10.3800, -10.5067, -10.3190, -10.4728, -10.5529, -10.6402, -10.6440, -10.5113, -10.7395, -10.7870, -10.6074, -10.5033, -10.8278, -10.6384, -10.8481, -10.6875, -10.5454, -10.4747, -10.5165, -10.4930, -10.3413, -10.3472, -10.3735, -10.6830, -10.8813, -10.6338, -10.3856, -10.7727, -10.8957, -10.8068, -10.7373, -10.6108, -10.3405, -10.2889, -10.3922, -10.4946, -10.3367, -10.4164, -10.9949, -10.7196, -10.3971, -10.1734, -9.9257, -9.6557, -9.1761, -9.6653, -9.7876, -9.7230, -9.7792, -9.7056, -9.2702, -9.4650, -9.2755, -9.1369, -9.1174, 
-8.9197, -8.5394, -8.2614, -8.1353, -8.1422, -8.3430, -8.6655" }, "stddev": { "string_value": "2.2668, 3.1642, 3.7079, 3.7642, 3.5349, 3.5901, 3.7640, 3.8424, 4.0145, 4.1475, 4.0457, 3.9048, 3.7709, 3.6117, 3.3188, 3.1489, 3.0615, 3.0362, 2.9929, 3.0500, 3.0341, 3.0484, 3.0103, 2.9474, 2.9128, 2.8669, 2.8332, 2.9411, 3.0378, 3.0712, 3.0190, 2.9992, 3.0124, 3.0024, 3.0275, 3.0870, 3.0656, 3.0142, 3.0493, 3.1373, 3.1135, 3.0675, 2.8828, 2.7018, 2.6296, 2.8826, 2.9325, 2.9288, 2.9271, 2.9890, 3.0137, 2.9855, 3.0839, 2.9319, 2.3512, 2.3795, 2.6191, 2.7555, 2.9326, 2.9931, 3.1543, 3.0855, 2.6820, 3.0566, 3.1272, 3.1663, 3.1836, 3.0018, 2.9089, 3.1727, 3.1626, 3.1086, 2.9804, 3.1107, 3.2998, 3.3697, 3.3716, 3.2487, 3.1597, 3.1181" }, "chunk_size": { "string_value": "0.16" }, "max_execution_batch_size": { "string_value": "1024" }, "sample_rate": { "string_value": "16000" }, "num_features": { "string_value": "80" }, "window_size": { "string_value": "0.025" }, "window_stride": { "string_value": "0.01" }, "streaming": { "string_value": "True" }, "transpose": { "string_value": "False" }, "left_padding_size": { "string_value": "1.92" }, "stddev_floor": { "string_value": "1e-05" }, "right_padding_size": { "string_value": "1.92" }, "gain": { "string_value": "1.0" }, "precalc_norm_time_steps": { "string_value": "0" }, "use_utterance_norm_params": { "string_value": "False" }, "precalc_norm_params": { "string_value": "False" }, "apply_normalization": { "string_value": "True" }, "dither": { "string_value": "0.0" }, "norm_per_feature": { "string_value": "True" } }, "model_warmup": [], "model_transaction_policy": { "decoupled": false } } I1010 16:01:52.235950 103 feature-extractor.cc:402] TRITONBACKEND_ModelInstanceInitialize: conformer-en-US-asr-offline-feature-extractor-streaming-offline_0 (device 0) I1010 16:01:52.302307 103 model_repository_manager.cc:1077] loading: riva-trt-riva_qa-nn-bert-base-uncased:1 I1010 16:01:52.402753 103 model_repository_manager.cc:1077] loading: 
riva-trt-riva_text_classification_domain-nn-bert-base-uncased:1 I1010 16:01:52.503259 103 model_repository_manager.cc:1077] loading: spectrogram_chunker-English-US:1 I1010 16:01:52.603734 103 model_repository_manager.cc:1077] loading: text_classification_tokenizer-en-US-domain:1 I1010 16:01:52.704238 103 model_repository_manager.cc:1077] loading: token_classification_detokenizer:1 I1010 16:01:52.804710 103 model_repository_manager.cc:1077] loading: token_classification_label_tokens:1 I1010 16:01:52.905167 103 model_repository_manager.cc:1077] loading: token_classification_tokenizer-en-US:1 I1010 16:01:53.005700 103 model_repository_manager.cc:1077] loading: tts_postprocessor-English-US:1 > Riva waiting for Triton server to load all models...retrying in 1 second I1010 16:01:53.106273 103 model_repository_manager.cc:1077] loading: tts_preprocessor-English-US:1 > Riva waiting for Triton server to load all models...retrying in 1 second > Riva waiting for Triton server to load all models...retrying in 1 second > Riva waiting for Triton server to load all models...retrying in 1 second > Riva waiting for Triton server to load all models...retrying in 1 second I1010 16:01:57.153535 103 model_repository_manager.cc:1231] successfully loaded 'conformer-en-US-asr-offline-feature-extractor-streaming-offline' version 1 I1010 16:01:57.153591 103 detokenizer_cbe.cc:145] TRITONBACKEND_ModelInitialize: intent_slot_detokenizer (version 1) I1010 16:01:57.154139 103 backend_model.cc:303] model configuration: { "name": "intent_slot_detokenizer", "platform": "", "backend": "riva_nlp_detokenizer", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 8, "input": [ { "name": "IN_TOKEN_LABELS__0", "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [ -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "IN_TOKEN_SCORES__1", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ -1 ], "is_shape_tensor": false, 
"allow_ragged_batch": false, "optional": false }, { "name": "IN_SEQ_LEN__2", "data_type": "TYPE_INT64", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "IN_TOK_STR__3", "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [ -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "OUT_TOKEN_LABELS__0", "data_type": "TYPE_STRING", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "OUT_TOKEN_SCORES__1", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "OUT_SEQ_LEN__2", "data_type": "TYPE_INT64", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "OUT_TOK_STR__3", "data_type": "TYPE_STRING", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "graph": { "level": 0 }, "priority": "PRIORITY_DEFAULT", "cuda": { "graphs": false, "busy_wait_events": false, "graph_spec": [], "output_copy_stream": true }, "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "instance_group": [ { "name": "intent_slot_detokenizer_0", "kind": "KIND_CPU", "count": 1, "gpus": [], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "", "cc_model_filenames": {}, "metric_tags": {}, "parameters": {}, "model_warmup": [], "model_transaction_policy": { "decoupled": false } } I1010 16:01:57.154215 103 endpointing_library.cc:22] TRITONBACKEND_ModelInstanceInitialize: conformer-en-US-asr-streaming-endpointing-streaming_0 (device 0) I1010 16:01:57.176744 103 feature-extractor.cc:402] TRITONBACKEND_ModelInstanceInitialize: conformer-en-US-asr-streaming-feature-extractor-streaming_0 (device 0) I1010 16:01:57.177537 103 model_repository_manager.cc:1231] 
successfully loaded 'conformer-en-US-asr-streaming-endpointing-streaming' version 1 I1010 16:01:57.180000 103 sequence_label_cbe.cc:137] TRITONBACKEND_ModelInitialize: intent_slot_label_tokens_weather (version 1) I1010 16:01:57.180125 103 model_repository_manager.cc:1231] successfully loaded 'conformer-en-US-asr-streaming-feature-extractor-streaming' version 1 I1010 16:01:57.180479 103 backend_model.cc:303] model configuration: { "name": "intent_slot_label_tokens_weather", "platform": "", "backend": "riva_nlp_seqlabel", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 8, "input": [ { "name": "TOKEN_LOGIT__1", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ -1, 65 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "TOKEN_LABELS__0", "data_type": "TYPE_STRING", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "TOKEN_SCORES__1", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "priority": "PRIORITY_DEFAULT", "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "instance_group": [ { "name": "intent_slot_label_tokens_weather_0", "kind": "KIND_CPU", "count": 1, "gpus": [], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "", "cc_model_filenames": {}, "metric_tags": {}, "parameters": { "classes": { "string_value": "/data/models/intent_slot_label_tokens_weather/1/slot_labels.csv" } }, "model_warmup": [] } I1010 16:01:57.180566 103 detokenizer_cbe.cc:147] TRITONBACKEND_ModelInstanceInitialize: intent_slot_detokenizer_0 (device 0) I1010 16:01:57.180584 103 ctc-decoder-library.cc:23] TRITONBACKEND_ModelInstanceInitialize: conformer-en-US-asr-streaming-ctc-decoder-cpu-streaming_0 (device 0) I1010 
16:01:57.180707 103 model_repository_manager.cc:1231] successfully loaded 'intent_slot_detokenizer' version 1 > Riva waiting for Triton server to load all models...retrying in 1 second I1010 16:01:58.153828 113 ctc-decoder.cc:174] Beam Decoder initialized successfully! I1010 16:01:58.154135 103 model_repository_manager.cc:1231] successfully loaded 'conformer-en-US-asr-streaming-ctc-decoder-cpu-streaming' version 1 I1010 16:01:58.156667 103 tokenizer_library.cc:18] TRITONBACKEND_ModelInitialize: intent_slot_tokenizer-en-US-weather (version 1) WARNING: Logging before InitGoogleLogging() is written to STDERR W1010 16:01:58.157042 121 parameter_parser.cc:144] Parameter 'unk_token' set but unused. W1010 16:01:58.157047 121 parameter_parser.cc:144] Parameter 'vocab' set but unused. I1010 16:01:58.157101 103 backend_model.cc:303] model configuration: { "name": "intent_slot_tokenizer-en-US-weather", "platform": "", "backend": "riva_nlp_tokenizer", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 8, "input": [ { "name": "INPUT_STR__0", "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "SEQ__0", "data_type": "TYPE_INT32", "dims": [ 128 ], "label_filename": "", "is_shape_tensor": false }, { "name": "MASK__1", "data_type": "TYPE_INT32", "dims": [ 128 ], "label_filename": "", "is_shape_tensor": false }, { "name": "SEGMENT__4", "data_type": "TYPE_INT32", "dims": [ 128 ], "label_filename": "", "is_shape_tensor": false }, { "name": "SEQ_LEN__2", "data_type": "TYPE_INT64", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "TOK_STR__3", "data_type": "TYPE_STRING", "dims": [ 128 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "priority": "PRIORITY_DEFAULT", "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, 
"gather_kernel_buffer_threshold": 0, "eager_batching": false }, "instance_group": [ { "name": "intent_slot_tokenizer-en-US-weather_0", "kind": "KIND_CPU", "count": 1, "gpus": [], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "", "cc_model_filenames": {}, "metric_tags": {}, "parameters": { "task": { "string_value": "single_input" }, "unk_token": { "string_value": "[UNK]" }, "vocab": { "string_value": "/data/models/intent_slot_tokenizer-en-US-weather/1/tokenizer.vocab_file" }, "tokenizer": { "string_value": "wordpiece" }, "bos_token": { "string_value": "[CLS]" }, "to_lower": { "string_value": "true" }, "eos_token": { "string_value": "[SEP]" }, "pad_chars_with_spaces": { "string_value": "False" } }, "model_warmup": [] } I1010 16:01:58.157142 103 sequence_label_cbe.cc:139] TRITONBACKEND_ModelInstanceInitialize: intent_slot_label_tokens_weather_0 (device 0) I1010 16:01:58.157338 103 model_repository_manager.cc:1231] successfully loaded 'intent_slot_label_tokens_weather' version 1 I1010 16:01:58.159987 103 qa_postprocessor_cbe.cc:124] TRITONBACKEND_ModelInitialize: qa_qa_postprocessor (version 1) I1010 16:01:58.160304 103 backend_model.cc:303] model configuration: { "name": "qa_qa_postprocessor", "platform": "", "backend": "riva_nlp_qa", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 8, "input": [ { "name": "QA_LOGITS__0", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ 384, 2 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "SEQ_LEN__1", "data_type": "TYPE_INT64", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "TOK_STR__2", "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [ 384 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "TOK_TO_ORIG__3", "data_type": "TYPE_UINT16", "format": "FORMAT_NONE", "dims": 
[ 384 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "IN_PASSAGE_STR__4", "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "ANSWER_SPANS__0", "data_type": "TYPE_STRING", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "ANSWER_SCORES__1", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "priority": "PRIORITY_DEFAULT", "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "instance_group": [ { "name": "qa_qa_postprocessor_0", "kind": "KIND_CPU", "count": 1, "gpus": [], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "", "cc_model_filenames": {}, "metric_tags": {}, "parameters": { "bert_model_seq_length": { "string_value": "384" }, "version_2_with_negative": { "string_value": "True" }, "n_best_size": { "string_value": "20" }, "max_answer_length": { "string_value": "30" } }, "model_warmup": [] } I1010 16:01:58.160368 103 tokenizer_library.cc:21] TRITONBACKEND_ModelInstanceInitialize: intent_slot_tokenizer-en-US-weather_0 (device 0) I1010 16:01:58.168538 103 tokenizer_library.cc:18] TRITONBACKEND_ModelInitialize: qa_tokenizer-en-US (version 1) I1010 16:01:58.168685 103 model_repository_manager.cc:1231] successfully loaded 'intent_slot_tokenizer-en-US-weather' version 1 W1010 16:01:58.169078 123 parameter_parser.cc:144] Parameter 'unk_token' set but unused. W1010 16:01:58.169085 123 parameter_parser.cc:144] Parameter 'vocab' set but unused. 
I1010 16:01:58.169161 103 backend_model.cc:303] model configuration: { "name": "qa_tokenizer-en-US", "platform": "", "backend": "riva_nlp_tokenizer", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 8, "input": [ { "name": "IN_QUERY_STR__0", "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "IN_PASSAGE_STR__1", "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "SEQ__0", "data_type": "TYPE_INT32", "dims": [ 384 ], "label_filename": "", "is_shape_tensor": false }, { "name": "MASK__1", "data_type": "TYPE_INT32", "dims": [ 384 ], "label_filename": "", "is_shape_tensor": false }, { "name": "SEQ_LEN__2", "data_type": "TYPE_INT64", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "TOK_STR__3", "data_type": "TYPE_STRING", "dims": [ 384 ], "label_filename": "", "is_shape_tensor": false }, { "name": "SEGMENT__4", "data_type": "TYPE_INT32", "dims": [ 384 ], "label_filename": "", "is_shape_tensor": false }, { "name": "TOK_TO_ORIG__5", "data_type": "TYPE_UINT16", "dims": [ 384 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "priority": "PRIORITY_DEFAULT", "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "instance_group": [ { "name": "qa_tokenizer-en-US_0", "kind": "KIND_CPU", "count": 1, "gpus": [], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "", "cc_model_filenames": {}, "metric_tags": {}, "parameters": { "unk_token": { "string_value": "[UNK]" }, "tokenizer": { "string_value": "wordpiece" }, "vocab": { "string_value": "/data/models/qa_tokenizer-en-US/1/tokenizer.vocab_file" }, 
"bos_token": { "string_value": "[CLS]" }, "max_query_length": { "string_value": "64" }, "to_lower": { "string_value": "true" }, "eos_token": { "string_value": "[SEP]" }, "pad_chars_with_spaces": { "string_value": "False" }, "task": { "string_value": "qa" }, "doc_stride": { "string_value": "128" } }, "model_warmup": [] } I1010 16:01:58.169204 103 onnxruntime.cc:2481] TRITONBACKEND_ModelInitialize: riva-onnx-fastpitch_encoder-English-US (version 1) I1010 16:01:58.209148 103 qa_postprocessor_cbe.cc:126] TRITONBACKEND_ModelInstanceInitialize: qa_qa_postprocessor_0 (device 0) I1010 16:01:58.209278 103 tokenizer_library.cc:21] TRITONBACKEND_ModelInstanceInitialize: qa_tokenizer-en-US_0 (device 0) I1010 16:01:58.209609 103 model_repository_manager.cc:1231] successfully loaded 'qa_qa_postprocessor' version 1 I1010 16:01:58.223281 103 onnxruntime.cc:2524] TRITONBACKEND_ModelInstanceInitialize: riva-onnx-fastpitch_encoder-English-US_0 (GPU device 0) I1010 16:01:58.223384 103 model_repository_manager.cc:1231] successfully loaded 'qa_tokenizer-en-US' version 1 > Riva waiting for Triton server to load all models...retrying in 1 second 2022-10-10 16:02:00.094136967 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '418'. It is not used by any node and should be removed from the model. 2022-10-10 16:02:00.094150047 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '490'. It is not used by any node and should be removed from the model. 2022-10-10 16:02:00.094153022 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '375'. It is not used by any node and should be removed from the model. 2022-10-10 16:02:00.094155541 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '346'. It is not used by any node and should be removed from the model. 
2022-10-10 16:02:00.094157745 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '354'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094159886 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '307'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094162061 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '379'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094164094 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '373'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094166869 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '301'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094168925 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '286'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094171451 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '447'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094173561 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '358'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094175906 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '281'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094178541 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '274'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094180757 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '374'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094182894 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '181'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094185427 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '303'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094187577 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '302'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094189816 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '426'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094192196 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '425'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094194610 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '430'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094196948 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '282'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094199362 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '497'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094217231 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '445'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094219855 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '451'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094222297 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '498'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094242931 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '502'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094245253 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '446'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094247781 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '353'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094250465 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '518'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094253634 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '519'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094255871 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '517'. It is not used by any node and should be removed from the model.
2022-10-10 16:02:00.094258122 [W:onnxruntime:, graph.cc:3526 CleanUnusedInitializersAndNodeArgs] Removing initializer '523'. It is not used by any node and should be removed from the model.
> Riva waiting for Triton server to load all models...retrying in 1 second
I1010 16:02:01.085468 103 pipeline_library.cc:22] TRITONBACKEND_ModelInitialize: riva-punctuation-en-US (version 1)
I1010 16:02:01.085944 103 model_repository_manager.cc:1231] successfully loaded 'riva-onnx-fastpitch_encoder-English-US' version 1
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1010 16:02:01.086774 125 parameter_parser.cc:144] Parameter 'attn_mask_tensor_name' set but unused.
W1010 16:02:01.086802 125 parameter_parser.cc:144] Parameter 'bos_token' set but unused.
W1010 16:02:01.086807 125 parameter_parser.cc:144] Parameter 'capit_logits_tensor_name' set but unused.
W1010 16:02:01.086812 125 parameter_parser.cc:144] Parameter 'capitalization_mapping_path' set but unused.
W1010 16:02:01.086814 125 parameter_parser.cc:144] Parameter 'delimiter' set but unused.
W1010 16:02:01.086818 125 parameter_parser.cc:144] Parameter 'eos_token' set but unused.
W1010 16:02:01.086822 125 parameter_parser.cc:144] Parameter 'input_ids_tensor_name' set but unused.
W1010 16:02:01.086827 125 parameter_parser.cc:144] Parameter 'language_code' set but unused.
W1010 16:02:01.086831 125 parameter_parser.cc:144] Parameter 'model_api' set but unused.
W1010 16:02:01.086834 125 parameter_parser.cc:144] Parameter 'model_family' set but unused.
W1010 16:02:01.086839 125 parameter_parser.cc:144] Parameter 'pad_chars_with_spaces' set but unused.
W1010 16:02:01.086843 125 parameter_parser.cc:144] Parameter 'punct_logits_tensor_name' set but unused.
W1010 16:02:01.086848 125 parameter_parser.cc:144] Parameter 'punctuation_mapping_path' set but unused.
W1010 16:02:01.086853 125 parameter_parser.cc:144] Parameter 'remove_spaces' set but unused.
W1010 16:02:01.086858 125 parameter_parser.cc:144] Parameter 'to_lower' set but unused.
W1010 16:02:01.086861 125 parameter_parser.cc:144] Parameter 'token_type_tensor_name' set but unused.
W1010 16:02:01.086865 125 parameter_parser.cc:144] Parameter 'tokenizer_to_lower' set but unused. W1010 16:02:01.086870 125 parameter_parser.cc:144] Parameter 'unk_token' set but unused. W1010 16:02:01.086874 125 parameter_parser.cc:144] Parameter 'use_int64_nn_inputs' set but unused. W1010 16:02:01.086879 125 parameter_parser.cc:144] Parameter 'vocab' set but unused. W1010 16:02:01.086992 125 parameter_parser.cc:144] Parameter 'model_api' set but unused. W1010 16:02:01.087002 125 parameter_parser.cc:144] Parameter 'model_family' set but unused. I1010 16:02:01.087139 103 backend_model.cc:303] model configuration: { "name": "riva-punctuation-en-US", "platform": "", "backend": "riva_nlp_pipeline", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 8, "input": [ { "name": "PIPELINE_INPUT", "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "PIPELINE_OUTPUT", "data_type": "TYPE_STRING", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "priority": "PRIORITY_DEFAULT", "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "instance_group": [ { "name": "riva-punctuation-en-US_0", "kind": "KIND_CPU", "count": 1, "gpus": [], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "", "cc_model_filenames": {}, "metric_tags": {}, "parameters": { "vocab": { "string_value": "/data/models/riva-punctuation-en-US/1/e222f352288a423da453a79b96cc7b75_vocab.txt" }, "capit_logits_tensor_name": { "string_value": "capit_logits" }, "eos_token": { "string_value": "[SEP]" }, "pipeline_type": { "string_value": "punctuation" }, "capitalization_mapping_path": { "string_value": 
"/data/models/riva-punctuation-en-US/1/fb06800834e74de1bdc32db51da9619c_capit_label_ids.csv" }, "token_type_tensor_name": { "string_value": "token_type_ids" }, "tokenizer": { "string_value": "wordpiece" }, "delimiter": { "string_value": " " }, "pad_chars_with_spaces": { "string_value": "False" }, "remove_spaces": { "string_value": "False" }, "use_int64_nn_inputs": { "string_value": "False" }, "model_family": { "string_value": "riva" }, "unk_token": { "string_value": "[UNK]" }, "bos_token": { "string_value": "[CLS]" }, "punctuation_mapping_path": { "string_value": "/data/models/riva-punctuation-en-US/1/15eace99434b4c87ba28cbd294b48f43_punct_label_ids.csv" }, "model_api": { "string_value": "/nvidia.riva.nlp.RivaLanguageUnderstanding/PunctuateText" }, "to_lower": { "string_value": "true" }, "load_model": { "string_value": "false" }, "attn_mask_tensor_name": { "string_value": "attention_mask" }, "punct_logits_tensor_name": { "string_value": "punct_logits" }, "language_code": { "string_value": "en-US" }, "model_name": { "string_value": "riva-trt-riva-punctuation-en-US-nn-bert-base-uncased" }, "input_ids_tensor_name": { "string_value": "input_ids" }, "tokenizer_to_lower": { "string_value": "true" } }, "model_warmup": [] } > Riva waiting for Triton server to load all models...retrying in 1 second > Riva waiting for Triton server to load all models...retrying in 1 second I1010 16:02:02.267028 103 tensorrt.cc:5294] TRITONBACKEND_Initialize: tensorrt I1010 16:02:02.267529 103 tensorrt.cc:5304] Triton TRITONBACKEND API version: 1.9 I1010 16:02:02.268020 103 tensorrt.cc:5310] 'tensorrt' TRITONBACKEND API version: 1.9 I1010 16:02:02.269159 103 tensorrt.cc:5353] backend configuration: {} I1010 16:02:02.269285 103 pipeline_library.cc:25] TRITONBACKEND_ModelInstanceInitialize: riva-punctuation-en-US_0 (device 0) I1010 16:02:02.326967 103 tensorrt.cc:5405] TRITONBACKEND_ModelInitialize: riva-trt-conformer-en-US-asr-streaming-am-streaming (version 1) I1010 16:02:02.327122 103 
model_repository_manager.cc:1231] successfully loaded 'riva-punctuation-en-US' version 1 I1010 16:02:02.328236 103 tensorrt.cc:5405] TRITONBACKEND_ModelInitialize: riva-trt-hifigan-English-US (version 1) I1010 16:02:02.328844 103 tensorrt.cc:5405] TRITONBACKEND_ModelInitialize: riva-trt-riva-punctuation-en-US-nn-bert-base-uncased (version 1) I1010 16:02:02.328921 103 backend_model.cc:181] Overriding execution policy to "TRITONBACKEND_EXECUTION_BLOCKING" for sequence model "riva-trt-hifigan-English-US" I1010 16:02:02.329440 103 tensorrt.cc:5405] TRITONBACKEND_ModelInitialize: riva-trt-riva_intent_weather-nn-bert-base-uncased (version 1) I1010 16:02:02.330027 103 tensorrt.cc:5405] TRITONBACKEND_ModelInitialize: riva-trt-riva_ner-nn-bert-base-uncased (version 1) I1010 16:02:02.330605 103 tensorrt.cc:5405] TRITONBACKEND_ModelInitialize: riva-trt-riva_qa-nn-bert-base-uncased (version 1) I1010 16:02:02.331174 103 tensorrt.cc:5405] TRITONBACKEND_ModelInitialize: riva-trt-riva_text_classification_domain-nn-bert-base-uncased (version 1) I1010 16:02:02.332270 103 spectrogram-chunker.cc:274] TRITONBACKEND_ModelInitialize: spectrogram_chunker-English-US (version 1) I1010 16:02:02.333155 103 backend_model.cc:303] model configuration: { "name": "spectrogram_chunker-English-US", "platform": "", "backend": "riva_tts_chunker", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 8, "input": [ { "name": "SPECTROGRAM", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ 80, -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "IS_LAST_SENTENCE", "data_type": "TYPE_INT32", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "NUM_VALID_FRAMES_IN", "data_type": "TYPE_INT64", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "SENTENCE_NUM", "data_type": "TYPE_INT32", 
"format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "DURATIONS", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "PROCESSED_TEXT", "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "VOLUME", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "SPECTROGRAM_CHUNK", "data_type": "TYPE_FP32", "dims": [ 80, -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "END_FLAG", "data_type": "TYPE_INT32", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "NUM_VALID_SAMPLES_OUT", "data_type": "TYPE_INT32", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "SENTENCE_NUM", "data_type": "TYPE_INT32", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "DURATIONS", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "PROCESSED_TEXT", "data_type": "TYPE_STRING", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "VOLUME", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "priority": "PRIORITY_DEFAULT", "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "sequence_batching": { "oldest": { "max_candidate_sequences": 8, "preferred_batch_size": [ 8 ], "max_queue_delay_microseconds": 1000 }, "max_sequence_idle_microseconds": 60000000, "control_input": [ { "name": "START", "control": [ { "kind": "CONTROL_SEQUENCE_START", "int32_false_true": [ 0, 
1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "READY", "control": [ { "kind": "CONTROL_SEQUENCE_READY", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "END", "control": [ { "kind": "CONTROL_SEQUENCE_END", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "CORRID", "control": [ { "kind": "CONTROL_SEQUENCE_CORRID", "int32_false_true": [], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_UINT64" } ] } ], "state": [] }, "instance_group": [ { "name": "spectrogram_chunker-English-US_0", "kind": "KIND_GPU", "count": 1, "gpus": [ 0 ], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "", "cc_model_filenames": {}, "metric_tags": {}, "parameters": { "supports_volume": { "string_value": "True" }, "num_samples_per_frame": { "string_value": "512" }, "max_execution_batch_size": { "string_value": "8" }, "chunk_length": { "string_value": "80" }, "num_mels": { "string_value": "80" } }, "model_warmup": [], "model_transaction_policy": { "decoupled": true } } I1010 16:02:02.333236 103 tensorrt.cc:5454] TRITONBACKEND_ModelInstanceInitialize: riva-trt-hifigan-English-US_0 (GPU device 0) > Riva waiting for Triton server to load all models...retrying in 1 second > Riva waiting for Triton server to load all models...retrying in 1 second > Riva waiting for Triton server to load all models...retrying in 1 second I1010 16:02:05.489209 103 logging.cc:49] [MemUsageChange] Init CUDA: CPU +313, GPU +0, now: CPU 1768, GPU 5099 (MiB) I1010 16:02:05.534116 103 logging.cc:49] Loaded engine size: 32 MiB I1010 16:02:05.696993 103 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1841, GPU 5139 (MiB) I1010 16:02:05.698375 103 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 1841, GPU 5149 
(MiB) I1010 16:02:05.699643 103 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +31, now: CPU 0, GPU 31 (MiB) I1010 16:02:05.704293 103 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1775, GPU 5141 (MiB) I1010 16:02:05.705178 103 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1775, GPU 5149 (MiB) I1010 16:02:05.706412 103 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +246, now: CPU 0, GPU 277 (MiB) I1010 16:02:05.706802 103 tensorrt.cc:1411] Created instance riva-trt-hifigan-English-US_0 on GPU 0 with stream priority 0 and optimization profile default[0]; I1010 16:02:05.706864 103 tensorrt.cc:5454] TRITONBACKEND_ModelInstanceInitialize: riva-trt-riva-punctuation-en-US-nn-bert-base-uncased_0 (GPU device 0) I1010 16:02:05.707075 103 model_repository_manager.cc:1231] successfully loaded 'riva-trt-hifigan-English-US' version 1 I1010 16:02:05.707743 103 logging.cc:49] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1776, GPU 5401 (MiB) > Riva waiting for Triton server to load all models...retrying in 1 second I1010 16:02:06.224636 103 logging.cc:49] Loaded engine size: 833 MiB I1010 16:02:06.380161 103 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 3860, GPU 5825 (MiB) I1010 16:02:06.381009 103 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 3860, GPU 5835 (MiB) I1010 16:02:06.381439 103 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +415, now: CPU 0, GPU 692 (MiB) I1010 16:02:06.428151 103 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2194, GPU 5827 (MiB) I1010 16:02:06.428908 103 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2194, GPU 5835 (MiB) I1010 16:02:07.008219 103 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in 
IExecutionContext creation: CPU +0, GPU +36, now: CPU 0, GPU 728 (MiB) I1010 16:02:07.008473 103 tensorrt.cc:1411] Created instance riva-trt-riva-punctuation-en-US-nn-bert-base-uncased_0 on GPU 0 with stream priority 0 and optimization profile default[0]; I1010 16:02:07.008530 103 tensorrt.cc:5454] TRITONBACKEND_ModelInstanceInitialize: riva-trt-riva_intent_weather-nn-bert-base-uncased_0 (GPU device 0) I1010 16:02:07.008809 103 model_repository_manager.cc:1231] successfully loaded 'riva-trt-riva-punctuation-en-US-nn-bert-base-uncased' version 1 I1010 16:02:07.009465 103 logging.cc:49] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2212, GPU 5877 (MiB) I1010 16:02:07.137040 103 logging.cc:49] Loaded engine size: 208 MiB > Riva waiting for Triton server to load all models...retrying in 1 second I1010 16:02:07.646951 103 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2759, GPU 6227 (MiB) I1010 16:02:07.647652 103 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2759, GPU 6235 (MiB) I1010 16:02:07.648068 103 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +95, now: CPU 0, GPU 823 (MiB) I1010 16:02:07.660328 103 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2342, GPU 6227 (MiB) I1010 16:02:07.660981 103 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2342, GPU 6235 (MiB) I1010 16:02:07.701526 103 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +108, now: CPU 0, GPU 931 (MiB) I1010 16:02:07.701741 103 tensorrt.cc:1411] Created instance riva-trt-riva_intent_weather-nn-bert-base-uncased_0 on GPU 0 with stream priority 0 and optimization profile default[0]; I1010 16:02:07.701819 103 tensorrt.cc:5454] TRITONBACKEND_ModelInstanceInitialize: riva-trt-riva_ner-nn-bert-base-uncased_0 (GPU device 0) I1010 16:02:07.701906 103 model_repository_manager.cc:1231] 
successfully loaded 'riva-trt-riva_intent_weather-nn-bert-base-uncased' version 1 I1010 16:02:07.702590 103 logging.cc:49] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2456, GPU 6487 (MiB) I1010 16:02:07.831747 103 logging.cc:49] Loaded engine size: 209 MiB I1010 16:02:07.905910 103 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2989, GPU 6831 (MiB) I1010 16:02:07.906630 103 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2989, GPU 6839 (MiB) I1010 16:02:07.907058 103 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +95, now: CPU 0, GPU 1026 (MiB) I1010 16:02:07.919209 103 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 2571, GPU 6831 (MiB) I1010 16:02:07.919882 103 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2571, GPU 6839 (MiB) I1010 16:02:07.960179 103 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +109, now: CPU 0, GPU 1135 (MiB) I1010 16:02:07.960365 103 tensorrt.cc:1411] Created instance riva-trt-riva_ner-nn-bert-base-uncased_0 on GPU 0 with stream priority 0 and optimization profile default[0]; I1010 16:02:07.960448 103 tensorrt.cc:5454] TRITONBACKEND_ModelInstanceInitialize: riva-trt-riva_qa-nn-bert-base-uncased_0 (GPU device 0) I1010 16:02:07.960695 103 model_repository_manager.cc:1231] successfully loaded 'riva-trt-riva_ner-nn-bert-base-uncased' version 1 I1010 16:02:07.961474 103 logging.cc:49] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2684, GPU 7093 (MiB) I1010 16:02:08.090194 103 logging.cc:49] Loaded engine size: 208 MiB I1010 16:02:08.164157 103 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 3215, GPU 7435 (MiB) I1010 16:02:08.164890 103 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 3215, GPU 7443 (MiB) I1010 16:02:08.165340 103 logging.cc:49] [MemUsageChange] 
TensorRT-managed allocation in engine deserialization: CPU +0, GPU +95, now: CPU 0, GPU 1230 (MiB)
I1010 16:02:08.177551 103 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2798, GPU 7435 (MiB)
I1010 16:02:08.178296 103 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 2799, GPU 7443 (MiB)
> Riva waiting for Triton server to load all models...retrying in 1 second
I1010 16:02:08.219677 103 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +126, now: CPU 0, GPU 1356 (MiB)
I1010 16:02:08.219866 103 tensorrt.cc:1411] Created instance riva-trt-riva_qa-nn-bert-base-uncased_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I1010 16:02:08.219900 103 tensorrt.cc:5454] TRITONBACKEND_ModelInstanceInitialize: riva-trt-riva_text_classification_domain-nn-bert-base-uncased_0 (GPU device 0)
I1010 16:02:08.220081 103 model_repository_manager.cc:1231] successfully loaded 'riva-trt-riva_qa-nn-bert-base-uncased' version 1
I1010 16:02:08.220990 103 logging.cc:49] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2912, GPU 7715 (MiB)
I1010 16:02:08.353270 103 logging.cc:49] Loaded engine size: 209 MiB
E1010 16:02:08.414756 103 logging.cc:43] /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/plugin/bertQKVToContextPlugin/qkvToContext.cu (470) - cuBLAS Error in UnfusedMHARunner: 3 (CUBLAS_STATUS_ALLOC_FAILED)
terminate called after throwing an instance of 'nvinfer1::plugin::CublasError'
  what(): std::exception
/opt/riva/bin/start-riva: line 4: 103 Aborted (core dumped) ${CUSTOM_TRITON_ENV} tritonserver --log-verbose=0 --strict-model-config=true $model_repos --cuda-memory-pool-byte-size=0:1000000000
> Triton server died before reaching ready state. Terminating Riva startup.
Check Triton logs with: docker logs
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
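
The [MemUsageChange] entries in this log report memory in use after each initialization step, and the last such reading before the CUBLAS_STATUS_ALLOC_FAILED abort is GPU 7715 MiB. The following is a minimal sketch, not part of Riva or Triton, for pulling those readings out of a saved copy of the log so that the growth per loaded model is easier to follow. The function names are hypothetical. Reading the "Init ..." entries as device-wide totals and the "TensorRT-managed ..." entries as TensorRT's own running total is an assumption based on the numbers shown here, not something the log states.

```python
import re

# Matches entries such as:
#   [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2912, GPU 7715 (MiB)
# \s+ is used between tokens so entries wrapped across lines still match.
MEM_RE = re.compile(
    r"\[MemUsageChange\]\s+(?P<stage>[^:]+):\s+CPU\s+\+\d+,\s+GPU\s+\+\d+,"
    r"\s+now:\s+CPU\s+(?P<cpu>\d+),\s+GPU\s+(?P<gpu>\d+)\s+\(MiB\)"
)


def parse_mem_usage(log_text):
    """Return (stage, cpu_mib, gpu_mib) tuples in the order they appear."""
    readings = []
    for m in MEM_RE.finditer(log_text):
        stage = " ".join(m.group("stage").split())  # collapse wrapped whitespace
        readings.append((stage, int(m.group("cpu")), int(m.group("gpu"))))
    return readings


def peak_init_gpu(readings):
    """Largest GPU value among 'Init ...' entries (assumed device-wide), or None."""
    values = [gpu for stage, _cpu, gpu in readings if stage.startswith("Init")]
    return max(values) if values else None


# Two entries copied from the log above.
SAMPLE = (
    "I1010 16:02:08.219677 103 logging.cc:49] [MemUsageChange] TensorRT-managed "
    "allocation in IExecutionContext creation: CPU +0, GPU +126, now: CPU 0, "
    "GPU 1356 (MiB) I1010 16:02:08.220990 103 logging.cc:49] [MemUsageChange] "
    "Init CUDA: CPU +0, GPU +0, now: CPU 2912, GPU 7715 (MiB)"
)

if __name__ == "__main__":
    for stage, cpu, gpu in parse_mem_usage(SAMPLE):
        print(f"{stage}: CPU {cpu} MiB, GPU {gpu} MiB")
    print("peak Init GPU reading:", peak_init_gpu(parse_mem_usage(SAMPLE)))
```

To run it over the whole log, pass the output of `open(path).read()` to `parse_mem_usage`, where `path` is wherever the container output was saved. Comparing the peak reading with the GPU's total memory is left to the reader, since the log does not state the card's capacity.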