Hardware - GPU - T4
Hardware - CPU Intel x64
Operating System - Ubuntu 24.04
Riva Version - 2.19
How to reproduce the issue? (This is for errors. Please share the command and the detailed log here.)
The error occurs when following the Riva Quick Start Guide for speech synthesis.
Here are the commands:
ngc registry resource download-version nvidia/riva/riva_quickstart:2.19.0
cd riva_quickstart_v2.19.0
# edit config.sh (its contents are attached below)
bash riva_init.sh
bash riva_start.sh
Startup fails with “out of memory” errors:
I0723 14:50:08.649100 144 pipeline_library.cc:26] "TRITONBACKEND_ModelInstanceInitialize: conformer-es-ES-asr-streaming-asr-bls-ensemble_0_0 (device 0)"
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
cudaError_t 2 : "out of memory" returned from 'cudaMallocAsync(&data, row_bytes * rows, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 100'
cudaError_t 1 : "invalid argument" returned from 'cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 122'
cudaError_t 2 : "out of memory" returned from 'cudaMallocAsync(&data, row_bytes * rows, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 100'
cudaError_t 1 : "invalid argument" returned from 'cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 122'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc( &d_output_buffer_, sizeof(float) * config_.max_execution_batch_size * config_.num_features * output_num_time_steps_)' in fileriva/asr/features/extractor.cc line 412'
cudaError_t 2 : "out of memory" returned from 'cudaGetLastError()' in fileexternal/cu-feat-extr/src/cudafeat/feature-online-batched-spectral-cuda.cc line 122'
cublasStatus_t 3 : "CUBLAS_STATUS_ALLOC_FAILED" returned from 'cublasCreate(&cublas_handle_)'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, bytes)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 212'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, bytes)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 212'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
The full log from docker logs riva-speech:
ubuntu@ip-172-31-35-118:~/riva_quickstart_v2.19.0$ ./riva_start.sh
Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Riva server to load all models…retrying in 10 seconds
…
Health ready check failed.
Check Riva logs with: docker logs riva-speech
ubuntu@ip-172-31-35-118:~/riva_quickstart_v2.19.0$ docker logs riva-speech
==========================
=== Riva Speech Skills ===
NVIDIA Release 25.02 (build 151443007)
Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
I0723 14:49:34.793050 144 pinned_memory_manager.cc:277] “Pinned memory pool is created at ‘0x7e4b48000000’ with size 268435456”
I0723 14:49:34.797271 144 cuda_memory_manager.cc:107] “CUDA memory pool is created on device 0 with size 1000000000”
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
I0723 14:49:39.789726 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-de-DE-asr-streaming-asr-bls-ensemble (version 1)”
I0723 14:49:39.790338 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-de-DE-asr-offline-asr-bls-ensemble (version 1)”
I0723 14:49:39.790780 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-ar-AR-asr-streaming-asr-bls-ensemble (version 1)”
I0723 14:49:39.791101 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-ar-AR-asr-offline-asr-bls-ensemble (version 1)”
Found yaml file: /data/models/conformer-ar-AR-asr-offline-asr-bls-ensemble/1/riva_bls_config.yaml
Found yaml file: /data/models/conformer-de-DE-asr-offline-asr-bls-ensemble/1/riva_bls_config.yaml
Found yaml file: /data/models/conformer-de-DE-asr-streaming-asr-bls-ensemble/1/riva_bls_config.yaml
Found yaml file: /data/models/conformer-ar-AR-asr-streaming-asr-bls-ensemble/1/riva_bls_config.yaml
I0723 14:49:39.878422 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-de-DE-asr-offline-asr-bls-ensemble_0_0 (device 0)”
I0723 14:49:39.878427 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-ar-AR-asr-streaming-asr-bls-ensemble_0_0 (device 0)”
I0723 14:49:39.878435 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-de-DE-asr-streaming-asr-bls-ensemble_0_0 (device 0)”
I0723 14:49:39.878648 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-ar-AR-asr-offline-asr-bls-ensemble_0_0 (device 0)”
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0723 14:49:46.418423 152 asr_ensemble_factory.cc:278] Loading acoustic model
I0723 14:49:46.418465 152 asr_ensemble_factory.cc:284] Done loading acoustic model
I0723 14:49:46.425920 152 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”
verbalizer_grammar: “verbalizer.ascii_proto”
sentence_boundary_regexp: "[\.:!\?] "
sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”
I0723 14:49:46.460240 144 model_lifecycle.cc:849] “successfully loaded ‘conformer-ar-AR-asr-streaming-asr-bls-ensemble’”
I0723 14:49:46.464440 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-en-GB-asr-offline-asr-bls-ensemble (version 1)”
Found yaml file: /data/models/conformer-en-GB-asr-offline-asr-bls-ensemble/1/riva_bls_config.yaml
I0723 14:49:46.473134 151 asr_ensemble_factory.cc:278] Loading acoustic model
I0723 14:49:46.473157 151 asr_ensemble_factory.cc:284] Done loading acoustic model
I0723 14:49:46.474501 151 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”
verbalizer_grammar: “verbalizer.ascii_proto”
sentence_boundary_regexp: "[\.:!\?] "
sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”
I0723 14:49:46.478671 144 backend_model.cc:303] “model configuration:\n{\n "name": "conformer-en-GB-asr-offline-asr-bls-ensemble",\n "platform": "",\n "backend": "riva_asr_ensemble_pipeline",\n "runtime": "",\n "version_policy": {\n "latest": {\n "num_versions": 1\n }\n },\n "max_batch_size": 1024,\n "input": [\n {\n "name": "PIPELINE_INPUT",\n "data_type": "TYPE_STRING",\n "format": "FORMAT_NONE",\n "dims": [\n 1\n ],\n "is_shape_tensor": false,\n "allow_ragged_batch": false,\n "optional": false,\n "is_non_linear_format_io": false\n }\n ],\n "output": [\n {\n "name": "PIPELINE_OUTPUT",\n "data_type": "TYPE_STRING",\n "dims": [\n 1\n ],\n "label_filename": "",\n "is_shape_tensor": false,\n "is_non_linear_format_io": false\n }\n ],\n "batch_input": ,\n "batch_output": ,\n "optimization": {\n "graph": {\n "level": 0\n },\n "priority": "PRIORITY_DEFAULT",\n "cuda": {\n "graphs": false,\n "busy_wait_events": false,\n "graph_spec": ,\n "output_copy_stream": true\n },\n "input_pinned_memory": {\n "enable": true\n },\n "output_pinned_memory": {\n "enable": true\n },\n "gather_kernel_buffer_threshold": 0,\n "eager_batching": false\n },\n "sequence_batching": {\n "oldest": {\n "max_candidate_sequences": 1024,\n "preferred_batch_size": [\n 64,\n 128\n ],\n "max_queue_delay_microseconds": 1000,\n "preserve_ordering": false\n },\n "max_sequence_idle_microseconds": 60000000,\n "control_input": [\n {\n "name": "START",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_START",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "READY",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_READY",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "END",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_END",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": 
"TYPE_INVALID"\n }\n ]\n },\n {\n "name": "CORRID",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_CORRID",\n "int32_false_true": ,\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_UINT64"\n }\n ]\n }\n ],\n "state": ,\n "iterative_sequence": false\n },\n "instance_group": [\n {\n "name": "conformer-en-GB-asr-offline-asr-bls-ensemble_0",\n "kind": "KIND_CPU",\n "count": 1,\n "gpus": ,\n "secondary_devices": ,\n "profile": ,\n "passive": false,\n "host_policy": ""\n }\n ],\n "default_model_filename": "",\n "cc_model_filenames": {},\n "metric_tags": {},\n "parameters": {\n "yaml_parameters_file": {\n "string_value": "riva_bls_config.yaml"\n },\n "offline": {\n "string_value": "True"\n },\n "type": {\n "string_value": "offline"\n },\n "streaming": {\n "string_value": "True"\n },\n "sample_rate": {\n "string_value": "16000"\n },\n "model_family": {\n "string_value": "riva"\n },\n "language_code": {\n "string_value": "en-GB"\n }\n },\n "model_warmup": ,\n "model_transaction_policy": {\n "decoupled": true\n }\n}”
I0723 14:49:46.478920 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-en-GB-asr-offline-asr-bls-ensemble_0_0 (device 0)”
I0723 14:49:46.496405 144 model_lifecycle.cc:849] “successfully loaded ‘conformer-ar-AR-asr-offline-asr-bls-ensemble’”
I0723 14:49:46.499659 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-en-GB-asr-streaming-asr-bls-ensemble (version 1)”
Found yaml file: /data/models/conformer-en-GB-asr-streaming-asr-bls-ensemble/1/riva_bls_config.yaml
I0723 14:49:46.515369 144 backend_model.cc:303] “model configuration:\n{\n "name": "conformer-en-GB-asr-streaming-asr-bls-ensemble",\n "platform": "",\n "backend": "riva_asr_ensemble_pipeline",\n "runtime": "",\n "version_policy": {\n "latest": {\n "num_versions": 1\n }\n },\n "max_batch_size": 1024,\n "input": [\n {\n "name": "PIPELINE_INPUT",\n "data_type": "TYPE_STRING",\n "format": "FORMAT_NONE",\n "dims": [\n 1\n ],\n "is_shape_tensor": false,\n "allow_ragged_batch": false,\n "optional": false,\n "is_non_linear_format_io": false\n }\n ],\n "output": [\n {\n "name": "PIPELINE_OUTPUT",\n "data_type": "TYPE_STRING",\n "dims": [\n 1\n ],\n "label_filename": "",\n "is_shape_tensor": false,\n "is_non_linear_format_io": false\n }\n ],\n "batch_input": ,\n "batch_output": ,\n "optimization": {\n "graph": {\n "level": 0\n },\n "priority": "PRIORITY_DEFAULT",\n "cuda": {\n "graphs": false,\n "busy_wait_events": false,\n "graph_spec": ,\n "output_copy_stream": true\n },\n "input_pinned_memory": {\n "enable": true\n },\n "output_pinned_memory": {\n "enable": true\n },\n "gather_kernel_buffer_threshold": 0,\n "eager_batching": false\n },\n "sequence_batching": {\n "oldest": {\n "max_candidate_sequences": 1024,\n "preferred_batch_size": [\n 64,\n 128\n ],\n "max_queue_delay_microseconds": 1000,\n "preserve_ordering": false\n },\n "max_sequence_idle_microseconds": 60000000,\n "control_input": [\n {\n "name": "START",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_START",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "READY",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_READY",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "END",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_END",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": 
"TYPE_INVALID"\n }\n ]\n },\n {\n "name": "CORRID",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_CORRID",\n "int32_false_true": ,\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_UINT64"\n }\n ]\n }\n ],\n "state": ,\n "iterative_sequence": false\n },\n "instance_group": [\n {\n "name": "conformer-en-GB-asr-streaming-asr-bls-ensemble_0",\n "kind": "KIND_CPU",\n "count": 1,\n "gpus": ,\n "secondary_devices": ,\n "profile": ,\n "passive": false,\n "host_policy": ""\n }\n ],\n "default_model_filename": "",\n "cc_model_filenames": {},\n "metric_tags": {},\n "parameters": {\n "language_code": {\n "string_value": "en-GB"\n },\n "model_family": {\n "string_value": "riva"\n },\n "type": {\n "string_value": "online"\n },\n "offline": {\n "string_value": "False"\n },\n "yaml_parameters_file": {\n "string_value": "riva_bls_config.yaml"\n },\n "sample_rate": {\n "string_value": "16000"\n },\n "streaming": {\n "string_value": "True"\n }\n },\n "model_warmup": ,\n "model_transaction_policy": {\n "decoupled": true\n }\n}”
I0723 14:49:46.515658 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-en-GB-asr-streaming-asr-bls-ensemble_0_0 (device 0)”
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
I0723 14:49:54.895031 152 asr_ensemble_factory.cc:278] Loading acoustic model
I0723 14:49:54.895257 152 asr_ensemble_factory.cc:284] Done loading acoustic model
I0723 14:49:54.898206 151 asr_ensemble_factory.cc:278] Loading acoustic model
I0723 14:49:54.898232 151 asr_ensemble_factory.cc:284] Done loading acoustic model
I0723 14:49:54.907298 152 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”
verbalizer_grammar: “verbalizer.ascii_proto”
sentence_boundary_regexp: "[\.:!\?] "
sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”
I0723 14:49:54.929798 151 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”
verbalizer_grammar: “verbalizer.ascii_proto”
sentence_boundary_regexp: "[\.:!\?] "
sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”
I0723 14:49:55.033062 144 model_lifecycle.cc:849] “successfully loaded ‘conformer-en-GB-asr-offline-asr-bls-ensemble’”
I0723 14:49:55.036401 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-en-US-asr-offline-asr-bls-ensemble (version 1)”
Found yaml file: /data/models/conformer-en-US-asr-offline-asr-bls-ensemble/1/riva_bls_config.yaml
I0723 14:49:55.046675 144 model_lifecycle.cc:849] “successfully loaded ‘conformer-en-GB-asr-streaming-asr-bls-ensemble’”
I0723 14:49:55.050035 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-en-US-asr-streaming-asr-bls-ensemble (version 1)”
Found yaml file: /data/models/conformer-en-US-asr-streaming-asr-bls-ensemble/1/riva_bls_config.yaml
I0723 14:49:55.051543 144 backend_model.cc:303] “model configuration:\n{\n "name": "conformer-en-US-asr-offline-asr-bls-ensemble",\n "platform": "",\n "backend": "riva_asr_ensemble_pipeline",\n "runtime": "",\n "version_policy": {\n "latest": {\n "num_versions": 1\n }\n },\n "max_batch_size": 1024,\n "input": [\n {\n "name": "PIPELINE_INPUT",\n "data_type": "TYPE_STRING",\n "format": "FORMAT_NONE",\n "dims": [\n 1\n ],\n "is_shape_tensor": false,\n "allow_ragged_batch": false,\n "optional": false,\n "is_non_linear_format_io": false\n }\n ],\n "output": [\n {\n "name": "PIPELINE_OUTPUT",\n "data_type": "TYPE_STRING",\n "dims": [\n 1\n ],\n "label_filename": "",\n "is_shape_tensor": false,\n "is_non_linear_format_io": false\n }\n ],\n "batch_input": ,\n "batch_output": ,\n "optimization": {\n "graph": {\n "level": 0\n },\n "priority": "PRIORITY_DEFAULT",\n "cuda": {\n "graphs": false,\n "busy_wait_events": false,\n "graph_spec": ,\n "output_copy_stream": true\n },\n "input_pinned_memory": {\n "enable": true\n },\n "output_pinned_memory": {\n "enable": true\n },\n "gather_kernel_buffer_threshold": 0,\n "eager_batching": false\n },\n "sequence_batching": {\n "oldest": {\n "max_candidate_sequences": 1024,\n "preferred_batch_size": [\n 64,\n 128\n ],\n "max_queue_delay_microseconds": 1000,\n "preserve_ordering": false\n },\n "max_sequence_idle_microseconds": 60000000,\n "control_input": [\n {\n "name": "START",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_START",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "READY",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_READY",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "END",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_END",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": 
"TYPE_INVALID"\n }\n ]\n },\n {\n "name": "CORRID",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_CORRID",\n "int32_false_true": ,\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_UINT64"\n }\n ]\n }\n ],\n "state": ,\n "iterative_sequence": false\n },\n "instance_group": [\n {\n "name": "conformer-en-US-asr-offline-asr-bls-ensemble_0",\n "kind": "KIND_CPU",\n "count": 1,\n "gpus": ,\n "secondary_devices": ,\n "profile": ,\n "passive": false,\n "host_policy": ""\n }\n ],\n "default_model_filename": "",\n "cc_model_filenames": {},\n "metric_tags": {},\n "parameters": {\n "type": {\n "string_value": "offline"\n },\n "language_code": {\n "string_value": "en-US"\n },\n "offline": {\n "string_value": "True"\n },\n "sample_rate": {\n "string_value": "16000"\n },\n "streaming": {\n "string_value": "True"\n },\n "model_family": {\n "string_value": "riva"\n },\n "yaml_parameters_file": {\n "string_value": "riva_bls_config.yaml"\n }\n },\n "model_warmup": ,\n "model_transaction_policy": {\n "decoupled": true\n }\n}”
I0723 14:49:55.051666 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-en-US-asr-offline-asr-bls-ensemble_0_0 (device 0)”
I0723 14:49:55.064985 144 backend_model.cc:303] “model configuration:\n{\n "name": "conformer-en-US-asr-streaming-asr-bls-ensemble",\n "platform": "",\n "backend": "riva_asr_ensemble_pipeline",\n "runtime": "",\n "version_policy": {\n "latest": {\n "num_versions": 1\n }\n },\n "max_batch_size": 1024,\n "input": [\n {\n "name": "PIPELINE_INPUT",\n "data_type": "TYPE_STRING",\n "format": "FORMAT_NONE",\n "dims": [\n 1\n ],\n "is_shape_tensor": false,\n "allow_ragged_batch": false,\n "optional": false,\n "is_non_linear_format_io": false\n }\n ],\n "output": [\n {\n "name": "PIPELINE_OUTPUT",\n "data_type": "TYPE_STRING",\n "dims": [\n 1\n ],\n "label_filename": "",\n "is_shape_tensor": false,\n "is_non_linear_format_io": false\n }\n ],\n "batch_input": ,\n "batch_output": ,\n "optimization": {\n "graph": {\n "level": 0\n },\n "priority": "PRIORITY_DEFAULT",\n "cuda": {\n "graphs": false,\n "busy_wait_events": false,\n "graph_spec": ,\n "output_copy_stream": true\n },\n "input_pinned_memory": {\n "enable": true\n },\n "output_pinned_memory": {\n "enable": true\n },\n "gather_kernel_buffer_threshold": 0,\n "eager_batching": false\n },\n "sequence_batching": {\n "oldest": {\n "max_candidate_sequences": 1024,\n "preferred_batch_size": [\n 64,\n 128\n ],\n "max_queue_delay_microseconds": 1000,\n "preserve_ordering": false\n },\n "max_sequence_idle_microseconds": 60000000,\n "control_input": [\n {\n "name": "START",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_START",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "READY",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_READY",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "END",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_END",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": 
"TYPE_INVALID"\n }\n ]\n },\n {\n "name": "CORRID",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_CORRID",\n "int32_false_true": ,\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_UINT64"\n }\n ]\n }\n ],\n "state": ,\n "iterative_sequence": false\n },\n "instance_group": [\n {\n "name": "conformer-en-US-asr-streaming-asr-bls-ensemble_0",\n "kind": "KIND_CPU",\n "count": 1,\n "gpus": ,\n "secondary_devices": ,\n "profile": ,\n "passive": false,\n "host_policy": ""\n }\n ],\n "default_model_filename": "",\n "cc_model_filenames": {},\n "metric_tags": {},\n "parameters": {\n "type": {\n "string_value": "online"\n },\n "yaml_parameters_file": {\n "string_value": "riva_bls_config.yaml"\n },\n "sample_rate": {\n "string_value": "16000"\n },\n "model_family": {\n "string_value": "riva"\n },\n "streaming": {\n "string_value": "True"\n },\n "offline": {\n "string_value": "False"\n },\n "language_code": {\n "string_value": "en-US"\n }\n },\n "model_warmup": ,\n "model_transaction_policy": {\n "decoupled": true\n }\n}”
I0723 14:49:55.065193 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-en-US-asr-streaming-asr-bls-ensemble_0_0 (device 0)”
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
I0723 14:50:08.394371 151 asr_ensemble_factory.cc:278] Loading acoustic model
I0723 14:50:08.394407 151 asr_ensemble_factory.cc:284] Done loading acoustic model
I0723 14:50:08.449123 151 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”
verbalizer_grammar: “verbalizer.ascii_proto”
sentence_boundary_regexp: "[\.:!\?] "
sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”
I0723 14:50:08.471542 152 asr_ensemble_factory.cc:278] Loading acoustic model
I0723 14:50:08.471572 152 asr_ensemble_factory.cc:284] Done loading acoustic model
Riva waiting for Triton server to load all models…retrying in 1 second
I0723 14:50:08.528484 152 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”
verbalizer_grammar: “verbalizer.ascii_proto”
sentence_boundary_regexp: "[\.:!\?] "
sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”
I0723 14:50:08.563075 144 model_lifecycle.cc:849] “successfully loaded ‘conformer-en-US-asr-streaming-asr-bls-ensemble’”
I0723 14:50:08.566582 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-es-ES-asr-offline-asr-bls-ensemble (version 1)”
Found yaml file: /data/models/conformer-es-ES-asr-offline-asr-bls-ensemble/1/riva_bls_config.yaml
I0723 14:50:08.581015 144 backend_model.cc:303] “model configuration:\n{\n "name": "conformer-es-ES-asr-offline-asr-bls-ensemble",\n "platform": "",\n "backend": "riva_asr_ensemble_pipeline",\n "runtime": "",\n "version_policy": {\n "latest": {\n "num_versions": 1\n }\n },\n "max_batch_size": 1024,\n "input": [\n {\n "name": "PIPELINE_INPUT",\n "data_type": "TYPE_STRING",\n "format": "FORMAT_NONE",\n "dims": [\n 1\n ],\n "is_shape_tensor": false,\n "allow_ragged_batch": false,\n "optional": false,\n "is_non_linear_format_io": false\n }\n ],\n "output": [\n {\n "name": "PIPELINE_OUTPUT",\n "data_type": "TYPE_STRING",\n "dims": [\n 1\n ],\n "label_filename": "",\n "is_shape_tensor": false,\n "is_non_linear_format_io": false\n }\n ],\n "batch_input": ,\n "batch_output": ,\n "optimization": {\n "graph": {\n "level": 0\n },\n "priority": "PRIORITY_DEFAULT",\n "cuda": {\n "graphs": false,\n "busy_wait_events": false,\n "graph_spec": ,\n "output_copy_stream": true\n },\n "input_pinned_memory": {\n "enable": true\n },\n "output_pinned_memory": {\n "enable": true\n },\n "gather_kernel_buffer_threshold": 0,\n "eager_batching": false\n },\n "sequence_batching": {\n "oldest": {\n "max_candidate_sequences": 1024,\n "preferred_batch_size": [\n 64,\n 128\n ],\n "max_queue_delay_microseconds": 1000,\n "preserve_ordering": false\n },\n "max_sequence_idle_microseconds": 60000000,\n "control_input": [\n {\n "name": "START",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_START",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "READY",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_READY",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "END",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_END",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": 
"TYPE_INVALID"\n }\n ]\n },\n {\n "name": "CORRID",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_CORRID",\n "int32_false_true": ,\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_UINT64"\n }\n ]\n }\n ],\n "state": ,\n "iterative_sequence": false\n },\n "instance_group": [\n {\n "name": "conformer-es-ES-asr-offline-asr-bls-ensemble_0",\n "kind": "KIND_CPU",\n "count": 1,\n "gpus": ,\n "secondary_devices": ,\n "profile": ,\n "passive": false,\n "host_policy": ""\n }\n ],\n "default_model_filename": "",\n "cc_model_filenames": {},\n "metric_tags": {},\n "parameters": {\n "yaml_parameters_file": {\n "string_value": "riva_bls_config.yaml"\n },\n "language_code": {\n "string_value": "es-ES"\n },\n "model_family": {\n "string_value": "riva"\n },\n "streaming": {\n "string_value": "True"\n },\n "offline": {\n "string_value": "True"\n },\n "type": {\n "string_value": "offline"\n },\n "sample_rate": {\n "string_value": "16000"\n }\n },\n "model_warmup": ,\n "model_transaction_policy": {\n "decoupled": true\n }\n}”
I0723 14:50:08.581155 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-es-ES-asr-offline-asr-bls-ensemble_0_0 (device 0)”
I0723 14:50:08.628725 144 model_lifecycle.cc:849] “successfully loaded ‘conformer-en-US-asr-offline-asr-bls-ensemble’”
I0723 14:50:08.631859 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-es-ES-asr-streaming-asr-bls-ensemble (version 1)”
Found yaml file: /data/models/conformer-es-ES-asr-streaming-asr-bls-ensemble/1/riva_bls_config.yaml
I0723 14:50:08.648860 144 backend_model.cc:303] “model configuration:\n{\n "name": "conformer-es-ES-asr-streaming-asr-bls-ensemble",\n "platform": "",\n "backend": "riva_asr_ensemble_pipeline",\n "runtime": "",\n "version_policy": {\n "latest": {\n "num_versions": 1\n }\n },\n "max_batch_size": 1024,\n "input": [\n {\n "name": "PIPELINE_INPUT",\n "data_type": "TYPE_STRING",\n "format": "FORMAT_NONE",\n "dims": [\n 1\n ],\n "is_shape_tensor": false,\n "allow_ragged_batch": false,\n "optional": false,\n "is_non_linear_format_io": false\n }\n ],\n "output": [\n {\n "name": "PIPELINE_OUTPUT",\n "data_type": "TYPE_STRING",\n "dims": [\n 1\n ],\n "label_filename": "",\n "is_shape_tensor": false,\n "is_non_linear_format_io": false\n }\n ],\n "batch_input": ,\n "batch_output": ,\n "optimization": {\n "graph": {\n "level": 0\n },\n "priority": "PRIORITY_DEFAULT",\n "cuda": {\n "graphs": false,\n "busy_wait_events": false,\n "graph_spec": ,\n "output_copy_stream": true\n },\n "input_pinned_memory": {\n "enable": true\n },\n "output_pinned_memory": {\n "enable": true\n },\n "gather_kernel_buffer_threshold": 0,\n "eager_batching": false\n },\n "sequence_batching": {\n "oldest": {\n "max_candidate_sequences": 1024,\n "preferred_batch_size": [\n 64,\n 128\n ],\n "max_queue_delay_microseconds": 1000,\n "preserve_ordering": false\n },\n "max_sequence_idle_microseconds": 60000000,\n "control_input": [\n {\n "name": "START",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_START",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "READY",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_READY",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "END",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_END",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": 
"TYPE_INVALID"\n }\n ]\n },\n {\n "name": "CORRID",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_CORRID",\n "int32_false_true": ,\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_UINT64"\n }\n ]\n }\n ],\n "state": ,\n "iterative_sequence": false\n },\n "instance_group": [\n {\n "name": "conformer-es-ES-asr-streaming-asr-bls-ensemble_0",\n "kind": "KIND_CPU",\n "count": 1,\n "gpus": ,\n "secondary_devices": ,\n "profile": ,\n "passive": false,\n "host_policy": ""\n }\n ],\n "default_model_filename": "",\n "cc_model_filenames": {},\n "metric_tags": {},\n "parameters": {\n "language_code": {\n "string_value": "es-ES"\n },\n "offline": {\n "string_value": "False"\n },\n "model_family": {\n "string_value": "riva"\n },\n "streaming": {\n "string_value": "True"\n },\n "type": {\n "string_value": "online"\n },\n "yaml_parameters_file": {\n "string_value": "riva_bls_config.yaml"\n },\n "sample_rate": {\n "string_value": "16000"\n }\n },\n "model_warmup": ,\n "model_transaction_policy": {\n "decoupled": true\n }\n}”
I0723 14:50:08.649100 144 pipeline_library.cc:26] "TRITONBACKEND_ModelInstanceInitialize: conformer-es-ES-asr-streaming-asr-bls-ensemble_0_0 (device 0)"
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
cudaError_t 2 : "out of memory" returned from 'cudaMallocAsync(&data, row_bytes * rows, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 100'
cudaError_t 1 : "invalid argument" returned from 'cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 122'
cudaError_t 2 : "out of memory" returned from 'cudaMallocAsync(&data, row_bytes * rows, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 100'
cudaError_t 1 : "invalid argument" returned from 'cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 122'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc( &d_output_buffer_, sizeof(float) * config_.max_execution_batch_size * config_.num_features * output_num_time_steps_)' in fileriva/asr/features/extractor.cc line 412'
cudaError_t 2 : "out of memory" returned from 'cudaGetLastError()' in fileexternal/cu-feat-extr/src/cudafeat/feature-online-batched-spectral-cuda.cc line 122'
cublasStatus_t 3 : "CUBLAS_STATUS_ALLOC_FAILED" returned from 'cublasCreate(&cublas_handle_)'cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, bytes)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 212'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, bytes)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 212'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
I0723 14:50:10.870836 151 asr_ensemble_factory.cc:278] Loading acoustic model
I0723 14:50:10.870865 151 asr_ensemble_factory.cc:284] Done loading acoustic model
I0723 14:50:10.894193 151 normalizer.cc:66] Proto String: tokenizer_grammar: "tokenizer.ascii_proto"
verbalizer_grammar: "verbalizer.ascii_proto"
sentence_boundary_regexp: "[\.:!\?] "
sentence_boundary_exceptions_file: "sentence_boundary_exceptions.txt"
cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, bytes)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 212'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc( (void**)&cu_input_audio_signals_, config_.max_execution_batch_size * config_.num_samples * sizeof(float))' in fileriva/asr/features/extractor.cc line 392'
cudaError_t 2 : "out of memory" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 105'
cudaError_t 1 : "invalid argument" returned from 'cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 122'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
cudaError_t 2 : "out of memory" returned from 'cudaMallocAsync(&data, row_bytes * rows, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 100'
cudaError_t 1 : "invalid argument" returned from 'cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 122'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc( &d_output_buffer_, sizeof(float) * config_.max_execution_batch_size * config_.num_features * output_num_time_steps_)' in fileriva/asr/features/extractor.cc line 412'
cudaError_t 2 : "out of memory" returned from 'cudaGetLastError()' in fileexternal/cu-feat-extr/src/cudafeat/feature-online-batched-spectral-cuda.cc line 122'
I0723 14:50:11.071019 144 model_lifecycle.cc:849] "successfully loaded 'conformer-es-ES-asr-offline-asr-bls-ensemble'"
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaFree(this->data_)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 492'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 105'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 105'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 179'
I0723 14:50:11.087798 144 pipeline_library.cc:22] "TRITONBACKEND_ModelInitialize: conformer-es-US-asr-offline-asr-bls-ensemble (version 1)"
Found yaml file: /data/models/conformer-es-US-asr-offline-asr-bls-ensemble/1/riva_bls_config.yaml
I0723 14:50:11.112111 152 asr_ensemble_factory.cc:278] Loading acoustic model
I0723 14:50:11.112136 152 asr_ensemble_factory.cc:284] Done loading acoustic model
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaHostRegister( pinned_host_logits_buffer_.data(), pinned_host_logits_buffer_.size() * sizeof(float), 0)' in fileriva/asr/pipeline/asr_ensemble/streaming_asr_ensemble.cc line 125'
I0723 14:50:11.113827 152 normalizer.cc:66] Proto String: tokenizer_grammar: "tokenizer.ascii_proto"
verbalizer_grammar: "verbalizer.ascii_proto"
sentence_boundary_regexp: "[\.:!\?] "
sentence_boundary_exceptions_file: "sentence_boundary_exceptions.txt"
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 179'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpyAsync( this->data_.get(), &vec[0], vec.size() * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 90'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 179'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpyAsync( this->data_.get(), &vec[0], vec.size() * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 90'
I0723 14:50:11.114156 144 backend_model.cc:303] "model configuration:\n{\n "name": "conformer-es-US-asr-offline-asr-bls-ensemble",\n "platform": "",\n "backend": "riva_asr_ensemble_pipeline",\n "runtime": "",\n "version_policy": {\n "latest": {\n "num_versions": 1\n }\n },\n "max_batch_size": 1024,\n "input": [\n {\n "name": "PIPELINE_INPUT",\n "data_type": "TYPE_STRING",\n "format": "FORMAT_NONE",\n "dims": [\n 1\n ],\n "is_shape_tensor": false,\n "allow_ragged_batch": false,\n "optional": false,\n "is_non_linear_format_io": false\n }\n ],\n "output": [\n {\n "name": "PIPELINE_OUTPUT",\n "data_type": "TYPE_STRING",\n "dims": [\n 1\n ],\n "label_filename": "",\n "is_shape_tensor": false,\n "is_non_linear_format_io": false\n }\n ],\n "batch_input": [],\n "batch_output": [],\n "optimization": {\n "graph": {\n "level": 0\n },\n "priority": "PRIORITY_DEFAULT",\n "cuda": {\n "graphs": false,\n "busy_wait_events": false,\n "graph_spec": [],\n "output_copy_stream": true\n },\n "input_pinned_memory": {\n "enable": true\n },\n "output_pinned_memory": {\n "enable": true\n },\n "gather_kernel_buffer_threshold": 0,\n "eager_batching": false\n },\n "sequence_batching": {\n "oldest": {\n "max_candidate_sequences": 1024,\n "preferred_batch_size": [\n 64,\n 128\n ],\n "max_queue_delay_microseconds": 1000,\n "preserve_ordering": false\n },\n "max_sequence_idle_microseconds": 60000000,\n "control_input": [\n {\n "name": "START",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_START",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": [],\n "bool_false_true": [],\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "READY",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_READY",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": [],\n "bool_false_true": [],\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "END",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_END",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": [],\n "bool_false_true": [],\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "CORRID",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_CORRID",\n "int32_false_true": [],\n "fp32_false_true": [],\n "bool_false_true": [],\n "data_type": "TYPE_UINT64"\n }\n ]\n }\n ],\n "state": [],\n "iterative_sequence": false\n },\n "instance_group": [\n {\n "name": "conformer-es-US-asr-offline-asr-bls-ensemble_0",\n "kind": "KIND_CPU",\n "count": 1,\n "gpus": [],\n "secondary_devices": [],\n "profile": [],\n "passive": false,\n "host_policy": ""\n }\n ],\n "default_model_filename": "",\n "cc_model_filenames": {},\n "metric_tags": {},\n "parameters": {\n "yaml_parameters_file": {\n "string_value": "riva_bls_config.yaml"\n },\n "language_code": {\n "string_value": "es-US"\n },\n "type": {\n "string_value": "offline"\n },\n "offline": {\n "string_value": "True"\n },\n "streaming": {\n "string_value": "True"\n },\n "model_family": {\n "string_value": "riva"\n },\n "sample_rate": {\n "string_value": "16000"\n }\n },\n "model_warmup": [],\n "model_transaction_policy": {\n "decoupled": true\n }\n}"
I0723 14:50:11.114289 144 pipeline_library.cc:26] "TRITONBACKEND_ModelInstanceInitialize: conformer-es-US-asr-offline-asr-bls-ensemble_0_0 (device 0)"
I0723 14:50:11.340319 144 model_lifecycle.cc:849] "successfully loaded 'conformer-es-ES-asr-streaming-asr-bls-ensemble'"
I0723 14:50:11.345478 144 pipeline_library.cc:22] "TRITONBACKEND_ModelInitialize: conformer-es-US-asr-streaming-asr-bls-ensemble (version 1)"
Found yaml file: /data/models/conformer-es-US-asr-streaming-asr-bls-ensemble/1/riva_bls_config.yaml
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 179'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpyAsync( this->data_.get(), &vec[0], vec.size() * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 90'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 179'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpyAsync( this->data_.get(), &vec[0], vec.size() * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 90'
I0723 14:50:11.359800 144 backend_model.cc:303] "model configuration:\n{\n "name": "conformer-es-US-asr-streaming-asr-bls-ensemble",\n "platform": "",\n "backend": "riva_asr_ensemble_pipeline",\n "runtime": "",\n "version_policy": {\n "latest": {\n "num_versions": 1\n }\n },\n "max_batch_size": 1024,\n "input": [\n {\n "name": "PIPELINE_INPUT",\n "data_type": "TYPE_STRING",\n "format": "FORMAT_NONE",\n "dims": [\n 1\n ],\n "is_shape_tensor": false,\n "allow_ragged_batch": false,\n "optional": false,\n "is_non_linear_format_io": false\n }\n ],\n "output": [\n {\n "name": "PIPELINE_OUTPUT",\n "data_type": "TYPE_STRING",\n "dims": [\n 1\n ],\n "label_filename": "",\n "is_shape_tensor": false,\n "is_non_linear_format_io": false\n }\n ],\n "batch_input": [],\n "batch_output": [],\n "optimization": {\n "graph": {\n "level": 0\n },\n "priority": "PRIORITY_DEFAULT",\n "cuda": {\n "graphs": false,\n "busy_wait_events": false,\n "graph_spec": [],\n "output_copy_stream": true\n },\n "input_pinned_memory": {\n "enable": true\n },\n "output_pinned_memory": {\n "enable": true\n },\n "gather_kernel_buffer_threshold": 0,\n "eager_batching": false\n },\n "sequence_batching": {\n "oldest": {\n "max_candidate_sequences": 1024,\n "preferred_batch_size": [\n 64,\n 128\n ],\n "max_queue_delay_microseconds": 1000,\n "preserve_ordering": false\n },\n "max_sequence_idle_microseconds": 60000000,\n "control_input": [\n {\n "name": "START",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_START",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": [],\n "bool_false_true": [],\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "READY",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_READY",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": [],\n "bool_false_true": [],\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "END",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_END",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": [],\n "bool_false_true": [],\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "CORRID",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_CORRID",\n "int32_false_true": [],\n "fp32_false_true": [],\n "bool_false_true": [],\n "data_type": "TYPE_UINT64"\n }\n ]\n }\n ],\n "state": [],\n "iterative_sequence": false\n },\n "instance_group": [\n {\n "name": "conformer-es-US-asr-streaming-asr-bls-ensemble_0",\n "kind": "KIND_CPU",\n "count": 1,\n "gpus": [],\n "secondary_devices": [],\n "profile": [],\n "passive": false,\n "host_policy": ""\n }\n ],\n "default_model_filename": "",\n "cc_model_filenames": {},\n "metric_tags": {},\n "parameters": {\n "yaml_parameters_file": {\n "string_value": "riva_bls_config.yaml"\n },\n "model_family": {\n "string_value": "riva"\n },\n "language_code": {\n "string_value": "es-US"\n },\n "type": {\n "string_value": "online"\n },\n "streaming": {\n "string_value": "True"\n },\n "offline": {\n "string_value": "False"\n },\n "sample_rate": {\n "string_value": "16000"\n }\n },\n "model_warmup": [],\n "model_transaction_policy": {\n "decoupled": true\n }\n}"
I0723 14:50:11.359922 144 pipeline_library.cc:26] "TRITONBACKEND_ModelInstanceInitialize: conformer-es-US-asr-streaming-asr-bls-ensemble_0_0 (device 0)"
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 179'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpyAsync( this->data_.get(), vec.Data(), vec.Dim() * sizeof(Real), cudaMemcpyDeviceToDevice, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 132'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 179'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpyAsync( this->data_.get(), vec.Data(), vec.Dim() * sizeof(Real), cudaMemcpyDeviceToDevice, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 132'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 179'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpyAsync( this->data_.get(), vec.Data(), vec.Dim() * sizeof(Real), cudaMemcpyDeviceToDevice, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 132'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 179'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpyAsync( this->data_.get(), vec.Data(), vec.Dim() * sizeof(Real), cudaMemcpyDeviceToDevice, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 132'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaHostRegister(audio_processed_.data(), audio_processed_.size() * sizeof(float), 0)' in fileriva/asr/features/extractor.cc line 238'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaHostRegister( time_steps_processed_.data(), time_steps_processed_.size() * sizeof(int), 0)' in fileriva/asr/features/extractor.cc line 241'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaHostRegister( feature_offset_start_.data(), feature_offset_start_.size() * sizeof(int), 0)' in fileriva/asr/features/extractor.cc line 244'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaHostRegister(feature_offset_end_.data(), feature_offset_end_.size() * sizeof(int), 0)' in fileriva/asr/features/extractor.cc line 247'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&d_feature_offset_start_, sizeof(int) * feature_offset_start_.size())' in fileriva/asr/features/extractor.cc line 249'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&d_feature_offset_end_, sizeof(int) * feature_offset_end_.size())' in fileriva/asr/features/extractor.cc line 250'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc( &d_const_left_pad_feature_offset_start_, sizeof(int) * const_left_pad_feature_offset_start.size())' in fileriva/asr/features/extractor.cc line 256'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc( &d_const_left_pad_feature_offset_end_, sizeof(int) * const_left_pad_feature_offset_end.size())' in fileriva/asr/features/extractor.cc line 259'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpy( d_const_left_pad_feature_offset_start_, const_left_pad_feature_offset_start.data(), sizeof(int) * const_left_pad_feature_offset_start.size(), cudaMemcpyHostToDevice)' in fileriva/asr/features/extractor.cc line 262'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpy( d_const_left_pad_feature_offset_end_, const_left_pad_feature_offset_end.data(), sizeof(int) * const_left_pad_feature_offset_end.size(), cudaMemcpyHostToDevice)' in fileriva/asr/features/extractor.cc line 265'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&d_offline_batch_slot_ids_, config_.max_execution_batch_size * sizeof(int))' in fileriva/asr/features/extractor.cc line 272'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpy( d_offline_batch_slot_ids_, offline_batch_slot_ids_.data(), config_.max_execution_batch_size * sizeof(int), cudaMemcpyHostToDevice)' in fileriva/asr/features/extractor.cc line 274'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&d_const_batch_slots_zero_, sizeof(int) * 1)' in fileriva/asr/features/extractor.cc line 280'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemset(d_const_batch_slots_zero_, 0, sizeof(int) * 1)' in fileriva/asr/features/extractor.cc line 281'
> Riva waiting for Triton server to load all models...retrying in 1 second
cublasStatus_t 1 : "CUBLAS_STATUS_NOT_INITIALIZED" returned from 'cublasCreate(&cublas_handle_)'cublasStatus_t 203 : "CUBLAS_STATUS_UNKNOWN_ERROR" returned from 'curandCreateGenerator(&curand_handle_, CURAND_RNG_PSEUDO_DEFAULT)'curandStatus_t 101 : "CURAND_STATUS_NOT_INITIALIZED" returned from 'curandSetGeneratorOrdering(curand_handle_, CURAND_ORDERING_PSEUDO_DEFAULT)'curandStatus_t 101 : "CURAND_STATUS_NOT_INITIALIZED" returned from 'curandSetStream(curand_handle_, cudaStreamPerThread)'curandStatus_t 101 : "CURAND_STATUS_NOT_INITIALIZED" returned from 'curandSetPseudoRandomGeneratorSeed( curand_handle_, config_.is_dither_seed_random ? riva::utils::matrix::RandInt(kMIN_DITHER_SEED, RAND_MAX) : kMIN_DITHER_SEED)'curandStatus_t 101 : "CURAND_STATUS_NOT_INITIALIZED" returned from 'curandSetGeneratorOffset(curand_handle_, 0)'cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 179'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 179'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 179'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemset2DAsync( data_, stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 229'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpyAsync( RowData(row), mat[row].data(), num_cols * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 245'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpyAsync( RowData(row), mat[row].data(), num_cols * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 245'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpy2DAsync( data_, dst_pitch, M.data_, src_pitch, width, M.num_rows_, cudaMemcpyDeviceToDevice, cudaStreamPerThread)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 271'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaGetLastError()' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 274'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, bytes)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-vector.cc line 167'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpyAsync( this->data_, &vec[0], vec.size() * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-vector.cc line 65'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, bytes)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-vector.cc line 167'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, bytes)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-vector.cc line 167'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpyAsync( this->data_, &vec[0], vec.size() * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-vector.cc line 65'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, bytes)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-vector.cc line 167'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpyAsync( this->data_, &vec[0], vec.size() * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-vector.cc line 65'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpyAsync(vecs_, &vecs[0], size * sizeof(float ), cudaMemcpyHostToDevice, cudaStreamPerThread)' in fileexternal/cu-feat-extr/src/cudafeat/feature-online-batched-spectral-cuda.cc line 70'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpyAsync(offsets_, &offsets[0], size * sizeof(int32_t), cudaMemcpyHostToDevice, cudaStreamPerThread)' in fileexternal/cu-feat-extr/src/cudafeat/feature-online-batched-spectral-cuda.cc line 72'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemcpyAsync(sizes_, &sizes[0], size * sizeof(int32_t), cudaMemcpyHostToDevice, cudaStreamPerThread)' in fileexternal/cu-feat-extr/src/cudafeat/feature-online-batched-spectral-cuda.cc line 74'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaStreamSynchronize(cudaStreamPerThread)' in fileexternal/cu-feat-extr/src/cudafeat/feature-online-batched-spectral-cuda.cc line 76'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, bytes)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 212'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, bytes)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 212'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemset2DAsync( data_, stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 229'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&lanes_, sizeof(LaneDesc) * max_lanes_)' in fileexternal/cu-feat-extr/src/cudafeat/online-batched-feature-pipeline-cuda.cc line 151'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 105'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 122'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaHostRegister( cpu_pcm_audio_buffers_.data(), cpu_pcm_audio_buffers_.size() * sizeof(int16_t), 0)' in fileriva/asr/features/extractor.cc line 379'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, bytes)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 212'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc( (void**)&cu_input_audio_signals_, config_.max_execution_batch_size * config_.num_samples * sizeof(float))' in fileriva/asr/features/extractor.cc line 392'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 105'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 122'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, row_bytes * rows, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 100'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 122'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc( &d_output_buffer_, sizeof(float) * config_.max_execution_batch_size * config_.num_features * output_num_time_steps_)' in fileriva/asr/features/extractor.cc line 412'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMemset2DAsync( data_, stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 229'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMallocAsync(&data, bytes, cudaStreamPerThread)' in fileriva/utils/matrix/cu_vector.cc line 179'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
[97b7eb4cefde:144 :0:151] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x10)
=================================
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
/opt/riva/bin/start-riva: line 8: 144 Segmentation fault (core dumped) ${CUSTOM_TRITON_ENV} tritonserver --log-verbose=${TRITON_LOG_VERBOSE} --log-info=${TRITON_LOG_INFO} --disable-auto-complete-config $model_repos --cuda-memory-pool-byte-size=0:1000000000
Triton server died before reaching ready state. Terminating Riva startup.
Check Triton logs with: docker logs
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
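The log above repeats a handful of CUDA errors many times. One way to see which ones dominate is to tally them by error code and message from a saved copy of the container log, for example `docker logs <riva-container> > riva.log 2>&1`, where `<riva-container>` is a placeholder for the Riva container name. `tally_cuda_errors` is a hypothetical helper, not part of the quickstart, and the sketch assumes the log keeps the straight double quotes the container prints around each message.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Riva quickstart): count each distinct
# cudaError_t <code> : "<message>" entry in a saved log, most frequent first.
# Assumes straight double quotes around the message, as the container prints them.
tally_cuda_errors() {
  local log="$1"
  # -o prints every match separately, so several errors fused onto one
  # log line (as happens with the cuBLAS/cuRAND lines) are all counted.
  grep -oE 'cudaError_t [0-9]+ : "[^"]+"' "$log" \
    | sort \
    | uniq -c \
    | sort -rn
}
```

Running `tally_cuda_errors riva.log` prints one line per distinct error, prefixed with how many times it occurred. To find the first CUDA error instead, `grep -n -m 1 'cudaError_t' riva.log` prints it together with its line number.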
config.sh:
#!/bin/bash
# GPU family of target platform. Supported values: tegra, non-tegra
riva_target_gpu_family="non-tegra"
# Name of tegra platform that is being used. Supported tegra platforms: orin, xavier
riva_tegra_platform="orin"
####### Enable or Disable Riva Services #######
# For any language other than en-US: service_enabled_nlp must be set to false
service_enabled_asr=false
service_enabled_nlp=false
service_enabled_tts=true
service_enabled_nmt=false
…
# Ports to expose for Riva services
riva_speech_api_port="50051"
riva_speech_api_http_port="50000"
# NGC orgs
riva_ngc_org="nvidia"
riva_ngc_team="riva"
riva_ngc_image_version="2.19.0"
riva_ngc_model_version="2.19.0"
…
…
…
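To double-check which services an edited config.sh actually enables before running riva_init.sh, a small helper can print the `service_enabled_*` flags. This is a sketch that assumes each flag is a plain `name=value` assignment at the start of a line, as in the excerpt above; `show_enabled_services` is a hypothetical name, not something the quickstart provides.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Riva quickstart): print each
# service_enabled_* flag from a config.sh as "<service>=<value>".
# Assumes the flags are plain assignments starting at column 0.
show_enabled_services() {
  local config="${1:-config.sh}"
  # -n suppresses default output; the p flag prints only matching lines.
  sed -nE 's/^service_enabled_([a-z]+)=(.*)$/\1=\2/p' "$config"
}
```

Run from the riva_quickstart_v2.19.0 directory, `show_enabled_services config.sh` should list asr, nlp, tts and nmt with the values set above.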
CUDA and GPU driver info:
ubuntu@ip-172-31-35-118:~/riva_quickstart_v2.19.0$ nvidia-smi
Wed Jul 23 15:04:06 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20 Driver Version: 570.133.20 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:00:1E.0 Off | 0 |
| N/A 37C P0 28W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
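The nvidia-smi snapshot above was taken with nothing running (0MiB / 15360MiB in use). To see how the T4's memory fills up while riva_start.sh loads models, nvidia-smi's CSV query mode can be polled once per second and piped into a small formatter. `report_free_mib` is a hypothetical helper, and the sketch assumes the `used, total` column order and unit-less MiB values produced by `--query-gpu=memory.used,memory.total --format=csv,noheader,nounits`.

```shell
#!/bin/bash
# Hypothetical helper: read "used, total" rows (MiB, no units) as printed by
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# and print used/total/free memory for each row.
report_free_mib() {
  local used total
  # IFS=', ' splits "0, 15360" into "0" and "15360".
  while IFS=', ' read -r used total; do
    if [ -z "$used" ]; then
      continue  # skip blank lines
    fi
    echo "used=${used}MiB total=${total}MiB free=$((total - used))MiB"
  done
}
```

For example, run `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits -l 1 | report_free_mib` in a second terminal while `bash riva_start.sh` runs.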