"out of memory" error when run riva_start.sh

robin84 · July 25, 2025, 6:09am

Hardware - GPU T4
Hardware - CPU Intel x64
Operating System - Ubuntu 24.04
Riva Version - 2.19
How to reproduce the issue? (This is for errors. Please share the command and the detailed log here)

The error occurs when following the Riva quick start guide for speech synthesis.
These are the commands I ran:

ngc registry resource download-version nvidia/riva/riva_quickstart:2.19.0
cd riva_quickstart_v2.19.0

# Edit config.sh (its content is attached below).

bash riva_init.sh
bash riva_start.sh

riva_start.sh failed, and the container log shows "out of memory" errors:

I0723 14:50:08.649100 144 pipeline_library.cc:26] "TRITONBACKEND_ModelInstanceInitialize: conformer-es-ES-asr-streaming-asr-bls-ensemble_0_0 (device 0)"
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
cudaError_t 2 : "out of memory" returned from 'cudaMallocAsync(&data, row_bytes * rows, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 100'
cudaError_t 1 : "invalid argument" returned from 'cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 122'
cudaError_t 2 : "out of memory" returned from 'cudaMallocAsync(&data, row_bytes * rows, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 100'
cudaError_t 1 : "invalid argument" returned from 'cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)' in fileriva/utils/matrix/cu_matrix.cc line 122'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc( &d_output_buffer_, sizeof(float) * config_.max_execution_batch_size * config_.num_features * output_num_time_steps_)' in fileriva/asr/features/extractor.cc line 412'
cudaError_t 2 : "out of memory" returned from 'cudaGetLastError()' in fileexternal/cu-feat-extr/src/cudafeat/feature-online-batched-spectral-cuda.cc line 122'
cublasStatus_t 3 : "CUBLAS_STATUS_ALLOC_FAILED" returned from 'cublasCreate(&cublas_handle_)'cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, bytes)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 212'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, bytes)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 212'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
cudaError_t 2 : "out of memory" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
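
For anyone triaging a log like this, the CUDA and cuBLAS failures can be separated from the rest of the output with a small filter. The sketch below is only an illustration: filter_cuda_errors and count_oom are hypothetical helper names, not part of Riva, and the container name riva-speech is the one printed by the quick start scripts.

```shell
#!/usr/bin/env bash

# Keep only the CUDA/cuBLAS failure lines from a log read on stdin.
# "|| true" stops a no-match result from being treated as a failure.
filter_cuda_errors() {
  grep -E 'cudaError_t|cublasStatus_t' || true
}

# Count the lines on stdin that report "out of memory".
# The pattern omits the surrounding quotes, so it matches either quote style.
count_oom() {
  grep -c 'out of memory' || true
}

# Usage (assumes the container is named riva-speech):
#   docker logs riva-speech 2>&1 | filter_cuda_errors
#   docker logs riva-speech 2>&1 | filter_cuda_errors | count_oom
```

Piping the container log through filter_cuda_errors should leave only lines like the ones quoted above, which makes it easier to see the first allocation that failed.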

The full output of riva_start.sh and docker logs riva-speech:

ubuntu@ip-172-31-35-118:~/riva_quickstart_v2.19.0$ ./riva_start.sh
Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Riva server to load all models…retrying in 10 seconds
…
Health ready check failed.
Check Riva logs with: docker logs riva-speech
ubuntu@ip-172-31-35-118:~/riva_quickstart_v2.19.0$ docker logs riva-speech

==========================
=== Riva Speech Skills ===

NVIDIA Release 25.02 (build 151443007)

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
I0723 14:49:34.793050 144 pinned_memory_manager.cc:277] “Pinned memory pool is created at ‘0x7e4b48000000’ with size 268435456”
I0723 14:49:34.797271 144 cuda_memory_manager.cc:107] “CUDA memory pool is created on device 0 with size 1000000000”

Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
I0723 14:49:39.789726 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-de-DE-asr-streaming-asr-bls-ensemble (version 1)”
I0723 14:49:39.790338 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-de-DE-asr-offline-asr-bls-ensemble (version 1)”
I0723 14:49:39.790780 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-ar-AR-asr-streaming-asr-bls-ensemble (version 1)”
I0723 14:49:39.791101 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-ar-AR-asr-offline-asr-bls-ensemble (version 1)”
Found yaml file: Found yaml file: /data/models/conformer-ar-AR-asr-offline-asr-bls-ensemble/1/riva_bls_config.yaml
Found yaml file: /data/models/conformer-de-DE-asr-offline-asr-bls-ensemble/1/riva_bls_config.yaml
/data/models/conformer-de-DE-asr-streaming-asr-bls-ensemble/1/riva_bls_config.yaml
Found yaml file: /data/models/conformer-ar-AR-asr-streaming-asr-bls-ensemble/1/riva_bls_config.yaml
I0723 14:49:39.878422 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-de-DE-asr-offline-asr-bls-ensemble_0_0 (device 0)”
I0723 14:49:39.878427 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-ar-AR-asr-streaming-asr-bls-ensemble_0_0 (device 0)”
I0723 14:49:39.878435 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-de-DE-asr-streaming-asr-bls-ensemble_0_0 (device 0)”
I0723 14:49:39.878648 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-ar-AR-asr-offline-asr-bls-ensemble_0_0 (device 0)”
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0723 14:49:46.418423 152 asr_ensemble_factory.cc:278] Loading acoustic model
I0723 14:49:46.418465 152 asr_ensemble_factory.cc:284] Done loading acoustic model
I0723 14:49:46.425920 152 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”
verbalizer_grammar: “verbalizer.ascii_proto”
sentence_boundary_regexp: "[\.:!\?] "
sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”
I0723 14:49:46.460240 144 model_lifecycle.cc:849] “successfully loaded ‘conformer-ar-AR-asr-streaming-asr-bls-ensemble’”
I0723 14:49:46.464440 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-en-GB-asr-offline-asr-bls-ensemble (version 1)”
Found yaml file: /data/models/conformer-en-GB-asr-offline-asr-bls-ensemble/1/riva_bls_config.yaml
I0723 14:49:46.473134 151 asr_ensemble_factory.cc:278] Loading acoustic model
I0723 14:49:46.473157 151 asr_ensemble_factory.cc:284] Done loading acoustic model
I0723 14:49:46.474501 151 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”
verbalizer_grammar: “verbalizer.ascii_proto”
sentence_boundary_regexp: "[\.:!\?] "
sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”
I0723 14:49:46.478671 144 backend_model.cc:303] “model configuration:\n{\n "name": "conformer-en-GB-asr-offline-asr-bls-ensemble",\n "platform": "",\n "backend": "riva_asr_ensemble_pipeline",\n "runtime": "",\n "version_policy": {\n "latest": {\n "num_versions": 1\n }\n },\n "max_batch_size": 1024,\n "input": [\n {\n "name": "PIPELINE_INPUT",\n "data_type": "TYPE_STRING",\n "format": "FORMAT_NONE",\n "dims": [\n 1\n ],\n "is_shape_tensor": false,\n "allow_ragged_batch": false,\n "optional": false,\n "is_non_linear_format_io": false\n }\n ],\n "output": [\n {\n "name": "PIPELINE_OUTPUT",\n "data_type": "TYPE_STRING",\n "dims": [\n 1\n ],\n "label_filename": "",\n "is_shape_tensor": false,\n "is_non_linear_format_io": false\n }\n ],\n "batch_input": ,\n "batch_output": ,\n "optimization": {\n "graph": {\n "level": 0\n },\n "priority": "PRIORITY_DEFAULT",\n "cuda": {\n "graphs": false,\n "busy_wait_events": false,\n "graph_spec": ,\n "output_copy_stream": true\n },\n "input_pinned_memory": {\n "enable": true\n },\n "output_pinned_memory": {\n "enable": true\n },\n "gather_kernel_buffer_threshold": 0,\n "eager_batching": false\n },\n "sequence_batching": {\n "oldest": {\n "max_candidate_sequences": 1024,\n "preferred_batch_size": [\n 64,\n 128\n ],\n "max_queue_delay_microseconds": 1000,\n "preserve_ordering": false\n },\n "max_sequence_idle_microseconds": 60000000,\n "control_input": [\n {\n "name": "START",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_START",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "READY",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_READY",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "END",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_END",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": 
"TYPE_INVALID"\n }\n ]\n },\n {\n "name": "CORRID",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_CORRID",\n "int32_false_true": ,\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_UINT64"\n }\n ]\n }\n ],\n "state": ,\n "iterative_sequence": false\n },\n "instance_group": [\n {\n "name": "conformer-en-GB-asr-offline-asr-bls-ensemble_0",\n "kind": "KIND_CPU",\n "count": 1,\n "gpus": ,\n "secondary_devices": ,\n "profile": ,\n "passive": false,\n "host_policy": ""\n }\n ],\n "default_model_filename": "",\n "cc_model_filenames": {},\n "metric_tags": {},\n "parameters": {\n "yaml_parameters_file": {\n "string_value": "riva_bls_config.yaml"\n },\n "offline": {\n "string_value": "True"\n },\n "type": {\n "string_value": "offline"\n },\n "streaming": {\n "string_value": "True"\n },\n "sample_rate": {\n "string_value": "16000"\n },\n "model_family": {\n "string_value": "riva"\n },\n "language_code": {\n "string_value": "en-GB"\n }\n },\n "model_warmup": ,\n "model_transaction_policy": {\n "decoupled": true\n }\n}”
I0723 14:49:46.478920 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-en-GB-asr-offline-asr-bls-ensemble_0_0 (device 0)”
I0723 14:49:46.496405 144 model_lifecycle.cc:849] “successfully loaded ‘conformer-ar-AR-asr-offline-asr-bls-ensemble’”
I0723 14:49:46.499659 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-en-GB-asr-streaming-asr-bls-ensemble (version 1)”
Found yaml file: /data/models/conformer-en-GB-asr-streaming-asr-bls-ensemble/1/riva_bls_config.yaml
I0723 14:49:46.515369 144 backend_model.cc:303] “model configuration:\n{\n "name": "conformer-en-GB-asr-streaming-asr-bls-ensemble",\n "platform": "",\n "backend": "riva_asr_ensemble_pipeline",\n "runtime": "",\n "version_policy": {\n "latest": {\n "num_versions": 1\n }\n },\n "max_batch_size": 1024,\n "input": [\n {\n "name": "PIPELINE_INPUT",\n "data_type": "TYPE_STRING",\n "format": "FORMAT_NONE",\n "dims": [\n 1\n ],\n "is_shape_tensor": false,\n "allow_ragged_batch": false,\n "optional": false,\n "is_non_linear_format_io": false\n }\n ],\n "output": [\n {\n "name": "PIPELINE_OUTPUT",\n "data_type": "TYPE_STRING",\n "dims": [\n 1\n ],\n "label_filename": "",\n "is_shape_tensor": false,\n "is_non_linear_format_io": false\n }\n ],\n "batch_input": ,\n "batch_output": ,\n "optimization": {\n "graph": {\n "level": 0\n },\n "priority": "PRIORITY_DEFAULT",\n "cuda": {\n "graphs": false,\n "busy_wait_events": false,\n "graph_spec": ,\n "output_copy_stream": true\n },\n "input_pinned_memory": {\n "enable": true\n },\n "output_pinned_memory": {\n "enable": true\n },\n "gather_kernel_buffer_threshold": 0,\n "eager_batching": false\n },\n "sequence_batching": {\n "oldest": {\n "max_candidate_sequences": 1024,\n "preferred_batch_size": [\n 64,\n 128\n ],\n "max_queue_delay_microseconds": 1000,\n "preserve_ordering": false\n },\n "max_sequence_idle_microseconds": 60000000,\n "control_input": [\n {\n "name": "START",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_START",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "READY",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_READY",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "END",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_END",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": 
"TYPE_INVALID"\n }\n ]\n },\n {\n "name": "CORRID",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_CORRID",\n "int32_false_true": ,\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_UINT64"\n }\n ]\n }\n ],\n "state": ,\n "iterative_sequence": false\n },\n "instance_group": [\n {\n "name": "conformer-en-GB-asr-streaming-asr-bls-ensemble_0",\n "kind": "KIND_CPU",\n "count": 1,\n "gpus": ,\n "secondary_devices": ,\n "profile": ,\n "passive": false,\n "host_policy": ""\n }\n ],\n "default_model_filename": "",\n "cc_model_filenames": {},\n "metric_tags": {},\n "parameters": {\n "language_code": {\n "string_value": "en-GB"\n },\n "model_family": {\n "string_value": "riva"\n },\n "type": {\n "string_value": "online"\n },\n "offline": {\n "string_value": "False"\n },\n "yaml_parameters_file": {\n "string_value": "riva_bls_config.yaml"\n },\n "sample_rate": {\n "string_value": "16000"\n },\n "streaming": {\n "string_value": "True"\n }\n },\n "model_warmup": ,\n "model_transaction_policy": {\n "decoupled": true\n }\n}”
I0723 14:49:46.515658 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-en-GB-asr-streaming-asr-bls-ensemble_0_0 (device 0)”
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
I0723 14:49:54.895031 152 asr_ensemble_factory.cc:278] Loading acoustic model
I0723 14:49:54.895257 152 asr_ensemble_factory.cc:284] Done loading acoustic model
I0723 14:49:54.898206 151 asr_ensemble_factory.cc:278] Loading acoustic model
I0723 14:49:54.898232 151 asr_ensemble_factory.cc:284] Done loading acoustic model
I0723 14:49:54.907298 152 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”
verbalizer_grammar: “verbalizer.ascii_proto”
sentence_boundary_regexp: "[\.:!\?] "
sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”
I0723 14:49:54.929798 151 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”
verbalizer_grammar: “verbalizer.ascii_proto”
sentence_boundary_regexp: "[\.:!\?] "
sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”
I0723 14:49:55.033062 144 model_lifecycle.cc:849] “successfully loaded ‘conformer-en-GB-asr-offline-asr-bls-ensemble’”
I0723 14:49:55.036401 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-en-US-asr-offline-asr-bls-ensemble (version 1)”
Found yaml file: /data/models/conformer-en-US-asr-offline-asr-bls-ensemble/1/riva_bls_config.yaml
I0723 14:49:55.046675 144 model_lifecycle.cc:849] “successfully loaded ‘conformer-en-GB-asr-streaming-asr-bls-ensemble’”
I0723 14:49:55.050035 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-en-US-asr-streaming-asr-bls-ensemble (version 1)”
Found yaml file: /data/models/conformer-en-US-asr-streaming-asr-bls-ensemble/1/riva_bls_config.yaml
I0723 14:49:55.051543 144 backend_model.cc:303] “model configuration:\n{\n "name": "conformer-en-US-asr-offline-asr-bls-ensemble",\n "platform": "",\n "backend": "riva_asr_ensemble_pipeline",\n "runtime": "",\n "version_policy": {\n "latest": {\n "num_versions": 1\n }\n },\n "max_batch_size": 1024,\n "input": [\n {\n "name": "PIPELINE_INPUT",\n "data_type": "TYPE_STRING",\n "format": "FORMAT_NONE",\n "dims": [\n 1\n ],\n "is_shape_tensor": false,\n "allow_ragged_batch": false,\n "optional": false,\n "is_non_linear_format_io": false\n }\n ],\n "output": [\n {\n "name": "PIPELINE_OUTPUT",\n "data_type": "TYPE_STRING",\n "dims": [\n 1\n ],\n "label_filename": "",\n "is_shape_tensor": false,\n "is_non_linear_format_io": false\n }\n ],\n "batch_input": ,\n "batch_output": ,\n "optimization": {\n "graph": {\n "level": 0\n },\n "priority": "PRIORITY_DEFAULT",\n "cuda": {\n "graphs": false,\n "busy_wait_events": false,\n "graph_spec": ,\n "output_copy_stream": true\n },\n "input_pinned_memory": {\n "enable": true\n },\n "output_pinned_memory": {\n "enable": true\n },\n "gather_kernel_buffer_threshold": 0,\n "eager_batching": false\n },\n "sequence_batching": {\n "oldest": {\n "max_candidate_sequences": 1024,\n "preferred_batch_size": [\n 64,\n 128\n ],\n "max_queue_delay_microseconds": 1000,\n "preserve_ordering": false\n },\n "max_sequence_idle_microseconds": 60000000,\n "control_input": [\n {\n "name": "START",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_START",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "READY",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_READY",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "END",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_END",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": 
"TYPE_INVALID"\n }\n ]\n },\n {\n "name": "CORRID",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_CORRID",\n "int32_false_true": ,\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_UINT64"\n }\n ]\n }\n ],\n "state": ,\n "iterative_sequence": false\n },\n "instance_group": [\n {\n "name": "conformer-en-US-asr-offline-asr-bls-ensemble_0",\n "kind": "KIND_CPU",\n "count": 1,\n "gpus": ,\n "secondary_devices": ,\n "profile": ,\n "passive": false,\n "host_policy": ""\n }\n ],\n "default_model_filename": "",\n "cc_model_filenames": {},\n "metric_tags": {},\n "parameters": {\n "type": {\n "string_value": "offline"\n },\n "language_code": {\n "string_value": "en-US"\n },\n "offline": {\n "string_value": "True"\n },\n "sample_rate": {\n "string_value": "16000"\n },\n "streaming": {\n "string_value": "True"\n },\n "model_family": {\n "string_value": "riva"\n },\n "yaml_parameters_file": {\n "string_value": "riva_bls_config.yaml"\n }\n },\n "model_warmup": ,\n "model_transaction_policy": {\n "decoupled": true\n }\n}”
I0723 14:49:55.051666 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-en-US-asr-offline-asr-bls-ensemble_0_0 (device 0)”
I0723 14:49:55.064985 144 backend_model.cc:303] “model configuration:\n{\n "name": "conformer-en-US-asr-streaming-asr-bls-ensemble",\n "platform": "",\n "backend": "riva_asr_ensemble_pipeline",\n "runtime": "",\n "version_policy": {\n "latest": {\n "num_versions": 1\n }\n },\n "max_batch_size": 1024,\n "input": [\n {\n "name": "PIPELINE_INPUT",\n "data_type": "TYPE_STRING",\n "format": "FORMAT_NONE",\n "dims": [\n 1\n ],\n "is_shape_tensor": false,\n "allow_ragged_batch": false,\n "optional": false,\n "is_non_linear_format_io": false\n }\n ],\n "output": [\n {\n "name": "PIPELINE_OUTPUT",\n "data_type": "TYPE_STRING",\n "dims": [\n 1\n ],\n "label_filename": "",\n "is_shape_tensor": false,\n "is_non_linear_format_io": false\n }\n ],\n "batch_input": ,\n "batch_output": ,\n "optimization": {\n "graph": {\n "level": 0\n },\n "priority": "PRIORITY_DEFAULT",\n "cuda": {\n "graphs": false,\n "busy_wait_events": false,\n "graph_spec": ,\n "output_copy_stream": true\n },\n "input_pinned_memory": {\n "enable": true\n },\n "output_pinned_memory": {\n "enable": true\n },\n "gather_kernel_buffer_threshold": 0,\n "eager_batching": false\n },\n "sequence_batching": {\n "oldest": {\n "max_candidate_sequences": 1024,\n "preferred_batch_size": [\n 64,\n 128\n ],\n "max_queue_delay_microseconds": 1000,\n "preserve_ordering": false\n },\n "max_sequence_idle_microseconds": 60000000,\n "control_input": [\n {\n "name": "START",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_START",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "READY",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_READY",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "END",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_END",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": 
"TYPE_INVALID"\n }\n ]\n },\n {\n "name": "CORRID",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_CORRID",\n "int32_false_true": ,\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_UINT64"\n }\n ]\n }\n ],\n "state": ,\n "iterative_sequence": false\n },\n "instance_group": [\n {\n "name": "conformer-en-US-asr-streaming-asr-bls-ensemble_0",\n "kind": "KIND_CPU",\n "count": 1,\n "gpus": ,\n "secondary_devices": ,\n "profile": ,\n "passive": false,\n "host_policy": ""\n }\n ],\n "default_model_filename": "",\n "cc_model_filenames": {},\n "metric_tags": {},\n "parameters": {\n "type": {\n "string_value": "online"\n },\n "yaml_parameters_file": {\n "string_value": "riva_bls_config.yaml"\n },\n "sample_rate": {\n "string_value": "16000"\n },\n "model_family": {\n "string_value": "riva"\n },\n "streaming": {\n "string_value": "True"\n },\n "offline": {\n "string_value": "False"\n },\n "language_code": {\n "string_value": "en-US"\n }\n },\n "model_warmup": ,\n "model_transaction_policy": {\n "decoupled": true\n }\n}”
I0723 14:49:55.065193 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-en-US-asr-streaming-asr-bls-ensemble_0_0 (device 0)”
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
I0723 14:50:08.394371 151 asr_ensemble_factory.cc:278] Loading acoustic model
I0723 14:50:08.394407 151 asr_ensemble_factory.cc:284] Done loading acoustic model
I0723 14:50:08.449123 151 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”
verbalizer_grammar: “verbalizer.ascii_proto”
sentence_boundary_regexp: "[\.:!\?] "
sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”
I0723 14:50:08.471542 152 asr_ensemble_factory.cc:278] Loading acoustic model
I0723 14:50:08.471572 152 asr_ensemble_factory.cc:284] Done loading acoustic model
Riva waiting for Triton server to load all models…retrying in 1 second
I0723 14:50:08.528484 152 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”
verbalizer_grammar: “verbalizer.ascii_proto”
sentence_boundary_regexp: "[\.:!\?] "
sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”
I0723 14:50:08.563075 144 model_lifecycle.cc:849] “successfully loaded ‘conformer-en-US-asr-streaming-asr-bls-ensemble’”
I0723 14:50:08.566582 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-es-ES-asr-offline-asr-bls-ensemble (version 1)”
Found yaml file: /data/models/conformer-es-ES-asr-offline-asr-bls-ensemble/1/riva_bls_config.yaml
I0723 14:50:08.581015 144 backend_model.cc:303] “model configuration:\n{\n "name": "conformer-es-ES-asr-offline-asr-bls-ensemble",\n "platform": "",\n "backend": "riva_asr_ensemble_pipeline",\n "runtime": "",\n "version_policy": {\n "latest": {\n "num_versions": 1\n }\n },\n "max_batch_size": 1024,\n "input": [\n {\n "name": "PIPELINE_INPUT",\n "data_type": "TYPE_STRING",\n "format": "FORMAT_NONE",\n "dims": [\n 1\n ],\n "is_shape_tensor": false,\n "allow_ragged_batch": false,\n "optional": false,\n "is_non_linear_format_io": false\n }\n ],\n "output": [\n {\n "name": "PIPELINE_OUTPUT",\n "data_type": "TYPE_STRING",\n "dims": [\n 1\n ],\n "label_filename": "",\n "is_shape_tensor": false,\n "is_non_linear_format_io": false\n }\n ],\n "batch_input": ,\n "batch_output": ,\n "optimization": {\n "graph": {\n "level": 0\n },\n "priority": "PRIORITY_DEFAULT",\n "cuda": {\n "graphs": false,\n "busy_wait_events": false,\n "graph_spec": ,\n "output_copy_stream": true\n },\n "input_pinned_memory": {\n "enable": true\n },\n "output_pinned_memory": {\n "enable": true\n },\n "gather_kernel_buffer_threshold": 0,\n "eager_batching": false\n },\n "sequence_batching": {\n "oldest": {\n "max_candidate_sequences": 1024,\n "preferred_batch_size": [\n 64,\n 128\n ],\n "max_queue_delay_microseconds": 1000,\n "preserve_ordering": false\n },\n "max_sequence_idle_microseconds": 60000000,\n "control_input": [\n {\n "name": "START",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_START",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "READY",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_READY",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "END",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_END",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": 
"TYPE_INVALID"\n }\n ]\n },\n {\n "name": "CORRID",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_CORRID",\n "int32_false_true": ,\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_UINT64"\n }\n ]\n }\n ],\n "state": ,\n "iterative_sequence": false\n },\n "instance_group": [\n {\n "name": "conformer-es-ES-asr-offline-asr-bls-ensemble_0",\n "kind": "KIND_CPU",\n "count": 1,\n "gpus": ,\n "secondary_devices": ,\n "profile": ,\n "passive": false,\n "host_policy": ""\n }\n ],\n "default_model_filename": "",\n "cc_model_filenames": {},\n "metric_tags": {},\n "parameters": {\n "yaml_parameters_file": {\n "string_value": "riva_bls_config.yaml"\n },\n "language_code": {\n "string_value": "es-ES"\n },\n "model_family": {\n "string_value": "riva"\n },\n "streaming": {\n "string_value": "True"\n },\n "offline": {\n "string_value": "True"\n },\n "type": {\n "string_value": "offline"\n },\n "sample_rate": {\n "string_value": "16000"\n }\n },\n "model_warmup": ,\n "model_transaction_policy": {\n "decoupled": true\n }\n}”
I0723 14:50:08.581155 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-es-ES-asr-offline-asr-bls-ensemble_0_0 (device 0)”
I0723 14:50:08.628725 144 model_lifecycle.cc:849] “successfully loaded ‘conformer-en-US-asr-offline-asr-bls-ensemble’”
I0723 14:50:08.631859 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-es-ES-asr-streaming-asr-bls-ensemble (version 1)”
Found yaml file: /data/models/conformer-es-ES-asr-streaming-asr-bls-ensemble/1/riva_bls_config.yaml
I0723 14:50:08.648860 144 backend_model.cc:303] “model configuration:\n{\n "name": "conformer-es-ES-asr-streaming-asr-bls-ensemble",\n "platform": "",\n "backend": "riva_asr_ensemble_pipeline",\n "runtime": "",\n "version_policy": {\n "latest": {\n "num_versions": 1\n }\n },\n "max_batch_size": 1024,\n "input": [\n {\n "name": "PIPELINE_INPUT",\n "data_type": "TYPE_STRING",\n "format": "FORMAT_NONE",\n "dims": [\n 1\n ],\n "is_shape_tensor": false,\n "allow_ragged_batch": false,\n "optional": false,\n "is_non_linear_format_io": false\n }\n ],\n "output": [\n {\n "name": "PIPELINE_OUTPUT",\n "data_type": "TYPE_STRING",\n "dims": [\n 1\n ],\n "label_filename": "",\n "is_shape_tensor": false,\n "is_non_linear_format_io": false\n }\n ],\n "batch_input": ,\n "batch_output": ,\n "optimization": {\n "graph": {\n "level": 0\n },\n "priority": "PRIORITY_DEFAULT",\n "cuda": {\n "graphs": false,\n "busy_wait_events": false,\n "graph_spec": ,\n "output_copy_stream": true\n },\n "input_pinned_memory": {\n "enable": true\n },\n "output_pinned_memory": {\n "enable": true\n },\n "gather_kernel_buffer_threshold": 0,\n "eager_batching": false\n },\n "sequence_batching": {\n "oldest": {\n "max_candidate_sequences": 1024,\n "preferred_batch_size": [\n 64,\n 128\n ],\n "max_queue_delay_microseconds": 1000,\n "preserve_ordering": false\n },\n "max_sequence_idle_microseconds": 60000000,\n "control_input": [\n {\n "name": "START",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_START",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "READY",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_READY",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "END",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_END",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": 
"TYPE_INVALID"\n }\n ]\n },\n {\n "name": "CORRID",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_CORRID",\n "int32_false_true": ,\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_UINT64"\n }\n ]\n }\n ],\n "state": ,\n "iterative_sequence": false\n },\n "instance_group": [\n {\n "name": "conformer-es-ES-asr-streaming-asr-bls-ensemble_0",\n "kind": "KIND_CPU",\n "count": 1,\n "gpus": ,\n "secondary_devices": ,\n "profile": ,\n "passive": false,\n "host_policy": ""\n }\n ],\n "default_model_filename": "",\n "cc_model_filenames": {},\n "metric_tags": {},\n "parameters": {\n "language_code": {\n "string_value": "es-ES"\n },\n "offline": {\n "string_value": "False"\n },\n "model_family": {\n "string_value": "riva"\n },\n "streaming": {\n "string_value": "True"\n },\n "type": {\n "string_value": "online"\n },\n "yaml_parameters_file": {\n "string_value": "riva_bls_config.yaml"\n },\n "sample_rate": {\n "string_value": "16000"\n }\n },\n "model_warmup": ,\n "model_transaction_policy": {\n "decoupled": true\n }\n}”
I0723 14:50:08.649100 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-es-ES-asr-streaming-asr-bls-ensemble_0_0 (device 0)”
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
cudaError_t 2 : “out of memory” returned from ‘cudaMallocAsync(&data, row_bytes * rows, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_matrix.cc line 100’
cudaError_t 1 : “invalid argument” returned from ‘cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_matrix.cc line 122’
cudaError_t 2 : “out of memory” returned from ‘cudaMallocAsync(&data, row_bytes * rows, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_matrix.cc line 100’
cudaError_t 1 : “invalid argument” returned from ‘cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_matrix.cc line 122’
cudaError_t 2 : “out of memory” returned from ‘cudaMalloc( &d_output_buffer_, sizeof(float) * config_.max_execution_batch_size * config_.num_features * output_num_time_steps_)’ in fileriva/asr/features/extractor.cc line 412’
cudaError_t 2 : “out of memory” returned from ‘cudaGetLastError()’ in fileexternal/cu-feat-extr/src/cudafeat/feature-online-batched-spectral-cuda.cc line 122’
cublasStatus_t 3 : “CUBLAS_STATUS_ALLOC_FAILED” returned from ‘cublasCreate(&cublas_handle_)’
cudaError_t 2 : “out of memory” returned from ‘cudaMalloc(&data, bytes)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 212’
cudaError_t 2 : “out of memory” returned from ‘cudaMalloc(&data, bytes)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 212’
cudaError_t 2 : “out of memory” returned from ‘cudaMalloc(&data, row_bytes * rows)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204’
cudaError_t 2 : “out of memory” returned from ‘cudaMalloc(&data, row_bytes * rows)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204’
I0723 14:50:10.870836 151 asr_ensemble_factory.cc:278] Loading acoustic model
I0723 14:50:10.870865 151 asr_ensemble_factory.cc:284] Done loading acoustic model
I0723 14:50:10.894193 151 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”
verbalizer_grammar: “verbalizer.ascii_proto”
sentence_boundary_regexp: "[\.:!\?] "
sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”
cudaError_t 2 : “out of memory” returned from ‘cudaMalloc(&data, row_bytes * rows)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204’
cudaError_t 2 : “out of memory” returned from ‘cudaMalloc(&data, bytes)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 212’
cudaError_t 2 : “out of memory” returned from ‘cudaMalloc( (void**)&cu_input_audio_signals_, config_.max_execution_batch_size * config_.num_samples * sizeof(float))’ in fileriva/asr/features/extractor.cc line 392’
cudaError_t 2 : “out of memory” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_matrix.cc line 105’
cudaError_t 1 : “invalid argument” returned from ‘cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_matrix.cc line 122’
cudaError_t 2 : “out of memory” returned from ‘cudaMalloc(&data, row_bytes * rows)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204’
cudaError_t 2 : “out of memory” returned from ‘cudaMallocAsync(&data, row_bytes * rows, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_matrix.cc line 100’
cudaError_t 1 : “invalid argument” returned from ‘cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_matrix.cc line 122’
cudaError_t 2 : “out of memory” returned from ‘cudaMalloc(&data, row_bytes * rows)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204’
cudaError_t 2 : “out of memory” returned from ‘cudaMalloc( &d_output_buffer_, sizeof(float) * config_.max_execution_batch_size * config_.num_features * output_num_time_steps_)’ in fileriva/asr/features/extractor.cc line 412’
cudaError_t 2 : “out of memory” returned from ‘cudaGetLastError()’ in fileexternal/cu-feat-extr/src/cudafeat/feature-online-batched-spectral-cuda.cc line 122’
I0723 14:50:11.071019 144 model_lifecycle.cc:849] “successfully loaded ‘conformer-es-ES-asr-offline-asr-bls-ensemble’”
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaFree(this->data_)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 492’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_matrix.cc line 105’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_matrix.cc line 105’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 179’
I0723 14:50:11.087798 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-es-US-asr-offline-asr-bls-ensemble (version 1)”
Found yaml file: /data/models/conformer-es-US-asr-offline-asr-bls-ensemble/1/riva_bls_config.yaml
I0723 14:50:11.112111 152 asr_ensemble_factory.cc:278] Loading acoustic model
I0723 14:50:11.112136 152 asr_ensemble_factory.cc:284] Done loading acoustic model
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaHostRegister( pinned_host_logits_buffer_.data(), pinned_host_logits_buffer_.size() * sizeof(float), 0)’ in fileriva/asr/pipeline/asr_ensemble/streaming_asr_ensemble.cc line 125’
I0723 14:50:11.113827 152 normalizer.cc:66] Proto String: tokenizer_grammar: “tokenizer.ascii_proto”
verbalizer_grammar: “verbalizer.ascii_proto”
sentence_boundary_regexp: "[\.:!\?] "
sentence_boundary_exceptions_file: “sentence_boundary_exceptions.txt”
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 179’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpyAsync( this->data_.get(), &vec[0], vec.size() * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 90’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 179’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpyAsync( this->data_.get(), &vec[0], vec.size() * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 90’
I0723 14:50:11.114156 144 backend_model.cc:303] “model configuration:\n{\n "name": "conformer-es-US-asr-offline-asr-bls-ensemble",\n "platform": "",\n "backend": "riva_asr_ensemble_pipeline",\n "runtime": "",\n "version_policy": {\n "latest": {\n "num_versions": 1\n }\n },\n "max_batch_size": 1024,\n "input": [\n {\n "name": "PIPELINE_INPUT",\n "data_type": "TYPE_STRING",\n "format": "FORMAT_NONE",\n "dims": [\n 1\n ],\n "is_shape_tensor": false,\n "allow_ragged_batch": false,\n "optional": false,\n "is_non_linear_format_io": false\n }\n ],\n "output": [\n {\n "name": "PIPELINE_OUTPUT",\n "data_type": "TYPE_STRING",\n "dims": [\n 1\n ],\n "label_filename": "",\n "is_shape_tensor": false,\n "is_non_linear_format_io": false\n }\n ],\n "batch_input": ,\n "batch_output": ,\n "optimization": {\n "graph": {\n "level": 0\n },\n "priority": "PRIORITY_DEFAULT",\n "cuda": {\n "graphs": false,\n "busy_wait_events": false,\n "graph_spec": ,\n "output_copy_stream": true\n },\n "input_pinned_memory": {\n "enable": true\n },\n "output_pinned_memory": {\n "enable": true\n },\n "gather_kernel_buffer_threshold": 0,\n "eager_batching": false\n },\n "sequence_batching": {\n "oldest": {\n "max_candidate_sequences": 1024,\n "preferred_batch_size": [\n 64,\n 128\n ],\n "max_queue_delay_microseconds": 1000,\n "preserve_ordering": false\n },\n "max_sequence_idle_microseconds": 60000000,\n "control_input": [\n {\n "name": "START",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_START",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "READY",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_READY",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "END",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_END",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": 
"TYPE_INVALID"\n }\n ]\n },\n {\n "name": "CORRID",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_CORRID",\n "int32_false_true": ,\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_UINT64"\n }\n ]\n }\n ],\n "state": ,\n "iterative_sequence": false\n },\n "instance_group": [\n {\n "name": "conformer-es-US-asr-offline-asr-bls-ensemble_0",\n "kind": "KIND_CPU",\n "count": 1,\n "gpus": ,\n "secondary_devices": ,\n "profile": ,\n "passive": false,\n "host_policy": ""\n }\n ],\n "default_model_filename": "",\n "cc_model_filenames": {},\n "metric_tags": {},\n "parameters": {\n "yaml_parameters_file": {\n "string_value": "riva_bls_config.yaml"\n },\n "language_code": {\n "string_value": "es-US"\n },\n "type": {\n "string_value": "offline"\n },\n "offline": {\n "string_value": "True"\n },\n "streaming": {\n "string_value": "True"\n },\n "model_family": {\n "string_value": "riva"\n },\n "sample_rate": {\n "string_value": "16000"\n }\n },\n "model_warmup": ,\n "model_transaction_policy": {\n "decoupled": true\n }\n}”
I0723 14:50:11.114289 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-es-US-asr-offline-asr-bls-ensemble_0_0 (device 0)”
I0723 14:50:11.340319 144 model_lifecycle.cc:849] “successfully loaded ‘conformer-es-ES-asr-streaming-asr-bls-ensemble’”
I0723 14:50:11.345478 144 pipeline_library.cc:22] “TRITONBACKEND_ModelInitialize: conformer-es-US-asr-streaming-asr-bls-ensemble (version 1)”
Found yaml file: /data/models/conformer-es-US-asr-streaming-asr-bls-ensemble/1/riva_bls_config.yaml
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 179’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpyAsync( this->data_.get(), &vec[0], vec.size() * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 90’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 179’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpyAsync( this->data_.get(), &vec[0], vec.size() * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 90’
I0723 14:50:11.359800 144 backend_model.cc:303] “model configuration:\n{\n "name": "conformer-es-US-asr-streaming-asr-bls-ensemble",\n "platform": "",\n "backend": "riva_asr_ensemble_pipeline",\n "runtime": "",\n "version_policy": {\n "latest": {\n "num_versions": 1\n }\n },\n "max_batch_size": 1024,\n "input": [\n {\n "name": "PIPELINE_INPUT",\n "data_type": "TYPE_STRING",\n "format": "FORMAT_NONE",\n "dims": [\n 1\n ],\n "is_shape_tensor": false,\n "allow_ragged_batch": false,\n "optional": false,\n "is_non_linear_format_io": false\n }\n ],\n "output": [\n {\n "name": "PIPELINE_OUTPUT",\n "data_type": "TYPE_STRING",\n "dims": [\n 1\n ],\n "label_filename": "",\n "is_shape_tensor": false,\n "is_non_linear_format_io": false\n }\n ],\n "batch_input": ,\n "batch_output": ,\n "optimization": {\n "graph": {\n "level": 0\n },\n "priority": "PRIORITY_DEFAULT",\n "cuda": {\n "graphs": false,\n "busy_wait_events": false,\n "graph_spec": ,\n "output_copy_stream": true\n },\n "input_pinned_memory": {\n "enable": true\n },\n "output_pinned_memory": {\n "enable": true\n },\n "gather_kernel_buffer_threshold": 0,\n "eager_batching": false\n },\n "sequence_batching": {\n "oldest": {\n "max_candidate_sequences": 1024,\n "preferred_batch_size": [\n 64,\n 128\n ],\n "max_queue_delay_microseconds": 1000,\n "preserve_ordering": false\n },\n "max_sequence_idle_microseconds": 60000000,\n "control_input": [\n {\n "name": "START",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_START",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "READY",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_READY",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_INVALID"\n }\n ]\n },\n {\n "name": "END",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_END",\n "int32_false_true": [\n 0,\n 1\n ],\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": 
"TYPE_INVALID"\n }\n ]\n },\n {\n "name": "CORRID",\n "control": [\n {\n "kind": "CONTROL_SEQUENCE_CORRID",\n "int32_false_true": ,\n "fp32_false_true": ,\n "bool_false_true": ,\n "data_type": "TYPE_UINT64"\n }\n ]\n }\n ],\n "state": ,\n "iterative_sequence": false\n },\n "instance_group": [\n {\n "name": "conformer-es-US-asr-streaming-asr-bls-ensemble_0",\n "kind": "KIND_CPU",\n "count": 1,\n "gpus": ,\n "secondary_devices": ,\n "profile": ,\n "passive": false,\n "host_policy": ""\n }\n ],\n "default_model_filename": "",\n "cc_model_filenames": {},\n "metric_tags": {},\n "parameters": {\n "yaml_parameters_file": {\n "string_value": "riva_bls_config.yaml"\n },\n "model_family": {\n "string_value": "riva"\n },\n "language_code": {\n "string_value": "es-US"\n },\n "type": {\n "string_value": "online"\n },\n "streaming": {\n "string_value": "True"\n },\n "offline": {\n "string_value": "False"\n },\n "sample_rate": {\n "string_value": "16000"\n }\n },\n "model_warmup": ,\n "model_transaction_policy": {\n "decoupled": true\n }\n}”
I0723 14:50:11.359922 144 pipeline_library.cc:26] “TRITONBACKEND_ModelInstanceInitialize: conformer-es-US-asr-streaming-asr-bls-ensemble_0_0 (device 0)”
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 179’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpyAsync( this->data_.get(), vec.Data(), vec.Dim() * sizeof(Real), cudaMemcpyDeviceToDevice, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 132’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 179’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpyAsync( this->data_.get(), vec.Data(), vec.Dim() * sizeof(Real), cudaMemcpyDeviceToDevice, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 132’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 179’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpyAsync( this->data_.get(), vec.Data(), vec.Dim() * sizeof(Real), cudaMemcpyDeviceToDevice, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 132’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 179’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpyAsync( this->data_.get(), vec.Data(), vec.Dim() * sizeof(Real), cudaMemcpyDeviceToDevice, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 132’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaHostRegister(audio_processed_.data(), audio_processed_.size() * sizeof(float), 0)’ in fileriva/asr/features/extractor.cc line 238’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaHostRegister( time_steps_processed_.data(), time_steps_processed_.size() * sizeof(int), 0)’ in fileriva/asr/features/extractor.cc line 241’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaHostRegister( feature_offset_start_.data(), feature_offset_start_.size() * sizeof(int), 0)’ in fileriva/asr/features/extractor.cc line 244’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaHostRegister(feature_offset_end_.data(), feature_offset_end_.size() * sizeof(int), 0)’ in fileriva/asr/features/extractor.cc line 247’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&d_feature_offset_start_, sizeof(int) * feature_offset_start_.size())’ in fileriva/asr/features/extractor.cc line 249’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&d_feature_offset_end_, sizeof(int) * feature_offset_end_.size())’ in fileriva/asr/features/extractor.cc line 250’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc( &d_const_left_pad_feature_offset_start_, sizeof(int) * const_left_pad_feature_offset_start.size())’ in fileriva/asr/features/extractor.cc line 256’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc( &d_const_left_pad_feature_offset_end_, sizeof(int) * const_left_pad_feature_offset_end.size())’ in fileriva/asr/features/extractor.cc line 259’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpy( d_const_left_pad_feature_offset_start_, const_left_pad_feature_offset_start.data(), sizeof(int) * const_left_pad_feature_offset_start.size(), cudaMemcpyHostToDevice)’ in fileriva/asr/features/extractor.cc line 262’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpy( d_const_left_pad_feature_offset_end_, const_left_pad_feature_offset_end.data(), sizeof(int) * const_left_pad_feature_offset_end.size(), cudaMemcpyHostToDevice)’ in fileriva/asr/features/extractor.cc line 265’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&d_offline_batch_slot_ids_, config_.max_execution_batch_size * sizeof(int))’ in fileriva/asr/features/extractor.cc line 272’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpy( d_offline_batch_slot_ids_, offline_batch_slot_ids_.data(), config_.max_execution_batch_size * sizeof(int), cudaMemcpyHostToDevice)’ in fileriva/asr/features/extractor.cc line 274’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&d_const_batch_slots_zero_, sizeof(int) * 1)’ in fileriva/asr/features/extractor.cc line 280’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemset(d_const_batch_slots_zero_, 0, sizeof(int) * 1)’ in fileriva/asr/features/extractor.cc line 281’
Riva waiting for Triton server to load all models…retrying in 1 second
cublasStatus_t 1 : “CUBLAS_STATUS_NOT_INITIALIZED” returned from 'cublasCreate(&cublas_handle_)'
cublasStatus_t 203 : “CUBLAS_STATUS_UNKNOWN_ERROR” returned from 'curandCreateGenerator(&curand_handle_, CURAND_RNG_PSEUDO_DEFAULT)'
curandStatus_t 101 : “CURAND_STATUS_NOT_INITIALIZED” returned from 'curandSetGeneratorOrdering(curand_handle_, CURAND_ORDERING_PSEUDO_DEFAULT)'
curandStatus_t 101 : “CURAND_STATUS_NOT_INITIALIZED” returned from ‘curandSetStream(curand_handle_, cudaStreamPerThread)’
curandStatus_t 101 : “CURAND_STATUS_NOT_INITIALIZED” returned from ‘curandSetPseudoRandomGeneratorSeed( curand_handle_, config_.is_dither_seed_random ? riva::utils::matrix::RandInt(kMIN_DITHER_SEED, RAND_MAX) : kMIN_DITHER_SEED)’
curandStatus_t 101 : “CURAND_STATUS_NOT_INITIALIZED” returned from ‘curandSetGeneratorOffset(curand_handle_, 0)’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 179’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 179’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 179’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&data, row_bytes * rows)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemset2DAsync( data_, stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 229’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpyAsync( RowData(row), mat[row].data(), num_cols * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 245’

cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpyAsync( RowData(row), mat[row].data(), num_cols * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 245’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&data, row_bytes * rows)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpy2DAsync( data_, dst_pitch, M.data_, src_pitch, width, M.num_rows_, cudaMemcpyDeviceToDevice, cudaStreamPerThread)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 271’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaGetLastError()’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 274’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&data, bytes)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-vector.cc line 167’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpyAsync( this->data_, &vec[0], vec.size() * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-vector.cc line 65’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&data, bytes)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-vector.cc line 167’

cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&data, bytes)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-vector.cc line 167’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpyAsync( this->data_, &vec[0], vec.size() * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-vector.cc line 65’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&data, bytes)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-vector.cc line 167’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpyAsync( this->data_, &vec[0], vec.size() * sizeof(Real), cudaMemcpyHostToDevice, cudaStreamPerThread)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-vector.cc line 65’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpyAsync(vecs_, &vecs[0], size * sizeof(float ), cudaMemcpyHostToDevice, cudaStreamPerThread)’ in fileexternal/cu-feat-extr/src/cudafeat/feature-online-batched-spectral-cuda.cc line 70’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpyAsync(offsets_, &offsets[0], size * sizeof(int32_t), cudaMemcpyHostToDevice, cudaStreamPerThread)’ in fileexternal/cu-feat-extr/src/cudafeat/feature-online-batched-spectral-cuda.cc line 72’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemcpyAsync(sizes_, &sizes[0], size * sizeof(int32_t), cudaMemcpyHostToDevice, cudaStreamPerThread)’ in fileexternal/cu-feat-extr/src/cudafeat/feature-online-batched-spectral-cuda.cc line 74’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaStreamSynchronize(cudaStreamPerThread)’ in fileexternal/cu-feat-extr/src/cudafeat/feature-online-batched-spectral-cuda.cc line 76’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&data, bytes)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 212’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&data, bytes)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 212’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&data, row_bytes * rows)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&data, row_bytes * rows)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemset2DAsync( data_, stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 229’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&lanes_, sizeof(LaneDesc) * max_lanes_)’ in fileexternal/cu-feat-extr/src/cudafeat/online-batched-feature-pipeline-cuda.cc line 151’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&data, row_bytes * rows)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_matrix.cc line 105’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_matrix.cc line 122’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaHostRegister( cpu_pcm_audio_buffers_.data(), cpu_pcm_audio_buffers_.size() * sizeof(int16_t), 0)’ in fileriva/asr/features/extractor.cc line 379’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&data, bytes)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 212’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc( (void**)&cu_input_audio_signals_, config_.max_execution_batch_size * config_.num_samples * sizeof(float))’ in fileriva/asr/features/extractor.cc line 392’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_matrix.cc line 105’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_matrix.cc line 122’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&data, row_bytes * rows)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, row_bytes * rows, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_matrix.cc line 100’

cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemset2DAsync( data_.get(), stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_matrix.cc line 122’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&data, row_bytes * rows)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc( &d_output_buffer_, sizeof(float) * config_.max_execution_batch_size * config_.num_features * output_num_time_steps_)’ in fileriva/asr/features/extractor.cc line 412’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&data, row_bytes * rows)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMemset2DAsync( data_, stride_ * sizeof(Real), 0, num_cols_ * sizeof(Real), num_rows_, cudaStreamPerThread)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 229’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMallocAsync(&data, bytes, cudaStreamPerThread)’ in fileriva/utils/matrix/cu_vector.cc line 179’
cudaError_t 700 : “an illegal memory access was encountered” returned from ‘cudaMalloc(&data, row_bytes * rows)’ in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204’
[97b7eb4cefde:144 :0:151] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x10)

=================================

Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
/opt/riva/bin/start-riva: line 8: 144 Segmentation fault (core dumped) ${CUSTOM_TRITON_ENV} tritonserver --log-verbose=${TRITON_LOG_VERBOSE} --log-info=${TRITON_LOG_INFO} --disable-auto-complete-config $model_repos --cuda-memory-pool-byte-size=0:1000000000
Triton server died before reaching ready state. Terminating Riva startup.
Check Triton logs with: docker logs
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec … or kill -l [sigspec]

config.sh

#!/bin/bash

# GPU family of target platform. Supported values: tegra, non-tegra
riva_target_gpu_family="non-tegra"

# Name of tegra platform that is being used. Supported tegra platforms: orin, xavier
riva_tegra_platform="orin"

####### Enable or Disable Riva Services #######
# For any language other than en-US: service_enabled_nlp must be set to false
service_enabled_asr=false
service_enabled_nlp=false
service_enabled_tts=true
service_enabled_nmt=false

…

# Ports to expose for Riva services
riva_speech_api_port="50051"
riva_speech_api_http_port="50000"

# NGC orgs
riva_ngc_org="nvidia"
riva_ngc_team="riva"
riva_ngc_image_version="2.19.0"
riva_ngc_model_version="2.19.0"

…
…
…

cuda and gpu driver info

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

amargolin · July 28, 2025, 9:18pm

Thank you for submitting the issue. Please also look into submitting a support ticket.

rmittal · July 29, 2025, 12:35pm

From the logs, it looks like you are running out of GPU memory on the T4. The attached config.sh enables only TTS, but the logs show that all the models, including ASR, are being deployed. If you do want to deploy ASR, set service_enabled_asr=true in config.sh. Otherwise, clean up the previously deployed models with riva_clean.sh and then run riva_init.sh again.
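
As a rough sketch of that check, assuming the `service_enabled_*` flag layout shown in the config.sh above: the helper `enabled_services` below is hypothetical (it is not part of the Riva quickstart) and only lists which services a config file turns on. The commented commands are the cleanup sequence described in this reply.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Riva quickstart): print the names of
# the services that a config.sh enables, one per line.
enabled_services() {
  local config_file="$1"
  # Match lines such as: service_enabled_tts=true
  sed -n 's/^service_enabled_\([a-z]*\)=true.*$/\1/p' "$config_file"
}

# Usage, from inside riva_quickstart_v2.19.0:
#   enabled_services config.sh   # should list only the services you intend to run
#   bash riva_clean.sh           # clean up previously deployed models
#   bash riva_init.sh            # deploy again using the current config.sh
#   bash riva_start.sh
```

If the list printed for config.sh does not match the models that appear in the Triton log, the deployed model repository is probably left over from an earlier configuration.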

Also, please monitor GPU memory usage in real time with the nvidia-smi command while riva_start.sh is running. This will help confirm whether GPU memory is being exhausted.
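
One possible way to do that monitoring, as a sketch: `nvidia-smi` can print selected fields as CSV on a loop. The function names `mem_pct` and `watch_gpu_mem` are hypothetical conveniences, not part of Riva or nvidia-smi.

```shell
#!/bin/bash
# Hypothetical helper: integer percentage of GPU memory in use,
# given used and total memory in MiB.
mem_pct() {
  local used_mib="$1" total_mib="$2"
  echo $(( used_mib * 100 / total_mib ))
}

# Hypothetical wrapper: sample GPU memory once per second until interrupted.
# Run it in a second terminal while riva_start.sh is loading models.
watch_gpu_mem() {
  nvidia-smi --query-gpu=memory.used,memory.total \
             --format=csv,noheader,nounits -l 1 |
  while IFS=', ' read -r used total; do
    echo "${used} / ${total} MiB ($(mem_pct "$used" "$total")%)"
  done
}
```

Call `watch_gpu_mem` and stop it with Ctrl-C. If the percentage climbs toward 100 just before the first "out of memory" line appears, that supports the diagnosis above. Plain `nvidia-smi -l 1` gives the same information in the full table form.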

robin84 · August 1, 2025, 6:24am

Thanks @rmittal and @amargolin! With your help, I have resolved the issue!
