RIVA error when deploying the official Conformer ASR model

Please provide the following information when requesting support.

Hardware - GPU (A100/A30/T4/V100): NVIDIA RTX A6000
Hardware - CPU: AMD Ryzen 9 5900X 12-Core Processor
Operating System: Ubuntu 20.04
Riva Version: 2.2.0
TLT Version (if relevant):
How to reproduce the issue? (This is for errors. Please share the command and the detailed log here)
I tried to build and deploy the STT En Conformer-CTC XLarge model from NGC.
build command (based on the documentation):

riva-build speech_recognition \
    /riva/stt_en_conformer_ctc_xlarge.rmir \
    /nemo/stt_en_conformer_ctc_xlarge.riva \
    --name=conformer-en-US-asr-offline \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --ms_per_timestep=40 \
    --nn.fp16_needs_obey_precision_pass \
    --vad.vad_start_history=200 \
    --chunk_size=4.8 \
    --left_padding_size=1.6 \
    --right_padding_size=1.6 \
    --max_batch_size=16 \
    --decoder_type=greedy \
    --language_code=en-US

deploy command:

riva-deploy -f /data/rmir/stt_en_conformer_ctc_xlarge.rmir /data/models/

error log:

==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release 22.05 (build 38626400)
Riva Speech Server Version 2.2.0
Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

https://developer.nvidia.com/tensorrt

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version
run /opt/tensorrt/install_opensource.sh. To build the open source parsers,
plugins, and samples for current top-of-tree on master or a different branch,
run /opt/tensorrt/install_opensource.sh -b
See https://github.com/NVIDIA/TensorRT for more information.

[NeMo W 2022-06-07 12:56:58 optimizers:55] Apex was not found. Using the lamb or fused_adam optimizer will error out.
2022-06-07 12:56:58,172 [INFO] Writing Riva model repository to '/data/models/'...
2022-06-07 12:56:58,172 [INFO] The riva model repo target directory is /data/models/
2022-06-07 12:57:37,489 [INFO] Using obey-precision pass with fp16 TRT
2022-06-07 12:57:37,489 [INFO] Extract_binaries for nn -> /data/models/riva-trt-conformer-en-US-asr-offline-am-streaming/1
2022-06-07 12:57:37,489 [INFO] extracting {'onnx': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', 'model_graph.onnx')} -> /data/models/riva-trt-conformer-en-US-asr-offline-am-streaming/1
2022-06-07 12:57:39,193 [INFO] Printing copied artifacts:
2022-06-07 12:57:39,193 [INFO] {'onnx': '/data/models/riva-trt-conformer-en-US-asr-offline-am-streaming/1/model_graph.onnx'}
2022-06-07 12:57:39,193 [INFO] Building TRT engine from ONNX file
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1682042606
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1682042606
[06/07/2022-12:57:47] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[06/07/2022-12:57:47] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[... the INT32 clamping warning above is repeated 23 more times ...]
[06/07/2022-12:57:47] [TRT] [E] parsers/onnx/ModelImporter.cpp:780: While parsing node number 203 [Where -> "1377"]:
[06/07/2022-12:57:47] [TRT] [E] parsers/onnx/ModelImporter.cpp:781: --- Begin node ---
[06/07/2022-12:57:47] [TRT] [E] parsers/onnx/ModelImporter.cpp:782: input: "1375"
input: "1376"
input: "1374"
output: "1377"
name: "Where_301"
op_type: "Where"

[06/07/2022-12:57:47] [TRT] [E] parsers/onnx/ModelImporter.cpp:783: --- End node ---
[06/07/2022-12:57:47] [TRT] [E] parsers/onnx/ModelImporter.cpp:785: ERROR: parsers/onnx/builtin_op_importers.cpp:4705 In function importWhere:
[8] Assertion failed: (x->getType() == y->getType() && x->getType() != nvinfer1::DataType::kBOOL) && "This version of TensorRT requires input x and y to have the same data type. BOOL is unsupported."
2022-06-07 12:57:47,486 [INFO] Mixed-precision net: 482 layers, 482 tensors, 0 outputs...
2022-06-07 12:57:47,492 [INFO] Mixed-precision net: 0 layers / 0 outputs fixed
[06/07/2022-12:57:47] [TRT] [E] 4: [network.cpp::validate::2633] Error Code 4: Internal Error (Network must have at least one output)
[06/07/2022-12:57:47] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
2022-06-07 12:57:47,701 [INFO] Extract_binaries for featurizer -> /data/models/conformer-en-US-asr-offline-feature-extractor-streaming/1
2022-06-07 12:57:47,702 [INFO] Extract_binaries for vad -> /data/models/conformer-en-US-asr-offline-voice-activity-detector-ctc-streaming/1
2022-06-07 12:57:47,702 [INFO] extracting {'vocab_file': '/tmp/tmpbid6pnbc/riva_decoder_vocabulary.txt'} -> /data/models/conformer-en-US-asr-offline-voice-activity-detector-ctc-streaming/1
2022-06-07 12:57:47,703 [INFO] Extract_binaries for lm_decoder -> /data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming/1
2022-06-07 12:57:47,703 [INFO] extracting {'vocab_file': '/tmp/tmpbid6pnbc/riva_decoder_vocabulary.txt', 'tokenizer_model': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', '19196a05d50f48f68648bfd65f3fb6b0_tokenizer.model')} -> /data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming/1
2022-06-07 12:57:47,704 [INFO] {'vocab_file': '/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming/1/riva_decoder_vocabulary.txt', 'tokenizer_model': '/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming/1/19196a05d50f48f68648bfd65f3fb6b0_tokenizer.model'}
2022-06-07 12:57:47,705 [INFO] Extract_binaries for conformer-en-US-asr-offline -> /data/models/conformer-en-US-asr-offline/1
2022-06-07 12:57:47,705 [ERROR] Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/servicemaker/cli/deploy.py", line 100, in deploy_from_rmir
generator.serialize_to_disk(
File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 427, in serialize_to_disk
RivaConfigGenerator.serialize_to_disk(self, repo_dir, rmir, config_only, verbose, overwrite)
File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 306, in serialize_to_disk
self.generate_config(version_dir, rmir)
File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/asr.py", line 838, in generate_config
'output_map': {nn._outputs[0].name: ctc_inp_key},
IndexError: list index out of range

[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] 'Shape tensor cast elision' routine failed with: None
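For context (my own reading of the log, not an official diagnosis): the TensorRT ONNX parser rejects the Where_301 node because its two value inputs do not share a supported data type, so no engine is built and the network ends up with zero outputs. The later IndexError in asr.py looks like a downstream symptom of that: nn._outputs is empty when generate_config indexes it. A stdlib-only sketch of that last failure mode, with `outputs` as a hypothetical stand-in for nn._outputs:

```python
# Toy illustration of the final traceback's mechanism: after the TRT engine
# build fails, the model object's output list is empty, and indexing it
# raises the same IndexError seen in the log above.
outputs = []  # stand-in for nn._outputs after the failed engine build

try:
    first_output_name = outputs[0].name  # mirrors nn._outputs[0].name
except IndexError as exc:
    print(f"IndexError: {exc}")  # prints "IndexError: list index out of range"
```

So the IndexError is not the root cause; the Where/BOOL assertion during ONNX parsing is the first real failure.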

Hi @balintcenturio,

Thanks for your interest in Riva, and apologies for the delay.

Thanks for sharing the logs. I am checking with the team regarding this issue and will reply as soon as I have an update.

Thanks for your patience.