RIVA error, when deploying official Conformer ASR network

Please provide the following information when requesting support.

Hardware - GPU (A100/A30/T4/V100): NVIDIA RTX A6000
Hardware - CPU: AMD Ryzen 9 5900X 12-Core Processor
Operating System: Ubuntu 20.04
Riva Version: 2.2.0
TLT Version (if relevant)
How to reproduce the issue ? (This is for errors. Please share the command and the detailed log here)
I tried to build and deploy the STT En Conformer-CTC XLarge model from NGC.
Build command (based on the documentation):

riva-build speech_recognition \
    /riva/stt_en_conformer_ctc_xlarge.rmir \
    /nemo/stt_en_conformer_ctc_xlarge.riva \
    --name=conformer-en-US-asr-offline \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --ms_per_timestep=40 \
    --nn.fp16_needs_obey_precision_pass \
    --vad.vad_start_history=200 \
    --chunk_size=4.8 \
    --left_padding_size=1.6 \
    --right_padding_size=1.6 \
    --max_batch_size=16 \
    --decoder_type=greedy \
    --language_code=en-US
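For reference, the timing flags in this command are related to each other; a quick sketch in plain Python (values taken from the command above, assuming --ms_per_timestep declares the duration of one CTC timestep, as for the Conformer's 4x-subsampled 10 ms frames):

```python
# Sketch of how the riva-build timing flags relate (values from the command
# above; assumes --ms_per_timestep is the duration of one CTC timestep).
ms_per_timestep = 40      # --ms_per_timestep
chunk_size_s = 4.8        # --chunk_size (seconds of audio per chunk)
padding_s = 1.6           # --left_padding_size / --right_padding_size

# One CTC timestep per 40 ms means a 4.8 s chunk yields 120 timesteps,
# with 40 timesteps of context padding on each side.
timesteps_per_chunk = int(chunk_size_s * 1000 / ms_per_timestep)
padding_timesteps = int(padding_s * 1000 / ms_per_timestep)
print(timesteps_per_chunk, padding_timesteps)  # 120 40
```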

deploy command:

riva-deploy -f /data/rmir/stt_en_conformer_ctc_xlarge.rmir /data/models/

error log:

==========================
=== Riva Speech Skills ===

NVIDIA Release 22.05 (build 38626400)
Riva Speech Server Version 2.2.0
Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

https://developer.nvidia.com/tensorrt

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version
run /opt/tensorrt/install_opensource.sh. To build the open source parsers,
plugins, and samples for current top-of-tree on master or a different branch,
run /opt/tensorrt/install_opensource.sh -b
See https://github.com/NVIDIA/TensorRT for more information.

[NeMo W 2022-06-07 12:56:58 optimizers:55] Apex was not found. Using the lamb or fused_adam optimizer will error out.
2022-06-07 12:56:58,172 [INFO] Writing Riva model repository to '/data/models/'...
2022-06-07 12:56:58,172 [INFO] The riva model repo target directory is /data/models/
2022-06-07 12:57:37,489 [INFO] Using obey-precision pass with fp16 TRT
2022-06-07 12:57:37,489 [INFO] Extract_binaries for nn -> /data/models/riva-trt-conformer-en-US-asr-offline-am-streaming/1
2022-06-07 12:57:37,489 [INFO] extracting {'onnx': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', 'model_graph.onnx')} -> /data/models/riva-trt-conformer-en-US-asr-offline-am-streaming/1
2022-06-07 12:57:39,193 [INFO] Printing copied artifacts:
2022-06-07 12:57:39,193 [INFO] {'onnx': '/data/models/riva-trt-conformer-en-US-asr-offline-am-streaming/1/model_graph.onnx'}
2022-06-07 12:57:39,193 [INFO] Building TRT engine from ONNX file
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1682042606
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1682042606
[06/07/2022-12:57:47] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[06/07/2022-12:57:47] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
(the warning above is repeated 23 more times)
[06/07/2022-12:57:47] [TRT] [E] parsers/onnx/ModelImporter.cpp:780: While parsing node number 203 [Where -> "1377"]:
[06/07/2022-12:57:47] [TRT] [E] parsers/onnx/ModelImporter.cpp:781: --- Begin node ---
[06/07/2022-12:57:47] [TRT] [E] parsers/onnx/ModelImporter.cpp:782: input: "1375"
input: "1376"
input: "1374"
output: "1377"
name: "Where_301"
op_type: "Where"

[06/07/2022-12:57:47] [TRT] [E] parsers/onnx/ModelImporter.cpp:783: --- End node ---
[06/07/2022-12:57:47] [TRT] [E] parsers/onnx/ModelImporter.cpp:785: ERROR: parsers/onnx/builtin_op_importers.cpp:4705 In function importWhere:
[8] Assertion failed: (x->getType() == y->getType() && x->getType() != nvinfer1::DataType::kBOOL) && "This version of TensorRT requires input x and y to have the same data type. BOOL is unsupported."
2022-06-07 12:57:47,486 [INFO] Mixed-precision net: 482 layers, 482 tensors, 0 outputs…
2022-06-07 12:57:47,492 [INFO] Mixed-precision net: 0 layers / 0 outputs fixed
[06/07/2022-12:57:47] [TRT] [E] 4: [network.cpp::validate::2633] Error Code 4: Internal Error (Network must have at least one output)
[06/07/2022-12:57:47] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
2022-06-07 12:57:47,701 [INFO] Extract_binaries for featurizer -> /data/models/conformer-en-US-asr-offline-feature-extractor-streaming/1
2022-06-07 12:57:47,702 [INFO] Extract_binaries for vad -> /data/models/conformer-en-US-asr-offline-voice-activity-detector-ctc-streaming/1
2022-06-07 12:57:47,702 [INFO] extracting {'vocab_file': '/tmp/tmpbid6pnbc/riva_decoder_vocabulary.txt'} -> /data/models/conformer-en-US-asr-offline-voice-activity-detector-ctc-streaming/1
2022-06-07 12:57:47,703 [INFO] Extract_binaries for lm_decoder -> /data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming/1
2022-06-07 12:57:47,703 [INFO] extracting {'vocab_file': '/tmp/tmpbid6pnbc/riva_decoder_vocabulary.txt', 'tokenizer_model': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', '19196a05d50f48f68648bfd65f3fb6b0_tokenizer.model')} -> /data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming/1
2022-06-07 12:57:47,704 [INFO] {'vocab_file': '/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming/1/riva_decoder_vocabulary.txt', 'tokenizer_model': '/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming/1/19196a05d50f48f68648bfd65f3fb6b0_tokenizer.model'}
2022-06-07 12:57:47,705 [INFO] Extract_binaries for conformer-en-US-asr-offline -> /data/models/conformer-en-US-asr-offline/1
2022-06-07 12:57:47,705 [ERROR] Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/servicemaker/cli/deploy.py", line 100, in deploy_from_rmir
generator.serialize_to_disk(
File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 427, in serialize_to_disk
RivaConfigGenerator.serialize_to_disk(self, repo_dir, rmir, config_only, verbose, overwrite)
File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 306, in serialize_to_disk
self.generate_config(version_dir, rmir)
File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/asr.py", line 838, in generate_config
'output_map': {nn._outputs[0].name: ctc_inp_key},
IndexError: list index out of range

[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] 'Shape tensor cast elision' routine failed with: None

Hi @balintcenturio

Thanks for your interest in Riva,

Apologies for the delay

Thanks for sharing the logs. I am checking with the team regarding this issue, and I will reply as soon as I have an update.

Thanks for your patience

Hi @balintcenturio

Thanks for your interest in Riva,

Apologies for the delay,

There were some issues in earlier versions of Riva specific to the xlarge models.

Please retry the above conversion using the latest version, 2.3.0; with that version the conversion should succeed.

Please use 2.3.0 for the entire pipeline above, including the nemo2riva conversion.

If you face any issues or need additional information, please let us know.

Thanks

Hi @rvinobha,

Thank you for your reply!

I have the same issue with the more recent version as well:

  1. I used the 22.01 NeMo container and version 2.3.0 of nemo2riva to create the riva file from the nemo file.
  2. For the further steps I used Riva version 2.3.0, as advised.

The error is very similar: it mentions the same 'Where' node assertion failure. Could it be the GPU I use? On paper the RTX A6000 is strong enough for this kind of task, but as I understand it, it is not among the recommended hardware.

Hi @balintcenturio

Thanks for your interest in Riva,

Apologies that you are facing this issue; I will check further with the team.

Could you kindly share the command that produced the error, and attach the complete log of that command as a txt file in this thread?

Please also let us know your NVIDIA driver version.

Thanks

error.txt (8.0 KB)

Hi @rvinobha

Thank you for the quick response!

The command for which the error occurred:

docker run --rm --gpus 1 \
      -v /home/user/riva_quickstart_v2.3.0/models_repo/:/data \
      nvcr.io/nvidia/riva/riva-speech:2.3.0-servicemaker -- \
      riva-deploy -f  /data/rmir/stt_en_conformer_ctc_large.rmir /data/models/

Nvidia driver version: 510.54

I also attached the complete error message as a txt file.

Did you manage to resolve the issue? Is the A6000 compatible with Riva?

Hi @rvinobha!

Any update on this topic?

Hi @balintcenturio

Apologies for the delay,

Does this issue still happen with our latest version, 2.6.0?

Thanks