Please provide the following information when requesting support.
Hardware - GPU RTX 3090 (edit: A5000 too)
Hardware - CPU AMD
Operating System Ubuntu 20.04
Riva Version 2.6.0
Nvidia Driver Version 510.85.02
I’m trying to build a Conformer Large model fine-tuned with NeMo v1.9.0. I convert the model to the riva format using nemo2riva, and then run riva-build with the exact command parameters specified in the official docs here.
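For reference, the conversion and build sequence looks roughly like this (model names and paths are placeholders, and apart from the two precision flags discussed below the riva-build arguments are illustrative, not my exact invocation):

```shell
# Convert the fine-tuned NeMo checkpoint to the .riva intermediate format
# (input/output filenames are placeholders)
nemo2riva --out conformer_large.riva conformer_large_finetuned.nemo

# Build the streaming RMIR from the .riva file; the two --nn.* precision
# flags are the ones I experimented with
riva-build speech_recognition \
    conformer_large.rmir conformer_large.riva \
    --name=conformer-large-streaming \
    --streaming=True \
    --nn.use_trt_fp32 \
    --nn.fp16_needs_obey_precision_pass
```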
I have tried different combinations of the --nn.fp16_needs_obey_precision_pass and --nn.use_trt_fp32 flags, but I keep getting the following runtime error (full logs below):

RuntimeError: Shape provided for max is inconsistent with other shapes.
I don’t have this issue on other GPUs like the A5000 or the A100.
Riva build logs:
[NeMo I 2022-10-27 15:51:54 features:225] PADDING: 0
[NeMo I 2022-10-27 15:51:56 save_restore_connector:243] Model EncDecCTCModelBPE was successfully restored from /servicemaker-dev/CfCtcLg-SpeUni1024-DI-EATL.nemo.
[NeMo I 2022-10-27 15:51:56 schema:193] Found validation schema for nemo.collections.asr.models.EncDecCTCModelBPE at /usr/local/lib/python3.8/dist-packages/nemo2riva/validation_schemas/asr-stt-exported-encdectcmodelbpe.yaml
[NeMo I 2022-10-27 15:51:56 schema:222] Checking installed NeMo version ... 1.12.0 OK (>=1.1)
[NeMo I 2022-10-27 15:51:56 artifacts:65] Found model at ./model_weights.ckpt
INFO: Checking Nemo version for ConformerEncoder ...
[NeMo I 2022-10-27 15:51:56 schema:222] Checking installed NeMo version ... 1.12.0 OK (>=1.7.0rc0)
[NeMo I 2022-10-27 15:51:56 artifacts:142] Retrieved artifacts: dict_keys(['586831f07ef44e2188c1155b59144770_tokenizer.model', '598ba1b826b940a88351bb69bfcb321b_tokenizer.vocab', 'ac4a5bf0a7e845cc9d0f5d7989ee3d81_vocab.txt', 'model_config.yaml'])
[NeMo I 2022-10-27 15:51:56 cookbook:65] Exporting model EncDecCTCModelBPE with config=ExportConfig(export_subnet=None, export_format='ONNX', export_file='model_graph.onnx', encryption=None, autocast=True, max_dim=25000)
[NeMo I 2022-10-27 15:51:56 export_utils:365] Swapped 108 modules
[NeMo W 2022-10-27 15:51:56 nemo_logging:349] /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/modules/conformer_encoder.py:325: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_length > self.max_audio_length:
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
...repeated many times
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
...repeated many times
[NeMo I 2022-10-27 15:53:03 exportable:86] Successfully exported EncDecCTCModelBPE to /tmp/tmpujga99fr/model_graph.onnx
[NeMo I 2022-10-27 15:53:07 cookbook:128] Saving to /servicemaker-dev/2.6.0/86/CfLgTest-lps16-rps16-cs08-vsth200-vsph800-vstth02-vspth098-bs1-igc1-streaming.riva
[NeMo I 2022-10-27 15:53:27 convert:95] Model saved to /servicemaker-dev/2.6.0/86/CfLgTest-lps16-rps16-cs08-vsth200-vsph800-vstth02-vspth098-bs1-igc1-streaming.riva
[2022-10-27 15:53:29,501][root][INFO] - Building model CfLgTest-lps16-rps16-cs08-vsth200-vsph800-vstth02-vspth098-bs1-igc1-streaming
INFO: Created a temporary directory at /tmp/tmp7nddtxeu
INFO: Writing /tmp/tmp7nddtxeu/_remote_module_non_scriptable.py
[NeMo W 2022-10-27 15:53:32 optimizers:55] Apex was not found. Using the lamb or fused_adam optimizer will error out.
[NeMo W 2022-10-27 15:53:33 experimental:27] Module <class 'nemo_text_processing.g2p.modules.IPAG2P'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-10-27 15:53:33 experimental:27] Module <class 'nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.IPATokenizer'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-10-27 15:53:33 experimental:27] Module <class 'nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.PhonemizerTokenizer'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-10-27 15:53:33 experimental:27] Module <class 'nemo.collections.tts.models.radtts.RadTTSModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
2022-10-27 15:53:45,270 [INFO] Packing binaries for nn/ONNX : {'onnx': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', 'model_graph.onnx')}
2022-10-27 15:53:45,270 [INFO] Copying onnx:model_graph.onnx -> nn:nn-model_graph.onnx
2022-10-27 15:53:46,260 [INFO] Packing binaries for lm_decoder/ONNX : {'vocab_file': '/tmp/tmp3m3ubaad/riva_decoder_vocabulary.txt', 'tokenizer_model': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', '586831f07ef44e2188c1155b59144770_tokenizer.model')}
2022-10-27 15:53:46,260 [INFO] Copying vocab_file:/tmp/tmp3m3ubaad/riva_decoder_vocabulary.txt -> lm_decoder:lm_decoder-riva_decoder_vocabulary.txt
2022-10-27 15:53:46,260 [INFO] Copying tokenizer_model:586831f07ef44e2188c1155b59144770_tokenizer.model -> lm_decoder:lm_decoder-586831f07ef44e2188c1155b59144770_tokenizer.model
2022-10-27 15:53:46,260 [INFO] Packing binaries for rescorer/ONNX : {'vocab_file': '/tmp/tmp3m3ubaad/riva_decoder_vocabulary.txt'}
2022-10-27 15:53:46,260 [INFO] Copying vocab_file:/tmp/tmp3m3ubaad/riva_decoder_vocabulary.txt -> rescorer:rescorer-riva_decoder_vocabulary.txt
2022-10-27 15:53:46,260 [INFO] Packing binaries for endpointing/ONNX : {'vocab_file': '/tmp/tmp3m3ubaad/riva_decoder_vocabulary.txt'}
2022-10-27 15:53:46,260 [INFO] Copying vocab_file:/tmp/tmp3m3ubaad/riva_decoder_vocabulary.txt -> endpointing:endpointing-riva_decoder_vocabulary.txt
2022-10-27 15:53:46,261 [INFO] Packing binaries for vad/ONNX : {}
2022-10-27 15:53:46,261 [INFO] Packing binaries for vad_nn/ONNX : {}
2022-10-27 15:53:46,261 [INFO] Saving to /servicemaker-dev/2.6.0/86/CfLgTest-lps16-rps16-cs08-vsth200-vsph800-vstth02-vspth098-bs1-igc1-streaming.rmir
[2022-10-27 15:54:07,342][root][INFO] - Deploying model CfLgTest-lps16-rps16-cs08-vsth200-vsph800-vstth02-vspth098-bs1-igc1-streaming
Cleaning directory /data/models/2.6.0
INFO: Created a temporary directory at /tmp/tmpbd_ptwk7
INFO: Writing /tmp/tmpbd_ptwk7/_remote_module_non_scriptable.py
[NeMo W 2022-10-27 15:54:10 optimizers:55] Apex was not found. Using the lamb or fused_adam optimizer will error out.
[NeMo W 2022-10-27 15:54:10 experimental:27] Module <class 'nemo_text_processing.g2p.modules.IPAG2P'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-10-27 15:54:10 experimental:27] Module <class 'nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.IPATokenizer'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-10-27 15:54:10 experimental:27] Module <class 'nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.PhonemizerTokenizer'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-10-27 15:54:10 experimental:27] Module <class 'nemo.collections.tts.models.radtts.RadTTSModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
2022-10-27 15:54:10,741 [INFO] Writing Riva model repository to '/data/models/2.6.0'...
2022-10-27 15:54:10,741 [INFO] The riva model repo target directory is /data/models/2.6.0
2022-10-27 15:54:22,308 [INFO] Using tensorrt with fp32
2022-10-27 15:54:22,308 [INFO] Extract_binaries for nn -> /data/models/2.6.0/riva-trt-CfLgTest-lps16-rps16-cs08-vsth200-vsph800-vstth02-vspth098-bs1-igc1-streaming-am-streaming/1
2022-10-27 15:54:22,308 [INFO] extracting {'onnx': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', 'model_graph.onnx')} -> /data/models/2.6.0/riva-trt-CfLgTest-lps16-rps16-cs08-vsth200-vsph800-vstth02-vspth098-bs1-igc1-streaming-am-streaming/1
2022-10-27 15:54:23,488 [INFO] Printing copied artifacts:
2022-10-27 15:54:23,488 [INFO] {'onnx': '/data/models/2.6.0/riva-trt-CfLgTest-lps16-rps16-cs08-vsth200-vsph800-vstth02-vspth098-bs1-igc1-streaming-am-streaming/1/model_graph.onnx'}
2022-10-27 15:54:23,488 [INFO] Building TRT engine from ONNX file
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 971990455
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 971990455
[10/27/2022-15:54:28] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/27/2022-15:54:28] [TRT] [E] parsers/onnx/onnx2trt_utils.cpp:737: Found unsupported datatype (16) when importing initializer: onnx::MatMul_7410
[10/27/2022-15:54:28] [TRT] [E] parsers/onnx/ModelImporter.cpp:785: ERROR: parsers/onnx/ModelImporter.cpp:103 In function parseGraph:
[8] Assertion failed: convertOnnxWeights(initializer, &weights, ctx) && "Failed to import initializer."
[10/27/2022-15:54:29] [TRT] [E] 4: Specified optimization profiles must satisfy MIN<=OPT<=MAX.
[10/27/2022-15:54:29] [TRT] [E] 3: [optimizationProfile.cpp::setDimensions::138] Error Code 3: API Usage Error (Parameter check failed at: runtime/common/optimizationProfile.cpp::setDimensions::138, condition: validate(mErrorStreams.getErrorRecorder(), newEntry, true)
)
2022-10-27 15:54:29,146 [ERROR] Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/servicemaker/cli/deploy.py", line 100, in deploy_from_rmir
generator.serialize_to_disk(
File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 434, in serialize_to_disk
module.serialize_to_disk(repo_dir, rmir, config_only, verbose, overwrite)
File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 310, in serialize_to_disk
self.update_binary(version_dir, rmir, verbose)
File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/asr.py", line 159, in update_binary
RivaTRTConfigGenerator.update_binary_from_copied(self, version_dir, rmir, copied, verbose)
File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 702, in update_binary_from_copied
bindings = self.build_trt_engine_from_onnx(model_weights, engine_path=trt_file, verbose=verbose)
File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 614, in build_trt_engine_from_onnx
profile.set_shape(**p)
RuntimeError: Shape provided for max is inconsistent with other shapes.
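One detail from the TRT importer error above that may help narrow this down: ONNX stores initializer datatypes as integer codes from the TensorProto.DataType enum, and code 16 is BFLOAT16, which the TensorRT ONNX parser in this build rejects. A minimal lookup table for the codes that appear in (or are relevant to) the log:

```python
# Standard ONNX TensorProto.DataType codes; code 16 is the
# "unsupported datatype (16)" from the TRT importer error above.
ONNX_DTYPE_NAMES = {
    1: "FLOAT",      # float32
    6: "INT32",
    7: "INT64",      # TRT casts these down to INT32, per the earlier warning
    10: "FLOAT16",
    11: "DOUBLE",
    16: "BFLOAT16",  # rejected by this TensorRT ONNX parser
}

print(ONNX_DTYPE_NAMES[16])  # BFLOAT16
```

So the failing initializer (onnx::MatMul_7410) was apparently exported as bfloat16, which suggests the ONNX export, not the optimization profile, is the root cause; the profile error afterwards looks like fallout from the failed parse.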