Please provide the following information when requesting support.
Hardware - GPU RTX 3090 (edit: A5000 too)
Hardware - CPU AMD
Operating System Ubuntu 20.04
Riva Version 2.6.0
Nvidia Driver Version 510.85.02
I’m trying to build a Conformer Large model fine-tuned with NeMo v1.9.0. I convert the model to the riva format using nemo2riva, and then run riva-build with the exact command parameters specified in the official docs here.
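For reference, the conversion and build sequence looks roughly like this (model names and paths are placeholders, and apart from the two precision flags discussed below the riva-build arguments are illustrative, not my exact invocation):

```shell
# Convert the fine-tuned NeMo checkpoint to the .riva intermediate format
# (input/output filenames are placeholders)
nemo2riva --out conformer_large.riva conformer_large_finetuned.nemo

# Build the streaming RMIR from the .riva file; the two --nn.* precision
# flags are the ones I experimented with
riva-build speech_recognition \
    conformer_large.rmir conformer_large.riva \
    --name=conformer-large-streaming \
    --streaming=True \
    --nn.use_trt_fp32 \
    --nn.fp16_needs_obey_precision_pass
```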
I have tried different combinations of the --nn.fp16_needs_obey_precision_pass and --nn.use_trt_fp32 flags, but I keep getting the following runtime error (full logs below):

RuntimeError: Shape provided for max is inconsistent with other shapes.
I don’t have this issue on other GPUs like the A5000 or the A100.
Riva build logs:
[NeMo I 2022-10-27 15:51:54 features:225] PADDING: 0
[NeMo I 2022-10-27 15:51:56 save_restore_connector:243] Model EncDecCTCModelBPE was successfully restored from /servicemaker-dev/CfCtcLg-SpeUni1024-DI-EATL.nemo.
[NeMo I 2022-10-27 15:51:56 schema:193] Found validation schema for nemo.collections.asr.models.EncDecCTCModelBPE at /usr/local/lib/python3.8/dist-packages/nemo2riva/validation_schemas/asr-stt-exported-encdectcmodelbpe.yaml
[NeMo I 2022-10-27 15:51:56 schema:222] Checking installed NeMo version ... 1.12.0 OK (>=1.1)
[NeMo I 2022-10-27 15:51:56 artifacts:65] Found model at ./model_weights.ckpt
INFO: Checking Nemo version for ConformerEncoder ...
[NeMo I 2022-10-27 15:51:56 schema:222] Checking installed NeMo version ... 1.12.0 OK (>=1.7.0rc0)
[NeMo I 2022-10-27 15:51:56 artifacts:142] Retrieved artifacts: dict_keys(['586831f07ef44e2188c1155b59144770_tokenizer.model', '598ba1b826b940a88351bb69bfcb321b_tokenizer.vocab', 'ac4a5bf0a7e845cc9d0f5d7989ee3d81_vocab.txt', 'model_config.yaml'])
[NeMo I 2022-10-27 15:51:56 cookbook:65] Exporting model EncDecCTCModelBPE with config=ExportConfig(export_subnet=None, export_format='ONNX', export_file='model_graph.onnx', encryption=None, autocast=True, max_dim=25000)
[NeMo I 2022-10-27 15:51:56 export_utils:365] Swapped 108 modules
[NeMo W 2022-10-27 15:51:56 nemo_logging:349] /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/modules/conformer_encoder.py:325: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_length > self.max_audio_length:
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
...repeated many times
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
...repeated many times
[NeMo I 2022-10-27 15:53:03 exportable:86] Successfully exported EncDecCTCModelBPE to /tmp/tmpujga99fr/model_graph.onnx
[NeMo I 2022-10-27 15:53:07 cookbook:128] Saving to /servicemaker-dev/2.6.0/86/CfLgTest-lps16-rps16-cs08-vsth200-vsph800-vstth02-vspth098-bs1-igc1-streaming.riva
[NeMo I 2022-10-27 15:53:27 convert:95] Model saved to /servicemaker-dev/2.6.0/86/CfLgTest-lps16-rps16-cs08-vsth200-vsph800-vstth02-vspth098-bs1-igc1-streaming.riva
[2022-10-27 15:53:29,501][root][INFO] - Building model CfLgTest-lps16-rps16-cs08-vsth200-vsph800-vstth02-vspth098-bs1-igc1-streaming
INFO: Created a temporary directory at /tmp/tmp7nddtxeu
INFO: Writing /tmp/tmp7nddtxeu/_remote_module_non_scriptable.py
[NeMo W 2022-10-27 15:53:32 optimizers:55] Apex was not found. Using the lamb or fused_adam optimizer will error out.
[NeMo W 2022-10-27 15:53:33 experimental:27] Module <class 'nemo_text_processing.g2p.modules.IPAG2P'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-10-27 15:53:33 experimental:27] Module <class 'nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.IPATokenizer'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-10-27 15:53:33 experimental:27] Module <class 'nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.PhonemizerTokenizer'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-10-27 15:53:33 experimental:27] Module <class 'nemo.collections.tts.models.radtts.RadTTSModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
2022-10-27 15:53:45,270 [INFO] Packing binaries for nn/ONNX : {'onnx': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', 'model_graph.onnx')}
2022-10-27 15:53:45,270 [INFO] Copying onnx:model_graph.onnx -> nn:nn-model_graph.onnx
2022-10-27 15:53:46,260 [INFO] Packing binaries for lm_decoder/ONNX : {'vocab_file': '/tmp/tmp3m3ubaad/riva_decoder_vocabulary.txt', 'tokenizer_model': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', '586831f07ef44e2188c1155b59144770_tokenizer.model')}
2022-10-27 15:53:46,260 [INFO] Copying vocab_file:/tmp/tmp3m3ubaad/riva_decoder_vocabulary.txt -> lm_decoder:lm_decoder-riva_decoder_vocabulary.txt
2022-10-27 15:53:46,260 [INFO] Copying tokenizer_model:586831f07ef44e2188c1155b59144770_tokenizer.model -> lm_decoder:lm_decoder-586831f07ef44e2188c1155b59144770_tokenizer.model
2022-10-27 15:53:46,260 [INFO] Packing binaries for rescorer/ONNX : {'vocab_file': '/tmp/tmp3m3ubaad/riva_decoder_vocabulary.txt'}
2022-10-27 15:53:46,260 [INFO] Copying vocab_file:/tmp/tmp3m3ubaad/riva_decoder_vocabulary.txt -> rescorer:rescorer-riva_decoder_vocabulary.txt
2022-10-27 15:53:46,260 [INFO] Packing binaries for endpointing/ONNX : {'vocab_file': '/tmp/tmp3m3ubaad/riva_decoder_vocabulary.txt'}
2022-10-27 15:53:46,260 [INFO] Copying vocab_file:/tmp/tmp3m3ubaad/riva_decoder_vocabulary.txt -> endpointing:endpointing-riva_decoder_vocabulary.txt
2022-10-27 15:53:46,261 [INFO] Packing binaries for vad/ONNX : {}
2022-10-27 15:53:46,261 [INFO] Packing binaries for vad_nn/ONNX : {}
2022-10-27 15:53:46,261 [INFO] Saving to /servicemaker-dev/2.6.0/86/CfLgTest-lps16-rps16-cs08-vsth200-vsph800-vstth02-vspth098-bs1-igc1-streaming.rmir
[2022-10-27 15:54:07,342][root][INFO] - Deploying model CfLgTest-lps16-rps16-cs08-vsth200-vsph800-vstth02-vspth098-bs1-igc1-streaming
Cleaning directory /data/models/2.6.0
INFO: Created a temporary directory at /tmp/tmpbd_ptwk7
INFO: Writing /tmp/tmpbd_ptwk7/_remote_module_non_scriptable.py
[NeMo W 2022-10-27 15:54:10 optimizers:55] Apex was not found. Using the lamb or fused_adam optimizer will error out.
[NeMo W 2022-10-27 15:54:10 experimental:27] Module <class 'nemo_text_processing.g2p.modules.IPAG2P'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-10-27 15:54:10 experimental:27] Module <class 'nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.IPATokenizer'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-10-27 15:54:10 experimental:27] Module <class 'nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.PhonemizerTokenizer'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-10-27 15:54:10 experimental:27] Module <class 'nemo.collections.tts.models.radtts.RadTTSModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
2022-10-27 15:54:10,741 [INFO] Writing Riva model repository to '/data/models/2.6.0'...
2022-10-27 15:54:10,741 [INFO] The riva model repo target directory is /data/models/2.6.0
2022-10-27 15:54:22,308 [INFO] Using tensorrt with fp32
2022-10-27 15:54:22,308 [INFO] Extract_binaries for nn -> /data/models/2.6.0/riva-trt-CfLgTest-lps16-rps16-cs08-vsth200-vsph800-vstth02-vspth098-bs1-igc1-streaming-am-streaming/1
2022-10-27 15:54:22,308 [INFO] extracting {'onnx': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', 'model_graph.onnx')} -> /data/models/2.6.0/riva-trt-CfLgTest-lps16-rps16-cs08-vsth200-vsph800-vstth02-vspth098-bs1-igc1-streaming-am-streaming/1
2022-10-27 15:54:23,488 [INFO] Printing copied artifacts:
2022-10-27 15:54:23,488 [INFO] {'onnx': '/data/models/2.6.0/riva-trt-CfLgTest-lps16-rps16-cs08-vsth200-vsph800-vstth02-vspth098-bs1-igc1-streaming-am-streaming/1/model_graph.onnx'}
2022-10-27 15:54:23,488 [INFO] Building TRT engine from ONNX file
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 971990455
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 971990455
[10/27/2022-15:54:28] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/27/2022-15:54:28] [TRT] [E] parsers/onnx/onnx2trt_utils.cpp:737: Found unsupported datatype (16) when importing initializer: onnx::MatMul_7410
[10/27/2022-15:54:28] [TRT] [E] parsers/onnx/ModelImporter.cpp:785: ERROR: parsers/onnx/ModelImporter.cpp:103 In function parseGraph:
[8] Assertion failed: convertOnnxWeights(initializer, &weights, ctx) && "Failed to import initializer."
[10/27/2022-15:54:29] [TRT] [E] 4: Specified optimization profiles must satisfy MIN<=OPT<=MAX.
[10/27/2022-15:54:29] [TRT] [E] 3: [optimizationProfile.cpp::setDimensions::138] Error Code 3: API Usage Error (Parameter check failed at: runtime/common/optimizationProfile.cpp::setDimensions::138, condition: validate(mErrorStreams.getErrorRecorder(), newEntry, true)
)
2022-10-27 15:54:29,146 [ERROR] Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/servicemaker/cli/deploy.py", line 100, in deploy_from_rmir
generator.serialize_to_disk(
File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 434, in serialize_to_disk
module.serialize_to_disk(repo_dir, rmir, config_only, verbose, overwrite)
File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 310, in serialize_to_disk
self.update_binary(version_dir, rmir, verbose)
File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/asr.py", line 159, in update_binary
RivaTRTConfigGenerator.update_binary_from_copied(self, version_dir, rmir, copied, verbose)
File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 702, in update_binary_from_copied
bindings = self.build_trt_engine_from_onnx(model_weights, engine_path=trt_file, verbose=verbose)
File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 614, in build_trt_engine_from_onnx
profile.set_shape(**p)
RuntimeError: Shape provided for max is inconsistent with other shapes.
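One detail from the TRT importer error above that may help narrow this down: ONNX stores initializer datatypes as integer codes from the TensorProto.DataType enum, and code 16 is BFLOAT16, which the TensorRT ONNX parser in this build rejects. A minimal lookup table for the codes that appear in (or are relevant to) the log:

```python
# Standard ONNX TensorProto.DataType codes; code 16 is the
# "unsupported datatype (16)" from the TRT importer error above.
ONNX_DTYPE_NAMES = {
    1: "FLOAT",      # float32
    6: "INT32",
    7: "INT64",      # TRT casts these down to INT32, per the earlier warning
    10: "FLOAT16",
    11: "DOUBLE",
    16: "BFLOAT16",  # rejected by this TensorRT ONNX parser
}

print(ONNX_DTYPE_NAMES[16])  # BFLOAT16
```

So the failing initializer (onnx::MatMul_7410) was apparently exported as bfloat16, which suggests the ONNX export, not the optimization profile, is the root cause; the profile error afterwards looks like fallout from the failed parse.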