Model quantized in explicit precision mode (with Q/DQ nodes) fails during engine generation

Description

Here is the ONNX model I used to generate the engine: model

It is quantized with the pytorch_quantization toolkit, following the simplest instructions provided (using quant_modules.initialize() to automatically replace all supported layers, with no manual Q/DQ placement), and calibrated following the same code provided here.
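For reference, the automatic-replacement flow described above looks roughly like this (a minimal sketch of the documented pytorch_quantization workflow; `build_model`, `calib_loader`, and `num_batches` are placeholders for my own code, and the imports are done lazily so the sketch stays importable):

```python
def quantize_and_calibrate(build_model, calib_loader, num_batches=16):
    # Lazy imports: torch and pytorch_quantization are assumed installed.
    import torch
    from pytorch_quantization import quant_modules
    from pytorch_quantization import nn as quant_nn

    # Monkey-patch supported layers (Conv2d, Linear, ...) with quantized
    # equivalents BEFORE the model is constructed -- no manual Q/DQ placement.
    quant_modules.initialize()
    model = build_model().cuda().eval()

    # Switch every TensorQuantizer to statistics collection.
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            module.disable_quant()
            module.enable_calib()

    # Feed a few calibration batches through the model.
    with torch.no_grad():
        for i, (images, _) in enumerate(calib_loader):
            if i >= num_batches:
                break
            model(images.cuda())

    # Compute amax from the collected statistics, then re-enable quantization.
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.load_calib_amax()
            module.disable_calib()
            module.enable_quant()
    return model
```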

After quantization, I exported the ONNX file with opset 13 using a PyTorch nightly build (only the nightly supports the required ops). The export succeeded, and I then ran onnx-simplifier on the result, which also succeeded.
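The export step was essentially the following (a sketch, not my exact script; the input shape and tensor names are placeholders, and setting `use_fb_fake_quant` is what makes the fake-quant ops come out as ONNX QuantizeLinear/DequantizeLinear pairs):

```python
def export_to_onnx(model, onnx_path="model_int8.onnx"):
    # Lazy imports: torch and pytorch_quantization are assumed installed.
    import torch
    from pytorch_quantization import nn as quant_nn

    # Emit TensorQuantizer nodes as ONNX QuantizeLinear/DequantizeLinear.
    quant_nn.TensorQuantizer.use_fb_fake_quant = True

    dummy = torch.randn(1, 3, 440, 1024).cuda()  # placeholder input shape
    torch.onnx.export(
        model, dummy, onnx_path,
        opset_version=13,          # Q/DQ export requires opset >= 13
        do_constant_folding=True,
        input_names=["input"],     # placeholder names
        output_names=["output"],
    )
```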

According to the documentation, an explicitly quantized model can be compiled directly to an engine without further configuration. However, both trtexec with the --int8 flag and a custom engine-building script (following the most basic example in the documentation here) failed with this error message:
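The custom build script follows the standard pattern from the docs; roughly this (a sketch against the TensorRT 8.x Python API, with the import done lazily and the path as a parameter; for an explicitly quantized network the INT8 flag alone should be enough, with no calibrator attached):

```python
def build_int8_engine(onnx_path, workspace_gb=4):
    import tensorrt as trt  # assumed installed (TensorRT 8.x)

    logger = trt.Logger(trt.Logger.VERBOSE)
    builder = trt.Builder(logger)
    # Q/DQ (explicit-precision) networks must use an explicit-batch network.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.max_workspace_size = workspace_gb << 30
    # No calibrator for an explicitly quantized model: the INT8 flag
    # enables the Q/DQ path on its own.
    config.set_flag(trt.BuilderFlag.INT8)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("engine build failed (see log above)")
    return serialized
```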

[05/05/2022-01:44:01] [TRT] [V] Running: ConstWeightsQuantizeFusion
[05/05/2022-01:44:01] [TRT] [V] ConstWeightsQuantizeFusion: Fusing update_block.flow_head.conv1.weight with QuantizeLinear_1641_quantize_scale_node
[05/05/2022-01:44:01] [TRT] [V] Running: ConstWeightsQuantizeFusion
[05/05/2022-01:44:01] [TRT] [V] ConstWeightsQuantizeFusion: Fusing update_block.flow_head.conv2.weight with QuantizeLinear_1648_quantize_scale_node
[05/05/2022-01:44:01] [TRT] [V] Running: VanillaSwapWithFollowingQ
[05/05/2022-01:44:01] [TRT] [V] Swapping Relu_631 with QuantizeLinear_652_quantize_scale_node
[05/05/2022-01:44:01] [TRT] [V] Running: SplitQAcrossPrecedingFanIn
[05/05/2022-01:44:01] [TRT] [V] Running: SplitQAcrossPrecedingFanIn
[05/05/2022-01:44:02] [TRT] [V] Running: SplitQAcrossPrecedingFanIn
[05/05/2022-01:44:02] [TRT] [V] Running: SplitQAcrossPrecedingFanIn
[05/05/2022-01:44:02] [TRT] [V] Running: SplitQAcrossPrecedingFanIn
[05/05/2022-01:44:02] [TRT] [E] 2: [checkSanity.cpp::checkSanity::106] Error Code 2: Internal Error (Assertion regionNames.find(r->name) == regionNames.end() failed. Found duplicate region name onnx::Concat_1406_clone_1)
[05/05/2022-01:44:02] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::619] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

It sounds like there is a duplicate node in the graph, but I could not find it when I searched through the ONNX model. Please help me with this, thanks!

PS: The non-quantized version of the model can be successfully compiled to an engine without any issue.

Environment

TensorRT Version: 8.4 GA
GPU Type: NVIDIA RTX 3090
Nvidia Driver Version: 11.6
CUDA Version: 11.5
CUDNN Version: 11.5
Operating System + Version: Ubuntu 20.04 LTS
Python Version (if applicable): 3.8.12
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.12 (nightly build as of May 5, 2022)
Baremetal or Container (if container which image + tag):

Relevant Files

ONNX file

Steps To Reproduce

Running trtexec with the following arguments reproduces the error:

trtexec --onnx=onnx_dir --saveEngine=engine_dir --workspace=4096 --int8 --fp16 --noTF32 --verbose --noDataTransfers --separateProfileRun --dumpProfile --useCudaGraph > log_dir

Hi, please refer to the links below on performing inference in INT8.

Thanks!

To add to this: I also tried post-training quantization through TensorRT itself with a custom calibrator. It does not stop at the same node, but it fails with the same assertion error:

Completed parsing of ONNX file
Building an engine…
[05/05/2022-14:30:33] [TRT] [I] MatMul_707: broadcasting input1 to make tensors conform, dims(input0)=[4,880,256][NONE] dims(input1)=[1,256,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_709: broadcasting input1 to make tensors conform, dims(input0)=[4,880,256][NONE] dims(input1)=[1,256,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_711: broadcasting input1 to make tensors conform, dims(input0)=[4,880,256][NONE] dims(input1)=[1,256,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_760: broadcasting input1 to make tensors conform, dims(input0)=[4,880,256][NONE] dims(input1)=[1,256,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_762: broadcasting input1 to make tensors conform, dims(input0)=[4,880,256][NONE] dims(input1)=[1,256,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_764: broadcasting input1 to make tensors conform, dims(input0)=[4,880,256][NONE] dims(input1)=[1,256,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_732: broadcasting input1 to make tensors conform, dims(input0)=[4,880,256][NONE] dims(input1)=[1,256,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_785: broadcasting input1 to make tensors conform, dims(input0)=[4,880,256][NONE] dims(input1)=[1,256,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_745: broadcasting input1 to make tensors conform, dims(input0)=[4,880,512][NONE] dims(input1)=[1,512,512][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_798: broadcasting input1 to make tensors conform, dims(input0)=[4,880,512][NONE] dims(input1)=[1,512,512][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_747: broadcasting input1 to make tensors conform, dims(input0)=[4,880,512][NONE] dims(input1)=[1,512,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_800: broadcasting input1 to make tensors conform, dims(input0)=[4,880,512][NONE] dims(input1)=[1,512,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_849: broadcasting input1 to make tensors conform, dims(input0)=[4,880,256][NONE] dims(input1)=[1,256,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_851: broadcasting input1 to make tensors conform, dims(input0)=[4,880,256][NONE] dims(input1)=[1,256,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_853: broadcasting input1 to make tensors conform, dims(input0)=[4,880,256][NONE] dims(input1)=[1,256,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_904: broadcasting input1 to make tensors conform, dims(input0)=[4,880,256][NONE] dims(input1)=[1,256,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_876: broadcasting input1 to make tensors conform, dims(input0)=[4,880,256][NONE] dims(input1)=[1,256,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_889: broadcasting input1 to make tensors conform, dims(input0)=[4,880,512][NONE] dims(input1)=[1,512,512][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_891: broadcasting input1 to make tensors conform, dims(input0)=[4,880,512][NONE] dims(input1)=[1,512,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_906: broadcasting input1 to make tensors conform, dims(input0)=[4,880,256][NONE] dims(input1)=[1,256,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_908: broadcasting input1 to make tensors conform, dims(input0)=[4,880,256][NONE] dims(input1)=[1,256,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_931: broadcasting input1 to make tensors conform, dims(input0)=[4,880,256][NONE] dims(input1)=[1,256,256][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_944: broadcasting input1 to make tensors conform, dims(input0)=[4,880,512][NONE] dims(input1)=[1,512,512][NONE].
[05/05/2022-14:30:33] [TRT] [I] MatMul_946: broadcasting input1 to make tensors conform, dims(input0)=[4,880,512][NONE] dims(input1)=[1,512,256][NONE].
[05/05/2022-14:30:34] [TRT] [V] Original: 2842 layers
[05/05/2022-14:30:34] [TRT] [V] After dead-layer removal: 2842 layers
[05/05/2022-14:30:34] [TRT] [E] 2: [checkSanity.cpp::checkSanity::106] Error Code 2: Internal Error (Assertion regionNames.find(r->name) == regionNames.end() failed. Found duplicate region name (Unnamed Layer* 435) [Constant]_output)
[05/05/2022-14:30:34] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::619] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
Completed creating Engine
Traceback (most recent call last):
  File "build_eg.py", line 7, in <module>
    f.write(serialized_engine)
TypeError: a bytes-like object is required, not 'NoneType'

Here is the non-quantized ONNX model

Here is the calibrator I used; it is a mild modification of the official sample: script

Model-building script: script
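For context, the calibrator follows the usual sample skeleton, roughly like this (a sketch, not the linked script; the factory wrapper and lazy imports are my own framing, and `batch_iter` is assumed to yield contiguous NumPy arrays of one fixed batch size):

```python
def make_entropy_calibrator(batch_iter, batch_size, batch_bytes,
                            cache_file="calib.cache"):
    import os
    import tensorrt as trt          # assumed installed (TensorRT 8.x)
    import pycuda.autoinit          # noqa: F401  creates a CUDA context
    import pycuda.driver as cuda

    class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
        def __init__(self):
            super().__init__()
            self.batches = iter(batch_iter)
            self.device_input = cuda.mem_alloc(batch_bytes)

        def get_batch_size(self):
            return batch_size

        def get_batch(self, names):
            try:
                batch = next(self.batches)
            except StopIteration:
                return None  # no more data: calibration is done
            cuda.memcpy_htod(self.device_input, batch)
            return [int(self.device_input)]

        def read_calibration_cache(self):
            if os.path.exists(cache_file):
                with open(cache_file, "rb") as f:
                    return f.read()

        def write_calibration_cache(self, cache):
            with open(cache_file, "wb") as f:
                f.write(cache)

    return EntropyCalibrator()
```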

Thank you for your prompt response. I had already read those materials several times before trying this myself. There are not many thorough materials or examples covering the full TensorRT-based quantization pipeline, or the pytorch_quantization → TensorRT engine compilation flow, in the first place.

It would be great if you could take a look at the issue I raised and share some hints on how to deal with it. Thanks!

Hi,

We could reproduce the same error. This looks like a known issue, which will be fixed in a future release.

Thank you.