Description
Here is the ONNX model I used to generate the engine: model
It was quantized with the pytorch-quantization toolkit, following the simplest instructions given (using quant_modules.initialize() to automatically replace all supported layers, with no manual Q/DQ placement), and calibrated following the same code provided here.
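For reference, the automatic setup described above boils down to the following (a minimal sketch assuming NVIDIA's pytorch-quantization package; the import is guarded so the snippet still runs where the package is not installed):

```python
# Sketch of the automatic quantization setup described above.
# Assumes NVIDIA's pytorch-quantization toolkit; the import is
# guarded so the snippet degrades gracefully without it.
try:
    from pytorch_quantization import quant_modules

    # Patches torch.nn so that supported layers (Conv2d, Linear, ...)
    # are constructed as quantized variants with default Q/DQ
    # placement -- no manual QuantizeLinear/DequantizeLinear insertion.
    quant_modules.initialize()
    PTQ_AVAILABLE = True
except ImportError:
    PTQ_AVAILABLE = False

print("pytorch-quantization available:", PTQ_AVAILABLE)
```

Any model built after quant_modules.initialize() picks up the quantized layers automatically; calibration is then run on that model before ONNX export.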
After quantization, I exported the ONNX file with opset 13 using a PyTorch nightly build (only nightly supports the required ops). The export succeeded, and I then ran onnx-simplifier to further simplify the graph, which also succeeded.
According to the documentation, an explicitly quantized model can be compiled directly to an engine without further configuration. However, I tried both trtexec with the --int8 flag and a custom engine-building script (following the most basic example in the documentation here), and both fail with this error message:
[05/05/2022-01:44:01] [TRT] [V] Running: ConstWeightsQuantizeFusion
[05/05/2022-01:44:01] [TRT] [V] ConstWeightsQuantizeFusion: Fusing update_block.flow_head.conv1.weight with QuantizeLinear_1641_quantize_scale_node
[05/05/2022-01:44:01] [TRT] [V] Running: ConstWeightsQuantizeFusion
[05/05/2022-01:44:01] [TRT] [V] ConstWeightsQuantizeFusion: Fusing update_block.flow_head.conv2.weight with QuantizeLinear_1648_quantize_scale_node
[05/05/2022-01:44:01] [TRT] [V] Running: VanillaSwapWithFollowingQ
[05/05/2022-01:44:01] [TRT] [V] Swapping Relu_631 with QuantizeLinear_652_quantize_scale_node
[05/05/2022-01:44:01] [TRT] [V] Running: SplitQAcrossPrecedingFanIn
[05/05/2022-01:44:01] [TRT] [V] Running: SplitQAcrossPrecedingFanIn
[05/05/2022-01:44:02] [TRT] [V] Running: SplitQAcrossPrecedingFanIn
[05/05/2022-01:44:02] [TRT] [V] Running: SplitQAcrossPrecedingFanIn
[05/05/2022-01:44:02] [TRT] [V] Running: SplitQAcrossPrecedingFanIn
[05/05/2022-01:44:02] [TRT] [E] 2: [checkSanity.cpp::checkSanity::106] Error Code 2: Internal Error (Assertion regionNames.find(r->name) == regionNames.end() failed. Found duplicate region name onnx::Concat_1406_clone_1)
[05/05/2022-01:44:02] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::619] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
It sounds like there is a duplicate node in the graph, but I could not find one when searching through the ONNX model. Please help me with this, thanks!
PS: The non-quantized version of the model can be successfully compiled to an engine without any issue.
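To double-check that the ONNX file itself contains no duplicate names, something like the following can be used (a sketch: the duplicate check is pure stdlib; the part that gathers names from the graph is shown in comments because it needs the onnx package and the model file, whose path here is a placeholder):

```python
# Scan a list of graph names for duplicates, since the builder
# complains about a duplicate region name.
from collections import Counter

def find_duplicates(names):
    """Return, sorted, the names that occur more than once."""
    return sorted(n for n, c in Counter(names).items() if c > 1)

# With the onnx package installed, graph names can be gathered as:
#   import onnx
#   model = onnx.load("model.onnx")   # placeholder path
#   g = model.graph
#   names = ([n.name for n in g.node]
#            + [t.name for t in g.initializer]
#            + [v.name for v in g.value_info])
#   print(find_duplicates(names))

# Self-contained demonstration on a toy name list:
print(find_duplicates(["Concat_1406", "Relu_631", "Concat_1406"]))
```

In my case this kind of search turns up nothing, which, together with the "_clone_1" suffix in the error, suggests the duplicate name is produced internally during TensorRT's graph optimization rather than being present in the ONNX file.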
Environment
TensorRT Version: 8.4 GA
GPU Type: NVIDIA RTX 3090
Nvidia Driver Version: 11.6
CUDA Version: 11.5
CUDNN Version: 11.5
Operating System + Version: Ubuntu 20.04 LTS
Python Version (if applicable): 3.8.12
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.12 (nightly build as of May 5, 2022)
Baremetal or Container (if container which image + tag):
Relevant Files
ONNX file
Steps To Reproduce
Running trtexec with the following arguments reproduces the error:
trtexec --onnx=onnx_dir --saveEngine=engine_dir --workspace=4096 --int8 --fp16 --noTF32 --verbose --noDataTransfers --separateProfileRun --dumpProfile --useCudaGraph > log_dir