Building TensorRT 8 engine from ONNX quantized model fails

Description

Building a TensorRT 8 engine from an ONNX model that was quantized with the pytorch-quantization toolkit fails with a channel-mismatch assertion on a pruned ConvTranspose node.

Environment

TensorRT Version: 8.0.1.6
GPU Type: 2080
Nvidia Driver Version: 470.63.01
CUDA Version: 11.3
CUDNN Version: 8.0
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.7
PyTorch Version (if applicable): 1.9

Relevant Files

I successfully calibrated my pruned model using the pytorch-quantization toolkit and exported it to an ONNX file.
However, when I try to build the engine from it, I receive the following error:

[TensorRT] VERBOSE: Parsing node: QuantizeLinear_694 [QuantizeLinear]
[TensorRT] VERBOSE: Searching for input: uplayer3.weight
[TensorRT] VERBOSE: Searching for input: 1270
[TensorRT] VERBOSE: Searching for input: 1396
[TensorRT] VERBOSE: QuantizeLinear_694 [QuantizeLinear] inputs: [uplayer3.weight → (64, 55, 2, 2)[FLOAT]], [1270 → (55)[FLOAT]], [1396 → (55)[INT8]],
[TensorRT] VERBOSE: Registering layer: uplayer3.weight for ONNX node: uplayer3.weight
[TensorRT] VERBOSE: Registering tensor: 1273 for ONNX tensor: 1273
[TensorRT] VERBOSE: QuantizeLinear_694 [QuantizeLinear] outputs: [1273 → (64, 55, 2, 2)[FLOAT]],
[TensorRT] VERBOSE: Parsing node: DequantizeLinear_695 [DequantizeLinear]
[TensorRT] VERBOSE: Searching for input: 1273
[TensorRT] VERBOSE: Searching for input: 1270
[TensorRT] VERBOSE: Searching for input: 1396
[TensorRT] VERBOSE: DequantizeLinear_695 [DequantizeLinear] inputs: [1273 → (64, 55, 2, 2)[FLOAT]], [1270 → (55)[FLOAT]], [1396 → (55)[INT8]],
[TensorRT] VERBOSE: Registering tensor: 1274 for ONNX tensor: 1274
[TensorRT] VERBOSE: DequantizeLinear_695 [DequantizeLinear] outputs: [1274 → (64, 55, 2, 2)[FLOAT]],
[TensorRT] VERBOSE: Parsing node: ConvTranspose_696 [ConvTranspose]
[TensorRT] VERBOSE: Searching for input: 1269
[TensorRT] VERBOSE: Searching for input: 1274
[TensorRT] VERBOSE: ConvTranspose_696 [ConvTranspose] inputs: [1269 → (1, 64, 864, 120)[FLOAT]], [1274 → (64, 55, 2, 2)[FLOAT]],
[TensorRT] VERBOSE: Convolution input dimensions: (1, 64, 864, 120)
Traceback (most recent call last):
File "/vayaalgo/Work/Pruning/Convert2TRT/main.py", line 247, in <module>
import_ONNX(cfg)
File "/vayaalgo/Work/Pruning/Convert2TRT/utils/models_quant.py", line 15, in import_ONNX
engine = backend.prepare(depthnet, verbose=True, device='CUDA:0')
File "/usr/local/lib/python3.7/dist-packages/onnx_tensorrt-8.0.1-py3.7.egg/onnx_tensorrt/backend.py", line 236, in prepare
File "/usr/local/lib/python3.7/dist-packages/onnx_tensorrt-8.0.1-py3.7.egg/onnx_tensorrt/backend.py", line 68, in __init__
RuntimeError: While parsing node number 696:
onnx2trt_utils.cpp:2128 In function convDeconvMultiInput:
[6] Assertion failed: (nChannel == -1 || C * ngroup == nChannel) && "The attribute group and the kernel shape misalign with the channel size of the input tensor. "

The problem comes from the ConvTranspose node: its input tensor has shape 1x64x864x120, but its weights have shape 64x55x2x2, meaning the number of input channels (64) is not equal to the number of output channels (55).
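For context, the QuantizeLinear_694/DequantizeLinear_695 pair in the log performs per-channel weight quantization with one scale per output channel (the (55)-shaped tensor). A rough numpy sketch of what that Q/DQ pair computes on this weight follows; the shapes come from the log above, but the weight values and scales here are made up, and the zero-point is assumed to be 0 (symmetric quantization):

```python
import numpy as np

rng = np.random.default_rng(0)

# Weight of ConvTranspose_696: (C_in=64, C_out=55, kH=2, kW=2), per the log
w = rng.standard_normal((64, 55, 2, 2)).astype(np.float32)

# One scale per output channel, shape (55,) as in the log; for ConvTranspose
# weights the output-channel axis is axis 1
scale = np.abs(w).max(axis=(0, 2, 3)) / 127.0

# QuantizeLinear: round(w / scale), clamped to the int8 range
q = np.clip(np.round(w / scale[None, :, None, None]), -128, 127).astype(np.int8)

# DequantizeLinear: map back to float
w_dq = q.astype(np.float32) * scale[None, :, None, None]

print(q.shape, w_dq.shape)  # shapes are preserved: (64, 55, 2, 2)
```

The round trip keeps the tensor shape and dtype of the original weight, which is why the parser still reports [1274 → (64, 55, 2, 2)[FLOAT]] after the DequantizeLinear node.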

I already found the same issue reported here: Assertion failed: QTA onnx imported by tensorrt 8 get error · Issue #1293 · NVIDIA/TensorRT · GitHub

Steps To Reproduce

My code:

import numpy as np
import onnx
import onnx_tensorrt.backend as backend

depthnet = onnx.load('quantized_depthnet.onnx')
engine = backend.prepare(depthnet, verbose=True, device='CUDA:0')
depthnet_input_tensor = np.random.random(size=(1, 3, 3456, 480)).astype(np.float32)
output_data = engine.run(depthnet_input_tensor)

My model is also attached: quantized_depthnet.onnx (7.7 MB)

I would like to stress that the ConvTranspose block's input-channel count differing from its output-channel count is a result of pruning this model.
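One way to see why the unequal channel counts matter: per the ONNX spec, ConvTranspose weights are laid out as (C_in, C_out/groups, kH, kW), so the shapes in the log are self-consistent; but the assertion text ("kernel shape misalign with the channel size of the input tensor") would fire if the kernel were checked with the Conv layout (C_out, C_in/groups, kH, kW), a mismatch that only becomes visible when C_in != C_out. This is a hypothesis consistent with the assertion, not a confirmed root cause; the shapes below are taken from the log:

```python
# Shapes from the verbose log above
input_shape = (1, 64, 864, 120)   # N, C, H, W
weight_shape = (64, 55, 2, 2)     # ConvTranspose layout: (C_in, C_out/groups, kH, kW)
groups = 1

c_in = input_shape[1]

# Checked against the ConvTranspose layout, the shapes agree:
deconv_ok = weight_shape[0] * groups == c_in        # 64 == 64

# Checked against the Conv layout (C_out, C_in/groups, kH, kW), they do not,
# and the mismatch is only exposed because pruning made C_in != C_out:
conv_style_c = weight_shape[1] * groups             # 55
print(deconv_ok, conv_style_c == c_in)              # True False
```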

Hi,
Please share the ONNX model and the script, if not shared already, so that we can assist you better.
Meanwhile, you can try a few things:
https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#onnx-export

  1. Validate your model with the below snippet:

check_model.py

import sys
import onnx

filename = sys.argv[1]  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
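For step 2, a concrete invocation could look like the sketch below. It assumes trtexec is on your PATH and reuses the model filename from the original post; --int8 enables INT8 precision so the Q/DQ nodes are honored, and --verbose produces the log requested below:

```python
import shutil
import subprocess

MODEL = "quantized_depthnet.onnx"  # model attached to the original post

# --int8 enables INT8 kernels for the quantized graph; --verbose emits the
# parser/builder log useful for debugging
cmd = ["trtexec", f"--onnx={MODEL}", "--int8", "--verbose"]

if shutil.which("trtexec") is None:
    print("trtexec not found on PATH; install TensorRT or adjust PATH")
else:
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Save the full verbose log so it can be shared on the forum
    with open("trtexec_verbose.log", "w") as f:
        f.write(result.stdout)
```

Running trtexec directly from a shell with the same flags is equivalent; the wrapper just captures the log to a file.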
In case you are still facing the issue, we request you to share the trtexec "--verbose" log for further debugging.
Thanks!

Hi NVES,

Any updates besides the automatic answer?

Hi,

We could reproduce the issue. Please allow us some time to work on this.

Thank you.

Hi,

As you mentioned, the error is caused by the input channel count not matching. We observe that ConvTranspose_665 and the previous nodes have correct inputs. We recommend you correct it and try again.

You can check out a similar issue here: Error in converting caffe xilinx yolo v3 model to Tensorrt

Thank you.