trtexec cannot convert QAT ONNX model to TensorRT engine

Hi, I used TensorFlow 2.4 to train a mixed-precision model, then fine-tuned it with the built-in quantization-aware training (QAT) and saved it as a .onnx file. But when I used the following command to convert the ONNX model to a TensorRT engine, it raised an error. Details are shown below.

  • JetPack 4.6.2
  • TensorFlow 2.4
  • tf2onnx 1.12

./trtexec --onnx=quantized_dirfrom_qat_model.onnx --saveEngine=quantized_dirfrom_qat_model_trt8.2.trt --minShapes=input_3:1x224x224x1 --optShapes=input_3:2x224x224x1 --maxShapes=input_3:2x224x224x1 --workspace=4096 --verbose  --int8

[08/03/2022-14:36:24] [V] [TRT] sequential/mobilenet_0.15_224/quant_conv_dw_6/depthwise;sequential/mobilenet_0.15_224/quant_conv_dw_6/LastValueQuant/FakeQuantWithMinMaxVarsPerChannel_dequant [DequantizeLinear] inputs: [sequential/mobilenet_0.15_224/quant_conv_dw_6/depthwise;sequential/mobilenet_0.15_224/quant_conv_dw_6/LastValueQuant/FakeQuantWithMinMaxVarsPerChannel -> (38, 1, 3, 3)[INT8]], [scale__99 -> (38)[FLOAT]], [zero_point__52 -> (38)[INT8]],
[08/03/2022-14:36:24] [V] [TRT] Registering layer: sequential/mobilenet_0.15_224/quant_conv_dw_6/depthwise;sequential/mobilenet_0.15_224/quant_conv_dw_6/LastValueQuant/FakeQuantWithMinMaxVarsPerChannel for ONNX node: sequential/mobilenet_0.15_224/quant_conv_dw_6/depthwise;sequential/mobilenet_0.15_224/quant_conv_dw_6/LastValueQuant/FakeQuantWithMinMaxVarsPerChannel
[08/03/2022-14:36:24] [E] Error[3]: sequential/mobilenet_0.15_224/quant_conv_dw_6/depthwise;sequential/mobilenet_0.15_224/quant_conv_dw_6/LastValueQuant/FakeQuantWithMinMaxVarsPerChannel: invalid weights type of Int8
Segmentation fault (core dumped)

I also attached my test ONNX model:
quantized_dirfrom_qat_model.onnx (232.8 KB)
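For context, the DequantizeLinear node named in the log applies the standard ONNX dequantization formula to the per-channel INT8 depthwise weights (shape (38, 1, 3, 3), with one scale and zero-point per output channel). A minimal NumPy sketch of that computation (the random data is illustrative, not from the attached model):

```python
import numpy as np

def dequantize_linear(x_int8, scale, zero_point, axis=0):
    # ONNX DequantizeLinear: real = (int8 - zero_point) * scale,
    # with scale/zero_point broadcast per channel along `axis`.
    shape = [1] * x_int8.ndim
    shape[axis] = -1
    zp = zero_point.reshape(shape).astype(np.float32)
    s = scale.reshape(shape).astype(np.float32)
    return (x_int8.astype(np.float32) - zp) * s

# Shapes matching the log: (38, 1, 3, 3) weights, per-channel scale/zero-point.
w = np.random.randint(-128, 128, size=(38, 1, 3, 3), dtype=np.int8)
scale = np.random.rand(38).astype(np.float32)
zero_point = np.zeros(38, dtype=np.int8)  # symmetric quantization (zp == 0)
w_fp32 = dequantize_linear(w, scale, zero_point)
```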

Hi,

Thanks for reporting this.
Confirmed that we can reproduce this issue on Orin.

Do you use JetPack 5.0.1 DP?
On JetPack 5, the TensorRT version should be 8.4, which does not match the version in your filename (trt8.2).

Thanks.

Here is the information from jtop, and it is still not working.


[08/04/2022-05:55:04] [V] [TRT] Searching for input: sequential/mobilenet_0.15_224/quant_conv_dw_6/depthwise;sequential/mobilenet_0.15_224/quant_conv_dw_6/LastValueQuant/FakeQuantWithMinMaxVarsPerChannel
[08/04/2022-05:55:04] [V] [TRT] Searching for input: scale__99
[08/04/2022-05:55:04] [V] [TRT] Searching for input: zero_point__52
[08/04/2022-05:55:04] [V] [TRT] sequential/mobilenet_0.15_224/quant_conv_dw_6/depthwise;sequential/mobilenet_0.15_224/quant_conv_dw_6/LastValueQuant/FakeQuantWithMinMaxVarsPerChannel_dequant [DequantizeLinear] inputs: [sequential/mobilenet_0.15_224/quant_conv_dw_6/depthwise;sequential/mobilenet_0.15_224/quant_conv_dw_6/LastValueQuant/FakeQuantWithMinMaxVarsPerChannel -> (38, 1, 3, 3)[INT8]], [scale__99 -> (38)[FLOAT]], [zero_point__52 -> (38)[INT8]], 
[08/04/2022-05:55:04] [V] [TRT] Registering layer: sequential/mobilenet_0.15_224/quant_conv_dw_6/depthwise;sequential/mobilenet_0.15_224/quant_conv_dw_6/LastValueQuant/FakeQuantWithMinMaxVarsPerChannel for ONNX node: sequential/mobilenet_0.15_224/quant_conv_dw_6/depthwise;sequential/mobilenet_0.15_224/quant_conv_dw_6/LastValueQuant/FakeQuantWithMinMaxVarsPerChannel
[08/04/2022-05:55:04] [E] Error[3]: sequential/mobilenet_0.15_224/quant_conv_dw_6/depthwise;sequential/mobilenet_0.15_224/quant_conv_dw_6/LastValueQuant/FakeQuantWithMinMaxVarsPerChannel: invalid weights type of Int8
Segmentation fault (core dumped)

Hi, any updates?

Hi,

It looks like you are using Xavier instead of Orin.
I will move your topic to the Xavier board.

In your QAT model, there are some usages that are not supported by TensorRT:

  • Int8 zero-point
  • Quantizing the bias
  • Asymmetric quantization (zero-point != 0)
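As a quick check for these patterns before re-exporting, you can scan the Q/DQ nodes in the ONNX graph for nonzero zero-points. A minimal sketch, assuming the `onnx` package is available; `model.onnx` is a placeholder path, and the zero-point helper is kept pure so it works on raw arrays too:

```python
def is_trt_friendly_zero_point(zero_points):
    # TensorRT 8.x requires symmetric quantization: every zero-point
    # must be exactly 0 (per-tensor or per-channel).
    return all(int(zp) == 0 for zp in zero_points)

def scan_qdq_nodes(model_path):
    # Report Q/DQ nodes whose zero-point initializer is nonzero.
    import onnx
    from onnx import numpy_helper
    model = onnx.load(model_path)
    inits = {i.name: numpy_helper.to_array(i) for i in model.graph.initializer}
    offenders = []
    for node in model.graph.node:
        if node.op_type in ("QuantizeLinear", "DequantizeLinear") and len(node.input) > 2:
            zp = inits.get(node.input[2])
            if zp is not None and not is_trt_friendly_zero_point(zp.flatten()):
                offenders.append(node.name)
    return offenders

# Usage (placeholder path): scan_qdq_nodes("model.onnx")
```

Any node names this returns mark asymmetric quantization that TensorRT will reject.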

Please try our newly released quantization toolkit for TensorFlow to see if it helps:

Thanks.

From the link you posted, I saw that the TensorRT requirement is 8.4, but we are using TensorRT 8.2. Is this tool compatible with TensorRT 8.2?

Hi,

Do you have dependencies on JetPack 4.6.2?
If not, you can get TensorRT 8.4 with JetPack 5.

Thanks.