Converting a model from Quantization Aware Training to TRT without applying calibration

Description

I fine-tuned a detector model in TensorFlow 2.3 with Quantization Aware Training (QAT). I converted the model to ONNX and then tried to convert it to an int8 TensorRT engine.
I can only build the int8 engine when I also apply the Post Training Quantization process with a calibration dataset, but I want the option to convert the model to an int8 TensorRT engine without calibration, since calibration shouldn't be needed after Quantization Aware Training.
I followed your slides here - page 32, specifically:

but I get the following error:

[TensorRT] ERROR: …/builder/Network.cpp (1653) - Assertion Error in validateExplicitPrecision: 0 (layer.getNbInputs() == 2)

I don't understand how having two inputs is relevant here; my model takes a single input (an image).
This is the function I used (Python) for building the engine:
build_engine.py (4.2 KB)

If I call it with int8_calibration_flag=False, I get the above error.
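For reference, when int8_calibration_flag=False the function boils down to roughly the following (a trimmed, simplified sketch of my build_engine.py, not the full script; the function name and defaults here are illustrative):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_int8_engine_no_calib(onnx_path, workspace_gb=2):
    # Explicit batch + explicit precision, as suggested for QAT/QDQ networks.
    flags = (1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)) | \
            (1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION))

    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None

    config = builder.create_builder_config()
    config.max_workspace_size = workspace_gb << 30
    config.set_flag(trt.BuilderFlag.INT8)  # int8 enabled, but no calibrator attached

    # This build call is where the validateExplicitPrecision assertion fires.
    return builder.build_engine(network, config)
```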

Could you please supply a working example of converting an ONNX model that originated from Quantization Aware Training to a TensorRT engine with int8 precision, without calibration?

Environment

TensorRT Version : 7.1.2
CUDA Version : 11.0
Operating System + Version : Ubuntu 18.04
Python Version (if applicable) : 3.6
TensorFlow Version (if applicable) : 2.3 (the model was trained in TF 2.3, converted to ONNX, and then converted to a TensorRT engine)

I can’t share the relevant model for this.

Any help will be appreciated!

Hi @weissrael,

We don't have samples to share, but we have a developer guide with details on QAT using TensorFlow and converting to an ONNX model.
For your reference.

Thank you.

@spolisetty thank you for the links.
When I wrote the post, I had applied the quantization to the model using Keras quantize_model, since my model is composed of concatenated Keras Model layers (training is with TF2).
I then tried adjusting my quantization training to what NVIDIA uses in the docs you shared: TF's quantize_and_dequantize_v2. The ONNX graph looks valid; here is how the structure of the quantization layers looks:


but I get the same error when converting the ONNX model to TRT:

[TensorRT] ERROR: ../builder/Network.cpp (1653) - Assertion Error in validateExplicitPrecision: 0 (layer.getNbInputs() == 2)

That's the same error I had in the original post above.
The engine building uses what NVIDIA suggests in the build_engine sample here.

As can be seen on the main page of NVIDIA's QAT sample repo, the QAT of my model was done with the recommended tf.quantization.quantize_and_dequantize function.
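For reference, the fake-quantization in my training follows this general pattern (a minimal sketch, not my exact layer; the class and variable names are illustrative):

```python
import tensorflow as tf

class FakeQuant(tf.keras.layers.Layer):
    """Illustrative QDQ wrapper: symmetric signed int8 fake quantization
    with a clipping range stored as a layer weight, in the spirit of the
    QAT described above."""

    def __init__(self, init_range=6.0, **kwargs):
        super().__init__(**kwargs)
        self.init_range = init_range

    def build(self, input_shape):
        # The range is kept as a weight so it is part of the saved model;
        # whether gradients update it depends on the TF version / op gradient.
        self.quant_range = self.add_weight(
            name="quant_range",
            shape=(),
            initializer=tf.keras.initializers.Constant(self.init_range),
            trainable=True,
        )

    def call(self, inputs):
        # quantize_and_dequantize keeps the tensor in float but rounds it
        # onto the int8 grid defined by [-quant_range, quant_range].
        return tf.quantization.quantize_and_dequantize(
            inputs,
            input_min=-self.quant_range,
            input_max=self.quant_range,
            signed_input=True,
            num_bits=8,
            range_given=True,
        )
```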

So the main questions that I still have are:

  1. What causes such an error? How can it be solved?
  2. Why use a pre-defined quant_scale and dequant_scale when the model was trained with QAT? How can TRT extract these attributes from the model to set the dynamic range of each layer (see the sketch after this list)? This is needed to convert to TRT successfully without calibration.
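To make question 2 concrete: my assumption is that the learnt scales already sit in the ONNX graph as inputs to the QuantizeLinear / DequantizeLinear nodes, so in principle they could be read back like this (a rough sketch with a hypothetical helper name, not something I expect to be the official way):

```python
import onnx
from onnx import numpy_helper

def collect_qdq_ranges(onnx_path):
    """Hypothetical helper: read the scale of every QuantizeLinear node
    and turn it into a symmetric int8 dynamic range (scale * 127)."""
    model = onnx.load(onnx_path)
    initializers = {init.name: numpy_helper.to_array(init)
                    for init in model.graph.initializer}

    ranges = {}
    for node in model.graph.node:
        if node.op_type == "QuantizeLinear":
            tensor_name = node.input[0]   # the tensor being quantized
            scale_name = node.input[1]    # y_scale in the ONNX spec
            if scale_name in initializers:
                scale = initializers[scale_name]
                if scale.size == 1:       # per-tensor (scalar) scale
                    ranges[tensor_name] = float(scale) * 127.0
    return ranges
```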

Hi @weissrael,

Could you please share the model and TRT-only reproduction scripts with us for better debugging?

Thank you.

Hi @spolisetty ,
I tried using trtexec to convert the ONNX model to TRT with the --int8 flag, and it converted the model successfully.
I tried to figure out why trtexec converts the QAT ONNX model to TRT successfully while our conversion code (which is based on NVIDIA samples) fails.
I checked the C++ source code of trtexec and saw that it calls a function that's crucial for QAT:
setTensorScales function
Our code doesn't have this logic, and neither do NVIDIA's Python code samples. The closest I've seen to it is this function, but that example handles only the input, so they didn't need to handle all the layers…
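For clarity, the kind of all-layers logic I mean is roughly the following in Python (my own sketch of the idea, assuming TRT 7's set_dynamic_range API; the range values are placeholders, not necessarily what trtexec uses):

```python
def set_uniform_tensor_scales(network, in_range=2.0, out_range=4.0):
    """Give every layer input/output tensor of a trt.INetworkDefinition a
    dynamic range, so the int8 builder does not require a calibrator."""
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        for j in range(layer.num_inputs):
            tensor = layer.get_input(j)
            if tensor is not None:
                tensor.set_dynamic_range(-in_range, in_range)
        for j in range(layer.num_outputs):
            tensor = layer.get_output(j)
            if tensor is not None:
                tensor.set_dynamic_range(-out_range, out_range)
```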

So, long story short: I used trtexec to convert to TRT and save the engine file. But I do think you should include in your Python conversion samples the logic from the trtexec source code that handles QAT models.

One question I still have regarding the setTensorScales function in the above link:
This function uses arbitrary inScale / outScale values for the dynamic range. I thought it should use the values learnt by the network; each QuantizeLinear / DequantizeLinear node has a scale field (y_dequantize_scale in my graph) which isn't used in this function for some reason. Since I assume this scaling is handled when parsing the model (the graph includes the QuantizeLinear / DequantizeLinear nodes), why does trtexec use arbitrary inScale / outScale values in this setTensorScales function?

Hi @weissrael,

Quantize/Dequantize handling will be improved in future releases.

Thank you.
