Converting a model from Quantization Aware Training to TRT without applying calibration

Description

I fine-tuned a detector model in TensorFlow 2.3 with Quantization Aware Training (QAT). I converted the model to ONNX and then tried to convert it to an int8 TensorRT engine.
I can only build the int8 engine when I also apply the Post Training Quantization process with a calibration dataset, but I want the option to convert the model to an int8 TensorRT engine without calibration, since calibration shouldn't be needed after Quantization Aware Training.
I followed your slides here - page 32, specifically:

but I get the following error:

[TensorRT] ERROR: …/builder/Network.cpp (1653) - Assertion Error in validateExplicitPrecision: 0 (layer.getNbInputs() == 2)

I don't understand how having two inputs is relevant here; my model takes a single input (an image).
This is the function I used (Python) for building the engine:
build_engine.py (4.2 KB)

If I call it with int8_calibration_flag=False, I get the above error.
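For reference, when int8_calibration_flag=False the function boils down to roughly the following (a trimmed, simplified sketch of my build_engine.py, not the full script; the function name and defaults here are illustrative):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_int8_engine_no_calib(onnx_path, workspace_gb=2):
    # Explicit batch + explicit precision, as suggested for QAT/QDQ networks.
    flags = (1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)) | \
            (1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION))

    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None

    config = builder.create_builder_config()
    config.max_workspace_size = workspace_gb << 30
    config.set_flag(trt.BuilderFlag.INT8)  # int8 enabled, but no calibrator attached

    # This build call is where the validateExplicitPrecision assertion fires.
    return builder.build_engine(network, config)
```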

Could you please supply a working example of converting an ONNX model that originated from Quantization Aware Training to a TensorRT engine with int8 precision, without calibration?

Environment

TensorRT Version : 7.1.2
CUDA Version : 11.0
Operating System + Version : Ubuntu 18.04
Python Version (if applicable) : 3.6
TensorFlow Version (if applicable) : 2.3 (the model was trained in TF 2.3, converted to ONNX, and then converted to a TensorRT engine)

I can’t share the relevant model for this.

Any help will be appreciated!

Hi @weissrael,

We don't have samples to share, but we have a developer guide with details on QAT using TensorFlow and converting to an ONNX model.
For your reference.

Thank you.

@spolisetty thank you for the links.
When I wrote the post, I had applied the quantization to the model using Keras quantize_model, since my model is composed of concatenated Keras Model layers (training is with TF2).
I then tried adjusting my quantization training to what NVIDIA uses in the docs you shared: TF's quantize_and_dequantize_v2. The ONNX graph looks valid; here is how the structure of the quantization layers looks:


but I get the same error when converting the ONNX model to TRT:

[TensorRT] ERROR: ../builder/Network.cpp (1653) - Assertion Error in validateExplicitPrecision: 0 (layer.getNbInputs() == 2)

That's the same error I had in the original post above.
The engine building uses what NVIDIA suggests in the build_engine sample here.

As can be seen on the main page of NVIDIA's QAT sample repo, the QAT of my model was done with the recommended tf.quantization.quantize_and_dequantize function.
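For reference, the fake-quantization in my training follows this general pattern (a minimal sketch, not my exact layer; the class and variable names are illustrative):

```python
import tensorflow as tf

class FakeQuant(tf.keras.layers.Layer):
    """Illustrative QDQ wrapper: symmetric signed int8 fake quantization
    with a clipping range stored as a layer weight, in the spirit of the
    QAT described above."""

    def __init__(self, init_range=6.0, **kwargs):
        super().__init__(**kwargs)
        self.init_range = init_range

    def build(self, input_shape):
        # The range is kept as a weight so it is part of the saved model;
        # whether gradients update it depends on the TF version / op gradient.
        self.quant_range = self.add_weight(
            name="quant_range",
            shape=(),
            initializer=tf.keras.initializers.Constant(self.init_range),
            trainable=True,
        )

    def call(self, inputs):
        # quantize_and_dequantize keeps the tensor in float but rounds it
        # onto the int8 grid defined by [-quant_range, quant_range].
        return tf.quantization.quantize_and_dequantize(
            inputs,
            input_min=-self.quant_range,
            input_max=self.quant_range,
            signed_input=True,
            num_bits=8,
            range_given=True,
        )
```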

So the main questions that I still have are:

  1. What causes such an error? How can it be solved?
  2. Why use a pre-defined quant_scale and dequant_scale when the model was trained with QAT? How can TRT extract these attributes from the model to set the dynamic range of each layer (see the sketch after this list)? This is needed to convert to TRT successfully without calibration.
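To make question 2 concrete: my assumption is that the learnt scales already sit in the ONNX graph as inputs to the QuantizeLinear / DequantizeLinear nodes, so in principle they could be read back like this (a rough sketch with a hypothetical helper name, not something I expect to be the official way):

```python
import onnx
from onnx import numpy_helper

def collect_qdq_ranges(onnx_path):
    """Hypothetical helper: read the scale of every QuantizeLinear node
    and turn it into a symmetric int8 dynamic range (scale * 127)."""
    model = onnx.load(onnx_path)
    initializers = {init.name: numpy_helper.to_array(init)
                    for init in model.graph.initializer}

    ranges = {}
    for node in model.graph.node:
        if node.op_type == "QuantizeLinear":
            tensor_name = node.input[0]   # the tensor being quantized
            scale_name = node.input[1]    # y_scale in the ONNX spec
            if scale_name in initializers:
                scale = initializers[scale_name]
                if scale.size == 1:       # per-tensor (scalar) scale
                    ranges[tensor_name] = float(scale) * 127.0
    return ranges
```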

Hi @weissrael,

Could you please share the model and TRT-only reproduction scripts with us for better debugging?

Thank you.

Hi @spolisetty ,
I tried using trtexec to convert the ONNX model to TRT with the --int8 flag, and it converted the model successfully.
I tried to figure out why trtexec converts the QAT ONNX model to TRT successfully while our conversion code (which is based on NVIDIA samples) fails.
I checked the C++ source code of trtexec and saw that it calls a function that's crucial for QAT:
setTensorScales function
Our code doesn't have this logic, and neither do NVIDIA's Python code samples. The closest I've seen to it is this function, but that example handles only the input, so they didn't need to handle all the layers…
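For clarity, the kind of all-layers logic I mean is roughly the following in Python (my own sketch of the idea, assuming TRT 7's set_dynamic_range API; the range values are placeholders, not necessarily what trtexec uses):

```python
def set_uniform_tensor_scales(network, in_range=2.0, out_range=4.0):
    """Give every layer input/output tensor of a trt.INetworkDefinition a
    dynamic range, so the int8 builder does not require a calibrator."""
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        for j in range(layer.num_inputs):
            tensor = layer.get_input(j)
            if tensor is not None:
                tensor.set_dynamic_range(-in_range, in_range)
        for j in range(layer.num_outputs):
            tensor = layer.get_output(j)
            if tensor is not None:
                tensor.set_dynamic_range(-out_range, out_range)
```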

So, long story short: I used trtexec to convert to TRT and save the engine file. But I do think you should include in your Python conversion samples the logic from the trtexec source code that handles QAT models.

One question I still have regarding the setTensorScales function in the above link:
This function uses arbitrary inScale / outScale values for the dynamic range. I thought it should use the values learnt by the network; each QuantizeLinear / DequantizeLinear node has a scale field (y_dequantize_scale in my graph) which isn't used in this function for some reason. Since I assume this scaling is handled when parsing the model (the graph includes the QuantizeLinear / DequantizeLinear nodes), why does trtexec use arbitrary inScale / outScale values in this setTensorScales function?

Hi @weissrael,

Quantize/Dequantize handling will be improved in future releases.

Thank you.
