Converting a model from Quantization Aware Training to TensorRT int8 without applying calibration

Description

I fine-tuned a detector model in TensorFlow 2.3 with Quantization Aware Training (QAT), converted the model to ONNX, and then tried to build an int8 TensorRT engine from it.
So far I can only build the int8 engine when I also apply the Post Training Quantization (PTQ) process with a calibration dataset. I want to be able to convert the model to an int8 TensorRT engine without calibration, since calibration should not be needed after Quantization Aware Training.
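For reference, the calibration path that does work uses a standard entropy calibrator, roughly along these lines (a trimmed sketch, not my exact code; the batch size, preprocessing, and cache file name are illustrative):

```python
import os
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 (initializes the CUDA context)
import tensorrt as trt

class ImageCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed image batches to TensorRT during PTQ calibration."""
    def __init__(self, batches, cache_file="calib.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(batches)        # iterable of np.float32 NCHW arrays
        self.cache_file = cache_file
        self.device_input = None

    def get_batch_size(self):
        return 1                            # assuming batch size 1 here

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None                     # no more data -> calibration finished
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```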
I followed your slides here, specifically the approach shown on page 32, but I get the following error:

[TensorRT] ERROR: …/builder/Network.cpp (1653) - Assertion Error in validateExplicitPrecision: 0 (layer.getNbInputs() == 2)

I don’t understand how having two inputs is relevant here; my model takes a single input (an image).
This is the function I used (Python) for building the engine:
build_engine.py (4.2 KB)

If I call it with int8_calibration_flag=False, I get the above error.
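In outline, the no-calibration branch of the attached function does something like this (a simplified sketch, not the full attached file; the function name is illustrative, I'm assuming the EXPLICIT_PRECISION network flag from the slides, and the workspace size is arbitrary):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_qat_engine(onnx_path):
    """Build an int8 engine from a QAT ONNX model without a calibrator."""
    builder = trt.Builder(TRT_LOGGER)
    flags = (1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)) | \
            (1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION))
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30
    # No calibrator is attached: the int8 scales are expected to come
    # from the quantize/dequantize nodes that QAT embedded in the model.
    config.set_flag(trt.BuilderFlag.INT8)
    return builder.build_engine(network, config)
```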

Could you please supply a working example of converting an ONNX model originating from Quantization Aware Training to a TensorRT engine with int8 precision, without calibration?

Environment

TensorRT Version : 7.1.2
CUDA Version : 11.0
Operating System + Version : Ubuntu 18.04
Python Version (if applicable) : 3.6
TensorFlow Version (if applicable) : 2.3 (the model was trained in TF 2.3, converted to ONNX, and then converted to a TensorRT engine)

Unfortunately, I can’t share the model in question.

Any help would be appreciated!

Hi @weissrael,

We don’t have a sample to share, but our developer guide covers the details of QAT with TensorFlow and conversion to an ONNX model.
Please refer to it.
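At a high level, the TF-side workflow described there looks like this (an illustrative sketch, not code from the guide; the model, input shape, and opset are placeholders, and exact APIs depend on your tensorflow_model_optimization and tf2onnx versions):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot
import tf2onnx

# Placeholder backbone; substitute your detector model.
base_model = tf.keras.applications.MobileNetV2(weights=None)

# Wrap the model with fake-quantization nodes for QAT, then fine-tune.
qat_model = tfmot.quantization.keras.quantize_model(base_model)
qat_model.compile(optimizer="adam", loss="categorical_crossentropy")
# qat_model.fit(train_ds, epochs=...)  # fine-tune on your data

# Export to ONNX; the QAT fake-quant ops are converted to
# QuantizeLinear/DequantizeLinear nodes that TensorRT can read scales from.
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
tf2onnx.convert.from_keras(qat_model, input_signature=spec,
                           opset=13, output_path="qat_model.onnx")
```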

Thank you.