I fine-tuned a detector model in TensorFlow 2.3 with Quantization Aware Training (QAT), then converted the model to ONNX and tried to build an int8 TensorRT engine from it.
I can only build the int8 engine in TensorRT when I also run the Post Training Quantization process with a calibration dataset. However, I want the option to build the int8 engine without calibration, since calibration shouldn't be needed after Quantization Aware Training.
I followed your slides here, specifically page 32:
but I get the following error:
[TensorRT] ERROR: …/builder/Network.cpp (1653) - Assertion Error in validateExplicitPrecision: 0 (layer.getNbInputs() == 2)
I don’t understand how having two inputs is relevant here… My model takes a single input (an image).
This is the function I used (python) for building the engine:
build_engine.py (4.2 KB)
If I call it with
int8_calibration_flag=False I get the above error.
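To make the issue easier to reproduce without the attachment, here is a simplified sketch of the kind of build function I'm using (names and paths are illustrative, not the exact contents of build_engine.py). The intent is: when `int8_calibration_flag` is False, create the network with the EXPLICIT_PRECISION flag and set the INT8 builder flag without attaching a calibrator, relying on the quantization scales embedded by QAT:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_engine(onnx_path, int8_calibration_flag=False, calibrator=None):
    # Explicit batch is required by the ONNX parser in TensorRT 7.x
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    if not int8_calibration_flag:
        # QAT model: quantization scales come from the ONNX graph itself,
        # so mark the network as explicit-precision instead of calibrating
        flags |= 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION)

    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(flags) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:

        config = builder.create_builder_config()
        config.max_workspace_size = 1 << 30  # 1 GiB
        config.set_flag(trt.BuilderFlag.INT8)
        if int8_calibration_flag:
            # PTQ path: attach a calibrator built from a calibration dataset
            config.int8_calibrator = calibrator

        with open(onnx_path, "rb") as f:
            if not parser.parse(f.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None

        return builder.build_engine(network, config)
```

With `int8_calibration_flag=True` and a calibrator, this builds fine; the failure happens only on the explicit-precision path above.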
Could you please provide a working example of converting an ONNX model that originated from Quantization Aware Training into a TensorRT engine with int8 precision, without calibration?
TensorRT Version : 7.1.2
CUDA Version : 11.0
Operating System + Version : Ubuntu 18.04
Python Version (if applicable) : 3.6
TensorFlow Version (if applicable) : The model was trained on TensorFlow 2.3, converted to ONNX, and then converted to a TensorRT engine.
I can’t share the relevant model for this.
Any help would be appreciated!