## Description

I trained a detection model in TensorFlow 2 with Quantization-Aware Training (QAT). The quantization ops were inserted after the following pattern:

Conv2D → BatchNorm → Activation (as NVIDIA's QAT guidelines recommend)

For a tensor `x`, I wanted the quantization scale to be computed per quantized tensor, as follows:

`x = tf.quantization.quantize_and_dequantize(x, input_min=-127, input_max=127, range_given=False)`

The argument `range_given=False` means that `input_min` / `input_max` are ignored, so the min/max values are taken from each quantized tensor at runtime.
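To make the two modes concrete, here is a pure-Python sketch of what this fake-quantization op does (an approximation of `tf.quantization.quantize_and_dequantize`, assuming symmetric signed 8-bit quantization; the function name and simplifications are mine):

```python
def quantize_dequantize(x, input_min=None, input_max=None, range_given=True):
    """Sketch of symmetric signed 8-bit fake quantization.

    With range_given=False, input_min/input_max are ignored and the
    range is taken from the tensor itself -- which is why the exported
    ONNX graph computes the scale dynamically (Max node) instead of
    storing it as a constant.
    """
    if not range_given:
        input_min, input_max = min(x), max(x)
    scale = max(abs(input_min), abs(input_max)) / 127.0
    # quantize to an integer in [-127, 127], then dequantize back to float
    return [max(-127, min(127, round(v / scale))) * scale for v in x]
```

With `range_given=False`, the scale depends on the tensor's values, so it cannot be a constant initializer in the exported graph.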

`tf2onnx` recently added support for this mode, but TensorRT doesn't seem to support such ONNX models: when trying to convert the model to TRT, I get the following error while parsing the ONNX model:

```
[TensorRT] VERBOSE: ModelImporter.cpp:125: QuantLinearNode__20 [QuantizeLinear] inputs: [StatefulPartitionedCall/functional_3/tiny_yolov3/yolo_darknet/leaky_re_lu/LeakyRelu:0 -> (1, 16, 224, 1408)], [Max__18:0 -> ()], [zero_point__139 -> ()],
ERROR: Failed to parse the ONNX file.
In node -1 (importQuantizeLinear): INVALID_NODE: Assertion failed: inputs.at(1).is_weights()
```

The structure of a quantized layer with `range_given=True` (using the same quantization x_scale for all layers, 64/127 ≈ 0.504):

The structure of a quantized layer with `range_given=False` (getting the quantization x_scale from each tensor's values):

I tried running `onnx-graphsurgeon`'s `fold_constants` function on the ONNX graph, but this didn't help: I got the same error.

I would like to know if there's a workaround for converting such an ONNX model to TensorRT without needing to update JetPack, etc.
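One workaround I'm considering (an assumption on my part, not yet verified on this model): since the parser's assertion `inputs.at(1).is_weights()` requires the QuantizeLinear scale to be a constant initializer, the ranges could be collected offline from calibration data and the model re-exported with `range_given=True`, so tf2onnx emits the scale as a constant. A sketch of the range collection (function name is hypothetical):

```python
def per_tensor_range(calib_batches):
    """Collect a symmetric per-tensor range from calibration batches.

    The returned (input_min, input_max) pair would be passed to
    quantize_and_dequantize with range_given=True, so the exported
    ONNX graph carries the scale as a constant initializer instead of
    a dynamically computed Max node.
    """
    lo = min(min(batch) for batch in calib_batches)
    hi = max(max(batch) for batch in calib_batches)
    bound = max(abs(lo), abs(hi))
    return -bound, bound
```

This keeps the per-tensor granularity of `range_given=False` while producing a graph shape that the TensorRT ONNX parser accepts.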

## Environment

**TensorRT Version** : 7.1.2

**CUDA Version** : 11.0

**Operating System + Version** : Ubuntu 18.04

**Python Version (if applicable)** : 3.6

**TensorFlow Version (if applicable)** : 2.3 (the model was trained on TF 2.3, converted to ONNX, and then converted to a TensorRT engine)

I can’t share the relevant model for this.

Any help will be appreciated!