I ran into an error during a quantization test with TensorRT 8

I created a quantized tflite file with the TensorFlow Model Optimization tool.
I then converted it to an ONNX file and tried to convert that to an rt (TensorRT) file.

I got the error shown below. How can I solve this problem?

Also, is the following conversion process possible?

tflite → onnx → rt

I understand that a tflite file does not run the way I want on Windows. Can I use the above process to check the inference-time reduction from quantization on Windows?

ModelImporter.cpp:119: Searching for input: scale__162
ModelImporter.cpp:119: Searching for input: zero_point__163
ModelImporter.cpp:125: sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars [DequantizeLinear] inputs: [sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars;sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars/ReadVariableOp/resource;sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars/ReadVariableOp_1/resource → ()], [scale__162 → ()], [zero_point__163 → ()],
onnx2trt_utils.cpp:286: TensorRT currenly supports only zero shifts values for QuatizeLinear/DequantizeLinear ops
sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd;sequential/efficientnetb0/quant_block7a_se_expand/Conv2D;sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd/ReadVariableOp/resource_dequant_dequantize_scale_node: at least 4 dimensions are required for input.
ImporterContext.hpp:120: Registering tensor: sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars for ONNX tensor: sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars
ModelImporter.cpp:179: sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars [DequantizeLinear] outputs: [sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars → ()],
ModelImporter.cpp:103: Parsing node: sequential/efficientnetb0/quant_normalization/sub [Sub]
ModelImporter.cpp:119: Searching for input: sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars
sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd;sequential/efficientnetb0/quant_block7a_se_expand/Conv2D;sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd/ReadVariableOp/resource_dequant_dequantize_scale_node: at least 4 dimensions are required for input.
ModelImporter.cpp:119: Searching for input: sequential/efficientnetb0/quant_normalization/Reshape
ModelImporter.cpp:125: sequential/efficientnetb0/quant_normalization/sub [Sub] inputs: [sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars → ()], [sequential/efficientnetb0/quant_normalization/Reshape → (1, 1, 1, 3)],
sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd;sequential/efficientnetb0/quant_block7a_se_expand/Conv2D;sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd/ReadVariableOp/resource_dequant_dequantize_scale_node: at least 4 dimensions are required for input.
sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd;sequential/efficientnetb0/quant_block7a_se_expand/Conv2D;sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd/ReadVariableOp/resource_dequant_dequantize_scale_node: at least 4 dimensions are required for input.
sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd;sequential/efficientnetb0/quant_block7a_se_expand/Conv2D;sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd/ReadVariableOp/resource_dequant_dequantize_scale_node: at least 4 dimensions are required for input.
sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd;sequential/efficientnetb0/quant_block7a_se_expand/Conv2D;sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd/ReadVariableOp/resource_dequant_dequantize_scale_node: at least 4 dimensions are required for input.
sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd;sequential/efficientnetb0/quant_block7a_se_expand/Conv2D;sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd/ReadVariableOp/resource_dequant_dequantize_scale_node: at least 4 dimensions are required for input.
sequential/efficientnetb0/quant_block7a_se_expand/Conv2D;sequential/efficientnetb0/quant_block7a_se_expand/LastValueQuant/FakeQuantWithMinMaxVars: invalid weights type of Int8
ERROR: onnx2trt_utils.cpp:680 In function elementwiseHelper:
[8] Assertion failed: tensor_ptr->getDimensions().nbDims == maxNbDims && "Failed to broadcast tensors elementwise!"
Assertion failed: tensor_ptr->getDimensions().nbDims == maxNbDims && "Failed to broadcast tensors elementwise!"
Building Cuda Engine
Network must have at least one output
Network validation failed.
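
The message from onnx2trt_utils.cpp:286 in the log above says that TensorRT accepts only zero shift (zero-point) values for QuantizeLinear/DequantizeLinear. Here is a minimal sketch of what that means arithmetically, assuming the usual affine int8 scheme q = round(x / scale) + zero_point. The function names and sample values are illustrative and are not taken from the model in this thread. TFLite-style quantization typically uses an asymmetric range for activations, which can produce a nonzero zero point, while a symmetric range always keeps it at 0.

```python
def quantize(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Map a float to an int8 code: q = clamp(round(x / scale) + zero_point)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point=0):
    """Map an int8 code back to a float: x ~= (q - zero_point) * scale."""
    return (q - zero_point) * scale


def symmetric_scale(values, qmax=127):
    """Scale for the symmetric range [-max|x|, +max|x|]; zero_point stays 0."""
    return max(abs(v) for v in values) / qmax


def asymmetric_params(values, qmin=-128, qmax=127):
    """Scale and zero point for the asymmetric range [min(x), max(x)].

    The range is widened to include 0.0 so that zero is exactly representable.
    """
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


if __name__ == "__main__":
    # Illustrative activation values (for example, after a ReLU6); an assumption.
    activations = [0.0, 1.0, 2.0, 6.0]

    scale_a, zp_a = asymmetric_params(activations)
    print("asymmetric: scale =", scale_a, "zero_point =", zp_a)

    scale_s = symmetric_scale(activations)
    print("symmetric:  scale =", scale_s, "zero_point = 0")
```

With non-negative activations the asymmetric variant shifts the whole range, so its zero point is nonzero. That is the kind of Q/DQ node the importer message refers to.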

Hi,
Could you share the ONNX model and the script, if you have not already, so that we can assist you better?
Meanwhile, you can try a few things:

1. Validate your model with the snippet below.

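check_model.py

import onnx

# Placeholder: replace with the path to your ONNX model file.
filename = "yourONNXmodel"
model = onnx.load(filename)
# Raises an exception if the model is invalid; returns None otherwise.
onnx.checker.check_model(model)
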
2. Try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
If you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!
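
As a rough sketch of the second suggestion, the helper below assembles a trtexec command line and saves the verbose log to a file so it can be attached to a reply. The flags --onnx, --int8, and --verbose are trtexec options. The function names, the model path "model.onnx", and the log file name are hypothetical choices for illustration, and the script assumes trtexec is on PATH.

```python
import shutil
import subprocess


def build_trtexec_command(onnx_path, log_verbose=True, int8=False):
    """Assemble a trtexec command line as an argument list."""
    cmd = ["trtexec", f"--onnx={onnx_path}"]
    if int8:
        # Enables INT8 precision; usually wanted for a quantized (Q/DQ) model.
        cmd.append("--int8")
    if log_verbose:
        cmd.append("--verbose")
    return cmd


def run_trtexec(onnx_path, log_path="trtexec_verbose.log", int8=False):
    """Run trtexec and write its combined stdout/stderr to log_path."""
    if shutil.which("trtexec") is None:
        raise FileNotFoundError("trtexec was not found on PATH")
    cmd = build_trtexec_command(onnx_path, log_verbose=True, int8=int8)
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    with open(log_path, "w", encoding="utf-8") as log_file:
        log_file.write(result.stdout)
    return result.returncode


if __name__ == "__main__":
    # "model.onnx" is a placeholder path, not a file from this thread.
    print(" ".join(build_trtexec_command("model.onnx", int8=True)))
```

Keeping the command construction separate from the run step makes it easy to print the exact command that was used alongside the log.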

Thanks for the reply.

My ONNX model:

EFF_HWC_QAT_GPU.onnx (19.9 MB)

I tried your first suggestion, and it reported no errors.

I am testing on Windows 10.

Hi @jkm07232000,
We currently support only ONNX models generated by our own QAT tool.
You can refer to the sample below. Hope it helps:
https://github.com/NVIDIA/FasterTransformer/tree/main/bert-quantization/bert-tf-quantization

Thanks
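
To see which nodes in an exported model would trigger the "zero shifts" message from the log earlier in the thread, one option is to scan the QuantizeLinear/DequantizeLinear nodes for nonzero zero points. This is only a sketch: the function names are hypothetical, it looks only at zero points stored as graph initializers (values produced by Constant nodes are not inspected), and load_graph needs the onnx package installed.

```python
QDQ_OPS = ("QuantizeLinear", "DequantizeLinear")


def find_nonzero_zero_points(nodes, constants):
    """Return names of Q/DQ nodes whose zero_point input holds a nonzero value.

    nodes: iterable of (name, op_type, input_names)
    constants: dict mapping tensor name -> flat list of numeric values
    """
    flagged = []
    for name, op_type, inputs in nodes:
        if op_type not in QDQ_OPS:
            continue
        # ONNX Q/DQ inputs are (x, scale, zero_point); zero_point is optional.
        if len(inputs) < 3 or not inputs[2]:
            continue
        values = constants.get(inputs[2])
        if values is not None and any(v != 0 for v in values):
            flagged.append(name)
    return flagged


def load_graph(path):
    """Read node tuples and zero-point initializer values from an ONNX file."""
    import onnx  # imported lazily so the pure function above has no dependency
    from onnx import numpy_helper

    model = onnx.load(path)
    nodes = [(n.name, n.op_type, list(n.input)) for n in model.graph.node]
    zp_names = {
        inputs[2]
        for _, op_type, inputs in nodes
        if op_type in QDQ_OPS and len(inputs) > 2
    }
    constants = {
        init.name: numpy_helper.to_array(init).flatten().tolist()
        for init in model.graph.initializer
        if init.name in zp_names
    }
    return nodes, constants


if __name__ == "__main__":
    import sys

    # Usage: python scan_zero_points.py <path to .onnx file>
    graph_nodes, graph_constants = load_graph(sys.argv[1])
    for node_name in find_nonzero_zero_points(graph_nodes, graph_constants):
        print("nonzero zero point:", node_name)
```

An empty result does not guarantee that the model will import, since the log also reports other problems (for example the Int8 weights type and the broadcast failure), but a non-empty one points at the nodes behind the zero-shift message.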