I faced an error during the quantization test using TensorRT 8

jkm07232000 · November 5, 2021, 8:43am

I created a quantized TFLite file using the TensorFlow Model Optimization Toolkit.
I then converted it to an ONNX file and tried to convert that to a TensorRT engine (rt file).

I got the error shown below. How can I solve this problem?

Also, is this conversion path possible?

tflite → onnx → rt

I understand that a TFLite file does not run the way I want it to on Windows. Can I measure the reduction in inference time on Windows through the above process?

ModelImporter.cpp:119: Searching for input: scale__162
ModelImporter.cpp:119: Searching for input: zero_point__163
ModelImporter.cpp:125: sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars [DequantizeLinear] inputs: [sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars;sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars/ReadVariableOp/resource;sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars/ReadVariableOp_1/resource → ()], [scale__162 → ()], [zero_point__163 → ()],
onnx2trt_utils.cpp:286: TensorRT currenly supports only zero shifts values for QuatizeLinear/DequantizeLinear ops
sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd;sequential/efficientnetb0/quant_block7a_se_expand/Conv2D;sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd/ReadVariableOp/resource_dequant_dequantize_scale_node: at least 4 dimensions are required for input.
ImporterContext.hpp:120: Registering tensor: sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars for ONNX tensor: sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars
ModelImporter.cpp:179: sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars [DequantizeLinear] outputs: [sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars → ()],
ModelImporter.cpp:103: Parsing node: sequential/efficientnetb0/quant_normalization/sub [Sub]
ModelImporter.cpp:119: Searching for input: sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars
sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd;sequential/efficientnetb0/quant_block7a_se_expand/Conv2D;sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd/ReadVariableOp/resource_dequant_dequantize_scale_node: at least 4 dimensions are required for input.
ModelImporter.cpp:119: Searching for input: sequential/efficientnetb0/quant_normalization/Reshape
ModelImporter.cpp:125: sequential/efficientnetb0/quant_normalization/sub [Sub] inputs: [sequential/efficientnetb0/quantize_layer/AllValuesQuantize/FakeQuantWithMinMaxVars → ()], [sequential/efficientnetb0/quant_normalization/Reshape → (1, 1, 1, 3)],
sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd;sequential/efficientnetb0/quant_block7a_se_expand/Conv2D;sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd/ReadVariableOp/resource_dequant_dequantize_scale_node: at least 4 dimensions are required for input.
sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd;sequential/efficientnetb0/quant_block7a_se_expand/Conv2D;sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd/ReadVariableOp/resource_dequant_dequantize_scale_node: at least 4 dimensions are required for input.
sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd;sequential/efficientnetb0/quant_block7a_se_expand/Conv2D;sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd/ReadVariableOp/resource_dequant_dequantize_scale_node: at least 4 dimensions are required for input.
sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd;sequential/efficientnetb0/quant_block7a_se_expand/Conv2D;sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd/ReadVariableOp/resource_dequant_dequantize_scale_node: at least 4 dimensions are required for input.
sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd;sequential/efficientnetb0/quant_block7a_se_expand/Conv2D;sequential/efficientnetb0/quant_block7a_se_expand/BiasAdd/ReadVariableOp/resource_dequant_dequantize_scale_node: at least 4 dimensions are required for input.
sequential/efficientnetb0/quant_block7a_se_expand/Conv2D;sequential/efficientnetb0/quant_block7a_se_expand/LastValueQuant/FakeQuantWithMinMaxVars: invalid weights type of Int8
ERROR: onnx2trt_utils.cpp:680 In function elementwiseHelper:
[8] Assertion failed: tensor_ptr->getDimensions().nbDims == maxNbDims && “Failed to broadcast tensors elementwise!”
Assertion failed: tensor_ptr->getDimensions().nbDims == maxNbDims && “Failed to broadcast tensors elementwise!”
Building Cuda Engine
Network must have at least one output
Network validation failed.

NVES · November 5, 2021, 9:08am

Hi,
Please share the ONNX model and the conversion script, if you have not already, so that we can assist you better.
In the meantime, you can try a few things:

1) Validate your model with the snippet below.

check_model.py

import onnx

# Placeholder: replace with the path to your ONNX model
filename = "yourONNXmodel"
model = onnx.load(filename)
# Raises an exception if the model is structurally invalid
onnx.checker.check_model(model)

2) Try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
If you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
Thanks!
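
Note that onnx.checker only validates the model structure, so a model can pass it and still be rejected by the TensorRT parser. The log above contains the message "TensorRT currenly supports only zero shifts values for QuatizeLinear/DequantizeLinear ops", so it may help to list the Q/DQ nodes whose zero point is not zero. The following is a minimal sketch, not an official tool: the function names find_nonzero_zero_points and scan_onnx_file are hypothetical, and it assumes the zero points are stored as graph initializers (zero points produced by Constant nodes would not be picked up).

```python
QDQ_OPS = {"QuantizeLinear", "DequantizeLinear"}


def find_nonzero_zero_points(nodes, constants):
    """Return (node_name, zero_point_name) pairs with a non-zero zero point.

    nodes: iterable of (node_name, op_type, input_names) tuples.
    constants: dict mapping a tensor name to a flat list of its values.
    """
    offenders = []
    for name, op_type, inputs in nodes:
        # The zero point is the optional third input of both ops;
        # when it is omitted, ONNX treats it as 0.
        if op_type not in QDQ_OPS or len(inputs) < 3:
            continue
        zp_name = inputs[2]
        values = constants.get(zp_name)
        if values is not None and any(v != 0 for v in values):
            offenders.append((name, zp_name))
    return offenders


def scan_onnx_file(path):
    """Load an ONNX file and report Q/DQ nodes with non-zero zero points."""
    # Imported here so the pure-Python helper above works without onnx installed.
    import onnx
    from onnx import numpy_helper

    model = onnx.load(path)
    onnx.checker.check_model(model)
    # Assumption: zero points live in graph initializers.
    constants = {
        init.name: numpy_helper.to_array(init).ravel().tolist()
        for init in model.graph.initializer
    }
    nodes = [(n.name, n.op_type, list(n.input)) for n in model.graph.node]
    return find_nonzero_zero_points(nodes, constants)
```

Calling scan_onnx_file with the path to your model returns a list of node names paired with their zero-point tensor names. An empty list would suggest the zero points are not the cause, at least for the initializers this sketch can see.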

jkm07232000 · November 8, 2021, 6:17am

Thanks for the reply.

My ONNX model:

EFF_HWC_QAT_GPU.onnx (19.9 MB)

I tried your first option, but it reported no error.

Also, I am testing on Windows 10.

SunilJB · November 9, 2021, 7:56am

Hi @jkm07232000
We currently support only ONNX models generated by our own QAT tool.
You can refer to the sample below. Hope it helps:
https://github.com/NVIDIA/FasterTransformer/tree/main/bert-quantization/bert-tf-quantization

Thanks
