Problem with converting ONNX quantized models to TensorRT


I’m working on NVIDIA Jetson AGX Xavier Developer Kit with jetpack 4.6.
I want to convert an ONNX model of a quantized model (in TensorFlow 2) into a model in TRT and run it.

When parsing the model with the following code (where model_file is the path to the ONNX model):

with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network(flags=network_flags) as network, \
        trt.OnnxParser(network, TRT_LOGGER) as parser:
    with open(model_file, 'rb') as model:
        if not parser.parse(model.read()):
            print('ERROR: Failed to parse the ONNX file.')
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None

we get the following error:
[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +346, GPU +0, now: CPU 427, GPU 7031 (MiB)
[TensorRT] WARNING: onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
ERROR: Failed to parse the ONNX file.
In node 0 (QuantDequantLinearHelper): INVALID_NODE: Assertion failed: axis == INVALID_AXIS && "Quantization axis attribute is not valid with a single quantization scale"

What should I do to make it work?
Thank you


Could you share the details of your quantization/dequantization layers with us first?

To run a model with TensorRT, you need to convert it into TensorRT first.
So all the layers need to be supported by the onnx2trt parser as well as TensorRT inference.

Based on your log, the error comes from the onnx2trt parser.
You can find its support matrix below:

Also, which opset version do you use?
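For context, the ONNX-to-TensorRT conversion described above can be sketched roughly as follows. This is a minimal sketch assuming the TensorRT 8.x Python API shipped with JetPack 4.6; the function name `build_engine` is mine, and the import is guarded because TensorRT is only available on supported platforms:

```python
# Sketch only: assumes the TensorRT 8.x Python API (Jetson / JetPack 4.6).
try:
    import tensorrt as trt
except ImportError:  # TensorRT is not installed on this machine
    trt = None


def build_engine(onnx_path):
    """Parse an ONNX file and build a serialized TensorRT engine."""
    logger = trt.Logger(trt.Logger.INFO)
    # QDQ models require an explicit-batch network definition.
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(logger) as builder, \
            builder.create_network(flags) as network, \
            trt.OnnxParser(network, logger) as parser:
        with open(onnx_path, 'rb') as f:
            if not parser.parse(f.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        config = builder.create_builder_config()
        # Quantized (QDQ) networks need the INT8 builder flag enabled.
        config.set_flag(trt.BuilderFlag.INT8)
        return builder.build_serialized_network(network, config)
```

Every layer in the ONNX graph has to be accepted by the parser in the `parser.parse()` step before the engine build can even start, which is where the error in the log above occurs.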



We use the quantization layer of TensorFlow 2, which is located at the input of the model (tf.quantization.quantize  |  TensorFlow Core v2.7.0), with the following values:
[<tf.Variable 'quantize_layer/quantize_layer_min:0' shape=() dtype=float32, numpy=-0.11559381>,
 <tf.Variable 'quantize_layer/quantize_layer_max:0' shape=() dtype=float32, numpy=1.0>,
 <tf.Variable 'quantize_layer/optimizer_step:0' shape=() dtype=int32, numpy=-1>]

Other than that, all the layers are simply quantized versions of the regular layers.
I think all the layers are supported, because converting the non-quantized model worked fine.

We used opset 13 in the converter.
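For reference, a min/max range like the one above is typically turned into a single scale and zero-point. This is a generic affine-quantization sketch, not necessarily the exact formula TensorFlow or TensorRT uses internally; `affine_qparams` is my name:

```python
def affine_qparams(rmin, rmax, qmin=-128, qmax=127):
    """Derive a per-tensor scale/zero-point from a float range.

    Generic affine quantization: real_value = scale * (q - zero_point).
    The range is widened to include 0 so zero is exactly representable.
    """
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    return scale, zero_point


# The quantize_layer range from the post: min = -0.11559381, max = 1.0.
scale, zp = affine_qparams(-0.11559381, 1.0)
# A single scalar scale like this is what makes it per-tensor quantization.
```

Because the whole input tensor shares one scalar scale, the resulting ONNX QuantizeLinear node carries per-tensor (not per-axis) quantization parameters.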


Based on the source below, do you use per-tensor quantization?
If yes, it is not supported by the onnx2trt 8.0 EA parser.


I used per-layer (per-tensor) quantization and also per-channel quantization.
Both lead to errors.

What kind of quantization should I use?
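To make the distinction discussed here concrete, the two schemes can be sketched in a few lines of NumPy. This mimics the broadcast semantics of ONNX QuantizeLinear (a scalar scale for per-tensor, a 1-D scale plus an axis for per-channel); the helper name `quantize` is mine:

```python
import numpy as np


def quantize(x, scale, zero_point, axis=None):
    """Affine quantization to int8, mimicking ONNX QuantizeLinear semantics.

    Per-tensor: scale/zero_point are scalars (axis is ignored).
    Per-channel: scale/zero_point are 1-D arrays applied along `axis`.
    """
    scale = np.asarray(scale, dtype=np.float32)
    zero_point = np.asarray(zero_point, dtype=np.int32)
    if scale.ndim == 1 and axis is not None:
        # Reshape so the 1-D scale broadcasts along the chosen axis.
        shape = [1] * x.ndim
        shape[axis] = -1
        scale = scale.reshape(shape)
        zero_point = zero_point.reshape(shape)
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)


x = np.array([[0.5, -0.25], [1.0, 0.75]], dtype=np.float32)

# Per-tensor: one scale for the whole tensor.
q_tensor = quantize(x, 0.01, 0)

# Per-channel: one scale per channel along axis 0.
q_channel = quantize(x, [0.01, 0.02], [0, 0], axis=0)
```

The parser error in the first post complains about exactly this combination: a node that carries an `axis` attribute while supplying only a single (per-tensor) scale.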

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.


Would you mind sharing the models (no quantization, per-layer quantization, and per-channel quantization) with us?
We want to reproduce this issue and double-check it internally.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.