Problem with converting ONNX quantized models to TensorRT

onabati · November 16, 2021, 3:51pm

Hi,

I’m working on an NVIDIA Jetson AGX Xavier Developer Kit with JetPack 4.6.
I want to convert the ONNX export of a quantized TensorFlow 2 model into a TensorRT engine and run it.

When parsing the model with the following code (where model_file is the path to the ONNX model):

with trt.Builder(TRT_LOGGER) as builder, builder.create_network(flags=network_flags) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
    with open(model_file, 'rb') as model:
        if not parser.parse(model.read()):
            print('ERROR: Failed to parse the ONNX file.')
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None

we get the following error:
[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +346, GPU +0, now: CPU 427, GPU 7031 (MiB)
[TensorRT] WARNING: onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
ERROR: Failed to parse the ONNX file.
In node 0 (QuantDequantLinearHelper): INVALID_NODE: Assertion failed: axis == INVALID_AXIS && "Quantization axis attribute is not valid with a single quantization scale"

What should I do to make it work?
Thank you

AastaLLL · November 17, 2021, 3:00am

Hi,

Could you first share the details of your quantization/dequantization layer with us?

To run a model with TensorRT, you first need to convert it into a TensorRT engine.
So all the layers need to be supported by both the onnx2trt parser and TensorRT inference.

Based on your log, the error occurs from the onnx2trt parser.
You can find its support matrix below:
https://github.com/onnx/onnx-tensorrt/blob/master/docs/operators.md

Also, which opset version do you use?

Thanks.

onabati · November 18, 2021, 10:45am

Hi,

We use the TensorFlow 2 quantization layer, which is located at the input of the model (tf.quantization.quantize | TensorFlow v2.10.0), with the following values:
[<tf.Variable 'quantize_layer/quantize_layer_min:0' shape=() dtype=float32, numpy=-0.11559381>,
 <tf.Variable 'quantize_layer/quantize_layer_max:0' shape=() dtype=float32, numpy=1.0>,
 <tf.Variable 'quantize_layer/optimizer_step:0' shape=() dtype=int32, numpy=-1>]

Other than that all layers are simply quantized versions of the regular layers.
I think that all layers are supported because when I tried converting the non-quantized model it worked fine.

We used opset 13 in the converter.
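
For reference, the min/max values quoted above can be turned into a quantization scale and zero point. The sketch below assumes a signed 8-bit affine mapping; the function name affine_qparams and the choice of 8 signed bits are illustrative assumptions, and the actual TensorFlow-to-ONNX exporter may compute these values differently. Because both variables have shape=(), the result is a single scale for the whole tensor, i.e. per-tensor quantization.

```python
def affine_qparams(rmin, rmax, num_bits=8, signed=True):
    """Return (scale, zero_point) mapping [rmin, rmax] onto an integer range.

    Hypothetical helper for illustration; assumes a plain affine scheme.
    """
    # Widen the range so that 0.0 is always representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    if signed:
        qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** num_bits - 1
    # One real-valued step per integer step.
    scale = (rmax - rmin) / (qmax - qmin)
    # Integer value that represents real 0.0.
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point


# Values taken from quantize_layer_min / quantize_layer_max above.
scale, zero_point = affine_qparams(-0.11559381, 1.0)
print(scale, zero_point)
```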

AastaLLL · November 25, 2021, 8:06am

Hi,

Based on the source below, do you use a per-tensor quantization?
If yes, it is not supported on the onnx2trt 8.0 EA parser.

https://github.com/onnx/onnx-tensorrt/blob/8.0-EA/builtin_op_importers.cpp#L1101

                  axis = 0;
              }
              // Ensure that number of scale-coefficients is equal to the number of output channels.
              int64_t K = dataInput.getDimensions().d[axis];
              ASSERT(K == scaleSize && "The number of scales is not equal to the number of output channels.",
                  nvonnxparser::ErrorCode::kINVALID_NODE);
          }
          else
          {
              // Per-Tensor Quantization.
              ASSERT(axis == INVALID_AXIS && "Quantization axis attribute is not valid with a single quantization scale", nvonnxparser::ErrorCode::kINVALID_NODE);
              // Currently this is ignored by TRT, but it is required by addScaleNd (for computing nbSpatialDims).
              axis = 1;
          }

          nvinfer1::ILayer* layer = nullptr;
          if (isDQ)
          {
              // Add and configure a DequantizeLayer.
              nvinfer1::IDequantizeLayer* dq = ctx->network()->addDequantize(dataInput, *scaleInput);
              ASSERT(dq && "Failed to create Dequantize layer.", ErrorCode::kUNSUPPORTED_NODE);

Thanks.
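
The check in the excerpt above can be paraphrased in Python to make the failing condition easier to see. This is only a sketch of the visible logic, not the parser's actual code: the function name check_qdq_axis, the use of None as the INVALID_AXIS sentinel, and the scale_size != 1 test that selects the per-channel branch (that part of the source is cut off in the excerpt) are all assumptions.

```python
INVALID_AXIS = None  # hypothetical sentinel: the node carries no usable axis


def check_qdq_axis(scale_size, axis, data_dims):
    """Return the axis that would be used, or raise ValueError where the
    C++ excerpt would hit an ASSERT."""
    if scale_size != 1:  # assumed condition for the per-channel branch
        # Per-channel quantization: default the axis, then require one
        # scale per channel along that axis.
        if axis is INVALID_AXIS:
            axis = 0
        if data_dims[axis] != scale_size:
            raise ValueError(
                "The number of scales is not equal to the number of output channels."
            )
        return axis
    # Per-tensor quantization: a single scale must not come with an axis.
    if axis is not INVALID_AXIS:
        raise ValueError(
            "Quantization axis attribute is not valid with a single quantization scale"
        )
    return 1  # placeholder axis, as in the excerpt


# A single scale together with an explicit axis reproduces the reported message.
try:
    check_qdq_axis(scale_size=1, axis=1, data_dims=(1, 3, 224, 224))
except ValueError as err:
    print(err)
```

Under these assumptions, a QuantizeLinear or DequantizeLinear node with a scalar scale would fail only when an axis is also present on the node, which matches the message in the log above.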

onabati · November 25, 2021, 8:49am

I used per-layer quantization (per tensor) and also per channel quantization.
Both lead to errors.

What kind of quantization should I use?

AastaLLL · November 26, 2021, 3:33am

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Hi,

Would you mind sharing the model (no quantization, per-layer quantization, and per-channel quantization) with us?
We want to reproduce this issue and double-check it internally.

Thanks.

system · December 22, 2021, 2:10am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.