I’m working on NVIDIA Jetson AGX Xavier Developer Kit with jetpack 4.6.
I want to convert an ONNX model of a quantized model (in TensorFlow 2) into a model in TRT and run it.
When parsing the model with the following code (where model_file is the path to the ONNX model):
with trt.Builder(TRT_LOGGER) as builder, builder.create_network(flags=network_flags) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
    with open(model_file, 'rb') as model:
        if not parser.parse(model.read()):
            print('ERROR: Failed to parse the ONNX file.')
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None
we get the following error:
[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +346, GPU +0, now: CPU 427, GPU 7031 (MiB)
[TensorRT] WARNING: onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
ERROR: Failed to parse the ONNX file.
In node 0 (QuantDequantLinearHelper): INVALID_NODE: Assertion failed: axis == INVALID_AXIS && "Quantization axis attribute is not valid with a single quantization scale"
Could you share the details of your quantization/dequantization layers with us first?
To run a model with TensorRT, you need to convert it into a TensorRT engine first.
So all the layers need to be supported by both the onnx2trt parser and TensorRT inference.
We use the quantization layer of TensorFlow 2, which is located at the input of the model (tf.quantization.quantize | TensorFlow v2.10.0), with the following values:

[<tf.Variable 'quantize_layer/quantize_layer_min:0' shape=() dtype=float32, numpy=-0.11559381>,
 <tf.Variable 'quantize_layer/quantize_layer_max:0' shape=() dtype=float32, numpy=1.0>,
 <tf.Variable 'quantize_layer/optimizer_step:0' shape=() dtype=int32, numpy=-1>]
Other than that, all layers are simply quantized versions of the regular layers.
I think all the layers themselves are supported, because converting the non-quantized version of the model worked fine.
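For context on what the parser assertion means: in ONNX QuantizeLinear/DequantizeLinear semantics, a scalar scale means per-tensor quantization, and the axis attribute is only meaningful when the scale is a 1-D tensor (per-channel quantization). The error suggests the exported node carries an axis attribute together with a single scalar scale. The following is a minimal NumPy sketch of the two modes (an illustration of the concept, not the actual model or the TensorRT code path):

```python
import numpy as np

def quantize(x, scale, zero_point=0, axis=None):
    """Mimics ONNX QuantizeLinear semantics: per-tensor when `scale` is a
    scalar (no axis allowed), per-channel when `scale` is 1-D along `axis`."""
    scale = np.asarray(scale, dtype=np.float32)
    if scale.ndim == 0:
        # Per-tensor: one scale applies everywhere, so an `axis` attribute
        # is meaningless -- this is what the TensorRT assertion rejects.
        assert axis is None, "axis attribute is not valid with a single scale"
        q = np.round(x / scale) + zero_point
    else:
        # Per-channel: one scale per slice along `axis`.
        shape = [1] * x.ndim
        shape[axis] = -1
        q = np.round(x / scale.reshape(shape)) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

x = np.array([[0.1, -0.2], [0.3, 0.4]], dtype=np.float32)

# Per-tensor quantization: scalar scale, no axis.
print(quantize(x, scale=0.01))

# Per-channel quantization: one scale per column (axis=1).
print(quantize(x, scale=[0.01, 0.02], axis=1))
```

So one thing worth checking is whether the TF-to-ONNX export wrote an axis attribute on a per-tensor QuantizeLinear node; a scalar scale with no axis (or a per-channel export with a 1-D scale) should be internally consistent.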
There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks
Hi,
Would you mind sharing the model (no quantization, per-layer quantization, and per-channel quantization) with us?
We want to reproduce this issue and double-check it internally.