I'm working on an NVIDIA Jetson AGX Xavier Developer Kit with JetPack 4.6.
I want to convert an ONNX model exported from a quantized TensorFlow 2 model into a TensorRT engine and run it.
When parsing the model with the following code (where model_file is the path to the ONNX model):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
# ONNX parsing requires an explicit-batch network
network_flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(TRT_LOGGER) as builder, builder.create_network(flags=network_flags) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
    with open(model_file, 'rb') as model:
        if not parser.parse(model.read()):
            print('ERROR: Failed to parse the ONNX file.')
            for error in range(parser.num_errors):
                print(parser.get_error(error))
```
we get the following error:

```
[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +346, GPU +0, now: CPU 427, GPU 7031 (MiB)
[TensorRT] WARNING: onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
ERROR: Failed to parse the ONNX file.
In node 0 (QuantDequantLinearHelper): INVALID_NODE: Assertion failed: axis == INVALID_AXIS && "Quantization axis attribute is not valid with a single quantization scale"
```
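From the message, node 0 appears to have a scalar (per-tensor) quantization scale while still carrying an `axis` attribute, which the parser rejects. The Q/DQ nodes of the graph can be inspected with a short script like this (a sketch assuming the `onnx` Python package; `'model.onnx'` is a placeholder for the actual model path):

```python
import onnx

model = onnx.load('model.onnx')  # placeholder for the actual model path
initializers = {init.name: init for init in model.graph.initializer}

# Print every Q/DQ node with its axis attribute (if any) and the shape
# of its scale input, to see which node trips the parser assertion.
for node in model.graph.node:
    if node.op_type in ('QuantizeLinear', 'DequantizeLinear'):
        axis = [a.i for a in node.attribute if a.name == 'axis']
        scale = initializers.get(node.input[1]) if len(node.input) > 1 else None
        dims = list(scale.dims) if scale is not None else 'not an initializer'
        print(node.name, node.op_type, 'axis:', axis, 'scale dims:', dims)
```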
We use the TensorFlow 2 quantization layer at the input of the model (tf.quantization.quantize | TensorFlow Core v2.7.0) with the following values:

```
[<tf.Variable 'quantize_layer/quantize_layer_min:0' shape=() dtype=float32, numpy=-0.11559381>,
 <tf.Variable 'quantize_layer/quantize_layer_max:0' shape=() dtype=float32, numpy=1.0>,
 <tf.Variable 'quantize_layer/optimizer_step:0' shape=() dtype=int32, numpy=-1>]
```
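For context, the `quantize_layer/*` variable names above match what the TensorFlow Model Optimization Toolkit's QuantizeLayer creates; a quantization-aware model of this shape is typically built like the following sketch (an assumption about our setup, and the toy architecture is a placeholder, not our real network):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Toy stand-in for the real architecture.
base_model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(16,)),
    tf.keras.layers.Dense(8, activation='relu'),
])

# quantize_model wraps the network and inserts a QuantizeLayer at the
# input, which owns the quantize_layer/quantize_layer_min and ..._max
# variables listed above.
q_model = tfmot.quantization.keras.quantize_model(base_model)
```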
Other than that, all layers are simply quantized versions of the regular layers.
I believe all the layers themselves are supported, because converting the non-quantized version of the model worked fine.
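One workaround we are considering is stripping the `axis` attribute from Q/DQ nodes whose scale is a scalar, since per-tensor quantization should not need it; is that safe, or is there a proper fix on the export side? A hedged sketch of that edit (again assuming the `onnx` package; paths are placeholders, and scales coming from Constant nodes rather than initializers are not handled):

```python
import onnx

model = onnx.load('model.onnx')  # placeholder path
# Initializers with no dims are scalars, i.e. per-tensor scales.
scalar_scales = {init.name for init in model.graph.initializer if len(init.dims) == 0}

for node in model.graph.node:
    if node.op_type in ('QuantizeLinear', 'DequantizeLinear') \
            and len(node.input) > 1 and node.input[1] in scalar_scales:
        # Drop the axis attribute, keep everything else.
        kept = [a for a in node.attribute if a.name != 'axis']
        del node.attribute[:]
        node.attribute.extend(kept)

onnx.checker.check_model(model)
onnx.save(model, 'model_fixed.onnx')
```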