Exporting a quantized model to an ONNX file for TensorRT C++

I have a question concerning an 8-bit quantization flow. Currently I use the PyTorch quantization toolkit to quantize the network and PyTorch to export it to ONNX. Finally, I import the ONNX file into TensorRT using the C++ API and build an inference engine.

However, this approach leads to input shape constraints, as the ONNX file holds a graph for the specific input shape used while exporting it.
I'm wondering if the input shape can be changed after importing the graph into TensorRT? Or maybe I can save the weights and quantization scales in HDF5 format, load them into C++, and then set the input shape for inference?

Could you share the ONNX model and the export script, if not shared already, so that we can assist you better?
In the meantime, you can try a few things:

  1. Validate your model with the snippet below.


import onnx
filename = "your_model.onnx"  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid
  2. Try running your model with the trtexec command.
If you are still facing the issue, please share the trtexec --verbose log for further debugging.