I have a question concerning an 8-bit quantization flow. Currently I use the PyTorch quantization toolkit to quantize the network and PyTorch to export it to ONNX. Finally, I import the ONNX file into TensorRT using the C++ API and build an inference engine.
However, this approach leads to input shape constraints, since the ONNX file holds a graph for the specific input shape that was used while exporting it.
I’m wondering whether the input shape can be changed after importing the graph into TensorRT. Or, alternatively, could I save the weights and quantization scales in HDF5 format, load them into C++, and then set the input shape before inference?