Exporting a quantized model to an ONNX file for TensorRT C++

I have a question concerning an 8-bit quantization flow. Currently I use the PyTorch Quantization Toolkit to quantize the network and PyTorch to export it to ONNX. Finally, I import the ONNX file into TensorRT using the C++ API and build an inference engine.

However, this approach leads to input shape constraints, as the ONNX file holds a graph for the specific input shape used while exporting it.
I'm wondering whether the input shape can be changed after importing the graph into TensorRT. Or could I instead save the weights and quantization scales in HDF5 format, load them into C++, and then set the input shape for inference?

Hi,
Could you share the ONNX model and the script, if not shared already, so that we can assist you better?
In the meantime, you can try a few things:
https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#onnx-export

  1. Validate your model with the snippet below:

check_model.py

import onnx

# Path to your exported ONNX model (placeholder).
filename = "yourONNXmodel"
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
If you are still facing the issue, please share the trtexec --verbose log for further debugging.
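For reference, a trtexec invocation along these lines (the file name and shapes are placeholders) exercises a dynamic-shape engine build and produces the verbose log:

trtexec --onnx=model_dynamic.onnx \
        --minShapes=input:1x3x224x224 \
        --optShapes=input:8x3x224x224 \
        --maxShapes=input:32x3x224x224 \
        --int8 \
        --verbose

The --minShapes/--optShapes/--maxShapes flags define the optimization profile for an ONNX graph with dynamic axes, and --int8 enables 8-bit precision during the build.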
Thanks!