I have been successfully using the TensorRT C++ API to load a float32 PyTorch model and convert it to an INT8 TensorRT engine on a local machine. Now I want to convert an INT8 PyTorch QAT model to an INT8 TensorRT engine. Can I do this with TensorRT 7.1.3, since upgrading to TensorRT 8 would require a lot of changes? If so, what steps should I take?
TensorRT Version: 7.1.3
GPU Type: Xavier
Nvidia Driver Version:
CUDA Version: 10.2
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
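For reference, the float32-to-INT8 calibration flow described above looks roughly like the following with the TensorRT Python API (the actual workflow uses the C++ API; `build_int8_engine`, the ONNX path, and the calibrator object are assumptions for illustration, not the exact setup in question):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_int8_engine(onnx_path, calibrator):
    """Build an INT8 engine from an ONNX file using a calibrator (TRT 7.x API)."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30
    config.set_flag(trt.BuilderFlag.INT8)
    # calibrator is a subclass of trt.IInt8EntropyCalibrator2 that feeds
    # representative input batches.
    config.int8_calibrator = calibrator

    return builder.build_engine(network, config)
```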
Thanks for your reply, but it doesn't really answer my question. The example above shows how to convert a float32 PyTorch model to an INT8 TensorRT model, which I have already implemented.
My question is how to parse an INT8 PyTorch (QAT) model as-is and convert it to an INT8 TensorRT model without calibration.
Right now it seems that I need to convert the INT8 weight parameters back to float, otherwise TensorRT won't accept them.
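To illustrate the "without calibration" part: TensorRT 7.x can also take per-tensor dynamic ranges directly instead of running a calibrator (`ITensor::setDynamicRange` in the C++ API), so the scales learned during QAT could in principle be handed over while the weights stay in float. A rough sketch with the Python API, where `qat_scales` is a hypothetical name-to-scale mapping extracted from the QAT observers:

```python
def apply_qat_ranges(network, qat_scales):
    """Set per-tensor dynamic ranges from QAT scales instead of calibrating.

    `network` is a tensorrt.INetworkDefinition built by the ONNX parser
    (see the earlier sketch). `qat_scales` is a hypothetical dict mapping
    tensor names to the float scale learned by the QAT observers;
    amax = scale * 127 for symmetric int8.
    """
    # Network inputs need a range as well as layer outputs.
    for i in range(network.num_inputs):
        tensor = network.get_input(i)
        if tensor.name in qat_scales:
            amax = qat_scales[tensor.name] * 127.0
            tensor.set_dynamic_range(-amax, amax)

    for i in range(network.num_layers):
        layer = network.get_layer(i)
        for j in range(layer.num_outputs):
            tensor = layer.get_output(j)
            if tensor.name in qat_scales:
                amax = qat_scales[tensor.name] * 127.0
                tensor.set_dynamic_range(-amax, amax)

# The builder config still needs INT8 enabled, but no calibrator is set:
#   config.set_flag(trt.BuilderFlag.INT8)
```

The point of this route is that the PyTorch side never has to hold real int8 weight tensors; TensorRT does the int8 conversion itself from the float weights plus the supplied ranges.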
It looks like you're not following the TensorRT documentation's workflow. If you convert the QAT model to "real int8" (i.e., fusing layers with torch.quantization.fuse_modules and then calling torch.quantization.convert), TensorRT does not support the result, and neither does ONNX export. That path creates fused modules such as conv+bn+relu and quantizes the weights to int8.
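To make the distinction concrete, here is a rough sketch of the eager-mode torch.quantization path (the toy model, layer names in the fusion list, and the training loop are assumptions for illustration); the key point is that export should happen on the fake-quantized model, before convert() turns the weights into real int8:

```python
import torch
import torch.nn as nn
import torch.quantization as tq

# Toy stand-in for the real network.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.conv1 = nn.Conv2d(3, 8, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(8)
        self.relu1 = nn.ReLU()
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu1(self.bn1(self.conv1(x)))
        return self.dequant(x)

model = TinyNet()

# Fuse conv+bn+relu into a single module (still float32 at this point).
model.eval()
model = tq.fuse_modules(model, [["conv1", "bn1", "relu1"]])

# Insert fake-quant observers for QAT; weights stay float32 and the
# observers only learn scales/zero-points during fine-tuning.
model.train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(model, inplace=True)
# ... QAT fine-tuning loop would go here ...

# This is the step referred to above: convert() swaps in real int8
# modules and quantizes the weights, which ONNX export / TensorRT
# cannot consume. Keep the fake-quantized model for export instead.
int8_model = tq.convert(model.eval())
```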