INT8 quantization with Torch-TensorRT fails

Description

Hi,

At our company we are having problems using Torch-TensorRT with the official Nvidia Docker image, version 22.05.

The problems occur with INT8 quantization and have already been reported in the Torch-TensorRT issue tracker on GitHub, unfortunately with no answer so far [https://github.com/pytorch/TensorRT/issues/1135].

We have also found the bug reported here [https://github.com/pytorch/TensorRT/issues/927], which is likewise related to INT8 quantization in Torch-TensorRT.

Environment

TensorRT Version: (Torch-TensorRT) 1.2.0a0+666a2637
GPU Type: GeForce RTX 2080 Ti
Nvidia Driver Version: GeForce RTX 2080 Ti
CUDA Version: 11.0
CUDNN Version: 8.4
Operating System + Version: ubuntu 18.04
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable): –
PyTorch Version (if applicable): 1.11.00
Baremetal or Container (if container which image + tag): Nvidia NGC Container nvcr.io/nvidia/pytorch:22.05-py3
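
For anyone trying to compare their setup against the list above, here is a minimal sketch that prints the installed package versions from inside the container. It uses only the Python standard library. The distribution names passed to it ("torch", "torch-tensorrt", "tensorrt", "pytorch-quantization") are assumptions and may be registered under different names in a given image; the helper name `collect_versions` is hypothetical.

```python
from importlib import metadata


def collect_versions(packages):
    """Return a dict mapping each distribution name to its installed
    version, or to "not installed" when no metadata is found."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust them to match the container.
    wanted = ["torch", "torch-tensorrt", "tensorrt", "pytorch-quantization"]
    for name, version in collect_versions(wanted).items():
        print(f"{name}: {version}")
```

A package reported as "not installed" may still be importable if it was installed without packaging metadata, so treat that result as a hint rather than proof.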

Relevant Files

The first bug is encountered when executing this notebook [https://github.com/pytorch/TensorRT/blob/master/notebooks/vgg-qat.ipynb] in the Nvidia PyTorch container 22.05.

The stack trace can be found in the corresponding GitHub issue linked above.

The second bug is described on its GitHub issue page: a segmentation fault occurs when using INT8 quantization with TensorRT.

Steps To Reproduce

The first bug is easily reproducible by executing the notebook inside the container version 22.05.

Hi, please refer to the links below to perform inference in INT8.

Thanks!

Hi,

My problem concerns the Torch-TensorRT library. Any advice on that?

Ivan

Hi,

Regarding the reported bugs, please wait for an update on the GitHub issues.
For QAT with PyTorch, the following link may be helpful to you.

https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html
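
As background on what QAT simulates during training, here is a rough, library-free sketch of symmetric INT8 fake quantization: values are scaled, rounded, clamped to the integer range, and scaled back to floats. This is not the toolkit's API. The function name `fake_quantize`, the `amax` calibration parameter, and the narrow range [-127, 127] are assumptions chosen for illustration.

```python
def fake_quantize(values, amax, num_bits=8):
    """Quantize then dequantize a list of floats with a symmetric scale.

    amax is the assumed absolute maximum of the calibrated range; values
    outside [-amax, amax] saturate at the range limits.
    """
    if amax <= 0:
        raise ValueError("amax must be positive")
    bound = 2 ** (num_bits - 1) - 1      # 127 for 8 bits (narrow range)
    scale = bound / amax                 # float units -> integer steps
    result = []
    for x in values:
        q = round(x * scale)             # nearest integer step
        q = max(-bound, min(bound, q))   # clamp to [-bound, bound]
        result.append(q / scale)         # back to float (dequantize)
    return result


if __name__ == "__main__":
    print(fake_quantize([0.0, 0.5, 1.0, 2.0], amax=1.0))
```

Values inside the calibrated range come back with an error of at most about half a quantization step (amax / 127 for 8 bits), while anything beyond `amax` is clipped.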

Thank you.