I have been converting PyTorch models to TensorRT engines through ONNX. The overall performance of the FP32, FP16, and INT8 (with calibration) engines is reasonable compared with the PyTorch models.
However, I found that performance drops close to zero when I enable both FP16 and INT8 kernels at the same time, by setting both the kINT8 and kFP16 flags as follows:
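For reference, a minimal Python sketch of the builder configuration I mean is below. The network construction and the calibrator object are placeholders (my actual model and calibrator are not shown); only the two precision flags are the point here.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()

# Enabling BOTH precisions at once is what triggers the problem for me.
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = calibrator  # placeholder: my IInt8EntropyCalibrator2
```

With only one of the two flags set, the resulting engine behaves as expected.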
I did not hit this problem while using TensorRT 6, but it fails in TensorRT 7 with a similar configuration. Can anyone help me debug this? Please let me know if more information or a sample is needed to demonstrate the problem.
TensorRT Version: 7
GPU Type: GeForce RTX 2080 Ti
Nvidia Driver Version: 418.56
CUDA Version: 10.0.130
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 16.04.6
Python Version (if applicable): Python 3.7.7
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.4.0
Baremetal or Container (if container which image + tag): Baremetal
I am sorry that I cannot provide the original ONNX model at the moment, but I found that a ResNet50 also shows the problem with the same settings.
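To reproduce with a stock model, a ResNet50 can be exported to ONNX like this (a sketch using torchvision; the file name and opset are my choices, not from the original report):

```python
import torch
import torchvision

# Export a stock ResNet50 as a stand-in for the original model.
model = torchvision.models.resnet50(pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model, dummy, "resnet50.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=11,  # supported by PyTorch 1.4
)
```

The exported `resnet50.onnx` can then be built with the FP16+INT8 configuration described above.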
Steps To Reproduce
My implementation is based on the TensorRT OSS project (https://github.com/NVIDIA/TensorRT), combining the ONNX parser and INT8 calibration examples.
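For a quick check without my code, I believe the `trtexec` tool from the OSS build can express the same configuration; the command below is my assumption of an equivalent setup (the calibration cache path is a placeholder):

```shell
# Build an engine with both FP16 and INT8 enabled, matching my setup.
trtexec --onnx=resnet50.onnx \
        --fp16 \
        --int8 \
        --calib=calibration.cache  # placeholder calibration cache file
```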