TensorRT 7 performance drop with int8 and fp16 kernel mixture

Description

Hi,

I was converting PyTorch models to TensorRT engines through ONNX. The overall performance of the fp32, fp16, and int8 (with calibration) engines is reasonable compared with the PyTorch models.

However, I found that performance drops to close to zero when I enable both fp16 and int8 kernels at the same time, by setting both the kINT8 and kFP16 builder flags as follows:

config->setFlag(BuilderFlag::kINT8);
config->setFlag(BuilderFlag::kFP16);

I did not encounter this issue while using TensorRT 6, but it appears in TensorRT 7 with a similar configuration. Can anyone help me debug this problem? Please let me know if more information or a sample is needed to demonstrate the issue.
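For context, here is a minimal sketch of how those two flags sit inside the builder configuration. It assumes the logger, network, and calibrator objects created elsewhere in the TensorRT OSS samples; the function name is mine, not part of the samples.

```cpp
#include "NvInfer.h"

// Sketch of the engine build that triggers the slowdown. The builder,
// network, and calibrator are placeholders for objects set up earlier
// in the sample code.
nvinfer1::ICudaEngine* buildMixedPrecisionEngine(
    nvinfer1::IBuilder* builder,
    nvinfer1::INetworkDefinition* network,
    nvinfer1::IInt8Calibrator* calibrator)
{
    nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
    config->setMaxWorkspaceSize(1ULL << 30);  // 1 GiB workspace

    // Enabling either flag alone behaves as expected; enabling both
    // together is what produces the performance drop under TensorRT 7.
    config->setFlag(nvinfer1::BuilderFlag::kINT8);
    config->setFlag(nvinfer1::BuilderFlag::kFP16);
    config->setInt8Calibrator(calibrator);

    nvinfer1::ICudaEngine* engine =
        builder->buildEngineWithConfig(*network, *config);
    config->destroy();
    return engine;
}
```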

Environment

TensorRT Version: 7.0.0.11
GPU Type: GeForce RTX 2080 Ti
Nvidia Driver Version: 418.56
CUDA Version: 10.0.130
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 16.04.6
Python Version (if applicable): Python 3.7.7
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.4.0
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

I am sorry that I am unable to provide the original ONNX model at the moment, but I found that a ResNet-50 also exhibits this problem with the same settings.

Steps To Reproduce

The implementation is based on the TensorRT OSS project (https://github.com/NVIDIA/TensorRT), combining its ONNX and int8 samples.
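The same flag combination can also be exercised with the trtexec tool from the OSS build, which may be easier to reproduce with than the full sample; the model filename below is a placeholder for any ONNX model that shows the problem.

```shell
# Build and time an engine with both precisions enabled (TensorRT 7 trtexec).
./trtexec --onnx=resnet50.onnx --int8 --fp16

# For comparison, build with a single precision flag:
./trtexec --onnx=resnet50.onnx --fp16
./trtexec --onnx=resnet50.onnx --int8
```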

Hi,

  1. Can you please share an ONNX model and the associated script + Makefile that reproduce this behavior?

  2. Can you clarify what you mean by “performance drops close to 0”? Do you mean accuracy? Or speed?