Int8 inference, low confidence on output

Description

I’m attempting to run INT8 inference on a deep network using TensorRT from C++ (the system runs fine with the FP16 setting). The output of the network is an H x W heat map.
After calibrating and building the engine, I get a low confidence score on the peaks of a test image compared to the FP16 network (0.2 confidence instead of 0.8). I have tried multiple calibration sets to rule out a problem with the calibration data:

  • 5K images similar to my test image.
  • 50K images making up the network’s training set.
  • The test image as the sole calibration set.

Additionally, I get the following warnings while building the engine:
[W] [TRT] Detected invalid timing cache, setup a local cache instead.
[W] [TRT] Cache result detected as invalid for node: XXXX_conv_YYYY, LayerImpl: CaskConvolution

Environment

TensorRT Version: 8
GPU Type: GeForce RTX 3080
Nvidia Driver Version: 11.4
CUDA Version: 11.4
CUDNN Version: 8.2
Operating System + Version: Ubuntu 18
Work environment: C++

Hi, please refer to the link below to perform inference in INT8:
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleINT8/README.md

Thanks!

Hi, I made sure I followed the example in the tutorial (my network is ONNX and not Caffe as in the sample), but the issue persists.

Hi @or1 ,
Could you please share a reproducible model and script with us?

Thanks!

Hi, unfortunately I can’t upload the model at the moment. I did retrain the model on a TensorRT 7 system and, while I didn’t get the same warnings, the results were the same: correct peak locations but low confidence.

I have a question: in the case of a heatmap output, is there a risk of the peaks being clipped to some arbitrary “top” value (i.e., all values above 0.2 get mapped to 0.2)? That would exactly explain my problem, because the peaks are rare and would therefore be saturated away by the quantization. Following on that, is there a way to calibrate an INT8 engine specifically for a heatmap output, or to manually set the range of values of an output tensor (for example, a uniform range between 0 and 1)?
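
A minimal sketch of the saturation I have in mind, assuming symmetric per-tensor INT8 quantization (the 0.2 dynamic range here is just a made-up illustration of what a calibrator might pick):

    #include <algorithm>
    #include <cmath>
    #include <cstdio>

    // Illustration only: symmetric per-tensor int8 quantization.
    // If calibration picks a dynamic range of 0.2, every heat-map value
    // above 0.2 saturates to the same int8 code (127) and dequantizes
    // back to ~0.2, which would cap the peaks.
    int main()
    {
        const float dynamicRange = 0.2f;            // hypothetical calibrated range
        const float scale = dynamicRange / 127.0f;  // symmetric int8 scale

        const float values[] = {0.05f, 0.2f, 0.8f}; // background, borderline, true peak
        for (float v : values)
        {
            int q = static_cast<int>(std::lround(v / scale));
            q = std::max(-127, std::min(127, q));   // saturation to the int8 range
            std::printf("fp32 %.2f -> int8 %4d -> dequantized %.3f\n", v, q, q * scale);
        }
        return 0;
    }

With those numbers, the 0.8 peak and the 0.2 value both come back as ~0.2, which is exactly the behaviour I’m seeing.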

Thanks.

Hi,

If the peaks are rare, then entropy calibration might not be a good fit, as it picks a clipping range that minimizes the overall information loss over the activation histogram, and rare large values can end up being clipped.
Could you please try IInt8MinMaxCalibrator instead?
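
In case it helps, here is a minimal C++ sketch of plugging a min-max calibrator into the builder config (the class name and batch-loading logic are placeholders; builder, network and config are assumed to already exist):

    #include "NvInfer.h"
    #include <cstddef>

    // Skeleton min-max calibrator: TensorRT calls getBatch() repeatedly and
    // records per-tensor min/max values instead of building an entropy histogram.
    class MinMaxCalibrator : public nvinfer1::IInt8MinMaxCalibrator
    {
    public:
        int32_t getBatchSize() const noexcept override { return 1; }

        bool getBatch(void* bindings[], char const* names[], int32_t nbBindings) noexcept override
        {
            // Copy the next calibration image into device memory, point
            // bindings[0] at it, and return true; return false when done.
            return false; // placeholder
        }

        void const* readCalibrationCache(std::size_t& length) noexcept override
        {
            length = 0;
            return nullptr; // no cached calibration: recalibrate on every build
        }

        void writeCalibrationCache(void const* cache, std::size_t length) noexcept override
        {
            // Optionally persist the calibration cache to disk here.
        }
    };

    // Hooking it into the builder config:
    //   MinMaxCalibrator calibrator;
    //   config->setFlag(nvinfer1::BuilderFlag::kINT8);
    //   config->setInt8Calibrator(&calibrator);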
Regarding your last question, we can try setting the dynamic range of the output tensor manually. In the Python API it would be something like network.get_output(0).dynamic_range = (-1, 1).
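
Since you are working in C++, the equivalent (untested sketch; assumes network and config already exist) would be along these lines:

    // Pin the output tensor's dynamic range so calibration cannot clip the peaks.
    nvinfer1::ITensor* output = network->getOutput(0);
    output->setDynamicRange(-1.0f, 1.0f);

    // INT8 must still be enabled on the builder config.
    config->setFlag(nvinfer1::BuilderFlag::kINT8);

Explicit ranges take precedence over calibrated ones; if no calibrator is attached, TensorRT will expect a dynamic range for every tensor, so this is usually combined with calibration for the rest of the network.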

Thank you.