Int8 inference, low confidence on output

Description

I’m attempting to run INT8 inference on a deep network using TensorRT from C++ (the system runs fine with the FP16 setting). The output of the network is an H x W heat map.
After calibrating and building the engine, I get a low confidence score on the peaks of a test image compared to the FP16 network (0.2 confidence instead of 0.8). I have tried multiple calibration sets to rule out a problem with the calibration data:

  • 5K images similar to my test image.
  • 50K images making up the network’s training set.
  • The test image as the sole calibration set.

Additionally, I get the following warnings while building the engine:
[W] [TRT] Detected invalid timing cache, setup a local cache instead.
[W] [TRT] Cache result detected as invalid for node: XXXX_conv_YYYY, LayerImpl: CaskConvolution

Environment

TensorRT Version: 8
GPU Type: GeForce RTX 3080
Nvidia Driver Version: 11.4
CUDA Version: 11.4
CUDNN Version: 8.2
Operating System + Version: Ubuntu 18
Work environment: C++

Hi, please refer to the link below to perform inference in INT8:
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleINT8/README.md

Thanks!

Hi, I made sure I followed the example in the tutorial (my network is ONNX and not Caffe as in the sample), but the issue persists.

Hi @or1 ,
Could you please share a reproducible model and script with us?

Thanks!

Hi, unfortunately I can’t upload the model at the moment. I did retrain the model on a TensorRT 7 system and, while I didn’t get the same warnings, the results were the same: correct peak locations but low confidence.

I have a question: in the case of a heatmap output, is there a risk of the peaks being clipped to some arbitrary “top” value (i.e., all values above 0.2 get mapped to 0.2)? That would exactly explain my problem, because the peaks are rare and would therefore be saturated away by the quantization. Following on that, is there a way to calibrate an INT8 engine specifically for a heatmap output, or to manually set the range of values of an output tensor (for example, a uniform range between 0 and 1)?
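
A minimal sketch of the saturation I have in mind, assuming symmetric per-tensor INT8 quantization (the 0.2 dynamic range here is just a made-up illustration of what a calibrator might pick):

    #include <algorithm>
    #include <cmath>
    #include <cstdio>

    // Illustration only: symmetric per-tensor int8 quantization.
    // If calibration picks a dynamic range of 0.2, every heat-map value
    // above 0.2 saturates to the same int8 code (127) and dequantizes
    // back to ~0.2, which would cap the peaks.
    int main()
    {
        const float dynamicRange = 0.2f;            // hypothetical calibrated range
        const float scale = dynamicRange / 127.0f;  // symmetric int8 scale

        const float values[] = {0.05f, 0.2f, 0.8f}; // background, borderline, true peak
        for (float v : values)
        {
            int q = static_cast<int>(std::lround(v / scale));
            q = std::max(-127, std::min(127, q));   // saturation to the int8 range
            std::printf("fp32 %.2f -> int8 %4d -> dequantized %.3f\n", v, q, q * scale);
        }
        return 0;
    }

With those numbers, the 0.8 peak and the 0.2 value both come back as ~0.2, which is exactly the behaviour I’m seeing.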

Thanks.

Hi,

If the peaks are rare, then entropy calibration might not be a good fit, as it picks a clipping range that minimizes the overall information loss over the activation histogram, and rare large values can end up being clipped.
Could you please try IInt8MinMaxCalibrator instead?
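
In case it helps, here is a minimal C++ sketch of plugging a min-max calibrator into the builder config (the class name and batch-loading logic are placeholders; builder, network and config are assumed to already exist):

    #include "NvInfer.h"
    #include <cstddef>

    // Skeleton min-max calibrator: TensorRT calls getBatch() repeatedly and
    // records per-tensor min/max values instead of building an entropy histogram.
    class MinMaxCalibrator : public nvinfer1::IInt8MinMaxCalibrator
    {
    public:
        int32_t getBatchSize() const noexcept override { return 1; }

        bool getBatch(void* bindings[], char const* names[], int32_t nbBindings) noexcept override
        {
            // Copy the next calibration image into device memory, point
            // bindings[0] at it, and return true; return false when done.
            return false; // placeholder
        }

        void const* readCalibrationCache(std::size_t& length) noexcept override
        {
            length = 0;
            return nullptr; // no cached calibration: recalibrate on every build
        }

        void writeCalibrationCache(void const* cache, std::size_t length) noexcept override
        {
            // Optionally persist the calibration cache to disk here.
        }
    };

    // Hooking it into the builder config:
    //   MinMaxCalibrator calibrator;
    //   config->setFlag(nvinfer1::BuilderFlag::kINT8);
    //   config->setInt8Calibrator(&calibrator);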
Regarding your last question, we can try setting the dynamic range of the output tensor manually. In the Python API it would be something like network.get_output(0).dynamic_range = (-1, 1).
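
Since you are working in C++, the equivalent (untested sketch; assumes network and config already exist) would be along these lines:

    // Pin the output tensor's dynamic range so calibration cannot clip the peaks.
    nvinfer1::ITensor* output = network->getOutput(0);
    output->setDynamicRange(-1.0f, 1.0f);

    // INT8 must still be enabled on the builder config.
    config->setFlag(nvinfer1::BuilderFlag::kINT8);

Explicit ranges take precedence over calibrated ones; if no calibrator is attached, TensorRT will expect a dynamic range for every tensor, so this is usually combined with calibration for the rest of the network.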

Thank you.