INT8 calibration causes a significant decrease in accuracy when batch_size is greater than 1

Description

With INT8 calibration and a batch size of 1, I get 76% accuracy.
But when I change the batch size to 2, the accuracy drops to 68%.
I have tried several things, including using a different calibrator and enlarging the calibration dataset, but none of them worked for me.

I also found that the inconsistency between batch_size=1 and batch_size=2 is not limited to INT8. The same phenomenon appears with FP16, but there the accuracy is only 0.01% lower at batch_size=2.
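
For context, the engines for both precisions are built in roughly the same way. This is not my exact script, just a minimal sketch using the TensorRT 7 builder-config API; `populate_network` is a hypothetical stand-in for the model parsing step:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(populate_network, precision, batch_size, calibrator=None):
    # populate_network(builder, network) is assumed to fill in the network
    # definition, e.g. via the UFF or ONNX parser (not shown here).
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network()  # implicit-batch network
    populate_network(builder, network)

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30
    if precision == "fp16":
        config.set_flag(trt.BuilderFlag.FP16)
    elif precision == "int8":
        config.set_flag(trt.BuilderFlag.INT8)
        config.int8_calibrator = calibrator  # an IInt8EntropyCalibrator2

    builder.max_batch_size = batch_size  # 1 or 2 in the experiments below
    return builder.build_engine(network, config)
```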

The differences between the two setups are as follows:

Batch Size=1
use 10 batches of images for calibration (10 images in total)
context.execute_async(batch_size=1…)
builder.max_batch_size = 1
input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)) * 1, dtype=trt.nptype(trt.float32))
output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)) * 1, dtype=trt.nptype(trt.float32))

Batch Size=2
use 10 batches of images for calibration (20 images in total)
context.execute_async(batch_size=2…)
builder.max_batch_size = 2
input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)) * 2, dtype=trt.nptype(trt.float32))
output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)) * 2, dtype=trt.nptype(trt.float32))
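
The calibrator itself is shared between the two runs and follows the usual pattern from the official samples. This is only a minimal sketch, not my actual class; `images` is assumed to be a preloaded float32 NumPy array of shape (N, C, H, W), which is not shown here:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  creates the CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, images, batch_size):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.images = np.ascontiguousarray(images, dtype=np.float32)
        self.batch_size = batch_size
        self.index = 0
        # One device buffer large enough for a full calibration batch.
        self.device_input = cuda.mem_alloc(self.images[0].nbytes * batch_size)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        if self.index + self.batch_size > len(self.images):
            return None  # no more calibration data
        batch = self.images[self.index:self.index + self.batch_size]
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        self.index += self.batch_size
        return [int(self.device_input)]

    # Cache handling is left out here so every run re-calibrates;
    # a production calibrator would usually read/write a cache file.
    def read_calibration_cache(self):
        return None

    def write_calibration_cache(self, cache):
        return None
```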

Environment

TensorRT Version: 6.0.1.5
GPU Type: NVIDIA T4
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.6.4
TensorFlow Version (if applicable): 1.14.0
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Steps To Reproduce

Hi @928024300,
Can you please try the same with the latest TRT release?

Thanks!

@AakankshaS
Thank you for your reply. I tried TensorRT 7.2.2.3; with the same code, bs=1 and bs=2 both give 68% accuracy.

I was not sure whether the problem was in my code, so I also tried the official example.
I found the same problem in samples/python/int8_caffe_mnist, so you can use that example to reproduce the issue.
I tested whether modifying the calibration batch_size leads to similar conclusions.

Reproduction steps

  1. cd /xxx/TensorRT-7.2.2.3/samples/python/int8_caffe_mnist
  2. vim sample.py

line 113
orig:
preds = np.argmax(output.reshape(32, 10)[0:effective_batch_size], axis=1)
my code:
preds = np.argmax(output.reshape(batch_size, 10)[0:effective_batch_size], axis=1)
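
The reason for this change: output is the flattened host buffer for one batch, and the original line hard-codes 32 rows in the reshape. A small illustration of the per-batch accuracy computation with the fix applied (the function and variable names here are mine, not from sample.py):

```python
import numpy as np

def batch_accuracy(output, labels, batch_size, effective_batch_size):
    # output: flattened host buffer holding batch_size * 10 class scores
    # labels: ground-truth digits for this batch
    probs = output.reshape(batch_size, 10)[:effective_batch_size]
    preds = np.argmax(probs, axis=1)
    return np.mean(preds == labels[:effective_batch_size])
```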

  3. vim calibrator.py and make both cache methods return None, so the calibration cache is never read or written and every run re-calibrates from scratch:
    def read_calibration_cache(self):
        return None
    def write_calibration_cache(self, cache):
        return None

  4. Download t10k-images-idx3-ubyte, t10k-labels-idx1-ubyte and train-images-idx3-ubyte, and move them to /xxx/TensorRT-7.1.3.4/data/mnist/

Test by modifying the batch_size on line 132 of sample.py.
command:
python sample.py -d /xxx/TensorRT-7.1.3.4/data/mnist/

The results are as follows:

| TRT Version | batch_size | Accuracy |
| ----------- | ---------- | -------- |
| 7.2.2.3     | 1          | 99.04%   |
| 7.2.2.3     | 2          | 99.04%   |
| 7.2.2.3     | 4          | 99.09%   |
| 7.1.3.4     | 1          | 99.04%   |
| 7.1.3.4     | 2          | 99.04%   |
| 7.1.3.4     | 4          | 99.08%   |

I printed the outputs for the first 4 samples:

TRT 7.1.3.4
bs=1

[1.3280198e-08 8.3349909e-07 5.5637688e-06 4.5241384e-05 1.2360628e-07 4.6623416e-08 6.2411069e-11 9.9994063e-01 5.2299274e-08 7.4734148e-06]
[8.3692653e-09 7.1982811e-08 9.9999988e-01 2.1252868e-10 3.9617954e-15 3.6299807e-13 1.0279181e-09 7.8858204e-13 4.7648632e-09 2.1704192e-13]
[6.4544167e-07 9.9980003e-01 7.8375524e-06 2.3874209e-06 2.4654912e-05 1.0651434e-06 1.1815050e-06 1.4080654e-04 2.0498124e-05 9.2139965e-07]
[9.9996889e-01 1.2997623e-08 1.6910227e-06 3.0409957e-09 9.8426703e-08 5.2588941e-08 2.8616896e-05 1.7889117e-07 1.8760392e-08 4.5338979e-07]

bs=2
[1.32801725e-08 8.33498234e-07 5.56376881e-06 4.52413842e-05 1.23606171e-07 4.66235051e-08 6.24111873e-11 9.99940634e-01 5.22992742e-08 7.47341483e-06]
[8.3692173e-09 7.1982534e-08 9.9999988e-01 2.1252787e-10 3.9617654e-15 3.6299669e-13 1.0279142e-09 7.8857905e-13 4.7648454e-09 2.1704151e-13]
[6.4544048e-07 9.9980003e-01 7.8375379e-06 2.3874184e-06 2.4654866e-05 1.0651414e-06 1.1815029e-06 1.4080627e-04 2.0498084e-05 9.2139788e-07]
[9.9996889e-01 1.2997623e-08 1.6910227e-06 3.0409957e-09 9.8426703e-08 5.2588941e-08 2.8616896e-05 1.7889117e-07 1.8760357e-08 4.5338979e-07]

bs=4
[1.6549187e-08 9.6081453e-07 6.1077662e-06 5.0769024e-05 1.3747349e-07 5.6813914e-08 8.5180037e-11 9.9993372e-01 6.7060590e-08 8.1129019e-06]
[1.5495447e-08 1.5377817e-07 9.9999988e-01 8.0922791e-10 1.6675904e-14 1.5596557e-12 1.7077265e-09 2.9476391e-12 9.2306145e-09 9.4902786e-13]
[6.9759159e-07 9.9979073e-01 8.0839163e-06 2.6293960e-06 2.6708811e-05 1.1526640e-06 1.2683212e-06 1.4571304e-04 2.1868871e-05 1.0654210e-06]
[9.9995506e-01 2.7548934e-08 3.1004688e-06 8.3001215e-09 2.0465728e-07 1.7640777e-07 3.9912786e-05 3.9605615e-07 5.3834178e-08 1.0082869e-06]

TRT 7.2.2.3
bs=1
[1.3280198e-08 8.3349909e-07 5.5637743e-06 4.5241424e-05 1.2360628e-07 4.6623594e-08 6.2411187e-11 9.9994063e-01 5.2299274e-08 7.4734221e-06]
[8.3692333e-09 7.1982534e-08 9.9999988e-01 2.1252787e-10 3.9617654e-15 3.6299669e-13 1.0279142e-09 7.8857754e-13 4.7648454e-09 2.1704109e-13]
[6.4544048e-07 9.9980003e-01 7.8375379e-06 2.3874184e-06 2.4654888e-05 1.0651414e-06 1.1815029e-06 1.4080627e-04 2.0498084e-05 9.2139788e-07]
[9.9996889e-01 1.2997648e-08 1.6910244e-06 3.0409957e-09 9.8426895e-08 5.2588941e-08 2.8616925e-05 1.7889133e-07 1.8760392e-08 4.5339021e-07]

bs=2
[1.3280198e-08 8.3349909e-07 5.5637743e-06 4.5241424e-05 1.2360628e-07 4.6623594e-08 6.2411187e-11 9.9994063e-01 5.2299274e-08 7.4734221e-06]
[8.3692333e-09 7.1982534e-08 9.9999988e-01 2.1252787e-10 3.9617654e-15 3.6299669e-13 1.0279142e-09 7.8857754e-13 4.7648454e-09 2.1704109e-13]
[6.4544048e-07 9.9980003e-01 7.8375379e-06 2.3874184e-06 2.4654888e-05 1.0651414e-06 1.1815029e-06 1.4080627e-04 2.0498084e-05 9.2139788e-07]
[9.9996889e-01 1.2997648e-08 1.6910244e-06 3.0409957e-09 9.8426895e-08 5.2588941e-08 2.8616925e-05 1.7889133e-07 1.8760392e-08 4.5339021e-07]

bs=4
[1.7204741e-08 9.9102090e-07 6.3659159e-06 4.8434951e-05 1.2568742e-07 5.3502756e-08 8.1484333e-11 9.9993658e-01 6.8533737e-08 7.4288519e-06]
[1.4838719e-08 1.4176142e-07 9.9999988e-01 6.2464345e-10 1.6381333e-14 1.3405128e-12 1.4546487e-09 2.6372875e-12 8.4720071e-09 7.9731019e-13]
[6.9668454e-07 9.9976557e-01 8.4982812e-06 2.9508076e-06 2.9321654e-05 1.1908999e-06 1.2807174e-06 1.6637771e-04 2.2839846e-05 1.2312287e-06]
[9.9995196e-01 3.0704779e-08 3.4117381e-06 1.0069451e-08 2.2443743e-07 2.0824145e-07 4.2460961e-05 4.0692220e-07 6.4224750e-08 1.0884975e-06]
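
To quantify the drift, here is a quick check (not part of the sample) comparing the first output vector of TRT 7.1.3.4 at bs=1 against bs=4, with the values copied from the printouts above:

```python
import numpy as np

# First sample, TRT 7.1.3.4, bs=1 vs bs=4 (values copied from the printouts above).
probs_bs1 = np.array([1.3280198e-08, 8.3349909e-07, 5.5637688e-06, 4.5241384e-05,
                      1.2360628e-07, 4.6623416e-08, 6.2411069e-11, 9.9994063e-01,
                      5.2299274e-08, 7.4734148e-06])
probs_bs4 = np.array([1.6549187e-08, 9.6081453e-07, 6.1077662e-06, 5.0769024e-05,
                      1.3747349e-07, 5.6813914e-08, 8.5180037e-11, 9.9993372e-01,
                      6.7060590e-08, 8.1129019e-06])

print("max abs diff:", np.max(np.abs(probs_bs1 - probs_bs4)))   # ~7e-06
print("same argmax:", np.argmax(probs_bs1) == np.argmax(probs_bs4))  # True
```

So bs=1 and bs=2 agree to within floating-point noise, while bs=4 differs by a few 1e-6 per class but still predicts the same digit.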

Hi @928024300,
Can you please share your model, script and logs, so that we can assist you better?

Thanks!

@AakankshaS

I added the details in my last reply. If anything is unclear, please point it out. Thanks.

@AakankshaS
Hi, is there any conclusion yet?
We would like to use INT8 quantization with batch_size > 1, but this problem puts our results far from what we expected.

Thanks!

Hi @928024300,
The 0.01% difference you see with FP16 is acceptable. And for int8_caffe_mnist, a 0.05% difference across batch sizes is not a big problem either.
We may choose different kernels when the batch size changes, and for FP16/INT8 there can be some accuracy difference between kernels.

Thanks!