INT8 calibration causes a significant decrease in accuracy when batch_size is greater than 1

Description

With INT8 calibration and a batch size of 1, I get 76% accuracy.
But when I change the batch size to 2, the accuracy drops to 68%.
I have tried several things, including using a different calibrator and enlarging the calibration dataset, but none of them worked for me.

I also found that the inconsistency between batch_size=1 and batch_size=2 is not limited to INT8. The same phenomenon appears with FP16, but there the accuracy is only 0.01% lower at batch_size=2.
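
For context, the engines for both precisions are built in roughly the same way. This is not my exact script, just a minimal sketch using the TensorRT 7 builder-config API; `populate_network` is a hypothetical stand-in for the model parsing step:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(populate_network, precision, batch_size, calibrator=None):
    # populate_network(builder, network) is assumed to fill in the network
    # definition, e.g. via the UFF or ONNX parser (not shown here).
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network()  # implicit-batch network
    populate_network(builder, network)

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30
    if precision == "fp16":
        config.set_flag(trt.BuilderFlag.FP16)
    elif precision == "int8":
        config.set_flag(trt.BuilderFlag.INT8)
        config.int8_calibrator = calibrator  # an IInt8EntropyCalibrator2

    builder.max_batch_size = batch_size  # 1 or 2 in the experiments below
    return builder.build_engine(network, config)
```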

The differences between the two setups are as follows:

Batch Size=1
use 10 batches of images for calibration (10 images in total)
context.execute_async(batch_size=1…)
builder.max_batch_size = 1
input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)) * 1, dtype=trt.nptype(trt.float32))
output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)) * 1, dtype=trt.nptype(trt.float32))

Batch Size=2
use 10 batches of images for calibration (20 images in total)
context.execute_async(batch_size=2…)
builder.max_batch_size = 2
input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)) * 2, dtype=trt.nptype(trt.float32))
output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)) * 2, dtype=trt.nptype(trt.float32))
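
The calibrator itself is shared between the two runs and follows the usual pattern from the official samples. This is only a minimal sketch, not my actual class; `images` is assumed to be a preloaded float32 NumPy array of shape (N, C, H, W), which is not shown here:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  creates the CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, images, batch_size):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.images = np.ascontiguousarray(images, dtype=np.float32)
        self.batch_size = batch_size
        self.index = 0
        # One device buffer large enough for a full calibration batch.
        self.device_input = cuda.mem_alloc(self.images[0].nbytes * batch_size)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        if self.index + self.batch_size > len(self.images):
            return None  # no more calibration data
        batch = self.images[self.index:self.index + self.batch_size]
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        self.index += self.batch_size
        return [int(self.device_input)]

    # Cache handling is left out here so every run re-calibrates;
    # a production calibrator would usually read/write a cache file.
    def read_calibration_cache(self):
        return None

    def write_calibration_cache(self, cache):
        return None
```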

Environment

TensorRT Version: 6.0.1.5
GPU Type: NVIDIA T4
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.6.4
TensorFlow Version (if applicable): 1.14.0
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Steps To Reproduce

Hi @928024300,
Can you please try the same with the latest TRT release?

Thanks!

@AakankshaS
Thank you for your reply. I tried TensorRT 7.2.2.3; with the same code, bs=1 and bs=2 both give 68% accuracy.

I was not sure whether the problem was in my code, so I also tried the official example.
I found the same problem in samples/python/int8_caffe_mnist, so you can use that example to reproduce the issue.
I tested whether modifying the calibration batch_size leads to similar conclusions.

Reproduction steps

  1. cd /xxx/TensorRT-7.2.2.3/samples/python/int8_caffe_mnist
  2. vim sample.py

line 113
orig:
preds = np.argmax(output.reshape(32, 10)[0:effective_batch_size], axis=1)
my code:
preds = np.argmax(output.reshape(batch_size, 10)[0:effective_batch_size], axis=1)
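
The reason for this change: output is the flattened host buffer for one batch, and the original line hard-codes 32 rows in the reshape. A small illustration of the per-batch accuracy computation with the fix applied (the function and variable names here are mine, not from sample.py):

```python
import numpy as np

def batch_accuracy(output, labels, batch_size, effective_batch_size):
    # output: flattened host buffer holding batch_size * 10 class scores
    # labels: ground-truth digits for this batch
    probs = output.reshape(batch_size, 10)[:effective_batch_size]
    preds = np.argmax(probs, axis=1)
    return np.mean(preds == labels[:effective_batch_size])
```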

  3. vim calibrator.py and make both cache methods return None, so the calibration cache is never read or written and every run re-calibrates from scratch:
    def read_calibration_cache(self):
        return None
    def write_calibration_cache(self, cache):
        return None

  4. Download t10k-images-idx3-ubyte, t10k-labels-idx1-ubyte and train-images-idx3-ubyte, and move them to /xxx/TensorRT-7.1.3.4/data/mnist/

Test by modifying the batch_size on line 132 of sample.py.
command:
python sample.py -d /xxx/TensorRT-7.1.3.4/data/mnist/

The results are as follows:

| TRT Version | batch_size | Accuracy |
| ----------- | ---------- | -------- |
| 7.2.2.3     | 1          | 99.04%   |
| 7.2.2.3     | 2          | 99.04%   |
| 7.2.2.3     | 4          | 99.09%   |
| 7.1.3.4     | 1          | 99.04%   |
| 7.1.3.4     | 2          | 99.04%   |
| 7.1.3.4     | 4          | 99.08%   |

I printed the outputs for the first 4 samples:

TRT 7.1.3.4
bs=1

[1.3280198e-08 8.3349909e-07 5.5637688e-06 4.5241384e-05 1.2360628e-07 4.6623416e-08 6.2411069e-11 9.9994063e-01 5.2299274e-08 7.4734148e-06]
[8.3692653e-09 7.1982811e-08 9.9999988e-01 2.1252868e-10 3.9617954e-15 3.6299807e-13 1.0279181e-09 7.8858204e-13 4.7648632e-09 2.1704192e-13]
[6.4544167e-07 9.9980003e-01 7.8375524e-06 2.3874209e-06 2.4654912e-05 1.0651434e-06 1.1815050e-06 1.4080654e-04 2.0498124e-05 9.2139965e-07]
[9.9996889e-01 1.2997623e-08 1.6910227e-06 3.0409957e-09 9.8426703e-08 5.2588941e-08 2.8616896e-05 1.7889117e-07 1.8760392e-08 4.5338979e-07]

bs=2
[1.32801725e-08 8.33498234e-07 5.56376881e-06 4.52413842e-05 1.23606171e-07 4.66235051e-08 6.24111873e-11 9.99940634e-01 5.22992742e-08 7.47341483e-06]
[8.3692173e-09 7.1982534e-08 9.9999988e-01 2.1252787e-10 3.9617654e-15 3.6299669e-13 1.0279142e-09 7.8857905e-13 4.7648454e-09 2.1704151e-13]
[6.4544048e-07 9.9980003e-01 7.8375379e-06 2.3874184e-06 2.4654866e-05 1.0651414e-06 1.1815029e-06 1.4080627e-04 2.0498084e-05 9.2139788e-07]
[9.9996889e-01 1.2997623e-08 1.6910227e-06 3.0409957e-09 9.8426703e-08 5.2588941e-08 2.8616896e-05 1.7889117e-07 1.8760357e-08 4.5338979e-07]

bs=4
[1.6549187e-08 9.6081453e-07 6.1077662e-06 5.0769024e-05 1.3747349e-07 5.6813914e-08 8.5180037e-11 9.9993372e-01 6.7060590e-08 8.1129019e-06]
[1.5495447e-08 1.5377817e-07 9.9999988e-01 8.0922791e-10 1.6675904e-14 1.5596557e-12 1.7077265e-09 2.9476391e-12 9.2306145e-09 9.4902786e-13]
[6.9759159e-07 9.9979073e-01 8.0839163e-06 2.6293960e-06 2.6708811e-05 1.1526640e-06 1.2683212e-06 1.4571304e-04 2.1868871e-05 1.0654210e-06]
[9.9995506e-01 2.7548934e-08 3.1004688e-06 8.3001215e-09 2.0465728e-07 1.7640777e-07 3.9912786e-05 3.9605615e-07 5.3834178e-08 1.0082869e-06]

TRT 7.2.2.3
bs=1
[1.3280198e-08 8.3349909e-07 5.5637743e-06 4.5241424e-05 1.2360628e-07 4.6623594e-08 6.2411187e-11 9.9994063e-01 5.2299274e-08 7.4734221e-06]
[8.3692333e-09 7.1982534e-08 9.9999988e-01 2.1252787e-10 3.9617654e-15 3.6299669e-13 1.0279142e-09 7.8857754e-13 4.7648454e-09 2.1704109e-13]
[6.4544048e-07 9.9980003e-01 7.8375379e-06 2.3874184e-06 2.4654888e-05 1.0651414e-06 1.1815029e-06 1.4080627e-04 2.0498084e-05 9.2139788e-07]
[9.9996889e-01 1.2997648e-08 1.6910244e-06 3.0409957e-09 9.8426895e-08 5.2588941e-08 2.8616925e-05 1.7889133e-07 1.8760392e-08 4.5339021e-07]

bs=2
[1.3280198e-08 8.3349909e-07 5.5637743e-06 4.5241424e-05 1.2360628e-07 4.6623594e-08 6.2411187e-11 9.9994063e-01 5.2299274e-08 7.4734221e-06]
[8.3692333e-09 7.1982534e-08 9.9999988e-01 2.1252787e-10 3.9617654e-15 3.6299669e-13 1.0279142e-09 7.8857754e-13 4.7648454e-09 2.1704109e-13]
[6.4544048e-07 9.9980003e-01 7.8375379e-06 2.3874184e-06 2.4654888e-05 1.0651414e-06 1.1815029e-06 1.4080627e-04 2.0498084e-05 9.2139788e-07]
[9.9996889e-01 1.2997648e-08 1.6910244e-06 3.0409957e-09 9.8426895e-08 5.2588941e-08 2.8616925e-05 1.7889133e-07 1.8760392e-08 4.5339021e-07]

bs=4
[1.7204741e-08 9.9102090e-07 6.3659159e-06 4.8434951e-05 1.2568742e-07 5.3502756e-08 8.1484333e-11 9.9993658e-01 6.8533737e-08 7.4288519e-06]
[1.4838719e-08 1.4176142e-07 9.9999988e-01 6.2464345e-10 1.6381333e-14 1.3405128e-12 1.4546487e-09 2.6372875e-12 8.4720071e-09 7.9731019e-13]
[6.9668454e-07 9.9976557e-01 8.4982812e-06 2.9508076e-06 2.9321654e-05 1.1908999e-06 1.2807174e-06 1.6637771e-04 2.2839846e-05 1.2312287e-06]
[9.9995196e-01 3.0704779e-08 3.4117381e-06 1.0069451e-08 2.2443743e-07 2.0824145e-07 4.2460961e-05 4.0692220e-07 6.4224750e-08 1.0884975e-06]
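
To quantify the drift, here is a quick check (not part of the sample) comparing the first output vector of TRT 7.1.3.4 at bs=1 against bs=4, with the values copied from the printouts above:

```python
import numpy as np

# First sample, TRT 7.1.3.4, bs=1 vs bs=4 (values copied from the printouts above).
probs_bs1 = np.array([1.3280198e-08, 8.3349909e-07, 5.5637688e-06, 4.5241384e-05,
                      1.2360628e-07, 4.6623416e-08, 6.2411069e-11, 9.9994063e-01,
                      5.2299274e-08, 7.4734148e-06])
probs_bs4 = np.array([1.6549187e-08, 9.6081453e-07, 6.1077662e-06, 5.0769024e-05,
                      1.3747349e-07, 5.6813914e-08, 8.5180037e-11, 9.9993372e-01,
                      6.7060590e-08, 8.1129019e-06])

print("max abs diff:", np.max(np.abs(probs_bs1 - probs_bs4)))   # ~7e-06
print("same argmax:", np.argmax(probs_bs1) == np.argmax(probs_bs4))  # True
```

So bs=1 and bs=2 agree to within floating-point noise, while bs=4 differs by a few 1e-6 per class but still predicts the same digit.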

Hi @928024300,
Can you please share your model, script and logs, so that we can assist you better?

Thanks!

@AakankshaS

I added the details in my last reply. If anything is unclear, please point it out. Thanks.

@AakankshaS
Hi, is there any conclusion yet?
We would like to use INT8 quantization with batch_size > 1, but this problem puts our results far from what we expected.

Thanks!

Hi @928024300,
The 0.01% difference you see with FP16 is acceptable. And for int8_caffe_mnist, a 0.05% difference across batch sizes is not a big problem either.
We may choose different kernels when the batch size changes, and for FP16/INT8 there can be some accuracy difference between kernels.

Thanks!