INT8 model results are unstable on TensorRT 3 and 4

Platform details:
OS: Ubuntu 16.04
GPU type: 1080 Ti
NVIDIA driver version: NVIDIA-Linux-x86_64-390.59
CUDA version: CUDA 9
cuDNN version: cuDNN 7.1
TensorRT version: 3.0.4, 4.0.1


Describe the problem

I found that the results of an INT8 model are unstable on TensorRT 3 and 4: the same image gives slightly different output probabilities from run to run. The problem doesn't occur on TensorRT 5. I would like to know the cause and how it was fixed in TensorRT 5, and I also need a way to make the results stable on TensorRT 3 and TensorRT 4.

Run sampleINT8 (https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt_401/tensorrt-developer-guide/index.html#int8_sample) on a small number of samples:

./sample_int8 mnist batch=3 start=100 score=1

Before running, modify calculateScore in sampleINT8.cpp to print the probability the model assigns to the correct label:

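// Counts the images in the batch whose correct label ranks within the top `threshold` classes.
int calculateScore(float* batchProb, float* labels, int batchSize, int outputSize, int threshold)
{
	int success = 0;
	for (int i = 0; i < batchSize; i++)
	{
		float* prob = batchProb + outputSize * i;
		float correct = prob[(int)labels[i]];
		// Added line: print the probability assigned to the correct label.
		std::cout << "correct prob:" << correct << std::endl;
		// Count the classes scoring at least as high as the correct one (itself included).
		int better = 0;
		for (int j = 0; j < outputSize; j++)
			if (prob[j] >= correct)
				better++;
		if (better <= threshold)
			success++;
	}
	return success;
}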

You will see different INT8 output probabilities for the same image across runs, while the FP32 results are stable. Output of two separate runs:

FP32 run:1 batches of size 3 starting at 100
correct prob:0.999968
correct prob:0.999975
correct prob:0.999854
correct prob:0.999968
correct prob:0.999975
correct prob:0.999854

Top1: 1, Top5: 1
Processing 3 images averaged 0.0382293 ms/image and 0.114688 ms/batch.

FP16 run:1 batches of size 3 starting at 100
Engine could not be created at this precision

INT8 run:1 batches of size 3 starting at 100
correct prob:0.999952  # look at this
correct prob:0.999965
correct prob:0.999665
correct prob:0.999952
correct prob:0.999965
correct prob:0.999665

=================================================

FP32 run:1 batches of size 3 starting at 100
correct prob:0.999968
correct prob:0.999975
correct prob:0.999854
correct prob:0.999968
correct prob:0.999975
correct prob:0.999854

Top1: 1, Top5: 1
Processing 3 images averaged 0.0404373 ms/image and 0.121312 ms/batch.

FP16 run:1 batches of size 3 starting at 100
Engine could not be created at this precision

INT8 run:1 batches of size 3 starting at 100
correct prob:0.999953 # look at this
correct prob:0.999967
correct prob:0.99967
correct prob:0.999953
correct prob:0.999967
correct prob:0.99967
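
To put a number on the run-to-run drift, here is a minimal sketch of a helper that compares the printed probabilities from two runs. The name maxAbsDiff is hypothetical and is not part of sampleINT8 or the TensorRT API; it only assumes the per-image values have been collected into two vectors of equal length.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of sampleINT8): returns the largest absolute
// per-image difference between the probabilities printed by two runs.
float maxAbsDiff(const std::vector<float>& runA, const std::vector<float>& runB)
{
	// Both runs must cover the same images in the same order.
	assert(runA.size() == runB.size());
	float worst = 0.0f;
	for (std::size_t i = 0; i < runA.size(); i++)
		worst = std::max(worst, std::fabs(runA[i] - runB[i]));
	return worst;
}
```

Fed with the three INT8 values from each log above (0.999952, 0.999965, 0.999665 against 0.999953, 0.999967, 0.99967), it gives a drift of roughly 5e-6, while the FP32 values are identical in both runs and give 0.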