INT8 results differ by about 0.1% when the cache file is regenerated

Hi,

I used a data set to generate an INT8 calibration table. I then used the INT8 table and a Caffe model to generate a TensorRT cache file and ran face-recognition inference. When I reuse the same TensorRT cache file, the results are identical, but every time I delete the cache file and regenerate it from the same INT8 table and the same Caffe model, the results differ by about 0.1%. So it seems that every regenerated TensorRT cache file may be slightly different. Is this a bug or normal behavior?
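For reference, reusing the exact same serialized engine is what keeps the results identical. A minimal sketch of saving and reloading a plan file, assuming the TensorRT 5 C++ API (the file path, the runtime object, and error handling are placeholders):

#include "NvInfer.h"
#include <fstream>
#include <vector>

// Save a built engine to disk so later runs can reuse the identical plan.
void saveEngine(nvinfer1::ICudaEngine* engine, const char* path)
{
    nvinfer1::IHostMemory* plan = engine->serialize();
    std::ofstream out(path, std::ios::binary);
    out.write(static_cast<const char*>(plan->data()), plan->size());
    plan->destroy();
}

// Load the previously serialized engine instead of rebuilding it.
nvinfer1::ICudaEngine* loadEngine(nvinfer1::IRuntime* runtime, const char* path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    std::vector<char> blob(static_cast<size_t>(in.tellg()));
    in.seekg(0);
    in.read(blob.data(), blob.size());
    return runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
}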

Thanks.

Hi,

Linux distro and version: Ubuntu 16.04
GPU type: GTX 1080 Ti
NVIDIA driver version: 430.14
CUDA version: 10.1
cuDNN version: 7.5.0
TensorRT version: 5.1.2.2

I also found something strange in INT8 mode on TensorRT 3 and TensorRT 4.
Run sampleINT8 with a small number of samples:

./sample_int8 mnist batch=3 start=100 score=1

and add a line to calculateScore() to print the probability of the correct label predicted by the model:

int calculateScore(float* batchProb, float* labels, int batchSize, int outputSize, int threshold)
{
    int success = 0;
    for (int i = 0; i < batchSize; i++)
    {
        float* prob = batchProb + outputSize * i;
        float correct = prob[(int) labels[i]];
        // print the probability assigned to the ground-truth label
        std::cout << "correct prob:" << correct << std::endl;
        int better = 0;
        for (int j = 0; j < outputSize; j++)
            if (prob[j] >= correct)
                better++;
        if (better <= threshold)
            success++;
    }
    return success;
}

You will see different probabilities for the same image across different runs.

FP32 run:1 batches of size 3 starting at 100
correct prob:0.999968
correct prob:0.999975
correct prob:0.999854
correct prob:0.999968
correct prob:0.999975
correct prob:0.999854

Top1: 1, Top5: 1
Processing 3 images averaged 0.0382293 ms/image and 0.114688 ms/batch.

FP16 run:1 batches of size 3 starting at 100
Engine could not be created at this precision

INT8 run:1 batches of size 3 starting at 100
correct prob:0.999952
correct prob:0.999965
correct prob:0.999665
correct prob:0.999952
correct prob:0.999965
correct prob:0.999665

=================================================

FP32 run:1 batches of size 3 starting at 100
correct prob:0.999968
correct prob:0.999975
correct prob:0.999854
correct prob:0.999968
correct prob:0.999975
correct prob:0.999854

Top1: 1, Top5: 1
Processing 3 images averaged 0.0404373 ms/image and 0.121312 ms/batch.

FP16 run:1 batches of size 3 starting at 100
Engine could not be created at this precision

INT8 run:1 batches of size 3 starting at 100
correct prob:0.999953
correct prob:0.999967
correct prob:0.99967
correct prob:0.999953
correct prob:0.999967
correct prob:0.99967
  1. The FP32 results are stable, but the INT8 results are not.
  2. The problem does not occur on TensorRT 5, but I think I need some interface to make the INT8 results stable on TensorRT 3 and 4 (see the sketch below).
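One mechanism TensorRT 3 and 4 already expose for pinning the quantization scales is the calibrator's cache hooks. A minimal sketch of an IInt8EntropyCalibrator that returns the existing INT8 table from readCalibrationCache(), so the builder reuses the same scales instead of recalibrating on every build (the file name "CalibrationTable", the batch size, and the trivial getBatch() are placeholders):

#include "NvInfer.h"
#include <fstream>
#include <iterator>
#include <vector>

class CachedCalibrator : public nvinfer1::IInt8EntropyCalibrator
{
public:
    int getBatchSize() const override { return 1; }

    // Not reached when readCalibrationCache() returns a valid table.
    bool getBatch(void* bindings[], const char* names[], int nbBindings) override
    {
        return false;
    }

    // Hand the previously generated INT8 table back to the builder.
    const void* readCalibrationCache(size_t& length) override
    {
        std::ifstream in("CalibrationTable", std::ios::binary);
        mCache.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
        length = mCache.size();
        return mCache.empty() ? nullptr : mCache.data();
    }

    // Persist a freshly generated table for later runs.
    void writeCalibrationCache(const void* cache, size_t length) override
    {
        std::ofstream out("CalibrationTable", std::ios::binary);
        out.write(static_cast<const char*>(cache), length);
    }

private:
    std::vector<char> mCache;
};

Note that this only fixes the calibration scales; any remaining run-to-run difference would come from the builder itself (e.g. kernel/tactic selection), which this sketch does not control.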