Details on the platforms:
Ubuntu 16.04
GPU type: GeForce GTX 1080 Ti
NVIDIA driver version: NVIDIA-Linux-x86_64-390.59
CUDA version: CUDA 9
cuDNN version: cuDNN 7.1
TensorRT version: 3.0.4, 4.0.1
Describe the problem
I found that the output of an INT8 model is unstable from run to run on TensorRT 3 and TensorRT 4. The problem does not occur on TensorRT 5. I would like to know what causes it and how it was fixed in TensorRT 5. I also need a way to make the results stable on TensorRT 3 and TensorRT 4.
To reproduce, run sampleINT8 (https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt_401/tensorrt-developer-guide/index.html#int8_sample) on a small number of samples:
./sample_int8 mnist batch=3 start=100 score=1
Before running, modify calculateScore in sampleINT8.cpp so that it prints the probability the model predicts for the correct label:
int calculateScore(float* batchProb, float* labels, int batchSize, int outputSize, int threshold)
{
    int success = 0;
    for (int i = 0; i < batchSize; i++)
    {
        // Probabilities for image i and the probability of its correct label
        float* prob = batchProb + outputSize * i;
        float correct = prob[(int) labels[i]];
        // Added: print the probability of the correct label
        std::cout << "correct prob:" << correct << std::endl;
        // Count the classes scoring at least as high as the correct one
        int better = 0;
        for (int j = 0; j < outputSize; j++)
            if (prob[j] >= correct)
                better++;
        if (better <= threshold)
            success++;
    }
    return success;
}
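One caveat about the print statement: std::cout shows only six significant digits by default, so part of the run-to-run difference may be rounded away in the logs. Below is a minimal sketch of a helper that prints enough digits to identify a 32-bit float exactly. The name formatProb is hypothetical and not part of sampleINT8.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper (not part of sampleINT8): formats a probability with
// max_digits10 (9 for float) significant digits, which is enough to
// distinguish any two different 32-bit float values.
std::string formatProb(float p)
{
    std::ostringstream os;
    os << std::setprecision(std::numeric_limits<float>::max_digits10) << p;
    return os.str();
}
```

With this helper, the print line in calculateScore could be written as std::cout << "correct prob:" << formatProb(correct) << std::endl;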
You will see different INT8 output probabilities for the same image across runs, while the FP32 results are stable:
FP32 run:1 batches of size 3 starting at 100
correct prob:0.999968
correct prob:0.999975
correct prob:0.999854
correct prob:0.999968
correct prob:0.999975
correct prob:0.999854
Top1: 1, Top5: 1
Processing 3 images averaged 0.0382293 ms/image and 0.114688 ms/batch.
FP16 run:1 batches of size 3 starting at 100
Engine could not be created at this precision
INT8 run:1 batches of size 3 starting at 100
correct prob:0.999952 # look at this
correct prob:0.999965
correct prob:0.999665
correct prob:0.999952
correct prob:0.999965
correct prob:0.999665
=================================================
FP32 run:1 batches of size 3 starting at 100
correct prob:0.999968
correct prob:0.999975
correct prob:0.999854
correct prob:0.999968
correct prob:0.999975
correct prob:0.999854
Top1: 1, Top5: 1
Processing 3 images averaged 0.0404373 ms/image and 0.121312 ms/batch.
FP16 run:1 batches of size 3 starting at 100
Engine could not be created at this precision
INT8 run:1 batches of size 3 starting at 100
correct prob:0.999953 # look at this
correct prob:0.999967
correct prob:0.99967
correct prob:0.999953
correct prob:0.999967
correct prob:0.99967
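To quantify the drift between two runs instead of comparing the logs by eye, the printed probabilities can be collected and compared. This is a rough sketch only: maxAbsDiff and runsAgree are hypothetical helper names, and the tolerance is an arbitrary choice, not a value recommended by TensorRT.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest absolute difference between the
// probabilities of two runs, compared element by element over the
// common prefix of the two vectors.
float maxAbsDiff(const std::vector<float>& a, const std::vector<float>& b)
{
    float worst = 0.0f;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i)
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    return worst;
}

// Hypothetical helper: true when both runs have the same number of
// values and every pair agrees within `tol` (an assumed tolerance).
bool runsAgree(const std::vector<float>& a, const std::vector<float>& b, float tol)
{
    return a.size() == b.size() && maxAbsDiff(a, b) <= tol;
}
```

For the values printed above, the two FP32 runs are identical, so maxAbsDiff returns 0. The two INT8 runs {0.999952, 0.999965, 0.999665} and {0.999953, 0.999967, 0.99967} differ by at most about 5e-6 as printed.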