Model quantized multiple times in INT8 mode


If I re-calibrate the model with the same dataset every time, instead of reusing the calibration cache already obtained, will the inference results be exactly the same in INT8 mode? I found that if I use the same dataset to quantize the same model multiple times, the results are not completely consistent across runs; there are minor differences. What is the reason for this? Are the quantization parameters obtained each time not exactly the same, or is there some other cause?
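For context on why reusing the cache matters: INT8 calibration derives per-tensor scale factors from activation statistics, and recomputing those statistics can give slightly different values run to run (batch ordering, floating-point reduction order on the GPU), whereas a stored calibration cache is reused bit-for-bit. Below is a minimal pure-Python sketch of that caching pattern; the function names mirror TensorRT's `IInt8Calibrator` cache methods but are hypothetical stand-ins, and the max-abs scale is a deliberate simplification of the real entropy calibration.

```python
import os
import struct
import tempfile

def compute_scale(samples):
    # Symmetric int8 scale from the max absolute calibration value.
    # (A simplified stand-in for TensorRT's entropy calibration,
    # which minimizes KL divergence over activation histograms.)
    return max(abs(x) for x in samples) / 127.0

def write_calibration_cache(path, scale):
    # Analogous to IInt8Calibrator.writeCalibrationCache: persist the
    # calibration result so later engine builds can skip recalibration.
    with open(path, "wb") as f:
        f.write(struct.pack("<d", scale))

def read_calibration_cache(path):
    # Analogous to IInt8Calibrator.readCalibrationCache: when a cache
    # exists, the builder uses it and does NOT recalibrate.
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return struct.unpack("<d", f.read(8))[0]

calib_data = [0.3, -1.7, 0.9, 2.4, -0.2]
cache = os.path.join(tempfile.mkdtemp(), "calib.cache")

# First build: no cache yet, so calibration runs and the result is stored.
scale1 = read_calibration_cache(cache)
if scale1 is None:
    scale1 = compute_scale(calib_data)
    write_calibration_cache(cache, scale1)

# Second build: the cache is read back, so the scale is bit-identical.
scale2 = read_calibration_cache(cache)
print(scale2 == scale1)
```

Note that even with identical calibration scales, engine builds may still differ slightly because the builder's kernel (tactic) selection is timing-based and not guaranteed deterministic, so reusing the cache removes only the calibration source of variation.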


TensorRT Version: 5.1.5
GPU Type: 2060
Nvidia Driver Version:
CUDA Version: 10.0
CUDNN Version: 7.5.0
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi @lyw199441,
Would you mind trying the same on the latest TRT release?

If the issue persists, please share your model and script along with the verbose logs.
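One way to capture the requested verbose build logs is `trtexec`; a sketch of the invocation is below (the model and cache file names are placeholders, and the exact flags available depend on the TRT version installed):

```shell
# Build an INT8 engine using an existing calibration cache,
# with verbose logging captured to a file.
trtexec --onnx=model.onnx --int8 --calib=calib.cache --verbose 2>&1 | tee build.log
```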