Query regarding Cache file in INT8 optimization in trtexec

Description

Hi everyone,

I have a question regarding the cache file for INT8 optimization while converting a .onnx file to a .trt file.

While exploring the TensorRT GitHub repository, I came across the Polygraphy tool. In one of its examples, it demonstrates how to create a calibration cache file in a relatively simple way compared to other C++ based methods. Can I use this approach to generate my calibration file?

Additionally, I tested an FP32 TensorRT model on my dataset, and it produced nearly perfect results. However, when I switched to INT8 precision without calibration, the model’s output was incorrect. After applying INT8 quantization with calibration, the results improved compared to the uncalibrated INT8 model, but they were still not as good as the FP32 model.

Is my approach valid? Also, is there a better way to enhance the accuracy of the INT8 model?

Thanks in advance!

Hi @athern27 ,
Please consider the following pointers

  1. Calibration Cache File Generation for INT8 Optimization:
  • You may use the approach demonstrated in the Polygraphy example from the TensorRT GitHub repository to generate your calibration cache file, but there are some considerations:
    • Generating a serialized engine file is beneficial for use in other inference applications.
    • If a calibration cache file isn’t provided, TensorRT cannot determine the dynamic ranges of the network tensors and falls back to arbitrary placeholder ranges, which can severely degrade model accuracy.
    • Ensure compatibility of the calibration cache across different devices and TensorRT versions, especially when considering layer fusion.
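The Polygraphy route mentioned above mainly needs a data loader that yields feed dictionaries of representative inputs. A minimal sketch, assuming a single input tensor named "input" with shape 1x3x224x224 (adjust both to your model):

```python
import numpy as np

def load_data(num_batches=10):
    """Yield feed dicts of calibration inputs for Polygraphy.

    Assumes one input tensor named "input" with shape (1, 3, 224, 224);
    replace the random data below with real, preprocessed samples from
    your dataset so the collected histograms reflect deployment data.
    """
    for _ in range(num_batches):
        # Random data is only a placeholder to make the sketch runnable.
        yield {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
```

Saved as e.g. `data_loader.py` (Polygraphy looks for a function named `load_data` by default), this can drive `polygraphy convert model.onnx --int8 --data-loader-script data_loader.py --calibration-cache calib.cache -o model_int8.engine`; the resulting `calib.cache` can then also be reused with `trtexec --calib=calib.cache`.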
  2. Performance Expectations and Accuracy Enhancement:
  • It is expected that the calibrated INT8 model outperforms the uncalibrated one; calibration is critical for preserving accuracy under INT8 quantization.
  • To enhance the accuracy of your INT8 model:
    • Use the appropriate calibrator for your specific network type.
    • Understand the overall calibration process, including building a 32-bit engine and creating a calibration table.
    • Cache calibration tables for efficiency, especially if building the same network multiple times.
    • Use representative input data for calibration to capture necessary information.
    • Experiment with different calibration batch sizes to find the optimal setting.
    • Monitor and analyze model performance post-calibration to assess improvements.
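For the last point, a simple way to quantify the remaining INT8 accuracy gap is to compare FP32 and INT8 outputs on the same inputs. A small numpy sketch (the two arrays stand in for whatever your FP32 and INT8 engines actually produce):

```python
import numpy as np

def quantization_error(fp32_out, int8_out):
    """Return (max absolute error, cosine similarity) between two outputs."""
    fp32 = np.asarray(fp32_out, dtype=np.float64).ravel()
    int8 = np.asarray(int8_out, dtype=np.float64).ravel()
    max_abs_err = np.max(np.abs(fp32 - int8))
    # Cosine similarity near 1.0 means the INT8 output closely tracks FP32.
    cos_sim = np.dot(fp32, int8) / (np.linalg.norm(fp32) * np.linalg.norm(int8))
    return max_abs_err, cos_sim
```

Running representative batches through both engines and tracking these metrics helps confirm whether calibration actually closed the gap; if it does not, trying more representative calibration data, or building with both `--fp16` and `--int8` so TensorRT can keep sensitive layers in higher precision, are common next steps.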

Hope this helps.