Converting ONNX model to INT8

Description

I am trying to convert an FP32 ONNX model to INT8. One technique for conversion is to provide a file with the dynamic range of each tensor, which is then used when building the engine. I am trying to find a Python example of capturing the dynamic range, but have not found one yet. I assume I run my validation set through the network and save the min/max for each tensor. Could you point me to an example? Thanks!
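
Something along the lines of the sketch below is what I have in mind (the tiny Sequential model and the random "validation" batches are just stand-ins for my real network and data):

```python
# Sketch: run the validation set through the PyTorch model and record the
# min/max of every leaf module's output. The model and batches below are
# stand-ins for my real network and validation data.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3))  # stand-in
val_batches = [torch.randn(4, 3, 32, 32) for _ in range(10)]               # stand-in

ranges = {}  # module name -> (running min, running max) of its output

def make_hook(name):
    def hook(module, inputs, output):
        if isinstance(output, torch.Tensor):
            lo, hi = output.min().item(), output.max().item()
            old_lo, old_hi = ranges.get(name, (lo, hi))
            ranges[name] = (min(lo, old_lo), max(hi, old_hi))
    return hook

handles = [m.register_forward_hook(make_hook(name))
           for name, m in model.named_modules()
           if len(list(m.children())) == 0]   # leaf modules only

model.eval()
with torch.no_grad():
    for batch in val_batches:
        model(batch)

for h in handles:
    h.remove()

print(ranges)  # observed per-tensor min/max over the "validation" data
```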

Environment

TensorRT Version:
GPU Type: AGX Xavier
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.8.0
Baremetal or Container (if container which image + tag):

Hi, please refer to the link below to perform inference in INT8:
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleINT8/README.md
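
For reference, a minimal Python calibrator along the lines of what that sample demonstrates (the sample itself is C++) might look like the sketch below; the calibration batches, input copying, and cache file name are placeholders:

```python
# Minimal sketch of an INT8 entropy calibrator in Python. The calibration
# batches (NCHW float32 numpy arrays) and cache file name are placeholders.
import numpy as np
import pycuda.autoinit  # noqa: F401 (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

class MyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, cache_file="calib.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(batches)
        self.cache_file = cache_file
        first = batches[0]
        self.batch_size = first.shape[0]
        self.device_input = cuda.mem_alloc(first.nbytes)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None                       # no more data -> calibration done
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

The calibrator would then be attached to the builder config via `config.set_flag(trt.BuilderFlag.INT8)` and `config.int8_calibrator = MyCalibrator(batches)`.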

Thanks!

I have a couple of issues:

  1. I have read through the sampleINT8 page a few times and got to this section:

https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleINT8/README.md#calibration-data

There is a link that says 'Generation of these files is discussed in Batch files for calibration', but that link appears to be broken, so I cannot find any further information.

  2. Instead of the calibration approach, I was also looking at providing the dynamic range for each tensor directly. Is this just the minimum and maximum value in each tensor? Are the min and max always positive (meaning just the magnitude)?

Hi @jseng,

Hope the following doc helps you; it talks about configuring the network to use custom dynamic ranges:
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleINT8API/README.md#configuring-the-network-to-[…]-and-set-per-layer-precision
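
As a rough Python sketch of that approach (the ONNX file name and the per-tensor range values are placeholders; as in the C++ sample, a single absolute-max value per tensor is applied symmetrically as (-max, max)):

```python
# Sketch: set custom per-tensor dynamic ranges on a network parsed from ONNX.
# "model.onnx" and the contents of `ranges` are placeholders.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

# tensor name -> absolute max observed on the validation set
# (e.g. loaded from the per-tensor range file)
ranges = {}

# Apply ranges to network inputs...
for i in range(network.num_inputs):
    t = network.get_input(i)
    if t.name in ranges:
        amax = ranges[t.name]
        t.set_dynamic_range(-amax, amax)

# ...and to every layer output.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    for j in range(layer.num_outputs):
        t = layer.get_output(j)
        if t.name in ranges:
            amax = ranges[t.name]
            t.set_dynamic_range(-amax, amax)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
engine = builder.build_engine(network, config)
```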

Thank you.