TensorRT INT8 calibration in C++ API

hananana · September 30, 2021, 5:41am

Description

I have my own ONNX network and want to run it in INT8 quantized mode in a TensorRT 7 environment (C++).
I have already run this ONNX model successfully with “config->setFlag(nvinfer1::BuilderFlag::kFP16)”.

I googled and found NVIDIA's TensorRT MNIST INT8 example here.
However, it uses the MNISTBatchStream class, which is specific to MNIST rather than general-purpose.
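For readers with the same problem, here is a minimal sketch of a batch stream that is not tied to MNIST. The class name ImageBatchStream, its methods, and the loader callback are hypothetical and are not part of the TensorRT samples; decoding the JPG/PNG file (for example with an image library of your choice) is left to the callback.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical general-purpose replacement for MNISTBatchStream: serves
// fixed-size batches of preprocessed images from a list of file paths.
class ImageBatchStream
{
public:
    // The loader decodes one file and returns exactly imageVolume floats (C*H*W).
    using Loader = std::function<std::vector<float>(const std::string&)>;

    ImageBatchStream(std::vector<std::string> files, std::size_t batchSize,
                     std::size_t imageVolume, Loader loader)
        : mFiles(std::move(files))
        , mBatchSize(batchSize)
        , mImageVolume(imageVolume)
        , mLoader(std::move(loader))
        , mBatch(batchSize * imageVolume)
    {
    }

    // Loads the next batch into the internal host buffer.
    // Returns false when fewer than batchSize files remain or a file has the wrong size.
    bool next()
    {
        if (mCursor + mBatchSize > mFiles.size())
        {
            return false;
        }
        for (std::size_t i = 0; i < mBatchSize; ++i)
        {
            const std::vector<float> image = mLoader(mFiles[mCursor + i]);
            if (image.size() != mImageVolume)
            {
                return false;
            }
            std::copy(image.begin(), image.end(), mBatch.begin() + i * mImageVolume);
        }
        mCursor += mBatchSize;
        return true;
    }

    // Host pointer to the most recently loaded batch (batchSize * imageVolume floats).
    const float* getBatch() const { return mBatch.data(); }
    std::size_t getBatchSize() const { return mBatchSize; }
    void reset() { mCursor = 0; }

private:
    std::vector<std::string> mFiles;
    std::size_t mBatchSize;
    std::size_t mImageVolume;
    Loader mLoader;
    std::vector<float> mBatch;
    std::size_t mCursor{0};
};
```

A calibrator's getBatch implementation would then call next(), copy getBatch() to device memory, and hand the device pointer to TensorRT, much as the MNIST sample does with its own stream class.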

So my questions are:

1. Can you give me a more general example that uses ordinary image files such as JPG?
2. How many calibration images (JPG, PNG, ...) are needed to calibrate my ONNX model?
(Does it need the whole training set or just a few images from it?)
3. Do I need to pre-process the calibration inputs?
(I applied ((input-mean)/std) normalization during model training.)
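To make the ((input-mean)/std) step concrete, here is a minimal sketch of how a decoded image could be prepared before it is used as a calibration input. It assumes an interleaved 8-bit HWC image and a network that expects planar CHW floats; the function name normalizeToCHW and the scale parameter (for example 1/255 if training scaled pixels to [0, 1]) are assumptions, and the values must match whatever was used during training.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts an interleaved 8-bit HWC image to planar CHW floats and applies
// (pixel * scale - mean[c]) / stddev[c] for each channel c.
// mean and stddev must each hold one value per channel.
std::vector<float> normalizeToCHW(const std::uint8_t* hwc, std::size_t height, std::size_t width,
                                  std::size_t channels, float scale,
                                  const std::vector<float>& mean, const std::vector<float>& stddev)
{
    std::vector<float> chw(channels * height * width);
    for (std::size_t c = 0; c < channels; ++c)
    {
        for (std::size_t y = 0; y < height; ++y)
        {
            for (std::size_t x = 0; x < width; ++x)
            {
                // Source index in HWC layout, destination index in CHW layout.
                const std::size_t src = (y * width + x) * channels + c;
                const std::size_t dst = (c * height + y) * width + x;
                const float value = static_cast<float>(hwc[src]) * scale;
                chw[dst] = (value - mean[c]) / stddev[c];
            }
        }
    }
    return chw;
}
```

The design point is that calibration data should pass through the same transformation as inference data, so that the activation ranges collected during calibration reflect what the engine will see at run time.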

Please help me.
Thanks.

Environment

TensorRT Version: TensorRT7
GPU Type: GTX1650
Nvidia Driver Version: 460.x
CUDA Version: 11.1
CUDNN Version: 8.4x
Operating System + Version: ubuntu18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): tensorrt20:10-py3


NVES · September 30, 2021, 6:07am

Hi, please refer to the links below to perform inference in INT8.

Thanks!

hananana · February 14, 2022, 1:41am

Dear NVES,
The thing is that I want to use the INT8 data type at the network interface (inputs and outputs).
I have already quantized my ONNX model to INT8 and run inference with it.

For example, in the sampleINT8 example, I want to change
the line in the SampleINT8::processInput function
from

std::memcpy(hostDataBuffer, data, mParams.batchSize * samplesCommon::volume(mInputDims) * sizeof(float));

to

std::memcpy(hostDataBuffer, data, mParams.batchSize * samplesCommon::volume(mInputDims) * sizeof(signed char));

Is this possible?
If so, how can I do it?

I’ve already tried the sampleUffPluginV2Ext example, but it writes the INT8 outputs into FP32 buffers, so the code copies the data as FP32 (4 bytes per output value).
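For context, copying sizeof(signed char) bytes per element only makes sense if the host data has already been quantized to 8 bits. Below is a minimal sketch of that host-side conversion, assuming symmetric per-tensor quantization with a known dynamic range; the function name quantizeToInt8 and the dynamicRangeMax parameter are hypothetical. Whether the engine accepts such a buffer depends on how the network input was configured at build time. TensorRT exposes ITensor::setType and ITensor::setDynamicRange for that purpose, but nothing in this thread confirms that setup for the sample.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Quantizes floats to signed 8-bit values using a symmetric per-tensor scale.
// dynamicRangeMax is the largest absolute input value the tensor is expected to
// hold (an assumption; it must match the range the engine was built with).
std::vector<std::int8_t> quantizeToInt8(const float* data, std::size_t count, float dynamicRangeMax)
{
    const float scale = dynamicRangeMax / 127.0f;
    std::vector<std::int8_t> quantized(count);
    for (std::size_t i = 0; i < count; ++i)
    {
        // Round to nearest, then saturate to the representable INT8 range.
        float q = std::round(data[i] / scale);
        q = std::max(-128.0f, std::min(127.0f, q));
        quantized[i] = static_cast<std::int8_t>(q);
    }
    return quantized;
}
```

The resulting vector holds one byte per element, so a subsequent std::memcpy into the host buffer would use count * sizeof(std::int8_t) bytes rather than count * sizeof(float).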
