How to do INT8 calibration in C++ in TensorRT 5?

I am using TensorRT 5 and trying to add code for INT8 quantization. I tried adding the following lines to baseEngine, but they give me an error:

builder->setInt8Mode(true);
IInt8Calibrator* calibrator;
builder->setInt8Calibrator(calibrator);

WARNING: Int8 mode specified but no calibrator specified. Please ensure that you supply Int8 scales for the network layers manually.
ERROR: Calibration failure occured with no scaling factors detected. This could be due to no int8 calibrator or insufficient custom scales for network layers. Please see int8 sample to setup calibration correctly.

I have already gone through the documentation (PDF) and every other source I could find. If someone could share sample code for a calibrator, I would be able to sleep properly.

Can you point me in the right direction? I already have TensorRT code running on a GTX 1050.

Thanks.

At least the Intel guys reply to queries on the OpenVINO platform.

Hello,

Did you implement the IInt8Calibrator interface? You must also provide methods to read the calibration images.
This C++ API documentation may be helpful: https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/c_api/classnvinfer1_1_1_i_int8_calibrator.html#ac3411d2c629cc77f05b87691b6adfe0b

Thanks for replying.

Please correct me if my understanding of the concept is wrong. I am a beginner and just want to run my first INT8 calibration example in TensorRT.

First Step

First I have to create a calibration dataset.
I went through the sample dataset shipped with the samples in /tensorRT/data/mnist/batches/ and have two questions:

Q1. What is the format of the calibration dataset, so that I can create my own?
Q2. How many images have to be in this calibration dataset?
If my validation dataset has 10,000 images, how large should the calibration dataset be?

Second Step

Generate “CalibrationTable” and INT8 execution engine.

Q1. I can see the CalibrationTable file that was created after I ran the sampleInt8 example. So the next time I run the program, will it skip calibration and automatically use the calibration table for INT8 inference?

Hello,
First step

Q1: The calibration dataset is a set of inputs that is representative of your dataset. You can randomly pick a number of samples from your dataset and feed them to calibration the same way you feed data for inference.
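
A minimal sketch of the random pick described above, using only the C++ standard library. The helper name `pickCalibrationSubset` and the idea of working on a list of file paths are assumptions for illustration; they are not part of TensorRT.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper (not a TensorRT API): returns up to `count` distinct
// entries chosen at random from `files`.
std::vector<std::string> pickCalibrationSubset(std::vector<std::string> files,
                                               std::size_t count,
                                               unsigned seed)
{
    // A fixed seed keeps the chosen subset reproducible between runs.
    std::mt19937 rng(seed);
    std::shuffle(files.begin(), files.end(), rng);

    // Keep only the first `count` shuffled entries; if fewer files exist,
    // all of them are returned.
    if (count < files.size())
        files.resize(count);
    return files;
}
```

The selected files would then go through the same pre-processing as the inference inputs before being handed to the calibrator.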

Q2: It depends on the size and variety of your dataset. You should run a small number of experiments with different amounts of calibration data, from small to large. Usually the accuracy of your results improves as the amount of calibration data grows, until it converges to the accuracy of FP32.
Remember to implement the writeCalibrationCache() function so that the calibration table is saved for each experiment.

Second step

Q1: Correct. If the calibration table is found, the readCalibrationCache() function reads it and the calibration process is skipped. Otherwise, a new calibration table is built.
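
The file handling behind that cache logic can be sketched in plain C++ as follows. `loadCalibrationCache` and `saveCalibrationCache` are hypothetical helper names, not TensorRT functions. The assumption is that your calibrator's readCalibrationCache() keeps the loaded buffer alive as a member and hands back its pointer and length (or a null pointer when nothing was loaded), and that writeCalibrationCache() forwards the pointer and length it receives to the save helper. Please check the API reference for the exact signatures in your TensorRT version.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: load a previously saved calibration table.
// Returns an empty vector when the file is missing or empty, which the
// caller can treat as "no cache available, run calibration".
std::vector<char> loadCalibrationCache(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return {};
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Hypothetical helper: save `length` bytes of calibration data to `path`.
// Returns true only if the whole buffer was written.
bool saveCalibrationCache(const std::string& path, const void* data, std::size_t length)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        return false;
    out.write(static_cast<const char*>(data), static_cast<std::streamsize>(length));
    return static_cast<bool>(out);
}
```

Using a different cache file name per experiment keeps the tables from overwriting each other.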

See https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#optimizing_int8_c

Hi NVESJ,

I am still struggling to understand how to create the calibration dataset. I am using the Python API and can’t figure out how the calibration batch files are created. Is there a quick sample I can refer to that takes an input image and saves it in a calibration file format?

Also note that I am doing this for YOLO. I am not sure, but maybe this affects how the labels have to be stored for calibration as well?

Exactly the same question. To be honest, I do not expect to get useful information from them in this forum…

Hi,

Sorry for the delay. For an example of deriving from IInt8EntropyCalibrator2 for Int8 calibration with the Python API, you can see this post for now: https://devtalk.nvidia.com/default/topic/1065026/tensorrt/tensorrt6-dynamic-input-size-does-not-support-int8-with-calibrator-/post/5393304/#5393304

Though some parts of the implementation may vary depending on your model, usually based on input shape / pre-processing, etc.

You can also see a C++ example in the open source GitHub repo here: https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleINT8

Some of the other samples in that repo use calibration as well. These samples also come with the TensorRT installation (.deb/.tar) and the NGC container (https://ngc.nvidia.com/catalog/containers/nvidia:tensorrt).
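
On the earlier question of how batch files can be produced: below is a rough C++ sketch that packs already pre-processed float data into a binary file and reads it back. The layout (four 32-bit integers N, C, H, W followed by N*C*H*W floats) and the names `BatchDims`, `writeBatchFile` and `readBatchFile` are assumptions made for illustration. The reader in sampleINT8 may expect additional fields, so compare against its source before reusing files between the two. No labels are stored, on the assumption that calibration only needs the network inputs.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical header for one batch file: batch size, channels, height, width.
struct BatchDims { std::int32_t n, c, h, w; };

static bool validDims(const BatchDims& d)
{
    return d.n > 0 && d.c > 0 && d.h > 0 && d.w > 0;
}

static std::size_t volume(const BatchDims& d)
{
    return static_cast<std::size_t>(d.n) * d.c * d.h * d.w;
}

// Writes the header followed by the raw floats. Fails if the dimensions are
// invalid or do not match the amount of data supplied.
bool writeBatchFile(const std::string& path, const BatchDims& d,
                    const std::vector<float>& data)
{
    if (!validDims(d) || data.size() != volume(d))
        return false;
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        return false;
    out.write(reinterpret_cast<const char*>(&d), sizeof(d));
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size() * sizeof(float)));
    return static_cast<bool>(out);
}

// Reads a file produced by writeBatchFile. Fails on a missing file,
// an invalid header, or truncated data.
bool readBatchFile(const std::string& path, BatchDims& d, std::vector<float>& data)
{
    std::ifstream in(path, std::ios::binary);
    if (!in.read(reinterpret_cast<char*>(&d), sizeof(d)) || !validDims(d))
        return false;
    data.resize(volume(d));
    in.read(reinterpret_cast<char*>(data.data()),
            static_cast<std::streamsize>(data.size() * sizeof(float)));
    return static_cast<bool>(in);
}
```

The floats written here should already be resized and normalized exactly as the network expects at inference time, since the calibrator copies them to the GPU as they are.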

I followed sampleINT8, but the result is totally wrong. Has anyone run into the same issue?