INT8 calibration file not generated; engine not building in INT8 mode

Description

I’m trying to build FP32, FP16, and INT8 optimised engines for a ResNet-50 model converted to ONNX. FP32 and FP16 are working fine.
INT8 optimisation is not working: no calibration cache file is generated. I have followed the steps given in int8_sample.

Kindly help me build an optimised engine file in INT8 mode.

Environment

TensorRT Version: 8.2.1.8-1+cuda10.2
GPU Type: Jetson Nano
Nvidia Driver Version: CUDA Driver 10.2
CUDA Version: cuda-toolkit-10-2 (= 10.2.460-1)
CUDNN Version: 8.2
Operating System + Version: Ubuntu 18.04 (L4T with JetPack)
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

----main file----

def main():
    # Initialize the TensorRT engine and parse the ONNX model.
    print('******************************')
    print('Started building engine...')

    cache_file = 'INT8/resnet50_int8_calibration.cache'
    # Using 100 sample images randomly downloaded from the ImageNet dataset.
    training_set = 'imagenet/imagenet_images/'
    img_per_batch = 5
    int8_calibrator = Int8Calibrator(training_set, cache_file=cache_file, batch_size=img_per_batch)
    engine = build_engine(ONNX_FILE_PATH, int8_calibrator)

----builder config----

config = builder.create_builder_config()
#config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 20)
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED

# Calibration config
if builder.platform_has_fast_int8:
    print('Yes! Continuing in INT8 mode')
    config.set_flag(trt.BuilderFlag.INT8)
    config.int8_calibrator = int8_calibrator
else:
    # NOTE: a bare `exit` here is a no-op (the function is named, not called),
    # so on platforms without fast INT8 support (e.g. Jetson Nano's Maxwell GPU)
    # the build silently continues in the default FP32 mode. Call sys.exit()
    # (requires `import sys`) to actually abort.
    sys.exit('Platform does not support fast INT8; aborting.')
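For context, the build_engine function called from main() is not shown above. A minimal sketch of what it might look like on the TensorRT 8.x Python API (TRT_LOGGER and the explicit-batch handling are assumptions, not the exact original code; the prints mirror the log output later in this thread):

import sys
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_file_path, int8_calibrator):
    builder = trt.Builder(TRT_LOGGER)
    # ONNX parsing requires an explicit-batch network definition.
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    print('Beginning ONNX file parsing')
    with open(onnx_file_path, 'rb') as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None
    print('Completed parsing of ONNX file')

    config = builder.create_builder_config()
    config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
    if builder.platform_has_fast_int8:
        print('Yes! Continuing in INT8 mode')
        config.set_flag(trt.BuilderFlag.INT8)
        config.int8_calibrator = int8_calibrator
    else:
        sys.exit('Platform does not support fast INT8; aborting.')

    print('Building an engine...')
    return builder.build_engine(network, config)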

----custom calibration file----

import tensorrt as trt
import os
import pycuda.driver as cuda
import pycuda.autoinit
from PIL import Image
import numpy as np

def preprocess_image_here(input_image_path):
    # Force 3-channel RGB so greyscale/RGBA images do not break the layout change below.
    image = Image.open(input_image_path).convert('RGB')
    h, w = (224, 224)
    image_arr = np.asarray(image.resize((w, h), Image.ANTIALIAS))
    # HWC -> CHW. Note: transpose, not reshape -- reshape(3, h, w) would scramble the pixels.
    image_arr = image_arr.transpose(2, 0, 1)
    # This particular model requires some preprocessing, specifically mean normalization.
    # Cast to float32: calibration batches must match the network's input dtype.
    input_img = ((image_arr / 255.0 - 0.45) / 0.225).astype(np.float32)
    return input_img

class Int8Calibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, training_data, cache_file, batch_size):
        # Whenever you specify a custom constructor for a TensorRT class,
        # you MUST call the constructor of the parent explicitly.
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.cache_file = cache_file

        # Preprocess every calibration image up front. Every time get_batch is
        # called, the next batch_size images are copied to the device and returned.
        self.data = []
        for root, dirs, files in os.walk(training_data):
            for file in files:
                img = os.path.join(root, file)
                self.data.append(preprocess_image_here(img))
        self.data = np.array(self.data)
        print('Inside the calibrator...')
        self.batch_size = batch_size
        self.current_index = 0
        # Allocate enough device memory for a whole batch.
        self.device_input = cuda.mem_alloc(self.data[0].nbytes * self.batch_size)

    def get_batch_size(self):
        return self.batch_size

    # TensorRT passes along the names of the engine bindings to the get_batch function.
    # You don't necessarily have to use them, but they can be useful to understand the order of
    # the inputs. The bindings list is expected to have the same ordering as 'names'.
    def get_batch(self, names):
        # Returning None signals TensorRT that the calibration data is exhausted.
        if self.current_index + self.batch_size > self.data.shape[0]:
            print('\tinside get_batch cond 1')
            return None

        current_batch = int(self.current_index / self.batch_size)
        if current_batch % self.batch_size == 0:
            print("Calibrating batch {:}, containing {:} images".format(current_batch, self.batch_size))

        batch = self.data[self.current_index:self.current_index + self.batch_size].ravel()
        cuda.memcpy_htod(self.device_input, batch)
        self.current_index += self.batch_size
        print('\tinside get_batch')
        return [self.device_input]

    def read_calibration_cache(self):
        print('\t inside read_calib_cache')
        # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        print('\t inside write_calib_cache')
        with open(self.cache_file, "wb") as f:
            f.write(cache)

Steps To Reproduce

  • No error during the build

  • The engine is not built in INT8 mode; it is built in the default FP32 mode (a quick way to verify the precision actually used is sketched below)
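One way to confirm which precision each layer actually runs in is TensorRT's engine inspector (this is what the DETAILED profiling verbosity in the builder config enables). A minimal sketch, assuming TensorRT ≥ 8.2 and an engine built with detailed verbosity:

import tensorrt as trt

# 'engine' is the ICudaEngine returned by build_engine() above.
inspector = engine.create_engine_inspector()
# Dumps per-layer information, including the precision each layer executes in.
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))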

Hi, please refer to the links below to perform inference in INT8

Thanks!

Hi @NVES,
I have already referred to the resources shared above. I am doing this in Python code,
and for that I referred to the sample Python application provided for INT8 calibration: TensorRT/samples/python/int8_caffe_mnist at main · NVIDIA/TensorRT · GitHub

But nothing is working: there is no error during the build, yet it is not building in INT8 mode.
Kindly refer to the code snippets I attached, and help identify the issue so the engine builds in INT8 mode.

Hi,

Are you using this on custom data (not MNIST)? Could you please share sample data and the complete script with us, so we can try it from our end for better debugging?

Thank you.

int8_calibration_tensorrt.zip (4.5 MB)

I’m using the ResNet-50 model with the ImageNet dataset. I have attached the INT8 calibration code along with the sample images used for calibration.

Kindly check it and help me get the INT8 optimization working properly.

Hi,

Could you please share the resnet50.onnx file with us? When we try with our own model, we face some issues building the engine.
I have reviewed your script and found no issues at first glance. Could you also please try the latest TensorRT version, 8.4, first? I believe there is some issue in building the engine or in processing the input data. Please share the output logs as well if you still face this issue.

Thank you.

Hi @spolisetty,

I’m not getting any errors or issues while building the engine file.
Please find the requested ResNet-50 ONNX file here: resnet50_onnx_file

Hi,

We tried running your script on the latest TensorRT version 8.4 and could not reproduce the issue; we successfully generated the resnet50_int8_calibration.cache file.

resnet50_int8_calibration.cache (1.7 KB)


Started building engine…
Inside the calibrator…
Yes! Continuing in INT8 mode
Beginning ONNX file parsing
Completed parsing of ONNX file
Building an engine…
inside read_calib_cache
[05/25/2022-14:08:49] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.7.3
Calibrating batch 0, containing 5 images
inside get_batch
inside get_batch
inside get_batch
inside get_batch
inside get_batch
Calibrating batch 5, containing 5 images
inside get_batch
inside get_batch
inside get_batch
inside get_batch
inside get_batch
Calibrating batch 10, containing 5 images
inside get_batch
inside get_batch
inside get_batch
inside get_batch
inside get_batch
Calibrating batch 15, containing 5 images
inside get_batch
inside get_batch cond 1
inside read_calib_cache
inside write_calib_cache
[05/25/2022-14:09:04] [TRT] [W] Missing scale and zero-point for tensor 494, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[05/25/2022-14:09:04] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 121) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[05/25/2022-14:09:04] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 122) [Matrix Multiply]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[05/25/2022-14:09:04] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 123) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[05/25/2022-14:09:04] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 124) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[05/25/2022-14:09:06] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.7.3
Completed creating Engine


We recommend that you use the latest TensorRT version.
https://developer.nvidia.com/nvidia-tensorrt-8x-download
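To check which TensorRT version is currently installed, the standard Python bindings expose it directly:

import tensorrt as trt
print(trt.__version__)  # e.g. '8.2.1.8' on the JetPack setup described above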

Thank you.

Hi @spolisetty,

Did you make any changes to the code I shared?
Could you share the code and dataset you tried?
That would be helpful for replicating your result.

@soundarrajan,

I have not made any changes; I ran the script you shared as-is to reproduce the issue. For me, it worked fine on TensorRT v8.4 EA. I believe some known issue has been fixed, but currently I do not have those details.

Thank you.

@spolisetty ,

Anyway, I tried with the shared cache file, but it did not work on TensorRT 8.2. I will have to try the recommended version.

I want to build a DeepStream pipeline with the same optimised model, i.e. a real-time image classification pipeline with DeepStream. Could you check this: Resnet50 with imagenet dataset image classification using deepstream sdk

@spolisetty ,

Can you please share the generated INT8 optimised engine file?

Hi,

The generated engine files are not portable across platforms or TensorRT versions. They are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version) and must be rebuilt on the target GPU if you want to run them there.
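In practice this means you build and serialize the engine on the target device itself. A minimal sketch of saving and reloading a plan file with the TensorRT 8.x Python runtime API (the file name is illustrative):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# On the target device, after build_engine() succeeds:
with open('resnet50_int8.plan', 'wb') as f:
    f.write(engine.serialize())

# Later, on the SAME device and TensorRT version, reload it:
runtime = trt.Runtime(TRT_LOGGER)
with open('resnet50_int8.plan', 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())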

Regarding the Deepstream post, we recommend you to please wait for the Deepstream team’s response.

Thank you.

Hi @spolisetty,

On which hardware are you trying INT8 quantisation?
I’m using a Jetson Nano board. Will it support INT8 quantisation?

Hi,

I am using V100 GPUs. Please check the support matrix for hardware and INT8 compatibility.

Thank you.

OK, it seems the Jetson Nano devkit doesn't support INT8 quantization (its Maxwell GPU lacks fast INT8 support, which is why the build silently fell back to FP32).
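For reference, this can be checked programmatically before attempting an INT8 build; a minimal sketch using the same TensorRT Python API the builder config above relies on:

import tensorrt as trt

builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
print('fast FP16 supported:', builder.platform_has_fast_fp16)
print('fast INT8 supported:', builder.platform_has_fast_int8)  # False on Jetson Nano (Maxwell)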

Thanks for your support.