INT8 calibration file not generated; engine not building in INT8 mode

Description

I’m trying to build FP32, FP16, and INT8 optimised engines for a ResNet-50 model converted to ONNX. FP32 and FP16 are working fine.
INT8 optimisation is not working: no calibration cache file is generated. I have followed the steps given in int8_sample.

Kindly help me build an optimised engine file in INT8 mode.

Environment

TensorRT Version: 8.2.1.8-1+cuda10.2
GPU Type: Jetson Nano
Nvidia Driver Version: CUDA Driver 10.2
CUDA Version: cuda-toolkit-10-2 (= 10.2.460-1)
CUDNN Version: 8.2
Operating System + Version: Ubuntu 18.04 (L4T with JetPack)
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

----main file----

def main():
    # Initialize the TensorRT engine and parse the ONNX model.
    print('******************************')
    print('Started building engine...')

    cache_file = 'INT8/resnet50_int8_calibration.cache'
    # Using 100 sample images randomly downloaded from the ImageNet dataset.
    training_set = 'imagenet/imagenet_images/'
    img_per_batch = 5
    int8_calibrator = Int8Calibrator(training_set, cache_file=cache_file, batch_size=img_per_batch)
    engine = build_engine(ONNX_FILE_PATH, int8_calibrator)

----builder config----

config = builder.create_builder_config()
#config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 20)
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED

# Calibration config
if builder.platform_has_fast_int8:
    print('Yes! Continuing in INT8 mode')
    config.set_flag(trt.BuilderFlag.INT8)
    config.int8_calibrator = int8_calibrator
else:
    # NOTE: a bare `exit` here is a no-op (the function is named, not called),
    # so on platforms without fast INT8 support (e.g. Jetson Nano's Maxwell GPU)
    # the build silently continues in the default FP32 mode. Call sys.exit()
    # (requires `import sys`) to actually abort.
    sys.exit('Platform does not support fast INT8; aborting.')
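For context, the build_engine function called from main() is not shown above. A minimal sketch of what it might look like on the TensorRT 8.x Python API (TRT_LOGGER and the explicit-batch handling are assumptions, not the exact original code; the prints mirror the log output later in this thread):

import sys
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_file_path, int8_calibrator):
    builder = trt.Builder(TRT_LOGGER)
    # ONNX parsing requires an explicit-batch network definition.
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    print('Beginning ONNX file parsing')
    with open(onnx_file_path, 'rb') as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None
    print('Completed parsing of ONNX file')

    config = builder.create_builder_config()
    config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
    if builder.platform_has_fast_int8:
        print('Yes! Continuing in INT8 mode')
        config.set_flag(trt.BuilderFlag.INT8)
        config.int8_calibrator = int8_calibrator
    else:
        sys.exit('Platform does not support fast INT8; aborting.')

    print('Building an engine...')
    return builder.build_engine(network, config)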

----custom calibration file----

import tensorrt as trt
import os
import pycuda.driver as cuda
import pycuda.autoinit
from PIL import Image
import numpy as np

def preprocess_image_here(input_image_path):
    # Force 3-channel RGB so greyscale/RGBA images do not break the layout change below.
    image = Image.open(input_image_path).convert('RGB')
    h, w = (224, 224)
    image_arr = np.asarray(image.resize((w, h), Image.ANTIALIAS))
    # HWC -> CHW. Note: transpose, not reshape -- reshape(3, h, w) would scramble the pixels.
    image_arr = image_arr.transpose(2, 0, 1)
    # This particular model requires some preprocessing, specifically mean normalization.
    # Cast to float32: calibration batches must match the network's input dtype.
    input_img = ((image_arr / 255.0 - 0.45) / 0.225).astype(np.float32)
    return input_img

class Int8Calibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, training_data, cache_file, batch_size):
        # Whenever you specify a custom constructor for a TensorRT class,
        # you MUST call the constructor of the parent explicitly.
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.cache_file = cache_file

        # Preprocess every calibration image up front. Every time get_batch is
        # called, the next batch_size images are copied to the device and returned.
        self.data = []
        for root, dirs, files in os.walk(training_data):
            for file in files:
                img = os.path.join(root, file)
                self.data.append(preprocess_image_here(img))
        self.data = np.array(self.data)
        print('Inside the calibrator...')
        self.batch_size = batch_size
        self.current_index = 0
        # Allocate enough device memory for a whole batch.
        self.device_input = cuda.mem_alloc(self.data[0].nbytes * self.batch_size)

    def get_batch_size(self):
        return self.batch_size

    # TensorRT passes along the names of the engine bindings to the get_batch function.
    # You don't necessarily have to use them, but they can be useful to understand the order of
    # the inputs. The bindings list is expected to have the same ordering as 'names'.
    def get_batch(self, names):
        # Returning None signals TensorRT that the calibration data is exhausted.
        if self.current_index + self.batch_size > self.data.shape[0]:
            print('\tinside get_batch cond 1')
            return None

        current_batch = int(self.current_index / self.batch_size)
        if current_batch % self.batch_size == 0:
            print("Calibrating batch {:}, containing {:} images".format(current_batch, self.batch_size))

        batch = self.data[self.current_index:self.current_index + self.batch_size].ravel()
        cuda.memcpy_htod(self.device_input, batch)
        self.current_index += self.batch_size
        print('\tinside get_batch')
        return [self.device_input]

    def read_calibration_cache(self):
        print('\t inside read_calib_cache')
        # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        print('\t inside write_calib_cache')
        with open(self.cache_file, "wb") as f:
            f.write(cache)

Steps To Reproduce

  • No error during the build

  • The engine is not built in INT8 mode; it is built in the default FP32 mode (a quick way to verify the precision actually used is sketched below)
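One way to confirm which precision each layer actually runs in is TensorRT's engine inspector (this is what the DETAILED profiling verbosity in the builder config enables). A minimal sketch, assuming TensorRT ≥ 8.2 and an engine built with detailed verbosity:

import tensorrt as trt

# 'engine' is the ICudaEngine returned by build_engine() above.
inspector = engine.create_engine_inspector()
# Dumps per-layer information, including the precision each layer executes in.
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))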

Hi, please refer to the links below to perform inference in INT8

Thanks!

Hi @NVES,
I have already referred to the resources shared above. I am doing this in Python code,
and for that I referred to the sample Python application provided for INT8 calibration: TensorRT/samples/python/int8_caffe_mnist at main · NVIDIA/TensorRT · GitHub

But nothing is working: there is no error during the build, yet it is not building in INT8 mode.
Kindly refer to the code snippets I attached, and help identify the issue so the engine builds in INT8 mode.

Hi,

Are you using this on custom data (not MNIST)? Could you please share sample data and the complete script with us, so we can try it from our end for better debugging?

Thank you.

int8_calibration_tensorrt.zip (4.5 MB)

I’m using the ResNet-50 model with the ImageNet dataset. I have attached the INT8 calibration code along with the sample images used for calibration.

Kindly check it and help me get the INT8 optimization working properly.

Hi,

Could you please share the resnet50.onnx file with us? When we try with our own model, we face some issues building the engine.
I have reviewed your script and found no issues at first glance. Could you also please try the latest TensorRT version, 8.4, first? I believe there is some issue in building the engine or in processing the input data. Please share the output logs as well if you still face this issue.

Thank you.

Hi @spolisetty,

I’m not getting any errors or issues while building the engine file.
Please find the requested ResNet-50 ONNX file here: resnet50_onnx_file

Hi,

We tried running your script on the latest TensorRT version 8.4 and could not reproduce the issue; we successfully generated the resnet50_int8_calibration.cache file.

resnet50_int8_calibration.cache (1.7 KB)


Started building engine…
Inside the calibrator…
Yes! Continuing in INT8 mode
Beginning ONNX file parsing
Completed parsing of ONNX file
Building an engine…
inside read_calib_cache
[05/25/2022-14:08:49] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.7.3
Calibrating batch 0, containing 5 images
inside get_batch
inside get_batch
inside get_batch
inside get_batch
inside get_batch
Calibrating batch 5, containing 5 images
inside get_batch
inside get_batch
inside get_batch
inside get_batch
inside get_batch
Calibrating batch 10, containing 5 images
inside get_batch
inside get_batch
inside get_batch
inside get_batch
inside get_batch
Calibrating batch 15, containing 5 images
inside get_batch
inside get_batch cond 1
inside read_calib_cache
inside write_calib_cache
[05/25/2022-14:09:04] [TRT] [W] Missing scale and zero-point for tensor 494, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[05/25/2022-14:09:04] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 121) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[05/25/2022-14:09:04] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 122) [Matrix Multiply]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[05/25/2022-14:09:04] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 123) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[05/25/2022-14:09:04] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 124) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[05/25/2022-14:09:06] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.7.3
Completed creating Engine


We recommend that you use the latest TensorRT version.
https://developer.nvidia.com/nvidia-tensorrt-8x-download
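To check which TensorRT version is currently installed, the standard Python bindings expose it directly:

import tensorrt as trt
print(trt.__version__)  # e.g. '8.2.1.8' on the JetPack setup described above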

Thank you.

Hi @spolisetty,

Did you make any changes to the code I shared?
Could you share the code and dataset you tried?
That would be helpful for replicating your result.

@soundarrajan,

I have not made any changes; I ran the script you shared as-is to reproduce the issue. For me, it worked fine on TensorRT v8.4 EA. I believe some known issue has been fixed, but currently I do not have those details.

Thank you.

@spolisetty ,

Anyway, I tried with the shared cache file, but it did not work on TensorRT 8.2. I will have to try the recommended version.

I want to build a DeepStream pipeline with the same optimised model, i.e. a real-time image classification pipeline with DeepStream. Could you check this: Resnet50 with imagenet dataset image classification using deepstream sdk

@spolisetty ,

Can you please share the generated INT8 optimised engine file?

Hi,

The generated engine files are not portable across platforms or TensorRT versions. They are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version) and must be rebuilt on the target GPU if you want to run them there.
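In practice this means you build and serialize the engine on the target device itself. A minimal sketch of saving and reloading a plan file with the TensorRT 8.x Python runtime API (the file name is illustrative):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# On the target device, after build_engine() succeeds:
with open('resnet50_int8.plan', 'wb') as f:
    f.write(engine.serialize())

# Later, on the SAME device and TensorRT version, reload it:
runtime = trt.Runtime(TRT_LOGGER)
with open('resnet50_int8.plan', 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())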

Regarding the Deepstream post, we recommend you to please wait for the Deepstream team’s response.

Thank you.

Hi @spolisetty,

On which hardware are you trying INT8 quantisation?
I’m using a Jetson Nano board. Will it support INT8 quantisation?

Hi,

I am using V100 GPUs. Please check the support matrix for hardware and INT8 compatibility.

Thank you.

OK, it seems the Jetson Nano devkit doesn't support INT8 quantization (its Maxwell GPU lacks fast INT8 support, which is why the build silently fell back to FP32).
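For reference, this can be checked programmatically before attempting an INT8 build; a minimal sketch using the same TensorRT Python API the builder config above relies on:

import tensorrt as trt

builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
print('fast FP16 supported:', builder.platform_has_fast_fp16)
print('fast INT8 supported:', builder.platform_has_fast_int8)  # False on Jetson Nano (Maxwell)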

Thanks for your support.