TRT int8 calibration problem

Description

Hi,
I successfully converted a MobileNet model (the original model) to both a TRT fp32 model and a TRT int8 model.
In my case, the MobileNet model is trained on normalized images: each image is first normalized as (x-127.5)/128 outside the model and then fed into the model.
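For reference, the normalization step described above can be sketched as follows (a minimal example; the function name and dummy image shape are my own, not from the original code):

```python
import numpy as np

def preprocess(img):
    """Normalize a uint8 image to roughly [-1, 1), matching the training pipeline."""
    return (img.astype(np.float32) - 127.5) / 128.0

# A dummy uint8 image stands in for a real input frame.
img = np.random.randint(0, 256, (112, 112, 3), dtype=np.uint8)
x = preprocess(img)
print(x.min(), x.max())  # values always fall within [-0.99609375, 0.99609375]
```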
I ran the following tests:
Case 1. For the original model, I used an image, imgA, and ran inference like:
(imgA-127.5)/128 --> Original model --> get embedding embO

Case 2. For the TRT fp32 model, I used the same image, imgA, and ran inference like:
(imgA-127.5)/128 --> TRT fp32 model --> get embedding embFP32

Case 3. For the TRT int8 model, I used a set of images for calibration. Before sending each image to the calibrator, I normalized it first:
(image-127.5)/128 --> input to calibration function
For inference, I used the same imgA:
(imgA-127.5)/128 --> TRT int8 model --> get embedding embINT8

Results:
The similarity between embO and embFP32 is 99.9%, which means the two results are almost identical.
The similarity between embO and embINT8 is 55%, which means they are quite different.

Later on, I created another TRT int8 model, the same as Case 3, except that during calibration I fed the images directly to the calibrator without normalization:
Case 4. For the TRT int8 model, I used the same set of images for calibration, but this time sent them to the calibrator without normalizing them first:
images --> input to calibration function
For inference, I used the same imgA:
(imgA-127.5)/128 --> TRT int8 model --> get embedding embINT8_2

The similarity between embO and embINT8_2 is 89.5%, which means the two are similar but not very close.
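For context, the similarity percentages above presumably come from a cosine-similarity comparison of the embedding vectors; a minimal version of such a check (my own helper with made-up embeddings, not the poster's exact script) looks like:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

emb_o = np.array([0.1, 0.9, -0.3])     # stand-in for the original model's embedding
emb_q = np.array([0.12, 0.88, -0.28])  # stand-in for a quantized model's embedding
print(cosine_similarity(emb_o, emb_q))  # close to 1.0 for near-identical embeddings
```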

The above results are really confusing. Maybe I didn't understand the calibration correctly. My questions:
a) Since my model is trained on already-normalized images, I would naturally expect that for int8 calibration I need to normalize the images before sending them to the calibrator, as I did in Case 3. But from the results, embINT8 is quite different from embO.
Instead, if I send the images to calibration without normalization, as in Case 4, the result, embINT8_2, is much more similar to embO.
How can that be? Or is my understanding of calibration wrong?

b) Even if Case 4 is right, the similarity is still not very high. Although the int8 model has some accuracy loss, I would still expect the similarity to be much higher, like >95%.

Environment

TensorRT Version: v5.1
GPU Type: GTX 1050
Nvidia Driver Version: 418.39
CUDA Version: 10.1
CUDNN Version: 7.5.0 (updated)
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 2.7
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi @xiamenhai,
Here Case 3 is the correct approach.
Case 4 does not make sense, as the calibration images should use the same normalization as the training images.
We recommend you retry on the latest TRT release (7.1) with EntropyCalibrator2; if the issue persists, please share the calibration cache.

Thanks!
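As a rough intuition for why the calibration input distribution matters so much: symmetric int8 quantization maps a dynamic range [-amax, amax] onto [-127, 127], so the scale chosen during calibration depends entirely on the activations the calibrator sees. The toy sketch below uses naive max calibration rather than TRT's entropy method, purely to illustrate the effect:

```python
import numpy as np

def int8_scale(activations):
    """Scale for naive symmetric max calibration: [-amax, amax] -> [-127, 127].
    (TRT's entropy calibrator clips the range rather than using the raw max,
    but the dependence on the calibration input distribution is the same.)"""
    return float(np.abs(activations).max()) / 127.0

pixels = np.arange(256, dtype=np.float32)  # every possible uint8 pixel value
normalized = (pixels - 127.5) / 128.0      # what the model actually sees at inference

print(int8_scale(normalized))  # ~0.00784: fine-grained steps for inputs in [-1, 1)
print(int8_scale(pixels))      # ~2.008: a scale derived from raw pixels squeezes
                               # the true [-1, 1) input range into about one int8 bin
```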

Hi @AakankshaS,

Thank you for your reply! Before moving to your recommendation, I am sharing my cache files for Case 3 and Case 4 at this link:
https://drive.google.com/drive/folders/1nUeHN_Gm7JhFpi0Up5-EvkYf7wP_Ut5w?usp=sharing
Hope you can take a look.

Also, during the build process, there is a warning:
[TensorRT] WARNING: TensorRT was compiled against cuDNN 7.5.0 but is linked against cuDNN 7.0.5. This mismatch may potentially cause undefined behavior.
Is this a potential issue? --> (Update: I just moved cuDNN to 7.5.0 and the warning is gone. I repeated Case 3 and Case 4 and the results are still the same, so the issue is not caused by this warning.)

Thank you!

Hi @xiamenhai,
The reason here could be that you just updated TRT while leaving all the other CUDA libraries as they were.
Please refer to the link below for the versions compatible with TRT 7.1, and update your cuDNN and CUDA versions accordingly:
https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html
Thanks!

Hi @AakankshaS,

As TRT version 5.1 has been deployed in our production, it would take a lot of effort to upgrade to TRT v7. But if this issue is caused by TRT v5.1, let us know and we will have to upgrade to TRT v7.
BTW, I have made CUDA/cuDNN fully compatible with TRT v5.1 (CUDA 10.1 / cuDNN 7.5.0) and there are no warnings at all. Any other ideas on how I can solve the issue?
Many thanks!!
I am sharing the calibration code here:

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

import os
import numpy as np
from PIL import Image
import cv2

class CenterNetEntropyCalibrator(trt.IInt8EntropyCalibrator2):

    def __init__(self, args, files_path='/home/user/Downloads/datasets/train_val_files/val.txt'):
        trt.IInt8EntropyCalibrator2.__init__(self)

        self.cache_file = args.cache_file  # e.g. 'CenterNet.cache'

        self.batch_size = args.batch_size
        self.Channel = args.channel
        self.Height = args.height
        self.Width = args.width

        self._txt_file = open(files_path, 'r')
        self._lines = self._txt_file.readlines()
        np.random.shuffle(self._lines)
        self.imgs = [x.strip() for x in self._lines]

        self.batch_idx = 0
        self.max_batch_idx = len(self.imgs) // self.batch_size
        self.data_size = trt.volume([self.batch_size, self.Channel, self.Height, self.Width]) * trt.float32.itemsize
        self.device_input = cuda.mem_alloc(self.data_size)

    def next_batch(self):
        if self.batch_idx < self.max_batch_idx:
            batch_files = self.imgs[self.batch_idx * self.batch_size:
                                    (self.batch_idx + 1) * self.batch_size]
            batch_imgs = np.zeros((self.batch_size, self.Channel, self.Height, self.Width),
                                  dtype=np.float32)
            for i, f in enumerate(batch_files):
                img = cv2.imread(f)                      # (h, w, c), BGR
                img = cv2.resize(img, (self.Width, self.Height))
                img = img[:, :, ::-1]                    # BGR -> RGB
                img = (np.asarray(img) - 127.5) / 128.0  # same normalization as training
                img = img.transpose((2, 0, 1))           # (c, h, w)
                batch_imgs[i] = img
            self.batch_idx += 1
            print("batch:[{}/{}]".format(self.batch_idx, self.max_batch_idx))
            return np.ascontiguousarray(batch_imgs)
        else:
            return np.array([])

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names, p_str=None):
        batch_imgs = self.next_batch()
        if batch_imgs.size != self.batch_size * self.Channel * self.Height * self.Width:
            return None  # no more batches; calibration is done
        cuda.memcpy_htod(self.device_input, batch_imgs.astype(np.float32))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # If there is a cache, use it instead of calibrating again.
        # Otherwise, implicitly return None.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

Hi @xiamenhai,
The calibration cache for Case 3 looks good.
Regarding "The similarity between embO and embINT8 is 55%, means they are different":
TRT INT8 does not guarantee that output blobs match exactly. Can you tell us what tolerance you are using to compare the output blobs?
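A common way to compare fp32 and int8 output blobs with an explicit tolerance (a sketch with made-up data, not the exact comparison used in this thread) is:

```python
import numpy as np

fp32_out = np.array([0.10, 0.90, -0.30], dtype=np.float32)  # stand-in fp32 blob
int8_out = np.array([0.11, 0.88, -0.31], dtype=np.float32)  # stand-in int8 blob

# Element-wise comparison with an absolute tolerance, plus the worst-case error.
close = np.allclose(fp32_out, int8_out, rtol=0.0, atol=0.05)
max_abs_err = float(np.max(np.abs(fp32_out - int8_out)))
print(close, max_abs_err)
```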
Also, TRT 7.1 has a lot more fixes and improvements. It's worth the effort to upgrade.

Thanks!