Building TensorRT int8 for batch greater than 1 fails

Description

Building an INT8 engine with a custom calibrator fails during calibration as soon as the batch size is greater than 1.

Environment

TensorRT Version: 5.1.5.0
GPU Type: RTX-2080
Nvidia Driver Version: 450.102.04
CUDA Version: 10.1
CUDNN Version: 7.5
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.4
Baremetal or Container (if container which image + tag):

Hello, I successfully built and ran an INT8 engine for my custom model. However, when I tried to increase the batch size, the calibration process fails with the following error:

[TensorRT] ERROR: engine.cpp (572) - Cuda Error in commonEmitTensor: 1 (invalid argument)
[TensorRT] ERROR: Failure while trying to emit debug blob.
engine.cpp (572) - Cuda Error in commonEmitTensor: 1 (invalid argument)
[TensorRT] ERROR: cuda/caskConvolutionLayer.cpp (355) - Cuda Error in execute: 1 (invalid argument)
[TensorRT] ERROR: cuda/caskConvolutionLayer.cpp (355) - Cuda Error in execute: 1 (invalid argument)

The same error repeats for every batch fed to the calibrator.

My code below:

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, cfg, seq_list, cache_file):
        # Whenever you specify a custom constructor for a TensorRT class,
        # you MUST explicitly call the constructor of the parent.
        trt.IInt8EntropyCalibrator2.__init__(self)

        self.batch_size = 3
        self.batch_shape = (self.batch_size, IMG_CH, IMG_H, IMG_W)
        self.cache_file = cache_file

        self.cfg = cfg

        self.seq_list = seq_list
        self.frames_per_seq = list()
        self.delution_factor = cfg['delution_factor']
        for seq in seq_list:
            lidar_list = sorted([cfg['dataset_dir'] + seq + '/LIDAR_TOP/data/' + f.strip()
                                 for f in open(cfg['dataset_dir'] + seq + '/LIDAR_TOP/samples.txt', 'r').readlines()])
            self.frames_per_seq.append(len(lidar_list))
        self.current_seq = 0

        self.counter = 0  # for keeping track of how many files we have read

        self.device_input = cuda.mem_alloc(trt.volume(self.batch_shape) * trt.float32.itemsize)

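As a sanity check on the allocation above (a minimal NumPy-only sketch, using hypothetical values for IMG_CH, IMG_H, and IMG_W since they are not shown in the snippet), the device buffer must cover the full batch, and every later host-to-device copy must match this size exactly:

```python
import numpy as np

# Hypothetical image dimensions, for illustration only.
IMG_CH, IMG_H, IMG_W = 3, 480, 640
batch_size = 3
batch_shape = (batch_size, IMG_CH, IMG_H, IMG_W)

# NumPy equivalent of trt.volume(batch_shape) * trt.float32.itemsize:
n_bytes = int(np.prod(batch_shape)) * np.dtype(np.float32).itemsize
print(n_bytes)  # size the host buffer in get_batch must match
```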
Inside the get_batch I use the following code:

    depthnet_input_batch = np.zeros((self.batch_size, IMG_H * IMG_W * IMG_CH), dtype=np.float32)
    for i, cam_data in enumerate(cameras_data):
        img = cam_data['img_data'].data.cpu().numpy()
        img = img.squeeze()
        img = img.transpose((2, 0, 1))
        img = img.ravel()
        img = np.ascontiguousarray(img)
        depthnet_input_batch[i, :] = img

    depthnet_input_batch = np.asarray(depthnet_input_batch[:self.batch_size]).ravel()

    cuda.memcpy_htod(self.device_input, depthnet_input_batch)
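A CUDA "invalid argument" during memcpy_htod is commonly caused by the host buffer's byte size not matching the device allocation. Below is a NumPy-only sketch of the packing step above (with hypothetical dimensions and random arrays standing in for the camera frames, so it runs without CUDA) that checks this invariant before the copy would happen:

```python
import numpy as np

# Hypothetical sizes, for illustration only.
IMG_CH, IMG_H, IMG_W = 3, 480, 640
batch_size = 3

# Simulated per-camera frames as they would come out of PyTorch, in HWC layout.
cameras_data = [np.random.rand(IMG_H, IMG_W, IMG_CH).astype(np.float32)
                for _ in range(batch_size)]

batch = np.zeros((batch_size, IMG_CH * IMG_H * IMG_W), dtype=np.float32)
for i, img in enumerate(cameras_data):
    # HWC -> CHW, then flatten into one row of the batch buffer.
    batch[i, :] = img.transpose((2, 0, 1)).ravel()

host = np.ascontiguousarray(batch.ravel())

# The htod copy only succeeds if this matches the cuda.mem_alloc size.
expected_bytes = batch_size * IMG_CH * IMG_H * IMG_W * np.dtype(np.float32).itemsize
assert host.nbytes == expected_bytes
```

If the loop fills fewer rows than batch_size (e.g. fewer cameras than expected), the buffer is still the full size, so the copy succeeds but calibrates on zero padding; a size mismatch, by contrast, produces exactly the "invalid argument" error shown above.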

Hi @spivakoa,

For your reference, here is the documentation for the sample "Inference In INT8 Using Custom Calibration":
https://docs.nvidia.com/deeplearning/tensorrt/sample-support-guide/index.html#int8_sample

Thank you.