TensorRT 5 Int8 Calibration Example

If possible, could the TensorRT team please share an INT8 calibration sample using the Python API?

I have been following this link:

but I have run into several problems.

I checked the topics/posts, but I couldn’t find any reference for Python API INT8 calibration for TensorRT 5.

Please reference Developer Guide :: NVIDIA Deep Learning TensorRT Documentation

I think the guide is not clear.

For example, in the link you provided, section “5.2.3.2. INT8 Calibration Using Python” presents:
batchstream = ImageBatchStream(NUM_IMAGES_PER_BATCH, calibration_files)
Create an Int8_calibrator object with input node names and batch stream:
Int8_calibrator = EntropyCalibrator(["input_node_name"], batchstream)
Set INT8 mode and INT8 calibrator:
trt_builder.int8_calibrator = Int8_calibrator

The guide does not explain what “ImageBatchStream” is, and neither does the TensorRT Python API doc (IInt8EntropyCalibrator — NVIDIA TensorRT Standard Python API Documentation 8.6.1 documentation).

Could you please tell me more about “ImageBatchStream”?


Hi yatchiu,

You can see a reference implementation of “ImageBatchStream” in this article: https://devblogs.nvidia.com/int8-inference-autonomous-vehicles-tensorrt/

You can also see a more up-to-date implementation of INT8 calibration using IInt8EntropyCalibrator2 here, though some parts of the class methods will vary depending on your model: https://devtalk.nvidia.com/default/topic/1065026/tensorrt/tensorrt6-dynamic-input-size-does-not-support-int8-with-calibrator-/post/5393304/#5393304
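In case it helps, here is a minimal sketch of a calibrator along those lines, written against the IInt8EntropyCalibrator2 interface. The class name, batch iterator, and cache-file name are placeholders, not code from the linked posts; note that get_batch returns a list of device pointers, one per network input, which is what the Python bindings expect.

import numpy as np
import pycuda.autoinit  # initializes a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

class MinimalEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds contiguous NCHW float32 batches to TensorRT during INT8 calibration."""

    def __init__(self, batches, batch_size, cache_file="calibration.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = batches      # iterator of np.float32 arrays, one per batch
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.d_input = None         # device buffer, allocated on first use

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            data = np.ascontiguousarray(next(self.batches))
        except StopIteration:
            return None             # no more data: calibration is done
        if self.d_input is None:
            self.d_input = cuda.mem_alloc(data.nbytes)
        cuda.memcpy_htod(self.d_input, data)
        return [int(self.d_input)]  # one device pointer per input tensor

    def read_calibration_cache(self):
        # Reusing a cached table lets repeated builds skip calibration entirely.
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)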

Hi,

Following the Python autonomous-vehicles sample above on TRT 5.1.3.6, CUDA 10.1, cuDNN 7, Ubuntu 18.04, ppc64, T4 16 GB: the trt.infer.EntropyCalibrator namespace is not found; the available base calibrators are trt.IInt8EntropyCalibrator, trt.IInt8EntropyCalibrator2, and trt.IInt8LegacyCalibrator. Using either of the entropy calibrators with Caffe SSD for INT8, the parser returns empty model tensors. Using them with Caffe FRCNN for INT8, the parser succeeds and the calibrator is able to call ImageBatchStream multiple times, but it eventually errors out in builder.build_cuda_engine:

RuntimeError: Unable to cast Python instance to C++ type (compile in debug mode for details)

Both of these models build successfully for FP32 and FP16, which don’t require a calibrator. Here is the code for the calibrator and batch image stream. Is there some additional debugging/tracing flag that would point to the root cause of the INT8 problems?

############################################################
import numpy as np
import pycuda.autoinit  # initializes a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

# class PythonEntropyCalibrator(trt.IInt8EntropyCalibrator2):
class PythonEntropyCalibrator(trt.IInt8EntropyCalibrator):
    def __init__(self, input_layers, stream):
        # trt.IInt8EntropyCalibrator2.__init__(self)
        trt.IInt8EntropyCalibrator.__init__(self)
        self.input_layers = input_layers
        self.stream = stream
        # Device buffer sized for one full calibration batch, reused every batch
        self.d_input = cuda.mem_alloc(self.stream.calibration_data.nbytes)
        stream.reset()

    def get_batch_size(self):
        return self.stream.batch_size

    # def get_batch(self, bindings, names):  # older-style signature
    def get_batch(self, names, unicode=None):
        batch = self.stream.next_batch()
        if not batch.size:
            return None  # end of calibration data

        cuda.memcpy_htod(self.d_input, batch)
        # NOTE: this loop iterates over the *characters* of the first input-layer
        # name, so it does not actually check `names` against `self.input_layers`.
        for i in self.input_layers[0]:
            assert names[0] != i

        # NOTE: this returns a bare int, but the Python API expects get_batch to
        # return a list of device pointers (e.g. [int(self.d_input)]); that
        # mismatch is a likely source of the "Unable to cast" error above.
        bindings = int(self.d_input)
        return bindings

    def read_calibration_cache(self):
        return None

    def write_calibration_cache(self, cache):
        # `cache` arrives as a pointer; ctypes.c_char_p reads it only up to the
        # first NUL byte, so a binary calibration cache may be truncated here.
        import ctypes
        cache = ctypes.c_char_p(int(cache))
        with open('calibration_cache.bin', 'wb') as f:
            f.write(cache.value)
        return None

########################
class ImageBatchStream():
    def __init__(self, batch_size, calibration_files, preprocessor=None):
        self.batch_size = batch_size
        # Round up so a trailing partial batch still gets processed
        self.max_batches = (len(calibration_files) // batch_size) + (1 if (len(calibration_files) % batch_size) else 0)
        self.files = calibration_files
        # Pre-allocated host buffer: NCHW float32, sized for 600x1000 inputs
        self.calibration_data = np.zeros((batch_size, 3, 600, 1000), dtype=np.float32)
        self.batch = 0
        self.preprocessor = preprocessor

    @staticmethod
    def read_image_chw(path):

        from fast_rcnn.config import cfg
        import cv2

        im = cv2.imread(path)
        im = cv2.resize(im, dsize=(1000, 600), interpolation=cv2.INTER_LINEAR)
        im = im.astype(np.float32, copy=True)
        im = cv2.subtract(im, (cfg.PIXEL_MEANS[0][0][0], cfg.PIXEL_MEANS[0][0][1], cfg.PIXEL_MEANS[0][0][2], 0))
        im = im.transpose((2, 0, 1))

        return im

    def reset(self):
        self.batch = 0

    def next_batch(self):
        if self.batch < self.max_batches:
            imgs = []
            files_for_batch = self.files[self.batch_size * self.batch: self.batch_size * (self.batch + 1)]
            for f in files_for_batch:
                print("[ImageBatchStream] Processing ", f)
                img = ImageBatchStream.read_image_chw(f)
                # img = self.preprocessor(img)
                imgs.append(img)
            # NOTE: a final partial batch overwrites only the first len(imgs)
            # slots; the rest still hold images from the previous batch.
            for i in range(len(imgs)):
                self.calibration_data[i] = imgs[i]
            self.batch += 1
            ret = np.ascontiguousarray(self.calibration_data, dtype=np.float32)
            return ret
        else:
            return np.array([])

############################################################
def build_engine(trt_deploy_path, trt_model_path, trt_logger, trt_engine_datatype=trt.DataType.FLOAT, batch_size=1, silent=False):
    with trt.Builder(trt_logger) as builder, builder.create_network() as network, trt.CaffeParser() as parser:
        builder.max_workspace_size = 1 << 30
        if trt_engine_datatype == trt.DataType.HALF:
            builder.fp16_mode = True
        elif trt_engine_datatype == trt.DataType.INT8 and builder.platform_has_fast_int8:
            builder.int8_mode = True

        builder.max_batch_size = batch_size

        from os import listdir
        from os.path import isdir, isfile, join
        calibration_files = []
        im = '/tmp/files'
        if isdir(im):
            calibration_files.extend([join(im, f) for f in listdir(im) if isfile(join(im, f))])
        else:
            calibration_files.append(im)

        batchstream = ImageBatchStream(batch_size, calibration_files)
        int8_calibrator = PythonEntropyCalibrator(["data"], batchstream)
        builder.int8_calibrator = int8_calibrator

        model_tensors = parser.parse(trt_deploy_path, trt_model_path, network, trt_engine_datatype)
        network.mark_output(model_tensors.find('bbox_pred'))
        network.mark_output(model_tensors.find('cls_prob'))
        network.mark_output(model_tensors.find('rois'))
        if not silent:
            print("Building TensorRT engine. This may take a few minutes.")

        return builder.build_cuda_engine(network)
        # RuntimeError: Unable to cast Python instance to C++ type
        # (compile in debug mode for details)

PJ12, maybe this will be more helpful?

https://developer.ibm.com/linuxonpower/2019/07/29/introducing-tensorflow-with-tensorrt-tf-trt/


This may be helpful as well:

I have an FP32 model which performs relatively poorly with multiple streams. Is there a simple tool for calibration where I can feed it images and the model and get back a calibration file ready for DeepStream?

Alternatively, are there up-to-date instructions for this procedure? I’ve looked at the TensorRT documentation and written a calibrator class, but I’m not sure what to actually do with it.

I get that I make a builder instance and assign my calibrator to it with the proper interface. That’s done, but I’m unsure what to do where the … is. Is there any working example code out there for this?

my main:

def main(model: Path, images: Path) -> int:
    logger = trt.ILogger()
    sources = nvcalibrate.dataset.create_dataset(images)
    calibrator = nvcalibrate.calibrator.EzCalibrator(sources)
    builder = trt.Builder(logger)
    with calibrator as calibrator, builder as builder:
        if builder.platform_has_fast_int8:
            builder.int8_mode = True
            builder.int8_calibrator = calibrator
...

The best I can find in the documentation is here:

# Set INT8 mode and INT8 calibrator:
trt_builder.int8_calibrator = Int8_calibrator
# The rest of the logic for engine creation and inference is similar to Importing From ONNX Using Python.

Which points here:

EXPLICIT_BATCH = 1 << <b>(int)</b>(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
with <b>builder =</b> trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
with open(model_path, 'rb') as model:
parser.parse(model.read())

But that’s invalid syntax for Python; I’m not sure what’s supposed to be going on, and I’m pretty sure it can’t work as written.

Hi PJ12, mdegans,

I have an FP32 model which performs relatively poorly with multiple streams. Is there a simple tool for calibration where I can feed it images and the model and get back a calibration file ready for DeepStream?

I believe TLT has some capability for this with tlt-int8-tensorfile, tlt-export, tlt-converter: https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html

Alternatively, are there up-to-date instructions for this procedure? I’ve looked at the TensorRT documentation and written a calibrator class, but I’m not sure what to actually do with it

This example is much more up-to-date, but is meant for image classification models. You might be able to tweak it for your needs.

Calibrator class definition: https://github.com/rmccorm4/tensorrt-utils/blob/111a71b5d05a9c48a0d3d5a784c5460d85965f4d/classification/imagenet/ImagenetCalibrator.py#L93

Usage of the calibrator: https://github.com/rmccorm4/tensorrt-utils/blob/111a71b5d05a9c48a0d3d5a784c5460d85965f4d/classification/imagenet/onnx_to_tensorrt.py#L178-L183

But that’s invalid syntax for Python; I’m not sure what’s supposed to be going on, and I’m pretty sure it can’t work as written.

Yeah, that specific code block got messed up when rendering the doc; it’s already fixed for the next release. It should look more like this:

EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
    with open(model_path, 'rb') as model:
        parser.parse(model.read())

This would be a similar version of that code block in a real example: https://github.com/rmccorm4/tensorrt-utils/blob/111a71b5d05a9c48a0d3d5a784c5460d85965f4d/classification/imagenet/onnx_to_tensorrt.py#L159-L191.
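Roughly, the INT8 pieces slot into that flow like this. This is only a sketch against the TensorRT 7 Python bindings; model_path and my_calibrator are placeholders for your model file and calibrator instance:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network(EXPLICIT_BATCH) as network, \
        trt.OnnxParser(network, TRT_LOGGER) as parser, \
        builder.create_builder_config() as config:
    with open(model_path, 'rb') as model:
        parser.parse(model.read())
    config.max_workspace_size = 1 << 30
    if builder.platform_has_fast_int8:
        config.set_flag(trt.BuilderFlag.INT8)   # enable INT8 kernels
        config.int8_calibrator = my_calibrator  # e.g. a calibrator like the one above
    engine = builder.build_engine(network, config)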

All of the code mentioned above is using the newer IBuilderConfig API which works well in TensorRT 7 (and should work in TensorRT 6). In case you’re using an older TensorRT version and have any issues with it, you can try the 19.10 branch which was written for TensorRT 6 and uses the older IBuilder API instead of IBuilderConfig: https://github.com/rmccorm4/tensorrt-utils/blob/19.10/classification/imagenet/onnx_to_tensorrt.py
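For comparison, a rough sketch of the older IBuilder-style equivalent (the style used in the Caffe FRCNN code earlier in this thread), where the same options live directly on the builder; my_calibrator is again a placeholder:

# Pre-IBuilderConfig API (TensorRT 5/6): options are set on the builder itself
builder.max_workspace_size = 1 << 30
builder.max_batch_size = batch_size
if builder.platform_has_fast_int8:
    builder.int8_mode = True
    builder.int8_calibrator = my_calibrator
engine = builder.build_cuda_engine(network)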

There are some more details on usage of the above scripts for INT8 Calibration in the README: https://github.com/rmccorm4/tensorrt-utils/tree/master/classification/imagenet#int8-calibration

Thank you so much, NVES_R

Really, thank you. I have been looking for code doing exactly that all day. That, on tensorrt-utils, is some nicely written Python. I will integrate it into what I am working on tomorrow.

No problem. Feel free to open issues on the repository if something isn’t working.