Python API - int8_calibrator not used when calling build_engine (but works when calling build_cuda_engine)

Description

Hi,

I’m trying to convert models from PyTorch -> ONNX -> TensorRT. Ideally, I would like to use INT8 and support dynamic input sizes.
I can create an INT8-calibrated engine with builder.build_cuda_engine(network), and I can use optimization profiles for dynamic input sizes with builder.build_engine(network, config), but I cannot get both at once.
The latter always seems to ignore the int8_calibrator, regardless of whether I set it on the builder or on the config object, and even if I remove the dynamic shape optimizations (see the code snippet below).

Please let me know if what I’m trying to do is not supported, or if there is another way to make it work.

Thanks!

Environment

TensorRT Version:
GPU Type: T4
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/pytorch:20.11-py3

Relevant Files

Steps To Reproduce

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


def build_engine(onnx_file_path, input_name, int8_calibrator=None,
                 max_batch_size=1, img_size=None, min_size=None, max_size=None):
    # initialize TensorRT engine and parse ONNX model
    # (config belongs to this builder, so the builder must not be re-created below)
    with trt.Builder(TRT_LOGGER) as builder, builder.create_builder_config() as config:
        network_creation_flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
        network = builder.create_network(network_creation_flag)
        parser = trt.OnnxParser(network, TRT_LOGGER)

        # parse ONNX and stop if the parser reports errors
        with open(onnx_file_path, 'rb') as model:
            print('Beginning ONNX file parsing')
            if not parser.parse(model.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        print('Completed parsing of ONNX file')
        # allow TensorRT to use up to 8GB of GPU memory for tactic selection
        config.max_workspace_size = 8 << 30

        # use FP16 mode if possible
        if builder.platform_has_fast_fp16:
            builder.fp16_mode = True
            print('USING FP16!!!')
        if int8_calibrator is not None:
            builder.int8_mode = True
            config.int8_calibrator = int8_calibrator
            builder.int8_calibrator = int8_calibrator
            print('USING INT8!!!', builder.platform_has_fast_int8)

        # # Dynamic input support - commented out for testing (INT8 calibration still does not work without it)
        # if img_size is not None:  # dynamic
        #     opt_min, opt_max = min(img_size), max(img_size)
        #     # landscape profile
        #     profile = builder.create_optimization_profile()
        #     profile.set_shape(input_name, min=(1, 3, min_size, opt_max), opt=(max_batch_size, 3, opt_min, opt_max),
        #                       max=(max_batch_size, 3, opt_max, opt_max))
        #     config.add_optimization_profile(profile)
        #
        #     # portrait profile
        #     profile = builder.create_optimization_profile()
        #     profile.set_shape(input_name, min=(1, 3, opt_max, min_size), opt=(max_batch_size, 3, opt_max, opt_min),
        #                       max=(max_batch_size, 3, opt_max, opt_max))
        #     config.add_optimization_profile(profile)

        # generate TensorRT engine optimized for the target platform
        print('Building an engine...')
        # engine = builder.build_cuda_engine(network)
        engine = builder.build_engine(network, config)
        print("Completed creating Engine")

    return engine
```
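If the commented-out dynamic-shape block is re-enabled, its shape arithmetic can be pulled into a small helper so the min/opt/max tuples can be checked before a long build. The following is a minimal sketch, assuming a 3-channel NCHW input as in the snippet. `profile_shapes` and `add_profiles` are hypothetical helper names. `builder` and `config` are the TensorRT builder and builder config from `build_engine`, used only through `create_optimization_profile`, `set_shape` and `add_optimization_profile`, the same calls the original block makes.

```python
from typing import Dict, List, Sequence, Tuple

Shape = Tuple[int, int, int, int]
ProfileShapes = Tuple[Shape, Shape, Shape]  # (min, opt, max)


def profile_shapes(img_size: Sequence[int], min_size: int,
                   max_batch_size: int = 1) -> Dict[str, ProfileShapes]:
    """Return (min, opt, max) NCHW shapes for a landscape and a portrait profile.

    Follows the commented-out block above: the long side is fixed at
    max(img_size) and the short side ranges from min_size up to it.
    """
    short_side, long_side = min(img_size), max(img_size)
    # TensorRT requires min <= opt <= max in every dimension of a profile.
    if not 1 <= min_size <= short_side:
        raise ValueError("min_size must be between 1 and min(img_size)")
    landscape = ((1, 3, min_size, long_side),
                 (max_batch_size, 3, short_side, long_side),
                 (max_batch_size, 3, long_side, long_side))
    portrait = ((1, 3, long_side, min_size),
                (max_batch_size, 3, long_side, short_side),
                (max_batch_size, 3, long_side, long_side))
    return {"landscape": landscape, "portrait": portrait}


def add_profiles(builder, config, input_name: str, img_size: Sequence[int],
                 min_size: int, max_batch_size: int = 1) -> List[object]:
    """Register both profiles on the builder config and return the profile objects."""
    profiles = []
    for min_shape, opt_shape, max_shape in profile_shapes(
            img_size, min_size, max_batch_size).values():
        profile = builder.create_optimization_profile()
        profile.set_shape(input_name, min=min_shape, opt=opt_shape, max=max_shape)
        config.add_optimization_profile(profile)
        profiles.append(profile)
    return profiles
```

Inside `build_engine`, a call such as `add_profiles(builder, config, input_name, img_size, min_size, max_batch_size)` would take the place of the two commented-out profile blocks. As in the original, `max_size` is not used. The upper bound on both spatial dimensions is `max(img_size)`.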

Please refer to the link below:

Thanks

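I found a solution in https://github.com/NVIDIA/TensorRT/issues/388: use config.set_flag(trt.BuilderFlag.INT8) instead of builder.int8_mode = True. builder.build_engine(network, config) takes its precision settings from the config object, so the builder-level int8_mode flag has no effect there, even though builder.build_cuda_engine(network) honors it.
The link to the example script given in that issue is broken; here is the updated link: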
https://github.com/rmccorm4/tensorrt-utils/blob/master/int8/calibration/onnx_to_tensorrt.py
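The config-based setup can be sketched as a drop-in replacement for the `builder.fp16_mode` / `builder.int8_mode` lines in the question's code. FP16 is handled the same way, since the question also enables it through a builder attribute. This is a minimal sketch, assuming the TensorRT 7.x Python API. `configure_precision` and its `calibration_profile` and `builder_flag` parameters are hypothetical. The `config.set_calibration_profile` call is an assumption for TensorRT 7.1 or later, intended for calibrating a network with dynamic inputs. Check it against the installed version.

```python
def configure_precision(builder, config, int8_calibrator=None,
                        calibration_profile=None, builder_flag=None):
    """Enable FP16/INT8 on the IBuilderConfig passed to builder.build_engine(network, config).

    builder_flag defaults to tensorrt.BuilderFlag; it is a parameter only so the
    function can be exercised with stand-in objects.
    """
    if builder_flag is None:
        import tensorrt as trt  # imported lazily; only needed for the real flags
        builder_flag = trt.BuilderFlag

    # precision flags go on the config, not on the builder
    if builder.platform_has_fast_fp16:
        config.set_flag(builder_flag.FP16)
    if int8_calibrator is not None:
        config.set_flag(builder_flag.INT8)
        config.int8_calibrator = int8_calibrator
        # Assumption (TensorRT 7.1+): with dynamic inputs, calibration uses
        # the shapes of the profile registered here.
        if calibration_profile is not None:
            config.set_calibration_profile(calibration_profile)
    return config
```

After `configure_precision(builder, config, int8_calibrator)`, the engine is built as before with `builder.build_engine(network, config)`. The calibrator object itself is unchanged. Only where it is attached, and which flag enables it, differ.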