Same output for all inputs using INT8 engine file

Description

I am compiling an ONNX model from FP32 to INT8 using the following command:

trtexec --onnx=model.onnx --int8 --saveEngine=model_int8.engine

The compilation completes successfully, but when I load and run the model on the COCO dataset, I get identical outputs for every input. It is worth noting that an FP16 engine built with the same command line (just the --fp16 flag instead of --int8) works correctly on the same COCO dataset.
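The behavior can also be checked straight from trtexec (a minimal sketch; the tensor name "input" and the .bin files are placeholders for two different preprocessed images):

trtexec --loadEngine=model_int8.engine --loadInputs=input:image_a.bin --dumpOutput
trtexec --loadEngine=model_int8.engine --loadInputs=input:image_b.bin --dumpOutput

If the issue reproduces, both runs print the same output values.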

Environment

TensorRT Version: 10.3
GPU Type: Tesla
Nvidia Driver Version:
CUDA Version: 12.1
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.10.14
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.1.0+cu121
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

I’ve had the same issue with ResNet-50 trained on ImageNet-1k when calibrating a model through the Python API (setting trt.BuilderFlag.INT8/FP16 when building the engine file for my model).

I am using the torchvision version of ResNet-50:
PyTorch docs

I am using a calibrator based on IInt8EntropyCalibrator2, sketched below:
TensorRT GitHub
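The calibrator follows the standard pattern from the TensorRT Python samples (a minimal sketch, assuming pycuda for device buffers; EntropyCalibrator, the batches iterable, and the cache file name are placeholders):

import os
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 - creates a CUDA context
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, batch_size, cache_file="calibration.cache"):
        super().__init__()
        self.batches = iter(batches)  # iterable of float32 arrays, shape (batch_size, 3, 224, 224)
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.device_input = None

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = np.ascontiguousarray(next(self.batches))
        except StopIteration:
            return None  # no more data: tells TensorRT calibration is done
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, batch)
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Reuse a cache from a previous run if one exists
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)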

My environment is similar to the one above:

Environment

TensorRT Version: 10.0.1
GPU Type: Tesla
Nvidia Driver Version: 10.1 (nvcc --version)/550.54.15 (nvidia-smi)
CUDA Version: 12.4
CUDNN Version:
Operating System + Version: Ubuntu 20.04.6
Python Version (if applicable): 3.10.13
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.2.0+cu121
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

Python function used for creating a TRT engine, including in INT8 mode:

def create_engine(self, engine_path, calibrator=None):
    """
    Build the TensorRT engine and serialize it to disk.
    :param engine_path: The path where to serialize the engine to.
    :param calibrator: INT8 calibrator, used when self.precision == 'int8'.
    """
    engine_path = os.path.realpath(engine_path)
    engine_dir = os.path.dirname(engine_path)
    os.makedirs(engine_dir, exist_ok=True)
    self.log.info("Building {} Engine in {}".format(self.precision, engine_path))

    inputs = [self.network.get_input(i) for i in range(self.network.num_inputs)]

    profile = self.builder.create_optimization_profile()
    model_input_name = inputs[0].name
    profile.set_shape(
        input=model_input_name,  # name of input tensor - must match the first layer in the ONNX model
        min=[self.min_batch] + list(self.input_shape),  # minimum input size
        opt=[self.opt_batch] + list(self.input_shape),  # optimal input size
        max=[self.max_batch] + list(self.input_shape),  # maximum input size
    )
    self.config.add_optimization_profile(profile)
    self.config.set_calibration_profile(profile)
    self.config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED

    # https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#reduced-precision
    if self.precision == 'fp16':
        self.config.set_flag(trt.BuilderFlag.FP16)
    elif self.precision == 'int8':
        self.config.int8_calibrator = calibrator
        self.config.set_flag(trt.BuilderFlag.FP16)
        self.config.set_flag(trt.BuilderFlag.INT8)

    serialized_engine = self.builder.build_serialized_network(self.network, self.config)
    if serialized_engine is None:
        raise RuntimeError("Failed to build serialized engine")
    with open(engine_path, "wb") as f:
        f.write(serialized_engine)
    self.engine_file = engine_path
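The resulting engine is then loaded and run roughly like this (a minimal sketch using pycuda and the TensorRT 10 tensor-address API; the (1, 3, 224, 224) input shape, the tensor ordering, and the float32 output dtype are assumptions for ResNet-50):

import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 - creates a CUDA context
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model_int8.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

input_name = engine.get_tensor_name(0)   # assumes tensor 0 is the input
output_name = engine.get_tensor_name(1)  # assumes tensor 1 is the output
context.set_input_shape(input_name, (1, 3, 224, 224))

h_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # stand-in for a preprocessed image
h_output = np.empty(tuple(context.get_tensor_shape(output_name)), dtype=np.float32)
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
context.set_tensor_address(input_name, int(d_input))
context.set_tensor_address(output_name, int(d_output))

stream = cuda.Stream()
cuda.memcpy_htod_async(d_input, h_input, stream)
context.execute_async_v3(stream.handle)
cuda.memcpy_dtoh_async(h_output, d_output, stream)
stream.synchronize()
print(h_output.flatten()[:5])  # with the INT8 engine these values come out identical for every input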