Same output for all inputs using INT8 engine file

Description

I am compiling an ONNX model from FP32 to INT8 using the following command:

trtexec --onnx=model.onnx --int8 --saveEngine=model_int8.engine

The compilation completes successfully, but when I load and run the model on the COCO dataset, I get identical outputs for every input. It is worth noting that an FP16 engine built with the same command line (just the --fp16 flag instead of --int8) works correctly on the same COCO dataset.
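The behavior can also be checked straight from trtexec (a minimal sketch; the tensor name "input" and the .bin files are placeholders for two different preprocessed images):

trtexec --loadEngine=model_int8.engine --loadInputs=input:image_a.bin --dumpOutput
trtexec --loadEngine=model_int8.engine --loadInputs=input:image_b.bin --dumpOutput

If the issue reproduces, both runs print the same output values.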

Environment

TensorRT Version: 10.3
GPU Type: Tesla
Nvidia Driver Version:
CUDA Version: 12.1
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.10.14
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.1.0+cu121
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

I’ve had the same issue with ResNet-50 trained on ImageNet-1k when calibrating a model through the Python API (setting trt.BuilderFlag.INT8/FP16 when building the engine file for my model).

I am using the torchvision version of ResNet-50:
PyTorch docs

I am using a calibrator based on IInt8EntropyCalibrator2, sketched below:
TensorRT GitHub
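The calibrator follows the standard pattern from the TensorRT Python samples (a minimal sketch, assuming pycuda for device buffers; EntropyCalibrator, the batches iterable, and the cache file name are placeholders):

import os
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 - creates a CUDA context
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, batch_size, cache_file="calibration.cache"):
        super().__init__()
        self.batches = iter(batches)  # iterable of float32 arrays, shape (batch_size, 3, 224, 224)
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.device_input = None

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = np.ascontiguousarray(next(self.batches))
        except StopIteration:
            return None  # no more data: tells TensorRT calibration is done
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, batch)
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Reuse a cache from a previous run if one exists
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)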

My environment is similar to the one above:

Environment

TensorRT Version: 10.0.1
GPU Type: Tesla
Nvidia Driver Version: 10.1 (nvcc --version)/550.54.15 (nvidia-smi)
CUDA Version: 12.4
CUDNN Version:
Operating System + Version: Ubuntu 20.04.6
Python Version (if applicable): 3.10.13
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.2.0+cu121
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

Python function used for creating a TRT engine, including in INT8 mode:

def create_engine(self, engine_path, calibrator=None):
    """
    Build the TensorRT engine and serialize it to disk.
    :param engine_path: The path where to serialize the engine to.
    :param calibrator: INT8 calibrator, used when self.precision == 'int8'.
    """
    engine_path = os.path.realpath(engine_path)
    engine_dir = os.path.dirname(engine_path)
    os.makedirs(engine_dir, exist_ok=True)
    self.log.info("Building {} Engine in {}".format(self.precision, engine_path))

    inputs = [self.network.get_input(i) for i in range(self.network.num_inputs)]

    profile = self.builder.create_optimization_profile()
    model_input_name = inputs[0].name
    profile.set_shape(
        input=model_input_name,  # name of input tensor - must match the first layer in the ONNX model
        min=[self.min_batch] + list(self.input_shape),  # minimum input size
        opt=[self.opt_batch] + list(self.input_shape),  # optimal input size
        max=[self.max_batch] + list(self.input_shape),  # maximum input size
    )
    self.config.add_optimization_profile(profile)
    self.config.set_calibration_profile(profile)
    self.config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED

    # https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#reduced-precision
    if self.precision == 'fp16':
        self.config.set_flag(trt.BuilderFlag.FP16)
    elif self.precision == 'int8':
        self.config.int8_calibrator = calibrator
        self.config.set_flag(trt.BuilderFlag.FP16)
        self.config.set_flag(trt.BuilderFlag.INT8)

    serialized_engine = self.builder.build_serialized_network(self.network, self.config)
    if serialized_engine is None:
        raise RuntimeError("Failed to build serialized engine")
    with open(engine_path, "wb") as f:
        f.write(serialized_engine)
    self.engine_file = engine_path
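The resulting engine is then loaded and run roughly like this (a minimal sketch using pycuda and the TensorRT 10 tensor-address API; the (1, 3, 224, 224) input shape, the tensor ordering, and the float32 output dtype are assumptions for ResNet-50):

import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 - creates a CUDA context
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model_int8.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

input_name = engine.get_tensor_name(0)   # assumes tensor 0 is the input
output_name = engine.get_tensor_name(1)  # assumes tensor 1 is the output
context.set_input_shape(input_name, (1, 3, 224, 224))

h_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # stand-in for a preprocessed image
h_output = np.empty(tuple(context.get_tensor_shape(output_name)), dtype=np.float32)
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
context.set_tensor_address(input_name, int(d_input))
context.set_tensor_address(output_name, int(d_output))

stream = cuda.Stream()
cuda.memcpy_htod_async(d_input, h_input, stream)
context.execute_async_v3(stream.handle)
cuda.memcpy_dtoh_async(h_output, d_output, stream)
stream.synchronize()
print(h_output.flatten()[:5])  # with the INT8 engine these values come out identical for every input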